Unit 1
1 INTRODUCTION
Unit Structure
1.0 Objectives
1.1 Origin
1.2 Digital Image Processing
1.3 Applications and Examples of Digital Image Processing
1.4 Fundamental steps in Digital Image Processing
1.5 Components of Digital Image Processing
1.6 Elements of Visual Perception
1.7 Image Sensing and Acquisition
1.8 Image Sampling and Quantization
1.9 Basic Relationship between pixels
1.10 Summary
1.11 Review Questions
1.12 References
1.0 Objectives
1. To enhance and improve the information in a picture for better interpretation and understanding
2. To process acquired images for efficient storage, transmission over networks, and extraction of the right information from the picture
1.1 Origin
The first industry to use digital images was the newspaper industry. The pictures were sent by submarine cable between London and New York.
1.2 Digital Image Processing
The human eye-brain mechanism produces the best imaging system. An image is an object or visual which one sees. It is a 2-dimensional function of the 3-dimensional world that surrounds us. Basically, an image is a 2-D light intensity function f(x, y), where x and y are spatial (plane) coordinates and the amplitude at any coordinate pair (x, y) is defined as the intensity or gray level of the image at that point.
If x, y, and the intensity values are all finite and discrete, then the image is known as a digital image.
A digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called picture elements, pixels, or pels.
Digital image processing is the modification and enhancement of images by applying various filtering and enhancement techniques, in order to perceive better visual information and to perform various analyses on the images.
Image processing accepts an image as input and generates a modified image as output, either for human perception or to provide useful information for further processing.
1.3 Applications and Examples of Digital Image Processing
To develop a basic understanding of the breadth of the applications, image processing applications are categorized according to their sources.
The principal source of energy for images is the electromagnetic energy spectrum. Other sources include acoustic, ultrasonic, and electronic energy.
Synthetic images used for modelling and visualization are generated by computer.
The most common applications of digital image processing are
1. Gamma-ray Imaging
Imaging based on gamma rays is used mostly for nuclear medicine and astronomical observations. In this type of imaging, images are produced from the emissions collected by gamma-ray detectors.
2. X-ray Imaging
X-ray imaging is used mainly for medical diagnostics and industrial imaging.
It is also used for astronomical applications.
3. Ultraviolet Imaging
It is used for lithography, industrial inspection, microscopy, lasers, biological
imaging and astronomical observations.
4. Wide applications include
a. Remote Sensing
b. Light microscopy
c. Astronomy
d. Weather observation and prediction
e. Visual inspection of manufactured goods
f. Traffic monitoring and surveillance
g. License plate character recognition
h. Currency recognition
i. Finger -print and face recognition
j. Radar imaging to explore inaccessible regions of the Earth’s surface.
k. Mineral and oil exploration
l. Ultrasound imaging of foetus
The list above is limited only by space; the applications of digital image processing are wide and their scope is large.
1.4 Fundamental Steps in Digital Image Processing
Fig 1.1: Fundamental steps in digital image processing (Image courtesy: Digital Image Processing, Gonzalez)
1. Image acquisition
The image is captured by a sensor such as a camera or another analog device, and digitized using an analogue-to-digital converter if the output of the camera or sensor is not already in digital form. This step may also involve pre-processing of the image.
2. Image Enhancement
The process of manipulating an image so that the result is more suitable than the original for a specific application. The idea behind enhancement techniques is to bring out details that are hidden, or simply to highlight certain features of interest in an image.
3. Image Restoration
Deals with improving the appearance of an image.
Based on mathematical or probabilistic models of image degradation.
4. Colour Image Processing
Use the colour of the image to extract features of interest in an image.
Understand the basics concepts of color models and color processing in
digital domain.
5. Wavelets
Wavelets are the foundation of representing images in various degrees of
resolution. It is used for image data compr ession and for representation of
images in smaller regions.
6. Compression
Deals with various techniques used for reducing the storage required to save
an image in digital form or the bandwidth required to transmit the images.
7. Morphological Processing
Deals with tools for extracting image components that are useful in the
representation and description of shape. In this step, there would be a
transition from processes that output images, to processes that output image
attribut es.
8. Segmentation
Segmentation partitions an image into its constituent parts or objects.
Autonomous segmentation is one of the most difficult tasks in digital image processing.
The more accurate the segmentation, the better automated object classification works.
9. Feature extraction
It consists of feature detection and feature description. Feature detection refers to finding features in an image, region, or boundary. Feature description assigns quantitative attributes to the detected features.
10. Image pattern classification
The process that assigns a label to an object based on its feature descriptors.
There are various classification algorithms like correlation, Bayes classifiers
to identify and predict the class label for the object.
1.5 Components of an Image Processing System
Fig 1.2: Components of a general -purpose image processing systems
(Image courtesy:// Digital Image Processing -Fourth Edition)
1. Image Sensors
Two subsystems are required to acquire digital images.
The first is the physical device that is sensitive to the energy radiated by the
object we wish to image (Sensor). The second, called a digitizer, is a device
for converting the output of the physical sensing device into digital form. The
digitizer converts the electrical outputs t o digital data.
2. Specialized Image Processing Hardware
It usually consists of the digitizer, and the hardware that performs other
primitive operations, such as an arithmetic logic unit (ALU), which performs
arithmetic and logical operations in parallel on entire images. This type of
hardware is called a front-end subsystem, and its most distinguishing
characteristic is speed. In other words, this unit performs functions that
require fast data throughputs that the typical main computer cannot handle.
3. Computer
The computer in an image processing system is a general -purpose computer
and can range from a PC to a supercomputer.
4. Image Processing Software
The software performs tasks with the help of specialized modules. Many software packages are available commercially.
5. Mass Storage
Digital storage for image processing applications basically is divided into
three principal categories.
a) Short -term storage for use during processing
b) On-line storage for fast recall
c) Archival storage which is characterized by infrequent access.
Storage is measured in bytes (Kbytes, Mbytes, Gbytes and Tbytes)
6. Image Displays
Monitors are driven by the outputs of the image and graphics display cards
that are an integral part of a computer system. There are variants in monitor
displays.
7. Hardcopy devices
Used for recording images, include laser printers, film cameras, heat -
sensitive devices, inkjet units and digital units, such as optical and CD -ROM
disks.
8. Networking and cloud communication
Transmission bandwidth has improved due to optical fibre and other cloud technologies.
1.6 Elements of Visual Perception
Image processing applications produce images that are to be viewed by human
observers.
The Elements of Visual Perception are
1) Structure of the Human eye
2) Image formation in the eye
3) Brightness adaptation and discrimination
1) Structure of the Human eye
Fig 1.3: Simplified diagram of the human eye. (Image Courtesy: nptel)
The first part of the visual system is the eye. Its form is nearly spherical and its diameter is approximately 20 mm. Its outer cover consists of the cornea and the sclera.
The cornea is a tough transparent tissue in the front part of the eye. The sclera is an opaque membrane which is continuous with the cornea and covers the remainder of the eye. Directly below the sclera lies the choroid, which contains many blood vessels. At its anterior extreme lies the iris diaphragm. Light enters the eye through the central opening of the iris, whose diameter varies from 2 mm to 8 mm according to the illumination conditions. Behind the iris is the lens, which consists of concentric layers of fibrous cells and contains up to 60 to 70% water. Its operation is similar to that of man-made optical lenses. It focuses the light on the retina, which is the innermost membrane of the eye.
The retina has two kinds of photoreceptors: cones and rods. The cones are highly sensitive to color. Their number is 6-7 million and they are mainly located at the central part of the retina. Each cone is connected to its own nerve end.
Cone vision is the photopic or bright-light vision. Rods serve to view the general picture of the visual field. They are sensitive to low levels of illumination and cannot discriminate colors; this is the scotopic or dim-light vision. Their number is 75 to 150 million and they are distributed over the retinal surface. Several rods are connected to a single nerve end. This fact and their large spatial distribution explain their low resolution.
Both cones and rods transform light into electrical stimuli, which are carried through the optic nerve to the human brain for high-level image processing and perception.
2) Image formation in the eye
• In the human eye, the distance between the centre of the lens and the imaging sensor (retina) is fixed.
• The lens in the eye is flexible; its shape is controlled by muscles.
• To focus on distant objects, the muscles flatten the lens.
• To focus on close objects, the muscles allow the lens to thicken.
3) Brightness Adaptation and Discrimination
• Digital images are displayed as a discrete set of intensities.
• The eye's ability to discriminate intensities at a given adaptation level is an important consideration when displaying images.
• The range of brightnesses that can be discriminated simultaneously is small in comparison to the total adaptation range.
• For a given set of conditions, the current sensitivity level of the visual system is called the brightness adaptation level.
Fig 1.4: Range of subjective brightness adaptation level
Weber Ratio
The ability of the eye to discriminate between changes in light intensity at any
specific adaptation level.
The quantity ΔIc/I, where ΔIc is the increment of illumination discriminable 50% of the time against a constant background illumination I, is called the Weber ratio.
A small Weber ratio indicates good brightness discrimination; a large Weber ratio indicates poor brightness discrimination.
Visual Perception
Perceived brightness is not a simple function of intensity.
Mach bands – the visual system tends to undershoot or overshoot around the boundaries of regions with different intensities.
Simultaneous contrast – a region's perceived brightness does not depend only on its intensity.
Optical illusions – the eye fills in non-existing information or perceives geometric properties of objects incorrectly.
1.7 Image Sensing and Acquisition
Images are generated by the combination of an illumination source and the reflection or absorption of energy from that source by the elements of the scene being imaged.
There are three principal sensor arrangements that can be used to transform incident
energy into digital images.
Incoming energy is transformed into a voltage by the combination of input electrical power and the sensor response; a digital quantity is then obtained by digitizing that response.
1.7.1 Image Acquisition using a Single Sensing Element
1.7.2 Image Acquisition using Line Sensors (Sensor Strips)
1.7.3 Image Acquisition using Array Sensors
The most common sensor is the photodiode, which is constructed of silicon materials and whose output voltage waveform is proportional to light. Using a filter in front of the sensor improves its selectivity.
In order to generate a 2-D image using a single sensing element, there have to be relative displacements in both the x- and y-directions between the sensor and the area to be imaged. In an arrangement used in high-precision scanning, a film negative is mounted onto a drum whose mechanical rotation provides displacement in one dimension. The single sensor is mounted on a lead screw that provides motion in the perpendicular direction. Since mechanical motion can be controlled with high precision, this method is an inexpensive (but slow) way to obtain high-resolution images.
1.8 Image Sampling and Quantization
The output of most sensors is a continuous voltage waveform whose amplitude and
spatial behaviour are related to the physical phenomenon being sensed. To create a
digital image, it is essential to convert the continuous sensed data into digital form.
This involves two processes: sampling and quantization.
1.8.1 Image Sampling
An image may be continuous with respect to the x and y coordinates and also in
amplitude. To convert it into digital form we have to sample the function in both
coordinates and in amplitudes.
Digitizing the coordinate values is called sampling.
Digitizing the amplitude values is called quantization.
Consider a continuous image and the intensities along a line segment AB in it. To sample this function, we take equally spaced samples along line AB. The location of each sample is given by a vertical tick mark in the bottom part of the figure. The samples are shown as small black squares superimposed on the function, and the set of these discrete locations gives the sampled function.
In order to form a digital image, the gray level values must also be converted (quantized) into discrete quantities. For example, we may divide the gray level scale into eight discrete levels ranging from black to white. The vertical tick marks indicate the specific value assigned to each of the eight levels. The continuous gray levels are quantized simply by assigning one of the eight discrete gray levels to each sample. The assignment is made depending on the vertical proximity of a sample to a tick mark.
Fig 1.5: Sampling
Fig 1.6: Quantization
Example
Fig 1.7 a) Continuous image projected onto a sensor array
b) Result of sampling and quantization
Digital Image
A digital image f[m, n] described in a 2D discrete space is derived from an analogue image f(x, y) in a 2D continuous space through a sampling process that is frequently referred to as digitization.
The 2D continuous image f(x, y) is divided into N rows and M columns. The intersection of a row and a column is termed a pixel. The value assigned to the integer coordinates [m, n], with m = 0, 1, 2, …, M−1 and n = 0, 1, 2, …, N−1, is f[m, n].
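The sampling and quantization process described above can be illustrated with a short sketch. This is a minimal illustration, not taken from the textbook: it samples a continuous 1-D intensity profile (an assumed example function) at equally spaced points and quantizes the amplitudes to eight levels, mirroring the line-segment AB example.

```python
import numpy as np

def sample_and_quantize(f, length, num_samples=32, num_levels=8):
    """Sample a continuous 1-D intensity profile f(t), t in [0, length],
    at equally spaced points, then quantize the amplitudes to num_levels values."""
    t = np.linspace(0.0, length, num_samples)        # sampling locations
    samples = np.array([f(ti) for ti in t])           # sampled, still continuous-valued
    lo, hi = samples.min(), samples.max()
    # Map each sample to the nearest of num_levels discrete gray levels.
    levels = np.round((samples - lo) / (hi - lo) * (num_levels - 1))
    return t, levels.astype(int)

# Example: an arbitrary intensity profile along a scan line.
profile = lambda t: 100 + 80 * np.sin(2 * np.pi * t / 10.0)
t, q = sample_and_quantize(profile, length=10.0)
print(q)   # integers in 0..7: the digitized scan line
```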
1.9 Basic Relationship between pixels
Pixel Relationships
Neighbours of a pixel
A pixel p at coordinates (x, y) has four horizontal and vertical neighbours whose coordinates are given by
(x+1, y), (x−1, y), (x, y+1), (x, y−1)
This set of pixels is called the 4-neighbours of p and is denoted by N4(p). Each of these pixels is at a unit distance from (x, y), and some of the neighbours of p lie outside the digital image if (x, y) is on the border of the image.
The four diagonal neighbours of p have coordinates
(x+1, y+1), (x+1, y−1), (x−1, y+1), (x−1, y−1)
and are denoted by ND(p). These points, together with the 4-neighbours, are called the 8-neighbours of p, denoted by N8(p).
Adjacency
Let V be the set of gray-level values used to define adjacency; in a binary image V = {1} if we are referring to adjacency of pixels with value 1. Three types of adjacency occur:
4-adjacency – two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
8-adjacency – two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
m-adjacency – two pixels p and q with values from V are m-adjacent if
• q is in N4(p), or
• q is in ND(p) and the set N4(p) ∩ N4(q) has no pixels whose values are from V.
Distance measures
For pixels p, q, and z with coordinates (x, y), (s, t), and (v, w) respectively, D is a distance function (metric) if
1. D(p, q) ≥ 0, with D(p, q) = 0 iff p = q,
2. D(p, q) = D(q, p), and
3. D(p, z) ≤ D(p, q) + D(q, z).
The Euclidean distance between p and q is defined as De(p, q) = [(x − s)² + (y − t)²]^(1/2).
The D4 (city-block) distance between p and q is defined as D4(p, q) = |x − s| + |y − t|.
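As a quick illustration of these metrics (a sketch, not from the text), the Euclidean and city-block distances between two pixels can be computed as follows:

```python
import math

def euclidean_distance(p, q):
    # De(p, q) = [(x - s)^2 + (y - t)^2]^(1/2)
    (x, y), (s, t) = p, q
    return math.sqrt((x - s) ** 2 + (y - t) ** 2)

def city_block_distance(p, q):
    # D4(p, q) = |x - s| + |y - t|
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

p, q = (2, 3), (5, 7)
print(euclidean_distance(p, q))   # 5.0
print(city_block_distance(p, q))  # 7
```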
1.10 Summary
In this chapter, we learned about the basic concepts in digital image processing: the various components that comprise and contribute to image acquisition; image perception, brightness, and adaptation; and the sampling and quantization techniques used to discretize an image. We also learned about the basic relationships between pixels.
1.11 Review Questions
1. Explain Digital Image Processing briefly?
2. Describe various methods of image sensing and acquisition.
3. Explain pixel and its relationships with its neighbourhood pixels.
4. Write a short note on Image Sampling and Quantization.
1.12 References for Detailed Study
1. www.ImageProcessingplace.com
2. https://nptel.ac.in/courses/106/105/106105032/
3. https://nptel.ac.in/courses/117/105/117105079/
4. All Images courtesy from Digital Image Processing by Gonzalez and Woods,
4th Edition
5. Figure 1.1 adapted from
https://www.bharathuniv.ac.in/colleges1/downloads/courseware_ece/notes/BEC007%20%20-Digital%20image%20processing.pdf
Unit 1
2 INTENSITY TRANSFORMATIONS
AND SPATIAL FILTERING
Unit Structure
2.0 Objectives
2.1 Introduction
2.2 Basics Intensity Transformation functions
2.3 Histogram Processing
2.4 Fundamentals of spatial filtering
2.4.1 Smoothing (Low pass) Spatial filters
2.4.2 Sharpening (High pass) Spatial filters
2.4.3 Using LAPLACIAN - Second derivative for image sharpening
2.4.4 Unsharp masking and high boost filtering
2.5 High pass, Band reject and band pass filters
2.6 Spatial Intensity transformation and filtering methods using fuzzy techniques
2.7 Summary
2.8 Review Questions
2.9 References
2.0 Objectives
1. To understand the meaning of the spatial domain and apply techniques for intensity-based transformations
2. To know the concept of the histogram and use it for image enhancement
3. To understand the mechanics of spatial filtering and use combinations of methods for enhancing images
4. To use fuzzy techniques to perform spatial filtering
2.1 Introduction
The principal objective of enhancement is to process an image so that the result is more suitable than the original image for a specific application. Image enhancement approaches fall into two broad categories:
1. Spatial domain methods
2. Frequency domain methods
This chapter mainly focuses on spatial domain methods; all the methods and techniques covered here are for the spatial domain.
The term spatial domain refers to the image plane itself, and approaches in this category are based on direct manipulation of the pixels in an image.
A spatial domain process is denoted by the expression
g(x, y) = T[f(x, y)]
where f(x, y) is the input image, g(x, y) is the processed image, and T is an operator on f defined over some neighbourhood of (x, y).
The neighbourhood of a point (x, y) can be defined by using a square or rectangular sub-image area centred at (x, y).
Fig 2.1 A 3 x 3 neighbourhood about a point (x0, y0) in an image. The
neighbourhood is moved from pixel to pixel in the image to generate an output
image
The center of the sub-image is moved from pixel to pixel, starting at the top left corner. The operator T is applied at each location (x, y) to find the output g at that location. The process utilizes only the pixels in the area of the image spanned by the neighbourhood.
2.2 Basic Intensity Transformation Functions
The simplest form of the transformation occurs when the neighbourhood is of size 1 x 1. In this case g depends only on the value of f at (x, y), and T becomes an intensity or gray level transformation function of the form
s = T(r)
where r denotes the gray level of f(x, y) and s denotes the gray level of g(x, y) at any point (x, y).
Because enhancement at any point in an image depends only on the gray level at that point, techniques in this category are referred to as point processing.
2.2.1 Point Processing
Contrast stretching – produces an image of higher contrast than the original. The operation is performed by darkening the levels below a value m and brightening the levels above m in the original image.
Fig: 2.2 Contrast Stretching
In this technique the values of r below m are compressed by the transformation function into a narrow range of s, towards black. The opposite effect takes place for the values of r above m.
Thresholding function – a limiting case in which T(r) produces a two-level (binary) image.
The values below m are transformed to black and those above m to white.
Fig 2.3 Thresholding function
2.2.2 Basic Intensity Level Transformation
These are the simplest image enhancement techniques
2.2.2.1 Image Negative – The negative of an image with gray levels in the range [0, L−1] is obtained by using the negative transformation.
The expression of the transformation is s = L − 1 − r.
Reversing the intensity levels of an image in this manner produces the equivalent of a photographic negative. This type of processing is particularly suited for enhancing white or gray details embedded in dark regions of an image, especially when the black areas are dominant in size.
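A minimal NumPy sketch of the negative transformation s = L − 1 − r for an 8-bit image (assuming L = 256); the toy array below is a placeholder for any grayscale image stored as unsigned 8-bit values.

```python
import numpy as np

def negative(img, L=256):
    """Apply s = (L - 1) - r to every pixel of a grayscale image."""
    return (L - 1 - img.astype(np.int32)).astype(np.uint8)

img = np.array([[0, 64], [128, 255]], dtype=np.uint8)  # toy 2 x 2 image
print(negative(img))   # [[255 191] [127   0]]
```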
Fig 2.4 Basic intensity transformation functions
2.2.2.2 Log transformations
The general form of the log transform is s = c log(1 + r), where r ≥ 0.
This transformation maps a narrow range of low gray level values in the input image into a wider range of output gray levels. The opposite is true for higher values of input levels. These transformations are used to expand the values of dark pixels in an image while compressing the higher-level values. The opposite is true for the inverse log transformation.
The log transformation function has an important characteristic that it compresses
the dynamic range of images with large variations in pixel values.
E.g.- Fourier spectrum
Fig 2.5 a) Fourier Spectrum b) Log transformed Fourier spectrum
2.2.2.3 Power law transformation
The power law transformation has the basic form
s = c r^γ
where c and γ are positive constants.
Power law curves with fractional values of γ map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher values of input gray levels. We may obtain a family of curves by varying the value of γ.
Fig: 2.6 Shape of the curve formed by the gamma equation
A variety of devices used for image capture, printing and display respond according
to a power law. The process used to correct this power law response phenomenon
is called gamma correction.
For example, CRT devices have an intensity-to-voltage response that is a power function.
Gamma correction is important if displaying an image accurately on a computer screen is of concern. Images that are not corrected properly can look either bleached out or too dark.
Color rendering also uses the concept of gamma correction, which is becoming more popular due to the use of images over the internet.
Gamma transformation is also important in general-purpose contrast manipulation.
To darken an image we use γ > 1, and γ < 1 to brighten it.
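A short sketch of the power-law (gamma) transformation on a normalized image; the gamma values below are arbitrary, chosen only to illustrate darkening (γ > 1) versus brightening (γ < 1).

```python
import numpy as np

def gamma_transform(img, gamma, c=1.0, L=256):
    """s = c * r^gamma, computed on intensities normalized to [0, 1]."""
    r = img.astype(np.float64) / (L - 1)
    s = c * np.power(r, gamma)
    return np.clip(s * (L - 1), 0, L - 1).astype(np.uint8)

img = np.arange(0, 256, 51, dtype=np.uint8)    # 0, 51, ..., 255
print(gamma_transform(img, gamma=2.2))    # darker mid-tones (gamma > 1)
print(gamma_transform(img, gamma=0.45))   # brighter mid-tones (gamma < 1)
```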
2.2.2.4 Piecewise Linear transformation functions
The principal advantage of piecewise linear functions is that they can be arbitrarily complex, but their specification requires considerably more user input.
Contrast Stretching -
It is the simplest piecewise linear transformation function.
We may encounter various low contrast images, which can result from causes such as lack of illumination, problems with the imaging sensor, or a wrong setting of the lens aperture during image acquisition.
The idea behind contrast stretching is to increase the dynamic range of the gray levels in the image being processed.
Fig 2.7 Piecewise linear transformation function
The location of the points (r1, s1) and (r2, s2) controls the shape of the curve:
a) If r1 = r2 and s1 = s2, the transformation is a linear function that produces no change in gray levels.
b) If r1 = r2, s1 = 0, and s2 = L−1, the transformation becomes a thresholding function that creates a binary image.
c) Intermediate values of (r1, s1) and (r2, s2) produce various degrees of spread in the gray values of the output image, thus affecting its contrast.
Generally, r1 ≤ r2 and s1 ≤ s2, so that the function is single valued and monotonically increasing.
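A sketch of the piecewise-linear contrast stretching controlled by (r1, s1) and (r2, s2); the control points chosen below are illustrative, not prescribed by the text.

```python
import numpy as np

def contrast_stretch(img, r1, s1, r2, s2, L=256):
    """Piecewise-linear mapping through (0, 0), (r1, s1), (r2, s2), (L-1, L-1)."""
    r = img.astype(np.float64)
    out = np.empty_like(r)
    lo = r <= r1
    mid = (r > r1) & (r <= r2)
    hi = r > r2
    out[lo] = s1 / max(r1, 1) * r[lo]
    out[mid] = s1 + (s2 - s1) / (r2 - r1) * (r[mid] - r1)
    out[hi] = s2 + (L - 1 - s2) / (L - 1 - r2) * (r[hi] - r2)
    return np.clip(out, 0, L - 1).astype(np.uint8)

img = np.array([[30, 90], [150, 220]], dtype=np.uint8)
print(contrast_stretch(img, r1=80, s1=30, r2=170, s2=220))
```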
Gray Level Slicing – Highlighting a specific range of gray levels in an image is often desirable, for example when enhancing features such as masses of water in satellite images or enhancing flaws in X-ray images.
There are two approaches:
1) Display a high value for all gray levels in the range of interest and a low value for all other gray levels.
2) Brighten the desired range of gray levels but preserve the background and gray level tonalities in the image.
Fig 2.8 a) Highlights the range [A, B] b) Highlights the range [A, B] and
preserves background details
2.2.2.5 Bit Plane Slicing
It is important to highlight the contribution made to the total image appearance by
specific bits. Suppose that each pixel is represented by 8 bits.
Imagine that an image is composed of eight 1-bit planes, ranging from bit plane 0 for the least significant bit to bit plane 7 for the most significant bit. In terms of 8-bit bytes, plane 0 contains all the lowest order bits in the image and plane 7 contains all the high order bits.
Fig 2.9 Bit -planes of an image
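Bit-plane slicing can be written compactly with bitwise operations; this is a small sketch (not from the textbook) that extracts plane k, where 0 is the least significant and 7 the most significant plane of an 8-bit image.

```python
import numpy as np

def bit_plane(img, k):
    """Return bit-plane k of an 8-bit grayscale image as a binary (0/1) array."""
    return (img >> k) & 1

img = np.array([[200, 13], [129, 64]], dtype=np.uint8)
print(bit_plane(img, 7))  # most significant bit:  [[1 0] [1 0]]
print(bit_plane(img, 0))  # least significant bit: [[0 1] [1 0]]
```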
2.3 Histogram Processing
The histogram of a digital image with gray levels in the range [0, L−1] is a discrete function of the form H(rk) = nk, where rk is the kth gray level and nk is the number of pixels in the image having gray level rk.
A normalized histogram is given by the equation
P(rk)=nk/n for k=0,1, 2, . . ., L -1
P(rk) gives the estimate of the probability of occurrence of gray level rk. The sum
of all components of a normalized histogram is equal to 1.
The histogram plots are simple plots of H(rk)=nk versus rk.
Fig 2.10 Four image types and their corresponding histograms. (a) dark; (b) light; (c) low contrast; d) high contrast
In the dark image the components of the histogram are concentrated on the low
(dark) side of the gray scale. In case of bright image, the histogram components are
biased towards the high side of the gray scale.
The histogram of a low contrast image will be narrow and will be centered towards
the middle of the gray scale.
The components of the histogram in the high contrast image cover a broad range of
the gray scale. The net effect of this will be an image that shows a great deal of
gray levels details and has high dynamic range.
2.3.1 Histogram Equalization
Histogram equalization is a common technique for enhancing the appearance of images. Suppose we have an image which is predominantly dark. Its histogram would then be skewed towards the lower end of the grey scale, and all the image detail would be compressed into the dark end of the histogram. If we could 'stretch out' the grey levels at the dark end to produce a more uniformly distributed histogram, the image would become much clearer.
Let there be a continuous function with r being the gray levels of the image to be
enhanced.
The range of r is [0, 1], with r = 0 representing black and r = 1 representing white.
The transformation function is of the form
s = T(r), where 0 ≤ r ≤ 1
It produces a level s for every pixel value r in the original image. The transformation function is assumed to fulfil two conditions: (a) T(r) is single valued and monotonically increasing in the interval 0 ≤ r ≤ 1, and (b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1.
The transformation function should be single valued so that the inverse transformation exists. The monotonically increasing condition preserves the increasing order from black to white in the output image. The second condition guarantees that the output gray levels will be in the same range as the input levels.
The gray levels of the image may be viewed as random variables in the interval [0, 1].
The most fundamental descriptor of a random variable is its probability density function (PDF). Let pr(r) and ps(s) denote the probability density functions of the random variables r and s respectively. A basic result from elementary probability theory states that if pr(r) and T(r) are known and T⁻¹(s) satisfies condition (a), then the probability density function ps(s) of the transformed variable s is given by
ps(s) = pr(r) |dr/ds|
Thus, the PDF of the transformed variable s is determined by the gray level PDF of the input image and by the chosen transformation function.
A transformation function of particular importance in image processing is
s = T(r) = ∫₀ʳ pr(w) dw
This is the cumulative distribution function of r.
Using this definition of T we see that the derivative of s with respect to r is
ds/dr = pr(r)
Substituting this back into the expression for ps we get
ps(s) = pr(r) |dr/ds| = pr(r) · 1/pr(r) = 1, for 0 ≤ s ≤ 1
An important point here is that T(r) depends on pr(r), but the resulting ps(s) is always uniform, independent of the form of pr(r).
For discrete values we deal with probabilities and summations instead of probability density functions and integrals.
The probability of occurrence of gray level rk in an image is approximated by
pr(rk) = nk / N
where N is the total number of pixels in the image, nk is the number of pixels that have gray level rk, and L is the total number of possible gray levels in the image. The discrete form of the transformation function is given by
sk = T(rk) = Σⱼ₌₀ᵏ pr(rj) = Σⱼ₌₀ᵏ nj / N,  k = 0, 1, 2, …, L−1
Thus, a processed image is obtained by mapping each pixel with level rk in the input image into a corresponding pixel with level sk in the output image.
A plot of pr(rk) versus rk is called a histogram. The transformation function given by the above equation is called histogram equalization or linearization. Given an image, the process of histogram equalization consists of implementing this transformation function, which is based on information that can be extracted directly from the given image, without the need for further parameter specification.
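The discrete equalization mapping sk = Σ nj/N can be implemented in a few lines; this is a plain NumPy sketch for an 8-bit image, not code from the textbook.

```python
import numpy as np

def histogram_equalize(img, L=256):
    """Map each gray level r_k to s_k = (L - 1) * CDF(r_k)."""
    hist = np.bincount(img.ravel(), minlength=L)        # n_k
    cdf = np.cumsum(hist) / img.size                    # running sum of n_j / N
    mapping = np.round((L - 1) * cdf).astype(np.uint8)  # s_k for each r_k
    return mapping[img]                                 # apply as a lookup table

# Example: a dark, low-contrast toy image
img = np.random.randint(0, 64, size=(64, 64), dtype=np.uint8)
eq = histogram_equalize(img)
print(img.min(), img.max(), "->", eq.min(), eq.max())
```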
Fig 2.11 Left column - original images, center column - histogram equalized images
and last column - histogram of histogram equalized images.
Equalization automatically determines a transformation function that seeks to
produce an output image that has a uniform histogram. It is a good approach when
automatic enhancement is needed.
2.3.2 Histogram Matching (Specification)
In some cases, it may be desirable to specify the shape of the histogram that we
wish the processed image to have. Histogram equalization does not allow interactive image enhancement and generates only one result: an approximation to a uniform histogram. Sometimes we
need to be able to specify particular histogram shapes capable of highlighting
certain gray -level ranges. The method used to generate a processed image that has
a specified histogram is called histogram matching or histogram specification.
Algorithm
1. Compute sk = Pf(k), k = 0, …, L−1, the cumulative normalized histogram of f.
2. Compute G(k), k = 0, …, L−1, the transformation function, from the given histogram hz.
3. Compute G⁻¹(sk) for each k = 0, …, L−1 using an iterative method (iterating on z), or in effect directly compute G⁻¹(Pf(k)).
4. Transform f using G⁻¹(Pf(k)).
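A sketch of the four steps above using NumPy; `hz` is the desired (specified) histogram, given here as an arbitrary example shape. The inverse mapping G⁻¹ is realized by a nearest-value search, one common way to implement step 3.

```python
import numpy as np

def histogram_match(img, hz, L=256):
    """Transform img so its histogram approximates the specified histogram hz."""
    # Step 1: s_k = P_f(k), cumulative normalized histogram of the input
    s = np.cumsum(np.bincount(img.ravel(), minlength=L)) / img.size
    # Step 2: G(k), cumulative distribution of the specified histogram
    G = np.cumsum(hz) / np.sum(hz)
    # Step 3: G^{-1}(s_k) -- for each s_k pick the level z whose G(z) is closest
    mapping = np.array([np.argmin(np.abs(G - sk)) for sk in s], dtype=np.uint8)
    # Step 4: apply the composite transform as a lookup table
    return mapping[img]

img = np.random.randint(0, 256, size=(32, 32), dtype=np.uint8)
hz = np.linspace(1, 10, 256)          # arbitrary target histogram shape
print(histogram_match(img, hz).shape)
```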
Global and Local enhancement
In the earlier methods, pixels were modified by a transformation function based on the gray-level content of the entire image. This is not suitable when enhancement is to be done in small areas of the image. The problem can be solved by local enhancement, where a transformation function is applied only in the neighbourhood of the pixels in the region of interest.
Define square or rectangular neighborhood (mask) and move the center from pixel
to pixel. For each neighborhood
1) Calculate histogram of the points in the neighbourhood
2) Obtain histogram equalization/specification function
3) Map gray level of pixel centered in neighbourhood
4) The center of the neighbourhood region is then moved to an adjacent pixel
location and the procedure is repeated.
Fig 2.12a) Original Image b) Image obtained after global histogram equalization
and c) after local histogram equalization.
d) Result of local enhancement based on local histogram
2.4 Fundamentals of Spatial filtering
1. Spatial filtering modifies an image by replacing the value of each pixel by a function of the values of the pixel and its neighbours.
2. If the operation performed on the image pixels is linear, then the filter is called a linear spatial filter. Otherwise, the filter is a non-linear filter.
Spatial filtering is an example of a neighbourhood operation: the operations are performed on the values of the image pixels in the neighbourhood and the corresponding values of a sub-image that has the same dimensions as the neighbourhood.
This sub-image is called a filter, mask, kernel, template, or window; the values in the filter sub-image are referred to as coefficients rather than pixels. Spatial filtering operations are performed directly on the pixel values (amplitude/gray scale) of the image. The process consists of moving the filter mask from point to point in the image. At each point (x, y) the response is calculated using a predefined relationship.
Fig 2.13 The mechanics of spatial filtering using a 3 x 3 kernel
For linear spatial filtering the response is given by a sum of products of the filter
coefficient and the corresponding image pixels in the area spanned by the filter
mask.
The result g(x, y) of linear filtering with the filter mask at point (x, y) in the image is given by the sum of products
g(x, y) = w(−1, −1) f(x−1, y−1) + w(−1, 0) f(x−1, y) + … + w(1, 1) f(x+1, y+1)
The coefficient w(0, 0) coincides with the image value f(x, y), indicating that the mask is centered at (x, y) when the computation of the sum of products takes place.
For a mask of size m x n, we assume m = 2a + 1 and n = 2b + 1, where a and b are nonnegative integers; that is, we focus on masks of odd size.
In general, linear filtering of an image f of size M x N with a filter mask of size m x n is given by the expression
g(x, y) = Σ_{s=−a..a} Σ_{t=−b..b} w(s, t) f(x + s, y + t)
where a = (m − 1)/2 and b = (n − 1)/2.
To generate a complete filtered image this equation must be applied for x = 0, 1, 2, …, M−1 and y = 0, 1, 2, …, N−1. Thus, the mask processes all the pixels in the image.
The process of linear filtering is similar to the frequency domain concept called convolution. For this reason, linear spatial filtering is often referred to as convolving a mask with an image, and filter masks are sometimes called convolution masks.
The image is padded, by adding rows and columns of 0s or by replicating its border rows and columns, when the centre of the filter approaches the border of the image.
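The sum-of-products mechanics above can be expressed directly as a correlation loop. The sketch below (plain NumPy, zero padding) is meant to show the mechanics rather than be an efficient implementation.

```python
import numpy as np

def spatial_filter(f, w):
    """Linear spatial filtering of image f with an odd-sized kernel w (zero padding)."""
    m, n = w.shape
    a, b = (m - 1) // 2, (n - 1) // 2
    padded = np.pad(f.astype(np.float64), ((a, a), (b, b)), mode='constant')
    g = np.zeros_like(f, dtype=np.float64)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            # g(x, y) = sum_s sum_t w(s, t) f(x + s, y + t)
            g[x, y] = np.sum(w * padded[x:x + m, y:y + n])
    return g

f = np.arange(25, dtype=np.float64).reshape(5, 5)
w = np.full((3, 3), 1.0 / 9)            # 3 x 3 box (averaging) kernel
print(spatial_filter(f, w))
```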
2.4.1 Smoothing Spatial Filters (Lowpass)
These filters are used for blurring and noise reduction. Blurring is used in preprocessing steps such as the removal of small details from an image prior to object extraction and the bridging of small gaps in lines or curves.
Smoothing Linear Filters
The output of a smoothing linear spatial filter is simply the average of the pixels contained in the neighbourhood of the filter mask. These filters are also called averaging filters or lowpass filters. Smoothing filters are used in combination with other image enhancement techniques.
The operation is performed by replacing the value of every pixel in the image by the average of the gray levels in the neighbourhood defined by the filter mask. This process reduces sharp transitions in gray levels in the image.
A major application of smoothing is noise reduction, but because edges are also characterized by sharp transitions in gray level, smoothing filters have the undesirable side effect that they blur edges. Smoothing also reduces false contouring, an effect caused by using an insufficient number of gray levels in an image.
Irrelevant details (details that are not of interest) can also be removed by these filters.
A spatial averaging filter in which all coefficients are equal is sometimes referred to as a "box filter". The 3 x 3 box filter is
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
A weighted average filter is one in which pixels are multiplied by different coefficients.
Order Statistics Filters
These are nonlinear spatial filters whose response is based on ordering (ranking) the pixels contained in the image area encompassed by the filter, and then replacing the value of the center pixel with the value determined by the ranking result.
The best example of this category is the median filter. In this filter the value of the center pixel is replaced by the median of the gray levels in the neighbourhood of that pixel.
Median filters are popular because, for certain types of random noise, they provide
excellent noise -reduction capabilities, with considerably less blurring than linear
smoothing filters.
These filters are particularly effective in the case of impulse or salt and pepper
noise. It is called so because of its appearance as white and black dots superimposed
on an image.
In order to perform median filtering at a point in an image, we first sort the values of the pixel in question and its neighbours, determine their median, and assign this value to that pixel.
Lowpass filtering is also used for shading correction and, combined with thresholding, for region extraction.
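A sketch comparing a box (averaging) filter with a median filter on an image corrupted by salt-and-pepper impulses; it is illustrative only and uses edge padding and plain loops for clarity.

```python
import numpy as np

def box_filter(img, size=3):
    """Average each pixel over a size x size neighbourhood (edge padding)."""
    a = size // 2
    p = np.pad(img.astype(np.float64), a, mode='edge')
    out = np.zeros_like(img, dtype=np.float64)
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            out[x, y] = p[x:x + size, y:y + size].mean()
    return out.astype(np.uint8)

def median_filter(img, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    a = size // 2
    p = np.pad(img, a, mode='edge')
    out = np.zeros_like(img)
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            out[x, y] = np.median(p[x:x + size, y:y + size])
    return out

img = np.full((9, 9), 120, dtype=np.uint8)
img[2, 3], img[6, 6] = 255, 0             # salt and pepper impulses
print(box_filter(img)[2, 3], median_filter(img)[2, 3])  # the median removes the impulse
```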
2.4.2 Sharpening Spatial Filters (Highpass)
The principal objective of sharpening is to highlight fine details in an image or to enhance details that have been blurred, either in error or as a natural effect of a particular method of image acquisition.
The applications of image sharpening range from electronic printing and medical
imaging to industrial inspection and autonomous guidance in military systems.
As smoothing can be achieved by integration, sharpening can be achieved by spatial differentiation. The strength of the response of a derivative operator is proportional to the degree of discontinuity of the image at the point at which the operator is applied. Thus, image differentiation enhances edges and other discontinuities and deemphasizes areas with slowly varying grey levels.
It is common practice to approximate the magnitude of the gradient by using absolute values instead of squares and square roots.
A basic definition of the first-order derivative of a one-dimensional function f(x) is the difference
∂f/∂x = f(x + 1) − f(x)
and a second-order derivative is the difference
∂²f/∂x² = f(x + 1) + f(x − 1) − 2f(x)
Fig 2.14 A scan line from an image showing ramp, step and constant segments
with its first and secon d order derivative
2.4.3 Using the LAPLACIAN – second derivatives for image sharpening
The second-order derivative is calculated using the Laplacian. It is the simplest isotropic derivative operator; isotropic filters are those whose response is independent of the direction of the discontinuities in the image to which the operator is applied.
The Laplacian for a two-dimensional function f(x, y) is defined as
∇²f = ∂²f/∂x² + ∂²f/∂y²
In the x-direction: ∂²f/∂x² = f(x + 1, y) + f(x − 1, y) − 2f(x, y)
In the y-direction: ∂²f/∂y² = f(x, y + 1) + f(x, y − 1) − 2f(x, y)
From the above equations, the discrete Laplacian of two variables is
∇²f(x, y) = f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4f(x, y)
The equation can be represented using any one of the following masks.
Fig 2.15 Laplacian kernels
The Laplacian highlights gray-level discontinuities in an image and deemphasizes regions of slowly varying gray levels. This makes the background of the Laplacian image dark and featureless. The background texture can be recovered by adding the original and Laplacian images.
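A sketch of Laplacian sharpening using the discrete kernel with centre −4 shown above; because the centre coefficient is negative, the Laplacian is subtracted (g = f − ∇²f) to obtain the sharpened result.

```python
import numpy as np

LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def laplacian_sharpen(img):
    """Sharpen by g = f - lap, where lap is computed with a centre-(-4) kernel."""
    f = img.astype(np.float64)
    p = np.pad(f, 1, mode='edge')
    lap = np.zeros_like(f)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            lap[x, y] = np.sum(LAPLACIAN * p[x:x + 3, y:y + 3])
    g = f - lap                       # negative centre coefficient, so subtract
    return np.clip(g, 0, 255).astype(np.uint8)

img = np.zeros((8, 8), dtype=np.uint8)
img[:, 4:] = 200                       # a vertical edge
print(laplacian_sharpen(img)[0])       # edge pixels are emphasised
```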
The strength of the response of a derivative operator is proportional to the degree of discontinuity of the image at the point at which the operator is applied. Thus, image differentiation enhances edges and other discontinuities and deemphasizes areas with slowly varying gray levels.
The derivative of a digital function is defined in terms of differences. Any first derivative definition
(1) must be zero in flat segments (areas of constant gray level values),
(2) must be nonzero at the onset of a gray level step or ramp, and
(3) must be nonzero along ramps.
Any second derivative definition
(1) must be zero in flat areas,
(2) must be nonzero at the onset and end of a gray level step or ramp, and
(3) must be zero along ramps of constant slope.
Fig 2.16 Example of sharpening from blurred image to sharpened image
It is common practice to approximate the magnitude of the gradient by using absolute values instead of squares and square roots.
Roberts cross-gradient operators
For implementing the gradient operators, denote the intensities in a 3 x 3 region of the image by
z1 z2 z3
z4 z5 z6
z7 z8 z9
The Roberts cross-gradient operators use the 2 x 2 kernels
-1 0        0 -1
 0 1        1  0
The smallest filter masks of practical interest, however, are of size 3 x 3.
The difference between the third and first rows of a 3 x 3 mask approximates the derivative in the x-direction, and the difference between the third and first columns approximates the derivative in the y-direction. Masks built this way are called Sobel operators.
2.4.4 Unsharp masking and highboost filtering
Unsharp masking means subtracting a blurred version of an image from the image itself.
It consists of the following steps:
1. Blur the original image.
2. Subtract the blurred image from the original; the result is called the mask, g_mask(x, y) = f(x, y) − f_blurred(x, y).
3. Add a weighted portion of the mask to the original: g(x, y) = f(x, y) + k · g_mask(x, y),
where we include a weight k (k ≥ 0) for generality. When k = 1 we have unsharp masking, as defined above. When k > 1, the process is referred to as highboost filtering. Choosing k < 1 reduces the contribution of the unsharp mask.
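A sketch of unsharp masking and highboost filtering using a simple 3 x 3 box blur as the blurring step (any smoothing filter could be used); k = 1 gives unsharp masking and k > 1 gives highboost filtering.

```python
import numpy as np

def box_blur(img, size=3):
    """Blur with a size x size averaging kernel (edge padding)."""
    a = size // 2
    p = np.pad(img.astype(np.float64), a, mode='edge')
    out = np.zeros(img.shape)
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            out[x, y] = p[x:x + size, y:y + size].mean()
    return out

def highboost(img, k=1.0):
    """g = f + k * (f - blurred); k = 1 is unsharp masking, k > 1 is highboost."""
    f = img.astype(np.float64)
    mask = f - box_blur(img)          # the unsharp mask
    return np.clip(f + k * mask, 0, 255).astype(np.uint8)

img = np.tile(np.arange(0, 256, 32, dtype=np.uint8), (8, 1))
print(highboost(img, k=2.0)[0])        # transitions between bands are exaggerated
```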
2.5 Highpass, Band reject and Band pass filters from Lowpass
filters
Fig 2.17 Low pass, Highpass, Bandreject , and Bandpass filters
Spatial and frequency -domain linear filters are classified into four broad categories:
lowpass and highpass filters, bandpass , and bandreject filters.
Lowpass filter – A lowpass filter attenuates or deletes high frequencies while passing low frequencies.
Highpass filter - A highpass filter deletes or attenuates all frequencies below a
cut-off value, u0, and passes all frequencies above this value. A highpass filter
transfer function is obtained by subtracting a lowpass function from 1. This
operation is in the frequency domain.
Bandreject filter - These filters can be constructed from the sum of a low pass and a
highpass function with different cut -off frequencies. They are known as notch
filters.
Bandpass filter - The bandpass filter transfer function can be obtained by
subtracting the bandreject function from 1 (a unit impulse in the spatia l domain).
2.6 Spatial Intensity transformation and filtering methods using
fuzzy techniques
Fuzzy sets and their applications are very useful in image processing, especially for spatial intensity transformations and filtering. Fuzzy logic proves more beneficial and flexible than classical set theory.
Fuzzy logic deals with degrees of membership in a set and is not probabilistic. It finds application in situations characterized by vagueness and imprecision.
Fig 2.18 Membership functions with respect to their equations
The principal steps followed in the application of rule -based fuzzy logic
1. Fuzzify the inputs
2. Perform any required fuzzy logical operations
3. Apply an implication method
4. Apply an aggregation method to the fuzzy sets from step 3
5. Defuzzify the final output fuzzy set
The problem specific knowledge can be formalized with the following IF -THEN
fuzzy rules. Consider the following rules for a fruit.
R1: IF the color is green, THEN the fruit is verdant
OR
R2: IF the color is yellow, THEN the fruit is half -mature
OR
R3: IF the color is red, THEN the fruit is mature
Fig 2.19 Five basic steps used to implement the fuzzy rule -based system
Using Fuzzy Sets for Intensity Transformations
Consider a general problem of contrast enhancement. The following rules can be
defined for enhancing the contrast in a gray -scale image.
R1: IF a pixel is dark, THEN make it darker
R2: IF a pixel is gray, THEN make it gray
R3: IF a pixel is bright, THEN make it brighter
The concepts of dark, gray, and bright can be expressed and evaluated using membership functions. Fuzzy image processing is computationally intensive because the entire process of fuzzification, processing the antecedents of all rules, implication, aggregation, and defuzzification must be applied to every pixel in the input image.
The outputs of these fuzzy rules, interpreted as constant intensities, are singletons, which means their membership functions are constant. Singletons greatly reduce the computational requirements in fuzzy image processing.
Similarly, fuzzy rules can be applied to spatial filtering. When applying fuzzy sets to spatial filtering, the basic approach is to define neighbourhood properties that capture the essence of what the filters are meant to detect.
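A minimal sketch of the fuzzy contrast-enhancement rules above, using simple triangular membership functions for dark, gray, and bright and singleton outputs (vd, vg, vb). The membership breakpoints and singleton values are illustrative assumptions, not values from the text.

```python
import numpy as np

def mu_dark(z):    return np.clip((0.5 - z) / 0.5, 0, 1)               # high for dark pixels
def mu_gray(z):    return np.clip(1 - np.abs(z - 0.5) / 0.25, 0, 1)    # high for mid-gray pixels
def mu_bright(z):  return np.clip((z - 0.5) / 0.5, 0, 1)               # high for bright pixels

def fuzzy_contrast(img, vd=0.0, vg=0.5, vb=1.0, L=256):
    """Defuzzify as a weighted average of the singleton outputs: darker, gray, brighter."""
    z = img.astype(np.float64) / (L - 1)
    wd, wg, wb = mu_dark(z), mu_gray(z), mu_bright(z)
    s = (wd * vd + wg * vg + wb * vb) / (wd + wg + wb + 1e-12)
    return np.round(s * (L - 1)).astype(np.uint8)

img = np.array([[40, 120], [140, 220]], dtype=np.uint8)
print(fuzzy_contrast(img))   # dark pixels pushed darker, bright pixels brighter
```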
2.7 Summary
In this chapter, we learned about the various spatial domain transformations for enhancing an image, the histogram of an image, and spatial filters and their application to images. We also learned about fuzzy filtering of images.
2.8 Review Questions
1. Explain spatial domain processing.
2. Describe point processing, contrast enhancement.
3. What are the basic intensity transformations?
4. What is filtering? What do you understand by the term spatial filtering?
5. Explain histogram equalization.
6. Write a short note on smoothing filters.
7. Write a short note on sharpening filters.
8. Explain the use of fuzzy techniques in image processing.
2.9 References
1. Digital Image Processing, third edition by Gonzalez and Woods.
2. www.Imageprocessingplace.com
3. Refer to nptel courses on digital image processing for further detailed study.
4. All Images courtesy from Digital Image Processing by Gonzalez and Woods,
4th Edition
Unit 2
3 FILTERING IN THE FREQUENCY DOMAIN
Unit Structure
3.0 Objectives
3.1 Introduction
3.2 Sampling and Fourier Transform of sampled functions
3.3 Discrete Fourier transform of one variable
3.4 Discrete Fourier transform of two variables
3.5 Properties of 2D - DFT and IDFT
3.6 Basics of filtering in the frequency domain
3.7 Smoothing in Frequency Domain
3.8 Sharpening in Frequency Domain
3.9 Selective filtering
3.10 Fast Fourier Transform
3.11 Summary
3.12 Review questions
3.13 References
3.0 Objectives
1. To understand the frequency domain, and the differences and advantages of the frequency domain relative to the spatial domain.
2. Basic steps of filtering in the frequency domain.
3. Familiarize with smoothing and sharpening filters in the frequency
domain.
4. Understand the working of Fast Fourier Transform
3.1 Introduction
The Fourier transform, named after the French mathematician Jean Baptiste Joseph Fourier, is one of the most prominent transforms used in image processing.
Any function that periodically repeats itself can be expressed as a sum of sines and cosines of different frequencies, each multiplied by a different coefficient; this sum is called a Fourier series. Even functions which are non-periodic, but whose area under the curve is finite, can also be represented in such a form; this is called the Fourier transform.
A function represented in either of these forms can be completely reconstructed via an inverse process with no loss of information.
3.2 Sampling and Fourier transform of sampled functions
A continuous function is converted into a sequence of discrete values before they
can be processed in a computer, which requires sampling and quantization.
Consider a continuous function f(t) to be sampled at uniform intervals ΔT of the independent variable t.
The simplest way to sample is to multiply f(t) by a sampling function equal to a train of impulses ΔT units apart. The sampled function is then given by
f̃(t) = f(t) s_ΔT(t) = Σ_{n=−∞}^{∞} f(t) δ(t − nΔT)
The Fourier Transform of a sampled function
Let F(u) denote the Fourier transform of a continuous function f(t). The sampled function is the product of f(t) and an impulse train. The Fourier transform of the product of two functions in the spatial domain is the convolution of the transforms of the two functions in the frequency domain, so convolving F(u) with the Fourier transform of the impulse train gives the Fourier transform of the sampled function:
F̃(u) = (1/ΔT) Σ_{n=−∞}^{∞} F(u − n/ΔT)
3.3 Discrete Fourier Transform of one variable
1-D Fourier Transformation and its Inverse
Let f(x) be a continuous function of a real variable x. The Fourier transform of f(x) is
F(u) = ℱ{f(x)} = ∫_{−∞}^{∞} f(x) exp(−j2πux) dx
where j = √−1.
F(u) is composed of an infinite sum of sine and cosine terms, and each value of u determines the frequency of its corresponding sine-cosine pair; u is a frequency variable.
Given F(u), f(x) can be obtained by the inverse Fourier transform
f(x) = ℱ⁻¹{F(u)} = ∫_{−∞}^{∞} F(u) exp(j2πux) du
3.4 Discrete Fourier Transform of two variables
2-Dimensional Fourier transform
Images being 2-dimensional functions, we need to define a 2-D Fourier transform.
The Fourier and inverse Fourier transforms of a continuous function of two variables are given by
F(u, v) = ℱ{f(x, y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) exp[−j2π(ux + vy)] dx dy
f(x, y) = ℱ⁻¹{F(u, v)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(u, v) exp[j2π(ux + vy)] du dv
Where u, v are frequency variables.
Since we work on digital images rather than continuous functions, the discrete Fourier transform is the more appropriate tool.
1-D Discrete Fourier Transform
The discrete Fourier transform of a one-variable function f(x) is given by
F(u) = (1/M) Σ_{x=0}^{M−1} f(x) exp(−j2πux/M),  u = 0, 1, 2, …, M−1
and its inverse by
f(x) = Σ_{u=0}^{M−1} F(u) exp(j2πux/M),  x = 0, 1, 2, …, M−1
To compute F(u) for a given u, we substitute that value of u in the exponential term and sum over all M values of x. Repeating this for all M values of u, it takes on the order of M × M summations and multiplications to compute the discrete Fourier transform directly.
For discrete functions, the Fourier transform and its inverse always exist.
F(u) is complex: F(u) = R(u) + jI(u), where R is the real part and I the imaginary part.
Because the Fourier transform of a real function is generally complex, polar coordinates are used:
F(u) = |F(u)| e^{jφ(u)}
The magnitude or spectrum of the Fourier transform is
|F(u)| = [R²(u) + I²(u)]^{1/2}
The magnitude specifies how much of a certain frequency component is present.
The phase angle or phase spectrum of the Fourier transform is
φ(u) = tan⁻¹[I(u)/R(u)]
The phase specifies where the frequency component is in the image.
The square of the spectrum, P(u) = |F(u)|² = R²(u) + I²(u), is referred to as the power spectrum (spectral density) of f(x).
2-D Discrete Fourier transform
The Fourier transform of a 2D discrete function (image) f(x, y) of size M × N is given by
F(u, v) = (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) exp[−j2π(ux/M + vy/N)]
where u = 0, 1, 2, …, M−1 and v = 0, 1, 2, …, N−1.
The inverse 2-D discrete Fourier transform is given by
f(x, y) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u, v) exp[j2π(ux/M + vy/N)]
where x = 0, 1, 2, …, M−1 and y = 0, 1, 2, …, N−1.
The 2D Fourier spectrum, phase spectrum, and power spectrum are, respectively,
|F(u, v)| = [R²(u, v) + I²(u, v)]^{1/2},  φ(u, v) = tan⁻¹[I(u, v)/R(u, v)],  P(u, v) = |F(u, v)|²
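NumPy's FFT routines compute the 2-D DFT directly; the sketch below obtains the spectrum, phase, and the centred log spectrum that is normally displayed. Note that np.fft.fft2 does not include the 1/MN factor used in the forward transform above, so it is divided out here to match that convention.

```python
import numpy as np

img = np.random.rand(64, 64)                  # placeholder grayscale image in [0, 1]
M, N = img.shape

F = np.fft.fft2(img) / (M * N)                # 2-D DFT with the 1/MN convention above
spectrum = np.abs(F)                          # |F(u, v)|
phase = np.angle(F)                           # phi(u, v)
power = spectrum ** 2                         # P(u, v)

F_centered = np.fft.fftshift(F)               # move F(0, 0) to the centre for display
log_spectrum = np.log1p(np.abs(F_centered))   # log transform for visualisation

print(np.isclose(F[0, 0].real, img.mean()))   # True: DC term equals the average gray level
```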
Properties of 2D Discrete Fourier Transform
The Periodicity property: F(u, v) of the 2D DFT has a period of N in the horizontal direction and M in the vertical direction.
The Symmetry property: the magnitude of the transform is centered on the origin.
Fig 3.1 M x N Fourier Spectrum and Spectrum obtained
by multiplying f(x,y) by ( -1)x+ y
Fig 3.2 a) Image b) Spectrum across four corners
c) Centred Spectrum d) Log Transformed image
The center value (at the origin) of the frequency spectrum corresponds to the ZERO-frequency component, also referred to as the DC component of the image. Substituting (u, v) = (0, 0) into the transform, the Fourier transform yields the average value:
F(0, 0) = (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y)
F(0, 0) is called the "dc" component of the spectrum (by analogy with a current of zero frequency).
The above equation implies that the value of the Fourier transform at the origin is equal to the average gray level of the image.
3.5 Properties of 2D DFT
1) The Fourier transform of a real function is conjugate symmetric:
F(u, v) = F*(−u, −v), so |F(u, v)| = |F(−u, −v)|,
which means that the Fourier spectrum is symmetric.
2) The relationship between the samples in the spatial and frequency domains is
Δu = 1/(M Δx) and Δv = 1/(N Δy)
3) Translation and Rotation
The translation Fourier transform pairs are given by
f(x, y) exp[j2π(u₀x/M + v₀y/N)] ⇔ F(u − u₀, v − v₀)
f(x − x₀, y − y₀) ⇔ F(u, v) exp[−j2π(ux₀/M + vy₀/N)]
Using the polar coordinates
x = r cos θ, y = r sin θ, u = ω cos φ, v = ω sin φ
results in the transform pair
f(r, θ + θ₀) ⇔ F(ω, φ + θ₀)
which indicates that rotating f(x, y) by an angle θ₀ rotates F(u, v) by the same angle.
4) Periodicity
The 2D Fourier transform and its inverse are infinitely periodic in the u and v directions.
Summary of the DFT pairs (table of pair names and corresponding transforms).
3.6 Basics of filtering in the frequency domain
Filtering techniques in the frequency domain are based on modifying the Fourier
transform to achieve a specific objective, and then computing the inverse DFT to
get us back to the spatial domain.
Basic Steps for Filtering in the Frequency Domain:
1. Multiply the input image by (−1)^(x+y) to center the transform.
2. Compute F (u, v), the DFT of the image from (1).
3. Multiply F (u, v) by a filter function H (u, v).
4. Compute the inverse DFT of the result in (3).
5. Obtain the real part of the result in (4).
6. Multiply the result in (5) by (−1)^(x+y).
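The six steps above can be written almost verbatim with NumPy FFTs. The sketch below keeps the (−1)^(x+y) centring form to mirror the steps; using np.fft.fftshift on the transform is an equivalent alternative.

```python
import numpy as np

def frequency_filter(f, H):
    """Filter image f with a centred frequency-domain transfer function H (same shape)."""
    M, N = f.shape
    x, y = np.meshgrid(np.arange(M), np.arange(N), indexing='ij')
    centering = (-1.0) ** (x + y)
    F = np.fft.fft2(f * centering)        # steps 1-2: centre, then take the DFT
    G = H * F                             # step 3: apply the filter
    g = np.real(np.fft.ifft2(G))          # steps 4-5: inverse DFT, keep the real part
    return g * centering                  # step 6: undo the centering

# Example with a trivial all-pass filter H = 1 (output equals input).
f = np.random.rand(32, 32)
H = np.ones((32, 32))
print(np.allclose(frequency_filter(f, H), f))   # True
```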
Given the filter H (u, v) (filter transfer function) in the frequency domain, the
Fourier
transform of the output image (filtered image) is given by:
G (u, v) = H (u, v) F (u, v)
The filtered image g(x, y) is simply the inverse Fourier transform of G(u, v):
g(x, y) = ℱ⁻¹{G(u, v)} = ℱ⁻¹{H(u, v) F(u, v)}
3.7 Smoothing in the Frequency Domain
Smoothing is a lowpass operation in the frequency domain. It is achieved by attenuating a specified range of high-frequency components in the transform of a given image.
There are three types of lowpass filters considered here: Ideal, Butterworth, and Gaussian lowpass filters.
1. Ideal Low Pass filter
An ideal lowpass filter (ILPF) "cuts off" all high-frequency components of the Fourier transform that are at a distance greater than a specified distance D0 from the origin of the (centered) transform. The ILPF is, however, not used in practice.
The transfer function of an ideal lowpass filter is given by
H(u, v) = 1 if D(u, v) ≤ D0, and H(u, v) = 0 if D(u, v) > D0
where D(u, v) is the distance from point (u, v) to the center of the frequency rectangle and D0 is a nonnegative constant (the cutoff frequency).
Fig 3.3 a) Perspective plot of the ILPF transfer function b) Function displayed as an image c) Radial cross section
Fig 3.4 a) Original image b - f) Results of filtering using ILPFs with cut -off
frequencies set at radii values 10, 30, 60, 160, and 460
2. Butterworth Low pass filter
The Butterworth lowpass filter (BLPF) has a parameter called the filter order. For high values of the filter order it approaches the form of the ideal filter, whereas for lower order values it behaves more like a Gaussian filter; it may be viewed as a transition between the two extremes. The transfer function of a BLPF of order n, with cutoff frequency at distance D0 from the origin, is defined as
H(u, v) = 1 / [1 + (D(u, v)/D0)^(2n)]
A BLPF approaches the sharpness of an ILPF with considerably less ringing.
The spatial kernel corresponding to a BLPF of order 1 has neither ringing nor negative values. The kernel corresponding to a BLPF of order 2 shows mild ringing and small negative values. Ringing becomes significant for higher-order filters; a BLPF of order 20 has a spatial kernel that exhibits ringing characteristics similar to those of the ILPF.
Fig 3.5 a) Perspective plot of BLPF b) Function as an image
c) Radial cross section of order n=1 through 4
3. Gaussian Low pass filter
The transfer function of a GLPF is of the form
H(u, v) = e^(−D²(u, v) / (2 D0²))
where D(u, v) is the distance of point (u, v) from the center of the transform and σ = D0 is the specified cutoff frequency. The inverse Fourier transform of a frequency-domain Gaussian function is also Gaussian. This means that a spatial Gaussian filter kernel, obtained by computing the IDFT, will have no ringing effect.
Fig 3.6 a) Perspective plot of GLPF b) Function as an image
c) Radial cross section for D 0 values.
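For reference, the three lowpass transfer functions above can be generated on an M × N centered frequency rectangle as in the sketch below; the helper names and the example parameters D0 and n are assumptions for illustration.

```python
import numpy as np

def distance_grid(M, N):
    """D(u, v): distance of each point from the center of the frequency rectangle."""
    u = np.arange(M).reshape(-1, 1) - M / 2
    v = np.arange(N).reshape(1, -1) - N / 2
    return np.sqrt(u**2 + v**2)

def ideal_lowpass(M, N, D0):
    return (distance_grid(M, N) <= D0).astype(float)

def butterworth_lowpass(M, N, D0, n=2):
    D = distance_grid(M, N)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))

def gaussian_lowpass(M, N, D0):
    D = distance_grid(M, N)
    return np.exp(-(D**2) / (2.0 * D0**2))

# The corresponding highpass functions follow from H_HP = 1 - H_LP, e.g.:
# H_hp = 1.0 - gaussian_lowpass(M, N, D0)
```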
3.8 Sharpening in the Frequency Domain
Image sharpening can be achieved by a highpass filtering process, which attenuates the low-frequency components without disturbing high-frequency information. The highpass transfer functions considered here are radially symmetric and completely specified by a cross section. If we have the transfer function of a lowpass filter, the corresponding highpass filter can be obtained using the equation
HHP (u, v) = 1 - HLP (u, v)
Where H LP (u, v) is the transfer function of the low pass filter.
1) Ideal High Pass filter
An ideal highpass filter (IHPF) transfer function is given by
H(u, v) = 0 if D(u, v) ≤ D0, and H(u, v) = 1 if D(u, v) > D0,
where D(u, v) is the distance from the center of the frequency rectangle.
Fig 3.7 Perspective plots, images and radial cross sections of the IHPF, GHPF and BHPF
2) Butterworth High pass filter
The transfer function of a Butterworth highpass filter of order n is given by the equation
H(u, v) = 1 / (1 + [D0 / D(u, v)]^(2n))
3) Gaussian High Pass filter
The transfer function of a Gaussian highpass filter is given by the equation
H(u, v) = 1 − e^(−D²(u, v) / (2 D0²))
Homomorphic Filtering
Homomorphic filters are widely used in image processing to compensate for the effect of non-uniform illumination in an image. Pixel intensities in an image represent the light reflected from the corresponding points in the objects. As per the image model, an image f(x, y) may be characterized by two components: (1) the amount of source light incident on the scene being viewed, and (2) the amount of light reflected by the objects in the scene. These portions of light are called the illumination and reflectance components, and are denoted i(x, y) and r(x, y) respectively. The functions i(x, y) and r(x, y) combine multiplicatively to give the image function f(x, y):
f(x, y) = i(x, y) · r(x, y)    (1)
where 0 < i(x, y) < ∞ and 0 < r(x, y) < 1. Homomorphic filters are used in
such situations where the image is subjected to multiplicative interference or noise as depicted in Eq. (1). We cannot operate separately on the frequency components of illumination and reflectance using the above product, because the Fourier transform of f(x, y) is not separable; that is,
F[f(x, y)] ≠ F[i(x, y)] · F[r(x, y)]
We can separate the two components by taking the logarithm of the two sides:
ln f(x, y) = ln i(x, y) + ln r(x, y).
Taking Fourier transforms on both sides we get
F[ln f(x, y)] = F[ln i(x, y)] + F[ln r(x, y)],
that is, F(x, y) = I(x, y) + R(x, y), where F, I and R are the Fourier transforms of ln f(x, y), ln i(x, y), and ln r(x, y), respectively. The function F represents the Fourier transform of the sum of two images: a low-frequency illumination image and a high-frequency reflectance image. If we now apply a filter with a transfer function that suppresses low-frequency components and enhances high-frequency components, then we can suppress the illumination component and enhance the reflectance component. Taking the inverse transform of F(x, y) and then the anti-logarithm, we get
f'(x, y) = i'(x, y) · r'(x, y)
Fig 3.8 Steps in Homomorphic filtering
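A minimal sketch of the steps in Fig. 3.8 follows, using an assumed Gaussian-shaped high-frequency-emphasis transfer function; the parameter names (gamma_L, gamma_H, c, D0) and their values are illustrative assumptions, not values from the text.

```python
import numpy as np

def homomorphic_filter(f, D0=30.0, gamma_L=0.5, gamma_H=2.0, c=1.0):
    """Log -> DFT -> high-frequency emphasis -> inverse DFT -> exp."""
    M, N = f.shape
    u = np.arange(M).reshape(-1, 1) - M / 2
    v = np.arange(N).reshape(1, -1) - N / 2
    D2 = u**2 + v**2
    # Gaussian-shaped filter that suppresses low and boosts high frequencies
    H = (gamma_H - gamma_L) * (1.0 - np.exp(-c * D2 / (D0**2))) + gamma_L
    Z = np.fft.fftshift(np.fft.fft2(np.log1p(f.astype(float))))  # log then DFT
    g = np.real(np.fft.ifft2(np.fft.ifftshift(H * Z)))           # filter, inverse DFT
    return np.expm1(g)                                           # undo the logarithm
```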
3.9 Selective Filtering
Selective filtering processes specific bands of frequencies or small regions of the frequency rectangle. Filters in the first category are called band filters. If frequencies in the band are filtered out, the band filter is called a bandreject filter; similarly, if the frequencies are passed, the filter is called a bandpass filter. Filters in the second category are called notch filters. They are further classified according to whether the frequencies in the notch areas are rejected or passed.
BANDREJECT AND BANDPASS FILTERS
Bandpass and Bandreject filter transfer functions in the frequency domain can be
constructed by combining lowpass and highpass filter transfer functions. Lowpass filter transfer functions are the basis for forming highpass, bandreject, and bandpass filter functions. A bandpass filter transfer function is obtained from a bandreject function as
HBP(u, v) = 1 − HBR(u, v)
Fig 3.9 Transfer functions of Bandreject filters
NOTCH FILTERS
Notch filters are the most useful of the selective filters. A notch filter rejects (or passes) frequencies in a predefined neighbourhood of the frequency rectangle. Zero-phase-shift filters must be symmetric about the origin (center of the frequency rectangle), so a notch filter transfer function with center at (u0, v0) must have a corresponding notch at location (−u0, −v0). Notch reject filter transfer functions are constructed as products of highpass filter transfer functions whose centers have been translated to the centers of the notches.
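The following sketch builds a notch-reject transfer function as a product of translated highpass functions, as described above; the Butterworth shape, the notch centers, and the parameter values are assumptions made for illustration.

```python
import numpy as np

def butterworth_notch_reject(M, N, centers, D0=10.0, n=2):
    """centers: list of (u_k, v_k) offsets from the middle of the rectangle."""
    U = np.arange(M).reshape(-1, 1) - M / 2
    V = np.arange(N).reshape(1, -1) - N / 2
    H = np.ones((M, N))
    for uk, vk in centers:
        D_pos = np.sqrt((U - uk) ** 2 + (V - vk) ** 2)   # distance to (u_k, v_k)
        D_neg = np.sqrt((U + uk) ** 2 + (V + vk) ** 2)   # distance to (-u_k, -v_k)
        with np.errstate(divide='ignore'):
            # each factor is a Butterworth highpass centered at a notch
            H *= (1.0 / (1.0 + (D0 / D_pos) ** (2 * n))) * \
                 (1.0 / (1.0 + (D0 / D_neg) ** (2 * n)))
    return H   # the corresponding notch-pass function is 1 - H
```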
3.10 Fast Fourier Transform Algorithm
As seen theoretically, the discrete Fourier transform and its inverse require a large number of computations. To implement the transform practically, the fast Fourier transform (FFT) algorithm is used; it reduces the computation to the order of MN log2 MN multiplications and additions.
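In practice the FFT is available in standard numerical libraries; for example (an assumption about tooling, not part of the text), NumPy's fft2/ifft2 compute the 2-D DFT of an M × N image in roughly MN log2(MN) operations rather than the (MN)² of a direct implementation.

```python
import numpy as np

f = np.random.rand(512, 512)        # example image
F = np.fft.fft2(f)                  # forward 2-D DFT computed via the FFT
f_back = np.real(np.fft.ifft2(F))   # inverse transform
assert np.allclose(f, f_back)       # round trip recovers the image
```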
3.11 Summary
In this chapter, we learned about the frequency domain, filtering in the frequency domain, and its effective use in image processing, as well as why the fast Fourier transform algorithm is used in practice.
3.12 Review questions
1. Compare and contrast the spatial domain and the frequency domain.
2. How does filtering in the frequency domain work?
3. What are the basic steps of filtering in the frequency domain?
4. What are smoothing filters? Explain the various types of lowpass filters.
5. Explain sharpening in the frequency domain.
6. Write a short note on selective filtering.
7. Explain homomorphic filtering briefly.
8. What is the FFT? Why is the FFT used in practice rather than the direct DFT?
9. Mention the properties of the DFT.
3.13 References
1. Digital Image Processing, fourth edition, by Gonzalez and Woods.
2. www.imageprocessingplace.com
3. Refer to NPTEL courses on digital image processing for further detailed study on the FFT and other related concepts.
4. All images courtesy of Digital Image Processing by Gonzalez and Woods, 4th Edition.
Unit 2
4 IMAGE RESTORATION AND
RECONSTRUCTION
Unit Structure
4.0 Objectives
4.1 Introduction
4.2 Model of the Image degradation/restoration process
4.3 Noise models
4.4 Restoration in Spatial and Noise Reduction in Frequency Domain
4.5 Inverse Filtering
4.6 Minimum Mean Square Error (Wiener) Filtering
4.7 Geometric mean filter
4.8 Image Reconstruction from projections
4.9 Summary
4.10 Review questions
4.11 References
4.0 Objectives
1. Understand the characteristics of various noise models used in image
processing.
2. Understand the reduction of noise in the frequency domain.
3. Become familiar with inverse and geometric mean filters, and understand reconstruction of images from projections.
4.1 Introduction
Restoration attempts to recover an image that has been degraded by using a priori knowledge of the degradation phenomenon. Thus, restoration techniques are oriented toward modelling the degradation and applying the inverse process in order to recover the original image.
This chapter deals with the characteristics of various noise models used in image processing, and with how to estimate from image data the parameters that define those models. It also gives an overview of minimum mean-square-error (Wiener) filtering and its advantages over direct inverse filtering.
Image restoration refers to a class of methods that aim to remove or reduce the degradations that occurred while the digital image was being obtained. All natural images, when displayed, have gone through some sort of degradation:
a) During the display mode
b) During the acquisition mode
c) During the processing mode
The degradations may be due to
a) Sensor noise
b) Blur due to camera misfocus
c) Relative object -camera motion
d) Random atmospheric turbulence
4.2 A model of the Image Degradation/Restoration process
Fig 4.1 A model of the Image Degradation/Restoration process
The degradation process applies a degradation function H to an input image f(x, y), together with an additive noise term η(x, y). These two terms combine to give the degraded image g(x, y). Given g(x, y), some knowledge about the degradation function H, and some knowledge about the additive noise term η(x, y), the objective of restoration is to obtain an estimate f'(x, y) of the original image. We want the estimate to be as close as possible to the original image.
The more we know about H and η, the closer f'(x, y) will be to f(x, y).
If the degradation is a linear, position-invariant process, then the degraded image is given in the spatial domain by
g(x, y) = h(x, y) * f(x, y) + η(x, y)
where h(x, y) is the spatial representation of the degradation function and the symbol * denotes convolution. In the frequency domain we may write this equation as
G(u, v) = F(u, v) H(u, v) + N(u, v)
The terms in capital letters are the Fourier transforms of the corresponding terms in the spatial domain.
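A small sketch of this degradation model follows, using an assumed Gaussian blur as the degradation function H(u, v) and additive Gaussian noise; the function name and parameter values are illustrative only.

```python
import numpy as np

def degrade(f, sigma_blur=3.0, noise_std=10.0, rng=np.random.default_rng(0)):
    """Return a degraded image g = IDFT{H F} + noise, plus the H used."""
    M, N = f.shape
    u = np.fft.fftfreq(M).reshape(-1, 1)
    v = np.fft.fftfreq(N).reshape(1, -1)
    # Frequency response of a Gaussian blur (the assumed degradation function H)
    H = np.exp(-2 * (np.pi ** 2) * (sigma_blur ** 2) * (u ** 2 + v ** 2))
    F = np.fft.fft2(f)
    g = np.real(np.fft.ifft2(H * F)) + rng.normal(0.0, noise_std, (M, N))
    return g, H
```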
4.3 Noise models
The principal sources of noise in digital images arise during image acquisition
and/or transmission.
For example, in acquiring images with a CCD camera, light levels and sensor temperature are major factors affecting the amount of noise in the resulting image. Images are corrupted during transmission principally by interference in the transmission channel; for example, an image transmitted over a wireless network might be corrupted by lightning or other atmospheric disturbance.
Spatial and Frequency properties of Noise
1. When the Fourier spectrum of the noise is constant, the noise is called white noise.
2. Noise is assumed to be independent of the spatial coordinates and uncorrelated with respect to the image itself.
Common Noise found in Image Processing applications
1. Gaussian Noise
This noise model is used frequently in practice because of its tractability in both the spatial and frequency domains. The PDF of a Gaussian random variable z is given by
p(z) = (1 / (√(2π) σ)) e^(−(z − z̄)² / (2σ²))
where z represents intensity, z̄ is the mean (average) value of z, and σ is the standard deviation.
2. Rayleigh Noise
The PDF of Rayleigh noise is given by
The mean and variance of z when this random variable is characterized by a
Rayleigh PDF
3. Erlang (gamma) noise
The PDF of Erlang noise is given by
where the parameters are such that a > 0, b is a positive integer, and "!" indicates factorial. The mean and variance of z are
4. Exponential Noise
The PDF of exponential noise is given by
where a > 0. The mean and variance of z are
5. Uniform Noise
The PDF of uniform noise is
The mean and variance of z are
6. Salt-and-Pepper Noise
The PDF of salt -and-pepper noise is given by
where V is any integer value in the range 0 < V < 2^k − 1.
Fig 4.2 Noise Models with their probability density functions
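For experimentation, samples from several of these noise models can be drawn with NumPy's random generator, as in the hedged sketch below; all parameter values are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
shape = (256, 256)

gaussian    = rng.normal(loc=0.0, scale=20.0, size=shape)   # Gaussian noise
rayleigh    = rng.rayleigh(scale=15.0, size=shape)           # Rayleigh noise
erlang      = rng.gamma(shape=2.0, scale=10.0, size=shape)   # Erlang (gamma) noise
exponential = rng.exponential(scale=10.0, size=shape)        # Exponential noise
uniform     = rng.uniform(low=-20.0, high=20.0, size=shape)  # Uniform noise

def add_salt_and_pepper(img, p_salt=0.02, p_pepper=0.02):
    """Corrupt an 8-bit image with bipolar impulse (salt-and-pepper) noise."""
    out = img.copy()
    r = rng.random(img.shape)
    out[r < p_pepper] = 0          # pepper (dark) points
    out[r > 1 - p_salt] = 255      # salt (bright) points
    return out
```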
4.4 Restoration in Spatial and Noise Reduction in Frequency
Domain
When an image is degraded only by additive noise, the equation becomes
g(x, y) = f(x, y) + η(x, y)
or
G (u, v) = F (u, v) + N (u, v)
The noise terms are unknown, so subtracting them from g(x, y) or G(u, v) is not a realistic approach. In the case of periodic noise, however, it is possible to estimate N(u, v) from the spectrum of G(u, v); N(u, v) can then be subtracted from G(u, v) to obtain an estimate of the original image. Spatial filtering is the method of choice when only additive noise is present.
The following techniques can be used to reduce the noise effect
Mean Filters
1) Arithmetic Mean Filters
It is the simplest mean filter. Let Sxy represent the set of coordinates in a rectangular subimage window of size m × n centered at point (x, y). The arithmetic mean filter computes the average value of the corrupted image g(x, y) in the area defined by Sxy; the value of the restored image f̂ at any point (x, y) is the arithmetic mean computed using the pixels in the region defined by Sxy. This operation can be implemented using a convolution kernel in which all coefficients have value 1/mn. A mean filter smooths local variations in an image, and noise is reduced as a result of the blurring.
2) Geometric mean filters
An image restored using a geometric mean filter is given by the expression
where Π indicates multiplication. Here, each restored pixel is given by the
product of all the pixels in the subimage area, raised to the power 1/ mn
A geometric mean filter achieves smoothing comparable to an arithmetic
mean filter, but it tends to lose less image detail in the process.
3) Harmonic mean filter
The harmonic mean filter works well for salt noise but fails for pepper noise. It also does well with other types of noise, such as Gaussian noise. The harmonic mean filtering operation is given by the expression
4) Contraharmonic mean filter
The Contraharmonic mean filter yields a restored image based on the
expression
where Q is called the order of the filter. This filter is well suited for reducing or virtually eliminating the effects of salt-and-pepper noise. For positive values of Q, the filter eliminates pepper noise; for negative values of Q, it eliminates salt noise. It cannot do both simultaneously. The contraharmonic filter reduces to the arithmetic mean filter if Q = 0, and to the harmonic mean filter if Q = −1.
ORDER STATISTICS FILTERS
Order statistics filters are spatial filters whose response is based on ordering (ranking) the pixels contained in the image area encompassed by the filter. The response of the filter at any point is determined by the ranking result.
1) Median Filters
These filters replace the value of a pixel by the median of the intensity levels
in a predefined neighbourhood of that pixel
where, as before, Sxy is a subimage (neighbourhood) centered on point (x, y). The value of the pixel at (x, y) is included in the computation of the median. Median filters provide excellent noise-reduction capabilities, with considerably less blurring than linear smoothing filters of similar size. They are particularly effective in the presence of both bipolar and unipolar impulse noise.
2) Max and Min filters
Using the 100th percentile of a ranked set of numbers gives the max filter, defined by the equation
f̂(x, y) = max of g(r, c) over (r, c) in Sxy
This filter is useful for finding the brightest points in an image or for eroding dark regions adjacent to bright areas.
The 0th percentile filter is the min filter,
f̂(x, y) = min of g(r, c) over (r, c) in Sxy
which is useful for finding the darkest points in an image; it also reduces salt noise as a result of the min operation.
3. Mid-point filter
The midpoint filter simply computes the midpoint between the maximum and minimum values in the area encompassed by the filter. It combines order statistics and averaging, and works best for randomly distributed noise such as Gaussian or uniform noise.
4. Alpha -Trimmed mean filter
Suppose we delete the d/2 lowest and the d/2 highest intensity values of g(r, c) in the neighbourhood Sxy. A filter formed by averaging the remaining mn − d pixels is called an alpha-trimmed mean filter. The value of d can range from 0 to mn − 1: when d = 0, the alpha-trimmed filter reduces to the arithmetic mean filter, and if we choose d = mn − 1, the filter becomes a median filter.
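The sketch below implements a few of the spatial restoration filters described above (median, contraharmonic, and alpha-trimmed mean) directly with NumPy loops for clarity rather than speed; the helper names and default window sizes are assumptions.

```python
import numpy as np

def _windows(g, m, n):
    """Yield (x, y, m-by-n window) for every pixel of an edge-padded image g."""
    padded = np.pad(g.astype(float), ((m // 2, m // 2), (n // 2, n // 2)), mode='edge')
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            yield x, y, padded[x:x + m, y:y + n]

def median_filter(g, m=3, n=3):
    out = np.empty(g.shape, dtype=float)
    for x, y, w in _windows(g, m, n):
        out[x, y] = np.median(w)
    return out

def contraharmonic_filter(g, Q=1.5, m=3, n=3):
    out = np.empty(g.shape, dtype=float)
    eps = 1e-12                       # avoid division by zero
    for x, y, w in _windows(g, m, n):
        out[x, y] = np.sum(w ** (Q + 1)) / (np.sum(w ** Q) + eps)
    return out

def alpha_trimmed_mean(g, d=4, m=3, n=3):
    out = np.empty(g.shape, dtype=float)
    for x, y, w in _windows(g, m, n):
        s = np.sort(w.ravel())[d // 2 : w.size - d // 2]   # trim d/2 low and high
        out[x, y] = s.mean()
    return out
```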
Periodic Noise Reduction using frequency domain filtering
The basic idea is that periodic noise appears as concentrated bursts of energy in the Fourier transform, at locations corresponding to the frequencies of the periodic interference. The approach is to use a selective filter to isolate the noise. In the restoration of images corrupted by periodic interference, the tool of choice is a notch filter. The general form of a notch reject filter transfer function is a product of highpass filter transfer functions whose centers have been translated to the centers of the notches, and a notch pass filter transfer function is obtained from a notch reject function using the expression
HNP(u, v) = 1 − HNR(u, v)
4.5 Inverse Filtering
Inverse filtering is a process of restoring an image degraded by a degradation function H, which can be obtained by any method. The simplest approach to restoration is direct inverse filtering. Inverse filtering provides an estimate F̂(u, v) of the transform of the original image simply by dividing the transform of the degraded image G(u, v) by the degradation function:
F̂(u, v) = G(u, v) / H(u, v)
This leads to an interesting result: even if we know the degradation function, we cannot recover the undegraded image exactly, because N(u, v) is not known. Moreover, if the degradation function has zero or very small values, the ratio N(u, v)/H(u, v) can easily dominate the estimate F̂(u, v).
4.6 Minimum Mean Square Error (Wiener) Filtering
Wiener filtering is an approach that incorporates both the degradation function and the statistical characteristics of noise into the restoration process. The method is founded on considering images and noise as random variables, and the objective is to find an estimate f̂ of the uncorrupted image f such that the mean square error between them is minimized. This error measure is defined as
e² = E{(f − f̂)²}
The minimum of this error function is given in the frequency domain by
F̂(u, v) = [ H*(u, v) / ( |H(u, v)|² + Sη(u, v)/Sf(u, v) ) ] G(u, v)
The product of a complex quantity with its conjugate is equal to the magnitude of the complex quantity squared. This result is known as the Wiener filter. The terms used are as follows: H(u, v) is the degradation function and H*(u, v) its complex conjugate, |H(u, v)|² = H*(u, v) H(u, v), Sη(u, v) = |N(u, v)|² is the power spectrum of the noise, and Sf(u, v) = |F(u, v)|² is the power spectrum of the undegraded image.
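The following sketch contrasts direct inverse filtering with Wiener filtering in the frequency domain; approximating the noise-to-signal power ratio Sη/Sf by a constant K is an assumption made for illustration, not a prescription from the text.

```python
import numpy as np

def inverse_filter(g, H, eps=1e-3):
    """Direct inverse filtering; small |H| values are clipped so that
    N(u,v)/H(u,v) does not dominate the estimate."""
    G = np.fft.fft2(g)
    H_safe = np.where(np.abs(H) < eps, eps, H)
    return np.real(np.fft.ifft2(G / H_safe))

def wiener_filter(g, H, K=0.01):
    """Wiener filtering with the noise-to-signal ratio approximated by K."""
    G = np.fft.fft2(g)
    H2 = np.abs(H) ** 2
    F_hat = (np.conj(H) / (H2 + K)) * G
    return np.real(np.fft.ifft2(F_hat))
```

Both functions assume H is defined on the same (unshifted) DFT grid as fft2, as in the earlier degradation sketch.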
4.7 Geometric Mean filter
The generalized form of the Wiener filter is the geometric mean filter
F̂(u, v) = [ H*(u, v)/|H(u, v)|² ]^α [ H*(u, v) / ( |H(u, v)|² + β Sη(u, v)/Sf(u, v) ) ]^(1−α) G(u, v)
where α and β are nonnegative, real constants. The geometric mean filter transfer function consists of the two expressions in brackets raised to the powers α and 1 − α, respectively. When α = 1 the geometric mean filter reduces to the inverse filter. With α = 0 the filter becomes the so-called parametric Wiener filter, which reduces to the "standard" Wiener filter when β = 1.
4.8 Image Reconstruction from projections
The basic principles of projections apply to CT and related imaging modalities, such as SPECT (single photon emission computed tomography), PET (positron emission tomography), MRI (magnetic resonance imaging), and some modalities of ultrasound imaging. Projections can be handled with the Radon transform, fan-beam geometries, or the Fourier-slice theorem to understand image reconstruction from projections. A filter function can be applied before back-projecting; this approach is appropriately called filtered back-projection. In practice the data are discrete, so all frequency-domain computations are carried out using a 1-D FFT algorithm.
4.9 Summary
In this chapter, we learned about the various noise models involved in image degradation and restoration, the different sources of noise, and the purpose and working of spatial and frequency-domain filters used to restore images.
4.10 Review questions
1. Explain the model of image degradation and restoration with a diagram.
2. What are the different sources of noise?
3. Explain the various noise models.
4. Describe the various spatial and frequency-domain filters used to restore images.
5. Write a short note on Wiener Filtering.
4.11 References
1. Digital Image Processing, fourth edition, by Gonzalez and Woods.
2. www.imageprocessingplace.com
3. Refer to NPTEL courses on digital image processing for further detailed study on the FFT and other related concepts.
4. All images courtesy of Digital Image Processing by Gonzalez and Woods, 4th Edition.
Unit 3
5 WAVELET AND OTHER IMAGE TRANSFORMS
Unit Structure
5.0 Objective
5.1 Preliminaries
5.2 Matrix -based transforms
5.3 Correlation
5.4 Basis Functions in the Time -Frequency Plane
5.5 Basis Images
5.6 Fourier -Related Transforms
5.7 Walsh -Hadamard Transforms
5.8 Slant Transform
5.9 Haar Transform
5.10 Wavelet Transforms
5.11 Color Fundamentals
5.12 Color Models
5.13 Pseudocolor Image Processing
5.14 Full-Color Image Processing
5.15 Color Transformations
5.16 Color Imag e Smoothing and Sharpening
5.17 Using Color in Image Segmentation
5.18 Noise in Color Images
5.19 Color Image Compression
5.20 Summary
5.21 Unit End questions
5.22 Reference for further reading
5.0 Objective
In image compression, the wavelet transform plays an extremely important role; it is often a more appropriate technique than the Fourier transform for image compression applications. The Fourier transform is not practical for computing spectral information that varies in time, because every coefficient depends on the signal over the entire time domain, so frequencies cannot be observed varying in time and the resulting function is time-independent. The discrete Fourier transform is a member of an important class of linear transformations that includes the Hartley, sine, cosine, Walsh-Hadamard, Slant, Haar and wavelet transforms. Such transformations decompose functions into weighted sums of orthogonal or biorthogonal basis functions that can be studied with the tools of linear algebra and functional analysis.
5.1 Preliminaries
In linear algebra, a vector space (also called a linear space) is a collection of objects that can be added together and multiplied ('scaled') by numbers called scalars. Scalars are often taken to be real numbers, but vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field also exist. Vector addition and scalar multiplication must satisfy certain requirements, known as the vector axioms. An inner product is a generalization of the dot product; in a vector space, it is a way of multiplying vectors together, the result being a scalar.
An inner product <·,·> satisfies the following four properties for a real vector space. Let u, v and w be vectors and α be a scalar; then:
1. <u + v, w> = <u, w> + <v, w>.
2. <αv, w> = α<v, w>.
3. <v, w> = <w, v>.
4. <v, v> ≥ 0, with equality if and only if v = 0.
The fourth condition in the list above is known as positive-definiteness. Note that some authors define an inner product as a function <·,·> satisfying only the first three conditions together with the (weaker) non-degeneracy condition (i.e., if <v, w> = 0 for all w, then v = 0). In such literature, functions fulfilling all four criteria are usually referred to as positive-definite inner products, while inner products that are not positive-definite are referred to as indefinite.
While subtle, this distinction leads to many remarkable phenomena. Inner products that are not positive-definite, for example, can give rise to 'norms' under which some vectors (called spacelike vectors) have imaginary magnitude, and to 'metrics' that are not true metrics. The Lorentzian inner product is an example of an indefinite inner product.
A vector space equipped with an inner product is called an inner product space; this definition also applies to abstract vector spaces over any field.
1. The real numbers form an inner product space with <x, y> = xy. Euclidean space R^N, the set of all real N-tuples, is an inner product space with
eq-1
where u and v are column vectors.
2. The set of complex N-tuples C^N is an inner product space with
eq-2
where * indicates the complex conjugate operation, and u and v are complex-valued column vectors. Inner product spaces over the field of complex numbers are sometimes referred to as unitary spaces.
3. The vector space C[a, b] of all real-valued continuous functions on a closed interval [a, b] is an inner product space, whose inner product is defined by
eq-3
4. The norm or length of a vector z in all three inner product spaces is
eq-4
Equations (5) through (15) are valid for all inner product spaces, including those defined above, and the angle between two nonzero vectors z and w is
5. Although the meaning must always be taken from context, the term 'vector' is used here for abstract vectors: a matrix (e.g., a column vector) or a continuous function may be a vector.
eq-5
6. If the norm of z is 1, z is said to be normalized. If <z, w> = 0, z and w are said to be orthogonal. A natural consequence of these definitions is that a set of nonzero vectors is mutually orthogonal if and only if
eq-6
7. Such vectors constitute an orthogonal basis for the inner product space; they are an orthonormal basis when the basis vectors are also normalized, that is
eq-7
8. Similarly, a set of vectors w0, w1, w2, ... and an additional set of dual vectors are said to be biorthogonal, and a biorthogonal basis of the vector space, if
eq-8
9. They are a biorthonormal basis if and only if
eq-9
10. As a way of compactly representing an infinite set of vectors, the notion of a basis of an inner product space is among the most important concepts in linear algebra. The following derivation, which depends on the orthogonality of the basis vectors, is fundamental to the matrix-based transforms of the next section. Let W = {w0, w1, w2, ...} be an orthogonal basis of inner product space V, and let z ∈ V. The vector z can then be expressed as the following linear combination of basis vectors:
eq-10
11. whose inner product with basis vector wi is
eq-11
12. Since the basis vectors are mutually orthogonal, the inner products on the right side of Eq. (11) are 0 unless the vector subscripts for which the inner products are computed match [see Eq. (7)]. Excluding the zero terms and dividing both sides of the equation by <wi, wi> gives
eq-12
13. which reduces to
eq-13
14. if the basis vector norms are 1, and
eq-14
15. and
eq-15
5.2 Matrix -Based Transforms
The 1-D discrete Fourier transform is one of a class of important transformations that can be expressed in terms of a single general relation. In mathematics, the word transform describes a change in form without a corresponding change in value.
eq-16
where x is a spatial variable, T(u) is the transform of f(x), r(x, u) is a forward transformation kernel, and integer u is a transform variable with values in the range 0, 1, ..., N − 1. Similarly, the inverse transform of T(u) is
eq-17
where s(x, u) is an inverse transformation kernel and x takes values in the range 0, 1, ..., N − 1. If r(x, u) and s(x, u) in Eqs. (16) and (17) depend only on x and u, and not on the values of f(x) or T(u), then they define a transform pair.
Graphical equat ions ( 17) are shown in Fig.5. 1. It should be taken into consideration
that the f(x) is a weighted total of N inverse (e.g. s(x,u) for) kernel functions and
that the T(u) for weight . Each N s(x,u) contributes f(x) to each x. Expanding the
right side of Eq. (17)
FIGURE 5.1
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
A graphical illustration of Eq. (18)
eq-18
It is readily evident which a linear expan sion is represented as Eq. (11 ), with Eq.
(18) s(x,u), T(u) in place (e.g. th e basis of vectors) and Equ. (11 ). In this case, a
linear expansion is immediately apparent. When we suppose s(x,u) is orthonormal
vectors of a product's inner space in Eq. (18), Eq. (13 ) tells us that
eq-19
And T(u) can be calculated for transform through the inner products
We frequently use subscripts to define the matrix or vector elements. This means
the first column vector f element, which is f (0), and the first c olumn of column
vector element s (0, 3).
eq-20
eq-21
And
eq-22
and using them to rewrite Eq. (19) as
eq-23
Combining the N -base vectors in the transform matrix
eq-24
Then Eq. (23) is replaced into Eq. (21) and EQ. (1) can be used to achieve
By using Eq. (1), we assume that real -value vector is the most common case. For
complex inner product space, equation (2) must be used.
eq-25
Or
eq-26
eq-27
Where Eqs. (1) and (7) respectively follow the two las t steps. Since AAT=I it gives
Eq (26) pre -multiplication by AT and simplification gives f=ATt Eqs. (16) and ( 17)
thus become the matrix -based pair of transformations.
eq-28
And
eq-29
It is essential to understand that we assumed that the N transform basis vectors (i.e.
s u for u =0, 1… .N-1) of the Transformation matrix A were true and orthonormal in
the derivations of Eqs (28) and (29). In compliance with Eq (7)
eq-30
The assumed orthonormality makes it possible to calculate forward t ransforms
without explicit references to a kernel of forward transformation, i.e. t=Af , when
A is a function acts in the reverse transformation kernel s(x,u).
Because of the reality that the base vector A is real and orthonormal, the transform
is termed the orthogo nal transform defined in Eq. ( 28). It preservatives the inner
products is i.e., ۃۃf1,f2 ۄۄ =ۃۃt1,t2 ۄۄ =ۃۃA f1,A f 2 ۄۄ and therefore, the distances and
angles pre and post transformation among vectors. The rows and columns of A are
orthonormal AAT = ATA = I, so Aí1 = AT. The outcome in the Eqs. (28) And Eq
(29) are a reversible transfor m pair . Substituted by Eq. ( 29) with ( 28) produces t=
Af= AATt=t, while replacing Eq. (28) with ( 29), f= ATt=ATAF=f gives.
eq-31
And
eq-32
Where the inverse transformation kernels r(x,y,u,v) and s( x,y,u,v) respectively are forward . Transform T(u,v) and invert kernel transformation s(x,y,u,v), respectively, with eq. (32) way to set a linear expansions to f(x,y), are regarded as
another weighting coefficients and basis vectors . Forward transformatio n kernel
r(x,y,u,v) is separable if
eq-33
and symmetric if r 1is functionally equal to r 2so
eq-34
When the transformation kernels are real and orthonormal; both r and s are
distinguishable and symmetrical, then the matrix equivalents for Eqs. (6 -31) and
(6-32) shall be substitute r in Eqs. (33) and r in Eqs. (34), respectively, for
separable, separable symmetrical inverse kernels.
T=AFAT eq-35
And
F=ATTA eq-36
When F is a matrix of f( x, y), T is a transform and A is as defined in the Equation
(24) Equation. The column and row transformations of F are measured according
to the F pre and post -multiplication by A and Eq. (35). This actually breaks the 2 -
D transformation into two 1 -D trans formations, which mirror the 2 -D DFT process.
EXAMPLE 1 A simple orthogonal transformation.
Consider the 2 -element basis vectors
And note they are orthonormal in accordance with Eq. (30) :
Substitute and transformation matrix in or out of Eq. (6-24).
eq-37
and the transform of matrix
follows from Eq. (35)
In accordance with Eq. (36) , the inverse of transform T is
In conclusion, we note that A is an orthogonal transformation matrix for which
Rectangular Arrays
When rectangular (as opposed to square) arrays are transformed, Eqs. (35) and (36) become
eq-38
and
eq-39
where F, A_M and A_N are of size M × N, M × M, and N × N, respectively; both A_M and A_N are defined in accordance with Eq. (24).
EXAMPLE 2: Computing the transform of a rectangular array.
A simple transformation in which M and N are 2 and 3, respectively, is
where matrices F, A2 and A3 are defined in the first step of the computation. As expected, the transform output T is of size 2 × 3, the same as F. It is easy to demonstrate that each transformation matrix is orthogonal, and that Eq. (39) reverses the transformation.
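The matrix-based pair T = A F A^T and F = A^T T A is easy to verify numerically. The sketch below uses the 2 × 2 orthonormal matrix (1/√2)[[1, 1], [1, −1]] — an assumed example basis, not necessarily the one of Example 1 — and an arbitrary 2 × 2 array as the "image".

```python
import numpy as np

A = (1 / np.sqrt(2)) * np.array([[1.0,  1.0],
                                 [1.0, -1.0]])   # real, orthonormal rows
F = np.array([[10.0, 20.0],
              [30.0, 40.0]])                      # example 2-D "image"

T = A @ F @ A.T             # forward transform, Eq. (35)
F_back = A.T @ T @ A        # inverse transform, Eq. (36)

assert np.allclose(F, F_back)                 # the pair is reversible
assert np.allclose(A @ A.T, np.eye(2))        # A A^T = I (orthogonal matrix)
```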
Complex Orthonormal Basis Vectors
An orthonormal basis is the basis of which vectors are orthogonal to each other and
have a unit norm.
Orthonormal bases are essential in applications because a vector is especially
simple to represent on an orthonormal basis, called Fourier expansi on.
We have to be familiar with the concepts of internal product and standard to
understand this concept.
Orthonormal sets
Recall that two vectors are orthogonal if their inner product equals zero.
Definition. Let S be an inner product vector space with inner product <·,·>. A set of K vectors s1, ..., sK ∈ S is an orthonormal set if and only if
eq-40
where * is the complex conjugate operation. If the basis vectors are complex-valued rather than real-valued, Eqs. (35) and (36) become
T = A F A^T    eq-41
and
F = A*^T T A*    eq-42
Transformation matrix A is then called a unitary matrix, and Eqs. (41) and (42) a unitary transform pair. The 1-D counterparts of Eqs. (41) and (42) are given below. Orthogonal transforms are the special case in which the expansion functions are real-valued, and both transforms preserve inner products.
t=Af eq-43
f=A*T t eq-44
EXAMPLE 3: A transform with complex -valued basis vectors.
The inverse of a unitary transformation matrix
eq-45
is its conjugate transpose. Thus, where A is a unitary matrix, it can be used in Eqs. (41) through (44). This is easily shown when the basis vectors in A satisfy Eq. (40).
Biorthonormal Basis Vectors
Expansion functions s0, s1, ..., sN−1 in Eq. (24) are biorthonormal if there are dual expansion functions s̃0, s̃1, ..., s̃N−1 such that
eq-46
Neither the expansion functions nor their duals need themselves be orthonormal. With a set of biorthonormal expansion functions, Eqs. (35) and (36) become
eq-47
And
eq-48
Transformation matrix A remains as defined by Eq. (24); the dual transformation matrix is an N × N matrix whose rows are the transposed dual expansion functions. When the expansion functions and their duals are identical, Eqs. (47) and (48) reduce to Eqs. (35) and (36), respectively. The corresponding 1-D forward and inverse transforms are
eq-49
eq-50
EXAMPLE 4 : A biorthonormal transform.
Consider the real biorthonormal transformation matrices
A and its dual à are biorthonormal. The transform of the 1-D column vector f = [30 11 210 6]^T is computed with Eq. (49); <t, t> = t^T t = 65,084, which differs from <f, f>, so the transformation does not preserve inner products, but it is reversible. The forward and inverse transforms are calculated using Eqs. (49) and (50), respectively.
Finally, we mention that most of the concepts in this section extend to the continuous case. The expansion coefficients and basis vectors of inner product space C([a, b]) are denoted αu and su(x) for u = 0, 1, 2, ...; for a given f(x) and basis su(x), the appropriate expansion coefficients can be computed using the integral inner product of C([a, b]) — i.e., Eq. (3) — and the general properties of all inner product spaces, that is, Eqs. (10) through (15). Thus, for example, if the su(x) for u = 0, 1, 2, ... are orthonormal basis vectors of C([a, b]),
eq-52
EXAMPLE 5 : The Fourier series and discrete Fourier transform
Consider the linear expansion of a continuous periodic function of period T on orthonormal basis vectors of the form
eq-53
In accordance with Eqs. (51) and (52)
eq-54
And
eq-55
yields the following discrete counterparts of Eqs. (53) through (55)
eq-56
eq-57
And
eq-58
The discrete complex basis vectors of Eq. (56) are an orthonormal basis of inner product space C^N. Equations (55) and (58) are now used to compute the Fourier series and the discrete Fourier transform of f(x) = sin(2πx), which has period T = 1. In accordance with Eq. (55), α1 = −j0.5 and, in the same way, α−1 = j0.5; since all the other coefficients are zero, the resulting Fourier series is
eq-59
5.3 Correlation
The correlation of two continuous functions f(x) and g(x), denoted f ⋆ g, is defined as follows. In fact, the term cross-correlation should be used when f(x) ≠ g(x), and auto-correlation when f(x) = g(x); Equation (61) is valid for both cases.
eq-61
Here the final step follows from Eq. (3) with a = −∞ and b = ∞. Sometimes referred to as the sliding inner product of f and g, the correlation measures the similarity of f(x) and g(x) as a function of their relative displacement Δx. As the name suggests, we slide one function past the other, multiply them, and compute the area; the more similar the functions, the larger the area. If Δx = 0,
f յ g(0) = < f(x) ,g(x) > eq-62
and Eq. (52), which defines the continuous orthonormal expansion coefficients in Eq. (51), can alternatively be written as
αu = <f(x), su(x)> = f ⋆ su(0)    eq-63
Thus, the expansion coefficients are single-point correlations with zero displacement; αu measures the similarity of f(x) and su(x). The discrete equivalents of Eqs. (61) through (63) are
eq-64
f ⋆ g(0) = <f, g>    eq-65
and
T(u) = <su, f> = su ⋆ f(0)    eq-66
where n and m are integers, fn denotes the nth element of f, and gn+m the (n + m)th element of g; Equation (66) follows from Eqs. (65) and (23). Comments similar to those made for Eq. (63) and continuous series expansions apply to the discrete transforms of Eq. (66): each element of an orthogonal transform is a zero-displacement correlation of a basis vector with the function being transformed.
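The statement that a transform coefficient is a correlation evaluated at zero displacement can be checked numerically, as in the sketch below; the signal and basis vector used are arbitrary examples.

```python
import numpy as np

N = 8
x = np.arange(N)
f = np.sin(2 * np.pi * x / N) + 0.5 * np.cos(4 * np.pi * x / N)   # example signal
s_u = np.sqrt(2.0 / N) * np.cos(4 * np.pi * x / N)                # example basis vector

# Sliding inner product (cross-correlation) of s_u and f over all displacements
corr = np.correlate(s_u, f, mode='full')
zero_shift = corr[len(f) - 1]          # value at displacement 0

# The zero-displacement correlation equals the inner product <s_u, f>
assert np.isclose(zero_shift, np.dot(s_u, f))
```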
5.4 Basis Functions in the Time -Frequency Plane
As Fig. 5.2 shows, the basis vectors of most commonly encountered transforms are orthogonal sets of sinusoids, square waves, ramps, or other small waves called wavelets. If h(t) is a basis vector and g(t) is the function being transformed, the transform coefficient is a
measure of the similarity between g and h, as noted in the previous section. Large values of g ⋆ h(0) indicate that g and h share similar time and frequency characteristics (shape, bandwidth, and so on). Therefore, if h is the ramp-shaped basis function of Fig. 5.2(d), the transform coefficient can be used to detect linear gradients of brightness in a line of an image; if h is a sinusoidal function like that of Fig. 5.2(a), g ⋆ h(0) can be used to detect sinusoidal intensity variations. Plots such as those in Fig. 5.2, together with such a similarity measure, can reveal much about a transform's time-frequency properties.
In introducing the time-frequency plane, independent variables t and f are used instead of the spatial variables x and u, and continuous functions g(t) and h(t) replace f(x) and su(x). Although the concepts are presented for continuous functions and variables, they apply to discrete functions and variables as well.
FIGURE 5.2 Basis vectors (for ) of some commonly encountered transforms: (a)
Fourier basis (real and imaginary parts), (b) discrete Cosine basis, (c) Walsh -Hadamard basis, (d) Slant basis, (e) Haar basis, (f) Daubechies basis, (g) Biorthogonal B -spline basis and its dual, and (h) the st andard basis, which is
included for reference only (i.e., not used as the basis of a transform).
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The position of h(t) on the time-frequency plane of Fig. 5.3(a) is an objective descriptor of h, and therefore of g when g ⋆ h(0) is large. Let p_h(t) = |h(t)|² / ||h(t)||² be a probability density function with mean given by Eq. (67), in which every value of t is weighted by p_h to compute a weighted mean with respect to coordinate t:
eq-67
FIGURE 5.3
(a) Basis function localization in the time -frequency plane. (b) A standard basis
function, its spectrum, and location in the time frequency plane. (c) A complex
sinusoidal basis function (with its real and imaginary parts shown as solid and
dashed lines, respectively), its spectrum, and location in the time -frequency plane.
(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez
and Richard E. Woods)
and variance
eq-68
Let p_H(f) = |H(f)|² / ||H(f)||² be a probability density function with mean
eq -69
and variance
where H(f) denotes the Fourier transform of h(t) and f denotes frequency. Then, as shown in Fig. 5.3(a), the energy of basis function h is concentrated at (µt, µf) on the time-frequency plane, and the majority of that energy falls in a rectangular region of area σt σf called a Heisenberg box or cell.
The energy of continuous function h(t) is ∫ from −∞ to ∞ of |h(t)|² dt.
σt² σf² ≥ 1/(16π²)    eq-71
If expressed in angular frequency, the constant on the right side of Eq. (71) changes; equality is achieved only with a Gaussian basis function, whose transform is also Gaussian.
Because the support of a function can be defined as the set of points where the function is nonzero, the Heisenberg uncertainty principle tells us that a function cannot be arbitrarily concentrated in both time and frequency: σt and σf cannot both be made arbitrarily small. Hence the basis function δ(t − t0) illustrated in Fig. 5.3(b) is perfectly localized in time (i.e., σt = 0, since the width of δ(t − t0) is zero), but its spectrum is nonzero on the whole f-axis: because F{δ(t − t0)} = exp(−j2πf t0) and |exp(−j2πf t0)| = 1 for all f, we have σf = ∞. The result is an infinitesimally narrow, infinitely tall Heisenberg cell on the time-frequency plane. Basis function exp(j2πf0 t) of Fig. 5.3(c), on the other hand, is nonzero along the entire time axis but is perfectly localized in frequency: because F{exp(j2πf0 t)} = δ(f − f0), the spectrum |δ(f − f0)| is zero at all frequencies other than f0, and the resulting Heisenberg cell is infinitely wide and infinitesimally small in height (σf = 0). As Figs. 5.3(b) and (c) illustrate, perfect localization in time is accompanied by a loss of localization in frequency, and vice versa.
Returning once more to Fig. 5.2, notice that the DFT basis of Fig. 5.2(a) and the standard basis of Fig. 5.2(h) are discrete examples (for N = 16) of the complex exponential functions and impulses of Figs. 5.3(c) and (b), respectively. The other bases in the top half of Fig. 5.2 share both the frequency (index u) and the width or support (16); for a given u, their locations in the time-frequency plane are similar. This is especially obvious when u is 8, where the Heisenberg cells of the basis functions are the same. For the other values of u, the µt and σt of the Heisenberg cells are similar, with slight variations between the cosine, ramp, and square-wave forms. Similarly, with the exception of the already described standard basis, the basis functions in the bottom half of Fig. 5.2 are also analogous for a given u. These basis functions are small waves, called wavelets, of the form
(Because of aliasing, the DFT basis functions do not appear to be ordered by frequency.)
ψ_{s,τ}(t) = 2^(s/2) ψ(2^s t − τ)    eq-72
in which s and τ are integers and the mother wavelet ψ(t) is a real, square-integrable function with a bandpass-like spectrum. Parameter τ determines the position of ψ_{s,τ}(t) on the t-axis, s determines its width (how broad or narrow it is along the t-axis), and 2^(s/2) controls its amplitude. The functions analogous to those in Fig. 5.2 that have lowpass spectra are known as scaling functions. Together with a properly constructed mother wavelet, Eq. (72) generates a basis whose Heisenberg cells are those shown on the right side of Fig. 5.4.
FIGURE 5.4
Time and frequency localization of 128 -point Daubechies basis functions.
(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez
and Richard E. Woods)
This can be seen from Eq. (73):
F{ψ(2^s t)} = (1/2^s) Ψ(f / 2^s)    eq-73
and the spectrum spreads for positive values of s — each frequency component is shifted higher in frequency by a factor of 2^s. Compressing in time expands the spectrum, as was the case for a rectangular pulse.
5.4(b) – (d) graphically. Note that i n Fig. 5.4 (c), the width of the base function is
half as in (d) whereas its width of its spectrum is twice that of (d). It is shifted by a
factor of two higher frequency. The s ame could be explained in Fig. 5.4 (b) in
comparison to (c) the base function and spectrum. Halving time support and
doubling frequency support, Heisenberg cells of different widths and heights are
produced with the same area. In addition, each cell row to the right of Fig. 5.4
represents a single scale s and frequency range. The cells in a row are moved over
time in relation to each other.
F{ψ(t − τ)} = e^(−j2πfτ) Ψ(f)    eq-74
Thus |F{ψ(t − τ)}| = |Ψ(f)|, and the spectra of the time-shifted wavelets are identical.
5.5 Basis Images
Since the inverse transformation kernel s(x, y, u, v) in Eq. (32) depends only on the indices x, y, u, v, and not on f(x, y) or T(u, v), Eq. (32) can also be written as a matrix sum of the kernel
eq-75
where F is an N × N matrix containing the elements of f(x, y) and
eq-76
for u, v = 0, ..., N − 1. F is thus defined as a linear combination of N² matrices of size N × N — that is, the S_{u,v} for u, v = 0, ..., N − 1. When the underlying s(x, y, u, v) are real-valued, separable, and symmetric,
S_{u,v} = s_u s_v^T    eq-77
where s_u and s_v were defined previously by Eq. (20). In the context of digital image processing, each S_{u,v} is a 2-D image called a basis image. As shown in Fig. 5.5(a), the basis images can be arranged in an N × N array to give a concise view of the 2-D basis functions they represent.
EXAMPLE: The basis images of the standard basis .
In figure 5.2 (h), ݁the basis of the NX1 column vector, which is 1 in nth and all
other elements are 0, would be a certain instance (N=16) of a standard basis
݁ǡ݁ଵǡǥǤǤǡ݁ேିଵ . Since it is real orthonormal, and the corresponding orthogonal transformation matrix is during the corresponding 2 -D transform. In
other words, the transfor mation of F on the standard base is F —confident that a
discrete function is implicitly represented in respect of the standard basis if it is
written in vector form.
The basis image of the 2-D size standard base Figure 5.5 (b) Just as the 1 -D-basis
vectores, that are non zero at just one moment (or x -value), are not zero at just one
point on a xyp lane, the basis images of Fig. 5.5 (b) are non zero. This is a result of
Eq (77) , hence ݏ௨ǡ௩ൌ݁௨݁௩்ൌܧ௨ǡ௩ where ܧ௨ǡ௩ is a matrix with zeros having a 1
in the uth row and vth column . in similar way , the DFT basis shown in fig 5.6 .
which follow from Eq. ( 77) , Eq. ( 22) , and the defined equation of the 1 -D DFT
expansion functions [i.e., Eq. ( 56) ]. Note the DFT basis image of maximum
frequency occurs when u and v are 4, just as the 1 -D DFT basis function of
maximum frequency occurred at in Fig. 5.1
FIGURE 5.5 (a) Basis image organization and (b) a standard basis of size For
clarity, a gray border has been added around each basis imag e. The origin of each
basis image (i.e., ݔൌݕൌͲ) is at its top left
(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez
and Richard E. Woods)
FIGURE 5.6 (a) Tranformation matrix ܣிof the discrete Fourier transform for N=8
where ߱ൌ݁ିଶగȀ଼ݎሺͳെ݆ሻȀξʹ or (b) and (c)
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The real and imaginary parts of the DFT basis images of size 8X8 For clarity, a
black border has been added around each basis image. For 1 -D transforms, matrix
ܣி is used in conjunction with Eqs. (43) and ( 44) ; for 2 -D transforms, it is used
with Eqs. (6-41) and (6 -42).
5.6 Fourier -Related Transforms
The Fourier transform of a real function is complex-valued. In this section we discuss three Fourier-related transforms that are real rather than complex: the discrete Hartley transform, the discrete cosine transform, and the discrete sine transform. All three avoid the computational overhead of complex arithmetic and can be implemented with fast, FFT-like algorithms.
The Discrete Hartley Transform
A discrete Hartley Transform (DHT) is a Fourier -related transformation of discrete,
periodic data analogous to the discrete Fourier Transform (DFT), with analogue
applications in signal processing and associated fields .Its primary difference from
DFT would be that it transforms real inputs to real outputs without involving
complex numbers intrinsically. Much like DFT is the discreet analogue of the
continuous Fourier transformation (FT), DHT is the discrete analogue of the
Hartley transform, which Ralph V. L. Hartley introduced in 1942.
The transformation m atrix of the discrete Hartley transform (DHT) is obtained by
replacing the inverse transformation kernel of the DFT with the kernel below. The function cas, short for the cosine-and-sine function, is defined as cas(θ) = cos(θ) + sin(θ), so that
s(x, u) = (1/√N) cas(2πux/N) = (1/√N) [cos(2πux/N) + sin(2πux/N)]    eq-78
whose separable 2-D counterpart is
s(x, y, u, v) = [(1/√N) cas(2πux/N)] [(1/√N) cas(2πvy/N)]    eq-79
We shall not consider the non-separable form s(x, y, u, v) = (1/N) cas(2π(ux + vy)/N).
The resulting DHT transformation matrix — denoted A_H in Fig. 5.7 — is real, orthogonal, and symmetric: A_H = A_H^T = A_H^(−1), and the same matrix A_H is used in the calculation of both the forward and the inverse transform. For 1-D transforms, A_H is used in combination with Eqs. (28) and (29); for 2-D transforms, Eqs. (35) and (36) are used. Since A_H is symmetric, the forward and inverse transforms are identical.
FIGURE 5.7
The transformation matrix and basis images of the discrete Hartley transform for
(a) Graphical representation of orthogonal transformation matrix ܣு (b) ܣு
rounded to two decimal places, and (c) 2 -D basis images.
(Reference from "Digital Image Pr ocessing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
For 1 -D transforms, matrix ܣு is used in conjunction with Eqs. (28) and ( 29) ; for
2-D tran sforms, it is used with Eqs. (35) and ( 36)
Note the resemblance of the harmonically related DHT basis functions in Fig. 5.7(a) to the real part of the DFT basis functions in Fig. 5.1. It is simple to demonstrate that
A_H = Re{A_F} − Im{A_F} = Re{(1 + j) A_F}    eq-80
where A_F is the unitary transformation matrix of the DFT. In Eqs. (81) and (82), subscripts H and F are used to represent the Hartley and Fourier kernels, respectively. The real part of the DFT kernel is
Re{s_F(x, u)} = Re{(1/√N) e^(−j2πux/N)} = (1/√N) cos(2πux/N)    eq-81
The discrete Hartley kernel can be rewritten using the trigonometric identity cas(θ) = √2 cos(θ − π/4) [see Eq. (78)] as
s_H(x, u) = √(2/N) cos(2πux/N − π/4)    eq-82
The basis functions of the discrete Fourier and Hartley transforms are thus scaled (by √2) and shifted (by π/4) versions of one another. The shift is evident when comparing Figs. 5.1 and 5.7(a).
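A small numerical sketch of the DHT transformation matrix built from the cas kernel of Eq. (78) follows; because the matrix is real, orthogonal, and symmetric, the same matrix is applied for the forward and inverse transforms.

```python
import numpy as np

def hartley_matrix(N):
    """DHT transformation matrix with entries cas(2*pi*u*x/N)/sqrt(N)."""
    x = np.arange(N)
    ang = 2 * np.pi * np.outer(x, x) / N
    return (np.cos(ang) + np.sin(ang)) / np.sqrt(N)   # cas = cos + sin

A_H = hartley_matrix(8)
assert np.allclose(A_H, A_H.T)               # symmetric
assert np.allclose(A_H @ A_H, np.eye(8))     # orthogonal, so A_H is its own inverse

f = np.random.rand(8)
t = A_H @ f          # forward 1-D DHT (Eq. 28 with A = A_H)
f_back = A_H @ t     # inverse transform uses the same matrix
assert np.allclose(f, f_back)
```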
The Discrete Cosine Transform
The most frequently encountered form of the discrete cosine transform (DCT) is obtained by replacing the inverse transformation kernel with the kernel below. There are eight standard DCT variants, each assuming a different symmetry condition; for example, the input may be assumed even about a sample or even about a point midway between two samples.
eq-83
Where
eq-84
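Assuming the common orthonormal (type-II) form of the DCT kernel, with α(0) = √(1/N) and α(u) = √(2/N) otherwise, the transformation matrix can be built and used with the matrix-based pair of Eqs. (35) and (36), as sketched below.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II transformation matrix (an assumed standard form)."""
    A = np.zeros((N, N))
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        A[u, :] = alpha * np.cos((2 * np.arange(N) + 1) * u * np.pi / (2 * N))
    return A

A_C = dct_matrix(8)
assert np.allclose(A_C @ A_C.T, np.eye(8))   # real and orthogonal

F = np.random.rand(8, 8)
T = A_C @ F @ A_C.T          # 2-D DCT of an 8 x 8 block, as in Eq. (35)
F_back = A_C.T @ T @ A_C     # inverse, as in Eq. (36)
assert np.allclose(F, F_back)
```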
5.7 Walsh -Hadamard Transforms
The Walsh-Hadamard transform is an orthogonal transformation technique that decomposes a signal into a set of basis functions. It is non-sinusoidal and orthogonal in nature; its basis functions are Walsh functions, which are rectangular or square waves with values of +1 or −1. Walsh-Hadamard transforms are also referred to as Hadamard transforms, Walsh transforms, or Walsh-Fourier transforms, depending on the context.
(eq-85)
The summation in the exponent of Eq. (85) is performed in modulo 2 arithmetic, and b_k(z) is the kth bit in the binary representation of z. For example, if n = 3 and z = 6 (110 in binary), then b0(z) = 0, b1(z) = 1 and b2(z) = 1. If N = 2, the resulting Hadamard-ordered transformation matrix is
WHTs with Hadamard or natural ordering are denoted A_W, the Hadamard- or natural-ordered transformation matrix. Although it is of size 2 × 2 in this case, it is more commonly of size N × N, where N is the dimension of the discrete function being transformed. The matrix on the right (without the scalar multiplier) is called a Hadamard matrix of order 2. Letting H_N denote the Hadamard matrix of order N, a simple recursive relationship for generating Hadamard-ordered transformation matrices is
H_2N = [ H_N  H_N ; H_N  −H_N ]
where
H_2 = [ 1  1 ; 1  −1 ]
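The recursive construction described above is straightforward to code; the sketch below builds the Hadamard matrix by repeated blocking and normalizes it by 1/√N to obtain an orthonormal WHT matrix (the choice of normalization is an assumption made for illustration).

```python
import numpy as np

def hadamard(N):
    """Unnormalized Hadamard matrix of order N (N must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.block([[H,  H],
                      [H, -H]])      # H_2N built from H_N
    return H

N = 8
A_W = hadamard(N) / np.sqrt(N)                 # orthonormal WHT matrix
assert np.allclose(A_W @ A_W.T, np.eye(N))

f = np.random.rand(N)
t = A_W @ f            # forward transform
f_back = A_W.T @ t     # inverse transform
assert np.allclose(f, f_back)
```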
5.8 Slant Transform
The N x N Slant transform matrices are defined by the recursion
where N = 2n, IM, denotes an M x M identity matrix, and
The parameters a n and b n are defined by the recursions
which solve to give
Using these formulas, the 4 x 4 Slant transformation matrix is obtained as
Properties of the Slant Transform
(i) The slant transform is real and orthogonal.
S = S*
S-1 = ST
(ii) The Slant transform is fast; it can be implemented in O(N log2 N) operations on an N × 1 vector.
(iii) Its energy compaction for images is rated in the very good to excellent range.
(iv) The basis vectors of the Slant transform matrix S are not sequency-ordered for n ≥ 3.
5.9 Haar Transform
In general, the Haar functions hk(x) are defined on the continuous interval x ∈ [0, 1] for k = 0, ..., N − 1, where N = 2^n. The integer k can be decomposed uniquely as
k = 2^p + q − 1
where 0 ≤ p ≤ n − 1; q = 0 or 1 for p = 0, and 1 ≤ q ≤ 2^p for p ≠ 0. For example, when N =
4, we have
Representing k by (p ,q), the Haar functions are defined as
In order to produce the Haar transform, it is necessary to allow x take discrete
values at m/N, where m =0, 1,..., N -1. The Haar transform is defined as follows
for N = 8.
Properties of the Haar Transform
1. The Haar transform is real and orthogonal. Therefore,
Hr=Hr*
Hr-1 = HrT
2. The Haar transform is extremely fast; it can be computed in O(N) operations on an N × 1 vector.
3. The basis vectors of the Haar matrix are sequency (frequency) ordered.
4. For images, the energy compaction of the Haar transform is weak.
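One standard recursive construction of the orthonormal Haar transformation matrix is sketched below (an assumption that N is a power of two); it reproduces the "real and orthogonal" property listed above.

```python
import numpy as np

def haar_matrix(N):
    """Orthonormal Haar matrix built by the standard recursion (N a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        n = H.shape[0]
        top = np.kron(H, [1.0, 1.0])               # scaling (lowpass) rows
        bottom = np.kron(np.eye(n), [1.0, -1.0])   # wavelet (highpass) rows
        H = np.vstack([top, bottom]) / np.sqrt(2.0)
    return H

Hr = haar_matrix(8)
assert np.allclose(Hr @ Hr.T, np.eye(8))   # real and orthogonal
```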
5.10 Wavelet Transforms
The wavelet transform is quite close to the Fourier transform (or much more similar
to the windowed Fourier transform), but it has an e ntirely different merit function
than the Fourier transform. The most significant distinction is as follows: On one
hand, the Fourier transform breaks down the input signal into sines and cosines,
i.e., functions that are localised in Fourier space; on the other hand, the wavelet
transform employs functions that are localised in both real and Fourier space. For
the most part, the wavelet transform may be stated mathematically using the
following equation:
where * denotes the complex conjugate and ψ is the analyzing (mother) wavelet function. This function can be chosen freely, as long as it obeys certain admissibility rules.
Because of this, the Wavelet transform is actually an infinite collection of different
transforms that can be computed depending on the merit function that is employed
in its computation. This is the primary reason why we can hear the word "wavelet
transform" used in a variety of various settings and applications, including computer science. There are a variety of methods for categorizing the many types
of wavelet transforms. Here we present only the division based on wavelet orthogonality: orthogonal wavelets are used for discrete wavelet transform development and non-orthogonal wavelets for continuous wavelet transform
development. The following are the characteristics of these two transforms:
After performing the discrete wavelet transform, a data vector of the same length
as the input is returned. In most cases, even in this vector, many data points are
close to zero. This corresponds to the fact that it decomposes the signal into a set of wavelets
(functions) that are orthogonal to the translations and scaling of the original image
data. As a result, we decompose such a signal into a wavelet coefficient spectrum
with the same or smaller number of wavelet coefficients as the number of signal
data points. Because there is no redundant information in a wavelet spectrum of
this type, it is particularly well suited for signal processing and compression
applications.
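To make the "same length, many near-zero coefficients" remark concrete, here is a minimal single-level Haar DWT sketch (our own illustration, not from the original text): the input is split into pairwise averages (approximation) and differences (detail), and for a smooth signal most detail coefficients are close to zero.

import numpy as np

def haar_dwt_1level(f):
    """One level of the orthonormal Haar DWT of a signal of even length."""
    f = np.asarray(f, dtype=float)
    approx = (f[0::2] + f[1::2]) / np.sqrt(2)   # low-pass (scaling) coefficients
    detail = (f[0::2] - f[1::2]) / np.sqrt(2)   # high-pass (wavelet) coefficients
    return np.concatenate([approx, detail])     # same length as the input

signal = np.linspace(0, 1, 16) ** 2             # a smooth test signal
coeffs = haar_dwt_1level(signal)
print(coeffs.size == signal.size)               # True: no redundancy
print(np.abs(coeffs[8:]))                       # the detail coefficients are all small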
Instead, the continuous wavelet transform yields an array that is one dimension
bigger than the input data. We can obtain an image of the time -frequency plane
from a single -dimensional data set. We can clearly see the progression of the
signal's frequencies over the course of the signal's duration and compare the
spectrum with the spectra of other signals. Because a non-orthogonal set of wavelets is used in this analysis, the data are highly correlated, so there is a significant amount of redundancy. This redundancy, however, makes the result easier for a human to interpret.
Discrete Wavelet Transform
The discrete wavelet transform (DWT) is an implementation of the wavelet transform that uses a discrete subset of wavelet scales and translations obeying certain rules. In other words, this transform decomposes the signal into a set of mutually orthogonal wavelets, which is the main difference from the continuous wavelet transform (CWT) and from its discrete-time implementation, sometimes referred to as the discrete-time continuous wavelet transform (DT-CWT).
The wavelet can be constructed from a scaling function that describes its scaling properties. The requirement that the scaling function be orthogonal to its discrete translations imposes certain mathematical conditions on it, such as the dilation equation

φ(x) = Σ_k a_k φ(Sx − k)

where S is a scaling factor (usually taken to be 2).
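As a small numerical illustration (our own sketch, not part of the original text), the Haar scaling function satisfies the dilation equation φ(x) = φ(2x) + φ(2x − 1), i.e. its two coefficients are both 1:

import numpy as np

def phi(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x < 1.0)).astype(float)

x = np.linspace(0.0, 1.0, 1000, endpoint=False)
lhs = phi(x)
rhs = phi(2 * x) + phi(2 * x - 1)     # dilation equation with coefficients a_0 = a_1 = 1
print(np.allclose(lhs, rhs))          # True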
Color Image Processing
The use of color in image processing is motivated by two key factors. First, color is a powerful descriptor that simplifies the identification and extraction of objects from an image. Second, humans can distinguish a very large number of color shades and intensities, but only a comparatively small number of shades of grey. Color image processing is divided into two main areas: full-color and pseudo-color processing. In the first category, the images in question are typically acquired with a full-color sensor, such as a color TV camera or a color scanner. In the second category, the problem is to assign a color to a particular monochrome intensity or range of intensities. Until relatively recently, most digital color image processing was performed at the pseudo-color level. Over the last decade, however, color sensors and color image processing hardware have become available at affordable prices. As a result, full-color image processing techniques are now used in a wide variety of applications, including publishing, visualization and the Internet.
After completing this chapter, learners will be able to:
• Understand color fundamentals and the color spectrum.
• Know the principal color models used in digital image processing.
• Understand how to implement fundamental pseudo-color image processing techniques, including intensity slicing and intensity-to-color transformations.
• Know how to decide whether a gray-scale approach can be extended to color images.
• Know how to work with full-color images, including color transformations, color complements and tone/color corrections.
• Understand the effect of noise in the processing of color images.
• Know how to perform spatial filtering on color images.
• Understand the benefits of using color in image segmentation.
5.11 Colour Fundamentals
We begin by introducing the visible light spectrum. White light is composed of a series of component colors. When white light passes through a triangular prism, these colors are separated, as shown in Figure 5.8 – red, orange, yellow, green, blue and violet. The separation of visible light into its component colors is called dispersion.
Figure 5.8 Color spectrum seen by passing white light through a prism.
James Clerk Maxwell showed that light is a form of electromagnetic radiation, a range that also includes radio waves, gamma rays and everything in between. Figure 5.9 shows the electromagnetic spectrum, which extends beyond the visible region to radio waves on one side and gamma rays on the other. Visible light occupies only a very small part of the electromagnetic spectrum. Sunlight extends beyond the visible range into the infrared (IR) beyond red and the ultraviolet (UV) beyond violet, with its maximum intensity in the yellow region.
Figure 5.9 The electromagnetic spectrum, which encompasses the visible region of light, extends from gamma rays with wavelengths of a hundredth of a nanometer to radio waves with wavelengths of a meter or more.
Regarding light as an electromagnetic wave, a color can be identified by its wavelength: the shortest visible wavelengths are perceived as violet and the longest as red. Visible light is the range of electromagnetic wavelengths to which the eye responds; the human eye cannot detect radiation of longer or shorter wavelengths.
Figure 5.9(a) A wave representation of three different light hues – red, yellow-green and violet – each with a different wavelength, the distance between wave crests.
Figure 5.9(a) shows three representative waves of visible light. The wavelength is the distance from one crest to the next and is denoted by the Greek letter lambda, λ. Violet light has a wavelength of about 410 nanometers, while red light has a wavelength of about 680 nanometers.
The human visual system can distinguish hundreds of thousands of different colors and intensities, but only on the order of a hundred shades of grey. A color image therefore carries a great deal of additional information, which can be used to simplify image analysis, e.g. object identification and extraction based on color.
Three separate quantities are used to define a specific color. The first is the dominant wavelength, which determines the hue. The visible colors of the electromagnetic spectrum range from approximately 410 nm (violet) to 680 nm (red), as shown in Figure 5.9(a).
The saturation depends on the excitation purity, i.e. on the amount of white light mixed with the hue. A pure hue is fully saturated, containing no mixed white light. Hue and saturation together determine the chromaticity of a color. Finally, the actual amount of light determines the intensity, with more light corresponding to a more intense color.
Achromatic light has no color; its only attribute is quantity, or intensity. Grey level is a measure of intensity. Intensity is a physical quantity determined by the energy of the light, whereas brightness, or luminance, is a perceptual and hence psychological quantity. Given blue and green light of equal intensity, the blue is perceived as darker than the green. Note also that our perception of intensity is nonlinear: normalized intensity changes from 0.1 to 0.11 and from 0.5 to 0.55 are perceived as equal changes in brightness.
The color we see depends mainly on the reflecting properties of the object: we see the reflected wavelengths, while the others are absorbed. The response of the human visual system must also be taken into account. For instance, an object that reflects both red and green light appears green when illuminated with green but no red light, appears red in the absence of green light, and appears yellow (= red + green) in pure white light.
Tristimulus theory of color perception
There are three kinds of cones in the human retina. Figure 5.10 shows the response of each type of cone as a function of the wavelength of the incident light. The peaks of the curves are at 440 nm (blue), 545 nm (green) and 580 nm (red). Note that the last two actually peak in the yellow range of the spectrum.
Figure 5.10 : Spectral response curves for each cone type. The peaks for each curve
are at 440nm (blue), 545nm (green) and 580nm (red).
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
CIE primaries
The tristimulus theory of color perception suggests that any color can be produced by a combination of three primaries – red, green and blue. In fact, almost all visible colors can be matched in this way, but some cannot. However, if one of the primaries is added to the color being matched, the match can then be made with a combination of the other two; this is equivalent to allowing a negative weighting of that primary. In 1931 the Commission Internationale de l'Éclairage (CIE) specified three standard primaries, named X, Y and Z, whose combinations can match all visible colors. The primary Y is chosen so that its color-matching function exactly matches the luminous efficiency function of the human eye.
Figure 5.11: The CIE Chromaticity Diagram showing all visible colors. x and y are the normalized amounts of the X and Y primaries present, and hence z = 1 − x − y gives the amount of the Z primary required.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
All visible colors appear in the CIE Chromaticity Diagram (see Figure 5.11). The x and y axes give the normalized amounts of the X and Y primaries for a given color, and therefore z = 1 − x − y gives the required amount of the Z primary. Chromaticity is independent of the luminous energy; it depends only on dominant wavelength and saturation. Colors of the same chromaticity but differing luminance all map to the same point in the diagram.
The pure spectrum colors lie on the curved boundary, and standard white light lies close to the equal-energy point x = y = z = 1/3. Complementary colors – pairs of colors that add to white – lie at opposite ends of a line through this point. More generally, any color on a line in the chromaticity diagram can be obtained by mixing the colors at the endpoints of that line, as shown in Figure 5.12, and all colors inside a triangle can be formed by mixing the colors at its vertices.
Figure 5.12 : Mixing colors on the chromaticity diagram.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
All colors on the line IJ can be obtained by mixing colors I and J, and all colors in
the triangle IJK can be obtained by mixing colors I, J and K.
5.12 Color Models
A color model provides a standard way to specify a particular color by defining a 3D coordinate system and a subspace within it that contains all the colors constructible in that model. Every color the model can describe corresponds to a single point in this subspace. Each color model is oriented either toward hardware (RGB, CMY, and YIQ) or toward image processing applications (HSI).
The RGB Model
In the RGB model, an image consists of three independent image planes, one for each of the primary colors: red, green and blue. (Typical wavelengths for the three primaries are shown in Figure 5.10.) A particular color is specified by the amount of each primary component present. Figure 5.13 shows the geometry of the RGB color model defined on a Cartesian coordinate system. The greyscale spectrum, i.e. the colors made of equal amounts of each primary, lies on the line joining the black and white vertices.
Figure 5.13 : The RGB color cube. The greyscale spectrum lies on the line joining
the black and white vertices.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
RGB is an additive model: the lights present add together to form new colors, which makes it suitable, for example, for mixing colored light. The additive mixing of the red, green and blue primaries forms the three secondary colors yellow (red + green), cyan (blue + green) and magenta (red + blue), and white (red + green + blue), as shown on the left of Figure 5.15. The RGB model is used for most color displays and video cameras.
The number of bits used to represent each pixel in RGB space is called the pixel depth. Consider an RGB image in which each of the red, green, and blue images is an 8-bit image. Under these conditions each RGB color pixel [that is, a triplet of values (R, G, B)] has a depth of 24 bits (3 image planes times the number of bits per plane). The term full-color image is often used to denote a 24-bit RGB color image. The total number of possible colors in a 24-bit RGB image is (2^8)^3 = 16,777,216. Figure 5.14 shows the 24-bit RGB color cube corresponding to the diagram in Figure 5.13. Note that for digital images the range of values in the cube is scaled to the number of bits in the images: if the component images are 8-bit, the limits of the cube along each axis become [0, 255], and white is then at the point [255, 255, 255] in the cube.
Figure 5.14 A 24 -bit RGB color cube.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The CMY and CMYK Color Models
The CMY (cyan-magenta-yellow) model is a subtractive model, appropriate for describing the absorption of color, for example by colored pigments in paints. Whereas the RGB model asks what is added to black to obtain a particular color, the CMY model asks what is subtracted from white. Here the primaries are cyan, magenta and yellow, and the secondary colors are red, green and blue.
When a surface coated with cyan pigment is illuminated with white light, no red light is reflected from it: the cyan pigment subtracts red from the reflected white light. The relationship between the RGB and CMY models is given by:
[ C ]   [ 1 ]   [ R ]
[ M ] = [ 1 ] − [ G ]        (eq-1)
[ Y ]   [ 1 ]   [ B ]
Most devices that deposit colored pigments on paper, such as color printers and copiers, require CMY data as input or perform an internal RGB-to-CMY conversion.
The conversion is carried out with the simple operation in eq-1, where all color values are assumed to be normalized to the range [0, 1]. Equation (1) shows that light reflected from a pure cyan surface contains no red (that is, C = 1 − R in the equation); likewise, pure magenta does not reflect green, and pure yellow does not reflect blue. Equation (1) also shows that RGB values can be obtained easily from a set of CMY values by subtracting each CMY value from 1. As noted earlier, this color model is used in image processing mainly for generating hardcopy output, so the inverse CMY-to-RGB operation is usually of little practical interest. The CMY model is used by printing devices and filters.
Figure 5.15 : The figure on the left shows the additive mixing o f red, green and blue
primaries to form the three secondary colors yellow (red + green), cyan (blue +
green) and magenta (red + blue), and white ((red + green + blue). The figure on the
right shows the three subtractive primaries, and their pairwise combinations to form
red, green and blue, and finally black by subtracting all three primaries from white.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
In principle, equal amounts of the pigment primaries cyan, magenta, and yellow should produce black. In practice, combining these colors for printing produces a muddy-looking black. To produce true black (which is the predominant color in printing), a fourth color, black, denoted by K, is therefore added, giving rise to the CMYK color model. When publishers speak of 'four-color printing,' they are referring to the three CMY colors plus black.
To convert from CMY to CMYK, we first compute

K = min(C, M, Y)                         (eq-2)

If K = 1, we have pure black with no color contributions, from which it follows that

C = 0                                    (eq-3)
M = 0                                    (eq-4)
Y = 0                                    (eq-5)

Otherwise,

C = (C − K)/(1 − K)                      (eq-6)
M = (M − K)/(1 − K)                      (eq-7)
Y = (Y − K)/(1 − K)                      (eq-8)

where all values are assumed to be in the range [0, 1]. The C, M, and Y on the right side of eqs-(2)–(8) are in the CMY color system; the C, M, and Y on the left side of eqs-(6)–(8) are in the CMYK system. The conversions from CMYK back to CMY are:

C = C(1 − K) + K                         (eq-10)
M = M(1 − K) + K                         (eq-11)
Y = Y(1 − K) + K                         (eq-12)

The C, M, Y, and K on the right side of eqs-(10)–(12) are in the CMYK color system; the C, M, and Y on the left side of these equations are in the CMY system.
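The conversions above translate directly into code. The following Python sketch (our own illustration, with function names of our choosing) implements them for normalized values in [0, 1]:

def cmy_to_cmyk(c, m, y):
    """Convert normalized CMY values to CMYK (eqs-2 to 8)."""
    k = min(c, m, y)
    if k == 1.0:                      # pure black: no color contributions
        return 0.0, 0.0, 0.0, 1.0
    return (c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k

def cmyk_to_cmy(c, m, y, k):
    """Convert CMYK back to CMY (eqs-10 to 12)."""
    return c * (1 - k) + k, m * (1 - k) + k, y * (1 - k) + k

# A pure red pixel in RGB is (1, 0, 0), i.e. CMY = (0, 1, 1)
print(cmy_to_cmyk(0.0, 1.0, 1.0))     # (0.0, 1.0, 1.0, 0.0): no black component
print(cmy_to_cmyk(0.5, 0.5, 0.5))     # (0.0, 0.0, 0.0, 0.5): a pure grey uses only K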
The HSI Color Model
The HSI model addresses a shortcoming of the RGB model: the hue-saturation-intensity representation corresponds closely to the way human vision describes and interprets color. Conversion between RGB and HSI is, however, more involved than for the other color models. I denotes the intensity of the light, H (hue) indicates the dominant color, and S (saturation) measures the purity of the color, i.e. how little white light is mixed with it. A highly saturated color contains little white light.
A color may thus be specified by the three quantities hue, saturation and intensity. This is the HSI model, and the complete color space it defines is shown in Figure 5.16.
Figure 5.16: The HSI model, with the HSI solid on the left and, on the right, the HSI triangle formed by a horizontal slice through the solid at a particular intensity. Hue is measured by the angle from the red direction and saturation by the distance from the vertical axis. The colors on the surface of the solid are fully saturated, i.e. pure colors, and the grey spectrum lies on the axis of the solid, where hue is undefined.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Conversion between the RGB model and the HSI model is quite complicated. The intensity is given by

I = (R + G + B) / 3

where the quantities R, G and B are the amounts of the red, green and blue components, normalized to the range [0, 1]. The intensity is therefore simply the average of the red, green and blue components. The saturation is given by

S = 1 − min(R, G, B) / I = 1 − 3 · min(R, G, B) / (R + G + B)

where the min(R, G, B) term really just indicates the amount of white present. If any of R, G or B is zero, there is no white and we have a pure color.
These quantities are easy to compute and interpret. The HSI (hue, saturation, intensity) color model decouples the intensity component from the color-carrying information (hue and saturation) in a color image. It is therefore an ideal tool for developing image processing algorithms based on color descriptions that are natural and intuitive to humans. In the HSI representation the primary colors are separated by 120°; the secondary colors are 60° from the primaries, so the angle between secondaries is also 120°. Figure 5.17 shows the hexagonal shape and an arbitrary color point (shown as a dot).
The hue of the point is determined by an angle from a reference direction: an angle of 0° from the red axis corresponds to hue 0, and the hue increases counterclockwise from there. The saturation is the length of the vector from the origin to the point, i.e. the distance from the vertical intensity axis; the origin is defined by the intersection of the color plane with the vertical intensity axis. The important components of the HSI color space are therefore the vertical intensity axis, the length of the vector to a color point, and the angle this vector makes with the red axis.
Figure 5.17 Hue and saturation in the HSI color model. The dot is any color point.
The angle from the red axis gives the hue. The length of the vector is the saturation.
The intensity of all colors in any of these planes is given by the position of the plane on the vertical intensity axis.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figure 5.18 The HSI color model based on (a) triangular, and (b) circular color
planes. The triangles and circles are perpendicular to the vertical intensity axis.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Converting colors from RGB to HSI
Consider RGB values normalized to the range [0, 1]. Given an RGB value, H is obtained as follows:

H = θ            if B ≤ G
H = 360° − θ     if B > G

H is then normalized to the range [0, 1] by dividing the quantity computed above by 360. The angle θ is given by

θ = cos⁻¹ { ½[(R − G) + (R − B)] / [(R − G)² + (R − B)(G − B)]^(1/2) }
θ is measured with respect to the red axis of the HSI space. Saturation is given by

S = 1 − [3 / (R + G + B)] · min(R, G, B)

and the intensity component is given by

I = (R + G + B) / 3
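A minimal Python sketch of this RGB-to-HSI conversion (our own illustration; the small eps guard against division by zero is an implementation detail, not part of the formulas):

import numpy as np

def rgb_to_hsi(R, G, B):
    """Convert normalized RGB values in [0, 1] to HSI (H normalized to [0, 1])."""
    eps = 1e-12                                        # avoid division by zero
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    H = theta if B <= G else 360.0 - theta
    H /= 360.0                                         # normalize hue to [0, 1]
    I = (R + G + B) / 3.0
    S = 0.0 if I == 0 else 1.0 - min(R, G, B) / I
    return H, S, I

print(rgb_to_hsi(1.0, 0.0, 0.0))   # pure red: H = 0, S = 1, I = 1/3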
Converting colors from HSI to RGB
Consider HSI values in the interval [0, 1]. H should first be multiplied by 360° (or 2π) to recover the hue angle; the remaining computation depends on which sector the angle falls in.

RG sector (0° ≤ H < 120°):
B = I(1 − S)
R = I[1 + S cos H / cos(60° − H)]
G = 3I − (R + B)

GB sector (120° ≤ H < 240°):
H′ = H − 120°
R = I(1 − S)
G = I[1 + S cos H′ / cos(60° − H′)]
B = 3I − (R + G)

BR sector (240° ≤ H ≤ 360°):
H′ = H − 240°
G = I(1 − S)
B = I[1 + S cos H′ / cos(60° − H′)]
R = 3I − (G + B)
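A companion sketch of the sector-based HSI-to-RGB conversion (again our own illustration, not part of the original text):

import numpy as np

def hsi_to_rgb(H, S, I):
    """Convert HSI values (all in [0, 1]) back to normalized RGB."""
    H = H * 360.0                                      # recover the hue angle in degrees
    if H < 120.0:                                      # RG sector
        B = I * (1 - S)
        R = I * (1 + S * np.cos(np.radians(H)) / np.cos(np.radians(60.0 - H)))
        G = 3 * I - (R + B)
    elif H < 240.0:                                    # GB sector
        H -= 120.0
        R = I * (1 - S)
        G = I * (1 + S * np.cos(np.radians(H)) / np.cos(np.radians(60.0 - H)))
        B = 3 * I - (R + G)
    else:                                              # BR sector
        H -= 240.0
        G = I * (1 - S)
        B = I * (1 + S * np.cos(np.radians(H)) / np.cos(np.radians(60.0 - H)))
        R = 3 * I - (G + B)
    return R, G, B

print(hsi_to_rgb(0.0, 1.0, 1.0 / 3.0))   # recovers pure red: (1, 0, 0)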
A Device Independent Color Model
The CMYK and RGB color models are device-dependent: the way a color is reproduced depends on the particular device, and the models carry no information about how the final viewer will perceive the color. A color with identical RGB parameters is seen differently depending on the brightness, contrast and sharpness of the monitor, the ambient light, and the angle at which we view the screen. In the CMYK model, a person's perception of a color depends on an even larger set of conditions – the inks, the type of paper (a glossy paper gives more vivid, rich colors than a matte one), the humidity at which the paper dried, the characteristics of the printing press, and so on.
To convey more reliable color information, color profiles are attached to device-dependent color models. Each profile describes how a particular device reproduces color and adjusts the final color toward the intended one by adding or removing components. For instance, a color profile used for printing on glossy film might remove 10% cyan and add 5% yellow to the original color to compensate for the particular characteristics of the printing press, the film itself, and other conditions. Attaching profiles, however, does not solve all color-transfer problems.
Device-independent color models, by contrast, carry no information about any particular device. They describe mathematically the colors perceived by a person with normal color vision. A device-independent color space is one in which a given set of coordinates produces the same color wherever it is applied. The CIE L*a*b* color space (known as CIELAB and based on the human visual system) is an example of a device-independent color space.
The color components are given by the following equations:

L* = 116 · h(Y / Y_W) − 16

a* = 500 [ h(X / X_W) − h(Y / Y_W) ]

and

b* = 200 [ h(Y / Y_W) − h(Z / Z_W) ]

where

h(q) = q^(1/3)                 if q > 0.008856
h(q) = 7.787 q + 16/116        if q ≤ 0.008856
The reference white tristimulus values X_W, Y_W and Z_W are those of the usual white of a perfectly reflecting diffuser under CIE standard D65 illumination (defined by x = 0.3127 and y = 0.3290 in the CIE chromaticity diagram of Figure 5.11). The L*a*b* color space is colorimetric (colors that match are encoded identically) and perceptually uniform (color differences between different hues are perceived uniformly – see MacAdam's classic paper [1942]). Although L*a*b* values cannot be displayed directly (they must first be converted to another color space), the L*a*b* gamut covers the entire visible spectrum and can represent accurately the colors of any display, print, or input device. Like the HSI system, L*a*b* is an excellent decoupler of intensity (represented by lightness L*) from color (represented by a* for red minus green and b* for green minus blue), which makes it useful both for image manipulation (tone and contrast editing) and for image compression applications. In calibrated imaging systems, tonal and color imbalances are corrected interactively and independently, in two sequential operations: the tonal range of the image is corrected first, and then irregularities such as over- and under-saturated colors are addressed. The tonal range of an image, also called its key type, refers to its overall distribution of color intensities. Most of the information in high-key images is concentrated at high (light) intensities; the colors of low-key images lie mainly at low intensities; middle-key images lie in between. As in the monochrome case, it is often desirable to distribute the intensities of a color image evenly between the highlights and the shadows. The following examples show a variety of color transformations for correcting tonal range and color balance.
5.13 Pseudocolor Image Processing
Pseudocolor (also called false-color) processing consists of assigning colors to grey values on the basis of a specified criterion. The term pseudocolor or false color is used to distinguish this color-assignment process from processes associated with true-color images; its principal use is for human visualization and interpretation of gray-scale events in an image or sequence of images. One of the main motivations for using color is that human beings can distinguish thousands of color shades and intensities, compared with only a couple of dozen or so shades of grey.
Intensity Slicing and Color Coding
Intensity slicing, also known as color coding or density slicing, is one of the simplest examples of pseudocolor image processing. If a greyscale image is interpreted as a 3D function, with intensity as the third dimension above the plane of pixel coordinates, then placing a plane parallel to the coordinate plane 'slices' the image function into two parts, and a different color can be assigned to each side. Figure 5.19 shows an example of using a plane at f(x, y) = l_i to slice the image function into two levels.
If each side of the plane shown in Figure 5.19 is assigned a different color, any pixel whose grey level lies above the plane is coded with one color and any pixel below it with the other. Levels lying on the plane itself may be assigned either of the two colors arbitrarily. The result is a two-color image whose relative appearance can be controlled by moving the slicing plane up and down.
The technique can be formulated more formally as follows. Let [0, L − 1] represent the grey scale, let level l_0 represent black [f(x, y) = 0], and let level l_{L−1} represent white [f(x, y) = L − 1]. Suppose that P planes perpendicular to the intensity axis are defined at levels l_1, l_2, ..., l_P. Then, assuming 0 < P < L − 1, the P planes partition the grey scale into P + 1 intervals, I_1, I_2, ..., I_{P+1}. Intensity-to-color assignments are made according to the relation

f(x, y) = c_k    if f(x, y) ∈ I_k

where c_k is the color associated with the kth intensity interval I_k, defined by the partitioning planes at l = k − 1 and l = k.
Figure 5.19 Geometric interpretation of the intensity slicing technique.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The concept of planes is particularly useful for interpreting the intensity-slicing technique in geometric terms. Figure 5.20 shows an alternative representation that defines the same mapping as Figure 5.19: any input grey level is assigned one of two colors, depending on whether the mapping function lies above or below the value l_i. When more levels are used, the mapping function takes on a staircase form.
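The staircase mapping is easy to implement. The following sketch (our own illustration; the levels and color table are arbitrary choices) slices an 8-bit grayscale image into P + 1 intervals and assigns each interval a color:

import numpy as np

def intensity_slice(gray, levels, colors):
    """Map an 8-bit grayscale image to RGB by intensity slicing.

    levels : increasing partition levels l_1 < ... < l_P in [0, 255]
    colors : list of P + 1 RGB triplets, one per interval I_1 ... I_{P+1}
    """
    colors = np.asarray(colors, dtype=np.uint8)
    # np.digitize returns, for each pixel, the index k of the interval I_k it falls in
    k = np.digitize(gray, bins=levels)
    return colors[k]

gray = np.arange(256, dtype=np.uint8).reshape(16, 16)      # a simple test ramp
levels = [64, 128, 192]                                    # P = 3 slicing planes
colors = [(0, 0, 255), (0, 255, 0), (255, 255, 0), (255, 0, 0)]
rgb = intensity_slice(gray, levels, colors)
print(rgb.shape)                                           # (16, 16, 3)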
Figure 5.20 An alternative representation of the intensity -slicing technique.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figure 5.21 shows a simple yet practical application of intensity slicing. Figure 5.21(a) is a grayscale image of the Picker Thyroid Phantom, a radiation test pattern, and Figure 5.21(b) is the result of intensity slicing this image into eight colors. Regions that appear to be of constant intensity in the grayscale image are in fact quite variable, as the different colors in the sliced image show. The left lobe, for instance, is a dull grey in the grayscale image, making it difficult to pick out changes in intensity; by contrast, eight regions of constant intensity, one for each color used, are clearly visible in the color image. By varying the number of colors and the intensity intervals, the characteristics of the intensity variations in a grayscale image can be determined quickly. This is especially useful in situations such as the one shown here, where the object of interest has a uniform texture with intensity variations that are difficult to analyze visually.
Figure 5.21 (a) Grayscale image of the Picker Thyroid Phantom. (b) Result of
intensity slicing using eight colors.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
In the preceding simple example, the grayscale was divided into intervals and a different color was assigned to each, without regard to the meaning of the grey levels; the interest was simply in viewing the different grey levels that make up the image. Intensity slicing plays a much more meaningful role when the grayscale is divided according to physical characteristics of the image. Figure 5.22(a), for example, shows an X-ray image of a weld (the wide dark horizontal region) containing several cracks and porosities (the bright streaks running horizontally through the middle of the image). When a porosity or crack is present in the weld, the full strength of the X-rays passes through the object and saturates the image sensor on the other side. Thus, in an 8-bit image from such a system, intensity values of 255 automatically imply a problem with the weld. If welds are inspected by visual analysis (still a widely used method), a simple color coding that assigns one color to level 255 and another to all other intensity levels can make the inspector's work significantly easier. The result is shown in Figure 5.22(b). It needs no explanation that human error rates would be lower if images were presented in the form of Figure 5.22(b). In other words, if one is looking for a specific intensity or range of intensity values, intensity slicing is a simple but effective visualization aid, especially when many images must be inspected routinely.
Figure 5.22 (a) X -ray image of a weld. (b) Result of color coding.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Intensity to Color Tran sformations
The idea behind this approach is to perform three independent transformations on the grey level of any input pixel. The three results are then fed separately into the red, green and blue channels of a color television monitor. This technique produces a composite image whose color content is modulated by the nature of the transformation functions. Note that these are transformations of the grey-level values of an image and are not functions of position.
Intensity slicing is a special case in which piecewise linear functions of the grey level are used to produce the colors. The present method, however, can be based on smooth, nonlinear functions, which, as might be expected, gives considerably more versatility.
Figure 5.23 Functional block diagram for pseudocolor image processing. The images f_R, f_G and f_B are fed into the corresponding red, green, and blue inputs of an RGB color monitor.
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The type of processing just described is very powerful for visualizing events of interest in complex images, especially when the events lie beyond our normal sensing capabilities. Figure 5.24 is an excellent example: these are pseudocolor images of Jupiter's moon Io, formed by combining several images taken by sensors aboard the Galileo spacecraft, some of which respond in spectral regions that the eye cannot see. However, by understanding the physical and chemical processes likely to influence the sensor responses, it is possible to combine the sensed images into a meaningful pseudocolor map.
Figure 5.24: Pseudocolor rendition of Jupiter's moon Io.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
One way of combining sensed image data is to let differences in the composition of surface chemicals, or differences in the way the surface reflects sunlight, show up as differences in color. For example, in the pseudocolor image of Figure 5.25, bright red depicts material newly ejected from an active volcano on Io. This image conveys these features far more readily than would be possible by analyzing the component images individually.
Figure 5.25 : A close -up (Courtesy of NASA)
(Reference from "Digital Image Processing fo urth edition by
Rafael C. Gonzalez and Richard E. Woods
5.14 Basics of Full -Color Image Processing
Techniques for full-color image processing fall into two categories. In the first category, each component image is processed individually and a composite color image is then formed from the processed components. In the second category, one works directly with color pixels. Because full-color images have at least three components, color pixels are actually vectors. In the RGB system, for example, each color point can be interpreted as a vector extending from the origin of the RGB coordinate system to that point.
Let c represent an arbitrary vector in RGB color space:

c = [c_R, c_G, c_B]^T = [R, G, B]^T
This equation indicates that the components of c are simply the RGB components of a color image at a single point. When the color components are functions of the coordinates (x, y), we use the notation

c(x, y) = [c_R(x, y), c_G(x, y), c_B(x, y)]^T = [R(x, y), G(x, y), B(x, y)]^T        (2)

For an image of size M × N there are MN such vectors, c(x, y), for x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1.
Keep in mind that Eq. (2) represents a vector whose components are spatial functions of x and y. For per-color-component processing and vector-based processing to be equivalent, two conditions must be satisfied: first, the process must be applicable to both vectors and scalars; second, the operation on each component of a vector must be independent of the other components.
Figure 5.26 Spatial masks for gray -scale and RGB color images.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figure 5.26 shows spatial neighborhoods for grayscale and full-color images. Suppose the process is neighborhood averaging. In Figure 5.26(a), the average is obtained by summing the grey levels of all the pixels in the neighborhood and dividing by the total number of pixels in the neighborhood. In Figure 5.26(b), the average is obtained by summing all the vectors in the neighborhood and dividing each component of the sum by the total number of vectors. Each component of the average vector is therefore the average of the pixels of the corresponding component image, which is the same result as would be obtained by averaging each color component separately.
5.15 Colour Transformations
Color transformations process the components of a color image within the context of a single color model.

Formulation
As with the gray-level transformation techniques, we model color transformations with the expression

g(x, y) = T[f(x, y)]

where f(x, y) is a color input image, g(x, y) is the transformed or processed color output image, and T is an operator on f over a spatial neighborhood of (x, y). Here the pixel values are triplets or quartets from the color space chosen to represent the images (i.e. groups of three or four values). Analogous to the basic grayscale transformations, in this section we restrict attention to color transformations of the form
s_i = T_i(r_1, r_2, ..., r_n),    i = 1, 2, ..., n
where, for notational simplicity, r_i and s_i denote the color components of f(x, y) and g(x, y) at any point (x, y), n is the number of color components, and {T_1, T_2, ..., T_n} is a set of transformation or color-mapping functions that operate on r_i to produce s_i. Note that the n transformations T_i combine to implement the single transformation function T. The color space chosen to describe the pixels of f and g determines the value of n. If the RGB color space is selected, for example, n = 3 and r_1, r_2 and r_3 denote the red, green and blue components of the input image, respectively; if the CMYK or HSI color space is selected, n = 4 or n = 3, respectively.
Figure 5.27 shows a high-resolution color image of a bowl of strawberries and a cup of coffee, digitized from a large-format (4 × 5 inch) color original. The second row of the figure shows the components of the initial CMYK scan; in these images black represents 0 and white represents 1 in each CMYK component. The strawberries, for instance, are composed of large amounts of magenta and yellow, so the images corresponding to those two CMYK components are the brightest. Black is used sparingly and is generally confined to the coffee and to the shadows within the bowl of strawberries. The last row of Figure 5.27 shows the HSI components, computed using equations (2) through (4). As expected, the intensity component is a monochrome rendering of the original full-color image. The strawberries are relatively pure in color: they exhibit the highest saturation, i.e. the least dilution of their hues by white light, of the hues in the image. Finally, the hue component is more difficult to describe.
The difficulty is compounded by the fact that the hue has a discontinuity where 0° and 360° meet, and that hue is undefined for saturation of zero (i.e. for white, black and pure greys). The discontinuity is most apparent around the strawberries, which are depicted in grey-level values near both black (0) and white (1).
Figure 5.27 : A full -color image and its various colo r-space components (Original
image courtesy Med -data Interactive.)
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Any of the color-space components in Figure 5.27 can be used in conjunction with Equation (2). In theory, any transformation can be performed in any color model; in practice, however, some operations are better suited to specific models, and the cost of converting between representations must be weighed when deciding which color space to use for a given transformation. Suppose, for example, that we wish to modify the intensity of the image in Figure 5.27 using
g(x, y) = k f(x, y)

where 0 < k < 1. In the HSI color space this can be done with the simple transformation

s_3 = k r_3

where s_1 = r_1 and s_2 = r_2; only the intensity component r_3 is modified. In the RGB color space, all three components must be transformed:

s_i = k r_i,    i = 1, 2, 3

Similar linear transformations are required in the CMY space:

s_i = k r_i + (1 − k),    i = 1, 2, 3

In the same way, the transformations needed to modify the intensity of a CMYK image are

s_i = r_i,                    i = 1, 2, 3
s_i = k r_i + (1 − k),        i = 4

This tells us that to change the intensity of a CMYK image we modify only the fourth (K) component.
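A small self-contained sketch (our own illustration, with arbitrary example values) compares the RGB-space and CMY-space versions of this intensity transformation and shows that they produce the same pixel:

k = 0.7                                     # intensity scaling factor (0 < k < 1)
r, g, b = 0.8, 0.4, 0.2                     # a normalized RGB pixel

# RGB space: all three components are scaled (s_i = k * r_i)
rgb_out = (k * r, k * g, k * b)

# CMY space: s_i = k * r_i + (1 - k) gives the same pixel after converting back
c, m, y = 1 - r, 1 - g, 1 - b
cmy_out = (k * c + (1 - k), k * m + (1 - k), k * y + (1 - k))
back_to_rgb = tuple(1 - v for v in cmy_out)

print(rgb_out)            # approximately (0.56, 0.28, 0.14)
print(back_to_rgb)        # identical up to rounding: the transformations are equivalent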
Figure 5.28 shows the result of applying these transformations to the full-color image. The mapping functions themselves are shown graphically in Figure 5.28(c) through (h). Note that the CMYK mapping, like the HSI mapping, consists of two functions: one handles the component being transformed, and the other leaves the remaining components unchanged. Although the transformations differ from one color space to another, the net result of decreasing the intensity by a fixed factor is the same for all of them.
Figure 5.28 Adjusting the intensity of an image using color transformations. (a) Original image. (b) Result of decreasing its intensity by 30% (i.e., letting k = 0.7). (c) The required RGB mapping function. (d)–(e) The required CMYK mapping functions. (f) The required CMY mapping function. (g)–(h) The required HSI mapping functions.
(Original image courtesy of MedData Interactive.)
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
It should be noted that each transformation considered so far depends on only one component within its color space: the red output component, for example, depends only on the red input component and is independent of the green and blue inputs. Transformations of this type are simple and frequently used tools for color processing and, as mentioned at the start of the discussion, they can be carried out on a per-color-component basis. In the remainder of this section we consider several transformations of this type, and then discuss a case in which the component transformation functions depend on all the color components of the input image and therefore cannot be applied on a single-component basis.
Color Complements
The color circle (also known as the color wheel) shown in Figure 5.29 originated with Sir Isaac Newton, who in the seventeenth century joined the two ends of the color spectrum to form the first such circle. The color circle arranges colors according to their chromatic relationship: the primary colors are spaced equally around the circle, and the secondary colors are placed equidistantly between the primaries. The net result is that hues directly opposite one another on the color circle are complements. Our interest in complements stems from their analogy with grayscale negatives. As in the grayscale case, color complements are useful for enhancing detail embedded in the dark regions of a color image, particularly when those regions dominate the image. Some of these concepts are illustrated in the following example.
Figure 5.29 Color complements on the color circle.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
EXAMPLE: Computing color image complements.
Figures 5.30(a) and (c) show a full-color image and its color complement. Figure 5.30(b) shows the RGB transformations used to compute the complement; they are analogous to the grayscale negative transformation. The complement resembles a conventional color-film photographic negative: the reds of the original image are replaced by cyans in the complement, where the original image is black the complement is white, and so on. Each hue in the complement image can be predicted from the original image using the color circle of Figure 5.29, and each RGB component of the complement is a function of only the corresponding input component.
Figure 5.30 Color complement transformations. (a) Original image. (b) Complement transformation functions. (c) Complement of (a) based on the RGB
mapping functions. (d) An approximation of the RGB complement using HSI
transformations.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Wood
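A minimal sketch of the RGB complement transformation discussed in this example (our own illustration): each component is simply reflected about the top of its range.

import numpy as np

def rgb_complement(img):
    """Complement of an 8-bit RGB image: each component is mapped to 255 - value."""
    img = np.asarray(img, dtype=np.uint8)
    return 255 - img

pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)   # a pure red pixel
print(rgb_complement(pixel))                        # [[[  0 255 255]]] -> cyan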
Color Slicing
Highlighting a specific range of colors in an image is useful for separating objects from their surroundings. The basic idea is either (1) to display the colors of interest so that they stand out from the background, or (2) to use the region defined by the colors as a mask. The most straightforward approach is to extend the intensity-slicing techniques described earlier. However, because a color pixel is an n-dimensional quantity, the resulting color transformation functions are more complicated than their grayscale counterparts. In fact, the required transformations are more complex than the color-component transformations considered thus far, because every practical color-slicing method requires each pixel's transformed color components to be a function of all n of the pixel's original color components.
One way of "slicing" a color image is to map colors into a nonprominent neutral
color after a certain range of interest. When a Cube of width W (or hypercube for)
is contained in interest’s colors, the required transformation set shall be determined
through prototypes (e.g. average) of color with components.
ݏൌ൝ͲǤͷ݂݅ቂቚݎെܽหݓ
ʹቃ
ݎݐ݄݁ݏ݅ݓݎ݁ݕ݊ܽͳ݆݊݅ൌͳǡʹǡǥǡ݊
Such transformations illustrate the colours surrounding the prototype by forcing all
other colours to the centre point of the colour space (this is an arbitrarily chosen
neutral point) . For example, the colour space for the RGB is middle gray or colour
(0.5, 0.5, 0.5) with an appropriate neutral point.
If a sphere is used instead to specify the colors of interest, the transformation becomes

s_i = 0.5    if Σ_{j=1}^{n} (r_j − a_j)² > R_0²
s_i = r_i    otherwise,                                  i = 1, 2, ..., n

where R_0 is the radius of the enclosing sphere (or hypersphere) and (a_1, a_2, ..., a_n) are the components of its center, i.e. the prototypical color. Other useful variations of these transformations include using multiple color prototypes and reducing the intensity of the colors outside the range of interest rather than setting them to a neutral constant.
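A sketch of the cube- and sphere-based slicing transformations above (our own illustration; the prototype color and width are arbitrary example values):

import numpy as np

def slice_cube(img, proto, width):
    """Keep colors inside a cube of the given width centered on proto; map the rest to 0.5."""
    img = np.asarray(img, dtype=float)               # RGB values normalized to [0, 1]
    outside = np.any(np.abs(img - proto) > width / 2.0, axis=-1)
    out = img.copy()
    out[outside] = 0.5                               # neutral mid-grey
    return out

def slice_sphere(img, proto, radius):
    """Same idea with a sphere of the given radius around proto."""
    img = np.asarray(img, dtype=float)
    outside = np.sum((img - proto) ** 2, axis=-1) > radius ** 2
    out = img.copy()
    out[outside] = 0.5
    return out

img = np.random.rand(4, 4, 3)                        # a small random RGB test image
proto = np.array([0.7, 0.2, 0.2])                    # prototype (average) reddish color
print(slice_cube(img, proto, width=0.5).shape)       # (4, 4, 3)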
5.16 Color Image Smoothing and Sharpening
This section illustrates how the fundamentals of neighborhood processing are applied to the smoothing and sharpening of color images.
Color Image Smoothing
Grayscale image smoothing can be viewed as a spatial filtering operation in which the coefficients of the filtering mask are all 1's. As the mask is moved across the image to be smoothed, each pixel is replaced by the average of the pixels in the neighborhood defined by the mask, producing a smoothed (blurred) image. Applying this concept to full-color images is straightforward; the principal difference is that, instead of scalar grey-level values, we must deal with component vectors of the form given in Equation (2).
Let S_xy denote the set of coordinates defining a neighborhood centered at (x, y) in an RGB color image, and let K be the number of pixels in that neighborhood. The average of the RGB component vectors in this neighborhood is

c̄(x, y) = (1/K) Σ_{(s,t) ∈ S_xy} c(s, t)

It follows from Equation (2) and the properties of vector addition that the components of this average are

(1/K) Σ_{(s,t) ∈ S_xy} R(s, t),   (1/K) Σ_{(s,t) ∈ S_xy} G(s, t),   (1/K) Σ_{(s,t) ∈ S_xy} B(s, t)

We recognize these as the scalar images that would be obtained by independently smoothing each plane of the original RGB image using conventional grayscale neighborhood processing. We conclude, therefore, that smoothing by neighborhood averaging can be carried out on a per-color-plane basis, and that the result is identical to that obtained by averaging the RGB color vectors directly.
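A sketch of per-color-plane neighborhood averaging (our own illustration, using edge replication at the borders), which by the argument above is equivalent to averaging the RGB vectors directly:

import numpy as np

def smooth_rgb(img, size=3):
    """Neighborhood-average an RGB image by filtering each color plane independently."""
    img = np.asarray(img, dtype=float)
    pad = size // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='edge')
    out = np.zeros_like(img)
    for dy in range(size):                       # sum the size x size neighborhood
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1], :]
    return out / (size * size)                   # divide by the number of pixels in the mask

img = np.random.rand(8, 8, 3)                    # a small random RGB test image
print(smooth_rgb(img).shape)                     # (8, 8, 3): same size, smoothed planes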
Figure 5.31 Color Image Smoothing
[Image refer from “ Smoothing, Sharpening and Segmentation of Image ”Dr Mir
Mohammad Azad, M N I Chowdhury International Journal of Engineering and Applied
Sciences (IJEAS) ISSN: 2394 -3661, Volume -5, Issue -5, May 2018 ]
Example: Consider the color image shown in Figure 5.31(a). Its red, green, and blue planes are shown in Figure 5.31(b) through (d), and the HSI components of the image are shown in Figure 5.32. To smooth the image, we simply smooth each of the RGB color planes on its own and then combine the smoothed planes to produce a smoothed full-color result.
Figure 5.32 HSI component of the RGB color image
[Image refer from “ Smoothing, Sharpening and Segmentation of Image ”Dr Mir
Mohammad Azad, M N I Chowdhury International Journal of Engineering and Applied
Sciences (IJEAS) ISSN: 2394 -3661, Volume -5, Issue -5, May 2018 ]
Undoubtedly, the most significant advantage of the HSI color model is that it decouples intensity (which is closely related to grey scale) from the color information. Its suitability for grayscale processing techniques suggests that it may be sufficient to smooth only the intensity component of the HSI representation shown in Figure 5.32, rather than the entire representation. The benefits and drawbacks of this approach can be demonstrated by smoothing only the intensity component, leaving the hue and saturation components unchanged, and converting the result back to an RGB image for display.
Color Image Sharpening
In this section we consider image sharpening using the Laplacian. From vector analysis, the Laplacian of a vector is defined as a vector whose components are the Laplacians of the individual scalar components of the input vector. In the RGB color system, the Laplacian of the vector c of Equation (2) is therefore

∇²c(x, y) = [∇²R(x, y), ∇²G(x, y), ∇²B(x, y)]^T
As in the previous section, this tells us that we can compute the Laplacian of a full-color image by computing the Laplacian of each component image separately.
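A sketch of per-component Laplacian sharpening (our own illustration, using the common 4-neighbor Laplacian kernel and the enhancement g = f − ∇²f):

import numpy as np

def laplacian(plane):
    """Discrete 4-neighbor Laplacian of a single image plane (edge-replicated borders)."""
    p = np.pad(plane, 1, mode='edge')
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def sharpen_rgb(img):
    """Sharpen an RGB image by subtracting the Laplacian of each component plane."""
    img = np.asarray(img, dtype=float)
    out = np.stack([img[..., ch] - laplacian(img[..., ch]) for ch in range(3)], axis=-1)
    return np.clip(out, 0.0, 1.0)                # keep the result in the valid range

img = np.random.rand(8, 8, 3)
print(sharpen_rgb(img).shape)                    # (8, 8, 3)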
Figure 5.33 Image sharpening with the Laplacian
[Image refer from “ Smoothing, Sharpening and Segmentation of Image ”Dr Mir
Mohammad Azad, M N I Chowdhury International Journal of Engineering and Applied
Sciences (IJEAS) ISSN: 2394 -3661, Volume -5, Issue -5, May 2018 ]
The sharpened full-color result in Figure 5.33 was obtained by computing the Laplacian of each RGB component image and combining the results with the original components. A comparable result can be obtained by computing the Laplacian of only the intensity component of the HSI representation, leaving the hue and saturation components unchanged.
5.17 Using Color in Image Segmentation
Segmentation is a process that partitions an image into regions.
Segmentation in HSI Color Space
Suppose we wish to segment an image based on color and, in addition, to carry out the process on individual planes. It is natural to think of the HSI space, because color is conveniently represented in the hue image. Saturation is typically used as a masking image to isolate regions of interest in the hue image. The intensity image is used less frequently for color image segmentation because it carries no color information. The following example illustrates how segmentation is carried out in the HSI color space.
Figure 5.34 Image segmentation in HSI space. (a) Original. (b) Hue. (c) Saturation.
(d) Intensity. (e) Binary saturation mask (f) Product of (b) and (e). (g) Histogram
of (f). (h) Segmentation of red components from (a).
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figure 5.34(f) shows the product of the mask and the hue image, and Figure 5.34(g) shows the histogram of the product image (note that the grey scale is in the range [0, 1]). We see in the histogram that the high values, which correspond to the values of interest, are grouped at the high end of the grey scale, close to 1. Figure 5.34(h) shows the binary image produced by thresholding the product image with a threshold value of 0.9. The spatial locations of the white points in this image identify the points in the original image that have the reddish hue we are looking for. The segmentation here is far from perfect: there are regions in the original image where we would certainly say a reddish hue is present but which were not identified by this method. However, as determined by trial and comparison, the regions shown in white in Figure 5.34(h) are about the best this approach can do in detecting the reddish components of the original image. The segmentation method described in the next section is capable of yielding better results.
Segmentation in RGB vector Space
Although working in HSI space is more intuitive, segmentation is one area in which better results are generally obtained using RGB color vectors. The approach is straightforward. Suppose the objective is to segment objects of a specified color range in an RGB image. Given a set of sample color points representative of the colors of interest, we obtain an estimate of the "average" color that we wish to segment. Let this average color be denoted by the RGB vector a. The objective of segmentation is then to classify each RGB pixel in the image as having a color in the specified range or not. To perform this comparison we need a measure of similarity; one of the simplest is the Euclidean distance. Let z denote an arbitrary point in RGB space. We say that z is similar to a if the distance between them is less than a specified threshold, D_0. The Euclidean distance between z and a is given by

D(z, a) = || z − a || = [ (z − a)^T (z − a) ]^(1/2)
        = [ (z_R − a_R)² + (z_G − a_G)² + (z_B − a_B)² ]^(1/2)

where the subscripts R, G, and B denote the RGB components of the vectors a and z. The locus of points such that D(z, a) ≤ D_0 is a solid sphere of radius D_0, as illustrated in Figure 5.34(a). Points contained within or on the surface of the sphere satisfy the specified color criterion; points outside the sphere do not. Coding these two sets of points in the image with, say, black and white produces a binary segmented image.
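A sketch of this Euclidean-distance test (our own illustration; the average color a and threshold D_0 used here are arbitrary, and would normally be estimated from sample points):

import numpy as np

def segment_rgb(img, a, D0):
    """Binary segmentation: 1 where the RGB vector is within distance D0 of color a."""
    img = np.asarray(img, dtype=float)
    a = np.asarray(a, dtype=float)
    dist = np.sqrt(np.sum((img - a) ** 2, axis=-1))   # Euclidean distance D(z, a)
    return (dist <= D0).astype(np.uint8)              # 1 inside the sphere, 0 outside

img = np.random.rand(8, 8, 3)                         # normalized RGB test image
a = np.array([0.8, 0.1, 0.1])                         # average "reddish" color of interest
mask = segment_rgb(img, a, D0=0.3)
print(mask.shape, mask.dtype)                         # (8, 8) uint8 binary mask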
Figure 5.34 Three approaches for enclosing data regions for RGB vector segmentation (a → b → c).
(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez
and Richard E. Woods)
5.18 Noise in Color Images
The noise models used for grayscale images are also applicable to color images. The noise content of a color image usually has the same characteristics in each color channel, but it is possible for separate color channels to be affected differently by noise. One possibility is that the electronics of a particular channel malfunction; however, different noise levels are more often caused by differences in the relative strength of the illumination available to each color channel. For example, using a red (reject) filter in a CCD camera reduces the strength of the illumination detected by the red sensor elements. Because CCD sensors are noisier at lower levels of illumination, the red component of the resulting RGB image would in this situation tend to be noisier than the other two component images.
In this example we examine the impact of noise on color images. Figures 5.35(a) through (c) show the three color planes of an RGB image corrupted by additive Gaussian noise, and Figure 5.35(d) is the resulting composite RGB image. Fine-grain noise such as this tends to be less visually noticeable in a color image because it is masked by the color. Figures 5.36(a) through (c) show the result of converting the noisy RGB image to HSI: the hue and saturation components are noticeably degraded compared with the corresponding components of a noise-free HSI image. On the other hand, the intensity component in Figure 5.36(c) is slightly smoother than any of the three noisy RGB component images, because the intensity image is the average of the RGB images.
Figure 5.35 (a)–(c) Red, green, and blue 8-bit component images corrupted by additive Gaussian noise of mean 0 and standard deviation of 28 intensity levels. (d) Resulting RGB image.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figure 5.36 HSI components of the noisy color image in Figure 5.35 (d) . (a) Hue.
(b) Saturation. (c) Intensity.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
On the other hand, when a single RGB channel is corrupted by noise, conversion to HSI spreads the noise to all of the HSI component images. An illustration of this is shown in Figure 5.36. Figure 5.36(a) shows an RGB image whose green component is corrupted by salt-and-pepper noise, with a probability of either salt or pepper equal to 0.05. The HSI component images in Figures 5.36(b) through (d) show clearly how the noise spreads from the green RGB channel to all of the HSI images. This is not unexpected, because computation of the HSI components makes use of all of the RGB components.
Figure 5.36 (a) RGB image with green plane corrupted by salt -and-pepper noise.
(b) Hue component of HSI image. (c) Saturation component. (d) Intensity
component
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
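A minimal sketch of the effect described above, assuming NumPy and a synthetic test image: independent Gaussian noise of standard deviation 28 is added to each RGB channel, and the HSI intensity, computed here as the channel average (R + G + B)/3, shows a markedly smaller noise standard deviation than any single channel.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constant grey test image, one 256x256 plane per RGB channel
clean = np.full((256, 256, 3), 120.0)

# Additive Gaussian noise of mean 0 and standard deviation 28 in every channel
noisy = clean + rng.normal(0.0, 28.0, size=clean.shape)

# HSI intensity component: the average of the three channels
intensity = noisy.mean(axis=2)

for c, name in enumerate("RGB"):
    print(f"noise std in {name} channel : {np.std(noisy[..., c] - 120.0):.2f}")
print(f"noise std in intensity image: {np.std(intensity - 120.0):.2f}")
# Averaging three independent channels reduces the noise std by roughly 1/sqrt(3)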
5.19 Color Image Compression
Because the number of bits required to represent colour is typically three to four times greater than the number required to represent grey levels, data compression plays a central role in the storage and transmission of colour images. For the RGB, CMY(K), and HSI images of the previous sections, the data that are compressed are the components of each colour pixel (e.g., the red, green, and blue components of the pixels in an RGB image); these are the means by which colour information is conveyed. Compression is the process of removing redundant and/or unneeded data.
Figure 5.37(a) shows a 24-bit RGB full-colour representation of an iris, in which each of the red, green, and blue components is represented by eight bits. Figure 5.37(b) was reconstructed from a compressed version of the image in (a) and is thus a compressed-and-then-decompressed approximation of it. Although the compressed image cannot be displayed directly on a colour monitor (it must be decompressed first), it contains only one data bit (and hence one storage bit) for every 230 bits of data in the original image. Suppose the image is of size 2000 × 3000 = 6 × 10^6 pixels. At 24 bits/pixel, its storage size is 144 × 10^6 bits.
Figure 5.37 Color image compression. (a) Original RGB image. (b) Result of
compressing, then decompressing the image in (a).
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
5.20 Summary
We have studied the formation of digital images and the real and imaginary parts of the DFT basis images; several transforms, such as the discrete Hartley transform (DHT), the discrete cosine transform, the Walsh-Hadamard transform, the slant transform, and the Haar transform, along with their properties; and wavelet transforms and the discrete wavelet transform. We have also studied the fundamentals of colour, the CIE primaries, colour models such as the RGB, CMY, CMYK, and HSI models, conversions between colour models (RGB to HSI and HSI to RGB), the device-independent use of the CMYK and RGB models, pseudocolour processing, intensity-to-colour transformations, techniques for full-colour image processing, and colour image smoothing and sharpening.
5.21 Unit End questions
1. Explain the matrix-based transforms.
2. Explain the basis functions in the time-frequency plane.
3. Write a short note on the discrete Hartley transform.
4. Write a short note on the slant transform.
5. Write a short note on the Haar transform.
6. Write a short note on wavelet transforms.
7. Explain the color fundamentals.
8. Explain the color models.
9. Explain pseudocolor image processing.
10. Explain color image segmentation.
11. Explain noise in color images.
12. Explain color image compression.
5.22 Reference for further reading
1. Digital Image Processing, R. C. Gonzalez and R. E. Woods, Pearson/Prentice Hall, Fourth Edition, 2018.
2. Fundamentals of Digital Image Processing, A. K. Jain, PHI.
3. The Image Processing Handbook, J. C. Russ, CRC Press, Fifth Edition, 2010.
4. All images courtesy of Digital Image Processing by Gonzalez and Woods, Fourth Edition.
Unit 3
6 IMAGE COMPRESSION AND WATERMARKING
Unit Structure
6.0 Introduction
6.1 Fundamentals
6.2 Huffman Coding
6.3 Golomb Coding
6.4 Arithmetic Coding
6.5 LZW Coding
6.6 Run -length Coding
6.7 Symbol -based Coding
6.8 Bit-plane Coding
6.9 Block Transform Coding
6.10 Predictive Coding
6.11 Wavelet Coding
6.12 Digital Image Watermarking
6.13 Summary
6.14 Unit End questions
6.15 Reference for further reading
6.0 Introduction
In recent years multimedia product development and demand have increased more
and more, contributing to an inadequate network bandwidth and memory storage
device. The theory of data compression therefore becomes increasingly important
in order to reduce the redundancy of data, saving more bandwidth and space in
hardware. The process of encoding information using fewer bits or other information units than an unencoded representation is data compression or source -
coding in computer science and information theory. It helps to reduce costly
resource consumption, for exampl e, disk space or bandwidth transmission. Theory
and practice will be introduced, the most commonly used compression techniques
examined, and the industry standards described, that will help us. This section
concludes on the process of introducing visible a nd invisible data (such as
information concerning copyright) to images into digital images watermarking. munotes.in
6.1 Fundamentals
Data compression, also known as compaction, is the process of reducing the amount of data required to store or transmit a given quantity of information, typically by means of encoding techniques. A clear distinction must be made between data and information; the two are not synonymous. Data are the means by which information is conveyed, and the same amount of information can be represented by different amounts of data. Consider an example that illustrates data redundancy in digital image processing. Suppose two people, Ramesh and Suresh, tell the same story. Suresh tells it in few words, whereas Ramesh uses many more words to tell the same story: either Ramesh includes irrelevant data that is not part of the story, or he repeats himself several times. In other words, his account contains data (words) that provide no relevant information or that simply restate what is already known. This is data redundancy.
Data redundancy is the central issue in digital image compression. It is not an abstract concept but a mathematically quantifiable entity. If n1 and n2 denote the number of information-carrying units in two data sets that represent the same information, the relative data redundancy R_D of the first data set (the one with n1 units) can be defined as

R_D = 1 − 1 / C_R

where C_R, commonly referred to as the compression ratio, is

C_R = n1 / n2
When n2 = n1, C_R = 1 and R_D = 0, indicating that the first representation of the information contains no redundant data (relative to the second data set). When n2 << n1, C_R → ∞ and R_D → 1, implying considerable compression and highly redundant data. Finally, when n2 >> n1, C_R → 0 and R_D → −∞, indicating that the second data set contains much more data than the original representation; this is the generally undesirable case of data expansion. In general, C_R and R_D lie in the open intervals (0, ∞) and (−∞, 1), respectively. A practical compression ratio such as 10 (or 10:1) means that the first data set has ten information-carrying units (say, bits) for every one unit in the second, or compressed, data set. The corresponding redundancy of 0.9 means that 90% of the data in the first data set is redundant.
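A quick numerical illustration of these definitions (the data sizes are made up):

def redundancy(n1, n2):
    """Compression ratio C_R = n1/n2 and relative redundancy R_D = 1 - 1/C_R."""
    cr = n1 / n2
    rd = 1.0 - 1.0 / cr
    return cr, rd

# e.g., 10**6 bits reduced to 10**5 bits
cr, rd = redundancy(1_000_000, 100_000)
print(cr, rd)   # 10.0  0.9  -> 10:1 compression, 90% of the data is redundant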
Three basic data redundancies can be identified and exploited in digital image compression: coding redundancy, interpixel redundancy, and psychovisual redundancy. Data compression is achieved when one or more of these redundancies are reduced or eliminated.
Coding Redundancy:
Here we show how an image's grey-level histogram provides insight into the construction of codes that reduce the amount of data used to represent the image.

Assume that a discrete random variable r_k in the interval [0, 1] represents the grey levels of an image, and that each r_k occurs with probability p_r(r_k), where

p_r(r_k) = n_k / n,   k = 0, 1, 2, ..., L − 1

Here L is the number of grey levels, n_k is the number of times the kth grey level appears in the image, and n is the total number of pixels in the image. If the number of bits used to represent each value of r_k is l(r_k), the average number of bits required to represent each pixel is

L_avg = Σ_{k=0}^{L−1} l(r_k) p_r(r_k)
That is, the average length of the code words assigned to the various grey-level values is found by summing the products of the number of bits used to represent each grey level and the probability with which that grey level occurs. The total number of bits required to code an M × N image is therefore M·N·L_avg.
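A short sketch of the L_avg computation for an assumed set of grey-level probabilities and code lengths:

def average_code_length(probs, lengths):
    """L_avg = sum of l(r_k) * p_r(r_k) over all grey levels k."""
    return sum(l * p for l, p in zip(lengths, probs))

# Hypothetical 4-level image: natural binary coding uses 2 bits per level,
# while a variable-length code assigns shorter words to likelier levels.
probs = [0.5, 0.25, 0.15, 0.10]
print(average_code_length(probs, [2, 2, 2, 2]))   # 2.0 bits/pixel (fixed length)
print(average_code_length(probs, [1, 2, 3, 3]))   # 1.75 bits/pixel (variable length)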
Interpixel Redundancy:
Consider the images in Figs. 6.1(a) and (b). These images have nearly identical histograms, shown in Figs. 6.1(c) and (d). Note also that both histograms are trimodal, indicating three dominant ranges of grey-level values. Because the grey levels of these images are not equally probable, variable-length coding can be used to reduce the coding redundancy that would result from a straight, or natural, binary coding of their pixels. The coding process, however, does not alter the level of correlation between the pixels within the images; that is, the codes used to represent the grey levels of each image have nothing to do with the correlation between pixels.
Fig.6.1 Two images and their gray -level histograms and normalized
autocorrelation coefficients along one line.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figures 6.1(e) and (f) show the respective normalized autocorrelation coefficients computed along one line of each image,

γ(Δn) = A(Δn) / A(0)

where

A(Δn) = [ 1 / (N − Δn) ] Σ_{y=0}^{N−1−Δn} f(x, y) f(x, y + Δn)
The scaling factor above accounts for the varying number of summation terms that arise for each integer value of Δn; naturally, Δn must be strictly less than N, the number of pixels on a line. The variable x is the coordinate of the line used in the computation. Note the shape of the functions in Figs. 6.1(e) and (f). Their shapes can be related qualitatively to the structure of the images in Figs. 6.1(a) and (b). This relationship is particularly noticeable in Fig. 6.1(f), where the high correlation between pixels separated by 45 and 90 samples corresponds to the spacing between the vertically oriented matches of Fig. 6.1(b).
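The two equations above can be evaluated directly. The sketch below assumes NumPy and uses a synthetic periodic line rather than the images of Fig. 6.1, so the exact values differ, but the peaks at multiples of the period show the same behaviour.

import numpy as np

def line_autocorrelation(line, delta_n):
    """gamma(delta_n) = A(delta_n)/A(0) for a single image line f(x, y), y = 0..N-1."""
    f = np.asarray(line, dtype=float)
    N = f.size
    def A(dn):
        return np.sum(f[: N - dn] * f[dn:]) / (N - dn)
    return A(delta_n) / A(0)

# Synthetic line with a strong period of 8 samples
y = np.arange(256)
line = 128 + 100 * (np.sin(2 * np.pi * y / 8) > 0)
for dn in (1, 4, 8, 16):
    print(dn, round(line_autocorrelation(line, dn), 3))
# gamma is close to 1.0 whenever delta_n is a multiple of the 8-sample period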
These illustrations reflect another important form of data redundancy, one directly related to the interpixel correlations within an image. Because the value of any given pixel can be reasonably predicted from the values of its neighbours, the information carried by an individual pixel is relatively small; much of the visual contribution of a single pixel to an image is redundant, in the sense that it could be inferred from the values of its neighbours. A variety of names, including spatial redundancy, geometric redundancy, and frame redundancy, have been coined to refer to these interpixel dependencies; we use the term interpixel redundancy to encompass them all.

To reduce the interpixel redundancies in an image, the 2-D pixel array normally used for human viewing and interpretation must be transformed into a more efficient (but usually non-visual) format. For example, an image may be represented by the differences between adjacent pixels. Transformations of this type (i.e., those that remove interpixel redundancy) are referred to as mappings. They are called reversible mappings if the original image elements can be reconstructed from the transformed data set.
Psychovisual Redundancy:
The brightness of a region, as perceived by the eye, depends on more than just the light reflected by that region. For example, intensity variations (Mach bands) can be perceived in an area of constant intensity. Such phenomena arise because the eye does not respond with equal sensitivity to all visual information; some information is simply less important to normal visual processing than other information. This information is said to be psychovisually redundant. It can be removed without significantly impairing the perceived quality of the image.

The existence of psychovisual redundancy is not surprising, because human perception of the information in an image normally does not involve quantitative analysis of every pixel value. In general, an observer searches for distinguishing features such as edges and textural regions and mentally combines them into recognizable groupings. The brain then correlates these groupings with prior knowledge in order to complete the image-interpretation process. Psychovisual redundancy differs from the redundancies discussed earlier: it is
associated with real, quantifiable visual information, unlike coding and interpixel redundancy. Its removal is possible only because the information itself is not essential for normal visual processing. Since the elimination of psychovisually redundant data results in a loss of quantitative information, the process is commonly called quantization.
This terminology is consistent with normal usage of the word, which generally means mapping a broad range of input values onto a limited number of output values. Because it is an irreversible operation (visual information is lost), quantization results in lossy data compression.
Fidelity Criteria:
The removal of psychovisually redundant data results in a loss of real or quantitative visual information. Because information of interest may be lost, a repeatable or reproducible means of quantifying the nature and extent of the information loss is highly desirable. Two general classes of criteria are used as the basis for such an assessment:
A) Objective fidelity criteria, and
B) Subjective fidelity criteria.
An objective fidelity criterion is one in which the level of information loss can be expressed as a function of the original (input) image and the compressed-and-then-decompressed (output) image. A good example is the root-mean-square (rms) error between an input image and its output. Let f(x, y) be the input image and let f̂(x, y) denote the estimate or approximation of f(x, y) that results from compressing and subsequently decompressing the input. For any value of x and y, the error e(x, y) between f(x, y) and f̂(x, y) can be defined as

e(x, y) = f̂(x, y) − f(x, y)

so that the total error between the two images is

Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} [ f̂(x, y) − f(x, y) ]

where the images are of size M × N. The root-mean-square error, e_rms, is the square root of the squared error averaged over the M × N array:

e_rms = { (1 / MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} [ f̂(x, y) − f(x, y) ]^2 }^{1/2}
A closely related objective fidelity criterion is the mean-square signal-to-noise ratio of the compressed-decompressed image. If f̂(x, y) is considered to be the sum of the original image f(x, y) and a noise signal e(x, y), the mean-square signal-to-noise ratio of the output image, denoted SNR_ms, is

SNR_ms = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f̂(x, y)^2 / Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} [ f̂(x, y) − f(x, y) ]^2
The rms value of the signal-to-noise ratio, denoted SNR_rms, is obtained by taking the square root of the expression above.
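Both objective criteria can be computed in a few lines. The sketch below assumes NumPy and uses synthetic stand-ins for f(x, y) and f̂(x, y):

import numpy as np

def rms_error(f, f_hat):
    """Root-mean-square error between input f and decompressed output f_hat."""
    e = f_hat.astype(float) - f.astype(float)
    return np.sqrt(np.mean(e ** 2))

def snr_ms(f, f_hat):
    """Mean-square signal-to-noise ratio of the output image."""
    f_hat = f_hat.astype(float)
    e = f_hat - f.astype(float)
    return np.sum(f_hat ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)
f = rng.integers(0, 256, size=(64, 64))          # stand-in original image
f_hat = f + rng.normal(0, 2.0, size=f.shape)     # stand-in reconstruction
print(rms_error(f, f_hat))                        # e_rms
print(snr_ms(f, f_hat), np.sqrt(snr_ms(f, f_hat)))  # SNR_ms and SNR_rms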
Although objective fidelity criteria offer a simple and convenient mechanism for evaluating information loss, most decompressed images ultimately are viewed by people. Consequently, measuring image quality by the subjective evaluations of human observers is often more appropriate. This can be accomplished by showing a "typical" decompressed image to an appropriate cross section of viewers and averaging their evaluations. The evaluations may be made using an absolute rating scale or by means of side-by-side comparisons of f(x, y) and f̂(x, y).
Image compression models
Figure 6.2 shows that a compression system consists of two distinct structural blocks: an encoder and a decoder. The input image f(x, y) is fed into the encoder, which creates a set of symbols from the input data. After transmission over the channel, the encoded representation is fed to the decoder, where a reconstructed output image f̂(x, y) is generated. In general, f̂(x, y) may or may not be an exact replica of f(x, y). If it is, the system is error free or information preserving; if not, the reconstructed image contains some level of distortion. Both the encoder and decoder shown in Fig. 6.2 consist of two relatively independent functions or subblocks. The encoder is made up of a source encoder, which removes input redundancies, and a channel encoder, which increases the noise immunity of the source encoder's output. As would be expected, the decoder includes a channel decoder followed by a source decoder. If the channel between the encoder and decoder is noise free (not prone to error), the channel encoder and decoder are omitted, and the general encoder and decoder become the source encoder and decoder, respectively.
Fig.6.2 A general compression system model
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The Source Encoder and Decoder:
The source encoder is responsible for reducing or eliminating any coding, interpixel, or psychovisual redundancies in the input image. The specific application and its associated fidelity requirements dictate the best encoding approach to use in a given situation. Normally, the approach can be modelled by a series of three independent operations. As shown in Fig. 6.3(a), each operation is designed to reduce one of the three redundancies; Fig. 6.3(b) shows the corresponding source decoder. In the first stage of the source encoding process, the mapper transforms the input data into a (usually non-visual) format designed to reduce interpixel redundancies in the input image. This operation generally is reversible and may or may not reduce directly the amount of data required to represent the image.
Fig.6.3 (a) Source encoder and (b) source decoder model
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Run-length coding is an example of a mapping that results directly in data compression in this initial stage of the overall encoding process. The representation of an image by a set of transform coefficients is an example of the opposite case: here the mapper transforms the image into an array of coefficients, making its interpixel redundancies more accessible for compression in later stages of the encoding process.

The second stage, the quantizer block of Fig. 6.3(a), reduces the accuracy of the mapper's output in accordance with a pre-established fidelity criterion. This stage reduces the psychovisual redundancies of the image. The operation is irreversible, and must therefore be omitted when error-free (lossless) compression is required.

In the third and final stage of the source encoding process, the symbol coder creates a fixed- or variable-length code to represent the quantizer output and maps the output in accordance with that code. The term symbol coder distinguishes this coding operation from the overall source encoding process. In most cases, the
mapped and quantized data set is represented by a variable-length code. The code assigns the shortest code words to the most frequently occurring output values, thereby reducing coding redundancy. The operation is, of course, reversible. Upon completion of the symbol coding step, each of the three redundancies has been reduced or eliminated.
Figure 6.3(a) shows the source encoding process as three successive operations, but not all three operations are necessarily included in every compression system. Recall, for example, that the quantizer must be omitted when error-free compression is desired. In addition, some compression techniques are normally modelled by merging the blocks of Fig. 6.3(a); in predictive compression systems, for instance, the mapper and quantizer are often represented by a single block that performs both operations simultaneously. The source decoder of Fig. 6.3(b) contains only two components: a symbol decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations of the source encoder's symbol coder and mapper blocks. Because quantization results in irreversible information loss, an inverse quantizer block is not included in the general source decoder model shown in Fig. 6.3(b).
The Channel Encoder and Decoder:
The channel encoder and decoder play an important role in the overall encoding-decoding process when the channel of Fig. 6.2 is noisy or prone to error. They are designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source encoded data. Because the output of the source encoder contains little redundancy, it would be highly sensitive to transmission noise without this controlled redundancy. One of the most useful channel encoding techniques was devised by R. W. Hamming (Hamming [1950]). It is based on appending enough bits to the data being encoded to ensure that some minimum number of bits must change between valid code words. (Detecting and correcting multi-bit errors requires additional redundancy bits.) For example, the 7-bit Hamming (7, 4) code word h1 h2 h3 ... h6 h7 associated with a 4-bit binary number b3 b2 b1 b0 is
h1 = b3 ⊕ b2 ⊕ b0
h2 = b3 ⊕ b1 ⊕ b0
h4 = b2 ⊕ b1 ⊕ b0
h3 = b3
h5 = b2
h6 = b1
h7 = b0
where ⊕ denotes the exclusive-OR operation. Note that bits h1, h2, and h4 are even-parity bits for the bit fields b3 b2 b0, b3 b1 b0, and b2 b1 b0, respectively. (Recall that a string of binary bits has even parity if the number of bits with a value of 1 is even.) To decode a Hamming encoded result, the channel decoder must check the encoded value for odd parity over the bit fields in which even parity was previously established. A single-bit error is indicated by a nonzero parity word c4 c2 c1, where

c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7
c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7
c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7

If a nonzero value is found, the decoder simply complements the code word bit position indicated by c4 c2 c1. The decoded binary value is then extracted from the corrected code word as h3 h5 h6 h7.
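A minimal sketch of the (7, 4) Hamming encoder and decoder defined by the equations above; the data bits are b3 b2 b1 b0, the code bits h1 ... h7, and a single-bit error is corrected by complementing the position given by c4 c2 c1.

def hamming74_encode(b3, b2, b1, b0):
    """Return code word [h1, h2, h3, h4, h5, h6, h7] for data bits b3 b2 b1 b0."""
    h1 = b3 ^ b2 ^ b0
    h2 = b3 ^ b1 ^ b0
    h4 = b2 ^ b1 ^ b0
    return [h1, h2, b3, h4, b2, b1, b0]

def hamming74_decode(h):
    """Correct a single-bit error (if any) and return the data bits b3 b2 b1 b0."""
    h1, h2, h3, h4, h5, h6, h7 = h
    c1 = h1 ^ h3 ^ h5 ^ h7
    c2 = h2 ^ h3 ^ h6 ^ h7
    c4 = h4 ^ h5 ^ h6 ^ h7
    pos = c4 * 4 + c2 * 2 + c1          # nonzero value = 1-based error position
    if pos:
        h = list(h)
        h[pos - 1] ^= 1                  # complement the indicated bit
    return h[2], h[4], h[5], h[6]        # data bits h3 h5 h6 h7

code = hamming74_encode(1, 0, 1, 1)      # data 1011
code[3] ^= 1                             # inject a single-bit error at h4
print(hamming74_decode(code))            # -> (1, 0, 1, 1), error corrected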
Variable -Length Coding:
The simplest approach to error-free image compression is to reduce only coding redundancy. Coding redundancy is normally present in any natural binary encoding of the grey levels in an image. It can be eliminated by coding the grey levels with a variable-length code that assigns the shortest possible code words to the most probable grey levels. Here we examine several optimal and near-optimal techniques for constructing such a code. The techniques are formulated in the language of information theory. In practice, the source symbols may be either the grey levels of an image or the output of a grey-level mapping operation (pixel differences, run lengths, and so on).
6.2 Huffman coding:
Huffman coding is the most popular technique for removing coding redundancy (Huffman [1952]). When coding the symbols of an information source individually, Huffman coding yields the smallest possible number of code symbols per source symbol. In terms of the noiseless coding theorem, the resulting code is optimal for a fixed value of n, subject to the constraint that the source symbols be coded one at a time.

The first step in Huffman's approach is to create a series of source reductions by ordering the probabilities of the symbols under consideration and combining the lowest-probability symbols into a single symbol that replaces them in the next source reduction. Figure 6.4 illustrates this process for binary coding (K-ary Huffman codes can also be constructed). At the far left, a hypothetical set of source symbols and their probabilities are ordered from top to bottom in terms of decreasing probability values. To form the first source reduction, the bottom two probabilities, 0.06 and 0.04, are combined to form a "compound symbol" with probability 0.1. This compound symbol and its associated probability are placed in the first source reduction column so that the probabilities of the reduced source are also ordered from the most to the least probable. This process is then repeated until a reduced source with two symbols (at the far right) is reached.
The second step in Huffman's procedure is to code each reduced source, starting with the smallest source and working back to the original source. The minimal-length binary code for a two-symbol source, of course, consists of the symbols 0 and 1. As shown in Fig. 6.5, these symbols are assigned to the two symbols on the right (the assignment is arbitrary; reversing the order of the 0 and 1 would work just as well). Because the reduced source symbol with probability 0.6 was generated by combining two symbols in the reduced source to its left, the 0 used to code it is now assigned to both of these symbols, and a 0 and a 1 are arbitrarily appended to each to distinguish them from each other. This operation is repeated for each reduced source until the original source is reached. The final code appears at the far left in Fig. 6.5. This code has an average length of
Fig.6.4 Huffman source reductions.
Fig.6.5 Huffman code assignment procedure.
Lavg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/pixel
and the entropy of the source is 2.14 bits/symbol. The resulting Huffman code
efficiency is 0.973.
Huffman's procedure creates the optimal code for a set of symbols and probabilities, subject to the constraint that the symbols be coded one at a time. After the code has been created, coding and/or decoding is accomplished in a simple lookup-table manner. The code itself is an instantaneous, uniquely decodable block code. It is called a block code because each source symbol is mapped into a fixed sequence of code symbols. It is instantaneous because each code word in a string of code symbols can be decoded without referencing succeeding symbols. It is uniquely decodable because any string of code symbols can be decoded in only one way. Thus, any string of Huffman encoded symbols can be decoded by examining the individual symbols of the string from left to right. A left-to-right scan of the encoded string 010100111100, for example, reveals that the first valid code word of the binary code in Fig. 6.5 is 01010, which is the code for symbol a3. The next valid code word is 011, which corresponds to symbol a1. Continuing in this manner reveals the completely decoded message to be a3 a1 a2 a2 a6.
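The sketch below builds a Huffman code for the probabilities of the worked example using the standard heap-based formulation rather than the tabular source reductions of Figs. 6.4 and 6.5; the specific code words may differ from those in the figure, but the average length is the same 2.2 bits/symbol.

import heapq

def huffman_code(probs):
    """Build a binary Huffman code for {symbol: probability}; return {symbol: code}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial code})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a1": 0.1, "a2": 0.4, "a3": 0.06, "a4": 0.1, "a5": 0.04, "a6": 0.3}
code = huffman_code(probs)
l_avg = sum(probs[s] * len(code[s]) for s in probs)
print(code, l_avg)   # average length 2.2 bits/symbol, as in the worked example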
6.3 Golomb Coding
A Golomb code is a variable-length code similar in spirit to a Huffman code, but, unlike Huffman coding, it is based on a simple model of the likelihood of the values (which are treated explicitly as natural numbers rather than as abstract symbols): small values are assumed to be more probable than large ones. The exact relationship between size and probability is captured by a single parameter, the divisor.

To Golomb-code a number, find the quotient and remainder of division by the divisor. The quotient is written in unary notation, followed by the remainder in truncated binary notation. In practice, a stop bit is required after the quotient: if the quotient is expressed as a run of zeroes, the stop bit is a one (or vice versa, when the unary part is written with ones). The length of the remainder field is determined by the divisor.

A Golomb-Rice code is a Golomb code whose divisor is a power of two, which allows a more efficient implementation using shifts and masks rather than division and modulo operations.
For example, the Golomb-Rice code with divisor 4 for the numbers 0 through 15 can be tabulated as follows (the sketch below generates it):
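def golomb_rice(n, k=2):
    """Golomb-Rice code of non-negative integer n with divisor m = 2**k."""
    quotient, remainder = n >> k, n & ((1 << k) - 1)
    unary = "1" * quotient + "0"            # quotient in unary, with a '0' stop bit
    binary = format(remainder, f"0{k}b")    # remainder in k binary digits
    return unary + binary

for n in range(16):
    print(n, golomb_rice(n))
# 0 -> 000, 3 -> 011, 4 -> 1000, 15 -> 111011, and so on.

This assumes the convention that the unary quotient is written as ones terminated by a zero stop bit and the remainder as a fixed two-bit field; some texts swap the roles of 0 and 1.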
The information source A generates the symbols {A0, A1, A2, A3, A4} with the corresponding probabilities {0.4, 0.3, 0.15, 0.1, 0.05}. Encoding the source symbols with a fixed-length binary encoder and with a Golomb encoder gives:

Source Symbol   Pi     Binary Code   Golomb Code
A0              0.40   000           0
A1              0.30   001           10
A2              0.15   010           110
A3              0.10   011           1110
A4              0.05   100           1111

The entropy of the source is H = 2.0087 bits/symbol. Since there are 5 symbols (5 < 8 = 2^3), at least 3 bits are needed to represent each symbol with a fixed-length binary code, so the average length of the binary code is L_avg = 3 bits/symbol and its efficiency is H / L_avg = 2.0087 / 3 ≈ 0.67. The average length of the Golomb code is L_avg = (0.4)(1) + (0.3)(2) + (0.15)(3) + (0.1)(4) + (0.05)(4) = 2.05 bits/symbol, so its efficiency is 2.0087 / 2.05 ≈ 0.98. This example demonstrates that the efficiency of the Golomb encoder is much higher than that of the fixed-length binary encoder.
6.4 Arithmetic coding:
Unlike the variable-length codes described previously, arithmetic coding generates nonblock codes. In arithmetic coding, which can be traced to the work of Elias, a one-to-one correspondence between source symbols and code words does not exist. Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic code word. The code word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval used to represent it becomes smaller, and the number of information units (say, bits) required to represent the interval becomes larger. Each symbol of the message reduces the size of the interval in accordance with its probability of occurrence. Because the technique does not require, as does Huffman's approach, that each source symbol translate into an integral number of code symbols (that is, that the symbols be coded one at a time), it achieves (but only in theory) the bound established by the noiseless coding theorem.
Fig.6.6 Arithmetic coding procedure
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figure 6.6 illustrates the basic arithmetic coding process. Here, a five-symbol sequence or message, a1 a2 a3 a3 a4, from a four-symbol source is coded. At the start of the coding process, the message is assumed to occupy the entire half-open interval [0, 1). As shown in Table 6.1, this interval is initially subdivided into four regions based on the probabilities of the source symbols. Symbol a1, for example, is associated with the subinterval [0, 0.2). Because it is the first symbol of the message being coded, the message interval is initially narrowed to [0, 0.2). Thus, in Fig. 6.6, the interval [0, 0.2) is expanded to the full height of the figure and its end points are labelled with the values of the narrowed range.
Table 6 .1 Arithmetic coding example
The narrowed range is then subdivided in accordance with the original source symbol probabilities, and the process continues with the next message symbol: symbol a2 narrows the subinterval to [0.04, 0.08), a3 further narrows it to [0.056, 0.072), and so on. The final message symbol, which must be reserved as a special end-of-message indicator, narrows the range to [0.06752, 0.0688). Of course, any number within this subinterval, for example 0.068, can be used to represent the message.

In the arithmetically coded message of Fig. 6.6, three decimal digits are used to represent the five-symbol message. This translates into 3/5, or 0.6, decimal digits per source symbol and compares favourably with the entropy of the source, which is 0.58 decimal digits (10-ary units) per symbol.
As the length of the sequence being coded increases, the resulting arithmetic code approaches the bound established by the noiseless coding theorem. In practice, two factors cause coding performance to fall short of this bound: (1) the addition of the end-of-message indicator needed to separate one message from another, and (2) the use of finite-precision arithmetic. Practical implementations of arithmetic coding address the latter problem with a scaling strategy and a rounding strategy (Langdon and Rissanen [1981]). The scaling strategy renormalizes each subinterval to the [0, 1) range before subdividing it in accordance with the symbol probabilities. The rounding strategy guarantees that the truncations associated with finite-precision arithmetic do not prevent the coding subintervals from being represented accurately.
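The interval narrowing of Fig. 6.6 can be reproduced with a short sketch. The subintervals below are those implied by the worked example (a1: [0, 0.2), a2: [0.2, 0.4), a3: [0.4, 0.8), a4: [0.8, 1.0), with a4 doubling as the end-of-message indicator — an assumption, since Table 6.1 itself is not reproduced here); encoding a1 a2 a3 a3 a4 narrows the interval to [0.06752, 0.0688).

def arithmetic_interval(message, intervals):
    """Narrow [0, 1) according to each symbol's subinterval; return final [low, high)."""
    low, high = 0.0, 1.0
    for symbol in message:
        s_low, s_high = intervals[symbol]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return low, high

intervals = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}
low, high = arithmetic_interval(["a1", "a2", "a3", "a3", "a4"], intervals)
print(low, high)   # about 0.06752 0.0688 (up to floating-point rounding);
                   # any number inside, e.g. 0.068, codes the whole message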
6.5 LZW Coding:
Lempel-Ziv-Welch (LZW) coding assigns fixed-length code words to variable-length sequences of source symbols and requires no a priori knowledge of the probabilities with which the symbols occur. LZW compression has been integrated into a variety of mainstream image file formats, including the graphic interchange format (GIF), the tagged image file format (TIFF), and the portable document format (PDF).

LZW coding is conceptually very simple. At the start of the coding process, a codebook or "dictionary" containing the source symbols to be coded is constructed. For 8-bit monochrome images, the first 256 words of the dictionary are assigned to the grey values 0, 1, 2, ..., 255. As the encoder sequentially examines the image's pixels, grey-level sequences that are not in the dictionary are placed in algorithmically determined (for example, the next unused) locations. If the first two pixels of the image are white, for instance, the sequence "255-255" might be assigned to location 256, the address following the locations reserved for grey levels 0 through 255. The next time two consecutive white pixels are encountered, code word 256, the address of the location containing sequence 255-255, is used to represent them. If a 9-bit, 512-word dictionary is employed in the coding process, the original (8 + 8) bits used to represent the two pixels are replaced by a single 9-bit code word. Clearly, the size of the dictionary is an important system parameter. If it is too small, the detection of matching grey-level sequences is less likely; if it is too large, the size of the code words adversely affects compression performance.
Consider the following 4 x 4, 8 -bit image of a vertical edge:
The steps involved in coding its 16 pixels are detailed in Table 6.3. A 512-word dictionary with the following starting content is assumed: locations 0 through 255 contain the grey levels 0 through 255, and locations 256 through 511 are initially unused. The image is encoded by processing its pixels in a left-to-right, top-to-bottom manner. Each successive grey-level value is concatenated with a variable, column 1 of Table 6.3, called the "currently recognized sequence". As can be seen, this variable is initially null or empty. The dictionary is searched for each concatenated sequence and, if found (i.e., located in the dictionary), as in the first row of the table, the currently recognized sequence is replaced by the newly concatenated and recognized sequence. This was done in column 1 of row 2.
In this case, no output codes are generated and the dictionary is not altered. If the concatenated sequence is not found, however, the address of the currently recognized sequence is output as the next encoded value, the concatenated but unrecognized sequence is added to the dictionary, and the currently recognized sequence is reinitialized to the current pixel value. This occurred in row 2 of the table. The last two columns detail the grey-level sequences that are added to the dictionary as the entire 4 × 4 image is scanned. Nine additional code words are defined. At the conclusion of coding, the dictionary contains 265 code words, and the LZW algorithm has successfully identified several repeating grey-level sequences, using them to reduce the original 128-bit image to 90 bits (i.e., ten 9-bit codes). The encoded output is obtained by reading the third column from top to bottom. The resulting compression ratio is 1.42 : 1.

A unique feature of the LZW coding just demonstrated is that the coding dictionary or codebook is created while the data are being encoded. Remarkably, an LZW decoder builds an identical decompression dictionary as it simultaneously decodes the encoded data stream. Although not needed in this example, most practical applications require a strategy for handling dictionary overflow. A simple solution is to flush or reinitialize the dictionary when it becomes full and continue coding with a new initialized dictionary. A more complex option is to monitor compression performance and flush the dictionary when it becomes poor or unacceptable. Alternatively, the least recently used dictionary entries can be tracked and replaced when necessary.
Table 6.3 LZW coding example
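A minimal LZW encoder sketch along the lines just described. The 4 × 4 vertical-edge image below is hypothetical (two assumed intensities, 30 and 200), since the original example's pixel values are not reproduced here; with the dictionary initialized to the 256 intensities, the repeated two-pixel and longer sequences become new dictionary entries and the 16 pixels are reduced to 10 code words.

def lzw_encode(pixels):
    """LZW-encode a 1-D sequence of 8-bit intensities; return the list of code words."""
    dictionary = {(i,): i for i in range(256)}   # locations 0..255 hold the intensities
    next_code = 256
    recognized = ()                              # currently recognized sequence
    output = []
    for p in pixels:
        candidate = recognized + (p,)
        if candidate in dictionary:
            recognized = candidate               # keep growing the recognized sequence
        else:
            output.append(dictionary[recognized])
            dictionary[candidate] = next_code    # add the unrecognized sequence
            next_code += 1
            recognized = (p,)
    if recognized:
        output.append(dictionary[recognized])
    return output

# Hypothetical 4x4 vertical-edge image, scanned left to right, top to bottom
image = [30, 30, 200, 200] * 4
codes = lzw_encode(image)
print(codes, len(codes))   # ten 9-bit code words instead of sixteen 8-bit pixels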
6.6 Run-Length Coding
Images with repeating intensities along their rows (or columns) can often be compressed by representing runs of identical intensities as run-length pairs, where each pair specifies the start of a new intensity and the number of consecutive pixels that have that intensity. The technique, referred to as run-length encoding (RLE), was developed in the 1950s and, together with its 2-D extensions, became the standard compression approach in facsimile coding. Compression is achieved by eliminating a simple form of spatial redundancy: groups of identical intensities. When there are few (or no) runs of identical pixels, run-length encoding results in data expansion.

Run-length encoding (RLE) is not used as commonly on its own these days, but it is an excellent way of getting a feel for some of the issues involved in compression. Imagine that we have a simple black-and-white image.
One way a computer can store this image in binary is to use '0' for a white pixel and '1' for a black pixel (this is a "bitmap", since the pixels have been mapped to bits). Using this method, the image above would be represented as follows:
100000111000001
000001111100000
000011111110000
000111111111000
001111101111100
011111000111110
111110000011111
011111000111110
001111101111100
000111111111000
000011111110000
000001111100000
100000111000001
011000010000110
The key question in compression is whether the same image can be represented using fewer bits, while still allowing the original image to be reconstructed.

It can. There are various ways of doing this; in this section we concentrate on a method called run-length encoding.

Imagine reading the bits above to someone who is copying them down: rather than saying "zero zero zero zero zero", you would probably say "five zeroes". That is the basic idea behind run-length encoding (RLE), which is used to save space when storing digital images. In run-length encoding, each row is replaced by numbers that indicate how many consecutive pixels have the same colour, always beginning with the number of white pixels. For example, the first row of the image above contains one white, two black, four white, one black, four white, two black, and one white pixel:
011000010000110
This could be represented as follows.
1, 2, 4, 1, 4, 2, 1
For the second row, since we have to give the number of white pixels before the number of black pixels, we must state explicitly that there are zero white pixels at the beginning of the row.
100000111000001
You may ask why we must first give the number of white pixels even when it is zero. The reason is that without a fixed rule, the computer would not know which colour was which when an image stored in this form is displayed. The second row is therefore coded as:
0, 1, 5, 3, 5, 1
The third row contains five whites, five blacks, five whites.
000001111100000
This is coded as:
5, 5, 5
That means we get the following representation for the first three rows.
1, 2, 4, 1, 4, 2, 1
0, 1, 5, 3, 5, 1
5, 5, 5
You can work out what the oth er rows would be following this same system. munotes.in
Page 156
156IMAGE PROCESSING
Representation for the remaining rows
The remaining rows are
4, 7, 4
3, 9, 3
2, 5, 1, 5, 2
1, 5, 3, 5, 1
0, 5, 5, 5
1, 5, 3, 5, 1
2, 5, 1, 5, 2
3, 9, 3
4, 7, 4
5, 5, 5
0, 1, 5, 3, 5, 1
1, 2, 4, 1, 4, 2, 1
Converting run length encoding back to the original representation
Just to ensure that we can reverse the compression process, have a go at finding
the original representation (zeroes and ones) of this (compressed) image.
4, 11, 3
4, 9, 2, 1, 2
4, 9, 2, 1 , 2
4, 11, 3
4, 9, 5
4, 9, 5
5, 7, 6
0, 17, 1
1, 15, 2
The CCITT (Consultative Committee for International Telegraphy and Telephony) run-length coding standards are used for encoding binary and grey-level images. With this method, an image is scanned row by row and each run is identified; the output is a vector of (pixel value, run length) pairs. The entropy of the run-length coded output is

H_run-length = (H0 + H1) / (L0 + L1)

where
H0 = entropy of the black runs
H1 = entropy of the white runs
L0 = average length of the black runs
L1 = average length of the white runs
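A short sketch of the row-wise scheme described in this section: each bitmap row is converted to run lengths beginning with the white ('0') count, and the decoder reverses the process.

from itertools import groupby

def rle_encode_row(row):
    """Run-length encode a row of '0'/'1' characters, starting with the white ('0') count."""
    runs = [(ch, sum(1 for _ in grp)) for ch, grp in groupby(row)]
    counts = [n for _, n in runs]
    if runs and runs[0][0] == "1":
        counts.insert(0, 0)          # row starts with black: emit a leading zero
    return counts

def rle_decode_row(counts):
    """Rebuild the '0'/'1' row from alternating white/black run lengths."""
    bits, colour = [], "0"
    for n in counts:
        bits.append(colour * n)
        colour = "1" if colour == "0" else "0"
    return "".join(bits)

row = "011000010000110"
print(rle_encode_row(row))                 # [1, 2, 4, 1, 4, 2, 1]
print(rle_decode_row([0, 1, 5, 3, 5, 1]))  # 100000111000001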
6.7 Symbol-Based Coding
Symbol-based coding is also referred to as token-based coding. In this approach, an image is represented as a collection of frequently occurring sub-images, called symbols. Each symbol is stored in a symbol dictionary, and the image is coded as a set of triplets (x1, y1, t1), (x2, y2, t2), ..., where each (xi, yi) specifies the location of a symbol in the image and ti is the token, a reference to the symbol in the dictionary. Each triplet thus represents one instance of a dictionary symbol in the image.
Consider the simple bilevel image shown in Fig. 6.7(a). It contains a single word, banana, which is composed of three distinct symbols: one b, three a's, and two n's. Assuming that the letter b is the first symbol identified during the coding process, its 9 × 7 bitmap is stored in location 0 of the symbol dictionary, and, as illustrated in Fig. 6.7(b), the token identifying the b bitmap is 0. Thus, the first triplet in the encoded image's representation [see Fig. 6.7(c)] is (0, 2, 0), indicating that the upper-left corner (an arbitrary convention) of the rectangular bitmap representing the b symbol is to be placed at location (0, 2) in the decoded image. After the bitmaps for the a and n symbols have been identified and added to the dictionary, the remainder of the image can be encoded with five additional triplets. As long as the six triplets required to locate the symbols in the image, together with the three bitmaps required to define them, are smaller than the original image, compression occurs. Assuming that each triplet is composed of three bytes and that the original image contains 459 bits, the compressed representation requires 6 × 3 × 8 + [(9 × 7) + (6 × 7) + (6 × 6)] = 285 bits, and the resulting compression ratio is C = 459/285 ≈ 1.61. To decode the symbol-based representation in Fig. 6.7(c), you simply read the bitmaps of the symbols specified in the triplets from the symbol dictionary and place them at the spatial coordinates specified in each triplet.
FIGURE 6.7 (a) Bi-level document, (b) symbol dictionary, and (c) the triplets used to locate the symbols in the document.
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Symbol-based compression was proposed in the early 1970s, but it has become practical only relatively recently. Advances in symbol-matching algorithms and increased CPU processing speeds have made it possible both to select dictionary symbols and to find where they occur in an image in a reasonable amount of time. And, as with many other compression methods, symbol-based decoding is significantly faster than encoding. Finally, we note that both the bitmaps of the symbols stored in the dictionary and the triplets used to reference them can themselves be encoded to further improve compression performance. If only exact symbol matches are allowed, the resulting compression is lossless, as in Fig. 6.7; if small differences are permitted, some level of reconstruction error will be present.
6.8 Bit-Plane Coding:
An effective way to reduce an image's interpixel redundancies is to process the image's bit planes individually. The technique, called bit-plane coding, is based on the concept of decomposing a multilevel (monochrome or colour) image into a series of binary images.
Bit-plane decomposition:
The grey levels of an m-bit grey-scale image can be represented in the form of the base-2 polynomial

a_{m−1} 2^{m−1} + a_{m−2} 2^{m−2} + ... + a_1 2^1 + a_0 2^0

Based on this property, a simple method of decomposing the image into a collection of binary images is to separate the m coefficients of the polynomial into m 1-bit bit planes. The zeroth-order bit plane is generated by collecting the a_0 bits of each pixel, while the (m − 1)st-order bit plane contains the a_{m−1} bits or coefficients. In general, each bit plane numbered 0 through m − 1 is constructed by setting its pixels equal to the values of the appropriate bits (polynomial coefficients) from each pixel of the original image.

The inherent disadvantage of this decomposition is that small changes in grey level can have a significant impact on the complexity of the bit planes. If a pixel of intensity 127 (01111111) is adjacent to a pixel of intensity 128 (10000000), for instance, every bit plane will contain a corresponding 0 to 1 (or 1 to 0) transition, because the two binary codes differ in every bit position; in particular, bit plane 7 will contain a zero-valued pixel next to a pixel of value 1.

An alternative decomposition approach (which reduces the effect of small grey-level variations) is to first represent the image by an m-bit Gray
code, defined by

g_i = a_i ⊕ a_{i+1},   0 ≤ i ≤ m − 2
g_{m−1} = a_{m−1}

where ⊕ denotes the exclusive-OR operation. The m-bit Gray code g_{m−1} ... g_2 g_1 g_0 corresponds to the polynomial given above. The unique property of this code is that successive code words differ in only one bit position, so small changes in grey level are unlikely to affect all m bit planes. For instance, when grey levels 127 and 128 are adjacent, only the 7th bit plane will contain a 0 to 1 transition, because the Gray codes of 127 and 128 are 01000000 and 11000000, respectively.
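The sketch below (NumPy assumed) uses the equivalent closed form g = a XOR (a >> 1) for the Gray code defined above, extracts bit planes, and confirms that the adjacent values 127 and 128 differ in all eight binary bit planes but in only one Gray-coded bit plane.

import numpy as np

def gray_code(a):
    """Binary-reflected Gray code g = a XOR (a >> 1), equivalent to g_i = a_i ^ a_(i+1)."""
    return a ^ (a >> 1)

def bit_planes(img, m=8):
    """Return the m bit planes of an integer image, plane 0 = least significant bit."""
    return [(img >> i) & 1 for i in range(m)]

img = np.array([[127, 128]], dtype=np.uint16)    # adjacent grey levels 127 and 128
binary_planes = bit_planes(img)
gray_planes = bit_planes(gray_code(img))

print(format(127, "08b"), format(int(gray_code(np.uint16(127))), "08b"))  # 01111111 01000000
print(format(128, "08b"), format(int(gray_code(np.uint16(128))), "08b"))  # 10000000 11000000
print(sum(int(p[0, 0] != p[0, 1]) for p in binary_planes))  # 8 planes show a transition
print(sum(int(p[0, 0] != p[0, 1]) for p in gray_planes))    # only 1 plane does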
6.9 Transform Coding:
Predictive coding methods operate directly on the pixels of an image and are therefore spatial-domain methods. Here we consider compression techniques that are based on modifying the transform of an image. In transform coding, a reversible linear transform (such as the Fourier transform) is used to map the image into a set of transform coefficients, which are then quantized and coded. For most natural images, a significant number of the coefficients have small magnitudes and can be coarsely quantized (or discarded entirely) with little image distortion. A variety of transforms, including the discrete Fourier transform (DFT), can be used to transform the image data.
Fig. 6.8 A transform coding system: (a) encoder; (b) decoder.
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figure 6.8 shows a typical transform coding system. The decoder implements the inverse sequence of steps of the encoder, which performs four relatively straightforward operations (with the exception of the quantization function): subimage decomposition, transformation, quantization, and coding. An N × N image is first subdivided into subimages of size n × n, which are then transformed to generate (N/n)^2 subimage transforms, each of size n × n. The goal of the transformation process is to decorrelate the pixels of each subimage, or to pack as much information as possible into the smallest number of transform coefficients. The quantization stage then selectively eliminates or more coarsely quantizes the coefficients that carry the least information; these coefficients have the smallest impact on reconstructed subimage quality. The encoding process terminates by coding (normally with a variable-length code) the quantized coefficients. Any or all of these transform
encoding steps can be adapted to the local image content or fixed for all subimages; the latter is known as nonadaptive transform coding.
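As a small illustration of the transform coding idea, the sketch below uses the discrete cosine transform of a single 8 × 8 subimage (SciPy's dctn and idctn are assumed to be available), zeroes the coefficients of small magnitude, and reconstructs the subimage with little error.

import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)

# Hypothetical smooth 8x8 subimage: a gentle horizontal ramp plus mild noise
x = np.arange(8)
subimage = 100 + 10 * x[np.newaxis, :] + rng.normal(0, 1, (8, 8))

coeffs = dctn(subimage, norm="ortho")          # forward transform of the subimage
mask = np.abs(coeffs) > 5.0                    # keep only the significant coefficients
kept = int(mask.sum())
reconstructed = idctn(coeffs * mask, norm="ortho")

rms = np.sqrt(np.mean((reconstructed - subimage) ** 2))
print(f"kept {kept} of 64 coefficients, rms reconstruction error {rms:.2f}")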
6.10 Predictive Coding
Lossless Predictive Coding:
The error-free compression approach discussed here does not require decomposition of an image into a collection of bit planes. The technique, commonly referred to as lossless predictive coding, is based on eliminating the interpixel redundancies of closely spaced pixels by extracting and coding only the new information in each pixel. The new information of a pixel is defined as the difference between its actual value and its predicted value.

Figure 6.9 shows the basic components of a lossless predictive coding system. The system consists of an encoder and a decoder, each containing an identical predictor. As each pixel of the input image, denoted f(n), is introduced to the encoder, the predictor generates the anticipated value of that pixel based on some number of past inputs. The output of the predictor is then rounded to the nearest integer, denoted f̂(n), and used to form the difference or prediction error

e(n) = f(n) − f̂(n)

which is coded with a variable-length code (by the symbol encoder) to generate the next element of the compressed data stream.
Fig. 6.9 A lossless predictive coding model: (a) encoder; (b) decoder
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The decoder of Fig. 6.9(b) reconstructs e(n) from the received variable-length code words and performs the inverse operation to decompress or recreate the original input sequence:

f(n) = e(n) + f̂(n)

Various local, global, and adaptive methods can be used to generate f̂(n). In most cases, however, the prediction is formed as a linear combination of m previous pixels. That is,

f̂(n) = round[ Σ_{i=1}^{m} α_i f(n − i) ]
where m is the order of the linear predictor, "round" denotes the rounding or nearest-integer operation, and the α_i, for i = 1, 2, ..., m, are prediction coefficients. The subscript n indexes the predictor outputs in accordance with their time of occurrence in a raster-scan application; that is, in the equations above the more explicit notations f(t), f̂(t), and e(t) could be used, where t represents time. In other cases, n is used as an index on the spatial coordinates and/or frame number (in a time sequence of images). In 1-D linear predictive coding, for example, the prediction equation can be written

f̂(x, y) = round[ Σ_{i=1}^{m} α_i f(x, y − i) ]

where each subscripted variable is now expressed explicitly as a function of the spatial coordinates x and y. The equation indicates that the 1-D linear prediction f̂(x, y) is a function of the previous pixels on the current line alone. In 2-D predictive coding, the prediction is a function of the previous pixels in a left-to-right, top-to-bottom scan of the image. In the 3-D case, it is based on these pixels and the previous pixels of preceding frames. The first m pixels of each line cannot be evaluated with the equation above, so those pixels must be coded by other means (such as a Huffman code) and are considered an overhead of the predictive coding process.
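A minimal sketch of lossless predictive coding with a first-order previous-pixel predictor (m = 1, α1 = 1), assuming NumPy; the first pixel of each row is kept unchanged as the overhead mentioned above, and decoding reproduces the input exactly.

import numpy as np

def predictive_encode(img):
    """Previous-pixel predictor: e(x, y) = f(x, y) - f(x, y-1); column 0 kept as overhead."""
    img = img.astype(np.int32)
    errors = img.copy()
    errors[:, 1:] = img[:, 1:] - img[:, :-1]
    return errors

def predictive_decode(errors):
    """Invert the prediction: f(x, y) = e(x, y) + f(x, y-1)."""
    return np.cumsum(errors, axis=1)

img = np.tile(np.array([20, 21, 23, 23, 90, 91, 91, 92]), (4, 1))
errors = predictive_encode(img)
print(errors[0])                                       # most errors cluster near zero
print(np.array_equal(predictive_decode(errors), img))  # True: lossless reconstruction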
Lossy Predictive Coding:
In this type of coding, a quantizer is added to the lossless predictive coding model, and the resulting trade-off between reconstruction accuracy and compression performance is examined. As Fig. 6.10 shows, the quantizer, which replaces the nearest-integer rounding of the error-free encoder, is inserted between the symbol encoder and the point at which the prediction error is formed. It maps the prediction error into a limited range of outputs, denoted ė(n), which establish the amount of compression and distortion associated with lossy predictive coding.
Fig. 6.10 A lossy predictive coding model: (a) encoder and ( b) decoder .
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
To accommodate the insertion of the quantization step, the error-free encoder must be altered so that the predictions generated by the encoder and decoder are equivalent. As Fig. 6.10(a) shows, this is accomplished by placing the lossy encoder's predictor within a feedback loop, where its input, denoted ḟ(n), is generated as a function of past predictions and the corresponding quantized errors. That is,

ḟ(n) = ė(n) + f̂(n)

This closed-loop configuration prevents error buildup at the decoder's output. Note from Fig. 6.10(b) that the output of the decoder is also given by the equation above.
Optimal predictors:
The optimal predictor used in most predictive coding applications minimizes the encoder's mean-square prediction error

E{ e^2(n) } = E{ [ f(n) − f̂(n) ]^2 }

where E{·} denotes the statistical expectation operator, subject to the constraint that

ḟ(n) = ė(n) + f̂(n) ≈ e(n) + f̂(n) = f(n)

That is, the optimization criterion is chosen to minimize the mean-square prediction error, the quantization error is assumed to be negligible [ė(n) ≈ e(n)], and the prediction is constrained to a linear combination of m previous pixels. These restrictions are not essential, but they simplify the analysis considerably and, at the same time, decrease the computational complexity of the predictor. The resulting predictive coding approach is referred to as differential pulse code modulation (DPCM).
6.11 Wavelet Coding:
Wavelet coding is based on the idea that the coefficients of a transform that decorrelates the pixels of an image can be coded more efficiently than the original pixels themselves. If the basis functions of the transform pack most of the important visual information into a small number of coefficients, the remaining coefficients can be quantized coarsely or truncated to zero with little image distortion.

Figure 6.11 shows a typical wavelet coding system. To encode a 2^J × 2^J image, an analyzing wavelet ψ and a minimum decomposition level J − P are selected and used to compute the discrete
wavelet transform of the image. If the wavelet has a companion scaling function φ, the fast wavelet transform can be used for this computation. In either case, the computed transform converts a large portion of the original image to horizontal, vertical, and diagonal decomposition coefficients with zero mean and Laplacian-like distributions.
Reference from "Digital Image Processing fourth edit ion by
Rafael C. Gonzalez and Richard E. Woods)
Because many of the computed coefficients carry little visual information, they can be quantized and coded to minimize intercoefficient and coding redundancy. Moreover, the quantization can be adapted to exploit any positional correlation across the P decomposition levels. One or more lossless coding methods, such as run-length, Huffman, arithmetic, or bit-plane coding, can be incorporated into the final symbol coding step. Decoding is accomplished by inverting the encoding operations, with the exception of quantization, which cannot be reversed exactly.

The principal difference between the wavelet-based system and the transform coding system is the omission of the transform coder's subimage processing stages. Because wavelet transforms are both computationally efficient and inherently local (that is, their basis functions are limited in duration), subdivision of the original image is unnecessary.
Wavelet Selection
The wavelets chosen as the basis of the forward and inverse transforms in Fig. 6.11 affect all aspects of wavelet coding system design and performance. They impact directly the computational complexity of the transforms and, less directly, the system's ability to compress and reconstruct images of acceptable error. When the transforming wavelet has a companion scaling function, the transformation can be implemented as a sequence of digital filtering operations, with the number of filter taps equal to the number of nonzero wavelet and scaling vector coefficients. The ability of the wavelet to pack information into a small number of transform coefficients determines its compression and reconstruction performance. The most widely used expansion functions for wavelet-based compression are the Daubechies wavelets and biorthogonal wavelets. The latter allow useful analysis properties, such as the number of vanishing moments, to be incorporated into the decomposition filters, while important synthesis properties, such as smoothness of reconstruction, are incorporated into the reconstruction filters.
Figure 6.12 shows four discrete wavelet transforms of the same image. Haar wavelets, the simplest and the only discontinuous wavelets considered in this example, were used as the expansion or basis functions in Fig. 6.12(a). Daubechies wavelets, among the most popular imaging wavelets, were used in Fig. 6.12(b), and symlets, which are Daubechies wavelets with increased symmetry, were used in Fig. 6.12(c). The Cohen-Daubechies-Feauveau wavelets employed in Fig. 6.12(d) are included to illustrate biorthogonal wavelets. As in earlier results, all detail coefficients were scaled so that coefficient value 0 is displayed as intensity 128, making the underlying structure visible.
FIGURE 6.12
Three -scale wavelet transforms with respect to (a) Haar wavelets, (b) Daubechies
wavelets, (c) symlets, and (d) Cohen -Daubechies -Feauveau bior thogonal wavelets.
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
As Table 6.2 shows, the number of operations required to compute the transforms in Fig. 6.12 increases from four to twenty-eight multiplications and additions per coefficient (for each decomposition level) as you move from Fig. 6.12(a) to (d). All four transforms were computed using a fast wavelet transform (i.e., filter bank) formulation. Notably, as the computational complexity (i.e., the number of filter taps) increases, so does the information-packing performance. When Haar wavelets are used and detail coefficients smaller than 1.5 are truncated to zero, 33.8 percent of the total transform is zeroed out; with the more sophisticated biorthogonal wavelets, the percentage of zeroed coefficients rises to 42.1 percent, nearly doubling the potential compression.
TABLE 6.2
Wavelet transform filter taps and zeroed coefficients when truncating the transforms in Fig. 6.12 below 1.5

Wavelet         Filter Taps (Scaling + Wavelet)   Zeroed Coefficients
Haar            2 + 2                             33.3%
Daubechies      8 + 8                             40.9%
Symlet          8 + 8                             41.2%
Biorthogonal    17 + 11                           42.1%
6.12 Digital Image Watermarking
Digital media and the Internet make it possible to distribute images (as photographs or video frames) quickly and widely. Unfortunately, images distributed in this way can be copied repeatedly and without error, putting the rights of their owners at risk. Even when images are encrypted for distribution, they become vulnerable once decrypted. One way to discourage unauthorized duplication is to embed one or more items of information, collectively called a watermark, into potentially vulnerable images in such a way that the watermarks become inseparable from the images themselves. As integral parts of the watermarked images, watermarks protect the rights of their owners in a variety of ways, including the following:
1. Copyright identification. When an owner's rights are violated, watermarks
can provide information that serves as proof of ownership.
2. Identification or fingerprinting of the user. Legal users' identities can be
embedded in watermarks and used to track down the sources of unauthorized
copies.
3. Authenticity determination. A watermark can serve as a guarantee that an
image has not been altered, provided the watermark is designed to be
destroyed upon image modification.
4. Automated m onitoring. Watermarks can be tracked by systems that keep
track of when and where images are used (for example, programmes that scan
the Web for images embedded in Web sites). Monitoring is beneficial for
collecting royalties and/or locating unauthorized u sers.
5. Copy protection. Watermarks can be used to set usage and copying restrictions on images (e.g., to DVD players)
In this section we discuss digital image watermarking, the process of inserting data into an image in such a way that it can be used to make an assertion about the image. The methods described bear no resemblance to the compression techniques discussed previously (although they do involve the coding of information). Indeed, watermarking and compression are in some ways opposites: whereas compression seeks to reduce the amount of data used to represent images, watermarking adds information (the watermarks) to them. Watermarks can be visible or invisible.
A visible watermark is an opaque or semi-transparent subimage or image that is superimposed on another image (the image being watermarked) so that it is obvious to the viewer. Television networks frequently place visible watermarks (modeled after their logos) in the upper or lower right-hand corner of the screen. As the following example shows, visible watermarking is usually performed in the spatial domain.
Figure 6.13(b) shows an image whose lower right quadrant contains a scaled-down version of the watermark in Fig. 6.13(a). Denoting the watermarked image by f_w, we can express it as a linear combination of the unmarked image f and the watermark w:
FIGURE 6.13
A simple visible watermark: (a) watermark; (b) the watermarked image; and (c) the
difference between the wate rmarked image and the original (non -watermarked)
image
Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez
and Richard E. Woods)
f_w = (1 − α) f + α w
where the constant α controls the relative visibility of the watermark and the underlying image. If α = 1, the watermark is opaque and completely obscures the underlying image; as α approaches 0, more of the underlying image and less of the watermark is seen. In general, 0 < α ≤ 1; in Fig. 6.13(b), α = 0.3. The computed (intensity-scaled) difference between the watermarked image in Fig. 6.13(b) and the unmarked image is shown in Fig. 6.13(c), where an intensity of 128 represents a difference of 0. Note that the underlying image is visible through the "semi-transparent" watermark, as both Fig. 6.13(b) and the difference image in Fig. 6.13(c) demonstrate.
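As a minimal illustration of this spatial-domain blending (a sketch only, assuming 8-bit grayscale NumPy arrays f and w of equal size; the function name is hypothetical):

import numpy as np

def visible_watermark(f, w, alpha=0.3):
    # f_w = (1 - alpha) * f + alpha * w, computed in floating point and
    # clipped back to the 8-bit range of the original image.
    fw = (1.0 - alpha) * f.astype(np.float64) + alpha * w.astype(np.float64)
    return np.clip(fw, 0, 255).astype(np.uint8)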
Invisible watermarks, in contrast to the visible watermark of the previous example, cannot be seen with the naked eye. They are imperceptible, yet they can be recovered with an appropriate decoding algorithm. Their invisibility is ensured by inserting them as visually redundant information, that is, information the human visual system ignores or cannot perceive. The example in Fig. 6.14(a) is a simple one. Because the least significant bits of an 8-bit image have little effect on our perception of it, the watermark of Fig. 6.13(a) was inserted, or "hidden," in the image's two least significant bits. Using the notation introduced previously, we let
f_w = 4 (f / 4) + w / 64
and carry out the computations using unsigned integer arithmetic. Dividing and then multiplying f by 4 sets its two least significant bits to 0; dividing w by 64 shifts its two most significant bits into the two least significant bit positions; and adding the two results produces the LSB watermarked image. Note that the embedded watermark is not visible in Fig. 6.14(a). However, by zeroing the image's six most significant bits and scaling the remaining values to the full intensity range, the watermark can be extracted as shown in Fig. 6.14(b).
FIGURE 6.14 A simple invisible watermark: (a) watermarked image; (b) the
extracted watermark; (c) the watermarked image after high quality JPEG compression and decompression; and (d) the extracted watermark from (c).
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
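A minimal sketch of the LSB embedding and extraction just described, assuming f and w are 8-bit NumPy arrays of the same size (the function names are illustrative only):

import numpy as np

def embed_lsb(f, w):
    # 4*(f//4) zeroes the two least significant bits of f; w//64 shifts the two
    # most significant bits of w into the two least significant bit positions.
    return (4 * (f // 4) + w // 64).astype(np.uint8)

def extract_lsb(fw):
    # Keep only the two least significant bits and rescale them to [0, 255].
    return ((fw & 0x03) * 85).astype(np.uint8)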
A critical property of invisible watermarks is their resistance to removal attempts, both accidental and intentional. Fragile invisible watermarks are destroyed when the images in which they are embedded are modified, a useful property in applications such as image authentication. As illustrated in Figs. 6.14(c) and (d), the LSB watermarked image in Fig. 6.14(a) contains a fragile invisible watermark: it is destroyed if the image in (a) is compressed and decompressed with lossy JPEG. Figure 6.14(c) shows the result of compressing and decompressing Fig. 6.14(a); the rms error is 2.1 intensity levels. When the same procedure as in (b) is used to extract the watermark from this image, the result is meaningless [see Fig. 6.14(d)]. Although lossy compression and decompression
preserved the image's important visual information, they destroyed the fragile watermark.

Robust invisible watermarks are designed to survive image modification, whether accidental or intentional. Common inadvertent attacks include lossy compression, linear and nonlinear filtering, cropping, rotation, and resampling. Intentional attacks include printing and rescanning, as well as the insertion of additional watermarks and/or noise. Of course, there is no need to withstand attacks that leave the image unusable.
6.13 Summary
We have seen different compression techniques with examples, the various types of image compression algorithms used in image processing applications, and watermarking techniques for digital images.
6.14 Unit End Questions
1. What is data compression?
2. Write a short note on:
   a) Coding Redundancy  b) Interpixel Redundancy  c) Psychovisual Redundancy
3. What is a Fidelity Criterion?
4. Explain Huffman coding with an example.
5. Explain Golomb coding with an example.
6. Explain arithmetic coding with an example.
7. Explain LZW coding with an example.
8. Explain run-length coding with an example.
9. Explain symbol-based coding with an example.
10. Write a short note on Block Transform Coding.
11. Write a short note on Predictive Coding.
12. Write a short note on Wavelet Coding.
13. Write a short note on Digital Image Watermarking.
6.15 Reference for further reading
1. Digital Image Processing, Rafael C. Gonzalez and Richard E. Woods, Pearson/Prentice Hall, Fourth Edition, 2018.
2. Fundamentals of Digital Image Processing, A. K. Jain, PHI.
3. The Image Processing Handbook, J. C. Russ, CRC Press, Fifth Edition, 2010.
4. All images courtesy of Digital Image Processing by Gonzalez and Woods, 4th Edition.
Unit 4
7 MORPHOLOGICAL IMAGE PROCESSING
Unit Structure
7.0 Objective
7.1 Preliminaries,
7.2 Erosion and Dilation
7.3 Opening and Closing,
7.4 The Hit -or-Miss Transform
7.5 Morphological Algorithms
7.6 Morphological Reconstruction
7.7 Morphological Operations on Binary Images
7.8 Grayscale Morphology
7.9 Summary
7.10 Unit End questions
7.11 Reference for further reading
7.0 Objective
Upon completion of this chapter, the reader will:
• Understand a number of important concepts of mathematical morphology and how the method is applied extensively in digital image processing.
• Know the basic tools of binary morphology, including erosion, dilation, opening, and closing, and how these tools can be combined to build more complex operators.
• Be able to develop binary morphological algorithms for tasks such as morphological smoothing, edge detection, and skeletonization.
• Understand how binary morphology can be extended to grayscale images.
• Be able to develop grayscale morphological algorithms for tasks such as textural segmentation, granulometry, grayscale gradient computation, and others.
7.1 Preliminaries
Morphological image processing is a collection of non-linear image processing operations related to the shape, or morphology, of features in an image. As per Wikipedia, morphological operations depend only on the relative ordering of pixel values, not on their numerical values. They can therefore also be applied to grayscale images whose light transfer functions are unknown and whose absolute pixel values are of no or minor interest.

Morphological techniques probe an image with a small shape or template called a structuring element. The structuring element is positioned at all possible locations in the image and compared with the corresponding neighborhood of pixels. Some operations test whether the element "fits" within the neighborhood, while others test whether it "hits" (intersects) the neighborhood:
Figure 7.1 Probing of an image with a structuring element
(white and grey pixels have zero and non -zero values, respectively).
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
A morphological operation on a binary image creates a new binary image in which a pixel has a non-zero value only if the test at that location in the input image is successful. The structuring element is itself a small binary image, i.e. a small matrix of pixels, each with a value of zero or one:
• The dimensions of the matrix specify the size of the structuring element.
• The pattern of ones and zeros specifies the shape of the structuring element.
• An origin of the structuring element is usually one of its pixels, although in general the origin can lie outside the structuring element.
[Reference from https://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/ImageProcessing-html/topic4.htm]
It is standard practice to use structuring matrices of odd dimensions, with the origin defined as the center of the matrix. Structuring elements play the same role in morphological image processing that convolution kernels play in linear image filtering.

When a structuring element is placed in a binary image, each of its pixels is associated with the corresponding pixel of the neighborhood under the structuring element. The structuring element is said to fit the image if, for each of its pixels set to 1, the corresponding image pixel is also 1. Similarly, a structuring element is said to hit, or intersect, an image if, for at least one of its pixels set to 1, the corresponding image pixel is also 1.
Figure 7.2 Fitting and hitting of a binary image with structuring elements s1 and s2.
[Reference from https://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/ImageProcessing-html/topic4.htm]
7.2 Erosion and dilation
Erosion
Erosion is the counterpart of dilation: where dilation expands an image, erosion shrinks it. How the image is shrunk is determined by the structuring element, which is typically much smaller than the image, commonly 3 × 3 in size; this also keeps the computation faster than with larger structuring elements. As with dilation, erosion moves the structuring element across the image from left to right and top to bottom. At each position, indicated by the structuring element's center, the procedure checks whether the structuring element overlaps the foreground completely. If complete overlap does not occur, the pixel under the structuring element's center is set to white (0). Let X denote the reference binary image and B the structuring element. Erosion is then defined by the equation
X ⊖ B = { z | (B)z ⊆ X }
According to this equation, an output element z is produced only where the translated structuring element is a subset of (contained in) the binary image X. Figure 7.3 illustrates the process; again, a white square represents 0 and a black square represents 1. The erosion process begins at position •. There is no complete overlap there, so the pixel at position • remains white. After the structuring element is shifted to the right, the same situation occurs: because there is no complete overlap at position ••, the black square marked •• is converted to white. The structuring element is then moved further, to the position indicated by •••. Here the overlap is complete, since all of the structuring element's black squares coincide with black squares of the image, so the pixel under the structuring element's center is set to black. The result after the structuring element has visited the last pixel is shown in Figure 7.3(b). Erosion is a thinning operator that shrinks an image: eroding an image removes narrow regions and thins wider ones.
Figure 7.3 Erosion
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Dilation
Dilation is the operation of expanding a binary image from its original shape. How the binary image is expanded is determined by the structuring element, which is smaller than the image itself, typically 3 × 3 pixels. Dilation is similar to convolution in that the structuring element is reflected and shifted from left to right and top to bottom; at each shift, the process checks whether any foreground pixel of the structuring element overlaps an identical pixel of the binary image. If there is such an overlap, the pixel under the structuring element's center is set to 1 (black).
Let X denote the reference image and B the structuring element. The dilation operation is defined by the equation

X ⊕ B = { z | (B̂)z ∩ X ≠ ∅ }

where B̂ is the reflection of B (its rotation by 180° about the origin). The equation states that when X is dilated by the structuring element B, an output element z is produced wherever at least one element of B̂, translated to z, intersects an element of X.
If this is the case, the position of the structuring element's center in the image is set to 'ON'. The procedure is depicted in Fig. 7.4(a); a black square represents 1 and a white square represents 0. Initially, the center of the structuring element is aligned at position •. There is no overlap between the black squares of B and the black squares of X at this point, so the square at position • remains white. The structuring element is then moved to the right. At position ••, one of B's black squares overlaps (intersects) a black square of X, so the square at position •• is turned black. Shifting the structuring element B from left to right and top to bottom over image X in this way produces the dilated image shown in Fig. 7.4(a).
The dilation operator thus enlarges binary objects. Dilation has a variety of uses, but its most common application is bridging gaps in an image, since B expands the features of X.
Figure 7.4 Dilation
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
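The following short sketch (assuming SciPy is available) applies binary erosion and dilation with a 3 × 3 structuring element of 1's; erosion keeps only the pixels where the structuring element fits entirely inside the foreground, and dilation keeps the pixels where it hits the foreground:

import numpy as np
from scipy import ndimage

A = np.array([[0, 0, 0, 0, 0, 0],
              [0, 1, 1, 1, 1, 0],
              [0, 1, 1, 1, 1, 0],
              [0, 1, 1, 1, 1, 0],
              [0, 0, 0, 0, 0, 0]], dtype=bool)

B = np.ones((3, 3), dtype=bool)                     # 3x3 structuring element

eroded  = ndimage.binary_erosion(A, structure=B)    # erosion of A by B ("fit" test)
dilated = ndimage.binary_dilation(A, structure=B)   # dilation of A by B ("hit" test)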
7.3 Opening and Closing
Apart from the two primary operations of erosion and dilation, there are two secondary operations that are important in morphological image processing: opening and its dual, closing. We concentrate mainly on opening; the corresponding properties of closing follow by complementation. Although opening is defined in terms of erosion and dilation, it has a more geometric formulation in terms of structuring-element fits, and this formulation serves as the basis for its implementation.
Opening
The opening of image A by structuring element B is denoted A ∘ B and is defined as erosion followed by dilation:

A ∘ B = (A ⊖ B) ⊕ B

In functional notation, the opening is written γB(A). To illustrate the role of opening in processing, we use the following equivalent formulation:

A ∘ B = ∪ { Bx : Bx ⊆ A }
Here the opening is determined by the union of all translates of the structuring element that fit inside the input image: each fit is found, and the opening is the union of the structuring-element translates at the locations where a fit occurs. This is exactly what eroding and then dilating accomplishes. In Fig. 7.5, a rectangle is eroded and then dilated by a disk to demonstrate how opening is expressed as erosion followed by dilation. The fitting effect is also apparent: opening has rounded the rectangle from the inside, the rounding being caused by the way the disk is "rolled around" inside the rectangle to form the union of the fits. If the structuring element had been a small square with a horizontal base, no rounding would have occurred and the opened image would have been identical to the original.
Figures 7.5 and 7.6 show two instances of opening. Opening with a disk creates a filter that smooths from the inside: it rounds corners that extend into the background. With a square structuring element the effect is quite different. Rather than treating the opened image as the final output of processing, we can take an alternative approach and subtract the opening from the input image. The resulting operator is called the opening top-hat:

A ∘̂ B = A − (A ∘ B)
Figure 7.5 (a) Structuring element, (b) input image, (c) erosion, (d) opening.
Figure 7.6. (a) Structuring element, (b) input image, (c) opening, (d) opening top -hat.
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
The opening top-hat in Fig. 7.5 consists of the corners of the input image that protrude into the background, and it can be used for recognition purposes. Figure 7.6 illustrates another application of the opening top-hat, the detection of gear teeth. Although the disk is often used because its shape effect is rotationally invariant, there are many circumstances in which other types of structuring elements are advantageous.
Figure 7.7 (a) Structuring element, (b) input image, (c) dilation, (d) closing
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
Closing
Closing is the operation complementary to opening: it is defined as a dilation followed by an erosion. The closing of A by B is denoted A • B and defined by

A • B = (A ⊕ B) ⊖ B

In functional notation, the closing is written φB(A). Closing is depicted in Figure 7.7. The effect is visible in the way the closing filters the shape from the outside, smoothing only the corners of the background that protrude into the image.
Because closing is the dual operator of opening,

A • B = (A^c ∘ B̂)^c

and, dually, opening can be expressed in terms of closing:

A ∘ B = (A^c • B̂)^c
Note how the structuring element is reflected in the closing. If the structuring element is a disk, or any other symmetric shape, the reflection is irrelevant. Using duality together with the union formulation of opening, closing can be interpreted as fitting, or "rolling the ball," around the outside of the image's perimeter. Figure 7.8 illustrates the duality between opening and closing with an asymmetric structuring element.
Because the closing contains the input image, subtracting the input image from the closing yields the closing top-hat operator:

A •̂ B = (A • B) − A
Figure 7.9 illustrates an application of closing and the closing top-hat. Closing a shape with a large disk approximates its convex hull, and the corresponding closing top-hat approximates its convex deficiencies. Because of their strong discriminatory power, these deficiencies are frequently used in character recognition.
Figure 7.8 Duality between open and close with a nonsymmetrical
structuring element.
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
Figure 7.9 (a) Input image, (b) closing by a disk, (d) closing top -hat.
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)

Consider the image and structuring element of Fig. 7.10, with the resulting opening and closing shown. As a filter, opening has cleaned the boundary by removing small extrusions, but it has done so far more finely than erosion, so that the opened image is a much closer replica of the original than the eroded image. Analogous remarks apply to the closing, except that small intrusions are filled instead. Note also that, although the position of the origin relative to the structuring element affects both erosion and dilation, it has no effect on opening and closing.
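A compact sketch of binary opening, closing, and the two top-hat residues defined above, assuming SciPy is available:

import numpy as np
from scipy import ndimage

def open_close_tophats(A, B):
    # A: boolean input image, B: boolean structuring element.
    opened = ndimage.binary_opening(A, structure=B)   # erosion followed by dilation
    closed = ndimage.binary_closing(A, structure=B)   # dilation followed by erosion
    opening_tophat = A & ~opened                      # A minus its opening
    closing_tophat = closed & ~A                      # closing minus A
    return opened, closed, opening_tophat, closing_tophat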
Figure 7.10 (a) Input image, (b) structuring element, (c) opening, (d) closing.
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
7.4 The Hit -or-Miss Transform
The hit -and-miss transform is a basic binary morphological operation that can be
used to inspect an image for specific patte rns of foreground and background pixels.
It is, in fact, the fundamental operation of binary morphology, as it is the source of
practically all other binary morphological operators. As is the case with other
binary morphological operators, it accepts a bin ary image and a structuring element
as input and outputs another binary image.
The hit-or-miss transformation of an image A by B is denoted A ⊛ B. Here B is a pair of structuring elements, B = (B1, B2), rather than a single element:
B1: the set of elements of B associated with the object (foreground)
B2: the set of elements of B associated with the background
The hit-or-miss transform is defined as

A ⊛ B = (A ⊖ B1) ∩ (A^c ⊖ B2)

It identifies all pixel locations at which B1 matches the foreground (a "hit") and B2 simultaneously matches the background (a "miss"). The hit-or-miss transform is therefore used for shape detection.
Using the two structural elements B1 and B2, use the hit -or-miss transform to
determine the locations of the following shape pixel configuration in the image
below.
Solution:
The figure below illustrates how the hit -or-miss transform is applied to the image
from the preceding example.
Figure 7.11 (a) Binary image. (b) Result of applying hit -or-miss transform.
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
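SciPy provides the hit-or-miss transform directly; the sketch below (an illustrative example, not the configuration of Fig. 7.11) uses it to locate isolated foreground pixels, with B1 describing the required foreground and B2 the required background:

import numpy as np
from scipy import ndimage

A = np.zeros((7, 7), dtype=bool)
A[1, 1] = True                 # an isolated pixel
A[3:6, 3:6] = True             # a 3x3 block

B1 = np.array([[0, 0, 0],
               [0, 1, 0],
               [0, 0, 0]], dtype=bool)   # foreground pattern: the pixel itself
B2 = np.array([[1, 1, 1],
               [1, 0, 1],
               [1, 1, 1]], dtype=bool)   # background pattern: all 8 neighbors

hits = ndimage.binary_hit_or_miss(A, structure1=B1, structure2=B2)
# hits is True only at (1, 1), the isolated pixel.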
7.5 Morphological Algorithms
Morphology's primary application is to extract image compone nts that are useful
for representing and describing shape. Boundary extraction, skeletonization (i.e.
extracting the skeleton of an object), and thinning are all performed using
morphological algorithms.
Boundary Extraction
The boundary of a set A, denoted β(A), can be determined as follows:

β(A) = A − (A ⊖ B)
where B denotes the structuring element.
The figure below illustrates how to extract an object's boundary from a binary
image.
Figure 7.12 (a) Binary image. (b) Object boundary extracted using the previous
equation and 3×3 square structuring element.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Because the structuring element is 3 × 3 pixels, the resulting boundary is one pixel thick. Using a 5 × 5 structuring element instead produces a boundary between 2 and 3 pixels thick, as illustrated in the following figure.
Figure 7.13 Object boundary extracted using 5×5 square structuring element
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
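A minimal sketch of boundary extraction, β(A) = A − (A ⊖ B), assuming SciPy is available:

import numpy as np
from scipy import ndimage

def boundary(A, size=3):
    # Erode A with a size x size square SE and subtract the result from A.
    B = np.ones((size, size), dtype=bool)
    eroded = ndimage.binary_erosion(A, structure=B)
    return A & ~eroded   # one-pixel-thick boundary for size=3, thicker for size=5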
Thinning
Thinning is the process of reducing binary objects or shapes in an image to strokes that are a single pixel wide. The thinning of a set A by a structuring element B is defined as

A ⊗ B = A − (A ⊛ B) = A ∩ (A ⊛ B)^c
Because we are only matching the pattern (shape) against the structuring elements, no background operation is required in the hit-or-miss portion of the transform. Here B is a sequence of structuring elements,

{B} = {B1, B2, B3, ..., Bn}

where Bi is a rotated version of Bi−1. Thinning by a sequence of structuring elements can then be written as

A ⊗ {B} = ((...((A ⊗ B1) ⊗ B2)...) ⊗ Bn)
The whole sequence is repeated until no further changes occur. The following figure illustrates thinning the ridges of a fingerprint down to a thickness of one pixel.
Figure 7.14 (a) Original fingerprint image. (b) Image thinned once. (c) Image
thinned twice. (d) Image thinned until stability (no changes occur).
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
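Sequential thinning of this kind is available, for example, in scikit-image (a sketch assuming scikit-image is installed; its internal structuring-element sequence may differ from the one in Fig. 7.15):

import numpy as np
from skimage.morphology import thin

ridges = np.random.rand(64, 64) > 0.7   # stand-in for a binary fingerprint image
once   = thin(ridges, 1)                # a single thinning pass
stable = thin(ridges)                   # thin until no further changes occur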
Figure 7.15(a) shows a set of structuring elements commonly used for thinning (note that Bi is equal to Bi−1 rotated clockwise by 45°), and Fig. 7.15(b) shows a set A to be thinned using the procedure just described. Figure 7.15(c) shows the result of thinning A with one pass of B1 to obtain A1. The result of thinning A1 with B2 is shown in Fig. 7.15(d), and the results of passes with the remaining structuring elements are shown in Figs. 7.15(e) through (k) (there were no changes from A7 to A8 or from A9 to A11). Convergence was achieved after the second pass of B6; Fig. 7.15(l) shows the thinned result. Finally, Fig. 7.15(m) shows the thinned set converted to m-connectivity.
FIGURE 7.15 (a) Structuring elements. (b) Set A. (c) Result of thinning A with B1 (shaded). (d) Result of thinning A1 with B2. (e)–(i) Results of thinning with the next six SEs. (There was no change between A7 and A8.) (j)–(k) Result of using the first four elements again. (l) Result after convergence. (m) Result converted to m-connectivity.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Thickening
Thickening is the morphological dual of thinning, defined by the expression

A ⊙ B = A ∪ (A ⊛ B)

where B is a structuring element suitable for thickening. As with thinning, thickening is a sequential operation:

A ⊙ {B} = ((...((A ⊙ B1) ⊙ B2)...) ⊙ Bn)
The structuring elements used for thickening have the same form as those used for thinning (Fig. 7.15(a)), but with all 1's and 0's interchanged. In practice, however, a separate thickening algorithm is rarely used. Instead, the usual procedure is to thin the background of the set in question and then complement the result: to thicken a set A, we form its complement A^c, thin A^c, and then complement the thinned result to obtain the thickened set. This approach is
illustrated in Figure 7.16. As previously stated, we display only set A and picture
I, not the padded version of image I.
FIGURE 7.16 (a) Set A. (b) Complement of A. (c) Result of thinning the
complement. (d Thickened set obtained by complementing (c). (e) Final result,
with no disconnected points.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Convex hull
A set A is said to be convex if the straight line segment joining any two points of A lies entirely within A. The convex hull H of an arbitrary set S is the smallest convex set containing S, and the set difference H − S is called the convex deficiency of S. The convex hull and convex deficiency are useful for describing objects.
Let B^i, i = 1, 2, 3, 4, denote the four structuring elements used. The procedure consists of implementing the equation

X_k^i = (X_{k−1}^i ⊛ B^i) ∪ A,   i = 1, 2, 3, 4 and k = 1, 2, 3, ...

with X_0^i = A. When the procedure converges (X_k^i = X_{k−1}^i), we let D^i = X_k^i. The convex hull of A is then

C(A) = ∪_{i=1}^{4} D^i
Thus, the procedure consists of iteratively applying the hit-or-miss transform to A with B^1; when no further changes occur, the union with A is formed and the result is denoted D^1. The procedure is then repeated with B^2 (applied to A) until no further changes occur, and so on. The union of the four resulting sets D^i is the convex hull of A.
Figure 7.17 Convex hull
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
One disadvantage of this procedure is that the convex hull can grow beyond the minimum dimensions required for convexity. A simple way to reduce this effect is to limit growth so that it does not extend past the vertical and horizontal dimensions of the original set.
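For comparison, scikit-image computes the exact convex hull of the foreground pixels directly (a sketch assuming scikit-image is installed; unlike the morphological approximation above, it is not restricted to the four structuring-element directions):

import numpy as np
from skimage.morphology import convex_hull_image

A = np.zeros((60, 60), dtype=bool)
A[10:50, 20:30] = True
A[25:35, 10:55] = True                  # a cross-shaped object

hull = convex_hull_image(A)             # smallest convex set containing the object
deficiency = hull & ~A                  # convex deficiency: hull minus the object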
Hole filling
A hole is defined as a region of background pixels that is encircled by a connected
border of foreground pixels.
Assume that A is a set whose elements are 8-connected boundaries, each boundary enclosing a background region (that is, a hole). Given a point inside each hole, the goal is to fill all the holes with 1's (for binary images). We begin by forming an array X0 of 0's (of the same size as the array containing A), except at the locations in X0 corresponding to the given point in each hole, which are set to 1. The following procedure then fills all the holes with 1's:

X_k = (X_{k−1} ⊕ B) ∩ A^c,   k = 1, 2, 3, ...

where B is the symmetric structuring element.
The algorithm terminates at iteration step k if X_k = X_{k−1}. The set X_k then contains all the filled holes, and the union of X_k and A contains all the filled holes and their boundaries. If left unchecked, the dilation would fill the entire area; the intersection with the complement of A at each step restricts the result to the region of interest. This is an example of how a morphological process can be conditioned to achieve a desired property; in the present context it is called a conditional dilation.
Figure 7.18 Hole filling
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Example: an image obtained by thresholding to two levels a scene containing polished spheres (ball bearings). The dark spots inside the spheres are caused by reflections, and the objective is to eliminate them by hole filling. A (white) point is selected inside one sphere, that hole is filled, and the process is then repeated to fill all of the spheres.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
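A minimal sketch of the conditional-dilation hole-filling procedure just described, assuming SciPy is available; seed is a boolean array with 1's at the given points inside the holes, and the cross-shaped SE keeps the fill from leaking through an 8-connected boundary:

import numpy as np
from scipy import ndimage

def fill_holes_from_seeds(A, seed):
    # A: boolean image containing the hole boundaries; seed: points inside holes.
    B = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)        # symmetric cross-shaped SE
    X = seed.copy()
    while True:
        X_next = ndimage.binary_dilation(X, structure=B) & ~A   # (X dilated) ∩ A^c
        if np.array_equal(X_next, X):                           # convergence
            break
        X = X_next
    return A | X          # union of the filled holes with the original boundaries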
Pruning
Pruning techniques are critical for thinning and skeletonizing algorithms, as these
operations frequently leave parasitic elements that must be "cleaned up" in post -
processing. We begin with a pruning problem and work our way up to a
morphological solution. A popular approach to automated character recognition is
to evaluate the shape of each character's skeleton. These skele tons are frequently
defined by "spurs" (parasitic components). Spurs are formed during erosion as a
result of inconsistencies in the strokes that comprise the characters. We propose a
morphological technique for resolving this issue, presuming that the len gth of a
parasitic component does not exceed a predefined number of pixels.
Figure 7.19 Pruning
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The method is based on successively eliminating the end points of parasitic branches. Of course, this also shortens (or eliminates) other branches of the character; in the absence of additional structural information, the assumption is that any branch with three or fewer pixels is to be removed. Thinning an input set A with a sequence of structuring elements designed to detect only end points achieves the desired result. Let

X1 = A ⊗ {B}

where {B} denotes the sequence of structuring elements. The sequence consists of two different structures, each rotated by 90° to produce a total of eight elements.
Applying this thinning three times yields the set X1, in which the parasitic branches have been removed but the character has been shortened as well. The next step is to "restore" the character to its original form, but without the parasitic branches. To do so, we first form a set X2 containing all the end points of X1:

X2 = ∪_{k=1}^{8} (X1 ⊛ B^k)

where the B^k are the same end-point detectors. The end points are then dilated three times, using A as a delimiter:

X3 = (X2 ⊕ H) ∩ A

where H is a 3 × 3 structuring element of 1's and the intersection with A is applied after each dilation step. This kind of conditional dilation prevents the creation of nonzero elements outside the region of interest.
Finally, the union of X3 and X1 yields the desired result:

X4 = X1 ∪ X3
In more complex cases, X3 sometimes includes the "tips" of some parasitic branches. This can happen when the end points of those branches are close to the skeleton: although they may be removed in X1, they can reappear during the dilation because they are valid points of A. Since parasitic elements are usually short compared with legitimate strokes, they are rarely picked up again; being disconnected regions, they are easy to detect and remove.
Skeletonization
Skeletonization is a technique for decreasing foreground regions in a binary image
to a skeletal residue that retains the majority of the original region's extent and
connectivity while discarding the majority of the original foreground pixels. To
understand how this operates, consider that the foreground regions of the input
binary image are composed of a uniformly slow -burning material. Simultaneously
light flames along the region's boundary and observe the fire spre ad towards the
interior. The fire will extinguish itself at spots when it crosses two distinct borders,
and these points are referred to as the 'quench line'. This is the skeleton line. It is
obvious from this definition that thinning results in the format ion of a skeleton.
Figure 7.20: Skeletonization, Medial axis transform.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Another approach to see the skeleton is as the loci of the centres of bi -tangent
circles that completely encompass the foreground region under consideration. This
is illustrated in Figure 7.21 for a rectangular shape.
Figure 7.21: Skeleton of a rectangle defined in terms of bi -tangent circles.
(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez and Richard E. Woods)

Although the terms medial axis transform (MAT) and skeletonization are often used interchangeably, we make a subtle distinction between them. The skeleton is simply a binary image of the skeleton itself, whereas the MAT is a grayscale image in which each point of the skeleton has an intensity corresponding to its distance from the object boundary.
The skeleton S(A) of a set A can be characterized as follows:
a) If z is a point of S(A) and (D)z is the largest disk centered at z and contained in A, one cannot find a larger disk (not necessarily centered at z) that contains (D)z and is contained in A. The disk (D)z is called a maximum disk.
b) The disk (D)z touches the boundary of A at two or more different places.
The skeleton of A can be expressed in terms of erosions and openings as follows:

S(A) = ∪_{k=0}^{K} S_k(A)

with

S_k(A) = (A ⊖ kB) − (A ⊖ kB) ∘ B

where B is a structuring element and (A ⊖ kB) denotes k successive erosions starting with A; that is, A is eroded by B, the result is eroded by B, and so on:

(A ⊖ kB) = ((...((A ⊖ B) ⊖ B) ⊖ B)...) ⊖ B

K is the last iterative step before A erodes to the empty set:

K = max{ k | (A ⊖ kB) ≠ ∅ }
S(A) is thus obtained as the union of the skeleton subsets S_k(A), k = 0, 1, 2, ..., K. Moreover, A can be reconstructed from these subsets:

A = ∪_{k=0}^{K} (S_k(A) ⊕ kB)

where (S_k(A) ⊕ kB) denotes k successive dilations starting with S_k(A):

(S_k(A) ⊕ kB) = ((...((S_k(A) ⊕ B) ⊕ B)...) ⊕ B
FIGURE 7.22 (a) Set A. (b) Various positions of maximum disks whose centers
partially define the skeleton of A. (c) Another maximum disk, whose center
defines a different segment of the skeleton of A. (d) Complete skeleton (dashed)
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
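A minimal sketch, assuming scikit-image is installed: skeletonize produces the binary skeleton, while medial_axis also returns each skeleton point's distance from the boundary, which gives the grayscale MAT described above:

import numpy as np
from skimage.morphology import skeletonize, medial_axis

A = np.zeros((40, 80), dtype=bool)
A[10:30, 5:75] = True                            # a filled rectangle

skel = skeletonize(A)                            # binary skeleton S(A)
axis, dist = medial_axis(A, return_distance=True)
mat = axis * dist                                # skeleton weighted by distance
                                                 # to the object boundary (MAT)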
7.6 Morphological Reconstruction
Reconstruction is a morphological operation involving two images and a structuring element (rather than a single image and structuring element). One image, the marker, is the starting point of the transformation; the other image, the mask, constrains the transformation. The structuring element defines connectivity. In the following discussion we use 8-connectivity (the default), which means that B is a 3 × 3 matrix of 1's with its center at coordinates (2, 2).
If G is the mask and F is the marker, the reconstruction of G from F, denoted R_G(F), is defined by the following iterative procedure:
1. Initialize h_1 to be the marker image, F.
2. Create the structuring element: B = ones(3).
3. Repeat
   h_{k+1} = (h_k ⊕ B) ∩ G
   until h_{k+1} = h_k.
4. R_G(F) = h_{k+1}.
The marker F must be a subset of G: F ⊆ G.
This iterative procedure is illustrated in Fig. 7.23. Although the formulation is conceptually simple, considerably faster computational methods exist.
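A direct, if slow, implementation of the four-step procedure above (a sketch assuming SciPy is available; scikit-image's skimage.morphology.reconstruction provides a much faster equivalent):

import numpy as np
from scipy import ndimage

def reconstruct_by_dilation(F, G):
    # Morphological reconstruction R_G(F): F is the marker, G the mask, F subset of G.
    B = np.ones((3, 3), dtype=bool)       # 8-connectivity structuring element
    h = F.copy()
    while True:
        h_next = ndimage.binary_dilation(h, structure=B) & G   # (h dilated) ∩ G
        if np.array_equal(h_next, h):                          # stability reached
            return h
        h = h_next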
FIGURE 7.23 Morphological reconstruction. (a) Original image (the mask). (b)
Marker image. (c) –(e) Intermediate result after 100, 200, and 300 iterations,
respectively. (f) Final result. (The outlines of the objects in the mask image are
superimposed on (b) –(e) as visual references.)
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Geodesic Dilation and Erosion
Geodesic dilation and erosion are key procedures in the field of morphological
reconstruction. Additionally, these processes add a new dimension to conventional
dilation and erosion. They allow us to rebuild certain shapes within an image.
Nevertheless, we are still working with binary images in this case. If you're
unfamiliar with binary images, they are images that contain ei ther black or white
pixels. In other words, we may divide these images into two groups: one foreground
and one background.
The unique aspect of these two methods is that we employ two images instead of
one for each. Likewise to the simple versions, we requ ire an image to which we
can apply dilation or erosion. Along with this image, we'll need a masking image
to control the resultant image's growth and shrinkage.
In geodesic dilation we take the intersection of the dilated image with the mask; as noted above, the mask constrains the growth produced by successive dilations. Let F denote the marker image and G the mask image, with F ⊆ G. The geodesic dilation of size 1 of the marker with respect to the mask, denoted D_G^(1)(F), is defined as

D_G^(1)(F) = (F ⊕ B) ∩ G

The geodesic dilation of size n of the marker with respect to the mask, denoted D_G^(n)(F), is defined as

D_G^(n)(F) = D_G^(1)( D_G^(n−1)(F) )

with D_G^(0)(F) = F.
FIGURE 7.24 Illustration of a geodesic dilation of size 1. Note that the marker
image contains a point from the object in G. If continued, subsequent dilations
and maskings would eventually result in the object contained in G.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The geodesic erosion of size 1 of the marker F with respect to the mask G is defined as

E_G^(1)(F) = (F ⊖ B) ∪ G

where ∪ denotes set union (the logical OR operation). The geodesic erosion of size n of F with respect to G is defined as

E_G^(n)(F) = E_G^(1)( E_G^(n−1)(F) )

where n is a positive integer and E_G^(0)(F) = F. At each step, the set union ensures that the geodesic erosion of an image remains greater than or equal to its mask image. As these expressions indicate, geodesic dilation and erosion are duals with respect to set complementation. Figure 7.25 illustrates a geodesic erosion of size 1; the steps shown are a direct application of the definition.
FIGURE 7.25 Illustration of a geodesic erosion of size 1.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Geodesic dilation and erosion converge after a finite number of iterations, because the mask limits the propagation or shrinking of the marker image.
Morphological Reconstruction by Dilation and by Erosion
Based on the preceding concepts, the morphological reconstruction by dilation of a marker image F with respect to a mask image G, denoted R_G^D(F), is defined as the geodesic dilation of F with respect to G, iterated until stability is reached; that is,

R_G^D(F) = D_G^(k)(F)

with k such that

D_G^(k)(F) = D_G^(k+1)(F)
Reconstruction by dilation is illustrated in Figure 7.26, which continues the process begun in Fig. 7.24. Figure 7.26(a) shows D_G^(1)(F). The next step is to dilate this result and AND it with the mask G to obtain D_G^(2)(F), as shown in Fig. 7.26(b). Dilating D_G^(2)(F) and masking with G then produces D_G^(3)(F), and so on. The procedure is continued until stability is reached; in this example, one additional step gives D_G^(5)(F) = D_G^(6)(F), so the image morphologically reconstructed by dilation is R_G^D(F) = D_G^(5)(F), as illustrated. As expected, the reconstructed image is identical to the mask.
FIGURE 7.26 Illustration of morphological reconstruction by dilation. Sets D_G^(1)(F), G, B, and F are from Fig. 7.24. The mask (G) is shown dotted for reference.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Similarly, the morphological reconstruction by erosion of a marker image F with respect to a mask image G, denoted R_G^E(F), is defined as the geodesic erosion of F with respect to G, iterated until stability; that is,

R_G^E(F) = E_G^(k)(F)

with k such that

E_G^(k)(F) = E_G^(k+1)(F)
The reader should be able to construct a figure analogous to Fig. 7.26 for morphological reconstruction by erosion. Reconstruction by dilation and reconstruction by erosion are duals with respect to set complementation.
Opening by Reconstruction
In morphological opening, (A ⊖ B) ⊕ B, the erosion removes objects smaller than the structuring element B and the dilation restores the size and shape of the remaining objects. However, the accuracy of this restoration depends strongly on the type of structuring element and on the shape of the objects being restored. Opening by reconstruction restores the remaining objects more faithfully after erosion. It is defined as the morphological reconstruction by dilation, with respect to F, of the erosion of size n of F:

O_R^(n)(F) = R_F^D(F ⊖ nB)

where F ⊖ nB is the marker image and F itself is used as the mask. Here

R_F^D[(F ⊖ nB)] = D_F^(k)[(F ⊖ nB)]

where the geodesic dilation D is iterated k times until stability is reached, that is, until D_F^(k)[(F ⊖ nB)] = D_F^(k−1)[(F ⊖ nB)], with D_F^(1)[(F ⊖ nB)] = ((F ⊖ nB) ⊕ B) ∩ F. Because the mask constrains the growth of the marker, the dilation of the marker image cannot extend beyond the mask image; the marker image is therefore a subset of the mask image, (F ⊖ nB) ⊆ F.
The images below demonstrate a simple opening-by-reconstruction procedure for extracting vertical strokes from an input text image. Because the original image was converted from grayscale to binary, it contains a few distortions in some characters, so identical characters can have different vertical extents. The structuring element in this example is an 8-pixel vertical line, used in the erosion step to mark the objects of interest. The morphological reconstruction by dilation, R_F^D[(F ⊖ nB)] = D_F^(k)[(F ⊖ nB)], iterates k = 9 times before the resulting image converges.
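A minimal sketch of opening by reconstruction for binary images, assuming SciPy is available; the 8-pixel vertical-line structuring element mirrors the text example, and text_image is a hypothetical input:

import numpy as np
from scipy import ndimage

def opening_by_reconstruction(F, B, n=1):
    # Erode the mask F n times with B to form the marker, then reconstruct
    # by geodesic dilation under F until stability.
    marker = F.copy()
    for _ in range(n):
        marker = ndimage.binary_erosion(marker, structure=B)
    h = marker
    conn = np.ones((3, 3), dtype=bool)            # 8-connectivity for reconstruction
    while True:
        h_next = ndimage.binary_dilation(h, structure=conn) & F
        if np.array_equal(h_next, h):
            return h
        h = h_next

B_vertical = np.ones((8, 1), dtype=bool)          # 8-pixel vertical line SE
# tall_strokes = opening_by_reconstruction(text_image, B_vertical)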
FIGURE 7.27 Original image for opening by reconstruction
FIGURE 7.28 Marker image
FIGURE 7.29 Result of opening by reconstruction
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Automatic Algorithm for Filling Holes
The hole-filling algorithm developed earlier required knowledge of a starting point inside each hole. Here we present a fully automatic approach based on morphological reconstruction. Let I(x, y) be a binary image, and form a marker image F that is 0 everywhere except at the image border, where it equals the complement of I:

F(x, y) = 1 − I(x, y)   if (x, y) is on the border of I
F(x, y) = 0             otherwise

That is, only the border pixels of I that belong to the background receive a value of 1 in F. The binary image with all holes filled is then given by

H = [ R_{I^c}^D(F) ]^c

where the complement of I is used as the mask. Intuitively, the complement of I surrounds each hole with a "wall," and the marker F is 1 at the border positions that are background in the original image.
FIGURE 7.29.1Hole filling using morphological reconstruction.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The marker F is dilated inward starting from the border. The complement of I acts as an AND mask that prevents any foreground pixels (including the "wall") from changing during the iterations, so the final procedure marks exactly the hole points.
A more practical illustration is shown in Figure 7.30. The complement of the text
image in Figure 7.30 (a) is shown in Fig ure 7.30 (b), and the marker image, F,
produced is shown in Figure 7.30 (c). This image is entirely black with a white (1's)
border, except for areas corresponding to 1's in the original image's border (the
border values are difficult to identify visually at the magnification exhibited, and
also since the page is practically white). Finally, Fig. 7.30 (d) illustrates the image
after all holes have been filled.
FIGURE 7.30 (a) Text image of size pixels. (b) Complement of (a) for use as a
mask image. (c) Marker image. (d) Result of hole -filling
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
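In practice, the automatic hole-filling operation is available directly; a minimal sketch assuming SciPy:

import numpy as np
from scipy import ndimage

I = np.zeros((9, 9), dtype=bool)
I[2:7, 2:7] = True
I[4, 4] = False                         # a one-pixel hole inside the object

filled = ndimage.binary_fill_holes(I)   # every enclosed background region set to 1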
Border Clearing
Extracting objects from an image is a key task in automated image analysis. A procedure that removes objects touching (connected to) the image border is useful because it leaves only complete objects for further processing; an object that touches the border is only partially contained in the field of view. The original image I is used as the mask, and the marker image is

F(x, y) = I(x, y)   if (x, y) is on the border of I
F(x, y) = 0         otherwise

The border-clearing algorithm first computes the morphological reconstruction R_I^D(F), which extracts the objects touching the border, and then forms a new image containing no objects touching the border:

X = I − R_I^D(F)
Consider again the text image used earlier. Figure 7.31(a) shows the reconstruction R_I^D(F) obtained using a 3 × 3 structuring element of 1's; the objects touching the border of the original image are visible on the right side of Fig. 7.31(a). Figure 7.31(b) shows the resulting image X, with no objects touching the border. If the task at hand is automated character recognition, an image in which no characters touch the border is extremely useful, because it avoids the problem of having to recognize partial characters (a difficult task at best).
FIGURE 7.31 (a) Reconstruction by dilation of marker image. (b) Image with no
objects touching the border.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
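The same border-clearing operation is available in scikit-image as clear_border (a sketch assuming scikit-image is installed); it is equivalent to computing X = I − R_I^D(F) as described above:

import numpy as np
from skimage.segmentation import clear_border

I = np.zeros((8, 8), dtype=bool)
I[0:3, 0:3] = True                      # object touching the image border
I[4:7, 4:7] = True                      # interior object

kept = clear_border(I)                  # border-touching object removed;
                                        # only the interior object remains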
7.7 Summary of Morphological Operations on Binary Images
The structure elements employed in the various binary morphological approaches
covered thus far are summarized in Figure 7.32. The shaded elements represent
foreground values (which are normally denoted by 1's in numerical arrays), the
white elements represent background values (which are typically denoted by 0's),
and the x's represent "don't care" elements.
FIGURE 7.32 Five basic types of structuring elements used
for binary morphology
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Grayscale Morphology
Grayscale morphology's structuring components provide the same primary role as
their binary counterparts: they act as "probes" to evaluate a given image for specific
properties. In grayscale morphology, structural features fall into one of two
categories: nonflat and flat. Each is illustrated in Figure 7.33. The image in Figure
7.33 (a) depicts a hemispherical grayscale SE, while Figure 7.33 (c) depicts a
horizontal intensity profile through its center. Figure 7.33 (b) depicts a flat
structuring element in the shape of a disk, while Figure 7.33 (d) depicts the
element's equivalent intensity profile. To facilitate understanding, the elements in
Fig. 7.33 are depicted as continuous quantities; their computer implement ation is
based on digital approximations. Grayscale nonflat SEs are seldom used in practice because of several difficulties associated with them. The erosion of an image f by an SE b at any point (x, y) is defined as the minimum value of the image in the region coincident with b when the origin of b is at (x, y); this is formalized in the next subsection.
FIGURE 7.33 Nonflat and flat structuring elements, and corresponding horizontal
intensity profiles through their centers. All examples in this section
are based on flat SEs.
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
Grayscale Erosion and Dilation
The grayscale erosion of f by a flat structuring element b at location (x, y) is defined as the minimum value of the image in the region coincident with b when the origin of b is at (x, y). In equation form, the erosion of f by b at (x, y) is

[f ⊖ b](x, y) = min_{(s,t) ∈ b} { f(x + s, y + t) }
In a manner similar to spatial correlation, x and y are incremented through all values required for the origin of b to visit every pixel in f. That is, the structuring element is placed at every pixel location in the image, and the erosion at that location is the minimum of f over the region spanned by b. For example, if b is a square structuring element of size 3 × 3, finding the erosion at a point requires finding the minimum of the nine values of f contained in the region spanned by b when its origin is at that point.
Correspondingly, the grayscale dilation of f by a flat structuring element b at any point (x, y) is defined as the maximum value of the image in the window spanned by b̂ (the reflection of b) when the origin of b̂ is at (x, y). That is,

[f ⊕ b](x, y) = max_{(s,t) ∈ b} { f(x − s, y − t) }
Because grayscale erosion with a flat SE computes the minimum intensity of f in every neighborhood of (x, y) coincident with b, we expect the eroded image to be darker than the original, bright features to be reduced in size, and dark features to be enlarged. Figure 7.34(b) shows the erosion obtained with a flat disk SE of radius 2; the expected effects are clearly visible. For instance, the intensities of the small bright dots were reduced so much that they are barely visible, while the dark features grew in thickness. The eroded image is also slightly darker overall than the background of the original.
FIGURE 7.34 (a) Gray -scale X -ray image of size pixels. (b) Erosion using a flat
disk SE with a radius of 2 pixels. (c) Dilation using the same SE
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Figure 7.34(c) shows the result of dilating the image with the same SE. The effects are the opposite of those produced by erosion: bright features were thickened and dark features reduced. Note that the thin black connecting wires in the left, middle, and lower right of Fig. 7.34(a) are barely visible in the dilated image, whereas, unlike the eroded small white dots in Fig. 7.34(b), the small dark dots are still clearly visible; they survived because they were larger than the white dots relative to the size of the SE. Finally, note that the dilated image is slightly lighter overall than the image in Fig. 7.34(a).
The erosion of image f by a nonflat SE b_N is defined as

[f ⊖ b_N](x, y) = min_{(s,t) ∈ b_N} { f(x + s, y + t) − b_N(s, t) }
The dilation of image f by a nonflat SE b_N is defined as

[f ⊕ b_N](x, y) = max_{(s,t) ∈ b_N} { f(x − s, y − t) + b_N(s, t) }
When the SE is flat, these equations reduce to the earlier ones, to within a constant offset. As with their binary counterparts, grayscale erosion and dilation are duals with respect to function complementation and SE reflection:

[f ⊖ b]^c (x, y) = [f^c ⊕ b̂](x, y)

and, similarly,

[f ⊕ b]^c (x, y) = [f^c ⊖ b̂](x, y)

where f^c = −f(x, y) and b̂ = b(−x, −y).
Grayscale Opening and Closing
The opening of image f by SE b is
$f \circ b = (f \ominus b) \oplus b$
That is, opening is simply the erosion of f by b, followed by dilation of the result by b. Likewise, the grayscale closing of f by b, denoted $f \bullet b$, is
$f \bullet b = (f \oplus b) \ominus b$
Grayscale opening and closing are duals with respect to complementation and SE reflection:
$[f \circ b]^c = f^c \bullet \hat{b}$  and  $[f \bullet b]^c = f^c \circ \hat{b}$
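A minimal sketch of these two operations with OpenCV is shown below; the input file name and the SE size are assumptions made only for illustration.

import cv2

f = cv2.imread("xray.png", cv2.IMREAD_GRAYSCALE)           # hypothetical grayscale input
b = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))    # flat disk-like SE
opened = cv2.morphologyEx(f, cv2.MORPH_OPEN, b)    # erosion of f by b, then dilation by b
closed = cv2.morphologyEx(f, cv2.MORPH_CLOSE, b)   # dilation of f by b, then erosion by b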
The concept is illustrated in one dimension in Fig. 7.35. Assume the curve in Fig. 7.35(a) represents the intensity profile along a single row of an image. Figure 7.35(b) shows a flat structuring element in several positions, pushed up against the bottom of the curve. The thick curve in Fig. 7.35(c) is the complete opening. Because the structuring element is too large to fit completely inside the upward peaks of the curve, the opening clips the tops of the peaks, with the amount clipped proportional to how far the structuring element can reach into the peak. In general, openings are used to eliminate small, bright details while leaving the overall intensity levels and larger bright features relatively undisturbed.
FIGURE 7.35 Grayscale opening and closing in one dimension. (a) Original 1 -D
signal. (b) Flat structuring element pushed up underneath the signal (c) Opening.
(d) Flat structuring element pushed down along the top of the signal. (e) Closing
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Closing is depicted graphically in Fig. 7.35(d). Note how the structuring element is pushed down on top of the curve as it is translated to all locations. The closing, illustrated in Fig. 7.35(e), is formed by finding the lowest points reached by any part of the structuring element as it slides against the upper side of the curve. The grayscale opening satisfies the following properties:
a. $f \circ b \preceq f$
b. if $f_1 \preceq f_2$, then $f_1 \circ b \preceq f_2 \circ b$
c. $(f \circ b) \circ b = f \circ b$
The notation $q \preceq r$ denotes that the domain of q is a subset of the domain of r, and also that $q(x,y) \leq r(x,y)$ for any (x, y) in the domain of q.
Likewise, the closing operation satisfies the following properties:
a. $f \preceq f \bullet b$
b. if $f_1 \preceq f_2$, then $f_1 \bullet b \preceq f_2 \bullet b$
c. $(f \bullet b) \bullet b = f \bullet b$
Example 1: Grayscale opening and closing.
Figure 7.36 extends to two dimensions the one-dimensional concepts illustrated in Fig. 7.35. The image in Fig. 7.36(a) is the same X-ray image as before, and Fig. 7.36(b) is the opening obtained with a disk structuring element of height one pixel and radius three pixels. As expected, the intensity of all bright features decreased, depending on their sizes relative to the SE. Comparing with the erosion in Fig. 7.34(b), we see that, in contrast to erosion, opening had a negligible effect on the image's dark features and on the background. Likewise, Fig. 7.36(c) shows the image closed with a disk of radius 5 (because the small round black dots are larger than the small white dots, a larger disk was required to achieve results comparable to the opening). The bright details and background in this image were largely unaffected, but the dark features were attenuated, with the degree of attenuation depending on the features' sizes relative to the SE.
FIGURE 7.36 (a) A grayscale X -ray image of size pixels. (b) Opening using a
disk SE with a radius of pixels. (c) Closing using an SE of radius 5.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
7.8 Some Basic Grayscale Morphological Algorithms
Numerous grayscale morphological techniques have been developed on the basis of the grayscale morphological concepts introduced thus far. The following discussion illustrates several of these algorithms.
Morphological Smoothing
Because opening suppresses bright details smaller than the specified SE while leaving dark details relatively unaffected, and closing has the opposite effect,
these two operations are frequently combined as morphological filters for image smoothing and noise removal. Consider Fig. 7.37(a), which shows an X-ray image of the Cygnus Loop supernova. Suppose for the purpose of this discussion that the central light region is the object of interest and that the smaller components are noise; our goal is to eliminate the noise. Figure 7.37(b) shows the result of opening the original image with a flat disk of radius 1 and then closing the result with an SE of the same size. Figures 7.37(c) and (d) show the same operation with SEs of radii 3 and 5, respectively. As expected, this sequence shows progressive removal of small components as the SE size increases. In the final result the noise has been largely eliminated; the noise components on the lower right side of the image could not be removed completely because they are larger than the other image elements that were removed successfully.
FIGURE 7.37 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by NASA's Hubble Telescope. (b)-(d) Results of performing opening and closing sequences on the original image with disk structuring elements of radii 1, 3, and 5, respectively.
(Reference from "Digital Image Processing", fourth edition, by Rafael C. Gonzalez and Richard E. Woods)
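A sketch of this open-close smoothing sequence in Python/OpenCV is given below; the file name and the exact way the three radii are applied are illustrative assumptions.

import cv2

f = cv2.imread("cygnus_loop.png", cv2.IMREAD_GRAYSCALE)      # hypothetical input image
results = {}
for r in (1, 3, 5):                                           # progressively larger flat disks
    b = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * r + 1, 2 * r + 1))
    opened = cv2.morphologyEx(f, cv2.MORPH_OPEN, b)           # suppress small bright noise
    results[r] = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, b) # then suppress small dark noise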
Morphological Gradient
Dilation and erosion, in conjunction with image subtraction, can be used to obtain
the morphological gradient, g, of a grayscale image f in the following manner:
$g = (f \oplus b) - (f \ominus b)$
where b denotes a suitable structuring element. The net effect is that dilation thickens regions in an image and erosion shrinks them, so their difference emphasizes the boundaries between regions. Homogeneous regions are not affected (as long as the SE is not too large relative to the resolution of the image), so the subtraction tends to eliminate them. As a result, an image
is created in which the edges are enhanced and the homogeneous areas are suppressed, producing a "derivative-like" (gradient) effect.
An illustration is shown in Fig. 7.38. Figure 7.38(a) shows a CT scan of a head, and the next two figures show its dilation and erosion with a flat SE of 1's. Note the thickening and shrinking just mentioned. The resulting morphological gradient is shown in Fig. 7.38(d). As expected of a 2-D derivative image, the boundaries between regions are clearly delineated.
FIGURE 7.38 (a) image of a head CT scan. (b) Dilation. (c) Erosion. (d)
Morphological gradient, computed as the difference between (b) and (c)
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
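The morphological gradient can be sketched in a few lines of Python/OpenCV; the input file name and the 3x3 SE are assumptions made for illustration.

import cv2

f = cv2.imread("head_ct.png", cv2.IMREAD_GRAYSCALE)          # hypothetical input image
b = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))         # flat SE of 1's (assumed 3x3)
g = cv2.morphologyEx(f, cv2.MORPH_GRADIENT, b)                # (f dilated by b) - (f eroded by b)
# Equivalent explicit form: g = cv2.subtract(cv2.dilate(f, b), cv2.erode(f, b))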
Top-Hat and Bottom -Hat Transformations
Combining image subtraction with openings and closings yields the top-hat and bottom-hat transformations. The top-hat transformation of a grayscale image f is defined as f minus its opening:
$T_{hat}(f) = f - (f \circ b)$
Similarly, the bottom-hat transformation of f is defined as the closing of f minus f:
$B_{hat}(f) = (f \bullet b) - f$
These transformations have a variety of applications. One of the principal ones is removing objects from an image by using, in the opening or closing, a structuring element that does not fit the objects to be removed; the difference operation then yields an image containing only the removed components. The top-hat transform is used for bright objects on a dark background, whereas the bottom-hat transform
is used in the opposite case. The white top-hat transform and the black bottom-hat transform are two common names for these transformations.
Consider Fig. 7.39(a), which shows an image of grains of rice. The image was acquired under nonuniform lighting, as evidenced by the darker area in its lower right portion. Figure 7.39(b) shows the result of thresholding with Otsu's method, an optimal thresholding method. The net effect of the nonuniform illumination was to cause segmentation errors in the dark area (where several grains of rice were not separated from the background) and in the top left portion, where parts of the background were interpreted as rice. Figure 7.39(c) shows the image opened by a disk of radius 40. This SE was large enough that it did not fit entirely within any of the objects; as a result, the objects were removed, leaving only a close approximation of the background, in which the shading pattern is clearly visible. Subtracting this image from the original (that is, applying a top-hat transformation) should make the background more uniform. As Fig. 7.39(d) shows, this is indeed the case. Although the background is not perfectly uniform, the differences between the light and dark extremes are less pronounced, and this was sufficient to produce a correct thresholding result in which all the rice grains were properly extracted using Otsu's method, as shown in Fig. 7.39(e).
FIGURE 7.39 Using the top-hat transformation for shading correction. (a) Original image. (b) Thresholded image. (c) Image opened using a disk SE of radius 40. (d) Top-hat transformation (the image minus its opening). (e) Thresholded top-hat image.
(Reference from "Digital Image Processing", fourth edition, by Rafael C. Gonzalez and Richard E. Woods)
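A sketch of this shading-correction idea in Python/OpenCV follows; the file name "rice.png" is an assumption, while the disk radius of 40 mirrors the description above.

import cv2

f = cv2.imread("rice.png", cv2.IMREAD_GRAYSCALE)              # hypothetical rice-grain image
r = 40                                                        # disk larger than any single grain
b = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * r + 1, 2 * r + 1))
background = cv2.morphologyEx(f, cv2.MORPH_OPEN, b)           # opening approximates the shading
tophat = cv2.subtract(f, background)                          # top-hat: image minus its opening
# cv2.morphologyEx(f, cv2.MORPH_TOPHAT, b) computes the same top-hat in one call.
_, mask = cv2.threshold(tophat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu segmentation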
Granulometry
Granulometry deals with determining the size distribution of particles in an image. In practice, particles are seldom neatly separated, which makes counting based on identifying individual particles a difficult task. Morphology can be used to estimate the particle size distribution indirectly, without having to identify and measure every particle.
The approach is conceptually simple. For particles with regular shapes that are lighter than the background, the method consists of applying openings with SEs of increasing size. The basic idea is that an opening of a particular size should have the greatest effect on regions of the input image that contain particles of similar size. For each opening, we compute the sum of the pixel values in the opened image. Because openings reduce the intensity of bright features, this sum, referred to as the surface area, decreases as the SE size increases. The procedure therefore yields a 1-D array, each element of which is the surface area of the opening for the SE size corresponding to that position in the array. To emphasize the differences between successive openings, we compute the difference between adjacent elements of this array. When the differences are plotted, the peaks in the plot indicate the dominant size distributions of the particles in the image.
Consider the image of wood dowel plugs in Fig. 7.40(a), which contains dowels of two dominant sizes. Because the wood grain in the dowels could cause variations in the openings, smoothing the image first is a sensible preprocessing step. Figure 7.40(b) shows the image smoothed with the morphological smoothing filter discussed previously, using a disk of radius 5. Figures 7.40(c) through (f) show openings of the smoothed image with disks of radii 10, 20, 25, and 30 pixels, respectively. The contribution of the small dowels to the image intensity has been nearly eliminated in Fig. 7.40(d). The contribution of the large dowels has been reduced significantly in Fig. 7.40(e), and even further in Fig. 7.40(f). The large dowel near the top right corner of the image is much darker than the others in Fig. 7.40(e) because it is smaller than the other large dowels; had we been attempting to detect faulty dowels, this would have been useful information.
FIGURE 7.40 (a) Image of wood dowels. (b) Smoothed image. (c)-(f) Openings of (b) with disks of radii equal to 10, 20, 25, and 30 pixels, respectively.
(Reference from "Digital Image Processing", fourth edition, by Rafael C. Gonzalez and Richard E. Woods)
A plot of the difference array is shown in Fig. 7.41. As noted earlier, we expect significant differences (peaks in the plot) at radii for which the SE is large enough to encompass a set of particles of approximately that diameter. The result in Fig. 7.41 has two distinct peaks, indicating that there are two dominant object sizes in the image.
FIGURE 7.41 Differences in surface area as a function of SE disk radius, r. The two peaks indicate that there are two dominant particle sizes in the image.
(Reference from "Digital Image Processing", fourth edition, by Rafael C. Gonzalez and Richard E. Woods)
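A granulometry sketch in Python/OpenCV is shown below; the file name, the pre-smoothing radius of 5, and the range of opening radii are assumptions chosen to mirror the dowel example.

import cv2
import numpy as np

f = cv2.imread("dowels.png", cv2.IMREAD_GRAYSCALE)            # hypothetical dowel image
b5 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 11))   # disk of radius 5 for smoothing
s = cv2.morphologyEx(cv2.morphologyEx(f, cv2.MORPH_OPEN, b5), cv2.MORPH_CLOSE, b5)

areas = []
for r in range(1, 36):                                        # openings with growing disks
    b = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * r + 1, 2 * r + 1))
    opened = cv2.morphologyEx(s, cv2.MORPH_OPEN, b)
    areas.append(float(np.sum(opened)))                       # "surface area" of each opening

diff = -np.diff(areas)        # drop in surface area between successive radii
# Peaks in `diff` indicate the dominant particle sizes, as in Fig. 7.41.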
Textural Segmentation
Figure 7.42(a) shows a noisy image of dark blobs on a light background. The image contains two textural regions: a region on the right composed of large blobs, and a region on the left composed of smaller blobs. The objective is to find a boundary between the two regions based on their textural content, which in this case is determined by the sizes and spatial distribution of the blobs. The process of partitioning an image into regions is called segmentation.
FIGURE 7.42 Textural segmentation. (a) An image consisting of two types of blobs. (b) Image with small blobs removed by closing (a). (c) Image with light patches between large blobs removed by opening (b). (d) Original image with the boundary between the two regions in (c) superimposed. The boundary was obtained using a morphological gradient.
(Reference from "Digital Image Processing", fourth edition, by Rafael C. Gonzalez and Richard E. Woods)
Because the objects of interest are darker than the background, we know that if we close the image with a structuring element larger than the small, distracting blobs, those blobs will be removed. Figure 7.42(b), which shows the result of closing the input image with a disk of radius 30 pixels, demonstrates that this is indeed the case (the radius of the smallest blobs is approximately 25 pixels). At this stage we have an image of large, dark blobs on a light background. If we now open this image with a structuring element that is large relative to the separation between the large blobs, the light patches between the blobs are removed, leaving the dark blobs and the now dark patches between them. Figure 7.42(c) shows the result obtained with a disk of radius 60.
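The close-then-open sequence just described, followed by a morphological gradient to extract the boundary as in the caption of Fig. 7.42, can be sketched as follows; the file name and the kernel sizes approximating radii of 30 and 60 pixels are assumptions.

import cv2

f = cv2.imread("blobs.png", cv2.IMREAD_GRAYSCALE)             # hypothetical blob image
b30 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (61, 61))    # disk of radius ~30
b60 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (121, 121))  # disk of radius ~60
no_small = cv2.morphologyEx(f, cv2.MORPH_CLOSE, b30)      # closing removes the small dark blobs
merged = cv2.morphologyEx(no_small, cv2.MORPH_OPEN, b60)  # opening removes light gaps between large blobs
b3 = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
boundary = cv2.morphologyEx(merged, cv2.MORPH_GRADIENT, b3)  # boundary between the texture regions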
Grayscale Morphological Reconstruction
Grayscale morphological reconstruction is based on the same principles as its binary counterpart. Let f and g denote the marker and mask images, respectively. We assume that both are grayscale images of the same size and that f ≤ g, i.e., the intensity of f at any point does not exceed the intensity of g at that point. The geodesic dilation of f of size 1 with respect to g is defined as
$D_g^{(1)}(f) = (f \oplus b) \wedge g$
where $\wedge$ denotes the point-wise minimum operator and b is a suitable structuring element. That is, the geodesic dilation of size 1 is obtained by first computing the dilation of f by b and then selecting the minimum between the result and g at every location (x, y).
The geodesic dilation of size n of f with respect to g is defined as
$D_g^{(n)}(f) = D_g^{(1)}\!\left[ D_g^{(n-1)}(f) \right]$, with $D_g^{(0)}(f) = f$.
Similarly, the geodesic erosion of size 1 of f with respect to g is defined as
$E_g^{(1)}(f) = (f \ominus b) \vee g$
where $\vee$ denotes the point-wise maximum operator. The geodesic erosion of size n is defined as
$E_g^{(n)}(f) = E_g^{(1)}\!\left[ E_g^{(n-1)}(f) \right]$, with $E_g^{(0)}(f) = f$.
The morphological reconstruction by dilation of a grayscale mask image g from a grayscale marker image f, denoted $R_g^{D}(f)$, is defined as the geodesic dilation of f with respect to g, iterated until stability is reached:
$R_g^{D}(f) = D_g^{(k)}(f)$, with k such that $D_g^{(k)}(f) = D_g^{(k+1)}(f)$.
The morphological reconstruction by erosion of g from f, denoted $R_g^{E}(f)$, is defined similarly:
$R_g^{E}(f) = E_g^{(k)}(f)$, with k such that $E_g^{(k)}(f) = E_g^{(k+1)}(f)$.
As in the binary case, the opening by reconstruction of a grayscale image begins by eroding the input image and using the result as the marker, with the input image itself as the mask. The opening by reconstruction of size n of an image f is the reconstruction by dilation of the erosion of size n of f, with respect to f; that is,
$O_R^{(n)}(f) = R_f^{D}(f \ominus nb)$
where $f \ominus nb$ denotes n successive erosions by b, starting with f. Similarly, the closing by reconstruction of size n of an image f is defined as the reconstruction by erosion of the dilation of size n of f, with respect to f; that is,
$C_R^{(n)}(f) = R_f^{E}(f \oplus nb)$
where $f \oplus nb$ denotes n successive dilations by b, starting with f. Because of duality, the closing by reconstruction of an image can also be obtained by complementing the image, computing its opening by reconstruction, and complementing the result. Finally, an important method known as top-hat by reconstruction consists of subtracting from an image its opening by reconstruction.
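The iterated geodesic dilation that defines reconstruction by dilation, and the opening by reconstruction built on it, can be sketched as follows in Python with OpenCV and NumPy; the input file name, the SE, and n = 5 erosions are illustrative assumptions.

import cv2
import numpy as np

def reconstruct_by_dilation(marker, mask, b):
    # Iterate the geodesic dilation (marker dilated by b, point-wise min with mask) until stability.
    f = marker.copy()
    while True:
        d = np.minimum(cv2.dilate(f, b), mask)    # geodesic dilation of size 1
        if np.array_equal(d, f):                  # stability reached
            return d
        f = d

g_img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)     # hypothetical input image
b = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
marker = cv2.erode(g_img, b, iterations=5)                # n = 5 successive erosions by b
opened_by_rec = reconstruct_by_dilation(marker, g_img, b) # opening by reconstruction of size 5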
7.9 Summary
We have seen morphological image processing techniques such as erosion and dilation, opening and closing, and the hit-or-miss transform, along with several morphological algorithms that can be applied in various image processing applications. Morphological reconstruction was introduced as an operation defined on a pair of images. We also studied the main morphological operations on binary images and the basic grayscale morphological algorithms.
7.10 Unit End questions
1. What is morphological image processing?
2. Write a short note on Erosion and Dilation.
3. Write a short note on Opening and Closing.
4. Explain the Hit-or-Miss Transform.
5. Write a short note on Boundary Extraction.
6. Write a short note on Thinning and Thickening.
7. Write a short note on Convex Hull and Hole Filling.
8. Write a short note on Pruning and Skeletonization.
9. Explain Geodesic Dilation and Erosion.
10. Explain Morphological Reconstruction by Dilation and by Erosion.
11. Write a short note on Opening by Reconstruction.
12. State the algorithm for filling holes.
13. Explain Grayscale Morphology.
14. Write a short note on Grayscale Erosion and Dilation.
15. Explain Grayscale Opening and Closing.
16. Explain Morphological Smoothing.
17. Explain the Morphological Gradient.
18. Write a short note on Top-Hat and Bottom-Hat Transformations.
19. What is Granulometry?
20. What is Textural Segmentation?
7.11 Reference for further reading
1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Fourth Edition, Pearson/Prentice Hall, 2018.
2. A. K. Jain, Fundamentals of Digital Image Processing, PHI.
3. J. C. Russ, The Image Processing Handbook, Fifth Edition, CRC Press, 2010.
4. All images courtesy of Digital Image Processing by Gonzalez and Woods, Fourth Edition.
Unit 4
8 IMAGE SEGMENTATION, EDGE DETECTION,
THRESHOLDING, AND REGION DETECTION
Unit Structure
8.0 Objective
8.1 Fundamentals
8.2 Thresholding
8.3 Segmentation by Region Growing and by Region Splitting and Merging
8.4 Region Segmentation Using Clustering and Superpixels
8.5 Region Segmentation Using Graph Cuts
8.6 Segmentation Using Morphological Watersheds
8.7 Use of Motion in Segmentation
8.8 Summary
8.9 Unit End questions
8.10 Reference for further reading
8.0 Objective
The segmentation algorithms in this chapter are based on one of two fundamental properties of image intensity values: discontinuity or similarity. In the first category, an image is partitioned into regions according to abrupt changes in intensity, such as edges. The approaches in the second category are based on partitioning an image into regions that are similar according to a set of predefined criteria; methods in this category include thresholding, region growing, and region splitting and merging. We demonstrate that by combining methods from the two categories, such as edge detection and thresholding, we can improve segmentation performance. Additionally, we discuss image segmentation using clustering and superpixels, as well as an introduction to graph cuts, a technique that is well suited for extracting the principal regions of an image. Following that, we present image segmentation based on morphology, an approach that combines several characteristics of the preceding techniques.
After reading this chapter, readers should be able to:
• Understand the characteristics of the different types of edges encountered in practice.
• Understand how spatial filtering can be used for edge detection.
• Be familiar with other types of edge detection techniques that extend beyond spatial filtering.
• Understand image thresholding using several different approaches.
• Understand how to improve segmentation by combining thresholding and spatial filtering.
• Be familiar with region-based segmentation techniques, including clustering and superpixels.
• Understand segmentation using graph cuts and morphological watersheds.
• Understand the fundamental techniques for incorporating motion into image segmentation.
8.1 Fundamentals
Let R denote the entire spatial region occupied by an image. Segmentation is a process that partitions R into n subregions, R1, R2, ..., Rn, such that:
a. $\bigcup_{i=1}^{n} R_i = R$;
b. $R_i$ is a connected set, for $i = 1, 2, \ldots, n$;
c. $R_i \cap R_j = \varnothing$ for all $i$ and $j$, $i \neq j$;
d. $Q(R_i) = \text{TRUE}$ for $i = 1, 2, \ldots, n$;
e. $Q(R_i \cup R_j) = \text{FALSE}$ for any adjacent regions $R_i$ and $R_j$.
Here, $Q(R_k)$ is a logical predicate defined over the points in set $R_k$, and $\varnothing$ is the null set. Condition (a) specifies that the segmentation must be complete; every pixel must belong to some region. Condition (b) requires that points within a region be connected in some predefined sense. Condition (c) states that the regions must be disjoint. Condition (d) specifies the property that all pixels within a segmented region must satisfy; for example, $Q(R_i) = \text{TRUE}$ if all pixels in $R_i$ have the same gray level. Finally, condition (e) states that two adjacent regions $R_i$ and $R_j$ must be different in the sense of predicate $Q$.
As we can see, the fundamental issue in segmentation is partitioning an image into
regions that meet the aforementioned criteria.
Monochrome image segmentation algorithms are typically based on one of two important properties of intensity values: discontinuity or similarity. In the first category, we assume that region boundaries are sufficiently distinct from one another and from the background to permit boundary detection via local intensity discontinuities. Edge-based segmentation is the primary technique in this category. The second category, region-based segmentation, is based on partitioning an image into regions that are similar according to a set of predefined criteria.
The preceding concepts are illustrated in Fig. 8.1. Figure 8.1(a) depicts a region of constant intensity overlaid on a darker, also constant-intensity background; the overall image is composed of these two regions. The result of computing the inner region's boundary using intensity discontinuities is shown in Fig. 8.1(b). Points on either side of the boundary are black (zero) due to the absence of intensity discontinuities in those regions. To segment the image, we allocate one level (for example, white) to all pixels on or inside the boundary and another level (for example, black) to all points outside the boundary. The result of such a procedure is depicted in Fig. 8.1(c). As we can see, this result satisfies conditions (a) through (c) stated at the beginning of this section. The predicate of condition (d) is as follows: if a pixel is on or within the boundary, it should be labelled white; otherwise, it should be labelled black. As shown in Fig. 8.1(c), this predicate is TRUE for the points labelled black or white. Likewise, both segmented regions (object and background) satisfy condition (e).
FIGURE 8 .1 (a) Image of a constant intensity region. (b) Boundary based on
intensity discontinuities. (c) Result of segmentation. (d) Image of a texture region.
(e) Result of intensity discontinuity computations (note the large number of small
edges). (f) Result of segmentation based on region properties
Reference from "Digital I mage Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The next three images illustrate region-based segmentation. Figure 8.1(d) is similar to Fig. 8.1(a), except that the intensities of the inner region form a textured pattern. Figure 8.1(e) shows the result of computing intensity discontinuities in this image. The numerous spurious intensity changes make it difficult to identify a unique boundary for the original image, because many of the nonzero intensity changes are not associated with the boundary, so edge-based segmentation is not suitable here. However, because the outer region is constant, all we need to solve this segmentation problem is a predicate that distinguishes between textured and constant regions. The standard deviation of pixel values is a measure that accomplishes this, because it is nonzero in the textured region and zero elsewhere. Figure 8.1(f) shows the result of dividing the original image into square subregions; each subregion was then labelled white if the standard deviation of its pixels was positive (that is, if the predicate was TRUE), and black otherwise. Because groups of squares were labelled with the same intensity, the result has a "blocky" appearance around the perimeter of the region (smaller squares would have given a smoother region boundary). Finally, note that these results also satisfy the five segmentation criteria stated at the start of this section.
Complete vs. partial segmentation
In complete segmentation:
• Segmented disjoint regions correspond uniquely to objects in the input image.
• Cooperation with higher-level processing that uses domain-specific knowledge of the problem is usually required.
In partial segmentation:
• Segmented regions do not always correspond to objects in the image.
Discontinuity -based segmentation
In a discontinuity -based approach, an image's partitions or sub -divisions are
determined by abrupt changes in the image's intensity level. We are primarily
concerned with identifying isolated points, lines, and edges in an image. To do so,
we employ the 3 X 3 Mask operation.
Figure 8.2. An image
munotes.in
Page 216
216IMAGE PROCESSING
Figure 8.3. 3X3 Mask
$R = \sum_{k=1}^{9} w_k z_k$
where $z_k$ is the intensity of the pixel coincident with mask coefficient $w_k$.
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The discontinuity -based segmentation can be classified into three approaches: (1)
Point detection, (2) Line detection, and (3) Edge detection.
Point, Line, and Edge Detection
This section discusses segmentation techniques that are based on detecting sharp, local changes in intensity. We are interested in three types of image features: isolated points, lines, and edges. Edge pixels are pixels at which the intensity of the image changes abruptly, and edges (or edge segments) are sets of connected edge pixels.
Edge detectors are local image processing tools used to detect edge pixels. A line may be viewed as a (typically) thin edge segment in which the intensity of the background on either side of the line is significantly higher or lower than the intensity of the line pixels; as discussed shortly, lines give rise to what are called "roof edges." Finally, an isolated point may be viewed as a foreground (background) pixel surrounded by background (foreground) pixels.
Point Detection
An isolated point is the simplest type of discontinuity in a digital image. The most frequently used technique for detecting discontinuities is to apply an n × n mask at each point in the image. The mask is constructed in the manner depicted in Fig. 8.4.
Figure 8.4. A mask for point detection
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The point being examined is located at the (x, y) coordinates of the mask's centre. If the corresponding response satisfies
$|R| \geq T$
where R denotes the mask's response at that point and T is a non-negative threshold value, then an isolated point has been detected at (x, y). This formulation measures the weighted differences between the centre point and its neighbours, since an isolated point's gray level will be significantly different from that of its neighbours. The result of applying the point detection mask is illustrated in Fig. 8.5.
Figure 8.5 . (a) Gray -scale image with a nearly invisible isolated black point (b)
Image showing the detected point
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
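A point-detection sketch in Python/OpenCV is given below; the kernel is the usual Laplacian-like mask with 8 at the centre and -1 elsewhere (assumed to match Fig. 8.4), and the file name and the 0.9 threshold factor are illustrative assumptions.

import cv2
import numpy as np

f = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)   # hypothetical image
w = np.array([[-1, -1, -1],
              [-1,  8, -1],
              [-1, -1, -1]], dtype=np.float32)     # point-detection mask
R = cv2.filter2D(f, -1, w)                         # mask response at every pixel
T = 0.9 * np.abs(R).max()                          # non-negative threshold (assumed choice)
points = np.abs(R) >= T                            # isolated points satisfy |R| >= T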
Line Detection
Line detection is the next level of complexity in the detection of image discontinuities. For any point in the image, a response can be computed for each of a set of directional masks, indicating the direction of the line with which the point is most strongly associated. Let $R_i$ and $R_j$ denote the responses of the ith and jth line-detection masks. If, at a given point,
$|R_i| > |R_j|$ for all $j \neq i$,
that point is more likely to be associated with a line in the direction of mask i.
Figure 8.6. Line Detector masks in (a) Horizontal direction (b) 45° direction (c)
Vertical direction (d) - 45° direction
Reference from "Di gital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
The line direction associated with a given pixel is determined by the mask that gives the greatest response at that pixel. Figure 8.7 shows the result of applying the line detection masks.
Figure 8.7. (a) Original Image (b) result showing with horizontal detector (c) with
45° detector (d) with vertical detec tor (e) with -45° detector
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
We can detect lines in a specified direction using the line detector masks. For instance, suppose we are interested in locating all lines that are one pixel thick and oriented at -45 degrees. For this, we use a digitized (binary) portion of an electronic circuit's wire-bond mask. The outcomes are depicted in Fig. 8.8.
Figure 8.8. (a) Image of a wire -bond mask (b) Result of processing with the -45°
detector (c) Zoomed view of the top, left region of -45° detector (d) Zoomed view
of the bottom, right region of -45° detector (e) Absolute value of -45° detector (f)
All points whose values satisfied the condition g >= T, where g is the image in (e)
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Edge detection
Because isolated points and lines of unit pixel thickness are rare in most practical applications, edge detection is the most frequently used technique for gray-level discontinuity segmentation. An edge is the boundary between two regions of different intensity, so edge detection is extremely useful for locating discontinuities, i.e., places where the image transitions from dark to light or vice versa. Figure 8.9 illustrates an intensity profile together with its first-order and second-order derivatives.
Figure 8.9. (a) Intensity profile (b) First -order derivatives
(c) Second -order derivatives
Referen ce from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
First-order derivatives
First-order derivatives respond wherever there is a discontinuity in intensity level: the response is positive at the leading edge of a transition and negative at the trailing edge. For an image f(x, y), the gradient operator is
$\nabla f = \begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}$
The strength (magnitude) of $\nabla f$, which gives the strength of the edge at location (x, y), is
$\nabla f = \sqrt{G_x^2 + G_y^2} \approx |G_x| + |G_y|$
and the edge direction is
$\alpha(x, y) = \tan^{-1}\!\left( G_y / G_x \right)$
Figure 8.10. (a) Original image (b) $|G_x|$, the component of the gradient along the x-direction (c) $|G_y|$, the component of the gradient along the y-direction (d) Gradient image $|G_x| + |G_y|$
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
It is possible to calculate the gradient of an image in several ways.
Prewitt Edge operator
Figure 8.11 . Masks used for Prewitt Edge operator
One mask computes the gradient in the horizontal direction and the other computes the gradient in the vertical direction. Using these two masks, the $G_x$ and $G_y$ components can be found at every location in an image, and from them the strength and direction of the edge at that location (x, y) can be determined.
Sobel Edge operator
Figure 8.12 . Masks used for Sobel Edge operator
Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
munotes.in
Page 221
221Chapter 8: Image Segmentation, Edge Detection, Thresholding, and Region Detection
The Sobel operator also incorporates local averaging of the image. Because image noise degrades edge detection, the Sobel operator is generally preferable to the Prewitt operator: the averaging produces a smoothing effect and reduces spurious edges caused by noise in the image.
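A gradient magnitude and direction sketch using the Sobel operator in Python/OpenCV follows; the file name is an assumption.

import cv2
import numpy as np

f = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # hypothetical image
gx = cv2.Sobel(f, cv2.CV_32F, 1, 0, ksize=3)       # Gx: gradient along the x-direction
gy = cv2.Sobel(f, cv2.CV_32F, 0, 1, ksize=3)       # Gy: gradient along the y-direction
magnitude = np.abs(gx) + np.abs(gy)                # |Gx| + |Gy| approximation of the gradient
direction = np.arctan2(gy, gx)                     # edge direction alpha(x, y)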
Second-order derivatives
The second derivative is positive on the darker side of an edge and negative on the lighter side. It is very sensitive to the noise present in an image, which is why it is not used by itself for edge detection. However, it is very useful for extracting secondary information, i.e., we can determine whether a point lies on the darker or the lighter side of an edge.
Zero-crossing: the zero crossing of the second derivative identifies the exact location of an edge where there is a gradual transition of intensity from a dark to a bright region or vice versa.
Several higher-order derivative operators are in common use:
Laplacian operator
The Laplacian mask is given by:
Figure 8.13 . Masks used for Laplacian operator
Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
$\nabla^2 f = \dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2}$
If we consider the diagonal elements:
Figure 8.14 . Masks used for Laplacian operator using 8 -connectivity
Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
The Laplacian is quite sensitive to noise and can produce a double-edge effect; however, it is quite helpful for extracting secondary information.
To reduce the effect of noise, we first smooth the image with a Gaussian operator and then apply the Laplacian. The combined operator is known as the Laplacian of Gaussian (LoG).
LoG operator
The LoG mask is given by
Figure 8.15 . Masks used for LoG operator
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The Gaussian operator is given by:
$h(x, y) = \exp\!\left( -\dfrac{x^2 + y^2}{2\sigma^2} \right)$, where $x^2 + y^2 = r^2$, and
$\nabla^2 h = \left( \dfrac{r^2 - \sigma^2}{\sigma^4} \right) \exp\!\left( -\dfrac{r^2}{2\sigma^2} \right)$
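In practice the LoG can be approximated by Gaussian smoothing followed by the Laplacian, as in the sketch below (Python/OpenCV); the kernel size, sigma value, and file name are assumptions.

import cv2

f = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE)         # hypothetical input image
blurred = cv2.GaussianBlur(f, (9, 9), 1.4)                    # Gaussian smoothing (sigma = 1.4)
log = cv2.Laplacian(blurred, cv2.CV_32F, ksize=3)             # Laplacian of the smoothed image
# Edge locations correspond to the zero crossings of `log` (sign-change test not shown).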
Canny operator
The Canny operator is an important method for finding edges: noise is first isolated from the image, without affecting the features of the edges, and thresholds are then applied to retain only the significant edges.
1. Convolve the image f(r, c) with a Gaussian function to get the smoothed image g(r, c): g(r, c) = f(r, c) * G(r, c).
2. Apply a first-difference gradient operator to compute the edge strength.
3. Apply non-maximal (critical) suppression to the gradient magnitude.
4. Apply a threshold to the non-maximal suppression image.
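OpenCV bundles these steps into a single call; the sketch below shows typical usage, with the file name, blur parameters, and thresholds chosen only as illustrative assumptions.

import cv2

f = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE)          # hypothetical input image
smoothed = cv2.GaussianBlur(f, (5, 5), 1.4)                    # step 1: Gaussian smoothing
edges = cv2.Canny(smoothed, 50, 150)   # steps 2-4: gradient, non-max suppression, hysteresis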
The results of all these operators are shown in Fig. 8.16.
Figure 8.16 . (a) Original image (b) Result using with Prewitt operator (c) Result
using with Roberts operator (d) Result using with Sobel operator (e)Result using
with LoG operator (f) Result using with Canny operator
Reference from "Digital Image Processing fourth e dition by
Rafael C. Gonzalez and Richard E. Woods)
Marr-Hildreth Edge Detector (Laplacian of Gaussian)
Neuroscience was one of the sources of inspiration for the Marr-Hildreth edge detector. The filter Marr developed is based on the Laplacian, which is especially susceptible to noise. The Laplacian of an image is commonly used for edge detection because it reveals regions of rapid intensity change, but the image should first be smoothed with a Gaussian filter before the Laplacian is applied. Because convolution is associative, the Gaussian smoothing filter and the Laplacian filter can first be convolved with each other into a single hybrid filter, which is then applied to the image to obtain the same result. Figure 8.17 shows the shape of the resulting LoG filter.
Figure 8.17 a) shape of LoG with inverted hat
b) 9X9 LoG mask when sigma = 1.4
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The shape of LoG is somewhat similar to an inverted hat. This image shows the
9X9 LoG mask when sigma = 1.4. The LoG of an image can be obtained directly
by convolving the mask over the image. After obtaining the LoG of an im age, what
do we do?
Source: https://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm
Figure 8.18 a) intensity variation of an image with a step edge
b) LoG of the image
Figure 8.18(a) illustrates the intensity variation along the x-axis of an image containing a step edge, and the plot on the right, Fig. 8.18(b), shows the LoG of the image. Where there is an edge in the image, there is a zero crossing in the LoG.
This simple demonstration shows that an edge can be located wherever a zero crossing occurs.
Figure 8.19
a) After applying LoG filter b) After applying threshold over zero crossing
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The Canny Edge Detector
The Canny edge detector is a multi-step edge detection algorithm. Edges are detected by following the steps listed below.
1. Remove random noise from the input image with a Gaussian filter.
2. Compute the gradient magnitude of the smoothed image along the x and y axes.
3. Suppress non-maximum pixels by examining, for each candidate edge point, its neighbours in the direction perpendicular to the edge.
4. Apply hysteresis thresholding: keep pixels whose gradient magnitude is above the high threshold, keep weaker pixels only if they are connected to strong ones, and discard pixels below the low threshold.
Before going deeper into the techniques below, consider the following three criteria set out by J. F. Canny, who created the algorithm:
• Good Detection: the optimal detector must be reliable, avoiding both false positives and false negatives.
• Good Localization: the detected edges must be close to the true edges.
• Single Response Constraint: the detector must return only one point for each edge point.
Steps to follow during Canny Algorithm:
Noise Reduction
An edge detector is a type of high pass filter that amplifies high -frequency
components while suppressing low -frequency ones. Due to the fact that both edges
and noise are high -frequency components, edge detectors have a tendency to
amplify noise. To avoid this, we use a low -pass filter to smooth the image. Canny
accomplishes this through the use of a Gaussian filter.
Figure 8.20
[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]
While a larger filter reduces noise more strongly, it degrades edge localization, and vice versa. In general, a 5×5 kernel is a good choice, but this may vary according to the image.
Finding the Intensity Gradient of the Image
In the presence of noise, a pixel may not match its neighbouring pixels closely, and inappropriate edges may be detected. To avoid this, the Gaussian filter is convolved with the image to remove noise before the desired edges are extracted. We want any given pixel in the output to be similar to its neighbours, so an averaging-type kernel such as [1 1 1] maintains this similarity while removing noise.
Convolving the Gaussian kernel g(x, y) with an image I is expressed as
$S = I * g(x, y) = g(x, y) * I$
$g(x, y) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$
where g(x, y) is the Gaussian distribution and I is the input image.
Derivative:
Calculate the derivatives of the filter with respect to the X and Y dimensions and convolve them with I to obtain the gradient magnitude along each dimension. The edge direction can then be calculated from the arctangent of the ratio of the two components.
$\nabla S = \nabla (g * I) = (\nabla g) * I$
$\nabla S = \begin{bmatrix} g_x \\ g_y \end{bmatrix} * I = \begin{bmatrix} g_x * I \\ g_y * I \end{bmatrix}$
$\nabla g = \begin{bmatrix} \partial g / \partial x \\ \partial g / \partial y \end{bmatrix} = \begin{bmatrix} g_x \\ g_y \end{bmatrix}$
The convolution above produces a gradient vector with a magnitude and a direction:
$(S_x, S_y)$: gradient vector
$\text{magnitude} = \sqrt{S_x^2 + S_y^2}$
$\text{direction} = \theta = \tan^{-1}\!\left( S_y / S_x \right)$
The following is an illustration of Gaussian Derivatives, which contribute to the
edges in output images.
Figure 8.21 Gaussian Derivatives
[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]
Clearly, the edges remain quite blurred or thick. Keep in mind that an edge detector
should produce a single precise response corresponding to the edge. As a result, we
must thin the edges, or in other words, locate the largest edge response. This is accomplished through the use of non-max suppression.
Non-Max Suppression
This is a technique for thinning the edges. This checks whether or not a pixel is a
local maximum in its neighborhood in the direction of t he gradient. It is retained as
an edge pixel if it is a local maximum; otherwise, it is suppressed.
Neighboring pixels are located in the horizontal, vertical, and diagonal directions
(0°, 45°, 90°, and 135°) for each pixel. As a result, we must round off the gradient
direction at each pixel to one of these values, as illustrated below.
Figure 8.22 Neighboring pixels are located in the horizontal, vertical, and
diagonal directions (0°, 45°, 90°, and 135°) for each pixel.
[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]
After rounding, we compare each pixel's value to its two neighbouring pixels along the gradient direction. If the pixel is a local maximum, it is retained as an edge pixel; otherwise, it is suppressed. Thus, only the most substantial responses remain.
Consider the following example. Assume the gradient direction at pixel 'A' is 17 degrees; we round to 0 degrees because 17 is closer to 0. Then, using the rounded gradient direction, we select the neighbouring pixels (see B and C in the figure below). If A has a greater intensity value than B and C, it is retained as an edge pixel; otherwise, it is suppressed.
Figure 8.22 Gradient direction
[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]
Figure 8.23 a) Gradient Magnitude b) Non -max suppression
[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]
Clearly, the edges are thinned, but some are brighter than others. The brighter ones
can be considered strong edges, while the lighter ones may be true edges or may be
noise.
Hysteresis Thresholding:
Non-max suppression produces an image with a far more accurate depiction of its
real edges. However, as can be seen, certain edges are brighter than others. The
brighter ones can be considered strong edges, whereas the lighter ones may be true
edges or may be noise. Canny employs hysteresis thresholding to resolve the
question of "which edges are truly edges and which are not." We define two
thresholds in this section: 'High' and 'Low.'
x Any edges with intensity greater than ‘High’ are the sure edges.
x Any edges with intensity less than ‘Low’ are sure to be non -edges.
x The edges between ‘High’ and ‘Low’ thresholds are classified as edges only
if they are connected to a sure edge otherwise discarded.
Let’s take an ex ample to understand
munotes.in
Page 230
230IMAGE PROCESSING
Figure 8.24 'High' and 'Low threshold
[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]
Here, A and B are sure -edges as they are above ‘High’ threshold. Similarly, D is a
sure non -edge. Both ‘E’ and ‘C’ a re weak edges but since ‘C’ is connected to ‘B’
which is a sure edge, ‘C’ is also considered as a strong edge. Using the same logic
‘E’ is discarded. This way we will get only the strong edges in the image.
Figure 8.25 a) Non -max suppression b) Final Output
Local Processing
In this type of analysis, two principal properties are used to establish the similarity of edge pixels:
1. The strength of the response of the gradient operator used to produce the edge pixel.
2. The direction of the gradient vector.
Within a small neighbourhood (for example, 3×3 or 5×5), all points with shared properties are linked. A point (x', y') in the neighbourhood of (x, y) is linked to the pixel at (x, y) if both the magnitude and direction criteria are satisfied:
$|\nabla f(x', y') - \nabla f(x, y)| \leq T_m$ (magnitude threshold)
$|\alpha(x', y') - \alpha(x, y)| \leq T_d$ (direction threshold)
Figure 8.26 (a) Original image ; (b) detection result without local processing ; (c)
detection result with local processing (Tm=0.15 x max(| f|) and Td=pi/9
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Global Processing Using the Hough Transform
The Hough transform can be used to link pixels and identify curves. In polar coordinates, the straight line y = mx + c can be expressed as
ρ = x cos θ + y sin θ ………… (i)
where ρ is the length of the vector extending from the origin to the point on the line y = mx + c that is closest to the origin. This vector is perpendicular to the line, as illustrated below.
Figure 8.27 Vector perpendicular to the line from the origin to the point closest to
the line
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Any line in the xy-plane corresponds to a point in the two-dimensional space defined by the parameters ρ and θ. The Hough transform of a straight line in the xy-plane is a particular point in (ρ, θ) space, and these points must satisfy equation (i) with the constants x1, y1. Thus, the locus of all lines through a point in the xy-plane corresponds to a particular sinusoidal curve in (ρ, θ) space.
Assume we have edge points (xi, yi) that lie on the straight line with parameters (ρ, θ). Each edge point corresponds to a sinusoidal curve in (ρ, θ) space, but these curves must intersect at (ρ, θ), because this line is shared by all of them.
Consider the following equation for a line: y1 = a·x1 + b.
By varying the values of a and b in this equation, an infinite number of lines can pass through the point (x1, y1).
Figure 8.28 line: y1= ax1+b
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
If we rewrite the equation as
b = -a·x1 + y1
and consider the ab-plane (parameter space) instead of the xy-plane, we obtain a straight line: this entire line in the ab-plane corresponds to the single point (x1, y1) in the xy-plane and to the different values of a and b. Consider now another point (x2, y2) in the xy-plane. Its slope-intercept equation is
y2 = a·x2 + b ………… (ii)
which, written in terms of the ab-plane, becomes
b = -a·x2 + y2 ………… (iii)
This is a different line in the ab-plane. The two lines intersect at a point (a', b') in the ab-plane, and the corresponding line y = a'x + b' in the xy-plane passes through both (x1, y1) and (x2, y2).
Figure 8.29 line passes through the points (x1,y1), and (x2,y2) in the y -axis.
Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
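A sketch of line detection with the Hough transform in Python/OpenCV follows; the file name and the Canny and accumulator thresholds are illustrative assumptions.

import cv2
import numpy as np

f = cv2.imread("runway.png", cv2.IMREAD_GRAYSCALE)            # hypothetical input image
edges = cv2.Canny(f, 50, 150)                                 # edge map fed to the transform
# Each returned (rho, theta) pair is a point in parameter space where many sinusoids
# intersect, i.e. a straight line detected in the xy-plane.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=100)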
8.2 Thresholding
Thresholding is the simplest method of image segmentation. It creates a binary image from a grayscale image by comparing pixel intensities with one or more threshold values.
The Basics of Intensity Thresholding
Assume that the intensity histogram in Fig. 8.30 (a) refers to an image, f( x, y), that
is comprised of light objects on a dark background, and that the intensity values of
the object and background pixels are sorted into two dominating modes. One
simple way to distinguish the objects from the background is to select a threshold, T, that separates the two modes. Then any point (x, y) in the image at which f(x, y) > T is called an object point; otherwise, the point is called a background point. In other words, the segmented image, denoted g(x, y), is given by
$g(x,y) = \begin{cases} 1 & \text{if } f(x,y) > T \\ 0 & \text{if } f(x,y) \leq T \end{cases}$
where f(x, y) is the intensity of the image at the point (x, y).
Global thresholding occurs when T is a constant that is applied to a whole image.
We use the term variable thresholding when the value of T varies over an image.
As a result, the value of T at every given location (x, y) in an image depends on the
features of the neighbourhood of (x, y) (for example, the average intensity of the
pixels in the neighbor hood). The term dynamic thresholding or adaptive
thresholding is typically used when T relies on the spatial coordinates (x, y). These
terms are not used universally.
Figure 8.30(b) illustrates a more complex and difficult thresholding problem
involving a histogram with three dominant modes, for example, two types of light
objects on a dark background. Multiple thresholding classifies a point (x, y) as belonging to the background if f(x, y) ≤ T1, to one object class if T1 < f(x, y) ≤ T2, and to the other object class if f(x, y) > T2. That is, the segmented image is given by
$g(x,y) = \begin{cases} a & \text{if } f(x,y) \leq T_1 \\ b & \text{if } T_1 < f(x,y) \leq T_2 \\ c & \text{if } f(x,y) > T_2 \end{cases}$
where a, b, and c are any three distinct intensity values.
FIGURE 8.30 Intensity histograms that can be partitioned (a) by a single threshold, and (b) by dual thresholds
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Basic Global Thresholding
When the intensity distributions of object and background pixels are sufficiently distinct, it is possible to use a single (global) threshold applicable to the entire image. In most applications, however, there is usually enough variability between images that, even if global thresholding is a suitable approach, an algorithm capable of estimating the threshold value for each image is needed. For this purpose, the following iterative procedure can be used: 1. Select an initial threshold T; this can be done at random or
by any other method that is preferred. 2. In the same manner as explained before, the image is segmented into object
and background pixels, yielding two sets of pixels:
a. G1 = {f(m,n):f(m,n)>T} (object pixels)
b. G2 = {f(m,n):f(m,n) T} (background pixels) (note, f(m,n) is the value
of the pixel located in the mth column, nth row) 3. The average of each set is computed.
a. m1 = average value of G1
b. m2 = average value of G2 4. A new threshold is created that is the average of m1 and m2
a. T‘ = (m1 + m2)/2
5. Return to step two, this time using the new threshold computed in step four,
and repeat the process until the new threshold matches the one before it (i.e.,
until convergence has been achieved).
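A NumPy sketch of this iterative procedure is shown below; it assumes the input is a grayscale array and that both pixel classes are non-empty at every iteration.

import numpy as np

def basic_global_threshold(f, eps=0.5):
    # Iterative global threshold estimate for a grayscale array f (a sketch).
    T = f.mean()                              # start, e.g., from the mean intensity
    while True:
        g1 = f[f > T]                         # object pixels (values above T)
        g2 = f[f <= T]                        # background pixels
        T_new = 0.5 * (g1.mean() + g2.mean()) # average of the two class means
        if abs(T_new - T) < eps:              # stop when the threshold no longer changes
            return T_new
        T = T_new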
Segmentation using the preceding iterative algorithm is illustrated in Figure 8.31.
The original image is shown in Figure 8.31 (a), and the image histogram, which
demo nstrates a distinct valley, is shown in Figure 8.31(b). The basic global
algorithm produced the threshold after three iterations, starting with T equal to the
image's average intensity, and using Figure 8.31 (c) to segment the original image.
As expected g iven the histogram's distinct separation of modes, the segmentation
of object and background was flawless.
FIGURE 8.31(a) Noisy fingerprint. (b) Histogram. (c) Segmented result using a
global threshold (thin image border added for clarity). (Original ima ge courtesy
of the National Institute of Standards and Technology.).
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
Optimum Global Thresholding Using Otsu’s Method
Let us begin by understanding Otsu's approach. The method processes the image histogram and selects the threshold that minimizes the within-class variance. The technique generally produces the desired results for bimodal images, whose histograms contain two distinct peaks corresponding to different ranges of intensity values.
The fundamental idea is to divide the histogram into two clusters using a threshold defined by the weighted within-class variance
$\sigma_w^2(t) = w_1(t)\,\sigma_1^2(t) + w_2(t)\,\sigma_2^2(t)$
where $w_1(t)$ and $w_2(t)$ are the probabilities of the two classes separated by a threshold t, which lies between 0 and 255 inclusive.
As demonstrated in Otsu's paper, there are usually two ways to determine the
threshold. The first objective is to minimize the within -class variance defined
above, $\sigma_w^2(t)$; the second is to maximize the between-class variance using the expression
$\sigma_b^2(t) = w_1(t)\,w_2(t)\,[\mu_1(t) - \mu_2(t)]^2$
where $\mu_i$ denotes the mean of class i.
The probability P is computed for every pixel value in the two clusters $C_1$ and $C_2$ using the cluster probability functions
$w_1(t) = \sum_{i=1}^{t} P(i)$
$w_2(t) = \sum_{i=t+1}^{I} P(i)$
Note that the image can be viewed as an intensity function f(x, y) whose values are gray levels. The number of pixels with a specified gray level i is denoted $n_i$, and the total number of pixels in the image is n. Thus, the probability of encountering gray level i is
$P(i) = \dfrac{n_i}{n}$
The C 1 pixel intensity values are in the range [1, t], while the C 2 pixel intensity
values are in the range [t + 1, I], where I is the maximum pixel value (255).
The next step is to obtain the means of $C_1$ and $C_2$, denoted $\mu_1(t)$ and $\mu_2(t)$ respectively:
$\mu_1(t) = \sum_{i=1}^{t} \dfrac{i\,P(i)}{w_1(t)}$
$\mu_2(t) = \sum_{i=t+1}^{I} \dfrac{i\,P(i)}{w_2(t)}$
Now recall the equation above for the weighted within-class variance. Its remaining components, $\sigma_1^2$ and $\sigma_2^2$, are obtained by combining the quantities defined so far:
$\sigma_1^2(t) = \sum_{i=1}^{t} [\,i - \mu_1(t)\,]^2 \dfrac{P(i)}{w_1(t)}$
$\sigma_2^2(t) = \sum_{i=t+1}^{I} [\,i - \mu_2(t)\,]^2 \dfrac{P(i)}{w_2(t)}$
Note that if the threshold is chosen poorly, the variance of one of the classes will be large. To obtain the total variance we simply add the within-class and between-class variances:
$\sigma_T^2 = \sigma_w^2(t) + \sigma_b^2(t)$, where $\sigma_b^2(t) = w_1(t)\,w_2(t)\,[\mu_1(t) - \mu_2(t)]^2$.
The total variance of the image, $\sigma_T^2$, does not depend on the threshold.
Thus, the pipeline of the general algorithm for the between-class variance maximization option can be summarized as follows:
1. calculate the histogram and intensity-level probabilities
2. initialize $w_i(0)$ and $\mu_i(0)$
3. iterate over possible thresholds t = 0, ..., max_intensity
   - update the values of $w_i$ and $\mu_i$, where $w_i$ is the probability and $\mu_i$ the mean of class i
   - calculate the between-class variance value $\sigma_B^2(t)$
4. the final threshold is the value of t that maximizes $\sigma_B^2(t)$
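The pipeline above can be sketched directly in NumPy for an 8-bit grayscale image. This is a hedged illustration of the between-class variance search, not Otsu's reference implementation; the function and variable names are chosen for readability.

import numpy as np

def otsu_threshold(image):
    # Histogram and intensity-level probabilities P(i) for an 8-bit image.
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    i = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(256):                      # iterate over possible thresholds
        w1 = p[:t + 1].sum()                  # class probabilities w1(t), w2(t)
        w2 = 1.0 - w1
        if w1 == 0.0 or w2 == 0.0:
            continue
        mu1 = (i[:t + 1] * p[:t + 1]).sum() / w1     # class means
        mu2 = (i[t + 1:] * p[t + 1:]).sum() / w2
        var_b = w1 * w2 * (mu1 - mu2) ** 2    # between-class variance sigma_B^2(t)
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t                             # t that maximizes sigma_B^2(t)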
Otsu’s Binarization Application
The authors propose an enhanced Otsu's method as one approach for estimating the location of underwater landmarks. The advantages of this approach are that it allows for precise real-time segmentation of underwater features and has been shown to perform better than threshold segmentation methods. Consider its concept in greater detail by examining the side-scan sonar (SSS) shipwreck image provided in the article. Modern SSS systems are capable of imaging a large area of the sea floor in two dimensions and producing realistic images. Their background, however, contains regions of sludge and aquatic animals that appear as spots typically about 30 pixels in diameter.
FIGURE 8.32 : SSS sea bottom image
[Image reference: https://learnopencv.com/otsu-thresholding-with-opencv/]
These spots distort proper image processing because their grey level is similar to that of certain zones of the foreground objects. As shown below, the segmented image produced by the classical Otsu technique contains these artefacts:
FIGURE 8.33 : Segmentation of the SSS sea bottom image using the classical Otsu technique
[Image reference: https://learnopencv.com/otsu-thresholding-with-opencv/]
To address this spot problem, a technique based on Otsu's binarization was developed to restrict the search range of the segmentation threshold used for foreground object division. The improved Otsu's method pipeline is as follows:
1. Calculate Otsu's threshold T
2. Calculate N30 using Canny edge detection
3. If N30 > 300, calculate the final T value
As a result, the wrecked ship stands out clearly from the background:
FIGURE 8.34 : Improved Otsu’s result
[Image reference: https://learnopencv.com/otsu-thresholding-with-opencv/]
Using Image Smoothing to Improve Global Thresholding
A noisy image is shown in Figure 8.35(a), its histogram in Figure 8.35(b), and the result of thresholding it with Otsu's method in Figure 8.35(c). Each black point in the white region and each white point in the black region is a thresholding error, indicating that the segmentation was a complete failure. The result of smoothing the noisy image with an averaging kernel is shown in Figure 8.35(d), and its histogram in Figure 8.35(e). Smoothing improves the shape of the histogram, and we would expect thresholding of the smoothed image to be nearly perfect. As illustrated in Figure 8.35(f), this is the case.
The blurring of the boundary between object and background resulted in slight distortion of the object-background boundary in the segmented, smoothed image. Indeed, the more aggressively we smooth an image, the more boundary errors we should anticipate in the segmented result.
FIGURE 8.35 (a) Noisy image and (b) its histogram. (c) Result obtained using Otsu's method. (d) Noisy image smoothed using an averaging kernel and (e) its histogram. (f) Result of thresholding using Otsu's method.
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
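A short OpenCV sketch of the smoothing-then-thresholding idea follows. The file name and the 5 × 5 kernel size are illustrative assumptions; the text does not specify them.

import cv2

noisy = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input image

# Otsu applied directly to the noisy image (often fails when the histogram has no valley).
t1, seg_noisy = cv2.threshold(noisy, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Smooth first with an averaging kernel, then apply Otsu's method.
smoothed = cv2.blur(noisy, (5, 5))
t2, seg_smooth = cv2.threshold(smoothed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)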
Next, we examine the effect of significantly shrinking the foreground region relative to the background. The result is depicted in Figure 8.36(a). In this image, the noise is additive Gaussian with a mean of zero and a standard deviation of ten intensity levels (as opposed to 50 in the previous example). As illustrated in Fig. 8.36(b), the histogram lacks a clear valley, implying that segmentation will fail, as confirmed by the result in Fig. 8.36(c). The image smoothed with an averaging kernel is shown in Figure 8.36(d), and the corresponding histogram in Figure 8.36(e). As expected, the net effect was to narrow the histogram's spread, but the distribution remained unimodal. As illustrated in Fig. 8.36(f), segmentation failed once more. The reason for the failure is that the region is so small that its contribution to the histogram is negligible in comparison to the noise-induced intensity spread. In such instances, the approach of the following section is more likely to succeed.
FIGURE 8.36 (a) Noisy image and (b) its histogram. (c) Result obtained using Otsu's method. (d) Noisy image smoothed using an averaging kernel and (e) its histogram. (f) Result of thresholding using Otsu's method. Thresholding failed in both cases to extract the object of interest.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Using Edges to Improve Global Thresholding
A necessary condition for segmenting an image with a single global threshold is that its foreground and background produce distinguishable histogram modes, with sufficient space between the peaks so that a threshold placed in the gap separates the foreground from the background. Histograms suitable for global threshold segmentation frequently exhibit modes with the following properties:
- tall
- narrow
- symmetric
- separated by deep valleys
However, even when the human eye perceives high contrast between foreground and background, segmenting an image with a single global threshold is not always easy. The threshold must be determined adaptively when the foreground or background content is excessive, swamping the contribution of the other to the grayscale histogram. The approach described next is designed to address this issue. Its central idea is to compute the histogram only at locations where the foreground and the background are adjacent, and such neighbouring locations can often be found through the magnitude of the image gradient or the Laplacian operator.
The following algorithm summarizes the preceding discussion, where f(x, y) denotes the input image:
1. Compute an edge image as either the magnitude of the gradient or the absolute value of the Laplacian of f(x, y), using any of the methods discussed earlier.
2. Specify a threshold value, T.
3. Using T from Step 2, threshold the image from Step 1 to create a binary image, $g_T(x, y)$. This image is used as a mask in the next step to select pixels from f(x, y) that correspond to the mask's "strong" edge pixels.
4. Compute a histogram using only the pixels in f(x, y) that correspond to the locations of the 1-valued pixels in $g_T(x, y)$.
5. Using the histogram from Step 4, globally segment f(x, y) using, for example, Otsu's method.
If T is set to a value less than the minimum value of the edge image, $g_T(x, y)$ will contain only 1's, implying that the image histogram will be computed using all the pixels in f(x, y); in that case, the algorithm reduces to global thresholding on the original image's histogram. It is common practice to specify T as a percentile, typically set high (e.g., in the high 90s) so that only a small subset of the gradient/Laplacian image pixels is used in the computation.
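The five steps can be sketched as follows, using the gradient magnitude for Step 1 and Otsu's method for Step 5. The percentile value and the Sobel kernel size are illustrative assumptions; this is a sketch of the idea rather than a definitive implementation.

import numpy as np
import cv2

def edge_guided_otsu(f, percentile=99.7):
    # Step 1: edge image as the gradient magnitude of f(x, y).
    gx = cv2.Sobel(f, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(f, cv2.CV_64F, 0, 1, ksize=3)
    grad = np.sqrt(gx ** 2 + gy ** 2)

    # Steps 2-3: threshold the edge image at a high percentile to form the mask g_T.
    T = np.percentile(grad, percentile)
    mask = grad >= T

    # Step 4: histogram computed only from the "strong edge" pixels of f.
    strong = f[mask].astype(np.uint8).reshape(-1, 1)

    # Step 5: Otsu threshold from that histogram, applied globally to f.
    t_otsu, _ = cv2.threshold(strong, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return (f > t_otsu).astype(np.uint8) * 255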
The following examples demonstrate the concepts discussed above. The gradient is used in the first example, and the Laplacian in the second. Similar results can be obtained in both examples using either approach.
EXAMPLE 1 : Using edge information based on the gradient to improve global
thresholding.
The image and histogram in Figures 8.37(a) and (b) correspond to those in Figure 8.36. As you saw, smoothing followed by thresholding was unable to segment this image. The purpose of this example is to demonstrate how the problem can be solved using edge information. The mask image in Figure 8.37(c) was created by thresholding the gradient magnitude image at the 99.7 percentile. The image in Figure 8.37(d) is the product of the mask and the input image. The histogram in Figure 8.37(e) represents the nonzero elements of Figure 8.37(d). Note that this histogram possesses the critical characteristics discussed previously; namely, it contains reasonably symmetrical modes separated by a deep valley. Thus, whereas the histogram of the original noisy image gave no indication that thresholding could succeed, the histogram in Fig. 8.37(e) indicates that it is possible to threshold the small object from the background successfully. As illustrated in Fig. 8.37(f), this is the case. This result was obtained by applying Otsu's method to the histogram in Fig. 8.37(e) and then applying the resulting threshold globally to the noisy image in Fig. 8.37(a). The outcome is nearly flawless; the key was the ability to generate an appropriate derivative image.
FIGURE 8.37 (a) Noisy image from Fig. 8.36 (a) and (b) its histogram. (c) Mask
image formed as the gradient magnitude image thresholded at the 99.7 percentile.
(d) Image formed as the product of (a) and (c). (e) Histogram of the nonzero
pixels in the image in (d). (f) Result of segmenting image (a) with the Otsu
threshold based on the histogram in (e). The threshold was 134, which is
approximately midway between the peaks in this histogram
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
EXAMPLE 2: Using edge information based on the Laplacian to improve
global thresholding.
We consider a much more complicated thresholding problem in this example. Figure 8.38(a) shows an 8-bit image of yeast cells on which we wish to perform global thresholding in order to obtain the regions corresponding to the bright spots. As a starting point, Fig. 8.38(b) shows the image histogram, and Fig. 8.38(c) the result obtained by applying Otsu's method directly to the image. As can be seen, Otsu's method fell short of the original objective of detecting the bright spots. While the method was successful in isolating several cell regions, several of the segmented regions on the right were actually joined. The Otsu method determined a threshold of 42 and a separability measure of 0.636.
FIGURE 8.38 (a) Image of yeast cells. (b) Histogram of (a). (c) Segmentation of (a) with Otsu's method using the histogram in (b). (d) Mask image formed by thresholding the absolute Laplacian image. (e) Histogram of the nonzero pixels in the product of (a) and (d). (f) Original image thresholded using Otsu's method based on the histogram in (e).
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The mask image $g_T(x, y)$ is shown in Figure 8.38(d). It was created by computing the absolute value of the Laplacian image and then thresholding it with T set to 115 on an intensity scale ranging from 0 to 255. This value of T is approximately equal to the 99.5 percentile of the values in the absolute Laplacian image, so thresholding at this level results in a small set of pixels, as illustrated in Fig. 8.38(d). As anticipated from the preceding discussion, these points cluster close to the edges of the bright spots in the image. The histogram in Figure 8.38(e) represents the nonzero pixels in the product of (a) and (d). Finally, Fig. 8.38(f) shows the outcome of thresholding the original image globally using Otsu's method based on the histogram in Fig. 8.38(e). This result agrees with the locations of the image's bright spots. The Otsu method calculated a threshold of 115 and a separability measure of 0.762, both greater than the values obtained using the original histogram. We can even improve the segmentation of complete cell regions by varying the percentile at which the threshold is set. For instance, Fig. 8.39 shows the result obtained using the same procedure as described previously, but with the threshold set to 55, or approximately 5% of the maximum value of the absolute Laplacian image. This value is in the 53.9 percentile of the values contained in that image. This result is clearly superior to that shown in Fig. 8.38(c), where Otsu's method was used with the histogram of the original image.
FIGURE 8.39 Image in Fig. 8.38(a) segmented using the same procedure as explained in Figs. 8.38(d) through (f), but using a lower value to threshold the absolute Laplacian image.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Multiple Thresholds
We have thus far concentrated on image segmentation using a single global threshold. Otsu's method extends to an arbitrary number of thresholds because the separability measure on which it is based also extends to an arbitrary number of classes. In the case of K classes, the between-class variance generalizes to

$\sigma_B^2 = \sum_{k=1}^{K} P_k\,(m_k - m_G)^2$

where

$P_k = \sum_{i \in C_k} p_i$

and

$m_k = \dfrac{1}{P_k} \sum_{i \in C_k} i\,p_i$

As previously stated, $m_G$ is the global mean. The K classes are separated by K−1 thresholds whose values $k_1^*, k_2^*, \ldots, k_{K-1}^*$ maximize

$\sigma_B^2(k_1^*, k_2^*, \ldots, k_{K-1}^*) = \max_{0 < k_1 < k_2 < \cdots < k_{K-1} < L-1} \sigma_B^2(k_1, k_2, \ldots, k_{K-1})$
Even though this result holds for any number of classes, it begins to lose meaning as the number of classes increases, because we are dealing with only a single variable (intensity). Indeed, the between-class variance is usually expressed in terms of multiple variables arranged as vectors. In practice, multiple global thresholding is considered a suitable approach whenever there is reason to believe that the problem can be solved adequately with two thresholds. Applications requiring more than two thresholds are usually solved using more than just intensity values; instead, additional descriptors (for example, colour) are used, and the application is recast as a pattern recognition problem.
The between-class variance for three classes consisting of three intensity intervals (separated by two thresholds) is given by

$\sigma_B^2 = P_1\,(m_1 - m_G)^2 + P_2\,(m_2 - m_G)^2 + P_3\,(m_3 - m_G)^2$

where

$P_1 = \sum_{i=0}^{k_1} p_i, \quad P_2 = \sum_{i=k_1+1}^{k_2} p_i, \quad P_3 = \sum_{i=k_2+1}^{L-1} p_i$

and

$m_1 = \dfrac{1}{P_1} \sum_{i=0}^{k_1} i\,p_i, \quad m_2 = \dfrac{1}{P_2} \sum_{i=k_1+1}^{k_2} i\,p_i, \quad m_3 = \dfrac{1}{P_3} \sum_{i=k_2+1}^{L-1} i\,p_i$

The following relationships also hold:

$P_1 m_1 + P_2 m_2 + P_3 m_3 = m_G$

and

$P_1 + P_2 + P_3 = 1$

The three classes are separated by two thresholds whose values $k_1^*$ and $k_2^*$ maximize

$\sigma_B^2(k_1^*, k_2^*) = \max_{0 < k_1 < k_2 < L-1} \sigma_B^2(k_1, k_2)$
Algorithm
1. Let $k_1 = 1$.
2. Increment $k_2$ through all its values greater than $k_1$ and less than L−1, evaluating $\sigma_B^2(k_1, k_2)$ for each pair.
3. Increment $k_1$ to its next value and again increment $k_2$ through all its values greater than $k_1$ and less than L−1.
4. Repeat until $k_1 = L-3$.
This produces a 2-D array $\sigma_B^2(k_1, k_2)$, from which the pair $(k_1^*, k_2^*)$ corresponding to the maximum value in the array is selected.
Segmentation is then performed as

$g(x, y) = \begin{cases} a, & \text{if } f(x, y) \le k_1^* \\ b, & \text{if } k_1^* < f(x, y) \le k_2^* \\ c, & \text{if } f(x, y) > k_2^* \end{cases}$

The separability measure is

$\eta(k_1^*, k_2^*) = \dfrac{\sigma_B^2(k_1^*, k_2^*)}{\sigma_G^2}$
EXAMPLE 3: Multiple global thresholding
Figure 8.40(a) depicts an iceberg. The goal of this example is to segment the image into three distinct regions: the dark background, the illuminated portion of the iceberg, and the shadowed portion. The image histogram in Fig. 8.40(b) shows that two thresholds are required to solve this problem. The procedure described previously yielded the thresholds shown in Fig. 8.40(b), which are located near the centres of the two histogram valleys. The segmentation shown in Figure 8.40(c) is the result of using these two thresholds. The measure of separability was 0.954. The primary reason this example worked so well is that the histogram contains three distinct modes separated by fairly wide, deep valleys. However, we can do even better by utilizing superpixels.
FIGURE 8.40 (a) Image of an iceberg. (b) Histogram. (c) Image segmented into
three regions using dual Otsu thresholds.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
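A brute-force sketch of the dual-threshold search described above is given below; it simply evaluates the three-class between-class variance for every pair (k1, k2) of an 8-bit histogram. It is an illustration under that assumption, not an optimized implementation.

import numpy as np

def otsu_two_thresholds(image):
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    i = np.arange(256)
    mG = (i * p).sum()                              # global mean
    best_k1, best_k2, best_var = 0, 0, -1.0
    for k1 in range(1, 254):
        for k2 in range(k1 + 1, 255):
            P1 = p[:k1 + 1].sum()
            P2 = p[k1 + 1:k2 + 1].sum()
            P3 = p[k2 + 1:].sum()
            if min(P1, P2, P3) == 0.0:
                continue
            m1 = (i[:k1 + 1] * p[:k1 + 1]).sum() / P1
            m2 = (i[k1 + 1:k2 + 1] * p[k1 + 1:k2 + 1]).sum() / P2
            m3 = (i[k2 + 1:] * p[k2 + 1:]).sum() / P3
            var_b = (P1 * (m1 - mG) ** 2 + P2 * (m2 - mG) ** 2
                     + P3 * (m3 - mG) ** 2)         # between-class variance
            if var_b > best_var:
                best_k1, best_k2, best_var = k1, k2, var_b
    return best_k1, best_k2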
Variable Thresholding
Noise and nonuniform illumination have a significant impact on the performance of a thresholding algorithm. We demonstrated how image smoothing and the use of edge information can be extremely beneficial. However, there are times when this type of preprocessing is either impractical or ineffective, to the point where none of the thresholding methods discussed thus far can solve the problem. In such cases, as we will demonstrate in the discussion that follows, the next level of thresholding complexity involves variable thresholding.
Variable Thresholding Based on Local Image Properties
A fundamental approach to variable thresholding is to calculate a threshold at each point (x, y) in the image based on one or more specified properties in a neighborhood of (x, y). While this may appear to be a lengthy process, modern algorithms and hardware allow rapid neighborhood processing, particularly for basic procedures such as logical and arithmetic operations.
The mean and standard deviation of the pixel values in the neighborhood of each point in an image are used here to illustrate the method. Since these two quantities are descriptors of average intensity and contrast, they are helpful in determining local thresholds. Let $m_{xy}$ and $\sigma_{xy}$ denote the mean and standard deviation of the set of pixel values in a neighborhood $S_{xy}$ centred on the coordinates (x, y). The following are common forms of variable thresholds determined by the local image properties:

$T_{xy} = a\,\sigma_{xy} + b\,m_{xy}$

where a and b are nonnegative constants. Note that $T_{xy}$ is a threshold array of the same size as the image from which it was obtained; the threshold value at a location (x, y) in the array is used to segment the image value at that point. Another common form uses the global mean:

$T_{xy} = a\,\sigma_{xy} + b\,m_G$

where $m_G$ denotes the global image mean. The segmented image is calculated as

$g(x, y) = \begin{cases} 1, & \text{if } f(x, y) > T_{xy} \\ 0, & \text{if } f(x, y) \le T_{xy} \end{cases}$

where f(x, y) denotes the input image. This equation is evaluated for every pixel in the image, and a different threshold is computed at each location (x, y) using the pixels in the neighborhood $S_{xy}$.
Significant power can be added to variable thresholding (with only a modest increase in computation) by using predicates based on parameters computed in the neighborhood of a point (x, y):

$g(x, y) = \begin{cases} 1, & \text{if } Q(\text{local parameters}) \text{ is TRUE} \\ 0, & \text{if } Q(\text{local parameters}) \text{ is FALSE} \end{cases}$

where Q is a predicate based on parameters computed from neighbouring pixels. Consider, for example, the following predicate based on the local mean and standard deviation:

$Q(\sigma_{xy}, m_{xy}) = \begin{cases} \text{TRUE}, & \text{if } f(x, y) > a\,\sigma_{xy} \text{ AND } f(x, y) > b\,m_{xy} \\ \text{FALSE}, & \text{otherwise} \end{cases}$
EXAMPLE 4 : Variable thresholding based on local image properties.
The yeast image is shown in Figure 8.41(a). Given that this image contains three distinct intensity levels, it is reasonable to expect that dual thresholding would be an effective segmentation technique. The result of applying the dual thresholding method is shown in Figure 8.41(b). As the figure illustrates, while it was possible to separate the bright regions from the background, the mid-gray regions on the right side of the image were not correctly segmented (i.e., separated). To illustrate the use of local thresholding, we computed the local standard deviation for all (x, y) in the input image using a local neighborhood; the result is shown in Figure 8.41(c). Note how the faint outer lines accurately delineate the cell boundaries. Next, we formed a predicate of the form discussed above, but using the global mean rather than the local mean; choosing the global mean usually gives better results when the background is nearly constant and all the object intensities are above or below the background intensity. The values of a and b used to specify the predicate were, as is typically the case in applications such as this, determined experimentally. The image was then segmented using this predicate. As illustrated in Fig. 8.41(d), the segmentation was quite successful. Notably, all outer regions were segmented correctly, and most of the inner, brighter regions were isolated correctly.
FIGURE 8.41 (a) Image of yeast cells. (b) Image segmented using the dual thresholding approach. (c) Image of local standard deviations. (d) Result obtained using local thresholding.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
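A sketch of the predicate-based local thresholding used in this example follows. The neighbourhood size and the constants a and b are illustrative placeholders; as noted above, such values are normally determined experimentally.

import numpy as np
from scipy import ndimage as ndi

def local_predicate_threshold(f, size=15, a=30.0, b=1.5):
    # Local mean and standard deviation over a size x size neighbourhood S_xy.
    f = f.astype(np.float64)
    m_local = ndi.uniform_filter(f, size=size)
    m_sq = ndi.uniform_filter(f * f, size=size)
    sigma_xy = np.sqrt(np.maximum(m_sq - m_local ** 2, 0.0))
    m_G = f.mean()                                   # global mean, as in this example
    # Predicate Q: f > a*sigma_xy AND f > b*m_G.
    return ((f > a * sigma_xy) & (f > b * m_G)).astype(np.uint8)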
Variable Thresholding Based on Moving Averages
The moving average technique is a variation of variable thresholding. Recall that global threshold processing determines a single fixed threshold from the entire image: if a pixel's gray value is greater than this threshold, the pixel is considered foreground; otherwise it is considered background. Variable thresholding instead gives each pixel (or block of pixels) its own threshold, and a pixel is considered foreground only if it exceeds its associated threshold. The moving average method scans the entire image in a linear zigzag pattern, computes a threshold at each point from a running average, and compares the gray value at that point with the threshold to segment the image.
Method
Suppose we have a 5×5 image in which $a_{ij}$ represents the gray value at position (i, j). Because the image is scanned linearly in a zigzag (z-shaped) pattern, the two-dimensional matrix is first converted into a one-dimensional sequence.
The moving average algorithm uses two parameters, n and b: n is the number of points averaged at each step, and b is a threshold coefficient. A one-dimensional averaging filter of length n is applied to the scanned sequence, which gives the moving average $m_{ij}$ at each point. The threshold at that pixel is then obtained by multiplying $m_{ij}$ by the parameter b, and each pixel's gray value is compared with its threshold to obtain the final segmented image.
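A sketch of this procedure is given below. It uses a centred (rather than strictly causal) running average for simplicity, and the parameter values n and b are illustrative assumptions.

import numpy as np

def moving_average_threshold(image, n=20, b=0.5):
    img = image.astype(np.float64)
    rows, cols = img.shape
    zigzag = img.copy()
    zigzag[1::2, :] = zigzag[1::2, ::-1]                  # reverse every other row (zigzag scan)
    flat = zigzag.ravel()
    m = np.convolve(flat, np.ones(n) / n, mode="same")    # moving average m_ij
    seg = (flat > b * m).astype(np.uint8).reshape(rows, cols)
    seg[1::2, :] = seg[1::2, ::-1]                        # undo the zigzag ordering
    return seg * 255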
FIGURE 8.42 (a) Original image. (b) Result of moving average processing. (c) Result after applying a minimum filter.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
8.3 Segmentation by Region Growing and by Region Splitting and
Merging
Region-Based Segmentation:
Segmentation is used to divide an image into regions. So far we have approached this problem by finding boundaries between regions based on discontinuities in grey levels, and by segmenting with thresholds based on the distribution of pixel properties such as gray-level values or colour. Region-based segmentation instead finds the regions directly.
Basic Formulation:
Assume that R is used to represent the entire image region. Segmentation can be
thought of as a process that divides R into n subregions, R1, R2 ... Rn, in such a way
that
a. $\bigcup_{i=1}^{n} R_i = R$
b. $R_i$ is a connected set, for i = 1, 2, ..., n
c. $R_i \cap R_j = \varnothing$ for all i and j, $i \ne j$
d. $Q(R_i) = \text{TRUE}$ for i = 1, 2, ..., n
e. $Q(R_i \cup R_j) = \text{FALSE}$ for any adjacent regions $R_i$ and $R_j$
Here $Q(R_k)$ is a logical predicate defined over the points in the set $R_k$, and $\varnothing$ is the null set. Condition (a) states that the segmentation must be complete; every pixel must belong to a region. Condition (b) requires that points within a region be connected in some predefined sense. Condition (c) states that the regions must be disjoint. Condition (d) specifies the properties that must be satisfied by all pixels in a segmented region (for example, $Q(R_i) = \text{TRUE}$ if all pixels in $R_i$ have the same grey level). Finally, condition (e) states that two adjacent regions $R_i$ and $R_j$ must be different in the sense of predicate Q.
Region Growing:
As the name implies, region growing is a procedure that groups pixels or subregions into larger regions according to predefined criteria. The basic approach is to begin with a set of "seed" points and grow regions from them by appending to each seed those neighbouring pixels that share similar properties (such as specific ranges of grey level or colour). When no a priori information is available, one procedure is to compute, for every pixel, the same set of properties that will ultimately be used to assign pixels to regions during the growing process. If the result of these computations shows clusters of values, the pixels whose properties place them near the centroids of these clusters can be used as seeds.
The choice of similarity criteria depends not only on the nature of the problem at hand, but also on the type of image data available. For instance, the analysis of land-use satellite imagery relies heavily on colour; without the information inherent in colour images, this problem would be significantly more difficult, if not impossible, to solve. When analyzing monochrome images, region analysis must be performed using a set of descriptors based on grey levels and spatial properties (such as moments or texture).
Essentially, region growing should stop when no additional pixels satisfy the criteria for inclusion in a region. Criteria such as grey level, texture, and colour are local in nature and do not take into account the "history" of region growth. Additional criteria that increase the power of a region-growing algorithm include the concept of size, the similarity between a candidate pixel and the pixels grown thus far (for example, a comparison of the candidate's grey level with the average grey level of the grown region), and the shape of the region being grown. The use of these types of descriptors is based on the assumption that at least a partial model of the expected results is available.
Let f(x, y) denote an input image; let S(x, y) denote a seed array containing 1's at seed-point locations and 0's elsewhere; and let Q denote a predicate to be applied at each location (x, y). Arrays f and S are assumed to be of the same size. A basic region-growing algorithm based on 8-connectivity may be stated as follows.
1. Find all connected components in S(x, y) and reduce each connected component to a single pixel; label all such pixels 1. All other pixels in S are labelled 0.
2. Form an image $f_Q$ such that, at each point (x, y), $f_Q(x, y) = 1$ if the input image satisfies the given predicate Q at those coordinates, and $f_Q(x, y) = 0$ otherwise.
3. Let g be an image formed by appending to each seed point in S all the 1-valued points in $f_Q$ that are 8-connected to that seed point.
4. Assign a unique region label (e.g., integers or letters) to each connected component in g. This is the segmented image obtained by region growing.
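A compact sketch of these four steps is shown below, using SciPy's connected-component labelling and a simple intensity-difference predicate in the spirit of the next example. The threshold T and the use of the mean seed intensity are assumptions for illustration, and Step 1 uses component centroids rather than morphological erosion as a simplification.

import numpy as np
from scipy import ndimage as ndi

def region_grow(f, seeds, T=60):
    eight = np.ones((3, 3), dtype=bool)                      # 8-connectivity

    # Step 1: reduce each connected seed component to one representative point.
    lbl, n = ndi.label(seeds, structure=eight)
    S = np.zeros(f.shape, dtype=bool)
    for r, c in ndi.center_of_mass(seeds, lbl, range(1, n + 1)):
        S[int(round(r)), int(round(c))] = True

    # Step 2: predicate image f_Q (similarity to the mean seed intensity).
    fQ = np.abs(f.astype(np.float64) - f[S].mean()) <= T

    # Step 3: keep only predicate pixels that are 8-connected to a seed.
    comp, _ = ndi.label(fQ | S, structure=eight)
    keep = np.unique(comp[S])
    grown = np.isin(comp, keep[keep > 0])

    # Step 4: label each connected region of the grown image.
    regions, _ = ndi.label(grown, structure=eight)
    return regions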
EXAMPLE 5: Segmentation by region growing
Figure 8.43(a) shows an 8-bit X-ray image of a weld (the horizontal dark region) containing several cracks and porosities (the bright regions running horizontally through the centre of the image). We illustrate the use of region growing by segmenting the defective weld regions. These regions could be used for a number of purposes, such as weld inspection, archiving historical studies, and controlling an automated welding system.
FIGURE 8.43 (a) X-ray image of a defective weld. (b) Histogram. (c) Initial seed image. (d) Final seed image (the points were enlarged for clarity). (e) Absolute value of the difference between the seed value (255) and (a). (f) Histogram of (e). (g) Difference image thresholded using dual thresholds. (h) Difference image thresholded with the smallest of the dual thresholds. (i) Segmentation result obtained by region growing.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The first step is to establish the seed points. Because cracks and porosities attenuate X-rays considerably less than solid welds, we expect the regions containing such defects to be significantly brighter than the rest of the X-ray image. We can obtain the seed points by thresholding the original image with a high percentile threshold. The histogram of the image is shown in Figure 8.43(b), and the thresholded result in Figure 8.43(c), with the threshold equal to the 99.9 percentile of intensity values in the image, which in this case was 254. The result of morphologically eroding each connected component in Figure 8.43(c) to a single point is shown in Figure 8.43(d).
Next, we have to specify a predicate. In this example we are interested in appending to each seed all the pixels that are (a) 8-connected to that seed and (b) "similar" to it. Using absolute intensity differences as a measure of similarity, the predicate applied at each location (x, y) is

$Q = \begin{cases} \text{TRUE}, & \text{if the absolute difference between the intensities of the seed and the pixel at } (x, y) \text{ is} \le T \\ \text{FALSE}, & \text{otherwise} \end{cases}$
where T is a specified threshold. Although this predicate is based on intensity differences and uses a single threshold, we could define more sophisticated schemes in which each pixel has its own threshold and properties other than differences are used. As the rest of this example shows, the preceding predicate is sufficient to solve the problem.
As stated previously, all seed values are 255 because the image was thresholded at a value of 254. Figure 8.43(e) shows the absolute difference between the seed value (255) and the values in Figure 8.43(a); it contains all the differences needed to compute the predicate at each location (x, y). The corresponding histogram is shown in Figure 8.43(f). We need a threshold to use in the predicate to establish similarity. Because the histogram has three principal modes, we can start by applying the dual thresholding technique to the difference image. The two resulting thresholds, as we can see, closely correspond to the histogram's valleys. (As a brief aside, the image was segmented using these two thresholds; the result in Fig. 8.43(g) shows that segmenting the defects cannot be accomplished using dual thresholds, despite the fact that the thresholds are in the deep valleys of the histogram.)
Figure 8.43(h) shows the difference image thresholded using only the smaller of the dual thresholds. The black points are the pixels for which the predicate was TRUE; the white points are the pixels for which it was FALSE. The critical finding here is that the points in the good regions of the weld failed the predicate and thus will be excluded from the final result. The points in the outer region will be considered by the region-growing algorithm as candidates; however, Step 3 will reject them because they are not 8-connected to the seeds. Indeed, as illustrated in Fig. 8.43(i), this step resulted in the correct segmentation, indicating that connectivity was a necessary condition in this case. Finally, note that in Step 4 we used the same label for all regions found by the algorithm. In this case it was visually preferable to do so, because all of those regions have the same physical meaning in this application: they all represent porosities.
Region Splitting and Merging
The method described earlier grows regions from seed points. An alternative is to subdivide an image initially into a set of disjoint regions and then merge and/or split the regions in an attempt to satisfy the segmentation conditions stated earlier. The fundamentals of region splitting and merging are discussed next.
Let R represent the entire image region and select a predicate Q. One approach for segmenting R is to subdivide it successively into smaller and smaller quadrant regions so that, for any region $R_i$, $Q(R_i) = \text{TRUE}$. We start with the entire region R: if Q(R) = FALSE, the region is split into quadrants; if Q is FALSE for any quadrant, that quadrant is divided into sub-quadrants, and so on. This technique is conveniently represented by so-called quadtrees, that is, trees in which each node has exactly four descendants, as illustrated in Fig. 8.44 (the images corresponding to the nodes of a quadtree are sometimes called quadregions or quadimages). Note that the root of the tree corresponds to the entire image, and that each node corresponds to the subdivision of a node into four descendant nodes. In this example, only one of the quadrants was subdivided further.
FIGURE 8.44 (a) Partitioned image. (b) Corresponding quadtree.
R represents the entire image region.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
When splitting is used exclusively, the final partition typically contains adjacent regions with identical properties. This drawback can be remedied by allowing merging as well as splitting. Satisfying the segmentation constraints requires merging only adjacent regions whose combined pixels satisfy the predicate Q; that is, two adjacent regions $R_j$ and $R_k$ are merged only if $Q(R_j \cup R_k) = \text{TRUE}$.
The preceding discussion can be summarised as follows: at any step, we
1. Split into four disjoint quadrants any region $R_i$ for which $Q(R_i) = \text{FALSE}$;
2. When no further splitting is possible, merge any adjacent regions $R_j$ and $R_k$ for which $Q(R_j \cup R_k) = \text{TRUE}$;
3. Stop when no further merging is possible.
Numerous variations of this basic theme are possible. For instance, a significant simplification results if in Step 2 we allow the merging of any two adjacent regions provided that each one satisfies the predicate individually. This results in a considerably simpler (and faster) algorithm, because predicate testing is limited to individual quadregions. As demonstrated in the following example, this simplification is still capable of producing acceptable segmentation results.
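The simplified variant just described can be sketched as a recursive quad-split that marks every final quadregion satisfying the predicate Q. The minimum block size and the example predicate are illustrative assumptions, not values taken from the text.

import numpy as np

def split_and_test(f, Q, min_size=8):
    out = np.zeros(f.shape, dtype=np.uint8)

    def split(r0, r1, c0, c1):
        block = f[r0:r1, c0:c1]
        if Q(block):                                   # predicate holds: keep whole quadregion
            out[r0:r1, c0:c1] = 255
        elif (r1 - r0) > min_size and (c1 - c0) > min_size:
            rm, cm = (r0 + r1) // 2, (c0 + c1) // 2    # split into four quadrants
            split(r0, rm, c0, cm); split(r0, rm, cm, c1)
            split(rm, r1, c0, cm); split(rm, r1, cm, c1)
        # small quadregions that fail Q remain 0 (background)

    split(0, f.shape[0], 0, f.shape[1])
    return out

# Example predicate in the spirit of the next example (a = 10, b = 125 assumed):
# Q = lambda R: R.std() > 10 and 0 < R.mean() < 125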
EXAMPLE 6: Segmentation by region splitting and merging
Figure 8.45(a) shows an X-ray image of the Cygnus Loop supernova. The purpose of this example is to segment (extract) the "ring" of less dense matter surrounding the dense inner region. The region of interest has several readily apparent characteristics that should help in its segmentation. First, the data in this region are random, implying that their standard deviation should be greater than that of the background (which is close to zero) and of the large central region, which is smooth. Likewise, the mean value (average intensity) of a region containing data from the outer ring should be greater than the mean of the darker background and less than the mean of the lighter central region. Thus, we should be able to segment the region of interest using the following predicate:
FIGURE 8.45 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by NASA's Hubble Telescope. (b) through (d) Results of limiting the smallest allowed quadregion to progressively smaller sizes, from 32 down to 8. (Original image courtesy of NASA.)
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
$Q = \begin{cases} \text{TRUE}, & \text{if } \sigma_R > a \text{ AND } 0 < m_R < b \\ \text{FALSE}, & \text{otherwise} \end{cases}$

where $\sigma_R$ and $m_R$ denote, respectively, the standard deviation and mean of the region being processed, and a and b are nonnegative constants.
Numerous regions within the outer area of interest were analyzed, and it was found that the mean intensity of the pixels in those regions never exceeded 125 and the standard deviation was always greater than 10. Figures 8.45(b) through (d) show the results obtained by varying the minimum size allowed for the quadregions from 32 to 8. Quadregions that satisfied the predicate were assigned white pixels; the rest of the region was assigned black pixels. The best result in terms of capturing the shape of the outer region was obtained with one of the smaller quadregion sizes; in Fig. 8.45(d), the small black squares are quadregions whose pixels do not satisfy the predicate. Using even smaller quadregions would increase the number of such black regions, while using larger regions than those shown here results in a more "block-like" segmentation. Note that in each case the segmented region (white pixels) was a connected region that completely separated the inner, smoother region from the background. Thus, the segmentation effectively partitioned the image into three distinct areas corresponding to its three principal features: background, dense region, and sparse region. Using any of the white regions in Fig. 8.45 as a mask would make extracting these regions from the original image a relatively simple task.
8.4 Region Segmentation using Clustering and Superpixels
We describe two approaches to region segmentation in this section. The first is a more traditional approach based on clustering data according to variables such as intensity and colour. The second is considerably more recent and is focused primarily on the extraction of "superpixels" from an image via clustering.
Region Segmentation Using K -Means Clustering
Clustering is a technique for partitioning a set of data into a predetermined number of groups, and k-means clustering is a widely used way of doing so. In k-means clustering, a collection of data is partitioned into k distinct clusters. The k-means algorithm has two phases: in the first phase it calculates the k centroids, and in the second it assigns each point to the cluster whose centroid is closest to that data point. There are several ways to define the distance to the nearest centroid, with the Euclidean distance being one of the most frequently used.
Once the grouping is complete, the algorithm recalculates the centroid of each cluster and, using these new centroids, computes a new Euclidean distance between each centre and each data point, assigning each point to the cluster with the smallest Euclidean distance. Each cluster of the partition is defined by its member objects and by its centroid, which is the point that minimizes the sum of distances to all of the cluster's objects. Thus, k-means is an iterative algorithm that minimizes the sum of distances from each object to its cluster centroid over all clusters.
Consider an image of resolution x × y that is to be clustered into k clusters. Let p(x, y) denote the input pixel to be clustered and let $c_k$ denote the cluster centres. The k-means clustering algorithm is as follows:
1. Initialize the number of clusters k and the centres.
2. For each pixel, calculate the Euclidean distance d between the centre and the pixel using

$d = \lVert p(x, y) - c_k \rVert$

3. Assign every pixel to the nearest centre based on the distance d.
4. Once all pixels have been assigned, recalculate the position of each centre using

$c_k = \dfrac{1}{k} \sum_{y \in c_k} \sum_{x \in c_k} p(x, y)$

5. Repeat the process until the tolerance or error value is satisfied.
6. Reshape the cluster pixel labels into the dimensions of the image.
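A sketch of intensity-based k-means segmentation using OpenCV's built-in k-means follows; k = 3, the termination criteria, and the number of attempts are illustrative choices rather than values prescribed by the text.

import numpy as np
import cv2

def kmeans_segment(gray, k=3):
    data = gray.reshape(-1, 1).astype(np.float32)          # one intensity sample per pixel
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(data, k, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
    # Replace every pixel by the mean intensity of its cluster.
    segmented = centers[labels.flatten()].reshape(gray.shape)
    return segmented.astype(np.uint8)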
While k-means has the substantial advantage of being simple to implement, it also has some disadvantages. The quality of the final clustering depends on the initial centroids, so if they are chosen at random, the result will differ for different initial centres. Thus, the initial centres should be chosen carefully to achieve the desired segmentation. Computational complexity is another factor to consider when designing the k-means clustering algorithm; it depends on the number of data elements, the number of clusters, and the number of iterations.
EXAMPLE 7: Using k -means clustering for segmentation
Figure 8.46(a) shows an image, and Figure 8.46(b) the segmentation produced by the k-means algorithm with k = 3; the algorithm extracted all of the image's meaningful regions with high accuracy. Compare, for instance, the quality of the characters in both images. It is important to realize that the entire segmentation was carried out by clustering a single variable (intensity). Because k-means operates on vector observations in general, its ability to discriminate between regions increases as the number of components of the observation vector z increases.
FIGURE 8.46 (a) Image of size pixels. (b) Image segmented using the k -means
algorithm with k=3
(Reference from "Digital Image Processing fourth edition
by Rafael C. Gonzalez and Richard E. Woods)
Region Segmentation using Superpixels
A superpixel is a group of pixels that share certain characteristics (such as pixel intensity). Superpixels are becoming increasingly useful in a variety of computer vision and image processing algorithms, such as image segmentation, semantic labelling, and object detection and tracking, because they carry more information than individual pixels.
Because the pixels belonging to a given superpixel share similar visual properties, superpixels have a perceptual meaning.
They provide a convenient and compact representation of images that can be very useful for computationally demanding problems.
The idea behind superpixels is to replace the standard pixel grid with primitive regions that are more perceptually meaningful than individual pixels. The objective is to reduce the computational load and to improve the performance of segmentation algorithms by removing superfluous detail. A simple illustration will help explain the basic idea of superpixel representations. Figure 8.47(a) shows a 480,000-pixel image with varying degrees of detail that could be described verbally as follows: "This is an image of two large carved figures in the foreground and at least three much smaller carved figures resting on a fence behind the large figures. The figures are on a beach, against a background
of ocean and sky." Figure 8.47(b) shows the same image defined by 4,000 superpixels and their boundaries, while Figure 8.47(c) shows the superpixel image itself. One could argue that the level of detail in the superpixel image leads to the same description as the original, yet the superpixel representation contains only 4,000 primitive units, compared with the original's 480,000 pixels. Whether the superpixel representation is "adequate" depends on the application. If the objective is to describe the image at the level of detail mentioned above, then it is. On the other hand, if the objective is to detect defects at pixel-level resolution, the answer is obviously no. There are applications, such as computerised medical diagnosis, in which any form of approximate representation is unacceptable. Nonetheless, there are numerous application areas, such as image database queries, autonomous navigation, and certain branches of robotics, where the savings in computation and the potential performance improvements far outweigh any discernible loss of image detail.
FIGURE 8.47 (a) Image of size 480,000 pixels. (b) Image composed of 4,000 superpixels (the boundaries between superpixels, in white, are superimposed on the superpixel image for reference; the boundaries are not part of the data). (c) Superpixel image. (Original image courtesy of the U.S. National Park Services.)
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Boundaries are one of the most important aspects of any superpixel representation; that is, superpixel images must preserve the boundaries between regions of interest. As Fig. 8.47(c) shows, this is the case here. Note, for example, how clearly the figures are distinguishable from the background; the same is true of the boundaries between ocean and sky and between the beach and the ocean. Other desirable features include the preservation of topological properties and, of course, computational efficiency. This section's superpixel algorithm meets these criteria, reducing storage and computation time while keeping the difference between a superpixel image and its parent image small. A similar image is shown in Fig. 8.48(a), while Fig. 8.48(b) is a superpixel image composed of 40,000 superpixels. The only visual difference between these images is a slight difference in contrast caused by averaging the intensities within each superpixel region (we will discuss this in more detail later in this section). Figure 8.48(c) shows the difference between the two: an overall difference in contrast, as well as minor differences around sharp edges. Remember that the superpixel image has far fewer elements than the original, and that contrast differences can easily be corrected using histogram processing.
FIGURE 8.48 (a) Original image. (b) Image composed of 40,000 superpixels. (c) Difference between (a) and (b).
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Finally, we show the results of reducing the number of superpixels to 1,000, 500, and 250. Compared with Fig. 8.47(a), the images in Fig. 8.49 show a significant loss of detail, but the first two still contain most of the detail relevant to the image description. Two of the three small carvings on the back fence were lost, as they also were in the 250-element superpixel image. However, the principal region boundaries and the basic topology of the images were preserved.
FIGURE 8.49 Top row: Results of using 1,000, 500, and 250 superpixels in the representation of Fig. 8.47(a). As before, the boundaries between superpixels are superimposed on the images for reference. Bottom row: Superpixel images.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
SLIC Superpixel Algorithm
Simple Linear Iterative Clustering (SLIC) is a state-of-the-art algorithm for generating superpixels that is computationally efficient. In a nutshell, the algorithm clusters pixels in a combined five-dimensional colour and image-plane space to efficiently generate compact, nearly uniform superpixels.
The approach is quite straightforward. SLIC performs a local clustering of pixels in the five-dimensional space defined by the L, a, and b values of the CIELAB colour space and the pixels' x, y coordinates. It uses a distance measure that produces compact and regular superpixel shapes, and it can be applied to both grayscale and colour images.
SLIC creates superpixels by clustering pixels according to their colour similarity and their proximity in the image plane. Clustering is performed in the five-dimensional [labxy] space. The CIELAB colour space is perceptually uniform for small colour distances, but using the Euclidean distance alone in this 5-D space is not adequate, so the authors introduced a new distance measure that takes superpixel size into account.
Distance Measure
SLIC takes as input a desired number K of approximately equal-sized superpixels. For an image with N pixels, each superpixel will then contain approximately N/K pixels, and for equally sized superpixels there will be a superpixel centre at every grid interval $S = \sqrt{N/K}$.
K cluster centres $C_k = [l_k, a_k, b_k, x_k, y_k]^T$, with k = 1, ..., K, are chosen at regular grid intervals S. Since the spatial extent of any cluster is approximately $S^2$, it can be assumed that the pixels associated with a cluster lie within a 2S × 2S area around the superpixel centre in the xy plane.
For small distances in CIELAB colorspace, Euclidean distances are meaningful.
When spatial pixel distances exceed this perceptual colour distance threshold,
spatial pixel distances begin to outweigh pixel colour similarity.
The distance measure $D_s$ is defined as follows:

$d_{lab} = \sqrt{(l_k - l_i)^2 + (a_k - a_i)^2 + (b_k - b_i)^2}$

$d_{xy} = \sqrt{(x_k - x_i)^2 + (y_k - y_i)^2}$

$D_s = d_{lab} + \dfrac{m}{S}\, d_{xy}$
$D_s$ is the sum of the lab distance and the xy-plane distance normalised by the grid interval S. The variable m in $D_s$ allows us to control the compactness of the superpixels: the larger the value of m, the more weight is given to spatial proximity and the more compact the cluster becomes. This value can be in the range [1, 20]; the authors of the algorithm chose m = 10.
Algorithm
The algorithm starts by sampling K regularly spaced cluster centres and moving them to seed locations corresponding to the lowest gradient position in a small neighbourhood. This avoids placing them on an edge and reduces the chance of seeding on a noisy pixel. The image gradients are computed as

$G(x, y) = \lVert I(x+1, y) - I(x-1, y) \rVert^2 + \lVert I(x, y+1) - I(x, y-1) \rVert^2$

where I(x, y) is the lab vector corresponding to the pixel at position (x, y) and $\lVert \cdot \rVert$ is the L2 norm. This takes both colour and intensity information into account.
Each pixel in the image is associated with the nearest cluster centre whose search area overlaps that pixel. After all pixels have been associated with the nearest cluster centre, a new centre is computed as the average labxy vector of all the pixels belonging to the cluster.
At the end of this process a few stray labels may remain, i.e., pixels adjacent to a larger segment carrying the same label but not connected to it. The final step of the algorithm therefore enforces connectivity by relabelling disjoint segments with the labels of the largest neighbouring cluster.
The SLIC algorithm can be summarized as follows.
Algorithm 1: Efficient superpixel segmentation
1. Initialize cluster centers $C_k = [l_k, a_k, b_k, x_k, y_k]^T$ by sampling pixels at regular grid steps S.
2. Perturb cluster centers within an n × n neighborhood to the lowest gradient position.
3. repeat
4.   for each cluster center $C_k$ do
5.     Assign the best matching pixels from a 2S × 2S square neighborhood around the cluster center according to the distance measure.
6.   end for
7.   Compute new cluster centers and the residual error E (L1 distance between previous centers and recomputed centers).
8. until E < threshold
9. Enforce connectivity.
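In practice SLIC is available in common libraries. The scikit-image sketch below is one way to reproduce a result such as Fig. 8.50 (K = 100, m = 20); the input file name is a placeholder, and this is library usage rather than the authors' reference code.

from skimage import io
from skimage.segmentation import slic, mark_boundaries
from skimage.color import label2rgb

image = io.imread("input.jpg")                        # hypothetical input image
labels = slic(image, n_segments=100, compactness=20)  # n_segments ~ K, compactness ~ m

overlay = mark_boundaries(image, labels)                  # superpixel boundaries for display
superpixel_img = label2rgb(labels, image, kind="avg")     # each superpixel -> its mean colour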
FIGURE 8.50 results of implementation (with K = 100 and m = 20)
[Image reference: https://darshita1405.medium.com/superpixels-and-slic-6b2d8a6e4f08]
8.5 Region Segmentation Using Graph Cuts
This section discusses a technique for dividing an image into regions that involves representing the image's pixels as the nodes of a graph and then determining the best partition (cut) of the graph into groups of nodes. Optimality is defined by criteria that have a high value for members of a group (i.e., a region) and a low value for members of other groups. As demonstrated later in this section, graph-cut segmentation is capable of generating results that are sometimes superior to those obtained using any of the previously studied segmentation methods. The cost of this potential benefit is increased implementation complexity, which generally results in slower execution.
Images as Graphs
A graph G is a mathematical structure consisting of a set V of nodes and an associated set E of edges connecting those nodes (nodes are also called vertices, and edges are also called links):

G = (V, E)

where V is a set and

$E \subseteq V \times V$

is a set of ordered pairs of elements from V. If $(u, v) \in E$ implies $(v, u) \in E$, the graph is said to be undirected; otherwise it is directed. For instance, we can think of a street map as a graph in which the nodes are street intersections and the edges are the streets connecting those intersections. If all the streets are bidirectional (meaning that we can travel in both directions between any two connected intersections), the graph is undirected; if at least one street is one-way, the graph is directed.
We are interested in undirected graphs whose edges are further characterized by a matrix W, whose element w(i, j) is the weight associated with the edge connecting nodes i and j. Because the graph is undirected, w(i, j) = w(j, i), meaning that W is a symmetric matrix. The weights are chosen to be proportional to one or more measures of similarity between pairs of nodes. A graph whose edges are associated with weights is called a weighted graph.
The purpose of this section is to represent an image to be segmented as a weighted, undirected graph, with the nodes representing the image's pixels and the edges connecting pairs of nodes. The weight of each edge, w(i, j), is proportional to the similarity between nodes i and j. We then seek to partition the graph's nodes into disjoint subsets $V_1, V_2, \ldots, V_K$, where the similarity between nodes within a subset is high and the similarity between nodes in different subsets is low. The nodes of the partitioned subsets correspond to the regions of the segmented image. Superpixels are also well suited for use as graph nodes, so when we refer to "pixels" in this section, we are implicitly including superpixels.
The set V is partitioned into subsets by cutting the graph. A graph cut is a partition of V into two subsets A and B such that

$A \cup B = V \quad \text{and} \quad A \cap B = \varnothing$

where the cut is performed by removing the edges connecting subgraphs A and B. There are two key aspects of using graph cuts for image segmentation: (1) how to associate a graph with an image, and (2) how to cut the graph in a way that makes sense in terms of partitioning the image into background and foreground (object) pixels.
The simplified example in Figure 8.51 shows how to generate a graph from an image. The nodes of the graph correspond to the pixels of the image and, to simplify the explanation, edges are allowed only between adjacent pixels using 4-connectivity, which means that there are no diagonal edges linking the pixels. In general, however, edges can be specified between every pair of pixels. Edge weights are typically computed using spatial relationships (for instance, the distance from the vertex pixel) and intensity measures (for example, texture and colour) that are consistent with pixel similarity. In this simple example, the degree of similarity between two pixels is defined as the inverse of the difference of their intensities; that is, for two nodes (pixels) $n_i$ and $n_j$, the weight of the edge between them is

$w(i, j) = \dfrac{1}{\lvert I(n_i) - I(n_j) \rvert + c}$

where $I(n_i)$ and $I(n_j)$ are the intensities of the two nodes (pixels), and c is a constant included to prevent division by zero. Thus, the closer the intensity values of adjacent pixels are, the larger the value of w.
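A small NumPy sketch of this construction is given below: it builds the symmetric weight matrix W of a 4-connected image graph using w(i, j) = 1 / (|I(n_i) - I(n_j)| + c). The dense matrix is only practical for tiny images and is meant purely as an illustration.

import numpy as np

def image_to_weights(img, c=1.0):
    rows, cols = img.shape
    n = rows * cols
    W = np.zeros((n, n), dtype=np.float64)
    I = img.astype(np.float64)
    node = lambda r, col: r * cols + col            # node index of pixel (r, col)
    for r in range(rows):
        for col in range(cols):
            for dr, dc in ((0, 1), (1, 0)):         # right and down neighbours (4-connectivity)
                rr, cc = r + dr, col + dc
                if rr < rows and cc < cols:
                    w = 1.0 / (abs(I[r, col] - I[rr, cc]) + c)
                    W[node(r, col), node(rr, cc)] = w
                    W[node(rr, cc), node(r, col)] = w   # undirected graph: W is symmetric
    return W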
FIGURE 8.51 (a) A 3 × 3 image. (b) A corresponding graph. (c) Graph cut. (d) Segmented image.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
While the general form in Fig. 8.51 is what we want to concentrate on in this
section, we also provide another, more common approach for developing image
graphs just for completeness. The same graph as shown in Figure 8.52 is shown
here, however, two additional nodes, the source and sink ter minal nodes, are
presented in addition to the previous, called here the "source" and "sink" terminal
nodes, respectively, all of which are connected via unidirectional links called t -
links.
The terminal nodes are not part of the image; rather, they simply serve to associate each pixel with one of two possible states, background or foreground (object or not).
The t-links carry probability weights: in Figures 8.52(c) and (d), the thickness of each t-link is proportional to the probability that the graph node it is connected to is a foreground or
background pixel (the thicknesses shown were chosen so that the segmentation result is the same as in Fig. 8.51). It is up to the designer to decide which of the two terminal nodes is called "background" and which "foreground".
MINIMUM GRAPH CUTS
After expressing an image as a graph, the graph is cut into two or more subgraphs. The nodes (pixels) of each resulting subgraph correspond to a region in the segmented image. As Fig. 8.52 suggests, these methods depend on analysing the graph as a flow network (of pipes, for instance) and determining what is generally referred to as a minimum graph cut. This methodology is based on the Max-Flow, Min-Cut Theorem, which states that the maximum volume of flow moving from source to sink equals the minimum cut in a flow network. The minimum cut is defined as the smallest total weight of the edges that, if removed, would disconnect the sink from the source:
cut(A, B) = Σ_{u∈A, v∈B} w(u, v)
FIGURE 8.52 (a) Same image as in Fig. 8.51(a). (b) Corresponding graph and terminal nodes. (c) Graph cut. (d) Segmented image.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
While the min-cut method is elegant, it can produce partitions that favour cutting off small groups of isolated nodes in a graph, resulting in incorrect segmentation. In Figure 8.53, the two regions of interest are indicated by the tightness of the pixel groupings. Meaningful edge weights would be inversely proportional to the distance between two points. However, this results in smaller weights for isolated points, and therefore in min cuts such as the one shown in Fig. 8.53.
FIGURE 8.53 An example showing how a min cut can lead to a meaningless
segmentation. In this example, the similarity between pixels is defined as their
spatial proximity, which results in two distinct regions
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
The normalized cut algorithm computes the cost of the cut as a fraction of the total edge connections to all the nodes in the graph:

Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)
where cut(A, B) is as defined above, and

assoc(A, V) = Σ_{u∈A, z∈V} w(u, z)

is the weighted sum of all edges connecting nodes in subgraph A to the nodes of the entire graph. Similarly,

assoc(B, V) = Σ_{v∈B, z∈V} w(v, z)
is the weighted sum of all edges connecting nodes in subgraph B to the nodes of the entire graph. Note that assoc(A, V) measures how strongly A is connected to the graph as a whole, and assoc(B, V) does the same for B, so a cut that isolates only a few nodes is penalized.
Using the same building blocks, we can also define a measure of the total normalized association within the graph partitions:
Nassoc(A, B) = assoc(A, A)/assoc(A, V) + assoc(B, B)/assoc(B, V)
where assoc(A, A) and assoc(B, B) are the total weights of the edges connecting nodes within A and within B, respectively. It is straightforward to show that

Ncut(A, B) = 2 − Nassoc(A, B)

which indicates that minimizing Ncut(A, B) simultaneously maximizes Nassoc(A, B).
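In practice, normalized cuts are rarely coded from scratch. The sketch below uses scikit-image, combining SLIC superpixels (the graph nodes, as discussed earlier) with a region adjacency graph and a normalized-cut partition; the parameter values are arbitrary choices, and the module that hosts the graph functions differs between library versions.

from skimage import data, segmentation, color
try:
    from skimage.graph import rag_mean_color, cut_normalized        # newer releases
except ImportError:
    from skimage.future.graph import rag_mean_color, cut_normalized  # older releases

img = data.coffee()

# Superpixels serve as the graph nodes.
labels = segmentation.slic(img, n_segments=400, compactness=30, start_label=1)

# Region adjacency graph with similarity weights, then the normalized cut.
rag = rag_mean_color(img, labels, mode='similarity')
labels_ncut = cut_normalized(labels, rag)

# Paint every resulting region with its mean colour for display.
out = color.label2rgb(labels_ncut, img, kind='avg')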
8.6 Segmentation Using Morphological Watersheds
Previously, we discussed segmentation using three primary concepts: edge detection, thresholding, and region extraction. Each of these methods was found to have advantages (for example, global thresholding is fast) and disadvantages (for example, the need for post-processing, such as edge linking, in edge-based segmentation). In this section we discuss a method based on the concept of so-called morphological watersheds. Segmentation by watersheds incorporates several of the concepts from the other three methods and thus frequently produces more stable segmentation results, including connected segmentation boundaries. Additionally, this approach provides a straightforward framework for incorporating knowledge-based constraints into the segmentation process.
Watersheds and catchment basins are well-known concepts in topography: individual catchment basins are divided by watershed lines. Watershed segmentation is a widely used segmentation technique that originated in the field of mathematical morphology. It is a powerful and fast technique that can be used for contour detection as well as region-based segmentation. Watershed segmentation relies on ridges for proper segmentation, a property that is frequently satisfied in contour detection, where the objects' boundaries are expressed as ridges. By computing an image's edge map, it is possible to convert the edges of objects into ridges for region-based segmentation. In practice, watershed segmentation is typically carried out as region growing guided by a set of markers in order to avoid severe over-segmentation.
The watershed transform was proposed in mathematical morphology as a technique for image segmentation. For the watershed transformation, the gradient magnitude of an image is treated as a topographic surface. The complete division of an image via the watershed transform depends largely on an accurate estimate of the image gradients: background noise degrades the result and leads to over-segmentation, while low-contrast edges generate small gradient magnitudes and can cause distinct regions to be erroneously merged (under-segmentation). Watershed lines can be located in a variety of ways, and different segmentation approaches based on the watershed principle are possible. The image data can be interpreted as a topographic surface whose altitudes are the grey levels of the gradient image. Region boundaries then correspond to high watershed lines, while the interiors of low-gradient regions correspond to catchment basins.
The catchment basins of the topographic surface are homogeneous in the sense that
all pixels belonging to the same catchment basin are connected to the basin's region
of minimum altitude (gray -level) via a simple path of pixels with monotonically
decreasing altitude (gray -level) along the path. These catchment basins then
represent the segments of the segmented image.
FIGURE 8.54 (a) Gray-level profile of image data. (b) Watershed segmentation: local minima of the gray level yield catchment basins; local maxima define the watershed lines.
There are two fundamental approaches to watershed segmentation. The first begins by determining, for each image pixel, a downstream path to a local minimum of the image surface altitude. A catchment basin is then defined as the set of pixels whose downstream paths all terminate at the same altitude minimum. Whereas downstream paths can easily be determined for continuous altitude surfaces by following the local gradients, no rules define the downstream paths uniquely for digital surfaces.
The second approach is nearly identical to the first; rather than identifying
downstream paths, catchment basins are filled from the bottom up. Assume that
each local minimum has a hole in it and that the topographic surface is submerged
in water; water begins to fill all catchment basins, the minima of which are below
the water level. If two catchment basins merge as a result of further immersion, a
dam is built to the highest surface altitude and serves as a watershed boundary.
An efficient algorithm begins by sorting the pixels in ascending order of their grey
values, followed by a flooding step that involves a fast breadth -first scan of all
pixels in ascending order of their grey levels.
A brightness histogram is computed during the sorting step. Simultaneously, a list of pointers to the pixels of gray level h is created and associated with each histogram gray level, allowing direct access to all pixels of any gray level. The flooding step makes extensive use of this sorting information.
Assume that the flooding has reached a level (gray level, altitude) k. Then every pixel with a gray value less than or equal to k has already been assigned a unique catchment-basin label. Next, the pixels with gray level k + 1 must be processed. A pixel with gray level k + 1 may belong to a labelled catchment basin if at least one of its neighbours already carries that label.
FIGURE 8.55 A geodesic influence zone of a catchment basin
All pixels with a grey level of k+1 that are within a catchment basin's influence
zone are labelled l, indicating that the catchment basin is growing. The queued
pixels are processed sequentially, and any pixels that cannot be assigned to an
existing label represent newly discovered catchment basins and are assigned new
and unique labels.
Watershed -flooding analogy
Assume that the grayscale image is a landscape. Allow water to rise from the
bottom of each valley, i.e., give each valley's water its own label. Construct a dam
or a watershed as soon as the water from two valleys converges. These watersheds
will define the boundaries between the image's various regions. The watershed
effect can be applied directly to an image, to an edge -enhanced image, or to a
distance -transformed image.
Watershed -drop of water analogy
The grayscale image is again viewed as a landscape. A drop of water that lands anywhere in the landscape will flow down to a local minimum. For each local minimum in the landscape, there is a collection of points, referred to as the catchment basin, from which a drop of water will flow to that particular minimum.
The watershed is defined by the boundaries between adjacent catchment basins.
Following are some examples of watersheds directly applied to grey level images.
FIGURE 8.56 watershed directly applied on gray level image
FIGURE 8.57 watershed directly applied on gray level image
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Example for seeded watershed
Each cell contains a nucleus, which can be distinguished using threshold and
watershed segmentation. Using these nuclei as seeds, it is simple to locate the
cytoplasm.
FIGURE 8.58 seeded watershed
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
Marker Based Watershed Segmentation
Two kinds of markers are used. The first is an internal marker, the second an external marker. Internal markers are used to restrict the number of regions by specifying the objects of interest; for example, seeds can be assigned to regions manually or automatically. Merging of regions that contain no marker is permitted (no dam is built there). External markers are pixels that we are confident belong to the background. Watershed lines are commonly used as external markers, since they denote the boundaries between regions: the internal markers are first used to obtain the watershed lines of the gradient of the image to be segmented, and these watershed lines are then used as the external markers.
Each externally delimited region contains a single internal marker and a portion of the background. The problem is thereby simplified to partitioning each such region into two parts: an object (containing the internal marker) and the background (containing the external marker). The following figure illustrates marker-based watershed segmentation.
FIGURE 8.59 Marker Based Watershed Segmentation
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
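A minimal marker-controlled watershed along these lines can be sketched with scikit-image (the watershed function lives in skimage.segmentation in recent releases, in skimage.morphology in older ones). The function name marker_watershed and the two thresholds are assumptions for illustration: pixels clearly darker than `low` are taken as internal (object) markers, pixels brighter than `high` as external (background) markers, and the gradient magnitude is flooded.

import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

def marker_watershed(gray, low, high):
    # Flooding surface: gradient magnitude of the image.
    gradient = sobel(gray.astype(float))
    # Markers: 1 = background (external), 2 = object (internal); 0 = unknown.
    markers = np.zeros_like(gray, dtype=np.int32)
    markers[gray > high] = 1
    markers[gray < low] = 2
    labels = watershed(gradient, markers)
    return labels == 2          # boolean mask of the object regions

# Hypothetical usage on a grayscale image with dark objects on a bright background:
# mask = marker_watershed(gray, low=60, high=180)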
8.7 Use of Motion in Segmentation
Motion is a powerful cue, used in applications such as robotics, autonomous navigation, and dynamic scene analysis, for extracting objects of interest from a background of irrelevant detail. The motion arises from the relative displacement between the sensing system and the scene being viewed. There are two distinct approaches to using motion for segmentation:
1. Spatial domain approach
2. Frequency domain approach
In the following section, we will discuss spatial domain techniques.
Spatial domain techniques
Consider the following situation: a number of image frames are available, each taken at a different point in time. Let f(x, y, t1) and f(x, y, t2) denote two images taken at times t1 and t2. Assume further that the scene contains many stationary objects and only one moving object. Under such circumstances the boundary of the moving object can be detected. The procedure is to form the difference image between images taken at different points in time: the image taken at the initial instant is used as a reference image, and subsequent images are subtracted from it, yielding an image that contains only the boundary of the moving object, with all stationary objects cancelled out.
FIGURE 8.60 (a) Image of a cheque (b ) Histogram of the image
(c) Segmented image
[Image reference: Fundamentals of Digital Image Processing by S. Annadurai, Pearson publication]
The difference image between two images taken at times ti and tj may be defined as

d_ij(x, y) = 1 if |f(x, y, ti) − f(x, y, tj)| > T, and 0 otherwise

where T denotes a nonnegative threshold. Note that d_ij(x, y) has a value of 1 at spatial coordinates (x, y) only if the intensity difference between the two images is significant at those coordinates, as determined by T. The coordinates (x, y) span the dimensions of the two images, so the difference image is the same size as the images in the sequence.
All pixels with value 1 in d_ij(x, y) are considered to be the result of object motion. This technique is applicable only if the two images are spatially registered and the illumination is relatively constant within the limits established by T. In practice, noise can also produce 1-valued entries in d_ij(x, y). Typically, these entries are isolated points in the difference image, and they can be removed by forming 4- or 8-connected regions of 1's in d_ij(x, y) and ignoring any region with fewer than a predefined number of elements. Although this may cause small and/or slow-moving objects to be missed, it increases the likelihood that the remaining entries in the difference image are due to motion rather than noise.
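A minimal sketch of this difference-and-clean-up procedure, assuming two registered 8-bit frames f1 and f2 and using SciPy connected-component labelling for the small-region removal (the threshold T and the minimum region size are arbitrary illustrative values):

import numpy as np
from scipy import ndimage as ndi

def difference_image(f1, f2, T=25, min_size=20):
    # Thresholded absolute difference between two registered frames.
    d = (np.abs(f1.astype(int) - f2.astype(int)) > T).astype(np.uint8)
    # Label 8-connected regions of 1's and drop those smaller than min_size,
    # which are assumed to be noise rather than motion.
    labels, n = ndi.label(d, structure=np.ones((3, 3)))
    sizes = ndi.sum(d, labels, index=np.arange(1, n + 1))
    keep_labels = np.nonzero(sizes >= min_size)[0] + 1
    return np.isin(labels, keep_labels).astype(np.uint8)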
Accumulative Differences
Consider a sequence of image frames taken at times t1, t2, t3, …, tn and denoted by f(x, y, t1), f(x, y, t2), …, f(x, y, tn). Assume that the reference image is the first frame, f(x, y, t1). An accumulative difference image (ADI) is obtained by comparing the reference image with every subsequent image in the sequence. The counter at each pixel location of the accumulative image is incremented whenever a difference between the reference and the current image occurs at that location. Thus, when the kth frame is compared with the reference, the entry at a given pixel of the accumulative image indicates the number of times the gray level at that position differed from the corresponding pixel value of the reference image. The differences are computed using the equation given above. Figure 8.60 illustrates these concepts.
Assuming that the intensity values of the moving objects are greater than those of the background, three types of ADIs are considered. To simplify the notation, let R(x, y) denote the reference image and let k denote tk, so that f(x, y, k) = f(x, y, tk). We begin by assuming that R(x, y) = f(x, y, 1). Then, for any k > 1, and
remembering that the ADI values are counts, we define the following accumulative differences for all relevant values of (x, y):

A_k(x, y) = A_{k−1}(x, y) + 1 if |R(x, y) − f(x, y, k)| > T, and A_{k−1}(x, y) otherwise

P_k(x, y) = P_{k−1}(x, y) + 1 if [R(x, y) − f(x, y, k)] > T, and P_{k−1}(x, y) otherwise

N_k(x, y) = N_{k−1}(x, y) + 1 if [R(x, y) − f(x, y, k)] < −T, and N_{k−1}(x, y) otherwise
where A_k(x, y), P_k(x, y), and N_k(x, y) denote the absolute, positive, and negative ADIs computed with the kth image in the sequence, respectively. Each of the three ADIs starts with zero counts and is the same size as the images in the sequence. If the intensity values of the background pixels are greater than those of the moving objects, the order of the inequalities and the signs of the thresholds in the above equations are reversed.
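The three ADI recurrences translate directly into code. The sketch below assumes a list of grayscale frames whose first element is the reference image R(x, y); the threshold T is an illustrative value.

import numpy as np

def accumulative_differences(frames, T=25):
    R = frames[0].astype(int)                 # reference image R(x, y)
    A = np.zeros(R.shape, dtype=int)          # absolute ADI
    P = np.zeros(R.shape, dtype=int)          # positive ADI
    N = np.zeros(R.shape, dtype=int)          # negative ADI
    for f in frames[1:]:
        diff = R - f.astype(int)
        A += np.abs(diff) > T                 # count |R - f| > T
        P += diff > T                         # count (R - f) > T
        N += diff < -T                        # count (R - f) < -T
    return A, P, N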
EXAMPLE: Computation of the absolute, positive, and negative accumulative difference images.
The three ADIs are displayed as intensity images in Figure 8.61 for a rectangular object of size 75 × 50 pixels moving in a southeasterly direction at a speed of 5√2 pixels per frame. The images are of size 256 × 256 pixels. We note the following: (1) the nonzero area of the positive ADI equals the size of the moving object; (2) the location of the positive ADI corresponds to the location of the moving object in the reference frame; (3) the counts in the positive ADI stop increasing when the moving object is completely displaced from the same object in the reference frame; (4) the absolute ADI contains both the positive and the negative ADI regions; and (5) the direction and speed of the moving object can be determined from the entries in the absolute and negative ADIs.
FIGURE 8.61 ADIs of a rectangular object moving in a southeasterly direction.
(a) Absolute ADI. (b) Positive ADI. (c) Negative ADI.
(Reference from "Digital Image Processing fourth edition by
Rafael C. Gonzalez and Richard E. Woods)
8.8 Summary
We have studied that image segmentation can be performed using three different approaches: region-based, boundary-based, and edge-based methods such as point detection, line detection, and edge detection. We have seen different edge detection operators such as Prewitt, Sobel, Laplacian, LoG, and Canny, along with the basic concept of intensity thresholding and the application of Otsu's binarization.
In region growing, adjacent pixels are examined and assigned to a region class if no edges are detected between them. Top-down region splitting divides an image into smaller portions that are more uniform than the image as a whole.
The goal of clustering is to identify patterns in a data set by grouping the data into distinct clusters, such that members of the same cluster are more similar to one another than to members of other clusters.
A histogram of the image can be used as a tool to determine the threshold value for thresholding a given image.
8.9 Unit End questions
1. What are the derivative operators useful in image segmentation? Explain their role in segmentation.
2. What is thresholding? Explain global thresholding.
3. Explain the basic adaptive thresholding process used in image segmentation.
4. Explain in detail threshold selection based on boundary characteristics.
5. Explain region-based segmentation.
6. What are the derivative operators useful in image segmentation? Explain their role in segmentation.
7. Explain global processing via the Hough Transform for edge linking.
8. Explain global processing via graph-theoretic techniques for edge linking.
9. Explain Region Splitting and Merging with an example.
10. Write about edge detection.
11. Explain local processing for edge linking.
12. Write a short note on Region Growing.
13. Write the mask for prewitt operator
14. Write the mask for sobel operator
15. Write the mask for laplacian operator
8.10 Reference for further reading
1. Gonzalez, R. C. and Woods, R. E., Digital Image Processing, 4th ed., Pearson/Prentice Hall, 2018.
2. Jain, A. K., Fundamentals of Digital Image Processing, PHI.
3. Russ, J. C., The Image Processing Handbook, 5th ed., CRC Press, 2010.
4. All images courtesy of Digital Image Processing by Gonzalez and Woods, 4th Edition.
Unit 5
9 IMAGE SEGMENTATION II
Unit Structure
9.0 Objectives
9.1 Introduction
9.2 Image Segmentation using Snakes
9.3 Segmentation using Level Sets
9.4 Summary
9.5 Review questions
9.6 References
9.0 Objectives
1. To understand the process of segregating the foreground and background of
images.
2. To understand the working of image segmentation using contours and snakes.
9.1 Introduction
Image segmentation is a process of segregating the foreground and background of
images. Basically, Canny edge detection algorithms are used in most applications
to detect edges. However, in many applications like medical image processing and
imagery applications, instead of basic algorithms active contours known as snakes
are used. This chapter focuses on Image segmentation using snakes and level sets.
9.2 Image Segmentation using Snakes
In most image segmentation methods, connectivity and homogeneity are defined on the image data alone. In medical image analysis, however, the desired segmentation also depends on anatomy, and it is usually difficult to obtain a good segmentation from image data alone; prior knowledge of the problem is essential. Active contours are used for this kind of image segmentation.
In an active contour framework, object segmentation is achieved by evolving a closed contour toward the object's boundary. The snake is "active" because it continuously evolves so as to minimize its energy.
The energy function of a snake has two parts, internal and external energy:
Esnake = Einternal + Eexternal
The internal energy depends on intrinsic properties of the curve, such as its length and curvature. The external energy depends on factors such as image structure and any constraints the user has imposed. A snake starts as a closed curve and deforms by minimizing the total energy function until it reaches an optimal state. The main advantage of active contours is that they produce closed, coherent regions with smooth boundaries.
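As a concrete illustration (not taken from the text), scikit-image provides a classical snake through skimage.segmentation.active_contour. The sketch below initializes a circular contour on the standard astronaut test image and lets it converge; the contour placement, smoothing level, and the alpha/beta/gamma values are illustrative choices, and the coordinate convention of the initial snake differs slightly between library versions.

import numpy as np
from skimage import data, color, filters, segmentation

img = color.rgb2gray(data.astronaut())

# Initial closed contour: a circle around the head (row, col coordinates).
s = np.linspace(0, 2 * np.pi, 400)
init = np.array([100 + 100 * np.sin(s), 220 + 100 * np.cos(s)]).T

snake = segmentation.active_contour(
    filters.gaussian(img, sigma=3),   # external energy from the smoothed image
    init,
    alpha=0.015,                      # elasticity (length term of E_internal)
    beta=10,                          # rigidity (curvature term of E_internal)
    gamma=0.001)                      # time-step parameter
print(snake.shape)                    # (400, 2) array of final contour points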
Fig 9.1 Active contour segmentation is semi-automatic, as it requires the user to mark the initial contour. Edge-based and region-based active contours: a) initial contour, b) and c) intermediate contours, d) final contour
Fig 9.2 A snake is a parametric curve
A higher-level process or the user initializes a curve close to the object boundary. The snake then deforms and moves towards the desired object boundary until, in the end, it completely shrink-wraps around the object. Deformable models are curves or surfaces defined within an image domain that can move under the influence of
internal forces, which are defined by the curve itself, and external forces, which are computed from the image data.
Problems with image segmentation using snakes
1. Snakes may over smooth the boundary
2. Initialization is crucial
3. Depends on number and spacing of control points
4. It may not follow topological changes of objects.
9.3 Image Segmentation using Level Sets
A level set is an alternative representation of a closed contour. Level-set methods fit and track objects of interest by modifying an embedding function, whose zero level set is the contour, instead of evolving a parametric curve directly. Because the contour is carried implicitly, changes of topology (splitting and merging) are handled naturally.
Fig 9.3 Ultrasound image segmentation without reinitialization
using Level Set evolution
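As a rough illustration of level-set-style segmentation, scikit-image implements a morphological variant of the Chan-Vese region-based level-set model. The iteration count, smoothing value, and checkerboard initialization below are illustrative assumptions, and the keyword name of the iteration argument differs across library versions, so it is passed positionally.

from skimage import data, segmentation, util

img = util.img_as_float(data.camera())    # grayscale test image

# Evolve the embedding function for 100 iterations from a checkerboard
# initial level set; the result is a binary segmentation mask.
level_set = segmentation.morphological_chan_vese(img, 100,
                                                 init_level_set='checkerboard',
                                                 smoothing=3)
print(level_set.shape, level_set.dtype)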
9.4 Summary
In this chapter, we learned how image segmentation can be performed using active contours (snakes) and level sets.
9.5 Questions
1. What are the disadvantages of image segmentation classical methods?
2. Explain image segmentation using active contours.
3. What are level sets in image segmentation? Explain.
9.6 References
1. https://www.cs.cmu.edu/~galeotti/methods_course/Segmentation2-Snakes.pdf
2. Medical Image Processing by G.R. Sinha and Bhagwati Charan Patel, PHI publications
3. http://home.iitj.ac.in/~manpreet.bedi/btp/documents/sar.pdf
4. https://www.slideshare.net/UlaBac/lec11-active-contour-and-level-set-for-medical-image-segmentation (Reference for image Fig 9.3)
5. Credits to Mr. Ulas Bagci, Assist. Prof at UCF, Research Center for Computer Vision
6. https://www.cs.cmu.edu/~galeotti/methods_course/Level_Sets_and_Parametric_Transforms.pdf
For detailed study, use the mentioned links
Unit 5
10 FEATURE EXTRACTION
Unit Structure
10.0 Objectives
10.1 Introduction
10.2 Boundary Preprocessing
10.3 Boundary feature descriptors
10.4 Region feature descriptors
10.5 Whole-Image features
10.6 Scale-Invariant Feature Transform (SIFT)
10.7 Summary
10.8 Review questions
10.9 References
10.0 Objectives
1. Understand the concept of feature extraction and the various feature extraction
algorithms.
2. To broadly be aware of feature detection and feature description techniques.
10.1 Introduction
Feature extraction almost always follows the output of a segmentation stage, which
usually is raw pixel data, constituting either the boundary of a region (i.e., the set
of pixels separating one image region from another) or all the points in the region
itself. Feature extraction consists of feature detection and feature description.
Feature detection refers to finding the features in an image, region , or boundary.
Feature description assigns quantitative attributes to the detected features.
10.2 Boundary Preprocessing
10.2.1 Boundary Following (Tracing)
A boundary -following algorithm whose output is an ordered sequence of points.
We assume (1) that w e are working with binary images in which object and
background points are labelled 1 and 0, respectively; and (2) that images are padded munotes.in
with a border of 0’s to eliminate the possibility of an object merging with the image
border.
The following algorithm traces the boundary of a 1 -valued region, R, in a binary
image.
1. Let the starting point, b0, be the uppermost-leftmost point in the image that is labelled 1. Denote by c0 the west neighbour of b0; clearly, c0 is always a background point. Examine the 8-neighbours of b0, starting at c0 and proceeding in a clockwise direction. Let b1 denote the first neighbour encountered whose value is 1, and let c1 be the (background) point immediately preceding b1 in the sequence. Store the location of b0 for use in Step 5.
2. Let b = b0 and c = c0.
3. Let the 8-neighbours of b, starting at c and proceeding in a clockwise direction, be denoted by n1, n2, …, n8. Find the first neighbour labelled 1 and denote it by nk.
4. Let b = nk and c = nk−1.
5. Repeat Steps 3 and 4 until b = b0. The sequence of b points found when the algorithm stops is the set of ordered boundary points.
Fig 10.1 Illustration of the first few steps in the boundary-following algorithm. The
point to be processed next is labelled in bold, black; the points yet to be processed
are gray; and the points found by the algorithm are shaded. Squares without labels
are considered background (0) values.
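The boundary-following procedure above can be coded directly. The sketch below is a straightforward NumPy implementation; the function name trace_boundary and the simplified stopping rule (stop as soon as b returns to b0, which is adequate for simple blobs but can stop early on boundaries that pass through b0 more than once) are assumptions of this sketch.

import numpy as np

# 8-neighbour offsets of a pixel, in clockwise order starting at its west neighbour.
OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
           (0, 1), (1, 1), (1, 0), (1, -1)]

def trace_boundary(img):
    img = np.pad(np.asarray(img, dtype=np.uint8), 1)   # border of 0's
    rows, cols = np.nonzero(img)
    b0 = (rows[0], cols[0])            # uppermost-leftmost 1-valued point
    c = (b0[0], b0[1] - 1)             # its west neighbour (always background)
    b = b0
    boundary = [b0]
    while True:
        # Scan the 8-neighbours of b clockwise, starting just after c.
        start = OFFSETS.index((c[0] - b[0], c[1] - b[1]))
        for k in range(1, 9):
            off = OFFSETS[(start + k) % 8]
            nb = (b[0] + off[0], b[1] + off[1])
            if img[nb] == 1:           # first 1-valued neighbour found
                prev = OFFSETS[(start + k - 1) % 8]
                c = (b[0] + prev[0], b[1] + prev[1])
                b = nb
                break
        if b == b0:                    # back at the starting point: stop
            break
        boundary.append(b)
    # Undo the padding so coordinates refer to the original image.
    return [(int(r) - 1, int(col) - 1) for r, col in boundary]

# Example on a small blob:
blob = np.array([[0, 1, 1, 0, 0],
                 [0, 1, 1, 1, 0],
                 [0, 0, 1, 1, 0],
                 [0, 0, 0, 0, 0]])
print(trace_boundary(blob))   # ordered, clockwise boundary points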
10.2.2 Chain Codes
Chain codes are used to represent a boundary by a connected sequence of straight -
line segments of specified length and direction. A chain code representation is
based on 4 - or 8-connectivity of the segments. The direction of each segment is
coded by using a numbering scheme. A boundary code formed as a sequence of
such directional numbers is referred to as a Freeman chain code.
Digital images usually are acquired and processed in a grid format with equal
spacing in the x - and y -directions, so a chain code could be generated by following
a boundary in, say, a clockwise direction and assigning a direction to the segments connecting every pair of pixels.
Fig 1 0.2 Direction numbers for 1) 4 -directional 2) 8 -directional chain code
Fig 1 0.3 a) Digital boundary with resampling grid b) Resampled grid
c) 8-directional chain -coded boundary
Example of 4-directional chain code
Consider the given image, the 4 -directional chain code is given by
0 0 3 0 3 3 2 2 2 1 1 1
The circular first difference is computed by counting, modulo 4, the change in direction between adjacent code elements, treating the sequence as circular (the first element is differenced with the last). For the code 0 0 3 0 3 3 2 2 2 1 1 1 this gives 3 0 3 1 3 0 3 0 0 3 0 0.
Shape number – The shape number of a boundary obtained from a chain code is defined as the circular first difference of smallest magnitude, i.e., the rotation of the first-difference sequence that forms the smallest integer.
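A short sketch of these chain-code operations (the function names are illustrative); it reproduces the first difference computed above and selects the minimum-magnitude rotation as the shape number:

def first_difference(code, directions=4):
    # Circular first difference: counterclockwise direction changes between
    # adjacent elements, with the first element differenced against the last.
    return [(code[i] - code[i - 1]) % directions for i in range(len(code))]

def shape_number(code, directions=4):
    # Shape number: the rotation of the circular first difference that forms
    # the sequence of smallest magnitude.
    d = first_difference(code, directions)
    return min(d[i:] + d[:i] for i in range(len(d)))

code = [0, 0, 3, 0, 3, 3, 2, 2, 2, 1, 1, 1]
print(first_difference(code))   # [3, 0, 3, 1, 3, 0, 3, 0, 0, 3, 0, 0]
print(shape_number(code))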
10.2.3 Slope Chain codes (SCC)
Using Freeman chain codes generally requires resampling a boundary to smooth out small variations, a process that implies defining a grid and then assigning all boundary points to their closest neighbours in the grid. The SCC of a 2-D curve is obtained by placing straight-line segments of equal length around the curve, with the end points of the segments touching the curve. Obtaining an SCC requires calculating the slope changes between contiguous line segments and normalizing the changes to the continuous (open) interval (−1, 1).
Fig 1 0.4 (a) An open curve. (b) A straight -line segment . (c) Traversing the curve
using circumferences to determine slope changes; the dot is the origin (starting
point). (d) Range of slope changes in the open interval ( -1 ,1)
10.2.4 Boundary Approximation using Minimum -Perimeter Polygon
A digital boundary can be approximated with arbitrary accuracy by a polygon. For
a closed curve, the approximation becomes exact when the number of segments of
the polygon is equal to the number of points in the boundary, so each pair of
adjacent points defines a segment of the polygon. A boundary can therefore be approximated by its minimum-perimeter polygon (MPP).
MPP Algorithm
The algorithm is based on the following properties (W denotes a convex vertex of the inner wall of the bounding cellular complex, and B a mirrored concave vertex):
1. The MPP bounded by a simply connected cellular complex is not self -
intersecting.
2. Every convex vertex of the MPP is a W vertex, but not every W vertex of a
boundary is a vertex of the MPP.
3. Every mirrored concave vertex of the MPP is a B vertex, but not every B
vertex of a boundary is a vertex of the MPP.
4. All B vertices are on or outside the MPP, and all W vertices are on or inside
the MPP.
5. The uppermost -leftmost vertex in a sequence of vertices contained in a
cellular complex is always a W vertex of the MPP.
Fig 1 0.5 a) The Region b) Convex and Concave c) MPP superimposed on
concave, convex vertices
10.2.5 Signatures
A signature is a 1-D functional representation of a 2-D boundary. One common signature plots the distance from the centroid to the boundary as a function of angle. The basic idea of using signatures is to reduce the boundary representation to a 1-D function. Simplicity is the advantage; the disadvantage is that scaling of the entire function depends on only two values, the minimum and maximum.
Distance versus angle – Plot the distance from the centroid to the boundary as a function of angle; this yields the signature r(θ), 0 ≤ θ ≤ 2π.
Fig 10.6 Distance versus angle signatures
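A minimal sketch of the distance-versus-angle signature, assuming the boundary is given as an (N, 2) array of points (for example, the output of the boundary-following routine earlier); the bin count and the choice to keep the largest radius per angle bin are illustrative simplifications.

import numpy as np

def distance_angle_signature(boundary, n_bins=360):
    pts = np.asarray(boundary, dtype=float)
    centroid = pts.mean(axis=0)
    d = pts - centroid
    theta = np.arctan2(d[:, 0], d[:, 1]) % (2 * np.pi)   # angle of each boundary point
    r = np.hypot(d[:, 0], d[:, 1])                       # distance to the centroid
    # Sample r(theta) on a uniform grid of angles; empty bins stay at 0 here.
    signature = np.zeros(n_bins)
    bins = (theta / (2 * np.pi) * n_bins).astype(int) % n_bins
    np.maximum.at(signature, bins, r)                    # largest r per angle bin
    return signature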
10.2.6 Skeletons, Medial Axes and Distance transforms
The skeleton of a region is the set of points in the region that are equidistant from (i.e., have more than one closest point on) the border of the region. The skeleton is obtained using one of two principal approaches: (1) by successively thinning the region (e.g., using morphological erosion) while preserving end points and line connectivity (this is called topology-preserving thinning); or (2) by computing the medial axis of the region via an efficient implementation of the medial axis transform (MAT). The skeleton of a region is defined as its medial axis.
Fig 10.7 Medial axes of three simple regions
The distance transform of a region of foreground pixels in a background of zeros assigns to every pixel the distance to the nearest zero-valued (background) pixel.
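Both the skeleton and the distance transform are available off the shelf; the sketch below computes them for a simple synthetic blob using SciPy and scikit-image (the blob itself is an arbitrary example):

import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import skeletonize, medial_axis

blob = np.zeros((40, 60), dtype=bool)
blob[5:35, 5:55] = True
blob[15:25, 25:35] = False                # punch a hole into the rectangle

dist = ndi.distance_transform_edt(blob)   # distance to the nearest 0-valued pixel
skel = skeletonize(blob)                  # topology-preserving thinning
med, med_dist = medial_axis(blob, return_distance=True)   # MAT-based skeleton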
Fig 10.8 An image with its distance transform
10.3 Boundary feature descriptors
10.3.1 Basic Boundary descriptors
The length of a boundary is one of its simplest descriptors.
The diameter of a boundary B is defined as

Diam(B) = max_{i,j} [ D(pi, pj) ]

where D is a distance measure and pi and pj are points on the boundary. The line segment connecting the two extreme points that define the diameter is called the major axis (or longest chord) of the boundary; the value of the diameter and the orientation of this segment are useful descriptors. The length of the major axis is the distance between its two end points, and its orientation is the angle of the segment with respect to the x-axis.
The minor axis (also called the longest perpendicular chord) of a boundary is defined as the line perpendicular to the major axis.
The curvature of a boundary is defined as the rate of change of slope.
10.3.2 Fourier descriptors
Fourier descriptors represent the boundary as a sequence of coordinates, treating each coordinate pair (x(k), y(k)), k = 0, 1, …, K − 1, as the complex number s(k) = x(k) + j y(k). The discrete Fourier transform (DFT) of s(k) is

a(u) = Σ_{k=0}^{K−1} s(k) exp(−j2πuk/K)

where u = 0, 1, 2, …, K − 1. The complex coefficients a(u) are called the Fourier descriptors of the boundary.
Fig 10.9 Basic properties of Fourier descriptors
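Because a(u) is simply a DFT of the complex boundary sequence, Fourier descriptors can be computed with an FFT. The sketch below (function names are illustrative) also shows the usual reconstruction from only the lowest-frequency descriptors, which yields a smoothed version of the boundary:

import numpy as np

def fourier_descriptors(boundary):
    # Treat each boundary point (x, y) as the complex number s(k) = x + j*y.
    pts = np.asarray(boundary, dtype=float)
    s = pts[:, 0] + 1j * pts[:, 1]
    return np.fft.fft(s)

def reconstruct(descriptors, num_terms):
    # Keep only the lowest-frequency descriptors and invert the DFT.
    a = np.zeros_like(descriptors)
    half = num_terms // 2
    a[:half] = descriptors[:half]      # low positive frequencies
    a[-half:] = descriptors[-half:]    # low negative frequencies
    s = np.fft.ifft(a)
    return np.column_stack([s.real, s.imag])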
10.4 Region Feature Descriptors
10.4.1 Basic Descriptors
The area of a region is defined as the number of pixels in the region.
The perimeter of a region is the length of its boundary.
The compactness of a region is defined as the perimeter squared over the area, p²/A.
Another dimensionless measure is circularity (also called roundness), defined as

circularity = 4πA/p²

The value of this descriptor is 1 for a circle (its maximum value) and π/4 for a square.
The eccentricity of a region is defined as the eccentricity of the ellipse that has the same second central moments as the region.
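These basic region descriptors can be obtained from the region properties that scikit-image reports; the sketch below computes them for a filled rectangle (compactness and circularity are derived from the reported area and perimeter):

import numpy as np
from skimage.measure import label, regionprops

img = np.zeros((64, 64), dtype=np.uint8)
img[10:40, 10:50] = 1                       # a 30 x 40 rectangular region

props = regionprops(label(img))[0]
area, perimeter = props.area, props.perimeter
compactness = perimeter ** 2 / area         # perimeter squared over area
circularity = 4 * np.pi * area / perimeter ** 2
print(area, perimeter, compactness, circularity,
      props.eccentricity, props.euler_number)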
10.4.2 Topological Descriptors
Topology is the study of properties of a figure that are unaffected by any
deformation, provided that there is no tearing or joining of the figure (sometimes
these are called rubber -sheet distortions). Topological property useful for region
description is the number of connected components of an image. The numbe r of
holes H and connected components C in a figure can be used to define the Euler
number, E. The Euler number is also a topological property. E =C - H
Fig 10.10 (a) A region with two holes. (b) A region with
three connected components.
Fig 10.11 Regions with Euler numbers 0 and −1, respectively
Fig 10.12 A region with polygonal network
10.4.3 Central Moments
The 2-D moment of order (p + q) of an M × N digital image, f(x, y), is defined as

m_pq = Σ_x Σ_y x^p y^q f(x, y)

where p = 0, 1, 2, … and q = 0, 1, 2, … are integers.
The central moment of order (p + q) is defined as

μ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q f(x, y)

where p = 0, 1, 2, … and q = 0, 1, 2, … are integers, and x̄ = m10/m00 and ȳ = m01/m00 are the coordinates of the centroid.
The normalized central moment of order (p + q) is defined as

η_pq = μ_pq / μ00^γ

where γ = (p + q)/2 + 1 for p + q = 2, 3, ….
A set of seven 2-D moment invariants (the Hu moments) can be derived from the second- and third-order normalized central moments.
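A direct NumPy transcription of these definitions is shown below (the function names are illustrative); in practice the seven invariants are usually obtained from a library, for example OpenCV's cv2.moments followed by cv2.HuMoments.

import numpy as np

def raw_moment(f, p, q):
    # m_pq = sum over x, y of x^p * y^q * f(x, y); x indexes rows, y columns.
    x, y = np.mgrid[:f.shape[0], :f.shape[1]]
    return np.sum((x ** p) * (y ** q) * f)

def central_moment(f, p, q):
    # mu_pq, computed about the centroid (x_bar, y_bar).
    m00 = raw_moment(f, 0, 0)
    x_bar = raw_moment(f, 1, 0) / m00
    y_bar = raw_moment(f, 0, 1) / m00
    x, y = np.mgrid[:f.shape[0], :f.shape[1]]
    return np.sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * f)

def normalized_central_moment(f, p, q):
    # eta_pq = mu_pq / mu_00^gamma with gamma = (p + q)/2 + 1.
    gamma = (p + q) / 2 + 1
    return central_moment(f, p, q) / central_moment(f, 0, 0) ** gamma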
10.5 Whole Image features
The principal feature detection methods are based either on detecting corners or on working with entire regions in an image.
The Harris -Stephens Corner Detector
The basic approach is this: Corners are detected by running a small window over
an image.
The detector window is designed to compute intensity changes
Fig 10.13 Illustration of Harris -Stephens Corner detector
Three scenarios
(1) Areas of zero (or small) intensity change in all directions, which happens when the window is located in a constant (or nearly constant) region, as at location A.
(2) Areas of change in one direction but no (or small) change in the orthogonal direction, which happens when the window spans a boundary between two regions, as at location B.
(3) Areas of significant change in all directions, which happens when the window contains a corner (or isolated points), as at location C.
The HS corner detector is a mathematical formulation that attempts to differentiate between these three conditions.
Let f denote an image, and let f(s, t) denote a patch of the image defined by the values of (s, t). A patch of the same size, but shifted by (x, y), is given by f(s + x, t + y). The sum of squared differences between the two patches is

C(x, y) = Σ_{s,t} [ f(s + x, t + y) − f(s, t) ]²

Approximating the shifted patch by a first-order Taylor expansion, the above equation can be written in the form

C(x, y) ≈ [x  y] M [x  y]ᵀ

where

M = Σ_{s,t} [ fx²    fx·fy
              fx·fy  fy²  ]

fx and fy denote the partial derivatives of f at (s, t), and a weighting window (often a Gaussian) is usually included in the sum.
The matrix M is called the Harris matrix.
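scikit-image exposes the corner response derived from M (for its 'k' method, R = det(M) − k·trace(M)²) through corner_harris; the test image and parameter values below are illustrative choices.

from skimage import data, feature

img = data.checkerboard().astype(float)

response = feature.corner_harris(img, k=0.05, sigma=1)     # Harris response map
corners = feature.corner_peaks(response, min_distance=5)   # (row, col) corner locations
print(corners.shape)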
10.6 Scale Invariant Feature Transform (SIFT)
SIFT is an algorithm developed by Lowe [2004] for extracting invariant features from an image. It is called a transform because it transforms image data into scale-invariant coordinates relative to local image features. SIFT is by far the most complex feature detection and description approach covered here. Methods like SIFT are used when variables such as scale changes, rotation, changes in illumination, and changes in viewpoint are present.
SIFT features (called keypoints) are invariant to image scale and rotation, and are robust across a range of affine distortions, changes in 3-D viewpoint, noise, and changes of illumination. The input to SIFT is an image; its output is a set of n-dimensional feature vectors whose elements are the invariant feature descriptors.
The first stage of the SIFT algorithm is to find image locations that are invariant to scale change. This is achieved by searching for stable features across all possible scales, using a function of scale known as scale space. The parameter controlling the smoothing is referred to as the scale parameter. In SIFT, Gaussian kernels are used to implement the smoothing, so the scale parameter is the standard deviation. SIFT subdivides scale space into octaves, with each octave corresponding to a doubling of σ. SIFT initially finds candidate keypoint locations using the Gaussian-filtered images, then refines those locations, discards unstable keypoints (improving accuracy and eliminating edge responses), and finally computes keypoint orientations and descriptors.
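In practice SIFT is almost always used through a library. The sketch below uses OpenCV (cv2.SIFT_create is available in OpenCV 4.4 and later; older builds exposed it through the xfeatures2d contrib module); the file name example.jpg is a hypothetical placeholder.

import cv2

gray = cv2.imread('example.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical input image

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

# Each keypoint stores location, scale and orientation; each descriptor is a
# 128-element vector that is invariant to scale and rotation.
print(len(keypoints), descriptors.shape)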
10.7 Summary
In this chapter, we learned about feature extraction algorithms, including region-based descriptors and Fourier descriptors, as well as the role of chain codes, signatures, and whole-image features.
10.8 Review questions
1. How can one represent and describe features after extraction?
2. What are region descriptors?
3. Explain Fourier descriptors briefly.
4. What are chain codes? Explain with examples.
5. Write a short note on signatures
6. Describe Whole image features.
7. Explain Harris -Stephens Corner detector algorithm.
8. Write a short note on SIFT.
10.9 References
1. Digital Image Processing by Gonzalez and Woods, 4th Edition.
2. All Images courtesy from Digital Image Processing by Gonzalez and Woods,
4th Edition
3. https://www.ee.columbia.edu/~xlx/courses/ee4830-sp08/notes/lec10.pdf