TYBSC-CS: Digital Image Processing

1
INTRODUCTION TO IMAGE-PROCESSING SYSTEM
Unit structure
1.0 Objectives
1.1 Introduction
1.2 Image Sampling
1.2.1 Theory of 2D Sampling
1.2.2 Retrieving Image From Its Sample
1.2.3 Violation of Sampling criterion
1.3 Quantization
1.4 Resolution
1.5 Human Visual Systems
1.5.1 Anatomy of HVS
1.5.2 Scotopic and Photopic Vision
1.5.3 Brightness and contrast
1.5.4 Laws Of Physics Associated With Vision
1.5.5 Mach Band Effect
1.6 Classification of Digital Images
1.7 Image Types
1.8 Elements of an Image-Processing System
1.9 Applications of Digital Image Processing
1.10 Summary
1.11 Exercises
1.12 References
1.0 OBJECTIVES
As the name indicates, digital image processing helps:
 To study two-dimensional signals and systems.
 To understand image fundamentals and the transforms necessary for image processing.
 To study image enhancement techniques in the spatial and frequency domains.
 To study image segmentation and image compression techniques.

1.1 INTRODUCTION
In the current era, digital image processing plays an important role in several areas like geography, information technology, medicine, etc.
Digital image processing (DIP) refers to the manipulation of digital images by means of a digital computer. It is a subarea of signals and systems, but it focusses particularly on images. DIP aims at building a computer system that is able to perform several kinds of processing on an image. The input to such a system is a digital image; the system then processes that image using efficient algorithms and gives an image as output. The most common examples are MATLAB and Adobe Photoshop, which are widely used for processing digital images.
Before proceeding, let us define some terms as follows:
1. IMAGE: An image is defined as a two-dimensional signal. It is described by a mathematical function f(x, y), where x and y are the two coordinates, horizontal and vertical, called spatial or plane coordinates. The value of f(x, y) at any point gives the pixel value at that point of the image and represents a measure of some characteristic, such as brightness or colour, of the viewed scene. An image is a projection of a 3D scene onto a 2D projection plane.
2. ANALOG IMAGE: An analog image can be represented mathematically by a continuous range of values which represent intensity and position. Some examples include television images, photographs, paintings, medical images, etc.
3. DIGITAL IMAGE: It is composed of a matrix of small pixels. There are various software packages and algorithms for operating on such images. Some examples of digital image operations are colour processing, image recognition, video processing, etc. Analog signals can be converted into digital form using two important operations, i.e., sampling and quantisation.


Figure 1: Conversion of an analog image to a digital image through sampling and quantization
Advantages of Digital Image Processing
 Fast processing and cost effectiveness
 Easier image rebuilding and reformatting
 Speedy image storage and retrieval
 Fast and high-quality image distribution
Disadvantages of Digital Image Processing
 It is very time-consuming and needs a high-speed processor.
 Enlargement of an image is not possible beyond a certain size.
 It can be very expensive, depending on the particular system.
4. Digital Image representation:
A digital image is:
 an N x N array of elements
 a 2-dimensional discrete signal
To convert an image into digital format, either a scanner or a digital camera is used. Digital images can also be created directly on computer screens.
5. Neighbours of a Pixel
A pixel P has four neighbours, if they exist: NORTH, SOUTH, EAST and WEST, as shown in Figure 2.
North
West P East
South
Figure 2
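The 4-neighbourhood shown in Figure 2 can be expressed directly in code. The following is a minimal sketch (the function name and the Python/NumPy-style indexing are illustrative assumptions, not from the text): it returns those of the north, south, east and west neighbours of a pixel that actually lie inside an image of a given size.

def four_neighbours(row, col, height, width):
    """Return the valid 4-neighbours (N, S, E, W) of pixel (row, col)."""
    candidates = [(row - 1, col),   # North
                  (row + 1, col),   # South
                  (row, col + 1),   # East
                  (row, col - 1)]   # West
    # Keep only neighbours that fall inside the image boundaries
    return [(r, c) for r, c in candidates if 0 <= r < height and 0 <= c < width]

print(four_neighbours(0, 0, 4, 4))   # corner pixel -> only 2 neighbours exist
print(four_neighbours(2, 2, 4, 4))   # interior pixel -> all 4 neighbours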
1.2 IMAGE SAMPLING
Sampling is the procedure of evaluating the brightness information at discrete spatial points. A continuous image function f(x, y) can be sampled in the plane with the help of a discrete grid of sampling points.
1.2.1 THEORY of 2D SAMPLING
Let the analog image be represented by f(x, y). The corresponding discrete version of f(x, y) is obtained by evaluating f(x, y) at regular instants separated by the sampling intervals Δx and Δy, which are real positive constants.

Steps in 2D sampling are as follows:
1. Take the analog image f(x, y).
2. Take the Fourier transform, after which we get the spectrum of f(x, y), denoted by F(ω1, ω2), as


3. Then take the inverse Fourier transform, after which we get f(x, y).
4. For the discrete signal we specify x = mΔx and y = nΔy, where the corresponding frequency variables are expressed in radians.
5. Substituting these in the inverse transform of Step 3, we get the discretised form of f(x, y) as given below.
6. Replacing the integration by summation and changing the limits, we have

An alternate approach for the above procedure is to multiply the analog image by a 2D comb function, which is a rectangular grid of points as shown in Figure 3; the spacings between the grid points are Δx and Δy respectively.

Figure 3 Figure 4

Figure 4 represents a 3-dimensional view of the comb function. The 2D view, also called the bed-of-nails function, is defined by:
After multiplying the comb function and the analog function f(x, y), we get the discrete version of the analog image, given by

Next, the Fourier transform of the above product of functions is computed and the convolution of the signals is applied to obtain

Summation being a linear operator, interchanging the order we get:

We can see that the final equations from both methods are similar.
1.2.2 RETRIEVING IMAGE FROM ITS SAMPLE
Since discreteness in one domain leads to periodicity in the other domain, sampling in the spatial domain also leads to a periodic spectrum in the frequency domain, as shown in Figure 5.

Figure 5: Periodic spectrum of the sampled image
In order to retrieve the original image from the sampled spectrum, we apply the condition that the sampling frequency should be greater than two times the maximum signal frequency; this is called "The Sampling Theorem".

That is,
ωxs ≥ 2ωxmax and ωys ≥ 2ωymax    (*)
where 2ωxmax and 2ωymax are called the Nyquist rates.
A low-pass filter is used to extract the desired spectrum; its transfer function is given by:

The region of support is indicated as P in Figure 5. The continuous image can be obtained from the sampled spectrum by multiplying the sampled spectrum with the low-pass filter, as given below:

1.2.3 VIOLATION OF SAMPLING CRITERIA
Violation of the sampling criteria given in (*) leads to aliasing, which fundamentally happens due to undersampling. It also leads to overlapping of the spectrum in the ω1 direction, as shown in Figure 6.


Figure 6: Overlapping of the spectrum due to undersampling along the ω1 direction


Figure 7: Overlapping of the spectrum due to undersampling along the ω2 direction
In a similar manner, violation of the sampling criteria given in (*) leads to overlapping of the spectrum in the ω2 direction, as shown in Figure 7.

Violation of the sampling criteria given in (*) in both directions simultaneously leads to overlapping of the spectrum in both the ω1 and ω2 directions, as shown in Figure 8.


Figure 8: Overlapping of the spectrum due to undersampling along both the ω1 and ω2 directions
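A small numerical experiment illustrates the sampling theorem and the aliasing described above. The sketch below is a hypothetical 1D illustration (not taken from the text): a 30 Hz cosine is sampled first above and then below its Nyquist rate of 60 Hz, and in the undersampled case the dominant frequency found in the spectrum is "folded" to a lower value, which is exactly aliasing.

import numpy as np

f_sig = 30.0                      # signal frequency in Hz

def dominant_frequency(fs, duration=1.0):
    """Sample cos(2*pi*f_sig*t) at rate fs and return the strongest DFT frequency."""
    n = int(fs * duration)
    t = np.arange(n) / fs
    x = np.cos(2 * np.pi * f_sig * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

print(dominant_frequency(fs=100.0))   # 100 Hz > 2*30 Hz -> ~30 Hz (no aliasing)
print(dominant_frequency(fs=40.0))    # 40 Hz  < 2*30 Hz -> ~10 Hz (aliased)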

1.3 QUANTIZATION
Quantization is the process of transforming a real-valued sampled image into one taking only a finite number of distinct values. Under the quantization procedure, the amplitude values of the image are digitized. In simple words, when you quantize an image, you are actually partitioning the signal into quanta (partitions).
Quantiser design includes :
 Input decision level
 Output representation level
 Number of levels.
Quantisers are classified into two types:
1. Scalar quantisers
2. Vector quantisers

Figure 9 : Classification of quantisers
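As a concrete illustration of the scalar quantiser mentioned in the classification above, the sketch below shows an assumed uniform design (the function name, the input range [0, 1] and the number of levels are illustrative choices, not from the text): input decision levels divide the range into equal intervals and each sample is mapped to the output representation level at the centre of its interval.

import numpy as np

def uniform_quantise(samples, levels=8, lo=0.0, hi=1.0):
    """Uniform scalar quantiser: map each sample to the centre of its interval."""
    step = (hi - lo) / levels                       # decision-interval width
    idx = np.clip(np.floor((samples - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step                  # output representation levels

x = np.array([0.03, 0.24, 0.51, 0.77, 0.99])
print(uniform_quantise(x, levels=4))   # -> [0.125 0.125 0.625 0.875 0.875]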
1.4 RESOLUTION
Resolution describes the degree of distinguishable detail. It is classified as:
1. Spatial resolution: Spatial resolution is defined as the smallest discernible detail in an image. The clarity of an image is not determined by the pixel count alone; spatial resolution is the number of independent pixel values per inch. Sampling is the principal factor determining this resolution.
2. Gray-level resolution: Gray-level resolution refers to the smallest discernible change in the shades or levels of gray in an image. In short, gray-level resolution is equal to the number of bits per pixel.

1.5 HUMAN VISUAL SYSTEM
The Human Visual System (HVS) is one of the most complicated systems known. Many image processing applications are expected to produce images that are to be viewed by human observers. In the HVS, the eyes act as the sensor or camera, the neurons act as the connecting cable and the brain acts as the processor. It is hence important to understand the characteristics and limitations of the human visual system in order to understand the "receiver" of the 2D signals.
1.5.1 ANATOMY OF THE HVS

Figure 10 : Vertical section of the Human Eye
The first part of the visual system is the eye. This is shown in the figure. Its form is nearly spherical and its diameter is approximately 20 mm. Its outer cover consists of the 'cornea' and 'sclera'. The cornea is a tough transparent tissue in the front part of the eye. The sclera is an opaque membrane, which is continuous with the cornea and covers the remainder of the eye. Directly below the sclera lies the 'choroid', which has many blood vessels. At its anterior extreme lies the iris diaphragm. Light enters the eye through the central opening of the iris, whose diameter varies from 2 mm to 8 mm according to the illumination conditions. Behind the iris is the 'lens', which consists of concentric layers of fibrous cells and contains up to 60 to 70% water. Its operation is similar to that of man-made optical lenses. It focuses the light on the 'retina', which is the innermost membrane of the eye. The retina has two kinds of photoreceptors: cones and rods. The cones are highly sensitive to colour. Their number is 6-7 million and they are mainly located at the central part of the retina. Each cone is connected to one nerve end. Cone vision is the photopic or bright-light vision. Rods serve to view the general picture of the visual field. They are sensitive to low levels of illumination and cannot discriminate colours. This is the scotopic or dim-light vision. Their number is 75 to 150 million and they are distributed over the retinal surface.
Several rods are connected to a single nerve end. This fact and their large spatial distribution explain their low resolution. Both cones and rods
transform light into electric stimuli, which are carried through the optic nerve to the human brain for high-level image processing and
perception.
1.5.2 SCOTOPIC AND PHOTOPIC VISION
Scotopic vision is the vision of the eye under low-light conditions. Cone cells do not function as well as rod cells in low-level lighting, so scotopic vision occurs almost entirely through rod cells, which are most sensitive to wavelengths of light around 498 nm, the blue-green band of the spectrum.
Photopic vision is the vision of the eye under well-lit conditions, generally usual daylight intensity. It allows colour perception, which is facilitated by cone cells. Cone cells have a higher visual acuity and also provide the eye's colour sensitivity.
Scotopic vision uses only rods to see, meaning that objects are visible but appear in black and white, whereas photopic vision uses cones and provides colour.
1.5.3 BRIGHTNESS AND CONTRAST
The contrast and brightness functions enhance the appearance of raster data by modifying the brightness and contrast within the image. Brightness increases the overall lightness of the image, for example making dark colours lighter and light colours whiter, while contrast adjusts the difference between the darkest and lightest colours.
1.5.4 LAWS OF PHYSICS ASSOCIATED WITH VISION
The important laws associated with vision are :
1. Weber's law : It states that the just noticeable difference in stimulus
intensity may affect the production of sensations proportionally.
In simple terms, the size of the intensity of stimuli will show a
proportionate change in producing the sense experiences.
 I = kL
Where ΔI(Delta I) repr esents the difference threshold, I represents the
initial stimulus intensity and k signifies that the proportion on the left side
of the equation remains constant despite variations in the I term.
It is denoted by Delta I.
2. Steven's Law: If L is the physical stimulus and I the perceived sensation, then the mathematical form of Steven's law is
I = α L^n
where L is the intensity against a black background and the value of n is 0.5.
3. Bloch's Law: It is given by
L · T = C
where C is a constant; the law is valid for stimulus durations shorter than 0.1 second.
4. Ricco's law: It is given by
L · A = C
where A is the area of the stimulus.
5. Piper's law: It is given by
L · √A = C
1.5.5 MACH BAND EFFECT
Another characteristic of the HVS is that it tends to "overshoot" around image edges (boundaries of regions having different intensity). As a result, regions of constant intensity which are close to edges appear to have varying intensity. Such an example is shown in Figure 11: the stripes appear to have varying intensity along the horizontal dimension, whereas their intensity is actually constant. This effect is called the Mach band effect. It indicates that the human eye is sensitive to edge information and that it has high-pass characteristics.

Figure 11 : Mach band effect
1.6 CLASSIFICATION OF DIGITAL IMAGES
Digital images can be classified into two types as follows:
1. Raster or Bitmap Image: Also known as bitmap images, these are made of pixels, or tiny dots, that use colour and tone to produce the image. Pixels appear like little squares on graph paper when the image is zoomed in or enlarged. These images are created by digital cameras, by scanning images into a computer or with raster-based software. Each image can only contain a fixed number of pixels; the number of pixels determines the quality of the image. This is known as resolution. More pixels result in better quality at the same or larger sizes as the original, but this also increases the size of the file and the amount of space it takes to store the file. The lower the number of pixels, the lower the resolution. Resolution limits the size to which the image can be scaled up without the pixels becoming visible.
Common raster formats are BMP (Windows Bitmap), PCX (Paintbrush), TIFF (Tagged Image File Format), etc.

Figure 12: Zooming of a raster image
2. Vector Image: These images consist of anchored dots connected by lines and curves, similar to the connect-the-dot activities you may have done as a kid. Because these graphics are not based on pixels, they are known as resolution independent, which makes them infinitely scalable. Their lines are sharp, without any loss in quality or detail, no matter what their size. These graphics are also device-independent, which means their quality doesn't depend on the number of dots available on a printer or the number of pixels on a screen. Because they consist of lines and anchor points, the size of the file is relatively small.

Figure 13: Zooming of a vector image


1.7 IMAGE TYPES
Images are classified into the following categories:
1. Binary image: Also called a black-and-white image, the binary image, as its name says, contains only two pixel values, 0 and 1. Here 0 refers to black and 1 refers to white. It is also known as a monochrome image. A gray scale image can be converted to a black-and-white image by using a threshold operation.

Figure 14 : Binary image representation
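The thresholding operation mentioned above, which converts a gray scale image into a binary image, can be sketched as follows (NumPy and the threshold value 128 are illustrative assumptions, not prescribed by the text).

import numpy as np

def to_binary(gray_image, threshold=128):
    """Convert an 8-bit gray scale image to a binary (0/1) image."""
    return (gray_image >= threshold).astype(np.uint8)

gray = np.array([[ 10, 200,  90],
                 [130,  40, 250],
                 [255,   0, 128]], dtype=np.uint8)
print(to_binary(gray))
# [[0 1 0]
#  [1 0 1]
#  [1 0 1]]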

2. Gray scale Image: Grayscale images, a kind of black-and-white or gray monochrome image, are composed exclusively of shades of gray. They contain only brightness information. The contrast ranges from black at the weakest intensity to white at the strongest. In a gray scale image a pixel is represented by a word of 8 bits, where a value of 0 represents black and 255 represents white.

Figure 15: Gray scale image representation

3. Colour Image: It has three values per pixel, which measure the chrominance and intensity of light. Each pixel is a vector of colour components. Common colour spaces are RGB (Red, Green, Blue), CMYK (Cyan, Magenta, Yellow, Black) and HSV (Hue, Saturation, Value).

4. Volume Image: It is a three-dimensional image and can be obtained from some medical imaging equipment, where the individual data points are called voxels (volume pixels). Example: CAT scan.
5. Range Image: These are a special class of digital images, in which each pixel represents the distance between a reference frame and a visible point in the scene. They are also called depth images.

6. Multispectral Image: These are images of the same object taken in different bands of the visible or infrared regions. They contain information outside the normal perception range of humans.
1.8 ELEMENTS OF IMAGE PROCESSING SYSTEM
The different elements of an image processing system are:
1. Image-acquisition elements: Image acquisition can be defined as the act of procuring an image from a source and storing it on a computer or other device. This can be done via hardware systems such as cameras, encoders, sensors, etc. Digital cameras are popular nowadays. Most digital cameras use a CCD or CMOS image sensor. Some of the definitions commonly used for solid-state image sensors are:
a. CCD (Charge-Coupled Device): A light-sensitive array of silicon cells that is commonly used for digital camera image sensors. It generates electrical current in proportion to the light input and allows the simultaneous capture of many pixels with one brief exposure.
b. Dark current: The signal charge collected by a pixel in the absence of light.
c. Photosite: The portion of silicon that functions as a light-sensitive area.
d. Pixel: The discrete sensitive cell that collects and holds a photo charge.
e. Fill factor: The fill factor of an imaging sensor is defined as the ratio of a pixel's light-sensitive area to its total theoretical area.
f. Quantum efficiency: The measure of the effectiveness of an imaging device in converting incident photons into electrons.
2. Image-storage devices: In order to store the image obtained from the camera for processing and future use, different storage devices are used. Basically there are two types of storage devices:
i. Disk systems
ii. Flash memory
3. Image-display devices: The final stage of the image-processing chain is image display. Usually computer monitors are used for this purpose. Factors like the size of the monitor, the number of colours and the spatial resolution need to be considered.

1.9 APPLICATIONS OF DIGITAL IMAGE PROCESSING
Some of the most important fields in which digital image processing is
widely used are as given below :
 Image sharpening and restoration
 Medic al field
 Remote sensing
 Transmission and encoding
 Machine/Robot vision
 Color processing
 Pattern recognition
 Video processing
 Microscopic Imaging
1.10 SUMMARY
In this chapter we learnt about the basics of digital image processing, such as its structure, components and the conversion of analog images to digital images.
1.11 EXERCISES
1. Explain 2D image sampling.
2. Explain anatomy of human vision system.
3. Explain sampling and quantization.
4. Explain different element of image processing.
5. Discuss applications of image processing.
1.12 REFERENCES
1) Digital Image Processing, S. Jayaraman, S. Esakkirajan, T. Veerakumar, Tata McGraw-Hill Education Pvt. Ltd., 2009
2) Digital Image Processing, 3rd Edition, Rafael C. Gonzalez, Richard E. Woods, Pearson, 2008
3) Scilab Textbook Companion for Digital Image Processing, S. Jayaraman, S. Esakkirajan and T. Veerakumar, 2016
2
CONVOLUTION AND CORRELATION
Unit structure
2.0 Objectives
2.1 Introduction
2.2 2D Signals
2.3 2D Systems
2.4 2D convolution
2.4.1 Graphical Method
2.4.2 Z Transform
2.4.3 Matrix analysis
2.4.4 Circular convolution
2.5 2D Correlation
2.6 Summary
2.7 Exercises
2.8 References
2.0 OBJECTIVES
In this chapter we are going to learn about the theory of signals and some mathematical concepts required in signal processing.
2.1 INTRODUCTION
Signals convey information. Systems transform signals. A signal can be, for example, a sequence of commands or a list of names. Mathematically, we model both signals and systems as functions. A signal is a function that maps a domain, often time or space, into a range, often a physical measure such as air pressure or light intensity. A system is a function that maps signals from its domain (its input signals) into signals in its range (its output signals). Both the domain and the range are sets of signals (signal spaces). Thus, systems are functions that operate on functions.
An example of a 1D signal is an ECG; an example of a 2D signal is a still image.
2.2 2D SIGNALS
2D discrete signals are represented as x(n1, n2), where x is a real or complex value and (n1, n2) is a pair of integers.

 2D unit impulse sequence: It is given by x(n1, n2) = δ(n1, n2), which is defined as

 Line impulse : different types are :
o Vertical line impulse
o Horizontal line impulse
o Diagonal impulse
Figure 1: 2D impulse sequence, vertical line impulse and horizontal line impulse



Figure 2: Diagonal impulse function and its representation
 Exponential Sequence: These are defined by:

 Separable Sequence: A signal x(n1, n2) is said to be separable if it can be represented as the product of a function of n1 alone and a function of n2 alone, given as x(n1, n2) = x1(n1) x2(n2).

 Periodic Sequence: A 2D sequence is periodic if it repeats itself at regularly spaced intervals. Hence a sequence is periodic if
x(n1, n2) = x(n1 + N1, n2) = x(n1, n2 + N2), where N1 and N2 are positive integers.
2.3 2D SYSTEMS
A 2D system is a device or algorithm that carries out some operation on a 2D signal. If x(n1, n2) is the input signal to the system and y(n1, n2) is the output of the system, then the relation is y(n1, n2) = T[x(n1, n2)], where T is the transformation applied by the system to produce the output from the input.
Classification of 2D systems: They can be classified as:
1. Linear vs non-linear systems: A linear system is one that follows the law of superposition. This law is a necessary and sufficient condition to prove the linearity of the system. Linearity is defined as:
T[a x1(n1, n2) + b x2(n1, n2)] = a T[x1(n1, n2)] + b T[x2(n1, n2)]
where a and b are scalar constants.

Figure 3 : Superposition Principle
2. Shift-variant vs shift-invariant systems: If the input-output behaviour of the system does not change with a shift of the input, then the system is said to be shift invariant. Mathematically, if y(n1, n2) = T[x(n1, n2)], then
T[x(n1 − k1, n2 − k2)] = y(n1 − k1, n2 − k2)
3. Static vs dynamic systems: If the output of the system at any instant depends at most on the present input sample, but not on past and future samples of the input, then the system is said to be static or memoryless. In any other case the system is said to be dynamic, or to have memory.

4. Stable system: A 2D linear shift-invariant system is stable if and only if its impulse response is absolutely summable, as given by:

2.4 2D CONVOLUTION
Convolution and correlation are used to extract information from images. They are linear and shift-invariant operations. Convolution describes the relationship between a system's input signal, output signal and impulse response. Shift-invariant means that we perform the same operation at every point in the image. Linear means that this operation is linear, that is, we replace every pixel with a linear combination of its neighbours. These two properties make these operations very simple; it is simpler if we do the same thing everywhere, and linear operations are always the simplest ones. Correlation is a way to identify a known waveform in a noisy background. Correlation is a mathematical operation very similar to convolution. Just as with convolution, correlation uses two signals to produce a third signal. This third signal is called the cross-correlation of the two input signals. If a signal is correlated with itself, the resulting signal is instead called the autocorrelation.
Convolution has a wide range of applications like image filtering, enhancement, restoration, feature extraction and template matching.
The 2-dimensional discrete convolution between two signals x[n1, n2] and h[n1, n2] is given by:

Convolutions can be performed either in the spatial domain or in the frequency domain.
2.4.1 2D CONVOLUTION THROUGH GRAPHICAL METHOD
In 2D convolution through the graphical method, the following operations are performed:
 Folding
 Shifting and
 Addition
The process is performed as follows:
1. Given two matrices as input, x[n, m] and h[n, m], first determine the dimension of the resultant matrix; e.g., if x[n, m] is a matrix of order 2x3 and h[n, m] is of order 3x1, then the resultant matrix is of order 4x3.

2. The resultant matrix is calculated as :





3. Determine the values y(0,0), y(0,1), etc., where y(0,0) is obtained as:

Similarly for the other elements.
4. The resultant matrix is:

2.4.2 2D CONVOLUTION THROUGH Z TRANSFORM
The Z-transform is a mathematical tool which is used to convert difference equations in the discrete time domain into algebraic equations in the z-domain.
Mathematically, if x(n) is a discrete time function, then its Z-transform is defined as

The convolution between two signals x(n1, n2) and h(n1, n2) is given by
y(n1, n2) = x(n1, n2) ** h(n1, n2), where ** indicates convolution.
Taking the Z-transform on both sides, we get
Y(Z1, Z2) = X(Z1, Z2) · H(Z1, Z2)
Convolution in one domain is equal to multiplication in the other domain.
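The relations above can be checked numerically. The sketch below (SciPy and NumPy are illustrative choices, not mentioned in the text) convolves a 2x3 matrix with a 3x1 matrix, confirming the 4x3 output size quoted in the graphical method, and then verifies that multiplying the transforms of the zero-padded signals gives the same result, i.e. convolution in one domain equals multiplication in the other.

import numpy as np
from scipy.signal import convolve2d

x = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3 input
h = np.array([[1],
              [2],
              [1]])                # 3 x 1 kernel

# Direct 2D linear convolution: output size (2+3-1) x (3+1-1) = 4 x 3
y_direct = convolve2d(x, h, mode='full')

# Transform-domain route: zero-pad both signals to the output size,
# multiply their 2D DFTs and invert (convolution <-> multiplication).
M, N = y_direct.shape
y_fft = np.fft.ifft2(np.fft.fft2(x, (M, N)) * np.fft.fft2(h, (M, N))).real

print(y_direct.shape)                  # (4, 3)
print(np.allclose(y_direct, y_fft))    # True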

2.4.3 2D CONVOLUTION THROUGH MATRIX ANALYSIS
2D convolution can be performed using matrix multiplication, with the help of the block Toeplitz matrix of one matrix and the column matrix of the other.
The flowchart for the same is given below:

The number of block matrices depends on the number of rows of x[m, n].
2.4.4 CIRCULAR CONVOLUTION
Circular convolution can be carried out for signals which are periodic in nature.
The circular convolution between two signals x(n1, n2) and h(n1, n2) is given by
y(n1, n2) = x(n1, n2) ⊛ h(n1, n2)
The flowchart for the same is given below:


Circular convolution is widely used in the zooming operation of digital cameras.
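For periodic signals of the same size, circular convolution can also be computed through the transform route suggested by the flowchart: multiply the 2D DFTs of the two signals and invert the product. The following is a minimal sketch under that assumption (NumPy is an illustrative choice).

import numpy as np

def circular_convolve2d(x, h):
    """2D circular convolution of two equal-sized periodic signals via the DFT."""
    assert x.shape == h.shape
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(h)).real

x = np.array([[1, 2], [3, 4]], dtype=float)
h = np.array([[1, 0], [0, 1]], dtype=float)
print(circular_convolve2d(x, h))
# [[5. 5.]
#  [5. 5.]]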
2.5 2D CORRELATION
Correlation is a technique for signal matching. It is an important component of radar, sonar, digital communication and other systems. It is mathematically similar to convolution and is of the following types:

Correlation can be performed using:
 Matrix methods: with the help of the block Toeplitz matrix and the block circulant matrix, in terms of convolution

The flowchart for the same is given below:



Circular correlation can be performed through the matrix method as follows:







Circular correlation can be performed through the transform method as follows:
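The original figure for the transform method is not reproduced here. As a sketch of the standard transform-domain approach (an assumption consistent with the correlation property discussed in the next chapter, not a verbatim reproduction of the book's flowchart), circular cross-correlation is obtained by multiplying the DFT of one signal by the complex conjugate of the DFT of the other and inverting the result.

import numpy as np

def circular_correlate2d(x, h):
    """2D circular cross-correlation via the DFT (conjugate multiplication)."""
    assert x.shape == h.shape
    return np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(h))).real

x = np.array([[1, 2], [3, 4]], dtype=float)
print(circular_correlate2d(x, x))   # autocorrelation; the peak (30) is at zero shift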

2.6 SUMMARY
In this chapter we learnt about convolution, which can be carried out for filtering, its types and the methods used to perform it. We also learnt about correlation, which is used to find the similarity between images or parts of images.
2.7 EXERCISES
1. Write a note on 2d signals.
2. Explain 2D systems and its classifications.
3. Define convolution . Explain its types of representation.
4. Write a note on circular convolution.
5. Discuss applications of circular convolution.
6. Write a note on correlation.

2.8 REFERENCES
1) Digital Image Processing, S. Jayaraman, S. Esakkirajan, T. Veerakumar, Tata McGraw-Hill Education Pvt. Ltd., 2009
2) Digital Image Processing, 3rd Edition, Rafael C. Gonzalez, Richard E. Woods, Pearson, 2008
3) Scilab Textbook Companion for Digital Image Processing, S. Jayaraman, S. Esakkirajan and T. Veerakumar, 2016



3
IMAGE TRANSFORMS
Unit structure
3.0 Objectives
3.1 Introduction
3.2 Need for transform
3.3 Image transforms
3.4 Fourier transform
3.5 2D Discrete Fourier transform
3.6 Properties of 2D DFT
3.7 Other transforms
3.8 Summary
3.9 Exercises
3.10 References
3.0 OBJECTIVES
In this chapter we are going to learn about image transforms, which are widely used in image analysis and image processing. A transform is a mathematical tool basically used to move from one domain to another, i.e., from the time domain to the frequency domain. Transforms are useful for the fast computation of correlation and convolution.
3.1 INTRODUCTION
An image transform can be applied to an image to convert it from one domain to another. Viewing an image in domains such as frequency or Hough space enables the identification of features that may not be as easily detected in the spatial domain. Common image transforms include:
 Hough Transform, used to find lines in an image
 Radon Transform, used to reconstruct images from fan-beam and parallel-beam projection data
 Discrete Cosine Transform, used in image and video compression
 Discrete Fourier Transform, used in filtering and frequency analysis
 Wavelet Transform, used to perform discrete wavelet analysis, and to denoise and fuse images

3.2 NEED FOR TRANSFORM
The main purposes of the transform can be divided into the following five groups:
1. Visualization: Objects which are not directly visible can be observed.
2. Image sharpening and restoration: Used for better image resolution.
3. Image retrieval: An image of interest can be searched for and retrieved.
4. Measurement of pattern: The objects in an image can be measured.
5. Image recognition: Each object in an image can be distinguished.
A transform is a mathematical tool used to transform a signal. Hence it is required for:
 Mathematical convenience

 Extracting more information.


Figure 1 : Transformation Concept

3.3 IMAGE TRANSFORMS
An image transform is a representation of an image. It is done for the following two reasons:
1. To isolate critical components of the image so that they are directly accessible.
2. To make the image data compact so that it can be efficiently stored and transmitted.
The different types of image transforms that will be discussed are:
 Fourier transform
 Walsh transform
 Hadamard transform
 Discrete Cosine transform
 Wavelet transform, etc.
Classification of image transforms: They can be classified on the basis of the nature of the functions, as follows:

Figure 2 :Classification of Image transforms
3.4 FOURIER TRANSFORM
The Fourier transform states that non-periodic signals whose area under the curve is finite can also be represented as integrals of sines and cosines after being multiplied by a certain weight.

The Fourier transform is a representation of an image as a sum of complex
exponentials of varying magnitudes, frequencies and phases. The Fourier transform plays a critical role in a broad range of image processing applications, including enhancement, analysis, restoration and compression.
The Fourier transform also has many wide applications that include image compression (e.g. JPEG compression), filtering and image analysis.

The Continuous Time Fourier Transform (CTFT) is defined as:

A continuous time signal x(t) is converted to a discrete time signal x(nT) using the sampling process, where T is the sampling interval.

The Fourier transform of a finite-energy discrete time signal is given by:





After a series of transformations:

3.5 2D DISCRETE FOURIER TRANSFORM
Working with the Fourier transform on a computer usually involves a form of the transform called the discrete Fourier transform (DFT). A discrete transform is a transform whose input and output values are discrete samples, making it convenient for computer manipulation. There are two principal reasons for using this form of the transform:
 The input and output of the DFT are both discrete, which makes it convenient for computer manipulations.
 There is a fast algorithm for computing the DFT known as the fast Fourier transform (FFT).
Here R(k, l) represents the real part of the spectrum and I(k, l) the imaginary part.
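A minimal sketch of computing the 2D DFT of a small image array with the FFT algorithm and separating the real part R(k, l) and imaginary part I(k, l) (NumPy's fft2 is used as an illustrative implementation; it is not mentioned in the text):

import numpy as np

f = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]], dtype=float)     # small test "image"

F = np.fft.fft2(f)          # 2D DFT, computed with the FFT algorithm
R, I = F.real, F.imag       # real part R(k, l) and imaginary part I(k, l)
magnitude = np.abs(F)       # |F(k, l)| = sqrt(R^2 + I^2)
phase = np.angle(F)

print(np.allclose(np.fft.ifft2(F).real, f))   # True: the DFT is invertible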


3.6 PROPERTIES OF 2D-DFT
There are many properties of the transformation that give insight into the content of the frequency domain representation of a signal and allow us to manipulate signals in one domain or the other.
Following are some of the properties:
1. Separable property: The 2D DFT can be computed in two steps by successive 1D operations on the rows and columns of an image.


Figure 3: Computation of the 2D-DFT by the separable property
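The separable property can be verified numerically: applying 1D DFTs first to every row and then to every column of the result gives the same answer as a direct 2D DFT. A short sketch (NumPy is an illustrative choice):

import numpy as np

f = np.random.rand(4, 4)

# Step 1: 1D DFT of every row; Step 2: 1D DFT of every column of the result
rowwise = np.fft.fft(f, axis=1)
two_step = np.fft.fft(rowwise, axis=0)

print(np.allclose(two_step, np.fft.fft2(f)))   # True: the 2D DFT is separable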
2. Spatial shift property: Since both the space and frequency domains are considered periodic for the purposes of the transforms, shifting means rotating around the boundaries.

3. Periodicity property:

4. Convolution property: Convolution is a mathematical tool for merging two signals to produce a third signal. In other words, convolution can be defined as a mathematical operation that is used to express the relation between the input and output of an LTI system. Convolution in the spatial domain is the same as multiplication in the frequency domain.
frequency domain .
munotes.in

Page 34


Digital Image Proces sing
5. Correlation property: It is used to find the relative similarity between two signals. When we find the similarity of a signal to itself, it is called autocorrelation. On the other hand, when we find the similarity between two signals, it is called cross-correlation. The cross-correlation between two signals is the same as performing convolution of one sequence with the folded version of the other sequence. Thus this property says that correlation of two sequences in the time domain is the same as multiplication of the DFT of one sequence and the DFT of the time-reversed other sequence in the frequency domain.
6. Scaling property: Just as in one dimension, shrinking in one domain causes expansion in the other for the 2D DFT. This means that as an object grows in an image, the corresponding features in the frequency domain will expand. Scaling is used mainly to shrink or expand the size of an image.

7. Conjugate symmetry :

8. Orthogonality property :

9. Rotation property: It states that if a function is rotated by an angle, then its Fourier transform also rotates by an equal amount.

10. Multiplication by Exponent:

All the above properties in a nutshell are as follows:

Table 1 : Properties of DFT
3.7 OTHER TRANSFORMS
Some other transforms used in image processing are :
1. Walsh Transform: The Walsh-Hadamard transform is a non-sinusoidal, orthogonal transform widely used in signal and image processing. In this transform, the signal is decomposed into a set of basis functions (similar to harmonics in the Fourier transform).



2. Hadamard Transform: Same as the Walsh transform, with the difference that the rows of the transform matrix are re-ordered. The elements of the mutually orthogonal basis vectors of a Hadamard transform are either +1 or −1.

3. Haar transform: Haar wavelet compression is an efficient way to perform both lossless and lossy image compression. It relies on averaging and differencing values in an image matrix to produce a matrix which is sparse or nearly sparse. A sparse matrix is a matrix in which a large portion of the entries are 0. The transform is based on a class of orthogonal matrices whose elements are either +1, −1 or 0, multiplied by powers of √2. The algorithm to generate the Haar basis is:


4. Slant Transform: A unitary transform called the slant transform, specifically designed for image coding, was developed by Enomoto and Shibata. The transformation possesses a discrete sawtooth-like basis vector which efficiently represents linear brightness variations along an image line. The slant transform has been utilized in several transform image-coding systems for monochrome and colour images.
5. Discrete Cosine Transform: It is a transform that is mainly used in compression algorithms. It transforms data points in the spatial domain into the frequency domain. This makes it easier to find repeating patterns. Like any other transform, it is also invertible; this means we can recover the actual data points if the transform coefficients are given. It was developed by Ahmed, Natarajan and Rao in 1974.
developed by Ahmad, Natrajan and Rao in 1974. munotes.in

Page 37


Image Transforms
37

where k varies from 0 to N−1.
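A short sketch of a 2D DCT on a small image block (SciPy's dctn with the orthonormal type-II transform is one possible implementation and is an assumption here, not prescribed by the text). Most of the signal energy collects in the low-frequency, top-left coefficients, which is what makes the transform useful for compression.

import numpy as np
from scipy.fft import dctn, idctn

block = np.array([[52, 55, 61, 66],
                  [63, 59, 55, 90],
                  [62, 59, 68, 113],
                  [63, 58, 71, 122]], dtype=float)   # a 4x4 image block

coeffs = dctn(block, type=2, norm='ortho')      # forward 2D DCT
print(np.round(coeffs, 1))                      # energy concentrates at the top-left

restored = idctn(coeffs, type=2, norm='ortho')  # the transform is invertible
print(np.allclose(restored, block))             # True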
6. KL Transform: Named after Kari Karhunen and Michel Loève, the Karhunen-Loève Transform (KLT) is also known as the Hotelling transform or Principal Component Analysis (PCA). It is based on the statistical properties of the image and has several important properties that make it useful for image processing, particularly for image compression. The main purpose of image compression is to store the image in fewer bits than the original image; now, data from neighbouring pixels in an image are highly correlated, and more compression can be achieved by de-correlating this data. The KL transform does the task of de-correlating the data, thus facilitating a higher degree of compression. It is used in clustering analysis and image compression. There are four major steps in order to find the KL transform:
a. Find the mean vector and covariance matrix of the given image x.
b. Find the eigenvalues and then the eigenvectors of the covariance matrix.
c. Create the transformation matrix T, such that the rows of T are the eigenvectors.
d. Find the KL transform.
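The four steps above can be sketched on a tiny data set as follows (the toy data, the column-as-sample convention and the NumPy-based implementation are illustrative assumptions, not the book's worked example).

import numpy as np

# Each column is one vector sample x_i (here: 3 samples of 2-dimensional data)
X = np.array([[2.0, 4.0, 6.0],
              [1.0, 3.0, 5.0]])

# Step a: mean vector and covariance matrix
mean_vec = X.mean(axis=1, keepdims=True)
cov = np.cov(X)

# Step b: eigenvalues and eigenvectors of the covariance matrix
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Step c: transformation matrix T whose rows are the eigenvectors
# (ordered so that the largest-eigenvalue vector comes first)
T = eig_vecs[:, ::-1].T

# Step d: the KL transform of the (mean-removed) data
Y = T @ (X - mean_vec)

print(np.round(np.cov(Y), 6))   # the transformed components are de-correlated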
Comparison of various image transforms

Table 2 :Comparison of Transforms
3.8 SUMMARY
In this chapter we studied why image transforms are required in digital image processing and the different types available.

3.9 EXERCISES
1. Why do we need image transforms?
2. What are the different types of transforms?
3. Explain the properties of the DFT.
4. What is the main difference between the Walsh and Hadamard transforms?
5. Explain the Fourier transform and its properties.
6. What are the advantages of the Walsh transform over the Fourier transform?
3.10 REFERENCES
1) Digital Image Processing, S. Jayaraman, S. Esakkirajan, T. Veerakumar, Tata McGraw-Hill Education Pvt. Ltd., 2009
2) Digital Image Processing, 3rd Edition, Rafael C. Gonzalez, Richard E. Woods, Pearson, 2008
3) Scilab Textbook Companion for Digital Image Processing, S. Jayaraman, S. Esakkirajan and T. Veerakumar, 2016




4
IMAGE ENHANCEMENT
Unit Structure
4.0 Objectives
4.1 Introduction
4.2 Image Enhancement in spatial domain
4.2.1 Point operation
4.2.2 Mask operation
4.2.3 Global operation
4.3 Enhancement through Point operations
4.3.1 Types of point operations
4.4 Histogram manipulation
4.5 Linear Gray Level Transformation
4.6 Nonlinear Gray Level Transformation
4.6.1 Thresholding
4.6.2 Gray -level slicing
4.6.3 Logarithmic transformation
4.6.4 Exponential transformation
4.6.5 Power law transformation
4.7 Local or neighborhood operation
4.7.1 Spatial filtering
4.7.2 Linear filtering
4.7.3 Mean filter/ Average filter/ Low pass filter
4.7.4 Weighted average filter
4.7.5 Bartlett filter
4.7.6 Gaussian filter
4.8 Median Filter
4.9 Spatial domain high pass filtering

4.9.1 High boost filtering
4.9.2 Unsharp masking
4.10 Bit-plane slicing
4.11 Image Enhancement in frequency domain
4.12 Homomorphic filter
4.13 Zooming operation
4.14 Image Arithmetic
4.14.1 Image addition
4.14.2 Image subtraction
4.14.3 Image multiplicatio n
4.14.4 Image division
4.15 Summary
4.16 List of References
4.17 Unit End Exercises
4.0 OBJECTIVES
 To understand the image enhancement in spatial and frequency
domain
 To get familiar with point and mask operations in spatial domain
 To learn different types of transformations on an image and the filtering operations
4.1 INTRODUCTION
The goal of image enhancement is to make it easier for viewers to understand the information contained in images. A good enhancement algorithm produces a better-quality image for a certain application. This can be accomplished by either reducing noise or boosting image contrast.
For presentation and analysis, image-enhancement techniques are used to highlight, sharpen or smooth visual characteristics. Application-specific enhancement techniques are frequently created via empirical study. Image enhancement techniques highlight particular image elements to enhance how an image is perceived visually.
Image-enhancement methods can be divided into two groups in general:
(1) spatial domain methods; and (2) transform domain methods.
domain. munotes.in

Page 41


Image Enhancement
Whereas the transform domain approach works with an image's Fourier transform before transforming it back into the spatial domain, the spatial domain method operates directly on pixels. Because they are quick and easy to use, basic histogram-based improvement approaches can produce results that are acceptable for some applications. By removing a portion of a filtered component from the original image, unsharp masking sharpens the edges. Unsharp masking is a technique that has gained popularity as a diagnostic aid.
4.2 IMAGE ENHANCEMENT IN SPATIAL DOMAIN
The alteration of pixel values is the focus of the spatial domain approach. The three basic categories of the spatial domain technique are (i) point operation, (ii) mask operation and (iii) global operation.
4.2.1 Point operation
In point operation, each pixel is modified by an equation that is not dependent on other pixel values. The point operation is illustrated in Figure 1. The point operation is represented by
g(m, n) = T[f(m, n)]
In point operation, T operates on one pixel, i.e., there exists a one-to-one mapping between the input image f(m, n) and the output image g(m, n).

Figure 1: Point operation
4.2.2 Mask operation
As shown in Figure 2, each pixel is altered during the mask operation in accordance with the values in a small neighbourhood. Spatial low-pass filtering with a box filter or a median filter are examples of mask operations. In a mask operation, the operator T operates on a neighbourhood of pixels. Here, a mask is a small matrix whose elements are frequently referred to as weights. Every mask has an origin. A symmetric mask's origin is typically its centre pixel position. Depending on the desired usage, any pixel location may be used as the origin for non-symmetric masks.


Figure 2: Mask processing
4.2.3 Global operation
All of the image's pixel values are taken into account during global
operation. Frequency domain operations are often global operations.
4.3 ENHANCEMENT THROUGH POINT OPERATIONS
Each pixel value is mapped to a new pixel value in a point operation. Point operations essentially have no memory: the enhancement at any position depends only on the image value at that position. The point operation, which is shown in Figure 3, converts the input image f(m, n) into the output image g(m, n). It is clear from the figure that every pixel of f(m, n) with the same grey level maps to a single grey value in the output image.

Figure 3: Illustration of point operation
4.3.1 Types of point operations
Some of the examples of point operation include (i) brightness
modification, (ii) contrast manipulation, and (iii) histogram manipulation
1] Brightness modification
The value assigned to each pixel in an im age determines how bright it is.
A constant is either added to or subtracted from the luminance of each
sample value to alter the brightness of the image. By adding a constant
value to each and every pixel in the image, the brightness can be raised. The brightness can also be reduced by subtracting a fixed amount from each and every pixel in the image.
 Increasing the brightness of an image: A simple method to increase the brightness of an image is to add a constant value to each and every pixel of the image. If f[m, n] represents the original image, then a new image g[m, n] is obtained by adding a constant k to each pixel of f[m, n]. This is represented by
g[m, n] = f[m, n] + k

Figure 4: Brightness enhancement

 Decreasing the brightness of an image: The brightness of an image can
be decreased by subtracting a constant k from all the pixels of the
input image f [m,n]. This is represented by
g [m, n] = f [m, n] – k

Figure 5: Brightness suppression
2] Contrast adjustment
Contrast adjustment is done by scaling all the pixels of the image by a
constant k. It is given by
g [m, n] = f [m, n] ∗ k
Changing the contrast of an image changes the range of luminance values present in the image.
Specifying a value above 1 will increase the contrast by making bright samples brighter and dark samples darker, thus expanding the range used. A value below 1 will do the opposite and compress the range of sample values into a smaller one. An original image and its contrast-manipulated images are illustrated in Figure 6.
illustrated in Figure 6. munotes.in

Page 44


Digita l Image Processing
44

Figure 6: Contrast manipulation
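A minimal sketch of the two point operations discussed above on an 8-bit image stored as a NumPy array (the clipping to the range 0-255 is an implementation detail assumed here, not spelled out in the text):

import numpy as np

def adjust_brightness(f, k):
    """g[m,n] = f[m,n] + k, clipped to the 8-bit range."""
    return np.clip(f.astype(int) + k, 0, 255).astype(np.uint8)

def adjust_contrast(f, k):
    """g[m,n] = f[m,n] * k, clipped to the 8-bit range."""
    return np.clip(f.astype(float) * k, 0, 255).astype(np.uint8)

f = np.array([[50, 100], [150, 200]], dtype=np.uint8)
print(adjust_brightness(f, 40))    # brighter:  [[ 90 140] [190 240]]
print(adjust_brightness(f, -40))   # darker:    [[ 10  60] [110 160]]
print(adjust_contrast(f, 1.5))     # k > 1 expands the range: [[ 75 150] [225 255]]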
4.4 HISTOGRAM MANIPULATION
Histogram manipulation generally involves changing the histogram of an
input image to enhance the image's visual appeal. One must have a
foundational understanding of the histogram of the image in order to
comprehend histogram manipulation.
A] Histogram: The histogram is obtained by plotting the gray-level values of an image against the number of occurrences of each gray level in the image. The histogram provides a useful summary of an image's intensities, but it is unable to convey any information about the spatial relationships between pixels. The histogram also gives information about image contrast and brightness:
1. The histogram of a dark image will be concentrated at the lower grey levels.
2. A bright image will have its histogram clustered at the higher grey levels.
3. For a low-contrast image the histogram will be narrow rather than evenly distributed.
4. For a high-contrast image the histogram will have an even spread over the grey levels.
Image brightness may be improved by modifying the histogram of the image.
B] Histogram equalisation: In order to distribute the grey levels of an
image uniformly across its range, a procedure called equalisation is used.
Based on the image histogram, histogram equalisation reassigns the
brightness values of pixels. The goal of histogram equalisation is to
produce a picture with a histogram that is as flat as feasible. More
aesthetically acceptable outcomes are produced by histogram equalisation
across a wider spectrum of photos.
C] Procedure to perform Histogram equalization : Histogram
equalisation is done by performing the following steps:
1. Find the running sum of the histogram values.
2. Normalise the values from Step (1) by dividing by the total number of
pixels.
3. Multiply the values from Step (2) by the maximum gray -level value and
round.
4. Map the gray level values to the results from Step (3) using a one -to-
one correspondence.
Example: Perform histogram equalisation on the following image:

Solution: The maximum value is found to be 5. We need a minimum of 3 bits to represent this number, so there are eight possible gray levels from 0 to 7. The histogram of the input image is given below:

Step 1: Compute the running sum of histogram values. The running sum of
histogram values is otherwise known as cumulative frequency distribution

Step 2: Divide the running sum obtained in Step 1 by the total number of
pixels. In this case, the total number of pixels is 25.

Step 3: Multiply the result obtained in Step 2 by the maximum gray-level value, which is 7 in this case.


The result is then rounded to the closest integer to get the following table:

Step 4: Mapping of gray level by a one -to-one correspondence:

The original image and the histogram equalised image are shown side by
side.
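The four-step procedure and the worked example above can be reproduced with the following sketch (NumPy is an illustrative choice, and the small test image below is hypothetical, since the original example matrix is not repeated here).

import numpy as np

def equalise_histogram(image, levels=8):
    """Histogram equalisation following the four steps described in the text."""
    hist = np.bincount(image.ravel(), minlength=levels)
    running_sum = np.cumsum(hist)                       # Step 1: running sum
    normalised = running_sum / image.size               # Step 2: divide by pixel count
    mapping = np.round(normalised * (levels - 1))       # Step 3: scale and round
    return mapping.astype(image.dtype)[image]           # Step 4: one-to-one mapping

img = np.array([[0, 1, 1, 2],
                [1, 2, 2, 3],
                [2, 3, 3, 4],
                [3, 4, 4, 5]], dtype=np.uint8)          # 3-bit image, levels 0..7
print(equalise_histogram(img, levels=8))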

4.5 LINEAR GRAY LEVEL TRANSFORMATION
An image's linear transforma tion is a function that converts each pixel's
grayscale value into a different grayscale at the same location using a
linear function. The linear transformation is represented by munotes.in

Page 47


Image Enhancement
47

The linear transformation is given by g(m, n) = T [ f (m, n)].
Image negati ve or inverse transformation:
Light and dark are turned around in the inverse transformation. An image
negative is an illustration of an inverse transformation. Each pixel is
subtracted from the highest pixel value to create a negative image. The
negative picture for an 8 -bit image can be created by reversing the scaling
of the grey levels according to the transformation
g (m, n) = 255 − f (m, n)
The graphical representation of negation is shown in Figure 7.

Figure 7: Graphical representation of negations
Negative images are useful in the display of medical images and
producing negative prints of images.
4.6 NONLINEAR GRAY LEVEL TRANSFORMATION
Non-linear transformation maps small equal intervals into non -equal
intervals. Following are few of the non -linear tr ansformations:
4.6.1 Thresholding
Thresholding is used to extract a portion of an image that contains all the information of interest. Thresholding is, in general, part of the broader problem of segmentation. Thresholding methods fall into two major categories: hard thresholding and soft thresholding. Hard thresholding sets to zero the coefficients whose absolute values are below the threshold. Soft thresholding first sets to zero the coefficients whose absolute values are below the threshold, and then shrinks the remaining nonzero coefficients towards zero.

4.6.2 Gray-level slicing
The purpose of gray-level slicing is to highlight a specific range of gray values. Two different approaches can be adopted for gray-level slicing:
(a) Gray-level slicing without preserving the background: This displays high values for the range of interest and low values in other areas. The main drawback of this approach is that the background information is discarded.
(b) Gray-level slicing with background: In gray-level slicing with background, the objective is to display high values for the range of interest and the original gray-level values in other areas. This approach preserves the background of the image.
4.6.3 Logarithmic transformation
The logarithmic transformation is given by
g(m, n) = c log2(f(m, n) + 1)
This type of mapping spreads out the lower gray levels. For an 8-bit image, the lowest gray level is zero and the highest gray level is 255. It is desirable to map 0 to 0 and 255 to 255. The function g(m, n) = c log(f(m, n) + 1) spreads out the lower gray levels.
4.6.4 Exponential transformation
In multiplicative filtering processes, exponential transformation has
primarily been utilized in conjunction with logarithmic transformation. An
exponential transfer function has the effect o f extending high -contrast
edges while contracting low -contrast edges in an image. This is not a
desired enhancement transformation because it typically results in images
with less visible detail than the original.
4.6.5 Power law transformation
A physical device like a CRT does not produce light with a linear relationship to the input signal. The intensity generated at the display's surface is approximately the applied voltage raised to a power of about 2.5. Gamma is the name given to the numerical value of this power function's exponent. To obtain accurate intensity reproduction, this non-linearity must be compensated. The formula for the power law transformation is
g(m, n) = c · f(m, n)^γ
where f(m, n) is the input image, g(m, n) is the output image and c is a constant. Gamma (γ) can take either integer or fractional values.
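A sketch of the power-law (gamma) transformation applied to a normalised 8-bit image; the constant c and the two gamma values used are illustrative choices.

import numpy as np

def power_law(f, gamma, c=1.0):
    """g = c * f^gamma, computed on intensities normalised to [0, 1]."""
    normalised = f.astype(float) / 255.0
    g = c * normalised ** gamma
    return np.clip(g * 255.0, 0, 255).astype(np.uint8)

f = np.array([[0, 64, 128, 255]], dtype=np.uint8)
print(power_law(f, gamma=0.4))   # gamma < 1 brightens mid-tones
print(power_law(f, gamma=2.5))   # gamma > 1 darkens mid-tones (CRT-like)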
4.7 LOCAL OR NEIGHBORHOOD OPERATION
A neighbourhood operation modifies each pixel of an image based on the values of the pixels in its immediate vicinity. The term "convolving a mask with an image" is frequently used to describe linear spatial filtering.

Convolution masks or convolution kernels are other names for the filter
masks.
4.7.1 Spatial filtering
Using spatial filtering, the original picture pixel value that corresponds to
the kernel's centre is replaced with the sum of the original pixel values in
the region that the kernel corresponds to, multiplied by the kernel weights.
4.7.2 Linear filtering
Each pixel in the input image is replaced with a linear combination of
brightness of nearby pixels in linear filtering. In other words, each pixel
value in the output image represents a weighted sum of the pixels nearby
the corresponding pixel in the input image. A picture can be both
sharpened and smoothed via linear filtering. Using a convolution mask,
one can create a spatially invariant linear filter. The linear filter is spatially
variable if various filter weights are applied to various portions of the
image.
4.7.3 Mean filter/ Average filter/ Low pass filter
With the mean filter, each pixel is replaced by the average of all the values in its immediate neighbourhood. The degree of filtering is determined by the size of the neighbourhood. In a spatial averaging operation, each pixel is replaced by the weighted average of the pixels in its neighbourhood. The low-pass filter keeps the smooth regions of the image and removes the sharp transitions, which results in blurring. The following 3 by 3 spatial mask can perform the averaging operation:

1/9 × | 1 1 1 |
      | 1 1 1 |
      | 1 1 1 |

It should be noted that for a low-pass spatial mask the sum of the elements is equal to 1. The larger the mask becomes, the more blurring there will be. The size of the mask is usually odd, so that the centre pixel can be located precisely.
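A sketch of 3x3 spatial averaging with the mask above, implemented as a 2D convolution (SciPy's convolve2d with 'same' output size and zero padding at the borders is one reasonable, assumed choice):

import numpy as np
from scipy.signal import convolve2d

mean_mask = np.ones((3, 3)) / 9.0          # 3x3 box filter, weights sum to 1

f = np.array([[10, 10, 10, 10],
              [10, 90, 90, 10],
              [10, 90, 90, 10],
              [10, 10, 10, 10]], dtype=float)

g = convolve2d(f, mean_mask, mode='same', boundary='fill', fillvalue=0)
print(np.round(g, 1))    # sharp transitions are smoothed (blurred)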



munotes.in

Page 50


Digita l Image Processing
50 4.7.4 Weighted average filter
The mask of a weighted average filter is given by

The weighted average filter gets its name because, as is clear from the mask, the pixels closest to the centre are weighted more heavily than the farthest pixels. The pixel to be updated is replaced by the sum of the neighbouring pixel values multiplied by the matrix weights, divided by the sum of the matrix coefficients.
4.7.5 Bartlett filter
In the spatial domain, the Bartlett filter is a triangle-shaped filter. A Bartlett filter can be created by taking the product of two box filters in the frequency domain, or by convolving two box filters in the spatial domain.

4.7.6 Gaussian filter
A class of linear smoothing filters known as "Gaussian filters" choose their weights according to the shape of a Gaussian function. The Gaussian kernel is frequently employed for smoothing. The formula for the Gaussian filter in continuous space is
g(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))
The expression above demonstrates the separability of the Gaussian filter. The Gaussian smoothing filter is an extremely effective filter for eliminating noise drawn from a normal distribution. Gaussian smoothing is a particular type of averaging that uses a 2D Gaussian as its kernel.
4.8 MEDIAN FILTER
Similar to the mean filter, the median filter in image processing is
typically used to minimize noise in an image. To preserve relevant detail
in the image , it frequently performs better than the mean filter.
The median filter is a simple and powerful non-linear filter.
 It is employed to lessen the intensity differences between adjacent pixels.
 With this filter, the pixel value is replaced by the median value.
 In order to calculate the median, all the pixel values in the neighbourhood are first sorted into ascending order, and then the middle value is substituted for the pixel being processed.
In order to determine whether a pixel is representative of its surroundings, the median filter examines each pixel in turn and looks at its close neighbours. The pixel value is replaced by the median of the neighbourhood values rather than simply their mean. The median is obtained by sorting the pixel values of the neighbourhood (including the pixel under consideration) into numerical order and taking the middle value. If the neighbourhood under consideration has an even number of pixels, the average of the two middle pixel values is used.
Example: Compute the median value of the marked pixel using 3x3 mask
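The worked figure for this example is not reproduced here; the following sketch shows the same idea on a hypothetical 3 × 3 neighbourhood and then gives a loop-based median filter in NumPy, both as illustrative assumptions.

import numpy as np

# Hypothetical 3x3 neighbourhood: sorted, the values are 18, 20, 22, 24, 25, 27, 30, 110, 120,
# so the marked (centre) pixel would be replaced by the median value 25.
neighbourhood = np.array([[120, 18, 22], [27, 110, 20], [25, 24, 30]])
print(np.median(neighbourhood))        # 25.0

def median_filter(image, size=3):
    # Replace each pixel by the median of its size x size neighbourhood.
    pad = size // 2
    padded = np.pad(image, pad, mode='edge')
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out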


4.9 SPATIAL DOMAIN HIGH PASS FILTERING
Image sharpening's primary goal is to draw attention to the image's minute
features. The high-frequency components are enhanced through image
sharpening. The spatial filter or spatial mask which performs image
sharpening is given below:

The sum of all the weights is zero; this implies that the resulting signal
will have zero dc value.
4.9.1 High boost filtering
High-frequency emphasis filter is another name for a high-boost filter. To
help with visual interpretation, a high-boost filter is utilized to keep some
of the low-frequency components. In high-boost filtering, the input image
f(m, n) is amplified by an amplification factor A before the low-pass image
is subtracted.
The expression for the high-boost filter then becomes
High boost = A × f(m, n) − low pass
Adding and subtracting 1 with the gain factor, we get
High boost = (A − 1) × f(m, n) + [f(m, n) − low pass]
Thus,
High boost = (A − 1) × f(m, n) + high pass
4.9.2 Unsharp masking
One of the methods frequently used for edge enhancement is unsharp
masking. This method tips the balance of the image towards the sharper
material by subtracting the smoothed version of the image from the
original image. Below is a description of how to accomplish unsharp
masking:
1. Apply a blur filter to the image.
2. Subtract the result of Step 1 from the original image.
3. Multiply the result of Step 2 by a weighting fraction.
4. Add the result of Step 3 to the original image to obtain the sharpened image.
The equation for the unsharp masking operation is
f_sharp(m, n) = f(m, n) + λ [f(m, n) − f_blur(m, n)], where λ is the weighting fraction.
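The four steps above can be sketched as follows; the weighting fraction (the weight parameter below) and the clipping range are illustrative assumptions, and any smoothing filter may be used to produce the blurred image.

import numpy as np

def unsharp_mask(image, blurred, weight=1.0):
    image = image.astype(float)
    detail = image - blurred.astype(float)     # steps 1-2: blurred version subtracted from the original
    sharpened = image + weight * detail        # steps 3-4: weighted detail added back to the original
    return np.clip(sharpened, 0, 255)          # assumed 8-bit intensity range

Choosing a weight greater than 1 behaves like high-boost filtering, where the original image is amplified before the low-pass image is subtracted.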
4.10 BIT -PLANE SLICING
The gray level of each pixel in a digital image is stored as one or more
bytes in a computer. For an 8-bit image, 0 is encoded as 0 0 0 0 0 0 0 0,
and 255 is encoded as 1 1 1 1 1 1 1 1. Any number between 0 and 255 is
encoded as one byte. The bit on the far-left side is referred to as the Most
Significant Bit (MSB), because a change in that bit would significantly
change the value encoded by the byte. The bit on the far right is referred to as
the Least Significant Bit (LSB), because a change in this bit does not
change the encoded gray value much. The bit-plane representation of an
eight-bit digital image is shown in figure 8.

Figure 8: Bit plane representation of a digital image
One or more bits of the byte are used for each pixel when an image is
represented using the bit-plane slicing technique. If only the MSB is used
to represent a pixel, the original grey level is reduced to a binary
representation.
Bit plane slicing has three major purposes:
i) converting a grayscale image to a binary image;
ii) representing an image with fewer bits and compressing it to a smaller
size; and
iii) enhancing the image by sharpening certain areas.
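A small sketch of bit-plane slicing for an 8-bit image is given below; the helper names are illustrative assumptions.

import numpy as np

def bit_planes(image):
    # Return the eight bit planes of an 8-bit image; plane 7 is the MSB plane.
    image = image.astype(np.uint8)
    return [(image >> k) & 1 for k in range(8)]

def keep_upper_planes(image, n=4):
    # Represent the image with only its n most significant bit planes (crude compression).
    mask = 0xFF & ~((1 << (8 - n)) - 1)        # e.g. n = 4 gives the mask 11110000
    return image.astype(np.uint8) & mask

Using only plane 7 (the MSB) yields the binary image mentioned in purpose (i) above.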
4.11 IMAGE ENHANCEMENT IN FREQUENCY
DOMAIN
Frequency is the rate at which a particular periodic event repeats itself.
Spatial frequency in image processing is the fluctuation of image
brightness with image position in space. A signal that changes over time
can be converted into a sequence of simple periodic variations. A signal is
broken down into a collection of sine waves with varying frequency and
phase using the Fourier transform. Because a Fourier transform is entirely
reversible, if we compute a picture's Fourier transform and then
immediately inverse transform the result, we can recover the original image.
On the other hand, we can emphasize some frequency components and
attenuate others by multiplying each element of the Fourier coefficient by
an appropriately selected weighting function. After computing an inverse
transform, the corresponding changes in the spatial form can be observed.
Frequency domain filtering, often known as Fourier filtering, is the
process of selectively enhancing or suppressing frequency components.
The adjacency relationship between the pixels is described by the spatial
representation of image data. The frequency domain representation, on the
other hand, groups the visual data based on the frequency distribution. In
frequency domain filtering, the picture data is divided into different
spectral bands, each of which represents a particular range of image
information. Frequency domain filtering is the method of selective
frequency inclusion or exclusion.
It is possible to perform filtering in the frequency domain by specifying
the frequencies that should be kept and the frequencies that should be
discarded. Spatial domain filtering is accomplished by convolving the
image with a filter kernel. We know that
Convolution in spatial domain = Multiplication in the frequency domain
If filtering in spatial domain is done by convolving the input image f(m,
n) with the filter kernel h(m, n), the result is
g(m, n) = f(m, n) ∗ h(m, n)
In the frequency domain, filtering corresponds to the multiplication of the
image spectrum by the Fourier transform of the filter kernel, which is
referred to as the frequency response of the filter:
G(k, l) = F(k, l) H(k, l)
Here, F (k, l) is the spectrum of the input image and H (k, l) is the
spectrum of the filter kernel. Thus, frequency domain filtering is
accomplished by taking the Fourier transform of the image and the Fourier
transform of the kernel, multiplying the two Fourier transforms, and taking
the inverse Fourier transform of the result. The multiplication of the
Fourier transforms needs to be carried out point by point. This point -by-
point multiplication requires that the Fourier transforms of the image and
the kernel themselves have the same dimensions. As convolution kernels
are commonly much smaller than the images they are used to filter, it is
required to zero pad out the kernel to the size of the image to accomplish
this process.
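The procedure just described, transform, point-by-point multiplication with the zero-padded kernel's spectrum, and inverse transform, can be sketched with NumPy's FFT as below; note that this sketch performs circular convolution, which is adequate for illustration.

import numpy as np

def frequency_filter(image, kernel):
    M, N = image.shape
    F = np.fft.fft2(image)                  # spectrum of the input image
    H = np.fft.fft2(kernel, s=(M, N))       # kernel zero-padded to the image size -> frequency response
    return np.real(np.fft.ifft2(F * H))     # point-by-point product, then inverse transform

# Example: low-pass filtering with a 3 x 3 averaging kernel
# smoothed = frequency_filter(img, np.ones((3, 3)) / 9.0)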
4.12 HOMOMORPHIC FILTER
An image can be modeled as the product of an illumination function and
the reflectance function at every point. Based on this fact, the simple
model for an image is given by
f(n1, n2) = i(n1, n2) r(n1, n2)
This model is known as illumination -reflectance model. The illumination -
reflectance model can be used to address the problem of improving the
quality of an image that has been acquired under poor illumination
conditions.
In the above equation, f(n1, n2) represents the image, i(n1, n2) represents the
illumination component and r(n1, n2) represents the reflectance
component. For many images, the illumination is the primary contributor
to the dynamic range and varies slowly in space, while the reflectance
component r(n1, n2) represents the details of the object and varies rapidly
in space. If the illumination and the reflectance components have to be
handled separately, the logarithm of the input function f(n1, n2) is taken.

Because f(n1, n2) is the product of i(n1, n2) with r(n1, n2), the log of f(n1, n2)
separates the two components as illustrated below:
ln f(n1, n2) = ln i(n1, n2) + ln r(n1, n2)
Taking the Fourier transform of both sides gives
F(k, l) = F_I(k, l) + F_R(k, l)
where F_I(k, l) and F_R(k, l) are the Fourier transforms of the illumination
and reflectance components respectively. Then, the desired filter function
H(k, l) can be applied separately to the illumination and the reflectance
components as shown below:
S(k, l) = H(k, l) F_I(k, l) + H(k, l) F_R(k, l)
In order to visualize the image, an inverse Fourier transform followed by an
exponential function is applied. First, the inverse Fourier transform is
applied as shown below:
s(n1, n2) = F⁻¹{S(k, l)}
The desired enhanced image is obtained by taking the exponential
operation as given below:
g(n1, n2) = exp[s(n1, n2)]
Here, g(n1, n2) represents the enhanced version of the original image
f(n1, n2). The sequence of operations can be represented by a block diagram
as shown in figure 9.

Figure 9: Homomorphic filtering block diagram
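A minimal sketch of the block diagram in figure 9 is given below. The Gaussian-shaped high-frequency-emphasis filter H and the parameters gamma_low, gamma_high and d0 are illustrative assumptions, not values taken from the text.

import numpy as np

def homomorphic_filter(image, gamma_low=0.5, gamma_high=2.0, d0=30.0):
    f = np.log1p(image.astype(float))                   # logarithm separates illumination and reflectance
    F = np.fft.fftshift(np.fft.fft2(f))
    M, N = image.shape
    u = np.arange(M) - M / 2.0
    v = np.arange(N) - N / 2.0
    D2 = u[:, None] ** 2 + v[None, :] ** 2              # squared distance from the spectrum centre
    H = (gamma_high - gamma_low) * (1 - np.exp(-D2 / (2 * d0 ** 2))) + gamma_low
    s = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))  # filtered log-image back in the spatial domain
    return np.expm1(s)                                  # exponential undoes the logarithm

Because H attenuates low frequencies (illumination) and boosts high frequencies (reflectance), the result has a compressed dynamic range and enhanced detail.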
4.13 ZOOMING OPERATION
When an image has fine features, it could be difficult to see those elements
well in a monitor's normal display. Under these circumstances, zooming in
on the image enables viewing of fine features. The zooming function
allows for the expansion of an image. Holding a magnifying glass in front
of the screen is the same as zooming. The simplest method of zooming is
the "replication" of pixels. "Put the same value of the pixel in a grid of
N×N pixels for every pixel of the image," is the replication principle.
Diagrammatic representation of this is in figure 10.


Figure 10: Zooming illustration
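Pixel replication as described above amounts to repeating each pixel along both axes; a short NumPy sketch (illustrative, not a prescribed routine):

import numpy as np

def zoom_replicate(image, n=2):
    # Every pixel becomes an n x n block holding the same value.
    return np.repeat(np.repeat(image, n, axis=0), n, axis=1)

# zoom_replicate(np.array([[1, 2], [3, 4]]), 2) produces a 4 x 4 image of 2 x 2 constant blocks.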
4.14 IMAGE ARITHMETIC
Several arithmetic processes, including image addition, subtraction,
multiplication and division are taken into account in image arithmetic.
Objects can be added to and subtracted from images using addition and
subtraction operations.
4.14.1 Image addition
Image addition is used to create double exposure. If f(m, n) and g(m, n)
represent two images, then the addition of these two images to get the
resultant image is given by
h(m, n) = f(m, n) + g(m, n)
If multiple images of a given region are available for approximately the
same date and if a part of one of the images has some noise, then that part
can be compensated from other images available through image addition.
The concept of image addition is illustrated in figure 11.

Figure 11: Image addition
In figure 11 there are three images (a), (b) and (c). Here, image (c) is
obtained by addition of images in (a) and (b).
4.14.2 Image subtraction
Image subtraction is used to find the changes between two images of the
same scene. The mathematical representation of image subtraction is given
by
h(m, n) = f(m, n) − g(m, n)
To assess the degree of change in an area, two dates of co -registered
images can be used with the subtraction operation. Image subtraction can
be used to remove certain features in the image. The image subtraction is
illustrated in the figure 12.

Figure 12: Image subtraction
The images (a) and (b) are
subtracted to get the image (c).
4.14.3 Image multiplication
Image multiplication is basically used for masking. If the analyst is
interested in a part of an image then extracting that area can be done by
multiplying the area by one and the rest by zero. The operation is depicted
in the figure 13.

Figure 13: Image multiplication
From the figure, it is obvious that the multiplication operation is used to
extract specific information from the image. From the figure, it is obvious
that the coin of interest is highlighted.
4.14.4 Image division
Dividing the pixels in one image by the corresponding pixels in a second
image is commonly used in transformation. The operation is depicted in
the figure 14. From the figure it is obvious that the result of image division
is just opposite to that of image multiplication.


Figure 14: Image division
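The four arithmetic operations can be sketched as below for 8-bit images; the clipping to [0, 255] and the small constant in the division are illustrative choices to keep the results valid.

import numpy as np

def image_add(f, g):          # double exposure / noise compensation
    return np.clip(f.astype(int) + g.astype(int), 0, 255).astype(np.uint8)

def image_subtract(f, g):     # change detection between co-registered images
    return np.clip(f.astype(int) - g.astype(int), 0, 255).astype(np.uint8)

def image_multiply(f, mask):  # masking: mask is 1 inside the region of interest, 0 elsewhere
    return (f * mask).astype(np.uint8)

def image_divide(f, g):       # ratioing; the small constant avoids division by zero
    return f.astype(float) / (g.astype(float) + 1e-6)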
4.15 SUMMARY
 By modifying intensity functions, picture spectral content, or a
combination of these functions, image enhancement aims to alter how
the image is perceived.
 Images can be enhanced by removing blurriness and noise, boosting
contrast, and bringing out features, for instance.
 Either the spatial domain or the frequency domain can be used for
image enhancement. The application determines which image-
enhancement method should be used.
 The gray-level changes used in the spatial-domain picture
enhancement technique include image negative, image slicing, image
thresholding, gamma correction, etc.
 Each pixel's value is recalculated in a point operation in accordance
with a specific transformation. A common aspect of point operations is
image brightness and contrast correction.
 The frequency of occurrence of the grey levels is indicated by the
histogram of an image. The histograms of two different images can be
identical. A histogram is rotation-invariant.
4.16 LIST OF REFERENCES
1. Digital Image Processing, S Jayaraman, S Esakkirajan, T Veerakumar,
Tata McGraw-Hill Education Pvt. Ltd., 2009.
2. Digital Image Processing, 3rd Edition, Rafael C Gonzalez, Richard E
Woods, Pearson, 2008.
3. Scilab Textbook Companion for Digital Image Processing, S.
Jayaraman, S. Esakkirajan and T. Veerakumar, 2016
(https://scilab.in/textbook_companion/generate_book/125).


4.17 UNIT END EXERCISES
1) Explain the image enhancement in spatial domain.
2) Describe the enhancement process through point operations.
3) Write a note on Histogram manipulation.
4) Explain the Linear Gray Level Transformation.
5) Write a note on Nonlinear Gray Level Transformation.
6) Describe Median Filter.
7) Write a note on Spatial domain High pass filtering.
8) Explain the concept of Bit -plane slicing.
9) Describe the process of Image Enhancement in frequency domain.
10) Write a note on Homomorphic filter.
11) Describe various zooming operations.
12) State and explain various arithmetic operations performed on an
image.



5
BINARY IMAGE PROCESSING
Unit Structure
5.0 Objectives
5.1 Introduction
5.2 Mathematical morphology
5.3 Structuring elements
5.4 Morphological image processing
5.5 Logical operations
5.6 Morphological operations
5.6.1 Dilation
5.6.2 Erosion
5.6.3 Dilation and Erosion -based operations
5.7 Distance Transform
5.7.1 Euclidean distance
5.7.2 City -Block distance
5.7.3 Chessboard distance
5.7.4 Quasi-Euclidean distance transform
5.8 Summary
5.9 List of References
5.10 Unit End Exercises
5.0 OBJECTIVES
 To get familiar with the concept of binary image processing and
various morphological operations involved

 To understand several binary image operations

 To get familiar with the fundamentals of distance transform


5.1 INTRODUCTION
The study of form, organization, and appearance is known as morphology.
A group of non-linear techniques known as mathematical morphology can
be applied to an image to eliminate details that are smaller than a particular
reference shape. The initial definition of mathematical morphology's
operations as set operations demonstrated its use for handling sets of 2D
points. The morphological operations can be used to skeletonize, filter,
and extract the edges of a picture. Dilation, erosion, opening, and closing
are some of the fundamental morphological operations covered in this
chapter. They are followed by the characteristics of morphological
processes. Although binary, grayscale, and colour pictures can all be
subjected to morphological processes, our focus in this chapter is on using
various morphological procedures on binary images. An important subclass
of digital images are binary images, which have only two grey levels. The
final product of picture segmentation is typically a binary image.
5.2 MATHEMATICAL MORPHOLOGY
A highly fruitful area of image processing is mathematical morphology.
Mathematical morphology is a technique for extracting image components
that are useful for representation and description. The technique was originally
developed by Matheron and Serra at the Ecole des Mines in Paris. The
collection of structural data on the visual domain serves as the inspiration.
Set theory serves as the sole foundation for the subject matter of
mathematical morphology. Several useful operators described in
mathematical morphology can be used by employing set operations. In
mathematical morphology, sets stand in for the image's objects. For
instance, a binary image's black or white pixels collectively represent a
morphological description of the image. The sets in a binary image are
members of the 2-D integer space. A 2-D tuple
with the x and y coordinates as its elements is used to represent each element in
the image. The development of image -segmentation techniques with a
variety of applications can be based on mathematical morphology, and it
also plays a significant part in techniques for picture description.
5.3 STRUCTURING ELEMENTS
When applied to an image, mathematical morphology is a group of non-
linear procedures that can be used to eliminate details that are smaller than
a particular reference shape, known as a structuring element. With its
various shapes and sizes, the structuring element in a morphological
operation plays a significant role. The numbers of 0s and 1s in the structuring
element determine its shape and size. The resultant value is applied to the
pixel marked by the circle in figure 1, known as the centre pixel. Depending on the user's
perspective, this circle can be located anywhere in the structuring element.
Figure 1 displays some examples of structuring element references.


Figure 1: Example of structuring element reference
5.4 MORPHOLOGICAL IMAGE PROCESSING
Morphological operations are defined by shifting a structuring element so
that it is centred over each pixel of the binary image to be processed, one
pixel at a time. A logical operation is carried out on the pixels covered by the
structuring element when it is centred over a specific area of the image,
producing a binary output. As seen in figure 2, morphological image
processing resembles a convolution method. The structuring element, like
the convolution kernel, can be of any size and can include combinations of 1s
and 0s. A specific logical operation is carried out between the structuring
element and the underlying binary image at each pixel position. At that pixel
position in the output image, the binary outcome of that logical operation
is saved. The structuring element's size, content, and type of logical
operation all influence the result that is produced. The binary picture and
structuring element sets could be defined in 1, 2, or even higher dimensions
rather than being limited to sets on the 2D plane. The logical operation is
performed if the structuring element is precisely positioned on the binary
image; otherwise, the corresponding output pixel is left unchanged.

Figure 2: Morphological image processing

5.5 LOGICAL OPERATIONS
Logical operations include the AND, OR, NOT, and EXOR operations.
1] AND operation: The AND logic operation is similar to the intersection
operation of set theory and the AND operation of two images is shown in
Fig. 3. The AND logic operation computes the common area of images A
and B as a resultant image.

Figure 3: AND operation of images A and B
2] OR operation: The OR operation is similar to the union operation and
the OR operation result of images A and B is sho wn in figure 4.

Figure 4: OR operation of images A and B
3] NOT operation: The NOT operation is similar to the complement
function of set theory. In the NOT operation, the image pixel 1 is changed
to 0 and vice versa. The NOT operation of Image A is show n in figure 5.

Figure 5: NOT operation of image A
4] EXOR operation: In the XOR operation, if the pixels of Image A and
Image B are complementary to each other, then the resultant image pixel is
black; otherwise the resultant image pixel is white. The XOR operation of
images A and B is shown in figure 6.


Figure 6: EXOR operation of images A and B
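On binary images stored as Boolean arrays, the four logical operations are direct element-wise operators; the two 3 × 3 images below are hypothetical examples.

import numpy as np

A = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]], dtype=bool)
B = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1]], dtype=bool)

and_image = A & B     # intersection: pixels set in both images
or_image = A | B      # union: pixels set in either image
not_image = ~A        # complement: 1s become 0s and vice versa
xor_image = A ^ B     # set only where exactly one of the two images is 1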
5.6 MORPHOLOGICAL OPERATIONS
Dilation and erosion are the two primary morphological processes. They
can be stated using a kernel that operates on a binary input image called X,
where white pixels signify uniform regions and black pixels signify region
boundaries. Conceptually, erosion and dilation operate by translating a
structuring element B over the image points and looking at the intersection
of the translated kernel coordinates and the image coordinates.
5.6.1 Dilation
The binary picture is expanded from its original shape during the dilation
process. The structuring element determines how the binary picture is
enlarged. Compared to the image itself, this structuring element is smaller
in size; typically, the structuring element is 3 × 3. The structuring element
is reflected and shifted from left to right and from top to bottom
throughout the dilation process, similar to the convolution process. During
each shift, the procedure searches for any overlapping identical pixels
between the structuring element and that of the binary picture. The pixels
under the centre location of the structuring element will be set to 1 or
black if there is an overlap.
Let us define X as the reference image and B as the structuring element.
The dilation operation is defined by
X ⊕ B = { z | (B̂)z ∩ X ≠ ∅ }
where B̂ is the image B rotated about the origin. The equation states that
when the image X is dilated by the structuring element B, the outcome
element z would be such that there is at least one element in B that
intersects with an element in X. If this is the case, the position where the
structuring element is being centred on the image will be 'ON'. This
process is illustrated in figure 7. The black square represents 1 and the
white square represents 0.
Initially, the centre of the structuring element is aligned at position *. At
this point, there is no overlapping between the black squares of B and the
black squares of X; hence at position * the square will remain white. This
structuring element will then be shifted towards right. At position **, we
find that one of the black squares of B is overlapping or intersecting with
the black square of X. Thus, at position ** the square will be changed to
black. Similarly, the structuring element B is shifted from left to right and
from top to bottom on the image X to yield the dilated image as shown in
figure 7.
The dilation is an expansion operator that enlarges binary objects. Dilation
has many uses, but the major one is bridging gaps in an image, due to the
fact that B is expanding the features of X.

Figure 7: Dilation process
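A direct, loop-based sketch of the dilation just described is given below, assuming an odd-sized structuring element whose origin is at its centre.

import numpy as np

def dilate(X, B):
    # Output is 1 wherever the reflected, shifted B overlaps at least one 1-pixel of X.
    X = X.astype(bool)
    Bhat = np.flip(B.astype(bool))                  # B reflected about its origin
    pm, pn = Bhat.shape[0] // 2, Bhat.shape[1] // 2
    padded = np.pad(X, ((pm, pm), (pn, pn)), mode='constant')
    out = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            window = padded[i:i + Bhat.shape[0], j:j + Bhat.shape[1]]
            out[i, j] = np.any(window & Bhat)       # any overlap turns the pixel ON
    return out.astype(np.uint8)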
5.6.2 Erosion
Dilation's opposite process is erosion. Erosion shrinks an image where
dilation enlarges it. The structuring element determines how the image is
reduced. The structuring element is typically smaller than the image,
usually of size 3 × 3. When compared to greater structuring-element sizes,
this will guarantee faster computation times. The erosion process will shift the
structuring element from left to right and top to bottom, almost identically to
the dilation process. The method will check whether or not there is a complete
overlap with the structuring element at the centre location, denoted by the
centre of the structuring element. If there is no complete
overlapping, then the centre pixel indicated by the centre of the structuring
element will be set to white or 0.
Let us define X as the reference binary image and B as the structuring
element. Erosion is defined by the equation
X ⊖ B = { z | (B)z ⊆ X }
The above equation states that the outcome element z is considered only
when the structuring element is a subset of or equal to the binary image X.
This process is depicted in figure 8. Again, the white square indicates 0
and the black square indicates 1.
The erosion process starts at position *. Here, there is no complete
overlapping, and so the pixel at the position * will remain white. The
structuring element is then shifted to the right and the same condition is
observed. At position **, complete overlapping is not present; thus, the
black square marked with ** will be turned to white. The structuring
element is then shifted further until its centre reaches the position marked
by ***. Here, we see that the overlapping is complete, that is, all the black
squares in the structuring element overlap with the black squares in the
image. Hence, the centre of the structuring element corresponding to the
image will be black.
Erosion is a thinning operator that shrinks an image. By applying erosion
to an image, narrow regions can be eliminated, while wider ones are
thinned.

Figure 8: Erosion process
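Erosion can be sketched in the same loop-based style, again assuming an odd-sized structuring element centred at its origin.

import numpy as np

def erode(X, B):
    # Output is 1 only where B, centred at that pixel, fits entirely inside X.
    X = X.astype(bool)
    B = B.astype(bool)
    pm, pn = B.shape[0] // 2, B.shape[1] // 2
    padded = np.pad(X, ((pm, pm), (pn, pn)), mode='constant')
    out = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            window = padded[i:i + B.shape[0], j:j + B.shape[1]]
            out[i, j] = np.all(window[B])           # complete overlap is required
    return out.astype(np.uint8)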
5.6.3 Dilation and erosion -based operations
Erosion and dilation can be combined to solve specific filtering tasks. The
most widely used combinations are opening and closing.
1] Opening: Opening is based on the morphological operations erosion
and dilation. Opening breaks up thin strips, smoothes the interior of object
contours, and removes thin image areas. It involves applying erosion and
dilation procedures to the image in that order.
The opening procedure is used to clean up CCD flaws and image noise.
By rounding corners from inside the object where the kernel fits, the
opening filters details and simplifies images. The mathematical
representation of the opening procedure is
X ∘ B = (X ⊖ B) ⊕ B
where X is an input image and B is a structuring element.
2] Closing: The closing operation is the opposite of the opening process. It
is a process of dilation followed by erosion. The closing process fills in
tiny imperfections and single-pixel gaps. It has the same
result as an opening operation in that it retains object forms and sizes and
smooths out features.
The mathematical representation of the closing procedure is
X • B = (X ⊕ B) ⊖ B
where X is an input image and B is the structuring element. Closing
protects coarse structures, closes small gaps and rounds off concave
corners.
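Opening and closing can be composed from erosion and dilation; the sketch below instead uses SciPy's ready-made binary-morphology routines on a hypothetical test image, which is an assumption of this example rather than part of the text.

import numpy as np
from scipy import ndimage

X = np.zeros((60, 60), dtype=bool)      # hypothetical binary image
X[20:40, 20:40] = True                  # a solid square object
X[5, 5] = True                          # isolated noise pixel: opening removes it

B = np.ones((3, 3), dtype=bool)         # 3 x 3 structuring element
opened = ndimage.binary_opening(X, structure=B)   # erosion followed by dilation
closed = ndimage.binary_closing(X, structure=B)   # dilation followed by erosion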
5.7 DISTANCE TRANSFORM
An operator called the distance transform is often exclusively used with
binary pictures. Each feature pixel in a binary image is given a value by
the distance transform that is equal to its distance from the closest non -
feature pixels. It can be used to gather details on the form and placement
of the foreground pixels in relation to one another. It has been used in
numerous scientific areas, including surface reconstruction, pattern
recognition, and medical imaging. When comparing binary images, the
distance transforms are crucial, especially for images produced by local
feature -detection methods like edge or corner detection.
Depending on how the distance is calculated, there are various types of
distance transforms. The Gaussian distance transform, city block,
chessboard, and Euclidean distance can all be used to calculate the
distance between pixels. The following is a description of the distance
metrics utilized.
5.7.1 Euclidean distance
The value at a pixel in the Euclidean distance transform is directly
proportional to the distance in Euclidean terms between that pixel and the
object pixel that is closest to it. The procedure is noise-sensitive because it
determines the value at the pixel of interest using the value at a single
object pixel.
The straight-line distance between two pixels p1 = (x1, y1) and p2 = (x2, y2) is given by
D_E(p1, p2) = √((x1 − x2)² + (y1 − y2)²)
The input image and its Euclidean distance transform are shown in
figure 9.

Figure 9: Euclidean distance transform
5.7.2 City -Block distance
The city-block distance of two pixels is given by
D_4(p1, p2) = |x1 − x2| + |y1 − y2|
This metric measures the path between the pixels based on a four-
connected neighbourhood, as shown in figure 10. For a pixel 'p' with the
coordinates (x, y), the set of its four neighbours is given by

N4(p) = { (x + 1, y), (x − 1, y), (x, y + 1), (x, y − 1) }
The above set is called the four neighbours of p, as shown in Fig. 10. The
input image and its city-block distance transform are shown in Fig. 11.

Figure 10: Four -connected neighbours

Figure 11: City block distance transform
5.7.3 Chessboard distance
The chessboard distance of two pixels is given by
D_8(p1, p2) = max(|x1 − x2|, |y1 − y2|)
The chessboard distance metric measures the path between the pixels
based on an eight-connected neighbourhood, as shown in figure 12. The
set of eight neighbour pixels is defined as
N8(p) = N4(p) ∪ { (x + 1, y + 1), (x + 1, y − 1), (x − 1, y + 1), (x − 1, y − 1) }
The above set is called the eight connected neighbours, as shown in
Fig. 12. The chessboard distance transform of the image and the original image are
shown in Fig. 13.

Figure 12: Eight connected neighbours


Figure 13: Chessboard distance transform
5.7.4 Quasi-Euclidean distance transform
The quasi-Euclidean metric measures the total Euclidean distance along a
set of horizontal, vertical and diagonal line segments. The quasi-Euclidean
distance of two pixels is given by
D_QE(p1, p2) = |x1 − x2| + (√2 − 1) |y1 − y2|   if |x1 − x2| > |y1 − y2|
D_QE(p1, p2) = (√2 − 1) |x1 − x2| + |y1 − y2|   otherwise
The quasi-Euclidean distance transform of the image is shown in Figure 14.

Figure 14: Quasi -Euclidean distance transform
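The three main metrics can be compared with SciPy's distance-transform helpers. SciPy measures, for each non-zero pixel of its input, the distance to the nearest zero pixel, so the binary image is inverted here to obtain every pixel's distance to the nearest object pixel, as in the figures above; the use of SciPy is an assumption of this sketch.

import numpy as np
from scipy import ndimage

img = np.zeros((7, 7), dtype=bool)
img[3, 3] = True                                   # a single object (feature) pixel

euclidean = ndimage.distance_transform_edt(~img)
city_block = ndimage.distance_transform_cdt(~img, metric='taxicab')
chessboard = ndimage.distance_transform_cdt(~img, metric='chessboard')

For the corner pixel at (0, 0) these give approximately 4.24, 6 and 3 respectively, matching the three formulas.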
5.8 SUMMARY
 The principle behind morphological image processing is to probe an
image with a tiny form or pattern called a structuring element. To
carry out a certain activity, structuring elements of various shapes may
be used.
 Dilation and erosion are the two primary morphological processes.
 By successively moving the structuring element's origin to all feasible
pixel locations throughout the dilation procedure, the image is probed
with the structuring element; the result is 1 if the structuring element
and the picture have a non-zero intersection and 0 otherwise.
 The erosion procedure involves probing the image with the structuring
element by successively moving the structuring element's origin to all

potential pixel locations. If the structuring element is entirely
contained in the image, the result is 1, otherwise it is 0.
 Opening and closing morphological processes are morphological
filters that eliminate undesirable elements from an image.
5.9 LIST OF REFERENCES
1] Digital Image Processing, S Jayaraman, S Esakkirajan, T Veerakumar,
Tata McGraw-Hill Education Pvt. Ltd., 2009.
2] Digital Image Processing, 3rd Edition, Rafael C Gonzalez, Richard E
Woods, Pearson, 2008.
3] Scilab Textbook Companion for Digital Image Processing, S.
Jayaraman, S. Esakkirajan and T. Veerakumar, 2016
(https://scilab.in/textbook_companion/generate_book/125).
5.10 UNIT END EXERCISES
1] Explain the mathematical morphology.
2] What do you mean by structuring elements?
3] Write a note on Morphological image processing.
4] Explain the logical operations.
5] State and describe the Morphological operations.
6] What is Dilation?
7] What is Erosion?
8] Describe the Dilation and Erosion -based operations.
9] Write a note on Distance Transform.
10] Explain Euclidean distance.
11] Describe City -Block distance.
12] What is Chessboard distance?
13] Explain Quasi-Euclidean distance transform.

6
COLOUR IMAGE PROCESSING
Unit Structure
6.0 Objectives
6.1 Introduction
6.2 Formation of colour
6.3 Human perception of colour
6.4 Colour Model
6.4.1 RGB colour model
6.4.2 CMY colour model
6.4.3 HSI colour model
6.4.4 YIQ colour model
6.4.5 YCbCr colour coordinate
6.5 Colour image quantization
6.5.1 Classification of colour quantisation technique
6.6 Histogram of a colour image
6.6.1 Histogram equalization of a colour image
6.7 Summary
6.8 List of References
6.9 Unit End Exercises
6.0 OBJECTIVES
 To understand the human perception and concepts related to colour
image processing

 To learn different colour models

 To get familiar with colour histogram equalization
6.1 INTRODUCTION
The perception of colour is the result of incoming visible light on the
retina. The most visually arresting aspect of every image is its colour,
which also has a big impact on how scenically beautiful it is.

Understanding the nature of light is required in order to comprehend
colour. Light has two distinct natures. It exhibits wave and particle
characteristics. After striking the retina, photons cause electric impulses
that, once they reach the brain, are interpreted into colour. Light waves
with various wavelengths are seen as having various colours. The human
eye cannot, however, see every wavelength. The visible spectrum is made
up of wavelengths between 380 and 780 nanometers.
6.2 FORMATION OF COLOUR
A variety of physicochemical processes result in the colours that people see
every day. Some things act as light sources, while others do little more than
reflect incident light. Fundamentally, there are three different ways that
colours are formed: additive processing, subtractive processing and
pigmentation.
1] Additive colour formation
The spectral distributions for two or more light rays are added in additive
colour creation. The total amount of photons in the same range that are
present in all of the component colours makes up the final colour. TV
monitors use the additive colour-formation principle.
2] Subtractive colo ur formation
When light is transmitted through a light filter, subtractive colour creation
happens. A light filter partially transmits and partially absorbs the light
that it receives. For instance, a green filter allows radiation from the green
portion of the spectrum to pass while blocking radiation from other
wavelengths. The resultant colour is made up of the wavelengths that can
pass through each filter when used in sequence. The projection of colour
slides onto a screen results in subtractive colour generation.
3] Colour formation by pigments
Colored particles suspended in a liquid make up a pigment. The light that
strikes these particles can be absorbed or reflected by them. A light beam
is scattered by the particles as it encounters a surface covered in pigment,
with consecutive and simultaneous actions of reflection, transmission, and
absorption. These occurrences define the type (colour) of light that the
surface reflects. One can see colours in a painting thanks to colour
pigmentation.
6.3 HUMAN PERCEPTION OF COLOUR
The 400 to 700 nm range of the light spectrum is detectable by the human
visual system. The two most crucial senses through which humans
perceive their environment are sight and hearing. According to estimates,
humans get 75% of their inf ormation visually. Perception, sight, or
comprehension are terms used to describe the use of visual information.
The idea for creating analogous paradigms for computers came from an
understanding of human perceptual processing capabilities. A brief
explanation of the anatomy of the human visual system is the best way to
introduce the intricate aspects of visual perception. The Human Visual
System (HVS) is a system for processing information that receives
spatially sampled images from the cones and rods in th e eye and infers the
characteristics of the objects it sees. Cones are the photoreceptors in
charge of colour vision. The retina contains three different kinds of cones.
They are
i) L-receptors which are sensitive to light of long wavelengths
ii) M-receptors whi ch are sensitive to light of middle wavelengths
iii) S-receptors which are sensitive to light of short wavelengths
The RGB sensors are often identified by the Greek letters rho (red),
gamma (green), and beta (blue). Fig. 1 displays the sensitivity curves for
rho, gamma, and beta.

Figure 1: Sensitivity of rho, gamma and beta curves
The strength of the colours one receives for each of the wavelengths in the
visual spectrum is determined by the sensitivity curves of the rho, gamma,
and beta sensors in the human eye. The visual system can create the
perception of colour by selective sensing of various light wavelengths.The
sensitivity of light or the lighting of things affects colour perception as
well. Cones are not activated at very low illumination levels. Yet, even
with very little illumination, rods are triggered.
6.4 COLOUR MODEL
By specifying a 3D coordinate system and a subspace that encompasses all
constructible colours within a specific model, colour models offer a
uniform approach to identify a specific colour. A model can be used to
specify any colour. Each colour model is tailored for a particular piece of
software or image -processing tool.
6.4.1 RGB colour model
In an RGB colour model, the three primary colours red, green and blue
form the axes of the cube which is illustrated in Fig. 2. Each point in the
cube represents a specific colour. This model is good for setting the
electron gun for a CRT.


Figure 2: RGB colour cube
RGB is an additive colour model. From Fig. 3, it is obvious that
Magenta = Red + Blue
Yellow = Red + Green
Cyan = Blue + Green

Figure 3: RGB colour model
Limitations of RGB model:
(i) The RGB colour coordinates depend on the device. This suggests
that the RGB model will generally fail to accurately replicate the
same colour on different displays.
(ii) The RGB model is not consistently perceived. The implication is that
in all regions of the colour space, a unit of coordinate distance does
not equate to the same perceived colour difference.
(iii) Because this model is based on device signals rather than display
brightness values, it is challenging to relate it to the look of colours.
munotes.in

Page 75


Colour Image Processing
75 6.4.2 CMY colour model
Printers produce an image by reflective light, which is basically a
subtractive process. Printers commonly employ the CMY model. The
CMY model cube is illustrated in Fig. 4. The conversion between RGB
and CMY is done as follows
C = 1 -R
M = 1 -G
Y = 1 -B

Figure 4: CMY colour cube
CMY is a subtractive colour model. From Fig. 5, it is obvious that
Magenta = White – Green
Cyan = White – Red
Yellow = White –Blue

Figure 5: CMY colour model
munotes.in

Page 76


Digital Image Processing
76 6.4.3 HIS colour model
Hue, Saturation, and Intensity is referred to as HSI. The observer's
perceived main colour is represented by hue. It is a characteristic
connected to the predominant wavelengt h. The term "saturation" describes
the degree of purity or the quantity of white light combined with a hue.
Brightness is reflected in intensity. As hue and saturation correspond to
how colours are seen by humans, HSI decouples colour information from
intensity information, making this representation particularly helpful for
creating image -processing algorithms. HIS colour space is well -liked
because it is based on how people see colour.
The conversion from RGB space to HSI space is given below:

The HIS colour model can be considered as a cylinder, where the
coordinates r, θ, and z are saturation, hue and intensity respectively. The
cylindrical representation of the HIS colour model is illustrated in Fig 6.

Figure 6: Cylindrical representation of HIS co lour model
6.4.4 YIQ colour model
The YIQ colour model is defined by the National Television System
Committee (NTSC). In this model, Y represents the luminance; and I and
Q describe the chrominance. Conversion between RGB and YIQ is given
below munotes.in

Page 77


Colour Image Processing
77

6.4.5 YCb Cr colour coordinate
The YCbCr colour coordinate was developed as part of the ITU -R BT.601
during the establishment of a worldwide digital video component standard.
The YCbCr signals are scaled and offset versions of the YIQ format. Y is
defined to have a nominal range of 16 to 235; Cb and Cr are defined to
have a range of 16 to 240, with zero signal corresponding to level 128.
There are several YCbCr formats such as 4:4:4, 4:2:2 and 4:1:1. The
sampling format implies that the sampling rates of Cb and Cr ar e one -half
of that of Y.
6.5 COLOUR IMAGE QUANTISATION
The process of dividing the original colour space into larger cells in order
to reduce the size of a colour space is known as colour quantization. The
method of color -image quantization involves loweri ng the number of
colours in a digital colour image. A lossy image compression operation is
essentially what color -image quantization is. There are two main processes
in the process of quantizing colour images:
(i) Choose a small number of colours from the available combinations of
red, green, and blue to create a colour map.
(ii) Assigning a colour from the colour map to each colour pixel in the
colour image.
The basic goal of colour image quantization is to transfer the original
colour image's set of colou rs to the quantized image's significantly smaller
range of colours. A colour palette is a more condensed collection of
exemplary hues. The discrepancy between the original and the quantized
images should be as little as possible after the mapping.
6.5.1 Cl assification of colour quantisation technique
Colour quantisation, in general, can be classified into (i) uniform colour
quantisation, and (ii) non -uniform colour quantisation
1] Uniform colour quantisation
The simplest colour quantization technique is uni form quantization. Each
colour cell in uniform colour quantization represents a cube whose edges
are parallel to the colour axes and each cell is the same size. Fig. 7
provides a visual picture of uniform colour quantization.

munotes.in

Page 78


Digital Image Processing
78 The formula for the uniform colour quantization scheme is


Figure 7: Representation of uniform quantisation in two dimensions
2] Non -uniform colour quantisation
By utilizing non -uniform quantization, different thresholds are used on
each colour axis individually to split the colo ur space. Based on an
investigation of the initial distribution in the colour space, the thresholds
are determined. Fig. 8 provides a visual illustration of non -uniform
quantization.
Non-uniform quantization over various thresholds is mathematically
repres ented as follows
munotes.in

Page 79


Colour Image Processing
79

Figure 8:Representation of non -uniform quantisation in two dimensions
6.6 HISTOGRAM OF A COLOUR IMAGE
The histogram of the image displays the grey level's occurrence frequency.
It provides the frequency with which a specific grey leve l appears in the
image. The histogram is anticipated to provide the frequency with which a
specific colour appears in a colour image.
6.6.1 Histogram equalization of a colour image
A colour image's histogram equalisation is done by first converting the
RGB image to the YIQ (NTSC) format. The Y component alone is then
subjected to histogram equalisation. The I and Q components are unaltered. The
equalised Y and the unmodified I and Q components are then converted
back to the RGB format.
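A sketch of this procedure is given below. The RGB-to-YIQ matrix uses the standard published NTSC coefficients (an assumption, since the conversion matrix itself is not reproduced in this text), and the image is taken to be normalised to [0, 1].

import numpy as np

RGB2YIQ = np.array([[0.299, 0.587, 0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523, 0.312]])
YIQ2RGB = np.linalg.inv(RGB2YIQ)

def equalise_colour(rgb):
    # Equalise only the Y (luminance) channel; I and Q are left untouched.
    yiq = rgb @ RGB2YIQ.T
    y = yiq[..., 0]
    hist, bins = np.histogram(y, bins=256, range=(0.0, 1.0))
    cdf = hist.cumsum() / y.size                 # mapping from the cumulative histogram
    yiq[..., 0] = np.interp(y, bins[:-1], cdf)
    return np.clip(yiq @ YIQ2RGB.T, 0.0, 1.0)    # back to RGB for display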
6.7 SUMMARY
 Between 4 00 nm (blue) and 700 nm is the range of visible light (red).
 Three different types of color -sensitive pigments, each responding to a
different spectrum of wavelengths, are present in cones.
 The tristimulus model of colour vision, which is commonly used to
describe how the responses of these many sensors combine to give us
colour perception.
 Red, green, and blue are replaced by the three primaries X, Y, and Z in
CIE.
 The Y component was selected to match luminous efficacy;Y=
luminance X, Z = chromaticity
 The colour space spanned by a set of primary colours is called a colour
gamut.
 The different colour models are RGB, CMY, YCbCr, YIQ, etc.
munotes.in

Page 80


Digital Image Processing
80  The RGB model is used most often for additive models. In an RGB
model, adding a colour and its complement create white .
 The most common subtractive model is the CMY model. In a CMY
model, adding a subtractive colour and its complement create black.
 Each pixel of the colour image will have three values, one each for the
red, green and blue components.
 The histogram of a colour image can be considered to consist of three
separate one -dimensional histograms, one for each tricolour
component.
6.8 LIST OF REFERENCES
1) Digital Image Processing, S Jayaraman, S Esakkirajan, T Veerakumar,Tata McGraw -Hill Education Pvt. Ltd., 2 009.
2) Digital Image Processing 3rd Edition, Rafael C Gonzalez, Richard E
Woods, Pearson, 2008.
3) Scilab Textbook Companion for Digital Image Processing, S.
Jayaraman, S. Esakkirajan and T. Veerakumar, 2016
(https://scilab.in/textbook_companion/generate _book/125).
6.9 UNIT END EXERCISES
1) Explain the formation of colour.
2) Describe the human perception of colour.
3) Discuss different colour models.
4) Write a note on colour image quantization.
5) Explain the classification of colour quantisation techn ique.
6) Write a note on Histogram of a colour image.
7) Describe the Histogram equalization of a colour image.

munotes.in

Page 81

7
IMAGE SEGMENTATION
Unit Structure
7.0 Objectives
7.1 Introduction
7.2 Image segmentation techniques
7.3 Region approach
7.3.1 Region growing
7.3.2 Region splitting
7.3.3 Region splitting and merging
7.4 Clustering techniques
7.4.1 Hierarchical clusteri ng
7.4.2 Partitional clustering
7.4.3 k -means clustering
7.4.4 Fuzzy clustering
7.5 Thresholding
7.5.1 Global Thresholding
7.5.2 Adaptive Thresholding
7.5.3 Histogram -based threshold selection
7.6 Edge -based segmentation
7.7 Edge detection
7.7.1 Gradient o perator
7.7.2 Edge detection using first order derivates
7.7.3 Roberts kernel
7.7.4 Prewitt kernel
7.7.5 Sobel kernel
7.7.6 Frei -Chen edge detector
7.7.7 Second derivative method of detecting edges in an image munotes.in

Page 82


Digital Image Processing
82 7.7.8 Laplacian of gaussian (LOG)
7.7.9 Diffe rence of gaussian filters (DoG)
7.7.10 Canny edge detectors
7.8 Edge Linking
7.9 Hough Transform
7.10 Summary
7.11 List of References
7.12 Unit End Exercises
7.0 OBJECTIVES
 To separate or combine different image elements that can be used to
develop objects of interest that can be the subject of investigation and
interpretation.

 To get familiar with different segmentation techniques along with their
applicability
7.1 INTRODUCTION
Image segmentation is the division of a picture into groups of pixels that
are homogeneous in terms of a particular criterion. Adjacent groupings
must be diversified and different groups must not intersect. Instead,then
being pixel -oriented, segmentation algorithms are area -oriented. The
division of the image into linked regions i s the outcome of segmentation.
Therefore, segmentation is the process of splitting an image into useful
sections.
7.2 IMAGE SEGMENTATION TECHNIQUES
Image segmentation is classified into two categories:Local segmentation
and Global segmentation
i] Localsegm entation : Local segmentation divides sub -images, which are
tiny windows on a larger image, into separate segments. Local
segmentation has a lot fewer pixels accessible than global segmentation.
Local segmentation needs to use pixel data sparingly.
ii] Glob al segmentation :Segmenting an entire image is what global
segmentation is all about. The majority of global segmentation's work
involves segments with a sizable number of pixels. This strengthens the
reliability of estimated parameter values for global seg ments.

munotes.in

Page 83


Image Segmentation
83 7.3 REGION APPROACH
A region in an image is a collection of related pixels that are joined
together. The region technique assigns each pixel to a specific object or
region. The goal of the boundary technique is to identify and pinpoint the
actual re gional borders. The edges are first found in the edge approach,
and then they are connected to create the necessary boundaries.
7.3.1 Region growing
An approach to picture segmentation known as "region growing" involves
looking at nearby pixels and adding them to a region class if no edges are
found. Each border pixel in the area goes through this process one more
time. If neighbouring regions are discovered, a region -merging technique
that preserves strong edges while dissolving weak edges is employed.
Starting with a seed is necessary for region growing. The seed could be a
single pixel, but ideally it would be a region. By including as many nearby
pixels as feasible that match the homogeneity requirement, a new segment
is formed from the seed. After then, the resulting section is dropped from
the procedure. The remaining pixels are used to choose a new seed. This
keeps happening until every pixel has been assigned to a segment. The
settings for each segment must be updated when pixels are gathered. The
initial seed selected and the sequence in which neighbouring pixels are
evaluated may have a significant impact on the segmentation that results.
The choice of homogeneity criteria in image growth depends not only on
the issue at hand but also on the kind of image that needs to be divided
into segments. The criteria used to determine whether a pixel should be
included in the area or not, the connectivity type used to identify
neighbours, and the approach taken to visit neighbouring pixels all affect
how region s evolve.
Comparing region -growing to traditional segmentation methods has
various benefits. The boundaries of the growing zones are incredibly
intertwined and narrow. In terms of noise, the algorithm is likewise highly
steady.
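A minimal sketch of region growing from a single seed pixel is given below; the 4-connected neighbourhood, the running-mean homogeneity criterion and the threshold value are illustrative choices, not the only possible ones.

import numpy as np
from collections import deque

def region_grow(image, seed, threshold=10):
    visited = np.zeros(image.shape, dtype=bool)
    region = np.zeros(image.shape, dtype=bool)
    queue = deque([seed])
    visited[seed] = True
    total, count = float(image[seed]), 1         # running sum and size of the region
    while queue:
        r, c = queue.popleft()
        region[r, c] = True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # 4-connected neighbours
            nr, nc = r + dr, c + dc
            if 0 <= nr < image.shape[0] and 0 <= nc < image.shape[1] and not visited[nr, nc]:
                if abs(float(image[nr, nc]) - total / count) <= threshold:
                    visited[nr, nc] = True
                    queue.append((nr, nc))
                    total += float(image[nr, nc])
                    count += 1
    return region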
7.3.2 Region splitting
Top-down strategies involve region division. Starting with a complete
image, it is divided up so that the divided portions are more homogeneous
than the entire. Since splitting substantially restricts the forms of
segments, splitting alone is insufficient for a cceptable segmentation.
Therefore, the split -and-merge algorithm, splitting followed by a merging
phaseis always preferred.
7.3.3 Region splitting and merging
An image -segmentation method that takes spatial information into account
is region splitting and m erging. The following is the region -splitting -and-
merging method: munotes.in

Page 84


Digital Image Processing
84  Splitting
1]Let R stand for the full image in split 1.
2] Split or partition the image into successively smaller quadrant portions
using the predicate P.
As seen in figure 1, a quadtree structure provides a practical way to
visualize the splitting method. The root of a quadtree represents the
complete image, and each node represents a subdivision.

Figure 1: (a) Splitting of an image (b) Representation by a quadtree
It is probable that c ontiguous zones with the same characteristics will be
found in the final partition. Applying merging and only merging nearby
regions whose combined pixels fulfil the predicate P will address this
flaw.
 Merging
Any adjacent regions that are similar enough s hould be merged. The split -
and-merge algorithm's process is described below:
1. Commence with the entire picture.
2. Divide the variation into quadrants if it is too large.
3. Combine any nearby locations that are sufficiently comparable.
4. Continue itera tively repeating procedures (2) and (3) until no more
splitting or merging takes place.
The image that has been separated and combined is shown in Figure 1(a).
Figure 1(b) depicts the image's quadtree representation.
7.4 CLUSTERING TECHNIQUES
By grouping t he patterns into clusters or groups so that they are more
similar to one another than to patterns belonging to different clusters, the
clustering technique aims to access the links between patterns of the data
set. In other words, clustering is the divisio n of things into groups based
on shared characteristics. The clustering technique makes an effort to
extract a feature vector from the image's localised regions. Each pixel is munotes.in

Page 85


Image Segmentation
85 given the class of the closest cluster mean as part of the conventional
clusteri ng process. The two types of clustering techniques are partitional
and hierarchical. There are numerous types of clustering methods within
each category.
7.4.1 Hierarchical clustering
A proximity matrix is used in hierarchical clustering algorithms to show
how similar each pair of data points is to the others. The outcome is a tree
of clusters that displays the nested group of patterns and the similarity
thresholds at which groups shift. While the root node is set aside for the
full dataset and the leaf nod es are for individual data samples, the resulting
clusters are always formed as the internal nodes of the tree. The
procedures used to combine two small clusters or divide a large cluster
vary depending on the clustering techniques.
The agglomerative and d ivisive algorithm categories are the two main
types employed in the hierarchical clustering framework. Agglomerative
algorithms start with N singlepoint clusters and attempt to combine them
into larger and larger clusters. There are three categories into w hich the
algorithm can be subdivided: (1) single -link algorithm, (2) complete -link
algorithm, and (3) minimum -variance algorithm. According to the shortest
distance between data samples from two clusters, the single -link algorithm
combines two clusters. In light of this, the algorithm permits a propensity
to create clusters with elongated shapes. The complete -link approach,
however, consistently yields compact clusters despite incorporating the
maximum distance between data samples in clusters.The definitio n of the
dissimilarity measurement between two groups affects the quality of
hierarchical clustering. By minimising the cost function and creating a
new cluster with the least amount of cost function rise, the minimum -
variance algorithm joins two clusters. This approach, known as the
pairwise -nearest -neighbor algorithm, has generated a lot of attention in
vector quantization.
Divide and conquer clustering starts with the entire dataset in one cluster
and splits it repeatedly until single -point clusters are found on leaf nodes.
In order to combat agglomerative clustering, it employs a reverse
clustering approach. The divisive algorithm performs an exhaustive search
for all pairings of clusters for data samples on each node.
Hierarchical algorithms include COB WEB, CURE, and CHAMELEON.
7.4.2 Partitional clustering
The iterative optimization process used in partition -based clustering tries
to minimize an objective function f that gauges the effectiveness of
clustering. The division of each pattern to the nearest cluster and the
computing of the cluster centroids are the two learning processes that
make up partition -based clustering.Partition -based clustering frequently
begin with an initial solution that contains a predetermined number of
clusters. The optimality criterion is typically used to compute the cluster
centroids so that the objective function is minimized. The two types of munotes.in

Page 86


Digital Image Processing
86 partitional algorithms are density -based partitioning and partitioning
relocation techniques.The first category of algorithms is furt her divided
into probabilistic clustering, K -medoids, and K -means. Density -based
partitioning, the second category of partitional algorithms, includes
DBSCAN, OPTICS DBCLAS, and DENCLUE.Compared to hierarchical
clustering techniques, partitional clustering techniques, such K -means
clustering and ISODATA, have the advantage of maximizing some
criterion function.
7.4.3 k -means clustering
The simplest method for unsupervised classification is the K -means
approach. There is no need for training data for the clu stering methods.
Iterative processes such as K -means clustering are used. The K -means
clustering technique segments the image by categorizing each pixel in the
class with the nearest mean, iteratively computing a mean intensity for
each class, and clusteri ng the data. The K -means clustering algorithm's
steps are listed below:
1] Choose K initial clusters z 1(l), z 2(l), ………. z k(l).
2]Distribute the samples x among the K clusters at the kth iterative step
using the connection

For i = 1, 2, …, K, i ≠ j, where C j(k) denotes the set of samples whose
cluster centre is z j(k).
3] Determine the new cluster centresz j(k + 1), j = 1, 2,..., K, with the goal
of minimizing the sum of the squared distances between all locations in C j
(k) an d the new cluster. The sample mean of C j (k) is the measurement
that minimizes this. Consequently, the new cluster centre is provided by

where N j is the number of samples in C j (k)
4] If z j (k + 1), j = 1, 2, ……, K, the algorithm has converged and the
procedure is terminated. Otherwise go to Step 2
The K -means algorithm's disadvantage is that, once K is selected, the
number of clusters is fixed and it always returns K cluster centres.
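The four steps above can be sketched for grey-level intensities as follows; the evenly spaced initial centres and the fixed iteration cap are illustrative choices.

import numpy as np

def kmeans_segment(image, k=3, iters=20):
    pixels = image.reshape(-1).astype(float)
    centres = np.linspace(pixels.min(), pixels.max(), k)      # step 1: initial cluster centres
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels[:, None] - centres[None, :]), axis=1)   # step 2: nearest centre
        new_centres = np.array([pixels[labels == j].mean() if np.any(labels == j)
                                else centres[j] for j in range(k)])               # step 3: recompute centres
        if np.allclose(new_centres, centres):                 # step 4: convergence reached
            break
        centres = new_centres
    return labels.reshape(image.shape), centres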
7.4.4 Fuzzy clustering
Depending on whether a pattern data belongs only to one cluster or to
numerous clusters with different degrees, clustering methods can be
categorized as either hard or fuzzy. In contrast to fuzzy clustering, which
assigns a value between zero and one to each pattern, hard clustering munotes.in

Page 87


Image Segmentation
87 assigns a membership value of zero or one to each pattern's data. Since
they can more accurately depict the relationship between the input pattern
data and clusters, fuzzy clustering algorithms can generally be regarded as
being superior to those of their hard counterparts. F uzzy clustering takes
advantage of the fact that each pattern has some graded membership in
each cluster in order to reduce a heuristic global cost function. The
clustering criterion permits numerous cluster assignments for each pattern.
The fuzzy K -means algorithm employs a gradient descent method to
estimate the class membership function and iteratively update the cluster
centroid.
Dissimilarity function
The dissimilarity function is a crucial parameter for determining how
different data patterns are from one another. The diversity and size of the
characteristics contained in patterns necessitates careful selection of the
dissimilarity functions. Different clustering outcomes are produced by
using various clustering dissimilarities. The Minkowski metricis a popular
way to express how different two data vectors are which is given by,

p =1 means the L1 distance or Manhattan distance
p =2 means the Euclidean distance or L2 distance
Below are the prerequisites that a distance function must fulfil in order to
be a metric:
d(x, y) ≥ 0; d(x, y) = 0 if and only if x = y; d(x, y) = d(y, x)
The triangle inequality, a stricter requirement for the definition of a
metric, is defined as follows:
d(x, z) ≤ d(x, y) + d(y, z)
The case with p = ∞ in the Minkowski metric reduces to the maximum
metric:
d_∞(x, y) = max_i |x_i − y_i|
These measures are translation-invariant but not scale-invariant. The
Canberra metric is one of the scale-invariant dissimilarities, given by
d(x, y) = Σ_i |x_i − y_i| / (|x_i| + |y_i|)
The cosine similarity is given by
s(x, y) = (x · y) / (||x|| ||y||)
Here the dot denotes the inner product of the two vectors and || · || is the
L2 norm in Euclidean space. The Mahalanobis distance is another way of
generalizing the Euclidean distance to the scale-invariant situation and is
expressed as
d(x, y) = sqrt( (x − y)^T Σ^(−1) (x − y) )
where Σ is the covariance matrix with respect to the entire data set.

When using a model -based clustering technique like the EM clustering
algorithm, the Mahalanobis distance is used as a measure of dissimilarity.
When used within the context of K-means clustering, the application of
the Mahalanobis distance is always hampered by the singularity problem
and a high computational cost.
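The dissimilarity measures above can be sketched directly in NumPy; the
function names and the tiny test vectors below are illustrative
assumptions, not part of any standard library.

import numpy as np

def minkowski(x, y, p=2):
    # p = 1 gives the Manhattan (L1) distance, p = 2 the Euclidean (L2) distance.
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def maximum_metric(x, y):
    # Limiting case p -> infinity of the Minkowski metric.
    return np.max(np.abs(x - y))

def canberra(x, y):
    # Scale-invariant Canberra metric.
    return np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y)))

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def mahalanobis(x, y, cov):
    # cov is the covariance matrix with respect to the entire data set.
    d = x - y
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.5, 4.0])
cov = np.eye(3)   # identity covariance reduces Mahalanobis to Euclidean
print(minkowski(x, y, 1), minkowski(x, y, 2), maximum_metric(x, y))
print(canberra(x, y), cosine_similarity(x, y), mahalanobis(x, y, cov))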
7.5 THRESHOLDING
Segments produced by thresholding techniques have pixels with
comparable intensity. Thresholding is a valuable method for images in
which solid objects rest on a contrasting background. There are many
grey-level-based segmentation techniques that use either global or local
image data. The thresholding approach requires an object of uniform
intensity whose intensity level differs from that of the background. Such a
picture can be divided into two regions using straightforward
thresholding.
7.5.1 Global Thresholding
The simplest and most used segmentation technique is global thresholding.
In global thresholding, the following rule is imposed with a threshold
value of θ:
g(x, y) = 1 if f(x, y) ≥ θ, and g(x, y) = 0 otherwise
The equation mentioned above provides a detailed explanation of a
binarization technique but makes no mention of how to choose the
threshold parameter's value. It is necessary to choose the value of θ in the
best possible way. When pixels from distinct segments employ intensities
that overlap, global thresholding will suffer. If the cause is noise, a method
like the minimum -error method can be used to determine the parameters
of the underlying cluster and select the thresholds that will minimize the
classification error. Variable thresholding could be utilized if the overlap
is brought on by differences in illumination across the image. One way to
see this is as a type of local segmentation .
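As a quick illustration, global thresholding is a one-line array operation;
the image and the threshold value θ = 128 below are arbitrary choices
made only for the example.

import numpy as np

def global_threshold(image, theta):
    # g(x, y) = 1 where f(x, y) >= theta, 0 otherwise.
    return (image >= theta).astype(np.uint8)

img = np.array([[ 20,  30, 200],
                [ 25, 210, 220],
                [ 15,  35, 190]])
print(global_threshold(img, theta=128))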
7.5.2 Adaptive Thresholding
Global thresholding, also known as fixed thresholding, functions well
when the background has a generally uniform grey level and
the objects of interest have an inner grey level that is reasonably
consistent. In many instances, the object contrast within an image varies
while the backdrop grey level is not consistent. In these circumstances, a
threshold that performs well in one section of the image could perform
poorly in other areas. In these situations, it is practical to employ a
threshold grey level that progressively changes depending on where it is in
the image.
7.5.3 Histogram -based threshold selection
The gray-level histogram of an image with an object against a contrasting
background is bimodal. The two peaks represent the comparatively large
number of points inside and outside the object. The dip between the peaks
corresponds to the comparatively few points along the object's edge. It is
usual practice to determine the threshold grey level using this dip.
For an object whose boundary is determined by thresholding, the
histogram is the (negative) derivative of the area function:
H(D) = −dA(D)/dD
where D is the gray level, A(D) is the area of the object obtained by
thresholding at gray level D, and H(D) is the histogram
If D is close to the histogram dip, raising the threshold from D to D +ΔD
just slightly reduces the area. As a result, setting the threshold near the
histogram's dip reduces the area measurement's susceptibility to minute
threshold selection errors.
The histogram will be noisy if the image that contains the object is noisy
and small. Unless the dip is exceptionally sharp, the noise can obscure it
or at least make its location vary from one image to the next. This effect
can be somewhat mitigated by smoothing the histogram with a
convolution filter or a curve-fitting method.
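A rough sketch of this dip-seeking idea in NumPy follows: build the
grey-level histogram, smooth it with a small moving-average (convolution)
filter, and take the minimum between the two largest peaks. The
peak-suppression window and the synthetic bimodal data are illustrative
assumptions, not a robust method.

import numpy as np

def threshold_from_histogram(image, smooth_width=5):
    # Grey-level histogram of an 8-bit image.
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    # Smooth the histogram with a simple moving-average (convolution) filter.
    kernel = np.ones(smooth_width) / smooth_width
    smoothed = np.convolve(hist, kernel, mode='same')
    # Find the two largest peaks, then the deepest dip between them.
    p1 = int(np.argmax(smoothed))
    masked = smoothed.copy()
    masked[max(0, p1 - 20):min(256, p1 + 20)] = 0   # suppress first peak's surroundings
    p2 = int(np.argmax(masked))
    left, right = sorted((p1, p2))
    return left + int(np.argmin(smoothed[left:right + 1]))

rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 10, 5000)])
img = np.clip(img, 0, 255).astype(np.uint8)
print(threshold_from_histogram(img))   # expected to land near the dip (~120)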
7.6 EDGE -BASED SEGMENTATION
Edge -based segmentation uses the edges of a picture to extract spatial
information. Edges occur where the homogeneity criterion between
segments breaks down. Local linear operators such as the Prewitt, Sobel,
and Laplacian filters are typically used for edge detection. These operators
perform best with images that have few noise sources and sharp edges.
Some edge linking may be necessary since the boundaries identified
by these operators may not always consist of a collection of closed linked
curves.
7.7 EDGE DETECTION
Edge detection is a technique for spotting significant changes in an image.
One of the main functions of the lowest levels of image processing is edge
detection. The boundaries between various objects are often formed by the
locations where there are abrupt changes in brightness. By calculating
intensity differences in local picture regions, these locations can be
located. That is, an area with high sign als of change should be sought after
by the edge -detection algorithm. The majority of edge detectors measure
the intensity gradient at a specific location in the image.
7.7.1 Gradient operator
A gradient is a two-dimensional vector that points in the direction in
which the image intensity grows fastest. The gradient operator ∇ is given
by
∇ = ( ∂/∂x, ∂/∂y )
If the operator ∇ is applied to the function f, then
∇f = ( ∂f/∂x, ∂f/∂y )
The gradient magnitude and gradient orientation are two functions that
can be expressed in terms of these directional derivatives: it is possible to
determine the gradient's magnitude ||∇f|| and its direction ∠(∇f).
The strength of the edge is determined by the gradient's magnitude, which
reveals how much the neighbourhood's pixels differ from one another. The
gradient's magnitude is given by
||∇f|| = sqrt( (∂f/∂x)^2 + (∂f/∂y)^2 )
The gradient magnitude gives the greatest rate of increase of f(x, y) per
unit distance in the direction of the gradient.
The gradient orientation indicates the direction of the largest change,
which is probably the edge-crossing direction. The gradient's direction is
determined by
θ = tan^(−1)( (∂f/∂y) / (∂f/∂x) )
where the angle is measured with respect to the x -axis.
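A minimal NumPy sketch of the gradient magnitude and orientation, using
simple forward differences as the partial-derivative estimates (the choice
of difference scheme is an assumption for illustration):

import numpy as np

def gradient_magnitude_orientation(f):
    f = f.astype(float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    # Partial derivatives estimated with forward differences.
    gx[:, :-1] = f[:, 1:] - f[:, :-1]   # d f / d x  (along columns)
    gy[:-1, :] = f[1:, :] - f[:-1, :]   # d f / d y  (along rows)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    orientation = np.arctan2(gy, gx)    # angle measured from the x-axis
    return magnitude, orientation

img = np.array([[0, 0,   0,   0],
                [0, 0, 255, 255],
                [0, 0, 255, 255],
                [0, 0,   0,   0]])
mag, ang = gradient_magnitude_orientation(img)
print(np.round(mag))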
7.7.2 Edge detection using first-order derivatives
A digital pixel grid's derivative can be described in terms of differences.
The first deriv ative of an image with gray -value pixels must meet the
following requirements: it must be non -zero at the start of a gray -level step
or ramp, and it must be non -zero along the ramp (constant change in grey
values). It must also be zero in flat segments, or in areas of constant gray -
level values. Using a one-dimensional function f(x), the first-order
derivative can be found by
∂f/∂x = f(x + 1) − f(x)
A function of two variables, f (x, y), is an image. Only the partial
derivative along the x-axis is mentioned in the previous equation. It is
possible to identify pixel discontinuity in eight different directions,
including up, down, left, right, and along the four diagonals.
The second technique of obtaining the first-order derivative is to estimate
the finite difference:
∂f/∂x = lim (Δx → 0) [ f(x + Δx, y) − f(x, y) ] / Δx
The finite difference can be approximated as
∂f/∂x ≈ f(x + Δx, y) − f(x, y), with Δx = 1
Using the pixel coordinate notation and considering that j corresponds to
the direction of x, and i corresponds to the y direction, we have

The majority of edge -detecting algorithms can be compared to gradient
estimators. The gradient computations must be estimated since the
gradient is a continuous function notion and the input signal (the image) is
a finite signal. Convolution is most frequently used in gradient calculation
because derivatives are shift-invariant and linear. For identifying edges,
numerous kernels have been presented.
7.7.3 Roberts kernel
The main goal is to calculate the differences between adjacent pixels. One
technique to detect an edge is to directly use the kernel {+1, −1}, which
computes these differences; in mathematics these are referred to as
forward differences. In practice, the Roberts kernels are too small to
locate edges accurately in the presence of noise. The Roberts
cross-gradient operator is the simplest way to implement the first-order
partial derivatives.

The partial derivatives shown above are implemented by approximating
them with two 2 x 2 masks. The Roberts operator masks are given by

These filters have the shortest support, which results in more precise
positioning of the edges, but the short support of the filters has the
drawback of being susceptible to noise.
7.7.4 Prewitt kernel
The Prewitt kernels are named in honour of Judith Prewitt. The Prewitt
kernels are founded on the concept of the central difference. In
comparison to the Roberts operator, the Prewitt edge detector is a
considerably better operator. Consider the arrangement of pixels around
the centre pixel [i, j] in the illustration below:

The partial derivatives of the Prewitt operator are calculated as

In the above equations, c is a constant that signifies the emphasis given to
pixels closer to the centre of the mask. G x and G y are the approximations
at [i, j].
Setting c = 1, the Prewitt operator mask is obtained as

The support is longer on the Prewitt masks. The edge detector is less
susceptible to noise because the Prewitt mask differentiates in one
direction and averages in the other.
7.7.5 Sobel kernel
The Sobel kernels bear Irwin Sobel's name. The Sobel kernel is based on
central differences, but when averaging it gives more weight to the central
pixels. The Sobel kernels can be viewed as 3 x 3 approximations to the
first derivatives of Gaussian kernels. The Sobel operator's partial
derivatives are calculated as

The Sobel masks in matrix form are given as

A Sobel mask has superior noise -suppression qualities compared to a
Prewitt mask.
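A sketch of Prewitt and Sobel filtering using SciPy's 2D convolution is
shown below. The masks written here are the commonly quoted forms;
sign and orientation conventions vary between texts, so treat them as one
reasonable choice rather than the book's exact matrices.

import numpy as np
from scipy.signal import convolve2d

# Commonly quoted Prewitt and Sobel masks (one of several sign conventions).
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
PREWITT_Y = PREWITT_X.T
SOBEL_X   = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y   = SOBEL_X.T

def edge_magnitude(image, kx, ky):
    image = image.astype(float)
    gx = convolve2d(image, kx, mode='same', boundary='symm')
    gy = convolve2d(image, ky, mode='same', boundary='symm')
    return np.hypot(gx, gy)

img = np.zeros((8, 8))
img[:, 4:] = 255                       # vertical step edge
print(np.round(edge_magnitude(img, SOBEL_X, SOBEL_Y))[:, 3:6])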
7.7.6 Frei-Chen edge detector
By mapping the intensity vector with a linear transformation and then
identifying edges based on the angle between the intensity vector and its
projection onto the edge subspace, Frei -Chen mask edge detection is
performed. The normalized weights allow for the realization of Frei -Chen
edge detection. Unique Frei -Chen masks are masks that include all of the
basis vectors. This suggests that the weighted sum of nine Frei -Chen
masks is used to represent a 3 x 3 image area. The image is initially
convolved with each of the nine masks. The convolution outcomes of
each mask are then combined to produce an inner product. Following are
the nine Frei-Chen masks:

The first four Frei -Chen masks are used to represent edges, the next four
to represent lines, and the last mask to represent averages. The image is
projected onto the required masks for edge detection.
7.7.7 Second de rivative method of detecting edges in an image
Finding the ideal edge is comparable to determining the largest or least
derivative point. It is possible to determine a function's maximum or
minimum value by differentiating the provided function and looking for
locations where the derivative is zero. The second derivative is obtained
by differentiating the first derivative. It is analogous to looking for
locations where the second derivative is zero to locate the best edges.
Images can be processed using differential operators, although zeros rarely
land perfectly on a pixel. They typically lie between pixels. By locating
the zero crossings, the zeros can be isolated. When one pixel is positive
and its neighbour is negative, this is known as a zero crossover. The
following are issues with zero -crossing methods:
 Zero crossing methods produce two -pixel thick edges.
 Zero crossing methods are extremely sensitive to noise.
For images, there is a single measure, similar to the gradient magnitude,
that measures the second derivative, which is obtained by taking the dot
product of ∇ with itself:
∇ · ∇f = ∂²f/∂x² + ∂²f/∂y²
The operator ∇⋅∇ = ∇2 is called the Laplacian operator.
When the Laplacian operator is applied to the function f, we get

The following difference equation can be used to express the Laplacian
operation:
∇²f ≈ f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4 f(x, y)
The brightness values of each of the surrounding pixels are subtracted
from the central pixel via the Laplacian operator. The Laplacian's output is
non-zero when the neighbourhood has a discontinuity in the form of a
point, line, or edge. Depending on how close to the edge the central point
is, it could be either positive or negative.
Rotational invariance characterises the Laplacian operator: its response
does not depend on the orientation of the (orthogonal) coordinate axes.
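The difference equation above corresponds to convolution with a small
mask. A sketch using the usual 4-neighbour Laplacian mask follows
(SciPy's convolve2d is assumed to be available):

import numpy as np
from scipy.signal import convolve2d

# 4-neighbour Laplacian mask corresponding to
# f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]])

def laplacian(image):
    return convolve2d(image.astype(float), LAPLACIAN, mode='same', boundary='symm')

img = np.zeros((7, 7))
img[:, 3:] = 100                        # step edge
lap = laplacian(img)
# Zero crossings: adjacent pixels whose Laplacian responses have opposite signs.
zero_crossings = (np.sign(lap[:, :-1]) * np.sign(lap[:, 1:]) < 0)
print(np.round(lap[3]))                 # response changes sign across the edge
print(zero_crossings.sum(), "horizontal zero crossings")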
7.7.8 Laplacian of Gaussian (LOG)
Noise in the input image is a significant factor in the Laplacian operator's
performance degradation. Prior to performing edge enhancement, the image
might be smoothed to reduce the effects of noise. The Laplacian-of-
Gaussian (LOG) operator therefore first smoothes the image by
convolution with a Gaussian-shaped kernel and then applies the Laplacian.
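In practice the LOG operation is available as a single call in common
libraries. The sketch below uses SciPy's gaussian_laplace, assuming SciPy
is installed; σ = 2 is an arbitrary illustrative choice.

import numpy as np
from scipy.ndimage import gaussian_laplace

img = np.zeros((32, 32))
img[:, 16:] = 255.0                       # vertical step edge
log_response = gaussian_laplace(img, sigma=2.0)   # Gaussian smoothing + Laplacian
# The response changes sign across the edge; its zero crossings mark edge locations.
print(np.round(log_response[16, 12:20], 1))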
7.7.9 Difference of Gaussian filters (DoG)
The DoG filter is obtained by taking the difference of two Gaussian
functions with standard deviations σ1 < σ2. The expression of a DoG filter
is given by
DoG(x, y) = (1/(2πσ1²)) exp(−(x² + y²)/(2σ1²)) − (1/(2πσ2²)) exp(−(x² + y²)/(2σ2²))
7.7.10 Canny edge detectors
The fact that a Laplacian zero -crossing merely combines the principal
curvatures together makes it difficult to use as an edge detector. In other
words, it doesn't actually determine the gradient's maximum magnitude.
Edges are defined by the Canny edge detector as second derivative zero -
crossings in the direction of the largest first derivative. The Canny
operator performs its work in stages. The image is initially smoothed
using a Gaussian convolution. Then, to highlight the areas of the image
with high spatial derivatives, a 2D first-derivative operator is applied to the
smoothed image. In the image of the gradient magnitude, edges give birth
to ridges. Non-maximal suppression is the method by which the algorithm
tracks along the top of these ridges and sets to zero all pixels that are not
actually on the ridge top in order to produce a thin line in the output. The
two thresholds T1 and T2, with T1 > T2, regulate the hysteresis of the
tracking process. Tracking may start only at a position on a ridge higher
than T1. From there, tracking continues in both directions until the
height of the ridge drops below T2. Hysteresis like this prevents noisy
edges from being broken up into many edge fragments. The three parameters that
make up a Canny edge detector are (i) the width of the Gaussian kernel,
(ii) the higher threshold, and (iii) the lower threshold used by the tracker.
The detector's sensitivity to noise is decreased by widening the Gaussian
kernel, but some of the image's finer details are lost in the process. As the
Gaussian width is raised, the localization error in the identified edges also
rises significantly. The Canny edge detector uses Gaussian smoothing for
two reasons: besides reducing noise, it regulates how much detail appears
in the edge image.
For optimal results, the upper tracking threshold is often set relatively high
and the lower threshold value is set quite low. If the lower threshold is set
too high, noisy edges will fragment. If the higher threshold is set too low,
more false and undesired edge fragments will be produced.
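The Canny detector is usually taken from a library rather than written
from scratch. The call below uses scikit-image's canny function, assuming
that package is available; sigma sets the Gaussian kernel width and the
two thresholds drive the hysteresis tracking. The parameter values are
illustrative only.

import numpy as np
from skimage.feature import canny

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0                  # bright square on a dark background
edges = canny(img, sigma=1.0, low_threshold=0.1, high_threshold=0.3)
print(edges.sum(), "edge pixels found")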

7.8 EDGE LINKING
One can threshold an edge -magnitude image and thin the resulting binary
image down to single-pixel-wide closed, linked boundaries if the edges are
reasonably strong and the noise level is low. However, such an edge
picture will contain gaps that need to be filled under less -than-ideal
circumstances. Small gaps can be filled by looking for other end points in
a neighbourhood of 5 by 5 pixels or greater, centred on the end point, and
then filling in border pixels as necessary to join them. This, however, has
the potential to oversegment images with several edge points.
 Edge linking using Heuristic Search Algorithm
The heuristic search strategy to connect two edge points P and Q is
presented below if there is a gap between them:
A. Determine if P's neighbours qualify as the first step towards Q.
1. Identify three neighbours of P that lie roughly in the direction of Q.
2. For each of the points from Step 1, compute the edge quality function
from P.
3. Choose the point that gives the best edge quality from P.
4. Begin the next iteration from the point chosen in the previous step.
5. Continue in this manner until point Q is reached.
B. Establish and qualify the Path.
1. Evaluate the newly formed path's edge quality function against a
threshold.
2. Discard the freshly formed edge if it is insufficiently robust.
3. Use the resulting graph as the boundary if the newly formed edge is
robust enough.
7.9 HOUGH TRANSFORM
The straight line y = mx + b is represented in polar coordinates as
ρ = x cos (θ) + y sin (θ)
where ρ, θ defines a vector from the origin to the nearest point on the
straight line. This vector from the origin to the nearest point is
perpendicular to the line, as shown in the figure below.

Figure 2: Straight line
Any line in the x, y plane corresponds to a point in the 2D space
determined by the parameters ρ and θ. So, a single point in the ρ, θ space
is the Hough transform of a straight line in the x, y space. Each straight
line passing through a given point (xi, yi) in the x, y plane maps to a point
in the ρ, θ space, and these points must satisfy the above equation with xi
and yi as constants. Thus, a given point in the x, y plane corresponds to a
sinusoidal curve in the ρ, θ space, and the locus of all lines through that
point is a sinusoid in the ρ, θ parameter space.
Suppose we have a set of edge points (xi, yi) that lie along a straight line
having parameters ρ0, θ0. Each edge point plots to a sinusoidal curve in the
ρ, θ space, but these curves must intersect at the point ρ0, θ0, since this is
the line they all have in common.
7.10 SUMMARY
 The goal of imag e segmentation is to divide an image into discernible,
essentially homogeneous sections.

 Global segmentation deals with segmenting an entire image, whereas
local segmentation focuses on segmenting sub -images, which are tiny
windows on a larger image.

 There are three methods for performing image segmentation: the
region approach, boundary approach, and edge approach.

 An approach to picture segmentation known as "region growing"
involves looking at nearby pixels and adding them to a region class if
no edges are found. With a top-down strategy known as region
splitting, an entire image is split into portions that are more
homogeneous than the whole.

 By grouping the patterns into clusters or groups so that they are more
similar to one another than to patterns belonging to different clusters,
the clustering technique aims to assess the relationships between patterns of
the data set.

 By using a thresholding process, a picture can be segmented, and the
threshold value can be chosen using the image's histogram as a tool.
 Edges are essentially discontinuities in the intensity of the image
brought on by adjustments to the image structure.

 It is possible to identify the edges in a picture using the Robert,
Prewitt, Sobel, and Frei -Chen kernels.
7.11 LIST OF REFERENCES
1) Digital Image Processing, S Jayaraman, S Esakkirajan, T Veerakumar,
Tata McGraw-Hill Education Pvt. Ltd., 2009.
2) Digital Image Processing, 3rd Edition, Rafael C Gonzalez, Richard E
Woods, Pearson, 2008.
3) Scilab Textbook Companion for Digital Image Processing,
S. Jayaraman, S. Esakkirajan and T. Veerakumar, 2016
(https://scilab.in/textbook_companion/generate_book/125).
7.12 UNIT END EXERCISES
1) What is content -based image retrieval?

2) What is an ‘edge’ in an image? On what mathematical operation are
the two basic approaches for edge detection based?

3) What are the three stages of the Canny edge detector? Briefly explain
each phase.

4) Compare the Canny edge detector with the Laplacian of Gaussian edge
detector

5) Describe various edge detection methods.
6) What is edge linking?
7) Describe Hough transform.
8) Discuss on image segmentation techniques.
9) Describe the region -based approach.

8
IMAGE COMPRESSION
Unit Structure
8.0 Objectives
8.1 Introduction
8.2 Need for image compression
8.3 Redundancy in images
8.4 Image -compression scheme
8.5 Fundamentals of Information Theory
8.5.1 Entropy and mutual information
8.5.2 Shannon’s source cod ing theorem
8.5.3 Rate -distortion theory
8.6 Run-length coding
8.6.1 1 -DRun -length coding
8.6.2 2 -DRun -length coding
8.7 Shannon -Fano coding
8.8 Huffman Coding
8.9 Arithmetic Coding
8.10 Transform -based compression
8.11 Summary
8.12 List of References
8.13 Unit End Exercises
8.0 OBJECTIVES
To understand and get familiar with the following concepts:
 need for image compression along with their metrics and techniques
 lossless and lossy image compression
 spatial domain and frequency domain

8.1 INTRODUCTION
The demand for digital information has rapidly increased as a result of the
development of multimedia technologies over the past few decades. The
widespread use of digital photographs is largely due to technological
advancements. Applications like medical and satellite imagery frequently
use still photos. Digital photos contain a tremendous quantity of data. As
digital images find more applications, image data size reduction for both
storage and transmission are becoming more and more crucial. A mapping
from a higher dimensional space to a lower dimensional space is what
image compression is. Many multimedia applications, including image
storage and transmission, heavily rely on image compression. The
fundamental objective of image compression is to represent an image with
the fewest possible bits while maintaining acceptable image quality. All
image -compression methods aim to reduce the amount of data as much as
possible while removing statistical redundancy and taking advantage of
perceptual irrelevancy.
8.2 NEED FOR IMAGE COMPRESSION
The amount of information that computers handle has increased
tremendously over the past few decades thanks to advancements in
Internet, teleconferencing, multimedia, and high -definition television
technology. Consequently, a significant issue with multimedia systems is
the storage and transmission of the digital image component. To exhibit
photographs with an acceptable level of quality, a tremendous amount of
data is needed. Large amounts of storage space and transmission
bandwidth are needed for high-quality image data, which the existing
technology cannot provide both technically and economically.
Compressing the data to save on storage space and transmission time is
one method that could work for this issue.
For instance, the amount of storage space needed to hold a 1600 x 1200
colour photograph is
1200 ×1600 × 8 × 3 = 46,080,000 bits
= 5,760,000 bytes
= 5.76 Mbytes
A floppy disc holds at most 1.44 Mbytes. With four floppies, the
maximum space available is 1.44 x 4 = 5.76 Mbytes. In
other words, a 1600 x 1200 RGB image requires at least four floppies to
store.
Every year, the amount of data carried across the Internet doubles, and a
sizable chunk of that data is made up of photographs. Any given device
will experience significant cost savings and become more accessible to use
if its bandwidth requirements are reduced. Photos can be represented more
compactly through the use of image compression, allowing for speedier
transmission and storage of photos.
8.3 REDUNDANCY IN IMAGES
Because images are typically highly coherent and contain redundant
information, image compression is achievable. Redundancy and
irrelevancy reduction help in compression. Duplication is referred to as
redundancy, while irrelevancy is the portion of an image's information that
the human visual system will not pick up on. Figure 1 depicts the
classification of redundancy

Figure 1: Classification of redundancy
 Statistical redundancy
As previously mentioned, there are two kinds of statistical redundancy:
interpixel redundancy and coding redundancy.
When adjacent pixels in an image are correlated, interpixel redundancy
results. The implication is that the pixels next to each other are not
statistically independent. Interpixel redundancy is the term used to
describe the interpixel correlation.
Information representation is connected to redundancy in coding. Codes
are used to represent the information. Examples of codes include the
arithmetic codes and the Huffman code. The efficiency of different codes
may vary. To properly compress the image, codes must be effective.
 Spatial redundancy
In an image, the statistical relationship between adjacent pixels is
represented by spatial redundancy. The term "spatial redundancy" refers to
the relationship between adjacent pixels in an image. Each pixel in an
image need not be individually represented. Instead, a pixel's neighbours
can be used to forecast it. The fundamental idea behind differential
coding, which is commonly used in picture and video compression, is to
eliminate spatial redundancy by prediction.
 Temporal redundancy
The statistical relationship between pixels from subsequent frames in a
video clip is known as temporal redundancy. Interframe redundancy is
another name for the temporal redundancy. To lessen temporal
redundancy, motion compensated predictive coding is used. Effective
video compression is achieved by removing a significant amount of
temporal redundancy.
 Psychovisual redundancy
The traits of the human visual system (HVS) are connected to
psychovisual redundancy. Visual information is not perceived similarly in
the HVS. It's possible that certain information is more crucial than other
information. Perception won't be impacted if less data is used to represent
less significant visual information. This suggests that some visual information is
redundant in a psychovisual sense. Effective compression results from
removing the psychovisual redundant information.
8.4 IMAGE -COMPRESSION SCHEME
Figure 2 depicts the general block diagram of an image -compression
technique. While the channel encoder and decoder pair is often referred to
as a channel codec module, the source encoder and decoder pair is
frequently referred to as a source codec module. Over the years, Shannon's
well-known separation theorem has been used to support the two coding
modules' independent designs.

Figure 2: Image compression scheme
 Source Coding: The effective translation of the source data —the input
image data —into a series of bits is the aim of source coding. By
reducing entropy, the source coder lowers the average number of bits
needed to represent the image. In essence, source decoding is an
inverse process.

 Channel: The mathematical representation of the medium used for
communication is the "channel."

 Channel Coding: The channel encoder takes the compressed output of
the source encoder and adds controlled redundancy. The channel
encoder's job is to shield the communication system from channel
noise and other transmission mistakes. To recreate the compressed
bits, the channel decoder makes use of the redundant bits in the bit
sequence.


8.5 FUNDAMENTALS OF INFORMATION THEORY
Both lossless and lossy compression are mathematically based on
information theory. Entropy, average information, and rate -distortion
principles are covered in this section.
8.5.1 Entropy and mutual information
For a random variable X generated by a discrete memoryless source from
a finite alphabet set X = {x 1, x 2, ………… x N} with p x being the
probability density function, Shannon defined a measure of the
information content of X, called entropy, as
H(X) = − Σ p_x(x) log2 p_x(x), where the sum is taken over the alphabet X
The entropy H(X) is the lower bound of the bit rate that can represent the
source without distortion. The conditional entropy of X given a random
variable Y with a finite alphabet set Y is defined as
H(X | Y) = − Σ p_xy(x, y) log2 p_x|y(x | y), summed over all pairs (x, y)
where p xy(x, y) is the joint probability density function of X and Y and p x|
y(x, y) is the conditional probability density function of X given Y. The
mutual information I (X; Y) between two random variables X and Y is a
measure of the reduction in the uncertainty of X due to conditioning on Y,
which is defined as
I(X; Y) = H(X) − H(X | Y) = Σ p_xy(x, y) log2 [ p_xy(x, y) / (p_x(x) p_y(y)) ]
The above equation implies that mutual information is symmetric.
8.5.2 Shannon’s source coding theorem
The aim of source coding is to efficiently represent a random experiment’s
sequence of outcomes. According to Shannon ’s source -coding theorem
“Let X be the ensemble of letters from a discrete memoryless source with
finite entropy H (X). Blocks of J symbols from the source are encoded into
code words of length N from a binary alphabet. For any ε > 0, the
probability Pe of a block-decoding failure can be made arbitrarily small if
R = N / J ≥ H(X) + ε
and J is sufficiently large. Conversely, if R ≤ H (X) − ε then P e becomes
arbitrarily close to 1 as J is made sufficiently large. The source -coding
theorem demonstrates that the source entropy H (X) is the lower bound of
the average number of bits/symbols required to encode the output of a
discrete memoryless source without distortion.
8.5.3 Rate-distortion theory
Rate-distortion theory is concerned with the task of reducing redundancy
in a source while adhering to a predetermined fidelity requirement. The
rate-distortion function R(D) of the source X is the minimum rate of any
source-coding scheme whose distortion does not exceed D. The
rate-distortion function for a discrete memoryless source using the
mean-square-error criterion is given by
R(D) = min I(X; Y), minimized over all coding rules P_Y|X with MSE ≤ D
where P_Y|X denotes the source-coding rule and MSE denotes the mean
square error of the encoding rule.
8.6 RUN -LENGTH CODING
Long sequences of the same symbol can be effectively encoded using run-
length coding (RLC). By encoding the number of symbols in a run,
run-length coding takes advantage of the spatial redundancy. The word
"run" denotes the repetition of a symbol, while the word "run -length"
denotes the number of repeated symbols. A series of numbers is converted
into a series of symbol pairs (run, value) using run -length coding. For this
type of compression, images with sizable regions of consistent shade make
suitable candidates. The bitmap file format used by Windows employs
run-length coding. There are two types of run-length coding: (i) 1D run-
length coding (ii) 2D run -length coding. As opposed to 2D RLC, which
uses both horizontal and vertical correlation between pixels, 1D RLC only
uses the horizontal correlation between pixels on the same scan line.
8.6.1 1-D Run-length coding
Each scan line in 1D run -length coding is separately encoded. A series of
alternating, independent white runs and black runs can be thought of as
each scan line. The first run in each scan line is taken into consideration to
be a white run as per an agreement between the encoder and decoder. The
run-length of the initial white run is set to zero if the first actual pixel is
black. There is end -of-line (EOL) at the conclusion of each scan line.
When an EOL codeword is encountered, the decoder knows that the scan
line has ended.
Example: Consider figure 3 for an example of a binary image and its
associated binary representation. The image must be encoded using
run-length coding. Run-length coding transmits two values: the first value
represents the number of times a specific symbol appears, and the
second value is the symbol itself. It is clear from Fig. 3 that each row is
successively scanned, and the associated run -length code is delivered. An
illustration of run -length coding is provided for the fourth row of the input
image. The bit stream for the fourth row in the example is 4, 0, 10, 1, 4, 0.

Figure 3: Example of run-length coding
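A sketch of 1-D run-length encoding producing (run, value) pairs as in the
example above. The pair ordering is an assumption; some schemes
transmit only run lengths once the starting colour is agreed between
encoder and decoder.

def run_length_encode(row):
    # Encode a sequence of symbols as (run, value) pairs.
    pairs = []
    run, value = 1, row[0]
    for symbol in row[1:]:
        if symbol == value:
            run += 1
        else:
            pairs.append((run, value))
            run, value = 1, symbol
    pairs.append((run, value))
    return pairs

def run_length_decode(pairs):
    out = []
    for run, value in pairs:
        out.extend([value] * run)
    return out

fourth_row = [0] * 4 + [1] * 10 + [0] * 4
encoded = run_length_encode(fourth_row)
print(encoded)                              # [(4, 0), (10, 1), (4, 0)]
assert run_length_decode(encoded) == fourth_row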
8.6.2 2-D Run-length coding
The correlation between pixels in a scan line is used in 1D run-length
coding. 2D run-length coding was created to take advantage of the
correlation between pixels in nearby scan lines to increase coding
effectiveness.
8.7 SHANNON -FANO CODING
Shannon-Fano coding is a top-down binary tree approach.
The Shannon -Fano coding algorithm is provided below:
1. The source symbols are arranged in decreasing order of probability.
2. Each symbol is interpreted as the root of a tree.
3. The list is split into two groups with roughly equal total probabilities.
4. A 0 is appended to the code word of every symbol in the first group.
5. A 1 is appended to the code word of every symbol in the second group.
6. For each group, steps (3) to (5) are repeated until each subset contains
only one symbol.
Example: Construct the Shannon–Fano code for the word MUMMY.
Solution: The given word is MUMMY. The number of characters in the
word MUMMY (N) is five.
Step 1: Determine the probability of occurrence of each character. The
probability of occurrence of each character in MUMMY is given by
P(M) = 3/5 = 0.6, P(U) = 1/5 = 0.2, P(Y) = 1/5 = 0.2
Step 2: Determine the number of steps. The number of steps involved in
this problem is
number of steps = number of symbols − 1 = 3 − 1 = 2
Step 3: Compute the code word for each symbol. The steps involved in
the computation of the code words give the following result:
M: probability 0.6, code 0
U: probability 0.2, code 10
Y: probability 0.2, code 11
Step 4: Determination of entropy.
After determining the probability and the code for each character, the next
step is to compute the entropy. The entropy is given by
H = − Σ P_i log2 P_i = −(0.6 log2 0.6 + 0.2 log2 0.2 + 0.2 log2 0.2) ≈ 1.371 bits/symbol
Step 5: Computation of average length.
The average length is given by
L = Σ P_i l_i = 0.6 × 1 + 0.2 × 2 + 0.2 × 2 = 1.4 bits/symbol
Step 6: Computation of efficiency.
The efficiency of the code η is given by
η = H / L = 1.371 / 1.4 ≈ 0.979, i.e. about 97.9%
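The top-down splitting procedure can be sketched recursively as below.
The split point is chosen so that the two groups have totals as close to
equal as possible; applied to MUMMY it reproduces the codes worked out
above. The function name shannon_fano is illustrative, not a library
routine.

from collections import Counter

def shannon_fano(symbols_probs):
    # symbols_probs: list of (symbol, probability) pairs.
    items = sorted(symbols_probs, key=lambda sp: sp[1], reverse=True)
    codes = {s: '' for s, _ in items}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Find the cut that makes the two halves' totals as equal as possible.
        best_cut, best_diff, acc = 1, float('inf'), 0.0
        for i, (_, p) in enumerate(group[:-1], start=1):
            acc += p
            diff = abs((total - acc) - acc)
            if diff < best_diff:
                best_cut, best_diff = i, diff
        for s, _ in group[:best_cut]:
            codes[s] += '0'
        for s, _ in group[best_cut:]:
            codes[s] += '1'
        split(group[:best_cut])
        split(group[best_cut:])

    split(items)
    return codes

word = "MUMMY"
counts = Counter(word)
probs = [(s, c / len(word)) for s, c in counts.items()]
print(shannon_fano(probs))     # e.g. {'M': '0', 'U': '10', 'Y': '11'}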

8.8 HUFFMAN CODING
The Huffman coding system, published in 1952, was created by David A.
Huffman while he was a student at MIT taking an information theory
course. One symbol is mapped to one code
word in Huffman codes, which are the best codes. It is assumed in
Huffman coding that each pixel's intensity has a specific probability of
occurrence, and that this probability is spatially invariant. Each intensity
value is given a binary code through Huffman coding, with shorter codes
going to intensities with higher probabilities. The Huffman codes table can
be fixed in both the encoder and decoder if the probabilities can be
calculated a priori. If not, the coding table must also be transmitted to the
decoder along with the compressed image data. Entropy, average length,
efficiency, and variance are some of the parameters involved in Huffman
coding.
In the first step of Huffman's method, the probabilities of the symbols
under consideration are ranked, after which the lowest probability symbols
are combined into a single symbol and substituted in the following source
reduction. This binary coding technique is seen in figure 4. A hypothetical
collection of source symbols and their probabilities are arranged from top
to bottom in decreasing order of probability values at the far left. The
lowest two probabilities, 0.06 and 0.04, are combined to create a
"compound symbol" with probability 0.1 to create the initial source
reduction. In order to rank the probabilities of the reduced source from
most to least likely, this compound symbol and its corresponding
probability are placed in the first source reduction column. Then, this
process is continued until the reduced source (at the far right) has two
symbols.

Figure 4: Huffman source reductions
The second stage in Huffman's method is the coding of each reduced
source, proceeding backwards from the smallest source to the largest. The
symbols 0 and 1 are, of course, the smallest length binary code for a two -
symbol source. These symbols are given to the two symbols on the right,
as seen in Fig . 5 (The assignment is arbitrary; it would also be effective to
reverse the order of the 0 and 1). Because the reduced source symbol with
probability 0.6 was created by combining two symbols in the reduced
source to its left, the 0 used to code it is now assigned to both of these
symbols, and a 0 and a 1 are arbitrarily appended to each to distinguish
them from one another. This operation is then repeated for each reduced
source until the original source is reached. The final code appears at the
far left in Fig. 5. The average length of this code is
Lavg = 0.4(1) + 0.3(2) + 0.1(3) + 0.1(4) + 0.06(5) + 0.04(5) = 2.2 bits/symbol
and the entropy of the source is 2.14 bits/symbol.

Figure 5: Huffman code assignment procedure
When symbols are coded one at a time, Huffman's method finds the best
possible code for a collection of symbols and probabilities. Once the code
has been constructed, it may be easily decoded without error using a
simple lookup table. The code is a block code that is instantaneous and
uniquely decodable. Because each source symbol is mapped into a fixed
sequence of code symbols, it is known as a block code. Because each code
word in a string of code symbols may be deciphered without referring to
subsequent symbols, it is instantaneous. It can be decoded in only one way
for any given string of code symbols, making it uniquely decodable.
As a result, any string of Huffman -encoded symbols may be decoded by
examining the string from left to right. For the binary code
of Fig. 5, a left -to-right scan of the encoded string 010100111100 reveals
that the first valid code word is 01010, which is the code for symbol a 3.
The next valid code is 011, which corresponds to symbol a1. Continuing
in this manner reveals the completely decoded message to be a 3 a1 a2 a2 a6.
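A minimal Huffman-code construction using Python's heapq is sketched
below, taking the probabilities of the six-symbol source of Figs. 4 and 5
(these values are assumed from the worked example, since the figures
themselves are not reproduced here). The individual code words may
differ from those in the figure because ties can be broken differently, but
the average length still works out to 2.2 bits/symbol.

import heapq

def huffman_codes(probabilities):
    # probabilities: dict {symbol: probability}
    # Each heap entry: (probability, tie-breaker, {symbol: code-so-far})
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)    # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        # Prepend 0 to one group's codes and 1 to the other's, then merge.
        merged = {s: '0' + code for s, code in c1.items()}
        merged.update({s: '1' + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {'a1': 0.1, 'a2': 0.4, 'a3': 0.06, 'a4': 0.1, 'a5': 0.04, 'a6': 0.3}
codes = huffman_codes(probs)
avg_len = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)
print("average length =", round(avg_len, 2), "bits/symbol")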
8.9 ARITHMETIC CODING
In arithmetic coding, which dates back to Elias's work (Abramson [1963]),
source symbols and code words do not always correlate exactly. Instead, a
single arithmetic code word is assigned to an entire sequence of source
symbols (or a message). The code word itself defines an interval of real
numbers between 0 and 1. As the number of symbols in the message
increases, the interval used to represent it becomes smaller, and the
number of information units (say, bits) required to represent the interval
becomes larger. Each symbol of the message reduces the size of the
interval in accordance with its probability of occurrence.
Because the technique does not require, as does Huffman's approach, that
each source symbol translate into an integral number of code symbols
(that is, that the symbols be coded one at a time), it achieves (but only in
theory) the bound set by Shannon's first theorem.
The procedure of fundamental arithmetic coding is shown in Figure 6.
Here, a five-symbol sequence or message, a1 a2 a3 a3 a4, from a
four-symbol source is coded. At the start of the coding process, the
message is assumed to occupy the entire half-open interval [0, 1). As
shown in Table 1, this interval is initially subdivided into four regions
based on the probabilities of the source symbols. For instance, the symbol
a1 is associated with the subinterval [0, 0.2). Because a1 is the first symbol
of the message being coded, the message interval is initially narrowed to [0, 0.2).

As a result, in Fig. 6, the range [0, 0.2) is expanded to fill the entire height
of the figure, with the values of the narrowed range marking its end
points. The narrowed range is then subdivided in accordance with the
original source symbol probabilities, and the process is repeated for the
next message symbol. In this manner, symbol a2 narrows the subinterval
to [0.04, 0.08), a3 further narrows it to [0.056, 0.072), and so on. The last
message symbol, which must be reserved as a unique end-of-message
indicator, reduces the range to [0.06752, 0.0688). Of course, the message
can be represented by any number falling inside this subinterval, such as 0.068. The
five symbols of the arithmetically coded message shown in Fig. 6 are
represented by three decimal digits. This equates to 0.6 decimal digits per
source symbol, compared with the source entropy of 0.58 decimal digits
per source symbol. The resulting arithmetic code approaches the
bound defined by Shannon's first theorem as the length of the sequence
being coded grows. The two factors that cause coding performance to fall
short of the bound in practice are the addition of the end-of-message
indicator, which is required to distinguish one message from another, and
the use of finite-precision arithmetic. The latter issue is addressed by
scaling and rounding strategies introduced in practical
arithmetic coding systems. The scaling approach divides each subinterval
in accordance with the symbol probabilities after renormalizing it to the
[0, 1) range. The rounding technique ensures that the coding subintervals
are faithfully represented despite the truncations brought on by finite
precision arithmetic.
Table 1: Arithmetic coding example
Symbol   Probability   Initial subinterval
a1       0.2           [0.0, 0.2)
a2       0.2           [0.2, 0.4)
a3       0.4           [0.4, 0.8)
a4       0.2           [0.8, 1.0)

Figure 6: Arithmetic coding procedure
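The interval-narrowing steps of Table 1 and Fig. 6 can be sketched
directly, as below. This illustrative encoder uses floating-point arithmetic,
so it is only suitable for short messages; practical coders rely on the
scaling and rounding strategies mentioned above.

# Subintervals from Table 1 (probabilities 0.2, 0.2, 0.4, 0.2).
INTERVALS = {'a1': (0.0, 0.2), 'a2': (0.2, 0.4), 'a3': (0.4, 0.8), 'a4': (0.8, 1.0)}

def arithmetic_encode(message):
    low, high = 0.0, 1.0
    for symbol in message:
        s_low, s_high = INTERVALS[symbol]
        width = high - low
        # Narrow the current interval in proportion to the symbol's subinterval.
        low, high = low + width * s_low, low + width * s_high
    return low, high

low, high = arithmetic_encode(['a1', 'a2', 'a3', 'a3', 'a4'])
print(low, high)          # [0.06752, 0.0688), as in the worked example
print((low + high) / 2)   # any value inside the interval (e.g. ~0.068) encodes the message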

8.10 TRANSFORM-BASED COMPRESSION
The method most frequently employed in image-coding applications is
image-transform coding. Linearly transforming an image produces a set of
transform coefficients, which are typically scalar-quantised and
entropy-coded for transmission. Transform coding, then, is a
mathematical process that converts a large number of highly correlated
pixels into a smaller set of uncorrelated coefficients. The foundation of
transform coding is the existence of a specific degree of connection
between pixels in an image. When compressing images, transformation is
a very helpful technique. It is used to convert time -domain image data to
frequency -domain data. The spatial redundancy in the time domain can be
reduced by converting the data into the frequency domain. The advantage
of using transformation is that the energy of the transformed data is
mainly condensed in the low-frequency region and is represented by a few
transform coefficients. Thus, most of these coefficients can be discarded
without significantly affecting the reconstructed image quality. In a
transform coder, the discrete data signal is segmented into
non-overlapping blocks, and each block is expressed as a weighted sum of
discrete basis functions. The purpose of transform coding is to decompose
the correlated signal samples into a set of uncorrelated transform
coefficients, such that the energy is concentrated into as few coefficients
as possible. The block diagram of transform -based image coding scheme
is shown in Fig. 7.

Figure 7: Transform-based image-coding scheme
Transformation: The image's correlation properties are altered as a result
of the transformation's "reorganization" of the grey values. The process of
transformation condenses the image data into a tiny set of coefficients.
The transform should have the following characteristics for effective
compression:
 Decorrelation: The transform should generate less correlated or
uncorrelated transform coefficients to achieve high compression ratio.
 Linearity: Linearity principle allows one -to-one mapping between
pixel values and transform coefficients.
 Orthogonality: Orthogonal transforms have the feature of eliminating
redundancy in the transformed image.
Quantization: It is the process of limiting a quantity's range of possible
values in order to minimize the number of bits required to represent it. By
itself, the picture transformations don't lead to any compression. Instead,
they represent the image in a distinct domain where the image data is
divided into parts with varied degrees of significance. In essence,
quantization is an irreversible process.
Entropy encoder: The reduction of the number of bits needed to represent
each symbol at the quantizer output is the goal of an entropy encoder.
Techniques for entropy -coding that are frequently employed include run -
length coding, arithmetic coding, and Huffman coding. Entropy coding is
essentially a lossless coding step.
Essentially, the encoder's output is a bit stream that is sent over the
channel to the decoder. If one is concerned only with source coding, the
channel is typically taken to be lossless. If one is interested in joint source
and channel coding, one should be aware that errors can occur when the
bitstream traverses the channel. In essence, the decoder reverses the
encoder's process to produce the reconstructed image.
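A sketch of the transform and quantization steps for a single 8 x 8 block
using SciPy's DCT. The uniform quantization step size is an arbitrary
illustrative choice, not a standard quantization table.

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # 2D DCT applied separably along rows and columns.
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

rng = np.random.default_rng(1)
# A smooth 8 x 8 block: a ramp plus a little noise (highly correlated pixels).
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = 10.0 * x + 5.0 * y + rng.normal(0, 1, (8, 8))

coeffs = dct2(block)
q = 16.0                                  # illustrative uniform quantization step
quantized = np.round(coeffs / q)
print("non-zero coefficients:", np.count_nonzero(quantized), "of 64")

reconstructed = idct2(quantized * q)      # dequantize and inverse transform
print("max reconstruction error:", np.abs(reconstructed - block).max().round(2))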
8.11 SUMMARY
 Compact representation is compression. The representation of a
picture with the fewest possible bits is known as image compression.

 Redundancies in an image are reduced to accomplish compression.
Spatial redundancy, temporal redundancy, and psychovisual
redundan cy are three different types of redundancies in an image.

 Image compression can be divided into two categories: lossless
compression and lossy compression. The rebuilt image closely
resembles the original image when using lossless compression.
Information is lost during lossy compression.

 These methods are frequently used in lossless picture compression:
Run-length coding, Huffman coding, arithmetic coding, Shannon -Fano
coding, and dictionary -based methods like LZW are only a few
examples.

 Quantization is basically approximation. The process of quantization is
non-linear and irreversible. Scalar quantization and vector quantization
are the two basic categories under which quantization may be
classified.
8.12 LIST OF REFERENCES
1) Digital Image Processing, S Jayaraman, S Esakkirajan, T Veerakumar, Tata McGraw-Hill Education Pvt. Ltd., 2009.
2) Digital Image Processing 3rd Edition, Rafael C Gonzalez, Richard E
Woods, Pearson, 2008.
3) Scilab Textbook Companion for Digital Image Processing,
S. Jayaraman, S. Esakkirajan and T. Veerakumar, 2016
(https://scilab.in/textbook_companion/generate_book/125).
8.13 UNIT END EXERCISES
1) Explain the need for image compression.
2) Describe redundancy in images.
3) Explain Image -compression scheme.
4) What is Entropy and mutual information?
5) Explain Shannon’s source coding theorem.
6) Describe Rate -distortion theory.
7) What is Run -length coding? Explain along with its types.
8) Write a note on Shannon -Fano coding.
9) Describe Huffman Coding.
10) Explain Arithmetic Coding.
11) Explain Transform -based compression.

