
measure of the similarity between g and h, as noted in the previous section. A large value of g⋆h(0) indicates that g and h have similar time and frequency characteristics (shape, bandwidth, etc.). Therefore, if h is the ramp-shaped basis function of Fig. 5.2(d), the corresponding transform coefficient can be used to identify linear gradients of brightness along a line of the image. If h is a sinusoidal function like that of Fig. 5.2(a), on the other hand, g⋆h(0) responds to sinusoidal variations in brightness. Plots such as those in Fig. 5.2, together with a similarity measure of this kind, can reveal much about a transformed function's time-frequency features.
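As a concrete illustration of this similarity measure, the short sketch below (an illustration of the idea, not code from the text; the signal and basis vectors are made up) correlates a line of linearly increasing brightness with a ramp basis vector and with a sinusoidal one; the ramp yields the larger inner product.

```python
import numpy as np

# A minimal sketch: measure similarity between a 1-D image line g and
# candidate basis functions h via the inner product g.h (zero shift).
N = 16
x = np.arange(N)

g = x / (N - 1)                      # image line with a linear brightness gradient

ramp = x - (N - 1) / 2               # ramp-shaped basis function (zero mean)
ramp = ramp / np.linalg.norm(ramp)   # normalize to unit energy

u = 2                                # an arbitrary frequency index
sinus = np.cos(2 * np.pi * u * x / N)
sinus = sinus / np.linalg.norm(sinus)

# Large |g . h| means g shares the shape of h; the ramp scores highest here.
print("correlation with ramp:    ", np.dot(g, ramp))
print("correlation with sinusoid:", np.dot(g, sinus))
```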

In the introduction to the time-frequency plane, independent variables t and f are used instead of the spatial variables x and u, and the continuous functions g(t) and h(t) replace f(x) and s_u(x). Although the concepts are presented using continuous functions and variables, they apply to discrete functions and variables as well.


FIGURE 5.2 Basis vectors (for N = 16) of some commonly encountered transforms: (a) Fourier basis (real and imaginary parts), (b) discrete cosine basis, (c) Walsh-Hadamard basis, (d) slant basis, (e) Haar basis, (f) Daubechies basis, (g) biorthogonal B-spline basis and its dual, and (h) the standard basis, which is included for reference only (i.e., not used as the basis of a transform).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The position of h on the time-frequency plane of Fig. 5.3 is an objective descriptor of h, and therefore of g, when g⋆h(0) is large. Let $p_h(t) = |h(t)|^2 / \|h(t)\|^2$ be a probability density function. In Eq. (67), every value of t is weighted by $p_h(t)$ to compute a weighted mean with respect to coordinate t:

$$\mu_t = \int_{-\infty}^{\infty} t\, p_h(t)\, dt \qquad \text{(eq-67)}$$


FIGURE 5.3 (a) Basis function localization in the time-frequency plane. (b) A standard basis function, its spectrum, and location in the time-frequency plane. (c) A complex sinusoidal basis function (with its real and imaginary parts shown as solid and dashed lines, respectively), its spectrum, and location in the time-frequency plane.

(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez

and Richard E. Woods)

and variance

$$\sigma_t^2 = \int_{-\infty}^{\infty} (t-\mu_t)^2\, p_h(t)\, dt \qquad \text{(eq-68)}$$

Let $p_H(f) = |H(f)|^2 / \|H(f)\|^2$ be a probability density function with mean

$$\mu_f = \int_{-\infty}^{\infty} f\, p_H(f)\, df \qquad \text{(eq-69)}$$

and variance

$$\sigma_f^2 = \int_{-\infty}^{\infty} (f-\mu_f)^2\, p_H(f)\, df \qquad \text{(eq-70)}$$

where H(f) is the Fourier transform of h(t), and f denotes frequency. Then, as shown in Fig. 5.3(a), the energy of basis function h is concentrated at $(\mu_t, \mu_f)$ on the time-frequency plane, and the majority of that energy lies in a rectangular region of area $\sigma_t \sigma_f$ called a Heisenberg box or Heisenberg cell. (The energy of a continuous function h(t) is $\int_{-\infty}^{\infty} |h(t)|^2\, dt$.)


The time and frequency spreads of a basis function are bounded below by the Heisenberg uncertainty principle:

$$\sigma_t^2\, \sigma_f^2 \ge \frac{1}{16\pi^2} \qquad \text{(eq-71)}$$

If expressed in angular frequency $\omega = 2\pi f$, the constant on the right side of Eq. (71) becomes 1/4. Equality is achieved only for a Gaussian basis function, whose transform is also a Gaussian.

Because the support of a function can be described as the set of points where the function is nonzero, the Heisenberg uncertainty principle tells us that a function cannot be arbitrarily compactly supported in both time and frequency: $\sigma_t$ and $\sigma_f$ cannot both be made arbitrarily small. Hence the basis function $\delta(t - t_0)$ illustrated in Fig. 5.3(b), which is perfectly localized in time [i.e., $\sigma_t = 0$, since the width of $\delta(t - t_0)$ is zero], has a spectrum that is nonzero on the whole f-axis: because $\mathcal{F}\{\delta(t - t_0)\} = \exp(-j2\pi f t_0)$ and $|\exp(-j2\pi f t_0)| = 1$ for all f, we have $\sigma_f = \infty$. The result is an infinitesimally narrow, infinitely tall Heisenberg cell on the time-frequency plane. The basis function $\exp(j2\pi f_0 t)$ of Fig. 5.3(c), on the other hand, is nonzero along the entire time axis but perfectly localized in frequency: since $\mathcal{F}\{\exp(j2\pi f_0 t)\} = \delta(f - f_0)$, spectrum $|\delta(f - f_0)|$ is zero at all frequencies other than $f_0$, and the resulting Heisenberg cell is infinitely wide and infinitesimally small in height ($\sigma_f = 0$). As Figs. 5.3(b) and (c) illustrate, perfect localization in time is accompanied by a complete loss of localization in frequency, and vice versa.
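To make the uncertainty bound concrete, the sketch below (an illustration, not from the text; the grid parameters are arbitrary) numerically estimates $\sigma_t^2$ and $\sigma_f^2$ for a sampled Gaussian and checks that their product is close to the theoretical minimum $1/(16\pi^2)$ of Eq. (71).

```python
import numpy as np

# Sampled Gaussian h(t); its transform is also Gaussian, so Eq. (71)
# should hold with (near) equality: sigma_t^2 * sigma_f^2 = 1/(16*pi^2).
T, n = 40.0, 4096                     # window length and sample count (arbitrary)
t = np.linspace(-T / 2, T / 2, n, endpoint=False)
h = np.exp(-t**2)                     # Gaussian basis function

def moments(axis, density):
    """Mean and variance of a sampled probability density."""
    d = axis[1] - axis[0]
    p = density / (density.sum() * d)          # normalize to a pdf
    mu = (axis * p).sum() * d
    var = ((axis - mu)**2 * p).sum() * d
    return mu, var

_, var_t = moments(t, np.abs(h)**2)

H = np.fft.fftshift(np.fft.fft(h))
f = np.fft.fftshift(np.fft.fftfreq(n, d=T / n))
_, var_f = moments(f, np.abs(H)**2)

print("sigma_t^2 * sigma_f^2 =", var_t * var_f)
print("1/(16*pi^2)           =", 1 / (16 * np.pi**2))
```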

Returning to Fig. 5.2, note that the DFT basis of Fig. 5.2(a) and the standard basis of Fig. 5.2(h) are discrete examples (for N = 16) of the complex exponentials and impulses of Figs. 5.3(c) and (b), respectively. (Due to aliasing, the DFT basis functions do not appear to be ordered by frequency.) The other bases in the top half of Fig. 5.2 share both the frequency index u and a width or support of 16. For a given u, their locations in the time-frequency plane are similar. This is especially obvious when u = 8, where the Heisenberg cells of the basis functions coincide. For the other values of u, the $\mu_t$ and $\sigma_t$ of the Heisenberg cells are similar as well, with slight variations between the cosine, ramp, and square waveforms. Similarly, with the exception of the already described standard basis, the basis functions in the bottom half of Fig. 5.2 are also analogous for a given u. These basis functions are small waves, called wavelets, that are scaled and shifted versions of a mother wavelet:

$$\psi_{s,\tau}(t) = 2^{s/2}\, \psi(2^s t - \tau) \qquad \text{(eq-72)}$$

in which s and $\tau$ are integers, and the mother wavelet $\psi(t)$ is a real, square-integrable function with a bandpass-like spectrum. Parameter $\tau$ determines the position of $\psi_{s,\tau}(t)$ on the t-axis, s determines its width (how broad or narrow it is along the t-axis), and $2^{s/2}$ controls its amplitude. The functions in Fig. 5.2 analogous to these but having lowpass spectra are known as scaling functions. Together with a properly constructed mother wavelet, Eq. (72) generates a basis that is characterized by the Heisenberg cells on the right side of Fig. 5.4.

FIGURE 5.4

Time and frequency localization of 128-point Daubechies basis functions.

(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez

and Richard E. Woods)

By the time-scaling property of the Fourier transform,

$$\mathcal{F}\{\psi(2^s t)\} = \frac{1}{2^s}\, \Psi\!\left(\frac{f}{2^s}\right) \qquad \text{(eq-73)}$$

so for positive values of s the spectrum is spread: every frequency component is shifted higher by a factor of $2^s$. Compressing a function in time expands its spectrum, as was the case for a rectangular pulse. This is shown graphically in Figs. 5.4(b)-(d). Note that in Fig. 5.4(c) the width of the basis function is half that in (d), whereas the width of its spectrum is twice that of (d) and is shifted to a frequency twice as high. The same can be said of the basis function and spectrum of Fig. 5.4(b) in comparison to (c). Halving the time support while doubling the frequency support produces Heisenberg cells of differing widths and heights but identical areas. In addition, each row of cells on the right of Fig. 5.4 represents a single scale s and frequency range; the cells within a row are shifted in time with respect to one another. Since

$$\mathcal{F}\{\psi(t - \tau)\} = e^{-j2\pi f\tau}\, \Psi(f) \qquad \text{(eq-74)}$$

we have $|\mathcal{F}\{\psi(t - \tau)\}| = |\Psi(f)|$, and the spectra of the time-shifted wavelets are identical.
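The sketch below (illustrative only; the Haar mother wavelet is chosen for simplicity) generates $\psi_{s,\tau}(t) = 2^{s/2}\psi(2^s t - \tau)$ for a few scales and shifts, and confirms numerically that every member of the family has unit energy.

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: 1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def wavelet(t, s, tau):
    """Scaled and shifted wavelet of Eq. (72)."""
    return 2 ** (s / 2) * haar(2 ** s * t - tau)

t = np.linspace(-2, 6, 200_000)
dt = t[1] - t[0]
for s, tau in [(0, 0), (1, 0), (1, 3), (2, 5)]:
    psi = wavelet(t, s, tau)
    energy = np.sum(psi ** 2) * dt   # should be ~1 for every (s, tau)
    print(f"s={s}, tau={tau}: energy ~ {energy:.4f}")
```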


## 5.4 Basis Images

Since the inverse transformation kernel s(x, y, u, v) in Eq. (32) depends only on the indices x, y, u, v, and not on the values of f(x, y) or T(u, v), Eq. (32) can alternatively be written as the matrix sum

$$\mathbf{F} = \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} T(u,v)\, \mathbf{S}_{u,v} \qquad \text{(eq-75)}$$

where F is an N x N matrix containing the elements of f(x, y) and

$$\mathbf{S}_{u,v} = \begin{bmatrix} s(0,0,u,v) & \cdots & s(0,N-1,u,v) \\ \vdots & \ddots & \vdots \\ s(N-1,0,u,v) & \cdots & s(N-1,N-1,u,v) \end{bmatrix} \qquad \text{(eq-76)}$$

for $u, v = 0, \ldots, N-1$. F is then explicitly defined as a linear combination of $N^2$ matrices of size N x N, namely the $\mathbf{S}_{u,v}$ for $u, v = 0, \ldots, N-1$. When the underlying s(x, y, u, v) are real-valued, separable, and symmetric,

$$\mathbf{S}_{u,v} = \mathbf{s}_u \mathbf{s}_v^T \qquad \text{(eq-77)}$$

where $\mathbf{s}_u$ and $\mathbf{s}_v$ are as previously defined in Eq. (20). If F is a 2-D image, the $\mathbf{S}_{u,v}$ are called basis images in the context of digital image processing. As shown in Fig. 5.5(a), they can be arranged in an N x N array to give a concise view of the 2-D basis functions they represent.
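The following sketch (illustrative; it uses the DCT-II vectors defined later in this chapter as the 1-D basis) builds the $N^2$ basis images $\mathbf{S}_{u,v} = \mathbf{s}_u \mathbf{s}_v^T$ of Eq. (77) and verifies Eq. (75) by reconstructing a random image from its transform coefficients.

```python
import numpy as np

N = 8
# 1-D DCT-II basis vectors s_u (rows of the orthonormal DCT matrix).
A = np.array([[np.sqrt((1 if u == 0 else 2) / N) *
               np.cos((2 * x + 1) * u * np.pi / (2 * N))
               for x in range(N)] for u in range(N)])

F = np.random.rand(N, N)          # an arbitrary "image"
T = A @ F @ A.T                   # 2-D transform coefficients T(u, v)

# Reconstruct F as a linear combination of basis images, Eqs. (75) and (77).
F_rec = np.zeros((N, N))
for u in range(N):
    for v in range(N):
        S_uv = np.outer(A[u], A[v])      # basis image S_{u,v} = s_u s_v^T
        F_rec += T[u, v] * S_uv

print("max reconstruction error:", np.abs(F - F_rec).max())  # ~1e-15
```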

EXAMPLE: The basis images of the standard basis.

Figure 5.2(h) shows a particular instance (N = 16) of the standard basis $\mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_{N-1}$, where $\mathbf{e}_n$ is the N x 1 column vector that is 1 in the nth element and 0 in all other elements. Since this basis is real and orthonormal, the corresponding orthogonal transformation matrix is the identity matrix, and the transform of F with respect to the standard basis is F itself. A discrete function written in vector or matrix form is thus implicitly represented with respect to the standard basis.

Figure 5.5(b) shows the basis images of the 2-D standard basis. Just as the 1-D standard basis vectors are nonzero at only one instant (or x-value), the basis images of Fig. 5.5(b) are nonzero at only one point on the xy-plane. This is a consequence of Eq. (77), since $\mathbf{S}_{u,v} = \mathbf{e}_u \mathbf{e}_v^T = \mathbf{E}_{u,v}$, where $\mathbf{E}_{u,v}$ is a matrix of zeros with a single 1 in the uth row and vth column. In a similar way, the DFT basis images shown in Fig. 5.6 follow from Eq. (77), Eq. (22), and the defining equation of the 1-D DFT expansion functions [i.e., Eq. (56)]. Note that the DFT basis image of maximum frequency occurs when u and v are 4, just as the 1-D DFT basis function of maximum frequency occurred at u = 4 in Fig. 5.1.

FIGURE 5.5 (a) Basis image organization and (b) a standard basis of size 8 x 8. For clarity, a gray border has been added around each basis image. The origin of each basis image (i.e., x = y = 0) is at its top left.

(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez

and Richard E. Woods)

FIGURE 5.6 (a) Transformation matrix $\mathbf{A}_F$ of the discrete Fourier transform for N = 8, where $\omega = e^{-j2\pi/8} = (1-j)/\sqrt{2}$, and (b) and (c) the real and imaginary parts of the DFT basis images of size 8 x 8. For clarity, a black border has been added around each basis image.

(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez and Richard E. Woods)

For 1-D transforms, matrix $\mathbf{A}_F$ is used in conjunction with Eqs. (43) and (44); for 2-D transforms, it is used with Eqs. (41) and (42).


## 5.6 Fourier-Related Transforms

The Fourier transform of a real function is, in general, complex. In this section we discuss three Fourier-related transforms that are real rather than complex: the discrete Hartley transform, the discrete cosine transform, and the discrete sine transform. All three avoid the computational overhead of complex arithmetic and can be implemented via fast, FFT-like algorithms.

The Discrete Hartley Transform

The discrete Hartley transform (DHT) is a Fourier-related transform of discrete, periodic data, analogous to the discrete Fourier transform (DFT), with analogous applications in signal processing and related fields. Its primary difference from the DFT is that it transforms real inputs to real outputs, with no intrinsic involvement of complex numbers. Just as the DFT is the discrete analogue of the continuous Fourier transform (FT), the DHT is the discrete analogue of the continuous Hartley transform, which Ralph V. L. Hartley introduced in 1942.

The transformation matrix of the discrete Hartley transform is obtained by substituting the inverse transformation kernel below.

The function cas, shorthand for the cosine-and-sine function, is defined as $\text{cas}(\theta) = \cos(\theta) + \sin(\theta)$. The DHT kernel is

$$s(x,u) = \frac{1}{\sqrt{N}}\,\text{cas}\!\left(\frac{2\pi ux}{N}\right) = \frac{1}{\sqrt{N}}\left[\cos\!\left(\frac{2\pi ux}{N}\right) + \sin\!\left(\frac{2\pi ux}{N}\right)\right] \qquad \text{(eq-78)}$$

whose separable 2-D counterpart is

$$s(x,y,u,v) = \left[\frac{1}{\sqrt{N}}\,\text{cas}\!\left(\frac{2\pi ux}{N}\right)\right]\left[\frac{1}{\sqrt{N}}\,\text{cas}\!\left(\frac{2\pi vy}{N}\right)\right] \qquad \text{(eq-79)}$$

We shall not consider the nonseparable form $s(x,y,u,v) = \frac{1}{N}\,\text{cas}\!\left(\frac{2\pi(ux+vy)}{N}\right)$.

Since the resulting DHT transformation matrix, depicted in Fig. 5.7, is real, orthogonal, and symmetric, $\mathbf{A}_H = \mathbf{A}_H^T = \mathbf{A}_H^{-1}$ can be used in the computation of both the forward and inverse transforms. For 1-D transforms, $\mathbf{A}_H$ is used in combination with Eqs. (28) and (29); for 2-D transforms, Eqs. (35) and (36) are used. Since $\mathbf{A}_H$ is symmetric, the forward and inverse transforms are identical.


FIGURE 5.7

The transformation matrix and basis images of the discrete Hartley transform: (a) graphical representation of the orthogonal transformation matrix $\mathbf{A}_H$, (b) $\mathbf{A}_H$ rounded to two decimal places, and (c) the 2-D basis images.

(Reference from "Digital Image Pr ocessing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Note the resemblance of the harmonically related DHT basis functions in Fig. 5.7(a) to the real part of the DFT basis functions in Fig. 5.1. It is simple to show that

$$\mathbf{A}_H = \mathrm{Re}\{\mathbf{A}_F\} - \mathrm{Im}\{\mathbf{A}_F\} = \mathrm{Re}\{(1+j)\,\mathbf{A}_F\} \qquad \text{(eq-80)}$$

where $\mathbf{A}_F$ is the unitary transformation matrix of the DFT. In addition, consider the real part of the DFT expansion kernel (in Eqs. (81) and (82), subscripts F and H denote the Fourier and Hartley kernels, respectively):

$$\mathrm{Re}\{s_F(x,u)\} = \mathrm{Re}\left\{\frac{1}{\sqrt{N}}\, e^{j2\pi ux/N}\right\} = \frac{1}{\sqrt{N}}\cos\!\left(\frac{2\pi ux}{N}\right) \qquad \text{(eq-81)}$$

Using the trigonometric identity $\cos(\theta) + \sin(\theta) = \sqrt{2}\cos\!\left(\theta - \frac{\pi}{4}\right)$, the discrete Hartley kernel of Eq. (78) can be rewritten as

$$s_H(x,u) = \sqrt{\frac{2}{N}}\cos\!\left(\frac{2\pi ux}{N} - \frac{\pi}{4}\right) \qquad \text{(eq-82)}$$

The basis functions of the discrete Hartley and Fourier transforms are therefore scaled (by $\sqrt{2}$) and shifted (by $\pi/4$) versions of one another. The shift is evident when comparing Figs. 5.1 and 5.7(a).
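As a sketch (the $e^{-j2\pi ux/N}$ sign convention for the forward DFT matrix is an assumption of this demo, not stated in the text), the following verifies Eq. (80) and the involutory property $\mathbf{A}_H = \mathbf{A}_H^{-1}$ numerically:

```python
import numpy as np

N = 8
x = np.arange(N)
u = x.reshape(-1, 1)

# Unitary (forward) DFT matrix; the e^{-j...} sign convention is assumed here.
A_F = np.exp(-2j * np.pi * u * x / N) / np.sqrt(N)

# DHT matrix built from the cas kernel of Eq. (78).
A_H = (np.cos(2 * np.pi * u * x / N) +
       np.sin(2 * np.pi * u * x / N)) / np.sqrt(N)

# Eq. (80): A_H = Re{A_F} - Im{A_F}
print("Eq. (80) error:", np.abs(A_H - (A_F.real - A_F.imag)).max())

# A_H is real, orthogonal, and symmetric, hence its own inverse:
print("A_H @ A_H ~ I:", np.allclose(A_H @ A_H, np.eye(N)))
```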

The Discrete Cosine Transform

The most frequently encountered form of the discrete cosine transform (DCT) is obtained by substituting the inverse transformation kernel below. There are eight standard DCT variants, each resulting from a different set of assumed symmetry conditions; for example, the input may be assumed to be even about a sample or about a point midway between two samples. The kernel is

$$s(x,u) = \alpha(u)\cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \qquad \text{(eq-83)}$$

where

$$\alpha(u) = \begin{cases} \sqrt{1/N} & u = 0 \\ \sqrt{2/N} & u = 1, 2, \ldots, N-1 \end{cases} \qquad \text{(eq-84)}$$
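A minimal sketch building the N x N DCT matrix from Eqs. (83)-(84) and checking that it is orthogonal (so its transpose inverts it):

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT matrix from the kernel of Eqs. (83)-(84)."""
    A = np.empty((N, N))
    for u in range(N):
        alpha = np.sqrt((1 if u == 0 else 2) / N)
        for x in range(N):
            A[u, x] = alpha * np.cos((2 * x + 1) * u * np.pi / (2 * N))
    return A

A = dct_matrix(8)
print(np.allclose(A @ A.T, np.eye(8)))   # True: A^{-1} = A^T

f = np.random.rand(8)
t = A @ f                                # forward 1-D DCT
print(np.allclose(A.T @ t, f))           # inverse recovers f
```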

## 5.7 Walsh-Hadamard Transforms

The Walsh-Hadamard transform is an orthogonal transformation technique that decomposes a signal into a set of basis functions. It is non-sinusoidal and orthogonal in nature. Walsh functions, which are rectangular or square waves taking the values +1 or -1, serve as its basis functions. Walsh-Hadamard transforms are also referred to as Hadamard transforms, Walsh transforms, or Walsh-Fourier transforms, depending on the context. The Hadamard-ordered kernel is

$$s(x,u) = \frac{1}{\sqrt{N}}\,(-1)^{\sum_{k=0}^{n-1} b_k(x)\, b_k(u)} \qquad \text{(eq-85)}$$

where $N = 2^n$, the summation in the exponent of Eq. (85) is performed in modulo-2 arithmetic, and $b_k(z)$ is the kth bit in the binary representation of z. For example, if n = 3 and z = 6 (110 in binary), then $b_0(z) = 0$, $b_1(z) = 1$, and $b_2(z) = 1$. If N = 2, the resulting Hadamard-ordered transformation matrix is

$$\mathbf{A}_W = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$


WHTs with Hadamard (or natural) ordering are denoted by $\mathbf{A}_W$, the Hadamard- or natural-ordered transformation matrix. Although it is of size 2 x 2 in the case above, it is more generally of size N x N, where N is the dimension of the discrete function being transformed. The matrix on the right (without the scalar multiplier) is called a Hadamard matrix of order 2. Letting $\mathbf{H}_N$ denote the Hadamard matrix of order N, a simple recursive relationship for generating Hadamard-ordered transformation matrices is

$$\mathbf{H}_{2N} = \begin{bmatrix} \mathbf{H}_N & \mathbf{H}_N \\ \mathbf{H}_N & -\mathbf{H}_N \end{bmatrix}$$

where

$$\mathbf{H}_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

and

$$\mathbf{A}_W = \frac{1}{\sqrt{N}}\,\mathbf{H}_N$$
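A sketch of the recursion (illustrative only): build $\mathbf{H}_N$ by repeated block doubling and normalize to obtain $\mathbf{A}_W$.

```python
import numpy as np

def hadamard(N):
    """Hadamard matrix of order N (N must be a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.block([[H, H],
                      [H, -H]])       # H_{2N} from H_N
    return H

N = 8
A_W = hadamard(N) / np.sqrt(N)        # Hadamard-ordered WHT matrix
print(np.allclose(A_W @ A_W.T, np.eye(N)))   # orthogonal: True
```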


## 5.8 Slant Transform

The N x N slant transform matrices are defined by a block recursion, where $N = 2^n$ and $\mathbf{I}_M$ denotes an M x M identity matrix. The parameters $a_N$ and $b_N$ appearing in the recursion are defined by

$$b_{2N} = \left(1 + 4a_N^2\right)^{-1/2}, \qquad a_{2N} = 2\, b_{2N}\, a_N, \qquad a_1 = 1$$

which solve to give

$$a_{2N} = \left(\frac{3N^2}{4N^2 - 1}\right)^{1/2}, \qquad b_{2N} = \left(\frac{N^2 - 1}{4N^2 - 1}\right)^{1/2}$$

Using these formulas, the 4 x 4 slant transformation matrix is obtained as

$$\mathbf{S}_4 = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ \dfrac{3}{\sqrt{5}} & \dfrac{1}{\sqrt{5}} & -\dfrac{1}{\sqrt{5}} & -\dfrac{3}{\sqrt{5}} \\ 1 & -1 & -1 & 1 \\ \dfrac{1}{\sqrt{5}} & -\dfrac{3}{\sqrt{5}} & \dfrac{3}{\sqrt{5}} & -\dfrac{1}{\sqrt{5}} \end{bmatrix}$$
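A quick numerical sanity check (illustrative) that the 4 x 4 slant matrix above is orthonormal and that its second row is a discrete ramp (a "slant" basis vector):

```python
import numpy as np

s5 = np.sqrt(5)
S4 = 0.5 * np.array([
    [1,      1,      1,      1],
    [3/s5,   1/s5,  -1/s5,  -3/s5],   # linearly decreasing "slant" row
    [1,     -1,     -1,      1],
    [1/s5,  -3/s5,   3/s5,  -1/s5],
])

print(np.allclose(S4 @ S4.T, np.eye(4)))   # True: S^{-1} = S^T
print(np.diff(S4[1]))                       # constant steps -> a ramp
```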

Properties of the Slant Transform

(i) The slant transform is real and orthogonal:

$\mathbf{S} = \mathbf{S}^*$

$\mathbf{S}^{-1} = \mathbf{S}^T$

(ii) The slant transform is fast; it can be implemented in O(N log₂ N) operations on an N x 1 vector.

(iii) The energy compaction this transform achieves for images is rated in the very good to excellent range.

(iv) The basis vectors of the slant transform matrix S are not sequency-ordered for n ≥ 3.

## 5.9 Haar Transform

In general, the Haar functions $h_k(x)$ are defined on the continuous interval $x \in [0, 1]$ for $k = 0, \ldots, N-1$, where $N = 2^n$. The integer k can be decomposed uniquely as

$$k = 2^p + q - 1$$

where $0 \le p \le n-1$; $q = 0$ or 1 for $p = 0$; and $1 \le q \le 2^p$ for $p \ne 0$. For example, when N = 4, the values k = 0, 1, 2, 3 correspond to (p, q) = (0, 0), (0, 1), (1, 1), and (1, 2), respectively.

Representing k by (p, q), the Haar functions are defined as

$$h_0(x) = \frac{1}{\sqrt{N}}, \qquad x \in [0,1]$$

and

$$h_k(x) = \frac{1}{\sqrt{N}}\begin{cases} 2^{p/2} & \dfrac{q-1}{2^p} \le x < \dfrac{q-1/2}{2^p} \\[4pt] -2^{p/2} & \dfrac{q-1/2}{2^p} \le x < \dfrac{q}{2^p} \\[4pt] 0 & \text{otherwise for } x \in [0,1] \end{cases}$$

To produce the Haar transform, x is allowed to take the discrete values m/N for m = 0, 1, ..., N-1. For N = 8, the Haar transform matrix is

$$\mathbf{Hr} = \frac{1}{\sqrt{8}}\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ \sqrt{2} & \sqrt{2} & -\sqrt{2} & -\sqrt{2} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sqrt{2} & \sqrt{2} & -\sqrt{2} & -\sqrt{2} \\ 2 & -2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & -2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 & -2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 2 & -2 \end{bmatrix}$$
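A sketch (illustrative) that generates the N x N Haar matrix directly from the (p, q) definition above and checks orthogonality:

```python
import numpy as np

def haar_matrix(N):
    """Haar matrix of size N x N (N a power of 2), rows sampled at x = m/N."""
    x = np.arange(N) / N
    H = np.zeros((N, N))
    H[0] = 1.0                                   # h_0(x) = 1
    for k in range(1, N):
        p = int(np.floor(np.log2(k)))            # from k = 2^p + q - 1
        q = k - 2**p + 1
        lo, mid, hi = (q - 1) / 2**p, (q - 0.5) / 2**p, q / 2**p
        H[k, (x >= lo) & (x < mid)] = 2**(p / 2)
        H[k, (x >= mid) & (x < hi)] = -2**(p / 2)
    return H / np.sqrt(N)

Hr = haar_matrix(8)
print(np.allclose(Hr @ Hr.T, np.eye(8)))   # True: Hr is orthogonal
```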


Properties of the Haar Transform

1. The Haar transform is real and orthogonal. Therefore,

$\mathbf{Hr} = \mathbf{Hr}^*$

$\mathbf{Hr}^{-1} = \mathbf{Hr}^T$

2. The Haar transform is extremely fast to compute: it can be applied to an N x 1 vector in O(N) operations.

3. The basis vectors of the Haar matrix are ordered by frequency (sequency).

4. For images, the energy compaction of the Haar transform is poor.

## 5.10 Wavelet Transforms

The wavelet transform is quite similar to the Fourier transform (and even more similar to the windowed Fourier transform), but it uses an entirely different merit function. The most significant distinction is this: the Fourier transform decomposes the input signal into sines and cosines, i.e., functions that are localized only in Fourier space, whereas the wavelet transform employs functions that are localized in both real and Fourier space. In general, the wavelet transform can be expressed by the following equation:

$$F(a,b) = \int_{-\infty}^{\infty} f(x)\, \psi^{*}_{(a,b)}(x)\, dx$$

where * denotes the complex conjugate and $\psi$ is some function. This function can be chosen almost arbitrarily, provided it complies with a few rules.

Because of this, the wavelet transform is actually an infinite collection of different transforms, depending on the merit function employed in its computation. This is the primary reason the term "wavelet transform" appears in such a variety of settings and applications, including computer science. There are many ways of categorizing the types of wavelet transformations; here we present only the division based on wavelet orthogonality. Orthogonal wavelets are employed when developing discrete wavelet transforms, and non-orthogonal wavelets when developing continuous wavelet transforms. The following are the characteristics of these two transformations:

The discrete wavelet transform returns a data vector of the same length as the input. In most cases, many entries of this vector are close to zero. This reflects the fact that the signal is decomposed into a set of wavelets (functions) that are orthogonal to their own translations and scalings. The signal is thus decomposed into a spectrum of wavelet coefficients whose count is the same as, or smaller than, the number of signal data points. Because a wavelet spectrum of this type contains no redundant information, it is particularly well suited to signal processing and compression applications.

The continuous wavelet transform, by contrast, yields an array one dimension larger than the input data: from a one-dimensional signal we obtain an image of the time-frequency plane. We can clearly see the evolution of the signal's frequencies over its duration and compare its spectrum with the spectra of other signals. Because a non-orthogonal collection of wavelets is used, the data are strongly correlated and carry a significant amount of redundancy, which makes the results easier to interpret visually.

Discrete Wavelet Transform

The discrete wavelet transform (DWT) is an implementation of the wavelet transform that employs a discrete subset of wavelet scales and translations obeying certain rules. This transform decomposes the signal into a set of wavelets that are mutually orthogonal to one another, which is the primary difference between it and the continuous wavelet transform (CWT), or the CWT's discrete time-series implementation, sometimes referred to as the discrete-time continuous wavelet transform (DT-CWT).

The wavelet can be generated from a scaling function that describes the scaling properties of the wavelet. The requirement that the scaling function be orthogonal to its discrete translations imposes certain mathematical conditions on it, for example the dilation equation

$$\phi(x) = \sum_{k} a_k\, \phi(Sx - k)$$

where S is a scaling factor (normally 2).
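As a sketch of the idea (a single-level DWT using the Haar wavelet; the filter coefficients shown are the standard orthonormal Haar pair, and the even-length input is an assumption):

```python
import numpy as np

def haar_dwt_level(f):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    f = np.asarray(f, dtype=float)
    approx = (f[0::2] + f[1::2]) / np.sqrt(2)   # lowpass (scaling) branch
    detail = (f[0::2] - f[1::2]) / np.sqrt(2)   # highpass (wavelet) branch
    return approx, detail

def haar_idwt_level(approx, detail):
    """Invert one level exactly; no information is lost (orthogonal transform)."""
    f = np.empty(2 * approx.size)
    f[0::2] = (approx + detail) / np.sqrt(2)
    f[1::2] = (approx - detail) / np.sqrt(2)
    return f

f = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt_level(f)
print(d)                                        # small where the signal is smooth
print(np.allclose(haar_idwt_level(a, d), f))    # True: perfect reconstruction
```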


## Color Image Processing

The use of color in image processing is motivated by two key factors. First, color is an important descriptor that often simplifies the identification and extraction of objects from an image. Second, humans can discern thousands of color shades but only about two dozen shades of grey. Color image processing is divided into two main areas: full-color and pseudo-color processing. In the first, the images in question are usually acquired with a full-color sensor, such as a color TV camera or a color scanner. In the second, the task is to assign a color to a particular monochrome intensity or range of intensities. Until recently, much digital color image processing was performed at the pseudo-color level. Over the last decade, however, color sensors and color image processing hardware have become available at affordable prices. As a result, full-color image processing techniques are now used in a wide variety of applications, including publishing, visualization and the Internet.

After completing this chapter, learners will be able to:

- Understand color fundamentals and the color spectrum.
- Know the color models used most often in digital image processing.
- Understand how to implement fundamental pseudo-color image processing techniques, including intensity slicing and intensity-to-color transformations.
- Know how to decide whether a gray-scale approach can be extended to color images.
- Know how to work with full-color images, including color transformations, color complements and tone/color corrections.
- Understand the effect of noise in the processing of color images.
- Know how to perform spatial filtering on color images.
- Understand the benefits of using color in image segmentation.

## 5.11 Colour Fundamentals

We begin by introducing the visible light spectrum. Visible light, also known as white light, comprises a series of component colors. When white light passes through a triangular prism, these colors can be observed. Figure 5.8 shows the colors separated from the white light by the prism: red, orange, yellow, green, blue and violet. This separation of visible light into its component colors is known as dispersion.

Figure 5.8 Color spectrum seen by passing white light through a prism.

James Clerk Maxwell demonstrated that light is a form of electromagnetic radiation. This radiation ranges from radio waves through visible light to gamma rays. Figure 5.9 shows the electromagnetic spectrum, which extends beyond visible radiation to radio waves on one side and gamma rays on the other. The visible light band occupies a very small part of the electromagnetic spectrum. Sunlight is visible radiation that extends somewhat beyond the red (infrared, IR) and violet (ultraviolet, UV) ends of the visible band, with maximum intensity in the yellow.

Figure 5.9. The electromagnetic spectrum

Fig. 5.9. The electromagnetic spectrum, which encompasses the visible region of light, extends from gamma rays with wavelengths of one hundredth of a nanometer to radio waves with wavelengths of one meter or greater.

Regarding light as an electromagnetic wave, the spectral signature of a color can be identified by its wavelength. We perceive the shortest visible wavelengths as violet and the longest as red. Visible light is the range of electromagnetic wavelengths to which the eye responds; the human eye cannot detect radiation of longer or shorter wavelengths.

Figure 5.9a. A wave representation of three different light hues: red, yellow-green and violet, each with a different wavelength, which represents the distance between wave crests.

Fig. 5.9a shows three representative waves of visible light. The wavelength is the distance from one crest to the next, and is denoted by the Greek letter lambda, $\lambda$. Violet light is electromagnetic radiation with a wavelength of about 410 nanometers, and red light has a wavelength of about 680 nanometers.

The human visual system can distinguish hundreds of thousands of different colors and intensities, but only about 100 shades of grey. A color image therefore carries a great deal of additional information, which can be used to simplify image analysis, e.g., object identification and extraction based on color.

Three separate quantities are used to define a specific color. The dominant wavelength determines the hue; the wavelengths of visible colors range from approximately 410 nm (violet) to 680 nm (red), as shown in Figure 5.9a.

The saturation depends on the excitation purity, that is, on the amount of white light mixed with the hue. A pure hue is fully saturated, i.e., contains no mixed white light. Hue and saturation together determine the chromaticity of a given color. Finally, the actual amount of light determines the intensity, with more intense colors corresponding to more light.

Achromatic light has no color; its only attribute is quantity, or intensity. Grey level is a measure of intensity, and intensity is determined by the physical energy of the light. Brightness, or luminance, on the other hand, is determined by the perception of the color and is therefore psychological. Given blue and green light of equal intensity, blue is perceived as much darker than green. Note also that our perception of intensity is nonlinear: normalized intensity changes from 0.1 to 0.11 and from 0.5 to 0.55 are perceived as equal changes in brightness.

The color of an object depends mainly on its reflecting properties: we see the reflected wavelengths, while the others are absorbed. The function of the human visual system must also be taken into account. For instance, an object that reflects both red and green appears green when illuminated with green light but no red, and appears red in the absence of green light. In pure white light it appears yellow (= red + green).

Tristimulus theory of color perception

The human retina contains three kinds of cones. Figure 5.10 shows the response of each cone type as a function of the wavelength of the incident light. The curves peak at 440 nm (blue), 545 nm (green) and 580 nm (red). Note that the last two actually peak in the yellow range of the spectrum.

Figure 5.10 : Spectral response curves for each cone type. The peaks for each curve

are at 440nm (blue), 545nm (green) and 580nm (red).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


CIE primaries

The tristimulus theory of color perception suggests that any color can be produced by a suitable combination of three primaries: red, green and blue. Almost all visible colors can indeed be matched this way, but some cannot. For such a color, however, a match can be obtained if one of the primaries is added to the color being matched and balanced by a combination of the other two; that primary can then be regarded as carrying a negative weight. In 1931 the Commission Internationale de l'Éclairage (CIE) therefore specified three standard primaries, named X, Y and Z, combinations of which span all visible colors. The primary Y is chosen so that its color-matching function exactly matches the luminous efficiency function of the human eye.

Figure 5.11: The CIE chromaticity diagram showing all visible colors. x and y are the normalized amounts of the X and Y primaries present, and hence z = 1 - x - y gives the amount of the Z primary required.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


All visible colors appear in the CIE chromaticity diagram (see Figure 5.11). The x and y axes give the normalized amounts of the X and Y primaries in a given color, and z = 1 - x - y then gives the required amount of the Z primary. Chromaticity is independent of the luminous energy and depends only on dominant wavelength and saturation; colors of the same chromaticity but different luminance all map to the same point in the diagram.

The pure spectrum colors lie on the curved boundary, and standard white light is located near the equal-energy point x = y = z = 1/3. Colors along a line from this point to the boundary are obtained by adding white to the pure color at the endpoint. Any color on a line in the chromaticity diagram can be produced by combining the colors at the line's endpoints, as shown in Figure 5.12. More generally, all colors inside a triangle can be formed by mixing the colors at its vertices.

Figure 5.12 : Mixing colors on the chromaticity diagram.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

All colors on the line IJ can be obtained by mixing colors I and J, and all colors in

the triangle IJK can be obtained by mixing colors I, J and K.

## 5.12 Color Models

Color models provide a standard way to specify a particular color by defining a 3-D coordinate system and a subspace that contains all the colors constructible within the model. Any color describable by a model corresponds to a single point within the subspace it defines. Each color model is oriented either toward hardware (RGB, CMY, and YIQ) or toward image processing applications (HSI).


The RGB Model

In the RGB model, an image consists of three independent image planes, red, green and blue, one for each primary color. (The standard wavelengths of the three primaries are those shown in Figure 5.10.) A particular color is specified by the amount of each primary component present. Figure 5.13 shows the geometry of the RGB color model for specifying colors using a Cartesian coordinate system. The greyscale spectrum, i.e., the colors made of equal amounts of each primary, lies on the line joining the black and white vertices.

Figure 5.13 : The RGB color cube. The greyscale spectrum lies on the line joining

the black and white vertices.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

RGB is an additive model: the colors present in the light add to form new colors, which makes it appropriate, for example, for mixing colored light. The image on the left of Figure 5.15 shows the additive mixing of red, green and blue primaries to form the three secondary colors yellow (red + green), cyan (blue + green) and magenta (red + blue), and white (red + green + blue).

The RGB model is used for color monitors and most video cameras.

The number of bits used to represent each pixel in RGB space is called the pixel depth. Consider an RGB image in which each of the red, green, and blue images is an 8-bit image. Under these conditions, each RGB color pixel [that is, a triplet of values (R, G, B)] has a depth of 24 bits (3 image planes times the number of bits per plane). The term full-color image is often used to denote a 24-bit RGB color image. The total number of possible colors in a 24-bit RGB image is $(2^8)^3 = 16{,}777{,}216$. Figure 5.14 shows the 24-bit RGB color cube corresponding to the diagram in Figure 5.13. Note that for digital images, the range of values in the cube is scaled to the numbers representable by the bit depth of the images. If the primary images are 8-bit, the limits of the cube along each axis become [0, 255]; white, for instance, is then at the point [255, 255, 255] in the cube.

Figure 5.14 A 24-bit RGB color cube.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The CMY and CMYK Color Models

The CMY (cyan-magenta-yellow) model is a subtractive model, appropriate for the absorption of color, for example by colored pigments in paints. Whereas the RGB model determines what is added to black to obtain a particular color, the CMY model determines what is subtracted from white. In this case the primaries are cyan, magenta and yellow, and the secondary colors are red, green and blue.

When a surface coated with cyan pigment is illuminated with white light, no red light is reflected: cyan subtracts red from the reflected white light, which itself is composed of red, green and blue. The relationship between the RGB and CMY models is given by:

$$\begin{bmatrix} C \\ M \\ Y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} R \\ G \\ B \end{bmatrix} \qquad \text{(eq-1)}$$

Most devices that deposit colored pigments on paper, such as color printers and copiers, require CMY data input or perform an internal RGB-to-CMY conversion. The conversion is carried out with the simple operation of Eq. (1), where all color values are assumed to be normalized to the range [0, 1]. Equation (1) shows that light reflected from a pure cyan surface contains no red (i.e., C = 1 - R in the equation). Likewise, pure magenta does not reflect green, and pure yellow does not reflect blue. Equation (1) also shows that RGB values can be obtained easily from a set of CMY values by subtracting each CMY value from 1. As noted earlier, this color model is used mainly for generating hardcopy output in image processing, so the inverse CMY-to-RGB operation is usually of little practical interest.

The CMY model is used by printing devices and filters.

Figure 5.15: The figure on the left shows the additive mixing of red, green and blue primaries to form the three secondary colors yellow (red + green), cyan (blue + green) and magenta (red + blue), and white (red + green + blue). The figure on the right shows the three subtractive primaries and their pairwise combinations to form red, green and blue, and finally black by subtracting all three primaries from white.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Equal amounts of the pigment primaries cyan, magenta, and yellow should, in principle, produce black. In practice, combining these colors for printing produces a muddy-looking black. To produce true black (which is the dominant color in printing), a fourth color, black, denoted by K, is added, giving rise to the CMYK color model. Black is applied in the amounts needed to render true black. Thus, when publishers speak of "four-color printing," they are referring to the three CMY colors plus black.


To convert from CMY to CMYK, first compute

K = min(C, M, Y)   (eq-2)

If K = 1, we have pure black with no color contributions, and it follows that

C = 0   (eq-3)

M = 0   (eq-4)

Y = 0   (eq-5)

Otherwise (K < 1):

C = (C - K)/(1 - K)   (eq-6)

M = (M - K)/(1 - K)   (eq-7)

Y = (Y - K)/(1 - K)   (eq-8)

The C, M, and Y on the right side of Eqs. (2)-(8) are in the CMY color system; the C, M, and Y on the left of Eqs. (3)-(8) are in the CMYK system. All values are assumed to be in the range [0, 1]. The conversions from CMYK back to CMY are:

C = C(1 - K) + K   (eq-10)

M = M(1 - K) + K   (eq-11)

Y = Y(1 - K) + K   (eq-12)

The C, M, Y, and K on the right side of Eqs. (10)-(12) are in the CMYK color system; the C, M, and Y on the left of these equations are in the CMY system.
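A sketch implementing Eqs. (1)-(12) (illustrative; inputs are assumed normalized to [0, 1]):

```python
def rgb_to_cmyk(r, g, b):
    """RGB -> CMY (Eq. 1) -> CMYK (Eqs. 2-8); all values in [0, 1]."""
    c, m, y = 1 - r, 1 - g, 1 - b          # Eq. (1)
    k = min(c, m, y)                        # Eq. (2)
    if k == 1:                              # pure black: Eqs. (3)-(5)
        return 0.0, 0.0, 0.0, 1.0
    return ((c - k) / (1 - k),              # Eqs. (6)-(8)
            (m - k) / (1 - k),
            (y - k) / (1 - k), k)

def cmyk_to_rgb(c, m, y, k):
    """CMYK -> CMY (Eqs. 10-12) -> RGB (Eq. 1 inverted)."""
    c, m, y = (c * (1 - k) + k,
               m * (1 - k) + k,
               y * (1 - k) + k)
    return 1 - c, 1 - m, 1 - y

rgb = (0.2, 0.5, 0.4)
print(cmyk_to_rgb(*rgb_to_cmyk(*rgb)))   # recovers (0.2, 0.5, 0.4)
```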

The HSI Color Model

The HSI model improves on the RGB model for describing color in human terms. The hue-saturation-intensity color model corresponds closely to the color-sensing properties of human vision. Converting from RGB to HSI and back is more involved than for the other color models. I denotes the intensity of the light; H denotes the hue; and S denotes the saturation, a measure of the purity of the color. A high saturation value indicates that the color contains little white.

Three quantities, hue, saturation and intensity, thus define a color. This is the HSI model, and the complete space of specifiable colors is shown in Figure 5.16.

Figure 5.16: The HSI model, with the HSI solid on the left and, on the right, the HSI triangle formed by a horizontal slice through the solid at a particular intensity. Hue is measured by the angle from red, and saturation by the distance from the vertical axis. The colors on the surface of the solid are fully saturated, i.e. pure colors; the grey spectrum lies on the axis of the solid, where hue is undefined.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Conversion between the RGB model and the HSI model is somewhat complicated. The intensity is given by

$$I = \frac{R + G + B}{3}$$

where R, G and B are the amounts of the red, green and blue components, normalized to the range [0, 1]. The intensity is therefore simply the average of the red, green and blue components. The saturation is given by

$$S = 1 - \frac{\min(R, G, B)}{I} = 1 - \frac{3}{R + G + B}\min(R, G, B)$$

where the min(R, G, B) term indicates the amount of white present. If any of R, G or B is zero, there is no white and we have a pure, fully saturated color.


This quantity is easy to compute and interpret. The HSI (hue, saturation, intensity) color model decouples the intensity component from the color-carrying information (hue and saturation) in a color image. The HSI model is therefore an ideal tool for developing image processing algorithms based on color descriptions that are natural and intuitive to humans. The primary colors are separated by 120°; the secondary colors are 60° from the primaries, which means the angle between secondaries is also 120°. Figure 5.17 shows the hexagonal shape and an arbitrary color point (shown as a dot).

The hue of the point is determined by an angle from a reference direction. Conventionally, an angle of 0° from the red axis designates a hue of 0, and the hue increases counterclockwise from there. The saturation (the distance from the vertical axis) is the length of the vector from the origin to the point. The origin is defined by the intersection of the color plane with the vertical intensity axis. The important components of the HSI color space are therefore the vertical intensity axis, the length of the vector to a color point, and the angle this vector makes with the red axis.

Figure 5.17 Hue and saturation in the HSI color model. The dot is an arbitrary color point. The angle from the red axis gives the hue, and the length of the vector gives the saturation. The intensity of all colors in any of these planes is given by the position of the plane on the vertical intensity axis.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Figure 5.18 The HSI color model based on (a) triangular, and (b) circular color

planes. The triangles and circles are perpendicular to the vertical intensity axis.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Converting colors from RGB to HSI

Consider RGB values normalized to the range [0, 1]. Given an RGB value, H is obtained as follows:

$$H = \begin{cases} \theta & \text{if } B \le G \\ 360 - \theta & \text{if } B > G \end{cases}$$

H can be normalized to the range [0, 1] by dividing the quantity computed above by 360. The angle $\theta$ is given by

$$\theta = \cos^{-1}\left\{\frac{\frac{1}{2}\left[(R-G) + (R-B)\right]}{\left[(R-G)^2 + (R-B)(G-B)\right]^{1/2}}\right\}$$

and is measured with respect to the red axis of HSI space. Saturation is given by

$$S = 1 - \frac{3}{R+G+B}\left[\min(R, G, B)\right]$$

and the intensity component is given by

$$I = \frac{R+G+B}{3}$$

Converting colors from HSI to RGB

Consider HSI values in the interval [0, 1]. H should first be multiplied by 360 to recover the angle in degrees; further computation is then based on the value of H.

RG sector (0° ≤ H < 120°):

$$B = I(1 - S)$$

$$R = I\left[1 + \frac{S\cos H}{\cos(60° - H)}\right]$$

$$G = 3I - (R + B)$$

GB sector (120° ≤ H < 240°): let $H' = H - 120°$. Then

$$R = I(1 - S)$$

$$G = I\left[1 + \frac{S\cos H'}{\cos(60° - H')}\right]$$

$$B = 3I - (R + G)$$

BR sector (240° ≤ H ≤ 360°): let $H' = H - 240°$. Then

$$G = I(1 - S)$$

$$B = I\left[1 + \frac{S\cos H'}{\cos(60° - H')}\right]$$

$$R = 3I - (G + B)$$
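A sketch of both conversions (illustrative; all channels are assumed normalized to [0, 1], with H also stored in [0, 1] as described above):

```python
import math

def rgb_to_hsi(r, g, b):
    """RGB -> HSI; all values in [0, 1]. H is arbitrary when S == 0."""
    i = (r + g + b) / 3
    s = 0.0 if i == 0 else 1 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    ratio = max(-1.0, min(1.0, num / den)) if den else 1.0  # guard greys
    theta = math.degrees(math.acos(ratio))
    h = theta if b <= g else 360 - theta
    return h / 360, s, i

def hsi_to_rgb(h, s, i):
    """HSI -> RGB; the sector determines which channel gets I(1 - S)."""
    h *= 360
    hp = h if h < 120 else (h - 120 if h < 240 else h - 240)
    a = i * (1 - s)
    b2 = i * (1 + s * math.cos(math.radians(hp)) /
              math.cos(math.radians(60 - hp)))
    c = 3 * i - (a + b2)
    if h < 120:                       # RG sector: (R, G, B) = (b2, c, a)
        return b2, c, a
    if h < 240:                       # GB sector
        return a, b2, c
    return c, a, b2                   # BR sector

print(hsi_to_rgb(*rgb_to_hsi(0.6, 0.3, 0.1)))  # ~ (0.6, 0.3, 0.1)
```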


A Device-Independent Color Model

The CMYK and RGB color models are device-dependent: they specify how a color is produced on a particular device but carry no information about how the final observer will perceive it. A color with identical RGB parameters is perceived differently depending on the brightness, contrast and sharpness of the monitor, the ambient light, and the angle at which we view the screen. In the CMYK model, a person's perception of color depends on an even greater variety of conditions (for example, color on glossy paper looks more vivid and rich than on matte paper), as well as on the inks, the humidity of the paper as it dries, the attributes of the printing press, and so on.

To transmit more reliable color information, color profiles can be attached to device-dependent color models. Each profile contains information about a specific means of color reproduction and adapts the final color by adding or removing constituents of the original color settings. For instance, a color profile used for printing on glossy film might remove 10% cyan and add 5% yellow to the original color, because of the particular attributes of the printing press, the film itself and other conditions. Even attached profiles, however, cannot solve every color-transfer problem.

Device-independent color models need no such information about color reproduction: they describe, mathematically, the color perceived by a person with normal color vision. A device-independent color space is one in which the coordinates used to define a color produce the same color wherever they are applied. The CIE L*a*b* color space (known as CIELAB, and based on the human visual system) is an example of a device-independent color space.

The color components are given by the following equations:

$$L^* = 116\, h\!\left(\frac{Y}{Y_W}\right) - 16$$

$$a^* = 500\left[h\!\left(\frac{X}{X_W}\right) - h\!\left(\frac{Y}{Y_W}\right)\right]$$

and

$$b^* = 200\left[h\!\left(\frac{Y}{Y_W}\right) - h\!\left(\frac{Z}{Z_W}\right)\right]$$

where

$$h(q) = \begin{cases} \sqrt[3]{q} & q > 0.008856 \\ 7.787\, q + \dfrac{16}{116} & q \le 0.008856 \end{cases}$$

Here $X_W$, $Y_W$ and $Z_W$ are the reference white tristimulus values: the values of X, Y and Z for a perfectly reflecting diffuser illuminated by the CIE standard D65 illuminant (defined by x = 0.3127 and y = 0.3290 in the CIE chromaticity diagram of Figure 5.11). The L*a*b* color space is colorimetric (colors perceived as matching are encoded identically) and perceptually uniform (color differences among various hues are perceived uniformly; see the classic paper by MacAdam [1942]). Although L*a*b* values cannot be viewed directly (conversion to another color space is required), the L*a*b* gamut covers the entire visible spectrum and can accurately represent the colors of any display, print, or input device. Like the HSI system, the L*a*b* system decouples intensity (represented as lightness L*) from color (represented as a* for red minus green and b* for green minus blue), which makes it useful both for image manipulation (tone and contrast editing) and for image compression applications.

The principal benefit of calibrated imaging systems is that tonal and color imbalances can be corrected interactively and independently, in two sequential operations. Problems with an image's tonal range are resolved before irregularities such as over- and under-saturated colors are addressed. The tonal range of an image, also known as its key type, refers to its overall distribution of color intensity. Most of the information in high-key images is concentrated at high (or light) intensities; the colors of low-key images are located mainly at low intensities; middle-key images lie in between. As in the monochrome case, it is often desirable to distribute the intensities of a color image equally between the highlights and the shadows. The examples that follow show a variety of color transformations for the correction of tonal and color imbalances.
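A sketch of the forward conversion above (illustrative; the D65 white-point values used are the common two-degree-observer ones and are an assumption of this snippet):

```python
def xyz_to_lab(x, y, z, white=(0.9505, 1.0000, 1.0890)):
    """CIE XYZ -> L*a*b* using the equations above.

    `white` is the reference white (X_W, Y_W, Z_W); the D65 default
    here is an assumed, commonly used value.
    """
    def h(q):
        return q ** (1 / 3) if q > 0.008856 else 7.787 * q + 16 / 116

    xw, yw, zw = white
    L = 116 * h(y / yw) - 16
    a = 500 * (h(x / xw) - h(y / yw))
    b = 200 * (h(y / yw) - h(z / zw))
    return L, a, b

print(xyz_to_lab(0.9505, 1.0, 1.089))   # reference white -> (100, 0, 0)
```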

## 5.13 Pseudocolor Image Processing

Pseudo-color (also known as false-color) processing means assigning colors to grey values on the basis of a given criterion. The term pseudo- or false color distinguishes this color-assignment process from the processes associated with true-color images. The principal use of pseudo color is human visualization and interpretation of gray-scale events in an image or sequence of images. One of the main motivations for using color is that human beings can distinguish thousands of color shades and intensities, but only about two dozen shades of grey.

Intensity Slicing and Color Coding

Intensity slicing is a straightforward example of pseudocolor image processing; it is also known as color coding or density slicing. Imagine a greyscale image as a 3-D function, with intensity as the third dimension over the pixel-position coordinates. Placing a plane parallel to the coordinate plane "slices" the image function into two parts, and different colors can then be assigned to the two levels. Figure 5.19 shows an example of using a plane at f(x, y) = $l_i$ to slice the image function into two levels.

If each side of the plane shown in Figure 5.19 is assigned a different color, a pixel whose grey level lies above the plane is coded in one color, while any pixel below it is coded in the other. Levels lying on the plane itself may be assigned either of the two colors arbitrarily. The result is a two-color image whose relative appearance can be controlled by moving the slicing plane up and down.

The technique can be formulated as follows. Let [0, L-1] represent the grey scale, let level $l_0$ represent black [f(x, y) = 0], and let level $l_{L-1}$ represent white [f(x, y) = L-1]. Suppose that P planes perpendicular to the intensity axis are defined at levels $l_1, l_2, \ldots, l_P$. Then, assuming 0 < P < L-1, the P planes partition the grey scale into P+1 intervals $I_1, I_2, \ldots, I_{P+1}$. Intensity-to-color assignments are made according to the relation

$$f(x,y) = c_k \quad \text{if } f(x,y) \in I_k$$

where $c_k$ is the color associated with the kth intensity interval $I_k$, defined by the partitioning planes at l = k-1 and l = k.

Figure 5.19 Geometric interpretation of the intensity slicing technique.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)
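A sketch of intensity slicing (illustrative; the number of slices and the color table are arbitrary choices):

```python
import numpy as np

def intensity_slice(gray, levels, colors):
    """Map each grey level to the color of the interval it falls in.

    gray:   2-D uint8 array.
    levels: increasing slice levels l_1 < ... < l_P.
    colors: P + 1 RGB triples, one per interval I_1 ... I_{P+1}.
    """
    colors = np.asarray(colors, dtype=np.uint8)
    k = np.digitize(gray, levels)          # interval index for every pixel
    return colors[k]                       # H x W x 3 pseudocolor image

gray = (np.arange(256, dtype=np.uint8)    # simple test ramp image
        .reshape(1, -1).repeat(8, axis=0))
levels = [64, 128, 192]                   # P = 3 slicing planes
colors = [(0, 0, 255), (0, 255, 0), (255, 255, 0), (255, 0, 0)]
rgb = intensity_slice(gray, levels, colors)
print(rgb.shape)                           # (8, 256, 3)
```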

The concept of planes is particularly useful for interpreting the intensity-slicing technique in geometric terms. Figure 5.20 shows an alternative representation that defines the same mapping as Figure 5.19: any input grey level is assigned one of two colors, depending on whether the mapping function lies above or below the value $l_i$. When more levels are used, the mapping function takes on a staircase form.


Figure 5.20 An alternative representation of the intensity-slicing technique.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Figure 5.21 shows a simple yet practical application of intensity slicing. Figure 5.21(a) is a grayscale image of the Picker Thyroid Phantom, a radiation test pattern, and Figure 5.21(b) shows the result of intensity slicing this image into eight colors. Regions that appear of roughly constant intensity in the grayscale image are in fact quite variable, as the different colors in the sliced image show. The left lobe, for instance, is a dull grey in the grayscale image, which makes variations in its intensity difficult to pick out. By contrast, the eight regions of constant intensity are clearly visible in the color image, one for each of the colors used. By varying the number of colors and the intensity intervals, the characteristics of the intensity variations in a grayscale image can be determined quickly. This is especially useful in situations like the one shown here, where the object of interest has a uniform texture with intensity variations that are difficult to analyze visually.

Figure 5.21 (a) Grayscale image of the Picker Thyroid Phantom. (b) Result of

intensity slicing using eight colors.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


The preceding simple example divided the grayscale into intervals and assigned each a different color, regardless of the physical meaning of the grey levels; the interest was simply in viewing the various grey levels composing the image. Intensity slicing plays a much more meaningful role when the grayscale is divided according to physical characteristics of the image. Figure 5.22(a), for example, shows an X-ray image of a weld (the wide dark horizontal region) containing multiple cracks and porosities (the bright streaks running horizontally through the middle of the image). Where a porosity or crack is present in the weld, the full strength of the X-rays passing through the object saturates the image sensor. Thus, intensity values of 255 in an 8-bit image acquired by such a system automatically imply a problem with the weld.

If welds are inspected by visual analysis (a method still widely used), a simple color coding that assigns one color to level 255 and another to all other intensity levels can make the inspector's work significantly easier. The result is shown in Figure 5.22(b); no explanation is needed to conclude that human error rates would be lower if images were presented in this form. In other words, if one is looking for a specific intensity or range of intensities, intensity slicing is a simple but effective visualization aid, especially when many images must be inspected routinely.

Figure 5.22 (a) X-ray image of a weld. (b) Result of color coding.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Intensity-to-Color Transformations

The idea behind this approach is to perform three independent transformations on the grey level of each input pixel. The three results are then fed separately into the red, green and blue channels of a color television monitor. This technique produces a composite image whose color content is modulated by the nature of the transformation functions. Note that these are transformations of the grey-level values of an image and are not functions of position.

In intensity slicing, piecewise linear functions of the grey level are used to produce colors. This method, however, can also be based on smooth, nonlinear functions, which, as might be expected, provides significant flexibility.

Figure 5.23 Functional block diagram for pseudocolor image processing. Images $f_R$, $f_G$, and $f_B$ are fed into the corresponding red, green, and blue inputs of an RGB color monitor.

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The type of processing just described is very powerful for visualizing events of interest in complex images, especially when the events lie beyond our normal sensing capabilities. Figure 5.24 is an excellent example. It shows images of the Jupiter moon Io, rendered in pseudo color by combining several images from sensors aboard the Galileo spacecraft, some of which respond to spectral regions that are invisible to the eye. By understanding the physical and chemical processes likely to influence each sensor's response, it is possible to combine the sensed images into a meaningful pseudo-color map.

Figure 5.24: Pseudo-color rendition of the Jupiter moon Io.


(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

One way of combining sensed image data is to let the differences in color represent differences in surface chemical composition or changes in the way the surface reflects sunlight. In the pseudo-color image of Figure 5.25, for example, bright red depicts material newly ejected from Io's active volcanoes. This image conveys these features far more readily than would individual analysis of the component images.

Figure 5.25: A close-up. (Courtesy of NASA)

(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez and Richard E. Woods)

## 5.14 Basics of Full-Color Image Processing

Full-color image processing techniques fall into two categories. In the first category, each component image is processed independently and a composite color image is then formed from the individually processed components. In the second category, one works directly with color pixels. Because a full-color image has at least three components, color pixels are actually vectors. In the RGB system, for example, each color point can be interpreted as a vector extending from the origin of the RGB coordinate system to that point.

Let c represent an arbitrary vector in RGB color space:

$$\mathbf{c} = \begin{bmatrix} c_R \\ c_G \\ c_B \end{bmatrix} = \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$


This equation shows that the components of c are simply the RGB components of a color image at one point. When the color components are functions of the coordinates (x, y), we use the notation

$$\mathbf{c}(x,y) = \begin{bmatrix} c_R(x,y) \\ c_G(x,y) \\ c_B(x,y) \end{bmatrix} = \begin{bmatrix} R(x,y) \\ G(x,y) \\ B(x,y) \end{bmatrix} \qquad \text{(eq-2)}$$

For an image of size M x N, there are MN such vectors c(x, y), for x = 0, 1, 2, ..., M-1 and y = 0, 1, 2, ..., N-1.

It should be kept clearly in mind that Eq. (2) represents a vector whose components are spatial functions of x and y. For per-component processing and vector-based processing to be equivalent, two conditions must be met: first, the process must be applicable to both vectors and scalars; second, the operation on each component of a vector must be independent of all the other components.

Figure 5.26 Spatial masks for gray-scale and RGB color images.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Figure 5.26 shows spatial neighborhood processing for grayscale and full-color images. Suppose the process is neighborhood averaging. In Figure 5.26(a), averaging would be done by summing the grey levels of all pixels in the neighborhood and dividing by the total number of pixels in the neighborhood. In Figure 5.26(b), averaging would be done by summing all the vectors in the neighborhood and dividing each component of the result by the total number of vectors in the neighborhood. Each component of the average vector is thus the average of the corresponding component over the pixels in the neighborhood, which is the same result obtained by averaging each color component independently.
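A sketch verifying that vector-based neighborhood averaging equals per-component averaging (illustrative; the image and window size are arbitrary):

```python
import numpy as np

rgb = np.random.rand(64, 64, 3)          # a random full-color image
win = 5                                   # 5 x 5 averaging neighborhood

def box_average(channel, win):
    """Mean filter via a sliding window (valid region only)."""
    H, W = channel.shape
    out = np.empty((H - win + 1, W - win + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = channel[i:i + win, j:j + win].mean()
    return out

# (1) Average each color plane independently.
per_channel = np.stack([box_average(rgb[..., k], win) for k in range(3)],
                       axis=-1)

# (2) Average the color vectors directly (mean over the window of 3-vectors).
H, W, _ = rgb.shape
vector = np.empty_like(per_channel)
for i in range(H - win + 1):
    for j in range(W - win + 1):
        vector[i, j] = rgb[i:i + win, j:j + win].reshape(-1, 3).mean(axis=0)

print(np.allclose(per_channel, vector))   # True: the two are equivalent
```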


## 5.15 Colour Transformations

A color transformation processes the components of a color image within the context of a single color model.

Formulation

As with the gray-level transformation techniques, we model color transformations with the expression

$$g(x,y) = T[f(x,y)]$$

where f(x, y) is a color input image, g(x, y) is the transformed or processed color output image, and T is an operator on f defined over a spatial neighborhood of (x, y). Here the pixel values are triplets or quartets from the color space chosen to represent the images (i.e., groups of three or four values). Analogously to the basic grayscale transformations, in this section we focus on color transformations of the form

$$s_i = T_i(r_1, r_2, \ldots, r_n), \qquad i = 1, 2, \ldots, n$$

where, for notational simplicity, $r_i$ and $s_i$ denote the color components of f(x, y) and g(x, y) at any point (x, y), n is the number of color components, and $\{T_1, T_2, \ldots, T_n\}$ is a set of transformation or color-mapping functions that operate on $r_i$ to produce $s_i$. Note that the n transformations $T_i$ combine to implement the single transformation function T. The color space chosen to describe the pixels of f and g determines the value of n. If the RGB color space is selected, for example, n = 3 and $r_1$, $r_2$, and $r_3$ denote the red, green, and blue components of the input image, respectively. If the CMYK or HSI color space is selected, n = 4 or n = 3, respectively.

Figure above shows a color image of a large -size (4' x 5 ') high -resolution color

image of a bowl of strawberries and cup of coffee that's been digitized. The initial

CMYK scan components are in the second row of the figure. In these images, 0 is

black and 1 color component in each CMYK. This is shown by the strawberries,

because the images that correspond to the two CMYK components are the brightest

of the large quantities of magenta and yellow. Black is spare and generally

contained in the bowl of strawberries in coffee and shadows. The last row or Fig. 7

shows the components of Figure 5.27 — equated, (2) (4). As expected, a monochrome rendering of the original color is the intensity component. Figure 7:

compon ents computed using equation, (2) up to (4 )are shown in this last row . The

intensity component is presumed to be a monochrome version of the original full -

colour. Moreover the strawberries are relatively pure in colour; they have the

highest saturation o r a minimum dilution of the hues in the image by white light,

so finally we find it difficult to describe the hue component. munotes.in

## Page 122

122IMAGE PROCESSING

The challenge is exacerbated by the fact that in a model where 0 and 360o meet,

there is a discontinuity and that the saturation of the hue is uncertain ( — for

example for white, black, and pure grey). The discontinuity is quite obvious around

the strawberries, which are displayed in both black (0) and white grey level values

(1).

Figure 5.27 : A full -color image and its various colo r-space components (Original

image courtesy Med -data Interactive.)

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Equation (2 ) can be used with any color space components of Figure 5.27 (2).

Theory shows that any changes in any color model can be carried out. However, a

few other operations are more suitable for specific models in practice . The cost of

transferring representations into a decision on the colour space to implement this

must be considered for a given transformation. Suppose, for example, we want to

change the image intensity using Fig. 5.27.

݃ሺݔǡݕሻൌ݂݇ሺݔǡݕሻ

Where 0 < k < 1 , the simple transformation can be done in the colour space.

ݏଷൌݎ݇ଷ

Where s1 = r1, and s2 = r2. O nly its component intensity r3 is altered. Three

components have to be changed in the RGB colour space:

munotes.in

## Page 123

123Chapter 5: Wavelet and Other Image Transformsݏൌݎ݇݅ൌͳǡʹǡ͵

Similar linear transformations are required in CMY space:

ݏൌݎ݇ሺͳെ݇ሻ݅ൌͳǡʹǡ͵

In the same way, the transformations need ed to change the CMYK image intensity

are given

ݏൌ൜ݎ݅ൌͳǡʹǡ͵

ݎ݇ሺͳെ݇ሻ݅ൌͶ

This equation tells us that we only modify the fourth (K) component to modify the

intensity of a CMYK image.

Figure 5.28 shows that the transformations are applied to the full color image. In

Figure 5.28 (c) through (h) the mapping functions are shown graphically. The

CMYK mapping function consists of two components, the same is the case with

HSI; one component is handled by the transformations, the other by the rest. As

most various transformations have been made, the net outcome of modifying the

color intensity by a fixed value for everyone has been the identical

Figure 5.28 Adjusting the intensity of an image using color transformations. (a)

Original image. (b) Result of decreasing its intensity by 30% (i.e., letting ). (c)

The required RGB mapping function. (d) –(e) The required CMYK mapping

functions. (f) The required CMY mapping function. (g) –(h) The required HSI

mapping functions.

(Original image courtesy of MedData Interactive.)

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

munotes.in

## Page 124

124IMAGE PROCESSING

It must be noted that every trans formation represented through out varies depending

on only one component within the space of its colors. The red ou tcome component,

varies depending on only the red input and is autonomous of the green and blue

inputs. Straightforward and regularly used tools for color processing include

transformations of this type. As mentioned at the start of our discussion, it can be

done on a per -color basis.

We would also evaluate numerous transformations of this type in the remainder of

this section and discuss the matter wherein the functions for c omponent

transformation depend on all the color components of the input image and therefore

cannot be performed on a single color component basis.

Color Complements

The color circle (also known as the color wheel) shown in Figure 5.29 came from

Sir Isaac N ewton who developed his first position at the end of the color spectrum

in the seventeenth century. The color circle is an image of the colors organized as

per their chromatic connection. The circle consists of the primary colors being

equally distant. The secondary colors are then positioned in an equal distance

arrangement between both the primary colors . The net outcome is that the color

circle is supplemented by hues directly opposite one another. Our interest in

supplements i s due to their analogy with the grayscale negative.

Color complements are, as with the gray -scale particular instance, effective to

enhance the detail embedded in the dark regions of a color image notably if the

regions dominate. Some of these concepts are explained in the following example.

Figure 5.29 Color complements on the color circle.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

munotes.in

## Page 125

125Chapter 5: Wavelet and Other Image Transforms

EXAMPLE: Computing color image complements.

The full picture of Figure 5.30 and its complement to colour is displ ayed in Figure

5.30 (a) and (c). Figure 5.30 (b) illustrates the RGB transformations used to

calculate a complement. They correspond to the negative grayscale process. The

complement recalls conventional c olour film negative photographic elements. The

cyans in the complement replace the reds of the original image. The compliment is

white when the original image is black, etc . The colour circle of Figure 5.30 .

allows the original image to be predicted for e ach hue in the supplement image, and

only a corresponding input colour functions for each RGB component that is

involved in the calculation of a complement.

Figure 5.30 Color complement transformations. (a) Original image. (b) Complement transformation functions. (c) Complement of (a) based on the RGB

mapping functions. (d) An approximation of the RGB complement using HSI

transformations.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Wood

Color Slicing

It is useful to highlight a special range of colours in an image to separate objects

from their environment. The basic idea: (1) demonstrates the colours of interest so

they can distinguish themselves from the background; or (2) uses a color -defined

area as a mask. The simplest approach is to extend the techniques of intensity

slicing. Even so, since the colour pixel is an n -dimensional quantity , the consequent colour transformation functions are more difficult than their counterparts in

munotes.in

## Page 126

126IMAGE PROCESSING

grey scale. In essence, the necessary transformations seem to be quite complicated

than that of the originally thought color component transformations. It is be cause

every practical color slicing method includes the transformed color components of

every pixel as a function of all the color components of the original pixel.

One way of "slicing" a color image is to map colors into a nonprominent neutral

color after a certain range of interest. When a Cube of width W (or hypercube for)

is contained in interest’s colors, the required transformation set shall be determined

through prototypes (e.g. average) of color with components.

ݏൌ൝ͲǤͷ݂݅ቂቚݎെܽหݓ

ʹቃ

ݎݐ݄݁ݏ݅ݓݎ݁ݕ݊ܽͳ݆݊݅ൌͳǡʹǡǥǡ݊

Such transformations illustrate the colours surrounding the prototype by forcing all

other colours to the centre point of the colour space (this is an arbitrarily chosen

neutral point) . For example, the colour space for the RGB is middle gray or colour

(0.5, 0.5, 0.5) with an appropriate neutral point.

Eq. has become whenever a sphere is used to indicate the colours of interest.

ݏൌ൞ͲǤͷ݂݅ሺݎെܽ

ୀଵሻଶܴଶ

ݎݐ݄݁ݏ݅ݓݎ݁݅ൌͳǡʹǡǥǡ݊

The enclosed sphere radius (or hypersphere for) and its centre is component (i.e., a

prototypical colour). Other bene ficial variations of Eq and include the

implementation of multiple colour protot ypes and a reduction of the intensity of

colours, rather than a neutral constant.

5.16 Color Image Smoothing and Sharpening

It will be illustrated in this section how the fundamentals of this type of

neighborhood processing are applied to the task of smo othing and sharpening

colored images.

Color Image Smoothing

Using the grayscale picture smoothing technique, one can view of spatial filtering

as a spatial filtering operation in which the coefficients of the filtering mask are all

1. As the mask is moved across the image to be smoothed, each pixel is replaced

by the average of the pixels in the neighborhood indicated by the mask, resulting

in a smoothed image with fewer pixels. The application of this principle to the

processing of full -color photogr aphs is straightforward. The most significant

distinction is that, rather than dealing with scalar gray -level values, we must deal

with component vectors of the form given in Equation ( 2).

Assume Sxy is the set of coordinates that define a neighbourhood in an RGB colour

image that is centred at ( x, y ). In an RGB colour image , the average of the RGB

component vectors in this neighbourhood is calculated as follows: munotes.in

## Page 127

127Chapter 5: Wavelet and Other Image Transforms

Equation ( 1 )

It follows from Equation ( 2 ) and the properties of vector addition that

Equation ( 2 )

Identify the components of this vector as scalar images that would be created by

independently smoothing each plane of the beginning RGB image using standard

gray-scale neighborhood processing, as shown in the example below. Consequently, we conclude that smoothing by neighborhood averaging can be

performed on a per -color plane basis using neighborhood averaging. In this case,

the result is identical to the one obtained when the averaging is conducted using

RGB color vectors.

Figure 5.31 Color Image Smoothing

[Image refer from “ Smoothing, Sharpening and Segmentation of Image ”Dr Mir

Mohammad Azad, M N I Chowdhury International Journal of Engineering and Applied

Sciences (IJEAS) ISSN: 2394 -3661, Volume -5, Issue -5, May 2018 ]

munotes.in

## Page 128

128IMAGE PROCESSING

Eg:- Consider the color image shown in Figure 5.31 (a). The red, green, and blue

planes of this image are depicted in Figs, Figure 5.31 (b) through (c)

The HSI components of the image are depicted in Figures Figure 5.31 (a) through

(d). We simply smooth each o f the RGB colour planes on its own, and then merge

the smoothed planes to generate a smoothed full -color result by combining the

processed planes.

Figure 5.32 HSI component of the RGB color image

[Image refer from “ Smoothing, Sharpening and Segmentation of Image ”Dr Mir

Mohammad Azad, M N I Chowdhury International Journal of Engineering and Applied

Sciences (IJEAS) ISSN: 2394 -3661, Volume -5, Issue -5, May 2018 ]

Undoubtedly, the most significant advantage of the HSI colour model is that it

deco uples intensity (which is closely related to grey scale) from colour information.

The fact that it is suited for various gray -scale processing techniques suggests that

smoothing simply the intensity component of the HSI representation shown in

Figure 5.32 may be more efficient than smoothing the entire representation. This

approach's benefits and/or drawbacks are demonstrated next by smoothing only the

intensity component, while keeping the hue and saturation components unaffected,

and converting the resul t to an RGB image for display.

Color Image Sharpening

In this section, we will look into image sharpening with the Laplacian function.

Knowing the Laplacian of an input vector, we can define it as a vector with

components that are identical to the Laplacia n of each individual scalar component

of the input vector. This definition comes from the field of vector analysis. The

Laplacian of vector c in Equation I corresponds to the RGB colour scheme.

munotes.in

## Page 129

129Chapter 5: Wavelet and Other Image Transforms

In the same way that we learned in the previous section, we can compute the

Laplacian of a full -color image by computing the Laplacian of each component

image individually.

Figure 5.33 Image sharpening with the Laplacian

[Image refer from “ Smoothing, Sharpening and Segmentation of Image ”Dr Mir

Mohammad Azad, M N I Chowdhury International Journal of Engineering and Applied

Sciences (IJEAS) ISSN: 2394 -3661, Volume -5, Issue -5, May 2018 ]

Using Equation to compute the Laplacians of the RGB component pictures in

Figure 5.30 and merging them to generate the sharpened full -color result in Figure

5.33, .This result was obtained by multiplying the Laplacian of the intensity

component by the hue and saturation components, which remained constant.

5.17 Using Color in Image Segmenta tion

Segmentation is a process that partitions an image into regions.

Segmentation in HIS color Space

If we want to segment an image depending on colour, for example, and in this case,

Furthermore , we wish to carry out the procedure on an individual basis. When it

comes to planes, it is reasonable to think of the HSI space because the hue picture

provides an accessible representation of colour. Typically, using saturation as a

masking image, you can separate off more information. In the hue image, look for

munotes.in

## Page 130

130IMAGE PROCESSING

areas of interest. The image of intensity is as follows: Because of this, it is

employed less frequently for colour picture segmentation. It does not contain any

colour information. The following is an illustration: This is an example of how

segmentation i s carried out in the HSI system.

Figure 5.34 Image segmentation in HSI space. (a) Original. (b) Hue. (c) Saturation.

(d) Intensity. (e) Binary saturation mask (f) Product of (b) and (e). (g) Histogram

of (f). (h) Segmentation of red components from (a).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Figure 5.34 (f) depicts the product of the mask and the hue image, and Figure 5.34

(g) depicts the histogram of the product image (note that the gray scale is in the

range [0, 1]). We can observe in the histogram that the high values

(which correspond to the values of interest) are grouped at the very high of the

grayscale, clos e to the value of one. Figure 5.34 (h) depicts the binary image

munotes.in

## Page 131

131Chapter 5: Wavelet and Other Image Transforms

produced b y thresholding a product image with a threshold value of 0.9 as the

outcome of the thresholding operation. Using the spatial location of the white spots

in this image, we can determine which points in the original image have the reddish

hue that we are loo king for. The segmentation approach used here was far from

ideal, as there are areas in the original image where we would definitely state there

is a reddish hue, but which were not identified by this segmentation method. The

regions depicted in white in Figure 5.34 (h) are, however, the best that this

approach can accomplish in detecting the reddish components of the original

image, as determined by trial and comparison. The segmentation method detailed

in the following section has the potential to produce better outcomes than the

previous method.

Segmentation in RGB vector Space

Working in HSI space is more intuitive, segmentation is one area in which better

results generally are obtained by using RGB color vectors. The approach is

straightforward. Suppose that the objective is to segment objects of a specified

color range in an RGB image. Given a set of sample color point’s representative of

the colors of interest, we obtain an estimate of the “average” color that we wish to

segment. Let this averag e color be denoted by the RGB vector a. The objective of

segmentation is to classify each RGB pixel in a given image as having a color in

the specified range or not. In order to perform this comparison, it is necessary to

have a measure of similarity. One of the simplest measures in the Euclidean

distance. Let z is similar to a if the distance between them is less than a specified

threshold, D0. The Euclidean distance between z and a is given by

In this case, the subscripts R, G, and B represent the RGB c omponents of vectors a

and z, respectively. As shown in the illustration, th e locus of points such that D ( z,

a ) D0 is a solid sphere with radius D0, and this is known as the locus of points.

Figure 5.34 (a). Points that are contained within or on the surface of the object.

Points outside the sphere satisfy the provided colour condition; spheres that satisfy

the specified colour criterion sphere od does not exist. It is necessary to code these

two groups of p oints in the picture. Using, for example, black and white, one can

create a binary segmented image.

munotes.in

## Page 132

132IMAGE PROCESSING

Figure 5.34 Three approaches for regions for RGB vector segmentation.(a ->b->c)

(Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez

and Richard E. Woods)

5.18 Noise in Color Images

It is possible to use color images with the noise models. While the noise content of

a color image typically has the same properties in each color channel, it is possible

for separate color channel s to be affected differentially by noise in some cases. If

the electronics of a particular channel malfunction, this is one possibility to

consider. Different noise levels, on the other hand, are more likely to be produced

by changes in the relative amount s of noise. A measure of the amount of

illumination supplied to each individual color channel using a red filter in a CCD

camera, for example, can significantly weaken the image. because of the amount

of illumination detected by the red sensor elements Bec ause CCD sensors are

noisier at lower levels of illumination, the resulting red light is more intense. In this

circumstance, the noise component of an RGB image would tend to be louder than

the noise components of the other two component images.

In this ex ample, we explore the impact of noise in colour images. The image in

Figure 5.35 (a) through (c) depicts the three primary colours in an RGB image with

added additive Gaussian noise, whereas Figure 5.35 (d) represents the final image

after the colours have been mixed together. In color images, fine grain noise, such

as this, appears less obvious since it is masked by colour. The results of converting

the RGB image in Figure 5.35 to HSI are shown in Figure 5.35 (a) through (c).

Consider how much worse the hue and saturation values in the noisy image have

gotten when compared to the original HSI image. Similarly, the overall image of

munotes.in

## Page 133

133Chapter 5: Wavelet and Other Image Transforms

Figure 5.35 (c) is a little smoother and has less noise than any of the thr ee noisy

RGB images. As can be seen in Equation, the intensity image is the sum of the

RGB images

Figure 5.35 (a)–(c) Red, green, and blue 8 -bit component images corrupted by

additive Gaussian noise of mean 0 and standard deviation of 28 intensity level s. (d)

Resulting RGB image

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Figure 5.36 HSI components of the noisy color image in Figure 5.35 (d) . (a) Hue.

(b) Saturation. (c) Intensity.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

munotes.in

## Page 134

134IMAGE PROCESSING

When a single RGB channel is damaged by noise, for example, converting to HSI

distributes the noise among all HS I component images. An illustration of this is

shown in Figure 5.36 . Figure 5.36 (a) illustrates an RGB image with a corrupted

green component caused by salt and pepper noise with a probability of either salt

or pepper of 0.05. The HSI component images in Figure 5.36 (b) through (d) clearly

demonstrate how the noise extended from the green RGB channel to all of the HSI

images. Naturally, this is not surprising, as the HSI components are computed using

all RGB components.

Figure 5.36 (a) RGB image with green plane corrupted by salt -and-pepper noise.

(b) Hue component of HSI image. (c) Saturation component. (d) Intensity

component

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

5.19 Color Image Compres sion

Due to the fact that the number of bits necessary to represent colour is often three

to four times that required to represent grey levels, data compression is critical for

the storing and transmission of colour images. In the previous sections' RGB,

CMY(K), and HSI images, the data that are compressed are the components of each

colour pixel (e.g., the red, green, and blue components of the pixels in an RGB

munotes.in

## Page 135

135Chapter 5: Wavelet and Other Image Transformsimage); these are the mechanism by which colour information is given. Compression is the process of removing duplicate and/or unneeded data.

Figure 5.37 (a) depicts a 24 -bit RGB full -color representation of an iris, where each

of the red, green, and blue components is represented by eight bits. Figure 5.37 (b)

was created using a compressed version of the image in (a) and is thus a compressed

and then decompressed approximation of it. Although the compressed image

cannot be displayed directly on a colour monitor —it must be decompressed first —

it comprises just one data bit (and hence one storage bit) fo r every 230 bits of data

in the original image. Suppose that the image is of size 2000 × 3000 =6.106 pixels.

The image is 24 bits/pixel, so it storage size is 144 ڄ 106 bits.

Figure 5.37 Color image compression. (a) Original RGB image. (b) Result of

compressing, then decompressing the image in (a).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

munotes.in

## Page 136

136IMAGE PROCESSING

5.20 Summary

We have studied the basis form ation of digital images and the real and imaginary

parts of the DFT basis images .Several transforms such as discrete Hartley

Transform (DHT) Discrete Cosine Transform , Walsh -Hadamard Transforms, Slant

Transform, Haar Transform and their properties, Wavelet Transforms and Discrete

Wavelet Transform. Studies the fundamental of colour,CIE primaries, different

types of color models like RGB Model ,CMY and CMYK Color Models ,HSI Color

Model and concepts of conversion for color models from RGB to HIS ,HSI to RGB,

Color model CM YK and RGB for independent devices, Pseudo color processing ,Intensity to Color Transformations ,Techniques for full color image processing , Color Image Smoothing and Sharpening and so on.

5.21 Unit End questions

1. Explain the Matrix -Based Transforms.

2. Explain t he basic functions in the Time -Frequency Plane.

3. Write short note on Discrete Hartley Transform.

4. Write short note on Slant Transform.

5. Write short note on Haar Transform.

6. Write short note on Wavelet Transforms.

7. Explain the color fundamentals.

8. Explain the col or models.

9. Explain the Pseudocolor Image Processing

10. Explain the color image Segmentation.

11. Explain the noise in the color images.

12. Explain the Color image compression.

5.22 Reference for further reading

1. Digital Image Processing Gonzalez and Woods Pearson/Prentice Hall Fourth

2018

2. Fundamentals of Digital Image Processing A K. Jain PHI

3. The Image Processing Handbook J. C. Russ CRC Fifth 2010

4. All Images courtesy from Digital Image Processing by Gonzalez and Woods,

4th Edition

munotes.in

## Page 137

137Chapter 6: Image Compression and Watermarking

Unit 3

6 IMAGE COMPRESSION AND WATERMARKING

Unit Structure

6.0 Introduction

6.1 Fundamentals

6.2 Huffman Coding

6.3 Golomb Coding

6.4 Arithmetic Coding

6.5 LZW Coding

6.6 Run -length Coding

6.7 Symbol -based Coding

6.8 8 Bit -plane Coding

6.9 Block Transform Coding

6.10 Predictive Coding

6.11 Wavelet Coding

6.12 Digital Image Watermarking

6.13 Summary

6.14 Unit End questions

6.15 Reference for further reading

6.0 Introduction

In recent years multimedia product development and demand have increased more

and more, contributing to an inadequate network bandwidth and memory storage

device. The theory of data compression therefore becomes increasingly important

in order to reduce the redundancy of data, saving more bandwidth and space in

hardware. The process of encoding information using fewer bits or other information units than an unencoded representation is data compression or source -

coding in computer science and information theory. It helps to reduce costly

resource consumption, for exampl e, disk space or bandwidth transmission. Theory

and practice will be introduced, the most commonly used compression techniques

examined, and the industry standards described, that will help us. This section

concludes on the process of introducing visible a nd invisible data (such as

information concerning copyright) to images into digital images watermarking. munotes.in

## Page 138

138IMAGE PROCESSING

6.1 Fundamentals

Data compression, also known as compaction, the process by which data is

reduced, typically using encoding techniques, necessary for storage or the transfer

of a given information piece. There must be a clear distinction between information

and data. It's not synonymous with it. Actually, data is the means of transmitting

information. The same amount of information may be represented by different

amounts of data. You can see an example in order to understand image redundancy

or data redundancy in digital image processing. Suppose two people told a story

from Ramesh and Suresh. In the comparison with Ramesh, Suresh told the story in

less words, where Ramesh had too many words to tell the same story. Either

Ramesh said irrelevant information/data that isn't the part of his story, or he

repeated his words several times. In other words, it includes data (or words) not

providing any relevant information or simply restoring what has already been

known. This means that data redundancy is included

Data redundancy is a key issue for the compression of digital images. It is not an

abstract concept, but a quantifiable entity mathematica lly. If n1 and n2 indicate the

number of information -carrying in two data sets that comprise the same information, the relative data redundancy (R D) of the first data set (n1) can be

defined as

ܴൌͳെͳ

ܥோ

Where CR is commonly referred to as the compression ratio

ܥோൌ݊ଵ

݊ଶ

In case n2 = n1, C R = 1, R D = 0 shows that the first representation of the information

does not contain redundant data (related to the second dataset). In n2 << n1 ,ܥோฺ

λฺܴͳ which invol ves considerable compression and highly redundant data.

Finally, the second set of data contains a lot more than the original representation

of݊ଶب݊ଵܥோฺͲฺܴλ, respectively. Naturally this is the usually unwanted case of expansion of the data. C R and R D are generally intervals (0,λ)

and ( -λ, 1), respectively. A useful Compression Ratio, such as 10 (or 10:1), means

that for each one unit in a second or compressed data set, a first data set contains

ten information carrying units (say bits) . The corresponding 0.9 redundancy means

that 90% of the first data is redundant.

Three fundamental data redundancies could be recognized and used in digital

image compression: coding redundancy, interpixel redundancy, and psychovisual

redundancy. Data com pression is obtained by reducing or eliminating one or more

of these redundancies. munotes.in

## Page 139

139Chapter 6: Image Compression and Watermarking

Coding Redundancy:

In this we use a wording to demonstrate how an image's grey histogram can also

provide a good insight into code construction in order to minimize the amou nt of

data used for it.

Again suppose a discrete random variable r k in interval [0, 1] is the grey levels in

the image and every r k is likely to occur probability pr (rk).

ሺݎሻൌ݊

݊݇ൌͲǡͳǡʹǡǥǡܮെͳ

Where L is the number of grey level, n k is the number that is demonstrated in the

image with the kth grey level, and n is the total pixel number in the image. If the

number of bits used for each r k's value is l (r k), the average number of bits needed

for each pixel is

ܮ௩ൌ݈ିଵ

ୀሺݎሻሺݎሻ

In other words, the average length of the code words allocated to the different grey

level values is determined by summing up each grey level's number of bits and

probability of the grey level occurrence. The nu mber of bits needed to code the M

X N image is also MNL avg.

Interpixel Redundancy:

Take i nto account the images in Fig. 6 .1(a) and (b). These images show almost

identica l histograms as shown in Figs. 6 .1(c) and (d). It is also important to note

that both histograms are trimodal, and that three dominant ranges of grey level

values are present. Due to the not equally likely grey values of these images,

variable length coding can be applied for reducing the redundancy of coding

resulting from a straight or natural binary pixel coding. Nevertheless, the coding

process does not affect the correlation level between the pixels in the images. That

is, the codes used to represent each image's grey levels are not related to the

correlation between pixels. munotes.in

## Page 140

140IMAGE PROCESSING

Fig.6.1 Two images and their gray -level histograms and normalized

autocorrelation coefficients along one line.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The autocorrelation coefficient calculated along a line of ea ch picture is shown in

Figures 6 .1(e) and (f).

ߛሺο݊ሻൌܣሺοሻ

ܣሺͲሻ

Where

ܣሺο݊ሻൌͳܰെο݂݊ሺݔǡݕሻ݂ሺݔǡݕο݊ሻேିଵିο௬ୀ

munotes.in

## Page 141

141Chapter 6: Image Compression and Watermarking

7KHDERYHVFDOLQJIDFWRULVWKHQXPEHURIVXPWHUPVIRUHDFKLQWHJHUYDOXHRIǻQ

This value is variable. Nat XUDOO\WKHWRWDORISL[HOVRQDOLQHPXVWEHǻQ strictly

lower than N. The variable x is the line co -ordinate for the computation used.

Notice the shape of functions in Figs. 6 .1(e) and (f) . Their shapes can be associated

objecti vely to the structure in Figs. 6 .1(a) and (b). This relationship is especially

apparent in Fig. 6 .1(f), where there is a direct correlation between both the

significant correlations among both pixels divided into 45 and 90 samples between

the verti cally oriented match es of Fig. 6 .1. (b).

These images represent another significant form of data redundancy – one which

is directly related to the image's interpixel correlations. The information contained

in individual pixels is comparatively small because the value of any g iven pixel

could be adequately predicted from the value of its neighbours. A great deal of a

single pixel's visual contribution to a picture is redundant, based on its neighbours' values. A variety of names have been coined to refer to these interpixel dependencies, including spatial redundancy, geometric redundancy, and frame

redundancy. To include all of them we use the term interpixel redundancy.

The 2D pixel array usually used for human visualization and interpretation should

be changed into a more effe ctive (but usually nonvisual) format to reduce interpixel

redundancies in an image. An image may for instance be represented by the

differences between adjacent pixels. This type of transformations is known as

mapping (i.e. those which delete interpixel re dundancy). They are called reversible mappings if you can reconstruct the original picture components from the transformed data package.

Psychovisual Redundancy:

The brightness of a region, as the eye perceives, depends not only on the light

reflected by t hat region. For instance, variations in intensity (Mach bands) in an

area of constant intensity can be perceived. Such phenomena arise because the eye

does not respond to all visual information with equal sensitivity. Some information

is simply less import ant for normal visual processing than other information. It is

said that this information is psycho -visually redundant. It can be removed without

adversely affecting the quality of the perception of images.

It is not surprising that psychovisual redundanci es are in place, since human

perception of the data in an image does not usually require a quantitative analysis of each pixel value in an image. An observer usually investigates and psychologically combines characteristics, such as edges and textures, int o

recognizable groups. In order to complete the process of image interpretation the

brain correlate these groups with previous knowledge. Psycho -visual redundancy

differs from the previously mentioned redundancies. Psychovisual redundancy is munotes.in

## Page 142

142IMAGE PROCESSING

associated with actual or measurable visual information, in contrast to coding and

interpixel redundancy. The removal of the information is only possible as it is not

necessary for the normal visual processing. As the removal of redundant psycho -

visual dat a causes the loss of quantitative information, quantization is commonly

called.

This terminology corresponds to the ordinary use of the word, usually by mapping

a wide range of input values to a limited number of output values. The quantization

results in loss of data compression, since the operation is irreversible (visual

information lost).

Fidelity Criteria Realizable or quantitative information is lost due to the removal of psychovisually redundant data. Since information relevant is lost, it is highly

desirable to use repeatable or reproducible methods to quantify the nature and

extent of the data loss. As the basis for such an evaluation, there are two general

classes of criteria:

A) Objective fid elity criteria and

B) Subjective fidelity criteria.

When the information loss level could be described as a function of the original or

input image as well as the compressed and then decompressed output image, an

objective fidelity criterion is stated for this. The root -mean -square error (rms) of an

input to the output image is a good example. That f(x, y) is an image input, and

allow f(x, y) to indicate an estimation or approximation of f(x, y) which is the result

of compression and decompression of the in put afterwards. e(x, y) error between f

(x, y) and ݂መሺݔǡݕሻfor any value of x and y can be defined as

݁ሺݔǡݕሻൌ݂መሺݔǡݕሻെ݂ሺݔǡݕሻ

so that the cumulative error among the two images is

ൣ݂መሺݔǡݕሻെ݂ሺݔǡݕሻ൧ேିଵ

௬ୀெିଵ

௫ୀ

Where the images are M X N. The root -mean -square error, e rms is the square root

averaged over the M X N array and the errors between f(x, y) and ݂መሺݔǡݕሻ

݁௦ൌͳ

ܰܯൣ݂መሺݔǡݕሻെ݂ሺݔǡݕሻ൧ଶேିଵ

௬ୀெିଵ

௫ୀଵȀଶ

The mean -square signal -to-noise ratio of the compr essed -decompressed image is a

closely linked objective fidelity criterion . If݂መሺݔǡݕሻ is considered to be the sum of munotes.in

## Page 143

143Chapter 6: Image Compression and Watermarking

the original image f(x, y) and a noise signal e(x, y), the mean -square signal -to-

noise ratio of the output image, denoted SNR rms, is

ܴܰܵ௦ൌσσ݂መሺݔǡݕሻଶ ேିଵ

௬ୀெିଵ

௫ୀ

ͳ

ܯܰσσൣ݂መሺݔǡݕሻെ݂ሺݔǡݕሻ൧ଶேିଵ

௬ୀெିଵ

௫ୀ

In the square root of Eq. above, the rms value of the signal -to-noise relation, known

as the SNR rms, is derived.

While objective fidelity criteria provide a simple and convenient m echanism for

assessing loss of information, human beings are ultimately viewed the most decompressed images. Therefore, it is often more appropriate to measure image

quality with a human observer's subjective assessments. To this end, a "typical"

decompres sed image is shown in a corresponding cross section of viewers and the

evaluations are averaged. The assessments can be carried out using an absolute

evaluation scale or by comparing f(x, y) and ݂መሺݔǡݕሻside-by-side.

Image compression models

Fig.6.2 illustrates that there are two different structural blocks of a compression

system, one encoder and one decoder. The input image f(x,y), which produces a set

of symbols from the input data, is fed into the encoder. The encoded representation

is then fed in to the decoder after transmission through the channel to produce the

revised output ݂መሺݔǡݕሻǤ݂መሺݔǡݕሻcan be an accurate f (x, y) replica in general, or not.

If so, the system is error -free or information -free; otherwise, the reconstructed

image contains som e degree of distortion. Two relatively independent functions or

subblocks comprise both the encoder and decoder shown in Fig.6.2 . The encoder

consists of a source encoder that eliminates input redundancies and a channel

encoder that enhances the noise immu nity of the output from the source encoder.

As intended, a channel decoder with a source decoder is included with the decoder.

If there is noise free (not an error) channel between the encoder and decoder, the

channel encoder and Decoder shall be removed a nd the specific encoder and

Decoder shall be respectively the source encoder and decoder.

Fig.6.2 A general compression system model

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

munotes.in

## Page 144

144IMAGE PROCESSING

The Source Encoder and Decoder:

The source encoder must reduce or delete any coding, interpixel or psychovisual

redundancies in the input image. In each case, the particular application and related

fidelity criteria determine the best encoding method of use. Usually, a series of

three independent operations can model this ap proach. As illustrated by Fig. 6.3

(a), one of the three redundancies is reduced by each operation. The corresponding

source decoder is shown in Figure 6.3 (b). The mapper converts the input data i nto

(usually non -visual) format in the first step of the source encoding process so that

interpixel redundancies are reduced in the input image. Generally, this operation is

reversible and can decrease the data needed to represent the image directly or not .

Fig.6.3 (a) Source encoder and (b) source decoder model

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Run-length coding is an instance of mapping that leads directly to data compression

in this initial phase of the whole encoding process. An example of the opposite is

the image representation by a set of transforming coefficients. This is where the

mapper converts the image into a range of coefficients, rendering its interpixel

redundancies available in later phases of the encoding process for compression.

The second step , or quantizer block in figure 6.3 (a), decreases the precision of the

output of the mapper according to a fidelity criterion pre -established. The psycho -

visual redundancies o f the image are reduced by this stage. This is an irreversible

procedure. It is an irreversible operation. Therefore, if the compression is required,

it must be omitted.

The symbol coder generates a fixed or variable -length code to constitute the

quantizer output in the third and final phases of the source encoding process and

maps the source output to a code. The term symbol coder differs from the entire

source encoding process. This coding operation is the same. In most cases, the

munotes.in

## Page 145

145Chapter 6: Image Compression and Watermarking

mapped and quantized dat a set is represented by variable -length code. It provides

the shortest code words to the output values that most often occur and reduces the

redundancy of the code. Naturally, the operation is reversible. After the symbol

coding step is completed, each of the three redundancies has been removed by the

Image .

Figure 6.3 (a) indicates three successive operations in the source encoding process

although not all three are generally contained in each compression system.

Remember, for instance, that if error -free c ompression is required, the quantizer

must be removed.

Fig. 6.3.(a). The mapper and quantizer, for example, are often represented in the

predictive compression systems by a single block, which performs both operations

simultaneously. Fig 6.3(b) includes only two components in a source decoder: a

decoder for symbols and an inverse mapper. The inverted operations of the source

encoder and mapper blocks are performed on these blocks in the reverse order.

Since quantizations lead to irreversible loss of information, the general sourc es

decoder model shown in Fig. 6.3 (b) does not include reverse quantizing blocks.

The Channel Encoder and Decoder:

In the overall encoding -decoding process, a channel encoder and decoder play a

significant r ole when the channel in Fig. 6.3 is noisy or susceptible to error. They

are intended to decrease channel noise by inserting in the source encoded data a controlled form of redundancy. Since the source encoder's output is low in redundancy, transmission noise withou t this controlled redundancy is highly

sensible. R. W. Hamming developed one of the most effective channel coding

methods (Hamming [1950]). It is based on adding sufficient bits to the encoded

data to ensure that certain minimum bits are changed between va lid words. (Multi -

bit errors can be detected and corrected by adding extra redundancy bits.) The

Hamming 7 -bit number (7, 4) is a 4 -bit binary number b3b2b1b0, which is h 1, h2,

h3...., h 6, h7.

݄ଵൌܾଷْܾଶْܾ

݄ଶൌܾଷْܾଵْܾ

݄ସൌܾଶْܾଵْܾ

݄ଷൌܾଷ

݄ହൌܾଶ

݄ൌܾଵ

݄ൌܾ munotes.in

## Page 146

146IMAGE PROCESSING

Where X refers to an exclusive OR operation. Please note: bits h 1, h2 and h 4 are

equal parity bits respectively for b 3 b2 b2, b3b1b0, and b 2b1b0. (Recall that a string

of binary bits has even parity if the number of bits with a value of 1 is even.) The

channel decoder must check the coded value for an odd parity over the bit fields,

even parity, to decode a Hamming encoded result. The non-zero word C 4c2c1 is a

one-bit error.

ܿଵൌ݄ଵ۩݄ଷ۩݄ହ۩݄

ܿଶൌ݄ଶ۩݄ଷ۩݄۩݄

ܿସൌ݄ସ۩݄ହ۩݄۩݄

When a non -zero value is identified, the decoder will simply add the word code bit

position that is indicated by the word parity. The decoded binary value is then

removed as h 3h5h6h7 from the corrected code word.

Variable -Length Coding:

The easiest way to compress images without errors is to only minimise coding

redundancy. Coding redundancy is usually available in any normal grey level

binary encoding of a grey scale image. The grey levels can be removed by coding.

In order to do that, a varying -length code must be constructed which assigns the

shortest possible code words to the most probable grey levels. Here, we look at

various optimal and almost optimal methods for building such a code. The

techniques are expressed in the information theory language. Practically the source

symbols may be the grey image levels or the output of a mapping process at a grey

level (pixel discrepancies, run -lengths and so on

6.2 Huffman coding:

Huffman is the most popular way to remove coding redundancy (Huf fman [1952]).

The Huffman code provides the smallest number of code symbols per source

symbol when independently coding the symbols of an information source. The

resultant code is optimum for a set value of n in terms of the noiseless coding

theorem, provi ded the source symbols are coded one at a time.

The first step in Huffman's technique is to produce a series of reduced source values

by arranging the probability of the symbols being considered and combining the

lowest probability symbols to the next sour ce reduction in a single symbol. This

binary coding process can be seen in Figure 6.4 (K-ary Huffman codes can also be

constructed). A hypothesized set of source symbols and their probabilities in terms

of decreasing probability values can be ordered from top to bottom. To decrease

the initial source, the 0.06 and 0.04 probabilities of the bottom of the two sources

are combined with the probability 0.1 of a 'compound symbol.' The compound

symbol is placed in the first column for reducing source and its associated munotes.in

## Page 147

147Chapter 6: Image Compression and Watermarkingprobability, so that reduced spring probabilities are also most likely to be determin ed. This process is then repeated until a reduced source with two symbols

(at the far right) is reached.

The second step in the process of Huffman is to code each source that is reduced,

beginning with the smallest source and returning to the original sour ce. Naturally,

symbols 0 and 1. are the minimal binary code for a two symbol source. As

illustrated in Fig. 6.4, the two symbols were allocated to the right (the assignment

is arbitrary; reversing the order of the 0 and 1 would work just as well). Since th e

reduced source symbol is 0.6, combining two two symbols in the reduced source

on the left, the code 0 is now allocated to these two symbols and o ne symbol 0 and

1 are arbitrary appended to each to differentiate between them. For each reduced

source, this process was repeated until it reaches the initial source. T he final code

is shown in fig. 6.5 at the far left. This code has an average length.

Fig.6.4 Huffman source reductions.

Fig.6.5 Huffman code assignment procedure.

Lavg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/pixel

and the entropy of the source is 2.14 bits/symbol. The resulting Huffman code

efficiency is 0.973.

The procedure of Huffman generates optimum code for a number of symbols and

probab ilities that are restricted from the coding of the symbols one at a time. Once

the code is generated, it is easy to encode and/or decode in a lookup table way. The

code as a whole is a block code that can be decoded instantly. A block code is

named as the code is mapped to a fixed sequence of code symbols for each source

symbol. It's immediate because every word of code can be decoded into a series of

symbols without referring to the following symbols. It is unique in that every code

munotes.in

## Page 148

148IMAGE PROCESSING

string can be decoded i n one way only. Thus any Huffman string of encoded

symbols can be decoded by examining left -to-right at the individual string symbols.

The left to the right scan of the 010100111100 encoded string shows that the initial

word code of the Fig. 6.5 binary cod e is 01010, which is the symbol code a3. The

next valid code is 011, which is symbol a1. The completely decoded message to be

a3a1a2a2a6 is revealed in this way.

6.3 Golomb Coding

A Golomb code is a variable -length code similar to Huffman; but, unlike Huff man,

it is based on a simple concept of the likelihood of the values (which are explicitly

treated as natural numbers rather than abstract symbols): small values are more

likely than large ones. The precise relation between size and probability is captured

in a parameter, the divisor.

To Golomb -code a number, find the divisor's quotient and remainder. The quotient

should be written in unary notation, followed by the remainder in truncated binary

notation. In practise, a stop bit is required following the quotient: if the quotient is

expressed as a succession of zeroes, the stop bit is a one (or vice versa - and people

do seem to prefer to write their unary numbers with ones, which is Wrong). The

remainder's length can be calculated by the divisor.

A Golomb -Rice code is a Golomb code with a power of two as the divisor, allowing

for more efficient implementation via shifts and masks rather than division and

modulo.

For example, here's the Golomb -Rice code with divisor 4, for numbers up to 15:

munotes.in

## Page 149

149Chapter 6: Image Compression and Watermarking

The source of information A generates the symb ols {A0, A1, A2, A3 and A4} with

the corresponding probabilities {0.4, 0.3, 0.15, 0.1 and 0.05}. Encoding the source

symbols using binary encoder and Golomb encoder gives: Source Symbol Pi Binary Code Golomb Code Source Symbol A0 0.4 000 0 A0 A1 0.3 001 10 A1 A2 0.15 010 110 A2 A3 0.1 011 1110 A3 A4 0.05 100 1111 A4 Lavg H = 2.0087 3 2.05 L avg

The Entropy of the source is

Since we have 5 symbols (5<8=23), we need 3 bits at least to represent each symbol

in binary (fixed -length code). Hence the average length of the binary code is \

Thus the efficiency of the binary code is

The average length of the Golomb code is

Thus the efficiency of the Golomb code is

This example demonstrates that the efficiency of the Golomb encoder is much

higher than that of the binary encoder.

munotes.in

## Page 150

150IMAGE PROCESSING

6.4 Arithmetic coding:

Unlike previously mentioned variable -length codes, nonblock codes are created by

arithmetic coding. There is no single correspondence between the source symbols

and the code words of arithmetic coding that can be traced back to Elias's work. A

single arithmetic code word is instead assigned to a whole sequence of source

symbols (or message). The word code describes an interval between 0 and 1 of real

numbers. With the increase in the n umber of symbols in the message, the range for

representing it becomes smaller and the number of units of information necessary

for representing the range is larger. The size of the interval is reduced by each

message symbol according to its probability. S ince the technique requires that each

source symbol does not translate into an integral number of coding symbols (that

is, coding the symbols one by one at a time) as is the approach of Huffman, it

reaches (but only in theory) the boundary of noiseless cod ing theorem.

Fig.6.6 Arithmetic coding procedure

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The fundamental arithmetic coding procedure is illustrated in Fig.6.6 . A five -

symbol or message sequence, a1a2a3a3a4, is coded in this section from a four -

symbol source. The message is supposed to occupy the complete half -open

interval [0, 1) at the beginn ing of the coding process. As shown in table 5.2, this

interval was originally divided into four regions, depending on each source

symbol's probabilities. For instance, the symbol ax, is related to the sub -interval [0,

0.2). The message interval is origina lly restricted to [0, 0.2] because the message is

the first symbol of the coded message. Thus Fig.6.6 [0, 0.2] extends the figure to

its full height, with the end points marked with the narrowing range values.

munotes.in

## Page 151

151Chapter 6: Image Compression and Watermarking

Table 6 .1 Arithmetic coding example

Symbol a 2 reduces the subinterval by [0.04, 0.08), and a3 closes it by [0.056,

0.072), respectively. A special end -of-message indicator should be reserved for the

final message symbol, and the range is narrowed by [0.06752, 0.0688). Naturally,

any number can be us ed to represent the message within this subinterval —for

example, 0.068.

Three decimal digits represent the five -symbol message in the arithmetically coded

message of Fig.6.6 . It is evaluated in 3/5 or 0.6 decimal digits by source symbol

with entropy, which is 0.58 decimal digits or 10 -ary units/symbol.

The resulting arithmetical code reaches the limit introduced by the noiseless coding

theorem, as the length of the coded sequence increases.

In practice two factors lead to the coding performance being less t han limited (1)

by adding the end -of-message indicator required to separate one message from the

other; and (2) by applying finite arithmetic precision. The latter issue is addressed

in practical implementations of arithmetic coding with a scaling strategy and a rounding strategy (Langdon and Rissanen [1981]). The scaling strategy renormalizes every [0, 1] subinterval before splitting it into the probabilities of the

symbols. The rounding strategy ensures that finite precision arithmetic truncation

does not prevent correctly representing coding subintervals.

6.5 LZW Coding:

The Lempel Ziv -Welch (LZW) coding method applies fixed -long coding words to

variable sequences of source symbols but does not require a priori knowledge of

the probability of encoding symbols occurring. LZW compression has been built

into a number of mainstream image file formats, such as the interchange format

(GIF), tagged image file format (TIFF), and the portable document format (PDF).

Conceptually, LZW coding is very strai ghtforward. A codebook or "dictionary" is

created at the beginning of the coding process that contain the source symbols to

be coded. The first 256 words in the dictionary are given to grey values 0, 1, 2...

And 255 for 8 -bit monochrome images. The gray -level sequences not included in

munotes.in

## Page 152

152IMAGE PROCESSING

the dictionary are algorithmically determined (for example the next unused)

locations when an encoder sequences the image pixels consecutively examines. If,

for example, the two first pixels of an image are white, sequence "25 5-255," the

address below is reserved for grey levels 0 to 255, and can be assigned to position

256. The next time two white pixels of the following number are encountered, the

code word 256 is used to represent them with the location address, which contai ns

sequence 255 -255. If in the code process a 9 -bit, 512 -word dictionary is used, a

single 9 -bit code word replaces the original bits (8 + 8) used to represent two pixels.

Cleary, a major system parameter is the size of the dictionary. If it is too small, it

is less likely to detect corresponding grey level sequences; if it is too large, the size

of the code words affects the performance of the compression.

Consider the following 4 x 4, 8 -bit image of a vertical edge:

The steps involved in coding 16 pixel s will be described in Table 6.2 . The

following starting content is supposed to include a 512 words dictionary:

Initially unused are locations 256 through 511. It is encoded with a left -to-right,

top-to-bottom processing of its pixels. The variable — column 1 of Table 6.2 —

known as the currently recognised sequence is linked to each successive Gray Level

value. Thi s variable is null or empty at the start, as can be seen. The dictionary will

be searched for each concatenated sequence and the newly concatenated and

accepted sequence (i.e. located in the dictionary) will replace the dictionary as it

has in the first ro w of the table. This has been done in row 2 column 1.

munotes.in

## Page 153

153Chapter 6: Image Compression and Watermarking

There is no generation of output codes or alteration of the dictionary. If, however,

the concatenated sequence cannot be identified, the current sequence is output as

the next encoded value, a concatena ted but unknown sequence is added to the

dictionary, and the current pixel value is started in the sequence currently

recognized. This was done in table row 2. The last two columns provide a detailed

gray-level sequence when the whole 4 x 4 image is scanne d. There are defined nine

extra words of code. The dictionary includes 265 code words and a number of

repetitive gray -level sequences were successfully identified by the LZW algorithm —which uses them in order to reduce the original 128 -bit image to 90 bits

(i.e., 10 9 -bit codes). Reading the third column from top to bottom produces the

encoded result. The compression ratio resulting is 1.42:1.

One special feature of LZW coding that has just been demonstrated is that while

the data is encoded, the code dicti onary or book is created. Noteworthy, the LZW

decoder produces the same decompression dictionary as it decodes the encoded

data stream simultaneously. While this example does not require a strategy for

handling dictionary overflow, most useful applications require a strategy. A

convenient way is to flush or reset the dictionary when complete and continue to

code with a new initialized dictionary. A more complicated way is to monitor

compression and flush the dictionary in bad or unacceptable circumstances. Instead

it is easy to track and replace the least used dictionary entries if necessary.

Table 6.3 LZW coding example


6.6 Run-Length Coding

Images with repeating intensities along their rows (or columns) can be compressed by representing runs of identical intensities as run-length pairs, where each pair specifies the start of a new intensity and the number of consecutive pixels that have that intensity. The technique, known as Run-Length Encoding (RLE), was developed in the 1950s and became, together with its 2-D extensions, the standard compression method in facsimile coding. Compression is achieved by eliminating a simple form of spatial redundancy: groups of identical intensities. When there are few (or no) runs of identical pixels, run-length encoding actually results in data expansion.

Run-Length Encoding (RLE) is not a technique used very commonly these days, but it is an excellent way of making sense of some of the problems of compression. Imagine that we have a simple black-and-white image. One way a computer can store this image in binary is to use '0' for white and '1' for black (this is a "bitmap," since the pixels have been mapped to bits). The picture would be represented as follows using this method:

011000010000110

100000111000001

000001111100000

000011111110000

000111111111000

001111101111100

011111000111110

111110000011111

011111000111110

001111101111100

000111111111000

000011111110000

000001111100000

100000111000001


011000010000110

The central question in compression is whether the same image can be represented in fewer bits in such a way that the original image can still be reconstructed. It can. There are various ways to do this; in this section we concentrate on a method called run-length encoding.

Imagine reading the above bits aloud to someone who is copying them down: rather than "zero zero zero zero zero," you might soon say things like "five zeroes." That is the basic idea behind run-length encoding (RLE), which is used to save space when storing digital images. In run-length encoding, each row is replaced by numbers that indicate how many consecutive pixels have the same colour, beginning with the number of white pixels. For instance, the first row of the image above contains one white, two black, four white, one black, four white, two black, and one white pixel.

011000010000110

This could be represented as follows.

1, 2, 4, 1, 4, 2, 1

For the second row, since we must give the number of white pixels first, we have to state explicitly that there are zero white pixels at the beginning of the row.

100000111000001

You may ask why we must start with the number of white pixels, even when it is zero, as in this case. The reason is that without a fixed rule, the computer would have no way of knowing which colour is which when an image in this form is displayed. The row is therefore coded as:

0, 1, 5, 3, 5, 1

The third row contains five whites, five blacks, five whites.

000001111100000

This is coded as:

5, 5, 5

That means we get the following representation for the first three rows.

1, 2, 4, 1, 4, 2, 1

0, 1, 5, 3, 5, 1

5, 5, 5

You can work out what the other rows would be following this same system.

Representation for the remaining rows

The remaining rows are

4, 7, 4

3, 9, 3

2, 5, 1, 5, 2

1, 5, 3, 5, 1

0, 5, 5, 5

1, 5, 3, 5, 1

2, 5, 1, 5, 2

3, 9, 3

4, 7, 4

5, 5, 5

0, 1, 5, 3, 5, 1

1, 2, 4, 1, 4, 2, 1

Converting run-length encoding back to the original representation

Just to check that we can reverse the compression process, have a go at finding the original representation (zeroes and ones) of this (compressed) image:

4, 11, 3

4, 9, 2, 1, 2

4, 9, 2, 1 , 2

4, 11, 3

4, 9, 5

4, 9, 5

5, 7, 6

0, 17, 1

1, 15, 2
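A short sketch of this encode/decode round trip, in Python, may help; the function names and the white-first convention match the description above but are otherwise illustrative.

```python
def rle_encode(row):
    """Encode one row of '0'/'1' characters as run lengths, white (0) first."""
    runs = []
    current, count = "0", 0      # rows always start by counting white pixels
    for bit in row:
        if bit == current:
            count += 1
        else:
            runs.append(count)   # may append 0 if the row starts with black
            current, count = bit, 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Rebuild the row: runs alternate white, black, white, ..."""
    bits, colour = [], "0"
    for n in runs:
        bits.append(colour * n)
        colour = "1" if colour == "0" else "0"
    return "".join(bits)

row = "100000111000001"
print(rle_encode(row))                  # [0, 1, 5, 3, 5, 1]
print(rle_decode([0, 1, 5, 3, 5, 1]))   # '100000111000001'
```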

Run-length coding is the basis of the CCITT (Consultative Committee of the International Telegraph and Telephone) standards for encoding binary and gray-level images. In this method, the image is scanned row by row and each run is identified; the output vector consists of (pixel value, run length) pairs. The entropy of the run-length representation is

$H_{\text{run-length}} = (H_0 + H_1)/(L_0 + L_1)$

where

$H_0$ = entropy of the black runs

$H_1$ = entropy of the white runs

$L_0$ = average length of the black runs

$L_1$ = average length of the white runs

6.7 Symbol-Based Coding

Also referred to as token-based coding, this approach stores frequently occurring subimages, called symbols, in a symbol dictionary. The image is then coded as a set of triplets (x1, y1, t1), (x2, y2, t2), etc., where each (xi, yi) pair specifies the location of a symbol in the image and the token ti is the address of that symbol in the dictionary. Each triplet thus represents an instance of a dictionary symbol in the image.

Consider the simple bilevel image shown in Fig. 6.7(a). It contains a single word, banana, which is composed of three distinct symbols: one b, three a's, and two n's. Assuming that the letter b is the first symbol identified in the coding process, its 9 × 7 bitmap is stored in location 0 of the symbol dictionary and, as illustrated in Fig. 6.7(b), the token identifying the b bitmap is 0. Thus, the first triplet in the encoded image's representation [see Fig. 6.7(c)] is (0, 2, 0), indicating that the upper-left corner (an arbitrary convention) of the rectangular bitmap representing the b symbol is to be placed at location (0, 2) in the decoded image. After the bitmaps for the a and n symbols have been identified and added to the dictionary, the remainder of the image can be encoded with five additional triplets. Compression occurs as long as the six triplets required to locate the symbols in the image, together with the three bitmaps required to define them, are smaller than the original image. Assuming that each triplet is composed of three bytes, the original image contains 459 bits, while the compressed representation requires (6 × 3 × 8) + [(9 × 7) + (6 × 7) + (6 × 6)] = 285 bits, for a compression ratio of C = 459/285 ≈ 1.61.

For example, to decode the symbol-based representation in Fig. 6.7(c), you simply read the bitmaps of the symbols specified in the triplets from the symbol dictionary and place them in the decoded image at the spatial coordinates given in the respective triplets.
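The decoding step just described amounts to pasting dictionary bitmaps at the listed coordinates. The following is a minimal sketch of that idea; the numpy-based representation, the image size, and the (row, column) interpretation of (xi, yi) are assumptions made for illustration.

```python
import numpy as np

def decode_symbols(triplets, dictionary, shape):
    """Place dictionary bitmaps into a blank image at triplet locations."""
    image = np.zeros(shape, dtype=np.uint8)
    for x, y, token in triplets:
        bitmap = dictionary[token]          # binary numpy array for the symbol
        h, w = bitmap.shape
        image[x:x + h, y:y + w] |= bitmap   # paste (OR preserves overlaps)
    return image

# Hypothetical 3-symbol dictionary and six triplets, echoing the example:
dictionary = {0: np.ones((9, 7), np.uint8),   # 'b' bitmap (placeholder)
              1: np.ones((6, 7), np.uint8),   # 'a' bitmap (placeholder)
              2: np.ones((6, 6), np.uint8)}   # 'n' bitmap (placeholder)
triplets = [(0, 2, 0), (3, 10, 1), (3, 18, 2),
            (3, 26, 1), (3, 34, 2), (3, 42, 1)]
decoded = decode_symbols(triplets, dictionary, shape=(9, 51))
```

The placeholder bitmaps and all triplet coordinates beyond the first are invented for the sketch; in the figure they would be the actual b, a, and n patterns and their true positions.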

FIGURE 6.7 (a) A bi-level document, (b) symbol dictionary, and (c) the triplets used to locate the symbols in the document.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Symbol-based compression was proposed in the early 1970s, but has become practical only relatively recently. Improvements in symbol-matching algorithms and increased CPU processing speeds have made it possible both to select dictionary symbols and to find where they occur in an image in a timely manner. And, as with many other compression methods, symbol-based decoding is significantly faster than encoding. Finally, we note that both the bitmaps of the dictionary symbols and the triplets used to reference them may themselves be encoded to further improve compression performance. If only exact symbol matches are permitted, as in Fig. 6.7, the resulting compression is lossless; if small differences are allowed, some level of reconstruction error will be present.

6.8 Bit-Plane Coding:

An effective approach to reducing an image's interpixel redundancy is to process its bit planes individually. The technique, called bit-plane coding, is based on the concept of decomposing a multilevel (monochrome or color) image into a series of binary images.

Bit-plane decomposition:

The gray levels of an m-bit grayscale image can be represented in the form of the base-2 polynomial

$a_{m-1}2^{m-1} + a_{m-2}2^{m-2} + \cdots + a_1 2^1 + a_0 2^0$

Based on this property, a simple way of decomposing the image into a collection of binary images is to separate the m coefficients of the polynomial into m 1-bit bit planes. The zeroth-order bit plane is generated by collecting the $a_0$ bits of each pixel, while the (m−1)st-order bit plane contains the $a_{m-1}$ bits or coefficients. In general, each bit plane numbered 0 through m−1 is constructed by setting its pixels equal to the values of the appropriate bits, or polynomial coefficients, of each pixel in the original image.

The inherent drawback of this approach is that small changes in gray level can have a significant impact on the complexity of the bit planes. If a pixel of intensity 127 (01111111) is adjacent to a pixel of intensity 128 (10000000), for instance, a corresponding 0-to-1 (or 1-to-0) transition will appear on every bit plane. For example, because the most significant bits of the binary codes for 127 and 128 differ, bit plane 7 will contain a zero-valued pixel next to a pixel of value 1, creating a 0-to-1 (or 1-to-0) transition at that point.

$g_i = a_i \oplus a_{i+1}, \qquad 0 \le i \le m-2$

$g_{m-1} = a_{m-1}$

An alternative decomposition approach (one that reduces the effect of small gray-level variations) is to first represent the image by an m-bit Gray code. The m-bit Gray code $g_{m-1} \ldots g_2 g_1 g_0$ corresponding to the polynomial above can be computed from the equations just given, where $\oplus$ denotes the exclusive-OR operation. The distinguishing property of this code is that successive code words differ in only one bit position, so small changes in gray level are less likely to affect all m bit planes. For instance, when gray levels 127 and 128 are adjacent, only the seventh bit plane will contain a 0-to-1 transition, because the Gray codes of 127 and 128 are 01000000 and 11000000, respectively.
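A small sketch of both decompositions may help. It assumes numpy and an 8-bit image, and computes the Gray code with the shift identity g = a XOR (a >> 1), which applies the equation above to all bit positions at once (a standard trick, not notation from the text).

```python
import numpy as np

def bit_planes(image, m=8):
    """Return a list of m binary bit planes, plane 0 first."""
    return [(image >> i) & 1 for i in range(m)]

def gray_code_planes(image, m=8):
    """Gray-code the image first, then slice into bit planes."""
    gray = image ^ (image >> 1)          # g_i = a_i XOR a_{i+1}
    return [(gray >> i) & 1 for i in range(m)]

# 127 next to 128: every binary plane changes, only Gray plane 7 does.
image = np.array([[127, 128]], dtype=np.uint8)
diff_binary = [int(p[0, 0] != p[0, 1]) for p in bit_planes(image)]
diff_gray = [int(p[0, 0] != p[0, 1]) for p in gray_code_planes(image)]
print(diff_binary)   # [1, 1, 1, 1, 1, 1, 1, 1]
print(diff_gray)     # [0, 0, 0, 0, 0, 0, 0, 1]
```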

6.9 Transform Coding:

The predictive coding methods covered in the next section operate directly on the pixels of an image and are thus spatial-domain methods. Here we consider compression techniques based on modifying the transform of an image. In transform coding, a reversible, linear transform (such as the Fourier transform) is used to map the image into a set of transform coefficients, which are then quantized and coded. For most natural images, a significant number of the coefficients have small magnitudes and can be coarsely quantized (or discarded entirely) with little image distortion. A variety of transformations, including the discrete Fourier transform (DFT), can be used to transform the image data.

Fig. 6.8 A transform coding system: (a) encoder; (b) decoder.

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

A typical transform coding system is shown in Figure 6.8. The encoder performs four relatively straightforward operations: subimage decomposition, transformation, quantization, and coding; the decoder implements the inverse sequence of steps, with the exception of the quantization function, which cannot be inverted. An N × N image is first subdivided into subimages of size n × n, which are then transformed to generate (N/n)² subimage transforms, each of size n × n. The goal of the transformation process is to decorrelate the pixels of each subimage, or to pack as much information as possible into the smallest number of transform coefficients. The quantization stage then selectively eliminates, or more coarsely quantizes, the coefficients that carry the least information; these coefficients have the smallest impact on reconstructed subimage quality. The encoding process terminates by coding the quantized coefficients, normally with a variable-length code. Any or all of the transform coding steps can be adapted to local image content, called adaptive transform coding, or fixed for all subimages, called nonadaptive transform coding.
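As an illustration of these four steps, here is a compact sketch of nonadaptive block transform coding with an 8 × 8 discrete cosine transform (a common choice, though the text above mentions the DFT). The orthonormal DCT matrix is built by hand so that only numpy is required, and the "quantizer" is reduced to a simple threshold for brevity.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix C, so C @ block @ C.T is the 2-D DCT."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def block_transform_code(image, threshold=10.0, n=8):
    """Transform each n x n subimage, zero small coefficients, and invert."""
    C = dct_matrix(n)
    out = np.zeros_like(image, dtype=float)
    for r in range(0, image.shape[0], n):
        for c in range(0, image.shape[1], n):
            block = image[r:r + n, c:c + n].astype(float)
            coeffs = C @ block @ C.T                   # forward 2-D DCT
            coeffs[np.abs(coeffs) < threshold] = 0     # crude "quantization"
            out[r:r + n, c:c + n] = C.T @ coeffs @ C   # inverse 2-D DCT
    return out

image = np.tile(np.linspace(0, 255, 64), (64, 1))      # smooth test image
rebuilt = block_transform_code(image)
print(np.abs(image - rebuilt).max())                   # small reconstruction error
```

In a real coder, the thresholded coefficients would be quantized and entropy-coded rather than simply inverted, but the sketch shows why smooth subimages survive coarse quantization: most of their energy lands in a few low-order coefficients.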

6.10 Predictive Coding

Lossless Predictive Coding:

This error-free compression approach does not require an image to be decomposed into a collection of bit planes. The technique, known as lossless predictive coding, eliminates the interpixel redundancy of closely spaced pixels by extracting and coding only the new information in each pixel, where the new information of a pixel is defined as the difference between its actual and predicted value.

The essential components of a lossless predictive coding system are shown in Fig. 6.9. The system consists of an encoder and a decoder, each containing an identical predictor. As each pixel of the input image, denoted f(n), is introduced to the encoder, the predictor generates the anticipated value of that pixel based on a number of past inputs. The output of the predictor is then rounded to the nearest integer, denoted $\hat{f}(n)$, and used to form the difference or prediction error

$e(n) = f(n) - \hat{f}(n)$

which is coded using a variable-length code (by the symbol encoder) to generate the next element of the compressed data stream.

Fig. 6.9 A lossless predictive coding model: (a) encoder; (b) decoder

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The decoder of Fig. 6.9(b) reconstructs e(n) from the received variable-length code words and performs the inverse operation

$f(n) = e(n) + \hat{f}(n)$

to decompress or recreate the original input sequence. Various local, global, and adaptive methods can be used to generate $\hat{f}(n)$. In most cases, however, the prediction is formed as a linear combination of m previous pixels. That is,

$\hat{f}(n) = \operatorname{round}\left[\sum_{i=1}^{m} \alpha_i f(n-i)\right]$

where m is the order of the linear predictor, round is a function denoting the rounding or nearest-integer operation, and the $\alpha_i$ for i = 1, 2, ..., m are prediction coefficients. In raster-scan applications, the subscript n indexes the predictor outputs in accordance with their time of occurrence; the more explicit notations f(t), $\hat{f}(t)$, and e(t) could then be used, where t represents time. In other cases, n is used as an index on the spatial coordinates and/or frame number (in a time sequence) of an image. In 1-D linear predictive coding, for example, the prediction equation can be written as

$\hat{f}(x, y) = \operatorname{round}\left[\sum_{i=1}^{m} \alpha_i f(x, y-i)\right]$

where each subscripted variable is now expressed explicitly as a function of the spatial coordinates x and y. The equation indicates that the 1-D linear prediction $\hat{f}(x, y)$ is a function of the previous pixels on the current line alone. In 2-D predictive coding, the prediction is a function of the previous pixels in a left-to-right, top-to-bottom scan of the image; in the 3-D case, it is based on these pixels and on the previous pixels of preceding frames. The prediction equation cannot be evaluated for the first m pixels of each line, so those pixels must be coded by other means (such as a Huffman code) and considered an overhead of the predictive coding process.
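As a concrete special case, the sketch below implements the m = 1 predictor $\hat{f}(x, y) = \operatorname{round}[\alpha f(x, y-1)]$ with α = 1 (previous-pixel prediction) along each row; the first pixel of each row is carried as overhead, as noted above.

```python
import numpy as np

def predictive_encode(image):
    """Row-wise previous-pixel predictor: keep column 0, code differences."""
    image = image.astype(np.int16)                # differences may be negative
    errors = image.copy()
    errors[:, 1:] = image[:, 1:] - image[:, :-1]  # e(n) = f(n) - f^(n)
    return errors                                 # column 0 is the overhead

def predictive_decode(errors):
    """Invert the encoder: f(n) = e(n) + f(n-1), accumulated along rows."""
    return np.cumsum(errors, axis=1).astype(np.uint8)

image = np.tile(np.arange(0, 256, 2, dtype=np.uint8), (4, 1))
errors = predictive_encode(image)
assert np.array_equal(predictive_decode(errors), image)   # lossless round trip
```

The payoff is in the statistics: for smooth images the error array is dominated by small values clustered around zero, which a variable-length symbol coder can represent far more compactly than the raw gray levels.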

Lossy Predictive Coding:

In this type of coding, a quantizer is added to the lossless predictive model, and the resulting trade-off between reconstruction accuracy and compression performance is examined. As Fig. 6.10 illustrates, the quantizer, which replaces the nearest-integer function of the error-free encoder, is inserted between the symbol encoder and the point at which the prediction error is formed. It maps the prediction error into a limited range of outputs, denoted $\hat{e}(n)$, which establish the amount of compression and distortion associated with lossy predictive coding.

Fig. 6.10 A lossy predictive coding model: (a) encoder and (b) decoder.

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

To accommodate the insertion of the quantization step, the error-free encoder must be altered so that the predictions generated by the encoder and decoder are equivalent. As Fig. 6.10(a) shows, this is accomplished by placing the lossy encoder's predictor within a feedback loop, where its input, denoted $\dot{f}(n)$, is generated as a function of past predictions and the corresponding quantized errors. That is,

$\dot{f}(n) = \dot{e}(n) + \hat{f}(n)$

This closed-loop configuration prevents error buildup at the decoder's output. As Fig. 6.10(b) shows, the output of the decoder is given by the same equation.

Optimal predictors:

The optimal predictor used in most predictive coding applications minimizes the encoder's mean-square prediction error

$E\{e^2(n)\} = E\left\{\left[f(n) - \hat{f}(n)\right]^2\right\}$

where $E\{\cdot\}$ denotes the statistical expectation operator, subject to the constraint that

$\dot{f}(n) = \dot{e}(n) + \hat{f}(n) \approx e(n) + \hat{f}(n) = f(n)$

That is, the optimization criterion is the minimization of the mean-square prediction error, the quantization error is assumed to be negligible [$\dot{e}(n) \approx e(n)$], and the prediction is constrained to a linear combination of m previous pixels. These restrictions are not essential, but they simplify the analysis considerably and, at the same time, decrease the computational complexity of the predictor. The resulting predictive coding approach is referred to as differential pulse code modulation (DPCM).
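A minimal DPCM sketch follows, pairing the previous-pixel predictor used earlier with a uniform quantizer of step q inside the encoder's feedback loop; the step size and the predictor are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def dpcm_encode_decode(row, q=8):
    """1-D DPCM along one row; returns the decoder's reconstruction."""
    recon = np.zeros(len(row), dtype=float)
    recon[0] = row[0]                      # first pixel sent as overhead
    for n in range(1, len(row)):
        pred = recon[n - 1]                # prediction from *reconstructed* past
        err = row[n] - pred                # e(n) = f(n) - f^(n)
        qerr = q * np.round(err / q)       # quantizer output e^(n)
        recon[n] = qerr + pred             # f.(n) = e^(n) + f^(n), both sides
    return recon

row = np.linspace(0, 200, 50)
recon = dpcm_encode_decode(row, q=8)
print(np.abs(row - recon).max())           # bounded by q/2: no error buildup
```

Because the predictor runs on reconstructed values rather than the original inputs, encoder and decoder stay synchronized and the quantization error never accumulates; this is exactly the point of the closed-loop configuration above.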

6.11 Wavelet Coding:

Wavelet coding is based on the idea that the coefficients of a transform that decorrelates the pixels of an image can be coded more efficiently than the original pixels themselves. If the transform's basis functions pack most of the important visual information into a small number of coefficients, the remaining coefficients can be quantized coarsely or truncated to zero with little image distortion.

A typical wavelet coding system is shown in Fig. 6.11. To encode a $2^J \times 2^J$ image, an analyzing wavelet $\psi$ and a minimum decomposition level J − P are selected and used to compute the image's discrete wavelet transform. If the wavelet has a companion scaling function $\varphi$, the fast wavelet transform can be used. In either case, the computed transform converts a large portion of the original image into horizontal, vertical, and diagonal decomposition coefficients with zero mean and Laplacian-like distributions.

Reference from "Digital Image Processing fourth edit ion by

Rafael C. Gonzalez and Richard E. Woods)

Because many of the computed coefficients carry little visual information, they can be quantized and coded to minimize intercoefficient and coding redundancy. Moreover, the quantization can be adapted to exploit any positional correlation across the P decomposition levels. One or more lossless coding methods, such as run-length, Huffman, arithmetic, and bit-plane coding, can be incorporated into the final symbol coding step. Decoding is accomplished by inverting the encoding operations, with the exception of quantization, which cannot be reversed exactly.

The principal difference between the wavelet-based system and the transform coding system is the omission of the transform coder's subimage processing stages. Because wavelet transforms are both computationally efficient and inherently local (i.e., their basis functions are limited in duration), subdivision of the original image is unnecessary.

Wavelet Selection

The wavelets chosen as the basis of the forward and inverse transforms in Fig. 6.11 affect all aspects of wavelet coding system design and performance. They impact directly the computational complexity of the transforms and, less directly, the system's ability to compress and reconstruct images of acceptable error. When the transforming wavelet has a companion scaling function, the transformation can be implemented as a sequence of digital filtering operations, with the number of filter taps equal to the number of nonzero wavelet and scaling vector coefficients. The ability of the wavelet to pack information into a small number of transform coefficients determines its compression and reconstruction performance. The most widely used expansion functions for wavelet-based compression are the Daubechies wavelets and biorthogonal wavelets. The latter allow useful analysis properties, such as the number of vanishing moments, to be incorporated into the decomposition filters, while important synthesis properties, such as smoothness of reconstruction, are incorporated into the reconstruction filters.


Figure 6.12 shows four discrete wavelet transforms of the same image. In Fig. 6.12(a), Haar wavelets, the simplest and the only discontinuous wavelets considered in this example, were used as the expansion or basis functions. Daubechies wavelets, among the most popular imaging wavelets, were used in Fig. 6.12(b), and symlets, which are an extension of the Daubechies wavelets with increased symmetry, in Fig. 6.12(c). The Cohen-Daubechies-Feauveau wavelets employed in Fig. 6.12(d) are included to illustrate biorthogonal wavelets. As in earlier results, all detail coefficients were scaled to make the underlying structure more visible, with intensity 128 corresponding to coefficient value 0.

FIGURE 6.12

Three-scale wavelet transforms with respect to (a) Haar wavelets, (b) Daubechies wavelets, (c) symlets, and (d) Cohen-Daubechies-Feauveau biorthogonal wavelets.

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

As Table 6.2 illustrates, the number of operations required to compute the transforms in Fig. 6.12 grows from four to twenty-eight multiply-and-add operations per coefficient (for each decomposition level) in moving from Fig. 6.12(a) to (d). All four transforms were computed using a fast wavelet transform (i.e., filter bank) formulation. Note that as the computational complexity (i.e., the number of filter taps) increases, so does the information packing performance. When Haar wavelets are used and the detail coefficients below 1.5 are truncated to zero, 33.8 percent of the total transform is zeroed out. With the more sophisticated biorthogonal wavelets, the percentage of zeroed coefficients rises to 42.1, further increasing the potential compression.


TABLE 6.2 Wavelet transform filter taps and zeroed coefficients when truncating the transforms in Fig. 6.12 below 1.5

| Wavelet | Filter Taps (Scaling + Wavelet) | Zeroed Coefficients |
|---|---|---|
| Haar | 2 + 2 | 33.8% |
| Daubechies | 8 + 8 | 40.9% |
| Symlet | 8 + 8 | 41.2% |
| Biorthogonal | 17 + 11 | 42.1% |
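A hedged sketch of the threshold-and-count experiment behind Table 6.2, using the PyWavelets package (an assumption; the text itself names no software). The wavelet names passed to the function are PyWavelets identifiers, and the 1.5 threshold matches the table.

```python
import numpy as np
import pywt  # PyWavelets

def zeroed_fraction(image, wavelet, level=3, threshold=1.5):
    """Three-scale DWT; fraction of coefficients whose detail values fall below the threshold."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    total = coeffs[0].size          # approximation band (never zeroed here)
    zeroed = 0
    for detail_triple in coeffs[1:]:
        for band in detail_triple:  # horizontal, vertical, diagonal details
            total += band.size
            zeroed += np.count_nonzero(np.abs(band) < threshold)
    return zeroed / total

image = np.random.default_rng(0).normal(128, 20, (256, 256))
for name in ("haar", "db4", "sym4", "bior3.5"):
    print(name, round(zeroed_fraction(image, name), 3))
```

The printed fractions for this random test image will not reproduce the table's percentages, which were measured on a natural image; the sketch only shows the procedure.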

6.12 Digital Image Watermarking

Modern techniques and protocols make it possible to distribute images (in the form of photographs or videos) over digital media and the Internet. Unfortunately, images distributed in this way can be copied repeatedly and without error, putting the rights of their owners at risk. Even when encrypted for distribution, images become unprotected after they are decrypted. One way to discourage illegal duplication is to insert one or more items of information, collectively called a watermark, into potentially vulnerable images in such a way that the watermarks become inseparable from the images themselves. As integral parts of the watermarked images, watermarks protect the rights of their owners in a variety of ways, including the following:

1. Copyright identification. Watermarks can provide information that serves as proof of ownership when the rights of the owner have been infringed.

2. User identification or fingerprinting. The identity of legal users can be encoded in watermarks and used to identify the sources of unauthorized copies.

3. Authenticity determination. A watermark can serve as a guarantee that an image has not been altered, provided the watermark is designed to be destroyed by any modification of the image.

4. Automated monitoring. Watermarks can be tracked by systems that keep track of when and where images are used (for example, programs that scan the Web for images embedded in Web sites). Monitoring is useful for collecting royalties and/or locating unauthorized users.

5. Copy protection. Watermarks can be used to specify rules of image usage and copying (e.g., to DVD players).

In this section, we discuss digital image watermarking, the process of inserting data into an image in such a way that it can be used to make an

assertion about the image. The methods described here bear no resemblance to the compression techniques presented earlier (although they do involve the coding of information). In fact, watermarking and compression are in some ways opposites: while compression seeks to reduce the amount of data used to represent images, watermarking adds information to them (i.e., the watermarks). Watermarks can be visible or invisible.

A visible watermark is an opaque or semi-transparent subimage or image that is superimposed on another image (i.e., the image being watermarked) so that it is obvious to the viewer. Television networks frequently place visible watermarks (fashioned after their logos) in the upper or lower right-hand corner of the screen. As the following example illustrates, visible watermarking is typically performed in the spatial domain.

The image in Fig. 6.13(b) is the result of superimposing a scaled-down version of the watermark in Fig. 6.13(a) on the image's lower right quadrant. Denoting the watermarked image as $f_w$, we can express it as a linear combination of the unmarked image f and watermark w.

FIGURE 6.13 A simple visible watermark: (a) watermark; (b) the watermarked image; and (c) the difference between the watermarked image and the original (non-watermarked) image.

Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez

and Richard E. Woods)

$f_w = (1 - \alpha)f + \alpha w$

where constant $\alpha$ controls the relative visibility of the watermark and the underlying image. If $\alpha$ is 1, the watermark is opaque and completely obscures the underlying image. As $\alpha$ approaches 0, more of the underlying image and less of the watermark is seen. In general, $0 < \alpha \le 1$; in Fig. 6.13(b), $\alpha = 0.3$. Figure 6.13(c) shows the computed (intensity-scaled) difference between the watermarked image in Fig. 6.13(b) and the unmarked image, in which intensity 128 represents a difference of 0. Note that the underlying image is clearly visible through the "semi-transparent" watermark; both Fig. 6.13(b) and the difference image in Fig. 6.13(c) demonstrate this.
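The blend above maps directly to a few lines of numpy; the quadrant placement, the stand-in image contents, and the α value are assumptions made for this sketch.

```python
import numpy as np

def visible_watermark(f, w, alpha=0.3):
    """Blend watermark w into image f: f_w = (1 - alpha) * f + alpha * w."""
    fw = f.astype(float).copy()
    h, ww = w.shape
    region = fw[-h:, -ww:]                   # lower right quadrant placement
    fw[-h:, -ww:] = (1 - alpha) * region + alpha * w
    return np.clip(fw, 0, 255).astype(np.uint8)

image = np.full((512, 512), 200, dtype=np.uint8)   # stand-in image
mark = np.random.default_rng(1).integers(0, 256, (256, 256), dtype=np.uint8)
marked = visible_watermark(image, mark, alpha=0.3)
```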

Invisible watermarks, in contrast to the visible watermark of the previous example, cannot be seen with the naked eye. They are imperceptible, yet recoverable with an appropriate decoding algorithm. Their invisibility is ensured by inserting them as visually redundant information [information that the human eye ignores or cannot perceive]. Consider the simple example in Fig. 6.14(a). Because the least significant bits of an 8-bit image have little effect on our perception of it, the watermark of Fig. 6.13(a) was inserted, or "hidden," in the image's two least significant bits. Using the notation introduced previously, we let

$f_w = 4\left(\frac{f}{4}\right) + \frac{w}{64}$

and perform the computations using unsigned integer arithmetic. Dividing and then multiplying f by 4 sets its two least significant bits to 0; dividing w by 64 shifts its two most significant bits into the two least significant bit positions; and adding the two results generates the LSB watermarked image.

Note that the embedded watermark is not visible in Fig. 6.14(a). However, by zeroing the six most significant bits of the image and scaling the remaining values to the full intensity range, the watermark can be extracted, as shown in Fig. 6.14(b).
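The embed and extract steps map directly onto integer arithmetic; here is a minimal numpy sketch, with the 2-bit depth of the example.

```python
import numpy as np

def embed_lsb(f, w):
    """f_w = 4*(f//4) + w//64: hide w's top 2 bits in f's bottom 2 bits."""
    return (4 * (f // 4) + w // 64).astype(np.uint8)

def extract_lsb(fw):
    """Zero the 6 most significant bits, then scale 0..3 to 0..255."""
    return ((fw & 0b11) * 85).astype(np.uint8)   # 3 * 85 = 255

rng = np.random.default_rng(2)
f = rng.integers(0, 256, (64, 64), dtype=np.uint8)
w = rng.integers(0, 256, (64, 64), dtype=np.uint8)
fw = embed_lsb(f, w)
w_hat = extract_lsb(fw)   # 2-bit rendition of w, stretched to full range
```

Scaling by 85 is one way to stretch the two recovered bits over the full range; the text's "scaling to the entire intensity range" does not prescribe a particular factor.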

FIGURE 6.14 A simple invisible watermark: (a) watermarked image; (b) the

extracted watermark; (c) the watermarked image after high quality JPEG compression and decompression; and (d) the extracted watermark from (c).

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

A critical property of invisible watermarks is their resistance to both accidental and intentional attempts at removal. Fragile invisible watermarks are destroyed by any modification of the images in which they are embedded. This is a useful property in some applications, such as image authentication. As Figures 6.14(c) and (d) illustrate, the LSB watermarked image in Fig. 6.14(a) contains such a fragile invisible watermark: if the image in (a) is compressed and decompressed with lossy JPEG, the watermark is destroyed. Figure 6.14(c) shows the result of compressing and decompressing Fig. 6.14(a); the rms error is 2.1 intensity levels. When we use the same procedure as in (b) to extract the watermark from this image, the result is unintelligible [see Fig. 6.14(d)]. Although lossy compression and decompression preserved the image's important visual information, they destroyed its fragile watermark.

Robust invisible watermarks, on the other hand, are designed to survive image modification, whether accidental or intentional. Common unintentional attacks include lossy compression, linear and nonlinear filtering, cropping, rotation, and resampling. Intentional attacks can range from printing and rescanning to the insertion of additional watermarks and/or noise. Of course, there is no need to survive attacks that leave the image unusable.

6.13 Summary

We have covered a variety of compression techniques, with examples, the types of image compression algorithms implemented in different image processing applications, and watermarking techniques for digital images.

6.14 Unit End questions

1. What is data compression?

2. Write a short note on: a) Coding Redundancy b) Interpixel Redundancy c) Psychovisual Redundancy

3. What is a fidelity criterion?

4. Explain Huffman coding with an example.

5. Explain Golomb coding with an example.

6. Explain arithmetic coding with an example.

7. Explain LZW coding with an example.

8. Explain run-length coding with an example.

9. Explain symbol-based coding with an example.

10. Write a short note on Block Transform Coding.

11. Write a short note on Predictive Coding.

12. Write a short note on Wavelet Coding.

13. Write a short note on Digital Image Watermarking.

6.15 Reference for further reading

1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 4th ed., Pearson/Prentice Hall, 2018.

2. A. K. Jain, Fundamentals of Digital Image Processing, PHI.

3. J. C. Russ, The Image Processing Handbook, 5th ed., CRC Press, 2010.

4. All images courtesy of Digital Image Processing by Gonzalez and Woods, 4th edition.


Unit 4

7 MORPHOLOGICAL IMAGE PROCESSING

Unit Structure

7.0 Objective

7.1 Preliminaries

7.2 Erosion and Dilation

7.3 Opening and Closing

7.4 The Hit -or-Miss Transform

7.5 Morphological Algorithms

7.6 Morphological Reconstruction

7.7 Morphological Operations on Binary Images

7.8 Grayscale Morphology

7.9 Summary

7.10 Unit End questions

7.11 Reference for further reading

7.0 Objective

Upon completion of this chapter, the reader will:

- Know a number of important concepts of mathematical morphology and understand how the method is widely applied in digital image processing.

- Be familiar with the basic tools of binary image morphology (erosion, dilation, opening, and closing) and with how these techniques can be combined to generate more complex tools.

- Be able to develop algorithms based on binary image morphology for tasks such as morphological smoothing, edge detection, and skeletonization.

- Understand how binary image morphology can be extended to grayscale images.

- Be able to develop grayscale morphological algorithms for tasks such as textural segmentation, granulometry, grayscale gradient computation, and others.


7.1 Preliminaries

Morphological image processing is a collection of non-linear image processing operations related to the shape, or morphology, of features in an image. According to Wikipedia, morphological operations rely only on the relative ordering of pixel values, not on their numerical values themselves. They can therefore also be applied to grayscale images whose light transfer functions are unknown and whose absolute pixel values are of no or minor interest.

Morphological techniques probe an image with a small shape or template called a structuring element. The structuring element is positioned at all possible locations in the image and compared with the corresponding neighborhood of pixels. Some operations test whether the element "fits" within the neighborhood, while others test whether it "hits" or intersects the neighborhood:

Figure 7.1 Probing of an image with a structuring element

(white and grey pixels have zero and non -zero values, respectively).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

A morphological operation on a binary image creates a new binary image in which a pixel has a non-zero value only if the test is successful at that location in the input image. The structuring element is itself a small binary image, i.e., a small matrix of pixels, each with a value of zero or one:

- The dimensions of the matrix specify the size of the structuring element.

- The pattern of ones and zeros specifies the shape of the structuring element.

- The origin of the structuring element is usually one of its pixels, although in general the origin can lie outside the structuring element.


[Reference from

https://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/ImageProcessing -

html/topic4.htm ]

It is standard practice to use structuring matrices of odd dimensions, with the origin defined as the center of the matrix. Structuring elements play the same role in morphological image processing that convolution kernels play in linear image filtering.

When a structuring element is placed in a binary image, each of its pixels is associated with the corresponding pixel of the neighborhood under the structuring element. The structuring element is said to fit the image if, for each of its pixels set to 1, the corresponding image pixel is also 1. Similarly, a structuring element is said to hit, or intersect, an image if, for at least one of its pixels set to 1, the corresponding image pixel is also 1.

Figure 7.2 Fitting and hitting of a binary image with structuring elements s 1 and s 2.

[Reference from

https://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/ImageProcessing -

html/topic4.htm ]


7.2 Erosion and Dilation

Erosion

Erosion is the counterpart of dilation: where dilation expands an image, erosion shrinks it. How the image is shrunk is determined by the structuring element, which is typically much smaller than the image, commonly 3 × 3 in size; this also keeps computation time lower than with larger structuring elements. As with dilation, erosion moves the structuring element across the image from left to right and top to bottom. At each position, indicated by the structuring element's center, the procedure checks whether the structuring element overlaps the image completely. If the overlap is not complete, the pixel under the center of the structuring element is set to white, or 0. Let X denote the reference binary image and B the structuring element. Erosion is then defined by

$X \ominus B = \{ z \mid (B)_z \subseteq X \}$

According to this equation, an element z appears in the output only if the structuring element, translated to z, is a subset of (i.e., fits entirely within) the binary image X. Figure 7.3 illustrates the process; the white squares represent 0 and the black squares represent 1.

The erosion process begins at position •. There is no complete overlap there, so the pixel at position • remains white. After the structuring element shifts to the right, the same condition is found; because there is no complete overlap at position ••, the black square marked •• is turned white. The structuring element is then moved further, to the position marked •••. Here the overlap is complete, since all of the structuring element's black squares coincide with black squares of the image, so the pixel under the center of the structuring element remains black. The result after the structuring element reaches the final pixel is shown in Fig. 7.3(b). Erosion is a thinning operator that makes an image smaller: eroding an image removes narrow regions and thins out wider ones.

Figure 7.3 Erosion

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Dilation

Dilation is defined as the operation of expanding a binary image from its original shape. How the binary image is expanded is determined by the structuring element, which is smaller than the image itself, typically 3 × 3 in size. Dilation is similar to convolution: the structuring element is reflected and shifted from left to right and from top to bottom, and at each shift the process looks for any overlap between pixels of the structuring element and the binary image. If there is at least one overlapping pixel, the pixel under the center position of the structuring element is set to 1, or black.

Let X denote the reference image and B the structuring element. Dilation is defined by

$X \oplus B = \{ z \mid (\hat{B})_z \cap X \neq \varnothing \}$

where $\hat{B}$ is the reflection of B about its origin. The equation states that when B dilates X, the output contains an element z whenever at least one element of the (reflected, translated) B intersects an element of X. When this is the case, the position under the structuring element's center is set to 'ON'. This procedure is depicted in Fig. 7.4; 1 is represented by the black squares and 0 by the white squares.

Initially, the center of the structuring element is aligned at position •. At this point there is no overlap between the black squares of B and the black squares of X, so the square at position • remains white. The structuring element is then moved to the right. At position ••, one of B's black squares overlaps, or intersects, a black square of X, so the square at position •• is turned black. Shifting the structuring element B across image X from left to right and top to bottom in this way produces the dilated image shown in Fig. 7.4.

Dilation is an operator that enlarges binary objects. It has a variety of applications, but the most common is bridging gaps in an image, which it accomplishes because B expands the features of X.


Figure 7.4 Dilation

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)
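For concreteness, here is a small numpy sketch of binary erosion and dilation by brute-force neighborhood testing. Production code would use an optimized library such as SciPy's morphology routines; this version simply mirrors the fit/hit definitions above, and it assumes a symmetric structuring element so that reflection can be ignored.

```python
import numpy as np

def erode(X, B):
    """X, B: 2-D 0/1 arrays; B has odd dimensions with origin at its center."""
    ph, pw = B.shape[0] // 2, B.shape[1] // 2
    padded = np.pad(X, ((ph, ph), (pw, pw)))
    out = np.zeros_like(X)
    for r in range(X.shape[0]):
        for c in range(X.shape[1]):
            hood = padded[r:r + B.shape[0], c:c + B.shape[1]]
            out[r, c] = np.all(hood[B == 1] == 1)   # B must fit entirely
    return out

def dilate(X, B):
    ph, pw = B.shape[0] // 2, B.shape[1] // 2
    padded = np.pad(X, ((ph, ph), (pw, pw)))
    out = np.zeros_like(X)
    for r in range(X.shape[0]):
        for c in range(X.shape[1]):
            hood = padded[r:r + B.shape[0], c:c + B.shape[1]]
            out[r, c] = np.any(hood[B == 1] == 1)   # any hit turns pixel on
    return out

X = np.zeros((7, 7), dtype=np.uint8); X[2:5, 2:5] = 1   # 3 x 3 square
B = np.ones((3, 3), dtype=np.uint8)
print(erode(X, B).sum(), dilate(X, B).sum())            # 1 25: shrink, grow
```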

7.3 Opening and Closing

Besides the two primary operations of erosion and dilation, there are two important secondary operations in morphological image processing: opening and its dual, closing. We concentrate primarily on opening; the corresponding properties of closing usually follow by complementation. Although opening is defined in terms of erosion and dilation, it also has a more geometric formulation in terms of structuring-element fits, which serves as the basis for its implementation.

Opening

The opening of image A by structuring element B is denoted $A \circ B$ and is defined as an erosion followed by a dilation:

$A \circ B = (A \ominus B) \oplus B$

In functional notation, opening is written $\gamma_B(A)$. To illustrate the role opening plays in processing, we use the equivalent formulation

$A \circ B = \bigcup \{ (B)_z : (B)_z \subseteq A \}$

Here, the opening is found by taking the union of all translates of the structuring element that fit inside the input image: each fit is identified, and the opening is the union of the translated structuring elements over all identified locations. This is exactly what eroding and then dilating accomplishes.

Figure 7.5 shows a rectangle eroded and then dilated by a disk, demonstrating how opening is expressed as erosion followed by dilation. The fitting effect is also visible: opening has rounded the rectangle from the inside, the rounding arising from the way the disk is "rolled around" inside the rectangle to form the union of fits. Had the structuring element been a small square with a horizontal base, no rounding would have occurred and the opened image would have been identical to the original.

We thus have two instances of opening: opening with a disk creates a filter that smooths from the inside, rounding corners that extend into the background, while a square structuring element produces a quite different effect.

Rather than treating the opened image as the final output of processing, we can take an alternative approach and subtract the opening from the input image (as sets). The resulting operator is referred to as the opening top-hat:

$A \,\hat{\circ}\, B = A - (A \circ B)$

Figure 7.5 (a) Structuring element, (b) input image, (c) erosion, (d) opening.

Figure 7.6. (a) Structuring element, (b) input image, (c) opening, (d) opening top -hat.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)

The opening top-hat consists of the corners of the input image that protrude into the background, and it can be used for recognition purposes; Fig. 7.6 illustrates its application to the detection of gear teeth. Although the disk is frequently used because its shape effect is rotation-invariant, there are many circumstances in which other types of structuring elements are advantageous.


Figure 7.7 (a) Structuring element, (b) input image, (c) dilation, (d) closing

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)

Closing

Closing is the operation complementary to opening; it is defined as a dilation followed by an erosion. The closing of A by B is denoted $A \bullet B$ and defined by

$A \bullet B = (A \oplus B) \ominus B$

In functional notation, closing is written $\varphi_B(A)$. Closing is depicted in Figure 7.7. The effect is visible in the way the image has been filtered from the outside, smoothing only the corners that protrude into the image.

Closing is the dual operator of opening:

$A \bullet B = (A^c \circ \hat{B})^c$

and, conversely, opening is the dual of closing:

$A \circ B = (A^c \bullet \hat{B})^c$

Note how the structuring element is reflected in these duality relations. If it is a disk or any other symmetric shape, the reflection is irrelevant. Duality can be combined with the union (fitting) formulation of opening to interpret closing as fitting, or "rolling the ball," around the outside of the image's perimeter. Figure 7.8 depicts the duality between opening and closing for an asymmetric structuring element.

Because the closing contains the input image, subtracting the input image from the closing yields the closing top-hat operator:

$A \,\hat{\bullet}\, B = (A \bullet B) - A$

Figure 7.9 illustrates the closing and the closing top-hat in an application: closing a shape with a large disk approximates its convex hull, and the closing top-hat approximates its convex deficiency. Because of their strong discriminatory power, these deficiencies are frequently exploited in character recognition.


Figure 7.8 Duality between open and close with a nonsymmetrical

structuring element.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)

Figure 7.9 (a) Input image, (b) closing by a disk, (d) closing top -hat.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)

Consider the image and structuring element of Fig. 7.10, with the corresponding opening and closing. As a filter, opening has cleaned the boundary by removing small extrusions, but it has done so far more delicately than erosion, so that the opened image is a much more faithful replica of the original than the eroded image. Analogous remarks apply to the closing, except that small intrusions are filled instead. Note also that while the position of the origin within the structuring element affects both erosion and dilation, it has no effect on opening and closing.


Figure 7.10 (a) Input image, (b) structuring element, (c) opening, (d) closing.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)
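Composing the two earlier sketches gives opening, closing, and their top-hats directly; the erode and dilate helpers are the ones defined above, and the symmetric-structuring-element assumption carries over.

```python
def opening(A, B):
    return dilate(erode(A, B), B)     # A o B = (A erode B) dilate B

def closing(A, B):
    return erode(dilate(A, B), B)     # A . B = (A dilate B) erode B

def opening_tophat(A, B):
    return A & ~opening(A, B) & 1     # A - (A o B), as 0/1 arrays

def closing_tophat(A, B):
    return closing(A, B) & ~A & 1     # (A . B) - A
```

Note that set difference on 0/1 arrays is just "in the first, not in the second," hence the bitwise AND with the complement.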

7.4 The Hit-or-Miss Transform

The hit-or-miss transform is a basic binary morphological operation that can be used to look for particular patterns of foreground and background pixels in an image. It is, in fact, the fundamental operation of binary morphology, since practically all other binary morphological operators can be derived from it. As with the other binary morphological operators, it takes a binary image and a structuring element as input and produces another binary image as output.

The hit-or-miss transform of an image A by B is denoted $A \circledast B$. Here B is a pair of structuring elements, B = (B1, B2), rather than a single element:

B1: the set of elements of B associated with the object (foreground)

B2: the set of elements of B associated with the background

The hit-or-miss transform is defined as follows:

$A \circledast B = (A \ominus B_1) \cap (A^c \ominus B_2)$

This transform is useful for locating all pixel configurations whose foreground matches the B1 structure (a hit) while the background simultaneously matches the B2 structure (a miss of the foreground). As a result, the hit-or-miss transform is used to detect shapes.
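Using the erode helper from before, the definition translates directly; the two structuring elements are marked with 1's where a foreground (B1) or background (B2) pixel is required, and 0 elsewhere. The example below is hypothetical, not the one from the figure.

```python
import numpy as np

def hit_or_miss(A, B1, B2):
    """(A erode B1) AND (A^c erode B2), all as 0/1 numpy arrays."""
    return erode(A, B1) & erode(1 - A, B2)

# Hypothetical example: find isolated foreground pixels (a lone 1
# surrounded by 0's on all eight sides).
B1 = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.uint8)
B2 = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=np.uint8)
A = np.zeros((5, 5), dtype=np.uint8); A[2, 2] = 1
print(hit_or_miss(A, B1, B2))     # single 1 at the isolated pixel
```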

Example: Using the two structuring elements B1 and B2, use the hit-or-miss transform to determine the locations of the following shape (pixel configuration) in the image below.


Solution:

The figure below illustrates how the hit -or-miss transform is applied to the image

from the preceding example.

Figure 7.11 (a) Binary image. (b) Result of applying hit -or-miss transform.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)


7.5 Morphological Algorithms

One of morphology's principal applications is the extraction of image components that are useful in the representation and description of shape. Morphological algorithms are used for boundary extraction, skeletonization (i.e., extracting the skeleton of an object), thinning, and related tasks.

Boundary Extraction

The boundary of a set A, denoted $\beta(A)$, can be determined in the following manner:

$\beta(A) = A - (A \ominus B)$

where B denotes the structuring element.

The figure below illustrates how to extract an object's boundary from a binary

image.

Figure 7.12 (a) Binary image. (b) Object boundary extracted using the previous

equation and 3×3 square structuring element.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Because the structuring element is of size 3 × 3, the resulting boundary is one pixel thick. Using a 5 × 5 structuring element instead yields a boundary that is between 2 and 3 pixels thick, as illustrated in the following figure.

Figure 7.13 Object boundary extracted using 5×5 square structuring element

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)
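With the erode helper defined earlier, boundary extraction is a one-liner; the 3 × 3 structuring element of ones reproduces the one-pixel-thick boundary described above.

```python
import numpy as np

def boundary(A, B=None):
    """beta(A) = A - (A erode B); set difference on 0/1 arrays."""
    if B is None:
        B = np.ones((3, 3), dtype=np.uint8)
    return A & ~erode(A, B) & 1

A = np.zeros((7, 7), dtype=np.uint8); A[1:6, 1:6] = 1
print(boundary(A))      # hollow square: interior pixels removed
```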


Thinning

Thinning reduces binary objects or shapes in an image to strokes that are a single pixel wide. The thinning of a set A by a structuring element B is defined as

$A \otimes B = A - (A \circledast B) = A \cap (A \circledast B)^c$

Because we are only matching the pattern (shape) with the structuring elements, no background operation is required in the hit-or-miss transform here. In practice, B is a sequence of structuring elements,

$\{B\} = \{B_1, B_2, B_3, \ldots, B_n\}$

where $B_i$ is a rotated version of $B_{i-1}$. Using this sequence, thinning can be expressed as

$A \otimes \{B\} = ((\ldots((A \otimes B_1) \otimes B_2)\ldots) \otimes B_n)$

The entire procedure is repeated until no further changes occur. The following figure illustrates how the ridges of a fingerprint can be thinned to a thickness of one pixel.

Figure 7.14 (a) Original fingerprint image. (b) Image thinned once. (c) Image

thinned twice. (d) Image thinned until stability (no changes occur).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Figure 7.15(a) shows a set of structuring elements commonly used for thinning (note that $B_i$ is $B_{i-1}$ rotated clockwise by 45°), and Fig. 7.15(b) shows a set A to be thinned by the procedure just described. The result of thinning A with one pass of $B_1$, yielding $A_1$, is shown in Fig. 7.15(c). The result of thinning $A_1$ with $B_2$ is shown in Fig. 7.15(d), and the results of passes with the remaining structuring elements appear in Figs. 7.15(e) through (k) (there was no change from $A_7$ to $A_8$ or from $A_9$ to $A_{11}$). Convergence was reached after the second pass of $B_6$; Fig. 7.15(l) shows the thinned result. Finally, Fig. 7.15(m) shows the thinned set converted to m-connectivity.


FIGURE 7.15 (a) Structuring elements. (b) Set A. (c) Result of thinning A with $B_1$ (shaded). (d) Result of thinning $A_1$ with $B_2$. (e)–(i) Results of thinning with the next six SEs. (There was no change between $A_7$ and $A_8$.) (j)–(k) Result of using the first four elements again. (l) Result after convergence. (m) Result converted to m-connectivity.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Thickening

Thickening is the morphological dual of thinning, defined by the expression

$A \odot B = A \cup (A \circledast B)$

where B is a structuring element suitable for thickening. As with thinning, thickening is a sequential process:

$A \odot \{B\} = ((\ldots((A \odot B_1) \odot B_2)\ldots) \odot B_n)$

The structuring elements used for thickening have the same form as those used for thinning in Fig. 7.15(a), but with all 1's and 0's interchanged. In practice, however, a separate algorithm for thickening is seldom used. Instead, the usual procedure is to thin the background of the set in question and then complement the result. In other words, to thicken a set A, we form its complement $A^c$, thin it, and then complement the result to obtain the thickened set. This approach is illustrated in Figure 7.16. As previously stated, we display only set A and image I, not the padded version of image I.

FIGURE 7.16 (a) Set A. (b) Complement of A. (c) Result of thinning the complement. (d) Thickened set obtained by complementing (c). (e) Final result, with no disconnected points.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Convex hull

A set A is said to be convex if the straight-line segment joining any two points in A lies entirely within A. The smallest convex set containing an arbitrary set S is called the convex hull, H, and the set difference H − S is referred to as the convex deficiency of S. The convex hull and convex deficiency are useful for describing objects.

Let $B^i$, i = 1, 2, 3, 4, denote four structuring elements. The procedure consists of implementing the equation

$X_k^i = (X_{k-1}^i \circledast B^i) \cup A, \qquad i = 1, 2, 3, 4 \text{ and } k = 1, 2, 3, \ldots$

with $X_0^i = A$. When the procedure converges ($X_k^i = X_{k-1}^i$), we let $D^i = X_k^i$. The convex hull of A is then

$C(A) = \bigcup_{i=1}^{4} D^i$

In other words, the method consists of iteratively applying the hit-or-miss transform to A with $B^1$; when no further changes occur, the union with A is performed and the result is denoted $D^1$. The procedure is then repeated with $B^2$ (applied to A) until no further changes occur, and so forth. The union of the four resulting D's constitutes the convex hull of A.

Figure 7.17 Convex hull

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

One disadvantage of this procedure is that the convex hull can grow beyond the minimum dimensions required to guarantee convexity. A simple way to reduce this effect is to limit growth so that it does not extend past the vertical and horizontal dimensions of the original set.

Hole filling

A hole may be defined as a background region surrounded by a connected border of foreground pixels.

Let A denote a set whose elements are 8-connected boundaries, each boundary enclosing a background region (i.e., a hole). Given a point in each hole, the objective is to fill all the holes with 1's (for binary images). We begin by forming an array $X_0$ of 0's (of the same size as the array containing A), except at the locations in $X_0$ corresponding to the given point in each hole, which are set to 1. Then the following procedure fills all the holes with 1's:

$X_k = (X_{k-1} \oplus B) \cap A^c, \qquad k = 1, 2, 3, \ldots$

where B is the symmetric structuring element. The algorithm terminates at iteration step k if $X_k = X_{k-1}$. The set $X_k$ then contains all the filled holes, and the union of $X_k$ and A contains all the filled holes and their boundaries. The dilation alone would fill the entire area if left unchecked; the intersection with the complement of A at each step restricts the result to the region of interest. This is an example of how a morphological process can be conditioned to achieve a desired property; in the present context, it is referred to as a conditional dilation.
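A compact sketch of this conditional dilation, reusing the dilate helper from earlier with a symmetric cross-shaped B (a common choice, assumed here since the text does not draw B):

```python
import numpy as np

def fill_holes(A, seeds):
    """Iterate X_k = dilate(X_{k-1}, B) AND A^c until stable, then union with A."""
    B = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=np.uint8)
    complement = 1 - A
    X = np.zeros_like(A)
    for r, c in seeds:                  # one given point inside each hole
        X[r, c] = 1
    while True:
        X_next = dilate(X, B) & complement
        if np.array_equal(X_next, X):
            break
        X = X_next
    return X | A                        # filled holes plus their boundaries

A = np.zeros((7, 7), dtype=np.uint8)
A[1:6, 1:6] = 1; A[2:5, 2:5] = 0        # square ring with a 3 x 3 hole
print(fill_holes(A, seeds=[(3, 3)]))    # solid square
```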

Figure 7.18 Hole filling

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Example: an image that could result from thresholding a scene containing polished spheres (ball bearings) to two levels; the dark spots are caused by reflections. The objective is to eliminate the reflections by hole filling: a (white) point is selected inside one sphere, that sphere is filled, and the process is repeated until all the spheres are filled.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Pruning

Pruning techniques are an essential complement to thinning and skeletonizing algorithms, because those operations often leave parasitic components that must be "cleaned up" by postprocessing. We begin with a pruning problem and then develop a morphological solution. A common approach to automated character recognition is to analyze the shape of the skeleton of each character. Such skeletons are frequently marked by "spurs" (parasitic components), which arise during erosion from non-uniformities in the strokes composing the characters. We propose a morphological technique for handling this problem, under the assumption that the length of a parasitic component does not exceed a specified number of pixels.

Figure 7.19 Pruning

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The solution is based on suppressing a parasitic branch by successively eliminating its end point. Of course, this also shortens (or eliminates) other branches in the character; in the absence of other structural information, the assumption here is that any branch with three or fewer pixels is to be eliminated. Thinning an input set A with a sequence of structuring elements designed to detect only end points achieves the desired result. That is, let

$X_1 = A \otimes \{B\}$

where {B} denotes the sequence of structuring elements. The sequence consists of two distinct structures, each rotated by 90° to produce a total of eight elements.


Applying the thinning equation to A three times yields the set $X_1$. The next task is to "restore" the character to its original form, but with the parasitic branches removed. To begin, we form a set $X_2$ containing all the end points of $X_1$:

$X_2 = \bigcup_{k=1}^{8} (X_1 \circledast B_k)$

where the $B_k$ are the same end-point detectors. The next step is to dilate the end points three times, using set A as a delimiter:

$X_3 = (X_2 \oplus H) \cap A$

where H is a 3 × 3 structuring element of 1's, and the intersection with A is applied after each step. This kind of conditional dilation prevents the creation of non-zero elements outside the region of interest.

Finally, the union of $X_3$ and $X_1$ yields the desired result:

$X_4 = X_1 \cup X_3$

In more complex circumstances, X3 occasionally includes the "tips" of some parasitic branches. This can occur when the branches' end points are close to the skeleton. Although these elements may be deleted in X1, they may reappear during the dilation because they are legitimate points in A. Because parasitic elements are typically short in comparison to legitimate strokes, they are rarely picked up again; as they are disconnected regions, their detection and removal are simple.
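A sketch of steps X2 through X4 follows, assuming the thinned set X1 has already been computed and NumPy/SciPy are available. The end-point test (a foreground pixel with exactly one foreground 8-neighbour) is a practical stand-in for the eight hit-or-miss detectors $B^k$ described above; the function name is our own.

```python
import numpy as np
from scipy import ndimage

def prune_restore(X1, A, n_dilations=3):
    """Steps X2-X4 of pruning: recover branches from the end points of X1.

    X1 : thinned binary skeleton (parasitic spurs already shortened).
    A  : the original binary set.
    """
    # X2: end points of X1, detected by counting foreground 8-neighbours.
    kernel = np.ones((3, 3), dtype=int)
    kernel[1, 1] = 0
    neighbors = ndimage.convolve(X1.astype(int), kernel, mode='constant', cval=0)
    X2 = X1 & (neighbors == 1)

    # X3: dilate end points with a 3x3 SE of 1's, ANDing with A after
    # each step (conditional dilation).
    H = np.ones((3, 3), dtype=bool)
    X3 = X2
    for _ in range(n_dilations):
        X3 = ndimage.binary_dilation(X3, structure=H) & A

    # X4: union of the thinned set and the recovered branch pixels.
    return X1 | X3
```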

Skeletonization

Skeletonization is a technique for reducing foreground regions in a binary image to a skeletal residue that retains the majority of the original region's extent and connectivity while discarding the majority of the original foreground pixels. To understand how this operates, imagine that the foreground regions of the input binary image are composed of a uniformly slow-burning material. Simultaneously light flames along the region's boundary and observe the fire spread toward the interior. The fire extinguishes itself at points where fronts from two distinct borders meet; these points form the "quench line," which is the skeleton. It is apparent from this definition that thinning produces a skeleton.

Figure 7.20: Skeletonization, Medial axis transform.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Another way to view the skeleton is as the loci of the centres of bi-tangent circles that fit entirely within the foreground region under consideration. This is illustrated in Figure 7.21 for a rectangular shape.

Figure 7.21: Skeleton of a rectangle defined in terms of bi-tangent circles.

(Reference from "Digital Image Processing" fourth edition by Rafael C. Gonzalez and Richard E. Woods)

Although the terms medial axis transform (MAT) and skeletonization are frequently used interchangeably, we shall make a subtle distinction between them. The skeleton is simply a binary image of the skeleton pixels. The MAT, on the other hand, is a grayscale image in which each point on the skeleton has an intensity that corresponds to its distance from an object border.

A set A's skeleton S(A) can be defined as follows:

a) If z is a point of S(A) and $(D)_z$ is the largest disk centred on z and contained within A, no larger disk (not necessarily centred on z) containing $(D)_z$ and contained within A can be found. The disk $(D)_z$ is called a maximal disk.

b) The disk $(D)_z$ touches the boundary of A at two or more distinct points.

The skeleton of A can be expressed in terms of erosions and openings as follows:

$$S(A) = \bigcup_{k=0}^{K} S_k(A)$$

with

$$S_k(A) = (A \ominus kB) - (A \ominus kB) \circ B$$

where B is a structuring element and $(A \ominus kB)$ denotes k successive erosions of A; that is, A is eroded first by B, then the result is eroded by B, and so on:

$$(A \ominus kB) = ((\cdots((A \ominus B) \ominus B)\cdots) \ominus B)$$

K is the last iterative step before A erodes to an empty set:

$$K = \max\{\, k \mid (A \ominus kB) \neq \varnothing \,\}$$


S(A) can be obtained as the union of the skeleton subsets $S_k(A)$, k = 0, 1, 2, …, K. Moreover, A can be reconstructed from these subsets:

$$A = \bigcup_{k=0}^{K} \left(S_k(A) \oplus kB\right)$$

where $(S_k(A) \oplus kB)$ denotes k successive dilations, beginning with $S_k(A)$:

$$(S_k(A) \oplus kB) = ((\cdots((S_k(A) \oplus B) \oplus B)\cdots) \oplus B)$$
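The skeleton equations translate into a simple loop over k. Below is a minimal sketch assuming NumPy/SciPy, with B taken as a 3 × 3 SE of 1's (the text leaves B generic, so this choice is ours):

```python
import numpy as np
from scipy import ndimage

def morphological_skeleton(A):
    """Compute S(A) as the union of S_k(A) = (A erode kB) - (A erode kB) opened by B.

    Returns the skeleton and the list of subsets S_k (the subsets can be
    used to reconstruct A by k successive dilations of each S_k).
    """
    B = ndimage.generate_binary_structure(2, 2)   # 3x3 SE of 1's
    S = np.zeros_like(A, dtype=bool)
    subsets = []
    eroded = A.copy()
    while eroded.any():                           # stop once A erodes to empty
        opened = ndimage.binary_opening(eroded, structure=B)
        Sk = eroded & ~opened                     # S_k(A)
        subsets.append(Sk)
        S |= Sk
        eroded = ndimage.binary_erosion(eroded, structure=B)
    return S, subsets
```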

FIGURE 7.22 (a) Set A. (b) Various positions of maximum disks whose centers

partially define the skeleton of A. (c) Another maximum disk, whose center

defines a different segment of the skeleton of A. (d) Complete skeleton (dashed)

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

7.6 Morphological Reconstruction

Reconstruction is a morphological operation that involves two images and a structuring element (instead of a single image and a structuring element). One image, the marker, is the starting point of the transformation. The second image, the mask, constrains the transformation. The structuring element used defines connectivity. In the following discussion we use 8-connectivity (the default), which means that B is a 3 × 3 matrix of 1's, with its center at coordinates (2, 2).

If G is the mask and F is the marker, the reconstruction of G from F, denoted $R_G(F)$, is defined by the following iterative procedure:

1. Initialize $h_1$ to be the marker image, F.
2. Create the structuring element: B = ones(3).
3. Repeat


$$h_{k+1} = (h_k \oplus B) \cap G$$

until $h_{k+1} = h_k$.

4. $R_G(F) = h_{k+1}$

Marker F must be a subset of G: $F \subseteq G$.

This iterative procedure is illustrated in Figure 7.23. While the iterative formulation is conceptually sound, far quicker computational approaches are available.
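The four-step procedure above maps directly onto a short loop. The following is a minimal sketch assuming NumPy and SciPy; the function name is our own. (SciPy can also perform the whole iteration in one call: ndimage.binary_dilation accepts a mask argument and, with iterations=0, repeats until the result stops changing.)

```python
import numpy as np
from scipy import ndimage

def reconstruct_by_dilation(F, G):
    """Morphological reconstruction of mask G from marker F (F must be a
    subset of G): iterate h_{k+1} = (h_k dilated by B) AND G until stable."""
    B = np.ones((3, 3), dtype=bool)          # 8-connectivity, center at (2, 2)
    h = F.copy()
    while True:
        h_next = ndimage.binary_dilation(h, structure=B) & G
        if np.array_equal(h_next, h):        # stability: h_{k+1} == h_k
            return h
        h = h_next
```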

FIGURE 7.23 Morphological reconstruction. (a) Original image (the mask). (b)

Marker image. (c) –(e) Intermediate result after 100, 200, and 300 iterations,

respectively. (f) Final result. (The outlines of the objects in the mask image are

superimposed on (b) –(e) as visual references.)

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Geodesic Dilation and Erosion

Geodesic dilation and erosion are key procedures in the field of morphological reconstruction. These operations add a new dimension to conventional dilation and erosion: they allow us to rebuild particular shapes within an image. We are still working with binary images here. If you're unfamiliar with binary images, they are images whose pixels are either black or white; in other words, the pixels divide into two groups, foreground and background.

The distinguishing aspect of these two methods is that each employs two images instead of one. As with the simple versions, we require an image to which dilation or erosion is applied. Along with this image, we need a masking image that controls the resulting image's growth or shrinkage.


With geodesic dilation, we take the intersection of the mask image and the dilated image. As previously stated, the mask constrains the growth of subsequent dilations.

Let F denote the marker image and G the mask image, with $F \subseteq G$. The geodesic dilation of size 1 of the marker image with respect to the mask, denoted $D_G^{(1)}(F)$, is defined as

$$D_G^{(1)}(F) = (F \oplus B) \cap G$$

The geodesic dilation of size n of the marker image with respect to the mask, denoted $D_G^{(n)}(F)$, is defined as

$$D_G^{(n)}(F) = D_G^{(1)}\left[D_G^{(n-1)}(F)\right]$$

with $D_G^{(0)}(F) = F$.

FIGURE 7.24 Illustration of a geodesic dilation of size 1. Note that the marker

image contains a point from the object in G. If continued, subsequent dilations

and maskings would eventually result in the object contained in G.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The geodesic erosion of size 1 of marker F with respect to mask G is defined as

$$E_G^{(1)}(F) = (F \ominus B) \cup G$$

where $\cup$ denotes set union (a pointwise logical OR). The geodesic erosion of size n of F with respect to G is defined as

$$E_G^{(n)}(F) = E_G^{(1)}\left[E_G^{(n-1)}(F)\right]$$

where n is a positive integer and $E_G^{(0)}(F) = F$. At each step, the set union ensures that the geodesic erosion of an image remains greater than or equal to its mask image. As these forms indicate, geodesic dilation and erosion are duals with respect to set complementation. Figure 7.25 illustrates a geodesic erosion of size 1; the steps depicted are a direct application of the definition.


FIGURE 7.25 Illustration of a geodesic erosion of size 1.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Geodesic dilation and erosion converge after a finite number of iterative steps, because the mask restricts the propagation or shrinkage of the marker image.

Morphological Reconstruction by Dilation and by Erosion

Based on the preceding concepts, morphological reconstruction by dilation of a marker image F with respect to a mask image G, denoted $R_G^D(F)$, is defined as the iterative geodesic dilation of F with respect to G until stability is reached; that is,

$$R_G^D(F) = D_G^{(k)}(F)$$

with k such that

$$D_G^{(k)}(F) = D_G^{(k+1)}(F)$$

Reconstruction by dilation is illustrated in Figure 7.26, which continues the process begun in Fig. 7.24. After $D_G^{(1)}(F)$ is obtained, the next stage is to dilate this result and then AND it with mask G to give $D_G^{(2)}(F)$, as shown in Fig. 7.26(b). Dilation of $D_G^{(2)}(F)$ followed by masking with G produces $D_G^{(3)}(F)$, and so on. This procedure is continued until stability is reached. Carrying this example one more step gives $D_G^{(5)}(F) = D_G^{(6)}(F)$, so the image morphologically reconstructed by dilation is $R_G^D(F) = D_G^{(5)}(F)$, as illustrated. As expected, the reconstructed image is identical to the mask.


FIGURE 7.26 Illustration of morphological reconstruction by dilation. Sets $D_G^{(1)}(F)$, G, B, and F are from Fig. 7.24. The mask (G) is shown dotted for reference.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Similarly, the morphological reconstruction by erosion of a marker image F with respect to a mask image G, denoted $R_G^E(F)$, is defined as the iterative geodesic erosion of F with respect to G until stability; that is,

$$R_G^E(F) = E_G^{(k)}(F)$$

with k such that

$$E_G^{(k)}(F) = E_G^{(k+1)}(F)$$

A figure similar to Fig. 7.26 can be constructed for morphological reconstruction by erosion. In terms of set complementation, reconstruction by dilation and reconstruction by erosion are duals.

Opening by Reconstruction

In morphological opening, $(A \ominus B) \oplus B$, the erosion operation eliminates objects smaller than structuring element B, and the dilation operation attempts to recover the size and shape of the remaining objects. However, the precision of this restoration depends heavily on the shape of the structuring element and on the form of the restored objects. The opening-by-reconstruction method allows a more faithful restoration of objects following erosion. It is defined as the reconstruction by geodesic dilation of n erosions of F by B, with respect to F:


$$O_R^{(n)}(F) = R_F^D(F \ominus nB)$$

where $F \ominus nB$ is the marker image and F itself is the mask image used in the morphological reconstruction by dilation:

$$R_F^D(F \ominus nB) = D_F^{(k)}(F \ominus nB)$$

Here D denotes geodesic dilation, iterated k times until stability is reached, i.e., $D_F^{(k)}(F \ominus nB) = D_F^{(k+1)}(F \ominus nB)$, with

$$D_F^{(1)}(F \ominus nB) = \left[(F \ominus nB) \oplus B\right] \cap F$$

Because the mask image constrains the growth of the marker image, the dilation of the marker cannot extend outside the mask. The marker image is thus a subset of the mask image: $(F \ominus nB) \subseteq F$.

The images below demonstrate a straightforward opening-by-reconstruction procedure for extracting vertical strokes from an input text image. Because the original image was converted from grayscale to binary, it contains a few distortions in some characters, so that identical characters can have differing vertical lengths. The structuring element in this example is an 8-pixel vertical line, used during the erosion operation to locate the objects of interest. The morphological reconstruction by dilation, $R_F^D(F \ominus nB) = D_F^{(k)}(F \ominus nB)$, iterates k = 9 times until the resulting image converges.
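A sketch of the procedure, assuming NumPy/SciPy; the function name and the commented usage with an 8-pixel vertical line SE mirror the example above but are our own construction:

```python
import numpy as np
from scipy import ndimage

def opening_by_reconstruction(F, se, n=1):
    """O_R(F) = reconstruction by dilation of (F eroded n times by se),
    using F itself as the mask."""
    marker = F.copy()
    for _ in range(n):                           # n successive erosions by se
        marker = ndimage.binary_erosion(marker, structure=se)
    B = np.ones((3, 3), dtype=bool)              # connectivity SE
    h = marker
    while True:
        h_next = ndimage.binary_dilation(h, structure=B) & F
        if np.array_equal(h_next, h):
            return h
        h = h_next

# Extracting vertical strokes from a binary text image F, as in the text:
# strokes = opening_by_reconstruction(F, se=np.ones((8, 1), dtype=bool))
```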

FIGURE 7.27 Original image for opening by reconstruction

FIGURE 7.28 Marker image


FIGURE 7.29 Result of opening by reconstruction

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Automatic Algorithm for Filling Holes

Earlier, we developed a hole-filling algorithm that requires knowledge of a starting point inside each hole. Here we present a fully automated approach based on morphological reconstruction. Let I(x, y) be a binary image, and form a marker image F that is 0 everywhere except at the image border:

$$F(x, y) = \begin{cases} 1 - I(x, y) & \text{if } (x, y) \text{ is on the border of } I \\ 0 & \text{otherwise} \end{cases}$$

Only those border pixels of I(x, y) that are background pixels have a value of 1 in F(x, y).

The binary image with all holes filled is then given by

$$H = \left[R_{I^c}^D(F)\right]^c$$

that is, the complement of the reconstruction by dilation of the marker F with respect to the mask $I^c$. Intuitively: the complement $I^c$ surrounds each hole with a "wall" of 0's, and the marker F is 1 at the border, except at border pixels that are foreground in the original image.


FIGURE 7.29.1 Hole filling using morphological reconstruction.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The marker F dilates inward, starting at the boundary. The complement $I^c$ acts as an AND mask that prevents the dilation from crossing the foreground "walls" during the iterations. The final complement operation then yields the image with the holes filled.
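A minimal sketch of the automatic algorithm, assuming NumPy/SciPy. Note that SciPy also provides ndimage.binary_fill_holes, which implements this idea directly:

```python
import numpy as np
from scipy import ndimage

def fill_holes_auto(I):
    """Automatic hole filling: H = complement of the reconstruction of
    I's complement from a border marker. I is a 2-D boolean array."""
    Ic = ~I
    F = np.zeros_like(I, dtype=bool)
    F[0, :], F[-1, :], F[:, 0], F[:, -1] = True, True, True, True
    F &= Ic                                   # 1 - I(x, y) on the border only
    B = np.ones((3, 3), dtype=bool)
    h = F
    while True:                               # reconstruction by dilation in Ic
        h_next = ndimage.binary_dilation(h, structure=B) & Ic
        if np.array_equal(h_next, h):
            break
        h = h_next
    return ~h                                 # H = [R(F)]^c : holes filled
```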

A more practical illustration is shown in Figure 7.30. The complement of the text image in Figure 7.30(a) is shown in Figure 7.30(b), and the resulting marker image, F, is shown in Figure 7.30(c). This image is entirely black apart from a white (1's) border, except at locations corresponding to 1's in the original image's border (the border values are difficult to identify visually at the magnification shown, also because the page is practically white). Finally, Fig. 7.30(d) shows the image after all holes have been filled.

FIGURE 7.30 (a) Text image. (b) Complement of (a) for use as a mask image. (c) Marker image. (d) Result of hole-filling.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)


Border Clearing

Object extraction from images is a fundamental problem in automated image analysis. A procedure that deletes objects touching (i.e., connected to) the image border is useful because it leaves only complete objects for subsequent processing; an object touching the border is only partially visible in the field of view.

The original image is used as the mask. The marker image is

$$F(x, y) = \begin{cases} I(x, y) & \text{if } (x, y) \text{ is on the border of } I \\ 0 & \text{otherwise} \end{cases}$$

The border-clearing algorithm first computes the morphological reconstruction $R_I^D(F)$, which extracts the objects touching the border, and then generates a new image with no objects touching the border: $X = I - R_I^D(F)$.
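A minimal sketch assuming NumPy/SciPy (scikit-image users can instead reach for skimage.segmentation.clear_border, which performs the same task):

```python
import numpy as np
from scipy import ndimage

def clear_border(I):
    """Remove objects touching the image border: X = I - R_I(F)."""
    F = np.zeros_like(I, dtype=bool)
    F[0, :], F[-1, :], F[:, 0], F[:, -1] = True, True, True, True
    F &= I                                    # marker: foreground border pixels
    B = np.ones((3, 3), dtype=bool)
    h = F
    while True:                               # reconstruct border-touching objects
        h_next = ndimage.binary_dilation(h, structure=B) & I
        if np.array_equal(h_next, h):
            break
        h = h_next
    return I & ~h                             # subtract them from I
```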

Consider again the original text image from Fig. 7.30(a). Figure 7.31(a) shows the reconstruction $R_I^D(F)$ obtained using a 3 × 3 structuring element of 1's; the objects touching the border of the original image are visible on the right side of Fig. 7.31(a). Figure 7.31(b) shows the resulting image X. If the task at hand is automated character recognition, an image in which no characters touch the border is extremely useful, because it avoids the problem of having to recognize partial characters (at best a difficult task).

FIGURE 7.31 (a) Reconstruction by dilation of marker image. (b) Image with no

objects touching the border.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

7.7 Summary of Morphological Operations on Binary Images

The structuring elements employed in the various binary morphological approaches covered thus far are summarized in Figure 7.32. The shaded elements represent foreground values (normally denoted by 1's in numerical arrays), the white elements represent background values (typically denoted by 0's), and the ×'s represent "don't care" elements.


FIGURE 7.32 Five basic types of structuring elements used

for binary morphology

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Grayscale Morphology

Structuring elements in grayscale morphology serve the same basic purpose as their binary counterparts: they act as "probes" that examine a given image for specific properties. Grayscale structuring elements fall into two categories: nonflat and flat. Each is illustrated in Figure 7.33. Figure 7.33(a) shows a hemispherical grayscale SE, and Figure 7.33(c) a horizontal intensity profile through its center. Figure 7.33(b) shows a flat structuring element in the shape of a disk, and Figure 7.33(d) the element's corresponding intensity profile. To aid understanding, the elements in Fig. 7.33 are shown as continuous quantities; their computer implementation is based on digital approximations. Grayscale nonflat SEs are not used widely in practice, due to a number of practical difficulties. Informally, the erosion of an image f by an SE b at any point (x, y) is the minimum value of the image in the region coinciding with b when the origin of b is at (x, y); this is formalized in the next subsection.

FIGURE 7.33 Nonflat and flat structuring elements, and corresponding horizontal

intensity profiles through their centers. All examples in this section

are based on flat SEs.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)


Grayscale Erosion and Dilation

The grayscale erosion of f by a flat structuring element b at location (x, y) is defined as the minimum value of the image in the region coincident with b when the origin of b is at (x, y). In equation form, the erosion of image f by structuring element b at (x, y) is

$$[f \ominus b](x, y) = \min_{(s,t) \in b} \{\, f(x + s,\; y + t) \,\}$$

This is similar in form to spatial correlation, in which x and y are incremented through all possible values so that the origin of b visits every pixel in f. That is, we position the structuring element at every pixel location in the image and compute the erosion of f by b at that point as the minimum of f over the region spanned by b. For example, if b is a square structuring element of size 3 × 3, finding the erosion at a point requires finding the minimum of the nine values of f within the 3 × 3 region spanned by b when its origin is at that point.

Correspondingly, the grayscale dilation of f by a flat structuring element b at any point (x, y) is the maximum value of the image in the window spanned by $\hat{b}$ when the origin of $\hat{b}$ is at (x, y). That is,

$$[f \oplus b](x, y) = \max_{(s,t) \in b} \{\, f(x - s,\; y - t) \,\}$$

where $\hat{b} = b(-x, -y)$ denotes the reflection of b.
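Flat grayscale erosion and dilation are simply local minimum and maximum filters. A minimal sketch assuming NumPy/SciPy, with a stand-in random image (the disk helper is our own):

```python
import numpy as np
from scipy import ndimage

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r                # boolean disk footprint

f = np.random.randint(0, 256, (128, 128)).astype(np.uint8)  # stand-in image
se = disk(2)                                     # flat disk SE of radius 2

eroded  = ndimage.grey_erosion(f, footprint=se)   # local minimum under se
dilated = ndimage.grey_dilation(f, footprint=se)  # local maximum under se
```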

Because grayscale erosion with a flat SE computes the minimum intensity value of f in every neighbourhood of (x, y) coincident with b, we expect the eroded image to be darker than the original. It also follows that bright features will be reduced in size and dark features will be enlarged. Figure 7.34(b) shows erosion with a flat disk SE of radius 2 pixels, and the effects just described are apparent in the eroded image. Consider, for example, how much the intensities of the small bright dots were reduced, leaving them barely visible; at the same time, the dark features grew in thickness. The eroded image as a whole is also slightly darker than the background of the original.

FIGURE 7.34 (a) Grayscale X-ray image. (b) Erosion using a flat disk SE with a radius of 2 pixels. (c) Dilation using the same SE.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Figure 7.34(c) shows the result of dilation with the same SE. The effects are the opposite of those produced by erosion: bright features were thickened and dark features were reduced. Note that the thin black connecting wires in the left, middle, and bottom right of Fig. 7.34(a) are much less visible in the dilated image. In contrast to the small white dots eroded away in Fig. 7.34(b), the dark dots in the dilated image are still clearly visible, because the black dots were initially larger than the white dots relative to the size of the SE. Finally, comparing the dilated image with the original in Fig. 7.34(a) shows that the dilated image is slightly lighter overall.

The erosion of image f by a nonflat SE $b_N$ is defined as

$$[f \ominus b_N](x, y) = \min_{(s,t) \in b_N} \{\, f(x + s,\; y + t) - b_N(s, t) \,\}$$

The dilation of image f by a nonflat SE $b_N$ is defined as

$$[f \oplus b_N](x, y) = \max_{(s,t) \in b_N} \{\, f(x - s,\; y - t) + b_N(s, t) \,\}$$

When the SE is flat (constant), these equations reduce to the flat-SE forms given earlier, to within a constant offset.

As in the binary case, grayscale erosion and dilation are duals with respect to function complementation and SE reflection:

$$[f \ominus b]^c(x, y) = [f^c \oplus \hat{b}](x, y)$$

and similarly,

$$[f \oplus b]^c(x, y) = [f^c \ominus \hat{b}](x, y)$$

where $f^c = -f(x, y)$ and $\hat{b} = b(-x, -y)$.

Grayscale Opening and Closing

The opening of image f by SE b is

$$f \circ b = (f \ominus b) \oplus b$$

Opening is simply the erosion of f by b, followed by dilation of the result by b. Likewise, the grayscale closing of f by b, denoted $f \bullet b$, is

$$f \bullet b = (f \oplus b) \ominus b$$

The opening and closing of grayscale images are duals with respect to complementation and SE reflection:

$$(f \bullet b)^c = f^c \circ \hat{b} \qquad \text{and} \qquad (f \circ b)^c = f^c \bullet \hat{b}$$


The concept is illustrated in one dimension in Figure 7.35. Suppose the curve in Fig. 7.35(a) is the intensity profile along a single row of an image. Figure 7.35(b) shows a flat structuring element in several positions, pushed up against the bottom of the curve. The heavy curve in Fig. 7.35(c) is the complete opening. Because the structuring element is too large to fit completely inside the upward peaks of the curve, the opening clips the tops of the peaks, with the amount clipped proportional to how far the structuring element can reach into the peak. In general, openings are used to remove small, bright details while leaving the overall intensity levels and larger bright features relatively undisturbed.

FIGURE 7.35 Grayscale opening and closing in one dimension. (a) Original 1-D signal. (b) Flat structuring element pushed up underneath the signal. (c) Opening. (d) Flat structuring element pushed down along the top of the signal. (e) Closing.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Closing is depicted graphically in Figure 7.35(d). Note how the structuring element is pushed down on top of the curve as it is translated to all positions. The closing, shown in Fig. 7.35(e), is formed by finding the lowest points reached by any part of the structuring element as it slides against the upper side of the curve. The grayscale opening satisfies the following properties:

a. $f \circ b \sqsubseteq f$
b. if $f_1 \sqsubseteq f_2$ then $f_1 \circ b \sqsubseteq f_2 \circ b$
c. $(f \circ b) \circ b = f \circ b$

The notation $q \sqsubseteq r$ means that the domain of q is a subset of the domain of r, and also that $q(x, y) \le r(x, y)$ for any (x, y) in the domain of q.


Likewise, the closing operation satisfies the following properties:

a. $f \sqsubseteq f \bullet b$
b. if $f_1 \sqsubseteq f_2$ then $f_1 \bullet b \sqsubseteq f_2 \bullet b$
c. $(f \bullet b) \bullet b = f \bullet b$

Example 1: Grayscale opening and closing.

Figure 7.36 extends to two dimensions the one-dimensional concepts illustrated in Fig. 7.35. The image in Fig. 7.36(a) is identical to the one in Fig. 7.34(a), and Fig. 7.36(b) is the opening obtained with a flat disk structuring element of radius 3 pixels. As expected, the intensity of all bright features decreased, depending on the sizes of the features relative to the SE. In contrast to erosion, opening had a negligible effect on the image's dark features and on the background. Similarly, Fig. 7.36(c) shows the image closed with a disk of radius 5 (because the small round black dots are larger than the small white dots, a larger disk was required to achieve results comparable to the opening). The bright details and background were left largely unaffected, but the dark features were attenuated, with the degree of attenuation depending on the sizes of the features relative to the SE.

FIGURE 7.36 (a) A grayscale X-ray image. (b) Opening using a disk SE of radius 3. (c) Closing using an SE of radius 5.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

7.8 Some Basic Grayscale Morphological Algorithms

Numerous grayscale morphological algorithms have been developed on the basis of the grayscale morphological concepts introduced thus far. The following discussion illustrates several of these algorithms.

Morphological Smoothing

Because opening suppresses bright details smaller than the specified SE while leaving dark details relatively unaffected, and closing has the opposite effect,


these two operations are frequently combined as morphological filters for image smoothing and noise removal. Consider Fig. 7.37(a), which shows an X-ray image of the Cygnus Loop supernova. For the purposes of this example, suppose that the central light region is the object of interest and that the smaller components are noise; our goal is to remove the noise. Figure 7.37(b) shows the result of opening the original image with a flat disk of radius 1 and then closing the result with an SE of the same size. Figures 7.37(c) and (d) show the same operation with SEs of radii 3 and 5, respectively. As expected, this sequence shows progressive removal of small components as the SE size increases. In the final result the noise has been largely eliminated; the noise components on the lower right side of the image could not be removed completely because they are larger than the other image elements that were removed successfully.
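The open-close filter is two library calls. A minimal sketch assuming NumPy/SciPy; the disk helper and function name are ours:

```python
import numpy as np
from scipy import ndimage

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r

def morphological_smooth(f, radius):
    """Open-close filter: the opening removes small bright noise, and the
    subsequent closing removes small dark noise."""
    se = disk(radius)
    return ndimage.grey_closing(ndimage.grey_opening(f, footprint=se),
                                footprint=se)

# Progressive smoothing, as in Fig. 7.37(b)-(d):
# for r in (1, 3, 5):
#     smoothed = morphological_smooth(f, r)
```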

FIGURE 7.37 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by NASA's Hubble Telescope. (b)–(d) Results of performing opening and closing sequences on the original image with disk structuring elements of radii 1, 3, and 5, respectively.

(Reference from "Digital Image Processing" fourth edition by Rafael C. Gonzalez and Richard E. Woods)

Morphological Gradient

Dilation and erosion, in conjunction with image subtraction, can be used to obtain the morphological gradient, g, of a grayscale image f:

$$g = (f \oplus b) - (f \ominus b)$$

where b is a suitable structuring element. The net effect follows from the fact that dilation thickens regions in an image while erosion shrinks them; their difference therefore emphasizes the boundaries between regions. Because homogeneous regions are not affected (as long as the SE is not too large relative to the image's resolution), the subtraction operation tends to eliminate them. The result is an image


in which the edges are enhanced and the homogeneous areas are suppressed, producing a "derivative-like" (gradient) effect.
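A minimal sketch assuming NumPy/SciPy and a stand-in image; SciPy also offers the gradient as a single call:

```python
import numpy as np
from scipy import ndimage

f = np.random.rand(128, 128)                  # stand-in grayscale image
b = np.ones((3, 3), dtype=bool)               # flat 3x3 SE

# g = (f dilated by b) - (f eroded by b)
g = ndimage.grey_dilation(f, footprint=b) - ndimage.grey_erosion(f, footprint=b)

# Equivalent one-call form:
g2 = ndimage.morphological_gradient(f, footprint=b)
```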

An illustration is shown in Figure 7.38. Figure 7.38(a) shows a CT scan of the head, and the next two figures show the dilation and erosion with a flat 3 × 3 SE of 1's. Note the thickening and shrinking just mentioned. Figure 7.38(d) shows the resulting morphological gradient; as expected of a 2-D derivative image, the boundaries between regions are clearly delineated.

FIGURE 7.38 (a) Image of a head CT scan. (b) Dilation. (c) Erosion. (d) Morphological gradient, computed as the difference between (b) and (c).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Top-Hat and Bottom-Hat Transformations

Combining image subtraction with openings and closings yields the top-hat and bottom-hat transformations. The top-hat transformation of a grayscale image f is defined as f minus its opening:

$$T_{hat}(f) = f - (f \circ b)$$

Similarly, the bottom-hat transformation of f is defined as the closing of f minus f:

$$B_{hat}(f) = (f \bullet b) - f$$
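Both transformations are available directly in SciPy as the white and black top-hats. A minimal sketch, assuming NumPy/SciPy, a stand-in image, and a large disk SE as in the rice example that follows:

```python
import numpy as np
from scipy import ndimage

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r

f = np.random.rand(256, 256)                       # stand-in grayscale image
se = disk(40)                                      # large SE

tophat = ndimage.white_tophat(f, footprint=se)     # f - (f opened by se)
bottomhat = ndimage.black_tophat(f, footprint=se)  # (f closed by se) - f
```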

These transformations have a variety of applications. One of them is removing objects from an image by using, in the opening or closing, a structuring element that does not fit the objects to be removed; the subsequent difference operation yields an image in which only the removed components remain. The top-hat transform is used for bright objects on a dark background, and the bottom-hat transform


for the opposite situation. For this reason, these transformations are often called the white top-hat and the black bottom-hat, respectively.

Consider Fig. 7.39(a), which shows an image of grains of rice. The image was acquired under nonuniform lighting, as evidenced by the darker area in the bottom rightmost portion of the image. Figure 7.39(b) shows the result of thresholding with Otsu's method, an optimal thresholding method. The combination of nonuniform illumination and a global threshold produced segmentation errors in the dark area (where several grains of rice were not extracted from the background) and in the top left portion, where parts of the background were interpreted as rice. Figure 7.39(c) shows the opening of the image by a disk of radius 40. This SE was large enough not to fit within any of the objects; as a result, the objects were removed, leaving only a close approximation of the background, in which the shading pattern is clearly visible. Subtracting this background estimate from the original (i.e., applying the top-hat transformation) should make the background more uniform. As Fig. 7.39(d) shows, this is indeed the case. Although the background is not perfectly uniform, the differences between light and dark extremes are smaller, and this was sufficient to produce a correct thresholding result in which all of the rice grains were properly extracted using Otsu's method, as shown in Fig. 7.39(e).

FIGURE 7.39 Using the top-hat transformation for shading correction. (a) Original image. (b) Thresholded image. (c) Image opened using a disk SE of radius 40. (d) Top-hat transformation (the image minus its opening). (e) Thresholded top-hat image.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Granulometry

Granulometry is concerned with determining the size distribution of particles in an image. In practice, particles are rarely neatly separated, which makes counting based on identifying individual particles a difficult task. Morphology can be used to estimate the particle size distribution indirectly, without having to identify and measure every particle.

The technique is conceptually simple. For particles with regular shapes that are lighter than the background, the approach consists of applying openings with SEs of increasing size. The underlying idea is that an opening of a particular size has the greatest effect on the regions of the input image containing particles of similar size. For each resulting opening we compute the sum of the pixel values. Because openings reduce the intensity of bright features, this sum, called the surface area, decreases as the SE size increases. The procedure yields a 1-D array in which each element is the sum of the pixels in the opening for the SE size corresponding to that location in the array. To emphasize the differences between successive openings, we compute the difference between adjacent elements of the 1-D array. When the differences are plotted, the peaks indicate the dominant size distributions of the particles in the image.
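A minimal sketch of the procedure, assuming NumPy/SciPy; the function granulometry and the commented choice of radii are ours:

```python
import numpy as np
from scipy import ndimage

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r

def granulometry(f, radii):
    """Surface area (sum of pixel values) after opening with disks of
    increasing radius; peaks in the differences mark dominant sizes."""
    areas = [ndimage.grey_opening(f, footprint=disk(r)).sum() for r in radii]
    return -np.diff(areas)          # drop in surface area between radii

# radii = range(1, 36)
# drops = granulometry(smoothed_image, radii)   # plot drops versus radii[1:]
```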

Consider the image of wood dowel plugs in Fig. 7.40(a), which contains two dominant dowel sizes. Because the wood grain in the dowels can cause variations in the openings, smoothing the image is a sensible preprocessing step. Figure 7.40(b) shows the image smoothed with the morphological smoothing filter discussed previously, using a disk of radius 5. Figures 7.40(c) through (f) show openings of the smoothed image by disks of radii 10, 20, 25, and 30 pixels, respectively. The contribution of the small dowels to the image intensity has been nearly eliminated in Fig. 7.40(d). The contribution of the large dowels has been reduced significantly in Fig. 7.40(e), and further still in Fig. 7.40(f). The large dowel near the top right corner of the image is much darker than the others in Fig. 7.40(e) because it is smaller than the other large dowels; if we had been attempting to detect faulty dowels, this would have been useful information.


FIGURE 7.40 (a) Image of wood dowels. (b) Smoothed image. (c)–(f) Openings of (b) with disks of radii equal to 10, 20, 25, and 30 pixels, respectively.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

A plot of the difference array is shown in Figure 7.41. As noted earlier, significant differences (peaks in the plot) occur at radii for which the SE is comparable in size to a collection of particles of approximately the same diameter. The result in Fig. 7.41 has two distinct peaks, indicating that there are two dominant object sizes in the image.

FIGURE 7.41 Differences in surface area as a function of SE disk radius, r. The two peaks indicate that there are two dominant particle sizes in the image.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Textural Segmentation

Figure 7.42(a) shows a noisy image of dark blobs on a light background. The image has two textural regions: a region on the right composed of large blobs, and a region on the left composed of smaller blobs. The objective is to find a boundary between the two regions based on their textural content, which in this case is determined by the sizes and spatial distribution of the blobs. The process of dividing an image into regions is referred to as segmentation.

FIGURE 7.42 Textural segmentation. (a) An image consisting of two types of blobs. (b) Image with small blobs removed by closing (a). (c) Image with light patches between large blobs removed by opening (b). (d) Original image with the boundary between the two regions in (c) superimposed. The boundary was obtained using a morphological gradient.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)

Because the objects of interest are darker than the background, we know that if we close the image with a structuring element larger than the small, distracting blobs, the small blobs will be removed. Figure 7.42(b), which shows the result of closing the input image with a disk of radius 30 pixels, demonstrates that this is indeed the case (the smallest blobs are approximately 25 pixels in radius). At this stage we have an image of large, dark blobs on a light background. If we then open this image with a structuring element that is large relative to the separation between the blobs, the light patches between the blobs are removed, leaving the dark blobs and the now-dark patches between them. The result shown in Figure 7.42(c) was obtained by opening with a disk of radius 60.
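A sketch of the complete pipeline (closing, opening, then a morphological gradient for the boundary), using the radii quoted above and assuming NumPy/SciPy:

```python
import numpy as np
from scipy import ndimage

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r

def texture_boundary(f):
    """Close to remove small blobs, open to merge large blobs into a
    solid region, then take a morphological gradient for the boundary."""
    g = ndimage.grey_closing(f, footprint=disk(30))   # remove small blobs
    g = ndimage.grey_opening(g, footprint=disk(60))   # fuse large blobs
    return ndimage.morphological_gradient(g, size=(3, 3))
```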


Grayscale Morphological Reconstruction

Grayscale morphological reconstruction is based on the same principles as in the binary case. Let f and g denote the marker and mask images, respectively. We assume that both are grayscale images of the same size and that $f \le g$ at every point. The geodesic dilation of size 1 of f with respect to g is defined as

$$D_g^{(1)}(f) = (f \oplus b) \wedge g$$

where $\wedge$ denotes the point-wise minimum operator and b is a suitable structuring element. The geodesic dilation of size 1 is obtained by first computing the dilation of f by b, then selecting the minimum between the result and g at each location (x, y).

The geodesic dilation of size n of f with respect to g is defined as

$$D_g^{(n)}(f) = D_g^{(1)}\left[D_g^{(n-1)}(f)\right]$$

with $D_g^{(0)}(f) = f$.

Similarly, the geodesic erosion of size 1 of f with respect to g is defined as

$$E_g^{(1)}(f) = (f \ominus b) \vee g$$

where $\vee$ denotes the point-wise maximum operator. The geodesic erosion of size n is defined as

$$E_g^{(n)}(f) = E_g^{(1)}\left[E_g^{(n-1)}(f)\right]$$

with $E_g^{(0)}(f) = f$.

The morphological reconstruction by dilation of a grayscale mask image g from a grayscale marker image f, denoted $R_g^D(f)$, is defined as the geodesic dilation of f with respect to g, iterated until stability is achieved:

$$R_g^D(f) = D_g^{(k)}(f)$$

with k such that $D_g^{(k)}(f) = D_g^{(k+1)}(f)$.

The morphological reconstruction by erosion of g from f, denoted $R_g^E(f)$, is similarly defined as

$$R_g^E(f) = E_g^{(k)}(f)$$

with k such that $E_g^{(k)}(f) = E_g^{(k+1)}(f)$.


As in the binary case, opening by reconstruction of a grayscale image first erodes the input image and uses the result as the marker, with the image itself as the mask. The opening by reconstruction of size n of an image f is defined as the reconstruction by dilation of the erosion of size n of f, with respect to f; that is,

$$O_R^{(n)}(f) = R_f^D(f \ominus nb)$$

where $f \ominus nb$ denotes n successive erosions by b, starting with f. Similarly, the closing by reconstruction of size n of an image f is defined as the reconstruction by erosion of the dilation of size n of f, with respect to f; that is,

$$C_R^{(n)}(f) = R_f^E(f \oplus nb)$$

where $f \oplus nb$ denotes n successive dilations by b, starting with f. Because of duality, the closing by reconstruction of an image can be obtained by complementing the image, computing the opening by reconstruction, and complementing the result. Finally, an important technique known as top-hat by reconstruction consists of subtracting from an image its opening by reconstruction.
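A minimal sketch of grayscale reconstruction by dilation, assuming NumPy/SciPy; scikit-image users can instead call skimage.morphology.reconstruction, which implements the same operation:

```python
import numpy as np
from scipy import ndimage

def grey_reconstruct_dilation(f, g, b=None):
    """Grayscale reconstruction by dilation: iterate
    h <- min(h dilated by b, g) until stability. Requires f <= g."""
    if b is None:
        b = np.ones((3, 3), dtype=bool)
    h = f.astype(float)
    g = g.astype(float)
    while True:
        h_next = np.minimum(ndimage.grey_dilation(h, footprint=b), g)
        if np.array_equal(h_next, h):
            return h
        h = h_next

# Opening by reconstruction of size 1 of an image f:
# marker = ndimage.grey_erosion(f, footprint=np.ones((3, 3)))
# opened = grey_reconstruct_dilation(marker, f)
```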

7.9 Summary

We have covered morphological image processing techniques such as erosion and dilation, opening and closing, the hit-or-miss transform, and a variety of morphological algorithms that can be applied in image processing applications. Reconstruction is a morphological operation defined on pairs of images. We also surveyed morphological operations on binary images and the basic grayscale morphological algorithms.

7.10 Unit End questions

1. What is morphological image processing?

2. Write a short note on Erosion and Dilation.

3. Write a short note on opening and closing.

4. Explain the Hit-or-Miss Transform.

5. Write a short note on Boundary Extraction.

6. Write a short note on Thinning and Thickening.

7. Write a short note on Convex hull and Hole filling.


8. Write a short note on Pruning and Skeletonization.

9. Explain the Geodesic Dilation and Erosion.

10. Explain the Morphological Reconstruction by Dilation and by Erosion.

11. Write a short note on Opening by Reconstruction.

12. State the algorithm for filling holes.

13. Explain the Grayscale Morphology.

14. Write a short note on Grayscale Erosion and Dilation.

15. Explain the Grayscale Opening and Closing.

16. Explain the Morphological Smoothing.

17. Explain the Morphological Gradient.

18. Write a short note on Top -Hat and Bottom -Hat Transformations.

19. What is Granulometry?

20. What is a Textural Segmentation?

7.11 Reference for further reading

1. Digital Image Processing, Gonzalez and Woods, Pearson/Prentice Hall, Fourth Edition, 2018.

2. Fundamentals of Digital Image Processing, A. K. Jain, PHI.

3. The Image Processing Handbook, J. C. Russ, CRC, Fifth Edition, 2010.

4. All images courtesy of Digital Image Processing by Gonzalez and Woods, 4th Edition.


Unit 4

8 IMAGE SEGMENTATION, EDGE DETECTION,

THRESHOLDING, AND REGION DETECTION

Unit Structure

8.0 Objective

8.1 Fundamentals

8.2 Thresholding

8.3 Segmentation by Region Growing and by Region Splitting and Merging

8.4 Region Segmentation Using Clustering and Superpixels

8.5 Region Segmentation Using Graph Cuts

8.6 Segmentation Using Morphological Watersheds

8.7 Use of Motion in Segmentation

8.8 Summary

8.9 Unit End questions

8.10 Reference for further reading

8.0 Objective

The chapter's segmentation algorithms are based on one of two fundamental properties of image intensity values: discontinuity or similarity. In the first category, an image is partitioned into regions according to abrupt changes in intensity, such as edges. The second category of approaches is based on partitioning an image into regions that are similar according to a set of predefined criteria; methods in this category include thresholding, region growing, and region splitting and merging. We show that segmentation performance can be improved by combining methods from distinct categories, such as edge detection and thresholding. Additionally, we discuss image segmentation using clustering and superpixels, and introduce graph cuts, a technique well suited for extracting an image's principal regions. Following that, image segmentation based on morphology is presented, an approach that combines several of the attributes of the other segmentation techniques.

of the characteristics of segmentation based on techniques. munotes.in

## Page 213

213Chapter 8: Image Segmentation, Edge Detection, Thresholding, and Region Detection

After reading this chapter, readers should be able to:

- Understand the characteristics of the different types of edges encountered in practice.
- Understand how spatial filtering can be used for edge detection.
- Be familiar with other types of edge detection techniques that extend beyond spatial filtering.
- Understand image thresholding using several different approaches.
- Understand how to combine thresholding and spatial filtering to improve segmentation.
- Be familiar with region-based segmentation techniques, including clustering and superpixels.
- Understand the segmentation techniques based on graph cuts and morphological watersheds.
- Understand the fundamental techniques for incorporating motion into image segmentation.

8.1 Fundamentals

Let R denote the entire spatial region occupied by an image. Segmentation is a process that partitions R into n subregions, R1, R2, …, Rn, such that:

a. $\bigcup_{i=1}^{n} R_i = R$;
b. $R_i$ is a connected set, for $i = 1, 2, \ldots, n$;
c. $R_i \cap R_j = \varnothing$ for all $i$ and $j$, $i \neq j$;
d. $Q(R_i) = \text{TRUE}$ for $i = 1, 2, \ldots, n$;
e. $Q(R_i \cup R_j) = \text{FALSE}$ for any adjacent regions $R_i$ and $R_j$.

Here, $Q(R_k)$ is a logical predicate defined over the points in set $R_k$, and $\varnothing$ is the null set. Condition (a) states that the segmentation must be complete: every pixel must belong to a region. Condition (b) requires that points within a region be connected in some predefined sense. Condition (c) states that the regions must be disjoint. Condition (d) specifies the properties that all pixels within a segmented region must satisfy; for example, $Q(R_i) = \text{TRUE}$ if all pixels in $R_i$ have the same grey level. Finally, condition (e) states that two adjacent regions $R_i$ and $R_j$ must be different in the sense of predicate Q.


As indicated, the fundamental problem in segmentation is partitioning an image into regions that satisfy the preceding conditions. Segmentation algorithms for monochrome images are typically based on one of two fundamental properties of intensity values: discontinuity or similarity. In the first category, we assume that boundaries between regions are sufficiently different from one another, and from the background, to allow boundary detection based on local discontinuities in intensity. Edge-based segmentation is the principal approach in this category. The second category, region-based segmentation, is based on partitioning an image into regions that are similar according to a set of predefined criteria.

The preceding concepts are illustrated in Figure 8.1. Figure 8.1(a) shows a region of constant intensity superimposed on a darker background, also of constant intensity; these two regions comprise the overall image. Figure 8.1(b) shows the result of computing the boundary of the inner region based on intensity discontinuities. Points on the inside and outside of the boundary are black (zero) because there are no intensity discontinuities in those regions. To segment the image, we assign one level (say, white) to all pixels on or interior to the boundary, and another level (say, black) to all points exterior to the boundary. Figure 8.1(c) shows the result of such a procedure. This result satisfies conditions (a) through (c) stated at the beginning of this section. The predicate of condition (d) is: if a pixel is on or inside the boundary, label it white; otherwise, label it black. As Fig. 8.1(c) shows, this predicate is TRUE for the points labelled black or white. Likewise, the two segmented regions (object and background) satisfy condition (e).

FIGURE 8.1 (a) Image of a constant-intensity region. (b) Boundary based on intensity discontinuities. (c) Result of segmentation. (d) Image of a texture region. (e) Result of intensity discontinuity computations (note the large number of small edges). (f) Result of segmentation based on region properties.

(Reference from "Digital Image Processing" fourth edition by Rafael C. Gonzalez and Richard E. Woods)


The next three images demonstrate region-based segmentation. Figure 8.1(d) is similar to Figure 8.1(a), except that the intensities of the inner region form a textured pattern. Figure 8.1(e) shows the result of computing intensity discontinuities in this image. The numerous spurious intensity changes make it difficult to identify a unique boundary for the original image, because many of the nonzero intensity changes are not connected to the boundary, so edge-based segmentation is not a suitable technique here. However, because the outer region is constant, all we need to solve this segmentation problem is a predicate that distinguishes between textured and constant regions. The standard deviation of pixel values is such a measure, because it is nonzero in areas of the texture region and zero elsewhere. Figure 8.1(f) shows the result of dividing the original image into subregions; each subregion was labelled white if the standard deviation of its pixels was positive (i.e., the predicate was TRUE), and black otherwise. Because groups of squares were assigned the same label, the result has a "blocky" appearance around the region's perimeter (smaller squares would have yielded a smoother region boundary). Finally, note that these results also satisfy the five segmentation conditions stated at the beginning of this section.

Complete vs. partial segmentation

In complete segmentation:

- The segmented, disjoint regions correspond uniquely to objects in the input image.
- Cooperation with higher-level processing that makes use of problem-specific domain knowledge is required.

In partial segmentation:

- The segmented regions do not always correspond directly to image objects.

Discontinuity-based segmentation

In a discontinuity-based approach, an image is partitioned or subdivided according to abrupt changes in its intensity level. We are primarily concerned with identifying isolated points, lines, and edges in an image. To do so, we employ a 3 × 3 mask operation.

Figure 8.2. An image


Figure 8.3. A 3 × 3 mask

The response of the 3 × 3 mask at point (x, y) is

$$g(x, y) = \sum_{s=-1}^{1} \sum_{t=-1}^{1} w(s, t)\, f(x + s,\; y + t)$$

where w(s, t) are the mask coefficients and f is the image.

(Reference from "Digital Image Processing" fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Discontinuity-based segmentation can be classified into three approaches: (1) point detection, (2) line detection, and (3) edge detection.

Point, Line, and Edge Detection

This section discusses segmentation techniques based on detecting sharp, localized changes in intensity. We are interested in three types of image features: isolated points, lines, and edges. Edge pixels are pixels at which the intensity of the image changes abruptly, and edges (or edge segments) are sets of connected edge pixels.

Edge detectors are local image processing tools used to detect edge pixels. A line can be viewed as a (typically) thin edge segment in which the intensity of the background on either side of the line is significantly higher or lower than the intensity of the line pixels. Indeed, as we will explain shortly, lines give rise to what are called "roof edges." Finally, an isolated point can be viewed as a foreground (background) pixel surrounded by background (foreground) pixels.

Point Detection

An isolated point is the simplest type of discontinuity in a digital image. The most common technique for detecting discontinuities is to apply an n × n mask at every point in the image. The mask is constructed as shown in Figure 8.4.

Figure 8.4. A mask for point detection

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


The point is located at the (x, y) coordinates of the mask's centre in the image. An isolated point is detected at (x, y) if

$$|R| \ge T$$

where R denotes the mask's response at that point and T is a non-negative threshold. This formulation measures the weighted difference between the centre point and its neighbours, since the grey level of an isolated point will differ significantly from that of its neighbours. The result of applying the point detection mask is illustrated in Figure 8.5.

Figure 8.5. (a) Grayscale image with a nearly invisible isolated black point (b) Image showing the detected point

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)
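A minimal sketch of point detection, assuming NumPy/SciPy; the toy image and the threshold choice (90% of the maximum response) are ours:

```python
import numpy as np
from scipy import ndimage

# Point detection: convolve with a Laplacian-type mask and threshold |R|.
mask = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=float)

f = np.zeros((64, 64)); f[30, 30] = 255.0     # toy image with one isolated point
R = ndimage.convolve(f, mask, mode='constant', cval=0.0)
T = 0.9 * np.abs(R).max()                     # threshold relative to max response
points = np.abs(R) >= T                       # True only at the isolated point
```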

Line Detection

Line detection is the next level of complexity in the detection of image discontinuities. A response can be computed for every point in the image that indicates the direction of the line with which the point is most closely associated. Suppose the four line-detector masks are applied individually to an image, and let $R_i$ denote the response of mask i at a given point. If

$$|R_i| > |R_j| \quad \text{for all } j \neq i$$

that point is more likely to be associated with a line in the direction of mask i.

Figure 8.6. Line Detector masks in (a) Horizontal direction (b) 45° direction (c)

Vertical direction (d) - 45° direction

Reference from "Di gital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)


The direction associated with a given pixel is determined by the largest of the responses computed with these masks. Figure 8.7 illustrates the results of the line detection masks.

Figure 8.7. (a) Original image (b) Result with the horizontal detector (c) with the 45° detector (d) with the vertical detector (e) with the −45° detector

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Line detector masks can be used to detect lines in a specified direction. For instance, suppose we are interested in locating all lines that are one pixel thick and oriented at −45°. For this purpose, we use a digitized (binary) portion of a wire-bond mask for an electronic circuit. The results are shown in Figure 8.8.

Figure 8.8. (a) Image of a wire-bond mask (b) Result of processing with the −45° detector (c) Zoomed view of the top left region of the −45° detector result (d) Zoomed view of the bottom right region of the −45° detector result (e) Absolute value of the −45° detector result (f) All points whose values satisfied the condition g >= T, where g is the image in (e)

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Edge detection

Because isolated points and lines of unit pixel thickness are infrequent in most practical applications, edge detection is the most frequently used technique for segmentation based on grey-level discontinuities. An edge is a boundary between two regions of differing intensity, and edge detection is extremely useful for detecting discontinuities in an image: an edge occurs where the image transitions from dark to light, or vice versa. Figure 8.9 illustrates an intensity profile together with its first-order and second-order derivatives.


Figure 8.9. (a) Intensity profile (b) First -order derivatives

(c) Second -order derivatives

(Reference from "Digital Image Processing" fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

First-order derivatives

First-order derivatives respond whenever there is a discontinuity in intensity level: the response is positive on the leading edge and negative on the trailing edge. For an image f(x, y), the gradient operator is

$$\nabla f = \begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}$$

The strength of the edge at location (x, y) is given by the magnitude of the gradient,

$$\nabla f = \sqrt{G_x^2 + G_y^2} \approx |G_x| + |G_y|$$

and its direction by

$$\alpha(x, y) = \tan^{-1}\!\left(\frac{G_y}{G_x}\right)$$


Figure 8.10. (a) Original image (b) $|G_x|$, the gradient component along the x-direction (c) $|G_y|$, the gradient component along the y-direction (d) Gradient image, $|G_x| + |G_y|$

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

It is possible to calculate the gradient of an image in several ways.

Prewitt Edge operator

Figure 8.11. Masks used for the Prewitt edge operator

One mask computes the gradient in the horizontal direction and the other computes the gradient in the vertical direction. Using these two masks, the $G_x$ and $G_y$ components can be computed at every location in an image, from which the strength and direction of the edge at each location (x, y) are obtained.

Sobel Edge operator

Figure 8.12 . Masks used for Sobel Edge operator

Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)


The Sobel operator gives greater weight to the centre coefficients of its masks, which produces an averaging (smoothing) effect. Because image noise has a negative impact on edge detection, the Sobel operator is often preferable to the Prewitt edge operator: its smoothing effect reduces spurious edges caused by noise in the image (a minimal sketch in code follows).
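Here is a minimal sketch of gradient computation with OpenCV. The input file name is hypothetical; Prewitt kernels could be applied the same way with cv2.filter2D.

```python
import cv2
import numpy as np

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# Sobel gradients along x and y using 3x3 kernels
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

magnitude = np.sqrt(gx**2 + gy**2)     # edge strength sqrt(Gx^2 + Gy^2)
approx    = np.abs(gx) + np.abs(gy)    # cheaper approximation |Gx| + |Gy|
direction = np.arctan2(gy, gx)         # edge direction alpha(x, y)
```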

Second-order derivatives

The second-order derivative is positive on the darker side of an edge and negative on the lighter side. It is very sensitive to noise present in an image, which is why it is not used by itself for edge detection. However, it is very useful for extracting secondary information, i.e., we can find out whether a point lies on the darker side or the lighter side of an edge.

Zero-crossing: the zero crossing of the second derivative identifies the exact location of an edge where there is a gradual transition of intensity from a dark to a bright region or vice versa.

The following operators are based on these higher-order derivative relationships:

Laplacian operator

The Laplacian mask is given by:

Figure 8.13 . Masks used for Laplacian operator

Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)

∇²f = ∂²f/∂x² + ∂²f/∂y²

If we consider the diagonal elements:

Figure 8.14 . Masks used for Laplacian operator using 8 -connectivity

Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)


The Laplacian is highly sensitive to noise, and it produces a double-edge effect. However, it is quite helpful for extracting secondary information.

To reduce the effect of noise, we first apply a Gaussian operator to smooth the image, and then we apply the Laplacian operator. The combined operator is known as the Laplacian of Gaussian (LoG).

LoG operator

The LoG mask is given by

Figure 8.15 . Masks used for LoG operator

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The Gaussian operator is given by:

h(x, y) = exp(−(x² + y²) / (2σ²))

where x² + y² = r². The Laplacian of this Gaussian is

∇²h(r) = ((r² − σ²) / σ⁴) exp(−r² / (2σ²))

Canny operator

It is very important method to find edges by isolating noise from the image before

find edges of images, without affecting the features of the edges in the image and

then applying the tendency to find the edges in the image and the critical value for

threshold.

1. Convolve the image f(r, c) with a Gaussian function to get the smooth image g(r, c):
   g(r, c) = f(r, c) * G(r, c)

2. Apply first difference gradient operator to compute edge strength.

3. Apply non-maximal or critical suppression to the gradient magnitude.

4. Apply a threshold to the non-maximal suppression image.
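The four steps map directly onto a couple of OpenCV calls. A minimal sketch follows; the file name, the 5x5 kernel, sigma = 1.4, and the two hysteresis thresholds are illustrative choices, not values from the text.

```python
import cv2

img = cv2.imread('circuit.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# Step 1: Gaussian smoothing, g = f * G
smooth = cv2.GaussianBlur(img, (5, 5), 1.4)

# Steps 2-4: cv2.Canny computes the gradient, applies non-maximal
# suppression, and performs hysteresis thresholding internally.
edges = cv2.Canny(smooth, threshold1=50, threshold2=150)
cv2.imwrite('edges.png', edges)
```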


The results of all the operators are shown in Fig. 8.16.

Figure 8.16. (a) Original image (b) Result using the Prewitt operator (c) Result using the Roberts operator (d) Result using the Sobel operator (e) Result using the LoG operator (f) Result using the Canny operator

Reference from "Digital Image Processing fourth edition by Rafael C. Gonzalez and Richard E. Woods)

Marr Hildreth Edge Detector (Laplacian of Gaussian)

Neuroscience is one of the sources of inspiration for the Marr-Hildreth edge detector. The filter Marr developed is based on the Laplacian, which is especially susceptible to noise. Edge detection is commonly done using the Laplacian of an image, which is especially useful for revealing regions of rapid intensity change. Before applying the Laplacian filter, we should therefore use a Gaussian filter to smooth the image. Because convolution is associative, we can first convolve the Gaussian smoothing filter with the Laplacian filter to form a single hybrid (LoG) filter, and then convolve this hybrid filter with the image to achieve the same result. Figure 8.17 shows the shape of this operator.


Figure 8.17 a) shape of LoG with inverted hat

b) 9X9 LoG mask when sigma = 1.4

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The shape of the LoG is somewhat similar to an inverted hat. Figure 8.17(b) shows the 9x9 LoG mask when sigma = 1.4. The LoG of an image can be obtained directly by convolving this mask with the image. After obtaining the LoG of an image, what do we do?

Source: https://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm

Figure 8.18 a) intensity variation of an image with a step edge

b) LoG of the image

Figure 8.18(a) illustrates the intensity variation of an image with a step edge along the x-axis, and the plot in Fig. 8.18(b) illustrates the LoG of the image. Where there is an edge in the image, there is a zero crossing in the LoG.


This simple demonstration shows that we can determine the location of an edge wherever a zero crossing occurs (a minimal sketch in code follows).
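A minimal sketch of the Marr-Hildreth procedure, assuming SciPy is available: filter with a LoG (sigma = 1.4, as in Fig. 8.17) and mark sign changes between adjacent pixels. The slope threshold `thresh` is an illustrative way to suppress weak zero crossings.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def marr_hildreth(image, sigma=1.4, thresh=0.0):
    """LoG filtering followed by zero-crossing detection."""
    log = gaussian_laplace(image.astype(float), sigma=sigma)
    edges = np.zeros(log.shape, dtype=bool)
    # A zero crossing occurs where the LoG changes sign between
    # horizontally or vertically adjacent pixels.
    h = (log[:, :-1] * log[:, 1:] < 0) & (np.abs(log[:, :-1] - log[:, 1:]) > thresh)
    v = (log[:-1, :] * log[1:, :] < 0) & (np.abs(log[:-1, :] - log[1:, :]) > thresh)
    edges[:, :-1] |= h
    edges[:-1, :] |= v
    return edges
```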

Figure 8.19

a) After applying LoG filter b) After applying threshold over zero crossing

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The Canny Edge Detector

The Canny edge detector is a multi-step edge detection algorithm. Detection of edges is achieved by following the steps listed below.

1. Removal of random noise in the input image via a Gaussian filter.

2. The derivative of the Gaussian filter is used to calculate the gradient, giving the gradient magnitude of the image pixels along the x and y axes.

3. Considering a group of neighbours for any curve in the direction perpendicular to a given edge, suppress the non-max edge-contributing pixel points.

4. Finally, use the hysteresis thresholding method to keep the pixels whose gradient magnitude is above the high threshold, together with weaker pixels connected to them, while neglecting the ones below the lower threshold value.

Before getting too deep into the techniques below, consider the following three criteria, first arrived at by J. Canny, who created the algorithm:

• Good Detection: To avoid false positives and false negatives, the optimal detector must be reliable.

• Good Localization: The detected edges must be close to the true edges.

• Single Response Constraint: For each edge point, only one point must be returned by the detector.


Steps to follow during Canny Algorithm:

Noise Reduction

An edge detector is a type of high-pass filter that amplifies high-frequency components while suppressing low-frequency ones. Because both edges and noise are high-frequency components, edge detectors have a tendency to amplify noise. To avoid this, we use a low-pass filter to smooth the image. Canny accomplishes this through the use of a Gaussian filter.

Figure 8.20

[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]

While a larger filter reduces noise, it degrades edge localization, and vice versa. In general, a 5×5 filter is a good choice, but this may vary according to the image.

Finding Intensity Gradient of the Image

When noise is present, a pixel may not match its neighbouring pixels, and inappropriate edge detection may occur. To avoid this, the Gaussian filter is convolved with the image, removing the noise while preserving the desired edges in the output image.

Convolving the Gaussian filter or kernel g(x, y) with an image I is demonstrated in the following example. Here, we want to ensure that any given pixel in the output is similar to its neighbours, and so we use an averaging matrix such as [1 1 1] to maintain this similarity and remove noise.

S = I * g(x, y) = g(x, y) * I

g(x, y) = (1 / (√(2π) σ)) e^(−(x² + y²) / (2σ²))

g(x, y) = Gaussian distribution
I = input image


Derivative:

Calculate the derivative of the filter with respect to the x and y dimensions and convolve it with I to obtain the magnitude of the gradient along each dimension. Additionally, the gradient direction can be calculated using the tangent of the angle formed by the two components.

∇S = ∇(g * I) = (∇g) * I

∇S = [gx, gy]ᵀ * I = [gx * I, gy * I]ᵀ

∇g = [∂g/∂x, ∂g/∂y]ᵀ = [gx, gy]ᵀ

The convolution described above generates a gradient vector with magnitude and

direction.

(Sx, Sy) = gradient vector

magnitude = √(Sx² + Sy²)

direction θ = tan⁻¹(Sy / Sx)

The following is an illustration of Gaussian Derivatives, which contribute to the

edges in output images.

Figure 8.21 Gaussian Derivatives

[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]

Clearly, the edges remain quite blurred or thick. Keep in mind that an edge detector

should produce a single precise response corresponding to the edge. As a result, we


must thin the edges or, in other words, locate the strongest edge response. This is accomplished through the use of Non-max Suppression.

Non-Max Suppression

This is a technique for thinning the edges. It checks whether a pixel is a local maximum in its neighbourhood in the direction of the gradient. The pixel is retained as an edge pixel if it is a local maximum; otherwise, it is suppressed.

Neighboring pixels are located in the horizontal, vertical, and diagonal directions

(0°, 45°, 90°, and 135°) for each pixel. As a result, we must round off the gradient

direction at each pixel to one of these values, as illustrated below.

Figure 8.22 Neighboring pixels are located in the horizontal, vertical, and

diagonal directions (0°, 45°, 90°, and 135°) for each pixel.

[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]

After rounding, we compare each pixel's value to the values of the two neighbouring pixels along the gradient direction. If the pixel is a local maximum, it is retained as an edge pixel; otherwise, it is suppressed. Thus, only the most substantial responses remain.

Consider the following example. Assume the gradient direction is 17 degrees for pixel 'A'. We round to 0 degrees because 17 is closer to 0. Then, using the rounded gradient direction, we select the neighbouring pixels (see B and C in the figure below). If A has a greater intensity value than both B and C, it is retained as an edge pixel; otherwise, it is suppressed.


Figure 8.22 Gradient direction

[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]

Figure 8.23 a) Gradient Magnitude b) Non -max suppression

[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]

Clearly, the edges are thinned, but some are brighter than others. The brighter ones

can be considered strong edges, while the lighter ones may be true edges or may be

noise.

Hysteresis Thresholding:

Non-max suppression produces an image with a far more accurate depiction of its

real edges. However, as can be seen, certain edges are brighter than others. The

brighter ones can be considered strong edges, whereas the lighter ones may be true

edges or may be noise. Canny employs hysteresis thresholding to resolve the

question of "which edges are truly edges and which are not." We define two

thresholds in this section: 'High' and 'Low.'

• Any edges with intensity greater than 'High' are sure edges.

• Any edges with intensity less than 'Low' are sure non-edges.

• Edges between the 'High' and 'Low' thresholds are classified as edges only if they are connected to a sure edge; otherwise they are discarded.

Let’s take an example to understand.


Figure 8.24 'High' and 'Low' thresholds

[Image reference from https://theailearner.com/2019/05/22/canny -edge -detector/ ]

Here, A and B are sure edges, as they are above the 'High' threshold. Similarly, D is a sure non-edge. Both 'E' and 'C' are weak edges, but since 'C' is connected to 'B', which is a sure edge, 'C' is also considered a strong edge. By the same logic, 'E' is discarded. This way we get only the strong edges in the image (a minimal sketch in code follows the figure).

Figure 8.25 a) Non -max suppression b) Final Output
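A minimal sketch of hysteresis thresholding on a gradient-magnitude image, assuming SciPy: weak pixels survive only if their 8-connected component also contains a strong pixel, exactly the A/B/C/D/E logic above. The two threshold values are left to the application.

```python
import numpy as np
from scipy.ndimage import label

def hysteresis(magnitude, low, high):
    """Keep sure edges (> high) plus weak pixels (> low) connected to them."""
    strong = magnitude > high
    weak_or_strong = magnitude > low
    # 8-connected components of all candidate pixels
    labels, n = label(weak_or_strong, structure=np.ones((3, 3)))
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True  # components holding a strong pixel
    keep[0] = False                         # label 0 is the background
    return keep[labels]
```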

Local Processing

In this type of analysis, two primary properties are used to determine the similarity

of edge pixels:

1. The strength of the response of the gradient operator used to generate the

edge pixel.

2. The gradient's direction

Within a small neighbourhood (for example, 3×3 or 5×5), all points with shared properties are connected: a point (x′, y′) in the neighbourhood of (x, y) is linked to the pixel at (x, y) if both the magnitude and direction criteria below are met.

|∇f(x′, y′) − ∇f(x, y)| ≤ Tm

|α(x′, y′) − α(x, y)| ≤ Td

where Tm is a magnitude threshold and Td is an angle threshold.


Figure 8.26 (a) Original image; (b) detection result without local processing; (c) detection result with local processing (Tm = 0.15 × max(|∇f|) and Td = π/9)

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Global Processing Using the Hough Transform

The Hough transform can be used to link pixels and identify curves. In the polar coordinate system, the straight line denoted by y = mx + c can be expressed as

ρ = x cos θ + y sin θ ……… (i)

where ρ denotes the length of a vector extending from the origin to the point on the straight line y = mx + c that is closest to the origin. This vector is perpendicular to the line, as illustrated below.

Figure 8.27 Vector perpendicular to the line from the origin to the point closest to

the line

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Any line in the xy-plane corresponds to a point in the two-dimensional space defined by the parameters ρ and θ. The Hough transform of a straight line in the xy-plane is a particular point in the ρθ-space, and each such point must satisfy equation (i) with the constants x1, y1. Thus, the locus of all lines through a point in the xy-plane corresponds to a particular sinusoidal curve in ρθ-space.


Assume we have edge points (xi, yi) that lie on the straight line with parameters (ρ, θ). Each edge point corresponds to a sinusoidal curve in ρθ-space, and because the line is shared by all of them, these curves must intersect at (ρ, θ).

Consider the following equation for a line: y1= ax1+b

By varying the values of a and b in this equation, an infinite number of lines can

pass through this point (x1,y1).

Figure 8.28 line: y1= ax1+b

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

If we rewrite the equation as follows:

b = −a x1 + y1

If we consider the ab-plane instead of the xy-plane, a straight line is obtained: this entire line in the ab-plane is due to the single point (x1, y1) in the xy-plane and the different values of a and b. Another point (x2, y2) in the xy-plane can be considered here. Its

slope intercept equation is,

y2 = a x2 + b ……… (ii)

Writing it in terms of the ab-plane gives

b = −a x2 + y2 ……… (iii)

In the ab-plane this is a different line. The two lines will intersect in the ab-plane only if the two points lie on a common straight line in the xy-plane. Let (a′, b′) be the coordinates of the point of intersection in the ab-plane; the corresponding slope-intercept equation is y = a′x + b′, and this line passes through the points (x1, y1) and (x2, y2) in the xy-plane.


Figure 8.29 Line passing through the points (x1, y1) and (x2, y2) in the xy-plane.

Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)
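In practice the (ρ, θ) accumulation is rarely written by hand; a minimal sketch with OpenCV's cv2.HoughLines follows. The file name, Canny thresholds, and the accumulator threshold of 150 votes are illustrative assumptions.

```python
import cv2
import numpy as np

img = cv2.imread('building.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file
edges = cv2.Canny(img, 50, 150)

# Vote in (rho, theta) space: 1-pixel rho bins, 1-degree theta bins.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=150)

if lines is not None:
    for rho, theta in lines[:, 0]:
        # Convert each (rho, theta) back to two points for drawing.
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * rho, b * rho
        p1 = (int(x0 - 1000 * b), int(y0 + 1000 * a))
        p2 = (int(x0 + 1000 * b), int(y0 - 1000 * a))
        cv2.line(img, p1, p2, 255, 1)
```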

8.2 Thresholding

Thresholding is the simplest method of image segmentation. It is used to create binary images from a grayscale image.

The Basics of Intensity Thresholding

Assume that the intensity histogram in Fig. 8.30 (a) refers to an image, f( x, y), that

is comprised of light objects on a dark background, and that the intensity values of

the object and background pixels are sorted into two dominating modes. One

simple method of distinguishing the objects from the background is to set a

threshold, T, which differentiates the two modes. Then any point (x, y) in the image at which f(x, y) > T is referred to as an object point; in any other case, the point is referred to as a background point. In other words, the segmented image, denoted by g(x, y), can be represented as

g(x, y) = 1 if f(x, y) > T
          0 if f(x, y) ≤ T

where f(x, y) denotes the intensity at coordinates (x, y).

Global thresholding occurs when T is a constant that is applied to a whole image.

We use the term variable thresholding when the value of T varies over an image.

As a result, the value of T at every given location (x, y) in an image depends on the

features of the neighbourhood of (x, y) (for example, the average intensity of the

pixels in the neighborhood). The term dynamic thresholding or adaptive

thresholding is typically used when T relies on the spatial coordinates (x, y). These

terms are not used universally.

Figure 8.30(b) illustrates a more complex and difficult thresholding problem involving a histogram with three dominant modes, for example, two types of light objects on a dark background. Multiple thresholding categorizes a point (x, y) as belonging to the background if f(x, y) < T1, to one object class if T1 ≤ f(x, y) ≤ T2, and to the other object class if f(x, y) > T2. That is to say, the segmented image is specified by

g(x, y) = a if f(x, y) < T1
          b if T1 ≤ f(x, y) ≤ T2
          c if f(x, y) > T2

FIGURE 8.30 Intensity histograms that can be partitioned (a) by a single threshold, and (b) by dual thresholds

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods) Basic Global Thresholding Whenever the intensity distributions of objects and background pixels are substantially distinct it is possible to utilize a single (global) threshold that applies to the whole image In many other applications, there is typically sufficient variation across images so, even though global thresholding is a reasonable method an algorithm capable of predicting the threshold value for each image is needed. For this aim, the iterative approach described below can be used: 1. An initial threshold (T) is selected; this could be accomplished at random or

by any other method that is preferred. 2. In the same manner as explained before, the image is segmented into object

and background pixels, yielding two sets of pixels:

a. G1 = {f(m,n):f(m,n)>T} (object pixels)

b. G2 = {f(m,n):f(m,n) T} (background pixels) (note, f(m,n) is the value

of the pixel located in the mth column, nth row) 3. The average of each set is computed.

a. m1 = average value of G1

b. m2 = average value of G2 4. A new threshold is created that is the average of m1 and m2

a. T‘ = (m1 + m2)/2


5. Return to step two, this time using the new threshold computed in step four,

and repeat the process until the new threshold matches the one before it (i.e.,

until convergence has been achieved).
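A minimal sketch of this iterative algorithm in NumPy, assuming a grayscale image whose histogram actually contains both classes (otherwise one of the means below would be undefined):

```python
import numpy as np

def iterative_threshold(f, eps=0.5):
    """Basic global thresholding: iterate T = (m1 + m2) / 2 to convergence."""
    T = f.mean()                       # start with the average intensity
    while True:
        m1 = f[f > T].mean()           # mean of object pixels G1
        m2 = f[f <= T].mean()          # mean of background pixels G2
        T_new = 0.5 * (m1 + m2)        # step 4
        if abs(T_new - T) < eps:       # step 5: convergence test
            return T_new
        T = T_new
```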

Segmentation using the preceding iterative algorithm is illustrated in Figure 8.31.

The original image is shown in Fig. 8.31(a), and the image histogram, which demonstrates a distinct valley, is shown in Fig. 8.31(b). Starting with T equal to the image's average intensity, the basic global algorithm produced a threshold after three iterations; this threshold was used to segment the original image, with the result shown in Fig. 8.31(c). As expected, given the histogram's distinct separation of modes, the segmentation of object and background was flawless.

FIGURE 8.31 (a) Noisy fingerprint. (b) Histogram. (c) Segmented result using a global threshold (thin image border added for clarity). (Original image courtesy of the National Institute of Standards and Technology.)

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)

Optimum Global Thresholding Using Otsu’s Method

Let us begin by comprehending Otsu's approach. The method processes the image histogram, segmenting the objects based on minimization of the class variance. Generally, this technique produces the desired results when dealing with bimodal images: the histogram of such an image contains two distinct peaks that correspond to distinct ranges of intensity values.

The fundamental idea is to divide the image histogram into two clusters using a threshold defined by the weighted within-class variance σ_w²(t). The entire computation can be summarized as follows:

σ_w²(t) = w₁(t) σ₁²(t) + w₂(t) σ₂²(t)

where w₁(t) and w₂(t) are the probabilities of the two classes separated by a threshold t that lies between 0 and 255 inclusive.

As demonstrated in Otsu's paper, there are two equivalent ways to determine the threshold. The first is to minimize the within-class variance σ_w²(t) defined above; the second is to maximize the between-class variance using the expression

σ_b²(t) = w₁(t) w₂(t) [μ₁(t) − μ₂(t)]²

where μᵢ denotes the mean of class i.

The probability P is computed for every pixel value in the two distinct clusters C₁ and C₂ using the cluster probability functions:

w₁(t) = Σ_{i=1}^{t} P(i)

w₂(t) = Σ_{i=t+1}^{I} P(i)

It's important to note that the image can be viewed as an intensity function f(x, y) taking gray-level values. The number of pixels with a specified gray level i is denoted by nᵢ, and the total pixel count of the image is n. Thus, the probability of encountering gray level i is:

P(i) = nᵢ / n

The C 1 pixel intensity values are in the range [1, t], while the C 2 pixel intensity

values are in the range [t + 1, I], where I is the maximum pixel value (255).

The following phase is to obtain the means for C₁ and C₂, denoted by μ₁(t) and μ₂(t) respectively:

μ₁(t) = Σ_{i=1}^{t} i P(i) / w₁(t)

μ₂(t) = Σ_{i=t+1}^{I} i P(i) / w₂(t)

Now recall the equation above for the weighted within-class variance. We determine its remaining components (σ₁², σ₂²) by combining all of the above-mentioned ingredients:

σ₁²(t) = Σ_{i=1}^{t} [i − μ₁(t)]² P(i) / w₁(t)

σ₂²(t) = Σ_{i=t+1}^{I} [i − μ₂(t)]² P(i) / w₂(t)

It's important to note that if the threshold is selected wrongly, the variance of some class will be quite large. The total variance is simply the sum of the within-class and between-class variances: σ_T² = σ_w²(t) + σ_b²(t), where σ_b²(t) = w₁(t) w₂(t) [μ₁(t) − μ₂(t)]². The image's total variance σ_T² is independent of the threshold.

Thus, the pipeline of the general algorithm for the between-class variance maximization option can be represented as follows (a minimal sketch in code follows the list):

1. calculate the histogram and intensity level probabilities
2. initialize w₁(0) and μ₁(0)
3. iterate over possible thresholds t = 0, …, max_intensity:
   • update the values of wᵢ and μᵢ, where wᵢ is the probability and μᵢ the mean of class i
   • calculate the between-class variance value σ_b²(t)
4. the final threshold is the t with the maximum σ_b²(t) value
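A minimal sketch of the maximization in NumPy, for an 8-bit image; it evaluates σ_b²(t) from scratch for every t rather than using the incremental update, which is simpler though slower:

```python
import numpy as np

def otsu_threshold(f):
    """Return the t that maximizes the between-class variance of an 8-bit image."""
    hist = np.bincount(f.ravel(), minlength=256).astype(float)
    P = hist / hist.sum()                       # P(i), gray-level probabilities
    i = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w1, w2 = P[:t].sum(), P[t:].sum()       # class probabilities
        if w1 == 0 or w2 == 0:
            continue
        mu1 = (i[:t] * P[:t]).sum() / w1        # class means
        mu2 = (i[t:] * P[t:]).sum() / w2
        var_b = w1 * w2 * (mu1 - mu2) ** 2      # between-class variance
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t
```

OpenCV provides the same result in one call via cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU).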

Otsu’s Binarization Application

The authors propose an enhanced Otsu's method as one approach for estimating the location of underwater landmarks. The advantages of this approach are that it allows precise real-time segmentation of underwater features and has been demonstrated to perform better than threshold segmentation methods. Consider its concept in greater detail by examining the side-scan sonar (SSS) shipwreck image provided in the article. Modern SSS systems are capable of imaging a large area of the sea floor in two dimensions and producing realistic images. Thus, their background contains regions of sludge and aquatic animals in the form of spots that are typically about 30 pixels in diameter.

FIGURE 8.32 : SSS sea bottom image

[ Image reference https://learnopencv.com/otsu -thresholding -with-opencv/ ]

They distort proper image processing because their grey level is similar to that of certain zones of foreground objects. As shown below, the segmented image produced by the classical Otsu technique contains these artefacts:


FIGURE 8.33: Result of classical Otsu's thresholding on the SSS sea bottom image

[ Image reference https://learnopencv.com/otsu -thresholding -with-opencv/ ]

To address this spot challenge, a technique based on Otsu's binarization was developed to restrict the search range of the suitable segmentation threshold for foreground object division. The improved Otsu's method pipeline is as follows:

1. Calculate Otsu's threshold T
2. Calculate N30 using Canny edge detection
3. If N30 > 300, calculate the final T value.

As a result, the wrecked ship stands out clearly from the background:

FIGURE 8.34 : Improved Otsu’s result

[ Image reference https://learnopencv.com/otsu -thresholding -with-opencv/ ]

Using Image Smoothing to Improve Global Thresholding

Figure 8.35(a) shows a noisy image, Fig. 8.35(b) its histogram, and Fig. 8.35(c) the image after it has been thresholded using Otsu's method. Every black point in the white region and every white point in the black region is a thresholding error, so the segmentation was a complete failure. Figure 8.35(d) shows the result of smoothing the noisy image with an averaging kernel, and Fig. 8.35(e) the corresponding histogram. Smoothing improves the shape of the histogram, and we would expect thresholding of the smoothed image to be nearly perfect. As illustrated in Fig. 8.35(f), this is the case.


The blurring of the boundary between object and background resulted in slight distortion of the object-background boundary in the segmented, smoothed image. Indeed, the more aggressively we smooth an image, the more boundary errors we should anticipate in the segmented result.

FIGURE 8.35 (a) Noisy image and (b) its histogram. (c) Result obtained using

Otsu’s method. (d) Noisy image smoothed using an averaging kernel and (e) its

histogram. (f) Result of thresholding using Otsu’s method

(Reference from "Digital Image Processing fourth ed ition

by Rafael C. Gonzalez and Richard E. Woods)

Following that, we examine the effect of significantly shrinking the foreground

region in comparison to the background. The result is depicted in Figure 8.36 (a).

In this image, the noise is additive Gaussian with a mean of zero and a standard deviation of ten intensity levels (as opposed to 50 in the previous example). As illustrated in Fig. 8.36(b), the histogram lacks a clear valley, implying that segmentation will fail, as confirmed by the result in Fig. 8.36(c). The image smoothed with an averaging kernel is shown in Fig. 8.36(d), and the

corresponding histogram is shown in Figure 8.36 (e). As expected, the net effect

was to narrow the histogram's spread, but the distribution remained unimodal. As

illustrated in Fig. 8.36 (f), segmentation failed once more. The reason for the failure

is that the region is so small that its contribution to the histogram is negligible in

comparison to the noise-induced intensity spread. In these instances, the approach of the following section is more likely to succeed.


FIGURE 8.36 (a) Noisy image and (b) its histogram. (c) Result obtained using Otsu’s method. (d) Noisy image smoothed using an averaging kernel and (e) its histogram. (f) Result of thresholding using Otsu’s method. Thresholding failed in both cases to extract the object of interest

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Using Edges to Improve Global Thresholding

A necessary condition for segmenting an image using a global threshold is that the foreground and background modes of its histogram be separable: there must be sufficient space between the histogram peaks so that a threshold placed in the gap distinguishes the foreground from the background. Histograms suitable for global threshold segmentation frequently exhibit the following properties:

• tall
• narrow
• symmetric
• separated by deep valleys

However, even when the contrast between foreground and background appears high to the human eye, segmenting an image with a global threshold is not always easy: when the foreground or background content is excessive, it swamps the contribution of the other to the grayscale histogram, and the threshold must then be determined adaptively. The central idea for solving this problem is to compute the histogram using only the pixels near places where the foreground and the background are adjacent; such places can often be found through the magnitude of the gradient of the image or through the Laplacian operator.

The following algorithm summarizes the preceding discussion, where f(x, y) denotes the input image (a minimal sketch in code follows the steps):

1. Using any of the methodologies above, compute an edge image as the magnitude of the gradient or the absolute value of the Laplacian of f(x, y).
2. Specify a threshold value, T.
3. Using T from Step 2, threshold the image from Step 1 to create a binary image, g_T(x, y). This image is used as a mask in the subsequent step to select pixels from f(x, y) that correspond to the mask's "strong" edge pixels.
4. Compute a histogram using only the pixels in f(x, y) that correspond to the locations of the 1-valued pixels in g_T(x, y).
5. Using the histogram created in Step 4, globally segment f(x, y) using, for example, Otsu's method.

If T is set to a value less than the minimum value of the edge image, g_T will contain only 1's, implying that the image histogram will be computed using all pixels in f(x, y). In this particular instance, the previous algorithm performs global thresholding on the original image's histogram. It is common practice to specify T as a percentile, which is typically set high (e.g., in the high 90's) to ensure that only a small subset of the gradient/Laplacian image pixels is used in the computation.
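A minimal sketch of the five steps, assuming SciPy and reusing the otsu_threshold function sketched earlier; the Sobel gradient stands in for "any of the methodologies" of Step 1, and the 99.7 percentile follows Example 1 below.

```python
import numpy as np
from scipy.ndimage import sobel

def edge_guided_threshold(f, percentile=99.7):
    """Global Otsu threshold computed only from pixels near strong edges."""
    g = f.astype(float)
    grad = np.hypot(sobel(g, axis=0), sobel(g, axis=1))  # Step 1: edge image
    T_edge = np.percentile(grad, percentile)             # Step 2: percentile T
    mask = grad >= T_edge                                # Step 3: binary mask
    samples = f[mask].astype(np.uint8)                   # Step 4: masked pixels
    T = otsu_threshold(samples)                          # Step 5: Otsu on them
    return f > T
```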

The following examples demonstrate the previously discussed concepts. The

gradient is used in the first example, while the Laplacian is used in the second.

Using either approach, similar results can be obtained in both examples.

EXAMPLE 1 : Using edge information based on the gradient to improve global

thresholding.

The images and histograms in Figures 8.37(a) and (b) are the same as those in Figure 8.36. As you saw, smoothing followed by thresholding was unable to segment this image. The purpose of this example is to

demonstrate how to solve a problem using edge information. The mask image in

Figure 8.37(c) was created using a gradient magnitude image thresholded at the

99.7 percentile. The image in Figure 8.37 (d) is the result of multiplying the mask

by the input image. The histogram in Figure 8.37(e) represents the nonzero

elements in Figure 8.37(d). Take note that this histogram possesses the critical

characteristics discussed previously; namely, it contains reasonably symmetrical

modes separated by a deep valley. Thus, whereas the histogram of the original noisy

image did not indicate the possibility of successfully thresholding, the histogram in

Fig. 8.37(e) indicates that it is possible to successfully threshold the small object

from the background. As illustrated in Fig. 8.37(f), this is the case. This image was created by computing the Otsu threshold from the histogram in Fig. 8.37(e) and then applying that threshold globally to the noisy image in Fig. 8.37(a). The outcome is nearly flawless, the key being the ability to generate an appropriate derivative image.

FIGURE 8.37 (a) Noisy image from Fig. 8.36 (a) and (b) its histogram. (c) Mask

image formed as the gradient magnitude image thresholded at the 99.7 percentile.

(d) Image formed as the product of (a) and (c). (e) Histogram of the nonzero

pixels in the image in (d). (f) Result of segmenting image (a) with the Otsu

threshold based on the histogram in (e). The threshold was 134, which is

approximately midway between the peaks in this histogram

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

EXAMPLE 2: Using edge information based on the Laplacian to improve

global thresholding.

We consider a much more complicated thresholding problem in this example. Figure 8.38(a) illustrates an 8-bit image of yeast cells on which we wish to perform global thresholding in order to obtain the regions corresponding to the bright spots. As a starting point, Fig. 8.38(b) depicts the image histogram, and Fig. 8.38(c) the result obtained by applying Otsu's method directly to the image.

As can be seen, Otsu's method fell short of the original objective of detecting bright

spots. While the method was successful in isolating several cell regions, several of

the segmented regions on the right were actually joined. The Otsu method

determined a threshold of 42 and a separability measure of 0.636.

munotes.in

## Page 243

243Chapter 8: Image Segmentation, Edge Detection, Thresholding, and Region Detection

FIGURE 8.38 (a) Image of yeast cells. (b) Histogram of (a). (c) Segmentation of (a) with Otsu’s method using the histogram in (b). (d) Mask image formed by thresholding the absolute Laplacian image. (e) Histogram of the nonzero pixels in the product of (a) and (d). (f) Original image thresholded using Otsu’s method based on the histogram in (e).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The mask image g_T(x, y) is shown in Figure 8.38(d). It was created by computing the absolute value of the Laplacian image and then thresholding it with T set to 115 on an intensity scale ranging from 0 to 255. T corresponds approximately to the 99.5 percentile of the values in the absolute Laplacian image, so thresholding at this level results in a limited set of pixels, as illustrated in Fig. 8.38(d). As anticipated from the previous section, the points cluster close to the edges of the bright spots in this image. The histogram in Figure 8.38(e) represents the nonzero pixels in the product of (a) and (d). Finally, Fig. 8.38(f) shows the outcome of thresholding the original image globally using Otsu's method based on the histogram in Fig. 8.38(e). This

result is consistent with the locations of the image's bright spots. The Otsu method

calculated a threshold of 115 and a separability measure of 0.762, which are both

greater than the values obtained using the original histogram. We can even improve the segmentation of complete cell regions by varying the percentile at which the threshold is set. For instance, Fig. 8.39 illustrates the result obtained using the same procedure as described previously, but with the threshold set to 55, or approximately 5% of the maximum value of the absolute Laplacian image. This

value is in the 53.9 percentile of the values contained in that image. This result is


clearly superior to that shown in Fig. 8.38 (c) when Otsu's method is used in

conjunction with the histogram of the original image.

FIGURE 8.39 Image in Fig. 8.38(a) segmented using the same procedure as explained in Figs. 8.38(d) through (f), but using a lower value to threshold the absolute Laplacian image.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Multiple Thresholds

We have thus far concentrated on image segmentation via a single global threshold. Otsu's method is extensible to an arbitrary number of thresholds, because the separability measure on which it is based also extends to an arbitrary number of classes. In the case of K classes, the between-class variance generalizes to the expression

σ_B² = Σ_{k=1}^{K} P_k (m_k − m_G)²

where

P_k = Σ_{i∈C_k} P(i)

and

m_k = (1/P_k) Σ_{i∈C_k} i P(i)

As previously stated, m_G is the global mean. The K classes are separated by K−1 thresholds whose values k₁*, k₂*, …, k_{K−1}* maximize

σ_B²(k₁*, k₂*, …, k_{K−1}*) = max_{0 < k₁ < k₂ < … < k_{K−1} < L−1} σ_B²(k₁, k₂, …, k_{K−1})

Even though this result holds for any number of classes, it loses meaning as the number of classes increases, because we are dealing with a single variable (intensity). Indeed, the between-class variance is usually expressed in terms of multiple variables arranged as vectors. In practice, multiple global thresholding is considered a suitable approach whenever there is reason to believe that the problem can be solved appropriately with two thresholds. Applications requiring more than two thresholds are usually solved not with intensity values alone: additional descriptors (for example, colour) are used, and the application is recast as a pattern recognition problem.

The between -class variance for three classes consisting of three intensity intervals

(separated by two thresholds) is given by:

σ_B² = P₁(m₁ − m_G)² + P₂(m₂ − m_G)² + P₃(m₃ − m_G)²

where

P₁ = Σ_{i=0}^{k₁} P(i),  P₂ = Σ_{i=k₁+1}^{k₂} P(i),  P₃ = Σ_{i=k₂+1}^{L−1} P(i)

and

m₁ = (1/P₁) Σ_{i=0}^{k₁} i P(i),  m₂ = (1/P₂) Σ_{i=k₁+1}^{k₂} i P(i),  m₃ = (1/P₃) Σ_{i=k₂+1}^{L−1} i P(i)

The following relationships also hold:

P₁m₁ + P₂m₂ + P₃m₃ = m_G

and

P₁ + P₂ + P₃ = 1

The three classes are separated by two thresholds whose values k₁* and k₂* maximize

σ_B²(k₁*, k₂*) = max_{0 < k₁ < k₂ < L−1} σ_B²(k₁, k₂)

Algorithm

(1) Let k₁ = 1.
(2) Increment k₂ through all its values greater than k₁ and less than L−1.
(3) Increment k₁ to its next value and increment k₂ through all its values greater than k₁ and less than L−1.
(4) Repeat until k₁ = L−3.

This results in a 2-D array σ_B²(k₁, k₂), from which the pair (k₁*, k₂*) corresponding to the maximum value in the array is selected.

Segmentation is then as follows:

g(x, y) = a if f(x, y) ≤ k₁*
          b if k₁* < f(x, y) ≤ k₂*
          c if f(x, y) > k₂*

Separability measure: η(k₁*, k₂*) = σ_B²(k₁*, k₂*) / σ_G²
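A minimal brute-force sketch of the two-threshold search for an 8-bit image (L = 256). It follows the algorithm above literally, so it recomputes each class from the histogram inside a double loop; an incremental formulation would be faster.

```python
import numpy as np

def dual_otsu(f):
    """Exhaustive search for (k1*, k2*) maximizing the 3-class variance."""
    hist = np.bincount(f.ravel(), minlength=256).astype(float)
    P = hist / hist.sum()
    i = np.arange(256)
    mG = (i * P).sum()                                # global mean
    best_k1, best_k2, best_var = 0, 0, -1.0
    for k1 in range(1, 254):
        for k2 in range(k1 + 1, 255):
            P1 = P[:k1 + 1].sum()
            P2 = P[k1 + 1:k2 + 1].sum()
            P3 = P[k2 + 1:].sum()
            if P1 == 0 or P2 == 0 or P3 == 0:
                continue
            m1 = (i[:k1 + 1] * P[:k1 + 1]).sum() / P1
            m2 = (i[k1 + 1:k2 + 1] * P[k1 + 1:k2 + 1]).sum() / P2
            m3 = (i[k2 + 1:] * P[k2 + 1:]).sum() / P3
            var_b = P1*(m1 - mG)**2 + P2*(m2 - mG)**2 + P3*(m3 - mG)**2
            if var_b > best_var:
                best_k1, best_k2, best_var = k1, k2, var_b
    return best_k1, best_k2
```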

EXAMPLE 3: Multiple global thresholding

Figure 8.40 (a) depicts an iceberg. The goal of this example is to segment the image

into three distinct regions: the dark background, the illuminated portion of the

iceberg, and the shadowed portion. The image histogram in Fig. 8.40 (b)

demonstrates that two thresholds are required to resolve this problem. The

procedure described previously resulted in the thresholds shown in Fig. 8.40 (b),

which are located near the centres of the two histogram valleys. The segmentation

shown in Figure 8.40 (c) is the result of using these two thresholds. The measure

of separability was 0.954. The primary reason this example worked so well is that

the histogram contains three distinct modes separated by fairly large, deep valleys.

However, we can do even better by utilizing superpixels.

FIGURE 8.40 (a) Image of an iceberg. (b) Histogram. (c) Image segmented into

three regions using dual Otsu thresholds.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Variable Thresholding

Noise and nonuniform illumination have a significant impact on the performance of a thresholding algorithm. We demonstrated how image smoothing and the use of edge information can be extremely beneficial. However, there are times when this type of preprocessing is either impractical or ineffective, to the point where none of the thresholding methods discussed thus far can resolve the problem. In such cases, as we will demonstrate in the discussion that follows, the next level of thresholding complexity involves variable thresholding.

Variable Thresholding Based on Local Image Properties

A fundamental approach to variable thresholding is to calculate a threshold at each point (x, y) in the image based on one or more specified properties of the neighborhood of (x, y). While this may appear to be a lengthy process, modern algorithms and hardware allow rapid neighborhood processing, particularly for basic procedures such as logical and arithmetic operations.

The method is illustrated here using the mean and standard deviation of the pixel values in the neighborhood of each point in the image. Since these two quantities are descriptors of average intensity and contrast, they are helpful in determining local thresholds. Let m_xy and σ_xy denote the mean and standard deviation of the set of pixel values in a neighborhood S_xy centred on coordinates (x, y). The following are common forms of variable thresholds determined by the local image properties:

T_xy = a σ_xy + b m_xy

where a and b are nonnegative constants. Note that T is a threshold array of the same size as the image from which it was obtained; the threshold value at a given location (x, y) in the array is used to segment the image value at that point. Another common form is

T_xy = a σ_xy + b m_G

where m_G denotes the global image mean. The segmented image is calculated as follows:

g(x, y) = 1 if f(x, y) > T_xy
          0 if f(x, y) ≤ T_xy

where f(x, y) denotes the input image. This equation is evaluated for every pixel location in the image, and a different threshold is computed at each location (x, y) using the pixels in the neighborhood S_xy.

Significant power can be added to variable thresholding (with only a modest increase in computation) by using predicates based on parameters computed in the neighborhood of a point (x, y):

g(x, y) = 1 if Q(local parameters) is TRUE
          0 if Q(local parameters) is FALSE


where Q is a predicate based on parameters computed from the neighborhood pixels. Consider the following predicate, derived from the local mean and standard deviation:

Q(σ_xy, m_xy) = TRUE if f(x, y) > a σ_xy AND f(x, y) > b m_xy
                FALSE otherwise
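A minimal sketch of this predicate using box-filter statistics from SciPy. The window size and the constants a and b are illustrative placeholders (as the example below notes, they are normally estimated experimentally).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_threshold(f, size=15, a=30.0, b=1.5, use_global_mean=True):
    """Segment with Q: f > a*sigma_xy AND f > b*m, evaluated per pixel."""
    g = f.astype(float)
    mean = uniform_filter(g, size)                     # m_xy over S_xy
    sq_mean = uniform_filter(g * g, size)
    sigma = np.sqrt(np.maximum(sq_mean - mean**2, 0))  # sigma_xy over S_xy
    m = g.mean() if use_global_mean else mean          # m_G or m_xy
    return (g > a * sigma) & (g > b * m)
```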

EXAMPLE 4 : Variable thresholding based on local image properties.

The yeast image is depicted in Figure 8.41(a). Given that this image contains three distinct intensity levels, it is logical to assume that dual thresholding would be an effective segmentation technique. The outcome of applying the dual thresholding method is shown in Figure 8.41(b). As illustrated in the figure, while it was possible to distinguish the bright areas from the background, the mid-gray portions on the right side of the image were not correctly separated. To demonstrate the application of local thresholding, we calculated the local standard deviation σ_xy for all (x, y) in the input image using a small neighborhood; the result is depicted in Figure 8.41(c). Note how the faint outer lines accurately delineate the cell boundaries. Following that, we constructed a predicate of the form given above, but using the global mean m_G rather than m_xy. When the background is fairly constant and all the object intensities are above or below the background intensity, selecting the global mean usually gives better results. The values of a and b used to specify the predicate were, as is typical in applications such as this, estimated experimentally. After that, the image was segmented. As illustrated in Fig. 8.41(d), segmentation was quite successful. Notably, all outer regions have been segmented correctly, and the majority of the inner, brighter regions have been isolated correctly.

FIGURE 8.41 (a) Image from Fig. 8.40. (b) Image segmented using the dual thresholding approach. (c) Image of local standard deviations. (d) Result obtained using local thresholding.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


Variable Thresholding Based on Moving Averages

The moving average technique is a variation on variable threshold processing, and it contrasts with global threshold processing. The term "global threshold processing" refers to determining a single fixed threshold from the entire image: if a pixel's value is greater than this threshold, the pixel is considered foreground; otherwise, it is considered background. A variable threshold means that each pixel or pixel block in the image has its own threshold; if a pixel exceeds its associated threshold, it is considered foreground. The moving average method scans the entire image in a linear zigzag pattern, creates a threshold at each point from a running average, and compares the grey value at that point to the threshold to segment the image.

Method

Suppose a 5×5 picture is given, where a_ij represents the gray value at position (i, j). Because the image is scanned linearly in a zigzag (z-shaped) order, the two-dimensional matrix must first be converted to a one-dimensional sequence.

The moving average algorithm makes use of two parameters, n and b: n is the number of pixels over which the running average is taken, and b is a threshold coefficient. Filtering the one-dimensional sequence with an n-point averaging filter gives the moving average m_ij at each point. The threshold at a pixel is obtained by multiplying the parameter b by m_ij; each pixel's gray value is then compared with its threshold to obtain the final segmented image (a minimal sketch in code follows).
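A minimal sketch of moving-average thresholding, assuming n = 20 points in the average and b = 0.5; both parameters are illustrative. Every other row is reversed to emulate the zigzag scan.

```python
import numpy as np

def moving_average_threshold(f, n=20, b=0.5):
    """Zigzag-scan the image, threshold each pixel against b * running mean."""
    g = f.astype(float).copy()
    g[1::2] = g[1::2, ::-1]                 # reverse odd rows: zigzag order
    z = g.ravel()
    # Causal running average of the last n samples along the scan
    # (the first few averages are taken over fewer samples).
    m = np.convolve(z, np.ones(n) / n, mode='full')[:z.size]
    seg = (z > b * m).reshape(f.shape).astype(np.uint8)
    seg[1::2] = seg[1::2, ::-1]             # undo the zigzag reordering
    return seg
```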


FIGURE 8.42 (a) Original image (b) Moving average processing (c) Finally,

perform a minimum filter on the result

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

8.3 Segmentation by Region Growing and by Region Splitting and

Merging

Region -Based Segmentation:

Segmentation is used to divide an image into regions. Previously, we approached this problem by finding boundaries between regions using discontinuities in grey levels, or by segmenting the image using thresholds based on the distribution of pixel properties such as grey-level values or colour.

Basic Formulation:

Assume that R represents the entire image region. Segmentation can be thought of as a process that divides R into n subregions, R1, R2, …, Rn, in such a way that:

a. ∪_{i=1}^{n} R_i = R;
b. R_i is a connected set, for i = 1, 2, …, n;
c. R_i ∩ R_j = ∅ for all i and j, i ≠ j;
d. Q(R_i) = TRUE for i = 1, 2, …, n;
e. Q(R_i ∪ R_j) = FALSE for any adjacent regions R_i and R_j.

Here Q(R_k) signifies a logical predicate defined over the points in the set R_k, and ∅ is the null set. Condition (a) specifies that the segmentation must be complete; each pixel must be contained within a region. Condition (b) requires that points within a region be connected in a predefined sense. Condition (c) indicates that the regions must be disjoint. Condition (d) specifies the properties that all pixels within a segmented region must satisfy; for example, Q(R_i) = TRUE if all pixels within R_i have the same grey level. Finally, condition (e) establishes that adjacent regions R_i and R_j are distinct in terms of the predicate Q.

Region Growing:

As the name implies, region growing is a process that groups pixels or subregions into larger regions in accordance with given criteria. The fundamental approach is to begin with a collection of "seed" points and grow regions from them by appending to each seed those neighboring pixels that share similar properties (such as specific ranges of grey level or colour). When no a priori information is available, the procedure is to calculate, for every pixel, the same set of properties that will eventually be used to assign pixels to regions during the growing process. If the outcome of these calculations exposes clusters of values, the pixels whose properties place them close to a cluster's centroid can be used as seeds.

The choice of similarity criteria is determined not only by the nature of the problem at hand but also by the type of image data available. For instance, analysis of land-use satellite imagery is highly dependent on the use of colour; without the inherent information contained in colour images, this problem would be significantly more difficult, if not impossible, to solve. When analyzing monochrome images, region analysis must be performed using a set of descriptors based on grey levels and spatial properties (such as moments or texture).

Essentially, region expansion should halt when no additional pixels fulfil the region's inclusion criteria. Gray level, texture, and colour are all local in nature and

do not take into account the region's "history." Additional criteria that enhance the

power of a region growing algorithm include the concept of size, similarity between

a candidate pixel and the pixels grown thus far (for example, a comparison of the

candidate's grey level to the average grey level of the grown region), and the shape

of the region being grown. The use of these types of descriptors is predicated on

the assumption that at least a partial model of expected results is available.

Let f(x, y) represent an input image; let S(x, y) represent a seed array containing 1's at seed point locations and 0's elsewhere; and let Q represent a predicate to be applied at every location (x, y). Assume that arrays f and S are of equal size. The following is a simple region-growing algorithm based on 8-connectivity (a minimal sketch in code follows the steps).

1. Identify all connected components in S(x, y) and erode each one to a single pixel; label all such pixels 1. All other pixels in S are labelled 0.
2. Create an image f_Q such that at each point (x, y), f_Q(x, y) = 1 if the input image satisfies the given predicate Q at that point, and f_Q(x, y) = 0 otherwise.
3. Let g be an image created by appending to each seed point in S all the 1-valued points in f_Q that are 8-connected to that seed.
4. Assign a unique region label (e.g., integers or letters) to each connected component in g. This is the image segmented by region growing.
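A minimal sketch of steps 2 through 4, assuming SciPy; the predicate is the absolute-difference test used in Example 5 below, and the threshold T is an illustrative value. Step 1, eroding each seed component to a single point, is assumed already done.

```python
import numpy as np
from scipy.ndimage import label

def region_grow(f, seeds, T=65):
    """8-connected region growing from single-pixel seeds.

    f     : grayscale image
    seeds : boolean array, True at the (already thinned) seed locations
    T     : similarity threshold for the predicate Q (illustrative value)
    """
    seed_val = int(f[seeds].max())                # e.g. 255 for bright seeds
    fQ = np.abs(f.astype(int) - seed_val) <= T    # step 2: predicate image
    # Step 3: 8-connected components of candidate pixels plus the seeds
    labels, _ = label(fQ | seeds, structure=np.ones((3, 3)))
    # Step 4: keep only the components that actually contain a seed
    grown = np.isin(labels, np.unique(labels[seeds]))
    return grown
```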

EXAMPLE 5: Segmentation by region growing

Figure 8.43(a) illustrates an 8-bit X-ray image of a weld (the horizontal dark region) with multiple cracks and porosities (the bright regions running horizontally through the image's centre). By segmenting the defective weld regions, we demonstrate the use of region growing. Such regions could be used for a number of purposes, such as weld inspection, archiving of historical studies, and control of an automated welding system.

FIGURE 8.43 (a) X-ray image of a defective weld. (b) Histogram. (c) Initial seed

image. (d) Final seed image (the points were enlarged for clarity). (e) Absolute

value of the difference between the seed value (255) and (a). (f) Histogram of (e).

(g) Difference image thresholded using dual thresholds. (h) Difference image

thresholded with the smallest of the dual thresholds. (i) Segmentation result

obtained by region growing

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods )


The first step is to establish the seed points. Because cracks and porosities attenuate X-rays significantly less than solid welds, we expect the regions containing such defects to be substantially brighter than the remainder of the X-ray image. We can obtain the seed points by thresholding the original image with a high percentile threshold. The histogram of the image is shown in Figure 8.43(b), and the thresholded outcome is shown in Figure 8.43(c), with a threshold equal to the image's 99.9 percentile of intensity values, which in this case was 254. The

result of morphologically eroding each connected component in Figure 8.43 (c) to

a single point is shown in Figure 8.43 (d).

Following that, we must clearly state a predicate. In this example, we're interested

in appending to each seed all pixels that are (a) eight -connected to it and (b)

"similar" to it. Our predicate is applied at each location (x, y) using absolute

intensity differences as a measure of similarity.

Q = TRUE if the absolute difference between the intensity of the seed and the intensity of the pixel at (x, y) is ≤ T
    FALSE otherwise

where T denotes a predefined threshold. Although this predicate is based on intensity differences and uses a single threshold, we could define more sophisticated procedures in which each pixel has a distinct threshold and properties other than differences are used. As the remainder of this example demonstrates, the preceding predicate is sufficient to solve the problem.

As stated previously, all seed values are 255 because the image was thresholded at

a value of 254. The difference between the seed value (255) and the value in Figure

8.43 (a) is illustrated in Figure 8.43 (e). The image in Figure 8.43 (e) contains all

of the distinctions necessary to compute the predicate at each location (x, y). The

corresponding histogram is shown in Figure 8.43(f). We require a threshold for establishing similarity in the predicate. Because the histogram contains three primary modes, we can begin by applying the dual thresholding technique to the difference image. The two thresholds that resulted closely correspond, as we can see, to the histogram's valleys. (As a brief aside, the image was

segmented using these two thresholds. The result in Fig. 8.43 (g) demonstrates that

despite the fact that the thresholds are in the deep valleys of the histogram,

segmenting the defects cannot be accomplished using dual thresholds.)

In Figure 8.43(h), the difference image is thresholded using only the smaller of the two thresholds. The black points represent pixels for which the predicate was TRUE; the white points represent pixels for which the predicate was FALSE. The critical finding here is that the points in the weld's good regions failed the predicate, and thus will be excluded from the final result, while the points in the outer region are considered as candidates. Step 3 will, however, reject the outer points, because they are not connected to the seeds in an 8-way fashion. Indeed, as illustrated in

Fig. 8.43 (i), this step resulted in the correct segmentation, indicating that connectivity was a necessary condition in this case. Finally, note that we used the

same value in Step 4 for all regions discovered by the algorithm. In this case, it was

visually preferable to do so because all of those regions in this application have the

same physical meaning —they all represent porosities.

Region Splitting and Merging

The method described earlier grows regions from seed points. An alternative is first to subdivide an image into a set of non-overlapping regions and then merge and/or split the regions in an attempt to satisfy the segmentation conditions. The fundamentals of region splitting and merging are discussed next.

Assume that R represents the entire image region and that a predicate Q is chosen. One method for segmenting R is to subdivide it successively into smaller and smaller quadrant regions: we begin with the entire region R; if Q(R) = FALSE, the region is split into quadrants; if Q is FALSE for any quadrant, that quadrant is divided into sub-quadrants; and so forth. This technique is conveniently represented by so-called quadtrees, that is, trees in which each node has precisely four descendants, as illustrated in Fig. 8.44 (the images corresponding to the nodes of a quadtree are sometimes called quadregions or quadimages). Note that the root of the tree represents the entire image, and that each node represents the subdivision of a node into four descendant nodes. In this instance, only one of the quadrants was subdivided further.

FIGURE 8.44 ( a) Partitioned image. (b) Corresponding quadtree.

R represents the entire image region.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

When splitting is used exclusively, the final partition typically includes adjacent

regions with identical properties. This disadvantage can be overcome by permitting merging as well as splitting. Satisfying the segmentation constraints then involves merging adjacent regions whose combined pixels satisfy the predicate Q; that is, two adjacent regions $R_j$ and $R_k$ are merged only if $Q(R_j \cup R_k) = \text{TRUE}$.

The preceding discussion can be summarised as follows: at any step, we

1. Split any region $R_i$ into four disjoint quadrants for which $Q(R_i) = \text{FALSE}$;

2. When no further splitting is possible, merge any adjacent regions $R_j$ and $R_k$ for which $Q(R_j \cup R_k) = \text{TRUE}$;

3. Stop when no further merging is possible.
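As an illustration, here is a minimal Python sketch of the split step combined with the simplified merge variant discussed next (regions that satisfy Q independently are simply accepted into one mask). The predicate shown is the one used in Example 6 below; all names are illustrative:

```python
import numpy as np

def split_regions(img, predicate, min_size):
    """Recursively split img into quadregions until predicate(region)
    is TRUE or the minimum region size is reached. Pixels of accepted
    regions are set to True (the simplified merge of Step 2)."""
    mask = np.zeros(img.shape, dtype=bool)

    def split(y, x, h, w):
        region = img[y:y + h, x:x + w]
        if predicate(region):
            mask[y:y + h, x:x + w] = True      # region satisfies Q
        elif h > min_size and w > min_size:    # otherwise split further
            h2, w2 = h // 2, w // 2
            split(y, x, h2, w2)
            split(y, x + w2, h2, w - w2)
            split(y + h2, x, h - h2, w2)
            split(y + h2, x + w2, h - h2, w - w2)

    split(0, 0, *img.shape)
    return mask

# Predicate of Example 6: std dev > a and mean strictly between 0 and b.
Q = lambda r: r.std() > 10 and 0 < r.mean() < 125
```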

Numerous variations are possible on this basic theme. For instance, a significant simplification occurs if in Step 2 we allow the merging of any two adjacent regions whenever each one satisfies the predicate independently. This results in a significantly simpler (and faster) algorithm, because predicate testing is restricted to individual

quadregions. As demonstrated in the following example, this simplification is still

capable of producing acceptable segmentation results.

EXAMPLE 6: Segmentation by region splitting and merging

Figure 8.45(a) shows an X-ray image of the Cygnus Loop supernova. The purpose of this example is to segment (extract) the "ring" of less dense matter that surrounds the dense inner region. The region of interest exhibits several readily apparent characteristics that should assist in segmentation. To begin, we note that the data in this region are random, implying that their standard deviation should be greater than that of the background (which is close to zero) and of the large central region, which is smooth. Likewise, the mean value (average intensity) of a region containing outer-ring data must be greater than the mean of the darker background and less than the mean of the lighter central region. Thus, using the following predicate, we should be able to segment the region of interest:

FIGURE 8.45 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by NASA's Hubble Telescope. (b) through (d) Results of limiting the smallest allowed quadregion to sizes of 32 × 32, 16 × 16, and 8 × 8 pixels, respectively. (Original image courtesy of NASA.)

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

$$Q = \begin{cases} \text{TRUE} & \text{if } \sigma_R > a \text{ AND } 0 < m_R < b \\ \text{FALSE} & \text{otherwise} \end{cases}$$

where $\sigma_R$ and $m_R$ denote the standard deviation and mean of the region being processed, and a and b are nonnegative constants.

Numerous regions within the outer area of interest were analyzed; the mean intensity of pixels in those regions was never greater than 125, and the standard deviation was always greater than 10. Figures 8.45(b) through (d) display the results obtained by varying the minimum size allowed for quadregions from 32 to 8. White pixels were assigned to quadregions that satisfied the predicate; black pixels were assigned to the rest. The best result in terms of capturing the shape of the outer region was obtained using quadregions of size 16 × 16. The small black squares in Fig. 8.45(d) are quadregions of size 8 × 8 whose pixels do not satisfy the predicate. Using smaller quadregions would increase the number of such black regions, while using larger regions than those shown here results in a more "block-like" segmentation. Note that the segmented region (white pixels)

in each case was a connected region that completely separated the inner, smoother

region from the background. Thus, segmentation partitioned the image effectively

into three distinct areas that correspond to the image's three principal features: background, dense region, and sparse region. Using any of the white regions in Fig.

8.45 as a mask would make extracting these regions from the original image a

relatively simple task.

8.4 Region Segmentation using Clustering and Superpixels

We describe two related approaches to region segmentation in this section. The first is a more traditional approach based on clustering data according to variables such as intensity and color. The second is a more recent approach based on the extraction of "superpixels" from an image via clustering.

Region Segmentation Using K -Means Clustering

Clustering is a technique for categorizing a set of data into a predetermined number

of groups. K-means clustering is a widely used technique. In k -means clustering, a

collection of data is partitioned into k groups.

It divides a given set of data into k distinct clusters. The k-means algorithm has two phases: in the first, it calculates the k centroids; in the second, it assigns each point to the cluster whose centroid is closest to that data point. There are several ways to define the distance to the

nearest centroid, with the Euclidean distance being one of the most frequently used.

Once the grouping is done, the algorithm recalculates the centroid of each cluster and, using those centroids, computes a new Euclidean distance between each centre and each data point, assigning each point to the cluster whose centre is at the smallest Euclidean distance. The partition's clusters are defined by their member objects and

centroid. The centroid of each cluster is the point to which the sum of the distances

between all of the cluster's objects is minimized . Thus, K -means is an iterative

algorithm for minimizing the sum of the distances between each object and its

cluster centroid across all clusters.

Consider an image of size x × y that is to be clustered into k clusters. Let p(x, y) denote an input pixel to be clustered, and let ck denote the cluster centres. The k-means clustering algorithm is as follows:

1. Initialize the number of clusters, k, and the cluster centres.

2. Using the relation given below, calculate the Euclidean distance d between each cluster centre and each pixel.

$$d = \lVert p(x, y) - c_k \rVert$$

3. Based on the distance d, assign all pixels to the nearest centre.

4. Once all pixels have been assigned, recalculate the center's position using the

relationship shown below.

$$c_k = \frac{1}{k} \sum_{x \in c_k} \sum_{y \in c_k} p(x, y)$$

5. Repeat the process until the tolerance or error value is satisfied.

6. Resize the cluster pixels to match the dimensions of the image.
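A compact NumPy sketch of these six steps for intensity clustering follows (the function name is illustrative; for simplicity it assumes no cluster becomes empty during the iterations):

```python
import numpy as np

def kmeans_segment(img, k, n_iter=20, seed=0):
    """Cluster pixel intensities into k groups; return a label image."""
    data = img.reshape(-1, 1).astype(float)
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]    # step 1
    for _ in range(n_iter):
        d = np.abs(data - centers.T)            # step 2: distances (N, k)
        labels = d.argmin(axis=1)               # step 3: nearest centre
        new = np.array([data[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new, centers):           # step 5: converged
            break
        centers = new                           # step 4: recompute centres
    return labels.reshape(img.shape)            # step 6: back to image shape
```

Running `kmeans_segment(img, k=3)` on a grayscale image produces a label image like the segmentation shown in Fig. 8.46(b).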

While k -means has the substantial advantage of being simple to implement, it does

have some disadvantages. The quality of the final clustering depends on the randomly chosen initial centroids, so the result will differ for different initial centres. Thus, the initial centres should be chosen carefully to achieve the desired segmentation. Additionally, computational complexity is a factor to consider when designing the k-means clustering algorithm; it depends on the number of data elements, clusters, and iterations.

EXAMPLE 7: Using k -means clustering for segmentation

Figure 8.46(a) depicts an image, and Figure 8.46(b) the segmentation produced by the k-means algorithm with k = 3. The algorithm extracted all of the image's meaningful regions with high accuracy; compare, for instance, the quality of the characters in both images. It is important to note that the entire segmentation was carried out by clustering a single variable (intensity). Because k-means operates on vector observations in general, its ability to discriminate between regions increases as the number of components of the observation vector z increases.

FIGURE 8.46 (a) Original image. (b) Image segmented using the k-means algorithm with k = 3.

(Reference from "Digital Image Processing fourth edition

by Rafael C. Gonzalez and Richard E. Woods)

Region Segmentation using Superpixels

A superpixel is a collection of pixels that share certain characteristics (like pixel

intensity). Superpixels are becoming increasingly useful in a variety of computer

vision and image processing algorithms, such as image segmentation, semantic

labelling, object detection and tracking, and so on, because they carry more

information than pixels.

Because pixels belonging to a given superpixel share similar visual properties,

superpixels have a perceptual meaning.

They represent images in a convenient and compact manner, which can be

extremely useful for computationally intensive problems.

The concept behind superpixels is to replace the conventional pixel grid with primitive regions that are more perceptually meaningful than individual pixels. The objective is to decrease the computational burden, and also to improve the segmentation performance of downstream algorithms, by removing superfluous detail. A simple illustration

will assist in explaining the fundamental concept of superpixel representations.

Figure 8.47(a) depicts a 480,000-pixel image with varying degrees of detail that could be described verbally as follows: "This is an image of two large carved figures in the foreground and at least three much smaller carved figures resting on a fence behind the large figures. The figures are on a beach, against a background of ocean and sky." Figure 8.47(b) depicts the identical image described by 4,000 superpixels and their boundaries, while Figure 8.47(c) depicts the superpixel image. One could argue that the level of detail in the superpixel image leads to the same description as the original, yet the superpixel representation contains only 4,000 primitive units, compared with the original's 480,000. Whether the superpixel representation is "adequate" is application-dependent. If the objective is to describe the image at the level of detail mentioned previously, the answer is yes; if, on the other hand, the objective is to detect defects at pixel-level resolution, the answer is obviously no. Consequently, there are applications, such as computerised medical diagnosis, where any form of approximate representation is unacceptable. Nonetheless, there are numerous application areas, such as image database queries, autonomous navigation, and certain branches of robotics, where the cost savings and potential performance improvements far outweigh any discernible loss of image detail.

FIGURE 8.47 (a) Image of 480,000 pixels. (b) Image composed of 4,000 superpixels (the boundaries between superpixels, in white, are superimposed on the superpixel image for reference; the boundaries are not part of the data). (c) Superpixel image. (Original image courtesy of the U.S. National Park Services.)

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Boundaries are one of the most important aspects of any superpixel representation; that is, superpixel images must preserve the boundaries between regions of interest. As Fig. 8.47(c) shows, this is the case here. Note, for instance, how distinguishable the figures are from the background. The same is true of the boundaries between ocean and sky, and between beach and ocean. Other desirable features include preservation of topological properties and, of course, computational efficiency; the superpixel algorithm discussed in this section meets these criteria. Another goal is to keep the difference between a superpixel image and its parent image small, to save storage and computation while preserving content. Figure 8.48(a) shows the original image, and Fig. 8.48(b) a superpixel image composed of 40,000 superpixels. The only visual difference between these images is a slight difference in contrast caused by averaging the intensities within each superpixel

region (we will discuss this in more detail later in this section). Figure 8.48(c) shows the overall difference in contrast, as well as minor differences around sharp edges. Remember that the superpixel image has far fewer elements than the original, and that contrast differences can easily be corrected using histogram processing.

FIGURE 8.48 (a) Original image. (b) Image compos ed of 40,000 superpixels. (c)

Difference between (a) and (b).

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Finally, we show the results of reducing the number of superpixels to 1,000, 500, and 250. Compared to Fig. 8.47(a), Fig. 8.49 shows a significant loss of detail, but the first two images still contain most of the detail relevant to the image description. Two of the three small carvings on the back fence were lost in the coarser representations. However, the principal region boundaries and the basic topology of the images were preserved.

FIGURE 8.49 Top row: Results of using 1,000, 500, and 250 superpixels in the

representation of Fig. 8.47 (a) . As before, the boundaries between superpixels are

superimposed on the images for reference. Bottom row: Superpixel images

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


SLIC Superpixel Algorithm

Simple Linear Iterative Clustering (SLIC) is a state-of-the-art, computationally efficient algorithm for generating superpixels. In a nutshell, the algorithm clusters pixels in a combined five-dimensional colour and image-plane space to generate compact, nearly uniform superpixels efficiently.

The approach is actually quite straightforward. SLIC performs a local clustering of pixels in a five-dimensional space defined by the L, a, and b values of the CIELAB colour space and the pixels' x, y coordinates. It uses a distance measure that yields compact, regular superpixel shapes, and it can be applied to both grayscale and colour images.

SLIC creates superpixels by clustering pixels according to their colour similarity and their proximity in the image plane. Clustering is performed in a five-dimensional [labxy] space. For small colour distances, the CIELAB colour space is approximately perceptually uniform. Using the Euclidean distance alone in this 5-D space is not appropriate, so the authors introduced a new distance measure that takes superpixel size into account.

Distance Measure

SLIC takes as input a desired number K of approximately equal-sized superpixels. For an image of N pixels, each superpixel therefore consists of approximately N/K pixels, and for equally sized superpixels there is a superpixel centre at every grid interval $S = \sqrt{N/K}$.

K cluster centres $C_k = [l_k, a_k, b_k, x_k, y_k]$, with k = 1, …, K, are chosen at regular grid intervals S. Given that the spatial extent of any cluster is approximately $S^2$, it can be assumed that pixels associated with this cluster lie within a 2S × 2S area around the superpixel centre in the xy plane.

For small distances in the CIELAB colour space, Euclidean distances are perceptually meaningful. Once spatial pixel distances exceed this perceptual colour-distance range, however, they begin to outweigh pixel colour similarity.

The distance measure $D_s$ is defined as follows:

$$d_{lab} = \sqrt{(l_k - l_i)^2 + (a_k - a_i)^2 + (b_k - b_i)^2}$$

$$d_{xy} = \sqrt{(x_k - x_i)^2 + (y_k - y_i)^2}$$

$$D_s = d_{lab} + \frac{m}{S}\, d_{xy}$$

$D_s$ is the sum of the lab-space distance and the xy-plane distance normalised by the grid interval S. The variable m allows us to control the compactness of a superpixel: the larger the value of m, the more weight is given to spatial proximity and the more compact the cluster becomes. m can take any value in the range [1, 20]; the algorithm's authors chose m = 10 as a default.
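The distance measure translates directly into code; a small NumPy sketch (the function name is illustrative):

```python
import numpy as np

def slic_distance(center, pixel, S, m=10):
    """D_s between a cluster center and a pixel, both 5-vectors
    [l, a, b, x, y]; S is the grid interval, m the compactness weight."""
    lab_c, xy_c = np.asarray(center[:3]), np.asarray(center[3:])
    lab_p, xy_p = np.asarray(pixel[:3]), np.asarray(pixel[3:])
    d_lab = np.linalg.norm(lab_c - lab_p)   # colour distance in CIELAB
    d_xy = np.linalg.norm(xy_c - xy_p)      # spatial distance in xy plane
    return d_lab + (m / S) * d_xy
```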


Algorithm

The algorithm starts by sampling K regularly spaced cluster centres and relocating them to seed locations corresponding to the lowest gradient position in a small neighbourhood. This avoids placing them on an edge and reduces the chance of seeding with a noisy pixel. The image gradients are computed as

$$G(x, y) = \lVert I(x+1, y) - I(x-1, y) \rVert^2 + \lVert I(x, y+1) - I(x, y-1) \rVert^2$$

where I(x, y) denotes the lab vector associated with the pixel at position (x, y), and

||.|| denotes the L2 norm. This takes both colour and intensity information into

account.

Each pixel in the image is associated with the nearest cluster centre whose search area overlaps that pixel. After all pixels have been associated with their nearest cluster centre, a new centre is computed as the average labxy vector of all of the cluster's pixels. At the conclusion of this process, a few stray labels may remain; that is, pixels adjacent to a larger segment carrying the same label but not connected to it. In the final step, the algorithm enforces connectivity by relabelling disjoint segments with the labels of the largest neighbouring cluster.

SLIC Algorithm can be summarized as -

Algorithm 1 Efficient superpixel segmentation

1. Initialize cluster centers C k = [l k, ak, bk, xk, yk]T by sampling pixels at regular

grid steps S.

2. Perturb cluster centers in an n x n neighborhood* to the lowest gradient

position.

3. repeat

4. For each cluster center C k do

5. Assign the best matching pixels from a 2S x 2S square neighborhood around

the cluster center according to the distance measure.

6. end for

7. Compute new cluster centers and the residual error E (the L1 distance between previous centers and recomputed centers).

8. until E < threshold

9. Enforce connectivity.
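Ready-made SLIC implementations are widely available; scikit-image, for instance, exposes one as `slic`. A short usage sketch (the input file name is hypothetical; K and m are chosen to match Fig. 8.50):

```python
from skimage import io
from skimage.segmentation import slic, mark_boundaries

img = io.imread('input.png')                           # hypothetical file
segments = slic(img, n_segments=100, compactness=20)   # K = 100, m = 20
overlay = mark_boundaries(img, segments)               # boundaries for display
```

`mark_boundaries` overlays the superpixel boundaries for inspection, as in Figs. 8.47 and 8.50.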


FIGURE 8.50 Results of the implementation (with K = 100 and m = 20).
[Image reference: https://darshita1405.medium.com/superpixels-and-slic-6b2d8a6e4f08]

8.5 Region Segmentation Using Graph Cuts

This section discusses a technique for partitioning an image into regions by representing the image's pixels as nodes of a graph, and then determining an optimal partition (cut) of the graph into groups of nodes. Optimality is defined by criteria whose values are high for members within a group (i.e., a region) and low across members of different groups. As demonstrated later in this section, graph-cut segmentation can yield results that are sometimes superior to those obtained with any of the previously studied segmentation methods. The price of this potential benefit is greater implementation complexity, which generally translates into slower execution.

Images as Graphs

A graph, G, is a mathematical structure consisting of a set V of nodes and an associated set E of edges connecting those nodes (nodes and edges are also referred to as vertices and links):

$$G = (V, E)$$

where V is a set of vertices and

$$E \subseteq V \times V$$

is a collection of ordered pairs of elements from V. If $(u, v) \in E$ implies $(v, u) \in E$, the graph is said to be undirected; otherwise, the graph is said to be

directed. For instance, we can think of a street map as a graph, with nodes

representing street intersections and edges representing the streets that connect

those intersections. The graph is undirected if all streets are bidirectional (meaning that we can travel in both directions between any two connected intersections); if at least one street is one-way, the graph is directed.

We are interested in undirected graphs whose edges are further characterized by a matrix, W, whose element w(i, j) is the weight associated with the edge connecting nodes i and j. Because the graph is undirected, w(i, j) = w(j, i), meaning that W is a symmetric matrix. The weights are chosen to be proportional to one or more measures of similarity between pairs of nodes. A graph whose edges are associated with weights is called a weighted graph.

The purpose of this section is to represent an image to be segmented as a weighted, undirected graph, with nodes representing the image's pixels and with an edge connecting each pair of nodes. Each edge's weight, w(i, j), is proportional to the similarity between nodes i and j. We then seek to partition the graph's nodes into disjoint subsets V1, V2, …, VK such that the similarity between nodes within a subset is high and the similarity between nodes in different subsets is low. The nodes of the partitioned subsets correspond to the regions of the segmented image. Additionally, superpixels are well suited for use as graph nodes; thus, when we refer to "pixels" in this section, we are implicitly including superpixels.

The graph is partitioned into subsets by cutting it. A graph cut is a partition of V into two subsets A and B such that

$$A \cup B = V \quad \text{and} \quad A \cap B = \emptyset$$

where the cut is accomplished by removing the edges connecting subgraphs A and B. There are two critical aspects to using graph cuts for image segmentation: (1) how to associate a graph with an image; and (2) how to cut the graph in a way that makes sense in terms of partitioning the image into background and foreground (object) pixels.

The simplified approach depicted in Figure 8.51 demonstrates how to generate a graph from an image. The nodes of the graph correspond to the pixels in the image and, to simplify the explanation, we permit edges only between adjacent pixels using 4-connectivity, which implies that no diagonal edges link the pixels. Note, however, that in general edges are specified between every pair of pixels. Weights

for edges are usually computed using spatial relationships (for instance, the distance between nodes) and intensity measures (for example, texture and colour) that are consistent with pixel similarity. In this simple example, the degree of similarity between two pixels is defined as the inverse of the difference in their intensities; that is, for two nodes (pixels) $n_i$ and $n_j$, the weight of the edge between them is

$$w(i, j) = \frac{1}{\left| I(n_i) - I(n_j) \right| + c}$$

where $I(n_i)$ and $I(n_j)$ are the intensities of the two nodes (pixels), and c is a constant that prevents division by zero. Thus, the closer the intensity values of adjacent pixels are, the larger the value of w.
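A minimal sketch of this graph construction, assuming a grayscale NumPy image and the 4-connectivity of Fig. 8.51 (the function name and the dictionary representation of the edge set are illustrative choices):

```python
import numpy as np

def edge_weights_4conn(img, c=1.0):
    """Weights w(i, j) = 1 / (|I(n_i) - I(n_j)| + c) for 4-connected
    neighbors; returns a dict {(i, j): w} with pixels indexed in
    row-major order."""
    rows, cols = img.shape
    idx = lambda y, x: y * cols + x
    w = {}
    for y in range(rows):
        for x in range(cols):
            for dy, dx in ((0, 1), (1, 0)):        # right and down edges
                ny, nx = y + dy, x + dx
                if ny < rows and nx < cols:
                    diff = abs(float(img[y, x]) - float(img[ny, nx]))
                    w[(idx(y, x), idx(ny, nx))] = 1.0 / (diff + c)
    return w
```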

FIGURE 8.51 (a) A 3 × 3 image. (b) A corresponding graph. (c) Graph cut. (d) Segmented image.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

While the general formulation illustrated in Fig. 8.51 is the one we focus on in this section, we mention for completeness another, more common approach for constructing image graphs. Figure 8.52 shows the same graph as before, but with two additional nodes, called the "source" and "sink" terminal nodes, each connected to every pixel node via unidirectional links called t-links.

The terminal nodes are not part of the image; rather, they serve to associate each pixel with one of two possible states: background or foreground (object or not).

The weights of the t-links are proportional to the probability that the graph nodes to which they are connected are foreground or background pixels. In Figs. 8.52(c) and (d), the thickness of each t-link is proportional to this probability (the thicknesses shown are such that the

segmentation result would be the same as in Fig. 8.51). It is up to the designer to

decide which of the two nodes we call "background" or "foreground".

MINIM UM GRAPH CUTS

After expressing an image as a graph, the graph is cut into two or more subgraphs. The nodes (pixels) of each resulting subgraph correspond to a region in the segmented image. Methods based on the representation in Fig. 8.52 rely on analysing the graph as a flow network (of pipes, for instance) and determining what is generally referred to as a minimum graph cut. This methodology is based on the Max-Flow, Min-Cut Theorem, which states that the maximum volume of flow from source to sink in a flow network equals the minimum cut. The minimum cut is defined as the smallest total weight of the edges that, if removed, would disconnect the sink from the source:

$$\text{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$$

FIGURE 8.52 (a) Same image as in Fig. 8.51(a). (b) Corresponding graph and terminal nodes. (c) Graph cut. (d) Segmented image.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


While the min-cut approach is elegant, it can favor partitions that cut off small groups of isolated nodes in a graph, resulting in incorrect segmentation. In Figure 8.53, the two regions of interest are characterized by the tightness of the pixel groupings. A meaningful edge weight here is inversely proportional to the distance between two points; however, this yields smaller weights for isolated points, resulting in min cuts such as the one shown in Fig. 8.53.

FIGURE 8.53 An example showing how a min cut can lead to a meaningless

segmentation. In this example, the similarity between pixels is defined as their

spatial proximity, which results in two distinct regions

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

The normalized cut (Ncut) algorithm computes the cost of the cut as a fraction of the total edge connections to all nodes in the graph:

$$\text{Ncut}(A, B) = \frac{\text{cut}(A, B)}{\text{assoc}(A, V)} + \frac{\text{cut}(A, B)}{\text{assoc}(B, V)}$$

where $\text{cut}(A, B)$ is as defined above, and

$$\text{assoc}(A, V) = \sum_{u \in A,\, z \in V} w(u, z)$$

is the weighted sum of all edges connecting nodes in subgraph A to all nodes in the graph. Similarly,

$$\text{assoc}(B, V) = \sum_{v \in B,\, z \in V} w(v, z)$$

is the total weight of the edges connecting the nodes of B to all nodes in the graph; that is, assoc(A, V) and assoc(B, V) measure how strongly each subset is connected to the graph as a whole. Using the same basic building blocks, we can define a measure of the total normalized association within the graph partitions:

$$\text{Nassoc}(A, B) = \frac{\text{assoc}(A, A)}{\text{assoc}(A, V)} + \frac{\text{assoc}(B, B)}{\text{assoc}(B, V)}$$

where $\text{assoc}(A, A)$ and $\text{assoc}(B, B)$ are the total weights of the edges connecting nodes within A and within B, respectively. It is straightforward to show that

$$\text{Ncut}(A, B) = 2 - \text{Nassoc}(A, B)$$

which indicates that minimizing $\text{Ncut}(A, B)$ simultaneously maximizes $\text{Nassoc}(A, B)$.
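Given a weight matrix W and a candidate partition, the quantities cut, assoc, and Ncut defined above can be computed directly; a small NumPy sketch (names illustrative):

```python
import numpy as np

def ncut(W, in_A):
    """Normalized cut value of a partition of a weighted graph.
    W is the symmetric weight matrix; in_A is a boolean vector that
    is True for nodes in subset A (B is the complement)."""
    in_B = ~in_A
    cut_AB = W[np.ix_(in_A, in_B)].sum()   # total weight crossing the cut
    assoc_A = W[in_A, :].sum()             # edges from A to all nodes
    assoc_B = W[in_B, :].sum()             # edges from B to all nodes
    return cut_AB / assoc_A + cut_AB / assoc_B
```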

8.6 Segmentation Using Morphological Watersheds

Previously, we discussed segmentation based on three primary concepts: edge detection, thresholding, and region extraction. Each of these methods was found to have advantages (for example, global thresholding is fast) and disadvantages (for example, edge-based segmentation needs post-processing, such as edge linking). In this section we discuss an approach based on the concept of so-called morphological watersheds. Segmentation by watersheds incorporates several of the concepts underlying the other three approaches and thus frequently produces more stable segmentation results, including connected segmentation boundaries. The approach also provides a straightforward framework for incorporating knowledge-based constraints into the segmentation process.

Watersheds and catchment basins are well-known concepts in topography: individual catchment basins are separated by watershed lines. Watershed segmentation is a widely used technique that originated in the field of mathematical morphology. It is a powerful and fast technique that can be used both for contour detection and for region-based segmentation. Proper watershed segmentation relies on ridges, a requirement frequently satisfied in contour detection, where object boundaries appear as ridges. By computing an image's edge map, object edges can be converted into ridges for region-based segmentation. Watershed segmentation is typically carried out by region growing guided by a set of markers, in order to avoid severe over-segmentation.


Watersheds are well known in topography, and the watershed transform was proposed early on as a technique for image segmentation. It is a morphologically based segmentation method in which the gradient magnitude of an image is treated as a topographic surface. Watershed lines can be located in a variety of ways, and the complete division of an image via the watershed transformation depends largely on an accurate estimate of the image gradients. Background noise degrades the watershed transform and leads to over-segmentation; conversely, when low-contrast edges generate small-magnitude gradients, distinct regions may be merged erroneously, causing under-segmentation. Different approaches to segmentation based on the watershed principle are possible. The image data can be interpreted as a topographic surface whose altitudes are the grey levels of the gradient image. Region boundaries then correspond to high watershed lines, while the interiors of low-gradient regions correspond to catchment basins.

The catchment basins of the topographic surface are homogeneous in the sense that

all pixels belonging to the same catchment basin are connected to the basin's region

of minimum altitude (gray -level) via a simple path of pixels with monotonically

decreasing altitude (gray -level) along the path. These catchment basins then

represent the segments of the segmented image.

FIGURE 8.54 (a) Gray-level profile of image data. (b) Watershed segmentation: local minima of gray level yield catchment basins; local maxima define the watershed lines.

There are two fundamental approaches to watershed image segmentation. The first begins by determining a downstream path from each image pixel to a local minimum of the image surface altitude. A catchment basin is then defined as the set of pixels whose downstream paths all terminate at the same altitude minimum. Whereas downstream paths can be determined easily for continuous altitude surfaces by following the local gradients, no rules uniquely define downstream paths on digital surfaces.


The second approach is essentially the dual of the first: rather than identifying downstream paths, catchment basins are filled from the bottom up. Assume that each local minimum has a hole in it and that the topographic surface is submerged in water; water then fills all catchment basins whose minima lie below the water level. If two catchment basins would merge as a result of further immersion, a dam is built all the way up to the highest surface altitude, and the dam serves as a watershed boundary.

An efficient algorithm begins by sorting the pixels in ascending order of their grey values, followed by a flooding step consisting of a fast breadth-first scan of all pixels in ascending order of grey level. A brightness histogram is computed during the sorting step. Simultaneously, a list of pointers to the pixels of each grey level h is created and associated with that histogram bin, allowing direct access to all pixels of a given grey level. The flooding step makes extensive use of this pixel-sorting information.

Assume that the flooding has reached a level (grey level, altitude) k. Then every pixel with a grey value less than or equal to k has already been assigned a unique catchment-basin label. Next, the pixels with grey level k + 1 must be processed: a pixel with grey level k + 1 may belong to a labelled catchment basin if at least one of its neighbours already carries that label.

FIGURE 8.55 A geodesic influence zone of a catchment basin

All pixels with grey level k + 1 that lie within the geodesic influence zone of a catchment basin labelled l are given the label l, and that catchment basin grows. The queued pixels are processed sequentially, and any pixels that cannot be assigned an existing label represent newly discovered catchment basins, which are assigned new, unique labels.


Watershed -flooding analogy

Assume that the grayscale image is a landscape, and let water rise from the bottom of each valley, giving each valley's water its own label. As soon as the water from two valleys meets, construct a dam, or watershed. These watersheds define the boundaries between the image's various regions. The watershed transform can be applied directly to an image, to an edge-enhanced image, or to a distance-transformed image.

Watershed -drop of water analogy

Again regard the grayscale image as a landscape. A drop of water that lands anywhere in the landscape flows down to a local minimum. For each local minimum in the landscape, the set of points from which a drop of water would flow to that particular minimum is called its catchment basin. The watershed is defined by the boundaries between adjacent catchment basins.

The following are examples of the watershed transform applied directly to grey-level images.

FIGURE 8.56 Watershed transform applied directly to a grey-level image

FIGURE 8.57 Watershed transform applied directly to a grey-level image

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Example of a seeded watershed

Each cell contains a nucleus, which can be distinguished using thresholding and watershed segmentation. Using these nuclei as seeds, it is simple to locate the cytoplasm.


FIGURE 8.58 seeded watershed

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)

Marker Based Watershed Segmentation

There are two types of markers: internal and external. Internal markers are used to restrict the number of regions by specifying the objects of interest; for example, seeds can be assigned to regions manually or automatically. Merging of regions without markers is permitted (no dam is built there). External markers are pixels that we are confident belong to the background. Watershed lines are typical external markers, as they denote region boundaries. Internal markers can be used to obtain watershed lines of the gradient of the image to be segmented, and the watershed lines so obtained are then used as external markers.

Each externally defined region contains a single internal marker and part of the background. The problem is thereby simplified to partitioning each such region into two parts: an object (containing the internal marker) and background (containing the external marker). The following figure illustrates marker-based watershed segmentation.

FIGURE 8.59 Marker Based Watershed Segmen tation

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)
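Marker-based watershed segmentation can be assembled from standard library routines; the sketch below uses SciPy's distance transform and scikit-image's peak_local_max and watershed functions (the binary foreground mask `binary`, obtained by thresholding, is an assumed input, and the min_distance value is illustrative):

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Distance to the background acts as the (inverted) altitude surface.
distance = ndi.distance_transform_edt(binary)
# Local maxima of the distance map serve as internal markers.
coords = peak_local_max(distance, min_distance=10, labels=binary)
markers = np.zeros(distance.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
# Flood the negated distance map from the markers, limited to the mask.
labels = watershed(-distance, markers, mask=binary)
```

Seeding from the distance-map maxima plays the role of the internal markers described above, while the mask acts as the external (background) constraint.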


8.7 Use of Motion in Segmentation

Motion is a powerful cue, used in a variety of applications such as robotics, autonomous navigation, and dynamic scene analysis, for extracting objects of interest from a background of irrelevant detail. Motion arises from a relative displacement between the sensing system and the scene being viewed. There are two distinct approaches to using motion in segmentation:

1. Spatial domain approach

2. Frequency domain approach.

In the following section, we will discuss spatial domain techniques.

Spatial domain techniques

Consider a set of image frames, each taken at a different time. Let f(x, y, t1) and f(x, y, t2) denote two images taken at times t1 and t2, respectively, and assume that the scene contains many stationary objects and a single moving object. Under these circumstances, the boundary of the moving object can be detected by forming the difference image between frames taken at different times. The image taken at the 0th instant can be used as a reference image, and subsequent images are subtracted from it, yielding an image that contains only the moving object's boundary while all stationary objects cancel out.

FIGURE 8.60 (a) Image of a cheque. (b) Histogram of the image. (c) Segmented image.


[Image reference: Fundamentals of Digital Image Processing by S. Annadurai, Pearson Publications]

The difference image between two images taken at times $t_i$ and $t_j$ may be defined as

$$d_{ij}(x, y) = \begin{cases} 1 & \text{if } \left| f(x, y, t_i) - f(x, y, t_j) \right| > T \\ 0 & \text{otherwise} \end{cases}$$

where T denotes a nonnegative threshold. Note that $d_{ij}(x, y)$ has a value of 1 at spatial coordinates (x, y) only if the intensity difference between the two images is significant at those coordinates, as determined by T. The coordinates (x, y) span the dimensions of the two images, so the difference image is of the same size as the images in the sequence.

All pixels with value 1 in $d_{ij}(x, y)$ are considered the result of object motion. This approach is applicable only if the two images are spatially registered and the illumination is relatively constant within the bounds defined by T. In practice, noise can also produce 1-valued entries in $d_{ij}(x, y)$. Typically these are isolated points in the difference image, and a simple way to remove them is to form 4- or 8-connected regions of 1's in $d_{ij}(x, y)$ and then ignore any region with fewer than a predefined number of elements. Although this may cause small and/or slow-moving objects to be missed, it increases the likelihood that the remaining entries in the difference image are due to motion rather than noise.
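A one-function NumPy sketch of this difference image (names illustrative):

```python
import numpy as np

def difference_image(f1, f2, T):
    """d(x, y) = 1 where |f1 - f2| > T, 0 elsewhere."""
    return (np.abs(f1.astype(int) - f2.astype(int)) > T).astype(np.uint8)
```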

Accu mulative Differences

Consider a sequence of image frames taken at times t1, t2, t3, …, tn and denoted f(x, y, t1), f(x, y, t2), …, f(x, y, tn). Let the first frame, f(x, y, t1), be the reference image. An accumulative difference image (ADI) is obtained by comparing the reference image with every subsequent image in the sequence. A counter at each pixel location of the accumulative image is incremented every time a difference occurs at that location between the reference and an image in the sequence. Thus, when the kth frame is compared with the reference, the entry at a given pixel of the accumulative image indicates the number of times the grey level at that position has differed from the corresponding pixel value in the reference image; differences are determined using the difference-image equation given earlier. Figure 8.61 illustrates these concepts.

Assuming that the intensity values of the moving objects are greater than those of the background, three types of ADIs are considered. To simplify the notation, let R(x, y) denote the reference image, and let k denote tk, so that f(x, y, k) = f(x, y, tk). We assume that R(x, y) = f(x, y, 1). Then, for any k > 1, and remembering that the ADI values are counts, we define the following accumulative differences for all relevant values of (x, y):

$$A_k(x, y) = \begin{cases} A_{k-1}(x, y) + 1 & \text{if } \left| R(x, y) - f(x, y, k) \right| > T \\ A_{k-1}(x, y) & \text{otherwise} \end{cases}$$

$$P_k(x, y) = \begin{cases} P_{k-1}(x, y) + 1 & \text{if } R(x, y) - f(x, y, k) > T \\ P_{k-1}(x, y) & \text{otherwise} \end{cases}$$

$$N_k(x, y) = \begin{cases} N_{k-1}(x, y) + 1 & \text{if } R(x, y) - f(x, y, k) < -T \\ N_{k-1}(x, y) & \text{otherwise} \end{cases}$$

where $A_k(x, y)$, $P_k(x, y)$, and $N_k(x, y)$ denote the absolute, positive, and negative ADIs, respectively, computed after the kth image in the sequence. All three ADIs start out with zero counts and are of the same size as the images in the sequence. If the intensity values of the background pixels are greater than those of the moving objects, the order of the inequalities and the signs of the thresholds in the above equations are reversed.
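The three ADIs can be accumulated with a simple loop over the frame sequence; a NumPy sketch under the definitions above (the frame list is an assumed input):

```python
import numpy as np

def accumulative_adis(frames, T):
    """Absolute, positive, and negative ADIs of a frame sequence.
    frames[0] is the reference image R(x, y)."""
    R = frames[0].astype(int)
    A = np.zeros(R.shape, dtype=int)
    P = np.zeros(R.shape, dtype=int)
    N = np.zeros(R.shape, dtype=int)
    for f in frames[1:]:
        diff = R - f.astype(int)
        A += np.abs(diff) > T      # absolute ADI count
        P += diff > T              # positive ADI count
        N += diff < -T             # negative ADI count
    return A, P, N
```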

EXAMPLE: Computation of the absolute, positive, and negative accumulative

difference images.

Figure 8.61 shows the three ADIs displayed as intensity images for a rectangular object of size 75 × 50 pixels moving in a southeasterly direction at a speed of 5√2 pixels per frame. The images are of size 256 × 256 pixels. We note the following: (1) the nonzero area of the positive ADI is equal to the size of the moving object; (2) the location of the positive ADI corresponds to the location of the moving object in the reference frame; (3) the count in the positive ADI stops increasing when the moving object is displaced completely with respect to the same object in the reference frame; (4) the absolute ADI contains the regions of both the positive and negative ADIs; and (5) the direction and speed of the moving object can be determined from the entries in the absolute and negative ADIs.

FIGURE 8.61 ADIs of a rectangular object moving in a southeasterly direction.

(a) Absolute ADI. (b) Positive ADI. (c) Negative ADI.

(Reference from "Digital Image Processing fourth edition by

Rafael C. Gonzalez and Richard E. Woods)


8.8 Summary

We have studied that image segmentation can be performed with three different approaches: region-based, boundary-based, and edge-based (point detection, line detection, and edge detection). We have seen different types of edge-detection operators such as Prewitt, Sobel, Laplacian, LoG, and Canny, the basic concept of intensity thresholding, and Otsu's binarization and its applications.

In region growing, adjacent pixels are evaluated and assigned to a region class if no edges are detected between them. Top-down region splitting separates an image into smaller portions that are more uniform than the entire image.

The goal of clustering is to identify patterns in a data collection by grouping the data into distinct clusters, such that members of a cluster are more closely related to one another than to members of other clusters.

A histogram of the image can be used as a tool to determine the threshold value for thresholding a given image.

8.9 Unit End questions

1. What are the derivative operators useful in image segmentation? Explain their role in segmentation.

2. What is thresholding? Explain global thresholding.

3. Explain the basic adaptive thresholding process used in image segmentation.

4. Explain in detail threshold selection based on boundary characteristics.

5. Explain region-based segmentation.

6. What are the derivative operators useful in image segmentation? Explain their role in segmentation.

7. Explain global processing via the Hough transform for edge linking.

8. Explain global processing via graph-theoretic techniques for edge linking.

9. Explain region splitting and merging with an example.

10. Write about edge detection.

11. Explain local processing for edge linking.

12. Write a short note on region growing.

13. Write the mask for the Prewitt operator.

14. Write the mask for the Sobel operator.

15. Write the mask for the Laplacian operator.

8.10 Reference for further reading

1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 4th ed., Pearson/Prentice Hall, 2018.

2. A. K. Jain, Fundamentals of Digital Image Processing, PHI.

3. J. C. Russ, The Image Processing Handbook, 5th ed., CRC Press, 2010.

4. All images courtesy of Digital Image Processing by Gonzalez and Woods, 4th Edition.


Unit 5

9 IMAGE SEGMENTATION II

Unit Structure

9.0 Objectives

9.1 Introduction

9.2 Image Segmentation using Snakes

9.3 Segmentation using Level Sets

9.4 Summary

9.5 Review questions

9.6 References

Objectives

1. To understand the process of segregating the foreground and background of

images.

2. To understand the working of image segmentation using contours and snakes.

9.1 Introduction

Image segmentation is the process of segregating the foreground and background of images. Canny edge detection is used in many applications to detect edges; however, in applications such as medical image processing and other imagery applications, active contours, known as snakes, are used instead of the basic algorithms. This chapter focuses on image segmentation using snakes and level sets.

9.2 Image Segmentation using Snakes

In image segmentation methods, connectivity and homogeneity are based on the image data. In medical image analysis, segmentation depends on anatomy, and it is usually difficult to obtain a good segmentation; prior knowledge of the problem is essential. Active contours are therefore used for such image segmentation tasks.

In an active contour framework, object segmentation is achieved by fitting a closed contour to the object's boundary. The snake is "active" because it continuously evolves so as to minimize its energy.


The energy function of a snake has two parts, internal and external energy:

$$E_{snake} = E_{internal} + E_{external}$$

The internal energy depends on intrinsic properties of the curve, such as its length and curvature. The external energy depends on factors such as the image structure and constraints imposed by the user. A snake starts as a closed curve and deforms, minimizing the total energy function, until it reaches an optimal state. The main advantage of active contours is that they yield closed, coherent areas with smooth boundaries.

Fig 9.1 Active contour is semi-automatic, as it requires the user to mark the initial contour. Edge-based and region-based active contour: a) initial contour, b) and c) intermediate contours, d) final contour

Fig 9.2 A snake is a parametric curve

A higher-level process or the user initializes a curve close to the object boundary. The snake then deforms and moves toward the desired object boundary, and in the end it completely shrink-wraps around the object. Deformable models are curves or surfaces, defined within an image domain, that can move under the influence of internal forces, defined by the curve itself, and external forces, derived from the image data.
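For illustration, scikit-image provides an active_contour function implementing such an energy-minimizing snake. A minimal sketch, assuming a grayscale image `img` and a user-chosen circular initialization centred at (cy, cx) with radius r (the parameter values shown are illustrative, not prescribed):

```python
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

# Circular initial contour around the object of interest.
s = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([cy + r * np.sin(s), cx + r * np.cos(s)])

# Smoothing the image softens the external energy landscape;
# alpha and beta weight the snake's length and curvature terms.
snake = active_contour(gaussian(img, sigma=3),
                       init, alpha=0.015, beta=10, gamma=0.001)
```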

Problems with image segmentation using snakes

1. Snakes may over-smooth the boundary.

2. Initialization is crucial.

3. The result depends on the number and spacing of control points.

4. Snakes may not follow topological changes of objects.

9.3 Image Segmentation using Level Sets

Level sets are an alternative representation for closed contours. Level-set methods fit and track objects of interest by evolving an embedding function, rather than a curve function, which allows topological changes such as splitting and merging to be handled naturally.

Fig 9.3 Ultrasound image segmentation without reinitialization

using Level Set evolution

9.4 Summary

In this chapter, we learned about the process of image segmentation using active contours (snakes) and level sets.

9.5 Questions

1. What are the disadvantages of image segmentation classical methods?

2. Explain image segmentation using active contours.

3. What are level sets in image segmentation? Explain.


9.6 References

1. https://www.cs.cmu.edu/~galeotti/methods_course/Segmentation2-Snakes.pdf

2. Medical Image Processing by G. R. Sinha and Bhagwati Charan Patel, PHI Publications

3. http://home.iitj.ac.in/~manpreet.bedi/btp/documents/sar.pdf

4. https://www.slideshare.net/UlaBac/lec11-active-contour-and-level-set-for-medical-image-segmentation (reference for Fig. 9.3)

5. Credits to Mr. Ulas Bagci, Assistant Professor at UCF, Research Center for Computer Vision

6. https://www.cs.cmu.edu/~galeotti/methods_course/Level_Sets_and_Parametric_Transforms.pdf

For detailed study, use the links above.


Unit 5

10 FEATURE EXTRACTION

Unit Structure

10.0 Objectives

10.1 Introduction

10.2 Boundary Preprocessing

10.3 Boundary feature descriptors

10.4 Region feature descriptors

10.5 Whole -Image features

10.6 Scale -Invariant Feature Transform (SIFT)

10.7 Summary

10.8 End of the questions

10.9 References

10.0 Objectives

1. Understand the concept of feature extraction and the various feature extraction

algorithms.

2. To broadly be aware of feature detection and feature description techniques.

10.1 Introduction

Feature extraction almost always follows the output of a segmentation stage, which

usually is raw pixel data, constituting either the boundary of a region (i.e., the set

of pixels separating one image region from another) or all the points in the region

itself. Feature extraction consists of feature detection and feature description.

Feature detection refers to finding the features in an image, region , or boundary.

Feature description assigns quantitative attributes to the detected features.

10.2 Boundary Preprocessing

10.2.1 Boundary Following (Tracing)

We present a boundary-following algorithm whose output is an ordered sequence of points. We assume (1) that we are working with binary images in which object and background points are labelled 1 and 0, respectively; and (2) that images are padded with a border of 0's to eliminate the possibility of an object merging with the image border.

The following algorithm traces the boundary of a 1 -valued region, R, in a binary

image.

1. Let the starting point, b0, be the uppermost-leftmost point in the image that is labelled 1. Denote by c0 the west neighbour of b0 (clearly, c0 is always a background point). Examine the 8-neighbours of b0, starting at c0 and proceeding in a clockwise direction. Let b1 denote the first neighbour encountered whose value is 1, and let c1 be the (background) point immediately preceding b1 in the sequence. Store the location of b0 for use in Step 5.

2. Let b = b0 and c = c0.

3. Let the 8-neighbours of b, starting at c and proceeding in a clockwise direction, be denoted by n1, n2, …, n8. Find the first neighbour labelled 1 and denote it by nk.

4. Let b = nk and c = nk−1.

5. Repeat Steps 3 and 4 until b = b0. The sequence of b points found when the algorithm stops is the set of ordered boundary points.

Fig 10.1 Illustration of the first few steps in the boundary-following algorithm. The

point to be processed next is labelled in bold, black; the points yet to be processed

are gray; and the points found by the algorithm are shaded. Squares without labels

are considered background (0) values.
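A minimal Python sketch of this tracing procedure follows. The stopping rule is simplified to revisiting b0 (which suffices for simple regions; the full algorithm also checks that the next point equals b1), and all names are illustrative:

```python
import numpy as np

def trace_boundary(img):
    """Boundary following on a 1-valued region in a 0-padded binary
    image; returns the ordered list of boundary points (row, col)."""
    # 8-neighbour offsets in clockwise order, starting at the west neighbour.
    nbrs = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
            (0, 1), (1, 1), (1, 0), (1, -1)]
    ys, xs = np.nonzero(img)
    b0 = (ys.min(), xs[ys == ys.min()].min())   # uppermost-leftmost 1
    b, c = b0, (b0[0], b0[1] - 1)               # c0 is b0's west neighbour
    boundary = [b0]
    while True:
        start = nbrs.index((c[0] - b[0], c[1] - b[1]))
        for i in range(1, 9):                   # scan clockwise from c
            dy, dx = nbrs[(start + i) % 8]
            n = (b[0] + dy, b[1] + dx)
            if img[n]:                          # first 1-valued neighbour
                c = (b[0] + nbrs[(start + i - 1) % 8][0],
                     b[1] + nbrs[(start + i - 1) % 8][1])
                b = n
                break
        if b == b0:                             # simplified stopping rule
            return boundary
        boundary.append(b)
```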

10.2.2 Chain Codes

Chain codes are used to represent a boundary by a connected sequence of straight -

line segments of specified length and direction. A chain code representation is

based on 4 - or 8-connectivity of the segments. The direction of each segment is

coded by using a numbering scheme. A boundary code formed as a sequence of

such directional numbers is referred to as a Freeman chain code.

Digital images usually are acquired and processed in a grid format with equal

spacing in the x - and y -directions, so a chain code could be generated by following

a boundary in, say, a clockwise direction and assigning a direction to the segments connecting every pair of adjacent pixels.

Fig 10.2 Direction numbers for (1) a 4-directional and (2) an 8-directional chain code

Fig 10.3 (a) Digital boundary with resampling grid. (b) Resampled grid. (c) 8-directional chain-coded boundary

Example of a 4-directional chain code

For the given image, the 4-directional chain code is

0 0 3 0 3 3 2 2 2 1 1 1

To make the code independent of the starting point, it is treated as circular and its first difference is computed: each element of the difference is the number of direction changes, counted counterclockwise, between adjacent code elements, with the first element obtained from the transition between the last and first code elements. For the code above, the circular first difference is

3 0 3 1 3 0 3 0 0 3 0 0

Shape number: the shape number of a boundary obtained from a chain code is defined as the circular first difference of smallest magnitude.
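A small sketch of the first-difference and shape-number computations for the example above, using the counterclockwise modular-difference convention described in the text:

```python
def first_difference(code, n_dirs=4):
    """Circular first difference of a chain code: number of
    counterclockwise direction changes between adjacent elements."""
    return [(code[i] - code[i - 1]) % n_dirs for i in range(len(code))]

def shape_number(code, n_dirs=4):
    """Rotation of the first difference with the smallest magnitude."""
    d = first_difference(code, n_dirs)
    rotations = [d[i:] + d[:i] for i in range(len(d))]
    return min(rotations)   # lexicographic minimum = smallest magnitude

# For the 4-directional code of the example:
# first_difference([0,0,3,0,3,3,2,2,2,1,1,1]) -> [3,0,3,1,3,0,3,0,0,3,0,0]
```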

10.2.3 Slope Chain codes (SCC)

Using Freeman chain codes generally requires resampling a boundary to smooth small variations, a process that implies defining a grid and subsequently assigning all boundary points to their closest neighbours in the grid. The SCC of a 2-D curve is obtained by placing straight-line segments of equal length around the curve, with the end points of the segments touching the curve. Obtaining an SCC requires calculating the slope changes between contiguous line segments, and normalizing the changes to the continuous (open) interval (−1, 1).

Fig 10.4 (a) An open curve. (b) A straight-line segment. (c) Traversing the curve using circumferences to determine slope changes; the dot is the origin (starting point). (d) Range of slope changes in the open interval (−1, 1)

10.2.4 Boundary Approximation using Minimum -Perimeter Polygon

A digital boundary can be approximated with arbitrary accuracy by a polygon. For

a closed curve, the approximation becomes exact when the number of segments of

the polygon is equal to the number of points in the boundary, so each pair of

adjacent points defines a segment of the polygon. A boundary can be represented by its minimum-perimeter polygon (MPP).

MPP Algorithm

1. The MPP bounded by a simply connected cellular complex is not self -

intersecting.

2. Every convex vertex of the MPP is a W vertex, but not every W vertex of a

boundary is a vertex of the MPP.

3. Every mirrored concave vertex of the MPP is a B vertex, but not every B

vertex of a boundary is a vertex of the MPP.

4. All B vertices are on or outside the MPP, and all W vertices are on or inside

the MPP.

5. The uppermost -leftmost vertex in a sequence of vertices contained in a

cellular complex is always a W vertex of the MPP.

Fig 1 0.5 a) The Region b) Convex and Concave c) MPP superimposed on

concave, convex vertices

10.2.5 Signatures

A signature is a 1-D functional representation of a 2-D boundary. One of the simplest is to plot the distance from the centroid to the boundary as a function of angle. The basic idea of using signatures is to reduce the boundary representation to a 1-D function. Although simplicity is an advantage, a disadvantage is that scaling of the entire function depends on only two values: the minimum and the maximum.

Distance versus angle: plot the distance from the centroid to the boundary as a function of angle, giving the signature $r(\theta)$, $0 \le \theta \le 2\pi$.

munotes.in

## Page 287

287Chapter 10: Feature Extraction

Fig 10.6 Distance versus angl e signatures

10.2.6 Skeletons, Medial Axes and Distance transforms

The skeleton of a region is the set of points in the region that are equidistant from

the border of the region. The skeleton is obtained using one of two principal

approaches: (1) by succe ssively thinning the region (e.g., using morphological

erosion) while preserving end points and line connectivity (this is called topology -

preserving thinning); or (2) by computing the medial axis of the region via an

efficient implementation of the medial axis transform (MAT). The skeleton of a

region is defined as its medial axis.

Fig 10.7 Medial axes of three simple regions

The distance transform of a region of foreground pixels in a background of zeros

is the distance from every pixel to the nearest n onzero valued pixel.

Fig 10.8 An image with its distance transform


10.3 Boundary feature descriptors

10.3.1 Basic Boundary descriptors

The length of a boundary is one of its simplest descriptors.

The diameter of a boundary B is defined as

$$\text{Diam}(B) = \max_{i,j}\,[D(p_i, p_j)]$$

where D is a distance measure and $p_i$ and $p_j$ are points on the boundary. The line segment connecting the two extreme points that comprise the diameter is called the major axis (or longest chord) of the boundary; the value of the diameter is the length of this segment, and its orientation is the angle of the segment. If $(x_1, y_1)$ and $(x_2, y_2)$ are the two extreme points, the length and orientation of the major axis are given by

$$L = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \qquad \theta = \tan^{-1}\!\left(\frac{y_2 - y_1}{x_2 - x_1}\right)$$

The minor axis (also called the longest perpendicular chord) of a boundary is defined as the line perpendicular to the major axis.
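A brute-force NumPy sketch of the diameter and major-axis computation (illustrative names; O(N²) in the number of boundary points, which is fine for small boundaries):

```python
import numpy as np

def boundary_diameter(points):
    """Diam(B) = max D(p_i, p_j) over all pairs of boundary points,
    using Euclidean distance. Returns the length and orientation of
    the major axis and the two extreme points."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]     # all pairwise offsets
    d = np.hypot(diff[..., 0], diff[..., 1])     # pairwise distances
    i, j = np.unravel_index(np.argmax(d), d.shape)
    (x1, y1), (x2, y2) = pts[i], pts[j]
    theta = np.arctan2(y2 - y1, x2 - x1)         # major-axis orientation
    return d[i, j], theta, (pts[i], pts[j])
```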

The curvature of a boundary is defined as the rate of change of slope.

10.3.2 Fourier descriptors

It represents the boundary as a sequence of K coordinate pairs and treats each pair as a complex number, $s(k) = x(k) + j\,y(k)$ for $k = 0, 1, \ldots, K-1$. The discrete Fourier transform (DFT) of s(k) is

$$a(u) = \sum_{k=0}^{K-1} s(k)\, e^{-j 2\pi u k / K}, \qquad u = 0, 1, 2, \ldots, K-1$$

The complex coefficients a(u) are called the Fourier descriptors of the boundary.
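A minimal NumPy sketch of Fourier descriptors and boundary reconstruction from a subset of them (function names are illustrative):

```python
import numpy as np

def fourier_descriptors(boundary):
    """Treat each boundary point (x(k), y(k)) as s(k) = x(k) + j*y(k)
    and take its DFT; the coefficients a(u) are the descriptors."""
    boundary = np.asarray(boundary, dtype=float)
    s = boundary[:, 0] + 1j * boundary[:, 1]
    return np.fft.fft(s)

def reconstruct(descriptors, p):
    """Rebuild an approximate boundary from p low-frequency
    descriptors (p//2 from each end of the spectrum); assumes p >= 2.
    Low-order coefficients capture the gross shape of the boundary."""
    half = p // 2
    a = np.zeros_like(descriptors)
    a[:half] = descriptors[:half]
    a[-half:] = descriptors[-half:]
    s = np.fft.ifft(a)
    return np.column_stack([s.real, s.imag])
```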


Fig 10.9 Basic properties of Fourier descriptors

10.4 Region Feature Descriptors

10.4.1 Basic Descriptors

The area of a region is defined as the number of pixels in the region.

The perimeter of a region is the length of its boundary.

Compactness of a region is a dimensionless measure defined as the perimeter squared over the area:

$$\text{compactness} = \frac{p^2}{A}$$

Another dimensionless measure is circularity (also called roundness), defined as

$$\text{circularity} = \frac{4\pi A}{p^2}$$

The value of this descriptor is 1 for a circle (its maximum value) and π/4 for a square.

The eccentricity of a region is defined as the eccentricity of an ellipse that has the same second central moments as the region.
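A small sketch computing these dimensionless descriptors from a region's area and perimeter (illustrative function; the area and perimeter are assumed to be measured already):

```python
import numpy as np

def shape_descriptors(area, perimeter):
    """compactness = p^2 / A and circularity = 4*pi*A / p^2."""
    return perimeter**2 / area, 4 * np.pi * area / perimeter**2

# For a circle of radius r: A = pi*r^2 and p = 2*pi*r, so the
# circularity evaluates to exactly 1.
r = 10.0
print(shape_descriptors(np.pi * r**2, 2 * np.pi * r))
```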

10.4.2 Topological Descriptors

Topology is the study of properties of a figure that are unaffected by any deformation, provided that there is no tearing or joining of the figure (such deformations are sometimes called rubber-sheet distortions). A topological property useful for region description is the number of connected components of an image. The number of holes H and connected components C in a figure can be used to define the Euler number E, which is also a topological property:

$$E = C - H$$
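A sketch of computing E = C − H for a binary image, using SciPy connected-component labeling; the hole count is taken as the number of background components that do not touch the image border (an assumption of this sketch, which presumes the figure is fully contained in the image):

```python
import numpy as np
from scipy import ndimage

def euler_number(binary):
    """E = C - H: foreground components minus holes."""
    binary = np.asarray(binary, dtype=bool)
    # C: foreground components, 8-connectivity.
    _, c = ndimage.label(binary, structure=np.ones((3, 3), dtype=int))
    # Background components, 4-connectivity (the default structure).
    bg_labels, n_bg = ndimage.label(~binary)
    # Background components touching the border are not holes.
    border = np.unique(np.concatenate(
        [bg_labels[0], bg_labels[-1], bg_labels[:, 0], bg_labels[:, -1]]))
    h = n_bg - len(set(border.tolist()) - {0})
    return c - h

# Example: a filled square with one hole has E = 1 - 1 = 0.
img = np.ones((9, 9), dtype=int)
img[3:6, 3:6] = 0
print(euler_number(img))  # 0
```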


Fig 10.10 (a) A region with two holes. (b) A region with

three connected components.

Fig 10.11 Regions with Euler numbers 0 and −1, respectively

Fig 10.12 A region with a polygonal network

10.4.3 Central Moments

The 2-D moment of order (p + q) of an M × N digital image f(x, y) is defined as

$$m_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} x^p\, y^q\, f(x, y)$$

where p = 0, 1, 2, … and q = 0, 1, 2, … are integers. The central moment of order (p + q) is defined as

$$\mu_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} (x - \bar{x})^p (y - \bar{y})^q f(x, y)$$

where p and q are integers, $\bar{x} = m_{10}/m_{00}$, and $\bar{y} = m_{01}/m_{00}$.

munotes.in

## Page 291

291Chapter 10: Feature Extraction

The normalized central moment of order (p + q) is defined as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}$$

where

$$\gamma = \frac{p + q}{2} + 1 \quad \text{for } p + q = 2, 3, \ldots$$

A set of seven 2-D moment invariants can be derived from the second and third normalized central moments.
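A NumPy sketch of the raw, central, and normalized central moments as defined above (the function name and dictionary layout are illustrative):

```python
import numpy as np

def image_moments(f, order=3):
    """Raw moments m_pq, central moments mu_pq, and normalized
    central moments eta_pq of a 2-D image f."""
    f = np.asarray(f, dtype=float)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]   # rows are y, columns x
    m = {(p, q): np.sum(x**p * y**q * f)
         for p in range(order + 1) for q in range(order + 1)}
    xbar, ybar = m[1, 0] / m[0, 0], m[0, 1] / m[0, 0]
    mu = {(p, q): np.sum((x - xbar)**p * (y - ybar)**q * f)
          for p in range(order + 1) for q in range(order + 1)}
    # mu_00 = m_00, so the gamma-normalization can use m[0, 0].
    eta = {(p, q): mu[p, q] / m[0, 0] ** ((p + q) / 2 + 1)
           for (p, q) in mu if p + q >= 2}
    return m, mu, eta
```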

10.5 Whole Image features

The principal feature detection methods fall into two categories: one is based on detecting corners, and the other works with entire regions in an image.

The Harris -Stephens Corner Detector

The basic approach is this: Corners are detected by running a small window over

an image.

The detector window is designed to compute intensity changes.


Fig 10.13 Illustration of the Harris-Stephens corner detector

Three scenarios can occur:

(1) Areas of zero (or small) intensity changes in all directions, which happens

when the window is located in a constant (or nearly constant) region, as in

location A.

(2) Areas of changes in one direction but no (or small) changes in the orthogonal direction, which happens when the window spans a boundary between two regions, as in location B.

(3) Areas of significant changes in all directions, a condition that happens when the window contains a corner (or isolated points), as in location C.

The HS corner detector is a mathematical formulation that attempts to differentiate

between these three conditions.

Let f denote an image, and let f(s, t) denote a patch of the image defined by the values of (s, t). A patch of the same size, but shifted by (x, y), is given by f(s + x, t + y). The weighted sum of squared differences between the two patches is

$$C(x, y) = \sum_{s} \sum_{t} w(s, t)\,[f(s + x, t + y) - f(s, t)]^2$$

where w(s, t) is a weighting function. Approximating the shifted patch by a first-order Taylor expansion, the above equation can be written in the form

$$C(x, y) \approx \begin{bmatrix} x & y \end{bmatrix} \mathbf{M} \begin{bmatrix} x \\ y \end{bmatrix}$$

where

$$\mathbf{M} = \sum_{s} \sum_{t} w(s, t) \begin{bmatrix} f_x^2 & f_x f_y \\ f_x f_y & f_y^2 \end{bmatrix}$$

and $f_x$ and $f_y$ are the partial derivatives of f. Matrix M is called the Harris matrix.
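A minimal sketch of the Harris matrix and the commonly used corner response R = det(M) − k·trace(M)², where k is an empirically chosen sensitivity constant (the response formula and the parameter values below are standard choices, not taken from the text above):

```python
import numpy as np
from scipy import ndimage

def harris_response(image, sigma=1.0, k=0.04):
    """Harris matrix entries at every pixel, combined into the
    response R = det(M) - k * trace(M)**2; large positive R marks
    a corner."""
    f = image.astype(float)
    fx = ndimage.sobel(f, axis=1)   # partial derivative f_x
    fy = ndimage.sobel(f, axis=0)   # partial derivative f_y
    # Gaussian weighting w(s, t) applied to the derivative products.
    a = ndimage.gaussian_filter(fx * fx, sigma)
    b = ndimage.gaussian_filter(fx * fy, sigma)
    c = ndimage.gaussian_filter(fy * fy, sigma)
    det_m = a * c - b * b
    trace_m = a + c
    return det_m - k * trace_m**2
```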

10.6 Scale Invariant Feature Transform (SIFT)

SIFT is an algorithm developed by Lowe [2004] for extracting invariant features

from an image. It is called a transform because it transforms image data into scale-invariant coordinates relative to local image features. SIFT is by far the most complex feature detection and description approach. Methods like SIFT are used in the presence of variables such as scale changes, rotation, changes in illumination, and changes in viewpoint.

SIFT features (called key points) are invariant to image scale and rotation, and are

robust across a range of affine distortions, changes in 3-D viewpoint, noise, and changes of illumination. The input to SIFT is an image; its output is an n-dimensional feature vector whose elements are the invariant feature descriptors.

The first stage of the SIFT algorithm is to find image locations that are invariant to

scale change. This is achieved by searching for stable features across all possible

scales, using a function of scale known as scale space. The parameter controlling

the smoothing is referred to as the scale parameter. In SIFT, Gaussian kernels are

used to implement smoothing, so the scale parameter is the standard deviation of the Gaussian. SIFT subdivides scale space into octaves, with each octave corresponding to a doubling of σ. SIFT initially finds candidate keypoint locations using the Gaussian-filtered images, then refines those locations to improve accuracy and eliminate edge responses, and finally computes keypoint orientations and descriptors.
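As an illustration, OpenCV ships a SIFT implementation; a minimal sketch, assuming the opencv-python package (version 4.4 or later) and a grayscale input image at an illustrative path:

```python
import cv2

# Path is illustrative; any grayscale image works.
image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

# Each keypoint carries its location, scale, and orientation; each
# descriptor is a 128-element invariant feature vector.
print(len(keypoints), "keypoints detected")
```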

10.7 Summary

In this chapter, we learned about feature extraction algorithms. We also covered boundary and region descriptors, including Fourier descriptors, and the roles of chain codes, signatures, and whole-image features.


10.8 Review questions

1. How can one represent and describe features after extraction?

2. What are region descriptors?

3. Explain Fourier descriptors briefly.

4. What are chain codes? Explain with examples.

5. Write a short note on signatures.

6. Describe Whole image features.

7. Explain Harris -Stephens Corner detector algorithm.

8. Write a short note on SIFT.

10.9 References

1. Digital Image Processing by Gonzalez and Woods, 4th Edition.

2. All images courtesy of Digital Image Processing by Gonzalez and Woods, 4th Edition.

3. https://www.ee.columbia.edu/~xlx/courses/ee4830-sp08/notes/lec10.pdf
