## Page 1

1 Experiment – I

1

IMAGE ENHANCEMENT

Spatial Domain and Frequency Domain Techniques

1.1.0 Objectives

The aim of image enhancement is to improve the interpretability or

perception of information in images for human viewers, or to provide

better' input for other automated image processing tec hniques. Image

enhancement techniques can be divided into two broad categories:

1. Spatial domain methods, which operate directly on pixels, and

2. Frequency domain methods, which operate on the Fourier transform of

an image.

1.1.1 Introduction to Image Enhancem ent

● The principal objective of image enhancement is to process a given

image so that the result is more suitable than the original image for a

specific application.

● It accentuates or sharpens image features such as edges, bou ndaries, or

contrast to make a graphic display more helpful for display and

analysis.

● The enhancement doesn't increase the inherent information content of

the data, but it increases the dynamic range of the chosen features so

that they can be detected eas ily

● The greatest difficulty in image enhancement is quantifying the

criterion for enhancement and, therefore, a large number of image

enhancement techniques are empirical and require interactive

procedures to obta in satisfactory results.

● Image enhancement methods can be based on either spatial or

frequency domain techniques. munotes.in

## Page 2

Image Processing Lab

2 1.1.2 Spatial Filtering:

The use of spatial masks for image processing is called spatial filtering.

The masks used are called spatial filters.

● The basic approach is to sum products between the mask coefficients

and the intensities of the pixels under the mask at a specific location in

the image. (2D convolution).

where (2d+1)X(2d+1) is the mask size, w(i,j)'s are weights of the mask,

f(x,y) is input pixel at coordinates (x,y), R(x,y) is the output value at (x,y).

If the center of the mask is at location (x,y) in the image, the gray level of

the pixel located a t (x,y) is replaced by R, the mask is then moved to the

next location in the image and the process is repeated. This continues until

all pixel locations have been covered.

1.1.2.1 Smoothing filter

1. Smoothing filters are used for blurring and for noise reduc tion.

2. Blurring is used in preprocessing steps, such as removal of small

details from an image prior to object extraction, and bridging of small

gaps in lines or curves.

3. Noise reduction can be accomplished by blurring with a linear filter

and also by nonl inear filtering. munotes.in

## Page 3

Image Enhancement

3 a. Low pass filtering:

● The key requirement is that all coefficients are positive.

● Neighborhood averaging is a special case of LPF where all

coefficients are equal.

● It blurs edges and other sharp details in the image.

b. Median filtering:

If the objective is to achieve noise reduction instead of blurring, this

method should be used. This method is particularly effective when the

noise pattern consists of strong, spike -like components and th e

characteristic to be preserved is edge sharpness. It is a nonlinear operation.

For each input pixel f(x,y), we sort the values of the pixel and its

neighbors to determine their median and assign its value to the output

pixel g(x,y).

Fig.2: Median Filter

1.1.2.2 Sharpening Filters

To highlight fine detail in an image or to enhance detail that has been

blurred, either in error or as a natural effect of a particular method of

image ac quisition. Uses of image sharpening vary and include applications

ranging from electronic printing and medical imaging to industrial

inspection and autonomous target detection in smart weapons. munotes.in

## Page 4

Image Processing Lab

4 a. Basic high pass spatial filter:

The shape of the impulse respo nse needed to implement a high pass spatial

filter indicates that the filter should have positive coefficients near its

center, and negative coefficients in the outer periphery.

Example : filter mask of a 3x3 sharpening filter

The filtering output pixels might be of a gray level exceeding [0,L -1].

The results of high pass filtering involve some form of scaling and/or

clipping to make sure that the gray levels of the final results are within

[0,L-1].

1.1.3 Frequenc y Domain Filtering

We simply compute the Fourier transform of the image to be enhanced,

multiply the result by a filter transfer function, and take the inverse

transform to produce the enhanced image.

Spatial domain: g(x,y)=f(x,y)*h(x,y)

Frequency domain: G(w1,w2)=F(w1,w2)H(w1,w2)

a. Low Pass filtering:

Edges and sharp transitions in the gray levels contribute to the high

frequency content of its Fourier transform, so a low pass filter smoothes

an image.

Formula of ideal LPF:

b. High Pass filtering:

A high pass filter attenuates the low frequency components without

disturbing the high frequency information in the Fourier transform

domain and can sharpen edges. munotes.in

## Page 5

Image Enhancement

5 Formula of ideal HPF function:

Experiment 01

Aim : - Program for image enhancement (Smoothing & Sharpening )

using spatial domain filters.

Objective : -

The purpose of this assignment is to study image filtering in the spatial

domain. Spatial filtering is performed by convolving the image with a

mask or a kernel.Spatial filters include sharpening, smoothing, edge

detection, noise removal, etc . It consists of four parts: the first one

discusses the spatial filtering of an image using a spatial mask .3x3, 5x5,

and then this mask is used in a blurring filter . The second part studies the

order statistics filters, specially the median filter.

Part I Smoothing spatial filter: The output of a smoothing spatial filter is

simply the average of the pixels contained in the neighborhood of the filter

mask. - These filters are sometimes called averaging filters and also low

pass filters - Two types of mas ks of the spatial filter

munotes.in

## Page 6

Image Processing Lab

6 Steps : -

1) Read input Image.

2) Add noise using the “imnoise()” function.

3) Define a (3 x 3) filter.

4)Use convolution function conv2() for filtering

• Order statistics filters are nonlinear spatial filters whose response is

based on ordering (ranking) the pixels contained in an area covered by the

filter

• The best known example in this category is median filter

• Median filter - Median filters replace the value of the pixel by t he

median of the gray levels in the neighborhood of that pixel

Part : -II Sharpening spatial filter:

Spatial domain sharpening filters are also called as High Pass Filters

Laplacian Filters

Laplacian Filters

munotes.in

## Page 7

Image Enhancement

7 Gradient Filter

Part III : - Study of spatial domain filters

Study following functions and use them on various images with all

possible parameters.

fspecial(), imfilter() Ordfilt2() Medfilt2() Imnoise() Median()

fspecial : - Create predefined 2 -D filter

Syntax h = fspecial(type)

h = fspecial(type, parameters)

Description:

h = fspecial(type) create a two -dimensional filter h of the specified type.

fspecial returns h as a correlation kernel,.

Value Description

'average' Averaging filter

'gaussian' Gaussian lowpass filter

'laplacian' Approximates the two -dimensional Laplacian

operator

'prewitt' Prewitt horizontal edge -emphasizing filter

‘sobel' 'Sobel horizontal edge -emphasizing filter

‘Unsharp’ ''Unsharp contrast enhancement filter

munotes.in

## Page 8

Image Processing Lab

8 Value Description 'average' Averaging filter 'gaussian' Gaussian lowpass

filter

h = fspecial(type, parameters)

accepts the filter specified by type plus additional modifying parameters

particular to the type of filter chosen. If you omit these arguments, fspecial

uses default values for the parameters. The following list shows the syntax

for each filter type. Where applicable,

h = fspecial('average', hsize)

returns an averaging filter h of size hsize. The argument hsize can be a

vector specifying the number of rows and columns in h, or it can be a

scalar, in which case h is a square matrix. The default value for h size is

[3 3].

h = fspecial ('gaussian', hsize, sigma) : -

returns a rotationally symmetric Gaussian lowpass filter of size hsize with

standard deviation sigma (positive). hsize can be a vector specifying the

number of rows and columns in h, or it can be a scalar, in which case h is a

square matrix. The default value for hsize is [3 3]; the default value for

sigma is 0.5.

h = fspecial('laplacian', alpha)

returns a 3 -by-3 filter approximating the shape of the two -dimensional

Laplacian operat or. The parameter alpha controls the shape of the

Laplacian and must be in the range 0.0 to 1.0. The default value for alpha

is 0.2.

h = fspecial('log', hsize, sigma)

returns a rotationally symmetric Laplacian of Gaussian filter of size hsize

with standa rd deviation sigma (positive). hsize can be a vector specifying

the number of rows and columns in h, or it can be a scalar, in which case h

is a square matrix. The default value for hsize is [5 5] and 0.5 for sigma.

h = fspecial('prewitt')

h = fspecial(' sobel')

Median Filter : - Median filters replace the value of the pixel by the

median of the gray levels in the neighborhood of that pixel

1) Open /Read an image in a matrix .

2) Create a 3x3 matrix B called a mask.

3) Read the first 3x3 pixel grid of t he input image into B.

4) Sort the matrix B in ascending order. munotes.in

## Page 9

Image Enhancement

9 5) Select the middle value and put that as the first pixel value in the

output image matrix.

6) Repeat the procedure for the entire input image by reading the next 3x3

values from the input image and sort using mask B. This way Output

image values are calculated.

7) Display the input image and Output Image.

Programs for image enhancement using spatial domain filters. %This

program is for Averaging spatial Filter

%This program is for Averaging spatial Filter

a=imread('D: \DIP Course Material \DIP pract \Images \rose.jpg');

% Addition of noise to the input image

b=imnoise(a,'salt & pepper');

c=imnoise(a,'gaussian');

d=imnoise(a,'speckle');

% Defining 3x3 and 5x5 kernel

h1=1/9*ones(3,3) ; h2=1/25*ones(5,5);

% Attempt to recover the image

b1=conv2(b,h1,'same');

b2=conv2(b,h2,'same');

c1=conv2(c,h1,'same');

c2=conv2(c,h2,'same');

d1=conv2(d,h1,'same');

d2=conv2(d,h2,'same');

a=imread('D: \DIP Course Material \DIP pract \Images \rose.jpg');

% Addition of noise to the input image

b=imnoise(a,'salt & pepper');

c=imnoise(a,'gaussian');

d=imnoise(a,'speckle');

% Defining 3x3 and 5x5 kernel

h1=1/9*ones(3,3);

h2=1/25*ones(5,5); munotes.in

## Page 10

Image Processing Lab

10 % Attempt to recover the image

b1=conv2(b,h1,'same');

b2=conv 2(b,h2,'same');

c1=conv2(c,h1,'same');

c2=conv2(c,h2,'same');

d1=conv2(d,h1,'same');

d2=conv2(d,h2,'same');

% displaying the result figure,

subplot(2,2,1),

imshow(a),

title('Original Image'),

subplot(2,2,2),

imshow(b),

title('Salt & Pepper noise'),

subplot(2,2,3),

imshow(uint8(b1)),

title('3 x 3 Averaging filter'),

subplot(2,2,4)

imshow(uint8(b2)),

title('5 x 5 Averaging filter')

%........................... figure,

subplot(2,2,1),

imshow(a),

title('Original Image'),

subplot(2,2,2),

imshow(c),

title('Gaussian noise'),

subplot(2,2,3),imshow(uint8(c1)),

title('3 x 3 Averaging filter'), munotes.in

## Page 11

Image Enhancement

11 subplot(2,2,4),

imshow(uint8(c2)),

title('5 x 5 Averaging filter'),

%.................. figure,

subplot(2,2,1),

imshow(a),

title('Original Image'),

subplot(2,2,2),

imshow(d),

title('Speckle noise'),

subplot(2,2,3),

imshow(uint8(d1)),

title('3 x 3 Averaging filter'),

subplot(2,2,4),

imshow(uint8(d2)),

title('5 x 5 Averaging filter'),

%this program is for comparing averaging & median filter

clc clear

all close all

a=imread('D: \DIP Course Material \DIP pract \Images \horse.jpg');

%Addition of salt and pepper noise b=imnoise(a,'salt & pepper',0.1);

%Defining the box and median filters

h1=1/9*ones(3,3);

h2=1/25*ones(5,5);

c1=conv2(b,h1,'same');

c2=conv2(b ,h2,'same');

c3=medfilt2(b,[3 3]);

c4=medfilt2(b,[5 5]);

subplot(3,2,1), munotes.in

## Page 12

Image Processing Lab

12 imshow(a),

title('Original image')

subplot(3,2,2),

imshow(b),

title('Salt & pepper noise')

subplot(3,2,3),

imshow(uint8(c1)),

title('3 x 3 smoothing')

subplot(3,2,4),

imshow(uin t8(c2)),

title('5 x 5 smoothing')

subplot(3,2,5),

imshow(uint8(c3)),

title('3x 3 Median filter')

subplot(3,2,6),

imshow(uint8(c4)),

title('5 x 5 Median filter')

% this program is for sharpening spatial domain filter

%Sharpening Filters

A=ones(200,200);

A(30:60,30:60)=0;

A(70:150,50:170)=0

figure(1) ,

subplot(1,2,1)

imshow(A)

AM=[1 1 1;1 -8 1;1 1 1];

B=conv2(A,AM);

subplot(1,2,2),

imshow(B) munotes.in

## Page 13

Image Enhancement

13 % this program is for sharpening spatial domain filter

a=imread('D: \horse.jpg');

%Defining the laplacian filters

h1=[0 -1 0;-1 4 -1;0 -1 0]

h2=[ -1 -1 -1;-1 8 -1; -1 -1 -1];

h3=[ -1 -1 -1;-1 9 -1; -1 -1 -1];

c1=conv2(a,h1,'same');

c2=conv2(a,h2,'same');

c3=conv2(a,h3,'same');

subplot(2,2,1),imshow(a),

title('Original image')

subplot(2,2,2),

imshow(uint8(c1)),

title('Laplacian sharpening 4 at center')

subplot(2,2,3),imshow(uint8(c2)),

title('Laplacian sharpening 8 at center ')

subplot(2,2,4),

imshow(uint8(c3)),

title(' Laplacian sharpening 9 at center')

%Averaging Filter

A=ones(200,200);

A(30:60,30:60)=0;

A(70:150,50:170)=0

figure(1)

subplot(1,2,1)

imshow(A)

AM=1/9.*[1 1 1;1 1 1;1 1 1];

B=conv2(A,AM);

subplot(1,2,2)

imshow(B)

munotes.in

## Page 14

Image Processing Lab

14 Experiment 02

AIM: To Implement smoothing or averaging filters in spatial domain.

OBJECTIVE: To Implement smoothing or averaging filters in spatial

domain.

TOOLS REQUIRED: MATLAB

THEORY:

Filtering is a technique for modifying or enhancing an image. Masks or

filters will be defined. The general process of convolution and correlation

will be intr oduced via an example. Also smoothing linear filters such as

box and weighted average filters will be introduced. In statistics and image

processing, to smooth a data set is to create an approximating function that

attempts to capture important patterns in the data, while leaving out noise

or other fine -scale structures/rapid phenomena. In smoothing, the data

points of a signal are modified so individual points (presumably because

of noise) are reduced, and points that are lower than the adjacent points are

increased leading to a smoother signal. Smoothing may be used in two

important ways that can aid in data analysis by being able to extract more

information from the data as long as the assumption of smoothing is

reasonable by being able to provide analyse s that are both flexible and

robust. different algorithms are used in smoothing.

% Program for implementation of smoothing or averaging filter in

spatial domain

I=imread('trees.tif');

subplot(2,2,1);

imshow(J);

title('original image');

f=ones(3,3)/9;

h=imfilter(I,f,'circular');

subplot(2,2,2);

imshow(h);

title('averaged image'); munotes.in

## Page 15

Image Enhancement

15 Result:

Conclusion: Thus we have performed the smoothing or averaging filter

operation on the Original image and we get a filtere d image.

Discrete Fourier Transform:

The general idea is that the image ( f(x,y) of size M x N) will be

represented in the frequency domain ( F(u,v) ). The equation for the two -

dimensional discrete Fourier transform (DFT) is:

The concept behind the Fourier transform is that any waveform can be

constructed using a sum of sine and cosine waves of different frequencies.

The exponential in the above formula can be expanded into sines and

cosines wit h the variables u and v determining these frequencies.

The inverse of the above discrete Fourier transform is given by the

following equation:

Thus, if we have F(u,v) , we can obtain the corresponding image ( f(x,y) )

using the inverse, discrete Fourier transform.

Things to note about the discrete Fourier transform are the following:

● the value of the transform at the origin of the frequency domain, at

F(0,0) , is called the dc component

○ F(0,0) is equal to MN time s the average value of f(x,y)

○ in MATLAB, F(0,0) is actually F(1,1) because array

indices in MATLAB start at 1 rather than 0

munotes.in

## Page 16

Image Processing Lab

16 ● The values of the Fourier transform are complex, meaning they have

real and imaginary parts. The imaginary parts are represented by i,

which is defined solely by the property that its square is −1, ie:

● We visually analyze a Fourier transform by computing a Fourier

spectrum (the magnitude of F(u,v) ) and display it as an image.

1. the Fourier spe ctrum is symmetric about the origin

● The fast Fourier transform (FFT) is a fast algorithm for computing the

discrete Fourier transform.

● MATLAB has three functions to compute the DFT:

1. fft -for one dimension (useful for audio)

2. fft2 -for two dimensions (useful for images)

3. fftn -for n dimensions

● MATLAB has three related functions that compute the inverse DFT:

1. ifft

2. ifft2

3. ifftn

How to Display a Fourier Spectrum using MATLAB?

%Create a black 30x30 image

f=zeros(30,30 );

%With a white rectangle in it.

f(5:24,13:17)=1;

imshow(f,'InitialMagnification', 'fit')

%Calculate the DFT.

F=fft2(f);

%There are real and imaginary parts to F.

%Use the abs function to compute the magnitude

%of the combined components.

F2=abs(F); munotes.in

## Page 17

Image Enhancement

17 figure, imshow(F2,[], 'InitialMagnification','fit')

%To create a finer sampling of the Fourier transform,

%you can add zero padding to f when computing its DFT

%Also note that we use a power of 2, 2^256

%This is because the FFT -Fast Fourier Transform -

%is fastest when the image size has many factors.

F=fft2(f, 256, 256);

F2=abs(F);

figure, imshow(F2, [])

%The zero -frequency coefficient is displayed in the

%upper left hand corner. To display it in the center,

%you can use the function fftshift.

F2=ff tshift(F);

F2=abs(F2);

figure,imshow(F2,[])

%In Fourier transforms, high peaks are so high they

%hide details. Reduce contrast with the log function.

F2=log(1+F2);

figure,imshow(F2,[])

To get the results shown in the last image of the table, you ca n also

combine MATLAB calls as in:

f=zeros(30,30);

f(5:24,13:17)=1;

F=fft2(f, 256,256);

F2=fftshift(F);

figure,imshow(log(1+abs(F2)),[])

Notice in these calls to imshow, the second argument is empty square

brackets. This maps the minimum value in the image to black and the

maximum value in the image to white.

munotes.in

## Page 18

Image Processing Lab

18 1.1.4 References

● Digital Image Processing, Using MATLAB , by Rafael C. Gonzalez,

Richard E. Woods, and Steven L. Eddins

● Image Processing Toolbox, For Use with MATLAB (MATLAB's

documentation) --available through MATLAB's help menu or

online at:

http://www.mathworks.com/access/helpdesk/help/toolbox/ima ges/

● Frequency Domain Processing: www.cs.uregina.ca/Links/class -

info/425/Lab5/index.html

munotes.in

## Page 19

19 Experiment -II

2

DISCRETE FOURIER TRANSFORMATION

Aim: To find DFT/FFT forward and inverse transform of image.

Theory :

FFT: fast Fourier transform.

IFFT: Inverse fast Fourier transform.

Discrete Fourier Transform (DFT)

From the previous section, we learned h ow we can easily characterize a

wave with period/frequency, amplitude, phase. But these are easy for

simple periodic signal, such as sine or cosine waves. For complicated

waves, it is not easy to characterize like that. For example, the following is

a rela tively more complicate waves, and it is hard to say what’s the

frequency, amplitude of the wave, right?

There are more complicated cases in real world, it would be great if we

have a method that we can use to analyze the characteristics of the wave.

The Fourier Transform can be used for this purpose, which it decompose

any signal into a sum of simple sine and cosine waves that we can easily

measure the frequency, amplitude and phase. The Fourier transform can be

applied to continuous or discrete waves, in this chapter, we will only talk

about the Discrete Fourier Transform (DFT). munotes.in

## Page 20

Image Processing L ab

20 Using the DFT, we can compose the above signal to a series of sinusoids

and each of them will have a different frequency. The following 3D figure

shows the idea behind the DFT, th at the above signal is actually the results

of the sum of 3 different sine waves. The time domain signal, which is the

above signal we saw can be transformed into a figure in the frequency

domain called DFT amplitude spectrum, where the signal frequencies are

showing as vertical bars. The height of the bar after normalization is the

amplitude of the signal in the time domain. You can see that the 3 vertical

bars are corresponding the 3 frequencies of the sine wave, which are also

plotted in the figure.

In this section, we will learn how to use DFT to compute and plot the DFT

amplitude spectrum.

DFT

The DFT can transform a sequence of evenly spaced signal to the

information about the frequency of all the sine waves that needed to sum

to the time domain sign al. It is defined as:

Xk=∑n=0N −1xn e−i2πkn/N=∑n=0N −1xn[cos(2 πkn/N) −isin(2πkn/N)]

Xk=∑n=0N ⋅−1xn e−i2πkn/N=∑n=0N −1xn[cos(2 πkn/N) −isin(2πkn/N)]

where

● N = number of samples

● n = current sample

● k = current frequency, where k∈[0,N−1]k ∈[0,N−1]

● xnxn = the sine valu e at sample n

● XkXk = The DFT which include information of both amplitude and

phase munotes.in

## Page 21

Discrete Fourier Transform

21 Also, the last expression in the above equation derived from the Euler’s

formula , which links the trigonometric functions to the complex

exponential function: eix=cosx+isi nxeix=cosx+isinx

Due to the nature of the transform, X0=∑N−1n=0xnX0=∑n=0N−1xn.

If NN is an odd number, the elements X1,X2,...,X(N −1)/2X1,

X2,...,X(N −1)/2 contain the positive frequency terms and the

elements X(N+1)/2,...,XN −1X(N+1)/2,...,XN−1 contain th e negative

frequency terms, in order of decreasingly negative frequency. While

if NN is even, the elements X1,X2,...,XN/2 −1X1,X2,...,XN/2−1 contain

the positive frequency terms, and the elements XN/2, ...,XN −1XN/2,...,

XN−1 contain the negative frequency t erms, in order of decreasingly

negative frequency. In the case that our input signal xx is a real -valued

sequence, the DFT output XnXn for positive frequencies is the conjugate

of the values XnXn for negative frequencies, the spectrum will be

symmetric. Th erefore, usually we only plot the DFT corresponding to the

positive frequencies.

Note that the XkXk is a complex number that encodes both the amplitude

and phase information of a complex sinusoidal component ei2πkn/Nei

2πkn/ N of function xnxn. The amplitude and phase of the signal can be

calculated as:

amp=|Xk|N=Re(Xk)2+Im(Xk)2 −−−−−−−−−−−−−−−−√Namp=|Xk|N=Re(

Xk)2+Im(Xk)2N

phase=atan2(Im(Xk),Re(Xk))phase=atan2(Im(Xk),Re(Xk))

where Im(Xk)Im(Xk) and Re(Xk)Re(Xk) are the imagery and real part of

the complex number, atan2atan2 is the two -argument form of

the arctanarctan function.

The amplitudes returned by DFT equal to the amplitudes of the signals fed

into the DFT if we normalize it by the number of sample points. Note that

doing this will divide the power between the positive and negative sides, if

the input signal is real -valued sequence as we described above, the

spectrum of the positive and negative frequencies will be symmetric,

therefore, we will only look at one side of the DFT result, and instead of

divide NN, we divide N/2N/2 to get the amplitude corresponding to the

time domain signal.

Now that we have the basic knowledge of DFT, let’s see how we can use

it.

TRY IT! Generate 3 sine waves with frequencies 1 Hz, 4 Hz, and 7 Hz,

amplitudes 3, 1 and 0.5, and phase all zeros. Add this 3 sine waves

together with a sampling rate 100 Hz, you will see that it is the same

signal we just shown at the beginning of the section.

munotes.in

## Page 22

Image Processing L ab

22 import matplotlib.pyplot as plt

import numpy as np

plt.style.use('seaborn -poster')

%matplotlib inline

# sampling rate

sr = 100

# sampling interval

ts = 1.0/sr

t = np.arange(0,1,ts)

freq = 1.

x = 3*np.sin(2*np.pi*freq*t)

freq = 4

x += np.sin(2*np.pi*freq*t)

freq = 7

x += 0.5* np.sin(2*np.pi*fr eq*t)

plt.figure(figsize = (8, 6))

plt.plot(t, x, 'r')

plt.ylabel('Amplitude')

plt.show()

Output:

munotes.in

## Page 23

Discrete Fourier Transform

23 TRY IT! Write a function DFT(x) which takes in one argument, x - input

1 dimensional real -valued signal. The function will calculate the DFT of

the signal and return the DFT values. Apply this function to the signal we

generated above and plot the result.

def DFT(x):

"""

Function to calculate the

discrete Fourier Transform

of a 1D real -valued signal x

"""

N = len(x)

n = np.aran ge(N)

k = n.reshape((N, 1))

e = np.exp( -2j * np.pi * k * n / N)

X = np.dot(e, x)

return X

X = DFT(x)

# calculate the frequency

N = len(X)

n = np.arange(N)

T = N/sr

freq = n/T

plt.figure(figsize = (8, 6))

plt.stem(freq, abs(X), 'b' , \

markerfmt=" ", basefmt=" -b")

plt.xlabel('Freq (Hz)')

plt.ylabel('DFT Amplitude |X(freq)|')

plt.show()

munotes.in

## Page 24

Image Processing L ab

24 Output:

We can see from here that the output of the DFT is symmetric at half of

the sampling rate (you can try different sampling rate to test). This half of

the sampling rate is called Nyquist frequency or the folding frequency, it

is named after the electronic engineer Harry Nyquist. He and Claude

Shannon have the Nyquist -Shannon sampling theorem, which states that a

signal sampled at a rate can be fully reconstructed if it contains only

frequency components below half that sampling frequency, thus the

highest frequency output from the DFT is half the sampling rate.

n_oneside = N//2

# get the one side frequency

f_oneside = freq[:n_onesid e]

# normalize the amplitude

X_oneside =X[:n_oneside]/n_oneside

plt.figure(figsize = (12, 6))

plt.subplot(121)

plt.stem(f_oneside, abs(X_oneside), 'b', \

markerfmt=" ", basefmt=" -b")

plt.xlabel('Freq (Hz)')

plt.ylabel('DFT Amplitude |X(freq)|')

plt.subplot(122) munotes.in

## Page 25

Discrete Fourier Transform

25 plt.stem(f_oneside, abs(X_oneside), 'b', \

markerfmt=" ", basefmt=" -b")

plt.xlabel('Freq (Hz)')

plt.xlim(0, 10)

plt.tight_layout()

plt.show()

Output:

We can see by plotting the first half of the DFT results, we can see 3 clear

peaks at frequency 1 Hz, 4 Hz, and 7 Hz, with amplitude 3, 1, 0.5 as

expected. This is how we can use the DFT to analyze an arbitrary signal

by decomposing it to simple sine waves.

The inverse DFT

Of course, we can do the inverse transform of the DFT easily.

xn=1N ∑k=0N −1Xk ei2πkn/Nxn=1N∑k=0N ⋅⋅ −1Xk ei2πkn/N

We will leave this as an exercise for you to write a function.

The limit of DFT

The main issue with the above DFT implementation is that it is not

efficient if we have a signal with many data points. It may take a long time

to compute the DFT if the signal is large.

TRY IT Write a function to generate a simple signal with different

sampling rate, and see the difference of computing time by varying the

sampling rate.

def gen_sig(sr):

''' munotes.in

## Page 26

Image Processing L ab

26 function to ge nerate

a simple 1D signal with

different sampling rate

'''

ts = 1.0/sr

t = np.arange(0,1,ts)

freq = 1.

x = 3*np.sin(2*np.pi*freq*t)

return x

# sampling rate =2000

sr = 2000

%timeit DFT(gen_sig(sr))

Output:

120 ms ± 8.27 ms p er loop (mean ± std. dev. of 7 runs, 10 loops each)

# sampling rate 20000

sr = 20000

%timeit DFT(gen_sig(sr))

Output:

15.9 s ± 1.51 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Example 1 :

# import sympy

from sympy import fft

# sequence

seq = [15, 21, 13, 44]

# fft

transform = fft(seq)

print (transform)

Output :

FFT : [93, 2 - 23*I, -37, 2 + 23*I] munotes.in

## Page 27

Discrete Fourier Transform

27

Example 2 :

# import sympy

from sympy import fft

# sequence

seq = [15, 21, 13, 44]

decimal_point = 4

# fft

transform = fft(seq, decimal_point )

print ("F FT : ", transform)

Output :

FFT : [93, 2.0 - 23.0*I, -37, 2.0 + 23.0*I]

Fast Fourier Transform (FFT)

The Fast Fourier Transform (FFT) is an efficient algorithm to calculate

the DFT of a sequence. It is described first in Cooley and Tukey’s classic

paper in 1965, but the idea actually can be traced back to Gauss’s

unpublished work in 1805. It is a divide and conquer algorithm that

recursively breaks the DFT into smaller DFTs to bring down the

computation. As a result, it successfully reduces the complexit y of the

DFT from O(n2)O(n2) to O(nlogn)O(nlogn), where nn is the size of the

data. This reduction in computation time is significant especially for data

with large NN, therefore, making FFT widely used in engineering, science

and mathematics. The FFT algo rithm is the Top 10 algorithm of 20th

century by the journal Computing in Science & Engineering.

In this section, we will introduce you how does the FFT reduces the

compu tation time. The content of this section is heavily based on this great

tutorial put together by Jake VanderPlas .

Symmetries in the DFT

The answer to how FFT speedup the computing of DFT lies in the

exploitation of the symmetries in the DFT. Let’s take a look of the

symmetries in the DFT. From the definition of the DFT equation

Xk=∑n=0N −1xne−i2 πkn/NXk=∑n=0N −1xne−i2 πkn/N

we can calculate the

Xk+N= ∑n=0N −1xne−i2 π(k+N)n/N=∑n=0N −1xne−i2 πne−i2πkn/NXk+N

=∑n=0N −1xne−i2 π(k+N)n/N=∑n=0N −1xne−i2 πne−i2πkn/N

Note that, e−i2πn=1e −i2πn=1, therefore, we have munotes.in

## Page 28

Image Processing L ab

28 Xk+N= ∑n=0N −1xn e−i2πkn/N=XkXk+N=∑n=0N −1xne −i2πkn/N=Xk

with a little extension, we can have

Xk+iN=Xk, for any integer iXk+iN=Xk, for any integer i

This means that within the DFT, we clearly have some symmetries that we

can use to reduce the computation.

Tricks in FFT

Since we know there are symmetr ies in the DFT, we can consider to use it

reduce the computation, because if we need to calculate both XkXk

and Xk+NXk+N, we only need to do this once. This is exactly the idea

behind the FFT. Cooley and Tukey showed that we can calculate DFT

more efficie ntly if we continue to divide the problem into smaller ones.

Let’s first divide the whole series into two parts, i.e. the even number part

and the odd number part:

Xk=== ∑n=0N −1xn e−i2πkn/N∑m=0N/2 −1x2m e−i2πk(2m)/N+∑m=0

N/2−1x2m+1 e−i2πk(2m+1)/N∑m=0N/2 −1x2m e−i2πkm/(N/2)+e −i2π

k/N∑m=0N/2 −1x2m+1 e−i2πkm/(N/2)Xk=∑n=0N −1xn e−i2πkn/N=∑

m=0N/2 −1x2m e−i2πk(2m)/N+∑m=0N/2 −1x2m+1 e−i2πk(2m+1)/N=

∑m=0N/2 −1x2m e−i2πkm/(N/2)+e −i2πk/N∑m=0N /2−1x2m+1 e−i2πk

m/(N/2)

We can see that, the two smaller terms which only have half of the size

(N2N2) in the above equation are two smaller DFTs. For each term,

the 0≤m≤N20≤m≤N2, but 0≤k≤N0≤k≤N, therefore, we can see that half of

the values will be the sa me due to the symmetry properties we described

above. Thus, we only need to calculate half of the fields in each term. Of

course, we don’t need to stop here, we can continue to divide each term

into half with the even and odd values until it reaches the la st two

numbers, then calculation will be really simple.

This is how FFT works using this recursive approach. Let’s see a quick

and dirty implementation of the FFT. Note that, the input signal to FFT

should have a length of power of 2. If the length is not, usually we need to

fill up zeros to the next power of 2 size.

import matplotlib.pyplot as plt

import numpy as np

plt.style.use('seaborn -poster')

%matplotlib inline

def FFT(x):

"""

munotes.in

## Page 29

Discrete Fourier Transform

29 A recursive implementation of

the 1D Cooley -Tukey FFT, the

input should have a length of

power of 2.

"""

N = len(x)

if N == 1:

return x

else:

X_even = FFT(x[::2])

X_odd = FFT(x[1::2])

factor = \

np.exp( -2j*np.pi*np.arange(N)/ N)

X = np.concatenate( \

[X_even+factor[:int(N/2)]*X_odd,

X_even+factor[int(N/2):]*X_odd])

return X

# sampling rate

sr = 128

# sampling interval

ts = 1.0/sr

t = np.arange(0,1,ts)

freq = 1.

x = 3*np.sin(2*np.pi* freq*t)

freq = 4

x += np.sin(2*np.pi*freq*t)

freq = 7

x += 0.5* np.sin(2*np.pi*freq*t)

plt.figure(figsize = (8, 6))

plt.plot(t, x, 'r')

plt.ylabel('Amplitude')

plt.show() munotes.in

## Page 30

Image Processing L ab

30 TRY IT! Use the FFT function to calculate the Fourier transform of the

above signa l. Plot the amplitude spectrum for both the two -sided and one -

side frequencies.

X=FFT(x)

# calculate the frequency

N = len(X)

n = np.arange(N)

T = N/sr

freq = n/T

plt.figure(figsize = (12, 6))

plt.subplot(121)

plt.stem(freq, abs(X), 'b', \

marker fmt=" ", basefmt=" -b")

plt.xlabel('Freq (Hz)')

plt.ylabel('FFT Amplitude |X(freq)|')

# Get the one -sided specturm

n_oneside = N//2

# get the one side frequency

f_oneside = freq[:n_oneside]

# normalize the amplitude

X_oneside =X[:n_oneside]/n_oneside

plt.su bplot(122)

plt.stem(f_oneside, abs(X_oneside), 'b', \

markerfmt=" ", basefmt=" -b")

plt.xlabel('Freq (Hz)')

plt.ylabel('Normalized FFT Amplitude |X(freq)|')

plt.tight_layout()

plt.show()

munotes.in

## Page 31

Discrete Fourier Transform

31 TRY IT! Generate a simple signal for length 2048, and time how long it

will run the FFT and compare the speed with the DFT.

def gen_sig(sr):

'''

function to generate

a simple 1D signal with

different sampling rate

'''

ts = 1.0/sr

t = np.arange(0,1,ts)

freq = 1.

x = 3*np.sin(2*np .pi*freq*t)

return x

# sampling rate =2048

sr = 2048

%timeit FFT(gen_sig(sr))

16.9 ms ± 1.3 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

We can see that, for a signal with length 2048 (about 2000), this

implementation of F

Example 1:

# impo rt sympy

from sympy import ifft

# sequence

seq = [15, 21, 13, 44]

# fft

transform = ifft(seq)

print ("Inverse FFT : ", transform)

Output:

Inverse FFT : [93/4, 1/2 + 23*I/4, -37/4, 1/2 - 23*I/4]

munotes.in

## Page 32

Image Processing L ab

32 Example 2:

# import sympy

from sympy import ifft

# sequence

seq = [15, 21, 13, 44]

decimal_point = 4

# fft

transform = ifft(seq, decimal_point )

print ("Inverse FFT : ", transform)

Output:

Inverse FFT : [23.25, 0.5 + 5.75*I, -9.250, 0.5 - 5.75*I]

munotes.in

## Page 33

33 Experiment III

3

DISCRETE COSINE TRANSFORM

Aim: To find DCT forward and inverse transform of image.

Theory:

DCT: Discrete cosine transform.

IDCT: Inverse discrete cosine transform.

The DCT (Discrete Cosine Transform)

An explanation and illustration of the math behind the Discrete Cosine

Transform and the concepts used in lossy JPEG image compression - low

pass filtering.

In [23]:

# imports

import numpy as np

from numpy import *

import matplotlib.pyplot as plt

from matplotlib.pyplot import *

import matplotl ib.image as mpimg

%matplotlib inline

# software versions:

# python 3.6, numpy 1.15, matplotlib 3.0.2, Pillow 5.4.1 (python imaging

library)

The basic linear algebra with N = 2

You can think of a vector - a list of numbers - as coefficients times basis

vectors.

f0[10]+f1[01]f0[10]+f1[01]

Using a different basis, different coefficients can describe the same vector.

G012 –√[11]+G112 –√[1−1]G012[11]+G112[1−1]

(The sqrt(2)'s give the basis vectors length 1, i.e. "normalizes" them.) munotes.in

## Page 34

Image Processing Lab

34 This transormation f to G is a DCT (Discrete Cosine Transform). For a

vector with 2 components, this perhaps isn't all that exciting, but does stil l

transform the original (f0,f1)(f0,f1) into low and high frequency

components (G0,G1)(G0,G1).

the matrix math

This transform can be written as a matrix multiplication.

f0[10]+f1[01]=[f0f1]=G012 –√[11]+G112 –√[1−1]=12 –

√[111 −1][G0G1]f0[10]+f1[01]=[f0f1]=G012[ 11]+G112[1 −1]=12[111−1][

G0G1]

Moreover, this orthnormal matrix has the interesting and useful property

that its transpose is its inverse. That makes the equation easy to invert.

two dimensions

The same idea can be applied to 2D images rather than 1D vector s, by

applying the 1D transform to each row and column of the image.

The 2D basis images for N=2 are then the outer products of the 1D basis

vectors. From lowest (0,0) to highest (1,1) spatial frequency these basis

images are :

In [2]:

basis = (1/sqrt(2) * array([1, 1]), 1/sqrt(2) * array([1, -1]))

for i in [0,1]:

for j in [0,1]:

print(" {}, {} :".format(i,j))

print(outer(basis[i], basis[j]))

print()

0, 0 :

[[0.5 0.5]

[0.5 0.5]]

0, 1 :

[[ 0.5 -0.5]

[ 0.5 -0.5]]

1, 0 :

[[ 0.5 0. 5] munotes.in

## Page 35

Discrete cosine transform

35 [-0.5 -0.5]]

1, 1 :

[[ 0.5 -0.5]

[-0.5 0.5]]

In [3]:

# The 8 x 8 DCT matrix thus looks like this.

N = 8

dct = np.zeros((N, N))

for x in range(N):

dct[0,x] = sqrt(2.0/N) / sqrt(2.0)

for u in range(1,N):

for x in range(N):

dct[u,x] = sq rt(2.0/N) * cos((pi/N) * u * (x + 0.5) )

np.set_printoptions(precision=3)

dct

Out[3]:

array([[ 0.354, 0.354, 0.354, 0.354, 0.354, 0.354, 0.354, 0.354],

[ 0.49 , 0.416, 0.278, 0.098, -0.098, -0.278, -0.416, -0.49 ],

[ 0.462, 0.191, -0.191, -0.462, -0.462, -0.191, 0.191, 0.462],

[ 0.416, -0.098, -0.49 , -0.278, 0.278, 0.49 , 0.098, -0.416],

[ 0.354, -0.354, -0.354, 0.354, 0.354, -0.354, -0.354, 0.354],

[ 0.278, -0.49 , 0.098, 0.416, -0.416, -0.098, 0.49 , -0.278],

[ 0.191, -0.462, 0.462, -0.191, -0.191, 0.462, -0.462, 0.191],

[ 0.098, -0.278, 0.416, -0.49 , 0.49 , -0.416, 0.278, -0.098]])

The corresponding eight 1D basis vectors (the matrix rows) oscillate with

successively hi gher spatial frequencies.

In [4]:

# Here's what they look like.

figure(figsize=(9,12))

for u in range(N): munotes.in

## Page 36

Image Processing Lab

36 subplot(4, 2, u+1)

ylim(( -1, 1))

title(str(u))

plot(dct[u, :])

plot(dct[u, :],'ro')

Lik

e the N=2 case, the vectors are orthnormal . In other words, their dot

products are 0, and each has length 1. Here are a few illustrative products.

munotes.in

## Page 37

Discrete cosine transform

37 In [5]:

def rowdot(i,j):

return dot(dct[i, :], dct[j, :])

rowdot(0,0), rowdot(3,3), rowdot(0,3), rowdot(1, 7), rowdot(1,5)

Out[5]:

(0.99999999999 99998,

0.9999999999999998,

6.938893903907228e -17,

1.942890293094024e -16,

-2.498001805406602e -16)

This also implies the inverse of this matrix is just its transpose.

In [6]:

dct_transpose = dct.transpose()

dct_transpose

Out[6]:

array([[ 0.354, 0.49 , 0.462, 0.416, 0.354, 0.278, 0.191, 0.098],

[ 0.354, 0.416, 0.191, -0.098, -0.354, -0.49 , -0.462, -0.278],

[ 0.354, 0.278, -0.191, -0.49 , -0.354, 0.098, 0.462, 0.416],

[ 0.354, 0.098, -0.462, -0.278, 0.354, 0.416, -0.191 , -0.49 ],

[ 0.354, -0.098, -0.462, 0.278, 0.354, -0.416, -0.191, 0.49 ],

[ 0.354, -0.278, -0.191, 0.49 , -0.354, -0.098, 0.462, -0.416],

[ 0.354, -0.416, 0.191, 0.098, -0.354, 0.49 , -0.462, 0.278],

[ 0.354, -0.49 , 0 .462, -0.416, 0.354, -0.278, 0.191, -0.098]])

In [7]:

# Is the dot product of dct and its transpose the identity?

maybe_identity = dot(dct, dct_transpose)

# Since there are many nearly zero like 3.2334e -17 in this numerical

result,

# the output will look much nicer if we round them all of to (say) 6 places.

roundoff = vectorize( lambda m: round(m, 6)) munotes.in

## Page 38

Image Processing Lab

38 roundoff(maybe_identity)

Out[7]:

array([[ 1., 0., -0., 0., 0., 0., -0., -0.],

[ 0., 1., 0., -0., 0., -0., 0., 0.],

[ -0., 0., 1., 0. , -0., 0., 0., 0.],

[ 0., -0., 0., 1., 0., 0., -0., 0.],

[ 0., 0., -0., 0., 1., 0., -0., -0.],

[ 0., -0., 0., 0., 0., 1., 0., -0.],

[ -0., 0., 0., -0., -0., 0., 1., 0.],

[ -0., 0., 0., 0., -0., -0., 0., 1.]])

playing with a real image

To make all this more concrete, let's apply the 2D DCT transform to part

of a real image.

Here's one, takenly randomly from the web.

In [10]:

# See http://matplotlib.org/users/image_tutorial.html for the image

manipul ation syntax.

# The image itself is a small piece from http://www.cordwainer -

smith.com/virgil_finlay.htm.

img = mpimg.imread('stormplanet112.jpg')

plt.imshow(img)

Out[10]:

munotes.in

## Page 39

Discrete cosine transform

39 In [11]:

# The image itself contain s 3 dimensions: rows, columns, and colors

img.shape

Out[11]:

(112, 112, 3)

All three of the R,G,B color values in the greyscale image are the same for

each pixel.

Let's just look at values from one tiny 8 x 8 block (which is what's used

JPEG compression) n ear his nose.

(The next images use a false color spectrum to display pixel intensity.)

In [12]:

tiny = img[40:48, 40:48, 0] # a tiny 8 x 8 block, in the color=0 (Red)

channel

def show_image(img):

plt.imshow(img)

plt.colorbar()

show_image(tiny)

In [13]:

# And here are the numbers. munotes.in

## Page 40

Image Processing Lab

40 tiny

Out[13]:

array([[179, 140, 138, 101, 110, 135, 143, 144],

[ 76, 64, 91, 110, 113, 109, 104, 118],

[ 78, 68, 40, 34, 33, 66, 90, 105],

[209, 204, 168, 163, 132, 100, 73, 57],

[ 219, 231, 221, 227, 226, 205, 172, 130],

[215, 213, 217, 223, 232, 224, 217, 203],

[181, 202, 233, 214, 207, 226, 235, 235],

[ 69, 44, 62, 66, 83, 129, 153, 182]], dtype=uint8)

Now we define the 2D version of the N=8 DCT described above.

The trick is to apply the 1D DCT to every column, and then also apply it

to every row, i.e.

G=DCT ⋅f⋅DCTTG=DCT ⋅f⋅DCTT

In [14]:

def doDCT(grid):

return dot(dot(dct, grid), dct_transpose)

def undoDCT(grid):

return dot(dot(dct_transpose, grid), dct)

# test : do DCT, then undo DCT; should get back the same image.

tiny_do_undo = undoDCT(doDCT(tiny))

show_image(tiny_do_undo) # Yup, looks the same. munotes.in

## Page 41

Discrete cosine transform

41

In [15]:

# And the numbers are the same.

tiny_do_undo

Out[15]:

array([[179., 140., 138., 101., 110., 135., 143., 144.],

[ 76., 64., 91., 110., 113., 109., 104., 118.],

[ 78., 68., 40., 34., 33., 66., 90., 105.],

[209., 204., 168., 163., 132., 100., 73., 57.],

[219., 231., 221., 227., 226., 205., 172., 130.],

[2 15., 213., 217., 223., 232., 224., 217., 203.],

[181., 202., 233., 214., 207., 226., 235., 235.],

[ 69., 44., 62., 66., 83., 129., 153., 182.]])

The DCT transform looks like this. Note that most of the intensity is at the

top left, in the lowest frequencies.

In [16]:

tinyDCT = doDCT(tiny)

show_image(tinyDCT) munotes.in

## Page 42

Image Processing Lab

42

In [17]:

set_printoptions(linewidth=100) # output line width (default is 75)

round6 = vectorize( lambda m: '{:6.1f} '.format(m))

round6(tinyDCT)

Out[17]:

array([['1173.9', ' 3.6', ' 19.8', ' 12.3', ' -5.4', ' 8.2', ' 10.3', ' -0.0'],

[' -225.9', ' 64.1', ' 24.2', ' 12.2', ' 9.9', ' -0.2', ' 0.0', ' 0.1'],

[' -122.7', ' -161.8', ' 63.2', ' -15.0', ' 0.3', ' 11.1', ' 28.5', ' 10.7'],

[' 341.9', ' 50.8', ' -48.4', ' 12.0', ' -10.2', ' -0.4', ' 0.1', ' 12.1'],

[' -20.1', ' 80.2', ' 6.9', ' 22.1', ' 0.1', ' -0.1', ' -0.0', ' -0.3'],

[' 74.4', ' 69.9', ' 32.9', ' -13.0', ' -16.3', ' -0.4', ' -0.2', ' -0.0'],

[ '-100.6', ' -38.9', ' 64.3', ' 17.2', ' -0.3', ' 0.5', ' -0.2', ' -0.1'],

[' 13.8', ' -36.5', ' 18.5', ' -0.4', ' -21.6', ' 0.1', ' 0.3', ' 0.2']],

dtype='

## Page 43

Discrete cosine transform

43

The grid positions in that last image correspond to spatial freq uencies,

with the lowest DC component at the top left, and the highest vertical and

horizontal frequency at the bottom right.

These 2D basis functions can be visualized with the image shown which is

from wikimedia commons .

The details of what I'm doing here don't really match the JPEG

transformations: I haven't done the color space transforms, and I haven't

handled the DC offsets as the JPEG spec does (which centers the values

around 0 explicit ly.)

But the concept is visible in the last two pictures: after the DCT, most of

the power is in fewer pixels, which are typically near the top left DC part

of the grid.

So here's a simple lossy "low pass filter" of the data : let's chop some of

the high f requency numbers. One (somewhat arbitrary) choice to to set the

frequencies higher than the (1,7) to (7,1) line, to zero.

This is a lossy transormation since we're throwing away information - it

can't be undone. But since there are fewer numbers, it's a fo rm of

compression.

munotes.in

## Page 44

Image Processing Lab

44 In [19]:

# First make a copy to work on.

tinyDCT_chopped = tinyDCT.copy()

# Then zero the pieces below the x + y = 8 line.

for x in range(N):

for u in range(N):

if x + u > 8:

tinyDCT_chopped[x,u] = 0.0

show_image( tinyDCT_chopped)

In [20]:

round6(tinyDCT_chopped)

# Notice all the zeros at the bottom right - those are the chopped high

frequences.

# We've essentially done a "low pass filter" on the spacial frequencies.

Out[20]:

array([['1173.9', ' 3.6', ' 19.8', ' 12.3', ' -5.4', ' 8.2', ' 10.3', ' -0.0'],

[' -225.9', ' 64.1', ' 24.2', ' 12.2', ' 9.9', ' -0.2', ' 0.0', ' 0.1'],

[' -122.7', ' -161.8', ' 63.2', ' -15.0', ' 0.3', ' 11.1', ' 28.5', ' 0.0'],

[' 341.9', ' 50.8', ' -48.4', ' 12.0', ' -10.2', ' -0.4', ' 0.0', ' 0.0'], munotes.in

## Page 45

Discrete cosine transform

45 [' -20.1', ' 80.2', ' 6.9', ' 22.1', ' 0.1', ' 0.0', ' 0.0', ' 0.0'],

[' 74.4', ' 69.9', ' 32.9', ' -13.0', ' 0.0', ' 0.0', ' 0.0', ' 0.0'],

[' -100.6' , ' -38.9', ' 64.3', ' 0.0', ' 0.0', ' 0.0', ' 0.0', ' 0.0'],

[' 13.8', ' -36.5', ' 0.0', ' 0.0', ' 0.0', ' 0.0', ' 0.0', ' 0.0']],

dtype='To see what this did to the original, we just transform it back.

In [21]:

tiny_chopped_float = undoDCT(tinyDCT_chopped)

# Also convert the floats back to uint8, which was the original format

tiny_chopped = vectorize( lambda x: uint8(x))(tiny_chopped_float)

show_image(tiny_chopped)

In [22]:

tiny_chopped

Out[22]:

array([[178, 14 0, 133, 109, 107, 135, 137, 147],

[ 76, 69, 90, 100, 107, 117, 110, 112],

[ 75, 61, 44, 39, 42, 56, 86, 107],

[214, 204, 169, 152, 131, 97, 78, 57], munotes.in

## Page 46

Image Processing Lab

46 [217, 227, 220, 230, 233, 206, 169, 125],

[211, 220, 221, 219 , 220, 223, 220, 206],

[186, 196, 223, 220, 214, 227, 229, 234],

[ 66, 46, 65, 63, 79, 129, 155, 181]], dtype=uint8)

And we have something close to the original back again - even though

close to half of the transformed image was set to ze ro.

conclusions

The procedue here isn't what happens in JPEG compression, but does

illustrate one of the central concepts - throwing away some of higher

spatial frequency information after a DCT transform.

In the real JPEG lossy compression algorithm, the steps are

● the color space is transformed from R,G,B to Y,Cb,Cr to take

advantage of human visual prejudices

● the values are shifted so that they center around zero

● the values after the DCT are "quantized" (i.e. rounded off) by different

amounts at different spots in the grid. (This* is the lossy step, and how

lossy depends on the JPEG quality.)

● a zigzag (keeping similar frequencies together) pattern turns this to a

1D stream of 64 values

● which are then huffman encoded by , typically by a pre -chosen code

(part of the JPEG standard

munotes.in

## Page 47

47 Experiment IV

4

IMAGE SEGMENTATION AND IMAGE

RESTORATION

Aim: The detection of discontinuities – Point, Line and Edge detections,

Hough transform, Thresholding, Region based segmentation chain codes.

Theory:

• This is usually accomplished by applying a suitable mask to the image.

• The mask output or response at each pixel is computed by centering

centering the mask on the pixel location.

• When the mask is centered at a point on the image boundary, the

mask response or output is computed using suitable boundary

condition. Usually, the mask is truncated.

Point Detection

This is used to detect isolated spots in an image.

• The graylevel of an isolated point will be very different from its

neighbors.

• It can be accomplished using the following 3×3 mask: munotes.in

## Page 48

Image Processing Lab

48

The output of the mask operation is usually thresholded.

• We say that an isolated point has been detected if

for some pre -specified non -negative threshold T.

Detection of lines

• This is used to detect lines in an image.

• It can be done using the following four masks:

• Let 0 R , 45 R , 90 R , and 135 R , respectively be the response to

masks 0 D , 45 D , 90 D , and 135 D , respectively. At a given pixel munotes.in

## Page 49

Image segmentation and Image

Restoration

49 (m,n), if 135 R is the maximum among { 0 R , 45 R , 90 R , 135 R },

we say that a 135 line is most likely passing through that pixel

Edge Detection

• Isolated points and thin lines do not occur frequently in most practical

applications.

• For image segmentation, we are mostly interested in detecting the

boundary between two regions with rela tively distinct gray -level

properties.

• We assume that the regions in question are sufficiently homogeneous so

that the transition between two regions can be determined on the basis of

gray-level discontinuities alone.

• An edge in an image may be defin ed as a discontinuity or abrupt change

in gray level. munotes.in

## Page 50

Image Processing Lab

50

• These are ideal situations that do not frequently occur in practice.

Also, in two dimensions edges may occur at any orientation.

• Edges may not be represented by perfect discontinuities. Therefore,

the task of edge detection is much more difficult than what it looks

like.

• A useful mathematical tool for developing edge detectors is the first

and second derivative operators.

munotes.in

## Page 51

Image segmentation and Image

Restoration

51

• From the example above, it is clear that the magnitude of the first

derivativ e can be used to detect the presence of an edge in an image.

• The sign of the second derivative can be used to determine whether

an edge pixel lies on the dark or light side of an edge.

• The zero crossings of the second derivative provide a powerful way

of locating edges in an image.

• We would like to have small -sized masks in order to detect fine

variation in graylevel distribution (i.e., micro -edges). • munotes.in

## Page 52

Image Processing Lab

52 • On the other hand, we would like to employ large -sized masks in

order to detect coarse variation in graylevel distribution (i.e., macro -

edges) and filter -out noise and other irregularities.

• We therefore need to find a mask size, which is a compromise

betw een these two opposing requirements, or determine edge content

by using different mask sizes

• Most common differentiation operator is the gradient.

• The magnitude of the gradient is:

• The direction of the gradient is given by:

• In practice, we use discrete approximations of the partial derivatives

∂f/∂x and ∂f /∂y , which are implemented using the masks:

• The gradient can then be computed as follows:

munotes.in

## Page 53

Image segmentation and Image

Restoration

53 • Other discrete approximations to the gradient (more precisely, the

appropriate partial deriv atives) have been proposed (Roberts,

Prewitt).

• Because derivatives enhance noise, the previous operators may not

give good results if the input image is very noisy.

• One way to combat the effect of noise is by applying a smoothing

mask. The Sobel edge det ector combines this smoothing operation

along with the derivative operation give the following masks:

Since the gradient edge detection methodology depends only on the

relative magnitudes within an image, scalar multiplication by factors such

as 1/2 or 1 /8 play no essential role. The same is true for the signs of the

mask entries. Therefore, masks like correspond to the same detector,

namely the Sobel edge detector.

• However, when the exact magnitude is important, the proper scalar

multiplication factor should be used.

• All masks considered so far have entries that add up to zero. This is

typical of any derivative mask.

Example:

munotes.in

## Page 54

Image Processing Lab

54 The code will only compile in linux environment. Make sure that openCV

is installed in your system before you run the program.

Steps to download the requirements below:

● Run the following command on your terminal to install it from the

Ubuntu or Debian repository.

● sudo apt -get install libopencv -dev python -opencv

● OR In order to download OpenCV from the official site run the

following command:

● bash install -opencv.sh

● on your terminal.

● Type your sudo password and you will have installed OpenCV.

Principle behind Edge Detection

Edge detection involves mathematical methods to find points in an image

where the brightness of pixel intensities changes distinctly.

● The first thing we are going to do is find the gradient of the grayscale

image, allowing us to find edge -like regions in the x and y direction.

The gradient is a multi -variable generalization of the derivative. W hile

a derivative can be defined on functions of a single variable, for

functions of several variables, the gradient takes its place.

● The gradient is a vector -valued function, as opposed to a derivative,

which is scalar -valued. Like the derivative, the gra dient represents

the slope of the tangent of the graph of the function . More

precisely, the gradient points in the direction of the greatest rate of

increase of the function, and its magnitude is the slope of the graph in

that direction.

Note: In computer vision, transitioning from black -to-white is considered

a positive slope, whereas a transition from white -to-black is a negative

slope.

munotes.in

## Page 55

Image segmentation and Image

Restoration

55 # Python program to Edge detection

# using OpenCV in Python

# using Sobel edge detection

# and laplacian method

impor t cv2

import numpy as np

#Capture livestream video content from camera 0

cap = cv2.VideoCapture(0)

while(1):

# Take each frame

_, frame = cap.read()

# Convert to HSV for simpler calculations

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Calculation of Sobelx

sobelx = cv2.Sobel(frame,cv2.CV_64F,1,0,ksize=5)

# Calculation of Sobely

sobely = cv2.Sobel(frame,cv2.CV_64F,0,1,ksize=5)

# Calculation of Laplacian

laplacian = cv2.Laplac ian(frame,cv2.CV_64F)

cv2.imshow('sobelx',sobelx)

cv2.imshow('sobely',sobely)

cv2.imshow('laplacian',laplacian)

k = cv2.waitKey(5) & 0xFF

if k == 27:

break

cv2.destroyAllWindows()

#release the frame

cap.release()

munotes.in

## Page 56

Image Processing Lab

56 Calculation of the derivative of an image

A digital image is represented by a matrix that stores the

RGB/BGR/HSV(whichever color space the image belongs to) value of

each pixel in rows and columns.

The derivative of a matrix is calculated by an operator called

the Laplacian . In order to calculate a Laplacian, you will need to calculate

first two derivatives, called derivatives of Sobel , each of which takes into

account the gradient variations in a certain direction: one horizontal, the

other vertical.

● Horizontal Sobel derivative (Sobel x) : It is obtained through the

convolution of the image with a matrix called kernel which has always

odd size. The kernel with size 3 is the simplest case.

● Vertical Sobel derivative (Sobel y) : It is obtained through the

convolution of the image with a matrix called kernel which has always

odd size. The kernel with size 3 is the simplest case.

● Convolution is calculated by the following method: Image represents

the original image matrix and filter is the kernel matrix.

● Factor = 11 – 2- 2- 2- 2- 2 = 3

Offset = 0

Weighted Sum = 124*0 + 19*( -2) + 110*( -2) + 53*11 + 44*( -2) +

19*0 + 60*( -2) + 100*0 = 117

O[4,2] = (117/3) + 0 = 39

So in the end to get the Laplacian (approximation) we will need to

combine the two previous resu lts (Sobelx and Sobely) and store it in

laplacian.

munotes.in

## Page 57

Image segmentation and Image

Restoration

57 Parameters:

● cv2.Sobel(): The function cv2.Sobel(frame,cv2.CV_64F,1,0,ksize=5)

can be written as cv2.Sobel (original_image, ddepth, xorder, yorder,

kernelsize)

● where the first parameter is the original image, the second parameter is

the depth of the destination image. When ddepth= -1/CV_64F, the

destination image will have the same depth as the source. The third

parameter is the order of the derivative x. The four th parameter is the

order of the derivative y. While calculating Sobelx we will set xorder

as 1 and yorder as 0 whereas while calculating Sobely, the case will be

reversed. The last parameter is the size of the extended Sobel kernel; it

must be 1, 3, 5, or 7.

● cv2.Laplacian : In the function

cv2.Laplacian(frame,cv2.CV_64F)

● the first parameter is the original image and the second parameter is

the depth of the destination image.When depth= -1/CV_64F, the

destination image will have the same depth as the source.

Edge Detection Applications

Reduce unnecessary information in an image while preserving the

structure of image.

● Extract important features of image like curves, corners and lines.

● Recognizes objects, boundaries and segmentation.

● Plays a major role in computer vision and recognition

Line detection in python with OpenCV | Houghline method

The Hough Transform is a method that is used in image processing to

detect any shape, if that shape can be represented in mathematical form. It

can detect the shape eve n if it is broken or distorted a little bit.

We will see how Hough transform works for line detection using the

HoughLine transform method. To apply the Houghline method, first an

edge detection of the specific image is desirable. For the edge detection

technique go through the article Edge detection

Basics of Houghline Method

A line can be represented as y = mx + c or in parametric form, as r = xcosθ

+ ysinθ where r is the perpendicular distance from origin to the line, and θ

is the angle formed by this perpendicular line and horizontal axis

measured in counter -clockwise ( That direction varies on how you

represent the coordinate system. This representation is used in OpenCV).

munotes.in

## Page 58

Image Processing Lab

58

So Any line can be represented in these two terms, (r, θ).

Working of Houghline method:

● First it creates a 2D array or accumulator (to hold values of two

parameters) and it is set to zero ini tially.

● Let rows denote the r and columns denote the (θ)theta.

● Size of array depends on the accuracy you need. Suppose you want the

accuracy of angles to be 1 degree, you need 180 columns(Maximum

degree for a straight line is 180).

● For r, the maximum dista nce possible is the diagonal length of the

image. So taking one pixel accuracy, number of rows can be diagonal

length of the image.

Example:

Consider a 100×100 image with a horizontal line at the middle. Take the

first point of the line. You know its (x,y) values. Now in the line equation,

put the values θ(theta) = 0,1,2,….,180 and check the r you get. For every

(r, 0) pair, you increment value by one in the accumulator in its

corresponding (r,0) cells. So now in accumulator, the cell (50,90) = 1

along with some other cells.

Now take the second point on the line. Do the same as above. Increment

the values in the cells corresponding to (r,0) you got. This time, the cell

(50,90) = 2. We are actually voting the (r,0) values. You continue this

process for every point on the line. At each point, the cell (50,90) will be

incremented or voted up, while other cells may or may not be voted up.

This way, at the end, the cell (50,90) will have maximum votes. So if you

search the accumulator for maximum votes, you get the value (50,90)

which says, there is a line in this image at distance 50 from origin and at

angle 90 degrees.

munotes.in

## Page 59

Image segmentation and Image

Restoration

59

Everything explained above is encapsulated in the OpenCV function,

cv2.HoughLines(). It simply returns an array of (r, 0) values. r i s measured

in pixels and 0 is measured in radians.

# Python program to illustrate HoughLine

# method for line detection

import cv2

import numpy as np

# Reading the required image in

# which operations are to be done.

# Make sure that the image is in the same

# directory in which this python program is

img = cv2.imread('image.jpg')

# Convert the img to grayscale

gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

# Apply edge detection method on the image

edges = cv2.Canny(gray,50,150,apertureSize = 3)

# This returns an array of r and theta values

lines = cv2.HoughLines(edges,1,np.pi/180, 200) munotes.in

## Page 60

Image Processing Lab

60 # The below for loop runs till r and theta values

# are in the range of the 2d array

for r_theta in lines[0]:

r,theta = r_theta[0]

# Stores the value of cos(theta) in a

a = np.cos(theta)

# Stores the value of sin(theta) in b

b = np.sin(theta)

# x0 stores the value rcos(theta)

x0 = a*r

# y0 stores the value rsin(theta)

y0 = b*r

# x1 stores the rounded off valu e of (rcos(theta) -1000sin(theta))

x1 = int(x0 + 1000*( -b))

# y1 stores the rounded off value of (rsin(theta)+1000cos(theta))

y1 = int(y0 + 1000*(a))

# x2 stores the rounded off value of (rcos(theta)+1000sin(theta))

x2 = int(x0 - 1000*( -b))

# y2 stores the rounded off value of (rsin(theta) -1000cos(theta))

y2 = int(y0 - 1000*(a))

# cv2.line draws a line in img from the point(x1,y1) to (x2,y2).

# (0,0,255) denotes the colour of the line to be

#drawn. In this case, it is red.

cv2.line(img,(x1,y1), (x2,y2), (0,0,255),2)

# All the changes made in the input image are finally

# written on a new image houghlines.jpg

cv2.imwrite('linesDetected.jpg', img) munotes.in

## Page 61

Image segmentation and Image

Restoration

61 Elaboration of function(cv2.HoughLines (edges,1,np.pi/180, 200)):

1. First parameter, Input image should be a binary image, so apply

threshold edge detection before finding applying hough transform.

2. Second and third parameters are r and θ(theta) accuracies respectively.

3. Fourth argument is the threshold, which means minimum vote it

should get for it to be considered as a line.

4. Remember, number of votes depend upon number of points on the

line. So it represents the minimum length of line that should be

detected.

munotes.in

## Page 62

Image Processing Lab

62 Alternate simpler method for directly extracting points:

Python3 import cv2

import numpy as np

# Read image

image = cv2.imread('path/to/image.png')

# Convert image to grayscale

gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)

# Use canny edge detection

edges = cv2.Canny(gray,50,150,apertureSize=3)

# Apply HoughLinesP method to

# to directly obtain line end points

lines = cv2.HoughLinesP(

edges, # Input edge image

1, # Distance resolution in pixels

np.pi/180, # Angle resolu tion in radians

threshold=100, # Min number of votes for valid line

minLineLength=5, # Min allowed length of line

maxLineGap=10 # Max allowed gap between line for joining them

)

# Iterate over points

for poi nts in lines:

# Extracted points nested in the list

x1,y1,x2,y2=points[0]

# Draw the lines joing the points

# On the original image

cv2.line(image,(x1,y1),(x2,y2),(0,255,0),2)

# Maintain a simples lookup list for points

lines_ list.append([(x1,y1),(x2,y2)])

# Save the result image

cv2.imwrite('detectedLines.png',image)

munotes.in

## Page 63

Image segmentation and Image

Restoration

63 Summarizing the process

In an image analysis context, the coordinates of the point(s) of edge

segments (i.e. X,Y ) in the image are known and therefore serve as

constants in the parametric line equation, while R(rho) and Theta(θ) are

the unknown variables we seek.

● If we plot the possible (r) values defined by each (theta), points in

cartesian image space map to curves (i.e. sinusoids) in the polar Hough

parameter space. This point -to-curve transformation is the Hough

transformation for straight lines.

● The transform is implemented by quantizing the Hough parameter

space into finite intervals or accumulator cells. As the algorithm runs,

each (X,Y) is transf ormed into a discretized (r,0) curve and the

accumulator(2D array) cells which lie along this curve are

incremented.

● Resulting peaks in the accumulator array represent strong evidence

that a corresponding straight line exists in the image.

Applications of Hough transform:

1. It is used to isolate features of a particular shape within an image.

2. Tolerant of gaps in feature boundary descriptions and is relatively

unaffected by image noise.

3. Used extensively in barcode scanning, verification and recognition

Thresh olding techniques using OpenCV

Thresholding is a technique in OpenCV, which is the assignment of pixel

values in relation to the threshold value provided. In thresholding, each

pixel value is compared with the threshold value. If the pixel value is

smalle r than the threshold, it is set to 0, otherwise, it is set to a maximum

value (generally 255). Thresholding is a very popular segmentation

technique, used for separating an object considered as a foreground from

its background. A threshold is a value which has two regions on its either

side i.e. below the threshold or above the threshold.

In Computer Vision, this technique of thresholding is done on grayscale

images. So initially, the image has to be converted in grayscale color

space.

If f (x, y) < T

then f (x, y) = 0

else

f (x, y) = 255 munotes.in

## Page 64

Image Processing Lab

64 where

f (x, y) = Coordinate Pixel Value

T = Threshold Value.

In Open CV with Python, the function cv2.threshold is used for

thresholding.

Syntax: cv2.threshold(source, threshold Value, max Val, thresholding

Technique)

Parameters:

-> source : Input Image array (must be in Grayscale).

-> thresholdValue : Value of Threshold below and above which pixel

values will change accordingly.

-> maxVal : Maximum value that can be assigned to a pixel.

-> thresholdingTechnique : The type of thresholding to be applied.

Simple Thresholding

The basic Thresholding technique is Binary Thresholding. For every pixel,

the same threshold value is applied. If the pixel value is smaller than the

threshold, it is set to 0, otherwise, it is set to a maximum value.

The different Simple Thresholding Techniques are:

● cv2.THRESH_BINARY : If pixel intensity is greater than the set

threshold, value set to 255, else set to 0 (black).

● cv2.THRESH_BINARY_INV : Inverted or Opp osite case of

cv2.THRESH_BINARY.

● cv.THRESH_TRUNC : If pixel intensity value is greater than

threshold, it is truncated to the threshold. The pixel values are set to be

the same as the threshold. All other values remain the same.

● cv.THRESH_TOZERO : Pixel inte nsity is set to 0, for all the pixels

intensity, less than the threshold value.

● cv.THRESH_TOZERO_INV : Inverted or Opposite case of

cv2.THRESH_TOZERO.

munotes.in

## Page 65

Image segmentation and Image

Restoration

65

Below is the Python code explaining different Simple Thresholding

Techniques –

● Python3

● # Python program to illustrate

● # simple thresholding type on an image

●

● # organizing imports

● import cv2

● import numpy as np

●

● # path to input image is specified and

● # image is loaded with imread command

● image1 = cv2.imread('input1.jpg')

●

● # cv2.cvtColor is appl ied over the

● # image input with applied parameters

● # to convert the image in grayscale

● img = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)

munotes.in

## Page 66

Image Processing Lab

66

# applying different thresholding

# techniques on the input image

# all pixels value above 120 will

# be set to 255

ret, thresh1 = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY)

ret, thresh2 = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)

ret, thresh3 = cv2.threshold(img, 120, 255, cv2.THRESH_TRUNC)

ret, thresh4 = cv2.threshold(img, 120, 255, cv2.THRESH_TOZERO)

ret, thresh5 = cv2.threshold(img, 120, 255, cv2.THRESH_TOZERO_INV)

# the window showing output images

# with the corresponding thresholding

# techniques applied to the input images

cv2.imshow('Binary Threshold', thresh1)

cv2.imshow('Binary Threshold Inver ted', thresh2)

cv2.imshow('Truncated Threshold', thresh3)

cv2.imshow('Set to 0', thresh4)

cv2.imshow('Set to 0 Inverted', thresh5)

# De -allocate any associated memory usage

if cv2.waitKey(0) & 0xff == 27:

cv2.destroyAllWindows()

Region and Edge Based Segmentation

Segmentation

Segmentation is the separation of one or more regions or objects in an

image based on a discontinuity or a similarity criterion. A region in an

image can be defined by its border (edge) or its interior, and the two

represent ations are equal. There are prominently three methods of

performing segmentation:

● Pixel Based Segmentation

● Region -Based Segmentation

● Edges based segmentation munotes.in

## Page 67

Image segmentation and Image

Restoration

67 Edges based segmentation

Edge -based segmentation contains 2 steps:

● Edge Detection: In edge detection, we need to find the pixels that are

edge pixels of an object. There are many object detection methods

such as Sobel operator, Laplace operator, Canny, etc.

1 0 -1

2 0 -2

1 0 -1

Sobel vertical

Operator

+1 2 1

0 0 0

-1 -2 -1

Sobel Horizontal

Operator

0 -1 0

-1 4 -1

0 -1 0

Negative Laplace

Operator

● Edge Linking: In this step, we try to refine the edge detection by

linking the adjacent edges and combine to form the whole object. The

edge linking can be performed using any of the two methods below:

● Local Processing: In this method, we used gradient and direction

to link the neighborhood edges. If two edges have a similar

direction vector then they can be linked.

● Global processing: This method can be done using HOG

transformation

munotes.in

## Page 68

Image Processing Lab

68

● Pros :

● This approach is similar to how the humans brain approaches the

segmentation task.

● Works well in images with good contrast between object and

background.

● Limitations:

● Does not work well on images with smooth transitions and low

contrast.

● Sensitive t o noise.

● Robust edge linking is not trivial and easy to perform.

Region -Based Segmentation

In this segmentation, we grow regions by recursively including the

neighboring pixels that are similar and connected to the seed pixel. We use

similarity measures such as differences in gray levels for regions with

homogeneous gray levels. We use connectivity to prevent connecting

different parts of the image.

There are two variants of region -based segmentation:

● Top-down approach

● First, we need to define the predef ined seed pixel. Either we can

define all pixels as seed pixels or randomly chosen pixels. Grow

regions until all pixels in the image belongs to the region.

● Bottom -Up approach

● Select seed only from objects of interest. Grow regions only if the

similarity c riterion is fulfilled.

● Similarity Measures:

● Similarity measures can be of different types: For the grayscale

image the similarity measure can be the different textures and

other spatial properties, intensity difference within a region or the

distance b/w m ean value of the region.

● Region merging techniques:

● In the region merging technique, we try to combine the regions

that contain the single object and separate it from the

background.. There are many regions merging techniques such as

Watershed algorithm, S plit and merge algorithm, etc .

munotes.in

## Page 69

Image segmentation and Image

Restoration

69

● Pros:

● Since it performs simple threshold calculation, it is faster to

perform.

● Region -based segmentation works better when the object and

background have high contrast.

● Limitations:

● It did not produce many accurate segmentation results when there

are no significant differences b/w pixel values of the object and

the background.

Implementation:

● In this implementation, we will be performing edge and region -based

segmentation. We will be using scikit image module for that and an

image from its dataset provided.

● # code

● import numpy as np

● import matplotlib.pyplot as plt

● from skimage.feature import canny

● from skimage import data,morphology

● from skimage.color import rgb2gray

● import scipy.ndimage as nd

● plt.rcParams["figure. figsize"] = (12,8)

● %matplotlib inline

● # load images and convert grayscale

● rocket = data.rocket()

● rocket_wh = rgb2gray(rocket)

● # apply edge segmentation

● # plot canny edge detection

● edges = canny(rocket_wh)

● plt.imshow(edges, interpolation='gaussian')

● plt.title('Canny detector')

● # fill regions to perform edge segmentation

● fill_im = nd.binary_fill_holes(edges)

● plt.imshow(fill_im)

● plt.title('Region Filling') munotes.in

## Page 70

Image Processing Lab

70 ● # Region Segmentation

● # First we print the elevation map

● elevation_map = sobel(rocket_wh)

● plt.ims how(elevation_map)

# Since, the contrast difference is not much. Anyways we will perform it

markers = np.zeros_like(rocket_wh)

markers[rocket_wh < 0.1171875] = 1 # 30/255

markers[rocket_wh > 0.5859375] = 2 # 150/255

plt.imshow(markers)

plt.title('marke rs')

# Perform watershed region segmentation

segmentation = morphology.watershed(elevation_map, markers)

plt.imshow(segmentation)

plt.title('Watershed segmentation')

# plot overlays and contour

segmentation = nd.binary_fill_holes(segmentation - 1)

label_rock, _ = nd.label(segmentation)

# overlay image with different labels

image_label_overlay = label2rgb(label_rock, image=rocket_wh)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 16), sharey=True)

ax1.imshow(rocket_wh)

ax1.contour(segmentation, [0.8], linewidths=1.8, colors='w')

ax2.imshow(image_label_overlay)

fig.subplots_adjust(**margins)

munotes.in

## Page 71

Image segmentation and Image

Restoration

71

Output:

munotes.in

## Page 72

Image Processing Lab

72

Elevation

maps

munotes.in

## Page 73

Image segmentation and Image

Restoration

73

munotes.in

## Page 74

Image Processing Lab

74

munotes.in

## Page 75

75 Experiment V

5

IMAGE DATA COMPRESSION

Aim: Fundamentals of compression, Basic compression methods.

Theory:

In the field of Image processing, the compression of images is an

important step before we start the processing of larger images or videos.

The compression of images is carried out by an encoder and output a

compressed form of an image. In the processes of compression, the

mathematical transforms play a vital role. A flow chart of the process of

the compression of the image can be represented as:

The general representation of the image in a computer is like a vector of

pixels. Each pixel is represented by a fixed number of bits. These bits

determine the intensity of the color (on grayscale if a black and white

image and has three channels of RGB if colored images.)

Why Do We Need Image Compression?

Consider a black and white image that has a resolution of 1000*1000 and

each pixel uses 8 bits to represent the intensity. So the total no of bits req=

1000*1000*8 = 80,00,000 bits per image. And consider if it is a video

with 30 frames per second of the above -mentioned type images then

the total bits for a video of 3 secs is: 3*(30*(8, 000, 000))=720, 000, 000

bits

As we see just to store a 3 -sec video we need so many bits which is very

huge. So, we need a way to have proper representation as well to store the

information about the image in a minimum no of bits without losing the

character of the image. Thus, image compression plays an important role.

Basic steps in image compression:

● Applying t he image transform

● Quantization of the levels

● Encoding the sequences.

● Transforming The Image munotes.in

## Page 76

Image Processing Lab

76 What is a transformation(Mathematically)?

It a function that maps from one domain(vector space) to another

domain(other vector space). Assume, T is a transform , f(t):X ->X’ is a

function then, T(f(t)) is called the transform of the function.

We generally carry out the transformation of the function from one vector

space to the other because when we do that in the newly projected vector

space we infer more information about the function.

A real life example of a transform:

Here we can say that the prism is a transformation function in which it

splits the white light (f(t)) into its components i.e the representation of the

white light.

And we observe that we can infer more information about the light in its

component representation than the white light one. This is how transforms

help in understanding the functions in an efficient manner.

Transforms in Image Processing

The image is also a function of the location of the pixels. i.e I(x, y) where

(x, y) are the coordinates of the pixel in the image. So, we generally

transform an image from the spatial domain to the frequency domain.

Why Transformation of the Image is Important?

It becomes easy to kno w what all the principal components that make up

the image and help in the compressed representation.

It makes the computations easy. munotes.in

## Page 77

Image Data Compression

77 Example: finding convolution in the time domain before the

transformation:

Finding convolution in the frequency domain after the transformation:

So we can see that the computation cost has reduced as we switched to the

frequency domain. We can also see that in the time domain the

convolution was equivalent to an integration operator but in the frequency

domain, it beco mes equal to the simple product of terms. So, this way the

cost of computation reduces.

So this way when we transform the image from domain to the other

carrying out the spatial filtering operations becomes easier.

Quantization

The process quantization is a vital step in which the various levels of

intensity are grouped into a particular level based on the mathematical

function defined on the pixels. Generally, the newer level is determined by

taking a fixed filter size of “m” and dividing each of the “m” terms of the

filter and rounding it its closest integer and again multiplying with “m”.

Basic quantization Function: [pixelvalue/m] * m

So, the closest of the pixel values approximate to a single level hence as

the no of distinct levels involved in the i mage becomes less. Hence we

reduce the redundancy in the level of the intensity. So thus quantization

helps in reducing the distinct levels.

Eg: (m=9)

Thus, we see in the above example both the intensity values round up to

18 thus we reduce the number of distinct levels(characters involved) in the

image specification.

munotes.in

## Page 78

Image Processing Lab

78 Symbol Encoding

The symbol stage involves where the distinct characters involved in the

image are encoded in a way that the no. of bits required to represent a

character is optimal based on the frequency of the character’s occurrence.

In simple terms, In this stage codewords are generated for the different

characters present. By doing so we aim to reduce the no. of bits required to

represent the intensity levels and represent them in an op timum number of

bits.

There are many encoding algorithms. Some of the popular ones are:

Huffman variable -length encoding.

Run-length encoding.

In the Huffman coding scheme, we try to find the codes in such a way that

none of the codes are the prefixes t o the other. And based on the

probability of the occurrence of the character the length of the code is

determined. In order to have an optimum solution the most probable

character has the smallest length code.

Example:

We see the actual 8 -bit representation as well as the new smaller length

codes. The mechanism of generation of codes is: munotes.in

## Page 79

Image Data Compression

79

So we see how the storage requirement for the no of bits is decreased as:

Initial representation –average code length: 8 bits per inte nsity level.

After encoding –average code length:

(0.6*1)+(0.3*2)+(0.06*3)+(0.02*4)+(0.01*5)+(0.01*5)=1.56 bits per

intensity level

Thus the no of bits required to represent the pixel intensity is drastically

reduced.

Thus in this way, the mechanism of quantization helps in compression.

When the images are once compressed its easy for them to be stored on a

device or to transfer them. And based on the type of transforms used, type

of quantization, and the encoding scheme the decoders are designed based

on the reversed logic of the compression so that the original image can be

re-built based on the data obtained out of the compressed images.

There are organizations who receive data form lakhs or more persons,

which is mostly in form of text, with a few images. Most of you know that

the text part is stored in databases in the form of tables, but what about the

images? The images are small compared to the textual data but constitute a

much higher space in terms of storage. Hence, to save on the part of spa ce

and keep running the processes smoothly, they ask the users to submit the

compressed images. As most of the readers have a bit of CS

background(either in school or college), they understand that using online

free tools to compress images is not a good p ractice for them.

Till Windows 7, Microsoft used to give MS Office Picture Manager which

could be used to compress images till an extent, but it also had some

limitations. munotes.in

## Page 80

Image Processing Lab

80 Those who know a bit of python can install python and use pip

install pillow in command prompt(terminal for Linux users) to install

pillow fork .

You’ll get a screen like this

Assemble all the files in a folder and keep the file Compress.py in the

same folder.

Run the python file with python.

Below is the Source Code of the file :

# run this in any directory

# add -v for verbose

# get Pillow (fork of PIL) from

# pip before running -->

# pip install Pillow

# import required libraries

import os

import sys

from PIL import Image

# define a function for

# compressing an image

def compressMe(file, verbose = False):

# Get the path of the file

filepath = os.path.join (os.getcwd(), file)

# open the image

picture = Image.open(filepath) munotes.in

## Page 81

Image Data Compression

81 # Save the picture with desired quality

# To change the quality of image,

# set the quality variable at

# your desired level, The more

# the value of quality variable

# and lesser the compression

picture.save("Compressed _"+file,

"JPEG",

optimize = True,

quality = 10)

return

# Define a main function

def main():

verbose = False

# checks for verbose flag

if (len(sys.argv)>1):

if (sys.argv[1].lower()==" -v"):

verbose = True

# finds current working dir

cwd = os.getcwd()

formats = ('.jpg', '.jpeg')

# looping through all the files

# in a current directory

for file in os.li stdir(cwd):

# If the file format is JPG or JPEG

if os.path.splitext(file)[1].lower() in formats:

print('compressing', file)

compressMe(file, verbose)

print("Done")

# Driver code

if __name__ == "__main__":

main() munotes.in

## Page 82

Image Processing Lab

82 Folder Before Compression:

Folder before running file

Command Line for executing Code :

PS: Please run code after getting into the directory.

Command Line for executing Code

Folder after execution of Code :

Folder after running code

You can clearly see the compressed file.

munotes.in

## Page 83

83 Experiment VI

6

MORPHOLOGICAL OPERATION

Aim: Morphological operational: Dilation, Erosion, Opening, Closing.

Theory:

EROSION AND DILATION IN MORPHOLOGICAL

PROCESSING.

These operations are fundamental to morphological processing.

Erosion:

With A and B as sets in Z2 , the erosion of A by B, denoted A B, is

defined as

In words, this equation indicates that the erosion of A by B is the set of all

points z such that B, translated by z, is contained in A. In the following

discussion, set B is assumed to be a structuring element. The statement

that B has to be contained in A is equivalent to B not sharing any common

elements with the background; we can express erosion in the following

equivalent form:

where, A c is the complement of A and Ø is the empty set.

a) Set (b) Square structuring element, (c) Erosion of by shown shaded. (d)

Elongated structuring element. (e) Erosion of by using this element. The

dotted border in (c) and (e) is the boundary of set A, shown only for

reference.

munotes.in

## Page 84

Image Processing Lab

84 The e lements of A and B are shown shaded and the background is white.

The solid boundary in Fig. (c) is the limit beyond which further

displacements of the origin of B would cause the structuring element to

cease being completely contained in A. Thus, the locus of points

(locations of the origin of B) within (and including) this boundary,

constitutes the erosion of A by B. We show the erosion shaded in Fig.

(c).The boundary of set A is shown dashed in Figs. (c) and (e) only as a

reference; it is not part of the erosion operation. Figure (d) shows an

elongated structuring element, and Fig. (e) shows the erosion of A by this

element. Note that the original set was eroded to a line. However, these

equations have the distinct advantage over other formulations in that they

are more intuitive when the structuring element B is viewed as a spatial

mask.

Thus, erosion shrinks or thins objects in a binary image. In fact, we can

view erosion as a morphological filtering operation in which image details

smaller than the struc turing element are filtered (re -moved) from the

image

(i) Dilation

However, the preceding definitions have a distinct advantage over other

formulations in that they are more intuitive when the structuring element

B is viewed as a convolution mask. The bas ic process of flipping

(rotating) B about its origin and then successively displacing it so that it

slides over set (image) A is analogous to spatial convolution. Keep in

mind, however, that dilation is based on set operations and therefore is a

nonlinear operation, whereas convolution is a linear operation. Unlike

erosion, which is a shrinking or thinning operation, dilation "grows" or

"thickens" objects in a binary image. The specific manner and extent of

this thickening is controlled by the shape of the structuring element used.

In the following Figure (b) shows a structuring element (in this case B = B

because the SE is symmetric about its origin). The dashed line in Fig. (c)

shows the original set for reference, and the solid line shows the limit

beyond which any further displacements of the origin of B by z would

cause the intersection of B and A to be empty. Therefore, all points on and

inside this boundary constitute the dilation of A by B. Figure (d) shows a

structuring element designed to achieve mo re dilation vertically than

horizontally, and Fig. (e) shows the dilation achieved with this element munotes.in

## Page 85

Morphological Operation

85

FIG:4.1.5 (a) Set (b) Square structuring element (the dot denotes the

origin). (c) Dilation of by shown shaded. (d) Elongated structuring

element. (e) Di lation of using this element. The dotted border in (c) and

(e) is the boundary of set shown only for reference.

Opening and Closing

Opening generally smoothes the contour object, breaks narrow isthmuses,

and eliminates thin protrusions. Closing also tends to smooth sections of

contours but, ass opposed to opening, it generally fuses narrow breaks and

long thin gulfs, eliminates small holes, and fills gaps in the contour munotes.in

## Page 86

Image Processing Lab

86

munotes.in

## Page 87

Morphological Operation

87

Morphological Operations in Image Processing (Opening)

Morphological operations are used to extract image components that are

useful in the representation and description of region shape.

Morphological operations are some basic tasks dependent on the picture

shape. It is typically performed on binary images. It needs two data

sources , one is the input image , the second one is called structuring

component . Morphological operators take an input image and a

structuring component as input and these elements are then combines

using the set operators. The objects in the input image are proc essed

depending on attributes of the shape of the image, which are encoded in

the structuring component.

Opening is similar to erosion as it tends to remove the bright foreground

pixels from the edges of regions of foreground pixels. The impact of the

operator is to safeguard foreground region that has similarity with the

structuring component, or that can totally contain the structuring

component while taking out every single other area of foreground pixels.

Opening operation is used for removing interna l noise in an image.

Opening is erosion operation followed by dilation operation.

Syntax: cv2.morphology Ex(image, cv2.MORPH_OPEN, kernel)

Parameters:

-> image : Input Image array.

-> cv2.MORPH_OPEN : Applying the Morphological Opening

operation.

-> kernel : Structuring element.

Below is the Python code explaining Opening Morphological Operation –

# Python program to illustrate

# Opening morphological operation

# on an image

# organizing imports

import cv2

import numpy as np

# return video from t he first webcam on your computer.

screenRead = cv2.VideoCapture(0)

# loop runs if capturing has been initialized.

while(1):

# reads frames from a camera munotes.in

## Page 88

Image Processing Lab

88 _, image = screenRead.read()

# Converts to HSV color space, OCV reads colors as BGR

# frame is converted to hsv

hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# defining the range of masking

blue1 = np.array([110, 50, 50])

blue2 = np.array([130, 255, 255])

# initializing the mask to be

# convoluted over input image

mask = cv2.inRange(hsv, blue1, blue2)

# passing the bitwise_and over

# each pixel convoluted

res = cv2.bitwise_and(image, image, mask = mask)

# defining the kernel i.e. Structuring element

kernel = np.ones((5, 5), np.uint8)

# defining the opening function

# over the image and structuring element

opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# The mask and opening operation

# is shown in the window

cv2.imshow('Mask', mask)

cv2.imshow('Opening', opening)

# Wait for 'a' key to stop the program

if cv2.waitKey(1) & 0xFF == ord('a'):

break

# De -allocate any associated memory usage

cv2.destroyAllWindows()

# Close the window / Release webcam

screenRead.r elease()

munotes.in

## Page 89

Morphological Operation

89 Input Frame:

Mask:

munotes.in

## Page 90

Image Processing Lab

90 Output Frame:

The system recognizes the defined blue book as the input as removes and

simplifies the internal noise in the region of interest with the help of the

Opening function.

Morphological Operations in Image Processing (Closing)

Closing is similar to the opening operation. In closing operation, the basic

premise is that the closing is opening performed in reverse. It is defined

simply as a dilation followed by an erosion using the same structuring

element used in the opening operation.

Syntax: cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel)

Parameters:

-> image : Input Image array.

-> cv2.MORPH_CLOSE : Applying the Morphological Closing operation.

-> kernel : Structuring element.

Below is the Python code explaining Closing Morphological Operation –

# Python program to illustrate

# Closing morphological operation

# on an image

# organizing imports

import cv2 munotes.in

## Page 91

Morphological Operation

91 import numpy as np

# return video from the first webcam on your computer.

scree nRead = cv2.VideoCapture(0)

# loop runs if capturing has been initialized.

while(1):

# reads frames from a camera

_, image = screenRead.read()

# Converts to HSV color space, OCV reads colors as BGR

# frame is converted to hsv

hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# defining the range of masking

blue1 = np.array([110, 50, 50])

blue2 = np.array([130, 255, 255])

# initializing the mask to be

# convoluted over input image

mask = cv2.inRange(hsv, blue1, blue2)

# passing the bitwise_and over

# each pixel convoluted

res = cv2.bitwise_and(image, image, mask = mask)

# defining the kernel i.e. Structuring element

kernel = np.ones((5, 5), np.uint8)

# defining the closing function

# over the image and structuring element

closing = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# The mask and closing operation

# is shown in the window

cv2.imshow('Mask', mask)

cv2.imshow('C losing', closing)

# Wait for 'a' key to stop the program

if cv2.waitKey(1) & 0xFF == ord('a'):

break

# De -allocate any associated memory usage

cv2.destroyAllWindows()

# Close the window / Release webcam

screenRead.release() munotes.in

## Page 92

Image Processing Lab

92 Input Frame:

Mask:

munotes.in

## Page 93

Morphological Operation

93 Output:

Erosion and Dilation of images using OpenCV in python

Morphological operations are a set of operations that process images

based on shapes. They apply a structuring element to an input image and

generate an output image

. The most basic morphological operations are two: Erosion and Dilation

Basics of Erosion:

● Erodes away the boundaries of the foreground object

● Used to diminish the features of an image.

Working of erosion:

1. A kernel(a matrix of odd size(3,5,7) is convolved with the image.

2. A pixel in the original image (either 1 or 0) will be considered 1 only

if all the pixels under the kernel are 1, otherwise, it is eroded (made to

zero).

3. Thus all the pixels near the boundary will be discarded depending

upon the size of the kernel.

4. So the thickness or size of the foreground object decreases or simply

the white region decreases in the image.

Basics of dilation:

● Increases the object area

● Used to accentuate features

Working of dilation:

1. A kernel(a matrix of odd size(3,5,7) is convolved with the image

2. A pixel element in the original image is ‘1’ if at least one pixel under

the kernel is ‘1’. munotes.in

## Page 94

Image Processing Lab

94 3. It increases the white region in the image or the size of the foreground

object increases

# Python program to demonstrate erosion and

# dilation of images.

import cv2

import numpy as np

# Reading the input image

img = cv2.imread('input.png', 0)

# Taking a matrix of size 5 as the kernel

kernel = np.ones((5,5), np.uint8)

# The first parameter is the original image,

# kernel is the matr ix with which image is

# convolved and third parameter is the number

# of iterations, which will determine how much

# you want to erode/dilate a given image.

img_erosion = cv2.erode(img, kernel, iterations=1)

img_dilation = cv2.dilate(img, kernel, iteratio ns=1)

cv2.imshow('Input', img)

cv2.imshow('Erosion', img_erosion)

cv2.imshow('Dilation', img_dilation)

cv2.waitKey(0)

Uses of Erosion and Dilation:

1. Erosion:

● It is useful for removing small white noises.

● Used to detach two connected objects etc.

2. Dilation:

● In cases like noise removal, erosion is followed by dilation. Because,

erosion removes white noises, but it also shrinks our object. So we

dilate it. Since noise is gone, they won’t come back, but our object

area increases.

● It is also useful in joining broken parts of an object.

munotes.in