SOFT COMPUTING TECHNIQUES

Chapter 1: Introduction To Soft Computing

Unit 1

1 INTRODUCTION TO SOFT COMPUTING

Unit Structure

1.0 Objectives

1.1 Computational Paradigm

1.1.1 Soft Computing v/s Hard Computing

1.2 Soft Computing

1.3 Premises of Soft Computing

1.4 Guidelines of Soft Computing

1.5 Uncertainty in AI

1.6 Application of Soft Computing

1.0 Objectives

In this chapter, we will learn what soft computing is, the difference between hard computing and soft computing, and why soft computing evolved. At the end, some applications of soft computing will be discussed.

1.1 Computational Paradigm

Figure 1.1: Computational Paradigms



The computational paradigm is classified into two categories: hard computing and soft computing. Hard computing is conventional computing. It is based on the principles of precision, certainty, and inflexibility, and it requires a mathematical model to solve problems. It deals with precise models, which are further classified into symbolic logic and reasoning, and traditional numerical modelling and search methods. These methods build on the basics of traditional artificial intelligence. Hard computing consumes a lot of time when dealing with real-life problems that contain imprecise and uncertain information. The following problems cannot be handled by hard computing techniques:

1. Recognition problems

2. Mobile robot co-ordination and forecasting

3. Combinatorial problems

Soft computing deals with approximate models, which are further classified into approximate reasoning, and functional optimization and random search methods. It handles the imprecise and uncertain information of the real world and can be used across industries and business sectors to solve problems. Complex systems can be designed with soft computing to deal with incomplete information, where the system behaviour is not completely known or the measurements of the variables are noisy.

1.1.1 Soft Computing v/s Hard Computing

| Hard Computing | Soft Computing |
| --- | --- |
| It uses precisely stated analytical models. | It is tolerant of imprecision, uncertainty, partial truth and approximation. |
| It is based on binary logic and crisp systems. | It is based on fuzzy logic and probabilistic reasoning. |
| It has features such as precision and categoricity. | It has features such as approximation and dispositionality. |
| It is deterministic in nature. | It is stochastic in nature. |
| It works with exact input data. | It can work with ambiguous and noisy data. |
| It performs sequential computation. | It performs parallel computation. |
| It produces precise outcomes. | It produces approximate outcomes. |


1.2 Introduction to Soft Computing

Real-world problems require systems that combine knowledge, techniques, and methodologies from various sources. These systems should possess human-like expertise within a specific domain, adapt themselves and learn to do better in changing environments, and explain how they make decisions or take actions. Humans use natural language for reasoning and drawing conclusions. In conventional AI, intelligent human behaviour is expressed in the form of language or symbolic rules. It manipulates symbols on the assumption that such behaviour can be stored in a symbolically structured knowledge base; this is known as the physical symbol system hypothesis.

“Basically, Soft Computing is not a homogenous body of concepts & techniques. Rather, it is a partnership of distinct methods that in one way or another conform to its guiding principle. At this juncture, the dominant aim of soft computing is to exploit the tolerance for imprecision and uncertainty to achieve tractability, robustness and low solution cost. The principal constituents of soft computing are fuzzy logic, neurocomputing, and probabilistic reasoning, with the latter subsuming genetic algorithms, belief networks, chaotic systems, and parts of learning theory. In the partnership of fuzzy logic, neurocomputing, and probabilistic reasoning, fuzzy logic is mainly concerned with imprecision and approximate reasoning; neurocomputing with learning and curve-fitting; and probabilistic reasoning with uncertainty and belief propagation.”

-Zadeh (1994)

Soft computing combines different techniques and concepts. It can handle imprecision and uncertainty. Fuzzy logic, neurocomputing, evolutionary and genetic programming, and probabilistic computing are fields of soft computing. Soft computing is designed to model and enable solutions to real-world problems that cannot be modelled mathematically. It does not perform much symbolic manipulation.

The main computing paradigms of soft computing are fuzzy systems, neural networks and genetic algorithms:

• Fuzzy sets for knowledge representation via fuzzy If-Then rules,

• Neural networks for learning and adaptivity, and

• Genetic algorithms for evolutionary computation.



To achieve a close resemblance to human decision making, soft computing aims to exploit the tolerance for approximation, uncertainty, imprecision, and partial truth.

• Approximation: the model has features similar to, but not the same as, those of the real entity.

• Uncertainty: we are not sure that the features of the model are the same as those of the entity/belief.

• Imprecision: the model features (quantities) are not the same as the real ones but are close to them.

1.3 Premises of Soft Computing

• Real-world problems are imprecise and uncertain.

• Precision and certainty carry a cost.

• There may not be precise solutions for some problems.

1.4 Guidelines of Soft Computing

The guiding principle of soft computing is to exploit the tolerance for approximation, uncertainty, imprecision and partial truth to achieve tractability, robustness and low solution cost. The human mind is the role model for soft computing.

1.5 Uncertainty in AI

• Objective (features of the whole environment)

o There is a lot of uncertainty in the world, and we have limited capabilities to sense these uncertainties.

• Subjective (features of interaction with a concrete environment)

o For the same or similar situation, people may have different experiences. This experience maps onto the semantics of different languages.

1.6 Application of Soft Computing

The application of soft computing has demonstrated the following advantages:

• Applications that cannot be modelled mathematically can be solved.

• Non-linear problems can be solved.

• Human knowledge such as cognition, understanding, recognition, learning and others can be introduced into the field of computing.


A few applications of soft computing are listed below:

• Handwritten Script Recognition using Soft Computing:

Handwritten script recognition is one of the demanding areas of computer science. It can translate multilingual documents and sort the various scripts accordingly. The system uses a block-level technique to recognize the script from a given multi-script document. To classify the scripts according to their features, it uses the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) together.

• Image Processing and Data Compression using Soft Computing:

Image analysis is a high-level processing technique which includes the recognition and bifurcation of patterns. It is one of the most important parts of the medical field. The problems of computational complexity and efficiency in classification can easily be solved using soft computing techniques. Genetic algorithms, genetic programming, classifier systems, evolutionary strategies, etc. are the soft computing techniques that can be used. These algorithms give the fastest solutions to pattern recognition. They help in analysing medical images obtained from microscopes as well as in examining X-rays.

• Use of Soft Computing in Automotive Systems and Manufacturing:

The automobile industry has also adopted soft computing to solve some of its major problems. Classic control methods are built into vehicles using fuzzy logic techniques. These take the example of human behaviour, which is described in the form of "If-Then" rules. The logic controller then converts the sensor inputs into fuzzy variables that are defined according to these rules. Fuzzy logic techniques are used in engine control, automatic transmissions, antiskid steering, etc.

• Soft Computing based Architecture:

An intelligent building takes inputs from sensors and uses them to control effectors. The construction industry uses the techniques of DAI (Distributed Artificial Intelligence) and fuzzy genetic agents to provide the building with capabilities that match human intelligence. Fuzzy logic is used to create behaviour-based architecture in intelligent buildings to deal with the unpredictable nature of the environment, and these agents embed sensory information in the buildings.

## Page 6

6SOFT COMPUTING TECHNIQUES

• Soft Computing and Decision Support Systems:

Soft computing gives the advantage of reducing the cost of a decision support system. The techniques are used to design, maintain, and maximize the value of the decision process. The first application of fuzzy logic is to create a decision system that can predict any sort of risk. The second application uses fuzzy information to select the areas which need replacement.

• Soft Computing Techniques in Power System Analysis:

Soft computing uses the method of Artificial Neural Networks (ANNs) to predict any instability in the voltage of the power system. Using an ANN, pending voltage instability can be predicted. The methods deployed here are very low in cost.

• Soft Computing Techniques in Bioinformatics:

The techniques of soft computing help in handling any uncertainty and imprecision that bioinformatics data may have. Soft computing provides distinct low-cost solutions with the help of algorithms, databases, Fuzzy Sets (FSs), and Artificial Neural Networks (ANNs). These techniques are best suited to give quality results in an efficient way.

• Soft Computing in Investment and Trading:

The finance field has an abundance of data, and traditional computing is not able to handle and process that kind of data. Various soft computing approaches help to handle such noisy data. Pattern recognition techniques are used to analyse the pattern or behaviour of the data, and time series analysis is used to predict future trading points.

Summary

In this chapter, we have learned that soft computing is a partnership of multiple techniques which helps to accomplish a particular task. Real-world problems that contain uncertain and imprecise information can be solved using soft computing techniques.



Review Questions

1. What is a computational paradigm?

2. State the difference between hard computing and soft computing.

3. Write a short note on soft computing.

4. What are the premises and guiding principle of soft computing techniques?

5. Give any three applications of soft computing.

Bibliography, References and Further Reading

• https://www.coursehero.com/file/40458824/01-Introduction-to-Soft-Computing-CSE-TUBEpdf/

• https://techdifferences.com/difference-between-soft-computing-and-hard-computing.html

• https://www.researchgate.net/profile/Mohamed_Mourad_Lafifi/post/Soft_Computing_Applications/attachment/5b8ef4933843b0067537cb3b/AS%3A667245734817800%401536095188583/download/Soft+Computing+and+its+Applications.pdf

• https://wisdomplexus.com/blogs/applications-soft-computing/

• Artificial Intelligence and Soft Computing, Anandita Das Battacharya, SPD, 3rd edition, 2018

• Principles of Soft Computing, S.N. Sivanandam and S.N. Deepa, Wiley, 3rd edition, 2019

• Neuro-Fuzzy and Soft Computing, J.S.R. Jang, C.T. Sun and E. Mizutani, Prentice Hall of India, 2004


Unit 1

2 TYPES OF SOFT COMPUTING TECHNIQUES

Unit Structure

2.0 Objectives

2.1 Types of Soft Computing Techniques

2.2 Fuzzy Computing

2.3 Neural Computing

2.4 Genetic Algorithms

2.5 Associative Memory

2.6 Adaptive Resonance Theory

2.7 Classification

2.8 Clustering

2.9 Probabilistic Reasoning

2.10 Bayesian Network

2.0 Objectives

The objective of this chapter is to give an overview of the various soft computing techniques.

2.1 Types of Soft Computing Techniques

Following are the various techniques of soft computing:

1. Fuzzy Computing

2. Neural Network

3. Genetic Algorithms

4. Associative memory

5. Adaptive Resonance Theory


6. Classification

7. Clustering

8. Probabilistic Reasoning

9. Bayesian Network

All the above techniques are discussed briefly in the sections below.

2.2 Fuzzy Computing

The knowledge that exists in the real world is vague, imprecise, uncertain, ambiguous, or probabilistic in nature. This type of knowledge is also known as fuzzy knowledge. Human thinking and reasoning frequently involve fuzzy information. The classical computing system involves two-valued logic (true/false, 1/0, yes/no). Such a system sometimes may not be able to answer some questions the way a human does, as it does not have a completely true answer. The computing system is expected not just to give answers like a human but also to describe the reality level, calculated with the imprecision and uncertainty of the facts and rules applied.

Lotfi Zadeh observed that the classical computing system was not capable of handling subjective data representation or unclear human ideas. In 1965, he introduced fuzzy set theory as an extension of classical set theory, in which elements have degrees of membership. It allows us to determine distinctions among data that is neither true nor false. It resembles the process of human thinking: very hot, hot, warm, little warm, cold, too cold.

In a classical system, 1 represents the absolute truth value and 0 represents the absolute false value. In fuzzy logic, however, truth is not limited to these absolutes: there are intermediate values too, which are partially true and partially false.
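The idea of partial truth can be made concrete with a membership function. The following Python sketch (the fuzzy set "warm" and its temperature boundaries are invented for illustration) assigns every temperature a degree of membership between 0 and 1:

```python
def warm(temp_c):
    """Triangular membership function for the fuzzy set 'warm':
    degree 0 below 15 or above 35 degrees, rising to 1 at 25 degrees."""
    if temp_c <= 15 or temp_c >= 35:
        return 0.0
    if temp_c <= 25:
        return (temp_c - 15) / 10.0
    return (35 - temp_c) / 10.0

# 20 degrees is 'warm' to degree 0.5: partially true, partially false.
print(warm(10), warm(20), warm(25), warm(30))  # 0.0 0.5 1.0 0.5
```

A crisp (classical) set would have to answer only 0 or 1; the fuzzy set answers with a degree of truth.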


Fig 2.1: Fuzzy logic with example

Fuzzy Logic Architecture:

Fig 2.2: Fuzzy Logic Architecture

The fuzzy logic architecture mainly consists of the following four components:

• Rule base: It contains the set of rules. The If-Then conditions are provided by experts to govern the decision-making system. These conditions are based on linguistic information.

• Fuzzification: It converts the crisp numbers into fuzzy sets. The crisp input is measured by the sensors and passed into the control system for processing.

• Inference engine: It determines the matching degree of the current fuzzy input with respect to each rule and decides which rules are to be fired according to the input field. Next, the fired rules are combined to form the control actions.

• Defuzzification: The fuzzy set obtained from the inference engine is converted into a crisp value.
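The four components above can be traced in a minimal one-input fuzzy controller. In this Python sketch (the membership functions, rules and output speeds are all invented for the example), a crisp temperature is fuzzified, two rules fire, and a weighted average defuzzifies the result:

```python
def mu_cold(t):  # membership in 'cold': 1 at 10 degrees, falling to 0 at 30
    return max(0.0, min(1.0, (30 - t) / 20.0))

def mu_hot(t):   # membership in 'hot': 0 at 20 degrees, rising to 1 at 40
    return max(0.0, min(1.0, (t - 20) / 20.0))

def fan_speed(temp_c):
    # Fuzzification: crisp sensor reading -> membership degrees
    cold, hot = mu_cold(temp_c), mu_hot(temp_c)
    # Rule base + inference: IF cold THEN slow (20%), IF hot THEN fast (90%)
    # Defuzzification: weighted average of the fired rule outputs
    return (cold * 20 + hot * 90) / (cold + hot)

print(fan_speed(10), fan_speed(25), fan_speed(40))  # 20.0 55.0 90.0
```

Because the two membership functions overlap, every input fires at least one rule, and intermediate temperatures produce intermediate fan speeds.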



Characteristics of fuzzy logic:

1. It is flexible and easy to implement.

2. It helps to represent the human logic.

3. It is a highly suitable method for uncertain or approximate reasoning.

4. It views inference as a process of propagating elastic constraints.

5. It allows you to build nonlinear functions of arbitrary complexity.

When not to use fuzzy logic:

1. If it is inconvenient to map an input space to an output space.

2. When the problem can be solved using common sense.

3. When other controllers can do the job well, without the use of fuzzy logic.

Advantages of Fuzzy Logic System:

• Its structure is easy and understandable.

• It is used for commercial and practical purposes.

• It helps to control machines and consumer products.

• It offers acceptable reasoning; it may not offer accurate reasoning.

• In data mining, it helps you to deal with uncertainty.

• It is mostly robust, as no precise inputs are required.

• It can be programmed to cope with situations in which a feedback sensor stops working.

• Performance of the system can be modified or altered by using inexpensive sensors, keeping the overall system cost and complexity low.

• It provides a most effective solution to complex issues.

Disadvantages of Fuzzy Logic System:

• The results of the system may not be widely accepted, as fuzzy logic is not always accurate.

• It does not have the capability of machine learning, or of neural-network-type pattern recognition.

• Extensive testing with the hardware is needed for validation and verification of a fuzzy knowledge-based system.

• It is a difficult task to set exact fuzzy rules and membership functions.


Application areas of Fuzzy Logic:

• Automotive Systems: Automatic gearboxes, four-wheel steering, vehicle environment control.

• Consumer Electronic Goods: Photocopiers, still and video cameras, televisions.

• Domestic Goods: Refrigerators, vacuum cleaners, washing machines.

• Environment Control: Air conditioners, humidifiers.

2.3 Neural Computing

An Artificial Neural Network (ANN), also known as a neural network, is a concept inspired by the human brain and the way its neurons work. It is a computational learning system that uses a network of functions to understand and translate a data input of one form into another form. It contains a large number of interconnected processing elements called neurons. These neurons operate in parallel and are configured into a particular architecture. Every neuron is connected with other neurons by a connection link, and each connection is associated with weights which contain information about the input signal.

Components of Neural Networks:

1. Neuron model: the information processing unit of the ANN.

The neuron model consists of the following:

a. Inputs

b. Weights

c. Activation function

2. Architecture: the arrangement of neurons and the links connecting them, where every link carries a weight.

The different ANN architectures are:

a. Single-layer feed-forward network

b. Multi-layer feed-forward network

c. Single node with its own feedback

d. Single-layer recurrent network

e. Multi-layer recurrent network


3. A learning algorithm: for training the ANN by modifying the weights so as to model a particular learning task correctly on the training examples.

The different types of learning algorithm are:

a. Supervised Learning

b. Unsupervised Learning

c. Reinforcement Learning
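The three components (inputs, weights with an activation, and a learning algorithm) can be seen together in the classic perceptron rule. This Python sketch (the task of learning logical AND, the learning rate and the epoch count are invented for the example) trains a single neuron by supervised learning:

```python
# A single neuron: weighted sum of inputs plus bias, step activation.
# Supervised learning: the error between target and output drives weight updates.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # labelled examples
w, b, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):                                  # training epochs
    for (x1, x2), target in data:
        out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        err = target - out                           # supervised error signal
        w[0] += lr * err * x1                        # modify the weights ...
        w[1] += lr * err * x2
        b += lr * err                                # ... and the bias

preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in data]
print(preds)  # [0, 0, 0, 1] -- the neuron has learned AND
```

The "teacher" here is the labelled dataset; an unsupervised method would have to discover structure in the inputs without the `target` values.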

Applications of Neural Network:

1. Image recognition

2. Pattern recognition

3. Self-driving car trajectory prediction

4. Email spam filtering

5. Medical diagnosis

2.4 Genetic Algorithms

Genetic Algorithms (GAs), initiated and developed in the early 1970s by John Holland, are unorthodox search and optimization algorithms which mimic some of the processes of natural evolution. GAs perform a directed random search through a given set of alternatives with the aim of finding the best alternative with respect to the given criteria of goodness. These criteria are required to be expressed in terms of an objective function, which is usually referred to as a fitness function.

Biological Background:

All living organisms consist of cells. In each cell there is a set of chromosomes, which are strings of DNA and serve as a model of the organism. A chromosome consists of genes, or blocks of DNA. Each gene encodes a particular pattern; basically, it can be said that each gene encodes a trait.

Steps involved in the genetic algorithm:

• Initialization: Define the population for the problem.

• Fitness Function: Calculate the fitness function for all the chromosomes in the population.

• Selection: The two fittest chromosomes are selected for producing the offspring.


• Crossover: Information in the two chromosomes is exchanged to produce the new offspring.

• Mutation: It is the process of promoting diversity in the population.
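The steps above can be sketched on a toy problem: maximizing the number of 1-genes in a 16-gene binary chromosome (the population size, rates and fitness function here are invented for the example):

```python
import random
random.seed(0)                        # reproducible run

def fitness(chrom):                   # fitness function: count of 1-genes
    return sum(chrom)

def select(pop):                      # selection: fitter of two random parents
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):                # crossover: exchange gene information
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(chrom, rate=0.02):         # mutation: rare bit flips keep diversity
    return [1 - g if random.random() < rate else g for g in chrom]

# Initialization: a random population of 20 chromosomes with 16 genes each
pop = [[random.randint(0, 1) for _ in range(16)] for _ in range(20)]
for _ in range(60):                   # evolve for 60 generations
    pop = [mutate(crossover(select(pop), select(pop))) for _ in pop]

best = max(pop, key=fitness)
print(fitness(best))                  # close to the optimum of 16
```

Each generation applies selection, crossover and mutation exactly as listed; the directed random search steadily raises the best fitness in the population.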

Benefits of Genetic Algorithms:

• Easy to understand.

• We always get an answer, and the answer gets better with time.

• Good for noisy environments.

• Flexible in forming building blocks for hybrid applications.

• Has a substantial history and range of use.

• Supports multi-objective optimization.

• Modular, separate from the application.

Applications of Genetic Algorithms:

• Recurrent Neural Network

• Mutation testing

• Code breaking

• Filtering and signal processing

2.5 Associative Memory

An associative memory is a content-addressable structure that maps a set of input patterns to a set of output patterns. Associative memories are of two types: auto-associative and hetero-associative. An auto-associative memory retrieves a previously stored pattern that most closely resembles the current pattern. In a hetero-associative memory, the retrieved pattern is, in general, different from the input pattern, not only in content but possibly also in type and format.



Description of Associative Memory:

Fig 2.3: A content-addressable memory: input and output

A content-addressable memory is a type of memory that allows the recall of data based on the degree of similarity between the input pattern and the patterns stored in memory. It refers to a memory organization in which the memory is accessed by its content, as opposed to an explicit address as in the traditional computer memory system. This type of memory allows the recall of information based on partial knowledge of its contents.

The simplest artificial neural associative memory is the linear associator. Other popular ANN models used as associative memories are the Hopfield model and the Bidirectional Associative Memory (BAM) model.
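The linear associator mentioned above can be written out directly: pattern pairs are stored as Hebbian outer products summed into a weight matrix, and recall is a matrix-vector product. In this Python sketch the stored patterns are invented, and the inputs are chosen orthonormal so that recall is exact:

```python
def store(W, x, y):
    """Hebbian storage: add the outer product y * x^T into weight matrix W."""
    for i in range(len(y)):
        for j in range(len(x)):
            W[i][j] += y[i] * x[j]

def recall(W, x):
    """Content-addressed recall: the output pattern is W applied to the input."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

# Two orthonormal input patterns mapped to different output patterns
x1, y1 = [1, 0], [1, 0, 1]
x2, y2 = [0, 1], [0, 1, 0]
W = [[0.0] * 2 for _ in range(3)]
store(W, x1, y1)
store(W, x2, y2)
print(recall(W, x1))  # recovers y1: [1.0, 0.0, 1.0]
print(recall(W, x2))  # recovers y2: [0.0, 1.0, 0.0]
```

Because the output patterns differ from the input patterns in both content and size, this is a hetero-associative memory; storing each pattern against itself would make it auto-associative.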

2.6 Adaptive Resonance Theory

ART stands for "Adaptive Resonance Theory", invented by Stephen Grossberg in 1976. ART encompasses a wide variety of neural networks based explicitly on neurophysiology. Here "resonance" is just a matter of being within a certain threshold of a second similarity measure. The basic ART system is an unsupervised learning model, like many iterative clustering algorithms in which each case is processed by finding the "nearest" cluster seed that resonates with the case, and that cluster seed is updated to be "closer" to the case. If no seed resonates with the case, then a new cluster is created.

Grossberg developed ART as a theory of human cognitive information processing. The emphasis of ART neural networks lies in unsupervised learning and self-organization to mimic biological behaviour. Self-organization means that the system must be able to build stable recognition categories in real time. Unsupervised learning means that the network learns the significant patterns based on the inputs only. There is no feedback and no external teacher that instructs the network or tells it which category a certain input belongs to. The basic ART system is an unsupervised learning model.

The model typically consists of:

• a comparison field and a recognition field composed of neurons,

• a vigilance parameter, and

• a reset module.

Comparison field and Recognition field:

• The Comparison field takes an input vector (a 1-D array of values) and transfers it to its best match in the Recognition field; the best match is the single neuron whose set of weights (weight vector) most closely matches the input vector.

• Each Recognition field neuron outputs a negative signal (proportional to that neuron's quality of match to the input vector) to each of the other Recognition field neurons and inhibits their output accordingly.

• The Recognition field thus exhibits lateral inhibition, allowing each neuron in it to represent a category to which input vectors are classified.

Vigilance parameter:

• It has considerable influence on the system's memories:

o higher vigilance produces highly detailed memories,

o lower vigilance results in more general memories.

Reset module:

• After the input vector is classified, the Reset module compares the strength of the recognition match with the vigilance parameter.

o If the vigilance threshold is met, then training commences.

o Otherwise, the firing recognition neuron is inhibited until a new input vector is applied.

Training ART-based Neural Networks:

• Training commences only upon completion of a search procedure. What happens in this search procedure:

o The Recognition neurons are disabled one by one by the reset function until the vigilance parameter is satisfied by a recognition match.

o If no committed recognition neuron's match meets the vigilance threshold, then an uncommitted neuron is committed and adjusted towards matching the input vector.

Methods of Learning:

• Slow learning method: the degree of training of the recognition neuron's weights towards the input vector is calculated using differential equations and is thus dependent on the length of time the input vector is presented.

• Fast learning method: algebraic equations are used to calculate the degree of weight adjustments to be made, and binary values are used.
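A drastically simplified ART-1-style procedure illustrates the vigilance test and fast learning on binary inputs. This Python sketch omits the choice function and search order of the full model, and the patterns and vigilance value are invented for the example:

```python
def art1_cluster(patterns, vigilance):
    """Simplified ART-1 with fast learning on binary patterns: a pattern
    resonates with a category when its overlap with the prototype, relative
    to the pattern's own size, meets the vigilance parameter."""
    prototypes, labels = [], []
    for x in patterns:
        placed = False
        for k, proto in enumerate(prototypes):
            overlap = [a & b for a, b in zip(x, proto)]
            if sum(overlap) / sum(x) >= vigilance:  # resonance (vigilance) test
                prototypes[k] = overlap             # fast learning: AND update
                labels.append(k)
                placed = True
                break
        if not placed:                              # no resonance: new category
            prototypes.append(list(x))
            labels.append(len(prototypes) - 1)
    return labels

patterns = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
print(art1_cluster(patterns, vigilance=0.6))  # two categories: [0, 0, 1, 1]
```

Raising the vigilance makes the resonance test stricter, so more categories form with more detailed prototypes, matching the description of the vigilance parameter above.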

Types of ART Systems:

• ART 1: The simplest variety of ART networks; accepts only binary inputs.

• ART 2: Extends network capabilities to support continuous inputs.

• Fuzzy ART: Implements fuzzy logic in ART's pattern recognition, thus enhancing generalizing ability. One very useful feature of fuzzy ART is complement coding, a means of incorporating the absence of features into pattern classifications, which goes a long way towards preventing inefficient and unnecessary category proliferation.

• ARTMAP: Also known as Predictive ART; combines two slightly modified ART units (two ART-1 or two ART-2 units) into a supervised learning structure, where the first unit takes the input data and the second unit takes the correct output data. This is then used to make the minimum possible adjustment of the vigilance parameter in the first unit in order to make the correct classification.

2.7 Classification

Classification is supervised learning. Classification algorithms are used to predict categorical values. Training is provided to identify the category of new observations: the program learns from the given dataset or observations and then classifies new observations into a number of classes or groups. Classes are also called targets, labels or categories.



Classification algorithms:

• Logistic Regression

• Naïve Bayes

• K-Nearest Neighbour

• Decision tree

• Random Forest
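One of the algorithms listed above, K-Nearest Neighbour, is simple enough to sketch in full. This illustrative Python version (the training points and labels are invented) classifies a new observation by majority vote among its k closest labelled examples:

```python
def knn_predict(train, query, k=3):
    """k-Nearest-Neighbour classification: the query point takes the majority
    label of its k closest training points (Euclidean distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:                 # count labels among neighbours
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Labelled training data: two classes of points in the plane
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)))   # "A" -- near the first group
print(knn_predict(train, (7, 8)))   # "B" -- near the second group
```

The labelled training set is what makes this supervised: the categories are given in advance rather than discovered from the data.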

Applications of Classification:

• Email Spam Detection

• Speech Recognition

• Identification of Cancer tumour cells

• Biometric Identifications

2.8 Clustering

Clustering is a type of unsupervised learning method. In this kind of learning, we draw inferences from datasets consisting of input data without labelled responses. Generally, it is used as a process to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples. Its task is to divide the population or data points into several groups, such that data points in the same group are similar to the other data points in that group and dissimilar to the data points in other groups.

Why Clustering?

Clustering determines the grouping among the unlabelled data present. There are no universal criteria for a good clustering; it depends on the criteria that fit the needs of the user.

Clustering Methods:

• Density-Based Methods

• Hierarchical-Based Methods

o Agglomerative (bottom-up approach)

o Divisive (top-down approach)

• Partitioning Methods

• Grid-Based Methods
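K-means, a classic partitioning method, illustrates the idea: points are grouped so that each one is closer to its own group's centre than to the others. A minimal Python sketch (the points and the choice of k are invented for the example):

```python
def kmeans(points, k, iters=10):
    """Plain k-means, a partitioning clustering method: alternately assign
    each point to its nearest centroid, then recompute the centroids."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centroids = points[:k]                      # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                        # assignment step
            j = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[j].append(p)
        centroids = [                           # update step
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]: two groups of three points
```

No labels are supplied anywhere: the grouping emerges from the data alone, which is what makes clustering unsupervised.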


Applications of Clustering in different fields

• Marketing

• Biology

• Insurance

• City Planning

• Earthquake studies

2.9 Probabilistic Reasoning

Probabilistic reasoning is a way of knowledge representation in which we apply the concept of probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle uncertainty. We use probability because it provides a way to handle the uncertainty that results from someone's laziness or ignorance. In the real world, there are many scenarios where the certainty of something is not confirmed, such as "it will rain today", "the behaviour of someone in some situation", or "a match between two teams or two players". These are probable sentences for which we can assume that they will happen, but we are not sure about it, so here we use probabilistic reasoning.

Need of probabilistic reasoning in AI:

• When there are unpredictable outcomes.

• When the specifications or possibilities of predicates become too large to

handle.

• When an unknown error occurs during an experiment.

• In probabilistic reasoning, there are two ways to solve problems with

uncertain knowledge:

o Bayes' rule

o Bayesian Statistics
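Bayes' rule itself is a one-line computation. The following Python sketch applies it to an invented diagnostic-test scenario (all of the probabilities are made up for illustration):

```python
def bayes(prior, likelihood, evidence):
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Invented scenario: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p_h = 0.01                                    # prior P(disease)
p_e_given_h = 0.90                            # likelihood P(positive | disease)
p_e = p_e_given_h * p_h + 0.05 * (1 - p_h)    # total probability P(positive)

posterior = bayes(p_h, p_e_given_h, p_e)      # P(disease | positive)
print(round(posterior, 3))  # 0.154 -- still low, despite the positive test
```

The rule updates a prior belief with uncertain evidence: exactly the kind of reasoning that two-valued logic cannot express.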

2.10 Bayesian Networks

A Bayesian network is also known as a Bayesian belief network, decision network or Bayesian model. It deals with probabilistic events and solves problems that involve uncertainty.


Bayesian networks are a type of probabilistic graphical model that uses Bayesian inference for probability computations. Bayesian networks aim to model conditional dependence, and therefore causation, by representing conditional dependencies as edges in a directed graph. Through these relationships, one can efficiently conduct inference on the random variables in the graph through the use of factors.

Fig 2.4: Bayesian Network example

A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional dependency, and each node corresponds to a unique random variable. Formally, if an edge (A, B) exists in the graph connecting random variables A and B, it means that P(B|A) is a factor in the joint probability distribution, so we must know P(B|A) for all values of B and A in order to conduct inference.

The Bayesian network has mainly two components:

• Causal Component

• Actual numbers

Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)), which determines the effect of the parent on that node.
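This factorization can be shown on a two-node network. In the Python sketch below (a made-up Rain → WetGrass example with invented probabilities), the joint distribution is the product of each node's conditional table, and inference proceeds by enumeration:

```python
# One table per node: P(Rain) and P(WetGrass | Rain), i.e. P(Xi | Parent(Xi)).
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True: {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    """Joint probability factored along the edge: P(R) * P(W | R)."""
    return P_rain[rain] * P_wet_given_rain[rain][wet]

# Inference by enumeration: P(Rain = true | WetGrass = true)
p_wet = joint(True, True) + joint(False, True)
print(round(joint(True, True) / p_wet, 3))  # 0.529
```

Because the edge Rain → WetGrass makes P(W|R) a factor of the joint distribution, knowing the two small tables is enough to answer any query about the network.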

Applications of Bayesian Network s:

• Medical Diagnosis

• Management efficiency

• Biotechnology



Summary

In this chapter we have learned the different techniques used in soft computing. Fuzzy systems can be used when we want to deal with uncertainty and imprecision. Adaptivity and learning abilities can be built into a system using neural computing. To find a better solution to a problem, genetic algorithms can be applied. Retrieving a pattern from memory based on its content rather than its address is called associative memory. Finding the closest resemblance to an input pattern in memory can also be done with adaptive resonance theory. Classification is based on supervised learning and is usually used for prediction, while clustering is based on unsupervised learning. Probabilistic reasoning and Bayesian networks are based on the probability of an event occurring.

Review Questions

1. Write a short note on fuzzy system.

2. What is artificial neural network? Explain its components and learning

methods.

3. Write a short note on genetic algorithms.

4. Explain the working of Adaptive Resonance Theory.

5. Write a short note on associative memory.

6. Compare classification technique with clustering technique.

7. Write a short note on probabilistic reasoning.

8. Write a short note on Bayesian Networks.

Bibliography, References and Further Reading

• https://www.coursehero.com/file/40458824/01-Introduction-to-Soft-Computing-CSE-TUBEpdf/

• https://www.geeksforgeeks.org/fuzzy-logic-introduction/

• https://www.guru99.com/what-is-fuzzy-logic.html

• https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_fuzzy_logic_systems.htm

## Page 22

• https://deepai.org/machine-learning-glossary-and-terms/neural-network

• https://www.javatpoint.com/bayesian-belief-network-in-artificial-intelligence

• https://www.javatpoint.com/probabilistic-reasoning-in-artifical-intelligence#:~:text=Probabilistic%20reasoning%20is%20a%20way,logic%20to%20handle%20the%20uncertainty

• https://www.geeksforgeeks.org/clustering-in-machine-learning/

• https://www.javatpoint.com/classification-algorithm-in-machine-learning

• https://www.geeksforgeeks.org/genetic-algorithms/

• Artificial Intelligence and Soft Computing, by Anandita Das Battacharya, SPD, 3rd edition, 2018

• Principles of Soft Computing, S.N. Sivanandam, S.N. Deepa, Wiley, 3rd edition, 2019

• Neuro-fuzzy and soft computing, J.S.R. Jang, C.T. Sun and E. Mizutani, Prentice Hall of India, 2004

## Page 23

UNIT 2

3

INTRODUCTION TO ARTIFICIAL

NEURAL NETWORK & SUPERVISED

LEARNING NETWORK I

Unit Structure

3.0 Objective

3.1 Basic Concept

3.1.1 Introduction to Artificial Neural Network

3.1.2 Overview of Biological Neural Network

3.1.3 Human Brain v/s Artificial Neural Network

3.1.4 Characteristics of ANN

3.1.5 Basic Models of ANN

3.2 Basic Models of Artificial Neural Network

3.2.1 The Model Synaptic Interconnection

3.2.2 Learning Based Model

3.2.3 Activation Function

3.3 Terminologies of ANN

3.4 McCulloch Pitts Neuron

3.5 Concept of Linear Separability

3.6 Hebb Training Algorithm

3.7 Perceptron Network

3.8 Adaptive Linear Neuron

3.8.1 Training Algorithm

3.8.2 Testing Algorithm

3.9 Multiple Adaptive Linear Neurons

3.9.1 Architecture

Review Questions

References

## Page 24

3.0 Objectives

1. The fundamentals of artificial neural network

2. Understanding between biological neuron and artificial neuron

3. Working of a basic fundamental neuron model.

4. Terminologies and terms used for better understanding of Artificial Neural

Network

5. The basics of supervised learning and perceptron learning rule

6. Overview of adaptive and multiple adaptive linear neurons

3.1 Basic Concept

Neural networks are information processing systems that are implemented to model the working of the human brain. A neural network is a computational model used to perform tasks in a better optimized way than traditional systems. The essential properties of biological neural networks are considered in order to understand the information processing tasks. This in turn allows us to design abstract models of artificial neural networks which can be simulated and analyzed.

3.1.1 Introduction to Artificial Neural Network

Artificial Neural Network (ANN) is an information processing system that shares characteristics with biological neural networks. ANNs consist of a large number of highly interconnected processing elements called nodes, units or neurons. These neurons operate in parallel. Every neuron is connected to other neurons through communication links with assigned weights, which contain information about the input signal. These processing elements are called neurons or artificial neurons.

3.1.2 Overview of Biological Neural Network

Fig 3.1: Schematic diagram of a Neuron

(Image courtesy: Ugur Halici lecture notes)

munotes.in

## Page 25

25Chapter 3: Introduction to Artificial Neural Network & Supervised Learning Network I

The human brain consists of a large number of neurons with numerous interconnections that process information. The term neural network usually refers to the biological neural network that processes and transmits information. The biological neurons are part of the nervous system.

The biological neuron consists of four major parts:

1. Soma or cell body - contains the cell nucleus. In general, processing occurs here.

2. Dendrites - branching fibres that protrude from the cell body or soma. The nerve is connected to the cell body.

3. Axon - carries the impulses of the neuron. It carries information away from the soma to other neurons.

4. Synapse - each strand of an axon terminates in a small bulb-like organ called a synapse. It is through the synapse that the neuron introduces its signals to other neurons.

Working of the neuron

1. Dendrites receive activation signals from other neurons, which form the internal state of every neuron.

2. The soma processes the incoming activation signals and converts them into an output activation signal.

3. The axon carries the signal away from the neuron and sends it to other neurons.

4. Electric impulses are passed between the synapses and the dendrites. The signal transmission involves a chemical process using neurotransmitters.

3.1.3 Human Brain v/s Artificial Neural Network

Comparison between biological and artificial neurons based on the following

criteria

1. Speed – Signals in human brain move at a speed dependent on the nerve

impulse. The biological neuron is slow in processing as compared to the

artificial neural networks which are modelled to process faster.

2. Processing - The biological neuron can perform massive parallel operations

simultaneously. A large number of simple units are organized to solve

problems independently but collectively. The artificial neurons also

respond in parallel but do not execute programmed instructions.

## Page 26

3. Size and Complexity - The size and complexity of the brain is

comparatively higher than that of artificial neural network. The size and

complexity of an ANN is different for different applications

4. Storage Capacity – The biological neuron stores information in its interconnections; in an artificial neuron it is stored in memory locations.

5. Tolerance - The biological neuron has fault-tolerant capability, whereas the artificial neuron has no fault tolerance. Biological neurons accommodate redundancies, whereas artificial neurons cannot.

6. Control mechanism - There is no control unit to monitor the information processed into the network in biological neural networks, whereas in the artificial neuron model all activities are continuously monitored by a control unit.

3.1.4 Characteristics of Artificial Neural Networks

1. It is a mathematical model consisting of computational elements implemented neurally.

2. A large number of highly interconnected processing elements, known as neurons, are prominent in an ANN.

3. The interconnections with their weights are associated with neurons.

4. The input signals arrive at the processing elements through connections

and weights.

5. ANNs' collective behavior is characterized by their ability to learn, recall

and generalize from the given data.

6. A single neuron carries no specific information.

3.1.5 How does a simple neuron work?

Fig 3.2 Architecture of a simple artificial neural net

From the figure above, two input neurons X1 and X2 transmit signals to the output neuron Y, which receives them.

munotes.in

## Page 27

27Chapter 3: Introduction to Artificial Neural Network & Supervised Learning Network I

The input neurons are connected to the output neuron over weighted interconnection links w1 and w2.

For the above neuron architecture, the net input is calculated as

yin = x1w1 + x2w2

where x1 and x2 are the activations of the input neurons X1 and X2. The output y of the output neuron Y can be obtained by applying the activation over the net input:

y = f(yin)

Output = Function (net input calculated)

The function to be applied over the net input is called the activation function.
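The calculation above can be sketched directly. The input, weight, and threshold values below are illustrative assumptions; the structure (net input followed by an activation) is exactly as in the text.

```python
# A small sketch of the net-input calculation for the two-input neuron above.
# Input, weight, and threshold values are illustrative assumptions.

def binary_step(x, theta=0.5):
    """Activation function f: output 1 when the net input reaches theta."""
    return 1 if x >= theta else 0

x1, x2 = 1, 0          # activations of the input neurons X1 and X2
w1, w2 = 0.6, 0.4      # weights on the interconnection links

y_in = x1 * w1 + x2 * w2   # net input: yin = x1*w1 + x2*w2
y = binary_step(y_in)      # output: y = f(yin)
print(y_in, y)
```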

3.2 Basic Models of Artificial Neural Network

The models of ANN are specified by the three basic entities

1. The model’s synaptic interconnections

2. The learning rules adopted for updating and adjusting the connection

weights

3. The activation functions

3.2.1. The model’s synaptic interconnections

ANN consists of a set of highly interconnected neurons connected through weights to other processing elements or to themselves. The arrangement of these processing elements and the geometry of their interconnections are important for an ANN. The arrangement of neurons to form layers and the connection pattern formed within and between layers is called the network architecture.

There are five basic neuron connection architectures:

1. Single-layer feed-forward network

2. Multilayer feed-forward network

3. Single node with its own feedback

4. Single-layer recurrent network

5. Multilayer recurrent network

munotes.in

## Page 28

28SOFT COMPUTING TECHNIQUES

1. Single-layer feed-forward network

It consists of a single layer of network where the inputs are directly connected to the outputs, one per node, with a series of various weights.

2. Multilayer feed-forward network

It consists of multiple layers: along with the input and output layers, there are hidden layers. There can be zero to many hidden layers. A hidden layer is usually internal to the network and has no direct contact with the environment.

3. Single node with its own feedback

The simplest neural network architecture, in which a single neuron gives feedback to itself.

munotes.in

## Page 29

29Chapter 3: Introduction to Artificial Neural Network & Supervised Learning Network I

4. Single-layer recurrent network

A single-layer network with feedback directed back to the node itself, to other processing elements, or both.

5. Multilayer recurrent network

A recurrent network has at least one feedback connection. The processing elements' outputs can be directed back to the nodes in a previous layer.

3.2.2 Learning

The most important property of an ANN is its capability to train or learn. Learning is basically a process by means of which a neural net adapts, adjusting or updating its connection weights in order to produce a desired response.

Learning in ANN is broadly classified into three categories

1. Supervised Learning

2. Unsupervised Learning

3. Reinforcement Learning

munotes.in

## Page 30

30SOFT COMPUTING TECHNIQUES

1. Supervised Learning

In supervised learning, it is assumed that the correct target output values are known for each input pattern. In this learning, a supervisor or teacher is needed for error minimization. The difference between the actual and desired output vectors is minimized using the error signal by adjusting the weights until the actual output matches the desired output.

2. Unsupervised Learning

In unsupervised learning, the learning is performed without the help of a teacher or supervisor. In the learning process, input vectors of similar type are grouped together to form clusters. The desired output is not given to the network. The system learns on its own from the input patterns.

3. Reinforcement Learning

Reinforcement learning is a form of supervised learning in that the network receives feedback from its environment. Here the supervisor does not present the desired output; instead, the network learns through critic information.

3.2.3 Activation Function

An activation function f is applied over the net input to calculate the output of an ANN. The choice of activation function depends on the type of problem to be solved by the network.

The most common functions are

1. Identity function: It is a linear function, defined as f(x) = x for all x.

2. Binary step function: The function can be defined as

   f(x) = 1 if x >= θ
          0 if x < θ

   Here, θ represents the threshold value.

3. Bipolar step function: The function can be defined as

   f(x) = 1 if x >= θ
          -1 if x < θ

   Here, θ represents the threshold value.

## Page 31


4. Sigmoidal functions: These functions are used in back-propagation nets. They are of two types:

   Binary sigmoid function: It is also known as the unipolar sigmoid function. It is defined by the equation

   f(x) = 1 / (1 + e^(-λx))

   Here, λ is the steepness parameter. The range of the function is from 0 to 1.

   Bipolar sigmoid function: This function is defined as

   f(x) = (1 - e^(-λx)) / (1 + e^(-λx))

   Here, λ is the steepness parameter. The range of the function is from -1 to +1.

5. Ramp function: The ramp function is defined as

   f(x) = 1 if x > 1
          x if 0 <= x <= 1
          0 if x < 0
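The five activation functions listed above can be sketched as plain Python functions; `lam` stands for the steepness parameter λ and `theta` for the threshold θ. The default parameter values are assumptions for illustration.

```python
# Sketches of the common activation functions; lam is the steepness
# parameter (λ) and theta the threshold (θ).
import math

def identity(x):
    return x

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    return 1 if x >= theta else -1

def binary_sigmoid(x, lam=1.0):      # range (0, 1)
    return 1.0 / (1.0 + math.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):     # range (-1, +1)
    return (1.0 - math.exp(-lam * x)) / (1.0 + math.exp(-lam * x))

def ramp(x):
    if x > 1:
        return 1.0
    if x < 0:
        return 0.0
    return x
```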

The graphical representation is shown below for all the activation functions.

## Page 32

3.3 Terminologies of ANN

3.3.1 Weights

Weight is a parameter which contains information about the input signal. This

information is used by the net to solve a problem.

In the ANN architecture, every neuron is connected to other neurons by means of directed communication links, and every link is associated with a weight. Wij is the weight from processing element 'i' (source node) to processing element 'j' (destination node).

3.3.2 Bias (b)

## Page 33

The bias is a constant value included in the network. Its impact is seen in calculating the net input. The bias is included by adding a component x0 = 1 to the input vector X.

Bias can be positive or negative. The positive bias helps in increasing the net

input of the network. The negative bias helps in decreasing the net input of the

network.

3.3.3 Threshold (θ)

Threshold is a set value used in the activation function. In an ANN, the activation functions are defined based on the threshold value and the output is calculated.

3.3.4 Learning Rate (α)

The learning rate is used to control the amount of weight adjustment at each step of training. The learning rate ranges from 0 to 1. It determines the rate of learning at each time step.

3.4 McCulloch-Pitts Neuron (MP neuron model)

The MP neuron model was the earliest neural network model, proposed by Warren McCulloch and Walter Pitts in 1943. It is also known as a Threshold Logic Unit. The M-P neurons are connected by directed weighted paths. The activation of this model is binary. The weights associated with the communication links may be excitatory (weight is positive) or inhibitory (weight is negative). Each neuron has a fixed threshold: if the net input to the neuron is greater than or equal to the threshold then the neuron fires, otherwise it does not.
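A classic illustration of the fixed-threshold behaviour is an M-P neuron realizing the logical AND function. The weight and threshold values below are a standard choice, assumed here for illustration: with excitatory weights of 1 on both inputs, a threshold of 2 makes the neuron fire only when both inputs are 1.

```python
# A sketch of a McCulloch-Pitts neuron realizing the logical AND function.
# Weights of 1 and a threshold of 2 are illustrative assumptions.

def mp_neuron(inputs, weights, threshold):
    """Fire (output 1) if the net input reaches the fixed threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

AND = lambda x1, x2: mp_neuron([x1, x2], weights=[1, 1], threshold=2)
print([AND(0, 0), AND(0, 1), AND(1, 0), AND(1, 1)])  # [0, 0, 0, 1]
```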

3.5 Concept of Linear Separability

Concept: Sets of points in 2-D space are linearly separable if the points can be separated by a straight line.

In ANN, linear separability is the concept wherein the separation is based on the network response being positive or negative. A decision line is drawn to separate the positive and negative responses. This decision line is called the linear-separable line.

## Page 34

Fig 3.3: Linear Separable Patterns

The linear separability of the network is based on the decision-boundary line. If there exist weights for which all training data with the correct positive response (+1) lie on one side of the decision boundary line and all other data lie on the other side, the problem is linearly separable.
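The decision-line test above can be sketched as a check that a given line w1*x1 + w2*x2 + b = 0 assigns the correct side to every pattern. The particular line and the bipolar AND patterns below are illustrative assumptions.

```python
# A sketch of the decision-line test for linear separability: a line
# separates the patterns when every +1 pattern yields a positive response
# and every -1 pattern a non-positive one. The line and patterns here are
# illustrative assumptions.

def separates(w, b, patterns):
    """True when the decision line gives the correct sign for every pattern."""
    for (x1, x2), target in patterns:
        response = w[0] * x1 + w[1] * x2 + b
        if (response > 0) != (target > 0):
            return False
    return True

# Bipolar AND patterns: only (1, 1) lies on the positive side.
and_patterns = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(separates([1, 1], -1, and_patterns))  # True
```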

3.6 Hebb Network

Hebb learning, or the Hebb learning rule stated by Donald Hebb in 1949, holds that learning is performed by a change in the synaptic gap. Explaining further, he stated: "When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

In Hebb learning, if two interconnected neurons are 'ON' simultaneously then the weights associated with these neurons can be increased by changing the strength of the synaptic gap.

The weight update is given by

wi(new) = wi(old) + xi y

Flowchart of the training algorithm:

## Page 35

Fig 3.4: Flowchart of Hebb training algorithm
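The training loop of the flowchart can be sketched on the bipolar AND function, applying the update wi(new) = wi(old) + xi*y once per training pair (with the bias treated as a weight on a constant input of 1, a common convention assumed here).

```python
# A sketch of Hebb training on the bipolar AND function, applying
# w_i(new) = w_i(old) + x_i * y (and b(new) = b(old) + y) for each pair.

training_pairs = [  # (x1, x2, target y) in bipolar form
    ( 1,  1,  1),
    ( 1, -1, -1),
    (-1,  1, -1),
    (-1, -1, -1),
]

w1 = w2 = b = 0
for x1, x2, y in training_pairs:
    w1 += x1 * y       # Hebb weight update
    w2 += x2 * y
    b  += y            # bias treated as a weight on the constant input 1

print(w1, w2, b)       # 2 2 -2
```

The resulting net input 2*x1 + 2*x2 - 2 is positive only for the input (1, 1), so one pass over the four pairs is enough to realize AND.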

3.7 Perceptron Networks

Perceptron networks are single-layer feed-forward networks. They are the simplest perceptrons.

A perceptron consists of three units: an input unit (sensory unit), a hidden unit (associator unit) and an output unit (response unit). The input units are connected to the hidden units with fixed weights having values 1, 0 or -1 assigned at random. The binary activation function is used in the input and hidden units. The response unit has an activation of 1, 0 or -1. The output signals sent from the hidden unit to the output unit are binary.

The output of the perceptron network is given by y = f(yin), where f is the activation function and yin is the net input.

munotes.in

## Page 36

36SOFT COMPUTING TECHNIQUES

Fig 3.5: Perceptron model

Perceptron Learning algorithm

The training of a perceptron is a supervised learning algorithm. The algorithm can be used for either bipolar or binary input vectors, with a fixed threshold and variable bias.

The output is obtained by applying the activation function over the calculated net input.

The weights are adjusted to minimize error when the output does not match the desired output.
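The adjustment described above can be sketched with the standard perceptron rule wi = wi + alpha * t * xi, applied only when the output differs from the target. The learning rate, threshold, and bipolar AND training set below are illustrative assumptions.

```python
# A sketch of the perceptron learning rule on the bipolar AND problem.
# When the output differs from the target t, the weights are adjusted by
# w_i = w_i + alpha * t * x_i; alpha and theta are illustrative assumptions.

def step(net, theta=0.0):
    if net > theta:
        return 1
    if net < -theta:
        return -1
    return 0

data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w = [0.0, 0.0]
b = 0.0
alpha = 1.0

changed = True
while changed:                       # repeat until an epoch makes no change
    changed = False
    for (x1, x2), t in data:
        y = step(x1 * w[0] + x2 * w[1] + b)
        if y != t:                   # mismatch: adjust weights toward target
            w[0] += alpha * t * x1
            w[1] += alpha * t * x2
            b    += alpha * t
            changed = True

print(w, b)
```

Because AND is linearly separable, the loop terminates with weights that classify all four patterns correctly.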

## Page 37

3.8 Adaptive Linear Neuron (ADALINE)

It is a network with a single linear unit. Units with linear activation functions are called linear units. In Adaline, the input-output relationship is linear. Adaline networks are trained using the delta rule.

Adaline is a single-unit neuron, which receives input from several units and also from one unit called the bias. An Adaline model consists of trainable weights. The inputs are of two values (+1 or -1) and the weights have signs (positive or negative).

Initially, random weights are assigned. The net input calculated is applied to a quantizer transfer function (an activation function) that restores the output to +1 or -1. The Adaline model compares the actual output with the target output and adjusts all the weights and the bias.

3.8.1 Training Algorithm

The Adaline network training algorithm is as follows:

Step 0: Set the weights and bias to some random values, but not zero. Set the learning rate parameter α.

Step 1: Perform Steps 2-6 while the stopping condition is false.

Step 2: Perform Steps 3-5 for each bipolar training pair s:t.

Step 3: Set the activations for the input units i = 1 to n.

Step 4: Calculate the net input to the output unit.

Step 5: Update the weights and bias for i = 1 to n.

Step 6: If the highest weight change that occurred during training is smaller than a specified tolerance then stop the training process, else continue. This is the test for the stopping condition of the network.
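Steps 2-5 can be sketched with the delta rule, where each weight change is alpha * (t - yin) * xi. For simplicity this sketch runs a fixed number of epochs instead of the weight-change test of Step 6; the learning rate, initial weights, epoch count, and the bipolar AND training pairs are illustrative assumptions.

```python
# A sketch of Adaline training with the delta rule on bipolar AND pairs:
# each step applies the change alpha * (t - y_in) * x_i, which reduces the
# mean squared error. A fixed epoch count replaces the Step 6 test here;
# all numeric values are illustrative assumptions.

data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w = [0.1, 0.1]          # Step 0: small random non-zero weights and bias
b = 0.1
alpha = 0.1

for epoch in range(20):
    for (x1, x2), t in data:
        y_in = x1 * w[0] + x2 * w[1] + b     # Step 4: net input
        err = t - y_in
        w[0] += alpha * err * x1             # Step 5: delta-rule updates
        w[1] += alpha * err * x2
        b += alpha * err

# Testing: a quantizer (step function) restores the output to +1 or -1.
outputs = [1 if x1 * w[0] + x2 * w[1] + b >= 0 else -1 for (x1, x2), _ in data]
print(outputs)
```

Unlike the perceptron rule, the delta rule keeps adjusting the weights toward the least-squares solution even after the quantized outputs are already correct.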

3.8.2 Testing Algorithm

It is essential to test a network that has been trained. When the training has been completed, the Adaline can be used to classify input patterns. A step function is used to test the performance of the network. The testing procedure for the Adaline network is as follows:

Step 0: Initialize the weights. (The weights are obtained from the training algorithm.)

## Page 38

Step 1: Perform Steps 2-4 for each bipolar input vector x.

Step 2: Set the activations of the input units to x.

Step 3: Calculate the net input to the output unit.

Step 4: Apply the activation function over the net input calculated.

3.9 Multiple Adaptive Linear Neurons (Madaline)

It consists of many Adalines in parallel with a single output unit whose value is based on certain selection rules. It uses the majority vote rule: on using this rule, the output unit's answer is either true or false. On the other hand, if the AND rule is used, the output is true if and only if both inputs are true, and so on.

The training process of Madaline is similar to that of Adaline.

3.9.1 Architecture

It consists of "n" units in the input layer, "m" units in the Adaline layer and "1" unit in the Madaline layer. Each neuron in the Adaline and Madaline layers has a bias of excitation "1". The Adaline layer is present between the input layer and the Madaline layer; the Adaline layer is considered the hidden layer.

Fig 3.6: Architecture of Madaline layer

## Page 39

Review Questions

1. Define the term Artificial Neural Network .

2. List and explain the main components of biological neuron.

3. Mention the characteristics of an artificial neural network.

4. Compare the similarities and differences between biological and artificial

neuron.

5. What are the basic models of an artificial neural network?

6. List and explain the commonly used activation functions.

7. Define the following

a. Weights

b. Bias

c. Threshold

d. Learning rate

8. Write a short note on McCulloch Pitts Neuron model.

9. Discuss the concept of linear separability.

10. State the training algorithm used for the Hebb learning networks.

11. Explain perceptron network.

12. What is Adaline? Draw the model of an Adaline network.

13. How is Madaline network formed?

REFERENCES

1. "Principles of Soft Computing", by S.N. Sivanandam and S.N. Deepa, 2019, Wiley Publication, Chapter 2 and 3

2. http://www.sci.brooklyn.cuny.edu/ (Artificial Neural Networks, Stephen Lucci PhD)

3. Related documents, diagrams from blogs, e-resources from RC Chakraborty lecture notes and tutorialspoint.com.

## Page 40

Unit 2

4 SUPERVISED LEARNING NETWORK II AND

ASSOCIATIVE MEMORY NETWORK

Unit Structure

4.0 Objective

4.1 Backpropagation Network

4.2 Radial Basis Function

4.3 Time Delay Neural Network

4.4 Functional Link Network

4.5 Tree Neural Network

4.6 Wavelet Neural Network

4.7 Associative Memory Networks - Overview

4.8 Autoassociative Memory Network

4.9 Heteroassociative Memory Network

4.10 Bi-directional Associative Memory

4.11 Hopfield Networks

4.0 Objectives

1. To understand Back -propagation networks used in real time application.

2. Theory behind radial basis network and its activation function

3. Special supervised learning networks such as time delay neural networks, functional link networks, tree neural networks and wavelet neural networks

4. Details and understanding of associative memory and its types

5. Hopfield networks and their training algorithm.

6. An overview of iterative autoassociative and temporal associative memory

## Page 41

4.1 Backpropagation Networks

Backpropagation is applied to multi-layer feed-forward networks consisting of processing elements with differentiable activation functions. The networks associated with the back-propagation learning algorithm are known as back-propagation networks (BPNs). The algorithm uses the gradient descent method to calculate the error and propagate it back to the hidden units.

The training of a BPN is performed in three stages:

1. The feed-forward of the input training pattern

2. The calculation and back-propagation of the error

3. Weight updates

Fig. 4.1: Architecture of Backpropagation network (Image:guru99.com)

1. A back-propagation neural network is a multilayer, feed-forward neural network consisting of an input layer, a hidden layer and an output layer.

2. The neurons present in the hidden and output layers have biases, which are connections from units whose activation is always 1.

3. The bias terms also act as weights.

4. During the learning phase, error signals are sent in the reverse direction.

5. The output obtained can be either binary or bipolar.
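The three training stages can be sketched for a single pattern on a tiny 2-2-1 network with sigmoid units. The initial weights, input, target, and learning rate below are illustrative assumptions; the point is that one feed-forward pass, one back-propagation of the error, and one gradient-descent weight update reduce the output error.

```python
# A minimal sketch of the three BPN stages on a 2-2-1 network: feed-forward,
# back-propagation of the error, and a weight update. Initial weights, input,
# target, and learning rate are illustrative assumptions.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = [0.5, 0.9]                 # input pattern
t = 1.0                        # target output
V = [[0.2, -0.3], [0.4, 0.1]]  # V[j][i]: input i -> hidden j weights
W = [0.3, -0.2]                # hidden j -> output weights
alpha = 0.5

def forward():
    z = [sigmoid(sum(V[j][i] * x[i] for i in range(2))) for j in range(2)]
    y = sigmoid(sum(W[j] * z[j] for j in range(2)))
    return z, y

# Stage 1: feed-forward
z, y = forward()
# Stage 2: back-propagate the error (deltas use the sigmoid derivative)
delta_out = (t - y) * y * (1 - y)
delta_hid = [delta_out * W[j] * z[j] * (1 - z[j]) for j in range(2)]
# Stage 3: update the weights by gradient descent
for j in range(2):
    W[j] += alpha * delta_out * z[j]
    for i in range(2):
        V[j][i] += alpha * delta_hid[j] * x[i]

_, y_new = forward()
print(abs(t - y), ">", abs(t - y_new))   # the error shrinks after the update
```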

4.2 Radial Basis Function network

The radial basis function network is a classification and functional approximation neural network. It uses non-linear activation functions such as sigmoidal and Gaussian

## Page 42

functions. Since radial basis function networks have only one hidden layer, the convergence of optimization is much faster.

1. The architecture consists of two layers.

2. The output nodes form a linear combination of the basis functions computed by the radial basis function nodes. The hidden layer generates a signal corresponding to an input vector in the input layer, and, corresponding to this signal, the network generates a response.

Fig.4.2: Architecture of Radial Basis functions
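The two-layer structure described above can be sketched as a forward pass: Gaussian hidden units respond to the distance between the input and stored centers, and the output node forms a linear combination of those responses. The centers, width, and output weights below are illustrative assumptions.

```python
# A sketch of an RBF network's forward pass: Gaussian hidden units centered
# on stored prototypes, followed by a linear output combination. The centers,
# width, and output weights are illustrative assumptions.
import math

centers = [[0.0, 0.0], [1.0, 1.0]]   # radial basis function centers
sigma = 0.5                          # width of each Gaussian
w_out = [0.8, -0.4]                  # linear output weights
bias = 0.1

def gaussian(x, c):
    """Gaussian basis response, largest (1.0) when x equals the center c."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / (2 * sigma ** 2))

def rbf_output(x):
    phi = [gaussian(x, c) for c in centers]      # hidden-layer signals
    return bias + sum(w * p for w, p in zip(w_out, phi))

print(round(rbf_output([0.0, 0.0]), 4))
```

Only the output weights enter the response linearly, which is why fitting them is a linear problem and the optimization converges quickly.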

4.3 Time Delay Neural Networks

Time delay neural networks are basically feed-forward neural networks, except that the input weights have a tapped delay line associated with them. In a TDNN, when the output is fed back through a unit delay into the input layer, the net computed is equivalent to an infinite impulse response filter.

A neuron with a tapped delay line is called a time delay neural network unit, and a network which consists of TDNN units is called a time delay neural network. An application of TDNNs is speech recognition.

4.4 Functional Link Networks

A functional link network is a specifically designed high-order neural network with low complexity for handling linearly non-separable problems. It has no hidden layers. This model is useful for learning continuous functions.

## Page 43

The most common example of linear non-separability is the XOR problem.

Fig 4.3: Functional link network model with no hidden layer

4.5 Tree Neural Networks

These networks are basically used for pattern recognition problems. They use a multilayer neural network at each decision-making node of a binary classification tree for extracting a non-linear feature.

The decision nodes are circular nodes and the terminal nodes are square nodes. The splitting rule decides whether the pattern moves to the right or left.

The algorithm consists of two phases:

1. The growing phase - A large tree is grown in this phase by recursively finding the rules for splitting until all the terminal nodes have nearly pure membership or cannot be split further.

2. The tree pruning phase - To avoid overfitting of the data, a smaller tree is selected, i.e., the large tree is pruned.

## Page 44

Example - Tree neural networks can be used for the waveform recognition problem.

Fig 4.4: Binary Classification tree

4.6 Wavelet Neural Networks

These networks are based on wavelet transform theory. They are useful for function approximation through wavelet decomposition. The network involves rotation, dilation and translation; if a wavelet lies on the same line, the unit is called a wavelon instead of a neuron.

Fig 4.5: Wavelet Neural network with translation, rotation,

dilation and wavelon

## Page 45

4.7 Associative Memory Networks - Overview

1. An associative memory is a content-addressable memory structure that maps a set of input patterns to output patterns. It can store a set of patterns as memories. Recall is through association of the key pattern with the help of the memorized information. An associative memory makes a parallel search within a stored data file. The concept behind this type of search is to retrieve the stored data either completely or partially.

2. A content-addressable structure refers to a memory organization where the memory is accessed by its content. Associative memory is of two types, autoassociative memory and heteroassociative memory, which are single-layer nets where the weights are determined so that the net stores a set of pattern associations. The architecture of an associative net is either feed-forward or iterative.

4.8 Autoassociative Memory Network

1. In this network, the training input and target output vectors are the same.

2. The determination of weights is called the storing of vectors.

3. The weights on the diagonal can be set to zero.

4. This increases the net's ability to generalize.

5. The net's performance is based on its ability to reproduce a stored pattern from a noisy input.

Architecture

For an autoassociative net, the training input and target output vectors are the same. The input layer consists of n input units and the output layer also consists of n output units. The input and output layers are connected through weighted interconnections.

Fig 4.6: Autoassociative network

## Page 46

4.8.1 Training Algorithm
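The training can be sketched with the Hebb outer-product rule: starting from zero weights, each bipolar pattern s contributes s_i * s_j to weight W[i][j], and recall applies a bipolar step to the net input W x. The stored pattern and the noisy probe below are illustrative assumptions.

```python
# A sketch of autoassociative storage and recall: weights start at zero and
# each bipolar pattern s is stored with the Hebb outer-product rule
# W[i][j] += s[i] * s[j]; recall applies a bipolar step to W x.

def train_auto(patterns, n):
    W = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                W[i][j] += s[i] * s[j]     # outer-product weight update
    return W

def recall(W, x):
    n = len(x)
    net = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [1 if v >= 0 else -1 for v in net]

stored = [1, 1, -1, -1]
W = train_auto([stored], 4)
noisy = [1, -1, -1, -1]                    # one component flipped
print(recall(W, noisy))                    # recovers the stored pattern
```

This illustrates point 5 above: the net reproduces the stored pattern even when the input key is a noisy version of it.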

4.9 Heteroassociative Memory Network

1. In this network, the training input and the target output vectors are different.

2. The determination of weights is done using either the Hebb rule or the delta rule.

3. The net finds an appropriate output vector corresponding to an input vector x, which may be either one of the stored patterns or a new pattern.

Architecture

The input layer consists of n number of input units and the output layer consists of

m number of output units. There is a weighted connection between the input and

output layers. Here, the input and output are not correlated with each other.

Fig 4.7: Heteroassociative network

## Page 47

4.10 Bidirectional Associative Memory (BAM)

1. The BAM network performs forward and backward associative searches for stored stimulus responses.

2. It is a type of recurrent heteroassociative pattern-matching network that encodes patterns using the Hebbian learning rule.

3. BAM neural nets can respond in either direction, from the input or the output layer.

4. It consists of two layers of neurons which are connected by directed weighted path connections.

5. The network dynamics involve interaction between the two layers until all the neurons reach equilibrium.

Fig: 4 .8 Bidirectional associative memory net

4.11 Hopfield Networks

1. These networks were developed by John J. Hopfield.

2. Through his work, he promoted the construction of hardware chips.

3. These networks are applied to associative memory and optimization problems.

4. They are basically of two types: discrete and continuous Hopfield networks.

Discrete Hopfield networks - The discrete Hopfield network is an autoassociative, fully interconnected, single-layer feedback network with fixed weights. It works in a discrete fashion. The network takes two-valued inputs, binary or bipolar. In this network, only one unit updates its activation at a time. The usefulness of content-addressable memory is realized by the discrete Hopfield net.

Continuous Hopfield networks - In this network, time is considered to be a continuous variable. These networks are used for solving optimization problems such as the travelling salesman problem. They can be realized as an electronic circuit. The nodes of these Hopfield networks have continuous graded output. The total energy of the network decreases continuously with time.
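The discrete variant can be sketched with symmetric Hebbian weights (zero diagonal, i.e., no self-connections) and asynchronous updates, in which one unit at a time recomputes its activation until the state stops changing. The stored pattern and noisy probe below are illustrative assumptions.

```python
# A sketch of a discrete Hopfield net: symmetric weights with zero diagonal
# store a bipolar pattern, and units update their activations one at a time
# (asynchronously) until the state stops changing.

def hopfield_weights(patterns, n):
    W = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                 # no self-connections
                    W[i][j] += s[i] * s[j]
    return W

def hopfield_recall(W, x):
    x = list(x)
    n = len(x)
    changed = True
    while changed:
        changed = False
        for i in range(n):                 # only one unit updates at a time
            net = sum(W[i][j] * x[j] for j in range(n))
            new = 1 if net >= 0 else -1
            if new != x[i]:
                x[i] = new
                changed = True
    return x

stored = [1, -1, 1, -1]
W = hopfield_weights([stored], 4)
print(hopfield_recall(W, [1, -1, -1, -1]))   # settles on the stored pattern
```

Each asynchronous flip lowers the network energy, so the loop always settles into a stable state, here the stored pattern.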

## Page 48

QUESTIONS

1. Define Content addressable memory

2. What are the two main types of associative memory?

3. What are Back Propagation networks?

4. Explain the architecture and working of radial basis function networks.

5. What is Bidirectional associative memory network?

6. Write a short note on Hopfield network.

REFERENCES

1. “Principles of Soft Computing”, S.N. Sivanandam and S.N. Deepa, Wiley Publication, 2019, Chapters 3 and 4.

2. Related documents and diagrams from blogs, and e-resources from R.C. Chakraborty's lecture notes.


Unit 3

5 UNSUPERVISED LEARNING

Unit Structure

5.0 Introduction

5.1 Fixed Weight Competitive Nets

5.2 Mexican Hat Net

5.3 Hamming Network

5.4 Kohonen Self-Organizing Feature Maps

5.5 Kohonen Self-Organizing Motor Map

5.6 Learning Vector Quantization (LVQ)

5.7 Counterpropagation Networks

5.8 Adaptive Resonance Theory Network

5.0 Introduction

In this learning, there exists no feedback from the system (environment) to indicate

the desired outputs of a network. The network by itself should discover any

relationships of interest, such as features, patterns, contours, correlations or

categories, classification in the input data, and thereby translate the discovered

relationships into outputs. Such networks are also called self -organizing networks.

An unsupervised learning network can judge how similar a new input pattern is to typical

patterns already seen, and the network gradually learns what similarity is; the

network may construct a set of axes along which to measure similarity to previous

patterns, i.e., it performs principal component analysis, clustering, adaptive vector

quantization and feature mapping.

For example, when a net has been trained to classify the input patterns into any one of the output classes, say, P, Q, R, S or T, the net may respond to both the classes, P and Q or R and S. In the case mentioned, only one of several neurons should fire, i.e., respond. Hence the network has an added structure by means of which the net

is forced to make a decision, so that only one unit will respond. The process for

achieving this is called competition. Practically, considering a set of students, if we

want to classify them on the basis of evaluation performance, their score may be

calculated, and the one whose score is higher than the others should be the winner.

The same principle adopted here is followed in the neural networks for pattern

classification. In this case, there may exist a tie; a suitable solution is presented

even when a tie occurs. Hence these nets may also be called competitive nets; the extreme form of these competitive nets is called winner-take-all.

The name itself implies that only one neuron in the competing group will possess

a nonzero output signal at the end of competition.

There exist several neural networks that come under this category. To list out a

few: Maxnet, Mexican hat, Hamming net, Kohonen self-organizing feature map, counterpropagation net, learning vector quantization (LVQ) and adaptive resonance theory (ART).

The learning algorithm used in most of these nets is known as Kohonen learning. In this learning, the units update their weights by forming a new weight vector, which is a linear combination of the old weight vector and the new input vector. Also, the learning continues for the unit whose weight vector is closest to the input vector. The weight updation formula used in Kohonen learning for output cluster unit j is given as

w_j(new) = w_j(old) + α[x − w_j(old)]

where x is the input vector, w_j the weight vector of unit j, and α the learning rate, which decreases monotonically as training continues.

winner of the network during competition. One of the methods for determining the

winner uses the square of the Euclidean distance between the input vector and

weight vector, and the unit whose weight vector is at the smallest Euclidean

distance from the input vector is chosen as the winner. The next method uses the

dot product of the input vector and weight vector. The dot product between the

input vector and weight vector is nothing but the net input calculated for the corresponding cluster units. The unit with the largest dot product is chosen as the

winner and the weight updation is performed over it because the one with largest

dot product corresponds to the smallest angle between the input and weight vectors, if both are of unit length.

5.1 Fixed Weight Competitive Nets

These competitive nets are those where the weights remain fixed, even during the training process. The idea of competition is used among neurons for enhancement of contrast in their activation functions. These are Maxnet, Mexican hat and Hamming net.

Maxnet

The Maxnet serves as a subnet for picking the node whose input is the largest.

Architecture of Maxnet

The architecture of Maxnet is shown in Figure 5-1, where fixed symmetrical weights are present over the weighted interconnections. The weights between the neurons are inhibitory and fixed. The Maxnet with this structure can be used as a subnet to select a particular node whose net input is the largest.

Figure 5.1 Maxnet Structure


Testing/Application Algorithm of Maxnet:

Step 0: Initial weights and initial activations are set. The weight ε is set such that 0 < ε < 1/m, where m is the total number of nodes. Let

x_j(0) = input to the node X_j

and

w_ij = 1 if i = j; w_ij = −ε if i ≠ j

Step 1: Perform Steps 2-4 when the stopping condition is false.

Step 2: Update the activations of each node. For j = 1 to m,

x_j(new) = f[ x_j(old) − ε Σ_{k≠j} x_k(old) ]

where f is the ramp function, f(x) = x for x > 0 and f(x) = 0 otherwise.

Step 3: Save the activations obtained for use in the next iteration. For j = 1 to m,

x_j(old) = x_j(new)

Step 4: Finally, test the stopping condition for convergence of the network. The following is the stopping condition: If more than one node has a nonzero activation, continue; else stop.
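Steps 0-4 above can be sketched in a few lines of Python. The epsilon default and the test input are illustrative, and the sketch assumes the inputs have a unique maximum so that the competition terminates with a single active node.

```python
# Minimal Maxnet sketch: mutual inhibition with fixed weight -eps,
# iterated until only one node keeps a nonzero activation.
# Assumption: 0 < eps < 1/m and a unique maximum input.

def maxnet(inputs, eps=None):
    m = len(inputs)
    eps = eps if eps is not None else 1.0 / (2 * m)   # satisfies 0 < eps < 1/m
    x = list(inputs)
    while sum(1 for v in x if v > 0) > 1:             # Step 4 stopping condition
        total = sum(x)
        # Step 2: x_j(new) = f[x_j(old) - eps * sum of the other activations]
        x = [max(0.0, v - eps * (total - v)) for v in x]
    return x

# The unit with the largest initial input is the only one left active.
result = maxnet([0.2, 0.4, 0.6, 0.8])
print(result)
```

After convergence only the winner retains a positive activation, which is why the Hamming net (Section 5.3) can use Maxnet as its winner-picking subnet.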

5.2 Mexican Hat Net

In 1989, Kohonen developed the Mexican hat network, which is a more generalized contrast enhancement network compared to the earlier Maxnet. There exist several "cooperative neighbors" (neurons in close proximity) to which every neuron is connected by excitatory links. Also, each neuron is connected over inhibitory weights to a number of "competitive neighbors" (neurons present farther away). There are several other farther neurons to which the connections between the neurons are not established. Here, in addition to the connections within a particular layer of the neural net, the neurons also receive some other external signals.

This interconnection pattern is repeated for several other neurons in the layer.


5.2.1 Architecture of Mexican Hat Net

The architecture of the Mexican hat is shown in Figure 5-2, with the interconnection pattern for node Xi. The

neurons here are arranged in a linear order, having positive connections between Xi and near neighboring units, and negative connections between Xi and farther away neighboring units. The positive connection region is called the region of cooperation and the negative connection region is called the region of competition. The size of these regions depends on the relative magnitudes existing between the positive and negative weights and also on the topology of regions such as linear, rectangular, hexagonal grids, etc. In the Mexican hat, there exist two symmetric regions around each individual neuron.

The individual neuron in Figure 5-2 is denoted by Xi. This neuron is surrounded by the other neurons Xi+1, Xi-1, Xi+2, Xi-2, .... The nearest neighbors to the individual neuron Xi are Xi+1, Xi-1, Xi+2 and Xi-2. Hence, the weights associated with these are considered to be positive and are denoted by w1 and w2. The farthest neighbors to the individual neuron Xi are taken as Xi+3 and Xi-3; the weights associated with these are negative and are denoted by w3. It can be seen that Xi+4 and Xi-4 are not connected to the individual neuron Xi, and therefore no weighted interconnections exist between these connections. To make it easier, the units present within a radius of 2 from the unit Xi are connected with positive weights, the units within radius 3 are connected with negative weights, and the units present farther away than radius 3 are not connected in any manner to the neuron Xi.

Figure 5.2 Structure of Mexican Hat


5.2.2 Flowchart of Mexican Hat Net

The flowchart for the Mexican hat is shown in Figure 5-3. This clearly depicts the flow of the process performed in the Mexican Hat Network.

Figure 5.3. Flowchart of Mexican Hat


5.2.3 Algorithm of Mexican Hat Net:

The various parameters used in the training algorithm are as shown below.

R2 = radius of region of interconnections; X_{i+k} and X_{i-k} are connected to the individual unit X_i for k = 1 to R2.

R1 = radius of region with positive reinforcement (R1 < R2)

w_k = weight between X_i and the units X_{i+k} and X_{i-k}:
for 0 <= k <= R1, w_k is positive;
for R1 < k <= R2, w_k is negative.

s = external input signal

x = vector of activations

x_old = vector of activations at the previous time step

t_max = total number of iterations of contrast enhancement.

Step 0: Initialize the parameters R1, R2 and t_max, the weights and the vector x_old.

Step 1: Present the external input signal s, i.e., x_i = s_i, save the activations in x_old and set the iteration counter t = 1. Here the iteration is started only with the incoming of the external signal presented to the network.

Step 2: When t is less than t_max, perform Steps 3-7.

Step 3: Calculate the net input. For i = 1 to n,

x_i = c1 Σ_{k=-R1}^{R1} x_old(i+k) + c2 Σ_{k=-R2}^{-R1-1} x_old(i+k) + c2 Σ_{k=R1+1}^{R2} x_old(i+k)

where c1 is the (positive) cooperative weight and c2 the (negative) competitive weight.

Step 4: Apply the activation function. For i = 1 to n,

x_i = min[ x_max, max(0, x_i) ]

Step 5: Save the current activations in x_old, i.e., for i = 1 to n,

x_old(i) = x_i

Step 6: Increment the iteration counter:

t = t + 1

Step 7: Test for the stopping condition. The following is the stopping condition: If t < t_max, then continue; else stop.

The positive reinforcement here has the capacity to increase the activation of units with larger initial activations, and the negative reinforcement has the capacity to reduce the activation of units with smaller initial activations. The activation function used here for unit X_i at a particular time instant t is given by

x_i(t) = f[ s_i(t) + Σ_k w_k x_{i+k}(t − 1) ]

The terms present within the summation symbol are the weighted signals that arrived from other units at the previous time step.
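One contrast-enhancement step of the algorithm above can be sketched in pure Python. The choices R1 = 1, R2 = 2, c1 = 0.6, c2 = -0.4, the clamping ceiling x_max and the test signal are all illustrative assumptions, not values from the text.

```python
# One Mexican-hat step (Steps 3-4 above) on a 1-D line of units.
# Out-of-range neighbours contribute zero; activation clamps to [0, x_max].
# The weights c1 (excitatory) and c2 (inhibitory) are illustrative.

def mexican_hat_step(x_old, c1=0.6, c2=-0.4, r1=1, r2=2, x_max=2.0):
    n = len(x_old)
    def at(i):                            # neighbour lookup with zero padding
        return x_old[i] if 0 <= i < n else 0.0
    x_new = []
    for i in range(n):
        net = sum(c1 * at(i + k) for k in range(-r1, r1 + 1))       # cooperation
        net += sum(c2 * at(i + k) for k in range(-r2, -r1))         # competition
        net += sum(c2 * at(i + k) for k in range(r1 + 1, r2 + 1))
        x_new.append(min(x_max, max(0.0, net)))                     # Step 4 clamp
    return x_new

signal = [0.0, 0.5, 0.8, 1.0, 0.8, 0.5, 0.0]
enhanced = mexican_hat_step(signal)
# The central peak is reinforced while the flanks are suppressed.
print(enhanced)
```

Repeating this step t_max times sharpens the bump further, which is the contrast enhancement the network is designed for.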

5.3 Hamming Network

The Hamming network selects stored classes that are at a minimum Hamming distance (H) from the noisy vector presented at the input (Lippmann, 1987). The vectors involved in this case are binary or bipolar. The Hamming network is a maximum likelihood classifier that determines which of several exemplar vectors (the weight vector for an output unit in a clustering net is the exemplar vector or code book vector for the pattern of inputs which the net has placed on that cluster unit) is most similar to an input vector (represented as an n-tuple). The weights of the net are determined by the exemplar vectors. The difference between the total number of components and the Hamming distance between the vectors gives the measure of similarity between the input vector and stored exemplar vectors. It has already been discussed that the Hamming distance between two vectors is the number of components in which the vectors differ.

Consider two bipolar vectors x and y; we use the relation

x . y = a - d

where a is the number of components in which the vectors agree and d the number of components in which the vectors disagree. The value d is the Hamming distance existing between the two vectors. Since the total number of components is n, we have

n = a + d

i.e., d = n − a

On simplification, we get

x · y = a − d
x · y = a − (n − a)
x · y = 2a − n
2a = x · y + n
a = (1/2)(x · y) + (1/2)n

From the above equation, it is clearly understood that the weights can be set to one-

half the exemplar vector and bias can be set initially to n/2. By calculating the unit

with the largest net input, the net is able to locate a particular unit that is closest to

the exemplar. The unit with the largest net input is obtained by the Hamming net

using Maxnet as its subnet.

5.3.1 Architecture of Hamming Network:

The architecture of the Hamming network is shown in Figure 5-4. The Hamming network consists of two layers. The first layer computes the difference between the total number of components and the Hamming distance between the input vector x and the stored pattern of vectors in the feed-forward path. The strong response of a neuron in this layer is an indication of the minimum Hamming distance value between the input and the category which this neuron represents. The second layer of the Hamming network is composed of Maxnet (used as a subnet) or a winner-take-all network, which is a recurrent network. The Maxnet is found to suppress the values at the Maxnet output nodes except the initially maximum output node of the first layer.

## Page 58

57Chapter 5: Unsupervised Learning

Figure 5.4 Structure of Hamming Network

5.3.2 Testing Algorithm of Hamming Network:

The given bipolar input vector is x, and for a given set of "m" bipolar exemplar vectors, say e(1), ..., e(j), ..., e(m), the Hamming network is used to determine the exemplar vector that is closest to the input vector x. The net input entering unit Y_j gives the measure of the similarity between the input vector and the exemplar vector. The parameters used here are the following:

n = number of input units (number of components of the input vector)

m = number of output units (number of exemplar vectors)

e(j) = jth exemplar vector, i.e.,

e(j) = [e_1(j), ..., e_i(j), ..., e_n(j)]

The testing algorithm for the Hamming Net is as follows:

Step 0: Initialize the weights. For i = 1 to n and j = 1 to m,

w_ij = e_i(j) / 2

Initialize the bias for storing the "m" exemplar vectors. For j = 1 to m,

b_j = n / 2

Step 1: Perform Steps 2-4 for each input vector x.

Step 2: Calculate the net input to each unit Y_j, i.e.,

y_inj = b_j + Σ_{i=1}^{n} x_i w_ij,  j = 1 to m

Step 3: Initialize the activations for Maxnet, i.e.,

y_j(0) = y_inj,  j = 1 to m

Step 4: Maxnet is found to iterate for finding the exemplar that best matches the input patterns.
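The feed-forward layer of Steps 0-2 can be sketched directly from the weight and bias settings above. This is an illustrative sketch: the exemplars are toy vectors, and the Maxnet stage of Steps 3-4 is replaced by a simple argmax for brevity.

```python
# Minimal Hamming-net sketch: weights w[i][j] = e_i(j)/2, bias b_j = n/2.
# The net input then equals n minus the Hamming distance to exemplar j.
# The Maxnet subnet is replaced here by argmax over the net inputs.

def hamming_net(exemplars, x):
    n = len(x)
    # y_inj = n/2 + (1/2) * (x . e(j)) = n - HammingDistance(x, e(j))
    scores = [n / 2 + 0.5 * sum(xi * ei for xi, ei in zip(x, e))
              for e in exemplars]
    return scores.index(max(scores)), scores

exemplars = [[1, 1, 1, -1], [-1, -1, -1, 1], [1, -1, 1, -1]]
winner, scores = hamming_net(exemplars, [1, 1, 1, 1])
print(winner, scores)
```

Each score is exactly n minus the Hamming distance to that exemplar, so the largest score identifies the closest stored class, matching the derivation a = (1/2)(x · y) + (1/2)n above.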

5.4 Kohonen Self-Organizing Feature Maps

Feature mapping is a process which converts patterns of arbitrary dimensionality into a response of a one- or two-dimensional array of neurons, i.e., it converts a wide pattern space into a typical feature space. The network performing such a mapping is called a feature map. Apart from its capability to reduce higher dimensionality, it has to preserve the neighborhood relations of the input patterns, i.e., it has to obtain a topology preserving map. For obtaining such feature maps, it is required to find a self-organizing array which consists of neurons arranged in a one-dimensional array or a two-dimensional array. To depict this, a typical network structure where each component of the input vector x is connected to each of the nodes is shown in Figure 5-5.

Figure 5.5 One-dimensional feature mapping network

On the other hand, if the input vector is two-dimensional, the inputs, say x(a, b),

can arrange themselves

in a two -dimensional array defining the input space (a, b) as in Figure 5 -6. Here,

the two layers are fully connected .


The topology preserving property is observed in the brain, but not found in any other artificial neural network.

Figure 5.6. Two-dimensional feature mapping network

5.4.1 Architecture of Kohonen Self-Organizing Feature Maps

Consider a linear array of cluster units as in Figure 5-7. The neighborhoods of the units designated by "o" are of radii Ni(k1), Ni(k2) and Ni(k3), k1 > k2 > k3, where k1 = 2, k2 = 1, k3 = 0.

For a rectangular grid, a neighborhood (Ni) of radii k1, k2 and k3 is shown in Figure 5-8, and for a hexagonal grid the neighborhood is shown in Figure 5-9. In all three cases (Figures 5-7 to 5-9), the unit with the "#" symbol is the winning unit and the other units are indicated by "o". In both rectangular and hexagonal grids, k1 > k2 > k3, where k1 = 2, k2 = 1, k3 = 0.

For a rectangular grid, each unit has eight nearest neighbors, but there are only six neighbors for each unit in the case of a hexagonal grid. Missing neighborhoods may just be ignored. A typical architecture of the Kohonen self-organizing feature map (KSOFM) is shown in Figure 5-10.


Figure 5.7. Linear array of cluster units

Figure 5.8. Rectangular grid

Figure 5.9. Hexagonal grid


Figure 5.10. Kohonen self-organizing feature map architecture

Flowchart of Kohonen Self-Organizing Feature Maps

Figure 5.11. Flowchart for training process of KSOFM


5.4.2 Training Algorithm of Kohonen Self-Organizing Feature Maps:

Step 0: Initialize the weights w_ij: Random values may be assumed. They can be chosen in the same range of values as the components of the input vector. If information related to the distribution of clusters is known, the initial weights can be taken to reflect that prior knowledge.

• Set topological neighborhood parameters: As clustering progresses, the radius of the neighborhood decreases.

• Initialize the learning rate α: It should be a slowly decreasing function of time.

Step 1: Perform Steps 2-8 when the stopping condition is false.

Step 2: Perform Steps 3-5 for each input vector x.

Step 3: Compute the square of the Euclidean distance, i.e., for each j = 1 to m,

D(j) = Σ_{i=1}^{n} (x_i − w_ij)²

Step 4: Find the winning unit index J, so that D(J) is minimum. (In Steps 3 and 4, the dot product method can also be used to find the winner, which is basically the calculation of the net input, and the winner will be the one with the largest dot product.)

Step 5: For all units j within a specific neighborhood of J, and for all i, calculate the new weights:

w_ij(new) = w_ij(old) + α[x_i − w_ij(old)]

or

w_ij(new) = (1 − α) w_ij(old) + α x_i

Step 6: Update the learning rate α using the formula α(t + 1) = 0.5 α(t).

Step 7: Reduce radius of topological neighborhood at specified time intervals.

Step 8: Test for the stopping condition of the network.
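Steps 0-8 above can be sketched in pure Python for a one-dimensional line of cluster units. The toy two-dimensional data, the unit count, the seed, and the decay schedules for α and the radius are illustrative assumptions, not values from the text.

```python
# Minimal 1-D Kohonen SOM sketch following Steps 0-8 above.
# Assumptions: linear neighbourhood on the cluster units, radius shrinks
# by one per epoch, learning rate halved per epoch (Step 6), fixed seed.

import random

def train_som(data, m=4, epochs=20, alpha=0.5, radius=1, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    w = [[rng.random() for _ in range(dim)] for _ in range(m)]   # Step 0
    for epoch in range(epochs):                                   # Step 1
        for x in data:                                            # Step 2
            # Step 3: squared Euclidean distance to every cluster unit
            d = [sum((xi - wi) ** 2 for xi, wi in zip(x, wj)) for wj in w]
            j_win = d.index(min(d))                               # Step 4
            # Step 5: update the winner and its linear neighbourhood
            for j in range(max(0, j_win - radius), min(m, j_win + radius + 1)):
                w[j] = [wj + alpha * (xi - wj) for xi, wj in zip(x, w[j])]
        alpha *= 0.5                                              # Step 6
        radius = max(0, radius - 1)                               # Step 7
    return w

data = [[0.1, 0.1], [0.9, 0.9], [0.15, 0.05], [0.85, 0.95]]
weights = train_som(data)
print(weights)
```

After training, inputs from the two toy clusters excite different winning units, illustrating the topology-preserving clustering described above.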

5.5 Kohonen Self-Organizing Motor Map

Figure 5.12. Architecture of Kohonen self-organizing motor map

The extension of the Kohonen feature map for a multilayer network involves the addition of an association layer to the output of the self-organizing feature map layer. The output node is found to associate the desired output values with certain input vectors. This type of architecture is called a Kohonen self-organizing motor map, and the layer that is added is called a motor map, in which the movement commands are mapped into two-dimensional locations of excitation. The architecture of KSOMM is shown in Figure 5-12. Here, the feature map is a hidden layer and this acts as a competitive network which classifies the input vectors.

5.6 Learning Vector Quantization (LVQ)

LVQ is a process of classifying the patterns, wherein each output unit represents a particular class. Here, for each class several units should be used. The output unit weight vector is called the reference vector or code book vector for the class which the unit represents. This is a special case of a competitive net, which uses a supervised learning methodology. During training, the output units are found to be positioned to approximate the decision surfaces of the existing Bayesian classifier. Here, the set of training patterns with known classifications is given to the network, along with an initial distribution of the reference vectors. When the training process is complete, an LVQ net is found to classify an input vector by assigning it to the same class as that of the output unit which has its weight vector very close to the input vector. Thus LVQ is a classifier paradigm that adjusts the boundaries between categories to minimize existing misclassification. LVQ is used for optical character recognition, converting speech into phonemes, and other applications as well.

5.6.1 Architecture of LVQ:

Figure 5-13 shows the architecture of LVQ. From Figure 5-13 it can be noticed that there exists an input layer with "n" units and an output layer with "m" units. The layers are found to be fully interconnected with weighted linkage acting over the links.

Figure 5.13. Architecture of LVQ

5.6.2 Flowchart of LVQ:

The parameters used for the training process of LVQ include the following:

x = training vector (x_1, ..., x_i, ..., x_n)

T = category or class for the training vector x

w_j = weight vector for the jth output unit (w_1j, ..., w_ij, ..., w_nj)

c_j = cluster or class or category associated with the jth output unit.

The Euclidean distance of the jth output unit is D(j) = Σ_i (x_i − w_ij)². The flowchart indicating the flow of the training process is shown in Figure 5-14.


5.6.3 Training Algorithm of LVQ:

Step 0: Initialize the reference vectors. This can be done using the following steps.

• From the given set of training vectors, take the first "m" (number of clusters) training vectors and use them as weight vectors; the remaining vectors can be used for training.

• Assign the initial weights and classifications randomly.

• Use the K-means clustering method.

Set the initial learning rate α.

Step 1: Perform Steps 2-6 if the stopping condition is false.

Step 2: Perform Steps 3-4 for each training input vector x.

Step 3: Calculate the Euclidean distance; for i = 1 to n, j = 1 to m,

D(j) = Σ_{i=1}^{n} (x_i − w_ij)²

Find the winning unit index J, when D(J) is minimum.

Step 4: Update the weights on the winning unit, w_J, using the following conditions:

If T = c_J, then w_J(new) = w_J(old) + α[x − w_J(old)]

If T ≠ c_J, then w_J(new) = w_J(old) − α[x − w_J(old)]

Step 5: Reduce the learning rate α.

Step 6: Test for the stopping condition of the training process. (The stopping condition may be a fixed number of epochs or the learning rate reducing to a negligible value.)
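Steps 0-6 above can be sketched in pure Python. The sketch uses the first initialization option (the first m training vectors seed the reference vectors); the toy data, class labels and the halving learning-rate schedule are illustrative assumptions.

```python
# Minimal LVQ sketch following Steps 0-6 above: the winner moves toward
# the input when the class matches (T = c_J) and away when it does not.

def train_lvq(vectors, labels, m=2, epochs=10, alpha=0.3):
    # Step 0: first m vectors become the initial reference (codebook) vectors
    w = [list(v) for v in vectors[:m]]
    classes = list(labels[:m])
    for _ in range(epochs):                                   # Step 1
        for x, t in zip(vectors[m:], labels[m:]):             # Step 2
            # Step 3: squared Euclidean distance to each reference vector
            d = [sum((xi - wi) ** 2 for xi, wi in zip(x, wj)) for wj in w]
            j = d.index(min(d))
            # Step 4: attract on a class match, repel on a mismatch
            sign = 1 if classes[j] == t else -1
            w[j] = [wj + sign * alpha * (xi - wj) for xi, wj in zip(x, w[j])]
        alpha *= 0.5                                          # Step 5
    return w, classes

def classify(w, classes, x):
    d = [sum((xi - wi) ** 2 for xi, wi in zip(x, wj)) for wj in w]
    return classes[d.index(min(d))]

vectors = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
labels = ["A", "B", "A", "B", "A"]
w, classes = train_lvq(vectors, labels)
print(classify(w, classes, [0.05, 0.1]))
```

After training, new inputs are labelled by the nearest codebook vector, which is exactly the boundary-adjusting classification behaviour described above.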

5.7 Counterpropagation Networks

They are multilayer networks based on combinations of the input, output and clustering layers. The applications of counterpropagation nets are data compression, function approximation and pattern association. The counterpropagation network is basically constructed from an instar-outstar model. This model is a three-layer neural network that performs input-output data mapping, producing an output vector y in response to an input vector x, on the basis of competitive learning. The three layers in an instar-outstar model are the input layer, the hidden (competitive) layer and the output layer. The connections between the input layer and the competitive layer are the instar structure, and the connections existing between the competitive layer and the output layer are the outstar structure.

There are two stages involved in the training process of a counterpropagation net. The input vectors are clustered in the first stage. Originally, it is assumed that there is no topology included in the counterpropagation network. However, on the inclusion of a linear topology, the performance of the net can be improved. The clusters are formed using the Euclidean distance method or the dot product method. In the second stage of training, the weights from the cluster layer units to the output units are tuned to obtain the desired response.

There are two types of counterpropagation nets:

(i) Full counterpropagation net

(ii) Forward-only counterpropagation net

5.7.1 Full Counterpropagation Net:

The full counterpropagation net (full CPN) efficiently represents a large number of vector pairs x:y by adaptively constructing a look-up table. The approximation here is x*:y*, which is based on the vector pairs x:y, possibly with some distorted or missing elements in either vector or both vectors. The network is defined to approximate a continuous function f, defined on a compact set A. The full CPN works best if the inverse function f⁻¹ exists and is continuous. The vectors x and y propagate through the network in a counterflow manner to yield output vectors x* and y*, which are the approximations of x and y, respectively. During competition, the winner can be determined either by the Euclidean distance method or by the dot product method. In the case of the dot product method, the one with the largest net input is the winner. Whenever vectors are to be compared using the dot product metric, they should be normalized. Even though the normalization can be performed without loss of information by adding an extra component, to avoid the complexity the Euclidean distance method can be used. On this basis, a direct comparison can be made between the full CPN and the forward-only CPN.

For continuous functions, the CPN is as efficient as the back-propagation net; it is a universal continuous function approximator. In the case of CPN, the number of hidden nodes required to achieve a particular level of accuracy is greater than the number required by the back-propagation network. The greatest appeal of CPN is its speed of learning. Compared to various mapping networks, it requires only fewer steps of training to achieve best performance. This is common for any hybrid learning method that combines unsupervised learning (e.g., instar learning) and supervised learning (e.g., outstar learning).

As already discussed, the training of CPN occurs in two phases. In the input phase, the units in the cluster layer and input layer are found to be active. In CPN, no topology is assumed for the cluster layer units; only the winning units are allowed to learn. The weight updation learning rule on the winning cluster units is

v_iJ(new) = v_iJ(old) + α[x_i − v_iJ(old)], i = 1 to n

w_kJ(new) = w_kJ(old) + β[y_k − w_kJ(old)], k = 1 to m

In the second phase of training, only the winner unit J remains active in the cluster layer. The weights between the winning cluster unit J and the output units are adjusted so that the vector of activations of the units in the Y-output layer is y*, which is an approximation to the input vector y, and x*, which is an approximation to the input vector x. The weight updation for the units in the Y-output and X-output layers is

u_Jk(new) = u_Jk(old) + a[y_k − u_Jk(old)], k = 1 to m

t_Ji(new) = t_Ji(old) + b[x_i − t_Ji(old)], i = 1 to n

5.7.2 Architecture of Full Counterpropagation Net

The general structure of the full CPN is shown in Figure 5-15. The complete architecture of the full CPN is shown in Figure 5-16.

The four major components of the instar-outstar model are the input layer, the instar, the competitive layer and the outstar. For each node i in the input layer, there is an input value x_i. An instar responds maximally to the input vectors from a particular cluster. All the instars are grouped into a layer called the competitive layer. Each of the instars responds maximally to a group of input vectors in a different region of space. This layer of instars classifies any input vector because, for a given input, the winning instar with the strongest response identifies the region of space in which the input vector lies. Hence, it is necessary that the competitive layer singles out the winning instar by setting its output to a nonzero value and also suppressing the other outputs to zero. That is, it is a winner-take-all or a Maxnet-type network. An outstar model is found to have all the nodes in the output layer and a single node in the competitive layer. The outstar looks like the fan-out of a node. Figures 5-17 and 5-18 indicate the units that are active during each of the two phases of training a full CPN.

Figure 5.15. General structure of full CPN

Figure 5.16. Architecture of full CPN


Figure 5.17 First phase of training of full CPN

Figure 5.18 Second phase of training of full CPN

5.7.3 Training Algorithm of Full Counterpropagation Net:

Step 0: Set the initial weights and the initial learning rate.

Step 1: Perform Steps 2-7 if the stopping condition is false for phase I training.

Step 2: For each of the training input vector pairs x:y presented, perform Steps 3-5.

Step 3: Make the X-input layer activations to vector x. Make the Y-input layer activations to vector y.

Step 4: Find the winning cluster unit. If the dot product method is used, find the cluster unit z_j with the largest net input: for j = 1 to p,

z_inj = Σ_{i=1}^{n} x_i v_ij + Σ_{k=1}^{m} y_k w_kj

If the Euclidean distance method is used, find the cluster unit z_j whose squared distance from the input vectors is the smallest:

D(j) = Σ_{i=1}^{n} (x_i − v_ij)² + Σ_{k=1}^{m} (y_k − w_kj)²

If there occurs a tie in the selection of the winner unit, the unit with the smallest index is the winner. Take the winner unit index as J.

Step 5: Update the weights over the calculated winner unit z_J.

Step 6: Reduce the learning rates:

α(t + 1) = 0.5 α(t); β(t + 1) = 0.5 β(t)

Step 7: Test the stopping condition for phase I training.

Step 8: Perform Steps 9-15 when the stopping condition is false for phase II training.

Step 9: Perform Steps 10-13 for each training input pair x:y. Here α and β are small constant values.

Step 10: Make the X-input layer activations to vector x. Make the Y-input layer activations to vector y.

Step 11: Find the winning cluster unit (use the formulas from Step 4). Take the winner unit index as J.

Step 12: Update the weights entering into unit z_J:

For i = 1 to n, v_iJ(new) = v_iJ(old) + α[x_i − v_iJ(old)]

For k = 1 to m, w_kJ(new) = w_kJ(old) + β[y_k − w_kJ(old)]

Step 13: Update the weights from unit z_J to the output layers:

For i = 1 to n, t_Ji(new) = t_Ji(old) + b[x_i − t_Ji(old)]

For k = 1 to m, u_Jk(new) = u_Jk(old) + a[y_k − u_Jk(old)]

Step 14: Reduce the learning rates a and b:

a(t + 1) = 0.5 a(t); b(t + 1) = 0.5 b(t)

Step 15: Test the stopping condition for phase II training.


5.7.4 Testing Algorithm of Full Counterpropagation Net:

Step 0: Initialize the weights (from the training algorithm).

Step 1: Perform Steps 2-4 for each input pair x:y.

Step 2: Set the X-input layer activations to vector x. Set the Y-input layer activations to vector y.

Step 3: Find the cluster unit z_J that is closest to the input pair.

Step 4: Calculate the approximations to x and y:

x_i* = t_Ji;  y_k* = u_Jk

5.7.5. Forward Only Counter propagation Net:

A simplified version of full CPN is the forward -only CPN. The approximation of

the function y = f(x) but not of x = f(y) can be performed using forward -only CPN,

i.e., it may be used if the mapping from x to y is well defined but mapping from y

to x is not defined. In forward -only CPN only the x -vectors are used to form the

clusters on the Kohonen units. Forward -only CPN uses only the x vectors to form

the clusters on the Kohonen units during first phase of training.

In the case of forward-only CPN, input vectors are first presented to the input units. The cluster layer units compete with each other under a winner-take-all policy to learn the input vector. Once the entire set of training vectors has been presented, the learning rate is reduced and the vectors are presented again; this is repeated over several iterations. First the weights between the input layer and the cluster layer are trained; then the weights between the cluster layer and the output layer are trained. This is a specific competitive network with known targets. Hence, when each input vector is presented to the input layer, its associated target vector is presented to the output layer. The winning cluster unit sends its signal to the output layer, so each output unit has a computed signal (w_Jk) and the target value (y_k). The difference between these values is calculated, and based on it the weights between the winning unit and the output layer are updated. The weight updation from input units to cluster units is done using the learning rule given below: for i = 1 to n,

v_iJ(new) = v_iJ(old) + α[x_i − v_iJ(old)] = (1 − α) v_iJ(old) + α x_i


The weight updation from cluster units to output units is done using the following learning rule: for k = 1 to m,

w_Jk(new) = w_Jk(old) + a[y_k − w_Jk(old)] = (1 − a) w_Jk(old) + a y_k

The learning rule for weight updation from the cluster units to the output units can be written in the form of the delta rule when the activations of the cluster units z_j are included, and is given as

w_jk(new) = w_jk(old) + a z_j [y_k − w_jk(old)]

where

z_j = 1 if j = J;  z_j = 0 if j ≠ J

This holds when w_Jk is interpreted as the computed output (i.e., y_k = w_Jk). In the formulation of the forward-only CPN also, no topological structure is assumed.

5.7.6. Architecture of Forward-Only Counterpropagation Net:

Figure 5.19 shows the architecture of forward-only CPN. It consists of three layers: an input layer, a cluster (competitive) layer and an output layer. The architecture of forward-only CPN resembles the back-propagation network, but in CPN there exist interconnections between the units in the cluster layer (not shown in Figure 5.19). Once competition is complete in a forward-only CPN, only one unit is active in that layer, and it sends its signal to the output layer. As inputs are presented to the network, the desired outputs are presented simultaneously.

Figure 5.19 Architecture of forward-only CPN


5.7.8. Training Algorithm of Forward-Only Counterpropagation Net:

Step 0: Initialize the weights and learning rates.

Step 1: Perform Steps 2-7 when stopping condition for phase I training is false.

Step 2: Perform Steps 3-5 for each training input X.

Step 3: Set the X-input layer activations to vector X.

Step 4: Compute the winning cluster unit (J). If the dot product method is used, find the cluster unit z_J with the largest net input:

z_inj = Σ_{i=1}^{n} x_i v_ij

If Euclidean distance is used, find the cluster unit z_J the square of whose distance from the input pattern is smallest:

D(j) = Σ_{i=1}^{n} (x_i − v_ij)²

If there exists a tie in the selection of the winner unit, the unit with the smallest index is chosen as the winner.

Step 5: Perform weight updation for unit z_J. For i = 1 to n,

v_iJ(new) = v_iJ(old) + α[x_i − v_iJ(old)]

Step 6: Reduce learning rate α:

α(t + 1) = 0.5 α(t)

Step 7: Test the stopping condition for phase I training.

Step 8: Perform Steps 9-15 when stopping condition for phase II training is false. (Set α to a small constant value for phase II training.)

Step 9: Perform Steps 10-13 for each training input pair x:y.

Step 10: Set X-input layer activations to vector X. Set Y-output layer activations to vector Y.

Step 11: Find the winning cluster unit (J) [use formulas as in Step 4].


Step 12: Update the weights into unit z_J. For i = 1 to n,

v_iJ(new) = v_iJ(old) + α[x_i − v_iJ(old)]

Step 13: Update the weights from unit z_J to the output units. For k = 1 to m,

w_Jk(new) = w_Jk(old) + β[y_k − w_Jk(old)]

Step 14: Reduce learning rate β, i.e.,

β(t + 1) = 0.5 β(t)

Step 15: Test the stopping condition for phase II training.

5.7.9. Testing Algorithm of Forward-Only Counterpropagation Net:

Step 0: Set initial weights. (The initial weights here are the weights obtained during training.)

Step 1: Present input vector X.

Step 2: Find unit J that is closest to vector X.

Step 3: Set activations of output units:

y_k = w_Jk
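The two training phases and the testing rule above can be condensed into a short NumPy sketch. This is illustrative only, not code from the text: the function names, the choice of Euclidean-distance competition, the initialization of the cluster weights from randomly chosen training vectors, and all parameter values are assumptions.

```python
import numpy as np

def train_forward_only_cpn(X, Y, n_clusters, alpha=0.5, phase2_alpha=0.1,
                           beta=0.5, phase1_epochs=10, phase2_epochs=10, seed=0):
    """Two-phase training of a forward-only CPN (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialize cluster weights from randomly chosen training vectors
    # (a practical choice; the text only requires some initialization).
    idx = rng.choice(len(X), size=n_clusters, replace=False)
    v = X[idx].astype(float).copy()          # input -> cluster weights v_ij
    w = np.zeros((n_clusters, Y.shape[1]))   # cluster -> output weights w_Jk

    a = alpha
    for _ in range(phase1_epochs):           # Phase I: Steps 2-7
        for x in X:
            J = np.argmin(((x - v) ** 2).sum(axis=1))  # Euclidean winner (Step 4)
            v[J] += a * (x - v[J])                     # Step 5
        a *= 0.5                                       # Step 6

    a, b = phase2_alpha, beta                # small fixed alpha for phase II
    for _ in range(phase2_epochs):           # Phase II: Steps 8-15
        for x, y in zip(X, Y):
            J = np.argmin(((x - v) ** 2).sum(axis=1))  # Step 11
            v[J] += a * (x - v[J])                     # Step 12
            w[J] += b * (y - w[J])                     # Step 13
        b *= 0.5                                       # Step 14
    return v, w

def cpn_predict(v, w, x):
    """Testing: the winner's outgoing weights are the output (y_k = w_Jk)."""
    J = np.argmin(((x - v) ** 2).sum(axis=1))
    return w[J]
```

After phase I the cluster weights approximate the centroids of the input clusters; after phase II each cluster's outgoing weights approximate the mean target of the inputs it wins.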

5.8 Adaptive Resonance Theory Network


The adaptive resonance theory (ART) network, developed by Steven Grossberg and Gail Carpenter (1987), is consistent with behavioral models. It is an unsupervised learning network, based on competition, that finds categories autonomously and learns new categories if needed. The adaptive resonance model was developed to solve the problem of instability occurring in feed-forward systems. There are two types of ART: ART 1 and ART 2. ART 1 is designed for clustering binary vectors and ART 2 is designed to accept continuous-valued vectors. In both nets, input patterns can be presented in any order. For each pattern presented to the network, an appropriate cluster unit is chosen, and the weights of that cluster unit are adjusted to let it learn the pattern. The network controls the degree of similarity of the patterns placed on the same cluster unit. During training, each training pattern may be presented several times; it should not be placed on different cluster units on different presentations. On this basis, the stability of the net is defined as the property that a pattern is not assigned to a new cluster unit on later presentations. Stability may be achieved by reducing the learning rate. The ability of the network to respond to a new pattern equally well at any stage of learning is called plasticity. ART nets are designed to possess both properties, stability and plasticity. The key concept of ART is that the stability-plasticity dilemma can be resolved by a system in which the network includes bottom-up (input-output) competitive learning combined with top-down (output-input) learning. The instability of instar-outstar networks could be solved by gradually reducing the learning rate to zero, thereby freezing the learned categories; but at that point the net loses its plasticity, that is, its ability to react to new data. Thus it is difficult to possess both stability and plasticity. ART networks are designed particularly to resolve the stability-plasticity dilemma: they are stable enough to preserve significant past learning but nevertheless remain adaptable enough to incorporate new information whenever it appears.

5.8.1. Fundamental Architecture of ART:

Three groups of neurons are used to build an ART network. These include:

1. Input processing neurons (F1 layer).

2. Clustering units (F2 layer).

3. Control mechanism (controls the degree of similarity of patterns placed on the same cluster).

The input processing (F1) layer consists of two portions: the input portion and the interface portion. The input portion may perform some processing based on the inputs it receives; this is especially so in ART 2 compared to ART 1.


The interface portion of the F1 layer combines the input from the input portion of the F1 layer and from the F2 layer, for comparing the similarity of the input signal with the weight vector of the cluster unit selected for learning.

There exist two sets of weighted interconnections for controlling the degree of similarity between the units in the interface portion and the cluster layer. The bottom-up weights are used for the connection from the F1(b) layer to the F2 layer and are represented by b_ij (ith F1 unit to jth F2 unit). The top-down weights are used for the connection from the F2 layer to the F1(b) layer and are represented by t_ji (jth F2 unit to ith F1 unit). The competitive layer in this case is the cluster layer, and the cluster unit with the largest net input is the victim chosen to learn the input pattern; the activations of all other F2 units are made zero. The interface units combine the data from the input and cluster layer units. On the basis of the similarity between the top-down weight vector and the input vector, the cluster unit may be allowed to learn the input pattern. This decision is made by the reset mechanism unit on the basis of the signals it receives from the interface portion and the input portion of the F1 layer. When a cluster unit is not allowed to learn, it is inhibited and a new cluster unit is selected as the victim.

5.8.2. Fundamental Algorithm of ART:

Step 0: Initialize the necessary parameters.

Step 1: Perform Steps 2-9 when stopping condition is false.

Step 2: Perform Steps 3-8 for each input vector.

Step 3: F1 layer processing is done.

Step 4: Perform Steps 5-7 while the reset condition is true.

Step 5: Find the victim unit to learn the current input pattern. The victim unit is the F2 unit (that is not inhibited) with the largest input.

Step 6: F1(b) units combine their inputs from F1(a) and F2.

Step 7: Test for reset condition. If reset is true, then the current victim unit is rejected (inhibited); go to Step 4. If reset is false, then the current victim unit is accepted for learning; go to the next step (Step 8).

Step 8: Weight updation is performed.

Step 9: Test for stopping condition.

The adaptive resonance theory 1 (ART 1) network is designed for binary input vectors. As discussed above, the ART 1 net consists of two fields of units, the input units (F1 units) and the output units (F2 units), along with the reset control unit for controlling the degree of similarity of patterns placed on the same cluster unit. There exist two sets of weighted interconnection paths between the F1 and F2 layers. The supplemental units present in the net provide efficient neural control of the learning process. Carpenter and Grossberg designed the ART 1 network as a real-time system. In the ART 1 network, it is not necessary to present input patterns in a particular order; they can be presented in any order. ART 1 can be practically implemented by analog circuits governing the differential equations, i.e., the bottom-up and top-down weights are controlled by differential equations. The ART 1 network runs autonomously: it does not require any external control signals and can run stably with an infinite stream of input data.

The ART 1 network is trained using the fast learning method, in which the weights reach equilibrium during each learning trial. During this resonance phase, the activations of F1 units do not change; hence the equilibrium weights can be determined exactly. The ART 1 network performs well with perfect binary input patterns, but it is sensitive to noise in the input data, so care should be taken to handle noise.

5.8.3. Fundamental Architecture of ART 1:

The ART 1 network is made up of two kinds of units:

1. Computational units.

2. Supplemental units.

In this section we discuss these two kinds of units in detail.

Computational units

The computational unit for ART 1 consists of the following:

1. Input units (F1 units, comprising both the input portion and the interface portion).

2. Cluster units (F2 units, the output units).

3. Reset control unit (controls the degree of similarity of patterns placed on the same cluster).

The basic architecture of ART 1 (computational unit) is shown in Figure 5-22. Each unit in the input portion of the F1 layer (i.e., F1(a) layer unit) is connected to the respective unit in the interface portion of the F1 layer (i.e., F1(b) layer unit). The reset control unit has connections from each F1(a) and F1(b) unit. Also, each unit in the F1(b) layer is connected through two weighted interconnection paths to each unit in the F2 layer, and the reset control unit is connected to every F2 unit. The X_i unit of the F1(b) layer is connected to the Y_j unit of the F2 layer through the bottom-up weight b_ij, and the Y_j unit of F2 is connected to the X_i unit of F1(b) through the top-down weight t_ji. Thus ART 1 includes a bottom-up competitive learning system combined with a top-down outstar learning system. In Figure 5-22, for simplicity, only the weighted interconnections b_ij and t_ji are shown; the other units are interconnected in a similar way. The cluster (F2) layer is a competitive layer in which only the uninhibited node with the largest net input has nonzero activation.

Figure 5.22 Basic architecture of ART 1

5.8.4. Training Algorithm of ART 1:

Step 0: Initialize the parameters:

α > 1 and 0 < ρ ≤ 1

Initialize the weights:

0 < b_ij(0) < α / (α − 1 + n) and t_ji(0) = 1

Step 1: Perform Steps 2-13 when stopping condition is false.

Step 2: Perform Steps 3-12 for each training input.

Step 3: Set activations of all F2 units to zero. Set the activations of F1(a) units to the input vector s.

Step 4: Calculate the norm of s:

‖s‖ = Σ_i s_i


Step 5: Send the input signal from the F1(a) layer to the F1(b) layer:

x_i = s_i

Step 6: For each F2 node that is not inhibited, the following rule should hold: if y_j ≠ −1, then

y_j = Σ_i b_ij x_i

Step 7: Perform Steps 8-11 when reset is true.

Step 8: Find J such that y_J ≥ y_j for all nodes j. If y_J = −1, then all the nodes are inhibited; note that this pattern cannot be clustered.

Step 9: Recalculate the activations of F1(b):

x_i = s_i t_Ji

Step 10: Calculate the norm of vector x:

‖x‖ = Σ_i x_i

Step 11: Test for reset condition. If ‖x‖/‖s‖ < ρ, then inhibit node J (y_J = −1) and go back to Step 7. Else if ‖x‖/‖s‖ ≥ ρ, then proceed to the next step (Step 12).

Step 12: Perform weight updation for node J (fast learning):

b_iJ(new) = α x_i / (α − 1 + ‖x‖)

t_Ji(new) = x_i

Step 13: Test for stopping condition. The following may be the stopping

conditions:

a. No change in weights.

b. No reset of units.

c. Maximum number of epochs reached.
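The ART 1 steps above can be condensed into a short sketch. This is illustrative only: the function name, the vigilance value, the particular initial bottom-up weight (any value in the allowed range works), and the assumption that every input pattern is a nonzero binary row vector are choices made here, not part of the text.

```python
import numpy as np

def art1_train(patterns, n_clusters, rho=0.7, alpha=2.0, epochs=5):
    """Fast-learning ART 1 sketch for nonzero binary row-vector patterns."""
    n = patterns.shape[1]
    # 0 < b_ij(0) < alpha/(alpha-1+n); t_ji(0) = 1
    b = np.full((n, n_clusters), 0.5 * alpha / (alpha - 1 + n))
    t = np.ones((n_clusters, n))
    for _ in range(epochs):
        for s in patterns:
            y = s @ b                              # Step 6: net input to F2
            inhibited = np.zeros(n_clusters, bool)
            while True:
                y_masked = np.where(inhibited, -1.0, y)
                J = int(np.argmax(y_masked))       # Step 8: winner
                if y_masked[J] == -1:              # all nodes inhibited
                    J = None
                    break
                x = s * t[J]                       # Step 9
                if x.sum() / s.sum() >= rho:       # Step 11: vigilance test
                    break
                inhibited[J] = True                # reset: inhibit J, retry
            if J is not None:                      # Step 12: fast learning
                x = s * t[J]
                b[:, J] = alpha * x / (alpha - 1 + x.sum())
                t[J] = x
    return b, t
```

After training, each used cluster's top-down vector t_J holds the intersection of the binary patterns it has learned.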

5.8.5. Adaptive Resonance Theory 2 (ART2):

Adaptive resonance theory 2 (ART 2) is for continuous-valued input vectors. The complexity of an ART 2 network is higher than that of ART 1 because much processing is needed in the F1 layer. The ART 2 network was developed by Carpenter and Grossberg


in 1987. It was designed to self-organize recognition categories for analog as well as binary input sequences. The major difference between ART 1 and ART 2 networks is the input layer. On the basis of the stability criterion for analog inputs, a three-layer feedback system is required in the input layer of the ART 2 network: a bottom layer where the input patterns are read in, a top layer where inputs coming from the output layer are read in, and a middle layer where the top and bottom patterns are combined to form a matched pattern, which is then fed back to the top and bottom input layers. The complexity in the F1 layer is essential because continuous-valued input vectors may be arbitrarily close together. The F1 layer performs normalization and noise suppression, in addition to the comparison of the bottom-up and top-down signals needed for the reset mechanism.

The continuous-valued inputs presented to the ART 2 network may be of two forms. The first form is a "noisy binary" signal, where the information about patterns is delivered primarily by which components are "on" or "off," rather than by differences in the magnitudes of the positive components. In this case, the fast learning mode is best adopted. In the second form of patterns, the range of values of the components carries significant information, and the weight vector for a cluster is interpreted as an exemplar for the patterns placed on that unit; for this type of pattern, the slow learning mode is best adopted. This second form of data is "truly continuous."

5.8.6. Fundamental Architecture of ART 2:

A typical architecture of the ART 2 network is shown in Figure 5-25. From the figure, we can notice that the F1 layer consists of six types of units (W, X, U, V, P, Q), and there are n units of each type. In Figure 5-25, only one unit of each type is shown. The supplemental part of the connections is shown in Figure 5-26.

The supplemental unit between units W and X receives signals from all W units, computes the norm of vector w and sends this signal to each of the X units; this signal is inhibitory. Each of the X units (X_1, ..., X_i, ..., X_n) also receives an excitatory signal from the corresponding W unit. In a similar way, there exist supplemental units between U and V, and between P and Q, performing the same operation as done between W and X. Each X unit and Q unit is connected to the corresponding V unit. The connections between P_i of the F1 layer and Y_j of the F2 layer show the weighted interconnections, which multiply the signals transmitted over those paths. The winning F2 unit's activation is d (0 < d < 1). There exists normalization between W and X, between P and Q, and between V and U; the normalization is performed approximately to unit length. The operations performed in the F2 layer are the same for both ART 1 and ART 2: the units in the F2 layer compete with each other under a winner-take-all policy to learn each input pattern. The testing of the reset condition differs between ART 1 and ART 2 networks. Thus, in the ART 2 network, some processing of the input vector is necessary, because the magnitudes of real-valued input vectors may vary more than those of binary input vectors.

Figure 5.25 Architecture of ART 2 network

5.8.7. Training Algorithm of ART 2:

Step 0: Initialize the following parameters: a, b, c, d, e, α, ρ, θ. Also, specify the number of epochs of training (nep) and the number of learning iterations (nit).

Step 1: Perform Steps 2-12 (nep) times.

Step 2: Perform Steps 3-11 for each input vector s.


Step 3: Update F1 unit activations:

u_i = 0;  w_i = s_i;  p_i = 0;  q_i = 0;  x_i = s_i / (e + ‖s‖);  v_i = f(x_i)

Update F1 unit activations again:

u_i = v_i / (e + ‖v‖);  w_i = s_i + a u_i;  p_i = u_i;  x_i = w_i / (e + ‖w‖);  q_i = p_i / (e + ‖p‖);  v_i = f(x_i) + b f(q_i)

In ART 2 networks, norms are calculated as the square root of the sum of the squares of the respective values.

Step 4: Calculate signals to F2 units:

y_j = Σ_{i=1}^{n} b_ij p_i

Step 5: Perform Steps 6 and 7 when reset is true.

Step 6: Find the F2 unit with the largest signal. (J is defined such that y_J ≥ y_j for j = 1 to m.)

Step 7: Check for reset:

u_i = v_i / (e + ‖v‖);  p_i = u_i + d t_Ji;  r_i = (u_i + c p_i) / (e + ‖u‖ + c‖p‖)

If ‖r‖ < (ρ − e), then y_J = −1 (inhibit J). Reset is true; perform Step 5.

If ‖r‖ ≥ (ρ − e), then

w_i = s_i + a u_i;  x_i = w_i / (e + ‖w‖);  q_i = p_i / (e + ‖p‖);  v_i = f(x_i) + b f(q_i)

Reset is false. Proceed to Step 8.

Step 8: Perform Steps 9-11 for the specified number of learning iterations.


Step 9: Update the weights for the winning unit J:

t_Ji = α d u_i + [1 + α d(d − 1)] t_Ji

b_iJ = α d u_i + [1 + α d(d − 1)] b_iJ

Step 10: Update F1 activations:

u_i = v_i / (e + ‖v‖);  w_i = s_i + a u_i;  p_i = u_i + d t_Ji;  x_i = w_i / (e + ‖w‖);  q_i = p_i / (e + ‖p‖);  v_i = f(x_i) + b f(q_i)

Step 11: Check for the stopping condition of weight updating.

Step 12: Check for the stopping condition for the number of epochs.
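The two-pass F1-layer update of Step 3 can be sketched as follows. The helper names, the threshold form of the noise-suppression function f, and the parameter values (a, b, e, θ) are assumptions for illustration; the text fixes only the update equations themselves.

```python
import numpy as np

def f(x, theta=0.1):
    """Noise suppression: pass components at or above theta, zero the rest."""
    return np.where(x >= theta, x, 0.0)

def art2_f1_pass(s, a=10.0, b=10.0, e=0.0, theta=0.1):
    """One round of ART 2 F1-layer processing (Step 3), before any F2 feedback.
    Assumes s is a nonzero real-valued vector."""
    norm = lambda z: np.sqrt((z ** 2).sum())   # Euclidean norm, as in the text
    # First pass
    w = s.copy()
    x = s / (e + norm(s))
    v = f(x, theta)
    # Second pass
    u = v / (e + norm(v))
    w = s + a * u
    p = u.copy()
    x = w / (e + norm(w))
    q = p / (e + norm(p))
    v = f(x, theta) + b * f(q, theta)
    return u, w, p, x, q, v
```

Running this on a vector with one small component shows the two effects the text describes: the activity vector is normalized to (approximately) unit length, and components below θ are suppressed to zero.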

Review Questions:

1. Explain the concept of Unsupervised Learning.

2. Write a short note on Fixed Weight Competitive Nets.

3. Explain the Algorithm of the Mexican Hat Net.

4. What is meant by a Hamming Network?

5. Explain the Architecture of the Hamming Network.

6. Write a short note on Kohonen Self-Organizing Feature Maps.

7. Write a short note on Learning Vector Quantization (LVQ).

8. Explain Counterpropagation Networks.

9. What is meant by an Adaptive Resonance Theory Network?

Reference

1. “Principles of Soft Computing”, by S.N. Sivanandam and S.N. Deepa, 2019, Wiley Publication, Chapters 2 and 3

2. http://www.sci.brooklyn.cuny.edu/ (Artificial Neural Networks, Stephen Lucci, PhD)

3. Related documents, diagrams from blogs, e-resources from RC Chakraborty lecture notes and tutorialspoint.com.


Unit 1

6 SPECIAL NETWORKS

Unit Structure

6.1 Simulated Annealing Network

6.2. Boltzmann Machine

6.3. Gaussian Machine

6.4. Cauchy Machine

6.5. Probabilistic Neural Net

6.6. Cascade Correlation Network

6.7. Cognitron Network

6.8. Neocognitron Network

6.9. Cellular Neural Network

6.10. Optical Neural Networks

6.11. Spiking Neural Networks (SNN)

6.12. Encoding of Neurons in SNN

6.13. CNN Layer Sizing

6.14. Deep learning Neural networks

6.15. Extreme Learning Machine Model (ELMM)

6.1. Simulated Annealing Network

The concept of simulated annealing has its origin in the physical annealing process performed on metals and other substances. In metallurgical annealing, a metal body is heated almost to its melting point and then cooled back slowly to room temperature. This process eventually makes the metal's global energy function reach an absolute minimum value. If the metal's temperature is reduced quickly, the energy of the metallic lattice will be higher than this minimum value because of the existence of frozen lattice dislocations that would otherwise disappear due to thermal agitation. Analogous to the physical annealing behaviour, simulated annealing allows a system to change its state to a higher-energy state, giving it a chance to escape from local minima. There exists a cooling procedure in the simulated annealing process such that the system has a higher


probability of changing to a higher-energy state in the beginning phase of convergence. Then, as time goes by, the system becomes stable and always moves in the direction of decreasing energy, as in a normal minimization procedure.

With simulated annealing, a system changes its state from the original state x_old to a new state x_new with a probability given by

P = 1 / [1 + exp(−ΔE/T)]

where ΔE = E_old − E_new (the energy change, i.e., the difference between the old and new energies) and T is a nonnegative parameter that acts like the temperature of a physical system. The probability as a function of the change in energy (ΔE), obtained for different values of the temperature T, is shown in Figure 6-1. From Figure 6-1, it can be noticed that, for any temperature, the probability when ΔE > 0 is always higher than the probability when ΔE < 0.

An optimization problem seeks to find some configuration of parameters X = (X_1, ..., X_N) that minimizes some function f(X) called the cost function. In an artificial neural network, the configuration parameters are associated with the set of weights and the cost function is associated with the error function.
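As a sketch of how the acceptance probability above drives an optimizer, consider the following minimal simulated-annealing loop. Everything beyond the acceptance formula (function names, the geometric cooling factor, the temperature floor, and the overflow guard) is an assumption for illustration, not part of the text.

```python
import math
import random

def accept(delta_e, T):
    """Acceptance rule P = 1 / (1 + exp(-dE/T)), with dE = E_old - E_new,
    so an improving move (dE > 0) is accepted with probability > 0.5."""
    arg = -delta_e / T
    if arg > 700:                      # guard against math.exp overflow
        return False
    return random.random() < 1.0 / (1.0 + math.exp(arg))

def anneal(cost, state, neighbour, T=10.0, cooling=0.95, steps=500):
    """Generic simulated-annealing minimizer (illustrative sketch)."""
    best = state
    for _ in range(steps):
        cand = neighbour(state)
        delta_e = cost(state) - cost(cand)   # positive means improvement
        if accept(delta_e, T):
            state = cand
            if cost(state) < cost(best):
                best = state
        T = max(cooling * T, 1e-3)           # cooling schedule
    return best
```

With cost(x) = x² and a random-step neighbour, the loop wanders at high temperature and settles near the global minimum as T shrinks.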

The simulated annealing concept is used in statistical mechanics and is called the Metropolis algorithm. As discussed earlier, this algorithm is based on a material that anneals into a solid as its temperature is slowly decreased. To understand this, consider the slope of a hill having local valleys, with a stone moving down the hill. Here the local valleys are local minima, and the bottom of the hill is the global or universal minimum. It is possible that the stone may stop at a local minimum and never reach the global minimum. In neural nets, this would correspond to a set of weights at a local minimum of the error, which is not the desired solution. Hence, to overcome this kind of situation, simulated annealing perturbs the stone so that if it is trapped in a local minimum, it escapes and continues falling until it reaches the global minimum (optimal solution). At that point, further perturbations cannot move the stone to a lower position. Figure 6-2 shows the simulated annealing analogy between a stone and a hill.


Figure 6.1 Probability P as a function of the change in energy (ΔE) for different values of temperature T

Figure 6.2 Simulated annealing between a stone and a hill

The components required for the annealing algorithm are the following:

1. A basic system configuration: the possible solutions of a problem over which we search for the best (optimal) answer. (In a neural net, this is the optimum steady-state weight set.)

2. The move set: a set of allowable moves that permit us to escape from local minima and reach all possible configurations.

3. A cost function associated with the error function.

4. A cooling schedule: the starting value of the temperature, the rules to determine when it should be lowered and by how much, and when annealing should be terminated.

Simulated annealing networks can be used to make a network converge to its global minimum.

6.2. Boltzmann Machine

The early optimization technique used in artificial neural networks is based on the Boltzmann machine. When the simulated annealing process is applied to the discrete Hopfield network, it becomes a Boltzmann machine. The network is configured as the vector of the states of the units, and the states of the units are binary valued, with probabilistic state transitions. The Boltzmann machine described in this section has fixed weights w_ij. On applying the Boltzmann machine to a constrained optimization problem, the weights represent the constraints of the problem and the quantity to be optimized. The discussion here is based on maximization of a consensus function (CF). The Boltzmann machine consists of a set of units (X_i and X_j) and a set of bidirectional connections between pairs of units. This machine can be used as an associative memory. If the units X_i and X_j are connected, then w_ij ≠ 0. There exists symmetry in the weighted interconnections, represented as w_ij = w_ji. There may also exist a self-connection for a unit (w_ii). For unit X_i, its state x_i may be either 1 (on) or 0 (off). The objective of the neural net is to maximize the CF given by

CF = Σ_j Σ_{i≤j} w_ij x_i x_j

The maximum of the CF can be obtained by letting each unit attempt to change its state (alternate between 1 and 0). The change of state can be done either in parallel or in a sequential manner; here the description is based on the sequential manner. The consensus change when unit X_i changes its state is given by

ΔCF(i) = (1 − 2x_i)(w_ii + Σ_{j≠i} w_ij x_j)

where x_i is the current state of unit X_i. The coefficient (1 − 2x_i) is given by

(1 − 2x_i) = +1 if X_i is currently off;  −1 if X_i is currently on


If unit X_i were to change its activation, the resulting change in the CF can be obtained from information that is local to unit X_i. Generally, X_i does not change its state, but if the state is changed, this increases the consensus of the net. The probability of the network accepting a change in the state of unit X_i is given by

AF(i, T) = 1 / (1 + exp[−ΔCF(i)/T])

where T (temperature) is the controlling parameter, which gradually decreases as the CF reaches its maximum value. Low values of T are acceptable because they increase the net consensus when the net accepts a change in state. To help the net avoid getting stuck at a local maximum, probabilistic functions are widely used.

6.2.1. Architecture of Boltzmann Machine

Figure 6.3 Architecture of Boltzmann machine

6.2.2. Testing Algorithm of Boltzmann Machine

Step 0: Initialize the weights representing the constraints of the problem. Also initialize the control parameter T and activate the units.

Step 1: When stopping condition is false, perform Steps 2-8.

Step 2: Perform Steps 3-6 n² times. (This forms an epoch.)


Step 3: Choose integers I and J at random between 1 and n. (Unit X_{I,J} is the current victim to change its state.)

Step 4: Calculate the change in consensus:

ΔCF = (1 − 2X_{I,J})[w(I,J : I,J) + Σ_{(i,j)≠(I,J)} w(i,j : I,J) X_{i,j}]

Step 5: Calculate the probability of acceptance of the change in state:

AF(T) = 1 / (1 + exp[−ΔCF/T])

Step 6: Decide whether to accept the change or not. Let R be a random number between 0 and 1. If R < AF, accept the change:

X_{I,J} = 1 − X_{I,J} (this changes the state of X_{I,J}). If R ≥ AF, reject the change.

Step 7: Reduce the control parameter T:

T(new) = 0.95 T(old)

Step 8: Test for stopping condition, which is: if the temperature reaches a specified value, or if there is no change of state for a specified number of epochs, then stop; else continue.
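The testing algorithm above can be sketched for a small net with fixed symmetric weights. The function name, the epoch-based cooling, the temperature floor, the overflow guard and all parameter values are illustrative assumptions, not part of the text.

```python
import math
import random

def boltzmann_optimize(w, n_steps=2000, T=10.0, cooling=0.95, seed=1):
    """Maximize the consensus CF = sum_j sum_{i<=j} w_ij x_i x_j by
    probabilistic single-unit state flips (w must be symmetric)."""
    random.seed(seed)
    n = len(w)
    x = [random.randint(0, 1) for _ in range(n)]
    for step in range(n_steps):
        i = random.randrange(n)
        # Consensus change if unit i flips: (1 - 2x_i)(w_ii + sum_{j!=i} w_ij x_j)
        delta = (1 - 2 * x[i]) * (w[i][i] + sum(w[i][j] * x[j]
                                                for j in range(n) if j != i))
        arg = -delta / T
        p = 0.0 if arg > 700 else 1.0 / (1.0 + math.exp(arg))  # acceptance AF(T)
        if random.random() < p:
            x[i] = 1 - x[i]                  # accept the state change
        if step % n == n - 1:                # one epoch of n attempted flips
            T = max(cooling * T, 1e-3)       # reduce control parameter
    return x
```

For a two-unit net with a strong positive connection, the consensus is maximized when both units are on, and the annealed state settles there.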

6.3. Gaussian Machine

The Gaussian machine is one which includes the Boltzmann machine, the Hopfield net and other neural networks. The Gaussian machine is based on the following three parameters: (a) a slope parameter of the sigmoidal function, α; (b) a time step Δt; (c) temperature T.

The steps involved in the operation of the Gaussian net are the following:

Step 1: Compute the net input to unit X_i:

net_i = Σ_{j=1}^{N} w_ij v_j + θ_i + ε

where θ_i is the threshold and ε is the random noise, which depends on the temperature T.

Step 2: Change the activity level of unit X_i:

Δx_i/Δt = −x_i/τ + net_i

Step 3: Apply the activation function:

v_i = f(x_i) = 0.5 [1 + tanh(α x_i)]

The binary step function corresponds to α = ∞ (infinity).


The Gaussian machine with T = 0 corresponds to the Hopfield net. The Boltzmann machine can be obtained by setting Δt = τ = 1 to get

Δx_i = −x_i + net_i, or x_i(new) = net_i = Σ_{j=1}^{N} w_ij v_j + θ_i + ε

The approximate Boltzmann acceptance function is obtained by integrating the Gaussian noise distribution:

∫₀^∞ [1/√(2πσ²)] exp[−(x − x̄)²/(2σ²)] dx ≈ A(i, T) = 1 / [1 + exp(−x̄/T)]

where x̄ = ΔCF(i). Noise that obeys a logistic rather than a Gaussian distribution produces a Gaussian machine that is identical to the Boltzmann machine with the Metropolis acceptance function, i.e., the output is set to 1 with probability

A(i, T) = 1 / [1 + exp(−x̄/T)]

with Δx_i = −x_i + net_i.

6.4. Cauchy Machine

The Cauchy machine can be called fast simulated annealing; it is based on adding more noise to the net input to increase the likelihood of a unit escaping from the neighbourhood of a local minimum. Larger changes in the system's configuration can be obtained due to the unbounded variance of the Cauchy distribution. The noise involved in the Cauchy distribution is called "coloured noise," and the noise involved in the Gaussian distribution is called "white noise." By setting Δt = τ = 1, the Cauchy machine, like the Gaussian machine, reduces to

Δx_i = −x_i + net_i, or x_i(new) = net_i = Σ_{j=1}^{N} w_ij v_j + θ_i + ε

The Cauchy acceptance function can be obtained by integrating the Cauchy noise distribution:

∫₀^∞ T / {π[T² + (x − x̄)²]} dx = 1/2 + (1/π) tan⁻¹(x̄/T) = A(i, T)

where x̄ = ΔCF(i). The cooling schedule and temperature have to be considered in both the Cauchy and Gaussian machines.
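The two acceptance functions can be compared directly. The function names are illustrative; the formulas are the ones given above, with x̄ = ΔCF(i).

```python
import math

def boltzmann_acceptance(x_bar, T):
    """Boltzmann/Gaussian-machine acceptance: 1 / (1 + exp(-x_bar/T))."""
    return 1.0 / (1.0 + math.exp(-x_bar / T))

def cauchy_acceptance(x_bar, T):
    """Cauchy-machine acceptance: 1/2 + (1/pi) * arctan(x_bar/T)."""
    return 0.5 + math.atan(x_bar / T) / math.pi

# Both equal 1/2 at x_bar = 0, but the Cauchy curve has heavier tails:
# a strongly unfavourable move still gets a non-negligible acceptance
# probability, which is why the Cauchy machine can anneal faster.
```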


6.5. Probabilistic Neural Net

The probabilistic neural net is based on ideas from conventional probability theory, such as Bayesian classification and other estimators for probability density functions, to construct a neural net for classification. This net instantly approximates optimal boundaries between categories. It assumes that the training data are representative samples. The probabilistic neural net consists of two hidden layers, as shown in Figure 6-4. The first hidden layer contains a dedicated node for each training pattern and the second hidden layer contains a dedicated node for each class. The two hidden layers are connected on a class-by-class basis: the several examples of a class in the first hidden layer are connected only to a single unit in the second hidden layer.

Figure 6.4. Probabilistic neural network

The algorithm for the construction of the net is as follows:

Step 0: For each training input pattern x(p), p = 1 to P, perform Steps 1 and 2.

Step 1: Create pattern unit z_p (hidden-layer-1 unit). The weight vector for unit z_p is given by

w_p = x(p)

Unit z_p is either a class-1 or a class-2 unit.

Step 2: Connect the hidden-layer-1 unit to the hidden-layer-2 unit. If x(p) belongs to class 1, then connect the hidden-layer unit z_p to the hidden-layer unit f1. Otherwise, connect the pattern hidden-layer unit z_p to the hidden-layer unit f2.
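The construction and use of the net can be sketched in a few lines of Python. The Gaussian kernel in the pattern units and the smoothing parameter sigma are standard choices for probabilistic nets but are not specified in the text above, so they should be read as assumptions:

```python
import math

def pnn_train(patterns, labels):
    """Steps 0-2 of the construction: one pattern unit per training
    example (weight vector w_p = x(p)), wired to its class summation unit."""
    net = {}
    for x, c in zip(patterns, labels):
        net.setdefault(c, []).append(x)  # pattern units grouped by class
    return net

def pnn_classify(net, x, sigma=0.5):
    """Each pattern unit fires exp(-||x - w||^2 / (2 sigma^2)); the class
    summation units add these activations and the largest sum wins."""
    def kernel(w):
        d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        return math.exp(-d2 / (2 * sigma ** 2))
    scores = {c: sum(kernel(w) for w in units) for c, units in net.items()}
    return max(scores, key=scores.get)
```

For example, training on the points (0, 0) and (0, 1) labelled class 1 and (5, 5) and (6, 5) labelled class 2, a query near the origin is assigned to class 1.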



6.6. Cascade Correlation Network:

Cascade correlation is a network which builds its own architecture as the training progresses. Figure 6-5 shows the cascade correlation architecture. The network begins with some inputs and one or more output nodes, but it has no hidden nodes. Each and every input is connected to every output node. There may be linear units or some nonlinear activation function, such as the bipolar sigmoidal activation function, in the output nodes. During the training process, new hidden nodes are added to the network one by one. For each new hidden node, the correlation magnitude between the new node's output and the residual error signal is maximized. The connection is made to each node from each of the network's original inputs and also from every pre-existing hidden node. During the time when the node is being added to the network, the input weights of the hidden nodes are frozen, and only the output connections are trained repeatedly. Each new node thus adds a new one-node layer to the network.

Figure 6.5. Cascade architecture after two hidden nodes have been added

In Figure 6-5, the vertical lines sum all incoming activations. The boxed connections are frozen and the "×" connections are trained continuously. In the beginning of the training there are no hidden nodes, and the network is trained over the complete training set. Since there is no hidden node, a simple learning rule, the Widrow-Hoff learning rule, is used for training. After a certain number of training cycles, when there is no significant error reduction and the final error obtained is unsatisfactory, we try to reduce the residual errors further by adding a new hidden node. For performing this task, we begin with a candidate node that receives trainable input connections from the network's external inputs and from all pre-existing hidden nodes. The output of this candidate node is not yet connected to the

active network. After this, we run a number of epochs over the training set. We adjust the candidate node's input weights after each epoch to maximize C, which is defined as

C = Σ_i | Σ_j (v_j − v̄)(E_{j,i} − Ē_i) |

where i is the network output at which the error is measured, j the training pattern, v_j the candidate node's output value, E_{j,i} the residual output error at node i, v̄ the value of v averaged over all patterns, and Ē_i the value of E_i averaged over all patterns. The value C measures the correlation between the candidate node's output value and the calculated residual output error. For maximizing C, the gradient ∂C/∂w_m is obtained as

∂C/∂w_m = Σ_{j,i} σ_i (E_{j,i} − Ē_i) f′_j I_{m,j}

where σ_i is the sign of the correlation between the candidate's value and output i; f′_j the derivative, for pattern j, of the candidate node's activation function with respect to the sum of its inputs; and I_{m,j} the input the candidate node receives from node m for pattern j. When the gradient ∂C/∂w_m is calculated, perform gradient ascent to maximize

C. As we are training only a single layer of weights, the simple delta learning rule can be applied. When C stops improving, the new candidate is brought in as a node in the active network and its input weights are frozen. Once again, all the output weights are trained by the delta learning rule as done previously, and the whole cycle repeats itself until the error becomes acceptably small.
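The correlation score C that each candidate node maximizes can be computed directly from its outputs and the residual errors. A small illustrative sketch, where the array layout (patterns by outputs) is an assumption:

```python
def candidate_correlation(v, E):
    """Covariance-style score C for a candidate node in cascade correlation.
    v[j]    : candidate node's output for training pattern j
    E[j][i] : residual error at network output i for pattern j
    C = sum_i | sum_j (v_j - v_mean) * (E_{j,i} - E_mean_i) |
    """
    P = len(v)                      # number of training patterns
    n_out = len(E[0])               # number of network outputs
    v_mean = sum(v) / P
    C = 0.0
    for i in range(n_out):
        e_mean = sum(E[j][i] for j in range(P)) / P
        C += abs(sum((v[j] - v_mean) * (E[j][i] - e_mean) for j in range(P)))
    return C
```

A candidate whose output tracks the residual error pattern-by-pattern scores high and is therefore most useful for cancelling that error once installed; a candidate uncorrelated with the error scores zero.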

6.7. Cognitron Network

The synaptic strength from cell X to cell Y is reinforced if and only if the following two conditions are true:

1. Cell X, the presynaptic cell, fires.

2. None of the postsynaptic cells present near cell Y fires more strongly than Y.

The model developed by Fukushima was called the cognitron, a successor to the perceptron, which can perform cognizance of symbols from any alphabet after training. Figure 6-6 shows the connection between a presynaptic cell and a postsynaptic cell.


The cognitron network is a self-organizing multilayer neural network. Its nodes receive input from the defined areas of the previous layer and also from units within their own area. The input and output neural elements can take the form of positive analog values, which are proportional to the pulse density of firing biological neurons. The cells in the cognitron model use a mechanism of shunting inhibition, i.e., a cell is bounded in terms of maximum and minimum activities and is driven toward these extremities. The area from which the cell receives input is called the connectable area. The area formed by the inhibitory cluster is called the vicinity area. Figure 6-7 shows the model of a cognitron. Since the connectable areas for cells in the same vicinity are defined to overlap, but are not exactly the same, there will be a slight difference appearing between the cells, which is reinforced so that the gap becomes more apparent. In this way, each cell is allowed to develop its own characteristics.

The cognitron network can be used in neurophysiology and psychology. Since this network closely resembles the natural characteristics of a biological neuron, it is best suited for various kinds of visual and auditory information processing systems. However, a major drawback of the cognitron net is that it cannot deal with the problems of orientation or distortion. To overcome this drawback, an improved version called the neocognitron was developed.

Figure 6.6 Connection between presynaptic cell and postsynaptic cell



Figure 6.7 Model of a cognitron network

6.8. Neocognitron Network

The neocognitron is a multilayer feed-forward network model for visual pattern recognition. It is a hierarchical net comprising many layers, with a localized pattern of connectivity between the layers. It is an extension of the cognitron network. The neocognitron net can be used for recognizing hand-written characters. A neocognitron model is shown in Figure 6-8. The algorithm used in the cognitron and the neocognitron is the same, except that the neocognitron model can recognize patterns that are position-shifted or shape-distorted. The cells used in the neocognitron are of two types:

1. S-cell: a cell that is trained suitably to respond to only certain features in the previous layer.

2. C-cell: a C-cell displaces the result of an S-cell in space, i.e., it sort of "spreads" the features recognized by the S-cell.

Figure 6.8 Neocognitron model



Figure 6.9 Spreading effect in neocognitron

The neocognitron net consists of many modules with a layered arrangement of S-cells and C-cells. The S-cells receive the input from the previous layer, while the C-cells receive the input from the S-layer. During training, only the inputs to the S-layer are modified. The S-layer helps in the detection of specific features and their complexities. The feature recognized in the S1 layer may be a horizontal bar or a vertical bar, but the feature in the Sn layer may be more complex. Each unit in the C-layer corresponds to one relative-position-independent feature. For the independent feature, the C-node receives the inputs from a subset of S-layer nodes. For instance, if one node in the C-layer detects a vertical line and if four nodes in the preceding S-layer detect a vertical line, then these four nodes will give the input to the specific node in the C-layer to spatially distribute the extracted features. Modules present near the input layer (lower in hierarchy) will be trained before the modules that are higher in hierarchy, i.e., module 1 will be trained before module 2, and so on.

The user has to fix the "receptive field" of each C-node before training starts because the inputs to the C-node cannot be modified. The lower-level modules have smaller receptive fields, while the higher-level modules indicate complex independent features present in the hidden layer. The spreading effect used in the neocognitron is shown in Figure 6-9.

6.9. Cellular Neural Network

The cellular neural network (CNN), introduced in 1988, is based on cellular automata, i.e., every cell in the network is connected only to its neighbour cells. Figures 6-10(A) and (B) show a 2 × 2 CNN and a 3 × 3 CNN, respectively. The basic unit of a CNN is a cell. In Figures 6-10(A) and (B), C(1,1) and C(2,1) are called cells.



Even if the cells are not directly connected with each other, they affect each other indirectly due to the propagation effects of the network dynamics. The CNN can be implemented by means of a hardware model. This is achieved by replacing each cell with linear capacitors and resistors, linear and nonlinear controlled sources, and independent sources. An electronic circuit model can be constructed for a CNN.

The CNNs are used in a wide variety of applications including image processing,

pattern recognition and array computers.

Figure 6.10 (A) A 2 × 2 CNN; (B) a 3 × 3 CNN

6.10. Optical Neural Networks

Optical neural networks interconnect neurons with light beams. Owing to this interconnection, no insulation is required between signal paths, and the light rays can pass through each other without interacting. The signal paths travel in three dimensions. The transmission-path density is limited by the spacing of the light sources, the divergence effect, and the spacing of the detectors. As a result, all signal paths operate simultaneously, producing true data rates. The weight strengths are stored in high-density holograms.

These stored weights can be modified during training to produce a fully adaptive system. There are two classes of optical neural networks:

1. electro-optical multipliers;

2. holographic correlators.

6.10.1. Electro-optical Multipliers

Electro-optical multipliers, also called electro-optical matrix multipliers, perform matrix multiplication in parallel. The network speed is limited only by the available electro-optical components; here the computation time is potentially in the nanosecond range. A model of an electro-optical matrix multiplier is shown in Figure 6-11.

Figure 6-11 shows a system which can multiply a nine-element input vector by a 9 × 7 matrix, producing a seven-element NET vector. There exists a column of light sources that passes its rays through a lens; each light illuminates a single row of the weight shield. The weight shield is a photographic film in which the transmittance of each square (as shown in Figure 6-11) is proportional to the weight. Another lens focuses the light from each column of the shield onto a corresponding photodetector. The NET is calculated as

NET_k = Σ_i w_ik x_i

where NET_k is the net output of neuron k; w_ik the weight from neuron i to neuron k; and x_i the input-vector component i. The output of each photodetector represents the dot product between the input vector and a column of the weight matrix. The output vector is equal to the product of the input vector with the weight matrix. Hence, matrix multiplication is performed in parallel. The speed is independent of the size of the array.
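The computation the optical system performs in parallel is an ordinary vector-matrix product, one dot product per photodetector. A plain sketch for checking the dimensions (the 9 × 7 sizes come from the text's example):

```python
def net_vector(x, W):
    """NET_k = sum_i w_ik * x_i  -- the dot products the optical system
    forms in parallel, one per photodetector column.
    x : input vector (nine elements in the text's example)
    W : weight 'shield' as a list of rows, W[i][k]."""
    n_out = len(W[0])
    return [sum(W[i][k] * x[i] for i in range(len(x))) for k in range(n_out)]

# A nine-element input against a 9 x 7 weight shield yields a
# seven-element NET vector, as in Figure 6-11.
net = net_vector([1] * 9, [[0.1] * 7 for _ in range(9)])
assert len(net) == 7
```

In the electronic version this loop runs sequentially; in the optical version every dot product forms at the same instant, which is why the speed is independent of the array size.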

Figure 6.11 Electro-optical multiplier

6.10.2. Holographic Correlators

In holographic correlators, the reference images are stored in a thin hologram and are retrieved in a coherently illuminated feedback loop. The input signal, either noisy or incomplete, may be applied to the system and can simultaneously be correlated optically with all the stored reference images. These correlations can be thresholded and are fed back to the input, where the strongest correlation reinforces

the input image. The enhanced image passes around the loop repeatedly, approaching the stored image more closely on each pass, until the system stabilizes on the desired image. The best performance of optical correlators is obtained when they are used for image recognition. A generalized optical image-recognition system with holograms is shown in Figure 6-12.

Figure 6.12 Optical image recognition system

The system input is an image from a laser beam. This passes through a beam splitter, which sends it to the threshold device. The image is reflected from the threshold device, passes back to the beam splitter, then goes to lens 1, which makes it fall on the first hologram. Several images are stored in the first hologram. The input image then gets correlated with each stored image, and this correlation produces light patterns whose brightness varies with the degree of correlation. The projected images from lens 2 and mirror A pass through a pinhole array, where they are spatially separated. From this array, the light patterns go to mirror B through lens 3 and are then applied to the second hologram. Lens 4 and mirror C then produce a superposition of the multiple correlated images onto the back side of the threshold device. The front surface of the threshold device reflects most strongly that pattern which is brightest on its rear surface. Its rear surface has projected on it the set of four correlations of each of the four stored images with the input image. The stored image most similar to the input image possesses the highest correlation. This reflected image again passes through the beam splitter and re-enters the loop for further enhancement. The system converges on the stored pattern most like the input pattern.



6.11. Spiking Neural Networks (SNN)

It is well known that the biological nervous system has inspired the development of artificial neural network models. On looking into the depth of the working of biological neurons, it is noted that their computations are performed in the temporal domain, and neuron firing depends on the timing between the spikes stimulated in the neurons of the brain. These fundamental biological understandings of neuron operation led the pathway to the development of spiking neural networks (SNN). SNNs fall under the category of third-generation neural networks, and they are more closely related to their biological counterparts than the first- and second-generation neural networks. These spiking neural networks use transient pulses for performing the computations and for the communications within the layers of the designed network. There exist different spiking neural models, and their classification is based on their level of abstraction.

6.11.1. Architecture of SNN Model

Neurons in the central nervous system communicate using short-duration electrical impulses called spikes or action potentials, whose amplitude is constant within the same structure of neurons. SNNs offer a biologically plausible, fast, third-generation neural connectionist model. They derive their strength and interest from an accurate modelling of synaptic interactions between neurons, taking into account the time of spike emission. SNNs overcome the computational power of neural networks made of threshold or sigmoidal units. Based on dynamic event-driven processing, they open up new horizons for developing models with an exponential capacity of memorizing and a strong ability for fast adaptation.

Moreover, SNNs add a new dimension, the temporal axis, to the representation capacity and the processing abilities of neural networks. There are many different models one could use to model both the individual spiking neurons and the nonlinear dynamics of the system. Neurons communicate with spikes, also known as action potentials. Since all spikes are identical (1-2 ms in duration and 100 mV in amplitude), the information is encoded by the timing of the spikes and not by the spikes themselves. Basically, a neuron is divided into three parts: the dendrites, the soma and the axon. Generally speaking, the dendrites receive the input signals from the previous neurons. The received input signals are processed in the soma and the output signals are transmitted along the axon. There is a synapse between every two neurons; if a neuron j sends a signal across the synapse to neuron i, the neuron that sends the signal is called the pre-synaptic neuron and the neuron that receives the signal is called the post-synaptic neuron. Every neuron is surrounded by positive and negative ions. On the inner surface of the membrane there is an excess of negative charges and on the outer surface there is an excess of positive charges. Those charges create the membrane potential. Each spiking neuron is characterized by a membrane potential. When the membrane potential reaches a critical value called the threshold, the neuron emits an action potential, also known as a spike (Figure 6-13). A neuron is said to fire when its membrane potential reaches this threshold. When it fires, it sends a spike towards all other connected neurons. Its membrane potential is then reset and the neuron cannot fire for a short period of time; this time period is called the refractory period. The output of a spiking neuron is therefore binary (spike or no spike), but it can be converted to a continuous signal over time: the activity of a neuron over a short period of time is converted into a mean firing rate. The spikes are identical to each other and their form does not change as the signal moves from a pre-synaptic to a post-synaptic neuron. The sequence of firing times of a neuron is called a spike train.

Figure 6.13 SNN spikes: the membrane potential increases and at time t^(f) it reaches the threshold, so that a spike is emitted.

6.11.2. Izhikevich Neuron Model

The Izhikevich neuron model is defined by the following equations:

v′ = 0.04v² + 5v + 140 − u + I

u′ = a(bv − u)

munotes.in

## Page 104

103Chapter 6: Special Networks

If v ≥ 30 mV, then v = c and u = u + d. Here, I is the input, v is the neuron membrane voltage and u is the recovery variable accounting for the activation of potassium (K) ionic currents and the inactivation of sodium (Na) ionic currents. The model exhibits all known neuronal firing patterns with the appropriate values for the parameters a, b, c and d.

1. The parameter a describes the time scale of the recovery variable u. Smaller values result in slower recovery. A typical value is a = 0.02.

2. The parameter b describes the sensitivity of the recovery variable u to the sub-threshold fluctuations of the membrane potential v. A typical value is b = 0.2.

3. The parameter c describes the after-spike reset value of the membrane potential v caused by the fast high-threshold K (potassium) conductance. A typical value for real neurons is c = −65 mV.

4. The parameter d describes the after-spike reset of the recovery variable u caused by the slow high-threshold Na (sodium) and K (potassium) conductances. A typical value of d is 2.

The IZ neuron uses voltage as its modelling variable. When the membrane voltage v(t) reaches 30 mV, a spike is emitted and the membrane voltage and the recovery variable are reset according to the IZ neuron model equations. For 1 ms of simulation, this model takes 13 FLOPS. Figure 6-14 illustrates the IZ neuron model firing.
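The model can be simulated with simple Euler integration. In the sketch below, the step size dt and duration T are illustrative choices, while a = 0.02, b = 0.2, c = −65 and d = 2 are the typical parameter values listed above:

```python
def izhikevich(I, T=200.0, dt=0.5, a=0.02, b=0.2, c=-65.0, d=2.0):
    """Euler simulation of the Izhikevich model:
        v' = 0.04 v^2 + 5 v + 140 - u + I
        u' = a (b v - u)
    with the hard reset v <- c, u <- u + d when v >= 30 mV.
    Returns the list of spike times (ms) over T ms of simulation."""
    v, u = c, b * c                       # start at the resting state
    spikes = []
    for step in range(int(T / dt)):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:                     # spike emitted
            spikes.append(step * dt)
            v, u = c, u + d               # after-spike reset
    return spikes
```

With a constant input of I = 10 the neuron fires tonically, whereas with I = 0 the membrane potential settles to rest and no spikes occur, matching the firing behaviour shown in Figure 6-14.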



Figure 6.14 The Izhikevich spiking neuron model. The top graph shows the membrane potential of the neuron, the middle graph the membrane recovery variable, and the bottom plot the pre-synaptic spikes.

The SNN with N neurons is assumed to be fully connected, and hence the output of each neuron is connected to every other neuron. The synaptic strengths of these connections are given by the N × N matrix W, where W[i, j] is the strength between the output of neuron j and the input of neuron i. Thus W[i, :] represents the synapses at the input of neuron i, whereas W[:, j] represents the synapse values connected to the outputs of neuron j. Each neuron has its own static parameters and varying state values. The set P represents the set of possible constant parameters and S is the set of neuron states. The set of possible inputs to the neurons is denoted by R.



The neuron update function f : (P, S, R) → (S, {0, 1}) takes the neuron parameters, states and inputs, and produces the next neuron state and a binary output. Izhikevich's model uses a two-dimensional differential equation to represent the state of a single neuron i, namely, its membrane recovery variable u[i] and membrane potential v[i], that is, (u[i], v[i]) ∈ S, with a hard reset spike. Four additional parameters are used for the configuration of the neurons: a, the time scale of u; b, the sensitivity of u; c, the value of v after the neuron has fired; d, the value of u after the neuron has fired. Hence the neuron parameters are (a, b, c, d) ∈ P. These parameters can be tuned to represent different neuron classes. If the value of v[i] is above 30 mV, the output is set to 1 (otherwise it is 0) and the state variables are reset.

Izhikevich used a random input for each neuron drawn from N(0, 1), i.e., normally distributed with zero mean and unit variance. This input results in a random number of neurons firing each time, depending not only on the intensity of the stimulus, but also on their randomly initialized parameters. After the input layer, one or more layers are connected in a feed-forward fashion. A spike occurs any time the voltage reaches 30 mV. While the neurons communicate with spikes, the input current I_i of neuron i is equal to

I_i = Σ_{j=1}^{N} w_ij δ_j + Σ_{k=1}^{K} w_ik I_k(t)

where w_ij is the weight of the connection from node j to node i; w_ik is the weight of the connection from external input k to node i; I_k(t) is the binary external input; and δ_j is the binary output of neuron j (0 or 1).
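The input current is thus a weighted sum over the binary spike outputs of the network's own neurons plus a weighted sum over the binary external inputs. A direct sketch (variable names are illustrative):

```python
def input_current(i, W, Wext, spikes, ext):
    """I_i = sum_j W[i][j] * delta_j + sum_k Wext[i][k] * I_k(t)
    W      : N x N recurrent weight matrix, W[i][j]
    Wext   : N x K external-input weight matrix
    spikes : binary outputs delta_j of the N network neurons (0/1)
    ext    : binary external inputs I_k(t)."""
    recurrent = sum(W[i][j] * s for j, s in enumerate(spikes))
    external = sum(Wext[i][k] * e for k, e in enumerate(ext))
    return recurrent + external
```

Because the spike outputs are 0 or 1, each term simply gates a synaptic weight on or off, so the current delivered to a neuron at a given time step is the sum of the weights of exactly those synapses whose source fired.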

When the input current signal changes, the response of the Izhikevich neuron also changes, generating different firing rates. The neuron is stimulated during T ms with an input signal, and it fires when its membrane potential reaches a specific value, generating an action potential (spike) or a train of spikes. The firing rate is evaluated as the number of spikes emitted during the stimulation window divided by its duration T.

6.12. Encoding of Neurons in SNN

Spiking neural networks can encode digital and analog information. The neuronal coding schemes fall into three categories: rate coding, temporal coding and population coding. In rate coding, the information is encoded into the mean firing

rate of the neuron, which is also known as the temporal average. In temporal coding, the information is encoded in the form of spike times. In population coding, a number of input neurons (a population) are involved in the analog encoding, and this produces different firing times. The commonly used encoding method is population-based encoding. In population encoding, analog input values are represented as spike times using population coding. Multiple Gaussian receptive fields are used, so that the input neurons encode an input value into spike times. The firing time is computed based on the intersection of the Gaussian functions. The centre of the Gaussian function for input neuron i is calculated using

μ_i = I_min + ((2i − 3)/2) · (I_max − I_min)/(M − 2)

and the width is computed employing

σ = (1/β) · (I_max − I_min)/(M − 2)

with 1 ≤ β ≤ 2, for M neurons covering the variable interval [I_min, I_max]. The parameter β controls the width of each Gaussian receptive field.

6.12.1. Learning with Spiking Neurons

Similar to other supervised training algorithms, the synaptic weights of the network are adjusted iteratively in order to impose a desired input-output mapping on the SNN. Learning is performed through the implementation of synaptic plasticity on excitatory synapses. The synaptic weights of the model, which are directly connected to the input pattern, determine the firing rate of the neurons. This means that the learning phase generates the desired behaviour by adjusting the synaptic weights of the neurons. The neurons are characterized by a sudden change of the membrane potential instantaneously prior to and subsequent to firing. This behavioural feature leads to complexity in training SNNs. Some of the learning models include SpikeProp, spike-based supervised Hebbian learning, ReSuMe and spike time-dependent plasticity. Neurons can be trained to classify categories of input signals based only on a temporal configuration of spikes. The decision is communicated by emitting precisely timed spike trains associated with given input categories. Trained neurons can perform the classification task correctly. The weights w between a pre-synaptic neuron i and a post-synaptic neuron j do not have fixed values. It has been proved through experiments that they change, and this affects the amplitude of the generated spike. The procedure of the weight

update is called the learning process, and it can be divided into two categories: supervised and unsupervised learning. If the synaptic strength is increased, then it is

called long-term potentiation (LTP), and if the strength is decreased, then it is called long-term depression (LTD).
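The Gaussian receptive-field population encoding described at the start of this section can be sketched as follows. The conversion of a field's activation into a firing time (strong response gives an early spike) follows the usual convention for this scheme, and the linear mapping and t_max used here are assumptions:

```python
import math

def receptive_fields(i_min, i_max, M, beta=1.5):
    """Centres and common width of M Gaussian receptive fields:
       mu_i  = I_min + (2i - 3)/2 * (I_max - I_min)/(M - 2),  i = 1..M
       sigma = (1/beta) * (I_max - I_min)/(M - 2),  with 1 <= beta <= 2."""
    span = (i_max - i_min) / (M - 2)
    mus = [i_min + (2 * i - 3) / 2.0 * span for i in range(1, M + 1)]
    return mus, span / beta

def encode(x, mus, sigma, t_max=10.0):
    """Turn each field's Gaussian activation into a firing time:
    activation 1 -> spike at t = 0, activation 0 -> spike at t_max."""
    acts = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for mu in mus]
    return [round(t_max * (1.0 - a), 2) for a in acts]
```

A single analog value thus becomes a vector of spike times across the population: the neuron whose receptive field is centred closest to the value fires earliest, and neurons with distant centres fire late or not at all.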

6.12.2. SpikeProp Learning Algorithm

SNNs employ spiking neurons as computational units which account for the precise firing times of neurons in information coding. The information retrieval from the spike trains (in which the neurons encode the information) is done by binary bit coding, which is a population-coding approach. This section presents the error-back-propagation supervised learning algorithm as employed for spiking neural networks.

Each SNN consists of a set of neurons with a set of edges E ⊆ I × J connecting source neurons i ∈ I to target neurons j ∈ J. Each non-input neuron has a threshold V_th and a potential u(t), and each synapse (i, j) ∈ E has a response function ε_ij and a weight w_ij. The neurons are structured as a fully connected feed-forward neural network. A source neuron fires and propagates spikes along all outgoing connections. Formally, a spike train is defined as a sequence of pulses. Each target neuron w that receives a spike experiences an increase in potential at time t of the form Σ_j w_{j,w} ε_{j,w}(t − t_j).

The firing time of a neuron i is denoted as t_i^(f), where f = 1, 2, 3, ... is the number of the spike. The objective is to match a set of target firing times t^d and actual firing times t^a. For a series of input spike trains S_in(t), a sequence of target output spikes S_d(t) is given. The goal is to find a vector of synaptic weights w such that the outputs of the learning neurons S_out(t) are close to S_d(t). Changing the weights of the synapses alters the timing of the output spikes for a given temporal input pattern

S(t) = Σ_f δ(t − t^(f))

where δ(x) is the Dirac function, with δ(x) = 0 for x ≠ 0 and ∫_{−∞}^{∞} δ(x) dx = 1. Every pulse is taken as a single point in time. The objective is to bring the actual firing times {t_j^a} to the desired target firing times {t_j^d}. The least-mean-squares error function is chosen and is defined by

E = (1/2) Σ_{j∈J} (t_j^a − t_j^d)²



In the error-back-propagation algorithm, each synaptic terminal is taken as a separate connection k from neuron i to neuron j with weight w_ij^k; η is the learning-rate parameter. The basic weight adaptation rules for neurons in the output layer and the hidden layer are given by

δ_j = (t_j^d − t_j^a) / Σ_i Σ_k w_ij^k (∂y_i^k(t_j^a)/∂t)

Δw_ij^k = −η y_i^k(t_j^a) δ_j

for an output neuron j, and

δ_i = (Σ_j δ_j Σ_k w_ij^k (∂y_i^k(t_j^a)/∂t)) / (Σ_h Σ_k w_hi^k (∂y_h^k(t_i^a)/∂t))

Δw_hi^k = −η y_h^k(t_i^a) δ_i

for a hidden neuron i, where y_i^k(t) denotes the response of synaptic terminal k of neuron i at time t.

The training process involves modifying the neuron firing thresholds and the synaptic weights. The algorithmic steps involved in learning through the SpikeProp algorithm are as follows:

6.12.3. SpikeProp Algorithm

Step 1: Choose the threshold and initialize the weights randomly between 0 and 1.

Step 2: In the feed-forward stage, each input synapse receives an input signal and transmits it to the next neuron (i.e., the hidden units). Each hidden unit computes its spike-response function and sends the result to the output unit, which in turn calculates the spike function as the response for the given input. The firing time t^a of each output neuron is found, and the time to first spike of the output neurons is compared with the desired time t^d of the first spike.

Step 3: Perform the error-back-propagation learning process for all the layers. The equations are transformed to partial derivatives and the process is carried out.

Step 4: Calculate δ_j using the actual and desired firing times of each output neuron.

Step 5: Calculate δ_i employing the actual and desired firing times of each hidden neuron and the δ_j values.

Step 6: Update weights: for the output layer, calculate each change in weight.

Step 7: Compute: new weight = old weight + Δw_ijk.

Step 8: For the hidden layer, calculate each change in weight.

Step 9: Compute the new weights for the hidden layer: new weight = old weight + Δw_hik.

Step 10: Repeat until convergence occurs.


6.12.4. Spike Time-Dependent Plasticity (STDP) Learning

Spike time-dependent plasticity (STDP) is viewed as a more quantitative form of Hebbian learning. It emphasizes the importance of causality in synaptic strengthening or weakening. STDP is a form of Hebbian learning where spike timing and transmission are used in order to calculate the change in the synaptic weight of a neuron. When the pre-synaptic spikes precede the post-synaptic spikes by tens of milliseconds, the synaptic efficacy is increased. On the other hand, when the post-synaptic spikes precede the pre-synaptic spikes, the synaptic strength decreases. The change in synaptic efficacy Δw_ij is thus a function of the spike times of the pre-synaptic and post-synaptic neurons. The well-known STDP algorithm modifies the synaptic weights using the following rule:

Δw = A⁺ exp(Δt/τ⁺)    if Δt < 0
Δw = −A⁻ exp(−Δt/τ⁻)   if Δt ≥ 0

w_new = w_old + η Δw (w_max − w_old)   if Δw > 0
w_new = w_old + η Δw (w_old − w_min)   if Δw < 0

where Δt = (t_pre − t_post) is the time delay between the pre-synaptic spike and the post-synaptic spike. If the pre-synaptic spike occurs before the post-synaptic spike, the weight of the synapse is increased. If the pre-synaptic spike occurs after the post-synaptic spike, then the weight of the synapse is reduced. STDP learning can be used for inhibitory or excitatory neurons.
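The STDP rule, with its soft weight bounds, can be written down directly. The amplitudes A⁺ and A⁻, the time constants and the learning rate η below are illustrative values, not prescribed by the text:

```python
import math

def stdp_dw(dt, A_plus=0.1, A_minus=0.12, tau_plus=20.0, tau_minus=20.0):
    """dt = t_pre - t_post (ms).  Pre before post (dt < 0) gives
    potentiation; post before pre (dt >= 0) gives depression."""
    if dt < 0:
        return A_plus * math.exp(dt / tau_plus)
    return -A_minus * math.exp(-dt / tau_minus)

def update_weight(w, dw, eta=0.5, w_min=0.0, w_max=1.0):
    """Soft-bounded update: the change is scaled by the distance to the
    relevant bound, so w stays inside [w_min, w_max]."""
    if dw > 0:
        return w + eta * dw * (w_max - w)
    return w + eta * dw * (w - w_min)
```

Because the change decays exponentially with |Δt|, only spike pairs that are close in time alter the synapse appreciably, and the soft bounds make potentiation weaken as the weight approaches w_max (and depression weaken near w_min).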

6.12.5. Convolutional Neural Network (CNN)

A convolutional neural network (CNN) is built up of one or more convolutional layers, which are then followed by one or more fully connected layers, as in feed-forward networks. The CNN architecture is designed to exploit the structure of a two-dimensional input image; that is, a CNN's key advantage is that its input consists of images, and this representation shapes the architecture in a practical way. The neurons in a CNN are arranged in three dimensions: height, width and depth. The "depth" refers to an activation volume and represents the third dimension. This architectural design of a CNN is carried out with local connections and tied weights, which are subsequently followed by certain pooling operations. CNNs can be trained easily and have fewer parameters than fully interconnected networks with the same number of hidden units. Figure 6-15 shows the arrangement of neurons in three dimensions in a convolutional neural network. As with a regular neural network, the convolutional neural network is also made up of


layers, and each and every layer transforms an input 3D volume to an output 3D volume along with certain differentiable activation functions, with or without parameters.

Figure 6.15 Arrangement of neurons in CNN model

6.12.6. Layers in Convolutional Neural Networks

It is well noted that the convolutional neural network is a sequence of layers and

each and every layer in a CNN performs a transformation of one volume of activations to another by employing a differentiable function. A CNN consists of three major

layers:

1. Convolutional layer

2. Pooling layer

3. Fully interconnected layer (regular neural models like perceptron and BPN)

These layers exist between the input layer and the output layer. The input layer holds the input values represented by the pixel values of an image. The convolutional layer performs computation and determines the output of each neuron that is connected to local regions in the input. The computation is done by taking the dot product between the neuron's weights and the small region of the input volume it is connected to. After that, an element-wise activation function is applied, with the threshold set to zero. Applying this activation function causes no change in the size of the volume of the layers. The pooling layer carries out a down-sampling operation along the spatial dimensions, namely width and height. Regular fully connected layers compute the class scores (belongs to the class or not) and result in a specified volume size. In this manner, convolutional neural networks transform the original input layer by layer and arrive at the final scores. The pooling layer implements only a fixed function, whereas the convolutional and fully interconnected layers implement transformations that depend on the weights and biases of the neurons.


Fundamentally, a convolutional neural network is nothing but a sequence of layers that transform the image volume into an output volume. Each of the designed layers in a CNN is modelled to take an input 3-dimensional volume and transform it to an output 3-dimensional volume employing a differentiable function. Here, the designed convolutional and fully interconnected layers possess parameters, while the pooling layers do not.

6.12.7. Architecture of a Convolutional Neural Network

It is well known that a CNN is made up of a number of convolutional and pooling (also called sub-sampling) layers, subsequently followed by fully interconnected layers (in certain cases this layer becomes optional, based on the application considered).

Figure 6.17 CNN with convolutional and pooling layers

The input presented to the convolutional layer is an n x n x p image, where "n" is the height and width of the image and "p" refers to the number of channels (e.g., an RGB image possesses 3 channels, so p = 3). The convolutional layer to be constructed possesses m filters of size r x r x q, where "r" tends to be smaller than the dimension of the image and "q" can be the same size as "p" or smaller, and may vary for each filter. The filter size enables the design of a locally connected structure which gets convolved with the image to produce "m" feature maps. The size of each feature map will be "n − r + 1". Each of the feature maps then gets pooled (sub-sampled) based on maximum or average pooling over r x r contiguous regions, where "r" is typically 2 for small images and up to 5 for larger images. A bias and a non-linear sigmoidal function can be applied to each feature


map before or after the pooling layer. Figure 6.17 shows the architecture of the convolutional neural network.
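As a worked example of the sizes above (the concrete numbers are illustrative, not from the text):

```python
def conv_feature_map_size(n, r):
    """Valid convolution of an r x r filter over an n x n image (stride 1):
    feature map size = n - r + 1, as stated above."""
    return n - r + 1

def pooled_size(size, pool):
    """Non-overlapping pool x pool sub-sampling."""
    return size // pool

# Example: a 28 x 28 single-channel image (p = 1), m = 6 filters of
# size 5 x 5, followed by 2 x 2 max/average pooling.
fmap = conv_feature_map_size(28, 5)   # 28 - 5 + 1 = 24
out = pooled_size(fmap, 2)            # 24 // 2 = 12
```

So each of the m = 6 feature maps is 24 x 24 after convolution and 12 x 12 after pooling.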

6.12.8. Designing the Layers in CNN Model

A CNN is made up of three individual layers, and this subsection presents the details of designing each of these layers, specifying their connectivity and hyper-parameters.

1- Design of Convolutional Layer

The primary building block of convolutional neural network is the convolutional

layer. The convolutional layer is designed to perform intense computations in a

CNN model. Convolutional layer possess a set of trainable filters and every filter

is spatially small (along the width and height) but noted to extend through the

fullest depth of the input volume. When the forward pass gets initiated, each filter

slides across the height and width of the input volume and the dot product is

computed between the input at any position and that of the entries in the filter.

When the filter slides across the height and width of the input volume, a two-dimensional activation feature map is produced that gives the responses of that filter at every spatial position. The filters get activated when they come across certain types of visual features (like edges or color stains on the first layer, or certain specific patterns such as honeycombs on the higher layers of the network), and the network learns from the filters that get activated. The convolutional layer

consists of the complete set of filters and each of these filters produces a separate

2-dimensional activation map. These activation maps will be stacked along the

depth dimension and result in the output volume.

In the CNN network model, at the convolutional layer, each neuron gets connected

only to a local region of the input volume. The spatial extent of this neuronal

connectivity is represented by a hyper -parameter called the receptive field of the

neuron. This receptive field of the neuron is the filter size. The connectivity along the depth axis is always equal to the depth of the input volume: these connections are local in space but extend through the entire depth of the input volume.

With respect to the number of neurons in the output volume, three hyper -parameters

are noted to control the size of the output volume - depth, stride and zero -padding.

The depth of the output volume refers to the number of filters to be used, where each filter learns to look for something different in the input. The stride is specified for sliding the filter:


one pixel at a time: stride = 1
two pixels at a time: stride = 2
and similarly for other strides

The movement of the filter is specified by the stride, and larger strides result in smaller output volumes spatially.

pad the input volume with zeros around the border; hence, the other hyper-parameter is the size of this zero-padding. Zero-padding allows controlling the spatial

size of the output volumes. It should be noted that if all neurons presented in the

single depth slice employ the same weight vector, then in every depth slice, the

forward pass of the convolutional layer can be computed as the convolution of the neuronal weights with the input volume. Thus, the sets of weights are referred to in a CNN as filters that get convolved with the input. The limitation of this approach is that it uses a lot of memory, as certain values in the input volume are replicated multiple times.
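The joint effect of filter size, stride, and zero-padding on the output volume is commonly summarized by the formula (n − f + 2p)/s + 1, the standard generalization of the n − r + 1 rule given earlier (the formula and example numbers are standard practice, not taken from the text):

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolutional layer: (n - f + 2p)/s + 1.
    Raises if the filter/stride/padding combination does not tile the
    input evenly."""
    num = n - f + 2 * pad
    if num % stride != 0:
        raise ValueError("filter/stride/padding do not fit the input")
    return num // stride + 1

# With pad = 0 and stride = 1 this reduces to n - f + 1, as in the text.
assert conv_output_size(28, 5) == 24
# Stride 1 with zero-padding (f - 1) // 2 preserves the spatial size.
assert conv_output_size(32, 3, stride=1, pad=1) == 32
# A larger stride shrinks the output volume spatially.
assert conv_output_size(32, 4, stride=2, pad=1) == 16
```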

It is to be noted that the backward pass for a convolution operation is also a

convolution process, handled in practice by the back-propagation algorithm. A few earlier works use 1 x 1 convolutions; in a two-dimensional setting this is similar to a point-wise scaling operation. However, a CNN model operates on three-dimensional volumes, and the filters extend over the full depth of the input volume; hence employing a 1 x 1 convolution still performs a three-dimensional dot product. Another method of convolution is the dilated convolution, wherein an added hyper-parameter called dilation is included in the convolutional layer. In dilated convolution, it is possible to have filters with spaces between each cell: dilation 0 means no gap, dilation 1 means a gap of 1 between the filter cells, and so on. Employing dilated convolutions drastically increases the receptive field.

2- Design of Pooling Layer

Pooling layers are placed between successive convolutional layers. The purpose of a pooling layer between the convolutional layers is to gradually decrease the spatial size of the representation and the number of parameters, and to reduce the computation in the network. This placement of the pooling layer also controls the occurrence of over-fitting. The pooling layer works independently on each depth slice of the input and resizes them


spatially. The commonly employed pooling layer is one with a filter size of 2 x 2 applied with a stride of 2, which down-samples every depth slice in the input by 2 along the height and width. The dimension of the depth parameter remains unaltered in this case. Pooling sizes with larger receptive fields are noted to be destructive. The generally used pooling mechanism is "max pooling"; apart from this operation, the pooling layer can also perform functions like mean pooling or even L2-norm pooling. In the backward pass of a pooling layer, the process is only to route the gradient to the input that possessed the highest value in the forward pass. Hence, at the time of the forward pass of the pooling layer, it is important to track the index of the maximum activation so that the gradient routing is carried out effectively by the back-propagation algorithm.
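The forward pass of a 2 x 2 max-pooling layer, including the index tracking needed for the gradient routing described above, can be sketched as follows (a minimal numpy illustration; the array contents are made up):

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 on one depth slice.
    Records the input index of each maximum so that the backward pass
    can route the gradient to the winning input."""
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    argmax = {}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            window = x[i:i + 2, j:j + 2]
            k = np.unravel_index(np.argmax(window), window.shape)
            out[i // 2, j // 2] = window[k]
            argmax[(i // 2, j // 2)] = (i + k[0], j + k[1])
    return out, argmax

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [0., 5., 4., 1.]])
out, idx = max_pool_2x2(x)   # out is 2 x 2; idx maps output cells to inputs
```

In the backward pass, the gradient for each output cell is sent only to the input position stored in `idx`; all other inputs in that window receive zero gradient.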

6.12.9. Layer Modelling in CNN and Common CNN Nets

The other layers of importance in convolutional neural network are the

normalization layer and the fully connected layer. Numerous normalization

layers are developed to be used in CNN model and they are designed in a manner

to implement the inhibition procedure of the human brain. Various types of

normalization procedures like mean scaling, max scaling, summation process,

etc. can be employed if required for operation in the CNN model. Fully

connected layers possess full interconnections to all the activations in the previous layer. As usual, their activations are based on computing the net input to the neurons of a layer along with the bias input.

6.12.10. Conversion of Fully Connected Layer to Convolutional Layer

The main difference between the fully connected and the convolutional layer is that the neurons present in the convolutional layer get connected only to a local region in the input, and the neurons within a convolutional volume share their parameters.

layers calculate the dot products and hence their functional form remains the

same. Therefore it is possible to perform conversion between the fully connected

and the convolutional layers.

Considering any convolutional layer, there exists a fully connected layer which

implements one and the same forward pass function. The weight matrix will be a large one and possesses zero entries except at specific blocks (owing to local connectivity), and the weights in numerous blocks tend to be equal (parameter sharing). Conversely, a fully connected layer can be converted into a convolutional layer; here the filter size will be set equal to the size


of the input volume, and the output will be a single depth column that fits across the input volume. This gives the same result as the initial fully connected layer. Of these two conversions, converting a fully connected layer to a convolutional layer is the one generally used in practice.

6.13. CNN Layer Sizing

As noted, a CNN model commonly comprises a convolutional layer, a pooling layer,

and fully connected layer. The rules for sizing the architecture of the CNN model

are as follows:

1. The input layer should be designed in such a way that its dimensions are divisible by 2 several times. The convolutional layer should employ small-sized filters with a specified stride, and should not alter the spatial dimensions of the input.

2. The pooling layer down-samples the spatial dimensions of the input. The commonly used pooling is max-pooling with a 2 x 2 receptive field and a stride of 2. Receptive field sizes up to 3 x 3 are acceptable; beyond that, the pooling becomes too aggressive and tends to lose information, which results in poor performance of the network.

From all the above, it is clearly understood that the convolutional layers preserve

the spatial size of their input. On the other hand, the pooling layers are responsible

for down-sampling the volumes spatially. If strides greater than 1 are used, or zero-padding is not applied to the input of the convolutional layers, then it is very important to track the input volumes through the entire CNN architecture and ensure that all the strides and filters work in a proper manner. Smaller strides are generally better in practice, and padding actually improves the performance of the network. When the convolutional layer does not zero-pad the inputs and performs only valid convolutions, the volume size reduces by a small amount after each convolution process.

6.13.1. Common CNN Nets

In the past few years, numerous CNN models have been developed and implemented for various applications. A few of them include:

1. LeNet: The first convolutional neural network model, named after its developer LeCun. It was applied to read zip codes, digits and so on.

2. AlexNet: The CNN model in this case was applied to computer vision applications.


It was developed in the year 2012 by Alex Krizhevsky and team.

3. ZFNet: It was developed in the year 2013 by Zeiler and Fergus and hence named ZFNet. In this network model, the convolutional layers in the middle are expanded, and the stride and filter size are made smaller in the first layer.

4. VGGNet: It was modelled in the year 2014 by Karen Simonyan and Andrew Zisserman. It had a phenomenal impact through the depth of the network, and it was noted that the depth of the network plays a major role in better performance.

5. GoogLeNet: It was developed in the year 2014 at Google by Szegedy and team. This net contributed an Inception module, wherein the number of parameters in the model is reduced. The network employs mean pooling instead of fully connected layers at the top of the convolutional network; as a result, many parameters are eliminated.

6. ResNet: It was modelled in the year 2015 by Kaiming and team, and is hence called a Residual Network. This network is a default choice of convolutional neural network in practice. It employs batch normalization, and the architecture also does not use fully connected layers at the end of the network.

6.13.2. Limitations of CNN Model

The computational considerations are the major limitations of the convolutional

neural network model. Memory require ment is one of the problems for CNN

models. In the current processor unit, the memory limits from 3/4/6 GB to the latest

best version of 12 GB memory. The memory can be handled by

1. The convolutional network implementation itself, which must maintain varied memory requirements, like the image data modules.

2. The intermediate volume sizes, i.e., the number of activations at each layer of the convolutional network as well as their gradients. Running the convolutional network at test time alone reduces the memory by a large amount, by storing only the current activations at any layer and discarding the activations of the previous layers.

3. The network parameters and their size, the gradient values of the parameters during the backward pass of the back-propagation process, and also a step cache when a momentum factor is used. The memory required to store the parameters alone should therefore be multiplied by a factor of at least 3 or so.


To compute the total memory requirement, the parameter count must be converted to a size in GB. For each of the sources above, count the number of parametric values, multiply the total by 4 to get the raw number of bytes, and then divide repeatedly by 1024 to get the amount of memory in KB, MB and then GB. In this way, the memory requirement of a CNN model can be computed and its limitations managed.
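The memory arithmetic above can be sketched as follows; the example parameter count and the overhead factor of 3 (parameters, gradients, momentum cache) are illustrative:

```python
def params_memory_gb(num_params, bytes_per_value=4, overhead_factor=3):
    """Approximate parameter memory: raw bytes = count * 4 (float32),
    multiplied by ~3 to cover gradients and the momentum step cache,
    then converted through KB -> MB -> GB by repeated division by 1024."""
    raw = num_params * bytes_per_value * overhead_factor
    return raw / 1024 / 1024 / 1024

# Example: a model with 138 million parameters (roughly VGGNet-sized)
mem = params_memory_gb(138_000_000)   # about 1.5 GB
```

Against a 12 GB device, this estimate leaves room for the intermediate activation volumes, which usually dominate at training time.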

6.14. Deep learning Neural networks:

Machine learning approaches are undergoing a tremendous revolution, which has led to the development of third-generation neural networks. The limitations observed in second-generation neural networks, like delayed convergence, problems with local and global minima, and so on, are handled in the third-generation neural networks. One of the prominent third-generation neural networks is the deep learning neural network (DLNN), and this neural model provides a deep understanding of the input information.

The prominent researcher behind the concept of deep learning neural networks is Professor Hinton from the University of Toronto, who managed to develop a special program module for determining the formulation of molecules to produce an effective medicine. Hinton's group employed deep learning artificial intelligence methodology to locate the combination of molecules required for the composition of the medicine with very limited information on the source data. Apple and Google have transformed themselves with deep learning concepts, as can be noted through Apple Siri and Google Street View, respectively.

The learning process in a deep learning neural network takes place in two steps. In the first step, information about the input data's internal structure is obtained from an existing large array of unformatted data. This extraction of the internal structure is carried out by an auto-associator unit via unsupervised training, layer by layer. Then the formatted data obtained from the unsupervised multi-layer neural network gets processed through a supervised network module employing the already available neural network training methods. It is to be noted that the amount of unformatted data should be as large as possible, while the amount of formatted data can be smaller in size (but this need not be an essential criterion).

6.14.1. Network Model and Process Flow of Deep Learning Neural Network

The strength of deep learning neural networks is their deep architecture, which contains multiple hidden layers, and each hidden layer carries out a non-linear transformation between the layers. DLNNs get trained based on two features:


1. Pre-training of the deep neural networks employing unsupervised learning

techniques like auto -encoders layer -by-layer,

2. Fine tuning of the DLNNs employing back propagation neural network.

Basically, auto -encoders are employed with respect to the unsupervised learning

technique and the input data is the output target of the auto -encoder. An auto -

encoder consists of two parts - encoder and decoder network. The operation of an

encoder network is to transform the input data that is present in the form of a high -

dimensional space into codes pertaining to low -dimensional space. The operation

of the decoder network is to reconstruct the inputs from the corresponding codes.

In the encoder neural network, the encoding function is denoted by "fθ". The encode vector (Ev) is given by

Ev = fθ(xv)

where "x" is the data set of the measured signal.

The reconstruction operation is carried out at the decoder neural network, and its function is denoted by "gθ′". This reconstruction function maps the encode vector back from the low-dimensional space into the high-dimensional space. The reconstructed form is given by

x̂v = gθ′(Ev)

The ultimate goal of these encoder and decoder neural networks is to minimize the reconstruction error E(x, x̂) over the N training samples. E(x, x̂) is specified as a loss function that measures the discrepancy between the input and the reconstructed data samples. The key objective of the unsupervised auto-encoder is to determine the parameter sets that minimize the reconstruction error "E":

(θ, θ′) = arg min (1/N) Σ_{v=1}^{N} E(xv, gθ′(fθ(xv)))

The encoding and decoding functions of the DLNN include a non-linearity and are given by

fθ(x) = f_af_e(b + W x)

gθ′(x) = f_af_d(b + W^T x)

where f_af_e and f_af_d refer to the encoder activation function and the decoder activation function respectively, "b" indicates the bias of the network, and W and


W^T specify the weight matrices of the DLNN model.

The reconstruction error is given by

E(x, x̂) = || x − x̂ ||
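A minimal numpy sketch of the encoder/decoder pair above. The layer sizes, the tied weights (the decoder uses W^T), and the sigmoid activations are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tied-weight auto-encoder matching f_theta(x) = f(b + W x) and
# g_theta'(x) = f(b' + W^T x); sizes are illustrative.
n_in, n_code = 8, 3
W = rng.normal(0, 0.1, (n_code, n_in))
b = np.zeros(n_code)        # encoder bias
b_prime = np.zeros(n_in)    # decoder bias

def encode(x):
    return sigmoid(b + W @ x)

def decode(code):
    return sigmoid(b_prime + W.T @ code)

def reconstruction_error(x):
    """E(x, x_hat) = ||x - x_hat||"""
    return np.linalg.norm(x - decode(encode(x)))

x = rng.normal(size=n_in)
err = reconstruction_error(x)   # training adjusts W and the biases to shrink this
```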

In order to carry out the pre-training of a DLNN model, the "N" auto-encoders developed as above are stacked. For a given input signal xv, the input layer along with the first hidden layer of the DLNN is considered as the encoder neural network of the first auto-encoding process. When the first auto-encoder has been trained by minimizing the reconstruction error, the first trained parameter set θ1 of the encoder neural network is employed to initialize the first hidden layer of the DLNN, and the first encode vector is obtained by

E1v = fθ1(xv)

Now, the input data becomes the encode vector E1v. The first and second hidden layers of the DLNN are considered as the encoder neural network for the second auto-encoder. Subsequently, the second hidden layer of the DLNN gets initialized by that of the second trained auto-encoder. This process continues up to the N-th auto-encoder, which gets trained to initialize the final hidden layer of the DLNN model. The final or N-th encode vector, in generalized form for the vector xv, is obtained by

ENv = fθN(EvN−1)

where "θN" denotes the N-th trained parameter set of the encoder neural network.

Thus, in this way, all the DLNN's hidden layers get pre-trained by means of the N stacked auto-encoders. It is well noted that the process of pre-training avoids local minima and improves the generalization aspect of the problem under consideration. Figure 6.18 shows the fundamental architecture of the deep learning neural network.

Figure 6.18 Architecture model of deep learning neural network


The above completes the pre-training process of the DLNN, and the next process is the fine-tuning of the DLNN model. The DLNN model's output is calculated from the input signal xv as

yv = fθN+1(ENv)

where θN+1 denotes the trained parameter set of the output layer. Here, a back propagation network (BPN) is employed for minimizing the error of the output by carrying out the parameter adjustments in the DLNN backwards. If the target output for xv is tv, then the error criterion is given by

MSE(Ψ) = (1/N) Σ_{v=1}^{N} E(yv, tv)

where Ψ = {θ1, θ2, θ3, ….., θN+1}

6.14.2. Training Algorithm of Deep Learning Neural Network:

Step 1: Start the algorithmic process.

Step 2: Obtain the training data sets to feed into the DLNN model and initialize

the necessary parameters.

Step 3: Construct the DLNN with "N" hidden layers; set i = 1.

Step 4: Perform the training of the i-th auto-encoder.

Step 5: Initialize the i-th hidden layer parameters of the DLNN employing the parameters of the auto-encoder.

Step 6: Check whether "i" is greater than "N". If not, increment i and carry out Step 4; if yes, go to the next step.

Step 7: Calculate the dimensions of the output layer.

Step 8: Fine-tune the parameters of the DLNN through the BPN algorithm.

Step 9: With the final fine-tuned DLNN model, go to the next step.

Step 10: Return the trained DLNN.

Step 11: Output the solutions achieved.

Step 12: Stop the process on meeting the termination condition. The termination condition is a maximum number of iterations or reaching the minimal mean square error.

6.14.3. Encoder Configurations

Encoders are built so as to reproduce, as exactly as possible, the configuration of the input at the output end. These encoders belong to the category of auto-associator neural units. Auto-associator modules are designed to perform the generating part as well as the synthesizing part. The encoders discussed in this section belong to the synthesizing module of the auto-associator; for the generating part, a variation of the Boltzmann machine was presented in special networks.


An auto-encoder is configured as a multilayer neural network which, for its operation, sets its target values equal to the input vector. A model of an auto-encoder is shown in Figure 6.19. The encoder model attempts to find an approximation of a defined function, ensuring that the feedback of the neural network tends to be approximately equal to the values of the given input parameters. The encoder is also capable of compressing the data as the given input signal gets passed to the output of the network. Compression is possible in an auto-encoder if there exist hidden interconnections or some sort of correlation among the characteristics. In this manner, an auto-encoder behaves in a similar manner to principal component analysis and achieves data reduction (possible compression) on the input side.

Figure 6.19 Model configuration of an auto encoder (features in the hidden layer)


On the other hand, when the auto-encoder is trained with the stochastic gradient descent algorithm and the number of hidden neurons becomes greater than the number of inputs, it results in a possible decrease in the error values. So, it is applied in various function-analysis and compression applications.

Another variation in the encoder configuration is the denoising auto-encoder. Here, the variation exists in the training process. On training the deep learning neural network as a denoising encoder, corrupted or noised data (with values substituted with "0") is given as input; at the same time, the clean data is compared with the output data. The advantage of this mechanism is that it paves the way to restore damaged data.

6.15. EXTREME LEARNING MACHINE MODEL (ELMM)

Over the years, it has been observed that the k-nearest neighbour and a few other architectures like support vector machine (SVM) classifiers employed for classification require more computations due to the repetition of classification and registration; hence they are relatively slow. The SVM approach, even though it has the advantage of generalization and can handle high-dimensional feature spaces, assumes that the data are independently and identically distributed. This is not applicable for all sets of data, as they are likely to have noise and related distribution. Storage is also an added disadvantage of the SVM classifier. Other multilayer neural networks, which are trained with the back-propagation algorithm based on the gradient descent learning rule, possess certain limitations like slow convergence, setting of the learning rate parameters, occurrences of local and global minima, and repeated training processes without attaining the convergence point.

ELMM is a single hidden layer feed-forward neural network where the input weights and hidden neuron biases are randomly selected without training.
The output weights are analytically computed employing the least-square norm solution and the Moore-Penrose inverse of a generalized linear system. This method of determining the output weights results in a significant reduction of training time. For hidden layer neurons, activation functions like the Gaussian, sigmoidal and so on can be employed; for output layer neurons, a linear activation function is used. This single hidden layer feed-forward network ELM model employs an additive neuron design instead of a kernel-based one, and hence there is random parameter selection.

6.15.2. ELM TRAINING PROGRAM:

For a given set of N training vector pairs {(xi, ti)}, with xi ∈ R^n, ti ∈ R^m, i = 1, …, N, an activation function f(x) and Ñ hidden neurons, the algorithm is as follows:

## Page 124

123Chapter 6: Special Networks

Step 1: Start. Initialize the necessary parameters; choose a suitable activation function and the number of hidden neurons in the hidden layer for the considered problem.

Step 2: Assign arbitrary input weights wi and biases bi.

Step 3: Compute the output matrix H of the hidden layer,

H = [f(wi · xj + bi)], i = 1, …, Ñ; j = 1, …, N

Step 4: Compute the output weight β based on the equation

β = H† T

where H† is the Moore-Penrose generalized inverse of H.

6.15.3. Other ELM Models

Huang initially proposed ELM in the year 2004 and subsequently numerous

researchers worked on ELM and developed certain improved ELM algorithms.

ELM was enhanced over the years to improve the network training speed, to avoid local and global minima, to reduce iteration time, and to overcome the difficulty in defining the learning rate parameters and setting the stopping criteria.

Since ELM works on the empirical risk minimization principle, the random selection of input layer weights and hidden layer biases may result in non-optimal convergence. In comparison with the gradient descent learning rule, ELM may require more hidden layer neurons, and this reduces ELM's training effect. Henceforth, to speed up the convergence and response of ELM training, numerous improvements were made to the existing ELM algorithm, and modified versions of the ELM algorithm were introduced. The following sub-sections present a few improvements made by researchers to the existing ELM algorithm.

6.15.4. Online Extreme Learning Machine

ELM is well noted for solving regression and classification problems; it results

in better generalization performance and training speed. When considering ELM

for real applications which involve minimal data set, it may result in over -fitting

occurrences.

Online ELM is also referred to as online sequential extreme learning machine

(OSELM) and this works on sequential adaptation with recursive least square

algorithm. This was also introduced by Huang in the year 2005. Further to this,

online sequential fuzzy ELM (OS-Fuzzy-ELM) has also been developed for implementing different orders of TSK models. In fuzzy-based ELM, all the antecedent parameters of the membership functions are randomly assigned first, and subsequently the consequent parameters are computed. Zhang, in the year 2011,


developed the selective forgetting ELM (SFELM) to overcome the online training issues and applied it to time-series prediction. SFELM's output weights are calculated in a recursive manner at the time of online training based on its generalization performance. SFELM is noted to possess better prediction accuracy.

6.15.5. Pruned Extreme Learning Machine

ELM is well known for its short training time; here the hidden layer nodes are randomly selected and are analysed for the determination of their respective weights. This minimizes the calculation time, giving fast learning. Rong, in the year 2008, modified the architectural design of ELM, as the existence of too few or too many hidden layer neurons will result in under-fitting or over-fitting problems for classification tasks. The pruned ELM (PELM) algorithm was developed as an automated technique to design an ELM. The significance of the hidden neurons is measured in PELM by employing statistical approaches. Starting with a higher number of hidden neurons, the insignificant ones are then pruned with respect to the class labels based on their importance. Henceforth, the architectural design of the ELM network gets automated. PELM is inferred to have better prediction accuracy for unseen data when compared with the basic ELM. There also exists a pruning algorithm based on the regularized regression method to determine the required number of hidden neurons in the network architecture. This regression approach starts with a higher number of hidden neurons, and in due course the unimportant neurons get pruned employing methods like ridge regression, elastic net and so on. In this manner, the architectural design of the ELM network gets automated.

6.15.6. Improved Extreme Learning Machine Models

ELM requires more hidden neurons due to its random computation of the input layer weights and hidden biases.
Owing to this, certain hybrid ELM algorithms were developed by researchers to improve the generalization capability. One of the methods, proposed by Zhu (2005), employs a differential evolution (DE) algorithm for obtaining the input weights and the Moore-Penrose (MP) inverse to obtain the output weights of an ELM model. Several researchers also attempted to combine ELM with other data processing methods, resulting in new ELM learning models, and applied the newly developed algorithms to related applications. ELM at times results in non-optimal performance and is prone to over-fitting. This was addressed by Silva in the year 2011 by hybridizing a group search optimizer, to compute the input weights, with the ELM algorithm for computing

## Page 126


the hidden layer biases. Here it is required to evaluate the influence of the various types of members that tend to fly over the search space bounds. The effectiveness of the ELM model gets lowered because, at times, the hidden layer output matrix obtained through the algorithm does not form a full rank matrix, due to the random generation of input weights and biases. This was overcome by the development of the effective extreme learning machine (EELM) neural network model, which properly selects the input weights and biases prior to the calculation of the output weights, ensuring a full column rank of the output matrix.

Thus, considering the existing limitations of ELM models, researchers have involved themselves in developing new variants of ELM models, both on the algorithmic side and on the architectural design side. This section has presented a few of the variants of ELM models as developed by researchers and applied to various prediction and classification problems.

6.15.7. Applications of ELM

Neural networks are widely employed in mining, classification, prediction, recognition and other applications. ELM has been developed with the idea of improving the learning ability and providing better generalization performance. Considering the advantages of ELM models, a few of its applications include:

1. Signal processing

2. Image processing

3. Medical diagnosis

4. Automatic control

5. Aviation and aerospace

6. Business and market analysis

Summary:

In this chapter we learnt about the Simulated Annealing Network, Boltzmann Machine, Gaussian Machine, Cauchy Machine, Probabilistic Neural Net, Cascade Correlation Network, Cognitron Network, Neocognitron Network, Cellular Neural Network, Optical Neural Networks, Spiking Neural Networks (SNN), Encoding of Neurons in SNN, CNN Layer Sizing, Deep Learning Neural Networks, and the Extreme Learning Machine Model (ELMM) in detail.


## Page 127


Review Questions:

1. Write a short note on Simulated Annealing Networks.

2. Explain the Architecture of the Boltzmann Machine.

3. Explain the Probabilistic Neural Net.

4. Write a short note on the Cellular Neural Network.

5. What are the Third-Generation Neural Networks?

6. Explain the Architecture of a Convolutional Neural Network.

7. What are the Limitations of the CNN Model?

8. Write a short note on Deep Learning Neural Networks.

9. Write a short note on the ELM Architecture and Training Algorithm.

References:

1. "Principles of Soft Computing", S.N. Sivanandam and S.N. Deepa, Wiley, 2019, Chapters 2 and 3

2. http://www.sci.brooklyn.cuny.edu/ (Artificial Neural Networks, Stephen Lucci PhD)

3. Related documents, diagrams from blogs, and e-resources from RC Chakraborty lecture notes and tutorialspoint.com


## Page 128


Unit IV

7 INTRODUCTION TO FUZZY LOGIC

AND FUZZY

Unit Structure

7.0 Objectives

7.1 Introduction to Fuzzy Logic

7.2 Classical Sets

7.3 Fuzzy Sets

7.4 Classical Sets v/s Fuzzy Sets

7.4.1 Operations

7.4.2 Properties

7.5 More Operations on Fuzzy Sets

7.6 Functional Mapping of Classical Sets

7.7 Introduction to Classical Relations & Fuzzy Relations

7.8 Cartesian Product of the Relation

7.9 Classical Relation v/s Fuzzy Relations

7.9.1 Cardinality

7.9.2 Operations

7.9.3 Properties

7.10 Classical Composition and Fuzzy Composition

7.10.1 Properties

7.10.2 Equivalence

7.10.3 Tolerance

7.11 Non-Interactive Fuzzy Set


## Page 129


7.0 Objectives

We begin this chapter by introducing fuzzy logic, classical sets and fuzzy sets, followed by a comparison of classical sets and fuzzy sets.

7.1 Introduction to Fuzzy Logic

Fuzzy logic is a form of multi-valued logic to deal with reasoning that is

approximate rather than precise. Fuzzy logic variables may have a truth value that

ranges between 0 and 1 and is not constrained to the two truth values of classical

propositional logic.

"As the complexity of a system increases, it becomes more difficult and eventually impossible to make a precise statement about its behavior, eventually arriving at a point of complexity where the fuzzy logic method born in humans is the only way to get at the problem" – Originally identified and set forth by Lotfi A. Zadeh, Ph.D., University of California, Berkeley.

Fuzzy logic offers soft computing:

• provides a technique to deal with imprecision & information granularity.

• provides a mechanism for representing linguistic constructs.

Figure 7.1: A fuzzy logic system accepting imprecise data and

providing a decision

The theory of fuzzy logic is based upon the notion of relative graded membership, as are the functions of cognitive processes. It models uncertain or ambiguous data and provides suitable decisions. Fuzzy sets, which represent fuzzy logic, provide a means to model the uncertainty associated with vagueness, imprecision and lack of information regarding a problem, plant or system.

Fuzzy logic operates on the concept of membership. The basis of the theory lies in making the membership function lie over a range of real numbers from 0.0 to 1.0; a fuzzy set is characterized by membership values in the interval [0.0, 1.0]. In a classical set, the membership value is "1" if the element belongs to the set and "0" if it is not a member of the set: membership in the set is binary, that is, either the element is a member of the set or it is not. It is indicated as


## Page 130

χ_A(x) = 1 if x ∈ A, and χ_A(x) = 0 if x ∉ A

E.g. the statement "Elizabeth is Old" can be translated as "Elizabeth is a member of the set of old people" and can be written symbolically as μ_OLD(Elizabeth), where μ is the membership function, which can return a value between 0.0 and 1.0 depending upon the degree of membership.

Figure 7.2: Graph showing membership functions for fuzzy set “tall”.

Figure 7.3: Graph showing membership functions for fuzzy set

“short”, “medium” and “tall”.
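Membership curves like the ones sketched in Figures 7.2 and 7.3 can be written as simple functions. The following Python sketch is illustrative only: the breakpoints (160 cm and 180 cm) and the triangular shape are assumptions, not values taken from the figures.

```python
def triangular(x, a, b, c):
    """Triangular membership function: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def tall(height_cm):
    """Shoulder-shaped membership for a hypothetical 'tall' set:
    0 at or below 160 cm, 1 at or above 180 cm, linear in between."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 180:
        return 1.0
    return (height_cm - 160) / 20.0

print(tall(150))  # 0.0 -> definitely not tall
print(tall(170))  # 0.5 -> partially tall
print(tall(185))  # 1.0 -> fully tall
```

A "medium" set, as in Figure 7.3, could be modelled with `triangular` and suitable breakpoints.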

The notion of membership was extended to possess various "degrees of membership" on the real continuous interval [0, 1]. Zadeh generalized the idea of a crisp set by extending the valuation set {0, 1} (definitely in, definitely out) to the interval of real values (degrees of membership) between 0 and 1, denoted by [0, 1]. The degree of membership of any element of a fuzzy set expresses the degree of compatibility of the element with the concept represented by the fuzzy set.

Membership Function: A fuzzy set A contains an object x to degree a(x), that is, a(x) = Degree(x ∈ A), with a: X → {Membership Degrees}.

Possibility Distribution: The fuzzy set A can be expressed as A = {(x, a(x))}, x ∈ X.


## Page 131


Fuzzy sets tend to capture vagueness exclusively via membership functions, which are mappings from a given universe of discourse X to the unit interval containing the membership values. The membership function for a set maps each element of the set to a membership value between 0 and 1 and uniquely describes that set. The values 0 and 1 describe "not belonging to" and "belonging to" a conventional set, respectively; values in between represent "fuzziness". Determining the membership function is subjective to varying degrees depending on the situation: it depends on an individual's perception of the data in question and does not depend on randomness.

Figure 7.4: Boundary region of a Fuzzy Set

Figure 7.5: Configuration of a pure fuzzy system

Fuzzy logic also consists of a fuzzy inference engine or fuzzy rule base to perform approximate reasoning, somewhat similar to the human brain. The fuzzy approach uses the premise that humans do not represent classes of objects as fully disjoint sets, but rather as sets in which there may be grades of membership intermediate between full membership and non-membership. A fuzzy set works as a concept that makes it possible to treat fuzziness in a quantitative manner. Fuzzy sets form the building


## Page 132


blocks for fuzzy IF-THEN rules, which have the general form "IF X is A THEN Y is B", where A and B are fuzzy sets.

The term "fuzzy systems" refers mostly to systems that are governed by fuzzy IF-THEN rules. The IF part of an implication is called the antecedent, whereas the THEN part is called the consequent. A fuzzy system is a set of fuzzy rules that converts inputs to outputs.

The fuzzy inference engine (algorithm) combines fuzzy IF-THEN rules into a mapping from fuzzy sets in the input space X to fuzzy sets in the output space Y based on fuzzy logic principles. From a knowledge representation viewpoint, a fuzzy IF-THEN rule is a scheme for capturing knowledge that involves imprecision. The main feature of reasoning using these rules is its partial matching capability, which enables an inference to be made from a fuzzy rule even when the rule's condition is only partially satisfied. Fuzzy systems are, on one hand, rule-based systems constructed from a collection of linguistic rules; on the other hand, fuzzy systems are nonlinear mappings of inputs to outputs. The inputs and the outputs can be numbers or vectors of numbers. These rule-based systems can in theory model any system with arbitrary accuracy, i.e. they work as universal approximators.

The Achilles' heel of a fuzzy system is its rules: smart rules give smart systems and other rules give less smart or dumb systems. The number of rules increases exponentially with the dimension of the input space. This rule explosion is called the curse of dimensionality and is a general problem for mathematical models.
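Firing a single fuzzy IF-THEN rule of the kind described above takes only a few lines of code. The sketch below uses min for the implication (one common choice); the rule "IF temperature is HOT THEN fan_speed is HIGH" and its membership functions are illustrative assumptions, not taken from the text.

```python
def hot(temp_c):
    """Antecedent fuzzy set 'HOT' over temperature (illustrative breakpoints)."""
    return max(0.0, min(1.0, (temp_c - 20) / 15.0))

def high(speed):
    """Consequent fuzzy set 'HIGH' over fan speed 0..10 (illustrative)."""
    return max(0.0, min(1.0, speed / 10.0))

def fire_rule(temp_c, speed):
    """Clipped consequent: min(degree the antecedent holds, consequent grade).
    This is the partial-matching behaviour: the rule contributes even when its
    condition is only partially satisfied."""
    return min(hot(temp_c), high(speed))

# At 27.5 C the antecedent holds to degree 0.5, so the output set 'HIGH'
# is clipped at 0.5 for every candidate speed.
print(fire_rule(27.5, 10))  # 0.5
```

In a full system, several such clipped consequents would be aggregated (typically with max) and then defuzzified.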

7.2 Classical Sets (Crisp Sets)

A collection of objects with certain characteristics is called a set. A classical set (crisp set) is defined as a collection of distinct objects. An individual entity of the set is called an element (member) of the set. The classical set is defined in such a way that the universe of discourse is split into two groups: members and non-members. Partial membership does not exist in the case of a crisp set.

Whole set: the collection of all elements in the universe.

Cardinal number: the number of elements in the set.

Set: a collection of elements within the universe.

Subset: a collection of elements within a set.

## Page 133


7.3 Fuzzy Sets

A fuzzy set is a set having degrees of membership between 0 and 1. A member of one fuzzy set can also be a member of other fuzzy sets in the same universe. A fuzzy set A in the universe of discourse U can be defined as a set of ordered pairs, given by

A = {(x, μ_A(x)) | x ∈ U}

where μ_A(x) is the degree of membership of x in A, taking values in the unit interval [0, 1], i.e. μ_A(x) ∈ [0, 1].

When the universe of discourse is discrete and finite, fuzzy set A is given as A = { Σᵢ μ_A(xᵢ)/xᵢ }. When the universe of discourse is continuous and infinite, fuzzy set A is given as A = { ∫ μ_A(x)/x }. (Here the summation sign, the integral sign and the slash denote the collection of elements together with their membership grades, not arithmetic operations.)

Universal Fuzzy Set (Whole Fuzzy Set): a fuzzy set whose membership function has the value 1 for all the members under consideration. Any fuzzy set A defined on a universe U is a subset of that universe.

Empty Fuzzy Set: a fuzzy set whose membership function has the value 0 for all the members under consideration.

Equal Fuzzy Sets: two fuzzy sets A and B are said to be equal if μ_A(x) = μ_B(x) for all x ∈ U.

Fuzzy Power Set P(U): the collection of all fuzzy sets and fuzzy subsets on the universe U.


## Page 134


7.4 Classical Sets v/s Fuzzy Sets

7.4.1 Operations

Definition:
• Classical sets: The classical set is defined in such a way that the universe of discourse is divided into two groups: members and non-members. Consider a set A in universe U: an object x is a member of the set (x ∈ A), i.e. x belongs to A, or x is not a member of the set (x ∉ A), i.e. x does not belong to A.
• Fuzzy sets: A fuzzy set is a set having degrees of membership between 0 and 1. A fuzzy set A in the universe of discourse U can be defined as a set of ordered pairs: A = {(x, μ_A(x)) | x ∈ U}.

Union: The union of two sets contains all those elements in the universe that belong to either set A or set B or both; it corresponds to the logical OR operation.
• Classical sets: A ∪ B = {x | x ∈ A or x ∈ B}
• Fuzzy sets: μ_{A∪B}(x) = μ_A(x) ∨ μ_B(x) = max{μ_A(x), μ_B(x)} for all x ∈ U, where ∨ indicates the max operation.

Intersection: The intersection of two sets contains all those elements in the universe that belong to both set A and set B; it corresponds to the logical AND operation.
• Classical sets: A ∩ B = {x | x ∈ A and x ∈ B}
• Fuzzy sets: μ_{A∩B}(x) = μ_A(x) ∧ μ_B(x) = min{μ_A(x), μ_B(x)} for all x ∈ U, where ∧ indicates the min operation.

Complement: The complement of set A is the collection of all elements in the universe that do not belong to set A.
• Classical sets: Ā = {x | x ∉ A, x ∈ X}
• Fuzzy sets: μ_Ā(x) = 1 − μ_A(x) for all x ∈ U

## Page 135

Difference: The difference of set A with respect to set B is the collection of all elements in the universe that belong to A but do not belong to B. It is denoted by A|B or A − B.
• Classical sets: A|B = {x | x ∈ A and x ∉ B} = A − (A ∩ B)

7.4.2 Properties

The following properties hold for both classical sets and fuzzy sets:

• Commutativity: A ∪ B = B ∪ A; A ∩ B = B ∩ A
• Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C; A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributivity: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• Idempotency: A ∪ A = A; A ∩ A = A
• Transitivity: if A ⊆ B ⊆ C, then A ⊆ C
• Identity: A ∪ ∅ = A; A ∩ ∅ = ∅; A ∪ X = X; A ∩ X = A
• Involution (double negation): the complement of Ā is A
• De Morgan's laws: the complement of (A ∪ B) equals Ā ∩ B̄; the complement of (A ∩ B) equals Ā ∪ B̄

The law of contradiction (A ∩ Ā = ∅) and the law of excluded middle (A ∪ Ā = X) are followed by classical sets but are not followed by fuzzy sets.
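The fuzzy union, intersection and complement, and the failure of the law of excluded middle, can be demonstrated on a small discrete universe. The membership grades below are illustrative.

```python
# Fuzzy set operations on a discrete universe via max / min / (1 - mu).
U = ["x1", "x2", "x3"]
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}   # illustrative membership grades
B = {"x1": 0.5, "x2": 0.3, "x3": 0.6}

union        = {x: max(A[x], B[x]) for x in U}
intersection = {x: min(A[x], B[x]) for x in U}
complement_A = {x: 1.0 - A[x] for x in U}

print(union)         # {'x1': 0.5, 'x2': 0.7, 'x3': 1.0}
print(intersection)  # {'x1': 0.2, 'x2': 0.3, 'x3': 0.6}

# Law of excluded middle fails: A union A-complement is not the whole
# universe (grade 1 everywhere) -- e.g. x1 only reaches 0.8.
print({x: max(A[x], complement_A[x]) for x in U})  # {'x1': 0.8, 'x2': 0.7, 'x3': 1.0}
```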

## Page 136


7.5 More Operations on Fuzzy Sets

Algebraic Sum: The algebraic sum (A + B) of two fuzzy sets A and B is defined as

μ_{A+B}(x) = μ_A(x) + μ_B(x) − μ_A(x)·μ_B(x)

Algebraic Product: The algebraic product (A·B) of two fuzzy sets A and B is defined as

μ_{A·B}(x) = μ_A(x)·μ_B(x)

Bounded Sum: The bounded sum (A ⊕ B) of two fuzzy sets A and B is defined as

μ_{A⊕B}(x) = min{1, μ_A(x) + μ_B(x)}

Bounded Difference: The bounded difference (A ⊖ B) of two fuzzy sets A and B is defined as

μ_{A⊖B}(x) = max{0, μ_A(x) − μ_B(x)}
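These four operations act pointwise on membership grades, so they can be sketched directly. The grades 0.5 and 0.25 below are illustrative.

```python
# Pointwise fuzzy-set operations from Section 7.5, applied to single grades.
def alg_sum(a, b):      return a + b - a * b     # algebraic sum
def alg_product(a, b):  return a * b             # algebraic product
def bounded_sum(a, b):  return min(1.0, a + b)   # bounded sum
def bounded_diff(a, b): return max(0.0, a - b)   # bounded difference

a, b = 0.5, 0.25   # illustrative membership values of some x in A and B
print(alg_sum(a, b))       # 0.625
print(alg_product(a, b))   # 0.125
print(bounded_sum(a, b))   # 0.75
print(bounded_diff(a, b))  # 0.25
```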

7.6 Functional Mapping of Classical Sets

Mapping is a rule of correspondence between set-theoretic forms and function-theoretic forms.

Let X and Y be two different universes of discourse. If an element x contained in X corresponds to an element y contained in Y, this correspondence is called a mapping from X to Y, i.e. f: X → Y.

Let A and B be two sets on the universe. The function-theoretic forms of the operations performed between these two sets are given as follows:

Union: χ_{A∪B}(x) = χ_A(x) ∨ χ_B(x) = max{χ_A(x), χ_B(x)}, where ∨ is the maximum operator.

Intersection: χ_{A∩B}(x) = χ_A(x) ∧ χ_B(x) = min{χ_A(x), χ_B(x)}, where ∧ is the minimum operator.

Complement: χ_Ā(x) = 1 − χ_A(x)

Containment: If A ⊆ B, then χ_A(x) ≤ χ_B(x)


## Page 137


7.7 Introduction to Classical Relations & Fuzzy Relations

Relationships between objects are among the basic concepts involved in decision making and other dynamic system applications. Relations represent mappings between sets and connective logic. A classical binary relation represents the presence or absence of a connection, interaction or association between the elements of two sets. Fuzzy binary relations impart degrees of strength to connections or associations. In a fuzzy binary relation, the degree of association is represented by membership grades, in the same way as the degree of set membership is represented in a fuzzy set.

When r = 2, the relation is a subset of the Cartesian product A1 × A2. This relation is called a binary relation from A1 to A2. If X and Y are two universes, their Cartesian product X × Y is given by X × Y = {(x, y) | x ∈ X, y ∈ Y}.

Every element in X is completely related to every element in Y. The characteristic function, denoted by χ, gives the strength of the relationship between ordered pairs of elements in each universe:

χ_{X×Y}(x, y) = 1 if (x, y) ∈ X × Y, and 0 if (x, y) ∉ X × Y

A binary relation in which each element from the first set X is not mapped to more than one element in the second set Y is called a function and is expressed as R: X → Y.

A fuzzy relation is a fuzzy set defined on the Cartesian product of classical sets {X1, X2, …, Xn}, where the tuples (x1, x2, …, xn) may have varying degrees of membership μ_R(x1, x2, …, xn) within the relation:

R(X1, X2, …, Xn) = {((x1, x2, …, xn), μ_R(x1, x2, …, xn)) | xi ∈ Xi}

A fuzzy relation between two sets X and Y is called a binary fuzzy relation and is denoted by R(X, Y). A binary relation R(X, Y) is referred to as a bipartite graph when X ≠ Y. A binary relation on a single set X is called a digraph or directed graph; this relation occurs when X = Y and is denoted as R(X, X) or R(X²). The matrix representing a fuzzy relation is called a fuzzy matrix. A fuzzy relation R is a mapping from the Cartesian product space X × Y to the interval [0, 1], where the mapping strength is expressed by the membership function of the relation for ordered pairs from the two universes, μ_R(x, y).

## Page 138


A fuzzy graph is a graphical representation of a binary fuzzy relation. Each element in X and Y corresponds to a node in the fuzzy graph. Connection links are established between the nodes by the elements of X × Y with nonzero membership grades in R(X, Y); the links may also be present in the form of arcs. These links are labelled with the membership values μ_R(x, y). When X ≠ Y, the fuzzy graph is a bipartite graph, with links between the nodes of X and the nodes of Y. When X = Y, a node may be connected to itself and directed links are used; in such a case the fuzzy graph is called a directed graph, and only one set of nodes, corresponding to the set X, is used.

The domain of a binary fuzzy relation R(X, Y) is the fuzzy set dom R(X, Y) having the membership function

μ_{dom R}(x) = max over y ∈ Y of μ_R(x, y)

The range of a binary fuzzy relation R(X, Y) is the fuzzy set ran R(X, Y) having the membership function

μ_{ran R}(y) = max over x ∈ X of μ_R(x, y)

7.8 Cartesian Product of the Relation

An ordered r-tuple is an ordered sequence of r elements expressed in the form (a1, a2, a3, …, ar).


## Page 139


An unordered r-tuple is a collection of r elements without any restriction on order. For r = 2, the r-tuple is called an ordered pair.

For crisp sets A1, A2, …, Ar, the set of all r-tuples (a1, a2, …, ar), where a1 ∈ A1, a2 ∈ A2, …, ar ∈ Ar, is called the Cartesian product of A1, A2, …, Ar and is denoted by A1 × A2 × … × Ar.

If all the Ai are identical and equal to A, then the Cartesian product A1 × A2 × … × Ar is denoted as A^r.

7.9 Classical Relation v/s Fuzzy Relations

7.9.1 Cardinality

Classical relations: Consider n elements of universe X being related to m elements of universe Y. When the cardinality of X is n_X and the cardinality of Y is n_Y, the cardinality of the relation R between the two universes is n_{X×Y} = n_X × n_Y. The cardinality of the power set P(X × Y) describing this relation is given by n_{P(X×Y)} = 2^(n_X n_Y).

Fuzzy relations: The cardinality of fuzzy sets on any universe is infinity; hence the cardinality of a fuzzy relation between two or more universes is also infinity.

7.9.2 Operations

Let R and S be two separate relations on the Cartesian universe X × Y. The null relation O (whose relation matrix contains all 0's) and the complete relation E (all 1's) are defined by their relation matrices.


## Page 140

Union:
• Classical: R ∪ S → χ_{R∪S}(x, y) = max[χ_R(x, y), χ_S(x, y)]
• Fuzzy: μ_{R∪S}(x, y) = max[μ_R(x, y), μ_S(x, y)]

Intersection:
• Classical: R ∩ S → χ_{R∩S}(x, y) = min[χ_R(x, y), χ_S(x, y)]
• Fuzzy: μ_{R∩S}(x, y) = min[μ_R(x, y), μ_S(x, y)]

Complement:
• Classical: χ_R̄(x, y) = 1 − χ_R(x, y)
• Fuzzy: μ_R̄(x, y) = 1 − μ_R(x, y)

Containment:
• Classical: R ⊂ S → χ_R(x, y) ≤ χ_S(x, y)
• Fuzzy: R ⊂ S ⇒ μ_R(x, y) ≤ μ_S(x, y)

Identity (classical): ∅ → O and X → E.

Inverse (fuzzy): The inverse of a fuzzy relation R on X × Y is denoted by R⁻¹. It is a relation on Y × X defined by R⁻¹(y, x) = R(x, y) for all pairs (y, x) ∈ Y × X.

Projection (fuzzy): For a fuzzy relation R(X, Y), [R ↓ Y] denotes the projection of R onto Y.

7.9.3 Properties

Classical relations follow the properties of commutativity, associativity, distributivity, involution, idempotency, De Morgan's laws and the excluded middle laws. Fuzzy relations follow the same properties, with the exception of the excluded middle laws.


## Page 141


7.10 Classical Composition and Fuzzy Composition

The operation executed on two binary relations to get a single binary relation is

called composition .

Let R be a relation that maps elements from universe X to universe Y, and S be a relation that maps elements from universe Y to universe Z. The two binary relations R and S are compatible if R ⊆ X × Y and S ⊆ Y × Z. The composition between the two relations is denoted by R ∘ S.

Consider the universal sets given by:

X = {a1, a2, a3}; Y = {b1, b2, b3}; Z = {c1, c2, c3}

Let the relations R and S be formed as:

R = X × Y = {(a1, b1), (a1, b2), (a2, b2), (a3, b3)}

S = Y × Z = {(b1, c1), (b2, c3), (b3, c2)}

It can be inferred that:

T = R ∘ S = {(a1, c1), (a1, c3), (a2, c3), (a3, c2)}

The composition operations are of two types:

1. Max–min composition: T = R ∘ S, with χ_T(x, z) = max over y of min[χ_R(x, y), χ_S(y, z)]

2. Max–product composition: T = R ∘ S, with χ_T(x, z) = max over y of [χ_R(x, y) · χ_S(y, z)]

Let A be a fuzzy set on universe X and B be a fuzzy set on universe Y. The Cartesian product over A and B results in a fuzzy relation R, contained within the entire (complete) Cartesian space: A × B = R, where R ⊂ X × Y. The membership function of the fuzzy relation is given by

μ_R(x, y) = μ_{A×B}(x, y) = min[μ_A(x), μ_B(y)]

For example, for a fuzzy set A that has three elements and a fuzzy set B that has four elements, the resulting fuzzy relation R will be represented by a matrix of size 3 × 4.


## Page 142


There are two types of fuzzy composition techniques:

1. Fuzzy max–min composition

2. Fuzzy max–product composition

Let R be a fuzzy relation on the Cartesian space X × Y and S be a fuzzy relation on the Cartesian space Y × Z.

Fuzzy max–min composition: The max–min composition of R(X, Y) and S(Y, Z), denoted R(X, Y) ∘ S(Y, Z), is defined by T(X, Z) with

μ_T(x, z) = max over y ∈ Y of min[μ_R(x, y), μ_S(y, z)]

Fuzzy max–product composition: It is defined by T(X, Z) with

μ_T(x, z) = max over y ∈ Y of [μ_R(x, y) · μ_S(y, z)]
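With fuzzy relations stored as matrices, both compositions reduce to a max over an elementwise combination of a row of R and a column of S. A sketch with illustrative matrices:

```python
# Fuzzy max-min and max-product composition of relation matrices.
# R is a fuzzy relation on X x Y, S on Y x Z (values are illustrative).
R = [[0.6, 0.3],
     [0.2, 0.9]]          # |X| = 2, |Y| = 2
S = [[1.0, 0.5, 0.3],
     [0.8, 0.4, 0.7]]     # |Y| = 2, |Z| = 3

def compose(R, S, combine):
    """T[i][j] = max over k of combine(R[i][k], S[k][j])."""
    return [[max(combine(R[i][k], S[k][j]) for k in range(len(S)))
             for j in range(len(S[0]))] for i in range(len(R))]

max_min  = compose(R, S, min)                  # max-min composition
max_prod = compose(R, S, lambda a, b: a * b)   # max-product composition

print(max_min)  # [[0.6, 0.5, 0.3], [0.8, 0.4, 0.7]]
```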

7.10.1 Properties

Both classical and fuzzy compositions satisfy:

• Associativity: (R ∘ S) ∘ M = R ∘ (S ∘ M)
• Non-commutativity: R ∘ S ≠ S ∘ R
• Inverse: (R ∘ S)⁻¹ = S⁻¹ ∘ R⁻¹

7.10.2 Equivalence

Classical composition:
• Reflexivity: χ_R(xi, xi) = 1, i.e. (xi, xi) ∈ R
• Symmetry: χ_R(xi, xj) = χ_R(xj, xi), i.e. (xi, xj) ∈ R ⇒ (xj, xi) ∈ R
• Transitivity: χ_R(xi, xj) = 1 and χ_R(xj, xk) = 1 imply χ_R(xi, xk) = 1, i.e. (xi, xj) ∈ R and (xj, xk) ∈ R ⇒ (xi, xk) ∈ R

Fuzzy composition:
• Reflexivity: μ_R(xi, xi) = 1 for all x ∈ X
• Symmetry: μ_R(xi, xj) = μ_R(xj, xi) for all xi, xj ∈ X
• Transitivity (max–min): μ_R(xi, xj) = λ1 and μ_R(xj, xk) = λ2 ⇒ μ_R(xi, xk) = λ, where λ ≥ min(λ1, λ2)

A fuzzy max–product transitivity can be defined analogously, with the product in place of min: μ_R(xi, xk) ≥ μ_R(xi, xj) · μ_R(xj, xk).
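The three equivalence properties can be checked mechanically for a fuzzy relation given as a matrix. The matrix below is illustrative.

```python
# Check reflexivity, symmetry and max-min transitivity of a fuzzy relation
# over X = {x1, ..., xn}, stored as an n x n matrix of membership grades.
R = [[1.0, 0.8, 0.4],
     [0.8, 1.0, 0.4],
     [0.4, 0.4, 1.0]]
n = len(R)

reflexive = all(R[i][i] == 1.0 for i in range(n))
symmetric = all(R[i][j] == R[j][i] for i in range(n) for j in range(n))
# max-min transitivity: mu(xi,xk) >= min(mu(xi,xj), mu(xj,xk)) for every j
transitive = all(R[i][k] >= min(R[i][j], R[j][k])
                 for i in range(n) for j in range(n) for k in range(n))

print(reflexive, symmetric, transitive)  # True True True -> fuzzy equivalence relation
```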


## Page 143


7.10.3 Tolerance

Classical composition: A tolerance relation R1 on universe X is one in which only the properties of reflexivity and symmetry are satisfied. The tolerance relation can also be called a proximity relation. Equivalence relations are a special case of tolerance relations: an equivalence relation can be formed from a tolerance relation R1 by (n − 1) compositions of R1 with itself, where n is the cardinality of the set that defines R1 (here, X).

Fuzzy composition: A binary fuzzy relation that possesses the properties of reflexivity and symmetry is called a fuzzy tolerance relation or resemblance relation. The fuzzy tolerance relation can be reformed into a fuzzy equivalence relation in the same way as a crisp tolerance relation is reformed into a crisp equivalence relation.

7.11 Non-Interactive Fuzzy Set

The independent events in probability theory are analogous to noninteractive fuzzy sets in fuzzy theory. Define a fuzzy set A on the Cartesian space X = X1 × X2. Set A is separable into two noninteractive fuzzy sets, called its orthogonal projections, if and only if

μ_A(x1, x2) = min[μ_{Pr(A; X1)}(x1), μ_{Pr(A; X2)}(x2)]

where the membership functions of the projections are

μ_{Pr(A; X1)}(x1) = max over x2 ∈ X2 of μ_A(x1, x2)

μ_{Pr(A; X2)}(x2) = max over x1 ∈ X1 of μ_A(x1, x2)


## Page 144


The equations represent the membership functions for the orthogonal projections of A on universes X1 and X2, respectively.

Summary

In this chapter, we have discussed the basic definitions, properties and operations

on classical sets and fuzzy sets. Fuzzy sets are tools that convert the concept of

fuzzy logic into algorithms. Since fuzzy sets allow partial membership, they

provide computers with algorithms that extend binary logic and enable them to take human-like decisions. In other words, fuzzy sets can be thought of as a medium through which human thinking is transferred to a computer. One difference

between fuzzy sets and classical sets is that the former does not follow the law of

excluded middle and law of contradiction.

The relation concept is used for nonlinear simulation, classification, and control.

The description on composition of relations gives a view of extending fuzziness

into functions. Tolerance and equivalence relations are helpful for solving similar

classification problems. The noninteractivity between fuzzy sets is analogous to the

assumption of independence in probability modelling.

Review Questions

1. Explain fuzzy logic in detail.

2. Compare Classical set and fuzzy set.

3. Enlist and explain any three classicals set operations.

4. Enlist and explain any three fuzzy sets operations.

5. Enlist and explain any three classical set properties.

6. Enlist and explain any three fuzzy sets properties.

7. Write a short note on fuzzy relation.

8. Compare classical relations and fuzzy relations.

9. Write a short note on classical composition and fuzzy composition.


## Page 145


Bibliography, References and Further Reading

• Artificial Intelligence and Soft Computing, Anandita Das Battacharya, SPD, 3rd edition, 2018

• Principles of Soft Computing, S.N. Sivanandam and S.N. Deepa, Wiley, 3rd edition, 2019

• Neuro-Fuzzy and Soft Computing, J.S.R. Jang, C.T. Sun and E. Mizutani, Prentice Hall of India, 2004


## Page 146


Unit IV

8 MEMBERSHIP FUNCTIONS,

DEFUZZIFICATION, FUZZY ARITHMETIC

AND FUZZY MEASURES

Unit Structure

8.0 Objectives

8.1 Introduction to Membership Function

8.2 Features of the Membership Function

8.3 Overview of Fuzzification

8.4 Methods of Membership Value Assignment

8.4.1 Intuition

8.4.2 Inference & Rank Ordering

8.4.3 Angular Fuzzy Sets

8.4.4 Neural Network

8.4.5 Genetic Algorithm

8.4.6 Inductive Reasoning

8.5 Overview of Defuzzification

8.6 Concept of Lambda-Cuts for Fuzzy Sets (Alpha-Cuts)

8.7 Concept of Lambda-Cuts for Fuzzy Relations

8.8 Methods of Defuzzification

8.8.1 Max-membership Principle

8.8.2 Centroid Method

8.8.3 Weighted Average Method

8.8.4 Mean -Max Membership

8.8.5 Centers of Sums

8.8.6 Centers of Largest Area

8.8.7 First of Maxima, Last of Maxima

## Page 147


8.9 Overview of Fuzzy Arithmetic

8.10 Interval Analysis of Uncertain Values

8.11 Mathematical operations on Intervals

8.12 Fuzzy Number

8.13 Fuzzy Ordering

8.14 Fuzzy Vectors

8.15 Extension Principles

8.16 Overview of Fuzzy Measures

8.17 Belief & Plausibility Measures

8.18 Probability Measures

8.19 Possibility & Necessity Measures

8.20 Measure of Fuzziness

8.21 Fuzzy Integrals

8.0 Objectives

This chapter begins with explaining the membership function and later introduces

the concept of fuzzification, defuzzification and fuzzy arithmetic.

8.1 Introduction to Membership Function

The membership function defines the fuzziness in a fuzzy set, irrespective of whether the elements are discrete or continuous. Membership functions are generally represented in graphical form, and there exist certain limitations on the shapes used in the graphical form of a membership function. The rules that describe fuzziness graphically are themselves fuzzy. Membership can be thought of as a technique to solve empirical problems on the basis of experience rather than knowledge.

8.2 Features of the Membership Function

The membership function defines all the information contained in a fuzzy set. A

fuzzy set A in the universe of discourse X can be defined as a set of ordered pairs:

A = {(x, μ_A(x)) | x ∈ X}, where μ_A(·) is called the membership function of A. The membership function μ_A(·) maps X to the membership space M, i.e. μ_A: X → M.

## Page 148


The membership value ranges in the interval [0,1]. Main features involved in

characterizing membership function are:

• Core: The core of a membership function for some fuzzy set A is defined as that region of the universe that is characterized by complete membership in the set A. The core comprises those elements x of the universe such that μ_A(x) = 1. The core of a fuzzy set may be an empty set.

• Support: The support of a membership function for a fuzzy set A is defined as that region of the universe that is characterized by nonzero membership. The support comprises those elements x of the universe such that μ_A(x) > 0. A fuzzy set whose support is a single element with μ_A(x) = 1 is referred to as a fuzzy singleton.

• Boundary: The boundary of a membership function for a fuzzy set A is defined as that region of the universe containing elements that have nonzero but not complete membership. The boundary comprises those elements x of the universe such that 0 < μ_A(x) < 1; the boundary elements are those which possess partial membership in the fuzzy set A.

Figure 8.1: Properties of Membership Functions
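The core, support and boundary definitions translate directly to code for a discrete fuzzy set. The membership grades below are illustrative.

```python
# Core, support and boundary of a discrete fuzzy set, per the definitions above.
A = {"x1": 0.0, "x2": 0.3, "x3": 1.0, "x4": 0.7}   # illustrative grades

core     = [x for x, mu in A.items() if mu == 1.0]        # complete membership
support  = [x for x, mu in A.items() if mu > 0.0]         # nonzero membership
boundary = [x for x, mu in A.items() if 0.0 < mu < 1.0]   # partial membership

print(core)      # ['x3']
print(support)   # ['x2', 'x3', 'x4']
print(boundary)  # ['x2', 'x4']
```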


## Page 149


Other types of Fuzzy Sets

Figure 8.2: (A) Normal Fuzzy Set and (B) Subnormal Fuzzy Set

• Normal fuzzy set: a fuzzy set whose membership function has at least one element x in the universe whose membership value is unity.

  o Prototypical element: the element for which the membership is equal to 1.

• Subnormal fuzzy set: a fuzzy set whose membership function attains the value 1 for no element.

• Convex fuzzy set: a convex fuzzy set has a membership function whose membership values are strictly monotonically increasing, or strictly monotonically decreasing, or strictly monotonically increasing and then strictly monotonically decreasing, with increasing values for the elements in the universe.

• Nonconvex fuzzy set: the membership values of the membership function are not strictly monotonically increasing or decreasing, nor strictly monotonically increasing and then decreasing.

Figure 8.3: (A) Convex Normal Fuzzy Set and (B) Nonconvex Normal Fuzzy Set


## Page 150


The intersection of two convex fuzzy sets is also a convex fuzzy set. An element in the universe for which a particular fuzzy set A has its membership value equal to 0.5 is called a crossover point of the membership function; there can be more than one crossover point in a fuzzy set. The maximum value of the membership function of the fuzzy set A is called the height of the fuzzy set. If the height of the fuzzy set is less than 1, the fuzzy set is called a subnormal fuzzy set. When the fuzzy set A is a convex single-point normal fuzzy set defined on the real line, A is termed a fuzzy number.

Figure 8.4: Crossover Point of a Fuzzy Set

8.3 Overview of Fuzzification

Fuzzification is the process of transforming a crisp set into a fuzzy set, or a fuzzy set into a fuzzier set. This operation translates an accurate crisp input value into linguistic variables. Quantities that we consider accurate, crisp and deterministic nevertheless possess uncertainty within themselves; the uncertainty arises due to vagueness and imprecision.

For a fuzzy set A = {μi/xi | xi ∈ X}, a common fuzzification algorithm is performed by keeping μi constant and transforming xi to a fuzzy set Q(xi) depicting the expression about xi. The fuzzy set Q(xi) is referred to as the kernel of fuzzification. The fuzzified set A can be expressed as:

where the symbol ~ means fuzzified. This process of fuzzification is called support fuzzification (s-fuzzification).


## Page 151

150SOFT COMPUTING TECHNIQUES

Grade fuzzification (g-fuzzification) is another method, in which xi is kept constant and μi is expressed as a fuzzy set.

8.4 Methods of Membership Value Assignment

Following are the methods for assigning membership value:

• Intuition

• Inference

• Rank ordering

• Angular fuzzy sets

• Neural Network

• Genetic Algorithm

• Inductive Reasoning

8.4.1 Intuition

The intuition method is based upon the common intelligence of humans: the capacity of a human to develop membership functions on the basis of his or her own intelligence and understanding. An in-depth knowledge of the application for which the membership value assignment is to be made is essential.

Figure 8.5: Membership functions for the Fuzzy variable “weight”

8.4.2 Inference & Rank Ordering

The inference method uses knowledge to perform deductive reasoning; deduction achieves a conclusion by means of forward inference.


## Page 152

151Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

Rank ordering is carried out on the basis of preferences. Pairwise comparisons enable us to determine preferences, which in turn determine the order of membership.

8.4.3 Angular Fuzzy Sets

An angular fuzzy set is defined on a universe of angles, the shapes thus repeating every 2π cycles. Angular fuzzy sets represent the truth value of a linguistic variable: a logical proposition with membership value 1 is said to be "true", and a proposition with membership value 0 is said to be "false". The intermediate values between 0 and 1 correspond to a proposition being partially true or partially false.

Figure 8.6: Model of An gular Fuzzy Set

The values of the linguistic variable vary with θ, and their membership values lie on the μ(θ) axis. The membership value corresponding to a linguistic term can be obtained from the equation μt(θ) = t·tan(θ), where t is the horizontal projection of the radial vector.


## Page 153

152SOFT COMPUTING TECHNIQUES

8.4.4 Neural Network

Figure 8.7: Fuzzy Membership function evaluated from Neural Networks

8.4.5 Genetic Algorithm

The genetic algorithm is based on Darwin's theory of evolution; the basic rule is "survival of the fittest". Genetic algorithms use the following steps to determine the fuzzy membership functions:

• For a particular functional mapping system, the same membership functions and shapes are assumed for the various fuzzy variables to be defined.

• These chosen membership functions are then coded into bit strings.

• Then these bit strings are concatenated together.

• The fitness function to be used is noted. In a genetic algorithm, the fitness function plays a role similar to that played by the activation function in a neural network.

• The fitness function is used to evaluate the fitness of each set of membership functions.

• These membership functions define the functional mapping of the system.


## Page 154

153Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

8.4.6 Inductive Reasoning

Induction is used to deduce causes by means of backward inference. The characteristics of inductive reasoning can be used to generate membership functions. Induction employs entropy minimization principles, which cluster the parameters corresponding to the output classes. To perform the inductive reasoning method, a well-defined database for the input-output relationship must exist. Inductive reasoning can be applied to complex systems where the database is abundant and static.

Laws of Induction:

• Given a set of irreducible outcomes of an experiment, the induced probabilities are those probabilities, consistent with all the available information, that maximize the entropy of the set.

• The induced probability of a set of independent observations is proportional to the probability density of the induced probability of a single observation.

• The induced rule is that rule, consistent with all available information, that minimizes the entropy.

The third law stated above is widely used for the development of membership functions.

The membership functions using inductive reasoning are generated as follows:

• A fuzzy threshold is to be established between classes of data.

• Using the entropy minimization screening method, first determine the threshold line.

• Then start the segmentation process.

• The segmentation process results in two classes.

• Partitioning the first two classes one more time, we obtain three different classes.

• The partitioning is repeated with threshold value calculations, which lead us to partition the data set into a number of classes or fuzzy sets.

• Then, on the basis of shape, the membership function is determined.

8.5 Overview of Defuzzification

Defuzzification is a mapping process from a space of fuzzy control actions defined over an output universe of discourse into a space of crisp control actions. A defuzzification process produces a nonfuzzy control action that best represents the possibility distribution of an inferred fuzzy control action. The defuzzification process can reduce a fuzzy set into a crisp single-valued quantity or into a crisp set; convert a fuzzy matrix into a crisp matrix; or convert a fuzzy number into a crisp number. Mathematically, the defuzzification process may also be termed "rounding off". A fuzzy set with a collection of membership values, or a vector of values on the unit interval, may be reduced to a single scalar quantity using the defuzzification process.

8.6 Concept of Lambda-Cuts for Fuzzy Sets (Alpha-Cuts)

Consider a fuzzy set A. The set A_λ (0 < λ < 1), called a lambda (λ)-cut (or alpha [α]-cut) set, is a crisp set derived from the fuzzy set A and is defined as:

A_λ = {x | μ_A(x) ≥ λ};  λ ∈ [0, 1]

The set A_λ is called a weak lambda-cut set if it consists of all the elements of the fuzzy set whose membership functions have values greater than or equal to the specified value.

The set A_λ is called a strong lambda-cut set if it consists of all the elements of the fuzzy set whose membership functions have values strictly greater than the specified value:

A_λ = {x | μ_A(x) > λ};  λ ∈ [0, 1]
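The two cuts can be sketched for a discrete fuzzy set as follows (the set A and its memberships are illustrative):

```python
# Illustrative sketch: weak and strong lambda-cuts of a discrete fuzzy set,
# represented as a dict mapping each element to its membership value.
A = {'a': 0.2, 'b': 0.5, 'c': 0.8, 'd': 1.0}

def weak_cut(fset, lam):
    # Elements with membership >= lambda (weak lambda-cut).
    return {x for x, mu in fset.items() if mu >= lam}

def strong_cut(fset, lam):
    # Elements with membership strictly > lambda (strong lambda-cut).
    return {x for x, mu in fset.items() if mu > lam}

print(weak_cut(A, 0.5))    # {'b', 'c', 'd'}
print(strong_cut(A, 0.5))  # {'c', 'd'}
```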


## Page 156

155Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

8.7 Concept of Lambda-Cuts for Fuzzy Relations

8.8 Methods of Defuzzification

Defuzzification is the process of conversion of a fuzzy quantity into a precise quantity. The output of a fuzzy process may be the union of two or more fuzzy membership functions defined on the universe of discourse of the output variable.

Figure 8.8: (A) First part of fuzzy output, (B) second part of fuzzy output, (C) union of parts (A) and (B)


## Page 157

156SOFT COMPUTING TECHNIQUES

Defuzzification Methods

• Max-membership principle

• Centroid method

• Weighted average method

• Mean -Max membership

• Centers of Sums

• Center of largest area

• First of maxima, last of maxima

8.8.1 Max -membership Principle

This method, also known as the height method, is limited to peaked output functions. It is given by the algebraic expression:

Figure 8.9 : Max -membership Defuzzification Method

8.8.2 Centroid Method

This method is also known as center of mass, center of area, center of gravity,


## Page 158

157Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

where ∫ denotes an algebraic integration.

Figure 8.10 : Centroid Defuzzification Method
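The centroid x* = ∫ μ(x)·x dx / ∫ μ(x) dx can be approximated on a sampled universe; the sketch below uses illustrative values.

```python
def centroid(xs, mus):
    """Discrete approximation of x* = sum(mu_i * x_i) / sum(mu_i)."""
    num = sum(x * m for x, m in zip(xs, mus))
    den = sum(mus)
    return num / den

xs  = [0, 1, 2, 3, 4]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]
print(centroid(xs, mus))  # 2.0 (the set is symmetric, so the centroid is its peak)
```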

8.8.3 Weighted Average Method

This method is valid for symmetrical output membership functions only. Each

membership function is weighted by its maximum membership value.

where Σ denotes the algebraic sum and x̄i is the maximum of the ith

membership function.

Figure 8.11: Weighted average defuzzification method

(two symmetrical membership functions)
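A minimal sketch of the weighted average computation, assuming each symmetric membership function is summarized by its peak location x̄i and height μi (the values are illustrative):

```python
def weighted_average(peaks):
    """peaks: list of (x_i, mu_i) pairs, where x_i is the maximum of the
    i-th symmetric membership function and mu_i its membership height."""
    num = sum(x * m for x, m in peaks)
    den = sum(m for _, m in peaks)
    return num / den

# Two symmetric membership functions peaking at 2 (height 0.5) and 6 (height 1.0).
print(weighted_average([(2, 0.5), (6, 1.0)]))  # (1.0 + 6.0) / 1.5
```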


## Page 159

158SOFT COMPUTING TECHNIQUES

8.8.4 Mean -Max Membership

This method is also known as the middle of maxima. The locations of the maxima

membership can be nonunique.

Figure 8.12: Mean -max membership defuzzification method

8.8.5 Centers of Sums

This method employs the algebraic sum of the individual fuzzy subsets. Its advantage is fast calculation; its drawback is that intersecting areas are added twice. The defuzzified value x* is given by:

Figure 8.13: (A) First and (B) Second Membership functions, (C) Defuzzification


## Page 160

159Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

8.8.6 Centers of Largest Area

This method can be adopted when the output consists of at least two convex fuzzy subsets which are not overlapping. In this case the output is biased towards one side of a membership function. When the output fuzzy set has at least two convex regions, the center of gravity of the convex fuzzy subregion having the largest area is used to obtain the defuzzified value x*. This value is given by:

Figure 8.14: Center of Largest Area Method

8.8.7 First of Maxima, Last of Maxima

This method uses the overall output, or union, of all individual output fuzzy sets cj to determine the smallest (or largest) value of the domain with maximized membership in cj.

Figure 8.15: First of maxima (last of maxima) method


## Page 161

160SOFT COMPUTING TECHNIQUES

The steps used for obtaining x* are:

• Initially, the maximum height in the union is found, where sup is the supremum, i.e., the least upper bound.

• Then the first of maxima is found, where inf is the infimum, i.e., the greatest lower bound.

• After this, the last of maxima is found.
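The three steps above can be sketched for a discrete universe (the values are illustrative; on a finite set, max and min stand in for sup and inf):

```python
def first_last_of_maxima(xs, mus):
    # Step 1: the height of the union (supremum of membership values).
    h = max(mus)
    # Step 2: first of maxima -- infimum of the points attaining the height.
    first = min(x for x, m in zip(xs, mus) if m == h)
    # Step 3: last of maxima -- supremum of the points attaining the height.
    last = max(x for x, m in zip(xs, mus) if m == h)
    return first, last

xs  = [0, 1, 2, 3, 4, 5]
mus = [0.2, 0.7, 1.0, 1.0, 0.6, 0.1]
print(first_last_of_maxima(xs, mus))  # (2, 3)
```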

8.9 Overview of Fuzzy Arithmetic

Fuzzy arithmetic is based on the operations and computations of fuzzy numbers.

Fuzzy numbers help in expressing fuzzy cardinalities and fuzzy quantifiers. Fuzzy

arithmetic is applied in various engineering applications when only imprecise or

uncertain sensory data are available for computation. The imprecise data from the measuring instruments are generally expressed in the form of intervals, and suitable mathematical operations are performed over these intervals to obtain reliable measurement data (which are also in the form of intervals). This type of

computation is called interval arithmetic or interval analysis.

8.10 Interval Analysis of Uncertain Values

Fuzzy numbers are an extension of the concept of intervals. Intervals are considered at only one unique level; fuzzy numbers consider them at several levels varying from 0 to 1. In interval analysis, the uncertainty of the data is limited to the interval specified by its lower bound and upper bound. The following are the various types of intervals:

• [a1, a2] = {x | a1 ≤ x ≤ a2} is a closed interval.


## Page 162

161Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

• [a1, a2) = {x | a1 ≤ x < a2} is an interval closed at the left end and open at the right end.

• (a1, a2] = {x | a1 < x ≤ a2} is an interval open at the left end and closed at the right end.

• (a1, a2) = {x | a1 < x < a2} is an open interval, open at both the left end and the right end.

8.11 Mathematical operations on Intervals

Let A = [a1, a2] and B = [b1, b2] be two intervals, with x ∈ [a1, a2] and y ∈ [b1, b2].

Addition (+): A + B = [a1, a2] + [b1, b2] = [a1 + b1, a2 + b2]

Subtraction (−): A − B = [a1, a2] − [b1, b2] = [a1 − b2, a2 − b1]

We subtract the larger of b1 and b2 from a1, and the smaller of b1 and b2 from a2.

Multiplication (·): Let the two intervals of confidence be A = [a1, a2] and B = [b1, b2], defined on the non-negative real line:

A · B = [a1, a2] · [b1, b2] = [a1·b1, a2·b2]

If we multiply an interval by a non-negative real number α:

α·A = [α, α]·[a1, a2] = [α·a1, α·a2]
α·B = [α, α]·[b1, b2] = [α·b1, α·b2]

Division (÷): The division of two intervals of confidence defined on the non-negative real line is given by:

A ÷ B = [a1, a2] ÷ [b1, b2] = [a1/b2, a2/b1]

If b1 = 0, then the upper bound increases to ∞.

Image (Ā): If x ∈ [a1, a2], then −x ∈ [−a2, −a1]. The image of A = [a1, a2] is Ā = [−a2, −a1].

Note that A + Ā = [a1, a2] + [−a2, −a1] = [a1 − a2, a2 − a1] ≠ 0.

The subtraction A − B thus becomes the addition of an image: A − B = A + B̄.

## Page 163

162SOFT COMPUTING TECHNIQUES

Inverse (A⁻¹): If x ∈ [a1, a2], with a1, a2 > 0, then 1/x ∈ [1/a2, 1/a1]. Thus,

A⁻¹ = [a1, a2]⁻¹ = [1/a2, 1/a1]

Division by a non-negative real number α > 0 uses the inverse [1/α, 1/α]:

A ÷ α = A · [1/α, 1/α] = [a1/α, a2/α]

Max and Min Operations: Let A = [a1, a2] and B = [b1, b2].

Max: A ∨ B = [a1, a2] ∨ [b1, b2] = [a1 ∨ b1, a2 ∨ b2]

Min: A ∧ B = [a1, a2] ∧ [b1, b2] = [a1 ∧ b1, a2 ∧ b2]
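These interval operations can be sketched directly, with intervals as (lower, upper) tuples (the helper names are illustrative; multiplication and division assume non-negative intervals, as in the text):

```python
# Interval arithmetic on (lower, upper) tuples.
def i_add(a, b): return (a[0] + b[0], a[1] + b[1])
def i_sub(a, b): return (a[0] - b[1], a[1] - b[0])
def i_mul(a, b): return (a[0] * b[0], a[1] * b[1])   # non-negative intervals only
def i_div(a, b): return (a[0] / b[1], a[1] / b[0])   # requires 0 not in b
def i_max(a, b): return (max(a[0], b[0]), max(a[1], b[1]))
def i_min(a, b): return (min(a[0], b[0]), min(a[1], b[1]))

A, B = (2, 4), (1, 3)
print(i_add(A, B))  # (3, 7)
print(i_sub(A, B))  # (-1, 3)
print(i_max(A, B))  # (2, 4)
```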

Table 8.1: Set Operations on Intervals

Table 8.2: Algebraic Properties of Intervals

8.12 Fuzzy Number

A fuzzy number is a normal, convex membership function on the real line R. Its membership function is piecewise continuous. That is, every λ-cut set A_λ, λ ∈ [0, 1], of a fuzzy number A is a closed interval of R, and the highest value of


## Page 164

163Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

membership of A is unity. For two given fuzzy numbers A and B in R and a specific λ ∈ [0, 1], we obtain two closed intervals:

A_λ = [a1(λ), a2(λ)] from fuzzy number A
B_λ = [b1(λ), b2(λ)] from fuzzy number B

A fuzzy number is an extension of the concept of an interval. Fuzzy numbers consider intervals at several levels, each level corresponding to a λ-cut of the fuzzy number. The notation A_λ = [a1(λ), a2(λ)] can be used to represent a closed interval of a fuzzy number A at a particular λ-level.


## Page 165

164SOFT COMPUTING TECHNIQUES

Table 8.3 Algebraic Properties of Addition and Multiplication on

Fuzzy Numbers

8.13 Fuzzy Ordering

The technique for fuzzy ordering is based on the concept of possibility measure. For a fuzzy number A, two fuzzy sets A1 and A2 are defined. The set of numbers that are possibly greater than or equal to A is denoted A1 and is defined as


## Page 166

165Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

In a similar manner, the set of numbers that are necessarily greater than A is denoted

as A2 and is defined as

where Π_A and N_A are the possibility and necessity measures, respectively.

We can compare A with B1 and B2 by an index of comparison, such as the possibility or necessity measure of a fuzzy set. That is, we can calculate the possibility and necessity measures of fuzzy sets B1 and B2 in the set A. On this basis, we obtain four fundamental indices of comparison.

8.14 Fuzzy Vectors

A vector P = (P1, P2, ..., Pn) is called a fuzzy vector if for every element we have 0 ≤ Pi ≤ 1 for i = 1 to n. Similarly, the transpose of the fuzzy vector P, denoted by Pᵀ, is a column vector if P is a row vector, i.e.,

Let P and Q be fuzzy vectors of length n.


## Page 167

166SOFT COMPUTING TECHNIQUES

Fuzzy inner product:

Fuzzy outer product:

The complement of a fuzzy vector P, denoted ~P, satisfies 0 ≤ ~Pi ≤ 1 for i = 1 to n:

~P = (1 − P1, 1 − P2, …, 1 − Pn) = (~P1, ~P2, …, ~Pn)

Largest component is defined as its upper bound:

Smallest component is defined as its lower bound:
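The fuzzy inner product (max–min) and outer product (min–max), together with the complement, can be sketched as follows (the vectors are illustrative, and the max–min/min–max definitions are the usual ones for fuzzy vectors):

```python
def inner(P, Q):
    # Fuzzy inner product P . Q^T = max over i of min(P_i, Q_i).
    return max(min(p, q) for p, q in zip(P, Q))

def outer(P, Q):
    # Fuzzy outer product = min over i of max(P_i, Q_i).
    return min(max(p, q) for p, q in zip(P, Q))

def complement(P):
    # ~P = (1 - P_1, ..., 1 - P_n).
    return [1 - p for p in P]

P = [0.3, 0.7, 1.0]
Q = [0.6, 0.4, 0.2]
print(inner(P, Q))  # 0.4
print(outer(P, Q))  # 0.6
```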

Properties of Fuzzy Vector

8.15 Extension Principle

The extension principle allows generalization of crisp sets into the fuzzy sets framework and extends point-to-point mappings to mappings for fuzzy sets.


## Page 168

167Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

8.16 Overview of Fuzzy Measures

A fuzzy measure describes the imprecision or ambiguity in the assignment of an element x to two or more crisp sets. To represent this uncertainty condition, known as ambiguity, we assign a value in the unit interval [0, 1] to each possible crisp set to which the element in question might belong. The value assigned represents the degree of evidence, certainty, or belief in the element's membership in the set. The representation of uncertainty in this manner is called a fuzzy measure. The difference between a fuzzy measure and a fuzzy set on a universe of elements is that in a fuzzy measure the imprecision is in the assignment of an element to one of two or more crisp sets, whereas in a fuzzy set the imprecision is in the prescription of the boundaries of the set.


## Page 169

168SOFT COMPUTING TECHNIQUES

A fuzzy measure is defined by a function g: P(X) → [0, 1] which assigns to each crisp subset of a universe of discourse X a number in the unit interval [0, 1], where P(X) is the power set of X. A fuzzy measure is thus a set function. To qualify as a fuzzy measure, the function g should possess certain properties. A fuzzy measure is also described as follows: g: B → [0, 1], where B ⊆ P(X) is a family of crisp subsets of X. Here B is a Borel field or a σ-field. Also, g satisfies the following three axioms of fuzzy measures:

• Boundary condition (g1): g(∅) = 0; g(X) = 1

• Monotonicity (g2): for every classical set A, B ∈ P(X), if A ⊆ B, then g(A) ≤ g(B)

• Continuity (g3): for every sequence (Ai ∈ P(X) | i ∈ N), if A1 ⊆ A2 ⊆ … or A1 ⊇ A2 ⊇ …, then lim_{i→∞} g(Ai) = g(lim_{i→∞} Ai)

where N is the set of all positive integers.

A σ-field or Borel field satisfies the following properties:

• X ∈ B and ∅ ∈ B

• If A ∈ B, then ~A ∈ B

• B is closed under the set union operation, i.e., if A, B ∈ B (σ-field), then A ∪ B ∈ B (σ-field)

The fuzzy measure excludes the additive property of standard measures h. The additive property states that when two sets A and B are disjoint, then h(A ∪ B) = h(A) + h(B). Since A ⊆ A ∪ B and B ⊆ A ∪ B, monotonicity gives g(A ∪ B) ≥ max[g(A), g(B)]. Similarly, since A ∩ B ⊆ A and A ∩ B ⊆ B, we have g(A ∩ B) ≤ min[g(A), g(B)].

8.17 Belief & Plausibility Measures

The belief measure is a fuzzy measure that satisfies the three axioms g1, g2 and g3 and an additional axiom of subadditivity. A belief measure is a function bel: B → [0, 1] satisfying axioms g1, g2 and g3 of fuzzy measures and the subadditivity axiom. It is defined as follows:

## Page 170

169Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

Plausibility is defined as Pl(A) = 1 − bel(Ā) for all A ∈ B ⊆ P(X). The belief measure can likewise be defined as bel(A) = 1 − Pl(Ā). The plausibility measure can also be defined independently of the belief measure: a plausibility measure is a function Pl: B → [0, 1] satisfying axioms g1, g2, g3 of fuzzy measures and the following subadditivity axiom, for every n ∈ N and every collection of subsets of X.

The belief measure and the plausibility measure are mutually dual, so it is beneficial to express both of them in terms of a set function m, called a basic probability assignment. The basic probability assignment m is a set function m: B → [0, 1] such that m(∅) = 0 and Σ_{A ∈ B} m(A) = 1. The value m(A) ∈ [0, 1], A ∈ B (⊆ P(X)), is called A's basic probability number.

Given a basic assignment m, a belief measure and a plausibility measure can be uniquely determined by:
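The standard Dempster–Shafer formulas — bel(A) = Σ over B ⊆ A of m(B), and Pl(A) = Σ over B with B ∩ A ≠ ∅ of m(B) — can be sketched over focal elements represented as frozensets (the assignment m below is illustrative):

```python
# Basic probability assignment over focal elements (frozensets).
m = {
    frozenset({'x'}): 0.4,
    frozenset({'x', 'y'}): 0.5,
    frozenset({'z'}): 0.1,
}

def bel(A, m):
    # bel(A): total mass of focal elements entirely contained in A.
    return sum(v for B, v in m.items() if B <= A)

def pl(A, m):
    # Pl(A): total mass of focal elements that intersect A.
    return sum(v for B, v in m.items() if B & A)

A = frozenset({'x', 'y'})
print(bel(A, m))  # 0.4 + 0.5 = 0.9
print(pl(A, m))   # 0.9 (only {'z'} misses A)
```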


## Page 171

170SOFT COMPUTING TECHNIQUES

8.18 Probability Measures

A probability measure is a function P: B → [0, 1] satisfying the three axioms g1, g2 and g3 of fuzzy measures and the additivity axiom:

P(A ∪ B) = P(A) + P(B) whenever A ∩ B = ∅, A, B ∈ B.

Theorem : “A belief measure bel on a finite ߪ-field B, which is a subset of P(X), is

a probability measure if and only if its basic probability assignment m is given by

m({x}) = bel({x}) and m(A) = 0 for all subsets of X that are not singletons.”

The theorem indicates that a probability measure on finite sets can be represented uniquely by a function defined on the elements of the universal set X rather than on its subsets. Probability measures on finite sets can thus be fully represented by a function P: X → [0, 1] such that P(x) = m({x}). This function is called the probability distribution function.

Within probability measures, total ignorance is expressed by the uniform probability distribution function:

P(x) = m({x}) = 1/|X| for all x ∈ X

The plausibility and belief measures can be viewed as upper and lower probabilities that characterize a set of probability measures.


## Page 172

171Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

8.19 Possibility & Necessity Measures

A group of subsets of a universal set is nested if these subsets can be ordered in such a way that each is contained in the next; i.e., A1 ⊂ A2 ⊂ A3 ⊂ … ⊂ An, Ai ∈ P(X), are nested sets. When the focal elements of a body of evidence (E, m) are nested, the associated belief and plausibility measures are called consonant, because here the degrees of evidence allocated to them do not conflict with each other.

Theorem: "Given a consonant body of evidence (E, m), the associated consonant belief and plausibility measures possess the following properties:

bel(A ∩ B) = min(bel(A), bel(B))
Pl(A ∪ B) = max(Pl(A), Pl(B))

for all A, B ∈ B (⊆ P(X))."

Consonant belief and plausibility measures are referred to as necessity and possibility measures and are denoted by N and Π, respectively.

The possibility measure Π and necessity measure N are functions Π: B → [0, 1] and N: B → [0, 1]. Both Π and N satisfy axioms g1, g2 and g3 of fuzzy measures as well as the following axiom (g7):

Π(A ∪ B) = max(Π(A), Π(B)), A, B ∈ B
N(A ∩ B) = min(N(A), N(B)), A, B ∈ B

Necessity and possibility measures are special subclasses of belief and plausibility measures; they are related to each other by

Π(A) = 1 − N(Ā) and N(A) = 1 − Π(Ā), A ∈ σ-field.
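Given a possibility distribution π over a finite universe, Π and N can be sketched as follows (the distribution π is illustrative; Π(A) = max of π(x) over x ∈ A, and N(A) = 1 − Π(Ā)):

```python
# Illustrative possibility distribution over a finite universe.
pi = {'low': 1.0, 'medium': 0.7, 'high': 0.2}

def possibility(A, pi):
    # Pi(A) = max of pi(x) over x in A.
    return max(pi[x] for x in A)

def necessity(A, pi):
    # N(A) = 1 - Pi(complement of A), by the duality relation.
    comp = set(pi) - set(A)
    return 1 - possibility(comp, pi) if comp else 1.0

A = {'low', 'medium'}
print(possibility(A, pi))  # 1.0
print(necessity(A, pi))    # 1 - 0.2
```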


## Page 173

172SOFT COMPUTING TECHNIQUES

8.20 Measure of Fuzziness

The fuzzy measure concept provides a general mathematical framework to deal with ambiguous variables. Measures of uncertainty related to vagueness are referred to as measures of fuzziness. A measure of fuzziness is a function f: P(X) → R, where R is the real line and P(X) is the set of all fuzzy subsets of X. The function f satisfies the following axioms:

• Axiom 1 (f1): f(A) = 0 if and only if A is a crisp set.

• Axiom 2 (f2): If A (shp) B, then f(A) ≤ f(B), where A (shp) B denotes that A is sharper than B.

• Axiom 3 (f3): f(A) takes the maximum value if and only if A is maximally fuzzy.

Axiom f1 states that a crisp set has zero degree of fuzziness. Axioms f2 and f3 are based on the concepts of "sharper" and "maximally fuzzy," respectively.


## Page 174

173Chapter 8: Membership Functions, Defuzzification, Fuzzy Arithmetic and Fuzzy Measures

8.21 Fuzzy Integral s

Summary

This chapter started with a discussion of membership functions and their features. The formation of the membership function is the core of the entire fuzzy system operation. The capability of human reasoning is important for membership functions. The inference method is based on geometrical shapes, whereas the angular fuzzy set is based on angular features. Using neural networks and reasoning methods, the memberships are tuned in a cyclic fashion and are based on the rule structure. Improvements are carried out to achieve an optimum solution using genetic algorithms. Thus, the membership function can be formed using any one of these methods.

Later we discussed the methods of converting fuzzy variables into crisp variables by a process called defuzzification. The defuzzification process is essential because some engineering applications need exact values for performing their operations. Defuzzification is a natural and essential technique. Lambda-cuts for fuzzy sets and fuzzy relations were discussed. Apart from the lambda-cut method, seven defuzzification methods were presented. The method of defuzzification should be assessed on the basis of the output in the context of the data available.

Finally, we discussed fuzzy arithmetic, which is considered an extension of interval arithmetic. One of the important tools of fuzzy set theory introduced by Zadeh is the extension principle, which allows any mathematical relationship between nonfuzzy elements to be extended to fuzzy entities. This principle can be


## Page 175

174SOFT COMPUTING TECHNIQUES

applied to algebraic operations to define set-theoretic operations for higher-order fuzzy sets. The belief and plausibility measures can be expressed by the basic probability assignment m, which assigns a degree of evidence or belief indicating that a particular element of X belongs to set A and not to any subset of A. The main characteristic of probability measures is that each of them can be distinctly represented by a probability distribution function defined on the elements of a universal set rather than on its subsets. Fuzzy integrals, defined by Sugeno (1977), are also discussed; they are used to perform integration of fuzzy functions.

Review Questions

1. What is a membership function? Enlist and explain its features.

2. Write a short note on fuzzification.

3. Explain any three methods of membership value assignments in detail.

4. Write a short note on defuzzification.

5. What is Lambda -cuts for fuzzy set and Fuzzy relations?

6. Explain any three methods of defuzzification in detail.

7. Write a short note on fuzzy arithmetic.

8. What are the mathematical operations on intervals in fuzzy arithmetic?

9. Write a short note on fuzzy number and fuzzy ordering.

10. Write a short note on fuzzy vectors.

11. Write a short note on belief and plausibility measures.

12. Write a short note on possibility and necessity measures.

Bibliography, References and Further Reading

• Artificial Intelligence and Soft Computing, by Anandita Das Battacharya,

SPD 3rd, 2018

• Principles of Soft Computing, S.N. Sivanandam, S.N.Deepa, Wiley, 3rd ,

2019

• Neuro -fuzzy and soft computing, J.S.R. Jang, C.T.Sun and E.Mizutani,

Prentice Hall of India, 2004


## Page 176

175Chapter 9: Genetic Algorithm

UNIT 5

9 GENETIC ALGORITHM

Unit Structure

9.0 Introduction

9.1 Biological Background

9.2 The Cell

9.3 The Cell

9.4 Genetic Algorithm and Search Space

9.5 Genetic Algorithm vs. Traditional Algorithms

9.6 Basic Terminologies in Genetic Algorithm

9.7 Simple GA

9.8 General Genetic Algorithm

9.9 Operators in Genetic Algorithm

9.10 Stopping Condition for Genetic Algorithm Flow

9.11 Constraints in Genetic Algorithm

9.12 Problem Solving Using Genetic Algorithm

9.13 The Schema Theorem

9.14 Classification of Genetic Algorithm

9.15 Holland Classifier Systems

9.16 Genetic Programming

9.17 Advantages and Limitations of Genetic Algorithm

9.18 Applications of Genetic Algorithm

9.19 Summary

9.20 Review Questions


## Page 177

176SOFT COMPUTING TECHNIQUES

Learning Objectives

• Gives an introduction to natural evolution.

• Lists the basic operators (selection, crossover, mutation) and other terminologies used in Genetic Algorithms (GAs).

• Discusses the need for the schemata approach.

• Details the comparison of traditional algorithms with GA.

• Explains the operational flow of simple GA.

• Describes the various classifications of GA - messy GA, adaptive GA, hybrid GA, parallel GA and independent sampling GA.

• Covers the variants of parallel GA (fine-grained parallel GA and coarse-grained parallel GA).

• Presents the basic concepts involved in the Holland classifier system.

• Provides the various features and operational properties of genetic programming.

• Discusses the application areas of GA.

Charles Darwin said: "Although the belief that an organ so perfect as the eye could have been formed by natural selection is enough to stagger any one; yet in the case of any organ, if we know of a long series of gradations in complexity, each good for its possessor, then, under changing conditions of life, there is no logical impossibility in the acquirement of any conceivable degree of perfection through natural selection."

9.0 Introduction

Charles Darwin formulated the fundamental principle of natural selection as the main evolutionary tool. He put forward his ideas without knowledge of the basic hereditary principles. In 1865, Gregor Mendel discovered these hereditary principles through the experiments he carried out on peas, and after Mendel's work the field of genetics developed. Morgan experimentally found that chromosomes were the carriers of hereditary information and that genes representing the hereditary factors were lined up on chromosomes. Darwin's natural selection theory and natural genetics remained unlinked until the 1920s, when it was shown that genetics and selection were in no way contrasting each other. The combination of Darwin's and Mendel's ideas leads to the modern evolutionary theory.

## Page 178

177Chapter 9: Genetic Algorithm

In The Origin of Species, Charles Darwin stated the theory of natural evolution. Over many generations, biological organisms evolve according to the principles of natural selection, like "survival of the fittest," to reach some remarkable forms of accomplishment. The perfect shape of the albatross wing, the efficiency and similarity between sharks and dolphins, and so on are good examples of what random evolution in the absence of intelligence can achieve. So, if it works so well in nature, it should be interesting to simulate natural evolution and try to obtain a method which may solve concrete search and optimization problems.

For a better understanding of this theory, it is important first to understand the biological terminology used in evolutionary computation; this is discussed in Section 9.1.

In 1975, Holland developed this idea in Adaptation in Natural and Artificial Systems. By describing how to apply the principles of natural evolution to optimization problems, he laid down the first GA. Holland's theory has been further developed, and GAs now stand as powerful adaptive methods for solving search and optimization problems. Today, GAs are used to solve complicated optimization problems such as organizing timetables, scheduling job shops, and playing games.

What are Genetic Algorithms?

GAs are adaptive heuristic search algorithms based on the evolutionary ideas of natural selection and genetics. As such, they represent an intelligent exploitation of a random search used to solve optimization problems. Although randomized, GAs are by no means random; instead, they exploit historical information to direct the search into the region of better performance within the search space. The basic techniques of GAs are designed to simulate processes in natural systems necessary for evolution, especially those that follow the principles first laid down by Charles Darwin, "survival of the fittest," because in nature competition among individuals for scanty resources results in the fittest individuals dominating over the weaker ones.
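A minimal "one-max" sketch of these ideas (maximize the number of 1s in a bit string) is shown below; the parameter choices and helper names are illustrative, not Holland's canonical algorithm:

```python
# Illustrative GA: evolve 8-bit strings to maximize the number of ones.
import random

random.seed(0)
POP, BITS, GENS = 20, 8, 40

def fitness(ind):
    return sum(ind)  # one-max: count of 1 bits

def select(pop):
    # Tournament selection of size 2: the fitter individual survives.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    cut = random.randint(1, BITS - 1)  # single-point crossover
    return p1[:cut] + p2[cut:]

def mutate(ind, rate=0.05):
    # Flip each bit with a small probability.
    return [1 - g if random.random() < rate else g for g in ind]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]

best = max(pop, key=fitness)
print(fitness(best))  # typically reaches or approaches the optimum of 8
```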

Why Gene tic Algorithms?

They are better than conventional algorithms in that they are more robust. Unlike older AI systems, they do not break easily even if the inputs are changed slightly or in the presence of reasonable noise. Also, in searching a large state-space, a multimodal state-space, or an n-dimensional surface, a GA may offer significant benefits over more typical optimization techniques (linear programming, heuristic, depth-first, and praxis).

## Page 179

178SOFT COMPUTING TECHNIQUES

9.1 Biological Background

The science that deals with the mechanisms responsible for similarities and differences in a species is called genetics. The word "genetics" is derived from the Greek word "genesis" meaning "to grow" or "to become." The science of genetics helps us to differentiate between heredity and variations and accounts for the resemblances and differences during the process of evolution. The concepts of GAs are directly derived from natural evolution and heredity. The terminologies involved in the biological background of species are discussed in the following subsections.

9.2 The Cell

Every animal/human cell is a complex of many "small" factories that work together. The centre of all this is the cell nucleus, which contains the genetic information. Figure 9-1 shows the anatomy of the animal cell and cell nucleus.

Chromosomes

All the genetic information gets stored in the chromosomes. Each chromosome is

build of deoxyribonucleic acid (DNA). In humans, chromosomes exist in pairs (23

pairs found). The chromosomes are divided into several parts called genes. Genes

code the properties of species, i.e., the characteristics of an individual. The

possibilities of combination of the genes for one property are called alleles, and a

gene can take different alleles. For example, there is a gene for eye colour, and all

the different possible alleles are black, brown, blue and green (since no one has red

or violet eyes!). The set of all possible alleles present in a particular population

forms a gene pool. This gene pool can determine all the different possible variations

for the future generations. The size of the gene pool helps in determining the

diversity of the individuals in the population. The set of all the genes of a specific

species is called the genome. Each and every gene has a unique position on the genome, called its locus.

Figure 9-1: Anatomy of the animal cell and cell nucleus

In fact, most living organisms store their genome on several chromosomes, but in GAs all the genes are usually stored on the same chromosome. Thus, chromosomes and genomes are synonymous with one another in GAs. Figure 9-2 shows a model of a chromosome.

9.2.3 Genetics

For a particular individual, the entire combination of genes is called the genotype. The phenotype is the physical expression of that genotype; morphogenesis is the process of decoding a genotype to produce the phenotype. One interesting point of evolution is that selection is always done on the phenotype, whereas reproduction recombines the genotype. Thus, morphogenesis plays a key role between selection and reproduction. In higher life forms, chromosomes contain two sets of genes. These are known as diploids. In the


case of conflicts between two values of the same pair of genes, the dominant one will determine the phenotype, whereas the other one, called recessive, will still be present and can be passed on to the offspring.

Figure 9-2: Model of a chromosome

Figure 9-3: Development of genotype to phenotype

Diploidy allows a wider diversity of alleles. This provides a useful memory mechanism in changing or noisy environments. However, most GAs concentrate on haploid chromosomes because they are much simpler to construct. In a haploid representation, only one set of each gene is stored; thus, the process of determining which allele should be dominant and which recessive is avoided. Figure 9-3 shows the development of genotype to phenotype.


9.2.4 Reproduction

Reproduction of species via genetic information is carried out by the following:

1. Mitosis: In mitosis the same genetic information is copied to the new offspring. There is no exchange of information. This is the normal way of growing of multicellular structures, such as organs. Figure 9-4 shows the mitosis form of reproduction.

2. Meiosis: Meiosis forms the basis of sexual reproduction. When meiotic division takes place, two gametes appear in the process. When reproduction occurs, these two gametes conjugate into a zygote, which becomes the new individual. Thus, in this case, the genetic information is shared between the parents in order to create new offspring. Figure 9-5 shows the meiosis form of reproduction.


Figure 9-5: Meiosis form of reproduction

Table 9-1: Comparison of natural evolution and genetic algorithm terminology

Natural evolution    Genetic algorithm
-----------------    -----------------
Chromosome           String
Gene                 Feature or character
Allele               Feature value
Locus                String position
Genotype             Structure or coded string
Phenotype            Parameter set, a decoded structure

9.2.5 Natural Selection

The origin of species is based on the "preservation of favourable variations and rejection of unfavourable variations." Variation refers to the differences shown by the individuals of a species and also by the offspring of the same parents. More individuals are born than can survive, so there is a continuous struggle for life. Individuals with an advantage have a greater chance of survival, i.e., the survival of the fittest. For example, giraffes with long necks can take food from tall trees as well as from the ground; on the other hand, goats and deer, having shorter necks, can take food only from the ground. As a result, natural selection plays a major role in this survival process.


Table 9.1 gives a list of different expressions, which are common in natural

evolution and genetic algorithms.

9.3 Traditional Optimization and Search Techniques

The basic principle of optimization is the efficient allocation of scarce resources.

Optimization can be applied to any scientific or engineering discipline. The aim of

optimization is to find an algorithm which solves a given class of problems. There

exists no specific method which solves all optimization problems. Consider a function

f(x): [x^l, x^u] → [0, 1]          ..........(1)

where

f(x) = 1 if ||x − a|| < ε (ε > 0), and 0 elsewhere          ..........(2)

For the above function, the task of maximizing f can be made harder by decreasing ε or by making the interval [x^l, x^u] large; conversely, a difficult task can be made easier. Therefore, one can solve optimization problems by combining human creativity and the raw processing power of computers. The various conventional optimization and search techniques available are discussed in the following subsections.

9.3.1 Gradient Based Local Optimization Method

When the objective function is smooth and one needs efficient local optimization, it is better to use gradient-based or Hessian-based optimization methods. The performance and reliability of the different gradient methods vary considerably. To discuss gradient-based local optimization, let us assume a smooth objective function (i.e., one with continuous first and second derivatives), denoted by

f(x): R^n → R          .......(3)

The first derivatives are contained in the gradient vector ∇f(x):

∇f(x) = [∂f(x)/∂x_1, …, ∂f(x)/∂x_n]^T          ......(4)


The second derivatives of the objective function are contained in the Hessian matrix H(x):

H(x) = [∂²f(x)/(∂x_i ∂x_j)], i, j = 1, …, n          ..............(5)

Few methods need only the gradient vector, but Newton's method also needs the Hessian matrix. The general pseudocode used in gradient methods is as follows:

Select an initial guess value x_1 and set n = 1.
Repeat
    Solve for the search direction p_n from Eq. (7) or (8) below.
    Determine the next iteration point: x_{n+1} = x_n + λ_n p_n
    Set n = n + 1.
Until ||x_n − x_{n−1}|| < ε          ......(6)

These gradient methods search for a minimum, not a maximum. Several different methods are obtained based on the details of the algorithm.

The search direction p_n in the conjugate gradient method is found as follows:

p_n = −∇f(x_n) + β_n p_{n−1}          ..................(7)

In the secant method,

B_n p_n = −∇f(x_n)          ..............(8)

is used for finding the search direction. The matrix B_n in Eq. (8) estimates the Hessian and is updated in each iteration. When B_n is defined as the identity matrix, the steepest descent method occurs; when B_n is the Hessian H(x_n), we get Newton's method.


The length λ_n of the search step is computed using:

λ_n = argmin_{λ > 0} f(x_n + λ p_n)          .....(9)

This is a one-dimensional optimization problem. The steepest descent method often provides poor performance, so the conjugate gradient method can be used instead. If the second derivatives are easy to compute, then Newton's method may provide the best results. The secant methods are faster than conjugate gradient methods, but memory problems can occur. Thus, these local optimization methods can be combined with other methods to get a good compromise between performance and reliability.
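To make the pseudocode concrete, here is a minimal sketch of the steepest-descent special case (B_n taken as the identity, so p_n = −∇f(x_n)). The fixed step length, the quadratic test function and the tolerance are assumptions made for this example; a fuller implementation would compute λ_n by the line search of Eq. (9).

```python
def steepest_descent(grad, x0, step=0.1, eps=1e-8, max_iter=10000):
    """Minimise f by iterating x_{n+1} = x_n + step * p_n with the
    steepest-descent direction p_n = -grad(x_n) (i.e., B_n = identity).
    Stops when ||x_{n+1} - x_n|| < eps, the criterion of Eq. (6)."""
    x = list(x0)
    for _ in range(max_iter):
        p = [-g for g in grad(x)]                         # search direction
        x_new = [xi + step * pi for xi, pi in zip(x, p)]  # iteration step
        move = sum((a - b) ** 2 for a, b in zip(x_new, x)) ** 0.5
        x = x_new
        if move < eps:                                    # stopping criterion
            break
    return x

# Example: f(x) = ||x - a||^2 has gradient 2(x - a) and its minimum at a.
a = [3.0, -1.0]
x_min = steepest_descent(lambda x: [2 * (xi - ai) for xi, ai in zip(x, a)],
                         x0=[0.0, 0.0])
```

Note that, as the text says, the iteration seeks a minimum; to maximise a function, one would negate it first.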

9.3.2 Random Search

Random search is an extremely basic method. It merely explores the search space by randomly selecting solutions and evaluating their fitness. This is quite an unintelligent strategy, and is rarely used on its own. Nevertheless, this method is sometimes

worth testing. It doesn't take much effort to implement it, and an important number

of evaluations can be done fairly quickly. For new unresolved problems, it can be

useful to compare the results of a more advanced algorithm to those obtained just

with a random search for the same numb er of evaluations. Nasty surprises might

well appear when comparing, for example, GAs to random search. It’s good to

remember that the efficiency of GA is extremely dependent on consistent coding

and relevant reproduction operators. Building a GA which performs no better than

a random search happens more often than we can expect. If the reproduction

operators are just producing new random solutions without any concrete links to

the ones selected from the last generation, the GA is just doing nothing else than a

random search.

Random search does have a few interesting qualities. However good the obtained

solution may be, if it is not the optimal one, it can always be improved by continuing

the run of the random search algorithm for long enough. A random search never

gets stuck at any point such as a local optimum. Furthermore, theoretically, if the

search space is finite, random search is guaranteed to reach the optimal solution.

Unfortunately, this result is completely useless in practice: for most of the problems we are interested in, exploring the whole search space takes far too much time.
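The strategy described above fits in a few lines. In the sketch below, the uniform sampling over [−10, 10], the evaluation budget and the test function are all illustrative assumptions:

```python
import random

def random_search(fitness, sample, n_evals=10000, seed=0):
    """Evaluate n_evals randomly selected solutions and keep the fittest."""
    rng = random.Random(seed)
    best_x, best_f = None, float("-inf")
    for _ in range(n_evals):
        x = sample(rng)          # draw one random point of the search space
        f = fitness(x)
        if f > best_f:           # never worse than any earlier candidate
            best_x, best_f = x, f
    return best_x, best_f

# Example: maximise f(x) = -(x - 2)^2 over the interval [-10, 10].
best_x, best_f = random_search(lambda x: -(x - 2.0) ** 2,
                               lambda rng: rng.uniform(-10.0, 10.0))
```

Run long enough on a finite space, this can only improve, which is exactly the (impractical) guarantee discussed above.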

9.3.3 Stochastic Hill Climbing

Efficient methods exist for problems with well-behaved continuous fitness functions. These methods use a kind of gradient to guide the direction of the search. Stochastic hill climbing is the simplest of these methods. Each iteration consists of randomly choosing a solution in the neighbourhood of the current solution and retaining this new solution only if it improves the fitness function.

Stochastic hill climbing converges towards the optimal solution if the fitness function of the problem is continuous and has only one peak (unimodal function). On functions with many peaks (multimodal functions), the algorithm is likely to stop on the first peak it finds, even if it is not the highest one. Once a peak is reached, hill climbing cannot progress anymore, and that is problematic when this point is a local optimum. Stochastic hill climbing usually starts from a randomly selected point.

A simple idea to avoid getting stuck on the first local optimum consists in repeating several hill climbs, each time starting from a different randomly chosen point. This method is sometimes known as iterated hill climbing. By discovering different local optima, the chances of reaching the global optimum increase. It works well if there are not too many local optima in the search space. However, if the fitness function is very "noisy," with many small peaks, stochastic hill climbing is definitely not a good method to use. Nevertheless, such methods have the advantage of being easy to implement and giving fairly good solutions very quickly.
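A minimal sketch of stochastic hill climbing and of its iterated variant follows; the neighbourhood width, iteration budget and unimodal test function are assumptions chosen for the example:

```python
import random

def hill_climb(fitness, x0, step=0.1, n_iters=5000, seed=1):
    """Stochastic hill climbing: propose a random neighbour of the current
    solution and retain it only if it improves the fitness function."""
    rng = random.Random(seed)
    x, f = x0, fitness(x0)
    for _ in range(n_iters):
        y = x + rng.uniform(-step, step)   # random neighbour
        fy = fitness(y)
        if fy > f:                         # keep only improving moves
            x, f = y, fy
    return x, f

def iterated_hill_climb(fitness, starts, **kw):
    """Repeat the climb from several starting points and keep the best
    result, to reduce the risk of stopping on a low local peak."""
    return max((hill_climb(fitness, x0, **kw) for x0 in starts),
               key=lambda pair: pair[1])

# Unimodal example: f(x) = -(x - 1)^2 has a single peak at x = 1.
x_best, f_best = hill_climb(lambda x: -(x - 1.0) ** 2, x0=5.0)
```

On this single-peak function the climb reliably approaches x = 1; on a multimodal function it would stop on whichever peak it reached first.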

9.3.4 Simulated Annealing

Simulated annealing (SA) was originally inspired by the formation of crystals in solids during cooling. As discovered long ago by Iron Age blacksmiths, the slower the cooling, the more perfect is the crystal formed. By cooling, complex physical systems naturally converge towards a state of minimal energy. The system moves randomly, but the probability of staying in a particular configuration depends directly on the energy of the system and on its temperature. This probability is formally given by the Gibbs law:

p = e^(−E/kT)          .......(10)

where E stands for the energy, k is the Boltzmann constant and T is the temperature. In the mid-1970s, Kirkpatrick, by analogy with these physical phenomena, laid out the first description of SA.

As in stochastic hill climbing, an iteration of SA consists of randomly choosing a new solution in the neighbourhood of the current solution. If the fitness function of the new solution is better than the fitness function of the current one, the new solution is accepted as the new current solution. If the fitness function is not improved, the new solution is retained with a probability:

P = e^(−|f(y) − f(x)|/kT)          .......(11)


where |f(y) − f(x)| is the absolute difference of the fitness function between the new and the old solution.

The SA behaves like a hill climbing method, but with the possibility of going downhill to avoid being trapped at local optima. When the temperature is high, the probability of deteriorating the solution is quite important, and then a lot of large moves are possible to explore the search space. The more the temperature decreases, the more difficult it is to go downhill; the algorithm thus tries to climb up from the current solution to reach a maximum. When the temperature is lower, there is an exploitation of the current solution. If the temperature is too low, no deterioration is accepted, and the algorithm behaves just like a stochastic hill climbing method. Usually, the SA starts from a high temperature which decreases exponentially. The slower the cooling, the better it is for finding good solutions. It has even been demonstrated that with an infinitely slow cooling, the algorithm is almost certain to find the global optimum. Since infinitely slow cooling is impracticable, the real difficulty consists in finding the appropriate temperature decrease rate to obtain a good behaviour of the algorithm.
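The acceptance rule of Eq. (11) combined with an exponential cooling schedule can be sketched as below (written for maximisation). The two-peak test function, the cooling rate and the step size are illustrative assumptions, not values from the text:

```python
import math
import random

def simulated_annealing(fitness, x0, t0=1.0, cooling=0.999,
                        step=0.5, n_iters=20000, seed=2):
    """SA: accept a worse neighbour with probability exp(-|df|/T),
    while the temperature T decays exponentially towards zero."""
    rng = random.Random(seed)
    x, f = x0, fitness(x0)
    best_x, best_f = x, f
    t = t0
    for _ in range(n_iters):
        y = x + rng.uniform(-step, step)   # random neighbour
        fy = fitness(y)
        # Uphill moves are always accepted; downhill moves sometimes.
        if fy >= f or rng.random() < math.exp(-(f - fy) / t):
            x, f = y, fy
        if f > best_f:                     # remember the best solution seen
            best_x, best_f = x, f
        t *= cooling                       # exponential cooling schedule
    return best_x, best_f

# Two peaks: a local one at x = -2 (about 0.5), the global one at x = 2.
def f(x):
    return math.exp(-(x - 2.0) ** 2) + 0.5 * math.exp(-(x + 2.0) ** 2)

best_x, best_f = simulated_annealing(f, x0=-2.0)
```

Starting on the lower peak, the early high-temperature phase gives the walk a chance to cross the valley that pure hill climbing could not.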

SA, by mixing exploration features such as the random search and exploitation features like hill climbing, usually gives quite good results. SA is a serious competitor of GAs, and it is worth trying to compare the results obtained by each. Both are derived from analogy with natural system evolution and both deal with the same kind of optimization problem. GAs differ from SA in two main features which make them more efficient. First, GAs use a population-based selection, whereas SA only deals with one individual at each iteration. Hence GAs are expected to cover a much larger landscape of the search space at each iteration; however, SA iterations are much simpler, and so often much faster. The greater advantage of GAs is their exceptional ability to be parallelized, whereas SA does not gain much from this; it is mainly due to the population scheme used by GAs. Second, GAs use recombination operators and are able to mix good characteristics from different solutions. The exploitation made by recombination operators is supposedly helpful to find optimal solutions of the problem. On the other hand, SA is still very simple to implement and gives good results. SAs have proved their efficiency over a large spectrum of difficult problems, like the optimal layout of printed circuit boards or the famous travelling salesman problem.

9.3.5 Symbolic Artificial Intelligence

Most symbolic artificial intelligence (AI) systems are very static. Most of them can usually only solve one given specific problem, since their architecture was designed for whatever that specific problem was in the first place. Thus, if the given problem were somehow to be changed, these systems could have a hard time adapting, since the algorithm that would originally arrive at the solution may become either incorrect or less efficient. GAs were created to combat these problems. They are


basically algorithms based on natural biological evolution. The architecture of systems that implement GAs is more able to adapt to a wide range of problems. A GA functions by generating a large set of possible solutions to a given problem. It then evaluates each of those solutions and decides on a "fitness level" (you may recall the phrase "survival of the fittest") for each solution set. These solutions then breed new solutions. The parent solutions that were more "fit" are more likely to reproduce, while those that were less "fit" are more unlikely to do so. In essence, solutions are evolved over time. This way, the search is evolved towards a point where the solution can be found. GAs can be incredibly efficient if programmed correctly.
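The generate-evaluate-breed cycle just described can be sketched as a short program. Everything concrete below is an assumption chosen for illustration, not the text's own formulation: the OneMax fitness (count of 1-bits), binary tournament selection, one-point crossover, and the population and mutation parameters.

```python
import random

rng = random.Random(3)

def fitness(bits):
    """OneMax fitness: the number of 1-bits (no derivatives needed)."""
    return sum(bits)

def tournament(pop, k=2):
    """Probabilistic selection: pick the fitter of k random individuals."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b):
    """One-point crossover on the coded strings."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, p=0.01):
    """Flip each bit independently with small probability p."""
    return [1 - b if rng.random() < p else b for b in bits]

n_bits, pop_size = 20, 30
pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
for _ in range(60):                      # breed a new generation each pass
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(pop_size)]
best = max(pop, key=fitness)
```

Fitter parents are more likely to win tournaments and hence to reproduce, so over the generations the population drifts towards strings of all 1s.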

9.4 Genetic Algorithm and Search Space

Evolutionary computing was introduced in the 1960s by I. Rechenberg in the work "Evolution Strategies." This idea was then developed by other researchers. GAs were invented by John Holland, who developed the idea in his book "Adaptation in Natural and Artificial Systems" in the year 1975. Holland proposed GA as a heuristic method based on "survival of the fittest." GA was discovered to be a useful tool for search and optimization problems.

9.4.1 Search Space

Most often one is looking for the best solution in a specific set of solutions. The space of all feasible solutions (the set of solutions among which the desired solution resides) is called the search space (also state space). Each and every point in the search space represents one possible solution. Therefore, each possible solution can be "marked" by its fitness value, depending on the problem definition. With a GA one looks for the best solution among a number of possible solutions, each represented by one point in the search space; GAs are used to search the search space for the best solution, e.g., a minimum. The difficulties in this case are the local minima and the starting point of the search. Figure 9-6 gives an example of a search space.

Figure 9-6: An example of search space


9.4.2 Genetic Algorithms World

GA raises again a couple of important features. First, it is a stochastic algorithm; randomness has an essential role in GAs. Both selection and reproduction need random procedures. A second very important point is that GAs always consider a population of solutions. Keeping in memory more than a single solution at each iteration offers a lot of advantages. The algorithm can recombine different solutions to get better ones, and so it can use the benefits of assortment. A population-based algorithm is also very amenable to parallelization. The robustness of the algorithm should also be mentioned as something essential for its success. Robustness refers to the ability to perform consistently well on a broad range of problem types. There is no particular requirement on the problem before using GAs, so they can be applied to resolve any problem. All these features make GA a really powerful optimization tool.

With the success of GAs, other algorithms making use of the same principle of natural evolution have also emerged. Evolution strategies and genetic programming are some algorithms similar to GAs. The classification is not always clear between the different algorithms; thus, to avoid any confusion, they are all gathered under what are called evolutionary algorithms.

The analogy with nature gives these algorithms something exciting and enjoyable. Their ability to deal successfully with a wide range of problem areas, including those which are difficult for other methods to solve, makes them quite powerful. Today, however, GAs suffer from too much trendiness. GA is a young field, and parts of the theory still have to be properly established. We can find almost as many opinions on GAs as there are researchers in this field. In this document, we will generally find the most current point of view. But things evolve quickly in GAs too, and some comments might not be very accurate in a few years.

It is also important to mention the limits of GAs in this introduction. Like most stochastic methods, GAs are not guaranteed to find the global optimum solution to a problem; they are satisfied with finding "acceptably good" solutions to the problem. GAs are extremely general too, and so specific techniques for solving particular problems are likely to out-perform GAs in both speed and accuracy of the final result. GAs are something worth trying when everything else fails or when we know absolutely nothing of the search space. Nevertheless, even when such specialized techniques exist, it is often interesting to hybridize them with a GA in order to possibly gain some improvements. It is important always to keep an objective point of view; do not consider that GAs are a panacea for resolving all optimization problems. This warning is for those who might be tempted to resolve anything with GAs.


The proverb says, "If we have a hammer, all the problems look like nails." GAs do work and give excellent results if they are applied properly to appropriate problems.

9.4.3 Evolution and Optimization

To depict the importance of the evolution and optimization process, consider a species, Basilosaurus, that originated 45 million years ago. Basilosaurus was a prototype of a whale (Figure 9-7). It was about 9 m long and

Figure 9-7: Basilosaurus

Figure 9-8: Tursiops flipper

weighed approximately 5 tons. It still had a quasi-independent head and posterior paws, and moved using undulatory movements and hunted small prey. Its anterior members were reduced to small flippers with an elbow articulation. Movements in such a viscous element (water) are very hard and require big efforts. The anterior members of Basilosaurus were not really adapted to swimming. To adapt them, a double phenomenon must occur: the shortening of the "arm" with the locking of the elbow articulation, and the extension of the fingers, which constitute the base structure of the flipper (refer to Figure 9-8).

The image shows that two fingers of the common dolphin are hypertrophied to the

detriment of the rest of the member. The basilosaurus was a hunter; it had to be fast

and precise. Through time, subjects appeared with longer fingers and short arms.


They could move faster and more precisely than before, and therefore live longer and have many descendants.

Meanwhile, other improvements occurred concerning the general hydrodynamics, like the integration of the head into the body, improvement of the profile, strengthening of the caudal fin, and so on, finally producing a subject perfectly adapted to the constraints of an aqueous environment. This process of adaptation and this morphological optimization is so perfect that nowadays the similarity between a shark, a dolphin or a submarine is striking. The first is a cartilaginous fish (Chondrichthyes) that originated in the Devonian period (400 million years ago), long before the appearance of the first mammal. The Darwinian mechanism hence generated an optimization process: hydrodynamic optimization for fishes and other marine animals, and aerodynamic optimization for pterodactyls, birds and bats. This observation is the basis of GAs.

9.4.4 Evolution and Genetic Algorithms

The basic idea is as follows: the genetic pool of a given population potentially contains the solution, or a better solution, to a given adaptive problem. This solution is not "active" because the genetic combination on which it relies is split among several subjects. Only the association of different genomes can lead to the solution. Simplistically speaking, we could for example consider that the shortening of the paw and the extension of the fingers of our Basilosaurus are controlled by two "genes." No subject has such a genome, but during reproduction and crossover, new genetic combinations occur and, finally, a subject can inherit a "good gene" from both parents: its paw is now a flipper.

Holland's method is especially effective because he not only considered the role of mutation (mutations very seldom improve the algorithms), but also utilized genetic recombination (crossover): these recombinations, the crossover of partial solutions, greatly improve the capability of the algorithm to approach, and eventually find, the optimum.

Recombination, or sexual reproduction, is a key operator in natural evolution. Technically, it takes two genotypes and produces a new genotype by mixing the genes found in the originals. In biology, the most common form of recombination is crossover: two chromosomes are cut at one point and the halves are spliced to create new chromosomes. The effect of recombination is very important because it allows characteristics from two different parents to be assorted. If the father and the mother possess different good qualities, we would expect that all the good

qualities will be passed to the child. Thus the offspring, just by combining all the


good features from its parents, may surpass its ancestors. Many people believe that

this mixing of genetic material via sexual reproduction is one of the most powerful features of GAs. As a quick parenthesis about sexual reproduction, GA representation usually does not differentiate male and female individuals (without

any perversity). As in many living species (e.g., snails), any individual can be either a male or a female. In fact, for almost all recombination operators, mother and father

are interchangeable.

Mutation is the other way to get new genomes. Mutation consists in changing the

value of genes. In natural evolution, mutation mostly engenders non-viable genomes. Actually, mutation is not a very frequent operator in natural evolution. Nevertheless, in optimization, a few random changes can be a good way of exploring the search space quickly.
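The two operators just described, crossover splicing two genotypes at a cut point and mutation flipping gene values, can be sketched on bit-string genotypes as follows; the string encoding and the mutation probability are illustrative assumptions:

```python
import random

rng = random.Random(0)

def crossover(mother, father):
    """One-point crossover: cut both genotypes at a random locus and
    splice the halves, so each child mixes genes from both parents."""
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:], father[:cut] + mother[cut:]

def mutate(genotype, p=0.1):
    """Mutation: flip each gene's value independently with probability p."""
    flipped = []
    for gene in genotype:
        if rng.random() < p:
            flipped.append("0" if gene == "1" else "1")  # rare random change
        else:
            flipped.append(gene)
    return "".join(flipped)

child1, child2 = crossover("11111111", "00000000")
```

Note that mother and father are interchangeable here, exactly as the text remarks.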

Through those low-level notions of genetics, we have seen how living beings store their characteristic information and how this information can be passed on to their offspring. It is very basic, but it is more than enough to understand the GA theory.

Darwin was totally unaware of the biochemical basics of genetics. Now we know

how the genetic inheritable information is coded in DNA, RNA, and proteins and

that the coding principles are actually digital, much resembling the information

storag e in computers. Information processing is in many ways totally different,

however. The magnificent phenomenon called the evolution of species can also

give some insight into information processing methods and optimization, in

particular. According to Darwinism, inherited variation is characterized by the

following properties:

1. Variation must be copied, because selection does not directly create anything, but presupposes a large population to work on.

2. Variation must be small-scale in practice. Species do not appear suddenly.

3. Variation is undirected. This is also known as the blind watchmaker paradigm.

While the natural sciences' approach to evolution has for over a century been to analyse and study different aspects of evolution to find the underlying principles, the engineering sciences are happy to apply evolutionary principles, which have been heavily tested over billions of years, to attack the most complex technical problems, including protein folding.


9.5 Genetic Algorithm vs. Traditional Algorithms

The principle of GAs is simple: emulate genetics and natural selection with a computer program. The parameters of the problem are coded most naturally as a DNA-like linear data structure, a vector or a string. Sometimes, when the problem is naturally two- or three-dimensional, corresponding array structures are used.

A set of these problem-dependent parameter value vectors, called a population, is processed by the GA. To start, there is usually a totally random population, the values of the different parameters generated by a random number generator. A typical population size is from a few dozen to thousands. To do optimization we need a cost function, or fitness function as it is usually called when GAs are used. By means of the fitness function we can select the best solution candidates from the population and delete the not-so-good specimens.

The nice thing when comparing GAs to other optimization methods is that the fitness function can be nearly anything that can be evaluated by a computer, or even something that cannot. In the latter case it might be a human judgment that cannot be stated as a crisp program, as in the case of an eyewitness, where a human being selects from the alternatives generated by the GA. So, there are no definite mathematical restrictions on the properties of the fitness function. It may be discrete, multimodal, etc.

The main criteria used to classify optimization algorithms are as follows: continuous/discrete, constrained/unconstrained and sequential/parallel. There is a clear difference between discrete and continuous problems; therefore, it is instructive to notice that continuous methods are sometimes used to solve inherently discrete problems, and vice versa. Parallel algorithms are usually used to speed up processing. There are, however, some cases in which it is more efficient to run several processors in parallel rather than sequentially. These cases include, among others, those in which there is a high probability of each individual search run getting stuck in a local extreme.

Irrespective of the above classification, optimization methods can be further classified into deterministic and non-deterministic methods. In addition, optimization algorithms can be classified as local or global. In terms of energy and entropy, local search corresponds to entropy, while global optimization depends essentially on the fitness, i.e., the energy landscape.

GA differs from conventional optimization techniques in the following ways:


1. GAs operate with coded versions of the problem parameters rather than the parameters themselves, i.e., GA works with the coding of the solution set and not with the solution itself.

2. Almost all conventional optimization techniques search from a single point, but GAs always operate on a whole population of points (strings), i.e., GA uses a population of solutions rather than a single solution for searching. This contributes greatly to the robustness of GAs. It improves the chance of reaching the global optimum and also helps in avoiding local stationary points.

3. GA uses the fitness function for evaluation rather than derivatives. As a result, GAs can be applied to any kind of continuous or discrete optimization problem. The key point here is to identify and specify a meaningful decoding function.

4. GAs use probabilistic transition operators, while conventional methods for continuous optimization apply deterministic transition operators, i.e., GAs do not use deterministic rules.

These are the major differences that exist between GA and conventional optimization techniques.

9.6 Basic Terminologies in Genetic Algorithm

The two distinct elements in the GA are individuals and populations. An individual

is a single solution while the population is the set of individuals currently involved

in the search process.

9.6.1 Individuals

An individual is a single solution. An individual groups together two forms of solutions, as given below:

1. The chromosome, which is the raw "genetic" information (genotype) that the GA deals with.

2. The phenotype, which is the expression of the chromosome in the terms of the model.

A chromosome is subdivided into genes. A gene is the GA's representation of a single factor of a control factor. Each factor in the solution set corresponds to a gene in the chromosome. Figure 9-9 shows the representation of a genotype.


A chromosome should in some way contain information about the solution that it represents. The morphogenesis function associates each genotype with its phenotype. It simply means that each chromosome must define one unique solution, but it does not mean that each solution is encoded by exactly one chromosome. Indeed, the morphogenesis function is not necessarily bijective, and that is even sometimes impossible (especially with binary representation). Nevertheless, the morphogenesis function should at least be surjective: all the candidate solutions of the problem must correspond to at least one possible chromosome, to be sure that the whole search space can be explored.

Figure 9-9: Representation of genotype and phenotype. Genotype (chromosome): Gene 1 | Gene 2 | Gene 3 | … | Gene N. Phenotype (solution set): Factor 1 | Factor 2 | Factor 3 | … | Factor N.

Figure 9-10: Representation of a chromosome: 101010111010110

When the

morphogenesis function that associates each chromosome to one solution is not injective, i.e., different chromosomes can encode the same solution, the representation is said to be degenerated. A slight degeneracy is not so worrying, even if the space where the algorithm is looking for the optimal solution is inevitably enlarged. But too great a degeneracy could be a more serious problem. It can badly affect the behaviour of the GA, mostly because if several chromosomes can represent the same phenotype, the meaning of each gene will obviously not correspond to a specific characteristic of the solution. It may add some kind of confusion in the search. Chromosomes encoded by bit strings are

given in Figure 9-10.

Genotype (chromosome): Gene 1, Gene 2, Gene 3, …, Gene N (see Figure 9-9).


9.6.2 Genes

Genes are the basic "instructions" for building a GA. A chromosome is a sequence of genes. Genes may describe a possible solution to a problem, without actually being the solution. A gene is a bit string of arbitrary length. The bit string is a binary representation of a number of intervals from a lower bound. A gene is the GA's representation of a single factor value for a control factor, where every control factor must have an upper bound and a lower bound. This range can be divided into the number of intervals that can be expressed by the gene's bit string. A bit string of length n can represent (2^n - 1) intervals. The size of each interval would be (range)/(2^n - 1).

The structure of each gene is defined in a record of phenotyping parameters. The phenotype parameters are instructions for mapping between genotype and phenotype. This can also be described as encoding a solution set into a chromosome and decoding a chromosome to a solution set. The mapping between genotype and phenotype is necessary to convert solution sets from the model into a form that the GA can work with, and for converting new individuals from the GA into a form that the model can evaluate. In a chromosome, the genes are represented as shown in Figure 9-11.

9.6.3 Fitness

The fitness of an individual in a GA is the value of an objective function for its phenotype. For calculating fitness, the chromosome has to be first decoded and the objective function has to be evaluated. The fitness not only indicates how good the solution is, but also corresponds to how close the chromosome is to the optimal one.

Figure 9-11 Representation of a gene: 1 0 1 0 | 1 1 1 0 | 1 1 1 1 | 0 1 0 1 (Gene 1, Gene 2, Gene 3, Gene 4).

In the case of multicriterion optimization, the fitness function is definitely more difficult to determine. In multicriterion optimization problems, there is often a dilemma as to how to determine if one solution is better than another. What should be done if a solution is better for one criterion but worse for another? But here, the trouble comes more from the definition of a "better" solution rather than from how to implement a GA to resolve it. If sometimes a fitness function obtained by a simple combination of the different criteria can give good results, it supposes that the criteria can be combined in a consistent way. But, for more advanced problems, it may be useful to consider something like Pareto optimality or other ideas from multicriterion optimization theory.

9.6.4 Populations

A population is a collection of individuals. A population consists of a number of individuals being tested, the phenotype parameters defining the individuals and some information about the search space. The two important aspects of population used in GAs are:

1. The initial population generation.

2. The population size.

For each and every problem, the population size will depend on the complexity of the problem. It is often a random initialization of population. In the case of a binary coded chromosome this means that each bit is initialized to a random 0 or 1. However, there may be instances where the initialization of population is carried out with some known good solutions.

Ideally, the first population should have a gene pool as large as possible in order to be able to explore the whole search space. All the different possible alleles of each gene should be present in the population. To achieve this, the initial population is, in most of the cases, chosen randomly. Nevertheless, sometimes a kind of heuristic can be used to seed the initial population. Thus, the mean fitness of the population is already high and it may help the GA to find good solutions faster. But for doing this one should be sure that the gene pool is still large enough. Otherwise, if the population badly lacks diversity, the algorithm will just explore a small part of the search space and never find global optimal solutions.

The size of the population raises a few problems too. The larger the population is, the easier it is to explore the search space. However, it has been established that the time required by a GA to converge is O(n log n) function evaluations, where n is the population size. We say that the population has converged when all the individuals are very much alike and further improvement may only be possible by mutation. Goldberg has also shown that GA efficiency in reaching a global optimum instead of local ones is largely determined by the size of the population. To sum up, a large population is quite useful. However, it requires much more computational cost, memory and time. Practically, a population size of around 100 individuals is quite frequent, but anyway this size can be changed according to the time and the memory available on the machine compared to the quality of the result to be reached.


Figure 9-12 Population:
Chromosome 1: 1 1 1 0 0 0 1 0
Chromosome 2: 0 1 1 1 1 0 1 1
Chromosome 3: 1 0 1 0 1 0 1 0
Chromosome 4: 1 1 0 0 1 1 0 0

A population, being a combination of various chromosomes, is represented as in Figure 9-12. Thus the population in Figure 9-12 consists of four chromosomes.
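The two initialization strategies discussed above — purely random generation, optionally seeded with known good solutions — can be sketched as follows. The function name and parameters are illustrative assumptions, not from the text.

```python
import random

def init_population(pop_size, chrom_length, seeds=None):
    """Create a population of binary chromosomes: start from any known
    good solutions (seeds), then fill the rest with random bit strings."""
    population = [list(s) for s in (seeds or [])]
    while len(population) < pop_size:
        population.append([random.randint(0, 1) for _ in range(chrom_length)])
    return population

random.seed(0)
for chrom in init_population(4, 8):
    print(chrom)
```

Keeping most of the population random even when seeding preserves the gene-pool diversity that the text warns about losing.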

9.7 Simple GA

GA handles a population of possible solutions. Each solution is represented through a chromosome, which is just an abstract representation. Coding all the possible solutions into a chromosome is the first part, but certainly not the most straightforward one, of a GA. A set of reproduction operators has to be determined, too. Reproduction operators are applied directly on the chromosomes, and are used to perform mutations and recombination over solutions of the problem. Appropriate representation and reproduction operators are the determining factors, as the behaviour of the GA is extremely dependent on them. Frequently, it can be extremely difficult to find a representation that respects the structure of the search space and reproduction operators that are coherent and relevant according to the properties of the problems.

The simple form of GA is given by the following.

1. Start with a randomly generated population.

2. Calculate the fitness of each chromosome in the population.

3. Repeat the following steps until n offspring have been created:

* Select a pair of parent chromosomes from the current population.

* With probability Pc, crossover the pair at a randomly chosen point to form two offspring.

* Mutate the two offspring at each locus with probability Pm.

4. Replace the current population with the new population.

5. Go to step 2.

Now we discuss each iteration of this process.

Selection is supposed to be able to compare each individual in the population. Selection is done by using a fitness function. Each chromosome has an associated value corresponding to the fitness of the solution it represents. The fitness should correspond to an evaluation of how good the candidate solution is.


The optimal solution is the one which maximizes the fitness function. GAs deal with problems that maximize the fitness function. But, if the problem consists of minimizing a cost function, the adaptation is quite easy. Either the cost function can be transformed into a fitness function, for example by inverting it; or the selection can be adapted in such a way that it considers individuals with low evaluation functions as better. Once the reproduction and the fitness function have been properly defined, a GA is evolved according to the same basic structure. It starts by generating an initial population of chromosomes. This first population must offer a wide diversity of genetic materials. The gene pool should be as large as possible so that any solution of the search space can be engendered. Generally, the initial population is generated randomly. Then, the GA loops over an iteration process to make the population evolve. Each iteration consists of the following steps:

1. Selection: The first step consists in selecting individuals for reproduction. This selection is done randomly with a probability depending on the relative fitness of the individuals so that the best ones are more often chosen for reproduction than the poor ones.

2. Reproduction: In the second step, offspring are bred by the selected individuals. For generating new chromosomes, the algorithm can use both recombination and mutation.

3. Evaluation: Then the fitness of the new chromosomes is evaluated.

4. Replacement: During the last step, individuals from the old population are

killed and replaced by the new ones.

The algorithm is stopped when the population converges toward the optimal

solution.

BEGIN /* genetic algorithm */
  Generate initial population;
  Compute fitness of each individual;
  WHILE NOT finished DO LOOP
  BEGIN
    Select individuals from old generation for mating;
    Create offspring by applying recombination and/or mutation to the selected individuals;
    Compute fitness of the new individuals;
    Kill old individuals to make room for new chromosomes and insert offspring in the new generation;
    IF population has converged
    THEN finished := TRUE;
  END
END
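The loop described above can be sketched as a small runnable program. This is an illustrative implementation only: it assumes binary chromosomes, fitness-proportionate selection, single-point crossover, and a counting-ones objective as a stand-in fitness function; none of these specifics are mandated by the text.

```python
import random

def run_ga(pop_size=20, length=8, pc=0.7, pm=0.01, generations=50):
    """Minimal GA loop: select, recombine, mutate, replace."""
    fitness = lambda c: sum(c)                        # stand-in objective
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            # fitness-proportionate selection (weights offset to stay positive)
            p1, p2 = random.choices(pop, weights=[fitness(c) + 1 for c in pop], k=2)
            c1, c2 = p1[:], p2[:]
            if random.random() < pc:                  # single-point crossover
                point = random.randint(1, length - 1)
                c1 = p1[:point] + p2[point:]
                c2 = p2[:point] + p1[point:]
            for child in (c1, c2):                    # mutate each locus
                for i in range(length):
                    if random.random() < pm:
                        child[i] = 1 - child[i]
            new_pop += [c1, c2]
        pop = new_pop[:pop_size]                      # replace old generation
    return max(pop, key=fitness)

random.seed(1)
print(run_ga())
```

Generational replacement here kills the whole old population; elitism (copying the best individual forward), discussed later in the chapter, is a common refinement.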

Genetic algorithms are not too hard to program or understand because they are biologically based. An example of a flowchart of a GA is shown in Figure 9-13.

Figure 9-13 Flowchart for genetic algorithm.


9.8 General Genetic Algorithm

The general GA is as follows:

Step 1: Create a random initial state: An initial population is created from a random selection of solutions (which are analogous to chromosomes). This is unlike the situation for symbolic AI systems, where the initial state in a problem is already given.

Step 2: Evaluate fitness: A value for fitness is assigned to each solution (chromosome) depending on how close it actually is to solving the problem (thus arriving at the answer of the desired problem). (These "solutions" are not to be confused with "answers" to the problem; think of them as possible characteristics that the system would employ in order to reach the answer.)

Step 3: Reproduce (and children mutate): Those chromosomes with a higher fitness value are more likely to reproduce offspring (which can mutate after reproduction). The offspring is a product of the father and mother, whose composition consists of a combination of genes from the two (this process is known as "crossing over").

Step 4: Next generation: If the new generation contains a solution that produces an output that is close enough or equal to the desired answer, then the problem has been solved. If this is not the case, then the new generation will go through the same process as their parents did. This will continue until a solution is reached.

Table 9-2: Fitness values for corresponding chromosomes (Example 9.1)

Chromosome    Fitness

A : 00000110 2

B : 11101110 6

C : 00100000 1

D : 00110100 3

Table 9-3: Fitness values for corresponding chromosomes

Chromosome    Fitness

A : 01101110 5

B : 00100000 1

C : 10110000 3

D : 01101110 5

## Page 203

202SOFT COMPUTING TECHNIQUES

Figure 9-14 Roulette wheel sampling for fitness-proportionate selection.

Example 9.1: Consider 8-bit chromosomes with the following properties:

1. Fitness function f(x) = number of 1 bits in chromosome;

2. Population size N = 4;

3. Crossover probability Pc = 0.7;

4. Mutation probability Pm = 0.001;

Average fitness of population = 12/4 = 3.0.

1. If B and C are selected, crossover is not performed.

2. If B is mutated, then
B : 11101110 → B' : 01101110

3. If B and D are selected, crossover is performed.
B : 11101110, D : 00110100 → E : 10110100, F : 01101110

4. If E is mutated, then
E : 10110100 → E' : 10110000

The best-fit string from the previous population is lost, but the average fitness of the population is as given below:

Average fitness of population = 14/4 = 3.5

Tables 9-2 and 9-3 show the fitness values for the corresponding chromosomes, and Figure 9-14 shows the Roulette wheel selection for the fitness-proportionate selection.


9.9 Operators in Genetic Algorithm

The basic operators that are to be discussed in this section include: encoding,

selection, recombination and mutation operators. The operators with their various

types are explained with necessary examples.

9.9.1 Encoding

Encoding is a process of representing individual genes. The process can be performed using bits, numbers, trees, arrays, lists or any other objects. The encoding depends mainly on the problem being solved. For example, one can directly encode real or integer numbers.

9.9.1.1 Binary Encoding

The most common way of encoding is a binary string, which would be represented as in Figure 9-15.

Each chromosome encodes a binary (bit) string. Each bit in the string can represent some characteristic of the solution. Every bit string therefore is a solution, but not necessarily the best solution. Another possibility is that the whole string can represent a number. The way bit strings can code differs from problem to problem.

Binary encoding gives many possible chromosomes with a smaller number of alleles. On the other hand, this encoding is not natural for many problems and sometimes corrections must be made after genetic operation is completed. Binary coded strings with 1s and 0s are mostly used. The length of the string depends on the accuracy. In such coding

1. Integers are represented exactly.

2. A finite number of real numbers can be represented.

3. The number of real numbers represented increases with string length.
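Point 3 can be checked numerically: with n bits over a fixed range, the quantization interval (range)/(2^n - 1) shrinks as the string grows. The helper below is an illustrative sketch, not from the text.

```python
def interval_size(value_range, n_bits):
    """Size of one quantization interval for an n-bit binary code."""
    return value_range / (2 ** n_bits - 1)

# Resolution over the range [0, 10] for increasing string lengths
for n in (4, 8, 16):
    print(f"n={n:2d}  interval size={interval_size(10.0, n):.6f}")
```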

Figure 9-15 Binary encoding:
Chromosome 1: 1 1 0 1 0 0 0 1 1 0 1 0
Chromosome 2: 0 1 1 1 1 1 1 1 1 1 0 0

9.9.1.2 Octal Encoding

This encoding uses strings made up of octal numbers (0-7) (see Figure 9-16).

Figure 9-16 Octal encoding:
Chromosome 1: 03467216
Chromosome 2: 9723314


Figure 9-17 Hexadecimal encoding:
Chromosome 1: 9CE7
Chromosome 2: 3DBA

Figure 9-18 Permutation encoding:
Chromosome A: 1 5 3 2 6 4 7 9 8
Chromosome B: 8 5 6 7 2 3 1 4 9

9.9.1.3 Hexadecimal Encoding

This encoding uses strings made up of hexadecimal numbers (0-9, A-F) (see Figure 9-17).

9.9.1.4 Permutation Encoding (Real Number Coding)

Every chromosome is a string of numbers, represented in a sequence. Sometimes corrections have to be done after genetic operation is complete. In permutation encoding, every chromosome is a string of integer/real values, which represents a number in a sequence.

Permutation encoding (Figure 9-18) is only useful for ordering problems. Even for this problem, some types of crossover and mutation corrections must be made to leave the chromosome consistent (i.e., have a real sequence in it).

9.9.1.5 Value Encoding

Every chromosome is a string of values and the values can be anything connected to the problem. This encoding produces best results for some special problems. On the other hand, it is often necessary to develop new genetic operators specific to the problem. Direct value encoding can be used in problems where some complicated values, such as real numbers, are used. Use of binary encoding for this type of problem would be very difficult.

In value encoding (Figure 9-19), every chromosome is a string of some values. Values can be anything connected to the problem, from numbers, real numbers or characters to some complicated objects. Value encoding is very good for some special problems. On the other hand, for this encoding it is often necessary to develop some new crossover and mutation operators specific for the problem.

Figure 9-19 Value encoding:
Chromosome A: 1.2324 5.3243 0.4556 2.3293 2.4545
Chromosome B: ABDJEIFJDHDIERJFDLDFLFEGT
Chromosome C: (back), (back), (right), (forward), (left)


9.9.1.6 Tree Encoding

This encoding is mainly used for evolving program expressions for genetic

programming. Every chromosome is a tree of some objects such as functions and

commands of a programming language.

9.9.2 Selection

Selection is the process of choosing two parents from the population for crossing.

After deciding on an encoding, the next step is to decide how to perform selection, i.e., how to choose individuals in the population that will create offspring for the next generation and how many offspring each will create. The purpose of selection is to emphasize fitter individuals in the population in hopes that their offspring have higher fitness. Chromosomes are selected from the initial population to be parents for reproduction. The problem is how to select these chromosomes. According to Darwin's theory of evolution, the best ones survive to create new offspring. Figure 9-20 shows the basic selection process.

Selection is a method that randomly picks chromosomes out of the population according to their evaluation function. The higher the fitness function, the better the chance that an individual will be selected. The selection pressure is defined as the degree to which the better individuals are favoured. The higher the selection pressure, the more the better individuals are favoured. This selection pressure drives the GA to improve the population fitness over successive generations.

The convergence rate of a GA is largely determined by the magnitude of the selection pressure, with higher selection pressures resulting in higher convergence rates. GAs should be able to identify optimal or nearly optimal solutions under a wide range of selection pressures. However, if the selection pressure is too low, the convergence rate will be slow, and the GA will take unnecessarily long to find the optimal solution. If the selection pressure is too high, there is an increased chance of the GA prematurely converging to an incorrect (sub-optimal) solution. In addition to providing selection pressure, selection schemes should also preserve population diversity, as this helps to avoid premature convergence.

Typically we can distinguish two types of selection scheme: proportionate-based selection and ordinal-based selection. Proportionate-based selection picks out individuals based upon their fitness values relative to the fitness of the other individuals in the population. Ordinal-based selection schemes select individuals not upon their raw fitness, but upon their rank within the population. This requires that the selection pressure is independent of the fitness distribution of the population, and is solely based upon the relative ordering (ranking) of the population.


Figure 9-20 Selection.

It is also possible to use a scaling function to redistribute the fitness range of the population in order to adapt the selection pressure. For example, if all the solutions have their fitnesses in the range [999, 1000], the probability of selecting a better individual than any other using a proportionate-based method will not be significant. If the fitness of every individual is brought to the range [0, 1] equitably, the probability of selecting a good individual instead of a bad one will be significant.

Selection has to be balanced with variation from crossover and mutation. Too strong selection means sub-optimal highly fit individuals will take over the population, reducing the diversity needed for change and progress; too weak selection will result in too slow evolution. The various selection methods are discussed in the following subsections.

9.9.2.1 Roulette Wheel Selection

Roulette selection is one of the traditional GA selection techniques. The commonly used reproduction operator is the proportionate reproduction operator, where a string is selected from the mating pool with a probability proportional to the fitness. The principle of Roulette selection is a linear search through a Roulette wheel with the slots in the wheel weighted in proportion to the individuals' fitness values. A target value is set, which is a random proportion of the sum of the fitnesses in the population. The population is stepped through until the target value is reached. This is only a moderately strong selection technique, since fit individuals are not guaranteed to be selected, but somewhat have a greater chance. A fit individual will contribute more to the target value, but if it does not exceed it, the next chromosome in line has a chance, and it may be weak. It is essential that the population not be sorted by fitness, since this would dramatically bias the selection.


The Roulette process can also be explained as follows: The expected value of an individual is the individual's fitness divided by the average fitness of the population. Each individual is assigned a slice of the Roulette wheel, the size of the slice being proportional to the individual's fitness. The wheel is spun N times, where N is the number of individuals in the population. On each spin, the individual under the wheel's marker is selected to be in the pool of parents for the next generation. This method is implemented as follows:

1. Sum the total expected value of the individuals in the population. Let it be T.

2. Repeat N times:

i. Choose a random number "r" between 0 and T.

ii. Loop through the individuals in the population, summing the expected values, until the sum is greater than or equal to "r." The individual whose expected value puts the sum over this limit is the one selected.

Roulette wheel selection is easier to implement but is noisy. The rate of evolution depends on the variance of fitnesses in the population.
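The two-step procedure above can be sketched as follows. This is an illustrative implementation; the string labels and fitness values are borrowed from Example 9.1 rather than mandated by the text.

```python
import random

def roulette_select(population, fitnesses):
    """Spin once: walk the wheel until the running fitness sum
    reaches a random target in [0, total fitness]."""
    total = sum(fitnesses)
    target = random.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= target:
            return individual
    return population[-1]          # guard against floating-point round-off

random.seed(2)
pop, fits = ["A", "B", "C", "D"], [2, 6, 1, 3]
picks = [roulette_select(pop, fits) for _ in range(1000)]
print({p: picks.count(p) for p in pop})   # roughly proportional to fitness
```

Over many spins the counts approach the fitness proportions (B, with half the total fitness, is picked about half the time), but any single spin can still select a weak individual — the "moderately strong" pressure described above.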

9.9.2.2 Random Selection

This technique randomly selects a parent from the population. In terms of disruption of genetic codes, random selection is a little more disruptive, on average, than Roulette wheel selection.

9.9.2.3 Rank Selection

The Roulette wheel will have a problem when the fitness values differ very much. If the best chromosome's fitness is 90%, its circumference occupies 90% of the Roulette wheel, and then other chromosomes have too few chances to be selected. Rank selection ranks the population and every chromosome receives fitness from the ranking. The worst has fitness 1 and the best has fitness N. It results in slow convergence but prevents too quick convergence. It also keeps up selection pressure when the fitness variance is low. It preserves diversity and hence leads to a successful search. In effect, potential parents are selected and a tournament is held to decide which of the individuals will be the parent. There are many ways this can be achieved and two suggestions are:

1. Select a pair of individuals at random. Generate a random number R between 0 and 1. If R < r, use the first individual as the parent; if R ≥ r, then use the second individual as the parent. This is repeated to select the second parent. The value of r is a parameter to this method.

2. Select two individuals at random. The individual with the highest evaluation becomes the parent. Repeat to find a second parent.


9.9.2.4 Tournament Selection

An ideal selection strategy should be such that it is able to adjust its selective pressure and population diversity so as to fine-tune GA search performance. Unlike the Roulette wheel selection, the tournament selection strategy provides selective pressure by holding a tournament competition among Nu individuals. The best individual from the tournament is the one with the highest fitness, which is the winner of the Nu tournament competition. The winner is then inserted into the mating pool. The tournament competition is repeated until the mating pool for generating new offspring is filled. The mating pool comprising the tournament winners has higher average population fitness. The fitness difference provides the selection pressure, which drives the GA to improve the fitness of the succeeding genes. This method is more efficient and leads to an optimal solution.
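A sketch of tournament selection with tournament size Nu follows; the function name is illustrative and the fitness values are again borrowed from Example 9.1.

```python
import random

def tournament_select(population, fitnesses, nu=3):
    """Hold a tournament among nu random individuals; the fittest
    contestant wins a place in the mating pool."""
    contestants = random.sample(range(len(population)), nu)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return population[winner]

random.seed(3)
pop, fits = ["A", "B", "C", "D"], [2, 6, 1, 3]
mating_pool = [tournament_select(pop, fits) for _ in range(4)]
print(mating_pool)
```

Raising nu raises the selection pressure: with nu = 3 here, the weakest individual can never win, since every 3-way tournament it enters contains someone fitter.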

9.9.2.5 Boltzmann Selection

Simulated annealing (SA) is a method of function minimization or maximization. This method simulates the process of slow cooling of molten metal to achieve the minimum function value in a minimization problem. Controlling a temperature-like parameter introduced with the concept of the Boltzmann probability distribution simulates the cooling phenomenon.

In Boltzmann selection, a continuously varying temperature controls the rate of selection according to a preset schedule. The temperature starts out high, which means that the selection pressure is low. The temperature is gradually lowered, which gradually increases the selection pressure, thereby allowing the GA to narrow in more closely to the best part of the search space while maintaining the appropriate degree of diversity.

A logarithmically decreasing temperature is found useful for convergence without getting stuck in a local minimum. However, it takes time to cool down the system to the equilibrium state.

Let fmax be the fitness of the currently available best string. If the next string has fitness f(Xi) such that f(Xi) > fmax, then the new string is selected. Otherwise it is selected with Boltzmann probability

P = exp[-{fmax - f(Xi)}/T] ……………(17)

where T = T0(1 - α)^k and k = (1 + 100*g/G); g is the current generation number and G the maximum value of g. The value of α can be chosen from the range [0, 1] and that of T0 from the range [5, 100]. The final state is reached when


computation approaches a zero value of T, i.e., the global solution is achieved at this point.

The probability that the best string is selected and introduced into the mating pool is very high. However, elitism can be used to eliminate the chance of any undesired loss of information during the mutation stage. Moreover, the execution time is less.
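The acceptance rule above can be explored numerically. In this sketch, the defaults T0 = 50 and α = 0.7 are arbitrary picks from the stated ranges [5, 100] and [0, 1], not values given by the text.

```python
import math

def boltzmann_probability(f_max, f_xi, g, G, T0=50.0, alpha=0.7):
    """P = exp[-(fmax - f(Xi))/T] with T = T0*(1 - alpha)**k and
    k = 1 + 100*g/G; a string better than fmax is always selected."""
    if f_xi > f_max:
        return 1.0
    k = 1 + 100 * g / G
    T = T0 * (1 - alpha) ** k
    return math.exp(-(f_max - f_xi) / T)

# Selection pressure rises as the temperature drops across generations
print(boltzmann_probability(6, 4, g=0, G=100))    # early: fairly likely
print(boltzmann_probability(6, 4, g=50, G=100))   # late: almost never
```

Early in the run the temperature is high, so even a string two fitness units below the best is accepted most of the time; late in the run T has collapsed and the same string is essentially never accepted.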

Figure 9-21 Stochastic universal sampling.

Elitism

The first best chromosome or the few best chromosomes are copied to the new

population. The rest is done in a classical way. Such individuals can be lost if they

are not selected to reproduce or if crossover or mutation destroys them. This

significantly improves the GA's performance.

9.9.2.6 Stochastic Universal Sampling

Stochastic universal sampling provides zero bias and minimum spread. The individuals are mapped to contiguous segments of a line, such that each individual's segment is equal in size to its fitness, exactly as in Roulette wheel selection. Here equally spaced pointers are placed over the line, as many as there are individuals to be selected. Consider NPointer the number of individuals to be selected; then the distance between the pointers is 1/NPointer and the position of the first pointer is given by a randomly generated number in the range [0, 1/NPointer]. For 6 individuals to be selected, the distance between the pointers is 1/6 = 0.167. Figure 9-21 shows the selection for the above example.

Sample of 1 random number in the range [0, 0.167]: 0.1.

After selection the mating population consists of the individuals

1, 2, 3, 4, 6, 8

Stochastic universal sampling ensures selection of offspring that is closer to what is deserved as compared to Roulette wheel selection.
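The pointer walk can be sketched as follows. This illustrative version works on raw (unnormalized) fitnesses, so with the Example 9.1 fitnesses summing to 12, selecting 6 individuals spaces the pointers 2 apart.

```python
import random

def sus_select(fitnesses, n_select):
    """Stochastic universal sampling: one spin places n_select equally
    spaced pointers over the fitness line; return the chosen indices."""
    total = float(sum(fitnesses))
    spacing = total / n_select
    start = random.uniform(0, spacing)
    selected, running, i = [], 0.0, 0
    for k in range(n_select):
        pointer = start + k * spacing
        # advance to the individual whose segment contains this pointer
        while running + fitnesses[i] < pointer:
            running += fitnesses[i]
            i += 1
        selected.append(i)
    return selected

random.seed(4)
print(sus_select([2, 6, 1, 3], n_select=6))
```

Because all pointers come from a single spin, an individual holding a fraction p of the total fitness receives either floor(p*N) or ceil(p*N) of the N selections — the "minimum spread" property, which a sequence of independent roulette spins does not guarantee.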


9.9.3 Crossover (Recombination)

Crossover is the process of taking two parent solutions and producing from them a child. After the selection (reproduction) process, the population is enriched with better individuals. Reproduction makes clones of good strings but does not create new ones. The crossover operator is applied to the mating pool with the hope that it creates a better offspring.

Crossover is a recombination operator that proceeds in three steps:

1. The reproduction operator selects at random a pair of two individual strings for the mating.

2. A cross site is selected at random along the string length.

3. Finally, the position values are swapped between the two strings following the cross site.

The simplest way to do this is to choose some crossover point randomly, copy everything before this point from the first parent, and then copy everything after the crossover point from the other parent. The various crossover techniques are discussed in the following subsections.

Figure 9-22 Single-point crossover:
Parent 1: 1 0 1 1 0 0 1 0
Parent 2: 1 0 1 0 1 1 1 1
Child 1:  1 0 1 1 0 1 1 1
Child 2:  1 0 1 0 1 0 1 0

9.9.3.1 Single-Point Crossover

The traditional genetic algorithm uses single-point crossover, where the two mating chromosomes are cut once at corresponding points and the sections after the cuts are exchanged. Here, a cross site or crossover point is selected randomly along the length of the mated strings and the bits next to the cross sites are exchanged. If an appropriate site is chosen, better children can be obtained by combining good parents; otherwise it severely hampers string quality.


Figure 9-22 illustrates single-point crossover, and it can be observed that the bits next to the crossover point are exchanged to produce children. The crossover point can be chosen randomly.
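A sketch of single-point crossover with crossover probability Pc; the parents are the ones from Figure 9-22, and the function name is illustrative.

```python
import random

def single_point_crossover(parent1, parent2, pc=0.7):
    """With probability pc, cut both parents at a random cross site and
    swap the tails; otherwise return copies of the parents."""
    if random.random() >= pc:
        return parent1[:], parent2[:]
    point = random.randint(1, len(parent1) - 1)   # cross site
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

random.seed(5)
p1 = [1, 0, 1, 1, 0, 0, 1, 0]
p2 = [1, 0, 1, 0, 1, 1, 1, 1]
c1, c2 = single_point_crossover(p1, p2, pc=1.0)
print(c1)
print(c2)
```

Whatever cross site is drawn, each bit position of the two children together holds exactly the two parental bits at that position — crossover recombines but never invents alleles.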

9.9.3.2 Two-Point Crossover

Apart from single-point crossover, many different crossover algorithms have been devised, often involving more than one cut point. It should be noted that adding further crossover points reduces the performance of the GA. The problem with adding additional crossover points is that building blocks are more likely to be disrupted. However, an advantage of having more crossover points is that the problem space may be searched more thoroughly.

In two-point crossover, two crossover points are chosen and the contents between these points are exchanged between the two mated parents.

In Figure 9-23 the dotted lines indicate the crossover points. Thus the contents between these points are exchanged between the parents to produce new children for mating in the next generation.

Figure 9-23 Two-point crossover:
Parent 1: 1 1 0 1 1 0 1 0
Parent 2: 0 1 1 0 1 1 0 0
Child 1:  1 1 1 0 1 0 1 0
Child 2:  0 1 0 1 1 1 0 0

Originally, GAs used one-point crossover, which cuts two chromosomes at one point and splices the two halves to create new ones. But with this one-point crossover, the head and the tail of one chromosome cannot be passed together to the offspring. If both the head and the tail of a chromosome contain good genetic information, none of the offspring obtained directly with one-point crossover will share the two good features. Using a two-point crossover one can avoid this drawback, and so it is generally considered better than one-point crossover. In fact, this problem can be generalized to each gene position in a chromosome. Genes that are close on a chromosome have more chance to be passed together to the offspring


obtained through N-point crossover. This leads to an unwanted correlation between genes next to each other. Consequently, the efficiency of an N-point crossover will depend on the position of the genes within the chromosome. In a genetic representation, genes that encode dependent characteristics of the solution should be close together. To avoid all the problems of gene locus, a good choice is to use uniform crossover as the recombination operator.
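Two-point crossover is the same idea with the middle segment swapped; a minimal sketch (names are illustrative), which with cuts after the second and fourth bits reproduces Figure 9-23:

```python
def two_point_crossover(parent1, parent2, point1, point2):
    """Exchange the segment lying between the two cross sites."""
    child1 = parent1[:point1] + parent2[point1:point2] + parent1[point2:]
    child2 = parent2[:point1] + parent1[point1:point2] + parent2[point2:]
    return child1, child2

# Figure 9-23, with cross sites after the second and fourth bits:
c1, c2 = two_point_crossover([1,1,0,1,1,0,1,0], [0,1,1,0,1,1,0,0], 2, 4)
# c1 == [1,1,1,0,1,0,1,0], c2 == [0,1,0,1,1,1,0,0]
```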

9.9.3.3 Multipoint Crossover (N-Point Crossover)

There are two ways in this crossover: an even number of cross sites and an odd number of cross sites. In the case of an even number of cross sites, the cross sites are selected randomly around a circle and the information is exchanged. In the case of an odd number of cross sites, a different cross point is always assumed at the string beginning.

9.9.3.4 Uniform Crossover

Uniform crossover is quite different from the N-point crossover. Each gene in the offspring is created by copying the corresponding gene from one or the other parent, chosen according to a randomly generated binary crossover mask of the same length as the chromosomes. Where there is a 1 in the crossover mask, the gene is copied from the first parent, and where there is a 0 in the mask the gene is copied from the second parent. A new crossover mask is randomly generated for each pair of parents. Offspring, therefore, contain a mixture of genes from each parent. The number of effective crossing points is not fixed, but will average L/2 (where L is the chromosome length).

In Figure 9-24, new children are produced using the uniform crossover approach. It can be noticed that while producing child 1, when there is a 1 in the mask, the gene is copied from parent 1, else it is copied from parent 2. On producing child 2, when there is a 1 in the mask, the gene is copied from parent 2, and when there is a 0 in the mask, the gene is copied from parent 1.
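This mask-driven copying can be sketched as follows (function name and representation are illustrative, not from the text):

```python
import random

def uniform_crossover(parent1, parent2, mask=None):
    """Mask bit 1: child 1 copies from parent 1 (child 2 from parent 2);
    mask bit 0: the roles are reversed. A fresh mask is drawn per pair."""
    if mask is None:
        mask = [random.randint(0, 1) for _ in parent1]
    child1 = [a if m else b for a, b, m in zip(parent1, parent2, mask)]
    child2 = [b if m else a for a, b, m in zip(parent1, parent2, mask)]
    return child1, child2
```

With the parents and mask of Figure 9-24 it reproduces both children shown there.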

9.9.3.5 Three Parent Crossover

In this crossover technique, three parents are randomly chosen. Each bit of the first

parent is compared with the bit of the second parent. If both are the same, the bit is

taken for the offspring; otherwise the bit from the third parent is taken for the

offspring. This concept is illustrated in Figure 9-25.


Parent 1 1 0 1 1 0 0 1 1

Parent 2 0 0 0 1 1 0 1 0

Mask 1 1 1 0 1 0 1 1 0

Child 1 1 0 0 1 1 0 1 0

Child 2 0 0 1 1 0 0 1 1

Figure 9-24 Uniform crossover

Parent 1 11010001

Parent 2 01101001

Parent 3 01101100

Child 01101001

Figure 9-25 Three-parent crossover
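The bit-wise rule above can be sketched directly (illustrative name); on the parents of Figure 9-25 it reproduces the child shown there:

```python
def three_parent_crossover(parent1, parent2, parent3):
    """Take the bit from parent 1 where it agrees with parent 2,
    otherwise take the bit from parent 3."""
    return [a if a == b else c for a, b, c in zip(parent1, parent2, parent3)]

# Figure 9-25: parents 11010001, 01101001, 01101100 -> child 01101001
child = three_parent_crossover([1,1,0,1,0,0,0,1],
                               [0,1,1,0,1,0,0,1],
                               [0,1,1,0,1,1,0,0])
```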

9.9.3.6 Crossover with Reduced Surrogate

The reduced surrogate operator constrains crossover to always produce new individuals wherever possible. This is implemented by restricting the location of crossover points such that crossover points only occur where gene values differ.

9.9.3.7 Shuffle Crossover

Shuffle crossover is related to uniform crossover. A single crossover position (as in single-point crossover) is selected. But before the variables are exchanged, they are randomly shuffled in both parents. After recombination, the variables in the offspring are unshuffled. This removes positional bias, as the variables are randomly reassigned each time crossover is performed.

9.9.3.8 Precedence Preservative Crossover

Precedence preservative crossover (PPX) was independently developed for vehicle routing problems by Blanton and Wainwright (1993) and for scheduling problems by Bierwirth et al. (1996). The operator passes on precedence relations of operations given in two parental permutations to one offspring at the same rate, while no new precedence relations are introduced. PPX is illustrated below for a problem consisting of six operations A-F. The operator works as follows:


1. A vector of length σ = Σ m_i, where m_i is the number of operations involved in the problem, is randomly filled with elements of the set {1, 2}.

2. This vector defines the order in which the operations are successively drawn from parent 1 and parent 2.

3. We can also consider the parent and offspring permutations as lists, for which the operations "append" and "delete" are defined.

4. We start by initializing an empty offspring.

5. The leftmost operation in one of the two parents is selected in accordance with the order of parents given in the vector.

6. After an operation is selected, it is deleted from both parents.

7. Finally, the selected operation is appended to the offspring.

8. Steps 5-7 are repeated until both parents are empty and the offspring contains all operations involved.

Note that PPX does not work in a uniform crossover manner due to the "deletion-append" scheme used. An example is shown in Figure 9-26.

9.9.3.9 Ordered Crossover

Ordered two -point crossover is used when the problem is order based, for example

in U shaped assembly line balancing, etc. Given two parent chromosomes, two

random crossover points are selected partitioning

Parent permutation 1     A B C D E F
Parent permutation 2     C A B F D E
Select parent no. (1/2)  1 2 1 1 2 2
Offspring permutation    A C B D F E

Figure 9-26 Precedence preservative crossover (PPX)
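The deletion-append loop can be sketched as follows (illustrative names); run on the Figure 9-26 data it yields the offspring A C B D F E:

```python
def ppx(parent1, parent2, vector):
    """Repeatedly take the leftmost remaining operation from the parent
    named in the vector, append it to the offspring, and delete it from
    both parents (the "deletion-append" scheme)."""
    p1, p2 = list(parent1), list(parent2)
    offspring = []
    for choice in vector:
        op = p1[0] if choice == 1 else p2[0]
        offspring.append(op)
        p1.remove(op)
        p2.remove(op)
    return offspring

# The example of Figure 9-26:
child = ppx("ABCDEF", "CABFDE", [1, 2, 1, 1, 2, 2])
# child == ['A', 'C', 'B', 'D', 'F', 'E']
```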

Parent 1: 4 2 | 1 3 | 6 5    Child 1: 4 2 | 3 1 | 6 5
Parent 2: 2 3 | 1 4 | 5 6    Child 2: 2 3 | 4 1 | 5 6

Figure 9-27 Ordered crossover

them into left, middle and right portions. The ordered two-point crossover behaves in the following way: child 1 inherits its left and right sections from parent 1, and its middle section is determined by the genes in the middle section of parent 1, in the order in which these values appear in parent 2. A similar process is applied to determine child 2. This is shown in Figure 9-27.
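A sketch of this rule (illustrative name), assuming the two cut points are given; on the Figure 9-27 parents it reproduces both children:

```python
def ordered_two_point_crossover(parent1, parent2, cut1, cut2):
    """The child keeps parent1's left and right sections; its middle holds
    parent1's middle genes in the order they appear in parent2."""
    middle = sorted(parent1[cut1:cut2], key=parent2.index)
    return parent1[:cut1] + middle + parent1[cut2:]

# Figure 9-27, cuts after positions 2 and 4:
child1 = ordered_two_point_crossover([4,2,1,3,6,5], [2,3,1,4,5,6], 2, 4)
child2 = ordered_two_point_crossover([2,3,1,4,5,6], [4,2,1,3,6,5], 2, 4)
# child1 == [4,2,3,1,6,5], child2 == [2,3,4,1,5,6]
```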


9.9.3. 10 Partially Matched Crossover

Partially matched crossover (PMX) can be applied usefully in the TSP. Indeed, TSP chromosomes are simply sequences of integers, where each integer represents a different city and the order represents the time at which a city is visited. Under this representation, known as permutation encoding, we are only interested in labels and not alleles. PMX may be viewed as a crossover of permutations that guarantees that all positions are found exactly once in each offspring, i.e., both offspring receive a full complement of genes, followed by the corresponding filling in of alleles from their parents. PMX proceeds as follows:

1. The two chromosomes are aligned.

2. Two crossing sites are selected uniformly at random along the strings, defining a matching section.

3. The matching section is used to effect a cross through a position-by-position exchange operation.

4. Alleles are moved to their new positions in the offspring.

The following illustrates how PMX works.

Name   9 8 4 . 5 6 7 . 1 3 2 10    Allele 1 0 1 . 0 0 1 . 1 1 0 0
Name   8 7 1 . 2 3 10 . 9 5 4 6    Allele 1 1 1 . 0 1 1 . 1 1 0 1

Figure 9·28 Given strings

Consider the two strings shown in Figure 9-28, where the dots mark the selected cross points. The matching section defines the position-wise exchanges that must take place in both parents to produce the offspring. The exchanges are read from the matching section of one chromosome to that of the other. In the example illustrated in Figure 9-28, the numbers that exchange places are 5 and 2, 6 and 3, and 7 and 10. The resulting offspring are as shown in Figure 9-29. PMX is dealt with in detail in the next chapter.

Name   9 8 4 . 2 3 10 . 1 6 5 7    Allele 1 0 1 . 0 1 0 . 1 0 0 1
Name   8 10 1 . 5 6 7 . 9 2 4 3    Allele 1 1 1 . 1 1 1 . 1 0 0 1

Figure 9-29 Partially matched crossover
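The position-by-position exchange can be sketched as repeated value swaps inside each parent (illustrative names); on the strings of Figure 9-28 it reproduces the offspring of Figure 9-29:

```python
def pmx(parent1, parent2, cut1, cut2):
    """Pair up the two matching sections and swap each pair of values,
    position by position, inside each parent."""
    child1, child2 = list(parent1), list(parent2)
    for a, b in zip(parent1[cut1:cut2], parent2[cut1:cut2]):
        for child in (child1, child2):
            i, j = child.index(a), child.index(b)
            child[i], child[j] = child[j], child[i]
    return child1, child2

# Figures 9-28/9-29: matching section = positions 4-6 (indices 3..5):
o1, o2 = pmx([9,8,4,5,6,7,1,3,2,10], [8,7,1,2,3,10,9,5,4,6], 3, 6)
# o1 == [9,8,4,2,3,10,1,6,5,7], o2 == [8,10,1,5,6,7,9,2,4,3]
```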

9.9.3.11 Crossover Probability

The basic parameter in the crossover technique is the crossover probability (Pc). Crossover probability is a parameter that describes how often crossover will be performed. If there is no crossover, offspring are exact copies of parents. If there is


crossover, offspring are made from parts of both parents' chromosomes. If crossover probability is 100%, then all offspring are made by crossover. If it is 0%, the whole new generation is made from exact copies of chromosomes from the old population (but this does not mean that the new generation is the same!). Crossover is made in the hope that new chromosomes will contain good parts of old chromosomes and therefore the new chromosomes will be better. However, it is good to let some part of the old population survive to the next generation.

9.9.4 Mutation

After crossover, the strings are subjected to mutation. Mutation prevents the algorithm from being trapped in a local minimum. Mutation plays the role of recovering lost genetic material as well as randomly distributing genetic information. It is an insurance policy against the irreversible loss of genetic material. Mutation has traditionally been considered a simple search operator. If crossover is supposed to exploit the current solution to find better ones, mutation is supposed to help the exploration of the whole search space. Mutation is viewed as a background operator to maintain genetic diversity in the population. It introduces new genetic structures in the population by randomly modifying some of its building blocks. Mutation helps escape the trap of local minima and maintains diversity in the population. It also keeps the gene pool well stocked, thus ensuring ergodicity. A search space is said to be ergodic if there is a non-zero probability of generating any solution from any population state.

There are many different forms of mutation for the different kinds of representation. For binary representation, a simple mutation can consist in inverting the value of each gene with a small probability. The probability is usually taken to be about 1/L, where L is the length of the chromosome. It is also possible to implement a kind of hill-climbing mutation operator that performs mutation only if it improves the quality of the solution. Such an operator can accelerate the search; however, care should be taken, because it might also reduce the diversity in the population and make the algorithm converge toward some local optima. Mutation of a bit involves flipping it, changing 0 to 1 and vice versa.

9.9.4.1 Flipping

Flipping of a bit involves changing 0 to 1 and 1 to 0 based on a generated mutation chromosome. Figure 9-30 explains the mutation flipping concept. A parent is considered and a mutation chromosome is randomly generated. For a 1 in the mutation chromosome, the corresponding bit in the parent chromosome is flipped (0 to 1 and 1 to 0) and a child chromosome is produced. In the case illustrated in Figure 9-


30, a 1 occurs at 3 places of the mutation chromosome; the corresponding bits in the parent chromosome are flipped and the child is generated.

9.9.4.2 Interchanging

Two random positions of the string are chosen and the bits corresponding to those

positions are interchanged (Figure 9.31).

Parent               1 0 1 1 0 1 0 1
Mutation chromosome  1 0 0 0 1 0 0 1
Child                0 0 1 1 1 1 0 0

Figure 9-30 Mutation flipping
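A sketch of the flipping rule (illustrative name); it reproduces the child of Figure 9-30:

```python
def flip_mutation(parent, mutation_chromosome):
    """Flip every parent bit that lines up with a 1 in the mutation
    chromosome (0 becomes 1 and 1 becomes 0)."""
    return [bit ^ m for bit, m in zip(parent, mutation_chromosome)]

# The example of Figure 9-30:
child = flip_mutation([1,0,1,1,0,1,0,1], [1,0,0,0,1,0,0,1])
# child == [0,0,1,1,1,1,0,0]
```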

Parent  1 0 1 1 0 1 0 1
Child   1 1 1 1 0 0 0 1

Figure 9-31 Interchanging

Parent  1 0 1 1 0 1 0 1
Child   1 0 1 1 0 1 1 1

Figure 9-32 Reversing

9.9.4.3 Reversing

A random position is chosen, the bits next to that position are reversed and a child chromosome is produced (Figure 9-32).

9.9.4.4 Mutation Probability

An important parameter in the mutation technique is the mutation probability (Pm). It decides how often parts of a chromosome will be mutated. If there is no mutation, offspring are taken immediately after crossover (or directly copied) without any change. If mutation is performed, one or more parts of a chromosome are changed. If mutation probability is 100%, the whole chromosome is changed; if it is 0%, nothing is changed. Mutation generally prevents the GA from falling into local extremes.


Mutation should not occur very often, because then the GA will in fact change to random search.

9.10 Stopping Condition for Genetic Algorithm Flow

In short, the various stopping conditions are listed as follows:

1. Maximum generations: The GA stops when the specified number of generations has evolved.

2. Elapsed time: The genetic process will end when a specified time has elapsed.

Note: If the maximum number of generations has been reached before the specified time has elapsed, the process will end.

3. No change in fitness: The genetic process will end if there is no change in the population's best fitness for a specified number of generations.

Note: If the maximum number of generations has been reached before the specified number of generations with no changes has been reached, the process will end.

4. Stall generations: The algorithm stops if there is no improvement in the objective function for a sequence of consecutive generations of length "Stall generations."

5. Stall time limit: The algorithm stops if there is no improvement in the objective function during an interval of time in seconds equal to "Stall time limit."

The termination or convergence criterion finally brings the search to a halt. The following are a few termination techniques.

9.10.1 Best Individual

A best individual convergence criterion stops the search once the minimum fitness in the population drops below the convergence value. This brings the search to a faster conclusion, guaranteeing at least one good solution.

9.10.2 Worst Individual

Worst individual terminates the search when the least fit individuals in the population have fitness less than the convergence criterion. This guarantees the entire population to be of minimum standard, although the best individual may not be significantly better than the worst. In this case, a stringent convergence value


may never be met, in which case the search will terminate after the maximum has

been exceeded.

9.10.3 Sum of Fitness

In this termination scheme, the search is considered to have satisfactorily converged when the sum of the fitness in the entire population is less than or equal to the convergence value in the population record. This guarantees that virtually all individuals in the population will be within a particular fitness range, although it is better to pair this convergence criterion with weakest gene replacement; otherwise a few unfit individuals in the population will blow out the fitness sum. The population size has to be considered while setting the convergence value.

9.10.4 M edian Fitness

Here at least half of the individuals will be better than or equal to the convergence

value, which should give a good range of solutions to choose from.

9.11 Constraints in Genetic Algorithm

If the GA considered consists of only an objective function and no information about the specification of variables, then it is called an unconstrained optimization problem. Consider an unconstrained optimization problem of the form

Minimize f(x) = x^2   ……(18)

where there is no information about the range of x. The GA minimizes this function using its operators in random specifications.

In the case of constrained optimization problems, the information is provided for the variables under consideration. Constraints are classified as:

1. Equality relations.

2. Inequality relations.

A GA generates a sequence of parameters to be tested using the system under consideration, the objective function (to be maximized or minimized) and the constraints. On running the system, the objective function is evaluated and

constraints are checked to see if there are any violations. If there are no violations, the parameter set is assigned the fitness value corresponding to the objective function evaluation. When the constraints are violated, the solution is infeasible and thus has no fitness. Many practical problems are constrained and it is very difficult to find a feasible point that is best. As a result, one should get some


information out of infeasible solutions, irrespective of their fitness ranking in relation to the degree of constraint violation. This is performed in the penalty method.

The penalty method is one where a constrained optimization problem is transformed to an unconstrained optimization problem by associating a penalty or cost with all constraint violations. This penalty is included in the objective function evaluation.

Consider the original constrained problem in maximization form:

Maximize f(x)

subject to g_i(x) >= 0, i = 1, 2, 3, ..., n

where x is a k-vector. Transforming this to unconstrained form:

Maximize f(x) + P · Σ_{i=1..n} Φ[g_i(x)]   ……(19)

where Φ is the penalty function and P is the penalty coefficient. There exist several alternatives for this penalty function. The penalty function can be squared for all violated constraints. In certain situations, the unconstrained solution converges to the constrained solution as the penalty coefficient P tends to infinity.
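One possible sketch of such a penalized objective, assuming maximization with constraints g_i(x) >= 0 and the squared-violation penalty Φ(g) = min(0, g)^2 mentioned above (all names are illustrative):

```python
def penalized_fitness(objective, constraints, x, penalty_coeff):
    """Fitness for maximization subject to g_i(x) >= 0: subtract a
    squared penalty for every violated constraint."""
    violation = sum(min(0.0, g(x)) ** 2 for g in constraints)
    return objective(x) - penalty_coeff * violation

# Illustrative problem: maximize f(x) = x subject to g(x) = 4 - x >= 0.
f = lambda x: x
g = [lambda x: 4 - x]
# A feasible point keeps its raw objective; infeasible points are penalized.
```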

9.12 Problem Solving Using Genetic Algorithm

9.12.1 Maximizing a Function

Consider the problem of maximizing the function,

f(x) = x^2   ……(20)

where x is permitted to vary between 0 and 31. The steps involved in solving this

problem are as follows:

Step 1: For using the GA approach, one must first code the decision variable x into a finite-length string. Using a five-bit (binary integer) unsigned integer, numbers between 0 (00000) and 31 (11111) can be obtained.

The objective function here is f(x) = x^2, which is to be maximized. A single generation of a GA is performed here with encoding, selection, crossover and mutation. To start with, select the initial population at random. Here an initial population of size 4 is chosen, but any population size can be selected based on the requirement and application. Table 9-4 shows an initial population randomly selected.

Table 9-4 Selection

| String no. | Initial population (randomly selected) | x value | Fitness f(x) = x^2 | Prob_i | Percentage probability (%) | Expected count | Actual count |
|---|---|---|---|---|---|---|---|
| 1 | 0 1 1 0 0 | 12 | 144 | 0.1247 | 12.47 | 0.4987 | 1 |
| 2 | 1 1 0 0 1 | 25 | 625 | 0.5411 | 54.11 | 2.1645 | 2 |
| 3 | 0 0 1 0 1 | 5 | 25 | 0.0216 | 2.16 | 0.0866 | 0 |
| 4 | 1 0 0 1 1 | 19 | 361 | 0.3126 | 31.26 | 1.2502 | 1 |
| Sum | | | 1155 | 1.0000 | 100 | 4.0000 | 4 |
| Average | | | 288.75 | 0.2500 | 25 | 1.0000 | 1 |
| Maximum | | | 625 | 0.5411 | 54.11 | 2.1645 | 2 |

Step 2: Obtain the decoded x values for the initial population generated. Consider

string 1.

01100 = 0 × 2^4 + 1 × 2^3 + 1 × 2^2 + 0 × 2^1 + 0 × 2^0

      = 0 + 8 + 4 + 0 + 0

      = 12

Thus for all the four strings the decoded values are obtained.

Step 3: Calculate the fitness or objective function. This is obtained by simply squaring the x value, since the given function is f(x) = x^2. When x = 12, the fitness value is

f(x) = x^2 = (12)^2 = 144

For x = 25, f(x) = x^2 = (25)^2 = 625

and so on, until the entire population is computed.

Step 4: Compute the probability of selection,

Prob_i = f(x_i) / Σ_{j=1..n} f(x_j)   ……(21)

where n is the number of individuals in the population, f(x_i) is the fitness value corresponding to a particular individual in the population, and Σ f(x) is the summation of the fitness values of the entire population.

Considering string 1,

Fitness f(x) = 144

Σ f(x) = 1155

The probability that string 1 occurs is given by

P1 = 144/1155 = 0.1247

The percentage probability is obtained as

0.1247 × 100 = 12.47%


The same operation is done for all the strings. It should be noted that the summation of the probabilities of selection is 1.

Step 5: The next step is to calculate the expected count, which is calculated as

Expected count = f(x_i) / [Avg f(x)]   ……(22)

where

Avg f(x) = [Σ_{i=1..n} f(x_i)] / n   ……(23)

For string 1,

Expected count = Fitness/Average = 144/288.75 = 0.4987

We then compute the expected count for the entire population. The expected count gives an idea of which strings can be selected for further processing in the mating pool.

Step 6: Now the actual count is to be obtained to select the individuals who would participate in the crossover cycle using Roulette wheel selection. The Roulette wheel is formed as shown in Figure 9-33.

The entire Roulette wheel covers 100%, and the probabilities of selection as calculated in step 4 for the entire population are used as indicators to fit into the Roulette wheel. Now the wheel may be spun and the number of occurrences of each string is noted to get the actual count.

1. String 1 occupies 12.47%, so there is a chance for it to occur at least once. Hence its actual count may be 1.

2. With string 2 occupying 54.11% of the Roulette wheel, it has a fair chance of being selected twice. Thus its actual count can be considered as 2.

3. On the other hand, string 3 has the least probability percentage of 2.16%, so its occurrence in the next cycle is very poor. As a result, its actual count is 0.

Figure 9-33 Selection using Roulette wheel


Table 9-5 Crossover

| String no. | Mating pool | Crossover point | Offspring after crossover | x value | Fitness f(x) = x^2 |
|---|---|---|---|---|---|
| 1 | 0 1 1 0 0 | 4 | 0 1 1 0 1 | 13 | 169 |
| 2 | 1 1 0 0 1 | 4 | 1 1 0 0 0 | 24 | 576 |
| 3 | 1 1 0 0 1 | 2 | 1 1 0 1 1 | 27 | 729 |
| 4 | 1 0 0 1 1 | 2 | 1 0 0 0 1 | 17 | 289 |
| Sum | | | | | 1763 |
| Average | | | | | 440.75 |
| Maximum | | | | | 729 |

4. String 4, with 31.26%, has at least one chance of occurring while the Roulette wheel is spun; thus its actual count is 1.
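Steps 4 and 6 can be sketched together (illustrative names): compute the slice widths from Eq. (21), then spin the wheel; the probabilities for the fitness values 144, 625, 25, 361 of Table 9-4 come out as 0.1247, 0.5411, 0.0216 and 0.3126:

```python
import random

def selection_probabilities(fitnesses):
    """Prob_i = f(x_i) / sum of all fitness values."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def roulette_spin(probabilities, rng=random):
    """One spin: walk the wheel until the cumulative slice covers r."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probabilities):
        acc += p
        if r <= acc:
            return i
    return len(probabilities) - 1

probs = selection_probabilities([144, 625, 25, 361])
```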

The above values of actual count are tabulated as shown in Table 9-5.

Step 7: Now, write the mating pool based upon the actual count as shown in Table 9-5.

The actual count of string no. 1 is 1; hence it occurs once in the mating pool. The actual count of string no. 2 is 2; hence it occurs twice in the mating pool. Since the actual count of string no. 3 is 0, it does not occur in the mating pool. Similarly, the actual count of string no. 4 being 1, it occurs once in the mating pool. Based on this, the mating pool is formed.

Step 8: Crossover operation is performed to produce new offspring (children). The crossover point is specified and, based on the crossover point, single-point crossover is performed and new offspring are produced. The parents are

Parent 1   0 1 1 0 0
Parent 2   1 1 0 0 1

The offspring produced are

Offspring 1   0 1 1 0 1
Offspring 2   1 1 0 0 0

In a similar manner, crossover is performed for the next strings.

Step 9: After the crossover operation, new offspring are produced, the x values are decoded and the fitness is calculated.


Step 10: In this step, mutation operation is performed after the crossover operation to produce new offspring. As discussed in Section 9.9.4.1, the mutation flipping operation is performed and new offspring are produced. Table 9-6 shows the new offspring after mutation. Once the offspring are obtained after mutation, they are decoded to x values and the fitness values are computed.

This completes one generation. The mutation is performed on a bit-by-bit basis. The crossover probability and mutation probability were assumed to be 1.0 and 0.001, respectively. Once selection, crossover and mutation are performed, the new population is now ready to be tested. This is performed by decoding the new strings created by the simple GA after mutation and calculating the fitness function values from the x values thus decoded. The results for successive cycles of simulation are shown in Tables 9-4 and 9-6.

Table 9-6 Mutation

| String no. | Offspring after crossover | Mutation chromosome for flipping | Offspring after mutation | x value | Fitness f(x) = x^2 |
|---|---|---|---|---|---|
| 1 | 0 1 1 0 1 | 1 0 0 0 0 | 1 1 1 0 1 | 29 | 841 |
| 2 | 1 1 0 0 0 | 0 0 0 0 0 | 1 1 0 0 0 | 24 | 576 |
| 3 | 1 1 0 1 1 | 0 0 0 0 0 | 1 1 0 1 1 | 27 | 729 |
| 4 | 1 0 0 0 1 | 0 0 1 0 1 | 1 0 1 0 0 | 20 | 400 |
| Sum | | | | | 2546 |
| Average | | | | | 636.5 |
| Maximum | | | | | 841 |

From the tables, it can be observed how GAs combine high-performance notions to achieve better performance. In the tables, it can be noted how the maximal and average performances have improved in the new population. The population average fitness has improved from 288.75 to 636.5 in one generation. The maximum fitness has increased from 625 to 841 during the same period. Though random processes produce this result, the improvement can also be seen in successive generations. The best string of the initial population (1 1 0 0 1) receives two chances for its existence because of its high, above-average performance. When it combines at random with the next highest string (1 0 0 1 1) and is crossed at crossover point 2 (as shown in Table 9-5), one of the resulting strings (1 1 0 1 1) proves to be a very good solution indeed. Thus after mutation at random, a new offspring (1 1 1 0 1) is produced, which is an excellent choice.

This example has shown one generation of a simple GA.
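The whole generation walked through above can be strung together as a sketch (names are illustrative; Pc = 1.0 and Pm = 0.001 as assumed in the example, with a fixed seed only so that runs repeat):

```python
import random

def decode(bits):
    """Five-bit unsigned binary string -> integer x in [0, 31]."""
    return int("".join(map(str, bits)), 2)

def spin(fitnesses, rng):
    """Roulette-wheel pick: index i with probability f_i / sum(f)."""
    r, acc = rng.random() * sum(fitnesses), 0.0
    for i, f in enumerate(fitnesses):
        acc += f
        if r <= acc:
            return i
    return len(fitnesses) - 1

def one_generation(population, pc=1.0, pm=0.001, rng=None):
    """Selection, single-point crossover and bit-flip mutation for f(x) = x^2.
    Assumes an even population size so parents pair off cleanly."""
    rng = rng or random.Random(0)
    fits = [decode(ind) ** 2 for ind in population]
    pool = [population[spin(fits, rng)] for _ in population]  # mating pool
    children = []
    for a, b in zip(pool[::2], pool[1::2]):
        if rng.random() < pc:
            cut = rng.randint(1, len(a) - 1)  # single-point crossover
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        children += [a, b]
    # bit-flip mutation with probability pm per bit
    return [[bit ^ (rng.random() < pm) for bit in c] for c in children]
```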


9.13 The Schema Theorem

In this section, we will formulate and prove the fundamental result on the behaviour of GAs - the so-called Schema Theorem. Although not comparable with convergence results for conventional optimization methods, it still provides valuable insight into the intrinsic principles of GAs. Assume a GA with proportional selection and an arbitrary but fixed fitness function f. Let us make the following notations:

1. The number of individuals which fulfil H at time step t is denoted as

r_H,t = |B_t ∩ H|

2. The expression f̄(t) refers to the observed average fitness at time t:

f̄(t) = (1/m) Σ_{i=1..m} f(b_i,t)

3. The term f̄(H, t) stands for the observed average fitness of schema H in time step t:

f̄(H, t) = (1/r_H,t) Σ_{i: b_i,t ∈ H} f(b_i,t)

Theorem (Schema Theorem - Holland 1975). Assuming we consider a simple GA, the following inequality holds for every schema H:

E[r_H,t+1] >= r_H,t · (f̄(H, t)/f̄(t)) · [1 - p_c · δ(H)/(L - 1)] · (1 - p_m)^o(H)

where δ(H) is the defining length and o(H) the order of schema H.

Proof. The probability that we select an individual fulfilling H is

r_H,t · f̄(H, t) / (m · f̄(t))

This probability does not change throughout the execution of the selection loop. Moreover, each of the m individuals is selected independently of the others. Hence the number of selected individuals which fulfil H is binomially distributed with


sample size m and this probability. We obtain, therefore, that the expected number of selected individuals fulfilling H is

m · [r_H,t · f̄(H, t) / (m · f̄(t))] = r_H,t · f̄(H, t)/f̄(t)   ……(24)

If two individuals are crossed which both fulfil H, the two offspring again fulfil H. The number of strings fulfilling H can only decrease if one string which fulfils H is crossed with a string which does not fulfil H, and, obviously, only if the cross site is chosen somewhere in between the specifications of H. The probability that the cross site is chosen within the defining length of H is

δ(H)/(L - 1)   ……(25)

Hence the survival probability p_s of H, i.e., the probability that a string fulfilling H produces an offspring also fulfilling H, can be estimated as follows (crossover is only done with probability p_c):

p_s >= 1 - p_c · δ(H)/(L - 1)   ……(26)

Selection and crossover are carried out independently, so we may compute the expected number of strings fulfilling H after crossover simply as

r_H,t · (f̄(H, t)/f̄(t)) · [1 - p_c · δ(H)/(L - 1)]   ……(27)

After crossover, the number of strings fulfilling H can only decrease if a string fulfilling H is altered by mutation at a specification of H. The probability that all specifications of H remain untouched by mutation is obviously

(1 - p_m)^o(H)   ……(28)

Combining these factors completes the proof. The arguments in the proof of the Schema Theorem can be applied analogously to many other crossover and mutation operations.
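The product of the three factors - selection ratio, crossover survival and mutation survival - gives Holland's lower bound on E[r_H,t+1]; a small helper (names and numbers are illustrative) evaluates it for concrete values:

```python
def schema_bound(r, f_schema, f_avg, p_c, p_m, delta, order, length):
    """Holland's bound: E[r_{H,t+1}] >=
       r * (f_schema/f_avg) * (1 - p_c*delta/(length-1)) * (1 - p_m)**order"""
    return (r * (f_schema / f_avg)
              * (1 - p_c * delta / (length - 1))
              * (1 - p_m) ** order)

# A schema of defining length 2 and order 2 on 5-bit strings, twice as fit
# as the population average, with p_c = 1.0 and p_m = 0.001:
bound = schema_bound(r=2, f_schema=500.0, f_avg=250.0,
                     p_c=1.0, p_m=0.001, delta=2, order=2, length=5)
```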


9.13.1 The Optimal Allocation of Trials

The Schema Theorem has provided the insight that building blocks receive exponentially increasing trials in future generations. The question remains, however, why this could be a good strategy. This leads to an important and well-analyzed problem from statistical decision theory - the two-armed bandit problem and its generalization, the k-armed bandit problem. Although this seems like a detour from our main concern, we shall soon understand the connection to GAs.

Suppose we have a gambling machine with two slots for coins and two arms. The gambler can deposit the coin into either the left or the right slot. After pulling the corresponding arm, either a reward is given or the coin is lost. For mathematical simplicity, we just work with outcomes, i.e., the difference between the reward (which can be zero) and the value of the coin. Let us assume that the left arm produces an outcome with mean value μ1 and a variance σ1^2 while the right arm produces an outcome with mean value μ2 and variance σ2^2. Without loss of generality, although the gambler does not know this, assume that μ1 > μ2.

Now the question arises which arm should be played. Since we do not know beforehand which arm is associated with the higher outcome, we are faced with an interesting dilemma. Not only must we make a sequence of decisions about which arm to play, we have to collect, at the same time, information about which is the better arm. This trade-off between exploitation of knowledge and its exploration is the key issue in this problem and, as turns out later, in GAs, too.

A simple approach to this problem is to separate exploration from exploitation. More specifically, we could perform a single experiment at the beginning and thereafter make an irreversible decision that depends on the results of the experiment. Suppose we have N coins. If we allocate an equal number n (where 2n <= N) of trials to both arms, we could allocate the remaining N - 2n trials to the observed better arm. Assuming we know all involved parameters, the expected loss is given as

L(N, n) = (μ1 - μ2){(N - n)q(n) + n[1 - q(n)]}

where q(n) is the probability that the worst arm is the observed best arm after 2n experimental trials. The underlying idea is obvious: In case we observe that the worse arm is the best, which happens with probability q(n), the total number of trials allotted to the worse arm is N - n. The loss is, therefore, (μ1 - μ2)(N - n). In the reverse case, where we actually observe that the best arm is the best, which happens with probability 1 - q(n), the loss is only what we lose because we

happens with probability I - q(n), the loss is only whir we get less because we munotes.in


played the worse arm 11 times, i.e., (Ill -112 )11. Taking the central limit theorem into account, we can approximate q (n) with the rail of a normal distribution:

where ………..(29)

Now we have to specify a reasonable experiment size n. Obviously, if we choose n = 1, the obtained information is potentially unreliable. If we choose, however, n = N/2, there are no trials left to make use of the information gained through the experimental phase. What we see is again the trade-off between exploitation with almost no exploration (n = 1) and exploration without exploitation (n = N/2). It does not take a Nobel prize winner to see that the optimal way is somewhere in the

middle. Holland has studied this problem in detail. He came to the conclusion that the optimal strategy is given by the following equation:

where

………..(30)

Making a few transformations, we obtain that

………..(31)

That is, the optimal strategy is to allocate slightly more than an exponentially increasing number of trials to the observed best arm. Although no gambler is able to apply this strategy in practice, because it requires knowledge of the mean values μ1 and μ2, we have still found an important bound on the performance a decision strategy should try to approach. A GA, although the direct connection is not yet fully clear, actually comes close to this ideal, giving at least an exponentially increasing number of trials to the observed best building blocks.

However, one may still wonder how the two-armed bandit problem and GAs are related. Let us consider an arbitrary string position. Then there are two schemata of order one which have their only specification in this position. According to the Schema Theorem, the GA implicitly decides between


these two schemata, where only incomplete data are available (observed average fitness values). In this sense, a GA solves a lot of two-armed bandit problems in parallel.

The Schema Theorem, however, is not restricted to schemata of order one. Looking at competing schemata (different schemata which are specified in the same positions), we observe that a GA is solving an enormous number of k-armed bandit problems in parallel. The k-armed bandit problem, although much more complicated, is solved in an analogous way - the observed better alternatives should receive an exponentially increasing number of trials. This is exactly what a GA does.

9.13.2 Implicit Parallelism

So far we have discovered two distinct, seemingly conflicting views of genetic

algorithms:

1. The algorithmic view that GAs operate on strings;

2. The schema-based interpretation.

So we may ask: what does a GA really process, strings or schemata? The answer is surprising: both. Nowadays, the common interpretation is that a GA processes an enormous number of schemata implicitly. This is accomplished by continuously exploiting the currently available, incomplete information about these schemata, while trying to explore more information about them and other, possibly better schemata.

This remarkable property is commonly called the implicit parallelism of GAs. A simple GA has only m structures in one time step, without any memory or bookkeeping about the previous generations. We will now try to get a feeling for how many schemata a GA actually processes.

Obviously, there are 3^n schemata of length n. A single binary string fulfils n schemata of order 1, C(n, 2) schemata of order 2, and, in general, C(n, k) schemata of order k. Hence, a string fulfils

C(n, 0) + C(n, 1) + ... + C(n, n) = 2^n ………..(32)

schemata.
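The counting argument can be verified directly; the short Python check below is ours, with an arbitrary example string length n = 8:

```python
from math import comb

n = 8  # string length (an arbitrary example value)

# Total number of schemata over {0, 1, *}: each position is 0, 1 or *.
assert 3 ** n == sum(comb(n, k) * 2 ** k for k in range(n + 1))

# Number of schemata a single fixed string is an instance of:
# choose which k of its n positions stay fixed, the rest become *.
matched = sum(comb(n, k) for k in range(n + 1))
print(matched)  # equals 2 ** n, i.e. 256 for n = 8
```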

Theorem. Consider a randomly generated start population of a simple GA and let ε ∈ (0, 1) be a fixed error bound. Then schemata of length

l_s < ε(n - 1) + 1


have a probability of at least (1 - ε) of surviving one-point crossover (compare with the proof of the Schema Theorem). If the population size is chosen as m = 2^(l_s/2), the number of schemata which survive into the next generation is of order O(m^3).

9.14 Classification of Genetic Algorithm

There exists a wide variety of GAs, including the simple and general GAs discussed in Sections 9.4 and 9.5, respectively. Some other variants of GA are discussed below.

9.14.1 Messy Genetic Algorithms

In a "classical" GA, the genes are encoded in a fixed order. The meaning of a single gene is determined by its position inside the string. We have seen in the previous chapter that a GA is likely to converge well if the optimization task can be divided into several short building blocks. What, however, happens if the coding is chosen such that couplings occur between distant genes? Of course, one-point crossover tends to disadvantage long schemata (even if they have low order) over short ones.

Messy GAs try to overcome this difficulty by using a variable-length, position-independent coding. The key idea is to append an index to each gene which allows identifying its position. A gene, therefore, is no longer represented as a single allele value at a fixed position, but as a pair of an index and an allele. Figure 9-34(A) shows how this "messy" coding works for a string of length 6.

Since with the help of the index we can identify the genes uniquely, genes may be swapped arbitrarily without changing the meaning of the string. With appropriate genetic operations, which also change the order of the pairs, the GA could possibly group coupled genes together automatically.

Figure 9-34 (A) Messy coding and (B) positional preference; genes with indices 1 and 6 occur twice, the first occurrences are used.


Figure 9-35 The cut and splice operation.

Owing to the free arrangement of genes and the variable length of the encoding, we can, however, run into problems which do not occur in a simple GA. First of all, it can happen that there are two entries in a string which correspond to the same index but have conflicting alleles. The most obvious way to overcome this "over-specification" is positional preference - the first entry which refers to a gene is taken. Figure 9-34(B) shows an example. The reader may have observed that the genes with indices 3 and 5 do not occur at all in the example in Figure 9-34(B). This problem of "under-specification" is more complicated, and its solution is not as obvious as for over-specification. Of course, a lot of variants are reasonable.

One approach could be to check all possible combinations and to take the best one (for k missing genes, there are 2^k combinations). With the objective of reducing this effort, Goldberg et al. have suggested using so-called competitive templates for finding specifications for missing genes. This is nothing else than applying a local hill climbing method with random initial values to the k missing genes.

While messy GAs usually work with the same mutation operator as simple GAs (every allele is altered with a low probability pM), the crossover operator is replaced by a more general cut and splice operator which also allows mating parents of different lengths. The basic idea is to choose cut sites for both parents independently and to splice the four fragments. Figure 9-35 shows an example.
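As an illustration of the messy coding just described, here is a hypothetical Python sketch of positional-preference decoding and the cut and splice operator. The fixed template used for missing genes is a deliberate simplification of Goldberg's competitive templates, and all names and data are our own:

```python
import random

def decode(messy, length, template):
    """Decode a messy chromosome (a list of (index, allele) pairs).
    Over-specification: positional preference - the first entry for an
    index wins.  Under-specification: missing genes are filled in from
    a template (here a fixed string, a simplification)."""
    genes = {}
    for idx, allele in messy:
        if idx not in genes:          # first occurrence is used
            genes[idx] = allele
    return [genes.get(i, template[i]) for i in range(length)]

def cut_and_splice(p1, p2, rng):
    """Choose a cut site in each parent independently, then splice."""
    c1 = rng.randint(0, len(p1))
    c2 = rng.randint(0, len(p2))
    return p1[:c1] + p2[c2:], p2[:c2] + p1[c1:]

rng = random.Random(0)
# Index 0 occurs twice; indices 1, 3 and 5 are missing entirely.
parent = [(0, 1), (4, 0), (0, 0), (2, 1)]
print(decode(parent, 6, template=[0] * 6))  # -> [1, 0, 1, 0, 0, 0]
child1, child2 = cut_and_splice(parent, [(3, 1), (5, 1)], rng)
```

Note that cut and splice conserves the total number of genes but, unlike one-point crossover, generally produces children of lengths different from their parents'.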

9.14.2 Adaptive Genetic Algorithms

Adaptive GAs are those whose parameters, such as the population size, the crossover probability, or the mutation probability, are varied while the GA is running. A

simple variant could be the following: the mutation rate is changed according to changes in the population - the longer the population does not improve, the higher the mutation rate is chosen. Vice versa, it is decreased again as soon as an

improvement of the population occurs.
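The variant just described can be sketched in a few lines; the update factor and the bounds below are illustrative choices of ours, not prescribed by the text:

```python
def adapt_mutation_rate(pm, improved, factor=1.5, lo=0.001, hi=0.25):
    """Raise the mutation rate while the population stagnates,
    lower it again as soon as the best fitness improves."""
    pm = pm / factor if improved else pm * factor
    return min(hi, max(lo, pm))

pm = 0.01
for improved in [False, False, False, True]:
    pm = adapt_mutation_rate(pm, improved)
    print(round(pm, 5))
```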

9.14.2.1 Adaptive Probabilities of Crossover and Mutation

It is essential to have two characteristics in GAs for optimizing multimodal

functions. The first characteristic is the capacity to converge to an optimum (local or global) after locating the region containing the optimum. The second characteristic is the capacity to explore new regions of the solution space in search of the global optimum. The balance between these characteristics of the GA is dictated by the values of Pc and Pm and the type of crossover employed. Increasing values of Pc and Pm promote exploration at the expense of exploitation. Moderately large values of Pc (in the range 0.5-1.0) and small values of Pm (in the range 0.001-0.05) are commonly employed in GA practice. In this approach, we aim at achieving this trade-off between exploration and exploitation in a different manner, by varying Pc and Pm adaptively in response to the fitness values of the solutions; Pc and Pm are increased when the population tends to get stuck at a local optimum and are decreased when the population is scattered in the solution space.

9.14.2.2 Design of Adaptive Pc and Pm

To vary Pc and Pm adaptively for preventing premature convergence of the GA to a local optimum, it is essential to identify whether the GA is converging to an optimum. One possible way of detecting this is to observe the average fitness value f̄ of the population in relation to the maximum fitness value fmax of the population. The value fmax - f̄ is likely to be less for a population that has converged to an optimum solution than for a population scattered in the solution space. We have observed the above property in all our experiments with GAs, and Figure 9-36 illustrates the property for a typical case. In Figure 9-36 we notice that fmax - f̄ decreases when the GA converges to a local optimum with a fitness value of 0.5. (The globally optimal solution has a fitness value of 1.0.) We use the difference between the average and maximum fitness values, fmax - f̄, as a yardstick for detecting the convergence of the GA. The values of Pc and Pm are varied depending on the value of fmax - f̄. Since Pc and Pm have to be increased when the GA converges to a local optimum, i.e., when fmax - f̄ decreases, Pc and Pm will have to be varied inversely with fmax - f̄. The expressions that we have chosen for Pc and Pm are of the form

Pc = k1 / (fmax - f̄)

Pm = k2 / (fmax - f̄)


Figure 9-36 Variation of fmax - f̄ and fbest (best fitness).

It has to be observed in the above expressions that Pc and Pm do not depend on the fitness value of any particular solution, and have the same values for all the solutions of the population. Consequently, solutions with high fitness values as well as solutions with low fitness values are subjected to the same levels of mutation and crossover. When a population converges to a globally optimal solution (or even a locally optimal solution), Pc and Pm increase and may cause the disruption of the near-optimal solutions. The population may never converge to the global optimum.

Though we may prevent the GA from getting stuck at a local optimum, the

performance of the GA (in terms of the generations required for convergence) will

certainly deteriorate.

To overcome the above-stated problem, we need to preserve "good" solutions of the population. This can be achieved by having lower values of Pc and Pm for high fitness solutions and higher values of Pc and Pm for low fitness solutions. While the high fitness solutions aid in the convergence of the GA, the low fitness solutions prevent the GA from getting stuck at a local optimum. The value of Pm should depend not only on fmax - f̄ but also on the fitness value f of the solution. Similarly, Pc should depend on the fitness values of both the parent solutions. The closer f is to fmax, the smaller Pm should be, i.e., Pm should vary directly as fmax - f.


Similarly, Pc should vary directly as fmax - f', where f' is the larger of the fitness values of the solutions to be crossed. The expressions for Pc and Pm now take the forms

Pc = k1 (fmax - f') / (fmax - f̄)

Pm = k2 (fmax - f) / (fmax - f̄) ………..(33)

(Here k1 and k2 have to be less than 1.0 to constrain Pc and Pm to the range 0.0-1.0.)

Note that Pc and Pm are zero for the solution with the maximum fitness. Also, Pc = k1 for a solution with f' = f̄, and Pm = k2 for a solution with f = f̄. For solutions with subaverage fitness values, i.e., f < f̄, Pc and Pm might assume values larger than 1.0. To prevent the overshooting of Pc and Pm beyond 1.0, we also have the following constraints:

Pc = k3 for f' < f̄

Pm = k4 for f < f̄ ………..(34)

where k3, k4 < 1.0.

9.14.2.3 Practical Considerations and Choice of Values for k1, k2, k3 and k4

In the previous subsection, we saw that for a solution with the maximum fitness

value Pc and Pm are both zero. The best solution in a population is transferred

undisrupted into the next generation. Together with the selection mechanism, this

may lead to an exponential growth of the solution in the population and may cause

premature convergence. To overcome the above-stated problem, we introduce a default mutation rate (of 0.005) for every solution in the Adaptive Genetic Algorithm (AGA).

We now discuss the choice of values for k1, k2, k3 and k4. For convenience, the expressions for Pc and Pm are given as

Pc = k1 (fmax - f') / (fmax - f̄) for f' ≥ f̄,  Pc = k3 for f' < f̄

Pm = k2 (fmax - f) / (fmax - f̄) for f ≥ f̄,  Pm = k4 for f < f̄ ………..(35)

where k1, k2, k3, k4 < 1.0.
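The adaptation rule of this subsection (Pc and Pm proportional to fmax - f' and fmax - f above the average fitness, constant below it, as in the well-known Srinivas-Patnaik AGA) can be sketched as follows. The code and default constants are ours, following the choices discussed in the next subsection (k1 = k3 = 1.0, k2 = k4 = 0.5):

```python
def adaptive_rates(f_max, f_avg, f_parent, f_sol,
                   k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Adaptive crossover/mutation probabilities.
    f_parent is the larger fitness of the two parents (f'); f_sol is
    the fitness of the solution to be mutated (f); f_avg is the
    population average (f-bar)."""
    denom = f_max - f_avg
    if f_parent >= f_avg:
        pc = k1 * (f_max - f_parent) / denom if denom > 0 else 0.0
    else:
        pc = k3            # subaverage parents always cross over
    if f_sol >= f_avg:
        pm = k2 * (f_max - f_sol) / denom if denom > 0 else 0.0
    else:
        pm = k4            # subaverage solutions are fully disrupted
    return pc, pm

print(adaptive_rates(1.0, 0.5, 1.0, 1.0))   # best solution: (0.0, 0.0)
print(adaptive_rates(1.0, 0.5, 0.5, 0.5))   # average solution: (1.0, 0.5)
print(adaptive_rates(1.0, 0.5, 0.3, 0.3))   # subaverage: (1.0, 0.5)
```

Note how the best solution receives zero disruption, which is exactly why the default mutation rate of 0.005 is added in the AGA.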


It has been well established in the GA literature that moderately large values of Pc (0.5 < Pc < 1.0) and small values of Pm (0.001 < Pm < 0.05) are essential for the successful working of GAs. The moderately large values of Pc promote the extensive recombination of schemata, while small values of Pm are necessary to prevent the disruption of the solutions. These guidelines, however, are useful and relevant when the values of Pc and Pm do not vary.

One of the goals of the approach is to prevent the GA from getting stuck at a local optimum. To achieve this goal, we employ solutions with subaverage fitnesses to search the search space for the region containing the global optimum. Such solutions need to be completely disrupted, and for this purpose we use a value of 0.5 for k4. Since solutions with a fitness value of f̄ should also be disrupted completely, we assign a value of 0.5 to k2 as well.

Based on similar reasoning, we assign k1 and k3 a value of 1.0. This ensures that all solutions with a fitness value less than or equal to f̄ compulsorily undergo

crossover. The probability of crossover decreases as the fitness value (maximum

of the fitness values of the parent solutions) tends to fmax and is 0.0 for solutions

with a fitness value equal to fmax.

9.14.3 Hybrid Genetic Algorithms

As they use the fitness function only in the selection step, GAs are blind optimizers which do not use any auxiliary information, such as derivatives or other specific knowledge about the special structure of the objective function. If there is such knowledge, however, it is unwise and inefficient not to make use of it. Several investigations have shown that a lot of synergism lies in the combination of genetic algorithms and conventional methods.

The basic idea is to divide the optimization task into two complementary parts. The GA does the coarse, global optimization while local refinement is done by the conventional method (e.g., gradient-based, hill climbing, greedy algorithm, simulated annealing, etc.). A number of variants are reasonable:

1. The GA performs coarse search first. After the GA is completed, local refinement is done.

2. The local method is integrated in the GA. For instance, every K generations, the population is doped with a locally optimal individual.

3. Both methods run in parallel: All individuals are continuously used as initial values for the local method. The locally optimized individuals are re-implanted into the current generation.


In this section a novel optimization approach is used that switches between global and local search methods based on the local topography of the design space. The global and local optimizers work in concert to efficiently locate quality design points better than either could alone. To determine when it is appropriate to execute a local search, some characteristics of the local area of the design space need to be determined. One good source of information is contained in the population of designs in the GA. By calculating the relative homogeneity of the population we can get a good idea of whether there are multiple local optima located within this local region of the design space.

To quantify the relative homogeneity of the population in each subspace, the coefficient of variance of the objective function and design variables is calculated. The coefficient of variance is a normalized measure of variation and, unlike the actual variance, is independent of the magnitude of the mean of the population. A high coefficient of variance could be an indication that there are multiple local optima present. Very low values could indicate that the GA has converged to a small area in the design space, warranting the use of a local search algorithm to find the best design within this region.

By calculating the coefficient of variance of both the design variables and the objective function as the optimization progresses, it can also be used as a criterion to switch from the global to the local optimizer. As the variance of the objective values and design variables of the population increases, it may indicate that the optimizer is exploring new areas of the design space or hill climbing. If the variance is decreasing, the optimizer may be converging toward local minima and the optimization process could be made more efficient by switching to a local search algorithm.
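A minimal sketch of this homogeneity test follows; the threshold value is a hypothetical designer setting of ours:

```python
from statistics import mean, stdev

def coeff_of_variation(values):
    """Normalized spread of a population quantity: stdev / |mean|."""
    m = mean(values)
    return stdev(values) / abs(m) if m else float("inf")

def should_switch_to_local(objectives, threshold=0.05):
    """Switch to the local optimizer once the population of objective
    values has become nearly homogeneous (designer-set threshold)."""
    return coeff_of_variation(objectives) < threshold

scattered = [1.0, 5.0, 9.0, 2.5, 7.0]       # GA still exploring
converged = [4.00, 4.01, 3.99, 4.02, 4.00]  # GA in a single basin
print(should_switch_to_local(scattered), should_switch_to_local(converged))
```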

The second method, regression analysis, used in this section helps us determine when to switch between the global and local optimizer. The design data present in the current population of the GA can be used to provide information as to the local topography of the design space by attempting to fit models of various orders to it. The use of regression analysis to augment optimization algorithms is not new. In problems in which the objective function or constraints are computationally expensive, approximations to the design space are created by sampling the design space and then using regression or other methods to create a simple mathematical model that closely approximates the actual design space, which may be highly nonlinear. The design space can then be explored to find regions of good designs or optimized to improve the performance of the system using the predictive surrogate approximation models instead of the computationally expensive analysis code, resulting in large computational savings. The most common regression models are linear and quadratic polynomials created by performing ordinary least squares regression on a set of analysis data.
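An ordinary least squares quadratic fit and its R2 can be computed without any numerical library; the sample data below are invented for illustration of the single-mode versus multimodal distinction used in the switching criterion:

```python
def fit_quadratic(xs, ys):
    """Ordinary least squares fit of y = a + b*x + c*x**2 by solving
    the 3x3 normal equations with Gaussian elimination."""
    basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
    # Normal-equation matrix A = X'X and right-hand side v = X'y.
    A = [[sum(bi(x) * bj(x) for x in xs) for bj in basis] for bi in basis]
    v = [sum(bi(x) * y for x, y in zip(xs, ys)) for bi in basis]
    # Forward elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        coef[r] = (v[r] - sum(A[r][c] * coef[c]
                              for c in range(r + 1, 3))) / A[r][r]
    return coef

def r_squared(xs, ys, coef):
    """Coefficient of determination of the fitted quadratic model."""
    a, b, c = coef
    preds = [a + b * x + c * x * x for x in xs]
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
uni = [(x - 2.0) ** 2 for x in xs]   # single parabolic mode: R2 near 1
multi = [0.0, 2.0, 0.1, 2.1, 0.0]    # oscillating sample: R2 is low
print(r_squared(xs, uni, fit_quadratic(xs, uni)))
print(r_squared(xs, multi, fit_quadratic(xs, multi)))
```

A high R2 suggests the population sits in a single mode and a local search may be started; a low R2 suggests the GA is still spanning several modes.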

To make clear the use of regression analysis in this way, consider Figure 9-37, which represents a complex design space. Our goal is to minimize this function, and as a first step the GA is run. Suppose that after a certain number of generations the population consists of the sampled points shown in the figure. Since the population of the GA is spread throughout the design space, having yet to converge into one of the local minima, it seems logical to continue the GA for additional generations. Ideally, before the local optimizer is run, it would be beneficial to have some confidence that its starting point is somewhere within the mode that contains the optimum. Fitting a second-order response surface to the data and noting the large error (the R2 value is 0.13), there is a clear indication that the GA is currently exploring multiple modes in the design space.

In Figure 9-38, the same design space is shown but after the GA has begun to converge into the part of the design space containing the optimal design. Once again a second-order approximation is fit to the GA's population. The dotted line connects the points predicted by the response surface. Note how much smaller the error is in the approximation (the R2 is 0.96), which is a good indication that the GA is currently exploring a single mode within the design space. At this point, the local optimizer can be made to quickly converge to the best solution within this area of the design space, thereby avoiding the slow convergence properties of the GA.

After each generation of the global optimizer, the values of the coefficient of determination and the coefficient of variance of the entire population are compared with the designer-specified threshold levels.

Figure 9-37 Approximating multiple modes with a second-order model.


Figure 9-38 Approximating a single mode with a second-order model.

The first threshold simply states that if the coefficient of determination of the population exceeds a designer-set value when a second-order regression analysis is performed on the design data in the current GA population, then a local search is started from the current 'best design' in the population. The second threshold is based on the value of the coefficient of variance of the entire population. This threshold is also set by the designer and can range upwards from 0%. If it increases at a rate greater than the threshold level, then a local search is executed from the best point in the population.

The flowchart in Figure 9-39 illustrates the stages in the algorithm. The algorithm can switch repeatedly between the global search (Stage 1) and the local search (Stage 2) during execution. In Stage 1, the global search is initialized and then monitored. This is also where the regression and statistical analysis occurs.

In Stage 2 the local search is executed when the threshold levels are exceeded, and then this solution is passed back and integrated into the global search. The algorithm stops when convergence is achieved for the global optimization algorithm.

9.14.4 Parallel Genetic Algorithm

GAs are powerful search techniques that are used successfully to solve problems in many different disciplines. Parallel GAs (PGAs) are particularly easy to implement and promise substantial gains in performance. As such, there has been extensive research in this field. This section describes some of the most significant problems in modeling and designing multi-population PGAs and presents some recent advancements.


One of the major aspects of GAs is their ability to be parallelized. Indeed, because natural evolution deals with an entire population and not only with particular individuals, it is a remarkably highly parallel process. Except in the selection phase, during which there is competition between individuals, the only interactions between members of the population occur during the reproduction phase, and usually no more than two individuals are necessary to engender a new child. Otherwise, any other operations of the evolution, in particular the evaluation of each member of the population, can be done separately. So, nearly all the operations in a genetic algorithm are implicitly parallel.

PGAs simply consist in distributing the tasks of a basic GA over different processors. As those tasks are implicitly parallel, little time will be spent on communication, and thus the algorithm is expected to run much faster or to find more accurate results. It has been established that a GA's efficiency in finding an optimal solution is largely determined by the population size. With a larger population size, the genetic diversity increases, and so the algorithm is more likely to find a global optimum. A large population requires more memory to be stored; it has also been proved that it takes a longer time to converge. If n is the population size, convergence is expected after n log(n) function evaluations.

Figure 9-39 Steps in the two-stage hybrid optimization approach.


The use of today's new parallel computers not only provides more storage space but also allows the use of several processors to produce and evaluate more solutions in a smaller amount of time. By parallelizing the algorithm, it is possible to increase the population size, reduce the computational cost, and so improve the performance of the GA.

Probably the first attempt to map GAs to existing parallel computer architectures was made in 1981 by John Grefenstette. But obviously today, with the emergence of new high-performance computing (HPC), PGA is really a flourishing area. Researchers try to improve the performance of GAs. The stake is to show that GAs are one of the best optimization methods to be used with HPC.

9.14.4.1 Global Parallelization

The first attempt to parallelize GAs simply consists of global parallelization. This approach tries to explicitly parallelize the implicitly parallel tasks of the "sequential" GA. The nature of the problem remains unchanged. The algorithm still manipulates a single population where each individual can mate with any other, but the breeding of new children and/or their evaluation are now made in parallel. The basic idea is that different processors can create new individuals and compute their fitness in parallel, almost without any communication among each other.

To start with, doing the evaluation of the population in parallel is something really simple to implement. Each processor is assigned a subset of individuals to be evaluated. For example, on a shared memory computer, individuals could be stored in shared memory, so that each processor can read the chromosomes assigned to it and can write back the result of the fitness computation. This method only supposes that the GA works with a generational update of the population. Of course, some synchronization is needed between generations.

Generally, most of the computational time in a GA is spent calling the evaluation function. The time spent in manipulating the chromosomes during the selection or recombination phase is usually negligible. By assigning to each processor a subset of individuals to evaluate, a speedup proportional to the number of processors can be expected if there is good load balancing between them. However, load balancing should not be a problem, as generally the time spent for the evaluation of an individual does not really depend on the individual. A simple dynamic scheduling algorithm is usually enough to share the population equally between the processors.

On a distributed memory computer, we can store the population in one "master" processor responsible for sending the individuals to the other processors, i.e., the "slaves." The master processor is also responsible for collecting the results of the evaluation. A drawback of this distributed memory implementation is that a bottleneck may occur when slaves are idle while only the master is working. But a simple and good use of the master processor can improve the load balancing by distributing individuals dynamically to the slave processors when they finish their jobs.

A further step could consist in applying the genetic operators in parallel. In fact, the interaction inside the population only occurs during selection. The breeding, involving only two individuals to generate the offspring, could easily be done simultaneously over n/2 pairs of individuals. But it is not that clear whether it is worth doing so. Crossover is usually very simple and not so time-consuming; the point is not that too much time will be lost during the communication, but that the time gained in the algorithm will be almost nothing compared to the effort required to change the code.

This kind of global parallelization simply shows how easy it can be to transpose any GA onto a parallel machine, and how a speed-up sublinear in the number of processors may be expected.
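The master-slave evaluation scheme described above can be sketched in a few lines. This toy version uses a Python thread pool only to illustrate the structure (real implementations would use separate processors or processes), and the OneMax fitness is an invented stand-in for an expensive evaluation function:

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(chromosome):
    """Toy fitness: number of ones (the OneMax problem)."""
    return sum(chromosome)

def evaluate_population(population, workers=4):
    """Master-slave style evaluation: the master hands each worker a
    chromosome and collects the fitness values, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

population = [[1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
print(evaluate_population(population))  # -> [3, 0, 4]
```

Because the workers only evaluate and never select or mate, this scheme performs exactly the same search as the serial GA, just faster.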

9.14.4.2 Classification of Parallel GAs

The basic idea behind most parallel programs is to divide a task into chunks and to solve the chunks simultaneously using multiple processors. This divide-and-conquer approach can be applied to GAs in many different ways, and the literature contains many examples of successful parallel implementations. Some parallelization methods use a single population, while others divide the population into several relatively isolated subpopulations. Some methods can exploit massively parallel computer architectures, while others are better suited to multicomputers with fewer and more powerful processing elements.

There are three main types of PGAs:

1. global single-population master-slave GAs,

2. single-population fine-grained GAs,

3. multiple-population coarse-grained GAs.

In a master-slave GA there is a single panmictic population (just as in a simple GA), but the evaluation of fitness is distributed among several processors (see Figure 9-40). Since in this type of PGA selection and crossover consider the entire population, it is also known as a global PGA. Fine-grained PGAs are suited for massively parallel computers and consist of one spatially structured population.


Selection and mating are restricted to a small neighborhood, but neighborhoods overlap, permitting some interaction among all the individuals (see Figure 9-41 for a schematic of this class of GAs). The ideal case is to have only one individual for every processing element available.

Multiple-population (or multiple-deme) GAs are more sophisticated, as they consist of several subpopulations which exchange individuals occasionally (Figure 9-42 has a schematic). This exchange of individuals

Figure 9-40 A schematic of a master-slave PGA. The master stores the population, executes GA operations and distributes individuals to the slaves (workers). The slaves only evaluate the fitness of the individuals.

Figure 9-41 A schematic of a fine-grained PGA. This class of PGAs has one spatially distributed population, and it can be implemented very efficiently on massively parallel computers.


Figure 9-42 A schematic of a multiple-population PGA. Each process is a simple GA, and there is (infrequent) communication between the populations.

is called migration and, as we shall see in later sections, it is controlled by several parameters. Multiple-deme GAs are very popular, but they are also the class of PGAs which is most difficult to understand, because the effects of migration are not fully understood. Multiple-deme PGAs introduce fundamental changes in the operation of the GA and have a different behavior than simple GAs.

Multiple-deme PGAs are known by different names. Sometimes they are known as "distributed" GAs, because they are usually implemented on distributed memory MIMD computers. Since the computation to communication ratio is usually high, they are occasionally called coarse-grained GAs. Finally, multiple-deme GAs resemble the "island model" in population genetics, which considers relatively isolated demes, so these PGAs are also known as "island" PGAs. Since the size of the demes is smaller than the population used by a serial GA, we would expect the PGA to converge faster. However, when we compare the performance of the serial and the parallel algorithms, we must also consider the quality of the solutions found in each case. Therefore, while it is true that smaller demes converge faster, it is also true that the quality of the solution might be poorer.
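A toy island-model (multiple-deme) PGA with ring migration can be sketched as follows; it is run serially here purely for illustration, and the deme sizes, migration interval, and OneMax fitness are our own choices rather than anything prescribed by the text:

```python
import random

def evolve_island(pop, rng, pm=0.05):
    """One generation of a toy GA on one island: binary tournament
    selection plus bit-flip mutation on the OneMax problem."""
    def fit(ind):
        return sum(ind)
    new = []
    for _ in pop:
        a, b = rng.sample(pop, 2)
        child = list(max(a, b, key=fit))      # tournament winner
        for i in range(len(child)):
            if rng.random() < pm:
                child[i] ^= 1                 # bit-flip mutation
        new.append(child)
    return new

def migrate(islands, rng):
    """Ring migration: each island sends a copy of its best individual
    to the next island, replacing a random individual there."""
    bests = [max(pop, key=sum) for pop in islands]
    for i, pop in enumerate(islands):
        pop[rng.randrange(len(pop))] = list(bests[i - 1])

rng = random.Random(42)
islands = [[[rng.randint(0, 1) for _ in range(16)] for _ in range(10)]
           for _ in range(4)]
for gen in range(30):
    islands = [evolve_island(pop, rng) for pop in islands]
    if gen % 5 == 4:          # infrequent migration, every 5 generations
        migrate(islands, rng)
print(max(sum(ind) for pop in islands for ind in pop))
```

The migration interval and the number of migrants are exactly the parameters whose effects, as noted above, are not yet fully understood.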

It is important to emphasize that while the master-slave parallelization method does not affect the behaviour of the algorithm, the last two methods change the way the GA works. For example, in master-slave PGAs selection takes into account all the population, but in the other two PGAs selection only considers a subset of individuals. Also, in the master-slave PGA any two individuals in the population can mate (i.e., there is random mating), but in the other methods mating is restricted to a subset of individuals.

The final method to parallelize GAs combines multiple demes with master-slave or fine-grained GAs. We call this class of algorithms hierarchical PGAs, because at a higher level they are multiple-deme algorithms with single-population PGAs (either master-slave or fine-grained) at the lower level. A hierarchical PGA combines the benefits of its components, and it promises better performance than any of them alone.

Master-slave parallelization: This section reviews the master-slave (or global) parallelization method. The algorithm uses a single population, and the evaluation of the individuals and/or the application of genetic operators are done in parallel. As in the serial GA, each individual may compete and mate with any other (thus selection and mating are global). Global PGAs are usually implemented as master-slave programs, where the master stores the population and the slaves evaluate the fitness.

The most common operation that is parallelized is the evaluation of the individuals, because the fitness of an individual is independent from the rest of the population, and there is no need to communicate during this phase. The evaluation of individuals is parallelized by assigning a fraction of the population to each of the processors available. Communication occurs only as each slave receives its subset of individuals to evaluate and when the slaves return the fitness values. If the algorithm stops and waits to receive the fitness values for all the population before proceeding into the next generation, then the algorithm is synchronous. A synchronous master-slave GA has exactly the same properties as a simple GA, with speed being the only difference. However, it is also possible to implement an asynchronous master-slave GA where the algorithm does not stop to wait for any slow processors, but it does not work exactly like a simple GA. Most global PGA implementations are synchronous, and the rest of this discussion assumes that global PGAs carry out exactly the same search as simple GAs.
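The synchronous evaluation step described above can be sketched as follows. This is an illustrative Python sketch (the names `fitness` and `evaluate_master_slave` are ours, not from the text), using threads to stand in for slave processors; a real PGA would typically use separate processes or MPI workers.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def fitness(individual):
    # Toy onemax objective; stands in for an expensive fitness function
    # whose cost justifies parallel evaluation.
    return sum(individual)

def evaluate_master_slave(population, n_slaves=4):
    """Synchronous master-slave evaluation: the master partitions the
    population among the slaves, waits for ALL fitness values, and only
    then lets the (otherwise serial) GA proceed to the next generation."""
    chunk = (len(population) + n_slaves - 1) // n_slaves
    parts = [population[i:i + chunk] for i in range(0, len(population), chunk)]
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        evaluated = pool.map(lambda part: [fitness(ind) for ind in part], parts)
    return [f for part in evaluated for f in part]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
scores = evaluate_master_slave(pop)
```

Because the master blocks until every slave reports back, the search trajectory is identical to that of a simple GA; only the wall-clock time differs.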

The global parallelization model does not assume anything about the underlying computer architecture, and it can be implemented efficiently on shared-memory and distributed-memory computers. On a shared-memory multiprocessor, the population could be stored in shared memory, and each processor can read the individuals assigned to it and write the evaluation results back without any conflicts.

On a distributed-memory computer, the population can be stored in one processor. This "master" processor would be responsible for explicitly sending the individuals to the other processors (the "slaves") for evaluation, collecting the results, and applying the genetic operators to produce the next generation. The number of individuals assigned to any processor may be constant, but in some cases (like in a multiuser environment where the utilization of processors is variable) it may be

## Page 246

Chapter 9: Genetic Algorithm

necessary to balance the computational load among the processors by using a dynamic scheduling algorithm (e.g., guided self-scheduling).

Multiple-deme parallel GAs: The important characteristics of multiple-deme PGAs are the use of a few relatively large subpopulations and migration. Multiple-deme GAs are the most popular parallel method, and many papers have been written describing innumerable aspects and details of their implementation.

Probably the first systematic study of PGAs was Grosso's dissertation. His objective was to simulate the interaction of several parallel subcomponents of an evolving population. Grosso simulated diploid individuals (so there were two subcomponents for each "gene"), and the population was divided into five demes. Each deme exchanged individuals with all the others with a fixed migration rate.

With controlled experiments, Grosso found that the improvement of the average population fitness was faster in the smaller demes than in a single large panmictic population. This confirms a long-held principle in population genetics: favorable traits spread faster when the demes are small than when the demes are large. However, he also observed that when the demes were isolated, the rapid rise in fitness stopped at a lower fitness value than with the large population. In other words, the quality of the solution found after convergence was worse in the isolated case than in the single population.

With a low migration rate, the demes still behaved independently and explored different regions of the search space. The migrants did not have a significant effect on the receiving deme, and the quality of the solutions was similar to the case where the demes were isolated. However, at intermediate migration rates the divided population found solutions similar to those found in the panmictic population. These observations indicate that there is a critical migration rate below which the performance of the algorithm is obstructed by the isolation of the demes, and above which the partitioned population finds solutions of the same quality as the panmictic population.

It is interesting that such important observations were made so long ago, at the same time that other systematic studies of PGAs were underway. For example, Tanese proposed a PGA with the demes connected on a four-dimensional hypercube topology. In Tanese's algorithm, migration occurred at fixed intervals between processors along one dimension of the hypercube. The migrants were chosen probabilistically from the best individuals in the subpopulation, and they replaced the worst individuals in the receiving deme. Tanese carried out three sets of

## Page 247

experiments. In the first, the interval between migrations was set to five generations, and the number of processors varied. In tests with two migration rates

and varying the number of processors, the PGA found results of the same quality as the serial GA. However, it is difficult to see from the experimental results whether the PGA found the solutions sooner than the serial GA, because the range of the times is too large. In the second set of experiments, Tanese varied the mutation and crossover rates in each deme, attempting to find parameter values to balance exploration and exploitation. The third set of experiments studied the effect of the exchange frequency on the search, and the results showed that migrating too frequently or too infrequently degraded the performance of the algorithm.

Multiple-deme PGAs are popular for several reasons:

1. Multiple-deme GAs seem like a simple extension of the serial GA. The recipe is simple: take a few conventional (serial) GAs, run each of them on a node of a parallel computer, and at some predetermined times exchange a few individuals.

2. There is relatively little extra effort needed to convert a serial GA into a multiple-deme GA. Most of the program of the serial GA remains the same, and only a few subroutines need to be added to implement migration.

3. Coarse-grain parallel computers are easily available, and even when they are not, it is easy to simulate one with a network of workstations or even on a single processor using free software (like MPI or PVM).

There are a few important issues noted from the above sections. For example, PGAs are very promising in terms of the gains in performance. Also, PGAs are more complex than their serial counterparts. In particular, the migration of individuals from one deme to another is controlled by several parameters, like (a) the topology that defines the connections between the subpopulations, (b) a migration rate that controls how many individuals migrate, and (c) a migration interval that affects the frequency of migration. In the late 1980s and early 1990s, research on PGAs began to explore alternatives to make PGAs faster and to understand better how they worked.

Around this time the first theoretical studies on PGAs began to appear, and the empirical research attempted to identify favorable parameters. This section reviews some of that early theoretical work and experimental studies on migration and topologies. Also in this period, more researchers began to use multiple-population GAs to solve application problems, and this section ends with a brief review of their work.

## Page 248


One of the directions in which the field matured is that PGAs began to be tested with very large and difficult test functions.

Fine-grained PGAs: The development of massively parallel computers triggered a new approach to PGAs. To take advantage of new architectures with an even greater number of processors and lower communication costs, fine-grained PGAs have been developed. The population is now partitioned into a large number of very small subpopulations. The limiting (and maybe ideal) case is to have just one individual for every processing element available.

Basically, the population is mapped onto a connected processor graph, usually one individual on each processor. (It also works with more than one individual on each processor; in this case, it is preferable to choose a multiple of the number of processors for the population size.) Mating is only possible between neighboring individuals, i.e., individuals stored on neighboring processors. The selection is also done in a neighborhood of each individual, and so depends only on local information. A motivation behind local selection is biological: in nature there is no global selection; instead, natural selection is a local phenomenon, taking place in an individual's local environment.

If we want to compare this model to the island model, each neighborhood can be considered as a different deme. But here the demes overlap, providing a way to disseminate good solutions across the entire population. Thus, the topology does not need to explicitly define migration roads and a migration rate.

It is common to place the population on a two-dimensional or three-dimensional torus grid, because in many massively parallel computers the processing elements are connected using this topology. Consequently, each individual has four neighbors. Experimentally, it seems that good results can be obtained using a topology with a medium diameter and neighborhoods that are not too large. As with the coarse-grained models, it is worth trying to simulate this model even on a single processor to improve the results. Indeed, when the population is stored in a grid like this, after a few generations different optima can appear in different places on the grid.

To sum up, with parallelization of GAs, all the different models proposed, and all the new models we can imagine by mixing them, demonstrate how well GAs are adapted to parallel computation. In fact, the many implementations reported in the literature may even be confusing. We really need to understand what truly affects the performance of PGAs.

Fine-grained PGAs have only one population, but they have a spatial structure that limits the interactions between individuals. An individual can only compete and mate

## Page 249


with its neighbors; but since the neighborhoods overlap, good solutions may disseminate across the entire population.

Robertson parallelized the GA of a classifier system on a Connection Machine 1. He parallelized the selection of parents, the selection of classifiers to replace, mating, and crossover. The execution time of his implementation was independent of the number of classifiers (up to 16K, the number of processing elements in the CM-1).
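One generation of the fine-grained scheme described above can be sketched as follows. This is an illustrative sketch under our own assumptions: the neighbourhood policy (mate the resident with the fittest of its four torus neighbours, then keep the better of resident and offspring) is one common choice among many, and all names are ours.

```python
import random

def neighbours(i, j, n):
    # Von Neumann neighbourhood on an n x n torus: up, down, left, right.
    return [((i - 1) % n, j), ((i + 1) % n, j),
            (i, (j - 1) % n), (i, (j + 1) % n)]

def fine_grained_step(grid, fitness, rng):
    """One synchronous generation of a fine-grained PGA on a torus grid:
    every cell mates its resident with the best of its four neighbours
    (uniform crossover) and keeps the better of resident and offspring."""
    n = len(grid)
    new = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            me = grid[i][j]
            mate = max((grid[a][b] for a, b in neighbours(i, j, n)), key=fitness)
            child = [rng.choice(pair) for pair in zip(me, mate)]
            new[i][j] = max(me, child, key=fitness)  # local replacement
    return new

rng = random.Random(4)
fitness = sum                                        # onemax on bit lists
grid = [[[rng.randint(0, 1) for _ in range(6)] for _ in range(4)]
        for _ in range(4)]
grid2 = fine_grained_step(grid, fitness, rng)
```

Because selection and replacement use only the four neighbours, good genes spread gradually through the overlapping neighbourhoods rather than instantly through the whole population.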

Hierarchical parallel algorithms: A few researchers have tried to combine two of the methods to parallelize GAs, producing hierarchical PGAs. Some of these new hybrid algorithms add a new degree of complexity to the already complicated scene of PGAs, but other hybrids manage to keep the same complexity as one of their components. When two methods of parallelizing GAs are combined, they form a hierarchy. At the upper level, most of the hybrid PGAs are multiple-population algorithms.

Some hybrids have a fine-grained GA at the lower level (see Figure 9-43). For example, Gruau invented a "mixed" PGA. In his algorithm, the population of each deme was placed on a two-dimensional grid, and the demes themselves were connected as a two-dimensional torus. Migration between demes occurred at regular intervals, and good results were reported for a novel neural network design and training application.

Another type of hierarchical PGA uses a master-slave GA on each of the demes of a multi-population GA (see Figure 9-44). Migration occurs between demes, and the evaluation of the individuals is handled in parallel. This approach does not introduce new analytic problems, and it can be useful when working with complex applications with objective functions that need a considerable amount of computation time.

Figure 9-43 A hierarchical GA combines a multiple-deme GA (at the upper level) and a fine-grained GA (at the lower level).


## Page 250


Figure 9-44 A schematic of a hierarchical PGA. At the upper level this hybrid is a multi-deme PGA where each node is a master-slave GA.

Figure 9-45 This hybrid uses multiple-deme GAs at both the upper and the lower levels. At the lower level the migration rate is faster and the communications topology is much denser than at the upper level.

Bianchini and Brown presented an example of this method of hybridizing PGAs, and showed that it can find a solution of the same quality as a master-slave PGA or a multiple-deme GA in less time.

Interestingly, a very similar concept was invented by Goldberg in the context of an object-oriented implementation of a "community model" PGA. In each "community" there are multiple houses where parents reproduce and the offspring are evaluated. Also, there are multiple communities, and it is possible that individuals migrate to other places.


## Page 251


A third method of hybridizing PGAs is to use multiple-deme GAs at both the upper and the lower levels (see Figure 9-45). The idea is to force panmictic mixing at the lower level by using a high migration rate and a dense topology, while a low migration rate is used at the high level. The complexity of this hybrid would be equivalent to a multiple-population GA if we consider the groups of panmictic subpopulations as a single deme. This method has not been implemented yet.

Hierarchical implementations can reduce the execution time more than any of their components alone.

9.14.4.3 Coarse-Grained PGAs - The Island Model

The second class of PGA is once again inspired by nature. The population is now divided into a few subpopulations or demes, and each of these relatively large demes evolves separately on different processors. Exchange between subpopulations is possible via a migration operator. The term island model is easily understandable; the GA behaves as if the world were constituted of islands where populations evolve isolated from each other. On each island the population is free to converge toward different optima. The migration operator allows mixing ("metissage") of the different subpopulations and is supposed to mix good features that emerge locally in the different demes.

We can notice that this time the nature of the algorithm changes. An individual can no longer breed with any other from the entire population, but only with individuals of the same island. Amazingly, even though this algorithm has been developed to be used on several processors, it is worth simulating it sequentially on one processor. It has been shown on a few problems that better results can be achieved using this model.

This algorithm is able to give different suboptimal solutions, and in many problems this is an advantage if we need to determine a kind of landscape in the search space to know where the good solutions are located. Another great advantage of the island model is that the population in each island can evolve with different rules. That can be used for multicriterion optimization. On each island, selection can be made according to different fitness functions, representing different criteria. For example, it can be useful to have as many islands as criteria, plus another central island where selection is done with a multicriterion fitness function. The migration operator allows individuals to move between islands and, therefore, to mix criteria.

In the literature this model is sometimes also referred to as the coarse-grained PGA. (In parallelism, grain size refers to the ratio of time spent in computation to time spent in communication; when the ratio is high, the processing is called coarse-grained.) Sometimes we can also find the term "distributed" GA, since they are usually implemented on distributed-memory machines (MIMD computers).

## Page 252


Technically, there are three important features in the coarse-grained PGA: the topology that defines connections between subpopulations, the migration rate that controls how many individuals migrate, and the migration intervals that affect how often migration occurs. Even though a lot of work has been done to find optimal topology and migration parameters, intuition is still used here more often than analysis, with quite good results.

Many topologies can be defined to connect the demes, but the most common models are the island model and the stepping-stone model. In the basic island model, migration can occur between any subpopulations, whereas in the stepping-stone model demes are disposed on a ring and migration is restricted to neighbouring demes. Work has shown that the topology of the space is not so important as long as it has high connectivity and a small diameter to ensure adequate mixing as time proceeds.

Choosing the right time for migration and which individuals should migrate appears to be more complicated. Quite a lot of work has been done on this subject, and the problems come from the following dilemmas. We can observe that species converge quickly in small isolated populations. Nevertheless, migrations should occur only after a time long enough to allow the development of good characteristics in each subpopulation. It also appears that immigration is a trigger for evolutionary changes. If migration occurs after each new generation, the algorithm is more or less equivalent to a sequential GA with a larger population. In practice, migration occurs either after a fixed number of iterations in each deme or at uniform periods of time. Migrants are usually selected randomly from the best individuals in the population, and they replace the worst in the receiving deme. In fact, intuition is still mainly used to fix the migration rate and migration intervals; there is absolutely nothing rigid, and each personal cooking recipe may give good results.
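The migration scheme just described (best individuals emigrate, worst are replaced, stepping-stone ring topology) can be sketched as follows; the function name, the copy-based emigration, and the use of onemax fitness are our illustrative assumptions.

```python
import random

def migrate_ring(demes, fitness, n_migrants=1):
    """One migration event on a ring (stepping-stone) topology: each deme
    sends copies of its best n_migrants individuals to its right-hand
    neighbour, where they replace the worst individuals."""
    outgoing = []
    for deme in demes:
        best = sorted(deme, key=fitness, reverse=True)[:n_migrants]
        outgoing.append([ind[:] for ind in best])   # emigrants are copies
    for i, deme in enumerate(demes):
        deme.sort(key=fitness)                      # worst individuals first
        deme[:n_migrants] = outgoing[(i - 1) % len(demes)]
    return demes

fitness = sum                                       # onemax on bit lists
random.seed(1)
demes = [[[random.randint(0, 1) for _ in range(6)] for _ in range(5)]
         for _ in range(4)]
pre_best = [max(fitness(ind) for ind in d) for d in demes]
migrate_ring(demes, fitness)
```

In a full island GA this event would be interleaved with ordinary generations, with the migration interval controlling how many generations pass between events.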

9.14.5 Independent Sampling Genetic Algorithm (ISGA)

In the independent sampling phase, we design a core scheme, named the "Building Block Detecting Strategy" (BBDS), to extract relevant building block information of a fitness landscape. In this way, an individual is able to sequentially construct more highly fit partial solutions. For Royal Road R1, the global optimum can be attained easily. For other more complicated fitness landscapes, we allow a number of individuals to adopt the BBDS and independently evolve in parallel so that each schema region can be given samples independently. During this phase, the population is expected to be seeded with promising genetic material. Then follows the breeding phase, in which individuals are paired for breeding based on two mate-selection schemes (Huang, 2001): individuals being assigned mates by natural

## Page 253


selection only, and individuals being allowed to actively choose their mates. In the latter case, individuals are able to distinguish candidate mates that have the same fitness yet different string structures, which may lead to quite different performance after crossover. This is not achievable by natural selection alone, since it assigns individuals of the same fitness the same probability of being mates, without explicitly taking into account string structures. In short, in the breeding phase individuals manage to construct even more promising schemata through the recombination of highly fit building blocks found in the first phase. Owing to the characteristic of independent sampling of building blocks that distinguishes the proposed GAs from conventional GAs, we name this type of GA independent sampling genetic algorithms (ISGAs).

9.14.5.1 Comparison of ISGA with PGA

The independent sampling phase of ISGAs is similar to the fine-grained PGAs in the sense that each individual evolves autonomously, although ISGAs do not adopt the population structure. In a fine-grained PGA, an initial population is randomly generated; then, in every cycle, each individual does local hill climbing and creates the next population by mating with a partner in its neighborhood, replacing parents if the offspring are better. By contrast, ISGAs partition the genetic processing into two phases: the independent sampling phase and the breeding phase, as described in the preceding section. Third, the approach employed by each individual for improvement in ISGAs is different from that of the PGAs. During the independent sampling phase of ISGAs, in each cycle, through the BBDS, each individual attempts to extract relevant information of potential building blocks whenever its fitness increases. Then, based on the schema information accumulated, individuals continue to construct more complicated building blocks. However, the individuals of fine-grained PGAs adopt a local hill-climbing algorithm that does not manage to extract relevant information of potential schemata.

The motivation for the two-phased ISGAs came partially from the messy genetic algorithms (mGAs). The two stages employed in the mGAs are the "primordial phase" and the "juxtapositional phase," in which the mGAs first emphasize candidate building blocks based on a guess at the order k of small schemata, then juxtapose them to build up global optima in the second phase by "cut" and "splice" operators. However, in the first phase, the mGAs still adopt centralized selection to emphasize some candidate schemata; this in turn results in the loss of samples of other potentially promising schemata. By contrast, ISGAs manage to postpone the emphasis of candidate building blocks to the latter stage, and highlight the feature of independent sampling of building blocks to suppress hitchhiking in the first

## Page 254


phase. As a result, the population is more diverse and implicit parallelism can be fulfilled to a larger degree. Thereafter, during the second phase, ISGAs implement population breeding through two mate-selection schemes, as discussed in the preceding section. In the following subsections, we present the key components of ISGAs in detail and show the comparisons between the experimental results of the ISGAs and those of several other GAs on two benchmark test functions.

9.14.5.2 Components of ISGAs

ISGAs are divided into two phases: the independent sampling phase and the breeding phase. We describe them as follows.

Independent sampling phase: To implement independent sampling of various building blocks, a number of strings are allowed to evolve in parallel, and each individual searches for a possible evolutionary path entirely independently of the others. In this section, we develop a new searching strategy, BBDS, for each individual to evolve based on the accumulated knowledge of potentially useful building blocks. The idea is to allow each individual to probe valuable information concerning beneficial schemata by testing for fitness increases, since each fitness increase of a string could come from the presence of useful building blocks on it. In short, by systematically testing each bit to examine whether that bit is associated with the fitness increase during each cycle, a cluster of bits constituting potentially beneficial schemata will be uncovered. Iterating this process guarantees the formation of longer and longer candidate building blocks.

The operation of BBDS on a string can be described as follows:

1. Generate an empty set for collecting genes of candidate schemata, and create an initial string with uniform probability for each bit until its fitness exceeds 0. (Record the current fitness as Fit.)

2. Except for the genes of candidate schemata already collected, flip, from left to right, all the other bits, one at a time, and evaluate the resulting string. If the resulting fitness is less than Fit, record this bit's position and original value as a gene of candidate schemata.

3. Except for the genes recorded, randomly generate all the other bits of the string until the resulting string's fitness exceeds Fit. Replace Fit by the new fitness.

4. Go to steps 2 and 3 until some end criterion is met. The idea of this strategy is that the cooperation of certain genes (bits) makes for good fitness.
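The four steps above can be sketched as follows on a toy Royal-Road-style function. This is an illustrative sketch: `royal_road`, `bbds`, the 200-trial cap in step 3, and the cycle cap are our own choices, not part of the original specification.

```python
import random

def royal_road(bits, block=2):
    # Toy Royal-Road-style fitness: one point per contiguous all-ones block.
    return sum(1 for i in range(0, len(bits), block) if all(bits[i:i + block]))

def bbds(fitness, n_bits, rng, max_cycles=50):
    """Sketch of the Building Block Detecting Strategy (first version)."""
    # Step 1: random strings until fitness exceeds 0; record it as Fit.
    while True:
        s = [rng.randint(0, 1) for _ in range(n_bits)]
        fit = fitness(s)
        if fit > 0:
            break
    genes = {}                          # position -> recorded original value
    for _ in range(max_cycles):
        # Step 2: flip each uncollected bit; a fitness drop marks the bit
        # as a gene of a candidate schema.
        for i in range(n_bits):
            if i in genes:
                continue
            s[i] ^= 1
            if fitness(s) < fit:
                genes[i] = 1 - s[i]     # the original (unflipped) value
            s[i] ^= 1                   # restore the bit
        if len(genes) == n_bits:
            break                       # every bit belongs to a schema
        # Step 3: keep recorded genes fixed, randomise the rest until the
        # fitness exceeds Fit (give up after 200 trials).
        for _ in range(200):
            trial = [genes.get(i, rng.randint(0, 1)) for i in range(n_bits)]
            if fitness(trial) > fit:
                s, fit = trial, fitness(trial)
                break
        else:
            break                       # no further increase found
    return s, fit

best, best_fit = bbds(royal_road, 6, random.Random(2))
```

On this small landscape the strategy pins down one complete block per cycle and reaches the all-ones optimum, illustrating how the collected genes grow into longer and longer building blocks.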

## Page 255


Once these genes come in sight simultaneously, they contribute a fitness increase to the string containing them; thus any loss of one of these genes leads to a fitness decrease of the string. This is essentially what step 2 does, and after this step we should be able to collect a set of genes of candidate schemata. Then, at step 3, we keep the collected genes of candidate schemata fixed and randomly generate the other bits, awaiting other building blocks to appear and bring forth another fitness increase.

However, step 2 in this strategy only emphasizes the fitness drop due to a particular bit. It ignores the possibility that the same bit leads to a new fitness rise, because many loci could interact in an extremely nonlinear fashion. To take this into account, the second version of BBDS is introduced through the following change in step 2.

Step 2: Except for the genes of candidate schemata already collected, flip, from left to right, all the other bits, one at a time, and evaluate the resulting string. If the resulting fitness is less than Fit, record this bit's position and original value as a gene of candidate schemata. If the resulting fitness exceeds Fit, substitute this bit's new value for the old value, replace Fit by this new fitness, record this bit's position and new value as a gene of candidate schemata, and re-execute this step.

Because this version of BBDS takes into consideration the fitness increase resulting from that particular bit, it is expected to take less time for detecting. Other versions of BBDS are of course possible. For example, in step 2, if the same bit results in a fitness increase, it can be recorded as a gene of candidate schemata, and the procedure continues to test the residual bits without completely traveling back to the first bit to re-examine each bit. However, the empirical results obtained thus far indicate that the performance of this alternative is quite similar to that of the second version. More experimental results are needed to distinguish the difference between them.

The overall implementation of the independent sampling phase of ISGAs is through the proposed BBDS, to get autonomous evolution of each string until all individuals in the population have reached some end criterion.

Breeding phase: After the independent sampling phase, individuals have independently built up their own evolutionary avenues from various building blocks. Hence the population is expected to contain diverse beneficial schemata, and premature convergence is alleviated to some degree. However, factors such as deception and incompatible schemata (i.e., two schemata having different bit values at common defining positions) could still lead individuals to arrive at suboptimal regions of a

## Page 256


fitness landscape. Since building blocks that allow some strings to leave suboptimal regions may be embedded in other strings, the search for proper mating partners and then exploiting the building blocks on them are critical for overcoming the difficulty of strings being trapped in undesired regions. In Huang (2001) the importance of mate selection has been investigated, and the results showed that GAs are able to improve their performance when the individuals are allowed to select mates to a larger degree.

In this section, we adopt two mate-selection schemes analyzed in Huang (2001) to breed the population: individuals being assigned mates by natural selection only, and individuals being allowed to actively choose their mates. Since natural selection assigns strings of the same fitness the same probability of being parents, individuals of identical fitness yet distinct string structures are treated equally. This may result in a significant loss of performance improvement after crossover.

We adopt the tournament selection scheme (Mitchell, 1996) as the role of natural selection, and the mechanism for choosing mates in the breeding phase is as follows. During each mating event, a binary tournament selection with probability 1.0 is performed to select the first individual as the fitter of two randomly sampled individuals; the partner is then chosen according to one of the following schemes:

1. Run the binary tournament selection again to choose the partner.

2. Run the binary tournament selection another two times to choose two highly fit candidate partners; then the one more dissimilar to the first individual is selected for mating.

The implementation of the breeding phase is through iterating breeding cycles, each of which consists of: (a) two parents are obtained on the basis of the mate-selection schemes above; (b) the two-point crossover operator (crossover rate 1.0) is applied to these parents; (c) both parents are replaced with the two offspring if either of the offspring is better than them. Steps (a), (b) and (c) are repeated until the population size is reached, and this constitutes one breeding cycle.
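One mating event combining binary tournament selection with the second (dissimilar-mate) scheme can be sketched as follows. Hamming distance as the dissimilarity measure and all function names are our assumptions for illustration.

```python
import random

def tournament(pop, fitness, rng):
    # Binary tournament with probability 1.0: the fitter of two
    # randomly sampled individuals always wins.
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a] if fitness(pop[a]) >= fitness(pop[b]) else pop[b]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def choose_dissimilar_mate(first, pop, fitness, rng):
    """Scheme 2: run two more tournaments and mate the winner that is
    more dissimilar (in Hamming distance) to the first parent."""
    c1 = tournament(pop, fitness, rng)
    c2 = tournament(pop, fitness, rng)
    return c1 if hamming(first, c1) >= hamming(first, c2) else c2

def two_point_crossover(p1, p2, rng):
    # Two cut points; the middle segment is swapped between the parents.
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    return (p1[:i] + p2[i:j] + p1[j:],
            p2[:i] + p1[i:j] + p2[j:])

rng = random.Random(3)
fitness = sum                              # onemax on bit lists
pop = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
p1 = tournament(pop, fitness, rng)
p2 = choose_dissimilar_mate(p1, pop, fitness, rng)
c1, c2 = two_point_crossover(p1, p2, rng)
```

Choosing the more dissimilar of two equally fit candidates is what lets the scheme distinguish mates that natural selection alone would treat as interchangeable.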

9.14.6 Real-Coded Genetic Algorithms

The variant of GAs for real-valued optimization that is closest to the original GA is the so-called real-coded GA. Let us assume that we are dealing with a free N-dimensional real-valued optimization problem, which means X = R^N without constraints. In a real-coded GA, an individual is then represented as an N-dimensional vector of real numbers:

b = (x1, ..., xN)

## Page 257


As selection does not involve the particular coding, no adaptation needs to be made; all selection schemes discussed so far are applicable without any restriction. What has to be adapted to this special structure are the genetic operations crossover and mutation.

9.14.6.1 Crossover Operators for Real-Coded GAs

So far, the following crossover schemes are most common for real-coded GAs:

Flat crossover: Given two parents b1 = (x11, ..., x1N) and b2 = (x21, ..., x2N), a vector of random values from the unit interval (λ1, ..., λN) is chosen, and the offspring b' = (x'1, ..., x'N) is computed as a vector of linear combinations in the following way (for all i = 1, ..., N):

x'i = λi · x1i + (1 − λi) · x2i

BLX-α crossover is an extension of flat crossover, which allows an offspring allele to also be located outside the interval

[min(x1i, x2i), max(x1i, x2i)]

In BLX-α crossover, each offspring allele is chosen as a uniformly distributed random value from the interval

[min(x1i, x2i) − I·α, max(x1i, x2i) + I·α]

where I = max(x1i, x2i) − min(x1i, x2i). The parameter α has to be chosen in advance. For α = 0, BLX-α crossover becomes identical to flat crossover.

Simple crossover is nothing else but classical one-point crossover for real vectors, i.e., a crossover site k ∈ {1, ..., N−1} is chosen and two offspring are created in the following way:

b'1 = (x11, ..., x1k, x2k+1, ..., x2N)

b'2 = (x21, ..., x2k, x1k+1, ..., x1N)

Discrete crossover is analogous to classical uniform crossover for real vectors. An

offspring b of the two parents b1 and b2 is composed from alleles, which are

randomly chosen either as x1i or x2i.
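The four crossover schemes above can be sketched in Python. This is a minimal illustration, not from the text: individuals are plain lists of floats, and all function and parameter names are my own.

```python
import random

def flat_crossover(p1, p2):
    # Each offspring allele is a random convex combination of the parents' alleles.
    lams = [random.random() for _ in p1]
    return [l * a + (1 - l) * b for l, a, b in zip(lams, p1, p2)]

def blx_alpha_crossover(p1, p2, alpha=0.5):
    # Each allele is drawn uniformly from the parent interval widened by alpha*I on both sides.
    child = []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        i = hi - lo
        child.append(random.uniform(lo - alpha * i, hi + alpha * i))
    return child

def simple_crossover(p1, p2, k):
    # Classical one-point crossover at site k (1 <= k <= N-1).
    return p1[:k] + p2[k:], p2[:k] + p1[k:]

def discrete_crossover(p1, p2):
    # Uniform crossover: each allele is copied from a randomly chosen parent.
    return [random.choice(pair) for pair in zip(p1, p2)]
```

Note that flat crossover (and BLX-α with α = 0) can never leave the box spanned by the two parents, which is exactly why the widened BLX-α interval is useful in practice.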

9.14.6.2 Mutation Operators for Real-Coded GAs

The following mutation operators are most common for real-coded GAs:

1. Random mutation: For a randomly chosen gene i of an individual b = (x1, ..., xN), the allele xi is replaced by a randomly chosen value from a predefined interval [ai, bi].

## Page 258

Chapter 9: Genetic Algorithm

2. Nonuniform mutation: In nonuniform mutation, the possible impact of mutation decreases with the number of generations. Assume that t_max is the predefined maximum number of generations. Then, with the same setup as in random mutation, the allele xi is replaced by one of the two values

x~i = xi + Δ(t, bi − xi)
x~i = xi − Δ(t, xi − ai)

The choice as to which of the two is taken is determined by a random experiment with two outcomes that have equal probabilities 1/2 and 1/2. The random variable Δ(t, x) determines a mutation step from the range [0, x] in the following way:

Δ(t, x) = x · (1 − λ^((1 − t/t_max)^r))

In this formula, λ is a uniformly distributed random value from the unit interval. The parameter r determines the influence of the generation index t on the distribution of mutation step sizes over the interval [0, x].
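Both mutation operators can be sketched as follows. Again this is an illustrative sketch with my own names; the bounds list and the default r are assumptions, not from the text.

```python
import random

def random_mutation(b, bounds):
    # Replace the allele of one randomly chosen gene by a uniform
    # value from its predefined interval [a_i, b_i].
    b = list(b)
    i = random.randrange(len(b))
    lo, hi = bounds[i]
    b[i] = random.uniform(lo, hi)
    return b

def delta(t, x, t_max, r=2.0):
    # Mutation step from [0, x]; it shrinks to 0 as t approaches t_max.
    lam = random.random()
    return x * (1.0 - lam ** ((1.0 - t / t_max) ** r))

def nonuniform_mutation(b, bounds, t, t_max, r=2.0):
    b = list(b)
    i = random.randrange(len(b))
    lo, hi = bounds[i]
    if random.random() < 0.5:       # two outcomes, probability 1/2 each
        b[i] = b[i] + delta(t, hi - b[i], t_max, r)
    else:
        b[i] = b[i] - delta(t, b[i] - lo, t_max, r)
    return b
```

Because Δ(t, x) never exceeds x, a nonuniformly mutated allele always stays inside its interval, and at t = t_max the mutation step vanishes entirely.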

9.15 Holland Classifier Systems

A Holland classifier system is a classifier system of the Michigan type which processes binary messages of a fixed length through a rule base whose rules are adapted according to the response of the environment.

9.15.1 The Production System

First of all, the communication of the production system with the environment is done via an arbitrarily long list of messages. The detectors translate responses from the environment into binary messages and place them on the message list, which is then scanned and changed by the rule base. Finally, the effectors translate output messages into actions on the environment, such as forces or movements.

Messages are binary strings of the same length k. More formally, a message belongs to {0, 1}^k. The rule base consists of a fixed number (m) of rules (classifiers) which consist of a fixed number (r) of conditions and an action, where both conditions and actions are strings of length k over the alphabet {0, 1, *}. The asterisk plays the role of a wildcard, a 'don't care' symbol.

A condition is matched if and only if there is a message in the list which matches the condition in all non-wildcard positions. Moreover, conditions, except the first one, may be negated by adding a '-' prefix. Such a prefixed condition is satisfied if and only if there is no message in the list which matches the string associated with the condition. Finally, a rule fires if and only if all the conditions are satisfied, i.e.,

## Page 259


the conditions are connected with AND. Such 'firing' rules compete to put their action messages on the message list.

In the action parts, the wildcard symbols have a different meaning. They take the role of a 'pass through' element. The output message of a firing rule, whose action part contains a wildcard, is composed from the non-wildcard positions of the action and the message which matches the first condition; this is actually the reason why negations of the first condition are not allowed. More formally, the outgoing message m' is defined as

m'[i] = a[i] if a[i] ≠ *,   m'[i] = m[i] otherwise,

where a is the action part of the classifier and m is the message which matches the first condition. Formally, a classifier is a string of the form

Cond1, ['-']Cond2, ..., ['-']Cond_r / Action

where the brackets should express the optionality of the '-' prefixes. Depending on the concrete needs of the task to be solved, it may be desirable to allow messages to be preserved for the next step. More specifically, if a message is not interpreted and removed by the effectors interface, it can make another classifier fire in the next step. In practical applications, this is usually accomplished by reserving a few bits of the messages for identifying the origin of the messages (a kind of variable index called a tag).

Tagging offers new opportunities to transfer information about the current step into the next step, simply by placing tagged messages on the list which are not interpreted by the output interface. These messages, which obviously contain information about the previous step, can support the decisions in the next step. Hence, appropriate use of tags permits rules to be coupled to act sequentially. In some sense, such messages are the memory of the system.
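The matching rules and the 'pass through' behaviour of wildcards in the action part can be captured in a few lines of Python. This is a sketch with hypothetical function names; conditions, actions and messages are strings over {0, 1, *}.

```python
def matches(condition, message):
    # A condition matches a message if they agree on every non-wildcard position.
    return all(c == m or c == '*' for c, m in zip(condition, message))

def condition_satisfied(condition, message_list, negated=False):
    # A plain condition needs at least one matching message on the list;
    # a '-'-prefixed (negated) condition needs no matching message at all.
    hit = any(matches(condition, m) for m in message_list)
    return not hit if negated else hit

def output_message(action, matched_message):
    # Wildcards in the action part pass through the corresponding bit of
    # the message that matched the classifier's first condition.
    return ''.join(m if a == '*' else a for a, m in zip(action, matched_message))
```

For example, the condition `1*0` matches the message `110` but not `011`, and the action `1*1` applied to the matched message `000` produces the output message `101`.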

A single execution cycle of the production system consists of the following steps:

1. Messages from the environment are appended to the message list.

2. All the conditions of all classifiers are checked against the message list to obtain the set of firing rules.

3. The message list is erased.

4. The firing classifiers participate in a competition to place their messages on the list.

5. The winning classifiers place their actions on the list.


## Page 260


6. The messages directed to the effectors are executed.

This procedure is repeated iteratively. How step 6 is done, whether these messages are deleted or not, and so on, depends on the concrete implementation. It is, on the one hand, possible to choose a representation such that the effectors can interpret each output message. On the other hand, it is possible to direct messages explicitly to the effectors with a special tag. If no messages are directed to the effectors, the system is in a thinking phase.

A classifier R1 is called a consumer of a classifier R2 if and only if there is a message m0 which fulfills at least one of R1's conditions and has been placed on the list by R2. Conversely, R2 is called a supplier of R1.

9.15.2 The Bucket Brigade Algorithm

As already mentioned, in each time step t, we assign a strength value u_{i,t} to each classifier R_i. This strength value represents the correctness and importance of a classifier. On the one hand, the strength value influences the chance of a classifier to place its action on the output list. On the other hand, the strength values are used by the rule discovery system, which we will soon discuss.

In Holland classifier systems, the adaptation of the strength values depending on the feedback (payoff) from the environment is done by the so-called bucket brigade algorithm. It can be regarded as a simulated economic system in which various agents, here the classifiers, participate in an auction, where the chance to buy the right to post the action depends on the strength of the agents.

The bid of classifier R_i at time t is defined as

B_{i,t} = c_L · u_{i,t} · s_i

where c_L ∈ [0, 1] is a learning parameter, similar to learning rates in artificial neural nets, and s_i is the specificity, the number of non-wildcard symbols in the condition part of the classifier. If c_L is chosen small, the system adapts slowly. If it is chosen too high, the strengths tend to oscillate chaotically. Then the rules have to compete for the right to place their output messages on the list. In the simplest case, this can be done by a random experiment like the selection in a genetic algorithm. For each bidding classifier it is decided randomly whether it wins or not, where the probability that it wins is proportional to its bid:

P(R_i wins) = B_{i,t} / Σ_{j ∈ Sat_t} B_{j,t}


## Page 261


In this equation, Sat_t is the set of indices of all classifiers which are satisfied at time t. Classifiers which get the right to post their output messages are called winning classifiers.

Obviously, in this approach more than one winning classifier is allowed. Of course, other selection schemes are reasonable, for instance, the highest bidding agent wins alone. This is necessary to avoid the conflict between two winning classifiers. Now let us discuss how payoff from the environment is distributed and how the strengths are adapted. For this purpose, let us denote the set of classifiers which have supplied a winning agent R_i in step t by Sup_{i,t}. Then the new strength of a winning agent is reduced by its bid and increased by its portion of the payoff P_t received from the environment:

u_{i,t+1} = u_{i,t} − B_{i,t} + P_t / w_t

where w_t is the number of winning agents in the actual time step. A winning agent pays its bid to its suppliers, which share the bid among each other equally in the simplest case:

u_{j,t+1} = u_{j,t} + B_{i,t} / |Sup_{i,t}|   for all R_j ∈ Sup_{i,t}

If a winning agent has also been active in the previous step and supplies another winning agent, the value above is additionally increased by one portion of the bid the consumer offers. In the case that two winning agents have supplied each other mutually, the portions of the bids are exchanged in the above manner. The strengths of all other classifiers R_n, which are neither winning agents nor suppliers of winning agents, are reduced by a certain factor (they pay a tax):

u_{n,t+1} = u_{n,t} · (1 − T)

T is a small value lying in the interval [0, 1]. The intention of taxation is to punish classifiers which never contribute anything to the output of the system. With this concept, redundant classifiers, which never become active, can be filtered out.
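The bid and the strength updates described above can be combined into one adaptation step. The following sketch is my own condensation (the function names and the dictionary-based bookkeeping of suppliers are assumptions, not from the text); it covers the bid payment, the payoff share, the transfer to suppliers, and the tax.

```python
def bid(u, s, c_l=0.1):
    # B_{i,t} = c_L * u_{i,t} * s_i
    return c_l * u * s

def bucket_brigade_update(strengths, spec, winners, suppliers, payoff,
                          c_l=0.1, tax=0.01):
    """One time step. winners: indices of winning classifiers;
    suppliers[i]: indices of the classifiers that supplied winner i."""
    new = list(strengths)
    w = len(winners)
    for i in winners:
        b = bid(strengths[i], spec[i], c_l)
        new[i] += payoff / w - b          # pay the bid, receive the payoff share
        sup = suppliers.get(i, [])
        for j in sup:                     # suppliers share the bid equally
            new[j] += b / len(sup)
    active = set(winners) | {j for s in suppliers.values() for j in s}
    for n in range(len(new)):
        if n not in active:               # everyone else pays the tax
            new[n] *= (1.0 - tax)
    return new
```

With c_l = 0.2, a single winning classifier of strength 100 and specificity 1, a payoff of 60 and one supplier, the winner moves to 100 − 20 + 60 = 140 and the supplier to 120, mirroring one link of the chain in Figure 9-46.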

The idea behind credit assignment in general, and the bucket brigade in particular, is to increase the strengths of rules which have set the stage for later successful actions. The problem of determining such classifiers, which were responsible for conditions under which it was later on possible to receive a high payoff, can be very difficult. Consider, for instance, the game of chess again, in which very early moves can be significant for a late success or failure. In fact, the bucket brigade algorithm can solve this problem, although strength is only transferred to the suppliers, which


## Page 262


were active in the previous step. Each time the same sequence is activated, however, a little bit of the payoff is transferred one step back in the sequence. It is easy to see that repeated successful execution of a sequence increases the strengths of all involved classifiers.

Figure 9-46 The bucket brigade principle.

Figure 9-46 shows a simple example of how the bucket brigade algorithm works. For simplicity, we consider a sequence of five classifiers which always bid 20% of their strength. Only after the fifth step, after the activation of the fifth classifier, a payoff of 60 is received. The further development of the strengths in this example is shown in Table 9-7. It is easy to see from this example that the reinforcement of the strengths is slow at the beginning, but it accelerates later. Exactly this property contributes much to the robustness of classifier systems: they tend to be cautious at the beginning, trying not to rush conclusions, but, after a certain number of similar situations, the system adopts the rules more and more.

It might be clear that a Holland classifier system only works if successful sequences of classifier activations are observed sufficiently often. Otherwise the bucket


## Page 263


brigade algorithm does not have a chance to reinforce the strengths of the successful sequence properly.

9.15.3 Rule Generation

The purpose of the rule discovery system is to eliminate low-fitness rules and to replace them by hopefully better ones. The fitness of a rule is simply its strength. Since the classifiers of a Holland classifier system themselves are strings, the application of a GA to the problem of rule induction is straightforward, though many variants are reasonable. Almost all variants have one thing in common: the GA is not invoked in each time step, but only every nth step, where n has to be set such that enough information about the performance of new classifiers can be obtained in the meantime. A. Geyer-Schulz, for instance, suggests the following procedure, where the strength of new classifiers is initialized with the average strength of the current rule base:

1. Select a subpopulation of a certain size at random.

2. Compute a new set of rules by applying the genetic operations (selection, crossover and mutation) to this subpopulation.

3. Merge the new subpopulation with the rule base, omitting duplicates and replacing the worst classifiers.
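These three steps can be sketched as follows. This is an illustrative reduction (names are mine, not from the text): step 2 here uses uniform parent choice within the subpopulation plus one-point crossover and point mutation, whereas a full implementation would select parents proportionally to strength.

```python
import random

ALPHABET = '01*'

def breed(rule_base, strengths, subpop_size, n_children):
    # 1. Select a subpopulation of a certain size at random.
    sub = random.sample(range(len(rule_base)), subpop_size)
    avg = sum(strengths) / len(strengths)   # initial strength of new classifiers
    children = []
    for _ in range(n_children):
        # 2. Breed new rules by crossover and mutation within the subpopulation.
        a, b = rule_base[random.choice(sub)], rule_base[random.choice(sub)]
        k = random.randrange(1, len(a))                  # one-point crossover
        child = a[:k] + b[k:]
        i = random.randrange(len(child))                 # point mutation
        child = child[:i] + random.choice(ALPHABET) + child[i + 1:]
        children.append(child)
    # 3. Merge with the rule base, omitting duplicates and replacing the worst.
    for child in children:
        if child not in rule_base:
            worst = min(range(len(rule_base)), key=lambda j: strengths[j])
            rule_base[worst] = child
            strengths[worst] = avg
    return rule_base, strengths
```

Note that because new classifiers enter with the average strength, they get a fair chance to bid before the bucket brigade has had time to evaluate them.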

Table 9-7 An example for repeated propagation of payoffs

______________________________________________________
Strength of the five classifiers after the nth
execution of the sequence
------------------------------------------------------
 3rd     100.00   100.00   101.60   120.80   172.00
 4th     100.00   100.32   103.44   136.16   197.60
 5th     100.06   101.34   111.58    92.54   234.46
 6th     100.32   103.39   119.78   168.93   247.57
  ...
10th     106.56   124.17   164.44   224.84   278.52
  ...
25th      29.86   253.20   280.36   294.52   299.24
______________________________________________________

## Page 264


This process of acquiring new rules has an interesting side effect. It is more than just the exchange of parts of conditions and actions. Since we have not stated restrictions for manipulating tags, the GA can recombine parts of already existing tags to invent new tags. In the following, tags spawn related tags, establishing new couplings. These new tags survive if they contribute to useful interactions. In this sense, the GA additionally creates experience-based internal structures autonomously.

9.16 Genetic Programming

Genetic programming (GP) is also part of the growing set of evolutionary algorithms that apply the search principles of natural evolution in a variety of different problem domains, notably parameter optimization. Evolutionary algorithms, and GP in particular, follow Darwin's principle of differential natural selection. This principle states that the following preconditions must be fulfilled for evolution to occur via (natural) selection:

1. There are entities called individuals which form a population. These entities can reproduce or can be reproduced.

2. There is heredity in reproduction, that is to say that individuals produce similar offspring.

3. In the course of reproduction, there is variety which affects the likelihood of survival and therefore of reproducibility of individuals.

4. There are finite resources which cause the individuals to compete. Owing to overreproduction of individuals, not all can survive the struggle for existence. Differential natural selection will exert a continuous pressure towards improved individuals.

In the long run, GP and other evolutionary computing technologies will revolutionize program development. Present methods are not mature enough for deployment as automatic programming systems. Nevertheless, GP has already made inroads into automatic programming and will continue to do so in the foreseeable future. Likewise, the application of evolution in machine-learning problems is one of the potentials we will exploit over the coming decade.

GP is part of a more general field known as evolutionary computation. Evolutionary computation is based on the idea that basic concepts of biological reproduction and evolution can serve as a metaphor on which computer-based, goal-directed problem solving can be based. The general idea is that a computer program can maintain a

## Page 265


population of artifacts represented using some suitable computer-based data structures. Elements of that population can then mate, mutate, or otherwise reproduce and evolve, directed by a fitness measure that assesses the quality of the population with respect to the goal of the task at hand.

GP is an automated method for creating a working computer program from a high-level statement of a problem. GP starts from a high-level statement of 'what needs to be done' and automatically creates a computer program to solve the problem.

One of the central challenges of computer science is to get a computer to do what needs to be done, without telling it how to do it. GP addresses this challenge by providing a method for automatically creating a working computer program from a high-level statement of the problem. GP achieves this goal of automatic programming (also sometimes called program synthesis or program induction) by genetically breeding a population of computer programs using the principles of Darwinian natural selection and biologically inspired operations. The operations include reproduction, crossover, mutation and architecture-altering operations patterned after gene duplication and gene deletion in nature.

GP is a domain-independent method that genetically breeds a population of computer programs to solve a problem. Specifically, GP iteratively transforms a population of computer programs into a new generation of programs by applying analogs of naturally occurring genetic operations. The genetic operations include crossover, mutation, reproduction, gene duplication and gene deletion. GP is an excellent problem solver, a superb function approximator and an effective tool for writing functions to solve specific tasks. However, despite all these areas in which it excels, it still does not replace programmers; rather, it helps them. A human still must specify the fitness function and identify the problem to which GP should be applied.

9.16.1 Working of Genetic Programming

GP typically starts with a population of randomly generated computer programs composed of the available programmatic ingredients. GP iteratively transforms a population of computer programs into a new generation of the population by applying analogs of naturally occurring genetic operations. These operations are applied to individual(s) selected from the population. The individuals are probabilistically selected to participate in the genetic operations based on their fitness (as measured by the fitness measure provided by the human user in the third

## Page 266


preparatory step). The iterative transformation of the population is executed inside the main generational loop of the run of GP.

The executional steps of GP (i.e., the flowchart of GP) are as follows:

1. Randomly create an initial population (generation 0) of individual computer programs composed of the available functions and terminals.

2. Iteratively perform the following substeps (called a generation) on the population until the termination criterion is satisfied:

* Execute each program in the population and ascertain its fitness (explicitly or implicitly) using the problem's fitness measure.

* Select one or two individual program(s) from the population with a probability based on fitness (with reselection allowed) to participate in the genetic operations in the next substep.

* Create new individual program(s) for the population by applying the following genetic operations with specified probabilities:

(a) Reproduction: Copy the selected individual program to the new population.

(b) Crossover: Create new offspring program(s) for the new population by recombining randomly chosen parts from two selected programs.

(c) Mutation: Create one new offspring program for the new population by randomly mutating a randomly chosen part of one selected program.

(d) Architecture-altering operation: Choose an architecture-altering operation from the available repertoire of such operations and create one new offspring program for the new population by applying the chosen architecture-altering operation to one selected program.

3. After the termination criterion is satisfied, the single best program in the population produced during the run (the best-so-far individual) is harvested and designated as the result of the run. If the run is successful, the result may be a solution (or approximate solution) to the problem.
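The executional steps above can be made concrete with a toy GP for symbolic regression. Everything here is an illustrative sketch, not from the text: programs are nested tuples over {+, -, *} with terminals x and 1.0, selection is simple truncation rather than the fitness-proportionate selection described above, and there are no architecture-altering operations.

```python
import operator
import random

FUNCS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
TERMS = ['x', 1.0]

def random_tree(depth):
    # Step 1: grow a random program tree from the functions and terminals.
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    return (random.choice(list(FUNCS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    f, left, right = tree
    return FUNCS[f](evaluate(left, x), evaluate(right, x))

def fitness(tree, cases):
    # Sum of absolute errors over the fitness cases (lower is better).
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def nodes(tree, path=()):
    # Enumerate the paths of all nodes in the tree.
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from nodes(tree[i], path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(parts[path[0]], path[1:], new)
    return tuple(parts)

def depth(tree):
    return 0 if not isinstance(tree, tuple) else 1 + max(depth(tree[1]), depth(tree[2]))

def crossover(t1, t2, max_depth=8):
    # Swap a randomly chosen subtree of t1 for one of t2; keep trees bounded.
    child = replace(t1, random.choice(list(nodes(t1))),
                    get(t2, random.choice(list(nodes(t2)))))
    return child if depth(child) <= max_depth else t1

def mutation(tree):
    # Replace a randomly chosen subtree by a fresh random subtree.
    return replace(tree, random.choice(list(nodes(tree))), random_tree(2))

def run_gp(cases, pop_size=60, generations=30):
    pop = [random_tree(3) for _ in range(pop_size)]
    best = min(pop, key=lambda t: fitness(t, cases))     # best-so-far individual
    for _ in range(generations):                         # step 2: one generation
        scored = sorted(pop, key=lambda t: fitness(t, cases))
        best = min(best, scored[0], key=lambda t: fitness(t, cases))
        parents = scored[:pop_size // 2]                 # truncation selection
        pop = parents[:]
        while len(pop) < pop_size:
            op = random.random()
            if op < 0.7:
                pop.append(crossover(*random.sample(parents, 2)))
            elif op < 0.9:
                pop.append(mutation(random.choice(parents)))
            else:
                pop.append(random.choice(parents))       # reproduction
    return best                                          # step 3: result designation
```

Run on fitness cases sampled from, say, y = x^2 + 1, such a loop often recovers an exact tree within a few dozen generations, though as a stochastic search it carries no guarantee.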

GP is problem-independent in the sense that the flowchart specifying the basic sequence of executional steps is not modified for each new run or each new problem. There is usually no discretionary human intervention or interaction during

## Page 267


a run of genetic programming (although a human user may exercise judgment as to whether to terminate a run).

Figure 9-47 below is a flowchart showing the executional steps of a run of GP. The flowchart shows the genetic operations of crossover, reproduction and mutation as well as the architecture-altering operations. This flowchart shows a two-offspring version of the crossover operation.

Figure 9-47 Flowchart of genetic programming.


## Page 268


The flowchart of GP is explained as follows: GP starts with an initial population of computer programs composed of functions and terminals appropriate to the problem. The individual programs in the initial population are typically generated by recursively growing a rooted, point-labeled program tree composed of random choices of the primitive functions and terminals (provided by the human user as part of the first and second preparatory steps of a run of GP). The initial individuals are usually generated subject to a pre-established maximum size (specified by the user as a minor parameter as part of the fourth preparatory step). In general, the programs in the population are of different sizes (number of functions and terminals) and of different shapes (the particular graphical arrangement of functions and terminals in the program tree).

Each individual program in the population is executed. Then, each individual program in the population is either measured or compared in terms of how well it performs the task at hand (using the fitness measure provided in the third preparatory step). For many problems, this measurement yields a single explicit numerical value called fitness. The fitness of a program may be measured in many different ways, including, for example, in terms of the amount of error between its output and the desired output, the amount of time (fuel, money, etc.) required to bring a system to a desired target state, the accuracy of the program in recognizing patterns or classifying objects into classes, the payoff that a game-playing program produces, or the compliance of a complex structure (such as an antenna, circuit, or controller) with user-specified design criteria. The execution of the program sometimes returns one or more explicit values. Alternatively, the execution of a program may consist only of side effects on the state of a world (e.g., a robot's actions). Alternatively, the execution of a program may produce both return values and side effects.

The fitness measure is, for many practical problems, multiobjective in the sense that it combines two or more different elements. The different elements of the fitness measure are often in competition with one another to some degree.

For many problems, each program in the population is executed over a representative sample of different fitness cases. These fitness cases may represent different values of the program's input(s), different initial conditions of a system, or different environments. Sometimes the fitness cases are constructed probabilistically.

The creation of the initial random population is, in effect, a blind random search of the search space of the problem. It provides a baseline for judging future search efforts. Typically, the individual programs in generation 0 all have exceedingly

## Page 269


poor fitness. Nevertheless, some individuals in the population are (usually) more fit than others. The differences in fitness are then exploited by GP. GP applies Darwinian selection and the genetic operations to create a new population of offspring programs from the current population.

The genetic operations include crossover, mutation, reproduction and the architecture-altering operations. These genetic operations are applied to individual(s) that are probabilistically selected from the population based on fitness. In this probabilistic selection process, better individuals are favored over inferior individuals. However, the best individual in the population is not necessarily selected and the worst individual in the population is not necessarily passed over.

After the genetic operations are performed on the current population, the population of offspring (i.e., the new generation) replaces the current population (i.e., the now-old generation). This iterative process of measuring fitness and performing the genetic operations is repeated over many generations.

The run of GP terminates when the termination criterion (as provided by the fifth preparatory step) is satisfied. The outcome of the run is specified by the method of result designation. The best individual ever encountered during the run (i.e., the best-so-far individual) is typically designated as the result of the run.

All programs in the initial random population (generation 0) of a run of GP are syntactically valid, executable programs. The genetic operations that are performed during the run (i.e., crossover, mutation, reproduction and the architecture-altering operations) are designed to produce offspring that are syntactically valid, executable programs. Thus, every individual created during a run of genetic programming (including, in particular, the best-of-run individual) is a syntactically valid, executable program.

9.16.2 Characteristics of Genetic Programming

GP now routinely delivers high-return, human-competitive machine intelligence. The next four subsections explain what we mean by the terms human-competitive, high-return, routine and machine intelligence.

9.16.2.1 Human-Competitive

In attempting to evaluate an automated problem-solving method, the question arises as to whether there is any real substance to the demonstrative problems that are published in connection with the method. Demonstrative problems in the fields of artificial intelligence and machine learning are often contrived problems that

## Page 270


circulate exclusively inside academic groups that study a particular methodology. These problems typically have little relevance to any issues pursued by any scientist or engineer outside the fields of artificial intelligence and machine learning.

In his 1983 talk entitled "AI: Where It Has Been and Where It Is Going," machine learning pioneer Arthur Samuel said:

The aim is ... to get machines to exhibit behaviour, which if done by humans, would be assumed to involve the use of intelligence.

Samuel's statement reflects the common goal articulated by the pioneers of the 1950s in the fields of artificial intelligence and machine learning. Indeed, getting machines to produce humanlike results is the reason for the existence of the fields of artificial intelligence and machine learning. To make this goal more concrete, we say that a result is "human-competitive" if it satisfies one or more of the eight criteria in Table 9-8. These eight criteria have the desirable attribute of being at arm's length from the fields of artificial intelligence, machine learning and GP. That is, a result cannot acquire the rating of "human-competitive" merely because it is endorsed by researchers inside the specialized fields that are attempting to create machine intelligence, machine learning and GP. Instead, a result produced by an automated method must earn the rating of human-competitive independent of the fact that it was generated by an automated method.

9.16.2.2 High-Return

What is delivered by the actual automated operation of an artificial method in comparison to the amount of knowledge, information, analysis and intelligence that is pre-supplied by the human employing the method?

We define the AI ratio (the 'artificial-to-intelligence' ratio) of a problem-solving method as the ratio of that which is delivered by the automated operation of the artificial method to the amount of intelligence that is supplied by the human applying the method to a particular problem.

Table 9-8 Eight criteria for saying that an automatically created result is human-competitive

--------------------------------------------------------------------------------------------------
Criterion
--------------------------------------------------------------------------------------------------

## Page 271


A The result was patented as an invention in the past, is an improvement over a patented invention, or would qualify today as a patentable new invention.

B The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.

C The result is equal to or better than a result that was placed into a database or archive of results maintained by an internationally recognized panel of scientific experts.

D The result is publishable in its own right as a new scientific result, independent of the fact that the result was mechanically created.

E The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.

F The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.

G The result solves a problem of indisputable difficulty in its field.

H The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).

--------------------------------------------------------------------------------------------------

The AI ratio is especially pertinent to methods for getting computers to automatically solve problems because it measures the value added by the artificial problem-solving method. Manifestly, the aim of the fields of artificial intelligence and machine learning is to generate human-competitive results with a high AI ratio.

Deep Blue: An Artificial Intelligence Milestone (Newborn, 2002) describes the 1997 defeat of the human world chess champion Garry Kasparov by the Deep Blue computer system. This commanding example of machine intelligence is clearly a human-competitive result (by virtue of satisfying criterion H of Table 9-8). Feng-hsiung Hsu (the system architect and chip designer for the Deep Blue project) recounts the intensive work on the Deep Blue project at IBM's T. J. Watson Research Center between 1989 and 1997 (Hsu, 2002). The team of scientists and engineers spent years developing the software and the specialized computer chips to efficiently evaluate large numbers of alternative moves as part of a massive parallel state-space search. In short, the human developers invested an enormous amount of "I" in the project. In spite of the fact that Deep Blue delivered a high

## Page 272


(human-competitive) amount of "A," the project has a low return when measured in terms of the A-to-I ratio.

The aim of the fields of artificial intelligence and machine learning is to get computers to automatically generate human-competitive results with a high AI ratio, not to have humans generate human-competitive results themselves.

9.16.2.3 Routine

Generality is a precondition to what we mean when we say that an automated problem-solving method is "routine." Once the generality of a method is established, "routineness" means that relatively little human effort is required to get the method to successfully handle new problems within a particular domain and to successfully handle new problems from a different domain. The ease of making the transition to new problems lies at the heart of what we mean by routine. A problem-solving method cannot be considered routine if its executional steps must be substantially augmented, deleted, rearranged, reworked or customized by the human user for each new problem.

9.16.2.4 Machine Intelligence

We use the term machine intelligence to refer to the broad vision articulated in Alan Turing's 1948 paper entitled "Intelligent Machinery" and his 1950 paper entitled "Computing Machinery and Intelligence."

In the 1950s, the terms machine intelligence, artificial intelligence and machine learning all referred to the goal of getting "machines to exhibit behaviour, which if done by humans, would be assumed to involve the use of intelligence" (to again quote Arthur Samuel).

However, in the intervening five decades, the terms "artificial intelligence" and "machine learning" progressively diverged from their original goal-oriented meaning. These terms are now primarily associated with particular methodologies for attempting to achieve the goal of getting computers to automatically solve problems. Thus, the term "artificial intelligence" is today primarily associated with attempts to get computers to solve problems using methods that rely on knowledge, logic, and various analytical and mathematical methods. The term "machine learning" is today primarily associated with attempts to get computers to solve problems using a particular small and somewhat arbitrarily chosen set of methodologies (many of which are statistical in nature). The narrowing of these terms is in marked contrast to the broad field envisioned by Samuel at the time when he coined the term "machine learning" in the 1950s, the charter of the original founders of the field of artificial intelligence, and the broad vision encompassed by

## Page 273


Turing's term "machine intelligence." Of course, the shift in focus from broad goals to narrow methodologies is an all too common sociological phenomenon in academic research.

Turing's term "machine intelligence" did not undergo this arteriosclerosis because, by accident of history, it was never appropriated or monopolized by any group of academic researchers whose primary dedication is to a particular methodological approach. Thus, Turing's term remains catholic today. We prefer to use Turing's term because it still communicates the broad goal of getting computers to automatically solve problems in a human-like way.

In his 1948 paper, Turing identified three broad approaches by which human-competitive machine intelligence might be achieved. The first approach was a logic-driven search. Turing's interest in this approach is not surprising in light of Turing's own pioneering work in the 1930s on the logical foundations of computing. The second approach for achieving machine intelligence was what he called a "cultural search," in which previously acquired knowledge is accumulated, stored in libraries and brought to bear in solving a problem: the approach taken by modern knowledge-based expert systems. Turing's first two approaches have been pursued over the past 50 years by the vast majority of researchers using the methodologies that are today primarily associated with the term "artificial intelligence."

9.16.3 Data Representation

Without any doubt, programs can be considered as strings. There are, however, two important limitations which make it impossible to use the representations and operations from our simple GA:

1. It is mostly inappropriate to assume a fixed length of programs.

2. The probability of obtaining syntactically correct programs when applying our simple initialization, crossover and mutation procedures is hopelessly low.

It is, therefore, indispensable to modify the data representation and the operations such that syntactical correctness is easier to guarantee. The common approach to represent programs in GP is to consider programs as trees. By doing so, initialization can be done recursively, crossover can be done by exchanging subtrees, and random replacement of subtrees can serve as the mutation operation.

Since their only construct is nested lists, programs in LISP-like languages already have a kind of tree-like structure. Figure 9-48 shows an example of how the function 3x + sin(x + 1) can be implemented in a LISP-like language and how such a LISP-like function can be split up into a tree. It can be noted that the tree representation corresponds to the nested lists. The program consists of atomic expressions, like

## Page 274


variables and constants, which act as leaf nodes, while functions act as non-leaf nodes.

Figure 9-48 The tree representation of 3x + sin(x + 1).
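To make the nested-list picture concrete, here is a minimal Python sketch (illustrative, not the book's code): the tuple encoding and the small function table are assumptions chosen for this example, mirroring the tree of Figure 9-48.

```python
import math

# A LISP-like program is a nested structure: (op arg1 arg2 ...).
# Leaves are variable names or constants; inner nodes are functions.
# This encodes the tree in Figure 9-48 for 3x + sin(x + 1).
PROGRAM = ("+", ("*", 3, "x"), ("sin", ("+", "x", 1)))

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "sin": lambda a: math.sin(a),
}

def evaluate(node, env):
    """Recursively evaluate a program tree against a variable binding."""
    if isinstance(node, tuple):          # non-leaf: apply function to children
        op, *args = node
        return FUNCTIONS[op](*(evaluate(a, env) for a in args))
    if isinstance(node, str):            # leaf: variable
        return env[node]
    return node                          # leaf: numeric constant
```

Evaluating `PROGRAM` with `{"x": 2.0}` walks the tree bottom-up, exactly as the figure suggests.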

There is one important disadvantage of the LISP approach: it is difficult to introduce type checking. In case of a purely numeric function like in the above example, there is no problem at all. However, it can be desirable to process numeric data, strings and logical expressions simultaneously. This is difficult to handle if we use a tree representation like that in Figure 9-48.

A. Geyer-Schulz has proposed a very general approach which overcomes this problem, allowing maximum flexibility. He suggested representing programs by their syntactical derivation trees with respect to a recursive definition of the underlying language in Backus-Naur form (BNF). This works for any context-free language. It is far beyond the scope of this lecture to go into much detail about formal languages. We will explain the basics with the help of a simple example. Consider the following language, which is suitable for implementing binary logical expressions:

The BNF description consists of so-called syntactical rules. Symbols in angular brackets < > are called nonterminal symbols, i.e., symbols which have to be expanded. Symbols between quotation marks are called terminal symbols, i.e., they cannot be expanded any further. The first rule, S := <exp>, defines the starting symbol. A BNF rule of the general shape

<nonterminal> := <deriv1> | <deriv2> | ... | <derivn>;


## Page 275


defines how a nonterminal symbol may be expanded, where the different variants are separated by vertical bars.

In order to get a feeling of how to work with the BNF grammar description, we will now show step by step how the expression (NOT (x OR y)) can be derived from the above language. For simplicity, we omit quotation marks for the terminal symbols:

1. We have to begin with the start symbol: <exp>

2. We replace <exp> with the second possible derivation:
<exp> → (<neg> <exp>)

3. The symbol <neg> may only be expanded with the terminal symbol NOT:
(<neg> <exp>) → (NOT <exp>)

4. Next, we replace <exp> with the third possible derivation:
(NOT <exp>) → (NOT (<exp> <bin> <exp>))

5. We expand the second possible derivation for <bin>:
(NOT (<exp> <bin> <exp>)) → (NOT (<exp> OR <exp>))

6. The first occurrence of <exp> is expanded with the first derivation:
(NOT (<exp> OR <exp>)) → (NOT (<var> OR <exp>))

7. The second occurrence of <exp> is expanded with the first derivation, too:
(NOT (<var> OR <exp>)) → (NOT (<var> OR <var>))

8. Now we replace the first <var> with the corresponding first alternative:
(NOT (<var> OR <var>)) → (NOT (x OR <var>))

9. Finally, the last nonterminal symbol is expanded with the second alternative:
(NOT (x OR <var>)) → (NOT (x OR y))

Such a recursive derivation has an inherent tree structure. For the above example, this derivation tree has been visualized in Figure 9-49. The syntax of modern programming languages can be specified in BNF. Hence, our data model would be applicable to all of them. The question is whether this is useful. Koza's hypothesis includes that the programming language has to be chosen such that the given problem is solvable. This does not necessarily imply that we have to choose the language such that virtually any solvable problem can be solved. It is obvious that the size of the search space grows with the complexity of the

## Page 276


language. We know that the size of the search space influences the performance of a GA – the larger, the slower.

It is, therefore, recommendable to restrict the language to necessary constructs and to avoid superfluous constructs. Assume, for example, that we want to do symbolic regression, but we are only interested in polynomials with integer coefficients. For such an application, it would be overkill to introduce rational constants or to include exponential functions in the language. A good choice could be the following. For representing rational functions with integer coefficients, it is sufficient to add the division symbol "/" to the possible derivations of the binary operator.

Figure 9-49 The derivation tree of (NOT (x OR y)).


## Page 277


Another example: The following language could be appropriate for discovering trigonometric identities:

There are basically two different variants of how to generate random programs with respect to a given BNF grammar:

1. Beginning from the starting symbol, it is possible to expand nonterminal symbols recursively, where we have to choose randomly if we have more than one alternative derivation. This approach is simple and fast, but has some disadvantages: First, it is almost impossible to realize a uniform distribution. Second, one has to implement some constraints with respect to the depth of the derivation trees in order to avoid excessive growth of the programs. Depending on the complexity of the underlying grammar, this can be a tedious task.

2. Geyer-Schulz has suggested preparing a list of all possible derivation trees up to a certain depth and selecting from this list randomly, applying a uniform distribution. Obviously, in this approach, the problems in terms of depth and the resulting probability distribution are elegantly solved, but these advantages go along with considerably long computation times.
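Variant 1 can be sketched in a few lines of Python. The grammar below is reconstructed from the derivation example above (the printed BNF block did not survive extraction), so treat its exact rules as an assumption; the depth constraint is the crude kind of limit the text warns about.

```python
import random

# Reconstructed grammar for binary logical expressions: each nonterminal
# maps to its list of alternative derivations; any symbol that does not
# appear as a key is a terminal symbol.
GRAMMAR = {
    "<exp>": [["<var>"],
              ["(", "<neg>", "<exp>", ")"],
              ["(", "<exp>", "<bin>", "<exp>", ")"]],
    "<neg>": [["NOT"]],
    "<bin>": [["AND"], ["OR"]],
    "<var>": [["x"], ["y"]],
}

def random_program(symbol="<exp>", depth=0, max_depth=4):
    """Variant 1: expand nonterminals recursively, choosing among the
    alternatives at random. Beyond max_depth, the first (terminating)
    alternative is forced, a simple constraint against excessive growth."""
    if symbol not in GRAMMAR:
        return symbol                              # terminal symbol
    alts = GRAMMAR[symbol]
    alt = alts[0] if depth >= max_depth else random.choice(alts)
    return " ".join(random_program(s, depth + 1, max_depth) for s in alt)
```

Note that forcing the first alternative only terminates because the first alternative of `<exp>` is `<var>`; with another grammar ordering, the depth constraint would need more care, which is exactly the tedium the text mentions.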

9.16.3.1 Crossing Programs

It is trivial to see that primitive string-based crossover of programs almost never yields syntactically correct programs. Instead, we should use the perfect syntax information a derivation tree provides. Already in the LISP times of GP, some time before the BNF-based representation was known, crossover was usually implemented as the exchange of randomly selected subtrees. In case the subtrees (subexpressions) may have different types of return values (e.g., logical and numerical), it is not guaranteed that crossover preserves syntactical correctness.

The derivation tree based representation overcomes this problem in a very elegant way. If we only exchange subtrees which start from the same nonterminal symbol, crossover can never violate syntactical correctness. In this sense, the derivation tree


## Page 278


model provides implicit type checking. In order to demonstrate in more detail how this crossover operation works, let us reconsider the example of binary logical expressions. As parents, we take the following expressions:

(NOT (x OR y))

((NOT x) OR (x AND y))

Figure 9-50 shows graphically how the two children (NOT (x OR (x AND y))) and ((NOT x) OR y) are obtained.

Figure 9-50 An example for crossing two binary logical expressions.
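As an illustrative sketch (not the book's code), the fragment below stores the two parents as nested lists and swaps subtrees by path; because every subexpression in this language has the same (logical) type, any subtree swap stays syntactically correct. The deterministic swap at the end reproduces the children of Figure 9-50.

```python
import copy
import random

# Parents from the text, as nested lists: operator first, then operands.
P1 = ["NOT", ["OR", "x", "y"]]                 # (NOT (x OR y))
P2 = ["OR", ["NOT", "x"], ["AND", "x", "y"]]   # ((NOT x) OR (x AND y))

def get(tree, path):
    """Follow a path of child indices down to a subtree."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return copy.deepcopy(new)
    tree = copy.deepcopy(tree)
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(new)
    return tree

def paths(tree, path=()):
    """Enumerate the paths of all subtrees (operands start at index 1)."""
    yield path
    if isinstance(tree, list):
        for i in range(1, len(tree)):
            yield from paths(tree[i], path + (i,))

def crossover(a, b, rng=random):
    """Exchange two randomly selected subtrees between the parents."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

# Reproducing Figure 9-50: swap the "y" of P1 with the "(x AND y)" of P2.
C1 = replace(P1, (1, 2), get(P2, (2,)))   # (NOT (x OR (x AND y)))
C2 = replace(P2, (2,), get(P1, (1, 2)))   # ((NOT x) OR y)
```

In a typed language, `crossover` would additionally have to restrict `pb` to paths whose subtree starts from the same nonterminal as the one at `pa`, which is what the derivation tree model guarantees.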


## Page 279


Figure 9-51 An example for mutating a derivation tree.

9.16.3.2 Mutating Programs

We have always considered mutation as the random deformation of a chromosome. It is, therefore, not surprising that the most common mutation in genetic programming is the random replacement of a randomly selected subtree. The only modification is that we do not necessarily start from the start symbol, but from the nonterminal symbol at the root of the subtree we consider. Figure 9-51 shows an example where, in the logical expression (NOT (x OR y)), the variable y is replaced by (NOT y).
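A minimal sketch of this mutation, under the same illustrative nested-list encoding as before (the growth probabilities and depth limit are arbitrary choices, not the book's):

```python
import copy
import random

def grow(rng, depth=0, max_depth=3):
    """Grow a fresh random logical expression (the replacement subtree)."""
    if depth >= max_depth or rng.random() < 0.3:
        return rng.choice(["x", "y"])
    if rng.random() < 0.5:
        return ["NOT", grow(rng, depth + 1, max_depth)]
    return [rng.choice(["AND", "OR"]),
            grow(rng, depth + 1, max_depth), grow(rng, depth + 1, max_depth)]

def paths(tree, path=()):
    yield path
    if isinstance(tree, list):
        for i in range(1, len(tree)):
            yield from paths(tree[i], path + (i,))

def replace(tree, path, new):
    if not path:
        return new
    tree = copy.deepcopy(tree)
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new
    return tree

def mutate(tree, rng=random):
    """Replace one randomly selected subtree by a freshly grown one.
    Growing the replacement from the nonterminal at the chosen node
    keeps the expression syntactically correct."""
    return replace(tree, rng.choice(list(paths(tree))), grow(rng))

def is_valid(tree):
    """Syntactic correctness check for the binary logical language."""
    if isinstance(tree, str):
        return tree in {"x", "y"}
    if tree[0] == "NOT":
        return len(tree) == 2 and is_valid(tree[1])
    return tree[0] in {"AND", "OR"} and len(tree) == 3 and all(map(is_valid, tree[1:]))
```

Because `grow` only produces expressions from the language, every mutant passes `is_valid`, which is the implicit type checking the text describes.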

9.16.3.3 The Fitness Function

There is no common recipe for specifying an appropriate fitness function; this strongly depends on the given problem. It is, however, worth emphasizing that it is necessary to provide enough information to guide the GA to the solution. More specifically, it is not sufficient to define a fitness function which assigns 0 to a program which does not solve the problem and 1 to a program which does. Such a fitness function would correspond to a needle-in-haystack problem. In this sense, a proper fitness measure should be a gradual concept for judging the correctness of programs.

In many applications, the fitness function is based on a comparison of desired and actually obtained output. Koza, for instance, uses the simple sum of quadratic errors for symbolic regression and the discovery of trigonometric identities:


## Page 280


In this definition, F is the mathematical function which corresponds to the program under evaluation. The list (x_i, y_i), 1 ≤ i ≤ N, consists of reference pairs: a desired output y_i is assigned to each input x_i. Clearly, the samples have to be chosen such that the considered input space is covered sufficiently well.
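The printed formula did not survive extraction; a common form, assumed here, is the sum of squared differences f = Σ_{i=1}^{N} (F(x_i) − y_i)² over the reference pairs. A small sketch, with illustrative sample data drawn from the target 3x + sin(x + 1):

```python
import math

def sse_fitness(program, samples):
    """Koza-style raw fitness for symbolic regression: the sum of squared
    errors between the program's output F(x_i) and the desired output y_i
    over the reference pairs (x_i, y_i). Lower is better; 0 is a perfect fit."""
    return sum((program(x) - y) ** 2 for x, y in samples)

# Reference pairs sampled from the target 3x + sin(x + 1); in a real run
# they must cover the considered input space sufficiently well.
samples = [(x / 10.0, 3 * (x / 10.0) + math.sin(x / 10.0 + 1))
           for x in range(-20, 21)]

perfect = lambda x: 3 * x + math.sin(x + 1)   # candidate identical to the target
linear = lambda x: 3 * x                      # candidate missing the sin term
```

The gradual nature of this measure is the point: `linear` is wrong everywhere, but its fitness still tells the GA it is closer to the target than a random candidate would be.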

Numeric error-based fitness functions usually imply minimization problems. Some other applications may imply maximization tasks. There are basically two well-known transformations which allow standardizing fitness functions such that always minimization or maximization tasks are obtained.

Consider an arbitrary "raw" fitness function f. Assuming that the number of individuals in the population is not fixed (m_t at time t), the standardized fitness is computed as

if f has to be maximized, and as

if f has to be minimized. One possible variant is to consider the best individual of the last k generations instead of only considering the actual generation.

Obviously, standardized fitness transforms any optimization problem into a minimization task. Roulette wheel selection relies on the fact that the objective is maximization of the fitness function. Koza has suggested a simple transformation such that, in any case, a maximization problem is obtained.

With the assumptions of the previous definition, the adjusted fitness is computed as

Another variant of adjusted fitness is defined as
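The printed equations for these definitions are missing here. As a hedged sketch, the following reconstructs the usual textbook forms: standardized fitness as the distance from the best raw score (s_i = max_j f_j − f_i for maximization, s_i = f_i − min_j f_j for minimization) and Koza's adjusted fitness a_i = 1/(1 + s_i). Treat the exact formulas as assumptions rather than the book's originals.

```python
def standardized(raw, maximize):
    """Standardized fitness: shift raw scores so that 0 is best and the
    task is always minimization, regardless of the original objective."""
    best = max(raw) if maximize else min(raw)
    return [best - f if maximize else f - best for f in raw]

def adjusted(std):
    """Adjusted fitness a_i = 1 / (1 + s_i): maps standardized fitness
    into (0, 1], where 1 is optimal, so every problem becomes a
    maximization task suitable for roulette wheel selection."""
    return [1.0 / (1.0 + s) for s in std]
```

Note how the composition turns any raw objective into a bounded maximization problem, which is exactly what roulette wheel selection needs.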

For applying GP to a given problem, the following points have to be satisfied:

1. An appropriate fitness function, which provides enough information to guide the GA to the solution (mostly based on examples).


## Page 281


2. A syntactical description of a programming language, which contains as many elements as necessary for solving the problem.

3. An interpreter for the programming language.

The main application areas of GP include computer science, science, engineering, and entertainment.

9.17 Advantages and Limitations of Genetic Algorithm

The advantages of GA are as follows:

1. Parallelism.

2. Reliability.

3. Solution space is wider.

4. The fitness landscape is complex.

5. Easy to discover global optimum.

6. It can handle multi-objective functions.

The limitations of GA are as follows:

1. The problem of identifying the fitness function.

2. Definition of representation for the problem.

3. Premature convergence occurs.

4. The problem of choosing various parameters such as the size of the population, mutation rate, crossover rate, the selection method and its strength.

9.18 Applications of Genetic Algorithm

An effective GA representation and meaningful fitness evaluation are the keys to success in GA applications. The appeal of GAs comes from their simplicity and elegance as robust search algorithms, as well as from their power to discover good solutions rapidly for difficult high-dimensional problems. GAs are useful and efficient when:

1. the search space is large, complex or poorly understood;

2. domain knowledge is scarce or expert knowledge is difficult to encode to narrow the search space;

3. no mathematical analysis is available;

## Page 282


4. traditional search methods fail.

The advantage of the GA approach is the ease with which it can handle arbitrary kinds of constraints and objectives; all such things can be handled as weighted components of the fitness function, making it easy to adapt the GA scheduler to the particular requirements of a very wide range of possible overall objectives.

GAs have been used for problem-solving and for modeling. GAs are applied to many scientific and engineering problems, in business and entertainment, including:

1. Optimization: GAs have been used in a wide variety of optimization tasks, including numerical optimization and combinatorial optimization problems such as the traveling salesman problem (TSP), circuit design (Louis, 1993), job shop scheduling (Goldstein, 1991) and video & sound quality optimization.

2. Automatic programming: GAs have been used to evolve computer programs for specific tasks and to design other computational structures, for example, cellular automata and sorting networks.

3. Machine and robot learning: GAs have been used for many machine-learning applications, including classification and prediction, and protein structure prediction. GAs have also been used to design neural networks, to evolve rules for learning classifier systems or symbolic production systems, and to design and control robots.

4. Economic models: GAs have been used to model processes of innovation, the development of bidding strategies and the emergence of economic markets.

5. Immune system models: GAs have been used to model various aspects of the natural immune system, including somatic mutation during an individual's lifetime and the discovery of multi-gene families during evolutionary time.

6. Ecological models: GAs have been used to model ecological phenomena such as biological arms races, host-parasite co-evolution, symbiosis and resource flow in ecologies.

7. Population genetics models: GAs have been used to study questions in population genetics, such as "under what conditions will a gene for recombination be evolutionarily viable?"

8. Interactions between evolution and learning: GAs have been used to study how individual learning and species evolution affect one another.

## Page 283


9. Models of social systems: GAs have been used to study evolutionary aspects of social systems, such as the evolution of cooperation (Chughtai, 1995), the evolution of communication, and trail-following behavior in ants.

9.19 Summary

Genetic algorithms are original systems based on the supposed functioning of living organisms. The method is very different from the classical optimization algorithms as it:

1. Uses the encoding of the parameters, not the parameters themselves.

2. Works on a population of points, not a unique one.

3. Uses only the values of the function to optimize, not their derivatives or other auxiliary knowledge.

4. Uses probabilistic transition functions and not deterministic ones.

It is important to understand that the functioning of such an algorithm does not guarantee success. The problem is a stochastic system and a genetic pool may be too far from the solution, or, for example, a too fast convergence may halt the process of evolution. These algorithms are, nevertheless, extremely efficient, and are used in fields as diverse as stock exchange, production scheduling or programming of assembly robots in the automotive industry.

GAs can even be faster in finding global maxima than conventional methods, in particular when derivatives provide misleading information. It should be noted that in most cases where conventional methods can be applied, GAs are much slower because they do not take auxiliary information such as derivatives into account. In these optimization problems, there is no need to apply a GA, which gives less accurate solutions after a much longer computation time. The enormous potential of GAs lies elsewhere: in optimization of non-differentiable or even discontinuous functions, discrete optimization, and program induction.

It has been claimed that via the operations of selection, crossover and mutation, the GA will converge over successive generations towards the global (or near-global) optimum. This simple operation should produce a fast, useful and robust technique, largely because of the fact that GAs combine direction and chance in the search in an effective and efficient manner. Since populations implicitly contain much more information than simply the individual fitness scores, GAs combine the good information hidden in one solution with good information from another solution to produce new solutions with good information inherited from both parents, inevitably (hopefully) leading towards optimality.

## Page 284


In this chapter we have also discussed the various classifications of GAs. The class of parallel GAs is very complex, and its behavior is affected by many parameters. It seems that the only way to achieve a greater understanding of parallel GAs is to study individual facets independently, and we have seen that some of the most influential publications in parallel GAs concentrate on only one aspect (migration rates, communication topology or deme size), either ignoring or making simplifying assumptions about the others. Also, the hybrid GA, adaptive GA, independent sampling GA and messy GA have been included with the necessary information.

Genetic programming has been used to model and control a multitude of processes and to govern their behavior according to fitness-based, automatically generated algorithms. Implementation of genetic programming will benefit in the coming years from new approaches which include research from developmental biology. Also, it will be necessary to learn to handle the redundancy-forming pressures in evolution. Applications of genetic programming will continue to broaden. Many applications focus on controlling the behaviour of real or virtual agents. In this role, genetic programming may contribute considerably to the growing field of social and behavioural simulations. A brief discussion on the Holland classifier system is also included in this chapter.

9.20 Review Questions

1. State Charles Darwin's theory of evolution.

2. What is meant by genetic algorithm?

3. Compare and contrast traditional algorithm and genetic algorithm.

4. State the importance of genetic algorithm.

5. Explain in detail about the various operators involved in genetic algorithm.

6. What are the various types of crossover and mutation techniques?

7. With a neat flowchart, explain the operation of a simple genetic algorithm.

8. State the general genetic algorithm.

9. Discuss in detail the various types of genetic algorithm.

10. State schema theorem.

11. Write a short note on Holland classifier systems.

12. Differentiate between messy GA and parallel GA.

13. What is the importance of hybrid GAs?

## Page 285


14. Describe the concepts involved in real-coded genetic algorithm.

15. What is genetic programming?

16. Compare genetic algorithm and genetic programming.

17. List the characteristics of genetic programming.

18. With a neat flowchart, explain the operation of genetic programming.

19. How are data represented in genetic programming?

20. Mention the applications of genetic algorithm.

Exercise Problems

1. Determine the maximum of the function x x x5 (0.007x + 2) using genetic algorithm by writing a program.

2. Determine the maximum of the function exp(-3x) + sin(6πx) using genetic algorithm. Given range = [0.004, 0.7]; bits = 6; population = 12; generations = 36; mutation = 0.005; matenum = 0.3.

3. Optimize the logarithmic function using a genetic algorithm by writing a program.

4. Solve the logical AND function using genetic algorithm by writing a

program.

5. Solve the XNOR problem using genetic algorithm by writing a program.

6. Determine the maximum of the function exp(5x) + sin(7πx) using genetic algorithm. Given range = [0.002, 0.6]; bits = 3; population = 14; generations = 36; mutation = 0.006; matenum = 0.3.

REFERENCES

https://link.springer.com/article/10.1007/BF00175354

https://www.csd.uwo.ca/~mmorenom/cs2101a_moreno/Class9GATutorial.pdf

https://www.egr.msu.edu/~goodman/GECSummitIntroToGA_Tutorial -goodman.pdf

https://www.researchgate.net/publication/228569652_Genetic_Algorithm_A_Tutorial_Review

S. Rajasekaran, G. A. Vijayalakshmi Pai, Neural Networks, Fuzzy Logic and Genetic Algorithms: Synthesis & Applications, Prentice Hall of India, 2004.


## Page 286

UNIT 5

10 HYBRID SOFT COMPUTING TECHNIQUES

Learning Objectives

• Neuro-fuzzy hybrid systems.
• Comparison of fuzzy systems with neural networks.
• Properties of Neuro-fuzzy hybrid systems.
• Characteristics of Neuro-fuzzy hybrids.
• Cooperative neural fuzzy systems.
• General Neuro-fuzzy hybrid systems.
• Adaptive Neuro-fuzzy Inference System (ANFIS) in MATLAB.
• Genetic Neuro hybrid systems.
• Properties of genetic Neuro hybrid systems.
• Genetic algorithm based back-propagation network (BPN).
• Advantages of Neuro-genetic hybrids.
• Genetic fuzzy hybrid and fuzzy genetic hybrid systems.
• Genetic fuzzy rule based systems (GFRBSs).
• Advantages of genetic fuzzy hybrids.
• Simplified fuzzy ARTMAP.
• Supervised ARTMAP system.

10.1 Introduction

In general, neural networks, fuzzy systems and genetic algorithms are distinct soft computing techniques evolved from the biological computational strategies and nature's way to solve problems. All the above three techniques individually have provided efficient solutions to a wide range of simple and complex problems pertaining to different domains. As

## Page 287


discussed, these three techniques can be combined together in whole or in part, and may be applied to find solutions to problems where the techniques do not work individually. The main aim of the concept of hybridization is to overcome the weakness of one technique while applying it, and to bring out the strength of the other technique to find a solution by combining them. Every soft computing technique has particular computational properties (e.g., ability to learn, decision making) which make it suited for particular problems and not for others. It has to be noted that neural networks are good at recognizing patterns, but they are not good at explaining how they reach their decisions. On the contrary, fuzzy logic is good at explaining its decisions, but cannot automatically acquire the rules used for making those decisions. Also, the tuning of membership functions becomes an important issue in fuzzy modelling. Since this tuning can be viewed as an optimization problem, either neural networks (the Hopfield neural network gives solutions to optimization problems) or genetic algorithms offer a possibility to solve it. These limitations act as a central driving force for the creation of hybrid soft computing systems, where two or more techniques are combined in a suitable manner that overcomes the limitations of individual techniques.

The importance of hybrid systems is based on the varied nature of the application domains. Many complex domains have several different component problems, each of which may require different types of processing. When there is a complex application which has two distinct sub-problems, say for example, signal processing and serial shift reasoning, then a neural network and fuzzy logic can be used for solving these individual tasks, respectively. The use of hybrid systems is growing rapidly, with successful applications in areas such as engineering design, stock market analysis and prediction, medical diagnosis, process control, credit card analysis, and a few other cognitive simulations.

Thus, even though hybrid soft computing systems have a great potential to solve problems, if not applied appropriately they may result in adverse solutions. It is not necessary that when individual techniques give a good solution, hybrid systems would give an even better solution. The key driving force is to build highly automated, intelligent machines for the future generations using all these techniques.

10.2 Neuro-Fuzzy Hybrid Systems

A neuro-fuzzy hybrid system (also called a fuzzy neural hybrid), proposed by J. S. R. Jang, is a learning mechanism that utilizes the training and learning algorithms from neural networks to find parameters of a fuzzy system (i.e., fuzzy sets, fuzzy

from neural networks to find parameters of a fuzzy system (i.e., fuzzy sets, fuzzy munotes.in

## Page 288


rules, fuzzy numbers, and so on). It can also be defined as a fuzzy system that determines its parameters by processing data samples, using a learning algorithm derived from or inspired by neural network theory. Alternately, it is a hybrid intelligent system that fuses artificial neural networks and fuzzy logic by combining the learning and connectionist structure of neural networks with the human-like reasoning style of fuzzy systems.

Neuro-fuzzy hybridization is widely termed Fuzzy Neural Network (FNN) or Neuro-Fuzzy System (NFS). The human-like reasoning style of fuzzy systems is incorporated by NFS (the more popular term, used henceforth) through the use of fuzzy sets and a linguistic model consisting of a set of IF-THEN fuzzy rules. NFSs are universal approximators with the ability to solicit interpretable IF-THEN rules; this is their main strength. However, the strength of NFSs involves a trade-off between interpretability and accuracy, requirements that are contradictory in fuzzy modelling.

In the field of fuzzy modelling research, the neuro-fuzzy area is divided into two streams:

1. Linguistic fuzzy modelling, focused on interpretability (mainly the Mamdani model).

2. Precise fuzzy modelling, focused on accuracy [mainly the Takagi-Sugeno-Kang (TSK) model].

10.2.1 Comparison of Fuzzy Systems with Neural Networks

From the existing literature, it can be noted that neural networks and fuzzy systems have some things in common. If there does not exist any mathematical model of a given problem, then neural networks and fuzzy systems can be used for solving that problem (e.g., pattern recognition, regression, or density estimation). This is the main reason for the growth of these intelligent computing techniques. Besides having individual advantages, they do have certain disadvantages that are overcome by combining both concepts.

Where neural networks are concerned, they can be used only if the problem is expressed by a sufficient number of observed examples. These observations are used to train the black box. Though no prior knowledge about the problem is needed, extracting comprehensible rules from a neural network's structure is very difficult.

A fuzzy system, on the other hand, does not need learning examples as prior knowledge; rather, linguistic rules are required. Moreover, a linguistic description of the input and output variables should be given. If the knowledge is incomplete, wrong or contradictory, then the fuzzy system must be tuned. This is a time-consuming process. Table 10-1 shows how combining both approaches brings out the advantages, leaving out the disadvantages.

## Page 289


Table 10-1 Comparison of neural and fuzzy processing

--------------------------------------------------------------------------------
Neural processing                          Fuzzy processing
--------------------------------------------------------------------------------
Mathematical model not necessary           Mathematical model not necessary
Learning can be done from scratch          A prior knowledge is needed
There are several learning algorithms      Learning is not possible
Black-box behaviour                        Simple interpretation and implementation
--------------------------------------------------------------------------------

10.2.2 Characteristics of Neuro-Fuzzy Hybrids

The general architecture of a Neuro-fuzzy hybrid system is shown in Figure 10-1. A fuzzy system-based NFS is trained by means of a data-driven learning method derived from neural network theory. This heuristic causes local changes in the underlying fuzzy system. At any stage of the learning process - before, during, or after - it can be represented as a set of fuzzy rules. The learning procedure is constrained to ensure the semantic properties of the underlying fuzzy system.

An NFS approximates an n-dimensional unknown function, partly represented by training examples. Thus fuzzy rules can be interpreted as vague prototypes of the training data. As shown in Figure 10-1, an NFS is given by a three-layer feedforward neural network model. It can also be observed that the first layer corresponds to the input variables, while the second and third layers correspond to the fuzzy rules and output variables, respectively. The fuzzy sets are converted to (fuzzy) connection weights.
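The layer structure just described can be sketched in code. The following is an illustrative sketch only (not the book's implementation): layer 1 carries the inputs, layer 2 computes each rule's firing strength from Gaussian membership degrees acting as fuzzy connection weights, and layer 3 combines the rules into the output. The membership parameters and rule consequents are hypothetical.

```python
import math

def gauss_mf(x, c, sigma):
    """Gaussian membership degree of x for a fuzzy set with center c, width sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def nfs_forward(x1, x2, rules):
    """Three-layer neuro-fuzzy forward pass.
    Layer 1: the two inputs. Layer 2: one unit per rule, firing strength =
    product of membership degrees (the fuzzy connection weights).
    Layer 3: weighted average of the rule consequents."""
    strengths = [gauss_mf(x1, c1, s1) * gauss_mf(x2, c2, s2)
                 for (c1, s1), (c2, s2), _ in rules]
    total = sum(strengths)
    return sum(w * rule[2] for w, rule in zip(strengths, rules)) / total

# Two hypothetical rules: (fuzzy set for x1, fuzzy set for x2, crisp consequent).
rules = [((0.0, 1.0), (0.0, 1.0), -1.0),
         ((1.0, 1.0), (1.0, 1.0), +1.0)]
y = nfs_forward(0.0, 0.0, rules)
```

At (0, 0) the first rule fires strongly, so the output leans toward its consequent of -1.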

Figure 10-1 Architecture of a Neuro-fuzzy hybrid system.


## Page 290

Chapter 10: Hybrid Soft Computing Techniques

An NFS can also be considered as a system of fuzzy rules wherein the system can be initialized in the form of fuzzy rules based on the prior knowledge available. Some researchers use five layers, the fuzzy sets being encoded in the units of the second and the fourth layer, respectively. It is, however, also possible for these models to be transformed into a three-layer architecture.

10.2.3 Classifications of Neuro-Fuzzy Hybrid Systems

NFSs can be classified into the following two systems:

1. Cooperative NFSs.

2. General Neuro-fuzzy hybrid systems.

10.2.3.1 Cooperative Neural Fuzzy Systems

In this type of system, the artificial neural network (ANN) and the fuzzy system work independently of each other. The ANN attempts to learn the parameters from the fuzzy system. Four different kinds of cooperative fuzzy neural networks are shown in Figure 10-2.

The FNN in Figure 10-2(A) learns fuzzy sets from the given training data. This is done, usually, by fitting membership functions with a neural network; the fuzzy sets are then determined offline. They are then used to form the fuzzy system from fuzzy rules that are given, not learned. The NFS in Figure 10-2(B) determines the fuzzy rules from the training data by means of a neural network. Here again, the neural networks learn offline before the fuzzy system is initialized. Rule learning usually happens by clustering on self-organizing feature maps. There is also the possibility of applying fuzzy clustering methods to obtain rules.

For the Neuro-fuzzy model shown in Figure 10-2(C), the parameters of the membership functions are learnt online, while the fuzzy system is applied. This means that, initially, fuzzy rules and membership functions must be defined beforehand. Also, in order to improve and guide the learning step, the error has to be measured. The model shown in Figure 10-2(D) determines the rule weights for all fuzzy rules by a neural network. A rule is determined by its rule weight, interpreted as the influence of the rule. The rule weights are then multiplied with the rule outputs.

## Page 291


Figure 10-2 Cooperative neural fuzzy systems.

10.2.3.2 General Neuro-Fuzzy Hybrid Systems (General NFHS)

General Neuro-fuzzy hybrid systems (NFHS) resemble neural networks in which a fuzzy system is interpreted as a neural network of a special kind. The architecture of a general NFHS gives it an advantage because there is no communication between the fuzzy system and the neural network. Figure 10-3 illustrates an NFHS. In this figure the rule base of a fuzzy system is assumed to be a neural network; the fuzzy sets are regarded as weights, and the rules and the input and output variables as neurons. The choice to include or discard neurons can be made in the learning step. Also, the fuzzy knowledge base is represented by the neurons of the neural network; this overcomes the major drawbacks of both underlying systems.

Membership functions expressing the linguistic terms of the inference rules should be formulated for building a fuzzy controller. However, in fuzzy systems, no formal approach exists to define these functions. Any shape, such as Gaussian, triangular, bell-shaped or trapezoidal, can be considered as a membership function with an

## Page 292


arbitrary set of parameters. Thus for fuzzy systems, the optimization of these functions in terms of generalizing the data is very important; this problem can be solved by using neural networks.

Using learning rules, the neural network must optimize the parameters by fixing a distinct shape of the membership functions, for example, triangular. But regardless of the shape of the membership functions, training data should also be available.

The Neuro-fuzzy hybrid systems can also be modelled in another way. In this case, the training data is grouped into several clusters, and each cluster is designed to represent a particular rule. These rules are defined by the crisp data points and are not defined linguistically. A neural network, in this case, might then be applied to train the defined clusters. Testing can be carried out by presenting a random test sample to the trained neural network. Each output unit will return a degree to which the sample fits the antecedent of the rule.

Figure 10-3 A general Neuro-fuzzy hybrid system.

10.2.4 Adaptive Neuro-Fuzzy Inference System (ANFIS) in MATLAB

The basic idea behind this Neuro-adaptive learning technique is very simple. The technique provides a method for the fuzzy modelling procedure to learn information about a data set, in order to compute the membership function parameters that best allow the associated fuzzy inference system to track the given input/output data. This learning method works similarly to that of neural networks.

The ANFIS toolbox in the MATLAB environment performs the membership function parameter adjustments. The function used to activate this toolbox is anfis. The ANFIS toolbox can be opened in MATLAB either at the command line prompt or through the Graphical User Interface. Based on the given input-output data set, the ANFIS toolbox


## Page 293


builds a Fuzzy Inference System whose membership functions are adjusted either using the back-propagation training algorithm or the Adaline algorithm, which uses the least mean square learning rule. This makes the fuzzy system learn from the data it models.

The Fuzzy Logic Toolbox function that accomplishes this membership function parameter adjustment is called anfis. The acronym ANFIS derives from adaptive Neuro-fuzzy inference system. The anfis function can be accessed either from the command line or through the ANFIS Editor GUI. Using a given input/output data set, the toolbox function anfis constructs a fuzzy inference system (FIS) whose membership function parameters are adjusted using either a back-propagation algorithm alone or in combination with a least-squares type of method. This enables fuzzy systems to learn from the data they are modeling.

10.2.4.1 FIS Structure and Parameter Adjustment

A network-type structure similar to that of a neural network can be used to interpret the input/output map. This structure maps inputs through input membership functions and associated parameters, and then through output membership functions and associated parameters, to outputs. During the learning process, the parameters associated with the membership functions change. A gradient vector facilitates the computation (or adjustment) of these parameters, providing a measure of how well the fuzzy inference system models the input/output data for a given set of parameters. After obtaining the gradient vector, any of several optimization routines can be applied to adjust the parameters so as to reduce some error measure (defined usually as the sum of the squared differences between the actual and desired outputs). anfis makes use of either back-propagation or a combination of Adaline (least squares) and back-propagation for membership function parameter estimation.

10.2.4.2 Constraints of ANFIS

When compared to general fuzzy inference systems, anfis is more complex. It is not available for all of the fuzzy inference system options and only supports Sugeno-type systems. Such systems have the following properties:

1. They should be first- or zeroth-order Sugeno-type systems.

2. They should have a single output that is obtained using weighted-average defuzzification. All output membership functions must be of the same type and can be either linear or constant.

3. They do not share rules. The number of output membership functions must be equal to the number of rules.

4. They must have unity weight for each rule.

## Page 294

If the FIS structure does not comply with these constraints then an error will occur. Also, anfis cannot accept all the customization options that basic fuzzy inference allows. In simpler words, membership functions and defuzzification functions cannot be made according to one's choice; rather, those provided should be used.

10.2.4.3 The ANFIS Editor GUI

To get started with the ANFIS Editor GUI, type anfisedit at the MATLAB command prompt. The GUI as in Figure 10-4 will appear on your screen.

Figure 10-4 ANFIS Editor in MATLAB.

From this GUI one can:

1. Load data (training, testing and checking) by selecting the appropriate radio buttons in the Load Data portion of the GUI and then clicking Load Data. The loaded data is plotted in the plot region.

2. Generate an initial FIS model or load an initial FIS model using the options in the Generate FIS portion of the GUI.

3. View the FIS model structure once an initial FIS has been generated or loaded by clicking the Structure button.

4. Choose the FIS model parameter optimization method: back-propagation or a mixture of back-propagation and least squares (hybrid method).

5. Choose the number of training epochs and the training error tolerance.


## Page 295


6. Train the FIS model by clicking the Train Now button. This training adjusts the membership function parameters and plots the training (and/or checking data) error plot(s) in the plot region.

7. View the FIS model output versus the training, checking, or testing data output by clicking the Test Now button. This function plots the test data against the FIS output in the plot region.

One can also use the ANFIS Editor GUI menu bar to load an FIS training initialization, save your trained FIS, open a new Sugeno system, or open any of the other GUIs to interpret the trained FIS model.

10.2.4.4 Data Formalities and the ANFIS Editor GUI

To start training an FIS using either anfis or the ANFIS Editor GUI, one needs to have a training data set that contains the desired input/output data pairs of the target system to be modeled. In certain cases an optional testing data set may be available that can check the generalization capability of the resulting fuzzy inference system, and/or a checking data set that helps guard against model overfitting during the training. One can account for overfitting by testing the FIS trained on the training data against the checking data, and choosing the membership function parameters to be those associated with the minimum checking error, if these errors indicate model overfitting. To determine this, the training error plots have to be examined fairly closely. Usually, these training and checking data sets are stored in separate files after being collected based on observations of the target system.

10.2.4.5 More on the ANFIS Editor GUI

A minimum of two and a maximum of six arguments can be taken by the command anfis, whose general format is

[fismat1, trnError, ss, fismat2, chkError] = ...
    anfis(trnData, fismat, trnOpt, dispOpt, chkData, method);

Here trnOpt (training options), dispOpt (display options), chkData (checking data), and method (training method) are optional. All of the output arguments are also optional. In this section we will discuss the arguments and range components of the command line function anfis as well as the analogous functionality of the ANFIS Editor GUI. Only the training data set must exist before anfis is implemented when the ANFIS Editor GUI is invoked using anfisedit. The step-size is fixed when the adaptive NFS is trained using this GUI tool.


## Page 296


Training Data

Both anfis and the ANFIS Editor GUI require the training data, trnData, as an argument. For the target system to be modeled, each row of trnData is a desired input/output pair: a row starts with an input vector and is followed by an output value. So the number of rows of trnData is equal to the number of training data pairs. Also, because there is only one output, the number of columns of trnData is one more than the number of inputs.
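As a quick illustration of this row format (with hypothetical numbers), a two-input data set has three columns per row:

```python
# Hypothetical two-input target system: each row is [x1, x2, y],
# i.e. the input vector followed by the single output value.
trn_data = [
    [0.0, 0.0, 0.0],
    [0.5, 1.0, 0.8],
    [1.0, 0.5, 0.9],
    [1.0, 1.0, 1.2],
]

n_pairs = len(trn_data)           # number of rows = number of training pairs
n_inputs = len(trn_data[0]) - 1   # columns = inputs + 1 (the one output)
```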

Input FIS Structure

The input FIS structure, fismat, can be obtained from any of the following fuzzy editors:

1. The FIS Editor.

2. The Membership Function Editor.

3. The Rule Editor from the ANFIS Editor GUI (which allows an FIS structure to be loaded from a file or the MATLAB workspace).

4. The command line function genfis1 (for which one needs to give only the numbers and types of membership functions).

The FIS structure contains both the model structure (specifying, e.g., the number of rules in the FIS, the number of membership functions for each input, etc.) and the parameters (which specify the shapes of the membership functions). For updating membership function parameters, anfis learning employs two methods:

1. Back-propagation for all parameters (a steepest descent method).

2. A hybrid method involving back-propagation for the parameters associated with the input membership functions and least-squares estimation for the parameters associated with the output membership functions.

This means that throughout the learning process, at least locally, the training error decreases. So, as the initial membership functions increasingly resemble the optimal ones, it becomes easier for the model parameter training to converge. In setting up these initial membership function parameters in the FIS structure, it may be helpful to have human expertise about the target system to be modeled.

Based on a fixed number of membership functions, the genfis1 function produces an FIS structure. This structure invokes the so-called curse of dimensionality and causes excessive propagation of the number of rules when the number of inputs is

## Page 297


moderately large (more than four or five). To enable some dimension reduction in the fuzzy inference system, the Fuzzy Logic Toolbox software provides a method whereby an FIS structure can be generated using the clustering algorithm discussed in Subtractive Clustering. To use this clustering algorithm, select the Sub. Clustering option in the Generate FIS portion of the ANFIS Editor GUI before the FIS is generated. The subtractive clustering method partitions the data into groups called clusters and generates an FIS with the minimum number of rules required to distinguish the fuzzy qualities associated with each of the clusters.

Training Options

One can choose a desired error tolerance and number of training epochs in the ANFIS Editor GUI tool. For the command line anfis, the training option trnOpt is a vector specifying the stopping criteria and the step-size adaptation strategy:

1. trnOpt(1): number of training epochs; default = 10

2. trnOpt(2): error tolerance; default = 0

3. trnOpt(3): initial step-size; default = 0.01

4. trnOpt(4): step-size decrease rate; default = 0.9

5. trnOpt(5): step-size increase rate; default = 1.1

The default value is taken if any element of trnOpt is missing or is NaN. The training process stops if the designated epoch number is reached or the error goal is achieved, whichever comes first.
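The default-filling behavior just described can be sketched as follows; this is an illustrative Python re-implementation of the documented rule, not MATLAB code.

```python
import math

# Documented defaults: epochs, error tolerance, initial step-size,
# step-size decrease rate, step-size increase rate.
TRN_OPT_DEFAULTS = [10, 0, 0.01, 0.9, 1.1]

def resolve_trn_opt(trn_opt):
    """Fill missing or NaN entries of a trnOpt-style vector with the defaults."""
    resolved = list(TRN_OPT_DEFAULTS)
    for i, v in enumerate(trn_opt[:5]):
        if v is not None and not (isinstance(v, float) and math.isnan(v)):
            resolved[i] = v
    return resolved

# Epochs given explicitly; the NaN tolerance falls back to its default of 0.
opts = resolve_trn_opt([50, float('nan')])
```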

The step-size profile is usually a curve that increases initially, reaches a maximum, and then decreases for the remainder of the training. This ideal step-size profile can be achieved by adjusting the initial step-size and the increase and decrease rates (trnOpt(3) - trnOpt(5)). The default values are set up to cover a wide range of learning tasks. These step-size options may have to be modified for any specific application in order to optimize the training. There are, however, no user-specified step-size options for training the adaptive Neuro-fuzzy inference system generated using the ANFIS Editor GUI.

Display Options

These apply only to the command line function anfis. The display options argument, dispOpt, is a vector of 1s or 0s that specifies what information to display (print in the MATLAB command window) before, during, and after the training process. A 1 is used to denote "print this option" and a 0 to denote "do not print this option".

## Page 298


1. dispOpt(1): display ANFIS information; default = 1

2. dispOpt(2): display error (each epoch); default = 1

3. dispOpt(3): display step-size (each epoch); default = 1

4. dispOpt(4): display final results; default = 1

All available information is displayed in the default mode. If any element of dispOpt is missing or is NaN, the default value is used.

Method

To estimate membership function parameters, both the command line anfis and the ANFIS Editor GUI apply either a back-propagation form of the steepest descent method, or a combination of back-propagation and the least-squares method. The choices for this argument are hybrid or backpropagation. In the command line function anfis, these method choices are designated by 1 and 0, respectively.

Output FIS Structure for Training Data

The output FIS structure corresponding to a minimal training error is fismat1. This is the FIS structure one uses to represent the fuzzy system when no checking data is used for model cross-validation. When the checking data option is not used, this is also the FIS structure saved by the ANFIS Editor GUI. When one uses the checking data option, the output saved is that associated with the minimum checking error.

Training Error

This is the difference between the training data output value and the output of the fuzzy inference system corresponding to the same training data input value (the one associated with that training data output value).

The root mean squared error (RMSE) of the training data set at each epoch is recorded by the training error trnError; fismat1 is the snapshot of the FIS structure when the training error measure is at its minimum. As the system is trained, the ANFIS Editor GUI plots the training error versus epochs curve.

Step-Size

With the ANFIS Editor GUI, one cannot control the step-size options. Using the command line anfis, the step-size array ss records the step-size during the training. If one plots ss, one gets the step-size profile, which serves as a reference for adjusting the initial step-size and the corresponding decrease and increase rates.

## Page 299


The guidelines followed for updating the step-size (ss) for the command line function anfis are:

1. If the error undergoes four consecutive reductions, increase the step-size by multiplying it by a constant (ssinc) greater than one.

2. If the error undergoes two consecutive combinations of one increase and one reduction, decrease the step-size by multiplying it by a constant (ssdec) less than one.

The default value for the initial step-size is 0.01; for ssinc and ssdec, the defaults are 1.1 and 0.9, respectively. All the default values can be changed via the training options of the command line anfis.
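The two guidelines can be sketched as a small update routine. This is an illustrative re-implementation of the stated heuristics with the default ssinc and ssdec values; the exact bookkeeping inside anfis may differ.

```python
def update_step_size(errors, ss, ssinc=1.1, ssdec=0.9):
    """Apply the step-size heuristics to the per-epoch error history `errors`:
    four consecutive reductions  -> multiply ss by ssinc (> 1);
    two consecutive (increase, reduction) pairs -> multiply ss by ssdec (< 1)."""
    if len(errors) >= 5 and all(errors[i] > errors[i + 1] for i in range(-5, -1)):
        return ss * ssinc
    if (len(errors) >= 5
            and errors[-5] < errors[-4] > errors[-3] < errors[-2] > errors[-1]):
        return ss * ssdec
    return ss

# Four reductions in a row grow the step-size; oscillation shrinks it.
grown = update_step_size([5, 4, 3, 2, 1], 0.01)       # 0.01 * 1.1
shrunk = update_step_size([1, 2, 1.5, 2.5, 2], 0.01)  # 0.01 * 0.9
```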

Checking Data

For testing the generalization capability of the fuzzy inference system at each epoch, the checking data, chkData, is used. The checking data and the training data have the same format, and the elements of the former are generally distinct from those of the latter.

The checking data is important for learning tasks in which the number of inputs is large and/or the data itself is noisy. A fuzzy inference system needs to track a given input/output data set well. The model structure used for anfis is fixed, which means that there is a tendency for the model to overfit the data on which it is trained, especially over a large number of training epochs. In case overfitting occurs, the fuzzy inference system may not respond well to other independent data sets, especially if they are corrupted by noise. In these situations, a validation or checking data set can be useful. This data set is used to cross-validate the fuzzy inference model; cross-validation requires applying the checking data to the model and then seeing how well the model responds to this data.

The checking data is applied to the model at each training epoch when the checking data option is used with anfis, either via the command line or using the ANFIS Editor GUI. When the command line anfis is invoked, the model parameters that correspond to the minimum checking error are returned via the output argument fismat2. The FIS membership function parameters computed using the ANFIS Editor GUI, when both training and checking data are loaded, are associated with the training epoch that has the minimum checking error.

The assumptions made when using the minimum checking data error epoch to set the membership function parameters are:

## Page 300

1. The similarity between the checking data and the training data means that the checking data error decreases as the training begins.

2. The checking data error increases at some point in the training, after data overfitting occurs.

The resulting FIS may or may not be the one which is required to be used, depending on the behavior of the checking data error.
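The selection of the minimum-checking-error snapshot described above can be sketched as follows (the per-epoch error values are hypothetical):

```python
def best_epoch(chk_error):
    """Return the index of the epoch whose checking error is minimal.
    The FIS snapshot from this epoch is the one worth keeping."""
    return min(range(len(chk_error)), key=lambda i: chk_error[i])

# Hypothetical per-epoch checking errors: decreasing, then rising as
# overfitting sets in.
chk = [0.90, 0.55, 0.40, 0.38, 0.45, 0.60]
epoch = best_epoch(chk)  # epoch 3 has the minimum checking error
```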

Output FIS Structure for Checking Data

The output FIS structure with the minimum checking error is the output of the command line anfis, fismat2. If checking data is used for cross-validation, this FIS structure is the one that should be used for further calculation.

Checking Error

This is the difference between the checking data output value and the output of the fuzzy inference system corresponding to the same checking data input value, which is the one associated with that checking data output value. The root mean square error (RMSE) of the checking data at each epoch is recorded by the checking error chkError. The snapshot of the FIS structure when the checking error has its minimum value is fismat2. The checking error versus epochs curve is plotted by the ANFIS Editor GUI as the system is trained.

10.3 Genetic Neuro-Hybrid Systems

A Neuro-genetic hybrid or genetic-Neuro-hybrid system is one in which a neural network employs a genetic algorithm to optimize the structural parameters that define its architecture. In general, neural networks and genetic algorithms refer to two distinct methodologies. Neural networks learn and execute different tasks using several examples, classify phenomena, and model nonlinear relationships; that is, neural networks solve problems by self-learning and self-organization. Genetic algorithms, on the other hand, present themselves as a potential solution for the optimization of the parameters of neural networks.

10.3.1 Properties of Genetic Neuro-Hybrid Systems

Certain properties of genetic Neuro-hybrid systems are as follows:

1. The parameters of neural networks are encoded by genetic algorithms as a string of properties of the network, that is, chromosomes. A large population

## Page 301


of chromosomes is generated, which represent the many possible parameter sets for the given neural network.

2. Genetic Algorithm-Neural Network, or GANN, has the ability to locate the neighborhood of the optimal solution quickly, compared to other conventional search strategies.

Figure 10-5 shows the block diagram for genetic-Neuro-hybrid systems. Their drawbacks are: the large amount of memory required for handling and manipulating chromosomes for a given network; and the question of scalability of this approach as the size of the networks becomes large.

10.3.2 Genetic Algorithm Based Back-Propagation Network (BPN)

BPN is a method of teaching multi-layer neural networks how to perform a given task. Learning occurs during the training phase. The basic algorithm with its architecture is discussed in detail in Chapter 3 (Section 3.5) of this book. The limitations of BPN are as follows:

1. BPNs do not have the ability to recognize new patterns; they can recognize only patterns similar to those they have learnt.

2. They must be sufficiently trained so that enough general features applicable to both seen and unseen instances can be extracted; there may be undesirable effects due to overtraining the network.

Figure 10-5 Block diagram of genetic-Neuro hybrids.


## Page 302


Also, it may be noted that the BPN determines its weights based on a gradient search technique and hence may encounter a local minima problem. Though genetic algorithms do not guarantee finding the global optimum solution, they are good at quickly finding good acceptable solutions. Thus, hybridization of BPN with a genetic algorithm is expected to provide many advantages compared to what either can achieve alone. The basic concepts and working of genetic algorithms are discussed in Chapter 15. However, before a genetic algorithm is executed,

1. A suitable coding for the problem has to be devised.

2. A fitness function has to be formulated.

3. Parents have to be selected for reproduction and then crossed over to generate offspring.

10.3.2.1 Coding

Assume a BPN configuration n-l-m, where n is the number of neurons in the input layer, l is the number of neurons in the hidden layer and m is the number of neurons in the output layer. The number of weights to be determined is given by

(n + m)l

Each weight (which is a gene here) is a real number. Let d be the number of digits (gene length) in a weight. Then a string S of decimal values having string length (n + m)ld is randomly generated. It is a string that represents the weight matrices of the input-hidden and hidden-output layers in a linear form, arranged in row-major or column-major order depending upon the style selected. Thereafter a population of p (the population size) chromosomes is randomly generated.

10.3.2.2 Weight Extraction

In order to determine the fitness values, weights are extracted from each chromosome. Let x1, x2, ..., xd, ..., xL represent a chromosome, and let x(pd+1), x(pd+2), ..., x((p+1)d) represent the pth gene (p >= 0) in the chromosome. The actual weight wp is given by

wp = +(x(pd+2) 10^(d-2) + x(pd+3) 10^(d-3) + ... + x((p+1)d)) / 10^(d-2),  if 5 <= x(pd+1) <= 9,
wp = -(x(pd+2) 10^(d-2) + x(pd+3) 10^(d-3) + ... + x((p+1)d)) / 10^(d-2),  if 0 <= x(pd+1) < 5.

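The digit-based decoding of genes into weights can be sketched as follows. The sign rule (first digit of a gene >= 5 gives a positive weight) and the 10^(d-2) scaling follow a common textbook scheme and should be treated as assumptions here.

```python
def extract_weights(chromosome, d):
    """Decode a chromosome of decimal digits into real-valued weights.
    Each gene has d digits: the first digit selects the sign (>= 5 gives +,
    < 5 gives -), and the remaining d - 1 digits form the magnitude,
    scaled down by 10 ** (d - 2). (Assumed decoding scheme.)"""
    weights = []
    for k in range(len(chromosome) // d):
        gene = chromosome[k * d:(k + 1) * d]
        magnitude = int("".join(str(x) for x in gene[1:])) / 10 ** (d - 2)
        sign = 1.0 if gene[0] >= 5 else -1.0
        weights.append(sign * magnitude)
    return weights

# One hypothetical chromosome holding two genes of length d = 5.
w = extract_weights([8, 4, 5, 2, 1, 3, 9, 8, 7, 6], d=5)
```

Here the first gene starts with 8 (positive) and decodes to 4.521; the second starts with 3 (negative) and decodes to -9.876.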

## Page 303


10.3.2.3 Fitness Function

A fitness function has to be formulated for each and every problem to be solved. Consider the matrix given by

where X and Y are the inputs and targets, respectively. Compute the initial population I0 of size j. Let

O10, O20, ..., Oj0 represent the j chromosomes of the initial population I0. Let the weights extracted for each of the chromosomes be w10, w20, w30, ..., wj0. For n number of inputs and m number of outputs, let the calculated output of the considered BPN be

As a result, the error here is calculated by

ER 1 = (y 11 – c11)2 + (y 21 – c21)2 + (y 31 – c31)2 + ….. + (y n1 – cn1)2

ER 2 = (y 12 – c12)2 + (y 22 – c22)2 + (y 32 – c32)2 + ….. + (y n2 – cn2)2

…………………………………………………………………….

…………………………………………………………………….

ER m = (y 1m – c1m)2 + (y 2m – c2m)2 + (y 3m – c3m)2 + ….. + (y nm – cnm)2

The fitness function is further derived from this root mean square error, given by

The process has to be carried out for the total number of chromosomes.
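The error and fitness computation can be sketched as follows. Taking the fitness as the reciprocal of the root mean square error is an assumption made here for illustration, so that a lower error yields a higher fitness.

```python
import math

def rmse(targets, outputs):
    """Root mean square error over all output components and patterns,
    matching the squared-difference sums ER1 ... ERm above."""
    se = sum((t - o) ** 2
             for ts, os in zip(targets, outputs)
             for t, o in zip(ts, os))
    n = sum(len(ts) for ts in targets)
    return math.sqrt(se / n)

def fitness(targets, outputs):
    """Fitness of a chromosome: taken here (as an assumption) as the
    reciprocal of the RMSE of the BPN it encodes."""
    return 1.0 / rmse(targets, outputs)

# Hypothetical targets y and calculated BPN outputs c: two patterns,
# two outputs each.
y = [[1.0, 0.0], [0.0, 1.0]]
c = [[0.8, 0.1], [0.2, 0.7]]
f = fitness(y, c)
```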


## Page 304


10.3.2.4 Reproduction of Offspring

In this process, before the parents produce offspring with better fitness, the mating pool has to be formulated. This is accomplished by neglecting the chromosome with minimum fitness and replacing it with a chromosome having maximum fitness. In other words, the fittest individuals among the chromosomes will be given more chances to participate in the next generations and the worst individuals will be eliminated. Once the mating pool is formulated, parent pairs are selected randomly and the chromosomes of the respective pairs are combined using the crossover technique to reproduce offspring. The selection operator is suitably used to select the best parents to participate in the reproduction process.

10.3.2.5 Convergence

The convergence of a genetic algorithm is the number of generations over which the fitness value increases towards the global optimum. Convergence is the progression towards increasing uniformity. When about 95% of the individuals in the population share the same fitness value, we say that the population has converged.

10.3.3 Advantages of Neuro-Genetic Hybrids

The various advantages of Neuro-genetic hybrids are as follows:

• GA performs optimization of neural network parameters with simplicity, ease of operation, minimal requirements and a global perspective.

• GA helps to find the complex structure of an ANN for a given input and output data set by using its learning rule as a fitness function.

• The hybrid approach yields a powerful model that can significantly improve the predictability of the system under construction.

The hybrid approach can be applied to several applications, including load forecasting, stock forecasting, cost optimization in textile industries, medical diagnosis, face recognition, multi-processor scheduling, job shop scheduling, and so on.

10.4 Genetic Fuzzy Hybrid and Fuzzy Genetic Hybrid Systems

Currently, considerable research has been performed combining fuzzy logic and genetic algorithms (GAs), and there is an increasing interest in the integration of these two topics. The integration can be performed in the following two ways:

## Page 305


1. By the use of fuzzy logic based techniques for improving genetic algorithm behavior and modelling GA components. This is called fuzzy genetic algorithms (FGAs).

2. By the application of genetic algorithms in various optimization and search problems involving fuzzy systems.

An FGA is a genetic algorithm that uses techniques or tools based on fuzzy logic to improve the GA's behaviour. It may also be defined as an ordered sequence of instructions in which some of the instructions or algorithm components are designed with tools based on fuzzy logic: for example, fuzzy operators and fuzzy connectives for designing genetic operators with different properties, fuzzy logic control systems for controlling the GA parameters according to some performance measures, stop criteria, representation tasks, etc.

GAs are utilized for solving different fuzzy optimization problems: for example, fuzzy flowshop scheduling problems, vehicle routing problems with fuzzy due-time, fuzzy optimal reliability design problems, fuzzy mixed integer programming applied to resource distribution, the job-shop scheduling problem with fuzzy processing time, the interactive fuzzy satisfying method for multi-objective 0-1 problems, fuzzy optimization of distribution networks, etc.
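A minimal Python sketch of the first kind of integration — a fuzzy controller adjusting a GA parameter — is given below. The two rules, the diversity breakpoints (0.2, 0.8) and the consequent mutation rates (0.30, 0.01) are illustrative assumptions, not values from the text:

```python
def fuzzy_mutation_rate(diversity):
    """Toy fuzzy controller for the GA mutation rate.

    Rules (weighted-average defuzzification):
        IF diversity is LOW  THEN mutation rate is HIGH (0.30)
        IF diversity is HIGH THEN mutation rate is LOW  (0.01)
    Membership in LOW falls linearly from 1 at diversity 0.2
    to 0 at diversity 0.8; HIGH is its complement.
    """
    mu_low = min(1.0, max(0.0, (0.8 - diversity) / (0.8 - 0.2)))
    mu_high = 1.0 - mu_low
    return mu_low * 0.30 + mu_high * 0.01

print(fuzzy_mutation_rate(0.1))  # 0.30: population is converging, mutate more
print(fuzzy_mutation_rate(0.9))  # 0.01: population is diverse, mutate less
```

Because the two memberships sum to one, the weighted average needs no normalization; a real FGA would typically feed a diversity or fitness-spread measure into such a controller every generation.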

10.4.1 Genetic Fuzzy Rule Based Systems (GFRBSs)

For modelling complex systems in which classical tools are unsuccessful, because the systems are complex or imprecise, fuzzy rule based systems have been identified as an important tool. In this regard, GAs have proven to be a powerful tool for mechanizing the definition of the knowledge base of a fuzzy controller, since adaptive control, learning, and self-organization may in many cases be considered as optimization or search processes. Over the last few years these advantages have extended the use of GAs in the development of a wide range of approaches for designing fuzzy controllers. In particular, the application to the design, learning and tuning of knowledge bases has produced quite good results. In general these approaches can be termed Genetic Fuzzy Systems (GFSs). Figure 10-6 shows a system where genetic design and fuzzy processing are the two fundamental constituents. Inside GFRBSs, it is possible to distinguish between either parameter optimization or rule generation processes, that is, adaptation and learning.


Chapter 10: Hybrid Soft Computing Techniques

The main objectives of optimization in a fuzzy rule based system are as follows:

1. To find an appropriate knowledge base (KB) for a particular problem. This is equivalent to parameterizing the fuzzy KB (rules and membership functions).

2. To find those parameter values that are optimal with respect to the design criteria.

Figure 10-6 Block diagram of a genetic fuzzy system.

Considering a GFRBS, one has to decide which parts of the knowledge base (KB) are subject to optimization by the GA. The KB of a fuzzy system is the union of qualitatively different components and not a homogeneous structure. As an example, the KB of a descriptive Mamdani-type fuzzy system has two components: a rule base (RB) containing the collection of fuzzy rules and a data base (DB) containing the definitions of the scaling factors and the membership functions of the fuzzy sets associated with the linguistic labels.

In this phase, it is important to distinguish between tuning (alternatively, adaptation) and learning problems. See Table 10-2 for the differences.
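The RB/DB split described above can be pictured with a small data-structure sketch (Python; the field names and the single example rule are our own illustrations, not part of any standard API):

```python
from dataclasses import dataclass, field

@dataclass
class DataBase:
    """DB: scaling factors plus membership-function definitions
    for the fuzzy sets behind each linguistic label."""
    scaling: dict      # variable -> (lower, upper) scaling bounds
    membership: dict   # variable -> {label: triangular (a, b, c) params}

@dataclass
class RuleBase:
    """RB: the collection of fuzzy rules over the linguistic labels."""
    rules: list = field(default_factory=list)

db = DataBase(
    scaling={"error": (-1.0, 1.0)},
    membership={"error": {"neg": (-1, -1, 0),
                          "zero": (-1, 0, 1),
                          "pos": (0, 1, 1)}},
)
rb = RuleBase(rules=[({"error": "neg"}, ("output", "pos"))])
kb = (rb, db)   # the KB is the union of both (heterogeneous) components
print(len(rb.rules))  # 1
```

A tuning process would mutate only the numbers inside `db`, whereas a learning process would also search over the contents of `rb`.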


10.4.1.1 Genetic Tuning Process

The task of tuning the scaling functions and fuzzy membership functions is important in FRBS design. The adoption of parameterized scaling functions and membership functions by the GA is based on the fitness function that specifies the design criteria quantitatively. The responsibility of finding a set of optimal parameters for the membership and/or the scaling functions rests with the tuning processes, which assume a predefined rule base. The tuning process can also be performed a priori. This can be done if a subsequent process derives the RB once the DB has been obtained, that is, a priori genetic DB learning. Figure 10-7 illustrates the process of genetic tuning.

Tuning Scaling Functions

The universes of discourse where fuzzy membership functions are defined are normalized by scaling functions applied to the input and output variables of FRBSs. In the case of linear scaling, the scaling functions are parameterized either by a single scaling factor or by specifying a lower and an upper bound. On the other hand, in the case of non-linear scaling, the scaling functions are parameterized by one or several contraction/dilation parameters. These parameters are adapted such that the scaled universe of discourse matches the underlying variable range.

Table 10-2 Tuning versus learning problems

Tuning:
• It is concerned with the optimization of an existing FRBS.
• Tuning processes assume a predefined RB and have the objective of finding a set of optimal parameters for the membership and/or the scaling functions (the DB parameters).

Learning:
• It constitutes an automated design method for fuzzy rule sets that starts from scratch.
• Learning processes perform a more elaborate search in the space of possible RBs or whole KBs and do not depend on a predefined set of rules.

Ideally, in these kinds of processes the approach is to adapt one to four parameters per variable: one when using a scaling factor, two for linear scaling, and three or four for non-linear scaling. This approach leads to a fixed-length code, as the number of variables is predefined, as is the number of parameters required to code each scaling function.
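The resulting fixed chromosome length is simple arithmetic, sketched below (a hypothetical helper of our own; the per-scheme gene counts follow the one/two/three-or-four figures above, with four taken as the non-linear case):

```python
def scaling_chromosome_length(num_variables, scheme="linear"):
    """Genes needed to code all scaling functions of an FRBS:
    1 per variable for a single scaling factor, 2 for linear
    scaling (lower and upper bound), 4 for non-linear scaling."""
    genes_per_variable = {"factor": 1, "linear": 2, "nonlinear": 4}
    return num_variables * genes_per_variable[scheme]

# A controller with 2 inputs and 1 output (3 variables in total):
print(scaling_chromosome_length(3, "factor"))     # 3
print(scaling_chromosome_length(3, "linear"))     # 6
print(scaling_chromosome_length(3, "nonlinear"))  # 12
```

Because both the number of variables and the genes per variable are fixed in advance, standard fixed-length crossover and mutation operators apply without modification.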


Figure 10-7 Process of tuning the DB.

Tuning Membership Functions

It can be noted that during the tuning of membership functions, an individual represents the entire DB. This is because its chromosome encodes the parameterized membership functions associated with the linguistic terms in every fuzzy partition considered by the fuzzy rule based system. Triangular (either isosceles or asymmetric), trapezoidal, and Gaussian functions are the most common shapes for the membership functions in GFRBSs. The number of parameters per membership function can vary from one to four, and each parameter can be either binary or real coded.
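For instance, an asymmetric triangular membership function is fully described by three real-coded genes, so one fuzzy partition flattens into a parameter vector. The Python sketch below (an illustrative encoding of our own, not the chapter's notation) decodes one such gene triple:

```python
def triangular_mf(a, b, c):
    """Build mu(x) for a triangle with feet a and c and modal point b --
    the three real-coded genes describing one asymmetric triangular set."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# A DB chromosome for one variable: three sets, three genes each.
chromosome = [0.0, 0.0, 0.5,    # "low"
              0.0, 0.5, 1.0,    # "medium"
              0.5, 1.0, 1.0]    # "high"
medium = triangular_mf(*chromosome[3:6])
print(medium(0.5))    # 1.0 at the modal point
print(medium(0.25))   # 0.5 halfway up the left slope
```

Tuning then amounts to letting the GA perturb the nine genes while a fitness function scores the resulting controller.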

For FRBSs of the descriptive (using linguistic variables) or the approximate (using fuzzy variables) type, the structure of the chromosome is different. In the process of tuning the membership functions in a linguistic model, the entire fuzzy partitions are encoded into the chromosome and, in order to maintain the global semantics in the RB, the chromosome is globally adapted. These approaches usually consider a predefined number of linguistic terms for each variable (with no requirement that it be the same for each of them), which leads to a code of fixed length as far as membership functions are concerned. Despite this, it is possible to evolve the number of linguistic terms associated with a variable: simply define a maximum number (for the length of the code) and let some of the membership functions be located outside the range of the linguistic variable (which reduces the actual number of linguistic terms).

Descriptive fuzzy systems working with strong fuzzy partitions are a particular case where the number of parameters to be coded is reduced. Here, the number of parameters to code is reduced to the ones defining the core regions of the fuzzy sets: the modal point for triangles and the extreme points of the core for trapezoidal shapes.


Tuning the membership functions of a model working with fuzzy variables (scatter partitions), on the other hand, is a particular instance of knowledge base learning. This is because, instead of referring to linguistic terms in the DB, the rules are defined completely by their own membership functions.

10.4.1.2 Genetic Learning of Rule Bases

As shown in Figure 10-8, genetic learning of rule bases assumes a predefined set of fuzzy membership functions in the DB to which the rules refer by means of linguistic labels. Since, in the approximate approach, adapting rules is equivalent to modifying the membership functions, it only applies to descriptive FRBSs. When considering a rule based system and focusing on learning rules, there are three main approaches that have been applied in the literature:

1. Pittsburgh approach.

2. Michigan approach.

3. Iterative rule learning approach.

Figure 10-8 Genetic learning of the rule base.

Figure 10-9 Genetic learning of the knowledge base.

The Pittsburgh approach is characterized by representing an entire rule set as a genetic code (chromosome), maintaining a population of candidate rule sets, and using selection and genetic operators to produce new generations of rule sets. The Michigan approach considers a different model, where the members of the population are individual rules and a rule set is represented by the entire population. In the third, iterative approach, chromosomes code individual rules, and a new rule is adapted and added to the rule set, in an iterative fashion, in every run of the genetic algorithm.
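The representational difference between the first two approaches can be shown in a few lines of Python (the integer rule coding and the population sizes are illustrative assumptions):

```python
import random

random.seed(0)

def random_rule():
    """A rule coded as (antecedent label, consequent label),
    each an index into 3 hypothetical linguistic labels."""
    return (random.randint(0, 2), random.randint(0, 2))

# Pittsburgh: one chromosome = an entire rule set;
# the population holds competing candidate rule sets.
pittsburgh_pop = [[random_rule() for _ in range(4)] for _ in range(10)]

# Michigan: one chromosome = one rule;
# the population as a whole *is* the single rule set.
michigan_pop = [random_rule() for _ in range(10)]

print(len(pittsburgh_pop))   # 10 candidate rule sets of 4 rules each
print(len(michigan_pop))     # 10 rules forming one rule set
```

In the iterative approach, chromosomes likewise code single rules, but only the best rule of each GA run is kept and appended to the growing rule set.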

10.4.1.3 Genetic Learning of the Knowledge Base

Genetic learning of a KB includes different genetic representations, such as variable-length chromosomes, multi-chromosome genomes and chromosomes encoding single rules instead of a whole KB, as it deals with heterogeneous search spaces. As the complexity of the search space increases, the computational cost of the genetic search also grows. To combat this issue, an option is to maintain a GFRBS that encodes individual rules rather than an entire KB. In this manner one can maintain a flexible, complex rule space in which the search for a solution remains feasible and efficient. The three learning approaches used in the case of the rule base can also be considered here: the Michigan, Pittsburgh, and iterative rule learning approaches. Figure 10-9 illustrates the genetic learning of the KB.

10.4.2 Advantages of Genetic Fuzzy Hybrids

The hybridization between fuzzy systems and GAs in GFSs became an important research area during the last decade. GAs allow us to represent different kinds of structures, such as weights and features together with rule parameters, etc., allowing us to code multiple models of knowledge representation. This provides a wide variety of approaches where it is necessary to design specific genetic components for evolving a specific representation. Nowadays, it is a growing research area, where researchers need to reflect in order to advance towards the strengths and distinctive features of GFSs, providing useful advances in fuzzy systems theory. A genetic algorithm efficiently optimizes the rules, membership functions, DB and KB of fuzzy systems. The methodology adopted is simple, and the fittest individual is identified during the process.

10.5 Simplified Fuzzy ARTMAP

The basic concepts of Adaptive Resonance Theory neural networks are discussed in Chapter 5. Both types of ART networks, ART-1 and ART-2, are discussed in detail in Section 5.6.

Apart from these two ART networks, the other two maps are ARTMAP and fuzzy ARTMAP. ARTMAP is also known as Predictive ART. It combines two slightly modified ART-1 or ART-2 units into a supervised learning structure. Here, the first unit takes the input data and the second unit takes the correct output data. Then the minimum possible adjustment of the vigilance parameter in the first unit is made using the correct output data so that correct classification can be made.

The Fuzzy ARTMAP model has fuzzy-logic-based computations incorporated in the ARTMAP model. Fuzzy ARTMAP is a neural network architecture for conducting supervised learning in a multidimensional setting. When Fuzzy ARTMAP is used on a learning problem, it is trained until it correctly classifies all training data. This feature causes Fuzzy ARTMAP to "overfit" some data sets, especially those in which the underlying pattern classes overlap. To avoid the problem of overfitting, one must allow for error in the training process.

10.5.1 Supervised ARTMAP System

Figure 10-10 shows the supervised ARTMAP system. Here, two ART modules are linked by an inter-ART module called the Map Field. The Map Field forms predictive associations between categories of the ART modules and realizes a match tracking rule. If ARTa and ARTb were disconnected, each module would self-organize category groupings for its respective input set. In supervised mode, the mappings are learned between input vectors a and b.

Figure 10-10 Supervised ARTMAP system.

10.5.2 Comparison of ARTMAP with BPN

1. ARTMAP networks are self-stabilizing, while in BPNs new information gradually washes away old information. A consequence of this is that a BPN has separate training and performance phases, while ARTMAP systems perform and learn at the same time.

2. ARTMAP networks are designed to work in real time, while BPNs are typically designed to work off-line, at least during their training phase.

3. An ARTMAP system can learn both in a fast as well as in a slow match configuration, while a BPN can only learn in a slow mismatch configuration. This means that an ARTMAP system learns, or adapts its weights, only when the input matches an established category, while BPNs learn when the input does not match an established category.

4. In BPNs there is always a chance of the system getting trapped in a local minimum, while this is impossible for ART systems.

However, learning in systems based on ART modules may depend upon the ordering of the input patterns.

10.6 Summary

In this chapter, the various hybrids of individual neural networks, fuzzy logic and genetic algorithms have been discussed in detail. The advantages of each of these techniques are combined to give a better solution to the problem under consideration. Each of these systems possesses certain limitations when operating individually, and these limitations are overcome by combining the systems so that their advantages are brought out. The hybrid systems are found to provide better solutions for complex problems, and their advent makes them applicable in various application domains.

10.7 Solved Problems using MATLAB

1. Write a MATLAB program to adapt the given input to a sine wave form using the adaptive neuro-fuzzy hybrid technique.

Source code

%program to adapt the given input to a sine wave form using the
%adaptive neuro-fuzzy hybrid technique
clc;
clear all;
close all;
%input data (67 samples, 0 to 19.8 in steps of 0.3, as listed in the output)
x = (0:0.3:19.8)';
%target data
t = sin(x);
%training data
trndata = [x, t];
mfs = 7;
epochs = 570;
%creating fuzzy inference engine
fis = genfis1(trndata, mfs);
plotfis(fis);
figure
r = showrule(fis);
%creating adaptive neuro fuzzy inference engine
nfis = anfis(trndata, fis, epochs);
r1 = showrule(nfis);
%evaluating anfis with given input
y = evalfis(x, nfis);
disp('The output data from anfis : ');
disp(y);
%calculating error rate
e = y - t;
plot(e);
title('Error rate');
figure
%plotting given training data and anfis output
plot(x, t, 'o', x, y, '*');
title('Training data vs Output data');
legend('Training data', 'ANFIS Output');

Output

The input data given x is :

0

0.3000

0.6000

0.9000

1.2000

1.5000

1.8000

2.1000

2.4000

2.7000

3.0000

3.3000

3.6000

3.9000

4.2000

4.5000

4.8000

5.1000

5.4000

5.7000

6.0000


6.3000

6.6000

6.9000

7.2000

7.5000

7.8000

8.1000

8.4000

8.7000

9.0000

9.3000

9.6000

9.9000

10.2000

10.5000

10.8000

11.1000

11.4000

11.7000

12.0000

12.3000

12.6000

12.9000

13.2000

13.5000

13.8000

14.1000

14.4000

14.7000

15.0000

15.3000

15.6000

15.9000

16.2000

16.5000

16.8000

17.1000

17.4000

17.7000


18.0000

18.3000

18.6000

18.9000

19.2000

0.9437

0.9993

0.9657

0.8457

0.6503

0.3967

0.1078

-0.1909

-0.4724

-0.7118

-0.8876

-0.9841

-0.9927

-0.9126

-0.7510

-0.5223

-0.2470

0.0504

0.3433

0.6055

0.8137

ANFIS info:

Number of nodes: 32

Number of linear parameters: 14

Number of nonlinear parameters: 21

Total number of parameters: 35

Number of training data pairs: 67

Number of checking data pairs: 0

Number of fuzzy rules: 7

Start training ANFIS

1 0.0517485

2 0. 0513228

3 0.0508992


4 0.0504776

5 0.0500581

Step size increases to 0.011000 after epoch 5.

6 0.0496406

7 0.0491837

8 0.0487291
.
.
.
568 0.00105594

Designated epoch number reached --> ANFIS training completed at epoch 570.

The output data from anfis:

-0.0014

0.2981

0.5647

0.7817

0.9314

0.9984

0.9747

0.8629

0.6746

0. 4271

0.1452

-0.1571

-0.4425

-0.6884

-0.8720

-0.9772

-0.9955

-0.9260

-0.7735

-0.5509

-0.2788

0.0174

0. 3112

0. 5777

0.7935

0.9387

0.9991

0.9697


0.8540

0.6627

0.4122

0.1247

-0.1741

-0.4574

-0.7000

-0.8801

-0.9812

-0.9941

-0.9189

-0.7623

-0.5371

-0.2629

0.0346

0.3277

0.5908

0.8024

0.9442

1.0014

0.9667

0.8443

0.6484

0.3969

0.1093

-0.1900

-0.4731

-0.7130

-0.8879

-0.9833

-0.9952

-0.9125

-0.7521

-0.5232

-0.2457

0.0526

0.3426

0.6015

0.8523


Figure 10-11 illustrates the ANFIS system module; Figure 10-12 the error rate; and Figure 10-13 the performance of training data and output data. Thus it can be noted from Figure 10-13 that the ANFIS has adapted the given input to a sine wave form.

System anfis: 1 input, 1 output, 7 rules

Figure 10-11 ANFIS system module.

Figure 10-12 Error rate.


Figure 10-13 Performance of training data and output data.

2. Write a MATLAB program to recognize the given input of alphabets to its respective outputs using the adaptive neuro-fuzzy hybrid technique.

Source code

%program to recognize the given input of alphabets to its respective
%outputs using adaptive neuro-fuzzy hybrid technique

clc;

clear all;

close all;

%input data

x=[0,1,0,0;1,0,1,1;1,1,1,2;1,0,1,3;1,0,1,4;
1,1,0,5;1,0,1,6;1,1,0,7;1,0,1,8;1,1,0,9;
0,1,1,10;1,0,0,11;1,0,0,12;1,0,0,13;0,1,1,14;
1,1,0,15;1,0,1,16;1,0,1,17;1,0,1,18;1,1,0,19;
1,1,1,20;1,0,0,21;1,1,0,22;1,0,0,23;1,1,1,24;]

%target data
t=[0;0;0;0;0;
1;1;1;1;1;
2;2;2;2;2;
3;3;3;3;3;
4;4;4;4;4;]

%training data

trndata=[x,t];
mfs=3;
epochs=400;

%creating fuzzy inference engine

fis=genfis1(trndata,mfs);

plotmf(fis,'input',1);

r=showrule(fis);

%creating adaptive Neuro fuzzy inference engine

nfis = anfis(trndata,fis,epochs);

surfview(nfis);

figure

r1=showrule(nfis);

%evaluating anfis with given input

y=evalfis(x,nfis);
disp('The output data from anfis:');

disp(y);

%calculating error rate

e=y-t;

plot (e);

title(' Error rate');

figure

%ploting given training data and anfis output

plot(x,t,'or',x,y,'kx');
title('Training data vs Output data');
legend('Training data','ANFIS Output','location','North');


Output

X =

0 1 0 0

1 0 1 1

1 1 1 2

1 0 1 3

1 0 1 4

1 1 0 5

1 0 1 6

1 1 0 7

1 0 1 8

1 1 0 9

0 1 1 10

1 0 0 11

1 0 0 12

1 0 0 13

0 1 1 14

1 1 0 15

1 0 1 16

1 0 1 17

1 0 1 18

1 1 0 19

1 1 1 20

1 0 0 21

1 1 0 22

1 0 0 23

1 1 1 24

t =

0

0

0

0

0

1

1

1

1

1

2


2

2

2

2

3

3

3

3

3

4

4

4

4

4

ANFIS info:

Number of nodes: 193

Number of linear parameters: 405

Number of nonlinear parameters: 36

Total number of parameters: 441

Number of training data pairs: 25

Number of checking data pairs: 0

Number of fuzzy rules: 81

Start training ANFIS

1 0.08918

2 0.0889038

3 0.0886229

4 0.0883371

5 0.0880464

Step size increases to 0.011000 after epoch 5.

6 0.0877506

7 0.0874193

.

.

.

.

398 0.00102521

399 0.00102102

400 0.0010191


Step size increases to 0.003347 after epoch 400.

Designated epoch number reached --> ANFIS training completed at epoch 400.

The output data from anfis:

-0.0000

0.0009

0.0000

-0.0031

0.0024

1.0000

0.9997

1.0000

1.0002

1.0001

2.0000

2.0001

1.9998

2.0001

2.0000

2.9999

2.9982

3.0022

2.9994

3.0001

4.0000

4.0000

3.9999

4.0000

4.0000

Figure 10.14 shows the degree of membership. Figure 10.15 illustrates the surface view of the given system; Figure 10.16 the error rate; and Figure 10.17 the performance of training data with output data.


Figure 10.14 Degree of membership.

Figure 10.15 Surface view of the given system.


Figure 10.16 Error rate.

Figure 10.17 Performance of training data with output data.


3. Write a MATLAB program to train the given truth table using the adaptive neuro-fuzzy hybrid technique.

Source code

% Program to train the given truth table using adaptive Neuro fuzzy %hybrid

technique.

clc;

clear all;

close all;

%input data

x=[0,0,0;0,0,1;0,1,0;0,1,1;1,0,0;1,0,1;1,1,0;1,1,1;]

%target data

t=[0;0;0;1;0;1;1;1]

%training data

trndata= [x, t];

mfs=3;

mfType = 'gbellmf';

epochs=49;

%creating fuzzy inference engine

fis=genfis1(trndata,mfs,mfType);
plotfis(fis);

title ('The created fuzzy logic');

figure

plotmf(fis,'input',1);
title('The membership function of the fuzzy');
surfview(fis);

figure

ruleview(fis);

r=showrule(fis);

%creating adaptive Neuro fuzzy inference engine

nfis = anfis (trndata, fis, epochs);

plotfis (nfis);

title ('The created anfis');

figure

plotmf(nfis,'input',1);

title ('The membership function of the anfis');

surfview (nfis);

figure

ruleview(nfis);

r1=showrule(nfis);


%evaluating anfis with given input

y=evalfis (x,nfis);

disp ('The output data from anfis:');

disp (y);

%calculating error rate

e=y-t;

plot(e);

title(' Error rate');

figure

%plotting given training data and anfis output
plot(x,t,'o',x,y,'*');

title ('Training data vs Output data');

legend ('Training data','ANFIS Output');

Output

X =

0 0 0

0 0 1

0 1 0

0 1 1

1 0 0

1 0 1

1 1 0

1 1 1

T =

0

0

0

1

0

1

1

1

ANFIS info:

Number of nodes: 78

Number of linear parameters: 108

Number of nonlinear parameters: 27

Total number of parameters: 135

Number of training data pairs: 8

Number of checking data pairs: 0

Number of fuzzy rules: 27

Start training ANFIS …

1 3.13863e-007
2 3.0492e-007
3 2.97841e-007
4 2.90245e-007
5 2.84305e-007

Step size increases to 0.011000 after epoch 5

6 2.78077e-007

.

.

.

.

47 2.22756e-007
48 2.22468e-007
49 2.22431e-007

Step size increases to 0.015627 after epoch 49.


Designated epoch number reached --> ANFIS training completed at epoch 49.

The output data from anfis:

-0.0000

0.0000

0.0000

1.0000

0.0000

1.0000

1.0000

1.0000

Figure 10-18 shows the ANFIS module for the given system with specified inputs. Figure 10-19 illustrates the rule viewer for the ANFIS module. Figure 10-20 gives the error rate. Figure 10-21 shows the performance of training data and output data.


System anfis: 3 inputs, 1 output, 27 rules.

Figure 10-18 ANFIS module for the given system with specified inputs.

Figure 10-19 Rule viewer for the ANFIS module.


Figure 10-20 Error rate.

Figure 10-21 Performance of training data and output data.

4. Write a MATLAB program to optimize the neural network parameters for

the given truth table using genetic algorithm.

Source code

%Program to optimize the neural network parameters from given truth table

%using genetic algorithm

clc;

clear all;

close all;

%input data

p = [ 0 0 1 1; 0 1 0 1 ];


%target data

t = [-1 1 -1 1];

%creating a feedforward neural network
net=newff(minmax(p),[2,1]);

%creating a two-layer net with two neurons in the hidden (1) layer
net.inputs{1}.size = 2;

net.numLayers = 2;

%initializing network

net= init(net);

net.initFcn = 'initlay';

%initializing weights and bias

net.layers{1}.initFcn = 'initwb';
net.layers{2}.initFcn = 'initwb';

%Assigning weights and bias from function 'gawbinit'

net.inputWeights{1,1}.initFcn = 'gawbinit';
net.layerWeights{2,1}.initFcn = 'gawbinit';
net.biases{1}.initFcn='gawbinit';
net.biases{2}.initFcn='gawbinit';

%configuring training parameters

net.trainParam.lr = 0.05; %learning rate

net.trainParam.min_grad=0e-10; %min. gradient

net.trainParam.epochs = 60; %No. of iterations

%Training neural net

net=train(net,p,t);

%simulating the net with given input

y = sim (net,p);

disp ('The output of the net is : ');

disp(y);

%plotting given training data and network output

plot (p,t,'o',p,y, '*');

title ('Training data vs Output data');

%calculating error rate

e= gsubtract (t,y); % e=t -y

disp ('The error (t-y) of the net is :');

disp(e);


%program to calculate weights and bias of the net

function out1 = gawbinit(in1, in2, in3, in4, in5, ~)

%%=======================================================

%Implementng genetic algorithm

%configuring ga arguments

A = []; b = []; %linear constraints
Aeq = []; beq = []; %linear inequalities
lb = [-2 -2 -2 -2 -2 -2 -2 -2 -2]; %lower bound (one per variable)
ub = [2 2 2 2 2 2 2 2 2]; %upper bound

%plotting ga parameters
options = gaoptimset('PlotFcns',{@gaplotscorediversity,@gaplotbestf});

%creating a multi-objective genetic algorithm
%number of variables: for a 2-layer, 1-output, 5-neuron net there are
%6 weights and 3 biases (6+3 = 9)
nvars=9;

[X,fval,exitFlag,Output]=gamultiobj(@fitnesfun,nvars,A,b,Aeq,beq,lb,ub,options);

figure

%displaying the ga output parameters

disp(X);
fprintf('The number of generations was : %d\n', Output.generations);
fprintf('The number of function evaluations was : %d\n', Output.funccount);
fprintf('The best function value found was : %g\n', fval);

%%=======================================================

%Assigning the values of weights and bias respectively

%getting information of the net

persistent INFO;

if isempty(INFO), INFO = nnfcnWeightInit(mfilename,'Random Symmetric', 7.0, ...
    true, true, true, true, true, true, true, true); end

if ischar(in1)
    switch lower(in1)
        case 'info', out1 = INFO;
        %configuring function
        case 'configure'
            out1 = struct;
        case 'initialize'
            %selecting input weights, layer weights and bias separately
            switch upper(in3)
                case {'IW'} %for input weights
                    if INFO.initInputWeight
                        if in2.inputConnect(in4,in5)
                            x=X; %Assigning ga output 'X' to input weights
                            %Taking first 4 ga outputs to create input weight matrix 'wi'
                            wi(1,1)=x(1,1); wi(1,2)=x(1,2);
                            wi(2,1)=x(1,3); wi(2,2)=x(1,4);
                            disp(wi);
                            out1 = wi; %Returning input weight matrix
                        else
                            out1 = [];
                        end
                    else
                        nnerr.throw([upper(mfilename) ' does not initialize input weights.']);
                    end
                case {'LW'} %for layer weights
                    if INFO.initLayerWeight
                        if in2.layerConnect(in4,in5)
                            x=X; %Assigning ga output 'X' to layer weights
                            %Taking 7th and 8th ga outputs to create layer weight matrix 'wl'
                            wl(1,1)=x(1,7); wl(1,2)=x(1,8);
                            disp(wl);
                            out1 = wl; %Returning layer weight matrix
                        else
                            out1 = [];
                        end
                    else
                        nnerr.throw([upper(mfilename) ' does not initialize layer weights.']);
                    end
                case {'B'} %for bias
                    if INFO.initBias
                        if in2.biasConnect(in4)
                            x=X; %Assigning ga output 'X' to bias
                            %Taking 5th, 6th and 9th ga outputs to create bias matrix 'bl'
                            bl(1)=x(1,5);
                            bl(2)=x(1,6);
                            bl(3)=x(1,9);
                            disp(bl);
                            out1 = bl; %Returning bias matrix
                        else
                            out1 = [];
                        end
                    else
                        nnerr.throw([upper(mfilename) ' does not initialize biases.']);
                    end
                otherwise
                    nnerr.throw('Unrecognized value type.');
            end
    end
end
end

%Creating fitness function for genetic algorithm
function z = fitnesfun(e)
%The error (t-y) for all 4 i/o pairs is summed to get the overall error
%For 4 input-target pairs the overall error is divided by 4 to get the
%average error value (1/4 = 0.25)
z = 0.25*sum(abs(e));
end

Output

Optimization terminated: average change in the spread of Pareto solutions less than options.TolFun.

Columns 1 through 7

0.0280 0.0041 0.0112 0.0069 0.0050

Columns 8 through 9

0.0018 0.0003

The number of generations was : 102

The number of function evaluations was : 13906

The best function value found was : 0.0177734

0.0062 0.0075


Optimization terminated: average change in the spread of Pareto solutions less than options.TolFun.

Columns 1 through 7

0.0012 0.0020 0.0096 0.0014 0.0018

Columns 8 through 9

0.0084 0.0025

The number of generations was 102

The number of function evaluations was : 13906

The best function value found was 0.00988699

The output of the net is :

-1.0000 1.0000 -1.0000 1.0000

The error (t-y) of the net is :

1.0e-011

-0.3097 0.2645 -0.2735 0.3006

0.0044 0.0084

Figure 10-22 shows the plot of the generations versus fitness value and histogram. Figure 10-23 illustrates the Neural Network Training Tool for the given input and output pairs. Figure 10-24 shows the neural network training performance. The neural network training state is shown in Figure 10-25. Figure 10-26 displays the performance of training data versus output data.


10.8 Review Questions

1. State the limitations of neural networks and fuzzy systems when operated individually.

2. List the various types of hybrid systems.

3. Mention the characteristics and properties of neuro-fuzzy hybrid systems.

4. What are the classifications of neuro-fuzzy hybrid systems? Explain any one of the neuro-fuzzy hybrid systems in detail.

5. Give details on the various applications of neuro-fuzzy hybrid systems.

6. How are genetic algorithms utilized for optimizing the weights in a neural network architecture?

7. Explain in detail the concepts of fuzzy genetic hybrid systems.

8. Differentiate: ARTMAP and Fuzzy ARTMAP; Fuzzy ARTMAP and back-propagation neural networks.

9. Write notes on the supervised fuzzy ARTMAP.

10. Give a description of the operation of the ANFIS Editor in MATLAB.

Exercise Problems

1. Write a MATLAB program to train a NAND gate with binary inputs and targets (two inputs, one output) using the adaptive neuro-fuzzy hybrid technique.

2. Consider some alphabets of your own and recognize the assumed characters using the ANFIS Editor module in MATLAB.

3. Perform Problem 2 for any assumed numeral characters.

4. Design a genetic algorithm to optimize the weights of a neural network model while training an OR gate with 2 bipolar inputs and 1 bipolar target.

5. Write a MATLAB M-file program for the working of a washing machine using fuzzy genetic hybrids.

REFERENCES:

S. Rajasekaran, G. A. Vijayalakshmi Pai, Neural Networks, Fuzzy Logic and Genetic Algorithms: Synthesis & Applications, Prentice Hall of India, 2004.

https://neptune.ai/blog/adaptive-mutation-in-genetic-algorithm

https://www.cs.ucdavis.edu/~vemuri/classes/ecs271/The%20GP%20Tutorial.htm

https://link.springer.com/article/10.1007/BF00175354