FYBMS-Business-statistics-SEM-I-munotes

1
INTRODUCTION TO STATISTICS
Unit Structure
1.0 Objectives:
1.1 Introduction:
1.2 Functions/Scope
1.3 Importance
1.4 Limitations
1.5 Let us sum up
1.6 Unit end Exercises
1.7 List of References
1.0 OBJECTIVES:
After going through this chapter, you will be able to understand:
• Definition of statistics.
• Functions and scope of statistics.
• Importance of statistics in real life.
• Limitations of statistics.
1.1 INTRODUCTION
In our daily life, we come across many situations which are directly or
indirectly related to numbers.
• Average percentage scored by students.
• The largest earthquake measured 9.2 on the Richter scale.
• Men are at least 10 times more likely than women to smoke.
• In economics, the graph of the relation between demand and price.
• Business growth rates of any company.
• One in every 8 South Africans is HIV positive.
• Suppose you need to find the number of employed citizens in a city.
If the city has a population of 10 lakh people, we may take a sample
of 1000 people. A figure calculated from this sample, such as the
proportion employed, is a statistic.
• You may want to predict the price of a stock six months from now
on the basis of company performance measures and other economic
factors.
• As a college student, you may be interested in knowing how the
mean starting salary of a college graduate depends on GPA.
These are just some examples that highlight how statistics are used in our
modern society. To figure out the desired information for each example,
you need data to analyze.
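The sampling idea in the employment example above can be sketched in Python. This is only an illustration: the 62% employment share and the fixed random seed are assumed values, not figures from this text, and the simulated population stands in for a real city that could not be surveyed in full.

```python
import random

POPULATION_SIZE = 1_000_000      # 10 lakh people, as in the example
SAMPLE_SIZE = 1_000              # sample size used in the example
ASSUMED_EMPLOYED_SHARE = 0.62    # hypothetical value, for illustration only

random.seed(1)  # fixed seed so that the sketch is repeatable

# Simulated city: 1 = employed, 0 = not employed.
employed_count = int(POPULATION_SIZE * ASSUMED_EMPLOYED_SHARE)
population = [1] * employed_count + [0] * (POPULATION_SIZE - employed_count)

# Draw a simple random sample without replacement.
sample = random.sample(population, SAMPLE_SIZE)

# The sample share is the statistic; scaling it up estimates the city total.
sample_share = sum(sample) / SAMPLE_SIZE
estimated_employed = round(sample_share * POPULATION_SIZE)

print(f"Share employed in the sample: {sample_share:.3f}")
print(f"Estimated employed citizens in the city: {estimated_employed}")
```

The estimate will differ a little from the true figure each time a new sample is drawn; measuring that difference is part of inferential statistics.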
The purpose of this course is to introduce you to the subject of statistics as
a science of data. Data abound in this information age; extracting useful
knowledge and gaining a sound understanding of complex data sets has
become more of a challenge. In this course, we will focus on the
fundamentals of statistics, which may be broadly described as the
techniques to collect, classify, summarize, organize, analyze, and interpret
numerical information.
This course will begin with a brief overview of the discipline of statistics
and will then quickly focus on descriptive statistics, introducing graphical
methods of describing data. You will learn about combinatorial probability
and probability distributions, the latter of which serve as the foundation for
statistical inference. On the side of inference, we will focus on both
estimation and hypothesis testing. We will also examine the
techniques used to study the relationship between two or more variables; this is
known as regression.
By the end of this course, you should gain a sound understanding of what
statistics represent, how to use statistics to organize and display data, and
how to draw valid inferences based on data by using appropriate statistical
tools.
The study of statistics involves mathematics and relies upon calculations with
numbers. But it also relies heavily on how the numbers are chosen and how
the statistics are interpreted. Statistics are often presented in an effort to add
credibility to an argument or advice. You can see this by paying attention
to television advertisements. Many of the numbers thrown about in this way
do not represent careful statistical analysis. They can be misleading and
push you into decisions that you might find cause to regret. For these
reasons, learning about statistics is a long step towards taking control of
your life.
Definition:
In simple words, statistics is the study and manipulation of given data. It
deals with the analysis and computation of numerical data. Let us
consider some more definitions of statistics given by different
authors:
The Merriam-Webster dictionary defines the term statistics as “The
particular data or facts and conditions of a people within a state - especially
the values that can be expressed in numbers or in any other tabular or
classified way”.
According to Sir Arthur Lyon Bowley, statistics is defined as “Numerical
statements of facts or values in any department of inquiry placed in specific
relation to each other”.
Statistics is a branch of mathematics that deals with the collection, review,
and analysis of data. It draws conclusions from data with
the use of quantitative models. Statistical analysis is the process of collecting
and evaluating data and summarizing it in mathematical form.
Statistics can be defined as the study of the collection, analysis,
interpretation, presentation, and organization of data. In simple words, it is
a mathematical tool that is used to collect and summarize data.
Uncertainty and fluctuation in different fields and parameters can be
assessed only through statistical analysis. These uncertainties are
measured by probability, which plays a very important role in statistics.
Basics of Statistics
The basics of statistics consist of the measures of central tendency and the
measures of dispersion. The measures of central tendency are the mean,
median, and mode; the measures of dispersion include the variance and
standard deviation.
The mean is the average of all the given data. The median is the central
value when the given data are arranged in order. The mode is the
most frequent observation in the given data.
Variance is a measure of how spread out a collection of data is: it is the
average of the squared deviations from the mean. Standard deviation is a
measure of the dispersion of data from the mean; it is the square root of the
variance, so the square of the standard deviation equals the variance.
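These measures can be computed with Python's standard `statistics` module. The marks below are made-up values for illustration, and the population versions of the variance and standard deviation are used (dividing by the number of observations); this is a minimal sketch, not a prescribed method.

```python
import statistics

# Hypothetical marks of eight students (illustrative data only).
marks = [45, 52, 52, 60, 68, 71, 52, 80]

# Measures of central tendency.
mean = statistics.mean(marks)      # arithmetic average
median = statistics.median(marks)  # middle value of the ordered data
mode = statistics.mode(marks)      # most frequent observation

# Measures of dispersion (population versions).
variance = statistics.pvariance(marks)  # average squared deviation from the mean
std_dev = statistics.pstdev(marks)      # square root of the variance

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))
```

Squaring `std_dev` gives back `variance`, which is the relation stated in the text.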
Mathematical Statistics
Mathematical statistics is the application of mathematics to statistics. The most
common application of mathematical statistics is the collection and analysis
of facts about a country: its economy, military, population, number of
employed citizens, GDP growth, etc. Mathematical techniques like
mathematical analysis, linear algebra, stochastic analysis, differential
equations, and measure-theoretic probability theory are used for different
analyses.
Since statistics uses probability, mathematical statistics is an application of
probability theory.
For analyzing the data, two methods are used:
1. Descriptive Statistics: It is used to synopsize (or summarize) the data
and their properties.
2. Inferential Statistics: It is used to draw conclusions from the data.
In descriptive statistics, the data or collection of data is described in the form
of a summary. Inferential statistics then use that summary to draw
conclusions about the population from which the data came. Both of these
types are used on a large scale.
In practice, an analysis often moves from descriptive statistics to
inferential statistics: the data are first summarized, and the summary is then
used for inference.
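The two steps can be sketched as follows. The salary figures are invented for illustration, and the interval uses a normal approximation, which is an assumption made here for simplicity; for a sample this small a t-distribution would normally be preferred.

```python
import math
import statistics

# Hypothetical monthly starting salaries (in thousand rupees) of 10 graduates.
sample_salaries = [28, 31, 25, 35, 30, 27, 33, 29, 32, 30]

# Descriptive step: summarize the sample itself.
sample_mean = statistics.mean(sample_salaries)
sample_sd = statistics.stdev(sample_salaries)  # sample standard deviation

# Inferential step: a rough 95% confidence interval for the population mean.
standard_error = sample_sd / math.sqrt(len(sample_salaries))
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Sample mean: {sample_mean}, sample SD: {sample_sd:.2f}")
print(f"Approximate 95% interval for the population mean: {lower:.2f} to {upper:.2f}")
```

The first two numbers only describe the ten graduates observed; the interval is a statement about all graduates, which is what makes it inferential.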
1.2 FUNCTIONS AND SCOPE:
Functions of Statistics:
1. It presents facts in numerical figures:
We can represent things in their true form with the help of figures.
Without a statistical study, our ideas would be vague and indefinite.
Facts should be given in a definite form. Results given in
numbers are more convincing than results
expressed in qualitative terms.
Statements like “there is a lot of unemployment in India” or
“population is increasing at a faster rate” are not in a definite form.
A statement should be in a definite form, such as “the population in 2004
would be 15% more as compared to 1990”.
2. It presents facts in aggregated and simplified form:
Because statistics are presented in a definite form, they also help in
condensing the data into important figures. Statistical methods thus
present meaningful information. In other words, statistics help in simplifying
complex data to make them understandable.
The data may be presented in the form of a graph or diagram, or through
an average, coefficient, etc. For example, we cannot know the price
position from the individual prices of all goods, but we can know it if we
have the index of the general level of prices.
3. It facilitates comparison:
After simplifying the data, it can be correlated as well as compared.
The relationship between two groups is best represented by certain
mathematical quantities like averages or coefficients. Comparison
is one of the main functions of statistics, as absolute figures alone convey
very little meaning.
4. Formulation and testing of hypotheses:
Statistical methods help us in formulating and testing a
hypothesis or a new theory. With the help of statistical techniques, we
can know the effect of imposing a tax on the exports of tea on the
consumption of tea in other countries. Another example could be to
study whether a credit squeeze is effective in checking inflation or not.
5. Forecasting:
Statistics is not concerned only with the above functions; it also
predicts the future course of the phenomena. We can make
future policies on the basis of estimates made with the help of
statistics. We can predict the demand for goods in 2005 if we know
the population in 2004, on the basis of the growth rate of population in
the past. Similarly, a businessman can exploit the market situation
successfully if he knows about the trend in the market.
Statistics thus help in shaping future policies.
6. Policy making or decision making:
With the help of statistics we can frame favourable policies. How much
food is required to be imported in 2007? It depends on the food
production in 2007 and the demand for food in 2007. Without
knowing these factors we cannot estimate the amount of imports. On
the basis of forecasts the government forms policies about food
grains, housing, etc., but if the forecasting is not correct, then the whole
set-up will be affected.
7. It enlarges knowledge:
Whipple rightly remarks that “Statistics enables one to enlarge his
horizon”. When a person goes through the various procedures of
statistics, it widens his knowledge as well as his thinking
and reasoning power. It also helps him to reach a rational
conclusion.
8. To measure uncertainty:
The future is uncertain, but statistics help authorities in all fields
to make sound estimates by collecting and
analyzing data of the past. In this way the uncertainty can be
reduced. To make a forecast we also have to study the trend
behaviour of the past, for which we use techniques like regression,
interpolation and time series analysis.
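The forecasting function described above can be illustrated with a straight-line trend fitted by least squares, which is one simple form of regression. The demand figures are hypothetical, and a straight line is only one of several possible trend models.

```python
# Hypothetical yearly demand for a product (in thousand units).
years = [2000, 2001, 2002, 2003, 2004]
demand = [120, 126, 131, 139, 144]

n = len(years)
mean_year = sum(years) / n
mean_demand = sum(demand) / n

# Least-squares slope and intercept of the trend line: demand = intercept + slope * year.
slope = sum((x - mean_year) * (y - mean_demand) for x, y in zip(years, demand)) / sum(
    (x - mean_year) ** 2 for x in years
)
intercept = mean_demand - slope * mean_year

# Extend the fitted trend one year beyond the data to forecast 2005.
forecast_2005 = intercept + slope * 2005

print(f"Average yearly increase in demand: {slope:.1f} thousand units")
print(f"Forecast demand for 2005: {forecast_2005:.1f} thousand units")
```

Such a forecast is only as good as the assumption that the past trend continues, which is why the limitations discussed later in this chapter matter.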
Scope of Statistics:
Statistics can be used in many major fields such as psychology, geology,
sociology, weather forecasting, probability, and much more. The main
purpose of statistics is to learn from the analysis of data. Because it focuses
on applications, it is considered a distinct mathematical science.
Applications of Statistics
Information from around the world can be analysed mathematically through
statistics. There are various fields in which statistics are used:
1. Mathematics: Statistical methods like dispersion and probability are
used to get more exact information.
2. Business: Various statistical tools are used to make quick decisions
regarding the quality of the product, preferences of the customers, the
target market, etc.
3. Economics: Economics depends heavily on statistics because
statistical methods are used to calculate various aspects like
employment and inflation in the country. Exports and imports can also be
analysed through statistics.
4. Medical: Using statistics, the effectiveness of any drug can be
analysed. A drug can be prescribed only after analysing it through
statistics.
5. Quality Testing: Statistical samples are used to test the quality of all
the products a company produces.
6. Astronomy: Statistical methods help scientists to measure the size,
distance, etc. of the objects in the universe.
7. Banking: Banks hold several accounts for depositing customers’ money.
At the same time, banks have loan accounts as well, to lend money
to customers in order to earn profit from it. For this purpose,
a statistical approach is used to compare deposits with loan
requests.
8. Science: Statistical methods are used in all fields of science.
9. Weather Forecasting: Statistical concepts are used to compare the
previous weather with the current weather so as to predict the
upcoming weather.
There are various other fields in which statistics is used. Statistics has a
number of applications in mathematics as well as in real
life. Some of the major uses of statistics are given below:
• Applied statistics, theoretical statistics, and mathematical statistics
• Machine learning and data mining
• Statistical computing
• Statistics is effectively applied to the mathematics of the arts and
sciences
• Used for environmental and geographical studies
• Used in the prediction of weather
1.3 IMPORTANCE:
The student’s aims in the study of statistics:
1. To master the vocabulary of statistics: In order to read and understand
a foreign language, there is always the necessity of building up an
adequate vocabulary. To the beginner, statistics should be regarded as
a foreign language. The vocabulary consists of concepts that are
symbolized by words and by letter symbols.
2. To acquire, or to revive, and to extend skill in computation: Statistics
aims at developing computational skills in students. The
understanding of statistical concepts comes largely through applying
them in computing operations.
3. To learn to interpret statistical results correctly: Statistical results can
be useful only to the extent that they are correctly interpreted. With
full and proper interpretation extracted from data, statistical results
are the most powerful source of meaning and significance.
Inadequately interpreted, they may represent something worse than
wasted effort. Erroneously understood, they are worse than useless.
4. To grasp the logic of statistics: Statistics provides a way of thinking
as well as a vocabulary and a language. It is a logical system, like all
mathematics, which is peculiarly adaptable to the handling of
scientific problems. Guilford has rightly remarked, “well-planned
investigations always include in their design clear considerations of
the specific statistical operations to be employed.”
5. To learn where to apply statistics and where not to: While all statistical
devices can illuminate data, each has its limitations. It is in this respect
that the average student will probably suffer most from lack of
mathematical background, whether he realizes it or not. Every
statistic is developed as a purely mathematical idea. As such, it rests
upon certain assumptions. If those assumptions are true of the
particular data with which we have to deal, the statistic may be
appropriately applied.
6. To understand the underlying mathematics of statistics: This objective
will not apply to all students. But it should apply to more than those
with unusual previous mathematical training. It will give the student a
more than commonsense understanding of what goes on in the use of
formulas.
1.4 LIMITATIONS
We know that statistics is a very important tool for studying any kind of data,
but while studying statistics we face some limitations, as given below.
1. Qualitative aspect ignored: Statistical methods do not study the
nature of phenomena which cannot be expressed in quantitative
terms. Such phenomena cannot be a part of the study of statistics.
These include health, riches, intelligence, etc. Studying them requires
conversion of qualitative data into quantitative data. So experiments are being
undertaken to measure the reactions of a man through data. Nowadays
statistics is used in all aspects of life as well as universal
activities.
2. It does not deal with individual items: It is clear from the definition
given by Prof. Horace Secrist, “by statistics we mean aggregates of
facts ... placed in relation to each other”, that statistics deals with
only aggregates of facts or items and does not recognize
individual items. Thus, individual items such as the death of 6 persons in an
accident, or the 85% results of a class of a school in a particular year, do not
amount to statistics as they are not placed in a group of similar items.
Statistics does not deal with individual items, however important they
may be.
3. It does not depict the entire story of a phenomenon: Whenever a
phenomenon happens, it is due to many causes, but all these causes
cannot be expressed in terms of data. So we cannot always reach the correct
conclusions. The development of a group depends upon many social
factors like parents’ economic condition, education, culture, region,
administration by government, etc., but all these factors cannot be
placed in data. So we analyse only the data we can express quantitatively and
not qualitatively. The results or conclusions are therefore not 100% correct,
because many aspects are ignored.
4. It is liable to be misused: As W.I. King points out, “One of the
shortcomings of statistics is that they do not bear on their face the label of
their quality.” So we should check the data and the procedures used to
reach conclusions, since the data may have been
collected by inexperienced persons, or by persons who were dishonest
or biased. Statistics is a delicate science and can be easily misused by an
unscrupulous person. So data must be used with caution; otherwise
the results may prove to be disastrous.
5. Laws are not exact: The two fundamental laws of
statistics, i) the law of inertia of large numbers and ii) the law of statistical
regularity, are not as exact as the laws of science. They are based on
probability, so their results will not always be as reliable as those of scientific
laws. On the basis of probability or interpolation, we can only
estimate the production of paddy in 2008 but cannot claim that
the estimate would be exactly 100% correct. Here only approximations are made.
6. Results are true only on average: As discussed above, the
results are interpolated, for which time series, regression or
probability can be used. These are not absolutely true. If the averages of
two sections of students in statistics are the same, it does not mean that all
the 50 students in section A have got the same marks as those in B. There may
be much variation between the two. So we get average results.
“Statistics largely deals with averages and these averages may be
made up of individual items radically different from each other.” –
W.I. King.
7. Too many methods to study problems: In this subject we use many
methods to find a single result. Variation can be found by quartile
deviation, mean deviation or standard deviation, and the results vary in
each case. “It must not be assumed that the statistics is the only
method to use in research, neither should this method be considered
the best attack for the problem.” – Croxton and Cowden.
8. Statistical results are not always beyond doubt: Although we use
many laws and formulae in statistics, the results achieved are
not final and conclusive. As they are unable to give a complete solution
to a problem, the results must be taken and used with much wisdom.
“Statistics deals only with measurable aspects of things and therefore,
can seldom give the complete solution to problem. They provide a
basis for judgment but not the whole judgment.” – Prof. L.R. Connor.
Hence statistics is a very useful tool, but its value depends on how effectively
the statistician uses it and on the requirements of the
data analysis.
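Limitations 6 and 7 can be seen in a small Python sketch. The marks for the two sections are invented for illustration, and the quartiles follow the default rule of Python's `statistics.quantiles`; other textbooks use slightly different quartile conventions, so the quartile deviation shown is an assumption of this sketch.

```python
import statistics

# Hypothetical marks for two sections; both have the same average.
section_a = [48, 49, 50, 51, 52]
section_b = [10, 30, 50, 70, 90]

def mean_deviation(values):
    """Average absolute distance of the values from their mean."""
    centre = statistics.mean(values)
    return sum(abs(v - centre) for v in values) / len(values)

def quartile_deviation(values):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2

for name, marks in (("A", section_a), ("B", section_b)):
    print(
        f"Section {name}: mean={statistics.mean(marks)}, "
        f"quartile deviation={quartile_deviation(marks):.2f}, "
        f"mean deviation={mean_deviation(marks):.2f}, "
        f"standard deviation={statistics.pstdev(marks):.2f}"
    )
```

Both sections share the same mean although their marks are very different, and the three measures of variation give three different numbers for the same section.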
1.5 LET US SUM UP:
In this chapter we have learned:
• Definition of statistics.
• Functions and Scope of statistics.
• Importance of study of statistics.
• Limitations of statistics.
1.6 UNIT END EXERCISES:
1. What is Statistics? Explain its various uses.
2. Discuss the importance of Statistics.
3. Give various applications of statistics in Business and Economics.
4. Discuss the limitations of Statistics.
5. Explain briefly the functions of Statistics.
Multiple Choice Questions:
1) Statistics is applied in
a) Economics b) Business management
c) Commerce and industry d) All these
2) Which of the following is a branch of statistics?
a) Descriptive statistics b) Inferential statistics
c) Industry statistics d) Both (a) and (b)
3) Which of the following statements is false?
a) Statistics is derived from the Latin word ‘Status’.
b) Statistics is derived from the Italian word ‘Statista’.
c) Statistics is derived from the French word ‘Statistik’.
d) None of these
4) Statistics is defined in terms of numerical data in the
a) Singular sense b) Plural sense
c) Either (a) or (b) d) Both (a) and (b)
5) Statistics is concerned with
a) Qualitative information b) Quantitative information
c) Either (a) or (b) d) Both (a) and (b)
6) An attribute is
a) A qualitative characteristic b) A quantitative characteristic
c) A measurable characteristic d) All these
1.7 LIST OF REFERENCES:
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K.
Kapoor.
• Basic Statistics by B. L. Agrawal.

2
STATISTICAL DATA
Unit Structure
2.0 Objectives:
2.1 Introduction:
2.2 Relevance of Data
2.3 Primary Data
2.4 Secondary Data
2.5 Census and Sample survey
2.7 Let us sum up
2.8 Unit end Exercises
2.9 List of References
2.0 OBJECTIVES:
After going through this chapter, you will be able to understand:
• Different types of data.
• How to collect data.
• Different methods of collecting data.
• The difference between primary and secondary data.
• Sample survey and census.
2.1 INTRODUCTION:
Statistical methods or techniques are applicable only when some data
are available, irrespective of the method of data collection. The data are
collected either by experiments or by survey methods, and they are tabulated
and analysed statistically. Whatever the resulting values obtained
from the analysis, proper and correct inferences have to be drawn from these
numerical values. These inferences lead to a final decision.
Our society is highly dependent on data, which underscores the importance
of collecting it. Accurate data collection is necessary to make informed
business decisions, ensure quality assurance and maintain research integrity.
During data collection, the statistician must identify the data types, the
sources of data and the methods being used. We will soon see that
there are many different data collection methods. There is heavy reliance on
data collection in research, commercial, and government fields.
Before an analyst begins collecting data, it helps to divide data into
quantitative and qualitative types. Qualitative data covers descriptions such
as color, size, quality and appearance. Quantitative data, unsurprisingly,
deals with numbers such as statistics, poll numbers, percentages, etc.
Some basic terminology: In statistics as well as in quantitative
methodology, data are collected and selected from a statistical
population with the help of defined procedures. There are two
different types of data sets, namely population and sample.
Population: In statistics, a population is the entire set of items from which
you draw data for a statistical study. It can be a group of individuals, a set
of items, etc. It makes up the data pool for a study. Generally, population
refers to the people who live in a particular area at a specific time. But in
statistics, population refers to all the units of interest in your study. It can be a
group of individuals, objects, events, organizations, etc. For example, in a
statistical study we may have a population consisting of the students in a
college.
In the above situation, it is easy to collect data. The population is small,
willing to provide data and can be contacted. The data collected will be
complete and reliable.
Sample: A sample is defined as a smaller and more manageable
representation of a larger group. It is a subset of a larger population that
reflects the characteristics of that population. A sample is used in statistical
testing when the population is too large for all members or observations
to be included in the test.
Variable: A characteristic of the population which can be expressed
numerically and which varies from object to object is called a variable or
variate. For example, the weight of students and the wages of employees can
be measured quantitatively, and so these are variates.
Attribute: Certain characteristics cannot be expressed quantitatively but
can be described qualitatively, for example intelligence, beauty, sex, etc.
These are called attributes.
Parameter: A statistical measure, like the mean or standard deviation, which is
calculated from all objects in the population is called a parameter.
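The terms above can be tied together in a short Python sketch. The population of 500 student weights is simulated with assumed values (a mean near 60 kg and a fixed random seed), purely for illustration; the point is the contrast between a measure computed from the whole population and the same measure computed from a sample.

```python
import random
import statistics

random.seed(7)  # fixed seed so that the sketch is repeatable

# Hypothetical population: weights (in kg) of all 500 students of a college.
population_weights = [round(random.gauss(60, 8), 1) for _ in range(500)]

# Parameter: the mean calculated from every object in the population.
population_mean = statistics.mean(population_weights)

# Sample: a smaller, more manageable subset of the population.
sample_weights = random.sample(population_weights, 50)

# The same measure calculated from the sample only estimates the parameter.
sample_mean = statistics.mean(sample_weights)

print(f"Population mean (parameter): {population_mean:.2f} kg")
print(f"Sample mean (estimate from 50 students): {sample_mean:.2f} kg")
```

Here weight is a variate because it is measured numerically; a field such as the student's home town would be an attribute.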
2.2 RELEVANCE OF DATA:
In today’s world everything runs on data, from social media to large
companies. The term data refers to information about anything. Each
company and each institution has a set of data to be maintained and analysed
to evaluate and improve its growth. Analysis of the data,
or in other words data analytics, is a vast field and one of the most important
fields today.
The term data analytics refers to the analysis of collected data to draw
the conclusions required by the company’s or researcher’s
objective. It involves structuring a massive amount of irregular data
and deriving the required information from it using statistical
tools. It also involves the preparation of charts, graphs, etc. The application of
data analytics is not limited to manufacturing companies or industrial
areas; it is involved in almost every field of human life.
As today’s world moves towards the digital economy, companies have
access to more data than ever before. This data creates a foundation of
intelligence for important business decisions. To ensure employees have the
right data for decision making, companies must invest in data systems that
improve visibility, reliability, security and scalability.
As the application of statistics involves data, the question arises
of how to collect data and what the sources of data are.
Data represents information collected in the form of numbers and text. Data
collection is generally done after the experiment or observation. Primary
data and secondary data are helpful in planning and estimating. Data
collection is either qualitative or quantitative.
Different types of data collection methods are used in business and sales
organisations to analyse the outcome of a problem, arrive at a solution, and
understand a company’s performance. These methods fall into two types,
primary data collection and secondary data collection, corresponding to
the two categories of data: i) primary data and ii) secondary
data.
2.3 PRIMARY DATA:
Data which are collected directly from the units or individual respondents
for the purpose of a certain study are known as
primary data. For instance, an enquiry is made of each taxpayer in a city
to obtain their opinion about the tax collecting machinery. The data obtained
in such a study by the investigator are termed primary data. If an experiment is
conducted to know the effect of certain fertilizer doses on the yield, or the
effect of a drug on patients, the observations taken on each plot or patient
constitute the primary data.
Primary data is the information collected by the researcher or investigator
for the purpose of the enquiry for the first time. The following are the
methods by which primary data can be collected.
i) Direct personal investigation
ii) Indirect oral investigation
iii) Questionnaires and schedules
Direct personal investigation: In this method the investigator meets
the respondents and collects the data personally, using the following methods.
a) Personal contact method: As the name says, the investigator himself
goes to the field, meets the respondents and gets the required
information. Here the investigator personally interviews the respondent
either directly or by phone or through electronic media. This
method is suitable when the scope of investigation is small and greater
accuracy is needed.
b) Telephonic interviewing: In the present age of communication
explosion, telephones and mobile phones are extensively used to
collect data from respondents. This saves the cost and time of
collecting the data, with a good amount of accuracy.
Indirect oral investigation: The indirect method is used in cases where
it is delicate or difficult to get the information from the respondents due to
unwillingness or indifference. The information about the respondent is
collected by interviewing a third party who knows the respondent well.
Instances of this type of data collection include information on addiction,
marriage proposals, economic status, witnesses in court, criminal
proceedings, etc. The shortcoming of this method is that the genuineness and
accuracy of the information depend completely on the third party.
Local correspondents: In this method the investigator appoints local
agents or correspondents in different places. They collect the information
on behalf of the investigator in their locality and transmit the data to the
investigator or headquarters. This method is adopted by newspapers,
government agencies and trading concerns. This method is less accurate but
quick and less expensive.
Questionnaires and Schedules: A questionnaire contains a sequence of
questions relevant to the study, arranged in a logical order. Preparing a
questionnaire is a very interesting and challenging job and requires good
experience and skill. Questionnaires include open-ended questions and
close-ended questions. Open-ended questions allow the respondent
considerable freedom in answering; however, they have to be answered in
detail. Close-ended questions have to be answered by the respondent by
choosing an answer from the set of answers given under a question, just by
ticking.
Before starting the investigation, a question sheet is prepared which is called a
schedule. The schedule contains all the questions which would extract
complete information from a respondent. The order of the questions, the
language of the questions and the arrangement of the parts of the schedule are
not changed. However, the investigator can explain the questions if the
respondent faces any difficulty. The schedule contains direct questions as well as
questions in tabular form.
Following are the essentials of a good questionnaire:
1. The length of the questionnaire should be proper and limited.
2. The language used should be easy and simple. It should not convey
two meanings.
3. The terms used in the questionnaire should be explained properly.
4. The questions should be arranged in a proper way.
5. The questions should be in a logical order.
6. The questions should be in analytical form.
7. Complex questions should be broken into filter questions.
8. The questions should be worded precisely and correctly.
9. The questionnaire should be constructed for a specific period of time.
10. The questions should revolve around the theme of the investigation.
11. Personal questions should be avoided in the questionnaire as far as
possible.
12. The answers required should be short, simple, accurate and direct.
2.3.1 Editing the primary data:
Once a preliminary draft of the questionnaire has been designed, the
researcher is obliged to evaluate it critically and edit it, if needed. This phase
may seem redundant, given all the careful thought that went into each
question. But recall the crucial role played by the questionnaire. The following
points must be remembered.
1. The main objective of editing is to detect possible errors and
irregularities.
2. While editing primary data the following considerations need
attention.
3. The data should be complete.
4. The data should be accurate.
5. The data should be consistent.
6. The data should be homogeneous.
Advantages of primary data:
• The data collected is very specific to the problem and is useful.
• The quality of the data collected is not doubtful and the data is meaningful.
• It may lead to the discovery of additional data and information during
its collection.
• It is more accurate and it can be edited or updated afterwards.
Disadvantages of primary data:
• There are numerous hassles involved in the collection of primary data,
such as deciding how, when, what and why to collect.
• The cost involved in the collection of data is very high.
• The collection of primary data is more time consuming.
2.4 SECONDARY DATA:
Data collected by some person or agency and made available through various
published or unpublished sources are known as secondary data. When the
information contained in such records is used again, processed and statistically
analysed to extract information for some other purpose, it is termed
secondary data. Such data are cheaper and more quickly obtainable than
primary data and may also be available when primary data cannot be
obtained at all.
Types of Sources of Secondary Data:
Secondary data can be obtained from different sources:
1) Published sources: Secondary data is usually gathered from
published (printed) sources. A few major sources of published
information are as follows:
• Published articles of local bodies, and central and state
governments.
• Statistical synopses, census records, and other reports issued by
the different departments of the government.
• Official statements and publications of foreign governments.
• Publications and reports of chambers of commerce, financial
institutions, trade associations, etc.
• Magazines, journals, and periodicals.
• Publications of government organizations like the Central
Statistical Organization (CSO) and the National Sample Survey
Organization (NSSO).
• Reports presented by research scholars, bureaus, economists,
etc.
2) Unpublished sources: Statistical data can be obtained from several
unpublished references.
Some of the major unpublished sources from which secondary
data can be gathered are as follows:
• The research works conducted by teachers, professors, and
professionals.
• The records that are maintained by private and business
enterprises.
• Statistics maintained by different departments and agencies
of the central and the state governments, undertakings,
corporations, etc.
There are various secondary sources of data collection. Some of
these include:
• Books, Magazines, and Newspapers: Newspapers and
magazines also carry out surveys and interviews of their own on
various aspects like socio-economic conditions, crimes in the
country, etc.
• Reports: Industries and trade associations also publish reports
periodically which contain data regarding trade, production,
exports, imports, and the like. The information in these reports
will facilitate different types of secondary research.
• Publications by Renowned Organisations: Organisations like
WHO, ICMR, and other renowned national and international
bodies carry out timely surveys and case studies of their own
which they then publish on their websites. The data and
statistics in these surveys can be accessed by almost everyone
by visiting their official websites.
• Research Articles: Several websites publish research papers by
scholars and scientists from respective fields like medicine,
finance, economics, etc., which act as sources of secondary
data.
• Government Data: Data released by the government of any
country is one of the largest sources of secondary data.
Sometimes, the central or state government sets up committees
to look into some issues. These committees publish reports
based on their investigation, which function as a valuable source
of secondary data.
Advantages of secondary data:
• It is less expensive. It saves effort and time.
• It helps to make primary data collection more specific since,
with the help of secondary data, we are able to make out what
the gaps and deficiencies are and what additional information
needs to be collected.
• It helps to improve the understanding of the problem.
• Much information already exists in documented form.
• Many existing data sets are enormous, far greater than the
researcher would be able to collect himself or herself, with a far
larger sample.
Disadvantages of secondary data:
• The accuracy of secondary data is not known.
• The data may be outdated.
• Collecting primary data builds up more research skills than
collecting secondary data.
• The researcher has no control over the quality of the data.
• The data cannot be edited.
2.5 CENSUS AND SAMPLE SURVEY:
• Census: A statistical investigation in which the data are collected for
each and every element or unit of the population is termed the census
method. It is also known as ‘complete enumeration’ or ‘100%
enumeration’ or ‘complete survey’. A census is thus a statistical
enumeration in which all members of a population are studied.
The population is the set of all observations under study.
For instance, if you want to carry out a study to find out students’
feedback about the amenities of your school, then all the students of
your school would form the ‘population’ for your
study. In our country, the Government conducts the Census of
India every ten years. The Census collects information from
households regarding their incomes, the earning members, the total
number of children, members of the family, etc. This method must take
into account all the units. It cannot leave out anyone in collecting data.
Instead of studying the entire population, we can study a part of the
population, which is called a sample.
• Sample survey: A survey involving the collection of data about a
sample of units selected from the population is called a sample survey.
For example, if we want to find the mean weight of all BMS students
studying in the University of Mumbai, we may select a sample of BMS
students from IDOL. Here the survey in which the data regarding the
weights of IDOL BMS students are obtained is a sample survey.
A sample survey requires information about only a fraction of the
population and therefore it saves money and time in comparison to a
census survey. In a sample survey the amount of work is reduced, so
we can afford to collect the data more carefully and may obtain more accurate results.
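The difference between a census and a sample survey can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration only: the population of 10,000 student weights is simulated, the sample size of 200 is arbitrary, and the sample is drawn by simple random sampling.

```python
import random
from statistics import mean

# Hypothetical population: simulated weights (in kg) of 10,000 students.
random.seed(42)  # fixed seed so that the sketch is reproducible
population = [round(random.gauss(60, 8), 1) for _ in range(10_000)]

# Census: every unit of the population is measured.
census_mean = mean(population)

# Sample survey: only 200 units are measured, chosen by simple random sampling.
sample = random.sample(population, k=200)
sample_mean = mean(sample)

print(f"Census mean: {census_mean:.2f} kg from {len(population)} units")
print(f"Sample mean: {sample_mean:.2f} kg from {len(sample)} units")
```

The sample mean is only an estimate of the census mean, but it is obtained from a small fraction of the units, which is where the saving in time and money comes from.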
Steps involved in a sample survey:
• Each objective should be defined clearly, and relevant questions
related to the objective should be introduced in the questionnaire.
• These objectives should be measurable and specific, and should
help to derive the results that are expected from the survey.
• Once the objectives are defined thoroughly and clearly, the next step is to determine
the population.
• Selecting a sample from the population is a significant step.
• The next step is designing the survey. Many sample
surveys collect different types of information.
• The next step is to administer the survey to the required
sample and collect the data from them.
• The next step is to analyze the data. Statistically correct data should
be analyzed so that the results are precise.
• The analysis is done keeping the objectives of the study in mind so
that the results obtained are relevant to the study.
• Data analysis should be done to arrive at proper conclusions that are
relevant to the study.
2.6 LET US SUM UP:
In this chapter we have learnt:
• Different types of data, i.e. primary data and secondary data.
• Different methods of collection of primary data.
• How to edit primary data.
• Different sources of secondary data.
• The census method of data collection for a population.
• The sample survey method of data collection for a sample.
2.7 UNIT END EXERCIS ES:
1. Write a short note on types of data.
2. Distinguish between primary data and secondary data.
3. Distinguish between the census method and the sampling method.
4. What are the different methods to collect primary data by the direct
investigation method?
5. Explain how primary data is edited.
6. Write the advantages and disadvantages of primary data.
7. Write the advantages and disadvantages of secondary data.
8. Explain the sources of secondary data.
9. Describe the various steps involved in conducting a sample survey.
10. What are the requirements of a good questionnaire?
11. What are the various ways of collecting primary data?
12. Write a short note on Census.
Multiple Choice Questions:
1) Data collected on religion from the census reports are
a) Primary data b) Secondary data
c) Sample data d) (a) or (b)
2) The primary data are collected by
a) Interview method b) Observation method
c) Questionnaire method d) All these
3) The quickest method to collect primary data is
a) Personal interview b) Indirect interview
c) Telephone interview d) By observation
4) Some important sources of secondary data are
a) International and government sources
b) International and primary sources
c) Private and primary sources
d) Government sources
5) The best method to collect data, in case of a natural calamity, is
a) Personal interview b) Indirect interview
c) Questionnaire method d) Indirect observation method
2.8 LIST OF REFERENCES:
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K.
Kapoor.
• Basic Statistics by B. L. Agrawal.

3
PRESENTATION OF DATA
Unit Structure
3.0 Objectives:
3.1 Introduction:
3.2 Classification of Data,
3.3 Tabulation
3.4 Graph
3.5 Let us sum up:
3.6 Unit end Exercises:
3.7 List of References:
3.0 OBJECTIVES:
After going through this chapter you will be able to know:
• How data are classified for presentation.
• How tabulation is used to represent data.
• Types of tabulation of data.
• How graphical methods are used to represent data.
3.1 INTRODUCTION:
After collecting the desired data, the first step to be taken is to classify
and tabulate the data. In order to make the data simple and easily
understandable, they are simplified in such a way that irrelevant data are
removed and their significant features stand out prominently. The
procedure adopted for this purpose is known as the method of classification and
tabulation. Classification and tabulation provide a clear picture of the
collected data, and on that basis the further processing is decided.
Classification and tabulation help to arrange the mass of collected data in a
logical and summarized manner. However, it is still a difficult and cumbersome
task for the common man and the researcher to interpret the data. Too many
figures are often confusing and may fail to convey the message effectively to
those for whom it is meant.
To overcome this inconvenience, the most appealing way in which
statistical results may be presented is through diagrams and graphs.
A diagram is a visual form for the presentation of statistical data, highlighting
their basic facts and relationships. If we draw diagrams on the basis of the
data collected, they will be understood and appreciated by all. Every day we
can find the presentation of stock market data, cricket scores etc. in newspapers,
television and magazines in the form of diagrams and graphs.
In this chapter we will discuss classification, tabulation and some of the
major types of diagrams, graphs and maps frequently used in presenting
statistical data.
3.2 CLASSIFICATION OF DATA:
“Classification is the process of arranging data into sequences and
groups according to their common characteristics or separating them
into different but related parts”. – Secrist
Usually the data are collected through questionnaires, schedules or
response sheets. The collected data need to be consolidated for the purpose
of analysis and interpretation. This process is known as Classification and
Tabulation. We can include a huge volume of data in a simple statistical
table, and one can get an outline of the pattern by observing the statistical
table rather than the raw data. To construct diagrams and graphs, it is essential
to tabulate the data.
For example, letters in the post office are classified according to their
destinations, viz. Delhi, Madurai, Bangalore, Mumbai etc.
Requisites of Ideal Classification
It should be unambiguous: There should be no uncertainty or ambiguity.
Classes should be defined rigidly, so as to avoid any ambiguity.
It should be flexible: The classification should be flexible enough to accommodate
change, amendment and inclusion in various classes in accordance with new
situations.
It should be homogeneous: Units of each class should be homogeneous.
All the units included in a class or group should share the
property on the basis of which the classification was done.
It should be suitable for the purpose: The composition of the class should
be according to the purpose.
For example, to find out the economic condition of persons, create
classes on the basis of income.
It should be stable: Stability is necessary to make data comparable and to
make meaningful comparison of the results. This means that the
classification of a data set into different classes must be performed in such a way
that whenever an investigation is carried out, there is no change in classes
and so the results of the investigations can be compared easily.
It should be exhaustive: Each and every item of data must belong to a
particular class. An ideal classification is one that is free from any residual
classes such as others or miscellaneous, as they do not state the
characteristics clearly and completely.
It should be mutually exclusive: The classes should be mutually exclusive.
Formation of a Discrete Frequency Distribution:
The formation of a discrete frequency distribution is quite simple. The
number of times a particular value is repeated is noted down and mentioned
against that value instead of writing that value repeatedly. In one column, place all
possible values of the variable from the lowest to the highest. In order to
facilitate counting, prepare a column of tallies. Then put a bar
(vertical line) opposite the particular value to which it relates. To facilitate
counting, blocks of five bars are prepared and some space is left in between
each block. Finally count the number of bars and get the frequency.
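The tallying procedure described above can be sketched in Python. This is only an illustration: the data are the daily wages of Example 1 below, `Counter` from the standard library does the counting, and the helper `tally` is a hypothetical function written for this sketch.

```python
from collections import Counter

# Daily wages (in Rs.) of the 30 workers from Example 1.
wages = [300, 200, 200, 100, 100, 200, 100, 100, 400, 100,
         100, 200, 400, 300, 100, 300, 100, 100, 200, 100,
         100, 100, 200, 100, 200, 100, 100, 100, 400, 200]

# Count how many times each distinct value occurs.
frequency = Counter(wages)

def tally(count):
    """Return tally marks in blocks of five, e.g. 7 -> '||||/ ||'."""
    blocks, rest = divmod(count, 5)
    return " ".join(["||||/"] * blocks + (["|" * rest] if rest else []))

print(f"{'Daily wages (Rs.)':<20}{'Tally marks':<26}{'No. of workers':>14}")
for value in sorted(frequency):
    print(f"{value:<20}{tally(frequency[value]):<26}{frequency[value]:>14}")
print(f"{'Total':<20}{'':<26}{sum(frequency.values()):>14}")
```

Here `||||/` stands for a block of five bars, the fifth bar being the one drawn across the other four.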
Example 1: The daily wages in Rs. paid to the workers are given below. Form
the Discrete Frequency Distribution.
300, 200, 200, 100, 100, 200, 100, 100, 400, 100, 100, 200, 400, 300, 100,
300, 100, 100, 200, 100, 100, 100, 200, 100, 200, 100, 100, 100, 400, 200.
Solution: Frequency distribution of daily wages in Rs.
Daily wages (in Rs.)    Tally Marks          No. of workers
100                     IIII IIII IIII I     16
200                     IIII III             08
300                     III                  03
400                     III                  03
Total                                        30
Formation of a Continuous Frequency Distribution:
The following technical terms are important when a continuous frequency
distribution is formed.
a. Class Limits: Class limits are the lowest and the highest values that
can be included in a class. The two boundaries of a class are known as
the lower limit and the upper limit of the class. The lower limit of
a class is the value below which there can be no item in the class. The
upper limit of a class is the value above which there can be no
item in the class.
For example, for the class 20 – 40, 20 is the lower limit and 40 is the
upper limit. If there was an observation of 40.5, it would not be included
in this class. Again, if there was an observation of 19.5, it would not
be included in this class.
b. Class Intervals: The difference between the upper and lower limits of the
class is known as the class interval of that class.
For example, in the class 20 – 40, the class interval is 20 (i.e. 40
minus 20). An important decision while constructing a frequency
distribution is about the width of the class interval, i.e. whether it
should be 10, 20, 50, 100, 500 etc. It depends upon the range of the
data, i.e. the difference between the smallest and largest item, the
details required, the number of classes to be formed, etc.
Following is a simple formula to obtain an estimate of the appropriate
class interval,
$i = \dfrac{L - S}{k}$,
where i = class interval,
L = largest item
S = smallest item
k = the number of classes.
For example, if the salaries of 100 employees in a company
varied between Rs. 1000 and Rs. 6000 and we want to form 10 classes, then
the class interval would be
$i = \dfrac{L - S}{k}$
L = 6000, S = 1000, k = 10
$i = \dfrac{L - S}{k} = \dfrac{6000 - 1000}{10} = 500$
The starting classes would be 1000 – 1500, 1500 – 2000 and so on.
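The class interval formula can be checked with a short Python sketch. It assumes, for illustration, that the first class starts at the smallest item S; the function name `class_intervals` is hypothetical.

```python
def class_intervals(smallest, largest, k):
    """Return the class width i = (L - S) / k and k consecutive classes."""
    width = (largest - smallest) / k
    # Assumption of this sketch: the first lower limit is the smallest item S.
    classes = [(smallest + n * width, smallest + (n + 1) * width)
               for n in range(k)]
    return width, classes

width, classes = class_intervals(smallest=1000, largest=6000, k=10)
print("Class interval i =", width)
for lower, upper in classes:
    print(f"{lower:.0f} - {upper:.0f}")
```

With S = 1000, L = 6000 and k = 10 the width is 500 and the classes run from 1000 – 1500 up to 5500 – 6000. If the upper limit of a class is not counted in that class, one more class would be needed to hold an observation exactly equal to 6000.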
The question now is how to fix the number of classes, i.e. k. The
number can either be fixed arbitrarily, keeping in view the nature of the
problem under study, or it can be decided with the help of Sturges’
Rule.
According to Sturges’ Rule, the number of classes can be determined by
the formula:
k = 1 + 3.322 log N
where N = total number of observations and
log = logarithm (to the base 10) of the number.
Therefore, if there are 10 observations, the number of classes shall
be
k = 1 + (3.322 x 1) = 4.322 or 4 [∵ log 10 = 1]
Similarly, if there are 100 observations, the number of classes shall
be
k = 1 + (3.322 x 2) = 7.644 or 8 [∵ log 100 = 2]
It should be noted that, since a logarithm is used in the formula, the number
of classes grows slowly with N and in practice lies roughly between 4 and 20.
For N = 10 it is about 4, and even if N is 10 lakh, k will be
1 + (3.322 x 6) = 20.9 or 21.
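As a quick check of the rule, the following Python sketch evaluates k = 1 + 3.322 log N for the three values of N used above. The function name `sturges_classes` is hypothetical, and rounding to the nearest whole number is an assumption of this sketch.

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.322 * log10(N), before rounding."""
    return 1 + 3.322 * math.log10(n)

for n in (10, 100, 1_000_000):
    k = sturges_classes(n)
    print(f"N = {n:>9}: k = {k:.3f}, rounded to {round(k)}")
```

Multiplying N by 10 adds only about 3.3 classes, which is why the number of classes stays small even for very large data sets.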

c. Class Frequency: The number of observations corresponding to a
particular class is known as the frequency of that class or the class
frequency.
d. Class Mid-point or Class mark: The mid-point of a class interval is
calculated for further calculations in statistical work.
$\text{Mid-point of a class} = \dfrac{\text{Upper limit of the class} + \text{Lower limit of the class}}{2}$
Methods of Data Classification:
There are two methods of classifying the data according to class intervals.
a. Exclusive method: When the class intervals are so fixed that the
upper limit of one class is the lower limit of the next class, it is known
as the exclusive method of classification.
Weight (in kg)    No. of Students
40-50             40
50-60             160
60-70             110
70-80             200
80-90             90
In the above example, there are 40 students whose weight is between
40 and 49.99 kg. A student whose weight is 50 kg is included in the
class 50 – 60.
b. Inclusive method: In this method, the upper limit of one class is
included in that class itself.
Weight (in kg)    No. of Students
40-49             40
50-59             160
60-69             110
70-79             200
80-89             90
In the class 40 – 49, we include students whose weight is between 40
kg and 49 kg. If the weight of the student is exactly 50 kg, he is
included in the next class.
Example 2: Prepare a frequency distribution for the students’ marks
data.
25 85 41 70 85 55 85 55 72
72 50 90 52 68 72 52 91 53
79 75 60 35 65 80 70 70 36
66 55 80 72 41 88 60 45 78
42 90 66 47 80 88 91 82 50
52 55 72 68 65
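The grouping asked for in Example 2 can be checked with a short Python sketch. It assumes the exclusive method with classes 25 – 35, 35 – 45 and so on (class interval 10, first lower limit 25); the variable names are chosen for this sketch only.

```python
# Marks of the 50 students from Example 2.
marks = [25, 85, 41, 70, 85, 55, 85, 55, 72,
         72, 50, 90, 52, 68, 72, 52, 91, 53,
         79, 75, 60, 35, 65, 80, 70, 70, 36,
         66, 55, 80, 72, 41, 88, 60, 45, 78,
         42, 90, 66, 47, 80, 88, 91, 82, 50,
         52, 55, 72, 68, 65]

start, width = 25, 10  # first lower limit and class interval
n_classes = (max(marks) - start) // width + 1

frequencies = [0] * n_classes
for m in marks:
    # Exclusive method: the lower limit is included, the upper limit is not.
    frequencies[(m - start) // width] += 1

for n, f in enumerate(frequencies):
    lower = start + n * width
    print(f"{lower} - {lower + width}: {f}")
print("Total =", sum(frequencies))
```

A mark of 35 falls in the class 35 – 45 and not in 25 – 35, exactly as a weight of 50 kg falls in the class 50 – 60 under the exclusive method.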

Solution: Since the lowest value is 25 and the largest value is 91, we
take class intervals of 10.
Marks      Tally Marks        Frequency
25 – 35    I                  01
35 – 45    IIII               05
45 – 55    IIII III           08
55 – 65    IIII I             06
65 – 75    IIII IIII IIII     14
75 – 85    IIII II            07
85 – 95    IIII IIII          09
Total                         50
Example 3: Prepare a frequency distribution for the following data
by taking class intervals such that their mid values are 17, 22, 27, 32
and so on.
30 30 36 33 42 27 22 41 30 42
30 21 54 36 31 40 28 19 48 26
48 15 37 16 17 54 42 51 44 32
42 31 21 25 36 22 41 40 46 52
Solution: Since we have to classify the d ata in a such manner that
the mid values are 17, 22, 27, 32 and so on The first class interval
should be 15 – 19 (mid -value = (15 + 19)/2 = 17). Variable Tally Marks Frequency 15 – 19 IIII 4 20 – 24 IIII 4 25 – 29 IIII 4 30 – 34 IIII III 8 35 – 39 IIII 4 40 – 44 IIII IIII 9 45 – 49 III 3 50 – 54 IIII 4 Total = 40 3.3 TABULATION:
One of the simplest and most revealing devices for summarizing data and
presenting them in a meaningful manner is the statistical table. After
classifying the statistical data, the next step is to present them in the form of
tables. A table is a systematic organization of statistical data in rows and
columns. The purpose of a table is to simplify the presentation and to
facilitate comparisons. The main objective of tabulation is to answer various
queries concerning the investigation. Tables are very helpful for doing
analysis and drawing inferences.
Classification and tabulation go together, classification being the first step
in tabulation. Before the data are put in tabular form, they have to be
classified.
Objectives of Tabulation
• To simplify complex data: It reduces raw data to a simplified and
meaningful form. The reader gets a very clear idea of what the table
presents. It can be easily interpreted by a common person in less time.
• To facilitate comparison: Since the table is divided into rows and
columns, and totals and subtotals are given for each row and column, the
different parts of the data can be compared easily.
• To bring out essential features of data: It brings out the main features
of the data. It presents facts clearly and precisely without textual
explanation.
• To give identity to the data: When the data are arranged in a table
with a title and a number, they can be distinctly identified.
• To save space: A table saves space without sacrificing the quality and
quantity of data.
Parts of Table
Generally, a table should comprise the following components:
1. Table Number: Each table must be given a number. The table number
helps in distinguishing one table from other tables. Usually tables are
numbered according to the order of their appearance in a chapter. For
example, the first table in the first chapter of a book should be given
number 1.1 and the second table of the same chapter 1.2. The table
number should be given at the top or towards the left of the table.
2. Title of the Table: The title is a description of the contents of the
table. Every table must be given a suitable title. A complete title has to
answer the questions of what categories of statistical data are shown,
where the data occurred and when the data occurred. The title should
be clear and brief. It is placed either just below the table number or at
its right.
3. Caption: Caption refers to the column headings. It may consist of one
or more column headings. Under one column there may be sub-heads.
The caption should be clearly defined and placed at the middle of the
column. If the different columns have different units, the units should
be mentioned with the captions.
4. Stub: Stub refers to the row headings. They are at the extreme left
of the table. The stubs are usually wider than the column headings, but they
should be kept as narrow as possible.
5. Body: It is the most important part of the table. It contains a number of
cells. Cells are formed by the intersection of rows and columns. The body
of the table contains numerical information.
6. Headnote: It is used to explain certain points relating to the whole
table that have not been included in the title, in the captions or in the stubs. It is
placed below the title or at the right hand corner of the table. For
example, the unit of measurement is frequently written as a headnote,
such as “in thousands”, “in crores”, etc.
7. Footnotes: A footnote helps in clarifying a point which is not clear from the
title, captions or stubs. It is placed at the bottom of a table.
There are different ways of identifying the footnotes. One is numbering
them consecutively with small numbers 1, 2, 3 or letters a, b, c, d. Another
way identifies the first footnote with one star (*), the second footnote with two
stars (**), the third footnote with three stars (***) and so on. Sometimes
symbols such as +, @, £ etc. are used instead of *.
3.4 GRAPHICAL PRESENTATION OF DATA:
Graphical presentations are very simple for even a common person to
understand. It is a popular method of presentation of data. With the help of
graphs, two or more sets of data can be easily compared and analysed. The
trend of the data can also be seen from the graph.
A graph is drawn in a plane with two reference lines called the X-axis
(horizontal) and the Y-axis (vertical). The axes are perpendicular to each
other and their point of intersection is called the Origin. Every point in the plane
is identified by two coordinates (x, y). The first coordinate (x) represents the
value of the variable on the X-axis and the second coordinate (y) represents
the value of the variable on the Y-axis.
A proper scale of measurement should be taken to accommodate the complete
data on the graph. If needed, the origin can be shifted from (0, 0) to any other
required value. Such a process is called shifting of origin.
Now, we shall study different types of graphical presentation of data:
One Dimensional or Bar Diagrams
This is the most common type of diagram. They are called one-dimensional
diagrams because only the length of the bar matters and not the width. For a large
number of observations, lines may be drawn instead of bars to save space.
Types of Bar Diagrams:
a. Simple bar diagram
b. Subdivided bar diagram
c. Multiple bar diagram
d. Percentage bar diagram
Simple Bar Diagram: A simple bar diagram is used to represent only one
variable. It should be kept in mind that only length is taken into account
and not width. The width should be uniform for all bars and the gap between
bars is normally identical. For example, the figures of production, sales,
profits etc. for various years can be shown by bar diagrams.
Example 1: Prepare a simple bar diagram for the following data related to
wheat exports.
Year    Exports (in million tons)
2003    12
2004    15
2005    19
2006    25
2007    40
Solution: By taking years on the x axis and exports (in million tons) on the y axis,
rectangles of equal width are drawn. The distance between successive
rectangles is the same. The scale on the y axis is 1 cm = 5 million tons.
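The bar heights implied by the scale 1 cm = 5 million tons can be worked out with a small Python sketch. The rough text chart, with one `#` per million tons, is only an illustrative stand-in for the drawn diagram.

```python
# Wheat exports (in million tons) from Example 1.
exports = {2003: 12, 2004: 15, 2005: 19, 2006: 25, 2007: 40}
SCALE = 5  # 1 cm on the y axis represents 5 million tons

# Height of each bar in cm under the chosen scale.
bar_heights_cm = {year: tons / SCALE for year, tons in exports.items()}

for year, height in bar_heights_cm.items():
    # One '#' per million tons gives a rough text version of the bar.
    print(f"{year} | {'#' * exports[year]} ({height:.1f} cm)")
```

The tallest bar, for 2007, is 8 cm high, so the whole diagram fits comfortably on a page with this scale.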

Figure 3.1 Simple Bar Diagram Showing the Wheat Exports in
Different Years
Subdivided Bar Diagram: In this diagram, one bar is constructed for the total
value of the different components of the same variable. Further, it is
subdivided in proportion to the various components of that variable.
A bar is usually arranged in order of magnitude, from the largest component
at the base of the bar to the smallest at the end of the bar, but the order of
the various components is kept the same in every bar. Different shades
or colors are used to distinguish between different components. To explain
such differences, an index should be used in the bar diagram.
Subdivided bar diagrams can be constructed on both horizontal and vertical
bases.
Example 2: The following data shows the production of rice for the period
2010 to 2015. Represent the data by a subdivided bar diagram.
Year    Non-Basmati Rice               Basmati Rice                   Total
        (in million metric tons)       (in million metric tons)       (in million metric tons)
2010    29                             35                             64
2011    35                             33                             68
2012    25                             35                             60
2013    40                             30                             70
2014    42                             32                             74
2015    32                             40                             72

Figure 3.2 Subdivided Bar Diagram Showing the production of Rice
(in Different Years)
Multiple Bar Diagram: Whenever a comparison between two or more
related variables is to be made, a multiple bar diagram should be preferred. In
multiple bar diagrams two or more groups of interrelated data are presented.
The technique of drawing such diagrams is the same as that of the
simple bar diagram. The only difference is that, since more than one
component is represented in each group, different shades, colors, dots
or crossings are used to distinguish between the bars of the same group.
Example 3: Represent the following data by a multiple bar diagram.
Class        Physics    Chemistry    Mathematics
Student A    50         63           57
Student B    55         60           68
Student C    48         60           55
Solution:

Figure 3.3 Multiple Bar Diagram
Percentage Bar Diagram: Percentage bars are particularly useful in
statistical work which requires the representation of the relative changes in
data. When such diagrams are prepared, the length of each bar is kept equal
to 100 and segments are cut in these bars to represent the components as
percentages of the total.
Example 4: Draw a percentage bar diagram for the following data.
Particulars    Cost Per Unit (2010)    Cost Per Unit (2020)
Material       22                      35
Labour         30                      40
Delivery       10                      20
Total          62                      95
Solution: Express the values in terms of percentages for both the years.
Particulars    Cost Per Unit (2010)    % Cost    Cumulative % cost    Cost Per Unit (2020)    % Cost    Cumulative % cost
Material       22                      35.48     35.48                35                      36.84     36.84
Labour         30                      48.38     83.86                40                      42.10     78.94
Delivery       10                      16.12     99.98                20                      21.05     99.99
Total          62                      100                            95                      100
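The percentage and cumulative percentage columns can be reproduced with a short Python sketch. The function name `percentage_breakdown` is hypothetical. The sketch rounds to two decimal places, so its figures may differ in the last digit from a table whose figures are cut off after two decimals.

```python
# Cost per unit for the two years, taken from Example 4.
costs = {
    2010: {"Material": 22, "Labour": 30, "Delivery": 10},
    2020: {"Material": 35, "Labour": 40, "Delivery": 20},
}

def percentage_breakdown(components):
    """Return (name, % of total, cumulative %) rows for one bar."""
    total = sum(components.values())
    rows, cumulative = [], 0.0
    for name, cost in components.items():
        share = 100 * cost / total
        cumulative += share
        rows.append((name, round(share, 2), round(cumulative, 2)))
    return rows

for year, components in costs.items():
    print(year)
    for name, share, cumulative in percentage_breakdown(components):
        print(f"  {name:<9}{share:>7.2f}{cumulative:>8.2f}")
```

The cumulative percentages give the heights at which each bar of length 100 is cut into its segments.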
Figure 3.4 Percentage Bar Diagram
Histogram
A Histogram is a graph of a frequency distribution where adjacent
rectangles are drawn to represent the data. The width of the rectangles
depends upon the class width of the class intervals, which are taken on the
X-axis. The height of the rectangles depends upon the class frequencies,
which are taken on the Y-axis.
Example: Draw a histogram representing the following frequency
distribution:
Sales in ’000 Rs.    10-20    20-30    30-40    40-50
No of companies      06       10       16       12
Solution: The class intervals are taken on the X-axis and the frequencies (no
of companies) are taken on the Y-axis. The Histogram of the given data is
as shown below:
Example 2: Draw a Histogram to present the following data:
Ages in yrs    5 – 10    10 – 15    15 – 20    20 – 25    25 – 30
No of Boys     12        28         20         32         16
Solution:

Note: If the class intervals are discrete (inclusive), they should first be converted to
continuous class intervals.
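One common way of making this conversion is to extend every class by half the gap between one upper limit and the next lower limit. The source does not spell out the procedure, so the Python sketch below is an assumption; the function name `to_continuous` and the example classes are chosen for illustration.

```python
def to_continuous(classes):
    """Convert inclusive classes such as (40, 49), (50, 59) to continuous ones."""
    # Gap between one upper limit and the next lower limit, e.g. 50 - 49 = 1.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    # Lower every lower limit and raise every upper limit by half the gap.
    return [(lower - half, upper + half) for lower, upper in classes]

inclusive = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
for lower, upper in to_continuous(inclusive):
    print(f"{lower} - {upper}")
```

After the conversion the upper limit of each class coincides with the lower limit of the next, so the rectangles of the histogram are adjacent.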
3.4.2 Frequency Curve
To draw a frequency curve, the class marks of the continuous class intervals
are computed and taken on the X-axis. The frequencies are taken on the
Y-axis. The class marks are plotted against the corresponding frequencies.
These points are then joined by a smooth curve. The resultant curve is
called a frequency curve.
The only care that has to be taken is that, in the process of joining
the successive points by a smooth curve, the trend of the data is not distorted.
Example 3:
Draw a frequency curve for the following data:
Income in ’000 Rs.    0 – 4    4 – 8    8 – 12    12 – 16    16 – 20
No of families        20       28       26        30         32
Ans: The class marks of the class intervals are 2, 6, 10, 14 and 18. Now
these class marks are plotted against the corresponding class frequencies
and the frequency curve is drawn as shown below:
3.4.3 Ogive Curves
Ogive curves are frequency curves in which, instead of the class marks, the
class limits (either upper or lower) are plotted against the cumulative
frequencies (either less than or more than type). Hence Ogive curves are
also called cumulative frequency curves.
Less than Ogive Curve
The upper class limits are plotted against the less than cumulative
frequencies and joined by a smooth curve. A less than Ogive curve always
rises from left to right.
More than Ogive Curve
The lower class limits are plotted against the more than cumulative
frequencies and joined by a smooth curve. A more than Ogive curve
always falls from left to right.
Example 4:
Draw the Ogive Curves for the following data:
Weight in kg      20 – 25    25 – 30    30 – 35    35 – 40    40 – 45
No of children    15         10         25         5          10
Ans: We prepare the less than and more than cumulative frequency table
and draw both the Ogive curves as shown below:
Weight in kg    No of children    less than cf    more than cf
20 – 25         15                15              65
25 – 30         10                25              50
30 – 35         25                50              40
35 – 40         5                 55              15
40 – 45         10                65              10
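The two cumulative frequency columns, and the points to be plotted for each Ogive, can be computed with a short Python sketch. It uses `itertools.accumulate` from the standard library; the variable names are chosen for this sketch.

```python
from itertools import accumulate

# Class intervals (weight in kg) and frequencies from Example 4.
classes = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]
frequencies = [15, 10, 25, 5, 10]

# Less than cf: running total from the first class downwards.
less_than_cf = list(accumulate(frequencies))
# More than cf: running total from the last class upwards.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Points to plot: upper limits for the less than Ogive,
# lower limits for the more than Ogive.
less_than_points = [(upper, cf) for (_, upper), cf in zip(classes, less_than_cf)]
more_than_points = [(lower, cf) for (lower, _), cf in zip(classes, more_than_cf)]

print("Less than Ogive points:", less_than_points)
print("More than Ogive points:", more_than_points)
```

The less than cumulative frequencies never decrease and the more than cumulative frequencies never increase, which is why one curve rises and the other falls.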
Ogive curves are very useful for finding graphically positional averages
such as the median, quartiles and percentiles. We will study these in the
following chapters.
3.5 LET US SUM UP:
In this chapter we have learnt:
• Classification of data.
• Types of data classification.
• Tabulation of different types of data.
• Different types of diagrams and graphs to represent the data.
3.6 UNIT END EXERCISES:
1. Define the following terms :
a) Frequency b) Class – Interval
c) Class – Limits d) Class – marks
e) Cumulative Frequency
2. The following data gives the height of 24 students in cm of a class.
Prepare a frequency distribution table and find the cumulative
frequencies.
145 146 138 152 144 155 172 160 168 173
170 140 150 145 165 135 141 153 167 156
166 174 133 170
3. The following data gives the weight of 30 boundaries in tons. Prepare
a frequency distribution table and find (a) relative frequencies and (b)
percentage frequencies.
12 7 8 15 16 14 11 9 5 13
18 12 6 10 5 4 12 13 17 14
9 14 16 11 12 19 20 5 10 4
4. The following data gives the ages of 50 child labourers. Prepare a
frequency distribution table with cumulative frequencies and answer
the following questions: (i) how many child labourers have age
less than 9, (ii) how many child labourers have age less than 13, (iii)
how many child labourers have age more than 13? Take class
intervals as 5 – 7, 7 – 9, 9 – 11, …….
12 6 5 13 8 15 9 11 14 6
10 14 9 12 11 7 8 13 11 13
9 6 5 10 14 12 5 8 6 13
11 11 10 6 14 13 11 7 9 12
8 11 9 13 12 6 11 8
5. The rainfall in mm is given for a certain area. Prepare the frequency
distribution table. Write down the class marks and class width of the
class intervals. Prepare % frequencies and cumulative frequencies for
the data.
24.5 16.5 13.8 15.5 19.7 21.3 30 28.4 14.2
17.9 16.5 11.8 13.2 15.4 24.5 26.1 18.6 19.2
27.6 26.8 21.5 24 16.2 17.1 14.5 19.5 23.4
25.8 27.6 18.2
6. Convert the following inclusive class intervals into exclusive type
(i) 10 – 14, 15 – 19, 20 – 24, 25 – 29
(ii) 22 – 28, 30 – 36, 38 – 44, 46 – 52, 52 – 58
(iii) 2 – 12, 16 – 26, 30 – 40, 44 – 54, 58 – 68
7. Prepare a bivariate frequency distribution table for the following data:
Marks in Statistics: 15 10 18 28 20 30 35 45 14 16 40 4 48 28 12 25 15 35 18 6 26 21 32 45 9 11 16 35 41 34
Marks in Bus. Law: 5 20 8 15 6 22 28 39 26 18 38 12 29 38 13 26 27 9 25 17 30 41 40 18 15 40 5 35 7 12
8. Prepare a bivariate frequency distribution table for the following data:
Height in cm: 132 145 130 150 136 147 154 155 132 136 141 146 154 150 152 135 140 137 151 142 139 148 146 154 145 152 136 151 144 151
Weight in kg: 26 32 38 45 40 35 48 52 36 32 50 40 38 54 58 27 33 46 57 55 59 60 40 45 32 49 32 51 60 33
9. Write a short note on different types of graphs.
10. Explain briefly the Ogive curves.
11. Distinguish between Histogram and Frequency curve.
12. Draw a histogram for the following data:
C.I.   0 – 5   5 – 10   10 – 15   15 – 20   20 – 25   25 – 30
f        4       10        18        14        20        18
13. Draw a histogram for the following data:
Rainfall in mm   30 – 45   45 – 60   60 – 75   75 – 90   90 – 105   105 – 120
No. of cities       11         6        14        23         16          10
14. Draw a frequency curve for the following data:
Time in min        0 – 2   2 – 4   4 – 6   6 – 8   8 – 10   10 – 12
No. of customers     7       10      17      15      16        25
15. Draw Ogive curves for the following data:
Marks             0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80   80 – 90   90 – 100
No. of students      9        22        33        40        36        54        42        23        17        13
Find from the graph:
1. Number of students who have got distinction (marks more than 75).
2. Number of students who have got marks less than 35.
(Hint: For more than 75: locate 75 on the X-axis. Draw a perpendicular till
it meets the more than Ogive curve and from there extend it horizontally to
the Y-axis. The point where it meets the Y-axis is the required answer.)
Multiple Choice Questions:
1. The given data 5, 1, 0, 4, 2, 4, 3, 1, 2, 6 is called:
a) Frequency distribution data b) Grouped frequency data
c) Raw data d) None of the above
2. ___________ is used to present data involving one variable.
a) Multiple Bar diagram b) Pie diagram
c) Simple bar diagram d) None of these.
3. The median of a given frequency distribution is found graphically
with the help of _______________.
a) Histogram b) Simple bar diagram
c) Frequency polygon d) Ogive.
4. The best method of presentation of data is
a) Tabular b) Textual c) Diagrammatic d) (b) and (c)
5. The most accurate mode of data presentation is
a) Diagrammatic methods b) Tabulation
c) Textual presentation d) None of these
6. The frequency distribution of a continuous variable is known as
a) Grouped frequency distribution
b) Simple frequency distribution
c) Ungrouped frequency distribution
d) None of these
7. Mutually inclusive classification is usually meant for
a) A discrete variable b) A continuous variable
c) An attribute d) All these
8. A comparison among the class frequencies is possible only in
a) Frequency polygon b) Histogram
c) Ogives d) (a) and (b)
9. The number of types of cumulative frequency is
a) one b) Two
c) Three d) Four
3.7 LIST OF REFERENCES:
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K. Kapoor.
• Basic Statistics by B. L. Agrawal.
4
MEASURES OF CENTRAL TENDENCY
Unit Structure
4.0 Objectives:
4.1 Introduction:
4.2 Objectives of an Average
4.3 Fundamentals of a good Average
4.4 Mean
4.5 Weighted Mean
4.6 Combined Mean
4.7 Merits and Demerits of Mean
4.8 Let us sum up:
4.9 Unit end Exercises:
4.10 List of References:
4.0 OBJECTIVES:
After going through this chapter you will be able to know:
• The concept of central tendency (average) of data.
• The different measures of central tendency.
• The advantages and disadvantages of the mean.
• How to compute measures of central tendency for ungrouped and grouped data.
• How to calculate the combined mean and the weighted mean.
4.1 INTRODUCTION
In statistical analysis there is a need to condense the huge data
available so as to study its different characteristics. In the previous chapter
we have seen how to classify and present the data in a tabular form. The
data arranged in frequency distribution tables show a tendency to
cluster around certain values. This tendency is called Central Tendency
and is calculated statistically. A measure of central tendency, or an average,
is a single value representing the complete data. It is an important
summary measure in statistics. The word average in Statistics has a
quantitative meaning and not a qualitative one.
As defined by Clark and Schkade: “An average is an attempt to find out
one single figure to describe the whole of figures”.

4.2 OBJECTIVES OF AN AVERAGE
(i) A single value representation of the entire data :
With the help of an average one can present a huge amount of data in
a summarized form, which is easier to understand. It gives a bird’s
eye view of the entire data. For example, it is neither possible nor
required to know the individual requirements of petrol consumption;
an average quantity of petrol consumption is enough for the
government agencies in planning petroleum imports.
(ii) To compare different statistical data :
An average also helps to compare different sets of data. For example, the
passing percentage of students of two colleges can be compared by
the average passing percentage of students of each college.
(iii) To analyse and facilitate decision making :
The most important aspect of an average is that it can be used for
analyzing the data and hence making some predictions or decisions
based on that. For Example, if the average monthly revenue of a
product is found to be decreasing, then the manufacturer can think of
some advertising or other measures to increase the revenue.
4.3 FUNDAMENTALS OF A GOOD AVERAGE
(i) It should be easily understood: Since statistical measures are used for
simplifying huge data, an average should be easily understood by the
end user.
(ii) It should be rigidly defined: As statistical measures are used by
different people, an average should be simple but properly or rigidly
defined so as to avoid alterations caused by the interpretations of
different individuals. There should be no chance of bias of an
individual in its calculation.
(iii) It should be easy to calculate: For an average to be widely used,
it is important that the algebraic formula and the method are
not difficult or complex for an individual to calculate.
(iv) It should be based on entire data: An average should be based on each
and every observation of the given data. If any observation is deleted, it
should affect the average also; otherwise it cannot be called a
representative of the entire data.
(v) It should not be excessively affected by extreme values: Since an
average is a measure of central tendency for a large data set, it should
truly represent the characteristics of the entire data. Hence, it should not
be excessively affected or distorted by the extreme (very small or very
large) observations.
(vi) It should be capable of further algebraic treatment: Any statistical
measure should be useful for further analyses or calculations.
Otherwise it would be of limited use to statisticians.
(vii) It should have sampling stability: If independent sets of samples of
the same size and type of data are taken, we should expect approximately
the same average for each sample. In other words, an average should have
sampling stability.
Now we shall study the different types of averages.
TYPES OF AVERAGES
The various measures of central tendencies can be classified into two major
types: ( i) Mathematical Averages and ( ii) Positional Averages. These are
further classified into subtypes as follows:

As per our scope of syllabus, we will restrict ourselves to arithmetic mean,
geometric mean, median, quartiles and mode.
4.4 ARITHMETIC MEAN:
SIMPLE (OR UNWEIGHTED) ARITHMETIC MEAN:
The simple arithmetic mean (A.M.) is defined as the ratio of the sum of all the
observations to the total number of observations. In symbolic form, let
$x_1, x_2, x_3, \ldots, x_n$ be $n$ observations. The A.M. is denoted by
$\bar{x}$ and is given by the formula:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \text{or simply} \quad \bar{x} = \frac{\sum x}{n}.$$
In order to simplify the notation, the index $i$ can be skipped from
the summation symbol.
Example 1: The marks of 10 students are as follows: 02, 07, 04, 05, 06,
05, 09, 03, 07, and 04. Find the average marks.
Solution: Average marks $\bar{x} = \frac{2 + 7 + 4 + 5 + 6 + 5 + 9 + 3 + 7 + 4}{10} = \frac{52}{10} = 5.2$
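The same calculation can be written as a small Python function. This is a minimal sketch of the formula for the simple arithmetic mean; the function name arithmetic_mean is chosen here for illustration.

```python
def arithmetic_mean(observations):
    """Simple arithmetic mean: sum of the observations divided by their number."""
    if not observations:
        raise ValueError("at least one observation is required")
    return sum(observations) / len(observations)


# Marks of the 10 students in Example 1.
marks = [2, 7, 4, 5, 6, 5, 9, 3, 7, 4]
average_marks = arithmetic_mean(marks)
print(average_marks)
```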
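Example 2: The monthly sales (in ’00 Rs.) of a product are given below.
Find its average monthly sales.
Monthly sales: 10, 14, 13, 9, 8, 11, 15, 16, 15, 17, 16, 14
Solution: $\bar{x} = \frac{10 + 14 + 13 + 9 + 8 + 11 + 15 + 16 + 15 + 17 + 16 + 14}{12} = \frac{158}{12} \approx 13.17$
Thus, the average monthly sales are Rs. 1,317.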
Example 3: The average score of Rohit in 5 matches is 56. His scores in 4 of
these matches are 45, 31, 68 and 52. What is his score in the fifth
match?
Solution: Here, the total number of matches is $n = 5$.
Let his score in the fifth match be $x_5$ runs.
$$\sum x = 45 + 31 + 68 + 52 + x_5 = 196 + x_5$$
The average score is 56, i.e. $\bar{x} = 56$.
Now,
$$\bar{x} = \frac{\sum x}{n} = \frac{196 + x_5}{5}$$
$$56 = \frac{196 + x_5}{5}$$
$$56 \times 5 = 196 + x_5$$
$$x_5 = 280 - 196 = 84.$$
Therefore, he scored 84 runs in the fifth match.
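The step used here, that the total equals the mean multiplied by the number of observations, can be sketched in Python as follows. The helper name missing_observation is hypothetical and used only for this illustration.

```python
def missing_observation(mean, n, known_values):
    """Return the one unknown observation, given the mean of all n observations.

    Uses: sum of all observations = mean * n.
    """
    if len(known_values) != n - 1:
        raise ValueError("exactly one observation must be unknown")
    return mean * n - sum(known_values)


# Rohit's average over 5 matches is 56; four of the scores are known.
fifth_score = missing_observation(56, 5, [45, 31, 68, 52])
print(fifth_score)
```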
Example 4: Ajit appeared for the first year BMS exam and scored an average of
65 marks. He attempted 7 papers, in 5 of which his marks are 58, 62, 74, 78
and 82. The difference between the marks in the remaining two subjects is 7.
Find the marks in the remaining two subjects.
Solution: Here, the total number of subjects is $n = 7$.
Let the marks in the remaining two subjects be $x_6$ and $x_7$.
$$\sum x = 58 + 62 + 74 + 78 + 82 + x_6 + x_7 = 354 + x_6 + x_7$$
Now,
$$\bar{x} = \frac{\sum x}{n} = \frac{354 + x_6 + x_7}{7}$$
$$65 = \frac{354 + x_6 + x_7}{7}$$
$$65 \times 7 = 354 + x_6 + x_7$$
$$x_6 + x_7 = 455 - 354 = 101$$
$$\therefore x_6 + x_7 = 101 \quad \ldots\ldots \text{(I)}$$
Given that the difference between the marks in the two subjects is 7,
i.e. $x_6 - x_7 = 7 \quad \ldots\ldots \text{(II)}$
Solving equations (I) and (II) we get
$x_6 = 54$ and $x_7 = 47$.
Therefore, he scored 54 and 47 marks in the remaining two subjects.
For grouped data: Grouped data is also called a frequency distribution. We
have learnt that there are two types of grouped data: (i) discrete data and
(ii) continuous data.
For discrete data: Let the variable X take the values $x_1, x_2, x_3, \ldots, x_n$
with corresponding frequencies $f_1, f_2, f_3, \ldots, f_n$ respectively. Then
the mean of the variable X is given by
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{f_1 + f_2 + f_3 + \cdots + f_n}$$
$$\bar{x} = \frac{\sum fx}{N}, \quad \text{where } \sum f = N.$$
Steps to calculate A.M.:
1. We calculate the total frequency, denoted by $N = \sum f$, where $\sum f = f_1 + f_2 + f_3 + \cdots + f_n$.
2. Then we calculate the products $f_1 x_1, f_2 x_2, f_3 x_3, \ldots, f_n x_n$.
3. The A.M. is now calculated by the formula $\bar{x} = \frac{\sum fx}{N}$, where
$\sum fx = f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n$.
Example 5: The following is the frequency distribution of heights of
students in a class in a college. Calculate the average height of the class.
Height (in cms)    152   153   154   155   156   157   158
No. of Students     10    16    20    28    18    14    14
Solution: (Students are directed to draw vertical tables while practicing the
problems)
Let the height (in cms) be denoted by x and the no. of students by f.
Introducing the column of fx we have,
Height (in cms) (x)     152    153    154    155    156    157    158   Total
No. of Students (f)      10     16     20     28     18     14     14   N = 120
fx                     1520   2448   3080   4340   2808   2198   2212   $\sum fx$ = 18606
From the table we have: $\sum fx = 18606$ and $N = 120$.
$$\therefore \bar{x} = \frac{\sum fx}{N} = \frac{18606}{120} = 155.05$$
Thus, the average height of the class is 155.05 cms.
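The three steps for a discrete frequency distribution can be sketched in Python as below. The function name frequency_mean and the variable names are illustrative assumptions, not notation from the text.

```python
def frequency_mean(values, frequencies):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total_frequency = sum(frequencies)                         # step 1: N
    products = [f * x for x, f in zip(values, frequencies)]    # step 2: fx
    return sum(products) / total_frequency                     # step 3: sum(fx) / N


# Heights (in cms) and number of students from Example 5.
heights = [152, 153, 154, 155, 156, 157, 158]
students = [10, 16, 20, 28, 18, 14, 14]
average_height = frequency_mean(heights, students)
print(average_height)
```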
Example 6: The following table gives the survey report of 100 people who
were asked to count the maximum number in one breath. Find the average
number one can count in one breath.
Numbers           50   60   70   80   90   100   110   120
No. of Persons    25   20   15    5   15    10     5     5
Solution: Let the numbers be denoted by x and the no. of persons by f.
Introducing the column of fx:
Numbers (x)            50     60     70    80     90    100   110   120   Total
No. of Persons (f)     25     20     15     5     15     10     5     5   N = 100
fx                   1250   1200   1050   400   1350   1000   550   600   $\sum fx$ = 7400
From the table we have $N = 100$ and $\sum fx = 7400$.
The average number counted in one breath is $\bar{x} = \frac{\sum fx}{N} = \frac{7400}{100} = 74$.
Grouped (continuous) A.M.
Consider a grouped and continuous class distribution with class intervals
$a_1 - a_2,\ a_2 - a_3,\ a_3 - a_4, \ldots$ etc. Let the corresponding frequencies be denoted by $f_1, f_2, f_3, \ldots, f_n$.
Steps to calculate A.M. for continuous class distributions
1. We calculate the total frequency, denoted by $N = \sum f$.
2. Now the midpoints ($x_i$) of the continuous class intervals are
calculated. For example, the midpoint of the first class interval is
$x_1 = \frac{a_1 + a_2}{2}$, for the second interval it is $x_2 = \frac{a_2 + a_3}{2}$ and so on.
3. Now we introduce the column of fx. The midpoints ($x_i$) are multiplied
by the corresponding frequencies ($f_i$). The sum of these products is
calculated and is denoted by $\sum fx$.
4. The A.M. is now calculated by the formula $\bar{x} = \frac{\sum fx}{N}$.
Example 7: The minimum marks for passing in a post graduate diploma
course are 50. The following data gives the marks of 100 students who
passed the course. Find the average marks scored by the students.
Marks             50 – 59   60 – 69   70 – 79   80 – 89   90 – 99
No. of Students      28        30        25        15         2
Solution: In the calculation of the mean it is not necessary to convert the inclusive
class intervals to exclusive ones. Now, we introduce two columns for
calculating the mean: (i) mid points of intervals and (ii) fx.
Marks      Mid points (x)   Frequency (f)   fx
50 – 59         54.5             28         1526
60 – 69         64.5             30         1935
70 – 79         74.5             25         1862.5
80 – 89         84.5             15         1267.5
90 – 99         94.5              2          189
Total            --           N = 100       $\sum fx$ = 6780
From the table, $\sum fx = 6780$ and $N = 100$.
$$\therefore \bar{x} = \frac{\sum fx}{N} = \frac{6780}{100} = 67.8$$
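For a continuous (or inclusive) class distribution the only extra step is finding the midpoints. A minimal Python sketch, with the illustrative function name grouped_mean, is given below.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of a grouped distribution, using class midpoints as x."""
    if len(class_limits) != len(frequencies):
        raise ValueError("each class needs exactly one frequency")
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return sum_fx / sum(frequencies)


# Class intervals and frequencies from Example 7.
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
students = [28, 30, 25, 15, 2]
average_marks = grouped_mean(classes, students)
print(average_marks)
```

Because the midpoint of 50 – 59 and of 49.5 – 59.5 is the same (54.5), converting inclusive classes to exclusive ones would not change the result.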
Example 8: The following table gives the marks of 100 students. Find the
average marks.
Marks (less than)   10   20   30   40   50   60   70   80   90   100
No. of students      4   12   23   45   60   70   75   82   92   100
Solution: Observe that the data given has marks less than, which means the
frequencies given are of the less than cumulative type. Converting them to
class frequencies and introducing two columns as done in the previous problem,
the mean is calculated as follows:
Marks       Mid points (x)   Frequency (f)   fx
0 – 10            5                4          20
10 – 20          15                8         120
20 – 30          25               11         275
30 – 40          35               22         770
40 – 50          45               15         675
50 – 60          55               10         550
60 – 70          65                5         325
70 – 80          75                7         525
80 – 90          85               10         850
90 – 100         95                8         760
Total            --            N = 100       $\sum fx$ = 4870
The average marks are: $\bar{x} = \frac{\sum fx}{N} = \frac{4870}{100} = 48.7$
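The conversion from less than cumulative frequencies to class frequencies is a matter of taking successive differences. The following Python sketch shows that step and then applies the grouped mean formula; the names used are illustrative, and the class width of 10 starting at 0 is read off from the table in this example.

```python
def frequencies_from_less_than_cf(cumulative):
    """Convert less than cumulative frequencies into class frequencies."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


# Less than cumulative frequencies for upper limits 10, 20, ..., 100.
less_than_cf = [4, 12, 23, 45, 60, 70, 75, 82, 92, 100]
class_frequencies = frequencies_from_less_than_cf(less_than_cf)

# Midpoints of the classes 0-10, 10-20, ..., 90-100.
midpoints = [5 + 10 * i for i in range(len(class_frequencies))]

sum_fx = sum(f * x for x, f in zip(midpoints, class_frequencies))
mean_marks = sum_fx / sum(class_frequencies)
print(class_frequencies)
print(sum_fx, mean_marks)
```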
MISSING FREQUENCIES
Example 9: If the arithmetic mean for the following data is 24.8, find the
missing frequency.
Class Interval   0 – 10   10 – 20   20 – 30   30 – 40   40 – 50
Frequency           9        11        --        12         8
Ans: Let the missing frequency be k. Now, completing the table for
calculating the mean, we have:
Class Interval   Mid points (x)   Frequency (f)   fx
0 – 10                 5                9          45
10 – 20               15               11         165
20 – 30               25                k         25k
30 – 40               35               12         420
40 – 50               45                8         360
Total                 --          N = 40 + k      $\sum fx$ = 990 + 25k
Now, given $\bar{x} = 24.8$ and $\bar{x} = \frac{\sum fx}{N}$,
$$24.8 = \frac{990 + 25k}{40 + k}$$
$$\therefore 24.8(40 + k) = 990 + 25k$$
$$\therefore 992 + 24.8k = 990 + 25k$$
$$\therefore 0.2k = 2. \quad \text{Thus, } k = 10.$$
The missing frequency is 10.
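The same equation can be solved for k in general. The sketch below rearranges the mean formula and uses Python's Fraction type so that 24.8 is handled exactly; the function name missing_frequency is an assumption made for this illustration.

```python
from fractions import Fraction


def missing_frequency(mean, known_classes, missing_midpoint):
    """Solve mean = (sum_fx_known + x_k * k) / (n_known + k) for k.

    known_classes is a list of (midpoint, frequency) pairs for the
    classes whose frequency is given.
    """
    n_known = sum(f for _, f in known_classes)
    sum_fx_known = sum(x * f for x, f in known_classes)
    return (mean * n_known - sum_fx_known) / (missing_midpoint - mean)


# Example 9: midpoints 5, 15, 35, 45 have known frequencies; midpoint 25 does not.
known = [(5, 9), (15, 11), (35, 12), (45, 8)]
k = missing_frequency(Fraction("24.8"), known, 25)
print(k)
```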

CORRECTED MEAN
Example 10: The average weight of 25 boys from the age group 10 – 15 was
calculated as 38 kg. Later on it was found that one boy’s weight was
wrongly taken as 34 kg instead of 43 kg. Find the corrected mean.
Solution: Given n = 25 and $\bar{x}$ = 38, wrong value = 34, correct value = 43.
The formula for the mean is $\bar{x} = \frac{\sum x}{n}$, which means $\sum x = n \bar{x}$.
Using this we have $\sum x = 25 \times 38 = 950$. This sum is wrong as one of the
observations is incorrect. So we find the correct sum by subtracting the
wrong value from the sum and adding the correct value.
Correct sum = $\sum x$ − wrong value + correct value = 950 − 34 + 43 = 959
$\therefore$ correct mean = (correct sum)/n = 959/25 = 38.36
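The correction procedure can be sketched as a short Python function. The name corrected_mean is chosen here for illustration only.

```python
def corrected_mean(reported_mean, n, wrong_value, correct_value):
    """Recompute the mean after replacing one wrongly recorded observation."""
    reported_sum = reported_mean * n                          # sum x = n * mean
    correct_sum = reported_sum - wrong_value + correct_value
    return correct_sum / n


# Example 10: 25 boys, reported mean 38 kg, 34 recorded instead of 43.
mean_weight = corrected_mean(38, 25, 34, 43)
print(mean_weight)
```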
4.5 WEIGHTED A.M.
In calculating the simple arithmetic mean we assume that all observations are
of equal importance. In practice, some observations may be more
important or less important than the rest. For example, while
computing the average salary of a company, the class I, class II, class III and
class IV employees have different levels of salaries and allowances and
hence are not of the same level. A simple arithmetic mean in this case will not
be representative of all the employees of the company. In such cases weights
are assigned to different observations depending upon their importance.
Let $x_1, x_2, x_3, \ldots, x_n$ be $n$ observations with corresponding
weights $w_1, w_2, \ldots, w_n$.
Then the weighted A.M. is calculated by the formula:
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wx}{\sum w}$$
company from the following data: Employee Class I Class II Class III Class IV Salary (in ‘000 Rs. ) 30 24 18 10 Number of employees 10 25 10 5
Solution:
Employee Class I Class II Class III Class IV Salary (x) 30 24 18 10 Total munotes.in

Page 48

48 Business Statistics
48 (in ‘000 Rs.) Number of employees
(f) 10 25 10 15 N = 60
Fx 300 600 180 50 fx6= 1130 113018.8360fxxN6 . The weighted average salary is Rs. 18,830.
Example 12: Given below is the performance of students of three colleges
A, B and C in different courses. The data gives the percentage of students
passed and the no. of students (in ’000). Using the weighted mean, find out
the best performing college.
Courses    College A             College B             College C
           % passed   Students   % passed   Students   % passed   Students
B.A.          65%        2          80%        4          70%        3
B.Com.        54%        3          75%        4          50%        2
B.Sc.         72%        1          70%        2          80%        5
Solution:
Courses    College A          College B          College C
           x     w     wx     x     w     wx     x     w     wx
B.A.       65    2    130     80    4    320     70    3    210
B.Com.     54    3    162     75    4    300     50    2    100
B.Sc.      72    1     72     70    2    140     80    5    400
Total      --    6    364     --   10    760     --   10    710
Now the weighted mean for each college is calculated as follows:
College A: $\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{364}{6} = 60.67$
College B: $\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{760}{10} = 76$
College C: $\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{710}{10} = 71$
Comparing all the three weighted means, since the weighted mean of
College B is the highest, its performance is the best among the three colleges.
Example 13: In an entrance examination, different weights were attached
to the subjects Maths, Physics, Chemistry and Biology. The marks obtained
by Tejas, Dhyey and Anant are given below. Find the weighted mean and
give a comment on it.

Solution: The weighted means for the three students are calculated as
follows:
Subject      Weight (w)    Tejas         Dhyey         Anant
                           x     wx      x     wx      x     wx
Maths            4         75    300     68    272     81    324
Physics          3         90    270     70    210     79    237
Chemistry        3         77    231     74    222     84    252
Biology          1         92     92     72     72     89     89
Total         $\sum w$ = 11   --    893     --    776     --    902
Now we calculate the weighted mean for the three students using the
formula $\bar{x}_w = \frac{\sum wx}{\sum w}$.
For Tejas: $\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{893}{11} = 81.18$
For Dhyey: $\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{776}{11} = 70.55$
For Anant: $\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{902}{11} = 82$

Page 50

50 Business Statistics
50 Comparing the individual weighted mean, since Anant has the highest
among the three students, his performance is the best.
In the above case if instead of weighted mean, simple mean is calculated
then the conclusion may differ and would not be p roper as all the subjects
are not to be treated equally.
For Tejas: 334x6 33483.54xxn6
For Dhyey: 284x6 284714xxn6
For Anant: 333x6 33383.254xxn6
Compari ng the three means the obvious conclusion is that Tejas’s
performance is better than of Anant.
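The contrast between the weighted and the simple mean in this example can be reproduced with the following Python sketch. The dictionary layout and the function name weighted_mean are assumptions made for the illustration; the marks and weights are those of the solution table.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Weights for Maths, Physics, Chemistry, Biology (in that order).
weights = [4, 3, 3, 1]
marks = {
    "Tejas": [75, 90, 77, 92],
    "Dhyey": [68, 70, 74, 72],
    "Anant": [81, 79, 84, 89],
}

weighted = {name: weighted_mean(m, weights) for name, m in marks.items()}
simple = {name: sum(m) / len(m) for name, m in marks.items()}

for name in marks:
    print(name, round(weighted[name], 2), round(simple[name], 2))

best_weighted = max(weighted, key=weighted.get)
best_simple = max(simple, key=simple.get)
print(best_weighted, best_simple)
```

The student ranked first changes depending on which mean is used, which is why the choice of weights has to reflect the stated importance of each subject.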
4.6 COMBINED A.M.
Any statistical measure is expected to be useful in further algebraic
treatment of the data. For example, if the average daily wages of men and
women are known, then the average wages for the total workers can be
useful for the employer. This can be done as follows: Let $\bar{x}_1$ be the A.M. for
a set of $n_1$ observations and $\bar{x}_2$ be the mean for a set of $n_2$ observations. Then
the combined mean for the set of $n_1 + n_2$ observations is given by the
formula:
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}.$$
This formula can be extended to any $k$ sets of observations. The
combined mean will then be given by:
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}.$$
Example 14: The mean daily wages of 20 women workers are Rs. 100 and
that of 35 men workers are Rs. 140. Find the mean daily wages for all
workers taken together.
Solution: Let $n_1 = 35$, $\bar{x}_1 = 140$ (men) and $n_2 = 20$, $\bar{x}_2 = 100$ (women).
The combined mean daily wages are:
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{35 \times 140 + 20 \times 100}{35 + 20} = \frac{6900}{55} = 125.45$$
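The combined mean formula for any number of groups can be sketched in Python as below; the function name combined_mean is illustrative.

```python
def combined_mean(group_sizes, group_means):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    if len(group_sizes) != len(group_means):
        raise ValueError("each group needs a size and a mean")
    total = sum(n * m for n, m in zip(group_sizes, group_means))
    return total / sum(group_sizes)


# Example 14: 35 men with mean wage Rs. 140 and 20 women with mean wage Rs. 100.
mean_wage = combined_mean([35, 20], [140, 100])
print(round(mean_wage, 2))
```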
Example 15: The average height of students in a class is 162 cm. If the
average height of boys is 167 cm and that of girls is 160 cm, find the ratio
of boys to girls. Also find the number of girls if there are 80 boys
in the class.
Solution: Given $\bar{x}_c = 162$, $\bar{x}_1 = 167$ and $\bar{x}_2 = 160$. Let there be $a$ boys and $b$
girls in the class.
Now, $\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$.
$$\therefore 162 = \frac{167a + 160b}{a + b}$$
$$\therefore 162a + 162b = 167a + 160b$$
$$\therefore 2b = 5a$$
$$\therefore a : b = 2 : 5.$$
Thus, the ratio of boys to girls is 2 : 5.
If there are 80 boys in the class, the number of girls = 5 × (80/2) = 200.
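The rearrangement used in this solution gives a general rule: from $n_1 \bar{x}_1 + n_2 \bar{x}_2 = \bar{x}_c (n_1 + n_2)$ it follows that $n_1 : n_2 = (\bar{x}_c - \bar{x}_2) : (\bar{x}_1 - \bar{x}_c)$. A small Python sketch of this rule is given below; the function name size_ratio is hypothetical and integer means are assumed so that Fraction can be used directly.

```python
from fractions import Fraction


def size_ratio(mean_1, mean_2, combined):
    """Return n1 / n2 for two groups, given their means and the combined mean.

    Derived from n1 * (mean_1 - combined) = n2 * (combined - mean_2).
    Integer inputs are assumed in this sketch.
    """
    return Fraction(combined - mean_2, mean_1 - combined)


# Example 15: boys average 167 cm, girls average 160 cm, class average 162 cm.
boys_to_girls = size_ratio(167, 160, 162)
girls = 80 / boys_to_girls   # number of girls when there are 80 boys
print(boys_to_girls, girls)
```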
4.7 MERITS AND DEMERITS OF ARITHMETIC MEAN
Merits:
1. It is simple to calculate and easy to understand.
2. It is rigidly defined.
3. It can be used for further algebraic treatment very easily and hence is
a popular statistical measure.
4. It is based on all observations of a given set of data.
5. It is least affected by the sampling fluctuations.
6. It is useful in comparing two or more sets of data.
Demerits or Limitations
1. If the number of observations is not large, then some extreme items
in the data may considerably affect the average. For example, in a
cricket match if one player has scored 200 runs and the rest have scored
100 runs together, then the total for the team is 300 runs with an
average score of about 27 runs per player. This does not represent
the data in a true sense, as the remaining 10 players actually have an
average score of only 10 runs.
2. The A.M. has an upward bias. The large observations dominate the
small observations in calculating the average. For example, the
average of 1, 1.1, 0.9 and 21 is 6. We can see clearly that because
of the last observation the average has gone up.
3. Comparisons based merely on the A.M. may be misleading. For example,
suppose two contractors pay their workers 800, 250, 50, 50,
50 and 350, 300, 300, 150, 100 respectively. The average pay for both comes out
to be 240, which may lead us to conclude that both the contractors pay
equal wages to their workers, which as we can see is far from true.
4. The A.M. obtained may not be among the observations taken into
consideration. For example, the A.M. of 10, 20, and 70 is 33.33,
which is not among the three observations. The A.M., being a
mathematical average, does not reflect qualitative aspects of the data
like sincerity, beauty etc. Also, if there are 100, 110 and 125
students in three classes then the average number of students is
111.67. Can the number of students be in fractions? This is absurd.
5. The arithmetic mean cannot be computed for open end classes. Since
computation of the A.M. for a continuous frequency distribution requires
the class marks or mid points of class intervals, if there are open end
class intervals it is not possible to find their mid points. Hence the A.M.
cannot be computed in such cases.
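The second and third limitations can be checked numerically. The Python sketch below uses the figures quoted in those points; the variable names are illustrative only.

```python
def mean(values):
    """Simple arithmetic mean."""
    return sum(values) / len(values)


# Limitation 2: one large observation pulls the mean upwards.
small_set = [1, 1.1, 0.9, 21]
print(mean(small_set))        # mean with the large observation
print(mean(small_set[:-1]))   # mean of the first three observations only

# Limitation 3: two very different sets of wages can share the same mean.
contractor_a = [800, 250, 50, 50, 50]
contractor_b = [350, 300, 300, 150, 100]
print(mean(contractor_a), mean(contractor_b))
print(max(contractor_a) - min(contractor_a), max(contractor_b) - min(contractor_b))
```

The last line prints the spread (largest minus smallest wage) of each set, which shows how differently the two contractors pay even though the averages agree.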
4.8 LET US SUM UP:
In this chapter we have learnt:
• The meaning of a measure of central tendency and its types.
• Calculation of the Arithmetic Mean (A.M.) for ungrouped and grouped data.
• Calculation of the weighted mean.
• Calculation of the combined mean for two or more groups.
4.9 UNIT END EXERCISES:
1. What is the meaning of measure of central tendencies? Write down
the characteristics of a good average.
2. Write a short note on different types of central tendencies.
3. Define Arithmetic Mean. What are the merits and demerits of A.M.
Justify your answer with suitable Examples.
4. The daily income in ’000 Rs. of 10 shopkeepers in Mumbai is as
follows: 12, 08, 19, 25, 06, 30, 28, 41, 15, and 10. Find the average
daily income.
5. The daily sales in numbers of vadapav in different areas in Mumbai
is as follows:

Find the average number of vadapav’s sold per day.
6. The marks obtained by 30 students in a test are given below. Find the
average score of the class.
10, 15, 22, 13, 26, 44, 23, 18, 10, 20, 36, 45, 29, 36, 11
07, 39, 09, 33, 31, 25, 12, 41, 34, 26, 16, 18, 20, 31, 38
7. The profit (in lakhs of Rs.) of 10 companies in a city is as follows:
125, 224, 100, 435, 250, 565, 280, 195, 320, 402.
Find the average profit of all the companies.
8. The number of defective bulbs in 100 boxes is as follows: Find the
average number of defective bulbs for all boxes.
Defective bulbs 0 1 2 3 4 5 6 7 8 9 10
Boxes 10 15 21 14 11 9 5 3 3 5 4
9. To mark the 150th anniversary of India’s first freedom struggle of
1857, 120 students of a school were told to write down the martyrs
they remember. The number of martyrs known to each student were
as follows:
No. of martyrs     0    1    2    3    4    5    6    7    8    9   10
No. of students   12   38   24   11   16   07   06   03   01   02   00
10. The average weight of 30 students in a class is 40 kg. The following
table gives the weight of 20 students in the class. Find the average
weight of the remaining 10 students.
Weight (in kg) 34 36 38 40 42 44 46
Students 2 2 5 6 4 0 1
11. The following data shows the consumption of milk in litres in
different families. Find the average consumption of milk.
Milk consumption   0.5    1   1.5    2   2.5    3
No. of families     10   28    30   21    16    5
12. The number of inquiries per day in a shoe shop for different sizes of
shoes is given below. Find the average shoe size inquired about. What
would you hence suggest the shopkeeper do?

13. The following table gives the marks obtained by students in the first
term examination. Find the average marks per student.
Marks             0 – 10   10 – 20   20 – 30   30 – 40   40 – 50
No. of students     11        10        15         8         6
14. The following data gives the distance required by villagers of
different villages of Thane district to come to the Collector’s office.
Find the average distance of travelling.
Distance (in km)   0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70
No. of villages      98        62        35        60       152        70       189
15. The monthly wages of employees in a company are given below. Find
the average wages.
Wages in ’00 Rs.   5 – 15   15 – 25   25 – 35   35 – 45   45 – 55   55 – 65
No. of workers       05        16        22        18        10        09
16. The life of two types of tube lights is given below. Find the
average life of each type and comment on the result.
Life in months   0 – 6   6 – 12   12 – 18   18 – 24   24 – 30   30 – 36
Tube A             04      12       15        10         8        01
Tube B             06      10       14        08        05        07
17. The performance of students of three Universities A, B and C in
different courses is given below. The data gives the percentage of
students passed and the no. of students (in ’000). Using the weighted
mean, find out the best performing university.
Courses    University A          University B          University C
           % passed   Students   % passed   Students   % passed   Students
M.A.          55%        20         78%        40         80%        30
M.Com.        64%        30         70%        40         65%        20
M.Sc.         69%        10         65%        20         70%        50
18. Find the missing frequency from the following data if the total
frequency is 100 and the mean is 23.9.
Class Interval   0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60
Frequency          05        16        --        18        10        09
19. Find the missing frequency from the following data if the total
frequency is 200 and the mean is 49.75.
Class Interval   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80
Frequency           25        --        42        51        36        17
20. Find the missing frequency if the mean for the following data is 9.725.
Income (in ’000 Rs.)   2 – 5   5 – 8   8 – 11   11 – 14   14 – 17   17 – 20
No. of families          14      22       8         ?        12        04
21. If the average marks of 50 students in the tutorial test are 4.86, find
the missing frequencies:
Marks             0   1   2    3   4   5   6   7   8   9   10
No. of students   2   6   ?   11   4   3   ?   4   2   7    5
22. The following frequency distribution gives the number of students of a
class who have passed an examination. Find their average marks.
Also, find the average marks of the failed students if the total number of
students in the class was 100.
Marks             35 – 50   50 – 60   60 – 75   75 – 90   90 – 100
No. of students      12        15        10        11         12
23. The average salary of 400 employees in a private firm is Rs. 12,000.
Due to increase in inflation, the firm decides to give an increment of
20% of the average salary to the highest paid employees, 15% of the
average salary to the lowest paid employees and 10% of the average
salary to the remaining employees. Find the extra payment the firm
has to make for all employees. Also find the average salary after the
increment.
24. The following table gives the salary per month of 200 employees in a
company.
Salary              0 – 2500   2500 – 5000   5000 – 7500   7500 – 10000   10000 – 12500
No. of employees       50          40            45             35              30
The company offers an increment due to its profit in the first quarter
as follows: 25% of the average salary to the highest paid employees,
20% of the average salary to the lowest paid employees and 15% of
the average salary to the remaining employees. Find the extra
payment the company has to make and also find the average salary
per employee after the increment.
25. The average height of 100 students is 170 cm. The average height of
55 boys is 173 cm and that of the girls is 169 cm. Find the number of
girl students.
26. The average pay of 25 men and 35 women in a factory are Rs. 100
and Rs. 80 respectively. Find the average pay of all the employees.
27. The average charge of a movie ticket for children and adults is Rs. 30
and Rs. 50 respectively. If 100 children and 150 adults watch a movie,
what is the average charge of ticket of the audience?
28. The average marks of students in a class are 55. The average marks
for boys is 54 and the average marks for girls is 58. Find the ratio of
boys and girls in the class. If there are 60 boys in the class, find the
number of girl students.
29. Mr. Vipul Patel owns three factories. The average wages for 50
labourers working in the first factory is Rs. 120, that of 80 labourers
in second factory is Rs. 100 and for the 70 labourers in the third
factory is Rs. 110. Find the average wage for all labourers in Mr.
Patel’s factories.
30. The average salary of 120 employees in a company is Rs. 12,000. The
average salary of 20 Grade I employees is Rs. 16,000 and that of 40
Grade II employees is Rs. 12,400. Find the average salary of
remaining employees.
31. The average marks of 20 students in a class are 75. If the average
marks of 12 students are 70, find the average of the remaining students in
the class.
32. The mean height of 39 students in a class is 164. The average becomes
164.2 because of the entry of a new student. What is the height of
the new student?
33. A salesman has average sales of Rs. 11,000 in the first 5 months of
his job. Due to a crash in the market, his sales for the sixth month
are very low, thereby decreasing the six-month average to Rs.
10,000. Find the sales made by the salesman in the sixth month.
34. The mean salary of 1000 employees in a company is Rs. 11,500. It
was discovered that the salary of one employee was wrongly taken as
10,000 instead of 1,000. Find the correct mean salary.
35. The average weight of 50 people participating in a diet contest was
calculated as 45 kg. But it was found that the actual weight of one of
the participants was 62 kg and not 52 kg. Find the correct average
weight of all participants.
Multiple Choice Questions:
1. If $\sum fx = 2120$ and $\sum f = 80$, then $\bar{x}$ is ______
a) 26.5 b) 27.5 c) 37.5 d) 38.5
2. If there are two groups with 100 observations each and 35 and 45 as
the values of their means, then the value of the combined mean of the
200 observations will be
a) 35 b) 40 c) 45 d) None of these.
3. If n = 5, $\sum wx = 1450$, $\sum w = 80$, then the weighted mean is:
a) 40 b) 19.25 c) 36 d) 25
4. Mean or average used to measure central tendency is called
a) sample mean. b) arithmetic mean.
c) negative mean. d) population mean.
5. The mean of 25, 15, 20, 10, 30 is
a) 25 b) 30 c) 20 d) 10
6. The arithmetic mean of a set of 10 numbers is 20. If each number is
first multiplied by 2 and then increased by 5, then what is the mean of
the new numbers?
a) 20 b) 25 c) 40 d) 45
7. What is the weighted mean of the first 10 natural numbers whose
weights are equal to the corresponding number?
a) 7 b) 5.5 c) 5 d) 4.5
8. The arithmetic mean of the first ten whole numbers is ____.
a) 5.5 b) 5 c) 4 d) 4.5
9. The mean of 5 observations is 25, of which the first four observations
are 35, 20, 40 and 5; then the fifth value is _____.
a) 25 b) 30 c) 15 d) 50
10. The average weight of 50 people is 45 kg. If the average weight of
30 of them is 42 kg, then the average weight of the remaining people is
___.
a) 48.5 b) 50.5 c) 49.5 d) 51.5
4.10 LIST OF REFERENCES:
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K. Kapoor.
• Basic Statistics by B. L. Agrawal.

5
MEASURES OF CENTRAL TENDENCY II
Unit Structure
5.0 Objectives:
5.1 Introduction:
5.2 Median
5.2.1 Median for ungrouped data:
5.2.2 Median for grouped (discrete) data
5.2.3 Median for a class distribution:
5.2.4 Graphical method for finding median
5.3 Quartiles, Deciles and Percentiles
5.4 Merits and Demerits of median
5.5 Mode
5.5.1 Mode for ungrouped data:
5.5.2 Mode for grouped data:
5.5.3 Graphical location of mode
5.6 Merits and demerits of mode
5.7 Comparative analysis of all measures of Central Tendency
5.8 Let us sum up:
5.9 Unit end Exercises:
5.10 List of References:
5.0 OBJECTIVES:
After going through this chapter you will be able to know:
• Types of positional averages.
• How to calculate the central value using the median.
• How to extend the median and calculate quartiles, deciles and percentiles.
• How to find the median, quartiles, deciles and percentiles graphically
using the cumulative frequency curve.
• How to calculate the mode for ungrouped and grouped data.
• How to find the mode graphically using a histogram.
5.1 INTRODUCTION
In the previous chapter we learnt about mathematical averages. In this
chapter we discuss positional averages. The mean, median and mode are all
valid measures of central tendency, but under different conditions some
measures of central tendency become more appropriate to use than others.
In the following sections, we will look at the mean, mode and median,
learn how to calculate them, and see under what conditions each is most
appropriate to use.
5.2 MEDIAN
One of the limitations of the arithmetic mean is that it is affected by
extreme observations. Positional averages are very useful for overcoming
this. The median is the first type of positional average. It is the central
value among a given set of observations written in either ascending or
descending order of their magnitude.
5.2.1 MEDIAN FOR UNGROUPED DATA:
Steps to find Median
1. First the given set of n observations is arranged in ascending or
descending order.
2. If the number of observations (n) is odd, then the median is the
$\left(\frac{n+1}{2}\right)^{th}$ observation.
3. If the number of observations (n) is even, then the median is the
arithmetic mean of the $\left(\frac{n}{2}\right)^{th}$ observation and the
$\left(\frac{n}{2}+1\right)^{th}$ observation.
Example 1: Compute the median for the following series: 100, 78, 81, 43,
65, 77, 102, 34, and 59
Solution: We first arrange the given data in ascending order as follows
34, 43, 59, 65, 77, 78, 81, 100, 102
The number of observations is 9, i.e. odd.
Using the formula we have, Median = $\left(\frac{n+1}{2}\right)^{th}$
observation = $\left(\frac{9+1}{2}\right)^{th}$ observation
∴ Median = 5th observation = 77.
Example 2: Compute the median for the following series: 9, 12, 17, 8, 4, 15,
3, and 10
Solution: We first arrange the given data in ascending order as follows
3, 4, 8, 9, 10, 12, 15, 17
The number of observations is 8, i.e. even.
Now, n/2 = 8/2 = 4 and n/2 + 1 = 5
Using the formula Median = A.M. of the $\left(\frac{n}{2}\right)^{th}$ and
$\left(\frac{n}{2}+1\right)^{th}$ observations, we have,
∴ Median = A.M. of 4th and 5th observation = A.M. of 9 and 10 = 9.5
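The steps above can be sketched in a few lines of Python. This is only an illustrative sketch; the function name `median_ungrouped` is a hypothetical helper, not part of the text.

```python
def median_ungrouped(values):
    """Median of raw data, following the textbook steps (1-based positions)."""
    data = sorted(values)              # Step 1: arrange in ascending order
    n = len(data)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation
        return data[(n + 1) // 2 - 1]
    # Even n: arithmetic mean of the (n/2)th and (n/2 + 1)th observations
    return (data[n // 2 - 1] + data[n // 2]) / 2


example_1 = [100, 78, 81, 43, 65, 77, 102, 34, 59]
example_2 = [9, 12, 17, 8, 4, 15, 3, 10]
print(median_ungrouped(example_1))  # 77
print(median_ungrouped(example_2))  # 9.5
```

The subtraction of 1 only converts the textbook's 1-based positions to Python's 0-based list indices.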
5.2.2 MEDIAN FOR GROUPED (DISCRETE) DATA
Consider a set of observations $x_1, x_2, x_3, \ldots, x_n$ with corresponding
frequencies $f_1, f_2, f_3, \ldots, f_n$.
Steps to find Median
1. First the less than cumulative frequencies are calculated.
2. We then find the total frequency $N = \sum f$.
3. Now we calculate the value of N/2. (Irrespective of whether N is odd
or even.)
4. From the table we find the first cumulative frequency which is just
greater than N/2. The corresponding observation is the median.
Example 3: The heights of 60 students in a class are given below. Find the
median height.
Height in cm      165   166   167   168   169   170
No. of students    11    15    10     5    12     7
Solution: Introducing the column of less than cumulative frequencies we
have,
Height in cm   No. of students (f)   Cumulative frequency
165            11                    11 (< 30)
166            15                    26 (< 30)
167            10                    36 (> 30)
168             5                    41
169            12                    53
170             7                    60
Total          N = 60
From the table N = 60. ∴ N/2 = 30.
We look at the first cumulative frequency just greater than 30 in the third
column. It is 36.
The corresponding value in the first column is 167. ∴ the Median = 167 cm.
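As a cross-check, the cumulative-frequency search can be sketched in Python. The helper name `median_discrete` is hypothetical, and the sketch assumes the observations are already listed in ascending order.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a discrete frequency distribution (values in ascending order)."""
    cum = list(accumulate(freqs))      # less than cumulative frequencies
    half = cum[-1] / 2                 # N/2, whether N is odd or even
    for x, cf in zip(values, cum):
        if cf > half:                  # first cumulative frequency just greater than N/2
            return x


heights = [165, 166, 167, 168, 169, 170]
students = [11, 15, 10, 5, 12, 7]
print(median_discrete(heights, students))  # 167
```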
5.2.3 MEDIAN FOR A CLASS DISTRIBUTION:
Consider a grouped and continuous class distribution with class intervals
$a_1$–$a_2$, $a_2$–$a_3$, $a_3$–$a_4$, etc. Let the corresponding frequencies
be denoted by $f_1, f_2, f_3, \ldots, f_n$.
Steps to find Median
1. First the less than cumulative frequencies are calculated.
2. We then find the total frequency $N = \sum f$.
3. Now we calculate the value of N/2 = m (say).
4. The class interval whose cumulative frequency is just greater than N/2
is the median class where the median lies.
5. Let l1 be the lower limit and l2 be the upper limit of the median class.
Let f denote the frequency of the median class and pcf denote the
cumulative frequency of the previous class interval. Then the median
is calculated by the formula:
Median = $l_1 + \left[\frac{m - pcf}{f}\right] \times i$, where m = N/2 and
$i = l_2 - l_1$ is the width of the class interval.
Example 4: Find the median for the following data:
Class Interval   10-30   30-50   50-70   70-80   80-90   90-100
Frequency        17      34      20      52      36      23
Solution: Introducing the column of less than cumulative frequencies we
have,
Class Interval   Frequency   Cumulative frequency
10-30            17          17 (< 91)
30-50            34          51 (< 91)
50-70            20          71 (< 91)
70-80            52          123 (> 91)
80-90            36          159
90-100           23          182
Total            N = 182
From the table N = 182. ∴ m = N/2 = 91.
The first cumulative frequency just greater than 91 is 123. The
corresponding class, which is called the median class, is 70 – 80.
Thus, l1 = 70, l2 = 80. ∴ i = l2 – l1 = 10, f = 52 and pcf = 71
Now, Median = $l_1 + \left[\frac{m - pcf}{f}\right] \times i$
= $70 + \left[\frac{91 - 71}{52}\right] \times 10$
= $70 + \left[\frac{20 \times 10}{52}\right]$
∴ Median = 70 + 3.85 = 73.85
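The interpolation formula can be sketched as a small Python function. The name `median_grouped` and the representation of classes as `(lower, upper)` pairs are assumptions made for this illustration; the classes are assumed to be exclusive and in ascending order.

```python
from itertools import accumulate


def median_grouped(classes, freqs):
    """Median = l1 + ((m - pcf) / f) * i for a continuous class distribution."""
    cum = list(accumulate(freqs))          # less than cumulative frequencies
    m = cum[-1] / 2                        # m = N/2
    for k, cf in enumerate(cum):
        if cf > m:                         # median class: first cf just greater than m
            l1, l2 = classes[k]
            pcf = cum[k - 1] if k > 0 else 0
            return l1 + (m - pcf) / freqs[k] * (l2 - l1)


classes = [(10, 30), (30, 50), (50, 70), (70, 80), (80, 90), (90, 100)]
freqs = [17, 34, 20, 52, 36, 23]
print(median_grouped(classes, freqs))  # about 73.85
```

Note that the class width i is taken from the median class itself, which matters here because the classes are of unequal width.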
Example 5: Find the median age for the following data:
Age              10-14   15-19   20-24   25-29   30-34   35-39
No. of persons   8       12      17      14      11      18
Solution: Since the class intervals need to be exclusive to compute the
median, we convert the given inclusive intervals to exclusive intervals by
subtracting 0.5 from their lower limit and adding 0.5 to their upper limit.
Introducing the column of less than cumulative frequencies we have,
Age          No. of Persons   Cumulative frequency
9.5–14.5     8                8 (< 40)
14.5–19.5    12               20 (< 40)
19.5–24.5    17               37 (< 40)
24.5–29.5    14               51 (> 40)
29.5–34.5    11               62
34.5–39.5    18               80
Total        N = 80
From the table N = 80. ∴ m = N/2 = 40.
The first cumulative frequency just greater than 40 is 51. The corresponding
class, which is called the median class, is 24.5 – 29.5.
Thus, l1 = 24.5, l2 = 29.5. ∴ i = l2 – l1 = 5, f = 14 and pcf = 37
Now, Median = $l_1 + \left[\frac{m - pcf}{f}\right] \times i$
= $24.5 + \left[\frac{40 - 37}{14}\right] \times 5$
= $24.5 + \left[\frac{3 \times 5}{14}\right]$
∴ Median = 24.5 + 1.07 = 25.57.
5.2.4 GRAPHICAL METHOD FOR FINDING MEDIAN
The median can be found by graphical method using the following steps:
1. First the less than cumulative frequencies are calculated.
2. The upper limits of the class intervals are taken on the horizontal
X-axis and the cumulative frequencies are taken on the vertical Y-axis.
3. Then we draw a less than ogive curve (cumulative frequency curve).
4. The value of N/2 is calculated and is marked on the Y-axis. Let this
point be A.
5. From point A, a line parallel to the X-axis is drawn till it touches the
ogive curve at point B (say).
6. From B a line parallel to the Y-axis is drawn till it touches the X-axis at
point C (say). This point C is the required median.
Example 6: Locate the median for the following data graphically:
C.I.        0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency   52     36      42      38      54      44      34
Solution: Introducing the column of less than cumulative frequencies we
have,
C.I.        0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency   52     36      42      38      54      44      34
Cf          52     88      130     168     222     266     300
Now we plot the graph of upper limits of the class intervals taken on the
horizontal X-axis against the cumulative frequencies taken on the vertical
Y-axis.
[Figure: less than ogive curve for the above data]
Now, since N = 300, m = 150. We locate 150 on the Y-axis, draw a line
parallel to the X-axis till it touches the ogive curve and then drop a
perpendicular on the X-axis. The point where it meets the X-axis is the
median. In this case the approximate value of the median is 35.
• Interested students can calculate the median using the formula and
cross-check the graphical value of the median.
Note: The graphical value of the median is approximately the same as the
value calculated from the formula, provided a proper graph with the right
scale is drawn.
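Reading a value off the ogive amounts to linear interpolation between the plotted points. The following Python sketch mimics that reading for Example 6; the function name `read_ogive` and its `start` parameter (the lower limit of the first class) are assumptions made for illustration, and the ogive is assumed to be drawn with straight line segments.

```python
def read_ogive(upper_limits, cum_freqs, target, start=0.0):
    """Find the X value at which a straight-line less than ogive reaches `target`."""
    prev_x, prev_cf = start, 0
    for x, cf in zip(upper_limits, cum_freqs):
        if cf >= target:
            # Interpolate along the segment joining (prev_x, prev_cf) and (x, cf)
            return prev_x + (target - prev_cf) / (cf - prev_cf) * (x - prev_x)
        prev_x, prev_cf = x, cf
    raise ValueError("target exceeds the total frequency")


upper = [10, 20, 30, 40, 50, 60, 70]
cf = [52, 88, 130, 168, 222, 266, 300]
median = read_ogive(upper, cf, 300 / 2)
print(round(median, 2))  # 35.26
```

The interpolated value agrees with the graphical reading of approximately 35, and the same function can be called with N/4 or 3N/4 as the target to read other positional values.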
5.3 QUARTILES, DECILES AND PERCENTILES
The median is the value which divides the entire data in two equal parts.
50% of the observations are less than (or equal to) and 50% are greater than
(or equal to) the median value. The value which divides the data in more
than two parts is also useful for statistical calculations and analysis. A value
which divides the entire data in four equal parts is called a Quartile. As we
need three values to divide a set of values in four equal parts, there are three
quartiles, namely first quartile Q1, second quartile Q2 and third quartile Q3.
Similarly, to divide a data in 10 equal parts we need 9 values called
Deciles. The Deciles are named D1, D2, …, D9. Percentiles are those values
which divide a data in 100 equal parts. There are 99 percentiles, viz. P1, P2,
…, P99.
QUARTILES
We know that Quartiles are the values which divide a set of observations in
four equal parts. Q1 is the value which has 25% observations less than or
equal to it and 75% observations greater than or equal to it. For a continuous
frequency distribution Q1 is that value which represents 25% area under the
histogram to the left of it. Q2 is that value such that 50% observations are
less than or equal to it and 50% observations are greater than or equal to it.
In other words, Q2 is nothing but the Median. Q3 is that value such that 75%
observations are less than or equal to it and 25% observations are greater
than or equal to it. For a continuous frequency distribution Q3 is that value
which represents 75% area under the histogram to the left of it and 25% of
area under the histogram to the right of it.
The algebraic formula and steps to calculate quartiles are similar to those of
the Median. The formulae are:
$Q_1 = l_1 + \left[\frac{m - pcf}{f}\right] \times i$, here m = N/4
$Q_2 = l_1 + \left[\frac{m - pcf}{f}\right] \times i$, here m = N/2
$Q_3 = l_1 + \left[\frac{m - pcf}{f}\right] \times i$, here m = 3N/4
The steps are exactly the same, with the only difference in step 3 and step 4.
The value of m depends on which quartile we are to find. As Q1 divides the
total frequency in the ratio 1:3, m is taken as N/4. As Q3 divides the total
frequency in the ratio 3:1, m is taken as 3N/4.
Quartiles, like the median, can be located graphically by taking the
corresponding value of m on the Y-axis and proceeding in the same manner
as stated in section 5.2.4.
DECILES
Deciles are the values which divide the data into 10 equal parts. The steps
for calculation and the formula for deciles are the same as those of the
quartiles with the difference of the value of m. For D1, m = N/10, for D2,
m = 2N/10 = N/5, for D3, m = 3N/10, and so on. In general, the formula for
calculating the kth decile is:
$D_k = l_1 + \left[\frac{m - pcf}{f}\right] \times i$, where $m = \frac{kN}{10}$
and k = 1, 2, 3, …, 9
PERCENTILES
Percentiles are the values which divide the data into 100 equal parts. The
steps for calculation and the formula for percentiles are the same as those of
the quartiles and deciles with the difference of the value of m. For P1, m =
N/100, for P2, m = 2N/100 = N/50, for P3, m = 3N/100, and so on. In general,
the formula for calculating the kth percentile is:
$P_k = l_1 + \left[\frac{m - pcf}{f}\right] \times i$, where $m = \frac{kN}{100}$
and k = 1, 2, 3, …, 99.
Note:
From the above formulae it is clear that D5 = Q2, P25 = Q1, P50 = Q2 = D5,
P10 = D1, P20 = D2 etc.
Example 7: Find the median, 1st & 3rd quartiles, 4th & 8th deciles and 30th
& 60th percentiles for the following data:
Wages per day in Rs.   50-100    100-150   150-200   200-250   250-300
No. of Workers         10        24        39        65        52
Wages per day in Rs.   300-350   350-400   400-450   450-500   500-550
No. of Workers         45        34        26        15        14
Also find how many workers have wages between 175 and 375.
Solution: Introducing the less than cumulative frequency table:
Wages per day in Rs.   No. of workers (f)   Cumulative frequency
50-100                 10                   10
100-150                24                   34
150-200                39                   73
200-250                65                   138
250-300                52                   190
300-350                45                   235
350-400                34                   269
400-450                26                   295
450-500                15                   310
500-550                14                   324
From the table N = 324.
(i) Median: m = N/2 = 162. The cumulative frequency just greater than
162 is 190. Thus, the median class is 250 – 300 and hence l1 = 250, l2
= 300. ∴ i = l2 – l1 = 50, f = 52 and pcf = 138
Using the formula for median, we have
Median = $250 + \left[\frac{162 - 138}{52}\right] \times 50$ = 273.08.
(ii) Quartiles: We have already calculated Q2, which is the median.
For Q1: m = N/4 = 81. The cumulative frequency just greater than 81
is 138. Thus, the first quartile class is 200 – 250 and hence l1 = 200, l2
= 250. ∴ i = l2 – l1 = 50, f = 65 and pcf = 73.
∴ $Q_1 = l_1 + \left[\frac{m - pcf}{f}\right] \times i = 200 + \left[\frac{81 - 73}{65}\right] \times 50$ = 206.15
For Q3: m = 3N/4 = 243. The cumulative frequency just greater than
243 is 269. Thus, the third quartile class is 350 – 400 and hence l1 =
350, l2 = 400. ∴ i = l2 – l1 = 50, f = 34 and pcf = 235.
∴ $Q_3 = l_1 + \left[\frac{m - pcf}{f}\right] \times i = 350 + \left[\frac{243 - 235}{34}\right] \times 50$ = 361.76
(iii) Deciles:
For D4: m = 4N/10 = 129.6. The cumulative frequency just greater than
129.6 is 138. Hence l1 = 200, l2 = 250. ∴ i = l2 – l1 = 50,
f = 65 and pcf = 73. Using the formula for the 4th decile we get:
$D_4 = 200 + \left[\frac{129.6 - 73}{65}\right] \times 50$ = 243.54
For D8: m = 8N/10 = 259.2. Hence l1 = 350, l2 = 400. ∴ i = l2 – l1 =
50, f = 34 and pcf = 235.
∴ $D_8 = 350 + \left[\frac{259.2 - 235}{34}\right] \times 50$ = 385.59
(iv) Percentiles:
For P30: m = 30N/100 = 97.2. Hence l1 = 200, l2 = 250. ∴ i = l2 – l1 =
50, f = 65 and pcf = 73.
∴ $P_{30} = 200 + \left[\frac{97.2 - 73}{65}\right] \times 50$ = 218.62
For P60: m = 60N/100 = 194.4. Hence l1 = 300, l2 = 350. ∴ i = l2 – l1
= 50, f = 45 and pcf = 190
∴ $P_{60} = 300 + \left[\frac{194.4 - 190}{45}\right] \times 50$ = 304.89
(v) To find the number of workers whose wages are between 175 and
375.
Clearly the class intervals 200 – 250, 250 – 300 and 300 – 350 lie
entirely within these limits and contribute 65 + 52 + 45 = 162 workers.
The number of workers with wages between 175 and 200 is calculated as:
$\frac{200 - 175}{50} \times 39 = 19.5 \approx 20$
The number of workers with wages between 350 and 375 is calculated as:
$\frac{375 - 350}{50} \times 34 = 17$
Thus, the number of workers whose daily wages are between Rs. 175 and
Rs. 375
= 162 + 20 + 17 = 199.
Note: In the above problem, the number of workers with wages between
certain limits is calculated using the following formula:
$\frac{|l - x|}{i} \times f$,
where l: class limit (upper or lower) of the interval,
x: given observation, i: width of the interval and f: frequency of that class
interval.
For example: In the interval 20 – 40 with corresponding frequency 15, if we
want to know how many observations are there above 30, then using the
above formula,
no. of observations = $\frac{40 - 30}{20} \times 15 = 7.5$
In the interval 10 – 20 with corresponding frequency 12, if we want to
know how many observations are there below 15, then using the above
formula, no. of observations = $\frac{15 - 10}{10} \times 12 = 6$
Example 8: The following data gives the scores of 150 candidates who
appeared for a test. Find the 6th decile and 66th percentile for the data:
Scores       50-55  55-60  60-65  65-70  70-75  75-80  80-85  85-90  90-95  95-100
Candidates   12     15     14     21     17     18     20     18     7      8
Solution: Introducing the column of less than cumulative frequency we
have,
Scores       50-55  55-60  60-65  65-70  70-75  75-80  80-85  85-90  90-95  95-100
Candidates   12     15     14     21     17     18     20     18     7      8
cf           12     27     41     62     79     97     117    135    142    150
From the table: N = 150.
(i) The formula for D6 is: $D_6 = l_1 + \left[\frac{m - pcf}{f}\right] \times i$,
where m = 6N/10 = 3N/5 ∴ m = 90. Hence l1 = 75, l2 = 80, i = 80 – 75 = 5,
f = 18 and pcf = 79.
∴ $D_6 = 75 + \left[\frac{90 - 79}{18}\right] \times 5$ = 78.06
(ii) The formula for the 66th Percentile is:
$P_{66} = l_1 + \left[\frac{m - pcf}{f}\right] \times i$, where m =
66N/100 = 33N/50 ∴ m = 99. Hence l1 = 80, l2 = 85, i = 85 – 80 = 5,
f = 20 and pcf = 97.
∴ $P_{66} = 80 + \left[\frac{99 - 97}{20}\right] \times 5$ = 80.5
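Since quartiles, deciles and percentiles differ only in the value of m, a single function can cover all of them. The sketch below is illustrative only: `partition_value` is a hypothetical helper taking `k` and the number of equal `parts` (4, 10 or 100), with classes given as `(lower, upper)` pairs.

```python
from itertools import accumulate


def partition_value(classes, freqs, k, parts):
    """k-th partition value with m = k * N / parts (parts = 4, 10 or 100)."""
    cum = list(accumulate(freqs))              # less than cumulative frequencies
    m = k * cum[-1] / parts
    for j, cf in enumerate(cum):
        if cf > m:                             # class whose cf is just greater than m
            l1, l2 = classes[j]
            pcf = cum[j - 1] if j > 0 else 0
            return l1 + (m - pcf) / freqs[j] * (l2 - l1)
    raise ValueError("k must be smaller than parts")


# Data of Example 8: scores of 150 candidates in classes of width 5
classes = [(lo, lo + 5) for lo in range(50, 100, 5)]
freqs = [12, 15, 14, 21, 17, 18, 20, 18, 7, 8]

d6 = partition_value(classes, freqs, 6, 10)      # 6th decile, about 78.06
p66 = partition_value(classes, freqs, 66, 100)   # 66th percentile, 80.5
print(round(d6, 2), round(p66, 2))
```

Calling the function with `k=2, parts=4`, `k=5, parts=10` and `k=50, parts=100` gives the same value, which is the relation Q2 = D5 = P50 stated in the Note.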
MISSING FREQUENCY PROBLEMS
Example 9: Find the missing frequency in the following data with median
= 27.5
Profit   0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60
Firms    4        --        20        10        7         3
Solution: Let the missing frequency be k. Preparing the cumulative
frequency distribution table we have:
Profit in lakhs of Rs.   Firms   Cf
0 – 10                   4       4
10 – 20                  k       4 + k
20 – 30                  20      24 + k
30 – 40                  10      34 + k
40 – 50                  7       41 + k
50 – 60                  3       44 + k
Total                    N = 44 + k
Since the median = 27.5, it means that the median class is 20 – 30.
Hence, m = $\frac{N}{2} = \frac{44 + k}{2}$, l1 = 20, l2 = 30, i = 30 – 20 = 10,
f = 20 and pcf = 4 + k
Using the median formula: Median = $l_1 + \left[\frac{m - pcf}{f}\right] \times i$, we have
$27.5 = 20 + \left[\frac{\frac{44 + k}{2} - (4 + k)}{20}\right] \times 10 = 20 + \left[\frac{44 + k - 2(4 + k)}{2 \times 20}\right] \times 10$
∴ $27.5 = 20 + \left[\frac{44 + k - 8 - 2k}{4}\right] = 20 + \frac{36 - k}{4}$
∴ $7.5 = \frac{36 - k}{4}$, i.e. $30 = 36 - k$
∴ k = 6. The missing frequency is 6.
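The answer can be verified numerically by trying candidate frequencies and keeping those for which the median formula returns 27.5. This is a brute-force sketch rather than the algebraic method used above; `median_grouped` is the same hypothetical helper idea as before, and the search range 0 to 100 is an arbitrary assumption.

```python
from itertools import accumulate


def median_grouped(classes, freqs):
    """Median = l1 + ((m - pcf) / f) * i for a continuous class distribution."""
    cum = list(accumulate(freqs))
    m = cum[-1] / 2
    for j, cf in enumerate(cum):
        if cf > m:
            l1, l2 = classes[j]
            pcf = cum[j - 1] if j > 0 else 0
            return l1 + (m - pcf) / freqs[j] * (l2 - l1)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]


def median_with(k):
    """Median of the Example 9 data when the missing frequency is k."""
    return median_grouped(classes, [4, k, 20, 10, 7, 3])


# Try every whole-number frequency in an assumed range and keep the matches
solutions = [k for k in range(0, 101) if abs(median_with(k) - 27.5) < 1e-9]
print(solutions)  # [6]
```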
Example 10: If the total frequency for the following data is 100 and the
median is 31.5, find the missing frequencies.
C.I.        0.5–10.5  10.5–20.5  20.5–30.5  30.5–40.5  40.5–50.5  50.5–60.5  60.5–70.5
Frequency   8         -          25         20         16         -          6
Solution: Let the two missing frequencies be a and b. Preparing the
frequency distribution table we have:
C.I.        Frequency   cf
0.5–10.5    8           8
10.5–20.5   a           8 + a
20.5–30.5   25          33 + a
30.5–40.5   20          53 + a
40.5–50.5   16          69 + a
50.5–60.5   b           69 + a + b
60.5–70.5   6           75 + a + b
Now, from the table N = 75 + a + b. But it is given that N = 100
∴ 75 + a + b = 100, i.e. a + b = 25 … (*)
Since median = 31.5, the median class is 30.5 – 40.5.
Hence, m = N/2 = 100/2 = 50, l1 = 30.5, l2 = 40.5, i = 40.5 – 30.5 = 10, f =
20 and pcf = 33 + a
Using the median formula: Median = $l_1 + \left[\frac{m - pcf}{f}\right] \times i$, we have
$31.5 = 30.5 + \left[\frac{50 - (33 + a)}{20}\right] \times 10$
∴ $31.5 - 30.5 = \frac{50 - 33 - a}{2}$, i.e. $1 = \frac{17 - a}{2}$
∴ $2 = 17 - a$, i.e. a = 15 … (1)
From (*): a + b = 25 and from (1): a = 15 ∴ b = 10 … (2)
Thus, the missing frequencies are 15 and 10 respectively.
5.4 MERITS AND DEMERITS OF MEDIAN
Merits:
1. It is rigidly defined.
2. It can be easily calculated and also understood.
3. It can be calculated even if some extreme observations are
incomplete.
4. It is not affected by extreme observations in the data.
5. It can be located graphically using ogive curves.
6. It is suitable for finding the average of a qualitative attribute of the
data.
Demerits
1. It is difficult to arrange a large number of observations in ascending or
descending order.
2. It is not useful for any further algebraic treatment.
3. Since it is not affected by the extreme values, it may not be a true
representative of a data where the extreme values are important.
4. It is likely to be affected by the sampling variations.
5.5 MODE:
Many times it is important to know the most likely value in a set of data.
For example, we may need to know which is the most commonly read book
or newspaper in a library, or the most common eatable at a stall. Mode
is that measure of central tendency which gives a number representing the
most likely item. It is denoted by Z.
5.5.1 Mode for ungrouped data:
For raw data, it is defined as the observation which occurs the maximum
number of times among a set of observations.
Example 11: Find the mode for the following data: 10, 12, 11, 15, 12, 11, 16,
19, 12, 10, 19, 12, 15, 20, and 12.
Solution: By mere inspection we can see that the value 12 is repeated the
maximum number of times, i.e. 5 times, and hence the mode for the given
data is 12.
Note: 1. If all values are identical, then the mode is that same value.
2. If all values are distinct, then the mode does not exist.
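Counting by inspection becomes tedious for long series, so the same idea can be sketched with Python's `collections.Counter`. The function name `modes` is a hypothetical helper; it returns a list so that the "no mode" and "more than one mode" cases are also visible.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list if all values are distinct."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []                      # every value occurs once: the mode does not exist
    return sorted(v for v, c in counts.items() if c == top)


data = [10, 12, 11, 15, 12, 11, 16, 19, 12, 10, 19, 12, 15, 20, 12]
print(modes(data))          # [12]
print(Counter(data)[12])    # 5
```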
5.5.2 Mode for grouped data:
For a discrete data, the observation with the highest frequency is defined
as the Mode.
Example 12: Calculate the mode for the following distribution:
X   5    6    7    8    9
F   12   27   35   23   14
Solution: Here the maximum frequency is 35.
The maximum frequency corresponds to the value 7.
Therefore, Mode = 7.
For a continuous frequency distribution, the mode is calculated by the
following steps:
1. Let f1: highest frequency and l1 & l2 denote the lower and upper class
limits of the corresponding modal class.
2. Let f0: frequency of the pre-modal class and f2: frequency of the
post-modal class.
3. The Mode is now calculated by the formula:
$Z = l_1 + \left[\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right] \times i$,
where i = l2 – l1
This formula can also be stated as
$Z = l_1 + \left[\frac{\Delta_1}{\Delta_1 + \Delta_2}\right] \times i$, where
$\Delta_1 = f_1 - f_0$ and $\Delta_2 = f_1 - f_2$.
students in a village. Find the modal age.
Age group 3-5 5-7 7-9 9-11 11-13 13-15 15-17 Frequency 25
(f0) 71
(f1) 44
(f2) 33 58 30 39
Solution: This is a continuous frequency distribution. Since the maximum
frequency is 71, the modal class is 5 – 7. Hence, l1 = 5, l2 = 7, i = 7 – 5 = 2,
f0 = 25, f1 = 71 and f2 = 44. ?1'= f1 – f0 = 71 – 25 = 46 and 2'= f1 – f2 =
71 – 44 = 27.
Now, the formula to calculate mode now is: 1
1
12 x Zl iªº' «»' '¬¼ 465 x 246 27Zªº? «»¬¼ 5 1.26Z = 6.26.
Example 14: The following data gives the population of women of different
age groups in a city. Find the modal age Age group 1-10 11-20 21-30 31-40 41-50 51-60 61-70 No. of women 1022 1239 (f0) 1350 (f1) 768 (f2) 981 1074 739
Solution: This is inclusive type of data. First the age groups are converted
to exclusive type by subtracting and adding 0.5 to the lower and upper class
limits respectively.
Now the maximum frequency is 1350, the modal class is 20.5 – 30.5. Hence,
l1 = 20.5, l2 = 30.5, munotes.in

Page 73

73 Measures of Central Tendency I I i = 30.5 – 20.5 = 10, f0 = 1239, f1 = 1350 and f2 = 768. ?1'= f1 – f0 =
1350 – 1239 = 111 and 2'= f1 – f2 = 1350 – 768 =582.
Now, the formula to calculate mode now is: 1
1
12 x Zl iªº' «»' '¬¼ 11120.5 x10111 582Zªº? «»¬¼= 20.5 + 1.6 = 22.1 |22 ?the modal age is 22 years.
Missing frequency problems involving mean, median and mode
Example 15: If the median is 27.41 and the mode is 25.63 for the following
data, find the missing frequencies.
C.I.        0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60
Frequency   12       --        27        20        --        6
Solution: Let the missing frequencies be a and b.
C.I.      Frequency   Cf
0 – 10    12          12
10 – 20   a           12 + a
20 – 30   27          39 + a
30 – 40   20          59 + a
40 – 50   b           59 + a + b
50 – 60   6           65 + a + b
Given Z = 25.63, the modal class is 20 – 30. Now, l1 = 20, l2 = 30, i =
30 – 20 = 10, f0 = a, f1 = 27, and f2 = 20. ∴ $\Delta_1$ = f1 – f0 = 27 – a and
$\Delta_2$ = f1 – f2 = 27 – 20 = 7.
Now, the formula to calculate the mode is:
$Z = l_1 + \left[\frac{\Delta_1}{\Delta_1 + \Delta_2}\right] \times i$
∴ $25.63 = 20 + \left[\frac{27 - a}{27 - a + 7}\right] \times 10$
∴ $25.63 - 20 = \frac{270 - 10a}{34 - a}$
∴ $5.63(34 - a) = 270 - 10a$
∴ $191.42 - 5.63a = 270 - 10a$
∴ $4.37a = 78.58$, i.e. $a = 17.98 \approx 18$ … (1)
Now from the table and (1) we have: N = 65 + a + b = 65 + 18 + b = 83 + b
Since Median = 27.41, the median class is again 20 – 30. Hence, m = N/2 =
$\frac{83 + b}{2}$, l1 = 20, l2 = 30,
f = 27 and pcf = 12 + a = 12 + 18 = 30 … from (1)
Median = $l_1 + \left[\frac{m - pcf}{f}\right] \times i$
∴ $27.41 = 20 + \left[\frac{\frac{83 + b}{2} - 30}{27}\right] \times 10$
∴ $27.41 = 20 + \left[\frac{83 + b - 60}{27 \times 2}\right] \times 10$
∴ $7.41 = \frac{230 + 10b}{54}$
∴ $400.14 = 230 + 10b$, i.e. $170.14 = 10b$
∴ $b = 17.01 \approx 17$ … (2)
Thus the missing frequencies are 18 and 17.
5.5.3 GRAPHICAL LOCATION OF MODE
Mode for a grouped frequency distribution can be computed graphically by
drawing the Histogram representing the data. The steps involved are as
follows:
1. The Histogram representing the data is drawn.
2. Two diagonals are drawn connecting the upper corners of the bar
representing the modal class to the upper corners of the adjacent bars.
3. A perpendicular is drawn from the point of intersection of these two
diagonals to meet the X-axis at a point, say Z. This value is the Mode
of the distribution.
Example 16: Locate the mode graphically for the following data:
Class Interval   0 – 3   3 – 6   6 – 9   9 – 12   12 – 15
Frequency        3       6       8       5        2
Solution: The Histogram representing the above frequency distribution is
drawn as follows:
[Figure: histogram of the above frequency distribution, with the mode
located on the X-axis]
5.6 MERITS AND DEMERITS OF MODE:
Merits
1. It is simply defined and hence is easy to understand and calculate.
2. It can be easily located from the graph.
3. It is not affected by the extreme values of the data.
4. It may be used for describing quantitative and qualitative data.
5. Because it picks out the most likely value, the mode is a popular
average both among common people and in areas like Biostatistics.
Demerits
1. In case of bimodal or multimodal frequency distributions it is not
possible to interpret or compare.
2. It is not rigidly defined. Different methods may give different answers.
3. It is not based on all observations and hence does not represent the
entire set of data.
4. It cannot be used for further algebraic treatment.
5.7 COMPARATIVE ANALYSIS OF ALL MEASURES OF
CENTRAL TENDENCY:
For a given data, a choice of an average is very important. Any choice
should be made by taking into consideration the following:
(i) Purpose of computing an average.
(ii) Need for further treatment of the average.
(iii) Type of data in hand.
(iv) Merits and Demerits of an average over the other.
Arithmetic Mean can be generally used for all purposes.
Geometric Mean can be used where the small observations require more
weight and vice versa.
Median being a positional average can be used for qualitative interpretation
or when the data is incomplete.
Mode can be used when the purpose is to find the most likely type of value
from the data.
Finally, we conclude with a funny yet important note. Drawing
conclusions from any statistical measure is no doubt an important job, but
also a difficult one. If the averages are not interpreted properly, it may
lead to absurd conclusions. Let us see some examples:
(i) The average depth of a swimming pool is 5 ft.
Wrong Conclusion: Any person of height more than 5 ft, e.g. 6 ft,
can cross the pool easily!
(ii) The average marks of students of Division A of FYBMS class of a
Wrong Conclusion: All students of Division A are not academically
strong as compared to those of Division B!
(iii) On an average, only 10% casualties related to local trains are due to
travelling on the roof, while the other reasons constitute 90%
casualties.
Wrong Conclusion: It is safer to travel on the roof of a local train!
(iv) The average daily income of a resident of Mumbai is Rs. 5,000.
Wrong Conclusion: All Mumbaikars, from beggars (well, that may be
true!!) to entrepreneurs, earn at least Rs. 5,000 per day.
(v) In a society with 10 flats, the number of children in each family is 2,
1, 3, 2, 2, 1, 0, 2, 3, 2.
Wrong Conclusion: (a) Every family has at least 2 children! This is
based on Mode.
(b) Every family has 1.8 children! This is based on arithmetic mean.
(Don’t ask where to bring that 0.8th child from!!)
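The figures in example (v) can be checked with Python's standard `statistics` module. The short sketch below also counts the families that contradict the "at least 2 children" conclusion; the variable names are only illustrative.

```python
import statistics

children = [2, 1, 3, 2, 2, 1, 0, 2, 3, 2]   # children per family in the 10 flats

mean = statistics.mean(children)        # arithmetic mean
median = statistics.median(children)    # positional average
mode = statistics.mode(children)        # most frequent value

# Families with fewer than 2 children, which the mode alone cannot reveal
below_mode = sum(1 for c in children if c < mode)

print(mean, median, mode, below_mode)
```

Here the mean is 1.8 and the mode is 2, yet three of the ten families have fewer than 2 children, which is exactly why an average should not be read as a statement about every observation.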
RELATION BETWEEN MODE, MEAN AND MEDIAN
Mode can also be calculated using the mean and median by the empirical
approximate formula given by Karl Pearson:
Mode = 3Median – 2Mean
If the values of the mean and median for a certain data set are known, then
the mode can be calculated by the above formula.
Example 17: If the mean and median of a certain data are 34.6 and 38
respectively, find the mode.
Solution: Using Karl Pearson’s formula, we have
Mode = 3Median – 2Mean ∴ Z = 3(38) – 2(34.6) = 44.8
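The empirical relation is a one-line calculation, and rearranging it gives the median when the mean and mode are known. The function names below are hypothetical, and the results are approximations in the same sense as the formula itself.

```python
def empirical_mode(mean, median):
    """Karl Pearson's empirical relation: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


def empirical_median(mean, mode):
    """The same relation rearranged: Median = (Mode + 2 Mean) / 3."""
    return (mode + 2 * mean) / 3


z = empirical_mode(mean=34.6, median=38)
print(round(z, 1))                                 # 44.8
print(round(empirical_median(34.6, z), 1))         # 38.0
```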
5.8 LET US SUM UP:
In this chapter we have learnt:
• Types of positional averages.
• To calculate the median for ungrouped and grouped data.
• To calculate the median graphically.
• To calculate quartiles, deciles and percentiles.
• To calculate the mode for ungrouped and grouped data.
• To calculate the mode graphically.
• To find the relation between mean, median and mode.
5.9 UNIT END EXERCISES:
1. Define Mean, Median and Mode. What are the advantages of median
over m ean and mode over mean?
2. State giving examples, the factors based on which a measure of central
tendency is selected.
3. What are important features of a good average?
4. Discuss with examples the difference between a simple arithmetic
mean and weighted mean.
5. Write a short note on graphical representations of averages.
6. The heights in cm of 12 boys in a class are given as follows: 154,
157, 162, 158, 171, 169, 153, 156, 157, 164, 166, 170. Find the
median height.
7. The speed of a bus at 10 check points was observed as follows: 40,
35, 42, 50, 40, 60, 45, 50, 44, 52. Find the median speed of the bus.
8. The weights in kg of 13 children are as follows: 36, 32, 40, 39,35,
43, 41, 38, 30, 29, 37, 33, 40. Find the median weight.
9. The number of telephone calls made every successive hour from a
PCO is given below. Find the median.
No. of calls   10   15   20   25   30   35
Frequency      26   30   16   22   10   6
10. The IQ test of 145 students is as follows: Find the median, 1st quartile,
8th decile and 24th percentile.
C.I.              50-70   70-90   90-110   110-130   130-150   150-170   170-190
No. of students   22      38      40       10        15        5         20
11. Locate the quartiles graphically for the following data:
Marks             0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80   80-90   90-100
No. of students   37     55      62      30      28      39      40      100     67      42
12. If the median for the following distribution is 17.5, find the missing
frequency.
C.I.        0-10   10-20   20-30   30-40   40-50
Frequency   10     ?       20      00      10
13. If the median and mode for the following data are 27.5 and 25.83
(approx), find the missing frequencies.
Marks (less than)   10   20   30   40   50   60
No. of students     4    10   30   40   47   50
14. The weights of Alphanso mangoes which are to be exported are
given below. The standard weight for an export quality mango is 800
gm. What percentage of the total mangoes will qualify to get
exported?
Weights          200-400   400-600   600-700   700-800   800-900   900-1000   1000-1100   1100-1200
No. of mangoes   37        55        62        30        28        39         40          100
15. Find the mode for the following data: 11, 23, 12, 11, 32, 23, 18, 11,
22, 24, 31, 11, 15, 18, 12, 11.
16. In an examination, the questions attempted by 100 students are given
below. Find the mode.
Q. No.          : 1   2   3   4   5   6   7   8
No. of attempts : 45  58  77  35  49  70  22  48
17. If the mode for the following data is 55, find the missing frequency:
C.I.      : 0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90  90-100
Frequency : 3     5      7      10     ?      15     12     6      2      8
18. If the mean and median values are 67.56 and 61.23, find the modal
value.
19. If the average marks of 120 students are 75 and their modal marks
are 70, find the median marks.
20. Locate the mode for the following data graphically.
C.I.      : 0-10  0-20  0-30  0-40  0-50  0-60  0-70  0-80  0-90  0-100
Frequency : 6     18    33    47    60    80    92    100   115   120
21. Locate the mode graphically for the following data:
Age            : 10 – 15  15 – 20  20 – 25  25 – 30  30 – 35  35 – 40
No. of persons : 14       28       40       30       22       16

22. Calculate the mean, median and mode for the following data:
Income in ’000 Rs. : 1-5  6-10  11-15  16-20  21-25  26-30  31-35  36-40  41-45  46-50
No. of families    : 13   15    20     22     16     10     08     05     09     02
23. Calculate the mean, median and mode for the following data:
C.I.      : 5-15  15-25  25-35  35-45  45-55  55-65  65-75  75-85  85-95
Frequency : 4     12     16     20     14     18     03     12     01
24. If the mean and median of the following data are 23 and 22.77, find
the missing frequencies.
C.I.      : 0-10  10-20  20-30  30-40  40-50
Frequency : 16    ?      18     ?      10
Multiple Choice Questions:
1. Which one divides the data into four equal parts?
a) Mode b) Median c) Quartile d) Decile
2. In case of extreme values, the best measure of central tendency is:
a) A.M. b) Median c) Mode d) None of the above
3. ________ is the value of the middle observation when the observations
are arranged in the order of their magnitude.
a) Mean b) Median c) Standard deviation d) Variance
4. The mode of the data 1, 2, 2, 2, 3, 3, 1, 5 is __________
a) 1 b) 2 c) 3 d) 5
5. Percentiles divide the data into __________ equal parts.
a) 100 b) 200 c) 300 d) 110
6. The mode of the following data is __________
X : 12  14  18  20  22
F : 2   18  24  12  8
a) 18 b) 22 c) 24 d) 8
7. Percentiles divide the data into 100 parts using __________ number of
values.
a) 100 b) 99 c) 90 d) 101
8. In terms of deciles, the median must coincide with the
a) fourth decile. b) seventh decile. c) eighth decile. d) fifth decile.
9. When data is arranged in order, the middle value in the set of observations is
classified as
a) median. b) mean. c) variance. d) standard deviation.
10. In a negatively skewed distribution, the order of mean, median and mode
is
a) mean < median > mode. b) mean > median > mode.
c) mean < median < mode. d) mean > median < mode.
5.10 LIST OF REFERENCES:
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K. Kapoor.
• Basic Statistics by B. L. Agrawal.

6
MEASURE OF DISPERSION
Unit Structure
6.0 Objectives:
6.1 Introduction:
6.2 Function of measures of dispersion
6.3 Fundamentals of dispersion
6.4 Types of dispersion
6.5 Range
6.5.1 Coefficient of range
6.5.2 Merits and demerits of range
6.6 Semi – inter quartile range or quartile deviation (Q.D.)
6.6.1 Coefficient of quartile deviation
6.6.2 Merits and demerits of quartile deviation
6.7 Mean deviation and coefficient of Mean deviation.
6.7.1 Merits and demerits of Mean deviation
6.8 Variance and Standard Deviation
6.8.1 Coefficient of variation
6.8.2 Combined standard deviation
6.8.3 Merits and demerits of standard deviation
6.9 Skewness & kurtosis
6.10 Let us sum up
6.11 Unit end Exercises
6.12 List of References
6.0 OBJECTIVES:
After going through this chapter you will be able to know:
• Meaning and function of measures of dispersion.
• Types of measures of dispersion.
• Different methods of calculating measures of dispersion.
• Concept of Skewness and Kurtosis.
6.1 INTRODUCTION:
In the previous chapter we have seen how a measure of central tendency
represents the entire data. Though an average is of great importance for
statistical analysis, it has its limitations, as seen in the last section of chapter
two. There can be cases wherein the averages come out to be the same but
the individual observations and their trends are completely different.
For example, let us consider the following three sets of data:
Data I : 0, 10, 30, 50, 80, 130
Data II : 45, 57, 48, 59, 60, 31
Data III : 50.2, 49.8, 48.7, 50.5, 50.7, 50.1
In all the three cases, the average is 50. But is it correct, hence, to conclude
that all the three sets of data show a similar trend of observations? The answer is no.
In the first set, most observations lie far away from the central value 50. In the
second set, the data is scattered moderately around the central value, while in the third
set, the data is densely clustered around the central value. So, if we analyse
the given data only on the basis of its average, we may not reach the right
conclusion. There is something more which needs to be known. There are
cases when we may be interested more in the trend of the
observations than in their average. In view of the above example, we may be
interested in knowing how far the data is scattered from the central value.
This extent of scatter is called dispersion. The smaller the dispersion,
the better the average represents the entire data. A larger
dispersion indicates that the average is not a true representative of the
data. This information is important for testing of hypothesis, testing
consistency and forecasting in statistical analysis.
Dispersion is measured as an average of the deviations of all the
observations from the central value. We know that the measures of central
tendency are referred to as averages of the first order. Since dispersion is
calculated from the central value or average, it is often called an average of the
second order.
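The point made by the three data sets can be checked numerically. The following is a minimal sketch in Python (the language and the helper name `summarize` are assumptions made for illustration, not part of the course text). It uses two measures defined later in this chapter, the range and the standard deviation, to show that equal averages can hide very different scatter.

```python
from statistics import mean, pstdev

# The three data sets from the introduction; each has an average of 50.
data_sets = {
    "Data I": [0, 10, 30, 50, 80, 130],
    "Data II": [45, 57, 48, 59, 60, 31],
    "Data III": [50.2, 49.8, 48.7, 50.5, 50.7, 50.1],
}


def summarize(values):
    """Return (mean, range, population standard deviation) of a list."""
    return mean(values), max(values) - min(values), pstdev(values)


summaries = {name: summarize(values) for name, values in data_sets.items()}

for name, (avg, rng, sd) in summaries.items():
    print(f"{name}: mean = {avg:.1f}, range = {rng:.1f}, S.D. = {sd:.2f}")
```

All three means agree, while both the range and the standard deviation shrink from Data I to Data III, which is exactly the difference the average alone cannot show.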
6.2 FUNCTION OF MEASURES OF DISPERSION:
The following are the functions of a measure of dispersion:
1. To test the reliability of the measures of central tendency
The measures of dispersion help in testing whether an average is a
true representation of the data. If the dispersion is small, then
the observations are scattered closely around the average value. In other
words, the average taken is reliable. If the dispersion is high, then
the observations are scattered away from the central value
and hence the average taken is not reliable. Zero dispersion indicates
that all the observations are identical.
2. To facilitate identifying and rectifying the causes of variation
Measures of dispersion help in identifying the extent of
variation of the observations from the average value. This
information may be used to take proper measures to control
the variation. In areas such as Biostatistics and Sociology it is used to
understand the nature and extent of the causes of deviations from the
average values.
3. To compare the variability of two or more series
The most important function of a measure of dispersion is its utility
in comparing the variation in different sets of data. The consistency of
players, performance of students, insurance policies or their agents,
share prices of different firms, etc. can be compared using measures of
dispersion.
4. To help in computations for further statistical measures
The measures of dispersion are used in further statistical measures
like correlation analysis, regression analysis, forecasting, testing of
hypothesis, analysis of variance, etc.
6.3 FUNDAMENTALS OF DISPERSION:
The fundamentals of a measure of dispersion are the same as those of the
measures of central tendency. The properties are as follows:
• It should be easy to calculate.
• It should be simple to understand.
• It should be rigidly defined.
• It should not be affected by the extreme observations.
• It should be based on all observations.
• It should not be affected by sampling fluctuations.
• It should be available for further algebraic treatment.
6.4 TYPES OF DISPERSION:
The measures of dispersion are classified into two types: (1) Absolute
measure and (2) Relative measure.
(1) Absolute measure: A measure of dispersion which expresses
the variation of the observations from the central
value in the same units as the observations is called an
absolute measure. It is useful in comparing the variation in two or
more sets of data measured in the same units.
(2) Relative measure: A measure of dispersion which is expressed as
the ratio of the absolute measure to the central value is called a
relative measure.
The different absolute and relative measures of dispersion are:
Absolute measure                             | Relative measure
Range                                        | Coefficient of Range
Quartile Deviation                           | Coefficient of Quartile Deviation
Mean Deviation (from mean, median or mode)   | Coefficient of Mean Deviation
Standard Deviation                           | Coefficient of Variation
6.5 RANGE:
It is one of the simplest and most elementary measures of dispersion. Range is
the difference between the largest and the smallest observation of the data.
Range of ungrouped data:
Symbolically, if L denotes the largest observation and S denotes the smallest
observation, then the range of ungrouped data,
denoted by R, is given by: R = L – S.
For example, if the heights in cm of 6 students in a class are 156, 160, 145,
161, 167, 164, then R = 167 – 145 = 22 cm. This gives the variability in
the heights of the students.
Range of grouped data:
If the data is grouped into class intervals, then the range can be computed by
either of the following methods:
(1) R = U – L, where U: upper limit of the highest class interval and L: lower limit of the lowest class interval.
OR
(2) R = xU – xL, where xU: mid point of the highest class interval and xL: mid point of the lowest class interval.
6.5.1 Coefficient of range:
Continuing the above example, if the weights in kg of the same 6 students are 42,
39, 51, 47, 55, 40, then the range is R = 55 – 39 = 16 kg. But the two range
values cannot be compared, as one of them is in cm and the other is in kg.
Thus, a relative measure which is independent of the units of the observations is
used, called the coefficient of range.
The relative measure of range is:
$$\text{Coefficient of Range} = \frac{L - S}{L + S} \quad \text{or} \quad \frac{U - L}{U + L}$$

Example 1: Find the range for the following data giving the distances
covered in km by different missiles: 1000, 600, 3000, 3500, 2000, 1500,
1200 and 250. Also compute the coefficient of range.
Solution: Here L = 3500 and S = 250. ∴ Range R = L – S = 3500 – 250 = 3250 km.
$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{3500 - 250}{3500 + 250} = \frac{3250}{3750} = 0.87$$
students in a class test of 50 marks. Also find the coefficient of range. munotes.in

Page 85

85 Measure of Dispersion Marks 0 – 10 10 – 20 20 – 30 30 – 40 No. of students 08 15 17 10
Solution: Here the upper limit of the highest class interval 30 – 40 is U =
40 and the lower limit of the lowest class int erval 0 – 10 is L = 0. ?R = U – L = 40 – 0 = 40.
Coefficient of Range =40 0140 0UL
UL .
Example 3: The coefficient of range of a certain set of observations is 0.6.
If the smallest observation is 100, find the largest obs ervation.
Solution: Given S = 100 and coefficient of range i.e LS
LS
= 0.6 ?1000.6100L
L 100 0.6( 100)LL 100 0.6 600LL? 0.6 100 600LL 0.4 700 1750LL ?
Thus, the largest observation is 1750.
6.5.2 Merits and demerits of range:
Merits:
(1) It is easy to compute.
(2) It is easy to understand.
Demerits:
1. It is not based on all observations: Range is computed only on the
basis of the extreme values. So, two different sets of data with the same
extreme values will have the same range, even though the individual
sets of data may have completely different observations. Thus, it
does not measure the dispersion of all observations.
2. It is easily affected by the extreme observations: It is
obviously affected by any change in the extreme values.
3. It is readily affected by sampling fluctuations: Depending on whether the samples taken
include or exclude the extreme values of the entire population, the
value of the range will vary widely.
4. It cannot be used for open-end frequency distributions: If the given
data has an unknown lower class limit of the lowest class or an unknown
upper class limit of the highest class, it is not possible to compute the
range.
As can be seen clearly, range has more demerits than
merits; still it is a popular measure and is used widely in certain fields.
In data related to small fluctuations, range is commonly used. For
example, it is used to study the temperature fluctuations of a city in day time,
share prices of a certain firm, rainfall, prices of commodities and currency
rates, to prepare control charts and also as a quality control measure.
6.6 SEMI – INTER QUARTILE RANGE OR QUARTILE
DEVIATION (Q.D.):
Semi – inter quartile range, also called quartile deviation, is half
of the inter quartile range Q3 – Q1. Symbolically,
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$
It is an absolute measure of dispersion.
6.6.1 Coefficient of quartile deviation:
The corresponding relative measure of Q.D. is defined as follows:
$$\text{Coefficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Example 4: Find the Quartile Deviation of the daily wages (in Rs.) of 11
workers given as follows: 125, 75, 80, 50, 60, 40, 50, 100, 85, 90, 45.
Solution: Arranging the data in ascending order, we have the wages of the
11 workers as follows:
40, 45, 50, 50, 60, 75, 80, 85, 90, 100, 125
Since the number of observations is n = 11, the quartiles are located as follows:
Q1 = value of the (11 + 1)/4 = 3rd observation = 50. … (1)
Q3 = value of the 3(11 + 1)/4 = 9th observation = 90. … (2)
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{90 - 50}{2} = 20 \quad \text{… from (1) and (2)}$$
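The position rule used in Example 4 can be sketched as below. This is an illustrative Python helper, not a library routine: the name `quartile` is hypothetical, and the linear interpolation applied when k(n + 1)/4 is not a whole number is an assumption of this sketch (the example above only needs whole-number positions).

```python
def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) located at the k(n + 1)/4-th observation.

    Positions are 1-based, as in the text. A fractional position is
    resolved by linear interpolation between neighbours (an assumption).
    """
    n = len(sorted_values)
    position = k * (n + 1) / 4
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part, 0 for whole positions
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


# Daily wages (in Rs.) from Example 4, arranged in ascending order.
wages = sorted([125, 75, 80, 50, 60, 40, 50, 100, 85, 90, 45])

q1 = quartile(wages, 1)
q3 = quartile(wages, 3)
qd = (q3 - q1) / 2                 # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation
print(q1, q3, qd, round(coeff_qd, 2))
```

Other software may use a different quartile convention and return slightly different values for the same data, so results should be compared only under the same rule.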
Example 5: The following data gives the weights of 56 students in a class.
Find the range of the weights of the central 50% of students.
Weight in kg    : 30 – 35  35 – 40  40 – 45  45 – 50  50 – 55  55 – 60
No. of students : 4        16       12       8        10       6
Solution: To find the range of the weights of the central 50% of students means to
find the inter quartile range. For that we require Q1 and Q3. The column of
less than cf is introduced as follows:
Weight in kg    No. of students    cf
30 – 35         4                  4
35 – 40         16                 20    (Q1 class)
40 – 45         12                 32
45 – 50         8                  40
50 – 55         10                 50    (Q3 class)
55 – 60         6                  56
To find Q1: N = 56. Thus m = N/4 = 14.
The cf just greater than 14 is 20, so 35 – 40 is the 1st quartile class and
l1 = 35, l2 = 40, i = 40 – 35 = 5, f = 16 and pcf = 4.
$$Q_1 = l_1 + \left[\frac{m - pcf}{f}\right] \times i = 35 + \left[\frac{14 - 4}{16}\right] \times 5 = 35 + 3.125 = 38.125$$
To find Q3: N = 56. Thus m = 3N/4 = 42.
The cf just greater than 42 is 50, so 50 – 55 is the 3rd quartile class and
l1 = 50, l2 = 55, i = 55 – 50 = 5, f = 10 and pcf = 40.
$$Q_3 = l_1 + \left[\frac{m - pcf}{f}\right] \times i = 50 + \left[\frac{42 - 40}{10}\right] \times 5 = 50 + 1 = 51$$
∴ inter quartile range = Q3 – Q1 = 51 – 38.125 = 12.875 kg
Thus, the range of weight for the central 50% of students = 12.875 kg
Example 6: Find the semi – inter quartile range and its coefficient for the
following data:
Size of shoe : 0  1   2   3   4   5   6   7  8   9  10
No. of boys  : 7  10  15  11  18  10  16  5  12  6  2
Solution: The less than cf are computed and the table is completed as
follows:
Size of shoe : 0  1   2   3   4   5   6   7   8    9    10
No. of boys  : 7  10  15  11  18  10  16  5   12   6    2
cf           : 7  17  32  43  61  71  87  92  104  110  112
Here N = 112.
To find Q1: m = N/4 = 28.
The first cf just greater than 28 is 32, so the 1st quartile is Q1 = 2.
To find Q3: m = 3N/4 = 84.
The first cf just greater than 84 is 87, so the 3rd quartile is Q3 = 6.
∴ the semi inter quartile range, i.e.
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{6 - 2}{2} = 2$$
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{6 - 2}{6 + 2} = 0.5$$
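The interpolation used for the continuous distribution of Example 5 can be written as one small routine. The sketch below is illustrative only: `grouped_quantile` is a hypothetical name, and it implements the rule stated in the text (find the class whose cf is just greater than m, then apply l1 + [(m – pcf)/f] × i).

```python
def grouped_quantile(classes, frequencies, fraction):
    """Quantile of a continuous frequency distribution by interpolation.

    classes     : list of (lower, upper) class limits in ascending order
    frequencies : class frequencies in the same order
    fraction    : 0.25 for Q1, 0.5 for the median, 0.75 for Q3
    """
    total = sum(frequencies)
    m = fraction * total
    pcf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, frequencies):
        if pcf + f > m:  # first cf just greater than m
            return lower + (m - pcf) / f * (upper - lower)
        pcf += f
    return classes[-1][1]  # only reached when fraction is 1


# Weight classes (kg) and frequencies from Example 5, N = 56.
weight_classes = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
students = [4, 16, 12, 8, 10, 6]

q1 = grouped_quantile(weight_classes, students, 0.25)
q3 = grouped_quantile(weight_classes, students, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)
```

The same routine with `fraction=0.5` gives the median of a grouped distribution, since the median is located by the same cumulative-frequency rule.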
Example 7: The following data gives the sales (in ’00 Rs.) of two salesmen,
Hari and Ganesh, in a week. Compare their coefficients of Q.D. of the sales
and comment.
Sales of Hari   : 70  55  80  60  55  88  75
Sales of Ganesh : 60  75  85  65  60  90  110
Solution: We first arrange the given data in ascending order.
Observation no. : 1   2   3   4   5   6   7
Sales of Hari   : 55  55  60  70  75  80  88
Sales of Ganesh : 60  60  65  75  85  90  110
Here N = 7.
For Hari: (N+1)/4 = 2, ∴ Q1 = 55; 3(N+1)/4 = 6, ∴ Q3 = 80.
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{80 - 55}{80 + 55} = \frac{25}{135} \approx 0.185$$
For Ganesh: (N+1)/4 = 2, ∴ Q1 = 60; 3(N+1)/4 = 6, ∴ Q3 = 90.
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{90 - 60}{90 + 60} = \frac{30}{150} = 0.2$$
Comparing both the coefficients, we can say that the deviations in the sales
of Hari from his median sales are less than the deviations in the sales
of Ganesh from his median sales.
6.6.2 Merits and demerits of quartile deviation:
Merits
• It is easy to understand and compute.
• It is rigidly defined.
• It is not affected by extreme observations.
• It can be used for open end class interval distribution.
Demerits
• It is not based on all observations.
• It is affected by sampling fluctuations.
• It cannot be used for further algebraic treatment.
6.7 MEAN DEVIATION AND COEFFICIENT OF MEAN
DEVIATION:
The previous two measures, range and Q.D., do not consider all the
observations and their deviations from the central value. Mean deviation, also
called mean absolute deviation, overcomes this drawback. Mean
Deviation (M.D.) is an absolute measure and is defined as the average of all
the absolute differences of the observations from the central value. Any of
the three averages, mean, median or mode, can be taken as the central value. The mode
is generally not used for mean deviation, as its value is
sometimes indeterminate. The value of M.D. from median is never greater
than the value of M.D. from mean.
Symbolically, the formulae for ungrouped and grouped data are as tabled
below:
Table 6.1
For ungrouped data: $\text{M.D.} = \frac{\sum d}{n}$, where $d = |x - \text{avg}|$, n: no. of observations and avg: mean, median or mode.
For grouped data: $\text{M.D.} = \frac{\sum f d}{N}$, where $d = |x - \text{avg}|$, $N = \sum f$ and avg: mean, median or mode.
The relative measure of M.D., the coefficient of M.D., is given by the
formula:
$$\text{Coefficient of M.D.} = \frac{\text{mean deviation}}{\text{average}}$$
The formulae to find the mean deviations are summarized in the table
below:
Table 6.2
Measure                                   | Ungrouped Data                   | Grouped Data                      | Coefficient of M.D.
M.D. from mean ($\delta_{\bar{x}}$)       | $\frac{\sum |x - \bar{x}|}{n}$   | $\frac{\sum f|x - \bar{x}|}{N}$   | $\frac{\delta_{\bar{x}}}{\bar{x}}$
M.D. from median ($\delta_M$)             | $\frac{\sum |x - M|}{n}$         | $\frac{\sum f|x - M|}{N}$         | $\frac{\delta_M}{M}$
M.D. from mode ($\delta_Z$)               | $\frac{\sum |x - Z|}{n}$         | $\frac{\sum f|x - Z|}{N}$         | $\frac{\delta_Z}{Z}$
Here $\bar{x}$: mean, M: median and Z: mode
Steps to find M.D.
(1) The average from which the deviations are to be found is computed first.
(2) The absolute differences from the average are calculated and their
sum is computed.
(3) In case of grouped data, the product of the absolute differences with the
corresponding frequencies is calculated and their sum is computed.
(4) Using the appropriate formula, the mean deviation is computed.
Note: If nothing is mentioned about the average, then the median is to be taken.
Example 8: Find the mean deviation from mean and its coefficient for the
following data giving the rainfall in cm in different areas in Maharashtra:
105, 90, 102, 67, 71, 52, 80, 30, 70 and 48.
Solution: Since we have to compute M.D. from mean, we first find the
mean and then introduce the column of absolute deviations
from the mean.
$$\bar{x} = \frac{105 + 90 + 102 + 67 + 71 + 52 + 80 + 30 + 70 + 48}{10} = \frac{715}{10} = 71.5$$
x            d = |x – x̄|
105          33.5
90           18.5
102          30.5
67           4.5
71           0.5
52           19.5
80           8.5
30           41.5
70           1.5
48           23.5
Σx = 715     Σd = 182
Now, M.D. from mean:
$$\delta_{\bar{x}} = \frac{\sum d}{n} = \frac{182}{10} = 18.2$$
Coefficient of M.D. from mean:
$$\frac{\delta_{\bar{x}}}{\bar{x}} = \frac{18.2}{71.5} \approx 0.25$$
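The steps for ungrouped data can be sketched in Python as follows. The helper name `mean_deviation` is hypothetical; `mean` and `median` come from Python's standard `statistics` module. The sketch reproduces the rainfall calculation of Example 8 and also evaluates the deviation about the median for comparison.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Average of the absolute deviations of the values from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


# Rainfall in cm from Example 8.
rainfall_cm = [105, 90, 102, 67, 71, 52, 80, 30, 70, 48]

avg = mean(rainfall_cm)
md_mean = mean_deviation(rainfall_cm, avg)          # M.D. from mean
coeff_md = md_mean / avg                            # coefficient of M.D.
md_median = mean_deviation(rainfall_cm, median(rainfall_cm))

print(f"mean = {avg}, M.D. from mean = {md_mean:.1f}, coefficient = {coeff_md:.2f}")
print(f"M.D. from median = {md_median:.1f}")
```

For this data the deviation about the median does not exceed the deviation about the mean, in line with the property stated at the start of this section.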

Example 9: The marks obtained by 10 students in a test are given below.
Find the M.D. from median and its relative measure.
Marks: 15 10 10 03 06 04 11 17 13 05
Solution: The marks of the 10 students are arranged in ascending order and the
median is found. The column of absolute deviations from the median is
introduced and its sum is computed. Using the formula mentioned above,
the M.D. from median and its coefficient are calculated.
Since N = 10, Median = A.M. of the 5th and 6th observations.
∴ M = (10 + 10)/2 = 10
x        d = |x – M|
03       7
04       6
05       5
06       4
10       0
10       0
11       1
13       3
15       5
17       7
Total    Σd = 38
$$\text{M.D. from median} = \delta_M = \frac{\sum d}{n} = \frac{38}{10} = 3.8$$
$$\text{Coefficient of M.D.} = \frac{\delta_M}{M} = \frac{3.8}{10} = 0.38$$
Example 10: On the Mumbai – Nashik highway the number of accidents
per day in 6 months are given below. Find the mean deviation and
coefficient of M.D.
No. of accidents : 0   1   2   3   4   5   6   7   8   9   10
No. of days      : 26  32  41  12  22  10  05  01  06  15  10
Solution:
No. of accidents (x)   No. of days (f)   cf     d = |x – M|   f.d
0                      26                26     2             52
1                      32                58     1             32
2                      41                99     0             0
3                      12                111    1             12
4                      22                133    2             44
5                      10                143    3             30
6                      05                148    4             20
7                      01                149    5             05
8                      06                155    6             36
9                      15                170    7             105
10                     10                180    8             80
Total                  N = 180           -      -             Σfd = 416
Now, N = 180 ∴ m = N/2 = 180/2 = 90.
The cf just greater than 90 is 99. The corresponding
observation is 2. ∴ M = 2
$$\text{M.D. from median} = \delta_M = \frac{\sum fd}{N} = \frac{416}{180} = 2.31$$
$$\text{Coefficient of M.D.} = \frac{\delta_M}{M} = \frac{2.31}{2} \approx 1.16$$
Example 11: The following data gives the wages of 200 workers in a
factory with minimum wages Rs. 60 and maximum wages Rs. 200. Find
the mean deviation and compute its relative measure.
Wages less than : 80  100  120  140  160  180  200
No. of workers  : 30  45   77   98   128  172  200
Solution: The data is given as less than cf. We first convert them to
frequencies, then find the median and follow the steps to compute M.D. as
mentioned above.
Wages in Rs   No. of workers (f)   cf     x     d = |x – 141.33|   fd
60 – 80       30                   30     70    71.33              2139.9
80 – 100      15                   45     90    51.33              769.95
100 – 120     32                   77     110   31.33              1002.56
120 – 140     21                   98     130   11.33              237.93
140 – 160     30                   128    150   8.67               260.1
160 – 180     44                   172    170   28.67              1261.48
180 – 200     28                   200    190   48.67              1362.76
Total         N = 200              -      -     -                  Σfd = 7034.68
Now, N = 200 ∴ m = N/2 = 200/2 = 100.
The cf just greater than 100 is 128, so the median class is 140 – 160. ∴ l1 = 140, l2 = 160, i = 160 – 140 = 20, f = 30
and pcf = 98.
$$M = l_1 + \left[\frac{m - pcf}{f}\right] \times i = 140 + \left[\frac{100 - 98}{30}\right] \times 20 = 140 + 1.33 = 141.33$$
$$\text{M.D. from median} = \delta_M = \frac{\sum fd}{N} = \frac{7034.68}{200} = 35.1734$$
$$\text{Coefficient of M.D.} = \frac{\delta_M}{M} = \frac{35.1734}{141.33} = 0.25$$
Example 12: The following data gives the ages of people residing in a
society. Find the mean, M.D. from mean and coefficient of M.D.
Ages in yrs.  : 0 – 10  10 – 20  20 – 30  30 – 40  40 – 50  50 – 60  60 – 70
No. of people : 21      16       43       54       32       12       22
Solution: The table for computing the mean and the M.D. from mean is as follows:
Ages in yrs.   Mid point (x)   No. of people (f)   fx           d = |x – x̄|   fd
0 – 10         5               21                  105          29.2           613.2
10 – 20        15              16                  240          19.2           307.2
20 – 30        25              43                  1075         9.2            395.6
30 – 40        35              54                  1890         0.8            43.2
40 – 50        45              32                  1440         10.8           345.6
50 – 60        55              12                  660          20.8           249.6
60 – 70        65              22                  1430         30.8           677.6
Total          -               200                 Σfx = 6840   -              Σfd = 2632
From the table we have N = 200 and Σfx = 6840.
$$\therefore \bar{x} = \frac{\sum fx}{N} = \frac{6840}{200} = 34.2$$
We compute the absolute differences of the mid points from $\bar{x}$ in the 5th
column (d) and multiply them by the corresponding frequencies in the 6th
column (fd).
Now, we have Σfd = 2632 and N = 200.
$$\therefore \text{M.D. from mean} = \delta_{\bar{x}} = \frac{\sum fd}{N} = \frac{2632}{200} = 13.16$$
$$\text{Coefficient of M.D. from mean} = \frac{\delta_{\bar{x}}}{\bar{x}} = \frac{13.16}{34.2} = 0.38$$
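For grouped data the only change is that each absolute deviation is weighted by its class frequency. A minimal Python sketch for the data of Example 12 follows; the variable names are chosen for this illustration, and the class mid points stand in for the individual ages, as in the worked solution.

```python
# Class mid points and frequencies from Example 12 (ages in years).
midpoints = [5, 15, 25, 35, 45, 55, 65]
frequencies = [21, 16, 43, 54, 32, 12, 22]

n_total = sum(frequencies)  # N = sum of f

# Mean of grouped data: sum(f * x) / N
grouped_mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n_total

# M.D. from mean: sum(f * |x - mean|) / N
grouped_md = sum(
    f * abs(x - grouped_mean) for x, f in zip(midpoints, frequencies)
) / n_total

coeff_md = grouped_md / grouped_mean
print(f"N = {n_total}, mean = {grouped_mean:.1f}, "
      f"M.D. = {grouped_md:.2f}, coefficient = {coeff_md:.2f}")
```

Replacing `grouped_mean` inside the deviation with the median or the mode gives the other two rows of Table 6.2.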
6.7.1 Merits and demerits of mean deviation:
Merits
• It is easy to understand and compute.
• It is rigidly defined.
• It is based on all observations.
• It is not affected by extreme observations.
Demerits
• Though it is rigidly defined, it can be calculated using any of the
averages, which may create problems in comparing different mean
deviations.
• If mean deviation is calculated from mode it does not prove to be an
accurate measure of dispersion.
• It cannot be used for further algebraic treatment.
6.8 VARIANCE AND STANDARD DEVIATION:
In computing mean deviations the signs of the deviations are ignored by
taking their absolute values. This can be avoided by taking the squares of
the deviations.
The average of the squares of the deviations measured from the mean is called
variance.
Symbolically, variance = $\frac{\sum d^2}{n}$ or $\frac{\sum f d^2}{N}$, where $d = x - \bar{x}$.
The positive square root of variance is called standard deviation (S.D.)
and is denoted by the Greek letter ‘σ’ (sigma).
Symbolically,
$$\sigma = \sqrt{\frac{\sum d^2}{n}} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum f d^2}{N}}, \quad \text{where } d = x - \bar{x} \qquad \text{…(2.1)}$$
Thus, $\sigma^2$ = variance.
S.D. can also be computed by another formula:
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} \qquad \text{…(2.2)}$$
$$\sigma = \sqrt{\frac{\sum f x^2}{N} - \bar{x}^2} = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2} \qquad \text{…(2.3)}$$
The advantage of the second type of formulae over (2.1) is that, if the number of
observations is large and the mean is not a whole number, it is easier to find
x² and f.x² than to calculate the deviations (d) from the mean and their
squares.
Steps to find S.D. using formulae (2.2) and (2.3):
Ungrouped Data
1. Find the sum of all observations, i.e. Σx.
2. Square the observations and find their total, i.e. Σx².
3. Using the formula (2.2), S.D. is computed.
Grouped Data
1. Find the class mid points, multiply them by the corresponding frequencies and total the products, i.e. find Σfx.
2. Multiply each entry from the column of fx by x to get fx² and
sum them to get Σfx².
3. Using the formula (2.3), S.D. is computed.

Note: In the following solved examples only the first example is solved using
formula 2.1; all the rest are solved using formula 2.2 or 2.3. Students can
see for themselves the simplicity and speed of these formulae over 2.1.
Example 13: The marks of internal assessment obtained by FYBMS
students in a college are given below. Find the mean marks and standard
deviation.
22 30 36 12 15 25 18 10 33 29
Solution: We first sum all the observations and find the mean. Then the
differences of the observations from the mean are computed and squared.
The positive square root of the average of the squared differences is the
required standard deviation.
x            d = x – x̄    d²
22           -1            1
30           7             49
36           13            169
12           -11           121
15           -8            64
25           2             4
18           -5            25
10           -13           169
33           10            100
29           6             36
Σx = 230     -             Σd² = 738
I. $\bar{x} = \frac{\sum x}{n} = \frac{230}{10} = 23$.
II. $\sigma = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{738}{10}} = \sqrt{73.8} \approx 8.59$
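That formulae 2.1 and 2.2 give the same standard deviation can be checked with a short Python sketch. The function names `sd_from_deviations` and `sd_from_raw_sums` are hypothetical, chosen to mirror the two formulae; the data are the marks of Example 13.

```python
from math import sqrt


def sd_from_deviations(values):
    """Formula 2.1: square root of the mean squared deviation from the mean."""
    n = len(values)
    avg = sum(values) / n
    return sqrt(sum((x - avg) ** 2 for x in values) / n)


def sd_from_raw_sums(values):
    """Formula 2.2: sqrt( sum(x^2)/n - (sum(x)/n)^2 ), no deviations needed."""
    n = len(values)
    return sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


# Internal assessment marks from Example 13.
marks = [22, 30, 36, 12, 15, 25, 18, 10, 33, 29]

sd_a = sd_from_deviations(marks)
sd_b = sd_from_raw_sums(marks)
print(f"formula 2.1: {sd_a:.2f}, formula 2.2: {sd_b:.2f}")
```

Both functions divide by n, so they compute the population standard deviation used throughout this chapter, not the sample version that divides by n – 1.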
Example 14: Find the standard deviation for the following data:
03 12 17 29 10 05 18 14 12 20
Solution: We find the sum of the observations and the sum of their squares.
Using formula 2.2, S.D. is computed as follows:
x            x²
3            9
12           144
17           289
29           841
10           100
05           25
18           324
14           196
12           144
20           400
Σx = 140     Σx² = 2472
The formula 2.2 gives the S.D. as below:
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{2472}{10} - \left(\frac{140}{10}\right)^2} = \sqrt{247.2 - 196} = \sqrt{51.2} \approx 7.16$$
Example 15: Compute the standard deviation for the following data:
x : 100  102  104  106  108  110  112
f : 5    11   7    9    13   10   12
Solution: Short-cut method:
In problems where the value of x is large (consequently its square will also
be very large to compute), we use the short-cut method. In this method, a
fixed number x0 (which is usually the central value among x) is subtracted
from each observation. This difference is denoted as
u = x – x0. Now the columns of fu and fu² are computed and the S.D. is
calculated by the formula:
$$\sigma = \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2}$$
One can observe that this
formula is similar to that mentioned in 2.3. This formula is called the change
of origin formula.
In this problem we assume x0 = 106. The table of calculations is as follows:
x        u = x – 106    f         fu          fu²
100      -6             5         -30         180
102      -4             11        -44         176
104      -2             7         -14         28
106      0              9         0           0
108      2              13        26          52
110      4              10        40          160
112      6              12        72          432
Total    -              N = 67    Σfu = 50    Σfu² = 1028

From the table we have: Σfu² = 1028, Σfu = 50 and N = 67.
$$\therefore \sigma = \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2} = \sqrt{\frac{1028}{67} - \left(\frac{50}{67}\right)^2} = \sqrt{15.34 - 0.56} = \sqrt{14.78}$$
∴ σ = 3.84
Example 16: The income (in ’000 Rs.) of 100 families is given below. Find
the mean and S.D.
Income          : 0 – 5  5 – 10  10 – 15  15 – 20  20 – 25  25 – 30  30 – 35
No. of families : 18     20      26       5        10       12       9
Solution: Computation using short-cut method:
Income     x       u = x – 17.5    f          fu            fu²
0 – 5      2.5     -15             18         -270          4050
5 – 10     7.5     -10             20         -200          2000
10 – 15    12.5    -5              26         -130          650
15 – 20    17.5    0               5          0             0
20 – 25    22.5    5               10         50            250
25 – 30    27.5    10              12         120           1200
30 – 35    32.5    15              9          135           2025
Total      -       -               N = 100    Σfu = -295    Σfu² = 10175
In the table, the column of fu² is computed by multiplying
the entries of the columns fu and u.
From the table we have: Σfu² = 10175, Σfu = -295 and N = 100.
$$\bar{x} = x_0 + \frac{\sum f u}{N} = 17.5 + \frac{-295}{100} = 17.5 - 2.95 = 14.55$$
$$\sigma = \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2} = \sqrt{\frac{10175}{100} - \left(\frac{-295}{100}\right)^2} = \sqrt{101.75 - 8.7025} = \sqrt{93.0475}$$
∴ σ = 9.65 (in ’000 Rs.)
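Note: Sometimes a change of origin alone does not simplify the calculations.
In such cases the change of scale method is applied. In this method, we assume
$u = \frac{x - x_0}{i}$, where i: change of scale. The formulae for mean and S.D. using
this u are not the same as before:
$$\bar{x} = x_0 + i \cdot \bar{u} \quad \text{and} \quad \sigma_x = i \times \sigma_u, \quad \text{where } \sigma_u = \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2}$$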
Example 17: The following data gives scholarships awarded to students of
a college. Find the S.D.
Scholarship     : 1000  2000  3000  4000  5000
No. of students : 16    20    8     10    6
Solution: Here we take x0 = 3000 and i = 1000.
Scholarship    u = (x – 3000)/1000    f         fu           fu²
1000           -2                     16        -32          64
2000           -1                     20        -20          20
3000           0                      8         0            0
4000           1                      10        10           10
5000           2                      6         12           24
Total          -                      N = 60    Σfu = -30    Σfu² = 118
From the table we have N = 60, Σfu = -30 and Σfu² = 118.
Now,
$$\sigma_u = \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2} = \sqrt{\frac{118}{60} - \left(\frac{-30}{60}\right)^2} = \sqrt{1.9667 - 0.25} = \sqrt{1.7167} = 1.3102$$
$$\sigma_x = i \times \sigma_u = 1000 \times 1.3102 = 1310.2$$
∴ σx = Rs. 1310 (approx.)
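The change of origin and scale method can be sketched as a single Python routine. The name `step_deviation_stats` and its parameter names are hypothetical; the routine forms u = (x – x0)/i, computes the mean and S.D. of u, and converts back with x̄ = x0 + i·ū and σx = i × σu. The data are those of Example 17.

```python
from math import sqrt


def step_deviation_stats(values, frequencies, origin, scale):
    """Mean and S.D. of a frequency distribution via u = (x - origin) / scale."""
    n = sum(frequencies)
    u = [(x - origin) / scale for x in values]
    sum_fu = sum(f * ui for f, ui in zip(frequencies, u))
    sum_fu2 = sum(f * ui * ui for f, ui in zip(frequencies, u))
    mean_u = sum_fu / n
    sd_u = sqrt(sum_fu2 / n - mean_u ** 2)
    # Convert back to the original units of x.
    return origin + scale * mean_u, scale * sd_u


# Scholarship amounts and number of students from Example 17.
scholarships = [1000, 2000, 3000, 4000, 5000]
students = [16, 20, 8, 10, 6]

mean_x, sd_x = step_deviation_stats(scholarships, students, origin=3000, scale=1000)
print(f"mean = {mean_x:.0f}, S.D. = {sd_x:.1f}")

# The choice of origin and scale only simplifies the arithmetic;
# origin = 0 and scale = 1 (no transformation) gives the same answer.
mean_raw, sd_raw = step_deviation_stats(scholarships, students, origin=0, scale=1)
```

Carrying full precision through the intermediate steps avoids the small rounding drift that appears when values such as 118/60 are truncated by hand.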
6.8.1 Coefficient of variation:
Standard Deviation is an absolute measure of dispersion and is expressed in
the same units of measurement as that of the data. To compare two different
sets of data we need a relative measure which is free from the units of the
observations. The relative measure of dispersion for standard deviation,
given by Karl Pearson, is called the coefficient of standard deviation or
the coefficient of variation (CV). The formula to compute CV is as follows:
$$\text{Coefficient of variation } (CV) = \frac{\sigma}{\bar{x}} \times 100$$
This relative measure of S.D. measures how large the S.D. is in comparison
with the mean of the data. The data whose CV is smaller is said to be more
consistent.
Example 18: Find the mean, S.D. and CV for the following data:
C.I. : 10 – 15  15 – 20  20 – 25  25 – 30  30 – 35
f    : 3        9        7        2        4
Solution:
C.I.       x       u = (x – 22.5)/5    f         fu          fu²
10 – 15    12.5    -2                  3         -6          12
15 – 20    17.5    -1                  9         -9          9
20 – 25    22.5    0                   7         0           0
25 – 30    27.5    1                   2         2           2
30 – 35    32.5    2                   4         8           16
Total      -       -                   N = 25    Σfu = -5    Σfu² = 39
From the table: N = 25, Σfu = -5, Σfu² = 39.
$$\bar{x} = x_0 + i \cdot \bar{u} = 22.5 + 5\left(\frac{-5}{25}\right) = 22.5 - 1 = 21.5 \quad \therefore \bar{x} = 21.5 \qquad \text{… (1)}$$
$$\sigma_u = \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2} = \sqrt{\frac{39}{25} - \left(\frac{-5}{25}\right)^2} = \sqrt{1.56 - 0.04} = \sqrt{1.52} = 1.2328$$
$$\sigma_x = i \times \sigma_u = 5(1.2328) = 6.164 \quad \therefore \sigma_x = 6.164 \qquad \text{… (2)}$$
$$\text{Coefficient of Variation } CV = \frac{\sigma}{\bar{x}} \times 100 = \frac{6.164}{21.5} \times 100 = 28.67\% \qquad \text{… (3)}$$
Example 19: Choudhari & Bros own a factory which manufactures two
products A and B. The profit (in ’000 Rs.) of the two products from 1995 to
2003 is given below. Find which product gives more consistent profit to
Choudhari & Bros.
Year         : 1995  1996  1997  1998  1999  2000  2001  2002  2003
Profit for A : 101   95    110   105   112   99    102   100   112
Profit for B : 82    90    109   80    81    72    75    80    115
Solution:
        Profit for Product A                 Profit for Product B
x        u = x – 102    u²           x        u = x – 80    u²
101      -1             1            82       2             4
95       -7             49           90       10            100
110      8              64           109      29            841
105      3              9            80       0             0
112      10             100          81       1             1
99       -3             9            72       -8            64
102      0              0            75       -5            25
100      -2             4            80       0             0
112      10             100          115      35            1225
Total    Σu = 18        Σu² = 336    Total    Σu = 64       Σu² = 2260
For Product A: $\bar{x} = x_0 + \bar{u}$ = 102 + 18/9 = 102 + 2 = 104
$$\sigma = \sqrt{\frac{\sum u^2}{N} - \left(\frac{\sum u}{N}\right)^2} = \sqrt{\frac{336}{9} - \left(\frac{18}{9}\right)^2} = \sqrt{37.33 - 4} = \sqrt{33.33} = 5.7735$$
$$\text{Now, } CV = \frac{\sigma}{\bar{x}} \times 100 = \frac{5.7735}{104} \times 100 = 5.55 \qquad \text{… (1)}$$
For Product B: $\bar{x} = x_0 + \bar{u}$ = 80 + 64/9 = 80 + 7.11 = 87.11
$$\sigma = \sqrt{\frac{\sum u^2}{N} - \left(\frac{\sum u}{N}\right)^2} = \sqrt{\frac{2260}{9} - \left(\frac{64}{9}\right)^2} = \sqrt{251.11 - 50.57} = \sqrt{200.54} = 14.1613$$
$$\text{Now, } CV = \frac{\sigma}{\bar{x}} \times 100 = \frac{14.1613}{87.11} \times 100 = 16.26 \qquad \text{… (2)}$$
From (1) and (2), we observe that the coefficient of variation of product A
is less than that of product B.
Thus, the profits earned by product A are more consistent.
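The comparison in Example 19 can be cross-checked directly from the yearly profit figures, without any change of origin. The sketch below uses `mean` and `pstdev` (population standard deviation) from Python's standard `statistics` module; the helper name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = (population S.D. / mean) x 100, expressed as a percentage."""
    return pstdev(values) / mean(values) * 100


# Yearly profits (in '000 Rs.) for 1995 to 2003 from Example 19.
profit_a = [101, 95, 110, 105, 112, 99, 102, 100, 112]
profit_b = [82, 90, 109, 80, 81, 72, 75, 80, 115]

cv_a = coefficient_of_variation(profit_a)
cv_b = coefficient_of_variation(profit_b)

more_consistent = "A" if cv_a < cv_b else "B"
print(f"CV of A = {cv_a:.2f}%, CV of B = {cv_b:.2f}%")
print(f"Product {more_consistent} gives the more consistent profit")
```

A direct computation like this is a useful check on hand-built deviation tables, where a single slip in one u value changes both Σu and Σu².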
Example 20: The average pulse rate of a patient increased from 62 to 70 and
the S.D. also increased from 0.8 to 1.2 after treatment. Is it right to conclude
that there is improvement in the patient’s health?
Solution: Given $\bar{x}_1$ = 62, $\sigma_1$ = 0.8 and $\bar{x}_2$ = 70, $\sigma_2$ = 1.2.
Before treatment,
$$CV_1 = \frac{\sigma_1}{\bar{x}_1} \times 100 = \frac{0.8}{62} \times 100 = 1.29$$
After treatment,
$$CV_2 = \frac{\sigma_2}{\bar{x}_2} \times 100 = \frac{1.2}{70} \times 100 = 1.71$$
The CV after treatment is higher than before treatment. This means that after
treatment the pulse rate has become more variable. Hence it is not proper
to conclude that the health of the patient has improved.
6.8.2 Combined standard deviation:
One of the properties of standard deviation is that it can be used for further
algebraic treatment. We can find the combined standard deviation of two
different sets of data.
Consider two sets of data with number of observations $n_1$, $n_2$; their
respective means $\bar{x}_1$, $\bar{x}_2$ and respective standard deviations $\sigma_1$ and $\sigma_2$.
Then the combined standard deviation is given by the formula:
$$\sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$$
where $d_1 = \bar{x}_1 - \bar{x}_{12}$, $d_2 = \bar{x}_2 - \bar{x}_{12}$ and
$\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ is the combined mean.
Example 21: Find the combined mean and S.D. for the following two
groups with the details given below. Also find which group is more variable.
                Group I   Group II
observations      40         60
Mean              60         70
S.D.               8          5
Solution: Given $n_1 = 40$, $\bar{x}_1 = 60$, $\sigma_1 = 8$ and $n_2 = 60$, $\bar{x}_2 = 70$, $\sigma_2 = 5$.
Combined mean: $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{40(60) + 60(70)}{40 + 60} = \frac{6600}{100} = 66$
Now, $d_1 = \bar{x}_1 - \bar{x}_{12}$ = 60 – 66 = -6 and $d_2 = \bar{x}_2 - \bar{x}_{12}$ = 70 – 66 = 4
Combined S.D.: $\sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}} = \sqrt{\frac{40(8^2 + (-6)^2) + 60(5^2 + 4^2)}{40 + 60}}$
$\therefore \sigma_{12} = \sqrt{\frac{40(64 + 36) + 60(25 + 16)}{100}} = \sqrt{\frac{4000 + 2460}{100}} = \sqrt{64.6}$ = 8.04 (approx.)
To find which group is more variable, we compute the CV of both the groups.
For Group I: $CV_1 = \frac{\sigma_1}{\bar{x}_1} \times 100 = \frac{800}{60} = 13.33$
For Group II: $CV_2 = \frac{\sigma_2}{\bar{x}_2} \times 100 = \frac{500}{70} = 7.14$
The CV of Group I is higher than that of Group II. Thus, Group I is more
variable than Group II.
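The combined mean and combined S.D. formulas can be wrapped in one small
function. This is a sketch in Python under the formulas of section 6.8.2; the
function name combined_mean_sd is an illustrative assumption.

```python
import math


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and S.D. of two groups from their sizes, means and S.D.s."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - mean12   # deviation of group I mean from the combined mean
    d2 = mean2 - mean12   # deviation of group II mean from the combined mean
    var12 = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return mean12, math.sqrt(var12)


# Figures of Example 21: Group I (40, 60, 8) and Group II (60, 70, 5).
mean12, sd12 = combined_mean_sd(40, 60, 8, 60, 70, 5)
print(f"combined mean = {mean12:.2f}, combined S.D. = {sd12:.2f}")
```

The same function can be reused for the unit end exercises that ask for a
combined mean and S.D.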
Example 22: If the coefficient of variation of two groups is 13.25% and
17% and their averages are 110 and 95 respectively, find the corresponding
standard deviations.
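Solution: Given $CV_1$ = 13.25, $\bar{x}_1$ = 110 and $CV_2$ = 17, $\bar{x}_2$ = 95.
If $\sigma_1$ and $\sigma_2$ are the corresponding standard deviations, using the formula for
CV, we have
$CV_1 = \frac{\sigma_1}{\bar{x}_1} \times 100 \Rightarrow \sigma_1 = \frac{\bar{x}_1 \cdot CV_1}{100} = \frac{110(13.25)}{100} = 14.58$
$CV_2 = \frac{\sigma_2}{\bar{x}_2} \times 100 \Rightarrow \sigma_2 = \frac{\bar{x}_2 \cdot CV_2}{100} = \frac{95(17)}{100} = 16.15$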
CORRECTED S.D.: Just as the arithmetic mean can be corrected for the
incorrect observations, S.D. also can be corrected. Before understanding the
steps to find the correct S.D. let us understand the formula of S.D. in more
detail.
We know that, $\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$ $\Rightarrow \sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2$ $\Rightarrow \sum x^2 = n(\sigma^2 + \bar{x}^2)$.
Thus, if any of the observations is incorrect, the value of $\sum x^2$ becomes
incorrect and has to be corrected first. Now we shall write down the steps
to calculate the correct S.D.
Steps to find correct S.D.
(1) We calculate the wrong $\sum x$ value by using the formula: $\sum x = n\bar{x}$.
(2) Now we calculate the wrong $\sum x^2$ by using the formula: $\sum x^2 = n(\sigma^2 + \bar{x}^2)$.
(3) Correct $\sum x$ = (wrong $\sum x$) – wrong observation + correct observation.
(4) Correct $\sum x^2$ = (wrong $\sum x^2$) – (wrong observation)² + (correct observation)².
(5) Now the correct mean and standard deviation are computed using the
formulae:
Correct $\bar{x} = \frac{\text{correct } \sum x}{n}$ and Correct $\sigma = \sqrt{\frac{\text{correct } \sum x^2}{n} - (\text{correct } \bar{x})^2}$
This method can be similarly extended to problems where there is more
than one wrong observation.
Example 23: For a certain data with 10 observations, the mean and S.D.
were calculated to be 9 and 4 respectively. But later on it was observed that
one of the values was wrongly taken as 14 instead of 4. Find the correct mean
and the correct S.D.
Solution: Given: n = 10, $\bar{x}$ = 9, $\sigma$ = 4, wrong value = 14 and correct
value = 4
$\sum x = n\bar{x}$ = 10(9) = 90
$\sum x^2 = n(\sigma^2 + \bar{x}^2)$ = 10(16 + 81) = 970
Now, correct $\sum x$ = 90 – 14 + 4 = 80
and correct $\sum x^2$ = 970 – (14)² + 4² = 970 – 196 + 16 = 790
Correct $\bar{x} = \frac{\text{correct } \sum x}{n}$ = 80/10 = 8
Correct S.D. = $\sqrt{\frac{\text{correct } \sum x^2}{n} - (\text{correct } \bar{x})^2} = \sqrt{\frac{790}{10} - (8)^2} = \sqrt{79 - 64} = \sqrt{15} = 3.87$
Example 24: A problem of finding the mean and standard deviation was
given to the students of a class by their teacher. The mean and S.D. of 20
observations were calculated as 12 and 4 respectively. Later on the teacher
found that one of the observations was misheard by students as 13 instead
of 30. Find the correct mean and S.D.
Solution: Given: n = 20, $\bar{x}$ = 12, $\sigma$ = 4, wrong value = 13 and correct
value = 30
$\sum x = n\bar{x}$ = 20(12) = 240
$\sum x^2 = n(\sigma^2 + \bar{x}^2)$ = 20(16 + 144) = 3200
Now, correct $\sum x$ = 240 – 13 + 30 = 257
and correct $\sum x^2$ = 3200 – (13)² + (30)² = 3200 – 169 + 900 = 3931
Correct $\bar{x} = \frac{\text{correct } \sum x}{n}$ = 257/20 = 12.85
Correct S.D. = $\sqrt{\frac{\text{correct } \sum x^2}{n} - (\text{correct } \bar{x})^2}$
= $\sqrt{\frac{3931}{20} - (12.85)^2} = \sqrt{196.55 - 165.1225} = \sqrt{31.4275}$
$\therefore$ correct $\sigma$ = 5.61
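The five steps for correcting the mean and S.D. translate directly into code.
The following is a sketch in Python; the function name corrected_mean_sd and
its parameter names are illustrative assumptions, and it handles a single wrong
observation only.

```python
import math


def corrected_mean_sd(n, mean, sd, wrong, correct):
    """Correct a mean and S.D. after one observation was recorded wrongly."""
    sum_x = n * mean                      # step (1): wrong sum of x
    sum_x2 = n * (sd ** 2 + mean ** 2)    # step (2): wrong sum of x squared
    sum_x = sum_x - wrong + correct                  # step (3)
    sum_x2 = sum_x2 - wrong ** 2 + correct ** 2      # step (4)
    new_mean = sum_x / n                             # step (5)
    new_sd = math.sqrt(sum_x2 / n - new_mean ** 2)
    return new_mean, new_sd


# Example 23: n = 10, mean 9, S.D. 4, value 14 recorded instead of 4.
mean_23, sd_23 = corrected_mean_sd(10, 9, 4, wrong=14, correct=4)
# Example 24: n = 20, mean 12, S.D. 4, value 13 recorded instead of 30.
mean_24, sd_24 = corrected_mean_sd(20, 12, 4, wrong=13, correct=30)
print(f"Example 23: mean = {mean_23:.2f}, S.D. = {sd_23:.2f}")
print(f"Example 24: mean = {mean_24:.2f}, S.D. = {sd_24:.2f}")
```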
6.8.3 Merits and demerits of standard deviation:
Merits
(1) It is based on all observations.
(2) It is rigidly defined.
(3) It can be used for further algebraic treatment.
(4) It is less affected by sampling fluctuations than the other measures
of dispersion.
Demerits
(1) Compared with other measures of dispersion, it is difficult to
calculate.
(2) More importance is given to the extreme observations in calculating
standard deviation. The squares of the deviations of extreme values
from the mean dominate the total and hence the value of S.D.
MISSING VALUES
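Example 25: Find the missing values in the following table:
                Group I   Group II   Combined
observations      60         ?         140
mean               ?        22.5        30
S.D.              4.5        7           ?
Solution: Given: $n_1$ = 60 and $n_1 + n_2$ = 140 $\Rightarrow n_2$ = 140 – 60 = 80.
Also given: $\bar{x}_{12}$ = 30, $\bar{x}_2$ = 22.5. Using the combined mean formula, we have
$\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} \Rightarrow 30 = \frac{60\bar{x}_1 + 80(22.5)}{140} \Rightarrow 4200 = 60\bar{x}_1 + 1800 \Rightarrow 60\bar{x}_1 = 2400$ $\therefore \bar{x}_1 = 40$
Now, $d_1 = \bar{x}_1 - \bar{x}_{12}$ = 40 – 30 = 10 and $d_2 = \bar{x}_2 - \bar{x}_{12}$ = 22.5 – 30 = -7.5
Combined S.D.: $\sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}} = \sqrt{\frac{60[(4.5)^2 + 10^2] + 80[7^2 + (-7.5)^2]}{60 + 80}}$
$\therefore \sigma_{12} = \sqrt{\frac{7215 + 8420}{140}} = \sqrt{111.68}$ = 10.57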
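A missing-value problem of this kind only rearranges the combined mean formula
before applying the combined S.D. formula. The sketch below, in Python, repeats
the working of Example 25; all variable names are illustrative assumptions.

```python
import math

# Known values from the table of Example 25.
n1, sd1 = 60, 4.5
n_total, mean12 = 140, 30
mean2, sd2 = 22.5, 7

n2 = n_total - n1
# Rearranged combined mean formula: n1*mean1 + n2*mean2 = (n1 + n2)*mean12
mean1 = (n_total * mean12 - n2 * mean2) / n1

d1 = mean1 - mean12
d2 = mean2 - mean12
sd12 = math.sqrt((n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n_total)
print(f"n2 = {n2}, mean of group I = {mean1:.2f}, combined S.D. = {sd12:.2f}")
```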
CHOICE OF A MEASURE OF DISPERSION
Throughout the chapter we have seen different measures of dispersion like
range, inter quartile range, quartile deviation, mean deviation and standard
deviation with their merits and demerits. The selection of a particular
measure of dispersion depends upon the following three aspects:
(1) The objective of finding the measure of dispersion: If the purpose is to
find the degree of variation of the observations from the mean then
standard deviation is more suitable than mean deviation.
(2) The type of data available: If the data available has open end class
intervals then it is not possible to use mean deviation or standard
deviation. Also if the data is very large and scattered, range cannot be
used as a proper measure of dispersion.
(3) The characteristics of the measure of dispersion: The merits and
demerits of the different measures relative to one another help
to select the most suitable one.
6.9 SKEWNESS & KURTOSIS:
Definition: Skewness means ‘lack of symmetry’. We study skewness to
have an idea about the shape of the curve which can be drawn with the
help of given data.
A distribution is said to be skewed if –
1. Mean, median and mode fall at different points;
2. Quartiles are not equidistant from the median; and
3. The curve drawn with the help of the given data is not symmetrical
but stretched more to one side than to the other.
Note:
1. A distribution is said to be symmetric about its arithmetic mean
(A.M.) if the deviations of the values of the distribution from their
A.M. are such that corresponding to each positive deviation, there is
a negative deviation of the same magnitude.
2. If the distribution is symmetric then $\mu_3 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^3}{n} = 0$ for ungrouped
data and $\mu_3 = \frac{\sum_{i=1}^{n} f_i(x_i - \bar{x})^3}{N} = 0$ for a frequency distribution.
3. If a distribution is not symmetric then the distribution is called a
skewed distribution.
4. A skewed distribution is also called an asymmetric distribution.
5. Thus in case of a skewed distribution the magnitudes of the
positive and the negative deviations of the values from their mean do
not balance.
Types of Skewness:
Skewness is of two types
(Fig. 01)
1. Positive Skewness: Skewness is positive if the longer tail of the
distribution lies towards the higher values of the variate (the right),
i.e. if the curve drawn with the help of the given data is stretched more
to the right than to the left. The distribution is then said to be a
positively skewed distribution.
For a positively asymmetric distribution: A.M. > Median > Mode
2. Negative Skewness: Skewness is negative if the longer tail of the
distribution lies towards the lower values of the variate (the left), i.e.
if the curve drawn with the help of the given data is stretched more to
the left than to the right. The distribution is then said to be a
negatively skewed distribution.
For a negatively asymmetric distribution: A.M. < Median < Mode
Note:
1. For a symmetric distribution: A.M. = Median = Mode
2. Skewness is positive if A.M. > Median or A.M. > Mode
3. Skewness is negative if A.M. < Median or A.M. < Mode
Measure of Skewness
The various absolute measures of skewness are:
1. Sk = Mean – Median
2. Sk = Mean – Mode
3. Sk = (Q3 – Md) – (Md – Q1), where Md is the median and Q1, Q3 are the
lower and upper quartiles.
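The three absolute measures listed above can be tried out on a small data set.
The following is a sketch in Python using the standard statistics module; the
sample values are hypothetical (not from the text) and the quartile convention
is an assumption, since different books locate quartiles slightly differently.

```python
import statistics

# Hypothetical sample, not taken from the text.
data = [2, 3, 3, 4, 5, 7, 11]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
# quantiles(..., n=4) returns [Q1, Q2, Q3]; the default "exclusive" method
# places Q1 at the (N + 1)/4-th ordered value.
q1, _, q3 = statistics.quantiles(data, n=4)

sk_median = mean - median                    # measure 1
sk_mode = mean - mode                        # measure 2
sk_quartile = (q3 - median) - (median - q1)  # measure 3

# A positive value of each measure points to positive skewness.
print(sk_median, sk_mode, sk_quartile)
```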
Kurtosis
The three measures namely, measures of central tendency, measures of
variation (moments) and measures of skewness that we have studied so far
are not sufficient to describe completely the characteristics of a frequency
distribution. None of these measures is concerned with the peakedness of
a frequency distribution.
Kurtosis is concerned with the flatness or peakedness of the frequency curve,
i.e. the graphical representation of a frequency distribution.
Definition:
Clark and Schkade defined kurtosis as: “Kurtosis is the property of a
distribution which expresses its relative peakedness.”
Types of Kurtosis:
1. Mesokurtic
2. Leptokurtic
3. Platykurtic
1. Mesokurtic: The frequency curve which is a bell shaped curve is
considered as standard and such a distribution is called Mesokurtic.
The normal curve is termed Mesokurtic.
2. Leptokurtic: A curve which is more peaked than the normal curve is
called Leptokurtic. For a Leptokurtic curve, kurtosis (measured relative
to the normal curve) is positive and dispersion is least among all the
three types.
3. Platykurtic: A curve which is flatter than the normal curve is called
Platykurtic. For a Platykurtic curve, kurtosis (measured relative to the
normal curve) is negative and dispersion is more.
6.10 LET US SUM UP:
In this chapter we have learned:
• How to calculate Range and coefficient of range.
• How to calculate Quartile deviation and coefficient of quartile
deviation.
• How to calculate Mean deviation and coefficient of Mean deviation.
• How to calculate Standard deviation and coefficient of variation.
6.11 UNIT END EXERCISES:
1. Define Dispersion. Discuss the importance of different measures of
Dispersion.
2. What are the functions of a measure of dispersion?
3. Explain with a suitable example how a measure of dispersion
proves to be a supplementary tool for averages.
4. Differentiate between relative and absolute dispersion.
5. Define Mean Deviation. Illustrate with examples different types of
mean deviations.
6. Write a short note on variance and its advantages over other measures
of dispersion.
7. Discuss with suitable examples the advantages of coefficient of
variation.
8. What are the criteria to select a particular measure of dispersion?
9. Explain briefly the different measures of dispersion.
10. Define Quartile Deviation. Explain the advantages of Q.D. over
range.
11. Write a short note on standard deviation and explain with examples
why it is the most popular measure of dispersion.
12. Define coefficient of variation. Explain its importance in statistical
analysis of series of data.
13. Find the range and its coefficient for the following data:
a) Wages in Rs.: 100, 55, 45, 90, 80, 120, 30, 125, 140 and 40.
b) Marks: 78, 37, 56, 89, 22, 30, 34, 10, 55, 38, 46, 62, 77, 12 and
44.
c) Temperature in degree Celsius: 32.5, 33.8, 32, 35, 35.2, 38, 33,
32.7, 31 and 31.4
d) Share Price in Rs: 1021, 1000, 1009, 1022, 1022.5, 1024, 1011,
1015, 1002.5, 1020
e) Rainfall in mm: 65, 67, 77, 62, 60, 56, 60, 45, 76, 80 and 44.
14. Find the range and its coefficient for the following data:
Income            0 – 5   5 – 10   10 – 15   15 – 20   20 – 25   25 – 30   30 – 35
No. of families     12      14       11        18        20        08        10
15. If the coefficient of range is 0.5 and the smallest value is 10, find the
largest value of the data.
16. If the coefficient of range is 0.8 and the largest value is 40, find the
smallest value of the data.
17. Find the inter quartile range for the following data:
a) Marks: 5, 10, 7, 2, 8, 11, 16, 1, 6, 11, 12, 3, 7, 14, 12, 10
b)
Fuel in ltrs      1 – 5   6 – 10   11 – 15   16 – 20   21 – 25   26 – 30
No. of vehicles     40      35       10        15        20        30
18. Find the Q.D. for the following data and also find its coefficient.
Height in cm   120   125   130   135   140   150   160
No. of boys      6    12    18    22    32    10     5
19. Find the Q.D. for the following data and also find its coefficient.
No. of phone calls   100   120   140   160   180   200   220
Frequency              5    15    18    22    19    10     6
20. Find the Q.D. for the following data and also find its coefficient.
Income in Rs. less than    50    70    90   110   130   150    170
No of families             54   150   290   595   820   900   1000
21. Find the range of the central 50 % of workers, Q.D. and its
coefficient for the following data:
Income in Rs.   10 – 30   30 – 50   50 – 70   70 – 90   90 – 110   110 – 130   130 – 150
No of workers       7        18        12        15        10          8          10
22. Find the M.D. from median for the following data. Also compute its
coefficient.
Size         4    8   12   16   20   24   28   32
Frequency    6   11   17   20   12   14    8   10
23. The following data gives the rainfall in cm in past 10 years for two
cities A and B. Find the M.D. from mean and coefficient of M.D.
and comment.
City A   36   67   35   50   54   71   60   45
City B   80   85   77   79   80   82   88   85
24. Compute the M.D. from mean, median and mode for the following
data. Compare the three values, which M.D. is the lowest?
Marks            0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80
No of students     16       20        15        24        10        38        22        15
25. Compute the mean deviation from mean and median for the
following data. Also find the coefficient of mean deviation.
C.I.        3 – 4   4 – 5   5 – 6   6 – 7   7 – 8   8 – 9   9 – 10
Frequency     10      16      14      19      12       4       2
26. The differences between the ages of husbands and wives are given
below. Find the M.D. from mean and also find the coefficient of
M.D.
Age in yrs      0 – 2   2 – 4   4 – 6   6 – 8   8 – 10   10 – 12
No of couples    349     278     402     100     112       25
27. Compute the standard deviation for the following data giving marks
obtained by FYBMS students in their Statistics project.
12   8  10  11   7   3  19  12  12  13
 7   5  10   6   4   8  11  15   5  16
 7  13  14  11   5   9   6   4   2  10
28. Compute the standard deviation for the following data:
Age in yrs     0 – 5   5 – 10   10 – 15   15 – 20   20 – 25   25 – 30
No of people     26      11       05        08        28        30
29. Calculate the mean and standard deviation for the following data:
C.I.        10 – 12   12 – 14   14 – 16   16 – 18   18 – 20   20 – 22   22 – 24
Frequency     100       102        89       132        90        88        55
30. Calculate the mean and standard deviation for the following data:
Marks less than   15   30   45   60   75   90
No of students    40   60   50   10   30   10
31. Calculate the mean and standard deviation for the following data:
Length of wire in cm   12 – 15   16 – 19   20 – 23   24 – 27   28 – 31   32 – 35
Frequency                 16        21        18        20        14        11
32. The speed of vehicles on the Mumbai – Pune express Highway per
day is given below.
Find the mean and S.D.
Speed in km/h    30 – 40   40 – 50   50 – 60   60 – 70   70 – 80   80 – 90   90 – 100
No of vehicles      78       105       150       177       188       200       170
33. After a month long control of diet and exercises the loss in weight of
50 people in a society is given below. Find the mean and S.D. for the
data.
Weight loss in kg   0 – 2   2 – 4   4 – 6   6 – 8   8 – 10   10 – 12
No of persons         5      10      12      10       5         8
34. The amount of money taken from an ATM machine is as given
below. Find the S.D.
Amount in ’000 Rs.   2 – 5   5 – 8   8 – 11   11 – 14   14 – 17   17 – 20
No of days             15      37      25        28        40        30
35. Compute the mean and S.D. for the following data:
Age below   10   20   30   40   50   60   70
f           20   25   40   15   30   25   10
36. The following data gives the number of goals made by the Indian
and Pakistani Hockey teams. Find out which team is more consistent.
Goals            0    1    2    3    4    5
Indian Team     20   18   10    7    3    5
Pakistan Team   25   15   16    4    3    2
37. The following data gives the duration of phone calls made by boys
and girls of a college. Find the S.D. and CV.
Duration in seconds   0 – 60   60 – 120   120 – 180   180 – 240   240 – 300   300 – 360
No of Boys              17        20         30          16          10          15
No of Girls             20        30         24          18           8          10
38. The life of two types of tube lights in market is given below. Which
type of tube light is more uniform?
Life in months   0 – 4   4 – 8   8 – 12   12 – 16   16 – 20   20 – 24
Tube A             8      12       18        6        10         4
Tube B            12      20       32       24        16        10
39. The average weekly wages of workers in two factories are Rs. 85 and
Rs. 60 respectively. The number of workers is 100 and 120 while the
standard deviation of wages is Rs. 12 and Rs. 8 respectively. Find the
combined mean and standard deviation.
40. The mean and standard deviati ons of two groups with 100 and 150
observations are 55,6 and 40,8. Find the combined mean and S.D. for
all the observations taken together.
41. The average weekly wages of 60 workers in a firm A are Rs. 210 with
S.D. Rs. 10, while in firm B the numbers are 100, Rs. 90 and Rs. 12.
Which firm has greater consistency in the weekly wages? Find the
combined wages and S.D. for all the workers.
42. The mean and standard deviations of two groups with 50 and 70
observations are 110,9 and 80,8. Find the combined mean and S.D.
for all the observations taken together.
43. The average performance and S.D. of two machines in a factory are
80, 5 and 75, 5.5 respectively. Which machine is more consistent?
44. A sample of size 20 has mean 4.5 and S.D. 2.8. Another sample of
size 25 has mean 5.6 and S.D. 3. Find the mean and S.D. of the
combined samples.
45. Find the missing values in the following table:
                Group I   Group II   Combined
observations      25         ?          60
mean               ?        13.5        12
S.D.               ?         8          10
46. Two samples of size 30 and 70 have same mean 40 but different S.D.
14 and 18 respectively. Find the combined S.D. of the sample of total
size.
47. Find the missing values in the following table:
                Group I   Group II   Combined
observations       ?        30          50
mean              28         ?          32
S.D.              4.5        ?           6
48. For a distribution of 300 observations the mean and S.D. were found
to be 50 and 5 respectively. Later on it was found that one of the
observations was wrongly taken as 15 instead of 50. Find the correct
mean and S.D.
49. The mean and S.D. of 100 observations were calculated to be 35 and
3.5 respectively. One of the observations was wrongly taken as 14.
Calculate the mean and S.D. if (i) the wrong value is omitted and (ii)
it is replaced by the correct value 40.
50. A group of 50 observations has mean 65cm and S.D. 8. Two more
observations 70
Multiple choice questions:
1) Which of the following is not an absolute measure of dispersion?
a) Range b) Quartile deviation
c) Standard deviation d) Coefficient of variation
2) For the data 8, 1, 4, 5, 9, 3, 2, 7 the range is ____
a) 6 b) 8 c) 4 d) 7
3) If upper quartile and lower quartile are 90 and 45 respectively then the
coefficient of quartile deviation is
a) 0.5 b) 0.67 c) 0.4 d) 0.33
4) If S.D. = 24 & Mean = 120 then C.V. is _______
a) 12% b) 15% c) 18% d) 20%
5) __________ is a measure of dispersion.
a) Mean b) Median c) Mode d) Range
6) If Q1 = 15, Q3 = 40 then quartile deviation is:
a) 15 b) 12.5 c) 11.6 d) 14
7) The mean and coefficient of variation are 10 and 5 respectively. The
standard deviation is:
a) 2.5 b) 6.5 c) 3.5 d) None of the above
8) Two samples A and B have the same standard deviation but the mean
of A is greater than that of B. The coefficient of variation of A is
a) Greater than that of B b) Less than that of B
c) Equal to that of B d) None of these
9) Measure of dispersion which is affected most by extreme observations
is:
a) Range b) Q.D. c) M.D. d) S.D.
10) Algebraic sum of deviations from the mean is
a) Positive b) Negative c) Zero d) Different for each case
6.12 LIST OF REFERENCES
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K.
Kapoor.
• Basic Statistics by B. L. Agrawal.

7
CORRELATION
Unit Structure
7.0 Objectives:
7.1 Introduction:
7.2 Importance of correlation
7.3 Properties of correlation coefficient
7.4 Correlation and causation
7.5 Types of correlation
7.6 Scatter diagram
7.6.1 Merits and demerits of scatter diagram
7.7 Karl Pearson’s coefficient of correlation
7.7.1 Coefficient of correlation for a grouped data
7.7.2 Merits and demerits of Karl Pearson’s coefficient of correlation
7.8 Spearman’s rank correlation coefficient
7.8.1 Ranks are not given and are non-repeated ranks
7.8.2 Repeated ranks
7.9 Let us sum up
7.10 Unit end exercises
7.11 List of references
7.0 OBJECTIVES:
After going through this chapter you will be able to know:
• Meaning of correlation and its types.
• Properties of correlation.
• Method of calculation of coefficient of correlation.
7.1 INTRODUCTION:
In the previous two chapters we have seen statistical measures used for the
analysis of univariate data, i.e. data in one variable only. For example,
weight, height, marks, wages, income, price etc. But there are sets of data
which can be related to each other. Let us consider the weights and heights
of people. Medically too it is said that they are very closely related. In
problems related to science, business and economics also it is important to
know whether two variables are related to each other or not. For example, the
relation between density of pollutants in air and number of vehicles,
investments in advertisement and sales of a product, use of a vaccine and
number of patients, ranks or marks given by different judges in a reality show,
marks of the same students in two consecutive exams etc. The immediate
question is, if they have some relation then what is the magnitude and
direction of that relation?
This relation between bivariate data is said to be correlation and the
magnitude of correlation is called the correlation coefficient. The study
of the magnitude and nature of correlation between two variables is called
correlation analysis.
7.2 IMPORTANCE OF CORRELATION:
(1) Correlation gives an answer to the basic question of measuring the
extent of the relationship between two variables.
(2) Correlation helps to understand the behavioral pattern of different
variables in business and economics.
(3) The most important aspect of any analysis is decision making.
Correlation finds the magnitude and direction of association between
two variables and hence facilitates the decision making process.
(4) Correlation provides a comfortable platform for estimation or
forecasting, which is again an important tool in statistical analysis.
7.3 PROPERTIES OF CORRELATION COEFFICIENT:
(1) The measure of correlation is called the coefficient of correlation
and is denoted by r.
(2) It is independent of the units of measurement of the variables.
(3) The value of r ranges between –1 and 1 and depends on the slope of
the line passing through the values of the variables. The sign of the
value of r indicates the type of correlation between the two variables.
This is explained in section 7.5 (i).
(4) The values r = –1 and r = 1 are the extreme values of correlation,
indicating a perfect correlation in either direction.
(5) For 0 < r < 1, we say that there is an imperfect positive correlation.
This again is classified into strong and weak positive correlation (or
high and low degree positive correlation).
(6) For –1 < r < 0, we say that there is an imperfect negative correlation.
This again is classified into strong and weak negative correlation (or
high and low degree negative correlation).
(7) If r = 0, we say that there is no correlation or zero correlation between
the two variables.
(8) It is independent of the change of scale and change of origin. If a
constant is added to (or subtracted from) all the values of a variable,
the value of r remains unchanged; this is called change of origin. The
value of r also remains unchanged if all the values of a variable are
divided (or multiplied) by a positive constant; this is called change
of scale.
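Property (8) can be checked numerically. The sketch below, in Python, computes
r for a small hypothetical data set (not from the text), then again after a
change of origin and after a change of scale; the function name pearson_r is an
illustrative assumption.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r computed from the raw sums of x, y, xy, x^2, y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical data, chosen only for illustration.
x = [2, 4, 5, 7, 9]
y = [3, 7, 6, 10, 12]

r = pearson_r(x, y)
# Change of origin: subtract 5 from every x and add 100 to every y.
r_origin = pearson_r([v - 5 for v in x], [v + 100 for v in y])
# Change of scale: divide every x by 2 and multiply every y by 5.
r_scale = pearson_r([v / 2 for v in x], [5 * v for v in y])

print(round(r, 4), round(r_origin, 4), round(r_scale, 4))
```

All three printed values agree, which is what the property asserts.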

7.4 CORRELATION AND CAUSATION:
Though correlation gives the magnitude of interdependency of two
variables, it does not say anything about the cause and effect relationship.
Even if there is a high magnitude of correlation between two variables it does
not necessarily mean that they have a close relationship. For example, the
increase in sales of TV sets and increase in the sales of umbrellas in a city
may quantitatively show strong correlation. But the sales of TV sets may be
due to a pay hike or say IPL cricket matches and the sales of umbrellas due to
the rainy season! Such a correlation is called nonsense correlation. The
following are the causes of correlation between two variables:
(1) Influence of multiple factors: The correlation between two variables
under consideration may be due to the influence of multiple factors. In
practice, there are third party factors which may affect the variations
in the variables at the same time. For example, demand and price of a
certain product may be affected by inflation, natural calamities,
economic policy, etc. The increase in the sales of different luxury
items does not mean that they are actually correlated, as the increase
can be due to a third factor common to all, like an increase in income.
(2) Mutual Influence: Both the variables may be affecting each other. In
economics we can observe that an increase in the price of a commodity
leads to a decrease in its demand. But the relationship between price
and demand is dual. So, an increase in demand may lead to an increase
in price also. Thus, a correlation between two variables may be due to
mutual influences.
(3) Coincidence: It may happen that a pair of variables shows some
correlation only by chance and that may not be universal. Such
coincidences are seen in a small sample. For example, if a small
sample is taken from a village for the sales of a certain product and
compared with the advertising expenditure, it may show a strong
correlation. But this may be due to a monopoly of that product in the
area from where the sample is taken and may not be true for a wider
area. Thus, a conclusion that an increase in advertising expenditure has
led to an increase in sales, on the basis of the strong correlation, may
be far from true.
7.5 TYPES OF CORRELATION:
Correlation can be classified into three types as follows:
(i) Positive or negative Correlation:
Positive (or negative) correlation is related to the direction of the
correlation. If the increase (or decrease) in one variable results in
increase (or decrease) in the other variable then we say that the
correlation is positive. The value of r in such type of correlation is
between 0 and 1. If the increase (or decrease) in one variable results
in decrease (or increase) in the other variable then we say that the
correlation is negative. The value of r in such type of correlation is
between –1 and 0.
(ii) Simple, Partial or multiple Correlation:
If the correlation is only between two variables then we call it a
simple correlation. If the other influencing factors affecting the two
variables are assumed to be constant then it is said to be partial
correlation. If more than two variables are studied for their inter
dependencies then it is said to be multiple correlation.
(iii) Linear and Non-Linear Correlation:
If the relation between the two variables is linear we say it is a linear
correlation. The meaning of being linear is that a mathematical
relation of the type y = ax + b, which is an equation of a straight line,
can be established between the two variables. In other words, the
changes in the values of the two variables must be in constant ratio.
For example, consider the following data of two variables price and
demand:
Price (x)  :  2   4   6   8  10  12
Demand (y) :  8  12  16  20  24  28
The relation between price and demand can be written as y = 2x + 4.
Practically we do not always get a linear relationship. Such type of
correlation is said to be non-linear correlation. The graph of the values
of the variables is then not a straight line, but a curve.
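Whether a set of paired values follows a relation of the type y = ax + b can be
checked by comparing the ratio of changes between consecutive points. The
following is a minimal sketch in Python using the price and demand data above;
the variable names are illustrative assumptions.

```python
# Price and demand values from the example above.
price = [2, 4, 6, 8, 10, 12]
demand = [8, 12, 16, 20, 24, 28]

# Ratio of the change in y to the change in x between consecutive points.
slopes = [
    (demand[i + 1] - demand[i]) / (price[i + 1] - price[i])
    for i in range(len(price) - 1)
]

# The relation is linear when every ratio is the same.
is_linear = all(s == slopes[0] for s in slopes)

a = slopes[0]                  # slope of the line y = ax + b
b = demand[0] - a * price[0]   # intercept, from the first point
print(is_linear, a, b)
```

With real data the ratios are rarely exactly equal, so a tolerance would be
needed instead of the strict equality used here.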
There are different methods to measure the magnitude or to
understand the extent of correlation between two variables, such as:
(1) Scatter Diagram, (2) Correlation Table, (3) Correlation Graph and
(4) Correlation Coefficient. As per the scope of our syllabus, in this
chapter we shall study the scatter diagram and, in detail, two types of
Correlation Coefficients namely, Karl Pearson’s Coefficient of
Correlation and Spearman’s rank Correlation Coefficient.
7.6 SCATTER DIAGRAM:
This is the simplest method to study the correlation between two variables.
Scatter Diagram is a graphical method to study the extent of correlation
between variables. A graph of the two variables X and Y is drawn by taking
their values on the corresponding axes. Points are plotted on the graph and
the conclusion is made on the density of the points on the graph. The
following figures and explanations would make it clearer.
(i) Perfect Positive Correlation:
If the graph of the values of the variables is a straight line with positive
slope as shown in Figure 7.1, we say there is a perfect positive
correlation between X and Y. Here r = 1.
(ii) Imperfect Positive Correlation:
If the graph of the values of X and Y shows a band of points from the lower
left corner to the upper right corner as shown in Figure 7.2, we say that
there is an imperfect positive correlation. Here 0 < r < 1.
(iii) Perfect Negative Correlation:
If the graph of the values of the variables is a straight line with
negative slope as shown in Figure 7.3, we say there is a perfect
negative correlation between X and Y. Here r = –1.
(iv) Imperfect Negative Correlation:
If the graph of the values of X and Y shows a band of points from the upper
left corner to the lower right corner as shown in Figure 7.4, then we
say that there is an imperfect negative correlation. Here –1 < r < 0.
(v) Zero Correlation:
If the graph of the values of X and Y does not show any of the above
trends then we say that there is a zero correlation between X and Y. The
graph of such type can be a straight line parallel to one of the axes, as
shown in Figures 7.5 and 7.6, or may be completely scattered as shown
in Figure 7.7. Here r = 0.
Figure 7.5 shows that the increase in the values of Y has no effect on the
value of X, it remains the same, hence zero correlation. Figure 7.6 shows
that the increase in the values of X has no effect on the value of Y, it remains
the same, hence zero correlation. Figure 7.7 shows that the points are
completely scattered on the graph and show no particular trend, hence there
is no correlation or zero correlation between X and Y.
7.6.1 Merits and demerits of scatter diagram:
Merits
(1) It is very easy to understand and interpret the degree of correlation.
(2) It is a simple method and involves no mathematical calculations.
(3) It is not affected by extreme values.
Demerits
(1) It fails to give the exact magnitude of correlation.
(2) It cannot be used for any further analysis.
7.7 KARL PEARSON’S C OEFFICIENT OF
CORRELATION:
The Karl Pearson’s Correlation Coefficient is also known as the product
moment correlation coefficient. The coefficient of correlation as developed
by Karl Pearson is defined as the ratio of the Covariance between x and y to
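the product of their respective standard deviations. Thus,
$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \qquad \ldots (1)$$
where Cov(x, y) is the covariance between x and y, and $\sigma_x$, $\sigma_y$ are the
standard deviations of x and y respectively.
The covariance between x and y, which is the average of the products of the
deviations of the values from their respective averages, is given by the formula
$$\mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$
and we know that
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}, \qquad \sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}$$
Substituting in (1), we have
$$r = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{\sum (x - \bar{x})^2}{n}}\,\sqrt{\frac{\sum (y - \bar{y})^2}{n}}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \qquad \ldots (2)$$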
The simplified form of the above formula is as follows:
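$$r = \frac{\frac{\sum xy}{n} - \frac{\sum x}{n}\cdot\frac{\sum y}{n}}{\sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}\,\sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2}} \qquad \ldots (3)$$
or
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \qquad \ldots (4)$$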
Formula (3) or (4) is the one most commonly used for computing
the correlation coefficient from raw data. If the sum of the products of
deviations and the standard deviations are known directly, then formula (1)
or (2) can be used.
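The algebra connecting formula (2) to the simplified forms is not spelled out here. A short sketch of the missing step, assuming only the definitions $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$, is:

```latex
\begin{align*}
\sum (x-\bar{x})(y-\bar{y})
  &= \sum xy - \bar{x}\sum y - \bar{y}\sum x + n\,\bar{x}\,\bar{y}
   = \sum xy - \frac{\sum x \sum y}{n}, \\
\sum (x-\bar{x})^2 &= \sum x^2 - \frac{(\sum x)^2}{n}, \qquad
\sum (y-\bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}.
\end{align*}
```

Substituting these three identities into the ratio of sums of deviations and multiplying the numerator and denominator by n gives the form written with $n\sum xy - \sum x \sum y$ in the numerator; dividing by n instead gives the form written with $\sum xy / n$.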
Steps to find Karl Pearson’s Correlation Coefficient ‘r’ from raw data:
(1) The columns of xy, x² and y² are introduced.
(2) The respective column totals $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $\sum y^2$ are calculated.
(3) Using formula (3) or (4) above, r is calculated.
Note: The value of r always lies between –1 and 1. If we get a value outside
this range, it means our calculations are not correct!
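The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pearson_r` and the sample data are hypothetical and are not taken from the text. It computes the five column totals and substitutes them into the formula with $n\sum xy - \sum x \sum y$ in the numerator.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r from raw data, using the column totals of x, y, xy, x^2, y^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # total of the xy column
    sum_x2 = sum(x * x for x in xs)               # total of the x^2 column
    sum_y2 = sum(y * y for y in ys)               # total of the y^2 column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Note that `sum_x2` (the sum of squares) and `sum_x ** 2` (the square of the sum) are different quantities; confusing the two is the most common source of a value of r outside the range –1 to 1.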
Example 1: Calculate the Karl Pearson’s coefficient of correlation for the
following data and comment:

  x    10    8   11    7    9   12
  y     8    5   10    6    7   11

Solution: Introducing the columns as mentioned above, the table of
computation is:

    x      y     xy     x²     y²
   10      8     80    100     64
    8      5     40     64     25
   11     10    110    121    100
    7      6     42     49     36
    9      7     63     81     49
   12     11    132    144    121
Total 57  47    467    559    395

Here n = 6, $\sum x = 57$, $\sum y = 47$, $\sum xy = 467$, $\sum x^2 = 559$ and $\sum y^2 = 395$.
Using the calculated totals from the table and formula (4), we have
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{6 \times 467 - 57 \times 47}{\sqrt{6 \times 559 - (57)^2}\,\sqrt{6 \times 395 - (47)^2}}$$
(Students should be careful in substituting and calculating the values
properly in the above formula, especially in the case of $\sum x^2$ and $(\sum x)^2$.
$\sum x^2$ is the sum of the squared values, whereas $(\sum x)^2$ is the square of the sum; as can be seen from the table, the two are not the same!)
$$r = \frac{2802 - 2679}{\sqrt{3354 - 3249}\,\sqrt{2370 - 2209}} = \frac{123}{\sqrt{105}\,\sqrt{161}} = \frac{123}{10.25 \times 12.69} \approx 0.95$$
Since r = 0.95 > 0, there is an imperfect positive correlation between x and
y (to be more precise, a strong imperfect positive correlation).
Example 2: Calculate the coefficient of correlation from the following
given information and comment: n = 10, $\sum x = 608$, $\sum y = 640$, $\sum xy = 39965$, $\sum x^2 = 39054$ and $\sum y^2 = 42096$.
Solution: All the required totals are provided, hence using formula
(4), we have
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{10 \times 39965 - 608 \times 640}{\sqrt{10 \times 39054 - (608)^2}\,\sqrt{10 \times 42096 - (640)^2}}$$
$$r = \frac{399650 - 389120}{\sqrt{390540 - 369664}\,\sqrt{420960 - 409600}} = \frac{10530}{\sqrt{20876}\,\sqrt{11360}} = \frac{10530}{144.48 \times 106.58} \approx 0.68$$
Thus, there is an imperfect positive correlation between x and y.
Example 3: Calculate the coefficient of correlation from the following
given information and comment: n = 6, $\sum x = 105$, $\sum y = 305$, $\sum xy = 5110$, $\sum x^2 = 1855$ and $\sum y^2 = 18525$.
Solution: All the required totals are provided, hence using formula
(4), we have
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{6 \times 5110 - 105 \times 305}{\sqrt{6 \times 1855 - (105)^2}\,\sqrt{6 \times 18525 - (305)^2}}$$
$$r = \frac{30660 - 32025}{\sqrt{11130 - 11025}\,\sqrt{111150 - 93025}} = \frac{-1365}{\sqrt{105}\,\sqrt{18125}} = \frac{-1365}{10.25 \times 134.63} \approx -0.99$$
Since r = –0.99 < 0, there is a strong imperfect negative correlation.
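Example 4: Calculate the coefficient of correlation from the following
given information and comment: $\sum (x - \bar{x})(y - \bar{y}) = 1240$, $\sum (x - \bar{x})^2 = 1650$ and $\sum (y - \bar{y})^2 = 2430$.
Solution: From formula (2), we know that
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{1240}{\sqrt{1650}\,\sqrt{2430}} = \frac{1240}{40.62 \times 49.29} \approx 0.62$$
Thus, there is an imperfect positive correlation.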
Example 5: Calculate the product moment correlation coefficient and
comment:

  x    75   60   55   50   48   45
  y    70   80   82   85   90   94

Solution: In problems where the values of the variables are large, a change
of origin can be used to simplify the problem. We know from property (8)
of r that it is independent of change of origin and scale.
Let us assume u = x – 55 and v = y – 80. We know by the above mentioned
property that $r_{uv} = r_{xy}$. Now introducing columns of the type uv, u² and v²,
we prepare the table of calculation as follows:

    x     y      u      v     uv     u²     v²
   75    70     20    -10   -200    400    100
   60    80      5      0      0     25      0
   55    82      0      2      0      0      4
   50    85     -5      5    -25     25     25
   48    90     -7     10    -70     49    100
   45    94    -10     14   -140    100    196
Total            3     21   -435    599    425

Here n = 6, $\sum u = 3$, $\sum v = 21$, $\sum uv = -435$, $\sum u^2 = 599$ and $\sum v^2 = 425$. The change of origin formula is:
$$r_{uv} = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}} = \frac{6 \times (-435) - 3 \times 21}{\sqrt{6 \times 599 - (3)^2}\,\sqrt{6 \times 425 - (21)^2}}$$
$$r_{uv} = \frac{-2610 - 63}{\sqrt{3594 - 9}\,\sqrt{2550 - 441}} = \frac{-2673}{\sqrt{3585}\,\sqrt{2109}} = \frac{-2673}{59.87 \times 45.92} \approx -0.97 = r_{xy}$$
Thus, there is a strong negative correlation between x and y.
Example 6: Calculate the Karl Pearson’s correlation coefficient and
comment:

  x   100  150  200  300  400  550
  y    20   30   40   50   60   70

Solution: If the values of the variables are multiples of a common number,
then the problem can be simplified by using a change of scale, as we know
that the correlation coefficient is independent of change of scale.
Let $u = \frac{x - 300}{50}$ and $v = \frac{y - 40}{10}$. Now introducing columns of the type uv, u²
and v², we prepare the table of calculation as follows:

    x     y     u     v    uv    u²    v²
  100    20    -4    -2     8    16     4
  150    30    -3    -1     3     9     1
  200    40    -2     0     0     4     0
  300    50     0     1     0     0     1
  400    60     2     2     4     4     4
  550    70     5     3    15    25     9
Total          -2     3    30    58    19

Here n = 6. Since $r_{uv} = r_{xy}$, the formula and the calculations are as follows:
$$r_{uv} = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}} = \frac{6 \times 30 - (-2) \times 3}{\sqrt{6 \times 58 - (-2)^2}\,\sqrt{6 \times 19 - (3)^2}}$$
$$r_{uv} = \frac{180 + 6}{\sqrt{348 - 4}\,\sqrt{114 - 9}} = \frac{186}{\sqrt{344}\,\sqrt{105}} = \frac{186}{18.55 \times 10.25} \approx 0.98$$
$$r_{xy} = r_{uv} \approx 0.98$$
Thus, there is a strong positive correlation between x and y.
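The independence of r from change of origin and scale can also be checked numerically. The following is a minimal sketch, assuming the deviation form of the formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations); the helper name `pearson_r_dev` is hypothetical. It uses the data of Example 6 and the same substitutions u = (x – 300)/50 and v = (y – 40)/10.

```python
import math

def pearson_r_dev(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Data of Example 6.
x = [100, 150, 200, 300, 400, 550]
y = [20, 30, 40, 50, 60, 70]

# Change of origin and scale, as in the worked solution.
u = [(xi - 300) / 50 for xi in x]
v = [(yi - 40) / 10 for yi in y]

r_xy = pearson_r_dev(x, y)
r_uv = pearson_r_dev(u, v)
print(math.isclose(r_xy, r_uv))  # True: both computations agree
```

The check only holds when the scale factors are positive (or both negative); dividing just one variable by a negative number would flip the sign of r.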
7.7.1 Coefficient of correlation for a grouped data:
If the values of the variables are given as grouped data with their
corresponding frequencies, a bivariate frequency table has to be prepared. The
formula for calculation is as follows:
$$r = \frac{N\sum fxy - \sum fx \sum fy}{\sqrt{N\sum fx^2 - (\sum fx)^2}\,\sqrt{N\sum fy^2 - (\sum fy)^2}} \qquad \ldots (5)$$
This formula is similar to the one we have been using till now. The difference
in notation is that the corresponding frequencies are now multiplied
into each term of the formula.
The steps to calculate the correlation coefficient have to be understood very
carefully, as the bivariate table has class intervals and
frequencies of both the variables.
Note: While practicing the solved problems, students should refer to the steps
given below, one by one, and compare them with the calculations done in
the table.
Steps to find correlation coefficient for a Grouped Bivariate data
(1) Given a bivariate grouped data, we introduce the column and row of
their mid points (say x and y). If the data is discrete ( i.e. without class
intervals) this step is to be skipped as the values of x and y are already
given. If change of scale or origin is needed, it should be done in this
step.
(2) Total the frequencies in each column and in each row, write them in
a new column and row named as f. The horizontal and vertical total
of these column values and row values is same and is denoted as N.
(3) Multiply the frequency in each cell with its corresponding values of x
and y and write the product in parentheses inside the same cell. These values can
also be put in a box or circled to differentiate them from the frequency
in that cell.
(4) Now introduce the columns and rows of fx, fx², fxy and fy, fy², fxy (in
case of change of origin or scale these columns and rows are fu, fu²,
fuv and fv, fv², fuv).
(5) The values in the cells under fx are found by multiplying the total
frequency in that row (or column) with the corresponding mid point.
The values of fx² are found by multiplying the values of fx with x. This
step has to be repeated for finding values for fy and fy². The totals of
all columns and rows of fx, fx², fy, fy² are to be calculated.
(6) Now, the entries in the column (and row) of fxy are the
respective totals of the products written in parentheses in
every cell of that row (or column). The horizontal and vertical totals of all fxy’s should be the same.
(7) The totals thus obtained are substituted in formula (5) and the
correlation coefficient is finally computed!
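The steps above can be sketched in Python as a single pass over the cells of the bivariate table. This is an illustrative sketch only: the function name `grouped_pearson_r` and the list-of-lists layout of the table are assumptions made here, not part of the text. The input is the frequency table of Example 7 below, with an empty cell treated as frequency 0.

```python
import math

def grouped_pearson_r(x_values, y_values, freq):
    """Correlation for a bivariate frequency table.

    freq[i][j] is the frequency of the cell (x_values[i], y_values[j]).
    For class intervals, x_values and y_values are the mid points.
    """
    N = sum_fx = sum_fy = sum_fx2 = sum_fy2 = sum_fxy = 0
    for x, row in zip(x_values, freq):
        for y, f in zip(y_values, row):
            N += f                    # total frequency
            sum_fx += f * x
            sum_fy += f * y
            sum_fx2 += f * x * x
            sum_fy2 += f * y * y
            sum_fxy += f * x * y      # the product written in parentheses in each cell
    numerator = N * sum_fxy - sum_fx * sum_fy
    denominator = (math.sqrt(N * sum_fx2 - sum_fx ** 2)
                   * math.sqrt(N * sum_fy2 - sum_fy ** 2))
    return numerator / denominator

# Frequency table of Example 7: rows are x = 1, 3, 5, 7; columns are y = 2, 4, 6, 8.
x_values = [1, 3, 5, 7]
y_values = [2, 4, 6, 8]
freq = [
    [0, 3, 5, 1],
    [1, 0, 0, 4],
    [4, 0, 2, 0],
    [0, 1, 0, 0],
]
r = grouped_pearson_r(x_values, y_values, freq)
print(round(r, 3))
```

Accumulating the totals cell by cell gives the same sums as the row and column totals of the hand-built table, so it is a convenient cross-check on the manual work.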
Solved below are three different examples covering different cases of
bivariate grouped data.
Example 7: Find the correlation coefficient for the following bivariate
grouped data:

  x \ y    2    4    6    8
    1      -    3    5    1
    3      1    -    -    4
    5      4    -    2    -
    7      -    1    -    -

Solution: The given bivariate data is discrete. No entry in a cell
means the frequency for that cell is 0. Skipping the first step, we calculate
the row and column totals of the corresponding frequencies as follows:

  x \ y    2    4    6    8     f
    1      0    3    5    1     9
    3      1    0    0    4     5
    5      4    0    2    0     6
    7      0    1    0    0     1
    f      5    4    7    5   N = 21

Now, using step (3), the frequency in each cell is multiplied with the
corresponding values of x and y and the product is written in parentheses
inside the cell. The final table is then completed by introducing the columns of fx, fx² and
fxy and the rows of fy, fy² and fxy, as shown below.

  x \ y      2         4         6         8        f      fx     fx²    fxy
    1      0 (0)    3 (12)    5 (30)    1 (8)       9       9       9     50
    3      1 (6)    0 (0)     0 (0)     4 (96)      5      15      45    102
    5      4 (40)   0 (0)     2 (60)    0 (0)       6      30     150    100
    7      0 (0)    1 (28)    0 (0)     0 (0)       1       7      49     28
    f        5         4         7         5     N = 21    61     253    280
    fy      10        16        42        40      108
    fy²     20        64       252       320      656
    fxy     46        40        90       104      280

From the table the values of the required terms are:
N = 21, $\sum fx = 61$, $\sum fx^2 = 253$, $\sum fy = 108$, $\sum fy^2 = 656$ and $\sum fxy = 280$.
$$r = \frac{N\sum fxy - \sum fx \sum fy}{\sqrt{N\sum fx^2 - (\sum fx)^2}\,\sqrt{N\sum fy^2 - (\sum fy)^2}} = \frac{21 \times 280 - 61 \times 108}{\sqrt{21 \times 253 - (61)^2}\,\sqrt{21 \times 656 - (108)^2}}$$
$$r = \frac{5880 - 6588}{\sqrt{5313 - 3721}\,\sqrt{13776 - 11664}} = \frac{-708}{\sqrt{1592}\,\sqrt{2112}} = \frac{-708}{39.89 \times 45.96} \approx -0.386$$
Example 8: Calculate the Karl Pearson’s correlation coefficient for the
following data:

Solution: The mid points of the class intervals are found in the first step and
the row and column totals of the frequencies are completed as follows:

Now the procedure followed in the above example is applied and the table is
completed as shown below:

From the table the values of the required terms are:
N = 52, $\sum fx = 3780$, $\sum fx^2 = 289200$, $\sum fy = 1030$, $\sum fy^2 = 23300$ and $\sum fxy = 79300$.
$$r = \frac{N\sum fxy - \sum fx \sum fy}{\sqrt{N\sum fx^2 - (\sum fx)^2}\,\sqrt{N\sum fy^2 - (\sum fy)^2}} = \frac{52 \times 79300 - 3780 \times 1030}{\sqrt{52 \times 289200 - (3780)^2}\,\sqrt{52 \times 23300 - (1030)^2}}$$
$$r = \frac{230200}{\sqrt{750000}\,\sqrt{150700}} = \frac{230200}{866.03 \times 388.2} = 0.6847$$
Example 9: Compute the coefficient of correlation, for the following data:

Solution: The previous problem was solved by the direct method. This
problem we shall solve using the change of scale method. The mid points
of Group I are 5, 15 and 25, so we assume $u = \frac{x - 15}{10}$. The mid points of Group II are 30, 50 and 70, so let us assume
$v = \frac{y - 50}{20}$.
The initial table can be completed as follows:

Now the table is completed by following the remaining steps as shown
below. The initial two rows and columns are not shown below as we have
changed the scale.

From the table the values of the required terms are substituted in the formula
as shown below:
$$r_{uv} = \frac{N\sum fuv - \sum fu \sum fv}{\sqrt{N\sum fu^2 - (\sum fu)^2}\,\sqrt{N\sum fv^2 - (\sum fv)^2}} = \frac{40 \times 4 - (-3) \times 3}{\sqrt{40 \times 29 - (-3)^2}\,\sqrt{40 \times 25 - (3)^2}}$$
$$r_{uv} = \frac{160 + 9}{\sqrt{1160 - 9}\,\sqrt{1000 - 9}} = \frac{169}{\sqrt{1151}\,\sqrt{991}} = \frac{169}{33.93 \times 31.48}$$
$$r_{xy} = r_{uv} \approx 0.158$$
7.7.2 Merits and demerits of Karl Pearson’s coefficient of correlation
Merits
(1) It gives the magnitude and direction of the correlation between two
variables.
(2) It is the most commonly used and popular measure of finding
correlation coefficient.
Demerits
(1) The formula and method are not very easy to remember and understand
quickly.
(2) It does not ascertain the existence of an actual relationship between two
variables. In other words, there may not be any actual relationship
between the variables, yet the value of r may suggest one. The correlation
coefficient does not establish a cause and effect relationship
between variables. There is thus a high chance of misinterpretation.
(3) It is affected by extreme values of the variables taken into
consideration.
(4) It cannot be used for a non-linear relationship. The formula assumes
that there is always a linear relation between the variables, whereas
actually it may not be so.
7.8 SPEARMAN’S RANK CORRELATION
COEFFICIENT
This formula, developed by Charles Spearman, is useful in measuring the
correlation between two variables when the data is given in a certain order.
This order is generally the ranks given to the variables based on some
qualitative information. For example, ranks given to students based on their
performance, ranks given to the contestants of some competition, ranks given to TV
serials based on their TRPs, etc.
The Spearman’s Rank Correlation Coefficient is denoted by R. In general
the formula is as given below:
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \qquad \ldots (6)$$
where n is the number of observations and
d² is the square of the difference between the ranks.
If R1 denotes the ranks given to the first variable and R2 denotes the ranks given to
the second variable, then d = R1 – R2. Every such difference is squared and
finally totaled to get $\sum d^2$.
The value of R, the Spearman’s rank correlation coefficient, like that of the
Karl Pearson’s correlation coefficient, also ranges between –1 and 1.
There are three different types of problems related to the formula. We shall
discuss each separately and understand it with suitable examples.
If ranks are given directly: If the ranks of the variables are already given,
then it is very simple to compute the value of R.
Step I For every observation the difference between the ranks, i.e. d = R1 –
R2, is calculated.
Step II The column of the squares of these differences is introduced and
its sum $\sum d^2$ is calculated.
Step III Formula (6) mentioned above is used to compute the
rank correlation coefficient.
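The three steps translate directly into code. The sketch below is illustrative only: the function name `spearman_from_ranks` and the two rank lists are hypothetical, and it assumes the ranks are distinct (no ties).

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied ranks."""
    if len(ranks_1) != len(ranks_2):
        raise ValueError("both rank lists must have the same length")
    n = len(ranks_1)
    # Steps I and II: differences d = R1 - R2, squared and totaled.
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    # Step III: substitute in the formula. sum_d2 is NOT squared again.
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks of five items by two assessors.
R1 = [1, 2, 3, 4, 5]
R2 = [2, 1, 4, 3, 5]
print(round(spearman_from_ranks(R1, R2), 3))  # 0.8
```

Here the squared differences are 1, 1, 1, 1, 0, so their total is 4 and R = 1 – 24/120 = 0.8.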
Example 10: The following data gives the ranks of 10 students in two
consecutive years, 1990 and 1991:

  1990    2   4   1   7   3   9   6  10   8   5
  1991    3   2   1   5   6   7   8   9  10   4

Find the rank correlation coefficient.
Solution: Let the ranks in the year 1990 be denoted by R1 and those for
the year 1991 be denoted by R2. The above mentioned steps are followed
and the table of calculations is completed as follows:

   R1    R2    d = R1 – R2    d²
    2     3        -1          1
    4     2         2          4
    1     1         0          0
    7     5         2          4
    3     6        -3          9
    9     7         2          4
    6     8        -2          4
   10     9         1          1
    8    10        -2          4
    5     4         1          1
  n = 10                  Total 32

The Spearman’s rank correlation coefficient is calculated as follows:
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 32}{10(10^2 - 1)}$$
(Students should not square 32 here; $\sum d^2 = 32$ is already the sum of the squares!)
$$R = 1 - \frac{192}{990} = 1 - 0.1939 = 0.8061$$
Thus, there is a strong positive correlation between the performances in
the two years.
Example 11: In a singing contest twelve participants were judged by three
judges. The ranks given to the participants by the judges are given below.
Find which pair of judges has the most common approach in judgment.

  Judge A    2   7   6   3   1   9  11   8  12   4  10   5
  Judge B    5   4   7   1   2   6   8  12  11   3   9  10
  Judge C    1   6   7   3   2   8  12   9  11   5  10   4

Solution: In problems of this kind we consider the data of the variables
in pairs. Here we will be finding the rank correlation between Judge A and
B, Judge A and C, and Judge B and C.

Here n = 12.
For Judge A and B: $\sum d_{AB}^2 = 86$
$$R_{AB} = 1 - \frac{6\sum d_{AB}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 86}{12(12^2 - 1)} = 1 - \frac{516}{1716} = 1 - 0.3$$
$$R_{AB} = 0.7 \qquad \ldots (i)$$
For Judge A and C: $\sum d_{AC}^2 = 10$
$$R_{AC} = 1 - \frac{6\sum d_{AC}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 10}{12(12^2 - 1)} = 1 - \frac{60}{1716} = 1 - 0.035$$
$$R_{AC} = 0.965 \qquad \ldots (ii)$$
For Judge B and C: $\sum d_{BC}^2 = 94$
$$R_{BC} = 1 - \frac{6\sum d_{BC}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 94}{12(12^2 - 1)} = 1 - \frac{564}{1716} = 1 - 0.3287$$
$$R_{BC} = 0.6713 \qquad \ldots (iii)$$
Comparing the values of the rank correlation coefficients (i), (ii) and (iii), we observe
that the rank correlation between Judge A and C is the highest among the three pairs.
Thus, Judge A and Judge C have the most common approach in judgment.
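The pairwise comparison in Example 11 can be sketched as a loop over all pairs of judges. This is an illustrative sketch: the dictionary layout and the helper name `rank_correlation` are assumptions made here, and the ranks are taken from the example.

```python
from itertools import combinations

# Ranks given by the three judges in Example 11.
judges = {
    "A": [2, 7, 6, 3, 1, 9, 11, 8, 12, 4, 10, 5],
    "B": [5, 4, 7, 1, 2, 6, 8, 12, 11, 3, 9, 10],
    "C": [1, 6, 7, 3, 2, 8, 12, 9, 11, 5, 10, 4],
}

def rank_correlation(r1, r2):
    """Spearman's R for two lists of untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# One coefficient per pair of judges: (A, B), (A, C), (B, C).
results = {
    pair: rank_correlation(judges[pair[0]], judges[pair[1]])
    for pair in combinations(judges, 2)
}
for (first, second), value in results.items():
    print(first, second, round(value, 3))

# The pair with the largest coefficient has the most common approach.
best_pair = max(results, key=results.get)
print("Closest pair:", best_pair)  # Closest pair: ('A', 'C')
```

With more judges the same loop covers every pair without further changes, which is where a sketch like this saves the most manual work.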
7.8.1 Ranks are not given and are non-repeated ranks
If, instead of ranks, the actual data related to the variables is given, we
rank the data. The ranks can be given in either ascending or
descending order, provided the same order is used for both variables. Here the highest value gets rank 1, and so on till the smallest
value gets the nth rank. Once the ranks are given, the remaining steps are
similar to the previous type. In this sub-section we shall see an example where
the values are not repeated.
Example 12: Find the rank correlation coefficient for the following data
giving marks of FYBMS students in the subjects of Mathematics and
Statistics:

  Marks in Maths    65   45   78   35   52   73   67   49   40
  Marks in Stats    60   40   70   28   72   59   69   56   55

Solution: The marks in the subjects are ranked with the highest getting 1st
rank and the least one getting 9th rank. The table of computation is further
completed as shown below:

  Marks in Maths    R1    Marks in Stats    R2     d     d²
        65           4          60           4     0      0
        45           7          40           8    -1      1
        78           1          70           2    -1      1
        35           9          28           9     0      0
        52           5          72           1     4     16
        73           2          59           5    -3      9
        67           3          69           3     0      0
        49           6          56           6     0      0
        40           8          55           7     1      1
      n = 9                                      Total   28

From the table: n = 9 and $\sum d^2 = 28$.
$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 28}{9(9^2 - 1)} = 1 - \frac{168}{720} = 1 - 0.23$$
$$R = 0.77$$
Thus, there is an imperfect positive correlation between the marks in both the
subjects.
7.8.2 Repeated ranks
If two or more observations in a data set have the same value, then they
cannot be given distinct ranks. In such a case, each of these observations is given a rank equal to
the arithmetic mean of the ranks that would have been given had they been
different and in order.
Let us consider an example of data related to marks of students. After
giving the first two ranks, if there are three students with the same marks, then each
is given a rank equal to the average of 3, 4 and 5. Had they been
distinct and in order, each would have got one of these ranks. Thus, each
observation here gets the rank 4, which is the average of 3, 4 and 5.
If after giving the first five ranks there are two students with the same marks, then
again both of them are given a rank equal to the mean of the next two ranks,
i.e. the mean of 6 and 7, which is 6.5. Thus, each observation in this case gets
rank 6.5.
In deriving the formula for rank correlation it is assumed that each
observation is ranked distinctly. When repeated ranks are assigned, a
correction factor (C.F.) is to be calculated for every repeated rank, and their
total, called the total correction factor (T.C.F.), is added to the $\sum d^2$ value in the
actual formula. The correction factor for a particular rank is calculated by
the formula
$$\text{C.F.} = \frac{m(m^2 - 1)}{12}$$
where m is the number of times that rank is
repeated. In the above example, in the first case m = 3, so $\text{C.F.} = \frac{3(3^2 - 1)}{12} = \frac{24}{12} = 2$. In the second case m = 2, so $\text{C.F.} = \frac{2(2^2 - 1)}{12} = \frac{6}{12} = 0.5$. Hence T.C.F. = 2 + 0.5
= 2.5.
The new formula of rank correlation for repeated ranks is:
$$R = 1 - \frac{6\left(\sum d^2 + \text{TCF}\right)}{n(n^2 - 1)}$$
Note: In the numerator of the formula, the TCF is added to $\sum d^2$ and their
total is multiplied by 6.
data:
Solution: Let us observe that in the first data, 10 is repeated two time and
7 is also repeated two times. In the second dat a, 6 is repeated three times.
Following the procedure explained in the above section, the table of
calculations is completed as shown below:
Data I R1 Data II R2 d = R1 – R2 d 2 10 6.5 6 8 -1.5 2.25 7 8.5 11 3 5.5 30.25 12 4 5 10 -6 36 10 6.5 9 5 1.5 2.25 16 2 6 8 -6 36 14 3 10 4 -1 1 5 10 15 2 8 64 11 5 18 1 4 16 18 1 6 8 -7 49 7 8.5 8 6 2.5 6.25 n = 10 2d6= 243
Before proceeding with the further calculations let us understand how the
ranks are given to the values in Dat a I and Data II.
In Data I, the first five ranks are in order. The next value 10 is repeated
twice and is given rank equal to the mean of the next two ranks i.e. of 6 and
7. The mean of 6 and 7 is 6.5, so rank 6.5 is given to both the values. The
next valu e below 10 is 7 which is also repeated twice. The next two ranks Data I 10 7 12 10 16 14 5 11 18 7 Data II 6 11 5 9 6 10 15 18 6 8
munotes.in

Page 139

139 Correlation in the order now are 8 and 9 (as 6 and 7 have been utilized). Thus, these
values get a rank equal to the mean of 8 and 9 which is 8.5.
In Data II, the first six ranks are in order. The next l ower value is 6 which
is repeated thrice. Hence it is given a rank equal to the mean of the next
three ranks which are 7, 8 and 9. Hence, these values are ranked as 8, which
is the average of 7, 8 and 9.
Calculation of Correction Factor :
For Data I: (i) 10 is repeated two times, so m = 2. ?CF = 0.5
(ii) 7 is repeated two times, so m = 2. ?CF = 0.5
For Data II: (i) 6 is repeated three times, so m = 3. ?CF = 2
?TCF = 0.5 + 0.5 + 2 = 3
Now, n = 10, 2d6= 243 and TCF = 3
2
226( TCF) 6 x (243+3) 14761 1 1 1 01.49990 ( 1) 10(10 1)dR
nn6

0.49 R?
Thus, there is a negative correlation between the given two sets of data.
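The ranking rule for repeated values and the correction factor can be sketched as follows. This is a minimal sketch under the conventions of this section (highest value gets rank 1, tied values share the mean of their ranks, and the total correction factor is added to the sum of squared differences); the function names are hypothetical. The data is that of Example 13.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    # m tied values occupy ranks p, p+1, ..., p+m-1, whose mean is p + (m - 1) / 2.
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def total_correction_factor(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    """R = 1 - 6 * (sum(d^2) + TCF) / (n * (n^2 - 1))."""
    n = len(xs)
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    tcf = total_correction_factor(xs) + total_correction_factor(ys)
    return 1 - 6 * (sum_d2 + tcf) / (n * (n * n - 1))

data_1 = [10, 7, 12, 10, 16, 14, 5, 11, 18, 7]
data_2 = [6, 11, 5, 9, 6, 10, 15, 18, 6, 8]
R = spearman_with_ties(data_1, data_2)
print(average_ranks(data_1))  # [6.5, 8.5, 4.0, 6.5, 2.0, 3.0, 10.0, 5.0, 1.0, 8.5]
print(round(R, 2))            # -0.49
```

Statistical libraries often handle ties differently (for instance, by applying Pearson's formula to the averaged ranks), so their result for tied data may differ slightly from the correction-factor method described in this section.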
Example 14: If the sum of the squares of differences in the ranks of two
variables is 82.5 and the rank correlation coefficient is 0.5, find the number
of observations.
Solution: Given: R = 0.5 and $\sum d^2 = 82.5$.
We know that $R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$. Substituting the given values, we get
$$0.5 = 1 - \frac{6(82.5)}{n(n^2 - 1)} = 1 - \frac{495}{n(n^2 - 1)}$$
$$\therefore \frac{495}{n(n^2 - 1)} = 1 - 0.5 = 0.5$$
$$\therefore n(n^2 - 1) = \frac{495}{0.5} = 990 = 10 \times 99 = 10(10^2 - 1)$$
∴ n = 10.
7.9 LET US SUM UP
In this chapter we have learned:
• Definition of correlation and its types.
• Properties of correlation.
• Graphical method of correlation to explain types of correlation.
• Karl Pearson’s method to calculate coefficient of correlation.
• Spearman’s rank correlation.
7.10 UNIT END EXERCISES
1. Define Correlation and coefficient of correlation.
2. Explain the importance of Correlation.
3. Describe the different types of correlation with suitable diagrams
and examples.
4. Interpret the values of r.
5. What are different types of correlation coefficients?
6. Discuss the merits and demerits of correlation coefficients.
7. Compute the coefficient of correlation for the following data:
   X    7   9   8   5   6   3   4   1   2
   Y   18  20  19  21  24  26  25  23  27
8. Compute the coefficient of correlation for the following data:
   X   90  102  106  110  120  115  119  114  111
   Y   60   70   64   67   69   73   71   68   66
9. Find the coefficient of correlation for the data given below
representing the income and expenditure of families in a city. The
income and expenditure values are in ’000 Rs.:
   Income        7   9  11  13  15  17
   Expenditure   4   5   7  10  13  16
10. The following data gives the percentage of students in their SSC and
HSC examination. Find the coefficient of correlation.
   SSC   60  55  48  72  81  88  59  42  77
   HSC   55  54  50  78  80  86  67  48  74
11. The following data gives the details of imports and exports in terms
of money (in lakhs of Rs.) for a country. Find the coefficient of
correlation.
   Imports   22  25  21  26  29  31  28  32  33  35
   Exports   36  38  40  46  42  40  44  50  52  57
12. The following data gives marks obtained by 8 students in two tests.
Find the coefficient of correlation between the performances in two
tests.
   Test I    5  17  13   8  22  18  12  14
   Test II   8  20  12   5  18  20  10  11
If the class teacher later on gives 5 marks for attendance to all the
students, what will be the correlation coefficient then?
13. The following data gives the sensitive index number in BSE and
NSE for 6 consecutive days. Find the correlation coefficient between
them.
   BSE   15540  15600  16022  14870  14900  15120
   NSE     652    660    710    590    625    634
14. The following data gives the average rainfall (in cm) in an area and
yield of a crop (in tons). Find the correlation coefficient between
them. Are the two related to each other?
   Rainfall   120  168  170  165  150  172  180  175
   Yield       79   82   90  121  156  175  190  230
15. The following data gives the heights of fathers and their sons. Find
whether there is any correlation between the two heights.
   Height of father   162  156  166  172  171  178  153  160  177  180
   Height of son      156  160  155  170  165  168  160  154  180  182
16. The following data give the amount of chemical fertilizer (X) used
by a farmer over a period of 10 years, the yield of the crop (Y) and
the percentage of minerals (Z) in the farm. Find the correlation
between X and Y, X and Z.
   X   10  12  14   16   18   20   22   24   26   28
   Y   88  90  96  102  110  109  108  105  106  104
   Z   70  68  66   65   64   62   61   60   58   56
17. The following data gives the amount of pocket money given to
college going students and their expenditure on eatables. Find the
correlation coefficient.
   Money                     100  150  200  250  300  350
   Expenditure on eatables    45   80  120  150  180  230
18. The following data gives the percentage of toppers in their SSC and
HSC examination. Find the degree of correlation between
the two results.
   SSC   90  92  91  96  97  95
   HSC   86  90  85  92  91  70
19. Raut Pharma Ltd. wants to know about the impact of advertisement
on the sales of their product. The expenditure on advertisement (in
’000 Rs.) and the total profit on sales is given below for a period of
5 years. Find the correlation coefficient and comment on what Raut
Pharma Ltd. should conclude.
   Advertisement expenditure   40  45  48  50  55
   Profit in lakhs of Rs.      10  12  11  10  13
20. The following data gives the ages in years of males and their Blood
Pressure count. Find the coefficient of correlation.
   Age               42   55   63   77   38   49   50   57   60   71
   Blood Pressure   122  132  130  143  127  146  152  155  145  158
21. The covariance between two variables is 105 and the standard
deviations are 10.5 and 14.2. Find the correlation coefficient.
22. The covariance between two variables is 1065 and the variances are
1210 and 13082. Find the correlation coefficient.
23. If the coefficient of correlation is 0.76 and the standard deviations are
11.54 and 12.8, find the covariance between the two variables.
24. Find the number of observations if r = 0.9, $\sigma_x = 6$, $\sigma_y = 5$ and $\sum (x - \bar{x})(y - \bar{y}) = 270$.
25. If for a pair of data, n = 12, $\sum x = 110$, $\sum y = 90$, $\sum x^2 = 1260$, $\sum y^2 = 950$
and $\sum xy = 1010$, find the coefficient of correlation.
26. The following bivariate table gives the ages of couples and the ages of
their children. Find the correlation coefficient.
classes. Find the coefficient of correlation.

28. The following bivariate table gives frequency distribution of sales of
two products of a company. Find the coefficient of correlation.

29. The marks obtained by 20 students in two subjects are given below in
pairs. Prepare a bivariate frequency distribution by taking proper class
intervals and find the Karl Pearson’s coefficient of correlation
between the two results:
(10, 12), (7, 9), (8, 16), (12, 6), (5, 9),(4, 2), (8, 10), (11, 16), (13, 7),
(9, 12),(17, 19), (15, 19), (10, 11), (6, 8), (3, 10),(5, 5), (2, 9), (9, 4),
(7, 2) and (18, 6).
30. The following data gives the ranks given to students in two subjects.
Find the rank correlation coefficient.
   Sub: A   6  1  3  5  2  4  7  8
   Sub: B   3  4  7  2  8  1  5  6
31. The ranks given by judges of a competition to participants are given
below. Find the rank correlation coefficient.
   Judge A   3  7  1  4  5  2  9  6   8  10
   Judge B   2  5  1  3  6  4  7  8  10   9
32. The marks obtained by 10 students in two subjects Business Law (X)
and Business Statistics (Y) are given below. Find the rank correlation
coefficient.
   X   30  78  45  60  59  48  38  77  65  81
   Y   42  70  58  65  45  49  40  66  71  63
33. The following data gives the marks obtained by 8 students in their
theory and practical examination. Find the rank correlation
coefficient.
   Theory      45  52  56  41  35  29  53  46
   Practical   12  15  16  13  10   8  17  11
34. Find the rank correlation coefficient for the following data:
   Height in cm   120  136  178  120  135  160  175  152
   Weight in kg    33   41   48   41   50   54   58   41
35. The marks obtained by students in two semesters are given below.
Rank the marks and find the rank correlation coefficient.
   Sem I    345  440  560  340  345  560  400  380  345  490
   Sem II   380  400  520  610  422  400  520  353  518  500
36. Ten candidates, who appeared for a PI, were interviewed by the MD
(X), Deputy Manager (Y) and HR Manager (Z). The candidates were
given ranks by all the three as shown below. Find, using Spearman’s
Rank Correlation Coefficient, which two of them have the most
common approach.
   X   5  9  1  6  3   8  4  7   2  10
   Y   3  7  2  8  5   6  1  4  10   9
   Z   4  8  1  7  6  10  5  3   2   9
37. The salesmen in a company were given marks on the basis of their
performance by the Board of Directors A, B and C. The marks were
as follows:
   A   11  14  10  18   7  16   9  5  13  20
   B   16  18  19  13  20  11  12  6  10  15
   C   10  12  11  19   8  17   7  9  14  18
Rank the above data and find the rank correlation between all pairs of
Directors, and comment which pair has the most common approach
in their assessment.
38. The rank correlation coefficient between two data is given as 0.25. If
the sum of the squares of the differences in their ranks is 63, find the
number of observations.
39. The rank correlation coefficient between two data is given as 0.5. If
the sum of the squares of the differences in their ranks is 110, find the
number of observations.
40. If the sum of the squares of the differences in the ranks for a bivariate
data with repeated ranks is 278, the rank correlation coefficient is –0.7
and the number of observations is 10, find the total correction factor.
Multiple Choice Questions:
1) If r =1, then there is ____________ correlation between the two
variables.
a) No b) Perfect negative
c) Perfect positive d) Elastic
2) Product moment correlation coefficient is also known as
__________
a) Pearson’s b) Spearman’s
c) Laspeyre’s d) Paasche’s
3) Coefficient of correlation lies between ________.
a) -1 and +1 b) -2 and +2
c) 0 and -1 d) None of these
4) Find the rank correlation coefficient if $\sum d^2 = 204$ and n = 10.
a) 0.273 b) -0.237
c) 0.237 d) 0.5
5) When the values of two variables move in the same direction,
correlation is said to be ............................
a) Linear b) Non-linear c) Positive d) Negative
6) The correlation between shoe-size and intelligence is
a) Zero b) Positive c) Negative
d) None of these
7) Scatter diagram helps us to
a) Find the nature of correlation between two variables.
b) Compute the extent of correlation between two variables.
c) Obtain the mathematical relationship between two variables.
d) Both (a) and (c).
8) The covariance between two variables is
a) Strictly positive b) Strictly negative
c) Always zero d) Either positive or negative or zero
9) For finding correlation between two attributes we consider
a) Pearson’s correlation coefficient b) Scatter diagram
c) Spearman’s rank correlation coefficient
d) Coefficient of correlation
10) The correlation between demand for goods and their prices under normal times
is
a) Positive b) Negative c) Zero d) None of these
7.11 LIST OF REFERENCES
• Fundamentals of mathematical Statistics by S.C. Gupta and V.K
Kapoor.
• Basic Statistics by B. L. Agrawal.
8
REGRESSION ANALYSIS
Unit Structure
8.0 Objectives:
8.1 Introduction:
8.2 Importance of regression analysis
8.3 Methods of studying regression
8.3.1 Method of Least Squares
8.4 Properties of regression
8.5 Let us sum up
8.6 Unit end Exercises
8.7 List of References
8.0 OBJECTIVES:
After going through this chapter you will able to know:
• Meaning of regression.
• Types of methods to solve regression equation.
• Least square method to solve linear regression method.
• Properties of regression analysis.
8.1 INTRODUCTION:
In the previous chapter on Correlation we have seen that the correlation
coefficient measures the magnitude and direction of correlation between
two variables. Statistical analysts are not satisfied with only the degree
of correlation; they are also interested in knowing the mathematical relation
between the variables under consideration. Obviously, this is only possible
when the correlation is due to a cause and effect relation between the
concerned variables. If a relation between the expenditure on advertisement
and the sales of a product is known to the company, it can easily predict the
sales for a particular amount of advertisement. Thus, such a relation is
useful in estimation, an important tool of statistical analysis.
For example, if the correlation coefficient between the heights and weights
of people is 0.8, it leads to the conclusion that the height of a person is strongly
and positively related to his or her weight. The obvious question of interest
is whether this relation can be represented by a linear (or non-linear)
equation. To answer all such questions, regression analysis is used.
The study of defining a mathematical relation between two or more
variables, which facilitates forecasting or estimating the value of one variable
given the value of the other variable, is called regression analysis.
Thus, the correlation coefficient gives the degree of correlation and regression
gives the mathematical relation between the variables.
8.2 IMPORTANCE OF REGRESSION ANALYSIS:
• Regression analysis is a statistical technique to represent the
relationship between variables. It is widely used in various fields
such as social sciences, psychology, economics, bioinformatics and
business.
• An important aspect of regression analysis is its use in estimating the value
of the dependent variable from the value of the independent variable.
• The mathematical equations representing the relation between
variables are called regression equations and the coefficients of the
variables in these equations are called regression coefficients.
• The correlation coefficient and the regression coefficients are also
mathematically related. This helps in computing the correlation
coefficient if the regression coefficients are known.
8.3 METHODS OF STUDYING REGRESSION:
There are two methods to determine the regression equations:
(1) Free Hand Curve method: This is a graphical way of drawing
regression lines using a scatter diagram.
(2) Method of Least Squares: This is used to determine the regression
equations assuming that the relation is linear.
As per the scope of our syllabus, we shall study the second method.
8.3.1 Method of Least Squares:
In this method, the sum of the squares of the deviations of the observed values
of a variable from the values estimated by the best fitting linear
equation is minimized.
If y = a + bx is the best fitting line for a given set of N pairs of values of the
variables x and y, then by using the method of least squares we have the
conditions as follows:
$\sum y = Na + b\sum x$ … (1)
$\sum xy = a\sum x + b\sum x^2$ … (2)
These are called the normal equations. Solving these normal equations
simultaneously, we get the values of a and b, the regression coefficients.
The equation y = a + bx is called the regression equation of y on x.
If x = a′ + b′y is the best fitting line for a given set of data related to the
variables x and y, then by using the method of least squares we have the
conditions as follows:
$\sum x = Na' + b'\sum y$ … (1)
$\sum xy = a'\sum y + b'\sum y^2$ … (2)
These are called the normal equations. Solving these normal equations
simultaneously, we get the values of a′ and b′, the regression coefficients.
The equation x = a′ + b′y is called the regression equation of x on y.
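The two normal equations form a pair of simultaneous linear equations in a and b, so they can be solved mechanically. The following is a minimal sketch, not a prescribed procedure: it builds the totals from raw data and solves the pair by Cramer's rule. The function name `solve_normal_equations` and the five data pairs are assumptions chosen only for illustration.

```python
def solve_normal_equations(xs, ys):
    """Fit y = a + b*x by solving the two normal equations
        sum(y)  = N*a      + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    with Cramer's rule."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx  # zero only when all x values are equal
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Illustrative data (an assumption, not taken from the text)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = solve_normal_equations(xs, ys)            # regression of y on x
a_dash, b_dash = solve_normal_equations(ys, xs)  # x on y: swap the roles
print(f"y on x: y = {a:.2f} + {b:.2f} x")
print(f"x on y: x = {a_dash:.2f} + {b_dash:.2f} y")
```

Swapping the two lists is enough to obtain the regression equation of x on y, because its normal equations have the same form with the roles of x and y interchanged.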
1) Direct formula:
For all computational purposes, the following formula is more
popular as it uses the values of the variables directly from the raw data.
(i) If $y = a + b_{yx}x$ is the regression equation of y on x, then the regression
coefficients a and $b_{yx}$ are computed as follows:
$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $a = \bar{y} - b_{yx}\,\bar{x} = \dfrac{\sum y}{n} - b_{yx}\,\dfrac{\sum x}{n}$ … (3)
The regression coefficient b corresponds to the regression
equation of y on x; hence it is denoted by $b_{yx}$ and is called the
regression coefficient of y on x.
(ii) If $x = a' + b_{xy}y$ is the regression equation of x on y, then the regression
coefficients a′ and $b_{xy}$ are computed as follows:
$b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$ and $a' = \dfrac{\sum x}{n} - b_{xy}\,\dfrac{\sum y}{n}$ … (4)
The regression coefficient b′ corresponds to the regression
equation of x on y; hence it is denoted by $b_{xy}$ and is called the
regression coefficient of x on y.
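Formulas (3) and (4) need only the column totals, so they translate directly into a short routine. The sketch below is illustrative; the function name `regression_from_totals` and its parameter names are assumptions. It is run on the totals used in the computation of Example 1 of this chapter (with Σxy = 482). Small differences from hand-worked values can arise because the hand computation rounds the coefficients before finding the intercepts.

```python
def regression_from_totals(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (a, b_yx, a_dash, b_xy) from the column totals, using the
    direct formulas (3) and (4)."""
    numerator = n * sum_xy - sum_x * sum_y      # shared by both coefficients
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    a = sum_y / n - b_yx * sum_x / n            # a  = mean(y) - b_yx * mean(x)
    a_dash = sum_x / n - b_xy * sum_y / n       # a' = mean(x) - b_xy * mean(y)
    return a, b_yx, a_dash, b_xy


# Totals as used in the computation of Example 1 in this chapter
a, b_yx, a_dash, b_xy = regression_from_totals(
    n=10, sum_x=75, sum_y=70, sum_xy=482, sum_x2=1442, sum_y2=1320
)
print(f"y on x: y = {a:.2f} + ({b_yx:.3f}) x")
print(f"x on y: x = {a_dash:.2f} + ({b_xy:.3f}) y")
```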
If we observe the formulae for $b_{yx}$ and $b_{xy}$ carefully, the denominators
are nothing but the respective variances (squares of the respective
standard deviations) multiplied by n², and the numerator is the same as in
the formula for computing the correlation coefficient. This is no coincidence. The
regression coefficients are indeed related to the standard deviations
and the correlation coefficient. This leads us to the second type of
formula, as mentioned below.
2) The relation between the regression coefficients, standard
deviations and correlation coefficient is as follows:
$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$
$b_{yx} \cdot b_{xy} = \left(r\,\dfrac{\sigma_y}{\sigma_x}\right)\left(r\,\dfrac{\sigma_x}{\sigma_y}\right) = r^2$
$\therefore r = \pm\sqrt{b_{yx} \cdot b_{xy}}$ … (5)
We have mentioned in section 8.2 that the regression
coefficients can be used to compute the correlation coefficient.
Formula (5) above does exactly that.
The sign of r depends on the signs of the regression coefficients
$b_{yx}$ and $b_{xy}$.
If both $b_{yx}$ and $b_{xy}$ are positive then r is also positive.
If both $b_{yx}$ and $b_{xy}$ are negative then r is also negative.
3) If the respective means and the regression coefficients are
known, then the two regression equations are given by the
formulae:
Regression equation of y on x: $(y - \bar{y}) = b_{yx}(x - \bar{x})$ … (6)
Regression equation of x on y: $(x - \bar{x}) = b_{xy}(y - \bar{y})$ … (7)
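The relation between the regression coefficients, the standard deviations and r follows from the direct formulae (3) and (4) once the numerator and denominator are divided by n². The short derivation below is a sketch that assumes the definitions $\sigma_x^2 = \frac{1}{n}\sum x^2 - \bar{x}^2$ and $r = \operatorname{cov}(x,y)/(\sigma_x \sigma_y)$; the symbol cov(x, y), standing for $\frac{1}{n}\sum xy - \bar{x}\,\bar{y}$, is introduced here only for convenience.

```latex
\begin{align*}
b_{yx} &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}
        = \frac{\frac{1}{n}\sum xy - \bar{x}\,\bar{y}}{\frac{1}{n}\sum x^2 - \bar{x}^2}
        = \frac{\operatorname{cov}(x,y)}{\sigma_x^2} \\
       &= \frac{r\,\sigma_x \sigma_y}{\sigma_x^2}
        = r\,\frac{\sigma_y}{\sigma_x},
\qquad
b_{xy} = \frac{\operatorname{cov}(x,y)}{\sigma_y^2}
       = r\,\frac{\sigma_x}{\sigma_y}.
\end{align*}
```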
8.4 PROPERTIES OF REGRESSION
(1) The point of intersection of the two regression equations is $(\bar{x}, \bar{y})$. If
we solve both the regression equations simultaneously, then the
solution, i.e. the point of intersection, is the mean of x and the
mean of y.
(2) The correlation coefficient is the geometric mean of the regression
coefficients, i.e. $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$. Both the regression coefficients have
the same sign (either both are positive or both are negative).
(3) Regression coefficients are independent of change of origin but not
independent of change of scale.
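Properties (1) and (3) can be checked numerically. The sketch below is only an illustration: the helper `fit_line` is an assumed name, and the data are the sales and purchase figures of Example 2 of this chapter. It compares the slope before and after a change of origin, after a change of scale, and evaluates both lines at the means.

```python
def fit_line(xs, ys):
    """Least-squares line ys = intercept + slope * xs (direct formula)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = sy / n - slope * sx / n
    return intercept, slope


# Sales (s) and purchase (p) figures from Example 2 of this chapter
s = [20, 43, 22, 30, 40]
p = [18, 32, 15, 25, 30]

# Property (3): shifting the origin leaves the regression coefficient unchanged ...
_, b_raw = fit_line(s, p)
_, b_shifted = fit_line([v - 22 for v in s], [v - 25 for v in p])
# ... but changing the scale of the independent variable does not.
_, b_scaled = fit_line([v / 10 for v in s], p)

# Property (1): both regression lines pass through the point of means.
mean_s, mean_p = sum(s) / len(s), sum(p) / len(p)
a, b_ps = fit_line(s, p)            # purchase on sales
a_dash, b_sp = fit_line(p, s)       # sales on purchase
p_at_mean = a + b_ps * mean_s       # should equal mean_p
s_at_mean = a_dash + b_sp * mean_p  # should equal mean_s

print(b_raw, b_shifted, b_scaled)
print(p_at_mean, mean_p, s_at_mean, mean_s)
```

Dividing the sales figures by 10 multiplies the regression coefficient by 10, which is why the coefficients are said to depend on the scale.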
ALL FORMULAE AT A GLANCE
Regression equation of y on x: $y = a + b_{yx}x$
$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$, $a = \dfrac{\sum y}{n} - b_{yx}\,\dfrac{\sum x}{n}$
Normal equations: $\sum y = Na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$
Form using the means: $(y - \bar{y}) = b_{yx}(x - \bar{x})$
Regression equation of x on y: $x = a' + b_{xy}y$
$b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$, $a' = \dfrac{\sum x}{n} - b_{xy}\,\dfrac{\sum y}{n}$
Normal equations: $\sum x = Na' + b'\sum y$ and $\sum xy = a'\sum y + b'\sum y^2$
Form using the means: $(x - \bar{x}) = b_{xy}(y - \bar{y})$
The point of intersection of the regression equations is $(\bar{x}, \bar{y})$.
$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$, where r is positive if both $b_{yx}$ and $b_{xy}$ are positive and
r is negative if both $b_{yx}$ and $b_{xy}$ are negative.
The regression coefficients are independent of change of origin.

Example 1: Find the two regression equations given the following
information: $\sum xy = 482$, $\sum x = 75$, $\sum x^2 = 1442$, $\sum y = 70$, $\sum y^2 = 1320$ and n = 10.
Solution: (i) Regression equation of y on x: $y = a + b_{yx}x$
We first find the regression coefficient $b_{yx}$:
$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{10(482) - (75)(70)}{10(1442) - (75)^2} = \dfrac{4820 - 5250}{14420 - 5625} = -0.049$
and $a = \dfrac{\sum y}{n} - b_{yx}\,\dfrac{\sum x}{n} = \dfrac{70}{10} - (-0.049)\,\dfrac{75}{10} = 7 + 0.37 = 7.37$
∴ Regression equation of y on x is y = 7.37 – 0.049x … (1)
(ii) Regression equation of x on y: $x = a' + b_{xy}y$
Now, $b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \dfrac{10(482) - (75)(70)}{10(1320) - (70)^2} = \dfrac{4820 - 5250}{13200 - 4900} = -0.052$
and $a' = \dfrac{\sum x}{n} - b_{xy}\,\dfrac{\sum y}{n} = \dfrac{75}{10} - (-0.052)\,\dfrac{70}{10} = 7.5 + 0.364 = 7.864$
∴ Regression equation of x on y is x = 7.864 – 0.052y … (2)
Example 2: The following data gives the amount of sales and purchase in
lakhs of Rs. of a company. Find the regression equations.
Sales (s)      20  43  22  30  40
Purchase (p)   18  32  15  25  30
Solution: Since the values are large, we shall use the property that the
regression coefficients are independent of change of origin.
Let x = s – 22 and y = p – 25. The table of computations is as shown below:
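Sales (s)   x = s – 22   Purchase (p)   y = p – 25   xy     x²     y²
20          –2           18             –7           14     4      49
43          21           32             7            147    441    49
22          0            15             –10          0      0      100
30          8            25             0            0      64     0
40          18           30             5            90     324    25
Total       Σx = 45                     Σy = –5      Σxy = 251   Σx² = 833   Σy² = 223
(i) To find the regression equation of y on x: $y = a + b_{yx}x$
Using the table values we have,
$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{5(251) - (45)(-5)}{5(833) - (45)^2} = \dfrac{1255 + 225}{4165 - 2025} = \dfrac{1480}{2140} = 0.69$
and $a = \dfrac{\sum y}{n} - b_{yx}\,\dfrac{\sum x}{n} = \dfrac{-5}{5} - (0.69)\,\dfrac{45}{5} = -1 - 6.21 = -7.21$
∴ Regression equation of y on x is y = – 7.21 + 0.69x … (1)
(ii) Regression equation of x on y: $x = a' + b_{xy}y$
Now, $b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \dfrac{5(251) - (45)(-5)}{5(223) - (-5)^2} = \dfrac{1255 + 225}{1115 - 25} = \dfrac{1480}{1090} = 1.36$
and $a' = \dfrac{\sum x}{n} - b_{xy}\,\dfrac{\sum y}{n} = \dfrac{45}{5} - (1.36)\,\dfrac{(-5)}{5} = 9 + 1.36 = 10.36$
∴ Regression equation of x on y is x = 10.36 + 1.36y … (2)
Here x = s – 22 and y = p – 25.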
Example 3: The average marks of 300 students in English and Hindi are 45
and 56 respectively while their respective standard deviations are 10 and
12. If the sum of the products of the deviations from the averages is 32724,
find the regression equations. Also estimate the marks obtained in English
if a student obtains 70 marks in Hindi.
Solution: Let the data related to English be denoted by x and that related to
Hindi be denoted by y. Given: n = 300, $\bar{x}$ = 45, $\bar{y}$ = 56, $\sigma_x$ = 10, $\sigma_y$ = 12 and
$\sum (x - \bar{x})(y - \bar{y})$ = 32724.
The problem is solved in 4 steps: (i) finding r, (ii) computing the regression
coefficients, (iii) finding the regression equations and (iv) estimation.
Now, we know that $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y}$
$\therefore r = \dfrac{32724}{300 \times 10 \times 12} = 0.909$
The regression coefficients can now be obtained as follows:
$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.909 \times \dfrac{12}{10} = 1.09$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.909 \times \dfrac{10}{12} = 0.7575$
The regression equation of y on x is given by $(y - \bar{y}) = b_{yx}(x - \bar{x})$
∴ y – 56 = 1.09(x – 45)
y – 56 = 1.09x – 49.05
y = 1.09x – 49.05 + 56
∴ y = 6.95 + 1.09x … (1)
The regression equation of x on y is given by $(x - \bar{x}) = b_{xy}(y - \bar{y})$
∴ x – 45 = 0.7575(y – 56)
x – 45 = 0.7575y – 42.42
x = 0.7575y – 42.42 + 45
∴ x = 2.58 + 0.7575y … (2)
Estimation:
To find the marks obtained in English (x) if the marks obtained in Hindi (y) are 70:
Given y = 70, to find x we use the regression equation of x on y.
Substituting y = 70 in eqn (2) above, we get
$x_{est}$ = 2.58 + 0.7575(70) = 2.58 + 53.025 = 55.6
∴ $x_{est} \approx 56$
Thus, the estimated marks in English are 56.
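The four steps of this solution (finding r, the coefficients, the equations and the estimate) can be traced in a few lines of code. The following is a sketch under the assumption that the means, standard deviations and the sum of products of deviations are given, as in Example 3; the function name `regression_from_moments` is hypothetical. Because the code does not round the coefficients at intermediate steps, its intercepts can differ slightly from the hand-worked ones.

```python
def regression_from_moments(mean_x, mean_y, sd_x, sd_y, r):
    """Return (a, b_yx, a_dash, b_xy) for the lines
    y = a + b_yx * x  and  x = a_dash + b_xy * y."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    a = mean_y - b_yx * mean_x
    a_dash = mean_x - b_xy * mean_y
    return a, b_yx, a_dash, b_xy


# Figures from Example 3 (x = English, y = Hindi)
n, mean_x, mean_y, sd_x, sd_y = 300, 45, 56, 10, 12
sum_products = 32724  # sum of (x - mean_x) * (y - mean_y)

r = sum_products / (n * sd_x * sd_y)
a, b_yx, a_dash, b_xy = regression_from_moments(mean_x, mean_y, sd_x, sd_y, r)

hindi = 70
english_est = a_dash + b_xy * hindi  # the line of x on y is used to estimate x
print(f"r = {r:.3f}, b_yx = {b_yx:.2f}, b_xy = {b_xy:.4f}")
print(f"estimated English marks: {english_est:.1f}")
```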
Example 4: Using the following tabulated information, find (i) the most
probable value of x when y = 10 and (ii) the most probable value of y when
x = 12.
r = 0.65
         x      y
Mean     15     22
S.D.     7.5    9
Solution: Given from the table: r = 0.65, $\bar{x}$ = 15, $\bar{y}$ = 22, $\sigma_x$ = 7.5, $\sigma_y$ = 9
We first find the regression coefficients:
$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.65 \times \dfrac{9}{7.5} = 0.78$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.65 \times \dfrac{7.5}{9} = 0.54$
The regression equation of y on x is given by $(y - \bar{y}) = b_{yx}(x - \bar{x})$
∴ y – 22 = 0.78(x – 15)
y – 22 = 0.78x – 11.7
y = 0.78x – 11.7 + 22
∴ y = 10.3 + 0.78x … (1)
The regression equation of x on y is given by $(x - \bar{x}) = b_{xy}(y - \bar{y})$
∴ x – 15 = 0.54(y – 22)
x – 15 = 0.54y – 11.88
x = 0.54y – 11.88 + 15
∴ x = 3.12 + 0.54y … (2)
Estimation:
(i) To estimate the value of x when y = 10
Given y = 10, to find x we use the regression equation of x on y.
Substituting y = 10 in eqn (2) above, we get
$x_{est}$ = 3.12 + 0.54(10) = 3.12 + 5.4 = 8.52
(ii) To estimate the value of y when x = 12
Given x = 12, to find y we use the regression equation of y on x.
Substituting x = 12 in eqn (1) above, we get
$y_{est}$ = 10.3 + 0.78(12) = 10.3 + 9.36 = 19.66
Example 5: The two regression equations are as follows: 4x – 3y + 12 = 0
and 5x – 2y – 20 = 0. Find which one of these is the regression equation of
y on x and which is the regression equation of x on y.
Solution: Let 4x – 3y + 12 = 0 … (1)
5x – 2y – 20 = 0 … (2)
Since we don’t know which equation represents which type, we start by
assuming that eqn (1) is the regression equation of y on x and eqn (2)
is the regression equation of x on y.
Rewriting these equations in their standard form, we have
From (1): 4x – 3y + 12 = 0, so 3y = 4x + 12
∴ $y = \dfrac{4}{3}x + \dfrac{12}{3} = 4 + 1.33x$
Comparing with $y = a + b_{yx}x$, we have the regression coefficient as $b_{yx}$ = 1.33
From (2): 5x – 2y – 20 = 0, so 5x = 2y + 20
∴ $x = \dfrac{2}{5}y + \dfrac{20}{5} = 4 + 0.4y$
Comparing with $x = a' + b_{xy}y$, we have the regression coefficient as $b_{xy}$ = 0.4
Now, we know that $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$
Since both the coefficients are positive, r is also positive and its value is:
$r = \sqrt{1.33 \times 0.4} = \sqrt{0.532} = 0.73$
Since the value of r lies between – 1 and 1, our assumption about the
equations is correct.
Thus eqn (1) represents the regression equation of y on x and eqn (2)
represents the regression equation of x on y.
Note:
1. Problems of this type are to be solved by making an assumption as
discussed in the above problem.
2. If the value of r comes out to be greater than 1 or less than – 1, then
we have to go back to the first step and alter the assumption made
regarding the equations. Simplify the equations to get the regression
coefficients and then find the correct value of r.
3. This method is lengthy only if our assumption is wrong.
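The trial-and-check procedure of Example 5 and the notes above can be sketched in code. In this illustration each line is written as a tuple (A, B, C) standing for Ax + By + C = 0; this representation and the function name `classify_regression_lines` are assumptions, and the sketch assumes that the coefficients of x and y in both equations are non-zero.

```python
import math


def classify_regression_lines(line1, line2):
    """Each line is a tuple (A, B, C) standing for A*x + B*y + C = 0.
    Returns (line_y_on_x, line_x_on_y, r). The assumption 'line1 is the
    regression of y on x' is tried first; if it gives r*r > 1, the roles
    are swapped."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]  # solve for y: coefficient of x
        b_xy = -x_on_y[1] / x_on_y[0]  # solve for x: coefficient of y
        product = b_yx * b_xy          # equals r squared if the assumption holds
        if 0 <= product <= 1:
            # r takes the common sign of the two regression coefficients
            r = math.copysign(math.sqrt(product), b_yx)
            return y_on_x, x_on_y, r
    raise ValueError("no consistent assignment; check the equations")


# Equations of Example 5: 4x - 3y + 12 = 0 and 5x - 2y - 20 = 0
y_on_x, x_on_y, r = classify_regression_lines((4, -3, 12), (5, -2, -20))
print("y on x:", y_on_x, " x on y:", x_on_y, " r =", round(r, 2))
```

Passing the two equations in the opposite order gives the same classification, because the first assumption then yields a product of coefficients greater than 1 and is rejected.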
Example 6: The following information is provided regarding the regression
equation of y on x: The equation is 5x – 2y – 21 = 0, $\bar{x}$ = 9, and the coefficient
of correlation is 0.8. Find the mean value of y and the ratio of the standard
deviations of x and y.
Solution: The given equation 5x – 2y – 21 = 0 is the regression equation of
y on x. Rewriting it in the standard form, we have:
2y = 5x – 21
∴ y = 2.5x – 10.5 (dividing by 2 throughout) … (*)
∴ $b_{yx}$ = 2.5
We know that the point of intersection of the regression equations is the
mean value of x and y. In other words, $(\bar{x}, \bar{y})$ satisfies the regression equations.
Thus, to find the mean value of y, we substitute $\bar{x}$ = 9 in (*)
∴ $\bar{y}$ = 2.5(9) – 10.5 = 22.5 – 10.5 = 12
Mean value of y = 12.
Given r = 0.8, to find the ratio of the S.D.s we use the formula $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$
∴ $2.5 = 0.8 \times \dfrac{\sigma_y}{\sigma_x}$
∴ $\dfrac{\sigma_y}{\sigma_x} = \dfrac{2.5}{0.8} = \dfrac{25}{8}$
∴ the ratio of the standard deviations is $\sigma_y : \sigma_x$ = 25 : 8, i.e. $\sigma_x : \sigma_y$ = 8 : 25.
8.5 LET US SUM UP
In this chapter we have learnt:
• The definition of regression and the methods of studying regression.
• How to form linear regression equations using the method of least squares.
• How to solve problems involving regression equations.
• How to use a regression equation to predict the value of an unknown
variable from a known variable.
8.6 UNIT END EXERCISES:
1. Define regression analysis.
2. Explain the significance of regression equations.
3. Define regression coefficients. Also state their properties.
4. Find the two regression equations given the following information:
$\sum xy = 456$, $\sum x = 90$, $\sum x^2 = 920$, $\sum y = 80$, $\sum y^2 = 1360$ and n = 10.
5. Find the two regression equations given the following information:
$\sum xy = 14345$, $\sum x = 210$, $\sum x^2 = 12342$, $\sum y = 182$, $\sum y^2 = 25720$ and n = 10.
Also find the correlation coefficient.
6. Find the two regression equations given the following information:
$\sum xy = 45$, $\sum x = 12$, $\sum x^2 = 32$, $\sum y = 32$, $\sum y^2 = 144$ and n = 8.
Also find the correlation coefficient.
7. The following details are given regarding the prices of pulses in
Mumbai and Raigad.
Average price of pulses in Mumbai: Rs. 14
S.D. of price of pulses in Mumbai: Rs. 2
Average price of pulses in Raigad: Rs. 10
S.D. of price of pulses in Raigad: Rs. 4
If the coefficient of correlation is 0.4, find the (i) price of pulses in
Raigad if that in Mumbai is Rs. 10 and (ii) price of pulses in
Mumbai, if that in Raigad is Rs. 8.
8. The following data gives the amount of sales, in lakhs of Rs., of
two products of a company. Find the regression equations.
Product I    32  78  45  66  89
Product II   21  37  26  33  41
9. By using the normal equations, find the two regression equations.
Also estimate the value of (i) y when x = 18 and (ii) x when y = 12.
X   2  5   8   11  14
Y   4  13  22  31  40
10. From the following data find the two regression equations and also
the correlation coefficient.
X   2  4   1  6   3   8  7  10
Y   8  10  9  11  12  5  6  2
11. The marks obtained by students in two terms, Term I and Term II, are
given below. Find the two regression equations and the correlation
coefficient. Also estimate (i) marks in Term I if marks in Term II are
40 and (ii) marks in Term II if marks in Term I are 30.
Term I    50  52  47  60  38  79  85  40  55  77
Term II   45  50  49  64  40  80  82  43  51  75
12. Find a regression equation of income (y) on the expenditure (x) on the
basis of the following data. Estimate the income if the
expenditure is Rs. 11,550.
Income in ’000 Rs.        5  12  8  11  16  22  30  25
Expenditure in ’000 Rs.   3  7   3  8   11  17  22  20
13. The following data gives the height in cm of 8 mothers and their sons.
Find the two regression equations. Estimate the height of a son whose
mother’s height is 165 cm.
Height of mother   145  166  170  150  155  172  138  159
Height of son      150  162  160  170  154  170  142  161
14. Mr. Sagar Mistry is a dealer of four wheeler vehicles and also owns
a garage. The annual maintenance charge (AMC) is Rs. 600. The following
data gives the number of vehicles sold and the number of car owners who
took an AMC from him. Find the two regression equations and estimate
the income from AMCs of Mr. Mistry if 300 cars are sold.
No. of cars sold   50  90  65  100  120  180  150  175
No. of AMCs        10  36  30  48   54   77   110  125
15. The following data gives the income of 10 persons corresponding to
their period of service.
Salary in ’000 Rs.   4  6  8  14  18  22  28  35  40  45
Period of service    1  3  6  8   10  14  18  20  25  30
Find the two regression equations and estimate (i) period of service,
if salary is Rs. 50,000 and (ii) salary, if period of service is 35 years.
16. The following data gives the ages of males (x) and their blood
pressure (y). Find the regression equations and hence estimate (i) age
when blood pressure is 140 and (ii) blood pressure when age is 40
years.
Age in years     25   28   37   32   45   68   52   78   60   65
Blood pressure   145  156  160  135  142  155  162  176  166  172
17. Fit a regression line of y on x by the method of least squares to the
following data:
X   2    3    4    5    6    8
Y   2.4  2.9  4.2  5.6  5.5  7.4
18. The average marks of 400 students in Economics and Accounts are
45 and 65 respectively while their respective standard deviations are
12 and 16. If the sum of the products of the deviations from the
averages is 32700, find the regression equations. Also estimate the
marks obtained in Economics if a student obtains 80 marks in
Accounts.
19. The average marks of 100 students in Mathematics and Statistics are
52 and 48 respectively while their respective standard deviations are
9 and 13.5. If the sum of the products of the deviations from the
averages is 11450, find the regression equations. Also estimate the
marks obtained in Statistics if a student obtains 75 marks in
Mathematics.
20. The average marks of 250 students in Business Law and Industrial
Law are 40 and 52 respectively while their respective standard
deviations are 8 and 11. If the sum of the products of the deviations
from the averages is 9875, find the regression equations. Also
estimate the marks obtained in (i) Business Law if a student obtains
60 marks in Industrial Law and (ii) Industrial Law if a student obtains
35 marks in Business Law.
21. Using the following tabulated information, find (i) the most probable
value of X when Y = 10 and (ii) the most probable value of Y when X
= 15.
r = 0.65
         X      Y
Mean     15     22
S.D.     7.5    9
22. Using the following tabulated information, find (i) the most probable
value of X when Y = 25 and (ii) the most probable value of Y when X
= 30.
r = 0.9
         X     Y
Mean     36    38
S.D.     6     8
23. Given the mean wages of men and women working in a factory as Rs.
100 and Rs. 75 with standard deviations Rs. 12 and Rs. 16 and the
coefficient of correlation as 0.54, find the regression equations and
estimate (i) the wages of men, if the wages for women are Rs. 100 and
(ii) the wages of men, if the wages of women are Rs. 60.
24. Find the regression equations, if for a given data $\bar{x}$ = 120, $\bar{y}$ = 150,
$\sigma_x$ = 16, $\sigma_y$ = 12 and r = 0.58.
25. The following data is about the sales and advertising expenditure of
a company, A to Z Ltd., which produces educational software.
r = 0.24
         Sales in ’00 Rs.   Advertising in ’00 Rs.
Mean     45                 18
S.D.     14                 11
Find the regression equations and estimate the most probable amount
of sales if the advertising expenditure is Rs. 25,000.
26. Find the regression equations of y on x and of x on y, if for a given
data $\bar{x}$ = 30, $\bar{y}$ = 50,
$\sigma_x$ = 12.5, $\sigma_y$ = 10 and r = -0.25.
27. The following data is given for the marks obtained by students in the
subjects of Business Communication and English Literature in an
examination. Find the regression equations and estimate the most
probable marks in Business Communication if the marks in English
Literature are 78.
r = 0.85
         Business Communication   English Literature
Mean     64                       56
S.D.     16                       12
28. The following equations represent the regression equations. Find the
mean values of x and y.
3 7 114xy and5 3 86xy .
29. The regression equations are given as: 2x – y = 10 and 3x – 2y – 5 =
0. Find the mean values of x and y. Also find the correlation
coefficient.
30. The two regression equations are given as: 4x – 3y – 27 = 0 and 3x –
4y + 6 = 0. Find
(i) the mean values of x and y, (ii) the correlation coefficient.
31. The two regression equations are as follows: 4x – 3y + 3 = 0 and 5x –
2y – 26 = 0. Find (i) which is the regression equation of y on x and
which is the regression equation of x on y, (ii) the mean values of x
and y, (iii) correlation coefficient, (iv) S.D. of x if S.D. of y is 12 and
(v) most probable value of y if x = 10.
32. The two regression equations are as follows: 2y – x – 2 = 0 and 3y –
2x + 1 = 0. Find (i) which is the regression equation of y on x and
which is the regression equation of x on y, (ii) the mean values of x
and y, (iii) correlation coefficient, (iv) the most probable value of y
when x = 11 and (v) the most probable value of x when y = 20.
33. The two regression equations are as follows: 2x – y – 3 = 0 and 4x –
2y – 6 = 0. Find (i) which is the regression equation of y on x and which
is the regression equation of x on y, (ii) the mean values of x and y, (iii)
correlation coefficient, (iv) the most probable value of y when x = 14
and (v) the most probable value of x when y = 30.
34. The following information is provided regarding the regression
equation of y on x: The equation is 10y – 8x – 240 = 0, $\bar{y}$ = 40, and the
coefficient of correlation is 0.9. Find the ratio of the standard
deviations of x and y.
35. The regression equation of amount of fertilizer (x) used on the yield
(y) of the crop is given by 2x – 3y + 2 = 0. If the average yield of the
crop is 12 tons and the ratio of the standard deviations is 9:4, find the
coefficient of correlation.
Multiple Choice Questions:
1) If the regression equation of X on Y is 5X + 7Y = 135, the estimated
value of X when Y = 10 is ___________
a) 8 b) 10 c) 5 d) 13
2) If the values of the regression coefficients are 0.2 and 0.8, then the value
of the correlation coefficient is _____
a) 0.6 b) -0.6 c) 0.36 d) 0.4
3) If the values calculated for 10 pairs of x and y variables are $b_{yx} = \frac{1}{3}$,
$r = \frac{1}{2}$ and s.d. of x = 3, then the s.d. of y is
a) 2 b) 4 c) 1 d) 3
4) The value of the correlation coefficient is the _____ of the two regression
coefficients
a) Arithmetic Mean b) Geometric Mean
c) Harmonic Mean d) Standard deviation
5) Identify the triplet of numbers representing values of the two
regression coefficients and the correlation coefficient respectively.
a) 1.5, 1.5, 1.5 b) 1, 1, 1
c) 0.5, -0.5, 0.5 d) 0.4, 1.6, -0.8
6) If the two regression lines are represented by 2x – 3y + 11 = 0 and x –
2y + 9 = 0, the mean values of x and y are _________
a) (7, 5) b) (6, 5)
c) (6, 7) d) (5, 7)
7) For 10 pairs of x and y, the variances of x and y are respectively 25 and
49 with means 15 and 8 respectively. If the covariance between x and
y is -24.5, then the regression coefficient of y on x is _________
a) -0.98 b) -0.5
c) -0.78 d) -0.58
8) If the two regression equations are 40x – 18y = 214 and 8x – 10y + 66 = 0,
then the coefficient of correlation is __________
a) 0.65 b) 0.85
c) 0.70 d) 0.60
9) If the regression coefficient of y on x is $-\frac{4}{3}$, the coefficient of correlation is
-0.8 and the variance of x is 9, then the standard deviation of y is __________
a) 3 b) 5
c) 8 d) None of these
10) If the two regression lines coincide with each other, then the correlation
between the variables is _________
a) perfect positive b) perfect negative
c) both (a) and (b) d) None of these
8.7 LIST OF REFERENCES
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K.
Kapoor.
• Basic Statistics by B. L. Agrawal.
9
TIME SERIES
Unit Structure
9.0 Objectives:
9.1 Introduction:
9.2 Importance of time series analysis
9.3 Components of time series
9.4 Models for analysis of time series
9.5 Methods to find trend
9.6 Method of moving averages
9.7 Merits and demerits of moving average method
9.8 Method of least squares
9.9 Measurement of seasonal trend using simple average
9.10 Let us sum up
9.11 Unit end exercises
9.12 List of references
9.0 OBJECTIVES:
After going through this chapter you will be able to know:
• The arrangement of data in chronological order.
• How to study time series data.
• The moving average method for studying a time series.
• The least squares method for studying a time series and predicting the
future values of the data.
• How to find quarterly seasonal indices.
9.1 INTRODUCTION:
Swami Vivekananda once said, “Taking the inspiration from the past and
keeping an eye on the future if we determine our present, we shall lead to
success”. Every business venture needs to know its performance in the
past and, with the help of predictions based on it, would like to decide
its strategy for the present. This is applicable to economic policy makers,
the meteorological department, social scientists, political analysts etc.
Forecasting thus is an important tool in statistical analysis. Globalization
has made the economy more competitive and in turn has made it necessary
for all businessmen and entrepreneurs to know about the future of their
products, keeping in mind all the constraints.
Forecasting techniques facilitate prediction on the basis of data available
from the past. This data from the past is called a time series. A set of
observations of a variable, taken at regular (fixed and equal) intervals of
time, is called a time series. The interval of time may be an hour, a day, a
month, a quarter, a year or more than that. The census of population, for
example, is taken every ten years. A time series is bivariate data, with
time as the independent variable and the variable under consideration as
the dependent variable. There are various forecasting methods for time series which
enable us to study the variations or trends and estimate the same for the
future.
If Y denotes the dependent variable and t denotes the time period, then the
relation between the two variables is written as Y = f(t). We say that Y is a
function of t, and it can be represented symbolically as $Y_t$. If at time periods $t_1$,
$t_2$, …, $t_n$ the values of the dependent variable are $Y_1$, $Y_2$, …, $Y_n$, then the pairs
($t_1$, $Y_1$), ($t_2$, $Y_2$), …, ($t_n$, $Y_n$) represent the time series.
9.2 IMPORTANCE OF TIME SERIES ANALYSIS
The analysis of the data in a time series using various forecasting models
is called time series analysis. Time series analysis is important for
the following reasons:
• Understanding the past behavior: This involves identifying the
various factors that have affected the variable in the past, which will
help us in future prediction.
• Planning the future actions: Based on the past patterns or trends and
the influence of various factors, planning for the future is done using
statistical forecasting methods.
• Comparative study: Two or more time series can be compared
to study the impact of various factors affecting the values of the
dependent variable. For example, the growth of population in two
cities can be compared for a fixed period of time, or the agricultural
production in two areas can be compared, to give an overview of
the differences or similarities in the observations and to analyse them
statistically.
9.3 COMPONENTS OF TIME SERIES
The values of the dependent variable in a time series are affected by
many factors, which can be seasonal, cyclic or due to some impulsive
reasons. The major types of variations, also called the components of
a time series, are as follows:
1) Secular Trend:
The tendency of a time series to increase, stagnate or decrease over a
long period of time is called secular trend or just the trend of the time
series. Many business and economic activities show a secular trend,
for example, the interest rates of banks, the rates in real estate,
inflation, population, birth-rate, death-rate etc. The term “long
period” is a relative term. For a time series of inflation it may be a week
and for a time series of population the period may be in years. The
following two graphs show the time series of the population and the time series
of the death rate of a city. While the population time series
shows an upward trend, the death rate series shows a downward
(declining) trend.

The forecasts made on the basis of a secular trend assume that the
factors influencing the increase or decrease of the variable are
constant throughout a major period of time. However, this may not
always be true. The secular trend is denoted by T.
2) Short-term fluctuations:
(a) Seasonal Variations:
The variations that are observed in a time series at regular intervals
of time with a period less than or equal to a year are called seasonal
variations. These occur mainly due to the climatic (seasonal)
changes in nature or due to local traditions and customs. The
sales of umbrellas and raincoats in the rainy season, woollen clothes in the winter
season and cotton garments or cold drinks in the summer season are
examples of time series which show seasonal variations due to climatic
reasons. The sales and prices of jewellery items in the marriage season,
crackers in Diwali or sweets during a festival are examples of seasonal
variations due to local customs.
The seasonal variations are periodic and regular. The knowledge of
such seasonal fluctuations is useful both to the producer, from the
business point of view, and to the customer. The seasonal variations
are denoted by S.
(b) Cyclical Variations:
The variations that are observed in a time series at regular intervals
of time with a period of more than a year are called cyclical variations.
Business cycles are examples of such cyclical variations. Here the
variations are regular but they may not be periodic. Economic
variables show regular ups and downs over a long period of time.
These variations may differ in intensity and in the length of the period.
Generally there are four phases of a business cycle, as listed below:

(i) The first phase is called boom, which represents prosperity.
(ii) The next phase is recession, which represents decline.
(iii) After recession comes depression, which is the lowest
point of the time series.
(iv) The final phase is recovery.
These phases, as mentioned in the beginning, are regular but the
period of a cycle can vary from two years to ten years or even
more in some cases.
The knowledge of cyclic variations is important for a
businessman to plan his activity or design his policy for the
phase of recession or depression. But one should know that the
factors affecting the cyclical variations are quite irregular and
difficult to identify and measure. The cyclical variations are
denoted by C.
3) Irregular Variations:
The variations which occur due to random factors are called irregular
variations. Natural calamities like earthquakes, floods and famines, or
man-made calamities like wars and strikes, are the major factors responsible for
the irregular variations. The occurrence of these variations cannot be
predicted and is not bound by any interval of time. Hence it is
difficult to identify and isolate such variations in a time series. The
irregular variations are denoted by I.
9.4 MODELS FOR ANALYSIS OF TIME SERIES:
As seen above the times series is affected by four components: (i) Secular
trend ( T), (ii) Seasonal variations ( S), (iii) Cyclic variatio ns (C) and (iv)
Irregular variations ( I). The dependent variable can be decomposed into
these components using different mathematical models as discussed below:
Additive Model: This model assumes that all the components are
independent of each other and th eir effects on the times series are additive.
Symbolically, it can be represented as:
Yt = T + S + C + I
In practice however, the components may be dependent on each other.
munotes.in

Page 166

166 Business Statistics
166 Multiplicative Model: This model assumes that the components depend on
each other. Symbolically, it can be represented as:
Yt = T x S x C x I
It is also assumed that the geometric mean of the S, C and I is less than or
equal to one. i.e. x x 1SCI d.
Though most business and economic series follow the multiplicative
model, mixed models are also considered, wherein one or
more of the components are assumed to be independent of the
rest. For example:
(i) Yt = T x S x C + I (ii) Yt = T + S x C x I (iii) Yt = T + S + C x I
All the above-mentioned models can be used to determine one unknown
value if the remaining values are known.
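The idea of recovering one unknown value from a model can be sketched in Python as follows. This is only an illustration: the function names and the component values are hypothetical and do not come from the text.

```python
# Hypothetical helper functions and values, for illustration only.

def additive(T, S, C, I):
    """Additive model: Y = T + S + C + I."""
    return T + S + C + I

def multiplicative(T, S, C, I):
    """Multiplicative model: Y = T x S x C x I (S, C, I act as ratios)."""
    return T * S * C * I

def irregular_from_multiplicative(Y, T, S, C):
    """Recover I from Y = T x S x C x I when Y, T, S and C are known."""
    return Y / (T * S * C)

# Build an observation from assumed components, then recover I from it.
y = multiplicative(T=200.0, S=1.10, C=0.95, I=1.02)
i_back = irregular_from_multiplicative(y, T=200.0, S=1.10, C=0.95)
print(round(y, 2), round(i_back, 2))
```

The same rearrangement works for the additive and mixed models; only the algebra used to isolate the unknown changes.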
9.5 METHODS TO FIND TREND:
There are various methods to find the trend. The major methods are as
mentioned below:
1. Free Hand Curve.
2. Method of Semi – Averages.
3. Method of Moving Averages.
4. Method of Least Squares.
As per the scope of our syllabus, we will restrict our discussion to the method
of moving averages and least squares.
9.6 METHOD OF MOVING AVERAGES
This is a simple method in which we take the arithmetic averages of the
given time series over a certain period of time. These averages move over
that period and are hence called moving averages. The time interval for
the averages is taken as 3 years, 4 years, 5 years and so on. The averages
are thus called 3 yearly, 4 yearly and 5 yearly moving averages. The
moving averages are useful in smoothing the fluctuations in the
variable. The larger the time interval of the average, the greater is the
smoothing. We shall study the odd yearly (3 and 5) moving averages first
and then the 4 yearly moving averages.
Odd Yearly Moving Averages:
In this method the total of the values in the time series is taken for the given
time interval and is written against the middle value. The average so
obtained is also written against this middle value. This average is the trend
for that middle year. The process is continued by dropping the first value
and including the next value in the time series, and so on till the trend for the last
middle value is calculated. Let us understand this with examples:
Example 1: Determine the trend for the following data using 3 yearly
moving averages. Plot the graph of actual time series and the trend values.

Solution: The time series is divided into overlapping groups of three years,
their 3 yearly totals and averages are calculated as shown in the following
table:

To plot the graph of the actual time series, the actual data is plotted and
the points are joined by straight lines. Then the trend values (3-yearly moving averages)
for the corresponding years are plotted and joined by a dotted line to mark
the difference. The graph of the actual time series and the trend values is as
shown below:
Example 2: Determine the trend of the following time series using 5 yearly
moving averages. Plot the graph of the actual time series and trend values.

Solution: The time series is divided into overlapping groups of five years,
their 5 yearly totals and averages are calculated as shown in the following
table:
The graph of the actual time series and the trend values is as shown below:
Remark:
1. In case of the 3-yearly moving averages, the total and average for the
first and the last year in the time series are not calculated. Thus, the
moving average of the first and the last year in the series cannot be
computed.
2. In case of the 5-yearly moving averages, the total and average for the
first two and the last two years in the time series are not calculated.
Thus, the moving average of the first two and the last two years in the
series cannot be computed.
3. To find the 3-yearly total (or 5-yearly total) for a particular year, you
can take the previous year’s total, subtract its first value and add the
next value of the series, which saves time.
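The odd yearly procedure, including the short cut for the running total, can be sketched in Python as below. The function name and the sample data are hypothetical; years for which no moving average exists are marked with None.

```python
def odd_moving_average(values, period):
    """Return (totals, averages) aligned with the middle year of each group.

    The first and last period // 2 positions have no moving average
    and are left as None.
    """
    if period % 2 == 0 or period < 3:
        raise ValueError("period must be an odd number >= 3")
    half = period // 2
    n = len(values)
    totals = [None] * n
    averages = [None] * n
    if n < period:
        return totals, averages
    running = sum(values[:period])            # total of the first group
    totals[half] = running
    averages[half] = running / period
    for mid in range(half + 1, n - half):
        # Short cut: drop the oldest value of the previous group, add the next one.
        running += values[mid + half] - values[mid - half - 1]
        totals[mid] = running
        averages[mid] = running / period
    return totals, averages

# Hypothetical data, not taken from the examples in the text.
sales = [12, 15, 18, 16, 20, 22, 21]
totals3, trend3 = odd_moving_average(sales, 3)
totals5, trend5 = odd_moving_average(sales, 5)
print(totals3)
print(totals5)
```

Keeping the output the same length as the input makes it easy to plot the trend against the original years.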
Even yearly moving averages
In case of even yearly moving averages the method is slightly different, as
here we cannot find the middle year of the four years in consideration. Here
we find the total for the first four years and place it between the second and
the third year value of the variable. These totals are again summed in
groups of two, called centered totals, and each is placed between the two totals.
The 4-yearly moving average is found by dividing these centered totals by
8.
Let us understand this method with an example.
Example 3: Calculate the 4 yearly moving averages for the following data:

Solution: The table of calculation is shown below. Students should leave
one line blank after every year to place the centered totals in between two
years.
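The centered-total procedure for an even period can also be sketched in Python. The function name and the data below are hypothetical; for a 4-yearly average the divisor 2 x period equals 8, as described above.

```python
def centered_moving_average(values, period=4):
    """Centered moving average for an even period.

    The result is aligned with `values`; the first and last
    period // 2 years have no trend value and are None.
    """
    if period % 2 != 0 or period < 2:
        raise ValueError("period must be an even number >= 2")
    n = len(values)
    half = period // 2
    # Each total sits between two years, so it cannot be assigned to one year yet.
    totals = [sum(values[k:k + period]) for k in range(n - period + 1)]
    trend = [None] * n
    for k in range(len(totals) - 1):
        centered_total = totals[k] + totals[k + 1]       # sum of two adjacent totals
        trend[k + half] = centered_total / (2 * period)  # divide by 8 when period is 4
    return trend

# Hypothetical data, not taken from Example 3.
output = [10, 12, 14, 16, 18, 20, 22]
trend4 = centered_moving_average(output, 4)
print(trend4)
```

Because the sample data rise by a constant amount each year, the centered averages coincide with the original values, which is a quick sanity check on the alignment.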

9.7 MERITS AND DEMERITS OF MOVING AVERAGE
METHOD
Merits
(1) The method of finding moving averages is very simple and the
plotting of trend values is also easy.
(2) There are no mathematical formulae for calculating the trend.
(3) It is more rigidly defined as compared with the free hand curve and
semi – averages methods.
(4) In case of cyclical variations, if the period of moving averages and the
period of cycle are same then the variations are reduced and in some
cases completely eliminated.
(5) It is possible to isolate the fluctuations due to seasonal, cyclical and
irregular components using the moving averages method.
Demerits
(1) As mentioned in the remark above, there are no trend values for two
or more years of the time series.
(2) The method fails in case of non – linear trend.
(3) The trend values obtained from this method do not give any
mathematical relationship between time and the variable into
consideration. Thus, it is not possible to forecast on the basis of the
trend values so obtained.

9.8 METHOD OF LEAST SQUARES:
In this method, which we have already seen in the last chapter, we obtain a
straight line trend for the given time series. It is assumed that there is a linear
relation between the given bivariate data. This line is called the line of
best fit. Let y = a + bx be the line of best fit. We know that the formulae to
calculate b and a are as follows:
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$
Odd number of years in the time series
When the number of years in the given time series is odd, for the middle
year we assume the value of x = 0. For the years above the middle year the
values given to x are …, -2, -1 and so on while those after the middle year
are values 1, 2, … and so on.
Even number of years in the time series
When the number of years in the time series is even, then for the upper half
years the values of x are assumed as …, -5, -3, -1. For the lower half years,
the values of x are assumed as 1, 3, 5, … and so on.
One can easily observe that the values assumed in this fashion make
the total of x equal to 0, which simplifies the formulae
mentioned above. If we substitute $\sum x = 0$ in the above formulae, the new
formulae for computing the coefficients become:
$b = \frac{\sum xy}{\sum x^2}$ and $a = \frac{\sum y}{n}$
In the table of calculations, we now have to find the values of $\sum y$, $\sum xy$ and $\sum x^2$.
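The coding of x for odd and even numbers of years can be generated mechanically. The following Python sketch (the function name coded_x is hypothetical) produces the values described above and confirms that they sum to zero.

```python
def coded_x(n_years):
    """Coded time values x whose total is zero.

    Odd n:  ..., -2, -1, 0, 1, 2, ...   (step 1)
    Even n: ..., -3, -1, 1, 3, ...      (step 2)
    """
    if n_years % 2 == 1:
        half = n_years // 2
        return list(range(-half, half + 1))
    return list(range(-(n_years - 1), n_years, 2))

for n in (7, 8):
    x = coded_x(n)
    print(n, x, sum(x))
```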
Example 4: Fit a straight line trend for the following data giving the annual
profits (in lakhs of Rs.) of a company. Estimate the profit for the year 1999.
Year     1992  1993  1994  1995  1996  1997  1998
Profit     30    34    38    36    39    40    44
Solution: Let y = a + bx be the straight line trend.
The number of years is seven, which is odd. Thus, the values of x are taken
as 0 for the middle year 1995, for upper three years as -3, -2, -1 and for
lower three years as 1, 2, 3.
The table of computation is as shown below:

From the table: n = 7, $\sum xy = 55$, $\sum x^2 = 28$, $\sum y = 261$
∴ $b = \frac{\sum xy}{\sum x^2} = \frac{55}{28} = 1.96$ and $a = \frac{\sum y}{n} = \frac{261}{7} = 37.29$
Thus, the straight line trend is y = 37.29 + 1.96 x.
The trend values in the table for the respective years are calculated by
substituting the corresponding value of x in the above trend line equation.
For the trend value for 1992: x = -3: y1992 = 37.29 + 1.96 (-3) = 37.29 -
5.88 = 31.41
Similarly, all the remaining trend values are calculated.
(A short-cut method in case of odd number of years to find the remaining
trend values once we calculate the first one, is to add the value of b to the
first trend value to get the second trend value, then to the second trend value
to get the third one and so on. This is because the difference in the values
of x is 1.)
Estimation:
To estimate the profit for the year 1999 in the trend line equation, we
substitute the prospective value of x, if the table was extended to 1999, i.e.
we put x = 4, the next value after x = 3 for the year 1998.
∴ y1999 = 37.29 + 1.96 (4) = 45.13
∴ the estimated profit for the year 1999 is Rs. 45.13 lakhs.
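The calculation in Example 4 can be checked with a short Python sketch. It uses the data of the example and the simplified formulae for b and a; the variable names are only illustrative.

```python
# Data of Example 4 (annual profits in lakhs of Rs.).
years = [1992, 1993, 1994, 1995, 1996, 1997, 1998]
profit = [30, 34, 38, 36, 39, 40, 44]

n = len(years)
middle_year = years[n // 2]                 # 1995, coded as x = 0
x = [yr - middle_year for yr in years]      # -3, -2, ..., 3

sum_y = sum(profit)
sum_xy = sum(xi * yi for xi, yi in zip(x, profit))
sum_x2 = sum(xi * xi for xi in x)

b = sum_xy / sum_x2       # slope, valid because sum(x) == 0
a = sum_y / n             # intercept, valid because sum(x) == 0

trend = [a + b * xi for xi in x]            # trend value for each year
estimate_1999 = a + b * (1999 - middle_year)

print(sum_xy, sum_x2, sum_y)
print(round(b, 2), round(a, 2), round(estimate_1999, 2))
```

With unrounded coefficients the estimate for 1999 comes to about 45.14; the value 45.13 in the worked solution arises because a and b are rounded to two decimals before substitution.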
Example 5: Fit a straight line trend to the following data. Draw the graph of
the actual time series and the trend line. Estimate the sales for the year 2007.
Year                  1998  1999  2000  2001  2002  2003  2004  2005
Sales (in ’000 Rs.)    120   124   126   130   128   132   138   137
Solution: Let y = a + bx be the straight line trend.
The number of years in the given time series is eight, which is an even
number. The upper four years are assigned the values of x as -7, -5, -3, -1
and the lower four years are assigned the values of x as 1, 3, 5 and 7. Note
that here the difference between the values of x is 2, but the sum is zero.
Now, the table of computation is completed as shown below:

From the table: n = 8, $\sum xy = 205$, $\sum x^2 = 168$, $\sum y = 1035$.
$b = \frac{\sum xy}{\sum x^2} = \frac{205}{168} = 1.22$ and $a = \frac{\sum y}{n} = \frac{1035}{8} = 129.38$
Thus, the straight line trend is y = 129.38 + 1.22 x.
The trend values in the table for the respective years are calculated by
substituting the corresponding value of x in the above trend line equation.
For the trend value for 1998: x = – 7: y1998 = 129.38 + 1.22 ( – 7) = 129.38 –
8.54 = 120.84
Similarly, all the remaining trend values are calculated.
(A short-cut method in case of even number of years to find the remaining
trend values once we calculate the first one, is to add twice the value of b to
the first trend value to get the second trend value, then to the second trend
value to get the third one and so on. This is because the difference in the
values of x is 2. In this example we add 2 x 1.22 = 2.44)
Estimation: To estimate the sales for the year 2007 in the trend line
equation, we substitute the prospective value of x, if the table was extended
to 2007, i.e. we put x = 11, which follows x = 7 for 2005 and
x = 9 for 2006.
∴ y2007 = 129.38 + 1.22 (11) = 142.8
∴ the estimated sales for the year 2007 is Rs. 1,42,800.
Now we draw the graph of actual time series by plotting the sales against
the corresponding year. The period is taken on the X-axis and the sales on
the Y-axis. The points are joined by straight lines. To draw the trend line it
is enough to plot any two points (usually we take the first and the last trend
value) and join them by a straight line.
To estimate the trend value for the year 2007, we draw a line parallel to the
Y-axis from the period 2007 till it meets the trend line at a point, say A. From
this point we draw a line parallel to the X-axis till it meets the Y-axis at a
point, say B. This point is our estimated value of sales for the year 2007. The
graph and its estimated value (graphically) is shown below:

From the graph, the estimated value of the sales for the year 2007 is 142
i.e. Rs. 1,42,000 (approximately)
Example 6: Fit a straight line trend to the following data. Estimate the
imports for the year 1998.
Year                    1991  1992  1993  1994  1995  1996
Imports (in ’000 Rs.)     40    44    48    50    46    52
Solution: Here again the number of years is 6, i.e. even. Proceeding similarly
as in the above problem, the table of calculations and the estimation are as
follows:

From the table: n = 6, $\sum xy = 68$, $\sum x^2 = 70$, $\sum y = 280$
$b = \frac{\sum xy}{\sum x^2} = \frac{68}{70} = 0.97$ and $a = \frac{\sum y}{n} = \frac{280}{6} = 46.67$
Thus, the straight line trend is y = 46.67 + 0.97 x.
All the remaining trend values are calculated as described in the above
problem.
Estimation: To estimate the imports for the year 1998, we put x = 9 in the
trend line equation.
∴ y1998 = 46.67 + 0.97 (9) = 55.4
∴ the estimated imports for the year 1998 are Rs. 55,400.
9.9 MEASUREMENT OF SEASONAL TREND USING
SIMPLE AVERAGE
Seasonal variations, as we know, have a period of less than a year. All the
trend values we have calculated so far are based on time series data recorded
at intervals of one year. Thus, it is not possible to identify and
eliminate the seasonal variations using these methods. The process of
eliminating seasonal variations is called deseasonalization. One of the simple methods used
for this is the simple average method.
Example 7: Compute the seasonal indices for the following data giving the
amount of loan disbursements (in crores of Rs.) of a bank for the years 2004,
2005 and 2006.

Solution: In this case, when the time series data is given month wise we
proceed as follows:
The averages of monthly values for all the three years are calculated. For
example, the values for the month of January for the three years are: 22, 23
and 21. The average of these values is 22. In a similar way we find all
averages of monthly values in the column headed SA, which stands for
seasonal average.
Then we sum all the averages of monthly values, i.e. we find $\sum(SA)$.
Now, the grand average (G) is calculated, i.e. $G = \frac{\sum(SA)}{12}$.
The seasonal index (SI) is now calculated by the formula: $SI = \frac{SA}{G} \times 100$.
For example, the SA value for January is 22, thus its $SI = \frac{22}{30.75} \times 100 = 71.55$

The total of all SI’s should be 1200 for this example, but we can see that it
is 1199.79. This can be adjusted by multiplying each SI by the factor $\frac{1200}{1199.79}$.
Example 8: Compute the seasonal indices for the following data using
simple average method:
Quarters     I    II   III    IV
1999        30    34    42    38
2000        32    38    40    42
2001        34    40    44    48
2002        30    36    40    44
Solution: The procedure is analogous to that of the previous one.
The quarterly totals are calculated and the seasonal average for the four
years is computed.
The grand average is calculated using the formula: $G = \frac{\sum(SA)}{4}$
The Seasonal Index for every quarter is calculated by the formula: $SI = \frac{SA}{G} \times 100$
The table of calculations is as follows:
Quarters       I      II     III      IV
1999          30      34      42      38
2000          32      38      40      42
2001          34      40      44      48
2002          30      36      40      44
Total        126     148     166     172
SA          31.5      37    41.5      43     Σ(SA) = 153, G = 38.25
SI         82.35   96.73   108.5  112.42     Total = 400
The deseasonalized values of the variable are calculated by the
simple formula given below:
Deseasonalized value = $\frac{\text{actual value}}{\text{corresponding SI}} \times 100$.
Let us calculate the deseasonalized values for the problem.
For example 8, the deseasonalized values are as shown below:
Quarters       I      II     III      IV
1999       36.43   35.15   38.71    33.8
2000       38.86   39.28   36.87   37.36
2001       41.29   41.35   40.55    42.7
2002       36.43   37.22   36.87   39.14
Despite being the simplest method, the method of simple averages is of very
little use. The main reason is that it assumes that there is negligible trend,
cyclical or irregular component in the given time series, which is not
true in practice. Hence it is not useful for forecasting in business and
economic time series data.
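The simple average method can be sketched in Python using the quarterly data of Example 8. The variable names are illustrative only, and the indices are kept at full precision, so a deseasonalized value may differ in the last decimal from one computed with indices rounded to two places.

```python
# Quarterly data of Example 8: year -> values for quarters I, II, III, IV.
data = {
    1999: [30, 34, 42, 38],
    2000: [32, 38, 40, 42],
    2001: [34, 40, 44, 48],
    2002: [30, 36, 40, 44],
}
n_seasons = 4

# Seasonal average (SA): mean of each quarter over all the years.
seasonal_avg = [
    sum(row[q] for row in data.values()) / len(data) for q in range(n_seasons)
]

# Grand average (G): mean of the seasonal averages.
grand_avg = sum(seasonal_avg) / n_seasons

# Seasonal index (SI) = SA / G x 100; the indices total 100 x number of seasons.
seasonal_index = [sa / grand_avg * 100 for sa in seasonal_avg]

# Deseasonalized value = actual value / corresponding SI x 100.
deseasonalized = {
    year: [value / si * 100 for value, si in zip(row, seasonal_index)]
    for year, row in data.items()
}

print(seasonal_avg, grand_avg)
print([round(si, 2) for si in seasonal_index])
print({year: [round(v, 2) for v in row] for year, row in deseasonalized.items()})
```

For monthly data the same code applies with twelve values per year and n_seasons set to 12.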
9.10 LET US SUM UP
In this chapter we have learnt:
• That data can be easier to understand when studied with respect to time.
• To study time series with respect to trend value.
• Moving average method to calculate trend value.
• Least squares method to find trend values and predict future
values.
• To calculate seasonal indices.
9.11 UNIT END EXERCISES
1. Define time series and time series analysis.
2. What are the objectives of a time series analysis?
3. Write a short note on short term fluctuations in a time series.
4. What are the different components of a time series? Explain with
suitable examples.
5. Write a short note on the different phases of a business cycle.
6. Give the merits and demerits of moving average method.
7. “Longer the period of moving average, better is the trend value
obtained”, is this statement correct? Justify your answer.
8. Write a short note on the method of least squares.
9. Explain the simple average method to find the seasonal indices of a
time series.
10. Define deseasonalization. What is its significance?
11. Determine the trend for the following data using 3 yearly moving
averages. Plot the graph of actual time series and the trend values.

12. Determine the trend for the following data using 3 yearly moving
averages. Plot the graph of actual time series and the trend values.

13. Determine the trend for the following data using 5 yearly moving
averages. Plot the graph of actual time series and the trend values.

14. Determine the trend for the following data giving the production of
steel in million tons, using 5 yearly moving averages. Plot the graph
of actual time series and the trend values.

15. Determine the trend for the following data giving the production of
wheat in thousand tons from the year 1980 to 1990, using the 5 –
yearly moving averages. Plot the graph of the actual time series and
the trend values.

16. Determine the trend for the following data giving the income (in
million dollars) from the export of a product from the year 1988 to
1999. Use the 4 – yearly moving averages method and plot the graph
of actual time series and trend values.
17. Using the 4 – yearly moving average method find the trend for the
following data:

18. Determine the trend for the following data giving the sales (in ’00 Rs.)
of a product per week for 20 weeks. Use appropriate moving average
method.

19. An online marketing company works 5 -days a week. The day -to-day
total sales (in ’000 Rs.) of their products for 4 weeks are given below.
Using a proper moving average method find the trend values.

20. Fit a straight line trend to the following data. Draw the graph of the
actual time series and the trend line. Estimate the sales for the year
2001.

21. Fit a straight line trend to the following data. Draw the graph of the
actual time series and the trend line. Estimate the sales for the year
2007.

22. Fit a straight line trend for the following data giving the number of
casualties (in hundreds) of motorcyclists without helmets. Estimate
the number for the year 1999.

23. Fit a straight line trend to the following data. Draw the graph of the
actual time series and the trend line. Estimate the sales for the year
2002.

24. Fit a straight line trend to the following data giving the price of crude
oil per barrel in USD. Draw the graph of the actual time series and the
trend line. Estimate the price for the year 2003.

25. The following data shows the quarterly profits (in crores of Rs.) of an
IT company for the years 2002 to 2004. Assuming that the trend is
negligible, calculate the seasonal indices using simple average
method.

26. Assuming that the trend is absent, find the seasonal indices for the
following data and also find the deseasonalized values.

27. Assuming that the trend is absent, find the seasonal indices for the
following data and also find the deseasonalized values.

Multiple choice questions:
1) A time series consists of:
a) Short -term variations b) Long -term variations
c) Irregular variations d) All of the above
2) The method of moving average is used to find the:
a) Secular trend b) Seasonal variation
c) Cyclical variation d) Irregular variation.
3) Value of b in the trend line Y = a + bX is:
a) Always negative b) Always positive
c) Always zero d) Both negative and positive
4) A time series has:
a) Two components b) Three components
c) Four components d) Five components
5) For odd number of years, the formula to code the values of X by taking
the origin at the centre is:
a) X = year – average of years b) X = year – first year
c) X = year – last year d) X = year – ½ average of years
9.12 LIST OF REFERENCES
• Fundamentals of mathematical Statistics by S.C. Gupta and V.K
Kapoor.
• Basic Statistics by B. L. Agrawal.
10
INDEX NUMBER
Unit Structure
10.0 Objectives:
10.1 Introduction:
10.2 Importance of index numbers
10.3 Price index numbers
10.3.1 Simple (unweighted) price index no. by aggregative method
10.3.2 Simple (unweighted) price index number by average of pric e
relatives method
10.3.3 Weighted index numbers by aggregative method
10.3.4 Weighted index numbers using average of price relatives
method
10.4 Cost of living index number or consumer price index number
10.5 Chain base index number
10.6 Deflating, splicing, shifting of base year
10.7 Problems in constructing index numbers
10.8 Demerits of index numbers
10.9 Let us sum up
10.10 Unit end exercises
10.11 List of references
10.0 OBJECTIVES:
After going through this chapter you will be able to know:
• Definition of index number and its importance.
• Types of index numbers.
• The different methods of construction of index number.
• Different applications of index number.
10.1 INTRODUCTION:
Every variable undergoes some changes over a period of time or in different
regions or due to some factors affecting it. These changes need to be
measured. In the last chapter we have seen how a time series helps in
estimating the value of a variable in future. But the magnitude of the
changes or variations of a variable, if known, is useful for many more
reasons. For example, if the changes in prices of various household
commodities are known, one can plan a proper budget for them in
advance. If a share broker is aware of the magnitude of fluctuations in the
price of a particular share or of the trend of the market, he can plan his
course of action of buying or selling his shares. Thus, we can see that there
is a need for a measure to describe the changes in prices, sales, profits,
imports, exports etc., which is useful to everyone from a common man to a business
organization.
An index number is an important statistical tool to measure the relative changes
in a variable or group of variables with respect to time, geographical
conditions and other characteristics of the variable(s). An index number is a
relative measure, as it is independent of the units of the variable(s)
taken into consideration. This is the advantage of index numbers over
normal averages. All the averages which we studied before are absolute
measures, i.e. they are expressed in units, while index numbers are
percentage values which are independent of the units of the variable(s). In
calculating an index number, a base period is considered for comparison
and the changes in a variable are measured using various methods.
Though index numbers were initially used for measuring the changes in
prices of certain variables, they are now used in almost every field: physical
sciences, social sciences, government departments, economic bodies and
business organizations. The gross national product (GNP), per capita
income, cost of living index, production index, consumption, profit/loss etc.:
every variable in economics uses index numbers as a tool to measure its variations.
Thus, the fluctuations, small or big, in the economy are measured by index
numbers. Hence the index number is called a barometer of economics.
10.2 IMPORTANCE OF INDEX NUMBERS
The important characteristics of Index numbers are as follows:
1. It is a relative measure: As discussed earlier, index numbers are
independent of the units of the variable(s); hence an index number is a special kind of
average which can be used to compare different types of data
expressed in different units at different points of time.
2. Economic barometer: A barometer is an instrument which measures
atmospheric pressure. As index numbers measure all the ups
and downs in the economy, they are called economic
barometers.
3. To generalize the characteristics of a group: Many a time it is
difficult to measure the changes in a variable in a complete sense. For
example, it is not possible to directly measure the changes in
business activity in a country. But if we instead measure the changes
in the factors affecting the business activity, we can generalize them to
the complete activity. Similarly, the industrial production or the
agricultural output cannot be measured directly.
4. To forecast trends: Index numbers prove to be very useful in
identifying trends in a variable over a period of time and hence are
used to forecast the future trends.

5. To facilitate decision making: Future estimations are always used for
long term and short term planning and for formulating future policy
by government and private organizations. Price index numbers
provide the information required for such policy decisions in economics.
6. To measure the purchasing power of money and for deflating:
Index numbers help in deciding the actual purchasing power of
money. We often hear our elders saying, “In our times the
salary was just Rs. 100 a month and you are paid Rs. 10,000, still you
are not happy!” The answer is simple: the value of Rs. 100
thirty years ago and its value now are drastically
different, and index numbers measure this difference. Calculation of real income using index numbers is an
important tool to measure the actual income of an individual. This is
called deflation.
There are different types of index numbers based on their requirement like,
price index, quantity index, value index etc. The price index is again
classified as single price index and composite price index.
10.3 PRICE INDEX NUMBERS
The price index numbers are classified as shown in the following diagram:

Notations:
P0: Price in Base Year Q0: Quantity in Base Year
P1: Price in Current Year Q1: Quantity in Current Year
The suffix ‘0’ stands for the base year and the suffix ‘1’ stands for the
current year.
10.3.1 Simple (unweighted) price index no. by aggregative method:
In this method we define the price index number as the ratio of the sum of prices
in the current year to the sum of prices in the base year, expressed as a percentage,
i.e. the quotient multiplied by 100.
Symbolically, $I = \frac{\sum P_1}{\sum P_0} \times 100$ … (1)
Steps for computation:
1. The total of all base year prices is calculated and denoted by $\sum P_0$.
2. The total of all current year prices is calculated and denoted by $\sum P_1$.
3. Using the above formula, simple price index number is computed.
Example 1: From the following data, construct the price index number by
simple aggregative method:
Commodity   Unit     Price in 1985   Price in 1986
A           Kg             10              12
B           Kg              4               7
C           Litre           6               7
D           Litre           8              10
Solution: The totals of the 3rd and 4th columns are computed as shown
below:
Commodity   Unit     Price in 1985 (P0)   Price in 1986 (P1)
A           Kg             10                   12
B           Kg              4                    7
C           Litre           6                    7
D           Litre           8                   10
Total       --         ΣP0 = 28             ΣP1 = 36
∴ $I = \frac{\sum P_1}{\sum P_0} \times 100 = \frac{36}{28} \times 100 = 128.57$
Meaning of the value of I: I = 128.57 means that the prices in 1986, as
compared with those in 1985, have increased by 28.57 %.
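Formula (1) can be checked against Example 1 with a few lines of Python. The dictionary layout and variable names are only one possible way to hold the data.

```python
# Prices from Example 1: commodity -> price.
price_1985 = {"A": 10, "B": 4, "C": 6, "D": 8}     # base year, P0
price_1986 = {"A": 12, "B": 7, "C": 7, "D": 10}    # current year, P1

sum_p0 = sum(price_1985.values())
sum_p1 = sum(price_1986.values())

# Simple aggregative price index: (sum of P1 / sum of P0) x 100.
simple_index = sum_p1 / sum_p0 * 100
print(sum_p0, sum_p1, round(simple_index, 2))
```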
10.3.2 Simple (unweighted) price index number by average of price
relatives method
In this method the price relative is calculated for every commodity and the
arithmetic mean of these is taken, i.e. the sum of all price relatives is divided by the
total number of commodities.
Symbolically, if there are n commodities in consideration, then the simple
price index number of the group is calculated by the formula:
$I = \frac{1}{n} \sum \left( \frac{P_1}{P_0} \times 100 \right)$ … (2)
Steps for computation
1. The price relatives for each commodity are calculated by the formula: $\frac{P_1}{P_0} \times 100$.
2. The total of these price relatives is calculated and denoted as $\sum \left( \frac{P_1}{P_0} \times 100 \right)$.
3. The arithmetic mean of the price relatives using the above formula no.
(2) gives the required price index number.
Example 2: Construct the simple price index number for the following data
using average of price relatives method:
Commodity   Unit     Price in 1997   Price in 1998
Rice        Kg             10              13
Wheat       Kg              6               8
Milk        Litre           8              10
Oil         Litre          15              18
Solution: In this method we have to find price relatives for every commodity
and then total these price relatives. Introducing the column of price relatives
the table of computation is as follows:
Commodity   Unit     Price in 1997 (P0)   Price in 1998 (P1)   (P1/P0) x 100
Rice        Kg             10                   13                130
Wheat       Kg              6                    8                133.33
Milk        Litre           8                   10                125
Oil         Litre          15                   18                120
Total                                                             508.33
Now, n = 4 and the total of price relatives is 508.33
∴ $I = \frac{1}{n} \sum \left( \frac{P_1}{P_0} \times 100 \right) = \frac{508.33}{4} = 127.08$
The prices in 1998 have increased by about 27 % as compared with 1997.
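The average of price relatives method, formula (2), can be sketched in Python with the data of Example 2. The variable names are illustrative.

```python
# Prices from Example 2: commodity -> (P0 in 1997, P1 in 1998).
prices = {
    "Rice": (10, 13),
    "Wheat": (6, 8),
    "Milk": (8, 10),
    "Oil": (15, 18),
}

# Price relative of each commodity: (P1 / P0) x 100.
price_relatives = {name: p1 / p0 * 100 for name, (p0, p1) in prices.items()}

# Index = arithmetic mean of the price relatives.
relatives_index = sum(price_relatives.values()) / len(price_relatives)

print({name: round(r, 2) for name, r in price_relatives.items()})
print(round(relatives_index, 2))
```

Because each relative is a pure ratio, the units (Kg, Litre) of the individual commodities do not affect the result.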
Remark:
1. The simple aggregative method is calculated without taking into
consideration the units of individual items in the group. This may give
a misleading index number.
2. This problem is overcome in the average of price relatives method as
the individual price relatives are computed first and then their average
is taken.
3. Both the methods are unreliable as they give equal weightage to all
items in consideration, which is not true in practice.
10.3.3 Weighted index numbers by aggregative method:
In this method weights assigned to various items are considered in the
calculations. The products of the prices with the corresponding weights are
computed; their totals are divided and expressed as a percentage.
Symbolically, if W denotes the weights assigned and P0, P1 have their usual
meaning, then the weighted index number using aggregative method is
given by the formula:
$I = \frac{\sum P_1 W}{\sum P_0 W} \times 100$ … (3)
Steps to find weighted index number using aggregative method
1. The columns of P1W and P0W are introduced.
2. The totals of these columns are computed.
3. The formula no. (3) is used for computing the required index number.
Example 3: From the following data, construct the weighted price index
number:
Commodity         A     B     C     D
Price in 1982     6    10     4    18
Price in 1983     9    18     6    26
Weight           35    30    20    15
Solution: Following the steps mentioned above the table of computations
is as shown below:
Commodity   Weight (W)   Price in 1982 (P0)   P0W    Price in 1983 (P1)   P1W
A               35               6            210            9            315
B               30              10            300           18            540
C               20               4             80            6            120
D               15              18            270           26            390
Total            -               -       ΣP0W = 860          -       ΣP1W = 1365
Using the totals from the table, we have
Weighted Index Number $I = \frac{\sum P_1 W}{\sum P_0 W} \times 100 = \frac{1365}{860} \times 100 = 158.72$
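The weighted aggregative method, formula (3), can be verified in Python with the data of Example 3. The tuple layout (W, P0, P1) is an assumption made for this sketch.

```python
# Data from Example 3: commodity -> (weight W, price in 1982 P0, price in 1983 P1).
rows = {
    "A": (35, 6, 9),
    "B": (30, 10, 18),
    "C": (20, 4, 6),
    "D": (15, 18, 26),
}

sum_p0w = sum(p0 * w for w, p0, p1 in rows.values())
sum_p1w = sum(p1 * w for w, p0, p1 in rows.values())

# Weighted aggregative index: (sum of P1 x W / sum of P0 x W) x 100.
weighted_index = sum_p1w / sum_p0w * 100
print(sum_p0w, sum_p1w, round(weighted_index, 2))
```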

Remark: There are different formulae based on what is taken as the
weight while calculating the weighted index numbers. Based on the choice
of the weight we are going to study here four types of weighted index
numbers: (1) Laspeyre’s Index Number, (2) Paasche’s Index Number, (3)
Fisher’s Index Number and (4) Kelly’s Index Numbers.
(1) Laspeyre’s Index Number:
In this method Laspeyre assumed the base quantity (Q0) as the weight
in constructing the index number. Symbolically, P0, P1 and Q0 having
their usual meaning, the Laspeyre’s index number denoted by IL is
given by the formula: $I_L = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100$ … (4)
Steps to compute IL:
1. The columns of the products P0Q0 and P1Q0 are introduced.
2. The totals of these columns are computed.
3. Using the above formula no. (4), IL is computed.
Example 4: From the data given below, construct the Laspeyre’s
index number:
Commodity   1965 Price   1965 Quantity   1966 Price
A                5            12              7
B                7            12              9
C               10            15             15
D               18             5             20
Solution: Introducing the columns of the products P0Q0 and P1Q0, the
table of computation is completed as shown below:
Commodity   1965 Price (P0)   1965 Quantity (Q0)   1966 Price (P1)   P0Q0   P1Q0
A                 5                  12                   7            60     84
B                 7                  12                   9            84    108
C                10                  15                  15           150    225
D                18                   5                  20            90    100
Total             -                   -                   -     ΣP0Q0 = 384   ΣP1Q0 = 517
Using the totals from the table and substituting in the formula no. (4),
we have $I_L = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100 = \frac{517}{384} \times 100 = 134.64$
In this method, Paasch as sumed the current year quantity ( Q1) as the
weight for constructing the index number. Symbolically, P0, P1 and
Q1 having their usual meaning, the Paasche’s index number denoted
by IP is given by the formula: 11
01PPQIPQ6 6x 100 … (5)
The steps for computing IP are similar to that of IL.
Example 5: From the data given below, construct Paasche’s index number:

| Commodity | 1985 Price | 1986 Price | 1986 Quantity |
|---|---|---|---|
| A | 5 | 8 | 10 |
| B | 10 | 14 | 20 |
| C | 6 | 9 | 25 |
| D | 8 | 10 | 10 |

Solution: Introducing the columns of the products $P_0Q_1$ and $P_1Q_1$, the table of computations is completed as shown below:

| Commodity | 1985 Price ($P_0$) | 1986 Price ($P_1$) | 1986 Quantity ($Q_1$) | $P_0Q_1$ | $P_1Q_1$ |
|---|---|---|---|---|---|
| A | 5 | 8 | 10 | 50 | 80 |
| B | 10 | 14 | 20 | 200 | 280 |
| C | 6 | 9 | 25 | 150 | 225 |
| D | 8 | 10 | 10 | 80 | 100 |
| Total | - | - | - | $\sum P_0Q_1 = 480$ | $\sum P_1Q_1 = 685$ |

Using the totals from the table and substituting in formula no. (5), we have
$$I_P = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times 100 = \frac{685}{480} \times 100 = 142.71$$
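Formula (5) can be checked with a short Python sketch. It uses the data of Example 5; the function name `paasche_index` is an assumption made for illustration only.

```python
def paasche_index(p0, p1, q1):
    """Paasche's price index: 100 * sum(P1*Q1) / sum(P0*Q1)."""
    sum_p1q1 = sum(p * q for p, q in zip(p1, q1))  # current prices, current quantities
    sum_p0q1 = sum(p * q for p, q in zip(p0, q1))  # base prices, current quantities
    return 100 * sum_p1q1 / sum_p0q1


# Data of Example 5 (commodities A, B, C, D)
p0 = [5, 10, 6, 8]      # 1985 prices
p1 = [8, 14, 9, 10]     # 1986 prices
q1 = [10, 20, 25, 10]   # 1986 quantities

print(round(paasche_index(p0, p1, q1), 2))  # 142.71
```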
(3) Fisher’s Index Number :
Fisher developed his own method by using the formulae of Laspeyre and Paasche. He defined the index number as the geometric mean of $I_L$ and $I_P$. Symbolically, Fisher’s index number, denoted by $I_F$, is given by the formula:
$$I_F = \sqrt{I_L \times I_P} = \sqrt{\frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum P_1 Q_1}{\sum P_0 Q_1}} \times 100 \qquad \ldots (6)$$
Note :
1. The multiple 100 is outside the square root sign .
2. While computing products of the terms, care should be taken to
multiply corresponding numbers properly.
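Example 6: From the data given below, construct (i) Laspeyre’s index number, (ii) Paasche’s index number and hence (iii) Fisher’s index number.

| Item | 1975 Price | 1975 Quantity | 1976 Price | 1976 Quantity |
|---|---|---|---|---|
| A | 4 | 12 | 6 | 16 |
| B | 2 | 16 | 3 | 20 |
| C | 8 | 9 | 11 | 14 |

Solution: Introducing four columns for the products $P_0Q_0$, $P_0Q_1$, $P_1Q_0$ and $P_1Q_1$, the table of computations is completed as shown below:

| Item | $P_0$ | $Q_0$ | $P_1$ | $Q_1$ | $P_0Q_0$ | $P_0Q_1$ | $P_1Q_0$ | $P_1Q_1$ |
|---|---|---|---|---|---|---|---|---|
| A | 4 | 12 | 6 | 16 | 48 | 64 | 72 | 96 |
| B | 2 | 16 | 3 | 20 | 32 | 40 | 48 | 60 |
| C | 8 | 9 | 11 | 14 | 72 | 112 | 99 | 154 |
| Total | | | | | 152 | 216 | 219 | 310 |

From the table, we have $\sum P_0Q_0 = 152$, $\sum P_0Q_1 = 216$, $\sum P_1Q_0 = 219$ and $\sum P_1Q_1 = 310$.
∴ $I_L = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100 = \frac{219}{152} \times 100 = 144.08$
∴ $I_P = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times 100 = \frac{310}{216} \times 100 = 143.52$
∴ $I_F = \sqrt{I_L \times I_P} = \sqrt{144.08 \times 143.52} = 143.8$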
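The three index numbers of Example 6 can be computed together, as in the following Python sketch. The helper `weighted_sum` and the function `fisher_index` are names assumed for this illustration; the data are those of Example 6.

```python
import math


def weighted_sum(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def fisher_index(p0, q0, p1, q1):
    """Return (I_L, I_P, I_F); I_F is the geometric mean of I_L and I_P."""
    i_l = 100 * weighted_sum(p1, q0) / weighted_sum(p0, q0)  # Laspeyre
    i_p = 100 * weighted_sum(p1, q1) / weighted_sum(p0, q1)  # Paasche
    return i_l, i_p, math.sqrt(i_l * i_p)


# Data of Example 6 (items A, B, C)
p0, q0 = [4, 2, 8], [12, 16, 9]     # 1975 prices and quantities
p1, q1 = [6, 3, 11], [16, 20, 14]   # 1976 prices and quantities

i_l, i_p, i_f = fisher_index(p0, q0, p1, q1)
print(round(i_l, 2), round(i_p, 2), round(i_f, 2))  # 144.08 143.52 143.8
```

Since $I_L$ and $I_P$ are already percentages, taking the square root of their product is the same as keeping the multiple 100 outside the square root sign in formula (6).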

Remark:
1. Laspeyre’s index number, though popular, has the drawback that it does not consider the change in consumption over the period, as it does not take the current year quantity into account.
2. Paasche’s index number overcomes this by assigning the current year quantity as the weight.
3. Fisher’s index number, being the geometric mean of these two index numbers, takes both quantities into account. Hence it is called the ideal index number.
(4) Kelly’s Index Number :
In this method, Kelly assigns the average of both the quantities as the weight for constructing the index number. Symbolically, with $P_0$, $P_1$, $Q_0$ and $Q_1$ having their usual meaning, Kelly’s index number, denoted by $I_K$, is given by the formula:
$$I_K = \frac{\sum P_1 Q}{\sum P_0 Q} \times 100, \quad \text{where } Q = \frac{Q_0 + Q_1}{2} \qquad \ldots (7)$$
Steps to find $I_K$:
1. The average of both the quantities is computed.
2. The columns of the products $P_1Q$ and $P_0Q$ are introduced.
3. The corresponding column totals are computed.
4. Using formula no. (7), the required index number $I_K$ is computed.
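Example 7: From the data given below, construct Kelly’s index number:

| Item | Base Year Price | Base Year Quantity | Current Year Price | Current Year Quantity |
|---|---|---|---|---|
| A | 18 | 20 | 24 | 22 |
| B | 9 | 10 | 13 | 16 |
| C | 10 | 15 | 12 | 19 |
| D | 6 | 13 | 8 | 15 |
| E | 32 | 14 | 38 | 18 |

Solution: Introducing the columns of $Q = \frac{Q_0 + Q_1}{2}$, $P_0Q$ and $P_1Q$, the table of computations is completed as shown below:

| Item | $Q_0$ | $Q_1$ | $Q$ | $P_0$ | $P_0Q$ | $P_1$ | $P_1Q$ |
|---|---|---|---|---|---|---|---|
| A | 20 | 22 | 21 | 18 | 378 | 24 | 504 |
| B | 10 | 16 | 13 | 9 | 117 | 13 | 169 |
| C | 15 | 19 | 17 | 10 | 170 | 12 | 204 |
| D | 13 | 15 | 14 | 6 | 84 | 8 | 112 |
| E | 14 | 18 | 16 | 32 | 512 | 38 | 608 |
| Total | | | | | 1261 | | 1597 |

From the table, we have $\sum P_0Q = 1261$ and $\sum P_1Q = 1597$.
∴ $I_K = \frac{\sum P_1 Q}{\sum P_0 Q} \times 100 = \frac{1597}{1261} \times 100 = 126.65$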
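The four steps for Kelly’s index number can be sketched in Python as follows, using the data of Example 7. The function name `kelly_index` is an assumption made for this sketch.

```python
def kelly_index(p0, q0, p1, q1):
    """Kelly's index: 100 * sum(P1*Q) / sum(P0*Q), with Q = (Q0 + Q1) / 2."""
    q = [(a + b) / 2 for a, b in zip(q0, q1)]     # step 1: average quantities
    sum_p1q = sum(p * w for p, w in zip(p1, q))   # steps 2-3: column total of P1*Q
    sum_p0q = sum(p * w for p, w in zip(p0, q))   # steps 2-3: column total of P0*Q
    return 100 * sum_p1q / sum_p0q                # step 4: formula (7)


# Data of Example 7 (items A to E)
p0, q0 = [18, 9, 10, 6, 32], [20, 10, 15, 13, 14]   # base year
p1, q1 = [24, 13, 12, 8, 38], [22, 16, 19, 15, 18]  # current year

print(round(kelly_index(p0, q0, p1, q1), 2))  # 126.65
```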
10.3.4 Weighted index numbers using average of price relatives method:
This is similar to what we have seen in subsection 7.3.2. Here the individual price relatives are computed first. These are multiplied by the corresponding weights. The ratio of the sum of the products to the total value of the weights is defined to be the weighted index number. Symbolically, if W denotes the weights and I denotes the price relatives, then the weighted index number is given by the formula:
$$\frac{\sum IW}{\sum W} \qquad \ldots (8)$$
One of the important weighted index numbers is the cost of living index number, also known as the consumer price index (CPI) number.
10.4 COST OF LIVING INDEX NUMBER OR CONSUMER PRICE INDEX NUMBER:
There are two methods for constructing this index number: (1) Aggregative Expenditure Method and (2) Family Budget Method.
(1) In the aggregative expenditure method, we construct the index number by taking the base year quantity as the weight. In fact, this index number is nothing but Laspeyre’s index number.
(2) In the family budget method, value weights are computed for each item in the group and the index number is computed using the formula:
$$\frac{\sum IW}{\sum W}, \quad \text{where } I = \frac{P_1}{P_0} \times 100 \text{ and } W = P_0Q_0 \qquad \ldots (9)$$
Example 8: A survey of families in a city revealed the following information:

| Item | Food | Clothing | Fuel | House Rent | Misc. |
|---|---|---|---|---|---|
| % Expenditure | 30 | 20 | 15 | 20 | 15 |
| Price in 1987 | 320 | 140 | 100 | 250 | 300 |
| Price in 1988 | 400 | 150 | 125 | 250 | 320 |

What is the cost of living index number for 1988 as compared to that of 1987?
Solution: Here the % expenditure is taken as the weight (W). The table of computations is completed as shown below:

| Item | $P_0$ | $P_1$ | $I = \frac{P_1}{P_0} \times 100$ | % Expenditure (W) | IW |
|---|---|---|---|---|---|
| Food | 320 | 400 | 125 | 30 | 3750 |
| Clothing | 140 | 150 | 107.14 | 20 | 2142.8 |
| Fuel | 100 | 125 | 125 | 15 | 1875 |
| House Rent | 250 | 250 | 100 | 20 | 2000 |
| Miscellaneous | 300 | 320 | 106.67 | 15 | 1600.05 |
| Total | | | | $\sum W = 100$ | $\sum IW = 11367.85$ |

From the table, we have $\sum W = 100$ and $\sum IW = 11367.85$.
∴ cost of living index number $= \frac{\sum IW}{\sum W} = \frac{11367.85}{100} = 113.68$
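The family budget computation of Example 8 can be sketched in Python as follows. The function name `family_budget_index` is an assumption for this illustration. The sketch keeps the price relatives unrounded, so the intermediate products differ slightly from the table, but the final index agrees to two decimal places.

```python
def family_budget_index(p0, p1, weights):
    """Cost of living index by the family budget method: sum(I*W) / sum(W),
    where I = P1 / P0 * 100 is the price relative of each item."""
    relatives = [100 * cur / base for base, cur in zip(p0, p1)]
    weighted_total = sum(i * w for i, w in zip(relatives, weights))
    return weighted_total / sum(weights)


# Data of Example 8: Food, Clothing, Fuel, House Rent, Miscellaneous
p0 = [320, 140, 100, 250, 300]   # prices in 1987
p1 = [400, 150, 125, 250, 320]   # prices in 1988
w = [30, 20, 15, 20, 15]         # % expenditure used as weights

print(round(family_budget_index(p0, p1, w), 2))  # 113.68
```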
Uses of cost of living index numbers
1. These index numbers reflect the effect of rise and fall in the economy, or change in prices, on the standard of living of the people.
2. These index numbers help in determining the purchasing power of money, which is the reciprocal of the cost of living index number.
3. They are used in deflation, i.e. in determining the actual income of an individual. Hence they are also used by the management of government or private organizations to formulate their policies regarding the wages and allowances of their employees.
10.5 CHAIN BASE INDEX NUMBER:
In constructing an index number, the reference year, called the base year, may be fixed or changing. The indices which are computed using a fixed base are called Fixed Base Indices (FBI) and the indices which are computed using a changing base are called Chain Base Indices (CBI). The CBIs are computed by taking, every time, the previous year as the base year. These year-to-year indices are called link relatives. The chain index is now calculated by the formula:
$$\text{Chain Index for a year} = \frac{\text{link relative of the year} \times \text{chain index of previous year}}{100} \qquad \ldots (10)$$
Conversion From Fixed Base to Chain Base
An FBI is converted to a CBI by the following formula:
$$\text{CBI of a year} = \frac{\text{FBI of the year}}{\text{FBI of previous year}} \times 100$$
Conversion From Chain Base to Fixed Base
A CBI is converted to an FBI by the following formula:
$$\text{FBI of a year} = \frac{\text{CBI of the year} \times \text{FBI of previous year}}{100} \qquad \ldots (11)$$
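Formula (10) and the two conversion formulae can be sketched in Python as follows. The function names are assumptions made for this illustration, and the price list is the series of rice prices used in Example 9. Formula (11) has the same form as formula (10), so a single function, `chain_indices`, serves for both.

```python
def link_relatives(prices):
    """Link relative of each year: price / previous year's price * 100.
    The first year is assigned 100."""
    return [100.0] + [100 * cur / prev for prev, cur in zip(prices, prices[1:])]


def chain_indices(links):
    """Formulae (10) and (11): index = link relative * previous index / 100."""
    chain = [links[0]]
    for link in links[1:]:
        chain.append(link * chain[-1] / 100)
    return chain


def fixed_to_chain(fbi):
    """CBI of a year = FBI of the year / FBI of previous year * 100."""
    return [fbi[0]] + [100 * cur / prev for prev, cur in zip(fbi, fbi[1:])]


prices = [45, 50, 65, 78, 84, 90]   # 1985 to 1990
links = link_relatives(prices)
chain = chain_indices(links)
print([round(x, 2) for x in links])
print([round(x, 2) for x in chain])
```

Because the sketch carries unrounded link relatives forward, its last decimal place may differ slightly from a hand computation that rounds at every step.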
Example 9: From the following data, which gives the prices of rice per quintal from 1985 to 1990, construct index numbers by (i) fixed base 1985 and (ii) chain base method:

| Year | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 |
|---|---|---|---|---|---|---|
| Price | 45 | 50 | 65 | 78 | 84 | 90 |

Solution: (i) Fixed base as 1985.
We assign the index number 100 to the year 1985. Now, the table of computations is completed as shown below:

| Year | Price | Index Number |
|---|---|---|
| 1985 | 45 | 100 |
| 1986 | 50 | $\frac{50}{45} \times 100 = 111.11$ |
| 1987 | 65 | $\frac{65}{45} \times 100 = 144.44$ |
| 1988 | 78 | $\frac{78}{45} \times 100 = 173.33$ |
| 1989 | 84 | $\frac{84}{45} \times 100 = 186.66$ |
| 1990 | 90 | $\frac{90}{45} \times 100 = 200$ |

(ii) Chain Base Method:

| Year | Price | Link Relative | Chain Index Number |
|---|---|---|---|
| 1985 | 45 | 100 | 100 |
| 1986 | 50 | $\frac{50}{45} \times 100 = 111.11$ | $\frac{111.11 \times 100}{100} = 111.11$ |
| 1987 | 65 | $\frac{65}{50} \times 100 = 130$ | $\frac{130 \times 111.11}{100} = 144.44$ |
| 1988 | 78 | $\frac{78}{65} \times 100 = 120$ | $\frac{120 \times 144.44}{100} = 173.33$ |
| 1989 | 84 | $\frac{84}{78} \times 100 = 107.7$ | $\frac{107.7 \times 173.33}{100} = 186.66$ |
| 1990 | 90 | $\frac{90}{84} \times 100 = 107.14$ | $\frac{107.14 \times 186.66}{100} = 200$ |

Remark: The fixed base index (FBI) is the same as the chain base index (CBI) when there is only one item under consideration.

10.6 DEFLATING, SPLICING, SHIFTING OF BASE YEAR:
Shifting of Base Year
The base year chosen in constructing an index number becomes outdated after a long period of time. Also, we may sometimes need to compare a given series with another series that has a different base year. Both these reasons lead us to the process of shifting of the base year. In this process we obtain the new index numbers by multiplying the previous indices by a common factor of $\frac{100}{i}$, where i is the index of the new base year.
Example 10: For the following data of price index numbers with base year 1970, shift the base to 1975.

| Year | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976 | 1977 | 1978 | 1979 |
|---|---|---|---|---|---|---|---|---|---|---|
| Price Index | 100 | 110 | 115 | 125 | 120 | 130 | 140 | 135 | 150 | 170 |

Solution: Every price index is multiplied by the common factor $\frac{100}{\text{Index of new base year}} = \frac{100}{130}$.
The table of shifting the base year to 1975 is as follows:

| Year | Old Price Index | New Price Index |
|---|---|---|
| 1970 | 100 | $\frac{100}{130} \times 100 = 77$ |
| 1971 | 110 | $\frac{100}{130} \times 110 = 85$ |
| 1972 | 115 | $\frac{100}{130} \times 115 = 88$ |
| 1973 | 125 | $\frac{100}{130} \times 125 = 96$ |
| 1974 | 120 | $\frac{100}{130} \times 120 = 92$ |
| 1975 | 130 | 100 |
| 1976 | 140 | $\frac{100}{130} \times 140 = 108$ |
| 1977 | 135 | $\frac{100}{130} \times 135 = 104$ |
| 1978 | 150 | $\frac{100}{130} \times 150 = 115$ |
| 1979 | 170 | $\frac{100}{130} \times 170 = 131$ |
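Shifting the base year amounts to one multiplication per index, as the following Python sketch shows for the data of Example 10. The function name `shift_base` is an assumption made for this illustration.

```python
def shift_base(indices, new_base_index):
    """Multiply every index by the common factor 100 / (index of the new base year)."""
    return [100 * value / new_base_index for value in indices]


years = list(range(1970, 1980))
old_index = [100, 110, 115, 125, 120, 130, 140, 135, 150, 170]  # base 1970

# Index of the new base year 1975 in the old series
i = old_index[years.index(1975)]
new_index = shift_base(old_index, i)

for year, old, new in zip(years, old_index, new_index):
    print(year, old, round(new))
```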
Splicing
The procedure of combining two or more overlapping series and revising the index numbers to get a continuous series of index numbers is called splicing. In practice, the base year of a series may cease to be relevant after a certain period and is replaced with another year as the base. But the continuity of the time series has to be maintained. In such cases, splicing is a very helpful tool.
We find a common factor of $\frac{100}{i}$, where i stands for the index number of the overlapping period from which the continuity is to be formed. The index numbers of the series to be spliced are then multiplied or divided by this common factor, depending upon which series is to be spliced.
Example 11: For the following data, splice the index B to index A to obtain up-to-date continuous index numbers:

| Year | 1972 | 1973 | 1974 | 1975 | 1976 |
|---|---|---|---|---|---|
| Index A | 100 | 120 | 130 | 135 | 150 |

| Year | 1976 | 1977 | 1978 | 1979 | 1980 |
|---|---|---|---|---|---|
| Index B | 100 | 110 | 130 | 135 | 145 |

Solution: Using the formula for splicing an index number, the table of computation is completed as shown below:

| Year | Index A | Index B | Splicing of B to A |
|---|---|---|---|
| 1972 | 100 | | |
| 1973 | 120 | | |
| 1974 | 130 | | |
| 1975 | 135 | | |
| 1976 | 150 | 100 | $\frac{150 \times 100}{100} = 150$ |
| 1977 | | 110 | $\frac{150 \times 110}{100} = 165$ |
| 1978 | | 130 | $\frac{150 \times 130}{100} = 195$ |
| 1979 | | 135 | $\frac{150 \times 135}{100} = 202.5$ |
| 1980 | | 145 | $\frac{150 \times 145}{100} = 217.5$ |

Deflating
As discussed earlier in this chapter, index numbers are very useful in finding the real income of an individual or a group of individuals, which helps managements to decide their wage policies. The process of measuring the actual income vis-a-vis the changes in prices is called deflation.
The formula for computing the real income is as follows:
$$\text{Real Income of a year} = \frac{\text{Income for the year}}{\text{Price Index of that year}} \times 100$$
Example 12: Calculate the real income for the following data:

| Year | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 |
|---|---|---|---|---|---|---|
| Income in Rs. | 800 | 1050 | 1200 | 1600 | 2500 | 2800 |
| Price Index | 100 | 105 | 115 | 125 | 130 | 140 |

Solution: The real income is calculated by the formula:
$$\text{Real income} = \frac{\text{Income for the year}}{\text{Price index of that year}} \times 100$$
The table of computation of real incomes is completed as shown below:

| Year | Income in Rs. | Price Index | Real Income |
|---|---|---|---|
| 1990 | 800 | 100 | 800 |
| 1991 | 1050 | 105 | $\frac{1050}{105} \times 100 = 1000$ |
| 1992 | 1200 | 115 | $\frac{1200}{115} \times 100 = 1043$ |
| 1993 | 1600 | 125 | $\frac{1600}{125} \times 100 = 1280$ |
| 1994 | 2500 | 130 | $\frac{2500}{130} \times 100 = 1923$ |
| 1995 | 2800 | 140 | $\frac{2800}{140} \times 100 = 2000$ |
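Both splicing and deflating reduce to simple element-wise arithmetic, as the following Python sketch suggests. The function names `splice_b_to_a` and `real_income` are assumptions made for this illustration; the data are those of Examples 11 and 12.

```python
def splice_b_to_a(index_a_at_overlap, index_b):
    """Splice series B onto series A: each value of B is multiplied by
    (index of A in the overlapping year) / 100."""
    return [index_a_at_overlap * b / 100 for b in index_b]


def real_income(income, price_index):
    """Real income of a year = income / price index of that year * 100."""
    return [100 * inc / idx for inc, idx in zip(income, price_index)]


# Example 11: Index A stands at 150 in the overlapping year 1976
index_b = [100, 110, 130, 135, 145]   # 1976 to 1980
print(splice_b_to_a(150, index_b))    # [150.0, 165.0, 195.0, 202.5, 217.5]

# Example 12: incomes and price indices for 1990 to 1995
income = [800, 1050, 1200, 1600, 2500, 2800]
price_index = [100, 105, 115, 125, 130, 140]
print([round(x) for x in real_income(income, price_index)])
# [800, 1000, 1043, 1280, 1923, 2000]
```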

10.7 PROBLEMS IN CONSTRUCTING INDEX NUMBERS:
There are various problems in constructing an index number. Some of them are discussed below with their remedies:
(1) Purpose of an index number: Any activity requires a predefined purpose, and so does the construction of an index number. Depending upon the requirement, a suitable index number can be constructed and the necessary procedure followed. The procedure involves selecting the supporting data, the base year and the type of index number to be used.
(2) Selection of a suitable base year: As mentioned earlier, an index number is constructed with reference to a year called the base year. Care should be taken that the base year selected is not a year of irregular variations in the variable. Another important point is that the base year selected should be contemporary with the available data. Otherwise the comparison is invalid, as a long period brings appreciable changes in the tastes, customs and habits of people.
(3) Selection of data: For constructing an index number, not all the items of the group whose change in prices is to be represented need be taken. A reasonably large (this is a relative term here) sample of items is enough. The sample should not be so large that it strains the cost and time constraints, nor so small that it fails to represent the qualities of the group under consideration. A standardized sample covering all varieties serves our purpose.
(4) Obtaining price quotations: As the sample contains different varieties from different places, their prices also vary, so it becomes difficult to obtain the price quotations. This is overcome by appointing an unbiased and reliable agency to quote the prices.
(5) Selection of proper average: An index number is also a type of average. Generally, the arithmetic mean and the geometric mean are used to construct index numbers. The geometric mean, though more suitable than the arithmetic mean, is not a popular average because of its laborious calculations.

10.8 DEMERITS OF INDEX NUMBERS
(1) There are numerous types of index numbers and methods of constructing them. If an appropriate method is not applied, it may lead to wrong conclusions.
(2) The sample selected may not be representative of the complete series of items.
(3) The selection of the base period is also subjective and hence may be biased.
(4) An index number is a quantitative measure and does not take into account the qualitative aspect of the items.
(5) Index numbers are approximations of the changes; they may not be accurate.
10.9 LET US SUM UP:
In this chapter we have learnt:
10.10 UNIT END EXERCISES
1. Define Index Numbers.
2. Write a short note on the importance of Index Numbers.
3. “Index Numbers are the economic barometers”. Discuss this statement with examples.
4. Discuss the steps to construct Index Numbers.
5. What are the problems in constructing an Index Number?
6. Define the terms: (i) Deflation, (ii) Splicing and (iii) Shifting of Base Year. Write short notes on their significance.
7. Define Cost of Living Index Number and explain its importance.
8. What do you mean by (i) Chain Base Index Number and (ii) Fixed Base Index Number? Distinguish between the two.
9. Define (i) Laspeyre’s Index Number, (ii) Paasche’s Index Number and (iii) Fisher’s Index Number. What is the difference between the three? Which amongst them is called the ideal Index Number? Why?
10. What are the demerits of Index Numbers?
11. From the following data, construct the price index number by simple aggregative method:

| Commodity | Unit | Price in 1990 | Price in 1991 |
|---|---|---|---|
| A | Kg | 14 | 18 |
| B | Kg | 6 | 9 |
| C | Litre | 5 | 8 |
| D | Litre | 12 | 20 |

12. From the following data, construct the price index number for 1995, by simple aggregative method, with 1994 as the base:

| Commodity | Unit | Price in 1994 | Price in 1995 |
|---|---|---|---|
| Rice | Kg | 8 | 10 |
| Wheat | Kg | 5 | 6.5 |
| Oil | Litre | 10 | 13 |
| Eggs | Dozen | 4 | 6 |

13. From the following data, construct the price index number for 1986, by average of price relatives method:

| Commodity | Unit | Price in 1985 | Price in 1986 |
|---|---|---|---|
| Banana | Dozen | 4 | 5 |
| Rice | Kg | 5 | 6 |
| Milk | Litre | 3 | 4.5 |
| Slice Bread | One Packet | 3 | 4 |

14. From the following data, construct the price index number, by method of average of price relatives:

| Commodity | Unit | Price in 1988 | Price in 1990 |
|---|---|---|---|
| A | Kg | 6 | 7.5 |
| B | Kg | 4 | 7 |
| C | Kg | 10 | 14 |
| D | Litre | 8 | 12 |
| E | Litre | 12 | 18 |

15. From the following data, construct the price index number for 1998, by (i) simple aggregative method and (ii) simple average of price relatives method, with 1995 as the base:

| Commodity | Unit | Price in 1995 | Price in 1998 |
|---|---|---|---|
| Rice | Kg | 12 | 14 |
| Wheat | Kg | 8 | 10 |
| Jowar | Kg | 7 | 9 |
| Pulses | Kg | 10 | 13 |

16. From the following data, construct the weighted price index number:

| Commodity | A | B | C | D |
|---|---|---|---|---|
| Price in 1985 | 10 | 18 | 36 | 8 |
| Price in 1986 | 12 | 24 | 40 | 10 |
| Weight | 40 | 25 | 15 | 20 |

17. From the following data, construct the index number using (i) simple average of price relatives and (ii) weighted average of price relatives:

| Commodity | Weight | Price in 1988 | Price in 1990 |
|---|---|---|---|
| Rice | 4 | 8 | 10 |
| Wheat | 2 | 6 | 8 |
| Pulses | 3 | 8 | 11 |
| Oil | 5 | 12 | 15 |

18. From the data given below, construct the Laspeyre’s index number:

| Commodity | 1975 Price | 1975 Quantity | 1976 Price |
|---|---|---|---|
| A | 5 | 10 | 8 |
| B | 6 | 15 | 7.5 |
| C | 2 | 20 | 3 |
| D | 10 | 14 | 12 |

19. From the data given below, construct (i) Laspeyre’s index number, (ii) Paasche’s index number and hence (iii) Fisher’s index number.

| Commodity | 1980 Price | 1980 Quantity | 1990 Price | 1990 Quantity |
|---|---|---|---|---|
| A | 6 | 15 | 9 | 21 |
| B | 4 | 18 | 7.5 | 25 |
| C | 2 | 32 | 8 | 45 |
| D | 7 | 20 | 11 | 29 |

20. From the data given below, construct (i) Laspeyre’s index number, (ii) Paasche’s index number and (iii) Fisher’s index number.

| Commodity | Base Year Price | Base Year Quantity | Current Year Price | Current Year Quantity |
|---|---|---|---|---|
| Cement | 140 | 200 | 167 | 254 |
| Steel | 60 | 150 | 95 | 200 |
| Coal | 74 | 118 | 86 | 110 |
| Limestone | 35 | 50 | 46 | 60 |

21. From the data given below, construct the Kelly’s index number:

| Commodity | Base Year Price | Base Year Quantity | Current Year Price | Current Year Quantity |
|---|---|---|---|---|
| A | 2 | 8 | 4 | 14 |
| B | 6 | 14 | 7 | 20 |
| C | 8.5 | 10 | 12 | 15 |
| D | 14 | 8 | 19 | 12 |
| E | 22 | 60 | 38 | 85 |

22. From the following data, construct the aggregative price index numbers by taking the average price of the three years as base.

| Commodity | Price in 1980 | Price in 1981 | Price in 1982 |
|---|---|---|---|
| A | 10 | 12 | 16 |
| B | 16 | 19 | 25 |
| C | 5 | 7 | 10 |

23. From the following data, construct the price index number by taking the price in 1978 as the base price:

| Commodity | Price in 1978 | Price in 1979 | Price in 1980 |
|---|---|---|---|
| A | 16 | 18 | 24 |
| B | 4 | 6 | 7.5 |
| C | 11 | 15 | 19 |
| D | 20 | 28 | 30 |

24. For the following data, if the Laspeyre’s Index number is 133.6, then find the missing quantity:

| Commodity | Base Year Price | Base Year Quantity | Current Year Price |
|---|---|---|---|
| A | 10 | 12 | 14 |
| B | 16 | ? | 20 |
| C | 5 | 15 | 8 |

25. For the following data, if the value of $I_L$ = 130.19 and $I_P$ = 142.86, find the missing quantities:

| Commodity | 1980 Price | 1980 Quantity | 1990 Price | 1990 Quantity |
|---|---|---|---|---|
| Oil | 12 | 15 | 15 | 20 |
| Milk | 6 | 10 | 8 | 15 |
| Eggs | 5 | 5 | 8 | 10 |

26. From the following data, construct (i) $I_L$, (ii) $I_P$, (iii) $I_F$ and (iv) $I_K$:

| Commodity | 1969 Price | 1969 Quantity | 1970 Price | 1970 Quantity |
|---|---|---|---|---|
| Rice | 2 | 10 | 3 | 12 |
| Wheat | 1.5 | 8 | 1.9 | 10 |
| Jowar | 1 | 6 | 1.2 | 10 |
| Bajra | 1.2 | 5 | 1.6 | 8 |
| Pulses | 4 | 14 | 6 | 20 |

27. Construct the cost of living index number for 1980 using the Family Budget Method:

| Item | Quantity | Price in 1975 | Price in 1980 |
|---|---|---|---|
| A | 10 | 5 | 7 |
| B | 5 | 8 | 11 |
| C | 7 | 12 | 14.5 |
| D | 4 | 6 | 10 |
| E | 1 | 250 | 600 |

28. Construct the cost of living index number for the following data with base year as 1989.

| Item | Weight | Price in 1989 | Price in 1990 | Price in 1991 |
|---|---|---|---|---|
| Food | 4 | 45 | 50 | 60 |
| Clothing | 2 | 30 | 33 | 38 |
| Fuel | 1 | 10 | 12 | 13 |
| House Rent | 3 | 40 | 42 | 45 |
| Miscellaneous | 1 | 5 | 8 | 10 |

29. A survey of families in a city revealed the following information:

| Item | Food | Clothing | Fuel | House Rent | Misc. |
|---|---|---|---|---|---|
| % Expenditure | 30 | 20 | 15 | 20 | 15 |
| Price in 1987 | 320 | 140 | 100 | 250 | 300 |
| Price in 1988 | 400 | 150 | 125 | 250 | 320 |

What is the cost of living index number for 1988 as compared to that of 1987?
30. Construct the consumer price index number for the following industrial data:

| Item | Weight | Price Index |
|---|---|---|
| Industrial Production | 30 | 180 |
| Exports | 15 | 145 |
| Imports | 10 | 150 |
| Transportation | 5 | 170 |
| Other activity | 5 | 190 |

31. A person spends Rs. 12,000 a month. If the cost of living index number is 130, find the amount the person spends on Clothing and House Rent from the following data:

| Item | Expenditure in ’000 Rs. | Index No. |
|---|---|---|
| Food | 3 | 130 |
| Clothing | ? | 125 |
| Fuel & Lighting | 1 | 140 |
| House Rent | ? | 175 |
| Miscellaneous | 2 | 190 |

32. From the following data, which gives the prices of rice per quintal from 1975 to 1980, construct index numbers by (i) fixed base 1975 and (ii) chain base method:

| Year | 1975 | 1976 | 1977 | 1978 | 1979 | 1980 |
|---|---|---|---|---|---|---|
| Price | 100 | 105 | 120 | 125 | 130 | 140 |

33. From the following data of the price of a commodity from 1990 to 1996, construct the index numbers by (i) taking 1990 as base and (ii) chain base method:

| Year | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 |
|---|---|---|---|---|---|---|---|
| Price | 42 | 48 | 50 | 54 | 55 | 58 | 64 |

34. Construct the fixed base index numbers from the following chain base index numbers:

| Year | 1967 | 1968 | 1969 | 1970 | 1971 | 1972 | 1973 |
|---|---|---|---|---|---|---|---|
| C.B.I.N. | 105 | 112 | 128 | 135 | 150 | 140 | 160 |

35. For the following data of price index numbers with base year 1970, shift the base to 1975.

36. The following data is about the price index numbers with base year 1989. Shift the base to 1995.

37. For the following data, splice the index B to index A to obtain up-to-date continuous index numbers:

| Year | 1972 | 1973 | 1974 | 1975 | 1976 | 1977 |
|---|---|---|---|---|---|---|
| Index A | 100 | 110 | 125 | 140 | 180 | 225 |

| Year | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 |
|---|---|---|---|---|---|---|
| Index B | 100 | 105 | 115 | 120 | 115 | 125 |

38. Calculate the real income for the following data:

| Year | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 |
|---|---|---|---|---|---|---|
| Income in Rs. | 500 | 550 | 700 | 780 | 900 | 1150 |
| Price Index | 100 | 110 | 115 | 130 | 140 | 155 |

39. Calculate the real income for the following data:

| Year | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 |
|---|---|---|---|---|---|---|
| Income in Rs. | 250 | 300 | 350 | 500 | 750 | 1000 |
| Price Index | 100 | 105 | 110 | 120 | 125 | 140 |

40. The per capita income and the corresponding cost of living index numbers are given below. Find the per capita real income:

| Year | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 |
|---|---|---|---|---|---|---|
| Per capita income | 220 | 240 | 280 | 315 | 335 | 390 |
| Cost of living I.N. | 100 | 110 | 115 | 135 | 150 | 160 |
Multiple Choice Questions:
1) A composite index number is a number that measures an average relative change ___?
a) in a single variable b) in a group of related variables with respect to a base
c) in prices of commodities d) Both (a) and (b)
2) Which of the following is a method of constructing Index Numbers?
a) Aggregative Method b) Relative Method
c) Both (a) and (b) d) None of the above
3) Index Numbers may be categorized in terms of ___?
a) variables b) constants
c) numbers d) All of the above
4) If Laspeyre's index = 110 and Paasche's index = 108, then the Drobisch-Bowley index is equal to:
a) 110 b) 108 c) 100 d) 109
5) The ratio of the sum of prices in the current period to the sum of prices in the base period, expressed as a percentage, is called:
a) Simple price index number
b) Simple aggregative price index number
c) Weighted aggregative price index number
d) Quantity index number
6) For a particular set of data, Laspeyre’s index number is 120 and Paasche’s index number is 125; then its Drobisch-Bowley index number is _______.
a) 122 b) 123 c) 123.5 d) 122.5
7) The quantity index number ________ measures changes in the level of expenditure.
a) Always b) Sometimes c) Rarely d) Never
8) The Laspeyre's and Paasche's indices are examples of:
a) Weighted quantity index only b) Weighted index numbers
c) Aggregate index numbers d) Weighted price index only
9) The cost of living index number is always a
a) Weighted index b) Price index
c) Quantity index d) None
10) An index number shows _________ changes rather than the absolute amount of change.
a) relative b) percentage
c) both (a) and (b) d) none
11) The index number by the family budget method is a weighted average of price relatives.
a) True b) False
10.11 LIST OF REFERENCES
• Fundamentals of Mathematical Statistics by S. C. Gupta and V. K. Kapoor.
• Basic Statistics by B. L. Agrawal.

11
PROBABILITY
Unit Structure
11.0 Objectives:
11.1 Introduction:
11.2 Basic concept of probability:
11.3 Probability Axioms:
11.3.1 Addition theorem of probability:
11.4 Let us sum up:
11.5 Unit end Exercises:
11.6 List of References:
11.0 OBJECTIVES:
After going through this unit, you will be able to:
• Know the basic terminology of probability.
• Understand how the concept of probability is used in real life.
• Solve basic examples on probability.
• State the probability axioms.
11.1 INTRODUCTION:
Sometimes in daily life certain thoughts come to mind, such as “I will be successful today”, “I will complete this work in an hour”, “I will be selected for the job”, and so on. There are many possible results for these things, but we are happy when we get the required result. Probability theory deals with experiments whose outcome is not predictable with certainty. Probability is a very useful concept. These days it is used in many fields of computer science, such as machine learning, computational linguistics, cryptography, computer vision and robotics, and also in other fields such as science, engineering, medicine and management.
Probability is a mathematical calculation of the chance of occurrence of a particular happening. To study it, we need some basic concepts: random experiment, sample space, and events.
Prerequisites and terminology:
Before we start the study of probability, we should know some prerequisites that are required for it. We also have to discuss some basic terminology.
Factorial notation: The product of the first n natural numbers is called the factorial of n. It is denoted by $n!$.
i.e. $n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$
For e.g. $4! = 4 \times 3 \times 2 \times 1 = 24$
This can be written as $n! = n \times (n-1)!$
Note: The value of $0! = 1$.
Fundamental principle of counting:
If a certain thing can be done in m ways and another thing can be done in n ways, then the total number of ways in which the two things can be done is $(m \times n)$ ways.
For e.g. Suppose you have 2 pants and 3 shirts. How many choices do you have, or in how many different ways can you dress?
A tree diagram can be used to show all the choices you can make. With the brown pant you have a choice of 3 different colors of shirts to wear. Similarly, with the black pant you also have a choice of 3 different colors of shirts to wear.
Thus, you have 2 × 3 = 6 choices.
Permutation: A permutation is an arrangement of all or part of a set of objects, with regard to the order of the arrangement. For example, suppose we have a set of three letters: A, B, and C. We might ask in how many ways we can arrange 2 letters from that set. Each possible arrangement would be an example of a permutation.
${}^{n}P_{r} = \frac{n!}{(n-r)!}$ ; $r \le n$ (repetition not allowed)
$= n^{r}$ ; $r \le n$ (repetition allowed)
For e.g. How many words can be formed using four different letters of the word “COMBINE”?
Here, the total number of letters in “COMBINE” is 7. Words are to be formed using four different letters, i.e. here n = 7 and r = 4.
${}^{7}P_{4} = \frac{7!}{(7-4)!} = \frac{7 \times 6 \times 5 \times 4 \times 3!}{3!} = 7 \times 6 \times 5 \times 4 = 840$
Combination: A combination is a selection of items in which order does not matter.
Formula: ${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$
We can use some properties:
1) ${}^{n}C_{n} = 1$, ${}^{n}C_{0} = 1$
2) ${}^{n}C_{1} = n$, ${}^{n}C_{n-1} = n$
3) ${}^{n}C_{r} = {}^{n}C_{n-r}$, $0 \le r \le n$
For e.g. In how many ways can a committee of 3 men and 2 women be formed out of 10 men and 5 women?
Here the selection of 3 men out of 10 men can be done in ${}^{10}C_{3}$ ways and the selection of 2 women out of 5 women can be done in ${}^{5}C_{2}$ ways.
Total number of ways in which the committee can be selected $= {}^{10}C_{3} \times {}^{5}C_{2} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} \times \frac{5 \times 4}{2 \times 1} = 120 \times 10 = 1200$.
Random experiment: When experiment can be repeated any number of
times under the similar conditions but we get different results on same
experiment, also result is not predict able such experiment is called random
experiment. For. e.g. A coin is tossed, A die is rolled and so on.
Outcomes: The result which we get from random experiment is called
outcomes of random experiment.
Sample space: The set of all possible outcomes of r andom experiment is
called sample space. The set of sample space is denoted by S and number
of elements of sample space can be written as ݊(ܵ). For e.g. A die is rolled,
we get ={1,2,3,4,5,6} , ݊(ܵ)=6.
Events: Any subset of the sample space is called an event. Or a set of
sample point which satisfies the required condition is called an events.
Number of elements in event set is denoted by ݊(ܧ). For example in the
experiment of throwing of a dia. The sample space is
S = {1, 2, 3, 4, 5, 6 } each of the following can be event i) A: even number
i.e. A = { 2, 4, 6} ii) B: multiple of 3 i.e. B = { 3, 6} iii) C: prime numbers
i.e. C = { 2 , 3, 5}.
Types of events:
Impossible event: An event which does not occurred in random experiment
is called impossible event. It is denoted by ׎ set. i.e. ݊(׎)=0. For example
getting number 7 when die is rolled. The probability measure assigned to
impossi ble event is Zero.
Equally likely events: when all events get equal chance of occurrences is
called equally likely events. For e.g. Events of occurrence of head or tail
in tossing a coin are equally likely events.
Certain event: An event which contains al l sample space elements is called
certain events. i.e. ݊(ܣ)=݊(ܵ).
Mutually exclusive events: Two events A and B of sample space S, it does
not have any common elements are called mutually exclusive events. In the
experiment of throwing of a die A: number less than 2 , B: multiple of 3.
There fore ݊(ܣתܤ)=0 munotes.in

Page 210

210 Business Statistics
210 Exhaustive events: Two events A and B of sample space S, elements of
event A and B occurred together are called exhaustive events. For e.g. In a
thrown of fair die occurrence of even number and occurrence of odd number
are exhaustive events. There fore ݊(ܣ׫ܤ)=1.
Complement event: Let S be sample space and A be any event than
complement of A is denoted by ܣҧ is set of elements from sample space S,
which does not belong to A. For e.g. if a die is thrown, S = {1, 2, 3, 4, 5, 6}
and A: odd numbers, A = {1, 3, 5}, then ܣҧ={2,4,6}.
Probability: For any random experiment, sample space S with required
chance of happing event E than the probability of event E is define as
ܲ(ܧ)=݊(ܧ)
݊(ܵ)
Basic properties of probability:
1) The probability of an event E lies between 0 and 1, i.e. $0 \le P(E) \le 1$.
2) The probability of an impossible event is zero, i.e. $P(\emptyset) = 0$.
3) The probability of a certain event is unity, i.e. $P(S) = 1$.
4) If A and B are exhaustive events, then $P(A \cup B) = 1$.
5) If A and B are mutually exclusive events, then $P(A \cap B) = 0$.
6) If A is any event of the sample space, then $P(A) + P(\bar{A}) = 1$, and therefore the probability of the complement of A is $P(\bar{A}) = 1 - P(A)$.
Example 1: An unbiased coin is tossed three times. Find the probability that i) no head turns up, ii) only one head turns up, iii) at least one head turns up, iv) at most one head turns up.
Solution: When a coin is tossed three times, the sample space is as follows:
$S = \{HHH, THH, HTH, HHT, TTH, THT, HTT, TTT\}$
∴ $n(S) = 8$
i) Event A: no head turns up.
$A = \{TTT\}$, $n(A) = 1$
∴ $P(A) = \frac{n(A)}{n(S)} = \frac{1}{8} = 0.125$
ii) Event B: only one head turns up.
$B = \{TTH, THT, HTT\}$, $n(B) = 3$
∴ $P(B) = \frac{n(B)}{n(S)} = \frac{3}{8} = 0.375$
iii) Event C: at least one head turns up.
$C = \{HHH, THH, HTH, HHT, TTH, THT, HTT\}$
∴ $n(C) = 7$
∴ $P(C) = \frac{n(C)}{n(S)} = \frac{7}{8} = 0.875$
iv) Event D: at most one head turns up.
$D = \{TTH, THT, HTT, TTT\}$
∴ $n(D) = 4$
∴ $P(D) = \frac{n(D)}{n(S)} = \frac{4}{8} = 0.5$
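The sample space of Example 1 is small enough to enumerate, so the four probabilities can be checked by counting. The following Python sketch does this with the standard library; the helper name `probability` is an assumption made for illustration, and the sketch relies on all eight outcomes being equally likely.

```python
from fractions import Fraction
from itertools import product

# All 2 x 2 x 2 = 8 outcomes of three tosses, e.g. "HHT"
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]


def probability(event):
    """P(E) = n(E) / n(S), where `event` tests a single outcome."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


p_no_head = probability(lambda o: o.count("H") == 0)
p_one_head = probability(lambda o: o.count("H") == 1)
p_at_least_one = probability(lambda o: o.count("H") >= 1)
p_at_most_one = probability(lambda o: o.count("H") <= 1)

print(p_no_head, p_one_head, p_at_least_one, p_at_most_one)  # 1/8 3/8 7/8 1/2
```

Here "at least one head" is the complement of "no head", so its probability also follows from property 6: 1 − 1/8 = 7/8.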
Example 2: If two dice are rolled, find the probability that the sum of the
numbers on the uppermost faces of the dice is i) an even number, ii) a
prime number, iii) a perfect square, iv) a multiple of 4, v) divisible by 3.
Solution: When 2 dice are rolled, the sample space is as follows:
$S = \{$
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
$\}$
$\therefore n(S) = 36$
i) Event A: the sum of the numbers on the uppermost faces of the dice
is an even number (i.e. the score is 2, 4, 6, 8, 10 or 12).
$n(A) = 18$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{18}{36} = 0.5$.
ii) Event B: the sum of the numbers on the uppermost faces of the dice
is a prime number (i.e. the score is 2, 3, 5, 7 or 11).
$n(B) = 15$
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{15}{36} = 0.4167$.
iii) Event C: the sum of the numbers on the uppermost faces of the dice
is a perfect square (i.e. the score is 4 or 9).
$n(C) = 7$
$\therefore P(C) = \frac{n(C)}{n(S)} = \frac{7}{36} = 0.194$.
iv) Event D: the sum of the numbers on the uppermost faces of the dice
is a multiple of 4 (i.e. the score is 4, 8 or 12).
$n(D) = 9$
$\therefore P(D) = \frac{n(D)}{n(S)} = \frac{9}{36} = 0.25$.
v) Event E: the sum of the numbers on the uppermost faces of the dice
is divisible by 3 (i.e. the score is 3, 6, 9 or 12).
$n(E) = 12$
$\therefore P(E) = \frac{n(E)}{n(S)} = \frac{12}{36} = 0.33$.
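The counts $n(A), \dots, n(E)$ in Example 2 are easy to miscount by hand. As a hedged illustration (the function name `prob_sum_in` is an assumption made for this sketch, not something from the text), the 36 outcomes can be enumerated and the favourable sums counted directly:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die).
rolls = list(product(range(1, 7), repeat=2))

def prob_sum_in(targets):
    """Probability that the sum of the two faces lies in `targets`."""
    hits = sum(1 for a, b in rolls if a + b in targets)
    return Fraction(hits, len(rolls))

p_even = prob_sum_in({2, 4, 6, 8, 10, 12})   # event A
p_prime = prob_sum_in({2, 3, 5, 7, 11})      # event B
p_square = prob_sum_in({4, 9})               # event C
p_mult4 = prob_sum_in({4, 8, 12})            # event D
p_div3 = prob_sum_in({3, 6, 9, 12})          # event E

print(p_even, p_prime, p_square, p_mult4, p_div3)
```

The fractions are printed in lowest terms, so 15/36 appears as 5/12 and 9/36 as 1/4.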
Example 3: From a well-shuffled pack of cards, a card is drawn at random.
Find the probability that the card drawn is i) a red card, ii) a king, iii)
a face card, iv) a diamond.
Solution: When a card is drawn at random, the sample space has
$n(S) = {}^{52}C_1 = 52$
i) Event A: a red card is drawn.
$n(A) = {}^{26}C_1 = 26$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{26}{52} = 0.5$
ii) Event B: a king is drawn.
$n(B) = {}^{4}C_1 = 4$
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{4}{52} = 0.077$
iii) Event C: a face card is drawn.
$n(C) = {}^{12}C_1 = 12$
$\therefore P(C) = \frac{n(C)}{n(S)} = \frac{12}{52} = 0.23$
iv) Event D: a diamond is drawn.
$n(D) = {}^{13}C_1 = 13$
$\therefore P(D) = \frac{n(D)}{n(S)} = \frac{13}{52} = 0.25$
Example 4: A box contains 50 tickets numbered 1 to 50. A ticket is drawn
at random from the box. Find the probability that the number on the ticket
drawn is i) an odd number, ii) a multiple of 5, iii) greater than 35.
Solution: When a ticket is drawn at random, the sample space is
$S = \{1, 2, 3, 4, \dots, 50\}$
$n(S) = 50$
i) Event A: an odd number is selected.
$n(A) = 25$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{25}{50} = 0.5$
ii) Event B: a multiple of 5 is selected, i.e. $B = \{5, 10, 15, \dots, 50\}$.
$n(B) = 10$
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{10}{50} = 0.2$
iii) Event C: a number greater than 35 is selected, i.e. $C = \{36, 37, \dots, 50\}$.
$n(C) = 15$
$\therefore P(C) = \frac{n(C)}{n(S)} = \frac{15}{50} = 0.3$
Example 5: A bag contains 6 white and 4 black balls. Two balls are drawn
at random from the bag. Find the probability that i) both are white, ii) both
are black, iii) one is of each color.
Solution: The bag contains a total of 6 + 4 = 10 balls.
When two balls are drawn at random from the bag, the sample space has
$n(S) = {}^{10}C_2 = \frac{10 \times 9}{2 \times 1} = 45$.
i) Event A: both balls are white.
$n(A) = {}^{6}C_2 = \frac{6 \times 5}{2 \times 1} = 15$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{15}{45} = 0.33$
ii) Event B: both balls are black.
$n(B) = {}^{4}C_2 = \frac{4 \times 3}{2 \times 1} = 6$
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{6}{45} = 0.13$
iii) Event C: one ball of each color.
$n(C) = {}^{6}C_1 \times {}^{4}C_1 = 6 \times 4 = 24$
$\therefore P(C) = \frac{n(C)}{n(S)} = \frac{24}{45} = 0.53$
Example 6: Two cards are drawn from a well-shuffled pack of 52 cards.
Find the probability of drawing i) one king and one queen, ii) two cards of the same color.
Solution: When two cards are drawn at random, the sample space has
$n(S) = {}^{52}C_2 = \frac{52 \times 51}{2 \times 1} = 1326$
i) Event A: one king and one queen are drawn.
$n(A) = {}^{4}C_1 \times {}^{4}C_1 = 16$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{16}{1326} = 0.012$
ii) Event B: both cards are of the same color (both red or both black).
$n(B) = 2 \times {}^{26}C_2 = 2 \times 325 = 650$
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{650}{1326} = 0.49$
Example 7: A committee of 4 is to be formed from a group of 7 boys and
5 girls. Find the probability that the committee consists of i) all girls, ii) 3
boys and 1 girl, iii) no girls.
Solution: Total number of members = 7 boys + 5 girls = 12.
When a committee of 4 is formed out of 12, the sample space has
$n(S) = {}^{12}C_4 = \frac{12 \times 11 \times 10 \times 9}{4 \times 3 \times 2 \times 1} = 495$
i) Event A: the committee contains all girls.
$n(A) = {}^{5}C_4 = 5$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{5}{495} = 0.01$
ii) Event B: the committee contains 3 boys and 1 girl.
$n(B) = {}^{7}C_3 \times {}^{5}C_1 = \frac{7 \times 6 \times 5}{3 \times 2 \times 1} \times 5 = 35 \times 5 = 175$
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{175}{495} = 0.35$
iii) Event C: the committee contains no girls (i.e. all are boys).
$n(C) = {}^{7}C_4 = \frac{7 \times 6 \times 5 \times 4}{4 \times 3 \times 2 \times 1} = 35$
$\therefore P(C) = \frac{n(C)}{n(S)} = \frac{35}{495} = 0.07$
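Selection problems such as Examples 5 to 7 all follow the same pattern: count the favourable selections with combinations and divide by the total number of selections. A short Python sketch of the committee problem is given below; `p_committee` is a hypothetical helper introduced only for illustration, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

boys, girls, size = 7, 5, 4
total = comb(boys + girls, size)  # n(S) = 12C4 = 495

def p_committee(n_boys):
    """Probability that a random committee has exactly `n_boys` boys."""
    n_girls = size - n_boys
    favourable = comb(boys, n_boys) * comb(girls, n_girls)
    return Fraction(favourable, total)

p_all_girls = p_committee(0)   # 5/495
p_three_boys = p_committee(3)  # 175/495
p_no_girls = p_committee(4)    # 35/495

print(total, float(p_all_girls), float(p_three_boys), float(p_no_girls))
```

Summing `p_committee(k)` over k = 0, 1, 2, 3, 4 gives 1, which is a useful check that no case has been missed.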


Example 8: A room has three lamp sockets for which bulbs are chosen from
a set of 10 bulbs, of which 4 are defective. What is the probability that i) the
room is dark, ii) the room is lighted?
Solution: To select 3 bulbs from 10 bulbs for the sockets, the sample space has
$n(S) = {}^{10}C_3 = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120$
i) Event A: the room is dark (i.e. there is no light in the room because all
three selected bulbs are defective).
$n(A) = {}^{4}C_3 = \frac{4 \times 3 \times 2}{3 \times 2 \times 1} = 4$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{4}{120} = 0.033$
ii) Event B: the room is lighted (i.e. at least 1 bulb in the room is working).
Here we use the probability of the complement:
$P(\text{room is lighted}) = 1 - P(\text{room is dark}) = 1 - P(A) = 1 - 0.033 = 0.967$.
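Example 9: The letters of the word MISHA are arranged at random. Find
the probability that i) the vowels are together, ii) the vowels are not together, iii)
the arrangement begins and ends with a vowel.
Solution: When the letters of the word MISHA are arranged at random, the
sample space has
$n(S) = {}^{5}P_5 = 5! = 120$.
i) Event A: the vowels are together. Treat the two vowels I and A as a single unit; the 4 units can be arranged in 4! ways and the vowels within the unit in 2! ways.
$n(A) = {}^{2}P_2 \times {}^{4}P_4 = 2! \times 4! = 2 \times 24 = 48$.
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{48}{120} = 0.4$
ii) Event B: the vowels are not together. Arrange the 3 consonants in 3! ways; this creates 4 gaps, and the 2 vowels are placed in two different gaps in ${}^{4}P_2$ ways.
$n(B) = {}^{4}P_2 \times {}^{3}P_3 = 12 \times 3! = 12 \times 6 = 72$.
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{72}{120} = 0.6$
This agrees with the complement rule, $P(B) = 1 - P(A) = 1 - 0.4 = 0.6$.
iii) Event C: the arrangement begins and ends with a vowel.
$n(C) = {}^{2}P_2 \times {}^{3}P_3 = 2! \times 3! = 2 \times 6 = 12$.
$\therefore P(C) = \frac{n(C)}{n(S)} = \frac{12}{120} = 0.1$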
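Arrangement problems can be verified by brute force when the word is short. The sketch below is an illustration only (the helper `vowels_adjacent` is an assumed name); it lists all 120 arrangements of MISHA and counts those with the required property.

```python
from fractions import Fraction
from itertools import permutations

VOWELS = set("IA")
arrangements = list(permutations("MISHA"))  # 5! = 120 arrangements

def vowels_adjacent(word):
    """True when the two vowels occupy neighbouring positions."""
    positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    return positions[1] - positions[0] == 1

n_together = sum(1 for w in arrangements if vowels_adjacent(w))
n_ends = sum(1 for w in arrangements if w[0] in VOWELS and w[-1] in VOWELS)

p_together = Fraction(n_together, len(arrangements))
p_apart = 1 - p_together  # complement of "vowels together"
p_ends = Fraction(n_ends, len(arrangements))

print(n_together, p_together, p_apart, p_ends)
```

Because "vowels not together" is the complement of "vowels together", the two probabilities must add up to 1.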


Check your Progress:
1. Three coins are tossed. Find the probability of getting i) all tails, ii) at
most 1 tail, iii) exactly 2 tails.
2. A card is selected at random from a pack of 52 cards. Find the
probability that it is i) a red card, ii) a heart, iii) a face card.
3. Two cards are drawn from a well-shuffled pack of 52 cards. Find the
probability that i) one is black and one is red, ii) both are hearts,
iii) both are from the same suit, iv) they are from different suits.
4. A committee of 3 is to be formed from a group of 6 boys and 4 girls.
Find the probability that the committee consists of i) all boys, ii) at
least one boy, iii) no boys, iv) exactly two boys.
5. Four men and three women have to stand in a row for a photograph.
If they choose their positions at random, find the probability that i)
the women are together, ii) the women are not together.
6. Tickets numbered from 1 to 100 are well-shuffled and a ticket is
drawn. What is the probability that the drawn ticket has i) an odd
number? ii) number 5 and multiple of 5? iii) a number which is a
square?
7. A pair of uniform dice is thrown. Find the probability that the sum of
the numbers obtained on the uppermost faces is i) a two-digit number, ii)
divisible by 5, iii) less than 5, iv) a perfect square.
8. A box contains 4 red, 3 yellow and 5 green balls. If two balls are
selected at random from the box, what is the probability that i) both
are red, ii) one is red and one is green, iii) one is yellow, iv) none is red?
9. If the letters of the word MANISH are arranged at random, what is the
probability that i) the vowels are together? ii) the vowels are at the extremes? iii)
the word begins with M and ends with S?
10. A lot of 300 gift articles manufactured in a factory contains 60
defective gift articles. If 3 articles are picked from the lot at random,
find the probability that they are non-defective.
11.3 PROBABILITY AXIOMS:
Let S be a sample space. A probability function P from the set of all events
in S to the set of real numbers satisfies the following three axioms for all
events A and B in S.
i) $P(A) \ge 0$.
ii) $P(\emptyset) = 0$ and $P(S) = 1$.
iii) If A and B are two disjoint sets (i.e. $A \cap B = \emptyset$), then the probability
of the union of A and B is $P(A \cup B) = P(A) + P(B)$.
Theorem: Prove that for every event A of sample space S, $0 \le P(A) \le 1$.
Proof: $S = A \cup \bar{A}$ and $\emptyset = A \cap \bar{A}$.
$\therefore 1 = P(S) = P(A \cup \bar{A}) = P(A) + P(\bar{A})$ (by axiom iii, since A and $\bar{A}$ are disjoint)
$\therefore 1 = P(A) + P(\bar{A})$
$\Rightarrow P(A) = 1 - P(\bar{A})$ or $P(\bar{A}) = 1 - P(A)$.
Since $P(\bar{A}) \ge 0$ by axiom i), $P(A) = 1 - P(\bar{A}) \le 1$.
$\therefore$ for every event A, $0 \le P(A) \le 1$.
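11.3.1 Addition theorem of probability:
Theorem: If A and B are two events of sample space S, then the probability of the
union of A and B is given by $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Proof: A and B are two events of sample space S.
From the Venn diagram of A and B, the union $A \cup B$ consists of three disjoint parts, so the probability of the union of the two events is given by
$P(A \cup B) = P(A \cap \bar{B}) + P(A \cap B) + P(\bar{A} \cap B)$
But $P(A \cap \bar{B}) = P(A) - P(A \cap B)$ and $P(\bar{A} \cap B) = P(B) - P(A \cap B)$.
$\therefore P(A \cup B) = P(A) - P(A \cap B) + P(A \cap B) + P(B) - P(A \cap B)$
$\therefore P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Note: The above theorem can be extended to three events A, B and C as
shown below:
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$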
Example 10: A bag contains 4 black and 6 white balls; two balls are
selected at random. Find the probability that the balls are i) of different
colors, ii) of the same color.
Solution: Total number of balls in the bag = 4 black + 6 white = 10 balls.
To select two balls at random, we get
$n(S) = C(10, 2) = 45$.
i) Let A be the event that the two balls are of different colors.
$\therefore n(A) = C(4, 1) \times C(6, 1) = 4 \times 6 = 24$.
$P(A) = \frac{n(A)}{n(S)} = \frac{24}{45} = 0.53$.
ii) Both balls are of the same color.
Let A be the event that both balls are black.
$n(A) = C(4, 2) = 6$
$P(A) = \frac{n(A)}{n(S)} = \frac{6}{45}$
Let B be the event that both balls are white.
$n(B) = C(6, 2) = 15$
$P(B) = \frac{n(B)}{n(S)} = \frac{15}{45}$.
A and B are disjoint events.
$\therefore$ The required probability is
$P(A \cup B) = P(A) + P(B) = \frac{6}{45} + \frac{15}{45} = \frac{21}{45} = 0.467$.
Example 11: From 40 tickets marked from 1 to 40, one ticket is drawn at
random. Find the probability that it is marked with a multiple of 3 or 4.
Solution: From 40 tickets marked 1 to 40, one ticket is drawn at
random, so
$n(S) = C(40, 1) = 40$
Since the ticket is to be marked with a multiple of 3 or 4, we consider two events.
Let A be the event that a multiple of 3 is selected,
i.e. $A = \{3, 6, 9, \dots, 39\}$
$n(A) = C(13, 1) = 13$
$P(A) = \frac{n(A)}{n(S)} = \frac{13}{40}$
Let B be the event that a multiple of 4 is selected,
i.e. $B = \{4, 8, 12, \dots, 40\}$
$n(B) = C(10, 1) = 10$
$P(B) = \frac{n(B)}{n(S)} = \frac{10}{40}$.
Here A and B are not disjoint.
$A \cap B$ is the event that a multiple of both 3 and 4 is selected,
i.e. $A \cap B = \{12, 24, 36\}$
$n(A \cap B) = C(3, 1) = 3$
$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{3}{40}$
$\therefore$ The required probability is
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{13}{40} + \frac{10}{40} - \frac{3}{40} = \frac{20}{40} = 0.5$.
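The addition theorem used in Example 11 can be checked with ordinary set operations. The following Python sketch is only an illustration (the function name `P` is chosen to match the notation of the text): it builds the events as sets of ticket numbers and compares both sides of the theorem.

```python
from fractions import Fraction

tickets = set(range(1, 41))               # sample space S = {1, ..., 40}
A = {t for t in tickets if t % 3 == 0}    # multiples of 3
B = {t for t in tickets if t % 4 == 0}    # multiples of 4

def P(event):
    """Classical probability of an event given as a subset of `tickets`."""
    return Fraction(len(event), len(tickets))

lhs = P(A | B)                  # P(A union B), counted directly
rhs = P(A) + P(B) - P(A & B)    # addition theorem

print(len(A), len(B), len(A & B), lhs, rhs)
```

Counting the union directly and applying the theorem give the same value, because the three tickets in $A \cap B$ would otherwise be counted twice.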
Example 12: Two cards are drawn at random from a well-shuffled pack of
cards. Find the probability that i) both are red cards or both are face cards,
ii) both are kings or both are queens.
Solution: Two cards are drawn at random from a well-shuffled pack of cards,
so the sample space has $n(S) = {}^{52}C_2 = 1326$.
i) Both are red cards or both are face cards.
Event A: both are red cards.
$n(A) = {}^{26}C_2 = 325$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{325}{1326} = 0.245$
Event B: both are face cards.
$n(B) = {}^{12}C_2 = 66$
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{66}{1326} = 0.05$
Event $A \cap B$: both are red face cards (there are 6 red face cards).
$n(A \cap B) = {}^{6}C_2 = 15$
$\therefore P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{15}{1326} = 0.0113$
$\therefore$ The required probability is
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cup B) = \frac{325 + 66 - 15}{1326} = \frac{376}{1326} = 0.2836$
ii) Both are kings or both are queens.
Event A: both are kings.
$n(A) = {}^{4}C_2 = 6$
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{6}{1326}$
Event B: both are queens.
$n(B) = {}^{4}C_2 = 6$
$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{6}{1326}$
Here events A and B are mutually exclusive (i.e. $P(A \cap B) = 0$).
$\therefore$ The required probability is
$P(A \cup B) = P(A) + P(B) = \frac{6}{1326} + \frac{6}{1326} = \frac{12}{1326} = 0.009$
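With only 1326 possible pairs of cards, part i) of Example 12 can also be checked by listing every pair. The sketch below is a hedged illustration; the deck representation and the helper names `is_red`, `is_face` and `P` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]
pairs = list(combinations(deck, 2))  # 52C2 = 1326 unordered pairs

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def is_face(card):
    return card[0] in ("J", "Q", "K")

def P(condition):
    """Probability that a random pair of cards satisfies `condition`."""
    return Fraction(sum(1 for pair in pairs if condition(pair)), len(pairs))

p_red = P(lambda pair: all(is_red(c) for c in pair))
p_face = P(lambda pair: all(is_face(c) for c in pair))
p_red_face = P(lambda pair: all(is_red(c) and is_face(c) for c in pair))
p_union = P(lambda pair: all(is_red(c) for c in pair)
            or all(is_face(c) for c in pair))

print(p_red, p_face, p_red_face, p_union)
```

The directly counted `p_union` equals `p_red + p_face - p_red_face`, as the addition theorem requires.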
Example 13: The probability that a graduate applies for a program development job is 0.45,
that the graduate applies for a networking job is 0.8, and that the graduate applies
for both is 0.35. Find the probability that a graduate applies for at least one of the jobs. If the number
of graduates is 500, how many have not applied for either job?
Solution: Let the probability of applying for the program development job be $P(A) = 0.45$.
Probability of applying for the networking job: $P(B) = 0.8$.
Probability of applying for both jobs: $P(A \cap B) = 0.35$.
The probability of applying for at least one job is $P(A \cup B)$.
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cup B) = 0.45 + 0.8 - 0.35 = 0.9$
There are 500 graduates. First find the probability that a graduate has not applied for
either job:
$P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.9 = 0.1$
Number of graduates who have not applied for a job $= 0.1 \times 500 = 50$.
Check your Progress:
1. A card is drawn at random from a pack of 52 cards. Find the probability
that it is a i) face card or a diamond card, ii)
2. If $P(A) = \frac{3}{8}$, $P(B) = \frac{5}{8}$ and $P(A \cup B) = \frac{7}{8}$, then find i) $P(\overline{A \cup B})$, ii)
$P(A \cap B)$.
3. In a class of 60 students, 50 passed in computers, 40 passed in
mathematics and 35 passed in both. What is the probability that a
student selected at random has i) passed in at least one subject, ii)
failed in both the subjects, iii) passed in only one subject?
4. Two dice are rolled. Find the probability that the sum of the numbers
on the uppermost faces is i) an odd number and a prime number, ii)
divisible by 2 or 3.
5. A box contains 5 white and 7 black marbles. If 2 marbles are drawn
at random, find the probability that both are of the same color.
11.8 LET US SUM UP:
In this chapter we have learned:
• Basic concepts of probability such as random experiment, outcomes,
sample space, events and types of events.
• Probability axioms and their basic properties.
• Addition theorem of probability for disjoint events.
11.9 UNIT END EXERCISES:
1. Two coins are tossed. Find the probability that i) exactly one head
turns up, ii) at least one head turns up, iii) no head turns up.
2. An unbiased die is rolled. Find the probability of getting i) a number less
than 3, ii) an odd number, iii) a prime number.
3. A card is drawn at random from a well-shuffled pack of 52 cards.
Find the probability that it is i) a king, ii) a black card, iii) a heart, iv) a
number from 4 to 9, both inclusive.
4. A bag contains 5 red and 7 blue marbles and a marble is selected at
random. Find the probability that the marble is i) red, ii) blue, iii) either
red or blue.
5. There are 5 boys and 4 girls and a committee of 4 is to be selected at
random. Find the probability that the committee contains i) no boys, ii)
at least one boy, iii) exactly 2 boys, iv) at most 2 boys.
6. Find the probability that a leap year contains 53 Sundays.
7. If two dice are rolled simultaneously, what is the probability of getting
the same number on both dice?
8. A card is drawn at random from a well-shuffled pack of cards. Find the
probability that it is a red card or a king.
9. There are 30 tickets bearing numbers from 1 to 15 in a bag. One
ticket is drawn from the bag at random. Find the probability that the
ticket bears a number which is even or a multiple of 3.
10. In a group of 200 persons, 100 like sweet food items, 120 like salty
food items and 50 like both. A person is selected at random. Find the
probability that the person (i) likes sweet food items but not salty
food items, (ii) likes neither.
11. A bag contains 7 white balls and 5 red balls. One ball is drawn from the
bag and it is replaced after noting its color. In the second draw again
one ball is drawn and its color is noted. Find the probability that
both the balls drawn are of different colors.
12. A bag contains 8 white and 6 red balls. Find the probability of
drawing 2 balls of the same color.
Multiple Choice Questions:
1) The set of all possible outcomes of a random experiment is called
a) Event b) Experiment
c) Sample Space d) Space
2) In probability theory, events which can never occur together are
known as
a) mutually exclusive events b) Not exclusive events
c) mutually exhaustive events d) Complementary event
3) When we throw a die, what is the probability of getting a prime
number?
a) 1/5 b) 1/6 c) 1/2 d) 1/3
4) A card is drawn at random from a well-shuffled pack of cards. What
is the probability that the card drawn is red?
a) 1/2 b) 1/13 c) 2/13 d) 1/4
5) A card is drawn at random from a well-shuffled pack of cards. What
is the probability that the card drawn is a diamond?
a) 1/3 b) 1/13 c) 2/13 d) 1/4
6) A bag contains 10 black and 20 white balls. One ball is drawn at
random. What is the probability that the ball is white?
a) 1 b) 2/3 c) 1/3 d) 4/3
7) The collection of one or more outcomes from an experiment is
called
a) Probability b) Event
c) Random Variable d) Random Experiment
8) Which of the following is not a correct statement about a probability?
a) It must have a value between 0 and 1
b) It can be reported as a decimal or a fraction
c) A value near 0 means that the event is not likely to occur
d) It is the collection of several experiments.
9) A bag contains 50 tokens numbered 1 to 50. A token is drawn at
random. What is the probability that the number on the token is a multiple
of 4?
a) 6/25 b) 6/24 c) 6/23 d) 6/22
10) Any subset of the sample space is called
a) an event b) an element c) superset d) empty set

11.10 LIST OF REFERENCES:
• Schaum's Outline of Theory and Problems of Probability and Statistics
by Murray R. Spiegel.
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K.
Kapoor.
• Basic Statistics by B. L. Agrawal.
12
PROBABILITY II
Unit Structure
12.0 Objectives
12.1 Introduction:
12.2 Condition Probability
12.3 Independent events
12.4 For Independent events multiplication theorem:
11.5 Baye’s formula
12.6 Expected Value
12.7 Let us sum up
12.8 Unit end Exercises
12.9 List of References
12.0 OBJECTIVES:
After going through this unit, you will be able to understand:
• Conditional probability and its examples.
• Independent events and the multiplication theorem of probability.
• Bayes' formula of probability.
• Expected value of a random variable.
12.1 INTRODUCTION:
We learned the basics of probability in the previous chapter. Here we are going
to discuss conditional probability and its properties. Conditional probability is used
to find the probability of dependent and independent events. Conditional probability is the probability of one event
occurring with some relationship to one or more other events. The concept
of “randomness” is fundamental to the field of statistics. As mentioned in
the probability theory notes, the science of statistics is concerned with
assessing the uncertainty of inferences drawn from random samples of data.
Now that we have defined some basic concepts related to set operations and
probability theory, we can more formally discuss what it means for things
to be random. Here we will discuss the discrete random variable and its
expected value.
12.2 CONDITIONAL PROBABILITY:
In many cases an event A has occurred and we are required to find
the probability of occurrence of an event B which depends on event A. Problems
of this kind are called conditional probability problems.
Definition: Let A and B be two events. The conditional probability of event
B, given that event A has occurred, is defined by the relation
$P(B|A) = \frac{P(B \cap A)}{P(A)}$, provided $P(A) > 0$.
When $P(A) = 0$, $P(B|A)$ is not defined, because then $P(B \cap A) = 0$ and
$P(B|A) = \frac{0}{0}$, which is an indeterminate quantity.
Similarly, let A and B be two events. The conditional probability of event
A, given that event B has occurred, is defined by the relation
$P(A|B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
Example 1: A pair of fair dice is rolled. What is the probability that the
sum of the uppermost faces is 6, given that both of the numbers are odd?
Solution: A pair of fair dice is rolled, therefore $n(S) = 36$.
Let A be the event that both numbers are odd, i.e. A = {(1,1), (1,3), (1,5), (3,1), (3,3),
(3,5), (5,1), (5,3), (5,5)}.
$P(A) = \frac{n(A)}{n(S)} = \frac{9}{36}$
Let B be the event that the sum is 6, i.e. B = {(1,5), (2,4), (3,3), (4,2), (5,1)}.
$P(B) = \frac{n(B)}{n(S)} = \frac{5}{36}$
$A \cap B$ = {(1,5), (3,3), (5,1)}
$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{3}{36}$
By the definition of conditional probability,
$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{3/36}{9/36} = \frac{1}{3}$.
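When all outcomes are equally likely, the definition $P(B|A) = P(A \cap B)/P(A)$ reduces to counting inside the reduced sample space A. The short Python sketch below illustrates this for the dice example; the function name `conditional` is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import product

rolls = set(product(range(1, 7), repeat=2))              # n(S) = 36
A = {r for r in rolls if r[0] % 2 == 1 and r[1] % 2 == 1}  # both numbers odd
B = {r for r in rolls if sum(r) == 6}                      # sum equals 6

def conditional(event, given):
    """P(event | given) for equally likely outcomes; needs P(given) > 0."""
    if not given:
        raise ValueError("conditional probability undefined when P(given) = 0")
    return Fraction(len(event & given), len(given))

p_b_given_a = conditional(B, A)
p_a_given_b = conditional(A, B)

print(sorted(A & B), p_b_given_a, p_a_given_b)
```

Note that $P(B|A)$ and $P(A|B)$ differ here: the same intersection is divided by $n(A) = 9$ in one case and by $n(B) = 5$ in the other.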
Example 2: If A and B are two events of sample space S, such that $P(A) = 0.85$,
$P(B) = 0.7$ and $P(A \cup B) = 0.95$, find i) $P(A \cap B)$, ii) $P(A|B)$, iii)
$P(B|A)$.
Solution: Given that $P(A) = 0.85$, $P(B) = 0.7$ and $P(A \cup B) = 0.95$.
i) By the addition theorem,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$0.95 = 0.85 + 0.7 - P(A \cap B)$
$P(A \cap B) = 1.55 - 0.95 = 0.6$.
ii) By the definition of conditional probability,
$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.6}{0.7} = 0.857$.
iii) $P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0.6}{0.85} = 0.706$
Example 3: An urn A contains 4 red and 5 green balls. Another urn B
contains 5 red and 6 green balls. A ball is transferred from urn A to
urn B, then a ball is drawn from urn B. Find the probability that it is red.
Solution: There are two cases of transferring a ball from urn A to B.
Case I: A red ball is transferred from urn A to B.
The probability of drawing a red ball from urn A is $P(R_A) = \frac{4}{9}$.
After the transfer of a red ball, urn B contains 6 red and 6 green balls.
The probability of then drawing a red ball from urn B is $P(R_B|R_A) \times P(R_A) = \frac{6}{12} \times \frac{4}{9} = \frac{24}{108}$.
Case II: A green ball is transferred from urn A to B.
The probability of drawing a green ball from urn A is $P(G_A) = \frac{5}{9}$.
After the transfer of a green ball, urn B contains 5 red and 7 green balls.
The probability of then drawing a red ball from urn B is $P(R_B|G_A) \times P(G_A) = \frac{5}{12} \times \frac{5}{9} = \frac{25}{108}$.
Therefore the required probability $= \frac{24}{108} + \frac{25}{108} = \frac{49}{108} = 0.4537$.
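The two cases of the urn problem are an instance of splitting an event over mutually exclusive cases and adding the results. A minimal Python sketch of that calculation follows; the variable names are illustrative and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Step 1: colour of the ball moved from urn A (4 red, 5 green).
p_red_moved = Fraction(4, 9)
p_green_moved = Fraction(5, 9)

# Step 2: chance of red from urn B (originally 5 red, 6 green)
# after it has received the transferred ball.
p_red_given_red_moved = Fraction(5 + 1, 12)   # urn B now has 6 red, 6 green
p_red_given_green_moved = Fraction(5, 12)     # urn B now has 5 red, 7 green

# Add the two mutually exclusive cases.
p_red = (p_red_given_red_moved * p_red_moved
         + p_red_given_green_moved * p_green_moved)

print(p_red, round(float(p_red), 4))
```

The two transfer probabilities add up to 1, which confirms that the two cases cover every possibility.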
Check your progress:
1. A family has two children. What is the probability that both are boys,
given that at least one is a boy?
2. Two dice are rolled. What is the conditional probability that the sum of
the numbers on the dice exceeds 8, given that the first shows 4?
3. Consider a medical test that screens for COVID-19, which affects 10 people in
1000. Suppose that the false positive rate is 4% and the false negative
rate is 1%. Then 99% of the time a person who has the condition tests
positive for it, and 96% of the time a person who does not have the
condition tests negative for it. a) What is the probability that a
randomly chosen person who tests positive for COVID-19
actually has the disease? b) What is the probability that a randomly
chosen person who tests negative for COVID-19 does not indeed
have the disease?
4. Out of 200 articles, 150 are good and 50 are defective. Two articles are
selected at random. Find the probability that i) both are
good articles, ii) the first is a good article and the second is defective.
5. An unbiased coin is tossed three times and the outcomes of successive
tosses are different. Find the probability that the last toss has resulted
in a tail.
12.3 INDEPENDENT EVENTS:
Independent events: Two events are said to be independent if the
occurrence of one of them does not affect, and is not affected by, the
occurrence or non-occurrence of the other,
i.e. $P(B|A) = P(B)$ or $P(A|B) = P(A)$.
Multiplication theorem of probability: If A and B are any two events
associated with an experiment, then the probability of simultaneous
occurrence of events A and B is given by
$P(A \cap B) = P(A)\,P(B|A)$
where $P(B|A)$ denotes the conditional probability of event B given that
event A has already occurred.
OR
$P(A \cap B) = P(B)\,P(A|B)$
where $P(A|B)$ denotes the conditional probability of event A given that
event B has already occurred.
12.4 For Independent events multiplication theorem:
If A and B are independent events, then the multiplication theorem can be
written as
$P(A \cap B) = P(A)\,P(B)$
Proof: The multiplication theorem states that
if A and B are any two events associated with an experiment, then the
probability of simultaneous occurrence of events A and B is given by
$P(A \cap B) = P(A)\,P(B|A)$
By the definition of independent events, $P(B|A) = P(B)$ or $P(A|B) = P(A)$.
$\therefore P(A \cap B) = P(A)\,P(B)$.
Note:
1) If A and B are independent events, then $\bar{A}$ and $\bar{B}$ are independent events.
2) If A and B are independent events, then $\bar{A}$ and B are independent events.
3) If A and B are independent events, then A and $\bar{B}$ are independent events.
Example 4: From a well-shuffled pack of 52 cards, two cards are drawn at
random one after the other without replacement. Find the probability that i)
both the cards are of the same color, ii) the first card is a king and the other is a queen.
Solution: i) Here we have to select a black card first and a black card second, or a
red card first and a red card second. Let $A_1$ be the event that the first card is black and $A_2|A_1$
the event that the second card is black given that the first was black; let $B_1$ be the event that the first card is red and $B_2|B_1$
the event that the second card is red given that the first was red.
By the multiplication theorem,
the required probability $= P(A_1) \times P(A_2|A_1) + P(B_1) \times P(B_2|B_1)$
$= \frac{26}{52} \times \frac{25}{51} + \frac{26}{52} \times \frac{25}{51} = \frac{25}{102} + \frac{25}{102} = \frac{50}{102} = \frac{25}{51} = 0.49$
ii) Here the first card must be a king and the second card a queen.
Let A be the event that the first card is a king and $B|A$ the event that the second card is a queen given that the first was a king.
By the multiplication theorem,
$P(A \cap B) = P(A) \times P(B|A)$
$= \frac{4}{52} \times \frac{4}{51} = \frac{4}{663} = 0.006$.
Example 5: Manish and Mandar are trying to make software for a company.
The probability that Manish succeeds is $\frac{1}{5}$ and that Mandar succeeds is $\frac{3}{5}$;
both are working independently. Find the probability that i) both succeed,
ii) at least one succeeds, iii) neither of them succeeds, iv) only
Mandar succeeds and Manish does not.
Solution: Let the probability that Manish succeeds be $P(A) = \frac{1}{5} = 0.2$.
Therefore the probability that Manish does not succeed is $P(\bar{A}) = 1 - P(A) = 1 - 0.2 = 0.8$.
The probability that Mandar succeeds is $P(B) = \frac{3}{5} = 0.6$.
Therefore the probability that Mandar does not succeed is $P(\bar{B}) = 1 - P(B) = 1 - 0.6 = 0.4$.
i) Both succeed, i.e. $P(A \cap B)$.
$P(A \cap B) = P(A) \times P(B) = 0.2 \times 0.6 = 0.12$, since A and B are
independent events.
ii) At least one succeeds, i.e. $P(A \cup B)$.
By the addition theorem,
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.2 + 0.6 - 0.12 = 0.68$.
iii) Neither of them succeeds, i.e. $P(\overline{A \cup B})$ or $P(\bar{A} \cap \bar{B})$
[by De Morgan's law both are the same].
$P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.68 = 0.32$.
Or
If A and B are independent, then $\bar{A}$ and $\bar{B}$ are also independent.
$P(\bar{A} \cap \bar{B}) = P(\bar{A}) \times P(\bar{B}) = 0.8 \times 0.4 = 0.32$.
iv) Only Mandar succeeds and Manish does not, i.e. $P(\bar{A} \cap B)$.
$P(\bar{A} \cap B) = P(\bar{A}) \times P(B) = 0.8 \times 0.6 = 0.48$
Example 6: Two students A and B each attempt 50 coding problems, working
independently. The number of correct codes by student A is 35 and by student B
is 40. Find the probability that only one of them codes a problem correctly.
Solution: The probability that student A codes correctly is $P(A) = \frac{35}{50} = 0.7$.
The probability that student A codes wrongly is $P(\bar{A}) = 1 - 0.7 = 0.3$.
The probability that student B codes correctly is $P(B) = \frac{40}{50} = 0.8$.
The probability that student B codes wrongly is $P(\bar{B}) = 1 - 0.8 = 0.2$.
We need the probability that only one of them codes correctly,
i.e. A is correct and B is not, or B is correct and A is not.
$P(A \cap \bar{B}) + P(B \cap \bar{A}) = P(A) \times P(\bar{B}) + P(B) \times P(\bar{A})$
$= 0.7 \times 0.2 + 0.8 \times 0.3$
$= 0.14 + 0.24 = 0.38$
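For two independent events, the quantities asked for in Examples 5 and 6 all follow from products of $P(A)$, $P(B)$ and their complements. The helper functions below are a hedged sketch (their names are not from the text) and are valid only under the stated assumption that A and B are independent.

```python
from fractions import Fraction

def both(pa, pb):
    """P(A and B) for independent A and B."""
    return pa * pb

def at_least_one(pa, pb):
    """P(A or B) = 1 - P(neither), using independence of the complements."""
    return 1 - (1 - pa) * (1 - pb)

def exactly_one(pa, pb):
    """P(only A) + P(only B) for independent A and B."""
    return pa * (1 - pb) + pb * (1 - pa)

# Example 5: Manish succeeds with probability 1/5, Mandar with 3/5.
manish, mandar = Fraction(1, 5), Fraction(3, 5)
p_both = both(manish, mandar)
p_any = at_least_one(manish, mandar)
p_none = 1 - p_any

# Example 6: student A is correct 35 times out of 50, student B 40 out of 50.
p_only_one = exactly_one(Fraction(35, 50), Fraction(40, 50))

print(float(p_both), float(p_any), float(p_none), float(p_only_one))
```

`at_least_one` uses the complement rule instead of the addition theorem; both routes give the same result for independent events.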
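Example 7: Given that $P(A) = \frac{3}{7}$ and $P(B) = \frac{2}{7}$, if A and B are independent
events, then find i) $P(A \cap B)$, ii) $P(\bar{B})$, iii) $P(A \cup B)$, iv) $P(\bar{A} \cap \bar{B})$.
Solution: Given that $P(A) = \frac{3}{7}$ and $P(B) = \frac{2}{7}$.
i) A and B are independent events,
$\therefore P(A \cap B) = P(A) \times P(B) = \frac{3}{7} \times \frac{2}{7} = \frac{6}{49} = 0.122$
ii) $P(\bar{B}) = 1 - P(B) = 1 - \frac{2}{7} = \frac{5}{7} = 0.714$.
iii) By the addition theorem,
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{7} + \frac{2}{7} - \frac{6}{49} = \frac{29}{49} = 0.592$.
iv) $P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.592 = 0.408$.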
Example 8: The probability that a student A can solve a problem is $\frac{2}{3}$, that
B can solve it is $\frac{1}{2}$ and that C can solve it is $\frac{3}{4}$. If all of them try it independently,
what is the probability that the problem is solved?
Solution: Given that $P(A) = \frac{2}{3}$, $P(B) = \frac{1}{2}$, $P(C) = \frac{3}{4}$.
$\therefore$ The probabilities that the problem is not solved by each of them are
$P(A') = \frac{1}{3}$, $P(B') = \frac{1}{2}$, $P(C') = \frac{1}{4}$.
We use the fact that the complement of the event "the problem is solved" is the event
"the problem is not solved",
i.e. P(problem is solved) = 1 – P(problem is not solved)
$P(A \cup B \cup C) = 1 - P(A' \cap B' \cap C')$
$= 1 - P(A')\,P(B')\,P(C')$ (since A, B, C are independent, their complements
are also independent)
$= 1 - \left(\frac{1}{3} \times \frac{1}{2} \times \frac{1}{4}\right)$
$= 1 - \frac{1}{24} = \frac{23}{24}$
Check your progress:
1. If $P(A) = \frac{2}{5}$, $P(B) = \frac{1}{3}$ and if A and B are independent events, find
(i) $P(A \cap B)$, (ii) $P(A \cup B)$, (iii) $P(\bar{A} \cap \bar{B})$.
2. The probabilities that A, B and C can solve the same problem
independently are $\frac{1}{3}$, $\frac{2}{5}$ and $\frac{3}{4}$ respectively. Find the probability that i)
the problem remains unsolved, ii) the problem is solved, iii) only one
of them solves the problem.
3. The probability that Ram can shoot a target is $\frac{2}{5}$ and the probability that
Laxman can shoot the same target is $\frac{4}{5}$. Ram and Laxman shoot independently.
Find the probability that (i) the target is not shot at all, (ii) the target
is shot by at least one of them, (iii) the target is shot by only one of them,
(iv) the target is shot by both.
4. Two cards are drawn one after another from a well-shuffled pack of
52 cards. Find the probability that both the cards are diamonds if the
cards are drawn i) with replacement, ii) without replacement.
5. A bag contains 6 white balls and 4 black balls of the same shape. One ball
is removed at random, its color is noted and it is replaced in the bag.
Then a second ball is drawn. Find the probability that i) both are
black, ii) both are white, iii) the first is white and the second is black.
12.5 BAYES' FORMULA:
Thomas Bayes's theory, published in 1763, describes how to revise the prior
probabilities of mutually exclusive and exhaustive events whenever new
information is received. These revised probabilities are called posterior
probabilities. The generalized formula of Bayes' theorem is given below:
Suppose $A_1, A_2, \dots, A_k$ are k mutually exclusive events defined in $\mathcal{B}$ (a
collection of events), each being a subset of the sample space S, such that
$\bigcup_{i=1}^{k} A_i = S$ and $P(A_i) > 0$ for all $i = 1, 2, \dots, k$.
For some arbitrary event B, which is associated with the $A_i$ and such that $P(B) > 0$,
we can find the probabilities $P(B|A_1), P(B|A_2), \dots, P(B|A_k)$.
In Bayes' approach we want to find the posterior probability of an event $A_i$
given that B has occurred, i.e. $P(A_i|B)$.
By the definition of conditional probability, $P(A_i|B) = \frac{P(A_i \cap B)}{P(B)}$.
Since $B \subseteq S$, we have $B \cap S = B$, so
$B = B \cap (A_1 \cup A_2 \cup \dots \cup A_k)$, because $\bigcup_{i=1}^{k} A_i = S$,
i.e. $B = (B \cap A_1) \cup (B \cap A_2) \cup \dots \cup (B \cap A_k)$, where the sets $B \cap A_i$ are disjoint because the $A_i$ are disjoint.
$\therefore P(B) = \sum_{i=1}^{k} P(B \cap A_i)$
$P(B \cap A_i) = P(A_i|B) \times P(B) \Rightarrow P(A_i|B) = \frac{P(B \cap A_i)}{P(B)}$
But $P(B \cap A_i) = P(B|A_i)\,P(A_i)$ and $P(B) = \sum_{i=1}^{k} P(B|A_i)\,P(A_i)$.
Therefore we get
$P(A_i|B) = \frac{P(B|A_i)\,P(A_i)}{\sum_{j=1}^{k} P(B|A_j)\,P(A_j)}$. This is known as Bayes' formula.
Example 9: There are three bags: the first bag contains 2 white, 2 black and 2 red
balls; the second bag contains 3 white, 2 black and 1 red ball; and the third bag contains 1 white, 2 black and
3 red balls. Two balls are drawn from a bag chosen at random. These are
found to be one white and one black. Find the probability that the balls so
drawn came from the third bag.
Solution: Let $B_1$ be the first bag, $B_2$ the second bag and $B_3$ the third
bag.
Let A denote the event that the two balls drawn are one white and one black.
First a bag is selected from the three bags, each being equally likely,
i.e. $P(B_1) = P(B_2) = P(B_3) = \frac{1}{3}$.
Probability of a white and a black ball from the first bag:
$P(A|B_1) = \frac{C(2,1) \times C(2,1)}{C(6,2)} = \frac{4}{15}$.
Probability of a white and a black ball from the second bag:
$P(A|B_2) = \frac{C(3,1) \times C(2,1)}{C(6,2)} = \frac{6}{15}$.
Probability of a white and a black ball from the third bag:
$P(A|B_3) = \frac{C(1,1) \times C(2,1)}{C(6,2)} = \frac{2}{15}$
By Bayes' theorem,
$P(B_3|A) = \frac{P(B_3)\,P(A|B_3)}{P(B_1)\,P(A|B_1) + P(B_2)\,P(A|B_2) + P(B_3)\,P(A|B_3)}$
$= \frac{\frac{1}{3} \times \frac{2}{15}}{\frac{1}{3} \times \frac{4}{15} + \frac{1}{3} \times \frac{6}{15} + \frac{1}{3} \times \frac{2}{15}} = \frac{2/45}{12/45} = \frac{1}{6}$.
Example 10: A company has two factories $F_1$ and $F_2$ that produce the same
chip, producing 55% and 45% of the total production respectively. The probability
of a defective chip at $F_1$ and $F_2$ is 0.07 and 0.03 respectively. Suppose
someone shows us a defective chip. What is the probability that this chip
comes from factory $F_1$?
Solution: Let $F_i$ denote the event that the chip is produced by factory i, and let A
denote the event that the chip is defective.
Given that $P(F_1) = 0.55$, $P(F_2) = 0.45$, $P(A|F_1) = 0.07$, $P(A|F_2) = 0.03$.
By Bayes' formula,
$P(F_1|A) = \frac{P(F_1)\,P(A|F_1)}{P(F_1)\,P(A|F_1) + P(F_2)\,P(A|F_2)} = \frac{0.55 \times 0.07}{0.55 \times 0.07 + 0.45 \times 0.03} = \frac{0.0385}{0.052} = 0.74$.
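Bayes' formula has the same shape in every problem: multiply each prior by its likelihood and divide by the sum of those products. The function `posterior` below is a minimal illustrative sketch (its name and interface are assumptions, not part of the text), applied to the data of Examples 9 and 10.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Posterior probabilities P(A_i | B) from P(A_i) and P(B | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B and A_i)
    total = sum(joint)                                    # P(B)
    return [j / total for j in joint]

# Example 9: three equally likely bags; likelihood of one white and one black.
bags = posterior(
    [Fraction(1, 3)] * 3,
    [Fraction(4, 15), Fraction(6, 15), Fraction(2, 15)],
)

# Example 10: factories F1 and F2 with their defect rates.
factories = posterior(
    [Fraction(55, 100), Fraction(45, 100)],
    [Fraction(7, 100), Fraction(3, 100)],
)

print(bags[2], round(float(factories[0]), 2))
```

The posterior probabilities returned for one problem always add up to 1, since exactly one of the events $A_i$ must have occurred.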
12.5 EXPECTED VALUE:
In order to understand the behavior of a random variable, we may want to
look at its average value. For probability we need to find Average is called munotes.in

Page 233

233 Probability II expected value of random variable X. for that first we have to learn some
basic concept of random variable.
Random Variable: A probability measurable real valued functions, say X,
defined over the sample space of a random experiment with respective
probability is called a random variable.
Types of random variables: There are two type of random variable.
Discrete Random Variable: A random var iable is said to be discrete
random variable if it takes finite or countably infinite number of values.
Thus discrete random variable takes only isolated values.
Continuous Random variable: A random variable is continuous if its set
of possible values cons ists of an entire interval on the number line.
Probability Distribution of a random variable: All possible values of the
random variable, along with its corresponding probabilities, so
that σܲ௜=1௡
௜ୀଵ, is called a probability distribution of a ra ndom variable.
The probability function always follow the following properties:
i) ܲ(ݔ௜)൒0 for all value of ݅.
ii) σܲ௜=1௡
௜ୀଵ.
The set of values ݔ௜ with their probability ܲ௜ constitute a discrete probability
distribution of the discrete variable X.
For e.g. Three coins are tossed, the probability distribution of the discrete
variable X is getting head. X= ݔ௜ 0 1 2 3 ܲ(ݔ௜) 18 38 38 18
Expectation of a random variable (Mean):
All the probability information of a random variable is contained in its probability mass function, but it is often useful to consider various numerical characteristics of the random variable. One such number is the expectation of the random variable.
If a random variable X takes values $x_1, x_2, \ldots, x_n$ with corresponding probabilities $p_1, p_2, \ldots, p_n$ respectively, then the expectation of X is
$$E(X) = \sum_{i=1}^{n} p_i x_i, \quad \text{where } \sum_{i=1}^{n} p_i = 1.$$
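The definition of expectation translates directly into a few lines of code. This is a minimal Python sketch, not a prescribed method; the function name `expected_value` and the validation tolerance are assumptions made for illustration. It is applied to the number of heads in three coin tosses.

```python
def expected_value(values, probs):
    """Return E(X) = sum of p_i * x_i for a discrete distribution."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    # A valid distribution has non-negative probabilities that sum to 1.
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to 1")
    return sum(x * p for x, p in zip(values, probs))


# Number of heads when three fair coins are tossed.
heads = [0, 1, 2, 3]
probs = [1 / 8, 3 / 8, 3 / 8, 1 / 8]
print(expected_value(heads, probs))  # 1.5
```

The check on the probabilities mirrors the two properties of a probability function stated earlier.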
Example 11: Based on past experience at Vijay Sales, the probabilities for the number of laptops sold per day are given below:

No. of laptops   0      1      2      3      4      5
Probability      0.05   0.15   0.25   0.2    0.15   0.2

Find the expected number of laptops sold per day.
Solution: Let X be the random variable that denotes the number of laptops sold per day.
To calculate the expected value, $E(X) = \sum_{i=1}^{n} p_i x_i$:
$E(X) = (0 \times 0.05) + (1 \times 0.15) + (2 \times 0.25) + (3 \times 0.2) + (4 \times 0.15) + (5 \times 0.2)$
$E(X) = 2.85 \approx 3$
Therefore the expected number of laptops sold per day is about 3.
Example 12: A random variable X has the following probability mass function:

X = $x_i$     -1     0      1      2      3
$P(x_i)$      k      0.2    0.3    2k     2k

Find the value of k and the expected value.
Solution: Since X has a probability mass function, $\sum_{i=1}^{n} P_i = 1$.
$\Rightarrow k + 0.2 + 0.3 + 2k + 2k = 1$
$\Rightarrow 5k = 0.5$
$\Rightarrow k = 0.1$
Therefore the probability distribution of the random variable X is:

X = $x_i$     -1     0      1      2      3
$P(x_i)$      0.1    0.2    0.3    0.2    0.2

To calculate the expected value, $E(X) = \sum_{i=1}^{n} p_i x_i$:
$E(X) = (-1 \times 0.1) + (0 \times 0.2) + (1 \times 0.3) + (2 \times 0.2) + (3 \times 0.2) = 1.2$.
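The normalisation step of Example 12 can also be sketched in Python. The encoding below, where each probability is written as (coefficient of k) times k plus a constant, is an assumption made for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Each probability in Example 12 has the form coeff * k + constant.
values = [-1, 0, 1, 2, 3]
k_coeffs = [1, 0, 0, 2, 2]
constants = [Fraction(0), Fraction(2, 10), Fraction(3, 10), Fraction(0), Fraction(0)]

# The probabilities must sum to 1: (sum of coeffs) * k + (sum of constants) = 1.
k = (1 - sum(constants)) / sum(k_coeffs)

# Substitute k back to get the full distribution, then take the mean.
probs = [c * k + b for c, b in zip(k_coeffs, constants)]
mean = sum(x * p for x, p in zip(values, probs))

print(k, mean)  # 1/10 6/5
```

Here 1/10 and 6/5 are the exact forms of k = 0.1 and E(X) = 1.2.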
Example 13: A box contains 5 white and 7 black balls. A person draws 3 balls at random. He gets Rs. 50 for every white ball and loses Rs. 10 for every black ball. Find his expectation.
Solution: Total number of balls in the box = 5 white + 7 black = 12 balls.
The number of ways to select 3 balls at random is $n(S) = C(12,3) = \frac{12 \times 11 \times 10}{3 \times 2 \times 1} = 220$.
Let A be the number of white balls drawn. A takes the values 0, 1, 2 and 3.
Case I: no white ball, i.e. A = 0,
$P(A=0) = \frac{C(7,3)}{220} = \frac{35}{220}$
Case II: one white ball, i.e. A = 1,
$P(A=1) = \frac{C(5,1) \times C(7,2)}{220} = \frac{105}{220}$
Case III: two white balls, i.e. A = 2,
$P(A=2) = \frac{C(5,2) \times C(7,1)}{220} = \frac{70}{220}$
Case IV: three white balls, i.e. A = 3,
$P(A=3) = \frac{C(5,3)}{220} = \frac{10}{220}$
Now let X be the amount he gets from the game. With A white balls and 3 - A black balls, X = 50A - 10(3 - A).
Therefore the probability distribution of X is as follows:

X = $x_i$     -30       30        90       150
$P(x_i)$      35/220    105/220   70/220   10/220

To calculate the expected value, $E(X) = \sum_{i=1}^{n} p_i x_i$:
$E(X) = \left(-30 \times \frac{35}{220}\right) + \left(30 \times \frac{105}{220}\right) + \left(90 \times \frac{70}{220}\right) + \left(150 \times \frac{10}{220}\right) = \frac{9900}{220} = $ Rs. 45.
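The four cases of Example 13 follow one pattern, so they can be generated in a loop. The following Python sketch is one way to check the result; the variable names are illustrative, and `math.comb` (Python 3.8 or later) plays the role of C(n, r).

```python
from fractions import Fraction
from math import comb

white, black, draws = 5, 7, 3
total_ways = comb(white + black, draws)  # C(12, 3) = 220

expected = Fraction(0)
for w in range(draws + 1):
    # Probability of drawing exactly w white and (draws - w) black balls.
    prob = Fraction(comb(white, w) * comb(black, draws - w), total_ways)
    # Rs. 50 gained per white ball, Rs. 10 lost per black ball.
    amount = 50 * w - 10 * (draws - w)
    expected += prob * amount

print(expected)  # 45
```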
12.6 LET US SUM UP:
In this chapter we have learned:
• Conditional probability for dependent events.
• Independent events.
• The multiplication theorem for independent events.
• Bayes' formula and its application.
• Expected value for a discrete probability distribution.
12.7 UNIT END EXERCISES:
1. The probability of A winning a race is 1/3 and that of B winning the race is 3/5. Find the probability that (a) either of the two wins the race, (b) no one wins the race.
2. Three machines A, B and C manufacture respectively 0.3, 0.5 and 0.2 of the total production. The percentages of defective items produced by A, B and C are 4, 3 and 2 percent respectively. For an item chosen at random, what is the probability that it is defective?
3. An urn A contains 3 white and 5 black balls. Another urn B contains 5 white and 7 black balls. A ball is transferred from urn A to urn B, then a ball is drawn from urn B. Find the probability that it is white.
4. A husband and wife appear in an interview for two vacancies in the same post. The probability of the husband's selection is 1/7 and that of the wife's selection is 1/5. What is the probability that (a) both of them will be selected, (b) only one of them will be selected, (c) none of them will be selected?
5. A problem in statistics is given to 3 students A, B and C whose chances of solving it are 1/2, 3/4 and 1/4 respectively. What is the probability that the problem will be solved?
6. A bag contains 8 white and 6 red balls. Find the probability of drawing 2 balls of the same color.
7. Find the probability of drawing an ace or a spade or both from a deck of cards.
8. A can hit a target 3 times in 5 shots, B 2 times in 5 shots and C 3 times in 4 shots. They fire a volley. What is the probability that (a) 2 shots hit, (b) at least 2 shots hit?
9. A purse contains 2 silver and 4 copper coins, and a second purse contains 4 silver and 4 copper coins. If a coin is selected at random from one of the two purses, what is the probability that it is a silver coin?
10. The contents of three urns are: 1 white, 2 red, 3 green balls; 2 white, 1 red, 1 green balls; and 4 white, 5 red, 3 green balls. Two balls are drawn from an urn chosen at random. These are found to be 1 white and 1 green. Find the probability that the balls so drawn come from the second urn.
11. Three machines A, B and C produce identical items. Of their respective outputs, 2%, 4% and 5% of items are faulty. On a certain day A has produced 30% of the total output, B has produced 25% and C the remainder. An item selected at random is found to be faulty. What are the chances that it was produced by the machine with the highest output?
12. A person speaks the truth 3 times out of 7. When a die is thrown, he says that the result is a 1. What is the probability that it is actually a 1?
13. There are three radio stations A, B and C which can be received in a city of 1000 families. The following information is available on the basis of a survey:
(a) 1200 families listen to radio station A.
(b) 1100 families listen to radio station B.
(c) 800 families listen to radio station C.
(d) 865 families listen to radio stations A and B.
(e) 450 families listen to radio stations A and C.
(f) 400 families listen to radio stations B and C.
(g) 100 families listen to radio stations A, B and C.
Find the probability that a family selected at random listens to at least one radio station.
14. The probability distribution of a random variable X is as follows.

X        1     3     5     7     9
P(x)     k     2k    3k    3k    k

Find the value of (i) k, (ii) E(X).
15. A player tosses 3 coins. He wins Rs. 200 if all 3 coins show tails, Rs. 100 if 2 coins show tails, Rs. 50 if one tail appears, and loses Rs. 40 if no tail appears. Find his mathematical expectation.
16. The probability distribution of the daily demand for cell phones in a mobile gallery is given below. Find the expected (mean) demand.

Demand        5      10     15     20
Probability   0.4    0.22   0.28   0.10

17. If $P(A) = \frac{4}{15}$, $P(B) = \frac{7}{15}$ and A and B are independent events, find (i) $P(A \cap B)$, (ii) $P(A \cup B)$, (iii) $P(\bar{A} \cap \bar{B})$.
18. If $P(A) = \frac{5}{9}$, $P(\bar{B}) = \frac{2}{9}$ and A and B are independent events, find (i) $P(A \cap B)$, (ii) $P(A \cup B)$, (iii) $P(\bar{A} \cap \bar{B})$.
19. If $P(A) = 0.65$, $P(B) = 0.75$ and $P(A \cap B) = 0.45$, where A and B are events of a sample space S, find (i) $P(A|B)$, (ii) $P(A \cup B)$, (iii) $P(\bar{A} \cap \bar{B})$.
20. A box contains 5 red and 3 black balls. 3 balls are drawn at random from the box. Find the expected number of red balls drawn.
21. Two fair dice are rolled. X denotes the sum of the numbers appearing on the uppermost faces of the dice. Find the expected value.
Multiple Choice Questions:
1) __________ variable takes only isolated values.
a) Continuous random b) Discrete random
c) Possible random d) Mean random
2) Variance of a random variable X is
a) $V(X) = E(X^2) + [E(X)]^2$ b) $V(X) = E(X^2) - E(X)$
c) $V(X) = [E(X)]^2 - E(X^2)$ d) $V(X) = E(X^2) - [E(X)]^2$
3) The expected value for the following probability distribution of a random variable X is

X        2      4      6
P(X)     0.35   0.4    0.25

a) 3.4 b) 2.8 c) 3.2 d) 3.8
4) If for a random variable X, $E(X) = 1.5$ and $E(X^2) = 9.25$, then V(X) is
a) 6.25 b) 6 c) 7 d) 8
5) If for a random variable X, $V(X) = 3.5$ and $E(X^2) = 19.5$, then E(X) is
a) 4.5 b) 4 c) 4.79 d) 4.2
6) The probability mass function satisfies the condition
a) $0 \leq P(x_i) \leq 1$, for each i b) $0 < P(x_i) \leq 1$, for each i
c) $0 \leq P(x_i) < 1$, for each i d) $0 < P(x_i) = 1$, for each i
7) If A and B are independent events, then $P(A \cap B)$ is
a) $P(A) \times P(B)$ b) $P(A) + P(B)$
c) $P(A') \times P(B)$ d) $P(A) \times P(B')$
8) If A and B are independent events, P(A) = 0.3 and P(B) = 0.5, then $P(A \cap B)$ is
a) 0.8 b) 0.2 c) 0.15 d) 1.5
9) If P(A) = 1/4, P(B) = 4/5 and the events A, B are independent, then $P(A \cap B)$ is
a) 1/5 b) 1/20 c) 1/2 d) 1/16
10) If A and B are any two events associated with an experiment, then the probability of simultaneous occurrence of events A and B is given by
a) P(A) + P(B) b) P(A) - P(B) c) $P(A \cup B)$ d) $P(A \cap B)$
12.8 LIST OF REFERENCES:
• Schaum's Outline of Theory and Problems of Probability and Statistics by Murray R. Spiegel.
• Fundamentals of Mathematical Statistics by S.C. Gupta and V.K. Kapoor.
• Basic Statistics by B. L. Agrawal.
13
DECISION THEORY
Unit Structure
13.0 Objectives:
13.1 Introduction
13.2 Components of Decision Theory
13.3 Types of Decision Making Criteria
13.3 Decision Making Under Uncertainty (Non-Probability)
13.4 Let us sum up:
13.5 Unit end Exercises:
13.6 List of References:
13.0 OBJECTIVES:
After going through this unit, you will be able to understand:
• The terms involved in decision making.
• The different environments faced by a decision maker.
• How to make decisions without probabilities using different criteria.
13.1 INTRODUCTION:
Every individual has to make decisions regarding everyday activities, such as:
i) What are we going to have for breakfast?
ii) What are we going to wear today?
iii) Which route will we take to reach the office?
iv) Which movie are we going to watch this weekend?
v) Which mobile model should we purchase?
Many such questions come to mind, and we make a decision on each of them. While taking these decisions we have different options and different conditions. Decisions of a routine nature do not involve high risks and are consequently trivial in nature. When business executives make decisions, their decisions affect other people, such as consumers of the product, shareholders of the business unit, and employees of the organization.
Thus, a decision is a choice whereby a person comes to a conclusion about given circumstances or a situation. It represents a course of behavior or action about what one is expected to do or not to do. Decision-making may, therefore, be defined as the selection of one course of action from two or more alternative courses of action. Thus, it involves a choice-making activity, and the choice determines our action or inaction.
Some characteristic problems in decision theory:
1. Production problem: A factory produces several products, and we have to decide how much of each product to manufacture so that, for example, the profit is maximal, or the profit is maximal with minimal use of energy (or labor) during the production process.
2. Investment problem: To choose a portfolio with maximal yield. Constraints: financial. Other points to consider: risk factors, duration of the investment, etc.
3. Work scheduling: For example, a supermarket employs a certain number of workers. On each day, depending on the trade, a certain number of workers have to work. We have to make a weekly schedule of the available workers such that the total weekly wage of the workers is minimal (the wage for Saturday is higher than for other days).
4. Buying fighter planes. Points to consider: cost, maximum speed, reliability, etc.
5. Tender evaluation. An international bank wants to replace its computers with new ones. How should it decide which offer to accept? Points to consider: cost, quality of hardware, service conditions, guarantees, etc.
In each case the aim is one action: the best production plan, the highest return, the optimal work schedule, the best fighter planes for the country, etc.
13.2 COMPONENTS OF DECISION THEORY:
Before learning decision theory, we have to learn some of its basic components so that the concepts are easy to understand.
Decision Making: "Decision-making is the selection based on some criteria from two or more possible alternatives." (George R. Terry)
"A decision is an act of choice, wherein an executive forms a conclusion about what must be done in a given situation. A decision represents a course of behaviour chosen from a number of possible alternatives." (D.E. McFarland)
From these definitions, it is clear that decision-making is concerned with selecting a course of action from among alternatives to achieve a predetermined objective.
Decision Maker: The person (an individual or a group) responsible for choosing the best alternative is called the decision maker. While making a decision, the decision maker can use different criteria or different mathematical models, and also has to take care of the different environments and the psychology of the persons involved.
Course of Action: The alternatives available for decision making are called courses of action, acts or strategies. These are under the control of, and known to, the decision maker.
State of nature: Events that may follow when a particular decision alternative is selected are called states of nature. The states of nature are mutually exclusive and collectively exhaustive with respect to any decision problem. They are not under the control of the decision maker.
Pay-off: In order to compare each combination of action and state of nature, we need a payoff (e.g. profit or loss). Typically, this will be a numerical value, and it will be clear how we compare the payoffs: for example, we seek to maximize profit or to minimize loss. We shall deal with maximization problems (note that, as one is the negation of the other, there is a duality between maximization and minimization problems). Initially, we shall consider the payoff to be monetary.
Pay-off table: The table consists of all pay-offs tabulated for all courses of action $A_1, A_2, \ldots, A_n$ under all possible situations, i.e. states of nature $S_1, S_2, \ldots, S_m$. Note that the states of nature are mutually exclusive and collectively exhaustive.

State of nature    Course of action
                   $A_1$       $A_2$       ...    $A_n$
$S_1$              $a_{11}$    $a_{12}$    ...    $a_{1n}$
$S_2$              $a_{21}$    $a_{22}$    ...    $a_{2n}$
...                ...         ...         ...    ...
$S_m$              $a_{m1}$    $a_{m2}$    ...    $a_{mn}$
13.3 TYPES OF DECISION MAKING CRITERIA:
In decision-making we often find that we do not have the information necessary to decide, and so we keep hesitating. At other times we have a lot of data about the circumstances and the decision is very specific. We can define three types of settings for decision making:
• Decision Making in Certain Conditions
• Decision Making in Uncertain Conditions
• Decision Making in Risky Conditions
Decision Making in Certain Conditions
Decision-making under certainty means that the person who makes the decision has complete and appropriate knowledge for the decision to be made. With all the data available, the individual can predict the outcome of each choice. Being able to predict the result, we can make the decision with confidence. Typically, the alternative that gives the best outcome is selected and carried out.
Decision Making in Uncertain Conditions
Making a decision under uncertainty means deciding in the absence of information that would help us decide. Because of inadequate knowledge, the decision-maker does not know the future and cannot predict the outcome of any available choice. In such circumstances the decision-maker has to judge and decide based on their own expertise. If they do not have the relevant experience, they have to communicate with and seek advice from people who do. There is some risk involved because the outcome cannot be predicted, but knowledge from the past helps close the gap.
The success or failure of a company is determined by the nature of the decisions made in it. So before making an important decision, all the knowledge and alternatives available must be studied; a systematic decision-making process helps a great deal. The environment in which decisions are made is another aspect that affects them, and there are a few different types of such environments.
Decision Making Under Risk: The last type of decision-making environment is the risky environment. A risk environment is one in which the probabilities of multiple events are tied to a decision. You are never sure about the outcomes of your decision beyond calculated guesses. Such decisions are associated with events that could be either very successful or quite disastrous for the organization.
When you are faced with such problems, you will have some data available related to the situation, but it is all a game of probabilities. The past experience of managers plays a huge role, and they often have to take a good look at their past when confronted with such decisions.
The best course of action in a risky environment is to first analyze the risk of all the alternative actions based on the information available to you.
13.3. DECISION MAKING UNDER UNCERTAINTY (NON-PROBABILITY)
A decision problem in which the decision-maker is aware of the various possible states of nature but has insufficient information to assign any probabilities of occurrence to them is termed decision-making under uncertainty. A decision under uncertainty is one where there are many unknowns and no possibility of knowing what could occur in the future to alter the outcome of the decision.
We feel uncertainty about a situation when we cannot predict with complete confidence what the outcomes of our actions will be. We experience uncertainty about a specific question when we cannot give a single answer with complete confidence.
Launching a new product, making a major change in marketing strategy or opening your first branch could be influenced by such factors as the reaction of competitors, new competitors, technological changes, changes in customer demand, economic shifts, government legislation and a host of conditions beyond your control. These are the types of decisions facing the senior executives of large corporations who must commit huge resources.
The small business manager faces, relatively, the same type of conditions, which could lead to decisions that result in a disaster from which he or she may not be able to recover. A situation of uncertainty arises when there can be more than one possible consequence of selecting any course of action. In terms of the payoff matrix, if the decision-maker selects $A_1$, his payoff can be $X_{11}$, $X_{12}$, $X_{13}$, etc., depending upon which state of nature $S_1$, $S_2$, $S_3$, etc., is going to occur.
Different criteria for decision making under uncertainty.
A variety of criteria have been proposed for the selection of an optimal course of action under the environment of uncertainty. Each of these criteria makes an assumption about the attitude of the decision-maker.
Maximax Criterion: This criterion, also known as the criterion of optimism, is used when the decision-maker is optimistic about the future. Maximax implies the maximisation of the maximum payoff. The optimistic decision-maker locates the maximum payoff for each possible course of action. The maximum of these payoffs is identified and the corresponding course of action is selected. This is explained in the following example:
Example: Let there be a situation in which a decision-maker has three possible alternatives $A_1$, $A_2$ and $A_3$, where the outcome of each of them can be affected by the occurrence of any one of the three possible events $S_1$, $S_2$ and $S_3$. The monetary payoffs of each combination of $A_i$ and $S_j$ are given in the following table:

State of nature    Course of action
                   $A_1$     $A_2$     $A_3$
$S_1$              1800      1900      1700
$S_2$              1400      1500      1300
$S_3$              700       600       500
Maximum            1800      1900      1700
Minimum            700       600       500

Since 1900 is the maximum of the maximum payoffs, the optimal course of action under this criterion is $A_2$.
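The maximax rule can be sketched in a few lines of Python. This is only an illustration; the dictionary `payoffs`, which stores one list of payoffs per course of action in the order $S_1$, $S_2$, $S_3$, is an assumed data layout and not part of the text.

```python
# Payoffs of the example: one list per action, ordered as (S1, S2, S3).
payoffs = {
    "A1": [1800, 1400, 700],
    "A2": [1900, 1500, 600],
    "A3": [1700, 1300, 500],
}

# Step 1: best (maximum) payoff of each action.
best_case = {action: max(values) for action, values in payoffs.items()}

# Step 2: action whose best payoff is the largest.
maximax_choice = max(best_case, key=best_case.get)

print(maximax_choice, best_case[maximax_choice])  # A2 1900
```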
Maximin Criterion: This criterion, also known as the criterion of pessimism, is used when the decision-maker is pessimistic about the future. Maximin implies the maximisation of the minimum payoff. The pessimistic decision-maker locates the minimum payoff for each possible course of action. The maximum of these minimum payoffs is identified and the corresponding course of action is selected.
In the above example, since 700 is the maximum of the minimum payoffs, the optimal action is $A_1$.
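A minimal Python sketch of the maximin rule on the same example follows. As before, the `payoffs` dictionary (one list per action, ordered as $S_1$, $S_2$, $S_3$) is an assumed representation used only for illustration.

```python
# Payoffs of the example: one list per action, ordered as (S1, S2, S3).
payoffs = {
    "A1": [1800, 1400, 700],
    "A2": [1900, 1500, 600],
    "A3": [1700, 1300, 500],
}

# Step 1: worst (minimum) payoff of each action.
worst_case = {action: min(values) for action, values in payoffs.items()}

# Step 2: action whose worst payoff is the largest.
maximin_choice = max(worst_case, key=worst_case.get)

print(maximin_choice, worst_case[maximin_choice])  # A1 700
```

The only difference from the maximax sketch is that `min` replaces `max` in the first step.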
Regret Criterion: This criterion focuses upon the regret that the decision-maker might have from selecting a particular course of action. Regret is defined as the difference between the best payoff we could have realised, had we known which state of nature was going to occur, and the realised payoff. This difference, which measures the magnitude of the loss incurred by not selecting the best alternative, is also known as opportunity loss or opportunity cost.
From the payoff matrix (given in the above example), the payoffs corresponding to the actions $A_1, A_2, \ldots, A_n$ under the state of nature $S_j$ are $X_{1j}, X_{2j}, \ldots, X_{nj}$ respectively. Of these, assume that $X_{2j}$ is the maximum. Then the regret in selecting $A_i$, denoted by $R_{ij}$, is given by $R_{ij} = X_{2j} - X_{ij}$, $i = 1$ to $n$. We note that the regret in selecting $A_2$ is zero. The regrets for the various actions under the other states of nature can be computed in a similar way.
The regret criterion is based upon the minimax principle, i.e., the decision-maker tries to minimise the maximum regret. Thus, the decision-maker finds the maximum regret for each of the actions, and the action which corresponds to the minimum of these maximum regrets is regarded as optimal. The regret matrix of the example can be written as given below:

State of nature    Course of action
                   $A_1$     $A_2$     $A_3$
$S_1$              100       0         200
$S_2$              100       0         200
$S_3$              0         100       200
Maximum            100       100       200

From the maximum regret row, we find that the maximum regrets corresponding to the courses of action $A_1$ and $A_2$ are both equal to the minimum value, 100. Hence, $A_1$ and $A_2$ are optimal.
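The construction of the regret matrix and the minimax choice can be sketched in Python as follows. This is an illustrative sketch under the assumption that the payoff table of the example is stored as a dictionary with one list per action, ordered as $S_1$, $S_2$, $S_3$; the variable names are not from the text.

```python
# Payoffs of the example: one list per action, ordered as (S1, S2, S3).
payoffs = {
    "A1": [1800, 1400, 700],
    "A2": [1900, 1500, 600],
    "A3": [1700, 1300, 500],
}
n_states = 3

# Best payoff attainable under each state of nature.
state_best = [max(values[j] for values in payoffs.values()) for j in range(n_states)]

# Regret = best payoff for the state minus the payoff actually obtained.
regret = {
    action: [state_best[j] - values[j] for j in range(n_states)]
    for action, values in payoffs.items()
}

# Minimax principle: find each action's maximum regret, then the smallest of these.
max_regret = {action: max(row) for action, row in regret.items()}
smallest = min(max_regret.values())
optimal = [action for action, r in max_regret.items() if r == smallest]

print(optimal, smallest)  # ['A1', 'A2'] 100
```

Collecting all actions that attain the smallest maximum regret, rather than taking a single minimum, keeps ties such as the one between $A_1$ and $A_2$ visible.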
Hurwicz Criterion: The maximax and the maximin criteria, discussed above, assume that the decision-maker is either optimistic or pessimistic. A more realistic approach would, however, be to take into account the degree or index of optimism or pessimism of the decision-maker in the process of decision-making. If $\alpha$, a constant lying between 0 and 1, denotes the degree of optimism, then the degree of pessimism will be $1 - \alpha$. Then a weighted average of the maximum and minimum payoffs of an action, with $\alpha$ and $1 - \alpha$ as the respective weights, is computed. The action with the highest average is regarded as optimal.
We note that a value of $\alpha$ nearer to unity indicates that the decision-maker is optimistic, while a value nearer to zero indicates that he is pessimistic. If $\alpha = 0.5$, the decision-maker is said to be neutral.
We apply this criterion to the payoff matrix of the above example. Assume that the index of optimism is $\alpha = 0.6$.

State of nature                                Course of action
                                               $A_1$                                 $A_2$                                 $A_3$
$S_1$                                          1800                                  1900                                  1700
$S_2$                                          1400                                  1500                                  1300
$S_3$                                          700                                   600                                   500
Maximum                                        1800                                  1900                                  1700
Minimum                                        700                                   600                                   500
[Max × $\alpha$] + [Min × $(1 - \alpha)$]      1800 × 0.6 + 700 × 0.4 = 1360         1900 × 0.6 + 600 × 0.4 = 1380         1700 × 0.6 + 500 × 0.4 = 1220

Since 1380 is the maximum weighted average, $A_2$ is optimal.
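The Hurwicz weighted average can be sketched in Python as below. This is a minimal illustration; the `payoffs` dictionary (one list per action, ordered as $S_1$, $S_2$, $S_3$) and the name `hurwicz_score` are assumptions introduced for the example.

```python
# Payoffs of the example: one list per action, ordered as (S1, S2, S3).
payoffs = {
    "A1": [1800, 1400, 700],
    "A2": [1900, 1500, 600],
    "A3": [1700, 1300, 500],
}
alpha = 0.6  # index of optimism; 1 - alpha is the index of pessimism


def hurwicz_score(values, alpha):
    """Weighted average of the best and worst payoff of one action."""
    return alpha * max(values) + (1 - alpha) * min(values)


scores = {action: hurwicz_score(values, alpha) for action, values in payoffs.items()}
hurwicz_choice = max(scores, key=scores.get)

print(hurwicz_choice, round(scores[hurwicz_choice], 2))  # A2 1380.0
```

Setting `alpha = 1` reproduces the maximax choice and `alpha = 0` reproduces the maximin choice.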
Laplace Criterion: In the absence of any knowledge about the probabilities of occurrence of the various states of nature, one possible way out is to assume that all of them are equally likely to occur. Thus, if there are n states of nature, each can be assigned a probability of occurrence equal to 1/n. Using these probabilities, we compute the expected payoff for each course of action, and the action with the maximum expected value is regarded as optimal.

State of nature    Course of action
                   $A_1$     $A_2$       $A_3$
$S_1$              1800      1900        1700
$S_2$              1400      1500        1300
$S_3$              700       600         500
Average            1300      1333.33     1166.67

Since the average for $A_2$ is the maximum, $A_2$ is optimal.
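Under the Laplace criterion the expected payoff with equal probabilities is simply the arithmetic mean of each action's payoffs. The Python sketch below illustrates this; the `payoffs` dictionary (one list per action, ordered as $S_1$, $S_2$, $S_3$) is an assumed layout.

```python
# Payoffs of the example: one list per action, ordered as (S1, S2, S3).
payoffs = {
    "A1": [1800, 1400, 700],
    "A2": [1900, 1500, 600],
    "A3": [1700, 1300, 500],
}

# Each state gets probability 1/n, so the expected payoff is the plain average.
averages = {action: sum(values) / len(values) for action, values in payoffs.items()}
laplace_choice = max(averages, key=averages.get)

print(laplace_choice, round(averages[laplace_choice], 2))  # A2 1333.33
```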
Example 1: For the following pay-off table, obtain the best decision using i) the Maximax criterion, ii) the Maximin criterion.

State of nature    Course of action
                   $A_1$    $A_2$    $A_3$
$S_1$              85       95       70
$S_2$              40       50       30
$S_3$              70       60       50

Solution: Select the maximum and minimum value for each course of action from the pay-off table.

State of nature    Course of action
                   $A_1$    $A_2$    $A_3$
$S_1$              85       95       70
$S_2$              40       50       30
$S_3$              70       60       50
Maximum            85       95       70
Minimum            40       50       30

i) Maximax = Max(Maximum)
= Max(85, 95, 70)
= 95
which corresponds to the course of action $A_2$.
$\therefore$ The best decision is $A_2$.
ii) Maximin = Max(Minimum)
= Max(40, 50, 30)
= 50
which corresponds to the course of action $A_2$.
$\therefore$ The best decision is $A_2$.
Example 2: The following payoff table gives the payoffs for different products under different levels of demand. Obtain the best decision for a decision-maker with an i) optimistic, ii) pessimistic mindset.

Products    Demands
            High     Medium    Low
$P_1$       850      650       300
$P_2$       1050     700       400
$P_3$       1200     950       500

Solution: First interchange the rows and columns of the table:

Demands     Products
            $P_1$    $P_2$    $P_3$
High        850      1050     1200
Medium      650      700      950
Low         300      400      500
Maximum     850      1050     1200
Minimum     300      400      500

i) Optimistic: Maximax = Max(Maximum)
= Max(850, 1050, 1200)
= 1200
The value 1200 belongs to product $P_3$.
Therefore the optimal decision is $P_3$.
ii) Pessimistic: Maximin = Max(Minimum)
= Max(300, 400, 500)
= 500
The value 500 belongs to product $P_3$.
Therefore the optimal decision is $P_3$.
Example 3: For the following pay-off table, find the best decision using the Minimax regret criterion.

Events    Course of action
          $A_1$    $A_2$    $A_3$
$E_1$     120      140      170
$E_2$     145      150      130
$E_3$     160      170      150

Solution: First we have to prepare the regret table. For each event, every payoff is subtracted from the largest payoff in that row.

Events     Course of action
           $A_1$    $A_2$    $A_3$
$E_1$      50       30       0
$E_2$      5        0        20
$E_3$      10       0        20
Maximum    50       30       20

For the Minimax regret criterion,
Minimax = Min(Maximum)
= Min(50, 30, 20)
= 20
which corresponds to the course of action $A_3$.
$\therefore$ The best decision is $A_3$.
Example 4: For the given pay-off table, state which alternative can be chosen as the best alternative using the Hurwicz alpha criterion with $\alpha = 0.7$.

Alternatives    State of nature
                $S_1$    $S_2$    $S_3$    $S_4$
$A_1$           35       25       12       18
$A_2$           40       20       15       20
$A_3$           30       30       18       25

Solution: First interchange the rows and columns of the pay-off table.

State of nature                                Alternatives
                                               $A_1$                          $A_2$                          $A_3$
$S_1$                                          35                             40                             30
$S_2$                                          25                             20                             30
$S_3$                                          12                             15                             18
$S_4$                                          18                             20                             25
Maximum                                        35                             40                             30
Minimum                                        12                             15                             18
[Max × $\alpha$] + [Min × $(1 - \alpha)$]      35 × 0.7 + 12 × 0.3 = 28.1     40 × 0.7 + 15 × 0.3 = 32.5     30 × 0.7 + 18 × 0.3 = 26.4

Here the maximum value is 32.5, which belongs to alternative $A_2$.
Hence the decision is $A_2$.
Example 5: From the given pay-off table, obtain the best decision using the Laplace criterion.

Participation    Policy
                 $P_1$    $P_2$    $P_3$
High             80       90       100
Medium           60       50       70
Low              50       40       30

Solution: Here we have to find the average of each policy.

Participation    Policy
                 $P_1$            $P_2$         $P_3$
High             80               90            100
Medium           60               50            70
Low              50               40            30
Average          190/3 = 63.33    180/3 = 60    200/3 = 66.67

Using the Laplace criterion,
Laplace criterion = Max(Average)
= Max(63.33, 60, 66.67)
= 66.67
Here the maximum value is 66.67, which belongs to policy $P_3$.
Hence the decision is $P_3$.
Example 6: The following table presents the pay-off returns associated with four alternative types of investment decision.

State of economy    Investment alternative (Rs 10,000)
                    Saving A/C    Fixed Deposit    Mutual Fund    Stock
Recession           450           500              800            -250
Stable              400           550              950            1200
Expansion           500           650              1050           1500

State which can be chosen as the best act using: (a) Maximax, (b) Maximin, (c) Equal likelihood (Laplace), (d) Hurwicz alpha criterion with $\alpha = 0.4$, (e) Minimax regret (Savage criterion).
Solution:

State of economy                               Investment alternative (Rs 10,000)
                                               Saving A/C                       Fixed Deposit                    Mutual Fund                       Stock
Recession                                      450                              500                              800                               -250
Stable                                         400                              550                              950                               1200
Expansion                                      500                              650                              1050                              1500
Maximum                                        500                              650                              1050                              1500
Minimum                                        400                              500                              800                               -250
Average                                        1350/3 = 450                     1700/3 = 566.67                  2800/3 = 933.33                   2450/3 = 816.67
[Max × $\alpha$] + [Min × $(1 - \alpha)$]      500 × 0.4 + 400 × 0.6 = 440      650 × 0.4 + 500 × 0.6 = 560      1050 × 0.4 + 800 × 0.6 = 900      1500 × 0.4 + (-250 × 0.6) = 450

(a) Maximax = Max(Maximum) = Max(500, 650, 1050, 1500) = 1500.
Here the value 1500 belongs to the alternative Stock.
Hence the decision is to invest in Stock.
(b) Maximin = Max(Minimum) = Max(400, 500, 800, -250) = 800.
Here the value 800 belongs to the alternative Mutual Fund.
Hence the decision is to invest in Mutual Fund.
(c) Equal likelihood (Laplace) = Max(Average) = Max(450, 566.67, 933.33, 816.67) = 933.33.
Here the value 933.33 belongs to the alternative Mutual Fund.
Hence the decision is to invest in Mutual Fund.
(d) Hurwicz alpha criterion ($\alpha = 0.4$) = Max(440, 560, 900, 450) = 900.
Here the value 900 belongs to the alternative Mutual Fund.
Hence the decision is to invest in Mutual Fund.
(e) Minimax regret (Savage criterion).
Prepare the regret table:

State of economy    Investment alternative (Rs 10,000)
                    Saving A/C    Fixed Deposit    Mutual Fund    Stock
Recession           350           300              0              1050
Stable              800           650              250            0
Expansion           1000          850              450            0
Maximum             1000          850              450            1050

Minimax regret (Savage criterion) = Min(1000, 850, 450, 1050) = 450.
Here the value 450 belongs to the alternative Mutual Fund.
Hence the decision is to invest in Mutual Fund.
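All five criteria of Example 6 can be evaluated together in a short Python sketch. This is offered as a cross-check under stated assumptions: the `payoffs` dictionary (one list per investment alternative, ordered as Recession, Stable, Expansion) and the helper name `pick` are illustrative and not part of the text.

```python
# Payoffs from Example 6: one list per alternative,
# ordered as (Recession, Stable, Expansion).
payoffs = {
    "Saving A/C": [450, 400, 500],
    "Fixed Deposit": [500, 550, 650],
    "Mutual Fund": [800, 950, 1050],
    "Stock": [-250, 1200, 1500],
}
ALPHA = 0.4   # index of optimism for the Hurwicz criterion
n_states = 3


def pick(scores, rule=max):
    """Return the alternative whose score is selected by `rule` (max or min)."""
    return rule(scores, key=scores.get)


# Best payoff under each state of the economy, needed for the regrets.
state_best = [max(v[j] for v in payoffs.values()) for j in range(n_states)]

maximax = {a: max(v) for a, v in payoffs.items()}
maximin = {a: min(v) for a, v in payoffs.items()}
laplace = {a: sum(v) / len(v) for a, v in payoffs.items()}
hurwicz = {a: ALPHA * max(v) + (1 - ALPHA) * min(v) for a, v in payoffs.items()}
max_regret = {
    a: max(best - x for best, x in zip(state_best, v)) for a, v in payoffs.items()
}

decisions = {
    "Maximax": pick(maximax),
    "Maximin": pick(maximin),
    "Laplace": pick(laplace),
    "Hurwicz": pick(hurwicz),
    "Minimax regret": pick(max_regret, rule=min),
}
for criterion, alternative in decisions.items():
    print(f"{criterion}: {alternative}")
```

Only the maximax criterion selects Stock; the other four criteria select Mutual Fund.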
13.4 LET US SUM UP:
In this chapter we have learned:
• The basic terms required in decision theory.
• The different environments of decision making.
• The different methods of decision making under uncertainty.
13.5 UNIT END EXERCISES:
1. Given the following pay-off table, obtain the optimum decision using i) the Maximax criterion, ii) the Maximin criterion, iii) the Laplace criterion.

State of nature    Course of action
                   $A_1$    $A_2$    $A_3$
$S_1$              4000     2500     2500
$S_2$              3500     2500     1200
$S_3$              2000     3600     2000

2. For the following pay-off table, find the best decision using the minimax regret criterion.

State of nature    Course of action
                   $A_1$    $A_2$    $A_3$
$S_1$              400      250      250
$S_2$              350      250      120
$S_3$              200      360      200

3. For the given pay-off table, obtain the optimal decision for persons with different mindsets: i) optimistic, ii) pessimistic.

Events    Acts
          $A_1$    $A_2$    $A_3$
$E_1$     40       20       25
$E_2$     35       25       12
$E_3$     20       36       20

4. For the given pay-off table, state which alternative can be chosen as the best alternative using the Hurwicz alpha criterion with $\alpha = 0.8$.

Alternatives    State of nature
                $S_1$    $S_2$    $S_3$    $S_4$
$A_1$           350      250      120      180
$A_2$           400      200      150      200
$A_3$           300      300      180      250

5. For the given pay-off table, state which alternative can be chosen as the best alternative using i) the Maximax criterion, ii) the Maximin criterion, iii) the Laplace criterion.

Alternatives    State of nature
                $S_1$    $S_2$    $S_3$    $S_4$
$A_1$           650      750      150      180
$A_2$           400      850      150      200
$A_3$           300      900      180      250
6. A management is faced with the problem of choosing one of three
products for manufacturing.
The potential demand for each product may turn out to be good,
moderate or poor. Suggest the management the best product using 1)
Maximax, 2) Maximin, 3) Equal likelihood (Laplace), 4) Hurwicz
Alpha criterion Į 0.4 5) Minima[ regret (savage cr iterion). Product Nature of Demand Good Moderate Poor W X
Y
Z 20,000 15,000
12,000
21,000 18,000 17,000
18,000
19,000 10,000 12,000
15,000
14,000
7. The research department of consumer products division has
recommended the marketing department to la unch a soap with 3
different perfumes. The marketing manager has to decide the type of
perfume to launch under the following estimated payoff for various
levels of sales. Sales (Units) Types of perfume I II III 20,000 15,000
10,000 40 60
30 60 70
50 30 20
10
Find the best decision using (i)Maximax , (ii) Maximin , (iii) Minimax
Regret and (iv) Laplace criteria. munotes.in

8. Construct a pay-off matrix for the following situation and find the best
decision using (i) Maximin, (ii) Maximax, (iii) Laplace criteria.

Product    Fixed cost (Rs.)    Variable cost (Rs.)
X          500                 15
Y          400                 12
Z          300                 10

The likely demand (units) of the products is:
Poor demand        300
Moderate demand    700
High demand        1000
The selling price of each product is Rs. 25.
9. For the following pay-off table, obtain the best decision using 1)
Maximax, 2) Maximin, 3) Equal likelihood (Laplace), 4) Hurwicz
alpha criterion with α = 0.6, 5) Minimax regret (Savage criterion).

Events    Acts
          A1     A2     A3
E1        150    420    225
E2        315    235    125
E3        220    360    200
10. Suggest the best decision for the given pay-off table using i)
Maximax, ii) Maximin, iii) Equal likelihood (Laplace), iv) Hurwicz
alpha criterion with α = 0.6, v) Minimax regret (Savage criterion).

Events    Acts
          A1    A2    A3
E1        80    90    85
E2        65    85    92
E3        70    76    60
13.6 LIST OF REFERENCES:
• Fundamentals of Mathematical Statistics by S. C. Gupta and V. K.
Kapoor.
• Basic Statistics by B. L. Agrawal.
14
DECISION THEORY II
Unit Structure
14.0 Objectives:
14.1 Introduction:
14.2 Decision Making Under Risk (Probabilistic)
14.3 Decision Tree
14.4 Let us sum up:
14.5 Unit end Exercises:
14.6 List of References:
14.0 OBJECTIVES:
After going through this unit, you will be able to know:
• The environment of decision making under risk
• How to make decisions using different criteria when probabilities are given
• The decision tree technique for multi-stage decision making
14.1 INTRODUCTION:
In the previous chapter we learnt different techniques for making decisions
when no probabilities are available. Here we learn how to make decisions
when the probability of each state of nature is given. To obtain the best
decision we use the pay-off table together with the given probabilities.
In our day-to-day life we take many decisions, such as whether to purchase an
object or to invest in it. Some of these decisions are simple, but when there
are many possible choices, risk and uncertainty arise about which option will
give the better outcome. Experience shows that few people make decisions
after well-deliberated calculations, whether the decision arises at work or
in personal life.
In deterministic models, a good decision is judged by the outcome alone.
However, in probabilistic models, the decision maker is concerned not only
with the outcome value but also with the amount of risk each decision
carries. As an example of deterministic versus probabilistic models,
consider the past and the future. Nothing we can do can change the past, but
everything we do influences and changes the future, although the future has
an element of uncertainty.

14.2 DECISION MAKING UNDER RISK (PROBABILISTIC):
In decision making under uncertainty, the probabilities of occurrence
of the various states of nature are not known. When these probabilities are
known or can be estimated, the choice of an optimal action based on these
probabilities is termed decision making under risk.
Risk implies a degree of uncertainty and an inability to fully control the
outcomes or consequences of an action. Managers make an effort to reduce or
eliminate risk. However, in some instances the
elimination of one risk may increase some other risks. Effective handling of
a risk requires an assessment of the risk and of its impact on the decision
process. The decision process allows the decision-maker to evaluate
alternative strategies prior to making any decision. The process is as
follows:
• The problem is defined and all feasible alternatives are considered.
The possible outcomes for each alternative are evaluated.
• Outcomes are discussed based on their monetary pay-offs or net gain
in reference to assets or time.
• Various uncertainties are quantified in terms of probabilities.
• The quality of the optimal strategy depends upon the quality of the
judgments. The decision-maker should identify and examine the
sensitivity of the optimal strategy with respect to the crucial factors.
There are two methods for taking the best decision under risk:
i) Expected Monetary Value (EMV)
ii) Expected Opportunity Loss (EOL)
14.2.1 Expected Monetary Value (EMV):
The expected monetary value of a decision is the long-run average value of
the outcome of that decision. In other words, suppose that we could face
exactly the same decision under the same circumstances many times.
One time a good state of nature may occur and we would have a very
positive outcome. Another time we may have a negative outcome because
some less favorable state of nature happened. If we could repeat
that decision a large number of times, determine the outcome each time
and then average all those outcomes, we would have the EMV of the
decision alternative.
Steps to be followed for EMV calculation:
• First multiply each pay-off by the probability of its state of nature to get the EMV table.
• Find the sum for each course of action in the EMV table.
• Select the course of action with the maximum EMV; this is the best decision.

For example:

Events    Probability    Course of action    EMV
                         A1    A2            A1    A2
E1        0.3            40    30            12    9
E2        0.5            60    70            30    35
E3        0.2            80    90            16    18
Total                                        58    62

Here the maximum EMV is 62.
Therefore the best decision is A2.
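The three EMV steps can be checked with a short script. The following is a minimal sketch in Python using the figures of the illustration above; the names `probabilities`, `payoffs` and `expected_monetary_value` are chosen here for illustration and are not part of the text.

```python
# Pay-off table of the illustration: one list entry per event E1, E2, E3.
probabilities = [0.3, 0.5, 0.2]
payoffs = {
    "A1": [40, 60, 80],
    "A2": [30, 70, 90],
}


def expected_monetary_value(probs, action_payoffs):
    """Sum of probability x pay-off over all events for one course of action."""
    return sum(p * x for p, x in zip(probs, action_payoffs))


# Step 1 and 2: multiply by the probabilities and total each course of action.
emv = {action: expected_monetary_value(probabilities, values)
       for action, values in payoffs.items()}

# Step 3: the course of action with the largest EMV is the best decision.
best_action = max(emv, key=emv.get)

for action, value in emv.items():
    print(f"EMV({action}) = {value:.2f}")
print("Best decision:", best_action)
```

The same function can be reused for any pay-off table by changing the two data structures at the top.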
Example 1: For the following pay-off table, obtain the best decision
using the EMV method.

State of nature    Probability    Course of action
                                  A1     A2     A3
S1                 0.3            120    140    100
S2                 0.4            140    180    160
S3                 0.3            150    100    150
Solution: Prepare the EMV table.

State of nature    Probability    Course of action       EMV
                                  A1     A2     A3       A1     A2     A3
S1                 0.3            120    140    100      36     42     30
S2                 0.4            140    180    160      56     72     64
S3                 0.3            150    100    150      45     30     45
Total                                                    137    144    139

Here the maximum EMV is 144.
Therefore the best decision is A2.
Example 2: A person has to make a choice of purchasing 100 shares of
company A or company B. For company A, it will cost him Rs. 30,000 if the
market is high, Rs. 20,000 if it is fair and Rs. 10,000 if it is low.
For company B, the corresponding amounts are Rs. 35,000, Rs. 15,000 and
Rs. 12,000 for high, fair and low market conditions. The respective
probabilities of these market conditions are 0.6, 0.3 and 0.1. Draw an
appropriate decision tree and advise the person about the purchase of shares.
Solution: First convert the data into a pay-off table and prepare the EMV table.

State of economy    Probability    Company               EMV
                                   A         B           A         B
High                0.6            30,000    35,000      18,000    21,000
Fair                0.3            20,000    15,000      6,000     4,500
Low                 0.1            10,000    12,000      1,000     1,200
Total                                                    25,000    26,700

Here the maximum EMV is 26,700.
Therefore the best decision is Company B.
14.2.2 Expected Pay-off with Perfect Information (EPPI) and Expected
Value of Perfect Information (EVPI):
Using the information in the decision problem, first obtain the pay-off table
and compute the EMV for each course of action. Then compute EPPI and EVPI.
EPPI: If the decision maker has perfect information before selecting a
course of action, he will select the best alternative (with the highest pay-off)
corresponding to each state of nature (event). Suppose the dealer can buy
market information that accurately predicts market demand (state of
nature); he can then decide how many units to order. For this purpose, EPPI is
computed as follows:
EPPI = summation of the product of the probability of each event and the
maximum pay-off for that event.
EPPI = Σ (Probability × Maximum pay-off for each state of nature)
EVPI: EVPI is the difference between the expected pay-off with and without
perfect information. It is the maximum amount the dealer should spend for
obtaining perfect information about the market demand (state of nature).
EVPI = EPPI – Max(EMV).
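As a quick check of these two formulas, the sketch below computes EPPI and EVPI in Python for the small pay-off table used to illustrate EMV (events E1, E2, E3 with probabilities 0.3, 0.5, 0.2 and courses of action A1 and A2). The variable names are illustrative assumptions, not notation from the text.

```python
# Pay-off table from the EMV illustration: one list entry per event E1, E2, E3.
probabilities = [0.3, 0.5, 0.2]
payoffs = {
    "A1": [40, 60, 80],
    "A2": [30, 70, 90],
}

# EMV of each course of action (expected pay-off without perfect information).
emv = {action: sum(p * x for p, x in zip(probabilities, values))
       for action, values in payoffs.items()}

# With perfect information the best pay-off is picked separately for each event.
best_per_event = [max(column) for column in zip(*payoffs.values())]

# EPPI = sum of probability x maximum pay-off for each event.
eppi = sum(p * m for p, m in zip(probabilities, best_per_event))

# EVPI = EPPI - Max(EMV).
evpi = eppi - max(emv.values())

print("Best pay-off per event:", best_per_event)
print(f"EPPI = {eppi:.2f}")
print(f"EVPI = {evpi:.2f}")
```

For this table the best pay-offs per event are 40, 70 and 90, so EPPI = 12 + 35 + 18 = 65 and EVPI = 65 – 62 = 3.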
Example 3: The following is the demand distribution of a certain product.

No. of units demanded    10     11     12
Probability              0.3    0.5    0.2

If the product is sold at Rs. 80 per unit with cost price Rs. 60 per unit, obtain
the best decision using EMV. Also compute EVPI.
Solution: Here we have to prepare the pay-off table.
S.P = Rs. 80
C.P = Rs. 60
Profit per unit = Rs. 20
With D the number of units demanded and S the number of units produced,
Profit function = 20S                 if D ≥ S
                = 20D − 60(S − D)     if D < S

Demand (D)    Probability    Production (S)         Maximum    EMV
                             10     11     12                  10     11     12
10            0.3            200    140    80       200        60     42     24
11            0.5            200    220    160      220        100    110    80
12            0.2            200    220    240      240        40     44     48
Total                                                          200    196    152

Here the maximum EMV is 200.
Therefore the best decision is to produce 10 units.
EPPI = 200 × 0.3 + 220 × 0.5 + 240 × 0.2 = 218
EVPI = EPPI – Max(EMV) = 218 – 200 = 18
Expected Opportunity Loss (EOL):
An alternative to maximizing EMV is to minimize the expected
opportunity loss. First an opportunity loss (regret) table is constructed:
for each state of nature, the opportunity loss of an alternative is the best
pay-off in that state minus the pay-off of the alternative. Then the
EOL is computed for each alternative by multiplying each opportunity loss
by its probability and adding these together.
EOL is the expected cost of not picking the best solution.
• First construct an opportunity loss table.
• For each alternative, multiply the opportunity loss by the probability
of that loss for each possible outcome and add these together.
• Minimum EOL will always result in the same decision as the
maximum EMV.
• Minimum EOL will always equal EVPI.
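The last two statements follow from a short derivation. The symbols used here are introduced only for this sketch and do not appear in the text: p_i is the probability of state of nature i, x_ij is the pay-off of alternative j in state i, and M_i is the largest pay-off available in state i.

```latex
\begin{align*}
\mathrm{EOL}_j &= \sum_i p_i \,(M_i - x_{ij}) \\
               &= \sum_i p_i M_i \;-\; \sum_i p_i x_{ij} \\
               &= \mathrm{EPPI} - \mathrm{EMV}_j , \\[4pt]
\min_j \mathrm{EOL}_j &= \mathrm{EPPI} - \max_j \mathrm{EMV}_j = \mathrm{EVPI}.
\end{align*}
```

Since EPPI is the same for every alternative, the alternative with the largest EMV is also the one with the smallest EOL.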

Example 4: For the given pay-off table, obtain the optimal decision
using the EOL method.

Demand    Probability    Alternatives
                         A1    A2    A3
High      0.5            95    80    100
Medium    0.3            45    60    75
Low       0.2            15    20    10

Solution: For the expected opportunity loss (EOL), first we have to prepare the
regret table.

Demand    Probability    Regret table        EOL
                         A1    A2    A3      A1      A2      A3
High      0.5            5     20    0       2.5     10      0
Medium    0.3            30    15    0       9       4.5     0
Low       0.2            5     0     10      1       0       2
Total                                        12.5    14.5    2

Here the minimum EOL is 2.
Therefore the optimal solution is A3.
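The regret table and the EOL totals of Example 4 can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative and the data are those of the example.

```python
# Pay-off table of Example 4: one list entry per demand level High, Medium, Low.
probabilities = [0.5, 0.3, 0.2]
payoffs = {
    "A1": [95, 45, 15],
    "A2": [80, 60, 20],
    "A3": [100, 75, 10],
}

# Best pay-off in each state of nature (the maximum across alternatives).
best_per_state = [max(column) for column in zip(*payoffs.values())]

# Regret (opportunity loss) = best pay-off in the state - pay-off obtained.
regret = {alt: [best - x for best, x in zip(best_per_state, values)]
          for alt, values in payoffs.items()}

# EOL = sum of probability x regret for each alternative.
eol = {alt: sum(p * r for p, r in zip(probabilities, losses))
       for alt, losses in regret.items()}

# The alternative with the smallest EOL is the optimal decision.
best_alternative = min(eol, key=eol.get)

for alt in payoffs:
    print(alt, "regret:", regret[alt], f"EOL = {eol[alt]:.2f}")
print("Optimal decision:", best_alternative)
```

For this table EPPI = 0.5 × 100 + 0.3 × 75 + 0.2 × 20 = 76.5 and the largest EMV is 74.5 (for A3), so EVPI = 2, which agrees with the minimum EOL.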
14.3 DECISION TREE:
Decision trees are best for projects that involve decisions over time, which
result in many possible outcomes. Decision trees are inherently suited to
decision making under risk, since we must assign a probability to each branch
emanating from a chance node. Decision trees can also incorporate all the
alternatives into one graphic showing the decisions to be made.
Any problem that can be presented in a decision table can also be
graphically illustrated by a decision tree. All decision trees contain decision
nodes and state-of-nature nodes.
• Decision nodes are represented by squares from which one of several
alternatives may be chosen.
• State-of-nature nodes are represented by circles out of which one
state of nature will occur.
In drawing the tree, we begin at the left and move to the right. Branches
from the squares (decision nodes) represent alternatives, and branches from
the circles (state-of-nature nodes) represent the states of nature.
Decision Tree Analysis:
• Define the problem
• Structure or draw the decision tree

• Assign probabilities to the states of nature
• Estimate pay-offs for each possible combination of alternatives and
states of nature
• Solve the problem by computing expected monetary values (EMVs)
for each state-of-nature node.
Structure of Decision Trees:
• Trees are drawn from left to right.
• Represent decisions and outcomes in sequential order.
• Squares represent decision nodes
• Circles represent state-of-nature nodes
• Lines or branches connect the decision nodes and the state-of-nature nodes
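Example 5: Draw the decision tree for the given pay-off table. Also
obtain the best decision by the EMV method.

Demand    Probability    Colours
                         Red    Black    White
High      0.6            60     80       70
Medium    0.3            40     60       50
Low       0.1            30     40       40

Solution: The tree starts with a decision node (square) with one branch for
each colour. Each branch leads to a state-of-nature node (circle) with
branches High (0.6), Medium (0.3) and Low (0.1). The EMV at each
state-of-nature node is:
EMV(Red) = 0.6 × 60 + 0.3 × 40 + 0.1 × 30 = 51
EMV(Black) = 0.6 × 80 + 0.3 × 60 + 0.1 × 40 = 70
EMV(White) = 0.6 × 70 + 0.3 × 50 + 0.1 × 40 = 61
Here maximum EMV is 70.
Therefore the best decision is Black Colour.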
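Solving a decision tree means working from the right-hand end back to the first decision node: the value of a state-of-nature node is the EMV of its branches, and the value of a decision node is the best of its branches. The sketch below shows one possible way to code this for the tree of Example 5. The tuple-based tree representation and the function names `rollback` and `chance` are assumptions made for this illustration.

```python
# A node is a tuple whose first element names its kind:
#   ("payoff", value)
#   ("chance", [(probability, child_node), ...])    state-of-nature node (circle)
#   ("decision", {branch_label: child_node, ...})   decision node (square)

def rollback(node):
    """Return (value, chosen_branch) for a node; chosen_branch is None
    for pay-off and state-of-nature nodes."""
    kind = node[0]
    if kind == "payoff":
        return node[1], None
    if kind == "chance":
        # EMV of the node: probability-weighted sum of its branches.
        value = sum(p * rollback(child)[0] for p, child in node[1])
        return value, None
    if kind == "decision":
        # Pick the branch with the largest value.
        values = {label: rollback(child)[0] for label, child in node[1].items()}
        best = max(values, key=values.get)
        return values[best], best
    raise ValueError(f"unknown node kind: {kind!r}")


def chance(high, medium, low):
    """State-of-nature node with the demand probabilities of Example 5."""
    return ("chance", [(0.6, ("payoff", high)),
                       (0.3, ("payoff", medium)),
                       (0.1, ("payoff", low))])


tree = ("decision", {
    "Red": chance(60, 40, 30),
    "Black": chance(80, 60, 40),
    "White": chance(70, 50, 40),
})

best_value, best_choice = rollback(tree)
print(f"Best decision: {best_choice} with EMV {best_value:.2f}")
```

Because `rollback` is recursive, the same function also handles multi-stage trees in which a state-of-nature branch leads to a further decision node.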
14.4 LET US SUM UP:
In this chapter we have learnt:
• Decision making under risk.
• The EMV and EOL methods for taking decisions under risk.
• How to represent a pay-off table as a decision tree.
• Decision making with a decision tree using EMV.
14.5 UNIT END EXERCISES:
1. Write a short note on the decision tree and the procedure for drawing a
decision tree.
2. Explain decision making under risk.
3. Explain EMV with one example.
4. Write a note on EOL.
5. For the following pay-off table, find the optimal decision using the EMV
method.

Course of action    State of nature
                    S1    S2    S3
A1                  25    85    95
A2                  40    0     60
A3                  65    30    55

6. The following pay-off matrix has been formed by a portfolio manager
giving pay-offs for different modes of investment under different
states of the economy. Decide on the best mode of investment by
calculating expected monetary values (EMV).

State of economy    Probability    Investment alternative
                                   Gov. F.D    Company F.D    Mutual fund    Shares
Depression          0.25           100         90             50             0
Recovery            0.45           100         110            120            140
Prosperity          0.30           100         120            150            200

7. For the following pay-off table, select the best decision using the EOL
criterion.

Course of action    State of nature
                    S1     S2     S3
A1                  25     85     95
A2                  40     0      60
A3                  65     30     55
Probability         0.5    0.2    0.3

8. Find the best decision by using the EOL criterion for the following pay-off
matrix.

State of nature    Decisions           Probability
                   A1    A2    A3
S1                 20    30    10      0.5
S2                 60    40    30      0.3
S3                 30    70    40      0.2

9. For the following pay-off table, suggest the best decision by the EOL
method.

Course of action    State of nature
                    S1     S2     S3
A1                  14     16     10
A2                  12     15     16
A3                  20     18     14
Prob.               0.4    0.3    0.3

10. The following is the demand distribution of a certain product.

No. of units demanded    100    150     200
Probability              0.3    0.45    0.25

If the cost price is Rs. 250 per unit and the selling price is Rs. 340 per
unit, prepare a pay-off table and obtain the best decision using EMV.
11. The probability distribution of daily demand of cell phones in a
mobile gallery is given below.

Demand         5      10     15     20
Probability    0.4    0.2    0.3    0.1

If the cost price is Rs. 25,000 per unit and the selling price is Rs. 34,000
per unit, prepare a pay-off table and obtain the best decision using
EMV.
12. A company wants to launch a new drink this summer. From the
following pay-off table,
decide the flavour to be launched using a decision tree by the EMV
method.

Summer condition    Flavour of the soft drink
                    Orange    Mango    Lime
Mild                150       58       59
Moderate            153       151      154
Severe              158       230      198
Very severe         250       268      278

13. Draw a decision tree for the following decision making problem and
suggest the best decision.

Course of action    State of nature
                    S1     S2     S3
A1                  34     20     18
A2                  14     16     12
Prob.               0.2    0.3    0.5

14. A manufacturer of toys is interested to know whether he should launch
a deluxe model or a popular model of a toy. If the deluxe model is
launched, the probabilities that the market will be good, fair or poor
are 0.3, 0.4 and 0.3 respectively, with pay-offs Rs. 1,40,000,
Rs. 70,000 and Rs. (−10,000). If the popular model is introduced, the
corresponding probabilities are 0.4, 0.3 and 0.3, with
respective pay-offs Rs. 1,50,000, Rs. 80,000 and Rs. (−15,000). Decide
which model should be launched using a decision tree by the EMV method.
15. Unique Home Appliances finds that the cost of holding a cooking ware
in stock for a month is Rs. 200. A customer who cannot obtain a cooking
ware immediately tends to go to other dealers, and the firm estimates that
for every customer who cannot get immediate delivery it loses an
average of Rs. 500. The probabilities of a demand of 0, 1, 2, 3, 4, 5
cooking wares in a month are 0.05, 0.1, 0.2, 0.3, 0.2, 0.15 respectively.
Determine the optimum stock level of cooking wares using the EMV
criterion.