Page 1
1
UNIT I
1
THE MEAN, MEDIAN, MODE, AND
OTHER MEASURES OF CENTRAL
TENDENCY
Unit structure
1.0 Objectives
1.1 Introduction
1.2 Index or Subscript Notation, Summation Notation
1.3 Averages or Measures of Central Tendency
1.4 Arithmetic Mean
1.4.1 Arithmetic Mean Computed from Grouped Data
1.4.2 Properties of the Arithmetic Mean
1.5 Weighted Arithmetic Mean
1.6 Median
1.7 Mode
1.8 Empirical Relation between the Mean, Median, and Mode
1.9 Geometric Mean
1.10 Harmonic Mean
1.11 Relation between the Arithmetic, Geometric, and Harmonic Means
1.12 Root Mean Square
1.13 Quartiles, Deciles and Percentiles
1.14 Software and Measures of Central Tendency
1.15 Summary
1.16 Exercise
1.17 References
1.0 OBJECTIVES
After going through this chapter, students will be able to learn:
To present huge data in a summarized form
To calculate and interpret the mean, the median and the mode,
To facilitate comparison
To calculate geometric mean, harmonic mean
To trace precise relationship munotes.in
To calculate Quartiles, deciles and percentiles
To help in decision-making
1.1 INTRODUCTION
A measure of central tendency is a single value that describes a set of data
by identifying the central position within that set. The mean, median and
mode are different measures of central tendency for numerical data.
The word average is common in day-to-day conversation: we often talk about
the average height of the girls, an average student of the class, or the
average run rate of a match. In everyday use, average means neither too
good nor too bad. In statistics, however, the term has a more precise
meaning: an average is a single value that represents a group of values.
Ideally such a value is easy to understand, easy to compute, and based on
all the observations.
1.2 INDEX OR SUBSCRIPT NOTATION, SUMMATION NOTATION
Let the symbol $X_i$ (read "X subscript i") denote any of the N values
$X_1, X_2, X_3, \dots, X_N$ assumed by a variable X. The letter i in $X_i$,
which can stand for any of the numbers 1, 2, 3, ..., N, is called a
subscript or index. Any other letter, such as j, k, p, q or r, could be used instead.
Summation Notation:
The symbol $\sum_{i=1}^{N} X_i$ is used to denote the sum of all the $X_i$'s from i = 1 to N:
$\sum_{i=1}^{N} X_i = X_1 + X_2 + X_3 + \dots + X_N$
We generally denote this sum simply by $\sum X$ or $\sum X_i$.
The symbol ∑ is the Greek capital letter sigma, denoting sum.
Ex. $\sum_{i=1}^{N} aX_i = aX_1 + aX_2 + aX_3 + \dots + aX_N = a(X_1 + X_2 + X_3 + \dots + X_N) = a\sum_{i=1}^{N} X_i$, where a is a constant.
1.3 AVERAGES OR MEASURES OF CENTRAL TENDENCY
There are different ways of measuring the central tendency of a set of
values.
Various authors have defined the average differently.
“Average is an attempt to find one single figure to describe whole of
figures.” – Clark
“An average is a single value selected from a group of values to represent
them in some way - a value which is supposed to stand for whole group, of
which it is a part, as typical of all the values in the group.” – A. E. Waugh
“An average is a typical value in the sense that it is sometimes employed
to represent all the individual values in the series or of a variable.” – Ya-Lun Chou
Types of Averages:
Arithmetic Mean: a. Simple, b. Weighted
Median
Mode
Geometric Mean
Harmonic Mean
1.4 ARITHMETIC MEAN
The most popular and widely used measure of representing the entire data
by one value is mean or Average.
It simply involves taking the sum of a group of numbers, then dividing
that sum by the total number of values in the group.
Arithmetic mean can be of two types.
a. Simple arithmetic mean
b. Weighted arithmetic mean
A. Simple Arithmetic Mean – Individual Observations:
Calculation of the mean in the case of individual observations (i.e. when
frequencies are not given) is very simple: add all the values of the
variable and divide the total by the number of items.
$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum X}{N}$
$\bar{X}$ = Arithmetic Mean; N = number of observations;
$\sum X$ = sum of all the values of the variable X, i.e. $X_1 + X_2 + \dots + X_N$
Ex 1. Find the arithmetic mean of the following five values: 8, 45, 49, 54, 79.
Sol: We know that $\bar{X} = \frac{\sum X}{N}$
$\bar{X} = \frac{8 + 45 + 49 + 54 + 79}{5} = \frac{235}{5} = 47$
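Ex 2. Find the arithmetic mean of the following values.
4350, 7200, 6750, 5480, 7940, 3820, 5920, 8450, 4900, 5350.
Sol: We know that $\bar{X} = \frac{\sum X}{N}$
$\bar{X} = \frac{4350 + 7200 + 6750 + 5480 + 7940 + 3820 + 5920 + 8450 + 4900 + 5350}{10} = \frac{60160}{10} = 6016$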
Short cut method: $\bar{X} = A + \frac{\sum d}{N}$
Where A is the assumed mean and d is the deviation of each item from the
assumed mean, i.e. d = (X - A).
Ex 3. Calculate the arithmetic mean from the following data.
2690, 3670, 4580, 5660, 2750, 2830, 4100, 5720, 5040, 4840
Sol: Consider the assumed mean, A = 5000
X       d = (X - A)
2690    -2310
3670    -1330
4580    -420
5660    660
2750    -2250
2830    -2170
4100    -900
5720    720
5040    40
4840    -160
        ∑d = -8120
$\bar{X} = A + \frac{\sum d}{N} = 5000 - \frac{8120}{10} = 5000 - 812 = 4188$
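The direct and short-cut methods can be compared with a minimal Python sketch. The function names `mean_direct` and `mean_assumed` are illustrative choices, not part of the text; the data are those of Ex 3.

```python
def mean_direct(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def mean_assumed(values, assumed):
    """Short-cut method: A + (sum of deviations d = X - A) / N."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)


data = [2690, 3670, 4580, 5660, 2750, 2830, 4100, 5720, 5040, 4840]
print(mean_direct(data))         # direct method
print(mean_assumed(data, 5000))  # short-cut method with A = 5000
```

Both calls should agree, since the choice of assumed mean only shifts the arithmetic, not the result.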
1.4.1 Arithmetic Mean Computed from Grouped Data:
Simple Arithmetic Mean – Discrete series:
When frequencies are given, the mean is calculated as
$\bar{X} = \frac{\sum fX}{N}$
f = Frequency;
X = the variable
N = Total number of observations, i.e. $\sum f$
Here, first multiply the frequency of each row by the variable and obtain the
total $\sum fX$, and then divide the total by the number of observations, i.e. the total
frequency.
Ex 4. Following are the marks obtained by 60 students. Calculate the
arithmetic mean.
Marks             15   30   45   60   70   80
No. of students    6   14   15   15    4    6
Sol: Let the marks be denoted by X and the number of students by f.
Marks X    No. of Students f    fX
15         6                    90
30         14                   420
45         15                   675
60         15                   900
70         4                    280
80         6                    480
           N = 60               ∑fX = 2845
$\bar{X} = \frac{\sum fX}{N} = \frac{2845}{60} = 47.42$
Short cut method: $\bar{X} = A + \frac{\sum fd}{N}$
Where A is the assumed mean and d is the deviation of each item from the
assumed mean, i.e. d = (X - A),
N = $\sum f$
Ex 5. Calculate the arithmetic mean by the short cut method using the data from
Ex. 4
Sol: Assumed mean, A = 45
Marks X    No. of Students f    d = (X - A)    fd
15         6                    -30            -180
30         14                   -15            -210
45         15                   0              0
60         15                   15             225
70         4                    25             100
80         6                    35             210
           N = 60                              ∑fd = 145
$\bar{X} = A + \frac{\sum fd}{N} = 45 + \frac{145}{60} = 47.4166$
Simple Arithmetic Mean – Continuous Series:
$\bar{X} = \frac{\sum fm}{N}$
m = mid-point of the various classes; f = the frequency of each class;
N = the total frequency
Here, first obtain the mid-point of each class and denote it by m.
Multiply these mid-points by the respective frequency of each class and
obtain the total $\sum fm$.
Divide the total by the sum of the frequencies, i.e. N.
Ex 5. From the following data compute the arithmetic mean.
Marks             0-10   10-20   20-30   30-40   40-50   50-60
No. of students   5      10      25      30      20      10
Sol:
Marks    Mid-point m    No. of Students f    fm
0-10     5              5                    25
10-20    15             10                   150
20-30    25             25                   625
30-40    35             30                   1050
40-50    45             20                   900
50-60    55             10                   550
                        N = 100              ∑fm = 3300
$\bar{X} = \frac{\sum fm}{N} = \frac{3300}{100} = 33$
Ex 6. From the following data compute the arithmetic mean.
Class Intervals   0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency         4      4       7       10      12      8       5
Sol:
Class Interval    Mid-point m    Frequency f    fm
0-10              5              4              20
10-20             15             4              60
20-30             25             7              175
30-40             35             10             350
40-50             45             12             540
50-60             55             8              440
60-70             65             5              325
                                 N = 50         ∑fm = 1910
$\bar{X} = \frac{\sum fm}{N} = \frac{1910}{50} = 38.2$
Short cut method: $\bar{X} = A + \frac{\sum fd}{N}$
Where A is the assumed mean and d is the deviation of each mid-point from the
assumed mean, i.e. d = (m - A), m = mid-point, N = $\sum f$
Ex 7. Calculate the arithmetic mean by the short cut method using the data of the
continuous-series Ex. 5 (marks 0-10 to 50-60).
Sol: Assumed mean, A = 35
Marks    Mid-point m    No. of Students f    d = (m - A)    fd
0-10     5              5                    -30            -150
10-20    15             10                   -20            -200
20-30    25             25                   -10            -250
30-40    35             30                   0              0
40-50    45             20                   10             200
50-60    55             10                   20             200
                        N = 100                             ∑fd = -200
$\bar{X} = A + \frac{\sum fd}{N} = 35 - \frac{200}{100} = 33$
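For a continuous series, the direct and short-cut computations can be sketched in Python as follows. The helper names are hypothetical; the classes and frequencies are those of the marks 0-10 to 50-60 example.

```python
def class_midpoints(classes):
    """Mid-point m of each (lower, upper) class interval."""
    return [(lower + upper) / 2 for lower, upper in classes]


def grouped_mean(classes, freqs):
    """Direct method: sum(f * m) / N, with N = sum(f)."""
    mids = class_midpoints(classes)
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


def grouped_mean_shortcut(classes, freqs, assumed):
    """Short-cut method: A + sum(f * d) / N, with d = m - A."""
    mids = class_midpoints(classes)
    total_fd = sum(f * (m - assumed) for f, m in zip(freqs, mids))
    return assumed + total_fd / sum(freqs)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 10, 25, 30, 20, 10]
print(grouped_mean(classes, freqs))               # direct method
print(grouped_mean_shortcut(classes, freqs, 35))  # assumed mean A = 35
```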
1.4.2 Properties of the Arithmetic Mean:
1. The sum of the deviations of the items from their arithmetic mean is always equal to
zero.
Symbolically, $\sum (X - \bar{X}) = 0$
Ex 8:
X              10    20    30    40    50    ∑X = 150
X - $\bar{X}$  -20   -10   0     10    20    ∑(X - $\bar{X}$) = 0
$\bar{X} = \frac{\sum X}{N} = \frac{150}{5} = 30$
When we calculate the deviations of all the items from their arithmetic
mean ($\bar{X}$ = 30), we find that the sum of the deviations from the arithmetic
mean is zero, i.e. $\sum (X - \bar{X}) = 0$
2. The sum of squared deviations of the items from the arithmetic mean is
minimum, that is, less than the sum of squared deviations of the items
from any other value.
Ex 9:
X         X - $\bar{X}$           (X - $\bar{X}$)^2
2         -2                      4
3         -1                      1
4         0                       0
5         1                       1
6         2                       4
∑X = 20   ∑(X - $\bar{X}$) = 0    ∑(X - $\bar{X}$)^2 = 10
$\bar{X} = \frac{\sum X}{N} = \frac{20}{5} = 4$
The sum of the squared deviations is equal to 10 in the above example. If
the deviations are taken from any other value, the sum of the squared
deviations would be greater than 10.
Let us calculate the squares of the deviations of the items from a value less
than the arithmetic mean, say 3.
X         X - 3          (X - 3)^2
2         -1             1
3         0              0
4         1              1
5         2              4
6         3              9
∑X = 20   ∑(X - 3) = 5   ∑(X - 3)^2 = 15
3. Arithmetic mean is NOT independent of change of origin.
If each observation of a series is increased (or decreased) by a constant,
then the mean of these observations is also increased (or decreased) by
that constant.
4. Arithmetic mean is NOT independent of change of scale.
If each observation of a series is multiplied (or divided) by constant, then
the mean of these observations is also multiplied (or divided) by that
constant.
5. If the arithmetic mean and the number of items of two or more related groups
are given, then we can compute the combined mean using the formula
given below.
$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$
Where
$\bar{X}_{12}$ = Combined mean of the two groups;
$N_1$ = Number of items in the first group; $N_2$ = Number of items in the
second group
$\bar{X}_1$ = Arithmetic mean of the first group; $\bar{X}_2$ = Arithmetic mean of the
second group
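Properties 1, 2 and 5 can be checked numerically. The sketch below is only an illustration: the helper names are hypothetical, and the split of the values 10, 20, 30, 40, 50 into two groups is an assumption made here to exercise the combined-mean formula.

```python
def sum_squared_deviations(values, centre):
    """Sum of squared deviations of the values from a chosen centre."""
    return sum((x - centre) ** 2 for x in values)


def combined_mean(n1, mean1, n2, mean2):
    """Combined mean of two groups: (N1*X1bar + N2*X2bar) / (N1 + N2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Property 1: deviations from the mean sum to zero.
x = [10, 20, 30, 40, 50]
mean_x = sum(x) / len(x)
print(sum(v - mean_x for v in x))

# Property 2: squared deviations are smallest about the mean (4), larger about 3.
y = [2, 3, 4, 5, 6]
print(sum_squared_deviations(y, 4), sum_squared_deviations(y, 3))

# Property 5: groups [10, 20] (mean 15) and [30, 40, 50] (mean 40) recombine to 30.
print(combined_mean(2, 15, 3, 40))
```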
1.5 WEIGHTED ARITHMETIC MEAN
The arithmetic mean gives equal importance to all the items. When the
items are not equally important, we compute the weighted
arithmetic mean. The term weight refers to the relative importance
of an item.
$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum wX}{\sum w}$
Where
$\bar{X}_w$ represents the weighted arithmetic mean; X represents the variable
values, i.e. $X_1, X_2, \dots, X_n$;
w represents the weights attached to the variable values, i.e. $w_1, w_2, \dots, w_n$
respectively.
To calculate the weighted arithmetic mean, multiply each weight by the
corresponding value of X and obtain the total $\sum wX$. Then divide this total by the sum of
the weights, i.e. $\sum w$.
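In the case of a frequency distribution, if $f_1, f_2, \dots, f_n$ are the frequencies of the
variable values $X_1, X_2, \dots, X_n$ respectively, then the weighted arithmetic
mean is given by
$\bar{X}_w = \frac{w_1(f_1 X_1) + w_2(f_2 X_2) + \dots + w_n(f_n X_n)}{w_1 + w_2 + \dots + w_n}$
$\bar{X}_w = \frac{\sum w(fX)}{\sum w}$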
Note: The simple arithmetic mean is equal to the weighted arithmetic mean if
all the weights are equal.
Ex. 10 Calculate the weighted mean for the following data.
X   1   2    5   7
W   2   14   8   32
Sol:
X    W         WX
1    2         2
2    14        28
5    8         40
7    32        224
     ∑W = 56   ∑WX = 294
$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{294}{56} = 5.25$
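Ex. 11 Calculate the weighted mean for the following data.
Wages per Day (X)     200   150   85
No. of workers (W)    25    20    10
Sol:
X      W         WX
200    25        5000
150    20        3000
85     10        850
       ∑W = 55   ∑WX = 8850
$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{8850}{55} = 160.91$
Ex. 12. Calculate the weighted mean for the following data and compare it
with the arithmetic mean.
Subject      Weight   Student X   Student Y   Student Z
Physics      2        72          42          52
Chemistry    3        75          52          62
Biology      5        58          88          68
Sol: For Student X,
Arithmetic Mean, $\bar{X}_X = \frac{\sum X}{N} = \frac{72 + 75 + 58}{3} = \frac{205}{3} = 68.33$
Weighted Arithmetic Mean, $\bar{X}_{wX} = \frac{\sum WX}{\sum W} = \frac{(2 \times 72) + (3 \times 75) + (5 \times 58)}{2 + 3 + 5} = \frac{144 + 225 + 290}{10} = \frac{659}{10} = 65.9$
For Student Y,
Arithmetic Mean, $\bar{X}_Y = \frac{\sum X}{N} = \frac{42 + 52 + 88}{3} = \frac{182}{3} = 60.67$
Weighted Arithmetic Mean, $\bar{X}_{wY} = \frac{\sum WX}{\sum W} = \frac{(2 \times 42) + (3 \times 52) + (5 \times 88)}{2 + 3 + 5} = \frac{84 + 156 + 440}{10} = \frac{680}{10} = 68$
For Student Z,
Arithmetic Mean, $\bar{X}_Z = \frac{\sum X}{N} = \frac{52 + 62 + 68}{3} = \frac{182}{3} = 60.67$
Weighted Arithmetic Mean, $\bar{X}_{wZ} = \frac{\sum WX}{\sum W} = \frac{(2 \times 52) + (3 \times 62) + (5 \times 68)}{2 + 3 + 5} = \frac{104 + 186 + 340}{10} = \frac{630}{10} = 63$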
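The comparison in Ex. 12 can be reproduced with a short Python sketch. The function name `weighted_mean` and the dictionary layout are illustrative assumptions; the marks and weights come from the table above.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


weights = [2, 3, 5]  # Physics, Chemistry, Biology
marks = {"X": [72, 75, 58], "Y": [42, 52, 88], "Z": [52, 62, 68]}

for student, scores in marks.items():
    simple = sum(scores) / len(scores)
    weighted = weighted_mean(scores, weights)
    print(student, round(simple, 2), round(weighted, 2))
```

Student Y has a lower simple mean than Student X but a higher weighted mean, because Y's best mark is in the most heavily weighted subject.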
1.6 MEDIAN
The median is the middle value of a distribution: a numeric value
that separates the higher half of a set from the lower half. The number of
observations above it is equal to the number of
observations below it. The median is thus a positional average.
For example, if the salaries of five employees are 6100, 7150, 7250, 7500 and
8500, the median is 7250.
When the number of observations is odd, the calculation of the
median is simple. When the number of observations is even, there
is no single middle value and the median is taken to be the
arithmetic mean of the two middlemost items.
If, in the above example, a sixth employee is added so that the salaries are 6100,
7150, 7250, 7500, 8500 and 9000, the median salary would be
Median = (7250 + 7500)/2 = 14750/2 = 7375
Hence, for an even number of observations the median is found by
averaging the two middle values.
Calculations of Median – Individual Observations:
Arrange the data in ascending or descending order of magnitude.
In a group composed of an odd number of values, adding 1 to the total
number of values and dividing by 2 gives the position of the median.
Median = size of the $\frac{N+1}{2}$th item
Ex. 13 From the following data, compute the median:
15, 9, 7, 23, 25, 25, 42, 25, 16, 14, 58, 25, 31
Sol: Arrange the numbers in ascending order: 7, 9, 14, 15, 16, 23, 25, 25,
25, 25, 31, 42, 58
Median = size of the $\frac{N+1}{2}$th item = $\frac{13+1}{2}$th item = 7th item = 25
Median = 25
The procedure for an even number of items is slightly different.
The median of a group composed of an even number of
items is the arithmetic mean of the two middle values, i.e. add the two
values in the middle and divide by 2.
Ex. 14 From the following data, compute the median:
451, 502, 523, 512, 622, 612, 754, 732, 701, 721
Sol: Arrange the numbers in ascending order
451, 502, 512, 523, 612, 622, 701, 721, 732, 754
Median = size of the $\frac{N+1}{2}$th item = $\frac{10+1}{2}$th item = 5.5th item
Size of the 5.5th item = (5th item + 6th item)/2 = (612 + 622)/2 = 1234/2
Median = 617
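The two cases (odd and even number of items) can be sketched in Python. The function name `median_individual` is a hypothetical choice; the data are those of Ex. 13 and Ex. 14.

```python
def median_individual(values):
    """Median of individual observations.

    Odd N: the middle item. Even N: the mean of the two middle items.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [15, 9, 7, 23, 25, 25, 42, 25, 16, 14, 58, 25, 31]
even_data = [451, 502, 523, 512, 622, 612, 754, 732, 701, 721]
print(median_individual(odd_data))   # 7th of 13 ordered items
print(median_individual(even_data))  # mean of the 5th and 6th ordered items
```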
Calculations of Median – Discrete Series:
Steps:
1. First arrange the data in ascending or descending order.
2. Find the cumulative frequencies.
3. Apply the formula: Median = size of the $\frac{N+1}{2}$th item
4. Find the value in the cumulative frequency column which is equal to
or next higher than $\frac{N+1}{2}$ and determine the value of the
variable corresponding to it. That gives the median value.
Ex. 15 From the following data, find the value of the median.
Income (Rs.)      450   500   630   550   710   580
No. of persons    29    31    21    25    11    35
Sol:
Income (Rs.), ascending order    No. of persons f    Cumulative Frequency c.f.
450                              29                  29
500                              31                  60
550                              25                  85
580                              35                  120
630                              21                  141
710                              11                  152
Median = size of the $\frac{N+1}{2}$th item = $\frac{152+1}{2}$th item = 76.5th item
Size of the 76.5th item = Rs. 550. This is the median income.
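The four steps for a discrete series can be sketched as below. This is a minimal illustration that follows the "equal to or next higher" rule stated in the steps; the function name `discrete_median` is an assumption, and the sketch does not average two values when the median position falls exactly between two different values.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))           # step 1: ascending order
    n = sum(f for _, f in pairs)
    position = (n + 1) / 2                       # step 3: (N + 1)/2-th item
    cumulative = accumulate(f for _, f in pairs)  # step 2: cumulative frequencies
    for (value, _), cf in zip(pairs, cumulative):
        if cf >= position:                       # step 4: first c.f. >= position
            return value


incomes = [450, 500, 630, 550, 710, 580]
persons = [29, 31, 21, 25, 11, 35]
print(discrete_median(incomes, persons))
```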
Calculations of Median – Continuous Series:
The following formula is used to calculate the median for a continuous series.
Median = $L + \frac{N/2 - c.f.}{f} \times i$
L = Lower limit of the median class; f = Simple freq. of the median class;
c.f. = Cumulative freq. of the class preceding the median class;
i = Class interval of the median class
Ex. 16 From the following data, find the value of the median.
Marks             70-80   60-70   50-60   40-50   30-40   20-30   10-20
No. of students   10      15      26      30      42      31      24
Sol: Arrange the data in ascending order
Marks    f     c.f.
10-20    24    24
20-30    31    55
30-40    42    97
40-50    30    127
50-60    26    153
60-70    15    168
70-80    10    178
Median = size of the $\frac{N}{2}$th item = $\frac{178}{2}$th item = 89th item
The median lies in the class 30-40, the first class whose cumulative frequency reaches 89.
Median = $L + \frac{N/2 - c.f.}{f} \times i$
L = 30, N/2 = 89, c.f. = 55, f = 42, i = 10
Median = $30 + \frac{89 - 55}{42} \times 10$
= 30 + 8.10 = 38.10
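The interpolation formula for a continuous series can be sketched in Python. The function name `grouped_median` and the representation of classes as (lower, upper) pairs are assumptions made for illustration; the classes must be supplied in ascending order.

```python
def grouped_median(classes, freqs):
    """Median = L + ((N/2 - c.f.) / f) * i for ascending class intervals."""
    n = sum(freqs)
    target = n / 2
    cf_before = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= target:  # this is the median class
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [24, 31, 42, 30, 26, 15, 10]
print(round(grouped_median(classes, freqs), 2))
```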
1.7 MODE
The mode or the modal value is that value in a series of observations
which occurs with the greatest frequency.
For example, the mode of the values 4, 6, 9, 6, 5, 6, 9, 4 would be 6.
Calculations of Mode – Discrete Series:
Ex. 17 From the following data, find the value of the mode.
Size of cloth             28   29   30   31   32   33
No. of persons wearing    15   25   45   70   55   20
Sol: The mode or modal size is 31 because the value 31 occurs the
maximum number of times.
Calculations of Mode – Continuous Series:
The following formula is used to calculate the mode for a continuous series.
Mode = $L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$,
L = Lower limit of the modal class; $f_1$ = freq. of the modal class;
$f_0$ = freq. of the class preceding the modal class;
$f_2$ = freq. of the class succeeding the modal class;
i = Class interval of the modal class
Ex. 18 From the following data, find the value of the mode.
Marks             0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80   80-90   90-100
No. of students   3      5       7       10      12      15      12      6       2       8
Sol: From the table, the modal class is 50-60, so L = 50, $f_1$ = 15, $f_0$ = 12, $f_2$ = 12 and i = 10.
Mode = $L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
= $50 + \frac{15 - 12}{2(15) - 12 - 12} \times 10$
= $50 + \frac{3}{6} \times 10$ = 55
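A minimal Python sketch of the modal-class formula follows. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as frequency 0 is an assumption of this sketch; the formula is undefined when the modal class and both neighbours share the same frequency.

```python
def grouped_mode(classes, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i for a continuous series."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                  # preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0     # succeeding class
    lower, upper = classes[k]
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


classes = [(lo, lo + 10) for lo in range(0, 100, 10)]  # 0-10, 10-20, ..., 90-100
freqs = [3, 5, 7, 10, 12, 15, 12, 6, 2, 8]
print(grouped_mode(classes, freqs))
```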
1.8 EMPIRICAL RELATION BETWEEN THE MEAN, MEDIAN, AND MODE
Karl Pearson has expressed the relationship between mean, median and
mode as follows:
Mode = Mean – 3 [Mean – Median]
Mode = 3 Median – 2 Mean
If we know any two of the three values, we can compute the third
from these relationships.
1.9 GEOMETRIC MEAN
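The geometric mean of a set of n observations is the nth root of their product.
G. M. = $\sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)}$
The G. M. of the 3 values 2, 4, 8 would be
G. M. = $\sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$
For calculation purposes, take the logarithm of both sides (N denotes the number of observations, or $\sum f$ in a series):
log G. M. = $\frac{\log X_1 + \log X_2 + \dots + \log X_N}{N}$
log G. M. = $\frac{\sum \log X}{N}$
G. M. = Antilog $\left[\frac{\sum \log X}{N}\right]$
In a discrete series, G. M. = Antilog $\left[\frac{\sum f \log X}{N}\right]$
In a continuous series, G. M. = Antilog $\left[\frac{\sum f \log m}{N}\right]$, where m is the mid-point of each class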
Calculations of Geometric Mean – Discrete/Individual Series:
Ex. 19 The daily incomes of ten families of a particular place are given below.
Calculate the geometric mean.
85 70 15 75 500 8 45 250 40 36
Sol:
X      log X
85     1.9294
70     1.8451
15     1.1761
75     1.8751
500    2.6990
8      0.9031
45     1.6532
250    2.3979
40     1.6021
36     1.5563
       ∑ log X = 17.6373
G. M. = Antilog $\left[\frac{\sum \log X}{N}\right]$
= Antilog $\left[\frac{17.6373}{10}\right]$
= Antilog (1.7637) = 58.04
Calculations of Geometric Mean – Continuous Series:
Ex 20. Calculate the geometric mean from the following data.
Marks       4-8   8-12   12-16   16-20   20-24   24-28   28-32   32-36   36-40
Frequency   8     12     20      30      15      12      10      6       2
Sol:
Marks    m.p. (m)    f          log m     f log m
4-8      6           8          0.7782    6.2256
8-12     10          12         1.0000    12.0000
12-16    14          20         1.1461    22.9220
16-20    18          30         1.2553    37.6590
20-24    22          15         1.3424    20.1360
24-28    26          12         1.4150    16.9800
28-32    30          10         1.4771    14.7710
32-36    34          6          1.5315    9.1890
36-40    38          2          1.5798    3.1596
                     N = 115              ∑ f log m = 143.0422
G. M. = Antilog $\left[\frac{\sum f \log m}{N}\right]$
= Antilog $\left[\frac{143.0422}{115}\right]$
= Antilog (1.2438) = 17.53
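The logarithm method can be sketched in Python with `math.log10`. The function name and the optional `freqs` argument are illustrative assumptions; with no frequencies supplied, each value counts once. Small differences from the hand computation are expected because the tables above use four-figure logarithms.

```python
import math


def geometric_mean(values, freqs=None):
    """G.M. = antilog(sum(f * log10(x)) / N), with N = sum(f)."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)


incomes = [85, 70, 15, 75, 500, 8, 45, 250, 40, 36]
print(round(geometric_mean(incomes), 2))

mid_points = [6, 10, 14, 18, 22, 26, 30, 34, 38]
frequencies = [8, 12, 20, 30, 15, 12, 10, 6, 2]
print(round(geometric_mean(mid_points, frequencies), 2))
```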
1.10 HARMONIC MEAN
The harmonic mean of a number of observations, none of which is zero, is the
reciprocal of the arithmetic mean of the reciprocals of the given values.
Thus, the harmonic mean (H. M.) of n observations $x_i$, i = 1, 2, ..., n is given
by
H. M. = $\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$
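Ex. 21 Find the harmonic mean of 4, 36, 45, 50, 75.
Sol: H. M. = $\frac{5}{\frac{1}{4} + \frac{1}{36} + \frac{1}{45} + \frac{1}{50} + \frac{1}{75}}$
= $\frac{5}{0.2500 + 0.0278 + 0.0222 + 0.0200 + 0.0133}$
= $\frac{5}{0.3333}$
= 15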
Calculations of Harmonic Mean – Discrete Series:
Formula for the harmonic mean in a discrete series:
H. M. = $\frac{N}{\sum \left(f \times \frac{1}{X}\right)} = \frac{N}{\sum \frac{f}{X}}$
Ex. 22 From the following data, find the harmonic mean.
Marks             10   20   30   40   50
No. of students   20   40   60   30   10
Sol:
Marks X    f          f/X
10         20         2
20         40         2
30         60         2
40         30         0.75
50         10         0.20
           N = 160    ∑ f/X = 6.95
H. M. = $\frac{N}{\sum \frac{f}{X}} = \frac{160}{6.95}$ = 23.02
Calculations of Harmonic Mean – Continuous Series:
Formula for the harmonic mean in a continuous series:
H. M. = $\frac{N}{\sum \frac{f}{m}}$, where m is the mid-point of each class
Ex. 23 From the following data, compute the value of the harmonic mean.
Class interval   10-20   20-30   30-40   40-50   50-60
Frequency        6       8       12      9       5
Sol:
Class Interval    Mid-point (m)    f         f/m
10-20             15               6         0.4000
20-30             25               8         0.3200
30-40             35               12        0.3429
40-50             45               9         0.2000
50-60             55               5         0.0909
                                   N = 40    ∑ f/m = 1.3538
H. M. = $\frac{N}{\sum \frac{f}{m}} = \frac{40}{1.3538}$ = 29.55
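The three harmonic-mean formulas differ only in what is used for the value (X or the mid-point m) and its frequency, so one sketch covers them. The function name and optional `freqs` argument are assumptions for illustration.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = N / sum(f / x), with N = sum(f); every x must be non-zero."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


print(harmonic_mean([4, 36, 45, 50, 75]))                     # individual observations
print(harmonic_mean([10, 20, 30, 40, 50], [20, 40, 60, 30, 10]))  # discrete series
print(harmonic_mean([15, 25, 35, 45, 55], [6, 8, 12, 9, 5]))      # class mid-points
```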
1.11 RELATION BETWEEN THE ARITHMETIC, GEOMETRIC AND HARMONIC MEAN
For a set of positive numbers, the arithmetic mean is greater than or equal to the
geometric mean, and the geometric mean is greater than or equal to the harmonic mean.
A.M. ≥ G.M. ≥ H.M.
The equality signs hold only if all the numbers $X_1, X_2, X_3, \dots, X_n$ are
identical.
1.12 ROOT MEAN SQUARE
The root mean square (RMS) is defined as the square root of the mean
square (the arithmetic mean of the squares of a set of numbers). It is also
called the quadratic average. It is sometimes denoted by $\sqrt{\overline{X^2}}$ and is
given by
RMS = $\sqrt{\overline{X^2}} = \sqrt{\frac{\sum_{i=1}^{N} X_i^2}{N}} = \sqrt{\frac{\sum X^2}{N}}$
It is very useful in fields that study sine waves, such as electrical engineering.
Ex. 24 Find the RMS of 1, 3, 5, 7 and 9
Sol: RMS = $\sqrt{\frac{\sum X^2}{N}}$
= $\sqrt{\frac{1^2 + 3^2 + 5^2 + 7^2 + 9^2}{5}}$
= $\sqrt{\frac{165}{5}}$
= $\sqrt{33}$ = 5.74
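As a quick check of the definition, the RMS can be sketched in Python. The function name `root_mean_square` is an illustrative choice.

```python
import math


def root_mean_square(values):
    """RMS: square root of the arithmetic mean of the squared values."""
    return math.sqrt(sum(x * x for x in values) / len(values))


print(round(root_mean_square([1, 3, 5, 7, 9]), 2))  # squares sum to 165, mean square 33
```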
1.13 QUARTILES, DECILES AND PERCENTILES
By definition, the median is the middle point that divides a
set of ordered data into two equal parts. In the same way, we can divide the
set into four equal parts; the three dividing values are called quartiles. These values, denoted by
$Q_1$, $Q_2$ and $Q_3$, are called the first, second and third quartiles
respectively. Similarly, the values that divide the data into 10 equal
parts are called deciles and are denoted by $D_1, D_2, \dots, D_9$, whereas the
values dividing the data into 100 equal parts are called percentiles and are
denoted by $P_1, P_2, \dots, P_{99}$. The second quartile, the fifth decile and the 50th percentile
all correspond to the median.
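Formulae:
In the formulae for individual observations and discrete series, the expression gives the position of the required item in the ordered data; a fractional position is taken as the next whole item (or the next higher cumulative frequency).
Quartiles:
For individual observations, $Q_i = \left(\frac{i}{4}\right) \times$ (No. of observations), i = 1, 2, 3
For discrete series, $Q_i = \left(\frac{i}{4}\right) N$, N = $\sum f$ and i = 1, 2, 3
For continuous series, $Q_i = L + \frac{\frac{iN}{4} - cf}{f} \cdot c$,
Where i = 1, 2, 3, c = size of the class interval,
L = Lower limit of the class interval in which the quartile lies,
f = freq. of the interval in which the quartile lies,
cf = cumulative freq. of the class preceding the quartile class.
Deciles:
For individual observations, $D_i = \left(\frac{i}{10}\right) \times$ (No. of observations), i = 1, 2, ..., 9
For discrete series, $D_i = \left(\frac{i}{10}\right) N$, N = $\sum f$ and i = 1, 2, ..., 9
For continuous series, $D_i = L + \frac{\frac{iN}{10} - cf}{f} \cdot c$, i = 1, 2, ..., 9
Percentiles:
For individual observations, $P_i = \left(\frac{i}{100}\right) \times$ (No. of observations), i = 1, 2, ..., 99
For discrete series, $P_i = \left(\frac{i}{100}\right) N$, N = $\sum f$ and i = 1, 2, ..., 99
For continuous series, $P_i = L + \frac{\frac{iN}{100} - cf}{f} \cdot c$, i = 1, 2, ..., 99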
Ex. 25 Find Q1, Q3, D1, D5, D8, P8, P50 and P85 of the
following data: 20, 30, 25, 23, 22, 32, 36.
Sol: Arrange the data in ascending order, n = 7, i.e. an odd number
20, 22, 23, 25, 30, 32, 36
Each position is rounded up to the next whole item.
q1  = (1/4) × 7    = 1.75   → 2nd item   Q1  = 22
q3  = (3/4) × 7    = 5.25   → 6th item   Q3  = 32
d1  = (1/10) × 7   = 0.7    → 1st item   D1  = 20
d5  = (5/10) × 7   = 3.5    → 4th item   D5  = 25
d8  = (8/10) × 7   = 5.6    → 6th item   D8  = 32
p8  = (8/100) × 7  = 0.56   → 1st item   P8  = 20
p50 = (50/100) × 7 = 3.5    → 4th item   P50 = 25
p85 = (85/100) × 7 = 5.95   → 6th item   P85 = 32
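The positional rule used in Ex. 25 (compute (i/k) × n and round up to the next whole item) can be sketched in Python. The function name and the `parts` parameter (4 for quartiles, 10 for deciles, 100 for percentiles) are assumptions for illustration; other textbooks and software use different interpolation conventions and may give slightly different values.

```python
import math


def positional_quantile(values, i, parts):
    """i-th quantile when the data are split into `parts` equal parts.

    Position = ceil(i * n / parts); the item at that position (1-based)
    in the ordered data is returned.
    """
    ordered = sorted(values)
    position = math.ceil(i * len(ordered) / parts)
    return ordered[position - 1]


data = [20, 30, 25, 23, 22, 32, 36]
print(positional_quantile(data, 1, 4), positional_quantile(data, 3, 4))  # Q1, Q3
print(positional_quantile(data, 5, 10))                                  # D5
print(positional_quantile(data, 85, 100))                                # P85
```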
Ex. 26 Find Q1, Q3, D4, P27 for the following data.
X       0   1    2    3    4     5     6     7     8
f       1   9    26   59   72    52    29    7     1
c.f.    1   10   36   95   167   219   248   255   256
Sol. We know that $Q_i = \left(\frac{i}{4}\right) N$, and here N = 256.
Q1 = (1/4) × 256 = 64 and the c.f. just greater than 64 is 95. Hence Q1 = 3
Q3 = (3/4) × 256 = 192 and the c.f. just greater than 192 is 219. Hence Q3 = 5
D4 = (4/10) × 256 = 102.4 and the c.f. just greater than 102.4 is 167. Hence D4 = 4
P27 = (27/100) × 256 = 69.12 and the c.f. just greater than 69.12 is 95. Hence P27 = 3
Ex. 27 Find Q1, Q3, D2, P90 for the following data.
Marks             Below 10   10-20   20-40   40-60   60-80   Above 80
No. of students   8          10      22      25      10      5
Sol: We know that $Q_i = L + \frac{\frac{iN}{4} - cf}{f} \cdot c$
Marks   Below 10   10-20   20-40   40-60   60-80   Above 80
f       8          10      22      25      10      5
cf      8          18      40      65      75      80
Q1 = Size of (N/4)th item = size of (80/4) = 20th item. Q1 lies in the class
20-40.
L = 20, N/4 = 20, cf = 18, f = 22 and c = 20
Q1 = 20 + {(20 - 18)/22} × 20 = 20 + 1.82 = 21.82
Q3 = Size of (3N/4)th item = size of (3 × 80/4) = 60th item. Q3 lies in the
class 40-60.
L = 40, 3N/4 = 60, cf = 40, f = 25 and c = 20
Q3 = 40 + {(60 - 40)/25} × 20 = 56
D2 = Size of (2N/10)th item = size of (2 × 80/10) = 16th item. D2 lies in the
class 10-20.
L = 10, 2N/10 = 16, cf = 8, f = 10 and c = 10
D2 = 10 + {(16 - 8)/10} × 10 = 18
P90 = Size of (90N/100)th item = size of (90 × 80/100) = 72nd item. P90 lies in
the class 60-80.
L = 60, 90N/100 = 72, cf = 65, f = 10 and c = 20
P90 = 60 + {(72 - 65)/10} × 20 = 74.
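The continuous-series formula is the same for quartiles, deciles and percentiles apart from the divisor, so a single sketch can cover all three. The function name is hypothetical, and the bounds 0 and 100 used for the open-ended classes "Below 10" and "Above 80" are assumptions of this sketch; they do not affect the four quantities computed here because none of them falls in an open-ended class.

```python
def grouped_quantile(classes, freqs, i, parts):
    """L + ((i*N/parts - cf) / f) * c for ascending (lower, upper) classes."""
    n = sum(freqs)
    target = i * n / parts
    cf_before = 0  # cumulative frequency of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= target:  # class containing the required item
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f


# Bounds 0 and 100 for the open-ended classes are assumed for illustration.
classes = [(0, 10), (10, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [8, 10, 22, 25, 10, 5]
print(round(grouped_quantile(classes, freqs, 1, 4), 2))   # Q1
print(grouped_quantile(classes, freqs, 3, 4))             # Q3
print(grouped_quantile(classes, freqs, 2, 10))            # D2
print(grouped_quantile(classes, freqs, 90, 100))          # P90
```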
1.14 SOFTWARE AND MEASURES OF CENTRAL TENDENCY
Many software packages are available to calculate measures of central
tendency. We can use Excel to calculate the standard measures of central
tendency (mean, median and mode). In Microsoft Excel, the mean can be
calculated by using one of the functions AVERAGE, AVERAGEA,
AVERAGEIF or AVERAGEIFS. The median can be calculated by using the
MEDIAN function. We can calculate the mode by using the MODE
function, and use GEOMEAN to calculate the geometric mean and HARMEAN to
calculate the harmonic mean.
We can use SPSS to calculate the standard measures of central tendency
(mean, median and mode). Go to the Statistics menu, select the
Analyze submenu, then the Descriptive Statistics submenu, and then
the Frequencies option. We can use MINITAB to calculate the standard
measures of central tendency using the functions Mean, Median, Mode
and GMEAN. To compute these, go to Stat - Tables - Descriptive Statistics.
Using R, one can easily obtain the value of the mean with the
summary function.
The median can also be found with the summary function in R. The
randomForest library can be used to impute missing values using the
median for numeric variables and the mode for categorical variables.
The mode can also be located graphically. Note that R's
mode function (mode()) does not return a modal value. It returns the data type
of the variable, which is not what we would expect from its name.
So how do we find the mode in R? We need to
use the table function. The table function in R
provides the frequency distribution of a variable, so the value with the
highest frequency is the modal value.
The geometric mean is the average recommended for finding
average growth (or decline) rates. It is defined as the nth root of the product
of n terms. Because it is defined in terms of a product, the observations
should not include zero or negative values. Base R has no built-in
function for its computation, but it can be found by applying the formula
directly in R.
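Python is not one of the packages discussed above, but as an additional illustration its standard-library `statistics` module offers comparable functions. The sketch below assumes Python 3.8 or later (for `geometric_mean`) and reuses values from earlier examples in this chapter.

```python
import statistics

values = [4, 6, 9, 6, 5, 6, 9, 4]
print(statistics.mean(values))    # arithmetic mean
print(statistics.median(values))  # middle of the ordered values
print(statistics.mode(values))    # most frequent value

data = [2, 4, 8]
am = statistics.mean(data)
gm = statistics.geometric_mean(data)
hm = statistics.harmonic_mean(data)
print(am, gm, hm)                 # expected ordering: A.M. >= G.M. >= H.M.
```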
1.15 SUMMARY
A measure of central tendency is a measure that tells us where the middle
of a group of data lies. The mean, median and mode are the most important
measures of central tendency. The complete dataset may be represented by
these values. The mean, median and mode need not have the
same value. The mean is sensitive to extreme data values. The median is a better
summary of a skewed distribution than the mean. It is possible that there
is no mode in the data. For data with no negative values, the mean is zero only if all the
data values are zero.
1.16 EXERCISE
1. Find the arithmetic mean of the following distribution:
X   10   30   50   70   89
f   7    8    10   15   10
2. Find the arithmetic mean of the following distribution:
X   3   9   12   14   15   17
f   1   3   4    1    4    2
3. Find the arithmetic mean of the following data.
Class Interval   15-25   25-35   35-45   45-55   55-65   65-75   75-85
Frequency        6       11      7       4       4       2       1
4. Find the arithmetic mean of the following data.
Class Interval   10-20   20-30   30-40   40-50   50-60
Frequency        30      27      14      17      2
5. Obtain the median for the following frequency distribution:
X   1   2    3    4    5    6    7    8   9
f   8   10   11   16   20   25   15   9   6
[Ans: Median = 5]
6. Obtain the median from the following data.
X   20-25   25-30   30-35   35-40   40-45   45-50   50-55   55-60
f   35      45      70      105     90      74      51      30
7. Find the mode for the following distribution.
Marks             0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of students   5      8       7       12      28      20      10      10
[Ans: Mode = 46.67]
8. Calculate the geometric mean from the following data.
125 1462 38 7 0.22 0.08 12.75 0.5 [Ans: 6.952]
9. Find the geometric mean, harmonic mean and root mean square of the
numbers 3, 5, 6, 6, 7, 10 and 12.
[Ans: G. M. = 6.43, H. M. = 5.87, RMS = 7.55]
10. Find the arithmetic mean, geometric mean and harmonic mean of the
numbers 2, 4 and 8. Check the relation between them.
11. Calculate the third quartile (Q3), the seventh decile (D7) and the 20th percentile (P20) from the following
data.
Class       2-4   4-6   6-8   8-10
Frequency   3     4     2     1
12. Calculate Q1, Q2, Q3, D1, D5, D9, P11, P65 from the following data.
Wages              No. of employees
250.00 - 259.99    8
260.00 - 269.99    10
270.00 - 279.99    16
280.00 - 289.99    14
290.00 - 299.99    10
300.00 - 309.99    5
310.00 - 319.99    2
1.17 REFERENCES
Fundamentals of Mathematical Statistics by S. C. Gupta and V. K. Kapoor
Statistical Methods by S. P. Gupta
Statistics by Murray R. Spiegel and Larry J. Stephens
*****
2
THE STANDARD DEVIATION AND
OTHER MEASURES OF DISPERSION
Unit structure
2.0 Objectives
2.1 Introduction
2.2 Dispersion, or Variation
2.3 Range
2.4 Semi-Interquartile Range
2.5 Mean Deviation
2.6 10–90 Percentile Range
2.7 Standard Deviation
2.8 Short Methods for Computing the Standard Deviation
2.9 Properties of the Standard Deviation
2.10 Variance
2.11 Charlier’s Check
2.12 Sheppard’s Correction for Variance
2.13 Empirical Relations between Measures of Dispersion
2.14 Absolute and Relative Dispersion
2.15 Coefficient of Variation
2.16 Standardized Variable and Standard Scores
2.17 Software and Measures of Dispersion
2.18 Summary
2.19 Exercise
2.20 Reference
2.0 OBJECTIVES
After going through this chapter, students will be able to learn:
To provide the importance of the concept of dispersion
To calculate range, semi-interquartile range, mean deviation
To explain why measures of dispersion must be reported in addition to
measures of central tendency
To calculate standard deviation, variance, standard scores
To trace precise relationship
To compare two or more series with regard to their variability
2.1 INTRODUCTION
The measures of central tendency, or averages, give us an idea of the
concentration of the observations about the central part of the distribution. But
an average alone cannot adequately describe a set of observations. It
must be supported and supplemented by other measures, called
measures of dispersion.
2.2 DISPERSION OR VARIATION
The literal meaning of dispersion is ‘scatteredness’. Two or more
distributions may have the same central value and still differ widely
in how the observations are spread around it. Measures of dispersion help us
in studying this important characteristic of a distribution.
Definitions of Dispersion:
1. “Dispersion is the measure of the variation of the items.” – A. L.
Bowley
2. “Dispersion is the measure of extent to which individual item vary.” –
L. R. Connor
3. “The degree to which numerical data tend to spread about an average
value is called variation or dispersion of the data”. – Spiegel
2.3 RANGE
Range is the difference between the two extreme observations of the
distribution. Symbolically,
Range = L – S, where L = largest item, S = smallest item
The relative measure corresponding to range is called the coefficient of
range.
Coefficient of range = $\frac{L - S}{L + S}$
Since range is based on only the two extreme observations, it is not a reliable
measure of dispersion.
Ex 1. From the following data, calculate the range and the coefficient of range.
Day     Mon   Tues   Wed   Thurs   Fri   Sat
Price   20    21     18    16      22    25
Sol: Range = L – S = 25 – 16 = 9
Coefficient of range = $\frac{L - S}{L + S} = \frac{25 - 16}{25 + 16} = \frac{9}{41}$ = 0.22
For a continuous series, find the difference between the upper limit of the
highest class and the lower limit of the lowest class.
Ex 2. From the following data, find the coefficient of range.
Marks             10-20   20-30   30-40   40-50   50-60
No. of Students   10      12      14      8       6
Sol: Here L = 60 (upper limit of the highest class) and S = 10 (lower limit of the lowest class).
Coefficient of range = $\frac{L - S}{L + S} = \frac{60 - 10}{60 + 10} = \frac{50}{70}$
= 0.71
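The range and its coefficient can be sketched in Python as follows. The function names are illustrative; for a continuous series the two class limits are passed in place of the raw data.

```python
def value_range(values):
    """Range = largest item - smallest item."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


prices = [20, 21, 18, 16, 22, 25]
print(value_range(prices), round(coefficient_of_range(prices), 2))

# Continuous series: use the lowest lower limit and the highest upper limit.
print(round(coefficient_of_range([10, 60]), 2))
```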
2.4 SEMI-INTERQUARTILE RANGE OR QUARTILE DEVIATION
The semi-interquartile range, or quartile deviation, is given by
Q. D. = $\frac{Q_3 - Q_1}{2}$
Quartile deviation is a better measure than the range as it makes use of the middle 50%
of the data. But since it ignores the other 50% of the data, it cannot be
considered a reliable measure.
The relative measure corresponding to Q. D. is called the coefficient of Q.
D.
Coefficient of Q. D. = $\frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
The coefficient of Q. D. can be used to compare the degree of variation in
different distributions.
Computation of Quartile Deviation - Individual Observations:
Ex. 3 Find the quartile deviation and the coefficient of quartile deviation
from the following data.
25 33 45 17 35 20 55
Sol: Arrange the data in ascending order:
17 20 25 33 35 45 55
Q1 = size of $\left[\frac{N+1}{4}\right]$th item = size of $\left[\frac{7+1}{4}\right]$th item = 2nd item
∴ Q1 = 20
Q3 = size of $3\left[\frac{N+1}{4}\right]$th item = size of $3\left[\frac{7+1}{4}\right]$th item = 6th item
∴ Q3 = 45
Q. D. = $\frac{Q_3 - Q_1}{2} = \frac{45 - 20}{2}$ = 12.5
Coefficient of Q. D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45 - 20}{45 + 20} = \frac{25}{65}$ = 0.385
Ex. 4 Find out Quartile Deviation and Coefficient of Quartile Deviation
from following data.
Marks 10 20 30 40 50 60 No. of Students 7 10 18 12 10 6
Sol: Marks 10 20 30 40 50 60 Frequency f 7 10 18 12 10 6 cf 7 17 35 47 57 63
Q1 = size of [ேାଵ
ସ] th item = size of [ଷାଵ
ସ] th item = 16th item
∴ Q1 = 20
Q3 = size of 3 [ேାଵ
ସ] th item = size of 3 [ଷାଵ
ସ] th item = 48th item
∴ Q3 = 50
Q. D. =ொయି ொభ
ଶ = ହିଶ
ଶ = 15
Coefficient of Q. D. = ொయି ொభ
ொయశ ೂభ
= ହିଶ
ହାଶ = ଷ
= 0.4285
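The (N + 1)/4 rule used in Ex. 3 and Ex. 4 can be sketched in Python. This is a simplified illustration: the function name is hypothetical, the sketch assumes (N + 1)/4 is a whole number as it is in both examples, and a discrete series is handled by expanding each value according to its frequency.

```python
def quartile_deviation(values):
    """Return (Q.D., coefficient of Q.D.) using the (N + 1)/4-th item rule.

    Assumes (N + 1) is divisible by 4; otherwise interpolation between
    neighbouring items would be needed.
    """
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered[(n + 1) // 4 - 1]      # (N + 1)/4-th item
    q3 = ordered[3 * (n + 1) // 4 - 1]  # 3(N + 1)/4-th item
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


print(quartile_deviation([25, 33, 45, 17, 35, 20, 55]))

# Discrete series of Ex. 4: repeat each mark according to its frequency.
marks = [10, 20, 30, 40, 50, 60]
students = [7, 10, 18, 12, 10, 6]
expanded = [m for m, f in zip(marks, students) for _ in range(f)]
print(quartile_deviation(expanded))
```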
Computation of Quartile Deviation - Continuous Series:
Ex. 5 Find the quartile deviation and the coefficient of quartile deviation
from the following data.
Marks             35-44   45-54   55-64   65-74   75-84
No. of Students   12      40      33      13      12
Sol:
Marks         35-44   45-54   55-64   65-74   75-84
Frequency f   12      40      33      13      12
cf            12      52      75      88      100
Q1 = size of $\left[\frac{N}{4}\right]$th item = size of $\left[\frac{100}{4}\right]$th item = 25th item
∴ Q1 lies in the class 45-54
Q1 = $L + \frac{N/4 - c.f.}{f} \times i$
L = 45, N/4 = 25, c.f. = 12 [c.f. of previous class], f = 40, i = 9
Q1 = $45 + \frac{25 - 12}{40} \times 9$ = 47.925
Q3 = size of $3\left[\frac{N}{4}\right]$th item = size of $3\left[\frac{100}{4}\right]$th item = 75th item
∴ Q3 lies in the class 55-64
Q3 = $L + \frac{3N/4 - c.f.}{f} \times i$
L = 55, 3N/4 = 75, c.f. = 52 [c.f. of previous class], f = 33, i = 9
Q3 = $55 + \frac{75 - 52}{33} \times 9$ = 61.2727
Q. D. = $\frac{Q_3 - Q_1}{2}$
= $\frac{61.2727 - 47.925}{2}$ = 6.67
Coefficient of Q. D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
= $\frac{61.2727 - 47.925}{61.2727 + 47.925} = \frac{13.3477}{109.1977}$ = 0.122
2.5 MEAN DEVIATION
Mean deviation is also known as the average deviation.
If $x_i \mid f_i$, i = 1, 2, …, n is the frequency distribution, then the mean deviation from the average A (usually the mean, median or mode) is given by
M. D. = $\frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - A|$, where $N = \sum f_i$.
Since mean deviation is based on all the observations, it is a better measure of dispersion than range and quartile deviation.
Note: Mean deviation is least when taken from the median.
The relative measure corresponding to the mean deviation is called the coefficient of mean deviation and is obtained by,
Coefficient of M. D. = $\frac{\text{M. D.}}{A}$, where A is the average (mean or median) from which the deviations are taken.
Computation of Mean deviation – Individual observations
M. D. = $\frac{1}{N}\sum |X - A| = \frac{1}{N}\sum |D|$, where $|D| = |X - A|$ is the modulus value or absolute value of the deviation, ignoring plus and minus signs.
Ex. 6 Calculate mean deviation and coefficient of mean deviation from
following data:
600, 620, 640, 660, 680
Sol: From above data, Median = 640
Data      Deviation from median 640, |D|
600       40
620       20
640        0
660       20
680       40
N = 5     ∑|D| = 120
M. D. = $\frac{1}{N}\sum |D| = \frac{120}{5}$ = 24
Coefficient of M. D. = $\frac{\text{M. D.}}{\text{Median}} = \frac{24}{640}$ = 0.0375
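The calculation in Ex. 6 can be reproduced with a few lines of R. This is a minimal sketch; the variable names are illustrative only.

```r
# Mean deviation about the median for individual observations (data of Ex. 6)
x   <- c(600, 620, 640, 660, 680)
med <- median(x)                 # average A taken as the median

md      <- mean(abs(x - med))    # (1/N) * sum of |D|
coef_md <- md / med              # coefficient of mean deviation

print(md)        # 24
print(coef_md)   # 0.0375
```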
Computation of Mean deviation – Discrete series:
M. D. = $\frac{1}{N}\sum f|D|$, where $|D| = |X - A|$
Ex. 7 Calculate mean deviation from following data.
X    20   21   22   23   24
f     6   15   21   15    6
Sol:
X      f     c.f.   |D|    f|D|
20     6      6     2      12
21    15     21     1      15
22    21     42     0       0
23    15     57     1      15
24     6     63     2      12
N = 63                     ∑f|D| = 54
Median = size of the $\frac{N+1}{2}$th item = size of the $\frac{63+1}{2}$th item = 32nd item
Size of the 32nd item is 22, hence Median = 22
M. D. = $\frac{1}{N}\sum f|D| = \frac{54}{63}$ = 0.857
Computation of Mean deviation – Continuous series:
Here we have to obtain the mid -point of the various classes and take
deviations of these points from median. Formula is same.
M. D. = $\frac{1}{N}\sum f|D|$
Ex. 8 Calculate median and mean deviation from following data.
Size        0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency     7      12      18      25      16      14       8
Sol:
Size     f     c.f.   m.p. (m)   |D| = |m − 35.2|   f|D|
0-10      7      7      5          30.2              211.4
10-20    12     19     15          20.2              242.4
20-30    18     37     25          10.2              183.6
30-40    25     62     35           0.2                5.0
40-50    16     78     45           9.8              156.8
50-60    14     92     55          19.8              277.2
60-70     8    100     65          29.8              238.4
N = 100                                              ∑f|D| = 1314.8
Median = size of the $\frac{N}{2}$th item = size of the $\frac{100}{2}$th item = 50th item
Median lies in the class 30 – 40
Median = $L + \frac{N/2 - \text{c.f.}}{f} \times i$
L = 30, N/2 = 50, c.f. = 37, f = 25, i = 10
Median = $30 + \frac{50 - 37}{25} \times 10$ = 35.2
M. D. = $\frac{1}{N}\sum f|D| = \frac{1314.8}{100}$ = 13.148
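The interpolation of the median and the mean deviation of Ex. 8 can be sketched in R as follows. This is an illustrative sketch, assuming equal-width exclusive classes; the names lower, cf_prev and so on are our own.

```r
# Mean deviation about the median for a continuous series (data of Ex. 8)
lower <- seq(0, 60, by = 10)            # lower class limits
f     <- c(7, 12, 18, 25, 16, 14, 8)    # class frequencies
i     <- 10                             # class interval
m     <- lower + i / 2                  # mid-points of the classes
N     <- sum(f)
cf    <- cumsum(f)                      # cumulative frequencies

k       <- which(cf >= N / 2)[1]        # index of the median class
cf_prev <- if (k > 1) cf[k - 1] else 0  # c.f. of the preceding class
med     <- lower[k] + (N / 2 - cf_prev) / f[k] * i

md <- sum(f * abs(m - med)) / N         # (1/N) * sum of f|D|

print(med)   # 35.2
print(md)    # 13.148
```

The same pattern gives Q1 and Q3 if N / 2 is replaced by N / 4 or 3 * N / 4.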
2.6 10 –90 PERCENTILE RANGE :
The 10 – 90 percentile range of a set of data is defined by,
10 – 90 percentile range = $P_{90} - P_{10}$
where $P_{10}$ and $P_{90}$ are the 10th and 90th percentiles of the data.
Semi 10 – 90 percentile range = $\frac{P_{90} - P_{10}}{2}$
2.7 STANDARD DEVIATION :
Standard deviation is the positive square root of the arithmetic mean of the
squares of the deviations of the given values from their arithmetic mean.
Standard deviation is also known as root mean square deviation, as it is the
square root of the mean of the squared deviations from the arithmetic mean.
Standard deviation is denoted by the small Greek letter 𝜎 (read as sigma).
Calculation of Standard Deviation - Individual Observations:
$\sigma = \sqrt{\frac{\sum x^2}{N}}$, where $x = X - \bar{X}$
Calculation of Standard Deviation - Discrete Series:
$\sigma = \sqrt{\frac{\sum f x^2}{N}}$, where $x = X - \bar{X}$
Calculation of Standard Deviation - Continuous Series:
$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$, where $d = \frac{m - A}{i}$, m = mid-point of the class, A = assumed mean, i = class interval
Ex. 9 Calculate mean and standard deviation from the following data.
Size        0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency     7      10      32      43      50      35      23
Sol:
Size     m.p. (m)    f     d = (m−35)/10    d²     fd     fd²
0-10       5          7    -3               9     -21      63
10-20     15         10    -2               4     -20      40
20-30     25         32    -1               1     -32      32
30-40     35         43     0               0       0       0
40-50     45         50     1               1      50      50
50-60     55         35     2               4      70     140
60-70     65         23     3               9      69     207
                  N = 200                      ∑fd = 116   ∑fd² = 532
Assumed mean, A = 35
$\bar{X} = A + \frac{\sum fd}{N} \times i = 35 + \frac{116}{200} \times 10$ = 40.8
$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{532}{200} - \left(\frac{116}{200}\right)^2} \times 10$
= $\sqrt{2.66 - 0.3364} \times 10$ = 1.5243 × 10 = 15.243
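The step-deviation working of Ex. 9 can be checked in R. The sketch below is only an illustration; it also cross-checks the result against the direct definition of σ (with divisor N), which should agree with the short formula.

```r
# Step-deviation method for a continuous series (data of Ex. 9)
m <- seq(5, 65, by = 10)                 # mid-points of the classes
f <- c(7, 10, 32, 43, 50, 35, 23)        # frequencies
A <- 35                                  # assumed mean
i <- 10                                  # class interval
d <- (m - A) / i                         # step deviations
N <- sum(f)

mean_x <- A + sum(f * d) / N * i
sigma  <- sqrt(sum(f * d^2) / N - (sum(f * d) / N)^2) * i

# Direct definition: root of the mean squared deviation from the mean
sigma_direct <- sqrt(sum(f * (m - mean_x)^2) / N)

print(mean_x)            # 40.8
print(round(sigma, 3))   # 15.243
```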
2.8 SHORT METHODS FOR COMPUTING THE STANDARD DEVIATION:
Calculation of Standard Deviation - Individual Observations :
When the actual mean is a fraction, e.g. 568.245, the calculations become
cumbersome. In such a case either the mean may be approximated or the
deviations may be taken from an assumed mean A. The formula when deviations
are taken from an assumed mean A is:
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$, where d = (X – A)
Ex. 10 Calculate standard deviation with the help of assumed mean.
340, 360, 390, 345, 355, 388, 372, 363, 377, 351
Sol: Consider assumed mean = 364
X       d = (X – 364)    d²
340     -24              576
360      -4               16
390      26              676
345     -19              361
355      -9               81
388      24              576
372       8               64
363      -1                1
377      13              169
351     -13              169
        ∑d = 1           ∑d² = 2689
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{2689}{10} - \left(\frac{1}{10}\right)^2} = \sqrt{268.9 - 0.01}$ = 16.398
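A short R sketch shows that the assumed-mean formula gives the same answer as the direct definition. It uses the ten values tabulated in the solution of Ex. 10; note, as an aside, that R's built-in sd() divides by N − 1, so it has to be rescaled to match the divisor N used in this unit.

```r
# Assumed-mean (short) method versus the direct definition (data of Ex. 10)
x <- c(340, 360, 390, 345, 355, 388, 372, 363, 377, 351)
A <- 364                     # assumed mean
d <- x - A                   # deviations from the assumed mean
N <- length(x)

sigma_short  <- sqrt(sum(d^2) / N - (sum(d) / N)^2)
sigma_direct <- sqrt(mean((x - mean(x))^2))      # divisor N
sigma_from_sd <- sd(x) * sqrt((N - 1) / N)       # sd() uses divisor N - 1

print(round(sigma_short, 3))    # 16.398
print(round(sigma_direct, 3))   # 16.398
```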
Calculation of Standard Deviation - Discrete Series :
Assumed mean method: $\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$, where d = (X – A)
Ex. 11 Calculate standard deviation from the following data.
Size        3.5   4.5   5.5   6.5   7.5   8.5   9.5
Frequency    4     8    21    60    85    30     9
Sol:
Size    f     d = (X − 6.5)    d²    fd     fd²
3.5      4    -3               9    -12     36
4.5      8    -2               4    -16     32
5.5     21    -1               1    -21     21
6.5     60     0               0      0      0
7.5     85     1               1     85     85
8.5     30     2               4     60    120
9.5      9     3               9     27     81
     N = 217                    ∑fd = 123   ∑fd² = 375
Assumed mean, A = 6.5
$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{375}{217} - \left(\frac{123}{217}\right)^2}$
= $\sqrt{1.7281 - 0.3212}$ = 1.1861
2.9 PROPERTIES OF THE STANDARD DEVIATION
1. Combined standard deviation: We can compute the combined standard
deviation of two or more groups. For two groups it is denoted by $\sigma_{12}$ and given by
$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$
where $\sigma_{12}$ = combined standard deviation;
$\sigma_1$ = standard deviation of first group;
$\sigma_2$ = standard deviation of second group;
$d_1 = |\bar{X}_1 - \bar{X}_{12}|$;
$d_2 = |\bar{X}_2 - \bar{X}_{12}|$, and $\bar{X}_{12}$ is the combined mean of the two groups.
2. The standard deviation of the first N natural numbers can be obtained by,
$\sigma = \sqrt{\frac{1}{12}(N^2 - 1)}$
Thus the standard deviation of the natural numbers 1 to 20 will be
$\sigma = \sqrt{\frac{1}{12}(20^2 - 1)} = \sqrt{\frac{399}{12}}$ = 5.77
3. Standard deviation is always computed from the arithmetic mean
because the sum of the squares of the deviations of items from their
arithmetic mean is minimum.
4. For normal distribution,
Mean ± 1 𝜎 covers 68.27% of the items.
Mean ± 2 𝜎 covers 95.45% of the items.
Mean ± 3 𝜎 covers 99.73% of the items.
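Properties 1 and 2 can be verified numerically. The R sketch below uses two small hypothetical groups (g1 and g2 are made-up data, not taken from the text) and a helper pop_sd() of our own that divides by N, as the formulas in this unit do.

```r
# Population standard deviation (divisor N), as used in this unit
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# Property 1: combined standard deviation of two hypothetical groups
g1 <- c(10, 12, 14, 16, 18)
g2 <- c(20, 25, 30)
n1 <- length(g1)
n2 <- length(g2)

mean12 <- (n1 * mean(g1) + n2 * mean(g2)) / (n1 + n2)   # combined mean
d1 <- abs(mean(g1) - mean12)
d2 <- abs(mean(g2) - mean12)

sigma12 <- sqrt((n1 * pop_sd(g1)^2 + n2 * pop_sd(g2)^2 +
                 n1 * d1^2 + n2 * d2^2) / (n1 + n2))

# The formula should agree with the SD of the pooled observations
print(all.equal(sigma12, pop_sd(c(g1, g2))))

# Property 2: standard deviation of the first N natural numbers
N <- 20
print(all.equal(pop_sd(1:N), sqrt((N^2 - 1) / 12)))
```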
2.10 VARIANCE
The square of standard deviation is called the variance and is given by,
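Variance = $\frac{\sum (X - \bar{X})^2}{N}$
i.e. Variance = $\sigma^2$ or $\sigma = \sqrt{\text{Variance}}$
In the frequency distribution where deviations are taken from assumed
mean,
Variance = $\left\{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2\right\} \times i^2$, where $d = \frac{m - A}{i}$ and i = class interval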
Ex. 12 Calculate the variance from the following data.
Marks             10-20   20-30   30-40   40-50   50-60   60-70
No. of students     2       6       8      12       7       5
Sol:
Marks    m.p. (m)    f     d = (m−35)/10    d²    fd    fd²
10-20    15           2    -2               4     -4      8
20-30    25           6    -1               1     -6      6
30-40    35           8     0               0      0      0
40-50    45          12     1               1     12     12
50-60    55           7     2               4     14     28
60-70    65           5     3               9     15     45
                  N = 40                      ∑fd = 31   ∑fd² = 99
Variance = $\left\{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2\right\} \times i^2$
= $\left\{\frac{99}{40} - \left(\frac{31}{40}\right)^2\right\} \times 10^2$
= (2.475 – 0.6006) × 100 = 187.44
2.11 CHARLIER'S CHECK
Some errors may be made while calculating the mean and standard
deviation by the short (coding) methods. The accuracy of the calculations can be
checked by using the following formulae, where u is the coded deviation.
$\sum f(u+1) = \sum fu + \sum f = \sum fu + N$
$\sum f(u+1)^2 = \sum f(u^2 + 2u + 1) = \sum fu^2 + 2\sum fu + \sum f = \sum fu^2 + 2\sum fu + N$
$\sum f(u+1)^3 = \sum fu^3 + 3\sum fu^2 + 3\sum fu + N$
Ex. 13 Use Charlier's check to verify the mean and the standard deviation.
Size    20-30   30-40   40-50   50-60   60-70   70-80   80-90
Freq      9      12       8      10      11      35      15
Sol:
Size     f     m.p. (m)   u = (m−55)/10   u+1   f(u+1)   u²    fu    fu²
20-30     9    25         -3              -2    -18      9    -27     81
30-40    12    35         -2              -1    -12      4    -24     48
40-50     8    45         -1               0      0      1     -8      8
50-60    10    55          0               1     10      0      0      0
60-70    11    65          1               2     22      1     11     11
70-80    35    75          2               3    105      4     70    140
80-90    15    85          3               4     60      9     45    135
N = ∑f = 100              ∑f(u+1) = 167         ∑fu = 67     ∑fu² = 423
∑f(u + 1) = 167
∑fu + N = 67 + 100 = 167
∴ ∑f(u + 1) = ∑fu + N
This provides the required check on the mean.
Size     f     m.p. (m)   u = (m−55)/10   u+1   f(u+1)   f(u+1)²
20-30     9    25         -3              -2    -18       36
30-40    12    35         -2              -1    -12       12
40-50     8    45         -1               0      0        0
50-60    10    55          0               1     10       10
60-70    11    65          1               2     22       44
70-80    35    75          2               3    105      315
80-90    15    85          3               4     60      240
N = ∑f = 100              ∑f(u+1) = 167         ∑f(u+1)² = 657
∑f(u + 1)² = 657
∑fu² + 2∑fu + N = 423 + 2 × 67 + 100 = 657
∴ ∑f(u + 1)² = ∑fu² + 2∑fu + N
This provides the required check on the standard deviation.
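Charlier's check is easy to automate. The following R sketch recomputes both sides of the first two identities for the data of Ex. 13; the names lhs1, rhs1 and so on are illustrative.

```r
# Charlier's check for the coded data of Ex. 13
f <- c(9, 12, 8, 10, 11, 35, 15)   # frequencies
m <- seq(25, 85, by = 10)          # mid-points
u <- (m - 55) / 10                 # coded deviations
N <- sum(f)

lhs1 <- sum(f * (u + 1))
rhs1 <- sum(f * u) + N

lhs2 <- sum(f * (u + 1)^2)
rhs2 <- sum(f * u^2) + 2 * sum(f * u) + N

print(c(lhs1, rhs1))   # 167 167  (check on the mean)
print(c(lhs2, rhs2))   # 657 657  (check on the standard deviation)
```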
2.12 SHEPPARD’S CORRECTION FOR VARIANCE
The computation of the standard deviation is somewhat in error as a result
of grouping the data into classes (grouping error). To adjust for grouping
error, we use the formula,
Corrected variance = variance from grouped data − $\frac{i^2}{12}$
where i is the class interval size. The correction $\frac{i^2}{12}$ is called Sheppard's
correction. It is used for distributions of continuous variables where the
tails tend to zero in both directions.
Ex. 14 Apply Sheppard's correction to determine the standard deviation
of the data in Ex. 9.
Sol: 𝜎 = 15.243 ∴ 𝜎² = 232.349 and i = 10.
Corrected variance = variance from grouped data − $\frac{i^2}{12}$
= 232.349 − $\frac{10^2}{12}$ = 224.016
Corrected standard deviation = √224.016 = 14.9671
2.13 EMPIRICAL RELATIONS BETWEEN MEASURES OF DISPERSION :
For a normal distribution, the three measures of dispersion are
approximately related as follows:
Q. D. = $\frac{2}{3}\sigma$ or $\sigma = \frac{3}{2}$ Q. D., and
M. D. = $\frac{4}{5}\sigma$ or $\sigma = \frac{5}{4}$ M. D.
The quartile deviation is smallest, the mean deviation next and the
standard deviation is largest.
2.14 ABSOLUTE AND RELATIVE DISPERSION:
Measures of dispersion may be either absolute or relative. Absolute
measures of dispersion are expressed in the same statistical unit in which
the original data are given, such as kilograms, tons, rupees etc. These
values may be used to compare the variations in two distributions provided
the variables are expressed in the same units and are of the same average size.
In case the two sets of data are expressed in different units, such as quintals
of sugar versus tons of sugarcane, the absolute measures of dispersion are
not comparable. In such cases measures of relative dispersion are used.
A measure of relative dispersion is the ratio of a measure of absolute
dispersion to an appropriate average. It is sometimes called the coefficient of
dispersion.
Relative dispersion = $\frac{\text{absolute dispersion}}{\text{average}}$
2.15 COEFFICIENT OF VARIATION:
The coefficient of variation is used in problems where we want to compare the
variability of two or more series. The series or group for which
the coefficient of variation is greater is said to be more variable, or less
consistent, less uniform, less stable or less homogeneous. The series for
which the coefficient of variation is smaller is said to be less variable, or more
consistent, more uniform, more stable or more homogeneous.
If the absolute dispersion is the standard deviation 𝜎 and the average is the
mean $\bar{X}$, then the relative dispersion is called the coefficient of variation. It is
denoted by C. V. and is given by,
Coefficient of variation (C. V.) = $\frac{\sigma}{\bar{X}} \times 100$
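To illustrate how the coefficient of variation is used to compare two series, the R sketch below works on two hypothetical sets of scores (the data and the helper names pop_sd and cv are invented for illustration). Both series have the same mean, so the one with the larger C. V. is the less consistent.

```r
# Comparing the consistency of two hypothetical series by their C.V.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))   # divisor N
cv     <- function(x) pop_sd(x) / mean(x) * 100     # coefficient of variation (%)

series_a <- c(40, 45, 50, 55, 60)   # hypothetical scores, mean 50
series_b <- c(10, 30, 50, 70, 90)   # hypothetical scores, mean 50

cv_a <- cv(series_a)
cv_b <- cv(series_b)

print(round(c(A = cv_a, B = cv_b), 2))   # A = 14.14, B = 56.57
print(if (cv_a < cv_b) "Series A is more consistent" else "Series B is more consistent")
```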
Ex. 15 Calculate arithmetic mean, standard deviation and coefficient of
variation.
Class   23-27  28-32  33-37  38-42  43-47  48-52  53-57  58-62  63-67  68-72
Freq      2      6      7     12     18     13      9      7      4      2
Sol:
Class    m.p. (m)    f     d = (m−50)/5    d²     fd     fd²
23-27    25           2    -5              25    -10      50
28-32    30           6    -4              16    -24      96
33-37    35           7    -3               9    -21      63
38-42    40          12    -2               4    -24      48
43-47    45          18    -1               1    -18      18
48-52    50          13     0               0      0       0
53-57    55           9     1               1      9       9
58-62    60           7     2               4     14      28
63-67    65           4     3               9     12      36
68-72    70           2     4              16      8      32
                  N = 80                      ∑fd = -54   ∑fd² = 380
Mean ($\bar{X}$) = $A + \frac{\sum fd}{N} \times i = 50 + \frac{-54}{80} \times 5$ = 46.625
S. D. (𝜎) = $\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{380}{80} - \left(\frac{-54}{80}\right)^2} \times 5$
= $\sqrt{4.75 - 0.4556} \times 5$ = 2.0723 × 5 = 10.36
C. V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{10.36}{46.625} \times 100$ = 22.22%
2.16 STANDARDIZED VARIABLE AND STANDARD SCORES
The variable that measures the deviation from the mean in units of the
standard deviation is called a standardized variable. It is independent of the
units used and is given by,
$z = \frac{X - \bar{X}}{\sigma}$
If the deviations from the mean are given in units of the standard
deviation, they are said to be expressed in standard units or standard
scores. These are of great value in the comparison of distributions. The
variable z is often used in educational testing, where it is called a
standard score.
Ex. 16 Your test score is 160 while the test has a mean of 120 and
standard deviation of 15. If the distribution is normal, what is your z
score? Explain the meaning of the result.
Sol: $z = \frac{X - \bar{X}}{\sigma} = \frac{160 - 120}{15}$ = 2.67
The score is 2.67 standard deviations above the mean.
Ex. 17 A student received a grade of 84 on a final examination in English
for which mean grade was 76 and the standard deviation was 10. On a
final examination in Science for which mean grade was 82 and the
standard deviation was 16, she received a grade of 90. In which subject
was her relative standing higher?
Sol: Standardized variable $z = \frac{X - \bar{X}}{\sigma}$
For English, $z = \frac{84 - 76}{10}$ = 0.8
For Science, $z = \frac{90 - 82}{16}$ = 0.5
Thus, the student had a grade of 0.8 of a standard deviation above the
mean in English but only 0.5 of a standard deviation above the mean in
Science. Thus her relative standing was higher in English.
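The comparison in Ex. 17 amounts to computing two standard scores. A minimal R sketch, with a helper function z_score() of our own naming:

```r
# Standard scores for the two examinations of Ex. 17
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_english <- z_score(84, mu = 76, sigma = 10)
z_science <- z_score(90, mu = 82, sigma = 16)

print(c(English = z_english, Science = z_science))   # 0.8 and 0.5
print(z_english > z_science)                         # TRUE: higher standing in English
```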
2.17 SOFTWARE AND MEASURES OF DISPERSION:
Statistical software provides a variety of measures of dispersion, usually
as part of its descriptive statistics. EXCEL and MINITAB allow the
computation of all the measures discussed above. The output from MINITAB
and STATISTIX, together with the accompanying graphics, helps clarify
statistical concepts that are otherwise hard to understand.
Calculating Range in Excel: Excel does not offer a function to compute the
range. However, we can easily compute it by subtracting the minimum
value from the maximum value. The formula is =MAX()-MIN(), where the
dataset is referenced in both parentheses. The =MAX() and =MIN()
functions find the maximum and the minimum points in the data. The
difference between the two is the range.
Microsoft Excel has two functions to compute quartiles, =QUARTILE.INC() and
=QUARTILE.EXC(). Both functions calculate the quartiles by calculating
the percentiles of the data. The inter-quartile range has to be calculated as
the difference between the quartile 3 and quartile 1 values. For the standard
deviation, Excel offers two functions: =STDEV.S() for the sample standard
deviation and =STDEV.P() for the population standard deviation. Similarly,
Excel offers two functions for the variance: =VAR.P() for the population
variance and =VAR.S() for the sample variance.
Minitab may be used to compute descriptive statistics for numeric variables,
including the mean, median, mode, standard deviation, variance and
coefficient of variation. To compute these go to Stat > Tables > Descriptive
Statistics. In the MINITAB output the measures of dispersion are labelled Q1,
Q3, Range, StDev, Variance and CoefVar.
You can use SPSS to calculate measures of dispersion such as the range,
semi-interquartile range, standard deviation and variance. From the Analyze
menu, select the Descriptive Statistics submenu and then the Frequencies
option.
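For readers who prefer R (introduced in the next unit), the same measures can be obtained with base functions. The sketch below is an addition to the text and reuses the data of Ex. 3. Two points are worth noting: var() and sd() use the sample divisor N − 1, whereas the formulas in this unit divide by N; and quantile() supports several interpolation rules, of which type = 6 corresponds to the (N + 1)/4 convention used here.

```r
# Measures of dispersion with base R (data of Ex. 3)
x <- c(25, 33, 45, 17, 35, 20, 55)
N <- length(x)

rng <- diff(range(x))                          # maximum - minimum
q   <- quantile(x, c(0.25, 0.75), type = 6)    # Q1 and Q3 at positions (N + 1)p
qd  <- unname(q[2] - q[1]) / 2                 # quartile deviation

s2_sample <- var(x)                            # divisor N - 1
s2_pop    <- s2_sample * (N - 1) / N           # divisor N, as in this unit
cv        <- sqrt(s2_pop) / mean(x) * 100      # coefficient of variation (%)

print(rng)
print(qd)
print(round(c(sd_pop = sqrt(s2_pop), cv = cv), 2))
```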
2.18 SUMMARY
A measure of dispersion indicates the scattering of data. Dispersion is the
extent to which values in a distribution differ from the average of the
distribution. A measure of dispersion gives us an idea of how the
individual items vary about the central value. The range and
interquartile range use only part of the data and so are of limited use in
measuring the dispersion of a set of data. The most useful measure, which
describes the dispersion of all the values, is the standard deviation or
variance. Dispersion can prove very effective in association with central
tendency in making any statistical decision.
2.19 EXERCISE
1. Calculate Quartile deviation (Q. D.) and Mean Deviation (M. D.) from
mean for the following data.
Marks             0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of Students     6      5       8      15       7       6       3
[Ans: Q.D. = 11.23, Mean = 33.4, M.D. from mean = 13.184]
2. Calculate Mean Deviation (M. D.) from mean for the following data.
Size   2   4   6   8   10   12   14   16
f      2   2   4   5    3    2    1    1
[Ans: Mean = 8, M.D. from mean = 2.8]
3. Calculate Mean Deviation and its coefficient from median for the
following data.
Size   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
Freq     5      8      12      15      20      14      12       6
[Ans: Median = 43, M.D. = 15.37, Coefficient of M. D. = 0.357]
4. Find the standard deviation of the following data.
i. 12, 6, 7, 3, 15, 10, 18, 5
ii. 9, 3, 8, 8, 9, 8, 9, 18
[Ans: i. St. dev. 𝜎 = 4.87, ii. St. dev. 𝜎 = 3.87]
5. Find the standard deviation of the following data.
Age              20-25   25-30   30-35   35-40   40-45   45-50
No. of persons    170     110      80      45      40      35
Take assumed average = 32.5
[Ans: Standard deviation 𝜎 = 7.936]
6. Calculate the standard deviation from the following data by short
method.
240.12, 240.13, 240.15, 240.12, 240.17, 240.15, 240.17, 240.16, 240.22,
240.21
7. Calculate standard deviation from the following data by short method.
Salary           45   50   55   60   65   70   75   80
No. of persons    3    5    8    7    9    7    4    7
[Ans: Standard deviation = 10.35]
8. Calculate arithmetic mean, standard deviation and coefficient of
variation.
Class       20-25   25-30   30-35   35-40   40-45
Frequency     1      22      64      10       3
[Ans: $\bar{X}$ = 32.1, S. D. (𝜎) = 3.441, C. V. = 10.72]
2.20 REFERENCE
FUNDAMENTAL OF MATHEMATICAL STATISTICS by S. C
Gupta and V. K. Kapoor
Statistical Methods by S. P. Gupta
STATISTICS by Murray R. Spiegel, Larry J. Stephens
3
INTRODUCTION TO R
Unit structure
3.0 Objectives
3.1 Introduction
3.2 Basic syntax
3.3 Data types
3.4 Variables
3.5 Operators
3.6 Control statements
3.7 R-functions
3.8 R –Vectors
3.9 R – lists
3.10 R Arrays
3.11 Summary
3.12 Exercise
3.13 References
3.0 OBJECTIVES
After going through this chapter, students will be able to:
1. Understand the different data types and variables in R.
2. Understand the basics of R programming in terms of operators and
control statements.
3. Use built-in and user-defined functions.
4. Understand the different data structures in R.
3.1 INTRODUCTION
R is a programming language and software environment for statistical
computing and graphics. It is an open source programming language. It
was designed by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand, in 1993. Version 3.1.2 was released on 31-Oct-2014 by
the R Development Core Team. R is widely used by researchers from diverse
disciplines to estimate and display results, and by teachers of statistics and
research methods. Today, millions of analysts, researchers, and brands
such as Facebook, Google, Bing, Accenture, and Wipro are using R to
solve complex issues. The applications of R are not limited to just one
sector; we can see the use of R in banking, e-commerce, finance, and
many more sectors.
It is freely available at www.r-project.org and can also be downloaded from the
CRAN (Comprehensive R Archive Network) website http://CRAN.R-project.org.
R Command Prompt:
We will be using RStudio. Once the R environment is set up, it is
easy to start the R command prompt by typing the following command at
the command prompt −
$ R
This will launch the R interpreter and you will get a prompt > where you can
start typing your program
> "Hello, World!"
[1] "Hello, World!"
Usually, we will write our code inside scripts, which are
called RScripts in R.
Write the code given below in a file
print("Hello, World!")
save it as myfirstprogram.R, and then run it at the command prompt by writing:
Rscript myfirstprogram.R
It will produce the following output
[1] "Hello, World!"
3.2 BASIC SYNTAX
Any program in R is made up of three things: Variables, Comments, and
Keywords. Variables are used to store the data.
Comments are used to improve code readability. They are like helping text
in your R program. A single-line comment is written using # at the beginning of
the statement.
E.g. # This is my first R program
Keywords are reserved words that hold a specific meaning to the interpreter.
A keyword cannot be used as a variable name or function name.
Following are the reserved words in R: if, else, while, repeat, for,
function, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA,
NA_integer_, NA_real_, NA_complex_, NA_character_ etc.
We can view these keywords by using either help(reserved) or ?reserved
R is a case-sensitive language.
3.3 DATA TYPES
Variables can store data of different types, and different types can do
different things. Variables are reserved memory locations used to store
values. When we create a variable in our program, some space is reserved in
memory.
Following are the data types used in R programming.
Data type    Example                      Description
Numeric      50, 25.65, 999               Decimal values
Logical      TRUE, FALSE                  Data with only two possible values, which can be construed as true/false
Character    'A', "Excellent", '50.50'    Used to represent string values
Integer      5L, 70L, 9876L               The suffix L tells R to store the value as an integer
Complex      x = 5+4i                     A complex value with a real and an imaginary part; i denotes the imaginary unit
Raw          -                            Used to hold raw bytes
We can use the class() function to check the data type of a variable.
# numeric
a <- 25.5
class(a)
# complex
a <- 10+5i
class(a)
# integer
a <- 100L
class(a)
# logical/boolean
a <- TRUE
class(a)
# character/string
a <- "I am doing R programming"
class(a)
Output:
[1] "numeric"
[1] "complex"
[1] "integer"
[1] "logical"
[1] "character"
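Values can also be tested for, and converted between, these types. The following sketch uses the standard typeof(), is.*() and as.*() functions; the variable names are arbitrary.

```r
# Inspecting and converting data types
x <- 25.5
print(typeof(x))             # "double": numeric values are stored as doubles
print(is.numeric(x))         # TRUE

y <- as.integer(x)           # conversion drops the fractional part
print(y)                     # 25
print(class(y))              # "integer"

s <- "50.50"                 # a character string that looks like a number
print(is.character(s))       # TRUE
print(as.numeric(s) + 1)     # 51.5 after conversion to numeric
```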
3.4 VARIABLES
Variables are used to store the information to be manipulated in the R
program. A variable in R can store an atomic vector, a group of atomic
vectors or a combination of many R objects. A valid variable name
consists of letters, numbers and the dot or underscore characters. The
variable name must start with a letter, or with a dot not followed by a number.
Ex – valid - a, a_b, a.b, a1, a1., a.c
Invalid - 2a, _a
R does not have a command for declaring a variable. A variable is created
the moment you first assign a value to it.
In R, the assignment can be denoted in three ways:
1. = (Simple Assignment)
2. <- (Leftward Assignment)
3. -> (Rightward Assignment)
name = "Ajay"
gender <- "Male"
age <- 25
Here, name, gender and age are variables and "Ajay", "Male", 25 are
values.
To print/output a variable, you do not need any function. You can just type
the name of the variable.
name = "Ajay"
name
O/P:
[1] "Ajay" # auto-prints the value of the variable name
However, R has the print() and cat() functions, which are used to print the
value of a variable. The cat() function combines multiple values into a
continuous print output.
cat("My name is", name, "\n")
cat("My age is", age, "\n")
O/P: My name is Ajay
My age is 25
ls() function: To know all the variables currently available in the
workspace, use the ls() function.
# using equal to operator
a = "Good morning"
# using leftward operator
b <- "Good morning"
# using rightward operator
"Good morning" -> c
print(ls())
O/P: [1] "a" "b" "c"
# List the variables whose names contain the pattern "var".
> print(ls(pattern = "var"))
Variables whose names start with a dot (.) are hidden; they can be listed using
the all.names = TRUE argument of the ls() function.
> print(ls(all.names = TRUE))
rm() function: This is a built-in function used to delete unwanted
variables.
> rm(variable)
# using equal to operator
a = "Good morning"
# using leftward operator
b <- "Good morning"
# using rightward operator
"Good morning" -> c
# Removing variable
rm(a)
print(a)
O/P: Error in print(a) : object 'a' not found
All the variables can be deleted by using the rm() and ls() functions
together.
> rm(list = ls())
> print(ls())
3.5 OPERATORS
Operators are the symbols that direct the interpreter to perform various
kinds of operations on the operands. There are different types of
operators, and each operator performs a different task. Operators carry out
the various mathematical, logical, and decision operations on
complex, integer, and numeric values supplied as operands.
Types of operators used in R programming:
• Arithmetic Operators
• Relational Operators
• Logical Operators
• Assignment Operators
• Miscellaneous Operators
Arithmetic Operators:
Arithmetic operators are used with numeric values to perform common
mathematical operations.
<-, =   Assignment
        a <- 5; b = 10
+       Addition
        x <- c(2, 5.5, 6); y <- c(8, 3, 4); print(x + y)
        # O/P [1] 10.0 8.5 10.0
-       Subtraction
        x <- c(2, 5.5, 6); y <- c(8, 3, 4); print(x - y)
*       Multiplication
        x <- c(2, 5.5, 6); y <- c(8, 3, 4); print(x * y)
/       Division
        x <- c(2, 5.5, 6); y <- c(8, 3, 4); print(x / y)
%%      Remainder
        x <- c(2, 5.5, 6); y <- c(8, 3, 4); print(x %% y)
        # O/P [1] 2.0 2.5 2.0
%/%     Integer quotient
        x <- c(2, 5.5, 6); y <- c(8, 3, 4); print(x %/% y)
        # O/P [1] 0 1 1
^, **   Exponent
        x <- c(2, 5.5, 6); y <- c(8, 3, 4); print(x^y)
        # O/P [1] 256.000 166.375 1296.000
Relational Operators: Relational/comparison operators are used to
compare two values.
>       Greater than
        x <- c(2, 5.5, 6, 9); y <- c(8, 2.5, 14, 9); print(x > y)
        # O/P [1] FALSE TRUE FALSE FALSE
<       Less than
        x <- c(2, 5.5, 6, 9); y <- c(8, 2.5, 14, 9); print(x < y)
        # O/P [1] TRUE FALSE TRUE FALSE
<=      Less than or equal to
        x <- c(2, 5.5, 6, 9); y <- c(8, 2.5, 14, 9); print(x <= y)
        # O/P [1] TRUE FALSE TRUE TRUE
>=      Greater than or equal to
        x <- c(2, 5.5, 6, 9); y <- c(8, 2.5, 14, 9); print(x >= y)
        # O/P [1] FALSE TRUE FALSE TRUE
==      Equal
        x <- c(2, 5.5, 6, 9); y <- c(8, 2.5, 14, 9); print(x == y)
        # O/P [1] FALSE FALSE FALSE TRUE
!=      Not equal
        x <- c(2, 5.5, 6, 9); y <- c(8, 2.5, 14, 9); print(x != y)
        # O/P [1] TRUE TRUE TRUE FALSE
Logical Operators: Logical operators are used to combine conditional
statements. Zero is treated as FALSE and any non-zero value as TRUE.
&       Element-wise logical AND
        x <- c(3, 1, TRUE, 2+3i); y <- c(4, 1, FALSE, 2+3i); print(x & y)
        # O/P [1] TRUE TRUE FALSE TRUE
|       Element-wise logical OR
        x <- c(3, 0, TRUE, 2+2i); y <- c(4, 5, FALSE, 2+3i); print(x | y)
        # O/P [1] TRUE TRUE TRUE TRUE
!       Element-wise logical NOT
        x <- c(8, 0, FALSE, 4+4i); print(!x)
        # O/P [1] FALSE TRUE TRUE FALSE
&&      Logical AND of two single values. It returns TRUE only if both are TRUE.
        Older versions of R used the first element of each vector; since R 4.3.0
        each operand must have length one, so the elements are selected explicitly.
        x <- c(3, 0, TRUE, 8+9i); y <- c(1, 3, TRUE, 3+4i); print(x[1] && y[1])
        # O/P [1] TRUE
||      Logical OR of two single values. It returns TRUE if at least one of them is TRUE.
        x <- c(4, 0, TRUE, 8+9i); y <- c(3, 5, TRUE, 2+3i); print(x[1] || y[1])
        # O/P [1] TRUE
Miscellaneous Operators: Miscellaneous operators are used to manipulate
data:
:       Creates a series of numbers in sequence
        x <- 2:9; print(x)
        # O/P [1] 2 3 4 5 6 7 8 9
%in%    Finds out if an element belongs to a vector
        x <- 8; y <- 12; z <- 1:10; print(x %in% z); print(y %in% z)
        # O/P [1] TRUE
        #     [1] FALSE
%*%     Matrix multiplication; for example, it can be used to multiply a matrix
        with its transpose
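The difference between %*% and the element-wise * operator is easiest to see on a small matrix. The following sketch uses an arbitrary 2 × 2 matrix M of our own choosing.

```r
# Matrix multiplication (%*%) versus element-wise multiplication (*)
M <- matrix(1:4, nrow = 2)   # filled column by column: rows are (1, 3) and (2, 4)

P <- M %*% t(M)              # matrix product of M with its transpose
E <- M * M                   # element-wise product

print(P)   # rows: (10, 14) and (14, 20)
print(E)   # rows: (1, 9) and (4, 16)
```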
3.6 CONTROL STATEMENTS
Control statements are expressions used to control the execution and flow
of the program based on the conditions provided in the statements.
if condition: The if statement checks whether the expression provided in the
parentheses is true or not. The block of code inside the if statement will be
executed only when the expression evaluates to TRUE.
Syntax:
if (expression) {
  # statements will execute if expression is true.
}
a <- 500
if (a > 100) {
  print(paste(a, "is greater than 100"))
}
O/P: [1] "500 is greater than 100"
if ….. else condition: If the expression evaluates to TRUE, then the if block
of code will be executed, otherwise the else block of code will be executed.
Syntax:
if (expression) {
  # statements will execute if expression is true.
} else {
  # statements will execute if expression is false.
}
a <- 500
if (a > 100) {
  print(paste(a, "is greater than 100"))
} else {
  print(paste(a, "is not greater than 100"))
}
O/P: [1] "500 is greater than 100"
Repeat loop: The repeat loop executes the same code again and again until a
stop condition is met.
Syntax:
repeat {
  commands
  if (condition) {
    break
  }
}
a <- 1
repeat {
  print(a)
  a = a + 1
  if (a > 5) {
    break
  }
}
O/P:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
return statement: The return statement is used to return the result of an
executed function and to return control to the calling function.
Syntax:
return(expression)
Example:
func <- function(a) {
  if (a > 0) {
    return("POSITIVE")
  } else if (a < 0) {
    return("NEGATIVE")
  } else {
    return("ZERO")
  }
}
func(1)
func(0)
func(-1)
O/P:
[1] "POSITIVE"
[1] "ZERO"
[1] "NEGATIVE"
next statement: The next statement is useful when we want to skip the current
iteration of a loop without terminating it.
Syntax:
next
Example:
a <- 1:8
# Print even numbers
for (i in a) {
  if (i %% 2 != 0) {
    next
  }
  print(i)
}
O/P:
[1] 2
[1] 4
[1] 6
[1] 8
break statement: The break keyword is a jump statement that is used to
terminate the loop at a particular iteration.
Syntax:
if (test_expression) {
  break
}
switch Statement: A switch statement is a selection control mechanism.
Switch case is a multiway branch statement. It allows a variable to be
tested for equality against a list of values. If there is more than one match
for a specific value, then the switch statement will return the first match
found.
Syntax:
switch(expression, case1, case2, case3, ……)
Example:
a <- switch(2, "Nagpur", "Mumbai", "Delhi", "Raipur")
print(a)
O/P:
[1] "Mumbai"
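switch() can also select by name when the expression is a character string. The sketch below is illustrative only: the function city_state() and its mapping are our own example. An empty case falls through to the next one, a final unnamed argument acts as the default, and a numeric index with no matching case returns NULL.

```r
# switch() on a character value, with fall-through and a default case
city_state <- function(city) {
  switch(city,
         "Nagpur" = ,                    # empty case: falls through to "Mumbai"
         "Mumbai" = "Maharashtra",
         "Raipur" = "Chhattisgarh",
         "Unknown")                      # unnamed last argument is the default
}

print(city_state("Nagpur"))    # "Maharashtra"
print(city_state("Raipur"))    # "Chhattisgarh"
print(city_state("Delhi"))     # "Unknown"

# A numeric index beyond the listed cases yields NULL
print(is.null(switch(5, "Nagpur", "Mumbai")))   # TRUE
```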
while loop: The while loop executes the same code again and again as long as
the test condition is true.
Syntax:
while (test_expression) {
  statement
}
Example:
a <- c("Hello", "World")
count <- 1
while (count < 5) {
  print(a)
  count = count + 1
}
O/P:
[1] "Hello" "World"
[1] "Hello" "World"
[1] "Hello" "World"
[1] "Hello" "World"
for loop: The for loop can be used to execute a group of statements
repeatedly, depending upon the number of elements in the object. It is an
entry-controlled loop: the test condition is tested first, and then the
body of the loop is executed. The loop body is not executed if the test
condition is false.
Syntax:
for (value in vector) {
  statements
}
Example:
v <- LETTERS[1:5]
for (x in v) {
  print(x)
}
O/P:
[1] "A"
[1] "B"
[1] "C"
[1] "D"
[1] "E"
Example:
for (x in c(-5, 8, 9, 11)) {
  print(x)
}
O/P:
[1] -5
[1] 8
[1] 9
[1] 11
Nested for-loop: Nested loops are similar to simple loops. Nested means
a loop inside another loop. R programming allows using one loop inside another
loop. In loop nesting, we can put any type of loop inside any other type
of loop. For example, a while loop can be inside a for loop or vice versa.
Moreover, nested loops are commonly used to manipulate matrices.
for (i in 1:3) {
  for (j in 1:i) {
    print(i * j)
  }
}
O/P:
[1] 1
[1] 2
[1] 4
[1] 3
[1] 6
[1] 9
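As an illustration of using nested loops with a matrix, the sketch below fills a 3 × 3 multiplication table one cell at a time. The name tab is arbitrary, and the comparison with outer() is included only as a cross-check.

```r
# Filling a matrix with nested for loops
tab <- matrix(0, nrow = 3, ncol = 3)

for (i in 1:3) {          # outer loop runs over the rows
  for (j in 1:3) {        # inner loop runs over the columns
    tab[i, j] <- i * j
  }
}

print(tab)
# The built-in outer() builds the same table without explicit loops
print(all.equal(tab, outer(1:3, 1:3)))
```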
3.7 R-FUNCTIONS
A set of statements which are organized together to perform a specific task
is known as a function. R has a large number of built-in
functions, and the user can create their own functions. An R function is
created by using the keyword function.
Syntax:
function_name <- function(arg1, arg2, …..) {
  function body
}
The different components of a function are -
Function name is the actual name of the function.
An argument is a placeholder. Arguments are optional, which means
a function may or may not contain arguments, and arguments
can also have default values.
The function body contains a set of statements which defines what the
function does.
Return value is the last expression in the function body which is to be
evaluated.
R has two types of function, i.e. built-in functions and user-defined
functions.
Built-in function: The functions which are already defined in the
programming framework are known as built-in functions. Simple
examples of built-in functions are seq(), mean(), max(), sum(), paste(…)
etc. They are directly called by user-written programs.
print(seq(50, 60))
O/P: [1] 50 51 52 53 54 55 56 57 58 59 60
print(mean(c(30, 40)))
O/P: [1] 35
Note that the values are passed to mean() as a single vector created with c();
in mean(30, 40) the second value would not be treated as data.
User-defined Function: R allows us to create our own functions in our
program. They are specific to what a user wants, and once created they can
be used like built-in functions.
Example:
areaofCircle <- function(radius) {
  area = pi * radius^2
  return(area)
}
print(areaofCircle(2))
O/P: [1] 12.56637
Example:
# create a function without an argument
a.function <- function() {
  for (i in 1:5) {
    b <- i^2
    print(b)
  }
}
# call the function a.function without supplying an argument
a.function()
O/P
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
# create a function with an argument
a.function <- function(a) {
  for (i in 1:a) {
    b <- i^2
    print(b)
  }
}
# call the function a.function, supplying 5 as an argument
a.function(5)
O/P
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
Calling a function with argument values:
# create a function with arguments
a.function <- function(x, y, z) {
  result <- x * y + z
  print(result)
}
# call the function by position of arguments
a.function(4, 2, 10)
# call the function by names of the arguments
a.function(x = 10, y = 4, z = 2)
O/P:
[1] 18
[1] 42
Calling a function with default arguments:
# create a function with default argument values
a.function <- function(x = 5, y = 7) {
  result <- x * y
  print(result)
}
# call the function without giving any arguments
a.function()
# call the function with new values for the arguments
a.function(10, 6)
O/P:
[1] 35
[1] 60
3.8 R – VECTORS
A vector is the basic data structure in R: a sequence of elements that
share the same data type. A vector can hold logical, integer, double,
character, complex, or raw data. The length of a vector is the number of
elements it contains, and it is obtained with the length() function.
Vectors are classified into two kinds: atomic vectors and lists. The key
difference is that all elements of an atomic vector are of the same type,
whereas the elements of a list may be of different data types. The elements
contained in a vector are known as its components. We can check the type
of a vector with the typeof() function.
Creation of atomic vectors
Single-element vector: when you write just one value, it becomes a
vector of length 1.
print("xyz")
print(25.5)
print(TRUE)
O/P:
[1] "xyz"
[1] 25.5
[1] TRUE
Multiple-element vectors:
1. Using the colon (:) operator:
# Creating a sequence from 1 to 8
v <- 1:8
print(v)
# Creating a sequence from 1.5 to 8.5
v <- 1.5:8.5
print(v)
O/P:
[1] 1 2 3 4 5 6 7 8
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5
2. Using the seq() function:
# Create a vector from 1 to 5 incrementing by 0.6
print(seq(1, 5, by = 0.6))
O/P: [1] 1.0 1.6 2.2 2.8 3.4 4.0 4.6
3. Using the c() function: non-character values are converted to
character type if any one of the elements is a character.
x <- c('mango', 'yellow', 10, TRUE)
print(x)
O/P:
[1] "mango"  "yellow" "10"     "TRUE"
Accessing Vector Elements: Elements of a vector are accessed using
indexing. The [ ] brackets are used for indexing, and indexing starts at
position 1. A negative index drops that element from the result. Logical
values (TRUE/FALSE) can also be used for indexing. Flags written as 0 and 1
must first be converted with as.logical(), because a numeric index of 0
selects nothing and a numeric index of 1 selects the first element.
# Accessing vector elements using position
x <- c("Jan", "Feb", "Mar", "April", "May", "Jun", "July", "Aug",
       "Sept", "Oct", "Nov", "Dec")
a <- x[c(2, 4, 8)]
print(a)
# Accessing vector elements using logical indexing
b <- x[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE,
        FALSE, TRUE, TRUE, FALSE, FALSE)]
print(b)
# Accessing vector elements using negative indexing
c <- x[c(-1, -3, -4, -8, -9, -10, -11)]
print(c)
# Accessing vector elements using 0/1 flags converted to logical values
d <- x[as.logical(c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1))]
print(d)
O/P:
[1] "Feb"   "April" "Aug"
[1] "Jan"  "Feb"  "May"  "Sept" "Oct"
[1] "Feb"  "May"  "Jun"  "July" "Dec"
[1] "Feb" "May" "Dec"
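The difference between numeric and logical indices can be checked directly. The sketch below uses a shortened month vector; the names months3, flags, by_number and by_flag are assumptions made for this illustration.

```r
months3 <- c("Jan", "Feb", "Mar")
flags <- c(0, 1, 1)

# Numeric index: 0 selects nothing, and each 1 selects the first element
by_number <- months3[flags]

# Logical index: TRUE keeps the element at that position
by_flag <- months3[as.logical(flags)]

print(by_number)
print(by_flag)
```

Here by_number repeats the first month, while by_flag keeps the second and third months, which is usually what 0/1 flags are meant to express.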
Vector Manipulation:
Vector arithmetic: Two vectors of the same length can be added, subtracted,
multiplied or divided, giving the result as a vector.
# create two vectors
x <- c(2, 7, 3, 4, 0, 10)
y <- c(3, 10, 0, 7, 1, 1)
add.result <- x + y
print(add.result)
multi.result <- x * y
print(multi.result)
O/P:
[1] 5 17 3 11 1 11
[1] 6 70 0 28 0 10
Vector Element Recycling: If we apply arithmetic operations to two
vectors of unequal length, the elements of the shorter vector are
recycled to complete the operation.
x <- c(2, 7, 3, 4, 0, 10)
y <- c(3, 10)
# y is recycled to c(3, 10, 3, 10, 3, 10)
add.result <- x + y
print(add.result)
O/P: [1] 5 17 6 14 3 20
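Recycling can be made explicit with rep(). The following sketch, using base R only, builds the recycled vector by hand and confirms that it gives the same sum; the names y_recycled, implicit_sum and explicit_sum are illustrative.

```r
x <- c(2, 7, 3, 4, 0, 10)
y <- c(3, 10)

# Repeat y until it is as long as x, which is what R does implicitly
y_recycled <- rep(y, length.out = length(x))

implicit_sum <- x + y
explicit_sum <- x + y_recycled

print(y_recycled)
print(implicit_sum)
```

If the length of the longer vector is not a multiple of the length of the shorter one, R still recycles the values but issues a warning.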
Vector Element Sorting: Elements in a vector can be sorted using
the sort() function.
x <- c(2, 7, 3, -11, 4, 0, 210)
sort.result <- sort(x)
print(sort.result)
resort.result <- sort(x, decreasing = TRUE)
print(resort.result)
O/P:
[1] -11 0 2 3 4 7 210
[1] 210 7 4 3 2 0 -11
3.9 R – LISTS
Lists are heterogeneous data structures: R objects whose elements can be
of different types. Like vectors, they are one-dimensional data
structures. A list can contain vectors, matrices, character strings,
functions, and even other lists. A list is created using the list() function.
Creating a List:
# Create a list containing strings, numbers and a vector
list_1 <- list("Apple", "Mango", 25.25, 60.5, c(16, 25, 36))
print(list_1)
O/P:
[[1]]
[1] "Apple"
[[2]]
[1] "Mango"
[[3]]
[1] 25.25
[[4]]
[1] 60.5
[[5]]
[1] 16 25 36
Naming List Elements: The list elements can be given names, and they can
then be accessed using these names. Note that matrix() fills its values
column by column unless byrow = TRUE is specified.
list_1 <- list(c("Mon", "Tues", "Wed"), matrix(c(1, 2, 3, 4, 5, 6), nrow = 2))
# Give names to the elements in the list
names(list_1) <- c("Days of Week", "Matrix")
print(list_1)
O/P:
$`Days of Week`
[1] "Mon"  "Tues" "Wed"
$Matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Accessing List Elements:
Elements of a list can be accessed by their index in the list.
In a named list they can also be accessed by name.
list_1 <- list(c("Mon", "Tues", "Wed"), matrix(c(1, 2, 3, 4, 5, 6), nrow = 2))
names(list_1) <- c("Days of Week", "Matrix")
print(list_1[1])
print(list_1$Matrix)
O/P:
$`Days of Week`
[1] "Mon"  "Tues" "Wed"
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Manipulating List Elements:
We can add, delete and update list elements as shown below. New elements
are typically appended at the end of a list, an element is deleted by
assigning NULL to it, and any existing element can be updated by assignment.
list_1 <- list(c("Mon", "Tues", "Wed"), matrix(c(1, 2, 3, 4, 5, 6), nrow = 2))
names(list_1) <- c("Days of Week", "Matrix")
# add an element at the end of the list
list_1[3] <- "Add Element"
print(list_1[3])
O/P:
[[1]]
[1] "Add Element"
Merging Lists:
You can merge several lists into one list by passing all of them to a
single c() call.
list_a <- list(1, 2)
list_b <- list("Ankit", "Pooja")
# merge two lists
merged.list <- c(list_a, list_b)
print(merged.list)
O/P:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] "Ankit"
[[4]]
[1] "Pooja"
Converting List to vector:
A list can be converted to a vector so that the elements of the vector can be
used for further manipulation. All the arithmetic operations on vectors can
be applied after the list is converted into vectors.
list_a <- list(10:13)
print(list_a)
list_b <- list(20:23)
print(list_b)
# Convert the lists to vectors
x1 <- unlist(list_a)
x2 <- unlist(list_b)
print(x1)
print(x2)
add <- x1 + x2
print(add)
O/P:
[[1]]
[1] 10 11 12 13
[[1]]
[1] 20 21 22 23
[1] 10 11 12 13
[1] 20 21 22 23
[1] 30 32 34 36
3.10 R – ARRAYS
Arrays are R data objects that can store data in more than two
dimensions. In R, an array is created with the array() function.
The array() function takes a vector of values as input and uses the values
of the dim parameter to set the size of each dimension. If the input vector
is shorter than the array, its values are recycled.
# create two vectors of different lengths
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)
# take these vectors as input to the array
array1 <- array(c(v1, v2), dim = c(3, 3, 2))
print(array1)
O/P
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
, , 2
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Naming Columns and Rows: We can give names to the rows, columns
and matrices in the array by using the dimnames parameter.
# create two vectors of different lengths
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)
column.names <- c("Col1", "Col2", "Col3")
row.names <- c("Row1", "Row2", "Row3")
matrix.names <- c("Matrix1", "Matrix2")
array1 <- array(c(v1, v2), dim = c(3, 3, 2),
                dimnames = list(row.names, column.names, matrix.names))
print(array1)
O/P:
, , Matrix1
     Col1 Col2 Col3
Row1    1    4    7
Row2    2    5    8
Row3    3    6    9
, , Matrix2
     Col1 Col2 Col3
Row1    1    4    7
Row2    2    5    8
Row3    3    6    9
Accessing Array Elements:
# create two vectors of different lengths
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)
column.names <- c("Col1", "Col2", "Col3")
row.names <- c("Row1", "Row2", "Row3")
matrix.names <- c("Matrix1", "Matrix2")
array1 <- array(c(v1, v2), dim = c(3, 3, 2),
                dimnames = list(row.names, column.names, matrix.names))
# Print the second row of the second matrix of the array
print(array1[2, , 2])
# Print the element in the first row and third column of the first matrix
print(array1[1, 3, 1])
# Print the first matrix
print(array1[, , 1])
O/P
Col1 Col2 Col3
2 5 8
[1] 7
Col1 Col2 Col3
Row1 1 4 7
Row2 2 5 8
Row3 3 6 9
Manipulating Array Elements: As an array is made up of matrices in multiple
dimensions, operations on the elements of an array are carried out by
accessing the elements of its matrices.
# create two vectors of different lengths
v1 <- c(5, 9, 3)
v2 <- c(10, 11, 12, 13, 14, 15)
# take these vectors as input to the array
array1 <- array(c(v1, v2), dim = c(3, 3, 2))
# create two more vectors of different lengths
v3 <- c(9, 1, 0)
v4 <- c(6, 0, 11, 3, 14, 1, 2, 6, 9)
array2 <- array(c(v3, v4), dim = c(3, 3, 2))
# create matrices from these arrays
matrix1 <- array1[, , 2]
matrix2 <- array2[, , 2]
# add the matrices
add1 <- matrix1 + matrix2
print(add1)
O/P:
     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26
3.11 SUMMARY
R is one of the most widely used statistical programming languages and a
popular choice among data scientists for solving business-critical problems.
In addition, R is a full-fledged programming language, with a rich
complement of mathematical functions, matrix operations and control
structures, and it is easy to write your own functions. In this chapter we
covered basic programming and the different types of data objects in R, with
suitable examples in simple and easy steps.
3.12 EXERCISE
1. Find the output of the following code.
1) b= "15"
a = switch ( b,
"5"="Hello A",
"10"="Hello B",
"15"="Hello C",
"20"="Hello D" )
print (a)
2) a= 1
b = 2
y = switch (a+b, "Hello, A", "Hello B", "Hello C", "Hello D" )
print (y)
3) # Create vegetable vector
vegetable <- c('Potato', 'Onion', 'Brinjal', 'Pumpkin')
for ( x in vegetable) {
print(x)
}
4) for (i in c(5, 10, 15, 20, 0, 25))
{
  if (i == 0)
  {
    break
  }
  print(i)
}
print("outside loop")
5) for (i in c(5, 10, 15, 20, 0, 25))
{
  if (i == 0)
  {
    next
  }
  print(i)
}
print("outside loop")
6) a <- 10
b <- 14
count = 0
if (a < b) {
  cat(a, "is a smaller number\n")
  count = 1
}
if (count == 1) {
  cat("Block is successfully executed")
}
7) a <- 1
b <- 24
count = 0
while (a < b) {
  cat(a, "is a smaller number\n")
  a = a + 2
  if (a == 15)
    break
}
8) a <- 24
if(a%%2==0){
cat(a," is an even number")
}
if(a%%2!=0){
cat(a," is an odd number")
}
9) x <- c("Hardwork","is","the","key","of","success")
if("key" %in% x) {
print("key is found")
} else {
print("key is not found")
}
10) Rectangle = function(l=6, w=5){
area = l * w
return(area)
}
print(Rectangle(3, 4))
print(Rectangle(w = 9, l = 3))
print(Rectangle())
3.13 REFERENCES
The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff
Beginning R – The Statistical Programming Language by Mark Gardener
https://www.javatpoint.com/
https://www.ict.gnu.ac.in/content/r -programming
https://www.geeksforgeeks.org/
*****
UNIT II
4
MOMENTS, SKEWNESS,
AND KURTOSIS
Unit Structure
4.1 Objective
4.2 Introduction
4.3 Moments
4.3.1 Moments for Grouped Data
4.3.2 Relations between Moments
4.3.3 Computation of Moments for Grouped Data
4.4 Charlier's Check and Sheppard's Corrections
4.5 Moments in Dimensionless Form
4.6 Skewness
4.7 Kurtosis
4.8 Software Computation of Skewness and Kurtosis
4.9 Summary
4.10 Exercise
4.11 List of References
4.1 OBJECTIVE
After going through this unit, you will be able to:
Define moments and calculate them for ungrouped and grouped data.
Explain the types of moments.
Find the relation between raw and central moments.
Use Charlier's check when computing moments by the coding
method.
Apply Sheppard's corrections for moments.
Define moments in dimensionless form.
Define skewness and kurtosis.
Calculate moments, skewness and kurtosis using software.
4.2 INTRODUCTION
Measures of central tendency (location) and measures of dispersion
(variation) are both useful for describing a data set, but neither of them
tells us anything about the shape of the distribution. For that we need
further measures, called moments, which identify the shape of the
distribution in terms of its skewness and kurtosis. Moments are statistical
measures that describe particular characteristics of a distribution. In many
cases the full set of moments carries enough information to reconstruct the
frequency distribution. Four moments are commonly used: the 1st moment for
the average, the 2nd for the variance, the 3rd for skewness and the 4th for
kurtosis.
4.3 MOMENTS
The arithmetic mean of the $r$-th powers of the deviations of the
observations, taken from the mean, from zero, or from any arbitrary origin,
is called the $r$-th moment. Assume there is a set of $n$ observations
$x_1, x_2, x_3, \ldots, x_n$. The first moment about the origin is the
ordinary average (arithmetic mean). Three types of moments are defined as
follows.
When the deviations are computed from the arithmetic mean $\bar{x}$, the
moments are called moments about the mean (mean moments) or central
moments, denoted by $\mu_r$. For ungrouped data:
i) The first moment about the mean is $\mu_1 = \frac{\sum (x - \bar{x})}{n}$.
ii) The second moment about the mean is $\mu_2 = \frac{\sum (x - \bar{x})^2}{n}$.
iii) The third moment about the mean is $\mu_3 = \frac{\sum (x - \bar{x})^3}{n}$.
iv) The fourth moment about the mean is $\mu_4 = \frac{\sum (x - \bar{x})^4}{n}$.
When the deviations of the values are computed from an arbitrary value
(provisional mean), say $A$, the moments are called moments about an
arbitrary point or provisional mean, denoted by $\mu_r(a)$. For ungrouped data:
i) The first moment about $A$ is $\mu_1(a) = \frac{\sum (x - A)}{n}$.
ii) The second moment about $A$ is $\mu_2(a) = \frac{\sum (x - A)^2}{n}$.
iii) The third moment about $A$ is $\mu_3(a) = \frac{\sum (x - A)^3}{n}$.
iv) The fourth moment about $A$ is $\mu_4(a) = \frac{\sum (x - A)^4}{n}$.
When the deviations of the values are computed from the origin (zero),
the moments are called moments about the origin or raw moments,
denoted by $\mu'_r$. For ungrouped data:
i) The first moment about the origin is $\mu'_1 = \frac{\sum x}{n}$.
ii) The second moment about the origin is $\mu'_2 = \frac{\sum x^2}{n}$.
iii) The third moment about the origin is $\mu'_3 = \frac{\sum x^3}{n}$.
iv) The fourth moment about the origin is $\mu'_4 = \frac{\sum x^4}{n}$.
Example 1: Find the raw moments for the following data: 5, 8, 2, 4, 6.
Solution:
$x$      $x^2$    $x^3$    $x^4$
5        25       125      625
8        64       512      4096
2        4        8        16
4        16       64       256
6        36       216      1296
$\sum x = 25$, $\sum x^2 = 145$, $\sum x^3 = 925$, $\sum x^4 = 6289$
i) The first moment about the origin is $\mu'_1 = \frac{\sum x}{n} = \frac{25}{5} = 5$.
ii) The second moment about the origin is $\mu'_2 = \frac{\sum x^2}{n} = \frac{145}{5} = 29$.
iii) The third moment about the origin is $\mu'_3 = \frac{\sum x^3}{n} = \frac{925}{5} = 185$.
iv) The fourth moment about the origin is $\mu'_4 = \frac{\sum x^4}{n} = \frac{6289}{5} = 1257.8$.
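The raw moments of Example 1 can be reproduced with a few lines of R. This is a minimal sketch that uses the values tabulated in the solution (5, 8, 2, 4, 6); raw_moment() is a hypothetical helper written for this illustration, not a built-in R function.

```r
# r-th raw moment (moment about the origin) of ungrouped data
raw_moment <- function(x, r) {
  sum(x^r) / length(x)
}

x <- c(5, 8, 2, 4, 6)

# First four raw moments: 5, 29, 185 and 1257.8
raw <- sapply(1:4, function(r) raw_moment(x, r))
print(raw)
```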
4.3.1 Moments for Grouped Data:
1. Moments about an arbitrary point: Let $x$ represent a variable occurring
with frequency $f$ in a given distribution. Then the $r$-th moment
$\mu_r(a)$ about $A$ is defined as
$\mu_r(a) = \frac{\sum f (x - A)^r}{N}$, where $N = \sum f$.
We generally find moments up to $r = 4$.
Therefore we can write:
i) The first moment about $A$ is $\mu_1(a) = \frac{\sum f (x - A)}{N}$.
ii) The second moment about $A$ is $\mu_2(a) = \frac{\sum f (x - A)^2}{N}$.
iii) The third moment about $A$ is $\mu_3(a) = \frac{\sum f (x - A)^3}{N}$.
iv) The fourth moment about $A$ is $\mu_4(a) = \frac{\sum f (x - A)^4}{N}$.
Example 2: For the following distribution find the first four moments about 5.
X    2    4    6    8    10
F    4    6    12   5    3
Solution: We first prepare the table.
$x$     $f$    $(x-5)$   $f(x-5)$   $f(x-5)^2$   $f(x-5)^3$   $f(x-5)^4$
2       4      -3        -12        36           -108         324
4       6      -1        -6         6            -6           6
6       12     1         12         12           12           12
8       5      3         15         45           135          405
10      3      5         15         75           375          1875
Total   30               24         174          408          2622
The moments about the arbitrary point $A = 5$ are given by:
The first moment about $A$: $\mu_1(a) = \frac{\sum f (x - A)}{N} = \frac{24}{30} = 0.8$.
The second moment about $A$: $\mu_2(a) = \frac{\sum f (x - A)^2}{N} = \frac{174}{30} = 5.8$.
The third moment about $A$: $\mu_3(a) = \frac{\sum f (x - A)^3}{N} = \frac{408}{30} = 13.6$.
The fourth moment about $A$: $\mu_4(a) = \frac{\sum f (x - A)^4}{N} = \frac{2622}{30} = 87.4$.
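The grouped-data formula $\mu_r(a) = \sum f (x - A)^r / N$ translates directly into R. The sketch below reproduces Example 2; moment_about() is a hypothetical helper name chosen for this illustration.

```r
# r-th moment about an arbitrary point A for a frequency distribution
moment_about <- function(x, f, A, r) {
  sum(f * (x - A)^r) / sum(f)
}

x <- c(2, 4, 6, 8, 10)   # values
f <- c(4, 6, 12, 5, 3)   # frequencies

# First four moments about A = 5: 0.8, 5.8, 13.6 and 87.4
about5 <- sapply(1:4, function(r) moment_about(x, f, A = 5, r = r))
print(about5)
```

Setting A to 0 gives the raw moments, and setting A to the mean of the distribution gives the central moments.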
2. Moments about the mean (Central moments):
These are the moments about the arithmetic mean $\bar{x}$. Hence when $A$ is
taken as $\bar{x}$, we obtain these moments. They are given by:
i) The first moment about $\bar{x}$ is $\mu_1 = \frac{\sum f (x - \bar{x})}{N}$.
ii) The second moment about $\bar{x}$ is $\mu_2 = \frac{\sum f (x - \bar{x})^2}{N}$.
iii) The third moment about $\bar{x}$ is $\mu_3 = \frac{\sum f (x - \bar{x})^3}{N}$.
iv) The fourth moment about $\bar{x}$ is $\mu_4 = \frac{\sum f (x - \bar{x})^4}{N}$.
From the definitions of the mean $\bar{x}$ and the standard deviation $\sigma$, it
immediately follows that $\mu_1 = 0$ and $\mu_2 = \sigma^2$, while $\mu_3$ measures
the asymmetry of the curve. These moments are important for studying the
nature of the distribution.
Example 3: Find the central moments for the following distribution:
X    1    2    3    4    5
F    2    5    6    5    2
Solution:
$x$     $f$    $fx$   $(x-\bar{x})$   $f(x-\bar{x})$   $f(x-\bar{x})^2$   $f(x-\bar{x})^3$   $f(x-\bar{x})^4$
1       2      2      -2              -4               8                  -16                32
2       5      10     -1              -5               5                  -5                 5
3       6      18     0               0                0                  0                  0
4       5      20     1               5                5                  5                  5
5       2      10     2               4                8                  16                 32
Total   20     60                     0                26                 0                  74
Here, $\bar{x} = \frac{\sum fx}{N} = \frac{60}{20} = 3$.
Therefore, the central moments are given by:
i) The first moment about $\bar{x}$: $\mu_1 = \frac{\sum f (x - \bar{x})}{N} = \frac{0}{20} = 0$.
ii) The second moment about $\bar{x}$: $\mu_2 = \frac{\sum f (x - \bar{x})^2}{N} = \frac{26}{20} = 1.3$.
iii) The third moment about $\bar{x}$: $\mu_3 = \frac{\sum f (x - \bar{x})^3}{N} = \frac{0}{20} = 0$.
iv) The fourth moment about $\bar{x}$: $\mu_4 = \frac{\sum f (x - \bar{x})^4}{N} = \frac{74}{20} = 3.7$.
3. Moments about the origin (Raw moments):
As the name suggests, taking $A$ as the origin ($A = 0$), we get these
moments. They are given by:
i) The first moment about the origin is $\mu'_1 = \frac{\sum fx}{N}$.
ii) The second moment about the origin is $\mu'_2 = \frac{\sum fx^2}{N}$.
iii) The third moment about the origin is $\mu'_3 = \frac{\sum fx^3}{N}$.
iv) The fourth moment about the origin is $\mu'_4 = \frac{\sum fx^4}{N}$.
Note that the first moment about the origin is the mean of the data.
Example 4: Find the raw moments for the following data:
X    -1   0    1    2    3    4
F    2    4    3    7    3    1
Solution: We first prepare the table.
$x$     $f$    $fx$   $fx^2$   $fx^3$   $fx^4$
-1      2      -2     2        -2       2
0       4      0      0        0        0
1       3      3      3        3        3
2       7      14     28       56       112
3       3      9      27       81       243
4       1      4      16       64       256
Total   20     28     76       202      616
Therefore, the raw moments are given by:
i) The first moment about the origin: $\mu'_1 = \frac{\sum fx}{N} = \frac{28}{20} = 1.4$.
ii) The second moment about the origin: $\mu'_2 = \frac{\sum fx^2}{N} = \frac{76}{20} = 3.8$.
iii) The third moment about the origin: $\mu'_3 = \frac{\sum fx^3}{N} = \frac{202}{20} = 10.1$.
iv) The fourth moment about the origin: $\mu'_4 = \frac{\sum fx^4}{N} = \frac{616}{20} = 30.8$.
4.3.2 Relations between Moments:
We have studied three different types of moments. It is very useful to
have simple relations between them. We now give the inter-relations
between the various moments and solve examples using these relations.
Relation between moments about an arbitrary point and the central moments:
i) $\mu_1 = \mu_1(a) - \mu_1(a) = 0$
ii) $\mu_2 = \mu_2(a) - \mu_1(a)^2$
iii) $\mu_3 = \mu_3(a) - 3\mu_1(a)\mu_2(a) + 2\mu_1(a)^3$
iv) $\mu_4 = \mu_4(a) - 4\mu_1(a)\mu_3(a) + 6\mu_1(a)^2\mu_2(a) - 3\mu_1(a)^4$
Conversely, the moments $\mu_r(a)$ about $A$ in terms of the $\mu_r$ are given
as follows:
i) $\mu_1(a) = \bar{x} - A$
ii) $\mu_2(a) = \mu_2 + \mu_1(a)^2$
iii) $\mu_3(a) = \mu_3 + 3\mu_2\mu_1(a) + \mu_1(a)^3$
iv) $\mu_4(a) = \mu_4 + 4\mu_3\mu_1(a) + 6\mu_2\mu_1(a)^2 + \mu_1(a)^4$
Relation between raw moments and central moments:
Recall that the raw moments $\mu'_r$ are obtained from the general moments
$\mu_r(a)$ when $A$ is taken as 0.
Hence, taking $A$ as 0 and replacing $\mu_r(a)$ by the corresponding $\mu'_r$ in
the formulae, we get
i) $\mu_1 = \mu'_1 - \mu'_1 = 0$
ii) $\mu_2 = \mu'_2 - \mu_1'^2$
iii) $\mu_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2\mu_1'^3$
iv) $\mu_4 = \mu'_4 - 4\mu'_1\mu'_3 + 6\mu_1'^2\mu'_2 - 3\mu_1'^4$
Conversely, the moments $\mu'_r$ in terms of the $\mu_r$ are given as follows:
i) $\mu'_1 = \bar{x}$
ii) $\mu'_2 = \mu_2 + \mu_1'^2$
iii) $\mu'_3 = \mu_3 + 3\mu_2\mu'_1 + \mu_1'^3$
iv) $\mu'_4 = \mu_4 + 4\mu_3\mu'_1 + 6\mu_2\mu_1'^2 + \mu_1'^4$
Example 5: The first four central moments of a distribution are 0, 3, 5, 10.
If the mean of the distribution is 2, find the moments about 3.
Solution: We have $A = 3$, $\bar{x} = 2$, $\mu_1 = 0$, $\mu_2 = 3$, $\mu_3 = 5$, $\mu_4 = 10$.
Using the relation between central moments and arbitrary moments,
i) $\mu_1(a) = \bar{x} - A = 2 - 3 = -1$.
ii) $\mu_2(a) = \mu_2 + \mu_1(a)^2 = 3 + (-1)^2 = 4$.
iii) $\mu_3(a) = \mu_3 + 3\mu_2\mu_1(a) + \mu_1(a)^3 = 5 + 3(3)(-1) + (-1)^3 = -5$.
iv) $\mu_4(a) = \mu_4 + 4\mu_3\mu_1(a) + 6\mu_2\mu_1(a)^2 + \mu_1(a)^4 = 10 + 4(5)(-1) + 6(3)(-1)^2 + (-1)^4 = 10 - 20 + 18 + 1 = 9$.
Example 6: The first four raw moments about the origin are 2, 12, 74 and
384. Find the mean $\bar{x}$ and the first four central moments.
Solution: The raw moments are the moments $\mu_r(a)$ with $A = 0$. We are
given $\mu_1(a) = 2$, $\mu_2(a) = 12$, $\mu_3(a) = 74$ and $\mu_4(a) = 384$, with $A = 0$.
Therefore, Mean $= \bar{x} = \mu_1(a) + A = 2 + 0 = 2$.
Using the relation between raw moments and central moments,
i) $\mu_1 = \mu_1(a) - \mu_1(a) = 2 - 2 = 0$
ii) $\mu_2 = \mu_2(a) - \mu_1(a)^2 = 12 - 2^2 = 8$
iii) $\mu_3 = \mu_3(a) - 3\mu_1(a)\mu_2(a) + 2\mu_1(a)^3 = 74 - 3(2)(12) + 2(2^3) = 74 - 72 + 16 = 18$
iv) $\mu_4 = \mu_4(a) - 4\mu_1(a)\mu_3(a) + 6\mu_1(a)^2\mu_2(a) - 3\mu_1(a)^4 = 384 - 4(2)(74) + 6(2^2)(12) - 3(2^4) = 384 - 592 + 288 - 48 = 32$.
Example 7: The first four central moments of a distribution are 0, 3, 0 and
7. If the mean $\bar{x}$ of the distribution is 4, find the first four raw moments.
Solution: The raw moments are the moments about the origin. We are given
$\mu_1 = 0$, $\mu_2 = 3$, $\mu_3 = 0$ and $\mu_4 = 7$, with $\bar{x} = 4$.
Using the relation between central moments and raw moments,
i) $\mu'_1 = \bar{x} = 4$.
ii) $\mu'_2 = \mu_2 + \mu_1'^2 = 3 + 4^2 = 19$.
iii) $\mu'_3 = \mu_3 + 3\mu_2\mu'_1 + \mu_1'^3 = 0 + 3(3)(4) + 4^3 = 100$.
iv) $\mu'_4 = \mu_4 + 4\mu_3\mu'_1 + 6\mu_2\mu_1'^2 + \mu_1'^4 = 7 + 4(0)(4) + 6(3)(4^2) + 4^4 = 7 + 0 + 288 + 256 = 551$.
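The relations between raw and central moments can be checked numerically. The sketch below is an illustration under stated assumptions: raw_to_central() is a hypothetical helper that applies the four relations of Section 4.3.2, and the small data set is chosen only for the cross-check against a direct computation.

```r
# Convert the first four raw moments (m[1] ... m[4]) to central moments
raw_to_central <- function(m) {
  m1 <- m[1]; m2 <- m[2]; m3 <- m[3]; m4 <- m[4]
  c(0,
    m2 - m1^2,
    m3 - 3 * m1 * m2 + 2 * m1^3,
    m4 - 4 * m1 * m3 + 6 * m1^2 * m2 - 3 * m1^4)
}

x <- c(5, 8, 2, 4, 6)

# Raw moments, and central moments computed directly from deviations
raw    <- sapply(1:4, function(r) mean(x^r))
direct <- sapply(1:4, function(r) mean((x - mean(x))^r))

converted <- raw_to_central(raw)
print(converted)
print(direct)
```

Both vectors agree (0, 4, 0 and 32.8 for this data set), which is a quick way to catch arithmetic slips when the relations are applied by hand.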
4.3.3 Computation of Moments for Grouped Data:
We have already found the mean and standard deviation for continuous
(grouped) data. To calculate moments for continuous data we use the
coding method (short method).
When the values of $x$ are not consecutive but equally spaced at an interval
of length $c$, we divide the deviations by $c$. This is called a change of
scale by $c$. We take $x = a + cu$, or $u = \frac{x - a}{c}$.
The effect of a change of origin and scale on the moments is given below.
Let $x = a + cu$, so that $\bar{x} = a + c\bar{u}$.
i) The moments of $x$ about $A$ are given by
$\mu_r(a) = \frac{\sum f u^r}{N} \times c^r$
ii) The central moments of $x$ are given by
$\mu_r = \frac{\sum f (u - \bar{u})^r}{N} \times c^r$
Note: When $A = 0$, we get the raw moments.
Example 8: Find the central moments for the following data:
Class interval   0-20   20-40   40-60   60-80
Frequency        4      7       6       3
Solution: First find the mean by the coding method, taking $a = 30$ and $c = 20$.
C.I.    $f$   Class mark ($x$)   $u = \frac{x-a}{c}$   $fu$   $(u-0.4)$   $f(u-0.4)$   $f(u-0.4)^2$   $f(u-0.4)^3$   $f(u-0.4)^4$
0-20    4     10                 -1                    -4     -1.4        -5.6         7.84           -10.976        15.3664
20-40   7     30                 0                     0      -0.4        -2.8         1.12           -0.448         0.1792
40-60   6     50                 1                     6      0.6         3.6          2.16           1.296          0.7776
60-80   3     70                 2                     6      1.6         4.8          7.68           12.288         19.6608
Total   20                                             8                  0            18.8           2.16           35.984
Here, $\bar{u} = \frac{\sum fu}{N} = \frac{8}{20} = 0.4$.
The central moments of $x$ are given by
$\mu_r = \frac{\sum f (u - \bar{u})^r}{N} \times c^r$
i) $\mu_1 = \frac{\sum f (u - \bar{u})}{N} \times c = \frac{0}{20} \times 20 = 0$.
ii) $\mu_2 = \frac{\sum f (u - \bar{u})^2}{N} \times c^2 = \frac{18.8}{20} \times 20^2 = 376$.
iii) $\mu_3 = \frac{\sum f (u - \bar{u})^3}{N} \times c^3 = \frac{2.16}{20} \times 20^3 = 864$.
iv) $\mu_4 = \frac{\sum f (u - \bar{u})^4}{N} \times c^4 = \frac{35.984}{20} \times 20^4 = 2,87,872$.
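The coding method can be carried out in R as a check on the hand computation. This is a minimal sketch for the data of Example 8; the variable names mid, c_width, u_bar and central are assumptions made for the illustration.

```r
mid <- c(10, 30, 50, 70)   # class marks of 0-20, 20-40, 40-60, 60-80
f   <- c(4, 7, 6, 3)       # class frequencies
a <- 30                    # assumed origin
c_width <- 20              # class width

u <- (mid - a) / c_width   # coded values
N <- sum(f)
u_bar <- sum(f * u) / N    # mean of the coded values

# Central moments of x: (sum of f * (u - u_bar)^r / N) * c^r
central <- sapply(1:4, function(r) sum(f * (u - u_bar)^r) / N * c_width^r)
print(central)
```

Because central moments do not depend on the choice of origin, the same values are obtained for any other value of a.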
4.4 CHARLIER'S CHECK AND SHEPPARD'S CORRECTIONS
Charlier's check is used to verify the computations in a table of
grouped classes. For example, consider the following table with specified
class limits and frequencies $f_i$. The class marks $x_i$ are computed first,
followed by the coded values $u_i$, which are given by
$u_i = \frac{x_i - x_0}{c}$
where the assumed origin is the class mark $x_0 = 44.5$ and the class
interval is $c = 10$. The remaining quantities are then computed as follows.
Class interval   $x_i$   $f_i$   $u_i$   $f_i u_i$   $f_i u_i^2$   $f_i (u_i+1)^2$
0-9              4.5     2       -4      -8          32            18
10-19            14.5    3       -3      -9          27            12
20-29            24.5    11      -2      -22         44            11
30-39            34.5    20      -1      -20         20            0
40-49            44.5    32      0       0           0             32
50-59            54.5    25      1       25          25            100
60-69            64.5    7       2       14          28            63
Total                    100             -20         176           236
In order to compute the variance, note that
$V(u) = \frac{\sum f_i u_i^2}{\sum f_i} - \left( \frac{\sum f_i u_i}{\sum f_i} \right)^2 = \frac{176}{100} - \left( \frac{-20}{100} \right)^2 = 1.72$
So the variance of the original data is
$V(x) = c^2 V(u) = 100 \times 1.72 = 172$.
Charlier's check makes use of the additional column $f_i (u_i + 1)^2$ added to
the right side of the table. By noting that the identity
$\sum f_i (u_i + 1)^2 = \sum f_i (u_i^2 + 2u_i + 1) = \sum f_i u_i^2 + 2 \sum f_i u_i + \sum f_i$
connects the totals of the frequency column and the last three columns, it
can be checked that the computations have been done correctly. In the
example above, $236 = 176 + 2(-20) + 100$. Hence, the computations pass
Charlier's check.
Charlier's check in computing moments by the coding method uses the
following identities:
$\sum f(u+1) = \sum fu + N$
$\sum f(u+1)^2 = \sum fu^2 + 2\sum fu + N$
$\sum f(u+1)^3 = \sum fu^3 + 3\sum fu^2 + 3\sum fu + N$
$\sum f(u+1)^4 = \sum fu^4 + 4\sum fu^3 + 6\sum fu^2 + 4\sum fu + N$
When the frequency distribution, consists of interval, we take 𝑥 as the
class mark of the interval and use this 𝑥 in all the formulae.
While doing this, it is assumed that all the values in the interval
concentrate at the class mark. But this assumption may not be always true
and we are likely to get some errors in this calculation.
The well -known statistician Sheppa rd gave the corrected values of the
moments as follows:
𝜇ଵ(𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑒𝑑 )=𝜇ଵ
𝜇ଶ(𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑒𝑑 )=𝜇ଶ−𝑐ଶ
12
𝜇ଷ(𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑒𝑑 )=𝜇ଷ
𝜇ସ=𝜇ସ−1
2𝑐ଶ𝜇ଶ+7
240𝑐ସ munotes.in
Page 76
76
Where ‘c’ is the length of the class -interval, which is the same as the
spacing between the mid values.
Note that even though this correction has great mathematical significance,
we need not use these corrections in practice because the error is too small
hence negligible and also in statistic, we look for estimates, which are
approximate values .
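Example 9: Apply Sheppard's corrections to determine the moments about
the mean for the data
Class interval   0-10   10-20   20-30   30-40   40-50
Frequency        1      2       9       2       6
Solution: We first prepare the table, taking $A = 25$ and $c = 10$.
Class interval   Class mark ($x$)   $f$   $u = \frac{x-a}{c}$   $fu$   $(u-0.5)$   $f(u-0.5)$   $f(u-0.5)^2$   $f(u-0.5)^3$   $f(u-0.5)^4$
0-10             5                  1     -2                    -2     -2.5        -2.5         6.25           -15.625        39.0625
10-20            15                 2     -1                    -2     -1.5        -3           4.5            -6.75          10.125
20-30            25                 9     0                     0      -0.5        -4.5         2.25           -1.125         0.5625
30-40            35                 2     1                     2      0.5         1            0.5            0.25           0.125
40-50            45                 6     2                     12     1.5         9            13.5           20.25          30.375
Total                               20                          10                 0            27             -3             80.25
Here, $\bar{u} = \frac{\sum fu}{N} = \frac{10}{20} = 0.5$.
The central moments of $x$ are given by
$\mu_r = \frac{\sum f (u - \bar{u})^r}{N} \times c^r$
i) $\mu_1 = \frac{\sum f (u - \bar{u})}{N} \times c = \frac{0}{20} \times 10 = 0$.
ii) $\mu_2 = \frac{\sum f (u - \bar{u})^2}{N} \times c^2 = \frac{27}{20} \times 10^2 = 135$.
iii) $\mu_3 = \frac{\sum f (u - \bar{u})^3}{N} \times c^3 = \frac{-3}{20} \times 10^3 = -150$.
iv) $\mu_4 = \frac{\sum f (u - \bar{u})^4}{N} \times c^4 = \frac{80.25}{20} \times 10^4 = 40,125$.
Sheppard's corrected values of the moments are:
$\mu_1(\text{corrected}) = \mu_1 = 0$
$\mu_2(\text{corrected}) = \mu_2 - \frac{c^2}{12} = 135 - \frac{10^2}{12} = 126.67$
$\mu_3(\text{corrected}) = \mu_3 = -150$
$\mu_4(\text{corrected}) = \mu_4 - \frac{1}{2} c^2 \mu_2 + \frac{7}{240} c^4 = 40,125 - \frac{100}{2} \times 135 + \frac{7}{240} \times 10,000 = 40,125 - 6,750 + 291.67 = 33,666.67$.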
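Sheppard's corrections are easy to apply once the central moments are known. The following sketch computes the uncorrected central moments directly from the class marks of Example 9 and then applies the corrections; sheppard() is a hypothetical helper, and the class width of 10 is read off the class intervals 0-10, 10-20 and so on.

```r
# Apply Sheppard's corrections to the first four central moments
sheppard <- function(mu, c_width) {
  c(mu[1],
    mu[2] - c_width^2 / 12,
    mu[3],
    mu[4] - 0.5 * c_width^2 * mu[2] + 7 / 240 * c_width^4)
}

mid <- c(5, 15, 25, 35, 45)   # class marks
f   <- c(1, 2, 9, 2, 6)       # class frequencies
N <- sum(f)
x_bar <- sum(f * mid) / N

# Uncorrected central moments computed from the class marks
mu <- sapply(1:4, function(r) sum(f * (mid - x_bar)^r) / N)
corrected <- sheppard(mu, c_width = 10)

print(mu)
print(corrected)
```

Only the second and fourth moments change; the first and third are left as they are.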
4.5 MOMENTS IN DIMENSIONLESS FORM
To avoid dependence on particular units, we can define the dimensionless
central moments as
$a_r = \frac{\mu_r}{\sigma^r}$
where $\sigma = \sqrt{\mu_2}$ is the standard deviation.
We already know that for central moments $\mu_1 = 0$ and $\mu_2 = \sigma^2$.
So we get $a_1 = 0$ and $a_2 = 1$.
4.6 SKEWNESS
Skewness deals with the symmetry, or rather the asymmetry, of the values
of a distribution around its central value. When a frequency distribution
is plotted on a chart, an ideal distribution is represented by a smooth,
symmetric, bell-shaped curve around the central value. Such a distribution
is called a symmetric distribution; the normal distribution is the standard
example. In practice, however, not every distribution that we come across
is symmetric. Its graph may be asymmetric, or skew. Such distributions are
called skewed distributions.
Definition: According to the statistician Garrett, "A
distribution is said to be skewed when the mean and median fall at
different points of the distribution and the balance is shifted to one side
or the other, to left or right."
Types of Skewness: In order to understand this concept we draw the
following graphs, where $\bar{x}$ = Mean, $M_e$ = Median and $M_o$ = Mode of the
distribution.
Figure 4.1
It is clear from the diagram that
i) Curve (i) represents a symmetric distribution, for which Mean = Median =
Mode.
ii) Curve (ii) represents a positively skewed distribution, for which Mode < Median < Mean.
iii) Curve (iii) represents a negatively skewed distribution, for which Mean < Median < Mode.
Measure of Skewness:
Since the mean, median and mode are different for a skewed distribution, the
simplest measure would be the difference between any two of them.
Though such measures are simple to calculate, their main drawback is that
they are expressed in the units of the distribution. Therefore two
distributions with different units cannot be compared. In order to overcome
this difficulty, relative measures are defined. These are called
coefficients of skewness.
Karl Pearson's Coefficient of Skewness: It is defined as
$S_k = \frac{\text{Mean} - \text{Mode}}{\text{standard deviation}} = \frac{\bar{x} - M_o}{\sigma}$
Using the empirical relation between the mean, median and mode,
Mean – Mode = 3 (Mean – Median), we can write
$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{standard deviation}} = \frac{3(\bar{x} - M_e)}{\sigma}$
Interpretation of $S_k$:
i) If $S_k$ is positive then the distribution is positively skewed.
ii) If $S_k$ is negative then the distribution is negatively skewed.
iii) If $S_k = 0$ then the distribution is symmetric.
iv) Theoretically the limits of $S_k$ are from -3 to +3.
Example 10: For the following ungrouped data find Karl Pearson's
coefficient of skewness.
12, 18, 25, 15, 16, 10, 8, 15, 27, 14
Solution: For Karl Pearson's coefficient of skewness we need to find the
mean, mode and standard deviation of the data.
Mean $= \bar{x} = \frac{\sum x}{n} = \frac{12+18+25+15+16+10+8+15+27+14}{10} = \frac{160}{10} = 16$.
Mode = 15 (the value that occurs most often).
$\sum x^2 = 144+324+625+225+256+100+64+225+729+196 = 2,888$
$\sigma = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{2,888}{10} - (16)^2} = \sqrt{288.8 - 256} = \sqrt{32.8} = 5.73$
Therefore, Karl Pearson's coefficient of skewness is
$S_k = \frac{\text{Mean} - \text{Mode}}{\text{standard deviation}} = \frac{\bar{x} - M_o}{\sigma} = \frac{16 - 15}{5.73} = \frac{1}{5.73} = 0.175$
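Example 11: For the following grouped data find Karl Pearson's
coefficient of skewness. Also interpret the type of distribution.
C.I.   0-4   4-8   8-12   12-16   16-20
F      1     3     10     4       2
Solution: First we find the mean, mode and standard deviation.
C.I.    $f$    $x$    $fx$    $fx^2$
0-4     1      2      2       4
4-8     3      6      18      108
8-12    10     10     100     1000
12-16   4      14     56      784
16-20   2      18     36      648
Total   20            212     2,544
Mean $= \bar{x} = \frac{\sum fx}{N} = \frac{212}{20} = 10.6$.
Standard deviation:
$\sigma = \sqrt{\frac{\sum fx^2}{N} - (\bar{x})^2} = \sqrt{\frac{2,544}{20} - (10.6)^2} = \sqrt{127.2 - 112.36} = \sqrt{14.84} = 3.85$.
The modal class is 8-12, with lower limit $l_1 = 8$, frequency $f_1 = 10$,
preceding frequency $f_0 = 3$, following frequency $f_2 = 4$ and width $h = 4$.
$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h = 8 + \frac{10 - 3}{2(10) - 3 - 4} \times 4 = 10.15$.
Therefore, Karl Pearson's coefficient of skewness is
$S_k = \frac{\text{Mean} - \text{Mode}}{\text{standard deviation}} = \frac{\bar{x} - M_o}{\sigma} = \frac{10.6 - 10.15}{3.85} = \frac{0.45}{3.85} = 0.12$.
Since $S_k$ is positive, the distribution is positively skewed.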
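Karl Pearson's coefficient for ungrouped data can be computed in R as follows. This is a sketch for the ten observations of Example 10; it assumes the population form of the standard deviation (division by n, as in the text) and finds the mode with table(), which only makes sense when one value is clearly the most frequent. The names tab, mode_x and sk are illustrative.

```r
x <- c(12, 18, 25, 15, 16, 10, 8, 15, 27, 14)

x_bar <- mean(x)
# Population standard deviation: divide by n, not n - 1
sigma <- sqrt(mean(x^2) - x_bar^2)

# Mode: the most frequent value in the data
tab <- table(x)
mode_x <- as.numeric(names(tab)[which.max(tab)])

# Karl Pearson's coefficient of skewness
sk <- (x_bar - mode_x) / sigma
print(c(mean = x_bar, mode = mode_x, sd = sigma, skewness = sk))
```

Note that R's built-in sd() divides by n - 1, so it would give a slightly larger standard deviation than the formula used here.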
Bowley's Coefficient of Skewness:
This measure is based on quartiles; hence it is also known as the quartile
coefficient of skewness. It is given by
$S_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$
The limits of Bowley's coefficient of skewness are -1 and +1.
Example 12: Find Bowley's coefficient of skewness when the following
information is given: $Q_1 = 12.5$, $Q_2 = 17.2$, $Q_3 = 24.7$.
Solution: Given that
$Q_1 = 12.5$, $Q_2 = 17.2$, $Q_3 = 24.7$,
Bowley's coefficient of skewness is given by
$S_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{24.7 + 12.5 - 2(17.2)}{24.7 - 12.5} = \frac{2.8}{12.2} = 0.23$
Example 13: Find Bowley's coefficient of skewness for the following
distribution:
X    1    3    5    7    9    11
F    3    8    14   20   19   7
Solution: We find all three quartiles of the distribution.
X       F      cf (cumulative frequency)
1       3      3
3       8      11
5       14     25
7       20     45
9       19     64
11      7      71
Total   N = 71
Therefore,
$Q_1$ = value of the $\left(\frac{N+1}{4}\right)$-th item = value of the $\left(\frac{71+1}{4}\right)$-th item = value of the 18th item = 5
$Q_2$ = value of the $2\left(\frac{N+1}{4}\right)$-th item = value of the $2\left(\frac{71+1}{4}\right)$-th item = value of the 36th item = 7
$Q_3$ = value of the $3\left(\frac{N+1}{4}\right)$-th item = value of the $3\left(\frac{71+1}{4}\right)$-th item = value of the 54th item = 9
Bowley's coefficient of skewness is given by
$S_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{9 + 5 - 2(7)}{9 - 5} = \frac{0}{4} = 0$.
Therefore, the distribution is symmetric.
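The quartile positions used in Example 13 can be located with cumulative frequencies in R. The sketch below uses the frequencies of the solution table (N = 71); quartile_item() is a hypothetical helper that returns the value whose cumulative frequency first reaches the position k(N + 1)/4.

```r
x <- c(1, 3, 5, 7, 9, 11)
f <- c(3, 8, 14, 20, 19, 7)

cf <- cumsum(f)   # cumulative frequencies
N  <- sum(f)

quartile_item <- function(k) {
  pos <- k * (N + 1) / 4     # position of the k-th quartile item
  x[which(cf >= pos)[1]]     # first value whose cf reaches that position
}

q <- sapply(1:3, quartile_item)

# Bowley's coefficient of skewness
sb <- (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])
print(q)
print(sb)
```

R's own quantile() function interpolates between observations by default, so its results can differ from this item-counting rule for other data sets.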
4.7 KURTOSIS
The word kurtosis comes from Greek and means 'bulginess'. In statistics,
kurtosis refers to the degree of flatness or peakedness around the mode of a
frequency curve. Kurtosis is measured with respect to the normal curve,
which is accepted as the yardstick for deciding the nature of other curves.
In other words, measures of kurtosis tell us to what extent a given
distribution is flat or peaked compared with the normal curve.
Figure 4.2
i) The normal curve is called mesokurtic (M).
ii) A flatter curve is called platykurtic (P).
iii) A more peaked curve is called leptokurtic (L).
Measures of Kurtosis:
The most prominent measure of kurtosis is the coefficient $\beta_2$, given by
$\beta_2 = \frac{\mu_4}{\mu_2^2}$
where the $\mu_r$ are the moments about the mean $\bar{x}$.
A larger value of $\beta_2$ indicates a more peaked distribution. For the
normal distribution $\beta_2 = 3$.
Hence the given distribution is:
i) Leptokurtic, if $\beta_2 > 3$.
ii) Mesokurtic, if $\beta_2 = 3$.
iii) Platykurtic, if $\beta_2 < 3$.
Example 14: for the following distribution find 𝛽ଵ and 𝛽ଶ and comment
on the Skewness and Kurtosis of the distribution.
munotes.in
Page 82
82
X 2 3 4 5 f 4 3 2 1 Solution: First calculate moments about mean for the given distribution.
𝑥̅=∑𝑓𝑥
∑𝑓=2(4)+3(3)+4(2)+5(1)
4+3+2+1=30
10=3. 𝑥 𝑓 (𝑥−3) 𝑓(𝑥−3) 𝑓(𝑥−3)ଶ 𝑓(𝑥−3)ଷ 𝑓(𝑥−3)ସ 2 4 -1 -4 4 -4 4 3 3 0 0 0 0 0 4 2 1 2 2 2 2 5 1 2 2 4 8 16 Total 10 0 10 6 22
The central moments are given by
i) The first moment about 𝑥̅, as
𝜇ଵ=∑(௫ି௫̅)
ே=
ଵ=0 .
ii) The second moment about 𝑥̅, as
𝜇ଶ=∑(௫ି௫̅)మ
ே=ଵ
ଵ=1 .
iii) The third moment about 𝑥̅, as
𝜇ଷ=∑(௫ି௫̅)య
ே=
ଵ=0.6 .
iv) The fourth moments about 𝑥̅, as
𝜇ସ=∑(௫ି௫̅)ర
ே=ଶଶ
ଵ=2.2.
𝛽ଵ=𝜇ଷଶ
𝜇ଶଷ=(0.6)ଶ
(1)ଷ=0.36.
𝛽ଶ=𝜇ସ
𝜇ଶଶ=2.2
(1)ଶ=2.2
Since $\beta_1 \neq 0$, the curve is not symmetric. Also $\mu_3 = 0.6 > 0$; therefore, the curve is positively skewed.
Since $\beta_2 = 2.2 < 3$, the curve is flat compared with the normal curve.
Hence the distribution is platykurtic.
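The calculation in Example 14 can be checked with a short program. The following is only a minimal sketch in Python; the helper names `central_moment` and `kurtosis_type` are chosen here for illustration and do not come from any statistics package.

```python
# Frequency distribution from Example 14: values x with frequencies f.
xs = [2, 3, 4, 5]
fs = [4, 3, 2, 1]


def central_moment(xs, fs, r):
    """Return the r-th moment about the mean of a frequency distribution."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n


def kurtosis_type(beta2, tol=1e-9):
    """Classify a curve by comparing beta2 with 3 (the normal-curve value)."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"


mu2 = central_moment(xs, fs, 2)
mu3 = central_moment(xs, fs, 3)
mu4 = central_moment(xs, fs, 4)

beta1 = mu3 ** 2 / mu2 ** 3   # measure of skewness
beta2 = mu4 / mu2 ** 2        # measure of kurtosis

print("beta1 =", round(beta1, 4))
print("beta2 =", round(beta2, 4))
print("skew direction:", "positive" if mu3 > 0 else "negative or zero")
print("kurtosis type:", kurtosis_type(beta2))
```

The program reproduces $\beta_1 = 0.36$ and $\beta_2 = 2.2$ and classifies the curve as platykurtic, in agreement with the solution above.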
4.8 SOFTWARE COMPUTATION OF SKEWNESS AND KURTOSIS
Skewness and kurtosis can be computed with different software packages, as described below:
Sigma Magic: Using the Sigma Magic software, calculating the skewness and kurtosis is relatively straightforward. Add a new Basic Statistics template to Excel by clicking on Stat > Basic Statistics. Copy and paste the data for which you want the skewness and kurtosis into the input area and then click on Compute Outputs. The analysis results will include the skewness and kurtosis values.
Excel: You can also calculate these values in Excel by using the formula =SKEW(…) for the skewness value and =KURT(…) for the kurtosis value.
Minitab: If you use the Minitab software, copy and paste the data into Minitab and then click on Stat > Basic Statistics > Display Descriptive Statistics. Select the data column and click on OK. This will print out the quartiles for the sample values. If you want the skewness and kurtosis values, go back to the menu, click on Statistics, and select the checkboxes next to Skewness and Kurtosis in the statistics options. Note that the values provided by Minitab may be slightly different from those given by Excel and Sigma Magic.
4.10 SUMMARY
In this unit, we have discussed:
Moments and their types for ungrouped and grouped data.
The relation between raw, arbitrary and central moments.
The effect of change of origin and scale on moments.
Charlier's check and Sheppard's correction for moments.
Skewness and the symmetry of a distribution.
Kurtosis.
4.11 EXERCISE
1. The first four moments of a distribution are 1, 4, 10 and 46 respectively. Compute the moment coefficients of skewness and kurtosis and comment upon the nature of the distribution.
2. Compute the first four central moments from the following data. Also find the two beta coefficients.
X    5    10   15   20   25   30   35
f    8    15   20   32   23   17   5
3. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Examine the skewness and kurtosis of the distribution.
4. Calculate the first four central moments for the following distribution:
Class interval   0-4   4-8   8-12   12-16   16-20
Frequency        5     8     13     9       5
5. Find the first four arbitrary moments about A = 7 for the following:
10, 5, 8, 7, 2, 3, 12, 14
6. Find the raw moments for the following data:
C.I.   5-10   10-15   15-20   20-25   25-30
f      3      4       7       4       2
7. The first four central moments of a distribution are 0, 15, 36, 78. If the mean of the distribution is 8, find the moments about A = 5.
8. The first four raw moments about the origin are 4, 16, 33, 89. Find the mean and the first four central moments.
9. For the following data, verify Charlier's check:
C.I.   2-8   8-14   14-20   20-26
f      1     4      3       2
10. Find the first four central moments using the coding method; also find Sheppard's correction for moments.
C.I.   0-5   5-10   10-15   15-20   20-25   25-30
f      3     8      12      13      7       2
11. For the following data, find Karl Pearson's coefficient of skewness and also find the type of distribution:
i) 12, 15, 17, 12, 8, 25, 16, 6, 7, 41
ii) 3, 7, 8, 12, 15
iii)
X    1    2    3    4    5    6    7
f    2    8    12   15   18   9    6
iv)
C.I.        0-2   2-4   4-6   6-8   8-10
Frequency   4     7     13    10    6
12. Find Bowley's coefficient of skewness for each of the following:
i) $Q_1 = 165.5$, $Q_2 = 184.3$, $Q_3 = 196.7$.
ii) 2, 8, 7, 12, 14, 17, 20.
iii)
X    1    2    3    4    5    6    7
f    2    8    12   15   18   9    6
4.12 LIST OF REFERENCES
Statistics by Murray R. Spiegel and Larry J. Stephens. Publication: McGraw-Hill International.
Fundamentals of Statistics by S.C. Gupta.
*****
5
ELEMENTARY PROBABILITY
THEORY
Unit Structure
5.1 Objective
5.2 Introduction
5.3 Definitions of Probability
5.4 Conditional Probability
5.4.1 Independent and Dependent Events, Mutually Exclusive
Events
5.5 Probability Distributions
5.6 Mathematical Expectation
5.7 Combinatorial Analysis
5.8 Combinations, Stirling’s Approximation to n!
5.9 Relation of Probability to Point Set Theory, Euler or Venn Diagrams
and Probability
5.10 Summary
5.11 Exercise
5.12 List of References
5.1 OBJECTIVE
After going through this unit, you will be able to:
Determine the probability of different experimental results.
Explain the concept of probability.
Calculate probability for simple, compound and complementary events.
Explain conditional probability with examples.
Explain independent events and the multiplication theorem of probability.
Describe a probability distribution and find its expected value.
Use combinations and Stirling's approximation to n!.
Relate probability to set theory with the help of Venn diagrams.
5.2 INTRODUCTION
Sometimes in daily life certain thoughts come to mind, such as "I will be successful today", "I will complete this work in an hour", "I will be selected for the job", and so on. Each of these has many possible results, and we are happy when we get the desired one. Probability theory deals with experiments whose outcome is not predictable with certainty. Probability is a very useful concept: these days it is applied in many fields of computer science, such as machine learning, computational linguistics, cryptography, computer vision and robotics, and also in science, engineering, medicine and management.
Probability is a mathematical measure of the chance that some event occurs. To calculate it, we need some basic concepts: random experiment, sample space, and events.
Basic concepts of probability:
Random experiment: An experiment that can be repeated any number of times under similar conditions, but whose result may differ from one repetition to the next and cannot be predicted in advance, is called a random experiment. For example: a coin is tossed, a die is rolled, and so on.
Outcomes: The results obtained from a random experiment are called the outcomes of the random experiment.
Sample space: The set of all possible outcomes of a random experiment is called the sample space. The sample space is denoted by S, and the number of elements in the sample space is written as $n(S)$. For example, when a die is rolled we get $S = \{1, 2, 3, 4, 5, 6\}$ and $n(S) = 6$.
Events: Any subset of the sample space is called an event; that is, an event is a set of sample points that satisfy a required condition. The number of elements in an event E is denoted by $n(E)$. For example, in the experiment of throwing a die, the sample space is S = {1, 2, 3, 4, 5, 6} and each of the following is an event:
i) A: even number, i.e. A = {2, 4, 6}
ii) B: multiple of 3, i.e. B = {3, 6}
iii) C: prime number, i.e. C = {2, 3, 5}.
Types of events:
Impossible event: An event that cannot occur in a random experiment is called an impossible event. It is denoted by the empty set $\emptyset$, i.e. $n(\emptyset) = 0$. For example, getting the number 7 when a die is rolled. The probability assigned to an impossible event is zero.
Equally likely events: Events that all have an equal chance of occurring are called equally likely events. For example, getting a head and getting a tail when tossing a fair coin are equally likely events.
Certain event: An event that contains all elements of the sample space is called a certain event, i.e. $n(A) = n(S)$.
Mutually exclusive events: Two events A and B of a sample space S that have no elements in common are called mutually exclusive events. For example, in the experiment of throwing a die, let A: number less than 2 and B: multiple of 3. Then A = {1} and B = {3, 6}, therefore $n(A \cap B) = 0$.
Exhaustive events: Two events A and B of a sample space S are called exhaustive events if together they contain every element of S, i.e. $A \cup B = S$. For example, in a throw of a fair die, the occurrence of an even number and the occurrence of an odd number are exhaustive events. Therefore $n(A \cup B) = n(S)$, so that $P(A \cup B) = 1$.
Complement event: Let S be a sample space and A be any event. The complement of A, denoted by $\bar{A}$, is the set of elements of S that do not belong to A. For example, if a die is thrown, S = {1, 2, 3, 4, 5, 6} and A: odd numbers, A = {1, 3, 5}, then $\bar{A} = \{2, 4, 6\}$.
5.3 DEFINITIONS OF PROBABILITY
Probability: For a random experiment with sample space S whose outcomes are all equally likely, the probability of an event E is defined as
$$P(E) = \frac{n(E)}{n(S)}.$$
Basic properties of probability:
1) The probability of an event E lies between 0 and 1, i.e. $0 \le P(E) \le 1$.
2) The probability of an impossible event is zero, i.e. $P(\emptyset) = 0$.
3) The probability of a certain event is unity, i.e. $P(S) = 1$.
4) If A and B are exhaustive events, then $P(A \cup B) = 1$.
5) If A and B are mutually exclusive events, then $P(A \cap B) = 0$.
6) If A is any event of the sample space, then $P(A) + P(\bar{A}) = 1$, so the probability of the complement of A is $P(\bar{A}) = 1 - P(A)$.
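The definition $P(E) = n(E)/n(S)$ and the properties above can be illustrated on the die-rolling experiment. The following is a minimal sketch in Python that models events as sets, assuming all outcomes are equally likely; the function name `prob` is chosen here for illustration only.

```python
from fractions import Fraction

# Sample space for one roll of a fair die.
S = {1, 2, 3, 4, 5, 6}


def prob(event, space=S):
    """Classical probability n(E)/n(S), assuming equally likely outcomes."""
    return Fraction(len(event & space), len(space))


A = {x for x in S if x % 2 == 0}   # even number
B = {x for x in S if x % 3 == 0}   # multiple of 3
not_A = S - A                      # complement of A

print(prob(A))                 # 1/2
print(prob(B))                 # 1/3
print(prob(set()))             # 0  (impossible event)
print(prob(S))                 # 1  (certain event)
print(prob(A) + prob(not_A))   # 1  (complement rule)
```

Exact fractions are used instead of decimals so that identities such as $P(A) + P(\bar{A}) = 1$ hold without rounding error.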
Probability Axioms:
Let S be a sample space. A probability function P from the set of all events in S to the set of real numbers satisfies the following three axioms for all events A and B in S:
i) $P(A) \ge 0$.
ii) $P(\emptyset) = 0$ and $P(S) = 1$.
iii) If A and B are two disjoint sets (i.e. $A \cap B = \emptyset$), then the probability of the union of A and B is $P(A \cup B) = P(A) + P(B)$.
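Theorem: For every event A of a sample space S, $0 \le P(A) \le 1$.
Proof: $S = A \cup \bar{A}$ and $A \cap \bar{A} = \emptyset$.
$\therefore 1 = P(S) = P(A \cup \bar{A}) = P(A) + P(\bar{A})$
$\therefore 1 = P(A) + P(\bar{A})$
$\Rightarrow P(A) = 1 - P(\bar{A})$ or $P(\bar{A}) = 1 - P(A)$.
By axiom (i), $P(\bar{A}) \ge 0$, so $P(A) = 1 - P(\bar{A}) \le 1$; also $P(A) \ge 0$.
$\therefore$ for every event A, $0 \le P(A) \le 1$.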
Addition theorem of probability:
Theorem: If A and B are two events of a sample space S, then the probability of the union of A and B is given by $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Proof: Let A and B be two events of the sample space S.
From the Venn diagram, $A \cup B$ is the union of the three mutually exclusive regions $A \cap \bar{B}$, $A \cap B$ and $B \cap \bar{A}$, so
$P(A \cup B) = P(A \cap \bar{B}) + P(A \cap B) + P(B \cap \bar{A})$.
But $P(A \cap \bar{B}) = P(A) - P(A \cap B)$ and $P(B \cap \bar{A}) = P(B) - P(A \cap B)$.
$\therefore P(A \cup B) = P(A) - P(A \cap B) + P(A \cap B) + P(B) - P(A \cap B)$
$\therefore P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Note: The above theorem can be extended to three events A, B and C as shown below:
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$$
Example 1: A bag contains 4 black and 6 white balls; two balls are selected at random. Find the probability that the balls are i) of different colors, ii) of the same color.
Solution: Total number of balls in the bag = 4 black + 6 white = 10 balls.
Two balls can be selected at random in
$n(S) = C(10, 2) = 45$ ways.
i) Let A be the event that the two balls are of different colors.
$\therefore n(A) = C(4, 1) \times C(6, 1) = 4 \times 6 = 24$.
$$P(A) = \frac{n(A)}{n(S)} = \frac{24}{45} = 0.53.$$
ii) Both balls are of the same color.
Let A be the event that both balls are black.
$n(A) = C(4, 2) = 6$
$$P(A) = \frac{n(A)}{n(S)} = \frac{6}{45}$$
Let B be the event that both balls are white.
$n(B) = C(6, 2) = 15$
$$P(B) = \frac{n(B)}{n(S)} = \frac{15}{45}.$$
A and B are disjoint events.
$\therefore$ The required probability is
$$P(A \cup B) = P(A) + P(B) = \frac{6}{45} + \frac{15}{45} = \frac{21}{45} = 0.467.$$
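The counts in Example 1 can be verified with the standard-library function `math.comb`, which returns $C(n, r)$. The following is a minimal sketch in Python; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

black, white = 4, 6

# n(S): number of ways to choose any 2 balls out of 10.
total_pairs = comb(black + white, 2)

# One black and one white ball (different colors).
different = comb(black, 1) * comb(white, 1)

# Two black balls or two white balls (same color); the two cases are disjoint.
same = comb(black, 2) + comb(white, 2)

p_different = Fraction(different, total_pairs)
p_same = Fraction(same, total_pairs)

print(total_pairs)                              # 45
print(p_different, round(float(p_different), 3))  # 8/15 0.533
print(p_same, round(float(p_same), 3))            # 7/15 0.467
```

Since "different colors" and "same color" are complementary events, the two probabilities add up to 1, which serves as a quick check on the answer.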
Example 2: From 40 tickets marked from 1 to 40, one ticket is drawn at random. Find the probability that it is marked with a multiple of 3 or 4.
Solution: One ticket is drawn at random from 40 tickets marked 1 to 40, so
$n(S) = C(40, 1) = 40$.
The ticket is to be marked with a multiple of 3 or 4, so we consider two events.
Let A be the event that the ticket is marked with a multiple of 3,
i.e. A = {3, 6, 9, …, 39}
$n(A) = C(13, 1) = 13$
$$P(A) = \frac{n(A)}{n(S)} = \frac{13}{40}$$
Let B be the event that the ticket is marked with a multiple of 4,
i.e. B = {4, 8, 12, …, 40}
$n(B) = C(10, 1) = 10$
$$P(B) = \frac{n(B)}{n(S)} = \frac{10}{40}.$$
Here A and B are not disjoint.
$A \cap B$ is the event that the ticket is marked with a multiple of both 3 and 4,
i.e. $A \cap B$ = {12, 24, 36}
$n(A \cap B) = C(3, 1) = 3$
$$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{3}{40}$$
$\therefore$ The required probability is
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{13}{40} + \frac{10}{40} - \frac{3}{40} = \frac{20}{40} = 0.5.$$
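Example 2 is a direct application of the addition theorem, and it can be checked by listing the tickets explicitly. The following is a minimal sketch in Python using set operations; the names `tickets` and `prob` are chosen here for illustration.

```python
from fractions import Fraction

tickets = set(range(1, 41))                 # tickets marked 1 to 40

A = {t for t in tickets if t % 3 == 0}      # multiples of 3
B = {t for t in tickets if t % 4 == 0}      # multiples of 4


def prob(event):
    """Classical probability of an event when every ticket is equally likely."""
    return Fraction(len(event), len(tickets))


direct = prob(A | B)                               # count the union directly
by_theorem = prob(A) + prob(B) - prob(A & B)       # addition theorem

print(len(A), len(B), sorted(A & B))   # 13 10 [12, 24, 36]
print(direct, by_theorem)              # 1/2 1/2
```

Counting the union directly and applying the addition theorem give the same value, 1/2, because subtracting $P(A \cap B)$ removes the tickets that would otherwise be counted twice.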
Example 3: Among graduates applying for jobs, the probability that a graduate applies for a program development job is 0.45, the probability that a graduate applies for a networking job is 0.8, and the probability that a graduate applies for both is 0.35. Find the probability that a graduate applies for at least one of the jobs. If the number of graduates is 500, how many have not applied for either job?
Solution: Let the probability of applying for a program development job be $P(A) = 0.45$.
Probability of applying for a networking job = $P(B) = 0.8$.
Probability of applying for both jobs = $P(A \cap B) = 0.35$.
The probability of applying for at least one job is $P(A \cup B)$.
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cup B) = 0.45 + 0.8 - 0.35 = 0.9$
There are 500 graduates; first we find the probability that a graduate has not applied for either job.
$P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.9 = 0.1$
Number of graduates who have not applied for a job = $0.1 \times 500 = 50$.
Check your Progress:
1. A card is drawn at random from a pack of 52 cards. Find the probability that it is a face card or a diamond card.
2. If $P(A) = \frac{3}{8}$, $P(B) = \frac{5}{8}$ and $P(A \cup B) = \frac{\;\;}{8}$, then find i) $P(\overline{A \cup B})$, ii) $P(A \cap B)$.
3. In a class of 60 students, 50 passed in computers, 40 passed in mathematics and 35 passed in both. What is the probability that a student selected at random has i) passed in at least one subject, ii) failed in both subjects, iii) passed in only one subject?
5.4 CONDITIONAL PROBABILITY
In many cases an event A is known to have occurred, and we are required to find the probability of an event B that depends on A. Problems of this kind are called conditional probability problems.
Definition: Let A and B be two events. The conditional probability of event B, given that event A has occurred, is defined by the relation
$$P(B|A) = \frac{P(A \cap B)}{P(A)}, \quad \text{provided } P(A) > 0.$$
When $P(A) = 0$, $P(B|A)$ is not defined, because then $P(B \cap A) = 0$ and $P(B|A) = \frac{0}{0}$, which is an indeterminate quantity.
Similarly, the conditional probability of event A, given that event B has occurred, is defined by the relation
$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0.$$
Example 4: A pair of fair dice is rolled. What is the probability that the sum of the uppermost faces is 6, given that both of the numbers are odd?
Solution: A pair of fair dice is rolled, therefore $n(S) = 36$.
Let A be the event that both numbers are odd, i.e. A = {(1,1), (1,3), (1,5), (3,1), (3,3), (3,5), (5,1), (5,3), (5,5)}.
$$P(A) = \frac{n(A)}{n(S)} = \frac{9}{36}$$
Let B be the event that the sum is 6, i.e. B = {(1,5), (2,4), (3,3), (4,2), (5,1)}.
$$P(B) = \frac{n(B)}{n(S)} = \frac{5}{36}$$
$A \cap B$ = {(1,5), (3,3), (5,1)}
$$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{3}{36}$$
By the definition of conditional probability,
$$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{3/36}{9/36} = \frac{1}{3}.$$
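Example 4 can be checked by enumerating all 36 outcomes of the two dice. The following is a minimal sketch in Python; the helper names `prob` and `cond_prob` are chosen here for illustration, and `cond_prob` raises an error when the conditioning event has probability zero, matching the remark that $P(B|A)$ is undefined in that case.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die) of rolling two fair dice.
S = set(product(range(1, 7), repeat=2))

A = {(a, b) for (a, b) in S if a % 2 == 1 and b % 2 == 1}   # both numbers odd
B = {(a, b) for (a, b) in S if a + b == 6}                  # sum equals 6


def prob(event):
    """Classical probability n(E)/n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))


def cond_prob(b, a):
    """P(b | a) = P(a and b) / P(a); undefined when P(a) = 0."""
    if prob(a) == 0:
        raise ValueError("P(B|A) is not defined when P(A) = 0")
    return prob(a & b) / prob(a)


print(len(A), len(B), sorted(A & B))   # 9 5 [(1, 5), (3, 3), (5, 1)]
print(cond_prob(B, A))                 # 1/3
```

Conditioning on A amounts to shrinking the sample space to the 9 outcomes in A, of which 3 have sum 6.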
Example 5: If A and B are two events of a sample space S such that $P(A) = 0.85$, $P(B) = 0.7$ and $P(A \cup B) = 0.95$, find i) $P(A \cap B)$, ii) $P(A|B)$, iii) $P(B|A)$.
Solution: Given that $P(A) = 0.85$, $P(B) = 0.7$ and $P(A \cup B) = 0.95$.
i) By the addition theorem,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$0.95 = 0.85 + 0.7 - P(A \cap B)$
$P(A \cap B) = 1.55 - 0.95 = 0.6$.
ii) By the definition of conditional probability,
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.6}{0.7} = 0.857.$$
iii) $$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0.6}{0.85} = 0.706.$$
Example 6: Urn A contains 4 red and 5 green balls. Urn B contains 5 red and 6 green balls. A ball is transferred from urn A to urn B, and then a ball is drawn from urn B. Find the probability that it is red.
Solution: There are two cases for the ball transferred from urn A to urn B. Let $R_A$ and $G_A$ denote the events that the transferred ball is red and green respectively, and let R denote the event that the ball drawn from urn B is red.
Case I: A red ball is transferred from urn A to urn B.
The probability of drawing a red ball from urn A is $P(R_A) = \frac{4}{9}$.
After the transfer of a red ball, urn B contains 6 red and 6 green balls.
The probability of then drawing a red ball from urn B is
$$P(R|R_A) \times P(R_A) = \frac{6}{12} \times \frac{4}{9} = \frac{24}{108}.$$
Case II: A green ball is transferred from urn A to urn B.
The probability of drawing a green ball from urn A is $P(G_A) = \frac{5}{9}$.
After the transfer of a green ball, urn B contains 5 red and 7 green balls.
The probability of then drawing a red ball from urn B is
$$P(R|G_A) \times P(G_A) = \frac{5}{12} \times \frac{5}{9} = \frac{25}{108}.$$
Therefore the required probability is
$$\frac{24}{108} + \frac{25}{108} = \frac{49}{108} = 0.4537.$$
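The two-case argument of Example 6 adds the probabilities of the two mutually exclusive ways of drawing a red ball from urn B. The following is a minimal sketch in Python that repeats the same steps with exact fractions; all variable names are chosen here for illustration.

```python
from fractions import Fraction

red_a, green_a = 4, 5      # contents of urn A
red_b, green_b = 5, 6      # contents of urn B before the transfer

# Probability of each kind of ball being transferred from urn A.
p_move_red = Fraction(red_a, red_a + green_a)
p_move_green = Fraction(green_a, red_a + green_a)

# After the transfer urn B holds one extra ball (12 in total).
size_b_after = red_b + green_b + 1
p_red_given_move_red = Fraction(red_b + 1, size_b_after)   # 6 red of 12
p_red_given_move_green = Fraction(red_b, size_b_after)     # 5 red of 12

# Case I + Case II.
p_red = (p_red_given_move_red * p_move_red
         + p_red_given_move_green * p_move_green)

print(p_red, round(float(p_red), 4))   # 49/108 0.4537
```

Changing the four counts at the top lets the same calculation be reused for other urn problems of this type.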
Check your progress:
1. A family has two children. What is the probability that both are boys, given that at least one is a boy?
2. Two dice are rolled. What is the conditional probability that the sum of the numbers on the dice exceeds 8, given that the first die shows 4?
3. Consider a medical test that screens for COVID-19, which affects 10 people in 1000. Suppose that the false positive rate is 4% and the false negative rate is 1%. Then 99% of the time a person who has the disease tests positive for it, and 96% of the time a person who does not have the disease tests negative for it. a) What is the probability that a randomly chosen person who tests positive for COVID-19 actually has the disease? b) What is the probability that a randomly chosen person who tests negative for COVID-19 does not have the disease?
5.4.1 Independent and Dependent Events, Mutually Exclusive Events:
Independent events:
Two events are said to be independent if the occurrence of one of them does not affect, and is not affected by, the occurrence or non-occurrence of the other,
i.e. $P(B|A) = P(B)$ or $P(A|B) = P(A)$.
Multiplication theorem of probability: If A and B are any two events associated with an experiment, then the probability of the simultaneous occurrence of events A and B is given by
$$P(A \cap B) = P(A)\,P(B|A),$$
where $P(B|A)$ denotes the conditional probability of event B given that event A has already occurred,
OR
$$P(A \cap B) = P(B)\,P(A|B),$$
where $P(A|B)$ denotes the conditional probability of event A given that event B has already occurred.
Multiplication theorem for independent events:
If A and B are independent events, then the multiplication theorem can be written as
$$P(A \cap B) = P(A)\,P(B).$$
Proof: By the multiplication theorem, if A and B are any two events associated with an experiment, then the probability of the simultaneous occurrence of events A and B is given by
$$P(A \cap B) = P(A)\,P(B|A).$$
By the definition of independent events, $P(B|A) = P(B)$ or $P(A|B) = P(A)$.
$\therefore P(A \cap B) = P(A)\,P(B)$.
Note:
1) If A and B are independent events, then $\bar{A}$ and $\bar{B}$ are independent events.
2) If A and B are independent events, then $\bar{A}$ and B are independent events.
3) If A and B are independent events, then A and $\bar{B}$ are independent events.
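The three notes above can be illustrated on a concrete pair of independent events. The following is a minimal sketch in Python; the choice of events (A: the first die is even, B: the second die shows more than 4) and the helper name `independent` are assumptions made here for illustration, and the check is a numerical illustration rather than a proof.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice.
S = set(product(range(1, 7), repeat=2))

A = {s for s in S if s[0] % 2 == 0}   # first die shows an even number
B = {s for s in S if s[1] > 4}        # second die shows 5 or 6
not_A, not_B = S - A, S - B           # complements


def prob(event):
    """Classical probability n(E)/n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))


def independent(e1, e2):
    """Check the product rule P(e1 and e2) = P(e1) * P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)


print(independent(A, B))           # True
print(independent(not_A, not_B))   # True  (Note 1)
print(independent(not_A, B))       # True  (Note 2)
print(independent(A, not_B))       # True  (Note 3)
```

Here A depends only on the first die and B only on the second, which is why the product rule holds for the events and for their complements.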
Example 7: Manish and Mandar are each trying to develop software for a company. The probability that Manish succeeds is $\frac{1}{5}$ and the probability that Mandar succeeds is $\frac{3}{5}$; both work independently. Find the probability that i) both succeed, ii) at least one succeeds, iii) neither succeeds, iv) only Mandar succeeds and Manish does not.
Solution: Let the probability that Manish succeeds be $P(A) = \frac{1}{5} = 0.2$.
Therefore the probability that Manish does not succeed is $P(\bar{A}) = 1 - P(A) = 1 - 0.2 = 0.8$.
The probability that Mandar succeeds is $P(B) = \frac{3}{5} = 0.6$.
Therefore the probability that Mandar does not succeed is $P(\bar{B}) = 1 - P(B) = 1 - 0.6 = 0.4$.
i) Both succeed, i.e. $P(A \cap B)$.
$P(A \cap B) = P(A) \times P(B) = 0.2 \times 0.6 = 0.12$, since A and B are independent events.
ii) At least one succeeds, i.e. $P(A \cup B)$.
By the addition theorem,
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.2 + 0.6 - 0.12 = 0.68$.
iii) Neither succeeds, i.e. $P(\overline{A \cup B})$ or $P(\bar{A} \cap \bar{B})$ [by De Morgan's law both are the same].
$P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.68 = 0.32$.
Or
If A and B are independent, then $\bar{A}$ and $\bar{B}$ are also independent.
$P(\bar{A} \cap \bar{B}) = P(\bar{A}) \times P(\bar{B}) = 0.8 \times 0.4 = 0.32$.
iv) Only Mandar succeeds and Manish does not, i.e. $P(\bar{A} \cap B)$.
$P(\bar{A} \cap B) = P(\bar{A}) \times P(B) = 0.8 \times 0.6 = 0.48$.
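The four answers of Example 7 follow from the product rule for independent events together with the complement rule. The following is a minimal sketch in Python using exact fractions; the variable names are chosen here for illustration.

```python
from fractions import Fraction

p_a = Fraction(1, 5)   # Manish succeeds
p_b = Fraction(3, 5)   # Mandar succeeds

# Independence lets us multiply the individual probabilities.
both = p_a * p_b
at_least_one = p_a + p_b - both        # addition theorem
neither = (1 - p_a) * (1 - p_b)        # complements are also independent
only_mandar = (1 - p_a) * p_b

print(float(both))           # 0.12
print(float(at_least_one))   # 0.68
print(float(neither))        # 0.32
print(float(only_mandar))    # 0.48
```

As a check, "at least one succeeds" and "neither succeeds" are complementary, so their probabilities add up to 1.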
Example 8: Two students A and B each attempt 50 coding problems, working independently. Student A codes 35 of them correctly and student B codes 40 of them correctly. Find the probability that only one of them codes a problem correctly.
Solution: The probability that student A codes correctly is $P(A) = \frac{35}{50} = 0.7$.
The probability that student A codes incorrectly is $P(\bar{A}) = 1 - 0.7 = 0.3$.
The probability that student B codes correctly is $P(B) = \frac{40}{50} = 0.8$.
The probability that student B codes incorrectly is $P(\bar{B}) = 1 - 0.8 = 0.2$.
Only one of them codes correctly when either A is correct and B is not, or B is correct and A is not. These two cases are mutually exclusive, so
$P(A \cap \bar{B}) + P(B \cap \bar{A}) = P(A) \times P(\bar{B}) + P(B) \times P(\bar{A})$
$= 0.7 \times 0.2 + 0.8 \times 0.3 = 0.14 + 0.24$
$= 0.38$
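The "only one of them" calculation of Example 8 extends naturally to any number of independent events: for each event, multiply its probability by the probabilities that all the others fail, then add the results. The following is a minimal sketch in Python; the function name `exactly_one` is chosen here for illustration.

```python
from fractions import Fraction
from math import prod


def exactly_one(probs):
    """Probability that exactly one of several independent events occurs."""
    total = Fraction(0)
    for i, p in enumerate(probs):
        # Probability that every other event fails to occur.
        others_fail = prod(
            (1 - q for j, q in enumerate(probs) if j != i),
            start=Fraction(1),
        )
        total += p * others_fail
    return total


p_a = Fraction(35, 50)   # student A codes correctly
p_b = Fraction(40, 50)   # student B codes correctly

result = exactly_one([p_a, p_b])
print(result, float(result))   # 19/50 0.38
```

With two events the function reduces to $P(A)P(\bar{B}) + P(B)P(\bar{A})$, the expression used in the solution above.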
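Example 9: Given that $P(A) = \frac{3}{7}$ and $P(B) = \frac{2}{7}$, if A and B are independent events, find i) $P(A \cap B)$, ii) $P(\bar{B})$, iii) $P(A \cup B)$, iv) $P(\bar{A} \cap \bar{B})$.
Solution: Given that $P(A) = \frac{3}{7}$ and $P(B) = \frac{2}{7}$.
i) A and B are independent events,
$$\therefore P(A \cap B) = P(A) \times P(B) = \frac{3}{7} \times \frac{2}{7} = \frac{6}{49} = 0.122.$$
ii) $P(\bar{B}) = 1 - P(B) = 1 - \frac{2}{7} = \frac{5}{7} = 0.714$.
iii) By the addition theorem,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{7} + \frac{2}{7} - \frac{6}{49} = \frac{29}{49} = 0.592.$$
iv) $P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.592 = 0.408$.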
Check your progress:
1. If $P(A) = \frac{2}{5}$, $P(B) = \frac{1}{3}$ and A and B are independent events, find (i) $P(A \cap B)$, (ii) $P(A \cup B)$, (iii) $P(\bar{A} \cap \bar{B})$.
2. The probabilities that A, B and C can solve the same problem independently are $\frac{1}{3}$, $\frac{2}{5}$ and $\frac{3}{4}$ respectively. Find the probability that i) the problem remains unsolved, ii) the problem is solved, iii) only one of them solves the problem.
3. The probability that Ram can shoot a target is $\frac{2}{5}$ and the probability that Laxman can shoot the same target is $\frac{4}{5}$. Ram and Laxman shoot independently. Find the probability that (i) the target is not shot at all, (ii) the target is shot by at least one of them, (iii) the target is shot by only one of them, (iv) the target is shot by both.
5.5 PROBABILITY DISTRIBUTIONS
In order to understand the behavior of a random variable, we may want to look at its average value. In probability, this average is called the expected value of the random variable X. To define it, we first need some basic concepts about random variables.
Random Variable: A real-valued function X, defined over the sample space of a random experiment and taking its values with respective probabilities, is called a random variable.
Types of random variables: There are two types of random variables.
Discrete Random Variable: A random variable is said to be discrete if it takes a finite or countably infinite number of values. Thus a discrete random variable takes only isolated values.
Continuous Random Variable: A random variable is continuous if its set of possible values consists of an entire interval on the number line.
Probability Distribution of a random variable: All possible values of
the random variable, along with its corresponding probabilities, so
that∑ =1