Statistical-Methods-and-Testing-of-Hypothesis-munotes

UNIT I
1
STANDARD DISTRIBUTIONS
CONTENTS OF MODULE
Unit Structure
1.0 Objective
1.1 Introduction
1.2 Study Guidance
1.3 Standard Distributions
1.3.1 Random, Discrete and Continuous Variable
1.3.2 Probability Mass Function
1.3.3 Probability Density Function
1.3.4 Expectation
1.3.5 Variance
1.3.6 Cumulative Distribution Function
1.3.7 Reliability
1.4 Introduction and properties of following distributions
1.5 Binomial Distribution
1.6 Normal Distribution
1.7 Chi-square test
1.8 T-test
1.9 F-test
1.10 Summary
1.11 Unit End Questions
1.12 References
1.13 Further Readings
1.0 OBJECTIVES
Students will be able to:
• Identify the types of random variables.
• Understand the concept of a probability distribution.
• Understand various types of distributions.
1.1 INTRODUCTION
The science of statistics deals with assessing the uncertainty of inferences
drawn from random samples of data. This chapter focuses on random
variables, their types, and their probability distributions. To assess the
outcome of an experiment, it is desirable to associate a real number X with
each possible outcome. The concept of "randomness" is fundamental to the
field of statistics. Probability is used not only to calculate the chance of
a single event but also to summarize the likelihood of all possible
outcomes. The relationship between each possible outcome of a random
variable and its probability is called a probability distribution.
Probability distributions are an important foundational concept in
probability, and the names and shapes of the common ones will become
familiar. The structure and type of a probability distribution depend on the
properties of the random variable, such as whether it is continuous or
discrete, and this in turn affects how the distribution is summarized and
how the most likely outcome and its probability are calculated.
1.2 STUDY GUIDANCE
• Understand the basic concepts.
• Practice the questions given in the module.
• Refer to the further reading.
1.3 STANDARD DISTRIBUTION
1.3.1 Random Variable:
A random variable is a real-valued function that assigns a value to each
outcome of an experiment. It lets us describe outcomes numerically and study
the statistical relationships among them.
Eg. 1) If the random variable X records the birth of a male child, then X
takes the value 1 if a male child is born and 0 if a female child is born.
Eg. 2) If the random experiment consists of tossing two coins, then the
random variable "number of heads" can take the values 0, 1 or 2.
There are two types of random variables: discrete and continuous.
Discrete Random Variable:
A random variable that may assume only a finite number or a countably
infinite number of values is said to be discrete. For instance, a random
variable representing the number of misprints in a book is a discrete
random variable.
Continuous Random Variable:
A continuous random variable can assume any value in an interval on the
real number line or in a collection of intervals. Since there are infinitely
many values in any interval, it is not meaningful to talk about the
probability that the random variable takes on a specific value; instead,
we consider the probability that it lies within a given interval:
$P(a \le X \le b) = \int_a^b f(x)\,dx$
where f(x) is the probability density function defined in Section 1.3.3.
1.3.2 Probability Mass Function:
The probability distribution for a random variable describes how the
probabilities are distributed over the values of the random variable. For a
discrete random variable X, the probability distribution is defined by a
probability mass function, denoted by p(x). This function provides the
probability for each value of the random variable. A probability mass
function always satisfies the following properties:
(1) p(x) must be nonnegative for each value of the random variable,
i.e., $p(x_i) \ge 0$ for all values of i.
(2) The sum of the probabilities over all values of the random variable
must be equal to one,
i.e., $\sum_{i=1}^{n} p(x_i) = 1$
The set of values x_i together with the corresponding probabilities p(x_i)
constitutes the probability distribution of the discrete random variable X.
If X is a discrete random variable, then P(X) is called the probability mass
function (PMF).
The following table shows the distribution of a discrete random variable X:
X        x1   x2   x3   x4   ...   xn
P(X=x)   p1   p2   p3   p4   ...   pn
Eg: The probability distribution of the discrete random variable X, the
number of heads obtained when two coins are tossed:
X        0     1     2
P[X=x]   1/4   2/4   1/4
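The two-coin table can be reproduced by listing every equally likely outcome and counting heads. The sketch below is illustrative only; the helper name `heads_pmf` is not from the module, and the coins are assumed to be fair.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_pmf(n_coins):
    """Return the PMF of the number of heads in n_coins fair tosses."""
    # Every sequence of H/T is equally likely for fair coins.
    outcomes = list(product("HT", repeat=n_coins))
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


pmf = heads_pmf(2)
for x, p in pmf.items():
    print(x, p)
```

Both PMF properties can be checked on the result: every probability is nonnegative and the values sum to one.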
1.3.3 Probability Density Function:
For a continuous random variable X, the probability distribution is defined
by a probability density function (PDF), denoted by f(x), which should
satisfy the following conditions:
• For a continuous random variable that takes values between certain
limits, say a and b, the probability is given by
$P(a \le X \le b) = \int_a^b f(x)\,dx$
• The probability density function is non-negative for all possible
values, i.e. $f(x) \ge 0$ for all x.
• The area between the density curve and the horizontal x-axis is equal to
1, i.e. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Note: The probability mass function is different from the probability
density function. f(x) is not itself a probability (it may even exceed 1);
probabilities are obtained only by integrating f(x) over an interval.
Eg.: Let X be a continuous random variable with PDF
$f(x) = x$ for $0 < x < 1$, $f(x) = 2 - x$ for $1 \le x < 2$, and 0 otherwise.
Find P(0.2 < X < 1.2).
Solution: $P(0.2 < X < 1.2) = \int_{0.2}^{1.2} f(x)\,dx$
We can split the integral at x = 1, where the definition of f(x) changes:
$= \int_{0.2}^{1} x\,dx + \int_{1}^{1.2} (2 - x)\,dx$
$= \left[\frac{x^2}{2}\right]_{0.2}^{1} + \left[2x - \frac{x^2}{2}\right]_{1}^{1.2}$
$= \left(\frac{1}{2} - 0.02\right) + \left(2.4 - 0.72 - 2 + \frac{1}{2}\right)$
$= \frac{1}{2} - 0.02 + 1.68 - 2 + \frac{1}{2}$
$= 0.66$
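The value 0.66 can be checked numerically. The sketch below assumes the piecewise density f(x) = x on (0, 1) and 2 - x on [1, 2), and uses a plain midpoint rule; the function names are illustrative and not part of the module.

```python
def f(x):
    """Piecewise density from the example: x on (0, 1), 2 - x on [1, 2)."""
    if 0 < x < 1:
        return x
    if 1 <= x < 2:
        return 2 - x
    return 0.0


def integrate(func, a, b, n=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h


prob = integrate(f, 0.2, 1.2)   # P(0.2 < X < 1.2)
total = integrate(f, 0.0, 2.0)  # total area under the density
print(round(prob, 4), round(total, 4))
```

The second integral confirms that the total area under f(x) is 1, as required of a density.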
1.3.4 Expectation of Random Variable (Mean):
Case 1 Discrete Random variable:
The expected value of a random variable is the average value of the
random variable over a large number of experiments. In the case of a
discrete random variable, the expected value can be found by using the
formula
$E(X) = \sum_{i=1}^{n} x_i\,p(x_i)$
E.g. Find the expected value from the given probability distribution table:
x      -1     -2     -3     0      1     2
P(x)   0.25   0.35   0.01   0.01   0.2   0.18
Solution:
Expected value, $E(X) = \sum_{i=1}^{n} x_i\,P(x_i)$
$= (-1)(0.25) + (-2)(0.35) + (-3)(0.01) + (0)(0.01) + (1)(0.2) + (2)(0.18)$
$= -0.25 - 0.7 - 0.03 + 0 + 0.2 + 0.36$
$= -0.42$
Case 2 Continuous Random variable:
Let X be a continuous random variable with pdf f(x). Then the mathematical
expectation of X, denoted by E(X), is given by
$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$
For Eg: Let X be a continuous random variable with $f(x) = 3x^2$ for
$0 \le x \le 1$ and $f(x) = 0$ otherwise. Find the expected value.
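A worked solution for this example, assuming the density is $f(x) = 3x^2$ on $[0, 1]$ and zero elsewhere, follows directly from the definition of E(X):

```latex
E(X) = \int_{0}^{1} x \cdot 3x^{2}\,dx
     = 3\int_{0}^{1} x^{3}\,dx
     = 3\left[\frac{x^{4}}{4}\right]_{0}^{1}
     = \frac{3}{4}
```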

1.3.5 Variance of a Random Variable:
Case 1: Discrete Random variable:
The variance of a discrete random variable is denoted by V(X) and is
defined as
$V(X) = E(X^2) - [E(X)]^2$
where E(X) is the expected value and
$E(X^2) = \sum x^2\,p(x)$
Case 2: Continuous Random variable:
The variance of a continuous random variable is denoted by V(X) and is
defined as
$V(X) = E(X^2) - [E(X)]^2$
where E(X) is the expected value and
$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
Eg: Find the mean and variance of the given data
X      1     2      3     4     5      6
P(X)   0.2   0.15   0.1   0.2   0.15   0.2
$E(X) = \sum x_i\,p(x_i)$
$= 1(0.2) + 2(0.15) + 3(0.1) + 4(0.2) + 5(0.15) + 6(0.2)$
$= 0.2 + 0.3 + 0.3 + 0.8 + 0.75 + 1.2 = 3.55$
$E(X^2) = \sum x_i^2\,p(x_i)$
$= 1^2(0.2) + 2^2(0.15) + 3^2(0.1) + 4^2(0.2) + 5^2(0.15) + 6^2(0.2)$
$= 0.2 + 0.6 + 0.9 + 3.2 + 3.75 + 7.2 = 15.85$
$V(X) = E(X^2) - [E(X)]^2 = 15.85 - (3.55)^2 = 15.85 - 12.6025 = 3.2475$
Mean = 3.55
Variance = 3.2475
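The discrete formulas for E(X) and V(X) translate into a few lines of code. The helper below is a minimal sketch (the name `mean_and_variance` is illustrative); it is applied to the table above and to the earlier expected-value table, where the signed sum evaluates to -0.42.

```python
def mean_and_variance(values, probs):
    """Return (E(X), V(X)) for a discrete distribution given as two lists."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(values, probs))
    second_moment = sum(x * x * p for x, p in zip(values, probs))
    # V(X) = E(X^2) - [E(X)]^2
    return mean, second_moment - mean ** 2


mean, variance = mean_and_variance(
    [1, 2, 3, 4, 5, 6], [0.2, 0.15, 0.1, 0.2, 0.15, 0.2]
)
earlier_mean, _ = mean_and_variance(
    [-1, -2, -3, 0, 1, 2], [0.25, 0.35, 0.01, 0.01, 0.2, 0.18]
)
print(round(mean, 4), round(variance, 4), round(earlier_mean, 4))
```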

Eg: The p.d.f. of a random variable X is $f(x) = 6x(1 - x)$, $0 \le x \le 1$.
Find the mean and variance.
Mean $E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$
$= \int_0^1 6x \cdot x(1 - x)\,dx$
$= 6\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1$
$= 6\left(\frac{1}{3} - \frac{1}{4}\right)$
$= 6 \cdot \frac{1}{12}$
$= \frac{1}{2}$
$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
$= \int_0^1 6x^2 \cdot x(1 - x)\,dx$
$= 6\int_0^1 (x^3 - x^4)\,dx$
$= 6\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1$
$= 6 \cdot \frac{1}{20}$
$= \frac{3}{10}$
$V(X) = E(X^2) - [E(X)]^2 = \frac{3}{10} - \frac{1}{4} = \frac{1}{20}$
Mean = 1/2
Variance = 1/20
1.3.6 Cumulative Distribution Function:
When we deal with an inequality such as X ≤ a, the resulting event
contains all outcomes for which X is less than or equal to a.
Applying the probability function to such an event gives a cumulative
probability: the probability that the random variable takes a value less
than or equal to a particular value. The cumulative distribution function of
a random variable is another way to describe the distribution of the
random variable.
If X is a continuous random variable with pdf f(x), then the function
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$, $-\infty < x < \infty$,
is called the cumulative distribution function (cdf).
1.3.7 Reliability:
Reliability depends on probability for measuring and describing its
characteristics. Let X be the lifetime, or the time to failure, of a
component. The probability that the component survives until some time t is
called the reliability R(t) of the component.
Thus, $R(t) = P(X > t) = 1 - F(t)$, where F is the distribution function of
the component lifetime X. The component is normally (but not always)
assumed to be working properly at time t = 0 [i.e., R(0) = 1], and no
component can work forever without failure, i.e. $\lim_{t \to \infty} R(t) = 0$.
Also, R(t) is a monotone decreasing function of t. For t less than zero,
reliability has no meaning, but we let R(t) = 1 for t < 0. F(t) will often
be called the unreliability.
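As an illustration of R(t) = 1 - F(t), the sketch below assumes an exponentially distributed lifetime with a constant failure rate. That choice of lifetime model, the rate value, and the function name are assumptions made only for the example; the text above does not fix a particular distribution.

```python
import math


def reliability(t, rate):
    """R(t) = 1 - F(t) for an (assumed) exponential lifetime."""
    if t < 0:
        return 1.0  # convention from the text: R(t) = 1 for t < 0
    cdf = 1.0 - math.exp(-rate * t)  # F(t), the unreliability
    return 1.0 - cdf


rate = 0.5  # hypothetical failure rate per unit time
curve = [reliability(t, rate) for t in (0, 1, 2, 5, 10)]
print([round(r, 4) for r in curve])
```

The computed values start at R(0) = 1 and decrease toward 0 as t grows, matching the properties listed above.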
1.4 INTRODUCTION - DISTRIBUTION FUNCTIONS
In the previous section we discussed random variables, their probability
distributions, and their mean and variance. This section focuses on some
standard distributions and their properties.
Bernoulli's Trial:
Bernoulli trials are experiments that result in one of two mutually
exclusive and exhaustive outcomes; one of them is termed success and the
other failure. For example, when an unbiased coin is tossed we can define
success as getting a tail, and hence getting a head is failure.
1.5 BINOMIAL DISTRIBUTION
Consider n independent Bernoulli trials, each of which results in either
success or failure, with probability of success p and probability of
failure q.
Let X be a discrete random variable denoting the number of successes in n
independent trials. The variate X is called a binomial variate, and the
probability distribution of X is called the Binomial distribution, defined
as
$P(X = x) = \binom{n}{x} p^x q^{n-x}$ for x = 0, 1, 2, ..., n
= 0 elsewhere,
where 0 < p < 1 and q = 1 - p.
For example, assume an unbiased coin is tossed 10 times. The probability of
getting a head on any one flip is 1/2, so the number of heads has a binomial
distribution with n = 10 and p = 1/2. "Success" would be "flipping a head"
and failure would be "flipping a tail".
Properties of Binomial distribution are as follows:
1. Mean of binomial distribution is np
2. Variance of Binomial distribution is npq
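The binomial formula and its two properties can be checked for the coin example (n = 10, p = 1/2). This is a minimal sketch using only the standard library; the function name is illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(x * x * px for x, px in enumerate(pmf)) - mean ** 2
print(round(sum(pmf), 6), round(mean, 6), round(variance, 6))
```

For these inputs the mean equals np = 5 and the variance equals npq = 2.5.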
1.6 NORMAL DISTRIBUTION
The normal distribution is the most important and most widely used
distribution in statistics. In statistics most of the symmetrical
distributions are similar to the normal distribution.
The equation of the normal curve is
$y = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}$
where σ = standard deviation and μ = arithmetic mean.
We can transform the variable x to $z = \frac{x - \mu}{\sigma}$; here z is
called the standard normal variate.
The parameters μ and σ are the mean and standard deviation,
respectively, and define the normal distribution.

The following are the properties of Normal distribution:
1. It is bell shaped and symmetrical in nature.
2. The mean, median, and mode of a normal distribution are identical.
3. The total area under the normal curve is unity.
4. Normal distributions are denser in the center and less dense in the
tails.
5. Normal distributions are defined by two parameters, the mean (μ)
and the standard deviation (σ).
6. Approximately 68% of the area of a normal distribution is within one
standard deviation of the mean.
7. Approximately 95% of the area of a normal distribution is within two
standard deviations of the mean.
8. The position and shape of the normal curve depend upon μ, σ and N.
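Properties 6 and 7 can be verified with the normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `area_within` is illustrative.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


one_sd = area_within(1)
two_sd = area_within(2)
print(round(one_sd, 4), round(two_sd, 4))
```

Because the area depends only on z = (x - μ)/σ, the same values result for any choice of mean and standard deviation.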
1.7 CHI-SQUARE DISTRIBUTION
The chi-square distribution is a continuous probability distribution that is
widely used in statistical inference. It is related to the standard normal
distribution: if a random variable Z has the standard normal distribution,
then Z squared has a chi-square distribution with one degree of freedom.
More generally, the chi-square distribution, denoted $\chi^2$, arises as
follows: if $Z_1, Z_2, \dots, Z_k$ are independent standard normal
variables, then the sum of their squares $Z_1^2 + Z_2^2 + \dots + Z_k^2$ has
the chi-square distribution with k degrees of freedom. Here, k is the number
of independent squared normal variables in the sum.
The properties are as follows:
1. The chi-square distribution is a continuous probability distribution with
values ranging from 0 to ∞ (infinity) in the positive direction. $\chi^2$
can never assume negative values.
2. The shape of the chi-square distribution depends on the number of
degrees of freedom v. When v is small, the curve is skewed to the right,
and as v gets larger, the shape becomes more symmetrical and can be
approximated by the normal distribution.
3. The mean of the chi-square distribution is equal to the degrees of
freedom, i.e. $E(\chi^2) = v$, while the variance is twice the degrees of
freedom, viz. $V(\chi^2) = 2v$.
4. The $\chi^2$ distribution approaches the normal distribution as v gets
larger, with mean v and standard deviation $\sqrt{2v}$.

[Figure: Chi-square distribution with different degrees of freedom.]
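The link between squared standard normal variables and the chi-square distribution can be illustrated by simulation. The sketch below sums v squared standard normal draws and compares the sample mean and variance with v and 2v. The seed, sample size, and function name are arbitrary choices for the illustration, and simulated values only approximate the theoretical ones.

```python
import random


def simulate_chi_square(df, n_samples, seed=0):
    """Draw chi-square values as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_samples)
    ]


samples = simulate_chi_square(df=3, n_samples=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
print(round(sample_mean, 2), round(sample_var, 2))  # near 3 and 6
```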
1.8 T-DISTRIBUTION
The t-distribution, also known as Student's t-distribution, is the
probability distribution used to estimate population parameters when the
sample size is small and the population standard deviation is unknown.
It resembles the normal distribution, and as the sample size increases the
t-distribution approaches the standard normal distribution, with mean 0 and
standard deviation 1.
Properties of t-Distribution:
1. The graph of the t-distribution is also bell-shaped and symmetrical
with a mean of zero.
2. The t-distribution is most useful for small sample sizes, when the
population standard deviation is not known, or both.
3. The t-distribution ranges from -∞ to +∞ (infinity).
4. The shape of the t-distribution changes with the change in the degrees
of freedom.
5. The variance is always greater than one and is defined only when the
degrees of freedom $v \ge 3$.
6. It is less peaked at the center and heavier in the tails than the
standard normal curve, i.e. it is leptokurtic.
7. The t-distribution has a greater dispersion than the standard normal
distribution, and as the sample size n increases, it approaches the
normal distribution. Here the sample size is said to be large when
$n \ge 30$.
1.9 F-DISTRIBUTION
The distribution used to compare two variances taken from two independent
populations is called the F-distribution. The distribution of all possible
values of the F statistic is called the F-distribution with degrees of
freedom $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$. The F-distribution has the
following properties:
1. The curve is not symmetrical but skewed to the right.
2. The F-distribution is positively skewed; as the degrees of freedom
v1 and v2 increase, its skewness decreases.
3. The F statistic is greater than or equal to zero.
4. As the degrees of freedom for the numerator and the denominator get
larger, the curve approximates the normal.
5. The mean and variance are given by:
Mean = $\frac{v_2}{v_2 - 2}$, for $v_2 > 2$
Variance = $\frac{2 v_2^2 (v_1 + v_2 - 2)}{v_1 (v_2 - 2)^2 (v_2 - 4)}$, for $v_2 > 4$

6. The shape of the F-distribution depends on its parameters, the degrees
of freedom v1 and v2.
7. The values of the area lying on the left-hand side of the distribution
can be found by taking the reciprocal of the F values corresponding to
the right-hand side, with the degrees of freedom in the numerator and
the denominator interchanged.
1.10 SUMMARY
We discussed random variables and their different types. There are two
types of probability distribution, discrete and continuous. A random
variable that assumes only a finite or countably infinite number of values
is called a discrete random variable. A continuous random variable can
assume an uncountable number of values. A discrete random variable is
associated with a probability mass function and a continuous random
variable with a probability density function. The expected value and
variance of discrete and continuous distributions were defined. We learnt
some standard distributions and their properties; these distributions will
be applied in testing of hypothesis. Applications of probability methods
can be seen in modeling of text and Web data, network traffic modeling,
probabilistic analysis of algorithms and graphs, reliability modeling,
simulation algorithms, data mining, and speech recognition.
1.11 UNIT END QUESTIONS
1. Let X be a continuous random variable with the following PDF
$f(x) = a e^{-x}$ for $x \ge 0$, and $f(x) = 0$ elsewhere,
where a is a positive constant.
(i) Find a
(ii) Find the CDF of X, $F_X(x)$
(iii) Find P(1 < X < 3)

2. Let X be a random variable with PDF given by 1xaexofx x𝑒𝑙𝑠𝑒𝑤ℎ𝑒𝑟𝑒
Where a is a positive constant
(i) Find a
(ii) Find E(X) and V(X)
3. Check whether the following can define probability distribution
(i) 15xfx where x =0 , 1 , 2 , 3 , 4 ,5
(ii) 23xfxwhere x = 3, 4, 5
(iii) 12fxwhere x = 1, 2
4. Consider tossing a fair coin 3 times. Define X = number of times tails
occurred.
Value         0     1     2     3
Probability   1/8   3/8   3/8   1/8
Find E(X) and V(X).
5. Find the mean and variance of x given the following probability
distribution:
x      2     4     6     8     10
P(x)   0.3   0.2   0.2   0.2   0.1
6. A random variable x has the following probability distribution:
x      0   1    2    3    4    5    6
P(x)   k   2k   3k   5k   4k   2k   k
Find k. Hence find E(x).

7. A bag contains 4 red and 6 white balls. A person draws two balls at
random and receives Rs. 10 for each red ball and Rs. 5 for each white ball.
Find his mathematical expectation.
8. A continuous distribution of a variable X in the range (-3, 3) is
defined by
$f(x) = \frac{1}{16}(3 + x)^2$, $-3 \le x \le -1$
$f(x) = \frac{1}{16}(6 - 2x^2)$, $-1 \le x \le 1$
$f(x) = \frac{1}{16}(3 - x)^2$, $1 \le x \le 3$
(i) Verify that the area under the curve is unity.
(ii) Find the mean and variance of the above distribution.
1.12 REFERENCES
1. Probability and Statistics with Reliability, Queuing and Computer
Science Applications, Kishor S. Trivedi, 2016 by John Wiley & Sons,
Inc., 1946.
2. Fundamentals of Mathematical Statistics by S.C. Gupta, 10th Edition,
2002.
1.13 FURTHER READING
1. Introductory Business Statistics, Alexander Holmes et al., 2018.


*****
UNIT II
2
HYPOTHESIS TESTING
Unit Structure
2.0 Objective
2.1 Introduction
2.2 Hypothesis Testing
2.3 Null Hypothesis (H0)
2.4 Alternate Hypothesis (H1)
2.5 Critical Region
2.6 P-Value
2.7 Tests based on T
2.8 Normal and F Distribution
2.9 Analysis of Variance
2.10 One Way analysis of variance
2.11 Two-way analysis of variance
2.12 Summary
2.13 Unit End Questions
2.14 References for Future Reading
2.0 OBJECTIVE
• Statistics is the process of collecting, organizing, and analyzing data
and drawing conclusions from it.
• Statistical analysis gives meaning to otherwise insignificant data or
numbers.
• Statistics is "a branch of mathematics that deals with the collection,
analysis, interpretation, and presentation of masses of numerical data."
2.1 INTRODUCTION
• Statistics is the science of collecting, organizing, analyzing, and
interpreting data in order to make decisions.
• Statistics is used to describe a data set and to draw conclusions about
the population from the data set.
Statistical methods are of two types:
Descriptive Method: This method uses graphs and numerical summaries.
Inferential Method: This method uses confidence intervals and
significance tests, which are part of applied statistics.
2.2 HYPOTHESIS TESTING
Definition
• A hypothesis is a claim or idea about a group or population.
• A hypothesis is an educated guess or assumption that can be tested.
• A hypothesis is formulated based on previous studies.
2.3 NULL HYPOTHESIS (H0)
A statistical hypothesis that is formulated for the purpose of rejecting or
nullifying it.
2.4 ALTERNATE HYPOTHESIS (H1)
Any hypothesis that differs from the given null hypothesis.
The alternative hypothesis is what you might believe to be true or hope to
prove true.
Null Hypothesis
A null hypothesis can be a statement of equality.
Example - There is no difference in the mean scores of VSIT and SIES.
H0: μ1 = μ2
A null hypothesis can be a statement of no relationship.
Example - There is no relation between personality and job success.
Plant growth is not affected by light intensity.
Hypothesis Testing - Two Tailed and One Tailed Test
Hypothesis Testing
▸ A decision-making process for evaluating claims about a population.
▸ It decides whether to accept or reject H0.

Type I and Type II Errors:
Type I Error:
Rejecting the null hypothesis when it is actually true and should have been
accepted.
Type II Error:
Accepting the null hypothesis when it is actually false and should have
been rejected.

Two Tailed and One Tailed Test:
▸ Two tail test: The critical area of the distribution is two-sided, and
the test checks whether the sample statistic is greater than or less than a
certain range of values.
▸ If the sample being tested falls into either of the critical areas, the
alternative hypothesis is accepted instead of the null hypothesis.
▸ One tail test: A one-tailed test is a statistical test in which the
critical area of the distribution is one-sided, so that it is either greater
than or less than a certain value, but not both.
▸ If the sample being tested falls into the one-sided critical area, the
alternative hypothesis is accepted instead of the null hypothesis.


One-tailed tests are applied to answer questions such as: Is our finding
significantly greater than our assumed value? Or: Is our finding
significantly less than our assumed value?
Two-tailed tests are applied to answer the question: Are the findings
different from the assumed mean?
Level of Significance (LOS):
▸ Maximum allowable probability of making a Type I error.
▸ This probability is denoted by α.
▸ A significance level of 0.05 (5%) or 0.01 (1%) is common.
2.5 CRITICAL REGION
Critical values Zc for the common levels of significance:
Test              α = 0.05 (5%)   α = 0.01 (1%)
Two-tailed Test   Zc = 1.96       Zc = 2.58
One-tailed Test   Zc = 1.645      Zc = 2.33
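The critical values Zc come from the inverse CDF of the standard normal distribution. The sketch below recovers them with `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `critical_z` is illustrative.

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed):
    """Critical value Zc of the standard normal for significance level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


for alpha in (0.05, 0.01):
    print(alpha,
          round(critical_z(alpha, two_tailed=True), 3),
          round(critical_z(alpha, two_tailed=False), 3))
```

Rounded to the precision used in statistical tables, these values match the critical values quoted in this unit.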

2.6 P-VALUE
Z Score:
▸ Mean: Z = (X̄ - μ)/(σ/√N)
▸ Proportion: Z = (P - p)/√(pq/N)
Steps for hypothesis testing
1. Propose H0 and H1.
2. Identify test -
▹ one tailed (if H1 uses < or >)
▹ two tailed (if H1 uses ≠)
3. Get table value Zc according to the LOS mentioned in the problem.
4. Find Z score using the formula.
5. Inference -
▹ If |Z| < Zc, accept H0.
▹ If |Z| > Zc, reject H0 (for a one-tailed test, Z must also lie on the
side named in H1).

Question :

Solution :
Step 1 - Write given values
μ = 1800
σ = 100
N = 50
X̄ = 1850
LOS = α = 0.01 = 1%
Step 2 - Propose H0 and H1
H0: μ = 1800; the cable strength is unchanged
H1: μ > 1800; the cable strength is increased
Step 3 - Identify Test
As the > sign is there, use a one-tailed test.
Step 4 - Get table value of Zc for LOS α = 0.01 (1%)
Zc = 2.33
Step 5 - Find Z score using formula:
Z = (X̄ - μ)/(σ/√N) = (1850 - 1800)/(100/√50)
Z = 3.5355
Step 6 - Inference
Z > Zc, reject H0.
▸ Therefore, we can support the claim at 0.01 LOS, i.e., the cable
strength is increased.
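The Z score for a mean can be computed directly from the given values. The sketch below applies the formula Z = (X̄ - μ)/(σ/√N) to the cable data (μ = 1800, σ = 100, N = 50, X̄ = 1850) and compares it with the one-tailed critical value 2.33; the function name is illustrative.

```python
from math import sqrt


def z_score_mean(sample_mean, pop_mean, pop_sd, n):
    """Z = (sample mean - population mean) / (population SD / sqrt(N))."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


z = z_score_mean(sample_mean=1850, pop_mean=1800, pop_sd=100, n=50)
z_critical = 2.33  # one-tailed table value at the 1% level
reject_h0 = z > z_critical
print(round(z, 4), reject_h0)
```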

Step 1 - Write given values
μ = 74.5
σ = 8
N = 200
X̄ = 75.9
LOS = α = 0.05 = 5%
One tailed Test:
Step 2 - Propose H0 and H1
H0: μ = 74.5; performance of school is same as population
H1: μ > 74.5; performance of school is better than population
Step 3 - Identify Test
As the > sign is there, use a one-tailed test.
Step 4 - Get table value of Zc for LOS α = 0.05 (5%)
Zc = 1.645
Step 5 - Find Z score using formula:
Z = (X̄ - μ)/(σ/√N) = (75.9 - 74.5)/(8/√200)
Z = 2.4748
Step 6 - Inference
Z = 2.4748, Zc = 1.645
As Z > Zc, reject H0.
Therefore, we can support the claim at 0.05 LOS, i.e., the performance of
the school is better than the population.

Two tailed Test:
Step 2 - Propose H0 and H1
H0: μ = 74.5; performance of school is same as population
H1: μ ≠ 74.5; performance of school is different from population
Step 3 - Identify Test
As the ≠ sign is there, use a two-tailed test.
Step 4 - Get table value of Zc for LOS α = 0.05 (5%)
Zc = 1.96
Step 5 - Find Z score using formula:
Z = 2.4748
Step 6 - Inference
As Z > Zc, reject H0.
Therefore, we can support the claim at 0.05 LOS, i.e., the performance of
the school is different from the population.

Z Score
Mean: Z = (X̄ - μ)/(σ/√N)
Proportion: Z = (P - p)/√(pq/N)
Steps for hypothesis testing
1. Write given values.
2. Propose H0 and H1.
3. Identify test -
one tailed (if H1 uses < or >)
two tailed (if H1 uses ≠)
4. Get table value Zc according to the LOS mentioned in the problem.
5. Find Z score using the formula.
6. Inference -
If |Z| < Zc, accept H0.
If |Z| > Zc, reject H0 (for a one-tailed test, Z must also lie on the side
named in H1).

Question

Step 1 - Write given values
p = 90/100 = 0.9 (Population Parameter)
q = 1 - 0.9 = 0.1 (Population Parameter)
N = 200 (Sample Data)
P = 160/200 = 0.8 (Sample Data)
LOS = α = 0.05 = 5%
Step 2 - Propose H0 and H1
H0: p = 0.9; the medicine is 90% effective
H1: p < 0.9; the medicine is less than 90% effective
Step 3 - Identify Test
As the < sign is there, use a one-tailed test.
Step 4 - Get table value of Zc for LOS α = 0.05 (5%)
Zc = 1.645, so the critical region is Z < -1.645
Step 5 - Find Z score using formula:
Z = (P - p)/√(pq/N) = (0.8 - 0.9)/√(0.9 × 0.1/200)
Z = -4.714
Step 6 - Inference
Z = -4.714, -Zc = -1.645
As Z falls in the critical region, reject H0.
Therefore, we cannot support the claim at 0.05 LOS, i.e., the medicine is
not 90% effective.


Step 1 - Write given values:
p = probability of getting sum 7
E = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
n(E) = 6
n(S) = 36
p = 6/36 = 1/6 = 0.167
q = 1 - p = 0.833
N = 100
P = 23/100 = 0.23
LOS = α = 0.05 = 5%
Step 2 - Propose H0 and H1:
H0: p = 1/6; the dice are fair
H1: p ≠ 1/6; the dice are not fair
Step 3 - Identify Test:
As the ≠ sign is there, use a two-tailed test.
Step 4 - Get table value of Zc for LOS α = 0.05 (5%)
Zc = 1.96
Step 5 - Find Z score using formula:
Z = (P - p)/√(pq/N) = (0.23 - 0.167)/√(0.167 × 0.833/100)
Z = 1.689
Step 6 - Inference
Z = 1.689, Zc = 1.96
As Z < Zc, accept H0.
Therefore, we can support the claim at 0.05 LOS, i.e., the dice are fair.
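The proportion formula Z = (P - p)/√(pq/N) can be evaluated for both proportion examples. In the sketch below the function name is illustrative, and the rounded value p = 0.167 is used for the dice so that the result follows the figures given in the text.

```python
from math import sqrt


def z_score_proportion(sample_prop, pop_prop, n):
    """Z = (P - p) / sqrt(p * q / N), with q = 1 - p."""
    q = 1 - pop_prop
    return (sample_prop - pop_prop) / sqrt(pop_prop * q / n)


z_medicine = z_score_proportion(0.8, 0.9, 200)   # one-tailed, Zc = 1.645
z_dice = z_score_proportion(0.23, 0.167, 100)    # two-tailed, Zc = 1.96

print(round(z_medicine, 3), z_medicine < -1.645)  # falls in the critical region
print(round(z_dice, 3), abs(z_dice) < 1.96)       # stays outside it
```

With these inputs the dice statistic evaluates to about 1.69, which is still below 1.96, so the null hypothesis of fair dice is retained.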

2.7 TEST BASED ON T
Student's t distribution:
Degrees of freedom:
The number of independent pieces of information that went into
calculating the estimate.
Degrees of freedom = N - 1
For small samples, the z score (z statistic) is replaced by a suitable
t score (t statistic):
$t = \frac{\bar{X} - \mu}{s/\sqrt{N - 1}}$
where X̄ is the sample mean, μ the population mean, s the sample standard
deviation, and N the sample size.

Q. 10 individuals are chosen at random from a population and their heights
(in inches) are found to be 63, 63, 64, 65, 66, 69, 69, 70, 70, 71. Find
Student's t, taking the population mean to be 65.
Solution :
Formula -

Given -
N = 10
μ = 65
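A minimal sketch of the calculation for the height data follows. It assumes the convention t = (X̄ - μ)/(s/√(N - 1)) with s computed using divisor N, which is the convention that reproduces t = 3 in the washer example of this section; the function name is illustrative.

```python
from math import sqrt

heights = [63, 63, 64, 65, 66, 69, 69, 70, 70, 71]


def student_t(sample, pop_mean):
    """t = (mean - mu) / (s / sqrt(N - 1)), with s computed using divisor N."""
    n = len(sample)
    mean = sum(sample) / n
    s = sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return (mean - pop_mean) / (s / sqrt(n - 1))


sample_mean = sum(heights) / len(heights)
t = student_t(heights, pop_mean=65)
print(sample_mean, round(t, 4))
```

The sample mean is 67 and the statistic is about 2.02, with N - 1 = 9 degrees of freedom.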

Q. In the past, a machine has produced washers having a thickness of
0.050 in. To determine whether the machine is in proper working order, a
sample of 10 washers is chosen, for which the mean thickness is 0.053 in
and the standard deviation is 0.003 in. Test the hypothesis that the
machine is in proper working order at 5% and 1% LOS. (tc at 5% LOS = 2.26,
tc at 1% LOS = 3.25)
Given:
μ = 0.050 in
N = 10
X̄ = 0.053 in
s = 0.003 in
Propose Hypothesis:
H0: μ = 0.050; the machine is in proper working order
H1: μ ≠ 0.050; the machine is not in proper working order
t = (X̄ - μ)/(s/√(N - 1)) = (0.053 - 0.050)/(0.003/√9)
t = 3
At 5% LOS
tc = 2.26
t = 3
As t > tc, reject H0 at 5% LOS.
At 1% LOS
tc = 3.25
t = 3
As t < tc, accept H0 at 1% LOS.
2.8 NORMAL AND F DISTRIBUTION
• Called Fisher's F distribution.
• The z score, or z statistic, is replaced by a suitable F score, or F
statistic:
$F = \frac{N_1 S_1^2 / [(N_1 - 1)\sigma_1^2]}{N_2 S_2^2 / [(N_2 - 1)\sigma_2^2]}$
Where,
N1 = Sample 1 size
N2 = Sample 2 size
σ1 = Population 1 SD
σ2 = Population 2 SD
S1 = Sample 1 SD
S2 = Sample 2 SD
Q. Two samples of sizes 9 and 12 are drawn from two normally
distributed populations having variances 16 and 25 respectively. If the
sample variances are 20 and 8, determine whether the first sample has a
significantly larger variance than the second sample at significance levels
of (a) 0.05 (b) 0.01
(F0.95 = 2.95, F0.99 = 4.74)
Solution:
Given:
N1 = 9
N2 = 12
σ1^2 = Population 1 variance = 16
σ2^2 = Population 2 variance = 25
S1^2 = Sample 1 variance = 20
S2^2 = Sample 2 variance = 8
F = [9 × 20/(8 × 16)] / [12 × 8/(11 × 25)] = 1.406/0.349 = 4.03
At 5% LOS
Fc = 2.95
F = 4.03
As F > Fc, we can conclude that the variance of sample 1 is significantly
larger than that of sample 2.
At 1% LOS
Fc = 4.74
F = 4.03
As F < Fc, the variance of sample 1 is not significantly larger than that
of sample 2.
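The value F = 4.03 can be reproduced from the given sizes and variances. The sketch below assumes the statistic is the ratio of N·S²/((N - 1)·σ²) for the two samples, which is the form that yields 4.03 for this data; the function name is illustrative.

```python
def f_statistic(n1, s1_sq, sigma1_sq, n2, s2_sq, sigma2_sq):
    """Ratio of the two scaled sample variances N*S^2 / ((N - 1)*sigma^2)."""
    numerator = n1 * s1_sq / ((n1 - 1) * sigma1_sq)
    denominator = n2 * s2_sq / ((n2 - 1) * sigma2_sq)
    return numerator / denominator


f = f_statistic(n1=9, s1_sq=20, sigma1_sq=16, n2=12, s2_sq=8, sigma2_sq=25)
print(round(f, 2))
print(f > 2.95)  # comparison with the 5% critical value
print(f > 4.74)  # comparison with the 1% critical value
```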
2.9 ANALYSIS OF VARIANCE (ANOVA)
Analysis of variance (ANOVA) is an analysis tool used in statistics that
splits the observed aggregate variability found inside a data set into two
parts: systematic factors and random factors. The systematic factors have a
statistical influence on the given data set, while the random factors do not.
Analysts use the ANOVA test to determine the influence that independent
variables have on the dependent variable in a regression study.
The formula for ANOVA is: F = MST/MSE
where:
F = ANOVA coefficient
MST = Mean sum of squares due to treatment
MSE = Mean sum of squares due to error
The ANOVA test is the initial step in analysing factors that affect a given
data set. Once the test is finished, an analyst performs additional testing
on the systematic factors that measurably contribute to the data set's
variability. The analyst uses the ANOVA test results in an F-test to
generate additional data that aligns with the proposed regression models.
The ANOVA test allows a comparison of more than two groups at the
same time to determine whether a relationship exists between them. The
result of the ANOVA formula, the F statistic (also called the F-ratio),
allows for the analysis of multiple groups of data to determine the
variability between samples and within samples.
2.10 ONE WAY ANALYSIS OF VARIANCE
The one-way analysis of variance (ANOVA) is used to determine whether
there are any statistically significant differences between the means of
three or more independent (unrelated) groups.
The one-way ANOVA compares the means between the groups you are
interested in and determines whether any of those means are statistically
significantly different from each other. Specifically, it tests the null
hypothesis:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
where µ = group mean and k = number of groups. If, however, the one-way
ANOVA returns a statistically significant result, we accept the
alternative hypothesis (HA), which is that there are at least two group
means that are statistically significantly different from each other.
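The ratio F = MST/MSE for a one-way layout can be computed by hand. The sketch below uses three small hypothetical groups (the data are invented purely for illustration) and the standard between-group and within-group sums of squares; the function name is illustrative.

```python
def one_way_anova_f(groups):
    """Return F = MST / MSE for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group (treatment) sum of squares.
    ss_treatment = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group (error) sum of squares.
    ss_error = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    mst = ss_treatment / (k - 1)      # k - 1 degrees of freedom
    mse = ss_error / (n_total - k)    # N - k degrees of freedom
    return mst / mse


groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # hypothetical data
f_ratio = one_way_anova_f(groups)
print(f_ratio)
```

The resulting F is compared with the critical F value for (k - 1, N - k) degrees of freedom at the chosen level of significance.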

2.11 TWO-WAY ANALYSIS OF VARIANCE
A two-way ANOVA is used to estimate how the mean of a quantitative
variable changes according to the levels of two categorical variables. Use
a two-way ANOVA when you want to know how two independent
variables, in combination, affect a dependent variable.
Example: You are researching which type of fertilizer and planting
density produces the greatest crop yield in a field experiment. You assign
different plots in a field to a combination of fertilizer type (1, 2, or 3) and
planting density (1=low density, 2=high density), and measure the final
crop yield in bushels per acre at harvest time.
You can use a two-way ANOVA to find out if fertilizer type and planting
density influence average crop yield.
A two-way ANOVA with interaction tests three null hypotheses at the
same time:
 There is no difference in group means at any level of the first
independent variable.
 There is no difference in group means at any level of the second
independent variable.
 The effect of one independent variable does not depend on the effect
of the other independent variable (a.k.a. no interaction effect).
A two-way ANOVA without interaction (a.k.a. an additive two-way
ANOVA) only tests the first two of these hypotheses.
The output table of a two-way ANOVA, as produced by statistical
software, contains the following columns, which provide all of the
information needed to interpret the model:
 Df shows the degrees of freedom for each variable (number of levels
in the variable minus 1).
 Sum sq is the sum of squares (a.k.a. the variation between the group
means created by the levels of the independent variable and the overall
mean).
 Mean sq shows the mean sum of squares (the sum of squares divided
by the degrees of freedom).
 F value is the test statistic from the F-test (the mean square of each
independent variable divided by the mean square of the residuals).
 Pr(>F) is the p-value of the F statistic, and shows how likely it is that
the F-value calculated from the F-test would have occurred if the null
hypothesis of no difference was true.
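To show where the Df, Sum sq, Mean sq and F value columns come from, here is a minimal sketch of an additive (no interaction) two-way ANOVA. It assumes a balanced layout with exactly one observation per cell. The function name `additive_two_way_anova` and the yield numbers are invented for illustration, and the Pr(>F) column is left out because it requires the F distribution.

```python
def additive_two_way_anova(table):
    """table[i][j] = single observation for level i of factor A, level j of factor B.

    Returns {row name: (Df, Sum sq, Mean sq, F value)}.
    """
    a, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (a * b)
    row_means = [sum(row) / b for row in table]
    col_means = [sum(col) / a for col in zip(*table)]

    ss_a = b * sum((m - grand) ** 2 for m in row_means)   # factor A
    ss_b = a * sum((m - grand) ** 2 for m in col_means)   # factor B
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_res = ss_total - ss_a - ss_b                       # residuals

    df_a, df_b, df_res = a - 1, b - 1, (a - 1) * (b - 1)
    ms_a, ms_b, ms_res = ss_a / df_a, ss_b / df_b, ss_res / df_res
    return {
        "factor A": (df_a, ss_a, ms_a, ms_a / ms_res),
        "factor B": (df_b, ss_b, ms_b, ms_b / ms_res),
        "residuals": (df_res, ss_res, ms_res, None),
    }


# Hypothetical yields: rows = fertilizer types 1-3, columns = low/high density.
yields = [[10, 12],
          [11, 15],
          [15, 15]]
anova = additive_two_way_anova(yields)
for name, row in anova.items():
    print(name, row)
```

With replicated cells or an interaction term the sums of squares are partitioned differently, so this sketch should not be applied to such designs as-is.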
2.12 SUMMARY
After completing this chapter, one can draw conclusions from the available
data: the data can be processed and summarized, and the results can be
generated and displayed in graphs.
2.13 UNIT END QUESTIONS
Q1. Compute Student’s t for the data below:
-4 -2 -2 0 2 2 3 3
Take the mean of the universe to be zero.
Q2. In a city, it is claimed that the average IQ of students is 102. The
intelligence quotients (IQs) of 16 students from one area of the city
showed a mean of 107 and a standard deviation of 10. Test the claim
at 5% LOS. (tc at 5% LOS = 2.144)
Q3. Two samples of sizes 10 and 15 are drawn from two normally
distributed populations having variances 40 and 60, respectively. If
the sample variances are 90 and 50, determine whether the sample 1
variance is significantly greater than the sample 2 variance at
significance levels of (a) 0.05 and (b) 0.01. (F0.95 = 2.645,
F0.99 = 4.029)
Q4. Two samples of sizes 8 and 12 are drawn from two normally
distributed populations having variances 25 and 49, respectively. If
the sample variances are 36 and 60, determine whether …
2.14 REFERENCES FOR FUTURE READING
 https://www.basic-concept.com/c/basics-of-statistical-analysis
 https://www.scribbr.com/statistics/two-way-anova/
 Problems are taken from Schaum’s Outline of Statistics, Fourth Edition,
by Murray R. Spiegel and Larry J. Stephens.





*****
UNIT III
3
NON-PARAMETRIC TESTS
Unit Structure
3.0 Objective
3.1 Introduction
3.2 Non-Parametric Test Definition
3.3 Need of Non-Parametric Test Definition
3.4 Sign Test
3.5 Wilcoxon’s Signed Rank Test
3.6 Run Test
3.7 Kruskal-Wallis Test
3.8 Post-hoc analysis of one-way analysis of variance
3.9 Duncan’s test Chi-square test of association
3.10 Summary
3.11 Unit End Questions
3.12 References for Future Reading
3.0 OBJECTIVE
Nonparametric statistics can be used without the mean, sample size, standard
deviation, or estimates of any other related parameters when none of
that information is available. Since nonparametric statistics makes fewer
assumptions about the sample data, its application is wider in scope than
parametric statistics.
3.1 INTRODUCTION
A non-parametric test (sometimes called a distribution-free test) does not
assume anything about the underlying distribution (for example, that the
data comes from a normal distribution). In contrast, a parametric
test makes assumptions about a population’s parameters (for
example, the mean or standard deviation). When the word
“non-parametric” is used in statistics, it doesn’t mean that you know nothing
about the population. It usually means that you know the population data does
not have a normal distribution.
3.2 NON-PARAMETRIC TEST DEFINITION
A non-parametric test does not assume anything about the underlying
distribution (for example, that the data comes from a normal distribution).
In contrast, a parametric test makes assumptions about a
population’s parameters (for example, the mean or standard deviation).
When the word “non-parametric” is used in statistics, it usually means that
the population data does not have a normal distribution.
3.3 NEED OF NON-PARAMETRIC TEST DEFINITION
 Use nonparametric tests only when assumptions like normality are
being violated. Nonparametric tests can perform well with non-normal
continuous data if the sample size is sufficiently large (generally 15-20
items in each group). Non-parametric tests are used when your data isn’t
normal. For nominal or ordinal scales, use non-parametric statistics.
3.4 SIGN TEST
A few nonparametric tests are:
 1-sample sign test. This test is used to estimate the median of a
population and compare it to a reference value or target value.
 1-sample Wilcoxon signed rank test. With this test, estimate the
population median and compare it to a reference/target value.
However, the test assumes the data comes from a symmetric
distribution (e.g., Cauchy distribution or uniform distribution).
 Kruskal-Wallis test. Use this test instead of a one-way ANOVA to find
out if two or more medians are different. Ranks of the data points are
used for the calculations, rather than the data points themselves.
 Mann-Kendall trend test. This test looks for trends in time-series data.
 Mann-Whitney test. Use this test to compare differences between two
independent groups when dependent variables are either ordinal or
continuous.
Sign Test:
The sign test compares the sizes of two groups. It is a non-parametric or
“distribution free” test, which means the test doesn’t assume the data
comes from a particular distribution, like the normal distribution. The
sign test is an alternative to a one sample t test or a paired t test. It can
also be used for ordered (ranked) categorical data. The null hypothesis for
the sign test is that the difference between medians is zero.
How to Calculate a Paired/Matched Sample Sign Test?
1. The data should be from two samples.
2. The two dependent samples should be paired or matched. For example,
depression scores from before a medical procedure and after.
Set the data in a table. This set of data represents test scores at the end of
the Spring and the beginning of the Fall semesters. The hypothesis is that
summer break means a significant drop in test scores.
 H0: No difference in median of the signed differences.
 H1: Median of the signed differences is less than zero.
Step 1: Subtract set 2 from set 1 and put the result in the third
column.

Step 2: Add a fourth column indicating the sign of the number in
column 3.
Step 3: Count the number of positives and negatives.
 4 positives.
 12 negatives.
Step 4: Add up the number of items in the sample and subtract any that
have a difference of zero (in column 3). The sample size in this question was
17, with one zero, so n = 16.
Step 5: Find the p-value using a binomial distribution table or a
binomial calculator, with the following inputs:
 .5 for the probability. The null hypothesis is that there are an equal
number of signs (i.e., 50/50). Therefore, the test is a simple binomial
experiment with a .5 chance of the sign being negative and .5 of it
being positive (assuming the null hypothesis is true).
 16 for the number of trials.
 4 for the number of successes. “Successes” here is the smaller of the
counts of positive and negative signs from Step 3.
The p-value, P(X ≤ 4), is 0.038, which is smaller than the alpha level of
0.05. We can reject the null hypothesis and conclude that there is a
significant difference.
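The binomial calculation in Step 5 can be reproduced directly instead of being read from a table. The sketch below computes the one-tailed probability P(X ≤ k) for a binomial distribution with n trials and success probability 0.5. The function name `sign_test_p_value` is an assumption made for this illustration.

```python
from math import comb


def sign_test_p_value(n_positive, n_negative):
    """One-tailed sign-test p-value: P(X <= k) with X ~ Binomial(n, 0.5).

    n is the number of non-zero differences and k is the smaller sign count.
    """
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    # Under H0 each of the 2**n sign patterns is equally likely.
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


# Counts from the worked example: 4 positives, 12 negatives, so n = 16.
p_value = sign_test_p_value(4, 12)
print(round(p_value, 3))  # 0.038
```

Doubling this value would give the corresponding two-tailed p-value, which is appropriate when the alternative hypothesis does not specify a direction.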
3.5 WILCOXON’S SIGNED RANK TEST
Definition:
The Wilcoxon Signed Rank Test is the non-parametric version of the
paired t-test. It is used to test whether there is a significant difference
between two paired populations. Use the Wilcoxon Signed Rank Test when
you would like to use the paired t-test but the distribution of the
differences between the pairs is severely non-normal.
Eg: A basketball coach wants to know if a certain training program
increases the number of free throws made by his players. To test this, he
has 15 players shoot 20 free throws each before and after the training
program.
Solution: Since each player can be “paired” with themselves, the coach
had planned on using a paired t-test to determine if there was a significant
difference between the mean number of free throws made before and after
the training program.
However, the distribution of the differences turns out to be non-normal,
so the coach instead uses a Wilcoxon Signed Rank Test.
The following table shows the number of free throws made (out of 20
attempts) by each of the 15 players, both before and after the training
program:
Step 1: State the null and alternative hypotheses.
 H0: The median difference between the two groups is zero.
 HA: The median difference is negative (e.g., the players make fewer free
throws before participating in the training program).
Step 2: Find the difference and absolute difference for each pair.

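Step 3: Rank the pairs by absolute difference, from smallest to largest,
ignoring pairs with a difference of 0 and assigning the average rank to
tied values.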

Step 4: Find the sum of the positive ranks and the sum of the negative ranks.
Step 5: Reject or fail to reject the null hypothesis.
The test statistic, W, is the smaller of the absolute values of the sum of
the positive ranks and the sum of the negative ranks. In this case, the
smaller value is 29.5. Thus, our test statistic is W = 29.5.
To determine if we should reject or fail to reject the null hypothesis, we
can reference the critical value found in the Wilcoxon Signed Rank Test
Critical Values Table that corresponds with n and our chosen alpha level.
If our test statistic, W, is less than or equal to the critical value in the
table, we can reject the null hypothesis. Otherwise, we fail to reject the
null hypothesis.
The critical value that corresponds to an alpha level of 0.05 and n = 13
(the total number of pairs minus the two we didn’t calculate ranks for
since they had an observed difference of 0) is 17.
Since the test statistic (W = 29.5) is not less than or equal to 17, we fail to
reject the null hypothesis.
Refer to the Wilcoxon Signed Rank Test Critical Values Table.

Source: This question and solution are taken from “How to
Perform the Wilcoxon Signed Rank Test”, Statology.
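Steps 2 to 5 above can be sketched in code. Because the free-throw table is not reproduced in this text, the before/after numbers below are made-up values chosen only to show the mechanics, including a zero difference and a tie. The function name `signed_rank_sums` is also an assumption.

```python
def signed_rank_sums(before, after):
    """Return (sum of positive ranks, sum of negative ranks, n) for paired data."""
    # Step 2: differences, dropping pairs whose difference is zero.
    diffs = [a - b for b, a in zip(before, after) if a != b]

    # Step 3: rank by absolute difference, averaging the ranks of ties.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + 1 + j + 1) / 2   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Step 4: sums of the ranks belonging to positive and negative differences.
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, len(diffs)


# Hypothetical paired scores (not the data from the example above).
before = [10, 12, 9, 14, 11, 8, 13]
after = [12, 11, 9, 17, 15, 13, 11]
w_plus, w_minus, n = signed_rank_sums(before, after)
w = min(w_plus, w_minus)   # Step 5: test statistic W
print(w_plus, w_minus, n, w)
```

The resulting W is then compared with the tabulated critical value for n and the chosen alpha level, exactly as in Step 5.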
3.6 RUN TEST
What Is a Runs Test?
A runs test is a statistical procedure that examines whether a string of data
is occurring randomly from a specific distribution. The runs test analyzes
the occurrence of similar events that are separated by events that are
different.
 The runs test is also known as the Wald-Wolfowitz runs test, after the
mathematicians Abraham Wald and Jacob Wolfowitz, who developed it.
 A runs test is a statistical analysis that helps determine the
randomness of data by revealing any variables that might affect data
patterns.
 Technical traders can use a runs test to analyze statistical trends and
help spot profitable trading opportunities.
 For example, an investor interested in analyzing the price movement
of a particular stock might conduct a runs test to gain insight into
possible future price action of that stock.
A nonparametric test for randomness is provided by the theory of
runs. To understand what a run is, consider a sequence made up of
two symbols, a and b, such as
a a | b b b | a | b b | a a a a a | b b b | a a a a
(The problem discussed is from Schaum’s Outline series by Murray
Spiegel, fourth edition.)
In tossing a coin, for example, a could represent “heads” and b
could represent “tails.” Or in sampling the bolts produced by a
machine, a could represent “defective” and b could represent
“nondefective.”
A run is defined as a set of identical (or related) symbols contained
between two different symbols or no symbol (such as at the
beginning or end of the sequence).
Proceeding from left to right in the sequence above, the first run, indicated
by a vertical bar, consists of two a’s; similarly, the second run
consists of three b’s, the third run consists of one a, etc. There are
seven runs in all.
It seems clear that some relationship exists between randomness and
the number of runs. Thus, for the sequence
a b a b a b a b a b a b
there is a cyclic pattern, in which we go from a to b, back to a again,
etc., which we could hardly believe to be random. In such a case we
have too many runs (in fact, we have the maximum number possible
for the given number of a’s and b’s). On the other hand, for the
sequence
aaaaaa bbbb aaaaa bbb
there seems to be a trend pattern, in which the a’s and b’s are
grouped (or clustered) together. In such a case there are too few runs,
and we would not consider the sequence to be random. Thus, a
sequence would be considered nonrandom if there are either too
many or too few runs, and random otherwise.
To quantify this idea, suppose that we form all possible sequences
consisting of N1 a’s and N2 b’s, for a total of N symbols in all (N1 +
N2 = N). The collection of all these sequences provides us with a
sampling distribution: each sequence has an associated number of
runs, denoted by V. In this way we are led to the sampling
distribution of the statistic V. It can be shown that this sampling
distribution has a mean and variance given, respectively, by the
formulas
$$\mu_V = \frac{2 N_1 N_2}{N_1 + N_2} + 1, \qquad \sigma_V^2 = \frac{2 N_1 N_2 (2 N_1 N_2 - N_1 - N_2)}{(N_1 + N_2)^2 (N_1 + N_2 - 1)}$$
By using these formulas, we can test the hypothesis of randomness at
appropriate levels of significance. It turns out that if both N1 and N2 are
at least equal to 8, then the sampling distribution of V is very nearly a
normal distribution. Thus, the standardized statistic z = (V - µ_V)/σ_V is
very nearly normally distributed with mean 0 and variance 1.
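As a worked illustration, the sketch below counts the runs in the first example sequence (two a’s, three b’s, one a, and so on) and evaluates the mean, variance and z score of V with the standard formulas for the runs test. The function name `runs_test` is an assumption introduced here.

```python
from math import sqrt


def runs_test(sequence):
    """Return (V, mean, variance, z) for a sequence made of two distinct symbols."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the sequence must contain exactly two distinct symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])

    # A run starts at the first symbol and at every change of symbol.
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

    mean = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / sqrt(variance)
    return runs, mean, variance, z


# The first example sequence, written without the vertical bars.
sequence = "aabbbabbaaaaabbbaaaa"
runs, mean, variance, z = runs_test(sequence)
print(runs, round(mean, 2), round(variance, 3), round(z, 3))
```

Here N1 = 12 and N2 = 8, so the normal approximation applies. Since |z| is below 1.96, a two-tailed test at the 0.05 level would not reject the hypothesis of randomness for this sequence.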
3.7 KRUSKAL-WALLIS TEST
The Kruskal-Wallis test (1952) is a non-parametric hypothesis test. It is
generally used when the measurement variable does not
meet the normality assumptions of one-way ANOVA. It is also a popular
nonparametric test to compare outcomes among three or more
independent (unmatched) groups.
Assumptions of the Kruskal-Wallis Test:
 All samples are randomly drawn from their respective population.
 Independence within each sample.
 The measurement scale is at least ordinal.
 Mutual independence among the various samples.
Procedure to conduct Kruskal-Wallis Test:
 First pool all the data across the groups.
 Rank the pooled data, giving rank 1 to the smallest value of the
dependent variable, rank 2 to the next smallest, and so on up to rank N
for the largest value (if values tie, assign each of them the mid-point,
i.e. the average, of the ranks they span).
 Compute the test statistic.
 Determine the critical value from the Chi-Square distribution table.
 Finally, formulate the decision and conclusion.
Calculation of the Kruskal-Wallis Non-Parametric Hypothesis Test:
The Kruskal-Wallis Non-Parametric Hypothesis Test compares
medians among k groups (k > 2). The null and alternative hypotheses for
the Kruskal-Wallis test are as follows:
 Null Hypothesis H0: Population medians are equal
 Alternative Hypothesis H1: Population medians are not all equal
The Kruskal-Wallis test pools the observations from the k groups into one
combined sample, and then ranks them from lowest to highest value (1 to N),
where N is the total number of values in all the groups.
The test statistic for the Kruskal-Wallis test, denoted H, is given as
follows:
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(N+1)$$
where T_i = rank sum for the ith sample, n_i = number of observations in
the ith sample, i = 1, 2, …, k, and N = total number of observations.
Because H is computed from ranks only, any two data sets whose values have
the same ranks give the same H. In particular, increasing the
largest value or decreasing the smallest value has no effect on H. Hence,
extreme outliers (on the high or low side) do not affect this test.
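The procedure and the insensitivity of H to outliers can be checked with a short sketch. It uses the standard Kruskal-Wallis statistic with mid-ranks for ties and no tie correction. The function name `kruskal_wallis_h` and the three groups of scores are hypothetical and are not the data of the example in this section.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (without tie correction) for a list of samples."""
    # Pool the observations, remembering which group each one came from.
    pooled = sorted((value, g) for g, values in enumerate(groups)
                    for value in values)
    n_total = len(pooled)

    # Rank from 1 to N, giving tied values the mid-point of their ranks.
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += mid_rank
        i = j + 1

    return (12 / (n_total * (n_total + 1))
            * sum(t * t / len(g) for t, g in zip(rank_sums, groups))
            - 3 * (n_total + 1))


# Hypothetical scores for three groups of four operators.
scores = [[68, 72, 75, 80], [70, 78, 85, 88], [82, 90, 92, 95]]
h = kruskal_wallis_h(scores)

# Making the extremes more extreme leaves every rank, and hence H, unchanged.
with_outliers = [[1, 72, 75, 80], [70, 78, 85, 88], [82, 90, 92, 500]]
h_outliers = kruskal_wallis_h(with_outliers)
print(round(h, 4), round(h_outliers, 4))
```

H is compared with the chi-square critical value for k - 1 degrees of freedom. For these three groups that is 5.991 at the 0.05 level, so this made-up data set would lead to rejecting H0.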
Example of Kruskal-Wallis Non-Parametric Hypothesis Test:
In a manufacturing unit, four teams of operators were randomly selected
and sent to four different facilities for machining techniques training.
After the training, the supervisor conducted an exam and recorded the
test scores. At the 95% confidence level, are the scores the same in all four
facilities?

 Null Hypothesis H0: The distribution of operator scores is the same in
all four facilities
 Alternative Hypothesis H1: The scores may vary across the four facilities
 N = 16
Rank the scores in all the facilities:

For a right-tailed chi-square test at the 95% confidence level with df = 3,
the critical χ2 value is 7.815.

The calculated χ2 value is greater than the critical value of χ2 for a 0.05
significance level. Since χ2 calculated > χ2 critical, we reject the null
hypothesis.
There is a difference in test scores among the four facilities.
3.8 POST-HOC ANALYSIS OF ONE-WAY ANALYSIS OF VARIANCE
An ANOVA test tells you whether there is an overall difference between the
groups, but it does not tell you which specific groups differed; post hoc
tests do that. Because post hoc tests are run to confirm where the
differences occurred between groups, they should only be run when the
ANOVA has shown an overall statistically significant difference in group
means (i.e., a statistically significant one-way ANOVA result). Post hoc
tests attempt to control the experimentwise error rate (usually alpha = 0.05)
in the same manner that the one-way ANOVA is used instead of multiple
t-tests.
3.9 DUNCAN’S TEST CHI-SQUARE TEST OF ASSOCIATION
Chi-square test:
A chi-square (χ^2) statistic measures how expectations
compare to actual observed data (or model results).
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
where o = observed frequency
e = expected frequency
Degrees of freedom for an m × n table:
ν = (no. of rows - 1)(no. of columns - 1)
ν = (m - 1)(n - 1)
Eg: In an experiment to study the dependence of hypertension on smoking
habits, the following data is taken from 180 individuals:

                  Non-smokers   Moderate smokers   Heavy smokers
Hypertension      21            36                 30
No hypertension   48            26                 19

Test the hypothesis at 5% LOS that the presence or absence of
hypertension is independent of smoking. (Given: χ_tab^2 = 5.99)
Ho: Prese nce or absence of hypertension is independent of smoking.
H1: Presence or absence of hypertension is dependent of smoking. No Smokers O Moderate Smokers o Heavy smokers o Hypertension 21 36 30 RT1=87 No hypertension 48 26 19 RT2= 93 Total=180 CT1 =69 CT2=62 CT3=49 Total=180 RT=Row Total and CT=Column Total No Smokers Moderate Heavy munotes.in

Page 44


44 Non-Parametric Tests E Smokers e smokers e Hypertension (RT1 x CT1)/Total RT1xCT2/Total RT1xCT3/Total No hypertension (RT2 x CT1)/Total (RT2 x CT2)/Total (RT2 x CT3)/Total No Smokers E Moderate Smokers e Heavy smokers e Hypertension 87*69/180 87*62/180 87*49/180 No hypertension 93*69/180 93*62/180 93*49/180 No
Smokers
O Moderate Smokers O Heavy smokers O Total Hypertension 21 36 30 87 No hypertension 48 26 19 93 Total 69 62 49 180 o e (0-e)2/e 21 33.35 4.5734 36 29.967 1.2177 30 23.683 1.6849 48 35.65 4.2780 26 32.033 1.1363 19 25.316 1.5761
= 14.46
χ^2=14.46
χ_tab^2=5.99
As χ^2> χtab^2, Reject H0 at 5% LOS.
Therefore, we can conclude that Presence or absence of hypertension is
dependent of smoking.
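The expected frequencies and the χ^2 value of this example can be recomputed with a few lines of code. This is a minimal sketch using only the observed table from the example. The variable names are chosen for this illustration.

```python
# Observed frequencies: rows = hypertension / no hypertension,
# columns = non-smokers, moderate smokers, heavy smokers.
observed = [[21, 36, 30],
            [48, 26, 19]]

row_totals = [sum(row) for row in observed]          # RT1, RT2
col_totals = [sum(col) for col in zip(*observed)]    # CT1, CT2, CT3
total = sum(row_totals)

# Expected frequency of each cell = row total x column total / grand total.
expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]

# Chi-square statistic: sum over all cells of (o - e)^2 / e.
chi_square = sum((o - e) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))

# Degrees of freedom: (rows - 1)(columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2), dof)
```

The result agrees with the hand calculation (χ^2 of about 14.46 with 2 degrees of freedom), which exceeds the tabulated value 5.99.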
The Chi-square test of independence determines whether there is a
statistically significant relationship between categorical variables. It is a
hypothesis test that answers the question: do the values of one
categorical variable depend on the value of other categorical variables?
This test is also known as the chi-square test of association.
 Null hypothesis: There are no relationships between the categorical
variables. If one variable is known, it does not help you predict the
value of another variable.
 Alternative hypothesis: There are relationships between the
categorical variables. Knowing the value of one variable does help
you predict the value of another variable.
The Chi-square test of association works by comparing the distribution
that you observe to the distribution that you expect if there is no
relationship between the categorical variables.
For a Chi-square test, a p-value that is less than or equal to your
significance level indicates there is sufficient evidence to conclude that
the observed distribution is not the same as the expected distribution. You
can conclude that a relationship exists between the categorical variables.
Eg: The color of the uniform represents each crewmember’s work area.
We will statistically assess whether there is a connection between uniform
color and the fatality rate. (The problem discussed is from
https://statisticsbyjim.com/hypothesis-testing/chi-square-test-independence-example/)

Color          Areas                                   Crew   Fatalities
Blue           Science and Medical                     136    7
Gold           Command and Helm                        55     9
Red            Operations, Engineering, and Security   239    24
Ships’ Total   All                                     430    40

We use a Chi-square test of independence to determine whether there is a
statistically significant association between shirt color and deaths. We
need to use this test because these variables are both categorical variables.
Shirt color can be only blue, gold, or red. Fatalities can be only dead or
alive.
We will determine whether the observed counts of deaths by
uniform color are different from the distribution that we’d expect if there
is no association between the two variables.

Color   Status   Frequency
Blue    Dead     7
Blue    Alive    129
Gold    Dead     9
Gold    Alive    46
Red     Dead     24
Red     Alive    215

Both p-values reported in the test output are less than 0.05. We reject the
null hypothesis and conclude that there is a relationship between shirt
color and deaths.
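The same test can be sketched starting from the Color / Status / Frequency records above. The p-value step relies on a special case: for exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution is exp(-χ^2/2). For other degrees of freedom a table or a statistics library would be needed. The variable names are assumptions made for this illustration.

```python
from math import exp

# Long-format records taken from the frequency table above.
records = [("Blue", "Dead", 7), ("Blue", "Alive", 129),
           ("Gold", "Dead", 9), ("Gold", "Alive", 46),
           ("Red", "Dead", 24), ("Red", "Alive", 215)]

colors = sorted({color for color, _, _ in records})
statuses = sorted({status for _, status, _ in records})
counts = {(color, status): freq for color, status, freq in records}

# Marginal totals of the contingency table.
row_totals = {c: sum(counts[(c, s)] for s in statuses) for c in colors}
col_totals = {s: sum(counts[(c, s)] for c in colors) for s in statuses}
total = sum(row_totals.values())

# Chi-square statistic: observed vs. expected under no association.
chi_square = 0.0
for c in colors:
    for s in statuses:
        expected = row_totals[c] * col_totals[s] / total
        chi_square += (counts[(c, s)] - expected) ** 2 / expected

dof = (len(colors) - 1) * (len(statuses) - 1)

# Closed-form upper-tail probability, valid only because dof == 2 here.
p_value = exp(-chi_square / 2)

print(round(chi_square, 3), dof, round(p_value, 3))
```

The p-value obtained this way is below 0.05, which is consistent with rejecting the null hypothesis of no association.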
3.10 SUMMARY
In statistics, nonparametric tests are methods of statistical analysis that do
not require the data to meet distributional assumptions in order to be
analyzed (in particular, the data need not be normally distributed).
They are also referred to as distribution-free tests. Nonparametric tests
serve as an alternative to parametric tests such as the t-test or ANOVA,
which can be employed only if the underlying data satisfies certain criteria
and assumptions.
3.11 UNIT END QUESTIONS
Q1.

Given: χ_tab^2 = 9.49
Q2. The PQR Company claims that the lifetime of a type of battery that it
manufactures is more than 250 hours (h). A consumer advocate wishing
to determine whether the claim is justified measures the lifetimes of 24 of
the company’s batteries; the results are listed below. Assuming the
sample to be random, determine whether the company’s claim is justified
at the 0.05 significance level. Work the problem first by hand, supplying
all the details for the sign test.

Q3. A sample of 40 grades from a statewide examination is shown below.
Test the hypothesis at the 0.05 significance level that the median grade for
all participants is (a) 66 and (b) 75. Work the problem first by hand,
supplying all the details for the sign test.

Q4. A company wishes to purchase one of five different machines: A, B,
C, D, or E. In an experiment designed to determine whether there is a
performance difference between the machines, five experienced operators
each work on the machines for equal times. The table below shows the
number of units produced by each machine. Test the hypothesis that there
is no difference between the machines at the (a) 0.05 and (b) 0.01
significance levels. Work the problem first by hand, supplying all the
details for the Kruskal-Wallis H test.

Q5. In 30 tosses of a coin the following sequence of heads (H) and tails
(T) is obtained:
H T T H T H H H T H H T T H T
H T H H T H T T H T H H T H T
(a) Determine the number of runs, V.
(b) Test at the 0.05 significance level whether the sequence is random.
Work the problem first by hand, supplying all the details of the runs test
for randomness.
3.12 REFERENCES FOR FUTURE READING
 Schaum’s Outline series by Murray Spiegel, fourth edition.
 https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/parametric-and-non-parametric-data/
 https://www.statisticshowto.com/sign-test/
 How to Perform the Wilcoxon Signed Rank Test - Statology
 https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/bs704_nonparametric_print.html
 https://www.statisticshowto.com/kruskal-wallis/
 https://statistics.laerd.com/statistical-guides/one-way-anova-statistical-guide-4.php


*****


