## Page 1


UNIT I

1

STANDARD DISTRIBUTIONS

CONTENTS OF MODULE

Unit Structure

1.0 Objective

1.1 Introduction

1.2 Study Guidance

1.3 Standard Distributions

1.3.1 Random, Discrete and Continuous Variables

1.3.2 Probability Mass Function

1.3.3 Probability Density Function

1.3.4 Expectation

1.3.5 Variance

1.3.6 Cumulative Distribution Function

1.3.7 Reliability

1.4 Introduction and Properties of the Following Distributions

1.5 Binomial Distribution

1.6 Normal Distribution

1.7 Chi-square Distribution

1.8 T-Distribution

1.9 F-Distribution

1.10 Summary

1.11 Unit End Questions

1.12 References

1.13 Further Readings

1.0 OBJECTIVES

Students will be able to:

Identify the types of random variables.

Understand the concept of a probability distribution.

Understand various types of standard distributions.

1.1 INTRODUCTION

The science of statistics deals with assessing the uncertainty of inferences drawn from random samples of data. This chapter focuses on random variables, their types and their probability distributions. To assess the outcome of an experiment, it is convenient to associate a real number X with each possible outcome. The concept of "randomness" is fundamental to the field of statistics. Probability is used not only to calculate the chance of a single event but also to summarize the likelihood of all possible outcomes. The relationship between each possible outcome of a random variable and its probability is called a probability distribution. Probability distributions are a foundational concept in probability, and the names and shapes of the common distributions will soon become familiar. The structure and type of a probability distribution depend on the properties of the random variable, in particular whether it is discrete or continuous. This in turn determines how the distribution is summarized and how the most likely outcome and its probability are calculated.

munotes.in

1.2 STUDY GUIDANCE

Understand the basic concepts.

Practice the questions given in this module.

Refer to the further readings.

1.3 STANDARD DISTRIBUTIONS

1.3.1 Random Variable:

A random variable is a real-valued variable, or function, that assigns a numerical value to each outcome of an experiment. Expressing outcomes as numbers allows probabilities, averages and relationships between outcomes to be studied. For example, if the random variable X records the sex of a newborn child, X could be 1 if a male child is born and 0 if a female child is born. As a second example, if the random experiment consists of tossing two coins, the random variable that counts the number of heads can take the values 0, 1 or 2.

There are two types of random variables: discrete and continuous.

Discrete Random Variable:

A random variable that may assume only a finite number or a countably infinite number of values is said to be discrete. For instance, a random variable representing the number of misprints in a book is a discrete random variable.

Continuous Random Variable:

A continuous random variable can assume any value in an interval on the real number line or in a collection of intervals. Since there is an infinite number of values in any interval, it is not meaningful to talk about the probability that the random variable takes a specific value. Instead, we consider the probability that a continuous random variable lies within a given interval:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

where f(x) is the probability density function of X (Section 1.3.3).

Statistical Methods And Testing of Hypothesis

1.3.2 Probability Mass Function:

The probability distribution of a random variable describes how probabilities are distributed over the values of the random variable. For a discrete random variable X, the probability distribution is defined by a probability mass function, denoted by p(x), which gives the probability of each value of the random variable. A probability mass function always has the following properties:

(1) p(x) must be non-negative for each value of the random variable, i.e. $p(x_i) \ge 0$ for all values of $i$.

(2) The probabilities of all values of the random variable must sum to one, i.e. $\sum_{i=1}^{n} p(x_i) = 1$.

The set of values $x_i$ together with the corresponding probabilities $p(x_i)$ constitutes the probability distribution of the discrete random variable X. If X is a discrete random variable, then $p(x) = P(X = x)$ is called the probability mass function (PMF).

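The following table shows the probability distribution of a discrete random variable X:

| X | $x_1$ | $x_2$ | $x_3$ | $x_4$ | … | $x_n$ |
|---|---|---|---|---|---|---|
| P(X = x) | $p_1$ | $p_2$ | $p_3$ | $p_4$ | … | $p_n$ |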

Eg: Let X be the number of heads obtained when two fair coins are tossed. The probability distribution of the discrete random variable X is:

| X | 0 | 1 | 2 |
|---|---|---|---|
| P(X = x) | 1/4 | 2/4 | 1/4 |
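The two-coin table can be reproduced by enumerating the sample space. The following is a minimal Python sketch that assumes two fair coins. The names `outcomes`, `counts` and `pmf` are illustrative only. The sketch builds the PMF with exact fractions and checks the two PMF properties from Section 1.3.2.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins: HH, HT, TH, TT (assumed fair, so equally likely).
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# PMF: P(X = x) = (outcomes with x heads) / (total number of outcomes).
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

# The two PMF properties: non-negative values that sum to one.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

for x, p in pmf.items():
    print(x, p)  # Fraction reduces 2/4 to 1/2
```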

1.3.3 Probability Density Function:

For a continuous random variable X, the probability distribution is defined by a probability density function (PDF), denoted by f(x). If X takes values between certain limits, say a and b, the probability that X lies between them is given by

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

A probability density function must satisfy the following conditions:

The probability density function is non-negative for all possible values, i.e. $f(x) \ge 0$ for all x.

The area between the density curve and the horizontal x-axis is equal to 1, i.e. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Note: The probability density function is different from the probability mass function. The value f(x) is not itself a probability (it can even exceed 1), so the rules of probability do not apply to it directly. Probabilities are obtained by integrating f(x) over an interval.

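Eg.: Let X be a continuous random variable with the PDF

$$f(x) = \begin{cases} x, & 0 < x < 1 \\ 2 - x, & 1 \le x < 2 \\ 0, & \text{otherwise} \end{cases}$$

Find $P(0.2 < X < 1.2)$.

Solution:

$$P(0.2 < X < 1.2) = \int_{0.2}^{1.2} f(x)\,dx$$

We can split the integral at x = 1, where the formula for f(x) changes:

$$= \int_{0.2}^{1} x\,dx + \int_{1}^{1.2} (2 - x)\,dx$$

$$= \left[\frac{x^2}{2}\right]_{0.2}^{1} + \left[2x - \frac{x^2}{2}\right]_{1}^{1.2}$$

$$= \left(\frac{1}{2} - 0.02\right) + \left(2.4 - 0.72\right) - \left(2 - \frac{1}{2}\right)$$

$$= \frac{1}{2} - 0.02 + 1.68 - 2 + \frac{1}{2}$$

$$= 0.66$$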
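As a numerical cross-check of the worked example, the sketch below integrates the same piecewise pdf with the composite Simpson's rule. This numerical technique is used here only for illustration; the text evaluates the integrals by hand. The helper names `f` and `simpson` and the choice of 1000 subintervals are assumptions of this sketch.

```python
def f(x: float) -> float:
    """Piecewise pdf from the example: x on (0, 1), 2 - x on [1, 2), 0 elsewhere."""
    if 0 <= x < 1:
        return x
    if 1 <= x < 2:
        return 2 - x
    return 0.0

def simpson(g, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

# Split at x = 1, where the formula for f changes, as in the worked solution.
prob = simpson(f, 0.2, 1.0) + simpson(f, 1.0, 1.2)
print(round(prob, 4))
```

Because each piece of f is linear, Simpson's rule is exact on each subinterval apart from floating-point rounding. Splitting at x = 1 mirrors the hand calculation.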

1.3.4 Expectation of a Random Variable (Mean):

Case 1: Discrete Random Variable:

The expected value of a random variable is the average value of the random variable over a large number of repetitions of the experiment. For a discrete random variable, the expected value is found using the formula

$$E(X) = \sum_{i=1}^{n} x_i\,p(x_i)$$

E.g. Find the expected value of the probability distribution given in the following table:

| x | -1 | -2 | -3 | 0 | 1 | 2 |
|---|---|---|---|---|---|---|
| P(x) | 0.25 | 0.35 | 0.01 | 0.01 | 0.2 | 0.18 |

Solution:

$$E(X) = \sum_{i=1}^{n} x_i\,p(x_i) = (-1)(0.25) + (-2)(0.35) + (-3)(0.01) + (0)(0.01) + (1)(0.2) + (2)(0.18)$$

$$= -0.25 - 0.7 - 0.03 + 0 + 0.2 + 0.36$$

$$= -0.42$$
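The following is a short Python sketch of the discrete expectation formula applied to the table above. `expectation` is a hypothetical helper, and the tolerance used to check that the probabilities sum to 1 is an assumption.

```python
xs = [-1, -2, -3, 0, 1, 2]
ps = [0.25, 0.35, 0.01, 0.01, 0.2, 0.18]

def expectation(values, probs):
    """E(X) = sum of x_i * p(x_i) for a discrete distribution."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

print(round(expectation(xs, ps), 4))  # -0.42
```

The negative values of x carry most of the probability here, which is why E(X) comes out negative.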

Case 2: Continuous Random Variable:

Let X be a continuous random variable with pdf f(x). Then the mathematical expectation of X, denoted by E(X), is given by

$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

For example, let X be a continuous random variable with

$$f(x) = \begin{cases} 3x^2, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

Find the expected value.
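The continuous example above stops at the question. Applying the definition of E(X) gives the following short derivation, added here for illustration:

$$E(X) = \int_0^1 x \cdot 3x^2\,dx = 3\int_0^1 x^3\,dx = 3\left[\frac{x^4}{4}\right]_0^1 = \frac{3}{4}$$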

1.3.5 Variance of a Random Variable:

Case 1: Discrete Random Variable:

The variance of a discrete random variable is denoted by V(X) and is defined as

$$V(X) = E(X^2) - [E(X)]^2$$

where E(X) is the expected value and

$$E(X^2) = \sum_{i} x_i^2\,p(x_i)$$

Case 2: Continuous Random Variable:

The variance of a continuous random variable is denoted by V(X) and is defined as

$$V(X) = E(X^2) - [E(X)]^2$$

where E(X) is the expected value and

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$$

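Eg: Find the mean and variance of the following probability distribution:

| X | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| P(X) | 0.2 | 0.15 | 0.1 | 0.2 | 0.15 | 0.2 |

$$E(X) = \sum_{i} x_i\,p(x_i) = 1(0.2) + 2(0.15) + 3(0.1) + 4(0.2) + 5(0.15) + 6(0.2)$$

$$= 0.2 + 0.3 + 0.3 + 0.8 + 0.75 + 1.2 = 3.55$$

$$E(X^2) = \sum_{i} x_i^2\,p(x_i) = 1^2(0.2) + 2^2(0.15) + 3^2(0.1) + 4^2(0.2) + 5^2(0.15) + 6^2(0.2)$$

$$= 0.2 + 0.6 + 0.9 + 3.2 + 3.75 + 7.2 = 15.85$$

$$V(X) = E(X^2) - [E(X)]^2 = 15.85 - (3.55)^2 = 15.85 - 12.6025 = 3.2475$$

Mean = 3.55

Variance = 3.2475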
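The same variance calculation can be sketched in Python, assuming floating-point probabilities are precise enough for this check. `mean_and_variance` is a hypothetical helper that applies $V(X) = E(X^2) - [E(X)]^2$ to the table in the example.

```python
def mean_and_variance(values, probs):
    """Return (E(X), V(X)) using V(X) = E(X^2) - [E(X)]^2."""
    mean = sum(x * p for x, p in zip(values, probs))
    second_moment = sum(x * x * p for x, p in zip(values, probs))  # E(X^2)
    return mean, second_moment - mean ** 2

mean, var = mean_and_variance([1, 2, 3, 4, 5, 6], [0.2, 0.15, 0.1, 0.2, 0.15, 0.2])
print(round(mean, 4), round(var, 4))
```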

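Eg: The p.d.f. of a random variable X is $f(x) = 6(x - x^2)$ for $0 \le x \le 1$ (and 0 otherwise). Find the mean and variance.

$$\text{Mean} = E(X) = \int_0^1 x f(x)\,dx = \int_0^1 x \cdot 6(x - x^2)\,dx$$

$$= 6\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1 = 6\left(\frac{1}{3} - \frac{1}{4}\right) = \frac{6}{12} = \frac{1}{2}$$

$$E(X^2) = \int_0^1 x^2 f(x)\,dx = \int_0^1 x^2 \cdot 6(x - x^2)\,dx = \int_0^1 (6x^3 - 6x^4)\,dx$$

$$= 6\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1 = \frac{6}{20} = \frac{3}{10}$$

$$V(X) = E(X^2) - [E(X)]^2 = \frac{3}{10} - \frac{1}{4} = \frac{1}{20}$$

Mean = 1/2

Variance = 1/20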
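For polynomial densities on [0, 1], the moments can be computed exactly. The sketch below is a minimal illustration under that assumption. Polynomials are stored as coefficient lists (lowest degree first, a representation chosen here for convenience), and `poly_integral_0_1` and `shift` are hypothetical helpers. The sketch reproduces the mean 1/2 and variance 1/20 of $f(x) = 6(x - x^2)$.

```python
from fractions import Fraction

def poly_integral_0_1(coeffs):
    """Integrate sum(c_k * x**k) over [0, 1] exactly: sum(c_k / (k + 1))."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(coeffs))

def shift(coeffs, power):
    """Multiply a polynomial (coefficients, lowest degree first) by x**power."""
    return [0] * power + list(coeffs)

# f(x) = 6x - 6x^2 on [0, 1]  ->  coefficients [c0, c1, c2] = [0, 6, -6]
f = [0, 6, -6]

area = poly_integral_0_1(f)              # should be 1 for a valid pdf
mean = poly_integral_0_1(shift(f, 1))    # E(X)   = integral of x f(x)
second = poly_integral_0_1(shift(f, 2))  # E(X^2) = integral of x^2 f(x)
variance = second - mean ** 2

print(area, mean, second, variance)  # 1 1/2 3/10 1/20
```

The same helpers confirm $E(X) = 3/4$ for $f(x) = 3x^2$: `poly_integral_0_1(shift([0, 0, 3], 1))` returns `Fraction(3, 4)`.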

1.3.6 Cumulative Distribution Function:

When we deal with inequalities such as $X \le a$, the resulting set of outcomes contains all the elements less than or equal to a.

When the probability function is applied to such an inequality, it gives a cumulative probability, i.e. the probability that the random variable takes a value less than or equal to a particular value. The cumulative distribution function of a random variable is therefore another way to describe the distribution of a random variable.

If X is a continuous random variable with pdf f(x), then the function

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty$$

is called the cumulative distribution function (cdf).
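For the piecewise pdf of the example in Section 1.3.3, integrating f gives a closed-form cdf. The Python sketch below encodes that piecewise F(x). The piecewise formulas are derived here for illustration and are not taken from the text, and `triangular_cdf` is a hypothetical name. The sketch uses $P(a < X \le b) = F(b) - F(a)$ to recover the probability 0.66 found earlier.

```python
def triangular_cdf(x: float) -> float:
    """F(x) = P(X <= x) for the pdf f(x) = x on (0, 1), 2 - x on [1, 2), 0 elsewhere."""
    if x < 0:
        return 0.0
    if x < 1:
        return x * x / 2               # integral of t dt from 0 to x
    if x < 2:
        return 2 * x - x * x / 2 - 1   # 1/2 + integral of (2 - t) dt from 1 to x
    return 1.0

# P(0.2 < X <= 1.2) = F(1.2) - F(0.2)
print(round(triangular_cdf(1.2) - triangular_cdf(0.2), 4))
```

The two middle branches agree at x = 1 (both give 1/2), and F reaches 1 at x = 2, as a cdf must.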

1.3.7 Reliability:

Reliability is measured and described using probability. Let X be the lifetime, or time to failure, of a component. The probability that the component survives until some time t is called the reliability R(t) of the component. Thus,

$$R(t) = P(X > t) = 1 - F(t)$$

where F is the distribution function of the component lifetime X. The component is normally (but not always) assumed to be working properly at time t = 0 [i.e., R(0) = 1], and no component can work forever without failure [i.e., $\lim_{t \to \infty} R(t) = 0$]. Also, R(t) is a monotone decreasing function of t. For t less than zero, reliability has no meaning, but we let R(t) = 1 for t < 0. F(t) is often called the unreliability.
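The relation $R(t) = 1 - F(t)$ can be illustrated with a concrete lifetime model. The sketch below assumes an exponentially distributed lifetime, $F(t) = 1 - e^{-\lambda t}$, which the text does not specify. The rate value and the helper names are hypothetical. The sketch shows that R(0) = 1, that R(t) decreases, and that R(t) approaches 0.

```python
import math

def reliability(t: float, rate: float) -> float:
    """R(t) = P(X > t) = 1 - F(t), assuming an exponential lifetime X (sketch only)."""
    if t < 0:
        return 1.0  # convention from the text: R(t) = 1 for t < 0
    return math.exp(-rate * t)

def unreliability(t: float, rate: float) -> float:
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(t, rate)

rate = 0.5  # hypothetical failure rate (failures per unit time)
for t in (0, 1, 2, 5, 20):
    print(t, round(reliability(t, rate), 4), round(unreliability(t, rate), 4))
```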

1.4 INTRODUCTION - DISTRIBUTION FUNCTIONS

In the previous section we discussed random variables, their probability distributions, and their means and variances. This section focuses on some standard distributions and their properties.

Bernoulli Trial:

Bernoulli trials are experiments that each result in one of two mutually exclusive and exhaustive outcomes: one is termed success and the other failure. For example, when an unbiased coin is tossed, we can define getting a tail as success, and hence getting a head is failure.

1.5 BINOMIAL DISTRIBUTION

Consider n independent Bernoulli trials, each resulting in either success or failure, with probability of success p and probability of failure q = 1 - p in every trial. Let X be a discrete random variable denoting the number of successes in the n independent trials. The variate X is called a binomial variate, and the probability distribution of X is called the binomial distribution. It is defined as

$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n; \; 0 < p < 1$$

and P(X = x) = 0 elsewhere.

For example, suppose an unbiased coin is tossed 10 times. The probability of getting a head on any one toss is 1/2, so the number of heads has a binomial distribution with n = 10 and p = 1/2. "Success" is flipping a head and "failure" is flipping a tail.

Properties of the binomial distribution are as follows:

1. The mean of the binomial distribution is np.

2. The variance of the binomial distribution is npq.
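The following is a minimal Python sketch of the binomial pmf, using the standard-library `math.comb` and applied to the coin example (n = 10, p = 1/2). It checks numerically that the probabilities sum to 1 and that the mean and variance equal np and npq. `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) p^x q^(n - x) for x = 0..n, and 0 elsewhere."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 10, 0.5  # the coin example: 10 tosses, P(head) = 1/2
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(x * x * px for x, px in enumerate(pmf)) - mean ** 2

print(round(sum(pmf), 6), round(mean, 6), round(variance, 6))  # 1.0 5.0 2.5
```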

1.6 NORMAL DISTRIBUTION

The normal distribution is the most important and most widely used distribution in statistics. Many symmetrical distributions met in practice are closely approximated by the normal distribution.

The equation of the normal curve is

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$$

where σ = standard deviation, μ = arithmetic mean and N = total frequency. With N = 1 this is the probability density function of the normal distribution.

We can transform the variable x to

$$z = \frac{x - \mu}{\sigma}$$

where z is called the standard normal variate.

The parameters μ and σ are the mean and standard deviation, respectively, and define the normal distribution.

The following are the properties of the normal distribution:

1. It is bell-shaped and symmetrical in nature.

2. The mean, median and mode of a normal distribution are identical.

3. The total area under the normal probability curve (N = 1) is unity.

4. Normal distributions are denser in the center and less dense in the tails.

5. Normal distributions are defined by two parameters, the mean (μ) and the standard deviation (σ).

6. Approximately 68% of the area of a normal distribution lies within one standard deviation of the mean.

7. Approximately 95% of the area of a normal distribution lies within two standard deviations of the mean.

8. The position and shape of the normal curve depend upon μ, σ and N.
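Python's standard library provides `statistics.NormalDist`, which can be used to check the standardization $z = (x - \mu)/\sigma$ and properties 6 and 7. The mean 50, the standard deviation 10 and the value x = 65 below are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 50.0, 10.0      # hypothetical mean and standard deviation
X = NormalDist(mu, sigma)   # normal distribution of x
Z = NormalDist()            # standard normal: mean 0, standard deviation 1

# Standardization: P(X <= x) equals P(Z <= z) with z = (x - mu) / sigma.
x = 65.0
z = (x - mu) / sigma
print(z, round(X.cdf(x), 4), round(Z.cdf(z), 4))

# Properties 6 and 7: area within one and two standard deviations of the mean.
within_1 = X.cdf(mu + sigma) - X.cdf(mu - sigma)
within_2 = X.cdf(mu + 2 * sigma) - X.cdf(mu - 2 * sigma)
print(round(within_1, 4), round(within_2, 4))  # 0.6827 0.9545
```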

1.7 CHI-SQUARE DISTRIBUTION

The chi-square distribution is a continuous probability distribution that is widely used in statistical inference. It is related to the standard normal distribution: if a random variable Z has the standard normal distribution, then $Z^2$ has a chi-square distribution with one degree of freedom.

More generally, the chi-square distribution, denoted $\chi^2$, arises as follows. If $Z_1, Z_2, \ldots, Z_v$ are independent standard normal variables, then the sum of their squares

$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_v^2$$

has a chi-square distribution with v degrees of freedom. Here v is the number of independent squared normal variables in the sum.

The properties are as follows:

1. The chi-square distribution is a continuous probability distribution with values ranging from 0 to ∞ (infinity) in the positive direction. $\chi^2$ can never assume negative values.

2. The shape of the chi-square distribution depends on the number of degrees of freedom v. When v is small, the curve is skewed to the right. As v gets larger, the curve becomes more symmetrical and can be approximated by the normal distribution.

3. The mean of the chi-square distribution is equal to the degrees of freedom, i.e. $E(\chi^2) = v$. The variance is twice the degrees of freedom, i.e. $V(\chi^2) = 2v$.

4. The $\chi^2$ distribution approaches the normal distribution as v gets larger, with mean v and standard deviation $\sqrt{2v}$.

Figure: Chi-square distribution with different degrees of freedom.
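The definition of $\chi^2$ as a sum of squared standard normal variables can be checked by simulation. This sketch assumes v = 4 degrees of freedom, 100,000 draws and a fixed seed, all of which are arbitrary choices. The sample mean and variance should come out close to v and 2v (property 3), and no draw should be negative (property 1).

```python
import random
from statistics import fmean

def chi_square_draw(v: int, rng: random.Random) -> float:
    """One chi-square value: the sum of squares of v independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))

rng = random.Random(42)  # fixed seed so the run is reproducible
v = 4                    # degrees of freedom (hypothetical choice)
draws = [chi_square_draw(v, rng) for _ in range(100_000)]

mean = fmean(draws)
variance = fmean((d - mean) ** 2 for d in draws)
print(round(mean, 2), round(variance, 2))  # should be close to v and 2v
print(min(draws) >= 0)                     # chi-square values are never negative
```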

1.8 T-DISTRIBUTION

The t-distribution, also known as Student's t-distribution, is the probability distribution used to estimate population parameters when the sample size is small and the population standard deviation is unknown.

It resembles the normal distribution. As the sample size increases, the t-distribution approaches the standard normal distribution, with mean 0 and standard deviation 1.

Properties of the t-Distribution:

1. The graph of the t-distribution is bell-shaped and symmetrical with mean zero.

2. The t-distribution is most useful when the sample size is small, when the population standard deviation is not known, or both.

3. The t-distribution ranges from -∞ to ∞ (infinity).

4. The shape of the t-distribution changes with the degrees of freedom.

5. The variance, $v/(v-2)$, is always greater than one and is defined only when the degrees of freedom $v \ge 3$.

6. It is less peaked at the center and higher in the tails than the normal distribution, i.e. it has heavier tails than the normal curve.

7. The t-distribution has a greater dispersion than the standard normal distribution. As the sample size n increases, it approaches the normal distribution. Here the sample size is said to be large when n > 30.
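To see properties 5 and 6 numerically, the sketch below evaluates the standard Student's t density formula and compares it with the standard normal density from `statistics.NormalDist`. The density formula is not given in the text and is an assumption of this sketch. `t_pdf` and `t_variance` are hypothetical helper names.

```python
import math
from statistics import NormalDist

def t_pdf(t: float, v: int) -> float:
    """Density of Student's t-distribution with v degrees of freedom."""
    # lgamma keeps the normalizing constant finite for large v.
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v * math.pi)
    return math.exp(log_c) * (1 + t * t / v) ** (-(v + 1) / 2)

def t_variance(v: int) -> float:
    """Variance v / (v - 2), defined only for v > 2."""
    if v <= 2:
        raise ValueError("variance is undefined for v <= 2")
    return v / (v - 2)

normal = NormalDist()  # standard normal for comparison
print("normal", round(normal.pdf(0), 4), round(normal.pdf(3), 4))
for v in (3, 10, 30, 1000):
    # Lower peak at t = 0 and higher density at t = 3 than the normal curve.
    print(v, round(t_pdf(0, v), 4), round(t_pdf(3, v), 4), round(t_variance(v), 4))
```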

1.9 F-DISTRIBUTION

The distribution used to compare the variances of two independent populations is called the F-distribution. The distribution of all possible values of the F-statistic is called the F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom, where $n_1$ and $n_2$ are the two sample sizes. The F-distribution has several properties:

1. The curve is not symmetrical but skewed to the right.

2. The F-distribution is positively skewed. As the degrees of freedom $v_1$ and $v_2$ increase, its skewness decreases.

3. The F statistic is greater than or equal to zero.

4. As the degrees of freedom for the numerator and the denominator get larger, the curve approximates the normal distribution.

5. The mean and variance of the F-distribution are:

$$\text{Mean} = \frac{v_2}{v_2 - 2}, \quad v_2 > 2$$

$$\text{Variance} = \frac{2 v_2^2 (v_1 + v_2 - 2)}{v_1 (v_2 - 2)^2 (v_2 - 4)}, \quad v_2 > 4$$

6. The shape of the F-distribution depends on its parameters, the $v_1$ and $v_2$ degrees of freedom.

7. Values of F for areas in the left-hand tail can be found by taking the reciprocal of the F value for the corresponding right-hand tail, with the numerator and denominator degrees of freedom interchanged: $F_{1-\alpha}(v_1, v_2) = 1/F_{\alpha}(v_2, v_1)$. Here $F_{\alpha}(v_1, v_2)$ denotes the value with area α to its right.
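The mean and variance formulas in property 5 can be coded directly. They can also be checked by simulation if we assume the standard construction $F = (\chi^2_{v_1}/v_1)/(\chi^2_{v_2}/v_2)$ for independent chi-square variables, which the text does not state. In this sketch each chi-square value is drawn with `random.gammavariate(v/2, 2.0)`, since a chi-square with v degrees of freedom is a gamma distribution with shape v/2 and scale 2. The degrees of freedom 5 and 10, the seed and the sample size are arbitrary.

```python
import random
from statistics import fmean

def f_mean(v2: int) -> float:
    """Mean of the F-distribution, defined for v2 > 2."""
    if v2 <= 2:
        raise ValueError("mean is defined only for v2 > 2")
    return v2 / (v2 - 2)

def f_variance(v1: int, v2: int) -> float:
    """Variance of the F-distribution, defined for v2 > 4."""
    if v2 <= 4:
        raise ValueError("variance is defined only for v2 > 4")
    return 2 * v2**2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))

def f_sample(v1: int, v2: int, rng: random.Random) -> float:
    """F = (chi2_v1 / v1) / (chi2_v2 / v2), each chi-square drawn as Gamma(v/2, scale 2)."""
    chi_num = rng.gammavariate(v1 / 2, 2.0)
    chi_den = rng.gammavariate(v2 / 2, 2.0)
    return (chi_num / v1) / (chi_den / v2)

rng = random.Random(7)
v1, v2 = 5, 10  # hypothetical degrees of freedom
draws = [f_sample(v1, v2, rng) for _ in range(100_000)]
print(f_mean(v2), round(f_variance(v1, v2), 4), round(fmean(draws), 3))
print(min(draws) >= 0)  # F is never negative (property 3)
```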

1.10 SUMMARY

We discussed random variables and their different types. There are two types of probability distribution: discrete and continuous. A random variable that assumes only a finite or countably infinite number of values is called a discrete random variable. A continuous random variable can assume an uncountable number of values. A discrete random variable is associated with a probability mass function, and a continuous random variable with a probability density function. The expected value and variance of discrete and continuous distributions were defined. We also studied some standard distributions and their properties; these distributions are applied in testing of hypotheses. Probability methods are applied in modeling text and Web data, network traffic modeling, probabilistic analysis of algorithms and graphs, reliability modeling, simulation algorithms, data mining and speech recognition.

1.11 UNIT END QUESTIONS

1. Let X be a continuous random variable with the following PDF:

$$f_X(x) = \begin{cases} a e^{-x}, & x \ge 0 \\ 0, & \text{elsewhere} \end{cases}$$

where a is a positive constant.

(i) Find a.

(ii) Find the CDF of X, $F_X(x)$.

(iii) Find $P(1 < X < 3)$.

2. Let X be a random variable with PDF given by

$$f_X(x) = \begin{cases} a e^{-x}, & x \ge 1 \\ 0, & \text{elsewhere} \end{cases}$$

where a is a positive constant.

(i) Find a.

(ii) Find E(X) and V(X).

3. Check whether the following can define probability distributions:

(i) $f(x) = \dfrac{x}{15}$, where x = 0, 1, 2, 3, 4, 5

(ii) 23xfxwhere x = 3, 4, 5

(iii) $f(x) = \dfrac{1}{2}$, where x = 1, 2

4. Consider tossing a fair coin 3 times. Define X = number of times tails occurred:

| Value | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Probability | 1/8 | 3/8 | 3/8 | 1/8 |

Find E(X) and V(X).

5. Find the mean and variance of X, given the following probability distribution:

| x | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|
| P(x) | 0.3 | 0.2 | 0.2 | 0.2 | 0.1 |

6. A random variable X has the following probability distribution:

| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| P(x) | k | 2k | 3k | 5k | 4k | 2k | k |

Find k. Hence find E(X).

7. A bag contains 4 red and 6 white balls. Two balls are drawn at random, and a player gets Rs. 10 for each red ball and Rs. 5 for each white ball drawn. Find his mathematical expectation.

8. A continuous distribution of a variable X in the range (-3, 3) is defined by

$$f(x) = \begin{cases} \frac{1}{16}(3 + x)^2, & -3 \le x \le -1 \\ \frac{1}{16}(6 - 2x^2), & -1 \le x \le 1 \\ \frac{1}{16}(3 - x)^2, & 1 \le x \le 3 \end{cases}$$

(i) Verify that the area under the curve is unity.

(ii) Find the mean and variance of the above distribution.

1.12 REFERENCES

1. Probability and Statistics with Reliability, Queuing and Computer Science Applications, Kishor S. Trivedi, John Wiley & Sons, Inc., 2016.

2. Fundamentals of Mathematical Statistics, S.C. Gupta, 10th Edition, 2002.

1.13 FURTHER READING

1. Introductory Business Statistics, Alexander Holmes et al., 2018.

*****


UNIT II

2

HYPOTHESIS TESTING

Unit Structure

2.0 Objective

2.1 Introduction

2.2 Hypothesis Testing

2.3 Null Hypothesis (H0)

2.4 Alternate Hypothesis (H1)

2.5 Critical Region

2.6 P-Value

2.7 Tests based on T

2.8 Normal and F Distribution

2.9 Analysis of Variance

2.10 One Way analysis of variance

2.11 Two-way analysis of variance

2.12 Summary

2.13 Unit End Questions

2.14 References for Future Reading

2.0 OBJECTIVE

Statistics is referred to as the process of collecting, organizing and analyzing data and drawing conclusions from it.

Statistical analysis gives meaning to data that would otherwise be just a collection of numbers.

Statistics is "a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of masses of numerical data."

2.1 INTRODUCTION

Statistics is the science of collecting, organizing, analyzing and interpreting data in order to make decisions.

Statistics is used to describe a data set and to draw conclusions about the population from that data set.

Statistical methods are of two types:

Descriptive Method: This method uses graphs and numerical summaries.

Inferential Method: This method uses confidence intervals and significance tests, which are part of applied statistics.

2.2 HYPOTHESIS TESTING

Definition

A hypothesis is a claim or idea about a group or population.

A hypothesis is an educated guess or assumption that can be tested.

A hypothesis is usually formulated on the basis of previous studies.

2.3 NULL HYPOTHESIS (H0)

A null hypothesis is a statistical hypothesis that is formulated for the purpose of being rejected or nullified.

2.4 ALTERNATE HYPOTHESIS (H1)

Any hypothesis that differs from the given null hypothesis is an alternative hypothesis. The alternative hypothesis is what you might believe to be true or hope to prove true.

Null Hypothesis

A null hypothesis can be a statement of equality.

Example: There is no difference in the mean scores of VSIT and SIES, i.e. H0: μ1 = μ2.

A null hypothesis can be a statement of no relationship.

Examples: There is no relation between personality and job success. Plant growth is not affected by light intensity.

Hypothesis Testing - Two-Tailed and One-Tailed Tests

Hypothesis Testing

▸ A decision-making process for evaluating claims about a population.

▸ It decides whether to accept or reject H0.

Type I and Type II Errors:

Type I Error:

Rejecting the null hypothesis H0 when it is true, i.e. when it should have been accepted.

Type II Error:

Accepting the null hypothesis H0 when it is false, i.e. when it should have been rejected.

Two-Tailed and One-Tailed Tests:

▸ Two-tailed test: the critical area of a distribution is two-sided, and the test checks whether a sample is greater than or less than a certain range of values.

▸ If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.

▸ One-tailed test: a statistical test in which the critical area of a distribution is one-sided, so that it is either greater than or less than a certain value, but not both.

▸ If the sample being tested falls into the one-sided critical area, the alternative hypothesis is accepted instead of the null hypothesis.

One-tailed tests are applied to answer questions such as: Is our finding significantly greater than our assumed value? Or: Is our finding significantly less than our assumed value?

Two-tailed tests are applied to answer the question: Is the finding different from the assumed mean?

Level of Significance (LOS):

▸ The maximum allowable probability of making a Type I error.

▸ This probability is denoted by α.

▸ A significance level of 0.05 (5%) or 0.01 (1%) is common.

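2.5 CRITICAL REGION

Critical values Zc of the standard normal distribution for each level of significance (LOS):

| Test | α = 0.05 (5%) | α = 0.01 (1%) |
|---|---|---|
| Two-tailed test | Zc = 1.96 | Zc = 2.58 |
| One-tailed test | Zc = 1.645 | Zc = 2.33 |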
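The table values can be reproduced from the inverse cdf of the standard normal distribution. This Python sketch uses `statistics.NormalDist().inv_cdf`. `critical_z` is a hypothetical helper that puts α/2 in each tail for a two-tailed test and α in one tail otherwise.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def critical_z(alpha: float, two_tailed: bool) -> float:
    """Zc such that the rejection region has total area alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return Z.inv_cdf(1 - tail)

for alpha in (0.05, 0.01):
    print(alpha, round(critical_z(alpha, True), 3), round(critical_z(alpha, False), 3))
```

The computed values (about 1.960, 1.645, 2.576 and 2.326) round to the tabulated 1.96, 1.645, 2.58 and 2.33.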

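2.6 P-VALUE

Z Score:

▸ Mean: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}}$

▸ Proportion: $Z = \dfrac{P - p}{\sqrt{pq/N}}$

Here $\bar{X}$ is the sample mean, μ and σ are the population mean and standard deviation, N is the sample size, P is the sample proportion, p is the population proportion and q = 1 - p.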

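Steps for hypothesis testing

1. Propose H0 and H1.

2. Identify the test:

▹ one-tailed (if H1 contains < or >)

▹ two-tailed (if H1 contains ≠)

3. Get the table value Zc according to the LOS mentioned in the problem.

4. Find the Z score using the formula.

5. Inference:

▹ If Z falls in the critical region (Z > Zc for a right-tailed test, Z < -Zc for a left-tailed test, |Z| > Zc for a two-tailed test), reject H0.

▹ Otherwise, accept H0.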
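The steps above can be sketched as a small Python function for the test on a mean. `z_test_mean` and its `tail` argument ('right', 'left' or 'two') are hypothetical names, and Zc is passed in from the table rather than computed. The function applies the critical-region rule for each kind of test.

```python
import math

def z_test_mean(xbar, mu, sigma, n, zc, tail):
    """Z = (xbar - mu) / (sigma / sqrt(n)); returns (Z, reject_H0).

    tail is 'right', 'left' or 'two'; zc is the positive table value.
    """
    z = (xbar - mu) / (sigma / math.sqrt(n))
    if tail == "right":
        reject = z > zc
    elif tail == "left":
        reject = z < -zc
    elif tail == "two":
        reject = abs(z) > zc
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z, reject

# Cable-strength example: mu = 1800, sigma = 100, N = 50, xbar = 1850, alpha = 0.01.
print(z_test_mean(1850, 1800, 100, 50, zc=2.33, tail="right"))
```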

Question:

Solution:

Step 1 - Write the given values:

μ = 1800, σ = 100 (population parameters)

N = 50, X̄ = 1850 (sample data)

LOS = α = 0.01 = 1%

Step 2 - Propose H0 and H1:

H0: μ = 1800

H1: μ > 1800

Step 3 - Identify the test:

As H1 contains the > sign, use a one-tailed test.

Step 4 - Get the table value of Zc for LOS α = 0.01 (1%):

Zc = 2.33

Step 5 - Find the Z score using the formula:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \frac{1850 - 1800}{100/\sqrt{50}} = 3.5355$$

Step 6 - Inference:

Z = 3.5355 > Zc = 2.33, so reject H0.

▸ Therefore, we can support the claim at 0.01 LOS, i.e., the cable strength has increased.

Step 1 - Write the given values:

μ = 74.5

σ = 8

N = 200

X̄ = 75.9

LOS = α = 0.05 = 5%

(a) One-tailed test

Step 2 - Propose H0 and H1:

H0: μ = 74.5; the performance of the school is the same as that of the population.

H1: μ > 74.5; the performance of the school is better than that of the population.

Step 3 - Identify the test: as H1 contains the > sign, use a one-tailed test.

(b) Two-tailed test

Step 2 - Propose H0 and H1:

H0: μ = 74.5; the performance of the school is the same as that of the population.

H1: μ ≠ 74.5; the performance of the school is different from that of the population.

Step 3 - Identify the test: as H1 contains the ≠ sign, use a two-tailed test.

One-tailed test:

Step 4 - Get the table value of Zc for LOS α = 0.05 (5%):

Zc = 1.645

Step 5 - Find the Z score using the formula:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \frac{75.9 - 74.5}{8/\sqrt{200}} = 2.4749$$

Step 6 - Inference:

Z = 2.4749, Zc = 1.645

As Z > Zc, reject H0.

Therefore, we can support the claim at 0.05 LOS, i.e., the performance of the school is better than that of the population.

Two-tailed test:

Step 4 - Get the table value of Zc for LOS α = 0.05 (5%):

Zc = 1.96

Step 5 - Find the Z score:

Z = 2.4749

Step 6 - Inference:

As |Z| > Zc, reject H0.

Therefore, we can support the claim at 0.05 LOS, i.e., the performance of the school is different from that of the population.

Z Score

Mean:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}}$$

Proportion:

$$Z = \frac{P - p}{\sqrt{pq/N}}$$

Steps for hypothesis testing

1. Write the given values.

2. Propose H0 and H1.

3. Identify the test:

one-tailed (if H1 contains < or >)

two-tailed (if H1 contains ≠)

4. Get the table value Zc according to the LOS mentioned in the problem.

5. Find the Z score using the formula.

6. Inference:

If Z falls in the critical region (Z > Zc for a right-tailed test, Z < -Zc for a left-tailed test, |Z| > Zc for a two-tailed test), reject H0.

Otherwise, accept H0.

Question:

Step 1 - Write the given values:

p = 90/100 = 0.9 (population parameter)

q = 1 - 0.9 = 0.1 (population parameter)

N = 200 (sample data)

P = 160/200 = 0.8 (sample data)

LOS = α = 0.05 = 5%

Step 2 - Propose H0 and H1:

H0: p = 0.9

H1: p < 0.9

Step 3 - Identify the test:

As H1 contains the < sign, use a one-tailed test.

Step 4 - Get the table value of Zc for LOS α = 0.05 (5%):

Zc = 1.645

Step 5 - Find the Z score using the formula:

$$Z = \frac{P - p}{\sqrt{pq/N}} = \frac{0.8 - 0.9}{\sqrt{(0.9)(0.1)/200}} = -4.714$$

Step 6 - Inference:

Z = -4.714; for this left-tailed test the critical value is -Zc = -1.645.

As Z < -1.645, Z falls in the critical region, so reject H0.

Therefore, we cannot support the claim at 0.05 LOS, i.e., the medicine is not 90% effective.

Step 1 - Write the given values:

p = probability of getting a sum of 7

E = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}

n(E) = 6

n(S) = 36

p = 6/36 = 1/6 = 0.167

q = 1 - p = 0.833

N = 100

P = 23/100 = 0.23

LOS = α = 0.05 = 5%

Step 2 - Propose H0 and H1:

H0: p = 1/6 (the dice are fair)

H1: p ≠ 1/6

Step 3 - Identify the test:

As H1 contains the ≠ sign, use a two-tailed test.

Step 4 - Get the table value of Zc for LOS α = 0.05 (5%):

Zc = 1.96

Step 5 - Find the Z score using the formula:

$$Z = \frac{P - p}{\sqrt{pq/N}} = \frac{0.23 - 0.167}{\sqrt{(0.167)(0.833)/100}} = 1.689$$

Step 6 - Inference:

Z = 1.689, Zc = 1.96

As |Z| < Zc, accept H0.

Therefore, we can support the claim at 0.05 LOS, i.e., the dice are fair.
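The following is a minimal Python sketch of the proportion test used in the two examples above. `z_test_proportion` is a hypothetical helper, and Zc is supplied from the table. The sketch reproduces Z ≈ -4.714 for the medicine example and Z ≈ 1.689 for the dice example, using the rounded p = 0.167 as in the text.

```python
import math

def z_test_proportion(P, p, n, zc, tail):
    """Z = (P - p) / sqrt(p q / N) with q = 1 - p; returns (Z, reject_H0)."""
    q = 1 - p
    z = (P - p) / math.sqrt(p * q / n)
    if tail == "right":
        reject = z > zc
    elif tail == "left":
        reject = z < -zc
    elif tail == "two":
        reject = abs(z) > zc
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z, reject

# Medicine example: p = 0.9, P = 160/200, left-tailed test at 5% LOS.
print(z_test_proportion(160 / 200, 0.9, 200, zc=1.645, tail="left"))
# Dice example: p = 0.167, P = 23/100, two-tailed test at 5% LOS.
print(z_test_proportion(23 / 100, 0.167, 100, zc=1.96, tail="two"))
```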

2.7 TESTS BASED ON t

Student's t-distribution:

Degrees of freedom:

The number of independent pieces of information that went into calculating the estimate. For a test on a single sample mean,

Degrees of freedom = N - 1

For small samples, the z score (z statistic) is replaced by a suitable t score (t statistic).

Q. 10 individuals are chosen at random from a population and their heights (in inches) are found to be 63, 63, 64, 65, 66, 69, 69, 70, 70, 71. Find Student's t, taking the population mean to be 65.

Solution:

Formula:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{N - 1}}$$

where s is the sample standard deviation (computed with divisor N).

Given:

N = 10

μ = 65
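The following is a minimal Python sketch for the heights question. It assumes the t statistic $t = (\bar{X} - \mu)/(s'/\sqrt{N})$, where s' is the sample standard deviation with divisor N - 1 (`statistics.stdev`). This is algebraically the same as $(\bar{X} - \mu)/(s/\sqrt{N - 1})$ with s computed with divisor N (`statistics.pstdev`), and the sketch checks that both forms agree. For these data it gives $\bar{X} = 67$ and t ≈ 2.02 with 9 degrees of freedom.

```python
import math
from statistics import mean, pstdev, stdev

heights = [63, 63, 64, 65, 66, 69, 69, 70, 70, 71]
mu = 65  # population mean given in the question

n = len(heights)
xbar = mean(heights)

# Form 1: sample SD with divisor N - 1, standard error s' / sqrt(N).
t = (xbar - mu) / (stdev(heights) / math.sqrt(n))

# Form 2: SD with divisor N, standard error s / sqrt(N - 1); algebraically identical.
t_alt = (xbar - mu) / (pstdev(heights) / math.sqrt(n - 1))

print(xbar, round(t, 4), round(t_alt, 4), "df =", n - 1)
```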

Q. In the past, a machine has produced washers having a thickness of 0.050 in. To determine whether the machine is in proper working order, a sample of 10 washers is chosen, for which the mean thickness is 0.053 in and the standard deviation is 0.003 in. Test the hypothesis that the machine is in proper working order at 5% and 1% LOS. (tc at 5% LOS = 2.26, tc at 1% LOS = 3.25)

Given:

μ = 0.050 in

N = 10

X̄ = 0.053 in

s = 0.003 in (sample standard deviation)

Propose hypothesis:

H0: μ = 0.050 in (the machine is in proper working order)

H1: μ ≠ 0.050 in (the machine is not in proper working order)

$$t = \frac{\bar{X} - \mu}{s/\sqrt{N - 1}} = \frac{0.053 - 0.050}{0.003/\sqrt{9}} = 3$$

At 5% LOS:

tc = 2.26, t = 3

As t > tc, reject H0 at 5% LOS.

At 1% LOS:

tc = 3.25, t = 3

As t < tc, accept H0 at 1% LOS.
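The washer calculation and the decision at both levels can be sketched as follows. `t_from_summary` is a hypothetical helper that assumes the quoted standard deviation was computed with divisor N, which is what makes t = 3 here. The critical values 2.26 and 3.25 are taken from the question.

```python
import math

def t_from_summary(xbar, mu, s, n):
    """t = (xbar - mu) / (s / sqrt(N - 1)), with s the sample SD computed with divisor N."""
    return (xbar - mu) / (s / math.sqrt(n - 1))

t = t_from_summary(0.053, 0.050, 0.003, 10)
for label, tc in (("5%", 2.26), ("1%", 3.25)):  # two-tailed critical values, 9 df
    decision = "reject H0" if abs(t) > tc else "accept H0"
    print(label, round(t, 3), decision)
```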

2.8 NORMAL AND F DISTRIBUTION

The F distribution is also called Fisher's F distribution.

When two sample variances are compared, the z score (z statistic) is replaced by a suitable F score (F statistic):

$$F = \frac{\hat{S}_1^2/\sigma_1^2}{\hat{S}_2^2/\sigma_2^2}, \qquad \hat{S}_1^2 = \frac{N_1 S_1^2}{N_1 - 1}, \quad \hat{S}_2^2 = \frac{N_2 S_2^2}{N_2 - 1}$$

Where,

N1 = Sample 1 size

N2 = Sample 2 size

σ1 = Population 1 SD

σ2 = Population 2 SD

S1 = Sample 1 SD (computed with divisor N1)

S2 = Sample 2 SD (computed with divisor N2)

Q. Two samples of sizes 9 and 12 are drawn from two normally distributed populations having variances 16 and 25 respectively. If the sample variances are 20 and 8, determine whether the first sample has a significantly larger variance than the second sample at significance levels of (a) 0.05 (b) 0.01.

(F0.95 = 2.95, F0.99 = 4.74)

Solution:

Given:

N1 = 9

N2 = 12

σ1² = population 1 variance = 16

σ2² = population 2 variance = 25

S1² = sample 1 variance = 20

S2² = sample 2 variance = 8

$$F = \frac{\dfrac{9 \times 20}{8 \times 16}}{\dfrac{12 \times 8}{11 \times 25}} = \frac{1.40625}{0.34909} = 4.03$$

At 5% LOS:

Fc = 2.95, F = 4.03

As F > Fc, we can conclude that the variance of sample 1 is significantly larger than that of sample 2.

At 1% LOS:

Fc = 4.74, F = 4.03

As F < Fc, the variance of sample 1 is not significantly larger than that of sample 2.
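The following is a minimal Python sketch of the F statistic for this example. It assumes the sample variances were computed with divisor N and are converted to unbiased estimates $N S^2/(N - 1)$ before dividing by the population variances. `f_statistic` is a hypothetical helper. With the given data it reproduces F ≈ 4.03, and the critical values are the ones quoted in the question.

```python
def f_statistic(n1, s1_sq, var1, n2, s2_sq, var2):
    """F = (S1_hat^2 / sigma1^2) / (S2_hat^2 / sigma2^2), with S_hat^2 = N S^2 / (N - 1).

    s1_sq, s2_sq: sample variances computed with divisor N.
    var1, var2:   population variances sigma1^2 and sigma2^2.
    """
    unbiased1 = n1 * s1_sq / (n1 - 1)
    unbiased2 = n2 * s2_sq / (n2 - 1)
    return (unbiased1 / var1) / (unbiased2 / var2)

F = f_statistic(9, 20, 16, 12, 8, 25)
for label, fc in (("5%", 2.95), ("1%", 4.74)):  # critical values from the question
    print(label, round(F, 2), "significant" if F > fc else "not significant")
```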

2.9 ANALYSIS OF VARIANCE (ANOVA)

Analysis of variance (ANOVA) is a statistical tool that splits the observed aggregate variability in a data set into two parts: systematic factors and random factors. The systematic factors have a statistical influence on the given data set, while the random factors do not. Analysts use the ANOVA test to determine the influence that independent variables have on the dependent variable in a regression study.

The formula for ANOVA is:

$$F = \frac{MST}{MSE}$$

where:

F = ANOVA coefficient (the F statistic)

MST = mean sum of squares due to treatment

MSE = mean sum of squares due to error

The ANOVA test is the initial step in analysing the factors that affect a given data set. Once the test is finished, an analyst performs additional testing on the systematic factors that measurably contribute to the variability of the data set. The analyst uses the ANOVA test results in an F-test to generate additional data that aligns with the proposed regression models.

The ANOVA test allows more than two groups to be compared at the same time to determine whether a relationship exists between them. The result of the ANOVA formula, the F statistic (also called the F-ratio), allows multiple groups of data to be analysed by comparing the variability between samples with the variability within samples.

2.10 ONE WAY ANALYSIS OF VARIANCE

The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.

The one-way ANOVA compares the means of the groups you are interested in and determines whether any of those means are statistically significantly different from each other. Specifically, it tests the null hypothesis:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

where µ = group mean and k = number of groups. If, however, the one-way ANOVA returns a statistically significant result, we accept the alternative hypothesis (HA), which is that there are at least two group means that are statistically significantly different from each other.
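The F = MST/MSE calculation can be sketched from scratch in Python. The three groups below are hypothetical data chosen so the arithmetic is easy to follow (group means 5, 7 and 9, grand mean 7). `one_way_anova` is an illustrative helper; it returns MST, MSE and F without a p-value.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (MST, MSE, F) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)

    # Between-group (treatment) sum of squares, k - 1 degrees of freedom.
    sst = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares, N - k degrees of freedom.
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    mst = sst / (k - 1)
    mse = sse / (n_total - k)
    return mst, mse, mst / mse

# Hypothetical data: three groups of three observations each.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
print(one_way_anova(groups))  # (12.0, 1.0, 12.0)
```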

## Page 31

2.11 TWO-WAY ANALYSIS OF VARIANCE

A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. Use a two-way ANOVA when you want to know how two independent variables, in combination, affect a dependent variable.

Example: You are researching which type of fertilizer and planting density produces the greatest crop yield in a field experiment. You assign different plots in a field to a combination of fertilizer type (1, 2, or 3) and planting density (1 = low density, 2 = high density), and measure the final crop yield in bushels per acre at harvest time.

You can use a two-way ANOVA to find out whether fertilizer type and planting density influence average crop yield.

A two-way ANOVA with interaction tests three null hypotheses at the same time:

There is no difference in group means at any level of the first independent variable.

There is no difference in group means at any level of the second independent variable.

The effect of one independent variable does not depend on the effect of the other independent variable (a.k.a. no interaction effect).

A two-way ANOVA without interaction (a.k.a. an additive two-way ANOVA) only tests the first two of these hypotheses.

The following columns of the two-way ANOVA output table provide all of the information needed to interpret the model:

Df shows the degrees of freedom for each variable (number of levels in the variable minus 1).

Sum sq is the sum of squares (a.k.a. the variation between the group means created by the levels of the independent variable and the overall mean).

Mean sq shows the mean sum of squares (the sum of squares divided by the degrees of freedom).

F value is the test statistic from the F-test (the mean square of each independent variable divided by the mean square of the residuals).

Pr(>F) is the p-value of the F statistic. It shows how likely it is that the F-value calculated from the F-test would have occurred if the null hypothesis of no difference were true.
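Below is a minimal sketch of how the Df, Sum sq, Mean sq and F value columns could be computed by hand. It covers factor A, factor B, their interaction (A:B) and the residuals. The sketch assumes a balanced design, meaning the same number of replicates in every cell. It uses made-up yield numbers rather than the Scribbr data, and the function name is hypothetical. It does not compute Pr(>F), because that needs the F-distribution CDF, which statistics libraries such as SciPy provide.

```python
from itertools import product


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data[i][j] holds the replicate observations for level i of factor A and
    level j of factor B; every cell must contain the same number n >= 2.
    Returns dicts keyed by source: Df, Sum sq, Mean sq and F value.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])

    def mean(xs):
        return sum(xs) / len(xs)

    cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
    a_means = [mean(cell[i]) for i in range(a)]                         # factor A levels
    b_means = [mean([cell[i][j] for i in range(a)]) for j in range(b)]  # factor B levels
    grand = mean(a_means)                                               # overall mean

    cells = list(product(range(a), range(b)))
    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in a_means),
        "B": a * n * sum((m - grand) ** 2 for m in b_means),
        "A:B": n * sum((cell[i][j] - a_means[i] - b_means[j] + grand) ** 2
                       for i, j in cells),
        "Residuals": sum((y - cell[i][j]) ** 2 for i, j in cells for y in data[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "A:B": (a - 1) * (b - 1), "Residuals": a * b * (n - 1)}
    ms = {src: ss[src] / df[src] for src in ss}                    # Mean sq = Sum sq / Df
    f = {src: ms[src] / ms["Residuals"] for src in ("A", "B", "A:B")}
    return df, ss, ms, f


# Hypothetical yields: rows = planting density (low, high), columns = fertilizer 1-3
yields = [[[1, 3], [2, 4], [6, 8]],
          [[2, 2], [5, 7], [9, 9]]]
df, ss, ms, f = two_way_anova(yields)
for src in ("A", "B", "A:B", "Residuals"):
    print(src, df[src], round(ss[src], 3), round(ms[src], 3),
          round(f[src], 3) if src in f else "")
```

In a balanced design the four sums of squares add up to the total sum of squares about the overall mean. This gives a quick check on a hand calculation.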


## Page 32

2.12 SUMMARY

At the end of this chapter, one can draw conclusions based on the available data: the data can be processed and summarized, and the results generated and displayed in graphs.

2.13 UNIT END QUESTIONS

Q1. Compute Student’s t for the data below:

-4 -2 -2 0 2 2 3 3

Take mean of universe to be zero.

Q2. In a city, it is claimed that average IQ of students is 102. The

intelligence quotients (IQs) of 16 students from one area of a city

showed a mean of 107 and a standard deviation of 10. Test the claim

at 5% LOS. ($t_c$ at 5% LOS = 2.144)

Q3. Two samples of sizes 10 and 15 are drawn from two normally

distributed populations having variances 40 and 60, respectively. If

the sample variances are 90 and 50, determine whether the sample 1

variance is significantly greater than the sample 2 variance at

significance levels of (a) 0.05 and (b) 0.01. ($F_{0.95} = 2.645$, $F_{0.99} = 4.029$)

Q4. Two samples of sizes 8 and 12 are drawn from two normally

distributed populations having variances 25 and 49, respectively. If

the sample variances are 36 and 60, determine whether …

2.14 REFERENCES FOR FUTURE READING

https://www.basic-concept.com/c/basics-of-statistical-analysis

https://www.scribbr.com/statistics/two-way-anova/

Problems are taken from Schaum’s Outline of Statistics, Fourth Edition, by Murray R. Spiegel and Larry Stephens.

*****


## Page 33

33

UNIT III

3

NON-PARAMETRIC TESTS

Unit Structure

3.0 Objective

3.1 Introduction

3.2 Non-Parametric Test Definition

3.3 Need of Non-Parametric Test Definition

3.4 Sign Test

3.5 Wilcoxon’s Signed Rank Test

3.6 Run Test

3.7 Kruskal-Wallis Test

3.8 Post-hoc analysis of one-way analysis of variance

3.9 Duncan’s test Chi-square test of association

3.10 Summary

3.11 Unit End Questions

3.12 References for Future Reading

3.0 OBJECTIVE

Nonparametric statistics can be used without the mean, sample size, standard deviation, or the estimation of any other related parameters when none of that information is available. Because nonparametric statistics make fewer assumptions about the sample data, their application is wider in scope than that of parametric statistics.

3.1 INTRODUCTION

A non-parametric test (sometimes called a distribution-free test) does not assume anything about the underlying distribution (for example, that the data comes from a normal distribution). By contrast, a parametric test makes assumptions about a population’s parameters (for example, the mean or standard deviation). When the word “non-parametric” is used in statistics, it does not mean that you know nothing about the population. It usually means that you know the population data does not have a normal distribution.

3.2 NON-PARAMETRIC TEST DEFINITION

A non-parametric test does not assume anything about the underlying distribution (for example, that the data comes from a normal distribution). By contrast, a parametric test makes assumptions about a population’s parameters (for example, the mean or standard deviation). When the word “non-parametric” is used in statistics, it usually means that the population data does not have a normal distribution.

3.3 NEED OF NON-PARAMETRIC TEST DEFINITION

Use nonparametric tests only when assumptions such as normality are being violated. Nonparametric tests can perform well with non-normal continuous data if the sample size is sufficiently large (generally 15-20 items in each group). Non-parametric tests are used when your data isn’t normal. For nominal or ordinal scales, use non-parametric statistics.

3.4 SIGN TEST

A few nonparametric tests are:

1-sample sign test: This test is used to estimate the median of a population and compare it to a reference value or target value.

1-sample Wilcoxon signed rank test: With this test, estimate the population median and compare it to a reference/target value. However, the test assumes the data comes from a symmetric distribution (e.g., the Cauchy distribution or the uniform distribution).

Kruskal-Wallis test: Use this test instead of a one-way ANOVA to find out if two or more medians are different. Ranks of the data points are used for the calculations, rather than the data points themselves.

Mann-Kendall Trend Test: This test looks for trends in time-series data.

Mann-Whitney test: Use this test to compare differences between two independent groups when dependent variables are either ordinal or continuous.

Sign Test:

The sign test compares the sizes of two groups. It is a non-parametric or “distribution-free” test, which means the test doesn’t assume the data comes from a particular distribution, such as the normal distribution. The sign test is an alternative to a one-sample t-test or a paired t-test. It can also be used for ordered (ranked) categorical data. The null hypothesis for the sign test is that the difference between medians is zero.

How to Calculate a Paired/Matched Sample Sign Test

1. The data should be from two samples.

2. The two dependent samples should be paired or matched. For example, depression scores from before a medical procedure and after.

Set up the data in a table. This set of data represents test scores at the end of the Spring and the beginning of the Fall semesters. The hypothesis is that summer break means a significant drop in test scores.

## Page 35

H0: No difference in median of the signed differences.

H1: Median of the signed differences is less than zero.

Step 1: Subtract set 2 from set 1 and put the result in the third column.

Step 2: Add a fourth column indicating the sign of the number in column 3.

Step 3: Count the number of positives and negatives.

4 positives.

12 negatives.

Step 4: Add up the number of items in the sample and subtract any items that had a difference of zero (in column 3). The sample size in this question was 17, with one zero, so n = 16.

Step 5: Find the p-value using a binomial distribution table or a binomial calculator, entering:

0.5 for the probability. The null hypothesis is that there is an equal number of signs (i.e., 50/50). Therefore, the test is a simple binomial experiment with a 0.5 chance of the sign being negative and 0.5 of it being positive (assuming the null hypothesis is true).

16 for the number of trials.

4 for the number of successes. “Successes” here is the smaller of the positive and negative sign counts from Step 3.

The p-value is 0.038 (the one-tailed probability $P(X \le 4)$ for $X \sim \text{Binomial}(16, 0.5)$), which is smaller than the alpha level of 0.05. We can therefore reject the null hypothesis: there is a significant difference.
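The binomial calculation in Step 5 can be reproduced in a few lines of Python. This is a minimal sketch with a hypothetical function name. It sums Binomial(n, 0.5) probabilities up to the smaller sign count, which gives the one-tailed p-value. Taking the smaller count, as the text does, assumes that the observed direction agrees with H1. For a two-tailed test, the value would typically be doubled (capped at 1).

```python
from math import comb


def sign_test_p_value(n_pos, n_neg):
    """One-tailed sign test p-value: P(X <= smaller count), X ~ Binomial(n, 0.5).

    Pairs with a zero difference must already be excluded, so n = n_pos + n_neg.
    """
    n = n_pos + n_neg
    k = min(n_pos, n_neg)                      # "successes" = the smaller sign count
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


p = sign_test_p_value(n_pos=4, n_neg=12)
print(round(p, 4))  # 0.0384
```

The exact value is 2517/65536. This agrees with the 0.038 read from the binomial calculator.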

3.5 WILCOXON’S SIGNED RANK TEST

Definition:

The Wilcoxon Signed Rank Test is the non-parametric version of the paired t-test. It is used to test whether there is a significant difference between two related populations, based on the median of the paired differences. Use the Wilcoxon Signed Rank test when you would like to use the paired t-test but the distribution of the differences between the pairs is severely non-normal.

Example: A basketball coach wants to know if a certain training program increases the number of free throws made by his players. To test this, he has 15 players shoot 20 free throws each before and after the training program.

Solution: Since each player can be “paired” with themselves, the coach had planned on using a paired t-test to determine if there was a significant difference between the mean number of free throws made before and after the training program.

However, the distribution of the differences turns out to be non-normal, so the coach instead uses a Wilcoxon Signed Rank Test.

The following table shows the number of free throws made (out of 20 attempts) by each of the 15 players, both before and after the training program:


## Page 37

Step 1: State the null and alternative hypotheses.

H0: The median difference between the two groups is zero.

HA: The median difference is negative (e.g., the players make fewer free throws before participating in the training program).

Step 2: Find the difference and absolute difference for each pair.

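Step 3: Rank the pairs by absolute difference, from smallest to largest. Ignore pairs whose difference is 0, and give tied absolute differences the average of the ranks they would occupy.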

Step 4: Find the sum of the positive ranks and the sum of the negative ranks.


## Page 38

Step 5: Reject or fail to reject the null hypothesis.

The test statistic, W, is the smaller of the absolute values of the sum of the positive ranks and the sum of the negative ranks. In this case, the smaller value is 29.5. Thus, our test statistic is W = 29.5.

To determine whether we should reject or fail to reject the null hypothesis, we can look up the critical value in the Wilcoxon Signed Rank Test Critical Values Table that corresponds to n and our chosen alpha level. If our test statistic, W, is less than or equal to the critical value in the table, we can reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

The critical value that corresponds to an alpha level of 0.05 and n = 13 is 17. Here n is the total number of pairs minus the two pairs we did not rank because their observed difference was 0.

Since the test statistic (W = 29.5) is not less than or equal to 17, we fail to reject the null hypothesis.
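Steps 2 to 5 can be sketched in plain Python. The helper name is hypothetical, and the usage data below are invented because the coach's table is not reproduced here. The function drops zero differences, ranks the absolute differences with average ranks for ties, and returns W together with the reduced n. The critical value must still be looked up in the table. For the coach's data, the text's rule is to reject H0 only if W ≤ 17 for n = 13.

```python
def wilcoxon_w(before, after):
    """Wilcoxon signed rank statistic W for paired data.

    Returns (W, n): W is the smaller of the positive-rank sum and the
    (absolute) negative-rank sum; n is the number of non-zero differences.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]   # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))

    # Assign ranks 1..n by absolute difference, averaging ranks over ties.
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j + 2) / 2            # ranks i+1 .. j+1 share their mean
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        i = j + 1

    pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(pos, neg), len(diffs)


before = [10, 12, 9, 11, 15]   # hypothetical scores, not the coach's data
after = [11, 10, 12, 11, 11]
print(wilcoxon_w(before, after))   # (4.0, 4)
```

The pair with no change (11 to 11) is excluded, so n drops from 5 to 4. This mirrors how the example reduced 15 pairs to n = 13.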

Refer to the Wilcoxon Signed Rank Test Critical Values Table.

Source: This question and solution are taken from “How to Perform the Wilcoxon Signed Rank Test” (Statology).

3.6 RUN TEST

What Is a Runs Test?

A runs test is a statistical procedure that examines whether a string of data is occurring randomly from a specific distribution. The runs test analyzes the occurrence of similar events that are separated by events that are different.

A runs test is also known as the Wald-Wolfowitz runs test, which was developed by mathematicians Abraham Wald and Jacob Wolfowitz.

A runs test is a statistical analysis that helps determine the

randomness of data by revealing any variables that might affect data

patterns.

Technical traders can use a runs test to analyze statistical trends and

help spot profitable trading opportunities.

For example, an investor interested in analyzing the price movement of a particular stock might conduct a runs test to gain insight into possible future price action of that stock.

A nonparametric test for randomness is provided by the theory of runs. To understand what a run is, consider a sequence made up of two symbols, a and b, such as

aa | bbb | a | bb | aaaaa | bbb | aaaa |

The problem discussed is from Schaum’s Outline Series by Murray Spiegel, fourth edition.

In tossing a coin, for example, a could represent “heads” and b could represent “tails.” Or in sampling the bolts produced by a machine, a could represent “defective” and b could represent “nondefective.”

A run is defined as a set of identical (or related) symbols contained

between two different symbols or no symbol (such as at the

beginning or end of the sequence).

Proceeding from left to right in the sequence above, the first run, indicated by a vertical bar, consists of two a’s; similarly, the second run consists of three b’s, the third run consists of one a, etc. There are seven runs in all.

It seems clear that some relationship exists between randomness and

the number of runs. Thus, for the sequence

a b a b a b a b a b a b

there is a cyclic pattern, in which we go from a to b, back to a again,

etc., which we could hardly believe to be random. In such a case we

have too many runs (in fact, we have the maximum number possible

for the given number of a’s and b’s). On the other hand, for the

sequence

aaaaaa bbbb aaaaa bbb

## Page 40

There seems to be a trend pattern, in which the a’s and b’s are grouped (or clustered) together. In such a case there are too few runs, and we would not consider the sequence to be random. Thus, a sequence would be considered nonrandom if there are either too many or too few runs, and random otherwise.

To quantify this idea, suppose that we form all possible sequences consisting of $N_1$ a’s and $N_2$ b’s, for a total of $N$ symbols in all ($N_1 + N_2 = N$). The collection of all these sequences provides us with a sampling distribution: each sequence has an associated number of runs, denoted by $V$. In this way we are led to the sampling distribution of the statistic $V$. It can be shown that this sampling distribution has a mean and variance given, respectively, by the formulas

$$\mu_V = \frac{2N_1N_2}{N_1 + N_2} + 1 \qquad \sigma_V^2 = \frac{2N_1N_2\,(2N_1N_2 - N_1 - N_2)}{(N_1 + N_2)^2\,(N_1 + N_2 - 1)}$$

By using these formulas, we can test the hypothesis of randomness at appropriate levels of significance. It turns out that if both $N_1$ and $N_2$ are at least equal to 8, then the sampling distribution of $V$ is very nearly a normal distribution. Thus, the standardized statistic

$$z = \frac{V - \mu_V}{\sigma_V}$$

is approximately normally distributed with mean 0 and variance 1.
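The following is an illustrative sketch of the runs test; the helper name `runs_test` is hypothetical. The code counts the runs V in a two-symbol sequence and standardizes V with the mean and variance formulas above. It applies the normal approximation whatever the symbol counts are. The caller should therefore check that both counts are at least 8, as the text advises.

```python
import math


def runs_test(seq):
    """Runs test for a two-symbol sequence.

    Returns (V, mean, variance, z) using the normal approximation,
    which the text says is adequate when both symbol counts are at least 8.
    """
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = seq.count(symbols[0])
    n2 = seq.count(symbols[1])
    # A new run starts at the first symbol and at every change of symbol.
    v = 1 + sum(1 for prev, cur in zip(seq, seq[1:]) if cur != prev)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (v - mean) / math.sqrt(var)
    return v, mean, var, z


# The seven-run sequence from the text (spaces and bars removed)
v, mean, var, z = runs_test("aabbbabbaaaaabbbaaaa")
print(v, round(mean, 2), round(z, 2))   # 7 10.6 -1.73
```

For this sequence $N_1 = 12$ and $N_2 = 8$. Since |z| ≈ 1.73 is below 1.96, this particular sequence would not be declared nonrandom at the 0.05 level (two-tailed).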

3.7 KRUSKAL-WALLIS TEST

The Kruskal-Wallis test (1952) is a non-parametric hypothesis test. It is generally used when the measurement variable does not meet the normality assumptions of one-way ANOVA. It is also a popular nonparametric test to compare outcomes among three or more independent (unmatched) groups.

Assumptions of the Kruskal-Wallis Test:

All samples are randomly drawn from their respective populations.

Independence within each sample.

The measurement scale is at least ordinal.

Mutual independence among the various samples.

Procedure to conduct the Kruskal-Wallis Test:

First, pool all the data across the groups.

## Page 41

Rank the pooled data: 1 for the smallest value of the dependent variable, 2 for the next smallest, and so on up to N for the largest. If values are tied, give each of them the mid-point (average) of the ranks they would occupy.

Compute the test statistic.

Determine the critical value from the Chi-Square distribution table (with k − 1 degrees of freedom, where k is the number of groups).

Finally, formulate the decision and conclusion.

Calculation of the Kruskal-Wallis Non-Parametric Hypothesis Test:

The Kruskal-Wallis Non-Parametric Hypothesis Test compares medians among k groups (k > 2). The null and alternative hypotheses for the Kruskal-Wallis test are as follows:

Null Hypothesis $H_0$: Population medians are equal

Alternative Hypothesis $H_1$: Population medians are not all equal

The Kruskal-Wallis test pools the observations from the k groups into one combined sample and then ranks them from lowest to highest value (1 to N), where N is the total number of values in all the groups.

The test statistic for the Kruskal-Wallis test, denoted H, is given as follows:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(N+1)$$

where $T_i$ = rank sum for the $i$th sample and $n_i$ = size of the $i$th sample, $i = 1, 2, \ldots, k$.

In the Kruskal-Wallis test, H depends only on the ranks, so any change to the data that leaves the ranks unchanged has no effect on H. For example, increasing the largest value or decreasing the smallest value has zero effect on H. Hence, extreme outliers (on the high or low side) will not distort this test.
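Below is a plain-Python sketch of the procedure above; the helper name `kruskal_wallis_h` is hypothetical. It pools the data, assigns mid-point ranks to ties, sums the ranks for each group and applies the H formula. It deliberately omits the tie-correction factor that some texts divide H by, so with many ties the result may differ slightly from software output. H is then compared with the chi-square critical value for k − 1 degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H = 12/(N(N+1)) * sum(T_i^2 / n_i) - 3(N+1).

    Ties get mid-point (average) ranks; no tie-correction factor is applied.
    """
    pooled = sorted((x, g) for g, grp in enumerate(groups) for x in grp)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)

    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j + 2) / 2            # average of ranks i+1 .. j+1
        for m in range(i, j + 1):
            rank_sums[pooled[m][1]] += mid_rank
        i = j + 1

    total = sum(t ** 2 / len(grp) for t, grp in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
h_outlier = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 900]])
print(round(h, 4), round(h_outlier, 4))   # 7.2 7.2
```

The second call replaces the largest value with an extreme outlier. H does not change because the ranks do not change, which illustrates the property described above.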

Example of Kruskal-Wallis Non-Parametric Hypothesis Test:

In a manufacturing unit, four teams of operators were randomly selected and sent to four different facilities for machining techniques training. After the training, the supervisor conducted an exam and recorded the test scores. At the 95% confidence level, test whether the scores are the same in all four facilities.


## Page 42

Null Hypothesis $H_0$: The distribution of operator scores is the same in all four facilities

Alternative Hypothesis $H_1$: The scores are not the same in all four facilities

N = 16

Rank the scores in all the facilities:

For a right-tailed chi-square test at the 95% confidence level with df = 3, the critical $\chi^2$ value is 7.815.

The calculated $\chi^2$ value is greater than the critical value of $\chi^2$ at the 0.05 significance level ($\chi^2_{calculated} > \chi^2_{critical}$), hence reject the null hypothesis.

There is a difference in test scores across the four facilities (training methods).

3.8 POST-HOC ANALYSIS OF ONE-WAY ANALYSIS OF VARIANCE

An ANOVA test tells you whether there is an overall difference between the groups, but it does not tell you which specific groups differed; post hoc tests do that. Because post hoc tests are run to confirm where the differences occurred between groups, they should only be run when the ANOVA has shown an overall statistically significant difference in group means (i.e., a statistically significant one-way ANOVA result). Post hoc tests attempt to control the experimentwise error rate (usually alpha = 0.05) in the same way that a single one-way ANOVA is used instead of multiple t-tests.
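The small illustrative calculation below shows why the experimentwise error rate needs controlling. It relies on a simplifying assumption that the comparisons are independent; pairwise comparisons among the same groups are not independent, so the figures are only approximate. If m separate tests are each run at level α, the chance of at least one false rejection is 1 − (1 − α)^m. The function name is hypothetical.

```python
from math import comb


def experimentwise_error(alpha, k):
    """Chance of at least one Type I error across all pairwise tests among
    k groups, assuming (as a simplification) that the tests are independent."""
    m = comb(k, 2)                       # number of pairwise comparisons
    return 1 - (1 - alpha) ** m


for k in (3, 4, 5):
    print(k, comb(k, 2), round(experimentwise_error(0.05, k), 3))
```

At α = 0.05 this gives roughly 0.143, 0.265 and 0.401 for 3, 4 and 5 groups. Each extra group therefore inflates the overall error well above the nominal 5%.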

3.9 DUNCAN’S TEST CHI-SQUARE TEST OF ASSOCIATION

Chi-square test:

A chi-square ($\chi^2$) statistic is a test that measures how expectations compare to actual observed data (or model results):

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

where $o$ = observed frequency and $e$ = expected frequency.

Degrees of freedom for an $m \times n$ table:

$$\nu = (\text{no. of rows} - 1)(\text{no. of columns} - 1) = (m - 1)(n - 1)$$

Example: In an experiment to study the dependence of hypertension on smoking habits, the following data were taken from 180 individuals:

| | Non-smokers | Moderate smokers | Heavy smokers |
|---|---|---|---|
| Hypertension | 21 | 36 | 30 |
| No hypertension | 48 | 26 | 19 |

Test the hypothesis at 5% LOS that the presence or absence of hypertension is independent of smoking. (Given: $\chi^2_{tab} = 5.99$)

Solution:

H0: Presence or absence of hypertension is independent of smoking.

H1: Presence or absence of hypertension depends on smoking.

Observed frequencies (o), with row totals (RT) and column totals (CT):

| | Non-smokers | Moderate smokers | Heavy smokers | Row total |
|---|---|---|---|---|
| Hypertension | 21 | 36 | 30 | RT1 = 87 |
| No hypertension | 48 | 26 | 19 | RT2 = 93 |
| Column total | CT1 = 69 | CT2 = 62 | CT3 = 49 | Total = 180 |

Expected frequencies, e = (RT × CT)/Total:

| | Non-smokers | Moderate smokers | Heavy smokers |
|---|---|---|---|
| Hypertension | 87 × 69/180 = 33.35 | 87 × 62/180 = 29.967 | 87 × 49/180 = 23.683 |
| No hypertension | 93 × 69/180 = 35.65 | 93 × 62/180 = 32.033 | 93 × 49/180 = 25.317 |

| o | e | (o − e)²/e |
|---|---|---|
| 21 | 33.35 | 4.5734 |
| 36 | 29.967 | 1.2147 |
| 30 | 23.683 | 1.6847 |
| 48 | 35.65 | 4.2783 |
| 26 | 32.033 | 1.1364 |
| 19 | 25.317 | 1.5760 |

$$\chi^2 = \sum \frac{(o - e)^2}{e} = 14.46$$

Degrees of freedom: $\nu = (2 - 1)(3 - 1) = 2$, for which $\chi^2_{tab} = 5.99$ at 5% LOS.

As $\chi^2 = 14.46 > \chi^2_{tab} = 5.99$, reject H0 at 5% LOS.

Therefore, we can conclude that the presence or absence of hypertension depends on smoking.
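The hand calculation can be cross-checked with a short plain-Python sketch; the function name is illustrative. It forms each expected frequency as row total × column total / grand total, accumulates (o − e)²/e, and returns the degrees of freedom (m − 1)(n − 1). The comparison with the tabulated value 5.99 is still made by hand.

```python
def chi_square_independence(table):
    """Chi-square test of independence for an m x n table of observed counts.

    Returns (chi2, df) with e = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand   # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


observed = [[21, 36, 30],   # hypertension: non, moderate, heavy smokers
            [48, 26, 19]]   # no hypertension
chi2, df = chi_square_independence(observed)
print(round(chi2, 2), df)   # 14.46 2
```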

The Chi-square test of independence determines whether there is a statistically significant relationship between categorical variables. It is a hypothesis test that answers the question: do the values of one categorical variable depend on the value of other categorical variables? This test is also known as the chi-square test of association.

## Page 45

Null hypothesis: There are no relationships between the categorical

variables. If one variable is known, it does not help you predict the

value of another variable.

Alternative hypothesis: There are relationships between the

categorical variables. Knowing the value of one variable does help

you predict the value of another variable.

The Chi-square test of association works by comparing the distribution that you observe to the distribution that you expect if there is no relationship between the categorical variables.

For a Chi-square test, a p-value that is less than or equal to your significance level indicates there is sufficient evidence to conclude that the observed distribution is not the same as the expected distribution. You can conclude that a relationship exists between the categorical variables.

We can use a Chi-square test of independence to determine whether there is a statistically significant association between shirt color and deaths. We need to use this test because both variables are categorical: shirt color can be only blue, gold, or red, and fatalities can be only dead or alive.

The problem discussed is from https://statisticsbyjim.com/hypothesis-testing/chi-square-test-independence-example/

Example: The color of the uniform represents each crewmember’s work area. We will statistically assess whether there is a connection between uniform color and the fatality rate.

| Color | Areas | Crew | Fatalities |
|---|---|---|---|
| Blue | Science and Medical | 136 | 7 |
| Gold | Command and Helm | 55 | 9 |
| Red | Operations, Engineering, and Security | 239 | 24 |
| Ship’s total | All | 430 | 40 |

For example, we will determine whether the observed counts of deaths by

uniform color are different from the distribution that we’d expect if there

is no association between the two variables.

| Color | Status | Frequency |
|---|---|---|
| Blue | Dead | 7 |
| Blue | Alive | 129 |
| Gold | Dead | 9 |
| Gold | Alive | 46 |
| Red | Dead | 24 |
| Red | Alive | 215 |

## Page 46


Both p-values are less than 0.05, so we reject the null hypothesis: there is a relationship between shirt color and deaths.
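The sketch below recomputes the Pearson chi-square statistic for the uniform-color counts. This 3 × 2 table has (3 − 1)(2 − 1) = 2 degrees of freedom, so the p-value can be obtained from the closed form P(χ² > x) = e^(−x/2), which is exact only for 2 degrees of freedom. The function name is hypothetical. The source's software output, including its second p-value, is not reproduced here. Treat this only as a cross-check under those assumptions.

```python
import math


def chi_square_df2_p_value(table):
    """Pearson chi-square statistic and p-value for a table with exactly 2 df
    (e.g. 3 x 2 or 2 x 3), using the exact df = 2 tail P(X > x) = exp(-x / 2)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    grand = sum(row_totals)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df != 2:
        raise ValueError("closed-form p-value used here is valid only for df = 2")
    chi2 = sum((o - r * c / grand) ** 2 / (r * c / grand)
               for r, row in zip(row_totals, table)
               for c, o in zip(col_totals, row))
    return chi2, math.exp(-chi2 / 2)


uniforms = [[7, 129],    # blue: dead, alive
            [9, 46],     # gold
            [24, 215]]   # red
chi2, p = chi_square_df2_p_value(uniforms)
print(round(chi2, 2), round(p, 3))   # 6.19 0.045
```

The p-value of about 0.045 is below 0.05, which is consistent with rejecting the null hypothesis above. For other degrees of freedom, a general chi-square CDF from a statistics library would be needed.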

3.10 SUMMARY

In statistics, nonparametric tests are methods of statistical analysis that do not require the data to meet distributional assumptions in order to be analyzed (especially when the data is not normally distributed). They are also referred to as distribution-free tests. Nonparametric tests serve as an alternative to parametric tests such as the t-test or ANOVA, which can be employed only if the underlying data satisfies certain criteria and assumptions.

3.11 UNIT END QUESTIONS

Q1.


## Page 47

Given: $\chi^2_{tab} = 9.49$

Q2. The PQR Company claims that the lifetime of a type of battery that it manufactures is more than 250 hours (h). A consumer advocate wishing to determine whether the claim is justified measures the lifetimes of 24 of the company’s batteries; the results are listed below. Assuming the sample to be random, determine whether the company’s claim is justified at the 0.05 significance level. Work the problem first by hand, supplying all the details for the sign test.

Q3. A sample of 40 grades from a statewide examination is shown below.

Test the hypothesis at the 0.05 significance level that the median grade for

all participants is (a) 66 and (b) 75. Work the problem first by hand,

supplying all the details for the sign test.

Q4. A company wishes to purchase one of five different machines: A, B, C, D, or E. In an experiment designed to determine whether there is a performance difference between the machines, five experienced operators each work on the machines for equal times. The table below shows the number of units produced by each machine. Test the hypothesis that there is no difference between the machines at the (a) 0.05 and (b) 0.01 significance levels. Work the problem first by hand, supplying all the details for the Kruskal-Wallis H test.

Q5. In 30 tosses of a coin the following sequence of heads (H) and tails

(T) is obtained:

H T T H T H H H T H H T T H T

H T H H T H T T H T H H T H T

(a) Determine the number of runs, V.

## Page 48

(b) Test at the 0.05 significance level whether the sequence is random.

Work the problem first by hand, supplying all the details of the runs test

for randomness.

3.12 REFERENCES FOR FUTURE READING

Schaum’s Outline Series by Murray Spiegel, fourth edition.

https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/parametric-and-non-parametric-data/

https://www.statisticshowto.com/sign-test/

How to Perform the Wilcoxon Signed Rank Test - Statology

https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/bs704_nonparametric_print.html

https://www.statisticshowto.com/kruskal-wallis/

https://statistics.laerd.com/statistical-guides/one-way-anova-statistical-guide-4.php

*****
