7 Hypothesis testing

Chapter 7
Hypothesis testing

7.1 Hypothesis testing: One population

A sizable volume of scientific effort involves questions about a given population parameter. Under the heading of confidence interval estimation, we calculate an interval that contains an unknown population parameter with a probability of (1 − α)100%; here, in hypothesis testing, we question the viability of a given value as the value of that unknown population parameter.

So, in formal terms, what we do is check the validity of a claim about an unknown parameter.

A statistical hypothesis is a statement about the numerical value of a parameter. The null hypothesis, denoted H₀, represents the hypothesis that is assumed to be true unless the data provide convincing counter evidence. This usually represents the status quo or some claim about the parameter that the researcher states.

The alternative hypothesis, denoted H₁, represents the hypothesis that will be maintained only if the data provide convincing evidence for its truth.

The test statistic is a sample statistic, computed from information provided in the sample, that the researcher uses to decide between the null and alternative hypotheses.

A Type I error occurs if the researcher rejects the null hypothesis in favor of the alternative hypothesis when, in fact, the null hypothesis is true. The probability of Type I error is denoted as α.

The rejection region of a statistical test is the set of possible values of the test statistic for which the researcher will reject the null hypothesis in favor of the alternative.

A Type II error occurs if the researcher fails to reject the null hypothesis when, in fact, the null hypothesis is false. The probability of Type II error is denoted as β.

⁰Checkpoint
No: 76

A step-by-step description of hypothesis testing:

1. Establish H₀ and H₁

2. Determine test statistic (test score) and its statistical distribution

3. Set α

4. Experiment/collect data and calculate test statistic

5. If the value of the test statistic falls in the rejection region, conclude "Reject H₀"; otherwise conclude "Fail to reject H₀"

⁰Checkpoint
No: 77

⁰ Checkpoint
No: 78

7.1 EXERCISES ___________________________________________________________

1.

For each of the situations below, write the null and alternative hypotheses corresponding to the test in plain English; then write the null and alternative hypotheses using only mathematical symbols; then state what the symbols you used represent:

i. We would like to test if the income share held by the highest earning 20% is less than 46%.

ii. We would like to test if the average income of males is greater than the average income of females.

iii. It is claimed that, among the people who drink at least 2 liters of water every day, the percentage of those with a kidney problem is less than 5%. We doubt the truth of this statement and would like to test it.

iv. We would like to test if a coin is fair.

v. We would like to test if a coin is not a fair coin.

Solution: This exercise is left as self-study.

7.1.1 Hypothesis testing
Mean of a normal population
Case: Known population variance

Consider:

H₀ : μ ≤ μ₀
H₁ : μ > μ₀

and $\{x_i\}_{i=1}^{n}$, a random sample of n observations from a normal population $N(\mu, \sigma^2)$ with known σ².

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha}$
and we fail to reject H₀ otherwise.

7.2 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage rate of workers in Ankara is greater than 6500. The population variance is known to be 1000000. The researcher measures the mean wage rate of a sample of 64 workers as 7000. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : μ ≤ 6500
H₁ : μ > 6500

The relevant distribution is z.

Since:

$\frac{7000 - 6500}{1000/\sqrt{64}} = 4.00 > 1.65$

we reject H₀ at α = 0.05.

For:

H₀ : μ ≥ μ₀
H₁ : μ < μ₀

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha}$
and we fail to reject H₀ otherwise.

7.3 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage rate of workers in Ankara is less than 7500. The population variance is known to be 1000000. The researcher measures the mean wage rate of a sample of 64 workers as 7000. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : μ ≥ 7500
H₁ : μ < 7500

The relevant distribution is z.

Since:

$\frac{7000 - 7500}{1000/\sqrt{64}} = -4.00 < -1.65$

we reject H₀ at α = 0.05.

For:

H₀ : μ = μ₀
H₁ : μ≠μ₀

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2} \quad \text{or} \quad \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha/2}$
and we fail to reject H₀ otherwise.

7.4 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage rate of workers in Ankara is different from 7500. The population variance is known to be 1000000. The researcher measures the mean wage rate of a sample of 64 workers as 7000. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : μ = 7500
H₁ : μ≠7500

The relevant distribution is z.

Since:

$\frac{7000 - 7500}{1000/\sqrt{64}} = -4.00$ is outside of [−1.96, 1.96]

we reject H₀ at α = 0.05.
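The three rejection rules for the known-variance case can be collected in one small routine. The following is a minimal sketch using only the Python standard library; the function name `z_test_mean` and its `alternative` argument are illustrative choices, not notation from the text. Exact normal quantiles are used, so the critical value is about 1.645 rather than the rounded 1.65. For the two-sided case the sketch compares |z| with z_{1−α/2}, which is equivalent to the two-part rule above because the standard normal distribution is symmetric.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma2, n, alpha=0.05, alternative="greater"):
    """One-sample z test for a normal mean with known variance sigma2.

    alternative: "greater" (H1: mu > mu0), "less" (H1: mu < mu0)
    or "two-sided" (H1: mu != mu0). Returns (z, reject).
    """
    z = (xbar - mu0) / sqrt(sigma2 / n)
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "greater":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z < std_normal.inv_cdf(alpha)
    elif alternative == "two-sided":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject


# The three Ankara wage exercises (variance 1000000, n = 64, sample mean 7000)
upper = z_test_mean(7000, 6500, 1_000_000, 64, alternative="greater")
lower = z_test_mean(7000, 7500, 1_000_000, 64, alternative="less")
two_sided = z_test_mean(7000, 7500, 1_000_000, 64, alternative="two-sided")
```

Each call returns the test statistic together with the reject/fail-to-reject decision, so the results can be compared directly with the hand calculations.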

7.1.2 Hypothesis testing
Mean of a normal population
Case: Unknown population variance

Consider:

H₀ : μ ≤ μ₀
H₁ : μ > μ₀

and $\{x_i\}_{i=1}^{n}$, a random sample of n observations from a normal population $N(\mu, \sigma^2)$ where σ² is unknown.

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \mu_0}{s/\sqrt{n}} > t_{n-1,1-\alpha}$
and we fail to reject H₀ otherwise.

7.5 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage rate of workers in Ankara is greater than 6500, where the population variance is unknown. The researcher measures the mean wage rate of a sample of 64 workers as 7000 and the sample variance as 640000. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : μ ≤ 6500
H₁ : μ > 6500

The relevant distribution is t and the degrees of freedom is 63.

Since:

$\frac{7000 - 6500}{800/\sqrt{64}} = 5.00 > 1.669$

we reject H₀ at α = 0.05.

For:

H₀ : μ ≥ μ₀
H₁ : μ < μ₀

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \mu_0}{s/\sqrt{n}} < t_{n-1,\alpha}$
and we fail to reject H₀ otherwise.

7.6 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage rate of workers in Ankara is less than 7500, where the population variance is unknown. The researcher measures the mean wage rate of a sample of 64 workers as 7000 and the sample variance as 640000. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : μ ≥ 7500
H₁ : μ < 7500

The relevant distribution is t and the degrees of freedom is 63.

Since:

$\frac{7000 - 7500}{800/\sqrt{64}} = -5.00 < -1.669$

we reject H₀ at α = 0.05.

For:

H₀ : μ = μ₀
H₁ : μ≠μ₀

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \mu_0}{s/\sqrt{n}} < t_{n-1,\alpha/2} \quad \text{or} \quad \frac{\bar{x} - \mu_0}{s/\sqrt{n}} > t_{n-1,1-\alpha/2}$
and we fail to reject H₀ otherwise.

7.7 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage rate of workers in Ankara is different from 7500, where the population variance is unknown. The researcher measures the mean wage rate of a sample of 64 workers as 7000 and the sample variance as 640000. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : μ = 7500
H₁ : μ≠7500

The relevant distribution is t and the degrees of freedom is 63.

Since:

$\frac{7000 - 7500}{800/\sqrt{64}} = -5.00$ is outside of [−1.998, 1.998]

we reject H₀ at α = 0.05.
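When the variance is unknown, only the standard error and the reference distribution change. The sketch below, under the assumption that critical values are read from a t table, recomputes the three exercises of this subsection; the function name `t_statistic` and the constant names are illustrative, and the critical values 1.669 and 1.998 are simply copied from the solutions above, since the Python standard library has no t quantile function.

```python
from math import sqrt


def t_statistic(xbar, mu0, s2, n):
    """t statistic for a normal mean when the variance is estimated by s2."""
    return (xbar - mu0) / sqrt(s2 / n)


# Critical values copied from the exercises above (t table, 63 degrees of freedom)
T_63_095 = 1.669   # t_{63, 0.95}
T_63_0975 = 1.998  # t_{63, 0.975}

t_upper = t_statistic(7000, 6500, 640_000, 64)  # H1: mu > 6500
t_lower = t_statistic(7000, 7500, 640_000, 64)  # H1: mu < 7500 or mu != 7500

reject_upper = t_upper > T_63_095               # upper-tail test
reject_lower = t_lower < -T_63_095              # lower-tail test, t_{63, 0.05} = -1.669
reject_two_sided = abs(t_lower) > T_63_0975     # two-sided test
```

The lower-tail and two-sided rules use the symmetry of the t distribution, t_{n−1,α} = −t_{n−1,1−α}.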

⁰Checkpoint
No: 79

7.1.3 Hypothesis testing
Population proportion

Consider:

H₀ : P ≤ P₀
H₁ : P > P₀

and $\{x_i\}_{i=1}^{n}$, a random sample of n observations from a Bernoulli(P) population.

At the statistical significance level α:

H₀ is rejected if

$\frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} > z_{1-\alpha}$
and we fail to reject H₀ otherwise.

7.8 EXERCISES ___________________________________________________________

1.

A political candidate wonders if her nationwide support rate exceeds 50%. Among a sample of 64 people, we know 35 support the political candidate. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : P ≤ 0.50
H₁ : P > 0.50

The relevant distribution is z.

Since:

$\frac{0.547 - 0.500}{\sqrt{\frac{0.500(1-0.500)}{64}}} = 0.750$ is not greater than 1.65

we fail to reject H₀ at α = 0.05.

For:

H₀ : P ≥ P₀
H₁ : P < P₀

At the statistical significance level α:

H₀ is rejected if

$\frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} < z_{\alpha}$
and we fail to reject H₀ otherwise.

7.9 EXERCISES ___________________________________________________________

1.

A political candidate wonders if her nationwide support rate falls short of 50%. Among a sample of 64 people, we know 30 support the political candidate. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : P ≥ 0.50
H₁ : P < 0.50

The relevant distribution is z.

Since:

$\frac{0.469 - 0.500}{\sqrt{\frac{0.500(1-0.500)}{64}}} = -0.500$ is not less than −1.65

we fail to reject H₀ at α = 0.05.

For:

H₀ : P = P₀
H₁ : P≠P₀

At the statistical significance level α:

H₀ is rejected if

$\frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} < z_{\alpha/2} \quad \text{or} \quad \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} > z_{1-\alpha/2}$
and we fail to reject H₀ otherwise.

7.10 EXERCISES ___________________________________________________________

1.

A political candidate wonders if her nationwide support rate is different than 50%. Among a sample of 64 people, we know 35 support the political candidate. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : P = 0.50
H₁ : P≠0.50

The relevant distribution is z.

Since:

$\frac{0.547 - 0.500}{\sqrt{\frac{0.500(1-0.500)}{64}}} = 0.750$ is not outside [−1.96, 1.96]

we fail to reject H₀ at α = 0.05.
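The proportion tests differ from the mean tests only in how the standard error is built: it uses the hypothesized P₀, not the sample proportion. A minimal sketch with the Python standard library follows; the function name `z_statistic_proportion` is an illustrative choice. It evaluates the statistic for the two sample counts that appear in the exercises above (35 and 30 supporters out of 64).

```python
from math import sqrt
from statistics import NormalDist


def z_statistic_proportion(successes, n, p0):
    """z statistic for a hypothesis about a population proportion P0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z_crit = NormalDist().inv_cdf(0.95)  # about 1.645; the text rounds to 1.65

z_35 = z_statistic_proportion(35, 64, 0.50)  # 35 supporters out of 64
z_30 = z_statistic_proportion(30, 64, 0.50)  # 30 supporters out of 64

reject_exceeds = z_35 > z_crit       # H1: P > 0.50
reject_falls_short = z_30 < -z_crit  # H1: P < 0.50
```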

⁰Checkpoint
No: 80

7.1.4 Hypothesis testing
Variance of a normal population

Consider:

H₀ : σ² ≤ σ₀²
H₁ : σ² > σ₀²

and $\{x_i\}_{i=1}^{n}$, a random sample of n observations from a normal population $N(\mu, \sigma^2)$.

At the statistical significance level α:

H₀ is rejected if

$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{n-1,1-\alpha}$
and we fail to reject H₀ otherwise.

7.11 EXERCISES ___________________________________________________________

1.

A process engineer is concerned with the variation of temperature in an industrial furnace and wonders if its variance exceeds 1500. She collects a random sample of temperatures (in °C) as:

975 1075 1050 900 1000 950 1025 1050 975

Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : σ² ≤ 1500
H₁ : σ² > 1500

The relevant distribution is χ² and the degrees of freedom is 8.
Since:

$\frac{(9-1)\,3125}{1500} = 16.667 > 15.507$

we reject H₀ at α = 0.05.

For:

H₀ : σ² ≥ σ₀²
H₁ : σ² < σ₀²

At the statistical significance level α:

H₀ is rejected if

$\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{n-1,\alpha}$
and we fail to reject H₀ otherwise.

7.12 EXERCISES ___________________________________________________________

1.

A process engineer is concerned with the variation of temperature in an industrial furnace and wonders if its variance is less than 2500. She collects a random sample of temperatures (in °C) as:

975 1075 1050 900 1000 950 1025 1050 975

Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : σ² ≥ 2500
H₁ : σ² < 2500

The relevant distribution is χ² and the degrees of freedom is 8.

Since:

$\frac{(9-1)\,3125}{2500} = 10.000$ is not less than 2.733

we fail to reject H₀ at α = 0.05.

For:

H₀ : σ² = σ₀²
H₁ : σ²≠σ₀²

At the statistical significance level α:

H₀ is rejected if

$\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{n-1,\alpha/2} \quad \text{or} \quad \frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{n-1,1-\alpha/2}$
and we fail to reject H₀ otherwise.

7.13 EXERCISES ___________________________________________________________

1.

A process engineer is concerned with the variation of temperature in an industrial furnace and wonders if its variance is different from 2000. She collects a random sample of temperatures (in °C) as:

975 1075 1050 900 1000 950 1025 1050 975

Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : σ² = 2000
H₁ : σ²≠2000

The relevant distribution is χ² and the degrees of freedom is 8.

Since:

$\frac{(9-1)\,3125}{2000} = 12.500$ is not outside [2.180, 17.535]

we fail to reject H₀ at α = 0.05.
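The sample variance 3125 used in the three furnace exercises is not given in the problem statements; it has to be computed from the nine readings. The sketch below, assuming the Python standard library and χ² critical values copied from the solutions above, reproduces that step and the three test statistics. The function name `chi2_statistic` is an illustrative choice.

```python
from statistics import variance

temps = [975, 1075, 1050, 900, 1000, 950, 1025, 1050, 975]
n = len(temps)
s2 = variance(temps)  # sample variance, with n - 1 in the denominator


def chi2_statistic(s2, n, sigma0_sq):
    """Test statistic (n - 1) s^2 / sigma_0^2 for a normal population variance."""
    return (n - 1) * s2 / sigma0_sq


stat_1500 = chi2_statistic(s2, n, 1500)
stat_2500 = chi2_statistic(s2, n, 2500)
stat_2000 = chi2_statistic(s2, n, 2000)

# Critical values copied from the text (chi-square table, 8 degrees of freedom)
reject_gt_1500 = stat_1500 > 15.507                       # H1: sigma^2 > 1500
reject_lt_2500 = stat_2500 < 2.733                        # H1: sigma^2 < 2500
reject_ne_2000 = stat_2000 < 2.180 or stat_2000 > 17.535  # H1: sigma^2 != 2000
```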

7.14 EXERCISES ___________________________________________________________

1.

A manufacturer of automobile batteries claims that at least 80% of the batteries that it produces will last 36 months. A consumers’ advocate group wants to evaluate this longevity claim and selects a random sample of 28 batteries to test. The following data indicate the length of time (in months) that each of these batteries lasted (i.e., performed properly before failure):

42.3, 39.6, 25.0, 56.2, 37.2, 47.4, 57.5, 39.3, 39.2, 47.0, 47.4, 39.7, 57.3, 51.8, 31.6, 45.1, 40.8, 42.4, 38.9, 42.9, 34.1, 49.0, 41.5, 60.1, 34.6, 50.4, 30.7, 44.1

Now, we would like to test, at a significance level of 0.05, whether there is significant evidence that less than 80% of the batteries will last at least 36 months. Conduct and conclude the test.

Solution: The critical element of the solution is that what we are testing here is not the mean product life but the proportion of items that last at least 36 months. So, begin by counting the product lifetimes of at least 36 months (among the given 28 measurements), calculate the sample proportion, and proceed straightforwardly with the rest. This exercise is left as self-study.
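The counting step described in the hint can be sketched as follows. This is only one possible way to organize the calculation, using the Python standard library and the lower-tail proportion test of Section 7.1.3; the variable names are illustrative, and the final comparison is left for the reader to interpret.

```python
from math import sqrt
from statistics import NormalDist

lifetimes = [42.3, 39.6, 25.0, 56.2, 37.2, 47.4, 57.5, 39.3, 39.2, 47.0,
             47.4, 39.7, 57.3, 51.8, 31.6, 45.1, 40.8, 42.4, 38.9, 42.9,
             34.1, 49.0, 41.5, 60.1, 34.6, 50.4, 30.7, 44.1]

n = len(lifetimes)
# Count the batteries that lasted at least 36 months
lasting = sum(1 for months in lifetimes if months >= 36)
p_hat = lasting / n

p0 = 0.80                                    # H0: P >= 0.80, H1: P < 0.80
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
z_alpha = NormalDist().inv_cdf(0.05)         # lower-tail critical value
reject = z < z_alpha
```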

2.

Right after the polling stations are closed at 17:00, a political candidate receives the information that out of the 50 people interviewed her approval "count" is 24. As a statistics lover, she immediately tests the null hypothesis that her population approval rate is less than or equal to 0.50 against its respective alternative, at the 5% level of statistical significance. What is the conclusion of this test? Suppose that in every consecutive 15 minutes, the number of people interviewed increases by 5 and the approval count increases by 4. Find the earliest time, HH:MM, that she can declare her victory based on her tests of hypotheses. Note that a formal statistical/algebraic solution is expected with proper terminology and notation.

Solution: This exercise is left as self-study.

7.2 Hypothesis testing: Two populations

In this part, you are more than welcome to transfer your earlier, indeed recently acquired, knowledge to understand things better. Except for one case or two, the material remains fairly intact compared to the ones in confidence intervals for two populations.

⁰Checkpoint
No: 81

7.2.1 Hypothesis testing
Difference between two normal population means
Case: Dependent (matched) samples

Consider:

H₀ : μ_x −μ_y ≤ 0
H₁ : μ_x −μ_y > 0

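Let $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$ be two matched samples, with differences $d_i = x_i - y_i$, where $\bar{d}$ is the sample mean and $s_d$ is the sample standard deviation of the differences.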

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{d}}{s_d/\sqrt{n}} > t_{n-1,1-\alpha}$
and we fail to reject H₀ otherwise.

For:

H₀ : μ_x −μ_y ≥ 0
H₁ : μ_x −μ_y < 0

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{d}}{s_d/\sqrt{n}} < t_{n-1,\alpha}$
and we fail to reject H₀ otherwise.

For:

H₀ : μ_x −μ_y = 0
H₁ : μ_x −μ_y≠0

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{d}}{s_d/\sqrt{n}} < t_{n-1,\alpha/2} \quad \text{or} \quad \frac{\bar{d}}{s_d/\sqrt{n}} > t_{n-1,1-\alpha/2}$
and we fail to reject H₀ otherwise.

7.15 EXERCISES ___________________________________________________________

1.

A company is about to release a new drug to assist weight loss, and we are in charge of assessing how effective the drug is. We pick a random sample of 8 people with the following pre-drug body weights:

90,95,105,95,110,85,100,90

After using the drug for the designated test duration, the post-drug body weights are measured as:

85,80,110,90,110,80,95,90

Conduct and conclude a hypothesis test at the significance level of 5% to assess if ’pre-drug minus post-drug difference of mean body weights is positive’.

Solution:

H₀ : μ_x −μ_y ≤ 0
H₁ : μ_x −μ_y > 0

The difference series (pre-drug minus post-drug) is:

+5, +15, −5, +5, 0, +5, +5, 0

The relevant distribution is t and the degrees of freedom is 7.

Since:

$\frac{3.75}{5.825/\sqrt{8}} = 1.821$ is not greater than 1.895

we fail to reject H₀ at α = 0.05.
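The values 3.75 and 5.825 in the solution are the mean and the standard deviation of the difference series. A minimal sketch with the Python standard library shows how they arise from the raw weights; the variable names are illustrative, and the critical value 1.895 is copied from the solution above.

```python
from math import sqrt
from statistics import mean, stdev

pre = [90, 95, 105, 95, 110, 85, 100, 90]
post = [85, 80, 110, 90, 110, 80, 95, 90]

# Matched samples: work with the pairwise differences only
d = [x - y for x, y in zip(pre, post)]
n = len(d)
d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample standard deviation of the differences (n - 1 denominator)

t = d_bar / (s_d / sqrt(n))
T_7_095 = 1.895   # t_{7, 0.95}, copied from the text
reject = t > T_7_095
```

Note that the two original samples are never used separately: once the differences are formed, the problem reduces to the one-sample t test of Section 7.1.2.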

7.2.2 Hypothesis testing
Difference between two normal population means
Case: Independent samples & Known population variances

Consider:

H₀ : μ_x −μ_y ≤ 0
H₁ : μ_x −μ_y > 0

Let,

$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$
$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$

where σ_x² and σ_y² are known.

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}} > z_{1-\alpha}$
and we fail to reject H₀ otherwise.

For:

H₀ : μ_x −μ_y ≥ 0
H₁ : μ_x −μ_y < 0

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}} < z_{\alpha}$
and we fail to reject H₀ otherwise.

For:

H₀ : μ_x −μ_y = 0
H₁ : μ_x −μ_y≠0

At the statistical significance level α:

H₀ is rejected if

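$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}} < z_{\alpha/2} \quad \text{or} \quad \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}} > z_{1-\alpha/2}$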
and we fail to reject H₀ otherwise.

7.16 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage of workers in Ankara falls short of that in Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variance of wages in Ankara and Istanbul are known to be 640000 and 810000, respectively. Conduct and conclude a hypothesis test at the significance level of 5% to assess if mean wage rate in Ankara is less than the mean wage rate in Istanbul.

Solution:

H₀ : μ_x −μ_y ≥ 0
H₁ : μ_x −μ_y < 0

The relevant distribution is z.

Since:

$\frac{6000 - 7000}{\sqrt{\frac{640000}{49} + \frac{810000}{81}}} = -6.585 < -1.65$

we reject H₀ at α = 0.05.
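The same calculation can be written as a short routine. The sketch below uses the Python standard library; the function name `two_sample_z` is an illustrative choice, and the exact quantile (about −1.645) replaces the rounded −1.65.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, var_x, var_y, n_x, n_y):
    """z statistic for mu_x - mu_y = 0 with known population variances."""
    return (xbar - ybar) / sqrt(var_x / n_x + var_y / n_y)


# Ankara (x) versus Istanbul (y)
z = two_sample_z(6000, 7000, 640_000, 810_000, 49, 81)
z_alpha = NormalDist().inv_cdf(0.05)  # lower-tail critical value
reject = z < z_alpha                  # H1: mu_x - mu_y < 0
```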

7.2.3 Hypothesis testing
Difference between two normal population means
Case: Independent samples & Unknown yet equal population variances

Consider:

H₀ : μ_x −μ_y ≤ 0
H₁ : μ_x −μ_y > 0

Let,

$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$
$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$

where σ_x² and σ_y² are unknown but assumed to be equal.

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}} > t_{n_x+n_y-2,1-\alpha}$
and we fail to reject H₀ otherwise.

In our formulation:

$s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$

and,

$s_x^2 = \frac{\sum_{i=1}^{n_x}(x_i-\bar{x})^2}{n_x-1}$

$s_y^2 = \frac{\sum_{i=1}^{n_y}(y_i-\bar{y})^2}{n_y-1}$

For:

H₀ : μ_x −μ_y ≥ 0
H₁ : μ_x −μ_y < 0

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}} < t_{n_x+n_y-2,\alpha}$
and we fail to reject H₀ otherwise.

For:

H₀ : μ_x −μ_y = 0
H₁ : μ_x −μ_y≠0

At the statistical significance level α:

H₀ is rejected if

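$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}} < t_{n_x+n_y-2,\alpha/2} \quad \text{or} \quad \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}} > t_{n_x+n_y-2,1-\alpha/2}$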
and we fail to reject H₀ otherwise.

7.17 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage of workers in Ankara falls short of that in Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are unknown but they are assumed to be equal. Sample variance of wages in Ankara and Istanbul are calculated as 490000 and 640000, respectively. Conduct and conclude a hypothesis test at the significance level of 5% to assess if mean wage rate in Ankara is less than the mean wage rate in Istanbul.

Solution:

H₀ : μ_x −μ_y ≥ 0
H₁ : μ_x −μ_y < 0

$s_p^2 = \frac{(49-1)\,490000 + (81-1)\,640000}{49+81-2} = 583750$

The relevant distribution is t and the degrees of freedom is 128.

Since:

$\frac{6000 - 7000}{\sqrt{\frac{583750}{49} + \frac{583750}{81}}} = -7.232 < -1.657$

we reject H₀ at α = 0.05.
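The pooled variance is the only new ingredient in this case. A minimal sketch, assuming the critical value is read from a t table (the value −1.657 is copied from the solution above), could look as follows; the function names are illustrative.

```python
from math import sqrt


def pooled_variance(s2_x, s2_y, n_x, n_y):
    """Weighted average of the two sample variances, weights n - 1."""
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)


def pooled_t(xbar, ybar, s2_x, s2_y, n_x, n_y):
    """t statistic for mu_x - mu_y = 0 under equal (unknown) variances."""
    sp2 = pooled_variance(s2_x, s2_y, n_x, n_y)
    return (xbar - ybar) / sqrt(sp2 / n_x + sp2 / n_y)


sp2 = pooled_variance(490_000, 640_000, 49, 81)
t = pooled_t(6000, 7000, 490_000, 640_000, 49, 81)

T_128_005 = -1.657     # t_{128, 0.05}, copied from the text
reject = t < T_128_005  # H1: mu_x - mu_y < 0
```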

7.2.4 Hypothesis testing
Difference between two normal population means
Case: Independent samples & Unknown and unequal population variances

Consider:

H₀ : μ_x −μ_y ≤ 0
H₁ : μ_x −μ_y > 0

Let,

$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$
$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$

where σ_x² and σ_y² are unknown and assumed not to be equal.

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}} > t_{\nu,1-\alpha}$
and we fail to reject H₀ otherwise.

In our formulation:

$\nu = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{\left(s_x^2/n_x\right)^2}{n_x-1} + \frac{\left(s_y^2/n_y\right)^2}{n_y-1}}$

Notice that, if n_x = n_y = n:

$\nu = \left(1 + \frac{2}{\frac{s_x^2}{s_y^2} + \frac{s_y^2}{s_x^2}}\right)(n-1)$
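The equal-sample-size form follows from the general expression for ν by direct substitution. A short derivation, using only the symbols already defined, may make the step easier to verify:

```latex
\begin{align*}
\nu &= \frac{\left(\frac{s_x^2}{n} + \frac{s_y^2}{n}\right)^2}
            {\frac{(s_x^2/n)^2}{n-1} + \frac{(s_y^2/n)^2}{n-1}}
     = (n-1)\,\frac{(s_x^2 + s_y^2)^2}{s_x^4 + s_y^4} \\
    &= (n-1)\left(1 + \frac{2 s_x^2 s_y^2}{s_x^4 + s_y^4}\right)
     = \left(1 + \frac{2}{\frac{s_x^2}{s_y^2} + \frac{s_y^2}{s_x^2}}\right)(n-1)
\end{align*}
```

Since the inner fraction lies between 0 and 1, ν lies between n − 1 and 2(n − 1) in this case, and it reaches 2(n − 1) when the two sample variances are equal.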

For:

H₀ : μ_x −μ_y ≥ 0
H₁ : μ_x −μ_y < 0

At the statistical significance level α:

H₀ is rejected if

$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}} < t_{\nu,\alpha}$
and we fail to reject H₀ otherwise.

For:

H₀ : μ_x −μ_y = 0
H₁ : μ_x −μ_y≠0

At the statistical significance level α:

H₀ is rejected if

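$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}} < t_{\nu,\alpha/2} \quad \text{or} \quad \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}} > t_{\nu,1-\alpha/2}$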
and we fail to reject H₀ otherwise.

7.18 EXERCISES ___________________________________________________________

1.

A researcher wonders if the mean wage of workers in Ankara falls short of that in Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are unknown and they are assumed to be unequal. Sample variance of wages in Ankara and Istanbul are calculated as 490000 and 640000, respectively. Conduct and conclude a hypothesis test at the significance level of 5% to assess if mean wage rate in Ankara is less than the mean wage rate in Istanbul.

Solution:

H₀ : μ_x −μ_y ≥ 0
H₁ : μ_x −μ_y < 0

The relevant distribution is t and the degrees of freedom is ν:

$\nu = \frac{\left(\frac{490000}{49} + \frac{640000}{81}\right)^2}{\frac{(490000/49)^2}{49-1} + \frac{(640000/81)^2}{81-1}} \rightarrow 112$

Since:

$\frac{6000 - 7000}{\sqrt{\frac{490000}{49} + \frac{640000}{81}}} = -7.474 < -1.659$

we reject H₀ at α = 0.05.
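The degrees of freedom ν are tedious to compute by hand, so a small helper is convenient. The sketch below uses illustrative function names (`welch_df`, `welch_t`) and copies the critical value −1.659 from the solution above; it also checks the equal-sample-size special case with made-up variances.

```python
from math import sqrt


def welch_df(s2_x, s2_y, n_x, n_y):
    """Approximate degrees of freedom nu for unequal population variances."""
    a = s2_x / n_x
    b = s2_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))


def welch_t(xbar, ybar, s2_x, s2_y, n_x, n_y):
    """t statistic for mu_x - mu_y = 0 without pooling the variances."""
    return (xbar - ybar) / sqrt(s2_x / n_x + s2_y / n_y)


nu = welch_df(490_000, 640_000, 49, 81)
t = welch_t(6000, 7000, 490_000, 640_000, 49, 81)

T_112_005 = -1.659      # t_{112, 0.05}, copied from the text
reject = t < T_112_005  # H1: mu_x - mu_y < 0

# Hypothetical check of the special case: equal sizes and equal sample
# variances give nu = 2 (n - 1)
nu_equal = welch_df(4.0, 4.0, 10, 10)
```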

⁰Checkpoint
No: 82

7.2.5 Hypothesis testing
Difference between two population proportions

Consider:

H₀ : P_x −P_y ≤ 0
H₁ : P_x −P_y > 0

Let

$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim \text{Bernoulli}(P_x)$
$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim \text{Bernoulli}(P_y)$

At the statistical significance level α:

H₀ is rejected if

$\frac{\hat{p}_x - \hat{p}_y}{\sqrt{\frac{\hat{p}_0(1-\hat{p}_0)}{n_x} + \frac{\hat{p}_0(1-\hat{p}_0)}{n_y}}} > z_{1-\alpha}$
and we fail to reject H₀ otherwise.

In our formulation:

$\hat{p}_0 = \frac{n_x \hat{p}_x + n_y \hat{p}_y}{n_x + n_y}$

For:

H₀ : P_x −P_y ≥ 0
H₁ : P_x −P_y < 0

At the statistical significance level α:

H₀ is rejected if

$\frac{\hat{p}_x - \hat{p}_y}{\sqrt{\frac{\hat{p}_0(1-\hat{p}_0)}{n_x} + \frac{\hat{p}_0(1-\hat{p}_0)}{n_y}}} < z_{\alpha}$
and we fail to reject H₀ otherwise.

For:

H₀ : P_x −P_y = 0
H₁ : P_x −P_y≠0

At the statistical significance level α:

H₀ is rejected if

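$\frac{\hat{p}_x - \hat{p}_y}{\sqrt{\frac{\hat{p}_0(1-\hat{p}_0)}{n_x} + \frac{\hat{p}_0(1-\hat{p}_0)}{n_y}}} < z_{\alpha/2} \quad \text{or} \quad \frac{\hat{p}_x - \hat{p}_y}{\sqrt{\frac{\hat{p}_0(1-\hat{p}_0)}{n_x} + \frac{\hat{p}_0(1-\hat{p}_0)}{n_y}}} > z_{1-\alpha/2}$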
and we fail to reject H₀ otherwise.

7.19 EXERCISES ___________________________________________________________

1.

A political candidate wonders if her support rate in Ankara exceeds that in Istanbul. We know that among 64 people from Ankara 35 support the candidate and among 81 people from Istanbul 45 support the candidate. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : P_x −P_y ≤ 0
H₁ : P_x −P_y > 0

$\hat{p}_0 = \frac{64 \cdot 0.547 + 81 \cdot 0.556}{64 + 81} \rightarrow 0.552$

The relevant distribution is z.

Since:

$\frac{0.547 - 0.556}{\sqrt{\frac{0.552(1-0.552)}{64} + \frac{0.552(1-0.552)}{81}}}$ is not greater than 1.65

we fail to reject H₀ at α = 0.05.
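Because the solution compares the statistic with 1.65 without evaluating it, a short computation may be helpful. The sketch below uses the Python standard library; the function name `two_proportion_z` is illustrative. It works with the unrounded sample proportions, so intermediate values differ slightly from the three-decimal figures shown above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_x, n_x, successes_y, n_y):
    """z statistic for P_x - P_y = 0 using the pooled estimate p0."""
    p_x = successes_x / n_x
    p_y = successes_y / n_y
    p0 = (n_x * p_x + n_y * p_y) / (n_x + n_y)  # pooled proportion
    se = sqrt(p0 * (1 - p0) / n_x + p0 * (1 - p0) / n_y)
    return (p_x - p_y) / se, p0


# Ankara (x): 35 of 64; Istanbul (y): 45 of 81
z, p0 = two_proportion_z(35, 64, 45, 81)
reject = z > NormalDist().inv_cdf(0.95)  # H1: P_x - P_y > 0
```

The sample proportion in Ankara is smaller than that in Istanbul, so the statistic is negative and cannot exceed an upper-tail critical value.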

⁰ Checkpoint
No: 83

7.2.6 Hypothesis testing
Equality of variances of two normal populations

Consider:

H₀ : σ_x² ≤ σ_y²
H₁ : σ_x² > σ_y²

Let

$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$
$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$

At the statistical significance level α:

H₀ is rejected if

$\frac{s_x^2}{s_y^2} > F_{n_x-1,n_y-1,1-\alpha}$
and we fail to reject H₀ otherwise.

Go over the description of F-distribution in Chapter 10.

In our formulation:

$s_x^2 = \frac{\sum_{i=1}^{n_x}(x_i-\bar{x})^2}{n_x-1}$

$s_y^2 = \frac{\sum_{i=1}^{n_y}(y_i-\bar{y})^2}{n_y-1}$

For:

H₀ : σ_x² ≥ σ_y²
H₁ : σ_x² < σ_y²

At the statistical significance level α:

H₀ is rejected if

$\frac{s_x^2}{s_y^2} < F_{n_x-1,n_y-1,\alpha}$
and we fail to reject H₀ otherwise.

For:

H₀ : σ_x² = σ_y²
H₁ : σ_x²≠σ_y²

At the statistical significance level α:

H₀ is rejected if

$\frac{s_x^2}{s_y^2} < F_{n_x-1,n_y-1,\alpha/2} \quad \text{or} \quad \frac{s_x^2}{s_y^2} > F_{n_x-1,n_y-1,1-\alpha/2}$
and we fail to reject H₀ otherwise.

7.20 EXERCISES ___________________________________________________________

1.

A process engineer wonders if the temperature variation in Furnace X exceeds that in Furnace Y. Sample variance of temperatures in Furnace X is 1600 on the basis of 10 temperature readings and sample variance of temperatures in Furnace Y is 1100 on the basis of 8 temperature readings. Conduct and conclude the relevant hypothesis test at the significance level of 5%.

Solution:

H₀ : σ_x² ≤ σ_y²
H₁ : σ_x² > σ_y²

The relevant distribution is F with a numerator degrees of freedom of 9 and a denominator degrees of freedom of 7.

Since:

$\frac{1600}{1100} = 1.455$ is not greater than 3.677

we fail to reject H₀ at α = 0.05.

⁰Go to Teaching page & experiment with F_{(ν₁,ν₂)} using the file named ‘Statistical distributions.xlsx’.

7.21 EXERCISES ___________________________________________________________

1.

Consider the hypotheses regarding two normal populations X and Y:

H₀ : σ_x² ≤ σ_y²
H₁ : σ_x² > σ_y²

Sample values for X and Y are given as follows:


X:	2	8	5	4	3	7	9	6

Y:	26	24	23	25	22	27

Conduct and conclude the test at α = 0.05. Clearly state the test statistic, the distribution of test statistic and critical value(s). Find the necessary critical values from the end of your textbook or from Internet sources.

Solution: This exercise is left as self-study.

2.

Consider two populations X and Y for which a researcher has estimated the following confidence intervals given that x̄ = 150 and ȳ = 250.

P(μ_x ∈ [100, ∞)) = 0.90

P(μ_y ∈ (−∞, 400]) = 0.95

In her research report, the researcher noted that she used an N(0,1) distribution in her calculations. Based on these, calculate a 90% confidence interval for μ_x −μ_y.

Solution: This exercise requires a small portion of creative thinking. As the researcher has used the standard normal distribution in her calculations, this means σ_x² and σ_y² are both known (or given). As the given confidence intervals for μ_x and μ_y are one-sided, the critical values are −1.29 and 1.65, respectively. So,

$\frac{150 - 100}{1.29} = \frac{\sigma_x}{\sqrt{n_x}} \qquad \frac{400 - 250}{1.65} = \frac{\sigma_y}{\sqrt{n_y}}$

Once these are known, estimation of a 90% C.I. for μ_x −μ_y is straightforward.

⁰Checkpoint
No: 84

7.3 p-value

The p-value is defined as the tail probability of a test statistic: the probability, computed under H₀, of observing a value at least as extreme as the calculated one. While conducting hypothesis tests manually, i.e. with a pencil on paper, use of a p-value is not essential, since calculation of a p-value already requires a calculated test statistic. In some cases, we may need to do so, though. A p-value is especially practical when we do our analysis on a computer using dedicated software. The rule is simple:

H₀ is rejected if p−value < α

A course-related/pedagogical warning about the p-value: unless otherwise stated, my expectation (from students) is to see the proper use of ’test score vs critical value’ comparisons in concluding hypothesis tests rather than ’p−value vs α’ comparisons. In your future/professional practice you will have full freedom to enjoy p−values.
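For a test statistic with a standard normal distribution, the tail probability can be evaluated directly. The following is a minimal sketch with the Python standard library; the function name `p_value_z` is an illustrative choice, and the two-sided p-value is taken as twice the smaller tail, which is a common convention for symmetric distributions. The two statistics used as examples, 0.750 and −4.00, come from the exercises of Section 7.1.

```python
from statistics import NormalDist


def p_value_z(z, alternative="greater"):
    """p-value of a z statistic for an upper-tail, lower-tail or two-sided test."""
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


alpha = 0.05
p_support = p_value_z(0.750, "greater")  # support-rate exercise, H1: P > 0.50
p_wage = p_value_z(-4.00, "two-sided")   # wage exercise, H1: mu != 7500

reject_support = p_support < alpha
reject_wage = p_wage < alpha
```

Both decisions agree with the ’test score vs critical value’ comparisons made earlier, as they must: the two rules are equivalent ways of stating the same rejection region.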

⁰Checkpoint
No: 85

[Figure: density f(t) of the test statistic t, with the p-value p shown as the tail area beyond t_c]

7.4 Type I and Type II errors and the Power of a hypothesis test

Although Power is not a difficult concept to grasp intuitively, its mathematics is often confusing to students. Patiently go over the following:

As you may recall, a Type I error occurs if the researcher rejects the null hypothesis in favor of the alternative hypothesis when, in fact, the null hypothesis is true. The probability of Type I error is denoted as α. A Type II error, on the other hand, occurs if the researcher fails to reject the null hypothesis when, in fact, the null hypothesis is false. The probability of Type II error is denoted as β. So,

P(Reject H₀ | H₀ true) = α
P(Fail to reject H₀ | H₀ true) = 1 − α
P(Fail to reject H₀ | H₀ false) = β

Power = 1 − β = 1 − P(Fail to reject H₀ | H₀ false) = P(Reject H₀ | H₀ false)

So, in plain language, Power is the ability of a test to reject a false null hypothesis.

(As a caution, note that there is no requirement of any sort like α + β = 1)

[Figure: densities f(t | H₀) and f(t | H₁) with the critical value t_c, showing α, β, the fail-to-reject region and the critical (rejection) region]

As you may pick infinitely many alternative values for your parameter of interest, there is a multiplicity of values for Power. So Power is indeed a function. We often write it as a function of the difference between the alternative and hypothesized parameter values.

Ceteris Paribus in each case:

∙ If μ₁ − μ₀ increases, Power increases

∙ If α decreases, Power decreases

∙ If σ² increases, Power decreases

∙ If n increases, Power increases

∙ Fun fact: Power is 0.50 when the alternative parameter value coincides with the critical value

As a closing remark, drawing graphs (rather than calculating) may be very useful to understand the Type II error as well as Power.
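Alongside the graphs, the ceteris paribus statements above can be checked numerically in the simplest setting, the upper-tail z test of Section 7.1.1 (H₀ : μ ≤ μ₀ against H₁ : μ > μ₀ with known σ). In that setting, Power(μ₁) = P(Z > z_{1−α} − (μ₁ − μ₀)/(σ/√n)). The sketch below assumes this setting only; the function name `power_upper_z` and the alternative value 6700 are illustrative choices, while μ₀ = 6500, σ = 1000 and n = 64 are borrowed from the Ankara wage exercise.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_z(mu1, mu0, sigma, n, alpha=0.05):
    """Power of the upper-tail z test when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / sqrt(n))  # standardized distance from H0
    return 1 - std_normal.cdf(z_crit - shift)


base = power_upper_z(6700, 6500, 1000, 64)              # reference case

farther = power_upper_z(6800, 6500, 1000, 64)           # larger mu1 - mu0
stricter = power_upper_z(6700, 6500, 1000, 64, 0.01)    # smaller alpha
noisier = power_upper_z(6700, 6500, 1200, 64)           # larger sigma
bigger_n = power_upper_z(6700, 6500, 1000, 100)         # larger n

# Critical value expressed in the units of the sample mean
xbar_crit = 6500 + NormalDist().inv_cdf(0.95) * 1000 / sqrt(64)
at_critical = power_upper_z(xbar_crit, 6500, 1000, 64)  # the "fun fact"
```

Comparing `farther`, `stricter`, `noisier` and `bigger_n` with `base` reproduces the four directions listed above, and `at_critical` illustrates the fun fact for this particular test.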

7.22 EXERCISES ___________________________________________________________

1.

In a two-sided (two-tailed) hypothesis test, the test statistic was calculated as 0.18. We know that the distribution of the test statistic (call this A distribution) has the triangular shaped union of the line segments [AB] and [BC], given A(0.00, 0.00), B(2.00, 0.50) and C(4.00, 0.00). Conclude the test at α = 0.005 by calculating and using p-values only. In your answer, clearly define what p-value is.

Solution: This exercise is left as self-study.

2.

Referring to H₀ : μ = 0 against H₁ : μ > 0 and using proper drawings of the relevant distributions, demonstrate that

i. Power of a hypothesis test gets higher as the sample size gets larger

ii. Power of a hypothesis test gets higher as population variance gets smaller

Make sure your drawings are clear and well-explained.

Solution: This exercise is left as self-study.

3.

Consider a large box which contains many white (W) and black (B) balls. We have forgotten the percentage of white balls in the box, but remember that it is either 1/3 or 2/3. Even though we do not know the percentage of white balls in the box we strongly believe that it is (but still believe that it might be ). Hence we decide to test if the percentage of white balls in the box is . For this purpose we draw 20 balls at random with replacement and note their color.

i. What are the hypotheses of this test?

ii. If we decide to use the number of white balls as our test statistic, what is the distribution of the test statistic?

iii. What is the decision rule?

iv. If the sample you observed was:

W W W B B W B W B W B B W B W B W W W W

what would your conclusion be?

v. What is the p-value corresponding to the above sample?

vi. What is the probability of a Type I error and the probability of a Type II error?

Solution: This exercise is left as self-study.

4.

In the investigation of the average performance of produced kettles, a quality control engineer examines 49 kettles and measures the mean time to heat 1 liter of water from 25 °C to 100 °C as 75 seconds. Knowing that this had a historical variance of 100 seconds², he wants to test at α = 0.05 whether the population mean time is equal to 60 seconds or not, as the producer’s advertisements say "1 liter in 1 minute". Help him to correct the mistakes in his statistical test report.

Solution: The hypotheses involved should be written as:

H₀ : μ = 60
H₁ : μ > 60

As the historical variance of temperatures is known to be 100, the researcher should use:

$z = \frac{75 - 60}{10} = 1.5$

where the upper-critical z-value is 1.65 in this one-sided test. Since 1.5 < 1.65, we fail to reject H₀. The mean time to boil water is not longer than 60 seconds, as promised in the advertisements.

5.

A researcher investigates whether two different teaching methods yield similar impacts on the learning of students. After Method 1 is used in Section 1 and Method 2 is used in Section 2 of the same course, the same final exam is given to both sections. Then the researcher forms a 95% confidence interval as [9.82, 17.30] for the difference of exam grades (Section 1 grade minus Section 2 grade). Can you analyze whether there is a difference of 15 points between the grades of two sections?

Solution: This exercise is left as self-study.
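As a starting point for the self-study, one possible approach is to use the link between a two-sided test and a confidence interval: a hypothesized value is rejected at level α exactly when it falls outside the (1 − α) 100% interval. The sketch below assumes this reading of the exercise; the helper name `rejects_at_interval_level` is hypothetical.

```python
def rejects_at_interval_level(ci_lower, ci_upper, hypothesized_value):
    """Two-sided test read off a confidence interval.

    Returns True if H0: parameter = hypothesized_value is rejected at the
    significance level matching the interval (alpha = 0.05 for a 95% interval),
    i.e. if the hypothesized value lies outside the interval.
    """
    return not (ci_lower <= hypothesized_value <= ci_upper)


# 95% confidence interval for (Section 1 grade minus Section 2 grade).
ci_lower, ci_upper = 9.82, 17.30

reject_15 = rejects_at_interval_level(ci_lower, ci_upper, 15)
reject_0 = rejects_at_interval_level(ci_lower, ci_upper, 0)
print("Reject H0: difference = 15?", reject_15)
print("Reject H0: difference = 0? ", reject_0)
```

This only covers a two-sided alternative at α = 0.05; a one-sided alternative or a different significance level would need the underlying test statistic rather than the reported interval.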

6.

The choice of confidence level (1 − α) in statistical practice depends on the scientific/technical discipline. Referring to an economist/financial analyst (performing portfolio analysis), a computer scientist (designing and coding national payment systems), an international relations specialist (trying to avoid nuclear conflicts) and a physicist working for CERN (searching for a very rare subatomic particle), explain how the confidence level must be chosen.

Solution: This exercise is left as self-study.

7.

We have the following information:

Researcher A tests H₁ : σ² > a against H₀ : σ² ≤ a at α = 0.05 and she uses in her report the critical value of c₁ to conduct and conclude the test, using a sample of size n₁.
Researcher B tests H₁ : σ² ≠ b against H₀ : σ² = b at α = 0.10 and she uses in her report the critical values of d₁ and d₂, where d₁ < d₂, to conduct and conclude the test, using a sample of size n₂.
Researcher C tries to test H₁ : σ_X² > σ_Y² against H₀ : σ_X² ≤ σ_Y² at α = 0.05, using n₂ observations of X and n₁ observations of Y. Unfortunately, he only has his own data of X and Y as well as the research reports of Researcher A and Researcher B, but he does not have a computer or any statistical tables.

Help him to find the critical value needed.

Solution: This exercise is left as self-study.

A t random variable with m degrees of freedom, denoted t(m), is found by:

$$t = \frac{Z}{\sqrt{\chi^2_{(m)}/m}} \sim t_{(m)}$$

if the numerator and denominator are independent random variables. Here, consider the meaning of:

$$\frac{\chi^2_{(m)}}{m}$$

Previously we said χ² should be something related to variance. Based on the definition of χ², do you think the fraction above is the variance of something? Reveal what this something is.
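The construction of a t random variable from a standard normal Z and an independent χ²(m) can be explored by simulation. The sketch below is an illustration under stated assumptions: the χ²(m) draw is built as a sum of m squared independent standard normal draws, and the choices m = 10, 20000 repetitions and the seed are arbitrary. It compares the simulated variance of t with the known variance m/(m − 2) of a t(m) distribution.

```python
import random
from math import sqrt
from statistics import mean, pvariance


def draw_t(m, rng):
    """Draw one t(m) value as Z / sqrt(chi2(m) / m), with Z independent of chi2(m).

    Also returns chi2(m) / m, which is the mean of m squared standard normal draws.
    """
    z = rng.gauss(0, 1)
    chi_square = sum(rng.gauss(0, 1) ** 2 for _ in range(m))
    scaled_chi_square = chi_square / m
    return z / sqrt(scaled_chi_square), scaled_chi_square


rng = random.Random(0)   # fixed seed so that the run is reproducible
m = 10                   # degrees of freedom (arbitrary choice)
draws = [draw_t(m, rng) for _ in range(20000)]

t_values = [t for t, _ in draws]
scaled_values = [s for _, s in draws]

print("average of chi2(m)/m:   ", round(mean(scaled_values), 3))
print("simulated variance of t:", round(pvariance(t_values), 3))
print("theoretical m/(m - 2):  ", m / (m - 2))
```

Looking at how `scaled_chi_square` is computed inside `draw_t` may help with the question posed above.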


Chapter 7 Hypothesis testing

7.1 Hypothesis testing: One population

7.1.1 Hypothesis testing Mean of a normal population Case: Known population variance

7.1.2 Hypothesis testing Mean of a normal population Case: Unknown population variance

7.1.3 Hypothesis testing Population proportion

7.1.4 Hypothesis testing Variance of a normal population

7.2 Hypothesis testing: Two populations

7.2.1 Hypothesis testing Difference between two normal population means Case: Dependent (matched) samples

7.2.2 Hypothesis testing Difference between two normal population means Case: Independent samples & Known population variances

7.2.3 Hypothesis testing Difference between two normal population means Case: Independent samples & Unknown yet equal population variances

7.2.4 Hypothesis testing Difference between two normal population means Case: Independent samples & Unknown and unequal population variances

7.2.5 Hypothesis testing Difference between two population proportions

7.2.6 Hypothesis testing Equality of variances of two normal populations

7.3 p-value

7.4 Type I and Type II errors and the Power of a hypothesis test
