6 Confidence intervals

Chapter 6
Confidence intervals

6.1 Confidence interval estimation: One population

On the surface, the problem of interval estimation seems straightforward. Although this is true from a computational viewpoint, it is not true from a more philosophical angle. We’ll be discussing these issues in our lectures. The problem of interval estimation is to find an interval of real numbers that contains an unknown population parameter of interest with a given or chosen probability.

Checkpoint
No: 71

Often, but not always, a confidence interval is symmetric around a mean. We’ll formalize our discussion after our introductory exercise. Before that exercise, note the following: a confidence interval estimator for a population parameter is a rule for determining, based on sample information, an interval that is likely to include the parameter. The corresponding estimate is called a confidence interval estimate.

6.1.1 Confidence interval estimation
Mean of a normal population
Case: Known population variance

Consider $\{x_i\}_{i=1}^{n}$, a random sample of $n$ observations from a normal population $N(\mu, \sigma^2)$. If the sample mean is $\bar{x}$, then a $(1-\alpha)100\%$ confidence interval for $\mu$ with known $\sigma^2$ is:

$$\bar{x} \pm z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$

Here,

$$ME = z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$

is called the margin of error (or sampling error),

$$w = 2ME$$

is called the width,

$$UCL = \bar{x} + z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$

is called the upper confidence limit, and

$$LCL = \bar{x} - z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$

is called the lower confidence limit.

Note that the definitions of ME, w, UCL and LCL will not be repeated in the other cases, to save space. Think about the ways to reduce the margin of error. Is everything under your control?

6.1 EXERCISES____________________________________________

1.

A researcher wants to estimate a 95% confidence interval for the mean wage rate of workers in Ankara, for which the population variance is known to be 1000000 . She uses a sample of 64 workers and measures their mean wage rate as 7000. Calculate/estimate the confidence interval requested.

Solution: The population variance is known; so, the relevant distribution is z and the 95% confidence interval for $\mu$ is:

$$\bar{x} \pm z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \rightarrow 7000 \pm 1.96 \frac{1000}{\sqrt{64}}$$

→ [6755.005,7244.995]
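
The computation for the known-variance case can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library; the function name `z_interval` and its parameters are assumptions introduced here, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, conf=0.95):
    """CI for a normal mean when the population std. dev. sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1 - alpha/2}
    me = z * sigma / sqrt(n)                       # margin of error (ME)
    return mean - me, mean + me                    # (LCL, UCL)

# Wage exercise: variance 1000000 -> sigma = 1000, n = 64, sample mean 7000
lcl, ucl = z_interval(7000, sqrt(1_000_000), 64)
print(round(lcl, 3), round(ucl, 3))
```

The sketch uses the unrounded normal quantile rather than the tabulated 1.96, which is why the limits are not exactly 6755 and 7245.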

Checkpoint
No: 72

6.1.2 Confidence interval estimation
Mean of a normal population
Case: Unknown population variance

Consider $\{x_i\}_{i=1}^{n}$, a random sample of $n$ observations from a normal population $N(\mu, \sigma^2)$. If the sample mean is $\bar{x}$, then a $(1-\alpha)100\%$ confidence interval for $\mu$ with unknown $\sigma^2$ is:

$$\bar{x} \pm t_{n-1,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}$$

where,

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

is the sample standard deviation. Go over the description of the t-distribution in Chapter 10.

Go to the Teaching page and, using the file named ‘Confidence intervals.xlsx’, experiment on the worksheet ‘t vs z’ to observe what happens as the degrees of freedom increase.

6.2 EXERCISES____________________________________________

1.

A researcher wants to estimate a 95% confidence interval for the mean wage rate of workers in Ankara, for which the population variance is unknown. She uses a sample of 64 workers and measures their mean wage rate as 7000, with the sample variance calculated as 640000. Calculate/estimate the confidence interval requested.

Solution: The population variance is unknown; so, the relevant distribution is t, the degrees of freedom is 63 and the 95% confidence interval for $\mu$ is:

$$\bar{x} \pm t_{n-1,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}} \rightarrow 7000 \pm 1.998 \frac{800}{\sqrt{64}}$$

→ [6800.166,7199.834]
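
A minimal Python sketch of the unknown-variance case follows. Python's standard library has no t-distribution, so the critical value is passed in as read from a table; the function name `t_interval` and the argument `t_crit` are illustrative assumptions.

```python
from math import sqrt

def t_interval(mean, s, n, t_crit):
    """CI for a normal mean with unknown variance.

    t_crit is t_{n-1, 1-alpha/2}, looked up in a t-table (assumed input).
    """
    me = t_crit * s / sqrt(n)   # margin of error
    return mean - me, mean + me

# Wage exercise: s^2 = 640000 -> s = 800, n = 64, t_{63, 0.975} ~ 1.998
lcl, ucl = t_interval(7000, sqrt(640_000), 64, t_crit=1.998)
print(lcl, ucl)
```

With the rounded table value 1.998 the limits come out close to 6800.2 and 7199.8; the slightly different decimals in the solution above appear to come from an unrounded critical value.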

6.1.3 Confidence interval estimation
Population proportion

Consider $\{x_i\}_{i=1}^{n}$, a random sample of $n$ observations from a Bernoulli($P$) population. Notice that each $x_i$ is either 1 (success) or 0 (failure). In this case, $\bar{x}$ is nothing but the observed proportion of successes, denoted as $\hat{p}$. Then, if $n\hat{p}(1-\hat{p})$ is sufficiently large, a $(1-\alpha)100\%$ confidence interval for $P$ is:

$$\hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

6.3 EXERCISES ___________________________________________________________

1.

A political candidate wants to know her nationwide support rate. Among a sample of 64 people, we know 35 support the political candidate. Calculate/estimate a 95% confidence interval for the candidate’s nation-wide support rate.

Solution: The relevant distribution is z.

$$\hat{p} = \frac{35}{64} = 0.547$$

The 95% confidence interval for $P$ is:

$$\hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \rightarrow 0.547 \pm 1.96 \sqrt{\frac{0.547(1-0.547)}{64}}$$

→ [0.425,0.669]
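
The proportion interval can be sketched in Python as below. The helper name `proportion_interval` is an assumption made for illustration; it applies the normal-approximation formula of this section and does not check whether the sample is large enough.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, conf=0.95):
    """Normal-approximation CI for a population proportion."""
    p_hat = successes / n                          # observed proportion
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1 - alpha/2}
    me = z * sqrt(p_hat * (1 - p_hat) / n)         # margin of error
    return p_hat - me, p_hat + me

# Support-rate exercise: 35 supporters among 64 people
lcl, ucl = proportion_interval(35, 64)
print(round(lcl, 3), round(ucl, 3))
```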

6.1.4 Confidence interval estimation
Variance of a normal population

Consider $\{x_i\}_{i=1}^{n}$, a random sample of $n$ observations from a normal population $N(\mu, \sigma^2)$. If the observed sample variance is $s^2$, then a $(1-\alpha)100\%$ confidence interval for $\sigma^2$ is:

$$\left[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\frac{\alpha}{2}}}, \ \frac{(n-1)s^2}{\chi^2_{n-1,\frac{\alpha}{2}}} \right]$$

where

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

is the sample variance. Go over the description of the $\chi^2$-distribution in Chapter 10.

Go to the Teaching page and experiment with the $\chi^2_n$ distribution using the file named ‘Statistical distributions.xlsx’.

6.4 EXERCISES ___________________________________________________________

1.

A process engineer is concerned with the variation of temperature in an industrial furnace. She collects a random sample of temperatures as:

975, 1075, 1050, 900, 1000, 950, 1025, 1050, 975 (°C)

Calculate/estimate a 95% confidence interval for the (population) variance of temperatures in this furnace.

Solution: $s^2$ for the 9 temperature readings is 3125, the relevant distribution is $\chi^2$ and the degrees of freedom is 8. The 95% confidence interval for $\sigma^2$ is:

$$\left[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\frac{\alpha}{2}}}, \ \frac{(n-1)s^2}{\chi^2_{n-1,\frac{\alpha}{2}}} \right] \rightarrow \left[ \frac{(9-1)3125}{17.535}, \ \frac{(9-1)3125}{2.180} \right]$$

→ [1425.757,11469.306]
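
The variance interval can be reproduced with a short Python sketch. The standard library has no $\chi^2$ quantile function, so the two critical values are passed in as table look-ups; the function name `variance_interval` and its argument names are assumptions made for illustration.

```python
from statistics import variance

def variance_interval(data, chi2_upper, chi2_lower):
    """CI for a normal population variance.

    chi2_upper = chi^2_{n-1, 1-alpha/2}, chi2_lower = chi^2_{n-1, alpha/2},
    both taken from a table (assumed inputs).
    """
    n = len(data)
    s2 = variance(data)   # sample variance with the n-1 divisor
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

temps = [975, 1075, 1050, 900, 1000, 950, 1025, 1050, 975]
lcl, ucl = variance_interval(temps, chi2_upper=17.535, chi2_lower=2.180)
print(variance(temps), lcl, ucl)
```

Because the critical values are rounded to three decimals here, the limits differ slightly in the last digits from those reported in the solution.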

Checkpoint
No: 73

6.2 Finite populations and correction

When $n \ll N$, our procedures work seamlessly. However, when $n$ is considerably large relative to $N$, i.e.

$$n > \frac{1}{20} N$$

we need to use a factor of

$$\frac{N-n}{N-1}$$

to correct the relevant variances involved. This factor is called the finite population correction (fpc) factor. Observe that, quite intuitively, fpc = 1 for n = 1 and fpc = 0 for n = N.
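
As a rough illustration, the rule of thumb and the correction factor can be sketched in Python. The names `needs_fpc`, `fpc` and `corrected_se` are hypothetical, and the numbers N = 1000, n = 100, σ = 50 are made up for the example; the correction is applied here to the variance of the sample mean, σ²/n.

```python
from math import sqrt

def needs_fpc(N, n):
    """Rule of thumb from the text: correct when n exceeds N/20."""
    return n > N / 20

def fpc(N, n):
    """Finite population correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

def corrected_se(sigma, N, n):
    """Std. error of the sample mean with the fpc applied to its variance."""
    return sqrt(sigma ** 2 / n * fpc(N, n))

# Hypothetical numbers: population of 1000, sample of 100, sigma = 50
print(needs_fpc(1000, 100), fpc(1000, 100), corrected_se(50, 1000, 100))
```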

6.3 Sample size determination

Mean of a normally distributed population, known population variance:

$$n = \frac{z^2_{1-\frac{\alpha}{2}} \sigma^2}{ME^2}$$

Population proportion:

$$n = \frac{0.25 \, z^2_{1-\frac{\alpha}{2}}}{ME^2}$$
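
The two sample-size formulas can be sketched in Python as follows. The function names and the example inputs (σ = 1000 with ME = 200, and ME = 0.05 for a proportion) are assumptions chosen for illustration; rounding up to the next integer is a practical convention assumed here, since the formulas generally return a non-integer.

```python
from math import ceil
from statistics import NormalDist

def n_for_mean(sigma, me, conf=0.95):
    """Smallest n whose z-interval margin of error is at most me."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * sigma ** 2 / me ** 2)

def n_for_proportion(me, conf=0.95):
    """Conservative n for a proportion, using p(1-p) <= 0.25."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(0.25 * z ** 2 / me ** 2)

print(n_for_mean(1000, 200))    # hypothetical: sigma = 1000, ME = 200
print(n_for_proportion(0.05))   # hypothetical: ME = 0.05
```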

6.5 EXERCISES ___________________________________________________________

1.

A random sample of size 9 from a normally distributed population with variance 9 yielded a sample mean of 7 and a sample variance of 4.

i. Construct a 90% confidence interval for the mean of the population from which the sample was taken.

ii. What is the probability of a random sample of size 9 yielding a sample variance of 4 or less, given that the population variance is 9?

iii. What is the minimum sample size required if we would like the 90% confidence interval to be at most of length 2?

Solution: This question is under maintenance.

2.

A random sample of 100 consumers were asked if they made their purchasing decisions based on price or based on quality. 64% of the consumers in the sample stated that they mainly base their buying decisions on price. Based on this information, construct a 95% confidence interval for the percentage of consumers in the population who base their buying decisions on price.

Solution: $n = 100$ and $\hat{p} = 0.64$ are given. A 95% confidence interval for $P$ is:

$$\hat{p} \mp z_c \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \rightarrow 0.64 \mp 1.96 \sqrt{\frac{0.64 \cdot 0.36}{100}} \rightarrow 0.64 \mp 1.96 \cdot 0.048 \rightarrow 0.64 \mp 0.0941$$

So, $\Pr(P \in [0.5459, 0.7341]) = 0.95$.

3.

Two statisticians, using the same sample data, reported the following different confidence intervals for the population mean: [3,5] and [2,6]. Given that they used the same sample and based their confidence intervals on the same estimators (the sample mean and sample variance), what is the source of the difference between the confidence intervals?

Solution: This exercise is left as self-study.

4.

A researcher has a strange habit of using a sample size of $\sqrt{N}$, where $N$ is the size of the population of interest. For which values of $N$ does the researcher need to use a correction factor for the standard deviation of the sampling distribution while estimating a CI for $\mu$?

Solution: This exercise is left as self-study.

6.4 Confidence interval estimation: Two populations

In scientific research, we often need to compare selected parameters of two populations, rather than only comparing a single population parameter to a given value. Although the problem gets slightly more complicated, its essence is unchanged. So, while considering the confidence interval estimation and hypothesis testing problems involving two populations, we’ll first maintain a mechanical approach in what follows. Through the following pages, notice

That we use a more shorthand notation
That we use two samples every time:

$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x}$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y}$$

where $n_x$ and $n_y$ are their respective sample sizes.

Checkpoint
No: 74

6.4.1 Confidence interval estimation
Difference between two normal population means
Case: Dependent (matched) samples

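Let $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$ be two matched samples. We can then create:

$$\{d_i\}_{i=1}^{n} \quad \text{where} \quad d_i = x_i - y_i, \ \forall i$$

Then, a $(1-\alpha)100\%$ confidence interval for $\mu_d = \mu_x - \mu_y$ is:

$$\bar{d} \pm t_{n-1,1-\frac{\alpha}{2}} \frac{s_d}{\sqrt{n}}$$

where

$$s_d = \sqrt{s_d^2} = \sqrt{\frac{\sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2}{n-1}}$$

and,

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$$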

6.6 EXERCISES____________________________________________

1.

A company is about to release a new drug to assist weight loss, and we are in charge of assessing how effective the drug is. We pick a random sample of 8 people with the following pre-drug body weights:

90,95,105,95,110,85,100,90

After using the drug for the designated test duration, the post-drug body weights are measured as:

85,80,110,90,110,80,95,90

Calculate/estimate a 95% confidence interval for the pre-drug minus post-drug difference of mean body weights. Is the drug effective?

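Solution: The difference series (pre-drug minus post-drug) is:

+5, +15, −5, +5, 0, +5, +5, 0

The relevant distribution is t, the degrees of freedom is 7 and the 95% confidence interval for $\mu_x - \mu_y$ is:

$$\bar{d} \pm t_{n-1,1-\frac{\alpha}{2}} \frac{s_d}{\sqrt{n}} \rightarrow 3.750 \pm 2.365 \frac{5.824}{\sqrt{8}}$$

→ [−1.120, 8.620]

Since the interval contains 0, the data do not allow us to conclude, at the 95% confidence level, that the drug changes mean body weight.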
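
The matched-samples calculation can be retraced with a small Python sketch. The variable names are illustrative, and the critical value 2.365 is entered by hand as a table look-up for 7 degrees of freedom, because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

pre = [90, 95, 105, 95, 110, 85, 100, 90]
post = [85, 80, 110, 90, 110, 80, 95, 90]

d = [x - y for x, y in zip(pre, post)]   # paired differences d_i = x_i - y_i
d_bar = mean(d)                          # mean difference
s_d = stdev(d)                           # sample std. dev. (n-1 divisor)

t_crit = 2.365                           # t_{7, 0.975} from a table (assumed input)
me = t_crit * s_d / sqrt(len(d))         # margin of error
lcl, ucl = d_bar - me, d_bar + me
print(d, d_bar, round(s_d, 3), round(lcl, 3), round(ucl, 3))
```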

6.4.2 Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Known population variances

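Let,

$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N\left(\mu_x, \sigma_x^2\right)$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N\left(\mu_y, \sigma_y^2\right)$$

where $\sigma_x^2$ and $\sigma_y^2$ are known. Then a $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$$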

6.7 EXERCISES ___________________________________________________________

1.

A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are known to be 640000 and 810000, respectively. Calculate/estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

Solution: The relevant distribution is z and the 95% confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} \rightarrow 6000 - 7000 \pm 1.96 \sqrt{\frac{640000}{49} + \frac{810000}{81}}$$

→ [− 1297.639,−702.361]
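
A Python sketch of the two-sample interval with known variances is given below; the function name `two_mean_z_interval` and its parameter names are assumptions introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(x_bar, y_bar, var_x, var_y, n_x, n_y, conf=0.95):
    """CI for mu_x - mu_y, independent samples, known population variances."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1 - alpha/2}
    me = z * sqrt(var_x / n_x + var_y / n_y)       # margin of error
    diff = x_bar - y_bar
    return diff - me, diff + me

# Ankara (x) vs Istanbul (y) wage exercise
lcl, ucl = two_mean_z_interval(6000, 7000, 640_000, 810_000, 49, 81)
print(round(lcl, 3), round(ucl, 3))
```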

6.4.3 Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Unknown yet equal population variances

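Let,

$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N\left(\mu_x, \sigma_x^2\right)$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N\left(\mu_y, \sigma_y^2\right)$$

where $\sigma_x^2$ and $\sigma_y^2$ are unknown but assumed to be equal. Then a $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm t_{n_x+n_y-2,1-\frac{\alpha}{2}} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

In our formulation:

$$s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$$

and

$$s_x^2 = \frac{\sum_{i=1}^{n_x} (x_i - \bar{x})^2}{n_x-1}$$

$$s_y^2 = \frac{\sum_{i=1}^{n_y} (y_i - \bar{y})^2}{n_y-1}$$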

6.8 EXERCISES____________________________________________

1.

A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are unknown but they are assumed to be equal. Sample variance of wages in Ankara and Istanbul are calculated as 490000 and 640000, respectively. Calculate/ estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

Solution:

$$s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$$

$$s_p^2 = \frac{(49-1)490000 + (81-1)640000}{49+81-2} \rightarrow s_p^2 = 583750$$

The relevant distribution is t, the degrees of freedom is 128 and the 95% confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm t_{n_x+n_y-2,1-\frac{\alpha}{2}} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

$$\rightarrow 6000 - 7000 \pm 1.979 \sqrt{\frac{583750}{49} + \frac{583750}{81}}$$

→ [− 1273.601,−726.399]
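
The pooled-variance steps can be sketched in Python as below. The helper name `pooled_variance` is hypothetical, and the critical value 1.979 is typed in as a table look-up for 128 degrees of freedom.

```python
from math import sqrt

def pooled_variance(s2_x, s2_y, n_x, n_y):
    """Pooled sample variance s_p^2 for two independent samples."""
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)

n_x, n_y = 49, 81
s2_p = pooled_variance(490_000, 640_000, n_x, n_y)

t_crit = 1.979                                   # t_{128, 0.975} from a table (assumed input)
me = t_crit * sqrt(s2_p / n_x + s2_p / n_y)      # margin of error
lcl, ucl = (6000 - 7000) - me, (6000 - 7000) + me
print(s2_p, round(lcl, 1), round(ucl, 1))
```

The limits agree with the solution to about one decimal place; the remaining gap comes from rounding the critical value.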

6.4.4 Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Unknown and unequal population variances

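$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N\left(\mu_x, \sigma_x^2\right)$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N\left(\mu_y, \sigma_y^2\right)$$

where $\sigma_x^2$ and $\sigma_y^2$ are unknown and assumed not to be equal. Then a $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm t_{v,1-\frac{\alpha}{2}} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$

In our formulation:

$$v = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{\left(s_x^2/n_x\right)^2}{n_x-1} + \frac{\left(s_y^2/n_y\right)^2}{n_y-1}}$$

Notice that, if $n_x = n_y = n$

$$v = \left(1 + \frac{2}{\frac{s_x^2}{s_y^2} + \frac{s_y^2}{s_x^2}}\right)(n-1)$$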

6.9 EXERCISES ___________________________________________________________

1.

A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are unknown and they are assumed to be unequal. Sample variance of wages in Ankara and Istanbul are calculated as 490000 and 640000, respectively. Calculate/ estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

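Solution: The relevant distribution is t and the degrees of freedom is $v$:

$$v = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{\left(s_x^2/n_x\right)^2}{n_x-1} + \frac{\left(s_y^2/n_y\right)^2}{n_y-1}}$$

$$\rightarrow v = \frac{\left(\frac{490000}{49} + \frac{640000}{81}\right)^2}{\frac{\left(490000/49\right)^2}{49-1} + \frac{\left(640000/81\right)^2}{81-1}} \rightarrow 112$$

The 95% confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm t_{v,1-\frac{\alpha}{2}} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} \rightarrow 6000 - 7000 \pm 1.982 \sqrt{\frac{490000}{49} + \frac{640000}{81}}$$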

→ [− 1265.125,−734.875]
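
The degrees-of-freedom formula is the fiddly part of this case, so a small Python sketch may help. The function names `welch_df` and `equal_n_df` are assumptions introduced here; the second one encodes the equal-sample-size shortcut so the two expressions can be compared numerically.

```python
def welch_df(s2_x, s2_y, n_x, n_y):
    """Degrees of freedom v for unknown and unequal population variances."""
    a, b = s2_x / n_x, s2_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))

def equal_n_df(s2_x, s2_y, n):
    """Shortcut form of v when n_x = n_y = n."""
    return (1 + 2 / (s2_x / s2_y + s2_y / s2_x)) * (n - 1)

v = welch_df(490_000, 640_000, 49, 81)
print(v, round(v))

# Made-up check of the shortcut with s_x^2 = 4, s_y^2 = 9, n = 10
print(welch_df(4, 9, 10, 10), equal_n_df(4, 9, 10))
```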

6.4.5 Confidence interval estimation
Difference between two population proportions

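Let,

$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim \text{Bernoulli}(P_x)$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim \text{Bernoulli}(P_y)$$

Then, a $(1-\alpha)100\%$ confidence interval for $P_x - P_y$ is:

$$\hat{p}_x - \hat{p}_y \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}$$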

6.10 EXERCISES__________________________________________

1.

A political candidate wonders how her support rates in Ankara and Istanbul compare. We know that among 64 people from Ankara 35 support the candidate and among 81 people from Istanbul 45 support the candidate. Calculate/estimate a 95% confidence interval for the difference of population support rates in Ankara and Istanbul.

Solution: $\hat{p}_x = 35/64 = 0.547$ and $\hat{p}_y = 45/81 = 0.556$ and the 95% confidence interval for $P_x - P_y$ is:

$$\hat{p}_x - \hat{p}_y \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}$$

$$\rightarrow 0.547 - 0.556 \pm 1.96 \sqrt{\frac{0.547(1-0.547)}{64} + \frac{0.556(1-0.556)}{81}}$$

→ [− 0.172,0.154]
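
The same calculation in Python, as an illustrative sketch only; the function name `two_proportion_interval` and its arguments are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(k_x, n_x, k_y, n_y, conf=0.95):
    """Normal-approximation CI for P_x - P_y from success counts k_x, k_y."""
    p_x, p_y = k_x / n_x, k_y / n_y
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z * sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    diff = p_x - p_y
    return diff - me, diff + me

# Ankara: 35 of 64 support; Istanbul: 45 of 81 support
lcl, ucl = two_proportion_interval(35, 64, 45, 81)
print(round(lcl, 3), round(ucl, 3))
```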

6.11 EXERCISES__________________________________________

1.

Journal A of city X reports a 90% CI for the population mean income in city X ($\mu_x$) as [3400, 6400] and Journal B in city Y ($\mu_y$) as [3800, 6800]. Each journal notes that the population is distributed normally with a known variance. The journals report the odds for the approval of Mr. Doe, a political candidate, in their respective cities of X and Y as 750/500 and 80/60, where the fractions are (sample count of approvals)/(sample count of disapprovals).

i. Estimate a 95% CI for μ_x

ii. Estimate a 99% CI for μ_x −μ_y

iii. Test: H₀ : μ_x −μ_y = 0 against H₁ : μ_x −μ_y < 0 at α = 0.05

iv. Estimate a 90% CI for the popularity (share of approvals) of Mr. Doe in city X (px)

v. Estimate a 95% CI for p_x −p_y

Interpret your result clearly in each case.

Solution: To come up with solutions to parts (i) to (v), we need to find/calculate $\bar{x}$, $\bar{y}$, $\hat{p}_x$, $\hat{p}_y$, $n_x$ and $n_y$ from the given information. In that,

$$\bar{x} = \frac{3400+6400}{2} = 4900$$
$$\bar{y} = \frac{3800+6800}{2} = 5300$$
$$\frac{\sigma_x}{\sqrt{n_x}} = \frac{4900-3400}{1.65} = 909.1$$
$$\frac{\sigma_y}{\sqrt{n_y}} = \frac{5300-3800}{1.65} = 909.1$$

These are sufficient to solve parts (i), (ii) and (iii). To solve (iv) and (v), we need the following:

$$\hat{p}_x = \frac{750}{750+500} = 0.60, \quad n_x = 1250$$
$$\hat{p}_y = \frac{80}{80+60} = 0.57, \quad n_y = 140$$

Notice that $n_x$ and $n_y$ need not be known explicitly while solving parts (i), (ii) and (iii).

2.

A researcher wants to test whether three populations’ means are equal to each other; i.e. whether (A) H₀ : μ₁ = μ₂ = μ₃. She picks α as 0.10 and collects data from each population. She, then, tests separately (one at a time):

(B) H₀ : μ₁ = μ₂ (C) H₀ : μ₁ = μ₃ (D) H₀ : μ₂ = μ₃

against their two-sided alternatives. She fails to reject H₀ every time at α = 0.10. So, she concludes: Failure to reject H₀ in all B, C and D at α = 0.10 is equivalent to failure to reject H₀ in A at α = 0.10; so, all three means are equal with a confidence of 90%.

Explain why her conclusion is wrong.

Solution: This question requires (1) obtaining the confidence levels for the tests B, C and D, (2) noticing that these confidence levels are nothing but simple probabilities, (3) multiplying the individual confidence levels to obtain the joint confidence level, and (4) subtracting the joint confidence level from 1 to find the joint significance level (call it $\alpha'$). In that,

$$\alpha' = 1 - (1-\alpha)^3$$

and for $\alpha = 0.10$, $\alpha'$ becomes 0.271, so ‘straightforwardly joining/merging the conclusions of separate tests of hypotheses’ is not allowed in our professional practice.
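
The joint significance level is easy to tabulate with a short Python sketch. The function name `joint_significance` is an assumption, and multiplying the confidence levels treats the separate tests as independent, which is the simplification used in the solution.

```python
def joint_significance(alpha, k):
    """Joint significance level of k separate tests, each run at level alpha.

    Assumes the tests are independent, so confidence levels multiply.
    """
    return 1 - (1 - alpha) ** k

alpha_joint = joint_significance(0.10, 3)
print(round(alpha_joint, 3))

# How the joint level grows with the number of separate tests
for k in (1, 2, 3, 5, 10):
    print(k, round(joint_significance(0.10, k), 3))
```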

In a nutshell

A t random variable with $m$ degrees of freedom, denoted $t_{(m)}$, is found by:

$$t = \frac{Z}{\sqrt{\frac{\chi^2_{(m)}}{m}}} \sim t_{(m)}$$

if the numerator and denominator are independent random variables. Here, consider the meaning of:

$$\frac{\chi^2_{(m)}}{m}$$

Previously we said $\chi^2$ should be something related to variance. Based on the definition of $\chi^2_{(m)}$, do you think the fraction above is the variance of something? Reveal what this something is.

Checkpoint
No: 75
