Chapter 6
Confidence intervals

6.1 Confidence interval estimation: One population

On the surface, the problem of interval estimation seems straightforward. Although this is true from a computational viewpoint, it is not true from a more philosophical angle. We’ll be discussing these issues in our lectures. The problem of interval estimation is to find an interval of real numbers that contains an unknown population parameter of interest with a given (or chosen) probability.

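Interval estimation of μ: Construction of the problem

In a nutshell

$$P(L \le \mu \le U) = 1 - \alpha$$

$$P(\mu - K \le \mu \le \mu + K) = 1 - \alpha$$

where $L = \mu - K$ and $U = \mu + K$.

$$P(-\mu - K \le -\mu \le -\mu + K) = 1 - \alpha$$

$$P(\bar{x}_n - \mu - K \le \bar{x}_n - \mu \le \bar{x}_n - \mu + K) = 1 - \alpha$$

$$P\left( \underbrace{\frac{\bar{x}_n - \mu - K}{\sigma/\sqrt{n}}}_{-z_c \; (1)} \le \underbrace{\frac{\bar{x}_n - \mu}{\sigma/\sqrt{n}}}_{z} \le \underbrace{\frac{\bar{x}_n - \mu + K}{\sigma/\sqrt{n}}}_{z_c \; (2)} \right) = 1 - \alpha$$

Considering (1) and (2) simultaneously → $K = z_c \frac{\sigma}{\sqrt{n}}$. Then,

$$P\left( \underbrace{\mu}_{\text{Subs. } \bar{x}} - z_c \frac{\sigma}{\sqrt{n}} \le \mu \le \underbrace{\mu}_{\text{Subs. } \bar{x}} + z_c \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$$

$$P\left( \bar{x} - z_c \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_c \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$$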

Checkpoint No: 71

Often, but not always, a confidence interval is symmetric around a mean. We’ll formalize our discussion after our introductory exercise. Before that exercise, note that a confidence interval estimator for a population parameter is a rule for determining, based on sample information, an interval that is likely to include the parameter. The corresponding estimate is called a confidence interval estimate.
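What "likely to include the parameter" means can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population values (mu = 50, sigma = 10), the sample size, the number of trials and the function name `coverage` are all hypothetical choices, not taken from the text. It builds many z-based intervals from repeated samples and counts how often they contain the true mean.

```python
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, level, trials, seed=0):
    """Share of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_c for the chosen level
    half_width = z * sigma / n ** 0.5               # K = z_c * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population and design, chosen only for illustration.
rate = coverage(mu=50.0, sigma=10.0, n=25, level=0.95, trials=4000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The share should come out close to the chosen level. Note that the randomness sits in the interval endpoints, which change from sample to sample, while μ stays fixed.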

6.1.1 Confidence interval estimation
Mean of a normal population
Case: Known population variance

Consider $\{x_i\}_{i=1}^{n}$, a random sample of n observations from a normal population $N(\mu, \sigma^2)$. If the sample mean is $\bar{x}$, then a $(1-\alpha)100\%$ confidence interval for μ with known $\sigma^2$ is:

$$\bar{x} \pm z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$

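Here,

$$ME = z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$

is called the margin of error (or sampling error),

$$w = 2ME$$

is called the width,

$$UCL = \bar{x} + z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$

is called the upper confidence limit, and

$$LCL = \bar{x} - z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$

is called the lower confidence limit.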

Note that the definitions of ME, w, UCL and LCL will not be repeated in the other cases to save space. Think about the ways to reduce the margin of error. Is everything under your control?

6.1 EXERCISES

1. A researcher wants to estimate a 95% confidence interval for the mean wage rate of workers in Ankara, for which the population variance is known to be 1000000. She uses a sample of 64 workers and measures their mean wage rate as 7000. Calculate/estimate the confidence interval requested.

Solution: The population variance is known; so, the relevant distribution is z and the 95% confidence interval for μ is:

$$\bar{x} \pm z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \rightarrow 7000 \pm 1.96 \frac{1000}{\sqrt{64}}$$

→ [6755.005, 7244.995]
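The calculation in this solution can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `z_interval` is a hypothetical helper, and the quantile is taken from `statistics.NormalDist` without rounding, which is why the limits carry more decimals than 1.96 alone would give.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level=0.95):
    """Confidence interval for the mean when the population variance is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{1 - alpha/2}
    me = z * sigma / n ** 0.5                      # margin of error
    return x_bar - me, x_bar + me, me

# Values from the exercise: variance 1000000, n = 64, sample mean 7000.
lcl, ucl, me = z_interval(x_bar=7000, sigma=1_000_000 ** 0.5, n=64)
print(round(lcl, 3), round(ucl, 3), "width:", round(2 * me, 3))
```

Raising `n` or lowering `level` shrinks the margin of error, whereas σ is a property of the population and is not under the researcher's control.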

Checkpoint No: 72

6.1.2 Confidence interval estimation
Mean of a normal population
Case: Unknown population variance

Consider $\{x_i\}_{i=1}^{n}$, a random sample of n observations from a normal population $N(\mu, \sigma^2)$. If the sample mean is $\bar{x}$, then a $(1-\alpha)100\%$ confidence interval for μ with unknown $\sigma^2$ is:

$$\bar{x} \pm t_{n-1,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}$$

where

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

is the sample standard deviation. Go over the description of the t-distribution in Chapter 10.

Go to the Teaching page and, using the file named ‘Confidence intervals.xlsx’, experiment on the worksheet ‘t vs z’ to observe what happens as the degrees of freedom increase.

6.2 EXERCISES

1. A researcher wants to estimate a 95% confidence interval for the mean wage rate of workers in Ankara, for which the population variance is unknown. She uses a sample of 64 workers and measures their mean wage rate as 7000, the sample variance being calculated as 640000. Calculate/estimate the confidence interval requested.

Solution: The population variance is unknown; so, the relevant distribution is t, the degrees of freedom is 63 and the 95% confidence interval for μ is:

$$\bar{x} \pm t_{n-1,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}} \rightarrow 7000 \pm 1.998 \frac{800}{\sqrt{64}}$$

→ [6800.166, 7199.834]
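A sketch of the same computation in Python follows. The standard library has no t quantile function, so the tabulated value 1.998 from the solution is passed in by hand; that, and the helper name `t_interval`, are assumptions of this sketch. Because the quantile is rounded to three decimals, the limits agree with the solution only to about one decimal place.

```python
from math import sqrt
from statistics import NormalDist

def t_interval(x_bar, s2, n, t_quantile):
    """Confidence interval for the mean when the population variance is unknown.

    t_quantile must be t_{n-1, 1-alpha/2}, read from a table or software.
    """
    me = t_quantile * sqrt(s2) / sqrt(n)
    return x_bar - me, x_bar + me

T_63_0975 = 1.998                     # tabulated t quantile, 63 degrees of freedom
Z_0975 = NormalDist().inv_cdf(0.975)  # z quantile for comparison

lcl, ucl = t_interval(x_bar=7000, s2=640000, n=64, t_quantile=T_63_0975)
print(round(lcl, 1), round(ucl, 1))
print("t quantile exceeds z quantile:", T_63_0975 > Z_0975)
```

The t quantile is slightly larger than the z quantile, so for the same standard error the t-based interval is a little wider.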

6.1.3 Confidence interval estimation
Population proportion

Consider $\{x_i\}_{i=1}^{n}$, a random sample of n observations from a Bernoulli(P) population. Notice that each $x_i$ is either 1 (success) or 0 (failure). $\bar{x}$ in this case is nothing but the observed proportion of successes, denoted as $\hat{p}$. Then, if $n\hat{p}(1-\hat{p})$ is sufficiently large, a $(1-\alpha)100\%$ confidence interval for P is:

$$\hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

6.3 EXERCISES

1. A political candidate wants to know her nationwide support rate. Among a sample of 64 people, we know 35 support the political candidate. Calculate/estimate a 95% confidence interval for the candidate’s nationwide support rate.

Solution: The relevant distribution is z.

$$\hat{p} = \frac{35}{64} = 0.547$$

The 95% confidence interval for P is:

$$\hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \rightarrow 0.547 \pm 1.96 \sqrt{\frac{0.547(1-0.547)}{64}}$$

→ [0.425, 0.669]
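The proportion interval can be sketched in the same way. The helper name `proportion_interval` is hypothetical; the numbers are those of the exercise.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, level=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    me = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - me, p_hat + me

p_hat, lcl, ucl = proportion_interval(successes=35, n=64)
print(round(p_hat, 3), round(lcl, 3), round(ucl, 3))
```

With only 64 respondents the interval spans roughly 24 percentage points and includes 0.5, so the sample alone does not settle whether she has majority support.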

6.1.4 Confidence interval estimation
Variance of a normal population

Consider $\{x_i\}_{i=1}^{n}$, a random sample of n observations from a normal population $N(\mu, \sigma^2)$. If the observed sample variance is $s^2$, then a $(1-\alpha)100\%$ confidence interval for $\sigma^2$ is:

$$\left[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\frac{\alpha}{2}}}, \; \frac{(n-1)s^2}{\chi^2_{n-1,\frac{\alpha}{2}}} \right]$$

where

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

is the sample variance. Go over the description of the $\chi^2$-distribution in Chapter 10.

Go to the Teaching page and experiment with the $\chi^2_n$ distribution using the file named ‘Statistical distributions.xlsx’.

6.4 EXERCISES

1. A process engineer is concerned with the variation of temperature in an industrial furnace. She collects a random sample of temperatures (in °C) as:

975, 1075, 1050, 900, 1000, 950, 1025, 1050, 975

Calculate/estimate a 95% confidence interval for the (population) variance of temperatures in this furnace.

Solution: $s^2$ for the 9 temperature readings is 3125, the relevant distribution is $\chi^2$ and the degrees of freedom is 8. The 95% confidence interval for $\sigma^2$ is:

$$\left[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\frac{\alpha}{2}}}, \; \frac{(n-1)s^2}{\chi^2_{n-1,\frac{\alpha}{2}}} \right] \rightarrow \left[ \frac{(9-1)3125}{17.535}, \; \frac{(9-1)3125}{2.180} \right]$$

→ [1425.757, 11469.306]
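The furnace example can be checked with the sketch below. The sample variance is computed from the data; the two χ² quantiles are the tabulated values quoted in the solution, entered by hand because the standard library offers no χ² quantile function (an assumption of this sketch). With quantiles rounded to three decimals, the limits differ slightly from the ones above in the first decimal or the units digit.

```python
from statistics import variance

temps = [975, 1075, 1050, 900, 1000, 950, 1025, 1050, 975]  # degrees Celsius
n = len(temps)
s2 = variance(temps)  # sample variance, denominator n - 1

# Tabulated chi-square quantiles for 8 degrees of freedom (from the solution).
CHI2_8_0975 = 17.535  # cumulative probability 0.975
CHI2_8_0025 = 2.180   # cumulative probability 0.025

lcl = (n - 1) * s2 / CHI2_8_0975  # the larger quantile gives the lower limit
ucl = (n - 1) * s2 / CHI2_8_0025  # the smaller quantile gives the upper limit
print(s2, round(lcl, 2), round(ucl, 2))
```

Unlike the intervals for a mean, this interval is not symmetric around $s^2$: the upper limit lies much further from 3125 than the lower one.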

Checkpoint No: 73

6.2 Finite populations and correction

When n << N, our procedures work seamlessly. However, when n is considerably large relative to N, i.e.

$$n > \frac{1}{20} N$$

we need to use a factor of

$$\frac{N-n}{N-1}$$

to correct the relevant variances involved. This factor is called the finite population correction (fpc) factor. Observe that, quite intuitively, fpc = 1 for n = 1 and fpc = 0 for n = N.
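A short sketch of the rule and the factor follows. The function names and the population size N = 1000 are hypothetical, chosen only to show how the factor behaves between its two extremes.

```python
def fpc(N, n):
    """Finite population correction factor applied to the relevant variance."""
    return (N - n) / (N - 1)

def needs_fpc(N, n):
    """Rule of thumb from the text: correct when n exceeds N/20."""
    return n > N / 20

N = 1000  # hypothetical population size
for n in (1, 40, 60, 500, 1000):
    print(n, needs_fpc(N, n), round(fpc(N, n), 4))
```

Since the factor multiplies a variance, the corresponding standard error is multiplied by its square root.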

6.3 Sample size determination

Mean of a normally distributed population, known population variance:

$$n = \frac{z^2_{1-\frac{\alpha}{2}} \sigma^2}{ME^2}$$

Population proportion:

$$n = \frac{0.25 \, z^2_{1-\frac{\alpha}{2}}}{ME^2}$$
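Both formulas can be sketched as follows. Rounding up to the next integer is a choice made in this sketch, since a sample size must be a whole number and rounding down would exceed the target margin of error. The function names and the inputs in the usage lines (σ = 1000 with ME = 100, and ME = 0.03) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_mean(sigma, me, level=0.95):
    """Smallest n whose margin of error for the mean is at most me (known sigma)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(z ** 2 * sigma ** 2 / me ** 2)

def n_for_proportion(me, level=0.95):
    """Smallest n for a proportion, using the worst case p(1 - p) = 0.25."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(0.25 * z ** 2 / me ** 2)

# Hypothetical targets, for illustration only.
print(n_for_mean(sigma=1000, me=100))  # mean within +/- 100
print(n_for_proportion(me=0.03))       # proportion within +/- 3 points
```

The factor 0.25 is the largest value p(1 − p) can take (at p = 0.5), so the proportion formula is conservative for any true P.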

6.5 EXERCISES

1. A random sample of size 9 from a normally distributed population with variance 9 yielded a sample mean of 7 and a sample variance of 4.

i. Construct a 90% confidence interval for the population mean of the population that the sample is taken from.

ii. What is the probability of a random sample of size 9 yielding a sample variance of 4 or less, given that the population variance is 9?

iii. What is the minimum sample size required if we would like the 90% confidence interval to be at most of length 2?

Solution: This question is under maintenance.

2. A random sample of 100 consumers were asked if they made their purchasing decisions based on price or based on quality. 64% of the consumers in the sample stated that they mainly base their buying decisions on price. Based on this information, construct a 95% confidence interval for the percentage of consumers in the population who base their buying decisions on price.

Solution: n = 100 and $\hat{p} = 0.64$ are given. A 95% confidence interval for P is:

$$\hat{p} \pm z_c \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

$$\rightarrow 0.64 \pm 1.96 \sqrt{\frac{0.64 \cdot 0.36}{100}}$$

$$\rightarrow 0.64 \pm 1.96 \cdot 0.048$$

$$\rightarrow 0.64 \pm 0.0941$$

So, $P(P \in [0.5459, 0.7341]) = 0.95$.

3. Two statisticians, using the same sample data, reported the following different confidence intervals for the population mean: [3,5] and [2,6]. Given that they used the same sample and based their confidence intervals on the same estimators (the sample mean and sample variance), what is the source of the difference between the confidence intervals?

Solution: This exercise is left as self-study.

4. A researcher has a strange habit of using a sample size of $\sqrt{N}$, where N is the size of the population of interest. For which values of N does the researcher need to use a correction factor for the standard deviation of the sampling distribution while estimating a CI for μ?

Solution: This exercise is left as self-study.

6.4 Confidence interval estimation: Two populations

In scientific research, we often need to compare selected parameters of two populations, rather than only comparing a single population parameter to a given value. Although the problem gets slightly more complicated, its essence is unchanged. So, while considering the confidence interval estimation and hypothesis testing problems involving two populations, we’ll first maintain a mechanical approach in what follows. Through the following pages, notice

Checkpoint No: 74

6.4.1 Confidence interval estimation
Difference between two normal population means
Case: Dependent (matched) samples

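Let $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$ be two matched samples. We can then create:

$$\{d_i\}_{i=1}^{n} \text{ where } d_i = x_i - y_i, \; \forall i$$

Then, a $(1-\alpha)100\%$ confidence interval for $\mu_d = \mu_x - \mu_y$ is:

$$\bar{d} \pm t_{n-1,1-\frac{\alpha}{2}} \frac{s_d}{\sqrt{n}}$$

where

$$s_d = \sqrt{s_d^2} = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}$$

and

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$$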

6.6 EXERCISES

1. A company is about to release a new drug to assist weight loss, and we are in charge of assessing how effective the drug is. We pick a random sample of 8 people with the following pre-drug body weights:

90, 95, 105, 95, 110, 85, 100, 90

After using the drug for the designated test duration, the post-drug body weights are measured as:

85, 80, 110, 90, 110, 80, 95, 90

Calculate/estimate a 95% confidence interval for the pre-drug minus post-drug difference of mean body weights. Is the drug effective?

Solution: The difference series (pre-drug minus post-drug) is:

+5, +15, −5, +5, 0, +5, +5, 0

The relevant distribution is t, the degrees of freedom is 7 and the 95% confidence interval for $\mu_x - \mu_y$ is:

$$\bar{d} \pm t_{n-1,1-\frac{\alpha}{2}} \frac{s_d}{\sqrt{n}} \rightarrow 3.750 \pm 2.365 \frac{5.824}{\sqrt{8}}$$

→ [−1.120, 8.620]

Since the interval contains 0, the sample does not provide evidence at the 95% confidence level that the drug changes mean body weight.
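The matched-sample computation can be sketched directly from the data. The t quantile 2.365 is the tabulated value used in the solution and is entered by hand (an assumption of this sketch, since the standard library has no t quantile function).

```python
from math import sqrt
from statistics import mean, stdev

pre = [90, 95, 105, 95, 110, 85, 100, 90]
post = [85, 80, 110, 90, 110, 80, 95, 90]

d = [x - y for x, y in zip(pre, post)]  # pre-drug minus post-drug
d_bar = mean(d)
s_d = stdev(d)                          # sample standard deviation of differences

T_7_0975 = 2.365                        # tabulated t quantile, 7 degrees of freedom
me = T_7_0975 * s_d / sqrt(len(d))
lcl, ucl = d_bar - me, d_bar + me
print(d, d_bar, round(s_d, 3), round(lcl, 2), round(ucl, 2))
```

Working with the differences turns the two-sample problem into the one-sample case with unknown variance, which is why the same t-based formula applies.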

6.4.2 Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Known population variances

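Let,

$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$$

$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$$

where $\sigma_x^2$ and $\sigma_y^2$ are known. Then a $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$$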

6.7 EXERCISES

1. A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are known to be 640000 and 810000, respectively. Calculate/estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

Solution: The relevant distribution is z and the 95% confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} \rightarrow 6000 - 7000 \pm 1.96 \sqrt{\frac{640000}{49} + \frac{810000}{81}}$$

→ [−1297.639, −702.361]
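A minimal sketch of this computation, with a hypothetical helper name and the unrounded z quantile from the standard library:

```python
from math import sqrt
from statistics import NormalDist

def z_interval_diff(x_bar, y_bar, var_x, var_y, n_x, n_y, level=0.95):
    """CI for mu_x - mu_y with independent samples and known variances."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(var_x / n_x + var_y / n_y)  # standard error of x_bar - y_bar
    diff = x_bar - y_bar
    return diff - z * se, diff + z * se

lcl, ucl = z_interval_diff(6000, 7000, 640000, 810000, 49, 81)
print(round(lcl, 3), round(ucl, 3))
```

Both limits are negative, so at the 95% level the interval excludes a zero difference between the two city means.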

6.4.3 Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Unknown yet equal population variances

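Let,

$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$$

$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$$

where $\sigma_x^2$ and $\sigma_y^2$ are unknown but assumed to be equal. Then a $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm t_{n_x+n_y-2,1-\frac{\alpha}{2}} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

In our formulation:

$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$$

and

$$s_x^2 = \frac{\sum_{i=1}^{n_x} (x_i - \bar{x})^2}{n_x - 1}$$

$$s_y^2 = \frac{\sum_{i=1}^{n_y} (y_i - \bar{y})^2}{n_y - 1}$$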

6.8 EXERCISES

1. A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are unknown but they are assumed to be equal. Sample variances of wages in Ankara and Istanbul are calculated as 490000 and 640000, respectively. Calculate/estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

Solution:

$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$$

$$s_p^2 = \frac{(49-1)490000 + (81-1)640000}{49 + 81 - 2} \rightarrow s_p^2 = 583750$$

The relevant distribution is t, the degrees of freedom is 128 and the 95% confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm t_{n_x+n_y-2,1-\frac{\alpha}{2}} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

$$\rightarrow 6000 - 7000 \pm 1.979 \sqrt{\frac{583750}{49} + \frac{583750}{81}}$$

→ [−1273.601, −726.399]
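The pooled-variance steps can be sketched as below. The t quantile 1.979 is the tabulated value from the solution, entered by hand (an assumption of this sketch), so the limits agree with the ones above to roughly one decimal place. The helper name is hypothetical.

```python
from math import sqrt

def pooled_variance(s2_x, s2_y, n_x, n_y):
    """Weighted average of the two sample variances, weights n - 1."""
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)

n_x, n_y = 49, 81
sp2 = pooled_variance(490000, 640000, n_x, n_y)

T_128_0975 = 1.979                 # tabulated t quantile, 128 degrees of freedom
se = sqrt(sp2 / n_x + sp2 / n_y)
diff = 6000 - 7000
lcl, ucl = diff - T_128_0975 * se, diff + T_128_0975 * se
print(sp2, round(lcl, 1), round(ucl, 1))
```

The pooled variance always lies between the two sample variances and sits closer to the one from the larger sample.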

6.4.4 Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Unknown and unequal population variances

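$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$$

$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$$

where $\sigma_x^2$ and $\sigma_y^2$ are unknown and assumed not to be equal. Then a $(1-\alpha)100\%$ confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm t_{v,1-\frac{\alpha}{2}} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$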

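In our formulation:

$$v = \frac{\left( \frac{s_x^2}{n_x} + \frac{s_y^2}{n_y} \right)^2}{\frac{\left( \frac{s_x^2}{n_x} \right)^2}{n_x - 1} + \frac{\left( \frac{s_y^2}{n_y} \right)^2}{n_y - 1}}$$

Notice that, if $n_x = n_y = n$,

$$v = \left( 1 + \frac{2}{\frac{s_x^2}{s_y^2} + \frac{s_y^2}{s_x^2}} \right) (n - 1)$$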
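The equal-sample-size simplification follows from the general expression for v in a few algebraic steps. The derivation below is a worked sketch of those steps, using only the symbols already defined in this subsection.

```latex
\begin{aligned}
v &= \frac{\left(\frac{s_x^2}{n} + \frac{s_y^2}{n}\right)^2}
          {\frac{(s_x^2/n)^2 + (s_y^2/n)^2}{n-1}}
   = (n-1)\,\frac{(s_x^2 + s_y^2)^2}{s_x^4 + s_y^4} \\
  &= (n-1)\left(1 + \frac{2 s_x^2 s_y^2}{s_x^4 + s_y^4}\right)
   = \left(1 + \frac{2}{\frac{s_x^2}{s_y^2} + \frac{s_y^2}{s_x^2}}\right)(n-1)
\end{aligned}
```

Since $\frac{s_x^2}{s_y^2} + \frac{s_y^2}{s_x^2} \ge 2$, with equality only when the two sample variances coincide, v lies between n − 1 and 2(n − 1) in the equal-sample-size case.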

6.9 EXERCISES

1. A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are unknown and they are assumed to be unequal. Sample variances of wages in Ankara and Istanbul are calculated as 490000 and 640000, respectively. Calculate/estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

Solution: The relevant distribution is t and the degrees of freedom is v:

$$v = \frac{\left( \frac{s_x^2}{n_x} + \frac{s_y^2}{n_y} \right)^2}{\frac{\left( \frac{s_x^2}{n_x} \right)^2}{n_x - 1} + \frac{\left( \frac{s_y^2}{n_y} \right)^2}{n_y - 1}}$$

$$\rightarrow v = \frac{\left( \frac{490000}{49} + \frac{640000}{81} \right)^2}{\frac{\left( \frac{490000}{49} \right)^2}{49 - 1} + \frac{\left( \frac{640000}{81} \right)^2}{81 - 1}} \rightarrow 112$$

The 95% confidence interval for $\mu_x - \mu_y$ is:

$$\bar{x} - \bar{y} \pm t_{v,1-\frac{\alpha}{2}} \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} \rightarrow 6000 - 7000 \pm 1.982 \sqrt{\frac{490000}{49} + \frac{640000}{81}}$$

→ [−1265.125, −734.875]
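The degrees-of-freedom formula is the error-prone part of this case, so a sketch may help. The helper names are hypothetical, and the t quantile 1.982 is the tabulated value from the solution, entered by hand; the resulting limits therefore match the ones above to about one decimal place. The second helper checks the equal-sample-size shortcut against the general formula.

```python
from math import sqrt

def welch_df(s2_x, s2_y, n_x, n_y):
    """Approximate degrees of freedom v for unequal, unknown variances."""
    a, b = s2_x / n_x, s2_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))

def equal_n_df(s2_x, s2_y, n):
    """Shortcut for v when both samples have the same size n."""
    r = s2_x / s2_y
    return (1 + 2 / (r + 1 / r)) * (n - 1)

v = welch_df(490000, 640000, 49, 81)

T_V_0975 = 1.982  # tabulated t quantile for v rounded to 112
se = sqrt(490000 / 49 + 640000 / 81)
diff = 6000 - 7000
lcl, ucl = diff - T_V_0975 * se, diff + T_V_0975 * se
print(round(v, 1), round(lcl, 1), round(ucl, 1))
```

Here v comes out below the 128 degrees of freedom of the pooled case, which is the price paid for dropping the equal-variance assumption.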

6.4.5 Confidence interval estimation
Difference between two population proportions

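Let,

$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim \text{Bernoulli}(P_x)$$

$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim \text{Bernoulli}(P_y)$$

Then, a $(1-\alpha)100\%$ confidence interval for $P_x - P_y$ is:

$$\hat{p}_x - \hat{p}_y \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}$$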

6.10 EXERCISES

1. A political candidate wonders how her support rates in Ankara and Istanbul compare. We know that among 64 people from Ankara 35 support the candidate and among 81 people from Istanbul 45 support the candidate. Calculate/estimate a 95% confidence interval for the difference of population support rates in Ankara and Istanbul.

Solution: $\hat{p}_x = 35/64 = 0.547$ and $\hat{p}_y = 45/81 = 0.556$, and the 95% confidence interval for $P_x - P_y$ is:

$$\hat{p}_x - \hat{p}_y \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}$$

$$\rightarrow 0.547 - 0.556 \pm 1.96 \sqrt{\frac{0.547(1-0.547)}{64} + \frac{0.556(1-0.556)}{81}}$$

→ [−0.172, 0.154]
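A sketch of the two-proportion interval, with a hypothetical helper name and the exercise's counts as inputs:

```python
from math import sqrt
from statistics import NormalDist

def proportion_diff_interval(succ_x, n_x, succ_y, n_y, level=0.95):
    """Normal-approximation CI for the difference of two population proportions."""
    p_x, p_y = succ_x / n_x, succ_y / n_y
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    diff = p_x - p_y
    return diff - z * se, diff + z * se

lcl, ucl = proportion_diff_interval(35, 64, 45, 81)
print(round(lcl, 3), round(ucl, 3))
```

The interval contains 0, so these samples do not indicate a difference between the support rates in the two cities at the 95% level.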

6.11 EXERCISES

1. Journal A of city X reports a 90% CI for the population mean income in city X ($\mu_x$) as [3400, 6400], and Journal B of city Y reports one for city Y ($\mu_y$) as [3800, 6800]. Each journal notes that the population is distributed normally with a known variance. The journals report the odds for the approval of Mr. Doe, a political candidate, in their respective cities of X and Y as 750/500 and 80/60, where the fractions are (sample count of approvals)/(sample count of disapprovals).

i. Estimate a 95% CI for $\mu_x$

ii. Estimate a 99% CI for $\mu_x - \mu_y$

iii. Test: $H_0 : \mu_x - \mu_y = 0$ against $H_1 : \mu_x - \mu_y < 0$ at α = 0.05

iv. Estimate a 90% CI for the popularity (share of approvals) of Mr. Doe in city X ($p_x$)

v. Estimate a 95% CI for $p_x - p_y$

Interpret your result clearly in each case.

Solution: To come up with solutions to parts (i) to (v), we need to find/calculate $\bar{x}$, $\bar{y}$, $\hat{p}_x$, $\hat{p}_y$, $n_x$ and $n_y$ from the given information. In that,

$$\bar{x} = \frac{3400 + 6400}{2} = 4900$$

$$\bar{y} = \frac{3800 + 6800}{2} = 5300$$

$$\frac{\sigma_x}{\sqrt{n_x}} = \frac{4900 - 3400}{1.65} = 909.1$$

$$\frac{\sigma_y}{\sqrt{n_y}} = \frac{5300 - 3800}{1.65} = 909.1$$

These are sufficient to solve parts (i), (ii) and (iii). To solve (iv) and (v), we need the following:

$$\hat{p}_x = \frac{750}{750 + 500} = 0.60, \quad n_x = 1250$$

$$\hat{p}_y = \frac{80}{80 + 60} = 0.57, \quad n_y = 140$$

Notice that $n_x$ and $n_y$ need not be known explicitly to solve parts (i), (ii) and (iii).
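The recovery step used in this solution can be sketched as follows. The helper names are hypothetical; 1.65 is the rounded 90% quantile used in the solution. The sketch stops at the recovered quantities and leaves parts (i) to (v) to the reader.

```python
def recover_from_ci(lower, upper, z_used):
    """Recover the sample mean and the standard error sigma/sqrt(n) from a z-interval."""
    center = (lower + upper) / 2
    se = (upper - lower) / 2 / z_used  # margin of error divided by the quantile
    return center, se

def recover_from_odds(approvals, disapprovals):
    """Recover the sample proportion and the sample size from reported counts."""
    n = approvals + disapprovals
    return approvals / n, n

x_bar, se_x = recover_from_ci(3400, 6400, z_used=1.65)
y_bar, se_y = recover_from_ci(3800, 6800, z_used=1.65)
p_x, n_x = recover_from_odds(750, 500)
p_y, n_y = recover_from_odds(80, 60)
print(x_bar, round(se_x, 1), y_bar, round(se_y, 1))
print(round(p_x, 2), n_x, round(p_y, 2), n_y)
```

Because the standard error already bundles σ and n together, a new interval at another confidence level only needs the new quantile multiplied by this recovered value.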

2. 

A researcher wants to test whether three populations’ means are equal to each other; i.e. whether (A) H0 : μ1 = μ2 = μ3. She picks α as 0.10 and collects data from each population. She, then, tests separately (one at a time):

(B) H0 : μ1 = μ2 (C) H0 : μ1 = μ3 (D) H0 : μ2 = μ3

against their two-sided alternatives. She fails to reject H0 every time at α = 0.10. So, she concludes: Failure to reject H0 in all B, C and D at α = 0.10 is equivalent to failure to reject H0 in A at α = 0.10; so, all three means are equal with a confidence of 90%.

Explain why her conclusion is wrong.

Solution: This question requires (1) obtaining the confidence levels for the tests B, C and D, (2) noticing that these confidence levels are nothing but simple probabilities, (3) multiplying the individual confidence levels (treating the three tests as independent) to obtain the joint confidence level, and (4) subtracting the joint confidence level from 1 to find the joint significance level (call it α′). In that,

$$\alpha' = 1 - (1 - \alpha)^3$$

and for α = 0.10, α′ becomes 0.271; so, ‘straightforwardly joining/merging the conclusions of separate tests of hypotheses’ is not allowed in our professional practice.
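The growth of the joint significance level is easy to tabulate. The sketch below uses a hypothetical helper name and treats the separate tests as independent, which is an assumption of the multiplication step.

```python
def joint_alpha(alpha, k):
    """Joint significance level of k separate tests, each run at level alpha,
    under the assumption that the tests are independent."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 3, 6):
    print(k, round(joint_alpha(0.10, k), 3))
```

For k = 3 and α = 0.10 this gives the value 0.271 found above, and the joint level keeps rising as more separate tests are combined.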

In a nutshell

A t random variable with m degrees of freedom, denoted $t_{(m)}$, is found by:

$$t = \frac{Z}{\sqrt{\frac{\chi^2_{(m)}}{m}}} \sim t_{(m)}$$

if the numerator and denominator are independent random variables. Here, consider the meaning of:

$$\frac{\chi^2_{(m)}}{m}$$

Previously we said $\chi^2$ should be something related to variance. Based on the definition of $\chi^2_{(m)}$, do you think the fraction above is the variance of something? Reveal what this something is.
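The construction above can be tried out numerically. The sketch below rests on two assumptions not stated in this box: $\chi^2_{(m)}$ is generated as the sum of m squared independent standard normal draws (its usual definition), and 2.365 is the tabulated 0.975 quantile of t with 7 degrees of freedom. The seed, the number of draws and the function name are arbitrary choices.

```python
import random
from math import sqrt

def draw_t(m, rng):
    """One draw of Z / sqrt(chi2(m) / m), with Z and chi2 generated independently."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))
    return z / sqrt(chi2 / m)

rng = random.Random(1)
m = 7
T_7_0975 = 2.365  # tabulated quantile of t with 7 degrees of freedom

draws = [draw_t(m, rng) for _ in range(20000)]
share = sum(abs(t) <= T_7_0975 for t in draws) / len(draws)
print(f"share of draws within +/- {T_7_0975}: {share:.3f}")
```

If the construction behaves like a t variable with 7 degrees of freedom, about 95% of the draws should fall between −2.365 and 2.365.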

Checkpoint No: 75