On the surface, the problem of interval estimation seems to be a straightforward one. Although this is true from a computational viewpoint, it is not true from a more philosophical angle. We’ll be discussing these issues in our lectures. The problem of interval estimation is to find an interval of real numbers that contains an unknown population parameter of interest with a given or chosen probability.
Often, but not always, a confidence interval is symmetric around a mean. We’ll formalize our discussion after our introductory exercise. Before that exercise, note that a confidence interval estimator for a population parameter is a rule for determining, based on sample information, an interval that is likely to include the parameter. The corresponding estimate is called a confidence interval estimate.
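Consider $\{x_i\}_{i=1}^{n}$, a random sample of $n$ observations from a normal population $N(\mu, \sigma^2)$. If the sample mean is $\bar{x}$, then a $100(1-\alpha)\%$ confidence interval for $\mu$ with known $\sigma^2$ is:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Here,
$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
is called the margin of error (or sampling error),
$$w = 2\,ME$$
is called the width,
$$UCL = \bar{x} + ME$$
is called the upper confidence limit, and
$$LCL = \bar{x} - ME$$
is called the lower confidence limit.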
Note that the definitions of ME, w, UCL and LCL will not be repeated in the other cases, to save some space. Think about the ways to reduce the margin of error. Is everything under your control?
6.1 EXERCISES____________________________________________
A researcher wants to estimate a 95% confidence interval for the mean wage rate of workers in Ankara, for which the population variance is known to be 1000000. She uses a sample of 64 workers and measures their mean wage rate as 7000. Calculate/estimate the confidence interval requested.
Solution: The population variance is known, so the relevant distribution is z and the 95% confidence interval for $\mu$ is:
$$\bar{x} \pm z_{0.025}\frac{\sigma}{\sqrt{n}} = 7000 \pm 1.96 \times \frac{1000}{\sqrt{64}} = 7000 \pm 245$$
$$6755 < \mu < 7245$$
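As a quick numerical check of the known-variance interval, the following minimal Python sketch recomputes the margin of error and the confidence limits from the exercise data. The helper name `z_interval` is hypothetical (introduced here for illustration), and the z quantile is taken from `statistics.NormalDist` in the standard library. The last lines illustrate one way to reduce the margin of error: with σ and the confidence level fixed, quadrupling the sample size halves it.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, variance, n, level=0.95):
    """Return (LCL, UCL, ME) for a normal mean with known population variance.

    Hypothetical helper for illustration, not part of the course material.
    """
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    me = z * sqrt(variance / n)              # margin of error
    return mean - me, mean + me, me


# Exercise data: known variance 1000000, n = 64, sample mean 7000.
lcl, ucl, me = z_interval(mean=7000, variance=1_000_000, n=64)
print(f"ME = {me:.1f}, CI = ({lcl:.1f}, {ucl:.1f})")

# Quadrupling n halves the margin of error (sigma and the level are fixed).
_, _, me_4n = z_interval(mean=7000, variance=1_000_000, n=256)
print(f"ME with n = 256: {me_4n:.1f}")
```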
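Consider $\{x_i\}_{i=1}^{n}$, a random sample of $n$ observations from a normal population $N(\mu, \sigma^2)$. If the sample mean is $\bar{x}$, then a $100(1-\alpha)\%$ confidence interval for $\mu$ with unknown $\sigma^2$ is:
$$\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
where,
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
is the sample standard deviation. Go over the description of the t-distribution in Chapter 10.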
Go to the Teaching page and, using the file named ‘Confidence intervals.xlsx’, experiment on the worksheet ‘t vs z’ to observe what happens as the degrees of freedom increase.
6.2 EXERCISES____________________________________________
A researcher wants to estimate a 95% confidence interval for the mean wage rate of workers in Ankara, for which the population variance is unknown. She uses a sample of 64 workers and measures their mean wage rate as 7000, the sample variance being calculated as 640000. Calculate/estimate the confidence interval requested.
Solution: The population variance is unknown, so the relevant distribution is t, the degrees of freedom is 63 and the 95% confidence interval for $\mu$ is:
$$\bar{x} \pm t_{63,0.025}\frac{s}{\sqrt{n}} \approx 7000 \pm 1.998 \times \frac{800}{\sqrt{64}} = 7000 \pm 199.8$$
$$6800.2 < \mu < 7199.8$$
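The unknown-variance interval can be checked with a similar minimal sketch. Python's standard library has no t quantile function, so the critical value $t_{63,0.025} \approx 1.998$ is typed in as an assumed table value; the helper name `mean_interval` is also hypothetical. The z-based interval on the same data is printed only for comparison, to show that the t interval is slightly wider.

```python
from math import sqrt
from statistics import NormalDist

# Assumed table value of t_{63, 0.025}; entered by hand because the
# standard library does not provide a t quantile function.
T_63_025 = 1.998


def mean_interval(mean, variance, n, crit):
    """Return (LCL, UCL) as mean -/+ crit * sqrt(variance / n)."""
    me = crit * sqrt(variance / n)
    return mean - me, mean + me


# Exercise data: sample variance 640000, n = 64, sample mean 7000.
t_lcl, t_ucl = mean_interval(7000, 640_000, 64, T_63_025)

# Same data with the z critical value, for comparison only.
z_crit = NormalDist().inv_cdf(0.975)
z_lcl, z_ucl = mean_interval(7000, 640_000, 64, z_crit)

print(f"t interval: ({t_lcl:.1f}, {t_ucl:.1f})")
print(f"z interval: ({z_lcl:.1f}, {z_ucl:.1f})")
```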
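Consider $\{x_i\}_{i=1}^{n}$, a random sample of $n$ observations from a Bernoulli population. Notice that each $x_i$ is either 1 or 0. The sample mean $\bar{x}$ in this case is nothing but the observed proportion of successes, denoted as $\hat{p}$. Then, if $n$ is sufficiently large, a $100(1-\alpha)\%$ confidence interval for $p$ is:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$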
6.3 EXERCISES ___________________________________________________________
A political candidate wants to know her nationwide support rate. Among a sample of 64 people, we know 35 support the political candidate. Calculate/estimate a 95% confidence interval for the candidate’s nation-wide support rate.
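Solution: The relevant distribution is z.
$$\hat{p} = \frac{35}{64} \approx 0.547$$
The 95% confidence interval for P is:
$$\hat{p} \pm z_{0.025}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \approx 0.547 \pm 1.96\sqrt{\frac{0.547 \times 0.453}{64}} \approx 0.547 \pm 0.122$$
$$0.425 < P < 0.669$$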
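The proportion interval above can be reproduced with a short sketch based on the normal approximation. The function name `proportion_interval` is a hypothetical helper, and the only inputs are the exercise's 35 supporters out of 64 respondents.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, level=0.95):
    """Return (p_hat, LCL, UCL) using the normal approximation.

    Hypothetical helper for illustration.
    """
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    me = z * sqrt(p_hat * (1 - p_hat) / n)         # margin of error
    return p_hat, p_hat - me, p_hat + me


# Exercise data: 35 supporters in a sample of 64.
p_hat, lcl, ucl = proportion_interval(successes=35, n=64)
print(f"p_hat = {p_hat:.3f}, CI = ({lcl:.3f}, {ucl:.3f})")
```

Since the interval contains 0.5, this sample alone does not settle whether the candidate has majority support.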
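Consider $\{x_i\}_{i=1}^{n}$, a random sample of $n$ observations from a normal population $N(\mu, \sigma^2)$. If the observed sample variance is $s^2$, then a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is:
$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$
where
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
is the sample variance. Go over the description of the $\chi^2$-distribution in Chapter 10.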
Go to the Teaching page and experiment with the $\chi^2_n$ distribution using the file named ‘Statistical distributions.xlsx’.
6.4 EXERCISES ___________________________________________________________
A process engineer is concerned with the variation of temperature in an industrial furnace. She collects a random sample of temperatures as:
Calculate/estimate a 95% confidence interval for the (population) variance of temperatures in this furnace.
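Solution: $s^2$ for the 9 temperature readings is 3125, the relevant distribution is $\chi^2$ and the degrees of freedom is 8. The 95% confidence interval for $\sigma^2$ is:
$$\frac{8 \times 3125}{\chi^2_{8,0.025}} < \sigma^2 < \frac{8 \times 3125}{\chi^2_{8,0.975}}, \qquad \chi^2_{8,0.025} \approx 17.535, \quad \chi^2_{8,0.975} \approx 2.180$$
$$1425.7 < \sigma^2 < 11467.9$$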
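A minimal sketch of the variance interval follows. The two chi-square critical values for 8 degrees of freedom are assumed table values entered by hand (the standard library has no chi-square quantile function), and `variance_interval` is a hypothetical helper name. Note how the larger critical value produces the lower limit.

```python
# Assumed chi-square table values for 8 degrees of freedom.
CHI2_8_UPPER = 17.535  # value exceeded with probability 0.025
CHI2_8_LOWER = 2.180   # value exceeded with probability 0.975


def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Return (LCL, UCL) for a normal population variance.

    The larger critical value goes in the denominator of the lower limit.
    """
    scaled = (n - 1) * s2
    return scaled / chi2_upper, scaled / chi2_lower


# Exercise data: s^2 = 3125 from n = 9 readings.
lcl, ucl = variance_interval(3125, 9, CHI2_8_UPPER, CHI2_8_LOWER)
print(f"CI for the variance: ({lcl:.1f}, {ucl:.1f})")
```

The interval is not symmetric around $s^2 = 3125$, which reflects the skewness of the chi-square distribution.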
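When n << N, our procedures work seamlessly. However, when n is considerably large relative to N, i.e. when the sampling fraction is not negligible (a common rule of thumb is)
$$n > 0.05\,N$$
we need to use a factor of
$$\frac{N-n}{N-1}$$
to correct the relevant variances involved. This factor is called the finite population correction (fpc) factor. Observe that, in a very intuitive manner, fpc = 1 for n = 1 and fpc = 0 for n = N.
Mean of a normally distributed population, known population variance:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
Population proportion:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\cdot\frac{N-n}{N-1}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\cdot\frac{N-n}{N-1}}$$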
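The effect of the finite population correction can be sketched as follows. The helper names `fpc` and `z_interval_fpc` are hypothetical, and the population size N = 640 is an assumed number chosen only for illustration; the remaining inputs reuse the wage figures from Exercise 6.1.

```python
from math import sqrt
from statistics import NormalDist


def fpc(n, N):
    """Finite population correction factor applied to the variance."""
    return (N - n) / (N - 1)


def z_interval_fpc(mean, variance, n, N, level=0.95):
    """Known-variance CI for the mean with the fpc applied (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    me = z * sqrt(variance / n * fpc(n, N))
    return mean - me, mean + me


# Boundary cases mentioned in the text.
print(fpc(1, 500), fpc(500, 500))  # 1.0 0.0

# Assumed population size N = 640, so the sample is 10% of the population.
lcl, ucl = z_interval_fpc(mean=7000, variance=1_000_000, n=64, N=640)
print(f"CI with fpc: ({lcl:.1f}, {ucl:.1f})")
```

Because the factor is below 1 whenever n > 1, the corrected interval is narrower than the uncorrected one.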
6.5 EXERCISES ___________________________________________________________
A random sample of size 9 from a normally distributed population with variance 9 yielded a sample mean of 7 and a sample variance of 4.
i. Construct a 90% confidence interval for the population mean of the population that the sample is taken from.
ii. What is the probability of a random sample of size 9 yielding a sample variance of 4 or less, given that the population variance is 9?
iii. What is the minimum sample size required if we would like the 90% confidence interval to be at most of length 2?
A random sample of 100 consumers were asked whether they made their purchasing decisions based on price or on quality. 64% of the consumers in the sample stated that they mainly base their buying decisions on price. Based on this information, construct a 95% confidence interval for the percentage of consumers in the population who base their buying decisions on price.
Solution: $n = 100$ and $\hat{p} = 0.64$ are given. A 95% confidence interval for P is:
$$0.64 \pm 1.96\sqrt{\frac{0.64 \times 0.36}{100}} = 0.64 \pm 1.96 \times 0.048 \approx 0.64 \pm 0.094, \qquad 0.546 < P < 0.734$$
Two statisticians, using the same sample data reported the following
different confidence intervals for the population mean: and
. Given that they used the same sample and based their
confidence intervals on the same estimators (the sample mean and sample
variance), what is the source of the difference in the confidence
interval?
A researcher has a strange habit of using a sample size of where N
is the size of the population of interest. Under which value of N does
the researcher need to use a correction factor for the standard
deviation of the sampling distribution while estimating a CI for
μ?
Solution: This exercise is left as self-study.
In scientific research, we often need to compare selected parameters of two populations, rather than only comparing a single population parameter to a given value. Although the problem gets slightly more complicated, its essence is unchanged. So, while considering the confidence interval estimation and hypothesis testing problems involving two populations, we’ll first maintain a mechanical approach in what follows. Through the following pages, notice that we use two samples every time:
$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x}$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y}$$
where $n_x$ and $n_y$ are their respective sample sizes.
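Let $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$ be two matched samples. We can then create:
$$d_i = x_i - y_i, \qquad i = 1, \dots, n$$
Then, a $100(1-\alpha)\%$ confidence interval for $\mu_d = \mu_x - \mu_y$ is:
$$\bar{d} - t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}}$$
where
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$$
and,
$$s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2}$$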
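The matched-pairs procedure can be sketched in a few lines. The two data lists below are hypothetical numbers made up purely for illustration (they are not data from these notes), `paired_interval` is a hypothetical helper name, and $t_{7,0.025} \approx 2.365$ is an assumed table value for a sample of 8 pairs.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements, made up for illustration only.
before = [80, 85, 90, 95, 100, 105, 110, 115]
after = [78, 84, 87, 93, 97, 104, 106, 113]

T_7_025 = 2.365  # assumed table value of t_{7, 0.025}


def paired_interval(x, y, t_crit):
    """Return (d_bar, LCL, UCL) for the mean of the paired differences."""
    d = [xi - yi for xi, yi in zip(x, y)]  # d_i = x_i - y_i
    d_bar = mean(d)
    s_d = stdev(d)                         # uses the n - 1 denominator
    me = t_crit * s_d / sqrt(len(d))
    return d_bar, d_bar - me, d_bar + me


d_bar, lcl, ucl = paired_interval(before, after, T_7_025)
print(f"d_bar = {d_bar:.2f}, CI = ({lcl:.3f}, {ucl:.3f})")
```

With these made-up numbers the whole interval lies above zero, which is the pattern one would look for before calling a treatment effective.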
6.6 EXERCISES____________________________________________
A company is about to release a new drug to assist weight loss, and we are in charge of assessing how effective the drug is. We pick a random sample of 8 people with the following pre-drug body weights:
After using the drug for the designated test duration, the post-drug body weights are measured as:
Calculate/estimate a 95% confidence interval for the pre-drug minus post-drug difference of mean body weights. Is the drug effective?
Solution: The difference series (pre-drug minus post-drug) is:
The relevant distribution is t, the degrees of freedom is 7 and the 95% confidence interval for μx −μy is:
$$\bar{d} \pm t_{7,0.025}\frac{s_d}{\sqrt{8}}, \qquad t_{7,0.025} \approx 2.365$$
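Let,
$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$$
where $\sigma_x^2$ and $\sigma_y^2$ are known. Then a $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$ is:
$$(\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$$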
6.7 EXERCISES ___________________________________________________________
A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variance of wages in Ankara and Istanbul are known to be 640000 and 810000, respectively. Calculate/ estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.
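Solution: Taking x as Ankara and y as Istanbul, the relevant distribution is z and the 95% confidence interval for $\mu_x - \mu_y$ is:
$$(6000 - 7000) \pm 1.96\sqrt{\frac{640000}{49} + \frac{810000}{81}} \approx -1000 \pm 1.96 \times 151.86 \approx -1000 \pm 298$$
$$-1298 < \mu_x - \mu_y < -702$$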
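The two-sample interval with known variances can be checked numerically with the sketch below. The function name `diff_means_z_interval` is a hypothetical helper, x is taken to be Ankara and y Istanbul, and all numbers come from the exercise.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_z_interval(mean_x, var_x, n_x, mean_y, var_y, n_y, level=0.95):
    """Return (LCL, UCL) for mu_x - mu_y with known population variances."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(var_x / n_x + var_y / n_y)  # standard error of x_bar - y_bar
    diff = mean_x - mean_y
    return diff - z * se, diff + z * se


# Exercise data: Ankara (x) and Istanbul (y).
lcl, ucl = diff_means_z_interval(6000, 640_000, 49, 7000, 810_000, 81)
print(f"CI for mu_x - mu_y: ({lcl:.1f}, {ucl:.1f})")
```

The whole interval is negative, so the data are consistent with a lower mean wage in Ankara than in Istanbul.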
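Let,
$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$$
where $\sigma_x^2$ and $\sigma_y^2$ are unknown but assumed to be equal. Then a $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$ is:
$$(\bar{x} - \bar{y}) - t_{n_x+n_y-2,\alpha/2}\, s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{n_x+n_y-2,\alpha/2}\, s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}$$
In our formulation:
$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$$
and
$$s_x^2 = \frac{1}{n_x - 1}\sum_{i=1}^{n_x}(x_i - \bar{x})^2$$
$$s_y^2 = \frac{1}{n_y - 1}\sum_{i=1}^{n_y}(y_i - \bar{y})^2$$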
6.8 EXERCISES____________________________________________
A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are unknown but they are assumed to be equal. Sample variance of wages in Ankara and Istanbul are calculated as 490000 and 640000, respectively. Calculate/ estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.
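Solution:
$$s_p^2 = \frac{48 \times 490000 + 80 \times 640000}{49 + 81 - 2} = \frac{74720000}{128} = 583750$$
$$s_p = \sqrt{583750} \approx 764.04$$
The relevant distribution is t, the degrees of freedom is 128 and the 95% confidence interval for $\mu_x - \mu_y$ is:
$$(6000 - 7000) \pm t_{128,0.025}\, s_p\sqrt{\frac{1}{49} + \frac{1}{81}} \approx -1000 \pm 1.979 \times 138.28 \approx -1000 \pm 274$$
$$-1274 < \mu_x - \mu_y < -726$$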
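The pooled-variance calculation can be sketched as below. The helper names `pooled_variance` and `pooled_interval` are hypothetical, and $t_{128,0.025} \approx 1.979$ is an assumed table value entered by hand because the standard library has no t quantile function.

```python
from math import sqrt

T_128_025 = 1.979  # assumed table value of t_{128, 0.025}


def pooled_variance(s2_x, n_x, s2_y, n_y):
    """Weighted average of the two sample variances (weights: n - 1)."""
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)


def pooled_interval(mean_x, s2_x, n_x, mean_y, s2_y, n_y, t_crit):
    """Return (LCL, UCL) for mu_x - mu_y assuming equal population variances."""
    sp2 = pooled_variance(s2_x, n_x, s2_y, n_y)
    me = t_crit * sqrt(sp2 * (1 / n_x + 1 / n_y))
    diff = mean_x - mean_y
    return diff - me, diff + me


# Exercise data: Ankara (x) and Istanbul (y).
sp2 = pooled_variance(490_000, 49, 640_000, 81)
lcl, ucl = pooled_interval(6000, 490_000, 49, 7000, 640_000, 81, T_128_025)
print(f"s_p^2 = {sp2:.0f}, CI = ({lcl:.1f}, {ucl:.1f})")
```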
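Let,
$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim N(\mu_x, \sigma_x^2)$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim N(\mu_y, \sigma_y^2)$$
where $\sigma_x^2$ and $\sigma_y^2$ are unknown and assumed not to be equal. Then a $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$ is:
$$(\bar{x} - \bar{y}) - t_{\nu,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{\nu,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$
In our formulation:
$$\nu = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}}$$
Notice that, if $n_x = n_y = n$:
$$\nu = (n - 1)\left(1 + \frac{2}{\dfrac{s_x^2}{s_y^2} + \dfrac{s_y^2}{s_x^2}}\right)$$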
6.9 EXERCISES ___________________________________________________________
A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: Mean wage rate of 49 workers from Ankara is 6000. Mean wage rate of 81 workers from Istanbul is 7000. Population variances of wages in Ankara and Istanbul are unknown and they are assumed to be unequal. Sample variance of wages in Ankara and Istanbul are calculated as 490000 and 640000, respectively. Calculate/ estimate a 95% confidence interval for the difference of (population) means of wages in Ankara and Istanbul.
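Solution: The relevant distribution is t and the degrees of freedom is $\nu$:
$$\nu = \frac{\left(\dfrac{490000}{49} + \dfrac{640000}{81}\right)^2}{\dfrac{(490000/49)^2}{48} + \dfrac{(640000/81)^2}{80}}$$
$$\nu \approx 111.9$$
The 95% confidence interval for $\mu_x - \mu_y$ is:
$$(6000 - 7000) \pm t_{\nu,0.025}\sqrt{\frac{490000}{49} + \frac{640000}{81}} \approx -1000 \pm 1.981 \times 133.80 \approx -1000 \pm 265$$
$$-1265 < \mu_x - \mu_y < -735$$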
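The degrees-of-freedom formula for the unequal-variance case is easy to get wrong by hand, so a minimal sketch may help. The helper name `welch_df` is hypothetical, and the critical value 1.981 is an assumed table value for roughly 112 degrees of freedom. The last lines check the equal-sample-size shortcut on arbitrary illustrative inputs.

```python
from math import sqrt


def welch_df(s2_x, n_x, s2_y, n_y):
    """Approximate degrees of freedom for unequal population variances."""
    a, b = s2_x / n_x, s2_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))


# Exercise data: Ankara (x) and Istanbul (y).
nu = welch_df(490_000, 49, 640_000, 81)
se = sqrt(490_000 / 49 + 640_000 / 81)

T_NU_025 = 1.981  # assumed table value for about 112 degrees of freedom
lcl = (6000 - 7000) - T_NU_025 * se
ucl = (6000 - 7000) + T_NU_025 * se
print(f"nu = {nu:.1f}, CI = ({lcl:.1f}, {ucl:.1f})")

# Equal sample sizes: the general formula matches the shortcut.
# The inputs 4.0, 9.0 and n = 10 are arbitrary illustrative values.
general = welch_df(4.0, 10, 9.0, 10)
shortcut = (10 - 1) * (1 + 2 / (4.0 / 9.0 + 9.0 / 4.0))
print(f"general = {general:.4f}, shortcut = {shortcut:.4f}")
```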
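Let,
$$\{x_i\}_{i=1}^{n_x} \subset \{x_i\}_{i=1}^{N_x} \sim \text{Bernoulli}(p_x)$$
$$\{y_i\}_{i=1}^{n_y} \subset \{y_i\}_{i=1}^{N_y} \sim \text{Bernoulli}(p_y)$$
Then, a $100(1-\alpha)\%$ confidence interval for $p_x - p_y$ is:
$$(\hat{p}_x - \hat{p}_y) - z_{\alpha/2}\sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}} < p_x - p_y < (\hat{p}_x - \hat{p}_y) + z_{\alpha/2}\sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}$$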
6.10 EXERCISES__________________________________________
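A political candidate wonders how her support rates in Ankara and Istanbul compare. We know that among 64 people from Ankara 35 support the candidate and among 81 people from Istanbul 45 support the candidate. Calculate/estimate a 95% confidence interval for the difference of population support rates in Ankara and Istanbul.
Solution: $\hat{p}_x = 35/64 \approx 0.547$ and $\hat{p}_y = 45/81 \approx 0.556$, and the 95% confidence interval for $P_x - P_y$ is:
$$(0.547 - 0.556) \pm 1.96\sqrt{\frac{0.547 \times 0.453}{64} + \frac{0.556 \times 0.444}{81}} \approx -0.009 \pm 1.96 \times 0.0832 \approx -0.009 \pm 0.163$$
$$-0.172 < P_x - P_y < 0.154$$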
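The interval for the difference of two proportions can be checked with the following sketch. The function name `diff_proportions_interval` is a hypothetical helper; x is Ankara and y is Istanbul, with the counts taken from the exercise.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(succ_x, n_x, succ_y, n_y, level=0.95):
    """Return (LCL, UCL) for p_x - p_y using the normal approximation."""
    p_x, p_y = succ_x / n_x, succ_y / n_y
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    diff = p_x - p_y
    return diff - z * se, diff + z * se


# Exercise data: 35 of 64 in Ankara (x), 45 of 81 in Istanbul (y).
lcl, ucl = diff_proportions_interval(35, 64, 45, 81)
print(f"CI for p_x - p_y: ({lcl:.3f}, {ucl:.3f})")
```

The interval contains zero, so these samples do not indicate a difference in support between the two cities at the 95% level.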
6.11 EXERCISES__________________________________________
Journal A of city X reports a 90% CI for the population mean income in
city X as
and Journal B in city Y
as
.
Each journal notes that the population is distributed normally with a
known variance. The journals report the odds for the approval of Mr.
Doe, a political candidate, in their respective cities of X and Y as
750/500 and 80/60 where the fractions are the (sample count of
approvals/sample count of disapprovals).
v. Estimate a 95% CI for px −py
Interpret your result clearly in each case.
Solution: To come up with solutions to parts (i) to (v), we need to find/calculate $\bar{x}$, $\bar{y}$, $\hat{p}_x$, $\hat{p}_y$, $n_x$ and $n_y$ from the given information. In that,
These are sufficient to solve parts (i), (ii) and (iii). To solve (iv) and (v), we need the following:
Notice that $n_x$ and $n_y$ need not be known explicitly while solving parts (i), (ii) and (iii).
A researcher wants to test whether three populations’ means are equal to each other; i.e. whether (A) H0 : μ1 = μ2 = μ3. She picks α as 0.10 and collects data from each population. She, then, tests separately (one at a time):
(B) H0 : μ1 = μ2 (C) H0 : μ1 = μ3 (D) H0 : μ2 = μ3
against their two-sided alternatives. She fails to reject H0 every time at α = 0.10. So, she concludes: Failure to reject H0 in all B, C and D at α = 0.10 is equivalent to failure to reject H0 in A at α = 0.10; so, all three means are equal with a confidence of 90%.
Explain why her conclusion is wrong.
Solution: This question requires (1) obtaining the confidence levels for the tests B, C and D, (2) noticing that these confidence levels are nothing but simple probabilities, (3) multiplying the individual confidence levels to obtain the joint confidence level, and (4) subtracting the joint confidence level from 1 to find the joint significance level (call it α′). In that,
$$\alpha' = 1 - (1 - \alpha)^3$$
and for α = 0.10, α′ becomes 0.271, so ‘straightforwardly joining/merging the conclusions of separate tests of hypotheses’ is not allowed in our professional practice.
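The joint significance level can be computed directly, as in the short sketch below. The function name `joint_significance` is a hypothetical helper, and multiplying the individual confidence levels treats the separate tests as independent, which is a simplifying assumption.

```python
def joint_significance(alpha, k):
    """Joint significance level of k separate tests, each run at level alpha.

    Multiplying the confidence levels assumes the tests are independent.
    """
    return 1 - (1 - alpha) ** k


# Three pairwise tests at alpha = 0.10, as in the question.
alpha_joint = joint_significance(0.10, 3)
print(round(alpha_joint, 3))  # 0.271
```

So the three pairwise tests together carry a joint confidence of about 73%, not 90%.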