Confidence intervals - Lecture Notes in Probability and Statistics

Confidence interval estimation: One population

On the surface, the problem of interval estimation seems to be a straightforward one. Although this is true from a computational viewpoint, it is not true from a more philosophical angle. We’ll be discussing these issues in our lectures. The problem of interval estimation is to find an interval of real numbers that contains an unknown population parameter of interest with a given (chosen) probability.

In a nutshell

Interval estimation of \(\mu\): Construction of the problem \[\begin{aligned} \operatorname{P}\!\left(L \leq \mu \leq U\right)=&1-\alpha\\ \operatorname{P}\!\left(\mu-K \leq \mu \leq \mu+K\right)=&1-\alpha\\ \end{aligned}\]

where \(L=\mu-K\) and \(U=\mu+K\).

\[\begin{aligned} \operatorname{P}\!\left(-\mu-K \leq -\mu \leq -\mu+K\right)=&1-\alpha\\ \operatorname{P}\!\left(\bar{x}_{n}-\mu-K \leq \bar{x}_{n}-\mu \leq \bar{x}_{n}-\mu+K\right)=&1-\alpha\\ \operatorname{P}\!\left(\underbrace{\frac{\bar{x}_{n}-\mu-K}{\sigma/\sqrt{n}}}_{-z_c\text{ (1)}} \leq \underbrace{\frac{\bar{x}_{n}-\mu}{\sigma/\sqrt{n}}}_{z} \leq\underbrace{\frac{\bar{x}_{n}-\mu+K}{\sigma/\sqrt{n}}}_{z_c\text{ (2)}}\right)=&1-\alpha \end{aligned}\]

Considering \((1)\) and \((2)\) simultaneously \(\rightarrow\) \(K=z_c\frac{\sigma}{\sqrt{n}}\). Then, \[\begin{aligned} \operatorname{P}\!\left(\underbrace{\mu}_{\text{Subs.}\bar{x}}-z_c\frac{\sigma}{\sqrt{n}}\leq\mu\leq\underbrace{\mu}_{\text{Subs.}\bar{x}}+z_c\frac{\sigma}{\sqrt{n}}\right)=&1-\alpha\\ \operatorname{P}\!\left(\bar{x}-z_c\frac{\sigma}{\sqrt{n}}\leq\mu\leq\bar{x}+z_c\frac{\sigma}{\sqrt{n}}\right)=&1-\alpha \end{aligned}\]

Often enough, but not always, a confidence interval is symmetric around the point estimate (e.g. the sample mean). We’ll formalize our discussion after our introductory exercise. Before that exercise, note that a confidence interval estimator for a population parameter is a rule for determining, based on sample information, an interval that is likely to include the parameter. The corresponding estimate is called a confidence interval estimate.
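What "likely to include the parameter" means can be checked by simulation. The sketch below assumes a normal population with known \(\sigma\); the values of \(\mu\), \(\sigma\), \(n\), the number of repetitions and the seed are arbitrary choices for illustration. It draws many samples, builds \(\bar{x} \pm z_{1-\frac{\alpha}{2}}\sigma/\sqrt{n}\) for each, and reports the share of intervals that contain \(\mu\). That share should be close to \(1-\alpha\), even though each single computed interval either contains \(\mu\) or does not.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage_rate(mu=50.0, sigma=10.0, n=25, conf=0.95, reps=10_000, seed=1):
    """Share of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu.

    All default values are illustrative assumptions, not taken from the notes.
    """
    rng = random.Random(seed)                      # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1 - alpha/2}
    me = z * sigma / math.sqrt(n)                  # margin of error
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - me <= mu <= xbar + me:           # did this interval catch mu?
            hits += 1
    return hits / reps


rate = coverage_rate()
print(f"empirical coverage: {rate:.3f}")  # should be close to 0.95
```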

Confidence interval estimation
Mean of a normal population
Case: Known population variance

Consider \(\left\{x_{i}\right\}_{i=1}^{n}\), a random sample of \(n\) observations from a normal population \(\operatorname{Normal}(\mu, \sigma^2)\). If the sample mean is \(\bar{x}\), then a \(\left(1-\alpha\right)100\%\) confidence interval for \(\mu\) with known \(\sigma^2\) is: \[\bar{x} \pm z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\] where \(z_{1-\frac{\alpha}{2}}\) is the \(\left(1-\frac{\alpha}{2}\right)\) quantile of the standard normal distribution. Here, \[\operatorname{ME} = z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\] is called the margin of error (or sampling error), \[w = 2\operatorname{ME}\] is called the width, \[\operatorname{UCL} = \bar{x} + z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\] is called the upper confidence limit, and \[\operatorname{LCL} = \bar{x} - z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\] is called the lower confidence limit.

Note that the definitions of \(\operatorname{ME}\), \(w\), \(\operatorname{UCL}\) and \(\operatorname{LCL}\) will not be repeated in the other cases to save space. Think about the ways to reduce the margin of error. Is everything under your control?

Exercise. A researcher wants to estimate a \(95 \%\) confidence interval for the mean wage rate of workers in Ankara, for which the population variance is known to be \(1000000\). She uses a sample of \(64\) workers and measures their mean wage rate as \(7000\). Calculate/estimate the confidence interval requested.

Solution. The population variance is known; so, the relevant distribution is \(z\), \(\sigma = \sqrt{1000000} = 1000\), and the \(95 \%\) confidence interval for \(\mu\) is: \[\bar{x} \pm z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} \rightarrow 7000 \pm 1.96 \frac{1000}{\sqrt{64}}\] \[\rightarrow [6755.005,7244.995]\]
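A minimal Python sketch of the known-variance interval, reproducing the exercise above. The function name `mean_ci_known_variance` is our own; `statistics.NormalDist().inv_cdf` from the standard library supplies \(z_{1-\frac{\alpha}{2}}\) (about 1.95996 for \(95\%\)), which is why the limits match the notes' three-decimal values rather than the ones the rounded 1.96 would give.

```python
import math
from statistics import NormalDist


def mean_ci_known_variance(xbar, sigma2, n, conf=0.95):
    """(1 - alpha)100% CI for mu when the population variance sigma2 is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{1 - alpha/2}
    me = z * math.sqrt(sigma2) / math.sqrt(n)      # margin of error (ME)
    return xbar - me, xbar + me                    # (LCL, UCL)


lcl, ucl = mean_ci_known_variance(7000, 1_000_000, 64)
print(round(lcl, 3), round(ucl, 3))  # 6755.005 7244.995
```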

Confidence interval estimation
Mean of a normal population
Case: Unknown population variance

Consider \(\left\{x_{i}\right\}_{i=1}^{n}\), a random sample of \(n\) observations from a normal population \(\operatorname{Normal}(\mu, \sigma^2)\). If the sample mean is \(\bar{x}\), then a \(\left(1-\alpha\right)100\%\) confidence interval for \(\mu\) with unknown \(\sigma^2\) is: \[\bar{x} \pm t_{n-1,1-\frac{\alpha}{2}}\frac{s}{\sqrt{n}}\] where \(t_{n-1,1-\frac{\alpha}{2}}\) is the \(\left(1-\frac{\alpha}{2}\right)\) quantile of the \(t\)-distribution with \(n-1\) degrees of freedom, and \[s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_{i} - \bar{x}\right)^2}{n-1}}\] is the sample standard deviation. Go over the description of the \(t\)-distribution in Chapter 10.

Exercise. A researcher wants to estimate a \(95 \%\) confidence interval for the mean wage rate of workers in Ankara, for which the population variance is unknown. She uses a sample of \(64\) workers and measures their mean wage rate as \(7000\), with a sample variance of \(640000\). Calculate/estimate the confidence interval requested.

Solution. The population variance is unknown; so, the relevant distribution is \(t\), the degrees of freedom is \(63\), \(s = \sqrt{640000} = 800\), and the \(95 \%\) confidence interval for \(\mu\) is: \[\bar{x} \pm t_{n-1,1-\frac{\alpha}{2}}\frac{s}{\sqrt{n}} \rightarrow 7000 \pm 1.998 \frac{800}{\sqrt{64}}\] \[\rightarrow [6800.166,7199.834]\]
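The unknown-variance case differs only in using \(s\) and a \(t\) critical value. Python's standard library has no \(t\) quantile function, so this sketch assumes the tabulated value is passed in as `t_crit` (with SciPy installed, `scipy.stats.t.ppf(0.975, 63)` would supply it). With the rounded table value 1.998 the limits come out as about \([6800.2, 7199.8]\), a hair away from the notes' interval, which used the unrounded quantile.

```python
import math


def mean_ci_unknown_variance(xbar, s2, n, t_crit):
    """(1 - alpha)100% CI for mu with unknown population variance.

    t_crit is t_{n-1, 1-alpha/2}, looked up in a t table (an assumption of this sketch).
    """
    s = math.sqrt(s2)                     # sample standard deviation
    me = t_crit * s / math.sqrt(n)        # margin of error
    return xbar - me, xbar + me


lcl, ucl = mean_ci_unknown_variance(7000, 640_000, 64, t_crit=1.998)
print(round(lcl, 3), round(ucl, 3))
```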

Confidence interval estimation
Population proportion

Consider \(\left\{x_{i}\right\}_{i=1}^{n}\), a random sample of \(n\) observations from a \(\operatorname{Bernoulli}(P)\) population. Notice that each \(x_{i}\) is either \(1\) (success) or \(0\) (failure). In this case, \(\bar{x}\) is nothing but the observed proportion of successes, denoted as \(\hat{p}\). Then, if \(n\hat{p}\left(1 - \hat{p}\right)\) is large enough for the normal approximation to apply, a \(\left(1 - \alpha\right)100\%\) confidence interval for \(P\) is: \[\hat{p} \pm z_{1-\frac{\alpha}{2}}\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}\]

Exercise. A political candidate wants to know her nationwide support rate. In a sample of \(64\) people, \(35\) support the candidate. Calculate/estimate a \(95\%\) confidence interval for the candidate’s nationwide support rate.

Solution. The relevant distribution is \(z\). \[\hat{p}=\frac{35}{64}=0.547\] The \(95 \%\) confidence interval for \(P\) is: \[\hat{p} \pm z_{1-\frac{\alpha}{2}}\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}} \rightarrow 0.547 \pm 1.96 \sqrt{\frac{0.547\left(1-0.547 \right)}{64}}\] \[\rightarrow [0.425,0.669]\]
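The same pattern for a proportion, as a minimal sketch (the helper name `proportion_ci` is ours). It takes the raw counts, forms \(\hat{p}\), and applies the normal-approximation formula above; calling it with \(64\) successes out of \(100\) also reproduces the consumer exercise later in these notes up to rounding.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, conf=0.95):
    """Large-sample (1 - alpha)100% CI for a population proportion P."""
    p_hat = successes / n                            # observed share of successes
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # z_{1 - alpha/2}
    me = z * math.sqrt(p_hat * (1 - p_hat) / n)      # margin of error
    return p_hat - me, p_hat + me


lcl, ucl = proportion_ci(35, 64)
print(round(lcl, 3), round(ucl, 3))  # 0.425 0.669
```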

Confidence interval estimation
Variance of a normal population

Consider \(\left\{x_{i}\right\}_{i=1}^{n}\), a random sample of \(n\) observations from a normal population \(\operatorname{Normal}(\mu, \sigma^2)\). If the observed sample variance is \(s^{2}\), then a \(\left(1-\alpha\right)100\%\) confidence interval for \(\sigma^{2}\) is: \[\left[\frac{\left(n-1\right)s^{2}}{\chi^2_{n-1,1-\frac{\alpha}{2}}}, \frac{\left(n-1\right)s^{2}}{\chi^2_{n-1,\frac{\alpha}{2}}}\right]\] where \(\chi^2_{n-1,q}\) denotes the \(q\) quantile of the \(\chi^{2}\)-distribution with \(n-1\) degrees of freedom, and \[s^{2} = \frac{\sum_{i=1}^{n}\left(x_{i} - \bar{x}\right)^{2}}{n-1}\] is the sample variance. Go over the description of the \(\chi^{2}\)-distribution in Chapter 10.

Exercise. A process engineer is concerned with the variation of temperature in an industrial furnace. She collects a random sample of temperatures (in \(^{\circ}\mathrm{C}\)): \[975, \; 1075, \; 1050, \; 900, \; 1000, \; 950, \; 1025, \; 1050, \; 975\] Calculate/estimate a \(95 \%\) confidence interval for the (population) variance of temperatures in this furnace.

Solution. \(s^2\) for the \(9\) temperature readings is \(3125\), the relevant distribution is \(\chi^2\) and the degrees of freedom is \(8\). The \(95 \%\) confidence interval for \(\sigma^2\) is: \[\left[\frac{\left(n-1\right)s^{2}}{\chi^2_{n-1,1-\frac{\alpha}{2}}}, \frac{\left(n-1\right)s^{2}}{\chi^2_{n-1,\frac{\alpha}{2}}}\right] \rightarrow \left[\frac{\left(9-1\right)3125}{17.535}, \frac{\left(9-1\right)3125}{2.180}\right]\] \[\rightarrow [1425.757,11469.306]\]
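A sketch of the variance interval that computes \(s^2\) from the raw readings with `statistics.variance` (which uses the \(n-1\) divisor, as in the notes). The two \(\chi^2\) quantiles are passed in as table values because the standard library does not provide them (`scipy.stats.chi2.ppf` would, if SciPy is available); with the rounded values 17.535 and 2.180 the limits are about \([1425.72, 11467.89]\), slightly different from the notes' interval, which used unrounded quantiles.

```python
from statistics import variance


def variance_ci(data, chi2_upper, chi2_lower):
    """(1 - alpha)100% CI for sigma^2 of a normal population.

    chi2_upper = chi^2_{n-1, 1-alpha/2} and chi2_lower = chi^2_{n-1, alpha/2},
    both looked up in a table (an assumption of this sketch).
    """
    n = len(data)
    s2 = variance(data)                  # sample variance with divisor n - 1
    lower = (n - 1) * s2 / chi2_upper    # the larger quantile gives the lower limit
    upper = (n - 1) * s2 / chi2_lower
    return s2, lower, upper


temps = [975, 1075, 1050, 900, 1000, 950, 1025, 1050, 975]
s2, lower, upper = variance_ci(temps, chi2_upper=17.535, chi2_lower=2.180)
print(s2, round(lower, 2), round(upper, 2))
```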

Finite populations and correction

When \(n \ll N\), our procedures work seamlessly. However, when \(n\) is large relative to \(N\), i.e. \[n > \frac{1}{20}N\] we need to multiply the variance of the relevant estimator (e.g. \(\sigma^2/n\) for \(\bar{x}\)) by the factor \[\frac{N-n}{N-1}\] or, equivalently, its standard deviation by the square root of this factor. This factor is called the finite population correction (\(\operatorname{fpc}\)) factor. Observe that, quite intuitively, \(\operatorname{fpc} = 1\) for \(n=1\) and \(\operatorname{fpc} = 0\) for \(n=N\), since sampling the whole population leaves no sampling error.
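A minimal sketch of how the correction enters the standard deviation of the sampling distribution of \(\bar{x}\), applying it only above the \(n > N/20\) threshold used in these notes. The function names and the example numbers (\(\sigma = 10\), \(N = 1000\)) are hypothetical.

```python
import math


def fpc(n, N):
    """Finite population correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)


def se_of_mean(sigma, n, N=None):
    """Standard deviation of the sampling distribution of the sample mean.

    If the population size N is given and n > N/20, the variance sigma^2/n
    is multiplied by fpc, i.e. the standard error by sqrt(fpc).
    """
    se = sigma / math.sqrt(n)
    if N is not None and n > N / 20:
        se *= math.sqrt(fpc(n, N))
    return se


print(se_of_mean(10, 25, N=1000))    # n = 25 <= 50: no correction, 2.0
print(se_of_mean(10, 100, N=1000))   # n = 100 > 50: corrected, about 0.949
```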

Sample size determination

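Mean of a normally distributed population, known population variance: \[n = \frac{z_{1-\frac{\alpha}{2}}^{2} \sigma^{2}}{\operatorname{ME}^2}\] Population proportion (using the largest possible value of \(P(1-P)\), namely \(0.25\), as a conservative choice): \[n = \frac{0.25 z_{1-\frac{\alpha}{2}}^{2}}{\operatorname{ME}^{2}}\] In both cases, \(n\) is rounded up to the next integer.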
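The two sample size formulas as a small sketch; the result is rounded up with `math.ceil`, since rounding down would give a margin of error slightly above the target. The function names and example inputs (\(\sigma^2 = 1000000\) with a target ME of \(100\), and a target ME of \(0.03\) for a proportion) are hypothetical.

```python
import math
from statistics import NormalDist


def z_crit(conf):
    """z_{1 - alpha/2} for a (1 - alpha)100% confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def n_for_mean(sigma2, me, conf=0.95):
    """Minimum n for a normal mean with known variance and target margin of error."""
    return math.ceil(z_crit(conf) ** 2 * sigma2 / me ** 2)


def n_for_proportion(me, conf=0.95):
    """Minimum n for a proportion, using the conservative P(1 - P) = 0.25."""
    return math.ceil(0.25 * z_crit(conf) ** 2 / me ** 2)


print(n_for_mean(1_000_000, 100))   # 385
print(n_for_proportion(0.03))       # 1068
```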

Exercise. A random sample of size \(9\) from a normally distributed population with variance \(9\) yielded a sample mean of \(7\) and a sample variance of \(4\).

(i) Construct a \(90\%\) confidence interval for the mean of the population from which the sample was taken.
(ii) What is the probability of a random sample of size \(9\) yielding a sample variance of \(4\) or less, given that the population variance is \(9\)?
(iii) What is the minimum sample size required if we would like the \(90\%\) confidence interval to have a length of at most \(2\)?

Solution. This question is under maintenance.

Exercise. In a random sample of \(100\) consumers, respondents were asked whether they made their purchasing decisions based on price or based on quality. \(64\%\) of the consumers in the sample stated that they mainly base their buying decisions on price. Based on this information, construct a \(95\%\) confidence interval for the percentage of consumers in the population who base their buying decisions on price.

Solution. \(n=100\) and \(\hat{p}=0.64\) are given. A \(95\%\) confidence interval for \(P\) is: \[\begin{aligned} & \hat{p} \pm z_{c} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\ \rightarrow & 0.64 \pm 1.96 \sqrt{\frac{0.64 \cdot 0.36}{100}} \\ \rightarrow & 0.64 \pm 1.96 \cdot 0.048 \\ \rightarrow & 0.64 \pm 0.0941 . \end{aligned}\] So, the \(95\%\) confidence interval for \(P\) is \([0.5459,0.7341]\), i.e. roughly between \(54.6\%\) and \(73.4\%\) of consumers.

Exercise. Two statisticians, using the same sample data, reported the following different confidence intervals for the population mean: \(\left[3, 5\right]\) and \(\left[2, 6\right]\). Given that they used the same sample and based their confidence intervals on the same estimators (the sample mean and sample variance), what is the source of the difference between the confidence intervals?

Solution. This exercise is left as self-study.

Exercise. A researcher has a strange habit of using a sample size of \(\sqrt{N}\), where \(N\) is the size of the population of interest. For which values of \(N\) does the researcher need to use a correction factor for the standard deviation of the sampling distribution while estimating a CI for \(\mu\)?

Solution. This exercise is left as self-study.

Confidence interval estimation: Two populations

In scientific research, we often need to compare selected parameters of two populations, rather than only comparing a single population parameter to a given value. Although the problem gets slightly more complicated, its essence is unchanged. So, while considering the confidence interval estimation and hypothesis testing problems involving two populations, we’ll first take a mechanical approach in what follows. Throughout the following pages, notice:

that we use a more compact, shorthand notation, and
that we use two samples every time:

\(\left\{x_{i}\right\}_{i=1}^{n_{x}} \subset \left\{x_{i}\right\}_{i=1}^{N_{x}}\)
\(\left\{y_{i}\right\}_{i=1}^{n_{y}} \subset \left\{y_{i}\right\}_{i=1}^{N_{y}}\)

where \(n_{x}\) and \(n_{y}\) are their respective sample sizes.

Confidence interval estimation
Difference between two normal population means
Case: Dependent (matched) samples

Let \(\left\{x_{i}\right\}_{i=1}^{n}\) and \(\left\{y_{i}\right\}_{i=1}^{n}\) be two matched samples. We can then create: \[\left\{d_{i}\right\}_{i=1}^{n} \text{ where } d_{i} = x_{i} - y_{i}, \ \forall i\] Then, assuming the differences are normally distributed, a \(\left(1 - \alpha\right)100\%\) confidence interval for \(\mu_{d} = \mu_{x} - \mu_{y}\) is: \[\bar{d} \pm t_{n-1,1-\frac{\alpha}{2}}\frac{s_{d}}{\sqrt{n}}\] where \[s_{d} = \sqrt{s_{d}^{2}} = \sqrt{\frac{\sum_{i=1}^{n}\left(d_{i} - \bar{d}\right)^{2}}{n-1}}\] and \[\bar{d} = \frac{\sum_{i=1}^{n} d_{i}}{n}\]

Exercise. A company is about to release a new drug to assist weight loss, and we are in charge of assessing how effective the drug is. We pick a random sample of \(8\) people with the following pre-drug body weights: \[90, 95, 105, 95, 110, 85, 100, 90\] After using the drug for the designated test duration, the post-drug body weights are measured as: \[85, 80, 110, 90, 110, 80, 95, 90\] Calculate/estimate a \(95 \%\) confidence interval for the pre-drug minus post-drug difference of mean body weights. Is the drug effective?

Solution. The difference series (pre-drug minus post-drug) is: \[+5, +15, -5, +5, 0, +5, +5, 0\] The relevant distribution is \(t\), the degrees of freedom is \(7\) and the \(95 \%\) confidence interval for \(\mu_x-\mu_y\) is: \[\bar{d} \pm t_{n-1,1-\frac{\alpha}{2}}\frac{s_{d}}{\sqrt{n}} \rightarrow 3.750 \pm 2.365\frac{5.824}{\sqrt{8}}\] \[\rightarrow [-1.120,8.620]\] Since this interval contains \(0\), the data do not allow us to conclude, at the \(95\%\) confidence level, that the drug reduces mean body weight.
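A sketch of the matched-samples procedure starting from the raw weights: it forms the difference series, computes \(\bar{d}\) and \(s_d\) with the standard `statistics` module, and builds the interval. The helper name is ours, and the critical value is assumed to come from a \(t\) table (2.365 for \(7\) degrees of freedom, as in the solution); the result is about \([-1.120, 8.620]\).

```python
import math
from statistics import mean, stdev


def paired_mean_diff_ci(x, y, t_crit):
    """CI for mu_x - mu_y from matched samples, via d_i = x_i - y_i.

    t_crit is t_{n-1, 1-alpha/2} from a t table (an assumption of this sketch).
    """
    d = [xi - yi for xi, yi in zip(x, y)]   # difference series
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)          # stdev uses the n - 1 divisor
    me = t_crit * s_d / math.sqrt(n)        # margin of error
    return d_bar, s_d, (d_bar - me, d_bar + me)


pre = [90, 95, 105, 95, 110, 85, 100, 90]
post = [85, 80, 110, 90, 110, 80, 95, 90]
d_bar, s_d, (lcl, ucl) = paired_mean_diff_ci(pre, post, t_crit=2.365)
print(d_bar, round(s_d, 3), round(lcl, 3), round(ucl, 3))
```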

Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Known population variances

Let,

\(\left\{x_{i}\right\}_{i=1}^{n_{x}} \subset \left\{x_{i}\right\}_{i=1}^{N_{x}} \sim \operatorname{Normal}(\mu_{x}, \sigma_{x}^{2} )\)
\(\left\{y_{i}\right\}_{i=1}^{n_{y}} \subset \left\{y_{i}\right\}_{i=1}^{N_{y}} \sim \operatorname{Normal}(\mu_{y}, \sigma_{y}^{2} )\)

where \(\sigma_{x}^{2}\) and \(\sigma_{y}^{2}\) are known. Then a \(\left(1 - \alpha\right)100\%\) confidence interval for \(\mu_{x} - \mu_{y}\) is: \[\bar{x} - \bar{y} \pm z_{1-\frac{\alpha}{2}}\sqrt{\frac{\sigma_{x}^{2}}{n_{x}} + \frac{\sigma_{y}^{2}}{n_{y}}}\]

Exercise. A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: The mean wage rate of \(49\) workers from Ankara is \(6000\). The mean wage rate of \(81\) workers from Istanbul is \(7000\). The population variances of wages in Ankara and Istanbul are known to be \(640000\) and \(810000\), respectively. Calculate/estimate a \(95 \%\) confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

Solution. The relevant distribution is \(z\) and the \(95 \%\) confidence interval for \(\mu_x-\mu_y\) is: \[\bar{x} - \bar{y} \pm z_{1-\frac{\alpha}{2}}\sqrt{\frac{\sigma_{x}^{2}}{n_{x}} + \frac{\sigma_{y}^{2}}{n_{y}}} \rightarrow 6000 -7000 \pm 1.96\sqrt{\frac{640000}{49} + \frac{810000}{81}}\] \[\rightarrow [-1297.639,-702.361]\]
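A minimal sketch of the two-sample \(z\) interval with known variances (function name ours), reproducing the wage exercise above. Independence of the two samples is what lets the two variances of the sample means simply add under the square root.

```python
import math
from statistics import NormalDist


def diff_means_ci_known_var(xbar, ybar, var_x, var_y, n_x, n_y, conf=0.95):
    """CI for mu_x - mu_y, independent normal samples, known variances."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1 - alpha/2}
    se = math.sqrt(var_x / n_x + var_y / n_y)      # standard error of xbar - ybar
    diff = xbar - ybar
    return diff - z * se, diff + z * se


lcl, ucl = diff_means_ci_known_var(6000, 7000, 640_000, 810_000, 49, 81)
print(round(lcl, 3), round(ucl, 3))  # -1297.639 -702.361
```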

Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Unknown yet equal population variances

Let,

\(\left\{x_{i}\right\}_{i=1}^{n_{x}} \subset \left\{x_{i}\right\}_{i=1}^{N_{x}} \sim \operatorname{Normal}(\mu_{x}, \sigma_{x}^{2} )\)
\(\left\{y_{i}\right\}_{i=1}^{n_{y}} \subset \left\{y_{i}\right\}_{i=1}^{N_{y}} \sim \operatorname{Normal}(\mu_{y}, \sigma_{y}^{2} )\)

where \(\sigma_{x}^{2}\) and \(\sigma_{y}^{2}\) are unknown but assumed to be equal. Then a \(\left(1 - \alpha\right)100\%\) confidence interval for \(\mu_{x} - \mu_{y}\) is: \[\bar{x} - \bar{y} \pm t_{n_{x} + n_{y} - 2, 1-\frac{\alpha}{2}}\sqrt{\frac{s_{p}^{2}}{n_{x}} + \frac{s_{p}^{2}}{n_{y}}}\] In our formulation: \[s_{p}^{2} = \frac{\left(n_{x} - 1\right) s_{x}^{2} +\left(n_{y} - 1\right) s_{y}^{2}}{n_{x} + n_{y} - 2}\] and \[s_{x}^{2} = \frac{\sum_{i=1}^{n_{x}}\left(x_{i} - \bar{x}\right)^{2}}{n_{x} - 1}\] \[s_{y}^{2} = \frac{\sum_{i=1}^{n_{y}}\left(y_{i} - \bar{y}\right)^{2}}{n_{y} - 1}\]

Exercise. A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: The mean wage rate of \(49\) workers from Ankara is \(6000\). The mean wage rate of \(81\) workers from Istanbul is \(7000\). The population variances of wages in Ankara and Istanbul are unknown but they are assumed to be equal. The sample variances of wages in Ankara and Istanbul are calculated as \(490000\) and \(640000\), respectively. Calculate/estimate a \(95 \%\) confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

Solution. \[s_{p}^{2} = \frac{\left(n_{x} - 1\right) s_{x}^{2} +\left(n_{y} - 1\right) s_{y}^{2}}{n_{x} + n_{y} - 2}\] \[s_{p}^{2} = \frac{\left(49 - 1\right) 490000 +\left(81 - 1\right) 640000}{49 + 81 - 2} \rightarrow s_{p}^{2}=583750\] The relevant distribution is \(t\), the degrees of freedom is \(128\) and the \(95 \%\) confidence interval for \(\mu_x-\mu_y\) is: \[\bar{x} - \bar{y} \pm t_{n_{x} + n_{y} - 2, 1-\frac{\alpha}{2}}\sqrt{\frac{s_{p}^{2}}{n_{x}} + \frac{s_{p}^{2}}{n_{y}}}\] \[\rightarrow 6000 - 7000 \pm 1.979 \sqrt{\frac{583750}{49} + \frac{583750}{81}}\] \[\rightarrow [-1273.601,-726.399]\]
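A sketch of the pooled-variance interval (function name ours) that first forms \(s_p^2\) and then the interval. As before, the \(t\) critical value is assumed to come from a table; with the rounded 1.979 the limits are about \([-1273.65, -726.35]\), whereas the notes' \([-1273.601,-726.399]\) corresponds to the unrounded quantile (about 1.9787).

```python
import math


def pooled_diff_means_ci(xbar, ybar, s2_x, s2_y, n_x, n_y, t_crit):
    """CI for mu_x - mu_y assuming equal but unknown population variances.

    t_crit is t_{n_x + n_y - 2, 1-alpha/2} from a t table (an assumption here).
    """
    df = n_x + n_y - 2
    sp2 = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / df   # pooled variance s_p^2
    se = math.sqrt(sp2 / n_x + sp2 / n_y)
    diff = xbar - ybar
    return sp2, (diff - t_crit * se, diff + t_crit * se)


sp2, (lcl, ucl) = pooled_diff_means_ci(6000, 7000, 490_000, 640_000, 49, 81, t_crit=1.979)
print(sp2, round(lcl, 2), round(ucl, 2))
```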

Confidence interval estimation
Difference between two normal population means
Case: Independent samples & Unknown and unequal population variances

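Let,

\(\left\{x_{i}\right\}_{i=1}^{n_{x}} \subset \left\{x_{i}\right\}_{i=1}^{N_{x}} \sim \operatorname{Normal}(\mu_{x}, \sigma_{x}^{2} )\)
\(\left\{y_{i}\right\}_{i=1}^{n_{y}} \subset \left\{y_{i}\right\}_{i=1}^{N_{y}} \sim \operatorname{Normal}(\mu_{y}, \sigma_{y}^{2} )\)

where \(\sigma_{x}^{2}\) and \(\sigma_{y}^{2}\) are unknown and assumed not to be equal. Then a \(\left(1 - \alpha\right)100\%\) confidence interval for \(\mu_{x} - \mu_{y}\) is: \[\bar{x} - \bar{y} \pm t_{v,1-\frac{\alpha}{2}}\sqrt{\frac{s_{x}^{2}}{n_{x}} + \frac{s_{y}^{2}}{n_{y}}}\] In our formulation: \[v = \frac{\left(\left(\frac{s_{x}^{2}}{n_{x}}\right) +\left(\frac{s_{y}^{2}}{n_{y}}\right)\right)^{2}}{\frac{\left(\frac{s_{x}^{2}}{n_{x}}\right)^{2}}{n_{x} - 1} + \frac{\left(\frac{s_{y}^{2}}{n_{y}}\right)^{2}}{n_{y} - 1}}\]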

Notice that if \(n_{x} = n_{y} = n\), then \[v =\left(1 + \frac{2}{\frac{s_{x}^{2}}{s_{y}^{2}} + \frac{s_{y}^{2}}{s_{x}^{2}}}\right)\left(n-1\right)\]

Exercise. A researcher wants to compare the mean wages of workers in Ankara and Istanbul. She has the following data and information: The mean wage rate of \(49\) workers from Ankara is \(6000\). The mean wage rate of \(81\) workers from Istanbul is \(7000\). The population variances of wages in Ankara and Istanbul are unknown and they are assumed to be unequal. The sample variances of wages in Ankara and Istanbul are calculated as \(490000\) and \(640000\), respectively. Calculate/estimate a \(95 \%\) confidence interval for the difference of (population) means of wages in Ankara and Istanbul.

Solution. The relevant distribution is \(t\) and the degrees of freedom is \(v\): \[v = \frac{\left(\left(\frac{s_{x}^{2}}{n_{x}}\right) +\left(\frac{s_{y}^{2}}{n_{y}}\right)\right)^{2}}{\frac{\left(\frac{s_{x}^{2}}{n_{x}}\right)^{2}}{n_{x} - 1} + \frac{\left(\frac{s_{y}^{2}}{n_{y}}\right)^{2}}{n_{y} - 1}}\] \[\rightarrow v = \frac{\left(\left(\frac{490000}{49}\right) +\left(\frac{640000}{81}\right)\right)^{2}}{\frac{\left(\frac{490000}{49}\right)^{2}}{49 - 1} + \frac{\left(\frac{640000}{81}\right)^{2}}{81 - 1}} \approx 111.9\] which we round down to \(v = 111\); rounding down is the conservative choice, since it slightly widens the interval. The \(95 \%\) confidence interval for \(\mu_x-\mu_y\) is: \[\bar{x} - \bar{y} \pm t_{v,1-\frac{\alpha}{2}}\sqrt{\frac{s_{x}^{2}}{n_{x}} + \frac{s_{y}^{2}}{n_{y}}} \rightarrow 6000 - 7000 \pm 1.982 \sqrt{\frac{490000}{49} + \frac{640000}{81}}\] \[\rightarrow [-1265.125,-734.875]\]
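A sketch of the unequal-variances case (function names ours): `welch_df` evaluates the degrees-of-freedom formula above (commonly known as the Welch-Satterthwaite approximation) and `welch_diff_means_ci` builds the interval from a table value of \(t_{v,1-\frac{\alpha}{2}}\), which is an assumption of the sketch. With the rounded 1.982 the limits are about \([-1265.18, -734.82]\); the notes' interval used the unrounded quantile. The identity for \(n_x = n_y = n\) can be checked numerically with the same function.

```python
import math


def welch_df(s2_x, s2_y, n_x, n_y):
    """Degrees of freedom v for unequal variances (usually not an integer)."""
    a, b = s2_x / n_x, s2_y / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))


def welch_diff_means_ci(xbar, ybar, s2_x, s2_y, n_x, n_y, t_crit):
    """CI for mu_x - mu_y with unknown, unequal variances; t_crit = t_{v, 1-alpha/2}."""
    se = math.sqrt(s2_x / n_x + s2_y / n_y)
    diff = xbar - ybar
    return diff - t_crit * se, diff + t_crit * se


v = welch_df(490_000, 640_000, 49, 81)
print(round(v, 2))   # 111.9
lcl, ucl = welch_diff_means_ci(6000, 7000, 490_000, 640_000, 49, 81, t_crit=1.982)
print(round(lcl, 2), round(ucl, 2))
```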

Confidence interval estimation
Difference between two population proportions

Let,

\(\left\{x_{i}\right\}_{i=1}^{n_{x}} \subset \left\{x_{i}\right\}_{i=1}^{N_{x}} \sim \operatorname{Bernoulli}(P_{x})\)
\(\left\{y_{i}\right\}_{i=1}^{n_{y}} \subset \left\{y_{i}\right\}_{i=1}^{N_{y}} \sim \operatorname{Bernoulli}(P_{y})\)

Then, for large samples, a \(\left(1-\alpha\right)100\%\) confidence interval for \(P_{x} - P_{y}\) is: \[\hat{p}_{x} - \hat{p}_{y} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}_{x}\left(1-\hat{p}_{x}\right)}{n_{x}} + \frac{\hat{p}_{y}\left(1-\hat{p}_{y}\right)}{n_{y}}}\]

Exercise. A political candidate wonders how her support rates in Ankara and Istanbul compare. We know that among \(64\) people from Ankara, \(35\) support the candidate, and among \(81\) people from Istanbul, \(45\) support the candidate. Calculate/estimate a \(95\%\) confidence interval for the difference of population support rates in Ankara and Istanbul.

Solution. \(\hat{p}_{x}=35/64=0.547\) and \(\hat{p}_{y}=45/81=0.556\) and the \(95 \%\) confidence interval for \(P_x-P_y\) is: \[\hat{p}_{x} - \hat{p}_{y} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}_{x}\left(1-\hat{p}_{x}\right)}{n_{x}} + \frac{\hat{p}_{y}\left(1-\hat{p}_{y}\right)}{n_{y}}}\] \[\rightarrow 0.547 - 0.556 \pm 1.96 \sqrt{\frac{0.547\left(1-0.547 \right)}{64} + \frac{0.556\left(1-0.556 \right)}{81}}\] \[\rightarrow [-0.172,0.154]\]
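A minimal sketch of the two-proportion interval (function name ours), reproducing the Ankara/Istanbul exercise from the raw counts. Because the interval contains \(0\), it is consistent with equal support rates in the two cities.

```python
import math
from statistics import NormalDist


def diff_proportions_ci(succ_x, n_x, succ_y, n_y, conf=0.95):
    """Large-sample CI for P_x - P_y from two independent samples."""
    px, py = succ_x / n_x, succ_y / n_y               # observed shares
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # z_{1 - alpha/2}
    se = math.sqrt(px * (1 - px) / n_x + py * (1 - py) / n_y)
    diff = px - py
    return diff - z * se, diff + z * se


lcl, ucl = diff_proportions_ci(35, 64, 45, 81)
print(round(lcl, 3), round(ucl, 3))  # -0.172 0.154
```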

Exercise. Journal A of city \(X\) reports a \(90\%\) CI for the population mean income in city \(X\) \(\left(\mu_{x}\right)\) as \(\left[3400,6400\right]\), and Journal B of city \(Y\) reports the corresponding CI for city \(Y\) \(\left(\mu_{y}\right)\) as \(\left[3800,6800\right]\). Each journal notes that the population is normally distributed with a known variance. The journals report the odds for the approval of Mr. Doe, a political candidate, in their respective cities \(X\) and \(Y\) as \(750/500\) and \(80/60\), where the fractions are (sample count of approvals)/(sample count of disapprovals).

(i) Estimate a \(95\%\) CI for \(\mu_{x}\).
(ii) Estimate a \(99\%\) CI for \(\mu_{x} - \mu_{y}\).
(iii) Test \(H_{0}: \mu_{x} - \mu_{y} = 0\) against \(H_{1}: \mu_{x} - \mu_{y} < 0\) at \(\alpha = 0.05\).
(iv) Estimate a \(90\%\) CI for the popularity (share of approvals) of Mr. Doe in city \(X\) \(\left(p_{x}\right)\).
(v) Estimate a \(95\%\) CI for \(p_{x} - p_{y}\).

Interpret your result clearly in each case.

Solution. To solve parts (i) to (v), we need to find/calculate \(\bar{x}, \bar{y}, \hat{p}_{x}, \hat{p}_{y}, n_{x}\) and \(n_{y}\) from the given information. Since each reported interval is a symmetric \(90\%\) \(z\)-interval, its midpoint is the sample mean and its half-width equals \(z_{0.95}\,\sigma/\sqrt{n}\), with \(z_{0.95} \approx 1.65\). So, \[\begin{aligned} &\bar{x}=\frac{3400+6400}{2}=4900 \\ &\bar{y}=\frac{3800+6800}{2}=5300 \\ &\frac{\sigma_{x}}{\sqrt{n_{x}}}=\frac{4900-3400}{1.65}=909.1 \\ &\frac{\sigma_{y}}{\sqrt{n_{y}}}=\frac{5300-3800}{1.65}=909.1 \end{aligned}\] These are sufficient to solve parts (i), (ii) and (iii). To solve (iv) and (v), we need the following: \[\begin{aligned} &\hat{p}_{x}=\frac{750}{750+500}=0.60, \quad n_{x}=1250 \\ &\hat{p}_{y}=\frac{80}{80+60}=0.57, \quad n_{y}=140 \end{aligned}\] Notice that \(n_{x}\) and \(n_{y}\) need not be known explicitly to solve parts (i), (ii) and (iii).
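Backing out these summary statistics can be scripted as a minimal sketch (helper names are ours). It assumes, as the solution does, that each reported interval is a symmetric \(z\)-interval with critical value \(1.65\), and that each reported fraction is a count of approvals over a count of disapprovals.

```python
def from_symmetric_interval(lower, upper, z):
    """Recover the point estimate and sigma/sqrt(n) from a symmetric z-interval."""
    center = (lower + upper) / 2          # the sample mean
    se = (upper - lower) / (2 * z)        # half-width divided by the critical value
    return center, se


def share_from_odds(approvals, disapprovals):
    """Observed share of approvals and the implied sample size."""
    n = approvals + disapprovals
    return approvals / n, n


x_bar, se_x = from_symmetric_interval(3400, 6400, z=1.65)
y_bar, se_y = from_symmetric_interval(3800, 6800, z=1.65)
p_x, n_x = share_from_odds(750, 500)
p_y, n_y = share_from_odds(80, 60)
print(x_bar, round(se_x, 1), y_bar, round(se_y, 1))   # 4900.0 909.1 5300.0 909.1
print(p_x, n_x, round(p_y, 2), n_y)                   # 0.6 1250 0.57 140
```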

Exercise. A researcher wants to test whether three populations’ means are equal to each other; i.e. whether (A) \(H_0: \mu_{1} = \mu_{2} = \mu_{3}\). She picks \(\alpha\) as \(0.10\) and collects data from each population. She then tests, separately (one at a time):

(B) \(H_0:\mu_{1}= \mu_{2}\) (C) \(H_0:\mu_{1}= \mu_{3}\) (D) \(H_0:\mu_{2}= \mu_{3}\)

against their two-sided alternatives. She fails to reject \(H_0\) every time at \(\alpha = 0.10\). So, she concludes: Failure to reject \(H_0\) in all \(B\), \(C\) and \(D\) at \(\alpha = 0.10\) is equivalent to failure to reject \(H_0\) in \(A\) at \(\alpha = 0.10\); so, all three means are equal with a confidence of \(90\%\).

Explain why her conclusion is wrong.

Solution. This question requires (1) obtaining the confidence levels of the tests \(B\), \(C\) and \(D\), (2) noticing that these confidence levels are nothing but simple probabilities, (3) multiplying the individual confidence levels to obtain the joint confidence level (this treats the three tests as independent, a simplification, since they share samples), and (4) subtracting the joint confidence level from \(1\) to find the joint significance level (call it \(\alpha^{\prime}\)). In that, \[\alpha^{\prime}=1-(1-\alpha)^{3}\] and for \(\alpha=0.10\), \(\alpha^{\prime}\) becomes \(0.271\). So, ’straightforwardly joining/merging the conclusions of separate tests of hypotheses’ is not allowed in our professional practice.
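The arithmetic of step (4), under the same independence simplification, as a tiny sketch (the function name is ours). It shows how quickly the joint significance level grows with the number \(k\) of separate tests run at \(\alpha = 0.10\).

```python
def joint_significance(alpha, k):
    """alpha' = 1 - (1 - alpha)^k for k tests treated as independent."""
    return 1 - (1 - alpha) ** k


for k in (1, 2, 3):
    print(k, round(joint_significance(0.10, k), 3))   # 0.1, 0.19, 0.271
```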
