This chapter bridges our knowledge of probability theory to statistical inference. Sampling distributions are the key to our understanding of how a small portion of a whole can represent the whole.
For any random variable X with a finite expected value μ and finite variance σ²,

P(|X − μ| ≥ kσ) ≤ 1/k²

or, alternatively, by setting k = 𝜖/σ,

P(|X − μ| ≥ 𝜖) ≤ σ²/𝜖²

When σ² is known, either k or 𝜖 can be arbitrarily picked; the other is then determined through k = 𝜖/σ.
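Chebyshev's inequality can be checked empirically; a minimal sketch in Python (the exponential population, the k values, and the seed are illustrative choices, not from the text):

```python
import random

# Empirical check of Chebyshev's inequality for an exponential population
# with rate 1, so mu = 1 and sigma = 1 (illustrative assumption).
random.seed(4)
xs = [random.expovariate(1.0) for _ in range(100_000)]
mu, sigma = 1.0, 1.0

for k in (1.5, 2.0, 3.0):
    # Empirical P(|X - mu| >= k*sigma) versus the Chebyshev bound 1/k^2
    frac = sum(abs(x - mu) >= k * sigma for x in xs) / len(xs)
    print(k, frac, "<=", 1 / k**2)
```

The empirical tail probabilities come out far below the bound, which is typical: Chebyshev holds for every distribution with finite variance, so it is usually loose for any particular one.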
Let {Xᵢ}ᵢ₌₁ᴺ be a sequence of identically and independently distributed
random variables with a finite expected value μ. For each n ≤ N, define the
random variable X̄ₙ as:

X̄ₙ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ

Then,

limₙ→∞ P(|X̄ₙ − μ| < 𝜖) = 1, ∀𝜖 > 0

For sufficiently large n, the mean of n independently and identically distributed (iid) random variables will, with probability arbitrarily close to one, be arbitrarily close to the expected value of the individual random variables.
Noting that E[X̄ₙ] = μ and Var(X̄ₙ) = σ²/n, then, ∀n ≤ N, ∀𝜖 > 0,

P(|X̄ₙ − μ| ≥ 𝜖) ≤ σ²/(n𝜖²)

as implied by Chebyshev's theorem. As n approaches infinity, this expression reduces to:

limₙ→∞ P(|X̄ₙ − μ| < 𝜖) = 1
which is known as the Law of Large Numbers; it holds even when the variance of the Xᵢ is not finite.
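The law can be illustrated with a short simulation; a minimal sketch (the fair-die population, the sample sizes, and the seed are illustrative choices):

```python
import random

# Law of Large Numbers sketch: means of n iid rolls of a fair six-sided die
# (expected value mu = 3.5) settle toward mu as n grows.
random.seed(0)

def sample_mean(n):
    """Mean of n iid rolls of a fair die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))  # the means drift toward 3.5
```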
Let {Xᵢ}ᵢ₌₁ᴺ be a sequence of identically and independently distributed
random variables with a finite expected value μ, and a finite and positive
variance σ². For each n, define the random variable X̄ₙ as:

X̄ₙ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ

Let Z be a standard normal random variable. For any z ∈ ℝ, we have

limₙ→∞ P((X̄ₙ − μ)/(σ/√n) ≤ z) = P(Z ≤ z)
Informally, the CLT states that for sufficiently large n, the random variable

Zₙ = (X̄ₙ − μ)/(σ/√n)

is approximately standard normally distributed regardless of the distribution of the Xᵢ, and exactly standard normally distributed if the Xᵢ are normally distributed.
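A small simulation illustrates this; a minimal sketch (the uniform population, n = 30, the replication count, and the seed are illustrative choices):

```python
import random, math

# CLT sketch: standardized means of uniform(0, 1) samples
# (mu = 0.5, sigma^2 = 1/12) should behave like a standard normal.
random.seed(1)
mu, sigma = 0.5, math.sqrt(1 / 12)

def standardized_mean(n):
    xbar = sum(random.random() for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

zs = [standardized_mean(30) for _ in range(20_000)]
frac = sum(z <= 1.0 for z in zs) / len(zs)
print(frac)  # should be close to P(Z <= 1) = 0.8413
```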
Consider {Xᵢ}ᵢ₌₁ⁿ ⊂ {Xᵢ}ᵢ₌₁ᴺ, which is a random sample of n observations
coming from a population with mean μ and variance σ². X̄ₙ being the sample mean,

X̄ₙ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ

E[X̄ₙ] = μ

Var(X̄ₙ) = σ²/n
If the population is distributed normally, then the distribution of the sample means is also normal. So,

Z = (X̄ₙ − μ)/(σ/√n)

has a standard normal distribution.
Go to the Teaching page & experiment to reveal the relationship between the population and sampling distributions using the file named ‘Confidence intervals.xlsx’.
Consider {Xᵢ}ᵢ₌₁ⁿ ⊂ {Xᵢ}ᵢ₌₁ᴺ, which is a random sample of n observations
coming from a Bernoulli population with success probability p. The sample proportion p̂ₙ being

p̂ₙ = X̄ₙ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ

E[p̂ₙ] = p

Var(p̂ₙ) = p(1 − p)/n

If n is large,

Z = (p̂ₙ − p)/√(p(1 − p)/n)

is approximately distributed as a standard normal.
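A quick simulation illustrates the normal approximation to the sample proportion; a minimal sketch (p = 0.3, n = 400, and the seed are illustrative choices):

```python
import random, math

# Sample-proportion sketch: standardized p-hat from Bernoulli(p) samples
# compared against the standard normal benchmark P(Z <= 1) = 0.8413.
random.seed(3)
p, n, reps = 0.3, 400, 10_000

count_below = 0
for _ in range(reps):
    phat = sum(random.random() < p for _ in range(n)) / n
    z = (phat - p) / math.sqrt(p * (1 - p) / n)
    count_below += (z <= 1.0)

print(count_below / reps)  # should be close to 0.8413
```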
Let s² denote the sample variance for a random sample of n observations from a population with a variance of σ². Then

E[s²] = σ²

and, when the population is normally distributed,

Var(s²) = 2σ⁴/(n − 1)

So, given a random sample of n observations from a normally distributed population whose variance is σ², the quantity

(n − 1)s²/σ²

is distributed as the Chi-squared distribution with (n − 1) degrees of freedom.
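This relationship can be illustrated by simulation: χ²(n − 1) has mean n − 1, so simulated values of (n − 1)s²/σ² should average to about n − 1. A minimal sketch (the sample size, population parameters, and seed are illustrative choices):

```python
import random, statistics

# Chi-squared sketch: for samples from a normal population, (n-1)s^2/sigma^2
# follows a chi-squared(n-1) distribution, whose mean is n-1.
random.seed(2)
n, sigma = 5, 3.0
reps = 20_000

stats = []
for _ in range(reps):
    sample = [random.gauss(10.0, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)          # sample variance (divides by n-1)
    stats.append((n - 1) * s2 / sigma**2)

print(sum(stats) / reps)  # should be close to n - 1 = 4
```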
4.1 EXERCISES ___________________________________________________________
We have two data sets consisting of identical values: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. We will use xᵢ to denote the i-th value in one data set and yⱼ to denote the j-th value in the other data set. We construct a new data set by taking a number from each data set and finding their average, i.e., the new data set consists of values of the form:

(xᵢ + yⱼ)/2
The constructed data set will consist of one 0, two 0.5’s, three 1’s, etc.
iii. Find the mean and variance of the initial data set and the new data set.
Solution: Solve this yourself to explore the Central Limit Theorem.
We make 100 independent observations from a population with mean 40 and standard deviation 20. Approximately, what is the probability that the mean of these observations will be greater than 37?
Solution: Population X ∼ (40, 20²). We take n = 100 observations and calculate X̄₁₀₀. By the CLT, X̄₁₀₀ is approximately normal with mean 40 and standard deviation 20/√100 = 2. So,

P(X̄₁₀₀ > 37) = P(Z > (37 − 40)/2) = P(Z > −1.5) = P(Z ≤ 1.5) = 0.9332
The following table gives the relative frequency distribution of a population:
Value | Rel. Freq. |
2 | 0.1 |
4 | 0.3 |
6 | 0.2 |
8 | 0.3 |
10 | 0.1 |
i. A number is selected from this population at random. What is the probability that the number selected is greater than or equal to 8?
ii. If we select two numbers at random (with replacement), what is the probability that the mean of these two numbers is less than or equal to 5?
iii. If 25 numbers are selected from the population at random (with replacement), what is the probability (approximately) that the mean of these 25 numbers is less than 6.5?
Solution:
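A minimal numerical check of the three parts, assuming the population table above (Python's `statistics.NormalDist` supplies the normal CDF for part iii):

```python
from statistics import NormalDist

# Population from the relative frequency table above.
values = [2, 4, 6, 8, 10]
probs  = [0.1, 0.3, 0.2, 0.3, 0.1]

# i. P(X >= 8)
p_i = sum(p for v, p in zip(values, probs) if v >= 8)

# ii. P((X1 + X2)/2 <= 5) for two independent draws (with replacement)
p_ii = sum(p1 * p2
           for v1, p1 in zip(values, probs)
           for v2, p2 in zip(values, probs)
           if (v1 + v2) / 2 <= 5)

# iii. CLT approximation for the mean of 25 draws
mu  = sum(v * p for v, p in zip(values, probs))                 # 6.0
var = sum(v**2 * p for v, p in zip(values, probs)) - mu**2      # 5.6
z   = (6.5 - mu) / (var / 25) ** 0.5
p_iii = NormalDist().cdf(z)

print(p_i, p_ii, round(p_iii, 4))
```

This gives 0.4 for (i), 0.38 for (ii), and roughly 0.855 for (iii), which matches the table lookup at z ≈ 1.06.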
We choose 36 numbers, with replacement, at random (i.e., we take a random sample of size 36) from the given interval. Let X be the random variable that assigns to each sample (outcome) the mean of the sample.
ii. Find (an approximate value for) the probability that the sample mean, i.e., X, will be less than or equal to 2.3.
Solution:
X, here, is the mean of our 36 observations, i.e., X̄₃₆ in our usual notation.
We choose 9 numbers from a normally distributed population of numbers. The mean of the population is unknown but the variance is known to be equal to 16. If μ denotes the mean of the population, then what is the probability that the mean of the 9 numbers that we choose will be in the interval [μ − 2, μ + 2]?
Solution: You do not need the value of μ in this exercise. The key to the
solution is that Var(X̄₉) = 16/9. So, performing the intermediate steps,
the problem reduces to finding P(−1.5 ≤ Z ≤ 1.5) and the answer is
0.86638.
In a certain university the CGPAs of students take only the values 0, 1, 2, 3, 4. The distribution of the CGPAs of the students of this university is given below:
CGPA | Freq |
0 | 5,000 |
1 | 10,000 |
2 | 20,000 |
3 | 5,000 |
4 | 10,000 |
Total | 50,000 |
i. Let X̄₁ denote the CGPA of a student who was chosen at random from the population of all students of this university. Tabulate the PDF of this random variable.
ii. Let X̄₂ denote the average (mean) of the CGPAs of two randomly selected students from the population of all students of this university. What is the probability that the average CGPA of the two students is less than or equal to 1?
iii. Now we choose 36 students at random. What is the probability, approximately, that the average CGPA of these 36 students is less than or equal to 2.3?
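A minimal numerical check of parts (ii) and (iii), assuming the CGPA table above (relative frequencies 0.1, 0.2, 0.4, 0.1, 0.2 for CGPAs 0 through 4; `statistics.NormalDist` supplies the normal CDF):

```python
from statistics import NormalDist

# Population implied by the frequency table above (each freq / 50,000).
cgpas = [0, 1, 2, 3, 4]
probs = [0.1, 0.2, 0.4, 0.1, 0.2]

# ii. P(mean of two independent draws <= 1)
p_two = sum(p1 * p2
            for c1, p1 in zip(cgpas, probs)
            for c2, p2 in zip(cgpas, probs)
            if (c1 + c2) / 2 <= 1)

# iii. CLT approximation for the mean of 36 draws
mu  = sum(c * p for c, p in zip(cgpas, probs))             # 2.1
var = sum(c**2 * p for c, p in zip(cgpas, probs)) - mu**2  # 1.49
z   = (2.3 - mu) / (var / 36) ** 0.5
p_36 = NormalDist().cdf(z)

print(p_two, round(p_36, 4))
```

This gives 0.17 for (ii) and roughly 0.837 for (iii), consistent with a table lookup at z ≈ 0.98.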
Consider a large population of which only 20% know basic concepts of statistics. We take a random sample of size 81 from this population and count the number of individuals in the sample who know basic concepts of statistics. What is the probability that the sample will have between 15 and 18 (inclusive) individuals who know basic concepts of statistics?
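The count here is Binomial(81, 0.2), so the probability can be computed exactly and compared with a plain CLT approximation; a minimal sketch:

```python
import math
from statistics import NormalDist

# X ~ Binomial(81, 0.2): number in the sample who know basic statistics.
n, p = 81, 0.2

# Exact P(15 <= X <= 18) from the binomial PMF
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(15, 19))

# Plain normal (CLT) approximation, without a continuity correction
mu, sd = n * p, math.sqrt(n * p * (1 - p))   # 16.2 and 3.6
approx = NormalDist().cdf((18 - mu) / sd) - NormalDist().cdf((15 - mu) / sd)

print(round(exact, 4), round(approx, 4))
```

The gap between the two values is largely because the plain normal approximation ignores the continuity correction that a discrete count calls for.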
A four-sided fair die is rolled several times and we calculate the average (mean) of the values observed.
i. Let X1 denote the random variable that gives the value observed when the die is rolled once. Find the expected value and variance of X1.
ii. Let X̄ₙ denote the mean of the values observed when the die is
rolled n times. What is the minimum number of times that the
die should be rolled so that the mean (the value of X̄ₙ) takes a
value in the interval [2.1, 2.9] with at least a probability of 0.9? Use
Chebyshev's theorem to answer this problem.
iii. Using the Central Limit Theorem and the value for n you found
above, find an approximate value for the probability of X̄ₙ taking
a value in the interval [2.1, 2.9].
iv. Do the answers you found in items (ii) and (iii) contradict
each other? If there is a difference in what the two answers suggest,
explain the reason for this.
Solution: i. When the die is rolled once, by definition we produce an outcome directly of the population (that is, we don't do any sampling at all). So,

E[X₁] = (1 + 2 + 3 + 4)/4 = 2.5

Var(X₁) = E[X₁²] − (E[X₁])² = 7.5 − 2.5² = 1.25

So, X₁ ∼ (2.5, 1.25).
ii. X̄ₙ is the RV denoting the sample mean when we roll the die n times.
The midpoint of the interval [2.1, 2.9] is 2.5, so the condition is P(|X̄ₙ − 2.5| < 0.4) ≥ 0.9.
Notice that in this part, we use Chebyshev's theorem only. We get:

P(|X̄ₙ − 2.5| < 0.4) ≥ 1 − 1.25/(n · 0.4²) ≥ 0.9

and

n ≥ 1.25/(0.1 · 0.4²) = 78.125, so n = 79.
iii. With n = 79,

P(2.1 ≤ X̄₇₉ ≤ 2.9) = P(|Z| ≤ 0.4/√(1.25/79)) ≈ P(−3.18 ≤ Z ≤ 3.18) ≈ 0.99852

Notice that in this part, we use the CLT only.
iv. No, they don't contradict each other: while Chebyshev's theorem sets a lower bound of 0.90 here (in ii), (iii) gives the approximate actual probability as 0.99852, which is larger than 0.90 (as it should be).
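Parts (ii) and (iii) can be verified numerically; a minimal sketch (`statistics.NormalDist` supplies the normal CDF):

```python
import math
from statistics import NormalDist

# Fair four-sided die with faces 1..4: mu = 2.5, sigma^2 = 1.25,
# and the target interval [2.1, 2.9] gives epsilon = 0.4.
mu, var, eps = 2.5, 1.25, 0.4

# ii. Chebyshev: 1 - var/(n*eps^2) >= 0.9  =>  n >= var/(0.1*eps^2)
n = math.ceil(var / (0.1 * eps**2))
print(n)  # 79

# iii. CLT with n = 79: P(|Z| <= eps / sqrt(var/n))
z = eps / math.sqrt(var / n)
p = 2 * NormalDist().cdf(z) - 1
print(round(p, 5))
```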
Though I am not a fan of such hints, a useful hint may be
underlined here: when an interval is given in a question like this, always
begin with observing (calculating) its midpoint. In this question, the
midpoint was (2.1 + 2.9)/2 = 2.5, which is nothing but E[X₁] and
E[X̄ₙ]. Once you have noticed this, it is trivial to find 𝜖 and to figure
out the rest of the steps.
One additional point is: X̄₁ = X₁. So, sampling with n = 1 simply reproduces the population's distribution.
Standard Normal Distribution
A cell which is at the intersection of the row labeled with a and column labeled with b gives the probability P(Z ≤ a + b).
z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
0.00 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359 |
0.10 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5754 |
0.20 | 0.5793 | 0.5832 | 0.5871 | 0.5909 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141 |
0.30 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517 |
0.40 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879 |
0.50 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224 |
0.60 | 0.7258 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7421 | 0.7454 | 0.7486 | 0.7518 | 0.7549 |
0.70 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7703 | 0.7734 | 0.7764 | 0.7793 | 0.7823 | 0.7852 |
0.80 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133 |
0.90 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389 |
1.00 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621 |
1.10 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830 |
1.20 | 0.8849 | 0.8869 | 0.8888 | 0.8906 | 0.8925 | 0.8943 | 0.8962 | 0.8980 | 0.8997 | 0.9015 |
1.30 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177 |
1.40 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319 |
1.50 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9429 | 0.9441 |
1.60 | 0.9452 | 0.9463 | 0.9474 | 0.9484 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545 |
1.70 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633 |
1.80 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9699 | 0.9706 |
1.90 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9761 | 0.9767 |
2.00 | 0.9772 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817 |
2.10 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857 |
2.20 | 0.9861 | 0.9864 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890 |
2.30 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916 |
2.40 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936 |
2.50 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952 |
2.60 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964 |
2.70 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974 |
2.80 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9979 | 0.9980 | 0.9981 |
2.90 | 0.9981 | 0.9982 | 0.9982 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986 |
3.00 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990 |
3.10 | 0.9990 | 0.9991 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992 | 0.9992 | 0.9993 | 0.9993 |
3.20 | 0.9993 | 0.9993 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9995 | 0.9995 | 0.9995 |
3.30 | 0.9995 | 0.9995 | 0.9995 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9997 |
3.40 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9998 |