Chapter 4
Sampling distributions

This chapter bridges probability theory and statistical inference. Sampling distributions are the key to understanding how a small portion of a whole can represent the whole.

4.1 Chebyshev’s theorem

For any random variable $X$ with a finite expected value $\mu$ and finite variance $\sigma^2$,

\[
\forall k > 1,\quad P(|X - \mu| \le k\sigma) \ge 1 - \frac{1}{k^2}
\]

or, alternatively, by setting $k = \epsilon/\sigma$,

\[
\forall \epsilon > 0,\quad P(|X - \mu| \le \epsilon) \ge 1 - \frac{1}{\epsilon^2/\sigma^2} = 1 - \frac{\sigma^2}{\epsilon^2}
\]

When $\sigma^2$ is known, either $k$ or $\epsilon$ can be picked arbitrarily, since each determines the other.
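Since the bound holds for any distribution with finite variance, it can be checked numerically. Below is a minimal Python sketch; the Exponential(1) population (where $\mu = \sigma = 1$) and $k = 2$ are illustrative choices, not from the text:

```python
import random

# Empirically check Chebyshev's bound P(|X - mu| <= k*sigma) >= 1 - 1/k^2
# for an Exponential(1) population (mu = sigma = 1) with k = 2.
random.seed(1)
mu, sigma, k = 1.0, 1.0, 2.0
draws = [random.expovariate(1.0) for _ in range(100_000)]
p_within = sum(abs(x - mu) <= k * sigma for x in draws) / len(draws)
bound = 1 - 1 / k**2          # = 0.75
print(p_within, bound)        # observed probability comfortably exceeds the bound
```

Note that the bound is often loose: here the actual probability is near 0.95 while Chebyshev only guarantees 0.75.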

Checkpoint No: 58

4.2 Law of large numbers theorem

Let $\{X_i\}_{i=1}^{N}$ be a sequence of independently and identically distributed random variables with a finite expected value $\mu$. For each $n \in \mathbb{N}$, define the random variable $\bar{X}_n$ as:

\[
\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}
\]

Then,

\[
\forall \epsilon > 0,\quad \lim_{n \to \infty} P(|\bar{X}_n - \mu| \le \epsilon) = 1
\]

For sufficiently large $n$, the mean of $n$ independently and identically distributed (iid) random variables will almost surely be arbitrarily close to the expected value of the individual random variables.

Noting that $E(\bar{X}_n) = \mu$ and $\mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$, then, $\forall n \in \mathbb{N}$, $\forall \epsilon > 0$,

\[
P(|\bar{X}_n - \mu| \le \epsilon) \ge 1 - \frac{1}{\epsilon^2/(\sigma^2/n)} = 1 - \frac{\sigma^2}{n\epsilon^2}
\]

as implied by Chebyshev's theorem. As $n$ approaches infinity, this expression reduces to:

\[
\forall \epsilon > 0,\quad \lim_{n \to \infty} P(|\bar{X}_n - \mu| \le \epsilon) = 1
\]

which is known as the Law of large numbers and is true even when the variance of Xi is not finite.
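The law can be watched at work by tracking a running mean of iid draws. In the sketch below, the Uniform(0, 1) population (with $\mu = 0.5$) and the checkpoints at $n = 10$, $1{,}000$, and $100{,}000$ are illustrative choices:

```python
import random

# Track the running mean of iid Uniform(0, 1) draws as n grows;
# by the LLN it should settle near mu = 0.5.
random.seed(42)
total = 0.0
running_means = {}
for n in range(1, 100_001):
    total += random.random()
    if n in (10, 1_000, 100_000):
        running_means[n] = total / n
print(running_means)  # deviations from 0.5 shrink as n grows
```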

Checkpoint No: 59

4.3 Central limit theorem

Let $\{X_i\}_{i=1}^{N}$ be a sequence of independently and identically distributed random variables with a finite expected value $\mu$ and a finite, positive variance $\sigma^2$. For each $n$, define the random variable $\bar{X}_n$ as:

\[
\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}
\]

Let $Z$ be a standard normal random variable. For any $z \in \mathbb{R}$, we have

\[
\lim_{n \to \infty} P\!\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le z\right) = P(Z \le z)
\]

Informally, the CLT states that for sufficiently large $n$, the random variable

\[
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}
\]

is approximately distributed as standard normal regardless of the distribution of the $X_i$, and exactly standard normal if the $X_i$ are normally distributed.
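The statement can be probed by simulation. The sketch below standardizes means of $n$ iid draws and compares the observed share below $z = 1$ with $P(Z \le 1) \approx 0.8413$; the Uniform(0, 1) population and $n = 30$ are illustrative choices:

```python
import random

# Standardize sample means of n iid Uniform(0, 1) draws and compare the
# share below z = 1 with the standard normal value P(Z <= 1) ~ 0.8413.
random.seed(0)
n, reps = 30, 20_000
mu = 0.5
sigma = (1 / 12) ** 0.5          # sd of Uniform(0, 1)
zs = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / n ** 0.5))
share = sum(z <= 1 for z in zs) / reps
print(share)                      # close to 0.8413 despite the non-normal population
```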

Checkpoint No: 60

4.4 Distribution of sample means

Consider $\{x_i\}_{i=1}^{n} \subseteq \{x_i\}_{i=1}^{N}$, a random sample of $n$ observations coming from a population with mean $\mu$ and variance $\sigma^2$. With $\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}$,

\[
E(\bar{X}_n) = \mu
\]

\[
\mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n}
\]

If the population is distributed normally, then the distribution of the sample means is also normal. So,

\[
Z = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}
\]

has a standard normal distribution.
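Both moments of the sample mean can be verified by simulation. In the sketch below, the Normal($\mu = 40$, $\sigma = 20$) population and $n = 25$ are illustrative choices, so the theory predicts $E(\bar{X}_n) = 40$ and $\mathrm{Var}(\bar{X}_n) = 400/25 = 16$:

```python
import random
import statistics

# Simulate many sample means from a Normal(40, 20^2) population with
# n = 25 and check E(xbar) ~ mu = 40 and Var(xbar) ~ sigma^2/n = 16.
random.seed(7)
mu, sigma, n, reps = 40.0, 20.0, 25, 20_000
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]
print(statistics.fmean(means), statistics.variance(means))
```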

4.4.1 Essence of sampling distributions

Go to the Teaching page & experiment to reveal the relationship between the population and sampling distributions using the file named ‘Confidence intervals.xlsx’.

Checkpoint No: 61

4.4.2 Distribution of sample proportions

Consider $\{x_i\}_{i=1}^{n} \subseteq \{x_i\}_{i=1}^{N}$, a random sample of $n$ observations coming from a Bernoulli($p$) population. With $\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n} = \hat{p}$,

\[
E(\hat{p}) = p
\]

\[
\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}
\]

If $n$ is large,

\[
Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}
\]

is approximately distributed as a standard normal.
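A minimal simulation check of these two moments; $p = 0.3$ and $n = 100$ are illustrative choices, so the theory predicts $E(\hat{p}) = 0.3$ and $\mathrm{Var}(\hat{p}) = 0.3 \cdot 0.7/100 = 0.0021$:

```python
import random

# Simulate p-hat for n = 100 Bernoulli(p = 0.3) trials, many times, and
# compare with the theory: E(p-hat) = p, Var(p-hat) = p(1-p)/n = 0.0021.
random.seed(3)
p, n, reps = 0.3, 100, 20_000
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]
mean_phat = sum(phats) / reps
var_phat = sum((x - mean_phat) ** 2 for x in phats) / (reps - 1)
print(mean_phat, var_phat)
```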

Checkpoint No: 62

4.4.3 Distribution of sample variances

Let s2 denote the sample variance for a random sample of n observations from a population with a variance of σ2.

\[
E(s^2) = \sigma^2
\]

\[
\mathrm{Var}(s^2) = \frac{2\sigma^4}{n-1}
\]

\[
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum \big((x_i - \mu) - (\bar{x} - \mu)\big)^2 \\
&= \sum (x_i - \mu)^2 - 2(\bar{x} - \mu)\sum (x_i - \mu) + \sum (\bar{x} - \mu)^2 \\
&= \sum (x_i - \mu)^2 - 2n(\bar{x} - \mu)^2 + n(\bar{x} - \mu)^2 \\
&= \sum (x_i - \mu)^2 - n(\bar{x} - \mu)^2
\end{aligned}
\]

\[
\begin{aligned}
E\Big(\sum (x_i - \bar{x})^2\Big) &= \sum E\big((x_i - \mu)^2\big) - n\,E\big((\bar{x} - \mu)^2\big) \\
&= n\sigma^2 - n\frac{\sigma^2}{n} = (n-1)\sigma^2
\end{aligned}
\]

using $E\big((x_i - \mu)^2\big) = \sigma^2$ and $E\big((\bar{x} - \mu)^2\big) = \sigma^2/n$.

So,

\[
\begin{aligned}
E(s^2) &= E\Big(\frac{1}{n-1} \sum (x_i - \bar{x})^2\Big) \\
&= \frac{1}{n-1} E\Big(\sum (x_i - \bar{x})^2\Big) \\
&= \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2
\end{aligned}
\]

Given a random sample of $n$ observations from a normally distributed population whose variance is $\sigma^2$, the scaled sample variance

\[
\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}
\]

is distributed as the Chi-squared ($\chi^2$) distribution with $(n-1)$ degrees of freedom.
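The unbiasedness result $E(s^2) = \sigma^2$ can be checked by simulation; note that Python's `statistics.variance` already uses the $(n-1)$ divisor. The population parameters and $n = 5$ below are illustrative choices:

```python
import random
import statistics

# Check E(s^2) = sigma^2 by simulation. statistics.variance divides by
# (n - 1), so its long-run average should sit near sigma^2 = 4.
random.seed(11)
sigma, n, reps = 2.0, 5, 50_000
s2s = [statistics.variance(random.gauss(0, sigma) for _ in range(n))
       for _ in range(reps)]
print(statistics.fmean(s2s))  # close to sigma^2 = 4 even for tiny n
```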

Checkpoint No: 63

In a nutshell: when we talk about sampling distributions, notice that we are talking about the statistical properties of the $n$ observations, rather than those of the $N$ members of the population. By definition, properties of the sample relate to the properties of the population & the sample size $n$.

4.1 EXERCISES

1. 

We have two data sets consisting of identical values: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. We will use $x_i$ to denote the $i$th value in one data set and $y_j$ to denote the $j$th value in the other data set. We construct a new data set by taking a number from each data set and finding their average, i.e., the new data set consists of values of the form:

\[
\frac{x_i + y_j}{2}
\]

The constructed data set will consist of one 0, two 0.5’s, three 1’s, etc.

i. Find all values in the new data set and their corresponding frequencies

ii. Construct a graph that summarizes your findings

iii. Find the mean and variance of the initial data set and the new data set

Solution: Solve yourself to explore the Central Limit Theorem.

2. 

We make 100 independent observations from a population with mean 40 and standard deviation 20. Approximately, what is the probability that the mean of these observations will be greater than 37?

Solution: Population $X \sim (\mu = 40,\ \sigma^2 = 20^2)$. We take $n = 100$ observations and calculate $\bar{X}_{100}$.

\[
E(\bar{X}_{100}) = \mu = 40
\]
\[
\mathrm{Var}(\bar{X}_{100}) = \frac{\sigma^2}{n} = \frac{400}{100} = 4
\]
\[
\begin{aligned}
P(\bar{X}_{100} > 37) &= P\Big(\frac{\bar{X}_{100} - 40}{\sqrt{4}} > \frac{37 - 40}{\sqrt{4}}\Big) \\
&= P(z > -1.5) \\
&= 0.93319
\end{aligned}
\]
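The same answer can be reproduced with Python's `statistics.NormalDist` in place of the printed z-table:

```python
from statistics import NormalDist

# P(Xbar_100 > 37) with mu = 40 and Var(Xbar_100) = 400/100 = 4,
# i.e. P(z > -1.5) = P(z < 1.5).
z = (37 - 40) / 4 ** 0.5           # -1.5
p = 1 - NormalDist().cdf(z)
print(round(p, 5))                 # ~0.93319
```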
3. 

The following table gives the relative frequency distribution of a population:

Value   Rel. Freq.
2       0.1
4       0.3
6       0.2
8       0.3
10      0.1

i. A number is selected from this population at random. What is the probability that the number selected is greater than or equal to 8?

ii. If we select two numbers at random (with replacement), what is the probability that the mean of these two numbers is less than or equal to 5?

iii. If 25 numbers are selected from the population at random (with replacement), what is the probability (approximately) that the mean of these 25 numbers is less than 6.5?

Solution:

i. The only trick in this exercise is to begin with calculating $E(X)$ and $\mathrm{Var}(X)$. Calculate and see $E(X) = 6$ and $\mathrm{Var}(X) = 5.6$. There is no sampling as $n = 1$. Simply calculate $P(X \ge 8)$.

ii. Calculate $P(\bar{X}_2 \le 5)$.

iii. Calculate $P(\bar{X}_{25} < 6.5)$. Remember that $E(\bar{X}_n) = \mu$ and $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$.
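A sketch of part iii. with Python's `statistics.NormalDist`; the moments $E(X) = 6$ and $\mathrm{Var}(X) = 5.6$ come straight from the tabulated population:

```python
from statistics import NormalDist

# Part (iii): moments from the relative-frequency table, then the CLT
# approximation for P(Xbar_25 < 6.5).
pmf = {2: 0.1, 4: 0.3, 6: 0.2, 8: 0.3, 10: 0.1}
mu = sum(x * p for x, p in pmf.items())               # 6.0
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # 5.6
n = 25
z = (6.5 - mu) / (var / n) ** 0.5
p_iii = NormalDist().cdf(z)
print(mu, var, round(p_iii, 5))
```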
4. 

We choose 36 numbers, with replacement, at random (i.e., we take a random sample of size 36) from the interval (0, 4). Let $X$ be the random variable that assigns to each sample (outcome) the mean of the sample.

i. Find the expected value and variance of $X$

ii. Find (an approximate value for) the probability that the sample mean, i.e., $X$, will be less than or equal to 2.3

Solution:

i. Since nothing further is instructed, assume that the population is Uniform(0, 4). Then,

\[
\mu = \frac{0 + 4}{2} = 2
\]
\[
\sigma^2 = \frac{(4 - 0)^2}{12} = \frac{16}{12}
\]

$X$, here, is the mean of our 36 observations, i.e., $\bar{X}_{36}$ in our usual notation.

\[
E(X) = \mu = 2
\]
\[
\mathrm{Var}(X) = \frac{\sigma^2}{n} = \frac{16/12}{36} = \frac{1}{27}
\]
ii. Using the parameters in part (i), calculate $P(X \le 2.3)$, i.e., $P(z \le 1.56)$. The answer is 0.94062.
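The computation can be checked with Python's `statistics.NormalDist`; the small difference from 0.94062 comes from rounding $z$ to 1.56 when reading the table:

```python
from statistics import NormalDist

# Part (ii): Uniform(0, 4) population, n = 36, so
# Var(Xbar) = (16/12)/36 = 1/27 and z = (2.3 - 2)/sqrt(1/27) ~ 1.56.
mu, var_xbar = 2.0, (16 / 12) / 36
z = (2.3 - mu) / var_xbar ** 0.5
p = NormalDist().cdf(z)
print(round(z, 2), round(p, 5))   # z rounds to 1.56; table gives 0.94062
```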
5. 

We choose 9 numbers from a normally distributed population of numbers. The mean of the population is unknown but the variance is known to be equal to 16. If $\mu$ denotes the mean of the population, then what is the probability that the mean of the 9 numbers that we choose will be in the interval $[\mu - 2, \mu + 2]$?

Solution: You do not need the value of $\mu$ in this exercise. The key to the solution is that $\mathrm{Var}(\bar{X}_9) = 16/9$. So, performing the intermediate steps, the problem reduces to finding $P(-1.5 \le z \le 1.5)$ and the answer is 0.86638.
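A quick check with Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Var(Xbar_9) = 16/9, so +/-2 around mu is +/-1.5 standard errors;
# the probability is P(-1.5 <= z <= 1.5).
se = (16 / 9) ** 0.5               # 4/3
z = 2 / se                         # 1.5
p = NormalDist().cdf(z) - NormalDist().cdf(-z)
print(round(p, 5))                 # ~0.86639 (table: 0.86638)
```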

6. 

In a certain university the CGPA’s of students only takes the values 0, 1, 2, 3, 4. The distribution of CGPA’s of students of this university is given below:

CGPA    Freq.
0       5,000
1       10,000
2       20,000
3       5,000
4       10,000
Total   50,000

i. Let X1 denote the CGPA of a student who was chosen at random from the population of all students of this university. Tabulate the PDF of this random variable.

ii. Let X2 denote the average (mean) of the CGPA’s of two randomly selected students from the population of all students of this university. What is the probability that the average CGPA of the two students is less than or equal to 1?

iii. Now we choose 36 students at random. What is the probability, approximately, that the average CGPA of these students ($\bar{X}_{36}$) is less than or equal to 2.3?

Solution: This exercise is left as self-study.

7. 

Consider a large population of which only 20% know basic concepts of statistics. We take a random sample of size 81 from this population and count the number of individuals in the sample who know basic concepts of statistics. What is the probability that the sample will have between 15 and 18 (inclusive) individuals who know basic concepts of statistics?

Solution: This exercise is left as self-study.
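One way to verify your self-study answer: the count of such individuals is Binomial(81, 0.2), so the probability can be summed exactly with a short Python sketch before trying the normal approximation of Section 4.4.2:

```python
from math import comb

# Exact P(15 <= X <= 18) for X ~ Binomial(n = 81, p = 0.2).
n, p = 81, 0.2
prob = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(15, 19))
print(round(prob, 5))
```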

8. 

A four sided fair die is rolled several times and we calculate the average (mean) of the values observed.

i. Let X1 denote the random variable that gives the value observed when the die is rolled once. Find the expected value and variance of X1.

ii. Let $\bar{X}_n$ denote the mean of the values observed when the die is rolled $n$ times. What is the minimum number of times that the die should be rolled so that the mean (the value of $\bar{X}_n$) takes a value in the interval [2.1, 2.9] with at least a probability of 0.9? Use Chebyshev's Theorem to answer this problem.

iii. Using the Central Limit Theorem and the value for $n$ you found above, find an approximate value for the probability of $\bar{X}_n$ taking a value in the interval [2.1, 2.9].

iv. Do the answers you found in item (ii) and item (iii) contradict each other? If there is a difference in what the two answers suggest, explain the reason for this.

Solution: i. When the die is rolled once, by definition we produce an outcome directly from the population (that is, we don't do any sampling at all). So,

\[
X_1 \sim f(x), \quad f(x) = 1/4, \quad x = 1, 2, 3, 4 \quad \text{(discrete RV)}
\]

\[
\begin{aligned}
E(X_1) &= 1 \cdot 1/4 + 2 \cdot 1/4 + 3 \cdot 1/4 + 4 \cdot 1/4 \\
&= 2.5
\end{aligned}
\]

\[
\begin{aligned}
E(X_1^2) &= 1 \cdot 1/4 + 4 \cdot 1/4 + 9 \cdot 1/4 + 16 \cdot 1/4 \\
&= 7.5
\end{aligned}
\]

\[
\begin{aligned}
\mathrm{Var}(X_1) &= 7.5 - 2.5^2 \\
&= 1.25
\end{aligned}
\]

So, $X_1 \sim (2.5, 1.25)$.

ii. $\bar{X}_n$ is the RV denoting the sample mean when we roll the die $n$ times.

\[
E(\bar{X}_n) = E(X_1) = 2.5
\]
\[
\mathrm{Var}(\bar{X}_n) = \frac{\mathrm{Var}(X_1)}{n} = \frac{1.25}{n}
\]

The midpoint of the interval [2.1, 2.9] is 2.5:

\[
E(X_1) = E(\bar{X}_n) = 2.5
\]
\[
2.5 - 2.1 = 2.9 - 2.5 = 0.4 \rightarrow \epsilon = 0.4
\]

\[
P(|\bar{X}_n - 2.5| \le 0.4) \ge 1 - \frac{1.25/n}{0.4^2} \ge 0.9
\]

Setting the bound equal to 0.9:

\[
1 - \frac{1.25/n}{0.4^2} = 0.9
\]
\[
\frac{1.25}{n} = 0.1 \times 0.16 = 0.016
\]
\[
n = \frac{1.25}{0.016} = 78.125
\]
\[
n = 79 \quad \text{(round 78.125 up)}
\]

Notice that in this part, we use Chebyshev's theorem only. We get:

\[
E(\bar{X}_{79}) = 2.5
\]

and

\[
\mathrm{Var}(\bar{X}_{79}) = \frac{1.25}{79} = 0.015823
\]

iii.

\[
\begin{aligned}
P(2.1 \le \bar{X}_{79} \le 2.9) &= P\Big(\frac{2.1 - 2.5}{\sqrt{0.015823}} \le z \le \frac{2.9 - 2.5}{\sqrt{0.015823}}\Big) \\
&= P\Big(\frac{-0.4}{0.1258} \le z \le \frac{0.4}{0.1258}\Big) \\
&= 0.99852
\end{aligned}
\]

Notice that in this part, we use the CLT only.

iv. No, they don't contradict each other: while Chebyshev's theorem sets a lower bound of 0.90 here (in ii), (iii) gives the actual probability as 0.99852, which is larger than 0.90 (as it should be).
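Parts ii. and iii. can be reproduced in a few lines of Python, with `statistics.NormalDist` standing in for the printed z-table:

```python
from math import ceil, sqrt
from statistics import NormalDist

# Part (ii): Chebyshev requires 1 - (1.25/n)/0.4^2 >= 0.9,
# i.e. n >= 1.25 / (0.1 * 0.16) = 78.125, so n = 79.
var_x1, eps = 1.25, 0.4
n = ceil(var_x1 / (0.1 * eps**2))
# Part (iii): CLT probability at that n.
se = sqrt(var_x1 / n)
p = NormalDist().cdf(eps / se) - NormalDist().cdf(-eps / se)
print(n, round(p, 5))   # 79 and ~0.9985
```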

Though I am not a fan of such hints, a useful hint may be underlined here: when an interval is given in a question like this, always begin with observing (calculating) its midpoint. In this question, the midpoint was $(2.1 + 2.9)/2 = 2.5$, which is nothing but $E(X_1)$ and $E(\bar{X}_n)$. Once you have noticed this, it will be trivial to find $\epsilon$ and figure out the rest of the steps.

One additional point:

\[
\bar{X}_1 = \frac{X_1}{1}
\]

So, sampling with $n = 1$ simply refers to the population's distribution.

If $Z_i$, $i = 1, 2, \dots, m$ are all $N(0,1)$ random variables, then:

\[
V = Z_1^2 + Z_2^2 + \dots + Z_m^2
\]

has a $\chi^2$ distribution with $m$ degrees of freedom.

\[
V \sim \chi^2_{(m)}
\]
\[
E(V) = m
\]
\[
\mathrm{Var}(V) = 2m
\]

In a nutshell: recall that your $Z_i$'s measure nothing but deviation from the mean (here 0) in terms of standard deviations (here 1). So, when we sum the squares of the $Z_i$'s, we are calculating the sum of squares of the deviations of a random variable from its mean. This should be something related to variance. Indeed, $\chi^2$ turns out to be the sampling distribution of variance. Put another way, the $\chi^2$ distribution is the assessment benchmark for variability.

Consider a single $Z \sim N(0,1)$ random variable and

\[
V = Z^2
\]

and observe this is nothing but a $\chi^2$ distribution with 1 degree of freedom.

\[
V \sim \chi^2_{(1)}
\]
\[
E(V) = 1
\]
\[
\mathrm{Var}(V) = 2
\]
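A simulation sketch of these moments; $m = 5$ is an illustrative choice, so the theory predicts $E(V) = 5$ and $\mathrm{Var}(V) = 10$:

```python
import random
import statistics

# Simulate V = Z1^2 + ... + Zm^2 for m = 5 standard normals and check
# E(V) ~ m and Var(V) ~ 2m.
random.seed(5)
m, reps = 5, 50_000
vs = [sum(random.gauss(0, 1) ** 2 for _ in range(m)) for _ in range(reps)]
print(statistics.fmean(vs), statistics.variance(vs))
```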

Checkpoint No: 64

Standard Normal Distribution

A cell at the intersection of the row labeled $a$ and the column labeled $b$ gives the probability $P(Z \le a + b)$.


z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.00   0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.10   0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5754
0.20   0.5793 0.5832 0.5871 0.5909 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.30   0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.40   0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.50   0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.60   0.7258 0.7291 0.7324 0.7357 0.7389 0.7421 0.7454 0.7486 0.7518 0.7549
0.70   0.7580 0.7611 0.7642 0.7673 0.7703 0.7734 0.7764 0.7793 0.7823 0.7852
0.80   0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.90   0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.00   0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.10   0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.20   0.8849 0.8869 0.8888 0.8906 0.8925 0.8943 0.8962 0.8980 0.8997 0.9015
1.30   0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.40   0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.50   0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.60   0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.70   0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.80   0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.90   0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.00   0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.10   0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.20   0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.30   0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.40   0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.50   0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.60   0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.70   0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.80   0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.90   0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.00   0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.10   0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.20   0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.30   0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.40   0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

In a nutshell: Probability theory → Statistics conceptual mapping:

ECON 221 → ECON 222
Probability theory → Statistics
X → Data
PDF, f → Relative frequency distribution (Histogram)
CDF, F → Relative cumulative frequency distribution (Ogive)
E(X) → μ ← x̄
Var(X) → σ² ← s²