Chapter 5
Point estimators

5.1 Point estimation

Based on our earlier studies and practice, we now move toward the study of estimating population parameters. Let's use $\theta$ to denote, generally speaking, our parameter of interest. First of all, we don't know the value of $\theta$; that's why we are estimating it. All we know/have is a sequence of values coming out of the population. Formally, $\{x_i\}_{i=1}^N$ is the population and $\{x_i\}_{i=1}^n$ is our sample. Naturally, the second one is a subset of the first. As we'll discuss in detail later, our sample should be a random and large enough sample. So, our problem is to come up with a formula to estimate $\theta$; using function notation,

$$\hat{\theta} = \hat{\theta}\left(\{x_i\}_{i=1}^n\right)$$

In the expression $\hat{\theta}\left(\{x_i\}_{i=1}^n\right)$, $\hat{\theta}(\cdot)$ is our estimator, i.e., our formula for finding a value for the unknown $\theta$. The $\hat{\theta}$ on the left-hand side is called an estimate of $\theta$. Estimation is the name of the task we are performing here.

So, in the task of estimation, we use an estimator to obtain an estimate.

As we are seeking a single value as an estimate here, what we do falls into the category of point estimation.

Notice that $\theta$ is unknown. $\{x_i\}_{i=1}^n$ is our data set, at hand and known; it is what we collect or gather. $\hat{\theta}$ as a function of $\{x_i\}_{i=1}^n$ is an estimator; it is what we derive. $\hat{\theta}$ as a numerical result is an estimate; it is what we calculate. The task of finding a value $\hat{\theta}$ for $\theta$ is estimation; it is what we perform.

Note that, while studying the properties of $\hat{\theta}(\cdot)$, we prefer writing it as

$$\hat{\theta}\left(\{X_i\}_{i=1}^n\right)$$

rather than

$$\hat{\theta}\left(\{x_i\}_{i=1}^n\right)$$

as this practice allows us to refer to the statistical properties of the $X_i$.

Checkpoint No: 65

For our point estimators, we seek to attain three important properties: unbiasedness, efficiency, and consistency.

5.1.1 Unbiasedness

$$E\left(\hat{\theta}\right) = \theta$$

Unbiasedness is an individual property. If $\hat{\theta}$ is an unbiased estimator, then, repeating the task of estimation several times, using a different random sample of $n$ observations each time, we come up with $\theta$ on average. That is,

$$E\left(\hat{\theta}\left(\{X_i\}_{i=1}^n\right)\right) = \theta$$
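A minimal simulation sketch of this idea, assuming the sample mean as $\hat{\theta}$ for the mean of a Normal population (the population parameters, sample size, and repetition count below are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, n, reps = 10.0, 30, 100_000  # assumed true mean, sample size, repetitions

# Repeat the estimation task many times, each time with a fresh random sample.
estimates = rng.normal(loc=mu, scale=2.0, size=(reps, n)).mean(axis=1)

# The average of the estimates lands on mu: the sample mean is unbiased.
print(estimates.mean())  # approximately 10.0
```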

5.1.2 Efficiency

Efficiency is a comparative property. With $\hat{\theta}_1$ and $\hat{\theta}_2$ being two estimators of $\theta$, if

$$\mathrm{Var}\left(\hat{\theta}_1\right) < \mathrm{Var}\left(\hat{\theta}_2\right)$$

then $\hat{\theta}_1$ is a more efficient estimator than $\hat{\theta}_2$. Notice that the inequality $\mathrm{Var}\left(\hat{\theta}_1\left(\{X_i\}_{i=1}^n\right)\right) < \mathrm{Var}\left(\hat{\theta}_2\left(\{X_i\}_{i=1}^n\right)\right)$ must hold at all times, i.e., for all $n$.
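As a sketch of such a comparison (an illustration, not from the text), consider the sample mean versus the sample median as estimators of the mean of a Normal population; both are unbiased here, but the mean has the smaller variance:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 10.0, 30, 100_000  # assumed mean, sample size, repetitions

samples = rng.normal(loc=mu, scale=2.0, size=(reps, n))
means = samples.mean(axis=1)          # estimator 1: sample mean
medians = np.median(samples, axis=1)  # estimator 2: sample median

# Var(mean) < Var(median): the sample mean is the more efficient estimator here.
print(means.var(), medians.var())
```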

5.1.3 Consistency

$$\lim_{n\to\infty} \mathrm{Var}\left(\hat{\theta}\right) = 0$$

Consistency is an individual property. Consider $\hat{\theta} = \hat{\theta}\left(\{X_i\}_{i=1}^n\right)$. If $\hat{\theta}$ is a consistent estimator, repeating the task of estimation with larger samples reduces the error of estimation. That is,

$$\mathrm{Var}\left(\hat{\theta}\left(\{X_i\}_{i=1}^n\right)\right) \xrightarrow{\ n\to\infty\ } 0$$
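A minimal sketch of this shrinkage, again assuming the sample mean of a Normal population as $\hat{\theta}$ (illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, reps = 10.0, 2.0, 10_000  # assumed population parameters, repetitions

for n in (10, 100, 1000):
    estimates = rng.normal(loc=mu, scale=sigma, size=(reps, n)).mean(axis=1)
    # The variance of the estimator shrinks roughly like sigma^2 / n.
    print(n, estimates.var())
```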

Checkpoint No: 66

5.1 EXERCISES ___________________________________________________________     

1. 

Consider a population that is uniformly distributed over the interval [0,β] where β is an unknown positive number that we want to estimate. Three alternative estimators are given to serve our purpose as:

$$\hat{\beta}_1 = \max\{X_1, X_2, \ldots, X_n\}$$
$$\hat{\beta}_2 = \left(\frac{n+1}{n}\right)\max\{X_1, X_2, \ldots, X_n\}$$
$$\hat{\beta}_3 = \frac{2\left(X_1 + X_2 + \cdots + X_n\right)}{n}$$

Estimator | Expected value | Variance
$\hat{\beta}_1$ | $\frac{n}{n+1}\beta$ | $\frac{n\beta^2}{(n+2)(n+1)^2}$
$\hat{\beta}_2$ | $\beta$ | $\frac{\beta^2}{(n+2)n}$
$\hat{\beta}_3$ | $\beta$ | $\frac{\beta^2}{3n}$

If you had to choose one of these estimators, which one would you pick? Why?

Solution: Among the three rival estimators $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$; $\hat{\beta}_2$ and $\hat{\beta}_3$ are unbiased as:

$$E\left(\hat{\beta}_2\right) = \beta$$
$$E\left(\hat{\beta}_3\right) = \beta$$

Both $\hat{\beta}_2$ and $\hat{\beta}_3$ are consistent as:

$$\lim_{n\to\infty} \mathrm{Var}\left(\hat{\beta}_2\right) = \lim_{n\to\infty} \frac{\beta^2}{(n+2)n} = 0$$

and

$$\lim_{n\to\infty} \mathrm{Var}\left(\hat{\beta}_3\right) = \lim_{n\to\infty} \frac{\beta^2}{3n} = 0$$

For $n > 1$, $\mathrm{Var}\left(\hat{\beta}_2\right) < \mathrm{Var}\left(\hat{\beta}_3\right)$. So, $\hat{\beta}_2$ is more efficient than $\hat{\beta}_3$, and we choose $\hat{\beta}_2$.
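A minimal simulation sketch comparing the three estimators (the true $\beta$, sample size, and repetition count are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
beta, n, reps = 5.0, 10, 100_000  # assumed true beta, sample size, repetitions

x = rng.uniform(0.0, beta, size=(reps, n))
b1 = x.max(axis=1)                # beta_hat_1: sample maximum
b2 = (n + 1) / n * x.max(axis=1)  # beta_hat_2: bias-corrected maximum
b3 = 2 * x.mean(axis=1)           # beta_hat_3: twice the sample mean

for name, b in (("b1", b1), ("b2", b2), ("b3", b3)):
    print(name, b.mean(), b.var())
# b1 is biased downward; b2 and b3 center on beta, with Var(b2) < Var(b3).
```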

2. 

Consider a population that is uniformly distributed over an interval $[0, \beta]$ where $\beta$ is an unknown positive number that we would like to estimate. We take a sample of size 3 from this population and use the sample maximum as an estimate for the population maximum.

i. What is the PDF of this estimator, i.e., what is the sampling distribution of the sample maximum?

ii. What is the expected value of this estimator?

iii. Is this estimator an unbiased estimator for β?

iv. Suggest one more estimator for β which is unbiased.

Solution:

1.

$\hat{\beta}$ is defined as:

$$\hat{\beta} = \max\{x_1, x_2, x_3\}$$

where $x_1$, $x_2$ and $x_3$ are the sample observations. Representing each observation $x_i$ as a random variable $X_i$:

$$\hat{\beta} = \max\{X_1, X_2, X_3\}$$

is attained, which is to be used in the subsequent computations. The CDF of $\hat{\beta}$, $F_{\hat{\beta}}(t)$, is defined as a function of $t$ as follows:

$$F_{\hat{\beta}}(t) = P\left(\hat{\beta} \le t\right) = P\left(\max\{X_1, X_2, X_3\} \le t\right).$$

Assuming that X1, X2 and X3 are independent random variables,

$$P\left(\max\{X_1, X_2, X_3\} \le t\right) = P\left(X_1 \le t \text{ and } X_2 \le t \text{ and } X_3 \le t\right) = P(X_1 \le t) \cdot P(X_2 \le t) \cdot P(X_3 \le t)$$

can be written. Since $X_1$, $X_2$ and $X_3$ come from a Uniform$(0, \beta)$ distribution, for $0 \le t \le \beta$:

$$P(X_1 \le t) = \frac{t}{\beta}, \quad P(X_2 \le t) = \frac{t}{\beta}, \quad P(X_3 \le t) = \frac{t}{\beta}$$

Then, $F_{\hat{\beta}}(t) = \frac{t^3}{\beta^3}$ is obtained, and it further yields:

$$f_{\hat{\beta}}(t) = \frac{dF_{\hat{\beta}}(t)}{dt} = \frac{3t^2}{\beta^3}.$$
2.

$$E\left(\hat{\beta}\right) = \int_{-\infty}^{\infty} t\, f_{\hat{\beta}}(t)\, dt = \int_0^{\beta} t \cdot \frac{3t^2}{\beta^3}\, dt = \frac{3t^4}{4\beta^3}\bigg|_0^{\beta} = \frac{3}{4}\beta$$
3.

As $E\left(\hat{\beta}\right) = \frac{3}{4}\beta < \beta$, $\hat{\beta}$ is not an unbiased estimator of $\beta$.
4.

$\tilde{\beta} = \frac{4}{3}\max\{x_1, x_2, x_3\}$ is an unbiased estimator of $\beta$. Check why/how.
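A minimal simulation sketch of parts ii–iv, with an assumed $\beta$ for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
beta, reps = 6.0, 200_000  # assumed true beta, number of repetitions

x = rng.uniform(0.0, beta, size=(reps, 3))  # many samples of size 3
b_hat = x.max(axis=1)                       # sample maximum
b_tilde = 4.0 / 3.0 * b_hat                 # bias-corrected estimator

print(b_hat.mean())    # approximately (3/4) * beta = 4.5 -> biased
print(b_tilde.mean())  # approximately beta = 6.0 -> unbiased
```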
3. 

In a pasta factory, due to a miscalibration of the machinery, all spaghetti sticks produced on a given day had different lengths; they were then randomly packed and shipped to different locations and consumers. We wonder about the length of the longest spaghetti stick produced that day. Offer a good statistical estimator to address this problem and discuss/evaluate its properties.

Solution: This exercise is left as self-study.

4. 

German tank problem

The problem is named after its use by Allied forces in World War II to estimate the monthly rate of German tank production from limited data. The approach exploits the manufacturing practice of assigning and attaching ascending sequences of serial numbers to tank components, with some tanks being captured in battle by Allied forces. With $N$ being the total number of tanks produced, $m$ being the highest serial number observed and $k$ being the number of tanks captured, the estimator for $N$ is given by:

$$\hat{N} = m + \frac{m}{k} - 1$$

Examine and explain each term in the formula.

Solution: The purpose in the German tank problem is to estimate the population size $N$. Supposing that German tank manufacturers number their tanks sequentially in ascending order, estimation of the population size and estimation of the population maximum are similar problems. With $m$ being the highest (largest) serial number among the captured tanks and $k$ being the number of tanks captured (i.e., the sample size), the estimator for $N$ is given by:

$$\hat{N} = m + \frac{m}{k} - 1$$

In the formula, the first term ($m$) reflects the intuition that the sample maximum is a good estimator of the population maximum. However, when $k$ is small, we are unlikely to observe the highest serial numbers in the sample (among the captured tanks). To account for this, the $m/k$ term is added to the first one. This term, for small $k$, yields a substantial addition to $\hat{N}$. For large $k$, the addition to $\hat{N}$ is limited. At the extreme, if $k = N$, then $m$ is also equal to $N$. Such a configuration yields:

$$\hat{N} = N + \frac{N}{N} - 1 = N$$

indicating that when we capture all tanks, we, by definition, know the total number of tanks.
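A minimal sketch of the estimator as a function (the captured serial numbers below are hypothetical):

```python
def german_tank_estimate(serials: list[int]) -> float:
    """N_hat = m + m/k - 1, where m is the highest observed serial
    number and k is the number of tanks captured (the sample size)."""
    m = max(serials)  # highest serial number observed
    k = len(serials)  # number of tanks captured
    return m + m / k - 1

# Hypothetical captured serial numbers:
print(german_tank_estimate([19, 40, 42, 60]))  # 60 + 60/4 - 1 = 74.0
```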

So far, we've studied the properties of point estimators in a thorough manner. However, we didn't devote any effort to the derivation of the estimators themselves. We do that now. The rest of this chapter introduces three techniques (estimation criteria, estimation rules, or principles) to yield point estimators of distributional parameters of populations or data generating processes: the Least Squares (LS) technique, the Maximum Likelihood (ML) technique, and the Method of Moments (MM) technique.

Our task here is to derive the relevant mathematical formulas used to generate numerical estimates. Recall that these formulas are called estimators, as we've studied above.

Checkpoint No: 67

5.2 Least squares technique: LS

Consider $\{x_i\}_{i=1}^n \subset \{x_i\}_{i=1}^N \sim N\left(\mu, \sigma^2\right)$, where we want to estimate the unknown $\mu$ using our data. If we define a function $S$ like:

$$S\left(\hat{\mu} \mid \{x_i\}_{i=1}^n\right) = \sum_{i=1}^n \left(x_i - \hat{\mu}\right)^2$$

we can easily see it as a loss (or punishment) function. Solving:

$$\min_{\{\hat{\mu}\}} \sum_{i=1}^n \left(x_i - \hat{\mu}\right)^2$$

we find our estimator.

$$\frac{dS(\cdot)}{d\hat{\mu}} = \sum_{i=1}^n 2\left(x_i - \hat{\mu}\right)(-1) = 0 \quad \text{(F.O.C.)}$$
$$-2\sum_{i=1}^n \left(x_i - \hat{\mu}\right) = 0$$
$$\sum_{i=1}^n \left(x_i - \hat{\mu}\right) = 0$$
$$\sum_{i=1}^n x_i = \sum_{i=1}^n \hat{\mu}$$
$$n\hat{\mu} = \sum_{i=1}^n x_i$$

So,

$$\hat{\mu} = \frac{\sum_{i=1}^n x_i}{n} = \bar{x}$$

is found. By minimizing the loss function, we've obtained the LS estimator for $\mu$, which is $\hat{\mu}$. Notice that the minimization of $S$ here is equivalent to the minimization of (an estimate of) the unknown variance.
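A minimal numerical check of this derivation, minimizing $S$ directly over $\hat{\mu}$ (the data values are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([5.0, 15.0, 20.0, 40.0, 40.0, 45.0, 50.0, 50.0, 50.0, 55.0])

# Loss function S(mu_hat | data): the sum of squared deviations.
def S(mu_hat: float) -> float:
    return float(np.sum((x - mu_hat) ** 2))

res = minimize_scalar(S)
print(res.x, x.mean())  # the numerical minimizer coincides with the sample mean
```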

Checkpoint No: 68

5.3 Maximum likelihood technique: ML

Consider $\{x_i\}_{i=1}^n \subset \{x_i\}_{i=1}^N \sim \text{Poisson}(\lambda)$, where we want to estimate the unknown $\lambda$ using our data. Let's denote the estimator as $\hat{\lambda}$. So, the likelihood of any $x_i$ in our data set is:

$$f(x_i) = \frac{e^{-\hat{\lambda}} \hat{\lambda}^{x_i}}{x_i!}$$

Then, the likelihood of the whole data set becomes:

$$L\left(\hat{\lambda} \mid \{x_i\}_{i=1}^n\right) = \prod_{i=1}^n \frac{e^{-\hat{\lambda}} \hat{\lambda}^{x_i}}{x_i!}$$

which is to be maximized to find $\hat{\lambda}$.

For computational ease, we take the natural logarithm of the likelihood function and call it the log-likelihood function, denoted $LogL$.

$$LogL\left(\hat{\lambda} \mid \{x_i\}_{i=1}^n\right) = \sum_{i=1}^n \left(\ln e^{-\hat{\lambda}} + \ln \hat{\lambda}^{x_i} - \ln x_i!\right) = \sum_{i=1}^n \left(-\hat{\lambda}\right) + \sum_{i=1}^n \left(x_i \ln \hat{\lambda}\right) - \sum_{i=1}^n \left(\ln x_i!\right)$$

Thus,

$$LogL(\cdot) = -n\hat{\lambda} + \ln\hat{\lambda} \sum_{i=1}^n x_i - \sum_{i=1}^n \ln x_i!$$

Solving,

$$\max_{\{\hat{\lambda}\}} \left(-n\hat{\lambda} + \ln\hat{\lambda} \sum_{i=1}^n x_i - \sum_{i=1}^n \ln x_i!\right)$$

$$\frac{dLogL(\cdot)}{d\hat{\lambda}} = -n + \frac{1}{\hat{\lambda}} \sum_{i=1}^n x_i = 0 \quad \text{(F.O.C.)}$$
$$\frac{1}{\hat{\lambda}} \sum_{i=1}^n x_i = n$$
$$\hat{\lambda} = \frac{\sum_{i=1}^n x_i}{n}$$

is found. By maximizing the likelihood function, we've obtained the ML estimator for $\lambda$, which is $\hat{\lambda}$.
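A minimal numerical check of this derivation, maximizing the log-likelihood directly (the count data are hypothetical; `scipy.special.gammaln(x + 1)` computes $\ln x!$):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

x = np.array([2, 0, 3, 1, 4, 2, 1, 0, 2, 3])  # hypothetical Poisson counts

# Negative log-likelihood: n*lambda - ln(lambda)*sum(x) + sum(ln x!).
def negLogL(lam: float) -> float:
    return lam * len(x) - np.log(lam) * x.sum() + gammaln(x + 1).sum()

res = minimize_scalar(negLogL, bounds=(1e-6, 20.0), method="bounded")
print(res.x, x.mean())  # the numerical maximizer coincides with the sample mean
```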

Checkpoint No: 69

5.4 Method of moments technique: MM

Consider $\{x_i\}_{i=1}^n \subset \{x_i\}_{i=1}^N \sim N\left(\mu, \sigma^2\right)$, where we want to estimate the unknown $\mu$ and $\sigma^2$ using our data. Remember that the theoretical moments for the population are:

$$\mu_1 = \mu$$
$$\mu_2 = \sigma^2 + \mu^2$$

The numerical values of each are unknown. Yet, we know the values of the data moments as:

$$M_1 = \frac{\sum_{i=1}^n x_i}{n}, \quad M_2 = \frac{\sum_{i=1}^n x_i^2}{n}$$

Now, all we need to do is to match population moments with data moments:

$$\mu_1 = M_1 \implies \mu = \frac{\sum_{i=1}^n x_i}{n}$$
$$\mu_2 = M_2 \implies \sigma^2 + \mu^2 = \frac{\sum_{i=1}^n x_i^2}{n}$$

As we have a system of two equations in two unknowns, the solution, if it exists, is:

$$\hat{\mu} = M_1$$
$$\hat{\sigma}^2 = M_2 - M_1^2$$

$\hat{\mu}$ and $\hat{\sigma}^2$ are the MM estimators of $\mu$ and $\sigma^2$, respectively.
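A minimal simulation sketch of the MM recipe on Normal data (the population parameters and sample size are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, scale=3.0, size=10_000)  # assumed N(10, 9) population

M1 = x.mean()         # first data moment
M2 = (x ** 2).mean()  # second data moment

mu_hat = M1                # from mu_1 = M1
sigma2_hat = M2 - M1 ** 2  # from mu_2 = M2
print(mu_hat, sigma2_hat)  # close to 10 and 9
```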

5.2 EXERCISES ___________________________________________________________     

1. 

Derive the Least Squares estimator for the mean of a population out of which you are given a data set $\{x_i\}_{i=1}^n$. Analyze the unbiasedness and consistency of the estimator you've derived.

Solution: Replicate the solution given in the chapter. It simply yields:

$$\hat{\mu} = \frac{\sum_{i=1}^n x_i}{n} = \bar{x}$$

Writing our estimator $\hat{\mu}$ as a random variable $\bar{X}_n$, it is easy to compute $E\left(\bar{X}_n\right)$ and $\mathrm{Var}\left(\bar{X}_n\right)$:

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
$$E\left(\bar{X}_n\right) = \mu$$
$$\mathrm{Var}\left(\bar{X}_n\right) = \frac{\sigma^2}{n}$$

So, $\hat{\mu}$ is an unbiased and consistent estimator.

2. 

We want to estimate $\lambda$ for a Poisson population. However, we have only data on interarrival times of occurrences rather than the count of occurrences per unit time. Denoting this data set as $\{x_i\}_{i=1}^n$, try to estimate $\lambda$ using the Maximum Likelihood technique.

Solution: Since Poisson and Exponential are sister distributions (sharing the same parameter $\lambda$, as we have maintained in these Lecture Notes), we can use the interarrival time data $\{x_i\}_{i=1}^n$ via the Maximum Likelihood technique to estimate $\lambda$.
Consider $\{x_i\}_{i=1}^n \subset \{x_i\}_{i=1}^N \sim \text{Exponential}(\lambda)$, where we want to estimate the unknown $\lambda$ using our data. Let's denote the estimator as $\hat{\lambda}$. So, the likelihood of any $x_i$ in our data set is:

$$f(x_i) = \hat{\lambda} e^{-\hat{\lambda} x_i}$$

Then, the likelihood of the whole data set becomes:

$$L\left(\hat{\lambda} \mid \{x_i\}_{i=1}^n\right) = \prod_{i=1}^n \hat{\lambda} e^{-\hat{\lambda} x_i}$$

which is to be maximized to find $\hat{\lambda}$. Then, the log-likelihood function, $LogL$, becomes:

$$LogL\left(\hat{\lambda} \mid \{x_i\}_{i=1}^n\right) = \sum_{i=1}^n \left(\ln\hat{\lambda} - \hat{\lambda} x_i\right)$$
$$LogL(\cdot) = n\ln\hat{\lambda} - \hat{\lambda}\sum_{i=1}^n x_i$$

Solving,

$$\max_{\{\hat{\lambda}\}} \left(n\ln\hat{\lambda} - \hat{\lambda}\sum_{i=1}^n x_i\right)$$

$$\frac{dLogL(\cdot)}{d\hat{\lambda}} = \frac{n}{\hat{\lambda}} - \sum_{i=1}^n x_i = 0 \quad \text{(F.O.C.)}$$
$$\frac{n}{\hat{\lambda}} = \sum_{i=1}^n x_i$$
$$\hat{\lambda} = \frac{1}{\frac{\sum_{i=1}^n x_i}{n}} = \frac{n}{\sum_{i=1}^n x_i}$$

is found. So, the ML estimator of $\lambda$ is the reciprocal of the sample mean of the interarrival times.
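A minimal numerical check, maximizing this log-likelihood directly (the interarrival times are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.7])  # hypothetical interarrival times

# Negative log-likelihood: -(n*ln(lambda) - lambda*sum(x)).
def negLogL(lam: float) -> float:
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(negLogL, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1 / x.mean())  # the numerical maximizer matches n / sum(x)
```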
3. 

We believe the monthly book expenditures by students of a college have a normal distribution, yet we don't know $\mu$ and $\sigma^2$. Luckily, we have the following data on book expenditures:

5, 15, 20, 40, 40, 45, 50, 50, 50, 55

Using the Method of Moments, estimate the population mean and variance.

Solution: The theoretical moments are:

$$\mu_1 = \mu$$
$$\mu_2 = \sigma^2 + \mu^2$$

We calculate the data moments for 5, 15, 20, 40, 40, 45, 50, 50, 50, 55 as:

$$M_1 = \frac{\sum_{i=1}^n x_i}{n} = 37, \quad M_2 = \frac{\sum_{i=1}^n x_i^2}{n} = 1640$$

Matching the population (theoretical) moments with data moments:

$$\mu_1 = M_1 \implies \mu = 37$$
$$\mu_2 = M_2 \implies \sigma^2 + \mu^2 = 1640$$

Finally, the solution is:

$$\hat{\mu} = M_1 = 37$$
$$\hat{\sigma}^2 = M_2 - M_1^2 = 1640 - 37^2 = 271$$
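A quick numerical check of these figures (a sketch using NumPy):

```python
import numpy as np

x = np.array([5, 15, 20, 40, 40, 45, 50, 50, 50, 55], dtype=float)

M1 = x.mean()         # first data moment  -> 37.0
M2 = (x ** 2).mean()  # second data moment -> 1640.0

print(M1, M2 - M1 ** 2)  # mu_hat = 37.0, sigma2_hat = 271.0
```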
4. 

Given a sample data set $\{x_i\}_{i=1}^n$ out of a Uniform$(a, b)$ population, estimate $a$ and $b$ using:

i. The Least Squares technique

ii. The Maximum Likelihood technique

iii. The Method of Moments technique

In which cases are you successful? Why?

Solution: This exercise is left as self-study.

5. 

We know that a population of values ($X$) has a Funny$(a, b)$ distribution. We also know that the PDF of the Funny$(a, b)$ distribution is properly defined as:

$$f(x) = \binom{a}{x} b^x (1-b)^{a-x}, \quad x = 0, 1, 2, \ldots, a; \quad a \in \mathbb{Z}^+$$

First, reveal and write the properties of $b$. Then, supposing a random sample $\{x_i\}_{i=1}^n$ is available, derive the Method of Moments estimators of the distributional parameters $a$ and $b$. Explain whether you require any specific features of your sample so that you will be able to obtain numerically consistent estimates.

Solution: This exercise is left as self-study.

Checkpoint No: 70