Chapter 3
Random variables

Earlier, we entered the world of data and learned a rich collection of descriptive statistics, and then developed a solid understanding of probability theory. The knowledge of this chapter will now allow us to understand and practice probability theory by means of the standard tools of calculus.

3.1 Random Variables

Formally, a random variable is a function from a sample space S to the set of real numbers R. Note at the very beginning that we denote random variables with uppercase letters and their particular values with the corresponding lowercase letters. So, a random variable X can take a value x.

When we define the outcomes of a random experiment via a random variable, we can generalize the very structure of the experiment and get rid of our context-dependence.

After internalizing the knowledge of this chapter, we will be able to state and solve a long array of problems with more formalism. In the rest of this chapter, we will first study the concepts of 'Cumulative distribution function' (CDF) and 'Probability distribution function' (PDF). At the cost of a spoiler, we can say that the CDF and PDF are the theoretical counterparts of the ogive and the histogram, respectively. Secondly, we will study the concepts of the Expected value and Variance along with their key properties. While doing so, we will create and refer to a number of ad hoc random variables. Ad hoc is a Latin phrase meaning literally 'to this'. In English, it is used to describe 'something that has been formed or used for a special and immediate purpose, without previous planning'. That is, until we reach the section entitled 'Random variables and distributions: Discrete probability laws', we will be creating, using and disposing of several random variables that serve our specific scientific/technical purposes.

On one hand, our discussion and use of those ad hoc random variables and distributions will prove quite useful for handling a long list of probabilistic or statistical cases/problems. On the other hand, staying 'ad hoc' is not good for a full-fledged practice of science, as our journey will reveal. As a matter of fact, a rich set of probability laws (Discrete probability laws and Continuous probability laws) will allow us to categorize, model and solve a variety of real-world statistical problems in a sound as well as practical fashion. Note that the use of the term 'Law' may not be the best alternative available in scientific nomenclature; yet, it is part of the tradition. Those who are not comfortable with the use of the term 'Law' may replace it with the term 'Distribution'. As an example, 'Uniform probability law' and 'Uniform probability distribution' are simply the same thing.

Now, we can proceed with our quest to learn things. Recall our repeatedly used random experiment of 'tossing a fair coin'. Head and Tail (or, H and T) being the two sides of a coin, we already know the following:

S = {H, T}

P(Coin shows a Head) = P(H) = 1/2

P(Coin shows a Tail) = P(T) = 1/2

Upon these, we are allowed to define and study everything that is relevant. Despite its simplicity, such an approach lacks one important feature: mathematical generalization. Indeed, the real world hosts a bunch of random experiments with two basic outcomes: a student passes or fails an exam, a patient survives or does not survive a sickness, an asteroid hits or does not hit our planet Earth, and so on. Notice that each of these experiments looks like tossing a coin. Furthermore, if the probability of passing the exam, the probability of surviving and the probability of hitting the Earth are all equal to 1/2, these random experiments are 'identical' to tossing a coin, except for the details of naming. So, why not suggest a random variable X along with its probability distribution to address all these random experiments?

Consider X ∈ {0, 1} (i.e., x ∈ {0, 1}) for which P(X = 0) = 1/2 and P(X = 1) = 1/2. This is nothing but a direct equivalent of the random experiment of tossing a coin, without referring to a coin explicitly. Let us leave this discussion for a while to consider another random experiment.

Now consider (or recall from our in-class discussions) the random experiment of drawing a number from the interval [1, 5] in a fully blindfolded fashion; so, there are infinitely many basic outcomes, which are the real numbers from 1 to 5 (as one cannot guarantee to pick integers only, when blindfolded). With regard to this case, we already know the following:

S = {x | x ∈ [1,5],x ∈ R}

or simply,

S = [1,5]

and

P(The number picked is x) = 0,∀x

The final statement should trivially follow from chapter 4 (and should not sound weird to your ears anymore).

Following a similar agenda to the one we used in the case of tossing a fair coin above, we can say that the random experiments of picking a real number from [1, 5], from [2, 6], from [3, 7] or from [1001, 1005] should not differ. You may confirm this expectation once you have measured the length (size) of each of these intervals as 4.

Define now Y ∈ [1, 5] and leave this discussion aside until we cover the following definitions. Each of these definitions is crucial for our subsequent study of probability theory and statistics. Combining/pairing with your in-class notes, use these definitions to come up with a holistic picture of the things (objects) involved.

Checkpoint No: 37

3.2 Cumulative distribution function: CDF

The cumulative distribution function or CDF of a random variable X is denoted by F_X(x) or F(x), and is defined as:

F_X(x) = P_X(X ≤ x)

or

F(x) = P(X ≤ x)

In a nutshell
The function F(x) is a CDF if and only if it satisfies the following conditions:

∙ lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1
∙ F(x) is non-decreasing in x
∙ F(x) is right-continuous, i.e., lim_{x→x₀⁺} F(x) = F(x₀), ∀x₀

Checkpoint No: 38

3.3 Continuous and discrete random variables

A random variable X is continuous if F(x) is a continuous function of x. A random variable X is discrete if F(x) is a step function of x.

Additionally, the random variables X and Y are identically distributed if for every set A,

P (X ∈ A) = P (Y ∈ A)

where this does not necessarily mean X = Y. If X and Y are identically distributed,

FX (x) = FY (x) ,∀x

Checkpoint No: 39

3.4 Probability distribution functions

3.4.1 Probability distribution function: Case of discrete random variables

The probability distribution function (PDF) of a discrete random variable X is given by:

f_X(x) = P(X = x), ∀x

or

f(x) = P(X = x), ∀x

For discrete random variables, probability distribution function is also called the probability mass function.

3.4.2 Probability distribution function: Case of continuous random variables

The probability distribution function (PDF) of a continuous random variable X is the function f_X(x) that satisfies:

F_X(x) = ∫_{−∞}^{x} f_X(t) dt, ∀x

For continuous random variables, probability distribution function is also called the probability density function.

In a nutshell
That X has a distribution given by f(x) is abbreviated by X ∼ f(x), where we read the symbol "∼" as "is distributed as".

In a nutshell
The function f(x) is a PDF if and only if it satisfies the following conditions:

∙ f(x) ≥ 0, ∀x
∙ ∑_x f(x) = 1 (discrete X); ∫_{−∞}^{∞} f(x) dx = 1 (continuous X)

3.4.3 Connection between CDF and PDF

Using the Fundamental Theorem of Calculus, if f(x) is continuous:

(d/dx) F(x) = f(x)

The analogy with the discrete case is almost exact. We "add up" the point probabilities f(x) to obtain interval probabilities F(x). With a slight abuse of the notation:

ΔF(x) = f(x)

Numerical meaning of PDF: Values of f_X(x) or f(x) are probabilities if X is discrete. However, when X is continuous, f_X(x) or f(x) values are not probabilities. They are, rather, likelihoods or density ordinates. So, notice that:

In a nutshell
P(X = x) = f_X(x) (discrete X)

and

P(X = x) = 0 (continuous X)

Having been exposed to formal definitions of the functions and operators involved, now we will reconsider the random experiment of tossing a fair coin (discrete random variable case) and random experiment of picking a number from [1, 5] (continuous random variable case), in that order.

Checkpoint No: 40

Using our newly acquired knowledge, we can now define the following:

X ∼ f(x)

f(x) = 1/2,  x = 0
       1/2,  x = 1
       0,    otherwise

F(x) = 0,    x < 0
       1/2,  0 ≤ x < 1
       1,    x ≥ 1

Here, X is nothing but the random variable that describes the outcomes of the random experiment of tossing a fair coin.

Consider also:

Y ∼ g(y)

g(y) = 1/4,  1 ≤ y ≤ 5
       0,    otherwise

G(y) = 0,          y < 1
       (y − 1)/4,  1 ≤ y ≤ 5
       1,          5 ≤ y

You must have noticed that Y is the random variable that describes the outcomes of the random experiment of picking a number from [1, 5].

Since F(x) is a step function of x, X is a discrete random variable, and since G(y) is a continuous function of y, Y is a continuous random variable.

A function f(x) is a continuous function of x if:

lim_{x→x₀⁻} f(x) = lim_{x→x₀⁺} f(x) = f(x₀), ∀x₀

In a nutshell
A function f(x) is a left-continuous function of x if:

lim_{x→x₀⁻} f(x) = f(x₀), ∀x₀

A function f(x) is a right-continuous function of x if:

lim_{x→x₀⁺} f(x) = f(x₀), ∀x₀

A function f(x) which is both left-continuous and right-continuous in its domain is a continuous function.

In the cases of X ∼ f(x) and Y ∼ g(y) above, observe that:

f(− 0.5) = 0
f(0.5) = 0

f(1.5) = 0
g(−0.5) = 0

g(0) = 0
g(1.5) = 1/4
g(2.5) = 1/4

g(5.1) = 0
g(10) = 0

Also, notice that:

P(X = 0) = f(0) = 1/2

and

P(Y = 3) = 0

while

g(3) = 1/4

Your mind should be crystal clear about this distinction between probabilities and likelihoods for continuous random variables.

But how do we define/refer to probabilities and calculate them in the case of continuous random variables? The answer should be trivial to you: since the point probabilities are all zero for a continuous random variable, we can talk about the 'probabilities of intervals' only. Then, the following calculation for the random variable Y above is legitimate:

P(2 ≤ Y ≤ 4) = ∫_{2}^{4} g(y) dy
             = ∫_{2}^{4} (1/4) dy
             = y/4 |_{2}^{4}
             = 4/4 − 2/4
             = 2/4
             = 1/2

Alternatively,

P(2 ≤ Y ≤ 4) = G(4) − G(2)
             = (4 − 1)/4 − (2 − 1)/4
             = 3/4 − 1/4
             = 2/4
             = 1/2

yields the same solution. Now, make an effort to show these solutions on the graphs of g(y) and G(y).
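If you would like to double-check such computations numerically, here is a minimal sketch, assuming Python with the scipy package is available (an assumption on our side; the book's own experiments use Excel):

from scipy import stats
from scipy.integrate import quad

# Y ~ Uniform(1, 5); scipy parametrizes this as uniform(loc=1, scale=4)
Y = stats.uniform(loc=1, scale=4)

print(Y.pdf(3))              # 0.25: a density ordinate, NOT a probability
print(Y.cdf(4) - Y.cdf(2))   # G(4) - G(2) = 0.5
print(quad(Y.pdf, 2, 4)[0])  # integral of g over [2, 4], also 0.5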

Checkpoint No: 41

3.5 Expected Value

The expected value or mean of a random variable X is:

E(X) = ∑_x x f(x), if X is discrete

and

E(X) = ∫_{−∞}^{∞} x f(x) dx, if X is continuous

provided that the sum or integral exists.

The expected value or mean of a random variable g(X) is:

In a nutshell
E(g(X)) = ∑_x g(x) f(x), if X is discrete

and

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx, if X is continuous

provided that the sum or integral exists.

Let X be a random variable and a, b and c be constants. Then, for any g₁(x) and g₂(x) whose expectations exist (see the sketch after this list):

In a nutshell
∙ E(a g₁(x) + b g₂(x) + c) = a E(g₁(x)) + b E(g₂(x)) + c
∙ If g₁(x) ≥ 0, ∀x, then E(g₁(x)) ≥ 0
∙ If g₁(x) ≥ g₂(x), ∀x, then E(g₁(x)) ≥ E(g₂(x))
∙ If a ≤ g₁(x) ≤ b, ∀x, then a ≤ E(g₁(x)) ≤ b
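A quick numerical illustration of the first (linearity) property, using our fair-coin random variable X; a sketch assuming plain Python, where the constants a, b, c and the functions g1, g2 are arbitrary choices for illustration only:

xs, probs = [0, 1], [0.5, 0.5]            # the fair-coin random variable X
a, b, c = 3.0, -2.0, 7.0                  # arbitrary constants
g1 = lambda x: x ** 2                     # arbitrary g1(x)
g2 = lambda x: x + 1                      # arbitrary g2(x)

E = lambda g: sum(g(x) * p for x, p in zip(xs, probs))   # E(g(X)), discrete case

lhs = E(lambda x: a * g1(x) + b * g2(x) + c)
rhs = a * E(g1) + b * E(g2) + c
print(lhs, rhs)                           # both print 5.5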

Checkpoint No: 42

3.6 Variance and standard deviation

The variance of a random variable X is defined as:

Var(X) = E((X − E(X))²)

For a discrete random variable X:

Var(X) = ∑_x (x − E(X))² f(x)

where

E(X) = ∑_x x f(x)

For a continuous random variable X:

Var(X) = ∫_{−∞}^{∞} (x − E(X))² f(x) dx

where

E(X) = ∫_{−∞}^{∞} x f(x) dx

The positive square root of Var(X) is the standard deviation of X. If X is a random variable with finite variance, then for any constants a and b:

Var(aX + b) = a² Var(X)

An alternative and easier formula for the variance is given by:

Var(X) = E(X²) − (E(X))²

The simple proof is as follows:

In a nutshell
Var(X) = E((X − E(X))²) = E(X² − 2X E(X) + E(X)²)
       = E(X²) − 2E(X E(X)) + E(E(X)²)
       = E(X²) − 2E(X)E(X) + (E(X))²
       = E(X²) − (E(X))²

Regarding the same X and Y defined above, we can now study/compute the Expected value and the Variance:

E(X) = ∑_x x f(x)
     = 0·(1/2) + 1·(1/2)
     = 1/2

Var(X) = ∑_x (x − E(X))² f(x)
       = (0 − 1/2)²·(1/2) + (1 − 1/2)²·(1/2)
       = (1/4)·(1/2) + (1/4)·(1/2)
       = 1/4

E(X²) = ∑_x x² f(x)
      = 0²·(1/2) + 1²·(1/2)
      = 1/2

Var(X) = E(X²) − (E(X))²
       = 1/2 − (1/2)²
       = 1/2 − 1/4
       = 1/4

E(Y) = ∫_{−∞}^{∞} y g(y) dy
     = ∫_{1}^{5} y·(1/4) dy
     = y²/8 |_{1}^{5}
     = 25/8 − 1/8
     = 24/8
     = 3

Var(Y) = ∫_{−∞}^{∞} (y − E(Y))² g(y) dy
       = ∫_{1}^{5} (y − 3)²·(1/4) dy
       = (y − 3)³/12 |_{1}^{5}
       = 8/12 − (−8/12)
       = 16/12
       = 4/3

E(Y²) = ∫_{−∞}^{∞} y² g(y) dy
      = ∫_{1}^{5} y²·(1/4) dy
      = y³/12 |_{1}^{5}
      = 125/12 − 1/12
      = 124/12
      = 31/3

Var(Y) = E(Y²) − (E(Y))²
       = 31/3 − 3²
       = (31 − 27)/3
       = 4/3

3.1 EXERCISES

1. 

Let X be a random variable with the following cumulative distribution function F:

[Graph: the CDF F(x), a step function over x = 1, 2, 3, 4, 5, with vertical-axis ticks at 0.1, 0.2, ..., 0.9, 1.]

Calculate the following probabilities:

i. P(X ≤ 4)

ii. P(2 < X ≤ 4)

iii. P(2 ≤ X ≤ 4)

iv. P(3.5 ≤ X < 4)

v. P(X = 4)

vi. P(X > 3)

Solution:

i. P(X ≤ 4) = 0.8
ii. P(2 < X ≤ 4) = 0.5
iii. P(2 ≤ X ≤ 4) = 0.6
iv. P(3.5 ≤ X < 4) = 0
v. P(X = 4) = 0.1
vi. P(X > 3) = 0.4
2. 

Let X be a discrete random variable with the following PDF, f:

x     1    3    5    7    9
f(x)  0.4  0.1  0.2  0.2  0.1

i. P(3 < X < 7)

ii. P(3 < X < 7|X > 5)

iii. Draw the graph of the CDF of X. Find the expected value of X.

Solution:

1.
P(3 < X < 7) = f(5) = 0.2
2.
P(3 < X < 7 | X > 5) = 0
3.
This exercise is left as self-study.
E(X) = ∑_x x f(x) = 1·0.4 + 3·0.1 + 5·0.2 + 7·0.2 + 9·0.1
     = 0.4 + 0.3 + 1.0 + 1.4 + 0.9
     = 4
3. 

Explain why each of the following is or is not a valid probability distribution for a discrete random variable X:

i.

x     0    1    2    3
f(x)  0.1  0.3  0.3  0.2

ii.

x     −2    −1    0
f(x)  0.25  0.50  0.25

iii.

x     4     9    20
f(x)  −0.3  0.4  0.3

iv.

x     2     3     5     6
f(x)  0.15  0.15  0.45  0.35

Solution:

1.
∑_x f(x) = 0.1 + 0.3 + 0.3 + 0.2 = 0.9 < 1.0 (not valid).
2.
f(x) ≥ 0 for all x values.
∑_x f(x) = 0.25 + 0.50 + 0.25 = 1.0 (valid).
3.
f(4) = −0.3 < 0 (not valid).
4.
∑_x f(x) = 0.15 + 0.15 + 0.45 + 0.35 = 1.10 > 1 (not valid).
4. 

The random variable X has the following discrete probability distribution:

x     1    3    5    7    9
f(x)  0.1  0.2  0.4  0.2  0.1

i. List the values x may assume.

ii. What value of x is the most probable?

iii. Graph the probability distribution.

iv. Find P(X =  7)

v. Find P(X ≥ 5)

vi. Find P(X > 2)

vii. Find E(X )

Solution:

1.
x can take any of the values from {1, 3, 5, 7, 9}.
2.
f(5) is greater than all other f(x) values; so, x = 5 is the most probable.
3.
This exercise is left as self-study.
4.
P(X = 7) = f(7) = 0.2
5.
P(X ≥ 5) = f(5) + f(7) + f(9) = 0.4 + 0.2 + 0.1 = 0.7
6.
P(X > 2) = f(3) + f(5) + f(7) + f(9)
         = 0.2 + 0.4 + 0.2 + 0.1
         = 0.9
7.
E(X) = ∑_x x f(x) = 5.
5. 

Consider the probability distributions,

x     0    1    2
f(x)  0.3  0.4  0.3

and

y     0    1    2
f(y)  0.1  0.8  0.1

i. Use your intuition to find the mean for each distribution.

ii. Which distribution appears to be more variable? Why?

Solution:

1.
f(x) is symmetric around x = 1. f(y) is symmetric around y = 1. So, E(X) = 1 and E(Y) = 1.
2.
X displays higher variation. Intuitively, "its values that are away from the expected value are more probable" compared to the case of Y.
6. 

Every morning, my mother gives me a random amount of money according to the following PDF, where X is the random variable that measures the amount of money:

x     20    30    40    50
f(x)  0.10  0.20  0.30  0.40

Right after that, my sister takes out of my pocket a random amount of money according to the following CDF, where Y is the random variable that measures the amount of money:

y     5     10    15
F(y)  0.30  0.70  1.00

Then I leave home and spend all my money before the day ends. Create a random variable W which shows the net amount of money before I leave home in the morning. Calculate F(w) and present it in tabular format. Using these functions:

i. Calculate E(X )

ii. Calculate E(Y )

iii. Verify that E(W) = E(X) − E(Y)

iv. Draw the graph of f(w) and mark the value of E(W) on it

v. Calculate Var(W )

Solution: i.

E(X) = ∑_x x f(x)
     = 20·0.10 + 30·0.20 + 40·0.30 + 50·0.40
     = 2 + 6 + 12 + 20
     = 40

ii. First, we need to find g(y) :

g(y) = ΔG(y)
g(5) = 0.30   ← 0.30
g(10) = 0.40  ← 0.70 − 0.30
g(15) = 0.30  ← 1.00 − 0.70

E(Y) = ∑_y y g(y) = 5·0.30 + 10·0.40 + 15·0.30
                  = 1.5 + 4 + 4.5
                  = 10

iii. First, we need to find the PDF of W, call it h(w). Find the possible values of W and calculate the probability for each w. Those values are

w ∈ {5,10,15,20,25,30,35,40,45}
h(5) = f(20)g(15) = 0.10⋅0.30 = 0.03

h(10) = f(20)g(10) = 0.10⋅0.40 = 0.04
h(15) = f(20)g(5)+ f(30)g(15) = 0.10 ⋅0.30+ 0.20⋅0.30 = 0.09
h(20) = f(30)g(10) = 0.20⋅0.40 = 0.08

h(25) = f(30)g(5)+ f(40)g(15) = 0.20 ⋅0.30+ 0.30⋅0.30 = 0.15
h(30) = f(40)g(10) = 0.30⋅0.40 = 0.12

h(35) = f(40)g(5)+ f(50)g(15) = 0.30 ⋅0.30+ 0.40⋅0.30 = 0.21
h(40) = f(50)g(10) = 0.40⋅0.40 = 0.16
h(45) = f(50)g(5) = 0.40⋅0.30 = 0.12

Then,

E(W) = ∑_w w h(w)
     = 5·0.03 + 10·0.04 + 15·0.09 + 20·0.08
       + 25·0.15 + 30·0.12 + 35·0.21 + 40·0.16
       + 45·0.12
     = 0.15 + 0.4 + 1.35 + 1.6
       + 3.75 + 3.6 + 7.35 + 6.4
       + 5.4
     = 30

From the previous parts we know that E(X) = 40 and E(Y) = 10. In this part, we found E(W) = 30. So, E(X) − E(Y) = 40 − 10 = 30 = E(W); verification done.

iv. Do on your own.

v. Calculate Var(W) as:

Var(W) = ∑_w (w − E(W))² h(w)
       = (5 − 30)²·0.03 + (10 − 30)²·0.04
         + (15 − 30)²·0.09 + (20 − 30)²·0.08
         + (25 − 30)²·0.15 + (30 − 30)²·0.12
         + (35 − 30)²·0.21 + (40 − 30)²·0.16
         + (45 − 30)²·0.12
       = 115

As an alternative:

E(W²) = ∑_w w² h(w)
      = 25·0.03 + 100·0.04 + 225·0.09
        + 400·0.08 + 625·0.15 + 900·0.12
        + 1225·0.21 + 1600·0.16 + 2025·0.12
      = 1015

Var(W) = E(W²) − (E(W))²
       = 1015 − 30²
       = 1015 − 900
       = 115

(As a follow-up exercise: calculate Var(X) and Var(Y) on your own, and verify that Var(W) = Var(X) + Var(Y)).
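As a cross-check of the whole exercise (and of the follow-up), the sketch below (assuming Python is available) enumerates h(w) from f and g and recomputes the moments:

from itertools import product
from collections import defaultdict

f = {20: 0.10, 30: 0.20, 40: 0.30, 50: 0.40}   # PDF of X
g = {5: 0.30, 10: 0.40, 15: 0.30}              # PDF of Y, recovered from the CDF

h = defaultdict(float)
for (x, px), (y, py) in product(f.items(), g.items()):
    h[x - y] += px * py                        # W = X - Y, X and Y independent

EW = sum(w * p for w, p in h.items())
VarW = sum((w - EW) ** 2 * p for w, p in h.items())
print(EW, VarW)                                # 30.0 and 115.0, as derived above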

7. 

Consider X ∼ f(x) = 1/4, 4 ≤ x ≤ 8, and Y ∼ g(y) = 1/3, 0 ≤ y ≤ 3, and another random variable W which is defined as W = X − Y. Calculate E(X), E(Y), E(W), Var(X), Var(Y), Var(W).

Solution:

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{4}^{8} x·(1/4) dx = (1/4)·x²/2 |_{4}^{8}
                                                   = (1/8)(64 − 16)
                                                   = 6

E(Y) = ∫_{−∞}^{∞} y g(y) dy = ∫_{0}^{3} y·(1/3) dy = (1/3)·y²/2 |_{0}^{3}
                                                   = (1/6)(9 − 0)
                                                   = 3/2

E(X²) = ∫_{−∞}^{∞} x² f(x) dx = ∫_{4}^{8} x²·(1/4) dx = (1/4)·x³/3 |_{4}^{8}
                                                      = (1/12)(512 − 64)
                                                      = 112/3

Var(X) = E(X²) − (E(X))²
       = 112/3 − 36
       = 4/3

E(Y²) = ∫_{−∞}^{∞} y² g(y) dy = ∫_{0}^{3} y²·(1/3) dy = (1/3)·y³/3 |_{0}^{3}
                                                      = (1/9)(27 − 0)
                                                      = 3

Var(Y) = E(Y²) − (E(Y))²
       = 3 − 9/4
       = 3/4

Without finding h(w), the following can be written (X and Y being independent):

W = X − Y → E(W) = E(X) − E(Y)
                 = 6 − 3/2
                 = 9/2

          → Var(W) = Var(X) + Var(Y)
                   = 4/3 + 3/4
                   = (16 + 9)/12
                   = 25/12

Another way to deal with W is through the joint density of X and Y, which is f(x)g(y) under independence:

E(W) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} (x − y) f(x) g(y) dy dx
     = ∫_{x=4}^{8} ∫_{y=0}^{3} (x − y)·(1/4)·(1/3) dy dx
     = (1/12) ∫_{x=4}^{8} [−(x − y)²/2]_{y=0}^{3} dx
     = −(1/24) ∫_{4}^{8} [(x − 3)² − (x − 0)²] dx
     = −(1/24) ∫_{4}^{8} (−6x + 9) dx
     = −(1/24) [−3x² + 9x]_{4}^{8}
     = −(1/24) [(−3·64 + 9·8) − (−3·16 + 9·4)]
     = −(1/24) [−192 + 72 + 48 − 36]
     = −(1/24)·(−108)
     = 108/24 = 9/2   (verifies E(W) = E(X) − E(Y))

To calculate Var(W) in the same fashion, compute E(W²) = ∫∫ (x − y)² f(x) g(y) dy dx over the same limits and use Var(W) = E(W²) − (E(W))².

If 'double integrals' were not in the curriculum of MATH 105 or MATH 106 and if you do not have a prior knowledge of them, you may safely skip this last part.
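For those who skipped the double integrals, the same numbers can be obtained numerically; a sketch assuming Python with scipy:

from scipy.integrate import dblquad

joint = 1 / 12                           # f(x) * g(y) = (1/4)(1/3) on the rectangle

# dblquad integrates func(y, x); x runs over [4, 8] and y over [0, 3]
EW = dblquad(lambda y, x: (x - y) * joint, 4, 8, 0, 3)[0]
EW2 = dblquad(lambda y, x: (x - y) ** 2 * joint, 4, 8, 0, 3)[0]
print(EW)               # 4.5 = 9/2
print(EW2 - EW ** 2)    # 2.0833... = 25/12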

3.7 Random variables and distributions: Discrete probability laws

We consider here several discrete probability laws (one being optional).

Checkpoint No: 43

3.7.1 Bernoulli distribution

Bernoulli distribution is also called a Bernoulli trial or Bernoulli process. Consider an experiment which consists of one trial, and let there be two possible outcomes, success and failure.

For X ∼ Bernoulli(P):

∀x ∈ R, f(x) = P,      x = 1
               1 − P,  x = 0
               0,      otherwise

Despite its simplicity, the Bernoulli distribution is a stunningly useful one, as a building block of some other distributions.

Observe below the PDF of X ∼ Bernoulli(0.80):

[Figure: PDF of Bernoulli(0.80).]

Expected value and Variance:

E(X) = ∑_{x=0}^{1} x f(x) = 0·(1 − P) + 1·P
                          = P

E(X²) = ∑_{x=0}^{1} x² f(x) = 0²·(1 − P) + 1²·P
                            = P

Var(X) = E(X²) − (E(X))²
       = P − P²
       = P(1 − P)

A practical hint for deriving Var(X)
In a nutshell
As you have seen in the derivations of E(X) and Var(X) for the Bernoulli distribution, we first calculated E(X). This is an often seamless step. However, instead of attacking Var(X) directly, we preferred to calculate E(X²), which with the knowledge of E(X) yields

Var(X) = E(X²) − (E(X))²

In the remaining derivations of this chapter, notice the use of

E(X(X − 1))

to facilitate easier derivation/calculation of Var(X). Especially when the PDF of X, i.e., f(x), involves combinations, E(X(X − 1)) may be a life saver.
Notice that E(X(X − 1)) = E(X² − X) = E(X²) − E(X).
So, E(X²) = E(X(X − 1)) + E(X), and:

Var(X) = E(X²) − (E(X))²
       = E(X(X − 1)) + E(X) − (E(X))²

Examine/practice this hint along this chapter.

Go to Teaching page & experiment with Bernoulli(P) using the file named 'Statistical distributions.xlsx'.

Checkpoint No: 44

3.7.2 Binomial distribution

Consider an experiment which consists of n independent and identical Bernoulli trials; i.e., the probability of success (P) is the same across all the trials and a trial's outcome does not alter the outcomes of the subsequent trials. X being the number of successes in n trials, X ∼ Binomial(n, P), i.e., X has a Binomial distribution with parameters n and P:

∀x ∈ R, f(x) = C(n, x)·P^x·(1 − P)^(n−x),  x = 0, 1, 2, ..., n
               0,                          otherwise

where C(n, x) = n!/(x!(n − x)!) denotes the binomial coefficient.

Observe below the PDF of X ∼ Binomial(8, 0.80):

[Figure: PDF of Binomial(8, 0.80).]

Then the PDF of X ∼ Binomial(8, 0.20):

[Figure: PDF of Binomial(8, 0.20).]

And finally the PDF of X ∼ Binomial(8, 0.50):

[Figure: PDF of Binomial(8, 0.50).]

Having compared the PDFs of Binomial(8, 0.80), Binomial(8, 0.20), Binomial(8, 0.50), can you identify the source of asymmetry of Binomial PDFs?

Expected value and Variance:

E(X) = ∑_{x=0}^{n} x · n!/(x!(n − x)!) · P^x (1 − P)^(n−x)
     = nP ∑_{x=1}^{n} (n − 1)!/((x − 1)!(n − x)!) · P^(x−1) (1 − P)^(n−x)   [the sum equals 1]
     = nP

E(X²) = ∑_{x=0}^{n} x² · n!/(x!(n − x)!) · P^x (1 − P)^(n−x)

is not practical to work with. So, consider:

E(X(X − 1)) = ∑_{x=0}^{n} x(x − 1) · n!/(x!(n − x)!) · P^x (1 − P)^(n−x)
            = n(n − 1)P² ∑_{x=2}^{n} (n − 2)!/((x − 2)!(n − x)!) · P^(x−2) (1 − P)^(n−x)   [the sum equals 1]
            = n(n − 1)P²

This means:

E(X²) − E(X) = n(n − 1)P²
E(X²) = n(n − 1)P² + nP

Then,

Var(X) = E(X²) − (E(X))²
       = n(n − 1)P² + nP − (nP)²
       = n²P² − nP² + nP − n²P²
       = nP − nP²
       = nP(1 − P)
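A numerical check of nP, nP(1 − P) and the E(X(X − 1)) hint for a concrete case; a sketch assuming Python with scipy:

from scipy.stats import binom

n, P = 8, 0.80
X = binom(n, P)

xs = range(n + 1)
EX = sum(x * X.pmf(x) for x in xs)                # 6.4   = nP
EXXm1 = sum(x * (x - 1) * X.pmf(x) for x in xs)   # 35.84 = n(n-1)P^2
VarX = EXXm1 + EX - EX ** 2                       # 1.28  = nP(1-P)
print(EX, EXXm1, VarX)
print(X.mean(), X.var())                          # scipy agrees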

Checkpoint No: 45

Go to Teaching page & experiment with Binomial(n, P) using the file named 'Statistical distributions.xlsx'.

3.7.3 Poisson distribution

Consider an experiment which consists of counting the number of times a certain event occurs during a given unit of time or in a given area or volume. The probability that an event occurs in a given unit of time, area or volume is the same for all units. The number of events that occur in one unit of time, area or volume is independent of the number that occur in any other mutually exclusive unit. The mean (or expected, or typical) number of events in each unit is denoted by λ. For X ∼ Poisson(λ):

∀x ∈ R, f(x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...
               0,                otherwise

Recall that e is Euler's number, e = 2.71828...

Observe below the PDF of X ∼ Poisson(3):

[Figure: PDF of Poisson(3).]

Then the PDF of X ∼ Poisson(6):

[Figure: PDF of Poisson(6).]

And finally the PDF of X ∼ Poisson(10):

[Figure: PDF of Poisson(10).]

Is the last graph symmetric? Is it possible to have a Poisson(λ) PDF which is symmetric? Why?

Expected Value and Variance:

E(X) = ∑_{x=0}^{∞} x · e^(−λ) λ^x / x!
     = e^(−λ) ∑_{x=0}^{∞} x · λ^x / x!
     = λ e^(−λ) ∑_{x=1}^{∞} λ^(x−1)/(x − 1)!   [the sum equals e^λ]
     = λ

E(X²) = ∑_{x=0}^{∞} x² e^(−λ) λ^x / x!

is again not practical to work with. So, consider:

E(X(X − 1)) = ∑_{x=0}^{∞} x(x − 1) e^(−λ) λ^x / x!
            = e^(−λ) ∑_{x=2}^{∞} λ^x/(x − 2)!
            = λ² e^(−λ) ∑_{x=2}^{∞} λ^(x−2)/(x − 2)!   [the sum equals e^λ]
            = λ²

This means:

E(X²) − E(X) = λ²
E(X²) = λ² + λ

Then,

Var(X) = E(X²) − (E(X))²
       = λ² + λ − λ²
       = λ

Checkpoint No: 46

Go to Teaching page & experiment with Poisson(λ) using the file named 'Statistical distributions.xlsx'.

Poisson approximation to Binomial distribution
In a nutshell
Let X be the number of successes resulting from n independent trials, each with probability of success P. The distribution of the number of successes, X, is binomial, with mean nP. If the number of trials, n, is large and nP is of only moderate size (preferably nP ≤ 7), this distribution can be approximated by the Poisson distribution with λ = nP. So,

f(x) = e^(−nP) (nP)^x / x!,  x = 0, 1, 2, ...

can safely be used to obtain a numerical result.
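The quality of the approximation is easy to inspect; a sketch assuming Python with scipy, with n and P chosen so that n is large and nP = 5:

from scipy.stats import binom, poisson

n, P = 1000, 0.005          # n large, nP = 5 <= 7
lam = n * P

for x in range(11):
    print(x, binom.pmf(x, n, P), poisson.pmf(x, lam))
# The two probability columns agree to roughly three decimal places.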

3.7.4 How to derive Poisson distribution from Binomial distribution?

Consider a Binomial(n,p) process with:

f(x) = C(n, x) p^x (1 − p)^(n−x),  x = 0, 1, 2, ..., n

Define

λ = np

So,

p = λ/n

Re-writing f(x) as:

f(x) = lim_{n→∞} C(n, x) (λ/n)^x (1 − λ/n)^(n−x)

the derivation follows. Now,

f(x) = lim_{n→∞} C(n, x) (λ/n)^x (1 − λ/n)^(n−x)
     = (λ^x / x!) lim_{n→∞} [n!/((n − x)! n^x)] [(1 − λ/n)^n] [(1 − λ/n)^(−x)]

where the three bracketed factors are labeled A, B and C, respectively.

Consider now the parts A, B and C separately:

(A)
The limit is 1; there are x terms linear in n in its numerator and x terms each of which is equal to n in its denominator.
(B)
The limit is e^(−λ), by the definition of Euler's number.
(C)
The limit is 1, trivially.

Combining the limits:

f(x|λ) = P(X = x|λ) = λ^x e^(−λ) / x!

is found.

The intuition is as follows: when we consider a Binomial process in which a success occurs with an infinitesimal probability in every infinitesimal time period, and when there are infinitely many time periods as such, what results for a finite time period is nothing but the Poisson distribution. The derivation can be carried out in reference to space rather than time, if you wish.

Checkpoint No: 47

3.7.5 Hypergeometric distribution

Consider an experiment which consists of randomly drawing n elements without replacement from a set of N elements, r of which are successes and (N − r) of which are failures. X being the number of successes among the n elements, X ∼ Hypergeometric(N, r, n):

∀x ∈ R, f(x) = C(r, x)·C(N − r, n − x) / C(N, n),  x = max{0, n − (N − r)}, ..., min{r, n}
               0,                                  otherwise

Expected Value and Variance:

E(X) = ∑_{x=0}^{n} x · C(r, x)·C(N − r, n − x) / C(N, n)
     = ∑_{x=1}^{n} x · C(r, x)·C(N − r, n − x) / C(N, n)

Notice that:

x·C(r, x) = x · r!/(x!(r − x)!)
          = r · (r − 1)!/((x − 1)!(r − x)!)
          = r·C(r − 1, x − 1)

and that:

C(N, n) = N!/(n!(N − n)!) = (N/n) · (N − 1)!/((n − 1)!(N − n)!)
        = (N/n)·C(N − 1, n − 1)

So,

E(X) = ∑_{x=1}^{n} r·C(r − 1, x − 1)·C(N − r, n − x) / ((N/n)·C(N − 1, n − 1))
     = (nr/N) ∑_{x=1}^{n} C(r − 1, x − 1)·C(N − r, n − x) / C(N − 1, n − 1)   [the sum equals 1]
     = nr/N

E(X(X − 1)) = ∑_{x=0}^{n} x(x − 1) · C(r, x)·C(N − r, n − x) / C(N, n)
            = ∑_{x=2}^{n} x(x − 1) · C(r, x)·C(N − r, n − x) / C(N, n)

Notice that:

x(x − 1)·C(r, x) = x(x − 1) · r!/(x!(r − x)!)
                 = r(r − 1) · (r − 2)!/((x − 2)!(r − x)!)
                 = r(r − 1)·C(r − 2, x − 2)

and that:

C(N, n) = N!/(n!(N − n)!) = [N(N − 1)/(n(n − 1))] · (N − 2)!/((n − 2)!(N − n)!)
        = [N(N − 1)/(n(n − 1))]·C(N − 2, n − 2)

So,

E(X(X − 1)) = (n(n − 1)/(N(N − 1)))·r(r − 1) ∑_{x=2}^{n} C(r − 2, x − 2)·C(N − r, n − x) / C(N − 2, n − 2)   [the sum equals 1]
            = (nr/N)·((n − 1)(r − 1)/(N − 1))

E(X² − X) = E(X²) − E(X) = (nr/N)·((n − 1)(r − 1)/(N − 1))

E(X²) = (nr/N)·(1 + (n − 1)(r − 1)/(N − 1))

Var(X) = E(X²) − (E(X))²
       = (nr/N)·(1 + (n − 1)(r − 1)/(N − 1)) − (nr/N)²
       = (nr/N)·(1 + (n − 1)(r − 1)/(N − 1) − nr/N)
       = (nr/N)·( (N(N − 1) + N(n − 1)(r − 1) − (N − 1)nr) / (N(N − 1)) )
       = (nr/N)·( (N − n)(N − r) / (N(N − 1)) )
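A numerical check of both formulas for an arbitrary case; a sketch assuming Python with scipy (note scipy's argument order for the hypergeometric distribution):

from scipy.stats import hypergeom

N, r, n = 20, 7, 5                       # population size, successes in it, draws
X = hypergeom(N, r, n)                   # scipy's (M, n, N) corresponds to our (N, r, n)

EX = n * r / N                                            # 1.75
VarX = (n * r / N) * (N - n) * (N - r) / (N * (N - 1))    # 0.8980...
print(X.mean(), EX)
print(X.var(), VarX)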

Checkpoint No: 48

3.7.6 Geometric distribution

Consider an experiment which consists of a sequence of independent and identical Bernoulli trials; the experiment ends when a (one) success is observed. X being the number of trials until one success, X ∼ Geometric(P), i.e., X has a Geometric distribution with parameter P:

∀x ∈ R, f(x) = (1 − P)^(x−1) P,  x = 1, 2, ...
               0,                otherwise

The construction of f(x) is intuitive as the experiment will yield x − 1 failures before the 'one and only' success, which occurs at the end, by definition.

Observe below the PDF of X ∼ Geometric(0.80):

[Figure: PDF of Geometric(0.80).]

Observe below the PDF of X ∼ Geometric(0.50):

[Figure: PDF of Geometric(0.50).]

Observe below the PDF of X ∼ Geometric(0.20) for 1 ≤ x ≤ 10:

[Figure: PDF of Geometric(0.20) over 1 ≤ x ≤ 10.]

Observe below the PDF of X ∼ Geometric(0.20) for 1 ≤ x ≤ 20:

[Figure: PDF of Geometric(0.20) over 1 ≤ x ≤ 20.]

Memoryless property of geometric distribution
In a nutshell
Consider X ∼ Geometric(P), f(x) = (1 − P)^(x−1)·P. Suppose we know that P(X > h) = k. What is the value of P(X > s + h | X > s)?

P(X > m) = (1 − P)^m for any m

So,

P(X > s + h | X > s) = (1 − P)^(s+h) / (1 − P)^s
                     = (1 − P)^h
                     = P(X > h)
                     = k

What does this result tell?
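The result says that, given s unsuccessful trials so far, the probability of needing more than h further trials is the same as it was at the very beginning; the process does not age. A numerical sketch, assuming Python with scipy:

from scipy.stats import geom

P, s, h = 0.3, 4, 6
X = geom(P)                     # scipy's geom counts trials, exactly like our X

lhs = X.sf(s + h) / X.sf(s)     # P(X > s+h | X > s); sf(m) = P(X > m) = (1-P)^m
rhs = X.sf(h)                   # P(X > h)
print(lhs, rhs, (1 - P) ** h)   # all three agree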

Expected Value and Variance:

E(X) = ∑_{x=1}^{∞} x (1 − P)^(x−1) P
     = 1P(1 − P)⁰ + 2P(1 − P)¹ + 3P(1 − P)² + ⋯

(1 − P)E(X) = 1P(1 − P)¹ + 2P(1 − P)² + 3P(1 − P)³ + ⋯

E(X) − (1 − P)E(X) = 1P + 1P(1 − P) + 1P(1 − P)² + ⋯
P·E(X) = P(1 + (1 − P) + (1 − P)² + ⋯)
E(X) = 1/(1 − (1 − P)) = 1/P

E(X²) = ∑_{x=1}^{∞} x² (1 − P)^(x−1) P
      = P ∑_{x=1}^{∞} x² (1 − P)^(x−1)
      = P·(2 − P)/P³
      = (2 − P)/P²

Note that, denoting 1 − P = q,

∑_{x=1}^{∞} x² q^(x−1) = (1 + q)/(1 − q)³ = (1 + 1 − P)/(1 − 1 + P)³ = (2 − P)/P³

Var(X) = E(X²) − (E(X))²
       = (2 − P)/P² − (1/P)²
       = (1 − P)/P²

Checkpoint No: 49

3.7.7 Negative Binomial distribution

Consider an experiment which consists of a sequence of independent and identical Bernoulli trials; the experiment ends when r successes are observed. X being the number of trials until r successes, X ∼ NegativeBinomial(r, P), i.e., X has a Negative Binomial distribution with parameters r and P:

∀x ∈ R, f(x) = C(x − 1, r − 1) P^r (1 − P)^(x−r),  x = r, r + 1, ...
               0,                                  otherwise

To develop an intuition of f(x), notice that the last Bernoulli trial yields a success with a probability of P, and the x − 1 trials before that yield r − 1 successes with a probability of C(x − 1, r − 1) P^(r−1) (1 − P)^(x−r) according to a Binomial(x − 1, P) distribution; the product of the two probabilities yields the Negative Binomial PDF.
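A sketch evaluating this PDF both from the formula and via scipy (assuming Python with scipy; caution: scipy's nbinom counts the failures before the r-th success, so our X, the total number of trials, corresponds to r plus scipy's variable):

from math import comb
from scipy.stats import nbinom

r, P = 4, 0.80
for x in range(r, r + 5):
    by_formula = comb(x - 1, r - 1) * P ** r * (1 - P) ** (x - r)
    by_scipy = nbinom.pmf(x - r, r, P)     # shift: x trials = r successes + (x-r) failures
    print(x, by_formula, by_scipy)         # the two columns match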

Observe below the PDF of X ∼ NegativeBinomial(4, 0.80):

[Figure: PDF of NegativeBinomial(4, 0.80).]

Observe below the PDF of X ∼ NegativeBinomial(4, 0.65):

[Figure: PDF of NegativeBinomial(4, 0.65).]

Observe below the PDF of X ∼ NegativeBinomial(4, 0.50):

[Figure: PDF of NegativeBinomial(4, 0.50).]

Observe below the PDF of X ∼ NegativeBinomial(4, 0.35):

[Figure: PDF of NegativeBinomial(4, 0.35).]

Observe below the PDF of X ∼ NegativeBinomial(4, 0.20):

[Figure: PDF of NegativeBinomial(4, 0.20).]

In a nutshell
Among their many uses, one may consider the use of Geometric and Negative Binomial distributions in a Research and Development (R&D) environment. Suppose there is an R&D project consisting of a number of engineering trials. Though it is unrealistic, suppose also that each R&D trial is independent from others and all trials are identical. So, for simplicity of course, we assume that our R&D engineers are not learning across trials. Given these:

∙ We can assess the probability of x trials until a (one) success using a Geometric distribution
∙ We can assess the probability of x trials until r successes using a Negative Binomial distribution

Think: Is there a good use as such for budgeting purposes?

3.7.8 Discrete Uniform distribution

X ∼ Uniform(a, b), a ≤ x ≤ b, with a, b and x integers

n = b − a + 1
f(x) = 1/n
E(X) = (a + b)/2
Var(X) = (n² − 1)/12

Expected Value and Variance:

n = b − a + 1
f(x) = 1/n

E(X) = ∑_{x=a}^{b} x·(1/n)
     = (na + (b − a)(b − a + 1)/2)/n
     = (2na + (b − a)n)/(2n)
     = (2a + b − a)/2
     = (a + b)/2

E(X²) = ∑_{x=a}^{b} x²·(1/n)
      = (1/n)·(1/6)(b − a + 1)(2a² + 2ab − a + 2b² + b)
      = (2a² + 2ab − a + 2b² + b)/6

Var(X) = E(X²) − (E(X))²
       = (2a² + 2ab − a + 2b² + b)/6 − ((a + b)/2)²
       = (4a² + 4ab − 2a + 4b² + 2b − 3a² − 6ab − 3b²)/12
       = (a² − 2ab − 2a + 2b + b²)/12
       = (n² − 1)/12
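A brute-force check of both formulas; a sketch assuming plain Python, with a and b arbitrary:

a, b = 3, 11                              # arbitrary integer endpoints
n = b - a + 1
xs = range(a, b + 1)

EX = sum(xs) / n
VarX = sum((x - EX) ** 2 for x in xs) / n
print(EX, (a + b) / 2)                    # both 7.0
print(VarX, (n ** 2 - 1) / 12)            # both 6.666...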

Checkpoint No: 50

3.8 Random variables and distributions: Continuous probability laws

We consider here several continuous probability laws.

3.8.1 Uniform distribution

X ∼ Uniform(a, b)

∀x ∈ R, f(x) = 1/(b − a),  a ≤ x ≤ b
               0,          otherwise

The graph of the PDF of Uniform(0, 4) looks like:

[Figure: PDF of Uniform(0, 4).]

Expected Value and Variance:

E(X) = ∫_{a}^{b} x·(1/(b − a)) dx = (1/(b − a))·x²/2 |_{a}^{b}
     = (b² − a²)/(2(b − a))
     = (a + b)/2

E(X²) = ∫_{a}^{b} x²·(1/(b − a)) dx = (1/(b − a))·x³/3 |_{a}^{b}
      = (b³ − a³)/(3(b − a))
      = (b² + ab + a²)/3

Var(X) = E(X²) − (E(X))²
       = (b² + ab + a²)/3 − (a + b)²/4
       = (4a² + 4ab + 4b² − 3a² − 6ab − 3b²)/12
       = (a² − 2ab + b²)/12
       = (b − a)²/12

Go to Teaching page & experiment with Uniform(a, b) using the file named 'Statistical distributions.xlsx'.

Checkpoint No: 51

3.8.2 Triangular distribution

X ∼ Triangular(a, b, c)

a: Lower limit, b: Mode, c: Upper limit

f(x) = 2(x − a)/((b − a)(c − a)),      a ≤ x ≤ b
       2(c − x)/((c − a)(c − b)),      b ≤ x ≤ c

F(x) = (x − a)²/((b − a)(c − a)),      a ≤ x ≤ b
       1 − (c − x)²/((c − a)(c − b)),  b ≤ x ≤ c

E(X) = (a + b + c)/3
Var(X) = (a² + b² + c² − ab − ac − bc)/18

Triangular distribution is a practical model, mostly useful in business what-if analysis. A symmetric triangular is the sum of two identically distributed uniform variables.
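The last claim is easy to see by simulation; a sketch assuming Python with numpy, summing two Uniform(0, 2) samples and comparing the moments with those of Triangular(0, 2, 4):

import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(0, 2, 100_000) + rng.uniform(0, 2, 100_000)

a, b, c = 0, 2, 4                                          # the implied symmetric triangular
print(w.mean(), (a + b + c) / 3)                           # both close to 2
print(w.var(), (a*a + b*b + c*c - a*b - a*c - b*c) / 18)   # both close to 2/3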

3.8.3 Exponential distribution

X ∼ Exponential(λ)

∀x ∈ R, f(x) = λe^(−λx),  x > 0
               0,         otherwise

Graphs of Exponential(0.5), Exponential(1.0), Exponential(2.0) and Exponential(4.0) PDFs can be seen in the following figure:

[Figure: PDFs of Exponential(0.5), Exponential(1.0), Exponential(2.0) and Exponential(4.0).]

Expected Value and Variance:

E(X) = ∫_{−∞}^{∞} x λe^(−λx) dx
     = ∫_{0}^{∞} x λe^(−λx) dx
     = λ ∫_{0}^{∞} x e^(−λx) dx

Integrating by parts with

u = x,            du = dx
dv = e^(−λx) dx,  v = −(1/λ)e^(−λx)

we get

∫_{0}^{∞} x e^(−λx) dx = −(x/λ)e^(−λx) |_{0}^{∞} + (1/λ) ∫_{0}^{∞} e^(−λx) dx
                       = 0 + (1/λ)·(−(1/λ)e^(−λx) |_{0}^{∞})
                       = 1/λ²

So, E(X) = λ·(1/λ²) = 1/λ.

E(X²) = ∫_{0}^{∞} x² λe^(−λx) dx = λ ∫_{0}^{∞} x² e^(−λx) dx

Integrating by parts with

u = x²,           du = 2x dx
dv = e^(−λx) dx,  v = −(1/λ)e^(−λx)

we get

E(X²) = λ·[−(x²/λ)e^(−λx) |_{0}^{∞} + (2/λ) ∫_{0}^{∞} x e^(−λx) dx]
      = 2 ∫_{0}^{∞} x e^(−λx) dx
      = 2/λ²

Var(X) = E(X²) − (E(X))²
       = 2/λ² − (1/λ)²
       = 1/λ²

Checkpoint No: 52

Go to Teaching page & experiment with Exponential(λ) using the file named 'Statistical distributions.xlsx'.

To gain some computational insight, consider the completion of a repetitive/routine task by an office employee. Suppose that every repetition of a task takes a random duration which is governed by an Exponential(1/4) distribution. As λ = 1/4, one task is, on average, completed in 4 time units (let’s say, days). Using this information, let’s calculate the following:

1.
What is the probability that a task will be completed in exactly 2 days?
The answer here is 0, as time (X) here is a continuous random variable.
2.
What is the probability that a task will be completed within 2 days?
X ∼ Exponential(1/4)

f(x) = (1/4)e^(−x/4),  x > 0
F(x) = 1 − e^(−x/4),   x > 0

P(X ≤ 2) = F(2) = 1 − e^(−(1/4)·2) = 0.3934
3.
What is the probability that a task will be completed within 4 days?

P(X ≤ 4) = F(4) = 1 − e^(−(1/4)·4) = 0.6321
4.
What is the probability that a task will be completed within 6 days?

P(X ≤ 6) = F(6) = 1 − e^(−(1/4)·6) = 0.7768
5.
What is the probability that a task will be completed between 2 and 6 days?

P(2 ≤ X ≤ 6) = F(6) − F(2) = 0.7768 − 0.3934 = 0.3834
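The five answers can be re-checked in a few lines; a sketch assuming Python with scipy (note that scipy's expon uses scale = 1/λ):

from scipy.stats import expon

lam = 1 / 4
X = expon(scale=1 / lam)        # scale = 1/lambda = 4

print(X.cdf(2))                 # 0.3934
print(X.cdf(4))                 # 0.6321
print(X.cdf(6))                 # 0.7768
print(X.cdf(6) - X.cdf(2))      # 0.3834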

Memoryless (no memory) property of Exponential distribution
In a nutshell
A watch repairer's repair times X follow an Exponential(λ) distribution (where λ is the typical/average number of repairs per unit time).

X ∼ Exponential(λ)
f(x) = λe^(−λx),    x > 0
F(x) = 1 − e^(−λx), x > 0

At time 0, I left my watch for a repair, and after waiting for T₁ time units I observed that my watch was not repaired. What is the probability that I will wait for an additional (extra) T₂ time units? A careful examination will reveal the following: as T₁ units of time have already passed, the total waiting time will be at least T₁, i.e., x > T₁ will hold. This is nothing but a condition imposed on my full sample space of x > 0. Based on this, the original question turns into "beginning at T₁, what is the probability that I will wait until T₁ + T₂?" In technical notation, the answer is:

P(T₁ < x < T₁ + T₂) / P(T₁ < x)
  = (F(T₁ + T₂) − F(T₁)) / (1 − F(T₁))
  = ((1 − e^(−λ(T₁+T₂))) − (1 − e^(−λT₁))) / (1 − (1 − e^(−λT₁)))
  = (e^(−λT₁) − e^(−λ(T₁+T₂))) / e^(−λT₁)
  = e^(−λT₁+λT₁) − e^(−λT₁−λT₂+λT₁)
  = e⁰ − e^(−λT₂)
  = 1 − e^(−λT₂)

Now, we notice that the final expression is just equal to F(T₂), i.e., P(x < T₂). T₁ simply drops out of the solution, and "probability of waiting for an additional T₂ upon T₁" is equal to "probability of waiting until T₂ at time 0". Disappearance of T₁ (or its irrelevance) implies that the random process (random variable X here) does not remember its past. This is called the memoryless (no memory) property. Drawings and graphs will be covered in the lectures.

Checkpoint No: 53

Simulation guide
In a nutshell
Suppose we want to simulate and analyze the purchasing decisions of 1000 customers arriving at a store. In such a simulation, the very first step is to create/generate these 1000 customers. Under certain conditions (research for them), we can (and most of the time we should) assume that arrivals of customers follow a Poisson(λ) distribution; so, interarrival times of the same customers follow an Exponential(λ) distribution (which makes our lives quite easy). The technique we use is called the "inverse transformation technique" and utilizes the inverse CDF, i.e., F⁻¹(x). The steps are:

1. Generate a sequence of Uniform(0, 1) random values; call these values u.
2. Find the F⁻¹(⋅) expression, i.e., derive the inverse CDF.
3. Input u into F⁻¹(⋅) to obtain a sequence of x values. These x values are the randomly generated numbers obeying/following F(⋅).

In our case of generating Exponential(λ) interarrival times:

P(X < x) = F(x) = u,  0 ≤ u ≤ 1.

So,

x = F⁻¹(u)

where

F⁻¹(u) = −ln(1 − u)/λ

Accumulating the generated x (interarrival time) values, we can easily see the arrival times of our simulated customers. As the interarrival times have been randomly generated, arrival times are also random.
Think: What is the role and importance of the Uniform(0, 1) distribution here?
Think: Can you use this technique to generate random values that obey another statistical distribution?
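Here is a minimal sketch of the guide above, assuming Python with numpy (the rate λ = 2 per time unit is an arbitrary choice for illustration):

import numpy as np

lam = 2.0                                  # assumed arrival rate (illustrative)
rng = np.random.default_rng(42)

u = rng.uniform(0.0, 1.0, size=1000)       # step 1: Uniform(0,1) values
x = -np.log(1.0 - u) / lam                 # steps 2-3: x = F^{-1}(u), Exponential(lam)
arrivals = np.cumsum(x)                    # accumulate interarrival times

print(x.mean())                            # close to 1/lam = 0.5
print(arrivals[:5])                        # the first five simulated arrival times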

Checkpoint No: 54

3.8.4 Normal distribution

X ∼ Normal(μ, σ²)

∀x ∈ R, f(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞

Below is the graph of the Normal(4, 1) PDF:

[Figure: PDF of Normal(4, 1).]

When we add the guidelines that show μ − 3σ, μ − 2σ, μ − σ, μ + σ, μ + 2σ and μ + 3σ, the previous figure looks like:

[Figure: PDF of Normal(4, 1) with guidelines at x = 1, 2, 3, 5, 6, 7.]

Displaying the PDFs of Normal(4, 1) and Normal(4, 0.25) together, we notice that the latter has a higher peak:

[Figure: PDFs of Normal(4, 1) and Normal(4, 0.25).]

Displaying the PDFs of Normal(4, 1), Normal(4, 0.25) and Normal(4, 0.09) together, we notice that the last has an even higher peak, the area under each PDF integrating to 1.

[Figure: PDFs of Normal(4, 1), Normal(4, 0.25) and Normal(4, 0.09).]

Keeping the variance σ2 the same, a change in mean μ results in a shift of the PDF. Compare Normal(4, 1) and Normal(6, 1) below:

[Figure: PDFs of Normal(4, 1) and Normal(6, 1).]

[Figure: Normal PDF with guidelines at μ ± σ, μ ± 2σ and μ ± 3σ, marking 68.3%, 95.5% and 99.7% of the area.]

Expected Value and Variance:

E(X) = ∫_{−∞}^{∞} x (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)) dx

Substituting

z = (x − μ)/σ,  x = σz + μ,  dx = σ dz

E(X) = (1/√(2πσ²)) ∫_{−∞}^{∞} (σz + μ) e^(−z²/2) σ dz
     = (1/√(2πσ²)) ∫_{−∞}^{∞} (σ²z + σμ) e^(−z²/2) dz
     = (σ/√(2π)) ∫_{−∞}^{∞} z e^(−z²/2) dz + (μ/√(2π)) ∫_{−∞}^{∞} e^(−z²/2) dz
       [the first integral equals 0, the second equals √(2π)]
     = 0 + (μ/√(2π))·√(2π)
     = μ

E(X²) = ∫_{−∞}^{∞} x² (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)) dx

With the same substitution z = (x − μ)/σ:

E(X²) = (1/√(2πσ²)) ∫_{−∞}^{∞} (σz + μ)² e^(−z²/2) σ dz
      = (1/√(2πσ²)) ∫_{−∞}^{∞} (σ²z² + 2σμz + μ²) σ e^(−z²/2) dz
      = (σ²/√(2π)) ∫_{−∞}^{∞} z² e^(−z²/2) dz + (2σμ/√(2π)) ∫_{−∞}^{∞} z e^(−z²/2) dz + (μ²/√(2π)) ∫_{−∞}^{∞} e^(−z²/2) dz
      = (σ²/√(2π))·√(2π) + 0 + μ²
      = σ² + μ²

Var(X) = E(X²) − (E(X))²
       = σ² + μ² − μ²
       = σ²

Consider ∫_{−∞}^{∞} z² e^(−z²/2) dz. For α = 1/2:

∫_{−∞}^{∞} z² e^(−z²/2) dz = ∫_{−∞}^{∞} z² e^(−αz²) dz
                           = −∫_{−∞}^{∞} (d/dα) e^(−αz²) dz
                           = −(d/dα) ∫_{−∞}^{∞} e^(−αz²) dz

Set ω = z√(2α):

∫_{−∞}^{∞} e^(−αz²) dz = (1/√(2α)) ∫_{−∞}^{∞} e^(−ω²/2) dω   [the integral equals √(2π)]
                       = √(π/α)

So,

−(d/dα)√(π/α) = −√π (d/dα) α^(−1/2)
              = (√π/2) α^(−3/2)
              = (√π/2) (1/2)^(−3/2)
              = (√π/2)·2^(3/2)
              = √(2π)

Go to Teaching page & experiment with Normal(μ, σ²) using the file named 'Statistical distributions.xlsx'.

3.8.5 Standard normal distribution

Z ∼ Normal(0, 1) has the standard normal distribution. If X ∼ Normal(μ, σ²), the random variable Z defined as:

Z = (X − μ)/σ

has a Normal(0, 1) distribution. A casual naming is z-distribution, and

f(z) = (1/√(2π)) e^(−z²/2),  −∞ < z < ∞

Recall that e is Euler's number, e = 2.71828..., and π = 3.14159...

Notice/recall that the PDF of the Standard Normal (Z) random variable has a unique parametrization. Its PDF with the guidelines that show μ − 3σ = −3, μ − 2σ = −2, μ − σ = −1, μ = 0, μ + σ = 1, μ + 2σ = 2 and μ + 3σ = 3 looks like:

[Figure: PDF of Normal(0, 1) with guidelines at z = −3, −2, −1, 0, 1, 2, 3.]

Go to Teaching page & experiment with Normal(0, 1) using the file named 'Statistical distributions.xlsx'. Is there anything in Z to experiment with?

Normal approximation to Binomial distribution
In a nutshell
Let X be the number of successes resulting from n independent trials, each with probability of success P. The distribution of the number of successes, X, is binomial, with mean nP. If the number of trials, n, is large and nP(1 − P) > 5, this distribution can be approximated by the Normal distribution with μ = nP and σ² = nP(1 − P). So,

f(x) = (1/√(2πσ²)) e^(−(1/2)(x−μ)²/σ²),  −∞ < x < ∞

can safely be used to obtain a numerical result, where

Z = (X − nP)/√(nP(1 − P))

has a Standard Normal distribution.
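A numerical sketch, assuming Python with scipy, where we also apply the standard continuity correction, replacing x by x + 0.5 before standardizing (that correction is a common refinement and is not part of the box above):

from scipy.stats import binom, norm

n, P = 80, 0.80
mu, sigma = n * P, (n * P * (1 - P)) ** 0.5   # 64 and 3.577...; nP(1-P) = 12.8 > 5

x = 60
print(binom.cdf(x, n, P))                     # exact Binomial probability P(X <= 60)
print(norm.cdf((x + 0.5 - mu) / sigma))       # Normal approximation, close to the exact value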

Checkpoint No: 55

To see how/why the Normal approximation to Binomial works, consider the PDF of X ∼ Binomial(80, 0.80) over the domains of 0, 1, ..., 80 and 55, 56, ..., 75 below.

The PDF of X ∼ Binomial(80, 0.80) plotted over 0, 1, ..., 80:

[Figure: PDF of Binomial(80, 0.80) over 0, 1, ..., 80.]

The PDF of X ∼ Binomial(80, 0.80) plotted over 55, 56, ..., 75:

[Figure: PDF of Binomial(80, 0.80) over 55, 56, ..., 75.]

Do you see the Normal-like behavior of X ∼ Binomial(80, 0.80) around its mean, i.e., nP = 80·0.80 = 64? Can you obtain the same with X ∼ Binomial(8, 0.80)? Why?

3.9 Random variables and distributions: Moments of distributions [Optional material]

For each integer k, the kth moment of X is denoted by μ′_k and is defined as:

μ′_k = E(X^k)

The kth central moment of X is denoted by μ_k and is defined as:

μ_k = E((X − μ)^k)

Notice that μ = μ′₁ = E(X). In addition to the mean (expected value) of a random variable, another important moment is the second central moment, which you know as the variance.

3.10 Moment generating functions [Optional material]

X being a random variable with CDF F(x), the moment generating function (MGF) of X is denoted by MX(t) and is defined as:

M_X(t) = E(e^(tX))

provided that the expected value exists for t in some neighborhood of zero. That is, there exists h > 0 such that for all −h < t < h, E(e^(tX)) exists. Otherwise, the MGF is said not to exist. Explicitly,

M_X(t) = ∫_{−∞}^{∞} e^(tx) f(x) dx, continuous X

or

M_X(t) = ∑_x e^(tx) f(x), discrete X

3.10.1 Moment generating functions for selected distributions [Optional material]

Distribution      M_X(t)
Bernoulli(p)      (1 − p) + pe^t
Binomial(n, p)    ((1 − p) + pe^t)^n
Poisson(λ)        e^(λ(e^t − 1))
χ²_n              (1/(1 − 2t))^(n/2), t < 1/2
Exponential(λ)    1/(1 − t/λ), t < λ
F_{n₁,n₂}         Does not exist
Normal(μ, σ²)     e^(μt + σ²t²/2)
t_n               Does not exist
Uniform(a, b)     (e^(bt) − e^(at))/((b − a)t)

If a random variable X has the MGF M_X(t), then

E(X^n) = (d^n/dt^n) M_X(t) |_{t=0}

That is, the nth moment of X is equal to the nth derivative of M_X(t) evaluated at t = 0. See after five years: convergence of MGFs.
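As a sanity check, the following symbolic sketch (assuming Python with sympy) differentiates the Normal(μ, σ²) MGF from the table above and recovers the first two moments:

import sympy as sp

t, mu, sigma = sp.symbols('t mu sigma', real=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)          # MGF of Normal(mu, sigma^2)

m1 = sp.diff(M, t, 1).subs(t, 0)                  # E(X)   = mu
m2 = sp.diff(M, t, 2).subs(t, 0)                  # E(X^2) = mu^2 + sigma^2
print(m1, sp.expand(m2))
print(sp.simplify(m2 - m1 ** 2))                  # Var(X) = sigma^2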

Checkpoint No: 56

3.2 EXERCISES

1. 

We roll a pair of fair dice. Let X be the random variable that assigns the minimum of the two numbers that turn up to each outcome.

i. Tabulate the probability density function and cumulative distribution function of X.

ii. If we know that one of the dice turned up a number less than or equal to 3, what is the probability that X takes a value greater than or equal to 2?

iii. If we know that one of the dice turned up a number less than or equal to 3, what is the probability that X takes a value equal to 3?

iv. Find the expected value of X.

v. Find the variance of X.

Solution:

1.
PDF is tabulated as follows:

x f(x)
1 11/36
2 9/36
3 7/36
4 5/36
5 3/36
6 1/36
2.
This is a conditional probability question that you already are familiar with.
3.
This is a conditional probability question that you already are familiar with.
4.
E(X) = ∑_x x f(x) = 1·(11/36) + 2·(9/36) + 3·(7/36) + 4·(5/36) + 5·(3/36) + 6·(1/36)
     = (11 + 18 + 21 + 20 + 15 + 6)/36
     = 91/36 = 2.53
5.
Var(X) = ∑_x (x − E(X))² f(x)
       = (1 − 2.53)²·(11/36) + ⋯ + (6 − 2.53)²·(1/36)
       = 1.97
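Parts i, iv and v are easy to confirm by enumerating all 36 equally likely outcomes; a sketch assuming plain Python:

from collections import Counter
from itertools import product

counts = Counter(min(d1, d2) for d1, d2 in product(range(1, 7), repeat=2))
f = {x: c / 36 for x, c in sorted(counts.items())}
print(f)                                             # 11/36, 9/36, 7/36, 5/36, 3/36, 1/36

EX = sum(x * p for x, p in f.items())                # 91/36 = 2.5278
VarX = sum((x - EX) ** 2 * p for x, p in f.items())  # 1.9714
print(EX, VarX)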
2. 

Two balls are simultaneously chosen (i.e., chosen without replacement) from an urn containing 3 white, 2 black, and 1 red balls. You are given 2 TL for each white ball chosen, you have to pay 1 TL for each black ball chosen, and you neither pay nor receive any money for a red ball that is chosen. For example, if you have chosen 1 white and 1 black ball, your net winnings are 2 + (−1) = 1 TL. Let X be the random variable that gives your net winnings.

i. Construct a table that shows the possible values of X and the probabilities associated with each value, i.e., tabulate the probability density (mass) function of X.

ii. Find the expected value of X.

Solution:

1.
PDF is tabulated as follows:

x    f(x)
−2   2/30
−1   4/30
1    12/30
2    6/30
4    6/30
2.
E(X) = ∑_x x f(x)
     = (−2)·(2/30) + (−1)·(4/30) + 1·(12/30) + 2·(6/30) + 4·(6/30)
     = (−4 − 4 + 12 + 12 + 24)/30
     = 40/30 = 1.33
3. 

A class in statistics has 20 students. In the first midterm 2 students scored 50, 10 scored 60, 1 scored 70, 5 scored 80, and 2 scored 100. Three students are selected at random without replacement. Let X be the median score of the three students.

i. Tabulate the probability density function of X.

ii. Find the probability of the median score being greater or equal to 80.

iii. Given that the median of the scores of the three students selected is greater than or equal to 70, what is the probability that their median is equal to 80?

iv. Find the expected value and variance of X.

Solution: Try on your own if you have time, and just for fun.

4. 

We have three coins such that when coin 1 is tossed the probability of observing a head is 0.4, when coin 2 is tossed the probability of observing a head is 0.7, and when coin 3 is tossed the probability of observing a head is 0.2. We first toss coin 1. If we observe a head we toss coin 2 otherwise we choose coin 1 or coin 3 at random and toss it.

i. What is the probability of observing a head on the second toss?

ii. Are the events of observing a head on the second toss and observing a head on the first toss independent?

Solution: This question is reserved for in-class discussions.

5. 

A fair die is rolled ten times. We are interested in the number of times 6 is obtained.

i. Given our interest, can we think of this experiment as a binomial experiment? If so, describe each Bernoulli trial, i.e., verbally describe the Bernoulli trial, state the outcome that you will call a success, and give the probability of success in each trial.

ii. Let X be the random variable which assigns, to each outcome, the number of times 6 is obtained in the outcome. What is the distribution of X?

iii. With what probability will X take the value 1?

iv. With what probability will X take a value greater than or equal to 4?

Solution:

1.
Success is observing a 6; failure is observing any of {1, 2, 3, 4, 5}. Since the die is fair, the probability of success is 1/6. X being the random variable indicating the outcome of a single trial,

f(x) = { 1/6,  x = 1
       { 5/6,  x = 0

that is, X ∼ Bernoulli(1/6).

2.
Considering the whole experiment, X ∼ Binomial(10, 1/6).

f(x) = C(10, x) (1/6)^x (5/6)^{10−x}, x = 0, 1, 2, …, 10.
3.
f(1) = C(10, 1) (1/6)^1 (5/6)^9 = 0.3230
4.
P(X ≥ 4) = 1 − f(0) − f(1) − f(2) − f(3)
         = 0.0697
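
Both numerical answers are easy to confirm with a few lines of code (a Python sketch; the names are ours):

from math import comb

n, p = 10, 1 / 6
f = lambda x: comb(n, x) * p**x * (1 - p) ** (n - x)

print(f(1))                               # ≈ 0.3230
print(1 - sum(f(x) for x in range(4)))    # P(X >= 4) ≈ 0.0697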
6. 

It is known that 40% of all students of Economics are male. Independent observers note the gender of 12 random Economics students (a student’s gender might be noted more than once) and we count the number of males observed.

i. What is the probability that exactly 2 of the observed students are male?

ii. What is the probability that the number of male students, in the group observed, is 5 or less?

iii. You have been told that at least 2 of the students that have been observed are female. What is the probability that the number of male students, in the observed group, is 5 or less?

Solution:

1.
X ∼ Binomial(12, 0.40)

f(2) = C(12, 2) 0.40^2 0.60^{10} = 0.0639
2.
X ∼ Binomial(12, 0.40)
P(X ≤ 5) = f(0) + f(1) + f(2) + f(3) + f(4) + f(5)
         = F(5)
         = 0.6652
3.
This question is reserved for in-class discussions.
7. 

Consider a game where a round consists of rolling a fair die 10 times. Each time a 1 or a 6 comes up, you win 1 TL.

i. What is the probability that you will win 5 TL or less, if you played this game for one round?

ii. What is the probability that you will win exactly 5 TL, if you played this game for one round?

iii. You have learned that two of the rolls of the die resulted in a number different from 1 or 6, but you do not know what the results of the other rolls were. What is the probability that you will win more than 5 TL?

iv. What would your average (mean) winnings be if you played this game indefinitely?

Solution:

1.
X ∼ Binomial(10, 1/3)
P(X ≤ 5) = F(5) = 0.9234
2.
X ∼ Binomial(10, 1/3)

f(5) = C(10, 5) (1/3)^5 (2/3)^5 = 0.1366
3.
This question is reserved for in-class discussions.
4.
X ∼ Binomial(10, 1/3)
E(X) = nP = 10⋅(1/3) = 3.33
8. 

Based on past data, we know that, on average, 6 customers enter Coffee Break every 20 minutes.

i. What is the probability that at least 2 customers will enter Coffee Break during a given 20-minute time period?

ii. Define the probability of k customers entering Coffee Break in 20 minutes as a mathematical function. Describe what is what in your function clearly.

Solution:

1.
X ∼ Poisson(6)

P(at least 2 customers) = 1 − P(at most 1 customer)
                        = 1 − e^{−6}6^0/0! − e^{−6}6^1/1!
                        = 0.9826
2.
X ∼ Poisson(6)

f(x) = e^{−6}6^x/x!, x = 0, 1, 2, …

where X is the random variable that shows the number of customers arriving in a 20-minute period. The rate of arriving customers is λ = 6, and X is a Poisson random variable.
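
As a quick arithmetic check of part 1 (a Python sketch; a hand calculator serves just as well):

from math import exp, factorial

lam = 6                                                  # customers per 20 minutes
p_at_most_1 = sum(exp(-lam) * lam**k / factorial(k) for k in (0, 1))
print(1 - p_at_most_1)                                   # ≈ 0.9826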

9. 

On an ordinary day, on average 3 white cars and 1 blue car pass through a certain cross-section of a road every 5 minutes.

i. What is the probability that 6 white cars will pass in a 5-minute interval?

ii. What is the probability that 6 white cars will pass in a 10-minute interval?

iii. What is the probability that 3 cars (blue or white) will pass in a 5-minute interval?

Solution:

1.
W ∼ Poisson(3)

f(6) = e^{−3}3^6/6! = 0.0504
2.
In a 10-minute interval the rate doubles, so W ∼ Poisson(6).

f(6) = e^{−6}6^6/6! = 0.1606
3.
X = W + B
W ∼ Poisson(3)
B ∼ Poisson(1)

As Poisson λ’s are additive, X ∼ Poisson(4).

f(3) = e^{−4}4^3/3! = 0.1954
10. 

A Hypergeometric story: In a corporation, promotion decisions for employees are made by a committee of 5 people. The decision making procedure has the following steps:

1.
Each of the 5 writes her vote (either ’Promote’ or ’Not promote’) on a piece of paper, folds the paper twice and casts the paper into a bowl.
2.
Another person from outside the committee randomly picks 3 out of the 5 votes. (This is a step taken to anonymize votes).
3.
The 3 picked papers are opened. If the employee gets 2 or 3 votes, then she is promoted. If she gets no votes or 1 vote, she is not promoted.

Consider Employee A, for whom the chance of a promotion is P in the eyes of each committee member. That is, each committee member has a chance of P to vote to promote Employee A. Also, the preferences of committee members are independent of one another. Is there a chance to be accidentally or unfairly promoted (or not promoted) in this kind of scheme?

Solution: The solution involves some steps:

First, a ’Promote’ vote being marked as a success, each committee member’s vote is a Bernoulli trial:

Xi ∼ Bernoulli(P), i = 1, 2, 3, 4, 5

Then, the total number of successes (total ’Promote’ votes), Y, has a Binomial distribution:

Y = X1 + X2 + X3 + X4 + X5
Y ∼ Binomial(5, P), y = 0, 1, 2, 3, 4, 5

Then, W being the number of ’Promote’ votes among the 3 picked papers, W has a Hypergeometric distribution:

W ∼ Hypergeometric(5, Y, 5 − Y)

So,

f(xi) = { P,      xi = 1
        { 1 − P,  xi = 0

g(y) = C(5, y) P^y (1 − P)^{5−y}

h(w) = C(y, w) C(5 − y, 3 − w) / C(5, 3)

Now, your task is to find g(y) for each value of y. Then you will calculate h(w) for each different value of y. At the end, you will compare Employee A’s chance of promotion with and without Steps 2 and 3 of the procedure; a computational sketch follows below. Note that the result may be a little surprising.
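
To support that task, here is a minimal computational sketch (Python; the function names are ours). It evaluates g(y) and h(w) exactly as defined above and compares the two schemes for a few illustrative values of P; by symmetry the schemes agree at P = 0.5 but differ elsewhere:

from math import comb

def promoted_via_subsample(P):
    # P(promoted) when 3 of the 5 ballots are drawn and at least 2 say 'Promote'.
    prob = 0.0
    for y in range(6):                                 # Y ~ Binomial(5, P)
        g = comb(5, y) * P**y * (1 - P) ** (5 - y)
        for w in (2, 3):                               # W | Y = y ~ Hypergeometric
            prob += g * comb(y, w) * comb(5 - y, 3 - w) / comb(5, 3)
    return prob

def promoted_by_majority(P):
    # P(promoted) if all 5 ballots were opened and a simple majority (3+) decided.
    return sum(comb(5, y) * P**y * (1 - P) ** (5 - y) for y in range(3, 6))

for P in (0.3, 0.5, 0.7):
    print(P, round(promoted_via_subsample(P), 4), round(promoted_by_majority(P), 4))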

11. 

Let X1 be the random variable that gives the number of phone calls that you get between 1 PM and 2 PM. Let X2 be the random variable that gives the number of phone calls that you get between 2PM and 4PM. Assume that X1 is Poisson distributed with parameter 5 and X2 is Poisson distributed with parameter 12. Let X be the random variable that gives the number of phone calls that you get between 1PM and 4PM. Find the PDF of X.

Solution: X1 ∼ Poisson(5) and X2 ∼ Poisson(12). X = X1 + X2, so X ∼ Poisson(17). Make sure you can obtain this result by following the chapter’s instructions.

12. 

Let X1 be the random variable that gives the number of phone calls that you get between 1PM and 2PM. Let X2 be the random variable that gives the number of phone calls that your friend gets between 1PM and 2PM. Assume that X1 is Poisson distributed with parameter λ1 and X2 is Poisson distributed with parameter λ2. Find the distribution of X = X1 + X2, i.e., the PDF of the random variable that gives the total number of phone calls that you and your friend receive between 1PM and 2PM.

Solution: The solution method is already available in the chapter.

13. 

Suppose that you buy 40 lottery tickets. Using the Poisson approximation find the probability of having at least 2 winning tickets, given that the probability of any ticket being a winning ticket is 0.02.

Solution: This is self-study for those who are interested. Not to appear in any examination.

14. 

Let X be a random variable that is uniformly distributed over (−1, 3). Answer the following questions:

i. Find P(X < 0)

ii. Find P(1/2 < X < 1)

iii. Find P(X > 2)

iv. What is the expected value and variance of X?

Solution:

1.
X ∼ Uniform(−1, 3)

f(x) = 1/(3 − (−1)) = 1/4, −1 ≤ x ≤ 3

P(X < 0) = ∫_{−1}^{0} (1/4) dx = [x/4]_{−1}^{0}
         = 0/4 − (−1/4)
         = 1/4
2.
P(1/2 < X < 1) = ∫_{1/2}^{1} (1/4) dx = [x/4]_{1/2}^{1}
               = 1/4 − 1/8
               = 1/8
3.
P(X > 2) = ∫_{2}^{3} (1/4) dx = [x/4]_{2}^{3}
         = 3/4 − 2/4
         = 1/4
4.
E(X) = (−1 + 3)/2 = 1

Var(X) = (3 − (−1))²/12 = 16/12 = 4/3
15. 

A potato chips producer starts a promotion program in an effort to boost its sales. Gift tickets are placed in every 25 out of 100 chip bags on sale, and customers are required to collect two tickets to win a free soft drink. By the nature of such promotions, gift tickets are invisible from outside prior to purchase. In order to attain a probability of at least 90% of winning a soft drink, how many bags of potato chips should an average customer buy?

Solution: This is to be discussed in class only along with a computer demo.

16. 

Let X be a random variable with the following PDF:

x      −3    −1    0     1     2     3
f(x)   0.25  0.10  0.05  0.20  0.30  0.10

Define a new random variable Y as

Y = X² + 1

i. Find the PDF of Y

ii. Find the CDF of Y

iii. Find the expected value of Y and show that it is equal to ∑ (x² + 1) f_X(x), where f_X(x) is the PDF of X

iv. Find the variance of Y

Solution: This question is reserved for in-class discussions.

17. 

Let X be a random variable normally distributed with expected value of 2 and variance of 9. Answer the following questions:

i. Find P(X < 5.15)

ii. Find P(X < − 1)

iii. Find P(X > 4)

iv. Find P(1.04 < X < 3.5)

Solution:

1.
X ∼ Normal(2, 9)

P(X < 5.15) = P((X − 2)/3 < (5.15 − 2)/3)
            = P(Z < 1.05)
            = 0.85314
2.
P(X < −1) = P((X − 2)/3 < (−1 − 2)/3)
          = P(Z < −1)
          = P(Z > 1)
          = 1 − F(1)
          = 1 − 0.84134
          = 0.15866

Observe how the symmetry property of the standard normal distribution is used here.

3.
P(X > 4) = P(Z > (4 − 2)/3)
         = P(Z > 0.66)
         = 1 − F(0.66)
         = 1 − 0.74537
         = 0.25463
4.
P(1.04 < X < 3.5) = P((1.04 − 2)/3 < Z < (3.5 − 2)/3)
                  = P(−0.32 < Z < 0.50)
                  = F(0.50) + F(0.32) − 1
                  = 0.69146 + 0.62552 − 1
                  = 0.31698

Study this solution by drawing proper graphs of the PDF of the Standard normal distribution.
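
Instead of the z table, the standard normal CDF can be evaluated through the error function. The sketch below (Python; names ours) reproduces the four answers; the small differences from the table-based results come from rounding z to two decimals in the solution above:

from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 2, 3        # X ~ Normal(2, 9), so the standard deviation is 3
print(phi((5.15 - mu) / sigma))                            # i.   ≈ 0.8531
print(phi((-1 - mu) / sigma))                              # ii.  ≈ 0.1587
print(1 - phi((4 - mu) / sigma))                           # iii. ≈ 0.2525 (table z = 0.66 gives 0.2546)
print(phi((3.5 - mu) / sigma) - phi((1.04 - mu) / sigma))  # iv.  ≈ 0.3170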

18. 

It is estimated that 45% of the freshmen entering a particular college will graduate from that college in four years.

i. For a random sample of 5 entering freshmen, what is the probability that exactly 3 will graduate in four years?

ii. For a random sample of 5 entering freshmen, what is the probability that a majority (more than half) will graduate in four years?

iii. 80 entering freshmen are chosen at random. Find the mean and variance of the number of these 80 that will graduate in four years.

Solution:

1.
X ∼ Binomial(5, 0.45)

f(3) = C(5, 3) 0.45^3 0.55^2 = 0.2757
2.
X ∼ Binomial(5, 0.45)
f(3) + f(4) + f(5) = 0.4069
3.
X ∼ Binomial(80, 0.45)
E(X) = nP = 80⋅0.45 = 36
Var(X) = nP(1 − P) = 80⋅0.45⋅0.55 = 19.8
19. 

Bags of flour packed by a particular machine have weights which are normally distributed with a mean of 500 gr and a standard deviation of 20 gr.

i. What is the probability that a bag weighs more than 515 gr or less than 490 gr?

ii. If 2% of the bags are rejected for being underweight, what is the maximum weight for a bag to be rejected as underweight?

iii. Find an interval [l, u], symmetric around the mean, such that the probability of the weight of a randomly selected bag being in the interval is 0.90.

Solution:

1.
X ∼ Normal(500, 400)

P(X < 490) + P(X > 515) = P(Z < (490 − 500)/20) + P(Z > (515 − 500)/20)
                        = P(Z < −0.50) + P(Z > 0.75)

Using your z table, the answer is 0.30853 + 0.22662, i.e., 0.53516.

2.
This question is reserved for in-class discussions.
3.
This question is reserved for in-class discussions.
20. 

X is distributed as Bin(100,0.04). Describe the steps to calculate P(X > k) for a given k by using a Poisson approximation.

Solution: This is self-study for those who are interested. Not to appear in any examination.

21. 

We know that the number of vampires killed by Dean in a typical fight has a Poisson(5) distribution and the number of vampires killed by Sam in a typical fight has a Poisson(3) distribution. Show that the total number of vampires killed in a typical fight follows a Poisson(8) distribution.

Solution: The solution method is already available in the chapter.

22. 

X has a Uniform(0, 100) distribution. Calculate P(33 < X < 67), E(X) and Var(X).

Solution: X ∼ Uniform(0, 100)

f(x) = 1/(100 − 0) = 1/100, 0 ≤ x ≤ 100
F(x) = x/100, 0 ≤ x ≤ 100

P(33 < X < 67) = F(67) − F(33)
               = 67/100 − 33/100
               = 34/100

E(X) = (0 + 100)/2 = 50

Var(X) = (100 − 0)²/12 = 833.33
23. 

Assume that the number of phone calls that you receive in a day is governed by a Poisson process. Answer the following questions assuming that on average you receive 3.4 phone calls in a day.

i. What is the probability that you will receive at least 3 phone calls in a day?

ii. Given that you have already received a phone call, what is the probability that you will receive at least 3 phone calls?

Solution: This question is left as self-study.

24. 

The probability density function for a random variable X is defined as:

f(x) = { 0.5x,  0 < x < a
       { 0,     elsewhere

i. Find the value of a that makes f(x) a well-defined probability function.

ii. Calculate E(X ) and Var(X )

Solution:

1.
∫_{0}^{a} 0.5x dx = 1
0.5 [x²/2]_{0}^{a} = 1, so a²/4 = 1, a² = 4, and a = 2. When a equals 2, the given f(x) becomes a well-defined probability (distribution) function.
2.
E(X) = ∫_{0}^{2} x⋅0.5x dx
     = 0.5 ∫_{0}^{2} x² dx
     = 0.5 [x³/3]_{0}^{2}
     = 4/3

Var(X) = ∫_{0}^{2} (x − 4/3)²⋅0.5x dx

Try on your own.
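
As a numerical cross-check of the two results already derived (a = 2 normalizes f, and E(X) = 4/3), here is a crude midpoint-rule integrator of our own; it does not spoil the variance part:

def integrate(g, lo, hi, n=100_000):
    # Midpoint rule: a rough but serviceable numerical integral.
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 0.5 * x
print(integrate(f, 0, 2))                      # total probability ≈ 1.0
print(integrate(lambda x: x * f(x), 0, 2))     # E(X) ≈ 1.3333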

25. 

I roll a die repeatedly. In each roll, if the outcome is 3, my score increases by 1; nothing happens otherwise. Knowing that my score was initially zero, what are the expected value and variance of my score right after the 10000th roll?

Solution: Observe that this physical experiment generates a random variable X with X ∼ Binomial(10000, 1/6). Then, E(X) = 10000⋅(1/6) = 1666.7 and Var(X) = 10000⋅(1/6)⋅(5/6) = 1388.9.

26. 

Suppose that you are in charge of marketing airline seats for a major carrier. Four days before the flight date you have 16 seats remaining on the aircraft. You know from past experience that 80% of the people that purchase tickets in this time period will actually show up for the flight.

i. If you sell 20 tickets, what is the probability that you’ll overbook the flight or have at least one empty seat?

ii. What if you sell 18 tickets?

Solution: This exercise is left as self-study.

27. 

A machine that produces stampings for automobile engines is malfunctioning and producing 10% defectives. The defective and nondefective stampings proceed from the machine in a random manner. If the next five stampings are tested, find the probability that three of them are defective.

Solution: This question is reserved for in-class discussions.

28. 

The variance of a Poisson random variable X is known to be 4. Calculate manually the probability that X takes a value of at least 2. For ease, take e = 3.

Solution: X ∼ Poisson(λ) with Var(X) = 4. Since a Poisson random variable X ∼ Poisson(λ) has Var(X) = λ, we have λ = 4 here.

P(X ≥ 2) = 1 − P(X ≤ 1)
         = 1 − f(0) − f(1)
         = 1 − 3^{−4}4^0/0! − 3^{−4}4^1/1!
         = 1 − 1/81 − 4/81
         = 76/81
29. 

The median and the coefficient of variation for a random variable X ∼ N(μ, σ²) are given as 100 and 0.25, respectively. Given F(−0.84) = 0.20 for the standard normal distribution, calculate the 80th percentile of X.

Solution: The median of a Normal random variable is equal to μ, so μ = 100.

CV = σ/μ
0.25 = σ/100

σ = 25
σ² = 625

So, X ∼ Normal(100, 625). F(−0.84) = 0.20 implies F(0.84) = 1 − 0.20 = 0.80 by the symmetry of the standard normal distribution. Based on these, the 80th percentile of X is found as:

P80(X) = 100 + 0.84⋅25 = 121

Practice this solution on your own by drawing the Normal and standard normal PDFs.

30. 

Find the value of

∫_{−2}^{1} (1/√(2π)) e^{−z²/2} dz

with proper explanations.

Solution: Notice that the question requires the calculation of F(1) − F(−2) for the standard normal random variable. It is nothing but the area under the standard normal PDF from −2 to 1: F(1) − F(−2) = 0.84134 − 0.02275 ≈ 0.8186.

31. 

Calculate P(X ≤ 8) for X ∼ Bin(1000, 0.010) using the Normal approximation to the Binomial distribution.

Solution: X ∼ Bin(1000, 0.010) has E(X) = 10 and Var(X) = 9.9. So, X can be approximated by X ∼ Normal(10, 9.9). Calculation of P(X ≤ 8) is then:

P(X ≤ 8) = P((X − 10)/√9.9 ≤ (8 − 10)/√9.9)
         = P(Z ≤ −0.64)
         = 0.262
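
A sketch comparing the exact Binomial probability with the Normal approximation used above; the last line adds a continuity correction (evaluating at 8.5), a refinement not used in the solution but known to tighten the approximation:

from math import comb, erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 1000, 0.010
mu, var = n * p, n * p * (1 - p)              # 10 and 9.9, as in the solution
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9))
print(exact)                                  # exact P(X <= 8), ≈ 0.33
print(phi((8 - mu) / sqrt(var)))              # plain Normal approximation, ≈ 0.26
print(phi((8.5 - mu) / sqrt(var)))            # with continuity correction, closer to exact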
32. 

We make 100 independent observations from a normal population with mean 40 and standard deviation 20. Approximately, what is the probability that the mean of these observations will be less than or equal to 37?

Solution: Indeed, this question is an early reference to sampling distributions. Each of the 100 observations has a Normal(40, 400) distribution; name them X1, X2, …, X100. Try to see that the sample mean X̄ satisfies X̄ ∼ Normal(40, 4). Calculation of P(X̄ ≤ 37) is then straightforward.

33. 

Suppose the PDF of a logistic random variable X is given by

f(x) = e^{−x}/(1 + e^{−x})², −∞ < x < ∞

Among its many parametrizations, this simple one is something you are familiar with from your lab work during the semester.

i. Find F(x) for the random variable X by performing the necessary calculus operations.

ii. Verify that the F(x) you have found possesses the properties of a CDF

iii. Using the graph of F(x) only, find the value of E(X)

Solution:

1.
F(x) = ∫_{−∞}^{x} e^{−t}/(1 + e^{−t})² dt
     = [1/(1 + e^{−t})]_{−∞}^{x}

F(x) = 1/(1 + e^{−x}), −∞ < x < ∞
2.
F(−∞) = 0, F(∞) = 1, and F(⋅) is nondecreasing. F is a proper CDF here.
3.
Try it using WolframAlpha or GeoGebra.
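
You can also check the antiderivative numerically: integrating the PDF up to a point should reproduce the closed-form F. A small sketch (Python; the crude integrator and the truncation of the lower limit at −30 are our own choices, the tail mass beyond it being negligible):

from math import exp

def integrate(g, lo, hi, n=200_000):
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

pdf = lambda t: exp(-t) / (1 + exp(-t)) ** 2
print(integrate(pdf, -30, 1.5))      # numerical F(1.5)
print(1 / (1 + exp(-1.5)))           # closed form, ≈ 0.8176; the two should agree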
34. 

The class grades after an exam have a normal distribution with a mean of 50 and a variance of 144. If a student is known to have a grade less than 70, what is the probability that she has received a grade between 40 and 60?

Solution: Except for the use of conditional probabilities, this is a trivial question. Try it on your own.

35. 

An experimenter tosses a coin (with P(Tail) = P) until obtaining r successes (tails). What is the distribution of the number of tosses (X) needed to get r successes? Derive its PDF. Hint: X = x can occur only if there are exactly r − 1 successes in the first x − 1 trials. When you notice that the first x − 1 trials have a Binomial structure, the rest is trivial.

Solution: P_r(x) = P(r − 1 successes in the first x − 1 trials) × P(a success at the x-th trial). The first term is nothing but the Binomial PDF and the second term is simply P. So,

P_r(x) = C(x − 1, r − 1) P^{r−1} (1 − P)^{(x−1)−(r−1)} × P
       = C(x − 1, r − 1) P^r (1 − P)^{x−r}, x = r, r + 1, r + 2, …

This is the PDF of the random variable described (a Negative Binomial random variable).
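
A quick sanity check that the derived PDF is proper, i.e., that it sums to 1 over x = r, r + 1, … (a Python sketch; it also prints Σ x P_r(x), which should match the known mean r/P of this distribution):

from math import comb

def pr(x, r, P):
    # C(x-1, r-1) P^r (1-P)^(x-r), for x = r, r+1, ...
    return comb(x - 1, r - 1) * P**r * (1 - P) ** (x - r)

r, P = 3, 0.5
print(sum(pr(x, r, P) for x in range(r, 300)))      # ≈ 1.0
print(sum(x * pr(x, r, P) for x in range(r, 300)))  # ≈ r/P = 6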

36. 

Consider a random variable X with f(x) = (k/√(2π)) e^{−x²/2}, 0 ≤ x < ∞. Notice the resemblance of this PDF to the standard Normal PDF; the domain of f(x), though, spans only the nonnegative real numbers. What should k be to make f(x) a proper PDF? Plot f(x) using your solution for k.

Solution: See the past exam questions for a solution.

3.11 Random vectors [Optional material]

Earlier we studied bivariate probabilities, yet we did not describe them in terms of random variables and distribution functions. This section provides a calculus-based treatment of the same topic in an attempt to complete our knowledge of it.

In our earlier study, we discussed probability models and the computation of probabilities mostly for events involving one variable. These were univariate models. Now, we are diving into models that involve more than one random variable, called multivariate models. As we are talking about more than one random variable, they are best represented as an n-dimensional random vector. This random vector is a function from a sample space S into Rⁿ, i.e., n-dimensional Euclidean space.

3.11.1 Joint Distributions

Let (X, Y) be a random vector. The function f(x, y): R² → R defined by

f(x, y) = P(X = x, Y = y)

is called the joint probability distribution function, or joint PDF, of (X, Y) if X and Y are discrete. We denote the function as f_{X,Y}(x, y).

If (X, Y) is a continuous random vector and if, for every A ⊆ R²,

P((X, Y) ∈ A) = ∬_A f(x, y) dx dy,

then f(x, y): R² → R is called the joint probability density function, or joint PDF, of (X, Y).

Note that the following must hold for a properly defined joint PDF:

Discrete case:

f_{X,Y}(x, y) ≥ 0, ∀(x, y) ∈ R²

∑_{(x,y)∈R²} f_{X,Y}(x, y) = P((X, Y) ∈ R²) = 1

Continuous case:

f_{X,Y}(x, y) ≥ 0, ∀(x, y) ∈ R²

∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1

3.11.2 Marginal distributions

Let (X, Y) be a discrete bivariate random vector with joint PDF f_{X,Y}(x, y). Then, the marginal PDFs of X and Y are:

f_X(x) = P(X = x) = ∑_{y∈R} f_{X,Y}(x, y)

and

f_Y(y) = P(Y = y) = ∑_{x∈R} f_{X,Y}(x, y)

Let (X, Y) be a continuous bivariate random vector with joint PDF f_{X,Y}(x, y). Then, the marginal PDFs of X and Y are:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy, −∞ < x < ∞

and

f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx, −∞ < y < ∞

3.11.3 Conditional distributions

Let (X, Y) be a discrete bivariate random vector with f_{X,Y}(x, y), f_X(x), and f_Y(y). Then,

f_{X|Y}(x|y) = P(X = x | Y = y) = f_{X,Y}(x, y)/f_Y(y), f_Y(y) ≠ 0

and

f_{Y|X}(y|x) = P(Y = y | X = x) = f_{X,Y}(x, y)/f_X(x), f_X(x) ≠ 0

Let (X, Y) be a continuous bivariate random vector with f_{X,Y}(x, y), f_X(x), and f_Y(y). Then,

f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y), f_Y(y) ≠ 0

and

f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x), f_X(x) ≠ 0

3.11.4 Independence of random variables

Let (X, Y) be a bivariate random vector with f_{X,Y}(x, y), f_X(x), and f_Y(y). Then, X and Y are called independent random variables if, for every x ∈ R and y ∈ R,

f_{X,Y}(x, y) = f_X(x) f_Y(y)

If X and Y are independent,

f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y) = f_X(x)

and

f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x) = f_Y(y)

Notice that, except for the minor changes in notation, these definitions are the same as before.
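
To make these definitions concrete, here is a small made-up discrete example (ours, not from the text): a joint PDF stored as a table, its marginals, one conditional, and an independence check. Note that the marginals alone do not determine independence:

joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.3, (1, 1): 0.2}   # a joint PDF; entries sum to 1

fX = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
fY = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}
fX_given_Y0 = {x: joint[(x, 0)] / fY[0] for x in (0, 1)}       # f(x | y = 0)

independent = all(abs(joint[(x, y)] - fX[x] * fY[y]) < 1e-12
                  for x in (0, 1) for y in (0, 1))
print(fX, fY, fX_given_Y0, independent)   # marginals are 0.5 each, yet X and Y are not independent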

3.11.5 Covariance and correlation

The covariance of X and Y is the number defined by:

Cov(X, Y) = σ_{XY} = E((X − μ_X)(Y − μ_Y))

The correlation of X and Y is the number defined by:

ρ(X, Y) = σ_{XY}/(σ_X σ_Y)

This value is also called the correlation coefficient. Note that −1 ≤ ρ_{XY} ≤ 1.

For any random variables X and Y,

σ_{XY} = E((X − μ_X)(Y − μ_Y))
       = E(XY − Xμ_Y − Yμ_X + μ_Xμ_Y)
       = E(XY) − μ_Y E(X) − μ_X E(Y) + μ_Xμ_Y
       = E(XY) − μ_Xμ_Y − μ_Xμ_Y + μ_Xμ_Y
       = E(XY) − μ_Xμ_Y

So,

σ_{XY} = E(XY) − μ_Xμ_Y

If X and Y are independent random variables, then

Cov(X, Y) = 0

and

ρ_{XY} = 0

Let X and Y be any two random variables, and let a and b be any two constants; then

Var(aX + bY) = a²Var(X) + 2ab Cov(X, Y) + b²Var(Y)

or

σ²_{aX+bY} = a²σ²_X + 2ab σ_{XY} + b²σ²_Y

If X and Y are independent random variables with moment generating functions M_X(t) and M_Y(t), then the moment generating function of X + Y is given by:

M_{X+Y}(t) = M_X(t) M_Y(t)
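
The identity Var(aX + bY) = a²Var(X) + 2ab Cov(X, Y) + b²Var(Y) is easy to check by simulation (a Python sketch; the correlated pair below is our own construction):

import random

random.seed(1)
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]     # Y correlated with X by construction

def cov(u, v):
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

a, b = 2.0, -1.0
zs = [a * x + b * y for x, y in zip(xs, ys)]
print(cov(zs, zs))                                                          # simulated Var(aX + bY)
print(a * a * cov(xs, xs) + 2 * a * b * cov(xs, ys) + b * b * cov(ys, ys))  # right-hand side of the identity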

Checkpoint No. 57: Go to the Teaching page & experiment with Normal(0, 1) using the file named ‘Statistical distributions.xlsx’. Is there anything in Z to experiment with?

3.3 EXERCISES

1. 

Fill in the empty cells in the tables below, which are tables of the joint PDF and joint CDF of the random variables (X1, X2).

f(x1, x2)   0.0    0.5    1.0
1           0.1
2           0.1
3

F(x1, x2)   0.0    0.5    1.0
1           0.2
2           0.5    0.65   0.75
3           0.5    0.8
2. 

Let X and Y be two discrete random variables with joint density function f(x, y): R² → R such that

f(x, y) = { cxy,  x ∈ {1, 2, 3} and y ∈ {1, 2}
          { 0,    otherwise

i. Find the value of c

ii. Find P(Y < X )

iii. Find P(Y = X)

iv. Find the PDF of X and PDF of Y

v. Find the conditional distribution of Y given X = 1

vi. Are the random variables X and Y independent?

vii. Find the expected value of X

viii. Find the variance of X

3. 

First we pick a number, at random, from the interval (0, 1); then we pick a number, at random, from the interval (0, x1), where x1 is the first number picked. Let X1 be the random variable that gives the value of the first number and X2 the random variable that gives the second number. The distribution of X1 is uniform over (0, 1), and the distribution of X2, given that X1 = x1, is uniform over (0, x1).

i. Find the joint PDF of (X1,X2).

ii. Find the PDF of X2 (the second marginal distribution of (X1, X2)).

iii. Find the conditional distribution of X1 given X2 = x2.


In a nutshell: In ECON 221 and ECON 222 we formally consider the following statistical distributions:

ECON 221 (Probability Theory): Bernoulli, Binomial, Poisson, Hypergeometric, Geometric, Negative Binomial, Discrete Uniform, (Continuous) Uniform, Triangular, Exponential, Normal, Standard Normal, Half Normal (as exercise)

ECON 222 (Statistics and Basic Econometrics): t (t distribution), χ² (Chi-square distribution), F (F distribution)

Note that the above is only a selection of essential distributions out of the tens of available statistical distributions. Learning how to acquire the knowledge of other distributions is a valuable asset here.