Chapter 8
Linear regression analysis

In the previous chapters, which covered the whole of ECON 221 and about half of ECON 222, we studied the fundamentals of Probability theory and the key theory and toolset of Statistical inference. Remember that we focused solely on understanding statistical distributions and estimating their parameters. In your future scientific, technical and professional practice, this body of knowledge will prove quite fruitful.

Now, we are ready to study the theoretical background and applied dimensions of the 'curve-fitting' problem. To this end, in this chapter, we will consider linear regression models. Notice that what we will do here accounts for the first half of a traditionally designed 'Introductory Econometrics' course.

The term regression was coined by Francis Galton to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average, which is also known as regression toward the mean. For Galton, regression had only this biological meaning. His work was later extended by Udny Yule and Karl Pearson, and later by Fisher (in a way that comes closer to Gauss's 1821 formulation of the problem). If you research it, you will enjoy the history of this line of research.

As its main pillar, regression analysis takes us to the rich analytical world of Econometrics. The literal meaning of the term econometrics (econo+metrics) is 'measurement in economics'. Econometrics is 'the branch of economics concerned with the use of statistical methods in describing and quantifying economic systems' (Oxford Dictionary). From a broader perspective, econometrics is a shared sub-field of Statistics (hence of Mathematics) and Economics. In that, our tools in Econometrics are the tools of Statistics as shaped and augmented by our knowledge of Economics. (One of the founders of the Econometric Society, and another pioneer of the field, Ragnar Frisch is credited with coining the term 'econometrics'.)

Renowned academic Badi Baltagi says "An econometrician has to be a competent mathematician and statistician who is an economist by training. Fundamental knowledge of mathematics, statistics and economic theory are a necessary prerequisite for this field".

Our starting point is a scientific urge to find/formulate, measure and test the relationship between, say, two variables y and x. These variables may belong to the natural sciences, the social sciences or even the humanities; that is not something to worry about. What matters is that the linkage between our variables may not be (and mostly is not) a perfect relationship like y = mx + n (we indeed prefer a notation like y = β0 + β1x, where n is β0 and m is β1). We rather observe deviations from a perfect relationship, as seen earlier in ECON 221. In that, actual y values are connected to actual x values through a relationship like y = β0 + β1x + e, where e stands for a sequence of statistical errors (disturbances).
[Figure: scatter of (x, y) observations around a fitted line, with the deviations from the line marking the errors.]

The error sequence e may stem from the random actions/choices of humans, unexpected shocks to socio-economic systems, misspecification of models, improper choices of mathematical functional forms or imprecision of the data. Note that this picture is not specific to the social sciences: in natural-science experiments, too, there is a multiplicity of sources of uncertainty (hence of statistical errors or disturbances).

The goals of econometrics, as we understand it, will be (1) to find the relation between variables y and x, encapsulated in (β0, β1), (2) to validate and quantify theory, and (3) to forecast.

Purpose of modeling and Simplicity
Deferring a detailed discussion of it to class gatherings, we will say here that ’a model is a downsized yet realistic representation of reality’. An immediate analogy from architecture would be useful: on an architectural model of a building we see things ’only as needed’. While we may not see the doorknobs (depending on the scale) on a model, we see the proportionality of distances clearly. After all, the purpose of the model is to give a broad yet accurate idea of/about things.

A similar idea applies in other disciplines. In business models we do not see every tiny detail of the workplace or the manufacturing environment. In economic models we tend not to include all potential explanatory variables at once. We just try to remain 'accurate enough'.

Using our models we can present our scientific grasp of nature, the universe or society. Once the model is well parametrized and quantified, we can develop forecasts of the future, or we can (depending on the type of our model) develop counterfactual and/or scenario analyses. While presenting our scientific view and forecasting the future are fairly pragmatic ends, a third use of a scientific model, testing and validating/invalidating theories, calls for a more than pragmatic spirit. Regardless of the purpose, though, a model (any model) should display a certain level of simplicity. Before proceeding, recall Albert Einstein saying "Everything should be made as simple as possible, but no simpler."

In our practice of statistical/econometric modeling, the 'principle of parsimony' guides us. Equipped with a rich toolset of formal statistical tests and her judgmental skills, a good researcher tries to come up with an "as simple as possible but no simpler" model. Common sense says the essentials should be included in a model while all the inessentials should be omitted from it. The bad news is that every researcher hits a few bumps while developing such a sense in practice; the good news is that honest, hard work pays off.

In Philosophy (and Science) there are several 'razors' to shave away the redundancies in models (or in scientific explanations). Here we will maintain Occam's razor (or Ockham's razor), attributed to William of Ockham, an English philosopher of the 13th-14th centuries. Occam's razor is a principle of parsimony stating that, among the explanations addressing the same thing, the simplest is to be picked! (William of Baskerville of The Name of the Rose by Umberto Eco is a tribute to William of Ockham.) (Arthur Conan Doyle's Sherlock Holmes once utters "When you have eliminated the impossible, whatever remains, however improbable, must be the truth.")

Occam's razor reads in Latin as "pluralitas non est ponenda sine necessitate", which translates into English as "plurality should not be posited without necessity". The principle thus calls for parsimony in 'deductive thinking'.

Although what we do in applied statistics/econometrics is not purely (maybe not at all) deductive thinking, and we rather try to reach an inference to the best explanation via a formal sequence of estimations/tests/calculations, Occam's razor still sheds good light for us to see things clearly in this practice.

In the world of project development you may hear the same principle as an acronym of ’KISS’. Referring to a model, KISS reads as ’Keep It Small and Simple’ or sometimes as ’Keep It Simple, Stupid’. (Search yourself for its relevance to the US Navy)

In the remainder of this chapter, we will study/learn the theory of elementary econometrics and a rich enough toolset pertaining to it, along with a selection of applied problems.

8.1 EXERCISES

1. 

Refer to our in-class discussions to explain/discuss the following:

i. Occam’s razor

ii. Principle of parsimony

iii. ’Keep It Small and Simple’, i.e., KISS

iv. Purpose of modeling

v. Come up with a synthesis of the terms/phrases referred to above.

Solution: Left as self-exercise.

Checkpoint No: 86

8.1 Overview of linear models


The specific meaning of linearity here is 'the linearity of a model in terms of (with respect to) its parameters'. In that:

y = β0+ β1x1+ β2x2+ e

is a linear model. So is

y = β0 + β1x1² + β2x2³ + e

However,

y = β0 + β1²x1 + β1β3x2 + β3x3 + e

is not considered to be a linear model. Neither is

y = β0 + β1x1+ β2x2+ β1β2x3+ e.

In your future practice, you will be able to settle this issue in a crystal clear fashion.

Why do we resort to linear models? This is a very legitimate question once we observe that a number of relationships in nature and in societal life are, indeed, nonlinear (not linear). A straightforward answer reads as 'linear models are easy to use'. So, simplicity matters. Simplicity brings practicality to researchers: linear models are easy to compute, to interpret and to communicate. More importantly, as noted earlier, our linear regression models are linear with respect to their parameters while the independent variables of our models can be of any nonlinear form. All in all, one can establish/form 'models that are nonlinear in their variables' using 'models that are linear in their parameters'. The good thing about models that are linear in parameters is that such a structure allows us to use the tools of linear algebra effectively in our computations.

Our curious nature often forces us to include many explanatory variables in a model:

y = β0+ β1x1+ β2x2+ ⋅⋅⋅+ βkxk

However, a minimalist design is also possible:

y = β0 + β1x

Even this may be a good enough model (think when):

y = β0

The process of inference begins with the specification of an economic model. Then a statistical model describes the sampling process that we visualize was used to produce the sample data. See the structure below:

Economic model:

y = β0+ β1x

Statistical model:

y = β0 + β1x + e

The random error term (e) serves three main purposes:

1.
e captures the combined effect of all other influences other than x. These other effects are assumed to be unobservable, otherwise they would be included in the model.
2.
e captures any approximation error that arises because of the linear functional form
3.
e captures any element of random behavior present in each individual observation.

See the structures below:

Case 1: Unconditional model of mean

Economic model:

y = β0

Statistical model:

y = β0+ e

Case 2: Simple Linear model

Economic model:

y = β0+ β1x

Statistical model:

y = β0 + β1x + e

Case 3: Multiple Linear model

Economic model:

y = β0+ β1x1+ β2x2+ ⋅⋅⋅+ βkxk

Statistical model:

y = β0 + β1x1 + β2x2 + ⋅⋅⋅ + βkxk + e

Checkpoint No: 87

8.2 Transformations and functional forms

In economics and finance, as in other quantitative disciplines, we attribute a great deal of importance to measuring the impact of a change in one variable on another. Considering y = f(x) as a relationship between the variables y (dependent) and x (independent), the derivative dy/dx = f′(x) describes that impact. When we consider y = f(x1, x2, ..., xk), the impact of an independent variable xi on the dependent variable y is better described by the partial derivative ∂y/∂xi. Having formed and estimated a proper statistical/econometric model, then, a researcher gains a good grasp of the issues embedded in the research problem at hand.

Note that, as economists and finance specialists, we like to learn about a special class of impact measurements, namely the elasticities. Recall from your introductory economics classes that 'the elasticity of y with respect to x is the percentage change in y against a one percent change in x'. In formal terms:

$$\eta_{y,x} = \frac{\%\Delta y}{\%\Delta x} = \frac{\Delta y / y}{\Delta x / x} = \frac{\Delta y}{\Delta x}\cdot\frac{x}{y}$$

So, as long as we can estimate Δy/Δx, we can come up with an estimate of ηy,x by substituting appropriate values of x and y into x/y. We will see several examples as we progress through this chapter, where we will see that estimating an elasticity is possible under a wide array of functional forms of f(·) in the expression y = f(x).

One of the functional forms, i.e., the Log-Log form, yields elasticities directly as:

$$\eta_{y,x} = \frac{\Delta \ln y}{\Delta \ln x}.$$

We will discuss this topic further in our classes.

Functional form: Linear

$$y_i = \beta_0 + \beta_1 x_i + e_i$$

Nonlinear form: None

Impact at margin:
$$\frac{dy}{dx} = \beta_1$$

Elasticity:
$$\frac{dy}{dx}\cdot\frac{x}{y} = \beta_1\frac{x}{y}$$

Functional form: Reciprocal

$$y_i = \beta_0 + \beta_1\frac{1}{x_i} + e_i$$

Nonlinear form: None

Impact at margin:
$$\frac{dy}{dx} = -\beta_1\frac{1}{x^2}$$

Elasticity:
$$\frac{dy}{dx}\cdot\frac{x}{y} = -\beta_1\frac{1}{xy}$$

Functional form: Log-Log

$$\ln y_i = \beta_0 + \beta_1\ln x_i + e_i$$

Nonlinear form:
$$y_i = \alpha x_i^{\beta_1}e^{e_i}, \qquad \alpha = e^{\beta_0}$$

Impact at margin:
$$\frac{dy}{dx} = \beta_1\frac{y}{x}$$

Elasticity:
$$\frac{dy}{dx}\cdot\frac{x}{y} = \beta_1$$

Functional form: Log-Linear (exponential)

$$\ln y_i = \beta_0 + \beta_1 x_i + e_i$$

Nonlinear form:
$$y_i = e^{\beta_0 + \beta_1 x_i + e_i}$$

Impact at margin:
$$\frac{dy}{dx} = \beta_1 y$$

Elasticity:
$$\frac{dy}{dx}\cdot\frac{x}{y} = \beta_1 x$$

Functional form: Linear-Log (semilog)

$$y_i = \beta_0 + \beta_1\ln x_i + e_i$$

Nonlinear form:
$$e^{y_i} = e^{\beta_0}x_i^{\beta_1}e^{e_i}$$

Impact at margin:
$$\frac{dy}{dx} = \beta_1\frac{1}{x}$$

Elasticity:
$$\frac{dy}{dx}\cdot\frac{x}{y} = \beta_1\frac{1}{y}$$

Functional form: Log-Inverse

$$\ln y_i = \beta_0 - \beta_1\frac{1}{x_i} + e_i$$

Nonlinear form:
$$y_i = e^{\beta_0 - \beta_1/x_i + e_i}$$

Impact at margin:
$$\frac{dy}{dx} = \beta_1\frac{y}{x^2}$$

Elasticity:
$$\frac{dy}{dx}\cdot\frac{x}{y} = \beta_1\frac{1}{x}$$
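As a quick numerical illustration of the Log-Log row (our own sketch, not part of the original text): for y = e^β0 · x^β1, the elasticity (dy/dx)(x/y) should come out as β1. A minimal Python check with hypothetical parameter values:

    import numpy as np

    # Minimal sketch: verify numerically that the log-log form has a constant
    # elasticity equal to beta_1. The parameter values below are hypothetical.
    b0, b1 = 0.5, 1.7
    x = 4.0
    y = np.exp(b0) * x**b1

    h = 1e-6                                         # finite-difference step
    dydx = (np.exp(b0) * (x + h)**b1 - np.exp(b0) * (x - h)**b1) / (2 * h)
    print(dydx * x / y)                              # ~1.7, i.e. beta_1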

Checkpoint No: 88

8.3 Our approach to teaching/learning

In the remainder of this chapter, we will maintain an approach which may slightly differ from the approaches of others. Sticking to this approach will facilitate better learning. Our approach unfolds as:

Taking a pit stop here: the sequence of topics above will provide us with a solid understanding of the mechanical workings of our linear regression universe.

Once we have learned these, we will move to:

In many, maybe all, books the Gauss-Markov assumptions are covered before anything else. Our approach, though, maintains a different pedagogical perspective: we take up the Gauss-Markov assumptions, which are crucial in econometric theory and practice, only after gaining a clear view of the working environment. After that we will move to:

Note that the above order of topics requires us to stick to it, without interruption or gaps, for successful learning.
An artificial data set:

In our subsequent discussions we will be referring to the following data set frequently. While we can show a data set as an actual set (with proper mathematical notation) like:

A={(2,1),   (2,3),   (3,2),   (3,3),    (3,4),   (5,3),
 (5,4),    (5,6),   (8,5),   (8,8),   (10,6),   (11,8),

(11,10),  (12,8), (14,10), (15,11),  (15,17), (16,13),
(19,15),  (21,16), (23,18), (23,19),  (23,20), (25,18),
(25,20),  (26,21), (27,24), (28,21),  (28,24), (28,25)}

it may be more practical to use a tabular listing of the data. A tabular structure improves visibility and exposition:







Observation i   xi   yi      Observation i   xi   yi
1               2    1       16              15   11
2               2    3       17              15   17
3               3    2       18              16   13
4               3    3       19              19   15
5               3    4       20              21   16
6               5    3       21              23   18
7               5    4       22              23   19
8               5    6       23              23   20
9               8    5       24              25   18
10              8    8       25              25   20
11              10   6       26              26   21
12              11   8       27              27   24
13              11   10      28              28   21
14              12   8       29              28   24
15              14   10      30              28   25
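For readers who want to reproduce the computations of the following sections on a computer, here is a minimal Python (numpy) sketch of the artificial data set above; the array names x and y are our own choice:

    import numpy as np

    # The artificial data set of this chapter, typed in for later computations.
    x = np.array([2, 2, 3, 3, 3, 5, 5, 5, 8, 8, 10, 11, 11, 12, 14,
                  15, 15, 16, 19, 21, 23, 23, 23, 25, 25, 26, 27, 28, 28, 28], dtype=float)
    y = np.array([1, 3, 2, 3, 4, 3, 4, 6, 5, 8, 6, 8, 10, 8, 10,
                  11, 17, 13, 15, 16, 18, 19, 20, 18, 20, 21, 24, 21, 24, 25], dtype=float)
    n = len(y)   # 30 observations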






8.4 Building and estimating an Unconditional Model of Mean: A model which is a non-model

Consider a variable y that is modeled as:

y = β0+ e

If we have a sample y1, y2, ..., yn, this relationship can also be written as

yi = β0 + ei,  i = 1, 2, ..., n

It is clear that our model does not include any independent (explanatory) variables on the right-hand side, i.e., values of y are scattered around β0 (if they are not all accidentally equal to β0).

Supposing there are K potential independent variables, x1, x2, ..., xK, that might explain y, the unconditional model of mean can be viewed as:

yi = β0+ 0 ⋅x1i+ 0 ⋅x2i+ ⋅⋅⋅+ 0⋅xki+ ei

where the researcher places zero weight on x1, x2, ..., xK. In that, this model of mean turns out to be the simplest possible model, or more like a non-model. When we plot yi against one of the x's (say xki), the model of mean appears as a horizontal line (as the model disregards the x's). This is simply the orange line displayed below (observe that along the orange line dy/dx = 0):
[Figure: yi plotted against x, with the unconditional mean of y drawn as a horizontal orange line.]

To estimate β0 in y = β0 + e we need two main ingredients:

Now, suppose our estimator is β̂0. Then, the estimated values of yi (denoted as ŷi) are written as:

ŷi = β̂0

Actual values of yi, on the other hand, are:

yi = β̂0 + êi

equivalently

yi = ŷi + êi

The differences between yi and ŷi are the estimated error terms:

êi = yi − ŷi
êi = yi − β̂0

Consider the function S :

$$S = \sum_{i=1}^{n}\hat e_i^2 = \sum_{i=1}^{n}\left(y_i - \hat\beta_0\right)^2$$

The Least Squares method instructs us to minimize S by optimally choosing β̂0:

$$\min_{\hat\beta_0}\ \sum_{i=1}^{n}\left(y_i - \hat\beta_0\right)^2$$

The F.O.C. for this problem is:

$$\frac{dS}{d\hat\beta_0} = \sum_{i=1}^{n} 2\left(y_i - \hat\beta_0\right)(-1) = 0$$

which is followed by:

$$\sum_{i=1}^{n}\left(y_i - \hat\beta_0\right) = 0$$
$$\sum_{i=1}^{n} y_i - \sum_{i=1}^{n}\hat\beta_0 = 0$$
$$\sum_{i=1}^{n} y_i - n\hat\beta_0 = 0$$
$$\hat\beta_0 = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y$$

So, not surprisingly, and as may be recalled from our discussion of point estimators, the sample mean is the least squares estimator of the population mean. Namely, β̂0 = ȳ estimates β0.
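A minimal numerical illustration (a sketch of our own, reusing the y array defined with the artificial data set above): the sample mean of y is the least squares choice of β̂0, and S evaluated at the mean is smaller than at a nearby alternative value.

    import numpy as np

    # Least squares estimate of beta_0 in y = beta_0 + e is the sample mean.
    beta0_hat = y.mean()

    # S(b) = sum of squared deviations; the sample mean minimizes it.
    S = lambda b: np.sum((y - b) ** 2)
    print(beta0_hat, S(beta0_hat), S(beta0_hat + 1.0))   # S is larger away from the mean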

A note on the function S may be useful here: as β̂0 is the estimated mean of yi, the function S equals the variance of the estimated error terms multiplied by n. This is good to keep in mind: the least squares estimator is a 'minimum variance estimator', as we will formally discuss later. Statistical properties of the error terms ei will also be covered in detail.

Returning to the qualities of the sample mean β̂0 = ȳ as an estimator of the population mean β0, one can be intellectually stunned by the beauty generated by simplicity. There are a couple of things to mention:

Checkpoint No: 89

8.5 Building and estimating a Simple Linear Regression model

Consider a variable y which we believe is explained by another variable x via a linear relationship like:

y = β0 + β1x + e

in this expression,

Below, the green line is a good candidate to be a Simple Linear regression line:
[Figure: scatter plot of the data with a candidate simple linear regression line drawn in green.]

Notice that we need to estimate two parameters, β0 and β1, this time. The Least Squares method is again applicable. Let us go over its steps below:

ŷi = β̂0 + β̂1xi
yi = β̂0 + β̂1xi + êi
êi = yi − β̂0 − β̂1xi

$$S = \sum_{i=1}^{n}\hat e_i^2 = \sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2$$

$$\min_{\{\hat\beta_0,\hat\beta_1\}}\ \sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2$$

$$\frac{\partial S}{\partial\hat\beta_0} = \sum_{i=1}^{n} 2\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)(-1) = 0$$
$$\frac{\partial S}{\partial\hat\beta_1} = \sum_{i=1}^{n} 2\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)(-x_i) = 0$$

$$\sum\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0$$
$$\sum\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)x_i = 0$$

$$\sum y_i - n\hat\beta_0 - \left(\sum x_i\right)\hat\beta_1 = 0$$
$$\sum x_iy_i - \left(\sum x_i\right)\hat\beta_0 - \left(\sum x_i^2\right)\hat\beta_1 = 0$$

$$n\hat\beta_0 + \left(\sum x_i\right)\hat\beta_1 = \sum y_i$$
$$\left(\sum x_i\right)\hat\beta_0 + \left(\sum x_i^2\right)\hat\beta_1 = \sum x_iy_i$$

Multiplying the first normal equation by $\sum x_i^2/\sum x_i$:

$$\frac{n\sum x_i^2}{\sum x_i}\hat\beta_0 + \left(\sum x_i^2\right)\hat\beta_1 = \frac{\sum x_i^2\sum y_i}{\sum x_i}$$
$$\left(\sum x_i\right)\hat\beta_0 + \left(\sum x_i^2\right)\hat\beta_1 = \sum x_iy_i$$

and subtracting the second from the first:

$$\left(\frac{n\sum x_i^2}{\sum x_i} - \sum x_i\right)\hat\beta_0 = \frac{\sum x_i^2\sum y_i}{\sum x_i} - \sum x_iy_i$$
$$\left(\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{\sum x_i}\right)\hat\beta_0 = \frac{\sum x_i^2\sum y_i - \sum x_i\sum x_iy_i}{\sum x_i}$$

$$\hat\beta_0 = \frac{\sum x_i^2\sum y_i - \sum x_i\sum x_iy_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

Alternatively, from the first condition:

$$\sum y_i - n\hat\beta_0 - \left(\sum x_i\right)\hat\beta_1 = 0$$
$$n\hat\beta_0 = \sum y_i - \left(\sum x_i\right)\hat\beta_1$$
$$\hat\beta_0 = \bar y - \hat\beta_1\bar x$$

Substituting this into the second condition:

$$\sum x_iy_i - \left(\sum x_i\right)\left(\bar y - \hat\beta_1\bar x\right) - \left(\sum x_i^2\right)\hat\beta_1 = 0$$
$$\sum x_iy_i - \bar y\sum x_i + \hat\beta_1\bar x\sum x_i - \hat\beta_1\sum x_i^2 = 0$$
$$\frac{\sum x_iy_i}{n} - \bar y\frac{\sum x_i}{n} + \hat\beta_1\bar x\frac{\sum x_i}{n} - \hat\beta_1\frac{\sum x_i^2}{n} = 0$$
$$\frac{\sum x_iy_i}{n} - \bar x\bar y + \hat\beta_1\bar x^2 - \hat\beta_1\frac{\sum x_i^2}{n} = 0$$

$$\hat\beta_1 = \frac{\frac{\sum x_iy_i}{n} - \bar x\bar y}{\frac{\sum x_i^2}{n} - \bar x^2} = \frac{n\sum x_iy_i - n^2\bar x\bar y}{n\sum x_i^2 - n^2\bar x^2}$$

$$\hat\beta_1 = \frac{n\sum x_iy_i - \sum x_i\sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

Now, reconsider that β̂0 = ȳ − β̂1x̄ and that Σxiyi − β̂0Σxi − β̂1Σxi² = 0. Substituting the first one into the second:

$$\sum x_iy_i - \bar y\sum x_i + \hat\beta_1\bar x\sum x_i - \hat\beta_1\sum x_i^2 = 0$$
$$\hat\beta_1\left(\sum x_i^2 - \bar x\sum x_i\right) = \sum x_i(y_i - \bar y)$$
$$\hat\beta_1\sum x_i(x_i - \bar x) = \sum x_i(y_i - \bar y)$$
$$\hat\beta_1 = \frac{\sum x_i(y_i - \bar y)}{\sum x_i(x_i - \bar x)}$$

Now notice the following:

$$\sum(x_i-\bar x)(y_i-\bar y) = \sum(x_i-\bar x)y_i = \sum(y_i-\bar y)x_i$$

as,

$$\sum(x_i-\bar x)(y_i-\bar y) = \sum(x_i-\bar x)y_i - \bar y\underbrace{\sum(x_i-\bar x)}_{0} = \sum(y_i-\bar y)x_i - \bar x\underbrace{\sum(y_i-\bar y)}_{0}$$

and, as the sum of the deviations from the mean is zero, i.e.,

$$\sum(x_i-\bar x) = \sum x_i - \sum\bar x = \sum x_i - n\bar x = 0$$

and

∑ (yi− ¯y) = ∑ yi− ∑ ¯y = ∑ yi− n¯y = 0

The same logic applies in:

$$\sum(x_i-\bar x)^2 = \sum(x_i-\bar x)(x_i-\bar x) = \sum(x_i-\bar x)x_i - \bar x\underbrace{\sum(x_i-\bar x)}_{0}$$

At the end, the above-derived expression for β̂1, i.e.,

$$\hat\beta_1 = \frac{\sum x_i(y_i-\bar y)}{\sum x_i(x_i-\bar x)}$$

can be rewritten as:

$$\hat\beta_1 = \frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}$$

and, so, can also be written as:

$$\hat\beta_1 = \frac{\frac{1}{n}\sum(x_i-\bar x)(y_i-\bar y)}{\frac{1}{n}\sum(x_i-\bar x)^2} = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}$$

To sum up, our Least Squares estimators β̂0 and β̂1 for the model parameters β0 and β1 are found to be:

$$\hat\beta_1 = \frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2} = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}$$

and

$$\hat\beta_0 = \bar y - \hat\beta_1\bar x$$
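As a sanity check (our own sketch, continuing the numpy snippets above with the artificial data set), the two equivalent formulas for β̂1 and the formula for β̂0 can be computed as follows:

    import numpy as np

    # Least squares estimates for y = beta_0 + beta_1 x + e, using the formulas above.
    beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    print(beta0_hat, beta1_hat)

    # Equivalent 'Cov/Var' form (moments divided by n in both numerator and denominator).
    beta1_alt = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    assert np.isclose(beta1_hat, beta1_alt)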

In the graph given below, try to observe why the green line is superior to others in representing our data:
[Figure: scatter plot of the data with the fitted (green) regression line and alternative candidate lines.]

Assumptions of the Simple Linear regression model

SLR1. The value of y, for each value of x, is y = β0 + β1x + e.

SLR2. The average value of the random error e is E(e) = 0, since we assume that E(y) = β0 + β1x.

SLR3. The variance of the random error e is Var(e) = σ² = Var(y).

SLR4. The covariance between any pair of random errors ei and ej is Cov(ei, ej) = Cov(yi, yj) = 0, i ≠ j.

SLR5. The variable x is not random and must take at least two different values.

SLR6. The values of e are normally distributed about their mean: e ∼ Normal(0, σ²).

Checkpoint No: 90

8.6 Building and estimating a Multiple Linear Regression model: An increase in dimensionality

Consider

y = β0 + β1x1 + β2x2 + ⋅⋅⋅ + βkxk + e

where

ŷi = β̂0 + β̂1xi1 + β̂2xi2 + ⋅⋅⋅ + β̂KxiK
yi = β̂0 + β̂1xi1 + β̂2xi2 + ⋅⋅⋅ + β̂KxiK + êi
êi = yi − β̂0 − β̂1xi1 − ⋅⋅⋅ − β̂KxiK

$$S = \sum_{i=1}^{n}\hat e_i^2 = \sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_K x_{iK}\right)^2$$
$$\min_{\{\hat\beta_0,\hat\beta_1,\cdots,\hat\beta_K\}} S$$

As before, this minimization problem will give us β̂0, β̂1, ..., β̂K, i.e., the estimators of β0, β1, ..., βK.

For future ease, let us restate our Multiple Linear model using matrix notation. To do this, let us first write our model equation for every single observation (for each i = 1, 2, ..., n):

y1 = β0 + β1x11 + β2x12 + ⋅⋅⋅ + βKx1K + e1
y2 = β0 + β1x21 + β2x22 + ⋅⋅⋅ + βKx2K + e2
⋅⋅⋅
yn = β0 + β1xn1 + β2xn2 + ⋅⋅⋅ + βKxnK + en

In matrix notation:

$$\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{y\ (n\times 1)} = \underbrace{\begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1K} \\ 1 & x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & & & & \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nK} \end{bmatrix}}_{X\ (n\times(K+1))} \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_K \end{bmatrix}}_{\beta\ ((K+1)\times 1)} + \underbrace{\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}}_{e\ (n\times 1)}$$

can be written. Then,

y = Xβ + e

It is also possible to write each explanatory variable as a separate vector like:

$$x_0 = \begin{bmatrix}1\\1\\\vdots\\1\end{bmatrix},\quad x_1 = \begin{bmatrix}x_{11}\\x_{21}\\\vdots\\x_{n1}\end{bmatrix},\quad \ldots,\quad x_K = \begin{bmatrix}x_{1K}\\x_{2K}\\\vdots\\x_{nK}\end{bmatrix}$$

so the model looks like:

y = x0β0+ x1β1+ ⋅⋅⋅+ xKβK + e

When the matrix expression y = Xβ + e is maintained, the function S becomes

S = e′e

where e′ is the transpose of e.

Returning to our minimization problem written in classical notation, the following first order conditions are written:

$$\frac{\partial S}{\partial\hat\beta_0} = \sum_{i=1}^{n} -2\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_K x_{iK}\right) = 0$$
$$\frac{\partial S}{\partial\hat\beta_1} = \sum_{i=1}^{n} -2\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_K x_{iK}\right)x_{i1} = 0$$
$$\cdots$$
$$\frac{\partial S}{\partial\hat\beta_K} = \sum_{i=1}^{n} -2\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_K x_{iK}\right)x_{iK} = 0$$

Simplifying a little:

$$\sum\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_K x_{iK}\right) = 0$$
$$\sum\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_K x_{iK}\right)x_{i1} = 0$$
$$\cdots$$
$$\sum\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_K x_{iK}\right)x_{iK} = 0$$

Reorganizing the terms:

$$n\hat\beta_0 + \left(\sum x_{i1}\right)\hat\beta_1 + \cdots + \left(\sum x_{iK}\right)\hat\beta_K = \sum y_i$$
$$\left(\sum x_{i1}\right)\hat\beta_0 + \left(\sum x_{i1}^2\right)\hat\beta_1 + \cdots + \left(\sum x_{i1}x_{iK}\right)\hat\beta_K = \sum x_{i1}y_i$$
$$\cdots$$
$$\left(\sum x_{iK}\right)\hat\beta_0 + \left(\sum x_{i1}x_{iK}\right)\hat\beta_1 + \cdots + \left(\sum x_{iK}^2\right)\hat\beta_K = \sum x_{iK}y_i$$

Notice that this last set of equations can be written as:

$$\begin{bmatrix} n & \sum x_{i1} & \cdots & \sum x_{iK} \\ \sum x_{i1} & \sum x_{i1}^2 & \cdots & \sum x_{i1}x_{iK} \\ \vdots & & & \vdots \\ \sum x_{iK} & \sum x_{i1}x_{iK} & \cdots & \sum x_{iK}^2 \end{bmatrix}\begin{bmatrix}\hat\beta_0\\\hat\beta_1\\\vdots\\\hat\beta_K\end{bmatrix} = \begin{bmatrix}\sum y_i\\\sum x_{i1}y_i\\\vdots\\\sum x_{iK}y_i\end{bmatrix}$$

In terms of our earlier definitions of X and y, as well as β, what we have obtained is

X′Xβ̂ = X′y

So,

β̂ = (X′X)⁻¹X′y

solves our minimization problem, and β̂ = [β̂0, β̂1, β̂2, ⋅⋅⋅, β̂K] contains our parameter estimates.
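A minimal matrix-form sketch (our own, continuing the numpy snippets above): for the simple one-regressor design built from the artificial data, β̂ = (X′X)⁻¹X′y can be computed as below. We call np.linalg.solve instead of forming an explicit inverse, a standard numerical precaution rather than anything required by the formula itself:

    import numpy as np

    # n x (K+1) design matrix: a column of ones plus the single regressor x.
    X = np.column_stack([np.ones_like(x), x])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X'X) beta = X'y
    print(beta_hat)                                 # [beta0_hat, beta1_hat]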

8.2 EXERCISES

1. 

Reconsider the Simple Linear model y = β0 + β1x + e and show that the β̂ = (X ′X ) 1Xy works in estimating β0 and β1 (ie., while finding ̂β 0 and β̂ 1 ). Solution:

$$X'X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix},\qquad X'y = \begin{bmatrix} \sum y_i \\ \sum x_iy_i \end{bmatrix}$$
$$(X'X)^{-1} = \frac{1}{n\sum x_i^2 - (\sum x_i)^2}\begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix}$$

So,

$$\hat\beta = \frac{1}{n\sum x_i^2 - (\sum x_i)^2}\begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix}\begin{bmatrix} \sum y_i \\ \sum x_iy_i \end{bmatrix}$$

So,

$$\hat\beta_0 = \frac{\sum x_i^2\sum y_i - \sum x_i\sum x_iy_i}{n\sum x_i^2 - (\sum x_i)^2}$$
$$\hat\beta_1 = \frac{-\sum x_i\sum y_i + n\sum x_iy_i}{n\sum x_i^2 - (\sum x_i)^2} = \frac{n\sum x_iy_i - \sum x_i\sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$$

Checking against our earlier solution, we verify that β̂ = (X′X)⁻¹X′y works well.

2. 

Question: Write and solve the Least squares estimation problem for

yi = β0+ β 1xi1+ β2xi2 + ei

that is, a model with a constant term (β0) and two explanatory variables.

Solution: Left as self-study.

As you have noticed, we used/devised the term (X′X)⁻¹ in solving our estimation problem. Think about what ensures the invertibility of X′X. In your future learning and practice this will be a central technical issue to address many times.

Assumptions of the Multiple Linear regression model

MLR1. yi = β0 + β1xi1 + β2xi2 + ⋅⋅⋅ + βkxik + ei

MLR2. E(yi) = β0 + β1xi1 + β2xi2 + ⋅⋅⋅ + βkxik ⟺ E(ei) = 0

MLR3. Var(yi) = Var(ei) = σ²

MLR4. Cov(yi, yj) = Cov(ei, ej) = 0, i ≠ j

MLR5. The values of xik are not random and are not exact linear functions of the other explanatory variables.

MLR6. yi ∼ Normal(β0 + β1xi1 + ⋅⋅⋅ + βKxiK, σ²) ⟺ ei ∼ Normal(0, σ²)

Checkpoint No: 91

8.7 Goodness of fit

Suppose we have the following model:

$$y_i = \underbrace{\hat\beta_0 + \hat\beta_1 x_i}_{\hat y_i} + e_i = \hat y_i + e_i$$

Observe that

yi− y¯= ̂yi− ¯y+ ei

and consider the quantity Σ(yi − ȳ)². This quantity is called the 'Total Sum of Squares'. In what follows, we decompose it into other useful quantities:

$$\sum(y_i-\bar y)^2 = \sum(\hat y_i-\bar y)^2 + 2\sum(\hat y_i-\bar y)e_i + \sum e_i^2$$

Reordering the terms in the last expression:

$$\sum(y_i-\bar y)^2 = \sum(\hat y_i-\bar y+e_i)^2 = \sum(\hat y_i-\bar y)^2 + \sum e_i^2 + 2\underbrace{\sum(\hat y_i-\bar y)e_i}_{0}$$
$$\sum(y_i-\bar y)^2 = \sum(\hat y_i-\bar y)^2 + \sum e_i^2$$

is obtained. In this expression,

$$\underbrace{\sum(y_i-\bar y)^2}_{TSS} = \underbrace{\sum(\hat y_i-\bar y)^2}_{ESS} + \underbrace{\sum e_i^2}_{RSS}$$

TSS, ESS and RSS stand for the Total Sum of Squares, the Explained Sum of Squares and the Residual Sum of Squares, respectively.

Notice that the Total Sum of Squares Σ(yi − ȳ)² is nothing but the variance of y multiplied by n:

$$TSS = \sum(y_i-\bar y)^2 = n\left(\frac{1}{n}\sum(y_i-\bar y)^2\right)$$

The Explained Sum of Squares Σ(ŷi − ȳ)² measures the sum of squared deviations of our estimated values of y (namely ŷi) from ȳ (namely the unconditional mean of our dependent variable y). As the ŷi values are implied by our model's explanatory variables (x1, x2, ..., xK), the ESS measures the portion of TSS that we have explained. The Residual Sum of Squares, then, measures the portion of TSS that could not be explained. The Coefficient of Determination R² is the fraction of variation in y explained by our knowledge of x:

$$R^2 = \frac{ESS}{TSS} = \frac{\sum(\hat y_i-\bar y)^2}{\sum(y_i-\bar y)^2} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum\hat e_i^2}{\sum(y_i-\bar y)^2}$$

Note that, if the model does not have a constant term (that is β0 is omitted), then the measure R2 is not appropriate anymore. When the constant term is omitted,

$$\sum(y_i-\bar y)^2 \neq \sum(\hat y_i-\bar y)^2 + \sum e_i^2$$

A bad habit of R² is that it never decreases, and typically increases, upon the inclusion of additional explanatory variables in a model (in fact, even the adjusted measure introduced below rises whenever the added variable's t-statistic exceeds 1 in absolute value, as we will see in subsequent sections). Does this mean we should continue adding more and more explanatory variables to our model 'just to push up R²'? The answer is quite the opposite: we must see the inclusion of more variables as a cost (after all, we want to come up with a parsimonious model). We then need to balance the benefit of more explanatory variables (enhanced ESS) against the cost of including them.

The Adjusted Coefficient of Determination (R̄²) serves that purpose:

$$\bar R^2 = 1 - \frac{RSS/(n-K-1)}{TSS/(n-1)}$$

Notice that:

$$\bar R^2 = 1 - \left(1-R^2\right)\left(\frac{n-1}{n-K-1}\right)$$

Also keep in mind that neither R² nor R̄² has a statistical distribution, so they are not directly and formally testable. A simple arithmetic reorganization of R², though, yields an F test statistic, as we will consider very soon.
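Continuing the numpy sketch above (our own illustration, with X and beta_hat as defined earlier), the goodness-of-fit quantities can be computed directly from their definitions:

    import numpy as np

    y_hat = X @ beta_hat
    e_hat = y - y_hat
    K = X.shape[1] - 1                    # number of slope coefficients
    n = len(y)

    TSS = np.sum((y - y.mean()) ** 2)
    ESS = np.sum((y_hat - y.mean()) ** 2)
    RSS = np.sum(e_hat ** 2)

    R2 = 1 - RSS / TSS
    R2_adj = 1 - (RSS / (n - K - 1)) / (TSS / (n - 1))
    print(R2, R2_adj)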

Checkpoint No: 92

8.8 Handling statistical uncertainty: calculation of variances and covariances associated with a Multiple Linear Regression model

As stated before in 'Our approach to teaching/learning', up to here we have maintained a naive and mechanical view of linear regression modeling. In that, we deliberately avoided calculating and discussing the measures of dispersion or co-dispersion associated with our models. Now it is time to turn to reality. After all, the ei sequence has a certain statistical distribution, and so does yi. As we will formally study under the heading of 'Ideal econometric conditions: Gauss-Markov assumptions', the ei terms have:

ei ∼ Normal(0, σ²)

that is, a Normal (Gaussian) distribution with a mean of zero (0) and constant (and preferably finite) variance.

As a consequence yi values have:

yi ∼ Normal(β0 + β1x1 + ⋅⋅⋅ + βKxK, σ²)

Intuitively, the mean of yi depends on (is conditional on) x1, x2, ..., xK (along with their parameters), while the variance of y simply mimics that of e (by the very construction of our analytical framework).

The key thing to understand now is the variability of our parameter estimates: since they are obtained from a stochastic/random data set, it is natural to expect each of our estimators to have a nonzero variance and each pair of our estimators to have a covariance.

We devote this section to some rigorous treatment of what we call a ’variance-covariance’ matrix.

Let us begin from e ∼ Normal(0, σ²). Once we assume the error terms to have a Normal distribution with a mean of zero (0) and a variance of σ², we may proceed to the following Q&A style mathematical elaboration:

Q: Do we know the value of σ2 ?

A: No, it belongs to the population of ei’s. But, we only have a sample of ei ’s, namely êi ’s.

Q: Can we use those êi ’s to estimate σ2, that is to obtain ̂σ2 ?

A: Yes, the formula for σ̂² is:

$$\hat\sigma^2 = \frac{\sum\hat e_i^2}{n-(K+1)}$$

Q: Can we express σ̂² using matrix notation?

A: Yes, the expression is:

$$\hat\sigma^2 = \frac{\hat e'\hat e}{n-(K+1)} = \frac{(y-X\hat\beta)'(y-X\hat\beta)}{n-(K+1)}$$

Q: What about the Cov(β̂i, β̂j) values, can we calculate them?

A: Sure, in matrix notation,

$$Cov(\hat\beta) = E\left((\hat\beta-\beta)(\hat\beta-\beta)'\right) = \sigma^2(X'X)^{-1}$$

Q: What about the distribution of β̂?

A:

$$\hat\beta \sim Normal\left(\beta,\ \sigma^2(X'X)^{-1}\right)$$

Q: What does this mean?

A: First, each parameter estimate is unbiased, E(β̂) = β. Second, the variances and covariances are governed by σ²(X′X)⁻¹.

Q: What is the structure of the variance-covariance matrix?

A:

$$Cov(\hat\beta) = E\left[(\hat\beta-\beta)(\hat\beta-\beta)'\right] = \begin{bmatrix} Var(\hat\beta_0) & Cov(\hat\beta_0,\hat\beta_1) & \cdots & Cov(\hat\beta_0,\hat\beta_K) \\ Cov(\hat\beta_0,\hat\beta_1) & Var(\hat\beta_1) & \cdots & Cov(\hat\beta_1,\hat\beta_K) \\ \vdots & \vdots & & \vdots \\ Cov(\hat\beta_0,\hat\beta_K) & Cov(\hat\beta_1,\hat\beta_K) & \cdots & Var(\hat\beta_K) \end{bmatrix}$$

$$= \begin{bmatrix} E(\hat\beta_0-\beta_0)^2 & E(\hat\beta_0-\beta_0)(\hat\beta_1-\beta_1) & \cdots & E(\hat\beta_0-\beta_0)(\hat\beta_K-\beta_K) \\ E(\hat\beta_0-\beta_0)(\hat\beta_1-\beta_1) & E(\hat\beta_1-\beta_1)^2 & \cdots & E(\hat\beta_1-\beta_1)(\hat\beta_K-\beta_K) \\ \vdots & \vdots & & \vdots \\ E(\hat\beta_0-\beta_0)(\hat\beta_K-\beta_K) & E(\hat\beta_1-\beta_1)(\hat\beta_K-\beta_K) & \cdots & E(\hat\beta_K-\beta_K)^2 \end{bmatrix}$$

$$= \sigma^2(X'X)^{-1} = \sigma^2\begin{bmatrix} n & \sum x_{i1} & \cdots & \sum x_{iK} \\ \sum x_{i1} & \sum x_{i1}^2 & \cdots & \sum x_{i1}x_{iK} \\ \vdots & & & \vdots \\ \sum x_{iK} & \sum x_{i1}x_{iK} & \cdots & \sum x_{iK}^2 \end{bmatrix}^{-1}$$

Q: But, we do not know the value of σ2?

A: Then, substitute σ̂² for it:

$$\widehat{Cov}(\hat\beta) = \hat\sigma^2(X'X)^{-1}$$

Q: Does that mean we will be using the estimated values of variances and covariances?

A: Sure. This is what we have been doing since the beginning of our ECON 222 journey.

Q: Are we now ready to dive into the fascinating world of statistical inference over our estimated models?

A: Very much, indeed.

Q: Are you an AI?

A: No. Are you?
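Continuing the same numpy sketch (our own illustration, with X, e_hat, n and K defined earlier), σ̂² and the estimated variance-covariance matrix can be obtained as:

    import numpy as np

    # Estimated error variance and variance-covariance matrix of the LS estimator.
    sigma2_hat = np.sum(e_hat ** 2) / (n - (K + 1))
    cov_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)

    se_beta = np.sqrt(np.diag(cov_beta_hat))   # standard errors of beta0_hat, beta1_hat
    print(sigma2_hat, se_beta)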

Checkpoint No: 93

8.9 Statistical inference

We have studied/learned up to this point:

Now, we are ready to place our estimated models under some serious scrutiny. Using the inferential tools that we learned, we will evaluate, test and scientifically question our regression models.

In a bold fashion, we can say that what we did up to here (i.e., estimating regression models) is no more than the half of the job. To have the job actually done, we need to delve into the following tasks:

1.
Estimating confidence intervals for individual model parameters βi
2.
Estimating confidence intervals for linear combinations of (more than one) model parameters
3.
Conducting hypothesis tests for individual model parameters βi
4.
Conducting hypothesis tests for linear combinations of (more than one) model parameters
5.
Conducting hypothesis tests for all of our model parameters at once
6.
Conducting hypothesis tests for specific subsets of our model parameters at once

Now, let us give examples to each category of tasks listed above. To do this, suppose we have the following economic model:

yi = β0 + β1xi1+ β2xi2+ β3xi3 +β 4xi4

Recall that, this is our model written for the population and we turn it into a statistical model (written again for the population) by introducing the statistical error (disturbance, sometimes ’shock’) terms:

yi = β0+ β1xi1+ β 2xi2+ β3xi3 + β4xi4+ ei

where ei ∼ Normal(0, σ²). As you know well by now, we do not know the true values (population values) of the βj's. So, we will estimate the model using a sample of n observations and the Least Squares technique.

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + e_i,\quad i = 1,2,\ldots,n,\quad e_i \sim Normal(0,\sigma^2)$$

Provided that everything goes well on the paper and in the computer, we will end up with a rich set of estimates:

Now, suppose the following claims and/or questions come from an academic/technical colleague. (Needless to say, even when there is no criticizing colleague around, we need to pose such claims on our own and heavily test our models):

1.
Is 0.4 a viable value for β1, with respect to a 95% confidence interval of β1 ?
2.
Is 0.7 a viable value for β1 + β2, with respect to a 95% confidence interval of β1 + β2 ?
3.
Is β3 equal to zero or not; how do we know x3 is an important/significant explanatory variable?
4.
Is β3 + β4 equal to one or not?
5.
Is β1 = β2 = β3 = β4 = 0; how do we know our explanatory variables x1, x2, x3 and x4 matter as a whole?
6.
Is β1 = β2 = 0; how do we know the explanatory variables x1 and x2 matter together?

Our road map to assess these questions begins with formulating these questions/claims in some formal notation:

Following the same order as above:

1.
We will calculate a 95% C.I. for β1 and will check if 0.4 belongs to the calculated interval. This is simply done as:
$$P\left(\hat\beta_1 - t_c\,se(\hat\beta_1) \le \beta_1 \le \hat\beta_1 + t_c\,se(\hat\beta_1)\right) = 1-\alpha$$
2.
We will calculate a 95% C.I. for β1 + β2 and will check if 0.7 belongs to the calculated interval.
$$P\left(\hat\beta_1+\hat\beta_2 - t_c\,se(\hat\beta_1+\hat\beta_2) \le \beta_1+\beta_2 \le \hat\beta_1+\hat\beta_2 + t_c\,se(\hat\beta_1+\hat\beta_2)\right) = 1-\alpha$$

Here, we apparently need to calculate Var(β̂1 + β̂2). Using our knowledge from ECON 221:

$$Var(\hat\beta_1+\hat\beta_2) = Var(\hat\beta_1) + 2Cov(\hat\beta_1,\hat\beta_2) + Var(\hat\beta_2)$$

where Var(β̂1), Cov(β̂1, β̂2) and Var(β̂2) are straightforwardly obtained during the estimation of the model. Once Var(β̂1 + β̂2) is at hand, se(β̂1 + β̂2) = √Var(β̂1 + β̂2) yields the required standard error.

3.
We will conduct the test
H0: β3 = 0
H1: β3 ≠ 0

Distribution of the test statistic:

$$\frac{\hat\beta_3-\beta_3}{\sqrt{Var(\hat\beta_3)}} \sim t_{(n-K-1)}$$

Calculation of the test statistic:

$$\frac{\hat\beta_3-\beta_3^0}{se(\hat\beta_3)} \sim t_{(n-K-1)}, \qquad \frac{\hat\beta_3-0}{se(\hat\beta_3)} \sim t_{(n-K-1)}$$
4.
We will conduct the test
H0: β3 + β4 = 1
H1: β3 + β4 ≠ 1

$$\frac{\hat\beta_3+\hat\beta_4-(\beta_3+\beta_4)}{\sqrt{Var(\hat\beta_3+\hat\beta_4)}} \sim t_{(n-K-1)}$$
$$\frac{\hat\beta_3+\hat\beta_4-(\beta_3+\beta_4)^0}{se(\hat\beta_3+\hat\beta_4)} \sim t_{(n-K-1)}$$
$$\frac{\hat\beta_3+\hat\beta_4-1}{se(\hat\beta_3+\hat\beta_4)} \sim t_{(n-K-1)}$$

Var(β̂3 + β̂4) will be treated as outlined above for the case of Var(β̂1 + β̂2). Note that this test can also be conducted as an F test, as we will cover in our class discussions.

5.
We will conduct the test
H0: β1 = β2 = β3 = β4 = 0
H1: ∃ βi ≠ 0

The total sum of squares TSS being Σ(yi − ȳ)², the explained sum of squares ESS being Σ(ŷi − ȳ)² and the residual sum of squares RSS being Σêi²:

$$\frac{(TSS-RSS)/K}{RSS/(n-K-1)} \sim F_{(K,\,n-K-1)}$$
6.
We will conduct the test
H0: β1 = β2 = 0
H1: β1 ≠ 0 or β2 ≠ 0

J being the number of joint hypotheses, RSSR being the RSS for the restricted model and RSSU being the RSS for the unrestricted model:

$$\frac{(RSS_R-RSS_U)/J}{RSS_U/(n-K-1)} \sim F_{(J,\,n-K-1)}$$

Note again that RSSU is the RSS value of the unrestricted, i.e., full, model, which is:

yi = β0+ β1xi1+ β 2xi2+ β3xi3 + β4xi4+ ei

whereas RSSR is the RSS value of the restricted model, which is:

yi = β0+ β 3xi3+ β4xi4 + ei

equivalently of:

yi = β0 + 0⋅xi1+ 0 ⋅xi2+ β 3xi3+ β4xi4 + ei

Returning to our previous hypothesis test:

H0: β1 = β2 = β3 = β4 = 0
H1: ∃ βi ≠ 0

you will notice that the restricted model is:

yi = β0+ ei

or

yi = β0+ 0 ⋅xi1+ 0⋅xi2+ 0⋅xi3+ 0⋅xi4+ ei

against the unrestricted (full) model of:

yi = β0+ β1xi1+ β 2xi2+ β3xi3 + β4xi4+ ei

Herein, RSSR becomes the TSS of the full model (verify yourself), RSSU becomes the RSS of the full model (should be trivial) and J becomes K. Then, the equivalence between

$$\frac{(RSS_R-RSS_U)/J}{RSS_U/(n-K-1)} \sim F_{(J,\,n-K-1)}$$

and

$$\frac{(TSS-RSS)/K}{RSS/(n-K-1)} \sim F_{(K,\,n-K-1)}$$

becomes apparent.

We will now use a model estimated on a computer to exemplify each of the cases above:
[To be distributed as a handout]
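Until the handout is available, here is a rough Python sketch of our own (continuing the numpy example above and additionally assuming scipy is available) of how tasks of type (1), (3) and (5) are computed for the simple one-regressor model; the handout will do the same for the richer model of this section:

    import numpy as np
    from scipy import stats

    t_c = stats.t.ppf(0.975, df=n - K - 1)

    # 95% confidence interval for beta_1 (task 1 style)
    ci = (beta_hat[1] - t_c * se_beta[1], beta_hat[1] + t_c * se_beta[1])

    # t test of H0: beta_1 = 0 (task 3 style)
    t_stat = beta_hat[1] / se_beta[1]
    p_t = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - K - 1))

    # Overall F test of H0: all slope coefficients are zero (task 5 style)
    F_stat = ((TSS - RSS) / K) / (RSS / (n - K - 1))
    p_F = 1 - stats.f.cdf(F_stat, K, n - K - 1)
    print(ci, t_stat, p_t, F_stat, p_F)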

Checkpoint No: 94

8.10 Essence of the Gauss-Markov assumptions

Having studied the mechanical aspects of Linear Regression models, it is now time to establish the conditions under which a linear regression model is viable, with workable results. As we often call them 'ideal econometric conditions', the Gauss-Markov assumptions level the field for us. If a model abides by these assumptions, i.e., if a model has been formed so as to satisfy the Gauss-Markov assumptions, then it is a good econometric model.

Gauss-Markov assumptions

A1. Linearity in parameters: The derivative of y with respect to the parameters should not be a function of the parameters. Analytical solutions for the parameter estimates (coefficients) require this assumption; without it, one needs numerical methods to solve for the coefficients.

A2. Random sampling (non-stochastic x): The sample should be randomly picked from the population so that it is representative of the population. This has two advantages:

∙ The results we get from the sample can be generalized to the whole population.
∙ Our knowledge of x about the population can be applied in the sample, so it is as if we know the sample x too.

A3. Variation in x: Econometrics analyzes how y changes with respect to x; for this, x needs to change.

A4. Exogeneity: E(e | x) = E(e) = 0. Knowledge of x does not improve the expectation of e, as they are independent of each other.

A5. The shocks to each observation come from the same distribution, independently and identically: Var(ei) = σ² for all i, and Cov(ei, ej) = 0 for all i ≠ j.

A6. ei ∼ Normal(0, σ²) ⇒ yi ∼ Normal(β0 + β1xi, σ²).

Now we can review how good our LS estimator is under these conditions. Consider the Simple Linear regression model yi = β0 + β1xi + ei together with the Gauss-Markov assumptions. Recall that β̂1 is:

$$\hat\beta_1 = \frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}$$

which can also be written as:

$$\hat\beta_1 = \frac{\sum(x_i-\bar x)y_i}{\sum(x_i-\bar x)x_i} = \frac{\sum(x_i-\bar x)(\beta_0+\beta_1 x_i+e_i)}{\sum(x_i-\bar x)x_i} = \frac{\beta_0\sum(x_i-\bar x) + \beta_1\sum(x_i-\bar x)x_i + \sum(x_i-\bar x)e_i}{\sum(x_i-\bar x)x_i}$$

As Σ(xi − x̄) = 0 (shown before), the expression becomes:

$$\hat\beta_1 = \beta_1 + \frac{\sum(x_i-\bar x)e_i}{\sum(x_i-\bar x)x_i} = \beta_1 + \frac{\sum(x_i-\bar x)e_i}{\sum(x_i-\bar x)^2}$$

Then,

$$E(\hat\beta_1 \mid x) = E\left(\beta_1 + \frac{\sum(x_i-\bar x)e_i}{\sum(x_i-\bar x)^2}\,\Big|\,x\right) = \beta_1 + E\left(\frac{\sum(x_i-\bar x)e_i}{\sum(x_i-\bar x)^2}\,\Big|\,x\right) = \beta_1 + \frac{\sum(x_i-\bar x)E(e_i\mid x)}{\sum(x_i-\bar x)^2}$$

can be written since our x (independent variable, explanatory variable) is non-stochastic.

We also know by the Gauss-Markov assumptions that E(ei | x) = 0, i.e., our knowledge of x does not improve expectation of e. So,

E(β̂1 | x) = β1

equivalently saying that β̂1 is an unbiased estimator of β1.

What about E(̂β0 | x) ?

$$y_i = \beta_0 + \beta_1 x_i + e_i \ \Rightarrow\ \bar y = \beta_0 + \beta_1\bar x + \bar e$$
$$\hat\beta_0 = \bar y - \hat\beta_1\bar x = \beta_0 + \beta_1\bar x + \bar e - \hat\beta_1\bar x = \beta_0 - (\hat\beta_1-\beta_1)\bar x + \bar e$$

Then,

$$E(\hat\beta_0\mid x) = E\left(\beta_0 - (\hat\beta_1-\beta_1)\bar x + \bar e \mid x\right) = \beta_0 - \bar x\underbrace{E\left(\hat\beta_1-\beta_1\mid x\right)}_{0} + \underbrace{E\left(\frac{\sum e_i}{n}\,\Big|\,x\right)}_{0}$$

So,

E(β̂0 | x) = β0

equivalently saying that β̂0 is an unbiased estimator of β0.

$$Var(\hat\beta_1) = E\left(\left(\hat\beta_1 - \underbrace{E(\hat\beta_1)}_{\beta_1}\right)^2\right) = E\left(\left(\frac{\sum(x_i-\bar x)e_i}{\sum(x_i-\bar x)^2}\right)^2\right)$$

Expanding the expression and rearranging its terms:

$$Var(\hat\beta_1) = E\left(\frac{\sum(x_i-\bar x)^2e_i^2 + \sum\sum_{i\neq j}(x_i-\bar x)(x_j-\bar x)e_ie_j}{\left(\sum(x_i-\bar x)^2\right)^2}\,\Bigg|\,x\right) = \frac{\sum(x_i-\bar x)^2E(e_i^2\mid x) + \sum\sum_{i\neq j}(x_i-\bar x)(x_j-\bar x)E(e_ie_j\mid x)}{\left(\sum(x_i-\bar x)^2\right)^2}$$

As E(ei² | x) = σ² and E(eiej | x) = 0,

$$Var(\hat\beta_1) = \frac{\sigma^2\sum(x_i-\bar x)^2}{\left(\sum(x_i-\bar x)^2\right)^2} = \frac{\sigma^2}{\sum(x_i-\bar x)^2} = \frac{\sigma^2}{n\,Var(x)}$$

This expression is a Noise/Signal (i.e., a noise-to-signal) ratio expression.

Examining

$$Var(\hat\beta_1) = \frac{\sigma^2}{n\,Var(x)}$$

we see that, to decrease Var(β̂1), a larger sample size n, a larger Var(x) and a smaller σ² would help. Among these, the researcher's choice of the sample data affects n and Var(x); σ², on the other hand, is out of the researcher's reach.
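A small Monte Carlo sketch (our own illustration, with hypothetical true parameter values and the artificial x values treated as fixed) can be used to see that the simulated variance of β̂1 indeed settles near σ²/Σ(xi − x̄)²:

    import numpy as np

    rng = np.random.default_rng(0)
    beta0_true, beta1_true, sigma = 1.0, 0.8, 2.0      # hypothetical values
    Sxx = np.sum((x - x.mean()) ** 2)

    draws = []
    for _ in range(20000):
        y_sim = beta0_true + beta1_true * x + rng.normal(0.0, sigma, size=len(x))
        draws.append(np.sum((x - x.mean()) * (y_sim - y_sim.mean())) / Sxx)

    print(np.var(draws), sigma ** 2 / Sxx)             # the two numbers should be close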

$$Var(\hat\beta_0) = E\left(\left(\hat\beta_0 - E(\hat\beta_0)\right)^2\right)$$

As β̂0 = β0 − (β̂1 − β1)x̄ + ē:

$$Var(\hat\beta_0) = E\left(\left(\beta_0 - (\hat\beta_1-\beta_1)\bar x + \bar e - E(\hat\beta_0)\right)^2\right) = E\left(\left(\beta_0 - (\hat\beta_1-\beta_1)\bar x + \bar e - \beta_0\right)^2\right) = E\left(\left(-(\hat\beta_1-\beta_1)\bar x + \bar e\right)^2\right)$$
$$= \bar x^2E\left((\hat\beta_1-\beta_1)^2\right) + E(\bar e^2) - 2\bar xE\left((\hat\beta_1-\beta_1)\bar e\right)$$

To simplify this expression observe/elaborate:

(1)

$$E\left((\hat\beta_1-\beta_1)^2\right) = Var(\hat\beta_1)$$

(2)

$$E(\bar e^2) = E\left(\left(\frac{\sum e_i}{n}\right)^2\right) = \frac{1}{n^2}E\left(\left(\sum e_i\right)^2\right) = \frac{1}{n^2}\left(E\left(\sum e_i^2\right) + \underbrace{E\left(\sum\sum_{i\neq j}e_ie_j\right)}_{0}\right) = \frac{E(\sum e_i^2)}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

(3) E((β̂1 − β1)ē) = 0

Then,

$$Var(\hat\beta_0) = \frac{\bar x^2\sigma^2}{\sum(x_i-\bar x)^2} + \frac{\sigma^2}{n}$$

is reached. Rearranging:

$$Var(\hat\beta_0) = \frac{\sigma^2\left(n\bar x^2 + \sum(x_i-\bar x)^2\right)}{n\sum(x_i-\bar x)^2} = \frac{\sigma^2}{n\sum(x_i-\bar x)^2}\left(n\bar x^2 + \sum x_i^2 - 2\bar x\sum x_i + \sum\bar x^2\right) = \frac{\sigma^2}{n\sum(x_i-\bar x)^2}\left(n\bar x^2 + \sum x_i^2 - 2n\bar x^2 + n\bar x^2\right)$$
$$Var(\hat\beta_0) = \sigma^2\left(\frac{\sum x_i^2}{n\sum(x_i-\bar x)^2}\right)$$

is obtained.

$$Var(\hat\beta_0) = \sigma^2\left(\frac{\sum x_i^2}{n\sum(x_i-\bar x)^2}\right),\qquad Var(\hat\beta_1) = \frac{\sigma^2}{\sum(x_i-\bar x)^2},\qquad Cov(\hat\beta_0,\hat\beta_1) = \sigma^2\left(\frac{-\bar x}{\sum(x_i-\bar x)^2}\right)$$

(1) The larger the value of σ2 the larger will be the variances of the estimators.

(2) Var(β̂1) will be smaller, the larger the value of Σ(xi − x̄)². This is also true for Var(β̂0), but it is less evident, as Σxi² appears in the numerator of the Var(β̂0) expression.

(3) Because the number of terms in Σ(xi − x̄)² increases in n (the sample size), an increase in n generally leads to an increase in precision.

Checkpoint No: 95

8.11 Model Specification

There are two main approaches to model specification:

Regarding either of the approaches, we need a good methodological basis. The material of the section entitled 'Statistical inference', luckily, provides us with the toolset to establish that. The task of model specification involves a systematic sequence of hypothesis tests and an evaluation of models with respect to some ad hoc criteria. While the t tests and F tests equip us to assess our models, R², R̄², and the AIC, BIC (or SIC) and HQ information criteria further strengthen our hand to come up with parsimonious model specifications.

Akaike Information Criterion:

$$AIC = \ln(\hat\sigma^2) + \frac{2k}{n}$$

Bayesian Information Criterion or Schwarz Information Criterion or Schwarz Criterion or Schwarz-Bayesian Criterion:

$$BIC = SIC = SC = SBC = \ln(\hat\sigma^2) + \frac{k\ln n}{n}$$

Hannan-Quinn Criterion:

$$HQ = \ln(\hat\sigma^2) + \frac{k\ln(\ln n)}{n}$$

Among the rival models, the ones with lower information criterion values are preferable to others. Therein, it is a good practice to use the same sample size while comparing models via information criteria.
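Continuing the numpy sketch above (our own illustration; note that we compute the σ̂² inside these formulas as RSS/n, a common software convention, which is an assumption on our part), the three criteria can be evaluated as:

    import numpy as np

    k = K + 1                          # number of estimated coefficients
    sigma2_ic = RSS / n                # variance estimate used inside the criteria (assumed RSS/n)

    AIC = np.log(sigma2_ic) + 2 * k / n
    BIC = np.log(sigma2_ic) + k * np.log(n) / n
    HQ  = np.log(sigma2_ic) + k * np.log(np.log(n)) / n
    print(AIC, BIC, HQ)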

As of this point, we have a sufficient knowledge base to proceed to our first econometric practice. In what follows through the lecture notes, we mobilize our statistical knowledge in the field. Note that some new formulations and/or theoretical elements can be introduced if a need arises.

Checkpoint No: 96

8.12 Regression analysis at work

In this section we will put our theoretical knowledge into practice. The modeling exercises that we will consider maintain a manageable pedagogical standard; they are somewhat downsized and sometimes oversimplified. Yet, they are designed to deliver the intended message of the chapter with regard to applied statistical/econometric research.

The cases we will consider are as follows:

While these cases are being examined, we will concurrently be learning the use of ’Dummy variables’ in an embedded fashion: The theoretical knowledge needed will be provided when/as necessary.
Checkpoint No: 97

Cross-section versus Time series data

Our choice of theoretical exposition in ECON 222 maintained/kept cross-section data at a central position. In that, we often referred to our observations yi, xi1, xi2, ..., xiK using the observation index 'i'. When this is the case, note that there is no natural ordering of observations. For example, writing the USA's inflation rate in Row 2 of a data file while writing the UK's inflation rate in Row 7 for the same year, and then switching their rows, does not yield different results.

Time series data, on the other hand, do have a natural ordering of observations, merely by the definition of time: before comes before now, now comes before tomorrow, so tomorrow comes after both. This underlines the importance of time as the primary key of our dataset when analyzing time series data, especially when we do so via dynamic models. The indifference/silence of this book, ECON 221 and ECON 222 about time series notation and data was of course intentional from a pedagogical viewpoint. Once you proceed to ECON 301 and ECON 302 (the Econometrics sequence), be prepared to replace 'i' with 't' as your new (and naturally ordered, t = 1, 2, ⋅⋅⋅, T) observation index. Note that all our formulations are rock solid / robust to this change.

In the set of cases/exercises of this section, we make use of cross-section data sets.
NOTICE: Until a proper typeset is prepared, the cases/exercises of this section will be handled using Handouts. These Handouts will follow and summarize what is to be done in class lectures and they are available through the “Handouts” link under sites.google.com/view/erayyucel/teaching. To have the latest available material and stay informed, keep a keen eye on this page.
Checkpoint No: 98

8.13 Frisch-Waugh-Lovell theorem (FWL theorem)

The FWL theorem shows how to decompose a regression of y on a set of variables x into two pieces. If we divide x into two sets of variables x1 and x2 and regress y on x1 and x2, the coefficient estimates on x2 can also be obtained through the following steps:

1.
Regress all variables in x2 on x1 and take the residuals
2.
Regress y on x1 and take the residuals
3.
Regress the residuals from step (2) on the residuals from step (1).
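Before turning to the worked example, here is a minimal numpy sketch of the three steps (our own illustration with simulated data, not the Case02 file); the coefficient on x2 from the full regression matches the slope obtained from the residual-on-residual regression:

    import numpy as np

    def ols_resid(y, X):
        """Residuals from an OLS regression of y on X (X should include a constant)."""
        b = np.linalg.solve(X.T @ X, X.T @ y)
        return y - X @ b

    rng = np.random.default_rng(1)
    n = 200
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

    const = np.ones(n)
    full_X = np.column_stack([const, x1, x2])
    b_full = np.linalg.solve(full_X.T @ full_X, full_X.T @ y)   # [b0, b1, b2]

    e_x2 = ols_resid(x2, np.column_stack([const, x1]))          # step (1)
    e_y  = ols_resid(y,  np.column_stack([const, x1]))          # step (2)
    b_fwl = np.sum(e_x2 * e_y) / np.sum(e_x2 ** 2)              # step (3), slope only
    print(b_full[2], b_fwl)                                     # these two should match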

To demonstrate what the FWL theorem says, consider our Case02 (Home prices) again:

Dependent Variable: LP

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             0.652384      0.350431     1.861662      0.0655
LS            0.521313      0.085217     6.117444      0.0000
LT            0.368324      0.065693     5.606762      0.0000

In case02.wf1, page case02s2, we have the regression equation that

LP = β0+ β1LS + β2LT + e

estimated as above. So,

β̂0 = 0.6523
β̂1 = 0.5213
β̂2 = 0.3683

Focus on β̂2 = 0.3683, i.e., the coefficient estimate of taxes (LT).

As to our application of the FWL theorem,

x = {LS,LT}
x1 = {LS }

x2 = {LT }

and y = LP.

(1) Here, we regress x2 on x1 that is LT on LS and extract the residuals and name it E_LT_ON_LS. You can view this series in case02.wf1.


(2) Here, we regress y on x1 that is LP on LS and extract the residuals and name it E_LP_ON_LS. You can view this series in case02.wf1.

Dependent Variable: LT

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             1.473070      0.500340     2.944136      0.0040
LS            1.095477      0.067801     16.15724      0.0000

Finally, we regress E_LP_ON_LS on E_LT_ON_LS and obtain the coefficient estimate for E_LT_ON_LS as 0.3683.

Dependent Variable: E_LP_ON_LS

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             0.007499      0.013445     0.557758      0.5782
E_LT_ON_LS    0.368324      0.065399     5.631914      0.0000

Notice that the coefficient estimate of LT in the very first regression is identical to the coefficient estimate of E_LT_ON_LS on this page.

This is how the FWL theorem functions.

Checkpoint No: 99

Checkpoint No: 100
The End