In the previous chapters, which covered all of ECON 221 and roughly the first half of ECON 222, we studied the fundamentals of Probability theory and the key theory and toolset of Statistical inference. Recall that we focused solely on understanding statistical distributions and estimating their parameters. In your future scientific, technical and professional practice, this body of knowledge will prove quite fruitful.
Now, we are ready to study the theoretical background and applied dimensions of the ’curve-fitting’ problem. To this end, in this chapter, we will consider the Linear Regression Models. Notice that what we will do here accounts for the first half of a traditionally designed ’Introductory Econometrics’ course.
The term regression was coined by Francis Galton to describe a biological phenomenon: the heights of descendants of tall ancestors tend to regress down towards a normal average, which is also known as regression toward the mean. For Galton, regression had only this biological meaning. His work was later extended by Udny Yule and Karl Pearson, and later by Fisher (in a way that comes closer to Gauss’s 1821 formulation of the problem). If you look into it, you will enjoy the history of this line of research.
As its main pillar, regression analysis takes us to the rich analytical world of Econometrics. The literal meaning of the term econometrics (econo+metrics) is ’measurement in economics’. Econometrics is ’the branch of economics concerned with the use of statistical methods in describing and quantifying economic systems’ (Oxford Dictionary). From a broader perspective, econometrics is a shared sub-field of Statistics (hence of Mathematics) and Economics. That is, our tools in Econometrics are the tools of Statistics, shaped and augmented by our knowledge of Economics. (One of the founders of the Econometric Society and another pioneer of the field, Ragnar Frisch, is credited with coining the term ’econometrics’.)
Renowned academic Badi Baltagi says "An econometrician has to be a competent mathematician and statistician who is an economist by training. Fundamental knowledge of mathematics, statistics and economic theory are a necessary prerequisite for this field".
Our starting point is a scientific urge to find/formulate, measure and test the relationship between, say, two variables y and x. These variables may belong to the natural sciences, the social sciences or even the humanities; that does not matter here. What does matter is that the linkage between our variables may not be (and mostly is not) a perfect relationship like y = mx + n (we indeed prefer a notation like y = β0 + β1x, where n is β0 and m is β1). We rather observe deviations from a perfect relationship, as seen earlier in ECON 221. That is, actual y values are connected to actual x values through a relationship like y = β0 + β1x + e, where e stands for a sequence of statistical errors (disturbances).
The error sequence e may stem from the random actions/choices of humans, unexpected shocks to socio-economic systems, misspecification of models, improper choices of mathematical functional forms or imprecision of the data. Note that this picture is not specific to the social sciences: in natural science experiments, too, there is a multiplicity of sources of uncertainty (hence of statistical errors or disturbances).
The goals of econometrics, as we understand it, are (1) to find the relation between variables y and x, encapsulated in (β0, β1), (2) to validate and quantify theory, and (3) to forecast.
Purpose of modeling and Simplicity
Deferring a detailed discussion of it to class gatherings, we will say here
that ’a model is a downsized yet realistic representation of reality’.
An immediate analogy from architecture would be useful: on an
architectural model of a building we see things ’only as needed’.
While we may not see the doorknobs (depending on the scale) on a
model, we see the proportionality of distances clearly. After all, the
purpose of the model is to give a broad yet accurate idea of/about
things.
A similar idea applies in the other disciplines. In business models we do not see every tiny detail of the workplace or the manufacturing environment. In economic models we tend not to include all potential explanatory variables at once. We just try to remain ’accurate enough’.
Using our models we can present our scientific grasp of nature, the universe, or society. Once the model is well parametrized and quantified, we can develop forecasts of the future, or we can (depending on the type of our model) develop counterfactual and/or scenario analyses. While presenting a scientific view of ours and forecasting the future are fairly pragmatic ends, a third use of a scientific model, testing and validating/invalidating theories, calls for a more than pragmatic spirit. Regardless of the purposes cited, though, a model (any model) should display a certain level of simplicity. Before proceeding, recall Albert Einstein’s saying "Everything should be made as simple as possible, but no simpler."
In our practice of statistical/econometric modeling, the ’principle of parsimony’ guides us. Equipped with a rich toolset of formal statistical tests and her judgmental skills, a good researcher tries to come up with an "as simple as possible but no simpler" model. Common sense says the essentials should be included in a model while all the inessentials should be omitted from it. The bad news is that every researcher’s practice has a few bumps along the way toward developing such a sense. The good news is that honest and hard work pays off.
In Philosophy (and Science) there are several ’razors’ to shave away the redundancies in models (or in scientific explanations). Here we will maintain Occam’s razor (or Ockham’s razor), attributed to William of Ockham, an English philosopher of the 13th-14th centuries. Occam’s razor is a principle of parsimony stating that among the explanations addressing the same thing, the simplest is to be picked! (William of Baskerville in The Name of the Rose by Umberto Eco is a tribute to William of Ockham.) (Arthur Conan Doyle’s Sherlock Holmes once utters "When you have eliminated the impossible, whatever remains, however improbable, must be the truth.")
Occam’s razor reads in Latin as "pluralitas non est ponenda sine necessitate", which translates into English as "plurality should not be posited without necessity". The principle thus calls for parsimony in ’deductive thinking’.
What we do in applied statistics/econometrics is not purely (maybe not at all) deductive thinking; we rather try to reach an inference to the best explanation via a formal sequence of estimations, tests and calculations. Still, in this practice Occam’s razor sheds good light for us to see things clearly.
In the world of project development you may hear the same principle under the acronym ’KISS’. Referring to a model, KISS reads as ’Keep It Small and Simple’ or sometimes as ’Keep It Simple, Stupid’. (Look up its relevance to the US Navy yourself.)
In the remainder of this chapter, we will study/learn the theory of elementary econometrics, a rich enough toolset pertaining to it along with a selection of applied problems.
8.1 EXERCISES____________________________________________
Refer to our in-class discussions to explain/discuss the following:
v. Come up with a synthesis of the terms/phrases referred to above.
Solution: Left as self-exercise.
Overview of linear models
The specific meaning of linearity here is ’the linearity of a model in terms of (with respect to) its parameters’. That is:
is a linear model. So is
However,
is not considered to be a linear model. Neither is
In your future practice, you will be able to settle this issue in a crystal clear fashion.
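For concreteness, a few illustrative forms (generic examples of our own, not tied to any particular application) show how linearity is judged with respect to parameters rather than variables:

```latex
% Linearity refers to the parameters (the betas), not to the variables.
\begin{align*}
y &= \beta_0 + \beta_1 x + e                        && \text{linear in parameters}\\
y &= \beta_0 + \beta_1 \ln(x) + \beta_2 x^2 + e     && \text{still linear in parameters}\\
y &= \beta_0 + x^{\beta_1} + e                      && \text{not linear in parameters}\\
y &= \frac{\beta_0}{1 + \beta_1 x} + e              && \text{not linear in parameters}
\end{align*}
```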
Why do we resort to linear models? This is a very legitimate question once we observe that a number of relationships in nature and in societal life are, indeed, nonlinear (not linear). A straightforward answer reads as ’linear models are easy to use’. So, simplicity matters. Simplicity brings practicality to researchers: linear models are easy to compute, to interpret and to communicate. More importantly, as noted earlier, our linear regression models are linear with respect to their parameters while the independent variables of our models can be of any nonlinear form. All in all, one can establish/form ’models that are nonlinear in their variables’ using ’models that are linear in their parameters’. The good thing about models that are linear in parameters is that such a structure allows us to use the tools of linear algebra effectively in our computations.
Our curious nature often forces us to include many explanatory variables in a model: y = β0 + β1x1 + β2x2 + … + βKxK + e
However, a minimalist design is also possible: y = β0 + β1x + e
Even this may be a good enough model (think when): y = β0 + e
The process of inference begins with the specification of an economic model. Then a statistical model describes the sampling process that we visualize was used to produce the sample data. See the structure below:
Economic model:
Statistical model:
The random error term (e) serves three main purposes:
See the structures below:
Case 1: Unconditional model of mean
Economic model: y = β0
Statistical model: y = β0 + e
Case 2: Simple Linear model
Economic model: y = β0 + β1x
Statistical model: y = β0 + β1x + e
Case 3: Multiple Linear model
Economic model: y = β0 + β1x1 + β2x2 + … + βKxK
Statistical model: y = β0 + β1x1 + β2x2 + … + βKxK + e
Transformations and functional forms
In economics and finance, as in other quantitative disciplines, we attribute a great deal of importance to measuring the impact of a change in one variable on another. Considering y = f(x) as a relationship between the variables y (dependent) and x (independent), the derivative dy/dx = f′(x) describes that impact. When we consider y = f(x1, x2, …, xK), the impact of an independent variable xi on the dependent variable y is better described by the partial derivative ∂y/∂xi. Having formed and estimated a proper statistical/econometric model, then, a researcher gains a good grasp of the issues embedded in the research problem at hand.
Note that, as economists and finance specialists, we like to learn about a special class of impact measurements, namely elasticities. Recall from your introductory economics classes that ’the elasticity of y with respect to x is the percentage change in y against a one percent change in x’. In formal terms:

ηy,x = (Δy/y) / (Δx/x) = (Δy/Δx)·(x/y)

So, as long as we can estimate Δy/Δx, we can come up with an estimate of ηy,x by substituting appropriate values of x and y into the ratio x/y. We will see several examples as we progress through this chapter; estimating an elasticity is possible under a wide array of functional forms of f(⋅) in the expression y = f(x).
One of the functional forms, i.e., the Log-Log form ln(y) = β0 + β1 ln(x) + e, yields the elasticity directly as its slope coefficient: ηy,x = β1.
We will discuss this topic further in our classes.
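As a small numerical sketch (with made-up data and our own variable names, not a data set from this chapter), the Log-Log elasticity can be estimated in a few lines of Python:

```python
import numpy as np

# Made-up strictly positive observations; any such (x, y) pairs would do.
x = np.array([2.0, 3.0, 5.0, 8.0, 12.0, 15.0, 21.0, 28.0])
y = np.array([1.5, 2.1, 3.4, 5.0, 7.2, 8.8, 12.1, 15.9])

# Log-Log form: ln(y) = b0 + b1*ln(x) + e, so the slope b1 approximates the elasticity of y w.r.t. x.
lx, ly = np.log(x), np.log(y)
b1 = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)
b0 = ly.mean() - b1 * lx.mean()
print(f"estimated elasticity (Log-Log slope): {b1:.3f}")
```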
In the remainder of this chapter, we will maintain an approach which may slightly differ from the approaches of others. Sticking to this approach will facilitate better learning. Our approach unfolds as:
Depiction of an unconditional model of mean and the mechanics of estimation (without inference)
Depiction of a Simple Linear Regression model and the mechanics of estimation (without inference)
Depiction of a Multiple Linear Regression model and the mechanics of estimation (without inference)
Goodness of fit of a model measured via R2
Handling statistical uncertainty: calculation of variances and covariances associated with a Multiple Linear Regression model
Statistical inference
Taking a pit stop here: the sequence of topics above will provide us with a solid understanding of the mechanical workings of our linear regression universe.
Once we have learned these, we will move to:
Ideal econometric conditions: Gauss-Markov assumptions
In many, maybe all, books the Gauss-Markov assumptions are covered before anything else. Our approach, though, maintains a different pedagogical perspective: we take the Gauss-Markov assumptions into consideration, crucial as they are in econometric theory and practice, only after gaining a clear view of the working environment. After that we will move to:
Model specification
Regression analysis at work
Note that the above order of topics requires us to stick to it without interruption or gaps for successful learning.
An artificial data set:
In our subsequent discussions we will be referring to the following data set frequently. While we can show a data set as an actual set (with proper mathematical notation) like {(xi, yi) : i = 1, 2, …, 30}, it may be more practical to use a tabular listing of the data. A tabular structure improves visibility and exposition:
Observation i | xi | yi | Observation i | xi | yi |
1 | 2 | 1 | 16 | 15 | 11 |
2 | 2 | 3 | 17 | 15 | 17 |
3 | 3 | 2 | 18 | 16 | 13 |
4 | 3 | 3 | 19 | 19 | 15 |
5 | 3 | 4 | 20 | 21 | 16 |
6 | 5 | 3 | 21 | 23 | 18 |
7 | 5 | 4 | 22 | 23 | 19 |
8 | 5 | 6 | 23 | 23 | 20 |
9 | 8 | 5 | 24 | 25 | 18 |
10 | 8 | 8 | 25 | 25 | 20 |
11 | 10 | 6 | 26 | 26 | 21 |
12 | 11 | 8 | 27 | 27 | 24 |
13 | 11 | 10 | 28 | 28 | 21 |
14 | 12 | 8 | 29 | 28 | 24 |
15 | 14 | 10 | 30 | 28 | 25 |
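For readers who wish to replicate the calculations of this chapter on a computer, the table above can be entered as two plain arrays; a minimal sketch in Python (any language or package would serve equally well):

```python
import numpy as np

# The artificial data set above, in observation order (i = 1, ..., 30).
x = np.array([2, 2, 3, 3, 3, 5, 5, 5, 8, 8, 10, 11, 11, 12, 14,
              15, 15, 16, 19, 21, 23, 23, 23, 25, 25, 26, 27, 28, 28, 28], dtype=float)
y = np.array([1, 3, 2, 3, 4, 3, 4, 6, 5, 8, 6, 8, 10, 8, 10,
              11, 17, 13, 15, 16, 18, 19, 20, 18, 20, 21, 24, 21, 24, 25], dtype=float)
print(len(x), len(y))  # 30 30
```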
Consider a variable y that is modeled as y = β0 + e.
If we have a sample y1, y2, …, yn, this relationship can also be written as yi = β0 + ei, i = 1, 2, …, n.
It is clear that our model does not include any independent (explanatory) variables on the right-hand side, i.e., values of y are scattered around β0 (if they are not all accidentally equal to β0).
Supposing there are K potential independent variables, x1, x2, …, xK, that might explain y, the unconditional model of mean can be viewed as y = β0 + 0·x1 + 0·x2 + … + 0·xK + e, where the researcher places zero weight on x1, x2, …, xK. In that case, this model of mean turns out to be the simplest possible model, or more like a non-model. When we plot yi against one of the x’s (say xki), the model of mean appears as a horizontal line (as the model disregards the x’s). This is simply the orange line displayed below (observe that along the orange line dy/dx = 0):
To estimate β0 in y = β0 + e we need two main ingredients:
Data on y, a set of n observations y1, y2, …, yn collected from the population y1, y2, …, yN. Our n observations as a whole is called a sample; recall from our discussions in earlier chapters that the sample should be randomly picked and large enough
A formula to compute the desired numerical result; recall that this formula is called an estimator and the numerical result it yields is called an estimate, here β̂0. Note that we need a method (rule, criterion) to derive our estimator (formula). Here, we will use ’Least Squares’ as our method.
Now, suppose our estimator is β̂0. Then, the estimated values of yi (denoted as ŷi) are written as ŷi = β̂0.
Actual values of yi, on the other hand, are yi = β0 + ei, equivalently yi = β̂0 + êi.
The differences between yi and ŷi are the estimated error terms (residuals): êi = yi − ŷi.
Consider the function S:

S(β̂0) = ∑êi² = ∑(yi − β̂0)²

The Least Squares method instructs us to minimize S by optimally choosing β̂0:
The F.O.C. for this problem is:

dS/dβ̂0 = −2∑(yi − β̂0) = 0

which is followed by:

β̂0 = (1/n)∑yi = ȳ

So, not surprisingly, and as may be recalled from our discussion of point estimators, the sample mean is the estimator of the population mean. Namely, β̂0 = ȳ estimates β0.
A note on the function S may be useful here: As β̂0 is the estimated mean
of yi, the function S shows the variance of error terms multiplied by
n. This is good to keep in mind: the least squares estimator is a
’minimum variance estimator’ as we will formally discuss later.
Statistical properties of the error terms ei will also be covered in
detail.
Returning to the qualities of the sample mean β̂0 = ȳ as an estimator of the population mean β0, one can be intellectually stunned by the beauty generated by simplicity. There are a couple of things to mention:
Representing a variable y with its unconditional mean is a meaningful alternative only when there are no good explanatory variables to model y
In that case, the unconditional model of mean simply provides us with a descriptive statistic
Still, the unconditional model of mean is very valuable to us as a ’non-model’. This is the model when no explanatory variables work and we use this model as a benchmark in assessing the statistical significance of other (nonempty) models in the subsequent sections.
Consider a variable y which we believe is explained by another variable x via a linear relationship like y = β0 + β1x + e.
In this expression,
β0 stands for the autonomous / unconditional component of y
β1x stands for the part of y attributable to x; depending on the sign of β1, an increase in x may induce an increase or a decrease in y
A case of β1 = 0 corresponds to our unconditional model of mean
Below, the green line is a good candidate to be a Simple Linear
regression line:
Notice that we need to estimate two parameters β0 and β1 this time. The
Least squares method is again applicable. Let us go over its steps
below:
Now, reconsider that β̂0 = ȳ − β̂1x̄ and that ∑xiyi − β̂0∑xi − β̂1∑xi² = 0.
Substituting the first one into the second:
Now notice the following:
as,
and, as the sum of the deviations from the mean is zero, i.e.,
and
The same logic applies in:
In the end, the above-derived expression for β̂1, i.e.,
can be rewritten as:
so, it can be written as:
To sum up, our Least Squares estimators β̂0 and β̂1 for the model parameters β0 and β1 are found to be:

β̂1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)²   and   β̂0 = ȳ − β̂1x̄
In the graph given below, try to observe why the green line is superior to
others in representing our data:
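As a quick numerical check (assuming the x and y arrays of the artificial data set are already defined as in the earlier sketch), the closed-form formulas above can be applied directly:

```python
import numpy as np

def simple_ols(x, y):
    """Least Squares estimates for y = b0 + b1*x + e using the closed-form expressions."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

b0_hat, b1_hat = simple_ols(x, y)   # x, y: the artificial data set from the earlier listing
print(f"b0_hat = {b0_hat:.4f}, b1_hat = {b1_hat:.4f}")
```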
Consider the Multiple Linear model y = β0 + β1x1 + β2x2 + … + βKxK + e
where
β0 stands for the autonomous/ unconditional component of y
Each βjxj stands for the part of y attributable to xj, sign of βj determining the impact of xj on y. (j = 1, 2, …, K)
Note again, a case of β1 = β2 = … = βK = 0 corresponds to our unconditional model of mean
As before, this minimization problem will give us β̂0, β̂1, …, β̂K, i.e., the estimators of β0, β1, …, βK.
For future ease, let us restate our Multiple Linear model using matrix notation. To do this, let us first write our model equation for every single observation (for each i = 1, 2, …, n):
In matrix notation, y = Xβ + e can be written. Then,
It is also possible to write each explanatory variable as a separate vector like:
so the model looks like:
When the matrix expression y = Xβ + e is maintained, the function S becomes S(β) = e′e = (y − Xβ)′(y − Xβ), where e′ is the transpose of e.
Returning to our minimization problem written in classical notation, the following first order conditions are written:
Simplifying a little:
Reorganizing the terms:
Notice that this last set of equations can be written as:
In terms of our earlier definitions of X and y as well as β, what we have obtained is X′Xβ̂ = X′y. So,

β̂ = (X′X)⁻¹X′y

solves our minimization problem, and β̂ = (β̂0, β̂1, …, β̂K)′ contains our parameter estimates.
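A minimal matrix-form sketch (again assuming the x and y arrays from the artificial data set are in scope) shows that the formula β̂ = (X′X)⁻¹X′y reproduces the scalar results:

```python
import numpy as np

# Build X with a leading column of ones for the constant term.
X = np.column_stack([np.ones_like(x), x])
# Solve the normal equations X'X b = X'y (numerically safer than forming the inverse explicitly).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # [b0_hat, b1_hat], matching the closed-form simple-regression formulas
```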
8.2 EXERCISES____________________________________________
Reconsider the Simple Linear model y = β0 + β1x + e and show that β̂ = (X′X)⁻¹X′y works in estimating β0 and β1 (i.e., in finding β̂0 and β̂1). Solution:
So,
So,
Checking back our earlier solution, we verify that β̂ = (X′X)⁻¹X′y works well.
Question: Write and solve the Least Squares estimation problem for y = β0 + β1x1 + β2x2 + e, that is, a model with a constant term and two explanatory variables.
Solution: Left as self-study.
Suppose we have the following model:
Observe that
and consider the quantity ∑(yi − ȳ)². This quantity is called the ’Total Sum of Squares’. In what follows, we decompose it into other useful quantities:
Reordering the terms in the last expression,

TSS = ESS + RSS

is obtained. In this expression, TSS, ESS and RSS stand for:
TSS: Total Sum of Squares
ESS: Explained Sum of Squares
RSS: Residual Sum of Squares
Notice that the Total Sum of Squares ∑(yi − ȳ)² is nothing but the variance of y multiplied by n:
The Explained Sum of Squares ∑(ŷi − ȳ)² measures the sum of squared deviations of our estimated values of y (namely ŷi) from ȳ (namely the unconditional mean of our dependent variable y). As the ŷi values are implied by our model’s explanatory variables, the ESS measures the portion of TSS that we have explained. The Residual Sum of Squares, then, measures the portion of TSS that could not be explained. The Coefficient of Determination R² is the fraction of variation in y explained by our knowledge of x:

R² = ESS/TSS = 1 − RSS/TSS
Note that if the model does not have a constant term (that is, if β0 is omitted), then the measure R² is no longer appropriate. When the constant term is omitted,
A bad habit of R² is that it tends to increase upon the inclusion of additional explanatory variables in a model (in fact, whenever their t-statistics exceed 1 in absolute value, as we will see in subsequent sections). Does this mean we should continue adding more and more explanatory variables to our model ’just to push up R²’? The answer is quite the opposite: we must see the inclusion of more variables as a cost (after all, we want to come up with a parsimonious model). Then, we need to balance the benefits of more explanatory variables (enhanced ESS) against the cost of including them.
The Adjusted Coefficient of Determination serves that purpose:

R̄² = 1 − (1 − R²)(n − 1)/(n − K − 1)
Notice that:
Also keep in mind that neither R² nor R̄² has a statistical distribution. So, they are not directly and formally testable. Though, a simple arithmetic reorganization of R² resembles an F test score (test statistic), as we will consider very soon.
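A small sketch of these goodness-of-fit calculations, assuming X, y and beta_hat from the matrix-form sketch above (the adjusted measure uses the standard textbook formula 1 − (1 − R²)(n − 1)/(n − K − 1)):

```python
import numpy as np

y_hat = X @ beta_hat
e_hat = y - y_hat
TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_hat - y.mean()) ** 2)
RSS = np.sum(e_hat ** 2)

n = len(y)
K = X.shape[1] - 1            # number of explanatory variables, excluding the constant
R2 = 1 - RSS / TSS            # equals ESS / TSS when a constant term is included
R2_adj = 1 - (1 - R2) * (n - 1) / (n - K - 1)
print(R2, R2_adj)
```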
As stated before in ’Our approach to teaching/learning’, up to here we have maintained a naive and mechanical view of Linear Regression modeling. That is, we deliberately avoided calculations and discussions of the measures of dispersion or co-dispersion associated with our models. Now it is time to turn to reality. After all, the ei sequence has a certain statistical distribution, and so does yi. As we will formally study under the heading of ’Ideal econometric conditions: Gauss-Markov assumptions’, the ei terms have:

ei ∼ N(0, σ²)

that is, a Normal (Gaussian) distribution with a mean of zero (0) and constant (and preferably finite) variance.
As a consequence, the yi values have:

yi ∼ N(β0 + β1x1i + β2x2i + … + βKxKi, σ²)
Intuitively, the mean of yi depends on (is conditional on) x1, x2, …, xK (along with their parameters); while variance of y simply mimics that of e (by the very construction of our analytical framework).
The key thing to understand now is the variability of our parameter estimates: once they are obtained from a stochastic/ random data set, it is natural/ trivial to expect each of our estimators to have a nonzero variance and each pair of our estimators to have a covariance.
We devote this section to some rigorous treatment of what we call a ’variance-covariance’ matrix.
Let us begin from e ∼ Normal. Once we assume the error terms to
have a Normal distribution with a mean of zero (0) and a variance
of σ2, we may proceed to the following Q&A style mathematical
elaboration:
Q: Do we know the value of σ2 ?
A: No, it belongs to the population of ei’s. But, we only have a sample of ei ’s, namely êi ’s.
Q: Can we use those êi’s to estimate σ², that is, to obtain σ̂²?
A: Yes, the formula for σ̂² is:

σ̂² = ∑êi² / (n − K − 1)

Q: Can we express σ̂² using matrix notation?
A: Yes, the expression is:

σ̂² = ê′ê / (n − K − 1)

Q: What about the Cov values, can we calculate them?
A: Sure, in matrix notation,

Var-Cov(β̂) = σ²(X′X)⁻¹

Q: What about the distribution of β̂?
A: β̂ ∼ N(β, σ²(X′X)⁻¹)
Q: What does this mean?
A: First, each parameter estimate is unbiased, E(β̂) = β. Second, the variances are ruled by σ²(X′X)⁻¹.
Q: What is the structure of the variance-covariance matrix?
A:
Q: But we do not know the value of σ²?
A: Then, substitute σ̂² for it (a numerical sketch follows this Q&A):

Var-Cov(β̂) = σ̂²(X′X)⁻¹
Q: Does that mean we will be using the estimated values of variances and covariances?
A: Sure. This is what we have been doing since the beginning of our ECON 222 journey.
Q: Are we now ready to dive into the fascinating world of statistical inference over our estimated models?
A: Very much, indeed.
Q: Are you an AI?
A: No. Are you?
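A numerical sketch of the Q&A above, assuming X, y, beta_hat and e_hat from the earlier sketches and the degrees-of-freedom convention n − K − 1 used in this chapter:

```python
import numpy as np

n, k_plus_1 = X.shape                                  # k_plus_1 = K + 1 (constant included)
sigma2_hat = (e_hat @ e_hat) / (n - k_plus_1)          # sigma^2 hat = e'e / (n - K - 1)
varcov_hat = sigma2_hat * np.linalg.inv(X.T @ X)       # estimated Var-Cov(beta_hat)
std_errors = np.sqrt(np.diag(varcov_hat))              # standard errors of the estimates
print(sigma2_hat)
print(std_errors)
```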
We have studied/learned up to this point:
Probability basics and a rich-enough collection of well-known statistical distributions in ECON 221 (Chapter 1, Chapter 2, Chapter 3, Chapter 4)
Point estimators of distributional parameters, and the fundamentals of statistical inference (confidence intervals and hypothesis testing) in ECON 222 (Chapter 5, Chapter 6, Chapter 7)
Structure, formation and estimation of Simple Linear Regression and Multiple Linear Regression models in ECON 222 (earlier sections of Chapter 8)
Now, we are ready to place our estimated models under some serious scrutiny. Using the inferential tools that we learned, we will evaluate, test and scientifically question our regression models.
In a bold fashion, we can say that what we did up to here (i.e., estimating regression models) is no more than half of the job. To actually get the job done, we need to delve into the following tasks:
Now, let us give examples to each category of tasks listed above. To do this, suppose we have the following economic model:
Recall that, this is our model written for the population and we turn it into a statistical model (written again for the population) by introducing the statistical error (disturbance, sometimes ’shock’) terms:
where ei ∼ Normal. As you know well now, we do not know
the true values (population values of βj ’s). So, we will estimate
the model using a sample of n observations and the Least Squares
technique.
Provided that everything goes well on the paper and in the computer, we will end up with a rich set of estimates:
Estimates of model parameters: β̂0, β̂1, β̂2, β̂3, β̂4
Estimated sequence of the dependent variable: ŷi
Estimated sequence of error terms: êi
Estimated model variance: σ̂²
Estimated variance-covariance matrix: σ̂²(X′X)⁻¹
Now, suppose the following claims and/or questions come from an academic/technical colleague. (Needless to say, even when there is no criticizing colleague around, we need to pose these claims ourselves and test our models heavily):
Our road map to assess these questions begins with formulating these questions/claims in some formal notation:
Following the same order as above:
Here, we apparently need to calculate Var(β̂1 + β̂2). Using our knowledge from ECON 221:

Var(β̂1 + β̂2) = Var(β̂1) + Var(β̂2) + 2Cov(β̂1, β̂2)

where Var(β̂1), Cov(β̂1, β̂2) and Var(β̂2) are straightforwardly obtained during the estimation of the model. Once Var(β̂1 + β̂2) is at hand, se(β̂1 + β̂2) = √Var(β̂1 + β̂2) yields the required standard error.
Distribution of the test statistic:
Calculation of the test statistic:
Var(β̂3 + β̂4) will be treated as outlined above for the case of Var(β̂1 + β̂2). Note that this test can also be conducted as an F test, as we will cover in our class discussions.
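A minimal sketch of this calculation, assuming beta_hat and varcov_hat come from an estimated model with at least two slope coefficients (positions 1 and 2 of the parameter vector) and that the claim under test is, say, the hypothetical H0: β1 + β2 = 1:

```python
import numpy as np

# Var(b1 + b2) = Var(b1) + Var(b2) + 2*Cov(b1, b2), read off the estimated Var-Cov matrix.
var_sum = varcov_hat[1, 1] + varcov_hat[2, 2] + 2 * varcov_hat[1, 2]
se_sum = np.sqrt(var_sum)
t_stat = (beta_hat[1] + beta_hat[2] - 1.0) / se_sum    # hypothetical H0: beta1 + beta2 = 1
print(t_stat)
```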
Total sum of squares TSS being ∑(yi − ȳ)², explained sum of squares ESS being ∑(ŷi − ȳ)² and residual sum of squares RSS being ∑êi²:

F = (ESS/K) / (RSS/(n − K − 1))
J being the number of joint hypotheses, RSSR being the RSS for the restricted model and RSSU being the RSS for the unrestricted model:

F = [(RSSR − RSSU)/J] / [RSSU/(n − K − 1)]
Note / clarify again that RSSU is the RSS value of the unrestricted, i.e., full model, which is:
whereas RSSR is the RSS value of the restricted model, which is:
equivalently of:
Returning to our previous hypothesis test:
you will notice that the restricted model is:
or
against the unrestricted (full) model of:
Herein, RSSR becomes the TSS of the full model (verify yourself), RSSU becomes the RSS of the full model (should be trivial) and J becomes K. Then, the equivalence between
and
becomes apparent.
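A small helper, with made-up numbers, for the F statistic built from the restricted and unrestricted residual sums of squares (the specific values below are purely illustrative):

```python
def f_statistic(RSS_R, RSS_U, J, n, K):
    """F statistic for J joint restrictions; K is the number of regressors in the full model."""
    return ((RSS_R - RSS_U) / J) / (RSS_U / (n - K - 1))

# Illustrative values only: 2 restrictions, 30 observations, 4 regressors in the full model.
print(f_statistic(RSS_R=220.0, RSS_U=180.0, J=2, n=30, K=4))
```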
We will now use a model estimated on a computer to exemplify each of
the cases above:
[To be distributed as a handout]
Having studied the mechanical aspects of Linear Regression models, now it is the time to establish the conditions under which a linear regression model is viable with workable results. As we often call it ’ideal econometric conditions’, the Gauss-Markov assumptions level the field for us. If a model abides by these assumptions, i.e., if a model has been formed so as to hold the Gauss-Markov assumptions, then it is a good econometric model.
Now we can review how good our LS estimator is under these conditions. Consider the Simple Linear Regression model yi = β0 + β1xi + ei together with the Gauss-Markov assumptions. Recall that β̂1 is:
which can also be written as:
As ∑(xi − x̄) = 0 (shown before), the expression becomes:
Then,
can be written since our x (independent variable, explanatory variable) is non-stochastic.
We also know by the Gauss-Markov assumptions that E(ei | xi) = 0, i.e., our knowledge of x does not improve the expectation of e. So, E(β̂1) = β1, equivalently saying that β̂1 is an unbiased estimator of β1.
What about E(β̂0)?
Then,
So, E(β̂0) = β0, equivalently saying that β̂0 is an unbiased estimator of β0.
Expanding the expression and rearranging its terms:
As E(ei²) = σ² and E(eiej) = 0 for i ≠ j,
This expression is a Noise/Signal (i.e., a noise-to-signal) ratio expression.
Examining

Var(β̂1) = σ² / ∑(xi − x̄)²

we see that, to decrease Var(β̂1), a larger sample size n, a larger Var(x) and a smaller σ² would help. Among these, the researcher’s choice of the sample data affects n and Var(x); σ², on the other hand, is out of the researcher’s reach.
To simplify this expression observe/elaborate:
(1)
(2)
(3) E(ē) = 0
Then,
is reached. Rearranging:
is obtained.
(1) The larger the value of σ2 the larger will be the variances of the estimators.
(2) Var(β̂1) will be smaller, the larger the value of ∑(xi − x̄)². This is also true for Var(β̂0), but it is less evident as ∑xi² appears in the numerator of the Var(β̂0) expression.
(3) Because the number of terms in ∑(xi − x̄)² increases in n (sample size), an increase in n generally leads to an increase in precision.
There are two main approaches to model specification:
Starting out small, with one or few explanatory variables; retaining statistically significant ones and expanding the variables when needed
Starting out large and throwing out insignificant variables to reach the true model.
Regarding either of the approaches, we need a good methodological basis. The material of the section entitled ’Statistical inference’, luckily, provides us with the toolset to establish that. The task of model specification involves a systematic sequence of hypothesis tests and the evaluation of models with respect to some ad hoc criteria. While the t tests and F tests equip us to assess our models, R², R̄², and the AIC, BIC (or SIC) and HQ information criteria further strengthen our hand in coming up with parsimonious model specifications.
Akaike Information Criterion:
Bayesian Information Criterion or Schwarz Information Criterion or Schwarz Criterion or Schwarz-Bayesian Criterion:
Hannan-Quinn Criterion:
Among rival models, the ones with lower information criterion values are preferable. In addition, it is good practice to use the same sample size when comparing models via information criteria.
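One common textbook parameterization of these criteria is sketched below; conventions differ across books and software packages, so treat these expressions as an assumption rather than the course’s official definitions:

```python
import numpy as np

# All three criteria penalize the (log) residual variance by the number of estimated parameters k.
def aic(RSS, n, k):
    return np.log(RSS / n) + 2 * k / n

def sic(RSS, n, k):   # also known as BIC / Schwarz criterion
    return np.log(RSS / n) + k * np.log(n) / n

def hq(RSS, n, k):
    return np.log(RSS / n) + 2 * k * np.log(np.log(n)) / n

# Illustrative values only; lower is better, and models should be compared on the same sample.
print(aic(180.0, 30, 3), sic(180.0, 30, 3), hq(180.0, 30, 3))
```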
In this section we will put our theoretical knowledge into practice. The modeling exercises that we will consider maintain a manageable pedagogical standard; they are somewhat downsized and sometimes oversimplified. Yet, they are designed to deliver the intended message of the chapter with regard to applied statistical/econometric research.
The cases we will consider are as follows:
Case 01 State public expenditures in the US: A public finance model (Economics)
Data reference:
U.S. Department of Commerce, Bureau of the Census, Government Finances in 1960, Census of Population, 1960, Census of Manufactures, 1958, Statistical Abstract of the United States, 1961.
U.S. Department of Agriculture, Agricultural Statistics, 1961.
U.S. Department of the Interior, Minerals Yearbook, 1960.
Authorization: for educational use
Variables:
Case 02 Home prices in Albuquerque: what determines home prices? (Economics, Real estate, Business)
Data reference:
Albuquerque Board of Realtors
Authorization: for educational use
Variables:
Case 03 Taste of cheese: An assessment of subjective scores (Product development, Business)
Data reference:
Moore, David S., and George P. McCabe (1989). Introduction to the Practice of Statistics.
Authorization: for educational use
Variables:
Case 04 Consumption of soft drinks: Practicing categorical determinants (Consumer research)
Data reference:
Artificial data - Eray Yucel
Authorization: for educational use
Variables:
Case 05 A promotion for soda consumers: The Linear Probability Model - simple and still useful (Business)
Data reference:
Artificial data - Eray Yucel
Authorization: for educational use
Variables:
Case 06 A demonstration of the effect of omitted variables (Simpson’s paradox)
Data reference:
Artificial data - Eray Yucel
Authorization: for educational use
Variables:
While these cases are being examined, we will concurrently be learning
the use of ’Dummy variables’ in an embedded fashion: The theoretical
knowledge needed will be provided when/as necessary.
Checkpoint No: 97
Cross-section versus Time series data
Our choice of theoretical exposition in ECON 222 kept cross-section data at a central position. That is, we often referred to our observations yi, xi1, xi2, …, xiK using the observation index ’i’. When this is the case, note that there is no natural ordering of observations. For example, writing the USA’s inflation rate in Row 2 of a data file and the UK’s inflation rate (for the same year) in Row 7, then ’switching their rows’, does not yield different results.
Time series data, on the other hand, do have a natural ordering of
observations, merely by the definition of time: before comes before now,
now comes before tomorrow, so tomorrow comes after both. This
underlines the importance of time as the primary key of our dataset when
analyzing time series data and especially when we do it via dynamic
models. The silence of this book, ECON 221 and ECON 222 about time series notation and data was of course intentional from a pedagogical viewpoint. Once you proceed to ECON 301 and ECON 302 (the Econometrics sequence), be prepared to replace ’i’ with ’t’ as your new (and naturally ordered, t = 1, 2, …, T) observation index. Note that all our formulations remain robust to this change.
In the set of cases/exercises of this section, we make use of cross-section
data sets.
NOTICE: Until a proper typeset is prepared, the cases/exercises
of this section will be handled using Handouts. These Handouts
will follow and summarize what is to be done in class lectures
and they are available through the “Handouts” link under
sites.google.com/view/erayyucel/teaching. To have the latest available
material and stay informed, keep a keen eye on this page.
Checkpoint No: 98
The FWL (Frisch-Waugh-Lovell) theorem shows how to decompose a regression of y on a set of variables x into two pieces. If we divide x into two sets of variables x1 and x2 and regress y on x1 and x2, the coefficient estimates on x2 can also be obtained through the following steps: regress x2 on x1 and keep the residuals; regress y on x1 and keep the residuals; then regress the latter residuals on the former.
To demonstrate what the FWL theorem says, consider our Case02 (Home prices) again:
Dependent Variable: LP
Variable | Coefficient | Std. Error | t-Statistic | Prob.
C | 0.652384 | 0.350431 | 1.861662 | 0.0655
LS | 0.521313 | 0.085217 | 6.117444 | 0.0000
LT | 0.368324 | 0.065693 | 5.606762 | 0.0000
In case02.wf1, page case02s2, we have the regression equation estimated as above. So, the estimated equation is LP = 0.6524 + 0.5213 LS + 0.3683 LT. Focus on β̂2 = 0.3683, i.e., the coefficient estimate of taxes (LT).
As to our application of the FWL theorem, x1 = LS, x2 = LT and y = LP.
(1) Here, we regress x2 on x1 that is LT on LS and extract the residuals and name it E_LT_ON_LS. You can view this series in case02.wf1.
Dependent Variable: LT
Variable | Coefficient | Std. Error | t-Statistic | Prob.
C | −1.473070 | 0.500340 | −2.944136 | 0.0040
LS | 1.095477 | 0.067801 | 16.15724 | 0.0000
(2) Here, we regress y on x1, that is, LP on LS, and extract the residuals and name it E_LP_ON_LS. You can view this series in case02.wf1.
(3) Finally, we regress E_LP_ON_LS on E_LT_ON_LS and obtain the coefficient estimate of E_LT_ON_LS as 0.3683.
Dependent Variable: E_LP_ON_LS
Variable | Coefficient | Std. Error | t-Statistic | Prob.
C | 0.007499 | 0.013445 | 0.557758 | 0.5782
E_LT_ON_LS | 0.368324 | 0.065399 | 5.631914 | 0.0000
Notice that the coefficient estimate of LT in the very first regression is identical to the coefficient estimate of E_LT_ON_LS obtained here.
This is how the FWL theorem functions.
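The mechanics can also be verified on simulated data; the following sketch (with our own made-up variables, not the case02 series) reproduces the FWL equivalence numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                  # x2 correlated with x1
y = 1.0 + 0.5 * x1 + 0.4 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x1, x2]), y)    # full regression: y on constant, x1, x2

# FWL steps: residualize x2 and y on (constant, x1), then regress residuals on residuals.
e_x2 = x2 - np.column_stack([ones, x1]) @ ols(np.column_stack([ones, x1]), x2)
e_y  = y  - np.column_stack([ones, x1]) @ ols(np.column_stack([ones, x1]), y)
b_fwl = np.sum(e_x2 * e_y) / np.sum(e_x2 ** 2)

print(b_full[2], b_fwl)   # the two estimates of the x2 coefficient coincide
```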