This is a pretty big topic which is often dealt with badly*. The aim of this post is to provide a (somewhat) clear understanding of what the G-M assumptions are and what they entail. Wherever possible, I will try to minimise the usage of math.
* Just take a look at the Wikipedia article:
http://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem
----------------------------------------------------------
The G-M assumptions are a set of criteria named after the mathematicians Carl Friedrich Gauss and Andrey Markov. If the G-M assumptions hold, this tells us something about our ability to use least squares estimators on the sample data: it says that those least squares estimators are BLUE (Best Linear Unbiased Estimators). For an estimator to be BLUE means that there is no other linear unbiased estimator which has a lower sampling variance than that particular estimator.
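In symbols (notation mine): if β̃ is any other linear unbiased estimator of the slope β, then

$$\tilde{\beta} = \sum_i c_i Y_i, \quad E(\tilde{\beta}) = \beta \;\; \Longrightarrow \;\; \mathrm{Var}(\hat{\beta}_{\text{OLS}}) \le \mathrm{Var}(\tilde{\beta})$$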
Let's imagine that I have the sampling distribution of one estimator, and then another which has the same centre as the first but is more tightly concentrated around that centre. Assuming that both of these are unbiased (centred around the true population parameter), we can see that the second estimator has a lower sampling variance than the first. This means that when I apply least squares estimators to sample data, they will more often than not provide me with closer estimates of the true population parameter.
An illustration of the above:
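Since this post promised to keep the math light, here is a small simulation sketch instead (all numbers are illustrative assumptions of mine). It compares the OLS slope estimator against another linear unbiased estimator of the slope: the line drawn through the two observations with the smallest and largest x.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 0.5               # true population parameters
n, reps = 50, 5000

x = rng.uniform(0.0, 10.0, n)        # keep the same x's across replications
lo, hi = np.argmin(x), np.argmax(x)  # the two extreme-x observations

ols, endpoint = [], []
for _ in range(reps):
    u = rng.normal(0.0, 1.0, n)      # well-behaved errors
    y = alpha + beta * x + u
    ols.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))  # OLS slope
    endpoint.append((y[hi] - y[lo]) / (x[hi] - x[lo]))  # slope through extremes

# Both estimators centre on the true beta, but OLS has the smaller spread.
print(f"means:     OLS {np.mean(ols):.3f}, endpoint {np.mean(endpoint):.3f}")
print(f"variances: OLS {np.var(ols):.5f}, endpoint {np.var(endpoint):.5f}")
```

Both estimators are linear in the y's and unbiased, but the OLS variance comes out several times smaller, which is exactly what BLUE promises.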
The assumptions (in no particular order)
(1)
We require a model which is linear in parameters. The following population process is linear in the parameters alpha and beta, so it satisfies this assumption.
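For instance, a process of this form (an illustrative reconstruction, using the variables referred to in this section):

$$Y = \alpha + \beta v + u$$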
It is important to note that the variable v alone cannot fully determine Y, due to the presence of the error term u.
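Now contrast that with a process like the following (an illustrative example; the important feature is the product of the two parameters):

$$Y = \alpha \beta v + u$$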
The above is an example of a population process which violates our assumption, as it is non-linear in parameters due to the multiplicative effect between alpha and beta.
However, this assumption does not entail that we have a population process which is linear in variables. A population process which is non-linear in variables, such as the one below, can still be consistent with our assumption.
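For example (any non-linear transformation of v will do, so long as alpha and beta still enter linearly):

$$Y = \alpha + \beta v^2 + u$$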
(2)
The set of sample data we are working with is a random sample from the population, meaning that each individual in the population is equally likely to be selected. This also implicitly means that all of our data points come from the same population process.
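In symbols (notation mine):

$$\{(v_i, Y_i)\}_{i=1}^{n} \;\text{ are i.i.d. (independent and identically distributed) draws from the population}$$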
(3)
This assumption, known as the 'Zero conditional mean of errors', is arguably the most important.
In the above population process, the variable v now represents 'years of education'. We are trying to see what effect v has on Y.
If I know what level of education a person has, the Zero conditional mean of errors assumption states that this does not help me to predict whether said person will be above or below the population regression line.
This assumption tells us that:
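In our single-regressor example, with u the error term and v years of education:

$$E(u \mid v) = 0$$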
From the above, it follows that the expectation of my error, given any of my independent variables, has to be equal to 0.
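With several regressors x_1 through x_k, the same statement reads (notation mine):

$$E(u \mid x_1, x_2, \ldots, x_k) = 0$$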
If this is violated, the following happens:
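One standard way to demonstrate this (a reconstruction using the usual decomposition of the OLS slope estimator):

$$\hat{\beta} = \beta + \frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})^2}$$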
And this implies:
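Taking expectations, conditional on the x's:

$$E(\hat{\beta} \mid x) = \beta + \frac{\sum_i (x_i - \bar{x})\, E(u_i \mid x)}{\sum_i (x_i - \bar{x})^2}$$

If the zero conditional mean assumption fails, the second term does not vanish, so in general $E(\hat{\beta}) \neq \beta$.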
In sum, if the Zero conditional mean of errors assumption is violated, then our least squares estimators are biased, as demonstrated above.
As a result, our least squares estimators can, for example, become upwardly biased, as the following sketch illustrates:
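Here is a small simulation of that upward bias (everything here, from the hidden 'ability' variable to the specific numbers, is an illustrative assumption of mine; ability raises both education x and the error u, so the zero conditional mean assumption fails):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 0.5                 # true population parameters
n, reps = 200, 2000

estimates = []
for _ in range(reps):
    ability = rng.normal(0.0, 1.0, n)                    # unobserved
    x = 12.0 + 2.0 * ability + rng.normal(0.0, 1.0, n)   # education rises with ability
    u = ability + rng.normal(0.0, 1.0, n)                # ability also sits in the error
    y = alpha + beta * x + u
    estimates.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))  # OLS slope

# The average estimate lands well above the true beta of 0.5: upward bias.
print(f"true beta: {beta}, mean OLS estimate: {np.mean(estimates):.3f}")
```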
Another result of this assumption is:
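In symbols:

$$\mathrm{Cov}(x_i, u_i) = 0$$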
If I were to plot u_i against x_i, there should be no discernible relationship between the two. Simply put, there should be no correlation between the two variables.
(4)
There is no perfect collinearity in the regressors.
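Take, for instance, a model with two regressors (an illustrative form):

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$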
Using the above as an example, in light of the no perfect collinearity assumption there cannot be an exact linear relationship between x_1 and x_2.
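Suppose, for example, that the sample contained an exact relationship such as:

$$x_2 = 2 x_1$$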
The above seems to suggest that we can know the value of x_2 through x_1. If this is the case, then we have violated the assumption of no perfect collinearity.
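A quick sketch of why this matters mechanically (illustrative numbers; with x_2 an exact multiple of x_1, the matrix X'X that the OLS formula inverts is singular):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2.0 * x1                                    # x2 is an exact function of x1
X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with an intercept

print(np.linalg.matrix_rank(X))                  # prints 2, although X has 3 columns
try:
    np.linalg.inv(X.T @ X)                       # the OLS formula needs this inverse
except np.linalg.LinAlgError as err:
    print("X'X is singular:", err)
```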
(5)
Homoscedastic errors
A more careful representation of the homoscedasticity assumption yields:
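In symbols, with sigma^2 a constant:

$$\mathrm{Var}(u \mid x) = \sigma^2$$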
This tells us that the variance of the error does not vary systematically with x.
(6)
No serial correlation
No serial correlation means that the errors have to be uncorrelated with one another: knowing one of the errors does not help me to predict another error.
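In symbols:

$$\mathrm{Cov}(u_i, u_j) = 0 \quad \text{for all } i \neq j$$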
An exception to this assumption is where i = j: in that case, we are working with a variance and not a covariance.
That's all there is to it... for now.