Friday 26 December 2014

The F-test: An example

For the sake of this example, let's assume that N=200.

Unrestricted regression




Restricted regression





Null hypothesis





We want to test whether the independent variables in this regression are jointly insignificant in determining our dependent variable. We do this by forming the F statistic:
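In shorthand (writing SSR_R and SSR_UR for the restricted and unrestricted sums of squared residuals, P for the number of restrictions and k for the number of regressors in the unrestricted model), the statistic takes the standard form:

F = [(SSR_R − SSR_UR) / P] / [SSR_UR / (N − k − 1)]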



Let's assume that this value is 100. We now have to weigh this against our critical value.

Critical Value

This takes the following form, where P is the number of restrictions on our null hypothesis (in our case, this is 2):
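In other words, the critical value is the upper tail point, at our chosen significance level, of an F distribution with P numerator degrees of freedom and N − k − 1 denominator degrees of freedom (k again being the number of regressors in the unrestricted model).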










We say that we can reject the null hypothesis if our F statistic is greater than this critical value. If it is not, then we do not have enough evidence to reject the null hypothesis.
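As a rough sketch of this decision rule in code (I assume the unrestricted model has k = 2 regressors, so the denominator degrees of freedom are 200 − 2 − 1 = 197, and I use a 5% significance level):

from scipy.stats import f

N, P, k = 200, 2, 2
F_stat = 100                                  # the value we assumed above
critical_value = f.ppf(0.95, P, N - k - 1)    # upper 5% point of F(2, 197)

print(critical_value)                         # roughly 3.04
if F_stat > critical_value:
    print("Reject the null hypothesis")
else:
    print("Not enough evidence to reject the null hypothesis")

With an F statistic of 100 against a critical value of roughly 3, we would comfortably reject the null hypothesis in this example.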



Wednesday 24 December 2014

An introduction to the F-test

 Let's assume that we have a linear regression such as:






If we wanted to work with the t statistic, our null hypothesis would be something along the lines of:
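H0: β1 = 0, tested against the alternative H1: β1 ≠ 0 (taking β1 as the coefficient of interest).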






The above is fine for testing a single regression coefficient, but we can't use this framework to test several coefficients jointly.

Multiple coefficient test
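With k independent variables in the model, the null hypothesis we now want covers every slope coefficient at once:

H0: β1 = β2 = … = βk = 0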






This null hypothesis is basically telling us that, in the population, the effects of all the variables are jointly equal to 0. We are therefore testing for the joint non-significance of all the variables in our model.





The alternative hypothesis is that at least one of the coefficients is not equal to 0; if any one of them is non-zero in the population, we reject the null hypothesis.

What we should expect:
  •  If we have a regression where some of our coefficients have very high t-stats, it is unlikely that we would fail to reject the null hypothesis. This is because if even one of the variables is significant, the variables cannot all be jointly insignificant.
  •  A high t-stat on any one coefficient therefore makes it likely that we will reject the joint null hypothesis.
  • Typically, the F-stat has a critical value ranging from around 3 to 5.

So far, we've only discussed the t-stat. How do we actually conduct an F-test?

We go about this by first thinking about the unrestricted regression (UR):





This is essentially saying that if our model fits the data well, the sum of squared residuals (SSR) is likely to be low, because our residuals will be small.
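(Here SSR simply means the sum of the squared residuals from the fitted regression, SSR = Σ ûi², where ûi is the residual for observation i.)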

After we get the SSR from our UR, we have to perform a restricted regression (R):




 We restrict the model by assuming that the dependent variable does not depend on the independent variables.
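Under the null hypothesis above, this restricted model keeps only the intercept, something like Y = β0 + u, so we are effectively predicting Y with its mean alone.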


Any extra variation explained by the independent variables therefore helps to reduce the error associated with predicting the dependent variable, so the restricted SSR will be at least as large as the unrestricted one.


In order to know if the restricted SSR is significantly greater than the unrestricted version, we need to form the F-statistic:






  •  If the numerator is large relative to the denominator, moving from R to UR explains much more of the variation in the dependent variable, which leads to the unrestricted SSR being significantly lower than the restricted version.
  • The larger the F-stat, the more likely we are to reject the null hypothesis.

We will reject the null hypothesis if the value of our F stat > critical value.
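To make this concrete, here is a minimal sketch in Python with made-up data (the variable names, coefficient values and the 5% significance level are my own choices for illustration): fit the unrestricted and restricted models, form the F statistic by hand, and compare it with the critical value.

import numpy as np
import statsmodels.api as sm
from scipy.stats import f

rng = np.random.default_rng(0)
N = 200
x1, x2 = rng.normal(size=N), rng.normal(size=N)
y = 1 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=N)

X_ur = sm.add_constant(np.column_stack([x1, x2]))   # unrestricted: constant, x1, x2
X_r = np.ones((N, 1))                               # restricted: constant only

ssr_ur = sm.OLS(y, X_ur).fit().ssr                  # sum of squared residuals, UR
ssr_r = sm.OLS(y, X_r).fit().ssr                    # sum of squared residuals, R

P, k = 2, 2                                         # restrictions, regressors
F_stat = ((ssr_r - ssr_ur) / P) / (ssr_ur / (N - k - 1))
critical_value = f.ppf(0.95, P, N - k - 1)

print(F_stat, critical_value)                       # reject the null if F_stat > critical_value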




Thursday 20 November 2014

Log models (2/2)

This post will be a mathematical elaboration of the previous one.

In the previous post we established that a log-log model is where we have a logged dependent variable and we are regressing that on a bunch of logged independent variables.





 Before we begin, we need to remind ourselves of some log rules:
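The ones we need here are ln(ab) = ln(a) + ln(b) and ln(a^b) = b·ln(a), together with the fact that e^(ln a) = a, which is what lets us anti-log.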



 

Given our log-log model, let's say that we are interested in finding out what Y is.
How can we do this?

We need to anti-log our model:
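Writing β0 for the intercept and u for the error term, the working goes something like this:

Y = e^(β0 + β1·lnX1 + β2·lnX2 + u)
  = e^(β0) · e^(β1·lnX1) · e^(β2·lnX2) · e^(u)
  = e^(β0) · X1^(β1) · X2^(β2) · e^(u)

So Y is a multiplicative function of X1 and X2, with β1 and β2 appearing as powers.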










 

This result is another reason why log models are attractive. It arises from the fact that we have allowed for non-linearity in our independent variables. Specifically, we are trying to estimate the degree of this non-linearity.
We want to estimate β1 and β2 in order to discover the non-linear effects of X1 and X2 on Y.

For all intents and purposes, this is a much more realistic assumption than the wholly linear model. 

Log models (1/2)

In the previous post, we established the wholly linear model:
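Something along the lines of Y = β0 + β1·X1 + β2·X2 + … + βn·Xn + u, with every variable entering in levels rather than in logs.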




β1 is the marginal effect of X1 on the dependent variable Y.

If X1 changes by one unit, assuming that all other variables are constant, what happens to Y?

This is where the log model is of some use to us:
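For instance, the log-log version of the model above: lnY = β0 + β1·lnX1 + β2·lnX2 + … + βn·lnXn + u.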






In this model, β1 captures the corresponding increase in lnY due to an increase of 1 unit in lnX1.
However, this is a very cumbersome description and it doesn't really tell us much.
Assuming that all the other variables are held constant and the only thing we vary is X1, we can differentiate the above:
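Doing so gives d(lnY)/d(lnX1) = β1, holding the other variables fixed.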




 With some minor adjustments, the above can be restated as:
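Since d(lnY) = dY/Y and d(lnX1) = dX1/X1, this is the same as (dY/Y) / (dX1/X1) = β1.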





The above describes the percentage change in Y over the percentage change in X.
Anyone familiar with economics should immediately recognise this result.
β1  represents the concept of elasticity within this log model.


We can generalise this specific case in terms of our log model. We can say that the beta coefficients show us the partial elasticity of our dependent variable with respect to a particular independent variable.

Now, we are ready to adjust our "cumbersome" description:

Assuming that all the other variables are held constant, β1 shows us the percentage change in the dependent variable from a 1% increase in X1.

The above example will happily hold for when both our dependent and independent variables are logged, but what happens if we only log our dependent variable?





We can still think about β1 in percentage terms: a one-unit increase in X1 increases Y by roughly 100·β1 percent.

If I have a non-logged dependent variable and a logged independent variable:





β1 in this context links percentage changes in X1 to unit changes in Y: a 1% increase in X1 is associated with an increase in Y of roughly β1/100 units.
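A quick numerical check of these two readings (the coefficient values here are made up purely for illustration):

import numpy as np

b1 = 0.05
# log-linear: lnY = b0 + b1*X1, so a one-unit rise in X1 multiplies Y by exp(b1)
print(np.exp(b1) - 1)        # ~0.0513, i.e. roughly a 100*b1 = 5% rise in Y

b1 = 3.0
# linear-log: Y = b0 + b1*lnX1, so a 1% rise in X1 changes Y by b1*ln(1.01)
print(b1 * np.log(1.01))     # ~0.0299, i.e. roughly b1/100 units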



In sum, one reason to use a log-log model is that all of our variables end up on the same (percentage) scale. We therefore don't need to worry about units, which allows for a sharper comparison between coefficients.


 

Interpreting regression coefficients in linear regression






The idea with linear regression is that we fit some sort of line to our data.
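In the simple bivariate case, that line is something like Y = α + βX + u, where u is an error term picking up everything the line misses.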















By virtue of this process, one of the first questions we face is 'What do the parameters alpha and beta represent?'

With reference to the simple bivariate case:
  • alpha (α) represents the value of Y when X is 0.
  • Beta (β) represents the gradient of our line. (Think about y=mx+c) 

The above is pretty straightforward, but what happens when we introduce a third variable in our regression?
In this context, we are now working with two independent variables and one dependent variable.
We can call our independent variables X1 and X2 such that:
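For example: Y = α + β1·X1 + β2·X2 + u.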






Least squares will now place a plane (the analogue of a line when we have two regressors) which minimises the sum of the squared vertical distances of each of the points from it.
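Concretely, least squares picks the α, β1 and β2 that minimise Σ (Yi − α − β1·X1i − β2·X2i)².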






In most situations, we won't be dealing with just one or two explanatory variables. Typically, we have a whole host of explanatory variables to consider.

How can we now think about multiple regression given this new information?

If we think about the situation pragmatically, we have now run out of spatial dimensions to graph any more variables. However, we can still think about what the individual β1 to βn represent.

We can say that β1 represents the marginal effect of having one more unit of X1 on Y.

Why does it represent the marginal effect?

We can think about it in terms of having one more unit of X1 whilst everything else is held constant.





We can also show this through partial differentiation:
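Taking the partial derivative of the model with respect to X1, while holding the other regressors fixed, gives ∂Y/∂X1 = β1.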






This is why β1 represents the "partial effect" in econometrics.
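A small sketch of this in practice (made-up data and my own variable names): the fitted coefficient on X1 recovers the partial effect of X1 on Y, holding X2 fixed.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(size=n)   # true partial effects: 1.5 and -0.7

X = sm.add_constant(np.column_stack([x1, x2]))
print(sm.OLS(y, X).fit().params)                     # roughly [2.0, 1.5, -0.7]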

Monday 17 November 2014

An introduction to the Gauss-Markov assumptions

This is a pretty big topic which is often dealt with badly*. The aim of this post is to provide a (somewhat) clear understanding of what the G-M assumptions are and what they entail.
Wherever possible, I will try to minimise the usage of math.

* Just take a look at the Wikipedia article:
http://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem

----------------------------------------------------------
The G-M assumptions are a set of criteria named after the mathematicians Carl Friedrich Gauss and Andrey Markov. If the G-M assumptions are upheld, then this tells us something about our ability to use least squares estimators on our sample data.

It says that those least squares estimators are BLUE (Best Linear Unbiased Estimators). For an estimator to be BLUE means that there is no other linear unbiased estimator with a lower sampling variance.

Let's imagine that I have the sampling distribution of one estimator, and then a second estimator whose sampling distribution has the same centre as the first but is more tightly concentrated around that centre.

Assuming that both of these are unbiased (centered around the true population parameter), we can see that the second estimator has a lower sampling variance than the first.

This means that when I apply least squares estimators to sample data, they will more often than not provide me with closer estimates of the true population parameter.
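A quick simulation makes this concrete (a toy setup of my own: I compare the least squares slope with a cruder linear unbiased estimator that only uses the first and last observations):

import numpy as np

rng = np.random.default_rng(2)
beta, n, reps = 2.0, 50, 5000
x = np.linspace(0, 10, n)                          # fixed regressor values

ols_slopes, crude_slopes = [], []
for _ in range(reps):
    y = 1.0 + beta * x + rng.normal(size=n)
    ols_slopes.append(np.polyfit(x, y, 1)[0])              # least squares slope
    crude_slopes.append((y[-1] - y[0]) / (x[-1] - x[0]))   # slope through the end points

print(np.mean(ols_slopes), np.mean(crude_slopes))  # both close to 2.0 (unbiased)
print(np.var(ols_slopes), np.var(crude_slopes))    # the least squares variance is much smaller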





An illustration of the above













  The assumptions (in no particular order)
(1)





We require a model which is linear in parameters. The above population process is linear in parameters alpha and beta, so it satisfies this assumption.

It is important to note that the variable v cannot fully determine Y, due to the presence of the error term u.





The above is an example of a population process which violates our assumption, as it is non-linear in parameters due to the multiplicative effect between alpha and beta.





However, this assumption does not require a population process which is linear in the variables. A population process which is non-linear in the variables, such as the one above, can still be consistent with our assumption.
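For example, something like Y = α + β·v² + u is non-linear in the variable v, but it is still linear in the parameters α and β, so it does not break the assumption.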


(2)

The sample data we are working with is a random sample from the population, meaning that each individual in the population is equally likely to be selected. This also implicitly means that all of our data points come from the same population process.


(3)
This assumption, known as the 'zero conditional mean of errors', is arguably the most important.










In the above population process, the variable v now represents 'years of education'. We are trying to see what effect v has on Y.

If I know what level of education a person has, the zero conditional mean of errors assumption states that this does not help me to predict whether said person will be above or below the population regression line.

This assumption tells us that:
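In symbols, E(u | x1, …, xk) = 0: whatever values the regressors take, the error term is zero on average.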






From the above, it follows that the expectation of my error given any of my independent variables has to be equal to 0.

If this is violated, the following happens:




And this implies:
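One way to see the implication, sketched for the simple bivariate case: the least squares slope can be written as β̂ = β + Σ(xi − x̄)·ui / Σ(xi − x̄)², so if E(u|x) is not zero the second term does not, in general, average out to zero and E(β̂) ≠ β.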






In sum, if the zero conditional mean of errors assumption is violated, then our least squares estimators are biased, as demonstrated above.


As a result, our least squares estimates can become upwardly biased, as shown in the following diagram:







Another result of this assumption is:







If I were to plot ui against xi, there should be no discernible relationship between the two. Simply put, there should be no correlation between the two.



(4)

There is no perfect collinearity in the regressors.





Using the above as an example, in light of the no perfect collinearity assumption there cannot be an exact linear relationship between x1 and x2.
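For example, suppose the data happened to satisfy x2 = 3·x1 exactly (a made-up relationship, purely for illustration).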







The above suggests that we can always work out the value of x2 from x1. If this is the case, then we have violated the assumption of no perfect collinearity.


(5)

Homoscedastic errors

 





A more careful representation of the homoscedasticity assumption yields:
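Something like Var(u | x1, …, xk) = σ², a single constant that does not depend on the regressors.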










This tells us that the variance of the errors does not vary systematically with x.



 (6)


No serial correlation






No serial correlation means that the errors must be uncorrelated with one another: knowing one of the errors does not help me to predict another error.
The exception is where i=j. If this is the case, we are working with a variance and not a covariance.
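In symbols, Cov(ui, uj) = 0 for any two different observations i ≠ j, while Cov(ui, ui) is simply Var(ui).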




That's all there is to it... for now.