1. Name That Model!

Keywords: OLS, WLS, GLM, GzLM, Heteroscedasticity, Assumptions, Residuals, Errors

Hullo

I have been working full time and studying part time for a while. Approximately 18 months ago I had to get up to speed with weighted least squares (WLS) regression. One of our models had been identified as heteroscedastic and the modeller had used WLS to conduct the regression, which had the form

$$\hat{\beta} = (X^T W X)^{-1} X^T W X$$

where W was a diagonal matrix with the variance of $y$ at that point (or was it 1 / the variance?).
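
On the "1 / the variance" question: in the usual WLS setup the weights are the reciprocals of the error variances, so W = diag(1 / variance_i) and noisier points count for less. A minimal NumPy sketch of that convention follows; the function name `wls_fit` and the simulated data are illustrative assumptions, not the modeller's actual code.

```python
import numpy as np

def wls_fit(X, y, variances):
    """Weighted least squares with W = diag(1 / variance_i)."""
    w = 1.0 / np.asarray(variances, dtype=float)
    XtW = X.T * w  # same result as X.T @ np.diag(w), without forming W
    # Solve the normal equations (X' W X) beta = X' W y
    return np.linalg.solve(XtW @ X, XtW @ y)

# Hypothetical heteroscedastic data: the error spread grows with x.
rng = np.random.default_rng(0)
n = 50
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])  # intercept + slope
variances = (0.5 * x) ** 2            # assumed known here
y = 2.0 + 3.0 * x + rng.normal(0.0, np.sqrt(variances))

beta_wls = wls_fit(X, y, variances)
```

With all variances equal, this reduces to ordinary least squares.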

I recall that while I was searching for information on WLS I read that the W matrix could be further generalised to compensate for non-independence of the error terms by turning it into a variance/covariance matrix. I think the material I was reading at the time put this in the category of "generalised linear models". I'm now backfilling my knowledge but can't seem to find the right term to search for without bringing up a whole lot of other modelling topics I don't have time to read around!

I want the generalised version of OLS which assumes normality but not independence of error terms. Is there a specific term which covers this topic without encompassing other areas and issues?

(are there any other competing frameworks I should make myself aware of?)

Thanks,

JP

2. Re: Name That Model!

To put it in context a little: I am looking at potentially analysing costs for a particular type of product (think car models). I have data from a number of sources:
* Some represent a particular model that developed as the years went on. Incrementally they probably have a lot in common, while the first model and the current model could be completely different.
* Others are from insights gained into other makes and may have errors correlated with any other data points.
* Finally, some may be from the same maker but different models, and thus have weakly correlated errors with other models being produced at the same time.

Sampling issues aside, I'm looking at ways I could handle the violation of OLS assumptions in this situation.

3. Re: Name That Model!

So your final model is:

cost = car model

How are you putting car model into the model? In particular, do the evolving cars or similar cars all enter as a new model, so that they are their own categories?

You are breaking (as you know) the assumption of no interference between groups, and the relationships are not closed; there are cyclic relationships. Sorry, I am not too familiar with WLS in this regard, but will help if I can.

5. Re: Name That Model!

It is called Generalized least squares. Search for GLS and autocorrelation.

Originally Posted by dj_johnphillips

I believe that your $(X^T W X)^{-1} X^T W X$ is equal to the identity matrix. Or that your finger slipped and you got an "X" instead of a "y" at the end.

In your formula W is the inverse of the variance-covariance matrix. Note that when there is autocorrelation W will not be diagonal; it will have off-diagonal terms. One version is 'compound symmetry correlation'. (Search for it.)

By the way, GLS is not the name of the model. It is the name of the estimator. Your model can be estimated by several estimators, like OLS, WLS, GLS, ML etcetera.

So in the example below, the 12 uncorrelated observations would correspond to a diagonal part of W, and the 5 correlated observations would give off-diagonal terms in the variance-covariance matrix (which, inverted, gives W).
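
The 12 + 5 structure described above can be sketched in NumPy as a block covariance matrix: an identity block for the 12 uncorrelated points and a compound-symmetry block for the 5 correlated ones. This is only an illustration under stated assumptions: the common correlation `rho = 0.6`, the unit variances, the simulated data, and the helper name `gls_fit` are all made up for the example.

```python
import numpy as np

def gls_fit(X, y, V):
    """GLS estimate beta = (X' W X)^{-1} X' W y, with W = V^{-1}."""
    W = np.linalg.inv(V)
    XtW = X.T @ W
    cov_unscaled = np.linalg.inv(XtW @ X)  # Var(beta) up to sigma^2
    beta = cov_unscaled @ (XtW @ y)
    return beta, cov_unscaled

n_indep, n_corr = 12, 5
n = n_indep + n_corr
rho = 0.6  # assumed common correlation among the 5 related points

# Variance-covariance matrix (up to a common scale sigma^2):
# identity for the independent points, compound symmetry for the rest.
V = np.eye(n)
block = np.full((n_corr, n_corr), rho)
np.fill_diagonal(block, 1.0)
V[n_indep:, n_indep:] = block

W = np.linalg.inv(V)  # diagonal for the first 12, off-diagonals for the last 5

# Hypothetical data generated with exactly this error structure.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
errors = np.linalg.cholesky(V) @ rng.normal(size=n)
y = 1.0 + 2.0 * x + errors

beta_gls, cov_unscaled = gls_fit(X, y, V)
```

If the correlation is thought to decay with the gap between marks rather than stay constant, the block could instead be built as `rho ** np.abs(np.subtract.outer(np.arange(5), np.arange(5)))`; either way, the correlation values have to be assumed or estimated from somewhere.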

7. Re: Name That Model!

OK, so let's say I have a dozen (12) haphazardly sampled points for random makes and models of cars and also 5 points for the Mark 1, 2, 3, 4 and 5 of a particular make and model of car. These 5 points may well have correlation in the error terms due to similarities, with the correlation decreasing the further apart they are.

Given how scarce my data is I want to make the most out of every point.
I'm in business so this isn't about "hypothesis tests" as such, more about making the best prediction for some future car, without overfitting!

How do I correctly treat the 5 points which I think might be correlated?

From what I recall the framework in my 1st post can assist, so I would like to read up on it (and its limitations). Of course I'm also more than happy to read up on any other contenders!

If it helps I could try and come up with a better illustrated example tomorrow.

8. Re: Name That Model!

Haha, yes you are quite correct, that last X should have been a y. I was so busy trying to get the math tag right I missed that. Thanks.

I'll try a few of those terms. The issue I'm getting in my searching is that most of the material that turns up doesn't really focus on (or sometimes even mention) the W matrix and its impact on the regression through to prediction intervals. Most of the material that I've turned up seems to focus on non-normal distributions. Still, if I'm barking up the right tree with my search terms I'll just have to persevere and something will turn up. I was partially worried I might have the wrong terms, in which case obviously I'd never find anything!
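
On how W carries through to prediction intervals, one standard textbook result can be written down under explicit assumptions: the errors are normal with $\operatorname{Var}(\varepsilon) = \sigma^2 W^{-1}$, W is treated as known, and the new point's error is uncorrelated with the sampled errors and has variance $\sigma^2 v_0$. The symbols $v_0$, $x_0$, $n$ and $p$ (number of coefficients) are introduced here for illustration.

```latex
\begin{aligned}
\hat{\beta} &= (X^T W X)^{-1} X^T W y,
&\operatorname{Var}(\hat{\beta}) &= \sigma^2 (X^T W X)^{-1}, \\
\hat{y}_0 &= x_0^T \hat{\beta},
&\operatorname{Var}(y_0 - \hat{y}_0) &= \sigma^2 \left( v_0 + x_0^T (X^T W X)^{-1} x_0 \right), \\
\hat{\sigma}^2 &= \frac{(y - X\hat{\beta})^T W (y - X\hat{\beta})}{n - p},
&\text{interval:}\quad &\hat{y}_0 \pm t_{n-p,\,1-\alpha/2}\,
  \sqrt{\hat{\sigma}^2 \left( v_0 + x_0^T (X^T W X)^{-1} x_0 \right)}.
\end{aligned}
```

If the new point's error is correlated with the sampled ones (a further mark of the same car, say), the uncorrelated assumption no longer holds and the prediction needs an extra correction term; when W is itself estimated, the interval is only approximate.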

9. Re: Name That Model!

So, is your concern autocorrelation? If so, first run a correlogram and examine it visually. Then test for serial correlation using, say, the Durbin-Watson test. If the issue proves to be the case, consider several models (a lagged model, differencing the DV, ARIMA, or GLS as mentioned earlier) and examine their fit and the consistency of the estimates.

Your sample size is very small, admittedly.
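
The correlogram and Durbin-Watson checks mentioned above can be computed directly from the residuals of a fitted model. A minimal NumPy sketch; the function names and the residual values are hypothetical, and the residuals must be in a meaningful order (for example Mark 1 to Mark 5).

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def sample_autocorr(residuals, max_lag):
    """Sample autocorrelations for lags 0..max_lag (the correlogram values)."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    denom = np.sum(e ** 2)
    return np.array(
        [np.sum(e[k:] * e[: len(e) - k]) / denom for k in range(max_lag + 1)]
    )

# Hypothetical ordered residuals from some fitted regression.
resid = np.array([0.5, 0.4, 0.1, -0.2, -0.4])
dw = durbin_watson(resid)
acf = sample_autocorr(resid, max_lag=2)
```

Values of the statistic near 2 suggest no first-order serial correlation, values well below 2 suggest positive correlation, and values well above 2 suggest negative correlation. With only a handful of points the check has very little power.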

11. Re: Name That Model!

Hi Kiton,

I've met autocorrelation in time series data, and while time/order is an element (a Mk 2 has to happen before a Mk 3 or a Mk 4 but after the Mk 1), it's not a time series per se, although it might be that I can adapt the theory from time series models once I've got enough of a grasp on those! It's just that some of the points have more in common with each other than others. Given that the sample size is small and not randomly sampled (although I might have the entire current population and we want to predict the next point), I can't really model all of the commonalities as variables without vastly overfitting, to the point of having more variables than data. But I'm looking to squeeze as much insight as possible out of the data, and a more holistic "correlation between points Xn and Xm" might well be a solution.
I'm looking at producing a "best estimate" (with suitable uncertainty from a prediction interval) so the business can make a more informed choice than blind chance alone, and one (hopefully) less coloured by any preconceptions. After a downselect and more investigations, a bottom-up estimate can be obtained from engineering first principles.

Thanks again for the input I'll be reading (or re-reading) around those models

JP
