Stepwise and all-possible-regressions

#1
Hi all,

Before I start, I just wanted to cordially thank you all for this great community. I really admire the work you have done and the answers you have provided to tough questions.

Here is a question that is tough for me; please forgive me if you have already answered it.

I have been told that stepwise regression is for dummies: people who are just starting inferential statistics for their PhD or Master's dissertation, come from other fields, and do not have good statistical analytical skills.

Therefore, I was wondering if anyone could share a table, matrix, chart, or references comparing the different types of regression.

THANK YOU so much for your great and really invaluable help and support :)

T
 

noetsi

Fortran must die
#2
Stepwise is really not a type of regression. It is a method that can be used within many types of regression (linear, logistic, etc.). It is heavily frowned on for two reasons: 1) it is heavily dependent on the sample, so results can vary significantly from sample to sample, and 2) it has no theoretical basis at all (the results you get are not tied to theory; they are found empirically in your data alone and could be entirely nonsensical). This is what makes 1) so dangerous. If you have independent variables (IVs) that are highly correlated, this method can easily leave one in and drop the other, which is anything but ideal.
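To see point 1) concretely, here is a minimal sketch (Python/numpy; the greedy forward-selection routine, the variable names, and the data-generating setup are all made up purely for illustration) of stepwise selection applied to two highly correlated predictors. Over repeated samples from the same population, which of the two predictors gets picked first tends to flip back and forth:

```python
import numpy as np

def forward_stepwise(X, y, n_keep=1):
    """Greedy forward selection: repeatedly add whichever predictor most
    reduces the residual sum of squares of an intercept-plus-predictors fit."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_keep):
        best_j, best_rss = None, np.inf
        for j in remaining:
            A = np.column_stack([np.ones(len(y)), X[:, selected + [j]]])
            beta = np.linalg.lstsq(A, y, rcond=None)[0]
            rss = np.sum((y - A @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(0)
picks = []
for _ in range(20):                      # 20 fresh samples from one population
    z = rng.normal(size=200)             # shared factor both predictors track
    x1 = z + 0.1 * rng.normal(size=200)  # x1 and x2: highly correlated copies
    x2 = z + 0.1 * rng.normal(size=200)
    y = z + rng.normal(size=200)
    picks.append(forward_stepwise(np.column_stack([x1, x2]), y)[0])
print(picks)  # which predictor "wins" typically flips from sample to sample
```

Since x1 and x2 carry essentially the same information, the "winner" in any one sample is close to a coin flip, which is exactly the sample dependence described above.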

I don't have a matrix showing regression types. The primary ones I know of are linear (OLS) and categorical (logistic or probit), but there are many variations, a high percentage of which are used only in special circumstances, such as weighted least squares or robust regression (both deal with failures of regression assumptions).
 
#3
Hi:

Thank you VERY much for your response and correction!

Based on your insightful answer, please let me correct my question for all the other people participating in this forum.

I am looking for the pros and cons of regression methods. I would greatly appreciate it if anyone could share that kind of valuable information!

Thank you so much!

T
 

noetsi

Fortran must die
#4
The pros and cons depend on what you are doing, the form your dependent variable takes, and the nature of your data. All methods have assumptions, and all are more or less useful depending on what you are trying to do. So you need to provide specifics for people to make suggestions.

Always remember, particularly since many here will suggest complicated methods :), that the best method depends on your expertise, what the people you are producing the report for like, and so on. Particularly for graduate projects, the perfect method may not be the ideal one (something I can speak to painfully).
 
#5
Thank you, Noetsi. Your answer was fast and much appreciated. However, I was looking for something like a document that lists the assumptions, if not the pros and cons, of the regression methods used in a multiple regression model.

Many thanks again! Your prompt response proves that this forum is awesome!

Christos
 

noetsi

Fortran must die
#6
I don't know where you can find an online chart for this. Tabachnick and Fidell's "Using Multivariate Statistics" provides a list of the assumptions for logistic regression (pages 441-443) and for OLS (linear regression) on pages 123-127. I strongly recommend that book: it is easy to read, provides lots of useful detail, and covers software (although any book dealing with software dates quickly).

For a much more detailed list of OLS assumptions you can use "Understanding Regression Assumptions" by William Berry
 
#7
noetsi said: "I don't know where you can find an online chart for this. Tabachnick and Fidell's 'Using Multivariate Statistics' provides a list of the assumptions..."
:D

Although the user called 'spunky' would disagree :(
 
#9
A linear regression model is a model. Ordinary least squares (OLS) is an estimation method. That's two different things.

A linear regression model can be estimated with many methods, e.g. OLS, WLS (weighted least squares), GLS (generalized least squares), and many others.

A logistic regression model can be estimated with OLS, with relatively good results, but it would be better to use maximum likelihood.

OLS, as a method, does not have any assumptions. But there are standard assumptions for the linear regression model. When those standard assumptions are violated, other estimation methods are often used.

Stepwise regression is a model selection method. But “stepwise” is “unwise”!
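A small numpy sketch of the model-versus-estimator point (the data, coefficients, and the weight function are all made-up illustrations): OLS is just the algebraic solution of the normal equations and computes fine even with deliberately non-normal errors, while WLS differs only by inserting a weight matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
# Deliberately skewed, non-normal errors: OLS still computes, because the
# estimator itself is just algebra and carries no distributional assumption.
y = 2.0 + 0.5 * X[:, 1] + rng.standard_exponential(n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations: (X'X) b = X'y

# WLS differs only by a weight matrix W (this weight function is a made-up
# illustration of an assumed variance structure, not a recommendation):
w = 1.0 / (1.0 + X[:, 1])
XtW = X.T * w                                 # equivalent to X.T @ diag(w)
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

print(beta_ols, beta_wls)
```

Both estimators target the same linear regression model; only the estimation method changed, which is exactly the distinction being drawn here.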

- - -

Another issue: I don't believe that "Tabachnick" is a genuine user. Therefore I suggest that "Tabachnick" read the agreement that we signed when we started here. Although it is highly entertaining [to read "Tabachnick's" comments], as I understand it "Tabachnick" might get in trouble with the moderators. Therefore I suggest that the moderator Dason withdraw that posting.
:)

(I might have to suffer for this in the future. :) )
 

Dason

Ambassador to the humans
#11
There are whole pages on the net from universities on the assumptions of OLS regression... :p
Sure, but as Greta said, those are assumptions for the model, not for fitting using OLS. OLS itself doesn't assume normally distributed errors. But to make inferences about the parameters we typically assume normally distributed errors, so that we have a sampling distribution for the parameters of interest.
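A small numpy sketch of that distinction (simulated data, illustrative only): the point estimate comes straight from the normal equations with no distributional assumption, while the usual t statistics additionally lean on the homoskedastic, normal-error assumptions through Var(beta-hat) = sigma^2 (X'X)^-1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)   # true slope = 2

beta = np.linalg.solve(X.T @ X, X.T @ y)       # point estimate: pure algebra
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])      # error-variance estimate
# The standard errors below are where the normal/homoskedastic error
# assumptions enter: they justify sigma^2 * (X'X)^-1 as Var(beta-hat).
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se
print(t_stats)
```

The first line of linear algebra runs no matter how the errors are distributed; it is only the t statistics (and the p-values built on them) that need the extra assumptions.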
 

noetsi

Fortran must die
#12
My point was that it's unusual to make the distinction Greta made. Books, articles, etc. simply talk about the assumptions of OLS regression, not of it as (or not as) a model.

I wasn't actually thinking of normality. I was thinking of homoskedasticity, linearity, independence, etc.
 

noetsi

Fortran must die
#14
On page 438 (among other places) of Applied Regression Analysis and Generalized Linear Models, a rather large statistics book, John Fox notes, "Because the errors in equation 16.13...the transformed equation can legitimately be fit by OLS regression."

In many places he uses that term without qualifying it as a technique.

Other books have it both ways. Allison, in "Logistic Regression Using SAS," first says, "...it was common to see published research that used ordinary least squares (OLS) linear regression..."
Then on the same page he says, "No reputable social science journal would publish an article that used OLS regression with dichotomous dependent variables." In both cases he clearly means linear regression.
Other examples:

What is OLS regression?

http://www.chsbs.cmich.edu/fattah/courses/empirical/29.html

Before we look at these approaches, let's look at a standard OLS regression using the elementary school academic performance index (elemapi2.dta) dataset.
http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter4/statareg4.htm

Stata is a popular alternative to SPSS, especially for more advanced statistical techniques. This handout shows you how Stata can be used for OLS regression.
https://www3.nd.edu/~rwilliam/stats1/OLS-Stata9.pdf

Interpreting OLS regression and transformations
http://pages.uoregon.edu/aarong/teaching/V3212_Outline/node8.html

Here is a journal article
"While ordinary least square (OLS) regression....
http://www.jstor.org/discover/10.23...2&uid=70&uid=4&uid=3739256&sid=21101589330883

Here is a sage book
OLS Regression With a Nonnormal Error Structure
http://srmo.sagepub.com/view/bootstrapping/d18.xml?rskey=mQySDo&row=13

Here is another academic article
Homoskedasticity is an important assumption in ordinary least squares (OLS) regression.
http://www.afhayes.com/public/BRM2007.pdf



I will stop there. Note that I am not arguing that this is the correct usage, just that it is common: people say "OLS regression" when they actually mean linear regression employing OLS as the estimation technique. It is confusing to those who don't catch this.