
Thread: linear model estimation: how many regressors to choose?

  1. #1

    Hi,

    Suppose I have a time course that I need to explain (it's an fMRI brain-imaging example). The time course contains two types of conditions, and one condition influences it twice as strongly as the other (let's say 1 versus 0.5). In addition, there are some "zero" (baseline) time points where nothing happens. I can model the time course in two different ways:
    1) One regressor: I put "1" at the time points of condition_1 and "0.5" at the time points of condition_2. Everything else is "0".
    2) Two regressors: in regressor no. 1 I put "1" at the time points of condition_1, and in regressor no. 2 I put "1" at the time points of condition_2. Everything else is "0".
    Should my choice of design result in a different residual (unexplained) error? I have hundreds of data points, so adding one more regressor doesn't hurt the power.
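    Written out as equations (a sketch; d_1t and d_2t are hypothetical names for the 0/1 indicators of condition_1 and condition_2 at time point t, and N is the number of time points), design 1 is simply design 2 with the constraint beta_2 = 0.5 beta_1. Least squares over a constrained model can never fit better than over the unconstrained one, so design 2's residual sum of squares is never larger. The F statistic in the last line is the standard nested-model test of that constraint, assuming independent Gaussian errors:

```latex
\begin{aligned}
\text{Design 1:}\quad & y_t = \beta_0 + \beta_1\,(d_{1t} + 0.5\,d_{2t}) + \varepsilon_t,\\
\text{Design 2:}\quad & y_t = \beta_0 + \beta_1\,d_{1t} + \beta_2\,d_{2t} + \varepsilon_t,\\
\text{Design 1} \equiv \text{Design 2 with } \beta_2 = 0.5\,\beta_1:\quad & \mathrm{RSS}_2 \le \mathrm{RSS}_1,\\
\text{test of } H_0\colon \beta_2 = 0.5\,\beta_1:\quad & F = \frac{\mathrm{RSS}_1 - \mathrm{RSS}_2}{\mathrm{RSS}_2/(N-3)} \sim F_{1,\,N-3}.
\end{aligned}
```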

    Below is a MATLAB simulation, in which I get comparable results with both designs.

    Thanks a lot,
    Vadim

    function RegressionFitSimulation
    % Compare a one-regressor (scaled) design with a two-regressor
    % (one indicator per condition) design on simulated data.
    % Requires the Statistics Toolbox (normrnd, regress).

    % Parameters
    N = 100;            % number of time points
    N_cond1 = 40;       % time points in condition 1 (true effect 1)
    N_cond2 = 40;       % time points in condition 2 (true effect 0.5)
    Noise_Sigma = 0.3;  % standard deviation of the Gaussian noise

    % Randomly assign time points to conditions
    perm_index_arr = randperm(N);
    cond1_indexes = perm_index_arr(1:N_cond1);
    cond2_indexes = perm_index_arr(N_cond1+1:N_cond1+N_cond2);
    cond0_indexes = perm_index_arr(N_cond1+N_cond2+1:N);

    % Simulate the Y time course as a column vector. The index arrays
    % already hold positions in 1..N, so they index Y directly.
    Y = zeros(N,1);
    Y(cond1_indexes) = 1;
    Y(cond2_indexes) = 0.5;
    Y(cond0_indexes) = 0;
    Y = Y + normrnd(0,Noise_Sigma,N,1);

    % Model with one regressor: 1 for condition 1, 0.5 for condition 2
    x1 = zeros(N,1);
    x1(cond1_indexes) = 1;
    x1(cond2_indexes) = 0.5;
    X = [ones(N,1) x1];
    [b,~,r] = regress(Y,X);
    disp(['Beta0 (intercept): ' num2str(b(1)) '  Beta1: ' num2str(b(2)) ...
          '  Sum of squared residuals: ' num2str(sum(r.^2))]);

    % Model with two regressors: one 0/1 indicator per condition
    x1 = zeros(N,1);
    x2 = zeros(N,1);
    x1(cond1_indexes) = 1;
    x2(cond2_indexes) = 1;
    X = [ones(N,1) x1 x2];
    [b,~,r] = regress(Y,X);
    disp(['Beta0 (intercept): ' num2str(b(1)) '  Beta1: ' num2str(b(2)) ...
          '  Beta2: ' num2str(b(3)) '  Sum of squared residuals: ' num2str(sum(r.^2))]);
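    Rather than eyeballing the two residual sums of squares, one option is a nested-model F-test of the constraint beta2 = 0.5*beta1 (the one-regressor design against the two-regressor design). The sketch below is standalone: it re-simulates data along the lines of the simulation above, uses the backslash operator instead of regress so no toolbox is needed, and gets the p-value from betainc. It assumes independent Gaussian noise, an assumption that fMRI noise, which is usually temporally autocorrelated, may violate.

```matlab
% Nested-model F-test: one scaled regressor vs one indicator per condition.
% Core MATLAB/Octave only (no toolbox functions).
N = 100; N_cond1 = 40; N_cond2 = 40; Noise_Sigma = 0.3;

% Simulated data with true effects 1 and 0.5
perm_index_arr = randperm(N);
cond1_indexes = perm_index_arr(1:N_cond1);
cond2_indexes = perm_index_arr(N_cond1+1:N_cond1+N_cond2);
Y = zeros(N,1);
Y(cond1_indexes) = 1;
Y(cond2_indexes) = 0.5;
Y = Y + Noise_Sigma*randn(N,1);

% 0/1 indicator columns for the two conditions
d1 = zeros(N,1); d1(cond1_indexes) = 1;
d2 = zeros(N,1); d2(cond2_indexes) = 1;

X1 = [ones(N,1) d1 + 0.5*d2];   % design 1: 2 parameters
X2 = [ones(N,1) d1 d2];         % design 2: 3 parameters

% Residual sum of squares of an ordinary least-squares fit
rss = @(X) sum((Y - X*(X\Y)).^2);
RSS1 = rss(X1);
RSS2 = rss(X2);

% F statistic for the single constraint beta2 = 0.5*beta1
df1 = size(X2,2) - size(X1,2);  % numerator degrees of freedom (1)
df2 = N - size(X2,2);           % residual degrees of freedom of design 2
F = ((RSS1 - RSS2)/df1) / (RSS2/df2);
F = max(F, 0);                  % guard against tiny negative values from rounding

% Upper-tail F probability via the regularized incomplete beta function
p = betainc(df2/(df2 + df1*F), df2/2, df1/2);

fprintf('RSS1 = %.4f, RSS2 = %.4f, F(%d,%d) = %.3f, p = %.3f\n', ...
        RSS1, RSS2, df1, df2, F, p);
```

    A small p-value would suggest the data do not support the fixed 1 : 0.5 weighting; a large one means the simpler design fits about as well.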

  2. #2
    CowboyBear (TS Contributor)
    The way I see it, the choice depends on what kind of scale the conditions are on. Let's say your conditions are days of stroke rehab therapy received. If:

    Condition 0 = no rehab therapy received
    Condition 1 = 10 days of rehab therapy received
    Condition 2 = 20 days of rehab therapy received

    Then it would totally make sense to use one regressor, with multiple levels. If on the other hand you had something like:

    Condition 0 = no stroke rehab therapy received
    Condition 1 = Conventional stroke rehab received
    Condition 2 = Chinese-medicine-style stroke rehab received

    Then you would need two regressors: there's no way you could justify saying that Chinese-medicine stroke rehab is 'twice as much' stroke rehab as conventional rehab. So the question is, do your conditions represent different levels of the same variable, or different variables entirely?
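    In code, the difference between the two situations is only in how the condition labels become columns of the design matrix. A minimal sketch, assuming a hypothetical label vector `labels` (0 = no therapy, 1 and 2 = the two therapy conditions) and the 0/10/20-day values from the first example; in the second example those numbers would be meaningless, so only the indicator coding applies:

```matlab
% Hypothetical condition labels, one per observation
labels = [0 1 2 2 1 0 1 2]';
n = numel(labels);

% Interval-scale coding (one regressor): label -> days of therapy
days = [0 10 20];               % values for labels 0, 1, 2
x_scalar = days(labels + 1);
x_scalar = x_scalar(:);         % force a column vector

% Nominal coding (two regressors): one 0/1 indicator per non-baseline condition
x_dummy = double([labels == 1, labels == 2]);

% Design matrices with an intercept column
X_one = [ones(n,1) x_scalar];   % 2 columns
X_two = [ones(n,1) x_dummy];    % 3 columns
```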

    As far as effects on the final analysis go: having two regressors rather than one uses up an extra model degree of freedom (the residual degrees of freedom drop by one), which can reduce statistical power, but it will also give an R² at least as high, and usually higher (possibly just because chance effects are better captured by the more complex equation, possibly because of a non-linear relationship between condition level and the dependent variable).
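    The R² trade-off can be seen by comparing plain R² with adjusted R², which divides the residual and total sums of squares by their degrees of freedom and so penalizes the extra regressor. A sketch on simulated data, assuming true effects of 1 and 0.5 (so the one-regressor design is correctly specified) and the same 40/40/20 split as the simulation in the first post:

```matlab
N = 100; Noise_Sigma = 0.3;
labels = [ones(40,1); 2*ones(40,1); zeros(20,1)];   % condition per time point
labels = labels(randperm(N));
d1 = double(labels == 1);
d2 = double(labels == 2);
Y = d1 + 0.5*d2 + Noise_Sigma*randn(N,1);           % true effects 1 and 0.5

designs = {[ones(N,1) d1 + 0.5*d2], [ones(N,1) d1 d2]};
TSS = sum((Y - mean(Y)).^2);                        % total sum of squares
R2 = zeros(1,2);
R2adj = zeros(1,2);
for k = 1:2
    X = designs{k};
    p = size(X,2);                                  % parameters incl. intercept
    RSS = sum((Y - X*(X\Y)).^2);
    R2(k) = 1 - RSS/TSS;
    R2adj(k) = 1 - (RSS/(N - p)) / (TSS/(N - 1));
end
fprintf('Design %d: R^2 = %.4f, adjusted R^2 = %.4f\n', [1:2; R2; R2adj]);
```

    Plain R² of design 2 is never below that of design 1, while the adjusted values can go either way.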

  3. #3
    Thank you for your reply.

    However, in your example for one regressor (10 vs. 20 days of rehab therapy), what we should actually care about is not whether one regressor value is twice the other, but whether the Y values at regressor = 20 are twice those at regressor = 10. If that's not the case, then my fit will be worse than with two regressors. Am I correct?

    In general, it looks to me that unless I have to worry about degrees of freedom, I'd better use two regressors. I don't see what the advantage of using a single regressor would be.
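    The first question can be checked by simulation. In this sketch the true condition-2 effect is 0.9 rather than half of condition 1's effect (0.9 is an assumed value chosen only for illustration), and a baseline level of 2 is added to show that, with an intercept in the model, "twice as large" refers to the effect relative to the baseline time points, not to the raw Y values:

```matlab
N = 100; Noise_Sigma = 0.3;
true_effect2 = 0.9;   % assumed; deliberately NOT half of condition 1's effect

labels = [ones(40,1); 2*ones(40,1); zeros(20,1)];
labels = labels(randperm(N));
d1 = double(labels == 1);
d2 = double(labels == 2);
Y = 2 + d1 + true_effect2*d2 + Noise_Sigma*randn(N,1);   % baseline level 2

X1 = [ones(N,1) d1 + 0.5*d2];   % design 1: effect ratio fixed at 1 : 0.5
X2 = [ones(N,1) d1 d2];         % design 2: each effect estimated freely

b1 = X1\Y;
b2 = X2\Y;
RSS1 = sum((Y - X1*b1).^2);
RSS2 = sum((Y - X2*b2).^2);
fprintf('Design 1: RSS = %.3f\n', RSS1);
fprintf('Design 2: RSS = %.3f, effects = %.3f and %.3f (ratio %.2f)\n', ...
        RSS2, b2(2), b2(3), b2(3)/b2(2));
```

    When the assumed ratio is wrong, design 1 typically leaves a noticeably larger residual sum of squares; when it is right, the two are close.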

  4. #4
    CowboyBear (TS Contributor)

    Quote (originally posted by vadim1508_1):
    Thank you for your reply.

    However, in your example for one regressor (10 vs. 20 days of rehab therapy), what we should actually care about is not whether one regressor value is twice the other, but whether the Y values at regressor = 20 are twice those at regressor = 10. If that's not the case, then my fit will be worse than with two regressors. Am I correct?

    Hmm, I don't think that's quite right, though I guess I confused the issue a bit by bringing scaling into the discussion! It's the practical meaningfulness and equivalence of the intervals between values of variable x that determine whether you can treat x as measured at the interval level; which values of y the x values go with isn't really key to answering this question.

    Anyway, if you can say to yourself that the different conditions represent different levels of ONE variable, and you've measured that variable on an interval-level scale, go for one scalar regressor. On the other hand, if you reckon that the different conditions represent different levels of one variable, but you can't justify an interval-level measurement assumption, OR the different conditions are best considered as two entirely different variables, you can go for either two regressors or one nominally-specified regressor (these should be equivalent, I think).
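    Statistical software typically handles a single nominal (categorical) predictor with k levels by expanding it into k - 1 indicator columns plus the intercept, so with three conditions it spans the same column space as an intercept plus two regressors. A sketch with made-up data (labels and effect sizes are assumptions) checking that the choice of reference level, or dropping the intercept in favour of one indicator per condition, leaves the fitted values unchanged:

```matlab
N = 100;
labels = [ones(40,1); 2*ones(40,1); zeros(20,1)];   % conditions 0, 1, 2
labels = labels(randperm(N));
Y = double(labels == 1) + 0.5*double(labels == 2) + 0.3*randn(N,1);

ind = @(level) double(labels == level);   % 0/1 indicator for one level

XA = [ones(N,1) ind(1) ind(2)];   % reference level = condition 0 (baseline)
XB = [ones(N,1) ind(0) ind(2)];   % reference level = condition 1
XC = [ind(0) ind(1) ind(2)];      % no intercept, one column per level

fitA = XA*(XA\Y);
fitB = XB*(XB\Y);
fitC = XC*(XC\Y);

% All three codings span the same column space, so the fits agree
fprintf('max |fitA - fitB| = %.2e, max |fitA - fitC| = %.2e\n', ...
        max(abs(fitA - fitB)), max(abs(fitA - fitC)));
```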

    Time for coffee for me, good luck with everything
