Thread: Regression analysis(?) for multiple independent variables

  #1

    Regression analysis(?) for multiple independent variables




    Hello all,

    Apologies for posting an elementary query, but my stats is very rusty. Not looking for an explicit solution, necessarily, just a pointer in the right direction. (And if I've posted to the wrong sub-forum, I'd be grateful for suggestions.)

    I have N records. Each contains M real values (an individual's known characteristics) and one measurement of that individual's result on a particular test. I know that N>>M. As a simple example, suppose I have the age/height/weight of 1000 individuals (thus M=3, N=1000), as well as each person's time t_run on a 10km run at maximum effort. Importantly, in some cases, the person could not complete the run at all, so t_run for those records is undefined.

    I would appreciate any help with understanding the following:

    1. Assuming this data is representative of some (larger) population, what is a reasonable way to predict someone's test result (here, 10km time) as a function of known characteristics (here, age/height/weight)? Since N>>M, one idea I had was to compute the least-squares coefficients, k_m (for m = 1 .. M), such that t_run_predicted = k_1*age + k_2*height + k_3*weight.
    2. What is the correct term for the approach described in (1) -- linear regression? correlation analysis? (I just need to figure out where to start looking.)
    3. I'm concerned that setting t_run = infinity for those records where the test subject was unable to complete the run will cause problems (e.g. an undefined matrix inverse and/or pseudoinverse). Would setting t_run to, say, 10X the largest t_run recorded by anyone who completed the run be a reasonable workaround?
    4. I'm uncertain if the problem is linear in the known characteristics. For example, the run time might be roughly linear in height and weight, but quadratic in age. Is there a standard approach for estimating the best exponents (orders?) in such a polynomial, if any or all of them are not unity? (Again, I'm not necessarily asking for the answer -- just what this analysis is called, so I can try to teach myself how to do it)
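A minimal sketch of the least-squares idea in item 1, using NumPy's `np.linalg.lstsq` on synthetic, made-up data (the ranges, noise level and `true_coef` values are illustrative assumptions, not real measurements). Unlike the formula in item 1, it adds an intercept term b0, which is the usual choice in ordinary least squares; without it the fitted plane is forced through the origin. The helper names `fit_ols` and `predict` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bM] for y ~ b0 + X @ [b1..bM]."""
    A = np.column_stack([np.ones(len(X)), X])   # column of ones -> intercept b0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted values for new rows of X."""
    return coef[0] + np.asarray(X) @ coef[1:]

# Synthetic, made-up data: columns are age (years), height (cm), weight (kg).
rng = np.random.default_rng(0)
N = 1000
X = np.column_stack([
    rng.uniform(18, 70, N),
    rng.uniform(150, 200, N),
    rng.uniform(50, 110, N),
])
true_coef = np.array([20.0, 0.3, -0.05, 0.2])   # assumed "true" b0..b3
t_run = true_coef[0] + X @ true_coef[1:] + rng.normal(0, 2.0, N)

coef = fit_ols(X, t_run)
print(coef)                               # should land close to true_coef
print(predict(coef, [[35, 180, 75]]))     # prediction for one new person
```

`lstsq` solves the overdetermined system through an SVD, so it does not form an explicit matrix inverse; it still needs every t_run to be a finite number.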

    Thanks very much in advance for any pointers or suggestions!

    -Heywood
    Last edited by Heywood; 06-14-2015 at 07:08 PM. Reason: Formatting

  #2
    TS Contributor

    Location: Copenhagen, Denmark

    Re: Regression analysis(?) for multiple independent variables

    1. Multiple Linear Regression
    2. Specifically >>Multiple Linear Regression<< when M>1 for in specific M simply referred to as >>Linear Regression<<
    3. What is it you want to model? You could limit the model to people capable of completing the race, and then make another model (a logit) predicting whether or not a person will complete the race.
    4. "I'm uncertain if the problem is linear in the known characteristics."
    Not a problem for multiple linear regression. The model is called linear because it is linear in the coefficients, not in the characteristics. Put in some quadratics and check if the coefficients are significant.
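A rough sketch of the two-model suggestion in point 3 combined with the quadratic check in point 4, in Python with NumPy only. Model A is a logistic regression (logit) for whether someone finishes, fitted by Newton-Raphson on everyone; model B is ordinary least squares on finishers only, with an extra age² column, reporting classical t-statistics. All of this is illustrative: the helper names `fit_logit` and `ols_with_t` are hypothetical, the data are synthetic, and the t-statistics assume homoskedastic errors. A dedicated package such as statsmodels reports the same kind of output along with p-values.

```python
import numpy as np

def ols_with_t(A, y):
    """OLS coefficients and classical t-statistics; A must already contain an intercept column."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = A.shape[0] - A.shape[1]
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)          # assumes homoskedastic errors
    return coef, coef / np.sqrt(np.diag(cov))

def fit_logit(A, z, iters=25):
    """Logistic regression P(z = 1) = 1 / (1 + exp(-A @ w)), fitted by Newton-Raphson."""
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        grad = A.T @ (z - p)                       # gradient of the log-likelihood
        info = A.T @ (A * (p * (1 - p))[:, None])  # Fisher information (negative Hessian)
        w += np.linalg.solve(info, grad)
    return w

# Synthetic data (made up for illustration only).
rng = np.random.default_rng(1)
N = 2000
age = rng.uniform(18, 70, N)
weight = rng.uniform(50, 110, N)
finished = rng.random(N) < 1.0 / (1.0 + np.exp(-(6.0 - 0.1 * age)))
t_run = 70 - 0.8 * age + 0.015 * age**2 + 0.1 * weight + rng.normal(0, 2.0, N)
t_run[~finished] = np.nan   # non-finishers have no time at all

# Model A: probability of finishing, fitted on everyone.
A_all = np.column_stack([np.ones(N), age, weight])
w = fit_logit(A_all, finished.astype(float))

# Model B: run time among finishers only, with a quadratic age term.
A_fin = np.column_stack([np.ones(N), age, age**2, weight])[finished]
coef, t_stat = ols_with_t(A_fin, t_run[finished])
print("logit:", w)
print("OLS:", coef, "t:", t_stat)
```

Non-finishers never need a placeholder time here, which avoids the 10X workaround from the question. For large samples, |t| above roughly 2 corresponds to significance at about the 5% level, so the age² column is worth keeping when its t-statistic clears that bar.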

  The Following User Says Thank You to JesperHP For This Useful Post:

    Heywood (06-24-2015)

  #3

    Re: Regression analysis(?) for multiple independent variables


    Hi Jesper,

    Thanks for your explanations! A quick clarification:

    "when M>1 for in specific M simply referred to as >>Linear Regression<<"
    Did you mean "for non-specific M" (that is, for M not known to be a specific value)?

    My (primitive) understanding is that the term "Multiple Linear Regression" means M>1, while "Linear Regression" either means M=1 exactly (precise definition) or M>0 (informal definition). Is that right?

    "Put in some quadratics and check if coefficients are significant."
    OK, I can certainly do that. What I'm wondering is, does there exist a systematic approach to solve for the exponents explicitly, in the same way that LSQ solves for the coefficients? Or is that an ill-posed problem, regardless of how overconstrained (N>>M) the system of equations is?
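On the exponent question: once an unknown exponent p sits inside the model, e.g. t_run = b0 + b1*age^p + b2*weight, the model is no longer linear in its parameters, so there is no closed-form normal-equations solution. Estimating p is a nonlinear least-squares problem, and the Box-Tidwell procedure is a classical method aimed at exactly this kind of power transformation of a predictor. The sketch below is a simpler stand-in, assuming a single unknown exponent on age: for each candidate p on a grid, fit ordinary least squares and keep the p with the smallest residual sum of squares (this "profiles out" the linear coefficients). The model form, grid range, helper names and synthetic data are all illustrative assumptions.

```python
import numpy as np

def rss_for_power(p, age, other, y):
    """Residual sum of squares of y ~ b0 + b1*age**p + b2*other, for a fixed exponent p."""
    A = np.column_stack([np.ones(len(y)), age ** p, other])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return resid @ resid

def profile_exponent(age, other, y, grid):
    """Return the exponent on `grid` with the smallest RSS, plus all RSS values."""
    rss = np.array([rss_for_power(p, age, other, y) for p in grid])
    return grid[np.argmin(rss)], rss

# Synthetic, made-up data in which time grows like age**2.5.
rng = np.random.default_rng(2)
N = 1000
age = rng.uniform(18, 70, N)
weight = rng.uniform(50, 110, N)
y = 40 + 0.001 * age ** 2.5 + 0.1 * weight + rng.normal(0, 1.0, N)

grid = np.linspace(0.5, 4.0, 71)   # candidate exponents, step 0.05
p_hat, rss = profile_exponent(age, weight, y, grid)
print(p_hat)
```

In this synthetic example the minimum lands near the exponent used to generate the data; how sharp that minimum is on real data depends on the noise level and on how wide a range of ages the records cover. A general-purpose optimizer (for example SciPy's `curve_fit`) can replace the grid once the model form is settled.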

    Sorry if these followups are a bit obtuse. Thanks again for any suggestions,

    -Heywood
