Thread: Finding means and regression coefficients of dataset while controlling for outliers?

  1. #1
    phoela

    Finding means and regression coefficients of dataset while controlling for outliers?

    Edit: in the title I meant CONFOUNDERS, not outliers, sorry!!

    Hi,

    I have a dataset which consists of around 50 people, split into 3 groups. Each of those people has answered a 10-question IQ-type test with numerical results. However, I have two confounders that I've already shown are significant - ethnicity (white/black/other) and position (1/2/3).

    I need to find the means and regression data of each group for each question, controlling for ethnicity and then, separately, controlling for both ethnicity and position.

    I've read a fair bit of documentation about this but for some reason I just can't get my head around it. The main type of multiple regression I know how to do is stepwise, but that only allows for one DV at once and the output (I'm using SPSS) doesn't look right to me. I can't work out how to get the mean values at all! Am I just being thick? Does anyone here know the most appropriate test to use for this?

    Really sorry in advance if this isn't the right forum. I was weighing up whether to post here or in the regression forum, but I'm not sure I need a regression to find the means, so I eventually decided to post here. I'm happy to delete or move this if it belongs somewhere else. Thanks!
    Last edited by phoela; 08-24-2016 at 10:05 AM.

  2. #2
    kiton

    Quote (originally posted by phoela):
    Hi,

    I have a dataset which consists of around 50 people, split into 3 groups. Each of those people has answered a 10-question IQ-type test with numerical results. However, I have two outliers that I've already shown are significant - ethnicity (white/black/other) and position (1/2/3).
    Outliers typically refer to individual data points. For example, among the 50 people in your sample, a few might have extremely high (or low) values in their responses. So I assume you meant that ethnicity and position are two control variables that by themselves significantly affect the outcome (DV).

    Quote (originally posted by phoela):
    I need to find the means and regression data of each group for each question, controlling for ethnicity and then, separately, controlling for both ethnicity and position.
    To find group means, you can simply run descriptive statistics on the variables of interest while specifying, say, an -if- condition for each group. Now, to estimate the coefficient of your predictor of interest (do you have a specific one?) while controlling for ethnicity and position, you do need to run a regression analysis.
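    As a minimal sketch of the "descriptives per group" idea: the thread uses SPSS, so the Python below, the field names `group` and `q1`, and the toy scores are all hypothetical. Computing a mean separately within each group amounts to filtering the rows by group and averaging. These are unadjusted means only; nothing here controls for ethnicity or position.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: one row per participant.
# "group" is the participant group; "q1" is the score on question 1.
rows = [
    {"group": "A", "q1": 4}, {"group": "A", "q1": 6},
    {"group": "B", "q1": 5}, {"group": "B", "q1": 7},
    {"group": "C", "q1": 8}, {"group": "C", "q1": 10},
]

def group_means(rows, group_key, value_key):
    """Return {group: mean of value_key}, computed separately within each group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group_key]].append(row[value_key])
    return {g: mean(vals) for g, vals in sorted(buckets.items())}

print(group_means(rows, "group", "q1"))
```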


    Quote (originally posted by phoela):
    I've read a fair bit of documentation about this but for some reason I just can't get my head around it. The main type of multiple regression I know how to do is stepwise, but that only allows for one DV at once and the output (I'm using SPSS) doesn't look right to me. I can't work out how to get the mean values at all! Am I just being thick? Does anyone here know the most appropriate test to use for this?
    Assuming you have one numerical DV, you can run a multiple linear regression model. It seems that all your data on the 10-question test pertain to a single DV (not 10 separate ones); call it "IQ". Consequently, you need to reduce the dimensions from 10 to 1. This can be done either simply, by averaging all responses, or (more involved) by running a factor analysis on them. Once you have reduced your response variable to one dimension, you can run a multiple regression (stepwise or another type shouldn't matter at this stage). Note that once you have a single variable for your DV, you'll also be able to calculate its mean by group.
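    A rough sketch of the simple averaging route, assuming hypothetical field names (`group`, `items`, `iq`) and made-up scores, with only 3 items shown instead of 10: each person's item responses are collapsed into one composite score, whose mean can then be taken per group. A factor score would replace the averaging step if you went the factor-analysis route; that is not shown here.

```python
from statistics import mean

# Hypothetical toy data: each participant has a group label and item scores.
# Only 3 items are shown to keep it short; the test in the thread has 10.
participants = [
    {"group": "A", "items": [1, 2, 3]},
    {"group": "A", "items": [3, 4, 5]},
    {"group": "B", "items": [2, 2, 2]},
    {"group": "B", "items": [6, 6, 6]},
]

# Step 1: collapse the item responses into one composite score per person
# (the "average all responses" option).
for p in participants:
    p["iq"] = mean(p["items"])

# Step 2: the composite is now a single DV, so its mean can be taken per group.
def composite_group_means(participants):
    groups = {}
    for p in participants:
        groups.setdefault(p["group"], []).append(p["iq"])
    return {g: mean(scores) for g, scores in groups.items()}

print(composite_group_means(participants))
```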

    Hope this clarifies your problem a bit.

  3. #3
    phoela

    Hey, thank you so much for your reply! You're right, I completely derped out there - I meant confounders, not outliers. I've edited my post to reflect that!

    To find group means you can simply run the descriptive statistics on the variables of interest while specifying, say, -if- option for each group.
    I feel like I'm probably missing something obvious, but I just can't work out how to find the descriptives while controlling for confounders! What do you mean by specifying the -if- option?

    Assuming you have one numerical DV, you can run a linear multiple regression model. It seems that all your data on the 10-question test pertain to one single DV (not 10 separate ones) -- call it "IQ". Consequently, there seems to be a need to reduce the dimensions from 10 to 1.
    Ah, sorry, I should have been clearer. Although the 10 questions make up one single test, the interest is in seeing how participants do on the individual questions, as they all measure slightly different things. My task is to produce three tables: one with the unadjusted mean scores of each group for each question, one with those scores adjusted for the confounding effect of ethnicity, and one with those scores adjusted for the effects of both ethnicity and position. I have the first table, but the other two I'm struggling with!

    I could easily just run separate tests for each question, though! Would that still work just as well? My main issue is this: I understand how to run a regression of, say, the result of question 1 on ethnicity. The part I'm stuck on is how to also account for participant group. Is there a way to do this, or should I split my dataset down further and run the tests for a single group at a time?

    Thank you again for your help, and I'm sorry to bombard you with questions! Statistics isn't my strongest suit!

  4. #4
    kiton

    Let's look deeper into this. First, let's clarify what your model is:

    y = a + b1*x1 + b2*x2 + u    (1)

    where y is the 10-item IQ test result, x1 is ethnicity, x2 is position, a is a constant, b1 and b2 are their coefficients, and u is the error term.

    Do you have any other predictors of interest? I ask because "confounding effects of other variables" typically implies estimating the effect of some other variable in the model in addition to the controls, as if you had, say, an x3 in the model. As such, "descriptives while controlling for confounders" sounds implausible to me. Do you have some "group" variable as a third predictor? If you do, then it makes sense to estimate the mean score of each group with and without the controls. Note, though, that your controls are categorical (i.e., each has 3 categories) and should be specified in SPSS accordingly; otherwise, SPSS will treat them as numeric (a single slope across the codes 1, 2, 3) and give wrong estimates.
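    To illustrate what specifying the controls as categorical means, here is a minimal Python sketch of dummy (indicator) coding. The `dummy_code` helper is hypothetical, and "white" is picked arbitrarily as the reference category. A 3-level variable becomes two 0/1 columns, so each non-reference level gets its own coefficient, whereas entering the codes 1/2/3 as one numeric column would force a single, equally spaced slope across them.

```python
# Hypothetical sketch of dummy (indicator) coding for a 3-level categorical
# variable such as ethnicity (white/black/other). The reference level gets
# no column; every other level gets its own 0/1 indicator column.

def dummy_code(values, levels, reference):
    """Return (column names, rows of 0/1 indicators), dropping `reference`."""
    kept = [lvl for lvl in levels if lvl != reference]
    if len(kept) != len(levels) - 1:
        raise ValueError("reference must be one of the levels")
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows

ethnicity = ["white", "black", "other", "white"]
names, X = dummy_code(ethnicity, ["white", "black", "other"], reference="white")
print(names)
for row in X:
    print(row)
```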

    If you want to run a model with 10 DVs (each test item being a DV), then you should be looking at multivariate multiple regression rather than univariate regression. SPSS can run it (search online for a guide), and it isn't much more complicated than the univariate model you are running. Ideally, given that your IVs are categorical, you should be looking at MANOVA (though regression can also do the job if the predictor types are correctly specified).
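    For intuition about the multivariate case, here is a minimal pure-Python sketch (hypothetical data and helper names) of the coefficient estimates of a multivariate multiple regression, B = (X'X)^-1 X'Y, where Y has one column per DV. Column by column, these point estimates are the same as running a separate OLS regression per DV; what the multivariate model or MANOVA adds is joint tests across the DVs (e.g., Wilks' lambda or Pillai's trace), which this sketch does not compute.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design: check for redundant dummy columns")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def multivariate_ols(X, Y):
    """Coefficient matrix B = (X'X)^-1 X'Y, returned as one coefficient list per DV.

    Y holds one row per participant and one column per DV (e.g. per test item).
    """
    n_dv = len(Y[0])
    return [ols(X, [row[j] for row in Y]) for j in range(n_dv)]

# Hypothetical toy data: intercept plus one 0/1 dummy (group B vs reference A),
# and two DVs standing in for two test items.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
Y = [[1, 0], [3, 2], [4, -1], [6, 1]]

B = multivariate_ols(X, Y)
for j, coefs in enumerate(B, start=1):
    print(f"item {j}: intercept = {coefs[0]:.2f}, B vs A = {coefs[1]:.2f}")
```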

  5. #5
    phoela

    I'm sorry, I haven't really explained it clearly. I have 3 participant groups, 2 confounders and 10 DVs. Participant group is my IV. Basically, I have to fill in 2 tables that look like this:

    Question   ||  Group A  ||  Group B  ||  Group C  ||
    ====================================================
    Question 1 ||  μ  |  β  ||  μ  |  β  ||  μ  |  β  ||
    ----------------------------------------------------
    Question 2 ||  μ  |  β  ||  μ  |  β  ||  μ  |  β  ||
    ----------------------------------------------------
    etc.       ||  μ  |  β  ||  μ  |  β  ||  μ  |  β  ||

    where μ is the group mean and β the regression coefficient, with the first table adjusted for ethnicity and the second adjusted for both ethnicity and position.

    So my problem is not just finding the regression coefficient of Question 1 while adjusting for confounders, it's finding the regression coefficient of Question 1 for Group A while adjusting for confounders... and then Group B, Group C, and so on for every question and every group. I'm assuming I could just split the data by group and run separate analyses that way, but is there another way to do it?

  6. #6
    kiton


    Quote (originally posted by phoela):
    I'm sorry, I haven't really explained it clearly. I have 3 participant groups, 2 confounders and 10 DVs. Participant group is my IV.
    Now we're talking! So, the model to be estimated is y = a + b1*x1 + b2*x2 + b3*x3 + u, where y is represented by the 10 items, x1 and x2 are the controls (ethnicity and position), and x3 (participant group) is the main predictor of interest. All regressors are categorical, so each b stands for a set of dummy coefficients. Basically, you need to run three models: y = a + b3*x3 + u (the "uncontrolled" effect of x3); y = a + b1*x1 + b3*x3 + u (controlled for x1); and y = a + b1*x1 + b2*x2 + b3*x3 + u (controlled for x1 and x2).
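    A minimal sketch of the three nested models for a single question, in pure Python so it runs without extra packages. Everything here is hypothetical: the helper names, the choice of A, "white" and position 1 as reference categories, and the toy scores, which are generated from known effects so the fully adjusted fit can be checked. The loop over `MODELS` mirrors "uncontrolled", "controlled for x1" and "controlled for x1 and x2"; with real data you would repeat it for each of the 10 questions.

```python
from itertools import product

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design: check for redundant dummy columns")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def dummies(value, levels):
    """0/1 indicators for every level except levels[0], the reference category."""
    return [1.0 if value == lvl else 0.0 for lvl in levels[1:]]

GROUPS = ["A", "B", "C"]                 # x3 (participant group), reference A
ETHNICITY = ["white", "black", "other"]  # x1, reference white
POSITION = [1, 2, 3]                     # x2, reference 1

# Hypothetical scores for ONE question, generated from known effects.
# Group C is deliberately over-represented among "other", so ethnicity confounds group.
rows = []
for g, e, p in product(GROUPS, ETHNICITY, POSITION):
    score = (10 + 2 * (g == "B") + 4 * (g == "C")
             + 1 * (e == "black") - 1 * (e == "other")
             + 0.5 * (p == 2) + 1.5 * (p == 3))
    copies = 4 if (g == "C" and e == "other") else 1
    rows.extend([(g, e, p, score)] * copies)

def design_row(g, e, p, use_eth, use_pos):
    x = [1.0] + dummies(g, GROUPS)       # intercept, B vs A, C vs A
    if use_eth:
        x += dummies(e, ETHNICITY)
    if use_pos:
        x += dummies(p, POSITION)
    return x

MODELS = {
    "y = a + x3": (False, False),
    "y = a + x1 + x3": (True, False),
    "y = a + x1 + x2 + x3": (True, True),
}

fits = {}
for name, (use_eth, use_pos) in MODELS.items():
    X = [design_row(g, e, p, use_eth, use_pos) for g, e, p, _ in rows]
    y = [r[3] for r in rows]
    b = ols(X, y)
    fits[name] = b
    print(f"{name:22s} B vs A: {b[1]:6.3f}   C vs A: {b[2]:6.3f}")
```

    With this toy data, the unadjusted C-vs-A difference comes out at 3.5 and both adjusted models give 4, because group C's extra "other" participants pull its raw mean down. Position happens to be balanced within every group-by-ethnicity cell here, so adding it leaves the group coefficients unchanged; real data need not behave that way.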

    Then, given the requirements for your tables, the task is to explore how the means and regression coefficients (betas) change depending on the group of x3 (i.e., A, B, or C).
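    One common way to fill the μ column of an adjusted table is an adjusted (estimated marginal) mean: the model's prediction for each group with the confounder dummies held at their overall sample means. This is a minimal sketch with hypothetical coefficients and proportions. Software packages differ in how they average over confounder levels (for example, equal weights per level rather than sample proportions), so figures from SPSS may not follow exactly this convention.

```python
# Hypothetical fitted coefficients from y = a + b1*x1 + b3*x3 + u, with group A
# and ethnicity "white" as the reference categories (illustrative values only).
coef = {"intercept": 10.0, "B": 2.0, "C": 4.0, "black": 1.0, "other": -1.0}

# Hypothetical sample means of the ethnicity dummy columns, i.e. the share of
# all participants who are "black" and "other" (white = the remaining 25%).
dummy_means = {"black": 0.25, "other": 0.5}

def adjusted_mean(group):
    """Predicted score for `group` with the ethnicity dummies held at their sample means."""
    group_part = coef["intercept"] + coef.get(group, 0.0)  # reference group A adds nothing
    confounder_part = sum(coef[name] * share for name, share in dummy_means.items())
    return group_part + confounder_part

for g in ["A", "B", "C"]:
    print(g, adjusted_mean(g))
```

    Because the confounder part is the same for every group, the adjusted means differ from each other by exactly the group coefficients (here C minus A is 4.0), which is why the μ and β columns carry closely related information.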

    Quote (originally posted by phoela):
    So my problem is not just finding the regression coefficient of Question 1 while adjusting for confounders, it's finding the regression coefficient of Question 1 for Group A while adjusting for confounders.. and then group B, group C, and etc for every question and every group. I'm assuming I could just split the data by group and run separate analyses that way, but is there a way to do it otherwise?
    Although you could split the data set into three parts, I wouldn't. Not only does it leave each model with much less data (about a third of your ~50 participants), which makes the estimates less precise, but it is also unnecessary. If you run a multivariate regression, or preferably a MANOVA, your categorical predictor of interest, x3, will give you the answers you are looking for. That is, you will be able to estimate whether the mean of your outcome(s) differs by group (note that the results are interpreted relative to a reference group that you choose). For a starting point on MANOVA in SPSS, try this video: https://www.youtube.com/watch?v=3pzCa4Whv74
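    To make the reference-group point concrete: in the unadjusted model y = a + b3*x3 + u with dummy-coded groups, OLS puts the reference group's mean in the intercept and each other group's difference from it in its coefficient. This is a minimal sketch with hypothetical scores and a hypothetical helper, showing how switching the reference group from A to C re-expresses the same information.

```python
from statistics import mean

# Hypothetical question-1 scores by participant group.
scores = {"A": [4, 6], "B": [5, 7], "C": [8, 10]}

def dummy_model_coefficients(scores, reference):
    """Intercept and group coefficients of y = a + group dummies + u.

    For this one-factor model, OLS gives the reference group's mean as the
    intercept and each other group's (mean - reference mean) as its coefficient.
    """
    means = {g: mean(v) for g, v in scores.items()}
    a = means[reference]
    betas = {g: m - a for g, m in means.items() if g != reference}
    return a, betas

print(dummy_model_coefficients(scores, reference="A"))
print(dummy_model_coefficients(scores, reference="C"))
```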
