
Thread: Coding Data

  1. #16
    jpkelley (TS Contributor, Vancouver, BC, Canada)

    Re: Coding Data

    Just to bring the conversation around to the OP's question...any criticism or questions about the statistical solution proposed in this thread?

  2. #17
    spunky (TS Contributor, vancouver, canada)

    Re: Coding Data

    oh no, and i mean i totally welcome the discussion because i think it's a very relevant one for any of us who work as "knowledge translators" between statisticians/quantitative methodologists and everyone else... besides, we have a history here of hijacking other people's posts and taking them off on weird tangents.

    you touch on a very important point there when you mention it's possible to use less sophisticated approaches (which is always desirable) with the huge caveat that they have to be "accurate enough for your purposes". the point that jpkelley and i are trying to make is that statistical estimates from Berley's data derived through traditional correlational methods (regular OLS regression, pearson's correlation, etc.) could end up being so biased that analyzing them through simple approaches would end up doing more harm than good.

    i'm not sure what its name is in the ecological sciences (where jpkelley is our local expert), but here in the province of social sciences/educational measurement/psychometrics it's called the "unit of analysis" problem, which is perfectly exemplified by the school-setting paradigm: should analysis be done at the student level? classroom level? school level? district level? performing analysis that ignores this clustering of the data (which can arise naturally, as in the school example, or by design, as in Berley's case) produces such bad estimates that a whole new area of statistics called hierarchical linear models/multilevel models was created just to tackle this problem. so, for starters, it is known that estimates derived from averaged data will not be accurate enough, because there's about 20 years' worth of analytical, simulation and real-data studies in the academic literature backing that up.

    which takes us to the second point. i don't think denny borsboom has ever dealt with corporate america (my assumption only. i have never asked him for his CV. he is a university professor in amsterdam) but he is well acknowledged as one of the most important living figures in the area of quantitative analysis for the social sciences and *the* most brilliant psychometrician of the post-IRT generation. the point that you make is very good, but i think that's true of any analysis. gov't agencies or private industry usually care about results, and as someone who's done internships at ETS (developers of the SATs, GREs and pretty much all the major standardised tests used in america and the world today) i understand these people end up wanting the "what" more than the "how did you get that". just as jpkelley said... why would you even mention a poisson distribution in the first place? or regression? or even the variance? you're not talking to experts here; what's relevant to them are the results of the analysis, not how you got there... because how you got there requires a certain degree of technical knowledge most people are not interested in acquiring.

    so i ask you... let's assume you're my boss and i'm your number cruncher. what if i asked you: "i can analyze this in a very simple way. it will be wrong and mostly useless, but you'll be able to follow the logic of what i did perfectly. or i can do a super-convoluted analysis that will get you excellent estimates but you won't understand *bleep* of what i did. which one do you prefer?" and if we encourage people to do the wrong thing just because it's easy we are not gonna get very far, are we?

    albert einstein once said "make things as simple as possible... but not simpler". i mean, i could also try and fit some incredibly bizarre likelihood equation with strange discontinuities to Berley's data and probably get estimates just a tiiiiny bit better than would come out of a mixed-effects regression. but the improvement from a regular OLS regression/correlation with averaged data to a mixed-effects regression is so substantial that it is called for, even if it's more complicated to implement and/or understand.
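    the unit-of-analysis point above can be sketched in a few lines of base R. everything here is made up for illustration (the clusters, sample sizes and coefficients are invented, not taken from the OP's problem): within each cluster the effect of x on y is positive, but a naive regression on cluster averages gets a strongly negative slope, while a model that accounts for clustering recovers the right answer.

    Code: 
    
    ## hypothetical sketch: within-cluster effect is positive, but averaging
    ## each cluster down to one point reverses the sign of the estimate
    set.seed(1)
    cluster_mean_x <- c(1, 2, 3, 4, 5)          # cluster-level centers of x
    df <- do.call(rbind, lapply(1:5, function(g) {
      x <- cluster_mean_x[g] + rnorm(20, sd = 0.5)
      ## y rises with x within a cluster (slope +2), falls across clusters
      y <- 2 * (x - cluster_mean_x[g]) - 3 * cluster_mean_x[g] + rnorm(20, sd = 0.5)
      data.frame(g = g, x = x, y = y)
    }))
    
    ## naive analysis on cluster averages: slope comes out strongly negative
    agg <- aggregate(cbind(x, y) ~ g, data = df, FUN = mean)
    coef(lm(y ~ x, data = agg))["x"]
    
    ## analysis that respects clustering (fixed effect per cluster):
    ## recovers the positive within-cluster slope
    coef(lm(y ~ x + factor(g), data = df))["x"]

    the fixed-effects-per-cluster model is just the simplest way to show the flip; a mixed-effects model like the ones discussed above would make the same point.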
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  3. #18
    spunky (TS Contributor, vancouver, canada)

    Re: Coding Data

    Quote Originally Posted by jpkelley View Post
    Just to bring the conversation around to the OP's question...any criticism or questions about the statistical solution proposed in this thread?
    it's kind of what i would do, but i'd need to have a look at the data to see whether poisson seems like a suitable solution or not.

  4. #19
    jpkelley (TS Contributor, Vancouver, BC, Canada)

    Re: Coding Data

    I agree. I'd like to have a look at the data as well. I wonder if the original poster might provide the forum with a fake data set?

  5. #20
    spunky (TS Contributor, vancouver, canada)

    Re: Coding Data

    Quote Originally Posted by jpkelley View Post
    I agree. I'd like to have a look at the data as well. I wonder if the original poster might provide the forum with a fake data set?
    or even better... give us the parameters and the data format and we'll simulate it... even simulated data is better than no data at all... :P

  6. #21
    jpkelley (TS Contributor, Vancouver, BC, Canada)

    Re: Coding Data

    I couldn't stand it any longer...I decided to play with R on this for a couple of minutes...

    Code: 
    
    ## install and load packages (skip the install lines if already installed)
    install.packages("lme4"); install.packages("plyr")
    library(lme4); library(plyr)
    
    ## simulate data: 11 individuals with varying numbers of offenses ##
    df <- data.frame(ind_id = rep(1:11, c(3,4,2,3,2,5,3,2,2,1,7)),
                     educ   = rep(sample(1:4, 11, replace = TRUE), c(3,4,2,3,2,5,3,2,2,1,7)))
    df <- ddply(df, .(ind_id), transform, offense_num = seq_along(ind_id))
    df$offense_grade <- df$offense_num   # assume employees get demerits in sequence (1-7)
    hist(df$offense_grade)               # all individuals combined
    
    ## Poisson GLMM (haven't thought about this too much... might be wrong) ##
    ## note: glmer(), not lmer(), is the lme4 function for generalized mixed models
    mod <- glmer(offense_grade ~ educ + (1 | ind_id), family = poisson(link = "log"), data = df)
    summary(mod)
    
    ## for those who want to change the dataset above and examine residuals, etc. ##
    par(mfrow = c(2, 2))
    plot(predict(mod), resid(mod)); abline(h = 0)
    hist(resid(mod))
    qqnorm(resid(mod))
    Maybe a poor proof of concept, if you can call it that?
    Last edited by jpkelley; 08-26-2011 at 12:14 AM. Reason: minor addition

  7. #22
    noetsi (Fortran must die)

    Re: Coding Data

    Quote Originally Posted by spunky View Post
    so i ask you... let's assume you're my boss and i'm your number cruncher. what if i asked you: "i can analyze this in a very simple way. it will be wrong and mostly useless, but you'll be able to follow the logic of what i did perfectly. or i can do a super-convoluted analysis that will get you excellent estimates but you wont understand *bleep* of what i did. which one do you prefer?" and if we encourage people to do the wrong thing just because it's easy we are not gonna get very far, arent we?
    It is rarely that simple, because the simpler method won't be wrong as far as the organization is concerned. That is, it will be right enough for what it is used for. Organizations commonly want to know which of several numbers is bigger, or what the direction of a number is, not its specific value. So the fact that one answer is closer to the true one (which you will rarely if ever know anyhow) won't matter in most organizations.

    Last summer, when I was trying to convince my boss, who knows more statistics than most bosses, of the need to run permutations to deal with missing data (which will make the answer "wrong" if data are missing not at random), I was told not to worry about "esoteric statistical" issues.
