Coding Data

jpkelley

TS Contributor
#21
I couldn't stand it any longer...I decided to play with R on this for a couple of minutes...

Code:
## install packages
install.packages("lme4"); install.packages("plyr")
library(lme4); library(plyr)

## simulate data ##
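set.seed(1)    # make the randomly sampled education levels reproducible
n_off <- c(3,4,2,3,2,5,3,2,2,1,7)    # number of offenses for each of the 11 individuals
df <- data.frame(ind_id=rep(1:11, n_off), educ=rep(sample(1:4, 11, replace=TRUE), n_off))
df <- ddply(df, .(ind_id), transform, offense_num=seq_along(ind_id))    # number offenses 1..n within each individual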
df$offense_grade<-df$offense_num     # assume that employees get demerits in sequence (1-7)
hist(df$offense_grade)    ## all individuals combined

## GLMM (haven't thought about this too much...might be wrong) ##
## note: calling lmer() with a family argument is deprecated in current lme4, so use glmer()
mod <- glmer(offense_grade ~ educ + (1|ind_id), family=poisson(link="log"), data=df)
summary(mod)

## Just for those who want to change the dataset above and examine residuals, etc. ###
par(mfrow=c(2,2))
plot(predict(mod), resid(mod)); abline(h=0)
hist(resid(mod))
qqnorm(resid(mod))
Maybe a poor proof of concept, if you can call it that?
 

noetsi

Fortran must die
#22
so i ask you... let's assume you're my boss and i'm your number cruncher. what if i asked you: "i can analyze this in a very simple way. it will be wrong and mostly useless, but you'll be able to follow the logic of what i did perfectly. or i can do a super-convoluted analysis that will get you excellent estimates but you won't understand *bleep* of what i did. which one do you prefer?" and if we encourage people to do the wrong thing just because it's easy we are not gonna get very far, are we?
It is rarely that simple, because the simpler method won't be wrong as far as the organization is concerned. That is, it will be right enough for what it is used for. Organizations commonly want to know which of several numbers is bigger, or the direction of a number, not its specific value. So the fact that one answer is closer to the true one (which you will rarely, if ever, know anyhow) won't matter in most organizations.

Last summer, when I was trying to convince my boss, who knows more statistics than most bosses, of the need to run permutations to deal with missing data (which will make the answer "wrong" if the data are missing not at random), I was told not to worry about "esoteric statistical" issues :)