1. ## Coding Data

I am examining the relationship between employee level of education and the severity of their disciplinary offenses.

I coded education categorically as (1=High School, 2=Associate's, 3=Bachelor's, etc.).

Next, I coded disciplinary offenses using the agency's existing disciplinary scale (1=counselling, 2=oral reprimand, 3=written reprimand, through 10=termination).

My question is how to handle/code employees with multiple disciplinary offenses, since many employees have more than one on record. If I intend to conduct a correlational study of the above, should I:

1. just use the highest recorded disciplinary offense for each employee
2. use the sum of each employee's disciplinary offenses as an "offense score"
3. sum up the offenses and divide by the number of offenses committed for an "offense score"
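For concreteness, here is a quick sketch (in Python, with made-up offense histories — the employee names and numbers are invented, not real data) of what each of the three options computes:

```python
from statistics import mean

# Hypothetical offense histories: each employee maps to a list of
# offense severities on the 1-10 disciplinary scale.
offenses = {
    "emp_a": [1, 1, 3],   # three minor offenses
    "emp_b": [10],        # one termination-level offense
}

# Option 1: highest recorded disciplinary offense per employee
highest = {e: max(v) for e, v in offenses.items()}

# Option 2: sum of each employee's offenses as an "offense score"
total = {e: sum(v) for e, v in offenses.items()}

# Option 3: sum divided by number of offenses (the mean severity)
average = {e: mean(v) for e, v in offenses.items()}

print(highest)  # per-employee maximum severity
print(total)    # per-employee summed severity
print(average)  # per-employee mean severity
```

Note how each option compresses a multi-offense history into a single number, and each discards something different (frequency, typical severity, or worst case).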

Thanks,

T

2. ## Re: Coding Data

Interesting study. A couple of questions:
- When employees earn disciplinary action the first time, they receive counseling (level 1), but is it always the case the second action results in #2 (oral reprimand). I'm asking whether disciplinary action is cumulative or not. If so, the maximum will be the best action work (option #1). Rather than thinking about this as a scale of 1-9, you could look at this as simple count data. One employee might get three demerits...so that counts as a three. I'm guessing it the data follows a Poisson distribution, with many employees at zero or 1 offense. It might be good to view the frequency histogram to see if there's zero-inflation (and likely overdispersion) in your data.

To summarize my thoughts and more directly answer your question, I would probably avoid the last two options. The disciplinary action scale, I'm guessing, is cumulative and thus the offenses are not independent. I would worry about creating an index score.
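As a sketch of the histogram check suggested above (Python, stdlib only; the `counts` vector is invented for illustration, not from the study): compare the observed number of zeros to the number a Poisson with the same mean would predict.

```python
import math
from collections import Counter
from statistics import mean, pvariance

# Hypothetical per-employee offense counts: many zeros, a few offenders
counts = [0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, 0, 0, 5]

freq = Counter(counts)          # frequency histogram of the counts
lam = mean(counts)              # sample mean = Poisson rate estimate (MLE)

# Number of zeros a Poisson(lam) would predict for this sample size
expected_zeros = math.exp(-lam) * len(counts)
observed_zeros = freq[0]

# A large excess of observed over expected zeros suggests zero-inflation;
# sample variance well above the mean suggests overdispersion.
print(sorted(freq.items()))
print(observed_zeros, expected_zeros)
print(pvariance(counts), lam)
```

With data like these, observed zeros far exceed the Poisson expectation and the variance exceeds the mean, which is what would push you toward zero-inflated or overdispersed count models.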

3. ## Re: Coding Data

Thank you for the quick response and your thoughts. The disciplinary cases/investigations are separate events and are independent of each other. Hence, depending on the offense(s) committed, one employee might accrue 5 separate level-one offenses while another may commit only one level-10 offense and be terminated. I created the 1-10 scale based on the employees' sustained outcomes (1-counselling, 2-written, 3-8 hr suspension, 4-16 hr suspension..... 10-termination). Your thoughts/advice?

Thank you again,

T

4. ## Re: Coding Data

How you scale your data depends on your research question. If you are interested in what types of offenses people commit, just listing the most significant probably makes sense. If you are interested in how many offenses, then you would count them. There is no correct or incorrect coding scheme; it depends on your purpose.

I may have missed this, but what exactly are you trying to test with your data?

5. ## Re: Coding Data

I want to determine if there is a correlation between employee level of education and the severity and number of disciplinary offenses.

6. ## Re: Coding Data

OK, noetsi's question yielded a good answer.

So, considering that each employee can receive more than one reprimand for seemingly independent offenses (i.e., can receive more than one Grade-1 disciplinary action), I would take a mixed-effects approach. Specify individual as a random effect. Still, I would keep the response as count data (1-10), since the rank scale has a baseline starting point of 1. I might shift the scale down to zero (by subtracting 1) so as to avoid dealing with a zero-truncated distribution. So, all your data are now 0-9 and presumably follow a Poisson distribution. As I see it, you can run the raw data (not the maximum) just to determine if there is an effect of educational level on offense severity. After your parameter estimation, simply add 1 again. This would take care of your severity metric. Follow the same approach for number of disciplinary offenses.

This would be the approach I would take. Others might have opinions about taking this approach with data on categorical scales. Still, the above approach might be the simplest and might be a more than adequate solution.
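A minimal sketch of the recoding step described above (Python, with an invented `records` list of (education, severity) pairs). The per-employee random effect is omitted here for brevity; a full mixed-effects Poisson model would need something like `glmer(offense ~ education + (1 | employee), family = poisson)` in R's lme4.

```python
from statistics import mean

# Hypothetical (education_level, offense_severity) pairs on the 1-10 scale
records = [(1, 1), (1, 2), (1, 1), (2, 1), (2, 3), (3, 1)]

# Shift severities down by 1 so the support starts at zero (0-9),
# as a Poisson response expects
shifted = [(edu, sev - 1) for edu, sev in records]

# Per-education-level Poisson rate estimates: the group mean is the
# MLE of lambda for a Poisson sample (random effects omitted here)
groups = {}
for edu, sev in shifted:
    groups.setdefault(edu, []).append(sev)
rates = {edu: mean(vals) for edu, vals in groups.items()}

# Add 1 back when reporting on the original 1-10 severity scale
reported = {edu: lam + 1 for edu, lam in rates.items()}
print(rates)
print(reported)
```

The same shift-fit-unshift pattern applies when modeling the number of offenses, except that a raw count already starts at zero and needs no shift.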

7. ## Re: Coding Data

I think a lot about my audience when I run data, particularly how much they know about statistics (and whether they are interested). If this is going to managers, they will probably want to keep it simple.

A really simple way to do this is just to calculate a count of offenses and compare that to levels of education. You can use chi-square, Cramér's V, etc. That will tell you if there is a meaningful relationship. Then run Spearman's rho, which will give you a correlation (it assumes that both variables are ordinal, but that seems realistic to me if you code education from the lowest to the highest level).
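For anyone without a stats package handy, Spearman's rho is just a Pearson correlation computed on rank-transformed data (with ties given average ranks). A minimal stdlib-only sketch, with invented example data:

```python
def average_ranks(xs):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical ordinal codes: education level vs. offense count
education = [1, 2, 2, 3, 4, 5]
offense_count = [4, 3, 3, 2, 1, 0]
print(spearman_rho(education, offense_count))
```

In this made-up example the ranking is perfectly monotone decreasing, so rho comes out at -1; real data would land somewhere in between.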

You can calculate an average severity and do the same thing for severity.

Of course that is pretty simple; I am still admiring the Poisson analysis jpkelley noted. I have a lot to learn...

8. ## Re: Coding Data

I think you need to add a field that addresses offense number. As you correctly point out, you could have multiple offenses receiving multiple levels of discipline. You need to be able to distinguish between a first offense that receives counseling and a third offense that receives counseling (which is possible). You could also have a single offense that was so egregious that it warranted termination without prior discipline (even though that may be a rare occurrence that doesn't actually show up in your sample).

Perhaps the field in question is "Number of prior disciplinary actions."
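A minimal sketch of what such a record might look like (Python; the field names `employee`, `severity`, and `prior_actions` are hypothetical, chosen just to illustrate the idea):

```python
# Hypothetical per-offense records: each row carries the employee, the
# outcome severity (1-10), and the number of prior disciplinary actions,
# so a first-offense counseling can be told apart from a third-offense one.
offense_log = [
    {"employee": "emp_a", "severity": 1, "prior_actions": 0},   # 1st offense
    {"employee": "emp_a", "severity": 2, "prior_actions": 1},   # 2nd offense
    {"employee": "emp_a", "severity": 1, "prior_actions": 2},   # 3rd offense, still counseling
    {"employee": "emp_b", "severity": 10, "prior_actions": 0},  # straight to termination
]

# Example query: counseling outcomes that were genuinely first offenses
first_time_counseling = [
    r for r in offense_log
    if r["severity"] == 1 and r["prior_actions"] == 0
]
print(len(first_time_counseling))
```

With the extra field, the rare single-offense termination (emp_b) and the repeat offender who keeps drawing counseling (emp_a) stop looking alike.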

9. ## Re: Coding Data

Originally Posted by noetsi
You can calculate an average severity and do the same thing for severity.
i'm having an issue with this part. calculating correlations between averaged data restricts the variance, which in turn restricts the range of the correlation. the variability of means is less than the variability of raw data points, which is kind of the issue that gave birth to hierarchical linear modeling (as a specific instance of random-coefficient regression...)

jpkelley's approach might seem more complicated, but it does take into account the clustering found in the design.
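The shrinkage spunky describes is easy to see numerically; a tiny Python sketch with invented clustered severity scores:

```python
from statistics import mean, pvariance

# Hypothetical raw severity scores, clustered by employee
by_employee = {
    "emp_a": [1, 1, 5],
    "emp_b": [2, 9, 10],
    "emp_c": [3, 4, 4],
}

# Flatten to the raw data points, and also average within employees
raw = [s for scores in by_employee.values() for s in scores]
means = [mean(scores) for scores in by_employee.values()]

# Averaging within clusters throws away the within-employee spread,
# so the variance of the means is smaller than that of the raw scores
print(pvariance(raw), pvariance(means))
```

Any correlation computed on the per-employee means therefore operates on this reduced spread, which is exactly why a model that uses both between- and within-employee variation is preferable.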

10. ## Re: Coding Data

It is certainly true that it will be less efficient than the Poisson method. My point was that, if your audience is mostly managers, the simpler method will be far more preferred. I don't think most managers are going to understand what a Poisson distribution is, and the ones I have worked for preferred something easy to understand among non-statisticians even if it was less accurate than more sophisticated methods.

11. ## Re: Coding Data

Originally Posted by noetsi
You can calculate an average severity and do the same thing for severity.....Of course that is pretty simple, I am still admiring the Poisson analysis jpkelley noted.
spunky already addressed the issue with taking the mean, but I have to second it (though I'm not thinking in the precise terms that spunky is). The consequence of what spunky mentioned is that your parameter estimate (i.e., the effect of education) is going to be inaccurate, since the data aren't from a normal distribution (nor could they be transformed to one).

And don't admire the Poisson approach too much. Once you start admiring the Poisson, the whole world starts looking like count data.

12. ## Re: Coding Data

I must say that it's troubling to think that managers would not want better practices in their organization. Troubling, I say!

Regardless of whether you use Poisson or not, I don't think you can avoid the fact that you have multiple offenses per individual. If you take the median of all offenses per individual, you ignore the total number of incidents. If you take the maximum, you ignore the fact that one individual might have 1000 Grade-1 offenses. If you take the mean, you end up with potentially incorrect parameter estimates. Use the variation between and within individuals to your advantage. And you never have to mention the Poisson distribution: just say that you conducted a test that took into account the fact that there were some individuals with many offenses and that the data were skewed. Or just go all out and baffle them with stats!
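A tiny illustration (Python, with invented histories) of how each single-number summary hides something:

```python
from statistics import mean

# Hypothetical histories: same offense grade, very different records
habitual = [1] * 12   # twelve Grade-1 offenses
one_off = [1]         # a single Grade-1 offense

print(max(habitual) == max(one_off))    # maximum can't tell them apart
print(mean(habitual) == mean(one_off))  # neither can the mean
print(sum(habitual), sum(one_off))      # only the totals/counts differ
```

Both the maximum and the mean collapse these two employees into identical values; only a summary that keeps the number of incidents (or a model that uses all the raw observations) separates them.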

13. ## Re: Coding Data

Originally Posted by noetsi
And the ones I have worked for preferred something easy to understand among non-statisticians even if it was less accurate than more sophisticated methods.
uhmm... so... if i follow your argument correctly, i should choose a wrong solution rather than the correct one just because it's simpler? how "less accurate" can i go before my solution is wrong? because if we're talking about simple models, the mean is the simplest (linear) model that exists, so should one only show managers means and variances?

whether your audience understands you or not does not depend on the method of choice but on your ability to communicate it. my husband is not a statistician (he didn't even finish high school) but he understands some of the subtleties of maximum-likelihood estimation because i've made sure to bring it down to a level he can understand. i can't see why any manager wouldn't be able to understand a Poisson process if you take enough care to explain it properly and present enough examples.

denny borsboom, in his annual address to the int'l psychometric society last year, was very clear that the new mission of quantitative methodologists in the social sciences is to fight this idea that simple models can address complex questions... the most *appropriate* models should be used to address the most appropriate questions, and it is our job to make sure other people understand and use these *appropriate* models... sadly, history is plagued with events where inappropriate statistical analyses (which tended to be simpler) ended up hurting people (like in that horrendous book The Bell Curve) because people (either intentionally or unintentionally) decided to ignore the subtleties involved in analysing data and formulating a correct research design.

14. ## Re: Coding Data

Originally Posted by spunky
uhmm... so... if i follow your argument correctly, i should choose a wrong solution rather than the correct one just because it's simpler?
What counts as wrong in statistics is not all that clear to me. It's not uncommon to use less sophisticated approaches (which may well be less accurate but still accurate enough for your purpose) even in academic journals. My regression professor was told by a journal to treat interval data as categorical (changing the form of regression used) because that was the way they did things.

Originally Posted by spunky
whether your audience understands you or not does not depend on the method of choice but on your ability to communicate it.
Respectfully, I disagree. I have had the glorious experience of explaining odds ratios to those with little to no statistical background and was told (by a very bright doctorate in economics) to make the analysis simpler because senior managers could not understand the (more accurate) measure I suggested. Most managers have limited interest in statistics; if you bring up something like a Poisson distribution, the discussion is essentially over regardless of how well you explain it.

I suspect denny borsboom has not worked much in corporate America (or in government outside academics). I think if you were to survey those who present data to such real-world audiences, what I said here would get a D'OH response (that is, it's so obvious to them that it's taken for granted).

There is a reason businesses and government ignore academics, and overly complex methods used to make acceptable data marginally better is a primary one.

Sorry if this is off topic. It is a sore point with me... I was required to make a report significantly simpler today, and all it involved was ANOVA and the like.

15. ## Re: Coding Data

I see your general point, but I don't see what it really has to do with this situation. It's certainly the case that a managerial crowd is not going to understand the details of a mixed-effects Poisson regression. But do you really think they are going to understand the details of Spearman's rank correlation, the solution you suggested instead? Perhaps more to the point, why is it necessary that they understand the details of the statistical analysis in the first place? It seems to me that the goal is simply to make them understand the conclusions derived from the analysis, a goal which should be pretty much indifferent to what type of procedure you happened to use.

