
Thread: Help with inter-rater reliability

  #1

    Help with inter-rater reliability




    Hi all, I've got a set of data that I need some advice on. I'm working on a pilot project in which a group of four people evaluates essays. Each essay is read by two of the four people on the assessment team, and each reader rates five different aspects of the essay on a five-point scale, giving each essay a possible score of 5-25 points. My question is about calculating inter-rater reliability: what calculation would be best for this assessment? I'm running into trouble because I'm looking for reliability across a group of four raters when only two randomly assigned raters read each essay. Do I need to calculate a separate reliability for each possible pair of readers, or is there a measure I can use for the entire group?

    Any advice you can provide is greatly appreciated.

  #2
    hlsmith

    Re: Help with inter-rater reliability

    How many essays?


  #3

    Re: Help with inter-rater reliability

    Not many, since this is just a pilot to get the university started. There will be 50 essays with two readers each. Believe me, I know there are problems with such a low number, but the administration wants something soon to show that we're doing assessment. Once that goes through, I'll actually have money to put together a more appropriate study.

    Anyway, I'm capable of running the numbers myself; I'm just not sure what formula to use for this particular data set.

  #4
    hlsmith

    Re: Help with inter-rater reliability

    Sorry for all of the questions.

    So there are 4 essay reviewers and 50 essays. Each essay was reviewed by two reviewers, I'm assuming with no random assignment (just convenience).

    If only two people reviewed each unique essay, and these combinations were always changing, it almost seems like you may need to calculate a kappa statistic for each combination of two reviewers (see the sketch after this post). Did a reviewer examine all 5 portions of the essay, or did the components get split up among reviewers as well?

    I will acknowledge that I have not run into this exact scenario before. Can you not get all four to review some of the same essays, or is that too labor intensive? Let's see what others propose; a broader literature review may also be beneficial. You MAY also be able to run a generalized kappa for all of the reviewers for all essay sections combined, but that would not really help describe where the differences are. Depends on your aims.
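
    A minimal sketch of that per-pair kappa idea in Python, assuming the ratings sit in a long-format table with columns essay, rater, and total (the 5-25 sum). The file name and column names are illustrative, and scikit-learn's quadratically weighted kappa is one reasonable choice for ordinal totals, not something prescribed in this thread:

        import pandas as pd
        from itertools import combinations
        from sklearn.metrics import cohen_kappa_score

        ratings = pd.read_csv("ratings.csv")  # columns: essay, rater, total
        wide = ratings.pivot(index="essay", columns="rater", values="total")

        for r1, r2 in combinations(wide.columns, 2):
            both = wide[[r1, r2]].dropna().astype(int)  # essays this pair shares
            if both.empty:
                continue
            kappa = cohen_kappa_score(both[r1], both[r2],
                                      labels=list(range(5, 26)),  # possible totals
                                      weights="quadratic")        # ordinal penalty
            print(f"{r1} vs {r2}: n={len(both)}, weighted kappa = {kappa:.2f}")

    With 50 essays spread across six possible pairs, each pair shares only a handful of essays, so expect these per-pair estimates to be noisy.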


  #5

    Re: Help with inter-rater reliability

    No worries. Thanks for the help.

    Yes, four reviewers, 50 essays. The essays weren't split up by component; the five scores are just different facets of the same essay. So one reader would pick up an essay (yes, distributed by convenience rather than any specific method), evaluate five aspects of it (thesis, support, structure, mechanics, conclusion), and then move on to the next. We had someone collecting the data as it came in, and any essay that showed more than a one-point difference on any facet was then read and scored by a third reader. So most of the essays had two readers but a few had three, which makes it problematic to calculate reliability between pairs. Also, if I went that route, wouldn't I have to break it down by each of the five facets as well? I should note that the third reader was just meant to improve the overall reliability of the essay rating; I'm not using tertium quid to throw out the divergent score.

    So yeah, I'm still not sure where to go with this.

  #6

    Re: Help with inter-rater reliability

    Got a follow up here. I've found a source that says that I can use this formula for averaged ratings:

    ((Between persons mean square) - (Within persons mean square)) / (Between persons mean square)

    Unfortunately, it doesn't give any more explanations than that. Do these terms mean anything to anyone?

  #7
    spunky

    Re: Help with inter-rater reliability

    yup... it looks a lot like one of the many formulas for the many intraclass correlation coefficients out there..
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/


  #8

    Re: Help with inter-rater reliability

    Yep, that's where I found it. So, any idea what the "within persons mean square" and "between persons mean square" are?

  #9
    hlsmith

    Re: Help with inter-rater reliability

    Can you post where you got the formula? URL or link.


  #10

    Re: Help with inter-rater reliability

    Er.. not really. It's from a book. The article is "Reliability Issues in Holistic Assessment" by Roger Cherry and Paul Meyer from the book Validating Holistic Assessment Scoring for Writing Assessment: Theoretical and Empirical Foundations edited by M Williamson and B Huot. The Google Books information is here:

    http://books.google.com/books/about/...d=ozgmAQAAIAAJ

    But there's no available ebook to peruse. At this point, I'm considering just going with this simple calculator http://www.med-ed-online.org/rating/reliability.html and calling it done. The admins who'll be looking at this stuff won't really know the difference, and that'll give me more time to see what other people have done in preparation for continuing the study next year.

  #11
    Berley

    Re: Help with inter-rater reliability

    Not commenting on the validity of the formula, but just to explain it. (And with the caveat that I haven't finished my coffee yet this morning...) I believe it means:

    (the mean score for all 4 evaluators reading all 50 essays) squared, minus (the mean score of a single evaluator across all the essays they read) squared, divided by the first one

    So take the mean score for all 100 evaluations (pretend it's 15), square it (to 225), then subtract the mean score for all 25 of evaluator #1's evaluations (pretend it's 9) squared (to make it 81) and divide it by the overall mean score squared (225 again) for a grand total of 0.64. That tells you Evaluator #1's score.

    Now if you did the same thing for Evaluator #2 and came up with the same number, then presumably you could assume that Evaluator #1 and Evaluator #2 were similar evaluators even if they didn't read the same essays.

    But if Evaluator #2's mean score was 20 (squared to 400), then you come up with -0.7778. So you could conclude that Evaluator #2 doesn't see essays the same way that Evaluator #1 does. And you can say that even if you don't have essays in common by which you can directly compare scores.

    I think that's right. Yes? Someone who is awake?


  #12
    spunky

    Re: Help with inter-rater reliability

    Quote Originally Posted by Berley View Post
    (the mean score for all 4 evaluators reading all 50 essays) squared, minus (the mean score of a single evaluator across all the essays they read) squared, divided by the first one
    uhmm... this does not seem quite right. look at how to calculate the mean squares from any standard ANOVA formulas; most traditional ICCs (intraclass correlations) are obtained from the mean squares you'd calculate when doing an ANOVA
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/
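
    A minimal sketch of that ANOVA route in Python, assuming (as above) a long-format table with columns essay, rater, and total, and assuming the book's formula is the one-way, average-measures intraclass correlation. Here the essays play the role of the "persons" (the targets being rated), so the between-persons mean square comes from the spread of the essay means and the within-persons mean square from disagreement among readers of the same essay:

        import pandas as pd

        ratings = pd.read_csv("ratings.csv")  # columns: essay, rater, total
        groups = ratings.groupby("essay")["total"]

        n = groups.ngroups               # number of essays (targets)
        N = len(ratings)                 # total number of ratings
        grand = ratings["total"].mean()

        # One-way ANOVA mean squares, with essays as the "persons".
        ss_between = (groups.size() * (groups.mean() - grand) ** 2).sum()
        ms_between = ss_between / (n - 1)
        ss_within = ((ratings["total"] - groups.transform("mean")) ** 2).sum()
        ms_within = ss_within / (N - n)

        # The book's formula: ICC(1,k), reliability of the averaged rating.
        icc_average = (ms_between - ms_within) / ms_between

        # Companion single-rater form. With some essays read by 3 people,
        # k = N / n is approximate; exact unbalanced corrections exist.
        k = N / n
        icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
        print(f"ICC(1,k) = {icc_average:.3f}, ICC(1,1) = {icc_single:.3f}")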


  #13
    Berley

    Re: Help with inter-rater reliability

    Quote Originally Posted by spunky View Post
    uhmm... this does not seem quite right. look at how to calculate the mean squares from any standard ANOVA formulas; most traditional ICCs (intraclass correlations) are obtained from the mean squares you'd calculate when doing an ANOVA
    So you think maybe it means to calculate the sample variance for each instead? I could buy that, and it certainly seems like it would be more accurate. I just didn't take that away from what was presented.


  #14

    Re: Help with inter-rater reliability

    Ah, I get it now. I found this site that explains what those terms actually mean: http://people.richland.edu/james/lec.../ch13-1wy.html

    I'm not sure why it wasn't showing up when I searched for the terms specifically, but adding "ANOVA" to the search led me in the right direction.

    Thanks for all of the help.

  #15
    spunky

    Re: Help with inter-rater reliability


    Quote Originally Posted by Berley View Post
    So you think maybe it means to calculate the sample variance for each instead? I could buy that, and it certainly seems like it would be more accurate. I just didn't take that away from what was presented.
    something like that... i mean, not exactly, but you got the idea that some sort of variance is at play here. i guess i'm just so used to working with these things that the minute i saw a ratio of mean squares in the context of inter-rater reliability i immediately thought "oh... the OP must be looking at a formula for the intraclass correlation coefficient". the thing is, there are many intraclass correlation coefficients (i think there are 3 or 4 out there), so it really depends on which one the OP is looking at
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/
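
    For reference, the "many intraclass correlation coefficients" here are usually the six Shrout and Fleiss (1979) cases: three models, each in a single-rater and an averaged form. A compact sketch of the formulas (the variable names are mine), with the caveat that the two-way forms require every rater to score every essay, which this design doesn't have:

        # ms_r: between-targets (essays) mean square; ms_w: within-targets;
        # ms_c: between-raters mean square; ms_e: residual mean square;
        # n: number of targets; k: raters per target.
        def icc(case, ms_r, ms_w=None, ms_c=None, ms_e=None, n=None, k=None):
            if case == "1,1":   # one-way random, single rater (fits this design)
                return (ms_r - ms_w) / (ms_r + (k - 1) * ms_w)
            if case == "1,k":   # one-way random, averaged: the book's formula
                return (ms_r - ms_w) / ms_r
            if case == "2,1":   # two-way random, single rater (crossed design only)
                return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
            if case == "2,k":   # two-way random, averaged
                return (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n)
            if case == "3,1":   # two-way mixed, single rater
                return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)
            if case == "3,k":   # two-way mixed, averaged
                return (ms_r - ms_e) / ms_r
            raise ValueError(f"unknown case: {case}")

        # Example with the mean squares from the earlier sketch:
        # icc("1,k", ms_r=ms_between, ms_w=ms_within)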
