
Thread: Was thinking Kendall's Tau but...(correlation)

  1. #1
    Points: 1,482, Level: 21
    Level completed: 82%, Points required for next Level: 18

    Posts
    26
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Was thinking Kendall's Tau but...(correlation)




    Hey guys, I was wondering if you could help me out with this:
    I have seven sites along the west coast, each with two types of data and several hundred points. I checked the wave exposure at each site, and I also measured the breakage/repair of sea urchin spines. My prediction was that sites with higher wave exposure would show more breakage/repair.

    So, using JMP, in one column I've got wave exposure (std dev of water level), and in the other I have repair rings (they look like tree rings, long story). A third column lists the site.

    I was thinking Kendall's Tau, but my adviser instructed me that I cannot arbitrarily assign one wave exposure value to one sea urchin spine value even if they are from the same site (because KT works with ranks). So I averaged both wave exposure and breakage/repair rings and did a Kendall's tau, but the number of replicates became very small, and I'm wondering if there's a way I can run a correlation on the raw data.
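For reference, the averaged approach described above comes down to one pair of numbers per site. The following is a minimal sketch of Kendall's tau-b in plain Python, assuming one summary value per site for each variable. The function name `kendall_tau_b` and all numbers are made up for illustration; they are not the poster's data and not JMP's implementation.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both totals
        elif dx == 0:
            ties_x += 1  # tied on x only
        elif dy == 0:
            ties_y += 1  # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical per-site summaries for seven sites (illustrative values only).
exposure_by_site = [0.21, 0.35, 0.18, 0.42, 0.30, 0.27, 0.39]
rings_by_site = [2.1, 3.0, 1.8, 3.4, 2.2, 2.6, 3.1]

tau = kendall_tau_b(exposure_by_site, rings_by_site)
print(f"tau-b = {tau:.3f}")  # 20 concordant pairs, 1 discordant: 19/21
```

With only seven pairs there are just 21 pairwise comparisons, which is why the number of replicates feels so small after averaging.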

    Any help would be greatly appreciated. Thanks so much.

  2. #2
    TS Contributor
    Points: 22,411, Level: 93
    Level completed: 7%, Points required for next Level: 939
    spunky's Avatar
    Location
    vancouver, canada
    Posts
    2,135
    Thanks
    166
    Thanked 537 Times in 431 Posts

    Re: Was thinking Kendall's Tau but...(correlation)

    hello there. i am a little bit concerned about your approach of averaging over wave exposure and breakage/repair. given that you work with data in natural settings, i'm sure you're already aware of the problems with making inferences from averages: you risk falling into an ecological fallacy.
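To make the ecological fallacy concrete, here is a small sketch with invented numbers. It assumes paired readings within each site purely for illustration (the poster's readings are not paired), and the helper `pearson_r` is written out by hand. The site means rise together perfectly even though the relationship inside every site runs the other way.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up paired readings (x, y) at three hypothetical sites.
sites = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Correlation inside each site.
within = {name: pearson_r(*zip(*pts)) for name, pts in sites.items()}

# Correlation between the site averages.
site_means = [
    (mean(x for x, _ in pts), mean(y for _, y in pts)) for pts in sites.values()
]
between = pearson_r(*zip(*site_means))

print(within)   # every within-site correlation is -1.0
print(between)  # the correlation of the site means is 1.0
```

A conclusion drawn from the averages alone would get the within-site direction exactly backwards in this toy case.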

    i would like to suggest fitting a hierarchical linear model with 2 levels, with site as a level-2 predictor and the readings at each site as level-1 predictor(s)... although i would have to look at your data more closely because i'm not sure whether 7 sites would be enough... (or maybe they would, depends on the within-site variability...)
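One common way to write a two-level model of this kind is as a random-intercept model. The notation here is an assumption for illustration and does not come from the thread: y_ij is the spine-cycle count for spine i at site j, and w_j is a site-level summary of wave exposure.

```latex
\begin{aligned}
\text{Level 1 (spines within sites):}\quad & y_{ij} = \beta_{0j} + e_{ij}, & e_{ij} &\sim \mathcal{N}(0, \sigma^2) \\
\text{Level 2 (sites):}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} w_j + u_{0j}, & u_{0j} &\sim \mathcal{N}(0, \tau^2)
\end{aligned}
```

Under this sketch, \gamma_{01} is the slope of interest (how the site-level mean changes with exposure), \sigma^2 is the within-site variance and \tau^2 the between-site variance. With only a handful of sites, \tau^2 is the quantity that tends to be estimated poorly.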

  3. #3
    aesopclimber

    THANK YOU! I only did averages because my adviser instructed me to do so. And I actually have two tidepools at each site, so it's 12 pools (the sensors in two pools malfunctioned). So in my JMP file I have them labeled as if each pool were a different site (i.e., from Bodega Bay I have Bo1 and Bo2).

    I'm not sure I understand how to do a hierarchical linear model, though. Is there a simple way to do this in JMP?

  4. #4
    spunky

    uhmmm... i've only used JMP once but, if my memory serves me right, i'm pretty sure they have a little thingy somewhere in their pull down menu to do mixed regression (random and fixed effects, that's why it's mixed. google says it's called "REML" there in JMP)...

    the only other way i know how to do this (which only works well if you have a small number of independent variables) is to treat site as a categorical predictor and dummy-code the hell out of everything (so you can get all possible patterns of site membership) which will work BUT you have to be VERY careful when you code...
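As a rough illustration of what dummy-coding a site column means, the sketch below turns site labels into 0/1 indicator columns, leaving one site out as the reference level. The function `dummy_code` is a hypothetical helper, and the label "S3" is invented; "Bo1" and "Bo2" follow the labeling mentioned earlier in the thread.

```python
def dummy_code(labels, reference=None):
    """Return (column_names, rows) of 0/1 indicators, omitting the reference level."""
    levels = sorted(set(labels))
    if reference is None:
        reference = levels[0]  # default: first level alphabetically
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    columns = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if label == lvl else 0 for lvl in columns] for label in labels]
    return columns, rows


# One label per observation; "S3" is a made-up third site.
site_labels = ["Bo1", "Bo2", "Bo1", "S3", "Bo2"]
columns, rows = dummy_code(site_labels)

print(columns)  # ['Bo2', 'S3']  (Bo1 is the reference level)
for label, row in zip(site_labels, rows):
    print(label, row)
```

A site with k levels needs only k - 1 columns: the reference site is the row of all zeros, and each coefficient is read as a difference from that site.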

    i honestly don't know if there is a simpler way to do this correctly. you have a complex structure underlying your data and the bulk of statistical literature says that needs to be handled appropriately... aaand, as it usually happens when you do research, if you are prepared to ask interesting questions then you're expected to perform interesting (i.e. more complicated) analyses to answer them...

  5. #5
    aesopclimber

    What do you mean by dummy coding? I'm trying to use your method, but I'm lost.

  6. #6
    spunky

    Quote Originally Posted by aesopclimber View Post
    What do you mean by dummy coding? I'm trying to use your method, but I'm lost.
    so i was gonna rant endlessly about regression and how to do this but i said to myself "what the heck?, let one of the best speak for himself".

    go grab from your local library: "applied multiple regression/correlation analysis for the behavioral sciences" by Cohen, Cohen, Aiken and West. then, (at least from the latest edition) look up chapter 7 to see how to play around with continuous interactions, chapter 9 on how to correctly add and work out categorical X categorical or categorical X continuous interactions (around 99.9% of people do this wrong and they don't get their analysis right. don't be one of them), and chapter 14 (the first part only) to see how to do all the interactions that you'll need to get the correct parameter estimates.

    any good book in ecological economics or econometric methods will help you out as well. economists have been dealing with these kinds of problems for ages (answering questions about, i dunno, income or salary while controlling for the fact that neighbourhood A is uber-rich but neighbourhood B is uber-poor).

    hope that helps..

  7. #7
    aesopclimber

    While stressing out over this analysis, I took a break and started writing my discussion section based on the other results of my thesis. I decided that what I want is probably not a regression analysis, because at the end of the day, I do want some kind of ranking.

    Let me explain. My project asks the question: can we use the number of spine repair cycles in sea urchins to give us a relative measure of the wave exposure in the area? So basically I want a score of wave exposure per site, and I want a score of spine rings per site. Then I want to rank these sites by each variable. Finally, I want to compare these rankings.

    Thank you for the tip on the book, and I will probably still pick it up for future research (trying to start collecting good books). But is anyone aware of a good analysis or string of analyses that I can use here?

  8. #8
    spunky

    Quote Originally Posted by aesopclimber View Post
    Then I want to rank these sites by each variable. Finally, I want to compare these rankings.
    mmmhh... could you give me an example here? ok, so i know absolutely nothing about sea urchins or wave patterns or anything so bear with me if my example is a little too dumb... but would it be something along the lines of "on beach A, we have an average of XX spine repair cycles and YY units of wave exposure, on beach B we have ZZ spine repair cycles and WW units of wave exposure, etc..." (<-- btw, this example is logically wrong because i'm falling into the ecological fallacy) do you have any idea of the direction of your hypothesis? (i.e. does an increase in spine repair cycles imply an increase in wave exposure or vice-versa?)

    i think the problem i'm having is understanding where the ranking comes in...

  9. #9
    aesopclimber

    You've basically got it, although I'd like to avoid averages.

    I'm interested in using rankings because ultimately it would be super awesome if you could go to a site and say, "This urchin has a bunch of spine repair cycles, the waves here must be crazy."

    So if the ranking of sites by spine cycle score matches the ranking of sites by wave score, then yeah, increased wave exposure increases spine cycles. (If the rankings had an opposite correlation, then I would guess there was some site to site difference in sediment scour or sub-lethal predation etc etc etc.)

    So I think that ranking might be correct, but then again, I can't really understand how site can be used as a predictor in a regression.

  10. #10
    Cookie Scientist
    Points: 13,431, Level: 75
    Level completed: 46%, Points required for next Level: 219
    Jake's Avatar
    Location
    Austin, TX
    Posts
    1,293
    Thanks
    66
    Thanked 584 Times in 438 Posts

    Re: Was thinking Kendall's Tau but...(correlation)

    It's still not clear to me why using ranks is a necessity here, or even why it would be desirable. You just want to say that more spine repair cycles = greater wave exposure. Why would it be more impressive to use ranks here? It seems to me that it may actually be less impressive. Being able to predict that the site in your sample with the most spine repair cycles also has the greatest wave exposure is cool, but wouldn't it be even cooler to be able to give an actual quantitative prediction of the amount of wave exposure?

  11. #11
    spunky

    i'm with Jake on this one.. from what you mentioned, it seems to me that your question is one of degree (does an increase in spine repair cycles go along with an increase in wave activity?) rather than ranking...plus... you're looking for a simpler way than hierarchical linear modeling, yet you would like to do inference on ranking/order statistics that are themselves nested? how are you gonna get your standard errors? oh god... for yours and my mental sanity...please... don't
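On the standard-error question: one possible workaround, which nobody in the thread proposes, is a bootstrap that resamples the raw readings within each site, recomputes the per-site summaries, and recomputes the rank correlation each time. The sketch below assumes medians as the per-site summary and uses invented data; the function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import median, stdev


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(zip(x, y), 2))
    score = 0
    for (x1, y1), (x2, y2) in pairs:
        prod = (x1 - x2) * (y1 - y2)
        score += (prod > 0) - (prod < 0)  # +1, -1, or 0 for a tie
    return score / len(pairs)


def site_tau(exposure, rings):
    """Tau between per-site medians; the two dicts share the same site keys."""
    names = sorted(exposure)
    return kendall_tau_a(
        [median(exposure[s]) for s in names],
        [median(rings[s]) for s in names],
    )


def bootstrap_tau(exposure, rings, n_boot=500, seed=0):
    """Resample readings within each site (with replacement) and recompute tau."""
    rng = random.Random(seed)
    taus = []
    for _ in range(n_boot):
        exp_star = {s: rng.choices(v, k=len(v)) for s, v in exposure.items()}
        ring_star = {s: rng.choices(v, k=len(v)) for s, v in rings.items()}
        taus.append(site_tau(exp_star, ring_star))
    return taus


# Invented, unpaired readings: the two lists per site have different lengths.
exposure = {
    "A": [0.20, 0.22, 0.19, 0.25],
    "B": [0.31, 0.35, 0.33],
    "C": [0.41, 0.44, 0.40, 0.43, 0.45],
    "D": [0.28, 0.27, 0.30],
}
rings = {
    "A": [1, 2, 2, 1, 2, 3],
    "B": [3, 2, 3, 4, 3],
    "C": [4, 5, 4, 4, 6, 5],
    "D": [2, 3, 2, 3, 3],
}

observed = site_tau(exposure, rings)
taus = bootstrap_tau(exposure, rings)
print(f"observed tau = {observed:.3f}, bootstrap SE = {stdev(taus):.3f}")
```

This only reflects noise in the readings within each site. It treats those readings as independent, which daily water-level data may not be, and it says nothing about how a different set of sites would have behaved, so it is not a substitute for a model that handles the nesting.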

    the Cohen book will help you out but i'm a little bit surprised (and concerned) that you're not working alongside a statistics expert to deal with this.... i mean, the focus of your thesis is on sea urchins and waves, not how to analyze ecological data properly. have you asked your advisor about this? is there some sort of stats consultant or methodology expert in your dept/faculty?

  12. #12
    aesopclimber

    Haha, believe me, I've talked to my advisor, who is a biostats teacher, and KT is what he suggested.....not gonna go there.

    Anyway. I see exactly what you're saying about rankings vs regression, but let me try one more time to explain (sorry if this is super scatterbrained).

    My goal from the beginning has been using spine cycles as a relative measure of wave exposure. Our lab has been studying sea urchin populations from these sites since the 70's and we wanted a way to say site A has relatively higher wave exposure than site B. The lab portion shows that breaking spines definitely increases spine cycles (I clipped them with scissors once a month). So I already know that if waves break spines, then wave exposure = more spine rings. However, I wanted an inexpensive way of saying, "hey this site has a high level of wave exposure."

    In addition, looking at spine cycles is not like looking at a pH or salinity probe. It's not an instantaneous measure of the current wave situation; it takes about a month and change for a spine to fully regenerate and add a new cycle. What I want for wave exposure and for spine cycles are two scores for each site that I can compare...I'm losing my mind

    I will get that book, but for the sake of finishing my thesis, how do you suggest I do the regression/dummy coding? I realize this is an extremely needy request, but I really just want to finish.

    I basically just have a bunch of wave exposure data per site, and I want to compare it to spine cycle data per site. Keep in mind that one wave exposure data point and one spine cycle data point are not directly related (i.e., they can't be a point on a graph).

  13. #13
    aesopclimber

    Quote Originally Posted by aesopclimber View Post
    I basically just have a bunch of wave exposure data per site, and I want to compare it to spine cycle data per site. Keep in mind that one wave exposure data point and one spine cycle data point are not directly related (i.e., they can't be a point on a graph).
    I think this is the root of the problem. My data sets are not 1 to 1. I.e., unlike comparing height and weight for a population where each person has one H and one W value, I have sites that have one exposure value per tidal cycle (~24 hours), and one spine cycle value per spine (5 spines per urchin, 15 urchins per site). See my dilemma? I cannot match column A value with column B value.

  14. #14
    spunky

    Quote Originally Posted by aesopclimber View Post
    I cannot match column A value with column B value.
    i'm not sure which name it receives in biology, but in the social sciences this is called the "unit of analysis" problem. in my case i deal relatively often with situations in which, for example, math scores on tests from students are nested within classrooms which are themselves nested within schools.... as you can imagine, just as you have a lot of data on one side for urchins but only limited data for waves per site (because many urchins live in one site) i have many test scores for students but only one teacher (or classroom) on which they are reported. and the problem is that when your data moves in clusters naturally and you analyze it ignoring that, you can run into all sorts of statistical nightmares (inflated type 1 errors, biased/incorrect estimates, lack of statistical power... or everything at the same time). the answer to this, which was formalized during the 80's, was to fit what are now called hierarchical linear models, which are a specific case of the more general technique of random-coefficients regression.

    now, if i remember correctly your advisor told you to average urchins in each site but i can't, for the life of me, understand how (s)he is missing the bigger picture of how to avoid falling into an ecological fallacy. besides, what i think would be a primary concern here is how to decompose the variance due to urchins and the variance due to sites. say you treat this as a regular OLS regression problem or work out some other standard correlational technique... if i was reading your thesis, even though i know nothing about urchins, one of my immediate reactions would be to ask something along the lines of "well, and how are you controlling for the fact that urchins living in different sites may be subject to different conditions? perhaps the common variance between spine cycle regeneration and level of wave activity can be explained by something inherent to the site itself and not to the urchins...."
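One simple way to look at the split between variance due to sites and variance within sites is the intraclass correlation from a one-way ANOVA. This is a sketch under stated assumptions rather than the analysis recommended in the thread: the function name `icc_oneway` is hypothetical and the counts are invented.

```python
from statistics import mean


def icc_oneway(groups):
    """ICC(1) from a one-way random-effects ANOVA.

    groups maps a site name to its list of measurements (sizes may differ).
    """
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    n_total = sum(sizes)
    grand = mean(y for v in groups.values() for y in v)

    # Mean squares between sites and within sites.
    ms_between = sum(
        len(v) * (mean(v) - grand) ** 2 for v in groups.values()
    ) / (k - 1)
    ms_within = sum(
        (y - mean(v)) ** 2 for v in groups.values() for y in v
    ) / (n_total - k)

    # Effective group size (equals the common size when sites are balanced).
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    var_between = max(0.0, (ms_between - ms_within) / n0)
    total = var_between + ms_within
    return var_between / total if total else float("nan")


# Made-up spine-cycle counts at three hypothetical sites.
spine_cycles = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
icc = icc_oneway(spine_cycles)
print(f"share of variance between sites = {icc:.3f}")  # 26/29, about 0.897
```

A value near 1 means most of the variation is between sites, so spines from the same site carry much less independent information than their count suggests.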

    i think as a matter of conclusion what i can see is: you have a complicated design here (or at least more complicated than what a standard anova or regression/correlation can handle). i think if your advisor instructs you to do something in a particular way you should try and stick with it but it is good science to ask him/her why... i mean, you have to balance the fact that (s)he will be grading you with what you know is right. and, at least from what you've read here, you can get a little bit of an idea of how to proceed next time. or you can use some standard correlational technique and mention somewhere there in your thesis that you're aware that your data does not lend itself to this particular kind of analysis anyways.

    oh, and regarding the dummy coding, that's the least of your problems. if you were to proceed along this path, you'd need to learn about dummy coding because you need to work out the interactions between urchins X sites correctly so you can control for the variance of site (which you're not interested in) as a potential variable correlated with wave exposure. and it is not that i don't want to explain to you how to do it, it's just that i believe the Cohen book will do a considerably better job than i can do, but it's just basically adding columns of 0's and 1's so that the urchins that belong to a site get the 1s (and their urchin-spine predictors are multiplied by it) and the urchins not in the site get a 0 (so you don't confound the variance of urchins at site A with those at site B)... then you work out the interaction between spines at site A with site A membership, interaction between spines at site B with site B membership and so on until you're left with only the common variance between sea waves and urchin spines, irrespective of site...
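The "multiply the predictor by the 0/1 column" step described above can be sketched as follows. The helper `design_row` and the predictor `x` are hypothetical: `x` stands in for whatever urchin-level measurement would enter the regression, which the thread does not specify.

```python
def design_row(site, x, sites, reference):
    """One row of a design matrix: intercept, x, site dummies, site-by-x products."""
    dummies = [1 if site == s else 0 for s in sites if s != reference]
    interactions = [d * x for d in dummies]  # x survives only in its own site's column
    return [1, x] + dummies + interactions


sites = ["A", "B", "C"]  # made-up site names; "A" is the reference level
observations = [("A", 1.0), ("B", 2.5), ("C", 4.0)]

# Column order: intercept, x, is_B, is_C, x*is_B, x*is_C
for site, x in observations:
    print(site, design_row(site, x, sites, reference="A"))
```

In a regression on these columns, the dummy coefficients shift the intercept per site and the product coefficients shift the slope per site, both relative to the reference site.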

    that's why i didn't want to explain it. it sounds more complicated than it actually is, but the Cohen book does a better job at explaining it nicely...

    anyways, i still think you should be able to dump this data to your local stats expert, tell him/her what you want, let him/her work his/her magic and get you those regression coefficients you need.... at least that's how i'm paying for graduate school, heh.

    maybe Jake (or someone else) can join in and share some ideas because, to me, it's either generalized estimating equations (you don't wanna go there), hierarchical linear models (your best option), crazy-dummy-coded regression (doing things the old school style) or... well, i ran out of options. nested data tends to be a !@%!$ to work with sometimes...

  15. #15
    aesopclimber

    Well, I'll see what my adviser thinks, but I have to say, I really appreciate the time and energy you put into helping me. I'll let you know if I figure anything out.
