Ok, this is quite hard to explain, but I'm at a complete loss what to do. I'm a relative newcomer to R and although I can completely admire how powerful it is, I'm not too good at actually using it....
Basically, I have some very contrived data that I need to analyse (it wasn't me who chose this, I can assure you!). I have the right and left hand lengths of lots of people, as well as some numeric data that shows their sociability.
Now I would like to know if people who have significantly different lengths of hand are more or less sociable than those who have the same (leading into the research that 'symmetrical' people are more sociable and intelligent, etc.).
I have got as far as loading the data into R, but I have no idea where to go from there. How on Earth do I start to separate those who are close to symmetrical from those who aren't, so that I can then start the analysis?
Last edited by TheEcologist; 01-07-2011 at 10:17 AM. Reason: typo
The true ideals of great philosophies always seem to get lost somewhere along the road..
I definitely need to apologise for the ridiculous way I've asked the question...I'm definitely under the stupid category on your guide so thanks for your patience.
Thank you for sending me the other link too, I'll certainly be using it in future.
I've tried to simplify my question. I would like to do two things:
1. Test whether the lengths of the left and right hands are significantly different from each other.
2. Test whether the sociability is significantly affected by hand length (i.e. is there a difference between those people who have similar left and right hand lengths, to those who have different lengths?)
I have 150 people, and thought that at some point (when I eventually figure out the stuff before), I'd have to do something along the lines of:
glm(sociable~ ???, family="binomial" )
??? : I've got no idea what to put here...
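A minimal sketch of how the two questions above could be set up, not a definitive analysis. The data frame below is simulated and purely hypothetical; only the column names l.hand, r.hand and social follow the thread. The Poisson family is an assumption that fits a count response; family="binomial" would need a 0/1 (or proportion) response.

```r
# Hypothetical stand-in for the poster's data; column names follow the thread
set.seed(42)
n <- 152
l.hand <- rnorm(n, mean = 18, sd = 1)
measurements <- data.frame(
  l.hand = l.hand,
  r.hand = l.hand + rnorm(n, mean = 0, sd = 0.3),
  social = rpois(n, lambda = 10)
)

# Question 1: both hands belong to the same person, so compare them as pairs
q1 <- t.test(measurements$l.hand, measurements$r.hand, paired = TRUE)
print(q1)

# Question 2: absolute left-right difference as a measure of asymmetry
measurements$asym <- abs(measurements$l.hand - measurements$r.hand)

# Assumption: social is a count, so a Poisson GLM is used here
q2 <- glm(social ~ asym, family = poisson, data = measurements)
print(summary(q2))
```

The absolute difference is one candidate for the "???" term: it treats a longer left hand and a longer right hand as the same amount of asymmetry.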
Don't worry about it.
Why don't you post an example of how you structured your data (with the command head, as I suggested; we will only see the first few lines)? This will make it easier for me (or someone else) to help you with the code. That way we will have an idea of what to put there.
With the data example I can also help you more easily with 1.
What I've done so far is this:
hist(social)
This shows me that the data isn't normally distributed (i.e. it's a reverse J distribution).
I then:
stand=scale(measurements$l.hand-measurements$r.hand)
m<-lm(measurements$social~stand)
m
summary(m)
anova(m)
par(mfrow=c(2,2))
plot(m)
This obviously gives me the model plot checks, and again shows the data isn't normally distributed.
I then tried to plot the results on a graph, so:
plot(l.hand-r.hand,social,
ylab="Sociability (number of people spoken to)",
xlab="Difference in Hand Length (cm)")
But I'm unable to plot a line of best fit on it....I tried abline, but that just runs a horizontal line straight through 0. So then I looked at scatter.smooth but that looks wrong too.....
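A possible way to get a line of best fit, sketched on simulated (hypothetical) data: abline() accepts a fitted lm object, but the model has to be fitted on the same x variable that is plotted. A model fitted on the scaled difference (stand) has its slope on a different scale from a plot of the raw difference in cm.

```r
# Hypothetical stand-in for the poster's data; column names follow the thread
set.seed(1)
n <- 152
measurements <- data.frame(
  l.hand = rnorm(n, mean = 18, sd = 1),
  r.hand = rnorm(n, mean = 18, sd = 1),
  social = rpois(n, lambda = 10)
)

# Raw (unscaled) difference, so model and plot share the same x axis
measurements$hand.diff <- measurements$l.hand - measurements$r.hand
fit <- lm(social ~ hand.diff, data = measurements)

plot(social ~ hand.diff, data = measurements,
     ylab = "Sociability (number of people spoken to)",
     xlab = "Difference in Hand Length (cm)")

# Draw the fitted regression line from the model's intercept and slope
abline(fit, col = "red")
```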
Finally, I'm trying to do some kind of analysis on the data, but I'm totally lost here. I've muddled my way through some ideas:
1.
stand=scale(measurements$l.hand-measurements$r.hand,center=FALSE)
fake=abs(stand)<1.96
t.test(measurements$social[fake],measurements$social[!fake])
But I don't think I have enough observations to do this (150)...
2.
cor(abs(measurements$l.hand-measurements$r.hand),measurements$social)
But again, I have no idea if this is right and don't know how to interpret this.
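For interpretation: a correlation near 0 means little association between asymmetry and sociability, and values near -1 or 1 mean a strong one. A hedged sketch on simulated (hypothetical) data, using Spearman's rank correlation as an assumption because it does not rely on normally distributed data; cor.test() also returns a p-value.

```r
# Hypothetical stand-in for the poster's data; column names follow the thread
set.seed(2)
n <- 152
measurements <- data.frame(
  l.hand = rnorm(n, mean = 18, sd = 1),
  r.hand = rnorm(n, mean = 18, sd = 1),
  social = rpois(n, lambda = 10)
)

# Absolute asymmetry, as in the cor() call above
asym <- abs(measurements$l.hand - measurements$r.hand)

# exact = FALSE avoids the warning about ties in count data
res <- cor.test(asym, measurements$social, method = "spearman", exact = FALSE)
res$estimate  # rank correlation, always between -1 and 1
res$p.value   # small values suggest the association is unlikely to be chance
```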
3.
set.seed(1)
DF <- data.frame(l.hand = rnorm(100, 15, sd = 2), r.hand = rnorm(100, 15, sd = 2), social = runif(100))
DF <- within(DF, hands <- l.hand - r.hand)
mod <- lm(social ~ hands, data = DF)
summary(mod)
plot(social ~ hands, data = DF)
So I've messed about with all of these and they all work, but I just feel like I'm blindly trying anything and feeling optimistic when it doesn't error, while in essence having absolutely no idea what I'm doing.
OK, now I have a slightly better idea of how your data looks (all I wanted to know from the data example was how you had structured your data).
No offense meant, but it always amazes me how much (often sophisticated) statistics people conduct on their data without first looking at them (if you did the below first, then this comment is of course not for you). From experience I know that some simple data exploration can save you hours of fussing with analysis. It (1) gives you a 'feel' for your data, (2) helps you understand the trends you find and (3) makes your analysis much more directed.
For your objective 1. Try this: plot a boxplot of both statistics.
Code:
# create a data frame of handstats
handstats=data.frame(length=c(measurements$l.hand,
measurements$r.hand), hand=c(rep(150,'left'),rep(150,'right'))
# boxplots
boxplot(handstats$length~handstats$hand)
Now that should already give you a pretty good idea of the existence of any differences, the distribution of the data and what test would be best.
Report on how this looks, but best of all would be if you posted the graph here.
For 2. Plot these scatter plots.
Code:
par(mfrow=c(2,2))
# 1
plot(social~l.hand)
# 2
plot(social~r.hand)
# 3
plot(social~c(r.hand-l.hand))
# 4 this tells you whether right and left hand lengths are related,
# and thus if for 1 a different test would be more appropriate (e.g. ANCOVA).
plot(r.hand~l.hand)
Again, when you report back, it would be best to post the graph. We should then be able to see which course of action makes sense.
Hope this helps,
Thank you so much for helping me....you have no idea how grateful I am.
I tried to do the first thing you said, but it just errors:
> handstats=data.frame(length=c(measurements$l.hand,
+ measurements$r.hand), hand=c(rep(150,'left'),rep(150,'right'))
+ boxplot(handstats$length~handstats$hand)
Error: unexpected symbol in:
"measurements$r.hand), hand=c(rep(150,'left'),rep(150,'right'))
boxplot"
I have retyped it word for word, just in case it's a formatting error, etc. but to no avail.
Part two graphs:
This worked! I have no idea how to post a graph in the thread though (how terrible do I appear with computers?!) but have attached it.
Last edited by Gemsie; 01-08-2011 at 09:54 AM.
Aah I missed one parenthesis, here it is correct:
Code:
handstats=data.frame(length=c(measurements$l.hand,
measurements$r.hand), hand=c(rep(150,'left'),rep(150,'right')))
boxplot(handstats$length~handstats$hand)
That should work.
Also don't forget this plot (scatter plot left right hand lengths):
Code:
plot(measurements$l.hand~measurements$r.hand)
Looks like there is some positive relationship (the graph also leads me to believe there is a relationship between l.hand and r.hand as well). But let's see the scatter plot of left and right hand lengths first.
There seems to be no relationship between sociability and the difference in hand sizes (r.hand-l.hand), so you can forget about that.
Ok, I tried it again, but got this:
> handstats=data.frame(length=c(measurements$l.hand,
+ + measurements$r.hand), hand=c(rep(150,'left'),rep(150,'right')))
Error in rep(150, "left") : invalid 'times' argument
In addition: Warning message:
In data.frame(length = c(measurements$l.hand, +measurements$r.hand), hand = c(rep(150, :
NAs introduced by coercion
> boxplot(handstats$length~handstats$hand)
Error in eval(expr, envir, enclos) : object 'handstats' not found
What does this mean? I've noticed that in the output code, there is an extra +, which I haven't written in the coding window....strange.
The new graph was interesting with a definite relationship (except for one very strange person who has particularly odd hands)!
Hi Gemsie,
That error message means that there are not exactly 150 samples (I thought there were 150 from your previous remarks). Let's see if this finally works:
Code:
# I'm adding code to find the exact length of the measurements
N=dim(measurements)[1]
handstats=data.frame(length=c(measurements$l.hand,
measurements$r.hand), hand=c(rep(N,'left'),rep(N,'right')))
boxplot(handstats$length~handstats$hand)
That's what I expected: if you have a large right hand, odds are you have a large left hand! This also tells us that both variables give us basically the same information, so you don't need to and should not use both (in statistics this is called collinearity). You need to choose either left or right hand lengths in the rest of your analysis (you can let model fits guide your decision later, but I suspect right hand lengths will be best, as this is the dominant hand for most people).
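A rough sketch of how the collinearity and the "let model fits guide your decision" step could be checked, on simulated (hypothetical) data in which the right hand is generated from the left. Comparing the two single-predictor models by AIC is an assumption about what "model fits" means here; lower AIC indicates the better-fitting model.

```r
# Hypothetical data: r.hand is built from l.hand, so the two are strongly related
set.seed(4)
n <- 152
l.hand <- rnorm(n, mean = 18, sd = 1)
measurements <- data.frame(
  l.hand = l.hand,
  r.hand = l.hand + rnorm(n, mean = 0, sd = 0.3),
  social = rpois(n, lambda = 10)
)

# A correlation close to 1 means the two predictors carry nearly the same information
r <- cor(measurements$l.hand, measurements$r.hand)
print(r)

# Fit one model per hand and compare them; lower AIC is preferred
m.left  <- lm(social ~ l.hand, data = measurements)
m.right <- lm(social ~ r.hand, data = measurements)
fits <- AIC(m.left, m.right)
print(fits)
```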
Next try this code (we start with a very simple linear regression model).
Code:
m1=lm(social~r.hand) # or whatever hand you choose
# see what these commands tell you:
# summary of the fitted model
summary(m1)
# evaluate if the residuals are normal
hist(m1$residuals)
shapiro.test(m1$residuals)
> N=dim(measurements)[1]
> handstats=data.frame(length=c(measurements$l.hand,
+ measurements$r.hand), hand=c(rep(N,'left'),rep(N,'right')))
Error in rep(N, "left") : invalid 'times' argument
In addition: Warning message:
In data.frame(length = c(measurements$l.hand, measure$r.hand), hand = c(rep(N, :
NAs introduced by coercion
> boxplot(handstats$length~handstats$hand)
Error in eval(expr, envir, enclos) : object 'handstats' not found
BTW, I actually have 152 samples (I just rounded down for simplicity, sorry), so I redid the original using 152...still didn't work!
You can't use a character as the times parameter. You probably want it switched around:
Code:
rep("left", 152)
You should also learn how to debug these things yourself (it's a very good skill). R has a good built-in help system. To get help on using rep you would do
Code:
?rep
# or
help(rep)
But for your purposes you would probably want to explore and use something like
Code:
N <- 152
rep(c("left", "right"), each = N)
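Putting the pieces together, a sketch of the whole boxplot step with the arguments to rep() in the right order. The measurements data frame here is simulated and hypothetical; only the column names follow the thread.

```r
# Hypothetical stand-in for the poster's data
set.seed(7)
measurements <- data.frame(
  l.hand = rnorm(152, mean = 18, sd = 1),
  r.hand = rnorm(152, mean = 18, sd = 1)
)

# Number of people, taken from the data instead of typed by hand
N <- nrow(measurements)

# Long format: all left-hand lengths first, then all right-hand lengths,
# with a matching label for each row
handstats <- data.frame(
  length = c(measurements$l.hand, measurements$r.hand),
  hand   = rep(c("left", "right"), each = N)
)

boxplot(length ~ hand, data = handstats)
```

Using each = N repeats "left" N times and then "right" N times, which matches the order in which the two columns are stacked.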
Yay! I got it to work! Thank you, thank you, thank you to TheEcologist and Dason!
I use the help function...a lot, but I always think it's super complex. I could feasibly have sat there for hours trying to figure out which is the wrong way round, etc. I have used Crawley's text too, but didn't find it particularly helpful.
Does anybody have any recommendations for textbooks, etc?
> m1=lm(social~r.hand)
> summary(m1)
Call:
lm(formula = social ~ r.hand)

Residuals:
    Min      1Q  Median      3Q     Max
-16.922  -8.468  -4.238   4.568  34.874

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -14.5674     7.5104  -1.940 0.054300 .
r.hand        3.0612     0.8054   3.801 0.000209 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.19 on 150 degrees of freedom
Multiple R-squared: 0.08785, Adjusted R-squared: 0.08177
F-statistic: 14.45 on 1 and 150 DF, p-value: 0.0002091
> hist(m1$residuals)
> shapiro.test(m1$residuals)
Shapiro-Wilk normality test
data: m1$residuals
W = 0.8571, p-value = 7.608e-11
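The very small Shapiro-Wilk p-value suggests the residuals are far from normal, which is common when the response is a skewed count. One possible next step, sketched here on simulated (hypothetical) data and not taken from the thread, is a quasi-Poisson GLM, whose dispersion estimate shows whether the counts vary more than a plain Poisson model would allow.

```r
# Hypothetical overdispersed count data; variable names follow the thread
set.seed(3)
n <- 152
r.hand <- rnorm(n, mean = 18, sd = 1)
social <- rnbinom(n, mu = exp(-1 + 0.2 * r.hand), size = 1.5)

# Assumption: a log-link count model instead of the ordinary linear model
m2 <- glm(social ~ r.hand, family = quasipoisson)
print(summary(m2))

# Dispersion well above 1 indicates more spread than a Poisson model expects
disp <- summary(m2)$dispersion
print(disp)
```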
Last edited by Gemsie; 01-08-2011 at 04:19 PM.
Coding from memory?
Oh.my.gosh.