Urgently need help with data analysis

Hey folks,

I am really desperate for help with this, since I have to present my results at a conference in less than two weeks. I am an undergraduate psychology student and a statistics noob, so please go easy on me.

Research and Variables

I want to examine the correlation between a personality trait (PT) and spatial memory (SM). My hypothesis is that there is a relevant positive correlation between the two. Here are all the important variables:

  • Spatial Memory (SM): The outcome, scaled measure
  • Personality Trait (PT): The predictor, scaled measure
  • Intelligence (IQ): As a possible covariate, scaled measure
  • Subject's Age (AGE): As a possible covariate, scaled measure
  • PC-Skills (PC): As a possible covariate, scaled measure
  • Several other variables, also possible covariates, won't list them all here

Sample & Data-analysis so far

Unfortunately my sample-size is very limited (N = 26), which causes a lot of problems. This is due to the special nature of my population and there's nothing I can do about it now, except for making the best of it.

I cannot make any assumptions about a normal distribution in the population, since my data are not sufficient. So let's say there's no normal distribution. However, I will be using some methods that require normally distributed data anyway, like the Pearson correlation, since I've been told that the issue can be ignored. If I'm completely wrong about this, let me know.

Correlations with the outcome:
  • r (PT, SM) ~ .49*
  • r (IQ, SM) ~ -.40*
  • r (AGE, SM) ~ .70**
  • r (PC, SM) ~ -.21 (not significant)
  • the other correlations were not significant either, so that's why I didn't list them

Now to my problem(s):
  • I want to make sure that the influence on SM comes from PT and not from PT in interaction with the other covariates. I would use a multiple regression for this, but I'm not sure if I'm supposed to, due to my small sample size, etc. A LOESS curve shows a linear relationship. The predictor shows only small, non-significant correlations with the other variables, e.g. r (AGE, PT) ~ .24 (not significant).

  • Can I run a multiple regression on this?
  • If yes: Do I only include the variables whose correlations with the outcome SM turned out to be significant, or do I use all variables (which wouldn't work due to my small sample size, right?)
  • If yes: Do I use the ENTER method?
  • If no: What else can I do to isolate the effect of PT?

I hope I didn't forget anything.. Sorry in that case. Would be extremely thankful if anyone can help me with this!!


No cake for spunky
You can run multiple regression on this, but your power will be awful, so your chance of failing to reject the null when you should reject it will be high. It is always possible the software won't run with so few cases; all you can do is try. I am not really clear what scale data means, as I don't use that term. With so few cases (below the minimum many cite for the central limit theorem to apply), any violation of the method's assumptions will be more serious, so be sure to check carefully for them (multicollinearity, normality of the residuals, etc.).
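As a rough illustration of the multicollinearity check, here is a minimal sketch that turns the one predictor intercorrelation reported in the question, r (AGE, PT) ~ .24, into a variance inflation factor (VIF). It assumes a model with only these two predictors, in which case VIF = 1 / (1 - r²); the function name `vif_two_predictors` is made up for this example. With more predictors, each VIF needs the R² from regressing that predictor on all the others, which the statistics software reports directly.

```python
def vif_two_predictors(r):
    """VIF when the model contains exactly two predictors correlated at r."""
    return 1.0 / (1.0 - r ** 2)

# Correlation between the two predictors, taken from the question.
r_age_pt = 0.24

vif = vif_two_predictors(r_age_pt)
print(f"VIF for AGE and PT: {vif:.2f}")
```

Commonly quoted rules of thumb treat VIF values above roughly 5 or 10 as a warning sign, so a value close to 1 suggests these two predictors are not a collinearity problem by themselves.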

If your dependent variable is not interval, then you have to use something like logistic regression. With so few cases I would be tempted to use only those variables that are significant, but remember that because of your extremely small sample size, tests of significance are going to be doubtful. You might well have a real relationship that does not have a significant p value simply because your sample size is so small. You should do a power analysis with something like G*Power (you will almost certainly get ugly results when you do).
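To give a feel for what such a power analysis looks like, here is a minimal sketch for a single correlation with N = 26. It uses the Fisher z approximation rather than the exact routines in G*Power, so the numbers are only approximate; the function name `approx_power_correlation` and the population correlations tried are assumptions made for this example.

```python
from math import atanh, sqrt
from statistics import NormalDist

def approx_power_correlation(rho, n, alpha=0.05):
    """Approximate two-sided power for testing H0: rho = 0 via Fisher's z."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # atanh(sample r) is roughly normal with mean atanh(rho)
    # and standard error 1 / sqrt(n - 3).
    shift = atanh(rho) * sqrt(n - 3)
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

# Hypothetical population correlations (small, medium, large).
for rho in (0.1, 0.3, 0.5):
    power = approx_power_correlation(rho, 26)
    print(f"rho = {rho:.1f}, N = 26: approximate power = {power:.2f}")
```

Under this approximation, a medium-sized correlation of .3 is detected far less than half the time with 26 cases, which is the "ugly result" to expect.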

If you mean stepwise: this is strongly frowned on in many circles because it will produce invalid results with high multicollinearity, is purely atheoretical, and will vary sharply from sample to sample (made worse in your case, I would think, because you have such a small sample). It would be better to do hierarchical regression, where you tell the computer what order to add the variables in based on theory, with the theoretically more important variables added first.

The specific effects of PT will be generated automatically by multiple regression, which statistically controls for the other variables. Note that if you expect an interaction effect, you have to generate interaction variables (such as predictor1 times predictor2) and see if they are significant. The regression will not address this automatically.
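What "statistically controls for" means can be sketched from the correlations reported in the question alone. The example below assumes a model with just PT and AGE as predictors (the only pair for which all three correlations were given) and computes the partial correlation of PT with SM and the standardized regression weight of PT. The function names are made up for this illustration, and the inputs are the rounded values from the post.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def standardized_beta(r_xy, r_xz, r_yz):
    """Standardized weight of x in a two-predictor model y ~ x + z."""
    return (r_xy - r_xz * r_yz) / (1 - r_xz ** 2)

# Rounded correlations reported in the question.
r_pt_sm = 0.49
r_age_sm = 0.70
r_age_pt = 0.24

r_partial = partial_corr(r_pt_sm, r_age_pt, r_age_sm)
beta_pt = standardized_beta(r_pt_sm, r_age_pt, r_age_sm)
print(f"partial r (PT, SM | AGE) = {r_partial:.2f}")
print(f"standardized beta for PT = {beta_pt:.2f}")
```

Both values come out somewhat below the raw r of .49, which is the part of the PT and SM relationship that does not overlap with AGE. This says nothing about IQ or the other covariates, since their correlations with PT were not reported.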
Thank you for taking the time :) It seems like I have a pretty robust effect, so even though the sample size is small, it still turns out to be significant and the effect sizes I calculated so far are pretty decent as well.

I think I will have to read up on residuals..

Sorry, by scale measure I simply mean interval data or higher.
I wasn't talking about the stepwise-method, I was using the forced-entry-method in SPSS. Is that one actually hierarchical? If not, which one is it in SPSS?

I'm investigating people with a very specific clinical diagnosis, so it's very hard to get more subjects.


No cake for spunky
I wasn't talking about the stepwise-method, I was using the forced-entry-method in SPSS. Is that one actually hierarchical?
I actually thought SPSS called forced entry hierarchical, which is why I used the term. It involves entering variables in blocks, with an F-change test for each block.
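The block-wise idea can be sketched as follows: fit the covariates first, then add PT, and test whether R² improves. This is a minimal illustration on simulated data, not the poster's data and not SPSS's implementation; the variable names, the simulated effect sizes, and the helper functions `ols_r2` and `f_change` are all assumptions made for the example.

```python
import random

def ols_r2(columns, y):
    """R^2 of an OLS fit with intercept, solved via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[c][k] / A[c][c] for c in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the R^2 change, with (k_added, n - k_full - 1) df."""
    return ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / (n - k_full - 1))

# Simulated stand-in data with N = 26 (hypothetical, for illustration only).
random.seed(1)
n = 26
age = [random.gauss(40, 10) for _ in range(n)]
iq = [random.gauss(100, 15) for _ in range(n)]
pt = [random.gauss(50, 10) for _ in range(n)]
sm = [0.5 * a - 0.1 * q + 0.3 * p + random.gauss(0, 5)
      for a, q, p in zip(age, iq, pt)]

r2_block1 = ols_r2([age, iq], sm)       # block 1: covariates only
r2_block2 = ols_r2([age, iq, pt], sm)   # block 2: covariates plus PT
f_stat = f_change(r2_block1, r2_block2, n, k_full=3, k_added=1)
print(f"R2 block 1 = {r2_block1:.3f}, R2 block 2 = {r2_block2:.3f}")
print(f"F change = {f_stat:.2f} on (1, {n - 3 - 1}) df")
```

The F-change statistic is then compared against an F distribution with those degrees of freedom, which is the test SPSS reports for each block when R² change statistics are requested.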

I will again strongly recommend doing a power calculation, which I think SPSS will do, but for which you can in any case use G*Power.


No cake for spunky
A separate issue from power (which depends on effect size) is generalizability. I doubt that 26 cases, probably not chosen at random, can really be generalized to the larger population. That does not affect the statistics, but it obviously does affect whether your results mean anything for the population. This tends to get ignored in many discussions of methods (it comes up a lot in design of experiments).