Multivariate data set of two groups: finding which variables are statistically diff.

mgroves

New Member
I have a data set consisting of two different groups (A and B), each with 5 specimens. We tested 73 different variables in each specimen and recorded the data. I am new to statistics, so I am struggling to find the best method(s) to analyse the data so that I can find which variable(s) is/are statistically different between the two groups. I do not know which variables are independent, dependent, or some mixture thereof. I am using R to do the analysis on this data set, but I can figure out how to use R once I figure out what tests and steps need to happen statistically first.

mgroves

New Member
Re: Multivariate data set of two groups: finding which variables are statistically di

Quick addition: all of the data recorded are numerical non-integer values. So, I have a 10x73 matrix: the first five rows for group A and the last five rows for group B.

Junes

Member
What do you want to accomplish? It's usually considered a bad idea to do significance testing without defining hypotheses or research questions. If you test for 73 variables at once, there's a large chance that the "significant" difference is just a coincidence since you're doing so many tests.
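To put a rough number on that, here is a minimal sketch in Python. It assumes every test is run at the conventional 0.05 level, that no variable truly differs, and that the 73 tests are independent of each other. The independence assumption is almost certainly wrong for real measurements, so treat the result as a ballpark figure only.

```python
# Family-wise error rate for m tests at level alpha,
# ASSUMING independent tests and that every null hypothesis is true
# (both are illustrative assumptions, not facts about this data set).
alpha = 0.05   # per-test significance level (assumed)
m = 73         # number of variables tested (from the thread)

fwer = 1 - (1 - alpha) ** m       # P(at least one false positive)
expected_false_pos = m * alpha    # expected number of false positives

print(f"P(at least one false positive) = {fwer:.3f}")
print(f"Expected false positives       = {expected_false_pos:.2f}")
```

Under those assumptions, at least one spurious "significant" variable is close to certain.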

If you really have no clue and just want to explore the data, a better idea would be to do a descriptive analysis of these data, formulate hypotheses based on the results (and theory), and then test these in a new study. You will also need a larger sample size then; n=5 is very small.
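A descriptive pass could look something like the sketch below (Python, standard library only). The variable names and numbers are made-up placeholders, not the real data, and ranking by the raw median difference is just one possible choice; it ignores the scale of each variable.

```python
from statistics import mean, median, stdev

# Made-up placeholder values, NOT the real data: {variable: (group A, group B)}
data = {
    "var1": ([1.2, 0.9, 1.5, 1.1, 1.3], [2.1, 1.8, 2.4, 2.0, 1.9]),
    "var2": ([0.4, 0.7, 0.5, 0.6, 0.5], [0.5, 0.6, 0.4, 0.7, 0.6]),
}

def describe(values):
    """Basic summary statistics for one group on one variable."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}

summary = {}
for name, (a, b) in data.items():
    sa, sb = describe(a), describe(b)
    summary[name] = {"A": sa, "B": sb, "median_diff": sb["median"] - sa["median"]}

# Order variables by absolute median difference, largest first,
# to see which ones might deserve a hypothesis in a follow-up study.
ranked = sorted(summary, key=lambda v: abs(summary[v]["median_diff"]), reverse=True)
for name in ranked:
    s = summary[name]
    print(f"{name}: median A={s['A']['median']:.2f}, "
          f"B={s['B']['median']:.2f}, diff={s['median_diff']:+.2f}")
```

Nothing here is a significance test; it only shows where the groups look different.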

mgroves

New Member
Honestly, the only information that was given to me was the spreadsheet and the request to find the variable(s) that are statistically different ("statistically significant" was also used in my conversation about the data). I honestly don't even know what the values in the spreadsheet represent specifically. This is why I'm so frustrated. I feel like I wasn't given enough information about the data to do what has been requested.

T-tests were discussed as a way to find out if one of the variables was statistically significant, but then the question was "How would I determine the significance for the rest of the variables?" Again, I have no clue about independence between the variables.

One suggestion that was given to me from a friend is that I do a correlation analysis on the variables to find which ones are dependent and which are independent. I'm not sure if that will work, but even if it does, how would I find the significance of the dependent variables? As I understand, t-tests only work for independent variables. I know there is a paired t-test... would that work?

Junes

Member
So what is it? A homework assignment, something you have to do for your job?

This is just silly and if it were up to me, I would refuse to do it and go back to whoever gave you these data. You don't have enough information. Frankly, you don't have any. Whatever results you produce will likely only misinform whoever gave you this.

There is no way to determine what variable is independent or dependent. This is something that can only be hypothesized on a theoretical basis. Statistics can't help you there, since it can't establish causation. However, since you're looking for the difference between two groups, whoever gave you these data is possibly assuming that the grouping variable is the independent variable.

And like I said, doing 73 tests without any plan or information is simply a bad idea. Whoever gave you this assignment should reconsider.

Junes

Member
But anyway, if you absolutely must do this (which I would really advise against), I think the only way to do it would be Mann-Whitney U tests (correlation wouldn't work since you're looking for differences, not correlation). A t-test wouldn't work since you can't assume normality with such a small sample size.
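For what it's worth, with 5 specimens per group the Mann-Whitney U test can be computed exactly by brute force, since there are only 252 ways to split 10 values into two groups of 5. The sketch below is in Python with a hypothetical helper, `mann_whitney_exact`, and made-up numbers. In R, the built-in `wilcox.test(x, y)` does the same job.

```python
from itertools import combinations

def mann_whitney_exact(a, b):
    """Exact two-sided Mann-Whitney U test by enumerating every relabelling.

    Returns (U for group a, two-sided p-value). Tied pairs count 0.5.
    Hypothetical helper for illustration; only sensible for tiny samples.
    """
    def u_stat(x, y):
        # Number of (x, y) pairs where x exceeds y, ties counted as half
        return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
                   for xi in x for yj in y)

    pooled = list(a) + list(b)
    n_a, n = len(a), len(pooled)
    center = n_a * (n - n_a) / 2           # expected U under the null
    observed = u_stat(a, b)
    obs_dev = abs(observed - center)

    extreme = total = 0
    for idx in combinations(range(n), n_a):
        chosen = set(idx)
        x = [pooled[i] for i in idx]
        y = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Count relabellings at least as far from the centre as the data
        if abs(u_stat(x, y) - center) >= obs_dev:
            extreme += 1
    return observed, extreme / total

# Made-up, perfectly separated groups (NOT the real data)
group_a = [1.2, 0.9, 1.5, 1.1, 1.3]
group_b = [2.1, 1.8, 2.4, 2.0, 1.9]
u, p = mann_whitney_exact(group_a, group_b)
print(f"U = {u}, exact two-sided p = {p:.4f}")
```

Even with the two groups completely separated, the p-value here is 2/252, which is the smallest this test can give at n=5 per group.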

Even though the power of the tests will be very weak with a sample size of 5, you will probably have false positives, possibly many. To prevent that, you could do a Bonferroni correction, which would very likely leave you with no significant results at all.
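A quick back-of-the-envelope check of the Bonferroni point, sketched in Python. It assumes an exact two-sided rank test and the usual 0.05 level. A t-test could produce smaller p-values, but it rests on the normality assumption mentioned above. In R, `p.adjust(p, method = "bonferroni")` applies the same correction to a vector of p-values.

```python
from math import comb

n_a = n_b = 5    # specimens per group (from the thread)
m = 73           # number of variables tested (from the thread)
alpha = 0.05     # conventional significance level (an assumption)

# Smallest two-sided p-value an exact rank test can give: the two
# perfectly separated labellings out of all C(10, 5) possible ones.
min_p = 2 / comb(n_a + n_b, n_a)

# Bonferroni: each individual test must clear alpha / m
bonferroni_threshold = alpha / m

can_reach = min_p <= bonferroni_threshold

# Largest number of tests for which min_p could still pass Bonferroni
max_tests = int(alpha / min_p)

print(f"Smallest attainable p-value : {min_p:.5f}")
print(f"Bonferroni threshold        : {bonferroni_threshold:.5f}")
print(f"Can any variable pass?      : {can_reach}")
print(f"Max tests that could pass   : {max_tests}")
```

So with 5 against 5, an exact rank test cannot get under the corrected threshold for 73 tests, no matter what the data look like.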

But again, this is just really, really silly and you really shouldn't do it.

mgroves

New Member
HAHA, that is what I have been thinking for a week! One major problem is that I am not sure what questions to ask in order to get more information about the data set. I'm not sure what all I need.

I am pretty positive that the two groups are independent, but I am unsure about the variables that were tested. Also, I think the values in the spreadsheet are abundances.

I will try your suggestions and see what I get. This assignment is for a new job. He is testing me. I'm failing, obviously, but this is because I'm not a statistician (I'm a mathematician), and he knew this when he hired me.

noetsi

No cake for spunky
In theory you would have a theory of which variables are significant ahead of time. Otherwise familywise error, which comes from using the same data for multiple tests, might lead to some variables being significant (or not significant) just by chance, as noted above. Admittedly, you commonly won't have a theory, of course, especially with so many variables.

All the techniques I was going to suggest won't work with five cases. If you have more data, you might try exploratory factor analysis (EFA) and then use the resulting factors in a regression model.