Hello everyone,
I am a master's student in Biology. I am now trying to analyse my thesis data, but I am facing a problem: I do not understand which test I should use for my large dataset, which contains one response variable and multiple explanatory variables (both factor and numeric!).
Please let me explain my problem in more detail:
- I have a dataset with one response variable and multiple explanatory variables. The response variable is nitric oxide (NO) production by my cells. The explanatory variables consist of factor variables (e.g. type of medium used, type of cell culture) and numeric variables (growth expressed in doubling time, general responsiveness of the cells, etc.)
- I now want to know which of the explanatory variables explain or predict the outcome of my response variable. For example, is the extent of NO production by my cells dependent on the type of medium used, or on the doubling time, or ...?
I have already been thinking of performing:
- Multiple regression analysis. The problem here is that I do not understand how to include my factor variables. I also think that combinations of the explanatory variables are very important (I do not find strong correlations between the NO production and any single variable).
- Principal component analysis. It did not work out very well, because there you cannot select the response variable (or can I? I don't know how...)
- Model analysis. It would be ideal, I think, to make a model containing all the variables that have some influence on the response variable. However, I do not know how to do this with the factor variables. Somehow I might have to split my analysis into factor and numeric variables.
Does anyone have any suggestions for how to analyse these data? All replies are very welcome, I am getting a little desperate here...
Please let me know if I expressed myself unclearly.
Thanks!
Cass
It seems like you need a linear model, that is, one with the "continuous" variables as covariates and the factors as factors.
If you show an example of some data it might be easier to suggest. What software do you use?
(Skip the principal component.)
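In R, a model of this kind might look like the sketch below. The data frame, its column names (`medium`, `culture`, `td`, `N0`, `NO`) and all values are made up for illustration; they are not the poster's data.

```r
# Hypothetical toy data, simulated only to show the model syntax.
set.seed(1)
dat1 <- data.frame(
  medium  = factor(rep(c("A", "B", "C"), each = 8)),  # factor
  culture = factor(rep(1:6, times = 4)),              # factor
  td      = runif(24, min = 10, max = 30),            # covariate: doubling time
  N0      = runif(24, min = 1e4, max = 1e5)           # covariate: starting density
)
dat1$NO <- 5 + 2 * (dat1$medium == "B") - 0.1 * dat1$td + rnorm(24)

# lm() expands factors into dummy variables automatically,
# so factors and numeric covariates go into the same formula.
fit1 <- lm(NO ~ medium + culture + td + N0, data = dat1)

summary(fit1)  # coefficient table
anova(fit1)    # sequential F-tests, one row per term
```

An interaction between two explanatory variables could be added with `:` or `*` in the formula (for example `medium * td`), if combinations of variables are thought to matter.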
Hello GretaGarbo,
Thanks a lot for your reply! Here is an example file showing what my data look like; the real dataset is much larger. The response variable is 'Response(t=1)'. I use R for the analysis.
What do these variables represent? (Culture Medium Response(t=0) Response(t=1) N0 Nt(ml) Nt t f td)
Is "Response(t=0)" a variable that represents the base value? It seems natural (to me) to have the change in N-production as the dependent variable.
Which are the design (set) variables, that is, the factors?
Did you randomize the experiment?
You might need to change the variable names if you want to read them in directly in R.
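A minimal sketch of one way to do the renaming, assuming the file is comma-separated. The header matches the variable list above, but the two data rows and the new names (`Resp0`, `Resp1`, `Nt_ml`) are made up for illustration.

```r
# Hypothetical file contents; in practice this would come from read.csv("yourfile.csv").
csv_text <- "Culture,Medium,Response(t=0),Response(t=1),N0,Nt(ml),Nt,t,f,td
1,A,0.5,1.2,10000,20000,40000,48,0.04,25
2,B,0.4,0.9,12000,18000,30000,48,0.05,20"

# check.names = FALSE keeps the original headers, so they can be renamed explicitly.
# (With the default check.names = TRUE, R would rewrite them via make.names().)
dat2 <- read.csv(text = csv_text, check.names = FALSE)

# Replace names containing parentheses and '=' with syntactically valid ones.
names(dat2) <- c("Culture", "Medium", "Resp0", "Resp1",
                 "N0", "Nt_ml", "Nt", "t", "f", "td")

# Culture and Medium are labels, not numbers, so store them as factors.
dat2$Culture <- factor(dat2$Culture)
dat2$Medium  <- factor(dat2$Medium)

str(dat2)
```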
Cass123 (11-16-2015)
Ok, I hope this makes it more clear:
Culture = cells come from different cultures, so this is to know which cells belong to the same culture. I can imagine that some cultures are just 'better' than others, so this might partly explain differences in N-production.
Medium = type of medium the cells were cultured in
Response(t=0) = yes, this is indeed the 'basal' response (at t=0, so at the start of the experiment)
Response(t=1) = should actually be t=x; this is the N-production at t=x
N0 = cell density at the start of the experiment
Nt(ml) = cell density in one part of the protocol, hard to explain, but this is during the experiment. I think that there the cell density has some influence, but I am not sure where in the protocol it plays a role.
Nt = cell density in another part of the protocol.
t = duration of culturing
f = measurement to indicate growth speed
td = doubling time (=1/f)
So, I measured all these variables in different experiments (the same culture means the same experiment). The number of cells is too low to perform an experiment containing all variables, and for this reason no randomization is possible. It is also not possible to keep all variables constant except the one being tested, because some variation occurs naturally.
What do you mean by dependent variable? Do you mean that N-production is dependent on the basal level? I cannot find a strong association between the basal level and N-production after a certain time.
I think I understand what you mean by set variables. Do you mean that for every experiment I should only change one variable to see if this one has an influence on N-production? That I cannot put them all together at once?
Are these "cells" something like bacterial cells? They are not human or animal cells?
Is this a situation where, when the bacterial cell number increases, the production of N (nitrogen) will increase?
By dependent variable I mean the response variable, the Response(t=1).
The number of cells (whether that is Nt or Nt(ml), or whether N = nitrogen) can be an explanatory factor, that is, an "independent" variable. And culture and medium can be other explanatory variables.
Cass123 (11-20-2015)
I am sorry I could not reply earlier, but I really appreciate your help.
Ah I see what you mean! Yes, maybe it is better to take the change in N-production as the response variable instead of the N-production at t=X. I will try whether this makes more sense, thank you!
These cells are animal cells. About your second question, I think that it is the other way around: when the cell number increases, the N-production decreases. However, when the cell numbers are very low, N-production is also very low. In other words, there is not a linear relationship, but I am 90% sure that there is an association (such as an optimal cell density, but the optimum differs per animal). It is due to this complexity that I don't know how to analyse the data; I have mostly learned about linear correlations. Anyway, Nt and Nt(ml) are indeed the number of cells; the 'N' does not stand for Nitrogen.
So, do you think there is a way to include all these factors and explanatory variables in one model, or should I test for correlations/associations of the variables with N-production separately?
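As a rough illustration of how both ideas could go into one model, the sketch below uses the change from baseline as the response and adds a squared cell-density term so that the fitted curve can have an optimum. Everything here is an assumption made for the example: the simulated data, the column names (`Resp0`, `Resp1`, `Nt_s`, `dResp`) and the choice of a quadratic shape. A quadratic is only one simple way to allow a hump-shaped relationship.

```r
# Hypothetical simulated data with a hump-shaped effect of cell density.
set.seed(3)
n <- 40
dat3 <- data.frame(
  medium = factor(rep(c("A", "B"), times = n / 2)),
  Nt_s   = runif(n, min = 1, max = 10),      # cell density, rescaled (e.g. Nt / 1e4)
  Resp0  = rnorm(n, mean = 2, sd = 0.3)      # baseline response
)
dat3$Resp1 <- dat3$Resp0 + 3 - 0.1 * (dat3$Nt_s - 5)^2 + rnorm(n, sd = 0.3)

# Response: change from baseline.
dat3$dResp <- dat3$Resp1 - dat3$Resp0

# I(Nt_s^2) adds the squared term; the model is still linear in its coefficients.
fit3 <- lm(dResp ~ medium + Nt_s + I(Nt_s^2), data = dat3)
summary(fit3)

# If the squared coefficient is negative, the fitted parabola peaks at -b1 / (2 * b2).
b <- coef(fit3)
optimum <- unname(-b["Nt_s"] / (2 * b["I(Nt_s^2)"]))
optimum
```

If the optimum really differs per animal, an interaction such as `culture:Nt_s` could be considered, at the cost of many more parameters.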
You wrote: "Anyways, Nt and Nt(ml) is the number of cells indeed; the 'N' does not stand for Nitrogen."
But the dependent variable is, as you wrote earlier: "The response variable is nitric oxide (NO) production by my cells."
So here "N" is nitrogen and "O" is oxygen. And the NO is the response variable.
So there are a number of animal cells Nt, and in their metabolism they produce NO?
Maybe there is an amount of NO production in a cell's normal life, even if the cell is not dividing?
Maybe there is another amount created when the cell divides?
This would mean that there are two effects to estimate. One that is dependent on the level of Nt, and one that is dependent on the slope (of the growth curve).
Is there an S-shaped growth curve for the Nt over time?
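One way the "two effects" idea might be written down in R is sketched below. It is only an illustration under strong assumptions: the data are simulated, and the slope of the growth curve is approximated crudely as (Nt - N0) / duration, which ignores any S-shape.

```r
# Hypothetical simulated data; names and values are made up for illustration.
set.seed(2)
n   <- 30
N0  <- runif(n, min = 1e4, max = 5e4)     # starting cell density
dur <- runif(n, min = 24, max = 96)       # duration of culturing
Nt  <- N0 * runif(n, min = 1.5, max = 6)  # cell density at time of measurement

# Crude approximation of the growth-curve slope (average increase per time unit).
slope <- (Nt - N0) / dur

# Simulated response: one part proportional to the level, one to the slope.
NO <- 1 + 2e-5 * Nt + 3e-3 * slope + rnorm(n, sd = 0.3)

# Level effect and slope effect estimated together in one linear model.
fit4 <- lm(NO ~ Nt + slope)
coef(summary(fit4))
```

Level and slope will often be correlated in real data, so the two effects may be hard to separate unless the experiments cover a range of durations and starting densities.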
Almost all answers on this site are about translating a verbal description into a mathematical model (or description) and formulating an estimation method and a testing method for that mathematical model.
If you write down your mathematical model yourself it will be easier for the rest of us to suggest something.