Hi,
I'm working on the "esoph" dataset, which ships with R by default (it is in the datasets package, so it is not specific to RStudio). The aim of the dataset is to examine whether age, alcohol consumption and tobacco consumption are associated with oesophageal cancer.
There are 5 columns: three are ordered factors (agegp, alcgp, tobgp) and two are numeric counts (ncases, ncontrols).
After running a chi-squared test, I got a very small p-value, which suggests that the variables are not independent.
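The post does not show the call used for the chi-squared test. The following is only a minimal sketch of how such a test could be run, assuming it compares alcohol group against tobacco group using the case counts; the choice of variables and the names `tab` and `res` are assumptions, not taken from the original.

```r
# Assumption: test alcgp against tobgp, weighting each cell by ncases.
# esoph is available in base R through the datasets package.
tab <- xtabs(ncases ~ alcgp + tobgp, data = esoph)

# Pearson chi-squared test of independence on the 4 x 4 table
res <- chisq.test(tab)

res$p.value    # a small value is evidence against independence
res$parameter  # degrees of freedom: (4 - 1) * (4 - 1)
```

A small p-value here rejects the hypothesis that the two groupings are independent; it says nothing about how strong the association is.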
After computing the correlation matrix, I noticed that all the correlations are low (below 0.5 in absolute value).
Does that mean no variable is worth including in a regression model? How can one run a regression analysis on this dataset?
Code:
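# Code used to get the correlation matrix
df <- esoph
# Convert the ordered factors to their integer level codes
df$agegp <- as.numeric(as.factor(df$agegp))
df$alcgp <- as.numeric(as.factor(df$alcgp))
df$tobgp <- as.numeric(as.factor(df$tobgp))
# Correlations of agegp (first row) with every column, in ascending order
sort(cor(df)[1, ])
# Full Pearson correlation matrix
cor(df, use = "complete.obs", method = "pearson")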