Regression model when correlations are low

StatIsCool

New Member
Hi,

I'm working on the "esoph" dataset, which ships with R. The dataset is meant for studying whether age, drinking and smoking are associated with oesophageal cancer.

There are 5 columns: three are ordered factors (agegp, alcgp, tobgp) and two are numeric counts (ncases, ncontrols).

After doing a chi-squared test, I found that the numerical variables are not dependent at all since the p-value is very small.
After getting the correlation matrix, I noticed that all the correlations are low (below 50%)...

Code:
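# Code used to get the correlation matrix
df = esoph  # assumption: df is a copy of the built-in esoph data

# Recode the group factors as their integer level codes (1, 2, 3, ...)
df$agegp = as.numeric(as.factor(df$agegp))
df$alcgp = as.numeric(as.factor(df$alcgp))
df$tobgp = as.numeric(as.factor(df$tobgp))

# Correlations of the first column (agegp) with every column, sorted
sort(cor(df)[1,])
# Full Pearson correlation matrix
cor(df, use="complete.obs", method="pearson")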
Does that mean there are no variables worth including in a regression model? How can one run a regression analysis on this dataset?
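
For reference, one possible way to set up a regression on this data, offered as a rough sketch rather than a definitive answer: low pairwise correlations between the coded group variables do not by themselves decide what goes into a model. Because each row holds a count of cases and a count of controls, a binomial GLM is a common choice, and the `?esoph` help page shows a similar model. The additive formula and the conversion to unordered factors below are illustrative assumptions, not something stated in this thread.

```r
# Sketch only: a binomial (logistic) regression on the built-in esoph data.
# The model formula is an illustrative assumption.
df <- datasets::esoph

# Treat the ordered factors as plain factors so each coefficient is a simple
# contrast against the lowest group (instead of polynomial contrasts).
df$agegp <- factor(df$agegp, ordered = FALSE)
df$alcgp <- factor(df$alcgp, ordered = FALSE)
df$tobgp <- factor(df$tobgp, ordered = FALSE)

# The response is the pair (cases, controls) in each row;
# the predictors are the age, alcohol and tobacco groups.
fit <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
           data = df, family = binomial())

# Coefficients are on the log-odds scale.
summary(fit)
```

Exponentiating the coefficients with `exp(coef(fit))` gives odds ratios relative to the lowest group of each variable.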

Dason

Ambassador to the humans
After doing a chi-squared test, I found that the numerical variables are not dependent at all since the p-value is very small.
Either you wrote something wrong or are misinterpreting p values.

StatIsCool

New Member
I ran three chi-squared tests:

Code:
CHI = chisq.test(table(df$agegp, df$alcgp))
CHI

CHI = chisq.test(table(df$agegp, df$tobgp))
CHI

CHI = chisq.test(table(df$alcgp, df$tobgp))
CHI
I found p-values of 1, 0.9999 and 0.9999 respectively. I then checked the chi-squared probability table at the corresponding degrees of freedom and found that all my p-values are way below. I interpreted that as meaning my variables are not dependent. This is how I understood the chi-squared test.

Am I misinterpreting p-values?

Dason

Ambassador to the humans
No. But you said you had small p values and those are quite large.
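
To make the distinction concrete, here is a small sketch of how the p-value from one of those tests could be read. The null hypothesis of a chi-squared test of independence is that the two variables are independent, so a large p-value means there is no evidence against independence, while a small one would count as evidence of dependence. The 0.05 threshold and the `alpha` name are assumptions for illustration only.

```r
# Sketch: reading the p-value from a chi-squared test of independence.
df <- datasets::esoph

# Cross-tabulate two of the grouping variables
# (this counts rows of the data frame, not patients).
tab <- table(df$agegp, df$alcgp)

# The expected cell counts are small, so R warns that the chi-squared
# approximation may be inaccurate; the warning is silenced here.
res <- suppressWarnings(chisq.test(tab))

alpha <- 0.05  # conventional threshold, assumed for illustration
if (res$p.value < alpha) {
  message("Small p-value: evidence against independence.")
} else {
  message("Large p-value: no evidence against independence.")
}
```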