Dear all,
I would like to know how to organize my datasheet for import into R for survival analysis (Surv object, log-rank test and coxph).
Let's consider an experiment with small animals. A cohort of 600 individuals is followed up every two days for 6 days (so I have data at day = 0, 2, 4, 6). They are divided into 4 conditions (4 exposures). Each condition (referred to as A, B, C or D) receives either no pharmaceutical (A), one pharmaceutical (B), another pharmaceutical (C), or the combination of both (D). The experiment is balanced, so there are 150 animals per condition.
In my datasheet, for each day I include ALL 600 animals with their survival status coded as alive (=1) or dead (=2), so in total I have 600 x 4 = 2400 lines. Once an individual is dead, let's say on day 2, I still include it at days 4 and 6 with the same "dead" status. But I wonder whether R keeps track of the individual or not at all. If it does not, should I remove the individual at the dates after its death?
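To make the layout concrete, here is a minimal sketch with simulated data, not the attached datasheet. The `id` and `death_day` columns and the random death days are assumptions made for illustration only; `time`, `survival` and `exposure` are the column names used in my script. The last step shows one possible alternative, a single row per animal. Whether that is the layout the survival functions expect is exactly my question.

```r
# Simulated stand-in for the datasheet (hypothetical values, not the real data)
set.seed(1)
animals <- data.frame(
  id        = 1:600,                                   # assumed animal identifier
  exposure  = rep(c("A", "B", "C", "D"), each = 150),  # 150 animals per condition
  death_day = sample(c(2, 4, 6, Inf), 600, replace = TRUE)  # Inf = alive at day 6
)

# Layout described above: one row per animal per observation day
# (merge() with no shared columns returns the Cartesian product)
long <- merge(animals, data.frame(time = c(0, 2, 4, 6)))
# 1 = alive, 2 = dead; once dead, the status stays 2 on later days
long$survival <- ifelse(long$time >= long$death_day, 2, 1)
long <- long[order(long$id, long$time), c("id", "exposure", "time", "survival")]
nrow(long)      # 600 animals x 4 days
head(long, 8)

# Possible alternative: one row per animal, keeping the first day it was
# recorded dead, or the last observation day if it never died
per_animal <- do.call(rbind, lapply(split(long, long$id), function(x) {
  dead <- x$survival == 2
  data.frame(
    id       = x$id[1],
    exposure = x$exposure[1],
    time     = if (any(dead)) min(x$time[dead]) else max(x$time),
    survival = if (any(dead)) 2 else 1
  )
}))
nrow(per_animal)            # one row per animal
table(per_animal$exposure)  # animals per condition
```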
Please find attached my datasheet and, below, my R script. I wonder whether this is the correct input format for the analyses (log-rank test and coxph models).
Also, I have trouble with the censoring marks on my graphs, which makes me think something is wrong (see my code below).
Thank you very much for your help; it is very much appreciated.
Kind regards,
R.
library(survival)
library(ggplot2)
library(ggpubr)
library(survminer)
a <- read.delim("surv.txt")
str(a)
# Create a Surv object
# Important: the "survival" variable is coded 2 = dead, 1 = alive
survobj <- Surv(time = a$time, event = a$survival, type = "right")
# Plot survival distribution of the total sample
# Kaplan-Meier estimator
fit0 <- survfit(survobj ~ 1, data = a)
fit1 <- survfit(survobj ~ exposure, data = a)
plot(fit1, xlab = "time (days)",
     ylab = "Survival rate", col = c("black", "green", "orange", "red"),
     main = "Survival curves")
legend("bottomleft", title = "Exposures", c("A", "B", "C", "D"),
       fill = c("black", "green", "orange", "red"))
## Why is there no censoring mark on the graph?
summary(fit0)
summary(fit1)
summary(fit1)$table
d <- data.frame(time = fit1$time,
                n.risk = fit1$n.risk,
                n.event = fit1$n.event,
                n.censor = fit1$n.censor,
                surv = fit1$surv,
                upper = fit1$upper,
                lower = fit1$lower
)
head(d)
write.csv(d, "d.csv")
ggsurvplot(fit1,
           pval = FALSE, conf.int = TRUE,
           risk.table = TRUE,            # Add risk table
           risk.table.col = "strata",    # Change risk table color by groups
           linetype = c(1, 2, 3, 4),     # Change line type by groups
           ggtheme = theme_bw(),         # Change ggplot2 theme
           palette = c("black", "grey", "orange", "red"),
           cumevents = FALSE,
           cumcensor = FALSE,
           tables.height = 0.5,
           surv.plot.height = 10,
           ncensor.plot.height = 10)
## I don't understand why there are "censored" marks at intermediate days instead of at the last day only
## I don't understand why the risk table shows 600 for each group at time 0, for example, instead of 150
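For what it is worth, the 600 per group can be reproduced with simulated data (not the attached file), which suggests that survfit() counts rows rather than animals. This is a minimal sketch only: the `id` and `death_day` columns and the random death days are assumptions, and the one-row-per-animal table is just a possible alternative layout, not something I know to be the required one.

```r
library(survival)

# Simulated stand-in for the datasheet (hypothetical values, not the real data)
set.seed(1)
animals <- data.frame(
  id        = 1:600,                                   # assumed animal identifier
  exposure  = rep(c("A", "B", "C", "D"), each = 150),
  death_day = sample(c(2, 4, 6, Inf), 600, replace = TRUE)  # Inf = alive at day 6
)

# Layout used in my datasheet: every animal repeated at days 0, 2, 4, 6
long <- merge(animals, data.frame(time = c(0, 2, 4, 6)))
long$survival <- ifelse(long$time >= long$death_day, 2, 1)  # 1 = alive, 2 = dead

fit_long <- survfit(Surv(time, survival) ~ exposure, data = long)
fit_long$n    # observations counted per exposure group (rows, not animals)

# Possible alternative: one row per animal (day of death, or day 6 if alive)
per_animal <- transform(
  animals,
  time     = ifelse(is.finite(death_day), death_day, 6),
  survival = ifelse(is.finite(death_day), 2, 1)
)

fit_animal <- survfit(Surv(time, survival) ~ exposure, data = per_animal)
fit_animal$n  # observations counted per exposure group with one row per animal
```

With the repeated rows, each group contributes 150 x 4 rows, and the rows where an animal is still alive at days 0, 2 and 4 enter as censored observations at those days. That would be consistent with both the 600 in the risk table and the censoring marks at intermediate days.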