# Use of the g-formula to estimate the x change after x years in x sample in R

#### Tofuliii

##### New Member
Hi guys,

New here. I'm terrible at stats, made the terrible decision to take a stats module, and have never felt so stupid. I know R well, but more for general data analysis than for pure stats stuff like this.

I need to do an analysis in R for the above, along with answering:

a) had nobody quit x, b) had all the individuals quit x. What is the average causal effect of x on x change after 5 years?

Would really appreciate any help. Tried searching everywhere and can't understand any of the info online. The simpler someone can explain it to me, the better. Thank you in advance!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Can you post the question exactly as it was posed to you? Your version reads crappily. Also, were you given a dataset? What 'module' is this for? Typically the g-formula consists of assigning everyone to one group, then to the other, and then finding the difference between the average predicted outcomes.

#### Tofuliii

##### New Member
Hi,

Thanks for the reply.

Use the g-formula to estimate the BMI change after 5 years in this sample, a) had nobody quit smoking, b) had all the individuals quit smoking. What is the average causal effect of smoking cessation on BMI change after 5 years?

Yes I have a dataset. This question was part of an analysis of a set of health records data.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Have you all started talking about doubly robust estimation or not? So, a propensity model and an outcome model. Also, do you have BMI data for two time points, or is it formatted as one variable (5-year change in BMI)?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
OK, I was hoping you would reply with the details. But if it is time-fixed data, where you have smoking status and another variable that is the five-year change in BMI, what you would typically do is make three copies of the data file:
- First copy is exactly the same as the original.
- Second copy is the same as the original, but you leave change in BMI as missing and set smoking status to 'no' for everyone.
- Third copy is the same as the original, but you leave change in BMI as missing and set smoking status to 'yes' for everyone.

Now you fit a linear model to the first copy, then use its coefficients (the betas for smoking and any covariates) to score the other two datasets. Lastly, you take the average of the predicted values in the second and third datasets separately and subtract one average from the other. This gets you the average treatment effect of smoking on 5-year change in BMI. If there are other covariates you need to control for, you include them in the regression model and also use them when scoring the second and third datasets. I believe you usually need to use bootstrapping to get the confidence interval on the ATE. I would need to check on that last part.
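
Here is a rough sketch of those steps in base R, in case it helps to see them written out. I don't have your data, so everything here is made up for illustration: the simulated data frame `dat`, the column names `quit`, `age`, `sex` and `bmi_change`, and the choice of covariates are all assumptions, and 'smoking status' is coded as a quit indicator (1 = quit, 0 = did not quit). The bootstrap part is one common way to get an interval (a percentile interval), not necessarily what your module expects.

```r
# Simulated stand-in for the health-records data (all names are hypothetical).
set.seed(42)
n <- 500
age <- rnorm(n, mean = 50, sd = 10)
sex <- rbinom(n, 1, 0.5)
# Quitting depends on age here, so age confounds the quit -> BMI relationship.
quit <- rbinom(n, 1, plogis(-0.5 + 0.03 * (age - 50)))
bmi_change <- 0.5 + 1.2 * quit - 0.02 * (age - 50) + 0.3 * sex + rnorm(n)
dat <- data.frame(quit, age, sex, bmi_change)

g_formula <- function(d) {
  # Outcome model fitted on the observed data (the "first copy").
  fit <- lm(bmi_change ~ quit + age + sex, data = d)

  # Second copy: nobody quit. Third copy: everybody quit.
  # The outcome column is not used by predict(), so it is left alone.
  d_none <- d
  d_none$quit <- 0
  d_all <- d
  d_all$quit <- 1

  # Score both copies and average the predictions.
  mean_none <- mean(predict(fit, newdata = d_none))
  mean_all  <- mean(predict(fit, newdata = d_all))

  c(none_quit = mean_none, all_quit = mean_all, ate = mean_all - mean_none)
}

est <- g_formula(dat)
print(est)

# Bootstrap: resample rows with replacement, redo the whole procedure,
# and take the 2.5% and 97.5% percentiles of the resampled ATEs.
B <- 500
boot_ate <- replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)
  g_formula(dat[idx, ])[["ate"]]
})
ci <- quantile(boot_ate, c(0.025, 0.975))
print(ci)
```

`est[["none_quit"]]` and `est[["all_quit"]]` would be the answers to a) and b), and `est[["ate"]]` the average causal effect. Note that with a plain linear model and no interaction terms, the ATE from this procedure is just the coefficient on `quit`; the copy-and-predict approach starts to matter once you add interactions between smoking and the covariates or use a nonlinear model.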