# Getting p-value for rsquared bootstrap in R

#### skate17

##### New Member
I've been trying to get the p-value of my samples from a bootstrap R-squared test in R. I'm not very good with statistics, so could someone please take a look at my code below and point me in the right direction on how I can extract the p-values per sample?
Code:

    # Bootstrap 95% CI for R-squared
    library(boot)

    # Statistic function: returns R-squared for one bootstrap resample
    rsq <- function(formula, data, indices) {
      d <- data[indices, ]  # rows selected by boot for this resample
      fit <- lm(formula, data = d)
      summary(fit)$r.squared
    }

    # vdw beta bootstrap
    resultsb_vdw <- boot(data = rescored_beta, statistic = rsq, R = 10000,
                         formula = Descriptor_Score ~ vdw)

    # 95% BCa confidence interval (columns 4 and 5 hold the lower and upper limits)
    confb_vdw <- boot.ci(resultsb_vdw, type = "bca")
    cib_vdw <- confb_vdw$bca[, c(4, 5)]

Thanks!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You don't need a p-value if you have the 95% CI. What additional information would it provide?

#### skate17

##### New Member
I want to basically input a Descriptor_Score value and output a p-value based on the bootstrap model

#### skate17

##### New Member
> You don't need a p-value if you have the 95% CI. What additional information would it provide?

If you don't mind explaining, what is the similarity between the CI and the p-value that you refer to?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
@skate - what I was referencing was that if you have the 95% bootstrap confidence interval on R-squared, you don't need to report a p-value; it does not provide any additional information.

#### skate17

##### New Member
> @skate - what I was referencing was that if you have the 95% bootstrap confidence interval on R-squared, you don't need to report a p-value; it does not provide any additional information.

So is that the same as inputting a Descriptor_Score value and outputting a p-value based on the bootstrap model? If not, how would you suggest I approach that problem?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Why do you want a p-value?

#### skate17

##### New Member
> Why do you want a p-value?

So I can assign a p-value to a set of drugs that have a Descriptor_Score. The p-value will be a quantified measure of the confidence in the binding interaction between the ligands and the protein as a function of Descriptor_Score.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I don't follow your description. It seems like you are trying to use a p-value for an R-squared value to support a relationship ("interaction"), but the p-value is computed under the null hypothesis; it does not prove a relationship. Reading it as proof of the alternative is a misinterpretation, correct?
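To make "computed under the null hypothesis" concrete, here is a minimal sketch of a permutation test for R-squared. This is a different technique from the bootstrap in the original post, and everything in it is an assumption for illustration: the data frame `toy` is synthetic and only borrows the column names `vdw` and `Descriptor_Score`, and the helper `r2` is hypothetical.

```r
# Synthetic stand-in for the real data (hypothetical values)
set.seed(1)
n <- 50
vdw <- rnorm(n)
toy <- data.frame(vdw = vdw, Descriptor_Score = 0.5 * vdw + rnorm(n))

# Helper: R-squared of the simple linear regression on a data frame
r2 <- function(d) summary(lm(Descriptor_Score ~ vdw, data = d))$r.squared
obs <- r2(toy)

# Null distribution: shuffling the response breaks any association with vdw
n_perm <- 2000
null_r2 <- replicate(n_perm, {
  shuffled <- toy
  shuffled$Descriptor_Score <- sample(toy$Descriptor_Score)
  r2(shuffled)
})

# Share of shuffled R-squared values at least as large as the observed one
# (the +1 terms count the observed data as one of the permutations)
p_perm <- (sum(null_r2 >= obs) + 1) / (n_perm + 1)
p_perm
```

A small `p_perm` says only that an R-squared this large would be unusual if there were no association. It is a statement about the whole regression, not about any individual drug or sample.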

#### skate17

##### New Member
So how do I interpret the CIs?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Well, they may not serve your need. Typically you calculate R-squared or partial R-squared and provide 95% CIs on these estimates. The interpretation: upon repeated resampling from the population, 95% of such intervals will contain the true R-squared value (AKA the variability in the dependent variable explained by the independent variable).
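As a rough illustration of what such an interval looks like, the sketch below runs the same kind of bootstrap on synthetic data and pulls out the point estimate and a 95% percentile interval. The data frame `toy` and its values are made up; only the column names follow the original post.

```r
library(boot)

# Synthetic stand-in for the real data (hypothetical values)
set.seed(42)
n <- 60
vdw <- rnorm(n)
toy <- data.frame(vdw = vdw, Descriptor_Score = 0.8 * vdw + rnorm(n))

# Statistic function: R-squared for the rows chosen in one resample
rsq <- function(formula, data, indices) {
  fit <- lm(formula, data = data[indices, ])
  summary(fit)$r.squared
}

res <- boot(data = toy, statistic = rsq, R = 2000,
            formula = Descriptor_Score ~ vdw)

# Percentile interval; elements 4 and 5 are the lower and upper limits
ci <- boot.ci(res, conf = 0.95, type = "perc")
ci_limits <- ci$percent[4:5]

res$t0      # R-squared on the full synthetic sample
ci_limits   # 95% percentile bootstrap interval for R-squared
```

The interval describes uncertainty in the single R-squared estimate for the whole data set. It does not attach a value to each row, which may be why it does not map onto a per-sample p-value.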

So are you regular or goofy footed?

#### skate17

##### New Member
Oh ok. So how would you suggest I approach the bootstrap to achieve the goal that I described above?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
> So I can assign a p-value to a set of drugs that have a Descriptor_Score. The p-value will be a quantified measure of the confidence in the binding interaction between the ligands and the protein as a function of Descriptor_Score.

Can you try to rephrase this? It doesn't quite make sense to me. Are you just trying to quantify the association between two variables? If so, can you tell us how they are formatted?

#### skate17

##### New Member
I am trying to develop a likelihood model to estimate a Descriptor_Score based on another score (vdw_score) and extrapolate that model to calculate pvalues for a sample with a certain vdw and Descriptor_Score based on its location within the likelihood model.
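One possible reading of this goal, sketched under loud assumptions: fit an ordinary linear model of Descriptor_Score on vdw_score, then ask how unusual a new sample's Descriptor_Score is relative to the model's prediction at its vdw_score. This is not the bootstrap from the original post, it assumes roughly normal errors, and the data, the new observation `new_obs`, and all numeric values are synthetic.

```r
# Synthetic stand-in for the real data (hypothetical values)
set.seed(7)
n <- 80
vdw_score <- rnorm(n, mean = -30, sd = 5)
toy <- data.frame(
  vdw_score = vdw_score,
  Descriptor_Score = 2 + 0.1 * vdw_score + rnorm(n, sd = 0.5)
)

fit <- lm(Descriptor_Score ~ vdw_score, data = toy)

# A hypothetical new sample with both scores known
new_obs <- data.frame(vdw_score = -28, Descriptor_Score = 0.4)

# Predicted mean plus its standard error and the residual scale
pred <- predict(fit, newdata = new_obs, se.fit = TRUE)

# Standard error for a single new observation (mean uncertainty + noise)
se_pred <- sqrt(pred$se.fit^2 + pred$residual.scale^2)

# How many prediction standard errors the sample sits from the fitted line
t_stat <- unname((new_obs$Descriptor_Score - pred$fit) / se_pred)

# Two-sided tail probability under the fitted model
p_new <- 2 * pt(-abs(t_stat), df = pred$df)

# The matching 95% prediction interval at this vdw_score
predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
p_new
```

Here a small `p_new` would flag a sample whose Descriptor_Score is unusual given its vdw_score. Whether that corresponds to confidence in a binding interaction is a separate scientific question that the model itself cannot answer.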

#### hlsmith

##### Less is more. Stay pure. Stay poor.
That just sounds confusing as heck. Can you really not describe it in simpler terms? How is it written in your protocol or, if applicable, your research application? What are these scores, and how are they formatted? Can you provide a sample of the data frame with synthetic data? I have a feeling you are doing something fairly basic but describing it in a way that blurs the objective.