# Adequate subsample size to represent sample regression

#### toasapre

##### New Member
I am performing an engineering analysis to find the effect of earthquakes on concrete buildings. The analysis applies earthquake forces to a building model for 50 earthquakes, resulting in 50 separate analyses. Earthquake intensity is the independent variable of the study. Building drift, extracted from each analysis, is the dependent variable. An OLS regression on the 50 (intensity, drift) results gives a simple linear relationship: ln(drift) = β0 + β1*ln(intensity). This log-linear relationship is well established in the field of earthquake engineering. Each analysis is time-consuming because it involves material non-linearity and dynamic analysis. Hence I am looking for a method to check whether a regression based on fewer analysis results (say 35 or 40) is statistically similar to the 50-sample regression. I came up with the following process:
1) Generate 10000 bootstrap samples of 50 data points each, drawn with replacement. Fit the regression to all 10000 samples and find the mean and standard deviation of the intercept and slope. Call these β0_50, SD_β0_50 and β1_50, SD_β1_50.
2) Generate 10000 samples of 45 data points each (or any other number smaller than the original data size), again with replacement. Fit the regression to these 10000 samples and obtain β0_45, SD_β0_45 and β1_45, SD_β1_45.
3) Compare β0_50 with β0_45 and β1_50 with β1_45 using z-statistics.
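
The three steps above can be sketched in Python roughly as follows. This is only a minimal illustration: the data are synthetic placeholders (not actual analysis results), the helper names `fit_loglog`, `bootstrap_coeffs` and `z_stat` are hypothetical, and the z-statistic form used (difference of bootstrap means over the root of the summed bootstrap variances) is one assumed reading of step 3.

```python
import numpy as np

def fit_loglog(intensity, drift):
    """OLS fit of ln(drift) = b0 + b1*ln(intensity); returns (b0, b1)."""
    b1, b0 = np.polyfit(np.log(intensity), np.log(drift), 1)
    return b0, b1

def bootstrap_coeffs(intensity, drift, m, n_boot=10_000, rng=None):
    """Draw n_boot resamples of size m (with replacement) and refit each one.

    Returns two arrays of length n_boot: intercepts and slopes.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.log(intensity), np.log(drift)
    idx = rng.integers(0, len(x), size=(n_boot, m))  # each row is one resample
    xs, ys = x[idx], y[idx]
    dx = xs - xs.mean(axis=1, keepdims=True)
    dy = ys - ys.mean(axis=1, keepdims=True)
    # Closed-form simple-regression estimates, computed row by row
    b1 = (dx * dy).sum(axis=1) / (dx ** 2).sum(axis=1)
    b0 = ys.mean(axis=1) - b1 * xs.mean(axis=1)
    return b0, b1

def z_stat(a, b):
    """Assumed form of step 3: treats the two bootstrap sets as independent."""
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) + b.var(ddof=1))

# Synthetic placeholder data standing in for the 50 (intensity, drift) results
rng = np.random.default_rng(0)
intensity = rng.uniform(0.1, 1.0, size=50)
drift = np.exp(-4.0 + 1.0 * np.log(intensity) + rng.normal(0.0, 0.3, size=50))

b0_50, b1_50 = bootstrap_coeffs(intensity, drift, m=50, rng=rng)  # step 1
b0_45, b1_45 = bootstrap_coeffs(intensity, drift, m=45, rng=rng)  # step 2

# Step 3: compare intercepts and slopes
print("z (intercept):", z_stat(b0_50, b0_45))
print("z (slope):    ", z_stat(b1_50, b1_45))
```

Note that both bootstrap sets are drawn from the same 50 points, so the independence assumed in `z_stat` is itself part of what is being asked about.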
Please let me know whether such an approach makes sense statistically. I am an engineer by training and am not sure whether I have violated some basic principles of the bootstrap or of the z-statistic comparison.
Any suggestions about standard processes or tests for establishing an adequate subsample size are welcome.