# Thread: (SPSS) External validation for a linear regression model

1. ## (SPSS) External validation for a linear regression model

Hello,
I am investigating relationships between continuous variables, and I am aiming for a model that can be generalized beyond the study. My DV is an indoor pollutant, and the selected IVs are behaviors on ventilation devices and relative humidity.

I ran Pearson correlations first and found those IVs to be significant. Before running the multiple linear regression, I also ran a curve fit on each IV to check whether any of them was not linearly related to the DV. Having confirmed that the linearity assumption held for all of them, I ran the regression. So the linear regression model was built from the IVs with p < 0.05 in the Pearson correlations.

Now I have another set of raw data from a different project that I could use for external validation (same variables as in the model). I checked this separate data set and, unfortunately, it did not show the same significance pattern in the Pearson correlations. Does this matter at all before validating the model?

Also, since this is not cross-validation (so the approach of randomly selecting cases does not apply), would you mind sharing how to do this in SPSS? I have read about comparing R-squared against that of the original model, but how do I do that in SPSS with this external data?

As background, I am a beginner with SPSS, and I have no programming knowledge, so I cannot use the free R software.

With all my best wishes to you all in the New Year 2015 too.

2. ## Re: (SPSS) External validation for a linear regression model

The linear assumption is on the error terms in the regression model.

Can you create a table of descriptive statistics to compare the samples and see where they differ? Also, how might these samples differ from the population?
How were the two samples collected (convenience samples)?

Perhaps you need to establish inclusion and exclusion criteria to make a model generalizable!

4. ## Re: (SPSS) External validation for a linear regression model

Hello hlsmith, thank you for your response.
I checked the linear assumption with residual plots from bivariate regression.
Both are ad hoc samples from two different cities, because measurements could not be taken without the occupants' permission; within that constraint they are random.

Would you mind clarifying which information from the descriptive statistics I should compare?

5. ## Re: (SPSS) External validation for a linear regression model

Compare the covariates that will be introduced into the model, and any others of interest (e.g., means, percentages, etc.).

The goal is to understand what is different about the samples.
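As a rough illustration of this kind of comparison, here is a minimal Python sketch that works only from summary statistics. The numbers are the means, standard deviations and valid N values from the two SPSS tables posted later in this thread; the function name `compare` and the choice of a standardized mean difference plus a Welch t statistic are assumptions made for the example, not something prescribed in the thread.

```python
import math

# Summary statistics copied from the two SPSS tables posted in this thread:
# (mean, standard deviation, valid N) for each variable in each sample.
development = {
    "Window": (229.1149, 497.79582, 64),
    "Door": (7152.3175, 9815.89839, 64),
    "Pollutant": (1344.92, 588.595, 63),
    "Humidity": (49.84, 6.943, 63),
}
validation = {
    "Window": (157.4999, 366.20359, 39),
    "Door": (7799.9713, 9371.45971, 39),
    "Pollutant": (1350.46, 587.705, 39),
    "Humidity": (45.33, 8.199, 39),
}


def compare(dev, val):
    """Return (standardized mean difference, Welch t statistic).

    Hypothetical helper for this sketch; both inputs are
    (mean, standard deviation, n) tuples.
    """
    m1, s1, n1 = dev
    m2, s2, n2 = val
    # Pooled standard deviation, used to express the gap in SD units.
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    smd = (m1 - m2) / pooled_sd
    # Welch's t does not assume equal variances in the two samples.
    welch_t = (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
    return smd, welch_t


results = {name: compare(development[name], validation[name]) for name in development}
for name, (smd, t) in results.items():
    print(f"{name:10s} SMD = {smd:6.2f}   Welch t = {t:6.2f}")
```

On these figures humidity is the variable that stands out, at roughly 0.6 standard deviations lower in the validation sample, while the pollutant means are almost identical. Window and Door are strongly right-skewed (their medians are far below their means), so a comparison of medians or of log-transformed values is probably more informative for those two than the mean-based figures above.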

7. ## Re: (SPSS) External validation for a linear regression model

Below is the descriptive information for the data used to build the model (63 case studies):

Statistics

| | Window | Door | Pollutant | Humidity |
|---|---|---|---|---|
| N (valid) | 64 | 64 | 63 | 63 |
| N (missing) | 0 | 0 | 1 | 1 |
| Mean | 229.1149 | 7152.3175 | 1344.92 | 49.84 |
| Std. Error of Mean | 62.22448 | 1226.98730 | 74.156 | .875 |
| Median | 2.6760 | 139.9100 | 1143.00 | 50.00 |
| Std. Deviation | 497.79582 | 9815.89839 | 588.595 | 6.943 |
| Variance | 247800.676 | 96351861.201 | 346444.332 | 48.200 |
| Skewness | 2.075 | 1.144 | .770 | .034 |
| Std. Error of Skewness | .299 | .299 | .302 | .302 |
| Kurtosis | 3.405 | .778 | -.375 | -.046 |
| Std. Error of Kurtosis | .590 | .590 | .595 | .595 |
| Range | 1958.04 | 37732.80 | 2170 | 32 |
| Minimum | 1.96 | 48.00 | 465 | 33 |
| Maximum | 1960.00 | 37780.80 | 2635 | 65 |

The following is the descriptive information for the validation data (39 case studies):

Statistics

| | Window | Door | Pollutant | Humidity |
|---|---|---|---|---|
| N (valid) | 39 | 39 | 39 | 39 |
| N (missing) | 0 | 0 | 0 | 0 |
| Mean | 157.4999 | 7799.9713 | 1350.46 | 45.33 |
| Std. Error of Mean | 58.63950 | 1500.63454 | 94.108 | 1.313 |
| Median | 3.0300 | 100.1200 | 1151.00 | 45.00 |
| Std. Deviation | 366.20359 | 9371.45971 | 587.705 | 8.199 |
| Variance | 134105.068 | 87824257.039 | 345396.676 | 67.228 |
| Skewness | 1.998 | .379 | 1.405 | .118 |
| Std. Error of Skewness | .378 | .378 | .378 | .378 |
| Kurtosis | 2.099 | -1.959 | 1.224 | -1.014 |
| Std. Error of Kurtosis | .741 | .741 | .741 | .741 |
| Range | 1037.99 | 18840.34 | 2487 | 29 |
| Minimum | 2.54 | 50.06 | 607 | 31 |
| Maximum | 1040.53 | 18890.40 | 3094 | 60 |

I also tested a window*door interaction term (using the training data set of 63 case studies). Although the two variables are correlated, the interaction was not significant in the linear regression analysis, so I did not include it in the model.

8. ## Re: (SPSS) External validation for a linear regression model

Did anything pop out at you? (The presentation of the data was a little difficult to read.)

Also, were the variable effects comparable between the models? Perhaps the second sample has less statistical power.
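One way to check whether an effect is comparable between two independently fitted models is a simple z-test on the difference between the two coefficients. The sketch below is only an illustration: the coefficient and standard-error values are made up (HYPOTHETICAL), since no regression output was posted in the thread, and the function name `coefficient_difference_z` is invented for this example. In practice the inputs would be the B and Std. Error columns of the SPSS Coefficients table from each sample.

```python
import math


def coefficient_difference_z(b_dev, se_dev, b_val, se_val):
    """z statistic and two-sided p-value for the difference between
    the same coefficient estimated in two independent samples."""
    z = (b_dev - b_val) / math.sqrt(se_dev**2 + se_val**2)
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# HYPOTHETICAL numbers, NOT from the thread: a humidity coefficient of
# 20.0 (SE 5.0) in the development sample and 8.0 (SE 7.0) in the
# validation sample.
z, p = coefficient_difference_z(20.0, 5.0, 8.0, 7.0)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

With only 39 cases the validation standard errors will tend to be larger, so a coefficient can lose significance in the second sample without actually differing from the first; comparing the estimates and their uncertainty, as here, speaks to that more directly than comparing p-values.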

9. ## Re: (SPSS) External validation for a linear regression model

Yesterday I worked out that the samples not used in developing the model (in this case, the 39 cases) are used to obtain the final model: the p-values are ignored and the regression coefficients are adopted for the generalization. Is this the right path to the answer? I hope somebody can clarify this. Many thanks.
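For comparison, a common reading of external validation is that the coefficients from the development sample are kept fixed and applied unchanged to the new cases, and the predictions are then compared with the observed values. The Python sketch below illustrates that idea under stated assumptions: the coefficients and the six validation cases are made-up toy numbers (HYPOTHETICAL, not from the thread), and the names `coef`, `predict` and `validation_cases` are invented for the example.

```python
import math

# HYPOTHETICAL coefficients standing in for the model fitted on the
# development sample; replace with the B values from the SPSS output.
coef = {"intercept": 500.0, "window": -0.20, "door": -0.01, "humidity": 20.0}

# HYPOTHETICAL validation cases: (window, door, humidity, observed pollutant).
validation_cases = [
    (2.5, 100.0, 45.0, 1380.0),
    (1000.0, 15000.0, 40.0, 900.0),
    (3.0, 50.0, 55.0, 1650.0),
    (500.0, 18000.0, 35.0, 1000.0),
    (2.6, 9000.0, 50.0, 1300.0),
    (800.0, 200.0, 60.0, 1500.0),
]


def predict(window, door, humidity):
    """Apply the frozen development-sample equation to one case."""
    return (coef["intercept"] + coef["window"] * window
            + coef["door"] * door + coef["humidity"] * humidity)


observed = [case[3] for case in validation_cases]
predicted = [predict(*case[:3]) for case in validation_cases]

n = len(observed)
mean_obs = sum(observed) / n
mean_pred = sum(predicted) / n

# Validation R-squared: 1 - SSE/SST, with no refitting on the new data.
sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
sst = sum((o - mean_obs) ** 2 for o in observed)
validation_r2 = 1 - sse / sst
rmse = math.sqrt(sse / n)

# Calibration: regress observed on predicted. A slope near 1 and an
# intercept near 0 suggest the model transfers well to the new sample.
slope = (sum((p - mean_pred) * (o - mean_obs) for o, p in zip(observed, predicted))
         / sum((p - mean_pred) ** 2 for p in predicted))
intercept = mean_obs - slope * mean_pred

print(f"validation R2 = {validation_r2:.3f}, RMSE = {rmse:.1f}")
print(f"calibration slope = {slope:.3f}, intercept = {intercept:.1f}")
```

The same steps should be possible in SPSS without programming: compute a predicted-value variable in the validation file from the original equation (Transform > Compute Variable), then relate it to the observed pollutant values with a correlation or a simple regression. The validation R-squared computed this way can be negative if the frozen model predicts worse than the validation mean, which is itself useful information.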
