Research Help_Regression Analysis_Log Transformation for Better Fits


New Member
Dear All,

I'm doing research on estimating tunnel cost, for which I carried out a regression analysis based on data from 30 tunnel projects implemented in the USA. The dataset consists of the recorded tunnel construction cost and tunnel size (i.e., tunnel diameter and length). The regression results show that a higher coefficient of determination (R2) is obtained when the log10 of the parameters, i.e., tunnel length and diameter, is used. Some of my results are given below for further detail.

Cost (M£) = 10^(3.052 + 3.905 log10(L) + 0.867 log10(D)),   R2 = 0.95   (1)

Cost (M£) = -301.43 + 53.249 L + 9.341 D,   R2 = 0.80   (2)

where L is the tunnel length (km) and D is the tunnel diameter (m).

My question is: what do these results mean statistically? That is, what does it mean to obtain a higher R2 with the log10 of the independent variables than with their original raw values? Could you please explain this? I have tried to find out what these results mean statistically but could not find a proper explanation.

Thank you very much in advance for your help and support.

Best regards,


TS Contributor
This means that the relationship is possibly non-linear. I would check the residuals to see whether the linear model (2) shows a non-random pattern and whether that pattern disappears with the log model (1).
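A minimal sketch of that residual check, assuming Python with NumPy. The data here are synthetic (a made-up power law with noise), not the 30 projects from the question, and the ranges, exponents, and helper names `ols` and `r_squared` are illustrative assumptions only.

```python
import numpy as np

# Synthetic stand-in data (NOT the poster's dataset): a power law with
# multiplicative noise, cost = a * L^b * D^c * noise. All values are hypothetical.
rng = np.random.default_rng(0)
n = 30
L = rng.uniform(1.0, 10.0, n)   # tunnel length, km (assumed range)
D = rng.uniform(3.0, 12.0, n)   # tunnel diameter, m (assumed range)
cost = 2.0 * L**1.5 * D**2.0 * 10 ** rng.normal(0.0, 0.05, n)

def ols(X, y):
    """Least-squares fit with an intercept; returns coefficients, fitted values, residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return beta, fitted, y - fitted

def r_squared(y, resid):
    """R2 on whatever scale y is measured in."""
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)

# Model of type (2): cost is linear in L and D.
beta_lin, fit_lin, res_lin = ols(np.column_stack([L, D]), cost)

# Model of type (1): log10(cost) is linear in log10(L) and log10(D).
log_cost = np.log10(cost)
beta_log, fit_log, res_log = ols(np.column_stack([np.log10(L), np.log10(D)]), log_cost)

r2_lin = r_squared(cost, res_lin)
r2_log = r_squared(log_cost, res_log)

# Crude pattern check: if residuals correlate with the squared, centred fitted
# values, the fit is missing curvature. A residual-vs-fitted plot is the usual tool.
curv_lin = np.corrcoef(res_lin, (fit_lin - fit_lin.mean()) ** 2)[0, 1]
curv_log = np.corrcoef(res_log, (fit_log - fit_log.mean()) ** 2)[0, 1]

print("linear model : R2 =", round(r2_lin, 3), " curvature corr =", round(curv_lin, 3))
print("log-log model: R2 =", round(r2_log, 3), " curvature corr =", round(curv_log, 3))
print("log-log slopes (L, D):", beta_log[1:].round(3))
```

Note that the two R2 values are computed on different scales (cost versus log10 of cost), so they are not strictly comparable. The residual pattern is the more informative check.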



TS Contributor
Always do a reality check. Does it make sense to have a nonlinear relationship?

In the case of diameter, the tunnel cost is probably proportional to the amount of material to be excavated and the finishing material used. Therefore, cost would be proportional to the cross-sectional area, which increases with the square of the radius (i.e., diameter/2). So a nonlinear relationship with diameter makes sense.
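To make the area argument concrete, here is a short worked sketch. It assumes cost is exactly proportional to the cross-sectional area A at a fixed length, with c an unspecified constant. The second line simply rewrites equation (1) from the question as a power law, which is what a fit in log10 of all variables amounts to.

```latex
\begin{align*}
\text{Cost} \propto A = \pi\left(\tfrac{D}{2}\right)^{2} = \tfrac{\pi}{4} D^{2}
  &\;\Longrightarrow\; \log_{10}\text{Cost} = c + 2\log_{10} D, \\
\log_{10}\text{Cost} = 3.052 + 3.905\log_{10} L + 0.867\log_{10} D
  &\;\Longleftrightarrow\; \text{Cost} = 10^{3.052}\, L^{3.905}\, D^{0.867}.
\end{align*}
```

Under the pure-area assumption the exponent on D would be 2. The fitted value in (1) is 0.867, so area alone probably does not explain the fitted diameter effect.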

In the case of length, the longer the tunnel, the more difficult it becomes to transport material in and out. Again, it makes sense for this relationship to be nonlinear. Just look at the supply chain logistics during WWII, if there is any doubt.