Currently I am modeling a multiple regression for a research project with Stata.

I want to examine the influence of patents/innovation and output on the prices of batteries over the last 30 years.

Therefore, I log-transformed the inflation-adjusted prices, cumulative output and cumulative patents.

Each independent variable on its own explains the price decline quite well (R²: 0.94 and 0.98). When I include both as independent variables, R² increases to 0.99.

I tested the correlation of the independent variables and it resulted in r=0.9752.

Further, I calculated the variance inflation factor. It amounts to 20.
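
As a quick consistency check: with exactly two predictors, the variance inflation factor follows directly from their correlation, VIF = 1 / (1 - r²). A minimal sketch in Python (the function name is only illustrative):

```python
def vif_two_predictors(r: float) -> float:
    """VIF for a model with exactly two predictors whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)


vif = vif_two_predictors(0.9752)
print(round(vif, 1))  # about 20.4
```

This agrees with the value of roughly 20 reported above.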

Therefore, in my opinion it is obvious that I have to deal with multicollinearity. One solution is to combine both independent variables into a single variable. That does not work in my case, as I want to identify the separate impact of each on the price.

The literature suggests a two-step regression approach where the correlation is removed by using a residual variable. However, I do not understand what exactly to do.
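
My current reading of the two-step idea, as a rough sketch only: first regress one predictor on the other and keep the residual, then use that residual in place of the original predictor in the price regression. The Python example below uses synthetic data; all variable names, coefficients and noise levels are made up for illustration and are not my real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30  # e.g. 30 yearly observations (assumption)

# Synthetic, strongly correlated stand-ins for the log-transformed variables
log_output = np.linspace(0.0, 5.0, n) + rng.normal(0.0, 0.1, n)
log_patents = 0.8 * log_output + rng.normal(0.0, 0.2, n)
log_price = 4.0 - 0.3 * log_output - 0.2 * log_patents + rng.normal(0.0, 0.05, n)


def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Step 1: regress patents on output and keep the part output cannot explain
a0, a1 = ols(log_patents, log_output)
patents_resid = log_patents - (a0 + a1 * log_output)

# Step 2: regress price on output and the residualized patents variable
b_two_step = ols(log_price, log_output, patents_resid)

# For comparison: the ordinary regression on both original variables
b_direct = ols(log_price, log_output, log_patents)

resid_corr = np.corrcoef(log_output, patents_resid)[0, 1]
print("correlation of output with residual:", resid_corr)
print("two-step coefficients:", b_two_step)
print("direct coefficients:  ", b_direct)
```

In this sketch the residual is uncorrelated with output by construction, and its coefficient equals the patents coefficient from the direct regression. The output coefficient changes because it now absorbs the part of patents that moves together with output, so the choice of which variable to residualize matters for interpretation.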

I would really appreciate your comments and your help.

Best,

Anton

I have been searching for a while without finding a solution, so I hope someone here can help.

Here is my problem:

I have (X, Y) data points. X is a roughness parameter for which I have several measurements, and Y is the area of cells grown on a surface with roughness X, for which I also have several measurements. For each pair (X, Y), X and Y are therefore the mean values of a set of measurements, with standard deviations Sx for the X values and Sy for the Y values. One comment (I don't know if it matters): Sy does not correspond to an uncertainty in the measurement of a single cell, but to the variation of the area in a population of cells.

To show the correlation between X and Y, I would like to perform a linear regression taking into account the standard deviations. I think I found a way to do that, using the approach described in the attached PDF.

First question: is that approach OK in my case?

Second question: I would like to compute 90% confidence bands. Is there a simple way to do that? I found the answer only for the case of a simple linear regression (with no standard deviations associated with the X and Y measurements).
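
To make the second question concrete, here is one approach I could imagine, as a sketch only: a Monte Carlo simulation that perturbs every point according to Sx and Sy, refits a straight line each time, and takes the 5th and 95th percentiles of the fitted lines. The data values below are invented, each refit is a plain unweighted fit, and the sketch treats Sx and Sy as uncertainties of the mean values, which may not be appropriate given that Sy describes population variation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented example values (not real measurements)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.2, 12.1, 13.8, 16.3, 17.9])
sx = np.array([0.10, 0.15, 0.10, 0.20, 0.15])
sy = np.array([0.5, 0.6, 0.4, 0.7, 0.5])

grid = np.linspace(x.min(), x.max(), 50)  # where the band is evaluated
n_draws = 5000

# Draw perturbed copies of every point from normal distributions
x_sim = rng.normal(x, sx, size=(n_draws, x.size))
y_sim = rng.normal(y, sy, size=(n_draws, y.size))

# Refit a straight line to each perturbed data set
lines = np.empty((n_draws, grid.size))
for i in range(n_draws):
    slope, intercept = np.polyfit(x_sim[i], y_sim[i], 1)
    lines[i] = intercept + slope * grid

# Pointwise 90% band: 5th to 95th percentile of the simulated lines
lower, upper = np.percentile(lines, [5, 95], axis=0)
center = np.median(lines, axis=0)
```

The arrays `lower` and `upper` can then be plotted against `grid` around `center`. I am not sure whether such a band has the same interpretation as the classical confidence band of a simple linear regression.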

Thank you for your help,

Quentin

I was wondering if anyone can assist me with this issue.

I am building a logistic regression model to predict purchase or not purchase based on web site behaviour data.

Two of the factors that I would like to include in the model are the visits to purchase and the days to purchase. The problem is that when the visitor has not purchased, both of these are null. My first approach was to fill the null values with 0, but the resulting model looks too good to be true, as visits to purchase is the single biggest factor in the model.

When I run this through Python/SciPy with null values, I get the error "LinAlgError: SVD did not converge", so I expect that I need to give these a value. I know that this is not a Python forum, but my question is about logistic regression models in general rather than about code.
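
To illustrate why I suspect the zero-filled model is too good to be true, here is a small sketch on synthetic data (the variable names and the 20% purchase rate are made up). Because visits to purchase only exists for purchasers, filling the nulls with 0 seems to turn the variable into a copy of the outcome:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Synthetic outcome: roughly 20% of visitors purchase (assumption)
purchased = rng.random(n) < 0.2

# The feature is only defined for purchasers (at least 1 visit), null otherwise
visits = rng.integers(1, 10, n).astype(float)
visits_to_purchase = np.where(purchased, visits, np.nan)

# My first approach: fill the nulls with 0
filled = np.nan_to_num(visits_to_purchase, nan=0.0)

# A trivial rule on the filled feature already reproduces the outcome exactly
predicted = filled > 0
accuracy = np.mean(predicted == purchased)
print(accuracy)  # 1.0
```

In this toy example no model is needed at all, since the filled variable separates the two classes perfectly.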

I would greatly appreciate any assistance that the experts on this forum could provide.

Best

Rod