Whether / how to combine 2 linear equations for different variables, and weighting


I'm analysing results from measurements of 2 variables (a, b). The goal is to use them singly or together to determine a 3rd variable, c. All 3 variables are instrumented, measured data (not social science).

We made over 5000 simultaneous measurements of a & b. Our equipment software gave us a rough estimated value for c, but we need more accuracy.

Using an online database, we were able to look up accurate "c" values for 400 of our observations. It would take too long to look up all 5000.

We're trying to find the best equation to get from our measurements of a & b, to a closer approximation of c than the estimate produced by our equipment's software program.

For both a and b, the scatter plots against c show a clear linear trend; I don't see curvature that would suggest a need for data transformation (log or 1/x).

Using linear regression in Excel, we found a linear equation relating our measured "a" values to the actual catalogue values of c for the 400 observations. The R² is 0.88, and the slope is negative, i.e. a strong negative correlation.

I could use the "a" linear equation on its own to estimate "c". It is much more accurate than our software's estimate. If we can somehow include the "b" data / equation as well it might be even more accurate, and could account for what might otherwise appear to be outliers.

I similarly used linear regression in Excel on the same 400 samples to get an equation relating "b" to the actual catalogue values of c. The R² for this linear equation is 0.54, a lower coefficient of determination (a moderate positive correlation; the data points don't fit the trendline as tightly).
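To make the setup concrete, here is a minimal sketch of the two separate fits, done in Python with NumPy instead of Excel. The data below are synthetic stand-ins that I made up to mimic the general pattern (negative slope and tight fit for a, positive slope and looser fit for b); they are not our measurements, and the helper name `fit_line` is just for illustration.

```python
import numpy as np

# Synthetic stand-in data (NOT our real measurements): 400 calibration points.
rng = np.random.default_rng(0)
c = rng.uniform(0.0, 10.0, size=400)            # plays the role of the catalogue values
a = 20.0 - 2.0 * c + rng.normal(0.0, 1.0, 400)  # more precise instrument, negative slope
b = 3.0 + 0.5 * c + rng.normal(0.0, 1.5, 400)   # less precise instrument, positive slope

def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus its R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - residuals.var() / y.var()
    return slope, intercept, r2

# One single-variable equation per instrument, each predicting c.
slope_a, intercept_a, r2_a = fit_line(a, c)
slope_b, intercept_b, r2_b = fit_line(b, c)

print(f"c from a: slope={slope_a:.3f}, intercept={intercept_a:.3f}, R^2={r2_a:.3f}")
print(f"c from b: slope={slope_b:.3f}, intercept={intercept_b:.3f}, R^2={r2_b:.3f}")
```

This is the same kind of trendline Excel produces: one equation per variable, each with its own R².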

1) How could / should I combine these 2 linear equations, to get an equation to produce the most accurate estimate of "c" from my 5000 data points for a and b? I've looked at a lot of math & stats websites, but perhaps I don't know the right terminology for my question.

2) The instrumentation for measuring "b" is less precise than that for "a", so it is less sensitive to small differences. Does the 2nd equation need some "weighting" or other modification, given that "b" has a weaker correlation with "c" than "a" does? Where can I find information about whether / how to do this?
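To illustrate what I mean by "combining" in question 1, one candidate I can imagine is fitting c against a and b at the same time in a single least-squares fit, so that the fit itself decides how much each variable contributes. I don't know whether this is the appropriate method or whether it handles the precision difference in question 2, so please treat the following only as a sketch of the idea. The data are again synthetic stand-ins, not our measurements, and the variable names are my own.

```python
import numpy as np

# Synthetic stand-in data (NOT our real measurements): 400 calibration points.
rng = np.random.default_rng(0)
c = rng.uniform(0.0, 10.0, size=400)
a = 20.0 - 2.0 * c + rng.normal(0.0, 1.0, 400)
b = 3.0 + 0.5 * c + rng.normal(0.0, 1.5, 400)

def r_squared(y, y_hat):
    """Coefficient of determination for predictions y_hat of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Fit using a alone: c ~ k_a * a + k_0
X_a = np.column_stack([a, np.ones_like(a)])
coef_a, *_ = np.linalg.lstsq(X_a, c, rcond=None)
r2_a_only = r_squared(c, X_a @ coef_a)

# Fit using a and b together: c ~ k_a * a + k_b * b + k_0
X_ab = np.column_stack([a, b, np.ones_like(a)])
coef_ab, *_ = np.linalg.lstsq(X_ab, c, rcond=None)
r2_joint = r_squared(c, X_ab @ coef_ab)

print("a only  :", coef_a, "R^2 =", round(r2_a_only, 4))
print("a and b :", coef_ab, "R^2 =", round(r2_joint, 4))

# The joint equation could then be applied to every (a, b) pair,
# including the ones with no catalogue value for c.
def estimate_c(a_new, b_new):
    return coef_ab[0] * a_new + coef_ab[1] * b_new + coef_ab[2]
```

On the calibration points themselves, the joint fit can never have a lower R² than the a-only fit, so a small increase there would not by itself show that including b helps on the remaining observations.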

Thanks in advance for any suggestions.