# Physics Teacher with a question of appropriate use of regression equations

#### Mr. O

##### New Member
Is it ever acceptable to replace discrete measured values with a continuous function if you are certain you have the right model?

For the last week my students have been working on using experimental results and statistical tests as evidence to either accept or refute someone else's or their own claims.

They are trying to show evidence of the Law of Conservation of Mechanical Energy.
The only measured values in the entire experiment are position and time.

Our motion sensors are giving us very clean data that fit a quadratic with an R^2 value of 1.

Using any of the numerical methods for taking the derivative of a data set leads to results that get very "wobbly" as time goes on.

However, when we replace the discrete values with the continuous function returned by regression and then use basic calculus to take the derivative, we are able to get clean functions for velocity, acceleration, and potential and kinetic energy.
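The fit-then-differentiate workflow described here could be sketched as follows. This is a minimal illustration, not the class's actual analysis: the data are synthetic and noise-free, and the mass, sample spacing, and initial conditions are all assumed values. It uses NumPy's `polyfit`, `polyder`, and `polyval`.

```python
import numpy as np

# Hypothetical free-fall data: position (m) sampled every 0.05 s.
# Noise-free and made up for illustration only.
g = 9.8                                   # m/s^2
t = np.arange(0.0, 1.0, 0.05)
y = 2.0 + 1.5 * t - 0.5 * g * t**2

# Least-squares quadratic fit; coefficients come back highest power first.
pos_coeffs = np.polyfit(t, y, 2)          # [a, b, c] in a*t^2 + b*t + c

# Differentiate the fitted polynomial analytically.
vel_coeffs = np.polyder(pos_coeffs)       # velocity: 2*a*t + b
acc_coeffs = np.polyder(vel_coeffs)       # acceleration: 2*a

mass = 0.2                                # kg, assumed
v = np.polyval(vel_coeffs, t)
kinetic = 0.5 * mass * v**2
potential = mass * g * np.polyval(pos_coeffs, t)
total = kinetic + potential
```

In this synthetic case `total` stays flat by construction. With real data, a fitted acceleration `2*a` that differs from `-g` would make `total` drift over time, which is one way a loss could show up in the fitted model.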

They are also able to clearly see that there is a loss due to air resistance, and actually solve the differential equation and measure the work done by this force.

In short, as long as we give evidence that our experimental model achieved via regression is correct (R^2 values and residual analysis all point very strongly to this), can we move out of the discrete domain and into the continuous domain, where calculus works better and the data is far cleaner for the students to see?

I've shown my students' findings to a few other teachers and they say the kids should try to publish, but I would hate for them to be embarrassed due to us using regression and calculus if it is wrong to do so.

Any guidance would be appreciated.

#### rogojel

##### TS Contributor
hi,
I am no expert on publications, but at first glance using the model seems OK to me. After all, you use the same parameters to calculate the speed as you used in the model for the distance. That is, S = a*x^2 + b*x + c and V = 2*a*x + b, so a and b determine both.

The R-squared of 1 and the wobbliness of the speed make me pause a bit. Are the position-time measurements sitting very closely on the parabola? How do you calculate the numerical derivative of the data set? Maybe that is where the problem is.

regards

#### Mr. O

##### New Member
> The R-squared of 1 and the wobbliness of the speed make me pause a bit. Are the position-time measurements sitting very closely on the parabola? How do you calculate the numerical derivative of the data set? Maybe that is where the problem is.

rogojel,
Thank you for the reply.

The original data points are exactly on the line of fit. The wobbliness comes from the numerical derivative used to find velocity, and it is then compounded by the calculation of kinetic energy.

I've used a couple of methods for the numerical derivative: backward differentiating and center differentiating. They are similar to the Riemann sums used in numerical integration.

backward differentiating: f'(x) = [f(x) - f(x-h)]/h

center differentiating: f'(x) = [f(x+h) - f(x-h)]/(2h)
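The two difference formulas above can be sketched in plain Python to see how they behave on sampled position data. The sample spacing, noise level, and trajectory below are assumed values chosen for illustration, not the thread's actual measurements.

```python
import random

def backward_diff(y, h):
    """Backward difference: (y[i] - y[i-1]) / h for i >= 1."""
    return [(y[i] - y[i - 1]) / h for i in range(1, len(y))]

def central_diff(y, h):
    """Center difference: (y[i+1] - y[i-1]) / (2h) for interior points."""
    return [(y[i + 1] - y[i - 1]) / (2 * h) for i in range(1, len(y) - 1)]

random.seed(0)
h = 0.05        # s, assumed sample spacing
sigma = 0.001   # m, assumed sensor noise (1 mm standard deviation)

t = [i * h for i in range(21)]
true_y = [2.0 + 1.5 * ti - 4.9 * ti**2 for ti in t]   # hypothetical trajectory
true_v = [1.5 - 9.8 * ti for ti in t]                 # its exact derivative
noisy_y = [yi + random.gauss(0.0, sigma) for yi in true_y]

# Velocity errors from each method on the noisy samples.
back_err = [v - tv for v, tv in zip(backward_diff(noisy_y, h), true_v[1:])]
cent_err = [v - tv for v, tv in zip(central_diff(noisy_y, h), true_v[1:-1])]
```

On noise-free quadratic data the center difference reproduces the derivative exactly, while the backward difference is offset by a constant `-a*h` (here `4.9 * h`). With noise added, both divide small position errors by a small `h`, which is where the scatter in velocity comes from.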

The graphs of position and velocity alone look acceptable; however, both methods lead to excessively wobbly data (I need a better term than wobbly) once the calculations for kinetic and potential energy are applied.

Kinetic Energy = .5 * mass * (derivative of Position)^2

Any truncation error generated by the numerical derivative is grossly enhanced by squaring and scaling the data to calculate KE.
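One way to see why the energy plot wobbles more than the velocity plot is first-order error propagation. The sketch below assumes each position reading carries independent noise with standard deviation $\sigma_x$ (an assumed symbol, not from the original post), sample spacing $h$, and mass $m$:

$$
\sigma_v^{\text{backward}} = \frac{\sqrt{2}\,\sigma_x}{h}, \qquad
\sigma_v^{\text{center}} = \frac{\sigma_x}{\sqrt{2}\,h}, \qquad
\sigma_{KE} \approx \left|\frac{\partial KE}{\partial v}\right| \sigma_v = m\,|v|\,\sigma_v
$$

Under these assumptions the scatter in KE scales with $|v|$, so it grows as the object speeds up, which would be consistent with the results getting noisier as time goes on.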

I've been looking unsuccessfully for other numerical differentiation methods, but doing straight calculus on the continuous fit function leads to so much cleaner data.

Which leads back to the first question: is it acceptable and appropriate to use the function returned by regression as a replacement for the actual data?

#### GretaGarbo

##### Human
> Is it ever acceptable to replace discrete measured values with a continuous function if you are certain you have the right model?

Yes, of course it is. And it is done all the time.

If you have regression model: Y = a + b1*x + b2*x^2 + eps

where Y is energy and x is speed. Suppose you have made a design with, say, 5 levels of speed (v1, v2, v3, v4, v5), the speed levels are carefully controlled, Y is a measured energy variable, and eps is a random error term. Then the only statement that the regression model makes is that the Y are conditional on the x-values (Y|x). So statistics and the regression model remain silent about what is happening in between the designed values. Maybe there is a jump, continuous or discontinuous, in the "true" relation for Y between v1 and v2. The statistics do not say anything about that.

But if subject matter knowledge, like physics, says that the relation is just Y = b2*x^2 + eps, so that b2 is mass/2 and a and b1 are zero, and physics also says that x (the speed) is a continuous variable and that the model is valid for all values in an interval, then you can make statements about what is happening between the design values.
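As a small illustration of the constrained model Y = b2*x^2, here is a sketch of the least-squares estimate of b2 with no intercept or linear term. The speed levels, mass, and the helper name `fit_b2` are all hypothetical, and the energies are generated without noise.

```python
def fit_b2(speeds, energies):
    """Least-squares b2 for Y = b2 * x**2 (no intercept, no linear term)."""
    num = sum((x**2) * y for x, y in zip(speeds, energies))
    den = sum(x**4 for x in speeds)
    return num / den

# Hypothetical design: five controlled speed levels (m/s) and a 0.2 kg mass.
speeds = [1.0, 2.0, 3.0, 4.0, 5.0]
mass = 0.2
energies = [0.5 * mass * v**2 for v in speeds]  # noise-free, for illustration

b2 = fit_b2(speeds, energies)
estimated_mass = 2 * b2   # since b2 = mass / 2 in this model
```

The fit only uses the five design points; reading the curve between them relies on the physics, as described above.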

This is a very common situation. For example, it is often believed that there is a smooth function between health and a drug, or between yield and a fertiliser, but the smoothness of the function comes from subject matter knowledge.

(But in this experiment maybe there were just measurements of time and position, and it is not completely clear to me what model the R^2 is referring to.)

#### rogojel

##### TS Contributor
Mr. O!
it is clear now, it should have been obvious from the start. Your h values in the dataset are random, so of course the estimates of the derivatives will be random as well.

So, differentiating the model is the only way.

regards