So I took a lot of data (~10,000 readings) on a machine (a 3D printer). For each reading, I recorded the error and a lot of different parameters (speed, voltage, angle, temperature, etc.).
My first analysis showed me that none of the parameters clearly explains my error on its own, but a correlation is clearly visible (coefficient of determination around 50% with polynomial regression or a Savitzky–Golay filter) for many of those parameters.
But even if none of those individual parameters explains my error perfectly, I'm highly confident a combination of 2-4 parameters would have a really high R².
My question is: how can I fit a regression analysis with multiple independent variables? What do you suggest?
Thanks.
Last edited by Elok; 11-18-2014 at 02:22 PM.
Can you define what you mean by polynomial regression? I can think of about three different scenarios in which a person might use that phrase.
Stop cowardice, ban guns!
Just found out I mixed up dependent and independent.
Anyway, basically I put my dependent variable (error) on the y-axis and one of my independent variables on the x-axis to get a scatter plot (variable vs. error). Since the correlation isn't linear but clearly exists (example), I decided to use polynomial regression to fit a high-order polynomial to my case. This exercise proved to me that many different variables were influencing my error.
Now, I would like to do the same analysis with 2 independent variables (example). I'm looking for help with a multi-variable regression analysis in this scenario.
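In case it helps to see the two-variable idea concretely, here is a minimal NumPy sketch that fits a full degree-2 polynomial surface in two predictors by least squares. The variable names (speed, temp) and the data are invented stand-ins, not your printer readings:

```python
import numpy as np

# invented stand-in data, not real printer measurements
rng = np.random.default_rng(0)
speed = rng.uniform(1, 10, 200)          # m/s
temp = rng.uniform(100, 140, 200)        # degrees
# made-up nonlinear relationship for the demo
error = 0.5 * (speed - 7) ** 2 + 0.01 * (temp - 120) ** 2 + rng.normal(0, 0.5, 200)

# degree-2 design matrix in two variables: 1, s, t, s^2, t^2, s*t
X = np.column_stack([np.ones_like(speed), speed, temp,
                     speed**2, temp**2, speed * temp])
beta, *_ = np.linalg.lstsq(X, error, rcond=None)

fitted = X @ beta
ss_res = np.sum((error - fitted) ** 2)
ss_tot = np.sum((error - error.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

The key point is that a "polynomial regression in several variables" is still a linear least-squares problem once you build the design matrix; the cross term `speed * temp` is what lets the fit capture interactions between the two predictors.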
Last edited by Elok; 11-18-2014 at 02:39 PM.
Are you calling this polynomial because you logged variables, or because you transformed some variables via higher powers (squared, cubed, etc.)? If so, describe how the model variables were transformed and why.
Before we go further down that road: I only used polynomial fits to prove to myself that many independent variables were the source of my error, and I discarded those that weren't.
All I'm asking here is how to do a regression analysis with 1 dependent variable (error) and many independent variables causing the error (speed, voltage, angle, temperature, etc.).
No problem. If someone does not reply, I will in a bit (I have something I need to do). I think you will be better off not calling the variable 'error'. I keep thinking you are referencing the error term in your model, not a quantified error in your printer. Can you tell us about the variable used for printer "ineffectiveness", AKA error?
Well, I'm working on something complex, and since English isn't my first language, I try to simplify as much as possible.
To be more precise, I should have said "positional error" instead of error, which is three-dimensional (X, Y, Z). Mix in another couple of factors and I get over 10 dependent variables. But since I'm planning to attack each one of those individually, I decided to talk about only 1 dependent variable, a general "Error", in my question.
Hmmm, do you have experience with multiple linear regression?
Multiple linear regression is a very common technique that uses one dependent variable and multiple independent variables. It is available in any statistical package, and you can learn about it in any book on elementary statistics or in many books just on regression (one good one is Regression Analysis by Example by Chatterjee et al., but there are tons of good ones).
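To make that concrete, here is a bare-bones multiple linear regression by least squares in NumPy; the predictor names and data are invented placeholders for the printer parameters:

```python
import numpy as np

# invented placeholder data, not real measurements
rng = np.random.default_rng(1)
n = 500
speed = rng.uniform(1, 10, n)
voltage = rng.uniform(11, 13, n)
temp = rng.uniform(100, 140, n)
# made-up linear relationship for the demo
err = 0.3 * speed - 2.0 * voltage + 0.05 * temp + rng.normal(0, 0.2, n)

# one column per predictor, plus a column of ones for the intercept
X = np.column_stack([np.ones(n), speed, voltage, temp])
beta, *_ = np.linalg.lstsq(X, err, rcond=None)
print("intercept and slopes:", beta)
```

Any statistical package will do this (and report standard errors, p-values, etc.); the point here is just that all predictors enter one joint fit rather than a series of one-variable fits.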
However, once you start adding multiple polynomial terms, things get complex to interpret. Depending on the order of the polynomials it might be better to use some sort of splines.
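As an illustration of the spline alternative, here is one way to build a regression spline by hand with a truncated power basis; the knot positions and the data are made up for the demo:

```python
import numpy as np

# invented one-predictor data with a wiggly relationship
rng = np.random.default_rng(2)
speed = np.sort(rng.uniform(1, 10, 300))
err = np.sin(speed) + rng.normal(0, 0.2, 300)

knots = [3.0, 5.0, 7.0]                          # assumed knot locations
# cubic truncated power basis: 1, x, x^2, x^3, (x - k)^3 for x > k
cols = [np.ones_like(speed), speed, speed**2, speed**3]
cols += [np.clip(speed - k, 0, None) ** 3 for k in knots]
X = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(X, err, rcond=None)
fitted = X @ beta
ss_res = ((err - fitted) ** 2).sum()
ss_tot = ((err - err.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
print(f"spline R^2 = {r2:.3f}")
```

Unlike a single high-order polynomial, the spline is piecewise cubic between the knots, which tends to behave better at the edges of the data and is easier to extend to several predictors.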
It might also be worth it to you to hire a consultant; learning to do multiple regression well will take quite some time. You can't just plug things in and expect good results, you need to check assumptions and so on.
Also, if "error" is a count (that is, a number of errors) you might need to use a count regression model such as Poisson or negative binomial regression.
No, "Error" is a value. For instance, I got a 5 mm positional error (0 mm on the X axis, 3 mm on the Y axis, 4 mm on the Z axis). Same for my parameters (speed was 10 m/s, temperature 120 degrees, etc.).
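(For anyone following along: the 5 mm figure is the Euclidean norm of the three per-axis errors quoted above.)

```python
import math

# total positional error as the Euclidean norm of the per-axis errors
total = math.sqrt(0**2 + 3**2 + 4**2)
print(total)  # 5.0
```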
Well, I've got the basics of multiple linear regression. Simple, but not useful for my case.
Thanks, but no, I cannot hire a consultant, and yes, I do have some time to learn "non-linear" multiple regression.
So far I found this page, which gives me a good hint about what I need.
Last edited by Elok; 11-19-2014 at 08:13 AM.
You may not need to have polynomials or transformed variables in your model. They are typically included to address nonlinearity or non-constant error variance in the model (heteroscedasticity), or to try to get better predictive power. I would start with a basic model with all of your variables and then start testing the model assumptions.
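One crude way to check the constant-variance assumption, sketched below with invented data: fit the linear model, then regress the squared residuals on the fitted values. A clearly nonzero slope in that auxiliary regression suggests heteroscedasticity (this is the idea behind the Breusch-Pagan test, done by hand):

```python
import numpy as np

# invented data whose noise grows with x, i.e. heteroscedastic by construction
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 10, n)
y = 2.0 * x + rng.normal(0, 0.2 * x, n)

# fit the linear model and take residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# auxiliary regression: squared residuals on fitted values
A = np.column_stack([np.ones(n), fitted])
gamma, *_ = np.linalg.lstsq(A, resid**2, rcond=None)
print("slope of resid^2 vs fitted:", gamma[1])
```

A plot of residuals against fitted values tells the same story visually and is usually the first thing to look at.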
If I understand what you said correctly, yes, my data are heteroscedastic, and I need the biggest predictive power possible. I already tried a multiple linear regression and the results are awful. Far worse than a simple high-order polynomial fit with my most significant independent variable.
As I said before, the effect of each of my independent variables on my dependent variable is non-linear. Here's an example:
Speed < 5 m/s: positional error is large
5 m/s < Speed < 6 m/s: positional error is small
6 m/s < Speed < 7 m/s: positional error is larger
7 m/s < Speed < 9 m/s: positional error is null
9 m/s < Speed < 10 m/s: positional error is large
etc.
Of course, I just invented this example, but it shows the non-linear link between my dependent and independent variables. A polynomial fit is giving me good results (R² around 50%, correction of error around 30%), but I'm sure I can go way higher if I take all my independent variables into account.
Today I will start by doing polynomial fits in succession, but I'm not confident it'll give me the best result:
Dependent variable 1 + Independent variable 1 = Result 1
Result 1 + Independent variable 2 = Result 2
Result 2 + Independent variable 3 = Result 3
etc.
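One reading of that scheme, sketched below with invented data: fit a polynomial in the first variable, take the residuals, fit those on the second variable, and so on. (Note this greedy approach can miss interactions that a single joint fit would capture.)

```python
import numpy as np

# invented stand-in data for the demo
rng = np.random.default_rng(4)
n = 300
speed = rng.uniform(1, 10, n)
temp = rng.uniform(100, 140, n)
err = 0.5 * (speed - 7) ** 2 + 0.1 * (temp - 120) + rng.normal(0, 0.3, n)

def polyfit_residuals(x, y, deg):
    """Fit a degree-`deg` polynomial of y on x and return the residuals."""
    coeffs = np.polyfit(x, y, deg)
    return y - np.polyval(coeffs, x)

r1 = polyfit_residuals(speed, err, 2)   # step 1: error vs speed
r2 = polyfit_residuals(temp, r1, 1)     # step 2: step-1 residuals vs temperature

print("residual std after step 1:", r1.std())
print("residual std after step 2:", r2.std())
```

Each step should shrink the residual spread if the next variable carries additional information; when it stops shrinking, the remaining variables are not adding predictive power in this scheme.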
Last edited by Elok; 11-19-2014 at 11:29 AM.
Are you also examining the partial R² each variable is contributing, and also examining collinearity between the variables?
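Both quantities can be computed by hand from ordinary least-squares fits; here is a sketch with invented, deliberately collinear data (a variance inflation factor, VIF, well above 5-10 usually flags a collinearity problem):

```python
import numpy as np

# invented data; voltage is deliberately correlated with speed
rng = np.random.default_rng(5)
n = 500
speed = rng.uniform(1, 10, n)
voltage = 2 * speed + rng.normal(0, 1, n)
temp = rng.uniform(100, 140, n)
err = 0.5 * speed + 0.05 * temp + rng.normal(0, 0.5, n)

def r2(X, y):
    """R^2 of an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid**2).sum() / ((y - y.mean())**2).sum()

ones = np.ones(n)
full = r2(np.column_stack([ones, speed, voltage, temp]), err)
without_temp = r2(np.column_stack([ones, speed, voltage]), err)
# partial R^2 of temp: share of the *remaining* variation it explains
partial_temp = (full - without_temp) / (1 - without_temp)

# VIF of voltage: how well the other predictors predict it
r2_volt = r2(np.column_stack([ones, speed, temp]), voltage)
vif_volt = 1 / (1 - r2_volt)
print(f"partial R^2 (temp) = {partial_temp:.3f}, VIF (voltage) = {vif_volt:.1f}")
```

So partial R² is not the same thing as plain R²: it asks how much of the variation left over by the other variables a given predictor explains, which is exactly the per-variable contribution question.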
Short answer, no. I'm not familiar with partial R².
EDIT: After a little searching (link), it seems to give about the same info as plain R². Little note: I'm looking for the biggest predictive power possible, not the biggest correlation possible. In my case, even a low R² can give me a huge correction. After all, AFAIK R² only gives an indication of how my variables are linked.
You mean checking whether two variables' effects are so similar that it's useless to use both of them? Yes, I already did that, and the collinear independent variables have already been discarded. The remaining independent variables aren't linked.
Last edited by Elok; 11-19-2014 at 11:41 AM.