Math behind Pareto chart of standardized effects

#1
I would like to understand the math behind the Pareto chart of standardized effects that Minitab produces (explained here). I have encountered it a couple of times in publications on regression modeling, such as example1 and example2.

According to the Minitab website, each bar represents one term in the regression equation, and its length shows that term's effect size in the model. The graph also has a vertical reference line; terms whose bars extend past it are significant.

I believe the bar values are the t-statistic (t) computed for each model term, such as this. I came to that conclusion because they match exactly in magnitude, though not in sign. The image below, taken from example1, illustrates that.

1659012704018.png
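
To make the observation concrete, here is a minimal sketch of how such a chart could be assembled from a table of t-statistics. The term names and t-values are made up for illustration and are not taken from example1; the reference value 2.571 is the one from my example.

```python
# Hypothetical t-statistics for four model terms (illustrative values only,
# not taken from the publications mentioned above).
t_stats = {"A": -4.10, "B": 2.95, "AB": -0.80, "C": 1.20}
reference = 2.571  # reference-line value from the example in this thread

# Bar length is |t|, so the sign is dropped; bars are ordered largest first.
bars = sorted(
    ((name, abs(t)) for name, t in t_stats.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, height in bars:
    flag = "significant" if height > reference else "not significant"
    print(f"{name:>2}  {height:5.2f}  {flag}")

# Terms whose bars extend past the reference line.
significant_terms = [name for name, height in bars if height > reference]
```

With these made-up numbers, only the bars for A and B cross the reference line.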

I still don't know where the word "standardized" comes in. For example, are the t-statistics standardized after being computed? Does "standardized" come from the definition of effect size or from a z-score conversion?

I believe the reference value comes from the critical values in the Student's t table, since the reference values in all the graphs I have inspected so far appear in the two-tailed t table. I am still unsure how they are looked up, e.g. how the degrees of freedom are defined in each case.

Any help is appreciated.
 

Dason

Ambassador to the humans
#3
The plots you're seeing just look to be using the absolute value of the t-statistic. These statistics don't change depending on if you're using standardized data or the raw data so that part doesn't really matter. It looks like they're just using "standardized effect" to mean that the sign of the statistic doesn't matter.

I'm not sure what you're talking about in terms of degrees of freedom because this looks to be a case of multiple linear regression and all the actual estimated terms will have the same degrees of freedom.
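
A rough way to check the claim that standardizing the data doesn't change the statistic: the sketch below fits a simple one-predictor regression twice, once on the raw predictor and once on its z-scores, and compares the slope's t-statistic. The data are made up, and this only covers the single-predictor case, not a full multiple regression. Note that the t-statistic is already the coefficient divided by its standard error, so it is scale-free to begin with.

```python
import math

def slope_t_statistic(x, y):
    """t-statistic of the slope in the simple regression y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    residual_df = n - 2  # two estimated coefficients: intercept and slope
    se_b1 = math.sqrt(sse / residual_df / sxx)
    return b1 / se_b1

def standardize(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Made-up data with a clearly negative slope.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [9.8, 8.1, 7.4, 5.2, 4.9, 2.7, 1.6]

t_raw = slope_t_statistic(x, y)
t_std = slope_t_statistic(standardize(x), y)

# The two t-statistics agree up to rounding; the chart would plot abs(t_raw).
print(t_raw, t_std, abs(t_raw))
```

The slope and its standard error are both rescaled by the same factor when the predictor is standardized, so their ratio stays put.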
 
#4
The degrees of freedom I was referring to are the ones used in the Student's t table. To find the critical value you need to calculate the degrees of freedom. I assumed that the reference line value (2.571 in my example), which the graph uses to tell whether a term is significant, came from the t table. In your view, how do you compute the degrees of freedom for a regression model? Something like: degrees of freedom = observations - features?

Do you have any idea why they call this "standardized"?
 

Dason

Ambassador to the humans
#5
Right. But the degrees of freedom for every single 'feature' is the same so why do you think it matters here?
 
#6
My guess is that the reference value (e.g. 2.571) comes from the t-distribution table (link). The first column of that table lists the degrees of freedom, which you need to compute from your analysis in order to find the critical value. I think the reference values come from there, since every graph I checked had a reference value that also appears in the t-distribution table.
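
The table lookup can be reproduced numerically. The sketch below assumes the reference line is the two-sided critical value at alpha = 0.05, i.e. the (1 - alpha/2) quantile of a t-distribution whose degrees of freedom are the residual (error) degrees of freedom, n - p, where p counts every estimated coefficient including the intercept. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are my own, and the observation and coefficient counts are hypothetical; the quantile is found with plain Simpson integration and bisection so that nothing beyond the standard library is needed.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """CDF for t >= 0: 0.5 by symmetry plus Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, alpha=0.05):
    """Two-sided critical value: the (1 - alpha/2) quantile, by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the root
        lo, hi = hi, hi * 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical counts: any model with n - p = 5 gives the same reference value.
n_observations = 9
n_coefficients = 4  # intercept plus three terms (an assumption for illustration)
residual_df = n_observations - n_coefficients

print(round(t_critical(residual_df), 3))  # 2.571
```

So a reference line at 2.571 is consistent with 5 residual degrees of freedom at alpha = 0.05. Because the residual degrees of freedom belong to the model as a whole, every term is compared against the same value. If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, df)` should give the same number.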