# Accuracy of a transformation of a data set

#### LeonJ

##### New Member
Hello,

I'm very new to statistics and ran into a problem that I don't know the answer to. I want to do a capability analysis, but my data set is not normally distributed, so I transformed it using the Johnson transformation in Minitab. The transformed data were normally distributed and gave very good Cp and Cpk values. However, when I instead fitted another distribution to the original data, the Cp and Cpk values were lower by a factor of 10.
My question is: how accurate is the Johnson transformation? When can I use it and when not? The p-value was always well above 0.05 for both the fitted distribution and the transformation. The data set has n = 40 observations.
I would say the fitted distribution describes the process capability of the data much better than the transformation does, but I can't say why.

left: original data set
middle: transformed data set (Cp = 1.78)
right: Weibull distribution fitted to the original data set (Cp = 0.88)
(The lower and upper spec limits should be the same for the fit and the transformation, unless I have made a mistake there.)
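
For reference, here is a minimal sketch of how a capability index is commonly computed from a non-normal fit. It assumes the percentile-based definition Cp = (USL - LSL) / (x_0.99865 - x_0.00135); Minitab's exact method may differ. The Weibull parameters and spec limits are hypothetical and are not taken from the data in this thread.

```python
import math
from statistics import NormalDist

# Tail probabilities that bound the central 99.73 % of a distribution
# (the same coverage as +/- 3 sigma for a normal distribution).
P_LOW, P_HIGH = 0.00135, 0.99865


def weibull_quantile(p, shape, scale):
    """Inverse CDF of a two-parameter Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def cp_percentile(lsl, usl, quantile):
    """Percentile-based Cp: spec width over the 99.73 % spread of the fit."""
    return (usl - lsl) / (quantile(P_HIGH) - quantile(P_LOW))


# Hypothetical values for illustration only (not from the thread's data).
lsl, usl = 1.0, 30.0
shape, scale = 2.0, 10.0

cp_weibull = cp_percentile(lsl, usl, lambda p: weibull_quantile(p, shape, scale))

# Sanity check: for a normal distribution the percentile method
# reduces (almost exactly) to the classic (USL - LSL) / (6 * sigma).
normal = NormalDist(mu=15.0, sigma=4.0)
cp_normal_percentile = cp_percentile(lsl, usl, normal.inv_cdf)
cp_normal_classic = (usl - lsl) / (6.0 * normal.stdev)

print(f"Cp (Weibull, percentile method): {cp_weibull:.3f}")
print(f"Cp (normal, percentile method):  {cp_normal_percentile:.3f}")
print(f"Cp (normal, classic formula):    {cp_normal_classic:.3f}")
```

The last two numbers agree closely, which suggests the percentile definition is a natural generalisation of the normal formula. A Cp computed this way on the original scale and a Cp computed on a transformed scale are still not guaranteed to match.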

Maybe you can help me with that.
Thank you very much
Leon

#### katxt

##### Well-Known Member
How big are your samples? Small samples from normal data can give histograms that look like your raw data chart.
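
A rough simulation can illustrate this. The sketch below draws repeated samples of n = 40 (the size given in the first post) from a true normal distribution and measures how lopsided each one looks. The seed, the number of repeats and the 0.3 skewness threshold are arbitrary choices made for illustration.

```python
import random
from statistics import mean, pstdev


def sample_skewness(xs):
    """Moment-based sample skewness (0 for a perfectly symmetric sample)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(12345)  # fixed seed so the run is repeatable
n, repeats = 40, 2000       # n from the first post; repeats is arbitrary

# Each entry is the skewness of one sample of n truly normal values.
skews = [
    sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(repeats)
]

# 0.3 is an arbitrary cut-off for "visibly lopsided".
share_lopsided = sum(abs(g) > 0.3 for g in skews) / repeats
print(f"largest |skewness| seen: {max(abs(g) for g in skews):.2f}")
print(f"share of samples with |skewness| > 0.3: {share_lopsided:.0%}")
```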

#### Karabiner

##### TS Contributor
Just out of curiosity, is it required for capability analysis that the sample data are normally distributed?
How will the transformed data be interpreted?

#### LeonJ

##### New Member
> How big are your samples? Small samples from normal data can give histograms that look like your raw data chart.

If I understood the question correctly, the sample size of each test is about 40.

#### LeonJ

##### New Member
> Just out of curiosity, is it required for capability analysis that the sample data are normally distributed?
> How will the transformed data be interpreted?

As I understand it, the data do not have to be normally distributed. You can also carry out a non-normal capability analysis; that is what fitting another distribution to the actual data, as mentioned above, amounts to.
But my question is this: if the transformation gives better values according to the p-value, can I use it, or do I even have to, even though the Cp values differ by a factor of 10? Where does the discrepancy come from? Or is a transformation ultimately just a way of embellishing the values, so that my real process would fail despite good values on paper?
"How will the transformed data be interpreted?" That is exactly my problem: I don't know how Minitab or other software interprets them.
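
One plausible reading is sketched below. It assumes the software applies the same transformation to the spec limits as to the data and then uses the ordinary normal formulas on the transformed scale. The Johnson SU form, the parameter values, the data and the spec limits are all hypothetical; they are not Minitab's fitted values for this data set.

```python
import math
from statistics import NormalDist, mean, stdev


def johnson_su(x, gamma, delta, xi, lam):
    """Johnson SU transform: maps x to an (ideally) standard normal score."""
    return gamma + delta * math.asinh((x - xi) / lam)


# Hypothetical transform parameters, data and spec limits.
params = dict(gamma=-1.0, delta=1.5, xi=5.0, lam=2.0)
data = [4.1, 5.3, 6.0, 6.8, 7.4, 8.1, 9.0, 10.2, 12.5, 16.0]
lsl, usl = 2.0, 40.0

# Transform the data AND the spec limits with the same function.
z = [johnson_su(x, **params) for x in data]
z_lsl = johnson_su(lsl, **params)
z_usl = johnson_su(usl, **params)

# Ordinary normal-theory indices on the transformed scale,
# using the overall sample standard deviation.
m, s = mean(z), stdev(z)
cp = (z_usl - z_lsl) / (6.0 * s)
cpk = min(z_usl - m, m - z_lsl) / (3.0 * s)

# Expected fraction outside the limits under the fitted normal model.
# This fraction does not depend on the scale, so it is easier to compare
# between a transformation and a directly fitted distribution than Cp is.
fitted = NormalDist(m, s)
frac_out = fitted.cdf(z_lsl) + (1.0 - fitted.cdf(z_usl))

print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}, expected fraction out of spec = {frac_out:.2e}")
```

Under this assumption, the Cp reported after a transformation describes spread relative to the transformed limits. Comparing the predicted out-of-spec fractions of the two models may therefore say more than comparing the two Cp values directly.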