# What type of distribution is this?

#### etelebaxi

##### New Member
Usually the results of direct measurements are normally distributed. The same holds if we apply a linear transformation to the result: e.g. if X is a normally distributed random variable, then Y = a + bX is also normally distributed.
This statement is no longer true if we make exponential or logarithmic transformations, e.g.

if X has a normal distribution, then Y = exp(X) has a log-normal distribution (http://en.wikipedia.org/wiki/Lognormal_distribution).

I am interested in the opposite situation. Usually we determine absorption with the help of the Beer-Lambert law, by measuring the intensity of transmitted light.

A = ln(Io/I), where A = absorption, Io = the intensity of the incident light, and I = the intensity of transmitted light. I and Io are directly measured and normally distributed.

My theoretical problem is: what type of distribution does A (the absorption) have?

Formulated more generally, the question is: if X has a normal distribution, what type of distribution does Y = ln X have?

Thanks for any information. I cannot do further research, because I do not know the name of the distribution (it is not lognormal!).
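
A sketch of the density in question, assuming X ~ N(μ, σ²) with μ much larger than σ, so that the probability of X ≤ 0 can be neglected (μ and σ are notation introduced here for the mean and standard deviation of X). By the change-of-variables rule with x = e^y:

$$
f_Y(y) = f_X\left(e^{y}\right)\frac{d}{dy}e^{y} = \frac{e^{y}}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{\left(e^{y}-\mu\right)^{2}}{2\sigma^{2}}\right), \qquad -\infty < y < \infty .
$$

Strictly, this integrates to P(X > 0) rather than 1, so it is a proper density only after dividing by that probability, i.e. for a normal distribution truncated at zero.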

#### etelebaxi

##### New Member
hi,
seems to be a difficult one:

I guess for any practical situation a simulation would be the best approach.

regards
rogojel
Oh, so it has no name... I searched desperately for it among the well-known distributions. Can you help me calculate the median, the mean and the mode, or is that possible only by simulation? I have tried, but my method was too imprecise for a scientific work. I studied mathematics more than 30 years ago. I am a chemist, and I want to prove that measured absorptions [using the Lambert-Beer law: A = ln(Io/I), where Io and I are the intensities of incident and transmitted light (directly measured)] do not have a Gaussian (normal) distribution, although everybody in the scientific literature considers them normal.

#### BGM

##### TS Contributor
Several issues:

1. In statistical hypothesis testing there are normality tests, so you may use one to test whether the collected data

$$A = \ln\left(\frac {I_0} {I}\right)$$

are consistent with a Gaussian distribution assumption or not.

2. Theoretically speaking, if $$I$$ follows a normal distribution, then it has a non-zero probability of being negative, and the domain of the logarithm does not include the non-positive region. The normal assumption is workable only if the mean of the variable is sufficiently large and the variance sufficiently small that a truncated normal distribution suffices as a modelling assumption.

3. After clearing up these issues, we can definitely calculate the summary statistics you want (or at least estimate them approximately).
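
Points 1 and 2 can be tried out with a few lines of standard-library Python. This is only a sketch: the Jarque-Bera test is one possible normality test (chosen here because it needs no extra packages), and the intensity parameters in the example call are made-up illustration values, not data from this thread.

```python
import math
import random

def prob_nonpositive(mu, sigma):
    """P(I <= 0) for I ~ N(mu, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc(mu / (sigma * math.sqrt(2.0)))

def jarque_bera(data):
    """Return (JB statistic, approximate p-value) for a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    dev = [x - mean for x in data]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

def simulate_absorbance(n, mu0, sd0, mu, sd, seed=1):
    """Draw n values of A = ln(I0 / I) with I0 and I normal, both kept positive."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        i0, i = rng.gauss(mu0, sd0), rng.gauss(mu, sd)
        if i0 > 0.0 and i > 0.0:  # truncate: the logarithm needs positive arguments
            out.append(math.log(i0 / i))
    return out

# Hypothetical parameters: incident light N(100, 2), transmitted light N(60, 6).
print("P(I <= 0):", prob_nonpositive(60.0, 6.0))
jb, p = jarque_bera(simulate_absorbance(5000, 100.0, 2.0, 60.0, 6.0))
print("JB statistic:", jb, "approx. p-value:", p)
```

A small p-value speaks against normality of A. With real data, the list of measured absorbances would be passed to `jarque_bera` directly.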

#### rogojel

##### TS Contributor
hi,
just a quick thought, the median will be the log of the original median because the log transformation preserves the order.

I ran a few simulations of a normal distribution N(100, s) where s was 10 and 20. It looks like the distribution of the logs will be left skewed (no big surprise knowing the log curve), and the skewness will be larger for a larger std dev in the original. Also, the kurtosis will be a lot higher than in the original normal.

The mean stayed pretty close to the log of the original mean though, for practical purposes.

I hope this helps a bit.

regards
rogojel
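
A simulation along these lines might look like the following sketch. It is not the code that was actually run for the post above; the sample size, the seed and the function name `log_of_normal_stats` are assumptions made for illustration.

```python
import math
import random
from statistics import median

def log_of_normal_stats(mu, sigma, n=100_000, seed=0):
    """Simulate X ~ N(mu, sigma^2), take Y = ln X, and summarise Y."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xs = [x for x in xs if x > 0.0]  # ln is undefined for x <= 0
    ys = [math.log(x) for x in xs]
    m = sum(ys) / len(ys)
    dev = [y - m for y in ys]
    m2 = sum(d ** 2 for d in dev) / len(dev)
    m3 = sum(d ** 3 for d in dev) / len(dev)
    m4 = sum(d ** 4 for d in dev) / len(dev)
    return {
        "mean": m,
        "median": median(ys),
        "log_median_x": math.log(median(xs)),  # should match the median of Y
        "skewness": m3 / m2 ** 1.5,            # negative means left skewed
        "excess_kurtosis": m4 / m2 ** 2 - 3.0, # 0 for a normal distribution
    }

for s in (10.0, 20.0):
    print("s =", s, log_of_normal_stats(100.0, s))
print("ln(100) =", math.log(100.0))
```

Comparing the two runs shows whether the skewness and kurtosis grow with the standard deviation, and how far the mean of the logs moves from ln(100).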

#### GretaGarbo

##### Human
Could it be that you are simply interested in a calibration experiment like:

Absorbance = a + b*Concentration +(random error)

(This is a long shot.)
(Maybe there should be log(absorbance) and log(concentration) here.)

So that the absorbed light is linearly related to the concentration of the substance. In such experiments the concentrations are fixed numbers, set by the experimenter. Then the usual assumption in such an experiment is that the random error is normally distributed and that the absorbance variable is also normally distributed, given the concentration levels. (Maybe there is increasing variance in the random error as concentration increases.)

Maybe you want to do a number of calibration experiments, and then with the estimated "a" and "b" parameters, use that in new measurements of the absorbance and thereby get an estimate of the concentrations. Dason, on his blog (somewhere), has shown how to estimate the "uncertainty in the concentration" in R with the delta method (also called Gauss approximation).

Alternatively, if you have a given and fixed concentration, it might be that you want to vary the light intensity (sorry, I don't know the correct physical vocabulary); then those light intensities would be fixed values and the absorbance values, conditional on the light intensities, would be univariate normally distributed.

I made a long shot that might simplify the problem considerably!
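
The calibration idea can be sketched in Python as follows. This is not Dason's R code; it is a minimal illustration of ordinary least squares followed by the delta method, and the calibration data are synthetic (generated from an assumed line a = 0.02, b = 0.1 with an assumed noise level), not real measurements.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns the pieces needed later."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return {"a": a, "b": b, "s2": s2, "xbar": xbar, "sxx": sxx, "n": n}

def inverse_predict(absorbance, fit):
    """Estimate concentration from one new absorbance, with a delta-method SE.

    First-order propagation of the uncertainty in the new reading, a and b
    through c = (A - a) / b gives
        Var(c) ~ s2 / b^2 * (1 + 1/n + (c - xbar)^2 / Sxx).
    """
    c = (absorbance - fit["a"]) / fit["b"]
    var = fit["s2"] / fit["b"] ** 2 * (
        1.0 + 1.0 / fit["n"] + (c - fit["xbar"]) ** 2 / fit["sxx"]
    )
    return c, math.sqrt(var)

# Synthetic calibration run (all numbers are assumptions for illustration).
rng = random.Random(42)
conc = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0]
absb = [0.02 + 0.1 * c + rng.gauss(0.0, 0.005) for c in conc]
fit = fit_line(conc, absb)
print(inverse_predict(0.37, fit))
```

The sketch assumes constant error variance. If the variance grows with concentration, as suggested in parentheses above, a weighted fit would be needed instead.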

#### etelebaxi

##### New Member
Hi rogojel (or jó napot, good day!)

I think we could continue our discussion in Hungarian, though I do not know whether the rules of this site permit it. My English is a little bit Hunglish; I do not speak it well.

I did some research on the null hypothesis; it was useful, but not enough. Much water has flowed down the Danube since I learned statistics and probability...

Io/I is always positive. A negative intensity of light, like a negative intensity of electricity, is nonsense. We also use a calibration curve with low and high limits of measurement, so if we measure a negative value it is a serious problem and we have to call service.

Io > I by definition: Io is the starting point, and I is the transmitted light intensity, which is always partially absorbed. In practice the measurement limits are 0.95*Io > I > 0.1*Io, but in most cases A = log(Io/I) lies in [0.05, 0.3] (sorry, in my first post I used ln, the natural logarithm; it is actually the base-10 logarithm).

In biochemistry the standard deviation is not very low: the critical random error can be as high as 9-10%, and the admitted total analytical error can reach 25% in some cases (e.g. ALAT).

The asymmetry increases if the mean is low and the SD is high. For symmetric distributions mean = median. We usually calculate the SD relative to the mean. I am wondering (but am not sure) whether, for asymmetric curves, it would be more correct to calculate it relative to the median instead. I am still researching this...

The essential difference between the symmetric normal distribution and our asymmetric "expnormal" distribution (by analogy with the lognormal distribution) occurs at random errors larger than 1.5 SD.
Thanks for the median; I found it myself yesterday. I also know how to calculate the mode (it is the point where the derivative of the density is 0, a maximum point). I have not calculated it yet, and I have no idea how to get the mean...

Thanks a lot for all your help, it was very useful.
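
For the mean, one option that avoids random simulation is to integrate ln(x) against the normal density numerically. The sketch below also compares the result with the first terms of the series E[ln X] ≈ ln μ − σ²/(2μ²) − 3σ⁴/(4μ⁴), obtained by expanding ln(1 + (σ/μ)Z) and using E[Z²] = 1, E[Z⁴] = 3. It assumes X ~ N(μ, σ²) with μ − 8σ > 0, so that truncation at zero does not matter; the function names and example parameters are made up for illustration.

```python
import math

def mean_log_numeric(mu, sigma, width=8.0, steps=20_000):
    """E[ln X] for X ~ N(mu, sigma^2) by Simpson's rule on mu +/- width*sigma."""
    lo, hi = mu - width * sigma, mu + width * sigma
    if lo <= 0.0:
        raise ValueError("integration range reaches x <= 0; ln(x) is undefined there")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = (hi - lo) / steps

    def g(x):
        z = (x - mu) / sigma
        return math.log(x) * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(lo + k * h)
    return total * h / 3.0

def mean_log_series(mu, sigma):
    """Two-term series approximation of E[ln X] in powers of sigma/mu."""
    a = sigma / mu
    return math.log(mu) - a ** 2 / 2.0 - 3.0 * a ** 4 / 4.0

for mu, sigma in ((100.0, 5.0), (100.0, 10.0)):
    print(mu, sigma, mean_log_numeric(mu, sigma), mean_log_series(mu, sigma), math.log(mu))
```

For the base-10 logarithm the mean is the same quantity divided by ln(10). In both bases the mean lies below the log of the original mean, while the median equals it.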

#### etelebaxi

##### New Member
Hi GretaGarbo! You are very close!

I did not want to go into chemistry theory here, so I simplified the system. Of course, it is about concentrations and kinetics. It is chemistry.

We determine the concentrations of our measurands (in serum, plasma, urine, etc.) using a calibration curve: C = Cs(A - Ao)/(As - Ao). Index o refers to water (zero concentration) and index s to a known standard; these are constants, and the only variable is A, the absorption given by our measurand.

"the usual assumption in such an experiment is that the random error is normally distributed and that the absorbance variable is also normally distributed,"

That is the statement I am questioning, although all the literature states it!

Io/I, which we measure directly, is obviously normally distributed, but its logarithm is not! That is what I am stating. This is mathematics! If we apply exponential or logarithmic transformations to normally distributed variables, we get other distributions, as described by rogojel. If X is normally distributed, exp(X) is lognormally distributed. log X or ln X is also not normally distributed, but I did not find a name for this distribution. If the median of X is m, the median of ln X is ln(m). But I am not able to calculate the mean of this distribution. Therefore I came here to ask for help.
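
The calibration formula C = Cs(A - Ao)/(As - Ao) is linear in A, so whatever shape the distribution of A has is passed on to C unchanged apart from shift and scale. A minimal sketch, with made-up constants and hypothetical function names:

```python
def concentration(a, a0, a_std, c_std):
    """Two-point calibration: C = Cs * (A - Ao) / (As - Ao)."""
    return c_std * (a - a0) / (a_std - a0)

def concentration_sd(sd_a, a0, a_std, c_std):
    """SD of C when only A varies: a linear map scales the SD by |Cs / (As - Ao)|."""
    return abs(c_std / (a_std - a0)) * sd_a

# Hypothetical constants: water blank Ao = 0.02, standard As = 0.25 at Cs = 5.0.
print(concentration(0.14, 0.02, 0.25, 5.0))
print(concentration_sd(0.01, 0.02, 0.25, 5.0))
```

If A is skewed because it is the logarithm of a normally distributed ratio, C inherits exactly the same skewness and kurtosis.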

#### GretaGarbo

##### Human
Now it seems to me that this problem has changed from a more difficult bivariate problem to a less difficult univariate one.

"the usual assumption in such an experiment is that the random error is normally distributed and that the absorbance variable is also normally distributed,"
Yeah, we got to start somewhere!

"That is the statement I am questioning, however all literature states so!!!!"
Good! Great!

"Io/I what we measure directly is obviously normally distributed."
Well, it is not obvious to me.

And I don't understand the objection. Is it empirically obvious from the data you have seen that it is normally distributed? Or is it for theoretical reasons, for example something like the central limit theorem?

If it is a matter of how the physics/chemistry/biology works (the empirical issue), then one can start with something that seems reasonable, like the normal distribution. If that does not fit, then try something different.

Anyway, if your question is what the distribution of the logarithm of a normally distributed variable is, I hand it back to BGM and the others.

#### rogojel

##### TS Contributor
hi (jó reggelt, good morning)
thinking about it, I guess that a Weibull with a large beta could model the data well. The beta should be an increasing function of the original std dev. I have no idea whether this could be mathematically derived, and it might not fit the kurtosis well, but it is probably worth a try.

regards
rogojel

#### etelebaxi

##### New Member
1. Yes, the question is: what type of distribution does the logarithm of a normally distributed variable have, and also what is the mean of its values? Rogojel has given a link where the equation is described; it has "no name" and is not a simple one.

About the supposition that Io/I is normally distributed: you have got me! I started from the premise that DIRECT measurements (absorption is not one) are usually (sic!) normally distributed...

My observation is that our measurements (concentrations) are not normally distributed, although the literature states so. We determine the concentration of a lot of substances, and for one of them, uric acid, from 54 data points (without changing calibration or reagents) I got a bimodal distribution, with only 12 percent of the data between -0.5 SD and +0.5 SD... The probability that this is just chance is less than 4%.

There are a lot of reasons for it, and I have found some of them. The theoretical argument for a non-Gaussian distribution would support the statement as another reason. But I need some help with the mathematics... In a scientific work you cannot say: I tried a simulation and it seemed to me that the median and the mean are not the same (sic!)

It is a problem related to QC.

Thanks for drawing my attention to the wrong premise.

#### etelebaxi

##### New Member
I need to do some research to understand this; I may return later.

#### etelebaxi

##### New Member
The Weibull distribution does not fit the kurtosis well, in my opinion, and therefore there are differences outside ±SD. I managed to calculate the mode, $$\log\left(\frac{\mu+\sqrt{\mu^2+4\sigma^2}}{2}\right)$$, by calculating the derivative of the density and setting it equal to 0 (it is the maximum point of the curve). The mean of the absorption is the logarithm of the geometric mean of Io/I. Thanks once more, all of you helped me a lot.
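
The mode can be checked numerically. For X ~ N(μ, σ²) and Y = ln X, the log-density of Y is y − (e^y − μ)²/(2σ²) plus a constant. Setting its derivative to zero gives e^{2y} − μe^y − σ² = 0, whose positive root is e^y = (μ + √(μ² + 4σ²))/2. The sketch below compares that closed form with a brute-force grid search; the grid settings and example parameters are arbitrary choices.

```python
import math

def log_density(y, mu, sigma):
    """Log-density of Y = ln X, X ~ N(mu, sigma^2), up to an additive constant."""
    return y - (math.exp(y) - mu) ** 2 / (2.0 * sigma ** 2)

def mode_closed_form(mu, sigma):
    """Positive root of e^(2y) - mu*e^y - sigma^2 = 0, on the log scale."""
    return math.log((mu + math.sqrt(mu ** 2 + 4.0 * sigma ** 2)) / 2.0)

def mode_grid(mu, sigma, half_width=1.0, steps=20_000):
    """Brute-force argmax of the log-density on a grid centred at ln(mu)."""
    centre = math.log(mu)
    ys = [centre - half_width + 2.0 * half_width * k / steps for k in range(steps + 1)]
    return max(ys, key=lambda y: log_density(y, mu, sigma))

for mu, sigma in ((100.0, 10.0), (100.0, 20.0)):
    print(mu, sigma, mode_closed_form(mu, sigma), mode_grid(mu, sigma), math.log(mu))
```

The same value of X at the mode is obtained for the base-10 logarithm, so only the final log changes base. The mode lies slightly above the log of the original mean, which fits a left-skewed distribution.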

#### GretaGarbo

##### Human
If you (etelebaxi) want to find a distribution that might fit your empirical data, then I suggest that you show us some histograms of the actual data.

And often one model does not fit all data and all substances.

What software are you using?

(It could also be that you have data that are censored, that is, data that are below a detection limit.)