log(0) should have a real value!!!

#1
Dear all,

I’ve been stuck for quite some time on a problem that relates to using a log scale with values equal to zero.

My data come from an experiment in which five different instruments (one of them using the reference method) each take two measurements of a given sample’s signal emission. A total of fifteen samples are processed. I would like to copy an approach widely used in the literature I have reviewed, consisting of the following steps:
- perform a log transformation of the average of the two reads (taken for every sample by each instrument)
- perform a linear regression
- perform an ANOVA

A grave issue arises when readings for some of the instruments drop to zero. To make a long story short, some instruments alter the sample’s capacity to generate a signal, and the signal drops below the resolution of the instrument (this doesn’t happen with all the instruments, let alone the gold standard). This is actually quite a finding! However, I’m left with no apparent way to compare my results with the available literature.
I can’t perform any analysis of reads with zero values if I use a log transformation.
My thinking is that, rigorously speaking, the “0 results” (let’s call these “x”) are NOT equal to 0. In fact, they are smaller than the instrument’s limit of resolution (res), hence “x < (res)”, a result that seems fairly sound to me, and much more rigorous than just “0”.

The question is whether I could, for the sake of analysing the data, do the following:
- Set all values of 0 equal to the limit of resolution (res)
- Perform the linear regression analysis
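
To make the two steps above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the reads are made-up numbers (not my data), RES is a hypothetical resolution limit, and I chose to replace each individual read below RES before averaging, which is only one of several possible readings of "set all values of 0 equal to res".

```python
import math

def floor_read(read, res):
    """Replace a read below the resolution limit (including 0) with res."""
    return max(read, res)

def log_mean(read_a, read_b, res):
    """log10 of the average of two reads, after flooring each read at res."""
    return math.log10((floor_read(read_a, res) + floor_read(read_b, res)) / 2.0)

def fit_line(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

RES = 0.5  # hypothetical resolution limit, in the same units as the reads

# Made-up duplicate reads for five samples (illustration only).
reference_reads = [(120.0, 130.0), (40.0, 44.0), (9.0, 11.0), (2.0, 2.4), (1.0, 1.2)]
test_reads = [(100.0, 110.0), (30.0, 34.0), (5.0, 7.0), (0.0, 0.0), (0.0, 0.0)]

x = [log_mean(a, b, RES) for a, b in reference_reads]
y = [log_mean(a, b, RES) for a, b in test_reads]

slope, intercept, r = fit_line(x, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

Note that the two samples read as 0 end up with exactly the same log value, log10(RES), so the fit depends directly on whatever value is chosen for RES.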

Now, if the linear regression yielded results that were acceptable according to predefined criteria, I could say little more than that, hypothetically, if the “0” results equalled res, a good linear correlation would exist, which is pretty much useless. Yet here comes my doubt: if the linear correlation were bad, could I soundly argue that “for the existing results, the linear correlation can be considered AT LEAST as bad as the one obtained”? And can I use this sort of approach in any way to perform an ANOVA or a similar test you might suggest?

Thank you in advance for any advice!!!
 

Miner

TS Contributor
#2
You actually have two problems here. One is a Limit of Detection issue. I am not an expert in this topic, but I know there are a number of different approaches that can be taken, and there are many arguments for and against each.

Regarding your problem with log(0), I have changed the 0 values to a very small positive number such as 0.0000001. This allows the log transformation and has minimal impact on your analysis, particularly since you already know that the results are not actually 0. Note: some of the Limit of Detection methods will also resolve the log(0) issue.
 

Miner

TS Contributor
#4
A lot would depend on the range of your data. If the adjustment is a large percentage of the total range of the data, you would have a valid concern. However, you would then have an even larger issue to deal with: the Limit of Detection issue. In the cases where I have used the adjustment, it was an extremely small percentage of the total data range.
 
#5
My thinking is that, rigorously speaking, the “0 results” (let’s call these “x”) are NOT equal to 0. In fact, they are smaller than the instrument’s limit of resolution (res), hence “x < (res)”, a result that seems fairly sound to me, and much more rigorous than just “0”.
I agree.

When a value is below (or above) a detection limit it is said to be censored.

Have a look at this one.

Helsel and others have written about censored data. He has written an R package ("NADA") and a textbook (search for it). Look at this introductory text. (This could be a good read for many here at talkstats.)

Edit: ROS means regression on order statistics.

Also a longer but older text is given here.

- - -

Frankly, I don't understand the method of replacing the censored value with "0.0000001".
If you have these data, where the first is a substituted value:

(0.0000001 , 0.25, 0.5, 1, 1.5, 2, 4 )

then, if you take the base-10 logarithm, you get (rounded):

(-7.00, -0.60, -0.30, 0.00, 0.18, 0.30, 0.60)

So the first value, the substituted one, would appear as an outlier.
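
The numbers above can be reproduced with a few lines of Python. This is only a check of the arithmetic; the comparison of the gap with the spread of the other values is my own illustration of why the first point looks like an outlier.

```python
import math

# Example data from above; the first entry is the substituted value.
values = [0.0000001, 0.25, 0.5, 1, 1.5, 2, 4]

logs = [math.log10(v) for v in values]
rounded = [round(v, 2) for v in logs]
print(rounded)  # [-7.0, -0.6, -0.3, 0.0, 0.18, 0.3, 0.6]

# Distance from the substituted point to the smallest real value,
# compared with the spread of all the real values, in log10 units.
gap = logs[1] - logs[0]
spread = logs[-1] - logs[1]
print(f"gap={gap:.2f} spread of real values={spread:.2f}")
```

Each factor of 10 in the substituted constant moves that point by one full unit on the log scale, so the choice of constant is far from harmless after the transformation.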

Helsel has been quite harsh in his writing about substitution methods. (Actually, it makes for quite funny reading.) He wants to reject all papers that use substitution methods. I would not go that far.
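
To give a feeling for what regression on order statistics does, here is a heavily simplified sketch in Python. It is not the NADA implementation: I assume a single detection limit, that every censored value lies below all detected values, that the data are roughly lognormal, and I use Blom plotting positions, which is my own choice. The function name simple_ros is hypothetical. The data are the example values above, with the first one treated as censored instead of substituted.

```python
import math
from statistics import NormalDist

def simple_ros(detected, n_censored):
    """Impute n_censored values assumed to lie below every detected value.

    Fits log10(detected) against normal scores and extrapolates the fitted
    line to the normal scores of the lowest (censored) ranks.
    """
    detected = sorted(detected)
    n = len(detected) + n_censored

    # Normal scores from Blom plotting positions for ranks 1..n.
    # The censored observations occupy the lowest ranks.
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((rank - 0.375) / (n + 0.25)) for rank in range(1, n + 1)]
    z_cens, z_det = z[:n_censored], z[n_censored:]

    # Ordinary least squares of log10(detected value) on normal score.
    y = [math.log10(v) for v in detected]
    mean_z = sum(z_det) / len(z_det)
    mean_y = sum(y) / len(y)
    sxy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z_det, y))
    sxx = sum((zi - mean_z) ** 2 for zi in z_det)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_z

    # Back-transform the predictions for the censored ranks.
    return [10 ** (intercept + slope * zi) for zi in z_cens]

detected = [0.25, 0.5, 1, 1.5, 2, 4]
imputed = simple_ros(detected, n_censored=1)
print(imputed)
```

The imputed value is only meant to be used together with the detected values when computing summary statistics; it is not an estimate of what that particular sample really measured.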