Comparing methods of standardising variables

#1
I have count data of the number of scattered photons detected at various wavelengths. The data is collected under different conditions, so it is normal practice to standardise the variables. There are three commonly used methods to do this (sketched in code below):

1.) divide by the sum of squares
2.) zero mean and unit variance
3.) unit variance
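
For concreteness, here is a minimal NumPy sketch of the three options as I understand them. The toy matrix and the axis conventions (rows = spectra, columns = wavelengths; method 1 applied per spectrum, methods 2 and 3 per wavelength) are my assumptions, so correct me if your field does it differently:

```python
import numpy as np

# Toy spectra matrix: rows = individual measurements, columns = wavelengths.
# (Poisson counts are stand-in data, not real measurements.)
rng = np.random.default_rng(0)
X = rng.poisson(lam=500, size=(10, 200)).astype(float)

# 1) Divide each spectrum (row) by its sum of squares. Some authors divide
#    by the square root of the sum of squares (the Euclidean norm) instead.
X_ss = X / (X ** 2).sum(axis=1, keepdims=True)

# 2) Autoscaling: zero mean and unit variance for each wavelength (column).
X_auto = (X - X.mean(axis=0)) / X.std(axis=0)

# 3) Unit variance only: scale each wavelength by its standard deviation,
#    leaving the mean untouched.
X_uv = X / X.std(axis=0)
```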

Does anyone have any insight and/or resources comparing the strengths/weaknesses of these (or other) methods?

Cheers.
 

maartenbuis

TS Contributor
#2
I am surprised that you want to standardize at all. Your variables are counts, so the units of your variables are perfectly comparable, regardless of the conditions under which they were collected.
 
#3
The experimental set-up means that the same peak at a given wavenumber could be measured as 700 photons or 70,000 photons, but under the same experimental conditions you would expect roughly the same photon count.

At the moment I'm just following what is being done in the established literature, but I want to explore why it is done. The data is typically analysed by PCA or some kind of regression.
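
To illustrate the scaling issue with a toy example (a made-up Gaussian peak, not real data): the same spectrum measured at two overall intensities differs only by a multiplicative factor, which a per-spectrum normalisation removes.

```python
import numpy as np

# Same underlying peak measured at two overall intensities
# (a made-up Gaussian shape, standing in for a real spectrum).
wavenumbers = np.linspace(0, 10, 200)
shape = np.exp(-((wavenumbers - 5.0) ** 2))
low = 700 * shape      # weak run: peak of ~700 counts
high = 70000 * shape   # strong run: peak of ~70,000 counts

# Dividing each spectrum by its Euclidean norm removes the overall
# intensity factor, so the two runs become identical.
print(np.allclose(low / np.linalg.norm(low),
                  high / np.linalg.norm(high)))  # True
```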
 

hlsmith

Omega Contributor
#4
Yeah, this must be a "field" thing, since I am also surprised by the notion of standardizing counts. As maartenbuis states, my impression is that a key reason for standardizing continuous data is to put the variables on a comparable scale, say, when you want to compare their effects.


Another reason is to ease computation by pulling some very large numbers down to a manageable scale.


Can you provide some example data, so we can better understand what you are working with?