Constrain a value to be between -1 and 1

trinker

ggplot2orBust
#1
I asked this question a while back and never found an answer.

trinker said:
How can I constrain a number to be between -1 and 1?
After reading about the logit link in my multilevel book, I realized the answer:

Code:
((1 - (1/(1 + exp(x)))) * 2) - 1
This should take any real number in (-Inf, Inf) and transform it into the interval between -1 and 1, I think. I came up with this myself (I'm sure others have already), so it may not be right.
 

Dason

Ambassador to the humans
#2
Your method works.

There are lots of ways to make a function that maps the reals to the interval [-1, 1] (or (-1, 1), depending on what you actually want). If you have any CDF, call it F(x), then 2*F(x) - 1 will map into [-1, 1]. Your method corresponds to using the CDF of the logistic distribution as F in this transformation.
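Here is a small R sketch of the 2*F(x) - 1 idea, using base R's logistic (`plogis`) and normal (`pnorm`) CDFs. It also checks numerically that trinker's formula from the first post matches `2 * plogis(x) - 1`. The helper name `cdf_to_pm1` and the input values are made up for illustration.

```r
# 2*F(x) - 1 maps the reals into (-1, 1) for any continuous CDF F with full support.
# cdf_to_pm1 is a hypothetical helper name, used only for illustration.
cdf_to_pm1 <- function(x, cdf = plogis) 2 * cdf(x) - 1

x <- c(-10, -2, -0.5, 0, 0.5, 2, 10)

# trinker's formula from post #1
trinker <- ((1 - (1 / (1 + exp(x)))) * 2) - 1

# Logistic CDF version: should agree with trinker's formula up to rounding
logistic_version <- cdf_to_pm1(x, plogis)
all.equal(trinker, logistic_version)

# Standard normal CDF version: same idea, different shape.
# (In double precision pnorm(10) rounds to 1, so the endpoints can be hit.)
normal_version <- cdf_to_pm1(x, pnorm)
range(normal_version)
```

For what it's worth, `2 * plogis(x) - 1` is algebraically the same as `tanh(x / 2)`, so `tanh` is another way to write the transformation from the first post.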

Another concrete example: 2*arctan(x)/pi also maps to (-1, 1). This is the same transformation applied with the CDF of the standard Cauchy distribution.
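A quick numerical check of the Cauchy case, assuming the standard Cauchy (location 0, scale 1, which are `pcauchy`'s defaults). The input values are arbitrary.

```r
# Cauchy version: 2*F(x) - 1 with the standard Cauchy CDF,
# compared with the closed form 2*atan(x)/pi.
x <- c(-1000, -1, 0, 1, 1000)

cauchy_version <- 2 * pcauchy(x) - 1
atan_version   <- 2 * atan(x) / pi

all.equal(cauchy_version, atan_version)  # the two should agree up to rounding
```

Because the Cauchy has much heavier tails than the logistic, this version approaches ±1 far more slowly, so large raw scores get squashed less (here an input of 1000 still maps to about 0.9994).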
 
#3
dang, beat me to it ... yeah, what Dason said


does it have to be smooth? why do you need truncating? you can simply truncate everything < -1 to -1 and everything > 1 to 1, for example. Or do you want to preserve some features of the numbers, like the relative distances between them? if so, you can essentially center and scale them.
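The two options above can be sketched in R as follows. Clipping uses base `pmin`/`pmax`; for centering and scaling, subtracting the mean and dividing by the largest absolute deviation is just one assumed choice that keeps relative distances and lands the values in [-1, 1]. The example values are made up.

```r
x <- c(-3.2, -0.4, 0.1, 0.9, 2.5)

# Option 1: hard truncation (clipping); values already inside [-1, 1] are untouched
clipped <- pmax(-1, pmin(1, x))

# Option 2: center at the mean, then divide by the largest absolute deviation.
# This is a linear map, so relative distances between the values are kept.
centered <- x - mean(x)
scaled   <- centered / max(abs(centered))

clipped
scaled
```

Note that with this choice only the most extreme value lands exactly on -1 or 1; the other end of the range generally does not.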
 

Dason

#5
This seems like the best solution:
Code:
bestRescale <- function(x) runif(length(x), min=-1, max=1)
A noble attempt, but it should be reproducible so that you get the same output for the same input. Here, I fixed it for you:
Code:
bestRescale <- function(x){set.seed(1); runif(length(x), min = -1, max = 1)}
 

Dason

#7
What is this for, exactly? If you have a specific motivation, then just choosing any ol' function that meets these criteria probably isn't the best route to go, unless ordering is all that matters.
 
#10
sentiment analysis? cool, seems interesting. I saw that somewhere before, maybe on Kaggle.

I don't know the ins and outs of polarity analysis, but since this is a score, it seems you'd want to simply rescale it to [-1, 1], e.g.

Code:
m=-50 # min
M=100 # max
set.seed(1) # Dason
x=runif(10000,m,M) # example x's
summary(2*(x-m)/(M-m)-1) # [-1,1]
\(
f(x) = 2\frac{x-m}{M-m}-1
\)

If there's a theoretical min and max for the original unscaled score, then use those for m and M. If the original score is unbounded in both directions, use the observed min and max. Unless there's something particular about this type of analysis, I wouldn't use a sigmoid or other curve, since, e.g., the logistic will "bunch" the numbers close to the extremes. Since it's a score, it seems important to keep their relative positions intact.
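A minimal sketch of that advice as an R function. `rescale_pm1` is a hypothetical name; it defaults to the observed min and max, and theoretical bounds (assumed here to be -100 and 100, purely for illustration) can be passed in instead.

```r
# Linear rescaling to [-1, 1]: f(x) = 2 * (x - m) / (M - m) - 1.
# Defaults to the observed min and max; pass m and M for theoretical bounds.
rescale_pm1 <- function(x, m = min(x), M = max(x)) {
  if (M <= m) stop("need M > m")
  2 * (x - m) / (M - m) - 1
}

scores <- c(-50, -10, 0, 25, 100)

# Observed bounds (these made-up scores span -50 to 100)
rescale_pm1(scores)

# Assumed theoretical bounds of -100 and 100
rescale_pm1(scores, m = -100, M = 100)
```

One caveat with theoretical or externally chosen bounds: any score that falls outside [m, M] will map outside [-1, 1], so the bounds have to really be bounds.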
 

Dason

#11
I guess I don't see the point in doing any of the previous transformations where you just use a function to map the reals to [-1, 1]. That doesn't really make comparisons any more meaningful. I don't know enough about the measure itself, but I have a feeling that if you want to compare scores between samples, you really need to incorporate more information from the sample into the transformation instead of just mapping values from the reals to [-1, 1].
 
#12
I agree. I couldn't say what the current literature says about these polarity scores, or what others have done to compare b/t different data. Any function that maps from the whole real line will not keep relative positions intact; on the other hand, rescaling to [-1, 1] as I did above may or may not be what you need if the min and max are different b/t data sources (not sure that matters, but it might; I just don't know enough about the context).
 

Dason

#13
Any function that maps from the whole real line will not keep relative positions intact
What do you mean by this exactly? If we have a function F that is strictly increasing, then a < b implies F(a) < F(b), so the ordering of the transformed scores will be the same as the ordering of the raw scores.
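A tiny R check of that argument, using the logistic version from the first post as the increasing function and some arbitrary scores.

```r
# A strictly increasing map keeps the ordering of the scores.
x  <- c(3, -7, 0.5, 12, -1)
fx <- 2 * plogis(x) - 1   # logistic version from post #1

identical(order(x), order(fx))  # same ranking before and after
```

One floating-point caveat: for very large |x|, `2 * plogis(x) - 1` rounds to exactly ±1, so distinct extreme scores can become tied. Mathematically the map is strictly increasing, but numerically the ordering is only guaranteed not to be reversed.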
 
#14
I meant the relative distances between the points, not just the ordering

like this:
we want to preserve (I'm assuming) the relative distance between scores
\(
\frac{|x_1-x_2|}{|x_2-x_3|} = \frac{|f(x_1)-f(x_2)|}{|f(x_2)-f(x_3)|}
\)

where f is the transformation
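To make the ratio concrete, here is an R sketch that computes |x1 - x2| / |x2 - x3| before and after a transformation. `gap_ratio`, `linear`, and `logistic` are illustrative names; `linear` is the min/max rescaling from earlier with m = -50 and M = 100, and `logistic` is the transformation from the first post.

```r
# Ratio of gaps |x1 - x2| / |x2 - x3| after applying a transformation f.
gap_ratio <- function(x1, x2, x3, f = identity) {
  abs(f(x1) - f(x2)) / abs(f(x2) - f(x3))
}

linear   <- function(x) 2 * (x + 50) / 150 - 1   # rescaling with m = -50, M = 100
logistic <- function(x) 2 * plogis(x) - 1        # sigmoid from post #1

gap_ratio(0, 10, 20)             # raw scores: equal gaps, ratio 1
gap_ratio(0, 10, 20, linear)     # linear rescaling keeps the ratio
gap_ratio(0, 10, 20, logistic)   # logistic does not
```

Any linear (affine) rescaling leaves the ratio unchanged because the common slope cancels. The logistic keeps most of the gap near 0 but compresses the gap out in the tail, so here the ratio jumps from 1 to about 11,000.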
 

trinker

#15
Preserving the relative distance between scores would be nice. I hadn't thought of that. My approach maintains order and direction, but not relative distance. But the end game, as Dason said, was to compare across samples.
 
#16
yeah, that's what I meant above when asking what features you need to preserve, including, as you/we stated: order, direction, relative distance ... others?

and I'm not sure which/any of these are important for your application; that might be one reason it was difficult to get an answer last time you asked (not sure since I didn't see that thread)

when I said above that Dason beat me to it, I was also going to suggest a sigmoidal curve, so I agree it might be a good choice. It's just that later I thought it might cause a lot of your transformed scores to be close to -1 and 1, depending on how you parameterize the logistic (or whatever S-shaped curve you use) and where the original values are.
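A sketch of how the parameterization matters, using base R's `plogis(q, location, scale)`. Centering at the median and taking the scale from `mad()` is only an assumed, illustrative choice, and the scores are made up.

```r
# How much a logistic squashes depends on its location and scale.
scores <- c(-40, -15, -5, 0, 5, 20, 60)

# Default logistic: location = 0, scale = 1
default_fit <- 2 * plogis(scores) - 1

# Assumed data-driven parameters: median as location, MAD as scale
wider_fit <- 2 * plogis(scores,
                        location = median(scores),
                        scale = mad(scores)) - 1

round(default_fit, 3)  # most values end up at or near -1 or 1
round(wider_fit, 3)    # values spread more evenly across (-1, 1)
```

With the default scale of 1, six of the seven made-up scores land within 0.05 of ±1; with the data-driven scale they spread out across the interval. Either way the curve is still nonlinear, so relative distances are not preserved.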
 
#17
oh, I just thought ...

if the different data sets have different min and max scores, and that might cause a problem because the "m" and "M" in my previous formula would then differ between data sets, then how about using the overall min as m and the overall max as M? that might work ... again, though, I think only you can say, since you're the one who knows the details intimately
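A minimal R sketch of the pooled idea, with two simulated samples standing in for different data sets (the sample sizes and ranges are arbitrary assumptions). `rescale_shared` is a hypothetical helper name.

```r
# Rescale several data sets with one shared (pooled) min and max, so that
# the same raw score maps to the same value in every data set.
set.seed(1)
sample_a <- runif(100, -50, 100)   # simulated stand-in for data set A
sample_b <- runif(100, -20, 60)    # simulated stand-in for data set B

m <- min(c(sample_a, sample_b))    # overall min
M <- max(c(sample_a, sample_b))    # overall max

rescale_shared <- function(x) 2 * (x - m) / (M - m) - 1

a_scaled <- rescale_shared(sample_a)
b_scaled <- rescale_shared(sample_b)

range(c(a_scaled, b_scaled))       # the pooled data spans [-1, 1]
```

Because both samples share the same m and M, a given raw score maps to the same value in either one. The catch is that a future data set with scores outside the pooled range would map outside [-1, 1], and m and M would have to be recomputed.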