trinker said:

How can I constrain a number to be between -1 and 1

Code:

`((1 - (1/(1 + exp(x)))) * 2) - 1`

There are lots of ways to make a function that maps from the reals to the interval [-1, 1] (or possibly (-1, 1), depending on what you actually want). If you have any CDF (call it F(x)), then 2*F(x)-1 will map to [-1, 1]. Your method corresponds to using the CDF of the logistic distribution as the CDF in this transformation.
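
To make the 2*F(x)-1 idea concrete, here is a small sketch in R. The helper names `op_transform` and `cdf_to_pm1` are made up for illustration; `plogis` is base R's standard logistic CDF.

```r
# The original poster's expression, wrapped in a function (name is hypothetical)
op_transform <- function(x) ((1 - (1 / (1 + exp(x)))) * 2) - 1

# Generic version: any CDF F gives the map 2 * F(x) - 1 (name is hypothetical)
cdf_to_pm1 <- function(x, cdf) 2 * cdf(x) - 1

x <- seq(-5, 5, by = 0.5)
via_logistic <- cdf_to_pm1(x, plogis)  # plogis: standard logistic CDF

# The two agree up to floating point, and both equal tanh(x / 2)
all.equal(op_transform(x), via_logistic)
all.equal(via_logistic, tanh(x / 2))
range(via_logistic)
```

Swapping `plogis` for another CDF such as `pnorm` gives a different curve over the same target interval.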

Another concrete example: arctan(x)*2/pi also maps to (-1, 1). This is the previous transformation applied with the CDF of the standard Cauchy distribution.
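
A quick check of that equivalence in R, using base R's `pcauchy` for the standard Cauchy CDF (the test values are arbitrary):

```r
x <- c(-1000, -10, -1, 0, 1, 10, 1000)

via_atan   <- atan(x) * 2 / pi     # the arctan form
via_cauchy <- 2 * pcauchy(x) - 1   # 2 * F(x) - 1 with the standard Cauchy CDF

all.equal(via_atan, via_cauchy)    # agree up to floating point
range(via_atan)                    # strictly inside (-1, 1) for these inputs
```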

Does it have to be smooth? Why do you need truncating? You can simply truncate everything < -1 to be -1 and everything > 1 to be 1, for example. Or do you want to preserve some features of the numbers, like the relative distance between them? Then you can essentially center and scale them.
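
If plain truncation is enough, one way to sketch it in R is with `pmin` and `pmax`. The function name `clamp_pm1` and the sample values are made up for illustration.

```r
# Clamp values to [-1, 1]: anything below -1 becomes -1, anything above 1 becomes 1
clamp_pm1 <- function(x) pmin(pmax(x, -1), 1)

x <- c(-3.2, -1, -0.4, 0, 0.7, 1, 5)
clamp_pm1(x)
# -1.0 -1.0 -0.4  0.0  0.7  1.0  1.0
```

Note that this throws away all information about how far outside the interval a value was.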

I don't know the ins and outs of polarity analysis, but it seems that since this is a score you'd want to simply rescale to [-1, 1], e.g.

Code:

```
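m <- -50                            # min of the original score
M <- 100                            # max of the original score
set.seed(1)                         # fixed seed so the example is reproducible
x <- runif(10000, m, M)             # example x's
summary(2 * (x - m) / (M - m) - 1)  # rescaled values lie in [-1, 1]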
```

\(f(x) = 2\frac{x-m}{M-m}-1\)

If there's a theoretical min and max to the original un-scaled score then use those for m and M. If the original score is unbounded in both directions, use the observed min and max. Unless there's something about this type of analysis in particular, I wouldn't use a sigmoid or other curve, since, e.g., logistic will "bunch" the numbers close to the extremes -- that is, since it's a score it seems it'll be important to keep their relative positions intact.
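
As a rough sketch of that choice, the rescaling can be wrapped in a function whose bounds default to the observed min and max but can be overridden with theoretical ones. The name `rescale_pm1`, the sample scores, and the bounds of -100 and 100 are all made up for illustration.

```r
# Linear rescale to [-1, 1]; m and M default to the observed min and max
rescale_pm1 <- function(x, m = min(x), M = max(x)) {
  stopifnot(M > m)  # guard against division by zero
  2 * (x - m) / (M - m) - 1
}

scores <- c(-50, -20, 0, 25, 100)

rescale_pm1(scores)                     # observed bounds: -50 and 100
rescale_pm1(scores, m = -100, M = 100)  # assumed theoretical bounds
```

Because the map is linear, the spacing between scores is preserved up to a constant factor, which is the point of preferring it over a sigmoid here.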

and I'm not sure which/any of these are important for your application; that might be one reason it was difficult to get an answer last time you asked (not sure since I didn't see that thread)

When I said above that Dason beat me to it, I was also going to suggest a sigmoidal curve, so I agree it might be a good choice. It's just that later I thought it might cause a lot of your transformed scores to be close to -1 and 1, depending on how you parameterize the logistic (or whatever S-shaped curve you use) and where the original values are.
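
To get a feel for how much the parameterization matters, here is a small sketch. The `squash` helper and its `scale` argument are made up, and the uniform data on [-50, 100] is only an example.

```r
set.seed(1)
x <- runif(10000, -50, 100)  # example scores, not real data

# Logistic-based map to (-1, 1) with an adjustable scale (hypothetical helper)
squash <- function(x, scale = 1) 2 * plogis(x / scale) - 1

# Share of transformed scores with absolute value above 0.99
mean(abs(squash(x, scale = 1)) > 0.99)   # most values pile up near -1 or 1
mean(abs(squash(x, scale = 50)) > 0.99)  # much less bunching with a wider scale
```

With `scale = 1` only scores within about 5.3 of zero avoid the extremes, so the choice of scale has to match the spread of the original values.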

If the different data sets have different min and max scores, and it's thought that that might cause a problem because the "m" and "M" in my previous formula will then be different between data sets, then how about using the overall min as m and the overall max as M? That might work ... again, though, I think only you can say, since you're the one intimately knowledgeable about the details.
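
A minimal sketch of the pooled version, assuming the data sets sit in a named list (the three small vectors are made up):

```r
# Hypothetical data sets with different ranges
datasets <- list(a = c(-3, 0, 4), b = c(-10, 2, 6), c = c(1, 8, 20))

# Overall min and max across all data sets
m <- min(unlist(datasets))
M <- max(unlist(datasets))

# Apply the same linear rescale to every data set
scaled <- lapply(datasets, function(x) 2 * (x - m) / (M - m) - 1)
scaled
```

Every data set then shares one scale, so a given original score maps to the same transformed value wherever it occurs, though an individual data set may not reach -1 or 1.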