# gregexpr negative numbers and decimals

#### TheEcologist

##### Global Moderator
Dear Bots, Raptors and Humans,

I've got a little issue, I'm extracting numbers from strings with the following code:

Code:
textstring <- c("the limits are -100 to 100")
match <- gregexpr('[0-9]+', textstring)
limits <- regmatches(textstring, match)
This works fine; however, it does not retain negative numbers.
I've tried the following Perl-esque adaptation: ^-?[0-9]\\d*(.\\d+)?\$
which should also capture decimals, but I can't get it to work, so clearly I'm doing something wrong. But what? How do I get those minus signs in the correct position?
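
A likely reason the anchored attempt finds nothing: `^` and `$` only match when the whole string is a single number, so a number embedded in a sentence can never match. The sketch below assumes the pattern from the post with two small changes that are not in the original: the dot is escaped so it only matches a literal decimal point, and `\$` is written as a plain `$`, since `\$` is not a valid escape inside an R string.

```r
x <- "the limits are -100 to 100"

# Anchored: the entire string must be one number
anchored <- "^-?[0-9]\\d*(\\.\\d+)?$"
grepl(anchored, "-100")   # TRUE
grepl(anchored, x)        # FALSE

# Same pattern without the anchors: numbers inside the sentence are found
unanchored <- "-?[0-9]\\d*(\\.\\d+)?"
found <- regmatches(x, gregexpr(unanchored, x))[[1]]
found   # "-100" "100"
```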

@trinker, I'm sure this will be a piece of cake for you.

Thanks,

TE

#### TheEcologist

##### Global Moderator
By the way, using '.[0-9]+' does not solve the problem: the leading '.' matches any character, so it also picks up non-numeric characters, e.g.:
Code:
 textstring<-c("CI (-100,100)")
match<- gregexpr('.[0-9]+',textstring)
limits <- regmatches(textstring,match)
[[1]]
[1] "-100" ",100"

#### Lazar

##### Phineas Packard
Do all decimals have a leading zero? If so this horrible piece of crap might work:
Code:
textstring<-c("the limits are -100 to 100 also 0.1")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]*',textstring)
regmatches(textstring,match)
EDIT: This would work even without leading zeros
Code:
match<- gregexpr('(-|)[0-9]*\\.*[0-9]+',textstring)

#### TheEcologist

##### Global Moderator
No, they don't all have a leading zero. I tried this piece of crap: "-?\\d*[0-9]\\d+"

Code:
textstring<-c("the limits are -100 to 100 also 0.1")
match<- gregexpr("-?\\d*[0-9]\\d+" ,textstring)
regmatches(textstring,match)
This gives me the first two numbers but not the third (the 0.1 you added), which I also want to capture :/

Getting closer though..

#### Lazar

##### Phineas Packard
See edit:
Code:
> textstring<-c("the limits are -100 to 100 also 50.001 and -0.2 also .3 and -.4.")
> match<- gregexpr('(-|)[0-9]*\\.*[0-9]+',textstring)
> regmatches(textstring,match)
[[1]]
[1] "-100"   "100"    "50.001" "-0.2"   ".3"     "-.4"

#### TheEcologist

##### Global Moderator
As far as I can tell your code works without there needing to be a leading zero:

Code:
textstring<-c("the limits are -100 to 100 also 0.1 and also 1.1")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]*',textstring)
regmatches(textstring,match)
## [1] "-100" "100"  "0.1"  "1.1"
Sweet!

Thanks Lazar

#### TheEcologist

##### Global Moderator
Aah we replied at the same time.. yes it works fine!

#### Lazar

##### Phineas Packard
I changed around the + * in my last post. I just wanted to check you saw as my first will match some things you do not want.
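
To make the effect of swapping `+` and `*` concrete, here is a small comparison of the two patterns from the posts above. The test string is made up for illustration, and which matches count as "things you do not want" is only one possible reading of the remark.

```r
x <- "values .3 and -.4 and 100."

first_try <- "(-|)[0-9]+\\.*[0-9]*"   # at least one digit required BEFORE the dot
edited    <- "(-|)[0-9]*\\.*[0-9]+"   # at least one digit required AFTER the dot

a <- regmatches(x, gregexpr(first_try, x))[[1]]
b <- regmatches(x, gregexpr(edited, x))[[1]]

a   # "3" "4" "100."    : loses the sign and dot of .3 / -.4, keeps the full stop
b   # ".3" "-.4" "100"  : keeps sign and dot, drops the trailing full stop
```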

#### TheEcologist

##### Global Moderator
It works fine, thanks... it was meant for extracting the priors automatically out of a BUGS/JAGS model file and plotting them against the posterior... which should now work perfectly!

#### TheEcologist

##### Global Moderator
I rejoiced too quickly; some instances still fail. This one misses the single-digit numbers in (0,1):

Code:
textstring<-c("the limits are -100 to 100 also 0.1 and also 1.1 and (0,1)")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]+',textstring)
regmatches(textstring,match)
But this one does not fail:

Code:
textstring<-c("the limits are -100 to 100 also 0.1 and also 1.1 and (0,1)")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]*',textstring)
regmatches(textstring,match)
Just a follow-up for clarity!

TE

#### Dason

Blather. I had posted something before heading out but it's not here. My general outline was

1) strsplit on " ". This will break into words and retain a negative if present.
2) use grep (or whatever you want) to only grab matches that contain a number. This is an easy pattern "[0-9]+"
3) use grep to prune out stuff you don't want (for instance if it contains "(" or ")" )

It's more steps than writing one regex to rule them all but I think the logic and code are easier to follow and thus will be easier to maintain in the future.
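
The three steps above could be sketched roughly as follows. This is only one way to read the outline; the variable names (`words`, `candidates`, `kept`) are made up for illustration.

```r
textstring <- "the limits are -100 to 100 also 0.1 and also 1.1"

# 1) split on spaces; a leading minus stays attached to its number
words <- strsplit(textstring, " ", fixed = TRUE)[[1]]

# 2) keep only the words that contain a digit
candidates <- grep("[0-9]+", words, value = TRUE)

# 3) prune words that contain "(" or ")"
kept <- grep("[()]", candidates, value = TRUE, invert = TRUE)

limits <- as.numeric(kept)
limits   # -100.0  100.0  0.1  1.1
```

Note that step 3 drops a token like "(0,1)" entirely, and a token with trailing punctuation such as "1e+7," would turn into NA in `as.numeric()`, so real input may need an extra clean-up step.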

#### TheEcologist

##### Global Moderator

Anyway, the plot thickens: the regex now fails on numbers in scientific notation, like 1E7 or 1e-7.

NUTS..

I'll think about how to solve this and post back if I find something.

Code:
textstring<-c("the limits are -100 to 100 also 0.1 and also 1.1 and (0,1) but also 1e+7, 1e-7 and 1E7 ")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]*',textstring)
regmatches(textstring,match)

#### TheEcologist

##### Global Moderator
Got a solution.

Code:
## This works! One regex to rule them all?

match<- gregexpr('(-|)[0-9]+\\.*[0-9]*|*\\d+([e|E][+\\-]*\\d*)',textstring)
However, Dason's words are wise: simpler code is often better. Always anticipate your future stupidity. In a year's time, I will likely no longer be able to read the logic in the above...
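
For future readability, one option is to build the pattern from commented pieces. The sketch below is not the solution posted above but an alternative under stated assumptions: it uses `perl = TRUE` (for the non-capturing `(?:...)` groups), and the name `number_pattern` is made up for illustration.

```r
textstring <- "the limits are -100 to 100 also 0.1 and also 1.1 and (0,1) but also 1e+7, 1e-7 and 1E7 "

number_pattern <- paste0(
  "-?",                                # optional minus sign
  "(?:[0-9]+\\.?[0-9]*|\\.[0-9]+)",    # 100, 0.1, 1. or .3
  "(?:[eE][+-]?[0-9]+)?"               # optional exponent: e+7, e-7, E7
)

found <- regmatches(textstring,
                    gregexpr(number_pattern, textstring, perl = TRUE))[[1]]
found
# "-100" "100" "0.1" "1.1" "0" "1" "1e+7" "1e-7" "1E7"

as.numeric(found)   # all nine strings convert without NA
```

A minus sign is always treated as part of the number here, so a range written as "10-20" would come back as "10" and "-20".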