gregexpr negative numbers and decimals

TheEcologist

Global Moderator
#1
Dear Bots, Raptors and Humans,

I've got a little issue: I'm extracting numbers from strings with the following code:

Code:
  textstring<-c("the limits are -100 to 100")
  match<- gregexpr('[0-9]+',textstring)
  limits <- regmatches(textstring,match)
This works fine; however, it does not retain negative numbers.
I've tried the following Perl-esque adaptation: ^-?[0-9]\\d*(.\\d+)?$
which should also capture decimals... but I can't get it to work, so clearly I'm doing something wrong. But what? How do I get those minus signs in the correct position?

@trinker, I'm sure this will be a piece of cake for you.

Thanks,

TE
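Two things keep the anchored attempt from matching anything inside a sentence: '^' and '$' pin the pattern to the start and end of the whole string, and an unescaped '.' matches any character rather than only a decimal point. A minimal sketch of both effects, assuming R's default regex engine; the "12a34" string is only an illustration:

```r
textstring <- "the limits are -100 to 100"

# ^ and $ anchor to the whole string, so a sentence containing numbers never matches
anchored <- regmatches(textstring, gregexpr("^-?[0-9]\\d*(.\\d+)?$", textstring))[[1]]

# Without anchors, and with the dot escaped so it only matches a literal "."
unanchored <- regmatches(textstring, gregexpr("-?[0-9]+(\\.[0-9]+)?", textstring))[[1]]

# An unescaped "." accepts any character, so "12a34" passes as a "decimal"
loose_dot  <- grepl("^-?[0-9]\\d*(.\\d+)?$", "12a34")
strict_dot <- grepl("^-?[0-9]\\d*(\\.\\d+)?$", "12a34")

anchored    # character(0)
unanchored  # "-100" "100"
loose_dot   # TRUE
strict_dot  # FALSE
```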
 

TheEcologist

Global Moderator
#2
By the way, using '.[0-9]+' does not solve the problem, as it will also return non-numbers, e.g.:
Code:
 textstring<-c("CI (-100,100)")
  match<- gregexpr('.[0-9]+',textstring)
  limits <- regmatches(textstring,match)
[[1]]
[1] "-100" ",100"
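In a regex, '.' means "any single character", so '.[0-9]+' picks up whatever sits in front of the digits, the comma included. Making only the minus sign optional with '-?' avoids that (this simpler pattern still ignores decimals). A short sketch on the same string:

```r
textstring <- "CI (-100,100)"

# "." matches any character, so the comma is swallowed along with the digits
any_char <- regmatches(textstring, gregexpr(".[0-9]+", textstring))[[1]]

# "-?" allows only an optional minus sign in front of the digits
opt_minus <- regmatches(textstring, gregexpr("-?[0-9]+", textstring))[[1]]

any_char   # "-100" ",100"
opt_minus  # "-100" "100"
```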
 

Lazar

Phineas Packard
#3
Do all decimals have a leading zero? If so, this horrible piece of crap might work:
Code:
textstring<-c("the limits are -100 to 100 also 0.1")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]*',textstring)
regmatches(textstring,match)
EDIT: This would work even without leading zeros:
Code:
match<- gregexpr('(-|)[0-9]*\\.*[0-9]+',textstring)
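For reference, '(-|)' ("a minus sign or nothing") is more commonly written '-?', and '\\.*' allows any number of dots, so a run like "1..2" comes back as one token. A sketch comparing the edited pattern with a variant that allows at most one dot ('\\.?'); 'extract' is a small hypothetical helper and the test strings are illustrative:

```r
thread_text <- "the limits are -100 to 100 also 50.001 and -0.2 also .3 and -.4."
dotted      <- "version 1..2"

# Hypothetical helper: all matches of a pattern in a single string
extract <- function(pattern, x) regmatches(x, gregexpr(pattern, x))[[1]]

original  <- "(-|)[0-9]*\\.*[0-9]+"  # edited pattern from the post above
tightened <- "-?[0-9]*\\.?[0-9]+"    # same idea, at most one decimal point

# Both give "-100" "100" "50.001" "-0.2" ".3" "-.4" on the thread's test string
identical(extract(original, thread_text), extract(tightened, thread_text))  # TRUE

extract(original, dotted)   # "1..2"
extract(tightened, dotted)  # "1" ".2"
```

Neither result for "1..2" is a valid number, but the tighter pattern at least never returns a token containing two dots.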
 

TheEcologist

Global Moderator
#4
Quote: Do all decimals have a leading zero?
No, they don't. I tried this piece of crap: "-?\\d*[0-9]\\d+"


Code:
textstring<-c("the limits are -100 to 100 also 0.1")
match<- gregexpr("-?\\d*[0-9]\\d+" ,textstring)
regmatches(textstring,match)
This gives me the first two but not the third (your addition), which I also want to capture :/

Getting closer though..
 

Lazar

Phineas Packard
#5
See edit:
Code:
> textstring<-c("the limits are -100 to 100 also 50.001 and -0.2 also .3 and -.4.")
> match<- gregexpr('(-|)[0-9]*\\.*[0-9]+',textstring)
> regmatches(textstring,match)
[[1]]
[1] "-100"   "100"    "50.001" "-0.2"   ".3"     "-.4"
 

TheEcologist

Global Moderator
#6
As far as I can tell, your code works without needing a leading zero:

Code:
textstring<-c("the limits are -100 to 100 also 0.1 and also 1.1")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]*',textstring)
regmatches(textstring,match)
## [1] "-100" "100"  "0.1"  "1.1"
Sweet!

Thanks, Lazar
 

TheEcologist

Global Moderator
#7
Aah, we replied at the same time... yes, it works fine!
 

Lazar

Phineas Packard
#8
I swapped the + and * in my last post. I just wanted to check you saw it, as my first version will match some things you do not want.
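TE's check in post #6 only used decimals that have a digit before the point. With numbers like .3 and -.4 the difference between Lazar's two versions shows up: the first one requires a digit before the dot ('[0-9]+'), so .3 loses its dot and -.4 loses both its dot and its sign. A quick check with an illustrative string and a small hypothetical 'extract' helper:

```r
textstring <- "also .3 and -.4"

# Hypothetical helper: all matches of a pattern in a single string
extract <- function(pattern, x) regmatches(x, gregexpr(pattern, x))[[1]]

first  <- "(-|)[0-9]+\\.*[0-9]*"  # first version: at least one digit before the dot
edited <- "(-|)[0-9]*\\.*[0-9]+"  # edited version: at least one digit after the dot

extract(first, textstring)   # "3" "4"
extract(edited, textstring)  # ".3" "-.4"
```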
 

TheEcologist

Global Moderator
#9
It works fine, thanks... it was meant for extracting the priors automatically out of a BUGS/JAGS model file and plotting them against the posterior... which should now work perfectly!
 

TheEcologist

Global Moderator
#10
I rejoiced too quickly; some instances still fail:

Code:
textstring<-c("the limits are -100 to 100 also 0.1 and also 1.1 and (0,1)")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]+',textstring)
regmatches(textstring,match)
But this one does not fail:

Code:
textstring<-c("the limits are -100 to 100 also 0.1 and also 1.1 and (0,1)")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]*',textstring)
regmatches(textstring,match)
Just a follow-up for clarity!

TE
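The difference between the two patterns is the last quantifier. With '[0-9]+' on both sides of the dot, the first pattern needs at least two digits, so the single-digit 0 and 1 in "(0,1)" are skipped; '[0-9]*' lets a lone digit through. A minimal check on a shortened, illustrative string ('extract' is a hypothetical helper):

```r
textstring <- "and (0,1) and 12"

# Hypothetical helper: all matches of a pattern in a single string
extract <- function(pattern, x) regmatches(x, gregexpr(pattern, x))[[1]]

two_digits <- extract("(-|)[0-9]+\\.*[0-9]+", textstring)  # needs two or more digits
one_digit  <- extract("(-|)[0-9]+\\.*[0-9]*", textstring)  # a single digit is enough

two_digits  # "12"
one_digit   # "0" "1" "12"
```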
 

Dason

Ambassador to the humans
#11
Blather. I had posted something before heading out, but it's not here. My general outline was:

1) strsplit on " ". This will break the text into words and retain a negative sign if present.
2) Use grep (or whatever you want) to grab only the words that contain a number. This is an easy pattern: "[0-9]+"
3) Use grep to prune out stuff you don't want (for instance, words containing "(" or ")").

It's more steps than writing one regex to rule them all but I think the logic and code are easier to follow and thus will be easier to maintain in the future.
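A minimal sketch of that outline in base R, assuming words are separated by single spaces and reusing the test string from post #10 (variable names are illustrative):

```r
textstring <- "the limits are -100 to 100 also 0.1 and also 1.1 and (0,1)"

# 1) split into words; a leading minus sign stays attached to its number
words <- strsplit(textstring, " ")[[1]]

# 2) keep only the words that contain at least one digit
with_digits <- grep("[0-9]+", words, value = TRUE)

# 3) prune what we do not want, here anything containing "(" or ")"
kept <- grep("[()]", with_digits, value = TRUE, invert = TRUE)

kept              # "-100" "100" "0.1" "1.1"
as.numeric(kept)  # numeric: -100, 100, 0.1, 1.1
```

Note that step 3 discards "(0,1)" as a whole; if those values are needed, that word would have to be split further, for example on "," after stripping the brackets.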
 

TheEcologist

Global Moderator
#12
Indeed, but what about speed?

Anyway, the plot thickens... the regex also fails on numbers like 1E7 or 1e-7.

NUTS..

I'll think about how to solve this... and post back if I find something.

Code:
textstring<-c("the limits are -100 to 100 also 0.1 and also 1.1 and (0,1) but also 1e+7, 1e-7 and 1E7 ")
match<- gregexpr('(-|)[0-9]+\\.*[0-9]*',textstring)
regmatches(textstring,match)
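One way to cover exponents is an optional group '([eE][+-]?[0-9]+)?' after the number itself. Building the pattern from named pieces with paste0() may also help with readability later on. A sketch, assuming the sign, decimal and exponent forms in the test string above are the only ones that need recognising (piece names are illustrative):

```r
textstring <- c("the limits are -100 to 100 also 0.1 and also 1.1 and (0,1) but also 1e+7, 1e-7 and 1E7 ")

sign_re     <- "-?"                  # optional leading minus
mantissa_re <- "[0-9]*\\.?[0-9]+"    # 100, 0.1 or .3: at most one dot, digits at the end
exponent_re <- "([eE][+-]?[0-9]+)?"  # optional e+7, e-7 or E7
number_re   <- paste0(sign_re, mantissa_re, exponent_re)

found <- regmatches(textstring, gregexpr(number_re, textstring))[[1]]
found              # "-100" "100" "0.1" "1.1" "0" "1" "1e+7" "1e-7" "1E7"
as.numeric(found)  # every token converts, exponents included
```

Converting with as.numeric() is a cheap sanity check: any token that is not a well-formed number comes back as NA.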
 

TheEcologist

Global Moderator
#13
Got a solution.

Code:
## This works! One regex to rule them all?

match<- gregexpr('(-|)[0-9]+\\.*[0-9]*|*\\d+([e|E][+\\-]*\\d*)',textstring)
However, Dason's words are wise: simpler code is often better... always anticipate your future stupidity. In a year's time, I will likely no longer be able to read the logic in the above...
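One detail in the pattern above is worth flagging: inside square brackets every character is literal, so '[e|E]' matches e, E and also the pipe character; '[eE]' is what is meant. Similarly, the '*' directly after '|' has nothing to repeat, and PCRE (perl = TRUE) should reject the pattern for that reason. A quick check of the bracket behaviour:

```r
# Inside [...], "|" is an ordinary character, not alternation
pipe_in_class  <- grepl("^[e|E]$", "|")  # the pipe itself is accepted
pipe_excluded  <- grepl("^[eE]$", "|")   # only e or E are accepted
upper_e_ok     <- grepl("^[eE]$", "E")

pipe_in_class  # TRUE
pipe_excluded  # FALSE
upper_e_ok     # TRUE
```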