# Split string

#### Layo909

Is there any way of splitting

Code:
AnonEtAl2013
into

Code:
Anon Et Al 2013
All my attempts have failed so far.

#### Dason

Can you explain why you want to split it that way? What is the logic? And can you give several examples?

Will it always be 4 characters, 2 characters, 2 characters, 4 characters?

#### Layo909

Basically I have a load of citation keys in the format AnonEtAl2013 (so no spaces). I need to search for the BibTeX references on Google Scholar, and searching for AnonEtAl2013 doesn't tend to work.

It won't always be 4, 2, 2, 4 characters, but if I knew how to split AnonEtAl2013 then I could figure out how to do the rest.

#### Dason

You haven't actually described the pattern, though. What is it that you want to split on? Capital letters and/or numbers? I can't read your mind, and neither can the computer. The first step is to break the problem down into small pieces. In this case that means identifying the actual pattern you want to split on. If you can describe that adequately, then we're a lot closer to achieving your goal.

This is partly why I wanted several examples. A single example doesn't establish a pattern.

#### Layo909

In every instance the pattern is NameEtAlYear, like this:

Code:
AnonEtAl2010

AnotherEtAl2011

OnemoreEtAl2012

LastoneEtAl2013
Each of these should be split into:

Code:
Anon Et Al 2010

Another Et Al 2011

Onemore Et Al 2012

Lastone Et Al 2013

#### Dason

Will there be inputs that don't follow this pattern?

#### Dason

Code:
x <- c("AnonEtAl2010", "AnotherEtAl2011", "OnemoreEtAl2012", "LastoneEtAl2013")

gsub("([A-Z]|[[:digit:]]+)", " \\1", x)
That will add a space at the beginning, but you can probably deal with that. What this does is add a space before every capital letter and before every run of consecutive digits.
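If you want to stay with the default (non-Perl) regex engine, one minimal way of dealing with that leading space is to trim it afterwards with base R's `trimws()`. This is a sketch using the same four example keys:

```r
x <- c("AnonEtAl2010", "AnotherEtAl2011", "OnemoreEtAl2012", "LastoneEtAl2013")

# Insert a space before each capital letter and each run of digits,
# then strip the space that the first capital letter picks up at the start.
y <- trimws(gsub("([A-Z]|[[:digit:]]+)", " \\1", x), which = "left")

y[1]
# "Anon Et Al 2010"
```
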

#### Layo909

There are two other possible patterns:

(1) NameYear eg Anon2013

(2) NameNameYear eg SomeoneAnother2013

I thought that if I knew how to split AnonEtAl2013, I could apply similar code to these other two patterns.

#### Dason

We can avoid adding the space at the beginning if we use Perl-compatible regexes (perl = TRUE).

Code:
x <- c("AnonEtAl2010", "AnotherEtAl2011", "OnemoreEtAl2012", "LastoneEtAl2013")
gsub("(?!^)([A-Z]|[[:digit:]]+)", " \\1", x, perl=T)

#which gives
> gsub("(?!^)([A-Z]|[[:digit:]]+)", " \\1", x, perl=T)
[1] "Anon Et Al 2010"    "Another Et Al 2011" "Onemore Et Al 2012"
[4] "Lastone Et Al 2013"
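The (?!^) part is a negative lookahead: it fails at the very start of the string, so the capital letter in position one is never matched and gets no space in front of it. A small side-by-side comparison on a single key shows the difference:

```r
key <- "AnonEtAl2010"

# Default engine, no lookahead: the leading "A" also gets a space.
no_lookahead <- gsub("([A-Z]|[[:digit:]]+)", " \\1", key)
# " Anon Et Al 2010"

# PCRE with (?!^): nothing can match at the start of the string.
with_lookahead <- gsub("(?!^)([A-Z]|[[:digit:]]+)", " \\1", key, perl = TRUE)
# "Anon Et Al 2010"
```
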

Code:
> x <- c("AnonEtAl2010", "AnotherEtAl2011", "OnemoreEtAl2012", "LastoneEtAl2013", "Anon2013", "SomeoneAnother2013")
>
> gsub("(?!^)([A-Z]|[[:digit:]]+)", " \\1", x, perl=T)
[1] "Anon Et Al 2010"      "Another Et Al 2011"   "Onemore Et Al 2012"
[4] "Lastone Et Al 2013"   "Anon 2013"            "Someone Another 2013"
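One thing worth checking against the real keys: the rule splits before every capital letter, not only at name boundaries. With a hypothetical key (not one from the thread) whose surname contains an internal capital, the surname itself gets split:

```r
# Hypothetical key, used only to illustrate the behaviour.
key <- "McDonaldEtAl2013"

# Same PCRE pattern as above: a space goes before every capital except the first.
res <- gsub("(?!^)([A-Z]|[[:digit:]]+)", " \\1", key, perl = TRUE)
res
# "Mc Donald Et Al 2013"
```
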

#### Dason

Same approach, but using the POSIX class [[:upper:]] in place of [A-Z]:

Code:
x <- c("AnonEtAl2010", "AnotherEtAl2011", "OnemoreEtAl2012", "LastoneEtAl2013", "Anon2013", "SomeoneAnother2013")
gsub("(?!^)([[:upper:]]|[[:digit:]]+)", " \\1", x, perl = TRUE)
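Since the end goal is a Google Scholar search, here is a sketch of turning the split keys into search URLs. The base URL format (https://scholar.google.com/scholar?q=) is an assumption, not something from this thread. utils::URLencode() percent-encodes the spaces, and it is applied element by element with vapply() so the code doesn't rely on URLencode() being vectorised.

```r
x <- c("AnonEtAl2010", "AnotherEtAl2011", "OnemoreEtAl2012", "LastoneEtAl2013", "Anon2013", "SomeoneAnother2013")

# Split the keys exactly as in the post above.
queries <- gsub("(?!^)([[:upper:]]|[[:digit:]]+)", " \\1", x, perl = TRUE)

# Assumed search URL format; adjust if Google Scholar expects something else.
scholar_base <- "https://scholar.google.com/scholar?q="

# Percent-encode each query (spaces become %20) and prepend the base URL.
encoded <- vapply(queries, utils::URLencode, character(1),
                  reserved = TRUE, USE.NAMES = FALSE)
urls <- paste0(scholar_base, encoded)

urls[1]
# "https://scholar.google.com/scholar?q=Anon%20Et%20Al%202010"
```
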