Loop through values - only first element is used

#1
Hi everybody,

I feel this is very basic, but I need a hand.

I want to use the strsplit function on character strings in a data.frame, but only under certain conditions:

If the condition is met --> split the string and take the first two parts
If the condition is not met --> split the string and take only the first part

Problem: Only one element of my list standard.modellliste is taken.

Here is the code I have produced so far:
Code:
autos<-data.frame(hersteller=c("bmw","bmw", "audi", "ford"), modell= c("320 i", "117 e", "A 4", "Focus V1"),stringsAsFactors=FALSE)


standard.modellliste<-c("audi","bmw")
super.simple.modelle<-NULL
for (i in 1:length(autos[,1])){
  for (ii in 1:2){
    if(autos[,1][i]==standard.modellliste[[ii]]){
      super.simple.modelle[i]<-paste(strsplit(autos[,2]," ")[[i]][1],strsplit(autos[,2]," ")[[i]][2])
    }else{
      super.simple.modelle[i]<-strsplit(autos[,2]," ")[[i]][1]
    }}}
super.simple.modelle
I understand that the condition of an if statement only takes a single element, not a whole list.
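A note on why only part of the list seems to take effect: the inner loop assigns super.simple.modelle[i] on every pass, so the comparison with the last entry of standard.modellliste ("bmw") overwrites whatever the comparison with "audi" produced. One possible way to keep the loop, sketched here on the assumption that a plain membership test is all that is needed, is to collapse the comparison into a single TRUE/FALSE with any() before handing it to if. The name teile is only illustrative and not part of the original code.

```r
autos <- data.frame(hersteller = c("bmw", "bmw", "audi", "ford"),
                    modell = c("320 i", "117 e", "A 4", "Focus V1"),
                    stringsAsFactors = FALSE)
standard.modellliste <- c("audi", "bmw")

super.simple.modelle <- character(nrow(autos))
for (i in seq_len(nrow(autos))) {
  # split the model of row i once; 'teile' is an illustrative name
  teile <- strsplit(autos[i, 2], " ")[[1]]
  # any() reduces the element-wise comparison to one logical value for if()
  if (any(autos[i, 1] == standard.modellliste)) {
    super.simple.modelle[i] <- paste(teile[1], teile[2])
  } else {
    super.simple.modelle[i] <- teile[1]
  }
}
super.simple.modelle
```

With the inner loop gone, each row is decided exactly once, so the audi row is no longer overwritten.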

I tried nesting ifelse as well, but that did not work either.

Code:
for (i in 1:4){ 
  tester<-ifelse(autos[,1]=='bmw', paste(strsplit(autos[,2][[i]]," ")[1],strsplit(autos[,2]," ")[[i]][2]),
              ifelse(autos[,1] == 'audi', paste(strsplit(autos[[i]][,2]," ")[1],strsplit(autos[,2]," ")[[i]][2]),strsplit(autos[,2]," ")[[i]][2])
)}

tester
The intended result is the following: modell("320 i", "117 e", "A 4", "Focus") (bmw and audi models keep two string parts, ford only one)


So guys, how can I write a loop that uses a list of conditions to decide which function is applied to each row of my data.frame?

Thank you!
 
#2
Dear all,

I found the solution myself, with big help from the internet of course.

The use of nested ifelse statements was the key. It is not pretty but it works!

Code:
super.simple.modelle<-NULL
for (i in 1:length(autos[,1])){
  super.simple.modelle[i]<-ifelse(autos[i,1]=="bmw",paste(strsplit(autos[i,2]," ")[[1]][1],strsplit(autos[i,2]," ")[[1]][2]),
                 ifelse(autos[i,1]=="audi",paste(strsplit(autos[i,2]," ")[[1]][1],strsplit(autos[i,2]," ")[[1]][2]),strsplit(autos[i,2]," ")[[1]][1]))
}
super.simple.modelle
The basic structure of this code is:
a "for" loop that goes through each row of the data frame
a nested "ifelse" inside the loop that tells R which conditions I have. For this example the nested ifelse is built like this:
ifelse(condition1, do this when true, ifelse(condition2, do this when true, do that when 1 and 2 are false))
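To make the nesting pattern concrete, here is a minimal sketch on made-up numbers rather than on the autos data. The names x and sign_label are illustrative only. Because ifelse works on whole vectors, the pattern evaluates every element at once and does not need a surrounding loop.

```r
x <- c(-2, 0, 5)

# condition1 -> "negative"; otherwise condition2 -> "zero"; otherwise "positive"
sign_label <- ifelse(x < 0, "negative",
                     ifelse(x == 0, "zero", "positive"))
sign_label
```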

As said, this is not nice code, but it gets the job done. I'm sure you can nest till eternity, but I had hoped to do something along the lines of a list, like so: list of models --> if modell is true for df, do this; if false, do that.

But for now I will live with nesting like there is no tomorrow.
 

trinker

ggplot2orBust
#3
Here's why you haven't gotten a response. Put yourself at the helping end. Pretend you're the helper. You know nothing about your problem. Now read your question, and at the end you'll still know nothing about the problem.

You're trying to describe the problem using "If a certain condition is met". What's the condition? Also, you try to explain what you want with broken code. Computers read code to figure things out. Humans can, but only with extreme cognitive effort. So the suggestion: Use human language to describe the problem to humans. Be explicit. "Here's the data I have." "Here's how I want it to look." "Here's the logic to arrive at the desired output."

I think I get the logic and there's a more R way to achieve this but am not going to try it unless I know the idea I have actually matches your logic.
 
#4
Dear trinker,

thank you for pointing out what is problematic with my post. I understand that I have to be more specific and clear. Please let me try to be clearer.

As you can see from the code posted above, I made a data.frame as an example. This data.frame consists of two columns (hersteller and modell). Both are character columns.

I am now trying to conditionally "cut" the modell column. Conditionally means that if a certain "hersteller" [manufacturer] is found, the corresponding modell is cut down to either its first two string parts or just its first one.

Example of what I want to achieve:
When bmw appears in the hersteller column (first column), the corresponding modell (second column) is "320 i". Via the strsplit function both string parts ("320" and "i") are then saved into a new vector called super.simple.modelle. The same goes for Audi. So the hersteller condition (bmw or Audi) defines how many string parts are saved in the new vector, in this case 2. Any further string parts are not transferred. The code that does this is marked with comments...

Code:
super.simple.modelle[i]<-ifelse(autos[i,1]=="bmw",paste(strsplit(autos[i,2]," ")[[1]][1],strsplit(autos[i,2]," ")[[1]][2]),  # <-- bmw: two parts
                 ifelse(autos[i,1]=="audi",paste(strsplit(autos[i,2]," ")[[1]][1],strsplit(autos[i,2]," ")[[1]][2]),  # <-- audi: two parts
                 strsplit(autos[i,2]," ")[[1]][1]))
But what happens when a hersteller other than "bmw" or "audi" shows up? Well, "ford" has not been specified in the ifelse command. So here the last argument of the innermost ifelse steps into action: for the manufacturer "ford" only the first string part should be taken. So for Ford "Focus V1" it would be only "Focus". The relevant part of the code is marked with a comment...

Code:
super.simple.modelle[i]<-ifelse(autos[i,1]=="bmw",paste(strsplit(autos[i,2]," ")[[1]][1],strsplit(autos[i,2]," ")[[1]][2]),
                 ifelse(autos[i,1]=="audi",paste(strsplit(autos[i,2]," ")[[1]][1],strsplit(autos[i,2]," ")[[1]][2]),
                 strsplit(autos[i,2]," ")[[1]][1]))  # <-- everything else (e.g. ford): first part only
As you can see, the code works with a nested ifelse function inside a for loop. But nesting a long list of manufacturers is tiresome and error-prone. Imagine having 20 or so hersteller.

Therefore it would be better to put this into a separate function. I first tried a for loop with nested "if" statements, but that didn't work out (as "if" only takes the first element of a list).

Conclusion: what would be a better approach to do what is described above via a list?

Example:
You have the data.frame described above, with the manufacturer as one column and the corresponding modell as another
1. Define a list (manulist<-c("bmw", "audi", "jeep","..."))
2. Define a function, loop, lapply etc. which will cut the strings in the modell column
2.1 if the manufacturer is on the list -> take two string parts
2.2 if the manufacturer in column "manufacturer" is not on the list -> take one string part
3. Do this for the whole of the data.frame and save it into a vector
4. attach the vector via cbind to the data.frame
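
The steps above can be sketched roughly as follows. This is only one possible reading of the plan, not a definitive implementation: the helper keep_parts and the vector n_parts are hypothetical names introduced for illustration, the manulist entries are example values, and membership is tested with the %in% operator.

```r
autos <- data.frame(hersteller = c("bmw", "bmw", "audi", "ford"),
                    modell = c("320 i", "117 e", "A 4", "Focus V1"),
                    stringsAsFactors = FALSE)

# 1. manufacturers whose models keep two parts (example values)
manulist <- c("bmw", "audi", "jeep")

# 2. hypothetical helper: keep the first n[i] space-separated parts of x[i]
keep_parts <- function(x, n) {
  mapply(function(p, k) paste(head(p, k), collapse = " "),
         strsplit(x, " "), n, USE.NAMES = FALSE)
}

# 2.1 / 2.2: two parts if the manufacturer is on the list, otherwise one
n_parts <- ifelse(autos$hersteller %in% manulist, 2, 1)

# 3. apply to the whole modell column and save the result in a vector
super.simple.modelle <- keep_parts(autos$modell, n_parts)

# 4. attach the vector to the data.frame
autos <- cbind(autos, super.simple.modelle = super.simple.modelle,
               stringsAsFactors = FALSE)
autos
```

Keeping the number of parts in a separate vector means a longer manulist only changes step 1, not the nesting depth.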

Thank you guys for your time!
 

trinker

ggplot2orBust
#5
I think this will work (you can add to the vector where "ford" is):

Code:
FUN <- function(x) sapply(strsplit(x, "\\s+"), "[", 1)
ifelse(autos[, 1] %in% c("ford"), FUN(autos[, 2]), autos[, 2])
So first I create a function that splits on whitespace and then uses sapply and "[" to grab the first piece. This is what we'll conditionally apply using a single ifelse. The key is to supply a logical vector that says whether each row is on the list or not, which is what autos[, 1] %in% c("ford") does. Here I only have ford because that's all your example gave (you may want to make your examples slightly bigger so they capture the problem, but not so big that they become unwieldy; emphasis on slightly). Anyway, you can add more to the manulist (e.g., autos[, 1] %in% c("ford", "chevy", "dodge")). To make this into a column in the original, simply assign it to an index:

Code:
autos[["new_column"]] <- ifelse(autos[, 1] %in% c("ford"), FUN(autos[, 2]), autos[, 2])
This yields:

Code:
  hersteller   modell new_column
1        bmw    320 i      320 i
2        bmw    117 e      117 e
3       audi      A 4        A 4
4       ford Focus V1      Focus
 
#6
Dear trinker,

thank you so much for your code!

I will try it out as soon as I can. I'm on a business trip, so I can't really comment on whether it works with my data yet, but from the looks of it, it does what I need. :)

The %in% is new to me - will have a deeper look at it.

Also, have a "Thanks" and a great week.

tester1234
 

trinker

ggplot2orBust
#7
%in% will become a friend to you if you use R a bit. :) Have a look at ?match as %in% is a binary operator derived from match.
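A small sketch of how the two relate, using made-up vectors (hersteller and manulist here are stand-alone example values, and the result names are illustrative): match() returns the position of each element in the table, or NA if it is absent, while %in% only reports whether a match exists.

```r
hersteller <- c("bmw", "bmw", "audi", "ford")
manulist <- c("audi", "bmw")

# position of each hersteller in manulist, NA when it is not on the list
positions <- match(hersteller, manulist)

# TRUE/FALSE membership
on_list <- hersteller %in% manulist

# the same membership test spelled out via match()
on_list_via_match <- match(hersteller, manulist, nomatch = 0) > 0
```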
 

trinker

ggplot2orBust
#8
tester1234 said:
Finally there is one thing I can't wrap my head around: "\\s+" and "[". I guess those are regular expressions. I tried to find a kind of codebook to help me understand what they do. For "\\s+" I found this http://www.coderanch.com/t/570917/ja...gex-difference but what is "["?
This was asked in a private chat as the poster was worried it was too off topic. But I think it goes to explaining the current problem so here's as good a place as any for it...

The first \\s+ is indeed a regular expression. Have a look at:

Code:
?regex
Which says, about the + quantifier...

The preceding item will be matched one or more times.
This code may help you understand some of what's going on:

Code:
x <- "I  like to eat robots    alot   !"

gregexpr(" ", x)
gregexpr("\\s", x)
gregexpr("\\s+", x)
gregexpr("\\s{2}", x)
gregexpr("\\s{2,}", x)
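To tie this back to strsplit, here is a short sketch on a made-up model string with two spaces in it (the names modell, by_space, by_ws and hit are illustrative). It shows why "\\s+" can be the safer separator: a single space as the pattern leaves an empty piece between consecutive spaces, while "\\s+" treats the whole run of whitespace as one separator.

```r
modell <- "Focus  V1"   # note the two spaces

# a single space as separator: the second space yields an empty piece
by_space <- strsplit(modell, " ")[[1]]

# one or more whitespace characters count as a single separator
by_ws <- strsplit(modell, "\\s+")[[1]]

# gregexpr reports where the whitespace run starts and how long it is
hit <- gregexpr("\\s+", modell)[[1]]
as.integer(hit)             # start position of the match
attr(hit, "match.length")   # length of the match
```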

For the second part...

This is indexing. [ is actually a function in R. I'm passing this function to sapply which takes a function as an argument. Have a look at:

Code:
?`[`
Here's some code to begin understanding indexing with sapply. In the last one I'm being more explicit by using an anonymous function.

Code:
sapply(list(a = 1:3, b = LETTERS, c = mtcars), "[", 1)
sapply(list(a = 1:3, b = LETTERS, c = mtcars), "[[", 1)

sapply(list(a = 1:3, b = LETTERS, c = mtcars), function(x) {
    x[1]
})
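Applied to the strsplit output from this thread, the same idea looks like the sketch below (the names v, teile and the result variables are illustrative). Calling `[` with backticks is ordinary indexing written as a function call, and the extra argument given to sapply is passed on to it as the index.

```r
v <- c("a", "b", "c")
second_usual <- v[2]        # the usual indexing syntax
second_call  <- `[`(v, 2)   # the same thing written as a function call

teile <- strsplit(c("320 i", "Focus V1"), "\\s+")

first_parts    <- sapply(teile, "[", 1)                 # first piece of each split
first_explicit <- sapply(teile, function(p) p[1])       # same, with an anonymous function
second_parts   <- sapply(teile, "[", 2)                 # second piece of each split
```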
 
#9
Dear Trinker,

thank you for your code. I just edited it a little bit.

Code:
FUN1<- function(x) paste(sapply(strsplit(x, "\\s+"), "[",1),sapply(strsplit(x, "\\s+"), "[",2))
FUN2<- function(x) sapply(strsplit(x, "\\s+"), "[",1)
df.1["standardisierte_Modelle"]<-ifelse(df.1[, 8] %in% c("Alpina","Audi","BMW","Chrysler","Citroën","Jeep","Mercedes-Benz","Volvo"), FUN1(df.1[, 9]), FUN2(df.1[, 9]))
As you see, I took your original function FUN and made two functions out of it.

FUN1 splits the string and keeps the first two parts, regardless of any further parts.

Example: "Focus 1.5 Special Edition" will be "Focus 1.5"

FUN2 only takes the first part.

Example: "Focus 1.5 Special Edition" will be "Focus"

Then, as you proposed, those two functions are used in an ifelse call, and the result is stored in the column "standardisierte_Modelle" of the data.frame df.1. If, according to %in%, column 8 contains any of the named makers, FUN1 is used; if not (else), FUN2 is used.

It worked very well and is also very fast!
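
One caveat that may be worth checking on real data: if a model name on the two-part list consists of a single word, indexing its second piece gives NA, and paste() turns that into the literal text "NA". The sketch below reproduces this with FUN1 as posted and shows one possible variant; FUN1_safe is a hypothetical name, not part of the code above, and it assumes that a one-word model should simply be kept as it is.

```r
FUN1 <- function(x) paste(sapply(strsplit(x, "\\s+"), "[", 1),
                          sapply(strsplit(x, "\\s+"), "[", 2))

beispiel <- c("Focus 1.5 Special Edition", "Focus")
mit_na <- FUN1(beispiel)   # the one-word entry becomes "Focus NA"

# hypothetical variant: keep at most the first two pieces that actually exist
FUN1_safe <- function(x) {
  teile <- strsplit(x, "\\s+")
  sapply(teile, function(p) paste(head(p, 2), collapse = " "))
}
ohne_na <- FUN1_safe(beispiel)
```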

I still have not understood the regular expressions "\\s+" and "[". I understand that "\\s+" stands for a "space" in a string, but what does "[" do? Is there a kind of regular expressions code book I can look up?