can anyone help me?
hi,
I've run into some problems in R; please help me.
1. How can I intersect several groups in one list without a loop statement? (I think it is a list.)
Create the data:
Code:
myData <- data.frame(product = c(1,2,3,1,2,3,1,2,2),
                     year = c(2009,2009,2009,2010,2010,2010,2011,2011,2011),
                     value = c(1104,608,606,1504,508,1312,900,1100,800))
mySplit <- split(myData, myData$year)
mySplit
$`2009`
  product year value
1       1 2009  1104
2       2 2009   608
3       3 2009   606

$`2010`
  product year value
4       1 2010  1504
5       2 2010   508
6       3 2010  1312

$`2011`
  product year value
7       1 2011   900
8       2 2011  1100
9       2 2011   800
I want the intersection of product across all years. I know the basic method is:
Code:
intersect(intersect(mySplit[[1]]$product, mySplit[[2]]$product), mySplit[[3]]$product)
This gives the correct answer:
Code:
[1] 1 2
The code above isn't reusable, so it should use a for loop:
Code:
myIntersect <- mySplit[[1]]$product
# seq_len() is used here because 1:length(mySplit)-1 parses as (1:length(mySplit))-1, i.e. 0:2
for (i in seq_len(length(mySplit) - 1)) {
  myIntersect <- intersect(myIntersect, mySplit[[i + 1]]$product)
}
That is correct too, but still too complex, so my question is:
Can I do the same thing with a single intersect-like function (without for/repeat/while)?
What is that function's name?
2. How do I do a relative computation after the split (note: not before the split)?
Create the data:
Code:
myData1 <- data.frame(product = c(1,2,3,1,2,3),
                      year = c(2009,2009,2009,2010,2010,2010),
                      value = c(1104,608,606,1504,508,1312),
                      relative = 0)
mySplit1 <- split(myData1, myData1$year)
mySplit1
$`2009`
  product year value relative
1       1 2009  1104        0
2       2 2009   608        0
3       3 2009   606        0

$`2010`
  product year value relative
4       1 2010  1504        0
5       2 2010   508        0
6       3 2010  1312        0
I want to compute a relative value within every group (each row's value minus the value of the row before it, with 0 for the first row). The result should look like this:
Code:
$`2009`
  product year value relative
1       1 2009  1104        0
2       2 2009   608     -496
3       3 2009   606       -2

$`2010`
  product year value relative
4       1 2010  1504        0
5       2 2010   508     -996
6       3 2010  1312      804
I think a loop might work, but is there no direct method for a list?
3. How do I sort after the split? As in the question above, what I want is to sort each group by value:
Code:
$`2009`
  product year value relative
3       3 2009   606        0
2       2 2009   608        0
1       1 2009  1104        0

$`2010`
  product year value relative
5       2 2010   508        0
6       3 2010  1312        0
4       1 2010  1504        0
4. How do I filter after the split? Again as in the question above, what I want is to keep only the rows whose value is more than 1000:
Code:
$`2009`
  product year value relative
1       1 2009  1104        0

$`2010`
  product year value relative
4       1 2010  1504        0
6       3 2010  1312        0
Last edited by bestbird7788; 06-06-2012 at 02:09 AM.
You're asking a lot of questions that are fairly simple. Try asking just one question at a time. I would also suggest visiting this thread and finding some good intro to R material.
Here is some code that should help with your first two questions:
Code:
myData <- data.frame(product = c(1,2,3,1,2,3,1,2,2),
                     year = c(2009,2009,2009,2010,2010,2010,2011,2011,2011),
                     value = c(1104,608,606,1504,508,1312,900,1100,800))
mySplit <- split(myData, myData$year)

# Only grab the products
productList <- split(myData$product, myData$year)

# Reduce repeatedly applies intersect
Reduce(intersect, productList)

for (i in seq_along(mySplit)) {
  mySplit[[i]]$relative <- c(0, diff(mySplit[[i]]$value))
}
mySplit
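Questions 3 and 4 (sorting and filtering after the split) are not covered by the code above. One possible sketch, not taken from the thread, applies the same operation to every group with lapply. The names sorted and filtered are made up for illustration, and the placeholder relative column is left out to keep the example short.

```r
# Same example data as question 2, without the placeholder 'relative' column
myData1 <- data.frame(
  product = c(1, 2, 3, 1, 2, 3),
  year    = c(2009, 2009, 2009, 2010, 2010, 2010),
  value   = c(1104, 608, 606, 1504, 508, 1312)
)
mySplit1 <- split(myData1, myData1$year)

# Question 3: sort every group by value (ascending)
sorted <- lapply(mySplit1, function(d) d[order(d$value), ])

# Question 4: keep only the rows whose value is above 1000
filtered <- lapply(mySplit1, function(d) d[d$value > 1000, ])

sorted
filtered
```

Both calls return a list with the same names as mySplit1, so the results can be indexed by year exactly like the original split.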
"His programming is malfunctioning. It begins! Get your weapons, he's going to become a killbot!!!" - bryangoodrich
I'm very grateful, thank you. I will follow your advice.
Answer 1 is perfect. I had never known about the Reduce function; I need these structural functions like Reduce or do.call.
Answer 2 is, um... do we have to use a loop expression? I mean, the for expression is already very simple here, but is there some expression like Reduce that can replace the loop?
Sure, there are probably ways to get rid of the loop. But can I ask you a question: why?
No offense, but you don't seem to be the best with R at the moment, and a loop is a very clear, direct way to do what you want to do. Before you focus on optimizing the crap out of everything, your focus should just be on getting things done.
OK, it seems I should tell you the secret: er... my small team and I are not good at programming, and statements such as while and for loops puzzle me.
I want to avoid loops, but if statements and functions are acceptable.
I'm good (I think) at Excel, but not VBA, and I know the business; that explains everything.
Dason's loop is likely to be pretty darn fast. You may have heard that loops are slow in R. This is true if you compare a loop to a vectorized solution (not applicable in this case, I don't think). We could use an lapply solution, but the loop is likely as fast, if not faster (the slow loop issue was addressed a few releases of R ago). You could compile the loop later on to gain even more speed if you wish.
Also, Reduce is pretty nice but it is also pretty slow. Its elegance comes with a price: speed.
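For reference, the lapply solution mentioned above might look like the sketch below. It is not from the original posters, and the name withRelative is made up for illustration. lapply still iterates over the groups internally, so this changes the style rather than the speed. The ave call at the end is an alternative that works on the unsplit data frame.

```r
# Same example data as question 2, without the placeholder 'relative' column
myData1 <- data.frame(
  product = c(1, 2, 3, 1, 2, 3),
  year    = c(2009, 2009, 2009, 2010, 2010, 2010),
  value   = c(1104, 608, 606, 1504, 508, 1312)
)
mySplit1 <- split(myData1, myData1$year)

# lapply() visits each group; diff() gives value[i] - value[i - 1],
# and the leading 0 fills the first row of each group
withRelative <- lapply(mySplit1, function(d) {
  d$relative <- c(0, diff(d$value))
  d
})
withRelative

# Alternative without split(): ave() applies FUN within each year
# and returns the results in the original row order
myData1$relative <- ave(myData1$value, myData1$year,
                        FUN = function(v) c(0, diff(v)))
myData1
```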
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
Quote:
answer 2 is um..... do we have to use a loop expression? I mean for expression is already very simple here, but Is there some expression like Reduce that can edge out loop?
It's hard to write the code without a loop, but check this one:
=mySplit1.(~.dup@t().derive(if(#==1,0,value-value[-1])))
~ means the current group, # means the current row number, and [-1] means the previous row.
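Going by that explanation, the expression appears to compute, for every group, 0 on the first row and the value minus the previous row's value on every other row. A rough R rendering of that reading is sketched below. It assumes the reading is right, and relativeToPrevious is a hypothetical helper name, not something from either language.

```r
# Same example data as question 2, without the placeholder 'relative' column
myData1 <- data.frame(
  product = c(1, 2, 3, 1, 2, 3),
  year    = c(2009, 2009, 2009, 2010, 2010, 2010),
  value   = c(1104, 608, 606, 1504, 508, 1312)
)
mySplit1 <- split(myData1, myData1$year)

# Hypothetical helper mirroring if(#==1, 0, value - value[-1])
relativeToPrevious <- function(value) {
  rowNumber <- seq_along(value)          # '#': current row number
  previous  <- c(NA, head(value, -1))    # 'value[-1]': previous row's value
  ifelse(rowNumber == 1, 0, value - previous)
}

# '~' (the current group) corresponds to the function argument 'group'
result <- lapply(mySplit1, function(group) {
  group$relative <- relativeToPrevious(group$value)
  group
})
result
```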
I googled "~.dup@t()" and "derive" and found this: http://www.******.com/forum/index.php?topic=8674.0
I gather that's a language named ******, but I have no interest in another language, sorry.
I hope that doesn't offend you.
If you show me the whole code in ****** in your free time, I won't mind trying it.