+ Reply to Thread
Results 1 to 14 of 14

Thread: Could anyone recommend an interactive application for mass data analysis?

  1. #1
    Points: 220, Level: 4
    Level completed: 40%, Points required for next Level: 30

    Posts
    17
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Red face Could anyone recommend an interactive application for mass data analysis?




    Hi, everybody
    I need to conduct a large amount of data analysis on database. Could anyone recommend an interactive application for data analysis?
    The requirements are:
    1. Able to cope with the unexpected requirement rapidly.
    2. Able to perform further computations on results interactively (base on the mass data).
    3. Easy to confront even a large amount of complex computations
    What would you great expert recommend?
    Thanks in advance.

  2. #2
    Points: 220, Level: 4
    Level completed: 40%, Points required for next Level: 30

    Posts
    17
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    any suggestion?

  3. #3
    Devorador de queso
    Points: 95,705, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Posting AwardCommunity AwardDiscussion EnderFrequent Poster
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,931
    Thanks
    307
    Thanked 2,629 Times in 2,245 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    R
    I don't have emotions and sometimes that makes me very sad.

  4. #4
    Points: 220, Level: 4
    Level completed: 40%, Points required for next Level: 30

    Posts
    17
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    thank you, Dason
    I spend a whole day to study R, It's powerful, I think It just seems like what I want. And I find A extend tools of R, named RExcel. I have a question about it, can you give me some tips? My question is:
    Can I refer a result by cell name in RExcel? What I mean is: when I write "c(1,2)" or other format code in cell A1, Can I get a furter result in A2 by write "sum(A1)". Please notice that I want refer c(1,2) by cell name A1. I konw the traditional usage is: write "c(1,2)->M" in A1, and "sum(M)" in A2, but This traditional usage is not what I want.
    thanks for your further replay.

  5. #5
    Devorador de queso
    Points: 95,705, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Posting AwardCommunity AwardDiscussion EnderFrequent Poster
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,931
    Thanks
    307
    Thanked 2,629 Times in 2,245 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    I've never used RExcel. I just do everything within R. If you're looking for a nicer platform to run R in I can recommend RStudio.
    I don't have emotions and sometimes that makes me very sad.

  6. #6
    Points: 220, Level: 4
    Level completed: 40%, Points required for next Level: 30

    Posts
    17
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    hi, Dason
    Forget RExcel, It is useless.
    I have studied R for all my day since you had suggested it to me. I have learned how to use split, aggregate and so on.
    I met a problem, I think It maybe easy to you, Please give me a hand. I have spend much time to compute these simple comutation, but I stick at group and relative position computation.
    Please check for me how to write these steps in R code.
    --------------------------------------
    original data a data frame, named "result"
    product year value
    1 2009 1104
    2 2009 608
    3 2009 606
    1 2010 1504
    2 2010 508
    3 2010 1312

    I want to compute the product whose annual sales values are all among the top 2. so I need to do
    step1: split data by year
    2009
    product year value
    1 2009 1104
    2 2009 608
    3 2009 606
    2010
    product year value
    1 2010 1504
    2 2010 508
    3 2010 1312

    step2: compute rank in every group
    2009
    product year value rank
    1 2009 1104 1
    2 2009 608 2
    3 2009 606 3
    2010
    product year value rank
    1 2010 1504 1
    2 2010 508 3
    3 2010 1312 2

    step3: filter top2 in every group
    2009
    product year value rank
    1 2009 1104 1
    2 2009 608 2
    2010
    product year value rank
    1 2010 1504 1
    3 2010 1312 2

    step4: compute intersection of every group
    product
    1

    -----------------------------------------------------
    My second problem
    the original data is based on step1 of form2:
    2009
    product year value
    1 2009 1104
    2 2009 608
    3 2009 606
    2010
    product year value
    1 2010 1504
    2 2010 508
    3 2010 1312

    I want to get relative difference in every group (It's a simple sample, has no actual business means, in business I want to solve link relative ratio problem)
    that means: 0, 608-1104,606-608 and 0, 508-1504, 1312-508. I know in SQL It should use lag and lead function, but How to do the same thing in R.
    What I want in result should be:
    2009
    product year value relative
    1 2009 1104 0
    2 2009 608 -494
    3 2009 606 -2
    2010
    product year value relative
    1 2010 1504 0
    2 2010 508 -996
    3 2010 1312 804
    Last edited by bestbird7788; 05-29-2012 at 01:46 AM.

  7. #7
    Phineas Packard
    Points: 16,013, Level: 81
    Level completed: 33%, Points required for next Level: 337
    Lazar's Avatar
    Location
    Sydney
    Posts
    1,159
    Thanks
    198
    Thanked 336 Times in 299 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    Let this get you started:
    Code: 
    myData <- data.frame(product = rep(1:3, 2), year=rep(2009:2010, each=3),
                         value=c(1104,608,606,1504,508,1312))
    
    myData$R <- do.call(c, tapply(myData$value, myData$year, 
                                      function(x) rank(-x)) )
    
    myDataTops <- myData[which(myData$R<3),]
    myDataTops <- split(myDataTops, myDataTops$year)
    
    intersect(myDataTops[[1]]$product, myDataTops[[2]]$product)
    Part 2:
    Code: 
    f1 <- function(x){
                      y <- rep(0, length(x))
                      y[c(2:length(x))] <- diff(x)
                      y
                      }
    
    myData$relative <- do.call(c,tapply(myData$value, myData$year,f1) )

  8. #8
    Points: 220, Level: 4
    Level completed: 40%, Points required for next Level: 30

    Posts
    17
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    thank you Lazar
    I have run the code and got correct result, I have studied every function you provided ,It spent me 5 hours, but I felt happy.
    you awsome!
    It's nearly perfect, but I want to improve it on like steps in my last post, I checked many document and find nothing.please give me some tips.
    1. how to tread intersect of uncertain years.
    Actually the number of years is uncertain, It maybe many years, not just 2 years. so I cant just use myDataTops[[1]] and myDataTops[[2]] in intersect function.
    Maybe I should write a function myself, using "for" to access every year? But It seems stupid. Is there a easy way to do this?
    2. how to split data at first , then to compute relative, (your computation is not base on a split date) (It seems more easy to understand, and is my original requirement). what I mean is:
    split data at first:
    Code: 
           myData <- data.frame(product = rep(1:3, 2), year=rep(2009:2010, each=3),value=c(1104,608,606,1504,508,1312))
           myData$relative<-0
           split(myData,myData$year)
    We can got temporary result:
    Code: 
           $`2009`
      product year value relative
    1       1 2009  1104        0
    2       2 2009   608        0
    3       3 2009   606        0
    
    $`2010`
      product year value relative
    4       1 2010  1504        0
    5       2 2010   508        0
    6       3 2010  1312        0
    what I want is how to write code to change it to below?
    Code: 
           $`2009`
      product year value relative
    1       1 2009  1104        0
    2       2 2009   608        -496
    3       3 2009   606        -2
    
    $`2010`
      product year value relative
    4       1 2010  1504        0
    5       2 2010   508        -996
    6       3 2010  1312        804
    3.how to split data at first then to rank ,then to filter,then to intersect.
    a. split data at first (just like question 2),we got
    Code: 
           $`2009`
      product year value R
    1       1 2009  1104        0
    2       2 2009   608        0
    3       3 2009   606        0
    
    $`2010`
      product year value R
    4       1 2010  1504        0
    5       2 2010   508        0
    6       3 2010  1312        0
    b. how to change it to:
    Code: 
           $`2009`
      product year value R
    1       1 2009  1104        1
    2       2 2009   608        2
    3       3 2009   606        3
    
    $`2010`
      product year value R
    4       1 2010  1504        1
    5       2 2010   508        3
    6       3 2010  1312        2
    c. then how to filter top 2 to
    Code: 
           $`2009`
      product year value R
    1       1 2009  1104        1
    2       2 2009   608        2
    
    $`2010`
      product year value R
    4       1 2010  1504        1
    6       3 2010  1312        2
    d. then how to compute intersect to:
    1

    Quote Originally Posted by Lazar View Post
    Let this get you started:
    Code: 
    myData <- data.frame(product = rep(1:3, 2), year=rep(2009:2010, each=3),
                         value=c(1104,608,606,1504,508,1312))
    
    myData$R <- do.call(c, tapply(myData$value, myData$year, 
                                      function(x) rank(-x)) )
    
    myDataTops <- myData[which(myData$R<3),]
    myDataTops <- split(myDataTops, myDataTops$year)
    
    intersect(myDataTops[[1]]$product, myDataTops[[2]]$product)
    Part 2:
    Code: 
    f1 <- function(x){
                      y <- rep(0, length(x))
                      y[c(2:length(x))] <- diff(x)
                      y
                      }
    
    myData$relative <- do.call(c,tapply(myData$value, myData$year,f1) )
    Last edited by bestbird7788; 05-30-2012 at 03:17 AM.

  9. #9
    Phineas Packard
    Points: 16,013, Level: 81
    Level completed: 33%, Points required for next Level: 337
    Lazar's Avatar
    Location
    Sydney
    Posts
    1,159
    Thanks
    198
    Thanked 336 Times in 299 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    I will try to get to all your questions but just quickly, I decided against spliting the data at first because I thought in your case it was easier to just work with the whole data set. It is still identical to what you wanted in the first place all that you need to do is split the data at the end if you so desire. i.e. run all the script I wrote and then
    myDataList <- split(myData, myData$year).


    this gives:
    Code: 
    $`2009`
      product year value R relative
    1       1 2009  1104 1        0
    2       2 2009   608 2     -496
    3       3 2009   606 3       -2
    
    $`2010`
      product year value R relative
    4       1 2010  1504 1        0
    5       2 2010   508 3     -996
    6       3 2010  1312 2      804
    Which is what you want right?

  10. #10
    Phineas Packard
    Points: 16,013, Level: 81
    Level completed: 33%, Points required for next Level: 337
    Lazar's Avatar
    Location
    Sydney
    Posts
    1,159
    Thanks
    198
    Thanked 336 Times in 299 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    1. how to tread intersect of uncertain years.
    Actually the number of years is uncertain, It maybe many years, not just 2 years. so I cant just use myDataTops[[1]] and myDataTops[[2]] in intersect function.
    Do you mean all the combinations? In other words the intersect between 2009 and 2010, 2009 and 2011, and 2010 and 2011? If this is the case then:
    Code: 
    #Gives all the possible combinations between the years
    combs <- expand.grid(1:length(myDataTops), 1:length(myDataTops))
    #Gets rid of comparisons with own year
    combs <- combs[!(combs[,1] == combs[,2]),]
    #Gets rid of duplicate combinations
    combs <-   combs[!duplicated(t(apply(combs, 1, sort))), ]
    
    sapply(1:nrow(combs), function(x){intersect(myDataTops[[combs[x,1]]]$product, 
                                                        myDataTops[[combs[x,2]]]$product)
                                                       }
              )
    Will give you what you want but the labelling is not really ideal of the output is not really clear.

  11. #11
    Points: 220, Level: 4
    Level completed: 40%, Points required for next Level: 30

    Posts
    17
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    thanks for quick reply.
    No, what I mean is not all the possible combinations.
    I mean there maybe many years, e.g. 2008,2009,2010,2012
    if I use intersect , I must code as below:
    result1<-intersect(myDataTops[[1]]$product, myDataTops[[2]]$product)
    result2<-intersect(result1, myDataTops[[3]]$product)
    result3<-intersect(result2, myDataTops[[4]]$product)

    to avoid this handword, I must use a for loop to tread random multiple years ( I have not studied loop statement)
    But loop statement seems a stupid way, there should be a single function or easy way to tread with thses situation, shouldn't it?

    Quote Originally Posted by Lazar View Post
    Do you mean all the combinations? In other words the intersect between 2009 and 2010, 2009 and 2011, and 2010 and 2011? If this is the case then:
    Code: 
    #Gives all the possible combinations between the years
    combs <- expand.grid(1:length(myDataTops), 1:length(myDataTops))
    #Gets rid of comparisons with own year
    combs <- combs[!(combs[,1] == combs[,2]),]
    #Gets rid of duplicate combinations
    combs <-   combs[!duplicated(t(apply(combs, 1, sort))), ]
    
    sapply(1:nrow(combs), function(x){intersect(myDataTops[[combs[x,1]]]$product, 
                                                        myDataTops[[combs[x,2]]]$product)
                                                       }
              )
    Will give you what you want but the labelling is not really ideal of the output is not really clear.

  12. #12
    Points: 220, Level: 4
    Level completed: 40%, Points required for next Level: 30

    Posts
    17
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    hi, Lazar
    thanks for your help. I think I didn't describe clearly and I make some wrong description in my reply, so you completely misread my intentions.
    The result you posted is just what I want. But I didn't just want a final result, I want a natural( I mean thinking things directly and simplely ) and easy way to analyse data for a analyst. I think these steps is more natural ("Part" 1 in your reply):
    1. split data to some groups by year.
    2. get rank in each group. ( for further analysis, I can run computations like max, min based on these step).
    3. get top 2 in each group. ( so I can observe importent data by group, It's more clear than a non-splited data).
    4. get intersect of every group. here I solved problem:the product whose annual sales values are all among the top 2.

    next, I want to compute relative difference of every record in each year( "part2" in your reply) , I found splited data still there in step1 above , so I can make a further compution based on step1 above( need not rewrite it, It seems call "code reuse"?). the nature thought is to directly add a relative value in each group based on step1 above , and not to start from non-splited original data.
    did I make it clear? Sorry for my poor expression ability.


    Quote Originally Posted by Lazar View Post
    I will try to get to all your questions but just quickly, I decided against spliting the data at first because I thought in your case it was easier to just work with the whole data set. It is still identical to what you wanted in the first place all that you need to do is split the data at the end if you so desire. i.e. run all the script I wrote and then
    myDataList <- split(myData, myData$year).


    this gives:
    Code: 
    $`2009`
      product year value R relative
    1       1 2009  1104 1        0
    2       2 2009   608 2     -496
    3       3 2009   606 3       -2
    
    $`2010`
      product year value R relative
    4       1 2010  1504 1        0
    5       2 2010   508 3     -996
    6       3 2010  1312 2      804
    Which is what you want right?

  13. #13
    Phineas Packard
    Points: 16,013, Level: 81
    Level completed: 33%, Points required for next Level: 337
    Lazar's Avatar
    Location
    Sydney
    Posts
    1,159
    Thanks
    198
    Thanked 336 Times in 299 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?

    You can do it your way but it is very hard to effeciently define new varaibles when you have your dataframes in a list. My understanding is that it requires a for loop to do so. Surely the most natural way to analyses the data is that which gives you your answer efficently?

  14. The Following User Says Thank You to Lazar For This Useful Post:

    bestbird7788 (06-01-2012)

  15. #14
    Points: 220, Level: 4
    Level completed: 40%, Points required for next Level: 30

    Posts
    17
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Could anyone recommend an interactive application for mass data analysis?


    Thanks for your professional attitude
    The most natural way to analyses the data is ideal, but I shuold give it up if It's very hard to write the code. If I change the steps as below, Can It make the code more easy than my original steps?
    1. copy the original data: myDataRank<-myData
    2. add a 0 value column (named R): myDataRank$R<-0
    3. split data by year: split(myDataRank,myData$year)
    4. get rank in each group, assign rank to column R.
    5. get top 2 in each group
    6. get intersect of every group. It may as well do not use a loop here.
    please give some tips how to write code for step 4,5,6

    next
    1. copy the original data: myDataRelative<-myData
    2. add a 0 value column (named relative): myDataRelative$relative<-0
    3. split data by year: split(myDataRank,myData$year)
    4. compute relative difference,assign it to column relative.
    please give some tips how to write code for step 4

    The more close to natural, the better, if code is not very hard.
    You may take notice of that is my way.
    I'm sorry to bring your some trouble, hopes not bother you.


    Quote Originally Posted by Lazar View Post
    You can do it your way but it is very hard to effeciently define new varaibles when you have your dataframes in a list. My understanding is that it requires a for loop to do so. Surely the most natural way to analyses the data is that which gives you your answer efficently?

+ Reply to Thread

           




Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats