Help on nested for loop please!!!

tobi

New Member
#1
Dear R experts,
I am struggling with my raw data. I am trying to filter it with a nested loop, but the loop has been running for days. I think my loop is not optimal.

I have two data frames. One is a list of event dates for 100 companies over 10 years (6,796 obs), as in this photo:


The other is the list of trade dates of those 100 firms (19,523 obs). In this data set there are two variables that are currently NA (e_windows and e_dates), which I want to fill in after filtering the data.


My goal is to find every instrds$Trade.date that falls 1 to 40 days before an events$Date for the same company, then store the difference in days in instrds$e_windows and the matching event date in instrds$e_dates.

The code I used for this is
Code:
for (i in 1:nrow(events)) {
  for (j in 1:nrow(instrds)) {
    # days from the trade to the event (positive when the trade came first)
    gap = events$Date[i] - instrds$Trade.date[j]
    if (gap > 0 && gap <= 40 && instrds$Company[j] == events$Ticker[i]) {
      instrds$e_windows[j] = gap
      instrds$e_dates[j] = events$Date[i]
    }
  }
}
However, it has been taking too long to finish.
Could you please tell me if there is a faster way to do this?

Thanks in advance,
tobi
 
#2
From what you have posted, I cannot tell whether your algorithm could be shortened.

Typical approaches to speed up R code are:

  • Avoid explicit R loops by using apply and its derivatives
  • Do not concatenate onto data inside a loop (allocate the whole result before you start looping, then assign into the already defined empty data inside the loop)
  • Use vector or matrix data types instead of data.frame
  • Use compile from package compiler
  • Use a foreach loop (package foreach) to gain multi-core computing
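
As a rough illustration of the second point, here is a minimal sketch comparing a result that grows inside the loop with one that is allocated up front. The names n, grown, filled and vectorized are made up for this example and do not come from the original data.

```r
# Hypothetical illustration: growing a vector vs. filling a preallocated one.
n <- 10000

# Slow pattern: the result is copied and extended on every iteration.
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Faster pattern: allocate the full result once, then assign by index.
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

# Vectorized pattern: no explicit loop at all.
vectorized <- seq_len(n)^2
```

All three give the same values; only the amount of copying differs, which is what tends to matter once n gets large.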

Consuli
 

bryangoodrich

Probably A Mammal
#3
Your explanation needs more clarity, as I don't know precisely what you're trying to do. In any case, it doesn't look like you need for loops at all. R is a vectorized language that provides the means to extract and assign values in data frames ("tables") by operating on entire columns (vectors) of data. There is no reason to then go row-by-row to check something and do something. You want to abstract to what you're doing to the entire column vector as a whole.
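
To make "operating on entire columns" concrete, here is a small sketch on made-up data. The column names follow the first post, but the values and the single event_date are assumptions for illustration only.

```r
# Hypothetical toy data; column names follow the first post.
instrds <- data.frame(
  Company    = c("AAA", "AAA", "BBB"),
  Trade.date = as.Date(c("2010-01-05", "2010-03-01", "2010-01-20"))
)
event_date <- as.Date("2010-01-25")  # one assumed event date

# A single subtraction yields the gap in days for every row at once.
gap <- as.numeric(event_date - instrds$Trade.date)

# A logical vector marks the rows that fall 1 to 40 days before the event.
in_window <- gap > 0 & gap <= 40

# Assign to the whole column in one step; rows outside the window stay NA.
instrds$e_windows <- ifelse(in_window, gap, NA)
```

Nothing here loops over rows: the subtraction, the comparison and the assignment each act on the full column.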

To help you, I suggest you shrink your problem set to something that demonstrates what you are trying to do. We don't have your data, but you can make a smaller version of your problem set. You can output it and provide that here by using the dput function (dput your data frame and provide us the output). Show us what a simplified version of your problem looks like and what you expect the output from that to look like. In doing that, you may yourself better understand your own problem, so it is good practice to do, regardless.
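
For example, a shrunken data set could be shared along these lines. The data frame small and its values are invented here; only dput() itself is the point.

```r
# Hypothetical two-row sample standing in for the real data.
small <- data.frame(
  ticker    = c("AAA", "BBB"),
  tradedate = as.Date(c("2010-01-05", "2010-01-20"))
)

# dput() prints an R expression that recreates the object exactly,
# so others can paste it into their own session.
dput(small)
```

Pasting the printed expression after `small <-` in a fresh session rebuilds the same data frame, including the Date class.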
 

tobi

New Member
#4
Thank you guys for your detailed comments. I am sorry that my poor explanation made the problem hard to understand.
I have found a solution myself, but it still uses loops. As you said, loops should be avoided to speed up R code, so I hope there is a better option for my data preparation.

To make my problem clear, I would like to describe it again (I created a small sample in Excel so it is easy to inspect):
I have two datasets, Data1 and Data2. Data1 has 2 variables, 'ticker' and 'tradedate'. Data2 has 2 variables, 'Ticker' and 'Date'.
I want to find all the dates in Data1$tradedate that fall 1 to 40 days before a Data2$Date (with the condition that the value in Data1$ticker matches Data2$Ticker). If these conditions are satisfied, fill Data1$ewindow with the difference in days and Data1$edate with the date value from Data2$Date. The final data needed is the data frame Data1.
I have 25,000 obs in Data1 and 8,000 obs in Data2. It would take time if I don't use loops. I though

This is my data


This is the code I use:
Code:
for(i in 1:nrow(Data2)) {
  # first and last Data1 row of this ticker (assumes each ticker's rows are contiguous)
  n = min(which(Data1$ticker == Data2$Ticker[i]))
  m = n-1 + sum(Data1$ticker == Data2$Ticker[i])
  for(j in n:m) {
    # keep trades made 1 to 40 days before the event
    if (Data2$Date[i] - Data1$tradedate[j] > 0 && Data2$Date[i] - Data1$tradedate[j] <= 40){
      Data1$ewindows[j] = Data2$Date[i] - Data1$tradedate[j]; Data1$edate[j] = Data2$Date[i]
    }
  }
}
This is my result after running the code.



This solution might not be optimal, so if you have any ideas, please help.
Thank you so much
 
#5
There is no reason to then go row-by-row to check something and do something. You want to abstract to what you're doing to the entire column vector as a whole.
True.
And when you have several columns or an output list, you use apply() and its derivatives to iterate over them. In R, loop iteration is usually only required when something has to be performed conditionally.
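
A minimal sketch of that idea, using an invented data frame prices (not the poster's data):

```r
# Hypothetical data frame with several numeric columns.
prices <- data.frame(
  open   = c(10, 12, 11),
  close  = c(11, 11.5, 12),
  volume = c(100, 150, 120)
)

# sapply() calls the function once per column and collects the results.
col_means <- sapply(prices, mean)

# vapply() does the same but checks that each result is a single number.
col_ranges <- vapply(prices, function(x) max(x) - min(x), numeric(1))
```

Both calls return a named numeric vector with one entry per column, so no index bookkeeping is needed.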

Show us what a simplified version of your problem looks like and what you expect the output from that to look like. In doing that, you may yourself better understand your own problem, so it is good practice to do, regardless.
Exactly.
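
For what it is worth, a simplified version of the problem from post #4 might be sketched without explicit loops as below. The toy values, the helper column row and the data frames pairs and hits are assumptions made for this example. The tie-breaking rule (keep the latest event date when a trade matches several events) is also an assumption; the loop in post #4 keeps whichever matching event comes last in Data2.

```r
# Hypothetical toy data shaped like the description in post #4.
Data1 <- data.frame(
  ticker    = c("AAA", "AAA", "BBB", "BBB"),
  tradedate = as.Date(c("2010-01-05", "2010-06-01", "2010-02-10", "2010-02-27"))
)
Data2 <- data.frame(
  Ticker = c("AAA", "BBB"),
  Date   = as.Date(c("2010-01-25", "2010-03-01"))
)

Data1$row <- seq_len(nrow(Data1))  # remember the original row positions

# Pair every trade with every event of the same ticker.
pairs <- merge(Data1, Data2, by.x = "ticker", by.y = "Ticker")
pairs$gap <- as.numeric(pairs$Date - pairs$tradedate)

# Keep only trades made 1 to 40 days before the event.
hits <- pairs[pairs$gap > 0 & pairs$gap <= 40, ]

# Assumption: when a trade matches several events, keep the latest event date.
hits <- hits[order(hits$row, hits$Date), ]
hits <- hits[!duplicated(hits$row, fromLast = TRUE), ]

# Fill the two result columns in one vectorized assignment each.
Data1$ewindow <- NA_real_
Data1$edate   <- as.Date(NA)
Data1$ewindow[hits$row] <- hits$gap
Data1$edate[hits$row]   <- hits$Date
Data1$row <- NULL
```

The merge builds one row per trade/event pair within a ticker, so it trades memory for speed; with roughly 100 tickers that intermediate table should stay manageable, but that depends on how the rows are spread across tickers.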