Divide tick data into intervals

#1
Hello,

I'm currently writing my bachelor thesis in statistical finance and I have run into a small problem. I want to evaluate forecasts from my GARCH model against realized intraday volatility. The intraday data is tick data over a certain period. The date column holds timestamps such as 2011-11-01 09:24:41, and the other column holds the stock price at that same time. What I want is the closing price of certain time intervals, for example the closing price of every five-minute or ten-minute interval in the sample. In other words, I want to transform the tick data into k-minute-interval data.

I have been trying to do this in the following way:

The data has been converted to a time series and looks like this:

price
2011-11-01 08:00:00 0.000000000
2011-11-01 08:00:00 0.000000000
2011-11-01 08:02:00 0.000000000
2011-11-01 08:03:00 -0.017033339
2011-11-01 08:24:00 0.000000000
2011-11-01 08:24:00 0.000000000
2011-11-01 08:29:00 0.000000000
2011-11-01 08:29:00 0.000000000
2011-11-01 08:29:00 0.000000000
2011-11-01 08:29:00 0.000000000
2011-11-01 08:29:00 0.002166062
2011-11-01 08:44:00 0.000000000
2011-11-01 08:44:00 -0.002166062
2011-11-01 08:44:00 0.004321374
2011-11-01 10:36:00 0.010618976
2011-11-01 15:59:00 0.002092990
2011-11-01 16:21:00 0.000000000
2011-11-01 16:30:00 0.004155960
2011-11-02 08:00:00 0.000000000
2011-11-02 11:50:00 0.000000000
2011-11-02 13:38:00 -0.002073009

and so on for 108 days (this stock is a small-cap company, hence the infrequent trading)...

Then I construct a sequence of trading days, leaving out all days on which there is no trade:

# Define m as the given days to loop through
start <- as.Date("2011-11-01")
end <- as.Date("2012-04-11")

# define a list of holidays, plus days when there is no trading in the stock
holidays <- as.Date(c("2011-12-26","2012-01-06","2012-04-06","2012-04-09",
                      "2012-01-03","2012-01-12","2012-01-16","2012-01-26","2012-03-23"))

# create a sequence of dates
allDates <- seq(start, end, by = "day")

# remove weekends
dayofweek <- as.POSIXlt(allDates)$wday
isweekend <- dayofweek==0L | dayofweek==6L
allDates <- allDates[!isweekend]

# delete holidays (%in% also behaves sensibly if a holiday is not in allDates)
allDates <- allDates[!allDates %in% holidays]

I then checked that both this sequence and my time series contain 108 days, and they do. After that I try to run this loop:

# Loop through all days
for(i in allDates){

  # Take the previous tick for the given interval k
  aggregatePrice(t, k=60,
                 marketopen="09:00:00", marketclose="17:30:00")
}

The aggregatePrice command only works for one day at a time, which is why I'm using for(): I want to run it for every day in the time series.

But every time I get this message:

Error in `[.xts`(t, i) : subscript out of bounds

I ran traceback() and received:

7: stop("subscript out of bounds")
6: `[.xts`(t, i)
5: t
4: is.data.frame(x)
3: colnames(a)
2: dataformatc(ts)
1: aggregatePrice(t, k = 60, marketopen = "09:00:00", marketclose = "17:30:00") at #4

I have been trying to understand why this happens and to find a remedy, but so far I haven't been able to solve the problem...

Could someone help me, please?
 

Jake

Cookie Scientist
#2
It's happening because of the for loop where you write:
Code:
for(i in allDates)
Here you are using the actual elements of allDates as the index. That doesn't work because allDates is a collection of Date objects: the loop hands you each date as its underlying number of days since 1970-01-01, which is evidently larger than the number of rows in t. Replacing the above with
Code:
for(i in seq_along(allDates))
should get you what you want (or at least fix the error message you brought up here).
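
To see the difference, here is a minimal base-R sketch. The three dates are made up for illustration and are not your data; the variable names direct and days are just placeholders. It shows that looping over a Date vector directly yields plain numbers, while looping over positions lets you pull each Date back out.

```r
# Illustrative dates only; not the poster's data
allDates <- as.Date(c("2011-11-01", "2011-11-02", "2011-11-03"))

# Looping over the vector itself drops the Date class:
# each i is the number of days since 1970-01-01
direct <- c()
for (i in allDates) direct <- c(direct, i)
class(direct)

# Looping over positions keeps the Date objects accessible
days <- character(0)
for (i in seq_along(allDates)) {
  days <- c(days, format(allDates[i]))
}
days
```

Inside the second loop, allDates[i] is still a proper Date, so it can be formatted or used to select the matching day.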

For future reference, it is preferred that when posting code you wrap it in code tags: [code]like this[/code]
 
#3
Thanks for your answer... I just started using R, which is why I may have some silly questions. I tried your suggestion and, as you said, the error message vanished. However, I didn't get the "right" output.

Code:
# Loop through all days
for(i in seq_along(m)){

  # Take the previous tick for the given interval k
  result <- aggregatePrice(t[i], k=60,
                           marketopen="09:00:00", marketclose="17:30:00")
}

result
                        [,1]
2011-11-28 09:00:00 0.014582
2011-11-28 09:00:00 0.014582
2011-11-28 17:30:00 0.014582
But when I use this function for just one day, I get:

Code:
aggregatePrice(t["2011-11-02"], k=60, marketopen="09:00:00", marketclose="17:30:00")
                            [,1]
2011-11-02 09:00:00  0.000000000
2011-11-02 09:00:00  0.000000000
2011-11-02 10:00:00  0.000000000
2011-11-02 11:00:00  0.000000000
2011-11-02 12:00:00  0.000000000
2011-11-02 13:00:00  0.000000000
2011-11-02 14:00:00 -0.002073009
2011-11-02 15:00:00  0.000000000
2011-11-02 16:00:00  0.000000000
2011-11-02 17:00:00  0.014443428
2011-11-02 17:30:00  0.014443428
As you can see, when I take just one single day the function works: I get the last tick price for every hour (k=60). I want the loop to return a series with the last tick of every hour for all days in the original time series. Do you have any idea?
 

Jake

Cookie Scientist
#4
Try defining "result" as a list outside of the for loop, and then storing the returned value of each call to aggregatePrice() in result[[i]]. What you are doing now is recreating "result" in each iteration of the loop, so you only end up retaining the value from the very last call to aggregatePrice(). Make sense?
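
A minimal sketch of that pattern in base R, under some assumptions: fake_aggregate() is a hypothetical stand-in for aggregatePrice() so the example runs without your data, and the dates are made up.

```r
# Hypothetical stand-in for aggregatePrice(): returns a small per-day result
fake_aggregate <- function(day) {
  data.frame(day = day, value = seq_len(2))
}

# Illustrative dates only
allDates <- as.Date(c("2011-11-01", "2011-11-02", "2011-11-03"))

# Create the list once, outside the loop, with one slot per day
result <- vector("list", length(allDates))

for (i in seq_along(allDates)) {
  # [[i]] stores the whole returned object in slot i,
  # so earlier days are not overwritten
  result[[i]] <- fake_aggregate(allDates[i])
}

length(result)
```

In your real loop you would presumably also want to hand aggregatePrice() one whole day at a time, the way your single-day call t["2011-11-02"] does, since t[i] with an integer i picks a single row of an xts object rather than a day.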
 
#5
Well, I think so...

Something like this?

Code:
result <- rep(NA, 1188)
There should be 1188 observations in the new data when the interval, k, is set to 60.

However, I get this error message:

Code:
Error in h[[i]] <- aggregatePrice(t[i], k = 60, marketopen = "09:00:00",  : 
  more elements supplied than there are to replace
Again, thanks for your help!
 

Jake

Cookie Scientist
#6
Try setting up result as a list (not a vector) like this:
Code:
result <- list()
Then after the results are stored you will probably want to collapse the list down into a data.frame or matrix.
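
One common way to do that collapse is do.call(rbind, result). The sketch below uses two small made-up data.frames as hypothetical per-day results, not real output from aggregatePrice().

```r
# Hypothetical per-day results standing in for aggregatePrice() output;
# the times and prices are invented for illustration
result <- list(
  data.frame(time  = c("2011-11-01 09:00:00", "2011-11-01 10:00:00"),
             price = c(0.000, 0.002)),
  data.frame(time  = c("2011-11-02 09:00:00", "2011-11-02 10:00:00"),
             price = c(0.001, -0.002))
)

# Stack all per-day pieces row-wise into one data.frame
combined <- do.call(rbind, result)
nrow(combined)
```

The same call should also work when the list elements are xts objects, since rbind dispatches on the class of its arguments, but I haven't checked that against your data.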