Hi there,
I need help predicting near-future values from unevenly sampled time-series data. The data is collected as events and then converted to a time series. I have tried a few approaches, none of which have been successful. Please let me know if there is anything I can do to make them work, or if there are better solutions.
Background
- A sensor is placed on the door of a refrigerator
- Data: opening and closing of the door (door open = 1, door closed = 0), along with a timestamp for each event
- About 28,000 events are stored in a database
- Data is collected only when the door opens, so it is essentially a series of open-close pairs (1010101010101010...)
Objective
To understand the patterns in the duration of each opening, and to predict significantly long openings of the refrigerator in the near future.
Modifying the data
- The duration of each opening is calculated from the timestamps
- Openings shorter than 1 minute were eliminated
- The remaining openings make up around 8% of the data
- Long openings are rare and hence are outliers, so this is an outlier duration prediction problem where X = time of opening and Y = corresponding duration. The aim is to build a method that can predict Y for a given X.
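For concreteness, the preprocessing steps above could be sketched roughly as follows. This is only an illustration, not my actual code: it assumes the events are available as time-sorted `(timestamp, state)` tuples, and the function name `opening_durations` and the sample events are made up.

```python
from datetime import datetime, timedelta

def opening_durations(events, min_duration=timedelta(minutes=1)):
    """Pair each open (1) event with the next close (0) event.

    `events` is an iterable of (timestamp, state) tuples sorted by time
    (an assumed layout). Returns a list of (open_time, duration) tuples
    for openings lasting at least `min_duration`.
    """
    result = []
    open_time = None
    for ts, state in events:
        if state == 1:
            open_time = ts  # remember when the door opened
        elif state == 0 and open_time is not None:
            duration = ts - open_time
            if duration >= min_duration:
                result.append((open_time, duration))
            open_time = None  # wait for the next open event
    return result

# Made-up sample events, for illustration only
events = [
    (datetime(2020, 1, 1, 8, 0, 0), 1),
    (datetime(2020, 1, 1, 8, 0, 20), 0),   # 20-second opening, dropped
    (datetime(2020, 1, 1, 12, 30, 0), 1),
    (datetime(2020, 1, 1, 12, 33, 0), 0),  # 3-minute opening, kept
]
long_openings = opening_durations(events)
```

Each entry of `long_openings` is then one (X, Y) pair in the sense described above: the time of opening and the corresponding duration.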
Approaches already tried unsuccessfully
- Function/polynomial approximation - Approximating a function f(X) = Y doesn't work in this scenario, because the final result has to be translated back to event data, and predicting future values would require a very high-degree polynomial
- Autoregressive and moving average models - These assume regularly spaced samples, whereas we are dealing with irregular time samples here
- Making the time series uniform by inserting a zero for every minute and the outlier value at the minute where it occurs - This gets very heavy on memory and does not scale in the long run
- Treating the data as a plain series, e.g., as in stock price prediction - In this case, the time of occurrence is as important as the occurrence itself, so it is not possible to discount that factor
- Outlier detection techniques - These won't work, as we have already detected the outliers and are only aiming to predict them
- Modelling the events as a Poisson process - As far as I can tell, this can only be used for detecting patterns, not for predicting them
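To make the memory concern with the zero-filled series concrete, here is a rough sketch of how such a per-minute grid could be built. It assumes the long openings are `(open_time, duration)` tuples and that the duration is stored at the minute in which the opening started. The function name `minute_grid` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

def minute_grid(long_openings, start, end):
    """Build a dense per-minute series between `start` and `end`.

    Every minute is 0.0 except the minute in which a long opening
    started, which holds that opening's duration in seconds.
    """
    n_minutes = int((end - start).total_seconds() // 60)
    series = [0.0] * n_minutes
    for open_time, duration in long_openings:
        idx = int((open_time - start).total_seconds() // 60)
        if 0 <= idx < n_minutes:
            series[idx] = duration.total_seconds()
    return series

# Made-up long openings over a single day, for illustration only
start = datetime(2020, 1, 1)
end = datetime(2020, 1, 2)
long_openings = [
    (datetime(2020, 1, 1, 12, 30), timedelta(minutes=3)),
    (datetime(2020, 1, 1, 19, 5), timedelta(minutes=2)),
]
series = minute_grid(long_openings, start, end)
nonzero = sum(1 for v in series if v)
```

A single day already needs 1,440 entries, almost all of them zero, and the series grows with the length of the observation window rather than with the number of long openings.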
I am currently considering neural networks (NNs), support vector machines (SVMs) and wavelet analysis, but I am not sure whether they would work. I am also reading some material on network security, as this problem is similar to predicting the next security breach from network log data.
I could really use some ideas or any relevant papers on this. Any help is much appreciated.