Predicting near-future values from unevenly sampled time-series data

#1
Hi there,

I need help with predicting near-future values from unevenly sampled time-series data. The data is collected as events and then converted to a time series. I have tried a few approaches, none of which have been successful. Please let me know if there is anything I can do to make them work, and if there are any better solutions.

Background

  • A sensor is placed on the door of a refrigerator
  • Data: each opening and closing of the door (door open = 1, door closed = 0), along with a timestamp of the event
  • About 28,000 events are stored in a database
  • The data is collected only when the door opens. So essentially the data is a series of open-close pairs (1010101010101010...)

Objective

To understand the patterns in the duration of each opening and to predict significantly long openings of the refrigerator in the near future.

Modifying the data

  • The time duration for each opening is calculated using the timestamps
  • Openings shorter than 1 minute were eliminated
  • The remaining openings make up around 8% of the data
  • Long duration openings are rare and hence are outliers. We have an outlier duration prediction problem where X = time of opening and Y = corresponding duration. The aim is to build a method which can predict Y for a given X.
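
To make the preprocessing above concrete, here is a minimal sketch of how the durations could be computed and filtered. The function names (`opening_durations`, `long_openings`), the tuple layout and the sample timestamps are made up for illustration; the real events come from the database.

```python
from datetime import datetime, timedelta


def opening_durations(events):
    """Pair each open event (state 1) with the next close event (state 0).

    `events` is a list of (timestamp, state) tuples sorted by time.
    Returns a list of (open_time, duration) tuples.
    """
    openings = []
    open_time = None
    for ts, state in events:
        if state == 1:
            open_time = ts
        elif state == 0 and open_time is not None:
            openings.append((open_time, ts - open_time))
            open_time = None
    return openings


def long_openings(openings, min_duration=timedelta(minutes=1)):
    """Drop openings shorter than `min_duration` (1 minute, as in the post)."""
    return [(t, d) for t, d in openings if d >= min_duration]


# Hypothetical sample events, not real sensor data.
t0 = datetime(2024, 1, 1, 8, 0, 0)
sample_events = [
    (t0, 1),
    (t0 + timedelta(seconds=20), 0),
    (t0 + timedelta(minutes=30), 1),
    (t0 + timedelta(minutes=33), 0),
    (t0 + timedelta(hours=2), 1),
    (t0 + timedelta(hours=2, seconds=45), 0),
]

all_openings = opening_durations(sample_events)
kept = long_openings(all_openings)

# X = time of opening, Y = duration in seconds
for open_time, duration in kept:
    print(open_time.isoformat(), duration.total_seconds())
```

Each remaining pair is one (X, Y) sample in the sense described above.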

Approaches already tried unsuccessfully

  1. Function/polynomial approximation - Approximating a function f(X) = Y doesn’t work in this scenario, as the final result has to be translated back to event data, and predicting future values would need a very high-degree polynomial
  2. Autoregressive and moving average models - These assume regularly spaced samples, whereas we are dealing with irregular time samples here
  3. Making the time series uniform by inserting zeros for every minute, and the outlier durations at the minutes where they occur - It gets very heavy on memory, and this approach is not scalable in the long run
  4. Treating the data as a plain series, e.g. as in stock price prediction - In this case, the time of occurrence is as important as the occurrence itself, so it's not possible to discount that factor
  5. Outlier detection techniques - These techniques won’t work as we have already detected the outliers and we are only aiming to predict them
  6. Modelling the event as a Poisson process - can only be used for detecting patterns, cannot be used for predicting them
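
For reference, approach 3 looked roughly like the sketch below. This is a simplified illustration, not the exact code used: `minute_grid`, the one-week window and the two sample openings are assumptions. It shows why the dense grid is wasteful, since almost every cell is zero.

```python
from datetime import datetime, timedelta


def minute_grid(openings, start, end):
    """Build a dense per-minute series over [start, end).

    Every minute is 0.0 except the minute in which a long opening starts,
    which holds that opening's duration in seconds.
    """
    n_minutes = int((end - start).total_seconds() // 60)
    series = [0.0] * n_minutes
    for open_time, duration in openings:
        idx = int((open_time - start).total_seconds() // 60)
        if 0 <= idx < n_minutes:
            series[idx] = duration.total_seconds()
    return series


# Hypothetical one-week window with two long openings.
start = datetime(2024, 1, 1)
end = start + timedelta(days=7)
sample_openings = [
    (start + timedelta(hours=8, minutes=30), timedelta(minutes=3)),
    (start + timedelta(days=3, hours=19), timedelta(minutes=5)),
]

series = minute_grid(sample_openings, start, end)
nonzero = sum(1 for value in series if value != 0.0)
print(len(series), "cells,", nonzero, "non-zero")
```

The grid grows with the length of the observation window, not with the number of long openings, which is where the memory cost comes from.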

I am currently considering using NNs, SVMs and wavelet analysis. But I am not really sure if they would work. I am also reading some material on network security, as this problem is similar to predicting the next breach in security from network log data.

I could really use some ideas or pointers to relevant papers on this. Any help is much appreciated.