I have been collecting temperature data from an experiment for the last couple of months, and I'm interested in the rate of temperature change. The total amount of data is extremely large and takes a long time to process, even with multi-threaded code. Currently I'm using option 1, but I'm wondering if option 2 would be better.


**Option 1:** Select a random sample each day and a random duration for the measurement, record the temperature change, and loop over all the days until the designated total number of samples, x, is reached. x is usually a couple of thousand.

**Option 2:** Select a day at random, select a random starting point on that day, and record y samples, each of varying length. A new day is then chosen until the total number of samples >= x.

Option 2 would dramatically speed up the operation. How would you go about assessing whether y is too large, though? I'm assuming that if x = 1000 and y = 500, I wouldn't have a particularly accurate representation of my data, since I would only be using 2 days out of several months. I'm also assuming that sharing the same initial point across each group of y samples may be detrimental.
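For concreteness, here is roughly how I'd sketch the two schemes (in Python rather than C#, just to keep the illustration short; the per-day lists of readings and the uniform random choices are stand-ins for the real data and sampling details):

```python
import random

# Hypothetical setup: days[d] is the list of (timestamp, temp)
# readings recorded on day d. Fake data standing in for ~3 months.
random.seed(42)
days = [[(t, 20.0 + random.gauss(0, 0.5)) for t in range(1000)]
        for _ in range(90)]

def option1(days, x):
    """One random (day, start, duration) per sample; every sample
    may come from a different day."""
    samples = []
    for _ in range(x):
        day = random.choice(days)
        start = random.randrange(len(day) - 1)
        end = random.randrange(start + 1, len(day))
        # record (temperature change, elapsed time) for this sample
        samples.append((day[end][1] - day[start][1],
                        day[end][0] - day[start][0]))
    return samples

def option2(days, x, y):
    """Pick a day, draw y samples sharing one random start point,
    then move to a new day until at least x samples are collected."""
    samples, days_used = [], 0
    while len(samples) < x:
        days_used += 1
        day = random.choice(days)
        start = random.randrange(len(day) - 1)  # shared start point
        for _ in range(y):
            end = random.randrange(start + 1, len(day))
            samples.append((day[end][1] - day[start][1],
                            day[end][0] - day[start][0]))
    return samples, days_used
```

Counting `days_used` makes the concern explicit: with x = 1000 and y = 500, option 2 only ever touches 2 of the 90 days.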

**EDIT:** The temperature is measured each time there is a 0.01 degree C change, so a varying number of data points can arrive per second. The measurements are taken from my parents' greenhouse; I figured it would be a good opportunity to dabble in some time series analysis / C#, and my parents already had a suitable thermometer.
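Since the readings are event-triggered, the timestamps are irregular, and the rate of change between consecutive readings is just the temperature step divided by the time gap. A minimal sketch (Python; the list of (seconds, temp) pairs is a made-up stand-in for the real log format):

```python
# Hypothetical event log: (seconds_since_start, temp_c) pairs,
# appended whenever the temperature moves by 0.01 C.
events = [(0.0, 20.00), (2.5, 20.01), (3.1, 20.02), (10.4, 20.01)]

def rates(events):
    """Rate of change (C per second) between consecutive
    event-triggered readings, paired with the time gap."""
    return [(t1 - t0, (c1 - c0) / (t1 - t0))
            for (t0, c0), (t1, c1) in zip(events, events[1:])]
```

Because each step is (up to noise) plus or minus 0.01 C, the rate is effectively 0.01/Δt with a sign, so short gaps between events directly indicate fast temperature change.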