Sampling methods: which is more appropriate?

#1
I have been collecting temperature data in an experiment for the last couple of months, and I'm interested in the rate of temperature change. The total amount of data is extremely large and takes a long time to process, even using multi-threaded code. Currently I'm doing option 1, but I'm wondering whether option 2 would be better.

Option 1: For each day, select a random starting point and a random duration for the measurement, record the temperature change over that window, and keep looping over all the days until the designated total number of samples, x, is reached. x is usually a couple of thousand.
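Roughly what I mean by option 1, as a C# sketch. The data layout, the MakeFakeDays generator, and the TempChange helper are made-up stand-ins for my real dataset and processing code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Option1Sketch
{
    // Stand-in for the real dataset: per-day readings as
    // (secondsIntoDay, temperature) pairs, filled with synthetic values.
    static List<List<(double t, double temp)>> MakeFakeDays(int nDays, Random rng)
    {
        var days = new List<List<(double t, double temp)>>();
        for (int d = 0; d < nDays; d++)
        {
            var readings = new List<(double t, double temp)>();
            double temp = 20.0;
            for (double t = 0; t < 86400; t += 1 + rng.NextDouble() * 59)
            {
                temp += (rng.NextDouble() - 0.5) * 0.02;
                readings.Add((t, temp));
            }
            days.Add(readings);
        }
        return days;
    }

    // Temperature change over [start, start + duration]:
    // last reading in the window minus the first.
    static double? TempChange(List<(double t, double temp)> day, double start, double duration)
    {
        var window = day.Where(r => r.t >= start && r.t <= start + duration).ToList();
        return window.Count >= 2 ? window[^1].temp - window[0].temp : (double?)null;
    }

    static void Main()
    {
        var rng = new Random(1);
        var days = MakeFakeDays(90, rng);
        var samples = new List<double>();
        const int x = 2000; // designated total number of samples

        // Option 1: one random window per day, cycling over all days
        // until x samples have been collected.
        while (samples.Count < x)
            foreach (var day in days)
            {
                if (samples.Count >= x) break;
                double start = rng.NextDouble() * 86000;       // random start (s into day)
                double duration = 10 + rng.NextDouble() * 590; // random length, 10 s .. 10 min
                if (TempChange(day, start, duration) is double dT) samples.Add(dT);
            }

        Console.WriteLine($"collected {samples.Count} samples, mean change = {samples.Average():F4} C");
    }
}
```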

Option 2: Select a day at random, select a random starting point within it, and record y samples from that point, each of varying length. A new day is then chosen, and so on until the total number of samples >= x.
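And option 2, reusing MakeFakeDays and TempChange from the sketch above (again, names and layout are made up). The whole group of y windows touches only one day's data, which is where the speed-up comes from:

```csharp
var rng = new Random(1);
var days = MakeFakeDays(90, rng);
var samples = new List<double>();
const int x = 2000; // target total number of samples
const int y = 50;   // samples recorded per selected day

// Option 2: pick a random day, pick one random starting point in it,
// then record y windows of varying length from that same point.
while (samples.Count < x)
{
    var day = days[rng.Next(days.Count)];    // new random day per group
    double start = rng.NextDouble() * 86000; // one shared start for the group
    for (int i = 0; i < y && samples.Count < x; i++)
    {
        double duration = 10 + rng.NextDouble() * 590; // only the length varies
        if (TempChange(day, start, duration) is double dT) samples.Add(dT);
    }
}
```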

Option 2 would dramatically speed up the operation. How would you go about assessing whether y is too large, though? I'm assuming that if x = 1000 and y = 500, I wouldn't have a particularly accurate representation of my data, since I would only be using 2 days out of several months. I'm also assuming that keeping the same initial point for each group of y samples may be detrimental?
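If I understand cluster sampling right, one textbook way to put a number on that worry is the design effect, DEFF = 1 + (y - 1) * rho, where rho is the correlation between samples taken within the same day; the effective sample size is then x / DEFF:

```csharp
// Back-of-envelope check using the textbook cluster-sampling design
// effect: DEFF = 1 + (y - 1) * rho, effective sample size = x / DEFF.
static double EffectiveSampleSize(int x, int y, double rho)
    => x / (1.0 + (y - 1) * rho);

// x = 1000, y = 500: even a modest rho of 0.1 gives
// 1000 / (1 + 499 * 0.1) ~= 20 effective samples, which matches the
// "only 2 days out of several months" worry above.
```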

EDIT: Temperature is measured each time there is a 0.01 °C change, so a varying number of data points can arrive in any given second of elapsed time. The measurements come from my parents' greenhouse; I figured it would be a good opportunity to dabble in some time series analysis / C#, and my parents already had a suitable thermometer.
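Since the readings are change-triggered rather than clock-triggered, the local rate of change between two consecutive readings is just the (roughly) 0.01 °C step divided by the elapsed time between them. A sketch, with the data layout assumed and the usings as in the first sketch:

```csharp
// Change-triggered readings: each new reading means the temperature
// moved by about 0.01 C, so the local rate is that step over the gap.
static IEnumerable<double> RatesPerSecond(IReadOnlyList<(DateTime when, double temp)> readings)
{
    for (int i = 1; i < readings.Count; i++)
    {
        double dT = readings[i].temp - readings[i - 1].temp;          // ~ +/-0.01 C
        double dt = (readings[i].when - readings[i - 1].when).TotalSeconds;
        if (dt > 0) yield return dT / dt;                             // C per second
    }
}
```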
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Do you want to provide a little more information about the study and how often the temperature is being measured? That way we can better understand the context and whether either option has greater limitations.