How long is data still useful for?

noetsi

Fortran must die
#1
I commonly read studies based on federal data sets that might be 30 plus years old. Something I had not thought of is, are the results in the study still valid enough to use? I know there is no definite answer to this question, but I wanted people's thoughts.
 

Miner

TS Contributor
#2
This would depend on the rate of change in the measure of interest. For example, geological data for a given region may be highly relevant even though it is 30 years old. On the other hand, demographic data for an area would only be relevant if you were studying the change in demographics over time. Using 30-year-old demographic data to make assertions about the present would be unwise.
 

maartenbuis

TS Contributor
#3
The oldest data I have used was 52 years old at that time together with younger datasets. I looked at long-term trends in the association between parental education and the education of their offspring. So that fits Miner's caveat that within the social sciences those older data are mainly useful for trend analysis.

The real challenge is finding such data and converting it to a format that is readable by today's computers. I remember a story where my advisor was really excited that he had found another old dataset, only to find out that it was stored on (cardboard) punch cards kept in a cellar infested with mice...
 
#4
"I commonly read studies based on federal data sets that might be 30 plus years old. Something I had not thought of is, are the results in the study still valid enough to use?"
Of course, the empirical evidence is exactly the same today as it was then.

"The oldest data I have used was 52 years old at that time together with younger datasets."
We have all used Fisher's Iris data set, and that is from the 1930s.
 

rogojel

TS Contributor
#5
There is another caveat with older data: the measurement methodology might have changed over time, as well as the process, so the numbers might mean something completely different today even if the labels are the same. The precision might also be completely different, and so on. Even plotting the same kind of data from different data sources might be a problem, especially when looking at trends.

regards
 

maartenbuis

TS Contributor
#6
These were face-to-face interviews, so nothing spectacular changed with respect to the technology. The response rate has of course changed (dropped) quite a bit. What did change was the educational system, so some harmonization was required.

In practice, old datasets are a pain, so no one uses them by accident. If you choose old datasets, then you have done so consciously, for a reason, and are very well aware of the limitations.
 

noetsi

Fortran must die
#7
"There is another caveat with older data: the measurement methodology might have changed over time, as well as the process, so the numbers might mean something completely different today even if the labels are the same. The precision might also be completely different, and so on. Even plotting the same kind of data from different data sources might be a problem, especially when looking at trends.

regards"
That is why you read the methods section carefully :)
 

rogojel

TS Contributor
#8
IF they are available, and written carefully, of course :)

I was once given a file with all the changeover times of a large aggregate, carefully maintained and updated over a period of 5 years, several hundred entries in total. Every single number was either 30 or 55.
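A quick check for this kind of problem is to count the distinct values in a column before trusting it. The following is only a rough sketch: the function name, the thresholds, and the sample values are all made up for illustration (the real file is not available, and its units are not stated).

```python
from collections import Counter

def distinct_value_report(values, max_distinct=3, min_rows=30):
    """Flag a numeric column that holds suspiciously few distinct values.

    max_distinct and min_rows are arbitrary illustrative thresholds.
    """
    counts = Counter(values)
    return {
        "n": len(values),
        "distinct": len(counts),
        "counts": dict(counts),
        # Many rows but only a handful of distinct values suggests the
        # numbers were entered by rule of thumb rather than measured.
        "suspicious": len(values) >= min_rows and len(counts) <= max_distinct,
    }

# Hypothetical changeover times mimicking the file described above:
# several hundred entries, each either 30 or 55.
changeover_times = [30, 55, 30, 30, 55] * 60
report = distinct_value_report(changeover_times)
print(report["distinct"], report["suspicious"])
```

A flag from a check like this is only a prompt to ask how the values were recorded; some variables legitimately take only a few values.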
 

noetsi

Fortran must die
#9
The ones I work with, other than the ones I run myself, which are 99% of my data, come with voluminous methods sections. That is the federal approach. The trouble is understanding the endless notes.