I am currently working on my master's thesis, but I am struggling a bit with how to prepare the data before I can analyze it.
Info on the data available:
15 years of historical data in separate files, one per year. Each year's file contains data on all of the firms that were operating that year. Each firm is identified by a unique number and described by a set of variables. The number of variables is not the same every year, but the ones I need are there every year.
Goal:
Get a single file with yearly data for the firms that have been operating throughout the last 15 years (i.e. firms that have not gone bankrupt and were not founded within the last 15 years).
Any tips on how I could do this?
Sorry if the information I have given is imprecise or incomplete.
This assumes the variables are named consistently across years: for example, year1 may have variables a b c and year2 may have a b c d, but year1 should not have a1 b c d when a1 and a measure the same quantity.
If this were my problem, I would go through the files in a loop, -generate- or -replace- a year variable (if there is not one already) holding the year of each file, and -append- the files one by one.
For example, if the files are the temporary files `file2001', ..., `file2012', the loop looks like
***************************
// prepare a toy dataset and save a copy of it as each year's tempfile
clear
input x1 x2
1 2
end
forvalues x = 2001/2012 {
    tempfile file`x'
    save `file`x'', replace
}
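// code starts: stack the yearly files, tagging each one with its year
clear
generate year = .
forvalues x = 2001/2012 {
    append using `file`x''
    // only the rows just appended still have a missing year
    replace year = `x' if missing(year)
}
list
// you could -keep- only the variables of interest here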
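
Once the years are stacked, the goal in the question (only firms present in every year) still needs one more step. What follows is only a sketch of one way to do it, not something tested on the real data. It assumes the firm identifier is called firmid (a hypothetical name, as are sales and the toy values) and that each firm appears at most once per year; the toy data below stand in for the appended file.

```stata
// toy stand-in for the appended data; firmid and sales are hypothetical names
clear
input firmid year sales
1 2001 10
1 2002 11
1 2003 12
2 2001 20
2 2003 22
3 2002 30
3 2003 31
end

// stop with an error if any firm-year combination appears more than once
isid firmid year

// number of years in which each firm is observed
bysort firmid: generate nyears = _N

// number of distinct years in the stacked data
quietly levelsof year, local(yrs)
local allyears : word count `yrs'

// keep only the firms observed in every year
keep if nyears == `allyears'
drop nyears
list
```

In the toy data only firm 1 is observed in all three years, so it is the only one kept. With the real files, `allyears' should come out as 15; if it does not, some year is missing from the stacked data.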