Hello dear forum members,
Using county-level panel data (N = 2500+, T = 3), I aim to examine the associations between multiple biological, socioeconomic, and psychological factors and cancer incidence. Two outcome measures are available to me:
(1) cancer incidence rate, IR = (New cancers / Population) × 100,000
M = 197.19, SD = 51.12, Min = 0, Max = 610.6, Skewness = .1, Kurtosis = 7.25
(2) count of new occurrences
M = 241.4, SD = 674.11, Min = 0, Max = 17,742, Variance = 454,427.7
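To make the two measures concrete, here is a minimal Python sketch of how the rate is derived from the count, plus a quick variance-to-mean check on the counts. The function names and the example county figures are made up for illustration; only the mean and variance at the bottom come from the summary statistics above.

```python
from statistics import mean, variance


def incidence_rate(new_cases, population, per=100_000):
    """Crude incidence rate: new cases per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases / population * per


def dispersion_ratio(counts):
    """Sample variance divided by the mean.

    A ratio far above 1 suggests the counts are more dispersed than a
    Poisson distribution would imply.
    """
    return variance(counts) / mean(counts)


# Hypothetical county: 50 new cases among 25,000 residents.
example_rate = incidence_rate(50, 25_000)  # about 200 per 100,000

# Hypothetical counts for a handful of counties (not real data).
example_counts = [3, 12, 40, 95, 310, 2200]
example_ratio = dispersion_ratio(example_counts)

# The same ratio from the reported summary statistics (variance / mean).
reported_ratio = 454_427.7 / 241.4
```

With the reported figures the ratio is far above 1, which is part of why I am unsure whether to model the raw counts at all.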
My initial approach is to model the (continuous) rates using OLS. However, although the distribution of rates is roughly symmetric, it has heavy tails (kurtosis = 7.25):
As a result, the OLS residuals are far from perfect:
Question 1: What modeling approach would you recommend to address such variation? I realize quantile regression is one option (with its ups and downs), but perhaps there are other "standard" ways to model rates?
Question 2: Is there any reason to prefer rates over counts, or counts over rates, as the outcome in this analysis? Is there a general consensus on this?
Your feedback would be greatly appreciated.