Is it permissible to aggregate results from multiple cross sectional regressions?

Let's consider the following approach:

  1. We estimate a simple linear regression model y = b0 + b1*x + error. The input data for the model estimation are:
    • y = 1 month stock returns (cross sectional) for month n
    • x = market cap for each stock (cross sectional) for month n
  2. We test whether the coefficient b1 is statistically significant and record the result for month n. We also record the R squared.
  3. We repeat steps 1 and 2 for a total of N months.
  4. Finally, having recorded the results of N significance tests for b1, we calculate the percentage of months in which the coefficient was significant. For R squared, we calculate the average of all N values.
Are the resulting 'synthetic' measures of significance & average R squared sound from a statistical perspective?
Is this approach of aggregating over multiple cross sectional regressions violating any best practices in statistical research?
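
To make the procedure concrete, here is a minimal sketch of steps 1 to 4 in Python. The data are simulated placeholders (random market caps and returns with no true relationship), the function names are made up for illustration, and the p-value uses a normal approximation to the t distribution, which assumes each monthly cross section is reasonably large.

```python
import random
from statistics import NormalDist, mean


def simple_ols(x, y):
    """Fit y = b0 + b1*x by OLS; return (b1, t statistic of b1, R squared)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    se_b1 = (ss_res / (n - 2) / sxx) ** 0.5
    return b1, b1 / se_b1, 1.0 - ss_res / ss_tot


def aggregate_monthly_regressions(months, alpha=0.05):
    """months: list of (x, y) pairs, one cross section per month.

    Returns (share of months with significant b1, average R squared).
    """
    significant, r_squared = [], []
    for x, y in months:
        _, t_stat, r2 = simple_ols(x, y)
        # Two-sided p-value, normal approximation (assumption: large cross section)
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
        significant.append(p_value < alpha)
        r_squared.append(r2)
    return sum(significant) / len(significant), mean(r_squared)


# Simulated placeholder data: 60 months, 200 stocks, returns unrelated to market cap
random.seed(0)
months = []
for _ in range(60):
    market_cap = [random.lognormvariate(8.0, 1.5) for _ in range(200)]
    returns = [random.gauss(0.01, 0.05) for _ in range(200)]
    months.append((market_cap, returns))

share_significant, avg_r2 = aggregate_monthly_regressions(months)
print(f"share significant: {share_significant:.2f}, average R squared: {avg_r2:.4f}")
```

Because the simulated returns do not depend on market cap, the share of significant months here only reflects false positives, which gives a baseline to compare a real data set against.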


Not a robit
Well, I will start off by asking: why are you doing this, and what is the purpose?

Second, significance is always a poor metric. You could have a p-value of 0.0501 and conclude there is no effect, when it could just be a statistical power issue for that month.
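
The power point can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the slope, noise level, and sample sizes are invented, the function names are hypothetical, and the p-value again uses a normal approximation. The parameters are chosen so that the expected t-statistic is close to 2, which means power is only moderate even though the effect is real in every simulated month.

```python
import random
from statistics import NormalDist


def slope_p_value(x, y):
    """Two-sided p-value for the OLS slope (normal approximation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    t_stat = b1 / (ss_res / (n - 2) / sxx) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def share_significant(true_slope, n_months=200, n_stocks=100, alpha=0.05):
    """Share of simulated months in which the slope tests as significant."""
    hits = 0
    for _ in range(n_months):
        x = [random.gauss(0.0, 1.0) for _ in range(n_stocks)]
        y = [true_slope * xi + random.gauss(0.0, 1.0) for xi in x]
        hits += slope_p_value(x, y) < alpha
    return hits / n_months


random.seed(1)
share_real_effect = share_significant(true_slope=0.2)  # effect present every month
share_no_effect = share_significant(true_slope=0.0)    # no effect at all
print(f"real effect: {share_real_effect:.2f}, no effect: {share_no_effect:.2f}")
```

A share of significant months well below 100% is therefore compatible with an effect that is present in every month, so the percentage alone does not say how strong or how stable the relationship is.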

Have you considered time series approaches?
The overall research question, i.e. the purpose, is: Is a stock's market cap a good predictor of its 1-month return? The intuition behind the described approach is that if the coefficient of the cross sectional model is significant in many of the N cases, this should mean that the variable is likely a good predictor over time.
What concrete time series approach would you propose?