How to pick the better model parameter in this example?

ColdKing

Let's say the model is fixed (e.g., a linear model where 5 coefficients need to be fit). Denote the parameter (coefficient) vector as [TEX]\beta[/TEX].

First, you are given 50 data points (X, y) and use them to estimate [TEX]\beta[/TEX], giving, say, [TEX]\beta_1[/TEX].
Then you are given an additional 50 data points (so you now have 100), and you can use them to estimate [TEX]\beta[/TEX] again, giving, say, [TEX]\beta_2[/TEX].

Which [TEX]\beta[/TEX] should you pick? And what is a good criterion and method?

I think the second one, fit on 100 data points, should have a lower RSS/n (training error) than the first one (is that correct?). However, I think training error is not a good evaluation metric and we should use test error instead. But how would that work in this case? Splitting the first group into (25, 25), with half used for testing, and doing something similar for the second group does not seem fair.

Does anybody have a better solution? Thanks!

Dason

Is there any reason that you wouldn't want to use all of the data?

noetsi

No cake for spunky
You would not normally fit a linear model where part of the data is used to estimate one parameter and a second set of data is used to estimate a second parameter. If the data sets are equivalent, it is better to use more cases to estimate both parameters at once. If they are not equivalent, you could not use them to estimate Y in the same model.
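
To make the comparison concrete, here is a minimal simulation sketch in Python. It is only an illustration under stated assumptions, not a method anyone in the thread proposed: the data are simulated, both batches of 50 come from the same distribution, the model has 5 coefficients and no intercept, the noise is Gaussian, and `true_beta`, `make_data`, `fit_ols` and `mse` are hypothetical names made up for the example. It fits [TEX]\beta_1[/TEX] on the first 50 points and [TEX]\beta_2[/TEX] on all 100, then compares training error with error on a large fresh sample that neither fit has seen.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def make_data(n, beta, noise_sd=1.0):
    """Simulate n points from y = X @ beta + Gaussian noise (assumed setup)."""
    X = rng.normal(size=(n, beta.size))
    y = X @ beta + rng.normal(scale=noise_sd, size=n)
    return X, y

def fit_ols(X, y):
    """Ordinary least squares estimate of beta (no intercept term)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def mse(X, y, beta):
    """Mean squared residual, i.e. RSS / n."""
    resid = y - X @ beta
    return float(np.mean(resid ** 2))

# Hypothetical "true" coefficients, used only to generate the simulated data.
true_beta = np.array([1.0, -2.0, 0.5, 3.0, -1.5])

X_all, y_all = make_data(100, true_beta)       # the 100 available points
X_test, y_test = make_data(10_000, true_beta)  # fresh data, never used for fitting

beta_1 = fit_ols(X_all[:50], y_all[:50])  # fit on the first 50 points
beta_2 = fit_ols(X_all, y_all)            # fit on all 100 points

print("train RSS/n, beta_1 on its 50 points: ", mse(X_all[:50], y_all[:50], beta_1))
print("train RSS/n, beta_2 on its 100 points:", mse(X_all, y_all, beta_2))
print("test MSE, beta_1:", mse(X_test, y_test, beta_1))
print("test MSE, beta_2:", mse(X_test, y_test, beta_2))
```

Two points are worth keeping in mind when reading the output. Each fit minimizes RSS only on its own training set, so the training error of [TEX]\beta_2[/TEX] on 100 points is not guaranteed to be lower than that of [TEX]\beta_1[/TEX] on 50 points; the two numbers are computed on different data and are not directly comparable. The test MSE on fresh data is the fairer comparison, and in practice, where no extra sample is available, cross-validation on the 100 points could play that role.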