Leave-one-out error strictly decreases as the number of parameters increases, when it shouldn't?

#1
I'm hoping someone can shed some intuitive light on my problem; maybe I haven't considered some aspect. I'm running a non-parametric (kernel) regularized least squares estimation on binary training data, which I then use to predict probabilistic values for the non-training data.

My original model has 9 variables, which should give the best predictions. I know this should be the right selection because I'm following a widely cited scientific paper that justifies exactly those 9 variables and no others.

However, when I add variables to the model, the leave-one-out error, RMSE, and MAE all decrease, and the pseudo R-squared increases. As far as I'm aware, this just shouldn't be happening: it implies that the models with more variables explain the left-out data points more accurately, and therefore the out-of-sample data too. Does anyone know why this might be happening?
 

hlsmith

Not a robit
#2
You need to add more details to your post. Do you mean adding variables beyond the nine, or what exactly are you describing? By chance you could be seeing a trivial improvement out-of-sample. Try a different seed value for the data split and see whether the pattern persists, to rule that out. Some people also use repeated k-fold sampling, i.e. running k-fold on multiple groupings of the data; this can eliminate the threat that the improvement is just random.

P.S., do you mean out-of-sample performance is improving but in-sample is not? Please provide actual values.
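The repeated k-fold idea above could be sketched roughly like this (a minimal, hypothetical Python example using scikit-learn's `KernelRidge` as a stand-in for KRLS; the simulated data and the 9-vs-12 variable comparison are made up for illustration):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Simulated stand-in data: 9 "real" predictors plus 3 irrelevant extras.
rng = np.random.default_rng(0)
n = 200
X_full = rng.normal(size=(n, 12))
y = (X_full[:, :9].sum(axis=1) + rng.normal(size=n) > 0).astype(float)

# Repeated k-fold: 5 folds, repeated 10 times with different groupings,
# so a one-off lucky split can't drive the comparison.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=1)
model = KernelRidge(kernel="rbf", alpha=1.0)

for cols, label in [(slice(0, 9), "9 vars"), (slice(0, 12), "12 vars")]:
    scores = cross_val_score(model, X_full[:, cols], y,
                             scoring="neg_root_mean_squared_error", cv=cv)
    print(f"{label}: RMSE {-scores.mean():.3f} (sd {scores.std():.3f})")
```

If the mean RMSE for the larger model sits within the spread of the 9-variable model across the 50 resampled fits, the apparent improvement is plausibly noise rather than real predictive gain.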
 
#3

Thanks for the reply; it's given me some good angles to work from. I'll play around with the model.