I have a real-world business problem: selecting which products to recommend to our contacts in an email campaign.
The data set I have contains the click-throughs on our products' ads on various websites, along with other information about the visitors to the websites that display those ads.
Some of the variables in the data set do not make sense to use when scoring contacts for an email campaign, such as the time of day the visitor came to the site or the website on which the ad was displayed. However, they need to be in the training data to model the visitors' behavior correctly, and they turn out to be strong predictors. My question is: "Should I use these variables when creating the gains table on my test data set to evaluate the model's performance?"
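For reference, a gains table ranks the test records by model score, splits them into equal-sized groups (deciles are common), and reports how many of the actual clicks fall into each group. Here is a minimal sketch in Python. It assumes the model's scores and the observed click flags (1 = clicked, 0 = not) are already available as two lists. The function name `gains_table` and its column names are hypothetical and do not come from any particular tool.

```python
from typing import Dict, List


def gains_table(scores: List[float], clicks: List[int], n_bins: int = 10) -> List[Dict[str, float]]:
    """Hypothetical helper: rank by score, split into n_bins groups, summarize clicks per group."""
    if len(scores) != len(clicks):
        raise ValueError("scores and clicks must have the same length")
    if not scores:
        raise ValueError("need at least one record")

    # Rank records from highest to lowest predicted score
    ranked = sorted(zip(scores, clicks), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_clicks = sum(c for _, c in ranked)
    overall_rate = total_clicks / n

    rows = []
    cum_clicks = 0
    for b in range(n_bins):
        # Integer bin boundaries so every record lands in exactly one bin
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        group = ranked[start:end]
        if not group:
            continue
        group_clicks = sum(c for _, c in group)
        cum_clicks += group_clicks
        rate = group_clicks / len(group)
        rows.append({
            "bin": b + 1,
            "count": len(group),
            "clicks": group_clicks,
            "click_rate": rate,
            # Share of all clicks captured by this bin and the bins ranked above it
            "cum_pct_clicks": cum_clicks / total_clicks if total_clicks else 0.0,
            # Bin click rate relative to the overall click rate
            "lift": rate / overall_rate if overall_rate else 0.0,
        })
    return rows


if __name__ == "__main__":
    example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    example_clicks = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    for row in gains_table(example_scores, example_clicks, n_bins=5):
        print(row)
```

The function does not care which variables produced the scores. The same table can be built from scores from any version of the model whose performance is being evaluated.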
Thank you for any input in advance!