Hi - I am trying to find a good tool that will help me generate a probability of a sale for each product in a list of 300,000 products. I have a table of historical sales data (about 300,000 records) that contains around 10 continuous variables, along with a dependent variable that has a yes/no (i.e., binary) value indicating whether each product in the list has had a sale in the past 12 months.
The historical data essentially looks like this.
Product1,2,3 etc
Variable 1
Variable 2
Variable 3
Variable 4
Variable 5
Variable 6
Variable 7
Variable 8
Variable 9
Variable 10
Sold in past 12 months (Yes or No)
The last variable in the list is of course the dependent variable.
All I want is a simple tool, whichever is best or easiest to use, that lets me assign a probability to each product in the list. That would let me condense the list to the products with the highest likelihood of generating a sale, so that I can list those products instead of the others.
Ideally, the tool could run a quick logistic regression, or some other probability calculation based on the available variables, and give me a (RVU-like) number (perhaps a probability ranging from 0 to 1) for each product. I could then quickly select the top 50,000 products to list on a website, since according to the available variables they have the highest probability of generating a sale.
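To make the goal concrete, here is a rough sketch of the workflow I have in mind, written in plain Python so that it runs without extra packages. Everything in it is an assumption for illustration: the data is synthetic (500 made-up products rather than my 300,000), the rule that generates the yes/no outcome is invented, the variables are assumed to be already scaled to a similar range, and the helper names (`fit_logistic`, `predict_proba`, `top_k`) are hypothetical. It fits a logistic regression by simple gradient descent, scores every product, and keeps the highest-scoring ones.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and an intercept by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this product: predicted probability minus outcome.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Probability of a sale (between 0 and 1) for each row of X.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def top_k(ids, probs, k):
    # Rank products from highest to lowest probability and keep the first k.
    ranked = sorted(zip(ids, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Synthetic stand-in for the historical table (assumption: 10 scaled variables).
random.seed(0)
n_products, n_vars = 500, 10
X = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_products)]
# Invented rule: only Variable 1 and Variable 2 influence whether a sale occurs.
y = [1 if random.random() < sigmoid(2.0 * x[0] - 1.5 * x[1]) else 0 for x in X]
ids = [f"Product{i + 1}" for i in range(n_products)]

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
shortlist = top_k(ids, probs, 50)  # analogous to picking the top 50,000 of 300,000

for product_id, p in shortlist[:5]:
    print(product_id, round(p, 3))
```

On the real table, the same three steps (fit, score, sort and cut) would apply, but a pure-Python loop like this would be slow on 300,000 rows, so I assume a statistics package or library with a built-in logistic regression would do the fitting. Scoring the same records used for fitting, as done here, also says nothing about how well the model would predict future sales.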
I am of course assuming that the variables are somehow correlated to the outcome, but perhaps the tool will help me determine that.
(1) Does anyone have any suggestions of a good tool to accomplish this? I would presume that there is a simple way to set this up in Microsoft Excel, but if not, then a piece of software that does this would of course be great too.
(2) I am also open to suggestions as to which type of regression analysis (or other analysis) is the best to accomplish this.
Thanks for any suggestions.