Need Help with Correlation


I am trying to figure out the best way to calculate correlation with data I have for my business. Here is an example of what I the fields:

Sale Amt - numeric
Sold - yes or no
State - lists 1 of 50 states
Industry - lists the industry of the buyer
Lead Source - lists where the lead came from

I want to determine what type of correlation there is between the "state", "industry", and "lead source" to selling a deal. Is there higher probability of selling in a particular state or industry? I also want to see if the sale amount goes up or down in certain states, industries, or lead sources. Any suggestions on best approach?

What I have done so far is assign a 1 or 2 to the "Sold" field where sold is a 1 and not sold is a 0. I then took the fields like state and industry and created columns for each possibility with a 1 or 0 in the column to indicate if that answer was a yes or no. For example, I created 50 columns for each state with a 1 or 0 in each to signify what state it came from. I then used the correlation tool in excel and grabbed all those fields to compare to whether it was sold or not. It didn't show much correlation so I am not sure if that is the best approach.

Any help would be appreciated. Thanks.


Less is more. Stay pure. Stay poor.
What is you sample size?

You should probably forget about the simple correlations and move toward multiple regression so that you can control for multiple variables at once. Let us know what you think after examining the approach.