Thread: Best Methodology for Solving for a Dummy Variable

1. Best Methodology for Solving for a Dummy Variable

Hi everyone. This is my first post on the forum, and unfortunately it comes out of frustration after reading many articles and textbook sections without finding a clear answer. I've taken Statistics 101 twice (once as an undergraduate and once in graduate school) and understand the basics, but now that I'm developing my own project I'm struggling.

I'll try to keep things simple and add detail only as needed. I have a large data set of around 250,000 samples, and I'm trying to predict a categorical value. For example, my dependent variable may be "poodle", "bulldog", "siamese cat", or "persian cat": two values are dogs and two are cats. Given multiple independent variables, I'd like to predict whether each sample is a dog or a cat.

I've cleaned up my data and mapped dog to 1 and cat to 0. I also have many independent variables, some numeric (height, weight) and some categorical (color). I've mapped all my categorical values to integers (1, 2, 3, 4, ...), but now I have no idea what to do with this data.
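Numbering categories 1, 2, 3, 4 tells a regression that color is ordered and evenly spaced, which is rarely what the codes mean. The usual alternative is dummy (indicator) coding: one 0/1 column per level, with one level dropped as the reference category. In R, storing the column as a factor lets lm() and glm() build these columns automatically. A minimal Python sketch of the idea, where the helper `one_hot` and the color values are hypothetical:

```python
def one_hot(values, drop_first=True):
    """Expand a nominal column into 0/1 dummy columns.

    Returns (level_names, rows); each row holds one indicator per kept level.
    With drop_first=True the alphabetically first level is the reference
    category and is represented by a row of all zeros.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # drop the reference level
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


colors = ["black", "white", "brown", "white"]  # hypothetical example data
names, dummies = one_hot(colors)
print(names)    # ['brown', 'white']
print(dummies)  # [[0, 0], [0, 1], [1, 0], [0, 1]]
```

Dropping one level avoids a column that is exactly determined by the others plus the intercept.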

I have experience with R. I've tried calculating correlations and covariances, and I've attempted linear regression to produce a function into which I could plug each independent variable and get back a value representing the probability that the sample is a dog (or a cat). I've also tried plotting my values, but since so many of them are zeros and ones, it's hard to see what's going on. I'm not sure whether I need to analyze the two groups separately, but that doesn't entirely make sense to me either.
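One reason a straight-line fit is awkward here: ordinary least squares on a 0/1 outcome is not constrained to stay between 0 and 1, so its output cannot always be read as a probability. A small sketch with made-up weights (hypothetical data, one predictor, closed-form least squares):

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical toy data: weight in kg, label 1 = dog, 0 = cat
weight = [2, 4, 6, 20, 25, 30]
is_dog = [0, 0, 0, 1, 1, 1]

a, b = fit_line(weight, is_dog)
for w in (0, 15, 60):
    # Fitted values below 0 or above 1 are not valid probabilities
    print(w, round(a + b * w, 3))
# prints roughly -0.135 at 0 kg and 2.492 at 60 kg
```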

I'm open to suggestions of more effective methods, such as logistic regression or even machine learning, or just comments on my strategy.

Sorry for the long post, but after a week of reading and working on my data with no results, I'm rather frustrated. Thank you very much in advance for any help on this.

2. Re: Best Methodology for Solving for a Dummy Variable

By "samples," do you mean observations within a single sample?

If you are trying to predict a binary dependent variable (cat or dog), you should use a multiple logistic regression model. Let us know if you have more questions.
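In R the standard route is glm() with family = binomial, for example glm(is_dog ~ height + weight + color, family = binomial, data = d), where is_dog, height, weight, color and d are placeholder names; predict(fit, type = "response") then returns fitted probabilities. The model is P(dog | x) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))), so its output always lies between 0 and 1. To make the model concrete, here is a minimal pure-Python sketch that maximizes the same likelihood by plain gradient descent. glm() itself uses iteratively reweighted least squares, and this sketch is not a replacement for it on 250,000 rows. The helper names and toy data are hypothetical, and features are assumed to be on roughly unit scale.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(b0, b, x):
    """P(y = 1 | x) under the logistic model with intercept b0, coefficients b."""
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


def fit_logistic(X, y, lr=0.5, epochs=10000):
    """Maximum-likelihood logistic regression via batch gradient descent.

    X is a list of feature rows (assumed roughly unit scale), y a list of
    0/1 labels. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        # Gradient of the mean negative log-likelihood: mean of (p_hat - y) * [1, x]
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = predict_proba(b0, b, xi) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


# Hypothetical toy data: weight in tens of kg, label 1 = dog, 0 = cat.
# The classes overlap (a 10 kg dog, a 12 kg cat), so a finite fit exists.
X = [[0.2], [0.4], [0.6], [1.0], [1.2], [2.0], [2.5], [3.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b = fit_logistic(X, y)
for w in (0.3, 1.1, 2.8):
    print(w, round(predict_proba(b0, b, [w]), 3))
```

At the maximum-likelihood fit with an intercept, the fitted probabilities sum to the number of dogs in the data, which is a quick sanity check on any implementation.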
