My colleague is developing a logistic regression to predict the probability of a customer taking up a particular credit card.

Let's call the credit card the gold card, so his dependent variable on the left side is GoldCard = 0 or 1. On the right side of his equation he is also including an overall credit card indicator that equals 1 if the customer has ANY credit card. Therefore, for any observation where GoldCard = 1, the overall credit card indicator also equals 1.

It seems to me that this would result in artificially high accuracy because the two variables are tied together by construction: the indicator can never be 0 when GoldCard = 1. A comparable example would be developing a model that predicts gender and then including a gender variable on the right side. And you'd never do that, right?
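One way to see how the two variables relate is to simulate some customers and cross-tabulate them. This is only a sketch on made-up data: the 60% and 25% rates, the sample size, and the names `any_card` and `gold_card` are assumptions for illustration, not figures from the actual model.

```python
import math
import random

random.seed(0)  # fixed seed so the made-up data is reproducible

# Hypothetical customers: about 60% hold some card,
# and about 25% of those card holders hold the gold card.
rows = []
for _ in range(1000):
    any_card = 1 if random.random() < 0.6 else 0
    gold_card = 1 if any_card == 1 and random.random() < 0.25 else 0
    rows.append((any_card, gold_card))

# Cross-tabulate the predictor (any_card) against the outcome (gold_card).
counts = {(a, g): 0 for a in (0, 1) for g in (0, 1)}
for a, g in rows:
    counts[(a, g)] += 1

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

r = pearson([a for a, _ in rows], [g for _, g in rows])

print("any_card=0, gold_card=1:", counts[(0, 1)])  # always 0 by construction
print("any_card=1, gold_card=0:", counts[(1, 0)])
print("correlation:", round(r, 3))
```

In data built this way, the cell with `any_card = 0` and `gold_card = 1` is empty, so every customer without a card is predicted correctly for free, while the correlation itself stays below 1 because many card holders do not have the gold card.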

Anyways, I'm pretty sure it's incorrect to include the overall card indicator but I'm not knowledgeable or articulate enough to explain why. Any info/insight would be greatly appreciated.

Thanks,

Ben

Let's say I have a bird feeder that feeds 1 bird at a time. From past observations I know that if there is a bird at the bird feeder, there is a 50% probability that the bird is a Cardinal, 30% it's a Blue Jay, and 20% it's a Raven.

However, I can also detect its wing span. Let's say a Cardinal has an average wing span of 5.7" and a standard deviation of 0.2". A Blue Jay has an average wing span of 5.9" and a standard deviation of 0.15". A Raven has an average wing span of 6.3" and a standard deviation of 0.25".

If there is a bird at the feeder with a wing span of 5.8", what is the probability that it is a Cardinal, a Blue Jay, or a Raven?
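One way to set this up is with Bayes' rule: weight each species' prior probability by how likely a 5.8" wing span is for that species, then normalize. The sketch below assumes wing spans are normally distributed within each species, which the numbers above do not actually state; the names `normal_pdf`, `BIRDS`, and `posterior` are made up for the example.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed shape of each wing span)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# species: (prior probability, mean wing span, standard deviation), in inches
BIRDS = {
    "Cardinal": (0.50, 5.7, 0.20),
    "Blue Jay": (0.30, 5.9, 0.15),
    "Raven": (0.20, 6.3, 0.25),
}

def posterior(wing_span):
    """P(species | wing span) via Bayes' rule with the priors above."""
    # Unnormalized posterior: prior times likelihood of the observed wing span.
    joint = {
        name: prior * normal_pdf(wing_span, mean, sd)
        for name, (prior, mean, sd) in BIRDS.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

post = posterior(5.8)
for name, p in post.items():
    print(f"{name}: {p:.3f}")
```

Under the normality assumption this comes out to roughly 56% Cardinal, 41% Blue Jay, and 3% Raven. A different distributional assumption for wing span would change these numbers.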