The birth intervals are calving intervals for killer whales. Killer whales typically give birth to their first calf at 10-20 years of age and subsequently give birth at 2-10 year intervals over a reproductive lifespan of about 25 years. I was interested in whether calving rates slow (intervals lengthen) with the mother's age, which would indicate reproductive senescence (e.g. reduced ovulation rates, increased abortion rates, etc.). I initially regressed calving intervals on the age of the mother when she completed the interval and found a positive correlation, suggesting reproductive senescence. However, I now recognize that this approach is biased (the correlation is spurious): even if calving intervals are unrelated to age, a positive correlation arises because mothers tend to be younger at the end of short calving intervals and older at the end of long ones.
I've attached simulated data and a screenshot of the regressions to illustrate the problem. I generated 100 random ages between 10 and 35 to simulate the age of the mother at the beginning of the calving interval. I then generated 100 random calving intervals between 2 and 10 years. Finally, I calculated the age of each mother at the end of the calving interval by adding the interval to her age at the beginning of the interval. There is a significant relationship between calving interval and the mother's age at the end of the interval, but no relationship between calving interval and her age at the beginning of the interval.
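A minimal sketch of the simulation described above, for anyone who wants to reproduce it. The post does not say whether ages and intervals were drawn as integers or continuous values, so this assumes continuous uniform draws; the helper names `ols_fit` and `simulate` are hypothetical, and only the Python standard library is used.

```python
import math
import random


def ols_fit(x, y):
    """Least-squares slope of y regressed on x, plus the Pearson correlation r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def simulate(n=100, seed=1):
    """Generate start ages, independent calving intervals, and end ages."""
    rng = random.Random(seed)
    # Assumed: age at the start of the interval is uniform on [10, 35] years.
    start_age = [rng.uniform(10, 35) for _ in range(n)]
    # Assumed: calving interval is uniform on [2, 10] years, independent of age.
    interval = [rng.uniform(2, 10) for _ in range(n)]
    # Age at the end of the interval is start age plus interval (X + Y).
    end_age = [a + i for a, i in zip(start_age, interval)]
    return start_age, interval, end_age


start_age, interval, end_age = simulate()
slope_start, r_start = ols_fit(start_age, interval)  # interval regressed on start age
slope_end, r_end = ols_fit(end_age, interval)        # interval regressed on end age
print(f"on start age: slope = {slope_start:+.3f}, r = {r_start:+.2f}")
print(f"on end age:   slope = {slope_end:+.3f}, r = {r_end:+.2f}")
```

With only 100 draws the exact values vary from seed to seed, but with these assumed ranges the regression on end age should show a positive slope and correlation (around r = 0.3 in large samples), while the regression on start age hovers around zero.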
As noted above, the "spurious" correlation has nothing to do with biology; it arises because calving interval and the mother's age at the end of the interval are mathematically related. If X = age at the beginning of the calving interval and Y = calving interval, my first approach was equivalent to regressing Y on X + Y, which introduces an artificial relationship without involving any third variable.
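Assuming X and Y are independent with nonzero variances (as in the simulation), the size of the artificial relationship can be worked out directly. Plugging in the variances of uniform distributions on [10, 35] and [2, 10], which is an assumption about how the simulated values were drawn, gives the expected correlation and slope:

```latex
\begin{aligned}
\operatorname{Cov}(Y, X+Y) &= \operatorname{Cov}(Y, X) + \operatorname{Var}(Y) = \sigma_Y^2, \\
\operatorname{Corr}(Y, X+Y) &= \frac{\sigma_Y^2}{\sigma_Y \sqrt{\sigma_X^2 + \sigma_Y^2}}
  = \frac{\sigma_Y}{\sqrt{\sigma_X^2 + \sigma_Y^2}} > 0, \\
\text{slope of } Y \text{ on } X+Y &= \frac{\sigma_Y^2}{\sigma_X^2 + \sigma_Y^2}, \\
\sigma_X^2 = \frac{(35-10)^2}{12},\quad \sigma_Y^2 = \frac{(10-2)^2}{12}
  &\;\Rightarrow\; \operatorname{Corr} = \frac{8}{\sqrt{689}} \approx 0.30,
  \quad \text{slope} = \frac{64}{689} \approx 0.093.
\end{aligned}
```

The correlation is positive whenever Y varies at all, and it is stronger when the spread of starting ages is small relative to the spread of intervals.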
I believe the problem of spurious relationships due to regressing mathematically related variables arises in other situations as well. Here is a non-biological example. Suppose one were interested in whether the time it takes to learn to drive a car, defined as the time between first getting behind the wheel and receiving a driver's license, increases with age. One might inadvertently assess this by regressing the time it took to learn to drive on the age at which one received a driver's license. But this would produce a spurious relationship: even if learning time were completely random, those who take longer to learn will tend to be older when they get their license. Mathematically, if X = age when first getting behind the wheel and Y = time it takes to learn to drive, then the age at which one receives a license is X + Y, which is spuriously correlated with Y.
As an undergrad, I took some excellent statistics courses and recall there being a specific term for spurious relationships attributable to regressing mathematically related variables. But I'm now retired, so it's been a while, and I can't recall the term.