regression of 'proportion of people with a higher income' on 'income'

Hey guys,

Basically, I want to estimate the following model:
p(x) = constant + a*x,
where x is the income of a particular person and p(x) is the proportion of people in my sample whose income is greater than or equal to that person's. I am ultimately interested in a.

So I thought the obvious thing to do would be to run
regress px x (letting p(x)=px here),
but I am struggling with how to compute the variable px for my sample.

Any suggestions or help would be greatly appreciated!


If you just have one observation per person, and no missing data, then it's quite easy to generate these proportions. You can use _N, which refers to the number of observations in the dataset, and _n, which refers to the current observation's position within the dataset as it is currently sorted. So:

sort income
gen proportion=(_N - _n + 1)/(_N)
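
As a quick check of what those two lines produce, here is a minimal sketch on a toy dataset. The three income values are made up for illustration, and they are all distinct, so ties are not an issue here:

```stata
* Toy data with three distinct incomes (illustrative values only)
clear
input income
10
20
30
end

sort income
gen proportion = (_N - _n + 1) / _N

* Lowest income: everyone is at or above it. Highest income: only itself.
assert proportion == 1 in 1
assert abs(proportion - 1/3) < 1e-6 in 3
list income proportion
```

After the sort, observation _n has _N - _n + 1 observations at or above it (itself included), which is why the numerator has the + 1.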


First of all, thanks for the reply!

I don't have missing values, but some observations in my dataset have the same income. Wouldn't that cause a problem, since I am looking for the proportion with greater or equal income?
In particular, if I had incomes 1 1 2, wouldn't that give me different values for the two observations with income 1, although they should be identical?


Yes, that's true. You could get around that by replacing each proportion with the highest proportion calculated for that income value:
sort income proportion
by income: replace proportion=proportion[_N]

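(When you use the -by- prefix, _N refers to the number of observations within the group specified by -by- rather than the number in the complete dataset. Because the data are sorted by proportion within income, proportion[_N] is the largest proportion in each income group.)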
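
Putting the pieces together on the 1 1 2 example from the question, a minimal sketch might look like this. The toy data and the variable names income and proportion are only for illustration, and the final regression is trivially exact on three points, so it is there just to show the command:

```stata
* Toy data with a tie: incomes 1, 1, 2 (the example from the question)
clear
input income
1
1
2
end

* Rank-based proportion; the two tied incomes get different values here
sort income
gen proportion = (_N - _n + 1) / _N

* Within each income, copy the largest proportion to every tied observation
sort income proportion
by income: replace proportion = proportion[_N]

* Both observations with income 1 now share 3/3; income 2 gets 1/3
assert proportion == 1 if income == 1
assert abs(proportion - 1/3) < 1e-6 if income == 2

* The regression from the original question (a is the coefficient on income)
regress proportion income
```

The two lines that fix the ties can also be written as one: bysort income (proportion): replace proportion = proportion[_N].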