A complicated variable (IMO)

#1
For a project, I would like to create a variable that, in my opinion, is pretty complex. I'm very new to Stata and hope one of you can help me out.

My data has 3 variables: Country, City, and Pop.

E.g.:

Country  City    Pop
U.S      City 1  200M
U.S      City 2  190M
U.S      City 3  100M
U.S      City 4  10M
...
Canada   City 1  120M
Canada   City 2  100M
Canada   City 3  90M
...
China    City 1  1200M
China    City 2  100M
China    City 3  700M

I would like to create a fourth variable, var4, that ranks the cities within each country by population.

For example, var4 would give City 1 a value of 1 because it has the highest population, City 2 a value of 2, and so on. Keep in mind that I have a very large dataset and it is not sorted in any logical order.

Any help or pointers in the right direction would be very much appreciated!

Thanks! :)
 
#2
First, the variable "Pop" needs to be numeric rather than string type. Here is an example:

*******************

clear
input str10 Country str10 City Pop
U.S City_1 200
U.S City_2 190
U.S City_3 100
Canada City_1 120
Canada City_2 100
Canada City_3 90
end

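* Negate Pop so that an ascending sort puts the most populous city first
generate Pop_inv = -1*Pop

* Within each Country, sort by Pop_inv and number the observations 1, 2, 3, ...
bysort Country (Pop_inv) : generate var4 = _n

list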

*************************

Note that -bysort Country (Pop_inv)- uses Country to form groups and, within each group, also sorts (but does not group) by Pop_inv. Within each Country, _n runs 1, 2, 3, 4, ... from the smallest Pop_inv to the largest, so I create Pop_inv as the negative of Pop; that way var4 runs 1, 2, 3, 4, ... from the highest Pop to the lowest.
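A variation worth knowing about if the real data can contain ties: with _n, two cities in the same country with equal Pop get different var4 values, in an order Stata's sort does not guarantee. The sketch below instead uses -egen-'s rank() function with the field option, which gives 1 to the largest value within each Country and the same rank to tied values. The data are the example above with a made-up tie (City_3's Pop set to 190), and pop_rank is only an illustrative name.

```stata
clear
input str10 Country str10 City Pop
U.S City_1 200
U.S City_2 190
U.S City_3 190
Canada City_1 120
Canada City_2 100
end

* rank() with -field- gives 1 to the largest Pop within each Country;
* cities with equal Pop share a rank instead of being ordered arbitrarily
bysort Country: egen pop_rank = rank(Pop), field

list, sepby(Country)
```

Here both U.S cities with Pop 190 get pop_rank 2. See -help egen- for the other ranking options (default averaged ranks, track, unique).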

Please see

help by
help _n

for more information.
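On the first point: if Pop is currently stored as text like "200M", as in the question, one way to make it numeric is -destring- with its ignore() option. This is a sketch assuming every value uses the same "M" suffix, so the result is in millions; other suffixes (for example "K") would need separate handling. Quoting the strings in -input- also lets City keep its space ("City 1").

```stata
clear
input str10 Country str10 City str10 Pop
"U.S"    "City 1" "200M"
"U.S"    "City 2" "190M"
"Canada" "City 1" "120M"
"Canada" "City 2" "100M"
end

* Remove the "M" suffix and convert Pop from string to numeric (millions);
* -replace- overwrites the original string variable
destring Pop, ignore("M") replace

describe Pop
```

After this, the -bysort- approach above works on Pop directly.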