generate n length proportion (sum to 1)

trinker

ggplot2orBust
#1
All right that title was terrible but I didn't know what to name it.

Basically I want to make a function that makes an n-length vector of random proportions that sum to 1 (100%). I can do it if I know n and fix the function at a certain vector length, but not if I allow n to be free. I'd really like an apply-family solution (or a non-loop solution I can't foresee that uses neither a loop nor apply), but if that's not possible a loop is fine (I'd actually like to see both, as it'll help with the thinking):

A forced n (works, but not what I want)
Code:
p <- function(){
    v <- sample(seq(0, 1, by=.01), 1)
    w <- sample(seq(0, 1-v, by=.01), 1)
    x <- sample(seq(0, 1-(v + w), by=.01), 1)
    y <- sample(seq(0, 1-(v + w + x), by=.01), 1)
    z <- round(1-(v + w + x + y), 2)
    c(v, w, x, y, z)
}

p()
An attempt to use global assignment to generate the vector
I'm actually not sure why this approach doesn't work :confused:
Code:
n<-4
y <- 0
sapply(seq_len(n), function(i) {
        x <- sample(seq(0, 1-y, by=.01), 1)
        y <<- y + x
    }
)
What I'd like to get:
Code:
p(n=4)
[1] 0.24 0.01 0.50 0.25

p(n=4)
[1] 0.16 0.05 0.70 0.09

p(n=5)
[1] 0.30 0.49 0.15 0.01 0.05
In my code I restricted the sampling to the hundredths place, but that doesn't have to be the case; it just seemed like an easy approach.
 

trinker

ggplot2orBust
#2
Ohhhhh.............................!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

You dope trinker:

Code:
n<-4
y <- 0
sapply(seq_len(n), function(i) {
        x <- sample(seq(0, 1-y, by=.01), 1)
        y <<- y + x
        return(x)  # I needed this guy
    }
)
Still interested in a loop solution. I may play with it myself to see if I can get it since it seems a loop isn't much different than what I did.
 

trinker

ggplot2orBust
#3
Nope, not working the way I want it to. In mine I don't need to use the i, but I do in the loop and I don't know how.

Code:
n<-4
y <- 0
for(i in 1:n){
        x <- sample(seq(0, 1-y, by=.01), 1)
        y <- y + x
}
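
The loop above draws each piece but never stores it, so only the last `x` survives. One way to keep them is to preallocate a vector and index into it with `i`. This is a minimal sketch, not code from the thread: the name `p_loop` and the `round()` calls (added to guard against floating-point drift) are assumptions.

```r
# Hypothetical loop-based version; p_loop is an illustrative name
p_loop <- function(n) {
    if (n < 1) stop("n must be at least 1")
    out <- numeric(n)   # preallocate one slot per proportion
    used <- 0           # running total of the pieces drawn so far
    for (i in seq_len(n - 1)) {
        # draw from whatever is left, in steps of 0.01
        x <- sample(seq(0, round(1 - used, 2), by = 0.01), 1)
        out[i] <- x     # this is where the index i is needed
        used <- used + x
    }
    out[n] <- round(1 - used, 2)   # the last piece takes the remainder
    out
}

p_loop(4)
```

Using `seq_len(n - 1)` rather than `1:(n - 1)` means the loop body is skipped when n is 1, so `p_loop(1)` should simply return 1.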
 

trinker

ggplot2orBust
#4
I thought I had it but I don't :confused:

Please help again. This is what I got. It spits out the correct number of proportions but they don't sum to 1. And for n=1 it gives this error:

Code:
p <- function(n){
    y <- 0
    z <- sapply(seq_len(n-1), function(i) {
            x <- sample(seq(0, 1-y, by=.01), 1)
            y <<- y + x
            return(x)
        }
    )
    w <- c(z ,sample(seq(0, 1-sum(z), by=.01), 1))
    return(w)
}
Code:
> p(1)
Error in sum(z) : invalid 'type' (list) of argument
Whereas I'd expect it to be 1.
 

trinker

ggplot2orBust
#5
Duh again:

Code:
p <- function(n){
    y <- 0
    z <- sapply(seq_len(n-1), function(i) {
            x <- sample(seq(0, 1-y, by=.01), 1)
            y <<- y + x
            return(x)
        }
    )
    w <- c(z , 1-sum(z))
    return(w)
}
Still, the zero-length case (n = 1) doesn't work.
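
The n = 1 failure appears to come from `sapply`, which returns an empty list when it is handed a zero-length input, and `sum` refuses lists. A possible workaround, sketched here under the assumption that swapping in `vapply` is acceptable, is to declare the result type so the empty case comes back as `numeric(0)`. The name `p_v` is hypothetical.

```r
# sapply() on zero-length input returns an empty list, which sum() rejects
z_list <- sapply(seq_len(0), function(i) 0.5)
is.list(z_list)    # TRUE

# vapply() with a declared numeric(1) result returns numeric(0) instead
z_num <- vapply(seq_len(0), function(i) 0.5, numeric(1))
sum(z_num)         # 0

# Hypothetical variant of p() that therefore also handles n = 1
p_v <- function(n) {
    y <- 0
    z <- vapply(seq_len(n - 1), function(i) {
        x <- sample(seq(0, 1 - y, by = 0.01), 1)
        y <<- y + x    # update the running total in the enclosing function
        x
    }, numeric(1))
    c(z, 1 - sum(z))   # the last piece is whatever remains
}

p_v(1)
```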
 

trinker

ggplot2orBust
#6
Alright this is it:

Code:
p <- function(n){
    if (n < 2) stop("n must be greater than 1")
    y <- 0
    z <- sapply(seq_len(n-1), function(i) {
            x <- sample(seq(0, 1-y, by=.01), 1)
            y <<- y + x
            return(x)
        }
    )
    w <- c(z , 1-sum(z))
    return(w)
}
Having length 1 is silly anyway. I could do an if else but it makes no sense for the purposes I want this for.

Thanks for the help everybody :) Sorry for polluting TS with a thread I could have solved if I slowed down a bit, but maybe someone will learn from this. The for loop way would still interest me as I want to learn looping better (I know I'll need it as I move to other languages).
 

Dason

Ambassador to the humans
#7
Do you necessarily want the stick breaking method to be used to generate your proportions?

Otherwise you could make your life a lot easier...

Code:
n <- 5
# or whatever random number generator you want that only gives positives
tmp <- rgamma(n, 1, 1)
tmp <- tmp/sum(tmp)
# tmp now contains stuff that sums to 1
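
Wrapped up as a function of n, the same idea might look like the sketch below. The name `p_gamma` and the seed are assumptions added for illustration; the shape and rate of 1 follow the snippet above.

```r
# Hypothetical wrapper around the normalise-by-the-total idea
p_gamma <- function(n) {
    tmp <- rgamma(n, shape = 1, rate = 1)   # positive random draws
    tmp / sum(tmp)                          # rescale so the total is 1
}

set.seed(42)   # arbitrary seed, only for reproducibility
x <- p_gamma(5)
length(x)
sum(x)
```

Unlike the stick-breaking versions, this one needs no special handling for n = 1, since a single draw divided by itself is 1.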
 

bryangoodrich

Probably A Mammal
#8
****, that was gonna be my answer. Just make random numbers, sum them to get a total and then treat each number as a proportion of that total as Dason aptly demonstrated with rgamma. Though, there may be the problem with rounding. I'm assuming the accuracy of this processing isn't that dire, however!
 

Dason

Ambassador to the humans
#9
Also note that my algorithm (using rgamma) produces a special case of draws from a Dirichlet distribution.
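
To make the Dirichlet connection concrete: independent gamma draws with shapes alpha_1, ..., alpha_n and a common rate, divided by their sum, follow a Dirichlet(alpha) distribution. With every shape equal to 1, as in the rgamma snippet, that is the uniform distribution over proportions summing to 1. A minimal sketch follows; `rdirichlet_sketch` is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: one Dirichlet(alpha) draw via normalised gamma variates
rdirichlet_sketch <- function(alpha) {
    g <- rgamma(length(alpha), shape = alpha, rate = 1)
    g / sum(g)
}

rdirichlet_sketch(rep(1, 5))     # all shapes 1: same as the rgamma(n, 1, 1) case
rdirichlet_sketch(c(10, 1, 1))   # the first proportion tends to dominate
```

Unequal shapes are one way to make some proportions larger on average, since the expected value of component i is alpha_i / sum(alpha).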
 

bryangoodrich

Probably A Mammal
#11
A few hours behind you? Some of us needed to catch up on sleep. Actually, I was walking through my presentation which turned out to be WAY longer than anticipated.
 

trinker

ggplot2orBust
#13
To get it to be exactly 1 for rowSums I had to modify it in this way (because of rounding):

Code:
props2 <- function(nrow=10, ncol=5, var.names=NULL, digits=2){
    p <- function(n, digits){
        tmp <- rgamma(n, 1, 1)
        X <- round(tmp/sum(tmp), digits=digits)
        if (sum(X)!=1) {
            o <- diff(c(1, sum(X)))
            X[which.max(X)] <- max(X)-o
        }
        return(X)
    }
    DF <- data.frame(t(replicate(nrow, p(n=ncol, digits=digits))))
    if (!is.null(var.names)) colnames(DF) <- var.names
    return(DF)
}
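
A side note on the rounding fix: the exact comparison `sum(X) != 1` can be sensitive to floating-point error, so one alternative is to do the correction in whole units of 10^-digits and only divide at the end. The sketch below is a variation on the approach above, not the posted code; `round_props` is a hypothetical name, and it assumes n is small relative to 10^digits so the largest entry can absorb the correction without going negative.

```r
# Hypothetical helper: rounded proportions whose units add up exactly
round_props <- function(n, digits = 2) {
    scale <- 10^digits
    tmp <- rgamma(n, shape = 1, rate = 1)
    counts <- round(tmp / sum(tmp) * scale)   # whole units of 10^-digits
    # push any rounding surplus or deficit onto the largest entry
    k <- which.max(counts)
    counts[k] <- counts[k] + (scale - sum(counts))
    counts / scale
}

x <- round_props(5)
sum(round(x * 100)) == 100   # the units add up exactly
```

Checking the result with `all.equal(sum(x), 1)` rather than `==` is the safer test once the values are back on the 0 to 1 scale.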