Conditional Probability

#1
I am trying to create a matrix of the conditional probabilities from this:

Code:
Signal <- runif(100)
PercChange <- abs((rnorm(100)/100))
signalt <- seq(0, 1, 0.05)
abst <- seq(0, c(0.02:1), 0.0025)

CondDistMat <- matrix(0, nrow = length(signalt), ncol = length(abst))

for(j in 1:length(signalt - 1)){
    xbool = (is.na((Signal >= signalt[j] & Signal < signalt[j + 1]) ) * 1)
    ysubset =  (PercChange * xbool[j] )
    CondProb = hist(ysubset, breaks = abst, freq = TRUE)
    CondDistMat[signalt, abst] <- CondProb$density 
}
The columns should be the bins defined by abst, and the rows should be the 5% bins defined by signalt. The idea is to use the boolean vector to pick out the absolute returns in PercChange that belong to each signalt bin, compute how they are distributed across the abst columns, and then plot those probabilities for each signalt bin.

However, I am not able to produce any output. Can anyone spot the error(s)?

The desired output should look something like the attached image

Thanks in advance
 
#2
To start with, I see a lot of issues with the code. Maybe you can explain a little more:
1. Why rnorm(100)/100?
2. is.na is a missing-value indicator; I don't think you need it.
3. You only use xbool[j] in your calculations, so why generate the whole xbool vector? I am asking not just because of computational cost, but because it may help you spot a flaw in your logic.
 
#3
1) rnorm(100)/100 just gave me changes on a percentage scale. It is only an example input; I have the true percentage changes from the data I am using.
2) I had a problem with R returning NAs instead of 1s, and including is.na solved this. If it is avoidable, that is of course preferable.
3) I wanted to create a vector for each j, since j steps through the values of signalt, and then apply that vector to ysubset.

I'm fairly new to programming these things in R, so any help is much appreciated.
 
#4
I would suggest a couple of things
rnorm(100)/100 gives very small numbers: it draws 100 normal random variables with mean 0 and s.d. 1, so the sampled values are mostly near zero, and dividing by 100 makes them smaller still. Print the vector and check whether you really expect values that small. I would suggest using your actual data, to be sure there are no issues with the data you are working on.
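
As a quick check of the magnitudes involved, something like the following could be run. It is only a sketch on simulated input; the seed is an arbitrary choice made here so the run is repeatable, and 0.02 is the upper edge of the bins used in the thread.

```r
set.seed(1)                            # arbitrary seed, not from the original post
PercChange <- abs(rnorm(100) / 100)    # same simulated input as in post #1

summary(PercChange)                    # shows how small the values are
share_below <- mean(PercChange < 0.02) # fraction of values under the 2% edge
share_below
```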

R returns NA from a comparison only when one of the values being compared is missing. Comparing complete numeric vectors does not produce NA. I ran your code without is.na and xbool contains only 0's and 1's.
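
A minimal sketch of that behaviour, on simulated data with an arbitrary seed. The names in_bin, n_flagged and with_gap are introduced here for illustration only.

```r
set.seed(1)                    # arbitrary seed, for a repeatable example
Signal  <- runif(100)
signalt <- seq(0, 1, 0.05)

# Comparison on complete data: a logical vector with no NA in it
in_bin <- Signal >= signalt[1] & Signal < signalt[2]
anyNA(in_bin)

# Wrapping it in is.na() asks "is this entry missing?", which is FALSE
# everywhere, so the resulting 0/1 vector is all zeros
n_flagged <- sum(is.na(in_bin) * 1)

# NA only shows up when an operand is itself missing
with_gap <- c(0.01, NA, 0.30) < 0.05
with_gap
```

So is.na() does not turn NAs into 1s here; it replaces the bin membership test with a vector of zeros.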

xbool[j] refers to the jth element of the xbool vector. Look at your computation again: at step j you are not using the whole xbool vector, only its jth element (and the vector itself changes in each iteration). So think again.
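
To make the difference concrete, here is a small sketch on simulated data. The bin number j = 3 and the names in_bin, scaled and ysubset are chosen here for illustration.

```r
set.seed(1)                          # arbitrary seed, for a repeatable example
Signal     <- runif(100)
PercChange <- abs(rnorm(100) / 100)
signalt    <- seq(0, 1, 0.05)

j <- 3
in_bin <- Signal >= signalt[j] & Signal < signalt[j + 1]

# Multiplying by a single element: the result still has 100 entries and is
# either PercChange unchanged or all zeros, depending on observation j alone
scaled <- PercChange * (in_bin * 1)[j]

# Logical indexing with the whole vector: keeps only the changes whose
# signal falls in bin j
ysubset <- PercChange[in_bin]

length(scaled)
length(ysubset) == sum(in_bin)
```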

Why don't you try writing pseudocode first and then turning it into R code? Also, print the variables at each step to see whether what you coded matches the results you expect.
 
#5
Thanks for your inputs.

I rearranged a few things in the code and can now produce the histogram output for the entire Signal vector. However, this plots the frequencies over the whole range rather than per bin.

Signal # From Data
PercChange # From Data
Code:
Signal <- runif(100)
PercChange <- abs((rnorm(100)/100))
Signalt<- seq(0, 1, 0.05) # Produce the 5% tiles
abst <- c(seq(0, 0.02,  0.0025), 1)  #Produce the 0% to 2% tiles with 0.25% increments. 

CondDistMat <- matrix(0, nrow = length(Signalt), ncol = length(abst)) # Matrix for output 

for(j in 1:length(Signalt)- 1)  {
    # Produce 0 or 1s 
    xbool = ((a >= Signalt[j] & a < Signalt[j + 1]) *1) 

    # Multiply Price Change vector on the xbool vector to produce the returns for each 5% tile. 
    # I am getting errors here with "number of items to replace is not a multiple of 
    # replacement length", even though the method is correct?
    ysubset =  PercChange [ xbool]

    # This is where I would like to create the frequencies of each ysubset[j]
    CondProb = hist(ysubset, breaks = abst, freq = TRUE)

    # Add the frequencies to the matrix
    CondDistMat[Signalt, abst] <- CondProb$density 
}
 
#6
A few more things:
1. length(Signalt-1) is probably not what you want: it takes the length of the vector (Signalt - 1) rather than (length of Signalt) - 1.
2. Is PercChange*xbool meant to be matrix multiplication or an element-by-element product? These are two different things. Check that the sizes of the matrices/vectors are appropriate for the operation you want.
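
Both points can be checked at the console. The sketch below uses the Signalt breaks from the thread; the vectors x and mask and the name bins are made up here for illustration.

```r
Signalt <- seq(0, 1, 0.05)        # 21 break points, hence 20 bins

length(Signalt - 1)               # 21: subtracts 1 from every element first
length(Signalt) - 1               # 20: the number of bins

# `:` binds tighter than `-`, so this runs from 0 to 20, not 1 to 20
range(1:length(Signalt) - 1)

# Parenthesise, or use seq_len(), to loop over the 20 bins
bins <- seq_len(length(Signalt) - 1)

# `*` is element-by-element; %*% is matrix multiplication
x    <- c(1, 2, 3)
mask <- c(1, 0, 1)
elementwise <- x * mask           # 1 0 3
inner       <- x %*% mask         # 1 x 1 matrix containing 4
```

Note that an index of 0 selects nothing in R, so a loop that starts at j = 0 makes Signalt[j] an empty vector on the first pass.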

PS: I cannot run your code as now you are using your own data.
 
#7
I have rewritten the code without the loop

a # Signal Vector
b # Price Change Vector
Signalt<- seq(0, 1, 0.05) # Produce the 5% tiles
abst <- c(seq(0, 0.02, 0.0025), 1) #Produce the 0% to 2% tiles with 0.25% increments.

xbool = ((a >= Signalt[1] & a < Signalt[1 + 1]) * 1) # 1 for True, 0 for False
temp = b * xbool
temp2 <- temp[which(temp > 0)]
CondProb <- cut(temp2, abst, include.lowest = T)
table(CondProb)

This outputs a table with one column per abst bin, giving the number of occurrences in each.
I of course need it as a percentage of the row total, but first I would like to be able to run the loop and get the matrix output.
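
One possible way to put the pieces of this thread together into the loop is sketched below. It is not a definitive implementation: the data are simulated stand-ins for the real Signal and PercChange vectors, the seed is arbitrary, and the names n_sig, n_abs, sig_bin, abs_bin and in_bin are introduced here. It assumes every PercChange value lies between 0 and 1 so that cut() assigns each one to a bin.

```r
set.seed(1)                              # arbitrary seed, for a repeatable example
Signal     <- runif(100)                 # simulated stand-in for the real signal
PercChange <- abs(rnorm(100) / 100)      # simulated stand-in for the real changes
Signalt    <- seq(0, 1, 0.05)            # 5% break points
abst       <- c(seq(0, 0.02, 0.0025), 1) # 0.25% break points plus a catch-all

n_sig <- length(Signalt) - 1             # number of signal bins (rows)
n_abs <- length(abst) - 1                # number of change bins (columns)

# Bin both vectors once; right = FALSE mirrors `>= lower & < upper`
sig_bin <- cut(Signal, Signalt, include.lowest = TRUE, right = FALSE)
abs_bin <- cut(PercChange, abst, include.lowest = TRUE)

CondDistMat <- matrix(0, nrow = n_sig, ncol = n_abs,
                      dimnames = list(levels(sig_bin), levels(abs_bin)))

for (j in seq_len(n_sig)) {
  in_bin <- as.integer(sig_bin) == j     # observations whose signal is in bin j
  counts <- table(abs_bin[in_bin])       # factor keeps all levels: n_abs counts
  # Index the matrix by row number j, not by the break values themselves
  if (sum(counts) > 0) CondDistMat[j, ] <- counts / sum(counts)
}

round(CondDistMat, 2)
```

Each non-empty row then sums to 1, so multiplying by 100 gives the percentage of the row total. Rows for signal bins with no observations are left at zero, which is a choice made in this sketch rather than something the thread specifies.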