I have a data set with two state spaces, S1 and S2, and I've already generated a transition matrix for it.
The transition matrix has the following columns:
User_id    P(S1)*P(S2)    P(S1)*(1-P(S2))    (1-P(S1))*P(S2)    (1-P(S1))*(1-P(S2))
.
.
.
I can't work out how to build P for use with MDP Toolbox from this, since P needs to be a 3-D array (one state-by-state transition matrix per action).
My reward function is also just 0s and 1s, since you either get utility from it or you don't.
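
To make the target shape concrete, here is a rough sketch. It assumes the Python version of the toolbox (pymdptoolbox), which takes P as an array of shape (A, S, S) and accepts a reward array of shape (S, A); the MATLAB version orders the axes as (S, S, A) instead. The number of actions, the probabilities `p1` and `p2`, and the choice of which state is rewarded are all made up for illustration and are not taken from my data.

```python
import numpy as np

# Hypothetical sizes: 2 actions, 4 joint states, ordered like the columns above:
# (S1, S2), (S1, not S2), (not S1, S2), (not S1, not S2).
n_actions, n_states = 2, 4

# Made-up probabilities, purely for illustration.
# p1[a, s] = P(S1 holds at the next step | action a, current joint state s)
# p2[a, s] = the same for S2
rng = np.random.default_rng(0)
p1 = rng.uniform(0.1, 0.9, size=(n_actions, n_states))
p2 = rng.uniform(0.1, 0.9, size=(n_actions, n_states))

# Stack the four products along a new last axis, giving shape (A, S, S).
# P[a, s, :] is the distribution over next joint states.
P = np.stack(
    [p1 * p2, p1 * (1 - p2), (1 - p1) * p2, (1 - p1) * (1 - p2)],
    axis=-1,
)

# Every row P[a, s, :] has to sum to 1 for the array to be a valid transition model.
assert P.shape == (n_actions, n_states, n_states)
assert np.allclose(P.sum(axis=2), 1.0)

# 0/1 reward of shape (S, A). Assumption: only the first joint state pays off.
R = np.zeros((n_states, n_actions))
R[0, :] = 1.0
```

If this layout is right, `P` and `R` could then be passed to a solver such as `mdptoolbox.mdp.ValueIteration(P, R, discount)` with a discount factor below 1. What I still don't see is how to get from one row per User_id to one row per (action, current state).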

Please help. Thanks so much!