Trying to think of alternative approaches to my dynamic pricing model

Right now I run a pricing experiment where 25% of users get one of 4 random prices. After collecting sufficient data, I split it into training, validation, and test sets, and then split the training data into 4 subsets, one for each price offered. I then fit a separate model on each of the 4 subsets.
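To make the setup concrete, here is a minimal sketch of the split and the per-price partition. The field names (`user_id`, `price`, `revenue`), the four price points, the 60/20/20 split, and the synthetic data are all assumptions for illustration, not my real pipeline. The actual model fitting is left out, since any regressor could be trained on each subset.

```python
import random
from collections import defaultdict

def split_rows(rows, seed=0, frac_train=0.6, frac_val=0.2):
    """Shuffle rows and split them into train/validation/test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    n_val = int(len(shuffled) * frac_val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def group_by_price(rows):
    """Partition rows into one subset per offered price."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["price"]].append(row)
    return dict(groups)

# Synthetic experiment log (assumption): each user is randomly offered
# one of 4 hypothetical prices and buys with probability 0.5.
rng = random.Random(42)
PRICES = [5, 10, 15, 20]
rows = []
for i in range(1000):
    price = rng.choice(PRICES)
    revenue = float(price) if rng.random() < 0.5 else 0.0
    rows.append({"user_id": i, "price": price, "revenue": revenue})

train, val, test = split_rows(rows)
train_by_price = group_by_price(train)  # one training subset per price
```

Each entry of `train_by_price` would then be used to train its own model.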

I score all 4 models on the validation data, and whichever model predicts the highest revenue determines the chosen price for that person. So if model A predicts the highest revenue, that person is assigned price A.
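The selection step is just an argmax over the four predictions per user. A minimal sketch, assuming the predictions are already collected as a mapping from price to predicted revenue (the user IDs and numbers below are made up):

```python
def pick_best_price(predicted_revenue):
    """Return the price whose model predicts the highest revenue for one user."""
    return max(predicted_revenue, key=predicted_revenue.get)

# Hypothetical predictions: {user: {price: revenue predicted by that price's model}}
val_predictions = {
    "user_1": {5: 2.1, 10: 3.4, 15: 3.0, 20: 1.2},
    "user_2": {5: 4.0, 10: 3.9, 15: 2.5, 20: 0.8},
}

chosen = {user: pick_best_price(preds) for user, preds in val_predictions.items()}
print(chosen)  # {'user_1': 10, 'user_2': 5}
```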

For the uplift calculation, I keep only the people whose offered price in the experiment matches their predicted revenue-maximizing price.
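A minimal sketch of that calculation follows. The baseline is an assumption: here uplift is measured against the mean revenue of all validation users under the random prices, though a fixed single price would be another reasonable baseline. The field names `offered`, `chosen`, and `revenue` and the example rows are hypothetical.

```python
def matched_uplift(rows):
    """Estimate uplift from users whose offered price equals their chosen price.

    rows: list of dicts with 'offered', 'chosen', and 'revenue' keys.
    Returns (uplift, number of matched users).
    """
    matched = [r["revenue"] for r in rows if r["offered"] == r["chosen"]]
    if not matched:
        raise ValueError("no users were offered their predicted best price")
    policy_mean = sum(matched) / len(matched)
    # Assumed baseline: average revenue under the random price assignment.
    baseline_mean = sum(r["revenue"] for r in rows) / len(rows)
    return policy_mean - baseline_mean, len(matched)

# Hypothetical validation rows
val_rows = [
    {"offered": 10, "chosen": 10, "revenue": 10.0},
    {"offered": 5,  "chosen": 10, "revenue": 5.0},
    {"offered": 20, "chosen": 20, "revenue": 0.0},
    {"offered": 15, "chosen": 5,  "revenue": 0.0},
    {"offered": 5,  "chosen": 5,  "revenue": 5.0},
    {"offered": 10, "chosen": 15, "revenue": 0.0},
]

uplift, n_matched = matched_uplift(val_rows)
```

Only the matched users contribute to the policy estimate, so with 4 prices roughly a quarter of the validation data is used, which makes the estimate noisier than the per-model accuracy metrics.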

My problem is that my model accuracy isn't directly correlated with the calculated uplift, so I'm wondering if there is a way to model this without building 4 separate models.