Say you're trying to evaluate whether a change to an email newsletter design is an improvement. To do this, you can send one design to half of your list and the other design to the remaining half. You can then use a hypothesis test to determine whether the difference in response is significant. If you repeat this for a large number of emails and randomly select which subscribers receive each design every time, the better design should eventually reveal itself.
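As a concrete sketch of the hypothesis test step, here is a two-proportion z-test on click-through rates, implemented with only the standard library. The function name, the sample counts, and the click numbers are all hypothetical, chosen just for illustration:

```python
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided two-proportion z-test comparing click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical results: design A got 120 clicks from 1000 sends,
# design B got 150 clicks from 1000 sends.
z, p = two_proportion_z(120, 1000, 150, 1000)
```

In practice you would reach for a library routine (e.g. `statsmodels.stats.proportion.proportions_ztest` does the same computation), but the arithmetic above is the whole test.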
But not all subscribers on your email list are the same. Some have better past engagement rates than others, and you have that history for each user. If you split the list randomly, you might end up with one half having a noticeably higher engagement rate than the other. By taking past behavior into account instead, you can divide the list into groups of subscribers with similar engagement rates and then randomize which design each subscriber receives within those groups. You could even go further and build a model that predicts engagement for each user, then compare predicted engagement rates to the actual rates observed under each design.
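The grouping idea above is stratified random assignment. A minimal sketch, assuming we have a hypothetical `engagement` mapping from user id to past engagement rate (the function and variable names are invented for illustration):

```python
import math
import random

def stratified_assign(users, engagement, n_strata=4, seed=0):
    """Assign users to designs "A"/"B", randomizing within engagement strata.

    Users are ranked by past engagement, cut into n_strata contiguous
    groups, and each group is split roughly in half at random, so both
    designs see a similar mix of high- and low-engagement subscribers.
    """
    rng = random.Random(seed)
    ranked = sorted(users, key=lambda u: engagement[u])
    size = math.ceil(len(ranked) / n_strata)
    assignment = {}
    for i in range(0, len(ranked), size):
        stratum = ranked[i:i + size]
        rng.shuffle(stratum)
        half = len(stratum) // 2
        for u in stratum[:half]:
            assignment[u] = "A"
        for u in stratum[half:]:
            assignment[u] = "B"
    return assignment

# Hypothetical list of 100 users whose past engagement is just their id / 100.
users = list(range(100))
engagement = {u: u / 100 for u in users}
assignment = stratified_assign(users, engagement)
```

Because the split happens inside each stratum, the two design groups end up balanced on past engagement by construction, rather than only in expectation.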
It seems that by taking past information into account, you might be able to identify the better email design faster than with purely random assignment. Is this ever true? If so, in what situations? How do you know when to use prior information and when to ignore it? And how does using prior information affect the hypothesis tests themselves?