When are residual plots ok?


Fortran must die
In general you want a random scatter in the residuals. You want no obvious patterns. I know that is the theory. But what does that really mean in practice? How much of a random scatter do you need?

Part of my problem is that I have thousands of data points, and the plots commonly look like a giant blob, not like the pictures shown in textbooks to indicate that you do or do not have a problem.

Note that I am asking this as a general question. I have no residuals to post; otherwise I would have.


TS Contributor
Interpreting residual plots is something you develop a feel for over time. There are gross violations and there are "good" looking plots, but in between is where you learn how to think about what's really going on. You can see patterns that tell you a lot about the model and how it could be improved, such as a missing interaction with a grouping variable or other omitted terms.
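As a small illustration of a pattern that points to an omitted term, here is a rough sketch in Python (standard library only). Everything in it is an assumption made for the example: the data are simulated, and averaging the residuals within bins of x is just one possible way to see structure when thousands of points plot as a blob. It is not a method taken from this thread.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def binned_residual_means(x, y, n_bins=5):
    """Fit a straight line, then average the residuals within equal-count
    bins of x (left to right). Leftover points beyond n_bins * size are dropped."""
    a, b = fit_line(x, y)
    resid = sorted((xi, yi - (a + b * xi)) for xi, yi in zip(x, y))
    size = len(resid) // n_bins
    return [
        statistics.fmean([r for _, r in resid[i * size:(i + 1) * size]])
        for i in range(n_bins)
    ]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = [rng.uniform(0, 10) for _ in range(5000)]

# Simulated data (hypothetical): one response is truly linear, the other has
# a quadratic term that the straight-line fit leaves out.
y_linear = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
y_curved = [1 + 2 * xi + 0.5 * xi ** 2 + rng.gauss(0, 1) for xi in x]

linear_means = binned_residual_means(x, y_linear)
curved_means = binned_residual_means(x, y_curved)

print("adequate model:", [round(m, 2) for m in linear_means])
print("omitted x^2   :", [round(m, 2) for m in curved_means])
```

For the adequate model the bin means should hover near zero. For the model missing the squared term they should be positive at both ends and negative in the middle, which is the U shape you would look for in the plot itself.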

One way to improve in the art of interpreting residual plots is to always keep in mind what the plot might tell you about a particular assumption, and then think about how the model could be affected when that assumption is not reasonably satisfied. You may still come across a plot where you're not sure how to call it. In that case, apply at least one of the methods for remedying the potential violation, check the assumption again after the fix, and then compare the results of the "violated" model to those of the "fixed" model. This is good practice: it helps you understand the plots better and shows how severely things can be "violated" before a fix is needed (that is, how well the conclusions hold up to potential violations). It usually isn't a bad idea to implement a fix and compare the results whenever you're unsure whether certain assumptions are reasonable. This isn't a magic wand, though, because everything you do needs to make sense.

I know of someone with a PhD in statistics and 30+ years of consulting and teaching experience who pretty much only uses residual plots rather than formal "tests" (like Breusch-Pagan). He strongly advocates this approach of "fixing" and comparing when you aren't sure; often you need to look deeper at the data to see why they are giving a particular pattern.
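The "fix and compare" routine can be sketched roughly as follows in Python (standard library only). The assumptions are all mine: the data are simulated with noise that grows with x, the chosen "fix" is weighted least squares with weights 1/x^2 (which presumes you know the variance structure, something you rarely do with real data), and `spread_ratio` is a hypothetical helper that stands in for eyeballing the funnel shape in a plot.

```python
import math
import random

def fit_wls(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b).
    With all weights equal to 1 this is ordinary least squares."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def spread_ratio(x, y, w, a, b):
    """Mean absolute weighted residual in the upper half of x divided by
    the same quantity in the lower half. Values near 1 suggest even spread."""
    scaled = sorted(
        (xi, abs(math.sqrt(wi) * (yi - (a + b * xi))))
        for xi, yi, wi in zip(x, y, w)
    )
    half = len(scaled) // 2
    lower = sum(r for _, r in scaled[:half]) / half
    upper = sum(r for _, r in scaled[half:]) / (len(scaled) - half)
    return upper / lower

rng = random.Random(1)  # fixed seed so the sketch is repeatable
x = [rng.uniform(1, 10) for _ in range(4000)]
# Simulated data (hypothetical): noise standard deviation proportional to x.
y = [3 + 2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

ones = [1.0] * len(x)                  # "violated" model: plain OLS
inv_var = [1.0 / xi ** 2 for xi in x]  # "fixed" model: assumed weights

a_ols, b_ols = fit_wls(x, y, ones)
a_wls, b_wls = fit_wls(x, y, inv_var)
ols_ratio = spread_ratio(x, y, ones, a_ols, b_ols)
wls_ratio = spread_ratio(x, y, inv_var, a_wls, b_wls)

print("OLS slope", round(b_ols, 3), "spread ratio", round(ols_ratio, 2))
print("WLS slope", round(b_wls, 3), "spread ratio", round(wls_ratio, 2))
```

In this setup the spread ratio should drop toward 1 after the fix while the two slope estimates stay close to each other, which suggests the slope conclusion holds up to the violation. The sketch does not compute standard errors, and those are where unequal variance usually does its damage, so a real comparison should look at them too.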