Two ways to account for temporal autocorrelation in a regression model

Assume that we have a time series with autocorrelated values, described by the regression model Y_i ~ X_i. As far as I understand, AR-regression models (i.e. regression with autoregressive errors) account for this correlation by assuming that the residuals are autocorrelated. Alternatively, I could introduce additional predictors "X_minus_one", "X_minus_two", ..., holding the values of X at the 1st, 2nd, ... time step before. That way I would account for autocorrelation directly at the level of the predictors.
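To make the two setups concrete, here is a minimal numpy sketch under assumed conditions: data are simulated from Y_i = b0 + b1*X_i + e_i with AR(1) errors (phi = 0.7 is an arbitrary choice). The helper names `simulate`, `lagged_design`, `fit_lagged_predictors` and `fit_ar1_errors` are hypothetical. The AR-error model is fitted here with the Cochrane-Orcutt iteration, which is only one of several estimators (exact maximum likelihood or Prais-Winsten are alternatives) and not necessarily what a given AR-regression package does.

```python
import numpy as np


def simulate(n=500, beta0=1.0, beta1=2.0, phi=0.7, seed=0):
    """Assumed setup: Y_i = beta0 + beta1*X_i + e_i, with e_i = phi*e_{i-1} + u_i."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    e = np.zeros(n)
    for i in range(1, n):
        e[i] = phi * e[i - 1] + u[i]
    y = beta0 + beta1 * x + e
    return x, y


def lagged_design(x, max_lag):
    """Columns: intercept, X_i, X_minus_one, ..., X_minus_<max_lag>.

    The first max_lag time steps are dropped because their lags are unknown.
    """
    n = len(x)
    cols = [np.ones(n - max_lag)]
    for k in range(max_lag + 1):
        cols.append(x[max_lag - k : n - k])  # column k holds X at time t - k
    return np.column_stack(cols)


def fit_lagged_predictors(x, y, max_lag=2):
    """Approach 2: ordinary least squares on the current and lagged predictors."""
    X = lagged_design(x, max_lag)
    coef, *_ = np.linalg.lstsq(X, y[max_lag:], rcond=None)
    return coef


def fit_ar1_errors(x, y, n_iter=50, tol=1e-8):
    """Approach 1: regression with AR(1) errors via Cochrane-Orcutt iteration."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from plain OLS
    rho = 0.0
    for _ in range(n_iter):
        resid = y - X @ beta  # residuals of the original equation
        rho_new = resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1])
        # Quasi-difference: y_t - rho*y_{t-1} = beta0*(1 - rho) + beta1*(x_t - rho*x_{t-1}) + u_t
        y_star = y[1:] - rho_new * y[:-1]
        X_star = X[1:] - rho_new * X[:-1]
        beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho


x, y = simulate()
print("lagged-X coefficients:", fit_lagged_predictors(x, y, max_lag=2))
print("AR(1)-error beta and rho:", fit_ar1_errors(x, y))
```

The lagged-predictor fit is a single least-squares solve, whereas the AR(1)-error fit alternates between estimating beta and rho until rho stabilizes, which may be part of why the lagged-predictor approach appears to converge faster.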

The question is: what is the practical difference between these two approaches? The second one usually converges much faster, so what is the advantage of the AR models?
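One practical difference can be checked directly, again on simulated data under assumed conditions (X has only a contemporaneous effect and the errors are AR(1)): compare the lag-1 autocorrelation left in the residuals of each fit. This is a sketch, not a general result; the helper `lag1_autocorr` is hypothetical, and a single Cochrane-Orcutt step stands in for a full AR-error fit.

```python
import numpy as np


def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a residual series."""
    r = r - r.mean()
    return float(r[1:] @ r[:-1] / (r @ r))


# Assumed data-generating process: X has no lagged effect, errors are AR(1).
rng = np.random.default_rng(1)
n, phi = 500, 0.7
x = rng.normal(size=n)
e = np.zeros(n)
for i in range(1, n):
    e[i] = phi * e[i - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

# Lagged-predictor model: intercept, X_i, X_minus_one, X_minus_two.
Z = np.column_stack([np.ones(n - 2), x[2:], x[1:-1], x[:-2]])
coef, *_ = np.linalg.lstsq(Z, y[2:], rcond=None)
resid_lagged = y[2:] - Z @ coef

# AR(1)-error model, one Cochrane-Orcutt step: estimate rho from OLS residuals,
# then regress the quasi-differenced series.
X = np.column_stack([np.ones(n), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
r_ols = y - X @ b_ols
rho = float(r_ols[1:] @ r_ols[:-1] / (r_ols[:-1] @ r_ols[:-1]))
y_star = y[1:] - rho * y[:-1]
X_star = X[1:] - rho * X[:-1]
b_ar, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
innov_ar = y_star - X_star @ b_ar

print("lag-1 autocorrelation, lagged-X residuals:", round(lag1_autocorr(resid_lagged), 2))
print("lag-1 autocorrelation, AR(1) innovations:", round(lag1_autocorr(innov_ar), 2))
```

Under this assumed setup, the lagged-X residuals should keep an autocorrelation close to phi, because lags of X cannot absorb dependence that lives in the error term, while the quasi-differenced innovations should be close to uncorrelated. If the dependence instead came from lagged effects of X, the comparison could come out differently.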