I am analyzing the efficacy of a treatment on premature babies' SpO2 levels.

I have many measurements before the treatment begins and many measurements after. There are about 20 different babies in the study. The data looks like this:

ID Mean_SpO2 Minutes
1 98 -5
1 99 -4
1 99 -3

In a typical repeated measures study, there is a single baseline measurement, and we test whether the treatment has any effect relative to that baseline. Here, we can instead use the many pre-treatment measurements to establish a norm for the babies and check whether the post-treatment measurements differ significantly from it.
My question is, how can we test for this?

My current approach is to fit a spline with a knot at time = 0, when the treatment occurs, then test whether the post-knot spline term is significant and examine its effect size.
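
To make the idea concrete, here is a minimal sketch of the kind of model I have in mind. It assumes a linear spline (one possible choice of spline), so the model is SpO2 = b0 + b1*t + b2*max(t, 0), and b2 is the change in slope after treatment. The function names (hinge, design_row, fit_ols) are made up for illustration, the data are synthetic and noise-free rather than from the study, and the fit is plain least squares on a single series, so it ignores the repeated measures across the 20 babies.

```python
# Illustrative sketch only: linear spline with one knot at time 0 (treatment start).
# Model: SpO2 = b0 + b1*t + b2*max(t, 0); b2 is the post-treatment change in slope.
# All names and data here are hypothetical, not taken from the study.

def hinge(t, knot=0.0):
    """Truncated-line basis: 0 before the knot, (t - knot) after it."""
    return max(t - knot, 0.0)

def design_row(t):
    """One row of the design matrix: intercept, time, post-knot hinge term."""
    return [1.0, float(t), hinge(t)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(times, values):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [design_row(t) for t in times]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, values)) for i in range(p)]
    return solve(xtx, xty)

# Synthetic series: slope 0.1 before treatment, 0.1 + 0.3 after.
times = list(range(-5, 6))
values = [97.0 + 0.1 * t + 0.3 * hinge(t) for t in times]
b0, b1, b2 = fit_ols(times, values)
print(b0, b1, b2)
```

With the real data, I assume the same hinge term would go into a model that accounts for the clustering by ID (for example, a mixed model with a random effect per baby), and the test would be on the coefficient of that term.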

Does this approach seem good?