Statistically different tops for 2 parabolas with non-normally distributed data

rob1

New Member
#1
For situations 1 and 2, my data show a parabolic relation between variables X and Y (both parabolas have a maximum).

My question is whether it is possible to test if the maxima of the two parabolas differ significantly from each other, given that the tops of the parabolas do not coincide with the means of the data (for both parabolas, the vast majority of the observations lie to the left of the maximum).

If so, how can I test this (e.g., which test should I use)?

Thanks a lot :)
 

Dason

Ambassador to the humans
#2
You could do this in a variety of ways, none of which I would describe as 'simple', though. To start with, let me ask a question: if you fit a quadratic regression (a regression model of the form \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i\)) to each set of data, do the residuals appear to be approximately normally distributed?
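A minimal sketch of the quadratic regression Dason describes, fitted to one situation with NumPy least squares. The function name `fit_quadratic` and the simulated data are hypothetical, not from the thread; the returned residuals are what you would then inspect for approximate normality (e.g. with a histogram or Q-Q plot).

```python
import numpy as np


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2.

    Returns the coefficient vector (b0, b1, b2) and the residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix columns: 1, x, x^2 (matching beta_0, beta_1, beta_2)
    X = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return beta, residuals


if __name__ == "__main__":
    # Hypothetical simulated data: the true maximum is at x = 4,
    # so most observations in [0, 5] lie to the left of it
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 5, size=60)
    y = 1.0 + 4.0 * x - 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)
    beta, res = fit_quadratic(x, y)
    print("estimates (b0, b1, b2):", beta)
    print("residual sd:", res.std(ddof=3))
```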
 

rob1

New Member
#3
Yes, I did the quadratic regressions, and the residuals appear to be normally distributed.

I don't know whether this matters, but I use SPSS for the analysis.
 

BGM

TS Contributor
#4
I'll continue from Dason's question, using his notation, which is standard.

From elementary properties of the parabola, we know that the maximum is

\( \frac {4\beta_0\beta_2 - \beta_1^2} {4\beta_2} \)

Note: I am assuming you mean the maximum value itself rather than its location, the axis of symmetry \( x = \frac {-\beta_1} {2\beta_2} \); the same calculations can be applied to the latter.
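Completing the square shows where both expressions come from (assuming \( \beta_2 \neq 0 \)):

\( \beta_0 + \beta_1 x + \beta_2 x^2 = \beta_2\left(x + \frac{\beta_1}{2\beta_2}\right)^2 + \frac{4\beta_0\beta_2 - \beta_1^2}{4\beta_2} \)

For a parabola with a maximum, \( \beta_2 < 0 \), so the first term is never positive; the maximum \( \frac{4\beta_0\beta_2 - \beta_1^2}{4\beta_2} \) is therefore reached exactly where that term vanishes, at \( x = \frac{-\beta_1}{2\beta_2} \).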

So you will be testing

\( H_0: \frac {4\alpha_0\alpha_2 - \alpha_1^2} {4\alpha_2} =
\frac {4\beta_0\beta_2 - \beta_1^2} {4\beta_2} \)

where \( \alpha \) and \( \beta \) represent the parameters of the two situations, respectively.

From a modelling perspective, you need to specify whether the two situations are independent. If the variables are the "same" but observed in two different "situations", the pairs in the two situations are likely to be dependent. Using a dummy variable, the model could be written as

\( y_i = D_i(\alpha_0 + \alpha_1x_i + \alpha_2x_i^2) +
(1 - D_i)(\beta_0 + \beta_1x_i + \beta_2x_i^2) + \epsilon_i \)

where \( D_i \in \{0, 1\} \) is a dummy variable indicating the situation. Homogeneity of the error variance can be addressed as well.
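A sketch of fitting the dummy-variable model above by ordinary least squares, assuming all observations are independent with a common error variance. It does not capture the dependence between paired situations mentioned above. The function name `fit_dummy_model` is hypothetical. Because the design is block-diagonal in \( D_i \), the point estimates equal those from two separate fits; what the joint fit adds is a pooled variance and one covariance matrix for all six parameters.

```python
import numpy as np


def fit_dummy_model(x, y, d):
    """OLS fit of y = D*(a0 + a1 x + a2 x^2) + (1 - D)*(b0 + b1 x + b2 x^2) + e.

    Returns (alpha, beta, cov): alpha = (a0, a1, a2), beta = (b0, b1, b2), and
    cov, the 6x6 estimated covariance of (a0, a1, a2, b0, b1, b2) assuming
    independent errors with a common variance.
    """
    x, y, d = (np.asarray(v, dtype=float) for v in (x, y, d))
    X = np.column_stack([d, d * x, d * x**2,
                         1 - d, (1 - d) * x, (1 - d) * x**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof            # pooled error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # usual OLS covariance estimate
    return coef[:3], coef[3:], cov
```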


One way to test it is with the bootstrap; another is the delta method. Let's confirm the model set-up first.
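A minimal delta-method sketch for the \( H_0 \) on the maxima, assuming you already have the six estimates ordered \( (\alpha_0, \alpha_1, \alpha_2, \beta_0, \beta_1, \beta_2) \) and their covariance matrix (for example from a joint dummy-variable fit). The function names are hypothetical, and the p-value relies on the usual large-sample normal approximation. For the axis of symmetry, only the function and its gradient would change.

```python
import math

import numpy as np


def vertex_max(c0, c1, c2):
    """Maximum value (4*c0*c2 - c1**2) / (4*c2) of y = c0 + c1*x + c2*x**2."""
    return (4 * c0 * c2 - c1**2) / (4 * c2)


def delta_test_max_diff(coef, cov):
    """Delta-method Wald test of H0: max of parabola alpha == max of parabola beta.

    coef: estimates ordered (a0, a1, a2, b0, b1, b2); cov: their 6x6 covariance.
    Returns (difference, standard error, z, two-sided normal p-value).
    """
    a0, a1, a2, b0, b1, b2 = coef
    diff = vertex_max(a0, a1, a2) - vertex_max(b0, b1, b2)
    # Gradient of M(c0, c1, c2) = c0 - c1^2 / (4 c2): (1, -c1/(2 c2), c1^2/(4 c2^2))
    grad_a = [1.0, -a1 / (2 * a2), a1**2 / (4 * a2**2)]
    grad_b = [1.0, -b1 / (2 * b2), b1**2 / (4 * b2**2)]
    g = np.array(grad_a + [-v for v in grad_b])   # gradient of the difference
    se = math.sqrt(g @ np.asarray(cov, dtype=float) @ g)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))          # equals 2 * (1 - Phi(|z|))
    return diff, se, z, p
```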
 

rob1

New Member
#5
Thanks so far :)

I reread your post and can confirm the model set-up, except that I am comparing the axes of symmetry rather than the maxima.
 

Dason

Ambassador to the humans
#6
Bootstrapping would probably be the easiest approach; otherwise I would suggest the delta method, as BGM mentioned.
 

BGM

TS Contributor
#7
First I must say that I have little experience in implementing the bootstrap procedure in practice, so please correct me if I am wrong.

Now you want to test

\( H_0: \frac {-\alpha_1} {2\alpha_2} = \frac {-\beta_1} {2\beta_2} \)

which, provided \( \alpha_2, \beta_2 \neq 0 \) (true for any genuine parabola), is equivalent to

\( H_0: \alpha_1\beta_2 - \beta_1\alpha_2 = 0 \)

The basic idea is that, since \( \hat{\alpha}_1, \hat{\alpha}_2, \hat{\beta}_1, \hat{\beta}_2 \) are consistent estimators, we may use

\( T = \hat{\alpha}_1\hat{\beta}_2 - \hat{\beta}_1\hat{\alpha}_2 \)

as the test statistic and reject \( H_0 \) when it is significantly different from 0.

To decide whether it is significant, you need the distribution of \( T \) under \( H_0 \); its quantiles at the given significance level define the rejection/acceptance region.

The steps could be as follows:

1. Suppose you have \( m \) and \( n \) pairs of data for the two situations. Resample from the original sample with replacement, keeping the same sample size for each situation.

2. Using the generated sample, estimate all the parameters under the \( H_0 \) constraint. They must be estimated jointly for both situations, and you may need a Lagrange multiplier if you are seeking a closed-form solution.

3. Calculate the test statistic \( T \) in this sample, and record it.

4. Return to step 1 and repeat \( B \) times. Use the sample percentiles of the recorded \( T_1, T_2, \ldots, T_B \) to construct the acceptance region. E.g., if your significance level is \( 5\% \), use the 2.5th and 97.5th percentiles as the end-points of the acceptance region (interval).

Once you have the acceptance region, calculate \( T \) for the original sample and make the decision.
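Below is a sketch of a bootstrap for \( T = \hat{\alpha}_1\hat{\beta}_2 - \hat{\beta}_1\hat{\alpha}_2 \). It swaps in a simpler, commonly used variant rather than the constrained procedure above. Instead of estimating under \( H_0 \), it fits each situation without constraints in every resample and builds a percentile confidence interval for \( T \). \( H_0 \) is rejected at the 5% level if 0 lies outside the 2.5th to 97.5th percentile interval. One reason to consider it is that if \( T \) is computed from estimates that satisfy the \( H_0 \) constraint, it is zero by construction. The sketch resamples each situation separately, so it assumes the two situations are independent. All function names are hypothetical.

```python
import numpy as np


def quad_coefs(x, y):
    """OLS coefficients (c0, c1, c2) of y = c0 + c1*x + c2*x**2."""
    X = np.column_stack([np.ones_like(x), x, x**2])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def t_stat(xa, ya, xb, yb):
    """T = alpha1*beta2 - beta1*alpha2 from the two unconstrained quadratic fits."""
    _, a1, a2 = quad_coefs(xa, ya)
    _, b1, b2 = quad_coefs(xb, yb)
    return a1 * b2 - b1 * a2


def bootstrap_t_interval(xa, ya, xb, yb, B=2000, level=0.95, seed=0):
    """Observed T and a percentile bootstrap interval for it.

    (x, y) pairs are resampled with replacement within each situation,
    keeping the original sample sizes m and n.
    """
    xa, ya, xb, yb = (np.asarray(v, dtype=float) for v in (xa, ya, xb, yb))
    rng = np.random.default_rng(seed)
    m, n = len(xa), len(xb)
    t_boot = np.empty(B)
    for i in range(B):
        ia = rng.integers(0, m, size=m)   # resample situation 1
        ib = rng.integers(0, n, size=n)   # resample situation 2
        t_boot[i] = t_stat(xa[ia], ya[ia], xb[ib], yb[ib])
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(t_boot, [tail, 100 - tail])
    return t_stat(xa, ya, xb, yb), (lo, hi)
```

Usage: `t_obs, (lo, hi) = bootstrap_t_interval(x1, y1, x2, y2)`; reject \( H_0 \) at the 5% level if `lo > 0` or `hi < 0`.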
 

Dason

Ambassador to the humans
#8
Actually this would probably be fairly simple to do using non-linear regression... What software are you using?
 

Dason

Ambassador to the humans
#9
If you set up a model of the form

\(y_i = (\beta_0 + \alpha_0I_i) + (\beta_1 + \alpha_1I_i)(x_i - (x_0 + a_0I_i))^2 + \epsilon_i\)

you could fit it using a non-linear regression routine. If \(I_i\) is an indicator that is 0 for group 1 and 1 for group 2, then the question "does the max occur at different x-values for these two groups?" translates into testing the null hypothesis \(a_0 = 0\).
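A sketch of fitting this model with SciPy's `curve_fit` (rob1 uses SPSS; this is a Python stand-in, not an SPSS procedure). The parameters `al0` and `al1` stand for \( \alpha_0 \) and \( \alpha_1 \), and `a0` is the vertex shift \( a_0 \). The function names and starting values `p0` are hypothetical. The z-statistic uses the approximate covariance that `curve_fit` returns, so it is a large-sample Wald test.

```python
import numpy as np
from scipy.optimize import curve_fit


def vertex_model(X, b0, al0, b1, al1, x0, a0):
    """y = (b0 + al0*I) + (b1 + al1*I) * (x - (x0 + a0*I))**2, with X = (x, I)."""
    x, ind = X
    return (b0 + al0 * ind) + (b1 + al1 * ind) * (x - (x0 + a0 * ind)) ** 2


def wald_vertex_shift(x, y, ind, p0):
    """Non-linear least-squares fit; returns (a0_hat, its standard error, z)."""
    X = np.vstack([np.asarray(x, dtype=float), np.asarray(ind, dtype=float)])
    popt, pcov = curve_fit(vertex_model, X, np.asarray(y, dtype=float), p0=p0)
    a0_hat = popt[5]                 # estimated shift of the vertex for group 2
    se = np.sqrt(pcov[5, 5])
    return a0_hat, se, a0_hat / se
```

Sensible starting values matter for non-linear fits; for example, `x0` could start at \( -\hat{\beta}_1 / (2\hat{\beta}_2) \) from an ordinary quadratic fit to group 1.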

All this really does is write a quadratic function in the form
\(y = \beta_0 + \beta_1(x - x_0)^2\), which is functionally equivalent to the usual way of writing quadratics in regression but is non-linear in the parameters. We allow the two groups to have different parameters.
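Expanding the vertex form shows how it maps to the usual coefficients. Here \( \gamma_0, \gamma_1, \gamma_2 \) are hypothetical symbols for the usual form's coefficients, since \( \beta_0, \beta_1 \) are already used above:

\( \beta_0 + \beta_1(x - x_0)^2 = (\beta_0 + \beta_1x_0^2) - 2\beta_1x_0\,x + \beta_1x^2 \)

so \( \gamma_2 = \beta_1 \), \( \gamma_1 = -2\beta_1x_0 \), \( \gamma_0 = \beta_0 + \beta_1x_0^2 \), and conversely \( x_0 = \frac{-\gamma_1}{2\gamma_2} \), the axis of symmetry. Group 1's axis is \( x_0 \) and group 2's is \( x_0 + a_0 \), which is why testing \( a_0 = 0 \) tests equality of the axes of symmetry.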