Varying coefficient problem with weighted regression - DOES IT MAKE SENSE?

Dear List,

Please consider first a toy example. Suppose we have a large norming database with three variables, representing raw score on a mathematical test (rawMath), raw score on a nonverbal IQ task (nvIQ), and socioeconomic background (SES). For math skills, we have a standard score as well (stMath), computed by standardizing the raw math score controlling for school grade (GR).
With standard linear regression, we can show that nvIQ and SES predict rawMath. However, theoretical considerations suggest that the relative importance of nvIQ and SES in the prediction of rawMath depends on the level of stMath and the age of the children (AGE). E.g. for young children with dyscalculia, SES is a better predictor of math test performance, while for older and/or mathematically talented children, nvIQ is the main predictor.

How can I test the hypothesis above?
1) My first thought was that I could build a varying coefficient model, e.g.
However, since rawMath can be predicted very well by stMath and AGE (this is of course trivial), the model above leaves no role for the original predictors. Should I categorize the stMath variable, so that enough unexplained variance remains to build the varying coefficient model?

2) My second thought was to use weighting, borrowing the idea of geographically weighted regression (GWR). In GWR, spatial dependence is handled by running a weighted regression for each data point, where the weights are based on the spatial distances between the observations. We could extend this idea to our case by defining a space spanned by AGE and stMath, computing the distances between observations in that space, and running GWR (in effect, a simple weighted linear regression for each observation). Then we can plot the standardized coefficients (or any other measure of interest) in this "space" and look for fluctuations. We can test the significance of local variation in the coefficients by Monte Carlo simulation, i.e. permuting the coordinates before running GWR and repeating this many times to get an approximate null distribution of the given measure.
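To make the procedure in (2) concrete, here is a minimal sketch in plain Python (standard library only), not the R implementation mentioned in this post. Several things in it are assumptions made purely for illustration: the Gaussian kernel, the fixed bandwidth of 0.5, the use of the standard deviation of the local coefficients as the test statistic, the simulated data, and all function names. AGE and stMath are assumed to be standardized beforehand so that distances in the two directions are comparable.

```python
import math
import random
import statistics


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_ols(X, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy."""
    p = len(X[0])
    xtwx = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(wi * xi[j] * yi for xi, yi, wi in zip(X, y, w))
            for j in range(p)]
    return solve_linear(xtwx, xtwy)


def local_coefs(X, y, coords, bandwidth):
    """Fit one weighted regression per observation.

    Weights come from a Gaussian kernel on the Euclidean distance
    between observations in the (AGE, stMath) space (an assumption).
    """
    fits = []
    for ci in coords:
        w = [math.exp(-0.5 * (math.dist(ci, cj) / bandwidth) ** 2)
             for cj in coords]
        fits.append(weighted_ols(X, y, w))
    return fits


def coef_spread(fits, j):
    """Test statistic (an assumption): SD of the j-th local coefficient."""
    return statistics.pstdev(f[j] for f in fits)


def permutation_test(X, y, coords, bandwidth, j, n_perm, rng):
    """Permute the coordinates across observations and refit each time."""
    observed = coef_spread(local_coefs(X, y, coords, bandwidth), j)
    hits = 0
    for _ in range(n_perm):
        shuffled = coords[:]
        rng.shuffle(shuffled)
        if coef_spread(local_coefs(X, y, shuffled, bandwidth), j) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Simulated toy data (hypothetical): the nvIQ slope grows with AGE
# and the SES slope shrinks with AGE.
rng = random.Random(1)
n = 40
coords = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]  # (AGE, stMath)
nviq = [rng.gauss(0, 1) for _ in range(n)]
ses = [rng.gauss(0, 1) for _ in range(n)]
X = [[1.0, a, b] for a, b in zip(nviq, ses)]  # intercept, nvIQ, SES
y = [1.0 + (1.0 + age) * a + (1.0 - age) * b + rng.gauss(0, 0.3)
     for (age, _), a, b in zip(coords, nviq, ses)]

coefs = local_coefs(X, y, coords, bandwidth=0.5)
observed, p_value = permutation_test(X, y, coords, 0.5, j=1, n_perm=49, rng=rng)
print("SD of local nvIQ coefficients:", round(observed, 3))
print("permutation p-value:", p_value)
```

The p-value uses the (hits + 1) / (n_perm + 1) convention, so it can never be exactly zero. In a real analysis the bandwidth would need to be chosen with care (e.g. by cross-validation), since it largely determines how much local variation the fits can show.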

I implemented this method in R, and the results are encouraging. However, I would appreciate it if you could tell me whether this method makes statistical sense or whether I should forget it. Of course, alternative solutions are highly welcome.