Multiple regression case?

Hi all,

I am pretty new here, and I studied statistics more than 20 years ago, so my skills in the area are totally rusty.
Still, I recently bumped into an interesting case that made me wonder whether I can still revive some of the old glory.

Let's say we have an INITIAL value to which 6 features (Feat1, Feat2, ..., Feat6) are added, giving a FINAL value.
Each of these features in turn has up to 6 mini-features (for example, Feat1 has 5: a1, a2, a3, a4 and a5), whose values can be any integer in a designated interval (for example, between 0 and 50 for Feat1).
Given 500 records of how these mini-features contribute to the FINAL value (but without knowing their exact values), is there a way to find out the average value of all the mini-features?

Note: I tried to attach an .xls file so we would have a common basis for discussion, but it did not work. Which file formats are supported?


Ambassador to the humans
So are you saying

FINAL = INITIAL + Feat1 + Feat2 + Feat3 + Feat4 + Feat5 + Feat6 (is there any error involved in this at all, or will the equality always hold?)

and also

Feat1 = a1 + a2 + a3 + a4 + a5
Feat2 = b1 + b2 + b3 + b4 + b5 (maybe + b6 or something; you don't specify whether each main feature has the same number of sub-features)
...
Feat6 = f1 + f2 + f3 + f4 + f5

And you have values for FINAL, INITIAL, Feat1, Feat2, Feat3, Feat4, Feat5, and Feat6, but you don't have the values for a1, a2, ..., b1, b2, ..., f4, f5. Is that an accurate summary of what you have?
Hi Dason,

Thank you for your reply. Unfortunately I have not been able to upload the .xls file, but I have now managed to add a picture, and I hope you'll be able to figure out all the details from it.

To reply to your questions:

1) Yes, FINAL = INITIAL + Feat1 + Feat2 + Feat3 + Feat4 + Feat5 + Feat6
No error is involved; the equality always holds.

2) Feat1, Feat2, Feat3, etc. are generic names under which the mini-features act, used just to express the equation above.
There is no relation like Feat1 = a1 + a2 + a3 + a4 + a5 (only that a1, a2, ..., a5 are variables of Feat1).

3) I have the values for FINAL and INITIAL, and that is all.
For Feat1, Feat2, Feat3, Feat4, Feat5, and Feat6 I have the allowed ranges, which in fact extend down to the mini-feature level, but no discrete values for a1, a2, ..., b1, b2, ..., f4, f5.

This should be a more accurate summary of what I have.

Does it make more sense now?


Ambassador to the humans
Essentially, you could just set up indicator variables. Your "model" would then be the following (really you don't need a statistical model if everything is exact, but it doesn't hurt to think of it this way):

\(\beta_{a1}a1 + \beta_{a2}a2 + \ldots + \beta_{b1}b1 + \ldots + \beta_{f4}f4 = FINAL - INITIAL\)

If things really are "exact" and the relationships you describe really do hold, then you'll be able to get the exact contribution of each mini-feature (assuming each mini-feature is adequately represented in the data).
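A small simulation can make this model concrete. It is only a sketch under Dason's reading of the problem (each record has exactly one active mini-feature per feature, and FINAL - INITIAL is exactly the sum of their contributions); the contributions, the random seed and the 500 simulated rows are made up, and the fit uses NumPy's `np.linalg.lstsq`. It also exposes one caveat: because each feature's indicator columns add up to a column of ones, the 27 columns are linearly dependent (rank 1 + 21 = 22), so the fit reproduces FINAL - INITIAL exactly and recovers the differences between mini-features of the same feature, but it does not give a unique value for each individual coefficient.

```python
import numpy as np

# Simulation under the assumptions stated above; all numbers are made up.
rng = np.random.default_rng(0)
LEVELS = [5, 3, 6, 6, 3, 4]  # mini-features per feature: a1..a5, b1..b3, ..., f1..f4
n_rows = 500                 # the thread mentions 500 records

# Hypothetical "true" contribution of each of the 27 mini-features (integers 0..50).
true_beta = rng.integers(0, 51, size=sum(LEVELS))

# Pick one active mini-feature per feature in every row and set its indicator to 1.
offsets = np.cumsum([0] + LEVELS[:-1])  # first column of each feature's block
choices = np.column_stack([rng.integers(0, n, size=n_rows) for n in LEVELS])
X = np.zeros((n_rows, sum(LEVELS)))
X[np.arange(n_rows)[:, None], offsets + choices] = 1

# Exact relationship, no noise: FINAL - INITIAL = sum of the active contributions.
y = X @ true_beta

# X is rank-deficient, so lstsq returns the minimum-norm least-squares solution.
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

def within_feature_differences(beta):
    """Each mini-feature's coefficient minus the first one of the same feature."""
    return [beta[s:s + n] - beta[s] for s, n in zip(offsets, LEVELS)]

print("rank of X:", rank, "out of", X.shape[1], "columns")
print("max abs fit error:", np.abs(X @ beta_hat - y).max())
print("a-block differences, estimated:", within_feature_differences(beta_hat)[0].round(6))
print("a-block differences, true:     ", within_feature_differences(true_beta)[0])
```

Adding a constant to every coefficient of one feature and subtracting it from another leaves every row's sum unchanged, which is why only these within-feature differences (and the row sums) are pinned down.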
Oh, this is very logical, but that is exactly my question: how do I set up the indicators?
Please keep in mind that in the case of Feat1, for example, all the mini-features (a1, a2, a3, a4 and a5) can vary between 0 and 50.


Ambassador to the humans
That is, instead of a single Feat1 column that displays a1, a1, a3, a2, etc., you have a column for a1 that takes the value 1 if Feat1 was a1 in that row and 0 otherwise, a column for a2 that takes the value 1 if Feat1 was a2 in that row and 0 otherwise, and so on. So the columns a1, a2, a3, a4, a5 will contain a single 1 and four 0s in each row. Once you have set those up, forget that Feat1, Feat2, ..., Feat6 exist; you already have what you need in the columns a1, a2, ..., f3, f4. You would then basically do a regression using the 27 columns corresponding to your mini-features. That would allow you to estimate the 'impact' each mini-feature has on the difference between FINAL and INITIAL.
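The column set-up described above can be sketched in a few lines of NumPy. The level counts (5, 3, 6, 6, 3, 4, i.e. 27 columns) come from the thread; the function names `column_names` and `indicator_matrix`, the 1-based "which mini-feature was active" row format and the two example rows are hypothetical choices for illustration only.

```python
import numpy as np

# Mini-features per feature, as listed in the thread: a1..a5, b1..b3, c1..c6,
# d1..d6, e1..e3, f1..f4 (27 in total).
LEVELS = [5, 3, 6, 6, 3, 4]
PREFIXES = ["a", "b", "c", "d", "e", "f"]
OFFSETS = np.cumsum([0] + LEVELS[:-1])  # column where each feature's block starts

def column_names():
    """The 27 indicator column names in order: a1, ..., a5, b1, ..., f4."""
    return [f"{p}{k}" for p, n in zip(PREFIXES, LEVELS) for k in range(1, n + 1)]

def indicator_matrix(rows):
    """Expand rows of active mini-features into 0/1 indicator columns.

    Each row lists, for Feat1..Feat6, the 1-based number of the active
    mini-feature, e.g. [3, 1, 2, 6, 1, 4] means a3, b1, c2, d6, e1, f4.
    """
    X = np.zeros((len(rows), sum(LEVELS)), dtype=int)
    for i, row in enumerate(rows):
        for block_start, level in zip(OFFSETS, row):
            X[i, block_start + level - 1] = 1
    return X

# Usage with two made-up rows.
X = indicator_matrix([[1, 2, 6, 1, 3, 4], [3, 1, 1, 6, 2, 1]])
names = column_names()
for row in X:
    print([names[j] for j in np.flatnonzero(row)])
# ['a1', 'b2', 'c6', 'd1', 'e3', 'f4']
# ['a3', 'b1', 'c1', 'd6', 'e2', 'f1']
```

Each row ends up with exactly six 1s, one inside each feature's block, which is the "single 1 and four 0s" pattern for the a-columns extended to all six features.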

In a lot of software, though, you don't need to create those columns manually. You're really just saying you have 6 different factors with 5, 3, 6, 6, 3, and 4 levels.
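One common way software handles such factors (R's `lm()`, for example, uses treatment contrasts for unordered factors by default) is to keep an intercept and drop one baseline level per factor. Below is a NumPy sketch of that coding; the function name `treatment_design`, the seed and the contributions are hypothetical. With 6 factors of 5, 3, 6, 6, 3 and 4 levels it gives 1 + 21 = 22 columns of full rank, so the coefficients are unique: the intercept is the sum of the baseline contributions and every other coefficient is a level's contribution minus its factor's baseline.

```python
import numpy as np

LEVELS = [5, 3, 6, 6, 3, 4]  # levels of the six factors (Feat1..Feat6), from the thread

def treatment_design(choices, levels=LEVELS):
    """Intercept column plus one 0/1 column per non-baseline level of each factor.

    choices[i, k] is the 0-based level of factor k in row i. Level 0 of every
    factor is the baseline and gets no column of its own.
    """
    cols = [np.ones(len(choices))]
    for k, n in enumerate(levels):
        for level in range(1, n):
            cols.append((choices[:, k] == level).astype(float))
    return np.column_stack(cols)

# Hypothetical data: random level choices and made-up contributions per level.
rng = np.random.default_rng(1)
choices = np.column_stack([rng.integers(0, n, size=500) for n in LEVELS])
true = [rng.integers(0, 51, size=n) for n in LEVELS]
y = sum(t[choices[:, k]] for k, t in enumerate(true))  # FINAL - INITIAL per row

Z = treatment_design(choices)  # 500 rows, 1 + 21 = 22 columns
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Baseline interpretation of the unique coefficients:
# coef[0] = sum of the level-0 contributions,
# the rest = each level's contribution minus its factor's level-0 contribution.
expected = np.concatenate([[sum(t[0] for t in true)]] + [t[1:] - t[0] for t in true])
print(Z.shape, np.linalg.matrix_rank(Z))
print(np.allclose(coef, expected))
```

Dropping one column per factor is what removes the linear dependence among the 27 indicator columns; the price is that each coefficient is read relative to a baseline level rather than as an absolute contribution.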