+ Reply to Thread
Results 1 to 8 of 8

Thread: Multiple Regression to receive cost for a single item of an order

  1. #1
    Points: 88, Level: 1
    Level completed: 76%, Points required for next Level: 12

    Posts
    4
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Lightbulb Multiple Regression to receive cost for a single item of an order




    Hi everybody,

    I appreciate any kind of help and tips:

    Problem:
    - I have a dataset with 1600 orders
    - For each of these orders I know the total costs & products included (e.g. 7975 & A, C, D, H)
    - I don't know the individual prices for the range of products (in total 80), which would, for instance, be helpful to calcuate new offers .
    --> please see the attached pic

    Potential Solultion: Multiple regression
    - no of observations = no of orders
    - independent variables = all products (= ~80)
    - dependent variable = total cost (per order)

    Questions:
    - Is a multiple regression a suitable statistical tool?
    - If not: Is there an alternative?
    - If yes: Are there any specialities I have to consider (e.g. multicollinearity --> how can I reduce it, etc.)?

    Many thanks for your help!
    Chris
    Attached Images  

  2. #2
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Multiple Regression to receive cost for a single item of an order

    Yeah, I could see multiple linear regression working or getting you in the ballpark, then you confirm results based on a few future orders. Are you going to treat products as binary variables (1/0)? That is what I would do. I don't think MC should impact it.


    This also seems solvable in matrix algebra. Though I do not know MA that well. Which regression uses MA.
    Stop cowardice, ban guns!

  3. The Following User Says Thank You to hlsmith For This Useful Post:

    christoph_b (02-04-2016)

  4. #3
    Points: 2,758, Level: 32
    Level completed: 6%, Points required for next Level: 142

    Posts
    21
    Thanks
    10
    Thanked 1 Time in 1 Post

    Re: Multiple Regression to receive cost for a single item of an order

    Can you de-duplicate the rows and find the inverse matrix of the Product Matrix?

    Linear algebra would be as follows:
    1) AB = C where A = Product Matrix, B = Price Vector, and C = Order Price Vector
    2) Inv(A)AB = Inv(A)C where Inv(A)A = Identity Matrix
    3) IB = Inv(A)C
    4) B = Inv(A)C

    If you cannot invert matrix A then the regression solution might be good enough. But if it works, then B should yield a vector of the prices for each Product.

  5. The Following User Says Thank You to Oberon For This Useful Post:

    christoph_b (02-04-2016)

  6. #4
    Points: 88, Level: 1
    Level completed: 76%, Points required for next Level: 12

    Posts
    4
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Multiple Regression to receive cost for a single item of an order

    Thanks a lot for your instant answer, hlsmith & Oberon!
    1) Linear Algebra:
    - Based on my current state of knowledge, I think I cannot build a quadratic (= invertible) product matrix, which is necessary to use the approach of Oberon, right?

    2) Multiple regression
    - Sounds good that a multiple linear regression is a valid approach
    - @hlsmith: yes, I want to treat products as binary variables
    - When I conduct a multiple regression with Stata, I receive improper values, e.g.[INDENT]- negative coefficients (= cost per product)
    - low R^2 (0.72)
    - high standard error (e.g. 638 for coefficient 1108)
    - 30% of coefficients not statistically significant
    - omitted variables because of multicollinearity
    - a potential reasons might be: costs for individual product calculated within an order vary strongly from order to order
    - what are possible countermeasures: clean up data? My first approaches would be:
    - combine correlated variables to factor?
    - de-duplicate rows (orders)?
    - exclude those orders that contain products with strongly varying cost per product over different orders?
    - Do you have more ideas?

    Thanks a lot!
    Cheers
    Chris

  7. #5
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Multiple Regression to receive cost for a single item of an order

    Can you just solve the easier products first? Find some one who only bought a couple of product and different combinations of those? Or if a person who just bought a single product - kind of like a word decipher!
    Stop cowardice, ban guns!

  8. #6
    Points: 88, Level: 1
    Level completed: 76%, Points required for next Level: 12

    Posts
    4
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Multiple Regression to receive cost for a single item of an order

    Ok. So you mean start with some chosen orders and products and then enlarge it step by step?

  9. #7
    Omega Contributor
    Points: 38,396, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,001
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Multiple Regression to receive cost for a single item of an order

    Yes, that would be another approach. If there are 80 possible products, for observations, what is the fewest products selected.
    Stop cowardice, ban guns!

  10. #8
    Points: 88, Level: 1
    Level completed: 76%, Points required for next Level: 12

    Posts
    4
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: Multiple Regression to receive cost for a single item of an order


    Hello again,
    I have worked on my data record and regression for the last few days, but unfortunately I have not received the results I expected. Please check out the excel file attached:
    • Some orders seem to be calculated accurately (e.g. order 156 - 172); Those orders contain mostly the same products)
    • For other orders the deviation (regression actual, column BB) is partly pretty high; those orders contain often a variety of different products (e.g. order 103 -155).
    • Some products have negative costs (e.g. ca, cc,..)

    -->Considering the results and the structure of the data record, is a multiple regression the right tool?

    -->Will the results be more accurate if I drop more datasets (with outliers) or focus on those with more reasonable results first?
    Many thanks!!
    Attached Files

+ Reply to Thread

           




Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats