Hi,
Which segmentation method could I use to split a continuous variable optimally (*) into two groups in order to run an ANOVA?
(Besides exploratory approaches such as boxplots and so on.)
(*) That is to say, so as to maximize the between-group variance.
vincent
If the continuous variable is the independent variable (IV), you can choose the split from a scatter plot of the IV against the DV. If you can see two clusters in the graph, the split is easy to find.
Or split the IV at its mean, mean(IV).
In the long run, we're all dead.
I would rather avoid an approximate graphical solution, because I need to apply the method automatically to many different samples. So I am looking for an algorithmic solution that gives me threshold values.
I am not clear about your objective. If you want an algorithmic solution, you can use a clustering technique, for example k-means clustering with k = 2.
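For what it's worth, here is a minimal sketch of what k-means with k = 2 amounts to on a single variable. It is written in plain Python rather than SAS or R, but the loop is short enough to port. The function name `two_means_1d` and the choice to start the two centers at the minimum and maximum are my own assumptions, not something from a particular package. Note that it only looks at the variable being split, so it does not use the DV at all.

```python
def two_means_1d(values, max_iter=100):
    """Lloyd's k-means iteration with k = 2 on one variable.

    Returns (cutoff, (low_center, high_center)). Hypothetical helper:
    the centers start at the minimum and maximum of the data.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; no split is possible")
    c0, c1 = float(lo), float(hi)
    for _ in range(max_iter):
        # In one dimension each point belongs to the nearer center,
        # which is the same as cutting at the midpoint of the centers.
        cut = (c0 + c1) / 2
        left = [v for v in values if v <= cut]
        right = [v for v in values if v > cut]
        new_c0 = sum(left) / len(left)
        new_c1 = sum(right) / len(right)
        if (new_c0, new_c1) == (c0, c1):
            break  # assignments no longer change
        c0, c1 = new_c0, new_c1
    return (c0 + c1) / 2, (c0, c1)


# Made-up firm sizes (number of employees), for illustration only.
sizes = [5, 8, 12, 20, 150, 200, 300, 500]
cutoff, centers = two_means_1d(sizes)
print(cutoff, centers)
```

With a skewed variable such as firm size, the cut tends to be pulled toward the few large values, so it may be worth trying the same thing on log(size).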
My objective is simpler than what a cluster analysis would provide.
My population is composed of observations of firms.
My DV is an economic ratio of these individual firms.
My IV is the size of these firms (the unit is the number of employees).
I have good reasons to think (theory, looking at boxplots) that the economic ratio is related to the size of the firms. But because of heteroscedasticity, outliers and other specific needs, I would prefer to run an ANOVA after partitioning the IV into two classes with a cut-off point.
I would like to use an algorithm to choose the cut-off point that divides my population optimally, so as to maximise the between-group variance (*), in SAS or R so that I can program it.
Gratefully yours,
vincent
(*) Empirically, I have already sorted my observations in ascending order of the IV and computed sliding means as one observation at a time moves from one group to the other, in order to identify the maximal gap between the means of the two groups. But this is neither statistically rigorous nor automatic enough.
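The manual procedure in this footnote can be automated fairly directly. Below is a minimal sketch in plain Python (not SAS or R, but the logic should carry over): sort by the IV, try every cut between consecutive distinct sizes, and keep the one that maximises the between-group sum of squares of the DV, n1*(m1 - m)^2 + n2*(m2 - m)^2. The function name `best_cutoff`, the `min_group` parameter and the sample numbers are assumptions made up for illustration.

```python
def best_cutoff(size, ratio, min_group=2):
    """Scan every possible cut on `size` and return (threshold, ssb),
    where ssb is the largest between-group sum of squares of `ratio`.

    Hypothetical helper; `min_group` is the smallest group size allowed.
    Returns (None, -1.0) if no admissible cut exists.
    """
    pairs = sorted(zip(size, ratio))       # ascending order of the IV
    n = len(pairs)
    total = sum(y for _, y in pairs)
    grand_mean = total / n
    best = (None, -1.0)
    left_sum = 0.0
    for i in range(1, n):                  # i = number of firms in the low group
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue                       # cannot cut between tied sizes
        if i < min_group or n - i < min_group:
            continue
        m_left = left_sum / i
        m_right = (total - left_sum) / (n - i)
        ssb = (i * (m_left - grand_mean) ** 2
               + (n - i) * (m_right - grand_mean) ** 2)
        if ssb > best[1]:
            # report the midpoint between the two neighbouring sizes
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ssb)
    return best


# Made-up data: firm sizes (employees) and an economic ratio.
sizes = [5, 8, 12, 20, 150, 200, 300, 500]
ratios = [0.10, 0.12, 0.11, 0.13, 0.40, 0.42, 0.38, 0.41]
print(best_cutoff(sizes, ratios))
```

One caveat: because the cut-off is chosen to make the two group means as different as possible, an ANOVA run afterwards on the same sample will give an optimistic p-value, so the threshold is better validated on separate data or with a permutation test.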
I don't think that splitting the IV is a good idea. When you put the observations into two groups (assigning an indicator variable of 0 or 1), you throw away information contained in the IV. That is my personal opinion. If you find this method useful, please let me know.
You can't use discriminant analysis, because the groups are not predetermined.
I guess this is a one-time task. I still feel the k-means algorithm will help you.