# Searching for data reduc

#### Shari

##### New Member
Hello,

I am so confused and so depressed, I hope someone can help.
In my database I have plenty of data on about 1000 machines.
It is too hard to recognize anything in a diagram, so I need to reduce the machine data.
Every machine has at least 5 variables (about its costs, energy etc.)
None of the variables has a fixed max or min.
I want to reduce this data and find similar machines, as well as machines that are totally different from the others.

Now my problem:
I have been searching for 2 whole weeks(!) and can't find a mathematical method or anything else that can help me.
Maybe I found some, but my mathematical background was too weak to recognise them as such methods.
Many times I found something, but after studying it I thought, "No, I can't use this method on my data set."
Then I am back at the beginning again, which is very frustrating.

For example I found these:
-Principal component analysis
-Factor analysis
-Clustering
but I have read many papers and I still do not know whether I can use them in my case.
Factor analysis in particular does not seem to work in my case, but I read everywhere that it is almost the same as PCA...

Please, can someone who knows about these things name some methods/techniques that could help with this kind of data and my goal?
The more the better, because I want to try different methods to get the best results.

I would be deeply grateful!
Shari

#### hlsmith

##### Omega Contributor
Please describe your data and purpose in greater detail.

So you have data for 1,000 machines, but how many observations per machine? And you want to see which machines are similar based on what (a single variable or a cluster)?

#### Shari

##### New Member
Hi hlsmith,

I hope I understand "observations" and "cluster" correctly and am answering your questions properly:
Each machine has at least 5 values (I'll add 2 or 3 more soon): costs, energy, width, length, and height.
I want to see which machines are similar/different based on all 5 values together (or at least 2 or 3, if 5 is not possible).

#### szm

##### New Member
Initially I would try clustering and develop a dendrogram. I think that is the easier way to go.
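
A minimal sketch of the clustering route in Python, assuming the data sits in a 1000 x 5 array (costs, energy, width, length, height). The random numbers below are only a placeholder for the real table, and the choices of Ward linkage and 4 groups are assumptions for illustration, not recommendations for this data set.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Placeholder data: 1000 machines x 5 variables
# (costs, energy, width, length, height). Replace with the real values.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 5))

# Standardise each column so that no variable dominates the distances
# merely because of its units or scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Agglomerative (hierarchical) clustering on Euclidean distances.
# "ward" is one common linkage; "average" or "complete" are alternatives.
tree = linkage(Z, method="ward")

# Cut the tree into at most 4 groups (4 is an arbitrary choice here).
labels = fcluster(tree, t=4, criterion="maxclust")

# Compute the dendrogram layout, truncated to the last 20 merges so that
# it stays readable with 1000 machines. Dropping no_plot=True draws it
# with matplotlib instead of only returning the layout.
info = dendrogram(tree, truncate_mode="lastp", p=20, no_plot=True)

print("machines per group:", np.bincount(labels)[1:])
```

Machines that join the tree only at a very large height, or that end up in a tiny group of their own, are candidates for the "totally different" ones.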

As a second option, you can use PCA and look at the plots of the first components (machines that fall close together there are similar, so it works as a visual form of clustering), or construct a biplot after PCA. You might want to google that and find information on how to do it, because it is not the easiest thing to do.
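
A rough sketch of the PCA step using only NumPy, again with placeholder data standing in for the real 1000 x 5 table. It computes the two ingredients of a biplot: the scores (one point per machine) and the loadings (one direction per variable). The variable names are taken from the thread; everything else is an assumption for illustration.

```python
import numpy as np

names = ["costs", "energy", "width", "length", "height"]

# Placeholder data: replace with the real 1000 x 5 table.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 5))

# Standardise the columns; PCA is sensitive to the scale of each variable.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via the singular value decomposition of the standardised data.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Share of the total variance captured by each principal component.
explained = s**2 / np.sum(s**2)

# Scores: coordinates of every machine on the first two components.
scores = Z @ Vt[:2].T          # shape (1000, 2)

# Loadings: how strongly each original variable points along PC1 and PC2.
loadings = Vt[:2].T            # shape (5, 2)

print("variance explained by PC1, PC2:", explained[:2])
for name, (pc1, pc2) in zip(names, loadings):
    print(f"{name:>7}: PC1 {pc1:+.2f}, PC2 {pc2:+.2f}")
```

A biplot is then a scatter plot of `scores` with one arrow per row of `loadings` drawn on top. If the first two components explain only a small share of the variance, the 2D picture hides a lot and should be read with care.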

See if that will work.