I have a massive dataset (tens of millions of rows and hundreds of dimensions). The dimensions span all conceivable data types.

How do I arrive at a sample that is:

1) As small as possible

2) As representative as possible of the population with respect to all the dimensions

If you can direct me to any scholarly articles on this subject, I would be grateful!