Attempting to fix the Online Data Repository

There are a few data set repositories online, and some of them have gained a degree of popularity with academics and data scientists.

However, online data repositories have drawn many criticisms, including poor data hygiene, weak search functionality, and a lack of features for communication among users. Consequently, although extremely useful, they haven't gained as large a user base as they otherwise could have.

With datapub, we've attempted to address many of the common criticisms directed at online data repositories. datapub's features include the following:

  • an innovative data set search grid, which supports search by multiple criteria as well as forward and reverse alphanumeric browsing by multiple criteria
  • data set format validation: when data sets are uploaded and a common file format is specified, the application attempts to validate that the file is properly formatted
  • data set ratings: users can rate data sets, letting others know which data sets are more likely to be useful
  • previews and visualizations: uploaded files whose format has been validated can be previewed, and visualizations can be generated dynamically
  • social network/communications functionality: each user has a public message board; users can send private messages to multiple recipients; each data set has a message board on which users can leave comments
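
To make the format validation feature above more concrete, here is a minimal sketch in Python of what such a check might look like. It is not datapub's actual implementation: the function name validate_upload, the two supported formats, and the rule that every CSV row must have the same number of columns are all assumptions made for illustration.

```python
import csv
import io
import json


def validate_upload(text: str, declared_format: str) -> bool:
    """Return True if `text` parses as the declared format.

    Hypothetical helper: the name, the supported formats, and the CSV
    consistency rule are assumptions, not datapub's real behavior.
    """
    fmt = declared_format.lower()

    if fmt == "json":
        try:
            json.loads(text)
        except json.JSONDecodeError:
            return False
        return True

    if fmt == "csv":
        try:
            # strict=True makes the parser raise csv.Error on malformed input
            # instead of silently accepting it. Blank lines are skipped.
            rows = [row for row in csv.reader(io.StringIO(text), strict=True) if row]
        except csv.Error:
            return False
        # Assumed rule: at least one row, and every row has the same width.
        return bool(rows) and len({len(row) for row in rows}) == 1

    # Formats this sketch does not know about are treated as not validated.
    return False
```

In a scheme like this, only files that pass the check would be eligible for preview and visualization, which matches the way the feature list ties those capabilities to validated uploads.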

datapub is essentially an experiment, so we very much welcome user input. We hope you'll give datapub a try, sign up, contribute data sets of your own, and let us know what you think. With your help, we'd like to reinvent the online data repository. Data is one of the most valuable commodities in the world, and we'd like to create a great platform for sharing it. Enjoy!

community [at]