May 29, 2009

Data.gov: A Critique

We spoke of the government's launch of data.gov the other day, but I thought it might be interesting to look at a general critique of the project from a technological standpoint.

Sunlight Labs developer Clay has written about what he would like to see on the site in order to make it into the federal government data clearinghouse it seeks to be.

His first complaint is that most of the data comes from a single source (the USGS) while major categories of other data are missing: campaign finance reports, financial disclosure statements by administration officials, census data, and labor statistics. While info on copper smelting is good, knowing who received the most campaign cash from the pharmaceutical industry is better.

The second problem is that the data listed on data.gov sometimes points to third-party websites. While there are perfectly logical reasons for this, cleaner solutions exist. Data.gov can (and should) design predictable URLs that let application developers build tools around the datasets hosted on data.gov. For example, imagine being able to download census data from 2000 from this easy-to-understand URL: data.gov/data/census/2000/xml. Want it from 1999? data.gov/data/census/1999/xml. Another format? data.gov/data/census/1999/json.
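To make the URL idea concrete, here is a minimal Python sketch of how a developer's tool might build these addresses. It assumes the hypothetical pattern data.gov/data/<dataset>/<year>/<format> taken from the examples in the proposal, and the helper name dataset_url is made up for illustration; it is not a real data.gov endpoint or API.

```python
# Hypothetical URL scheme from the proposal: data.gov/data/<dataset>/<year>/<format>.
# These endpoints are an illustration only, not a real data.gov API.
BASE = "data.gov/data"
SUPPORTED_FORMATS = {"xml", "json"}  # the two formats mentioned in the proposal


def dataset_url(dataset: str, year: int, fmt: str = "xml") -> str:
    """Build a predictable dataset URL following the proposed scheme."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{BASE}/{dataset}/{year}/{fmt}"


if __name__ == "__main__":
    # Changing the year or format is a one-argument change.
    for year in (1999, 2000):
        for fmt in ("xml", "json"):
            print(dataset_url("census", year, fmt))
```

The appeal of a scheme like this is that a tool can switch years or formats by changing a single argument, instead of scraping a different third-party page for every dataset.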

Finally, there is the issue of engagement. Data.gov could gain a lot simply by maintaining a blog. As Clay puts it:

So much of dealing with data is narrative, and telling the story of Data.gov on an ongoing basis has so much value to it. We want to know what's going on on the inside, who is working on it, what the process is and who is building it. How are you talking Federal Agencies into putting their data online? What software challenges are you facing? When there's new data, how will we know?

So there you have it. More data, cleaner URLs and more engagement. Sounds good to us.