US Congress data opened - Open Knowledge Blog

Exciting news on open legislative data from the US. Eric Mill (from the Sunlight Foundation), Josh Tauberer (of GovTrack.us) and Derek Willis have been beavering away on a public domain scraper and dataset built from THOMAS.gov, the official source of legislative information for the US Congress. They’ve just hit a key milestone: the dataset now incorporates everything that THOMAS has on bills, going back to 1973, when its records began!

Eric says:

We’ve published and documented all of this data in bulk, and I’ve worked it into Sunlight’s pipeline, so that searches for bills in Scout use data collected directly from this effort.

The data and code are all hosted on Github on a “unitedstates” organization, which is right now co-owned by me, Josh, and Derek – the intent is to have this all exist in a common space. To the extent that the code needs a license at all, I’m using a public domain “unlicense” that should at least be sufficient for the US (other suggestions welcome).

There’s other great stuff in this organization, too – Josh made an amazing donation of his legislator dataset, and converted it to YAML for easy reuse. I’ve worked that dataset into Sunlight’s products already as well. I’ve also moved my legal citation extractor into this organization — and my colleague Thom Neale has an in-progress parser for the US Code, to convert it from binary typesetting codes into JSON.

Github’s organization structure actually makes possible a very neat commons. I’m hoping this model proves useful, both for us and for the public.