The following guest post is by Boyan Yurukov, blogger and open government data activist.
At the beginning of 2011 the Bulgarian government released some open data on www.parliament.bg. Visitors could export information on bills and members of parliament as XML or CSV. They could also download the votes of individual MPs or parliamentary groups as Excel files. While that data was useful and an important step forward, I found problems in the format and in the exported files. The website also held a lot more information that could not be exported as open structured data.
So I started a project to scrape the website, fix the available data, and refine, enrich and link it. After several versions of the schema, the final dataset was released at the beginning of December. It contains over 11,000 data points and over 1.12 GB of data. The items are as follows:
- Profiles for each MP – general biography, previous parliamentary terms, participation in parliamentary groups, committees, “friendship groups” and delegations, supported bills, absences, external consultants, questions during plenary meetings.
- Data on bills – laws, legislative proposals, decisions and official declarations.
- Parliamentary groups and committees – current members and member history, proposed bills, external consultants, meeting schedule, agenda, transcripts and reports.
- Parliamentary delegations and “friendship groups” – current members and member history.
- Parliamentary sittings – program for the sitting with questions and legislative proposals; transcripts; voting history for each MP on each discussion point.
- Parliamentary procurements – description, topic, procurement registry code.
The dataset can be downloaded as two ZIP files together with the XSD schema. The scraping scripts are also open sourced on GitHub. You can find all this open Bulgarian Parliament data on the DataHub.
Although refined, this data is not without its flaws. Some historical data on MPs’ biographies and questions is missing. Also, the transcripts are not structured but in free text, which makes them almost impossible to parse. There is some hope that the parliamentary administration will release the transcripts in XML, but I’m not holding my breath. Currently the transcripts go back 20 years, and those back to the ’70s are being parsed and will be released soon. All other data starts in 2001, except individual votes, which start in 2009.
This data can be quite useful for parliamentary journalism, but in itself consists only of raw XML files. This is why another project is being set up that aims at building a platform for analyzing and visualizing the refined dataset. It will be targeted at data journalists and visualization experts. It is sponsored by the Institute for Public Environment Development and all results will be released as open data. I hope that in the first quarter of 2012 the first beta will come out.
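To give a sense of what working with raw XML like this might involve, here is a minimal Python sketch that counts votes per discussion point. The sample document and its element and attribute names (`sitting`, `vote`, `mp`, `topic`, `choice`) are invented for illustration; they are assumptions and do not follow the dataset's actual XSD schema, which should be consulted before adapting anything here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the structure and names below are made up for this
# sketch and are NOT taken from the real dataset's schema.
SAMPLE = """
<sitting>
  <vote topic="Budget bill">
    <mp name="MP A" choice="yes"/>
    <mp name="MP B" choice="no"/>
    <mp name="MP C" choice="yes"/>
  </vote>
  <vote topic="Health act">
    <mp name="MP A" choice="abstain"/>
    <mp name="MP B" choice="no"/>
    <mp name="MP C" choice="yes"/>
  </vote>
</sitting>
"""


def tally_votes(xml_text):
    """Return a dict mapping each vote topic to a Counter of choices."""
    root = ET.fromstring(xml_text.strip())
    tallies = {}
    for vote in root.iter("vote"):
        # Count how many MPs made each choice on this discussion point.
        choices = (mp.get("choice") for mp in vote.iter("mp"))
        tallies[vote.get("topic")] = Counter(choices)
    return tallies


tallies = tally_votes(SAMPLE)
for topic, counts in tallies.items():
    print(topic, dict(counts))
```

With the real files, the same loop would need the element names from the published XSD, and `ET.parse` on a file path rather than `ET.fromstring` on a string.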
Theodora is press officer at the Open Knowledge Foundation, based in London. Get in touch via press@okfn.org
Thanks to OKFN for posting info on the Bulgarian parliament open data. Just to point out that the data journalism project is not mine – I will just help with transforming and cleaning up the data.
Update on the project – due to many changes in the structure of both the parliament and its site, the automatic download process no longer recognizes the already broken data files. I’m working on a new solution, but that will take time. My attempts to get the parliament administration to fix their files and provide a true open data solution have failed.