How to study lobbying with crowdsourced open data
The following guest post is from Regards Citoyens, a French organisation that promotes open data.
For about a year, Regards Citoyens has been working with the French chapter of Transparency International to bring more transparency to influence and lobbying processes in the French parliament.
Lobbying is a very controversial subject in France: we discuss it a lot, but we know little about it. So we decided to study the visible part of this mysterious iceberg and bring new data into the public debate. MPs regularly publish official reports as part of their legislative and government-evaluation work. It makes sense that, while preparing these reports, they would listen to anyone concerned by the topic at hand. But is this done in a fair, pluralistic and transparent way? Are corporations and unions listened to on an equal footing? What about NGOs and other civil society actors? Much like the European Parliament, the French National Assembly recently created an official register of lobbyists who are granted access to its hallways. But it turns out that this register contains no more than a hundred names.
A few official reports from MPs
We decided to take a closer look and try to compile a more complete list by going through all 1,174 reports published between July 2007 and July 2010. Some of these reports include an appendix listing all the hearings held while the report was being prepared. Unfortunately, we quickly discovered that most reports do not: using text analysis tools, we found such a list in only 38% of them. Even this small visible part of influence seriously lacks transparency. Still, it gave us a substantial dataset of 16,000 names, far more than the few officially registered lobbyists.
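The post does not describe which text-analysis tools were used to spot hearing lists. As a rough illustration only, a keyword heuristic could flag reports whose text contains typical French headings such as “personnes auditionnées” or “personnes entendues”. These patterns and the function names below are hypothetical assumptions, not the study's actual method.

```python
import re

# Hypothetical headings that often introduce a hearings appendix in French
# parliamentary reports; \s+ tolerates line breaks left by PDF-to-text export.
HEARING_HEADINGS = re.compile(
    r"personnes\s+(auditionnées|entendues)"
    r"|auditions?\s+(effectuées|menées)\s+par",
    re.IGNORECASE,
)


def has_hearing_list(report_text: str) -> bool:
    """Return True if the report text seems to contain a list of hearings."""
    return HEARING_HEADINGS.search(report_text) is not None


def coverage(reports: dict[str, str]) -> float:
    """Share (0..1) of reports in which a hearings list was detected."""
    if not reports:
        return 0.0
    found = sum(has_hearing_list(text) for text in reports.values())
    return found / len(reports)


if __name__ == "__main__":
    sample = {
        "report-1": "ANNEXE : LISTE DES PERSONNES\nAUDITIONNÉES",
        "report-2": "Rapport sans annexe.",
    }
    print(coverage(sample))  # 0.5
```

A keyword match like this only gives a lower bound: appendices worded differently would be missed and would need manual review.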
Our main task then was to identify the organisation behind each of these names. This was sometimes easy (when the organisation was mentioned next to the name in the appendix) and sometimes harder (when it required reading parts of the report, for instance). So we decided to develop a crowdsourcing tool that would allow anyone to take part. We built an application, released under a free licence (the AGPL), that presents the names one by one; each name had to be processed by at least three different users for the data to be validated. The idea was to let anyone contribute easily for just a few minutes, without having to register. Registration was only needed to appear in the top-50 contributors leaderboard. The simplicity and responsiveness of the Ajax-based interface (pre-filled fields, reports pre-loaded and pre-scrolled), the fun of discovering lobbyists while “digitizing them”, and the competitive aspect provided by the leaderboard certainly helped a lot: within a couple of days a real buzz started on Identi.ca/Twitter, and while we expected the crowdsourcing to take a couple of months, everything was done in only 10 days thanks to more than 3,000 citizens!
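The validation rule (each name processed by at least three different users) can be sketched as a simple consensus check. The post does not say how disagreements between users were resolved, so the strict-majority rule, the hypothetical `Answer` record and the normalisation step below are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# The post states that each name was processed by at least three users.
MIN_CONTRIBUTORS = 3


@dataclass
class Answer:
    """One hypothetical contribution: a user's guess at a name's organisation."""
    user: str
    organisation: str


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so trivial variants count as one vote."""
    return " ".join(label.lower().split())


def consensus(answers: list[Answer], min_contributors: int = MIN_CONTRIBUTORS) -> Optional[str]:
    """Return the validated (normalised) organisation, or None if not settled yet."""
    # One vote per user: a later answer from the same user replaces an earlier one.
    votes_by_user = {a.user: normalise(a.organisation) for a in answers}
    if len(votes_by_user) < min_contributors:
        return None  # not enough distinct users yet; keep showing this name
    label, votes = Counter(votes_by_user.values()).most_common(1)[0]
    # Assumed rule: accept only a strict majority of distinct users.
    return label if votes * 2 > len(votes_by_user) else None


if __name__ == "__main__":
    answers = [Answer("alice", "MEDEF"), Answer("bob", "Medef "), Answer("carol", "CGT")]
    print(consensus(answers))  # medef
```

In this sketch, a name without a clear majority simply stays in the queue for further contributors.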
This process gave us a database of 16,000 hearings, including the name, sex, position and organisation of each lobbyist. After brief discussions with the National Assembly and the CNIL (the French data protection authority), we decided to release only the names of the organisations, not those of the people. Even though these names are already public, since they come from official reports, the two institutions could not agree on whether lobbyists' names were public or private information. In the end, we decided to anonymise the data to make sure no illegal database of religious or union affiliation could be derived from it. Finally, using Freebase GridWorks, we refined the data and consolidated it into 9,300 hearings grouped by organisation, each associated with the themes of the corresponding report.
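Consolidating hearings into organisations largely means merging spelling variants of the same organisation name. Freebase Gridworks (later renamed Google Refine, then OpenRefine) offers key-collision clustering for this; the sketch below approximates its “fingerprint” keying function in plain Python. It illustrates the technique and is not a record of the exact operations the authors performed; the function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Approximation of OpenRefine's 'fingerprint' key for a name."""
    s = name.strip().lower()
    # Fold accented letters to ASCII, e.g. "é" -> "e".
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Drop punctuation, then sort and de-duplicate the remaining tokens.
    s = re.sub(r"[^\w\s]", "", s)
    return " ".join(sorted(set(s.split())))


def clusters(names: list[str]) -> list[list[str]]:
    """Group names sharing a fingerprint; keep groups with several spellings."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(set(group)) > 1]


if __name__ == "__main__":
    names = [
        "Confédération générale du travail",
        "CONFEDERATION GENERALE DU TRAVAIL",
        "Travail, Confédération générale du",
        "MEDEF",
    ]
    for group in clusters(names):
        print(group)
```

Proposed merges still deserve a human check: key collision can group distinct organisations whose names merely use the same words.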
To identify trends, however, we needed to categorise these organisations by type of interest: unions, corporations, individuals, religious organisations, think tanks, NGOs and associations. We first used the categories of the EU register, but the large number of organisations we needed to classify quickly revealed the limits of the European Commission's categories, especially for public-sector organisations. So we progressively built our own categorisation of interest representatives (fr), refining it as we categorised the data.
So what did we learn? First, that on every subject considerably fewer women were heard (only 24% of hearings), the sole exception being reports on… gender issues, of course! The study also reveals that in their hearings MPs mainly listen to administrations and public-sector organisations (48%). Trade unions and other professional organisations come next, followed in third place by private companies. NGOs and civil society organisations account for only 7% of hearings. But the most interesting findings probably come from comparing the categories theme by theme. Companies are heard more often on topics such as the economy, energy and the environment, and, perhaps more surprisingly, on transport, culture and digital issues. Civil society organisations, on the other hand, are more present on topics such as development aid or veterans' affairs. Of course, all of these results concern only the visible part of lobbying, but looking closely at the gaps (such as the surprisingly low number of hearings of private companies on health issues) provides interesting insights and supports our conclusion: transparency in this area is definitely lacking in France!
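The percentages above are shares of hearings per category. One way to make the theme-by-theme comparison explicit is to divide a category's share within a theme by its overall share, so that a ratio above 1 means the category is heard more than average on that theme. The post does not describe its exact calculations; this is a minimal sketch assuming each grouped hearing is reduced to a hypothetical (theme, category) pair, run on an invented sample rather than the study's data.

```python
from collections import Counter, defaultdict

# Each grouped hearing as a (report theme, organisation category) pair.
Hearing = tuple[str, str]


def category_shares(hearings: list[Hearing]) -> dict[str, float]:
    """Share (0..1) of hearings that fall into each category."""
    counts = Counter(category for _, category in hearings)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def over_representation(hearings: list[Hearing], category: str) -> dict[str, float]:
    """Per theme: the category's share in that theme divided by its overall share."""
    overall = category_shares(hearings).get(category, 0.0)
    by_theme: dict[str, list[Hearing]] = defaultdict(list)
    for hearing in hearings:
        by_theme[hearing[0]].append(hearing)
    return {
        theme: (category_shares(rows).get(category, 0.0) / overall) if overall else 0.0
        for theme, rows in by_theme.items()
    }


if __name__ == "__main__":
    sample = [
        ("energy", "company"), ("energy", "company"), ("energy", "public sector"),
        ("health", "public sector"), ("health", "public sector"), ("health", "ngo"),
    ]
    print(category_shares(sample))
    print(over_representation(sample, "company"))
```

Like the observations in the post, such ratios only describe the hearings that were actually published in report appendices.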
Of course, all of the anonymised data generated for this study is republished as open data under the ODbL licence and is freely reusable. We also enriched it with extra information, such as the authors of the reports and their political groups. There are certainly plenty of other possible uses for this data, and we're convinced that opening it up can only lead to more great projects!