A little note on behalf of Nelson Lah, Chair of the Open Data Society of British Columbia, Canada.
The Open Data Society of BC is hosting the BC Open Data Summit on February 19, 2013 in downtown Vancouver at SFU Segal Graduate School of Business at 500 Granville Street. We want you to be part of this conversation.
We’re especially proud of what the BC open data community has accomplished over the past three years since we first started growing our community. At the beginning, the thought of accessing data on a government web site was tricky business (imagine wanting access to government data!). We had to sort out challenges around licensing, formats, methods of access…the works. Through many great conversations and threads here and elsewhere, we have thankfully sorted out much of the “WHAT” of open data.
Now that we have some open data accessible to us, we thought it would be helpful to focus on the value that it can bring. We want to explore how it is being used in academia, in evidence-based decision making, in fighting corruption, in uncovering new opportunities, and in creating economic value, businesses and jobs.
That’s why we’re inviting you to join us at the BC Open Data Summit in February.
We invite you to come out and give a short presentation. Share your ideas and what you or your organization has done with open data to create value. Proposals are due by January 13.
Last week saw the launch of prescribinganalytics.com (covered in the Economist and elsewhere). At present it’s “just” a nice data visualisation of some interesting open data that show the NHS could potentially save millions from its drug budget. I say “just” because we’re in discussions with several NHS organizations about providing a richer, tailored, prescribing analytics service to support the best use of NHS drug budgets.
Working on the project was a lot of fun, and to my mind the work nicely shows the spectacular value of open data when combined with people and the internet. The data was, and is, out there. All 11 million or so rows of it per month, detailing every GP prescription in England. Privately, some people expressed concern that the failure to do anything with the data so far was undermining efforts to make it public at all.
Once data is open it takes time for people to discover a reason for doing something interesting with it, and to get organized to do it. There’s no saying what people will use the data for, but provided the data isn’t junk there’s a good bet that sooner or later something will happen.
The story of how prescribinganalytics.com came to be is illustrative, so I’ll briefly tell my version of it here… Fran (CEO of https://www.mastodonc.com/) emailed me a few months ago with the news that she was carrying out some testing using the GP prescribing data.
I replied and suggested looking at prescriptions of proprietary vs generic ACE-inhibitors (a drug that lowers blood pressure) and a few other things. I also cc’d Ben Goldacre and my good friend Tom Yates. Ben shared an excellent idea he’d had a while ago for a website with a naughty name that showed how much money was wasted on expensive drugs where there was an identically effective cheaper option and suggested looking at statins (a class of drug that reduces the risk of stroke, heart attack, and death) first.
Fran did the data analysis and made beautiful graphics. Ben, Tom, and I, with help from a handful of other friends, academics, and statisticians, provided the necessary domain expertise to come up with an early version of the site, which had a naughty name. We took counsel and decided it’d be more constructive, and more conducive to our goals, not to launch the site with a naughty name. A while later the Open Data Institute (http://www.theodi.org/) offered to support us in delivering prescribinganalytics.com. In no particular order, Bruce (CTO of https://www.mastodonc.com/), Ayesha (http://londonlime.net/), Sym Roe, Ross Jones, and David Miller collaborated with the original group to make the final version.
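The core calculation is conceptually simple. As a rough illustration (a toy sketch, not the project’s actual code — the practice codes, drug names, prices, and field names below are all invented), the question for each practice is: what share of its statin spend went on proprietary brands rather than chemically equivalent generics?

```python
from collections import defaultdict

def proprietary_spend_share(rows):
    """Per practice, the fraction of total spend that went on proprietary brands."""
    totals = defaultdict(lambda: {"proprietary": 0.0, "total": 0.0})
    for r in rows:
        t = totals[r["practice"]]
        t["total"] += r["cost"]
        if r["proprietary"]:
            t["proprietary"] += r["cost"]
    # Assumes each practice has nonzero total spend.
    return {p: t["proprietary"] / t["total"] for p, t in totals.items()}

# Invented sample rows standing in for the monthly prescribing file.
rows = [
    {"practice": "A81001", "drug": "Atorvastatin", "proprietary": False, "cost": 120.0},
    {"practice": "A81001", "drug": "Lipitor",      "proprietary": True,  "cost": 480.0},
    {"practice": "B82005", "drug": "Simvastatin",  "proprietary": False, "cost": 300.0},
]

shares = proprietary_spend_share(rows)
print(shares["A81001"])  # 0.8 — most of this practice's statin spend is proprietary
```

The real analysis runs the same idea over 11 million rows a month, then maps the variation between practices.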
I’d call the way we worked peer production: a diverse group of people with very different skill sets and motivations formed a small self-organizing community to achieve the task of delivering the site. I think the results speak for themselves. It’s exciting, and this is just the beginning.
Mastodon C is a start-up company currently based at The Open Data Institute. The Open Data Institute’s mission is to catalyse the evolution of an open data culture to create economic, environmental, and social value.
This blog post is written by Sven Vlaeminck | ZBW – German National Library of Economics / Leibniz Information Center for Economics
In Economics, as in many other research disciplines, the number of papers in which authors have collected their own research data or used external datasets is continuously increasing. So far, however, there have been few effective means of replicating the results of economic research within the framework of the corresponding article, verifying them, and making them available for reuse in support of scholarly debate.
In the light of these findings B.D. McCullough pointed out: “Results published in economic journals are accepted at face value and rarely subjected to the independent verification that is the cornerstone of the scientific method. Most results published in economics journals cannot be subjected to verification, even in principle, because authors typically are not required to make their data and code available for verification.” (McCullough/McGeary/Harrison: “Lessons from the JMCB Archive”, 2006)
Harvard Professor Gary King also asked: “[I]f the empirical basis for an article or book cannot be reproduced, of what use to the discipline are its conclusions? What purpose does an article like this serve?” (King: “Replication, Replication” 1995). Therefore, the management of research data should be considered an important aspect of the economic profession.
The project EDaWaX
Several questions came up when we considered the reasons why economics papers may not be replicable in many cases:
First: what kind of data is needed for replication attempts? Second: scholarly economic journals clearly play an important role in this context. When publishing an empirical paper, do economists have to provide their data to the journal? How many scholarly journals commit their authors to do so? Do these journals require their authors to submit only the datasets, or also the code of computation? Do they require authors to provide the programs used for estimations or simulations? And what about descriptions of datasets, variables, and values, or even a manual on how to replicate the results?
As part of generating the functional requirements for this publication-related data archive, the project analyzed the data (availability) policies of economic journals and developed some recommendations for these policies that could facilitate replication.
To read about the results of the EDaWaX survey, please see the full blog post on Open Economics.
The Digital Public Library of America (DPLA) is an ambitious project to build a national digital library platform for the United States that will make the cultural and scientific record available, free to all Americans. Hosted by the Berkman Center for Internet & Society at Harvard University, the DPLA is an international community of over 1,200 volunteers and participants from public and research libraries, academia, all levels of government, publishing, cultural organizations, the creative community, and private industry devoted to building a free, open, and growing national resource.
Here’s an outline of some of the key developments in the DPLA planning initiative. For more information on the Digital Public Library of America, including ways in which you can participate, please visit http://dp.la.
In the fall of 2012, the DPLA received funding from the National Endowment for the Humanities, the Institute for Museum and Library Services, and the Knight Foundation to support our Digital Hubs Pilot Project. This funding enabled us to develop the DPLA’s content infrastructure, including implementation of state and regional digital service pilot projects. Under the Hubs Pilot, the DPLA plans to connect existing state infrastructure to create a national system of state (or in some cases, regional) service hubs.
The service hubs identified for the pilot are:
Mountain West Digital Library (Utah, Nevada and Arizona)
Digital Commonwealth (Massachusetts)
Digital Library of Georgia
Kentucky Digital Library
Minnesota Digital Library
South Carolina Digital Library
In addition to these service hubs, organizations with large digital collections that are going to make their collections available via the DPLA will become content hubs. We have identified the National Archives and Records Administration, the Smithsonian Institution, and Harvard University as some of the first potential content hubs in the Digital Hubs Pilot Project.
Here’s our director for content, Emily Gore, to give you a full overview:
The technical development of the Digital Public Library of America is being conducted in a series of stages. The first stage (December 2011-April 2012) involved the initial development of a back-end metadata platform. The platform provides information and services openly and to all without restriction by way of open source code.
We’re now on stage two: integrating continued development of the back-end platform, complete with open APIs, with new work on a prototype front end. It’s important to note that this front-end will serve as a gesture toward the possibilities of a fully built-out DPLA, providing but one interface for users to interact with the millions of records contained in the DPLA platform.
Development of the back-end platform — conducted publicly, with all code published on GitHub under a GNU Affero General Public License — continues so that others can develop additional user interfaces and means of using the data and metadata in the DPLA over time, which continues to be a key design principle for the project overall.
We’ve been hosting a whole load of events, from our large public events like the DPLA Midwest last month in Chicago, to smaller more intimate hackathons. These events have brought together a wide range of stakeholders — librarians, technologists, creators, students, government leaders, and others – and have proved exciting and fruitful moments in driving the project forward.
On November 8-9, 2012, the DPLA will convene its first “Appfest” Hackathon at the Chattanooga Public Library in Chattanooga, TN. The Appfest is an informal, open call for both ideas and functional examples of creative and engaging ways to use the content and metadata in the DPLA back-end platform. We’re looking for web and mobile apps, data visualization hacks, dashboard widgets that might spice up an end-user’s homepage, or a medley of all of these. There are no strict boundaries on the types of submissions accepted, except that they be open source. You can check out some of the apps that might be built at the upcoming hackathon on the Appfest wiki page.
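To give a flavour of what Appfest hacking can look like: apps consume records from the back-end platform’s metadata API. The snippet below is purely illustrative — the response shape and field names are assumptions for the sake of the example, not the platform’s documented schema. It parses a sample JSON response of the general kind a metadata API might return and pulls out item titles.

```python
import json

# A hand-made sample response; the shape and field names are
# illustrative assumptions, not the platform's documented schema.
sample_response = """
{
  "count": 2,
  "docs": [
    {"sourceResource": {"title": "Chattanooga, Tenn., panoramic view"}},
    {"sourceResource": {"title": "Map of Hamilton County, 1910"}}
  ]
}
"""

data = json.loads(sample_response)
titles = [doc["sourceResource"]["title"] for doc in data["docs"]]
print(data["count"], titles)
```

An app built on top of this — a visualization, a dashboard widget, a mobile front end — is mostly a matter of what you do with those records once you have them.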
The DPLA remains an extremely ambitious project, and we encourage anyone with an interest in open knowledge and the democratization of information to participate in one form or another. If you have any questions about the project or ways to get involved, please feel free to email me at kwhitebloom[at]cyber.law.harvard.edu.
“The big problem in economics is that it really matters in which journals you publish, so the reputation factor is a big hindrance in getting open access journals up and going”. Can the accepted norms of scholarly publishing be successfully challenged?
This quotation is a line from the correspondence about writing this blogpost for the OKFN. The invitation came to write for the Open Economics Working Group, hence the focus on economics, but in reality the same situation pertains across pretty much any scholarly discipline you can mention. From the funding bodies down through faculty departments and academic librarians to individual researchers, an enormous worldwide system of research measurement has grown up that conflates the quality of research output with the publications in which it appears. Journals that receive a Thomson ISI ranking and high impact factors are perceived as the holy grail and, as is being witnessed currently in the UK during the Research Excellence Framework (REF) process, these carry tremendous weight when it comes to research fund awards.
Earlier this year, I attended a meeting with a Head of School at a Russell Group university, in response to an email that I had sent with information about Social Sciences Directory, the ‘gold’ open access publication that I was then in the first weeks of setting up. Buoyed by their acceptance to meet, I was optimistic that there would be interest and support for the idea of breaking the shackles of existing ranked journals and their subscription paywall barriers. I believed then – and still believe now – that if one or two senior university administrators had the courage to say, “We don’t care about the rankings. We will support alternative publishing solutions as a matter of principle”, then it would create a snowball effect and expedite the break up of the current monopolistic, archaic system. However, I was rapidly disabused. The faculty in the meeting listened politely and then stated categorically that they would never consider publishing in a start up venture such as Social Sciences Directory because of the requirements of the REF. The gist of it was, “We know subscription journals are restrictive and expensive, but that is what is required and we are not going to rock the boat”.
I left feeling deflated, though not entirely surprised. I realised some time ago that the notion of profit & loss, or cost control, or budgetary management, was simply anathema to many academic administrators, and that trying to present an alternative model as a good thing because it is a better deal for taxpayers is an argument likely to founder on the rocks of the funding and ranking systems, if not apathy and intransigence. A few years ago, whilst working as a sales manager in subscription publishing, I attended a conference of business school deans and directors. (This in itself was unusual, as most conferences that I attended were for librarians – ALA, UKSG, IFLA and the like – since the ‘customer’ in a subscription sense is usually the university library.) During a breakout session, a game of one-upmanship began between three deans, as they waxed lyrical about the overseas campuses they were opening, the international exchanges of staff and students they had fixed up, the new campus buildings that were under construction, and so on.
Eventually, I asked the fairly reasonable question of whether these costly ventures were being undertaken with a strategic view that they would eventually recoup their costs and were designed to help make their schools self-funding. Or indeed, whether education and research are of such importance for the greater good of all that they should be viewed as investments. The discomfort was palpable. One of the deans even strongly denied that this is a question of money. That the deans of business schools should take this view was an eye-opening insight into the general academic attitude towards state funding. It is an attitude that is wrong because ultimately, of course, it is entirely about the money. The great irony was that this conversation took place in September 2008, with the collapse of Lehman Brothers and the full force of the Global Financial Crisis (GFC) soon to impact gravely on the global higher education and research sector.

A system that for years had been awash with money had allowed all manner of poor practices to take effect, in which many different actors were complicit. Publishers had seized on the opportunity to expand output massively and charge vast fees for access; faculty had demanded that their libraries subscribe to key journals, regardless of cost; libraries and consortia had agreed to publishers’ demands because they had the money to do so; and the funding bodies had built journal metrics into the measurement for future financing. No wonder, then, that neither academia nor publishers could or would take the great leap forward that is required to bring about change, even after the GFC had made it patently clear that the ongoing subscription model is ultimately unsustainable. Change needs to be imposed, as the British government bravely did in July with the decision to adopt the recommendations of the Finch Report.
However, this brings us back to the central issue and the quotation in the title. For now, the funding mechanisms are the same and the requirement to publish in journals with a reputation is still paramount. Until now, arguments against open access publishing have tended to focus on quality issues. The argument goes that the premier (subscription) journals take the best submissions, and then there is a cascade downwards through second-tier journals (which may or may not be subscription-based) until you get to a pile of leftover papers that can only be published by the author paying a fee to some sort of piratical publisher. This does not stand up to much scrutiny. Plenty of subscription-based journals are average and have been churned out by publishers looking to beef up their portfolios and justify charging ever-larger sums. Good research gets unnecessarily dumped by leading journals because they adhere to review policies dating from the print age, when limited pagination forced them to be highly selective. Other academics, as we have seen at Social Sciences Directory, have chosen to publish and review beyond the established means because they believe in finding and helping alternatives. My point is that good research exists outside the ‘top’ journals. It is just a question of finding it.
So, after all this, do I believe that the “big hindrance” of reputation can be overcome? Yes, but only through planning and mandate. Here is what I believe should happen:
The sheer number of journals is overwhelming and, in actuality, at odds with modern user behaviour which generally accesses content online and uses a keyword search to find information. Who needs journals? What you want is a large collection of articles that are well indexed and easily searchable, and freely available. This will enable the threads of inter-disciplinary research to spread much more effectively. It will increase usage and reduce cost-per-download (increasingly the metrics that librarians use to measure the return on investment of journals and databases), whilst helping to increase citation and impact.
Ensure quality control of peer review by setting guidelines and adhering to them.
De-couple the link between publishing and tenure & department funding.
In many cases, universities will have subscribed to a particular journal for years and will therefore have access to a substantial back catalogue. This has often been supplemented by the purchase of digitised archives, as publishers cottoned on to other sources of revenue which happened to chime with librarians’ preferences to complete online collections and take advantage of non-repeatable purchases. Many publishers also sell their content to aggregators, who agree to an embargo period so that the publisher can also sell the most up-to-date research directly. Although the axe has fallen on many print subscriptions, some departments and individuals still prefer having a copy on their shelves (even though they could print off a PDF from the web version and have the same thing, minus the cover). So, aside from libraries often paying more than once for the same content, they will have complete collections up to a given point in time. University administrators need to take the bold decision to change, to pick an end date as a ‘cut off’ after which they will publicly state that they are switching to new policies in support of OA. This will allow funds to be freed up and used to pay for institutional memberships, article processing fees, institutional repositories – whatever the choice may be. Editors, authors and reviewers will be encouraged to offer their services elsewhere, which will in turn rapidly build the reputation of new publications.
Scholarly publishing is being subjected to a classic confrontation between tradition and modernity. For me, it is inevitable that modernity will win out and that the norms will be successfully challenged.
Hackathons are a wonderful way to introduce people from all walks of life to the amazing possibilities of open data. Here in British Columbia we are fortunate to have a very active open data community, which has organized and run 17 open data hackathons in the past two years. This year a few of us decided that there was enough demand for hackathons that we wanted to figure out how to scale them out so that a lot more events could be held. We saw lots of people wanting to facilitate and sponsor them, but the hackathon concept was still quite new to them.
We decided that the best way to scale it out was to create more champions and encourage others to hold their own hackathons. As part of that effort we decided that a guide would be a great tool for this purpose. The guide is meant to describe not only how to run a hackathon but why we run them the way we do. Our intention is that our participants walk away from our events inspired and with a new understanding of what open data is and how it can be used to inform, and add value. Our intention with the guide is that other people will take on running their own hackathons and create that same result in their own communities.
So – we’re happy to announce that v1.0 of our Open Data Hackathon How To Guide is ready and available here:
After a weekend of hacking, all participating teams will present their ideas to a panel of experts, with the chance to get one-on-one mentorship and other prizes. The judges include Dr Ben Goldacre – author of Bad Science and Bad Pharma – and Dr Carl Reynolds – Co-Founder of Open Health Care UK. The winning team will also present their project to Tim Berners-Lee at the launch of the Open Data Institute.
Participants will be inspired by existing digital health startups, leading industry representatives and mentors throughout the weekend.
Exciting news on open legislative data from the US. Eric Mills (from the Sunlight Foundation), Josh Tauberer (of GovTrack.us) and Derek Willis have been beavering away on a public domain scraper and dataset from THOMAS.gov, the official source for legislative information for the US Congress. They’ve just hit a key milestone – the incorporation of everything that THOMAS has on Bills going back to 1973 when its records began!
The data and code are all hosted on Github on a “unitedstates” organization, which is right now co-owned by me, Josh, and Derek – the intent is to have this all exist in a common space. To the extent that the code needs a license at all, I’m using a public domain “unlicense” that should at least be sufficient for the US (other suggestions welcome).
There’s other great stuff in this organization, too – Josh made an amazing donation of his legislator dataset, and converted it to YAML for easy reuse. I’ve worked that dataset into Sunlight’s products already as well. I’ve also moved my legal citation extractor into this organization — and my colleague Thom Neale has an in-progress parser for the US Code, to convert it from binary typesetting codes into JSON.
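Because everything in the organization is published as plain data files, reuse is trivial. For example — the JSON below is a hand-made illustration, and the field names are assumptions about the schema rather than a documented excerpt — pulling a bill’s short titles out of a scraped bill file takes a few lines of standard-library Python:

```python
import json

# A hand-made bill record in the general style of the scraped output;
# field names here are illustrative assumptions, not the actual schema.
bill_json = """
{
  "bill_id": "hr3590-111",
  "congress": 111,
  "introduced_at": "2009-09-17",
  "titles": [
    {"type": "short", "title": "Patient Protection and Affordable Care Act"}
  ]
}
"""

bill = json.loads(bill_json)
short_titles = [t["title"] for t in bill["titles"] if t["type"] == "short"]
print(bill["bill_id"], short_titles)
```

Because the data is public domain, anyone can build on files like this without asking permission — which is exactly the point of putting it in a commons.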
Github’s organization structure actually makes possible a very neat commons. I’m hoping this model proves useful, both for us and for the public.
New research shows that the traditional arguments for copyright extension are as flawed as we always suspected.
Copyright is generally defended in terms of the stimulus it gives to creative production: what motivation would anyone have to do anything ever if they don’t get decades of ownership afterwards? But then how do you justify the continual increase in copyright terms which has taken place over the last century, and applies retrospectively to works made in the past? Extending their copyright protection can’t stimulate their production – they’ve already been made!
Three main arguments are advanced: that works which fall into the public domain will be under-exploited, because no one will retain an exclusive incentive to invest in distributing them; that they will be over-exploited, with too many people using them and thereby reducing their worth; and that they will be tarnished, by being reproduced in low-quality ways or associated with undesirable things.
Our data suggest that the three principal arguments in favor of copyright term extension—under-exploitation, over-exploitation, and tarnishment—are unsupported. There seems little reason to fear that once works fall into the public domain, their value will be substantially reduced based on the amount or manner in which they are used. We do not claim that there are no costs to movement into the public domain, but, on the opposite side of the ledger, there are considerable benefits to users of open access to public domain works. We suspect that these benefits dramatically outweigh the costs.
Our data provide almost no support for the arguments made by proponents of copyright term extension that once works fall into the public domain they will be produced in poor quality versions that will undermine their cultural or economic value. Our data indicate no statistically significant difference, for example, between the listeners’ judgments of the quality of professional audiobook readers of copyrighted and public domain texts.
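For readers curious what “no statistically significant difference” means in practice, here is a minimal sketch of the kind of comparison described — with made-up ratings rather than the study’s data, and Welch’s t-statistic as one standard way to compare two independent samples (not necessarily the test the authors used). A t-statistic near zero, as here where both groups have the same mean, is consistent with no detectable quality difference.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t-statistic (allows unequal variances)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

# Made-up 1-5 listener quality ratings, not the study's data.
copyrighted = [4, 5, 3, 4, 4, 5, 3]
public_domain = [4, 4, 3, 5, 4, 4, 4]

t = welch_t(copyrighted, public_domain)
print(t)  # 0.0 — identical group means, no evidence of a difference
```

With real data the statistic would rarely be exactly zero, but a small value relative to the t-distribution’s critical values tells the same story the study reports: listeners could not tell public domain readings apart from copyrighted ones on quality.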
It’s getting to be that time again, when Mickey Mouse gets closer and closer to the public domain — and you know what that means: a debate about copyright term extension. As you know, whenever Mickey is getting close to the public domain, Congress swoops in, at the behest of Disney, and extends copyright.
The results are clear. The so-called “harm” of works falling into the public domain does not appear to exist. Works are still offered (in fact, they’re more available to the public, which we’re told is what copyright is supposed to do), there are still quality works offered, and the works are not overly exploited. So what argument is there left to extend copyright?
Code for Europe is a new organization looking to enliven a culture of innovation in city government. This week they have launched a hunt for talented developers and app makers, “to help make a breakthrough in how government services its citizens.” The projects will take place in six European cities: Helsinki, Amsterdam, Rome, Berlin, Manchester and Barcelona.
Fellows will work collaboratively to “develop out-of-the-box answers to the common challenges cities are faced with, to not only make a real difference in each location, but also provide solutions that can be used in other European cities, and around the world.” They will do more than just code – they will act as their own project managers to research, create and develop a complete solution. They are provided access to city officials and often to newly opened government data sets, to gain a full understanding of the issues. The positions are salaried, starting in January 2013, and the application deadline is November 4th.