
Newsflash! OKFestival Programme Launches

Beatrice Martini - June 4, 2014 in Events, Free Culture, Join us, Network, News, OKFest, OKFestival, Open Access, Open Data, Open Development, Open Economics, Open Education, Open GLAM, Open Government Data, Open Humanities, Open Knowledge Foundation, Open Knowledge international Local Groups, Open Research, Open Science, Open Spending, Open Standards, Panton Fellows, Privacy, Public Domain, Training, Transparency, Working Groups

At last, it’s here!

Check out the details of the OKFestival 2014 programme – including session descriptions, times and facilitator bios here!

We’re using a tool called Sched to display the programme this year and it has several great features. Firstly, it gives individual session organisers the ability to update the details on the session they’re organising; this includes the option to add slides or other useful material. If you’re one of the facilitators we’ll be emailing you to give you access this week.

Sched also enables every user to create their own personalised programme to include the sessions they’re planning to attend. We’ve also colour-coded the programme to help you when choosing which conversations you want to follow: the Knowledge stream is blue, the Tools stream is red and the Society stream is green. You’ll also notice that there are a bunch of sessions in purple which correspond to the opening evening of the festival when we’re hosting an Open Knowledge Fair. We’ll be providing more details on what to expect from that shortly!

Another way to search the programme is by session subject. Subjects are listed on the right-hand side of the main schedule; just click on any of them to see a list of relevant sessions.

As you check out the individual session pages, you’ll see that we’ve created etherpads for each session where notes can be taken and shared, so don’t forget to keep an eye on those too. And finally, to make the conversations even easier to follow from afar using social media, we’re encouraging session organisers to create individual hashtags for their sessions. You’ll find these listed on each session page.

We received over 300 session suggestions this year – the most yet for any event we’ve organised – and we’ve done our best to fit in as many as we can. There are 66 sessions packed into 2.5 days, plus 4 keynotes and 2 fireside chats. We’ve also made space for an unconference over the 2 core days of the festival, so if you missed out on submitting a proposal, there’s still a chance to present your ideas at the event: come ready to pitch! Finally, the Open Knowledge Fair has added a further 20 demos – and counting – to the lineup and is a great opportunity to hear about more projects. The Programme is full to bursting, and while some time slots may still change a little, we hope you’ll dive right in and start getting excited about July!

We think you’ll agree that Open Knowledge Festival 2014 is shaping up to be an action-packed few days – so if you’ve not bought your ticket yet, do so now! Come join us for what will be a memorable 2014 Festival!

See you in Berlin! Your OKFestival 2014 Team

Open Economics: the story so far…

Velichka Dimitrova - August 30, 2013 in Featured, Open Economics

A year and a half ago we embarked on the Open Economics project with the support of the Alfred P. Sloan Foundation, and we would like to share a short recap of what we have been up to.

Our goal was to define what open data means for the economics profession and to become a central point of reference for those who wanted to learn what it means to have openness, transparency and open access to data in economics.

Advisory Panel of the Open Economics Working Group:

Advisory Panel

We brought together an Advisory Panel of twenty senior academics who advised us and provided input on people and projects we needed to contact and issues we needed to tackle. The progress of the project has depended on the valuable support of the Advisory Panel.

1st Open Economics Workshop, Dec 17-18 ’12, Cambridge, UK:

2nd Open Economics Workshop, 11-12 June ’13, Cambridge, MA:

International Workshops

We also organised two international workshops, the first held in Cambridge, UK, on 17-18 December 2012 and the second in Cambridge, US, on 11-12 June 2013. They convened academics, funders, data publishers, information professionals and students to share ideas and build an understanding of the value of open data, the persistent barriers to opening up information, and the incentives and structures which our community should encourage.

Open Economics Principles

While defining open data for economics, we also saw the need to issue a statement on the openness of data and code – the Open Economics Principles – to emphasise that the data, program code, metadata and instructions necessary to replicate economics research should be open by default. Launched in August, this statement is now being widely endorsed by the economics community, most recently by the World Bank’s Data Development Group.


The Open Economics Working Group and several more involved members have worked on smaller projects to showcase how data can be made available and what tools can be built to encourage discussions and participation as well as wider understanding about economics. We built the award-winning app Yourtopia Italy – for a user-defined multidimensional index of social progress, which won a special prize in the Apps4Italy competition.

Yourtopia Italy: application of a user-defined multidimensional index of social progress:

We created the Failed Bank Tracker, a list and timeline visualisation of the banks in Europe which failed during the last financial crisis. We also released the Automated Game Play Datasets, the data and code of papers from the Small Artificial Agents for Virtual Economies research project, implemented by Professor David Levine and Professor Yixin Chen at Washington University in St. Louis. More recently we launched Metametrik, a prototype of a platform for the storage and search of regression results in economics.

MetaMetrik: a prototype for the storage and search of econometric results:

We also organised several events in London and a topic stream about open knowledge and sustainability at the OKFestival with a panel bringing together a diverse range of panelists from academia, policy and the open data community to discuss how open data and technology can help improve the measurement of social progress.

Blog and Knowledge Base

We blogged about issues like the benefits of open data from the perspective of economics research, the EDaWaX survey of the data availability of economics journals, pre-registration in the social sciences, crowd-funding and open access. We also presented projects like the Statistical Memory of Brazil, Quandl and the AEA randomized controlled trials registry.

Some of the issues we raised had a wider resonance, e.g. when Thomas Herndon found significant errors in trying to replicate the results of Harvard economists Reinhart and Rogoff, we emphasised that while such errors may happen, it is a greater crime not to make the data available with published research in order to allow for replication.

Some outcomes and expectations

We found that opening up data in economics may be a difficult matter, as many economists use data which cannot be opened for reasons of privacy or confidentiality, or because they don’t own that data. Sometimes there are insufficient incentives to disclose data and code. Many economists spend a lot of resources building their datasets and obtain an advantage over other researchers by making use of information rents.

Some journals have been leading the way in putting in place data availability requirements and funders have been demanding data management and sharing plans, yet more general implementation and enforcement is still lacking. There are now, however, more tools and platforms available where researchers can store and share their research content, including data and code.

There are also great benefits in sharing economics data: it enables the scrutiny of research findings and makes it possible to replicate research, it enhances the visibility of research and promotes new uses of the data, it avoids unnecessary costs for data collection, and so on.

In the future we hope to concentrate on projects which would involve graduate students and early career professionals, a generation of economics researchers for whom sharing data and code may become more natural.

Keep in touch

Follow us on Twitter @okfnecon, sign up to the Open Economics mailing list and browse our projects and resources at

Introducing the Open Economics Principles

Velichka Dimitrova - August 7, 2013 in Featured, Open Economics, WG Economics

The Open Economics Working Group would like to introduce the Open Economics Principles, a Statement on Openness of Economic Data and Code. A year and a half ago the Open Economics project began with the mission of becoming a central point of reference and support for those interested in open economic data. In the process of identifying examples of, and ongoing barriers to, opening up data and code for the economics profession, we saw the need to present a statement on the guiding principles of transparency and accountability in economics that would enable replication and scholarly debate as well as access to knowledge as a public good.

We wrote the Statement on the Open Economics Principles during our First and Second Open Economics International Workshops, receiving feedback from our Advisory Panel and community with the aim to emphasise the importance of having open access to data and code by default and address some of the issues around the roles of researchers, journal editors, funders and information professionals.

Second Open Economics International Workshop, June 11-12, 2013

Read the statement below and follow this link to endorse the Principles.

Open Economics Principles

Statement on Openness of Economic Data and Code

Economic research is based on building on, reusing and openly criticising the published body of economic knowledge. Furthermore, empirical economic research and data play a central role for policy-making in many important areas of our economies and societies.

Openness enables and underpins scholarly enquiry and debate, and is crucial in ensuring the reproducibility of economic research and analysis. Thus, for economics to function effectively, and for society to reap the full benefits from economic research, it is therefore essential that economic research results, data and analysis be openly and freely available, wherever possible.

  1. Open by default: by default data in its different stages and formats, program code, experimental instructions and metadata – all of the evidence used by economists to support underlying claims – should be open as per the Open Definition1, free for anyone to use, reuse and redistribute. Specifically open material should be publicly available and licensed with an appropriate open licence2.
  2. Privacy and confidentiality: We recognise that there are often cases where for reasons of privacy, national security and commercial confidentiality the full data cannot be made openly available. In such cases researchers should share analysis under the least restrictive terms consistent with legal requirements, abiding by the research ethics and guidelines of their community. This should include opening up non-sensitive data, summary data, metadata and code, and facilitating access if the owner of the original data grants other researchers permission to use the data.
  3. Reward structures and data citation: recognizing the importance of data and code to the discipline, reward structures should be established in order to recognise these scholarly contributions with appropriate credit and citation in an acknowledgement that producing data and code with the documentation that make them reusable by others requires a significant commitment of time and resources. At minimum, all data necessary to understand, assess, or extend conclusions in scholarly work should be cited. Acknowledgements of research funding, traditionally limited to publications, could be extended to research data and contribution of data curators should be recognised.
  4. Data availability: Investigators should share their data by the time of publication of initial results of analyses of the data, except in compelling circumstances. Data relevant to public policy should be shared as quickly and widely as possible. Funders, journals and their editorial boards should put in place and enforce data availability policies requiring data, code and any other relevant information to be made openly available as soon as possible and at latest upon publication. Data should be in a machine-readable format, with well-documented instructions, and distributed through institutions that have demonstrated the capability to provide long-term stewardship and access. This will enable other researchers to replicate empirical results.
  5. Publicly funded data should be open: publicly funded research work that generates or uses data should ensure that the data is open, free to use, reuse and redistribute under an open licence – and specifically, it should not be kept unavailable or sold under a proprietary licence. Funding agencies and organizations disbursing public funds have a central role to play and should establish policies and mandates that support these principles, including appropriate costs for long-term data availability in the funding of research and the evaluation of such policies3, and independent funding for systematic evaluation of open data policies and use.
  6. Usable and discoverable: as simply making data available may not be sufficient for reusing it, data publishers and repository managers should endeavour to also make the data usable and discoverable by others; for example, documentation, the use of standard code lists, etc., all help make data more interoperable and reusable and submission of the data to standard registries and of common metadata enable greater discoverability.

See Reasons and Background and a link to endorsing the Principles:


2. Open licences for code are those conformant with the Open Source Definition see and open licences for data should be conformant with the open definition, see

3. A good example of an important positive development in this direction from the United States is

Open tax data, or just VAT ‘open wash’

Chris Taggart - July 30, 2013 in Featured, Open Data, Open Economics, Open Government Data, Public Money, WG Open Government Data

This post is by Chris Taggart, the co-founder and CEO of OpenCorporates, the largest open database of companies in the world, and a member of the Open Government working group.

[Disclosure: I am on the UK Tax Transparency Board, which has not yet discussed these proposals, but will be doing so at the next meeting in early September]

A little over a week ago, Her Majesty’s Revenue & Customs (HMRC) published a consultation on publishing its data more widely, and in it stated its intention to join the open-data movement.

The UK helped secure the G8’s Open Data Charter, which presumes that the data held by Governments will be publicly available unless there is good reason to withhold it. It is important that HMRC plays a full part. HMRC’s relationship with businesses and individuals is unique, and this is reflected in the scope and depth of the information HMRC collects, creates and protects on behalf of taxpayers.

Great. Well, no.

The problem is that, despite what the above says, this consultation and the proposals within have little to do with open data or widening access, but instead are primarily about passing data, much of it personal data relating to ordinary individuals, to the anointed few. It also exposes some worrying data-related problems within HMRC that should be ringing alarm bells within government.

So what exactly is being suggested? There are two parts:

  1. Proposals to do with sharing HMRC’s data, particularly aggregated and anonymised data. At the moment HMRC can, in general, only share such data if it relates to HMRC’s functions, even if it’s in the wider public benefit.
  2. Proposals to do with the VAT Register. The VAT Register is currently private, even though to a large extent the information is ‘out there’, on till receipts, on invoices, on websites, and in various private datasets, and in fact in many countries it’s already public.

Both have their issues, but for the moment we’ll concentrate on the second.

Now there has been no great clamour for the VAT Register from open-data activists (unlike say the postcode address file, company register, or Ordnance Survey data), so why is it being opened up? Well, why not? As the consultation says:

An underlying principle in developing the proposals in this chapter is brought out in the Shakespeare Review. Data belong to citizens and the presumption of government should be towards openness, unless this causes harm. It is not for government to dictate the nature of the opportunity. The corollary is that the Government will not always be aware of the range or scale of potential benefits, as the quotation below shows – this consultation will help to establish these.

So the proposal is to publish the VAT Register as open data, so that the wider community can do cool stuff with it? No. The consultation neatly slides from this lofty aim to something rather more grubby:

There has been public interest for some time, for example from credit reference agencies (CRAs), in the publication of VAT registration data as a resource to generate benefits.

Don’t the three big credit reference agencies (Experian, Equifax and Callcredit) already know a lot about companies? Surely they know the VAT numbers of many of them, and in any case know a lot more about most companies, especially active, trading companies (the sort that are registered for VAT)?

What they don’t have, however, is much information about sole-traders, small partnerships, individuals trading on their own account and without the shield of limited liability, with the responsibilities for publishing information that comes with that. That’s why the VAT register is so important to them, and that’s what this consultation is proposing to give them.

Of course they could just ask people for that information. But people might refuse, particularly if they don’t need to borrow money, and that would be a problem as far as building a monetisable dataset of them. If they could only get the government to give them access to that data – have the government act as their own data-collection arm, with the force of law to compel providing of the information – that would be great. For them. For individuals, and for the wider world, it’s not good at all.

First, because what we’re talking about here are individuals, who have privacy and data protection rights, not companies, and there need to be compelling reasons for making that information public in the first place – just because the big three CRAs (Experian, Equifax, Callcredit) think they can make money from it isn’t good enough.

Second, because if open data is about one thing, it is about democratising access to data, about reversing the traditional position where, to use the words of the Chancellor, George Osborne, “Access to the world’s information – and the ability to communicate it – was controlled by an elite few”. And if there’s one thing that’s certain it’s that the CRAs have a lot of power.

But wait, doesn’t the consultation also propose that some of the VAT register is published as open data, specifically “a very selective extract covering just three data fields – VAT registration number (VRN), trading name, and Standard Industry Code (SIC) classification number”?

At first sight this might be seen as good, or better than nothing. In fact it shows that HMRC either doesn’t get data, or it’s just ‘openwash’ – an open-data figleaf to obscure the passing of personal and private data wholesale to the CRAs, and one that could potentially lead to greater fraud. Here’s why:

  • The three fields (VAT number, trading name, SIC code) together make up an orphan dataset, i.e. one that’s unconnected with any other data, and therefore is fundamentally useless… unless you want to fraudulently write an invoice calling yourself ‘AAA Plumbing’, charging VAT on it, and pocketing the 20%, knowing that either you will never be caught, or the real AAA Plumbing will be the first place HMRC will come looking. Fraud is fundamentally about asymmetries of information flows (the fraudster knows more about you than you know about them). If, for example, you know that the real AAA Plumbing is a company with a registered address in Kirkcaldy, Scotland, or that BBB Services is dissolved or has a website showing it works in the aircraft business, then you have a much greater chance of avoiding fraud.
  • Trading names are very problematic, and in general are not registered anywhere, so are little help. They also need have no relationship to the legal name, either of the person or the company. So if you want to find the company behind ZZZ Financial Experts, if indeed there is one, you’re out of luck. It’s puzzling that HMRC would even consider publishing the VAT Register without the legal form, and in the case of companies the company number.
  • One of the stated reasons for publishing the register is that “VAT registration data could also provide a foundation for private sector business registers”. Really? In this world of open data and the importance of core reference data, HMRC wants a private, proprietary identifier set to be created, with all the problems that it would entail? In fact, HMRC was supposed to be working with the Department for Business, Innovation & Skills to build such a public dataset. Has it decided that it doesn’t understand data well enough to do this? Or would it rather shackle not just the government but the business sector as a whole to some such dataset?
  • Finally, it’s also rather surprising to discover that the VAT register appears to contain fields such as the company’s incorporation date and SIC codes. In the geek world we call this a denormalised dataset, meaning it’s duplicating data that rightfully belongs in another table or dataset. There are sometimes good reasons for doing this, but there are risks, such as the data becoming out of sync (which is the correct SIC code – the one on the VAT Register or the one on the Companies House record?).
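To make the out-of-sync risk in the last point concrete, here is a minimal Python sketch. It assumes two lookups keyed by company number, one standing in for the VAT Register and one for the Companies House record; the company numbers, SIC codes and the function name `find_sic_mismatches` are all invented for illustration and are not taken from either register.

```python
# Hypothetical records: company number -> SIC code. All values are invented.
vat_register = {"SC000001": "43220", "01234567": "62010"}
companies_house = {"SC000001": "43220", "01234567": "62020"}


def find_sic_mismatches(vat, registry):
    """Return companies whose SIC code differs between the two sources.

    The result maps company number -> (code on VAT side, code on registry side).
    Companies present in only one source are ignored.
    """
    shared = vat.keys() & registry.keys()
    return {
        number: (vat[number], registry[number])
        for number in shared
        if vat[number] != registry[number]
    }


mismatches = find_sic_mismatches(vat_register, companies_house)
for number, (vat_code, registry_code) in sorted(mismatches.items()):
    print(f"{number}: VAT register says {vat_code}, registry says {registry_code}")
```

Note that the check is only possible because both toy datasets carry the company number. With trading name alone, as in the proposed three-field extract, there would be nothing reliable to join on.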

So what should HMRC be doing? First, it should abandon any plans to act as the Credit Reference Agencies’ data collectors, and publish the VAT register or part of the VAT register as a single open dataset, available to all under the same terms. This would be a genuine spur for innovation, and may even result in increased competition and transparency.

Second, it should realise that there’s a fundamental difference between an individual – a living, breathing person with human rights – and a company. As well as human rights, individuals have data protection rights, privacy rights and don’t exist on a public register; companies on the other hand are artificial entities given a distinct legal personality by the state for the good of society, and in return exist in public (on the public Register of Companies). In the case of the VAT register, the pragmatic approach would be to publish the register as open data, but only that part that relates to companies.

Third, it needs to realise that it is fundamentally in the data business, like it or not, and it needs to quickly get to grips with the modern data world, including the power of data, for good, and for bad. The UK has probably the leading organisations in the world in this area, including OpenCorporates, the Open Knowledge Foundation and the Open Data Institute.

Second Open Economics International Workshop

Velichka Dimitrova - June 5, 2013 in Events, Featured, Open Data, Open Economics, WG Economics, Workshop

Next week, on June 11-12, at the MIT Sloan School of Management, the Open Economics Working Group of the Open Knowledge Foundation will gather about 40 economics professors, social scientists, research data professionals, funders, publishers and journal editors for the second Open Economics International Workshop.

The event will follow up on the first workshop held in Cambridge UK and will conclude with agreeing a statement on the Open Economics principles. Some of the speakers include Eric von Hippel, T Wilson Professor of Innovation Management and also Professor of Engineering Systems at MIT, Shaida Badiee, Director of the Development Data Group at the World Bank and champion for the Open Data Initiative, Micah Altman, Director of Research and Head of the Program on Information Science for the MIT Libraries as well as Philip E. Bourne, Professor at the University of California San Diego and Associate Director of the RCSB Protein Data Bank.

The workshop will address topics including:

  • Research data sharing: how and where to share economics and social science research data, enforce data management plans, promote better data management and data use
  • Open and collaborative research: how to create incentives for economists and social scientists to share their research data and methods openly with the academic community
  • Transparent economics: how to achieve greater involvement of the public in the research agenda of economics and social science

The knowledge sharing in economics session will invite a discussion between Joshua Gans, Jeffrey S. Skoll Chair of Technical Innovation and Entrepreneurship at the Rotman School of Management at the University of Toronto and Co-Director of the Research Program on the Economics of Knowledge Contribution and Distribution, John Rust, Professor of Economics at Georgetown University and co-founder of, Gert Wagner, Professor of Economics at the Berlin University of Technology (TUB) and Chairman of the German Census Commission and German Council for Social and Economic Data as well as Daniel Feenberg, Research Associate in the Public Economics program and Director of Information Technology at the National Bureau of Economic Research.

The session on research data sharing will be chaired by Thomas Bourke, Economics Librarian at the European University Institute, and will discuss the efficient sharing of data and how to create and enforce reward structures for researchers who produce and share high quality data, gathering experts from the field including Mercè Crosas, Director of Data Science at the Institute for Quantitative Social Science (IQSS) at Harvard University, Amy Pienta, Acquisitions Director at the Inter-university Consortium for Political and Social Research (ICPSR), Joan Starr, Chair of the Metadata Working Group of DataCite as well as Brian Hole, the founder of the open access academic publisher Ubiquity Press.

Benjamin Mako Hill, researcher and PhD Candidate at MIT and the Berkman Center for Internet and Society at Harvard University, will chair the session on the evolving evidence base of social science. It will highlight examples of how economists can broaden their perspective on collecting and using data through different means: through mobile data collection, through the web or through crowd-sourcing. It will also consider how to engage the broader community and do more transparent economic research and decision-making. Speakers include Amparo Ballivian, Lead Economist working with the Development Data Group of the World Bank, Michael P. McDonald, Associate Professor at George Mason University and co-principal investigator on the Public Mapping Project and Pablo de Pedraza, Professor at the University of Salamanca and Chair of Webdatanet.

The morning session on June 12 will gather different stakeholders to discuss how to share responsibility and how to pursue joint action. It will be chaired by Mireille van Eechoud, Professor of Information Law at IViR and will include short statements by Daniel Goroff, Vice President and Program Director at the Alfred P. Sloan Foundation, Nikos Askitas, Head of Data and Technology at the Institute for the Study of Labor (IZA), Carson Christiano, Head of CEGA’s partnership development efforts and coordinating the Berkeley Initiative for Transparency in the Social Sciences (BITSS) and Jean Roth, the Data Specialist at the National Bureau of Economic Research.

At the end of the workshop the Working Group will discuss the future plans of the project and gather feedback on possible initiatives for translating discussions into concrete action plans. Slides and audio will be available on the website after the workshop. If you have any questions please contact economics [at]

Reinhart-Rogoff Revisited: Why we need open data in economics

Velichka Dimitrova - April 22, 2013 in Open Data, Open Economics, WG Economics


This blog post is cross-posted from the Open Economics Blog.

Another economics scandal made the news last week. Harvard Kennedy School professor Carmen Reinhart and Harvard University professor Kenneth Rogoff argued in their 2010 NBER paper that economic growth slows down when the debt/GDP ratio exceeds the threshold of 90 percent of GDP. These results were also published in one of the most prestigious economics journals – the American Economic Review (AER) – and had a powerful resonance in a period of serious economic and public policy turmoil when governments around the world slashed spending in order to decrease the public deficit and stimulate economic growth.

Yet they were proven wrong. Thomas Herndon, Michael Ash and Robert Pollin from the University of Massachusetts (UMass) tried to replicate the results of Reinhart and Rogoff and criticised them for three reasons:

  • Coding errors: due to a spreadsheet error five countries were excluded completely from the sample, resulting in significant errors in the average real GDP growth and the debt/GDP ratio in several categories
  • Selective exclusion of available data and data gaps: Reinhart and Rogoff exclude Australia (1946-1950), New Zealand (1946-1949) and Canada (1946-1950). This exclusion is alone responsible for a significant reduction of the estimated real GDP growth in the highest public debt/GDP category
  • Unconventional weighting of summary statistics: the authors do not discuss their decision to weight equally by country rather than by country-year, which could be arbitrary and ignores the issue of serial correlation.
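To see why the third point matters, here is a minimal Python sketch of the two weighting schemes. The growth figures are invented toy numbers, not data from Reinhart and Rogoff or from the UMass critique, and the country labels and function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: real GDP growth (percent) for the country-years
# that fall into a single debt/GDP category. Invented for illustration only.
growth_by_country = {
    "A": [2.0, 2.5, 3.0, 2.5],  # four years in this category
    "B": [-7.0],                # a single year in this category
}


def country_weighted_mean(data):
    """Average each country's years first, then average the country means.

    Every country counts equally, however many years it contributes.
    """
    return mean(mean(years) for years in data.values())


def country_year_weighted_mean(data):
    """Pool all country-years and average them, so every year counts once."""
    return mean(g for years in data.values() for g in years)


print(country_weighted_mean(growth_by_country))
print(country_year_weighted_mean(growth_by_country))
```

In this toy case the single bad year of country B carries as much weight as all four years of country A under equal weighting by country, so the average comes out negative, while pooling the country-years gives a positive average. Neither scheme is automatically the right one; the point of the critique is that the choice should be stated and discussed.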

The implications of these results are that countries with high levels of public debt experience only “modestly diminished” average GDP growth rates and as the UMass authors show there is a wide range of GDP growth performances at every level of public debt among the twenty advanced economies in the survey of Reinhart and Rogoff. Even if the negative trend is still visible in the results of the UMass researchers, the data fits the trend very poorly: “low debt and poor growth, and high debt and strong growth, are both reasonably common outcomes.”

Source: Herndon, T., Ash, M. & Pollin, R., “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff”, Political Economy Research Institute, University of Massachusetts Amherst, Working Paper Series, April 2013.

What makes it even more compelling news is that it is all a tale from the state of Massachusetts: distinguished Harvard professors (#1 university in the US) challenged by empiricists from the less known UMass (#97 university in the US). Moreover, despite its excellent data availability policy – which acts as a role model for other journals in economics – the AER failed to enforce it and make the data and code of Reinhart and Rogoff available to other researchers.

Coding errors happen, yet the greater research misconduct was not allowing other researchers to review and replicate the results by making the data openly available. Had the data and code been made available upon publication in 2010, it might not have taken three years to prove wrong these results, which may have influenced the direction of public policy around the world towards stricter austerity measures. Sharing research data makes it possible to replicate and discuss, enabling the scrutiny of research findings as well as the improvement and validation of research methods through more scientific enquiry and debate.

Get in Touch

The Open Economics Working Group advocates the release of datasets and code, along with published academic articles, and provides practical assistance to researchers who would like to do so. Get in touch if you would like to learn more by writing to us at economics [at] and signing up to our mailing list.


Herndon, T., Ash, M. & Pollin, R., “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff”, Political Economy Research Institute, University of Massachusetts Amherst, Working Paper Series, April 2013.

Releasing the Automated Game Play Datasets

Velichka Dimitrova - March 7, 2013 in Open Economics, Our Work, WG Economics


This blog post is cross-posted from the Open Economics Blog.

We are very happy to announce that the Open Economics Working Group is releasing the datasets of the research project “Small Artificial Human Agents for Virtual Economies”, implemented by Professor David Levine and Professor Yixin Chen at Washington University in St. Louis and funded by the National Science Foundation [See dedicated webpage].

The authors of the study have given their permission to publish their data online. We hope that through making this data available online we will aid researchers working in this field. This initiative is motivated by our belief that in order for economic research to be reliable and trusted, it should be possible to reproduce research findings – which is difficult or even impossible without the availability of the data and code. Making material openly available reduces to a minimum the barriers for doing reproducible research.

If you would like to know more, or would like help with releasing research data in your field, please contact us at: economics [at]

Project Background

An important requirement for developing better economic policy recommendations is improving the way we validate theories. Originally economics depended on field data from surveys and laboratory experiments. An alternative method of validating theories is through the use of artificial or virtual economies. If a virtual world is an adequate description of a real economy, then a good economic theory ought to be able to predict outcomes in that setting.

An artificial environment offers enormous advantages over the field and laboratory: complete control – for example, over risk aversion and social preferences – and great speed in creating economies and validating theories. In economics, the use of virtual economies can potentially enable us to deal with heterogeneity, with small frictions, and with expectations that are backward looking rather than determined in equilibrium. These are difficult or impractical to combine in existing calibrations or Monte Carlo simulations.

The goal of this project is to build artificial agents by developing computer programs that act like human beings in the laboratory. We focus on the simplest type of problem of interest to economists: simple one-shot, two-player, simultaneous-move games. The best-known of these is the “Prisoner’s Dilemma” – a much-studied scenario in game theory which explores the circumstances of cooperation between two people. In its classic form, the model suggests that two agents who are fully self-interested and rational would always betray each other, even though the best outcome overall would be if they cooperated. However, laboratory humans show a tendency towards cooperation. Our challenge is therefore to develop artificial agents who share this bias to the same degree as their human counterparts.
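As a rough illustration of the classic one-shot game, the sketch below encodes a Prisoner’s Dilemma in Python. The payoff numbers are a common textbook choice and are an assumption here; they do not come from the project’s datasets, and the function names are hypothetical.

```python
# Hypothetical textbook payoffs: (row player, column player), higher is better.
C, D = "cooperate", "defect"
PAYOFFS = {
    (C, C): (3, 3),  # both cooperate
    (C, D): (0, 5),  # row cooperates, column defects
    (D, C): (5, 0),  # row defects, column cooperates
    (D, D): (1, 1),  # both defect
}


def best_response(opponent_action):
    """Row player's payoff-maximising action against a fixed opponent action."""
    return max((C, D), key=lambda action: PAYOFFS[(action, opponent_action)][0])


def is_dominant(action):
    """True if the action is a best response to every opponent action."""
    return all(best_response(opp) == action for opp in (C, D))


def joint_payoff(row_action, col_action):
    """Sum of both players' payoffs for a pair of actions."""
    return sum(PAYOFFS[(row_action, col_action)])


print("defect is dominant:", is_dominant(D))
print("joint payoff, both defect:   ", joint_payoff(D, D))
print("joint payoff, both cooperate:", joint_payoff(C, C))
```

With these payoffs, defecting is the best reply whatever the other player does, yet mutual cooperation yields the higher joint payoff. A purely self-interested agent of this kind never cooperates, which is exactly the behaviour that laboratory subjects often depart from.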

There is a wide variety of existing published data on laboratory behavior that will be the primary testing ground for the computer programs. As the project progresses, the programs will be challenged to see if they adapt themselves to changes in the rules in the same ways as human agents: for example, if payments are changed in a certain way, the computer programs will play differently: do people do the same? In some cases we may be able to answer these questions with data from existing studies; in others we will need to conduct our own experimental studies.

Find the full list of available datasets here

Preregistration in the Social Sciences: A Controversy and Available Resources

James Monogan - February 20, 2013 in Open Economics, WG Economics

This blog post is cross-posted from the Open Economics Blog.

For years now, the practice of preregistering clinical trials has worked to reduce publication bias dramatically (Drummond Rennie offers more details). Trying to build on this trend for transparency, the Open Knowledge Foundation, which runs the Open Economics Working Group, has expressed support for All Trials Registered, All Results Reported. This initiative argues that all clinical trial results should be reported because the spread of this free information will reduce bad treatment decisions in the future and allow others to find missed opportunities for good treatments. The idea of preregistration, therefore, has proved valuable for the medical profession.

In a similar push for openness, a debate now is emerging about the merits of preregistration in the social sciences. Specifically, could social scientific disciplines benefit from investigators’ committing themselves to a research design before the observation of their outcome variable? The winter 2013 issue of Political Analysis takes up this issue with a symposium on research registration, wherein two articles make a case in favor of preregistration, and three responses offer alternate views on this controversy.

There has been a trend for transparency in social research: Many journals now require authors to release public replication data as a condition for publication. Additionally, public funding agencies such as the U.S. National Science Foundation require public release of data as a condition for funding. This push for additional transparency allows for other researchers to conduct secondary analyses that may build on past results and also allows empirical findings to be subjected to scrutiny as new theory, data, and methods emerge. Preregistering a research design is a natural next step in this transparency process as it would allow readers, including other scholars, to gain a sense of how the project was developed and how the researcher made tough design choices.

Another advantage of preregistering a research design is that it can curb the prospects of publication bias. Gerber & Malhotra observe that papers produced in print tend to have a higher rate of positive results in hypothesis tests than should be expected. Registration has the potential to curb publication bias, or at least its negative consequences. Even if committing oneself to a research design does not change the prospect for publishing an article in the traditional format, it would signal to the larger audience that a study was developed and that a publication never emerged. This would allow the scholarly community at large to investigate further, perhaps reanalyze data that were not published in print, and if nothing else get a sense of how preponderant null findings are for commonly-tested hypotheses. Also, if more researchers tie their hands in a registration phase, then there is less room for activities that might push a result over a common significance threshold.

To illustrate how preregistration can be useful, my article in this issue of Political Analysis analyzes the effect of Republican candidates’ position on the immigration issue on their share of the two-party vote in 2010 elections for the U.S. House of Representatives. In this analysis, I hypothesized that Republican candidates may have been able to garner additional electoral support by taking a harsh stand on the issue. I designed my model to estimate the effect on vote share of taking a harsher stand on immigration, holding the propensity of taking a harsh stand constant. This propensity was based on other factors known to shape election outcomes, such as district ideology, incumbency, campaign finances, and previous vote share. I crafted my design before votes were counted in the 2010 election and publicly posted it to the Society for Political Methodology’s website as a way of committing myself to this design.


In the figure, the horizontal axis represents values that the propensity scores for harsh rhetoric could take. The tick marks along the base of the graph indicate actual values in the data of the propensity for harsh rhetoric. The vertical axis represents the expected change in the proportion of the two-party vote for moving from a welcoming position to a hostile position. The figure shows a solid black line, which indicates my estimate of the effect of a Republican’s taking a harsh stand on immigration on his or her proportion of the two-party vote. The two dashed black lines indicate the uncertainty in this estimate of the treatment effect. As can be seen, the estimated effects come with considerable uncertainty, and I can never reject the prospect of a zero effect.

However, a determined researcher could have tried alternate specifications until a discernible result emerged. The figure also shows a red line representing the estimated treatment effect from a simpler model that omits the effect of how liberal or conservative the district is. The dotted red lines represent the uncertainty in this estimate. As can be seen, this model reports a uniform treatment effect of 0.079 that is discernible from zero. After “fishing” with the model specification, a researcher could have manufactured a result suggesting that Republican candidates could boost their share of the vote by 7.9 percentage points by moving from a welcoming to a hostile stand on immigration! Such a result would be misleading because it overlooks district ideology. Whenever investigators commit themselves to a research design, this reduces the prospect of fishing after observing the outcome variable.
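The mechanics of how dropping a control can manufacture an effect can be sketched with a few lines of Python. This is not the propensity-score model from the article: the data are invented, the true effect of a harsh stand is set to zero by construction, and plain least squares stands in for the actual estimator.

```python
def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def residuals(y, z):
    """What is left of y after removing its linear relation to z."""
    b = slope(z, y)
    a = mean(y) - b * mean(z)
    return [yi - (a + b * zi) for yi, zi in zip(y, z)]


# Invented districts: a higher score means a more conservative district.
ideology = [0, 1, 2, 3, 4, 5, 6, 7]
# Candidates in conservative districts take a harsh stand (1) more often.
harsh = [0, 0, 0, 1, 0, 1, 1, 1]
# Vote share depends on ideology only; the stand itself has no effect.
vote_share = [0.40 + 0.02 * z for z in ideology]

# Specification 1: omit district ideology.
naive_effect = slope(harsh, vote_share)

# Specification 2: control for ideology by partialling it out of both
# variables first (equivalent to including it in the regression).
controlled_effect = slope(
    residuals(harsh, ideology), residuals(vote_share, ideology)
)

print(f"effect without ideology control: {naive_effect:.3f}")
print(f"effect with ideology control:    {controlled_effect:.3f}")
```

In this constructed example the specification without ideology reports a sizeable positive “effect”, while the one that controls for ideology reports essentially none. A researcher free to choose between the two after seeing the outcome could report whichever looks better, which is what committing to a design in advance rules out.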

I hope to have illustrated the usefulness of preregistration and hope the idea will spread. Currently, though, there is no comprehensive study registry in the social sciences. However, several proto-registries are available to researchers. All of these registries offer the opportunity for self-registration, wherein the scholar can commit himself or herself to a design as a later signal to readers, reviewers, and editors.

In particular, any researcher from any discipline who is interested in self-registering a study is welcome to take advantage of the Political Science Registered Studies Dataverse. This dataverse is a fully-automated resource that allows researchers to upload design information, pre-outcome data, and any preliminary code. Uploaded designs will be publicized via a variety of free media. List members are welcome to subscribe to any of these announcement services, which are linked in the header of the dataverse page.

Besides this automated system, there are also a few other proto-registries of note:

  • EGAP: The Experiments in Governance and Politics website has a registration tool that now accepts and posts detailed preanalysis plans. In instances when designs are sensitive, EGAP offers the service of accepting and archiving sensitive plans with an agreed trigger for posting them publicly.

  • J-PAL: The Abdul Latif Jameel Poverty Action Lab has been hosting a hypothesis registry since 2009. This registry is for pre-analysis plans of researchers working on randomized controlled trials, which may be submitted before data analysis begins.

  • The American Political Science Association’s Experimental Research Section hosts a registry for experiments at its website. (Please note, however, that the website currently is down for maintenance.)

Open Research Data Handbook Sprint

Velichka Dimitrova - February 15, 2013 in Open Access, Open Content, Open Data, Open Economics, Open Science, Open Standards, Our Work, WG Economics

On February 15-16 we are updating the Open Research Data Handbook to include more detail on sharing research data from scientific work, and to remix the book for different disciplines and settings. We’re doing this through an open book sprint. The sprint will happen at the Open Data Institute, 65 Clifton Street, London EC2A 4JE.

The Friday lunch seminar will be streamed through the Open Economics Bambuser channel. If you would like to participate, please see the Online Participation Hub for links to documents and programme updates. You can follow this event at the IRC channel #okfn-rbook and follow on twitter with hashtags #openresearch and #okfnrbook.

The Open Research Data Handbook aims to provide an introduction to the processes, tools and other areas that researchers need to consider to make their research data openly available.

Join us for a book sprint to develop the current draft, and explore ways to remix it for different disciplines and contexts.

Who it is for:

  • Researchers interested in carrying out their work in more open ways
  • Experts on sharing research and research data
  • Writers and copy editors
  • Web developers and designers to help present the handbook online
  • Anyone else interested in taking part in an intense and collaborative weekend of action

What will happen:

The main sprint will take place on Friday and Saturday. After initial discussions we’ll divide into open space groups to focus on research, writing and editing for different chapters of the handbook, developing a range of content including How To guidance, stories of impact, collections of links and decision tools.

A group will also look at digital tools for presenting the handbook online, including ways to easily tag content for different audiences and remix the guide for different contexts.


Where: 65 Clifton Street, EC2A 4JE (3rd floor – the Open Data Institute)

Friday, February 15th

  • 13:00 – 13:30: Arrival and sushi lunch
  • 13:30 – 14:30: Open research data seminar with Steven Hill, Head of Open Data Dialogue at RCUK.
  • 14:30 – 17:30: Working in teams

Saturday, February 16th

  • 10:00 – 10:30: Arrival and coffee
  • 10:30 – 11:30: Introducing open research lightning talks (your space to present your project on research data)
  • 11:30 – 13:30: Working in teams
  • 13:30 – 14:30: Lunch
  • 14:30 – 17:30: Working in teams
  • 17:30 – 18:30: Reporting back

As many have already registered for online participation, we will broadcast the lunch seminar through the Open Economics Bambuser channel. Please drop by the IRC channel #okfn-rbook.


OKF Open Science Working Group – creators of the current Open Research Data Handbook
OKF Open Economics Working Group – exploring economics aspects of open research
Open Data Research Network – exploring a remix of the handbook to support open social science research in a new global research network, focussed on research in the Global South.
Open Data Institute – hosting the event

Dutch PhD-workshop on research design, open access and open data

Velichka Dimitrova - February 1, 2013 in Open Access, Open Economics, Open Standards

This blog post is written by Esther Hoorn, Copyright Librarian, University of Groningen, the Netherlands. It is cross-posted from the Open Economics Blog.

If Roald Dahl were still alive, he would certainly be tempted to write a book about the Dutch social psychologist Diederik Stapel. For not only did he make up the research data to support his conclusions, but he also ate all the M&M’s, which he bought with public money for interviews with fictitious pupils in fictitious high schools. In the Netherlands the research fraud by Stapel was a catalyst to bring attention to the issue of research integrity and availability of research data. A new generation of researchers needs to be aware of the policy on sharing research data by the Dutch research funder NWO, the EU policy and the services of DANS, the Dutch Data Archiving and Networked Services. In the near future, a data management plan will be required in every research proposal.


For some time now the library at the University of Groningen has been organizing workshops for PhDs to raise awareness of the shift towards Open Access. Open Access and copyright are the main themes. The request to also address the verifiability of research data came from SOM, the Research Institute of the Faculty of Economics and Business. The workshop is given as part of the course Research Design of the PhD program. The blogpost Research data management in economic journals proved to be very useful to get an overview of the related issues in this field.

Open Access

As we often see, Open Access was a new issue to most of the students. Because the library buys licenses, the students don’t perceive a problem with access to research journals. Moreover, they are not aware of the big sums that the universities at present pay to finance access exclusively for their own staff and students. Once they understand the issue, there is a strong interest. Some see a parallel with innovative distribution models for music. The PhDs come from all over the world, and Open Access is being addressed in more and more countries. One PhD from Indonesia mentioned that the Indonesian government requires his dissertation to be available through the national Open Access repository. Chinese students were surprised by the availability of information on Open Access in China.


The students prepared an assignment with some questions on Open Access and sharing research data. The first question still is on the impact factor of the journals in which they intend to publish. The questions brought the discussion to article level metrics and alternative ways to organize the peer review of Open Access journals.

Will availability of research data stimulate open access?

Example of the Open Access journal Economics

The blogpost Research data management in economic journals presents the results of the German project EDaWaX, European Data Watch Extended. An important result of the survey points to the role of association and university presses. In particular, it appears that many journals followed the data availability policy of the American Economic Association.

“We found out that mainly university or association presses have high to very high percentages of journals owning data availability policies while the major scientific publishers stayed below 20%.

“Out of the 29 journals with data availability policies, 10 used initially the data availability policy implemented by the American Economic Review (AER). These journals either used exactly the same policy or a slightly modified version.”

For students it is reassuring to see how associations take up their role to address this issue. An example of an Open Access journal that adopted the AER policy is Economics. And yes, this journal does have an impact factor in the Social Science Citation Index and also offers the possibility to archive the datasets in the Dataverse Network.

Re-use of research data for peer review

One of the students suggested that the public availability of research data (instead of merely research findings) may lead to innovative forms of review. This may facilitate a further shift towards Open Access. With access to underlying research data and methodologies used, scientists may be in a better position to evaluate the quality of the research conducted by peers. The typical quality label given by top and very good journals may then become less relevant over time. It was also discussed that journals may not publish a certain number of papers in a volume released e.g. four times a year, but rather as qualifying papers become available for publication throughout the year. Another point raised was that a substantial change in the existing publication mechanics will likely require either top journals or top business schools to lead the way, whereas associations of leading scientists in a certain field may also play an important role in such a conversion.
