This post is by Chris Taggart, the co-founder and CEO of OpenCorporates, the largest open database of companies in the world, and a member of the Open Government working group.

[Disclosure: I am on the UK Tax Transparency Board, which has not yet discussed these proposals, but will be doing so at the next meeting in early September]

A little over a week ago, Her Majesty’s Revenue & Customs (HMRC) published a consultation on publishing its data more widely, and in it stated its intention to join the open-data movement.

The UK helped secure the G8’s Open Data Charter, which presumes that the data held by Governments will be publicly available unless there is good reason to withhold it. It is important that HMRC plays a full part. HMRC’s relationship with businesses and individuals is unique, and this is reflected in the scope and depth of the information HMRC collects, creates and protects on behalf of taxpayers.

Great. Well, no.

The problem is that, despite what the above says, this consultation and the proposals within have little to do with open data or widening access, but instead are primarily about passing data, much of it personal data relating to ordinary individuals, to the anointed few. It also exposes some worrying data-related problems within HMRC that should be ringing alarm bells within government.

So what exactly is being suggested? There are two parts:

  1. Proposals to do with sharing HMRC’s data, particularly aggregated and anonymised data. At the moment HMRC can, in general, only share such data if doing so relates to HMRC’s functions, even where sharing would be of wider public benefit.
  2. Proposals to do with the VAT Register. The VAT Register is currently private, even though much of the information is ‘out there’, on till receipts, on invoices, on websites, and in various private datasets, and in fact in many countries it’s already public.

Both have their issues, but for the moment we’ll concentrate on the second.

Now there has been no great clamour for the VAT Register from open-data activists (unlike say the postcode address file, company register, or Ordnance Survey data), so why is it being opened up? Well, why not? As the consultation says:

An underlying principle in developing the proposals in this chapter is brought out in the Shakespeare Review. Data belong to citizens and the presumption of government should be towards openness, unless this causes harm. It is not for government to dictate the nature of the opportunity. The corollary is that the Government will not always be aware of the range or scale of potential benefits, as the quotation below shows – this consultation will help to establish these.

So the proposal is to publish the VAT Register as open data, so that the wider community can do cool stuff with it? No. The consultation neatly elides this lofty aim with something rather more grubby.

There has been public interest for some time, for example from credit reference agencies (CRAs), in the publication of VAT registration data as a resource to generate benefits.

Don’t the three big credit reference agencies (Experian, Equifax and Callcredit) already know a lot about companies? Surely they know the VAT numbers of many of them, and in any case know a lot more about most companies, especially active, trading companies (the sort that are registered for VAT)?

What they don’t have, however, is much information about sole-traders, small partnerships, individuals trading on their own account and without the shield of limited liability, with the responsibilities for publishing information that comes with that. That’s why the VAT register is so important to them, and that’s what this consultation is proposing to give them.

Of course they could just ask people for that information. But people might refuse, particularly if they don’t need to borrow money, and that would be a problem as far as building a monetisable dataset of them is concerned. If they could only get the government to give them access to that data – have the government act as their own data-collection arm, with the force of law to compel the provision of the information – that would be great. For them. For individuals, and for the wider world, it’s not good at all.

First, because what we’re talking about here are individuals, who have privacy and data protection rights, not companies, and there need to be compelling reasons for making that information public in the first place – just because the big three CRAs (Experian, Equifax and Callcredit) think they can make money from it isn’t good enough.

Second, because if open data is about one thing, it is about democratising access to data, about reversing the traditional position where, to use the words of the Chancellor, George Osborne, “Access to the world’s information – and the ability to communicate it – was controlled by an elite few”. And if there’s one thing that’s certain it’s that the CRAs have a lot of power.

But wait, doesn’t the consultation also propose that some of the VAT register is published as open data, specifically “a very selective extract covering just three data fields – VAT registration number (VRN), trading name, and Standard Industry Code (SIC) classification number”?

At first sight this might be seen as good, or better than nothing. In fact it shows that HMRC either doesn’t get data, or it’s just ‘openwash’ – an open-data figleaf to obscure the passing of personal and private data wholesale to the CRAs, and one that could potentially lead to greater fraud. Here’s why:

  • The three fields (VAT number, trading name, SIC code) together make up an orphan dataset, i.e. one that’s unconnected with any other data, and therefore is fundamentally useless… unless you want to fraudulently write an invoice calling yourself ‘AAA Plumbing’, charging VAT on it, and pocketing the 20%, knowing that either you will never be caught, or the real AAA Plumbing will be the first place HMRC will come looking.
    Fraud is fundamentally about asymmetries of information flows (the fraudster knows more about you than you know about them). If, for example, you know that the real AAA Plumbing is a company with a registered address in Kirkcaldy, Scotland, or that BBB Services is dissolved or has a website showing it works in the aircraft business, then you have a much greater chance of avoiding fraud.
  • Trading names are very problematic, and in general are not registered anywhere, so are little help. They also need have no relationship to the legal name, either of the person or the company. So if you want to find the company behind ZZZ Financial Experts, if indeed there is one, you’re out of luck. It’s puzzling that HMRC would even consider publishing the VAT Register without the legal form, and in the case of companies the company number.
  • One of the stated reasons for publishing the register is that “VAT registration data could also provide a foundation for private sector business registers”. Really? In this world of open data and the importance of core reference data, HMRC wants a private, proprietary identifier set to be created, with all the problems that it would entail? In fact, HMRC was supposed to be working with the Department for Business, Innovation & Skills to build such a public dataset. Has it decided that it doesn’t understand data well enough to do this? Or would it rather shackle not just the government but the business sector as a whole to some such dataset?
  • Finally, it’s also rather surprising to discover that the VAT register appears to contain fields such as the company’s incorporation date and SIC codes. In the geek world we call this a denormalised dataset, meaning it’s duplicating data that rightfully belongs in another table or dataset. There are sometimes good reasons for doing this, but there are risks, such as the data becoming out of sync (which is the correct SIC code – the one on the VAT Register or the one on the Companies House record?).
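
For the geeks, here is a minimal Python sketch of the first and last points above. Everything in it is an assumption made for illustration: the record layouts, the field names, the VAT and company numbers and the SIC codes are invented, and none of it reflects HMRC’s or Companies House’s actual schemas. It shows that a three-field extract with no company number cannot be joined to a company register at all, while a keyed extract can be linked, and that a duplicated SIC field can then be checked for drift.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VatRow:
    vrn: str                               # VAT registration number
    trading_name: str
    sic_code: str                          # duplicated (denormalised) field
    company_number: Optional[str] = None   # missing from the proposed extract


@dataclass(frozen=True)
class Company:
    company_number: str
    legal_name: str
    status: str                            # e.g. "active" or "dissolved"
    sic_code: str


def link(vat_rows, companies):
    """Pair each VAT row with its company record, or None if it has no usable key."""
    by_number = {c.company_number: c for c in companies}
    return [(row, by_number.get(row.company_number)) for row in vat_rows]


def sic_mismatches(linked):
    """Return (vrn, VAT SIC, register SIC) wherever the two copies disagree."""
    return [
        (row.vrn, row.sic_code, company.sic_code)
        for row, company in linked
        if company is not None and row.sic_code != company.sic_code
    ]


# Invented sample data, for illustration only.
companies = [
    Company("SC000001", "AAA Plumbing Ltd", "active", "43220"),
    Company("00000002", "BBB Services Ltd", "dissolved", "30300"),
]
orphan_rows = [   # the three-field extract: nothing to join on
    VatRow("GB000000001", "AAA Plumbing", "43220"),
    VatRow("GB000000002", "BBB Services", "82990"),
]
keyed_rows = [    # the same rows with a company number attached
    VatRow("GB000000001", "AAA Plumbing", "43220", "SC000001"),
    VatRow("GB000000002", "BBB Services", "82990", "00000002"),
]

orphan_linked = link(orphan_rows, companies)
keyed_linked = link(keyed_rows, companies)
```

With the orphan rows, every lookup comes back empty, so a reader learns nothing beyond the trading name. With the keyed rows, the same lookup reveals that the made-up BBB Services is dissolved, and `sic_mismatches(keyed_linked)` flags the row whose two SIC codes have drifted apart.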

So what should HMRC be doing? First, it should abandon any plans to act as the Credit Reference Agencies’ data collectors, and publish the VAT register or part of the VAT register as a single open dataset, available to all under the same terms. This would be a genuine spur for innovation, and may even result in increased competition and transparency.

Second, it should realise that there’s a fundamental difference between an individual – a living, breathing person with human rights – and a company. As well as human rights, individuals have data protection rights, privacy rights and don’t exist on a public register; companies on the other hand are artificial entities given a distinct legal personality by the state for the good of society, and in return exist in public (on the public Register of Companies). In the case of the VAT register, the pragmatic approach would be to publish the register as open data, but only that part that relates to companies.
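
As a rough sketch of what that pragmatic approach could look like in practice, the Python below filters a register down to the rows that relate to companies before anything is published. It assumes the register records a legal form and, for companies, a company number; the `legal_form` labels, field names and sample rows are all hypothetical and are not taken from any real HMRC schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for entities that already sit on a public company register.
COMPANY_FORMS = {"limited company", "plc"}


@dataclass(frozen=True)
class Registration:
    vrn: str                               # VAT registration number
    legal_form: str
    company_number: Optional[str] = None   # only companies have one


def publishable(rows):
    """Keep only rows for companies; individuals and partnerships are withheld."""
    return [
        r for r in rows
        if r.legal_form in COMPANY_FORMS and r.company_number is not None
    ]


# Invented sample data, for illustration only.
register = [
    Registration("GB000000001", "limited company", "SC000001"),
    Registration("GB000000003", "sole trader"),
    Registration("GB000000004", "partnership"),
    Registration("GB000000005", "plc", "00000005"),
]
open_extract = publishable(register)
```

Requiring both a company legal form and a company number is a deliberately cautious choice in this sketch: a row that claims to be a company but carries no number is held back rather than published.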

Third, it needs to realise that it is fundamentally in the data business, like it or not, and it needs to quickly get to grips with the modern data world, including the power of data, for good, and for bad. The UK has probably the leading organisations in the world in this area, including OpenCorporates, the Open Knowledge Foundation and the Open Data Institute.
