Some misconceptions about data journalism

This blog originally appeared on Medium and is reposted with permission.

In the past few years, a new discipline in journalism has slowly been gaining followers: a discipline commonly known as ‘data journalism’. These so-called ‘data journalists’ are usually envisioned as the younger, tech-savvy journalists, the ones who are not afraid to analyse data, who understand how computer code works and who simply love colourful and detailed visualisations.

On the other end of the scale are the non-data-journalists. We usually imagine them still using a phone and a Rolodex because they simply don’t get email, and the last technological leap they made was when mechanical typewriters were replaced by computerised word processors.

Moving away from these simplistic (even stereotypical) dichotomies towards a better understanding of what a data journalist actually looks like will do justice to the hard-working data journalists out there, and will also take this movement forward and make it more open and inclusive.

The Python vs. Rolodex dilemma

Let’s begin with the ground truth about the journalism trade: Journalism is all about telling a story, and the best stories are ones that revolve around humans, not numbers.

This basic fact was true a hundred years ago, and is not about to change — even if technology does. For this reason, the best journalists will always be the masters of words; those who have the best understanding of people and what makes them tick. It is the unfortunate truth that the benefit of knowing how to work with data will always come after that.

Don’t get me wrong, there’s certainly a place for all the ‘visualisation-oriented journalists’ (or ‘visi-journalists’). That’s because sometimes the data is the story. Sometimes, the fact that some new data is available to the public is newsworthy. Sometimes, some hard-to-find, hidden links in a large dataset are the scoop. Sometimes, a subject is so technical and complex that a super-interactive visualisation is the only way to actually explain it. But most times, this is not the case.

So on one end of the spectrum we have that old-school journalist with her Rolodex, holding a precious network of high-ranking sources. On the other extreme is a journalist who also codes and wrangles data, trying to find a corruption case by sifting through publicly available data using a custom-made Python script. But in between these two extremes lies a vast range of hard-working journalists, reporting on the day-to-day happenings in politics, the economy, foreign affairs and domestic issues. These journalists don’t have any sources in high places, and have never heard of Python.

Yet, this majority of journalists is mostly ignored by the data journalism movement — which is a shame, as these are the ones most likely to benefit from it and advance it the most.

A website is not a source

Flashback to five years ago: I’m one of the few founding volunteers of an open-data NGO in Israel, “The Public Knowledge Workshop”. One of our first projects was called “The Open Budget”, a website that took the publicly available (but hard-to-understand) national budget data and presented it in a feature-rich, user-friendly interface.

At that time, we tried to meet with as many journalists as we could to tell them about the new budget website — and not many would spare an hour of their busy schedules for some geeks from an unknown NGO. We would show them how easy it was to find information and visualise it in an instant. Then we would ask them whether they might consider using our website by themselves for their work.

A common answer, which took me quite by surprise, always went along the lines of “That is very nice indeed, but I don’t need your website as I have my sources in the Ministry of Finance and they get me any data I need”. The fact that the data was lying there, within a mouse-click’s reach, and they still wouldn’t use it simply baffled me. It took me some time to understand why it made perfect sense.

Nevertheless, we would offer ourselves to these journalists as domain experts in understanding and analysing government data (or even in knowing where to find that data), and as volunteer ‘data wranglers’. In theory, it was supposed to be a mutually beneficial relationship: they needed help with getting the right data into their stories, and we were a young NGO, hungry for some media spotlight. In practice, this situation resulted in too many articles where we would do the work but would not be credited for it. Journalists would ask for some budget-related data to be analysed for an article with a tight deadline. We would do our part, only to find the data attributed in the printed paper to the Ministry of Finance. As annoying as it was, they would always claim that they could not give us credit as “No one knows who you are. We need someone with some credibility”…

Getting an answer is a human thing

So what is the reason, really, that journalists will not use an official government open-data website to get data and check facts?

I remember one time a journalist calling me with a very simple question:

– ‘Can you tell me the total size of this year’s national budget?’

– ‘Sure, but did you try our website? It’s the one single big number right there on the homepage.’

– ‘Umm… there are a few other numbers there. Can you please copy-paste the correct one and send it to me in an email?’

And so I did.

Was that reporter lazy? Perhaps. But it wasn’t just that. As it turns out, it’s not just a matter of credibility — it’s also a matter of attribution. Journalistic reporting is a delicate art of telling a narrative using only “facts”, not the journalist’s own personal opinions. Journalistic facts (which may be just someone else’s opinion) need to always be attributed to someone, be it a person or an organisation.

So you’d get statements like this: ‘according to this NGO, spending on health in the national budget is 20%’. This sort of wording leaves room for other parties to claim the analysis was wrong and the actual number is different. It keeps journalists free from biases, and from accusations of such biases, while still promoting a specific world view.

The only catch is that this only works if they are solely reporting these interpretations — not making them.

Getting the right answer is also a human thing

As time passed and the number of journalists seeking our help steadily grew, a new understanding slowly emerged. We were no longer just the geeks with the best budget data in town; we had also become the geeks who knew the most about the intricacies of the budgeting cycle, tenders and procurement processes.

All of a sudden we were able to answer vaguer questions from journalists. Take this question as an example: “how much money is a specific company getting from the government?”. To answer that, you first need to know what options there are to ‘get money from the government’ (there are at least three or four). Then you need to know how to query the data correctly to find the actual data rows that answer the question. You might find that a single company is in fact more than one legal entity. You could discover that it goes by different names in different data sources. Some data sources might contain data that partly overlaps. And after all that work you still need to produce an answer that is (most likely) correct and that you can wholeheartedly stand behind.
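To make those pitfalls concrete, here is a minimal Python sketch of the kind of wrangling involved. Everything in it is made up for illustration: the company names, the alias table, the three data sources and the amounts are hypothetical, and it assumes that overlapping sources share a payment reference id, which real data often does not offer so neatly.

```python
from collections import defaultdict

# Hypothetical alias table: every spelling or legal entity that an
# expert has confirmed belongs to the same real-world company.
ALIASES = {
    "acme ltd": "ACME",
    "acme ltd.": "ACME",
    "acme holdings (1998)": "ACME",
    "beta systems": "BETA",
}

# Hypothetical rows from three ways of 'getting money from the government'.
# Each row: (source, payment reference, recipient as written, amount)
PAYMENTS = [
    ("procurement", "PO-001", "Acme Ltd", 120_000),
    ("procurement", "PO-002", "ACME Holdings (1998)", 80_000),
    ("grants", "GR-417", "Acme  Ltd.", 50_000),
    ("tenders", "PO-001", "Acme Ltd", 120_000),  # same payment, second source
    ("grants", "GR-500", "Beta Systems", 30_000),
    ("grants", "GR-501", "Acme Intl", 10_000),   # unknown spelling
]


def canonical(name):
    """Map a recipient name to a company key, or None if unknown."""
    normalised = " ".join(name.lower().split())
    return ALIASES.get(normalised)


def total_by_company(rows):
    """Sum payments per company, counting each payment reference once.

    Rows whose recipient cannot be matched are returned separately so
    that a human can review them instead of silently dropping them.
    """
    seen = set()
    totals = defaultdict(int)
    unmatched = []
    for source, ref, name, amount in rows:
        company = canonical(name)
        if company is None:
            unmatched.append((source, ref, name, amount))
            continue
        if (company, ref) in seen:
            continue  # overlapping sources report the same payment
        seen.add((company, ref))
        totals[company] += amount
    return dict(totals), unmatched


totals, unmatched = total_by_company(PAYMENTS)
print(totals)
print(unmatched)
```

The code itself is the easy part. The alias table and the rule for spotting duplicates are where the expertise lives, and the list of unmatched rows is a reminder that someone still has to decide whether ‘Acme Intl’ is the same company before the number can be published.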

Getting to such a level of expertise is not something that happens in a day. This is another reason why open-data portals are simply not that useful for journalists. Even if the journalist has a clue as to which dataset contains an answer to her question (which is rarely the case, and it is rarer still that a single dataset holds the answer), it’s not enough to see the data: you need to make sense of it. You need to understand the context. You need to know what it really means, and for that, you need an expert.

When Open Data takes the Lead

With deep knowledge of the data come interesting findings. Most are standard cases of negligence with public funds. Some are interesting insights regarding money flows that are only visible when analysing the ‘big picture’. Only rarely do you find small acts of corruption. We believed that each of these findings was newsworthy, and we would try to find journalists who might take our leads and develop them into a complete story.

But hard as we tried, our efforts were in vain: none of the methods we tried seemed to work. We tweeted our findings, wrote about them on our blog and pushed them hard through Facebook. We even got a Telegram bot pushing algorithmically detected suspicious procurements in real time! But journalists were not impressed.

In other instances, we managed to get a specific journalist interested in a story. The only problem was that sometimes they would hold on to that piece of information for weeks without doing anything with it until it became irrelevant, and we would lose our chance to use it anywhere else.

At that point we decided to get some help from an expert, and hired a PR manager to help us get the message across. Seeing him work with journalists left me in awe: his ability to match the right story to the right person, and to ensure that we were always credited properly and that stories were written promptly, was something we’d never seen. And the best part was how he leveraged his many connections to make journalists come to us for the next story instead of the other way round.

But he also made us change our ways a little bit, as good leads needed to be kept secret until a good match was found. Exclusivity and patience bought us larger media coverage and a wider reach, but at the price of compromising on our open-data and transparency ideologies.

Data is a Source

Back to present day.

We still meet journalists on a regular basis, and although it’s now easier to get their attention, most of them still start our meetings with a sceptical approach. They look as if they wonder ‘what are they trying to sell me?’ and ‘how on earth could these geeks have anything to do with my work?’.

But then we start talking. First we tell them about our different projects and areas of expertise, and then the conversation flows to what they’re interested in: what ideas are they trying to promote? Which big projects have they always dreamt of doing but never had the data for? They tell us about all their attempts to get data from the government through FOIA requests that ended in hitting brick walls.

That’s usually the point where I take out my laptop. They seem baffled when I start typing a few SQL commands in my terminal, and utterly surprised when after two or three minutes I present them with a graph of what they were looking for. “Wow, I didn’t know it was even possible… and all of that just from data that’s out there?” they say, with a smile and a new sparkle in their eyes. And that’s when I know: a new data journalist has been born.

Every once in a while, a beautifully interactive data visualisation project is published by one of the media outlets. Everybody applauds the “innovative use of the medium” and the “fine example of data-journalism” — and I’m also impressed! — but to me this is simply forgetting all these other journalists who made that leap into the world of data.

These journalists understand that leads come not just from sources in the government, but also from algorithms analysing CSV files. They cautiously learn to link to the government data portals as proof for their claims. They take data and make it a part of their story.

These are the true heroes of the data-journalism revolution. And the motto of this revolution cannot be ‘Visualise More!’ or ‘Use Big Data!’ — it must be: ‘Data is a Source’.

Thanks to Paul Walsh for the encouragement and to Nir Hirshman for being that awesome PR guy…

Adam Kariv


Adam is an experienced technologist and open data activist, with over 25 years of experience in a wide range of areas, from embedded systems, through mobile applications, scalable services and big data analysis, to UI design. He currently works for Open Knowledge Foundation as Senior Developer and OpenSpending Technical Lead.
