The Data Journalism Handbook is a free, open source reference book for anyone interested in the emerging field of data journalism. It is the result of an international, collaborative effort involving dozens of data journalism's leading advocates and best practitioners – including from the BBC, the Chicago Tribune, the Guardian, the Financial Times, the New York Times, the Washington Post and many others.
The book will be made freely available online under a Creative Commons Attribution-ShareAlike license so anyone can read, copy, share, redistribute and reuse it. Additionally, a printed version and an e-book will be published by O’Reilly Media. If you want to be notified when the book is released, you can sign up on the website.
The handbook will be released this Saturday at the International Journalism Festival in Perugia.
Here is an excerpt from the book where leading data journalism practitioners, advocates, and enthusiasts tell us…
Why is Data Journalism Important?
Filtering the Flow of Data
When information was scarce, most of our efforts were devoted to hunting and gathering. Now that information is abundant, processing is more important. We process at two levels: (1) analysis to bring sense and structure out of the never-ending flow of data and (2) presentation to get what's important and relevant into the consumer's head. Like science, data journalism discloses its methods and presents its findings in a way that can be verified by replication.
Philip Meyer (Professor Emeritus: University of North Carolina at Chapel Hill)
New Approaches to Storytelling
Data journalism is an umbrella term that, to my mind, encompasses an ever-growing set of tools, techniques and approaches to storytelling. It can include everything from traditional computer-assisted reporting (using data as a 'source') to the most cutting-edge data visualisation and news applications. The unifying goal is a journalistic one: providing information and analysis to help inform us all about important issues of the day.
Aron Pilhofer (New York Times)
Like Photo Journalism with a Laptop
'Data journalism' only differs from 'words journalism' in that we use a different kit. We all sniff out, report, and relate stories for a living. It's like 'photo journalism' – just swap the camera for a laptop.
Brian Boyer (Chicago Tribune)
Data Journalism is the Future
Data-driven journalism is the future. Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you'll do it that way sometimes. But now it's also going to be about poring over data and equipping yourself with the tools to analyse it and picking out what's interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what's going on in the country.
Tim Berners-Lee (Founder of the World Wide Web)
Number-Crunching Meets Word-Smithing
Data journalism is bridging the gap between stat technicians and wordsmiths, locating outliers and identifying trends that are not just statistically significant but also relevant to de-compiling the inherently complex world of today.
David Anderton (Freelance Journalist)
Updating Your Skill Set
Data journalism is a new set of skills for searching, understanding and visualising digital sources at a time when basic skills from traditional journalism just aren't enough. It's not a replacement for traditional journalism, but an addition to it.
At a time when sources are going digital, journalists can and have to be closer to those sources. The Internet opened up possibilities beyond our current understanding. Data journalism is just the beginning of evolving our past practices to adapt to the online world.
Data journalism serves two important purposes for news organisations: finding unique stories (not from news wires) and executing the watchdog function. Especially in times of financial peril, these are important goals for newspapers to achieve.
From the standpoint of a regional newspaper, data journalism is crucial. We have the saying 'a loose tile in front of your door is considered more important than a riot in a far-away country'. It hits you in the face and impacts your life more directly. At the same time, digitisation is everywhere. Because local newspapers have this direct impact in their neighbourhood and sources become digitised, a journalist must know how to find, analyse and visualise a story from data.
Jerry Vermanen (NU.nl)
A Remedy for Information Asymmetry
Information asymmetry – not the lack of information, but the inability to take in and process it at the speed and volume at which it comes to us – is one of the most significant problems that citizens face in making choices about how to live their lives. Information taken in from print, visual and audio media influences citizens' choices and actions. Good data journalism helps to combat information asymmetry.
Tom Fries (Bertelsmann Foundation)
An Answer to Data-driven PR
The availability of measurement tools and their decreasing prices, in a self-sustaining combination with a focus on performance and efficiency in all aspects of society, have led decision-makers to quantify the progress of their policies, monitor trends and identify opportunities.
Companies keep coming up with new metrics showing how well they perform. Politicians love to brag about reductions in unemployment numbers and increases in GDP. The lack of journalistic insight in the Enron, WorldCom, Madoff or Solyndra affairs is proof of many a journalist's inability to clearly see through numbers. Figures are more likely to be taken at face value than other facts as they carry an aura of seriousness, even when they are entirely fabricated.
Fluency with data will help journalists sharpen their critical sense when faced with numbers and will hopefully help them gain back some terrain in their exchanges with PR departments.
Nicolas Kayser-Bril (Journalism ++)
To Provide Independent Interpretations of Official Information
After the devastating earthquake and subsequent Fukushima nuclear plant disaster in 2011, the importance of data journalism has been driven home to media people in Japan, a country which is generally lagging behind in digital journalism.
We were at a loss when the government and experts had no credible data about the damage. When officials hid SPEEDI data (predicted diffusion of radioactive materials) from the public, we were not prepared to decode it even if it were leaked. Volunteers began to collect radiation data by using their own devices but we were not armed with the knowledge of statistics, interpolation, visualisation and so on. Journalists need to have access to raw data, and to learn not to rely on official interpretations of it.
Isao Matsunami (Tokyo Shimbun)
Dealing with the Data Deluge
The challenges and opportunities presented by the digital revolution continue to disrupt journalism. In an age of information abundance, journalists and citizens alike need better tools, whether we're curating the samizdat of the 21st century in the Middle East, processing a late night data dump, or looking for the best way to visualise water quality for a nation of consumers. As we grapple with the consumption challenges presented by this deluge of data, new publishing platforms are also empowering everyone to gather and share data digitally, turning it into information. While reporters and editors have been the traditional vectors for information gathering and dissemination, the flattened information environment of 2012 now has news breaking first online, not on the news desk.
Around the globe, in fact, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context and clarity and, perhaps most important, to find truth in the expanding amount of digital content in the world. That doesn't mean that the integrated media organisations of today don't play a crucial role. Far from it. In the information age, journalists are needed more than ever to curate, verify, analyse and synthesise the wash of data. In that context, data journalism has profound importance for society.
Today, making sense of big data, particularly unstructured data, will be a central goal for data scientists around the world, whether they work in newsrooms, on Wall Street or in Silicon Valley. Notably, that goal will be substantially enabled by a growing set of common tools, whether they're employed by government technologists opening Chicago, healthcare technologists or newsroom developers.
Alex Howard (O'Reilly Media)
Our Lives are Data
Good data journalism is hard, because good journalism is hard. It means figuring out how to get the data, how to understand it, and how to find the story. Sometimes there are dead ends, and sometimes there's no great story. After all, if it were just a matter of pressing the right button, it wouldn't be journalism. But that's what makes it worthwhile, and – in a world where our lives are increasingly data – essential for a free and fair society.
Chris Taggart (OpenCorporates)
A Way to Save Time
Journalists don't have time to waste transcribing things by hand and messing around trying to get data out of PDFs, so learning a little bit of code, or knowing where to look for people who can help, is incredibly valuable.
One reporter from Folha de São Paulo was working with the local budget and called me to thank us for putting up the accounts of the municipality of São Paulo online (two days' work from a single hacker!). He said he had been transcribing them by hand for the past three months, trying to build up a story. I also remember solving a 'PDF issue' for 'Contas Abertas', a parliamentary monitoring news organisation: 15 minutes and 15 lines of code solved a month's worth of work.
Pedro Markun (Transparência Hacker)
An Essential Part of the Journalists' Toolkit
I think it's important to stress the 'journalism' or reporting aspect of 'data journalism'. The exercise should not be about just analysing or visualising data for the sake of it, but about using it as a tool to get closer to the truth of what is going on in the world. I see the ability to analyse and interpret data as an essential part of today's journalists' toolkit, rather than a separate discipline. Ultimately, it is all about good reporting, and telling stories in the most appropriate way.
Data journalism is another way to scrutinise the world and hold the powers that be to account. With an increasing amount of data available, now more than ever it is important that journalists are aware of data journalism techniques. This should be a tool in the toolkit of any journalist: whether learning how to work with data directly, or collaborating with someone who can.
Its real power is in helping you to obtain information that would otherwise be very difficult to find or to prove. A good example of this is Steve Doig's story that analysed damage patterns from Hurricane Andrew. He joined two different datasets: one mapping the level of destruction caused by the hurricane and one showing wind speeds. This allowed him to pinpoint areas where weakened building codes and poor construction practices contributed to the impact of the disaster. He won a Pulitzer Prize for the story in 1993 and it's a great inspiration for what is possible.
Ideally you use the data to pinpoint outliers, areas of interest, or things which are surprising. In this sense data can act as a lead or a tip off. While numbers can be interesting, just writing about the data is not enough. You still need to do the reporting to explain what it means.
Cynthia O'Murchu (Financial Times)
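The approach O'Murchu describes, joining two datasets and then looking for outliers that deserve further reporting, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the area names and every figure below are made up, the function names join_on_area and flag_outliers are hypothetical, and the damage-per-mph ratio is a simple stand-in, not the method Doig actually used.

```python
from statistics import mean, stdev

# Hypothetical, made-up figures for illustration only (not real hurricane data).
wind_speed_mph = {"Area A": 150, "Area B": 145, "Area C": 120, "Area D": 118, "Area E": 100}
homes_destroyed_pct = {"Area A": 60, "Area B": 58, "Area C": 65, "Area D": 30, "Area E": 12}


def join_on_area(wind, damage):
    """Inner-join two datasets on their shared area key."""
    shared = sorted(wind.keys() & damage.keys())
    return [{"area": a, "wind": wind[a], "damage": damage[a]} for a in shared]


def flag_outliers(rows, threshold=1.0):
    """Return areas whose damage-per-mph ratio sits more than
    `threshold` sample standard deviations above the mean ratio."""
    ratios = [r["damage"] / r["wind"] for r in rows]
    mu, sigma = mean(ratios), stdev(ratios)
    return [r["area"] for r, ratio in zip(rows, ratios) if (ratio - mu) / sigma > threshold]


rows = join_on_area(wind_speed_mph, homes_destroyed_pct)
# Areas with far more damage than their wind speed alone would suggest.
print(flag_outliers(rows))
```

With these invented numbers the sketch flags only "Area C", where damage is high despite moderate wind. A flagged area is a lead, not a finding: the reporting still has to explain why it stands out.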
Adapting to Changes in Our Information Environment
New digital technologies bring new ways of producing and disseminating knowledge in society. Data journalism can be understood as the media's attempt to adapt and respond to the changes in our information environment – including more interactive, multi-dimensional story-telling, enabling readers to explore the sources underlying the news and encouraging them to participate in the process of creating and evaluating stories.
César Viana (University of Goiás)
A Way to See Things You Might Not Otherwise See
Some stories can only be understood and explained through analysing – and sometimes visualising – the data. Connections between powerful people or entities would go unrevealed, deaths caused by drug policies would remain hidden, environmental policies that hurt our landscape would continue unabated. But each of the above was changed because of data that journalists have obtained, analysed and provided to readers. The data can be as simple as a basic spreadsheet or a log of cell phone calls, or as complex as school test scores or hospital infection data, but inside it all are stories worth telling.
Cheryl Phillips (The Seattle Times)
A Way To Tell Richer Stories
We can paint pictures of our entire lives with our digital trails. From what we consume and browse, to where and when we travel, to our musical preferences, our first loves, our children’s milestones, even our last wishes – it all can be tracked, digitised, stored in the cloud and disseminated. This universe of data can be surfaced to tell stories, answer questions and impart an understanding of life in ways that currently surpass even the most rigorous and careful reconstruction of anecdotes.
Sarah Slobin (Wall Street Journal)