Laura and Lucy from the OKFN team recently travelled to India to learn where the challenges and opportunities for open data in India lie. This is part 1 of 5 of the Open Data India Series.

The Bangalore data scene is huge. A bustling IT and data mining industry means that you are never far from the nearest data miner or analyst, and at the Fifth Elephant Conference, the data crowd prowled for the best tips, biggest data and newest discoveries. The Fifth Elephant was our first port of call, and Laura Newman and I were there to conduct workshops on the School of Data and OpenSpending.

The workshops were at capacity and proved a learning experience for both teachers and students, with some really interesting questions being asked. In the School of Data workshop, Laura gave a first taste of what is in store when the School launches this autumn. The workshop ‘challenge’ involved reverse engineering a Guardian article on the World’s Worst Carbon Emitters, which shows India scoring pretty badly if looked at as a whole country, but pretty well if looked at on a per capita basis. After some discussion of techniques for cleaning, manipulating and analysing data, participants were encouraged to find their own stories in data. A few surprises were in store: despite being quite a technical audience, many had little or no experience with spreadsheet programmes and were very interested to learn what they could do with them. At the end of the class, a few even stole away into a corner to carry on experimenting. Extra credit due to these two…
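For readers who prefer code to spreadsheets, here is a minimal sketch of the kind of comparison the challenge was about: ranking the same countries by total emissions and then by emissions per person. The country names and figures are made-up placeholders, not the Guardian's data, and the field names `emissions_mt` and `population_m` are our own invention for illustration.

```python
def rank_emitters(records):
    """Return two lists of country names: ranked by total emissions
    and ranked by emissions per capita, highest first."""
    by_total = sorted(records, key=lambda r: r["emissions_mt"], reverse=True)
    by_per_capita = sorted(
        records,
        key=lambda r: r["emissions_mt"] / r["population_m"],
        reverse=True,
    )
    return (
        [r["country"] for r in by_total],
        [r["country"] for r in by_per_capita],
    )


# Placeholder figures only (million tonnes of CO2, millions of people).
# These are NOT real statistics; swap in the actual dataset before use.
sample = [
    {"country": "Country A", "emissions_mt": 1500.0, "population_m": 1200.0},
    {"country": "Country B", "emissions_mt": 500.0, "population_m": 60.0},
    {"country": "Country C", "emissions_mt": 300.0, "population_m": 20.0},
]

total_ranking, per_capita_ranking = rank_emitters(sample)
print("By total:     ", total_ranking)
print("By per capita:", per_capita_ranking)
```

With these placeholder numbers, the country that tops the total ranking drops to the bottom of the per capita one, which is the same kind of reversal the workshop participants were asked to uncover.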

Next up, the OpenSpending workshop produced a flurry of great questions: “How much does OpenSpending know about your data? Does it have a concept of what is revenue and what is spending?”, “Can you compare real vs planned expenditure?” and “Are there any time-series visualisations?” It also produced a flurry of volunteers wanting to know how they could get involved. We had a discussion on how important it is to show your working, followed by a tutorial on Google Refine for cleaning messy spending data, which appeared well received. One nail-biting live demonstration on a flaky internet connection later, we had a high-level visualisation of the general shape of the Indian Expenditure Budget, showing how much money in 2012 is planned to go on debt servicing vs plan and non-plan expenditure…

See the data we used and how we wrangled it for these visualisations on the Datahub.

Open Data Meets the Datameeters

On from those who mine big data to those who struggle to get access to most of the datasets they need to do their jobs. We came to India with the mission of finding out what the local challenges are in getting, working with, sharing and publishing data, and this first group gave us some great insights into that. We had no idea how many people to expect, but we settled in the courtyard as the first few arrived: a journalist from Citizen Matters magazine, the team behind Babajob.com, who had been running some data analysis on job demand and supply, programmers, designers and data enthusiasts… and then they came in droves! We moved into a meeting room at Jaaga, which was soon packed full.

With more people from a corporate background than at many of the Open Data Meetups we have had here in the UK, big data and data that is key for analysis were hot topics, but then conversation turned to the issues facing open data in India. Here are some thoughts from the group members:

  • Key problems include knowing who to approach to get data. Often, you need to have a personal connection in order to get hold of the relevant data. You also have to tread carefully with data once you have it, so as to preserve relationships for the future.
  • Most people want to collect data themselves rather than trusting ‘second hand’ data collected by the government. (Someone threw out the question to the room, “Which data do you trust more, government or crowdsourced?” The response echoed round the room: “Crowdsourced!”)
  • Too few people actually analyse data. In many cases, once people have got hold of the data they don’t know what to do with it.
  • It is very unclear what the legal/copyright situation is with data that has been obtained through an RTI (Right to Information) request. We heard this refrain of uncertainty over and over again at the various meetups. To the best of the group’s knowledge, no-one had ever been charged for releasing data that was given to them in response to an RTI request. However, anecdotally, one person had been asked to cease analyses on government data, and did stop.

The conclusion of the evening was a discussion around what the key datasets were and what people wanted to see released. The old reliable post-it notes came pouring in and here’s what people wanted:

Government data/ legal

  • Municipality budget data (held by BBMP)
  • Data regarding performance of government schemes (Planning Commission of India)
  • Data about whistleblowers, follow-up action, people involved, data by state
  • Data on where taxes are spent
  • Detailed data about MPs (no more details provided)
  • Macro-economic data
  • Numbers / Ports of entry of refugees / migrants / aliens (given that India shares open borders with two countries)
  • Judgements and orders of lower courts
  • Legislation and amendments

Transport data

  • Railway ticket movement data – are trains really sold out when they say they are? (Indian Railways)
  • Bangalore transit data. Where are the bus stops? Where are the timetables? – questions which are often local knowledge in India and passed on by word of mouth. The ambitious even ask for real time location data for buses.

Water data

  • Data on ground water

Land/Geography

  • Urban land usage
  • Land ownership, sale, transfer and litigation in progress
  • Access to geodata/ shapefiles/ area-based maps
  • Amount of forest cover (from the Forest Survey of India – 2 votes)
  • Infrastructure database

Census Data

  • Data from 2011

Education & Schools

  • Education Department Data, on the level of training held by teachers (MHRD)

Cultural

  • All historical texts owned by the Archaeological Survey of India

Weather data

  • Long term, high resolution, daily climate data in real time

Other

  • Nobel Prize nominations
  • Automobile cost data & how much duty is paid (Excise & Customs)
  • List of blocked websites
  • Anonymised aggregate cell phone locations over time

It would be great to see another meetup on open data in Bangalore as the topic takes root: one that goes deeper into the subject, gets policymakers involved and perhaps builds on this list of requests.

We’d like to say a huge thank you to the Datameet group for allowing us to theme one of their meetups around open data. They are a huge and very active group of data science enthusiasts in India who meet online (and organise their offline meetings) via a Google Group. Membership spans many diverse communities connected by a common interest in data. Some of the members of the Datameet group have also been driving discussion on the OKFN India mailing list, where talk is specifically about open data. We’d also like to thank Aditya Hari, who volunteered to find us a venue, and the fantastic venue itself, Jaaga, which let us invade the wonderful courtyard cafe and the atmospheric orange room.