Forget Big Data, Small Data is the Real Revolution

April 22, 2013, by Rufus Pollock

This is the first in a series of posts. The next post in the series is What Do We Mean by Small Data.

There is a lot of talk about “big data” at the moment. For example, this is Big Data Week, which will see events about big data in dozens of cities around the world. But the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined”.

Big data smacks of the centralization fads we’ve seen in each computing era. The thought that ‘hey there’s more data than we can process!’ (something which is no doubt always true year-on-year since computing began) is dressed up as the latest trend with associated technology must-haves.

Meanwhile we risk overlooking the much more important story here, the real revolution, which is the mass democratisation of the means of access, storage and processing of data. This story isn’t about large organisations running parallel software on tens of thousands of servers, but about more people than ever being able to collaborate effectively around a distributed ecosystem of information, an ecosystem of small data.

Just as we now find it ludicrous to talk of “big software” – as if size in itself were a measure of value – we should, and will one day, find it equally odd to talk of “big data”. Size in itself doesn’t matter – what matters is having the data, of whatever size, that helps us solve a problem or address the question we have.

For many problems and questions, small data in itself is enough. The data on my household energy use, the times of local buses, government spending – these are all small data. Everything processed in Excel is small data. When Hans Rosling shows us how to understand our world through population change or literacy he’s doing it with small data.

And when we want to scale up, the way to do that is through componentized small data: by creating and integrating small data “packages” rather than building big data monoliths, and by partitioning problems in a way that works across people and organizations rather than creating massive centralized silos.

This next decade belongs to distributed models not centralized ones, to collaboration not control, and to small data not big data.

Want to create the real data revolution? Come join our community creating the tools and materials to make it happen — sign up here:

This is the first in a series of posts about the power of Small Data – follow the Open Knowledge Foundation blog, Twitter or Facebook to learn more and join the debate at #SmallData on Twitter.

23 thoughts on “Forget Big Data, Small Data is the Real Revolution”

Daniel Lombraña González says:

April 22, 2013 at 14:50

I agree with you, Rufus. Great blog post! It is time to build platforms and services where you do not have silos and can integrate small datasets to analyze your problem. That will be the real success!
Bill Roberts says:

April 22, 2013 at 16:07

Great stuff. I agree, decentralisation is the way forward. The current obsession with big data is frustrating: it’s about getting the right data for your purpose.
Jose Leal says:

April 22, 2013 at 17:49

I would agree that the whole “Big Data” meme is yet another bubble. Not too dissimilar to those of social media, cloud etc. Not to say that they are not relevant, but they are just an evolution of the digital space.

People have always wanted to connect and share, so in that sense social media is simply a digital fulfillment of that human need. The cloud is also an evolution to hosted solutions. Again, not much new there. It’s been a long time since most of us have hosted our own websites, or email servers etc. Those trends will continue, but they will not revolutionize how our world works.

You’re right, Small Data is the future. I think “Small, Open and Linked Data” (SOLD) is the future! But whereas the others are incremental evolutions of the digital space, SOLD is what has the potential to change our society. Distribute creation, access, updating, linking, and most importantly control.

Let’s learn to value small, not big. Let’s learn to value open, not proprietary. Let’s learn to value connected, not silos. I’m SOLD ;-)
1. groundrace says:
  
  April 23, 2013 at 10:21
  
  Jose, SOLD is a pretty awesome acronym. Good catch.
groundrace says:

April 23, 2013 at 10:15

Rufus, great post!

However, I’m pretty sure that big data platforms and technologies, while crossing the chasm, are also becoming more “democratic” than we might imagine. Isn’t it true that, until recently, big data technologies were available only to very large corporations? And isn’t it true that today Hadoop platforms can be installed, scaled and used in a matter of a few clicks by almost anyone?
It seems to me that all the platform vendors, with few exceptions, are committed to pursuing the open source approach and are focused on making the technology enterprise-grade and ready for mass adoption. There are vendors who already predict for 2013 that the “Big” prefix will soon disappear and we will just talk about data.
Therefore, as you say, the point is not so much big data versus small data as open versus closed data, and when data is open it is all about its quality and affordability.
bilgrami says:

April 24, 2013 at 16:38

terminology aside, while there is plenty of innovation in big data visualisation and new tools – most ‘small data’ is created in excel and communicated in PowerPoint – tools that are pretty old and don’t really fit with the workflows people are using today. I just wrote a blog on this subject (check out my posterous blog), but anyone interested should have a look at SharpCloud!
Urs E. Gattiker says:

April 25, 2013 at 06:38

Dear Rufus

Thanks for this blog post. I agree with you that big data can reveal much more information than we may be willing to admit or be aware of.

For instance, even a blog like this can, if we analyze reactions, social sharing and comments, reveal plenty of information about the organization, its clients (here supporters :-) ) and so forth.

I have tried to demonstrate this by just benchmarking this blog and some small data.

@rufuspollock:twitter thanks for sharing.

Urs (see small data here)
http://blogrank.cytrap.eu/rank/blog.okfn.org

PS. I also like small data because it is far less of a nightmare in cases of data breach. Hence, we are more able to protect user rights and their privacy.

Something we should maybe also keep in mind?
Pingback: Olvídense del Big Data, la revolución real son los datos pequeños ~ #GobiernoAbierto ~ Infobae.com
zool says:

April 25, 2013 at 16:05

Good article, but your excessive use of bold text is irritating to say the least. Your readers are supposed to be smart, they deserve better than this.
Abbott Katz says:

April 26, 2013 at 12:23

Apropos the above, you may be interested in my blog: http://www.spreadsheetjournalism.com
Pingback: Cutting Big Data Down to Size | Inside-BigData.com
miska knapek says:

April 27, 2013 at 12:23

Thanks for the post. Good points.

The latest edition of the Datastories podcast, run by some eminent information design people, partly touches on this issue. http://datastori.es/data-stories-21-visualization-save-the-world/

While the episode is about information design work for NGOs, they mention that working with big data assumes that one can find an answer to something in the data, but not really beginning with a question.

When starting out with a question, one needs to look at which “small data” (and maybe the big data, depending on what one wants to find) could be relevant.

Hence, bragging about how “big” one’s data is, is a bit like bragging about how many features one’s phone has, rather than how useful it is.

Of course, all data has its place – and the democratic access to this is important, as Rufus mentioned – and it’s a good question of whether to start with opening data, or thinking of good questions to ask, and then seeing which data one should open.

Perhaps one answer is to get as many domain-experts – and people in general – into the discussion, so they can come up with good questions leading to relevant data being opened.
Pingback: Committing Sociology and Change – Evidence of our times | Minor Expletives & Better Questions
Allen Bonde says:

April 30, 2013 at 13:51

Love the post/topic! The way I’ve been thinking about Small Data is as the ‘last mile’ of Big Data – we (often) need Big Data behind the scenes, but the trick is to provide simpler, more consumer-style apps and tools at the front end that work on any device, foster social sharing and help non-technical users turn insights into actions…that are actually helpful in the moment. Readers can see my latest thoughts here: http://www.digitalclaritygroup.com/blog/small-data-goes-big-time/

cheers,
Allen
Emanuil Tolev says:

May 5, 2013 at 18:08

Agreed. “Small data” may be a harder problem to tackle than Big Data though, similar to how organisations learned to deal with writing and maintaining big software systems in-house, but when it came to distributing and reusing functionality, the problem became quite thorny (think Python’s setuptools, pip; win32’s DLL hell; Ruby’s problems with multiple gem versions on 1 system).

Another problem which may arise with Big Data is the ability to co-operate – in the software industry, before the API age there wasn’t really a way to maximise interoperability “out-of-the-box”. And it’s still difficult to get an API quite right.

It is, however, a worthwhile problem, certainly.

(Another related topic is that software and the software industry have really mostly been about data…)
Morten Skaaning says:

May 7, 2013 at 13:30

Hi,

I read your article and a few things stuck out as odd. The revolution talk is all nice, but you didn’t explain why big data is unprofitable, nor how you would reach a consensus about data formats in a decentralized environment.

Small data have a tendency to fragment and lose meaning as data formats permute. Why mention Excel as an example when it’s produced by a company with a history of proprietary format lock-in? From your examples it seems like you encourage a movement of “keep track of your household expenses” and “know what you’re doing”, but those are not exactly revolutionary concepts.

You mention “componentization”, but how do you make sure that components have any meaningful connection? What is a “data package”? Is it like an XML-file or a zip-file? How do you define the data exchange protocols if all data permutes faster than your consensus grows?

If all data is decentralized and sporadic, wouldn’t there be big business opportunities in organizing the data and making it searchable? You know, like Google and The Web. What about selling redundancy for data that could go missing? Or selling a package of related data in a chunk, so you’d know that you have all the data you need for a specific purpose.

What do you do with the email addresses of the people who sign up at the bottom of your article? Do you make any kind of money from those?
1. Morten Skaaning says:
  
  May 7, 2013 at 13:34
  
  I forgot: one of the “poster children” of big data would be Google Translate, which tries to use so much data that grammar becomes a “soft problem”.
sindu says:

May 31, 2013 at 17:58

Nice piece of info. You may be interested in this article

http://www.bodhtree.com/blog/
Adam Krause says:

October 9, 2013 at 18:07

Everybody seems to have a different definition of “Big Data”. Some see it as a single huge repository of information that can tell you the meaning of Life, the Universe, and Everything. That model dates back to the mainframe era.

Just as that system gave way to a distributed system of computing, data will become more local. It’s still “Big Data”, you’ll just get data that’s more relevant to the end user. But “Local Data” or “Your Data” doesn’t quite have the same awe-inspiring, jaw-dropping, come-to-Jesus effect, does it?
Georgeo Nocera says:

February 18, 2014 at 23:38

Great, tks!
Dmitry says:

April 13, 2014 at 11:26

I believe that our ultimate goal is to reuse already generated knowledge rather than reinvent the wheel.
Experience of working within P2P environments shows the vital importance of indexing decentralised knowledge and data. “Loosely joined” packages would be difficult to find unless they are organised within the Unified Conceptual Space:
http://confocal-manawatu.pbworks.com/w/page/62073491/Why%20Unified%20Access%20to%20Information%20is%20Required
and linked in a way similar to this:
http://confocal-manawatu.pbworks.com/w/page/67722926/Integrated%20Virtual%20Associative%20Network
I would appreciate joining a group interested in developing a “small data” initiative under principles similar to IVAN and UCS.
Please visit the prototype of IVAN as an example of how a set of about 15,000 packages of “small data” can be arranged so that each package (“sense domain”) can be accessed in seconds:
http://confocal-manawatu.pbworks.com/w/page/68435296/What%20is%20noaSphere
I would be happy to help you get started.
Dimitri

Comments are closed.
