From CMS to DMS: C is for Content, D is for Data


This is a joint blog post by Francis Irving, CEO of ScraperWiki, and Rufus Pollock, Founder of the Open Knowledge Foundation. It’s being cross-posted to both blogs.

Content Management Systems, remember those?

Tim Berners-Lee in thought

It’s 1994. You haven’t heard of the World Wide Web yet.

Your brother goes to a top university. He once overheard some geeks in the computer room making a ‘web site’ consisting of a photo tour of their shared house. He thought it was stupid; Usenet is so much better.

The question – in 1994 did you understand what a Content Management System (CMS) was?

In the intervening years, CMSs have gone through ups and downs.

They built massive businesses, then crashed in the dot-com collapse. Then came a glut, with web design agencies all building their own CMS in the early noughties. And so we ended up where we are now.

Now it’s a mature market, commoditised by open source WordPress. Anyone can get a page on the web using Facebook. There’s still room for expensive proprietary players: newspapers custom-make their own, and businesses have fancy intranets.

Data Management Systems, time to meet them!

DMSs are also called "data hubs". Hopefully less patented than this wheel!

It’s 2012. You’ve just about heard of Open Data.

Your nephew researches the Internet at a top university. He says there’s no future in Open Data, because no communities have formed around it. Companies aren’t publishing much data yet, and governments are publishing the wrong data, reluctantly.

The question – what is a Data Management System (DMS)?

There isn’t a very good one yet. We’re roughly where CMSs were in the mid-1990s. Most people get by fine without them.

Just as then we wrote HTML in text files by hand and uploaded it by FTP, now we analyse data on our laptops using Excel, and share it with friends by emailing CSV files.

But eventually, using the filesystem and Outlook as your DMS is stretched to breaking point, and you need a proper one.

Nobody really knows what a proper one will look like yet. We’re all working on it. But we do know what it will enable.

What must a DMS do?

All the things people expect a DMS to do!

A mature DMS will let people do all of the following, whether as a proprietary monolith or through slick integration across the web:

Load and update data from any source (ETL)
Store datasets and index them for querying
View, analyse and update data in a tabular interface (spreadsheet)
Visualise data, for example with charts or maps
Analyse data, for example with statistics and machine learning
Organise many people to enter or correct data (crowd-sourcing)
Measure and ensure the quality of data, and its provenance
Control permissions, so that data can be open, private or shared
Find datasets, and organise them to help others find them
Sell data, sharing processing costs between users
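Of the items in that list, permissions are the easiest to pin down precisely. As a minimal sketch, assuming each dataset has a single owner plus an optional set of users it is shared with (the names `Visibility`, `Dataset` and `can_read` are hypothetical, not taken from any of the products discussed here), the open / private / shared rule could be checked like this:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    OPEN = "open"        # anyone, even an anonymous visitor, may read
    PRIVATE = "private"  # only the owner may read
    SHARED = "shared"    # the owner plus an explicit set of users may read


@dataclass
class Dataset:
    # Hypothetical model: one owner, one visibility level, optional share list.
    name: str
    owner: str
    visibility: Visibility
    shared_with: set[str] = field(default_factory=set)


def can_read(dataset: Dataset, user: str | None) -> bool:
    """Return True if `user` (None means anonymous) may read `dataset`."""
    if dataset.visibility is Visibility.OPEN:
        return True
    if user is None:
        return False
    if user == dataset.owner:
        return True
    return dataset.visibility is Visibility.SHARED and user in dataset.shared_with
```

A real DMS would also need write permissions and groups; this only illustrates that "open, private or shared" is a small, well-defined rule that different hubs could agree on.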

If it sounds like a fat list for one product, that’s because it is. But sometimes the need, the market, pulls you along, and something simple just won’t do. A DMS has to do, or enable, as best it can, everything above. (Compare it with the same list for CMSs.)

In short, it’s what the elite data-wrangling teams inside places like Wolfram Alpha and Google’s Metaweb do, but made easier and more visible using standardised tools and protocols.

Who’s making a DMS?

More people than I realise, from the largest IT company to the tiniest startup. Here are some I know about; mention more in the comments:

Windows / OSX (+ Excel / LibreOffice / …) – the desktop serves as a (good enough so far) DMS
CKAN software – started as a data catalog but has grown beyond that, and powers the DataHub, a community data hub and market. Created by the Open Knowledge Foundation
ScraperWiki – coming from the viewpoint of a programmer, good at ETL
Infochimps/DataMarket – approaching it as a data marketplace
BuzzData – specialising in the social aspects
Tableau Public – specialising in visualisation
Google Spreadsheets – coming from the web spreadsheet direction
Microsoft Data Hub – corporate information management
PANDA – making a DMS for newsrooms

They’re all DMSs because they all naturally grow bad versions of each other’s features. Two examples:

ScraperWiki is particularly good at complex ETL (loading data into a system), yet every DMS needs a data ingestion interface, even if it only lets you choose which CSV columns to import.
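That bare-minimum ingestion step, where the user picks which CSV columns to keep, can be sketched with Python’s standard `csv` module. The function name `ingest_csv` is hypothetical, and real ETL (type detection, encodings, deduplication) is deliberately left out:

```python
from __future__ import annotations

import csv
import io


def ingest_csv(text: str, columns: list[str]) -> list[dict[str, str]]:
    """Parse CSV text and keep only the columns the user chose.

    Raises KeyError if a chosen column is not in the header row.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []  # reading this consumes the header row
    missing = [c for c in columns if c not in header]
    if missing:
        raise KeyError(f"columns not in CSV header: {missing}")
    # Values stay as strings; converting types is left to a later ETL stage.
    return [{c: row[c] for c in columns} for row in reader]
```

For example, `ingest_csv("name,amount,notes\nA,10,x\n", ["name", "amount"])` keeps just the `name` and `amount` fields of each row.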

CKAN has particularly good metadata, usage and provenance, yet every DMS has to have a way for people to find the data stored in it.
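Finding data usually comes down to searching catalogue metadata. The sketch below assumes each record carries a name, title, tags and a free-text provenance field; these fields are illustrative and are not CKAN’s actual schema. It returns every record whose name, title or tags contain all the query words (case-insensitive substring match):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    # Illustrative catalogue entry, not any particular product's schema.
    name: str
    title: str
    tags: list[str] = field(default_factory=list)
    source: str = ""  # provenance: where the data originally came from


def search(catalogue: list[DatasetRecord], query: str) -> list[DatasetRecord]:
    """Return records whose name, title or tags contain every query word."""
    words = query.lower().split()

    def matches(rec: DatasetRecord) -> bool:
        haystack = " ".join([rec.name, rec.title, *rec.tags]).lower()
        return all(w in haystack for w in words)  # substring match per word

    return [rec for rec in catalogue if matches(rec)]
```

Keeping `source` on each record means a search result can show where the data came from, which is the provenance side of the same feature.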

So will they be giant monolithic bits of software?

We standardised the shipping container, can we standardise data interoperation?

We hope not! That didn’t turn out great for CMSs, although some businesses still provide monolithic ones.

CMSs only really came of age in the mid-noughties, when everyone realised that WordPress (open source blogging software!) was a better CMS than most CMSs.

It’s in everyone’s interest that users aren’t locked into one DMS. One of them might have a whizzy content analysis tool that somebody who has data in another DMS wants to use. They should be able to, and easily.

OKFN is about to launch a standards initiative to bring together such things. It’s called Data Protocols.

So far the clearest needs are twofold and mirror each other – pulling and pushing data:

a) a data query protocol/format to allow realtime querying, for example for exploring data. Imagine a Google Refine instance live-querying a large dataset on OKFN’s Data Hub.

b) a data sync protocol/format akin to CouchDB’s replication protocol. It would let datasets be updated in real time across the web. Imagine a set of scrapers on ScraperWiki automatically updating a visualisation on Many Eyes as the data changed.
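The two needs mirror each other, and a toy version of both fits in one small module. This is a sketch under stated assumptions, not the Data Protocols design: the JSON query shape (`filters`, `limit`, `offset`) is invented for illustration, and the sync half borrows only the general idea of CouchDB’s changes feed (a client remembers the last sequence number it saw and asks for everything newer). `Hub` and `Replica` are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any

Row = dict[str, Any]


@dataclass
class Hub:
    """Toy data hub: one dataset stored as keyed rows, plus a change log."""
    rows: dict[str, Row] = field(default_factory=dict)
    log: list[tuple[int, str, Row | None]] = field(default_factory=list)

    def put(self, key: str, row: Row | None) -> None:
        """Insert or replace a row (or delete it if row is None) and log it."""
        if row is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = row
        self.log.append((len(self.log) + 1, key, row))  # sequence numbers 1, 2, ...

    # (a) Pulling: answer a query sent as JSON (hypothetical request shape).
    def query(self, request_json: str) -> dict[str, Any]:
        req = json.loads(request_json)
        filters = req.get("filters", {})
        offset = int(req.get("offset", 0))
        limit = int(req.get("limit", 100))
        # Keep rows where every filtered column equals the requested value.
        matched = [r for r in self.rows.values()
                   if all(r.get(k) == v for k, v in filters.items())]
        # Report the total so an explorer UI can page through the results.
        return {"total": len(matched), "records": matched[offset:offset + limit]}

    # (b) Pushing/syncing: every change after sequence number `since`.
    def changes_since(self, since: int) -> list[tuple[int, str, Row | None]]:
        return [c for c in self.log if c[0] > since]


@dataclass
class Replica:
    """Downstream copy, e.g. the data behind someone else's visualisation."""
    rows: dict[str, Row] = field(default_factory=dict)
    last_seq: int = 0

    def pull(self, source: Hub) -> int:
        """Apply all changes newer than last_seq; return how many were applied."""
        changes = source.changes_since(self.last_seq)
        for seq, key, row in changes:
            if row is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = row
            self.last_seq = seq
        return len(changes)
```

Because the replica only ever asks "what changed since N?", the hub needs no knowledge of who is listening. That property is what would let a scraper on one site keep a visualisation on another up to date.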

Later, even more imaginative things… I reckon Google’s Web Intents could be used to make the user’s experience slick when using multiple DMSs at once. And hopefully somebody, somewhere is making a simplified version of SPARQL/RDF, just as XML simplified SGML and then really took off.

Enough of me! What do you think?

Join in. Make standards. Write code.

Leave a comment below, and join the data protocols list.

Written by

Francis Irving

CEO of ScraperWiki. Made several of the world's first civic websites, such as TheyWorkForYou and WhatDoTheyKnow.

48 Comments

Francis Irving says:

March 15, 2012 at 13:56

Martin, yeah! WordPress is like the commodity, lowest-common-denominator CMS. Just as in the last couple of decades, Excel has been the go-to for data analysis. In both cases, there are better higher end products!

Francis Irving says:

March 15, 2012 at 13:42

Rasmus – intriguing! I’d never thought about Drupal. Does it handle multiple kinds of datasets for its end users? I don’t think we should call everything a DMS just because it has data in it! I reckon dissecting whether Drupal is a DMS would take a proper conversation, not just blog comments…

Michael – thanks! Hmmm, that’s a shame regarding DMS. I find myself calling them “data hubs” more often, so perhaps that is what we should go for. What do you think?

Michael Hausenblas says:

March 15, 2012 at 08:53

Francis, Rufus,

Congrats, great post, thanks for establishing the necessary terms here. We really need more discussion and awareness in this area (data management systems). Minor nit: the abbreviation DMS is, I think, already associated with Document Management Systems, and reusing it is, at least in my experience, rather suboptimal and not something to encourage further. Can we come up with something better?

FYI: I’ve linked your post from my recent presentation [1] at the Irish Local Government Management Agency (LGMA) Open Source Forum in Dublin, where I explain the transition from Doc MS over Content MS to Data MS.

Again, thanks and KUTGW!

Cheers,
Michael

[1] http://bit.ly/consumer-pull-through-open-data

Pingback: Stream of consciousness March 14th
Pingback: (14:42 13-03-2012) Most popular #opendata news in the last 24 hours | Tuits de Software Libre
Pingback: (09:01 13-03-2012) Most popular #opendata news in the last 24 hours | Tuits de Software Libre
Rasmus Schultz says:

March 12, 2012 at 13:22

This article completely failed to mention Drupal – a DMS that has been around for a long time. In fact, Drupal was never much of a CMS, if you ask me.

I don’t personally like Drupal, for a number of reasons – I won’t get into those here. But I wonder if part of the reason it doesn’t meet my expectations for either a CMS or a DMS is that they think they’re building a CMS, when clearly it’s much closer to a DMS in terms of features.

In fact, I think those who pick Drupal are those who actually need a DMS more than a CMS. I wonder how many people tried out Drupal as a CMS and were confused and disappointed? Perhaps it’s time they change their description from CMS to DMS.

Francis Irving says:

March 12, 2012 at 10:10

Pieter, it looks fantastic!

Pieter Colpaert says:

March 12, 2012 at 10:05

Hi Francis,

Doesn’t seem down to me now: http://thedatatank.com. Our hosting provider announced some maintenance down-time yesterday though. Sorry for that.

Kind regards,

Pieter

Francis Irving says:

March 12, 2012 at 09:12

Glenn, some data hubs have transparent loading, yes! But they don’t have to, and when they do, it is by its nature limiting. I think that if something gets all its data totally automatically (like, say, an RSS reader), then all its datasets are “of one kind”. And that means it isn’t really a data hub, just an app and a database. Hmmm, so maybe my definition of a data hub is somewhere that has “a list of datasets, of an indefinite number of kinds (i.e. schemas)”.

Pieter, I’d never heard of DataTank before. Their website (thedatatank.com) that appsforghent.be links to seems to be down. Is it the same as this DataTank? http://datatank.co.uk/ Anyway, yes Apps for Ghent looks like a Data Hub, in the publishing Government information vertical! Like Socrata and CKAN.

Kerstin, that sounds to me like ERP running on databases… But now I think about it, perhaps ERP is a data hub vertical, in theory, if it was across the web and more transparent.

Martin De Wulf says:

March 11, 2012 at 22:41

Finally, I have a better idea of what CKAN or The DataTank are trying to do. This article was an eye opener for me. That said, given my background, your argument is a bit weakened by the fact that you use WordPress as the example of what a DMS should try to be, when it is quite a bad CMS (but a decent blog platform). Just a minor nitpick.

Francis Irving says:

March 11, 2012 at 17:07

Fariz, when making an innovative product it’s always easy to claim that you have no competition. This isn’t true: people are always doing something at the moment, getting by in some way. The world wouldn’t end without your new product. That current solution, getting by somehow without it, is the competition.

In the case of data hubs, people’s operating systems are the incumbent competition. We use a combination of tools, such as their filesystem, email clients, and applications like Excel, SPSS, Matlab etc.

Kerstin Forsberg says:

March 11, 2012 at 11:53

Nice blog post!

An old auntie like me can tell you about some of the “DMS” we had long before 1994: systems to manage, for example, payroll data and accounting data, running on so-called mainframes. And you may have an old uncle who can tell you how they managed, for example, manufacturing data and spare-part data with systems on so-called minicomputers. We didn’t call them “DMS” but “ADB systems” (Automatisk DataBehandling in Swedish, i.e. automatic data processing).

Many things were different from now: no global scale (often local homegrown systems), no crowd (often just a bunch of terminal users) and no SQL (but a lot of Get-Hold-Unique-within-Parent calls in hierarchical database management systems).

Other things have become even more important given global scale, crowdsourcing, and the recognition that “anyone can say anything about any topic” on a web of data: things such as data integrity, data traceability, the need for data context, and coping with the variances in the reality represented in the data.

Fariz says:

March 11, 2012 at 11:41

I still don’t quite understand. Why do Windows and OSX serve as a DMS? May I have a brief explanation?

Pieter Colpaert says:

March 11, 2012 at 10:27

You must really hate The DataTank because this is exactly what we started to do from the start ;).

http://data.appsforghent.be

Lazaros says:

March 10, 2012 at 20:27

I code for a living!

When do we start???

Glenn says:

March 10, 2012 at 05:47

Data loading is very important but must be so intuitive as to be almost transparent. For example, http://www.dynamicalsoftware.com/convocontent/ccm.html is an Alfresco and Hippo CMS integration where online team discussion gets summarized into actionable documents and published automatically.

Francis Irving says:

March 10, 2012 at 01:22

Yaron – interesting, yes it does look like Semantic MediaWiki is a data hub!

Patrick says:

March 9, 2012 at 16:43

DataCouch has great potential for data sharing and crowd-sourced cleanup. Since it uses CouchDB, which automatically versions every change, it provides a method for data set forking & merging. It also lets users create visualization and other apps on top of datasets.

Yaron Koren says:

March 9, 2012 at 15:59

Interesting article; I assume you’re not aware of Semantic MediaWiki.
