Featured Project: MusicBrainz
MusicBrainz is a user-maintained community music metadatabase. The MusicBrainz community collects and maintains data about recorded music releases such as artist name, release title and track listing. That data is re-used by music services across the web, including Amazon and Last.fm, as well as in Free and Open Source Software applications.
Robert Kaye is Executive Director of the MetaBrainz Foundation, the non-profit group which operates MusicBrainz. The Open Knowledge Foundation spoke with Robert about the history of the MusicBrainz project, about the role of community, about the way open licensing helps MusicBrainz work, and about the future he sees for music metadata.
This interview is the first in what we hope will become a regular spotlight on individual members of the Open Knowledge community.
The core MusicBrainz data is in the public domain, hence open in accordance with the Open Knowledge Definition. Further details are available at:
Open Knowledge Foundation: Can you give us a brief history of MusicBrainz? When was it started? Who was responsible? And what initially motivated the project?
Robert Kaye: A full history of the project is here: http://musicbrainz.org/doc/MusicBrainz_History
It was started in 1999 as the CD Index and then in 2000/2001 it became MusicBrainz. I’m the person who started MusicBrainz.
OKF: How has it developed? Has anything happened along the way that was completely unexpected?
RK: It developed better than I expected – there are many facets that I didn’t envision when I first started the project. For instance, I didn’t envision people getting quite so involved and passionate about the project. One example is a person in Oslo, Norway who has Asperger’s syndrome and has a hard time leading a normal life. This person will probably never hold down a regular job, but this person has made quite serious contributions to MusicBrainz. In a sense MusicBrainz gave this person an opportunity that was never before available to them.
I didn’t see that one coming at all!
OKF: Has anything turned out differently to the way you expected it to?
RK: Yes. At first I thought automatic collection of data could make the job for people a lot easier. But it turns out that automatic collection is the exact wrong thing to do – a certain subset of people (those who contribute to MusicBrainz and Wikipedia) do not trust automatically collected data and we threw out most of the automatically collected data in favour of human verified data. I also considered importing the FreeDB data at some point in order to bootstrap MusicBrainz, but the community made it clear that they would revolt if I did that. They felt that they didn’t want to clean up that giant mess – instead we opted to build a clean database one step at a time.
There are tons of things that I didn’t expect, but many of those are technical in nature and not so relevant to open data.
OKF: What’s been your biggest challenge and what has been your biggest moment of pride?
RK: The biggest challenge has been dealing with “Poisonous People”. People (developers, interestingly enough) who are well intentioned but don’t always get along with everyone else. These people divide the community as they rally support for their views and that hinders forward progress. When a community is divided it’s hard to get anything done, because everyone is bickering and sniping at each other.
In the summer of 2006 we had this problem and it nearly ripped the project in half – that was clearly a low point for MusicBrainz.
The biggest moment of pride? Getting the BBC on board and having the BBC use MusicBrainz data to organise their whole music play-out and music tracking system. MusicBrainz provides the metadata that gives BBC Music its structure. To see how the BBC integrated MusicBrainz in a publicly visible manner, see: http://www.bbc.co.uk/music/reviews/dw9x
OKF: How big is the community that contributes to MusicBrainz, and what do you think motivates people to contribute?
RK: We have 465,897 registered users and of those 1,385 were active in the last week. The MusicBrainz community fits into roughly three categories:
· The core people: These are the people who are hacking on MusicBrainz or editing profusely. MusicBrainz is their hobby, job or resume builder.
· Regular editors: People who love music more than the average person and want to make sure the data for their artists is clean and that their music collection is sparkling clean as well.
· Tagger users: People who use one of the tagging applications to clean up their music collection. These people tend to be a very transient group. They come, clean up their collection and leave. They may come back in a while to clean up new data in their collection. To them MB is a means to clean up their collection.
OKF: What is the role of the community inside MusicBrainz?
RK: The community is critical for MusicBrainz. If people stopped editing the data in MusicBrainz, the data would stop changing and the business model would instantly vaporise. The software that powers MusicBrainz is worthless without the data. The data is worthless without the people behind it. Given that, we need to make sure that we don’t alienate our contributors – we can’t afford any missteps that would cause the community to lose faith in MusicBrainz.
OKF: Can you give us some stats on the material? Releases? Updates?
RK: All the stats you could possibly ever need are here:
OKF: Where do you get your data from? Do you build on any other open material?
RK: It’s all user-curated and the users decide what sources they want to use to verify information. I know our users use Amazon quite a bit to glean information and Discogs when Discogs has the data they are looking for. The only open data source we use is FreeDB.
OKF: Where can you download the data? What format is the data in?
RK: You can download the data here: http://musicbrainz.org/doc/Database
It’s in Postgres data dump format. Normally you would use our open source software to load the data into a Postgres database. The page above also talks about our Live Data Feed, which is how we keep our customers updated on an hourly basis. Commercial use of this service requires a license from us, which is how we make ends meet at the foundation.
OKF: Why did you decide to use an open licence? What are the advantages of using an open licence from your point of view?
RK: Mainly I was upset that CDDB, which used to be freely downloadable, was taken private by Escient (now (dis)GraceNote). I typed in several hundred CDs and now someone else was making money off my work. I was pissed. At the time I was getting into open source and I saw that open data would be a critical play in the future – a future I perceived to be off in a number of months – I wasn’t ready to wait a decade for it to be really ready.
The vision I saw included a well linked data set with stable identifiers that didn’t change so that the data set could be cross-linked in a stable manner. What I saw was the “Semantic Web” or what we’re now calling “linked data” and it was clear to me that in order to play in this field you couldn’t make a walled garden around your data. If you ever hoped that others would link to your data, it was clear to me that I had to bend over backwards in order to make this data available to everyone. I also saw Linux growing steadily and slowly making in-roads against Microsoft – how can Microsoft compete with free AND high quality? It would be hard. We’re seeing the same happening with Wikipedia and classic encyclopaedias – Microsoft recently shelved Encarta, a sign that Wikipedia is edging out some of the smaller players.
This vision was the easy part. Then the hard work started – what licence should I use? The only licence out there was the Open Content licence, which was largely unproven. And it didn’t address the issues that faced data very well. In an email conversation with Richard Stallman he suggested that I use the GFDL… Compared to the GPL the GFDL is a horrid abomination! (I’m still trying to find the front matter and the appendix in my database tables!!) Mr. Stallman also brusquely informed me that the text of the GPL was *NOT* available under the GPL or any licence for that matter. He specifically forbade me from using his text to create a better, more data oriented licence. Not surprisingly, I stopped being a fan of RMS from that point on.
I ended up having many conversations about licences and was quite frustrated… then I got a call from the Creative Commons! They were about to launch and were looking for projects who would adopt their licences before they went public. I read the licences and was immediately jazzed about them. I had already been educating myself about the Public Domain and the Feist vs Rural Telephone company case and thought that my core data needed to be in the Public Domain. Now the CC provided a nice and clean method for doing this – I adopted the licences clear across the board.
The non-commercial licence was actually the magic that enabled me to found the MetaBrainz Foundation! I was convinced to NOT create a legal entity for MusicBrainz until I could see a business model emerge that didn’t hinge on begging. My concept was to allow free access to the core data, but play gatekeeper on the data and control how quickly and how conveniently someone could get access to the data. By allowing the public non-commercial unfettered access to the data, I would win over the Open Source communities, which we have. But by taking money for timely and convenient access, I could fund the foundation and in turn fund my own paycheque. This has been working well so far – while we’re not making oodles of money (especially in this economic climate), we’ve been in the black year over year since inception. I never resort to begging and yet I can license public domain data to make ends meet.
What’s even more trippy about this is that I may have created the first 100% profit non-profit business model. Since the operations of the project are for the public at large, we make this as cheap as possible. And making the data available to the public is part of that deal – it is written into our IRS charter. When a commercial customer comes along, they tap into our live data feed, which they pick up from our FTP site, which is actually operated by the Oregon State University with support from Google. In other words, the incremental cost for adding a new customer is ZERO. After I sign the contract, I do nothing but cash the cheques. It’s a rather odd arrangement, but the IRS hasn’t given me grief and my community and customers are happy.
OKF: Where has MusicBrainz been re-used?
RK: A roster of our paying customers is here:
There have also been quite a few research projects and university papers. The Solr 1.4: Enterprise Search Server book was written using the MusicBrainz data as examples. There are dozens of start-ups that are using our data and if they ever make it past the seed stage they will become customers of MetaBrainz. Plus the Open Source world uses our data and it can be hard to see who makes use of the data – so there are many places that use our data without us ever knowing about it.
OKF: What are your plans for MusicBrainz in the future? In an ideal world where do you hope it will go?
RK: We want to support classical music much better than we do now. We’re in the process of creating a new schema that allows us to finish support for classical music. There are lots and lots of ways in which we can improve the experience for our users and make it easier for everyone to contribute. We’re also keen on getting music information from the whole world over – not just those who can read English.
Then I want to make sure that MusicBrainz gets more connected to the outside world. I want reviews and concert information to be one click away. And as applications like Google Maps get more impressive, I want to provide the data about musicians who are playing at a given venue when you walk past that venue. There are many more places where music metadata could be used and I want to make sure that MusicBrainz gets into all of these nooks and crannies.
OKF: Where do you need help? What can contributors do?
RK: We need help editing the data, cleaning it up and throwing out duplicate data. We also need documentation written and help answering emails from people who have questions.
OKF: Is there any work you need done that volunteers could help with?
RK: Tons! We’re driven by volunteers!
OKF: How can people get involved?
RK: Our homepage is suited just for this reason: http://musicbrainz.org/
Start tagging your music collection with MusicBrainz’ Picard and then spot problems in our data and help us fix them!