Rufus Pollock

Rufus Pollock is Founder and President of Open Knowledge.


7 Comments

  • @Jo: the question of data aggregates is an important one but I’m not entirely sure what you mean by it. Do you mean that people create changesets against an aggregate dataset (with all the issues about how that is persisted down to the underlying dataset or how the changeset is applied when the aggregate is refreshed) or are you thinking of the issue of derived datasets?

    In either case the problems aren’t easy, though I do not know how much harder they are than in code. In software you also have programs that ‘aggregate’ underlying libraries. There, the approach so far seems to rest mainly on versioning and on specifying the required version in dependencies.

  • @John: not sure why you need to trust everyone in a distributed setup. The whole point there is that anyone can make changes to their copy of the data but who I choose to pull changes from is up to me.

    I’m also concerned that the example of the wikis I gave (which was an attempt to make things less technical) may have misled people. Even in that model, the point was that it would be up to each group whether and when they pulled changes from another wiki.

  • Data language translation and developing compression for every generalizable type of data are first steps for your goal.

  • As Jo Walsh says above, version control is no use unless you trust the people you are doing it with.

    So the distributed part of your wish is going to be limited to trustworthy members of groups, not a Wikipedia-type collaboration.

    JSON may be handy, but it may not be enough to deal well with the volume of what you are wishing for: the data in database-driven sites…

    from Wikipedia: “JSON parsing must ironically be accomplished on a character-by-character basis. Additionally, the standard has no provision for data compression, interning of strings, or object references.”

    For every data type you want to version control, the compression you will always want in a distributed system will probably have to come from a tool that translates the specific data language into a “compressed lowest common denominator form”: a form that is lossless and can be translated back, or into another language entirely.

    Data language translation and developing compression for every generalizable type of data are first steps for your goal.

    John
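
To make the “compressed lowest common denominator form” from the last comment a little more concrete, here is a minimal sketch of a lossless round trip. It rests on assumptions that are not in the thread: canonical JSON (sorted keys, no extra whitespace) stands in for the common form, zlib stands in for the compression, and the function names `to_compact_form` and `from_compact_form` are hypothetical. It is an illustration of the idea, not a proposal for how such a tool would have to work.

```python
import json
import zlib


def to_compact_form(record):
    """Serialize a record to canonical JSON, then compress it.

    Sorting keys and fixing the separators means two records with the
    same content always produce the same bytes, which is convenient
    when comparing or versioning them. (Illustrative choice only.)
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return zlib.compress(canonical.encode("utf-8"))


def from_compact_form(blob):
    """Decompress and parse, recovering the original record."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# A made-up record, purely for illustration.
record = {"name": "example dataset", "rows": [[1, "a"], [2, "b"]], "version": 3}

blob = to_compact_form(record)

# Lossless: the round trip gives back an equal record.
assert from_compact_form(blob) == record

# Canonical: key order in the input does not change the stored bytes.
assert to_compact_form({"a": 1, "b": 2}) == to_compact_form({"b": 2, "a": 1})
```

The sketch only covers data that JSON can already represent. Translating from another data language into this form, and back out again, would need a converter written for that language.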
