Dig the new breed, Part III – wrapping it all up
This is the third in the amazing series of guest blogs from Ant Beck on the impact of linked open data for archaeology.
OK, to recap we have:
- A scientific movement that advocates open approaches to data, theory and practice
- Emerging foundational interoperability using semantic web technology
- The potential to remove a barrier and facilitate the submission of primary data
These three powerful factors could prove to be highly disruptive. In combination they have the potential to turn archaeological data and data repositories from static siloed islands (containing data that is increasingly stale) into an interlinked network of data nodes that reflect changes dynamically.
The linchpin is the use of triplestores (RDF databases) that provide persistent identifiers. Persistent identifiers allow us to refer to a digital object (a statement, a file or a set of files) in perpetuity, even if the underlying storage location moves. This means links between objects are persistent: therefore, when an observation or interpretation changes, its effects are propagated through to all the data and events that link to it. I see organisations such as the ADS, Talis (an innovative semantic web technology provider whose Talis Platform includes a free RDF hosting service for open data) and national heritage bodies providing such services.
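The value of a persistent identifier is the indirection it introduces: links reference the identifier, not the storage location. A minimal sketch in plain Python, where the resolver table and all identifiers are invented for illustration (real services use schemes such as DOI or ARK):

```python
# Sketch: a persistent identifier stays stable while the storage moves.
# All identifiers and URLs here are invented for illustration.
resolver = {"ads:archive/1234": "http://old-server.example/files/1234"}

def resolve(pid):
    """Look up the current storage location for a persistent identifier."""
    return resolver[pid]

# The archive moves to new storage: only the resolver entry changes.
resolver["ads:archive/1234"] = "http://new-server.example/data/1234"

# Any link that cites the persistent identifier still resolves correctly.
print(resolve("ads:archive/1234"))  # http://new-server.example/data/1234
```

Everything that cited `ads:archive/1234` before the move continues to work afterwards, which is exactly why links between objects can be treated as permanent.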
Some open science projects are likely to adopt RDF as their de facto data-sharing format. RDF triples (subject, predicate, object) provide a schema-transparent mechanism for data storage. They are not ideal for all data types (raster data structures, for example), but when used with ontologies and SKOS, as demonstrated by STAR, they are powerful analytical, search and inference tools.
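To make the triple idea concrete, here is a minimal sketch using plain Python tuples rather than a real triplestore; the `ex:` identifiers and the pottery terms are invented, and `match` stands in for the kind of pattern query SPARQL provides:

```python
# A minimal sketch of RDF-style (subject, predicate, object) triples
# using plain Python tuples. All identifiers are illustrative.
triples = [
    ("ex:context_101", "ex:contains", "ex:sherd_55"),
    ("ex:sherd_55", "ex:hasType", "ex:TypeIIb"),
    ("ex:TypeIIb", "ex:datedTo", "ex:RomanoBritish"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None is a wildcard,
    mimicking a basic SPARQL-style query."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which sherds have been classified as Type IIb?
print(match(p="ex:hasType", o="ex:TypeIIb"))
```

Because every statement has the same three-part shape, new kinds of data can be added without redesigning a schema first, which is what "schema transparent" buys you.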
So, what is the importance of storing heritage data in RDF? Well, it depends on which point of view you take. From a data management perspective there is no longer any need to migrate data formats. To facilitate re-use, different “views” of the RDF model can be generated and incorporated into traditional analytical software, such as GIS. Importantly, analysis stops being a “knowledge backwater”: new knowledge can be appended back into the triplestore.
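Generating such a “view” amounts to flattening triples into the row-and-column shape a GIS or spreadsheet expects. A hypothetical sketch, with invented predicate names for coordinates and material:

```python
# Sketch: flattening RDF-style triples into a table-like "view" that a
# GIS or spreadsheet could consume. Predicate names are invented.
triples = [
    ("ex:find_1", "ex:easting", "430120"),
    ("ex:find_1", "ex:northing", "433990"),
    ("ex:find_1", "ex:material", "pottery"),
    ("ex:find_2", "ex:easting", "430155"),
    ("ex:find_2", "ex:northing", "434010"),
    ("ex:find_2", "ex:material", "bone"),
]

def as_table(triples):
    """Group triples by subject: one row per subject, one column per predicate."""
    rows = {}
    for s, p, o in triples:
        rows.setdefault(s, {})[p] = o
    return rows

table = as_table(triples)
print(table["ex:find_1"]["ex:material"])  # pottery
```

The view is disposable: it is derived from the triplestore on demand, and anything learned during analysis can be written back as new triples rather than stranded in a project file.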
From a data curation, re-use and analysis perspective, the quality of the data can be dramatically improved. Deposition is no longer the final act of the excavation process: rather, it is the point where the dataset can be integrated with other digital resources and analysed as part of the complex tapestry of heritage data. The data does not have to go stale: as the source data is re-interpreted and interpretation frameworks change, those changes are dynamically linked through to the archives, so the datasets retain their integrity in light of changes in the surrounding and supporting knowledge system.
An example is probably useful at this juncture. In addition to many other things, pottery provides essential dating evidence for archaeological contexts. However, pottery sequences are developed on a local basis by individuals with imperfect knowledge of the global situation. This means there is overlap, duplication and conflict between different pottery sequences, which are periodically reconciled (your Type IIb sherd is the same as my Type IVd sherd and we can refine the dating range… hurrah… now let’s have another beer). This is the perennial process of lumping and splitting inherent in any classification system. Updated classifications and probable dates allow us to re-examine our existing classifications. One can reason over the data to find out which contexts, relationships and groups are impacted by a change in the dating sequences, either by proxy or by logical inference (a change in the date of a context produces a logical inconsistency with a stratigraphically related group).

While we’re on the topic of stratigraphy, an area of notorious tedium and poor-quality data (often with conflicting relationships), RDF allows rapid logical consistency checking: stratigraphic relationships are basically a graph, and RDF triples form a graph database.

Publicly deposited RDF data should be linked data: this means that all the primary data archives are linked to their supporting knowledge frameworks (such as a pottery sequence). When a knowledge framework changes, the implications are propagated through to the related data dynamically. This means that policy, development control and research decisions are based upon data that reflects the most up-to-date information and knowledge… cool, huh?
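The stratigraphic consistency checking mentioned above reduces to a classic graph problem: "is below" relations must form an acyclic graph, and any cycle is a recording contradiction. A sketch with invented context names, using a standard depth-first search rather than any particular RDF reasoner:

```python
# Sketch: stratigraphic relations ("A is below B") form a directed graph,
# so a consistent sequence must be acyclic. A cycle such as A below B,
# B below C, C below A is a contradiction. Context names are invented.

def find_inconsistency(below):
    """Depth-first search for a cycle in 'is below' relations.
    Returns True if the recorded stratigraphy is self-contradictory."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in below.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:                 # back-edge: contradiction
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(visit(n) for n in list(below) if colour.get(n, WHITE) == WHITE)

consistent = {"A": ["B"], "B": ["C"]}               # A below B below C
conflicting = {"A": ["B"], "B": ["C"], "C": ["A"]}  # circular: impossible
print(find_inconsistency(consistent))    # False
print(find_inconsistency(conflicting))   # True
```

Because triples already encode these relations as graph edges, a check like this can run automatically at deposition time instead of relying on a weary post-excavation eyeball.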
Incorporating excavation data into RDF means that ontologies and SKOS can be used to dynamically repurpose the data for policy formulation, planning impact, regional heritage control and mitigation purposes, in conjunction with the data in the Sites and Monuments Record (SMR). Raw data can be integrated from multiple sources with different degrees of spatial and attribute granularity and, where appropriate, generalised so that the data is fit for the end user’s purpose. From a policy perspective, curatorial officers no longer have to battle to keep datasets from going stale while adding new datasets to the local SMR. The SMR will remain an essential dataset: even though it is a generalised resource, it is the only location of a digital record for resources that are unlikely to be digitised in the future (unless there is a very unlikely reversal in funding patterns). Thus curatorial officers can develop more effective regional research agendas based upon up-to-date and accurate data.
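The generalisation step can be pictured as walking up a SKOS-style `broader` hierarchy until records of different granularity meet at a common level. The concept scheme below is entirely invented for illustration:

```python
# Sketch: generalising fine-grained terms up a SKOS-style "broader"
# hierarchy so records of different granularity align.
# The concept scheme here is invented, not a real thesaurus.
broader = {
    "samian ware": "roman pottery",
    "black-burnished ware": "roman pottery",
    "roman pottery": "pottery",
    "medieval greenware": "pottery",
}

def generalise(term, target):
    """Follow skos:broader links until the target level (or the top)
    of the hierarchy is reached."""
    while term != target and term in broader:
        term = broader[term]
    return term

print(generalise("samian ware", "pottery"))        # pottery
print(generalise("samian ware", "roman pottery"))  # roman pottery
```

Two datasets recorded at different levels of detail can both be rolled up to whatever level the end user's question demands, without altering the primary records.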
This has the potential to change the way Historic Environment Information Resources (HEIRs) are managed by curatorial officers and to transform how developers (property and software), policy makers and the general public engage with and consume the data. Curatorial officers will be able to support innovative access to primary linked data resources by researchers, planners and, most importantly, the public. This is a significant and important change in role. In addition, the heritage data can be mashed up with other data resources to produce tailor-made resources for different end-user communities, following the model successfully employed by data.gov.uk.
Data re-use and mashups are also important for those undertaking research and analysis. The big difference will be for those who undertake research or collect data that transcends traditional analytical scales. For example, the National Mapping Programme, which aims to “enhance the understanding of past human settlement, by providing primary information and synthesis for all archaeological sites and landscapes visible on aerial photographs or other airborne remote sensed data”, will provide deeper insights when it is integrated with other data. Crucially, this integration can occur in real time and add tangible interpretative depth. If interpreters digitising data from an aerial photograph see two ditches cutting one another, they are unlikely to be able to tell the relative stratigraphic sequence of the two features. Direct access to excavation or other data will allow the full relationships and their interpretative relevance to be deduced during data collection.
In the longer term, consumers of archaeological data will be more used to dealing with primary data, will become more aware of its potential and will demand more of the resource. This should produce a ground-up re-appraisal of recording systems and a better understanding of archaeological hermeneutics. The interpretative interplay between theory, practice and data as part of a dynamic knowledge system is essential. Although this has been recognised, in reality theory, practice and data have never really been joined up. We don’t have to use a one-size-fits-all approach to conducting excavations; instead we can tailor bespoke systems that address local, regional and national research challenges. We can generate interesting and provocative data that can be used to test theory and inform practice, and move away from recording systems mired in the theoretical and intellectual paradigms of the mid-1970s.
The virtuous circle is re-established; theory will influence practice, which will change the nature of the data, which will impact on interpretative frameworks, which will provide a body of knowledge against which theory can be tested.
There is a new breed: people and organisations who don’t want to do what’s always been done. People who are empowered and don’t believe that established institutions and hierarchies are the gatekeepers of progress; organisations that can, and want to, change the way we ‘play the game’; people who want to collaborate; organisations that want to share. Open approaches can help to make all this happen. This is all facilitated by disruptive technology which is increasingly mature, broadly available for free (or at low cost) and with low barriers to use and re-use. In nearly twenty years of studying and working in the heritage sector I’ve seen it change dramatically. I feel we are on the cusp of changing the way we engage with our data, which could profoundly alter the way we understand the past, how we communicate it in the present and how we sustainably manage a complex resource for the future.