Dig the New Breed: How open approaches can empower archaeologists - Part I
Very happy to post the first in an amazing series of OKFN guest blogs by Ant Beck, a member of the Open Archaeology working group. Ant discusses the DART project and the STAR project, both of which employed Linked Data in a heritage context. Later we’ll get into the ethics of open heritage, and a vision for the future of archaeological data.
The title “Dig the New Breed” is taken from the presentation I gave at the Open Knowledge Conference 2010. I did this for two reasons: it’s a terrible play on words (“dig” is employed as a synonym for both “excavate” and “like”) and I like name-checking “The Jam”. As this series of posts has taken form, it’s changed from being a piece about Open Science and Ethics into something about how disruptive technologies can be implemented to transform how the heritage sector operates.
STAR & STELLAR – Anyone for linked heritage data?
I recently attended a STAR project workshop and saw a glimpse of the future. The Semantic Technologies for Archaeological Resources (STAR) project investigated “the potential of semantic terminology tools for widening and improving access to heritage resources, exploring the possibilities of combining a high level, core ontology with domain thesauri and natural language processing techniques”. The project has looked at extracting structured knowledge from “grey literature” using Natural Language Processing (NLP) tools – all very worthy and interesting, but not something I’m directly excited by, as “grey literature” is essentially tertiary data (an extraction of synthetic data derived from the primary record). In addition, they have developed an RDF-based approach to query data stored in heterogeneous excavation databases. WOW!
And in case you missed that… querying data stored in heterogeneous excavation databases. Essentially they have resolved syntactic (platform/format), schematic (structural) and semantic (language) heterogeneities in three steps: generating mappings of key fields (i.e. a sub-set of the source data) to the English Heritage extension of the CIDOC Conceptual Reference Model (CRM), extracting the data as RDF, and providing semantic interoperability through Knowledge Organization Systems (KOS) represented in SKOS format from standard heritage thesauri. In essence, they extract RDF from relational databases using hand-crafted mappings to both SKOS, which articulates the semantics, and an ontology, which articulates the canonical concepts.
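To make the idea concrete, here is a minimal sketch of that kind of extraction in plain Python. It is not the STAR team's code: the table layout, the `example.org` namespaces, the `has_type` property and both mapping dictionaries are hypothetical stand-ins for the real CRM extension and heritage thesauri. It simply shows how a hand-crafted mapping can turn rows from a local excavation database into RDF triples that point at shared concepts.

```python
import sqlite3

# Hypothetical namespaces, for illustration only (not the real CRM or thesaurus URIs).
EX = "http://example.org/excavation/"
ONT = "http://example.org/ontology/"    # stand-in for a CRM-style ontology
THES = "http://example.org/thesaurus/"  # stand-in for a SKOS thesaurus
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hand-crafted mappings: local table -> ontology class, local term -> shared concept.
CLASS_MAPPING = {"contexts": ONT + "Context"}
TERM_MAPPING = {
    "posthole": THES + "post_hole",
    "PH": THES + "post_hole",  # a different local spelling, same concept
    "ditch": THES + "ditch",
}


def rows_to_triples(conn, site):
    """Turn each row of the local 'contexts' table into RDF-style triples."""
    triples = []
    query = "SELECT id, feature_type FROM contexts ORDER BY id"
    for context_id, feature_type in conn.execute(query):
        subject = f"{EX}{site}/context/{context_id}"
        triples.append((subject, RDF_TYPE, CLASS_MAPPING["contexts"]))
        concept = TERM_MAPPING.get(feature_type)
        if concept is not None:  # unmapped local terms are left out
            triples.append((subject, ONT + "has_type", concept))
    return triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# A tiny in-memory database standing in for one excavation archive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contexts (id INTEGER PRIMARY KEY, feature_type TEXT)")
conn.executemany(
    "INSERT INTO contexts VALUES (?, ?)",
    [(1, "posthole"), (2, "ditch"), (3, "unknown")],
)
triples = rows_to_triples(conn, "site_a")
print(to_ntriples(triples))
```

Only the mapped sub-set of the data makes it into the RDF, which mirrors the point above that the mappings cover key fields rather than the whole source database.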
The combination of RDF, ontology and SKOS has allowed the team to produce a demonstrator capable of cross-searching different excavation databases with “difficult queries”. The team demonstrated that they could address questions such as “show me contexts that satisfy the following criteria”:
- Roman corn drying ovens with palaeobotanical analysis
- Charred plant remains and charcoal from 4 post structures
- Post holes that contain ritual deposits
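As a rough illustration of how the last of these queries might work once the data is in RDF, the sketch below cross-searches triples from two sites and uses SKOS `broader` links so that a search for “ritual deposit” also finds narrower terms. Everything apart from the SKOS property URI is an assumption made for the example: the `example.org` namespaces, the `has_type` and `contains` properties, the concepts and the sample contexts are all invented, and plain Python set operations stand in for whatever query engine the demonstrator actually uses.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Hypothetical namespaces and properties, for illustration only.
EX = "http://example.org/excavation/"
ONT = "http://example.org/ontology/"
THES = "http://example.org/thesaurus/"
HAS_TYPE = ONT + "has_type"
CONTAINS = ONT + "contains"

# A toy thesaurus: each triple says "narrower concept -> broader concept".
thesaurus = {
    (THES + "votive_deposit", SKOS_BROADER, THES + "ritual_deposit"),
    (THES + "foundation_deposit", SKOS_BROADER, THES + "ritual_deposit"),
}

# Invented triples as they might be extracted from two separate databases.
site_a = {
    (EX + "site_a/context/1", HAS_TYPE, THES + "post_hole"),
    (EX + "site_a/context/1", CONTAINS, THES + "votive_deposit"),
    (EX + "site_a/context/2", HAS_TYPE, THES + "ditch"),
    (EX + "site_a/context/2", CONTAINS, THES + "ritual_deposit"),
}
site_b = {
    (EX + "site_b/context/7", HAS_TYPE, THES + "post_hole"),
    (EX + "site_b/context/7", CONTAINS, THES + "ritual_deposit"),
    (EX + "site_b/context/8", HAS_TYPE, THES + "post_hole"),
    (EX + "site_b/context/8", CONTAINS, THES + "charcoal"),
}


def narrower_closure(concept, thesaurus):
    """Return the concept plus everything transitively narrower than it."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in thesaurus:
            if p == SKOS_BROADER and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def find_contexts(triples, feature, deposit, thesaurus):
    """Contexts of the given feature type that contain the given deposit type."""
    features = narrower_closure(feature, thesaurus)
    deposits = narrower_closure(deposit, thesaurus)
    typed = {s for s, p, o in triples if p == HAS_TYPE and o in features}
    holding = {s for s, p, o in triples if p == CONTAINS and o in deposits}
    return sorted(typed & holding)


# Cross-search: merge both sites and ask one question of the combined data.
matches = find_contexts(
    site_a | site_b, THES + "post_hole", THES + "ritual_deposit", thesaurus
)
print(matches)
```

The context recorded locally as holding a “votive deposit” is returned alongside the one recorded as a “ritual deposit”, which is the kind of semantic interoperability the thesauri are there to provide.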
Granted there are limitations: it currently supports a sub-set of the data collected during excavation and the RDF model is viewed as an interim tool with users going back to the source databases to conduct further analysis. However, the concept has been definitively demonstrated. Great stuff! The impact of this work is profound: the SKOS and ontology will allow inferencing/reasoning over the data which will transform the way the data can be re-used, analysed and generalised (more on this in a bit).
The Glamorgan team have a follow-on project called Semantic Technologies Enhancing Links and Linked data for Archaeological Resources (STELLAR), funded by the AHRC. One of the aims of STELLAR is to develop “best practice guidelines and tools … both for mapping/extracting archaeological data as RDF and for generating archaeological Linked Data”. This will take the research developed in STAR and provide tools so that it can be deployed on mainstream archaeological data. I’m really looking forward to seeing the roll-out of this technology.
DART and Open Science
DART is an acronym for Detection of Archaeological Residues using remote sensing Techniques. DART is a three-year Science and Heritage initiative funded by AHRC and EPSRC, led by the School of Computing at the University of Leeds. The project aims to improve the understanding of the physical, chemical, biological and environmental factors that determine whether an archaeological feature (pit, ditch, posthole etc.) can be detected by a sensor (camera, Ground Penetrating Radar, etc.). DART brings together consultants and researchers from the areas of computer vision, geophysics, remote sensing, knowledge engineering and soil science.
Archaeological sites and features are created by localized processes of formation and deformation. There is a range of imaging instruments that can be used to detect these archaeological residues, although the knowledge required to determine what, when, how and why to use each different type of sensor is patchy. Seasonal, environmental and vegetation dynamics play a part, although the complexities of their interaction, and how they modify the “contrast signatures” derived from the existing formation and deformation processes, are uncertain.
This is important so I will provide an example: as a mud-brick built farmstead erodes, the silt, sand, clay, large clasts and organics in the mud-brick along with other anthropogenic debris are incorporated into the soil. This produces a localised variation in soil size and structure. This in turn impacts on drainage and localised crop stress and vigour. These localised variations can all provide measurable differences, or contrasts, that indicate the presence of archaeology.
For example, archaeological residues can affect drainage of the soil, which then affects the appearance of crops. Different drainage characteristics result in different soil moisture retention properties, and local variations in crop stress/vigour can be observed as differences in crop height or crop colour (essentially crop marks). Archaeological contrasts can be expressed through, for example, variations in chemistry, magnetic field, resistance, topography, temperature and spectral reflectance.
The DART project is trying to identify physical, chemical and biological contrast factors that may allow us to detect archaeological residues (both directly and by proxy) under different land-use and environmental conditions. We address the following research issues:
- What are the factors that produce archaeological contrasts?
- How do these contrast processes vary over space and time?
- What processes cause these variations?
- How can we best detect these contrasts (sensors and conditions)?
DART is committed to open science principles and aims to act as an exemplar for how data, tools, and analysis can be made available to the wider academic, heritage and general community. Data, software, algorithms and services developed throughout the project will be made available for re-use with appropriate open licences.
Licensing is an issue, as licence incompatibility can severely restrict re-use. Science Commons is establishing protocols in this area. Publicly accessible dissemination is preferred; however, where necessary, domain-specific or institutional repositories will be utilised for long-term preservation. Cameron Neylon is part of the project consortium and provides a steer on these issues.
The whole point of taking an open science position on this project is so that we can maximise the benefit and impact. The research problem is large and complex: one project will not solve it. Inevitably the science will need refining; adequate articulation will require long-term data collection under different conditions, followed by iterative hypothesis testing and modelling. The challenge is to get this information in the quickest, cheapest and easiest ways. An Open Science approach means that DART is openly collaborating with researchers and individuals throughout the world. The body of work developed within DART can be easily re-used by others: because the data and algorithms will be in the public domain, our results can be rapidly tested and evaluated. Unlocking the “body of knowledge” and “know how” surrounding a programme of research should significantly reduce the barriers to re-use. This may generate a critical mass of surrounding research, which can only improve the underlying models and science. Providing scientists with the methodology of how to make the wheel will not only stop us reinventing it, but will also improve the manufacturing process.