
Building an archaeological project repository II: Where are the research data repositories?

Guest - April 17, 2014 in CKAN, Open Science, WG Archaeology

This is a guest post by Anthony Beck, Honorary fellow, and Dave Harrison, Research fellow, at the University of Leeds School of Computing


Data repository as research tool

In a previous post, we examined why Open Science is necessary to take advantage of the huge corpus of data generated by modern science. In our project Detection of Archaeological residues using Remote sensing Techniques, or DART, we adopted Open Science principles and made all the project’s extensive data available through a purpose-built data repository built on the open-source CKAN platform. But with so many academic repositories, why did we need to roll our own? A final post will look at how the portal was implemented.

DART: data-driven archaeology

DART’s overall aim is to develop analytical methods to differentiate archaeological sediments from non-archaeological strata, on the basis of remotely detected phenomena (e.g. resistivity, apparent dielectric permittivity, crop growth, thermal properties etc). DART is a data-rich project: over a 14-month period, in-situ soil moisture, soil temperature and weather data were collected at least once an hour; ground-based geophysical surveys and spectro-radiometry transects were conducted at least monthly; aerial surveys collecting hyperspectral, LiDAR and traditional oblique and vertical photographs were taken throughout the year; and laboratory analyses and tests were conducted on both soil and plant samples. The data archive itself is in the order of terabytes.

Analysis of this archive is ongoing; meanwhile, this data and other resources are made available through open access mechanisms under liberal licences and are thus accessible to a wide audience. To achieve this we used the open-source CKAN platform to build a data repository, DARTPortal, which includes a publicly queryable spatio-temporal database (on the same host), and can support access to individual data as well as mining or analysis of integrated data.

This means we can share the data analysis and transformation processes and demonstrate how we transform data into information and synthesise this information into knowledge (see, for example, this IPython notebook which dynamically exploits the database connection). This is the essence of Open Science: exposing the data and processes that allow others to replicate and more effectively build on our science.
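As a sketch of what programmatic access to such a repository looks like, the following uses CKAN’s Action API (version 3). The host name and search term are hypothetical placeholders for illustration, not the real DART endpoints:

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode


def action_url(base, action, params=None):
    """Build a CKAN Action API (v3) endpoint URL."""
    url = f"{base.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def search_datasets(base, query, rows=10):
    """Full-text search over a CKAN catalogue via the package_search action.

    Returns the machine names of matching datasets.
    """
    with urlopen(action_url(base, "package_search",
                            {"q": query, "rows": rows})) as resp:
        payload = json.load(resp)
    if not payload.get("success"):
        raise RuntimeError("CKAN action failed")
    return [pkg["name"] for pkg in payload["result"]["results"]]
```

A call such as `search_datasets("http://dartportal.leeds.ac.uk", "soil moisture")` (host shown for illustration) would return dataset identifiers that can then be fetched individually, which is essentially what the notebook-driven analyses do behind the scenes.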

Lack of existing infrastructure

Pleased though we are with our data repository, it would have been nice not to have to build it! Individual research projects should not bear the burden of implementing their own data repository framework. This is much better suited to local or national institutions where the economies of scale come into their own. Yet in 2010 the provision of research data infrastructure that supported what DART did was either non-existent or poorly advertised. Where individual universities provided institutional repositories, these were focused on publications (the currency of prestige and career advancement) and not on data. Irrespective of other environments, none of the DART collaborating partners provided such a data infrastructure.

Data-sharing sites like Figshare did not exist when DART began – and once Figshare did exist, the size of our hyperspectral data, in particular, was quite rightly a worry. This situation is slowly changing, but it is still far from ideal. The positions taken by Research Councils UK and the Engineering and Physical Sciences Research Council (EPSRC) on improving access to data are key catalysts for change. The EPSRC statement is particularly succinct:

Two of the principles are of particular importance: firstly, that publicly funded research data should generally be made as widely and freely available as possible in a timely and responsible manner; and, secondly, that the research process should not be damaged by the inappropriate release of such data.

This has produced a simple economic issue: if research institutions cannot demonstrate that they can manage research data in the manner required by the funding councils, then they will become ineligible to receive grant funding from that council. The impact is that the majority of universities are now developing their own, or collaborating on communal, data repositories.

But what about formal data deposition environments?

DART was generously funded through the Science and Heritage Programme supported by the UK Arts and Humanities Research Council (AHRC) and the EPSRC. This means that these research councils will pay for data archiving in the appropriate domain repository, in this case the Archaeology Data Service (ADS). So why produce our own repository?

Deposition to the ADS would only have occurred after the project had finished. With DART, the emphasis has been on re-use and collaboration rather than primarily on archiving. These goals are not mutually exclusive: the methods adopted by DART mean that we produced data that is directly suitable for archiving (well documented ASCII formats, rich supporting description and discovery metadata, etc) whilst also allowing more rapid exposure and access to the ‘full’ archive. This resulted in DART generating much richer resource discovery and description metadata than would have been the case if the data was simply deposited into the ADS.

The point of the DART repository was to produce an environment which would facilitate good data management practice and collaboration during the lifetime of the project. This is representative of a crucial shift in thinking, where projects and data collectors consider re-use, discovery, licences and metadata at a much earlier stage in the project life cycle: in effect, to create dynamic and accessible repositories that have impact across the broad stakeholder community rather than focussing solely on the academic community. The same underpinning philosophy of encouraging re-use is seen at both Figshare and the DataHub. Whilst formal archiving of data is to be encouraged, if it is not re-usable – or, more importantly, easily re-usable within orchestrated scientific workflow frameworks – then what is the point?

In addition, it is unlikely that the ADS will take the full DART archive. It has been said that archaeological archives can produce lots of extraneous or redundant ‘stuff’. This can be exacerbated by the unfettered use of digital technologies – how many digital images are really required for the same trench? Whilst we have sympathy with this argument, there is a difference between ‘data’ and ‘pretty pictures’: as data analysts, we consider that a digital photograph is normally a data resource and rarely a pretty picture. Hence, every image has value.

This is compounded when advances in technology mean that new data can be extracted from ‘redundant’ resources. For example, Structure from Motion (SfM) is a Computer Vision technique that extracts 3D information from overlapping 2D images. From a series of overlapping photographs, SfM techniques can be used to extract 3D point clouds and generate orthophotographs from which accurate measurements can be taken. In the case of SfM there is no such thing as redundancy, as each image becomes part of a ‘bundle’ and the statistical characteristics of the bundle determine the accuracy of the resultant model. However, one does need to be pragmatic, and it is currently impractical for organisations like the ADS to accept unconstrained archives. That said, it is an area that needs review: if a research object is important enough to have detailed metadata created about it, then it should be important enough to be archived.
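To make the SfM point concrete, here is a minimal numpy sketch of the linear (DLT) triangulation step at the core of such pipelines: given two camera projection matrices and a matched image point in each view, recover the 3D point. This is an illustrative fragment under simplified assumptions (known cameras, perfect matches), not DART’s actual workflow; real SfM systems also estimate the cameras from feature matches and refine everything with bundle adjustment, which is exactly why every extra photograph in the ‘bundle’ adds value:

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2 : 3x4 camera projection matrices.
    x1, x2 : the point's 2D image coordinates in each view.
    Returns the 3D point in Euclidean coordinates.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: x * (P[2] @ X) - P[0] @ X = 0, etc.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # X is the null vector of A, i.e. the right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

With more than two overlapping views the stack `A` simply grows by two rows per image, and the least-squares solution gets statistically stronger, which is the sense in which no image is redundant.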

For DART, this means that the ADS is hosting a subset of the archive in long-term re-use formats, which will be available in perpetuity (which formally equates to a maximum of 25 years), while the DART repository will hold the full archive in long-term re-use formats until we run out of server money. We are in discussion with Leeds University about migrating all the data objects over to the new institutional repository with sparkling new DOIs, and we can transfer the metadata held in CKAN over to Open Knowledge’s public repository, the DataHub. In theory nothing should be lost.

How long is forever?

The point on perpetuity is interesting. Collins Dictionary defines perpetuity as ‘eternity’. However, the ADS defines ‘digital’ perpetuity as 25 years. This raises the question: is it more effective in the long term to deposit in ‘formal’ environments (with an intrinsic focus on preservation format over re-use), or in ‘informal’ environments with a focus on re-use and engagement over preservation (Flickr, Wikimedia Commons, the CKAN-based DART repository, etc.)? Both Flickr and Wikimedia Commons have been around for over a decade. Distributed peer-to-peer sharing, as used in Git, produces more robust and resilient environments which are equally suited to longer-term preservation. Whilst the authors appreciate that the situation is much more nuanced, particularly with the introduction of platforms that facilitate collaborative workflow development, this does have an impact on long-term deployment.

Choosing our licences

Licences are fundamental to the successful re-use of content. Licences describe who can use a resource, what they can do with this resource and how they should reference any resource (if at all).

Two leading organisations have developed legal frameworks for content licensing: Creative Commons (CC) and Open Data Commons (ODC). Until the release of CC version 4.0, published in November 2013, the CC licences did not cover data. Between them, CC and ODC licences can cover all forms of digital work.

At the top level the licences are permissive public-domain licences (CC0 and PDDL respectively) that impose no restrictions on the licensee’s use of the resource. ‘Anything goes’ in a public-domain licence: the licensee can take the resource and adapt it, translate it, transform it, improve upon it (or not!), package it, market it, sell it, etc. Constraints can be added to the top-level licence by employing the following clauses:

  • BY – By attribution: the licensee must attribute the source.
  • SA – Share-alike: if the licensee adapts the resource, they must release the adapted resource under the same licence.
  • NC – Non-commercial: the licensee must not use the work within a commercial activity without prior approval. Interestingly, in many areas of the world, the use of material in university lectures may be considered a commercial activity. The non-commercial restriction concerns the nature of the activity, not the legal status of the institution doing the work.
  • ND – No derivatives: the licensee cannot derive new content from the resource.

Each of these clauses decreases the ‘open-ness’ of the resource. In fact, the NC and ND clauses are not intrinsically open (they restrict both who can use the resource and what can be done with it). These restrictive clauses have the potential to produce licence incompatibilities which may introduce profound problems in the medium to long term. This is particularly relevant to the SA clause. Share-alike means that any derived output must be licensed under the same conditions as the source content. If content is combined (or mashed up) – which is essential when one is building up a corpus of heritage resources – then content created under an SA clause cannot be combined with content that includes a restrictive clause (BY, NC or ND) that is not in the source licence. This licence incompatibility has a significant impact on the nature of the data commons. It has the potential to fragment the data landscape, creating pockets of knowledge which are rarely used in mainstream analysis, research or policy making. This will be further exacerbated when automated data aggregation and analysis systems become the norm. A permissive licence without clauses like Non-commercial, Share-alike or No-derivatives removes such licence and downstream re-user fragmentation issues.
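The share-alike incompatibility described above can be made concrete with a toy model. This is a deliberate simplification for illustration only (licence compatibility is ultimately a legal question), under the assumption that a licence can be modelled as a bare set of clause codes:

```python
# The clauses that restrict re-use, per the discussion above.
RESTRICTIVE_CLAUSES = {"BY", "NC", "ND"}


def can_combine(sa_clauses, other_clauses):
    """Can SA-licensed content absorb content carrying other_clauses?

    Share-alike forces the derived work onto the SA licence, so the
    combination only works if the other content adds no restrictive
    clause that the SA licence itself lacks.
    """
    extra = (set(other_clauses) - set(sa_clauses)) & RESTRICTIVE_CLAUSES
    return not extra
```

For example, CC-BY-SA content can absorb CC-BY material (`can_combine({"BY", "SA"}, {"BY"})` holds), but not CC-BY-NC material, because the derived work cannot simultaneously satisfy the SA licence and carry the NC restriction. This is the mechanism by which restrictive clauses fragment the data commons.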

For completeness, specific licences have been created for Open Government Data. The UK’s Open Government Licence for public sector information is essentially an open licence with a BY attribution clause.

At DART we have followed the guidelines of the Open Data Institute and separated out creative content (illustrations, text, etc.) from data content. Hence, the DART content is either CC-BY or ODC-BY respectively. In the future we believe it would be useful to drop the BY (attribution) clause. This would stop attribution stacking (if the resource you are using is a derivative of a derivative of a derivative, at what stage do you stop attributing?), and anything which requires bureaucracy, such as attributing an image in a PowerPoint presentation, inhibits re-use (one should always assume that people are intrinsically lazy). There is a post advocating ccZero+ by Dan Cohen. However, impact tracking may mean that the BY clause becomes a default for academic deposition.

The ADS uses a more restrictive bespoke default licence which does not map to national or international licence schemes (they also don’t recognise non-CC licences). Resources under this licence can only be used for teaching, learning, and research purposes. Of particular concern is their use of the NC clause and possible use of the ND clause (depending on how you interpret the licence). Interestingly, policy changes mean that the use of data under the bespoke ADS licence becomes problematic if university teaching activities are determined to be commercial. It is arguable that the payment of tuition fees represents a commercial activity. If this is true then resources released under the ADS licence cannot be used within university teaching which is part of a commercial activity. Hence, the policy change in student tuition and university funding has an impact on the commercial nature of university teaching, which has a subsequent impact on what data or resources universities are licensed to use. Whilst it may never have been the intention of the ADS to produce a licence with this potential paradox, it is a problem that arises when bespoke licences are developed, even if they were originally perceived to be relatively permissive. To remove this ambiguity it is recommended that submissions to the ADS are provided under a CC licence, which renders the bespoke ADS licence void.

In the case of DART, these licence variations with the ADS should not be a problem. Our licences are permissive (by attribution is the only clause we have included). This means the ADS can do anything they want with our resources as long as they cite the source. In our case this would be the individual resource objects or collections on the DART portal. This is a good thing, as the metadata on the DART portal is much richer than the metadata held by the ADS.

Concerns about opening up data, and responses which have proved effective

Christopher Gutteridge (University of Southampton) and Alexander Dutton (University of Oxford) have collated a Google doc entitled ‘Concerns about opening up data, and responses which have proved effective’. This document describes a number of concerns commonly raised by academic colleagues about increasing access to data. For DART, two issues proved problematic that were not covered by this document:

  • The relationship between open data and research novelty and the impact this may have on a PhD submission.
  • Journal publication – specifically that a journal won’t publish a research paper if the underlying data is open.

The former point is interesting – does the process of undertaking open science, or at least providing open data, undermine the novelty of the resultant scientific process? With open science it could be difficult to directly attribute the contribution, or novelty, of a single PhD student to an openly collaborative research process. However, that said, if online versioning tools like Git are used, then it is clear who has contributed what to a piece of code or a workflow (the benefits of the BY clause). This argument is less solid when we are talking solely about open data. Whilst it is true that other researchers (or anybody else for that matter) have access to the data, it is highly unlikely that multiple researchers will use the same data to answer exactly the same question. If they do ask the same question (and making the optimistic assumption that they reach the same conclusion), it is still highly unlikely that they will have done so by the same methods; and even if they do, their implementations will be different. If multiple methods using the same source data reach the same conclusion then there is an increased likelihood that the conclusion is correct and that the science is even more certain. The underlying point here is that 21st-century scientific practice will substantially benefit from people showing their working. Exposure of the actual process of scientific enquiry (the algorithms, code, etc.) will make the steps between data collection and publication more transparent, reproducible and peer-reviewable – or, quite simply, more scientific. Hence, we would argue that open data and research novelty is only a problem if plagiarism is a problem.

The journal publication point is equally interesting. Publications are the primary metric for academic career progression and kudos. In this instance it was the policy of the ‘leading journal in this field’ that it would not publish a paper from a dataset that was already published. No credible reasons were provided for this policy – which seems draconian in the extreme. It does indicate that no one-size-fits-all approach will work in the academic landscape. It will also be interesting to see how this journal, which publishes work mainly funded by the EPSRC, responds to the EPSRC guidelines on open data.

This is also a clear demonstration that the academic community needs to develop new metrics that are more suited to 21st-century research and scholarship, directly linking academic career progression to sources of impact that go beyond publications. Furthermore, academia needs some high-profile exemplars that demonstrate clearly how to deal with such change. The policy shift and ongoing debate concerning ‘Open Access’ publications in the UK is changing the relationship between funders, universities, researchers, journals and the public – a similar debate needs to occur about open data and open science.

The altmetrics community is developing new metrics for “analyzing, and informing scholarship” and has described its ethos in its manifesto. The Research Councils and Governments have taken a much greater interest in the impact of publicly funded research. Importantly, public, social and industry impact are as important as academic impact. It is incumbent on universities to respond to this by directly linking academic career progression through to impact and by encouraging improved access to the underlying data and processing outputs of the research process through data repositories and workflow environments.

Building an archaeological project repository I: Open Science means Open Data

Guest - February 24, 2014 in CKAN, Open Science, WG Archaeology

This is a guest post by Anthony Beck, Honorary fellow, and Dave Harrison, Research fellow, at the University of Leeds School of Computing.

In 2010 we authored a series of blog posts for the Open Knowledge Foundation subtitled ‘How open approaches can empower archaeologists’. These discussed the DART project, which is on the cusp of concluding.

The DART project collected large amounts of data, and as part of the project, we created a purpose-built data repository to catalogue this and make it available, using CKAN, the Open Knowledge Foundation’s open-source data catalogue and repository. Here we revisit the need for Open Science in the light of the DART project. In a subsequent post we’ll look at why, with so many repositories of different kinds, we felt that to do Open Science successfully we needed to roll our own.

Open data can change science

Open inquiry is at the heart of the scientific enterprise. Publication of scientific theories – and of the experimental and observational data on which they are based – permits others to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge. Science’s powerful capacity for self-correction comes from this openness to scrutiny and challenge. (The Royal Society, Science as an open enterprise, 2012)

The Royal Society’s report Science as an open enterprise identifies how 21st century communication technologies are changing the ways in which scientists conduct, and society engages with, science. The report recognises that ‘open’ enquiry is pivotal for the success of science, both in research and in society. This goes beyond open access to publications (Open Access), to include access to data and other research outputs (Open Data), and the process by which data is turned into knowledge (Open Science).

The underlying rationale of Open Data is this: unfettered access to large amounts of ‘raw’ data enables patterns of re-use and knowledge creation that were previously impossible. The creation of a rich, openly accessible corpus of data introduces a range of data-mining and visualisation challenges, which require multi-disciplinary collaboration across domains (within and outside academia) if their potential is to be realised. An important step towards this is creating frameworks which allow data to be effectively accessed and re-used. The prize for succeeding is improved knowledge-led policy and practice that transforms communities, practitioners, science and society.

The need for such frameworks will be most acute in disciplines with large amounts of data, a range of approaches to analysing the data, and broad cross-disciplinary links – so it was inevitable that they would prove important for our project, Detection of Archaeological residues using Remote sensing Techniques (DART).

DART: data-driven archaeology

DART aimed to develop analytical methods to differentiate archaeological sediments from non-archaeological strata, on the basis of remotely detected phenomena (e.g. resistivity, apparent dielectric permittivity, crop growth, thermal properties etc). The data collected by DART is of relevance to a broad range of different communities. Open Science was adopted with two aims:

  • to maximise the research impact by placing the project data and the processing algorithms into the public sphere;
  • to build a community of researchers and other end-users around the data so that collaboration, and by extension research value, can be enhanced.

‘Contrast dynamics’, the type of data provided by DART, is critical for policy makers and curatorial managers to assess both the state and the rate of change in heritage landscapes, a need wrapped up in national commitments to the European Landscape Convention (ELC). Making the best use of the data, however, depends on openly accessible dynamic monitoring, along similar lines to that proposed by the European Space Agency for the Global Monitoring for Environment and Security (GMES) satellite constellations. What is required is an accessible framework which allows all this data to be integrated, processed and modelled in a timely manner. The approaches developed in DART to improve the understanding and enhance the modelling of heritage contrast detection dynamics feed directly into this long-term agenda.

Cross-disciplinary research and Open Science

Such approaches cannot be undertaken within a single domain of expertise. This vision can only be built by openly collaborating with other scientists and building on shared data, tools and techniques. Important developments will come from the GMES community, particularly from precision agriculture, soil science, and well documented data processing frameworks and services. At the same time, the information collected by projects like DART can be re-used easily by others. For example, DART data has been exploited by the Royal Agricultural University (RAU) for use in such applications as carbon sequestration in hedges, soil management, soil compaction and community mapping. Such openness also promotes collaboration: DART partners have been involved in a number of international grant proposals and have developed a longer term partnership with the RAU.

Open Science advocates opening access to data, and other scientific objects, at a much earlier stage in the research life-cycle than traditional approaches. Open Scientists argue that research synergy and serendipity occur through openly collaborating with other researchers (more eyes/minds looking at the problem). Of great importance is the fact that the scientific process itself is transparent and can be peer reviewed: as a result of exposing data and the processes by which these data are transformed into information, other researchers can replicate and validate the techniques. As a consequence, we believe that collaboration is enhanced and the boundaries between public, professional and amateur are blurred.

Challenges ahead for Open Science

Whilst DART has not achieved all its aims, it has made significant progress and has identified some barriers in achieving such open approaches. Key to this is the articulation of issues surrounding data-access (accreditation), licensing and ethics. Who gets access to data, when, and under what conditions, is a serious ethical issue for the heritage sector. These are obviously issues that need co-ordination through organisations like Research Councils UK with cross-cutting input from domain groups. The Arts and Humanities community produce data and outputs with pervasive social and ethical impact, and it is clearly important that they have a voice in these debates.

Cultural Anthropology journal to go Open Access by 2014

Theodora Middleton - March 13, 2013 in Open Access, WG Archaeology

We’re really pleased by this week’s announcement from the Society for Cultural Anthropology that their influential journal, Cultural Anthropology, will become open access by next year. The plan is that from the first issue of 2014, the journal will be available online globally under an open access license, along with 10 years’ worth of the back catalogue.

From their press release:

This is a boon to our authors, whose work we can guarantee the widest possible readership – and to a new generation of readers inside of anthropology and out. Cultural Anthropology will be the first major, established, high-impact journal in anthropology to offer open access to all of its research, and we hope that our experience with open access will provide the AAA as a whole, as well as other journals in the social and human sciences, valuable guidance as we explore alternative publishing models together.

As far as we can see, the specifics of licensing are yet to be figured out, as are other logistical questions like where the journal will be hosted and what its financial model is going to look like. There is still a lot of work to be done, then, in making this a sustainable and truly open reality, but we’re really happy they’re taking the plunge!

Look out for opportunities to discuss these transitional issues on their website.

A Year in the Life of Open Archaeology (and some upcoming events to look out for)

Stefano Costa - December 5, 2011 in WG Archaeology

This update from the working group on Open Data in Archaeology is brought to you by Nicole Beale and Leif Isaksen. Nicole is a PhD candidate based in the Archaeological Computing Research Group and the Web Science Research Group, University of Southampton. Leif is a Research Fellow in the Archaeological Computing Research Group, University of Southampton.

As 2011 draws to an end, it seemed timely to put together a quick update on a year’s happenings around Open Archaeology, as well as a brief overview of upcoming events relating to open access within the discipline and sector of Archaeology.

Forthcoming 2012 Open Archaeology Events

Over the next six months, there are a few events that will contribute to the ongoing effort to promote the importance of open access, open data, and open knowledge within Archaeology. In particular, the annual Computing Applications and Quantitative Methods in Archaeology Conference 2012, which is being hosted by the Archaeological Computing Research Group at the University of Southampton in the UK (26-30th March 2012), will include a number of prominent ‘Open Archaeology’ events:

  • Nicole Beale and Leif Isaksen (disclaimer: this is us!) will be chairing a session that is intended to provide a showcase for projects and theory related to the subject of Open Content in Archaeology. The session intends to cover legal and practical issues and end with a discussion of lessons learned and future action. Session details: The Shoulders of Giants: Open Content in Archaeology
  • Matteo Romanello, Felix Schäfer and Reinhard Förtsch will be chairing a session considering the use of linked open data for the study of the ancient world, considering opportunities and challenges represented by issues such as publication of data, use of live applications, digital libraries and URIs of objects. Session details: Linked Open Data for the Ancient World

There are also numerous other sessions that will be including papers covering open data. The call for abstracts has been extended until the 7th December 2011, so please do submit soon if you are planning to contribute to these sessions! CAA abstract submission details.

In an exciting development, CAA2012 introduces the annual CAA Recycle Award. The CAA Recycle Award seeks to recognise those who “breathe new life into old data”, and will be presented jointly to:

  • The best exemplar of data re-use at a CAA International Conference (the recycler)
  • The project or institution that made available the source dataset/s (the originator/s).

To follow CAA2012 on Twitter, use the hashtag #caa2012 or the user account @caasoton.

There has been much work to advertise the benefits of open access in archaeology, and the forthcoming events continue this great trend.

Outline of Open Archaeology of 2011

So, a quick review of 2011 follows. I have picked out some notable projects and events here, but by no means have I intended to cover all of the great open content/access/data/science Archaeology projects and events that occurred in 2011. If I have missed any useful references, please do submit them to this post via the comments thread below.

On 24th March, the PELAGIOS (Pelagios: Enable Linked Ancient Geodata In Open Systems) project, which uses Linked Open Data to refer to places in the ancient world, ran a workshop at King’s College London on Linking Open GeoData in the Humanities. The workshop covered three key themes: referencing ancient and contemporary places online, lightweight ontology approaches, and methods for generating, publishing and consuming compliant data. Gregory Marler’s workshop write-up provides a useful summary.

In April, the Research Information Network’s report “Reinventing research? Information practices in the humanities”, which provided case studies for discovery and use of information, mentioned COPAC (an open access catalogue, integrating numerous databases), and put forward open access journals as a desirable dissemination practice.

In mid-May the workshop “Archaeologists & the Digital: Towards Strategies of Engagement”, held with the Centre for Audio Visual Study and Practice in Archaeology and the Archaeology and Communication Research Network at UCL Institute of Archaeology, included presentations and discussions on the benefits of open access for archaeology. In particular, Brian Hole’s presentation, ‘Open Access and Open Data – and why they matter for archaeology’, covered the opportunities open access provides for collaboration and research not previously possible. Hole discussed the potential of repositories and appropriate licensing. Hole’s presentation is available through Prezi, and Daniel Pett’s write-up of the workshop is available through the 7 Pillars of Wisdom blog. If you have access to the Public Archaeology journal, there is also a review article covering the event by Pett available there. Reference: D. Pett, “Review Article. Archaeologists & the Digital: Towards Strategies of Engagement. A workshop of The Centre for Audio-Visual Study and Practice in Archaeology and the Archaeology and Communication Research Network at UCL Institute of Archaeology, 26th May 2011,” Public Archaeology, vol. 10, no. 2, pp. 119-127, May. 2011.

In the summer, the Archaeology Data Service released, as part of Data Train, a set of open access teaching materials on the management of research data for Archaeology postgraduate students.

The excellent Day of Archaeology on 29th July, in which over 400 archaeologists participated, included numerous references to the benefits of open access and open data. Some of those posts are included below:

In September, Ant Beck presented the Detection of Archaeological Residues using remote sensing Techniques (DART) project, which embraces an Open Science approach, at the British Science Festival (read the press pack here).

In this same month, the British Museum released a semantic web endpoint to the Collection Online search tool. The press release on the ResearchSpace site told us that “The Museum is the first UK arts organisation to instigate a Semantic Web version of its collection data. The new service brings the British Museum into the ‘linked data’ world and will allow software developers to produce their own applications that can directly manipulate and reuse the data.” The collection data has been mapped to the CIDOC-CRM, and is available on the Collection Space of the British Museum.

In October, the e-journal Internet Archaeology (which is based on a hybrid open access model) went fully open access as part of Open Access Week (24-30th October). Press releases and mailing list messages informed readers that this was in anticipation of plans to “move fully towards a sustainable Open Access (OA) model.”

Phew, quite an eventful year. Here’s to 2012 providing as many, if not more, excellent opportunities to promote open access, open data, open science, and open knowledge in Archaeology. I for one am most excited about the CAA2012, where I am sure that we will see many great open data examples.

Nicole Beale and Leif Isaksen

Coarse Glazed Ware IV

Cultural Heritage rights in the age of digital copyright

Stefano Costa - December 21, 2010 in COMMUNIA, Events, Public Domain, WG Archaeology, WG Cultural Heritage, Workshop

The following guest post is from Stefano Costa at the University of Siena. Stefano is Founder of the IOSA initiative and Coordinator of the Open Knowledge Foundation‘s Working Group on Open Data in Archaeology.

On December 10th the COMMUNIA WG3 gathered in Istanbul for the final workshop, with the aim of producing a set of recommendations about cultural heritage and the public domain.

I am not a lawyer, so I took the chance to learn about the marked differences between access rights and property rights. More than that, it soon became clear that Cultural Heritage rights (CHR) only exist in certain EU member states (e.g. Italy, Greece) while in others there are no such rights.

This poses a first set of basic problems: a Finnish tourist taking a photograph of the Parthenon in Athens might actually be violating Greek CHR, especially if she’s going to publish the resulting image on the Web. The same would happen in Italy, not just inside museums but also for public buildings and panoramas. On the other hand, Portugal lists only 5 buildings that cannot be freely photographed. Apparently Finland poses no restrictions on photographing CH, be it historical buildings or artistic creations.

CH laws were mostly conceived in a pre-digital age, and even those that were recently revamped (like the Italian case) apparently ignore the ease of creating digital reproductions of CH items at no cost and with no risk of damaging the items themselves. Cultural Heritage institutions (CHI) claim quasi-property rights over the artifacts they are custodians of, thus posing serious restrictions not just on personal usage, but also on the development of public repositories like Wikimedia Commons. As the recent GLAMWIKI event at the British Museum showed, some institutions are engaging with open content creators in a positive way, asserting their primacy by sharing the knowledge they have, rather than closing their doors and keeping the best for themselves.

In the case of licensing, the widespread distinction between commercial and non-commercial use is really harmful and poses more problems than it solves. What is particularly frustrating is that this distinction doesn’t take into account the existence of the Commons and of the Public Domain, in other words content that can be both commercial and non-commercial at the same time. A photographer might want to publish her photographs of Archaic korai under a CC-BY-SA license, thus enabling any kind of reuse, from incorporation into Wikimedia Commons to publication in a tourist guide or a textbook.

Here a further distinction is worth making: most CH items are in the Public Domain themselves (because they were made several centuries ago), but the same doesn’t currently apply to their digital reproductions. If the reproduction is basically a mechanical operation, one might argue that no copyright should apply to the reproduction either. Clearly, drawing the line between a work that is creative and one that is not is going to be very dangerous in the case of photography, and ultimately impossible (think about those monuments that are photographed thousands of times per day).

The fact that going into these subtle juridical details takes so much time and effort is, alone, a good example of the difficulties that this double layer of rights poses.

The recommendations we collected are aimed at clarifying the nature and extent of CHR, and at maximising the benefits for the Commons and the Public Domain. CHR should not be property rights but rather access rights, thus posing no limitations on subsequent copies of the first reproduction once this takes place. If there is going to be a fee for commercial use of reproductions, the process has to be easy and quick. The policy for museum visitors should be “open by default”, and larger institutions (or networks) might ask digital publishers like bloggers and wikipedians to link back to the original item – even though this assumes that there’s a digital collection available on the Web. Licensing of such collections is beyond the scope of COMMUNIA, and CH is also explicitly excluded from the EU PSI directive. There was some work done by the LAPSI project at the last meeting in Barcelona about this, and the survey launched by the European Commission might help in changing this situation. Clearly, countries like Italy and Greece might see this as “selling out” one of their major assets for economic development. We believe the opposite, and tried to develop our discussion around the concept of cultural heritage as infrastructure, just like the road network or public green spaces, which needs to be maintained for the benefit of all citizens and the overall development of society.

CHI want to retain control over items and buildings that they often regard as “theirs”, but this need has to coexist with the fact that millions of people want to share digital content about cultural heritage on the web. Ultimately, this should be regarded as a very positive thing, if the mission of institutions is to maximise public awareness of Cultural Heritage and the impact it has on the social and economic life of EU citizens.

Enriched publications in Dutch archaeology

Guest - September 20, 2010 in WG Archaeology

The following guest post is from Janneke Adema, researcher in the department of media and communications at the University of Coventry, member of the Open Knowledge Foundation’s Working Group on Open Data in Archaeology and coordinator of the OKF Working Group on Open Resources in the Humanities.

Archaeological data are currently relegated to appendices in traditional archaeological publishing. Digital publishing of monographs and journals enables experiments in integrating text and images with other content formats, thus bridging the gap between the presentation of results and the dissemination of research data. This post is about research performed in the Netherlands for the Journal of Archaeology in the Low Countries (JALC). In the Netherlands, archiving of research data is done in a centralized way, and part of the system is already Open Access. The findings and results of the JALC project on Enriched publications in Dutch Archaeology are relevant beyond Dutch borders, and should be taken into consideration for the development of a sustainable strategy for Open Data in archaeology.

The enrichment of publications in archaeology, focusing on an integrated presentation of publications and research data, is rather a new development. However, it is a development that offers huge potential for archaeological communication. Many aspects still need to be resolved and explored pertaining to specific technological, archaeological and user issues. Over the last year, as part of the SURFshare project “Enriched publications in Dutch Archaeology”, we have been conducting research into user needs and expectations concerning enriched publications. The Enriched publications project was based on setting up an infrastructure between the Open Access e-journal Journal of Archaeology in the Low Countries (JALC) and the ‘e-depot Nederlandse archeologie’ (EDNA), in cooperation with Data Archiving and Networked Services (DANS) and the Digitaal Productiecentrum (DPC). Within this project the importance of the input and involvement of future users of the yet-to-be-created system was clearly felt.

The main questions we wanted to see answered in the user study were firstly, whether there was support in the Dutch and Belgian archaeological community for enhanced publications; secondly, what archaeologists see as the main possibilities and drawbacks of enhanced publications; and thirdly, what kind of services or enhancements they would most like to see. The first phase of research was based on interviews with 14 archaeologists and a literature study. The second phase involved an external evaluation, using an online survey, after the publication of the first two issues of JALC.

More information about the project, including details on the user needs studies as well as the results of the other work packages, can be found at the project website.

Benefits and drawbacks

The potential benefits of an enhanced publication according to the interviewees concerned the addition of material that would otherwise not fit in a printed publication; the increased efficiency of scholarly communication, leading to increased data transparency; and the wider dissemination of their work to, and the sharing of data with, their peers. The drawbacks of the new format were also clearly seen and felt, especially concerning the extra work and time that will go into enhancing (particularly given the lack of true incentive), the potential for enhancement to cause information overload and distraction, the financing and upkeep (data interoperability, solid infrastructure) of the enhancements, and the ownership, quality establishment, and peer review of the additional material.

We also asked which added services or enhancements the interviewees would most like to see. Apart from the possibility of GIS maps (which most of the interviewees were quite aware might be hard to implement, but which they would love to have), many of the enhancements deemed most important are the most basic ones: the possibility of adding color, enhanced search options, and the possibility of adding a database or dataset of images.

This preference for basic services and enhancements was one of the indications which supported the notion that, at least within our interviewee group, the static print-based article or print paradigm still seemed to be very much the norm. The added services were mostly seen as enhancements of things that are simply harder to achieve in a print publication. Reading from print is still preferred and a rather ‘traditional’ view pertains when it comes to formal publishing concerning quality standards, peer review, copyright issues, and the updating of papers. There seemed to be no support within our archaeological test-community to go from mere enhancements to more liquid and fluid forms of publications where the article becomes more wiki-like for instance.


After the first two issues of JALC were published—which included a number of enhanced publications—we went back to our test-community to see whether their views relating to the enhancements had changed. This evaluation showed an increased support and enthusiasm for the enhancements. In general the respondents were quite positive about the enhancements that were eventually established in JALC, especially with respect to the more elaborate ones, for instance the GIS applications. A large majority of the participants believed the enhancements improved the quality of the publications. Some of them even felt they made you focus more on the text. A majority of the participants were also more willing to provide an enhanced publication themselves after having seen the enhancements in JALC.

Comments focused mainly on technical details, on formats and design, as well as on the general look of the enhancements. It was clearly felt the enhancements ‘should really add something’: in other words, the benefits of the extra digital products should be clear and meaningfully related to the text. One of the main concerns had to do with the financial sustainability of the enhancements. It is not clear whether the fact that JALC is an Open Access journal contributed to this insecurity about financial sustainability. However, it was felt that more information and support from the publisher’s side (or a link to more information) could be beneficial, as could more information on who pays for what and how the costs of set-up and maintenance are met. Furthermore, our research showed that information, guidance, and advice from the publisher on creating and delivering an enhanced publication is essential, at least for the time being.

Conclusion: missionary work still needed

From the user needs research and the evaluation afterwards, we concluded that there is a large base of potential support for enhanced publications but a lot of missionary work still needs to be done. This is necessary not so much to show the community what the benefits of enhanced publications are, but rather to relieve fears and uncertainties regarding the new format. Secondly, we concluded that more experiments need to be done to establish a clear infrastructure and clearer policies when it comes to enhanced publications, to give an example to archaeologists of what an enhanced publication might look like in the context of the Open Access e-journal JALC. Practical experiments should still take the print paradigm and more traditional scholarly communication methods as their starting point, to ensure the best uptake of the new format in the archaeological community and to appease fears and uncertainties.

Pollen data in the New and Old World

Guest - July 14, 2010 in External, Open Data, Open Science, Open/Closed, Technical, WG Archaeology, WG Open Data in Science

The following guest post is from Stefano Costa at the University of Siena. He is Founder of the IOSA initiative and Coordinator of the Open Knowledge Foundation‘s Working Group on Open Data in Archaeology. Stefano wishes to thank Thomas Kluyver and David Jones for their help in reviewing the post.

Since the 19th century, the study of archaeobotanical remains has been very important for combining “strict” archaeological knowledge with environmental data. Pollen data make it possible to assess the introduction of certain domesticated species of plants, or the presence of other species that typically grow where humans dwell. Not all pollen data come from archaeological fieldwork, and pollen analysis is often done by ecologists without a particular focus on human-associated plants. However, from an archaeologist’s perspective the relationship between the two sets is strong enough to take an interested look at pollen data worldwide, their availability and most importantly their openness, for which we follow the Open Knowledge Definition.

We found that there is a serious misunderstanding by universities and research centers of their role in society as places of research and innovation whose results should be available to everyone. As with dendrochronological data, academia is a closed system producing data (at very high cost to society) that are only available inside its walls, even though it is all done with public money.

Finding pollen data

The starting point for finding pollen data is the NOAA website.

The Global Pollen Database hosted by the NOAA is a good starting point, but apparently its coverage is quite limited outside the US. Furthermore, data from 2005 onwards aren’t available via FTP in simple documented formats, but are instead downloadable as Access databases from another external website. Describing Access databases as a Bad Choice™ for data exchange is perhaps a euphemism.

Unfortunately, the number of databases covering single continents or smaller regions keeps growing, and their approaches to data dissemination show marked differences.


For both North and South America, you can get data from more than one thousand sites directly via FTP. There are no explicit terms of use. Usually, data retrieved from federal agencies are public domain data.

The README document only states: “NOTE: PLEASE CITE ORIGINAL REFERENCES WHEN USING THIS DATA!!!!!”. Fair enough: the requirement for attribution is certainly compatible with the Open Knowledge Definition.


From the GPD website we can easily reach the European Pollen Database, which is found at another website though (and things can be even more confusing, given that the NOAA website has some dead links).

You can download EPD data in PostgreSQL dump format (one file for each table, with a separate SQL script create_epd_db.sql). Data in the EPD can be restricted or unrestricted. That’s fine, let’s see how many unrestricted datasets there are. Following the database documentation, the P_ENTITY table contains the use status of each dataset:

steko@gibreel:~/epd-postgres-distribution-20100531$ cat p_entity.dump |
awk -F "\t" '{ print $5 }' | sort | uniq -c
    154 R
   1092 U

which is pretty good because almost 88% of them are unrestricted (NB I write most of my programs in Python but I love one-liners that involve awk, sort and uniq). We could easily create an “unrestricted” subset and make it available for easy download to all those who don’t want to mess with restricted data.
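The same tally can be sketched in Python too. This is a minimal sketch, assuming (as the one-liner does) that the dump is tab-separated with the use-status code (‘R’ restricted, ‘U’ unrestricted) in the fifth column; the sample rows below are invented stand-ins for the real p_entity dump:

```python
from collections import Counter

def count_use_status(dump_lines):
    """Tally the use-status codes found in the fifth
    tab-separated column of each non-empty row."""
    return Counter(line.split("\t")[4] for line in dump_lines if line.strip())

# Toy rows standing in for the real p_entity dump:
sample = [
    "1\tsite-a\tcore\tAB\tU",
    "2\tsite-b\tcore\tCD\tR",
    "3\tsite-c\tcore\tEF\tU",
]
counts = count_use_status(sample)
print(counts["U"] / sum(counts.values()))  # fraction of unrestricted datasets
```

From here, filtering out the restricted rows to build the “unrestricted” subset is a one-line list comprehension.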

But what does “unrestricted” mean for EPD data? Let’s take a more careful look (emphasis mine):

  1. Data will be classified as restricted or unrestricted. All data will be available in the EPD, although restricted data can be used only as provided below.
  2. Unrestricted data are available for all uses, and are included in the EPD on various electronic sites.
  3. Restricted data may be used only by permission of the data originator. Appropriate and ethical use of restricted data is the responsibility of the data user.
  4. Restrictions on data will expire three years after they are submitted to the EPD. Just prior to the time of expiration, the data originator will be contacted by the EPD database manager with a reminder of the pending change. The originator may extend restricted status for further periods of three years by so informing the EPD each time a three-year period expires.

Sounds quite good, doesn’t it? “for all uses” is reassuring and the short time limit is a good trade-off. The horror comes a few paragraphs below with the following scary details:

  1. The data are available only to non-profit-making organizations and for research.
  2. Profit-making organizations may use the data, even for legitimate uses, only with the written consent of the EPD Board, who will determine or negotiate the payment of any fee required.

Here the false assumption that only academia is entitled to perform research is taken for granted. And there are even more rules about the “normal ethics”: basically, if you use EPD data in a publication, the original data author should be listed among the authors of the work. I always thought citation and attribution were invented for that exact purpose, but it looks like they have a distinctly different approach to attribution. The EPD is even deciding what counts as “legitimate” use of pollen data (I can hardly think of any possible illegitimate use).


For “Africa” read “Europe” again, because most research projects are from French and English universities. For this reason, the situation is almost the same. What is even worse is that in developing countries there are far fewer people or organizations that can afford to buy those data, notwithstanding the fact that in regions under rapid development the study and preservation of environmental resources are of major importance.

Data are downloadable for individual sites using a search engine, in Tilia format (not ASCII, unfortunately). The problems emerge with the licence:

The wording is almost exactly the same as for the EPD seen above:

Normal ethics pertaining to co-authorship of publications applies. The contributor should be invited to be a co-author if a user makes significant use of a single contributor’s site, or if a single contributor’s data comprise a substantial portion of a larger data set analysed, or if a contributor makes a significant contribution to the analysis of the data or to the interpretation of the results. The data will be available only to non-profit-making organisations and for research. Profit-making organisations may use the data for legitimate purposes, only with the written consent of the majority of the members of the Advisory board, who will determine or negotiate the payment of any fee required. Such payment will be credited to the APD.


The only positive bit of the story, if any, is that these datasets are nevertheless available on the web, and their terms of use are clearly stated, no matter how restrictive. It would be just impossible to write a similar article about archaeological pottery, or zooarchaeological finds.

Appendix: Using pollen data

Pollen data are usually presented in the form of synthetic charts where both stratigraphic data and quantitative pollen data are easily readable. Each “column” of the chart stands for a species or genus. You can create this kind of visualization with free software tools.

The stratigraph package for R can be used for

plotting and analyzing paleontological and geological data distributed through time in stratigraphic cores or sections. Includes some miscellaneous functions for handling other kinds of palaeontological and paleoecological data.

See the chart for an example of how they look.

Open Context

Guest - July 13, 2010 in External, Open Access, WG Archaeology, Working Groups

The following guest blog is from Open Context’s Project Lead Eric Kansa and Editor Sarah Whitcher Kansa, who are both members of the Open Knowledge Foundation‘s Working Group on Open Data in Archaeology.

About Open Context

Open Context is a free, open access resource for the electronic publication of primary field research from archaeology and related disciplines. We developed it to help scholars and students easily find and reuse field science data and media. The system makes data searchable and citable, with robust archival support from the California Digital Library. The Alexandria Archive Institute, an independent 501(c)(3) non-profit organization, maintains Open Context and provides editorial oversight for Open Context content. The project has been funded by the William and Flora Hewlett Foundation, the National Endowment for the Humanities (NEH), and the Institute of Museum and Library Services (IMLS).

Key Features of Open Context:

  • Data publication (with peer review, if desired) of datasets, images, maps, and related items.
  • Stable URL to every project and individual item within a project. Projects, items, and groups of items can be cited elsewhere and permanently linked to print publications.
  • Citation provided for every project and item.
  • Project and Person information to provide the user with more in-depth knowledge about the author and background of the study.
  • Faceted navigation that enables users to compose analytically precise searches and queries through a simple point and click interface.
  • Web services, including Atom feeds, so that content can be syndicated and visualized elsewhere on the Web.
  • Creative Commons licenses so that datasets are legally free for reuse.

Open Data Publication and Archiving with Open Context

Open Context emphasizes publication to work with familiar patterns in scholarly communication and encourage data dissemination in the research community. To this end, Open Context does not disseminate raw data but instead relies on editorial supervision to add description, documentation, and structure to researcher-contributed content. This transforms raw data into a more polished and intelligible product that is still as detailed and comprehensive as the original field documentation.

In development since late 2006, Open Context now hosts over 180,000 items, including nearly 5,000 media items, from 35 archaeological sites around the world. The current rate of publication is about one project per month, and we hope to increase that rate as our publication tools become more streamlined. While Open Context contains mainly archaeological content, it can also accommodate content from other field-based sciences (public health, conservation biology, geological sciences, etc.), so please feel free to get in touch if you have data you would like to publish.

To see some of this at work, check out the recently-published Aegean Prehistory Project, featuring data on shells recovered from three archaeological sites in the Aegean. Canan Çakırlar published these data as an online appendix to the printed publication of her Ph.D. dissertation. In addition to an overview of her project and a link to where the printed publication can be purchased, Canan also has a “person page” with information about her work, publications, etc. Her data have been drawn (via an Atom feed) into BoneCommons, a Web resource for the worldwide zooarchaeology community. Thus, Canan’s work can be found via a search engine, Open Context, BoneCommons, or any other place that draws her content from Open Context. This makes for maximum exposure of her work to her colleagues, as well as its discovery by others for uses beyond archaeomalacology.

How Open is Open Context?

Open Context requires use of Creative Commons licenses (or the CC-Zero public domain dedication). Open Context also makes all data, including structured data (in XML, JSON, and CSV formats) freely available with no login barrier.

However, Open Context does permit use of license variants that restrict commercial use. While this restriction does inhibit interoperability, some stakeholder communities, especially indigenous groups, have deep historical and political concerns regarding commercial uses of cultural heritage materials. These concerns represent complex ethical challenges, but do highlight how the ideal of “openness” needs to be evaluated by other ethical considerations.

Incentives and Guidance for Openness

The National Science Foundation recently announced additional requirements for grant-seekers to develop meaningful “Data Access” plans for their proposals. Many researchers will have little background or understanding of how best to meet this requirement. To offer guidance to the researcher community, Open Context now offers guidelines for researchers to prepare data for online publication. In addition, we have developed an online estimation tool, which helps scholars budget appropriately for data sharing, guides them through licensing choices, and offers tips on good practices in data sharing. The estimation tool then returns text that researchers can use in their NSF Data Access plans.

Dig the new breed, Part III – wrapping it all up

jwalsh - June 11, 2010 in External, Ideas and musings, Uncategorized, WG Archaeology

This is the third in the amazing series of guest blogs from Ant Beck on the impact of linked open data for archaeology.

Part 1: New approaches to archaeological data analysis, as seen in the DART and STAR projects
Part 2: Considering the ethics of sharing archaeological knowledge

OK, to recap we have:

  • A scientific movement that advocates open approaches to data, theory and practice
  • Emerging foundational interoperability using semantic web technology
  • The potential to remove a barrier and facilitate the submission of primary data

These three powerful factors could prove to be highly disruptive. In combination they have the potential to turn archaeological data and data repositories from static siloed islands (containing data that is increasingly stale) into an interlinked network of data nodes that reflect changes dynamically.

The linch-pin is the use of triplestores (RDF databases) that provide persistent identifiers. Persistent identifiers allow us to refer to a digital object (a statement, a file or a set of files) in perpetuity, even if the underlying storage location moves. This means links between objects are persistent: therefore, when an observation or interpretation changes, its effects are propagated through to all the data/events that link to it. I see organisations such as the ADS, Talis (an innovative semantic web technology provider whose Talis Platform includes a free RDF hosting service for open data) and national heritage bodies providing such services.

Some open science projects are likely to adopt RDF as their de facto data sharing format. RDF triples (subject, predicate, object) provide a schema-transparent mechanism for data storage. They are not ideal for all data types (raster data structures, for example) but when used with ontologies and SKOS, as demonstrated by STAR, they are powerful analytical, search and inference tools.
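As a toy illustration of the triple model, statements can be held as plain (subject, predicate, object) tuples and queried by pattern matching. This is only a sketch: the identifiers below (sherd42, hasType, and so on) are invented, and a real deployment would use a proper triplestore and published vocabularies rather than bare strings:

```python
# Triples as (subject, predicate, object) tuples -- a toy
# stand-in for a real RDF triplestore.
triples = {
    ("sherd42", "hasType", "TypeIIb"),
    ("sherd42", "foundIn", "context17"),
    ("TypeIIb", "sameTypeAs", "TypeIVd"),  # a reconciliation statement
}

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Everything recorded about sherd42:
for t in sorted(match(s="sherd42")):
    print(t)
```

The point of the model is that a reconciliation statement like `sameTypeAs` can be added later without changing any schema, and every query over the graph immediately sees it.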

So, what is the importance of storing heritage data in RDF? Well, it depends which point of view you take. From a data management perspective there is no longer any need to migrate data formats. However, to facilitate re-use, different “views” of the RDF model can be generated and incorporated into traditional analytical software, such as GIS. Importantly, analysis stops being a “knowledge backwater”: new knowledge can be appended back into the triplestore.

Linked Data concepts in archaeology


From a data curation, re-use and analysis perspective the quality of the data has the potential to be dramatically improved. Deposition is no longer the final act of the excavation process: rather it is where the dataset can be integrated with other digital resources and analysed as part of the complex tapestry of heritage data. The data does not have to go stale: as the source data is re-interpreted and interpretation frameworks change these are dynamically linked through to the archives, hence, the data sets retain their integrity in light of changes in the surrounding and supporting knowledge system.

An example is probably useful at this juncture. In addition to many other things, pottery provides essential dating evidence for archaeological contexts. However, pottery sequences are developed on a local basis by individuals with imperfect knowledge of the global situation. This means there is overlap, duplication and conflict between different pottery sequences, which are periodically reconciled (your Type IIb sherd is the same as my Type IVd sherd and we can refine the dating range… Hurrah… now let's have another beer). This is the perennial process of lumping and splitting inherent in any classification system. Updated classifications and probable dates allow us to re-examine our existing classifications. One can reason over the data to find out which contexts, relationships and groups are impacted by a change in the dating sequences, either by proxy or by logical inference (a change in the date of a context produces a logical inconsistency with a stratigraphically related group).

While we're on the topic of stratigraphy, an area of notorious tedium and poor-quality data (often with conflicting relationships): RDF allows rapid logical consistency checking, as stratigraphic relationships are basically a graph and RDF triples are a graph database. Publicly deposited RDF data should be linked data: this means that all the primary data archives are linked to their supporting knowledge frameworks (such as a pottery sequence). When a knowledge framework changes, the implications are propagated through to the related data dynamically. This means that policy, development control and research decisions are based upon data that reflects the most up-to-date information and knowledge… cool, huh?
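The stratigraphy-as-graph point can be sketched directly. "A is above B" relations form a directed graph, and a contradictory archive (context A above B, B above A, directly or through a chain of intermediaries) shows up as a cycle. A hypothetical consistency check, not tied to any real recording system or reasoner, using depth-first search:

```python
from collections import defaultdict

def find_inconsistency(above):
    """Detect a cycle in 'is stratigraphically above' relations via DFS.

    `above` is a list of (upper, lower) context pairs. Returns True if
    the relations are contradictory, i.e. some context ends up above
    itself by following the recorded relationships.
    """
    graph = defaultdict(list)
    for upper, lower in above:
        graph[upper].append(lower)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / in progress / done
    state = defaultdict(int)

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:              # back edge: contradiction
                return True
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in list(graph))

# Context 1 above 2, 2 above 3: consistent. Adding "3 above 1" contradicts.
print(find_inconsistency([(1, 2), (2, 3)]))          # False
print(find_inconsistency([(1, 2), (2, 3), (3, 1)]))  # True
```

In an RDF setting the same check falls out of querying the graph of `above`-style predicates; the algorithm is the point, not this particular encoding.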

Incorporating excavation data into RDF means that ontologies and SKOS can be used to dynamically repurpose the data for policy formulation, planning impact, regional heritage control and mitigation purposes, in conjunction with the data in the Sites and Monuments Record (SMR). Raw data can be integrated from multiple sources with different degrees of spatial and attribute granularity and, where appropriate, generalised so that the data is fit for the end users' purpose. From a policy perspective, curatorial officers no longer have to battle to stop datasets becoming stale and to add new datasets to the local SMR. The SMR will remain an essential dataset: even though it is a generalised resource, it is the only digital record for resources that are unlikely to be digitised in the future (unless there is a very unlikely reversal in funding patterns). Thus the curatorial officer can develop more effective regional research agendas based upon up-to-date and accurate data.
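The generalisation step described above is essentially a walk up SKOS `broader` links until a term in the target vocabulary is reached. A toy sketch, with an entirely hypothetical concept hierarchy (the pottery terms and the SMR vocabulary are invented for illustration):

```python
# Hypothetical skos:broader links: fine-grained local pottery types
# generalise up to the coarser terms an SMR might use.
broader = {
    "pottery:TypeIIb": "pottery:Samian",
    "pottery:TypeIVd": "pottery:Samian",
    "pottery:Samian": "pottery:RomanFineware",
}

def generalise(concept, target_vocabulary):
    """Follow broader links until a term in the target vocabulary is found.
    Returns None if the chain runs out before reaching the vocabulary."""
    while concept is not None:
        if concept in target_vocabulary:
            return concept
        concept = broader.get(concept)
    return None

smr_terms = {"pottery:RomanFineware", "pottery:Coarseware"}
print(generalise("pottery:TypeIIb", smr_terms))  # pottery:RomanFineware
```

The design point is that the fine-grained record is kept intact; the coarser SMR view is derived on demand, so neither dataset goes stale.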

This has the potential to change the way Historic Environment Information Resources (HEIRs) are managed by curatorial officers, and to transform how developers (of property and of software), policy makers and the general public engage with and consume the data. Curatorial officers will be able to support innovative access to primary linked data resources by researchers, planners and, most importantly, the public. This is a significant and important change in role. In addition, heritage data can be mashed up with other data resources to produce tailor-made resources for different end-user communities – following the model successfully employed by

Data re-use and mashups are also important for those undertaking research and analysis. The big difference will be for those who undertake research or collect data that transcends traditional analytical scales. For example, the National Mapping Programme, which aims to "enhance the understanding of past human settlement, by providing primary information and synthesis for all archaeological sites and landscapes visible on aerial photographs or other airborne remote sensed data", will provide deeper insights when it is integrated with other data. Moreover, this integration can occur in real time and add tangible interpretative depth. If an interpreter digitising data from an aerial photograph sees two ditches cutting one another, they are unlikely to be able to tell the relative stratigraphic sequence of the two features. Direct access to excavation or other data will allow the full relationships and their interpretative relevance to be deduced during data collection.

In the longer term, consumers of archaeological data will become more used to dealing with primary data, more aware of its potential, and will demand more of the resource. This should produce a ground-up re-appraisal of recording systems and a better understanding of archaeological hermeneutics. The interpretative interplay between theory, practice and data as part of a dynamic knowledge system is essential. Although this has been recognised, in reality theory, practice and data have never really been joined up. We don't have to use a one-size-fits-all approach to conducting excavations: we can tailor bespoke systems that address local, regional and national research challenges. We can generate interesting and provocative data that can be used to test theory and inform practice, and move away from recording systems mired in the theoretical and intellectual paradigms of the mid-70s.

The virtuous circle is re-established; theory will influence practice, which will change the nature of the data, which will impact on interpretative frameworks, which will provide a body of knowledge against which theory can be tested.

Final comments

There is a new breed: people and organisations who don't want to do what's always been done; people who are empowered and don't believe that established institutions and hierarchies are the gatekeepers of progress; organisations that can, and want to, change the way we 'play the game'; people who want to collaborate; organisations that want to share. Open approaches can help make all this happen, facilitated by disruptive technology that is increasingly mature, broadly available for free (or at low cost) and with low barriers to use and re-use. In the nearly twenty years I have spent studying and working in the heritage sector I've seen it change dramatically. I feel we are on the cusp of changing the way we engage with our data, which could profoundly alter how we understand the past, how we communicate it in the present and how we sustainably manage a complex resource for the future.

Dig the new breed, Part II – open archaeology and ethics

jwalsh - June 11, 2010 in External, Ideas and musings, WG Archaeology

The second in this great series of three guest blogs by Ant Beck. See Part 1 for applications of linked data and remote sensing in archaeology. Part 3 will wrap things up and talk about the disruptive implications of linked open data for impact of archaeology.

Open Science provides the framework for producing transparent and reproducible science by providing open access to raw data, algorithms and interpretations. Efforts such as STAR and STELLAR provide the foundation from which fine granularity excavation data can be made available as part of the semantic web and feed into Open Science analysis. This provides answers to the questions of how and why we should have open access to archaeological data. However, it does not provide answers to what data should be opened or if archaeological data should be opened at all. We move into the sphere of ethics and open archaeology.

Treasure seeking - CC-BY-SA-NC

Recently I have chatted to a number of people and organisations who want to open up heritage data. The conversations tend to have an ethical component. Like other disciplines, such as ecology, there are potential ethical issues in making heritage data open. The oft-touted reason, in the UK at least, is that if access is given to this information then it will be exploited by "nighthawks" (irresponsible metal-detectorists) and other "treasure hunters", and sites (a term I don't really like) will be destroyed.

This argument is polarised and plays to the lowest common denominator: it is based on the premise that "accessible knowledge will inevitably be abused" and eschews any of the benefits that data sharing can provide. Nor does it consider the nuanced ethical arguments concerning re-appropriation of artefacts collected under imperialist regimes, or the ethical conundrum surrounding research into aboriginal or other indigenous communities (which, now that I've raised them, I won't comment on further). The Portable Antiquities Scheme has done much to improve this debate.

The elephant in the room in this debate concerns those archaeologists who have sat on their archives for decades. We know of their significance, but they are not available for academic and research analysis and do not inform the planning process. This has enormous impact on local planning policy, public and academic understanding, theory, practice etc. The 1990 introduction of Planning Policy Guidance 16 (PPG16: essentially commercial archaeology) in the UK, and the later Planning Policy Statement 5, have improved the situation a bit.

But I find the situation somewhat paradoxical. The UK curatorial systems expect that a generalised summary, or synthesis, of any investigation is deposited with the regional curatorial officers. This data is entered into the Sites and Monuments Record (SMR) and is publicly accessible. Therefore, the public has access to a generalised dataset. The expectations for primary, or raw, data are different: it is considered ethically appropriate to deposit fine-granularity data (i.e. non-generalised, primary data, such as that from excavation) with the Archaeology Data Service (ADS); however, issues are raised if an individual wants to do this outside such formal structures (though the Perry Oaks Project has released redacted versions of its site data).

Is this an issue of ethics, or where formal and informal work practices collide; or is it simply an issue of cost, where individuals and organisations have the will but not the finances? Alternatively, and possibly most likely, do archaeologists just feel uncomfortable making their fine-grained data available to a mass audience without going through a representative authority such as the ADS? My feeling is that within the archaeology domain there is an informal belief that if data is deposited with a repository, then the repository also takes the ethical responsibility if the data is released. Deposition so that data is available in perpetuity is part of business and academic best practice; however, deposition does not necessarily mean release and subsequent consumption by other parties (public or otherwise).

Whatever the answer, the point remains: archaeologists, for right or wrong, consider the implications of placing fine-grained data in the public domain, and "ethical considerations" have been identified as a "barrier" to deposition. However, there appears to be limited guidance as to how to resolve these issues. This means that many archaeologists are re-inventing the wheel. The challenge is to provide some supporting "thing" that makes it easy for individuals and organisations to get to a clear, and hopefully unambiguous, ethical position. Such a "thing" will reduce uncertainty, thereby removing one of the barriers to data sharing. The current default position is the equivalent of doing nothing: surely this must change.

Supporting "stuff" which is recognised and approved by national heritage organisations and standards bodies will act as an important lubricant, helping individuals and groups to release data through informal channels. It should be recognised that the relationship between the "citizen", the archaeologist and heritage data will change: citizen science and citizen data will play more of a role in heritage than ever before. Hence a focus on the informal is important: we don't want more grey data, do we? The Portable Antiquities Scheme is the "poster boy" for archaeological approaches to citizen science – although it does have a range of different user access levels.

I raised this as a topic for the Archaeology working group at the Open Knowledge Foundation. Response so far has been positive and has spilled over to colleagues in the curatorial sector and beyond (the discussion thread can be found here). We’ll be setting up a meeting to discuss these issues later in 2010. Both the Archaeology Data Service and the University of Leeds have kindly offered a venue.

There’s also a start at creating an ethics statement on open access to raw archaeological data – a statement that should be supportable by institutions and individual researchers alike. If you’d like to get involved, please join the Open Archaeology working group and mailing list – involvement could be helping to craft the ethics statement, asking your institution to contribute its own statement, helping to plan and document the workshop.
