Building an archaeological project repository I: Open Science means Open Data

Guest - February 24, 2014 in CKAN, Open Science, WG Archaeology

This is a guest post by Anthony Beck, Honorary fellow, and Dave Harrison, Research fellow, at the University of Leeds School of Computing.

In 2010 we authored a series of blog posts for the Open Knowledge Foundation subtitled ‘How open approaches can empower archaeologists’. These discussed the DART project, which is on the cusp of concluding.

The DART project collected large amounts of data, and as part of the project, we created a purpose-built data repository to catalogue this and make it available, using CKAN, the Open Knowledge Foundation’s open-source data catalogue and repository. Here we revisit the need for Open Science in the light of the DART project. In a subsequent post we’ll look at why, with so many repositories of different kinds, we felt that to do Open Science successfully we needed to roll our own.

Open data can change science

Open inquiry is at the heart of the scientific enterprise. Publication of scientific theories – and of the experimental and observational data on which they are based – permits others to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge. Science’s powerful capacity for self-correction comes from this openness to scrutiny and challenge. (The Royal Society, Science as an open enterprise, 2012)

The Royal Society’s report Science as an open enterprise identifies how 21st century communication technologies are changing the ways in which scientists conduct, and society engages with, science. The report recognises that ‘open’ enquiry is pivotal for the success of science, both in research and in society. This goes beyond open access to publications (Open Access), to include access to data and other research outputs (Open Data), and the process by which data is turned into knowledge (Open Science).

The underlying rationale of Open Data is this: unfettered access to large amounts of ‘raw’ data enables patterns of re-use and knowledge creation that were previously impossible. The creation of a rich, openly accessible corpus of data introduces a range of data-mining and visualisation challenges, which require multi-disciplinary collaboration across domains (within and outside academia) if their potential is to be realised. An important step towards this is creating frameworks which allow data to be effectively accessed and re-used. The prize for succeeding is improved knowledge-led policy and practice that transforms communities, practitioners, science and society.

The need for such frameworks will be most acute in disciplines with large amounts of data, a range of approaches to analysing the data, and broad cross-disciplinary links – so it was inevitable that they would prove important for our project, Detection of Archaeological residues using Remote sensing Techniques (DART).

DART: data-driven archaeology

DART aimed to develop analytical methods to differentiate archaeological sediments from non-archaeological strata, on the basis of remotely detected phenomena (e.g. resistivity, apparent dielectric permittivity, crop growth, thermal properties etc.). The data collected by DART is of relevance to a broad range of different communities. Open Science was adopted with two aims:

  • to maximise the research impact by placing the project data and the processing algorithms into the public sphere;
  • to build a community of researchers and other end-users around the data so that collaboration, and by extension research value, can be enhanced.

‘Contrast dynamics’, the type of data provided by DART, is critical for policy makers and curatorial managers to assess both the state and the rate of change in heritage landscapes, and helps to address European Landscape Convention (ELC) commitments. Making the best use of the data, however, depends on openly accessible dynamic monitoring, along the lines of that developed for the Global Monitoring for Environment and Security (GMES) satellite constellations under development by the European Space Agency. What is required is an accessible framework which allows all this data to be integrated, processed and modelled in a timely manner.

This need is wrapped up in national commitments to the European Landscape Convention. The approaches developed in DART to improve the understanding and enhance the modelling of heritage contrast detection dynamics feed directly into this long-term agenda.

Cross-disciplinary research and Open Science

Such approaches cannot be undertaken within a single domain of expertise. This vision can only be built by openly collaborating with other scientists and building on shared data, tools and techniques. Important developments will come from the GMES community, particularly from precision agriculture, soil science, and well documented data processing frameworks and services. At the same time, the information collected by projects like DART can be re-used easily by others. For example, DART data has been exploited by the Royal Agricultural University (RAU) for use in such applications as carbon sequestration in hedges, soil management, soil compaction and community mapping. Such openness also promotes collaboration: DART partners have been involved in a number of international grant proposals and have developed a longer term partnership with the RAU.

Open Science advocates opening access to data, and other scientific objects, at a much earlier stage in the research life-cycle than traditional approaches. Open Scientists argue that research synergy and serendipity occur through openly collaborating with other researchers (more eyes/minds looking at the problem). Of great importance is the fact that the scientific process itself is transparent and can be peer reviewed: as a result of exposing data and the processes by which these data are transformed into information, other researchers can replicate and validate the techniques. As a consequence, we believe that collaboration is enhanced and the boundaries between public, professional and amateur are blurred.

Challenges ahead for Open Science

Whilst DART has not achieved all its aims, it has made significant progress and has identified some barriers in achieving such open approaches. Key to this is the articulation of issues surrounding data-access (accreditation), licensing and ethics. Who gets access to data, when, and under what conditions, is a serious ethical issue for the heritage sector. These are obviously issues that need co-ordination through organisations like Research Councils UK with cross-cutting input from domain groups. The Arts and Humanities community produce data and outputs with pervasive social and ethical impact, and it is clearly important that they have a voice in these debates.

Cultural Anthropology journal to go Open Access by 2014

Theodora Middleton - March 13, 2013 in Open Access, WG Archaeology

We’re really pleased by this week’s announcement from the Society of Cultural Anthropology that their influential journal, Cultural Anthropology will become open access by next year. The plan is that from the first issue of 2014, the journal will be available online globally under an open access license, along with 10 years’ worth of the back catalogue.

From their press release:

This is a boon to our authors, whose work we can guarantee the widest possible readership —and to a new generation of readers inside of anthropology and out. Cultural Anthropology will be the first major, established, high-impact journal in anthropology to offer open access to all of its research, and we hope that our experience with open access will provide the AAA as a whole, as well as other journals in the social and human sciences, valuable guidance as we explore alternative publishing models together.

As far as we can see, the specifics of licensing are yet to be figured out, as are other logistical questions like where the journal will be hosted and what its financial model is going to look like. Still a lot of work to be done, then, in making this a sustainable and truly open reality, but we’re really happy they’re taking the plunge!

Look out for opportunities to discuss these transitionary issues on their website.

A Year in the Life of Open Archaeology (and some upcoming events to look out for)

Stefano Costa - December 5, 2011 in WG Archaeology

This update from the working group on Open Data in Archaeology is brought to you by Nicole Beale and Leif Isaksen. Nicole is a PhD candidate based in the Archaeological Computing Research Group and the Web Science Research Group, University of Southampton. Leif is a Research Fellow in the Archaeological Computing Research Group, University of Southampton.

As 2011 draws to an end, it seems timely to put together a quick update on a year’s happenings around Open Archaeology, as well as a brief overview of upcoming events relating to open access within the discipline and sector of Archaeology.

Forthcoming 2012 Open Archaeology Events

Over the next six months, there are a few events that will contribute to the on-going effort to promote the importance of open access, open data, and open knowledge within Archaeology. In particular, the annual Computing Applications and Quantitative Methods in Archaeology Conference 2012, which is being hosted by the Archaeological Computing Research Group at the University of Southampton in the UK (26-30th March 2012), will include a number of prominent ‘Open Archaeology’ events:

  • Nicole Beale and Leif Isaksen (disclaimer: this is us!) will be chairing a session that is intended to provide a showcase for projects and theory related to the subject of Open Content in Archaeology. The session intends to cover legal and practical issues and end with a discussion of lessons learned and future action.
    Session details: The Shoulders of Giants: Open Content in Archaeology
  • Matteo Romanello, Felix Schäfer and Reinhard Förtsch will be chairing a session on the use of linked open data for the study of the ancient world, considering opportunities and challenges represented by issues such as publication of data, use of live applications, digital libraries and URIs of objects.
    Session details: Linked Open Data for the Ancient World

There are also numerous other sessions that will be including papers covering open data. The call for abstracts has been extended until the 7th December 2011, so please do submit soon if you are planning to contribute to these sessions! CAA abstract submission details.

In an exciting development, CAA2012 introduces the annual CAA Recycle Award. The award seeks to recognise those who “breathe new life into old data”, and will be presented jointly to:

  • The best exemplar of data re-use at a CAA International Conference (the recycler)
  • The project or institution that made available the source dataset/s (the originator/s).

To follow the CAA2012 twitter account, use the hashtag #caa2012 or the user account @caasoton.

There has been much work to advertise the benefits of open access in archaeology, and the forthcoming events continue this great trend.

Outline of Open Archaeology of 2011

So, a quick review of 2011 follows. I have picked out some notable projects and events here, but by no means have I intended to cover all of the great open content/access/data/science Archaeology projects and events that occurred in 2011. If I have missed any useful references, please do submit them to this post via the comments thread below.

On 24th March, the PELAGIOS (Pelagios: Enable Linked Ancient Geodata In Open Systems) project, which uses Linked Open Data to refer to places in the ancient world, ran a workshop at King’s College London, on Linking Open GeoData in the Humanities. The workshop covered three key themes: referencing ancient and contemporary places online, lightweight ontology approaches, and methods for generating, publishing and consuming compliant data. Gregory Marler’s workshop write-up provides a useful summary.

In April, the Research Information Network’s report “Reinventing research? Information practices in the humanities”, which provided case studies for discovery and use of information, mentioned COPAC (an open access catalogue, integrating numerous databases), and put forward open access journals as a desirable dissemination practice.

In mid-May the workshop “Archaeologists & the Digital: Towards Strategies of Engagement”, held by the Centre for Audio Visual Study and Practice in Archaeology and the Archaeology and Communication Research Network at UCL Institute of Archaeology, included presentations and discussions on the benefits of open access for Archaeology. In particular, Brian Hole’s presentation, ‘Open Access and Open Data – and why they matter for archaeology’, covered the opportunities open access provides for collaboration and research not previously possible. Hole discussed the potential of repositories and appropriate licensing. Hole’s presentation is available through Prezi, and Daniel Pett’s write-up of the workshop is available through the 7 Pillars of Wisdom blog. If you have access to the Public Archaeology journal, there is also a review article covering the event by Pett available there. Reference: D. Pett, “Review Article. Archaeologists & the Digital: Towards Strategies of Engagement. A workshop of The Centre for Audio-Visual Study and Practice in Archaeology and the Archaeology and Communication Research Network at UCL Institute of Archaeology, 26th May 2011,” Public Archaeology, vol. 10, no. 2, pp. 119-127, May 2011.

In the summer, the Archaeology Data Service released, as part of Data Train, a set of open access teaching materials for the management of research data for Archaeology postgraduate students.

The excellent Day of Archaeology on 29th July, in which over 400 archaeologists participated, included numerous references to the benefits of open access and open data.

In September, Ant Beck presented the Detection of Archaeological Residues using remote sensing Techniques (DART) project, which embraces an Open Science approach, at the British Science Festival (read the press pack here).

In this same month, the British Museum released a semantic web endpoint to the Collection Online search tool. The press release on the ResearchSpace site told us that “The Museum is the first UK arts organisation to instigate a Semantic Web version of its collection data. The new service brings the British Museum into the ‘linked data’ world and will allow software developers to produce their own applications that can directly manipulate and reuse the data.” The collection data has been mapped to the CIDOC-CRM, and is available on the Collection Space of the British Museum.

In October, the e-journal Internet Archaeology (which is based on a hybrid open access model) went totally open access as part of Open Access Week (24-30th October). Press releases and mailing list messages informed readers that this was in anticipation of plans to “move fully towards a sustainable Open Access (OA) model.”

Phew, quite an eventful year. Here’s to 2012 providing as many, if not more, excellent opportunities to promote open access, open data, open science, and open knowledge in Archaeology. I for one am most excited about the CAA2012, where I am sure that we will see many great open data examples.

Nicole Beale and Leif Isaksen

Cultural Heritage rights in the age of digital copyright

Stefano Costa - December 21, 2010 in COMMUNIA, Events, Public Domain, WG Archaeology, WG Cultural Heritage, Workshop

The following guest post is from Stefano Costa at the University of Siena. Stefano is Founder of the IOSA initiative and Coordinator of the Open Knowledge Foundation‘s Working Group on Open Data in Archaeology.

On December 10th, the COMMUNIA WG3 gathered in Istanbul for the final workshop, with the aim of producing a set of recommendations about cultural heritage and the public domain.

I am not a lawyer, so I took the chance to learn about the marked differences between access rights and property rights. More than that, it soon became clear that Cultural Heritage rights (CHR) only exist in certain EU member states (e.g. Italy, Greece) while in others there are no such rights.

This poses a first set of basic problems: a Finnish tourist taking a photograph of the Parthenon in Athens might actually be violating Greek CHR, especially if she’s going to publish the resulting image on the Web. The same would happen in Italy, not just inside museums but also for public buildings and panoramas. On the other hand, Portugal only listed 5 buildings that cannot be freely photographed. Apparently Finland poses no restrictions on photographing CH, be it historical buildings or artistic creations.

CH laws were mostly conceived in a pre-digital age and even those that got recently revamped (like the Italian case) apparently ignore the ease of creating digital reproductions of CH items at no cost and with no risk of damaging the items themselves. Cultural Heritage institutions (CHI) claim quasi-property rights over the artifacts they are custodians of, thus posing serious restrictions not just to personal usage, but also to the development of public repositories like Wikimedia Commons. As the recent GLAMWIKI event at the British Museum showed, some institutions are engaging with open content creators in a positive way, claiming their role of primacy by sharing the knowledge they have, rather than closing their doors and keeping the best for themselves.

In the case of licensing, the widespread distinction between commercial and non-commercial use is really harmful and poses more problems than it solves. What is particularly frustrating is that this distinction doesn’t take into account the existence of the Commons and of the Public Domain, in other words content that can be both commercial and non-commercial at the same time. A photographer might want to publish her photographs of Archaic korai under a CC-BY-SA license, thus enabling any kind of reuse, from incorporation into Wikimedia Commons to publication in a tourist guide or a textbook.

Here a further distinction is worth making: most CH items are in the Public Domain themselves (because they were made several centuries ago), but the same doesn’t currently apply to their digital reproductions. If the reproduction is basically a mechanical operation, one might argue that no copyright should apply to the reproduction either. Clearly, the distinction between a work that is creative and one that is not is going to be very dangerous in the case of photography and ultimately impossible (think about those monuments that are photographed thousands of times per day).

The fact that going into these subtle juridical details takes so much time and effort is, in itself, a good example of the difficulties that this double layer of rights is posing.

The recommendations we collected aim to clarify the nature and extent of CHR, and to maximise the benefits for the Commons and the Public Domain. CHR should not be property rights but rather access rights, thus posing no limitations on subsequent copies of the first reproduction once this takes place. If there is going to be a fee for commercial use of reproductions, the process has to be easy and quick. The policy for museum visitors should be “open by default” and larger institutions (or networks) might ask digital publishers like bloggers and wikipedians to link back to the original item – even though this assumes that there’s a digital collection available on the Web. Licensing of such collections is beyond the scope of COMMUNIA, and CH is also explicitly excluded from the EU PSI directive. There was some work done by the LAPSI project at the last meeting in Barcelona about this, and the survey launched by the European Commission might help in changing this situation. Clearly, countries like Italy and Greece might see this as “selling out” one of their major assets for economic development. We believe the opposite, and tried to develop our discussion around the concept of cultural heritage as infrastructure, just like the road network or the public green, that needs to be maintained for the benefit of all citizens and the overall development of society.

CHI want to retain control over items and buildings that they often regard as “theirs”, but this need has to live together with the fact that millions of people want to share digital content about cultural heritage on the web. Ultimately, this fact should be regarded as a very positive thing, if the mission of institutions is to maximise the awareness of Cultural Heritage among the public and the impact it has on the social and economic life of EU citizens.

Enriched publications in Dutch archaeology

Guest - September 20, 2010 in WG Archaeology

The following guest post is from Janneke Adema, researcher in the department of media and communications at the University of Coventry, member of the Open Knowledge Foundation’s Working Group on Open Data in Archaeology and coordinator of the OKF Working Group on Open Resources in the Humanities.

Archaeological data are currently exposed as an appendix in traditional archaeological publishing. Digital publishing of monographs and journals enables experiments in integrating text and images with other content formats, thus bridging the gap between presentation of results and dissemination of research data. This post is about research performed in the Netherlands for the Journal of Archaeology in the Low Countries (JALC). In the Netherlands, archiving of research data is done in a centralized way, and part of the system is already Open Access. The findings and results of the JALC project on Enriched publications in Dutch Archaeology are relevant beyond Dutch borders, and should be taken into consideration for the development of a sustainable strategy for Open Data in archaeology.

The enrichment of publications in archaeology, focusing on an integrated presentation of publications and research data, is a rather new development. However, it is a development that offers huge potential for archaeological communication. Many aspects still need to be resolved and explored pertaining to specific technological, archaeological and user issues. Over the last year, as part of the SURFshare project “Enriched publications in Dutch Archaeology”, we have been conducting research into user needs and expectations concerning enriched publications. The Enriched publications project was based on setting up an infrastructure between the Open Access e-journal Journal of Archaeology in the Low Countries (JALC) and the ‘e-depot Nederlandse archeologie’ (EDNA), in cooperation with Data Archiving and Networked Services (DANS) and the Digitaal Productiecentrum (DPC). Within this project the importance of the input and involvement of future users of the yet-to-be-created system was clearly felt.

The main questions we wanted to see answered in the user study were firstly, whether there was support in the Dutch and Belgian archaeological community for enhanced publications; secondly, what archaeologists see as the main possibilities and drawbacks of enhanced publications; and thirdly, what kind of services or enhancements they would most like to see. The first phase of research was based on interviews with 14 archaeologists and a literature study. The second phase involved an external evaluation, using an online survey, after the publication of the first two issues of JALC.

More information about the project, including details on the user needs studies as well as the results of the other work packages, can be found at the project website.

Benefits and drawbacks

The potential benefits of an enhanced publication according to the interviewees concerned the addition of material that would otherwise not fit in a printed publication; the increased efficiency of scholarly communication, leading to increased data transparency; and the wider dissemination of their work to, and the sharing of data with, their peers. The drawbacks of the new format were also clearly seen and felt, especially concerning the extra work and time that will go into enhancing (particularly given the lack of true incentive), the potential for enhancement to cause information overload and distraction, the financing and upkeep (data interoperability, solid infrastructure) of the enhancements, and the ownership, quality establishment, and peer review of the additional material.

We also asked which added services or enhancements the interviewees would most like to see. Apart from the possibility of GIS maps (which most of the interviewees were quite aware might be hard to implement, but which they would love to have), many of the enhancements deemed most important are the most basic ones: the possibility of adding color, enhanced search options, and the possibility of adding a database or dataset of images.

This preference for basic services and enhancements was one of the indications which supported the notion that, at least within our interviewee group, the static print-based article or print paradigm still seemed to be very much the norm. The added services were mostly seen as enhancements of things that are simply harder to achieve in a print publication. Reading from print is still preferred and a rather ‘traditional’ view pertains when it comes to formal publishing concerning quality standards, peer review, copyright issues, and the updating of papers. There seemed to be no support within our archaeological test-community to go from mere enhancements to more liquid and fluid forms of publications where the article becomes more wiki-like for instance.

Evaluation

After the first two issues of JALC were published—which included a number of enhanced publications—we went back to our test-community to see whether their views relating to the enhancements had changed. This evaluation showed an increased support and enthusiasm for the enhancements. In general the respondents were quite positive about the enhancements that were eventually established in JALC, especially with respect to the more elaborate ones, for instance the GIS applications. A large majority of the participants believed the enhancements improved the quality of the publications. Some of them even felt they made you focus more on the text. A majority of the participants were also more willing to provide an enhanced publication themselves after having seen the enhancements in JALC.

Comments focused mainly on technical details, on formats and design, as well as on the general outlook of the enhancements. It was clearly felt the enhancements ‘should really add something’, in other words, that the benefits from the extra digital products should be clear and have a meaningful relation to the text. One of the main concerns had to do with the financial sustainability of the enhancements. It is not clear whether the fact that JALC is an Open Access journal might have had something to do with the uncertainty about financial sustainability. However, it was felt some more information and support from the publisher’s side (or a link to more information) could perhaps be beneficial, as could some more information on who pays for what and how the costs for set up and maintenance are being met. Furthermore, our research showed that information, guidance, and advice from the publisher on creating and delivering an enhanced publication is essential, at least for the time being.

Conclusion: missionary work still needed

From the user needs research and the evaluation afterwards, we concluded that there is a large base of potential support for enhanced publications but a lot of missionary work still needs to be done. This is necessary not so much to show the community what the benefits of enhanced publications are, but rather to relieve fears and uncertainties regarding the new format. Secondly, we concluded that more experiments need to be done to establish a clear infrastructure and clearer policies when it comes to enhanced publications, to give an example to archaeologists of what an enhanced publication might look like in the context of the Open Access e-journal JALC. Practical experiments should still take the print paradigm and more traditional scholarly communication methods as their starting point, to ensure the best uptake of the new format in the archaeological community and to appease fears and uncertainties.

Pollen data in the New and Old World

Guest - July 14, 2010 in External, Open Data, Open Science, Open/Closed, Technical, WG Archaeology, WG Open Data in Science

The following guest post is from Stefano Costa at the University of Siena. He is Founder of the IOSA initiative and Coordinator of the Open Knowledge Foundation‘s Working Group on Open Data in Archaeology. Stefano wishes to thank Thomas Kluyver and David Jones for their help in reviewing the post.

Since the 19th century, the study of archaeobotanical remains has been very important for combining “strict” archaeological knowledge with environmental data. Pollen data make it possible to assess the introduction of certain domesticated species of plants, or the presence of other species that typically grow where humans dwell. Not all pollen data come from archaeological fieldwork, and pollen analysis is often done by ecologists without a particular focus on human-associated plants. However, from an archaeologist’s perspective the relationship between the two sets is strong enough to take an interested look at pollen data worldwide, their availability and most importantly their openness, for which we follow the Open Knowledge Definition.

We found that there is a serious misunderstanding by universities and research centers of their role in society as places of research and innovation that are available to everyone. As with dendrochronological data, academia is a closed system producing data (at very high costs for society) that are only available inside its walls, but it’s all done with public money.

Finding pollen data

The starting point for finding pollen data is the NOAA website.

The Global Pollen Database hosted by the NOAA is a good starting point, but apparently its coverage is quite limited outside the US. Furthermore, data from 2005 onwards aren’t available via FTP in simple documented formats, but are instead downloadable as Access databases from another external website. Defining Access databases as a Bad Choice™ for data exchange is perhaps a euphemism.

Unfortunately, a large number of databases covering single continents or smaller regions is growing, and the approaches to data dissemination show marked differences.

Americas

For both North and South America, you can get data from more than one thousand sites directly via FTP. There are no explicit terms of use. Usually, data retrieved from federal agencies are public domain data.

The README document only states NOTE: PLEASE CITE ORIGINAL REFERENCES WHEN USING THIS DATA!!!!!. Fair enough, the requirement for attribution is certainly compatible with the Open Knowledge Definition.

Europe

From the GPD website we can easily reach the European Pollen Database, which is found at another website though (and things can be even more confusing, given that the NOAA website has some dead links).

You can download EPD data in PostgreSQL dump format (one file for each table, with a separate SQL script create_epd_db.sql). Data in the EPD can be restricted or unrestricted. That’s fine, let’s see how many unrestricted datasets there are. Following the database documentation, the P_ENTITY table contains the use status of each dataset:

steko@gibreel:~/epd-postgres-distribution-20100531$ cat p_entity.dump |
awk -F "\t" '{ print $5 }' | sort | uniq -c
    154 R
   1092 U

which is pretty good because almost 88% of them are unrestricted (NB I write most of my programs in Python but I love one-liners that involve awk, sort and uniq). We could easily create an “unrestricted” subset and make it available for easy download to all those who don’t want to deal with restricted data.

But what does “unrestricted” mean for EPD data? Let’s take a more careful look (emphasis mine):
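For readers who prefer Python to awk, the same tally can be sketched as follows. This is only a minimal sketch: like the one-liner above, it assumes the dump is tab-separated with the use status in the fifth column. The function name and the sample rows are made up for illustration and are not part of the EPD distribution.

```python
from collections import Counter

def count_use_status(lines, column=4):
    """Tally the values found in one tab-separated column (0-based index)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts

# Made-up rows standing in for p_entity.dump; the real table has more columns.
sample = [
    "1\t1\tA\tsite one\tU\n",
    "2\t1\tB\tsite two\tR\n",
    "3\t2\tC\tsite three\tU\n",
]
counts = count_use_status(sample)
share = counts["U"] / sum(counts.values())
print(counts["U"], counts["R"], f"{share:.1%} unrestricted")
```

Passing an open file handle instead of the sample list should give the real counts. With the figures reported above (1092 U and 154 R) the ratio is 1092 / 1246, or about 87.6%.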

  1. Data will be classified as restricted or unrestricted. All data will be available in the EPD, although restricted data can be used only as provided below.
  2. Unrestricted data are available for all uses, and are included in the EPD on various electronic sites.
  3. Restricted data may be used only by permission of the data originator. Appropriate and ethical use of restricted data is the responsibility of the data user.
  4. Restrictions on data will expire three years after they are submitted to the EPD. Just prior to the time of expiration, the data originator will be contacted by the EPD database manager with a reminder of the pending change. The originator may extend restricted status for further periods of three years by so informing the EPD each time a three-year period expires.

Sounds quite good, doesn’t it? “for all uses” is reassuring and the short time limit is a good trade off. The horror comes a few paragraphs below with the following scary details:

  1. The data are available only to non-profit-making organizations and for research.

Profit-making organizations may use the data, even for legitimate uses, only with the written consent of the EPD Board, who will determine or negotiate the payment of any fee required.

Here the false assumption that only academia is entitled to perform research is taken for granted. And there are even more rules about the “normal ethics”: basically if you use EPD data in a publication the original data author should be listed among the authors of the work. I always thought citation and attribution were invented for that exact purpose, but it looks like they have a distinctly different approach to attribution. The EPD is even deciding what are “legitimate” uses of pollen data (I can hardly think of any possible illegitimate use).

Africa

For “Africa” read “Europe” again, because most research projects are from French and English universities. For this reason, the situation is almost the same. What is even worse is that in developing countries there are far fewer people or organizations that can afford to buy those data, notwithstanding the fact that in regions under rapid development the study and preservation of environmental resources are of major importance.

Data are downloadable for individual sites using a search engine, in Tilia format (not ASCII unfortunately). The problems come out with the license:

The wording is almost exactly the same as for the EPD seen above:

Normal ethics pertaining to co-authorship of publications applies. The contributor should be invited to be a co-author if a user makes significant use of a single contributor’s site, or if a single contributor’s data comprise a substantial portion of a larger data set analysed, or if a contributor makes a significant contribution to the analysis of the data or to the interpretation of the results. The data will be available only to non-profit-making organisations and for research. Profit-making organisations may use the data for legitimate purposes, only with the written consent of the majority of the members of the Advisory board, who will determine or negotiate the payment of any fee required. Such payment will be credited to the APD.

Conclusions

The only positive bit of the story, if any, is that these datasets are nevertheless available on the web, and their terms of use are clearly stated, no matter how restrictive. It would be just impossible to write a similar article about archaeological pottery, or zooarchaeological finds.

Appendix: Using pollen data

Pollen data are usually presented in the form of synthetic charts where both stratigraphic data and quantitative pollen data are easily readable. Each “column” of the chart stands for a species or genus. You can create this kind of visualization with free software tools.

The stratigraph package for R can be used for

plotting and analyzing paleontological and geological data distributed through through time in stratigraphic cores or sections. Includes some miscellaneous functions for handling other kinds of palaeontological and paleoecological data.

See the chart for an example of what they look like.

Open Context

Guest - July 13, 2010 in External, Open Access, WG Archaeology, Working Groups

The following guest blog is from Open Context’s Project Lead Eric Kansa and Editor Sarah Whitcher Kansa, who are both members of the Open Knowledge Foundation‘s Working Group on Open Data in Archaeology.

About Open Context

Open Context is a free, open access resource for the electronic publication of primary field research from archaeology and related disciplines. We developed it to help scholars and students to easily find and reuse field science data and media. The system makes data searchable and citable, with robust archival support from the California Digital Library. The Alexandria Archive Institute, an independent 501(c)(3) non-profit organization, maintains Open Context and provides editorial oversight for Open Context content. The project has been funded by the William and Flora Hewlett Foundation, the National Endowment for the Humanities (NEH), and the Institute of Museum and Library Services (IMLS).

Key Features of Open Context:

  • Data publication (with peer review, if desired) of datasets, images, maps, and related items.
  • Stable URL to every project and individual item within a project. Projects, items, and groups of items can be cited elsewhere and permanently linked to print publications.
  • Citation provided for every project and item.
  • Project and Person information to provide the user with more in-depth knowledge about the author and background of the study.
  • Faceted navigation that enables users to compose analytically precise searches and queries through a simple point and click interface.
  • Web services, including Atom feeds, so that content can be syndicated and visualized elsewhere on the Web.
  • Creative Commons licenses so that datasets are legally free for reuse.

Open Data Publication and Archiving with Open Context

Open Context emphasizes publication to work with familiar patterns in scholarly communication and encourage data dissemination in the research community. To this end, Open Context does not disseminate raw data but instead relies on editorial supervision to add description, documentation, and structure to researcher-contributed content. This transforms raw data into a more polished and intelligible product that is still as detailed and comprehensive as the original field documentation.

In development since late 2006, Open Context now hosts over 180,000 items, including nearly 5,000 media items, from 35 archaeological sites around the world. The current rate of publication is about one project per month, and we hope to increase that rate as our publication tools become more streamlined. While Open Context contains mainly archaeological content, it can also accommodate content from other field-based sciences (public health, conservation biology, geological sciences, etc.), so please feel free to get in touch if you have data you would like to publish.

To see some of this at work, check out the recently-published Aegean Prehistory Project, featuring data on shells recovered from three archaeological sites in the Aegean. Canan Çakırlar published these data as an online appendix to the printed publication of her Ph.D. dissertation. In addition to an overview of her project and a link to where the printed publication can be purchased, Canan also has a “person page” with information about her work, publications, etc. Her data has been drawn (via an Atom feed) into BoneCommons, a Web resource for the worldwide zooarchaeology community. Thus, Canan’s work can be found via a search engine, Open Context, BoneCommons, or any other place that draws her content from Open Context. This makes for maximum exposure of her work to her colleagues, as well as its discovery by others for uses beyond archaeomalacology.

How Open is Open Context?

Open Context requires use of Creative Commons licenses (or the CC-Zero public domain dedication). Open Context also makes all data, including structured data (in XML, JSON, and CSV formats) freely available with no login barrier.

However, Open Context does permit use of license variants that restrict commercial use. While this restriction does inhibit interoperability, some stakeholder communities, especially indigenous groups, have deep historical and political concerns regarding commercial uses of cultural heritage materials. These concerns represent complex ethical challenges, but do highlight how the ideal of “openness” needs to be evaluated by other ethical considerations.

Incentives and Guidance for Openness

The National Science Foundation recently announced additional requirements for grant-seekers to develop meaningful “Data Access” plans for their proposals. Many researchers will have little background in, or understanding of, how best to meet this requirement. To offer guidance for the researcher community, Open Context now offers guidelines for researchers to prepare data for online publication. In addition to these, we have developed an online estimation tool, which helps scholars budget appropriately for data sharing, guides them through licensing choices, and offers tips regarding good practices in data sharing. The estimation tool then returns texts to researchers that can be used in their NSF Data Access plans.

Dig the new breed, Part III – wrapping it all up

jwalsh - June 11, 2010 in External, Ideas and musings, Uncategorized, WG Archaeology

This is the third in the amazing series of guest blogs from Ant Beck on the impact of linked open data for archaeology.

Part 1: New approaches to archaeological data analysis, as seen in the DART and STAR projects
Part 2: Considering the ethics of sharing archaeological knowledge

OK, to recap we have:

  • A scientific movement that advocates open approaches to data, theory and practice
  • Emerging foundational interoperability using semantic web technology
  • The potential to remove a barrier and facilitate the submission of primary data

These three powerful factors could prove to be highly disruptive. In combination they have the potential to turn archaeological data and data repositories from static siloed islands (containing data that is increasingly stale) into an interlinked network of data nodes that reflect changes dynamically.

The linchpin is the use of triplestores (RDF databases) that provide persistent identifiers. Persistent identifiers allow us to refer to a digital object (a statement, a file or set of files) in perpetuity, even if the underlying storage location moves. This means links between objects are persistent: therefore, when an observation or interpretation changes, its effects are propagated through to all the data/events that link to it. I see organisations such as the ADS, Talis (an innovative semantic web technology provider whose Talis Platform includes a free RDF hosting service for open data) and national heritage bodies providing such services.

Some open science projects are likely to adopt RDF as their de-facto data sharing format. RDF triples (subject, predicate, object) provide a schema transparent mechanism for data storage. They are not ideal for all data types (raster data structures for example) but when used with Ontology and SKOS, as demonstrated by STAR, they are powerful analytical, search and inference tools.
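To make the triple idea concrete, here is a minimal sketch in plain Python, with no RDF library involved. Every identifier (the ex: prefix, the predicates, the context numbers) is hypothetical and chosen only for illustration. A real project would use a triplestore and a published ontology instead.

```python
# A triple is just (subject, predicate, object). All identifiers are hypothetical.
triples = {
    ("ex:context_101", "ex:hasType", "ex:posthole"),
    ("ex:context_101", "ex:contains", "ex:sherd_7"),
    ("ex:sherd_7", "ex:hasType", "ex:pottery_type_IIb"),
    ("ex:context_102", "ex:hasType", "ex:ditch"),
}

def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# New knowledge can be appended without altering any table structure.
triples.add(("ex:context_101", "ex:datedTo", "ex:roman_period"))

for triple in match(triples, s="ex:context_101"):
    print(triple)
```

The point of the sketch is that adding the dating statement needed no schema change: the store is simply a set of statements, and any pattern of subject, predicate and object can be queried.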

So, what is the importance of storing heritage data in RDF? Well, it depends which point of view you take. From a data management perspective there is no longer any need to migrate data formats. However, to facilitate re-use, different “views” of the RDF model can be generated and incorporated into traditional analytical software, such as GIS. Importantly, analysis stops being a “knowledge backwater”: new knowledge can be appended back into the triplestore.

Linked Data concepts in archaeology

From a data curation, re-use and analysis perspective the quality of the data has the potential to be dramatically improved. Deposition is no longer the final act of the excavation process: rather it is where the dataset can be integrated with other digital resources and analysed as part of the complex tapestry of heritage data. The data does not have to go stale: as the source data is re-interpreted and interpretation frameworks change these are dynamically linked through to the archives, hence, the data sets retain their integrity in light of changes in the surrounding and supporting knowledge system.

An example is probably useful at this juncture: in addition to many other things, pottery provides essential dating evidence for archaeological contexts. However, pottery sequences are developed on a local basis by individuals with imperfect knowledge of the global situation. This means there is overlap, duplication and conflict between different pottery sequences which are periodically reconciled (your Type IIb sherd is the same as my Type IVd sherd and we can refine the dating range…… Hurrah… now let’s have another beer). This is the perennial process of lumping and splitting inherent in any classification system. Updated classifications and probable dates allow us to re-examine our existing classifications. One can reason over the data to find out which contexts, relationships and groups are impacted by a change in the dating sequences either by proxy or by logical inference (a change in the date of a context produces a logical inconsistency with a stratigraphically related group).

While we’re on the topic of stratigraphy, an area of notorious tedium and poor quality data (often with conflicting relationships), RDF allows rapid logical consistency checking as stratigraphic relationships are basically a graph and a triplestore is a graph database.

Publicly deposited RDF data should be linked data: this means that all the primary data archives are linked to their supporting knowledge frameworks (such as a pottery sequence). When a knowledge framework changes the implications are propagated through to the related data dynamically. This means that policy, development control and research decisions are based upon data that reflects the most up-to-date information and knowledge….. cool huh.
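The consistency check mentioned above can be illustrated with a small sketch. It assumes stratigraphic relationships are recorded as “lies above” pairs, so that a conflict shows up as a cycle in a directed graph. The context names and the function are hypothetical, and a triplestore with a reasoner would do this differently. The sketch uses a plain depth-first search.

```python
def find_cycle(above):
    """Return one cycle in a 'lies above' graph as a list, or None if consistent.

    `above` maps each context to the contexts it is recorded as lying above.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for lower in above.get(node, ()):
            state = colour.get(lower, WHITE)
            if state == GREY:
                # We came back to a context that is still on the current path.
                return path[path.index(lower):] + [lower]
            if state == WHITE:
                cycle = visit(lower)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(above):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

consistent = {"c1": ["c2"], "c2": ["c3"], "c3": []}
conflicting = {"c1": ["c2"], "c2": ["c3"], "c3": ["c1"]}
print(find_cycle(consistent))
print(find_cycle(conflicting))
```

In the second record set c3 is said to lie above c1 although c1 is already above c3 by way of c2, which is exactly the kind of conflicting relationship the paragraph describes. The recursion is fine for a sketch, although very deep sequences would need an iterative version.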

Incorporating excavation data into RDF means that ontology and SKOS can be used to dynamically repurpose the data for policy formulation, planning impact, regional heritage control and mitigation purposes in conjunction with the data in the Sites and Monuments Record (SMR). Raw data can be integrated from multiple different sources with different degrees of spatial and attribute granularity and, where appropriate, generalised so that the data is fit for the end users’ purpose. From a policy perspective curatorial officers no longer have to battle to stop datasets becoming stale and add new datasets to the local SMR. The SMR will remain an essential dataset: even though it is a generalised resource it is the only location of a digital record for resources that are unlikely to be digitised in the future (unless there is a very unlikely reverse in funding patterns). Thus the curatorial officer can develop more effective regional research agendas based upon up-to-date and accurate data.

This has the potential to change the way Historic Environment Information Resources (HEIRs) are managed by curatorial officers and transform how developers (property and software), policy makers and the general public engage with and consume any data. They will be able to support innovative access to primary linked data resources by researchers, planners and most importantly the public. This is a significant and important change in role. In addition the heritage data can be mashed up with other data resources to produce tailor made resources for different end-user communities – following the model successfully employed by data.gov.uk.

Data re-use and mashups are also important for those undertaking research and analysis. The big difference will be for those who undertake research or collect data that transcends different traditional analytical scales. For example, the National Mapping Programme which aims to “enhance the understanding of past human settlement, by providing primary information and synthesis for all archaeological sites and landscapes visible on aerial photographs or other airborne remote sensed data” will provide deeper insights when it is integrated with other data. However, this integration can occur in real time and add tangible interpretative depth. If an interpreter is digitising data from an aerial photograph and they see two ditches cutting one another they are unlikely to be able to tell the relative stratigraphic sequence of the two features. Direct access to excavation or other data will allow the full relationships and their interpretative relevance to be deduced during data collection.

In the longer term consumers of archaeological data will be more used to dealing with primary data, will become more aware of its potential and demand more of the resource. This should produce a ground up re-appraisal of recording systems and a better understanding of archaeological hermeneutics. The interpretative interplay between theory, practice and data as part of a dynamic knowledge system is essential. Although this has been recognised, in reality theory, practice and data have never really been joined up. We don’t have to use a one size fits all approach to conducting excavations, but we can tailor bespoke systems that address local, regional and national research challenges. We can generate interesting and provocative data that can be used to test theory and inform practice and move away from recording systems mired in the theoretical and intellectual paradigms of the mid 70’s.

The virtuous circle is re-established; theory will influence practice, which will change the nature of the data, which will impact on interpretative frameworks, which will provide a body of knowledge against which theory can be tested.

Final comments

There is a new breed: there are people and organisations who don’t want to do what’s always been done. People who are empowered and don’t believe that established institutions and hierarchies are the gatekeepers of progress: organisations that can, and want to, change the way we ‘play the game’, people who want to collaborate. Organisations that want to share. Open approaches can help to make all this happen. This is all facilitated by disruptive technology which is increasingly mature, broadly available for free (or at a low cost) and with low barriers of use and re-use. In the nearly twenty years of studying and working in the heritage sector I’ve seen it change dramatically. I feel we are on the cusp of changing the way we engage with our data which could profoundly alter the way we understand the past, how we can communicate this in the present and how we can sustainably manage a complex resource for the future.

Dig the new breed, Part II – open archaeology and ethics

jwalsh - June 11, 2010 in External, Ideas and musings, WG Archaeology

The second in this great series of three guest blogs by Ant Beck. See Part 1 for applications of linked data and remote sensing in archaeology. Part 3 will wrap things up and talk about the disruptive implications of linked open data for the impact of archaeology.

Open Science provides the framework for producing transparent and reproducible science by providing open access to raw data, algorithms and interpretations. Efforts such as STAR and STELLAR provide the foundation from which fine granularity excavation data can be made available as part of the semantic web and feed into Open Science analysis. This provides answers to the questions of how and why we should have open access to archaeological data. However, it does not provide answers to what data should be opened or if archaeological data should be opened at all. We move into the sphere of ethics and open archaeology.

Treasure seeking - CC-BY-SA-NC

Recently I have chatted to a number of people and organisations who want to open up heritage data. The conversations tend to have an ethical component. Like other disciplines, such as ecology, there are potential ethical issues in making heritage data open. The oft touted reason, in the UK at least, is that if access is given to this information then it will be exploited by “night hawkers” (irresponsible metal-detectorists) and other “treasure hunters” and sites (a term I don’t really like) will be destroyed.

This argument is polarised and plays to the lowest common denominator: it is based on the premise that “accessible knowledge will inevitably be abused” and eschews any of the benefits that data sharing can provide. Nor does it consider the nuanced ethical arguments concerning re-appropriation of artifacts collected under imperialist regimes or the ethical conundrum surrounding research into aboriginal or other indigenous communities (which, now that I’ve raised them, I won’t comment on further). The Portable Antiquities Scheme has done much to improve this argument.

The elephant in the room in this debate concerns those archaeologists who have sat on their archive for decades. We know of its significance but it is not available for academic and research analysis and does not inform the planning process. This has enormous impact on local planning policy, public and academic understanding, theory, practice etc. The 1990 introduction of Planning Policy Guidance 16 (PPG16: essentially commercial archaeology) in the UK, and the later Planning Policy Statement 5, have improved the situation a bit.

But I find the situation somewhat paradoxical. The UK curatorial systems expect that a generalised summary, or synthesis, of any investigation is deposited with the regional curatorial officers. This data is entered into the Sites and Monuments Record (SMR) and is publicly accessible. Therefore, the public has access to a generalised dataset. The expectations for primary, or raw, data are different: it’s considered ethically appropriate to deposit fine granularity data (i.e. non-generalised, primary, data, such as those from excavation) with the Archaeology Data Service (ADS), however, there are issues raised if an individual wants to do this outside such formal structures (however, the Perry Oaks Project have released redacted versions of their site data).

Is this an issue of ethics, or where formal and informal work practices collide; or is this simply an issue of cost, where individuals and organisations have the will but not the finances? Alternatively, and possibly most likely, do archaeologists just feel uncomfortable making their fine grained data available to a mass audience without going through a representative authority such as the ADS? My feeling is that within the archaeology domain there is an informal belief that if data is deposited with a repository then the repository also takes the ethical responsibility if the data is released. Deposition so that data is available in perpetuity is part of business and academic best practice, however, deposition does not necessarily mean release and subsequent consumption by other parties (public or otherwise).

Whatever the answer the point remains: archaeologists, for right or wrong, consider the implications of placing fine grained data in the public domain and “Ethical considerations” have been identified as a “barrier” to deposition. However, there appears to be limited guidance as to how to resolve these issues. This means that many archaeologists are re-inventing the wheel. The challenge is to provide some supporting “thing” that makes it easy for individuals and organisations to get to a clear, and hopefully unambiguous, ethical position. Such a “thing” will reduce uncertainty thereby removing one of the barriers to data sharing. The current default position is the equivalent of doing nothing: surely this must change.

Supporting “stuff” which is recognised and approved by national heritage organisations and standards bodies will act as an important lubricant to help individuals and groups to release data through informal channels. It should be recognised that the relationship between the “citizen”, the archaeologists and heritage data will change: citizen science and citizen data will play more of a role in heritage than ever before. Hence, a focus on the informal is important: we don’t want more grey data, do we? The Portable Antiquities Scheme is the “poster boy” for archaeological approaches to citizen science – although they do have a range of different user access levels.

I raised this as a topic for the Archaeology working group at the Open Knowledge Foundation. Response so far has been positive and has spilled over to colleagues in the curatorial sector and beyond (the discussion thread can be found here). We’ll be setting up a meeting to discuss these issues later in 2010. Both the Archaeology Data Service and the University of Leeds have kindly offered a venue.

There’s also a start at creating an ethics statement on open access to raw archaeological data – a statement that should be supportable by institutions and individual researchers alike. If you’d like to get involved, please join the Open Archaeology working group and mailing list – involvement could be helping to craft the ethics statement, asking your institution to contribute its own statement, or helping to plan and document the workshop.

Dig the New Breed: How open approaches can empower archaeologists- Part I

Rufus Pollock - June 10, 2010 in External, WG Archaeology

Very happy to post the first in an amazing series of OKFN guest blogs by Ant Beck, a member of the Open Archaeology working group. Ant discusses the DART project and the STAR project, both of which employed Linked Data in a heritage context. Later we’ll get into the ethics of open heritage, and a vision for the future of archaeological data.

The title “Dig the New Breed” is taken from the presentation I gave at the Open Knowledge Conference 2010. I did this for two reasons: It’s a terrible play on words (dig is employed as a synonym for “excavation” and “To like”) and I like name-checking “The Jam”. As this series of posts has taken form, it’s changed from being a piece about Open Science and Ethics into something about how disruptive technologies can be implemented to transform how the heritage sector operates.

STAR & STELLAR – Anyone for linked heritage data?

I recently attended a STAR project workshop and saw a glimpse of the future. The Semantic Technologies for Archaeological Resources (STAR) project investigated “the potential of semantic terminology tools for widening and improving access to heritage resources, exploring the possibilities of combining a high level, core ontology with domain thesauri and natural language processing techniques”. The project has looked at extracting structured knowledge from “grey literature” using Natural Language Processing (NLP) tools – all very worthy and interesting but not something I’m directly excited by as “grey literature” is essentially tertiary data (an extraction of synthetic data derived from the primary record). In addition they have developed an RDF based approach to query data stored in heterogeneous excavation databases. WOW!

And in case you missed that…. querying data stored in heterogeneous excavation databases. Essentially they have resolved syntactic (platform/format), schematic (structural) and semantic (language) heterogeneities by generating mappings of key fields (i.e. a sub-set of the source data) to the English Heritage extension of the CIDOC Conceptual Reference Model (CRM), extracting the data as RDF and providing semantic interoperability through Knowledge Organization Systems (KOS) represented in SKOS format from standard heritage thesauri. In essence, they extract RDF from relational databases using hand crafted mappings to both SKOS and ontology articulating semantics and canonical concepts respectively.
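The mapping step can be pictured with a minimal sketch. This is not the STAR implementation: the column names, the ex: and concept: identifiers and both mapping tables are hypothetical stand-ins for the hand-crafted mappings to the ontology (field level) and to the thesauri (term level) described above.

```python
# Hypothetical hand-crafted field mappings: local column name -> shared predicate.
FIELD_MAP_SITE_A = {"ctx_no": "ex:identifier", "ctx_type": "ex:hasType"}
FIELD_MAP_SITE_B = {"ContextID": "ex:identifier", "Category": "ex:hasType"}

# Hypothetical term mapping: local vocabulary -> shared thesaurus concept.
TERM_MAP = {
    "PH": "concept:posthole",
    "post-hole": "concept:posthole",
    "DT": "concept:ditch",
}

def row_to_triples(subject, row, field_map):
    """Convert one database row (a dict) into (subject, predicate, object) triples."""
    out = []
    for column, predicate in field_map.items():
        if column not in row:
            continue
        value = row[column]
        if predicate == "ex:hasType":
            # Only type terms are normalised to the shared concept, where one is known.
            value = TERM_MAP.get(value, value)
        out.append((subject, predicate, value))
    return out

site_a = row_to_triples(
    "siteA:101", {"ctx_no": "101", "ctx_type": "PH", "notes": "unmapped"}, FIELD_MAP_SITE_A
)
site_b = row_to_triples(
    "siteB:7", {"ContextID": "7", "Category": "post-hole"}, FIELD_MAP_SITE_B
)
print(site_a)
print(site_b)
```

Two databases with different column names and different local terms end up describing their post holes with the same predicate and the same concept, which is what makes cross-searching possible. Only the mapped sub-set of fields is carried over, so the unmapped notes column is dropped.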

The combination of RDF, ontology and SKOS has allowed the team to produce a demonstrator capable of cross searching different excavation databases with “difficult queries”. The team demonstrated that they could address questions such as, show me contexts that satisfy the following criteria:

  • Roman corn drying ovens with palaeobotanical analysis
  • Charred plant remains and charcoal from 4 post structures
  • Post holes that contain ritual deposits
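A query like the last one in the list can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the demonstrator’s actual query mechanism: the triples, the ex:contains and ex:hasType predicates and the small narrower-term thesaurus are all invented for the example.

```python
# Hypothetical thesaurus: concept -> directly narrower concepts.
NARROWER = {
    "concept:cut_feature": ["concept:posthole", "concept:pit"],
    "concept:deposit": ["concept:ritual_deposit"],
}

def expand(concept):
    """Return the concept plus all of its narrower concepts."""
    found = {concept}
    for child in NARROWER.get(concept, []):
        found |= expand(child)
    return found

# Made-up triples, as if extracted from two different excavation databases.
triples = [
    ("siteA:101", "ex:hasType", "concept:posthole"),
    ("siteA:101", "ex:contains", "siteA:dep_5"),
    ("siteA:dep_5", "ex:hasType", "concept:ritual_deposit"),
    ("siteB:7", "ex:hasType", "concept:posthole"),
    ("siteB:8", "ex:hasType", "concept:pit"),
    ("siteB:8", "ex:contains", "siteB:dep_2"),
    ("siteB:dep_2", "ex:hasType", "concept:ritual_deposit"),
]

def contexts_containing(container_concept, content_concept):
    """Find subjects of one type that contain something of another type."""
    types = {s: o for s, p, o in triples if p == "ex:hasType"}
    wanted_container = expand(container_concept)
    wanted_content = expand(content_concept)
    return sorted(
        s for s, p, o in triples
        if p == "ex:contains"
        and types.get(s) in wanted_container
        and types.get(o) in wanted_content
    )

print(contexts_containing("concept:posthole", "concept:ritual_deposit"))
print(contexts_containing("concept:cut_feature", "concept:deposit"))
```

The first call finds only the post hole at site A. The second, broader call also picks up the pit at site B, because the thesaurus says that post holes and pits are both kinds of cut feature.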

Granted there are limitations: it currently supports a sub-set of the data collected during excavation and the RDF model is viewed as an interim tool with users going back to the source databases to conduct further analysis. However, the concept has been definitively demonstrated. Great stuff! The impact of this work is profound: the SKOS and ontology will allow inferencing/reasoning over the data which will transform the way the data can be re-used, analysed and generalised (more on this in a bit).

The Glamorgan team have a follow on project called Semantic Technologies Enhancing Links and Linked data for Archaeological Resources (STELLAR) funded by the AHRC. One of the aims of STELLAR is to develop “best practice guidelines and tools … both for mapping/extracting archaeological data as RDF and for generating archaeological Linked Data”. This will take the research developed in STAR and provide tools so that it can be deployed to mainstream archaeological data. I’m really looking forward to seeing the roll-out of this technology.

DART and Open Science

DART is an acronym for Detection of Archaeological Residues using remote sensing Techniques. DART is a three year Science and Heritage initiative funded by AHRC and EPSRC, led by the School of Computing at the University of Leeds. The project aims to improve the understanding of the physical, chemical, biological and environmental factors that determine whether an archaeological feature (pit, ditch, posthole etc.) can be detected by a sensor (camera, Ground Penetrating Radar, etc.). DART brings together consultants and researchers from the areas of computer vision, geophysics, remote sensing, knowledge engineering and soil science.

Archaeological sites and features are created by localized processes of formation and deformation. There is a range of imaging instruments that can be used to detect these archaeological residues, although the knowledge required to determine what, when, how and why to use each different type of sensor is patchy. Seasonal, environmental and vegetation dynamics play a part, although the complexities of interaction and how they modify “contrast signatures” derived from the existing formation and deformation processes are uncertain.

This is important so I will provide an example: as a mud-brick built farmstead erodes, the silt, sand, clay, large clasts and organics in the mud-brick along with other anthropogenic debris are incorporated into the soil. This produces a localised variation in soil size and structure. This in turn impacts on drainage and localised crop stress and vigour. These localised variations can all provide measurable differences, or contrasts, that indicate the presence of archaeology.

For example, archaeological residues can affect drainage of the soil, which then affects the appearance of crops. Different drainage characteristics result in different soil moisture retention properties, and local variations in crop stress/vigour can be observed as differences in crop height or crop colour (essentially crop marks). Archaeological contrasts can be expressed through, for example, variations in chemistry, magnetic field, resistance, topography, temperature and spectral reflectance.

The DART project is trying to identify physical, chemical and biological contrast factors that may allow us to detect archaeological residues (both directly and by proxy) under different land-use and environmental conditions. We address the following research issues:

  • What are the factors that produce archaeological contrasts?
  • How do these contrast processes vary over space and time?
  • What processes cause these variations?
  • How can we best detect these contrasts (sensors and conditions)?

DART is committed to open science principles and aims to act as an exemplar for how data, tools, and analysis can be made available to the wider academic, heritage and general community. Data, software, algorithms and services developed throughout the project will be made available for re-use with appropriate open licences.

Licensing is an issue, as licence incompatibility can severely restrict re-use. Science Commons is establishing protocols in this area. Publicly accessible dissemination is preferred; however, where necessary, domain-specific or institutional repositories will be utilised for long-term preservation. Cameron Neylon is part of the project consortium and provides a steer on these issues.

The whole point of taking an open science position on this project is to maximise its benefit and impact. The research problem is large and complex: one project will not solve it. Inevitably the science will need refining; adequate articulation will require long-term data collection under different conditions, followed by iterative hypothesis testing and modelling. The challenge is to get this information in the quickest, cheapest and easiest way. An Open Science approach means that DART is openly collaborating with researchers and individuals throughout the world. The body of work developed within DART can be easily re-used by others: because the data and algorithms will be in the public domain, our results can be tested and rapidly evaluated. Unlocking the “body of knowledge” and “know how” surrounding a programme of research should significantly reduce the barriers to re-use. This may generate a critical mass of surrounding research, which can only improve the underlying models and science. Providing scientists with the methodology of how to make the wheel will not only stop us reinventing it, but will also improve the manufacturing process.
