
Announcing Recline.JS: a Javascript library for building data applications in the browser

Rufus Pollock - July 5, 2012 in Featured, LOD2, Open Knowledge Foundation, Open Textbooks, Press, Sprint / Hackday, Texts

Today we’re pleased to announce the first public release of Recline.JS, a simple but powerful open-source library for building data applications in pure Javascript.

For those of you who want to get hands-on right away, you can try the demos or dive into the source on the Recline website.

What Is It?

Recline is a Javascript library of data components, including a data grid, graphing, and data connectors.

The aim of Recline is to weave together existing open-source components to create an easy to use but powerful platform for building your own data apps.

The views can be embedded into other apps, just as we’ve done for CKAN and the DataHub, where it powers our data viewer and visualisations.

What makes Recline so versatile is its modularity: you take only the components you need for the data app you want to build.

Main features:

  • View (and edit) your data in a clean grid / table interface
  • Built in visualizations including graphs, maps and timelines
  • Load data from multiple sources including online CSV and Excel, local CSV, Google Docs, ElasticSearch and the DataHub
  • Bulk update/clean your data using an easy scripting UI
  • Easily extensible with new Backends so you can connect to your database or storage layer
  • Open-source, pure Javascript, and designed for integration, so it is easy to embed in other sites and applications
  • Built on the simple but powerful Backbone giving a clean and robust design which is easy to extend
  • Properly designed model with clean separation of data and presentation
  • Componentized design means you use only what you need
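To make the dataset and grid components above concrete, a minimal page using Recline might look something like the following. This is a sketch based on the Recline API at the time of release, assuming jQuery and the Recline scripts are already loaded on the page; exact option and field names may differ from the current documentation:

```javascript
// Create an in-memory dataset from an array of record objects.
// (Other backends can load the same model from CSV, Google Docs,
// ElasticSearch, etc.)
var dataset = new recline.Model.Dataset({
  records: [
    {id: 0, date: '2011-01-01', amount: 100},
    {id: 1, date: '2011-02-01', amount: 150},
    {id: 2, date: '2011-03-01', amount: 50}
  ]
});
dataset.fetch();

// Bind a grid view to the dataset and render it into an element
// on the page (here assumed to be <div id="my-grid"></div>).
var grid = new recline.View.Grid({
  model: dataset,
  el: $('#my-grid')
});
grid.render();
```

Because the views are plain Backbone views bound to a shared dataset model, swapping the grid for a graph, map or timeline is a matter of instantiating a different view class against the same model.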

Who’s Behind It?

Recline has been developed by Rufus Pollock and Max Ogden with substantial contributions from the CKAN team including Adria Mercader and Aron Carroll.

Demos

A selection of demos is now available on the Recline website for you to try out.

Multiview Demo


The Data Explorer


Timeliner


The Right to Read Is the Right to Mine

Peter Murray-Rust - June 1, 2012 in Bibliographic, OKF Projects, Open Access, Open Content, Open Data, Open Science, Texts, WG Open Bibliographic Data, Working Groups

The following is a draft content mining declaration developed by the Open Knowledge Foundation’s Working Group on Open Access.

In brief: The Right to Read Is the Right to Mine

Introduction

Researchers can find and read papers online, rather than having to manually track down print copies. Machines (computers) can index the papers and extract the details (titles, keywords etc.) in order to alert scientists to relevant material. In addition, computers can extract factual data and meaning by “mining” the content, opening up the possibility that machines could be used to make connections (and even scientific discoveries) that might otherwise remain invisible to researchers.

However, it is not generally possible today for computers to mine the content of papers due to constraints imposed by publishers. While Open Access (OA) is improving the ability of researchers to read papers (by removing access barriers), still only around 20% of scholarly papers are OA. The remainder are locked behind paywalls. Under the vast majority of subscription contracts, subscribers may read paywalled papers, but they may not mine them.

Content mining is the way that modern technology locates digital information. Because digitized scientific information comes from hundreds of thousands of different sources in today’s globally connected scientific community [2], and because current data sets can be measured in terabytes,[1] it is often no longer possible to simply read a scholarly summary in order to make scientifically significant use of such information.[3] A researcher must be able to copy information, recombine it with other data and otherwise “re-use” it so as to produce truly helpful results. Mining is not only a deductive tool for analysing research data; it is also how search engines operate to allow discovery of content. To prevent mining is therefore to force scientists into blind alleys and silos where only limited knowledge is accessible. Science does not progress if it cannot incorporate the most recent findings and move forward from there.

Definition

‘Open Content Mining’ means the unrestricted right of subscribers to extract, process and republish content manually or by machine in whatever form (text, diagrams, images, data, audio, video, etc.) without prior specific permissions and subject only to community norms of responsible behaviour in the electronic age.

  • Text
  • Numbers
  • Tables: numerical representations of a fact
  • Diagrams (line drawings, graphs, spectra, networks, etc.): graphical representations of relationships between variables are images, and therefore may not be data when considered as a collective entity. However, the individual data points underlying a graph, as with tables, should be.
  • Images and video (mainly photographic): where they are the means of expressing a fact
  • Audio: as with images, where it expresses a factual representation of the research
  • XML: Extensible Markup Language (XML) defines rules for encoding documents in a format that is both human-readable and machine-readable.
  • Core bibliographic data: described as “data which is necessary to identify and / or discover a publication” and defined under the Open Bibliography Principles.
  • Resource Description Framework (RDF): information about content, such as authors, licensing information and the unique identifier for the article

Principles

Principle 1: Right of Legitimate Accessors to Mine

We assert that there is no legal, ethical or moral reason to refuse to allow legitimate accessors of research content (OA or otherwise) to use machines to analyse the published output of the research community. Researchers expect to access and process the full content of the research literature with their computer programs and should be able to use their machines as they use their eyes.

The right to read is the right to mine

Principle 2: Lightweight Processing Terms and Conditions

Mining by legitimate subscribers should not be prohibited by contractual or other legal barriers. Publishers should add clarifying language to subscription agreements stating that content is available for information mining, whether by download or by remote access. Where access is through researcher-provided tools, no further cost should be required.

Users and providers should encourage machine processing

Principle 3: Use

Researchers can and will publish facts and excerpts which they discover by reading and processing documents. They expect to disseminate and aggregate statistical results as facts, and contextual text as fair-use excerpts, openly and with no restrictions other than attribution. Publisher efforts to claim rights in the results of mining further retard the advancement of science by making those results less available to the research community; such claims should be prohibited.

Facts don’t belong to anyone.

Strategies

We plan to assert the above rights by:

  • Educating researchers and librarians about the potential of content mining and the current impediments to doing so, including alerting librarians to the need not to cede any of the above rights when signing contracts with publishers
  • Compiling a list of publishers and indicating what rights they currently permit, in order to highlight the gap between the rights here being asserted and what is currently possible
  • Urging governments and funders to promote and aid the enjoyment of the above rights

[1] Panzer-Steindel, Bernd, Sizing and Costing of the CERN T0 center, CERN-LCG-PEB-2004-21, 09 June 2004, at http://lcg.web.cern.ch/lcg/planning/phase2_resources/SizingandcostingoftheCERNT0center.pdf.

[2] The Value and Benefits of Text Mining, JISC, Report Doc #811, March 2012, Section 3.3.8 at http://www.jisc.ac.uk/publications/reports/2012/value-and-benefits-of-text-mining.aspx, citing P. J. Herron, “Text Mining Adoption for Pharmacogenomics-based Drug Discovery in a Large Pharmaceutical Company: a Case Study,” Library, 2006, claiming that text mining tools evaluated 50,000 patents in 18 months, a task that would have taken 50 person-years to do manually.

[3] See MEDLINE® Citation Counts by Year of Publication, at http://www.nlm.nih.gov/bsd/medline_cit_counts_yr_pub.html and National Science Foundation, Science and Engineering Indicators: 2010, Chapter 5 at http://www.nsf.gov/statistics/seind10/c5/c5h.htm, asserting that the annual volume of scientific journal articles published grows on the order of 2.5% per year.

Prizewinning bid in ‘Inventare il Futuro’ Competition

James Harriman-Smith - November 5, 2011 in Annotator, Bibliographic, Featured Project, Free Culture, Ideas and musings, News, OKF Projects, Open Shakespeare, Public Domain, Public Domain Works, Texts, WG Humanities, WG Open Bibliographic Data

By James Harriman-Smith and Primavera De Filippi

On the 11th July, the Open Literature (now Open Humanities) mailing list got an email about a competition being run by the University of Bologna called ‘Inventare il Futuro’, or ‘Inventing the Future’. On the 28th October, having submitted an application on behalf of the OKF, we got an email saying that our idea had won us €3,500 of funding. Here’s how.

The Idea: Open Reading

The competition was looking for “innovative ideas involving new technologies which could contribute to improving the quality of civil and social life, helping to overcome problems linked to people’s lives.” Our proposal, entered into the ‘Cultural and Artistic Heritage’ category, proposed joining the OKF’s Public Domain Calculators and Annotator together, creating a site that allowed users more interaction with public domain texts, and those texts a greater status online. To quote from our finished application:

Combined, the annotator and the public domain calculators will power a website on which users will be able to find any public domain literary text in their jurisdiction, and either download it in a variety of formats or read it in the environment of the website. If they choose the latter option, readers will have the opportunity of searching, annotating and anthologising each text, creating their own personal response to their cultural literary heritage, which they can then share with others, both through the website and as an exportable text document.

As you can see, with thirty thousand Euros on offer for the overall winner, we decided to think very big. The full text, including a roadmap, is available online. Many thanks to Jason Kitkat and Thomas Kandler, who gave up their time to proofread and suggest improvements.

The Winnings: Funding Improvements to OKF Services

The first step towards Open Reading was always to improve the two services it proposed marrying: the Annotator and the Public Domain Calculators. With this in mind we intend to use our winnings to help achieve the following goals, although more ideas are always welcome:

  • Offer bounties for flow charts regarding the public domain in as yet unexamined jurisdictions.
  • Contribute, perhaps, to the bounties already available for implementing flowcharts into code.
  • Offer mini-rewards for the identification and assessment of new metadata databases.
  • Modify the annotator store back-end to allow collections.
  • Make the importation and exportation of annotations easier.

Please don’t hesitate to get in touch if any of this is of interest. An Open Humanities Skype meeting will be held on 20th November 2011 at 3pm GMT.

Forthcoming Series of Open Articles on Open Shakespeare

James Harriman-Smith - September 5, 2011 in Ideas and musings, News, OKF Projects, Open Shakespeare, Texts

This is a cross-posting from Open Shakespeare to announce the culmination of a project run over the summer to encourage greater participation in the website and greater awareness of its goals of promoting open critical commentary.

From Monday 12th September to Monday 10th October, Open Shakespeare will host a series of articles on the topic of ‘Shakespeare and the Internet’. When we invited contributions, the theme was deliberately kept as broad as possible in order to facilitate a wide and diverse range of responses from each of those who have written a post for us. Our contributors range from teachers and students of Shakespeare to an experimental theatre company.

Having already read the majority of the contributions, I can say now that the series fulfils its goal of offering what the Bard would call a “multitudinous” range of approaches to the topic of Shakespeare and the Internet; subjects range from why Polonius would appreciate hypertext to the problems and opportunities of online abundance. The contributions will appear in the following order:

Every article in this series is published under a Creative Commons Attribution-ShareAlike (CC BY-SA) 3.0 licence; as with all the other material on Open Shakespeare, we hope that publication under such a licence will encourage the diffusion and development of our contributors’ ideas.

My thanks to all those who have contributed their time and thoughts to this project, particularly Erin Weinberg, whose proof-reading skills have been extremely useful in the preparation of these pieces for publication. Depending on the success of this series, we intend to publish similar, themed posts under an open licence in the future: if you would like to participate as either a writer or an editor, please get in touch through the usual channels.

Now, to conclude, I leave you, I hope, in approximately the same state of anticipation as Leonato leaves an impatient Claudio in Much Ado about Nothing:

> till Monday [...] which is hence a just seven-night; and a time too brief too, to have all things answer my mind.

The Public Domain Review has a new website!

Jonathan Gray - August 9, 2011 in Bibliographic, Free Culture, Public Domain, Public Domain Works, Texts, WG Public Domain, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

As part of our work to open up the wealth of cultural works which have entered the public domain, earlier this year we launched the Public Domain Review.

Adam Green, the Public Domain Review‘s wonderful Editor, has been hard at work over the past few weeks and the project now has a beautiful new website which you can find here:

In addition to weekly articles about interesting or obscure public domain works, there are now curated collections of texts, images, audio and film material – hand-picked from various online sources.

If you’re interested in receiving the Public Domain Review you can sign up to receive it in your inbox. If you like the project, you can also become a supporter.

Update: Text Camp: 13th August 2011

James Harriman-Smith - August 8, 2011 in Bibliographic, Events, OKF Projects, Open Shakespeare, Texts, Workshop

The Open Knowledge Foundation’s first ever Text Camp will be taking place this Saturday 13th August, thanks to JISC offering us the use of their meeting rooms in London.

Details

  • Where? Brettenham House, 9 Savoy Street, WC2E 7EG, London. – Meet outside ‘The Savoy Tup’ Pub, Savoy Street, at 10am to be guided to the venue.
  • When? Saturday 13th August, 10am – 6pm
  • What? A gathering for all those interested in the relation between technology and literature, with a special focus on the creation of open knowledge.
  • More details: http://wiki.openliterature.net/Text_Camp_2011
  • Order (free) tickets: http://textcamp2011.eventbrite.com/
  • Twitter: #tcamp11

Hope you can make it!

Open Correspondence

Guest - June 16, 2010 in External, Free Culture, OKF Projects, Public Domain, Texts, WG Humanities

The following guest post is from Iain Emsley, who is a member of the Open Knowledge Foundation Working Group on Open Resources in the Humanities, and a contributor to the Open Shakespeare and Open Milton projects.

Using the social graph, one can find the connections between seemingly disparate groups of people on different services. Most of the projects in this area are focussed on social media, such as Facebook, Twitter and so on. There is, however, a layer of social information that was created before this. Letters were, and still are, used as a method of communication; to some extent they were the Internet before the technology became available. There is a host of data shared in each missive, for example the author and their correspondent. And that is only the tip of the metadata:

  • What are they writing about?
  • Whom are they writing about?
  • When was the letter written?
  • Where was it written?

The Open Letters project grew out of some musings while working on the timeline for the Open Milton website. I could see the links between the texts and some of the events, but I was curious about how things linked together. Neither texts nor authors exist in a vacuum. Authors write to other people – agents, authors, casual acquaintances, friends and family – and they write about books. Sometimes they write about books that they have read, sometimes about what they are writing.

From these we can infer which books and authors influenced the author, or were being influenced by him, at the time. From this, we can see the growth of the social graph into the cultural graph. Essentially it is the same notion as the social graph, but the cultural graph links items like books, poems and events together. In itself it means nothing, but linked to the social graph it allows the user to discover who was being written to whilst a book was being written. Was the author talking to other authors, or only to his agent, about it?

Charles Dickens was a prolific letter writer, which is why he was chosen as the first author for the project. From his own letters, we can see him writing to authors, such as George Eliot or Wilkie Collins, to scientists like Charles Babbage, inventor of the Difference Engine, and to his agent about his works in progress. His letters shed some light on the nineteenth-century literary world, but also contextualise it within the wider world. His wide range of writing gave me the chance to cast the widest possible net and set up as many nodes on the graph as possible.

A brief peek at the correspondents to whom Dickens was writing about the Pickwick Papers, Dickens’s first novel, suggests that it was more than just a book: it was an item of conversation, revealed through his letters about it. He managed to offend Mr David Dickson, a reader, with a passage in the novel, though he invited W C Macready to a dinner to celebrate its publication. Later in his life, he wrote to Wilkie Collins, the author, complaining that “I have never seen anything about myself in print which has much correctness in it–any biographical account of myself I mean”. This set of letters sheds a little light on the public and private worlds of Dickens, from his mortification at offending a reader to his complaints about his own portrayal. He comes alive as a person rather than just an author, as does his social graph, and his relationships with his correspondents are illuminated by the way that he addresses them with varying degrees of formality.

Now that the site is set up, the next step is to complete the set of Dickens letters, which his daughters edited and published, from the Project Gutenberg texts. The next major step is to try to collect the letters of his correspondents and, from them, the new correspondent nodes. As well as HTML representations of the letters, the project uses RDF, reusing Dublin Core and Friend of a Friend (FOAF) with its own extension for the collection of letters, called letter. Rufus Pollock has already created a graph that visualises the relationships between authors, the times at which they were written to and the number of times they were written to, and timelines for the letters are being developed.
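A letter record of the kind described above might be serialised along these lines. This is a purely illustrative sketch: the project’s actual property names and the namespace URI for its letter extension are not given in this post, so every letter: term and URI below is an assumption, with only the Dublin Core and FOAF terms taken from their published vocabularies:

```turtle
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix letter: <http://example.org/ns/letter#> .   # hypothetical namespace

# One letter, linked to its author and correspondent
<http://example.org/letters/dickens-to-collins>
    dc:creator           <http://example.org/people/charles-dickens> ;
    dc:date              "1860-06-06" ;
    letter:correspondent <http://example.org/people/wilkie-collins> .

# The people form the social-graph nodes, described with FOAF
<http://example.org/people/charles-dickens> a foaf:Person ;
    foaf:name "Charles Dickens" .
<http://example.org/people/wilkie-collins> a foaf:Person ;
    foaf:name "Wilkie Collins" .
```

Given triples like these, counting how many letters link an author to each correspondent, and when, is exactly the kind of query behind the relationship graph and timelines mentioned above.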

There are, of course, more things that I would like to do, but the major task is building the collections of letters under open licenses. The project can be contacted through the open-literature mailing list if you would like to find out more or to contribute.


Public Domain Day 2010: A roundup

Jonathan Gray - January 5, 2010 in COMMUNIA, Public Domain, Public Domain Works, Texts, WG Public Domain

January 1st 2010 was Public Domain Day, when around the world various works fell out of copyright and into the public domain. Back in November we put together a rough list of which works fall into the public domain:

You can find the list of 563 authors on our Public Domain Works project, which is a simple registry of artistic works that are in the public domain:

The list can be sorted by author surname, birth date, death date and number of works by clicking on the relevant headings. Notable authors include the poets William Butler Yeats and Osip Mandelstam, as well as the father of psychoanalysis Sigmund Freud.

There were celebrations in Poland and Switzerland. Communia, the EU policy network for the digital public domain, launched a new website at:

The Telegraph celebrated Public Domain Day with an editorial from Shane Richmond, Head of Technology:

Happy Public Domain Day everyone! Today is the day that copyright expires on a whole range of works. As we reported this morning, from today works by Sigmund Freud, WB Yeats, Ford Madox Ford and illustrator Arthur Rackham are today part of the public domain. They can be made cheaply available as educational editions, translated into braille or made into audiobooks, all without anyone needing to give permission or any fees changing hands. They are also available to be reinterpreted and re-used by new artists.

The Telegraph also reported an announcement from Wikimedia UK inviting people to upload sources to Wikimedia Commons:

Wikimedia UK anticipates January 1, “Public Domain Day”, 2010 being a great year for additions to the digital Wikimedia Commons. The poetry of W. B. Yeats, the works of Sigmund Freud, and Arthur Rackham’s classic children’s book illustrations all enter the public domain. When the complexities of copyright no longer encumber reuse of old works, a work that has been a “sleeper” can become a new classic. Perhaps the definitive example of this is “It’s a Wonderful Life“, the 1946 Frank Capra film that became a Christmas classic in the 1980s.

Wikimedia UK promotes the uploading of copyright-free text to Wikisource, a sister site to Wikipedia, so that it can be widely enjoyed. Audio recordings of public domain works may be added to the Wikimedia Commons site, and Wikimedia UK invites you to join us and help digitise and preserve our common cultural heritage. You can make it available for everyone to share, build on, and simply enjoy.

On a less happy note, copyright scholar James Boyle at the Center for the Study of the Public Domain writes:

What is entering the public domain in the United States? Sadly, we will have nothing to celebrate this January 1st. Not a single published work is entering the public domain this year. Or next year. Or the year after. Or the year after that. In fact, in the United States, no publication will enter the public domain until 2019. And wherever in the world you live, you now have to wait a very long time for anything to reach the public domain. When the first copyright law was written in the United States, copyright lasted 14 years, renewable for another 14 years if the author wished. Jefferson or Madison could look at the books written by their contemporaries and confidently expect them to be in the public domain within a decade or two. Now? In the United States, as in most of the world, copyright lasts for the author’s lifetime, plus another 70 years. And we’ve changed the law so that every creative work is automatically copyrighted, even if the author does nothing. What do these laws mean to you? As you can read in our analysis here, they impose great (and in many cases entirely unnecessary) costs on creativity, on libraries and archives, on education and on scholarship. More broadly, they impose costs on our entire collective culture. [...] We have little reason to celebrate on Public Domain Day because our public domain has been shrinking, not growing.

More detailed comment and analysis from the Centre is available at:

See also posts from:
