Inferencing, or machine reasoning, has a slightly unsavoury reputation, perhaps stemming from the failure of Strong AI and its association with science fiction. This is unfortunate, and it could be argued that it has left Semantic Web technologies underdeveloped.

With the Semantic Web and RDF we are concerned with simple statements, or assertions. When humans make statements they generally rely on a large amount of background knowledge and contextual information to make their meaning clear without it being explicit. For example, if I say, “Mary had a little lamb,” it is unnecessary to explain that Mary is a person, Mary is female, a lamb is a young sheep and a sheep is a kind of quadrupedal animal, or to digress into a discussion of what it means to “have” something, to what extent notions of ownership can extend to animals, or even the idea of time: past, present and future.

If we transcribe this first line of the nursery rhyme into the Notation 3 language so that it can be ingested by a computer, we might get,

Mary had [ a Lamb; size little ].

Because this transcription has to be done manually by a human, unless we can invent some very good Natural Language Processing software, this is probably the most we can expect someone to do before it begins to get very tedious. We certainly don’t want to have to teach the computer about background facts more than once, but we can imagine that we have some set of background information at our disposal; OpenCyc and SUMO can help here. OpenCyc, for instance, can teach us that a lamb is a young sheep, a domesticated animal, a ruminant, a quadruped, a terrestrial organism, etc.

It is natural to want to be able to ask simple questions such as, “what type of animal did Mary have?”, which is actually quite easy to express in a query language like SPARQL,

SELECT ?animal ?type
WHERE {
        Mary had ?animal .
        ?animal a Animal .
        ?animal a ?type
}

however, if this query were evaluated only against the facts transcribed from the nursery rhyme, it would return no results: Mary had a lamb, not an animal. To get the right answer we need to introduce rules in addition to our background facts. In this case the rule we need is very simple,

{ ?x a ?class . ?class subClassOf ?superclass } => { ?x a ?superclass }

This simply says that, for all things (x), if they have a class, and that class is a subclass of some superclass, then the thing is also whatever the superclass is. So if x is a lamb and lamb is a subclass of sheep then x is a sheep. Likewise, since sheep is a subclass of animal then x is an animal.
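The effect of the rule can be sketched in a few lines of Python. This is only an illustration, not how FuXi or any real reasoner works: plain strings stand in for RDF terms, the blank node label `_:lamb1` and the function names are hypothetical, and the two `subClassOf` statements are assumed background facts of the kind OpenCyc might supply.

```python
# Facts transcribed from the nursery rhyme; "_:lamb1" is a made-up
# label standing in for the anonymous node in the N3 transcription.
facts = {
    ("Mary", "had", "_:lamb1"),
    ("_:lamb1", "a", "Lamb"),
    ("_:lamb1", "size", "little"),
}

# Assumed background knowledge (illustrative, not taken from OpenCyc itself).
background = {
    ("Lamb", "subClassOf", "Sheep"),
    ("Sheep", "subClassOf", "Animal"),
}


def apply_subclass_rule(triples):
    """Apply { ?x a ?class . ?class subClassOf ?superclass } => { ?x a ?superclass }
    repeatedly until no new statements can be derived."""
    closure = set(triples)
    while True:
        derived = {
            (x, "a", superclass)
            for (x, p1, cls) in closure if p1 == "a"
            for (c, p2, superclass) in closure
            if p2 == "subClassOf" and c == cls
        } - closure
        if not derived:
            return closure
        closure |= derived


def animals_mary_had(triples):
    """Mirror the SPARQL query: things Mary had that are Animals, with each of their types."""
    return sorted(
        (animal, type_)
        for (s, p, animal) in triples
        if s == "Mary" and p == "had"
        if (animal, "a", "Animal") in triples
        for (a, p2, type_) in triples
        if a == animal and p2 == "a"
    )


without_rule = animals_mary_had(facts | background)
with_rule = animals_mary_had(apply_subclass_rule(facts | background))

print(without_rule)  # empty: nothing says the lamb is an Animal
print(with_rule)     # the lamb now appears as a Lamb, a Sheep and an Animal
```

The loop runs twice over the data before it settles: the first pass derives that the lamb is a sheep, and only then can the second pass derive that it is an animal. That repeated passing over a growing set of statements is the source of the cost that larger rule sets incur.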

It is precisely in this kind of situation, evaluating this type of simple rule, that machine reasoning is helpful. It is nothing spectacular, just following chains of statements made by humans to answer questions that would be obvious to a two-year-old. This is just a toy example, but the same principle can be used with facts and questions that are not quite so obvious. However, as the rules get more complicated or numerous, they become considerably more computationally expensive to evaluate.

We have a very good reasoning engine, called FuXi, that is supported in ORDF for implementing these sorts of rules. Behind the scenes it is used when searching for specific types of things in Bibliographica: one can search for publications, or articles, or books or, in some cases, chapters. A search for a more general type will return results of all the types beneath it, while a search for a more specific type will return only the type sought.

Rufus Pollock is Founder and President of Open Knowledge.