RDFa: Add semantics from the ground up [#378144]

This is a proposal for adding semantics to Drupal objects. It comes from the need for a centralized way to store the meaning of the data Drupal deals with. At the moment, RDFa semantics in core are added in the theme layer without any wider consideration. If we define the semantics earlier in the workflow, before the theme layer, other modules can reuse or alter them. There are other benefits:
- Modules can export data along with their semantics in the format they want (RDF/XML, N-Triples etc.). Core would only support RDFa, but contrib could directly use these semantics for various purposes (export, import etc.).
- We don't have to define the semantics twice, once in the theme layer and once in a contrib helper module, which could lead to conflicting semantics. Having semantics defined in the theme layer also means modules cannot change them easily without rewriting the theme functions.
- The theme layer no longer has to worry about the semantics; it simply outputs them along with the data.
- Better control over which namespaces are used for a given page, so that only these namespaces are included in the header of the HTML document.
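
To illustrate the namespace point, here is a minimal sketch, assuming mappings are plain arrays of field => 'prefix:term' strings. All function names are hypothetical and nothing here is an existing core API; the URIs are the published namespace URIs of these vocabularies.

```php
<?php
// Hypothetical registry of known prefixes and their namespace URIs.
function example_rdf_namespaces() {
  return array(
    'dc' => 'http://purl.org/dc/elements/1.1/',
    'sioc' => 'http://rdfs.org/sioc/ns#',
    'sioct' => 'http://rdfs.org/sioc/types#',
    'skos' => 'http://www.w3.org/2004/02/skos/core#',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
  );
}

// Returns only the prefix => URI pairs that the given mappings actually use.
function example_rdf_used_namespaces(array $mappings) {
  $known = example_rdf_namespaces();
  $used = array();
  foreach ($mappings as $mapping) {
    foreach ($mapping as $curie) {
      // The prefix is everything before the first colon.
      list($prefix) = explode(':', $curie, 2);
      if (isset($known[$prefix])) {
        $used[$prefix] = $known[$prefix];
      }
    }
  }
  ksort($used);
  return $used;
}

// Builds the xmlns:* attribute string for the <html> element.
function example_rdf_xmlns_attributes(array $namespaces) {
  $attributes = array();
  foreach ($namespaces as $prefix => $uri) {
    $attributes[] = 'xmlns:' . $prefix . '="' . htmlspecialchars($uri) . '"';
  }
  return implode(' ', $attributes);
}
```

A page that only renders objects mapped to dc: and sioc: terms would then declare just those two prefixes.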

Example for a node:

  $node->nid = 3;
  $node->type = 'blog';
  $node->title = 'Title of my blog post';
  $node->body = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';
  $node->uid = 2;
  $node->status = 1;
  $node->format = 1;
  $node->moderate = 0;
  $node->promote = 0;
  $node->sticky = 0;
  $node->name = 'john';
  $node->created = '1235130980';
  $node->changed = '1235130980';
  $node->comment = 0;
  $node->rdf = array(
    'type' => 'sioct:Weblog',
    'title' => 'dc:title',
    'body' => 'sioc:content',
    'uid' => 'dc:creator',
  );

Example for a taxonomy term:

  $term->tid = 1;
  $term->vid = 1;
  $term->name = 'Art';
  $term->description = 'Art refers to a diverse range of human activities, creations, and expressions';
  $term->rdf = array(
    'type' => 'skos:Concept',
    'name' => 'rdfs:label',
    'description' => 'rdfs:comment',
    'property' => 'sioc:topic',
  );

The same goes for comments, users, etc.

Typical use case: when a module creates its own content type via hook_node_info(), the developer should be able to specify the content type semantics along with its definition. This gets automatically exported in RDFa. Other modules can access and alter these semantics. How sweet!
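
A rough sketch of that use case, assuming an 'rdf' key in the content type definition and an alter step that other modules can hook into. Both the key and the function names are hypothetical; hook_node_info() has no such key today.

```php
<?php
// Hypothetical: a module declares its content type together with an 'rdf'
// mapping. The 'rdf' key is an assumption of this sketch.
function example_node_info() {
  return array(
    'recipe' => array(
      'name' => 'Recipe',
      'rdf' => array(
        'type' => 'sioc:Item',
        'title' => 'dc:title',
        'body' => 'sioc:content',
      ),
    ),
  );
}

// Hypothetical alter step: another module adjusts the mapping by reference.
function othermodule_rdf_mapping_alter(array &$mapping, $node_type) {
  if ($node_type == 'recipe') {
    $mapping['title'] = 'rdfs:label';
  }
}

// Collects the declared mapping for a type, then lets each alter function
// change it. Unknown types get an empty mapping.
function example_rdf_mapping($node_type, array $alter_functions) {
  $info = example_node_info();
  $mapping = isset($info[$node_type]['rdf']) ? $info[$node_type]['rdf'] : array();
  foreach ($alter_functions as $function) {
    $function($mapping, $node_type);
  }
  return $mapping;
}
```

In a real implementation the list of alter functions would presumably come from the module system rather than being passed in by hand.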

I would like to discuss this idea before moving on with the work of RDFa in core.

Comments

Comment #1

Frando commented 20 February 2009 at 16:10

This sounds absolutely amazing!
Keeping RDF data with the main data objects in Drupal could then be used for many purposes, including by contrib. It could also be of use when e.g. creating XML exports, as every bit of structured semantics we have helps there. Adding this would be a huge step forward IMO.

To pass the data to the theming layer, I'd propose we add the RDF name of each child of the node to the array we create for drupal_render, as in adding an '#rdf' => 'dc:title' property to the 'title' element in the node array, etc.
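
A minimal sketch of the '#rdf' idea, assuming elements carry their content in '#value'. The two functions are hypothetical stand-ins; this is not how drupal_render() itself works.

```php
<?php
// Hypothetical: copies each mapped term onto the matching child element as
// an '#rdf' property. Children without a mapping are left untouched.
function example_attach_rdf(array $elements, array $mapping) {
  foreach ($mapping as $key => $term) {
    if (isset($elements[$key]) && is_array($elements[$key])) {
      $elements[$key]['#rdf'] = $term;
    }
  }
  return $elements;
}

// Stand-in for a theme function: wraps the escaped value in a span carrying
// the RDFa property attribute when '#rdf' is present.
function example_render_element(array $element) {
  $value = htmlspecialchars($element['#value']);
  if (isset($element['#rdf'])) {
    return '<span property="' . htmlspecialchars($element['#rdf']) . '">' . $value . '</span>';
  }
  return $value;
}
```

The theme function only reads '#rdf' and never decides which term applies, which is the separation the proposal is after.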

Comment #2

fago commented 24 February 2009 at 11:13

I agree that D7 should ship with RDF data for everything. However, I'm not sure this approach suffices: we might also want to add triples that have to be generated from the object in a more complex way, so it might make sense to support separate handlers for getting triples. I think this could learn a lot from #113614: Add centralized token/placeholder substitution to core, which in turn could be based upon this patch (see http://drupal.org/node/113614#comment-1285264).

Comment #3

dman commented 1 March 2009 at 23:22

While I'm totally, madly in favor of thinking about internal Drupal structures as RDF, and for exposing them, I think this exact approach (notation at least) needs further thinking.
I'm already doing full RDF representations of taxonomies in taxonomy_xml, and have spent large bits of the last year researching different syntaxes. And I still don't know which dialect is correct. :-/

  $term->rdf = array(
    'type' => 'skos:Concept',
    'name' => 'rdfs:label',
    'description' => 'rdfs:comment',
    'property' => 'sioc:topic',
  );

- Having to cobble together 3 or 4 different namespaces to say one thing feels like we are using the tools wrong.
Not even including parents and synonyms yet - I use rdfs:subClassOf and owl:equivalentClass, although sometimes I feel I should use wordnet wn:hyponym etc. I'm guilty of it myself, but feel dirty using lots of disparate namespaces just because none fit.
But this is because of the ways taxonomy gets used - when doing a locations DB, a medical dictionary, a product classification scheme, sometimes rdfs:subClassOf, wn:hyponym, sioc:BroaderTerm are differently appropriate. Which is why I admit I haven't found the true answer. But this did lead me to this:

Although it may be getting a little TOO abstract, I think of Drupal-RDF (as I did back when I tried to build relationship.module) as a mapping between what Drupal thinks of as a 'term' and what our RDF dialect does. We hope to find a 1:1 relationship, but that's not really true.
What I'm trying to say is:

Leave drupal-data in a drupal-native format as much as possible.

Expose a get_rdf($obj, $type) function that provides the mapping and returns the namespaced rdf array (and the appropriate namespaces while it's at it.)

  // Simple mapping of a set of owl:equivalentProperty statements.
  $drupal_rdf_mappings = array(
    'node:title' => 'dc:title',
    'node:body' => 'sioc:content',
    'node:name' => 'dc:creator',
    'term:type' => 'skos:Concept',
    'term:name' => 'rdfs:label',
    'term:description' => 'rdfs:comment',
    'term:property' => 'sioc:topic',
  );

get_rdf() will of course invoke the hooks required to let everyone else mess with the data.

In some cases, the mapping can (needs to) be done via callbacks, e.g. dc:creator = array('user_get_fullname', 'node:uid') or something.

This mapping can be in flux, and tweakable depending on the site. For some vocabs, terms should be exposed as folksonomy:tag while for others they are just a glossary:item.
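
To make the get_rdf() idea concrete, a rough sketch, assuming a mapping entry is either a plain term or an array with 'term' and 'callback' keys. That array shape, the function names, and the hard-coded user lookup are all assumptions for illustration; hook invocation is left out.

```php
<?php
// Hypothetical mapping table in the 'object type:field' => term notation.
// An entry may also name a callback that derives the value from the field.
function example_rdf_mappings() {
  return array(
    'node:title' => 'dc:title',
    'node:body' => 'sioc:content',
    'node:uid' => array('term' => 'dc:creator', 'callback' => 'example_user_name'),
  );
}

// Hypothetical callback: turns a uid into a display name. A real one would
// load the user; this one uses a fixed lookup table.
function example_user_name($uid) {
  $names = array(2 => 'john');
  return isset($names[$uid]) ? $names[$uid] : NULL;
}

// Returns array(term => value) for the fields of $obj that have a mapping
// for the given object type. The object itself stays Drupal-native.
function example_get_rdf($obj, $type) {
  $rdf = array();
  foreach (example_rdf_mappings() as $source => $target) {
    list($source_type, $field) = explode(':', $source, 2);
    if ($source_type != $type || !isset($obj->$field)) {
      continue;
    }
    $value = $obj->$field;
    if (is_array($target)) {
      // Callback mappings compute the value instead of copying the field.
      $value = call_user_func($target['callback'], $value);
      $target = $target['term'];
    }
    $rdf[$target] = $value;
  }
  return $rdf;
}
```

Because the table lives outside the object, a site could swap or alter it without touching how nodes and terms are stored.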

I've been thinking about these things a lot (too much) over the last few years.

.dan.

Comment #4

fago commented 7 March 2009 at 16:10

@#2: see http://groups.drupal.org/node/19786

@#3: I agree that we should have internally fixed prefixes/namespaces, probably including some basic schema. This ensures metadata is available in a unique way and allows other code to make use of it easily - e.g. the token integration. Then we could add a mapping to external vocabularies. Should we do that or not? Probably that's one of the first things we should agree upon.

Comment #5

scor commented 7 March 2009 at 18:45

@#1: yes! I'll look into drupal_render

@#3:

- Having to cobble together 3 or 4 different namespaces to say one thing feels like we are using the tools wrong.[..] I'm guilty of it myself, but feel dirty using lots of disparate namespaces just because none fit.

This is a different issue. I'm not debating which namespaces/terms to use here, but simply how to store them, whatever namespace they use. In a different issue, we will need to discuss which terms best fit each of the content types and fields defined in core. Contrib modules should be able to alter these mappings depending on their purpose.

when doing a locations DB, a medical dictionary, a product classification scheme, sometimes rdfs:subClassOf, wn:hyponym, sioc:BroaderTerm are differently appropriate.

and that is fine because each vocabulary will have its own set of RDF mappings, the same way field instances mappings will be different from each other.

Could you elaborate a bit more on your idea of get_rdf($obj, $type) please? What does 'term:property' => 'sioc:topic', mean here, which vocabulary and term does it refer to? Where are these mappings defined? I would like them to be definable in the module and serialized in the db along with the rest of the content type/fields settings. They should also be definable via a basic UI as part of the core content type creator (and respectively the Fields API whether it ends up in core or contrib).

Comment #6

febbraro commented 9 March 2009 at 14:03

subscribing

Comment #7

pvhee commented 13 March 2009 at 11:50

subscribing

Comment #8

mitchell commented 14 March 2009 at 23:28

subscribing

Comment #9

barinder commented 17 March 2009 at 13:46

Category: feature » task

subscribing

Comment #10

peterx commented 26 April 2009 at 23:08

Why dc:creator?

Something worth discussing in a separate post and documenting. When we associate RDF with a column of a table, we go through a decision process. If we had a page where we could document our decisions, others could follow. Something like:
Table: Column: RDF tag: References to the formal documentation for that namespace and tag. Why we use that namespace and tag.

users: uid: Dublin Core (http://dublincore.org/): creator (http://dublincore.org/documents/usageguide/elements.shtml). Just copied the previous example.

We could then have long discussions about the meaning of content. In the creator case, the Dublin Core asks for a name, which is users.name, but users.name is not in other tables.

We then branch into using other namespaces. How do we differentiate between users.name and users.uid? Is there a namespace covering this? Do we start drupal: or the more generic cms:? We can use drupal:uid or equivalent to get to a definition that says the dc:creator is users.name. At this point we can define the RDF meaning of a field when defining the schema.

We can include the definition with our modules so people will better understand the meaning of columns in the database. In effect, any column with an RDF description is intended for public exposure and anything without RDF is for internal use only with no intention of long term consistency.

Comment #11

scor commented 27 April 2009 at 08:54

dc:creator is just an example and is not so relevant in this issue. What I want to discuss here is how and where to specify these mappings, whatever they are. We will need to agree on the mappings on a different issue, but first let's clarify the how and where. I'm not sure we should clutter our output in core with proprietary RDF terms on a new namespace drupal: or cms:. These won't be useful for RDFa consumers which will rather find things like dc:title or foaf:name. users.uid will translate into a URI, and users.name into a foaf:name value.

Comment #12

Freso commented 27 April 2009 at 10:16

Subscribing. (Sorry, I haven't had time to read it through yet, so I have no useful feedback... yet.)

Comment #13

peterx commented 27 April 2009 at 23:36

Hello Scor,
My previous post was more about starting documentation on what we used and explaining the relationship between Drupal fields and the RDF terms applied to them. Start a page where people can place the references to external documents.

The reverse path will also appear in the documentation. When we find RDF output, we can search the documentation page to find the RDF term then find the field that feeds into the RDF element.

http://drupal.org/node/219856 is a nice start on standards and it is buried in the RDF module.

Comment #14

scor commented 11 May 2009 at 18:39

We just posted a proposal for storing the mappings in core at http://groups.drupal.org/node/22124. Please review and give your feedback (either here or by editing the wiki page on gdo).

Comment #15

shunting commented 27 May 2009 at 00:36

Since there can be synonymous terms across ontologies, a syntax like this might be more appropriate. Note arrays of values, not values.

  $term->rdf = array(
    'type' => array('skos:Concept', 'foo:Idea'),
    'name' => array('rdfs:label'),
    'description' => array('rdfs:comment'),
    'property' => array('sioc:topic'),
  );

Otherwise, we've violated the principle of decentralization:

Centralization in social systems can apply to concepts, too. For example, if we make a knowledge representation system which requires anyone who uses the concept of "automobile" to use the term "http://www.kr.org/stds/industry/automobile" then we restrict the set of uses of the system to those for whom this particular formulation of what an automobile is works. The Semantic Web must avoid such conceptual bottlenecks just as the Internet avoids such network bottlenecks.
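
As a small sketch of how arrays of values could reach the output: RDFa attributes such as typeof and property accept a whitespace-separated list of terms, so a mapping with several terms can be emitted in one attribute. The function names are hypothetical.

```php
<?php
// Accepts either a single term or an array of terms and always returns a
// plain list of terms.
function example_rdf_normalize($terms) {
  return is_array($terms) ? array_values($terms) : array($terms);
}

// Builds the value of an RDFa attribute such as typeof or property by
// joining the terms with spaces.
function example_rdf_attribute_value($terms) {
  return implode(' ', example_rdf_normalize($terms));
}
```

Accepting both shapes would let existing single-value mappings keep working while new ones list several terms.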

Comment #16

dman commented 27 May 2009 at 01:49

First, the mapping at this level is about what names are used to represent intrinsic structural data. The choice between rdfs:Type and dc:type is at a higher meta-level than the choice between krog:automobile and wordnet:automobile.

At some point we have to hitch our understanding to some common understanding. I personally tried to build a system where even the concept of 'type', 'domain', 'range' etc were externally defined and mutable. It collapsed under its own weight - mostly my failings of course, but it taught me that you need to start with some hard-coded knowledge in the system. Like "what do we mean by 'type'"

anyway,
As scor said, each vocabulary can have its own mappings.
A 1:1 mapping of concepts (not a 1:many) is optimal on a per-vocab basis.
We don't want to be saying that a term is both a 'skos:Concept' and a 'foo:Idea' at the same time and in the same context. The only way to represent that would be by doubling up heaps in the serialization output.
Sometimes a drupal 'term' means one or the other, but only one at a time depending on the facet.

What we can do (with per-vocab mapping) is say that terms in the vocabulary called "Study topics" represent 'skos:Concept' and terms in the vocabulary "Discussion Topics" are notated as being of type 'foo:Idea'.
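
A rough sketch of per-vocabulary mapping, assuming mappings are keyed by vocabulary ID and fall back to a site-wide default. The function name and the vid values are hypothetical, and foo:Idea is just the placeholder term used in this thread.

```php
<?php
// Returns the term mapping for one vocabulary: per-vocabulary overrides win,
// everything else comes from the default mapping.
function example_term_rdf_mapping($vid) {
  $default = array(
    'type' => 'skos:Concept',
    'name' => 'rdfs:label',
    'description' => 'rdfs:comment',
  );
  $per_vocabulary = array(
    // Hypothetical vid 2: "Discussion Topics".
    2 => array('type' => 'foo:Idea'),
  );
  $override = isset($per_vocabulary[$vid]) ? $per_vocabulary[$vid] : array();
  // Array union keeps the left-hand value when a key exists in both.
  return $override + $default;
}
```

Each vocabulary then resolves to exactly one type at a time, which keeps the 1:1 mapping per vocab.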

And if appropriate, with nice query support, we just tag that [ 'skos:Concept' owl:equivalentClass 'foo:Idea' ]

Comment #17

scor commented 16 July 2009 at 14:52

FYI, I've updated the main patch for RDF in core at #493030: RDF #1: core RDF module. All the other RDF patches are at http://drupal.org/project/issues/search/drupal?issue_tags=RDF
Make sure to subscribe to these issues, as that's where most of the work will be discussed. We can keep this issue for general discussions.

@shunting: totally agreed with using arrays of properties, and that's already what we are doing in the current patches #493030: RDF #1: core RDF module