Following scor's comment on a previous post in the https://groups.drupal.org/semantic-web group, I thought it best to turn my post ( https://groups.drupal.org/node/508570 ) into an issue. More information is available in that post, but in short it is about changing the taxonomy tag and annotation markup to RDFa so that it is machine readable, in the hope that it will be indexed automatically into the ARC2 store set up according to https://www.drupal.org/node/2028111 . For the annotation part, I thought this would involve changing at least line 896 in https://github.com/szabyg/annotate.js/blob/gh-pages/lib/annotate.js and then working with Drupal so that it writes this markup.

Comments

bshambaugh created an issue.

fago commented:

I see. I guess the easiest way to customize that would be to override the JS file in your theme and make the necessary customizations there. It would be better if that worked without hacking annotate.js, but the library does not seem to have gotten much love recently.

scor commented:

So I guess you are asking about 2 things. Please correct where I got anything wrong.

1. How to get annotate.js to render the RDFa markup that you want. By default, I believe it will generate something like this:

<a href="http://example.com/res" resource="http://example.com/res" rel="skos:related">example</a>

which will give you a basic triple about your page:

<http://your.site.com/page>   skos:related <http://example.com/res> .

Do you want to customize that, or is that good enough for your use case?

2. How to get ARC2 and RDF indexer to not only index the node field, but also parse the RDFa generated by Annotate.js inside the node body.
You have at least 2 options for this, you can either:
a) write a patch for rdf_indexer that would add some code inside RdfIndexerArc2StoreService::indexItems() to go through the entity, find the text fields (or follow a predefined list of known text fields), parse their content and add the RDF triples to the $rdf variable before it gets inserted into the store, or
b) write a patch for rdfx that would extend rdfx_get_rdf_model() to follow a similar approach.
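To make option (a) a little more concrete, here is a rough sketch of the "parse the text field and collect triples" step. It is written in Python purely for illustration; an actual patch would be PHP inside rdf_indexer and should rely on a proper RDFa parser. The names `AnchorTripleParser` and `extract_triples` are made up for this example, and the logic only handles the simple `<a rel="..." resource="...">` pattern from point 1, not RDFa in general.

```python
from html.parser import HTMLParser


class AnchorTripleParser(HTMLParser):
    """Collects (subject, predicate, object) tuples from <a rel=... resource=...> tags.

    Hypothetical helper: this is NOT a conforming RDFa processor. It ignores
    vocab, prefix, about, property and nesting, and treats the page URI as
    the subject of every link it finds.
    """

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        predicate = attributes.get("rel")
        # Prefer @resource and fall back to @href for the object.
        target = attributes.get("resource") or attributes.get("href")
        if predicate and target:
            self.triples.append((self.page_uri, predicate, target))


def extract_triples(page_uri, body_html):
    """Return the triples found in one text field of the page at page_uri."""
    parser = AnchorTripleParser(page_uri)
    parser.feed(body_html)
    parser.close()
    return parser.triples


body = (
    '<p>See <a href="http://example.com/res" '
    'resource="http://example.com/res" rel="skos:related">example</a>.</p>'
)
triples = extract_triples("http://your.site.com/page", body)
for s, p, o in triples:
    print(f"<{s}> {p} <{o}> .")
```

In the indexer, the equivalent of `triples` would be merged into the data that is already being built for the entity before it is written to the store.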

Hope that helps, and sorry for the delayed response.

bshambaugh commented:

At first I was thinking about what the annotate button did in the IKScE demo. After inspecting the element in the browser console, I found it was calling annotate.js. I looked at the annotate.js code and found that it produced something very much like what appeared in the content:encoded tag when I viewed a previously annotated node as RDF/XML.

The markup for an annotation (e.g. Austria) looked like:
<a aria-disabled="false" href="http://localhost/iksce/?q=node/15" resource="http://localhost/iksce/?q=node/15" rel="related" class="entity place Place acknowledged">Austria</a>

After studying the markup, I did not believe it was true RDFa, so I drafted what I thought it should look like:
<p>Salzburg Research Forschungsgesellschaft mbH ist a non-academic research institute in <span vocab="http://schema.org/"><a property="schema:Comment" aria-disabled="false" href="http://localhost/iksce/?q=node/15" resource="http://localhost/iksce/?q=node/15" rel="sameAs" class="entity place Place acknowledged">Austria</a></span>. The research technology organisation specializes in applied research and development in the field of information and communication technologies (ICT) and New Media.</p>

In doing so I believe I changed skos:related into owl:sameAs. Maybe this is the wrong term, I am not sure. I wrote earlier that I want "Austria" sameAs "http://localhost/iksce/?q=node/15", or something with that meaning, instead of the two separate triples <http://localhost/iksce/node/10> <schema:sameAs> <http://localhost/iksce/?q=node/15> and <http://localhost/iksce/node/10> schema:Comment "Austria". This was not possible with the RDFa in the span tags I wrote above.

"Austria" is a string, which gave me no way to attach any more information to it (say, a URI it is related to) without violating the W3C RDFa spec. I chose to make up a dummy URI "#Austria1" to represent one particular occurrence of Austria, and I settled on this RDFa markup:

<div class="field-item-even" property="content:encoded">
<p>Salzburg Research Forschungsgesellschaft mbH ist a non-academic research institute in <span vocab="http://schema.org/"><span property="schema:comment" resource="http://localhost/iksce/?q=node/10#Austria1"></span><span resource="http://localhost/iksce/?q=node/10#Austria1"><a property="schema:name" rel="schema:SameAs" aria-disabled="false" href="http://localhost/iksce/?q=node/15" class="entity place Place acknowledged">Austria</a></span></span>. The research technology organisation specializes in applied research and development in the field of information and communication technologies (ICT) and New Media.</p>
</div>

When I parsed this into Turtle with Gregg Kellogg's RDFa Distiller, I got:

@prefix content: <http://purl.org/rss/1.0/modules/content/> .
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://localhost/iksce/?q=node/10> content:encoded """
Salzburg Research Forschungsgesellschaft mbH ist a non-academic research institute in Austria. The research technology organisation specializes in applied research and development in the field of information and communication technologies (ICT) and New Media.
""";
   schema:comment <http://localhost/iksce/?q=node/10#Austria1>;
   rdfa:usesVocabulary schema: .

<http://localhost/iksce/?q=node/10#Austria1> schema:SameAs <http://localhost/iksce/?q=node/15>;
   schema:name "Austria" .
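For anyone who wants to experiment with the "one dummy fragment URI per occurrence" pattern without touching annotate.js, here is a small Python sketch that builds the nested span markup shown above. The function name `annotation_span` and its parameters are invented for this example, and the `aria-disabled` and `class` attributes are left out for brevity. The property names are copied from the markup in this comment as written; note that schema.org itself spells the property `sameAs` with a lower-case "s".

```python
from html import escape


def annotation_span(node_uri, occurrence_id, label, target_uri):
    """Build RDFa markup for one occurrence of an annotated term.

    Hypothetical helper. A fragment URI (node_uri + '#' + occurrence_id) is
    minted for the occurrence so that both the label and the link target can
    hang off a resource instead of a plain string.
    """
    occurrence = escape(f"{node_uri}#{occurrence_id}", quote=True)
    target = escape(target_uri, quote=True)
    return (
        '<span vocab="http://schema.org/">'
        # The node points at the occurrence ...
        f'<span property="schema:comment" resource="{occurrence}"></span>'
        # ... and the occurrence carries the name and the link.
        f'<span resource="{occurrence}">'
        f'<a property="schema:name" rel="schema:SameAs" href="{target}">'
        f'{escape(label)}</a>'
        '</span></span>'
    )


markup = annotation_span(
    "http://localhost/iksce/?q=node/10",
    "Austria1",
    "Austria",
    "http://localhost/iksce/?q=node/15",
)
print(markup)
```

Running the result through an RDFa distiller is a quick way to check that the triples still match the Turtle above before wiring anything into Drupal.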

I traced the code that generates this markup pattern to Annotate.js, and at first I thought I would have to change it to produce the Turtle above. This proved difficult: I had no experience with jQuery, and changing the line that appeared to do this (line 896 of Annotate.js on GitHub) caused odd results and did not give me what I wanted. The original developer, Szaby Grünwald, has since moved on, but he did give me the names of some people at Salzburg Research who work on similar things. I became discouraged, and saw no way that someone with my skill and experience could get the desired result in the time frame set out.

I settled on something crude: screen scraping. First I attacked the tagging issue, another function of the IKScE demo. Each time I tagged something, a SPARQL query against the ARC2 store returned something like:

http://localhost/iksce/node/16 http://schema.org/isRelatedTo http://localhost/iksce/taxonomy/term/1

The http://schema.org/isRelatedTo was set for the RDF mapping for the vie.js annotate Term Reference Field.
When I went to http://localhost/iksce/taxonomy/term/1 I found that the URI field associated with the taxonomy term had been filled in. I wanted this URI in my SPARQL query results from the ARC2 store, so I worked out how to scrape it off the taxonomy page and then write it directly to ARC2 with a SPARQL INSERT. This gave me an additional triple like:

http://localhost/iksce/taxonomy/term/1 http://schema.org/Comment http://dbpedia.org/Quiet_Riot

where http://dbpedia.org/Quiet_Riot is what I scraped from the URI field.
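As an illustration of the "write it with a SPARQL INSERT" step only (the actual scrape-and-write code is in the GitHub repository linked further down, and this is not it), here is a minimal Python sketch that turns a scraped URI into an update string. The function name `build_insert` is made up, and the sketch emits standard SPARQL 1.1 `INSERT DATA`; ARC2 uses its own SPARQL+ dialect, so the exact keyword form may need adjusting for a real ARC2 store.

```python
import re

# Characters that may not appear inside <...> in SPARQL; rejecting them keeps
# a scraped value from breaking out of the triple.
_FORBIDDEN = re.compile(r'[<>"{}|^`\\\s]')


def build_insert(subject, predicate, obj, graph=None):
    """Return a SPARQL 1.1 INSERT DATA string for one URI-to-URI triple.

    Hypothetical helper: all three terms are assumed to be absolute URIs.
    """
    for uri in (subject, predicate, obj):
        if not uri or _FORBIDDEN.search(uri):
            raise ValueError(f"not a safe URI: {uri!r}")
    triple = f"<{subject}> <{predicate}> <{obj}> ."
    if graph is not None:
        if _FORBIDDEN.search(graph):
            raise ValueError(f"not a safe graph URI: {graph!r}")
        return f"INSERT DATA {{ GRAPH <{graph}> {{ {triple} }} }}"
    return f"INSERT DATA {{ {triple} }}"


query = build_insert(
    "http://localhost/iksce/taxonomy/term/1",
    "http://schema.org/Comment",
    "http://dbpedia.org/Quiet_Riot",  # value scraped from the term's URI field
)
print(query)
```

Validating the scraped value before building the string matters here, because whatever sits in the URI field ends up inside the update sent to the store.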

I am guessing at this point that I can also scrape the pages Annotate.js creates since the end goal is like the goal I had for the tagging: to have triples returned from a SPARQL query on the ARC2 store.

Adding a patch to the RDF Indexer, as scor says in point 2, sounds like an elegant way to get markup from the page (node) into the ARC2 store. Repeating for clarity: "a) write a patch for rdf_indexer that would add some code inside RdfIndexerArc2StoreService::indexItems() to go through the entity, find the text fields (or follow a predefined list of known text fields), parse their content and add the RDF triples to the $rdf variable before it gets inserted into the store"

For reference: I pulled some of this dialogue from the thread starting at https://lists.w3.org/Archives/Public/semantic-web/2016Jan/0052.html . Although I originally posted this on the main Semantic Web Drupal Group, my code for the tagging part is at https://github.com/bshambaugh/ARC2-Experiments/tree/master/scrape_and_write . Apologies for having trouble at first figuring out how to post on drupal.org.

bshambaugh commented:

Correction: the line in my previous comment that begins "The http://schema.org/isRelatedTo" should read:

The http://schema.org/isRelatedTo was set for the RDF mapping for the VIE.js autocomplete Term Reference Field.