Exporting data to the RDF store [#1271186]

This relates to #1199472: Extend core RDF mappings to support compound fields, but is different enough that I thought it deserved a separate issue.

I am trying to export data from an embedded spreadsheet (the sheetnode module) to the RDF store. I am assuming that this is best done using compound fields. So far I am creating the RDF output on the fly in hook_rdf_model_alter() (with the required patches to rdfx, namely #1240778: drupal_alter() support in rdfx_get_rdf_model() and #1237078: Align rdfx_get_rdf_model() prototype with entity API naming convention). With those in place, the following code seems to work, at least in that it is called while the RDF data is being built:

function sheetnode_rdf_model_alter($res, &$context)	{
	if ($context['type'] == 'node')	{
		//	Get the wrapper from the node ID
		$wrapper = entity_metadata_wrapper($context['type'], $context['data']);
		//	Get the entity data from the wrapper
		$entity = $wrapper->value();
		$bundle_type = $entity->type;
		if ($bundle_type == 'sheetnode')	{
			//	This is a sheetnode type, so we can add the RDF data
			Subject = ??;
			Predicate = ??;
			$res->index[Subject][Predicate]['0']['value'] 		= 'MyValue';
			$res->index[Subject][Predicate]['0']['type'] 		= 'MyType';
			$res->index[Subject][Predicate]['0']['datatype'] 	= 'MyDataType';
		}
	}
}

I have not yet figured out exactly how to programmatically get the Predicate or Subject value from the CURIE, but I am not expecting that to be difficult. My main question has to do with how to translate the compound #mapping designation into these assignments. Should it be ['#mapping']['value'], for example, in place of ['value'], or something else? Also, if I set this correctly, can I expect that the SPARQL queries would be able to pick it up OK? Or is trying this too bleeding edge at this time? Any comments about a more appropriate way to go about this are also welcome, or whether this approach is close to best practices or not at this time.

Comments

Comment #0.0

bkudrle commented 6 September 2011 at 18:31

Issue summary: Correct text

Comment #0.1

bkudrle commented 6 September 2011 at 18:43

Issue summary: Added span tag to try to show status info

Comment #0.2

bkudrle commented 6 September 2011 at 18:45

Issue summary: Use span tags for status on other two links

Comment #1

scor commented 6 September 2011 at 18:50

I'm not very familiar with the sheetnode module, but I guess what you are trying to do is create a new (RDF) resource per row and a triple per column?
- The subject will be whatever URI you want to associate with each row of your spreadsheet. If you have unique IDs in there, it could be something like http://example.org/node/123#ID; if you already have unique URIs in your spreadsheet, you could just use those.
- The predicate relates to the column the value comes from. Say, for example, the name column in a list of people: you could use the foaf:name predicate. The predicate is the same for every value of a given column. That is where the compound RDF mappings could come into play: you could define them via an API, but given that we don't have a stable API for that yet, it is probably better to hard-code them in your module for now, e.g. in sheetnode_rdf_model_alter() via a switch() statement.
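The row-to-subject and column-to-predicate idea can be sketched in plain PHP as follows. This is only an illustration under assumptions: the helper names (example_row_subject(), example_column_predicate(), example_add_row()), the column names, and the "node URL plus #row-ID" URI scheme are all hypothetical and are not part of sheetnode or rdfx. It fills a plain array shaped like the index used above.

```php
<?php
// Hypothetical helpers for illustration; none of these exist in sheetnode or rdfx.

/**
 * Builds a subject URI for one spreadsheet row.
 * Assumed scheme: the node URL followed by '#' and the row ID.
 */
function example_row_subject($node_url, $row_id) {
  return $node_url . '#' . rawurlencode($row_id);
}

/**
 * Maps a column name to a predicate URI with a hard-coded switch().
 * The column names here are made up for the example.
 */
function example_column_predicate($column) {
  switch ($column) {
    case 'name':
      return 'http://xmlns.com/foaf/0.1/name';

    case 'homepage':
      return 'http://xmlns.com/foaf/0.1/homepage';

    default:
      // Unmapped columns produce no triple.
      return NULL;
  }
}

/**
 * Adds one triple per mapped cell of a row to an index array.
 */
function example_add_row(array &$index, $node_url, $row_id, array $row) {
  $subject = example_row_subject($node_url, $row_id);
  foreach ($row as $column => $cell) {
    $predicate = example_column_predicate($column);
    if ($predicate !== NULL) {
      $index[$subject][$predicate][] = array(
        'value' => $cell,
        'type' => 'literal',
      );
    }
  }
}

// Usage: one row with a mapped column ('name') and an unmapped one ('notes').
$index = array();
example_add_row($index, 'http://example.org/node/123', '7', array(
  'name' => 'Alice',
  'notes' => 'n/a',
));
```

Inside sheetnode_rdf_model_alter() the same loop would write into $res->index instead of a local array. Hard-coding the switch() keeps the mapping in one place, so it should be straightforward to swap for an API-driven mapping later.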

$res->index[Subject][Predicate]['0']['value'] = 'MyValue';

MyValue would be the value of the current cell.

$res->index[Subject][Predicate]['0']['type'] = 'MyType';

This line sets the type in the ARC2 RDF index structure. You probably don't want to include it unless you need a very specific type such as 'uri'. If all your values are just plain text or numbers (no URIs), you can set it to 'literal' or leave it out so that ARC2 guesses the type automatically.

$res->index[Subject][Predicate]['0']['datatype'] = 'MyDataType';

That's the datatype in RDF. Again, it is not necessary if you are just exporting plain text, but you could set it to xsd:integer, for example, if you really care about having strict integers in your RDF output.
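To make the three keys concrete, here is a small sketch of what object entries might look like for a plain literal, a typed literal, and a URI. The helper names (example_literal(), example_uri()) and the foaf:age column are hypothetical. The sketch also assumes the datatype is given as a full URI rather than as the CURIE xsd:integer; that is worth checking against the actual RDF output.

```php
<?php
// Hypothetical helpers for illustration; not part of ARC2, sheetnode or rdfx.

/**
 * Builds a literal object entry, with an optional datatype URI.
 */
function example_literal($value, $datatype = NULL) {
  $object = array(
    'value' => (string) $value,
    'type' => 'literal',
  );
  if ($datatype !== NULL) {
    // Assumption: the datatype is a full URI, not a CURIE.
    $object['datatype'] = $datatype;
  }
  return $object;
}

/**
 * Builds an object entry that points at another resource.
 */
function example_uri($uri) {
  return array(
    'value' => $uri,
    'type' => 'uri',
  );
}

$xsd_integer = 'http://www.w3.org/2001/XMLSchema#integer';
$subject = 'http://example.org/node/123#7';

$index = array();
// Plain text cell: no datatype needed.
$index[$subject]['http://xmlns.com/foaf/0.1/name'][] = example_literal('Alice');
// Numeric cell exported as a strict integer.
$index[$subject]['http://xmlns.com/foaf/0.1/age'][] = example_literal(42, $xsd_integer);
// Cell holding a URI rather than text.
$index[$subject]['http://xmlns.com/foaf/0.1/homepage'][] = example_uri('http://example.org/alice');
```

For a spreadsheet that only holds text, the first form is all that is needed.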

Also, if I set this correctly, can I expect that the SPARQL queries would be able to pick it up OK?

Absolutely. Whatever RDF you generate in rdfx_get_rdf_model() and its drupal_alter() implementations will end up in the SPARQL endpoint. I suggest first working out the RDF output you want by visiting node/123.rdf (make sure the RESTws module is enabled), and then indexing the whole site with SPARQL endpoint. Note that a change in sheetnode_rdf_model_alter() is reflected immediately in the RDF export of each node/entity, but you will need to reindex or resave the node to see the new RDF data in the SPARQL endpoint.

Hope that helps. If you have more questions, please ask them here or on IRC in #drupal-rdf. It's great to see such advanced uses of the RDF modules!

Comment #1.0

scor commented 6 September 2011 at 18:52

Issue summary: Added #

Comment #2

bkudrle commented 7 September 2011 at 23:54

@scor - That definitely helps. Thanks for taking the time to reply. I will definitely take advantage of the RestWS approach to check the RDF output - I hadn't been doing that previously. And it is good to know that there is no problem with the SPARQL endpoint with these custom formats. It is exciting to be able to do this in Drupal. There have been several projects that have done something similar, e.g., RDF 123 and XL Wrap, but they are Java based. Now with the excellent work that you and Lin have done in maturing the RDF and SPARQL capabilities in Drupal, it is possible to start thinking about offering these capabilities in a more approachable manner. It will probably take several days to get this working like it should, but this sets me on the right path. Thanks again.