Needs review
Project:
RDF Indexer
Version:
7.x-1.x-dev
Component:
Code
Priority:
Minor
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
27 Jun 2013 at 14:11 UTC
Updated:
23 Oct 2020 at 15:22 UTC
Comments
Comment #1
scor commented
Hi Adam, this is great! Could you indicate which mechanisms Virtuoso supports for receiving RDF data via HTTP: SPARQL Update, or something else? If there are different ways of sending RDF data to Virtuoso, it would be good to also compare their pros and cons (in particular in terms of performance) so we can decide which one is the most appropriate for RDF Indexer to use.
Comment #2
ashepherd commented
I believe the version I have locally supports SPARQL Update (http://docs.openlinksw.com/virtuoso/rdfsparql.html), and it may support other methods as well, so I will investigate and, if so, write multiple implementations to benchmark. I imagine that somewhere in the configuration UI, I'll ask the Drupal user for a Virtuoso username/password that has rights to the SPARQL_UPDATE Virtuoso user group, but it looks like I can handle that with the SearchApiAbstractService methods. I'll keep this thread posted on progress, and feel free to get in touch anytime.
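As a rough illustration of the SPARQL Update route mentioned above, the sketch below builds (but does not send) an HTTP POST carrying an INSERT DATA operation, in the shape described by the W3C SPARQL 1.1 Protocol. It is written in Python for brevity rather than in the module's PHP. The endpoint URL, graph IRI, and triple are placeholders, authentication is left out, and whether a given Virtuoso version accepts this exact request is an assumption to verify.

```python
from urllib.request import Request


def build_update_request(endpoint, graph, triples):
    """Build (without sending) a SPARQL 1.1 Update POST request."""
    # One INSERT DATA operation that writes all triples into a named graph.
    body = "INSERT DATA { GRAPH <%s> { %s } }" % (graph, " ".join(triples))
    return Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


request = build_update_request(
    "http://localhost:8890/sparql-auth",  # hypothetical endpoint URL
    "http://example.org/graph/drupal",    # hypothetical graph IRI
    ['<http://example.org/node/1> <http://purl.org/dc/terms/title> "Hello" .'],
)
print(request.get_method())
print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes it easier to compare alternative transports in a benchmark, since only the sending step changes.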
Comment #3
ashepherd commented
Comment #4
ashepherd commented
I've got a proof of concept working at: https://github.com/ashepherd/rdf_virtuoso
What's the best way to collaborate? Should I create a Drupal.org sandbox for this and go down the path of creating a separate module? Soon, I'll get a chance to test it out with some data to see how well it performs.
Comment #5
ashepherd commented
Using devel generate, I created 1000 nodes, which took about 20 seconds on my laptop. I then indexed those and verified they exist in Virtuoso; that took about 20 seconds as well. I have a Drupal site on a larger server with 500,000 nodes, so I'm going to try it out there.
{UPDATE}
My dev server, which runs a bunch of sites and is not optimized, updated the same Virtuoso instance (on the same server) with 583,282 nodes in 2 hours and 12 minutes, averaging about 4,419 nodes per minute.
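For reference, the average can be reproduced from the two figures given above. This short Python check uses only the numbers in this comment:

```python
# Figures reported above: 583,282 nodes indexed in 2 hours and 12 minutes.
nodes = 583_282
minutes = 2 * 60 + 12

nodes_per_minute = nodes / minutes
print(f"{nodes_per_minute:.1f} nodes per minute")
```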
Comment #6
scor commented
These numbers look great. I need to look at your code further, but it looks to me like there is a lot of code duplication. The service class you implement should probably extend the service class from rdf_indexer. I think I should abstract the current ARC2 service class into an rdf_indexer base class that each backend service class (ARC2, Virtuoso, OWLIM, etc.) would extend.
Also, if your code was made as a patch against rdf_indexer, you would just need to register your class in rdf_indexer_search_api_service_info() and not need to repeat the rest of the code. Looks like you managed to get all the code to deal with Virtuoso indexing, so this is great progress!
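The base-class arrangement proposed above can be sketched roughly as follows. This is a Python illustration of the pattern only, not the module's PHP code: the class names, the `store` method, and the `SERVICE_INFO` registry (a stand-in for what rdf_indexer_search_api_service_info() would return) are all hypothetical.

```python
from abc import ABC, abstractmethod


class RdfIndexerBaseService(ABC):
    """Hypothetical base class: shared indexing flow lives here."""

    def index_items(self, items):
        """Store each item's triples and return the ids that were indexed."""
        indexed = []
        for item_id, triples in items.items():
            self.store(item_id, triples)
            indexed.append(item_id)
        return indexed

    @abstractmethod
    def store(self, item_id, triples):
        """Backend-specific write (ARC2, Virtuoso, ...)."""


class InMemoryService(RdfIndexerBaseService):
    """Toy backend standing in for a real store."""

    def __init__(self):
        self.graph = {}

    def store(self, item_id, triples):
        self.graph[item_id] = list(triples)


# Hypothetical registry: adding a backend means adding one entry here.
SERVICE_INFO = {"in_memory": InMemoryService}

service = SERVICE_INFO["in_memory"]()
indexed = service.index_items({"node/1": [("s", "p", "o")]})
print(indexed)
```

With this shape, each new backend overrides only the storage step, and the shared loop is written once.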
Comment #7
ashepherd commented
I really like your idea of a base class, which would remove a lot of my duplicated code. I'm happy to write the base class and submit a patch including the service.virtuoso.inc file with an updated rdf_indexer_search_api_service_info().
Comment #8
ashepherd commented
Here is the patch with a base class and the Virtuoso service.
In testing, I've found that the entity module is causing problems with:
Comment #9
ashepherd commented
Here is an updated version of the patch:
It fixes a few Virtuoso issues:
Comment #10
scor commented
Thanks Adam for your continued work on this. I haven't reviewed the entire code in the patch, but it looks like it includes LICENSE.txt and rdf_indexer-add_virtuoso_support-2029717-8.patch (which explains the tripling in size).
Also, it's a good idea to have a separate file for each backend (as is the case for Virtuoso), but it would also make sense to have a separate class file for ARC2.
Comment #11
scor commented
Comment #12
ashepherd commented
My apologies!
I separated out the ARC2 implementation into its own file, and I modified the logic for when deleteItems is passed $ids = 'all'. Originally, if I saw 'all' I was clearing the entire graph. However, if another index is also writing to the same graph (let's say someone set up separate indexes for nodes, taxonomy terms, or other entity types), it was wiping out the other index's data. I left the clearGraph() line in the code, but commented it out in service.inc.
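To illustrate why clearing the whole graph is a problem when indexes share it, here is one possible way to scope an 'all' deletion to a single index's items. This is a Python sketch, not what the patch does (the thread does not say what replaced the graph clear). The function name, the `item_uris` mapping, and the URIs are hypothetical, and the output uses standard SPARQL 1.1 Update syntax, which is an assumption for the Virtuoso version in question.

```python
def build_delete_update(graph, ids, item_uris):
    """Return a SPARQL Update string that removes only the given items.

    ids is either the string 'all' or a list of item ids; item_uris maps
    every id owned by this index to its subject URI (hypothetical).
    """
    # 'all' means every item of THIS index, not every triple in the graph.
    targets = list(item_uris) if ids == "all" else ids
    operations = [
        "DELETE WHERE { GRAPH <%s> { <%s> ?p ?o } }" % (graph, item_uris[i])
        for i in targets
    ]
    return " ;\n".join(operations)


node_uris = {
    "1": "http://example.org/node/1",
    "2": "http://example.org/node/2",
}
update_all = build_delete_update("http://example.org/graph", "all", node_uris)
update_one = build_delete_update("http://example.org/graph", ["1"], node_uris)
print(update_all)
```

Triples written to the same graph by another index (for example taxonomy terms) have different subjects, so they are left untouched.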
Comment #13
ashepherd commented
I'm an idiot. Forgot to run 'git add' before creating the patch.
Comment #14
ashepherd commented
Hi Steph,
Here is the latest version of the Virtuoso service. It patches branch 7.x-1.x and adds a base service class, which the ARC2 and Virtuoso services implement.
cheers, Adam
Comment #15
ashepherd commented
The latest version of the patch fixes a bug with unescaped forward slashes and adds support for xml:lang.
Comment #16
avpaderno