As mentioned in #812126: Decide future and integration there are several efforts to improve search in drupal recently.

Search API is one of them, it is for 7.x branch of drupal.

Now that we have a functional version of the module for 7.x, I think it is the right time to start implementing that API. I actually met Thomas at Copenhagen, and after talking about this, I end up choosing join his effort, since it seems like the more promising. For example, facets are planned in the module.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

marvil07’s picture

After looking at the __awesome__ screencast prepared by Thomas, I am now totally convinced to start working on this :-)

marvil07’s picture

Status: Active » Needs review
FileSize
10.71 KB

This is my start version(aka needs work, search missing, but indexing already works and with tons of debug :-p), not sure about what to store in the "data" for the xapian document, I suppose that is going to be clear when I have implemented SearchApiServiceInterface::search().

drunken monkey’s picture

After looking at the __awesome__ screencast prepared by Thomas, I am now totally convinced to start working on this :-)

Excellent, all according to plan! ;D
And you have chosen a great time for starting this. I believe the module is now sufficiently stable for third party additions (which it sadly wasn't in previous so-called "beta" releases). But don't hesitate to ask if you feel there's something missing in the main module!

Subscribing here, and curious for more.

marvil07’s picture

Now, basic search is working!

So, as the last post, lots of things to care about(debugging messages, catch exceptions, get values instead of hard-coding them, support facets, complete xapian server support), but this is working!

Thomas, I think I am not getting the right way to set values on the server. Should I have to store manually somewhere my settings using SearchApiServiceInterface::configurationFormSubmit()?

marvil07’s picture

Upps.. ignore the last question, seems like it's simple form api stuff, blame me :-p

drunken monkey’s picture

Just wanted to notify you that I made a slight schema change, by adding a machine_name field in #936360: Make servers and indexes exportable. I really have no idea whether this will influence your work (if you update to my current dev version, you'll have to run the update script, in any case), but just wanted to be on the safe side and warn you. ;) Hope this is no real bother for you.

I still didn't manage to check out your integration, but will do so this week-end.

marvil07’s picture

Assigned: Unassigned » marvil07
FileSize
15.13 KB

Thomas, thanks for taking the time to notify me about that patch, like you mentions, seems not really related, but I need to try this patch on the new code, by now, I have a new(cleaner) version.

In this new patch debug messages are out, support for remote xapian databases should be working.

So I still need to:

  • catch exceptions on every xapian call
  • support facets: I need to try to do this, like in other places, I think the actual implementations can help.

Questions:

  • get values instead of hard-coding them: this is really fixed all around, but I need help to get range values(number/position of values to retrieve) to use at SearchApiServiceInterface::search().
  • on SearchApiServiceInterface::search(), I am using $query->getOriginalKeys() to get the original string. I really do not know if it is a better idea to make a method to process the value returned from $query->getKeys(), or if it is better to define a query class which implements SearchApiQueryInterface for xapian. A reference to take a decision here could be $extra variable logic, at xapian_query().
drunken monkey’s picture

get values instead of hard-coding them: this is really fixed all around, but I need help to get range values(number/position of values to retrieve) to use at SearchApiServiceInterface::search().

I don't really understand what you mean with that. Which values are hard-coded, and what is the problem in getting the range parameters? Is it on the API-side or on the Xapian-side?

on SearchApiServiceInterface::search(), I am using $query->getOriginalKeys() to get the original string. I really do not know if it is a better idea to make a method to process the value returned from $query->getKeys(), or if it is better to define a query class which implements SearchApiQueryInterface for xapian. A reference to take a decision here could be $extra variable logic, at xapian_query().

This is a bad idea, as the original keys aren't guaranteed to be a string. For complex queries you can also pass an already parsed keys array to keys(), then the original keys would be an array.
Also, it violates the given parse modes – only for mode "direct" should you use the keys directly. Otherwise you should probably take the approach I used for the Solr backend, and parse the keys array instead of subclassing the query class. You can probably copy a lot of my flattenKeys() method, since Solr and Xapian seem to have about the same syntax.

Also, setting the stemming language globally doesn't really work for multi-language sites. I guess you could create one server for each language, but it would be better, if you could decide the stemming language based on each item's search_api_language field.
Of course, this would be a problem when searching on more than one language … So, OK, maybe it is the best option that way. But not stemming at all should be an option in any case.

marvil07’s picture

Status: Needs review » Needs work

Thanks for taking the time to review this!

I got a little problem: #936956: Argument 1 passed to search_api_admin_server_view() must be an instance of SearchApiServer

About _hard-coded_ values, I mainly was pointing my hard-coded values, but I get rid of most of them in the last patch :-p, but I end up with one more; which are these lines in the search() method:

+++ search_api_xapian/service.inc
@@ -0,0 +1,442 @@
+      //FIXME stop hardcoding this
+      $start = 0;
+      $length = 10;
+++ search_api_xapian/service.inc
@@ -0,0 +1,442 @@
+      $query_parser->set_stemming_strategy(XapianQueryParser::STEM_SOME); //TODO this should be an option

Yep, like you mentioned stemming should be an option, and that's planned ;-)

About using getKeys(), ok, I going to follow the method you mentioned soon.

About languages, yep, I should auto-map values, thanks for pointing that (aka /me use to make lazy first implementations :-p)

Powered by Dreditor.

drunken monkey’s picture

+      $start = 0;
+      $length = 10;

This should probably just be:

$start = $query->getOption('offset') === NULL ? 0 : $query->getOption('offset');
$length = $query->getOption('limit') === NULL ? 1000000 : $query->getOption('limit');

When no limit is set, you have to assume the user a) is insane and b) wants all results. If Xapian has a direct way to specify "no limit", use that instead, of course.

marvil07’s picture

This still needs some work about the details mentioned.

But for now, attaching a better patch with:

  • no more hard-coding values
  • stop depending on xapian.lib.inc from xapian module
  • catch xapian exceptions everywhere
  • add install file to check xapian bindings as requirement.
marvil07’s picture

Status: Needs work » Needs review
FileSize
22.22 KB

Ok, here it is a patch that use ->getKeys() at search borrowing solr flattenKeys() method as suggested.

In the other side, it seems like parseKeys() have some problems to identify quoted strings(it first implode with spaces), so, taking in account that xapian have it own query parse class(see specially the flags parameter on parse_query() method), I think it would be great to have a way to overwrite the query class, but I am not sure how to do that.

Two more big TODOs for this issue(or maybe two more issues?):
- support views integration
- support facets integration

drunken monkey’s picture

I think it would be great to have a way to overwrite the query class, but I am not sure how to do that.

If you only want to change how the query keys are parsed, you can just override the parseModes() and parseKeys() methods. Then the only thing left is to return an instance of your query class in the server's query() method.
It would just be nice if your query class still supported at least "terms". Or, better yet, all default modes (the other two are totally simple anyways).
- support views integration
As long as you implement filters correctly, this will work automatically. That's the beauty of it. ;)

marvil07’s picture

After looking a little more(SearchApiQuery class) on the api, it expects to pass parsed keys as array.

Xapian try to do the search stuff on C++, as optimally as it can, so, all the parsing done with XapianQueryParser class is actually done internally, and the result of the parsing is a XapianQuery object, which is different from the expected array. So I think I would stay with the last patch approach of flattenKeys().

In the other side, it would be great to have a way to use the power of xapian to configure the query(as we can now configure the server options), for example the xapian parser flags to define how to do the parsing.

BTW, I end up finding one thing I think it's a little bug somewhere: #938982: Not all SearchApiQuery options are passed

marvil07’s picture

Just another update, the following patch include:

  • api update: configurationForm* server methods changed its signature
  • api update: use new machine_name to identify servers
  • use the indicated default search_api query conjuntion as the default xapian operator at xapian parse_query
  • avoid xapian query FLAG_DEFAULT to support versions less than 1.0.11
  • implement hook_uninstall
  • better description for server service
  • default way to index unknown field types
  • minor: right default values at xapian server service configuration form

I think this is almost ready for a first commit, I see views integration is working, but filters are not working at view, and I suppose the same for sorting, I need to review my search() method to include filter and sort idea that I have seen in the other implementations.

In the other side, I think facet support is going to be an independent feature request, since it implies propose a way to use facets with xapian search engine.

Thomas, thanks for notifying the api chages by mail.

marvil07’s picture

Well, I still need to implement correctly filters to support views, but now views depends on ctools :-o, so I going to review that soon.

By now, here is a patch that:

  • a better name for search api implementation module
  • avoid namespace problems if both core integration and search_api are enabled
  • see if search_api is enable at hook_uninstall before deleting rows
  • api change: #939482-25: Fix exportables machine_name is now the recommended identificator.

To be able to really work with this patch first we need to rely on this bugs to be closed ;-)

marvil07’s picture

Status: Needs review » Needs work

Ok, soon was not too soon :-/, but I am updating this to point to #962582: Cross-entity searches, that is becoming really interesting and now have a patch! :-)

BTW moving status to reflect the last work to do here implement correctly filters to support view

marvil07’s picture

API changes pending to review/incorporate: #991632-6: Incorporate newly available entity hooks.

Related open issue(bug?): #1002210: Can not use search indexes

pillarsdotnet’s picture

Category: feature » bug
Status: Needs work » Active

Both the xapian and search_api_xapian implementations of hook_requirements() use exactly the same code. If both modules are enabled, this results in a status report with "Array" shown in both the title and description columns.

I can think of three possible resolutions for this problem:

  1. File a patch against Drupal core (system_status, or possibly module_invoke_all) that merges duplicate keys instead of returning arrays where strings are expected. This may involve writing a smarter replacement for array_merge_recursive.
  2. File a patch against xapian that changes either xapian_requirements or search_api_xapian_requirements so that their returned titles differ.
  3. File a patch against xapian that changes either xapian_requirements or search_api_xapian_requirements so that an error is shown if both modules are enabled. Also mention in the README.txt that these modules are intended to be mutually exclusive.
pillarsdotnet’s picture

Status: Needs review » Active
FileSize
570 bytes

4. File a patch against search_api_xapian that lets search_api_xapian_requirements return early if xapian is also enabled.

pillarsdotnet’s picture

Status: Active » Needs review
marvil07’s picture

Category: bug » feature
Status: Active » Fixed

Ok, I just finally committed the patch at #16 after taking a look to API changes from Search API module.

I have also added three new issues related with this one:

@pillarsdotnet: Thanks for reporting the compatibility problem, but I am moving that to other issue #1042704: Fix compatibility of xapian and search_api_xapian hook_requirements() implementations, since it's almost offtopic here.

drunken monkey’s picture

Great work! Even though creating the final 1.0 release right afterwards was maybe a bit risky. ;)

I wanted to try it out myself, but even though everything up to then worked fine, when indexing (with a local service) I got a fatal error:

Fatal error: No matching function for overloaded 'new_WritableDatabase' in /usr/share/php/xapian.php on line 1134

About the same (just without "Writable") happened when I tried to search.
Do you know how I can fix this? I'm completely new to Xapian, so probably I've just messed something up during installation …
If this can happen easily, it maybe should be mentioned in the README.txt.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

pillarsdotnet’s picture

Status: Closed (fixed) » Needs review
FileSize
33.22 KB

Re-rolled #16 after the Great Git Migration. If the commit made it, I didn't see it.

marvil07’s picture

Status: Needs review » Closed (fixed)
pillarsdotnet’s picture

Category: feature » support
Status: Closed (fixed) » Active

Are you planning on backporting to 7.x-1.x or is that a dead branch?

marvil07’s picture

Category: support » feature
Status: Active » Closed (fixed)

@pillarsdotnet: Please stop hijacking this issue :-/, open another one instead. I am using master now for 7.x