While there were initially doubts about using Solr field identifiers with colons, tests showed no problems and escaping was working fine, so we decided to just use the Search API field identifiers as-is in Solr and thereby to avoid having to keep track of aliases (like, e.g., the DB backend has got to do).

However, recently several specific use cases and advanced features have popped up (see, e.g., #2053553: Fix spatial featues with the "compatibility mode" and #1744250-17: Support location based search) which don't allow field escaping and therefore won't work with field names that contain colons. I discovered (though I don't find the link anymore), that field names that aren't valid Java identifiers are generally discouraged.
So, we're left with the problem of how to deal with this situation. If we just rewrite the way we map Search API field identifiers to Solr field names, everyone would have to re-index which I would really hate to enforce.

I therefore propose the safer way of introducing a server setting, "Compatibility mode", and to only escape field identifiers if it is enabled. It would by default disabled for existing servers, but enabled for new ones. For escaping, I'd suggest just replacing all colons with a double underscore (":" => "__"), which should virtually never lead to clashes – and when it does, people can just disable the setting, as long as they don't want to sort by distance. Otherwise, we'd have to implement additional code to keep track of all field "aliases" like the DB backend, leading to more complicated code and additional sources of error.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drunken monkey’s picture

Status: Active » Needs review
FileSize
4.15 KB

Patch attached.

andrewbelcher’s picture

Status: Needs review » Needs work

I wonder whether 'Legacy mode' may be better than 'Compatibility mode'. Or to maintain the negative state of existing servers, we could call it 'Standards mode'... There's something about compatibility mode that makes me think of doing something you don't want to so that you're compatible with a different system that doesn't behave properly, where as we're fixing an issue in ours and allowing things to continue using the old system to avoid having to re-index...

Having a quick look at the spec, it looks like $ may also be valid in identifiers. I can't quite figure out if they are only valid at the start of an identifier, or anywhere within... If anywhere in, that may be an alternative that would ensure no conflicts...

I think also it needs to explicitly clear the index rather than just re-index it. Otherwise I think you could end up with legacy data...

Other than that, looks good from looking at the patch. I will see if I can find some time to test it...

drunken monkey’s picture

Status: Needs work » Needs review
FileSize
4.49 KB

Thanks for your feedback!

You're right, it does have a negative taste to it. I like "Standards mode", that seems to capture it better.

Also, I thought we could maybe just hide the option if it is enabled (and on server create), so new users won't even be aware of it. After all, there are virtually no disadvantages, except for field clashes.

Dollar signs are very interesting, and really seem to work fine. Thanks for the suggestion! I have to add, though, that writing "valid Java identifiers" was just a shortcut – the restriction doesn't have anything to do with it, the usual valid identifiers are just what works in most cases. The nasty thing is that there aren't any explicit restrictions by Solr. Some things just don't work, and you have to look at each feature separately to see what that would be.
So, while dollar signs are legal anywhere in Java identifiers, there is no guarantee that they'll always be handled correctly in Solr. To be really on the safe side, I'd guess we'd have to stick to alphanumeric characters and the underscore (like I did in my patch).

However, as said, dollar signs seem to work, and since they completely eliminate the (faint) possibility of field name conflicts, I think I prefer them over using double underscores. It would just suck if we'd have to let people re-index again, only because we find out that dollar signs don't always work either …
Anyways, let's just go with that. When conflicts become impossible, hiding the option altogether (except for old servers) becomes an even better alternative.

Regarding re-indexing vs. clearing the index: the data will only become invalid for some fields, all others would work like normal. In such cases, we usually only enforce re-indexing, to avoid an unnecessarily large impact. Users for which the incomplete data would be unacceptable can always clear the index themselves. (We can mention that in the README.txt, though.)

Revised patch attached.

drunken monkey’s picture

Title: Add a "Compatibility mode" setting » Add the option to use clean field identifiers
Category: task » feature
Status: Needs review » Fixed

Committed.

Désiré’s picture

Status: Fixed » Needs work

I'm tested it, and it's works, but:

+++ b/includes/service.incundefined
@@ -102,10 +102,37 @@ class SearchApiSolrService extends SearchApiAbstractService {
+        '#submit' => array(array($this, 'switchToCleanIds')),

Submit callback should be a string, otherwise the submit will end up with a fatal error.

drunken monkey’s picture

Which of the two – did it work or is there a fatal error? It's hardly possible that both are the case.

I just tested again, and it works fine. I also see no reason why it shouldn't work. Did you mean that there should only be a single function or method in the #submit key, or that only functions should be in the array?

Désiré’s picture

Ok, my comment was a little under described.

1. When I submit the search server edit for with the 'Switch to clean field identifiers' button, I get the following error:
Fatal error: Function name must be a string in /var/www/includes/form.inc on line 1465

2. But I wanted to test the functionality, so I called the submit function (switchToCleanIds) directly instead of form submit, and it worked.

drunken monkey’s picture

Status: Needs work » Needs review
FileSize
3.02 KB

Ah, huh … Maybe this was changed in PHP 5.3 or 5.4?
I do remember that, idiotically, the simple $function() way of calling variable functions was only supported for strings previously. Since I wasn't aware of any recent change there, I assumed the Form API was using the proper call_user_func() way to call the handlers – but now I looked and, indeed, it (again, idiotically) isn't. So I guess we really have to change that.

Patch attached, please test! (Prepend something like $this->options['clean_ids'] = FALSE; to the method body of configurationForm() to show the button again.)

Désiré’s picture

Status: Needs review » Reviewed & tested by the community

It works well.

Thank you.

drunken monkey’s picture

Status: Reviewed & tested by the community » Fixed

Godd to hear, thanks for testing!
Committed.

Désiré’s picture

Can I ask you, to create a new release from the module, from this state?

We want to use it in a project but I don't want to use the dev version if not necessary.

drunken monkey’s picture

It's been hardly more than a month since the last release, and only seven commits, which isn't much. But if it helps, sure, I'll create a new release in the next days.

Désiré’s picture

Thank you, it would be great!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.