Use Case
Search API shall be used to provide a search in an external system.
The external system is accessed via SOAP.
Architecture
The application stack would look like this:
- Facet API
- Search API
- Search API integration of external system. (Like search_api_solr)
- Client to handle the connection etc. to the external system. (Like SolrPhpClient)
The external system has special needs and has to be configurable (WSDL, credentials), thus a SearchApiService is created and registered with hook_search_api_service_info.
And because the external system is a non-entity data source, a dedicated SearchApiDataSource is defined and registered with hook_search_api_item_type_info.
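A minimal sketch of these two registrations, assuming a module called mymodule. The module name, the type and service keys, and both class names are placeholders and are not taken from the patch. Labels would normally be wrapped in t(), which is left out here so the snippet runs outside of Drupal.

```php
<?php
/**
 * Implements hook_search_api_service_info().
 *
 * "mymodule" and "MyModuleSoapService" are hypothetical names.
 */
function mymodule_search_api_service_info() {
  $services = array();
  $services['mymodule_soap_service'] = array(
    'name' => 'External SOAP service',
    'description' => 'Searches an external system via SOAP.',
    // Assumed service class; its configuration form would hold the WSDL
    // location and the credentials.
    'class' => 'MyModuleSoapService',
  );
  return $services;
}

/**
 * Implements hook_search_api_item_type_info().
 *
 * "MyModuleExternalDataSourceController" is a hypothetical class name.
 */
function mymodule_search_api_item_type_info() {
  $types = array();
  $types['mymodule_external_item'] = array(
    'name' => 'External item',
    // The dedicated data source for the non-entity items.
    'datasource controller' => 'MyModuleExternalDataSourceController',
  );
  return $types;
}
```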
Issue - chicken egg problem
The SearchApiDataSource provides metadata via getPropertyInfo() and other methods. But in this use case the metadata is located in the external system. This means we need to connect to the external system to fetch it. Since the connection properties / methods are decoupled in the specialized SearchApiService and the index configuration, we need to access this data somehow. Unfortunately the SearchApiDataSource doesn't know which index it's currently acting on, but without this information it's impossible to figure out which service / configuration has to be used.
Suggested solution
The only way I see to solve this is to make the SearchApiDataSource index-aware.
This would change some of the current usage patterns:
- Always pass a SearchApiIndex to the constructor of SearchApiDataSource.
- Remove the type parameter - we can use $index->item_type instead.
- Remove the index parameter on these methods (and change the constructs in which they are called):
  - startTracking()
  - stopTracking()
  - trackItemInsert()
  - trackItemChange()
  - trackItemQueued()
  - trackItemIndexed()
  - trackItemDelete()
  - getChangedItems()
  - getIndexStatus()
To verify that I missed nothing in the description above, I rewrote the whole code according to my suggestion. The attached patch was created using the facetapi branch of the dedicated sandbox.
After all the changes I'm now able to do something like $this->index->server()->ping() in SearchApiDataSource::getPropertyInfo() :)
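To illustrate the idea without the full patch, here is a rough, self-contained sketch. DemoServer and DemoIndex are stand-ins for SearchApiServer and SearchApiIndex, and DemoIndexAwareDataSource is a simplified stand-in for the data source; none of these classes exist in Search API, and the returned property info is made up.

```php
<?php
// Stand-in for SearchApiServer: only the ping() call used below.
class DemoServer {
  public function ping() {
    return TRUE;
  }
}

// Stand-in for SearchApiIndex: carries the item type and knows its server.
class DemoIndex {
  public $item_type = 'mymodule_external_item';
  protected $server;

  public function __construct(DemoServer $server) {
    $this->server = $server;
  }

  public function server() {
    return $this->server;
  }
}

// Simplified data source that receives the index instead of a type string.
class DemoIndexAwareDataSource {
  protected $index;

  public function __construct(DemoIndex $index) {
    $this->index = $index;
  }

  // The type no longer needs its own parameter, it comes from the index.
  public function getType() {
    return $this->index->item_type;
  }

  public function getPropertyInfo() {
    // The connection is reachable through the index.
    if (!$this->index->server()->ping()) {
      throw new RuntimeException('External system is not reachable.');
    }
    // A real implementation would ask the external system for its fields;
    // this hard-coded array is only an illustration.
    return array(
      'property info' => array(
        'title' => array('label' => 'Title', 'type' => 'text'),
      ),
    );
  }
}
```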
| Comment | File | Size | Author |
|---|---|---|---|
| | search_api-chicken-egg-issue.patch | 34.32 KB | das-peter |
Comments
Comment #1
drunken monkey commented:
Huh. Wow, that's quite some patch …
I'm really reluctant to a) make such a huge API change and b) weaken the decoupling of data source and index. Also, the type is supposed to stay fixed throughout the Search API. It's weird when it (or at least all of its properties) can change when you change the server. From an architectural point of view, the data source just shouldn't depend on the index it is used by.
What if you just encoded the connection information (e.g., the server's machine name, or some other mechanism) in the item type name? That would be much cleaner anyways (as the different servers seem to represent different types after all).
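Sketched in code, that suggestion might look roughly like this. The prefix mymodule_external__, the helper function names and the controller class name are all assumptions made for illustration. The list of servers is passed in as a plain array so the snippet runs on its own.

```php
<?php
// Hypothetical prefix; everything after it is the server's machine name.
const MYMODULE_TYPE_PREFIX = 'mymodule_external__';

// Builds the item type name for a given server machine name.
function mymodule_item_type_name($server_machine_name) {
  return MYMODULE_TYPE_PREFIX . $server_machine_name;
}

// Recovers the server machine name from an item type name, or NULL if the
// type does not belong to this (hypothetical) module.
function mymodule_server_from_item_type($type) {
  if (strpos($type, MYMODULE_TYPE_PREFIX) !== 0) {
    return NULL;
  }
  return substr($type, strlen(MYMODULE_TYPE_PREFIX));
}

// Builds one item type definition per server, in the array shape expected
// from hook_search_api_item_type_info().
function mymodule_build_item_types(array $servers) {
  $types = array();
  foreach ($servers as $machine_name => $label) {
    $types[mymodule_item_type_name($machine_name)] = array(
      'name' => 'External items on ' . $label,
      'datasource controller' => 'MyModuleExternalDataSourceController',
    );
  }
  return $types;
}
```

The data source would then call something like mymodule_server_from_item_type() on its own type to find the server it has to connect to.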
Comment #2
das-peter commented:
Thanks for your feedback.
I absolutely understand your perspective.
However let me tell you what my thoughts were before I decided to create this patch.
The index is the glue between the datasource (item type) and the server. While I can't imagine a use case where the server has to use the index, the datasource makes extensive use of the index.
There is direct usage in the methods I've listed above, but also indirect usage via the method getMetadataWrapper(), which is used in several places in SearchApiAbstractDataSourceController.
And not to forget: getMetadataWrapper() calls getPropertyInfo(), the place where my chicken-egg problem lives.
If I try to look at this from a higher level, it seems to me the current design is too Drupal/entity centric.
As long as we use the entity datasource only, the type is enough information to do all the necessary stuff in the datasource (fetch entity metadata) - but only because we're acting within the framework we build on. There the only needed "glue" is the type.
Of course, relying on just the type has some advantages in terms of usage convenience within the framework (creating an array of indexes with the same type and passing them as a parameter).
For me the external datasource use case unmasks the need for a relation instead of a decoupling.
The datasource needs the index in any case - at the moment it's just called type because it's all that's necessary to cover the framework internal use cases.
Switching from type to index won't break the framework internal use case, simply because the necessary type information is embedded in the index.
Changing this "decoupling" to a real relation might look like losing flexibility from a code perspective, but to me it seems more like enhancing the flexibility in terms of use cases.
Comment #3
drunken monkey commented:
Sorry, but I still don't really see why there should be a dependency, or why you can't just encode the information in the type.
It's true that the abstract data source controller calls getMetadataWrapper() in several places, but a) that's only my suggestion for a default implementation, it's not mandatory, and b) what has that got to do with the index? It just calls its own method. If you use external information (from the server, as it seems, not from the index) in there, you are allowed to do so, but you should keep the information/key for that in the item type. That's what the type is there for (identifying the kind of data to index), after all, it wouldn't have any use otherwise.
Comment #4
das-peter commented:
If I got you right, you suggest doing something like this:
And in the data source:
I agree this is absolutely doable.
A nasty downside I see is that you have to explain to the user that they must select the item type according to the server they want to use, and once that's done there's no way back. At the moment you can't switch the item type later on.
Besides that, the advantage of being able to collect indexes with the same item type is gone anyway (though of course only for the external data integration). The same applies to the datasource controller caching in search_api_get_datasource_controller().
In my eyes it's still better to have a dependency in the code than to depend on the knowledge of a user.
Comment #5
drunken monkey commented:
Hm, you are right about the UX there, hadn't thought about that.
However, my thinking was that such external data sources are almost always a customization for a certain site, with the site builders known, or even identical to the developers, and the search configuration maybe already stored in code.
If you need this for more untrained users, though, I agree it's a problem. However, you could rather easily overcome this problem with some slight modifications/altering to the index create and edit forms, so that the user is automatically directed to the right selections.
You certainly have a point, though. I'll have to contemplate this further.
By the way, I mentioned this issue in the project announcements – maybe someone else wants to chime in and convince either of us. ;)
Comment #6
das-peter commented:
Good idea to add this discussion to the announcements - feedback from other devs with different use cases would be really helpful.
Do you know if there's someone else who has worked with their own data sources?
Regarding your suggestion to alter the configuration form, my objection is this:
Why should we prefer a solution which makes it more complex to extend the functionality, instead of changing to a design that has better support for extending and doesn't seem to have other downsides? (Yeah, I know, the decoupling - but as described in #2 I consider this argument invalid ;D )
Hmm, I guess I don't have any argument left - now we need a negotiator :P
Comment #7
Akaoni commented:
I'm not sure I'm quite across what you're trying to do here, but...
Isn't SearchApiDataSource aware of the Item Type which is, in turn, aware of SearchApiService?
I think I'm working on something similar which has an instance of SearchApiDataSource fetching all non-Drupal indexes from the search server. These indexes are then available to users for use as read-only Search API indexes.
My in-progress code for this is:
Note: Obviously, this also involves creating Item Types with hook_search_api_item_type_info().
Useful?
Edit: I just had a proper look into this again and the only reason Item Type is aware of SearchApiService is because I added two non-API values for server and database. Worth adding something similar as optionals to the API?
Comment #8
das-peter commented:
@Akaoni: Thank you very much for your participation. Glad to see someone else has similar needs :)
The approaches in #4 and #7 are quite similar.
At least they suffer from the same problem - they introduce special knowledge for developers and UI users as already described in #4.
Comment #9
Akaoni commented:
By gum, very similar indeed!!
Teach me to only half read an issue before posting. Sorry. :/
The only thing mine adds is the ability for one server to have multiple types (external datasets) attached to it.
Will put some thought into the problem you described in #4.
Comment #10
drunken monkey commented:
Thanks a lot for chiming in!
I think I already note in the hook documentation that people are free to add any other keys they want. Your example is, in my opinion, an excellent usage of that, encoding the server information that way.
It would also allow changing the server the type is associated with later on (even though I'd consider that a very bad idea).
As a side note, I hope you're using the server's machine name for $type_info['server'], not its ID. This is a slight flaw in #4, which will cause problems when used with Features.
@das-peter:
It does have downsides:
- Worse performance for the „normal“ use case (where we can't pass in all indexes at once anymore).
- Bad architecture (which, in my opinion, still stands, and could well cause us some headaches later).
While I do want to support your use case, it's not the traditional one and violates the assumption/rule that indexes should be independent from their server (possibly except for defined service class features). This will cause chaos anyways if a user tries to change the index's server, so I guess you need to prevent this anyways. Adding a workaround for half of the problem to the framework itself just masks it, and makes the framework more interdependent along the way.
Consider for example that someone would just want some additional information on a type, for any reason. With the proposed architectural change they'd have to create a mock-up index just to do that.
But of course, maybe there's a different solution altogether. We're rooting for you, Akaoni! ;)
Comment #11
Akaoni commented:
Most welcome!! ;)
Yep:
Comment #12
Akaoni commented:
I've started a sandbox project called "Search non-Drupal indexes":
http://drupal.org/sandbox/Akaoni/1327976
This is an add-on where each server whose service has supportsFeature('search_api_non_drupal') == TRUE populates an item type for each non-Drupal index on that server. It does this through two new service functions getNonDrupalIndexes() and then getNonDrupalProperties($index_name).
This is all still pretty basic and UI hackish, but it does work.
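As a rough illustration of that flow (not the sandbox code itself): the sketch below loops over servers, checks the feature, and builds one item type per non-Drupal index. DemoIdolService, the function name, the type name pattern, the controller class name and the return shapes of getNonDrupalIndexes() and getNonDrupalProperties() are assumptions. The extra server and database keys follow the non-API values mentioned in #7.

```php
<?php
// Stand-in service class; a real one would query the search server.
class DemoIdolService {
  public function supportsFeature($feature) {
    return $feature === 'search_api_non_drupal';
  }

  // Assumed to return a flat list of index names.
  public function getNonDrupalIndexes() {
    return array('news', 'products');
  }

  // Assumed to return property definitions keyed by field name.
  public function getNonDrupalProperties($index_name) {
    return array(
      'title' => array('label' => 'Title', 'type' => 'text'),
    );
  }
}

// Builds item type definitions for every non-Drupal index on every server
// whose service supports the feature. $servers maps server machine names
// to service objects.
function demo_non_drupal_item_types(array $servers) {
  $types = array();
  foreach ($servers as $machine_name => $service) {
    if (!$service->supportsFeature('search_api_non_drupal')) {
      continue;
    }
    foreach ($service->getNonDrupalIndexes() as $index_name) {
      $types["non_drupal__{$machine_name}__{$index_name}"] = array(
        'name' => "Non-Drupal index {$index_name} on {$machine_name}",
        'datasource controller' => 'DemoNonDrupalDataSourceController',
        // Extra, non-API keys telling the data source where to look.
        'server' => $machine_name,
        'database' => $index_name,
      );
    }
  }
  return $types;
}
```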
I'm thinking the next step would be to create a secondary UI specifically for non-Drupal indexes. Eg:
My IDOL search service implements this add-on:
http://drupal.org/sandbox/Akaoni/1240206
Thoughts? Suggestions? Offers to co-maintain?
Comment #13
das-peter commented:
I think this is stuck and I got over it ;) Hope no-one minds if I close with "won't fix".
Comment #13.0
das-peter commented:
Fix markup