Hi,
I found out that the module does not currently support search indexes which hold multiple entity types. Is this correct?
This is what I found out:
My setup
Versions:
- Drupal 7.39
- Apache SOLR 5.2.1
- search_api 7.x-1.16
- search_api_solr 7.x-1.9
- search_api_attachments 7.x-1.6
- Results are being rendered by search_api_views
I have 1 custom search index with the item type selected as "multiple types" (thus using multiple entity types). Next, under Datasource options -> Entity Types I selected "node" and 1 custom ECK entity.
For search_api_attachments I have followed the readme. I am using the SOLR extraction method, and SOLR is properly configured.
In the "Filters" tab of the index, I selected the File attachments with no special settings.
The problem
When I index, the contents of attached files are not being indexed. When I run a query in SOLR, the file contents aren't picked up either.
After some searching and debugging I found this:
In callback_attachments_settings.inc, when the entity type of the index != 'file' (my case), you start traversing the $items array at line 44. When a normal index on a single entity type is used, every item in the $items array is the entity object (node etc.). But when using multiple entity types in your index, every item in the $items array is itself an array containing information about the entity, and its first element is the entity object.
Because the entity is one level deeper in the $items array, the code stops in the second foreach, where it says if (isset($item->$name)) { (line 46).
The solution?
I applied a quick fix to see if I would get any results. Right before the first foreach, where you start traversing the $items array in the big else{}, I added:
  if ($this->index->item_type == 'multiple') {
    foreach ($items as $key => $item) {
      $items[$key] = reset($item);
    }
  }
I just move the entity object up to the current $item, so that when you start looping through $items, the entity object is directly available.
It doesn't seem like the right solution, but it works for now: I'm able to index file contents and can successfully search within files.
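For anyone who wants to try the idea outside the module, here is a minimal standalone sketch of the same unwrapping step. The function name example_unwrap_items(), the 'node/1' key and the wrapper layout are made up for illustration; the only thing taken from this issue is the observation that the first element of each wrapper is the entity.

```php
<?php
/**
 * Hypothetical helper (not part of search_api_attachments): returns $items
 * with each "multiple types" wrapper replaced by the entity it wraps.
 */
function example_unwrap_items(array $items, $item_type) {
  if ($item_type !== 'multiple') {
    // Single-type index: every item already is the entity object.
    return $items;
  }
  foreach ($items as $key => $item) {
    // Cast so reset() works whether the wrapper is an array or an object.
    $wrapper = (array) $item;
    // Assumption from this issue: the first element is the entity itself.
    $items[$key] = reset($wrapper);
  }
  return $items;
}

// Stand-in data; the key and wrapper layout are illustrative only.
$node = new stdClass();
$node->nid = 1;
$wrapped = array('node/1' => array('node' => $node, 'item_type' => 'node'));

$flat = example_unwrap_items($wrapped, 'multiple');
var_dump($flat['node/1'] === $node); // bool(true)
```

Unlike the quick fix, this returns a new array instead of overwriting $items in place, which makes it easier to test on its own. An empty wrapper would come back as FALSE, so real code would probably want to skip those.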
Diff:
--- a/htdocs/sites/all/modules/contrib/search_api_attachments/includes/callback_attachments_settings.inc
+++ b/htdocs/sites/all/modules/contrib/search_api_attachments/includes/callback_attachments_settings.inc
@@ -41,6 +41,13 @@ class SearchApiAttachmentsAlterSettings extends SearchApiAbstractAlterCallback {
     }
     else {
       $fields = $this->getFileFields();
+
+      if ($this->index->item_type == 'multiple') {
+        foreach ($items as $key => $item) {
+          $items[$key] = reset($item);
+        }
+      }
+
       foreach ($items as $id => &$item) {
         foreach ($fields as $name => $field) {
           if (isset($item->$name)) {
Comment | File | Size | Author |
---|---|---|---|
#15 | no_support_for_indexes-2596283-15.patch | 5.68 KB | frob |
#13 | no_support_for_indexes-2596283-13.patch | 5.69 KB | frob |
#8 | multiple types.png | 33.45 KB | screon |
#5 | no_support_for_indexes-2596283-5.patch | 5.89 KB | screon |
Comments
Comment #2
screon commented:
I forgot about another problem:
Since an index with item_type = "multiple" doesn't return anything when getEntityType() gets called, getFileFields() returns no fields and thus the whole functionality doesn't work. When I comment out the array_key_exists part, it does work.
I'll think about a possible solution.
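One possible direction, as a rough sketch only: instead of checking a single entity type, keep a file field if any of the indexed entity types uses it. The function name example_file_fields() and the sample data are hypothetical. The input is assumed to have the shape that Drupal 7's field_info_fields() returns (field name => array with 'type' and 'bundles', where 'bundles' is keyed by entity type).

```php
<?php
/**
 * Hypothetical helper: picks the file fields attached to any of the given
 * entity types. $field_info is assumed to look like the return value of
 * Drupal 7's field_info_fields().
 */
function example_file_fields(array $field_info, array $entity_types) {
  $file_fields = array();
  foreach ($field_info as $name => $field) {
    if ($field['type'] !== 'file') {
      continue;
    }
    // Keep the field if at least one indexed entity type has it.
    if (array_intersect($entity_types, array_keys($field['bundles']))) {
      $file_fields[$name] = $field;
    }
  }
  return $file_fields;
}

// Stand-in field info; field and entity type names are made up.
$field_info = array(
  'field_document' => array(
    'type' => 'file',
    'bundles' => array('node' => array('article')),
  ),
  'field_manual' => array(
    'type' => 'file',
    'bundles' => array('my_eck_type' => array('my_bundle')),
  ),
  'body' => array(
    'type' => 'text_with_summary',
    'bundles' => array('node' => array('article')),
  ),
);

print_r(array_keys(example_file_fields($field_info, array('node', 'my_eck_type'))));
```

With the sample data this keeps field_document and field_manual and drops body. In the module itself, the list of entity types would have to come from the index's datasource options, which this sketch does not attempt to read.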
Comment #3
izus commented:
Hi,
Actually, this module supports the entity types node, file, field collections, references and entity reference (and there is an open issue to support comments).
The 'multiple' item_type doesn't seem to be a default for me (do you use another module that adds it?).
Apart from that, as each entity type handles the information about files differently, we provide support for each entity type in a submodule (look at the examples for entityreference or field_collections...).
If you have another entity type, you can add support for it and contribute it (if the entity type is provided by a contrib module) or just create a custom module for it.
If your concern is to search through different indexes that have attachment extraction data, this can be done thanks to https://www.drupal.org/project/search_api_multi.
Comment #4
screon commented:
The 'multiple' option is default in search_api, but I don't know from which version. I set up a sandbox on simplytest with search_api 1.16 as a test: https://dfyi9.ply.st/node#overlay=admin/config/search/search_api/add_index.
So I guess the best solution would be to create a submodule to support the 'multiple' item type like you suggest? I'll look into it with my colleagues, and will post my progress here.
Comment #5
screon commented:
Here is a first attempt. I'm not very experienced at this, so bear with me.
Basically I created a new submodule which does almost the same as the base attachments module, but I added one more foreach loop in alterItems() so we can pick up the file contents of the entity in the $items array. Then I write the attachments content to the $item object (for some reason, when I write it to the subitem, the search doesn't pick this up). I also changed the logic of the getFileFields() function.
Comment #6
screon commented
Comment #7
Grimreaper commented:
Hello,
Thanks for the patch.
To test it, would you please tell us which module you use to index multiple entity types into one index? Like izus, I can't see the "multiple" option.
Comment #8
screon commented:
Hi,
As I said before, this seems to be the default in search_api (at least in version 7.x-1.16). You can see it in this simplytest sandbox: https://dmrr9.ply.st/
Comment #9
izus commented:
The code in #5 seems OK, but I'll wait for some feedback from people who use it.
Comment #10
kbrinner commented:
For those who don't see the multiple option, it's when you are creating a new index at admin/config/search/search_api/add_index, and selecting from the select list 'item type' - 'multiple types' is the last option. See the screenshot in comment #8.
Comment #11
rovo commented:
Patch in #5 worked for me.
Prior to the patch, when creating a new index: I could select multiple types, pick fields, and choose File attachments from the Filters tab; but I was not able to extract the content from files that were attached to my custom entities.
After the patch, when creating a new index: I could select multiple types, pick fields, choose File attachments from the Filters tab; then I was able to select fields 'Attachment content: FIELD_NAME' as Fulltext types.
Thank you Screon!
Comment #12
frob commented:
This patch was made from the sites directory; it needs to be remade from the search_api_attachments module's root directory.
Comment #13
frob commented:
I have rerolled the patch to just throw this submodule into the existing contrib. I haven't done a full review of the module, but I can verify that it works when using multiple entity types; in my case I am using nodes and users.
Really, I don't think a submodule should be necessary to make it do this. This should just be a part of normal functionality, and as such I am also changing this to a bug report and not a support request.
Comment #14
frob commented:
I have noticed that on multivalue fields only the last file is actually indexed, the rest are not.
Comment #15
frob commented:
I fixed the issue where only the last file's content was getting indexed.
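To illustrate the kind of bug described here (this is not taken from the patch in #15, just a sketch of the general pattern): if the extracted text is assigned to a single variable inside the loop over a multivalue field, each file overwrites the previous one. Collecting the text per file and joining it avoids that. The names example_collect_attachments() and $fake_extract are hypothetical.

```php
<?php
/**
 * Hypothetical helper: gathers the extracted text of every file in a
 * multivalue field. $extract is any callable that maps one file to its text.
 */
function example_collect_attachments(array $files, $extract) {
  $contents = array();
  foreach ($files as $file) {
    // Assigning to a single variable here would overwrite the previous
    // file's text on each pass, leaving only the last file indexed.
    $contents[] = call_user_func($extract, $file);
  }
  // One string holding the text of all files, separated by spaces.
  return implode(' ', $contents);
}

// Stand-in extractor; it takes the place of the module's real extraction step.
$fake_extract = function ($file) {
  return 'text of ' . $file['filename'];
};
$files = array(
  array('filename' => 'a.pdf'),
  array('filename' => 'b.docx'),
);
echo example_collect_attachments($files, $fake_extract);
// Prints: text of a.pdf text of b.docx
```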
Comment #16
rovo commented:
frob, good catch on the multivalue. I've applied it and it's resolved that aspect for me.
Tested patch in #15
Comment #17
frob commented:
@rovo, is the issue RTBC then?
Comment #18
izus commented:
as of #16
Comment #20
izus commented:
Thanks all,
this is now merged.