Thanks for this great module. It works great except for one thing. When a user searches on a word in an attached PDF file, the node that it belongs to is found (as expected). However, the snippet (excerpt) returns nothing. So the user has no idea where to look for the given search term in the result.
Can you add the excerpt from the attached file and add it to the parent entity?
Thanks a lot!
Comment | File | Size | Author |
---|---|---|---|
#1 | 2134163-search_api_attachments_add-search-snipper.patch | 1.06 KB | BarisW |
Comments
Comment #1
BarisW commented:
I believe it can be as simple as this. What's still missing is information about which file the snippet was found in. Maybe this would work (I haven't tested it yet).
Instead of:
use this
Comment #2
izus commented:
Hi,
I do get the excerpt with the latest code base using Tika. Maybe the issue was fixed in the meantime!
I committed the hl parameter, as it may be useful to have.
Thanks
Comment #3
tostinni commented:
Hi izus,
I'm still unable to get the excerpt using the latest dev and Tika 1.4.
I reindexed all the PDFs after clearing the cache_search_api_attachments table and confirmed with the "top" command that Tika is running, but I still can't see the excerpt in Views.
Do you have any additional configuration to share?
Edit:
I noticed that the latest dev doesn't update the cache_search_api_attachments table, so I think something isn't right with this version.
Comment #4
izus commented:
Hmm, I wonder how I considered this working in my last test. Maybe I didn't drink enough coffee... I just tested it again and can't get the excerpt.
This definitely needs a patch.
Sorry for the confusion.
Comment #5
rovo commented:
Hello, just checking in to see if there has been any new development on this. I'm running into the same issue: I can't get the snippet to show in the search results for search terms that appear in the PDF attachment.
Comment #6
izus commented:
Hello rovo,
There is no commit doing this yet that I'm aware of.
Patches are welcome if anyone can contribute.
++
Comment #7
rovo commented:
Hi izus,
If you don't mind, I'd like to double-check that I understand the issue correctly and haven't just misconfigured something: Search API Attachments will not create a snippet of text from what it finds in the PDF attachment for display on the search results page? Or is Search API Attachments able to do this, and I've just misconfigured my setup?
I greatly appreciate your insight.
Comment #8
izus commented:
It doesn't do it yet, and I'd love to review a patch for this and get it in :)
Comment #9
rovo commented:
I'm looking into it, but I started thinking that maybe I'm not making the best use of this module (or have it misconfigured). Since this module isn't returning a snippet for the search results, I'm wondering how others are making use of it. What do you show for a matching search result? In my case, I've found that it returns the Summary of the node.
Comment #10
maximpodorov commented:
In my case, the Solr server returns excerpts for file attachments (and for other fields too). This requires the following query rewriting:
Comment #11
rovo commented:
Max, this looks great. I did find that Solr would already return a preview snippet of the attached PDF. I was trying to bypass Solr and rely only on the Search API, Search API Attachments, Database search, and Search pages modules. I'm trying to avoid Solr because it has a default limit on the number of tokens indexed from the PDF, and on the host provider I'm using I can't change solrconfig.xml to increase it. I've found that Search API, Search API Attachments, and Database search do index the entire PDF on their own without Solr; they just don't provide a contextual, highlighted preview snippet for the search results page like Solr does.
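To illustrate what a "contextual highlighted preview snippet" would involve without Solr, here is a minimal sketch. It assumes the extracted attachment text is already available as a plain string; `example_attachment_excerpt()` is a hypothetical helper, not part of Search API, Search API Attachments or Database search, and it only handles the first match of a single keyword.

```php
<?php

/**
 * Builds a short highlighted excerpt around the first match of a keyword.
 *
 * Hypothetical helper for illustration only; not an API of any module
 * mentioned in this issue. It works on bytes, so UTF-8 text would need the
 * multibyte (mb_*) string functions instead.
 */
function example_attachment_excerpt($text, $keyword, $radius = 60) {
  if ($keyword === '') {
    return '';
  }
  // Case-insensitive search for the first occurrence of the keyword.
  $pos = stripos($text, $keyword);
  if ($pos === FALSE) {
    return '';
  }

  // Take up to $radius characters of context on each side of the match.
  $start = max(0, $pos - $radius);
  $length = ($pos - $start) + strlen($keyword) + $radius;
  $excerpt = substr($text, $start, $length);

  // Escape first, then wrap every match inside the excerpt in <strong>.
  $safe_excerpt = htmlspecialchars($excerpt, ENT_QUOTES, 'UTF-8');
  $safe_keyword = htmlspecialchars($keyword, ENT_QUOTES, 'UTF-8');
  $pattern = '/' . preg_quote($safe_keyword, '/') . '/i';
  $highlighted = preg_replace($pattern, '<strong>$0</strong>', $safe_excerpt);

  // Mark truncation at either end.
  $prefix = $start > 0 ? '... ' : '';
  $suffix = ($start + $length) < strlen($text) ? ' ...' : '';
  return $prefix . $highlighted . $suffix;
}

// Usage with made-up sample text standing in for extracted PDF content.
$sample = 'Annual report. The quarterly budget was approved by the board.';
print example_attachment_excerpt($sample, 'budget', 10) . "\n";
```

Where such a helper would hook into the search results (for example a Search API processor or a Views field) is left open here, since nothing in this thread settles it.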
Comment #12
Anonymous (not verified) commented:
I also get no excerpts for file attachments (PDF) with Solr.
Comment #13
izus commented:
Hi fku,
Did you test what #10 suggests?
If this is really what we need, we could make the values configurable and add a patch for this.
I also don't know whether anyone has tried to do this outside of Solr, so that it is more general and covers #11 too.
++
Comment #14
Anonymous (not verified) commented:
Hi izus,
I'm not a Drupal programmer. I copied the code if (isset ... } into search_api_solr\includes\service.inc at line 1692 (version 7.x-1.x-dev 2014-05-12), inside the protected function preQuery. No excerpts.
Comment #15
izus commented:
Hi,
Try adding it to the .module file and replacing MYMODULE with your module name.
++
Comment #16
Anonymous (not verified) commented:
I don't understand. I don't have a module of my own. Should I do that in search_api_solr.module or in search_api_attachments.module?
Comment #17
izus commented:
You can just test it like this, and if it does the job, you can delete your test code and we can propose a patch :)
Comment #18
Anonymous (not verified) commented:
I have integrated the function from #10 at the end of search_api_attachments.module:
function search_api_attachments_search_api_solr_query_alter(array &$call_args, SearchApiQueryInterface $query) {
  if (isset($call_args['params']['hl']) && ($call_args['params']['hl'] === 'true')) {
    $call_args['params']['hl.fl'] = '*'; // The very essence of the trick.
    $call_args['params']['hl.requireFieldMatch'] = 'true';
    $call_args['params']['hl.fragsize'] = 400;
    $call_args['params']['hl.maxAnalyzedChars'] = 300000;
  }
}
- clear all indexed data
- flush all caches
- index
- search with my view
No excerpts.
My dev Environment:
Win 8.1 64-bit, xampp 1.8.0, JRE 1.7, Tomcat 7.0.28, Apache Solr 4.8.0, search_api 7.x-1.x-dev 2014-05-12, search_api_solr 7.x-1.x-dev 2014-05-12, search_api_attachments 7.x-1.x-dev 2014-03-02, Tika 1.5.
The Drupal installation is a copy of an online website with 3138 PDF files of 30 KB to 200 KB, 64 PDF files of 4 MB to 12 MB, and 299 nodes without attached files. The online installation has old versions of the search_api* modules. I would like to update and test this in the dev environment.
Comment #19
Robert_W commented:
The code in #10 doesn't work with the latest Search API and Search API Attachments when using database search. My documents get indexed, as I can find text in the document, but the match is not displayed as a snippet.
Doh, the code in #10 isn't supposed to work with database search.
Comment #21
izus commented:
I made #2068805: Multiple file field or multivalued file field : how to find which file contains which text ? a duplicate of this, so the excerpt should somehow include the name of the file that contains the keywords searched for.
Comment #22
c3rberus commented:
First off, this module is much needed, so thank you for maintaining it.
We're looking for this feature as well. It would be great to be able to return an excerpt of text from the indexed document in the search result, as this alone can answer the user's search query without them having to dig deeper into the document.
We're using Tika because it is pretty straightforward to set up (you only need a single file), and we're trying to avoid Solr due to its complexity.
Is this possible with Tika or only with Solr? I have this module using Tika, but I don't get any search excerpt returned in my search results.
Any possibility of getting this added? It would make a world of difference to be able to stick to search_api, search_db, views, tika and search_api_attachments.
Comment #23
natew commented:
I ran into the same issue with Search API Solr and attachments. The solution in #10 improved the excerpts with highlighting; however, highlighting of the search terms is still somewhat spotty with some PDFs.
Comment #24
c3rberus commented:
Can #10 be used with search_api_attachments and the Tika backend, or is that not supported?
Comment #25
maximpodorov commented:
I use #10 with search_api_attachments and Tika to show excerpts.
Comment #26
Grimreaper commented:
Hello,
What needs to be done to close this issue?
I think there is a duplication with #2503743: Enhance highlight support by increasing maxAnalyzedChars parameter for excerpts.
Comment #27
izus commented:
Duplicate of #2503743: Enhance highlight support by increasing maxAnalyzedChars parameter for excerpts