1. Select to index the Rendered HTML output
2. Enable Solr to produce an excerpt
3. Have a view which displays the excerpt
4. Search for a term that appears in the full html
Expected result:
Result returned and excerpt of full html page displayed
Actual result:
Result returned but no excerpt of full html page displayed
Comment | File | Size | Author |
---|---|---|---|
#17 | excerpt_highlight-2719573-17.patch | 945 bytes | garnett2125 |
#12 | 2719573_excerpt_highlight.patch | 18.5 KB | mkalkbrenner |
#7 | rendered_html_output-2719573-7.patch | 761 bytes | fran seva |
#2 | Screen Shot 2016-05-09 at 13.03.16.png | 36.44 KB | fran seva |
Comments
Comment #2
fran seva CreditAttribution: fran seva as a volunteer and at Bluespark commented
Hi @arknoll -- I've been working on this issue and I'm not sure what the expected result is.
I tried to reproduce the error by following these steps:
What I got was:
My question is, should the excerpt be displayed as full html?
Comment #3
fran seva CreditAttribution: fran seva as a volunteer and at Bluespark commented
Comment #4
arknoll CreditAttribution: arknoll commented
@fran you have to change one step in your config to reproduce:
1. Create a Solr server: I tried with and without the "Return an excerpt for all results" option
2. Create an index
3. Select to index the Rendered HTML output (instead of "Add body field to be indexed")
4. Add a preprocessor to the index (with default configuration)
5. Create a view to display indexed content with a fullText search exposed filter and displaying the Excerpt field.
6. Create dummy content, add some links to the body, and set the format to Full HTML (then check that the HTML was indexed)
7. Go to the page and search for a word that is part of the link text
This functionality is key for pages that are controlled by Panelizer. The full rendered HTML is really the only way to index content for those pages (although the problem is reproducible with a basic content type as well).
Comment #5
fran seva CreditAttribution: fran seva as a volunteer and at Bluespark commented
Thanks @arknoll, I'm able to reproduce the error.
Comment #6
fran seva CreditAttribution: fran seva as a volunteer and at Bluespark commented
Comment #7
fran seva CreditAttribution: fran seva at Bluespark commented
Hi -- After reviewing the code, @plopesc and I found that it was trying to access an object instead of an array:
$response['highlighting'][$solr_id]
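For illustration only, here is a minimal sketch of why the two access styles are not interchangeable in PHP. The JSON payload, the document ID `doc-1`, and the field name `content` are made up for the example and are not taken from the module or the patch.

```php
<?php
// Hypothetical, trimmed-down highlighting response (not real Solr output).
$json = '{"highlighting":{"doc-1":{"content":["a <em>link</em> text"]}}}';
$solr_id = 'doc-1';

// Decoded as nested arrays: array syntax works.
$as_array = json_decode($json, true);
$from_array = $as_array['highlighting'][$solr_id]['content'][0];

// Decoded as stdClass objects: property syntax is needed instead.
// Using $as_object['highlighting'] here would throw an Error.
$as_object = json_decode($json);
$from_object = $as_object->highlighting->{$solr_id}->content[0];
```

Both variables end up holding the same snippet; which syntax is right depends only on how the response was decoded.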
To make the excerpt work we have to:
Comment #8
mkalkbrenner
Looks good! Thank you.
But I think we should add a test for it.
And I'm not sure if "rendered HTML" == "excerpt" ;-)
Comment #9
mkalkbrenner
OK, leveraging the "spell" field here is just a temporary workaround because the content of "spell" is filtered by stop words, length filters, and removal of duplicates, and is converted to lower case:
As proposed in #2195465: Returning snippets for partial matches doesn't work we should use something like "rendered_item" but without HTML tags.
From my point of view we have to port the concept of the apachesolr 7.x module to store a plain text / stripped tags version of the rendered item in a dedicated field (previously "content") and use that one to generate highlighted snippets.
But before we start implementing it we need to decide how to fix #2718209: RenderedItem uses wrong DataType, leads to various issues with Solr backends.
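As a rough sketch of the "plain text / stripped tags version of the rendered item" idea, something like the following could produce the value for a dedicated snippet field. The function name `plain_text_for_snippets` is hypothetical, and this is not the apachesolr 7.x implementation; it only uses PHP's built-in `strip_tags()` and `html_entity_decode()`.

```php
<?php
/**
 * Hypothetical helper: reduce rendered HTML to plain text for snippets.
 */
function plain_text_for_snippets(string $rendered_html): string {
  // Separate directly adjacent tags so text from neighbouring elements
  // does not run together once the tags are removed.
  $spaced = str_replace('><', '> <', $rendered_html);
  // Remove the markup, then turn entities such as &amp; back into characters.
  $text = html_entity_decode(strip_tags($spaced), ENT_QUOTES | ENT_HTML5, 'UTF-8');
  // Collapse runs of whitespace into single spaces.
  return trim(preg_replace('/\s+/u', ' ', $text));
}

$html = '<p>See the <a href="/docs">user guide</a>.</p><p>Tom &amp; Jerry</p>';
echo plain_text_for_snippets($html), "\n";
// prints "See the user guide. Tom & Jerry"
```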
Comment #10
mkalkbrenner
OK, I was confused. For sure, the filters don't apply if the stored value is returned.
Nevertheless, the spell field is not that suitable. Because all full text fields are copied to spell, it might return snippets coming from hidden fields.
I'm still convinced that we must use a field that only contains the content of the rendered entity. In other words, we must not show snippets that the user no longer sees when jumping to that content.
Comment #11
mkalkbrenner
Comment #12
mkalkbrenner
OMG what a mess. After discussing with drunken_monkey what "excerpt" and "highlight" mean in Search API, it turned out that both features are broken.
"Excerpt" is the feature corresponding to the highlighted search result snippets of apachesolr 7.x. This excerpt consists of multiple snippets of a given size.
"Highlight" means replacing single field values by corresponding highlighted values (without any snippets).
The broken implementation tries to provide both features with the one and only Solr highlighter. I think I fixed it and also wrote tests for it. I also added a simple configuration for most of the parameters of the Solr standard highlighter.
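To make the distinction concrete, here is a minimal sketch of the two features as described above. The function names, the data layout (highlighter output keyed by field name), and the `...` separator are assumptions for illustration; they are not the Search API or search_api_solr API.

```php
<?php
/**
 * "Excerpt" (hypothetical helper): join several snippets into one string.
 */
function build_excerpt(array $snippets_by_field): string {
  $snippets = [];
  foreach ($snippets_by_field as $field_snippets) {
    foreach ($field_snippets as $snippet) {
      $snippets[] = $snippet;
    }
  }
  return implode(' ... ', $snippets);
}

/**
 * "Highlight" (hypothetical helper): replace single field values by their
 * highlighted counterparts, leaving fields without a match untouched.
 */
function highlight_fields(array $fields, array $highlighted_by_field): array {
  foreach ($fields as $name => $value) {
    if (!empty($highlighted_by_field[$name])) {
      $fields[$name] = $highlighted_by_field[$name][0];
    }
  }
  return $fields;
}

// Made-up example data.
$snippets = [
  'title' => ['<em>Solr</em> setup'],
  'body' => ['install <em>Solr</em> first', 'then restart <em>Solr</em>'],
];
$excerpt = build_excerpt($snippets);

$fields = ['title' => 'Solr setup', 'author' => 'someone'];
$highlighted = highlight_fields($fields, ['title' => ['<em>Solr</em> setup']]);
```

Here `$excerpt` is a single string built from three snippets, while `$highlighted` keeps the field structure and only swaps in the highlighted `title` value.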
I first wanted to fix the current issue. Now we need follow-up issues:
Comment #13
mkalkbrenner
Tests pass:
https://travis-ci.org/mkalkbrenner/search_api_solr/builds/133428852
It would be good if someone could verify that the upgrade path works.
Comment #15
mkalkbrenner
Follow-ups:
Comment #17
garnett2125 CreditAttribution: garnett2125 commented
The code does check if highlight_data is set to TRUE but doesn't change the $output at all.
Solr was highlighting my keyword, but the highlighted text wasn't returned from the getExcerpt function.
This patch fixed the issue for me.
Comment #18
mkalkbrenner
Please don't comment on closed issues. Open a new one instead.
BTW, your patch doesn't seem to be related to your comment.