There is already an issue about whitespace and the HTML filter, #2179755: HTML filter leaves whitespaces, but unfortunately it was not solved properly.
For example, suppose you have this text in the body field:
<h2>Introduction </h2>
<p>Introduction</p>
and the "Correct faulty and chopped off HTML" filter is enabled in the text format. The non-breaking space (&nbsp;) is then replaced by a special character that trim() (search_api/includes/processor_html_filter.inc:116) cannot remove when the "HTML filter" processor is enabled in the Search API filters.
This is the output for the text above with the "Correct faulty and chopped off HTML" filter enabled (the value of the $text variable at search_api/includes/processor_html_filter.inc:111):
<h2> introduction </h2>
introduction
This untrimmed space leads to the following error:
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '85213-body:value-introduction' for key 'PRIMARY': INSERT INTO
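A minimal PHP sketch of the root cause, assuming the "special character" is U+00A0 NO-BREAK SPACE (what "&nbsp;" decodes to in UTF-8). PHP's trim() only strips ASCII whitespace by default, so the NBSP stays attached to the token, while a Unicode-aware regex removes it. The variable names are illustrative; this is not the module's actual code.

```php
<?php
// "introduction" followed by U+00A0 NO-BREAK SPACE, as produced when
// "&nbsp;" ends up as a real UTF-8 character in the text.
$token = "introduction\u{00A0}";

// trim() only strips " \t\n\r\0\x0B" by default, so the NBSP survives.
var_dump(trim($token) === 'introduction'); // bool(false)

// With the /u modifier, \s also matches U+00A0, so it can be normalized away.
$clean = trim(preg_replace('/\s+/u', ' ', $token));
var_dump($clean === 'introduction'); // bool(true)
```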
I created a patch to fix this.
Comment | File | Size | Author |
---|---|---|---|
#5 | 3081180-5--html_filter_breaking_spaces.patch | 2.36 KB | drunken monkey |
Comments
Comment #2
mibfire commented: The patch that solves this issue.
Comment #3
donquixote commented
Comment #4
mibfire commented
Comment #5
drunken monkey commented: Thanks for reporting this issue and providing a patch! You’re right, it seems we hadn’t really taken non-breaking spaces and other “exotic” whitespace into account.
However, your solution seems a bit verbose. Also, if we are using the exact same code three times, we might want to split it off into its own helper method. (It seems the html_entity_decode() call can go there, too.) Please test/review my attached revision and see whether it still resolves the problem for you!
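The helper-method idea can be sketched as a single function that decodes entities, collapses Unicode whitespace and trims. This is a hypothetical illustration only: the name normalize_token() and its signature are assumptions, not the code in the attached patch or part of the Search API processor class.

```php
<?php
/**
 * Hypothetical helper (name and signature are illustrative only): decodes
 * HTML entities, collapses every run of Unicode whitespace into a single
 * ASCII space, and trims the result.
 */
function normalize_token(string $text): string {
  // Turn entities such as "&nbsp;" into real UTF-8 characters first.
  $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  // With the /u modifier PHP enables Unicode properties, so \s also matches
  // U+00A0 and other non-ASCII spaces. preg_replace() returns null on
  // malformed UTF-8; in that case keep the decoded input unchanged.
  $text = preg_replace('/\s+/u', ' ', $text) ?? $text;
  return trim($text);
}

// The same helper could then replace each of the three copied code paths.
foreach (["Introduction&nbsp;", "Introduction\u{00A0}", "  Introduction\t"] as $raw) {
  var_dump(normalize_token($raw)); // string(12) "Introduction" each time
}
```

Decoding has to happen before the regex runs; otherwise "&nbsp;" would still be literal text and \s would never see it.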
PS: In the future, please also remember to set the status to “Needs review” when posting a working patch. (And please don’t misuse the “Priority” field!)
Comment #7
drunken monkey commented
Comment #8
mibfire commented: I was thinking the same, but I was not sure where to put it. I didn't want to extend the current class because I thought it should be a global function that we might also use somewhere else.
I checked your patch, but it doesn't remove the spaces between words. Is there always only one word? Couldn't we have something like "word1(double spaces)word2"?
I also noticed on my profile page that I already have one credit in "Search API", but I am not sure whether it is for this issue or for something I did earlier. I think I should have two with this one.
Thanks
Comment #9
drunken monkey commented: I don’t think that’s true. preg_replace('/\s+/u', ' ', $token) should replace every run of whitespace, including runs between words, with a single ordinary space.
You can actually view the exact issues by clicking on “View all issue credits”. As you can see (and as I also see in the module’s commit log), it doesn’t seem like you have been credited for another Search API issue yet.
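A quick plain-PHP check of the preg_replace() point above, run outside of Search API (the $token value is made up for illustration). Because the pattern matches a run of one or more whitespace characters anywhere in the string, the spaces between words are collapsed too, not just leading or trailing ones.

```php
<?php
// Two ordinary spaces plus a NO-BREAK SPACE between the words.
$token = "word1  \u{00A0}word2";

// The whole three-character run becomes one ASCII space.
$normalized = preg_replace('/\s+/u', ' ', $token);
var_dump($normalized); // string(11) "word1 word2"
```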
Comment #10
mibfire commented: Indeed, you are right! OK, thanks!