Using Search API version 8.x-1.27 with PHP 8.1, I got an error while indexing data:
Deprecated: preg_replace(): Passing null to parameter #3 ($subject) of type array|string is deprecated in /mnt/www/html/marsinc01dev/docroot/modules/contrib/search_api/src/Plugin/search_api/processor/HtmlFilter.php on line 206
Apparently, not all items have a value for 'alt'. I have prepared a patch for it.
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | 3347610-4--fix_html_filter_for_overlong_attributes.patch | 6.35 KB | drunken monkey |
| #4 | 3347610-4--fix_html_filter_for_overlong_attributes--tests_only.patch | 1.06 KB | drunken monkey |
Issue fork search_api-3347610
Comments
Comment #2
drunken monkey
Thanks for reporting this issue!
Can you reliably reproduce this error message? If so, could you try to debug what is going on in that `processFieldValue()` method? Specifically, are there any other warnings/errors?
From looking at the code, it seems the only way for `$text` to be `NULL` at that point is for the previous `preg_replace()` call to fail with an error.
We might still want to guard against such errors, but first it would be good to know if there are other problems in that code.
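The kind of guard mentioned here could look roughly like the following. This is a minimal sketch, not the module's code: `replace_or_keep()` is a hypothetical helper name, and it assumes PHP 8.0 or later for `preg_last_error_msg()`. It relies on the documented behaviour that `preg_replace()` returns `NULL` when the PCRE engine reports an error.

```php
<?php

/**
 * Hypothetical helper: applies preg_replace() but keeps the input on failure.
 */
function replace_or_keep(string $pattern, string $replacement, string $subject): string {
  $result = preg_replace($pattern, $replacement, $subject);
  if ($result === NULL) {
    // NULL signals a PCRE error (for example an exceeded backtrack limit).
    // preg_last_error_msg() describes the most recent PCRE error.
    error_log('preg_replace() failed: ' . preg_last_error_msg());
    return $subject;
  }
  return $result;
}

// Normal case: the replacement is applied and a string comes back.
echo replace_or_keep('/\s+/', ' ', "a  \n b"), PHP_EOL;
```

With a guard like this, a failing call passes the unmodified text on to the next step instead of `NULL`, at the cost of silently skipping that one replacement.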
Comment #3
davidhk commented
I have just updated search_api from 8.x-1.25 to 8.x-1.29 and ran into this problem when I tried to rebuild the index.
One of the problem pages has an image pasted directly into the text, so the HTML is:
`<p><img alt="" src="image/png;base64,iVBORw0 - LONG HEX STRING FOLLOWS`
It has the `alt=""`, which matches the original description of the problem.
I've attached a copy of the contents of `$value` that was passed into the function `processFieldValue()`.
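To check locally whether a value like this trips the regular expression, something along these lines may help. It is only a sketch. The input is synthetic: a repeated character stands in for the real image data, which is not reproduced here. The pattern is the `alt` expression quoted later in this thread. Lowering `pcre.backtrack_limit` is an assumption made so that a failure can show up with a small input. Whether a real page fails depends on its size and on the `pcre.backtrack_limit` and `pcre.jit` settings.

```php
<?php

// Assumption: a deliberately low limit so a small synthetic input suffices.
ini_set('pcre.backtrack_limit', '1000');

// Synthetic stand-in for an inline image: an unterminated tag with a long attribute.
$text = '<p><img alt="" src="image/png;base64,' . str_repeat('A', 100000);

$pattern = '/<[-a-z_]+[^>]*["\s]alt\s*=\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i';
$result = preg_replace($pattern, ' <img>$2$3</img> ', $text);

if ($result === NULL) {
  // This is the NULL that a later preg_replace() call would receive as $subject.
  echo 'preg_replace() returned NULL: ', preg_last_error_msg(), PHP_EOL;
}
else {
  echo 'No PCRE error with these settings.', PHP_EOL;
}
```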
Comment #4
drunken monkey
Thanks a lot for this additional information; that enabled me to reproduce the problem.
Well, using regular expressions there was the lazy way out, but I guess that fails when the backtracking gets too much. So, attached is a reimplementation using just normal `mb_strpos()` calls and one very straightforward regular expression that only uses possessive quantifiers.
Please test/review!
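To illustrate the general idea (a string search to locate the attribute, then a short anchored pattern with possessive quantifiers), here is a rough sketch. It is not the attached patch: `find_attribute_value()` is a hypothetical function, and it uses the byte-based `stripos()` rather than `mb_strpos()` so that the example needs no extension beyond PCRE.

```php
<?php

/**
 * Hypothetical sketch, not the attached patch: returns the value of the first
 * quoted $name attribute in $html, or NULL if none is found.
 */
function find_attribute_value(string $html, string $name): ?string {
  // Possessive quantifiers (*+) never give characters back, so the engine
  // does not backtrack into the repeated parts when the match fails.
  $pattern = '/\G\s*+=\s*+(?|"([^"]*+)"|\'([^\']*+)\')/';
  $offset = 0;
  while (($pos = stripos($html, $name, $offset)) !== FALSE) {
    $offset = $pos + strlen($name);
    // \G anchors the match at $offset, directly after the attribute name.
    if (preg_match($pattern, $html, $matches, 0, $offset)) {
      return $matches[1];
    }
  }
  return NULL;
}

$html = '<p><img alt="" src="image/png;base64,' . str_repeat('A', 100000) . '">';
var_dump(find_attribute_value($html, 'alt'));
var_dump(find_attribute_value($html, 'title'));
```

The sketch does not check that the name starts at an attribute boundary, so real code would need stricter matching around the name.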
Comment #6
godotislate
I can confirm that the deprecation error no longer appears after applying the patch, and `$value` ends up the same. I'll hold off on RTBC for now because I have not reviewed the patch contents.
Comment #8
drunken monkey
If it works for you, that’s already good to know, thanks. Since there is automated test coverage, I think that’s already enough.
Merged. Thanks again!
Comment #10
chris64
Some things are not clear.
Is this the real problem or just an idea?
So `alt=""` produces this error? And why? Parameter #3 (`$subject`) is the returned value of the previous `preg_replace()` call, which is `null` if an error occurred. And what is such an error? What is the connection?
`""` is a problem since it matches no expression in:
`$text = preg_replace('/<[-a-z_]+[^>]*["\s]alt\s*=\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i', ' <img>$2$3</img> ', $text);`
This does not look nice. An alternative expression along the same lines, which also matches `""` or `''`, would be:
`$text = preg_replace('/<[-a-z_]+[^>]*["\s]alt\s*=\s*(?|"([^"]*)"|\'([^\']*)\')[^>]*>/i', ' <img>$1</img> ', $text);`
In the same spirit, instead of:
`$text = preg_replace('/(<[-a-z_]+[^>]*["\s])title\s*=\s*("([^"]+)"|\'([^\']+)\')([^>]*>)/i', '$1 $5 $3$4 ', $text);`
rather:
`$text = preg_replace('/(<[-a-z_]+[^>]*["\s])title\s*=\s*(?|"([^"]*)"|\'([^\']*)\')([^>]+>)/i', '$1 $3 $2 ', $text);`
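The difference between the two `alt` expressions can be seen on a few short inputs. The snippet below is only an illustration: the patterns are copied from the comment above, while the sample tags are made up. It shows what each pattern does with an empty attribute and how the branch reset group `(?|...)` lets both quote styles share capture group 1. It says nothing about the backtracking behaviour on very long inputs.

```php
<?php

// Patterns copied from the comment above; only the sample input is made up.
$original = '/<[-a-z_]+[^>]*["\s]alt\s*=\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i';
$proposed = '/<[-a-z_]+[^>]*["\s]alt\s*=\s*(?|"([^"]*)"|\'([^\']*)\')[^>]*>/i';

$samples = [
  '<img alt="" src="x.png">',
  '<img alt="A cat" src="x.png">',
  "<img alt='A dog' src='y.png'>",
];

foreach ($samples as $html) {
  $old = preg_replace($original, ' <img>$2$3</img> ', $html);
  $new = preg_replace($proposed, ' <img>$1</img> ', $html);
  printf("%s\n  original: %s\n  proposed: %s\n", $html, var_export($old, TRUE), var_export($new, TRUE));
}
```

For the empty `alt=""`, the original pattern leaves the tag untouched, while the proposed one replaces it with an empty `<img></img>` pair. For non-empty values both give the same result.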