Using Search API 8.x-1.27 with PHP 8.1, I got an error while indexing data:

Deprecated: preg_replace(): Passing null to parameter #3 ($subject) of type array|string is deprecated in /mnt/www/html/marsinc01dev/docroot/modules/contrib/search_api/src/Plugin/search_api/processor/HtmlFilter.php on line 206

Apparently, not all items have a value for 'alt'; I have prepared a patch for it.

    Comments

    itaran created an issue. See original summary.

    drunken monkey’s picture

    Status: Active » Postponed (maintainer needs more info)

    Thanks for reporting this issue!
    Can you reliably reproduce this error message? If so, could you try to debug what is going on in that processFieldValue() method? Specifically, are there any other warnings/errors?
    From looking at the code, it seems the only way for $text to be NULL at that point is for the previous preg_replace() call to fail with an error.

    We might still want to guard against such errors, but first it would be good to know if there are other problems in that code.
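A minimal sketch of such a guard, assuming a hypothetical helper named `replace_or_keep()` that is not part of Search API. `preg_replace()` returns NULL when PCRE reports an error; the demonstration below provokes that with invalid UTF-8 and the `/u` modifier only because it fails deterministically, whereas the failure discussed in this issue is presumably a backtracking limit.

```php
<?php

// Hypothetical helper (not Search API code): fall back to the unmodified
// text when preg_replace() fails and returns NULL.
function replace_or_keep(string $pattern, string $replacement, string $text): string {
  $result = preg_replace($pattern, $replacement, $text);
  // NULL means PCRE hit an error; preg_last_error() says which one.
  return $result ?? $text;
}

// Valid input: the replacement is applied as usual.
$clean = replace_or_keep('/\s+/u', ' ', "a  b");

// "\xE9" on its own is not valid UTF-8, so the /u pattern fails and the
// original string is passed through instead of NULL.
$kept = replace_or_keep('/\s+/u', ' ', "caf\xE9  x");
$error_code = preg_last_error();
```

Passing the original text through keeps later `preg_replace()` calls from receiving NULL as `$subject`, which is what triggers the deprecation notice on PHP 8.1.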

    davidhk’s picture

    Version: 8.x-1.27 » 8.x-1.29
    Status: Postponed (maintainer needs more info) » Active
    Attachment (new): 756.53 KB

    I have just updated search_api from 8.x-1.25 => 8.x-1.29 and ran into this problem when I tried to rebuild the index.

    One of the problem pages has an image pasted directly into the text, so the html is:
    <p><img alt="" src="image/png;base64,iVBORw0 - LONG HEX STRING FOLLOWS

    It has the alt="" which matches the original description of the problem.

    I've attached a copy of the contents of $value that was passed into function processFieldValue()

    drunken monkey’s picture

    Status: Active » Needs review
    Attachments (new): 1.06 KB, 6.35 KB

    Thanks a lot for this additional information, that enabled me to reproduce the problem.
    Well, using regular expressions there was the lazy way out, but I guess that fails when the backtracking gets too much. So, attached is a reimplementation using just normal mb_strpos() calls and one very straightforward regular expression that only uses possessive quantifiers.
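To illustrate what a possessive quantifier does, here is a small sketch. It is not the code from the attached patch; `extract_alt()` is a hypothetical helper and the sample tags are made up.

```php
<?php

// Illustrative only: NOT the implementation from the patch.
// Hypothetical helper that returns the alt text of a single tag, or NULL.
function extract_alt(string $tag): ?string {
  // "*+" is the possessive form of "*": once it has consumed characters,
  // the engine does not give them back, which limits backtracking.
  // "(?|...)" makes both quote styles capture into group 1.
  $pattern = '/\salt\s*+=\s*+(?|"([^"]*+)"|\'([^\']*+)\')/i';
  if (preg_match($pattern, $tag, $matches) === 1) {
    return $matches[1];
  }
  return NULL;
}

// A tag with an empty alt and a long inline image, similar in shape to the
// reported markup (the payload here is just filler).
$tag = '<img alt="" src="data:image/png;base64,' . str_repeat('A', 200000) . '">';
$alt = extract_alt($tag);
```

Here `$alt` is an empty string rather than NULL, so an empty `alt` attribute is still recognized.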

    Please test/review!

    godotislate’s picture

    I can confirm that the deprecation error no longer appears after applying the patch, and `$value` ends up the same. I'll hold off on RTBC for now because I have not reviewed patch contents.

    • drunken monkey committed 542e798b on 8.x-1.x
      Issue #3347610 by drunken monkey, itaran: Fixed error in HTML filter...
    drunken monkey’s picture

    Status: Needs review » Fixed

    If it works for you, that’s already good to know, thanks. Since there is automated test coverage, I think that’s already enough.
    Merged. Thanks again!

    Status: Fixed » Closed (fixed)

    Automatically closed - issue fixed for 2 weeks with no activity.

    chris64’s picture

    Some things are not clear.

    but I guess that fails when the backtracking gets too much.

    Is that the real problem or just an idea?
    So alt="" produces this error? And why? Parameter #3 ($subject) is the value returned by the previous preg_replace() call, which is null if an error occurred.
    And what is that error? What is the connection? An empty "" is a problem because it matches nothing in the expression,
    $text = preg_replace('/<[-a-z_]+[^>]*["\s]alt\s*=\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i', ' <img>$2$3</img> ', $text);
    That does not look nice. An alternative expression in the same spirit, which also matches "" or '', would be,
    $text = preg_replace('/<[-a-z_]+[^>]*["\s]alt\s*=\s*(?|"([^"]*)"|\'([^\']*)\')[^>]*>/i', ' <img>$1</img> ', $text);
    In the same spirit, instead of,
    $text = preg_replace('/(<[-a-z_]+[^>]*["\s])title\s*=\s*("([^"]+)"|\'([^\']+)\')([^>]*>)/i', '$1 $5 $3$4 ', $text);
    I would rather use,
    $text = preg_replace('/(<[-a-z_]+[^>]*["\s])title\s*=\s*(?|"([^"]*)"|\'([^\']*)\')([^>]+>)/i', '$1 $3 $2 ', $text);
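A small sketch comparing the two alt patterns quoted above on made-up sample markup (the variable names and the samples are assumptions for illustration only):

```php
<?php

// First pattern quoted above: "+" requires at least one character in alt.
$old_pattern = '/<[-a-z_]+[^>]*["\s]alt\s*=\s*("([^"]+)"|\'([^\']+)\')[^>]*>/i';
// Proposed pattern: "(?|...)" is a branch reset group, so both quote styles
// capture into group 1, and "*" also accepts an empty value.
$new_pattern = '/<[-a-z_]+[^>]*["\s]alt\s*=\s*(?|"([^"]*)"|\'([^\']*)\')[^>]*>/i';

// Made-up sample markup, not taken from the reported page.
$samples = [
  '<img alt="" src="x.png">',
  '<img alt="A cat" src="x.png">',
];

foreach ($samples as $html) {
  $old = preg_replace($old_pattern, ' <img>$2$3</img> ', $html);
  $new = preg_replace($new_pattern, ' <img>$1</img> ', $html);
  printf("%s\n  old: %s\n  new: %s\n", $html, var_export($old, TRUE), var_export($new, TRUE));
}
```

For the empty alt, the first pattern leaves the tag untouched while the second replaces it with an empty `<img></img>` pair. Neither call returns NULL on input this small, which suggests that the empty attribute alone is not what makes preg_replace() fail.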