The attached patch fixes an off-by-one error in the drupal_substr() function of the includes/unicode.inc file.

The original code looks like this:

    $strlen = strlen($text);
// ... skipping ...
      $bytes = $istart; $chars = 0;
      while ($bytes < $strlen && $chars < $length) {
        $bytes++;
        $c = ord($text[$bytes]);
        if ($c < 0x80 || $c >= 0xC0) {
          $chars++;
        }
      }

Notice that on the last iteration of the loop, it will try to extract $test[$bytes] where the recently-incremented $bytes value is equal to the length of the $text string. This results in an "undefined index" error, as the element indexes range from [0 .. strlen()-1], not from [1 .. strlen()] as the code would imply.

My corrected version follows:

    $strlen = strlen($text);
// ... skipping ...
      $bytes = $istart+1; $chars = 0;
      while ($bytes < $strlen && $chars < $length) {
        $c = ord($text[$bytes]);
        $bytes++;
        if ($c < 0x80 || $c >= 0xC0) {
          $chars++;
        }
      }

Trivial two-line patch attached, created via "cvs diff" from a current checkout of the DRUPAL-6 branch.

Comments

dave reid’s picture

Can you provide a case that I can use to test this error?

Damien Tournoud’s picture

Can you provide a case that I can use to test this error?

Better yet, we should have an unit test for this function :)

Related issue: #276453: Tests needed: unicode.inc

pillarsdotnet’s picture

Dunno about testcases, but on the two Drupal-6 installations I have, it comes up all the time. My experience is perhaps atypical because I use a custom-built php binary that doesn't have the mbstring extension installed.

Along those lines, I'd like to ask: is the extension really worth it, performance-wise? I suppose that having binary code would be quicker than looping character-by-character in PHP, but my installation is memory-starved, and the extension pulls in so many dependency libraries that it triples the size of the resultant (statically-compiled) PHP binary.

Damien Tournoud’s picture

Version: 6.x-dev » 7.x-dev
Status: Needs review » Needs work

The patch is wrong... the real range is [0 .. strlen($text) - 1], no need to start at 1.

pillarsdotnet’s picture

@Damien "The patch is wrong ..."

You are correct, and after a long battle trying to make the code work as advertised without generating warnings, I gave up and recompiled PHP with mbstring support.

Damien Tournoud’s picture

Status: Needs work » Closed (duplicate)