
The attached patch fixes an off-by-one error in the drupal_substr() function of the includes/unicode.inc file.
The original code looks like this:
$strlen = strlen($text);
// ... skipping ...
$bytes = $istart; $chars = 0;
while ($bytes < $strlen && $chars < $length) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
Notice that on the last iteration of the loop, it will try to extract $test[$bytes] where the recently-incremented $bytes value is equal to the length of the $text string. This results in an "undefined index" error, as the element indexes range from [0 .. strlen()-1], not from [1 .. strlen()] as the code would imply.
My corrected version follows:
$strlen = strlen($text);
// ... skipping ...
$bytes = $istart+1; $chars = 0;
while ($bytes < $strlen && $chars < $length) {
$c = ord($text[$bytes]);
$bytes++;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
Trivial two-line patch attached, created via "cvs diff" from a current checkout of the DRUPAL-6 branch.
Comment | File | Size | Author |
---|---|---|---|
includes-unicode.inc-off-by-one-fix.diff | 823 bytes | pillarsdotnet |
Comments
Comment #1
dave reidCan you provide a case that I can use to test this error?
Comment #2
Damien Tournoud CreditAttribution: Damien Tournoud commentedBetter yet, we should have an unit test for this function :)
Related issue: #276453: Tests needed: unicode.inc
Comment #3
pillarsdotnet CreditAttribution: pillarsdotnet commentedDunno about testcases, but on the two Drupal-6 installations I have, it comes up all the time. My experience is perhaps atypical because I use a custom-built php binary that doesn't have the mbstring extension installed.
Along those lines, I'd like to ask: is the extension really worth it, performance-wise? I suppose that having binary code would be quicker than looping character-by-character in PHP, but my installation is memory-starved, and the extension pulls in so many dependency libraries that it triples the size of the resultant (statically-compiled) PHP binary.
Comment #4
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe patch is wrong... the real range is [0 .. strlen($text) - 1], no need to start at 1.
Comment #5
pillarsdotnet CreditAttribution: pillarsdotnet commented@Damien "The patch is wrong ..."
You are correct, and after a long battle trying to make the code work as advertised without generating warnings, I gave up and recompiled PHP with mbstring support.
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis is now superseded by #352359: [core] Fixes for drupal_strpos().