PHP offers two families of regular expression functions. The ereg functions expect POSIX-style pattern syntax, while the preg functions use Perl-compatible syntax.

Drupal mostly uses the latter, php.net states that the latter is faster, and it seems that ereg may be removed entirely in PHP 6. However, six places remain where core uses ereg. I don't see any problem with rewriting these expressions for preg, which would standardize core on a single syntax and might (negligibly) improve performance.

If there is a problem or disadvantage with using preg in these places, it should be documented inline (it currently isn't).

Here's a grep.

$ grep ereg -R includes/ modules/
includes/file.inc:    $regex = '/\.(' . ereg_replace(' +', '|', preg_quote($extensions)) . ')$/i';
includes/file.inc:        elseif ($depth >= $min_depth && ereg($mask, $file)) {
includes/unicode.inc:  if (!$bom && ereg('^<\?xml[^>]+encoding="([^"]+)"', $data, $match)) {
includes/unicode.inc:      $data = ereg_replace('^(<\?xml[^>]+encoding)="([^"]+)"', '\\1="utf-8"', $out);
modules/blogapi/blogapi.module:  if (eregi('<title>([^<]*)</title>', $contents, $title)) {
modules/blogapi/blogapi.module:    $contents = ereg_replace('<title>[^<]*</title>', '', $contents);

Attached files:
Comment #4: drupal-preg-286893-4.patch (4.64 KB), by cburschka

Comments

damien tournoud

Ok, I'm responsible for one of them (the ereg_replace() in drupal_xml_parser_create()), it was a long time ago, while I was young and naive (oh, wait a minute, I'm still at least one of the two, goooood!).

Those can be changed without side effects (it already started in user_validate_name(), which got into D7 some weeks ago), no doubt about it. But don't forget also split() and spliti(), which are hidden forms of ereg.

cburschka

Thanks, I forgot the functions that use POSIX patterns but don't have ereg in their names. spliti() is never used, but split() is.

$ grep ' split(' -R *       
modules/filter/filter.module:      list($tag) = split('[ >]', substr($chunk, 2 - $open), 2);
modules/profile/profile.module:        $values = split("[,\n\r]", $value);
modules/profile/profile.module:        $lines = split("[,\n\r]", $field->options);

I'm out of time for rolling a patch tonight, but it's trivial really. For the most part, the only changes the patterns need are adding delimiters (//) and escaping any slashes inside them.
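
To illustrate the delimiter and escaping change, here is a minimal sketch of how a few of the lines from the greps above might look with preg. The sample input strings are made up for the example and are not taken from core, and the attached patch may differ in detail.

```php
<?php
// Hypothetical sample inputs for illustration only.
$contents = '<html><title>My blog</title><body>Hello</body></html>';
$data = '<?xml version="1.0" encoding="iso-8859-1"?><root/>';

// Was: eregi('<title>([^<]*)</title>', $contents, $title)
// The "/" in "</title>" collides with the "/" delimiter, so it is escaped.
// The "i" modifier replaces the case-insensitive eregi() variant.
$found = preg_match('/<title>([^<]*)<\/title>/i', $contents, $title);

// Was: ereg_replace('<title>[^<]*</title>', '', $contents)
// Choosing "#" as the delimiter avoids the escaping altogether.
$stripped = preg_replace('#<title>[^<]*</title>#', '', $contents);

// Was: ereg('^<\?xml[^>]+encoding="([^"]+)"', $data, $match)
// No slashes in the pattern, so only the delimiters are added.
$has_encoding = preg_match('/^<\?xml[^>]+encoding="([^"]+)"/', $data, $match);

// Was: ereg_replace('^(<\?xml[^>]+encoding)="([^"]+)"', '\\1="utf-8"', $out)
// The \\1 backreference in the replacement keeps working with preg_replace().
$recoded = preg_replace('/^(<\?xml[^>]+encoding)="([^"]+)"/', '\\1="utf-8"', $data);

// Was: split("[,\n\r]", $value)
$values = preg_split("/[,\n\r]/", "red,green\nblue");
```

These particular patterns only use syntax that POSIX and Perl-compatible expressions share, which is why wrapping them in delimiters is enough here.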

Anonymous

Subscribing

cburschka

Status: Active » Needs review
Attachment: drupal-preg-286893-4.patch (4.64 KB, new)

Note: file_scan_directory() exposes its regular expression handling to contrib, so changing the function is an API change. That has two consequences:

1.) That part cannot be ported to 6.x, unless we can somehow convert patterns from POSIX to Perl-compatible syntax on the fly (which is probably not worth it).
2.) Core uses file_scan_directory() in a lot of places, making that part non-trivial (and I'm still short on time). This patch fixes only the 8 trivial lines.
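
For a sense of what an on-the-fly conversion would involve, here is a rough sketch. The helper name posix_to_preg_sketch() is hypothetical and not part of core or the patch. It only adds delimiters and escapes unescaped slashes. It does not account for the real semantic differences between the two syntaxes (for example, a backslash inside a bracket expression is a literal character in POSIX but an escape in Perl-compatible patterns), which is part of why this is probably not worth it.

```php
<?php
/**
 * Hypothetical helper: wraps a POSIX-style pattern in "/" delimiters so it
 * can be passed to the preg functions. This is a sketch, not a faithful
 * converter. It assumes the pattern only uses syntax both flavors share.
 */
function posix_to_preg_sketch($posix_pattern, $case_insensitive = FALSE) {
  $out = '';
  $length = strlen($posix_pattern);
  for ($i = 0; $i < $length; $i++) {
    $char = $posix_pattern[$i];
    if ($char === '\\' && $i + 1 < $length) {
      // Copy an existing escape sequence unchanged, so an already escaped
      // slash is not escaped a second time.
      $out .= $char . $posix_pattern[++$i];
    }
    elseif ($char === '/') {
      // An unescaped slash would end the pattern early, so escape it.
      $out .= '\/';
    }
    else {
      $out .= $char;
    }
  }
  return '/' . $out . '/' . ($case_insensitive ? 'i' : '');
}

// Example with a mask in the style that contrib might pass in (made up here).
$mask = posix_to_preg_sketch('\.inc$');
$is_include = preg_match($mask, 'file.inc');
```

Even with such a wrapper, every contrib caller would be relying on the two engines agreeing on its particular pattern, so the API change would still need to be documented.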

catch

Status: Needs review » Closed (duplicate)

This is a duplicate of http://drupal.org/node/64967.

I'll bump the other issue.