As per http://drupal.org/node/208587 we had to disable code filter on d.o due to installing 5.6 core with the new UTF-8 validation stuff (http://drupal.org/node/208564). :(

I don't know more than that, perhaps killes can add more details if he has them.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Heine’s picture

The issue is the prepare step in the hook_filter implementation:

    case 'prepare':
      // Note: we use the bytes 0xFE and 0xFF to replace < > during the filtering process.
      // These bytes are not valid in UTF-8 data and thus least likely to cause problems.
      $text = preg_replace('@(.+?)@se', "'\xFEcode\xFF'. codefilter_escape('\\1') .'\xFE/code\xFF'", $text);
      $text = preg_replace('@[\[<](\?php|%)(.+?)(\?|%)[\]>]@se', "'\xFEphp\xFF'. codefilter_escape('\\2') .'\xFE/php\xFF'", $text);
      return $text; 

This happens before the code is sent to filter_xss which will return an empty string because it encountered invalid sequences.

JohnAlbin’s picture

We need to find a replacement for \xFF and \xFE in the code.

I was trying to find Klingon characters to use, but apparently its inclusion in Unicode is apocryphal. :-( Damn.

I was looking at the “Private Use Area” of Unicode. Private Use Area is defined as “designated code points in the Unicode Standard or other character encoding standards whose interpretations are not specified in those standards and whose use may be determined by private agreement among cooperating users.”

Many fonts use that space to add extra font-specific ligatures or other characters that only differ by its styling from another Unicode character. So I’m not sure if that’s a safe place to use either. However, what is the likelihood that people will have font-specific characters in the database?

Just thinking out loud here.

zeta ζ’s picture

Title: code filter no longer works with 5.6 core due to UTF-8 » Drupal 5.6 breaks Code Filter due to Fix for SA-2008-006

Assumed requirements:

  1. Replacements for \xFF and \xFE need to be valid bytes (ie. handled properly by IE6)
  2. Replacements need to be absent from the text as a whole

Can we scan the text before the preg_replace() for the replacements, so that they can be used with confidence? We would need a ‘fail mode’ triggered by finding the replacements, either the current complete blank, or prompt to rectify at input time. The 2nd requirement then becomes;

  1. Replacements need to be unlikely to be present in the text as a whole

Could this be a plan of attack?

hswong3i’s picture

Subscribe.

soxofaan’s picture

Status: Needs review » Active

In GeSHi filter I have the same problem (http://drupal.org/node/208635), because I copied the xFE/xFF approach of code filter.
I currently have a patch implementing a plain latin alternative with '[' and ']' and a namespace prefix 'geshifilter' to avoid possible collisions with other filters.
For example, in the preparation pass the string "<python>foo</python>" becomes "[geshifilter-python]foo[/geshifilter-python]".

(edit: fixed the messed up example)

aclight’s picture

subscribing

JohnAlbin’s picture

Status: Active » Needs review
FileSize
1.88 KB

Since this module is used on d.o., I would like some 3rd party testing, please.

zeta ζ’s picture

I think 3rd comment line no longer applies.

Would look even neater if 'php' were the same length as 'code'. :-)

hswong3i’s picture

I guess we may try chr(0), it seems able to replace the use of \xFE

I test chr(0) with following code. In case of chr(0), it retun "TRUE" so we may use it:

<?php
$text = chr(0);
//$text = "\xFE";

$return = preg_match('/^./us', $text);

if ($return == 1) {
print "TRUE";
} else {
print "FALSE";
}
?>
hswong3i’s picture

Some footnotes:

  1. chr(0) == "\x00", but "\x00" is not allowed by preg_match().
  2. Since "\x00" can pass though drupal_validate_utf8(), the logic can also apply to "\x01" and "\x02".

I work out the above logic and able to keep every thing function. The only question is: are "\x01" and "\x02" really as safe as case of "\xFE"?

aclight’s picture

Status: Active » Needs review

Re patch in #7: I get a parse error when I patch codefilter 5.x-1.x-dev:
Parse error: syntax error, unexpected $end in codefilter.module on line 119

The patch for codefilter 5.x-1.x-dev from comment #10 applies and appears to work at first glance.

I'm not sure that the method used in #10 is safe, however.

pwolanin’s picture

@hswong3i what do you mean by "safe"? characters in this range are non-printing control chars, so they would never be expected in printed text. Assuming they could be pasted into into a textarea, it seems like the worse that could happen would be that extra <code> tags might be added to the text?

In ASCII, x02 is STX (start of transmission) and x03 is ETX (end transmission), so that has a nice symmetry if you care about that.

pwolanin’s picture

patch in #10 seems to work fine on 5.x (or a similar version using x02, x03)

hswong3i’s picture

Status: Needs review » Reviewed & tested by the community

@pwolanin: Surly that I understand the nature of using x02 and x03, as they are non-printable characters, and so I use them. (off-topic: they are proved as safe when I am implementing Oracle driver for D6). My only concern is: do we have any special reason for choosing xFE and xFF? If we use them just because they are NOT valid UTF-8 characters + non-printable characters, change them as x02 and x03 should be safe, too.

I am now using patch in #10 for my personal blog + D5.6, and all seems to work fine. Should we push this issue to RTBC?

soxofaan’s picture

My only concern is: do we have any special reason for choosing xFE and xFF?

I'm also wondering about this. Working with \xFE and \xFF always seemed more a hack than a structural solution to me. If they would be replaced by other hackish ones (like x02 and x03), there is no guarantee that this approach wont break in the future too, or break other filters in the input format stack.

Is there something fundamentally wrong with the "[codefilter-code]..." preparation solution suggested in #5 and implemented in #7?

pwolanin’s picture

@soxofann - since the main point is just to bypass any collision with text that might normally occur in a post, that seems like a reasonable approach too. As I understand it, the content is never stored or passed on to other filters in an intermediate state where those characters are present, so I don't see how other filters can cause a conflict.

Bevan’s picture

http://drupal.org/node/180425#comment-634230 is another victim of this issue.

I have a patch with simpletest unit test for node_teaser() pending on the resolution of this issue so that I can copy the test and expected results from the above link.

zeta ζ’s picture

FileSize
1.71 KB

Re #11: due to ?> in comment.

pwolanin’s picture

here are two slightly different, slightly improved versions of the patch by zeta-zoo

one escapes the brackets as \[ to avoid having to do that weird concat operation.

the other simply uses slashes instead (since the regex is delineated by the @ symbol)

Both use a C-style comment to avoid the closing ?> breaking the code.

soxofaan’s picture

Status: Reviewed & tested by the community » Needs review

(Resetting status to CNR, because the solution appears to be changed since it was set to RTBC)

at #19: I prefer the square brackets as delimiters because you have an open and a close bracket, while with slashes you just have one symbol, which makes things less obvious. The square brackets are also wildly used as tag/element delimiter not only in other filters but also in other CMSes, markup languages (e.g. BBCode), wikis, etc. I'm planning to go for the "[geshifilter-java]...[/geshifilter-java]" approach in GeSHi filter for syntax highlighting. It would be nice if there could be some convergence/guidelines/standardization for drupal filters instead of hacks like xFE/xFF.

About the patch itself: why the inversion in the closing element: "[codefilter-code]...[/code-codefilter]" and not just "[codefilter-code]...[/codefilter-code]" (which would be the logical thing to do, I guess)?

(I didn't really test the patch yet, I just read it, so I don't know if it works)

hswong3i’s picture

Status: Needs review » Reviewed & tested by the community

IMHO, the approach in #10 should be enough: we keep on using xFE and xFF for more than years without any problem, and so the similar solution should be solid enough, too. On the other hand, D5.6 is target to fix a rare bug which cause by IE6 only (where IE6 is well known as buggy...), it is not easy for codefilter to face similar bug once again with existing approach.

Moreover, it is always not easy to judge for a "perfect" solution. As codefilter is used in D.O., I will suggest for a solid and prompt solution, rather than a perfect and elegant one ;-)

BTW, I am not opposing the use of a meaningful wording as placeholder; it just seems to be a bit unnecessary complicated ;-(

zeta ζ’s picture

Status: Reviewed & tested by the community » Needs review

Yes, I prefer the open & close nature of []: This is also the reason why I like [codefilter-code]...[/code-codefilter]. It would be a shame to separate the code from its /.

NB My Status drop-down defaulted to RTBC. --edit despite replying to #20

soxofaan’s picture

Status: Needs review » Reviewed & tested by the community

Moreover, it is always not easy to judge for a "perfect" solution. As codefilter is used in D.O., I will suggest for a solid and prompt solution, rather than a perfect and elegant one ;-)

I understand the urgency of solving this issue and the resulting preference for small changes.
Maybe patch #10 should be committed now to the Drupal 5 version,
and an approach like in #19 could go in later or only in the Drupal 6 version for example?

Moreover, the solution like #19 is not perfect either, it's just a bit more elegant/obvious.

it just seems to be a bit unnecessary complicated ;-(

I would rather say that the xFE/xFF/x02/x01 stuff looks obscure/complicated ;-)

soxofaan’s picture

at#22

This is also the reason why I like [codefilter-code]...[/code-codefilter]. It would be a shame to separate the code from its /.

I don't see the shame of it. I see it as prefixing 'code' and 'php' with namespace like prefix 'codefilter-'. The '/' is just to show it closes the code fragment (which was retagged from 'code' to 'codefilter-code').

As compromise, we could also go for "[codefilter-code]...[codefilter-/code]" ;-)

NB: sorry about the constant status changing due to cross posting, I won't touch it this time.

zeta ζ’s picture

Status: Reviewed & tested by the community » Needs review

#19 brackets is my favourite, and works for me on FF.

+1 RTBC

zeta ζ’s picture

Status: Needs review » Reviewed & tested by the community

Sorry.

JohnAlbin’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
1.9 KB

Sorry about #7 causing a parse error. I didn't test it; I just put the patch up to get the ball rolling while I was out of town. Anyway, I'm back now and I'm glad that there has been some discussion while I was gone.

As I understand it, the content is never stored or passed on to other filters in an intermediate state where those characters are present, so I don't see how other filters can cause a conflict.

See the input format API (where \xFE and \xFF are used in the docs.) The results of 'prepare' are seen by all the other input filters. So there can be a conflict with other input filters if they choose to use the same escape sequence. For example, if codefilter and geshifilter both used \x02code\x03 during 'prepare', than there could easily be a conflict.

Therefore, I think we need to namespace the escape sequence to prevent conflicts with other input filters.

As for using \x02 and \x03…

\xFE and \xFF were used because (since they were invalid UTF-8) they were unlikely to be included in a node's content. \x02 and \x03 are control characters and are also unlikely to be included in a node's content.

However, since \x02 and \x03 are non-printable sequences of bytes and are control characters, hackers will suspect them to be possible attack vectors, rightly thinking that programmers may forget about them when writing code. We can see that the IE6 programmers made that mistake (http://drupal.org/node/208564) with the characters starting with \xC0 - \xFF. Might someone else make that mistake for characters starting with \x00-\x0F? I’d say yes. Especially since it's now been a proven attack vector.

Using [codefilter_ doesn't suffer from that possible problem.

hswong3i’s picture

See the input format API (where \xFE and \xFF are used in the docs.) The results of 'prepare' are seen by all the other input filters. So there can be a conflict with other input filters if they choose to use the same escape sequence. For example, if codefilter and geshifilter both used \x02code\x03 during 'prepare', than there could easily be a conflict.

So I have a silly question: why there is not conflict within existing implementation? If \x02code\x03 will course conflict, why codefilter and geshifilter are able to function now?

Let's back the the main topic. As this issue is a "critical bug report": a bug that cause codefilter NOT function NOW, #10 should be the most strict forward and prompt solution. I would like to suggestion an on time commit for #10 (or any similar approach, e.g. using x02 and x03). Well, again I am not sure if it is the perfect and elegant solution; BTW, at least it is functioning :-p

On the other hand, using [xxx]code[/xxx] syntax should be more elegant; BTW, as we discussed above, we have different preference about how to implement this syntax (the one that you like may be not my first choice...). Anyway, the discussion of this syntax is just about perfect and elegant, and so I would like to suggest a new "normal feature request" issue for it. As mentioned in #23, we may come back and fix the ugly \xYYcode\xZZ syntax with [xxx]code[/xxx] afterward ;-)

P.S. I guess the exploration of perfect and elegant solution shouldn't be an excuse for delay an on time bug fix patch :-)

zeta ζ’s picture

Status: Needs review » Reviewed & tested by the community

d56.patch fails for me.

brackets-208636-19.patch works, is obvious what it trying to do, avoids conflict and is elegant.

pwolanin’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
1.93 KB

that patch doesn't apply for me.

Here's a re-rolled one - with C-style comment also. Tested with PHP 4.4.7, Drupal 5.x-dev

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

cross-post. Yes, I think it's RTBC. The only difference compared to brackets-208636-19.patch is swapping the - for _ as JohnAlbin seems to prefer.

zeta ζ’s picture

Applies for me. I don’t mind - or _, though would be nice to have GeSHi filter on-board

dww’s picture

Status: Reviewed & tested by the community » Needs work

I was about to install #30 on project.drupal.org for testing. Unfortunately, it no longer applies against the end of the DRUPAL-5 branch, since John committed revision 1.21.2.7:
http://drupal.org/cvs?commit=95578
Therefore, this needs a re-roll...

JohnAlbin’s picture

Status: Needs work » Fixed

Looking at the issue queue and mapping out some other changes that will need to be made shortly, I've moved the [codefilter_ string wrapping to the codefilter_escape function.

Derek, I've committed a slightly modified version of #27 to DRUPAL-5. If you want to test it on project.drupal.org, I can wait until you report back before tagging this as DRUPAL-5--1-1.

zeta ζ’s picture

I’ve just posted Coding standards – Temporary placeholders and delimiters inspired by this. Hope you think this is appropriate and useful.

zeta ζ’s picture

dww’s picture

i updated p.d.o to the latest code in DRUPAL-5. see http://project.drupal.org/node/85771#comment-158116 --- seems to work. further poking around on project.d.o would be appreciated. make some code and php comments in various tricky situations, make sure nested code tags are handled right, etc... thanks!

zeta ζ’s picture

<code>...</code> & <?php ?> seem to work.
<?php ?> will nest in <code>...</code> to show php embedded in html.
<code>...</code> is quoted when nested in <?php ?>: seems reasonable. </code> doesn’t end php block.
<code>...</code> will NOT nest in <code>...</code> though it doesn’t cause a problem: First </code> ends block.
<code>...</code> will NOT nest in <?php ?> that is nested in <code>...</code> as it ends the php prematurely, so no highlighting. Would anyone need this structure?

Seems to be good to deploy.

JohnAlbin’s picture

Nick, thanks for the testing. The nested code tags didn't work in the previous version either. Guess that's a separate bug.

Codefilter 5.x-1.1 is now available!

Thanks to all who helped flush out a solution for this bug! It made it much easier.

NancyDru’s picture

I'm sure it's an extreme edge case, but I did have an instance the other day to have <code> within <code> and it didn't work.

JohnAlbin’s picture

I've opened an issue for the </code> within <code> bug: http://drupal.org/node/210046

liamgoldstein’s picture

Hi,

I'm not sure if this is even the same issue but I can't add any html to my content.

I have tried to manually apply the patch to filter.module (as I don't have a local copy or ssh access on my hosting) but it's still not fixing the issue.

I have a 'out of the box' install of drupal 5.6 and It has never allowed me to add html to my content whether I choose filtered or full. When i try to preview or submit content or even change the input format configuration I just get redirected to the root of my site (i.e it doesn't display the same page with an 'updated' message it takes me to the base url).

If anyone has any ideas or is willing to give me some help I would be sooo greatful.

Thanks.

pwolanin’s picture

@liamgoldstein - this module is not part of Drupal core, so it would not have anything to do with your problem. Your problem sounds like something wrong with settings.php or .htaccess.

liamgoldstein’s picture

@pwolanin - Thanks for your quick response - I'll look into this... do you have any suggestions??

I did wonder about settings.php or .htaccess but ruled it out as it only happens with items relating to filtering - everything else works fine.

dww’s picture

@liamgoldstein: can you please submit a new support issue about your problem? Please don't "hijack" this thread with discussion about your unrelated troubles. Thanks.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.