Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
As per http://drupal.org/node/208587 we had to disable code filter on d.o due to installing 5.6 core with the new UTF-8 validation stuff (http://drupal.org/node/208564). :(
I don't know more than that, perhaps killes can add more details if he has them.
Comment | File | Size | Author |
---|---|---|---|
#30 | brackets-208636-D56-28.patch | 1.93 KB | pwolanin |
#27 | d56.patch | 1.9 KB | JohnAlbin |
#19 | slashes-208636-19.patch | 1.92 KB | pwolanin |
#19 | brackets-208636-19.patch | 1.93 KB | pwolanin |
#18 | codefilter-5.6-1.patch | 1.71 KB | zeta ζ |
Comments
Comment #1
Heine CreditAttribution: Heine commentedThe issue is the prepare step in the hook_filter implementation:
This happens before the code is sent to filter_xss which will return an empty string because it encountered invalid sequences.
Comment #2
JohnAlbinWe need to find a replacement for \xFF and \xFE in the code.
I was trying to find Klingon characters to use, but apparently its inclusion in Unicode is apocryphal. :-( Damn.
I was looking at the “Private Use Area” of Unicode. Private Use Area is defined as “designated code points in the Unicode Standard or other character encoding standards whose interpretations are not specified in those standards and whose use may be determined by private agreement among cooperating users.”
Many fonts use that space to add extra font-specific ligatures or other characters that only differ by its styling from another Unicode character. So I’m not sure if that’s a safe place to use either. However, what is the likelihood that people will have font-specific characters in the database?
Just thinking out loud here.
Comment #3
zeta ζ CreditAttribution: zeta ζ commentedAssumed requirements:
Can we scan the text before the preg_replace() for the replacements, so that they can be used with confidence? We would need a ‘fail mode’ triggered by finding the replacements, either the current complete blank, or prompt to rectify at input time. The 2nd requirement then becomes;
Could this be a plan of attack?
Comment #4
hswong3i CreditAttribution: hswong3i commentedSubscribe.
Comment #5
soxofaan CreditAttribution: soxofaan commentedIn GeSHi filter I have the same problem (http://drupal.org/node/208635), because I copied the xFE/xFF approach of code filter.
I currently have a patch implementing a plain latin alternative with '[' and ']' and a namespace prefix 'geshifilter' to avoid possible collisions with other filters.
For example, in the preparation pass the string "<python>foo</python>" becomes "[geshifilter-python]foo[/geshifilter-python]".
(edit: fixed the messed up example)
Comment #6
aclight CreditAttribution: aclight commentedsubscribing
Comment #7
JohnAlbinSince this module is used on d.o., I would like some 3rd party testing, please.
Comment #8
zeta ζ CreditAttribution: zeta ζ commentedI think 3rd comment line no longer applies.
Would look even neater if 'php' were the same length as 'code'. :-)
Comment #9
hswong3i CreditAttribution: hswong3i commentedI guess we may try
chr(0)
, it seems able to replace the use of\xFE
I test
chr(0)
with following code. In case ofchr(0)
, it retun "TRUE" so we may use it:Comment #10
hswong3i CreditAttribution: hswong3i commentedSome footnotes:
chr(0)
=="\x00"
, but"\x00"
is not allowed by preg_match()."\x00"
can pass though drupal_validate_utf8(), the logic can also apply to"\x01"
and"\x02"
.I work out the above logic and able to keep every thing function. The only question is: are
"\x01"
and"\x02"
really as safe as case of"\xFE"
?Comment #11
aclight CreditAttribution: aclight commentedRe patch in #7: I get a parse error when I patch codefilter 5.x-1.x-dev:
Parse error: syntax error, unexpected $end in codefilter.module on line 119
The patch for codefilter 5.x-1.x-dev from comment #10 applies and appears to work at first glance.
I'm not sure that the method used in #10 is safe, however.
Comment #12
pwolanin CreditAttribution: pwolanin commented@hswong3i what do you mean by "safe"? characters in this range are non-printing control chars, so they would never be expected in printed text. Assuming they could be pasted into into a textarea, it seems like the worse that could happen would be that extra <code> tags might be added to the text?
In ASCII, x02 is STX (start of transmission) and x03 is ETX (end transmission), so that has a nice symmetry if you care about that.
Comment #13
pwolanin CreditAttribution: pwolanin commentedpatch in #10 seems to work fine on 5.x (or a similar version using x02, x03)
Comment #14
hswong3i CreditAttribution: hswong3i commented@pwolanin: Surly that I understand the nature of using x02 and x03, as they are non-printable characters, and so I use them. (off-topic: they are proved as safe when I am implementing Oracle driver for D6). My only concern is: do we have any special reason for choosing xFE and xFF? If we use them just because they are NOT valid UTF-8 characters + non-printable characters, change them as x02 and x03 should be safe, too.
I am now using patch in #10 for my personal blog + D5.6, and all seems to work fine. Should we push this issue to RTBC?
Comment #15
soxofaan CreditAttribution: soxofaan commentedI'm also wondering about this. Working with \xFE and \xFF always seemed more a hack than a structural solution to me. If they would be replaced by other hackish ones (like x02 and x03), there is no guarantee that this approach wont break in the future too, or break other filters in the input format stack.
Is there something fundamentally wrong with the "[codefilter-code]..." preparation solution suggested in #5 and implemented in #7?
Comment #16
pwolanin CreditAttribution: pwolanin commented@soxofann - since the main point is just to bypass any collision with text that might normally occur in a post, that seems like a reasonable approach too. As I understand it, the content is never stored or passed on to other filters in an intermediate state where those characters are present, so I don't see how other filters can cause a conflict.
Comment #17
Bevan CreditAttribution: Bevan commentedhttp://drupal.org/node/180425#comment-634230 is another victim of this issue.
I have a patch with simpletest unit test for node_teaser() pending on the resolution of this issue so that I can copy the test and expected results from the above link.
Comment #18
zeta ζ CreditAttribution: zeta ζ commentedRe #11: due to
?>
in comment.Comment #19
pwolanin CreditAttribution: pwolanin commentedhere are two slightly different, slightly improved versions of the patch by zeta-zoo
one escapes the brackets as \[ to avoid having to do that weird concat operation.
the other simply uses slashes instead (since the regex is delineated by the @ symbol)
Both use a C-style comment to avoid the closing ?> breaking the code.
Comment #20
soxofaan CreditAttribution: soxofaan commented(Resetting status to CNR, because the solution appears to be changed since it was set to RTBC)
at #19: I prefer the square brackets as delimiters because you have an open and a close bracket, while with slashes you just have one symbol, which makes things less obvious. The square brackets are also wildly used as tag/element delimiter not only in other filters but also in other CMSes, markup languages (e.g. BBCode), wikis, etc. I'm planning to go for the "[geshifilter-java]...[/geshifilter-java]" approach in GeSHi filter for syntax highlighting. It would be nice if there could be some convergence/guidelines/standardization for drupal filters instead of hacks like xFE/xFF.
About the patch itself: why the inversion in the closing element: "[codefilter-code]...[/code-codefilter]" and not just "[codefilter-code]...[/codefilter-code]" (which would be the logical thing to do, I guess)?
(I didn't really test the patch yet, I just read it, so I don't know if it works)
Comment #21
hswong3i CreditAttribution: hswong3i commentedIMHO, the approach in #10 should be enough: we keep on using xFE and xFF for more than years without any problem, and so the similar solution should be solid enough, too. On the other hand, D5.6 is target to fix a rare bug which cause by IE6 only (where IE6 is well known as buggy...), it is not easy for codefilter to face similar bug once again with existing approach.
Moreover, it is always not easy to judge for a "perfect" solution. As codefilter is used in D.O., I will suggest for a solid and prompt solution, rather than a perfect and elegant one ;-)
BTW, I am not opposing the use of a meaningful wording as placeholder; it just seems to be a bit unnecessary complicated ;-(
Comment #22
zeta ζ CreditAttribution: zeta ζ commentedYes, I prefer the open & close nature of
[]
: This is also the reason why I like . It would be a shame to separate the from its .NB My Status drop-down defaulted to RTBC. --edit despite replying to #20
Comment #23
soxofaan CreditAttribution: soxofaan commentedI understand the urgency of solving this issue and the resulting preference for small changes.
Maybe patch #10 should be committed now to the Drupal 5 version,
and an approach like in #19 could go in later or only in the Drupal 6 version for example?
Moreover, the solution like #19 is not perfect either, it's just a bit more elegant/obvious.
I would rather say that the xFE/xFF/x02/x01 stuff looks obscure/complicated ;-)
Comment #24
soxofaan CreditAttribution: soxofaan commentedat#22
I don't see the shame of it. I see it as prefixing 'code' and 'php' with namespace like prefix 'codefilter-'. The '/' is just to show it closes the code fragment (which was retagged from 'code' to 'codefilter-code').
As compromise, we could also go for "[codefilter-code]...[codefilter-/code]" ;-)
NB: sorry about the constant status changing due to cross posting, I won't touch it this time.
Comment #25
zeta ζ CreditAttribution: zeta ζ commented#19 brackets is my favourite, and works for me on FF.
+1 RTBC
Comment #26
zeta ζ CreditAttribution: zeta ζ commentedSorry.
Comment #27
JohnAlbinSorry about #7 causing a parse error. I didn't test it; I just put the patch up to get the ball rolling while I was out of town. Anyway, I'm back now and I'm glad that there has been some discussion while I was gone.
See the input format API (where \xFE and \xFF are used in the docs.) The results of 'prepare' are seen by all the other input filters. So there can be a conflict with other input filters if they choose to use the same escape sequence. For example, if codefilter and geshifilter both used \x02code\x03 during 'prepare', than there could easily be a conflict.
Therefore, I think we need to namespace the escape sequence to prevent conflicts with other input filters.
As for using \x02 and \x03…
\xFE and \xFF were used because (since they were invalid UTF-8) they were unlikely to be included in a node's content. \x02 and \x03 are control characters and are also unlikely to be included in a node's content.
However, since \x02 and \x03 are non-printable sequences of bytes and are control characters, hackers will suspect them to be possible attack vectors, rightly thinking that programmers may forget about them when writing code. We can see that the IE6 programmers made that mistake (http://drupal.org/node/208564) with the characters starting with \xC0 - \xFF. Might someone else make that mistake for characters starting with \x00-\x0F? I’d say yes. Especially since it's now been a proven attack vector.
Using [codefilter_ doesn't suffer from that possible problem.
Comment #28
hswong3i CreditAttribution: hswong3i commentedSo I have a silly question: why there is not conflict within existing implementation? If \x02code\x03 will course conflict, why codefilter and geshifilter are able to function now?
Let's back the the main topic. As this issue is a "critical bug report": a bug that cause codefilter NOT function NOW, #10 should be the most strict forward and prompt solution. I would like to suggestion an on time commit for #10 (or any similar approach, e.g. using x02 and x03). Well, again I am not sure if it is the perfect and elegant solution; BTW, at least it is functioning :-p
On the other hand, using [xxx]code[/xxx] syntax should be more elegant; BTW, as we discussed above, we have different preference about how to implement this syntax (the one that you like may be not my first choice...). Anyway, the discussion of this syntax is just about perfect and elegant, and so I would like to suggest a new "normal feature request" issue for it. As mentioned in #23, we may come back and fix the ugly \xYYcode\xZZ syntax with [xxx]code[/xxx] afterward ;-)
P.S. I guess the exploration of perfect and elegant solution shouldn't be an excuse for delay an on time bug fix patch :-)
Comment #29
zeta ζ CreditAttribution: zeta ζ commentedd56.patch fails for me.
brackets-208636-19.patch works, is obvious what it trying to do, avoids conflict and is elegant.
Comment #30
pwolanin CreditAttribution: pwolanin commentedthat patch doesn't apply for me.
Here's a re-rolled one - with C-style comment also. Tested with PHP 4.4.7, Drupal 5.x-dev
Comment #31
pwolanin CreditAttribution: pwolanin commentedcross-post. Yes, I think it's RTBC. The only difference compared to brackets-208636-19.patch is swapping the - for _ as JohnAlbin seems to prefer.
Comment #32
zeta ζ CreditAttribution: zeta ζ commentedApplies for me. I don’t mind - or _, though would be nice to have GeSHi filter on-board
Comment #33
dwwI was about to install #30 on project.drupal.org for testing. Unfortunately, it no longer applies against the end of the DRUPAL-5 branch, since John committed revision 1.21.2.7:
http://drupal.org/cvs?commit=95578
Therefore, this needs a re-roll...
Comment #34
JohnAlbinLooking at the issue queue and mapping out some other changes that will need to be made shortly, I've moved the [codefilter_ string wrapping to the codefilter_escape function.
Derek, I've committed a slightly modified version of #27 to DRUPAL-5. If you want to test it on project.drupal.org, I can wait until you report back before tagging this as DRUPAL-5--1-1.
Comment #35
zeta ζ CreditAttribution: zeta ζ commentedI’ve just posted Coding standards – Temporary placeholders and delimiters inspired by this. Hope you think this is appropriate and useful.
Comment #36
zeta ζ CreditAttribution: zeta ζ commentedIssue for Coding standards – Temporary placeholders and delimiters
Comment #37
dwwi updated p.d.o to the latest code in DRUPAL-5. see http://project.drupal.org/node/85771#comment-158116 --- seems to work. further poking around on project.d.o would be appreciated. make some code and php comments in various tricky situations, make sure nested code tags are handled right, etc... thanks!
Comment #38
zeta ζ CreditAttribution: zeta ζ commented<code>...</code> & <?php ?> seem to work.
<?php ?> will nest in <code>...</code> to show php embedded in html.
<code>...</code> is quoted when nested in <?php ?>: seems reasonable. </code> doesn’t end php block.
<code>...</code> will NOT nest in <code>...</code> though it doesn’t cause a problem: First </code> ends block.
<code>...</code> will NOT nest in <?php ?> that is nested in <code>...</code> as it ends the php prematurely, so no highlighting. Would anyone need this structure?
Seems to be good to deploy.
Comment #39
JohnAlbinNick, thanks for the testing. The nested code tags didn't work in the previous version either. Guess that's a separate bug.
Codefilter 5.x-1.1 is now available!
Thanks to all who helped flush out a solution for this bug! It made it much easier.
Comment #40
NancyDruI'm sure it's an extreme edge case, but I did have an instance the other day to have <code> within <code> and it didn't work.
Comment #41
JohnAlbinI've opened an issue for the </code> within <code> bug: http://drupal.org/node/210046
Comment #42
liamgoldstein CreditAttribution: liamgoldstein commentedHi,
I'm not sure if this is even the same issue but I can't add any html to my content.
I have tried to manually apply the patch to filter.module (as I don't have a local copy or ssh access on my hosting) but it's still not fixing the issue.
I have a 'out of the box' install of drupal 5.6 and It has never allowed me to add html to my content whether I choose filtered or full. When i try to preview or submit content or even change the input format configuration I just get redirected to the root of my site (i.e it doesn't display the same page with an 'updated' message it takes me to the base url).
If anyone has any ideas or is willing to give me some help I would be sooo greatful.
Thanks.
Comment #43
pwolanin CreditAttribution: pwolanin commented@liamgoldstein - this module is not part of Drupal core, so it would not have anything to do with your problem. Your problem sounds like something wrong with settings.php or .htaccess.
Comment #44
liamgoldstein CreditAttribution: liamgoldstein commented@pwolanin - Thanks for your quick response - I'll look into this... do you have any suggestions??
I did wonder about settings.php or .htaccess but ruled it out as it only happens with items relating to filtering - everything else works fine.
Comment #45
dww@liamgoldstein: can you please submit a new support issue about your problem? Please don't "hijack" this thread with discussion about your unrelated troubles. Thanks.
Comment #46
Anonymous (not verified) CreditAttribution: Anonymous commentedAutomatically closed -- issue fixed for two weeks with no activity.