I noticed that geshifilter_filter() doesn't respond to $op=prepare. This means that certain types of code, etc. can get clobbered by other filters, such as the HTML tags filter.
For example, if you had code like this:
<cpp>
#include <stdio.h>
</cpp>
This would show up as:
#include
I took a look at codefilter.module and pretty much tried to copy what that module is doing to process the text so that it doesn't get messed up by another filter.
I've tested this patch on my site with several different types of input and it seems to work just fine. It might not be ready to go in yet, but I'd like people to be able to take a look at it and try it out on their own systems to see if there are any problems with it.
Comments
Comment #1
soxofaan commentedComment #2
soxofaan commentedI worked a bit on the patch to simplify it and solve some remaining issues.
please be free to test it and report you findings.
It's not ready for committing, yet.
Thanks to this preparation a lot of the conflict detection is not needed anymore, I have to look what I can remove.
Comment #3
aclight commentedI just ran a few quick tests with your patch and it seemed to work as well as my patch on my test page with highlighted code, but it does get closer than my patch does at fixing the problems linked to in the original post of this issue: http://drupal.org/node/70904
Specifically, your patch fixes #1, #2, and #4 on the test page at http://www.dvessel.com/node/93
Also, I'm a little worried about your use of 0xFA-0xFD. My understanding is that technically those are valid UTF-8 characters, though they were never assigned to represent anything and thus are very unlikely to be used. This statement is based on the 3rd table of the Description topic at http://en.wikipedia.org/wiki/UTF-8
Once you're done with the patch I'll try to review it more closely.
Comment #4
soxofaan commentedI'm also not very happy with the 0xFA-0xFF hack. I chose not to use 0xFE and 0xFF, because code filter uses them already.
I'm looking for a better solution, e.g. with legal utf8 encoding or base64 encoding of the whole code block, but I'm not sure yet.
For the moment I think it is safe enough to use 0xFA-0xFD however.
This new version of the patch also removes some unneeded conflict detection stuff: html tag filtering and conflicts with the line break filter.
(note that geshifilter.module got some other updates in the meantime)
Comment #5
aclight commentedI downloaded the newest 2.x-dev version and the patch applied, but with offsets. I guess you meant in your last comment that you created the patch before you made the other upgrades to the module? In any way, it applied successfully.
I tested the patch and I had the same results as I indicated for the patch in comment #3.
As for the patch itself:
The only potential problem I see is that both codefilter and GeSHi filter might respond to
<lang></lang>tags (where "lang" = "code"). In that case those sections of code would be prepared identically (assuming GeSHi filter moves to using xFE and xFF), and so whatever filter was set to come first would do the filtering. The user could get around this by using [lang][/lang] tags, which codefilter doesn't respond to.Also, in my original patch I didn't change tags in square brackets (eg. [lang][/lang), but you've done that in your patches. In retrospect, I think you're right to also prepare those blocks.
So, to summarize, I think it's best to prepare blocks of text that will be processed by GeSHi filter in the following way:
If you wanted to make sure that only GeSHi filter touched blocks that were prepared by GeSHi filter, you could do this instead:
In this block you're replacing any < and > chars within the block of code that will be filtered with xFC and xFD. I'm not sure why this is necessary or desirable. codefilter.module sends all text between the code tags through a function that replaces \r and \n (as the patch does above) but then sends the text through check_plain(), which then calls htmlentities() and encodes <, &, etc. in their safe form.
Again, I don't think that's wise. For example, if there were a filter that processed after GeSHi filter, and if that filter did no escaping of text that it would process, it would potentially process the text returned by GeSHi filter because there are unescaped HTML entities left over.
Comment #6
soxofaan commentedConcerning the patch+offset issue: the dev snapshots are only generated every 12 hours, so I guess you were too early and downloaded an outdated one ;). I wrongly assumed you were using CVS. Luckily the patch applied successfully.
Anyway, about your remarks:
I could avoid this by flagging codefilter as a conflicting filter and raise warnings when codefilter and GeSHi filter occur in the same input format.
This last point made me think of another problem however. With the GeSHi filter you have an option to do nothing with the source code by default (when there is no language attribute). This is only detected in the process phase, while it should be done in the preprocess phase, so that there would be no preparation and other filters (like html filter) could do their thing instead.
I'll work on a new patch
Comment #7
soxofaan commentedhere is the reworked patch
Comment #8
aclight commentedRe:
#1: I think flagging it as conflicting is a good idea. Is it possible to only flag it as conflicting with codefilter if geshifilter is set up to handle <code></code> tags? And only if brackets are GESHIFILTER_BRACKETS_ANGLE or GESHIFILTER_BRACKETS_BOTH? I also didn't see anything in the patch in #7 that adds this warning. It wasn't clear to me if you had added that already or if you were just planning to add that.
#2: I understand. I forgot that GeSHi calls htmlentities() on its own, so I guess calling check_plain() isn't absolutely necessary, but it doesn't seem to hurt anything (now that you call decode_entities() in the process step.
#3:
That's a good point. Does the new patch include detection of such a case? If so, where is it doing that? I must have missed it.
BTW, I tested the new patch and it still works as before. I also tested with highlighted code in a teaser and it was correctly highlighted (I hadn't tested this before).
I think this is getting close.
Comment #9
soxofaan commentedat #8
Anyway, the conflict detection is not in the patch from #7. I was planning to do it in a different patch, since it involves some extra code changes under the hood and I prefer small but functional patches.
You mean this as a good thing, no?
If you have no further remarks on the patch from #7, I'll commit it.
(thanks for the fruitful discussion btw)
Comment #10
soxofaan commentedComment #11
aclight commentedWhen I said the patch worked as before, that was a good thing. Sorry for not being clearer. Your responses in #8 all sound good to me. I have no further worries about the patch.
Thanks for working on this so much. This should really help geshifilter work more robustly.
Comment #12
soxofaan commentedcommitted: http://drupal.org/cvs?commit=84565
Comment #13
soxofaan commentedthe remaining issues mentioned in #9 are also solved:
1) conflict detection with codefilter: http://drupal.org/cvs?commit=84644
3) not escaping in prepare phase when there won't be a real processing phase: http://drupal.org/cvs?commit=84649
Comment #14
(not verified) commented