When you submit a &
in the translation it gets encoded to &
before it's put in the database. This results in the translated string being displayed as &
because the &
from &
is encoded again at display time.
Steps:
1. activate locale module
2. activate another language
3. switch languages (set default to the other language or in your preferences)
4. in "site configuration" > "localization" > "manage strings"
5. search for "My account"
6. Edit the (non-existing) translation
7. Put in "one & two" and "save translations"
8. The menu entry now display "one & two"
This is caused by the filter_xss function in the filters.module. The string goes through filter_xss_admin before getting saved to the locales_target, so the value that actually get's saved is one & two
instead of one & two
This is in 5.0 and 4.7.
Comment | File | Size | Author |
---|---|---|---|
#30 | locale_xss_1.patch.txt | 5.34 KB | Heine |
#28 | locale_xss_0.patch | 5.34 KB | Steven |
#19 | locale_xss.patch | 877 bytes | Steven |
#12 | patch0.txt | 1.43 KB | Takafumi |
#8 | locale_diff1.txt | 907 bytes | Takafumi |
Comments
Comment #1
chx CreditAttribution: chx commentedI am not inclined to fix this the language string interface is for quick & dirty fixes, if you want anything fancy export, edit, import.
Comment #2
Jax CreditAttribution: Jax commentedI'm sorry but the language interface is used often on multi-lingual sites. When you create menu links and they have to be translated my customers use this to enter the menu strings in the other languages. For example I have a site where I needed to add the menu entry "Purchase & Sale". When the customer translated it, it didn't look ok. So I had to edit it manually in the DB.
Maybe you are nog inclined to fix it but at least give me the opportunity to provide a patch.
Comment #3
Takafumi CreditAttribution: Takafumi commentedI think that this issue is bug.
So, accept this patch to fix it.
Comment #4
Takafumi CreditAttribution: Takafumi commentedComment #5
Jax CreditAttribution: Jax commentedThe issue here is that the
&
is saved as&
in the table. Modifiingcheck_plain
so that the&
is turned back into a&
at display time isn't a good idea IMO.Comment #6
Takafumi CreditAttribution: Takafumi commentedI think that it's processing necessary to prevent XSS(or other attack) that encodes “&" to the character entity before storing to DB.
Comment #7
Jax CreditAttribution: Jax commentedLook at how nodes are saved in the db table. There the
&
is saved as a&
. It's not good practive to save altered data in the db. You should escape html entities at display time only.Comment #8
Takafumi CreditAttribution: Takafumi commentedTry this patch if you don't want to encode '&'.
Comment #9
drummCommitted to HEAD.
Comment #10
Takafumi CreditAttribution: Takafumi commentedthanks.
Comment #11
Jax CreditAttribution: Jax commentedAll this patch does is reconvert the
&
to&
which was converted byfilter_xss()
in the first place. It does nothing to fix the cause.Try to translate with any other 'special' character like < or >, they are converted to their html entity (by
filter_xss()
and saved as an entity in the DB. Then the&
from the entity is escaped again at display time and you end up with<
or>
in the menu.If we wanted to solve those as well in the same manner as this patch all we're implementing is
filter_xss()
-1.Changed title to indicate we're not only talking about
&
.Comment #12
Takafumi CreditAttribution: Takafumi commentedThe root of the issue is in check_plain().
So, accept this patch to fix it again.
Comment #13
Jax CreditAttribution: Jax commentedIMO, the root of the issue is that the value to be saved goes through filter_xss_admin before it is saved to the database. This happens in locale.inc on line 425.
There the & is converted to a & and afterwards, that value is saved in the table. The value that should be saved in the database is the original & NOT the &.
Your patch reconverts the saved & to & before it is converted to & again. Not the root cause of this issue.
Comment #14
Takafumi CreditAttribution: Takafumi commentedStoring it after the special character is converted to character entity is not a issue.
However, character entity can't be used in the text as long as the patch is not applied.
Comment #15
Jax CreditAttribution: Jax commentedImagine, like I said before, it is bad practice to store altered characters in the database. You want to submit the data the user submitted. Maybe stripped from some characters, but the characters you're going to keep should be unaltered.
Because if you change a & to a & before storing, how will you & when a user submits it? Also if your display policies change you have no way of knowing what the user actually submitted as data the first time.
I'm done explaining this. This patch breaks other behavior that is there by design (eg. http://drupal.org/node/97642) and is just a bad idea. I'm not putting this back in 'code needs work' since imagine will probably just set it back to review.
Comment #16
drummPlease provide a few specific test cases and what is expected to be stored in the database and output to the screen.
Comment #17
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedI've backported the patch that drumm committed.
Comment #18
Takafumi CreditAttribution: Takafumi commentedJax's opinion are "how to store the entered strings in DB", but the issue that I judged is "how to output the entered character entity".
I decided to report as another issue.
Because, I think that "how to store the entered strings in DB" is not a problem, and it seems to differ from **Jax's** issue.
Comment #19
Steven CreditAttribution: Steven commentedAll these solutions are essentially horrible. Patching check_plain() is bad because it applies globally, and doesn't give you the benefit of using HTML, but still just forces you to double escape HTML entities when talking about them.
The root of the problem is that the text passed into t() can be both plain-text as well as HTML. Because there is no reliable way to distinguish the two, we should just remove the call to filter_xss_admin() on locale import. It is obviously causing problems.
We really can't reliably clean up imported locale strings. At most we can come up with hacks that will work in most cases. But I think it's better to just remove it from core.
Perhaps a better approach is simply to make a locale checker script that throws warnings? We could use the following snippet to reliably detect possibly bad strings, without false negatives:
if (decode_entities($value) != decode_entities(filter_xss_admin($value))) { /* possibly bad value */ }
This would not cause problems when a lone ampersand is used in a string, but still throw errors when bad stuff is included. We can't use a similar snippet on import though, because it does not preserve HTML validity of the value.
Comment #20
Takafumi CreditAttribution: Takafumi commentedThis issue doesn't exist only in t(). It's necessary to recognize exist in also all functions through the check_plain().
Comment #21
Takafumi CreditAttribution: Takafumi commentedThis issue doesn't exist only in t().It's not only t () this issue exists.
Comment #22
Jax CreditAttribution: Jax commentedImagine, there is nothing wrong with check_plain(). When your text contains a special character, like & (either in title or body) and you submit it. The & is saved in the table. At display time the & is converted to & by check_plain to make sure it displayed correctly. check_plain() is nothing more than a call to htmlspecialchars with an extra option.
Comment #23
Takafumi CreditAttribution: Takafumi commentedNo, it is wrong recognition.
Well, there is no problem if '&' is used alone. However, '&' will be used as a part of '>' and '<'(and much more) of character entity.
Therefore, the problem occurs if '&' is doubly encoded in check_plain().
Comment #24
Steven CreditAttribution: Steven commentedThere is no need to use HTML entities inside check_plain()'d text. You can already use any unicode character directly.
The actual problem here has already been described. Locale assumes its strings are all HTML, which is not true.
Comment #25
Takafumi CreditAttribution: Takafumi commentedIf you want to keep existing bug, I stay out of this issue.
Comment #26
Jax CreditAttribution: Jax commentedAnyway, I confirm that patch #19 solves the issue I was experiencing which was that it was not possible to add characters like & > etc.... to a translated string.
Comment #27
Takafumi CreditAttribution: Takafumi commentedIt's effective only at first time post, even if the patch of #19 is applied.
It's converted into character entity when editing again and enters the text area.
Therefore, patch of #19 is not too important.
Comment #28
Steven CreditAttribution: Steven commentedThere were indeed more calls to filter_xss_admin() elsewhere. Attached a patch to remove them all, including the str_replace.
Comment #29
drummI think this should be fixed and looks okay. Any objections? Security review?
Comment #30
Heine CreditAttribution: Heine commentedRerolled without parse error. Will review.
Comment #31
Heine CreditAttribution: Heine commentedRight now we have two permissions that enable a user to gain more privileges:
1. administer users
2. administer access control
That these two permissions can be used to escalate someone's access is quite obvious. That is not true for the administer locales permission, however.
Now, I know that the administer locales permission will not be handed out that often, but if you have a permission it should sharply define what is possible.
I see several options:
1. Won't fix the current double escape problem.
2. Remove string editing from core all together.
3. Make it obvious that administer locales is a 'dangerous' permission.
4. Tie administer locales to another permission.
5. Patch goes in as is.
6. ?
Comment #32
Heine CreditAttribution: Heine commentedKilles alerted me to the fact that this patch combined with our CVS translation policies allow malicious CVS account holders to execute XSS attacks via our .po files:
I realize malicious CVS account holders can do harm via their own modules, but never they cannot taint core or other modules atm.
So a -1 on this patch *until* we fix our CVS policies and work out the issues I mentioned above.
Comment #33
Steven CreditAttribution: Steven commentedWhere translations come from is a matter of trust... holding this patch because the necessary CVS scanning infrastructure is in place is kind of silly. That's like saying we should keep module installation broken because someone might post an insecure one.
Whether or not a module can 'taint core' is stupid. A module could easily leave behind PHP code in a node or block to do its bidding after it's been uninstalled too, or a module could even insert malicious translation strings itself on install.
As I already said before, the following test can be used to detect possibly bad translation strings. It should not have any false positives. But it is only a test, not a transformation, as it destroys valid HTML.
Either we build a scanning script for cvs, or we add this check to Drupal core when importing. I cannot guarantee that this will never give any false positives, although for the typical cases of translatable strings, it should not be a problem. It should certainly not let any XSS through.
However, I believe we should treat this as a separate issue. This patch itself fixes a bug when importing and editing locales, namely that some strings are irreversibly corrupted.
Comment #34
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedIMO this is a "won't fix". We do not have any strings on core that have & for the menu.
Comment #35
drummShould I revert the previously applied patch?
Comment #36
Anonymous (not verified) CreditAttribution: Anonymous commentednot sure where to say this but with using the t() function, there are places where for example in formsapi where there is something like..
now with having the theme('placeholder'.. that adds in , it's converting the tags out so the style isn't applying and you just see the tag.
Comment #37
Steven CreditAttribution: Steven commentedSteve: you're not making much sense. If %value placeholders are used, the output is HTML and should be printed as such. Forms API just passes through the #description strings, it does not escape them.
Anyway. Discussed with Dries: the filter_xss calls are removed. We will look at an xss safeguard for .po files for 6.0 or for contrib, but this is a highly theoretical attack vector. There are much easier ways to screw up Drupal sites, and we have our hands full already.
Comment #38
Dries CreditAttribution: Dries commentedSteve: come again?
Comment #39
Anonymous (not verified) CreditAttribution: Anonymous commentedlol..
ok, in the imagefield module, it has that snippet i showed... and because it uses theme_placeholder, it wraps that description with an html tag of .
now when formsapi takes that #description, it converts the tag to html chars and displays the tag as is and not the style.
is that a better description? lol
Comment #40
(not verified) CreditAttribution: commented