When you submit a & in the translation it gets encoded to & before it's put in the database. This results in the translated string being displayed as & because the & from & is encoded again at display time.

Steps:
1. activate locale module
2. activate another language
3. switch languages (set default to the other language or in your preferences)
4. in "site configuration" > "localization" > "manage strings"
5. search for "My account"
6. Edit the (non-existing) translation
7. Put in "one & two" and "save translations"
8. The menu entry now display "one & two"

This is caused by the filter_xss function in the filters.module. The string goes through filter_xss_admin before getting saved to the locales_target, so the value that actually get's saved is one & two instead of one & two

This is in 5.0 and 4.7.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

chx’s picture

Status: Active » Closed (won't fix)

I am not inclined to fix this the language string interface is for quick & dirty fixes, if you want anything fancy export, edit, import.

Jax’s picture

Status: Closed (won't fix) » Active

I'm sorry but the language interface is used often on multi-lingual sites. When you create menu links and they have to be translated my customers use this to enter the menu strings in the other languages. For example I have a site where I needed to add the menu entry "Purchase & Sale". When the customer translated it, it didn't look ok. So I had to edit it manually in the DB.

Maybe you are nog inclined to fix it but at least give me the opportunity to provide a patch.

Takafumi’s picture

FileSize
602 bytes

I think that this issue is bug.
So, accept this patch to fix it.

Takafumi’s picture

Status: Active » Needs review
Jax’s picture

The issue here is that the & is saved as & in the table. Modifiing check_plain so that the & is turned back into a & at display time isn't a good idea IMO.

Takafumi’s picture

I think that it's processing necessary to prevent XSS(or other attack) that encodes “&" to the character entity before storing to DB.

Jax’s picture

Look at how nodes are saved in the db table. There the & is saved as a &. It's not good practive to save altered data in the db. You should escape html entities at display time only.

Takafumi’s picture

FileSize
907 bytes

Try this patch if you don't want to encode '&'.

drumm’s picture

Status: Needs review » Fixed

Committed to HEAD.

Takafumi’s picture

Status: Fixed » Closed (fixed)

thanks.

Jax’s picture

Title: Cannot submit "&" in translation » Cannot submit any special character in translation
Status: Closed (fixed) » Active

All this patch does is reconvert the & to & which was converted by filter_xss() in the first place. It does nothing to fix the cause.

Try to translate with any other 'special' character like < or >, they are converted to their html entity (by filter_xss() and saved as an entity in the DB. Then the & from the entity is escaped again at display time and you end up with &lt; or &gt; in the menu.

If we wanted to solve those as well in the same manner as this patch all we're implementing is filter_xss()-1.

Changed title to indicate we're not only talking about &.

Takafumi’s picture

Status: Active » Needs review
FileSize
1.43 KB

The root of the issue is in check_plain().
So, accept this patch to fix it again.

Jax’s picture

Status: Needs review » Needs work

IMO, the root of the issue is that the value to be saved goes through filter_xss_admin before it is saved to the database. This happens in locale.inc on line 425.

There the & is converted to a &amp; and afterwards, that value is saved in the table. The value that should be saved in the database is the original & NOT the &amp;.

Your patch reconverts the saved &amp; to & before it is converted to &amp; again. Not the root cause of this issue.

Takafumi’s picture

Status: Needs work » Needs review

Storing it after the special character is converted to character entity is not a issue.
However, character entity can't be used in the text as long as the patch is not applied.

Jax’s picture

Imagine, like I said before, it is bad practice to store altered characters in the database. You want to submit the data the user submitted. Maybe stripped from some characters, but the characters you're going to keep should be unaltered.

Because if you change a & to a &amp; before storing, how will you &amp; when a user submits it? Also if your display policies change you have no way of knowing what the user actually submitted as data the first time.

I'm done explaining this. This patch breaks other behavior that is there by design (eg. http://drupal.org/node/97642) and is just a bad idea. I'm not putting this back in 'code needs work' since imagine will probably just set it back to review.

drumm’s picture

Status: Needs review » Needs work

Please provide a few specific test cases and what is expected to be stored in the database and output to the screen.

killes@www.drop.org’s picture

I've backported the patch that drumm committed.

Takafumi’s picture

Jax's opinion are "how to store the entered strings in DB", but the issue that I judged is "how to output the entered character entity".
I decided to report as another issue.
Because, I think that "how to store the entered strings in DB" is not a problem, and it seems to differ from **Jax's** issue.

Steven’s picture

Status: Needs work » Needs review
FileSize
877 bytes

All these solutions are essentially horrible. Patching check_plain() is bad because it applies globally, and doesn't give you the benefit of using HTML, but still just forces you to double escape HTML entities when talking about them.

The root of the problem is that the text passed into t() can be both plain-text as well as HTML. Because there is no reliable way to distinguish the two, we should just remove the call to filter_xss_admin() on locale import. It is obviously causing problems.

We really can't reliably clean up imported locale strings. At most we can come up with hacks that will work in most cases. But I think it's better to just remove it from core.

Perhaps a better approach is simply to make a locale checker script that throws warnings? We could use the following snippet to reliably detect possibly bad strings, without false negatives:

if (decode_entities($value) != decode_entities(filter_xss_admin($value))) { /* possibly bad value */ }

This would not cause problems when a lone ampersand is used in a string, but still throw errors when bad stuff is included. We can't use a similar snippet on import though, because it does not preserve HTML validity of the value.

Takafumi’s picture

This issue doesn't exist only in t(). It's necessary to recognize exist in also all functions through the check_plain().

Takafumi’s picture

This issue doesn't exist only in t().
It's not only t () this issue exists.

Jax’s picture

Imagine, there is nothing wrong with check_plain(). When your text contains a special character, like & (either in title or body) and you submit it. The & is saved in the table. At display time the & is converted to &amp; by check_plain to make sure it displayed correctly. check_plain() is nothing more than a call to htmlspecialchars with an extra option.

Takafumi’s picture

No, it is wrong recognition.
Well, there is no problem if '&' is used alone. However, '&' will be used as a part of '&gt' and '&lt'(and much more) of character entity.
Therefore, the problem occurs if '&' is doubly encoded in check_plain().

Steven’s picture

There is no need to use HTML entities inside check_plain()'d text. You can already use any unicode character directly.

The actual problem here has already been described. Locale assumes its strings are all HTML, which is not true.

Takafumi’s picture

If you want to keep existing bug, I stay out of this issue.

Jax’s picture

Anyway, I confirm that patch #19 solves the issue I was experiencing which was that it was not possible to add characters like & > etc.... to a translated string.

Takafumi’s picture

It's effective only at first time post, even if the patch of #19 is applied.
It's converted into character entity when editing again and enters the text area.
Therefore, patch of #19 is not too important.

Steven’s picture

FileSize
5.34 KB

There were indeed more calls to filter_xss_admin() elsewhere. Attached a patch to remove them all, including the str_replace.

drumm’s picture

Priority: Normal » Critical

I think this should be fixed and looks okay. Any objections? Security review?

Heine’s picture

FileSize
5.34 KB

Rerolled without parse error. Will review.

Heine’s picture

Right now we have two permissions that enable a user to gain more privileges:

1. administer users
2. administer access control

That these two permissions can be used to escalate someone's access is quite obvious. That is not true for the administer locales permission, however.

Now, I know that the administer locales permission will not be handed out that often, but if you have a permission it should sharply define what is possible.

I see several options:

1. Won't fix the current double escape problem.
2. Remove string editing from core all together.
3. Make it obvious that administer locales is a 'dangerous' permission.
4. Tie administer locales to another permission.
5. Patch goes in as is.
6. ?

Heine’s picture

Status: Needs review » Needs work

Killes alerted me to the fact that this patch combined with our CVS translation policies allow malicious CVS account holders to execute XSS attacks via our .po files:

I had added this filtering during March when I noticed that imported
files could be used to insert xss. Our PO files are not reviewed or
protected in any way and malicious cvs account owners could add evil XSS
stuff to any PO file, even core PO files.

I realize malicious CVS account holders can do harm via their own modules, but never they cannot taint core or other modules atm.

So a -1 on this patch *until* we fix our CVS policies and work out the issues I mentioned above.

Steven’s picture

Where translations come from is a matter of trust... holding this patch because the necessary CVS scanning infrastructure is in place is kind of silly. That's like saying we should keep module installation broken because someone might post an insecure one.

Whether or not a module can 'taint core' is stupid. A module could easily leave behind PHP code in a node or block to do its bidding after it's been uninstalled too, or a module could even insert malicious translation strings itself on install.

As I already said before, the following test can be used to detect possibly bad translation strings. It should not have any false positives. But it is only a test, not a transformation, as it destroys valid HTML.

if (decode_entities($value) != decode_entities(filter_xss_admin($value))) { /* possibly bad value */ } 

Either we build a scanning script for cvs, or we add this check to Drupal core when importing. I cannot guarantee that this will never give any false positives, although for the typical cases of translatable strings, it should not be a problem. It should certainly not let any XSS through.

However, I believe we should treat this as a separate issue. This patch itself fixes a bug when importing and editing locales, namely that some strings are irreversibly corrupted.

killes@www.drop.org’s picture

IMO this is a "won't fix". We do not have any strings on core that have & for the menu.

drumm’s picture

Should I revert the previously applied patch?

Anonymous’s picture

not sure where to say this but with using the t() function, there are places where for example in formsapi where there is something like..

// taken from imagefield.module
$form['image_path'] = array(
        '#description' => t('Optional subdirectory within the "%dir" directory where images will be stored. Do not include trailing slash.', array('%dir' => theme('placeholder', variable_get('file_directory_path', 'files')))), 
      );

now with having the theme('placeholder'.. that adds in , it's converting the tags out so the style isn't applying and you just see the tag.

Steven’s picture

Status: Needs work » Fixed

Steve: you're not making much sense. If %value placeholders are used, the output is HTML and should be printed as such. Forms API just passes through the #description strings, it does not escape them.

Anyway. Discussed with Dries: the filter_xss calls are removed. We will look at an xss safeguard for .po files for 6.0 or for contrib, but this is a highly theoretical attack vector. There are much easier ways to screw up Drupal sites, and we have our hands full already.

Dries’s picture

Steve: come again?

Anonymous’s picture

lol..

ok, in the imagefield module, it has that snippet i showed... and because it uses theme_placeholder, it wraps that description with an html tag of .

now when formsapi takes that #description, it converts the tag to html chars and displays the tag as is and not the style.

is that a better description? lol

Anonymous’s picture

Status: Fixed » Closed (fixed)