Previously, filtering of translation strings was removed from core in this issue: http://drupal.org/node/97640
However, the resulting possibility that translation strings could contain dangerous script, etc, has not been addressed since.
Rather than filtering on input (which was clearly broken) why not simply validate using the suggested code from Steven:
#33
Steven - December 19, 2006 - 19:09Where translations come from is a matter of trust... holding this patch because the necessary CVS scanning infrastructure is in place is kind of silly. That's like saying we should keep module installation broken because someone might post an insecure one.
Whether or not a module can 'taint core' is stupid. A module could easily leave behind PHP code in a node or block to do its bidding after it's been uninstalled too, or a module could even insert malicious translation strings itself on install.
As I already said before, the following test can be used to detect possibly bad translation strings. It should not have any false positives. But it is only a test, not a transformation, as it destroys valid HTML.
if (decode_entities($value) != decode_entities(filter_xss_admin($value))) { /* possibly bad value */ }
Either we build a scanning script for cvs, or we add this check to Drupal core when importing. I cannot guarantee that this will never give any false positives, although for the typical cases of translatable strings, it should not be a problem. It should certainly not let any XSS through.
However, I believe we should treat this as a separate issue. This patch itself fixes a bug when importing and editing locales, namely that some strings are irreversibly corrupted.
In the unlikely case that the input strings fails this validation, the translator can probably work around by using an entity, right?
Comment | File | Size | Author |
---|---|---|---|
#44 | validate-translation-5x-276111-44.patch | 5.65 KB | pwolanin |
#42 | validate-translation-6x-276111-40.patch | 7.11 KB | pwolanin |
#39 | validate-translation-7x-276111-38.patch | 12.65 KB | pwolanin |
#37 | validate-translation-7x-276111-37.patch | 12.41 KB | pwolanin |
#35 | validate-translation-7x-276111-35.patch | 12.41 KB | pwolanin |
Comments
Comment #1
pwolanin CreditAttribution: pwolanin commentedHere's a quick patch for 7.x. I tested the UI part, but not the import part yet.
same patch applies with offset to 6.x. A backport for 5.x would be needed.
Comment #2
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedLooks good but needs some testing with more than one translation.
Comment #3
Dries CreditAttribution: Dries commentedpwoladin, how about a simpletest test case?
Comment #4
pwolanin CreditAttribution: pwolanin commented@Dries - I'm not sure about the import part - can certainly do one for the UI.
Comment #5
pwolanin CreditAttribution: pwolanin commentedone problem - I get a huge # of exceptions for the existing locale test:
87 passes, 0 fails, 810 exceptions
Appears to be due to a stale static variable, since the exceptions all relate to a missing custom language that I created via the UI
t.language = 'en-US
Comment #6
pwolanin CreditAttribution: pwolanin commentedseparate issue for the test exceptions: http://drupal.org/node/276153
note, we may want to restrict the list of tags further than filter_xss_admin allows . For example <img> tags are common xss vectors for IE. Is there any reason a translator would include images or tables, etc?
a possible list:
'a', 'b', 'big', 'del', 'em', 'i', 'ins', 'pre', 'q', 'small', 'span', 'strong', 'sub', 'sup', 'tt', 'ol', 'ul', 'li', 'p', 'br'
Obviously, 'a' can be problematic too, but I think there are a number of t() texts that use it.
Comment #7
pwolanin CreditAttribution: pwolanin commentedpatch with test and modified tag list. Not sure how to check the import - probably that should be an additional test.
Comment #8
pwolanin CreditAttribution: pwolanin commentedok, silly not to factor out that common list of tags into a single function. Attached does so and is more expansive than the list above - includes almost all tags from filter_xss_admin to avoid false positives.
Comment #9
Damien Tournoud CreditAttribution: Damien Tournoud commentedHum. I don't see the point in validating user input here. Doesn't Drupal assume that the administrator (which is the only one that can import or edit translations by default) is always fully trusted?
Moreover, the easier attack vector on translation files seems to me not from translation strings, but directly on the plural formula, that is evaluated as PHP code without any sanitation.
Comment #10
pwolanin CreditAttribution: pwolanin commentedNo, I don't assume that someone uploading a translation is fully trusted, or someone on a large site whose only authority is to translate strings should be able to hijack the site. Clearly some administrative permissions require full trust- but translating the interface?
Comment #11
pwolanin CreditAttribution: pwolanin commentedstill applies with offset
Comment #12
meba CreditAttribution: meba commentedI don't like this line:
Which string? This may be confusing not only for end users. Can you include the string with check_plain (so it's safe :)? Or at least some unique information about the string so you are able to identify it.
Comment #13
pwolanin CreditAttribution: pwolanin commentedsure, good idea.
Comment #14
meba CreditAttribution: meba commentedI was a little bit confusing in my last message. The line I was reffering to was correct. However, the same applies for importing the PO file.
Attaching a patch which remembers strings that were skipped and presents them to the user.
However: marking this as Code needs work because during testing, I noticed that this string gets skipped:
Due to the tag <none> so I guess the approach with tags whitelisting may not be enough. Anyway, I've (I hope that correctly) grepped Czech translation of core, which is 100 % complete, for HTML tags and here we go:
But there may be much more tags in contrib...
Comment #15
meba CreditAttribution: meba commentedOops, sorry. Here is the list:
Comment #16
Gábor Hojtsy1. I think we'd better do a list of HTML tags appearing in Drupal core and contrib at this point and discuss support for those. I would say it is unlikely that more will appear. We clearly need the a tag, since inline HTML links are explicitly suggested for easier translations. meba has a great list for core. I've heard he might be doing a full contrib scan as well, so it would be great to see the bigger picture.
2. While I share Damien Tournoud's concerns on the plural formula which is converted and then evaluated as PHP code, we should tighten up where we can. We can certainly improve on validation of the plural forumla, since we know about the few characters which are allowed. So we can quite clearly define the possible parts of a plural formula and lock is down hopefully enough. That should be another issue. I've opened it at http://drupal.org/node/323155 with detailed background info and suggestions.
3. This issue is not only to be backported to 5.x and 6.x, it is also relevant for l10n_client and l10n_server, since those should also stop people submitting translations which are not valid. This whole situation of being worries about translations arose from the more commonplace translation tools, which allow anyone to contribute not just people with vetted CVS accounts.
Comment #17
Gábor HojtsyOk, http://drupal.org/node/323155 turned out to be a non-issue / false alarm. We should be fully back at dealing with locale input sanitization on the HTML :).
Comment #18
Jose Reyero CreditAttribution: Jose Reyero commentedI think the HTML tags whiltelist is the way to go.
Just an idea to consider: What if we get out the html tags from the English string, and use that very same tags for validation of the translated string? Is there any case where we may need different tags?
Also, better if we have a reusable validation function, so it can be used by other module/s too like the l10n collection.
Comment #19
meba CreditAttribution: meba commentedAttaching a list of tags I found in contrib. There are some very interesting tags :)
Please note that if a tag has several attributes, there can be any combination of them, I joined them together.
Comment #20
Gábor HojtsyOh, wow,
<?php !invoke; print l($node->
is clearly a WTF, agreed. As for the others, there are mostly acceptable tags, but there are suspicious tags, like<description>
and<destination>
, which seem like indicating an inline tag requirement in t()-s where some "random" XML format is generated, not HTML. How do we deal with that? We suggest people using inline tags in t() to avoid breaking it down to untranslatable half-sentences and words. This is going to be quite interesting... Hm.Comment #21
meba CreditAttribution: meba commentedThe PHP comes from:
!invoke; print l($node->title, $node->field_file_field[0]['filepath'], array('class' => 'media'));
jquery_media: 6.x-1.0 (1), 6.x-1.1 (1)
Comment #22
meba CreditAttribution: meba commentedAll unusual tags:
<description>
Used in Drupal core, og_aggregator
<destination>, <command>, <classes>:
Drush module obviously :)
<basic>:
Seems like Views module
<input>:
Several modules
<netnews>
Netnews module
<script>
Several modules
<use>:
FB module
<current>:
Facebook
<hx>:
<empty>:
CCK, Fieldreference, uc_cart_widget
Comment #23
Gábor HojtsyJose: validating against the source string does not really work when you import a .po file since then you have source and target strings as "user input", so the source can be tainted as well. When does that show up anywhere on the UI then is good question though.
Comment #24
Gábor Hojtsymeba: these look like things we can tell people to just move out of t(), since they are not stuff to actually translate. Then we could use the rest of the list for the whitelisting filtering code.
Comment #25
pwolanin CreditAttribution: pwolanin commentedIt seems that there should never be literal brackets (< or >) in translatable strings - they need to be written as <
Comment #26
Gábor HojtsyErm, no. There should be HTML links for example in translatable text to preserve context for the text of the link for translation.
Comment #27
pwolanin CreditAttribution: pwolanin commentednew issue to fix <none> and the like in core: http://drupal.org/node/329998
Note - this is done correctly in block module itself, it's just the install file's schema description that is wrong.
Comment #28
pwolanin CreditAttribution: pwolanin commented@Gabor - to clarify my comment about brackets - I meant that if the brackets were part of the text (not actual tags) they should be written that way.
Comment #29
pwolanin CreditAttribution: pwolanin commentedSince #329998: Remove all unescaped <> chars from t() strings in core was committed, here's a re-roll of the patch from #8 fixing the error text as suggested to include the escaped string which was rejected.
Comment #30
pwolanin CreditAttribution: pwolanin commentedwhoops - in a stroke of irony, the new assertions in locale.test are missing t().
Also added additional test for the validation of strings when imported from a file, and refactored the checks into a single helper function that can be re-used elsewhere.
Comment #31
Gábor HojtsyLooks good to me. Passes test. It does validate strings on UI submission, file import and provides an API function which modules such as l10n_server and l10n_client can use to do the same. I was thinking of two remaining things:
- It might be better to put the validation into _locale_import_one_string_db() since that is the last stop before the database. That would help people reuse that function to get prevented.
- _locale_import_one_string_db() might be improved then to also count 'refused' strings, which would be another possibility in $report, so that we can report refused strings on import as well. That would help people debug bad translations. Some modules might still use disallowed tags, so that would help us migrate to a safer localization environment with contributed modules. So in short, we should not be entirely silent about dropped strings. Let the user debug (and know we protected them :).
Comment #32
pwolanin CreditAttribution: pwolanin commented@Gabor - I think for D7 that would be a good idea- but I want something we can backport to 6 quickly - is something with that level of API change going to be ok for D6?
Comment #33
Gábor HojtsyIt is a security improvement, so I think a little bit of API change is acceptable in this case. It is really just:
- a new key in the $report array, existing ones are not changed
- a new message added to the number of things imported (I'd prefer to concatenate a new sentence, instead of modifying the existing one, so that we keep the existing); maybe modify the existing one in Drupal 7, add a new string on Drupal 6
Otherwise I have no issues with the patch, so keep up the good work!
Comment #34
pwolanin CreditAttribution: pwolanin commentedok, how about this?
Comment #35
pwolanin CreditAttribution: pwolanin commentedhere's a better patch using format_plural() and a more robust test for form errors.
Comment #36
Gábor HojtsyLooks great. The only point I have to say now, is to move the comment from the top of locale_string_is_safe() to the phpdoc above it, so it appears better in API docs. That is minor, so setting back for review. As long as tests pass, I am all for this patch.
Comment #37
pwolanin CreditAttribution: pwolanin commentedok, moved comments.
Comment #38
pwolanin CreditAttribution: pwolanin commentedComment #39
pwolanin CreditAttribution: pwolanin commentedfurther improved code comment.
Comment #40
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks!
Comment #41
pwolanin CreditAttribution: pwolanin commentedThanks, Dries. We also need to backport this for 6.x and 5.x
Comment #42
pwolanin CreditAttribution: pwolanin commentedHere's an intial 6.x backport (not tested yet).
Comment #43
pwolanin CreditAttribution: pwolanin commentedComment #44
pwolanin CreditAttribution: pwolanin commenteda 5.x backposrt as well (needs testing)
Comment #45
Gábor HojtsyThanks, committed to 6.x. Since the 5.x is a bit different, it probably needs some more testing there.
Comment #46
hass CreditAttribution: hass commentedThere is a minor typo in the committed code
becuase
. http://cvs.drupal.org/viewvc.py/drupal/drupal/includes/locale.inc?r1=1.1...Comment #47
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis caused issues with i18n "block translation" feature, where a lot of markup should be allowed... Please see #352121: locale_string_is_safe() is not suitable for user provided text in non-default textgroups.
Comment #48
Gábor Hojtsy#352121: locale_string_is_safe() is not suitable for user provided text in non-default textgroups is a valid issue so I retitled it and following up there.
Comment #49
TR CreditAttribution: TR commented#40 says this was fixed in Drupal 7.x and #45 says this was fixed in Drupal 6.x. The issue remains open only to backport the change to Drupal 5.x. Since Drupal 5.x is no longer supported, I'm marking this issue as won't fix.