Problem/Motivation
Source strings can contain HTML or locale placeholders that shouldn't be touched by the translator. Later on, this could be exposed in the UI so that users can mark non-translatable parts. Especially the latter will require support for specific positions, so that we don't ignore/escape too much.
Proposed resolution
Extend the data item structure with an #escape key that looks like this:
$item['#escape'] = array(
  // Escape the @bar that starts at position 5.
  array('string' => '@bar', 'position' => 5),
  // Escape all @foo.
  array('string' => '@foo'),
);
Then translators can check for that and translate the text accordingly. This is not trivial, as positions can affect each other and so on, so this should be abstracted/generalized as much as possible. I expect something like
$final_text = $this->escapeText($data_item);
For the highest flexibility, I think there should be a $this->getEscapedString($string) with a default implementation that looks like this:
return $this->escapeStart . $string . $this->escapeEnd;
so simple escaping patterns can just define those properties.
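A minimal sketch of how the proposal above could fit together, not the committed implementation. The class name ExampleEscaper, the '[[[' / ']]]' markers and the '#text' key for the source text are assumptions made for illustration. Definitions without a position are resolved to every occurrence first, and replacements then run from the end of the string so that earlier positions are not shifted.

```php
<?php

/**
 * Hypothetical escaper illustrating the proposed #escape structure.
 */
class ExampleEscaper {

  // Assumed escape markers; a real translator would define its own.
  protected $escapeStart = '[[[';
  protected $escapeEnd = ']]]';

  // Default implementation: wrap the string in the two markers.
  protected function getEscapedString($string) {
    return $this->escapeStart . $string . $this->escapeEnd;
  }

  // Returns the text of a data item with all #escape definitions applied.
  public function escapeText(array $data_item) {
    // Assumption: the source text lives under '#text'.
    $text = $data_item['#text'];
    $escape = isset($data_item['#escape']) ? $data_item['#escape'] : array();

    // Resolve every definition to concrete byte positions in the text.
    $positions = array();
    foreach ($escape as $definition) {
      $string = $definition['string'];
      if ($string === '') {
        continue;
      }
      if (isset($definition['position'])) {
        $positions[$definition['position']] = $string;
        continue;
      }
      // No position given: escape all occurrences.
      $offset = 0;
      while (($found = strpos($text, $string, $offset)) !== FALSE) {
        $positions[$found] = $string;
        $offset = $found + strlen($string);
      }
    }

    // Replace from the end so that earlier positions stay valid.
    krsort($positions, SORT_NUMERIC);
    foreach ($positions as $position => $string) {
      $text = substr_replace($text, $this->getEscapedString($string), $position, strlen($string));
    }
    return $text;
  }

}
```

With a data item whose '#text' is 'Dear @bar, @foo or @foo' and the #escape definitions from the example above, this sketch wraps the @bar at position 5 and both occurrences of @foo.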
Remaining tasks
Write the code, update documentation, add a lot of tests.
Define unescape. Can we just look for the pattern and remove it? Having to care about shifted positions there would be insanely complicated.
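One possible answer to the unescape question, as a sketch only: if the escape pattern is a fixed start/end pair, the markers can be stripped with a pattern match that does not depend on positions at all. The function name example_unescape and the '[[[' / ']]]' defaults are hypothetical, and this assumes the translation service returns the markers unchanged.

```php
<?php

/**
 * Hypothetical helper: removes escape markers from a translated text.
 */
function example_unescape($text, $escape_start = '[[[', $escape_end = ']]]') {
  // Quote the markers so they are matched literally, then keep only the
  // wrapped string. The non-greedy match handles several escaped parts.
  $pattern = '/' . preg_quote($escape_start, '/') . '(.*?)' . preg_quote($escape_end, '/') . '/s';
  return preg_replace($pattern, '$1', $text);
}
```

Because the match is purely pattern based, it does not matter how far the translation moved the escaped strings around.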
User interface changes
None for now.
API changes
Sources can define strings to be escaped; translators are supposed to take care of them.
Related Issues
Gengo: #1676774: Escape HTML from source before sending to be translated
Google: #2064823: Escaping
All other translators will need issues too.
| Comment | File | Size | Author |
|---|---|---|---|
| #2 | escaping-2064871-2.patch | 9.77 KB | berdir |
Comments
Comment #1
berdir
Comment #2
berdir
First implementation.
The #escape definition was changed to always require the position, which is therefore used as the key. This makes actually implementing it fairly easy, which is done here in the test translator for testing purposes. The test translator doesn't actually use it, as it doesn't need to.
The escape definitions are implemented for the locale source, as far as possible, as discussed; see inline comments.
Escaping HTML should IMHO be moved to a separate issue. That will not be trivial, and it doesn't need to block getting the API in so that translators can start using it.
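To illustrate the position-keyed #escape structure described in this comment, here is a hedged sketch of how a source could build it for placeholders. The function name and the placeholder pattern (@name, %name, !name) are assumptions and not taken from the patch. Positions are byte offsets, which is what substr_replace() expects on the translator side.

```php
<?php

/**
 * Hypothetical helper: builds position-keyed #escape definitions.
 */
function example_placeholder_escapes($text) {
  $escape = array();
  // Assumed placeholder syntax: @name, %name or !name.
  preg_match_all('/[@%!][A-Za-z0-9_-]+/', $text, $matches, PREG_OFFSET_CAPTURE);
  foreach ($matches[0] as $match) {
    // Each match is array(matched string, byte offset).
    list($string, $position) = $match;
    $escape[$position] = array('string' => $string);
  }
  return $escape;
}
```

A data item could then carry the result as $item['#escape'], keyed by position.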
Comment #3
miro_dietiker
In addition to the limitations, side effects and problems discussed...
There is a second thought that is similarly crazy:
Instead of pseudo metadata, we could define that sources need to deliver proper ITS.
http://www.w3.org/TR/its20/
See also some approach:
https://drupal.org/project/its
This leads to several problems:
- A source might not want to care about ITS, so the default would have to be a hint that "it's just text!", i.e. don't try to interpret it as ITS-tagged content.
- A source needs quite some complexity to parse its content and convert it into ITS content (and back).
- A translator that doesn't support anything (like the current translators) would need to get an ITS-stripped payload.
- A translator that supports escaping (or a subset of ITS) needs to instantiate an ITS parser and interpret the events.
While it is nice to follow a clean standard, it's just crazy complex.
We might provide something like this as V2 with a fallback to placeholder positions like we are doing currently.
Comment #4
blueminds commented
Just some very minor things:
indicates -> indicate
indicates -> indicate
The last dash does not need to be escaped?
Comment #5
berdir
Thanks, fixed the "indicate" thing; escaping is not necessary, as discussed. Committed and pushed.