Problem/Motivation
In the sample REST export view output below (JSON format), the apostrophe character (ASCII code 39) is double encoded as \u0026#039;.
[{"book_background_pattern":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_s18_background_pattern.jpg","cover":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_doc_S18_cover.jpg","dark_color":"0073b9","accent_color":"a92825","light_color":"c7d5ee","header_background":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_s18_header.png","title":"The Doctor\u0026#039;s Office: A 4D Book","vuforia_device_database":"\/sites\/default\/files\/a_visit_to\/doctors_office\/targets\/a_visit_to_doctors_office.zip","id":"8799","author":"Blake A. Hoena","illustrator":"","series":"A Visit to...","series_id":"268"}]
Steps to reproduce
- Create a node with a title containing an apostrophe character
- Create a view containing a REST Export display
- Set the view format to "Fields"
- Add the "Content:Title" field to the field list
- Preview the results of the view
- Observe that the apostrophe character is double encoded as \u0026#039; and not the expected '
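The double encoding can be reproduced outside Drupal with plain PHP. This is a minimal sketch, assuming the field value is HTML-escaped during rendering and the result is then JSON-encoded with the JSON_HEX_* flags. That pipeline is an assumption made for illustration, not a confirmed trace of the REST Export code path.

```php
<?php
// Assumed pipeline: HTML-escape the rendered value, then JSON-encode it.
$title = "The Doctor's Office: A 4D Book";

// Step 1: HTML escaping turns the apostrophe into the entity &#039;.
$escaped = htmlspecialchars($title, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Step 2: JSON encoding with JSON_HEX_AMP turns the entity's "&" into \u0026.
$flags = JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_AMP | JSON_HEX_QUOT;
$json = json_encode(['title' => $escaped], $flags);

// Prints: {"title":"The Doctor\u0026#039;s Office: A 4D Book"}
echo $json, PHP_EOL;
```

A client that parses this JSON receives the string The Doctor&#039;s Office, which is the HTML entity and not a plain apostrophe.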
Proposed resolution
Roll back the special character encoding, escaping double quotes with a backslash, in both the preview and the output.
The regex searches the $output string for all occurrences of \uXXXX, where each X is a hexadecimal digit (0-9 or A-F, case insensitive), e.g. \u0026.
For each match, it uses the mb_convert_encoding() function to convert that character from UCS-2BE to UTF-8. Then any double quote characters (") are prefixed with a backslash character (\) so that they're properly escaped according to the JSON string requirements.
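The behavior described above can be sketched as a standalone function. The regex and the callback body come from the patch quoted in the comments. The wrapper function name rollback_unicode_escapes() and the sample input are hypothetical and exist only for this illustration. The mbstring extension is required.

```php
<?php
/**
 * Hypothetical wrapper (not part of the patch) around the patch's regex.
 *
 * Decodes each \uXXXX escape into its UTF-8 character and re-escapes
 * double quotes so the surrounding JSON string stays valid.
 */
function rollback_unicode_escapes(string $output): string {
  return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function (array $match): string {
    // pack('H*', ...) turns the four hex digits into two raw bytes (UCS-2BE).
    $char = mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
    // A decoded double quote must be escaped again inside a JSON string.
    return str_replace('"', '\"', $char);
  }, $output);
}

$before = '{"title":"The Doctor\u0026#039;s \u0022Office\u0022"}';
$after = rollback_unicode_escapes($before);

// Prints: {"title":"The Doctor&#039;s \"Office\""}
echo $after, PHP_EOL;
```

Note that the HTML entity &#039; is still present after the rollback. Only the JSON-level \u0026 escape is undone.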
Remaining tasks
- ✅ Update issue summary, to include the proposed resolution
- ✅ Rollback special character encoder in the Views output
- Rollback special character encoder in the Views preview
- Add a test, showing the problem
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #55 | 2928793-55--rest-views-double-encoding-apostrophes.patch | 866 bytes | _renify_ |
| #46 | 2928793-46-rest-views-double-encoding-apostrophes.patch | 929 bytes | ressa |
| #41 | Screenshot (771).png | 187.29 KB | Sravani Ch |
Issue fork drupal-2928793
- 8.9.x (MR !1281)
- 2928793-rest-views-double-encoding (MR !1280)
Comments
Comment #2
alex.stone.filament commented
Comment #3
alex.stone.filament commented
Comment #4
alex.stone.filament commented
Comment #5
alex.stone.filament commented
Updated to reflect the issue is in rest.module
Comment #6
wim leers
Thanks for reporting this bug!
Note that this may take a while to get fixed — few people are working on the "REST Export" view display functionality. If you can provide a failing test that reproduces this, I can promise a fix very soon.
Comment #7
jayemel commented
This is occurring with ampersands in the body field as well. I've generated a JSON export in Views, and ampersands occurring in the body text show up in the JSON feed as \u0026amp;.
Comment #8
sebasto commented
I was annoyed by the same problem and managed to get around it by ticking the "Raw output" box in Format => Show => Settings
Hope this helps.
Comment #9
wim leers
#8: uhm … I wonder why that's not enabled automatically (and forcibly) by \Drupal\rest\Plugin\views\display\RestExport :(
Comment #10
iyyappan.govind
Same problem.
Comment #11
kporras07 commented
I'd like to help on this issue (if it's still an issue)
@Wim Leers what's the expected next step? Is it still creating a failing test?
Comment #12
wim leers
Yes! And ideally, after that: a patch that makes the test pass :)
Comment #13
kingdutch
Unfortunately I ran into this issue trying to create a View to serve as a JSON endpoint for a search index. As the search index contains multiple entity types I needed a composite field that moved multiple labels into a single (text type) field. When enabling "Raw output" for this composite field then the output will simply be null.
Comment #14
clintu commented
You may extend class Serializer and can fix this issue by adding
Comment #15
weynhamz
Another way to work around this is to use Search API to index the field as fulltext and apply the 'HTML Filter' on that field.
Comment #16
jayemel commented
Running into this issue as well, in a taxonomy field set to formatter "Label".
Setting the field to 'raw output' doesn't help as it gives me an entity ID, when I need the term name.
Currently doing a whole lot of extending Serializer to get a satisfactory REST export.
Comment #19
drupalviking
Yeah, this appears when trying to use the serializer on Icelandic letters:
"view":"[{\u0022id\u0022:\u002260\u0022,\u0022type\u0022:\u0022Texti\u0022,\u0022field_caption\u0022:\u0022\u0022,\u0022field_image\u0022:\u0022\u0022,\u0022field_image_location\u0022:\u0022\u0022,\u0022field_formatted_text\u0022:\u0022\\u003Cp\\u003E\\u00c1ri\\u00f0 1968 voru Minjasafn Reykjav\\u00edkur og \\u00c1rb\\u00e6jarsafn sameinu\\u00f0 undir nafni hins s\\u00ed\\u00f0arnefnda. \\u00de\\u00e1 var einnig sam\\u00feykkt \\u00ed borgarstj\\u00f3rn a\\u00f0 koma \\u00e1 f\\u00f3t emb\\u00e6tti borgarminjavar\\u00f0ar og var fyrst r\\u00e1\\u00f0i\\u00f0 \\u00ed \\u00fea\\u00f0 starf 1974. Fyrsti borgarminjav\\u00f6r\\u00f0urinn var Nanna Hermansson (1974-1984), s\\u00ed\\u00f0an t\\u00f3k Ragnhei\\u00f0ur \\u00de\\u00f3rarinsd\\u00f3ttir (1984-1989) vi\\u00f0, \\u00feri\\u00f0ja \\u00ed r\\u00f6\\u00f0inni var Margr\\u00e9t Hallgr\\u00edmsd\\u00f3ttir (1989-2000) og \\u00fe\\u00e1 Gu\\u00f0n\\u00fd Ger\\u00f0ur Gunnarsd\\u00f3ttir (2000-2014). \\u00cd ma\\u00ed \\u00e1ri\\u00f0 2006 opna\\u00f0i Landn\\u00e1mss\\u00fdningin \\u00ed A\\u00f0alstr\\u00e6ti 16. \\u00a0\\u00deungami\\u00f0ja s\\u00fdningarinnar er sk\\u00e1lar\\u00fast fr\\u00e1 10. \\u00f6ld, sem fannst \\u00feegar grafi\\u00f0 var fyrir n\\u00fdju h\\u00fasi \\u00e1 horni A\\u00f0alstr\\u00e6tis og T\\u00fang\\u00f6tu.\\u00a0\\u003C\\\/p\\u003E\\n\\u003Cp\\u003E\\u00deegar \\u00c1rb\\u00e6jarsafn var stofna\\u00f0, 1957, var \\u00fea\\u00f0 sp\\u00f6lkorn fyrir utan bygg\\u00f0ina \\u00ed Reykjav\\u00edk. S\\u00ed\\u00f0an hefur borgin st\\u00e6kka\\u00f0 umtalsvert og n\\u00e6r n\\u00fa langt \\u00fat fyrir safni\\u00f0. En \\u00fer\\u00e1tt fyrir b\\u00e6\\u00f0i H\\u00f6f\\u00f0abakkabraut \\u00ed austri og \\u00edb\\u00fa\\u00f0abygg\\u00f0 \\u00ed nor\\u00f0ri er landr\\u00fdmi enn\\u00fe\\u00e1 allnokku\\u00f0. 
Einnig n\\u00fdtur safni\\u00f0 g\\u00f3\\u00f0s af n\\u00e1l\\u00e6g\\u00f0inni vi\\u00f0 Elli\\u00f0a\\u00e1rdalinn, \\u00fea\\u00f0 gr\\u00f3skusama og v\\u00ed\\u00f0\\u00e1ttumikla \\u00fativistarsv\\u00e6\\u00f0i. Raunar er a\\u00f0d\\u00e1unarvert hve frams\\u00fdnir frumkv\\u00f6\\u00f0lar \\u00c1rb\\u00e6jarsafns hafa reynst \\u00ed flestu tilliti. \\u00der\\u00f3unarm\\u00f6guleikar safnsins vir\\u00f0ast \\u00f3\\u00ferj\\u00f3tandi um langa framt\\u00ed\\u00f0\\u003C\\\/p\\u003E\\n\u0022,\u0022
We need to fix this!
Comment #20
paul_leclerc commented
Still having this issue in Drupal 8.9.11.
I'm using french language with this example : {"vid":"delay","tid":"18","name":"Tr\u00e8s court"}
Sure we can still override the serializer but it's a real shame to have to do this when all other configurations work fine and quickly :/
After some investigation I found that I had to json_decode and then json_encode with the JSON_UNESCAPED_UNICODE option.
It makes me understand that the encoding of the value is not correct.
So I found that name data of the terms are encoded as ASCII in the database : `name` varchar(128) CHARACTER SET ascii
Maybe that's why the json export does not manage the utf8 encoding with this specific field.
The serializer should be able to check if the value to export is database encoded in ascii and convert it ?
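For reference, \u00e8 is a standard JSON escape for the character è, so any JSON parser decodes it back to the original letter. The sketch below shows the decode and re-encode round trip with JSON_UNESCAPED_UNICODE, assuming that is the workaround the comment describes. It uses plain PHP and is not Drupal's serializer.

```php
<?php
// The sample output from the comment, as a raw JSON string.
$json = '{"vid":"delay","tid":"18","name":"Tr\u00e8s court"}';

// Decoding turns the \u00e8 escape back into the UTF-8 character.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Re-encoding with JSON_UNESCAPED_UNICODE keeps non-ASCII characters literal.
$readable = json_encode($data, JSON_UNESCAPED_UNICODE);

// Prints: {"vid":"delay","tid":"18","name":"Très court"}
echo $readable, PHP_EOL;
```

Both forms decode to the same value, so the difference only matters for consumers that read the raw JSON text instead of parsing it.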
Comment #21
gthing commented
#8 is the real MVP. Worked for me.
Comment #22
neslee canil pinto
Same problem here.
Can anyone tell me how this can be fixed?
Comment #23
theolem commented
Having the same problem, #8 works for text fields, but for entity reference fields (to a taxonomy term for example) enabling the "raw" option only results in the field being replaced by the entity ID.
Comment #24
vistree commented
Same problem here. I try to use the body summary field by rewriting the body field with {{ body__summary }}
I can't use the field Formatter "Summary or trimmed" because then I get unwanted HTML and/or some HTML characters are removed from output (">" character).
Using the JSON raw export for this field will remove the field overwrite and will show the whole body text instead of just the summary.
"&" is replaced by """ whereas ">" and "<" are double encoded and show " &gt;" and "&lt;"
Comment #25
pallavi_sugandhi commented
Facing the same issue with the REST export view for the default title field.
Can anyone please suggest an alternative solution to fix this issue?
Comment #32
thidd commented
The fork MR should be closed in favor of another solution.
JSON encoding is needed for the output string, so displaying special characters correctly is the responsibility of the JSON receiver.
But I will make an EventSubscriber to handle those special characters for those who need it.
Comment #36
richarddavies commented
The code in #31 and #32 doesn't account for double quotes, which need to be escaped when converted from \u0022 to " because the double quote character will be appearing within a JSON string wrapped in double quotes.
Here is a slightly modified patch which escapes double quotes with a backslash:
Comment #37
Sravani Ch commented
Hi, I checked both conditions:
  // Rollback special character encoder.
+ $output = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
+   $char = mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
+   return str_replace('"', '\"', $char);
+ }, $output);
  // Rollback special character encoder.
  $output = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
-   return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
+   $char = mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
+   return str_replace('"', '\"', $char);
  }, $output);
But it's not working for me
Comment #38
cilefen commented
Comment #39
richarddavies commented
@SravaniKrishna What exactly do you mean it isn't working? Based on your sample output, it appears to be working correctly as the JSON is properly formatted with no double character encodings.
Comment #40
Sravani Ch commented
No, Richard Davies.
In my Postman collection it's not working.
Comment #41
Sravani Ch commented
Comment #42
richarddavies commented
@SravaniKrishna Again, can you point out exactly where the encoding error is in your JSON output? I don't see anything wrong with that JSON. There is no "double encoding" like "\u0026#039;" or JSON parsing error so the patch seems to be working correctly.
If you don't want the ampersand in the title field to be HTML encoded, then you can turn that off by checking "Raw output" in the view's Format => Show => Settings as other commenters have already pointed out. (I don't think you can disable the HTML encoding of full text WYSIWYG fields because they must always be rendered in HTML so the HTML encoding is necessary.)
Comment #43
cilefen commented
Surround output with <code> tags to highlight actual characters.
Comment #45
internetter commented
I discovered other problems with the REST export (JSON) of views and image URLs. Perhaps it is related:
The parameter ampersand "&" was encoded as "\u0026amp;" in multi-parameter URLs from the image URL formatter (using focal_point).
Comment #46
ressa
Thanks @RichardDavies! Your patch works perfectly in Drupal 10, with single quotes (') as &#039; and the output a lot cleaner. Before and after:
Also, much cleaner looking HTML (before and after):
There's also the related issue #2701129: single quote character not escaped in REST output about single quotes ('), which I believe don't need to be HTML encoded into &#039;, since single quotes don't need escaping because proper JSON output is in double quotes.
Should it be looked at here, or in the other issue?
I am attaching a re-rolled patch for Drupal 10.1, since I have had bad experiences with rebasing Drupal core MRs in Drupal's GitLab. Also, this patch can then be used as a patch in Composer, since it is static.
Comment #47
ressa
Also, fixing #3355796: Allow JSON format when "Accepted request formats" is not defined would get REST and Views export in a great state, working out-of-the-box.
Comment #48
smustgrave commented
Can the issue summary be updated to include the proposed resolution?
Also, a test showing the problem will be needed, please.
Thanks!
Comment #49
ressa
Thanks for reviewing @smustgrave. I would also be interested in a description of what the regex actually does.
@thidd or @RichardDavies, perhaps you can help with this? And maybe even a test? :)
Comment #50
ressa
I also now see that the preview is still escaped, so we probably should do the same there? I'll add the tasks in the issue summary.
Comment #51
richarddavies commented
@ressa The regex searches the $output string for all occurrences of \uXXXX, where each X is a hexadecimal digit (0-9 or A-F, case insensitive), e.g. \u0026.
For each match, it uses the mb_convert_encoding() function to convert that character from UCS-2BE to UTF-8. Then any double quote characters (") are prefixed with a backslash character (\) so that they're properly escaped according to the JSON string requirements.
Comment #52
ressa
Thanks @RichardDavies! Both for working on this solution, and explaining the regex. I have added it in the Issue Summary.
Comment #54
nicolasgraph
Patch #46 causes malformed UTF-8 characters for emojis.
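A likely cause, offered here as an assumption and not a confirmed diagnosis: PHP's json_encode() writes characters outside the Basic Multilingual Plane, which includes most emoji, as a UTF-16 surrogate pair of two consecutive \uXXXX escapes. Converting each half separately from UCS-2BE cannot produce a valid UTF-8 character. The sketch below matches a surrogate pair as one unit and decodes it as UTF-16BE. The function name rollback_unicode_escapes_with_pairs() is hypothetical and this code is not taken from any patch in this issue.

```php
<?php
/**
 * Hypothetical variant of the rollback that keeps surrogate pairs together.
 */
function rollback_unicode_escapes_with_pairs(string $output): string {
  // First alternative: high surrogate (D800-DBFF) followed by low surrogate
  // (DC00-DFFF). Second alternative: any single \uXXXX escape.
  $pattern = '/\\\\u([dD][89abAB][0-9a-fA-F]{2})\\\\u([dD][c-fC-F][0-9a-fA-F]{2})|\\\\u([0-9a-fA-F]{4})/';
  return preg_replace_callback($pattern, function (array $match): string {
    // Group 3 is only present when the single-escape alternative matched.
    $hex = isset($match[3]) ? $match[3] : $match[1] . $match[2];
    // UTF-16BE understands surrogate pairs, unlike UCS-2BE.
    $char = mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
    return str_replace('"', '\"', $char);
  }, $output);
}

// json_encode() emits the emoji U+1F600 as the pair \ud83d\ude00.
$encoded = json_encode(['title' => "\u{1F600} caf\u{00e9}"]);
$decoded = rollback_unicode_escapes_with_pairs($encoded);

echo $encoded, PHP_EOL;
echo $decoded, PHP_EOL;
```

Unpaired surrogates are not valid on their own and are not handled specially in this sketch.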
Comment #55
_renify_ commented