Problem/Motivation

In the sample REST export view output below (JSON format), an apostrophe character (ASCII code 39) is double encoded as \u0026#039;.

[{"book_background_pattern":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_s18_background_pattern.jpg","cover":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_doc_S18_cover.jpg","dark_color":"0073b9","accent_color":"a92825","light_color":"c7d5ee","header_background":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_s18_header.png","title":"The Doctor\u0026#039;s Office: A 4D Book","vuforia_device_database":"\/sites\/default\/files\/a_visit_to\/doctors_office\/targets\/a_visit_to_doctors_office.zip","id":"8799","author":"Blake A. Hoena","illustrator":"","series":"A Visit to...","series_id":"268"}]

Steps to reproduce

  1. Create a node with a title containing an apostrophe character
  2. Create a view containing a REST Export display
  3. Set the view format to "Fields"
  4. Add the "Content:Title" field to the field list
  5. Preview the results of the view
  6. Observe that the apostrophe character is double encoded as \u0026#039; and not the expected '
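
The double encoding can also be reproduced outside Drupal. The following is a minimal sketch, assuming the field value is HTML-escaped during rendering and then JSON-encoded with PHP's JSON_HEX_* flags. The function name doubleEncodeTitle() is hypothetical, and this is a simplification rather than the actual code path in rest.module.

```php
<?php

/**
 * Hypothetical helper: HTML-escape a value, then JSON-encode it.
 *
 * Assumption: this mirrors, in simplified form, what happens when a rendered
 * (HTML-escaped) field value ends up in a JSON response.
 */
function doubleEncodeTitle(string $title): string {
  // Step 1: HTML escaping turns the apostrophe into the entity &#039;.
  $escaped = htmlspecialchars($title, ENT_QUOTES | ENT_HTML401, 'UTF-8');
  // Step 2: JSON_HEX_AMP turns the entity's ampersand into \u0026.
  return json_encode($escaped, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_AMP | JSON_HEX_QUOT);
}

echo doubleEncodeTitle("The Doctor's Office: A 4D Book"), PHP_EOL;
// Prints: "The Doctor\u0026#039;s Office: A 4D Book"
```

The printed value matches the "title" entry in the sample output above.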

Proposed resolution

Roll back the special character encoding in both the Views preview and the REST output, escaping double quotes with a backslash.

The regex searches the $output string for all occurrences of the escape sequence \uXXXX (written as \\u in the regex so that it matches a literal backslash), where each X is a hexadecimal digit: 0-9 or A-F, case insensitive. e.g. \u0026

For each match that it finds, it uses pack() and mb_convert_encoding() to convert the character from UCS-2BE (the two bytes given by the four hex digits) to UTF-8. Then any resulting double quote character (") is prefixed with a backslash (\) so that it stays properly escaped, as JSON strings require.
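
The replacement described above can be tried in isolation. This is a sketch that wraps the same regex and callback in a standalone function; the name unescapeUnicode() is hypothetical, and it assumes the mbstring extension is available.

```php
<?php

/**
 * Sketch of the replacement described above (function name is hypothetical).
 */
function unescapeUnicode(string $output): string {
  return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function (array $match): string {
    // pack('H*', '0026') yields the two raw bytes 0x00 0x26, i.e. "&" in UCS-2BE.
    $char = mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
    // A literal double quote must stay escaped inside a JSON string.
    return str_replace('"', '\"', $char);
  }, $output);
}

echo unescapeUnicode('{"title":"The Doctor\u0026#039;s Office"}'), PHP_EOL;
// Prints: {"title":"The Doctor&#039;s Office"}
echo unescapeUnicode('{"name":"Tr\u00e8s court"}'), PHP_EOL;
// Prints: {"name":"Très court"}
echo unescapeUnicode('{"q":"say \u0022hi\u0022"}'), PHP_EOL;
// Prints: {"q":"say \"hi\""}
```

Note that the HTML entity &#039; itself is left in place; only the JSON-level \uXXXX escapes are undone.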

Remaining tasks

  • ✅ Update issue summary, to include the proposed resolution
  • ✅ Roll back special character encoder in the Views output
  • Roll back special character encoder in the Views preview
  • Add a test, showing the problem

User interface changes

API changes

Data model changes

Release notes snippet

Issue fork drupal-2928793

Comments

alex.stone.filament created an issue. See original summary.

alex.stone.filament’s picture

Issue summary: View changes
alex.stone.filament’s picture

Issue summary: View changes
alex.stone.filament’s picture

Component: views.module » rest.module

Updated to reflect the issue is in rest.module

wim leers’s picture

Title: Double encoding of apostrophes in REST Export display » REST views: double encoding of apostrophes in REST Export display
Issue tags: +VDC, +API-First Initiative, +Needs tests

Thanks for reporting this bug!

Note that this may take a while to get fixed — few people are working on the "REST Export" view display functionality. If you can provide a failing test that reproduces this, I can promise a fix very soon.

jayemel’s picture

This is occurring with ampersands in the body field as well. I've generated a JSON export in Views, and ampersands occurring in the body text show up in the JSON feed as \u0026amp;.

sebasto’s picture

I was annoyed by the same problem and managed to get around it by ticking the "Raw output" box in Format => Show => Settings.

Hope this helps.

wim leers’s picture

#8: uhm … I wonder why that's not enabled automatically (and forcibly) by \Drupal\rest\Plugin\views\display\RestExport :(

iyyappan.govind’s picture

Same problem.

kporras07’s picture

I'd like to help on this issue (if it's still an issue)
@Wim Leers what's the expected next step? Is it still creating a failing test?

wim leers’s picture

Is it still creating a failing test?

Yes! And ideally, after that: a patch that makes the test pass :)

kingdutch’s picture

Unfortunately I ran into this issue trying to create a View to serve as a JSON endpoint for a search index. As the search index contains multiple entity types I needed a composite field that moved multiple labels into a single (text type) field. When enabling "Raw output" for this composite field then the output will simply be null.

clintu’s picture

You can extend the Serializer class and fix this issue by adding:

public function render() {
....................
....................
....................
}
weynhamz’s picture

Another way to work around this is to use Search API to index the field as fulltext and apply 'HTML Filter' on that field.

jayemel’s picture

Running into this issue as well, in a taxonomy field set to formatter "Label".

Setting the field to 'raw output' doesn't help as it gives me an entity ID, when I need the term name.

Currently doing a whole lot of extending Serializer to get a satisfactory REST export.

Version: 8.3.7 » 8.3.x-dev

Core issues are now filed against the dev versions where changes will be made. Document the specific release you are using in your issue comment. More information about choosing a version.

Version: 8.3.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Branches prior to 8.8.x are not supported, and Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

drupalviking’s picture

Yeah, this appears when trying to use the serializer on Icelandic letters:

"view":"[{\u0022id\u0022:\u002260\u0022,\u0022type\u0022:\u0022Texti\u0022,\u0022field_caption\u0022:\u0022\u0022,\u0022field_image\u0022:\u0022\u0022,\u0022field_image_location\u0022:\u0022\u0022,\u0022field_formatted_text\u0022:\u0022\\u003Cp\\u003E\\u00c1ri\\u00f0 1968 voru Minjasafn Reykjav\\u00edkur og \\u00c1rb\\u00e6jarsafn sameinu\\u00f0 undir nafni hins s\\u00ed\\u00f0arnefnda. \\u00de\\u00e1 var einnig sam\\u00feykkt \\u00ed borgarstj\\u00f3rn a\\u00f0 koma \\u00e1 f\\u00f3t emb\\u00e6tti borgarminjavar\\u00f0ar og var fyrst r\\u00e1\\u00f0i\\u00f0 \\u00ed \\u00fea\\u00f0 starf 1974. Fyrsti borgarminjav\\u00f6r\\u00f0urinn var Nanna Hermansson (1974-1984), s\\u00ed\\u00f0an t\\u00f3k Ragnhei\\u00f0ur \\u00de\\u00f3rarinsd\\u00f3ttir (1984-1989) vi\\u00f0, \\u00feri\\u00f0ja \\u00ed r\\u00f6\\u00f0inni var Margr\\u00e9t Hallgr\\u00edmsd\\u00f3ttir (1989-2000) og \\u00fe\\u00e1 Gu\\u00f0n\\u00fd Ger\\u00f0ur Gunnarsd\\u00f3ttir (2000-2014). \\u00cd ma\\u00ed \\u00e1ri\\u00f0 2006 opna\\u00f0i Landn\\u00e1mss\\u00fdningin \\u00ed A\\u00f0alstr\\u00e6ti 16. \\u00a0\\u00deungami\\u00f0ja s\\u00fdningarinnar er sk\\u00e1lar\\u00fast fr\\u00e1 10. \\u00f6ld, sem fannst \\u00feegar grafi\\u00f0 var fyrir n\\u00fdju h\\u00fasi \\u00e1 horni A\\u00f0alstr\\u00e6tis og T\\u00fang\\u00f6tu.\\u00a0\\u003C\\\/p\\u003E\\n\\u003Cp\\u003E\\u00deegar \\u00c1rb\\u00e6jarsafn var stofna\\u00f0, 1957, var \\u00fea\\u00f0 sp\\u00f6lkorn fyrir utan bygg\\u00f0ina \\u00ed Reykjav\\u00edk. S\\u00ed\\u00f0an hefur borgin st\\u00e6kka\\u00f0 umtalsvert og n\\u00e6r n\\u00fa langt \\u00fat fyrir safni\\u00f0. En \\u00fer\\u00e1tt fyrir b\\u00e6\\u00f0i H\\u00f6f\\u00f0abakkabraut \\u00ed austri og \\u00edb\\u00fa\\u00f0abygg\\u00f0 \\u00ed nor\\u00f0ri er landr\\u00fdmi enn\\u00fe\\u00e1 allnokku\\u00f0. 
Einnig n\\u00fdtur safni\\u00f0 g\\u00f3\\u00f0s af n\\u00e1l\\u00e6g\\u00f0inni vi\\u00f0 Elli\\u00f0a\\u00e1rdalinn, \\u00fea\\u00f0 gr\\u00f3skusama og v\\u00ed\\u00f0\\u00e1ttumikla \\u00fativistarsv\\u00e6\\u00f0i. Raunar er a\\u00f0d\\u00e1unarvert hve frams\\u00fdnir frumkv\\u00f6\\u00f0lar \\u00c1rb\\u00e6jarsafns hafa reynst \\u00ed flestu tilliti. \\u00der\\u00f3unarm\\u00f6guleikar safnsins vir\\u00f0ast \\u00f3\\u00ferj\\u00f3tandi um langa framt\\u00ed\\u00f0\\u003C\\\/p\\u003E\\n\u0022,\u0022

We need to fix this!

paul_leclerc’s picture

Still having this issue in Drupal 8.9.11.

I'm using French with this example: {"vid":"delay","tid":"18","name":"Tr\u00e8s court"}

Sure, we can still override the serializer, but it's a real shame to have to do this when all other configurations work fine and quickly :/

After some investigation I found that I had to json_decode and then json_encode with the JSON_UNESCAPED_UNICODE option.
This suggests to me that the encoding of the value is not correct.
I then found that the name data of the terms is encoded as ASCII in the database: `name` varchar(128) CHARACTER SET ascii

Maybe that's why the JSON export does not handle the UTF-8 encoding of this specific field.
Should the serializer check whether the value to export is stored as ASCII in the database, and convert it?
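
For reference, the effect of JSON_UNESCAPED_UNICODE can be seen with PHP's built-in json_encode() and json_decode() alone. This is a small sketch with a hard-coded sample value, not tied to the database column or the Drupal serializer. It shows that \u00e8 is a standard JSON escape that any conforming parser decodes back to è.

```php
<?php

// "\u{e8}" is PHP's own escape syntax for è (U+00E8).
$name = "Tr\u{e8}s court";

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode(['name' => $name]);
// With JSON_UNESCAPED_UNICODE the UTF-8 bytes are emitted as-is.
$literal = json_encode(['name' => $name], JSON_UNESCAPED_UNICODE);

echo $escaped, PHP_EOL; // {"name":"Tr\u00e8s court"}
echo $literal, PHP_EOL; // {"name":"Très court"}

// Both documents decode to the same value.
var_dump(json_decode($escaped, TRUE) === json_decode($literal, TRUE)); // bool(true)
```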

gthing’s picture

#8 is the real MVP. Worked for me.

neslee canil pinto’s picture

Same problem here:

& is converted into &amp;

Can anyone tell me how this can be fixed?

theolem’s picture

Having the same problem, #8 works for text fields, but for entity reference fields (to a taxonomy term for example) enabling the "raw" option only results in the field being replaced by the entity ID.

vistree’s picture

Same problem here. I try to use the body summary field by rewriting the body field with {{ body__summary }}
I can't use the field Formatter "Summary or trimmed" because then I get unwanted HTML and/or some HTML characters are removed from output (">" character).
Using the JSON raw export for this field will remove the field overwrite and will show the whole body text instead of just the summary.
"&" is replaced by "&quot;" whereas ">" and "<" are double encoded and show " &amp;gt;" and "&amp;lt;"

pallavi_sugandhi’s picture

Facing the same issue with a REST export view for the default title field.
Can anyone please suggest an alternative solution to fix this issue?

mrddthi made their first commit to this issue’s fork.

thidd’s picture

The fork's PR should be closed in favor of another solution.
JSON encoding is needed for the output string, so displaying special characters correctly is the responsibility of the JSON receiver.
However, for those who need it, I would use an EventSubscriber to handle those special characters:

  public function onKernelResponse(FilterResponseEvent $event) {
    $response = $event->getResponse();
    if (!$response instanceof CacheableResponse) {
      return;
    }

    $route_attribute = $event->getRequest()->attributes;
    // Check condition for view_id in route parameter.
    if (!$route_attribute->has('view_id')) {
      return;
    }
    $view_id = $route_attribute->get('view_id');
    if ($view_id != 'VIEW_ID') {
      // You may need to add display ID condition.
      return;
    }
    $data = $response->getContent();
    // Rollback special character encoder. @see https://drupal.org/node/2928793.
    $data = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
      return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
    }, $data);
    $response->setContent($data);
    $event->setResponse($response);
  }

Version: 8.9.x-dev » 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

richarddavies’s picture

The code in #31 and #32 doesn't account for double quotes which need to be escaped when converted from \u0022 to " because the double quote character will be appearing within a JSON string wrapped in double quotes.

Here is a slightly modified patch which escapes double quotes with a backslash:

diff --git a/core/modules/rest/src/Plugin/views/display/RestExport.php b/core/modules/rest/src/Plugin/views/display/RestExport.php
index 51c080b93cca2f339a8174d0fb0564c09ffa7c2c..4d6c911d9a053a71f32c1d1c9f12ba749e76661a 100644
--- a/core/modules/rest/src/Plugin/views/display/RestExport.php
+++ b/core/modules/rest/src/Plugin/views/display/RestExport.php
@@ -422,6 +422,11 @@ public static function buildResponse($view_id, $display_id, array $args = []) {
     $renderer = \Drupal::service('renderer');
 
     $output = (string) $renderer->renderRoot($build);
+    // Rollback special character encoder. @see https://drupal.org/node/2928793.
+    $output = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
+        $char = mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
+        return str_replace('"', '\"', $char);
+      }, $output);
 
     $response->setContent($output);
     $cache_metadata = CacheableMetadata::createFromRenderArray($build);
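
To illustrate why the extra str_replace() matters, here is a sketch that applies the replacement with and without quote escaping and checks the result with json_decode(). The helper name unescape() and its $escapeQuotes parameter are hypothetical and exist only for this comparison.

```php
<?php

// Hypothetical helper: apply the replacement with or without quote escaping.
function unescape(string $json, bool $escapeQuotes): string {
  return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function (array $m) use ($escapeQuotes): string {
    $char = mb_convert_encoding(pack('H*', $m[1]), 'UTF-8', 'UCS-2BE');
    return $escapeQuotes ? str_replace('"', '\"', $char) : $char;
  }, $json);
}

$json = '{"quote":"\u0022Hello\u0022"}';

$without = unescape($json, FALSE); // {"quote":""Hello""}
$with = unescape($json, TRUE);     // {"quote":"\"Hello\""}

var_dump(json_decode($without));             // NULL: no longer valid JSON
var_dump(json_decode($with, TRUE)['quote']); // string(7) ""Hello""
```

A similar concern applies to escaped control characters such as \u001f, which this sketch, like the patch, turns into raw bytes that JSON does not allow inside strings.
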
Sravani Ch’s picture

Priority: Normal » Critical

Hi, I checked both conditions:

// Rollback special character encoder.
+ $output = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
+ $char = mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
+ return str_replace('"', '\"', $char);
+ }, $output);

// Rollback special character encoder.
$output = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
$char = mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
return str_replace('"', '\"', $char);
}, $output);

But it's not working for me

cilefen’s picture

Priority: Critical » Normal
richarddavies’s picture

@SravaniKrishna What exactly do you mean it isn't working? Based on your sample output, it appears to be working correctly as the JSON is properly formatted with no double character encodings.

Sravani Ch’s picture

No, Richard Davies.
In my Postman collection it's not working.

Sravani Ch’s picture

Attached file (status: new, size: 187.29 KB)
richarddavies’s picture

@SravaniKrishna Again, can you point out exactly where the encoding error is in your JSON output? I don't see anything wrong with that JSON. There is no "double encoding" like "\u0026#039;" or JSON parsing error so the patch seems to be working correctly.

If you don't want the ampersand in the title field to be HTML encoded, then you can turn that off by checking "Raw output" in the view's Format => Show => Settings as other commenters have already pointed out. (I don't think you can disable the HTML encoding of full text WYSIWYG fields because they must always be rendered in HTML so the HTML encoding is necessary.)

cilefen’s picture

Surround output with <code> tags to highlight actual characters.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.9 was released on December 7, 2022 and is the final full bugfix release for the Drupal 9.4.x series. Drupal 9.4.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.5.x-dev branch from now on, and new development or disruptive changes should be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

internetter’s picture

I discovered another problem with the REST export (JSON) of views and image URLs. Perhaps it is related:

The parameter-separating ampersand "&" was encoded as "\u0026amp;" in URLs with multiple query parameters produced by the image URL formatter (when using focal_point).

ressa’s picture

Version: 9.5.x-dev » 10.1.x-dev
Status: Active » Needs review
Related issues: +#2701129: single quote character not escaped in REST output
Attached file (status: new, size: 929 bytes)

Thanks @RichardDavies! Your patch works perfectly in Drupal 10, with single quotes (') as &#039; and the output a lot cleaner. Before and after:

  • "name": "C\u00f4te d\u0026#039;Ivoire"
    "name": "Côte d&#039;Ivoire"
    
  • "name": "Pes\u00e4pallo"
    "name": "Pesäpallo"
    

Also, much cleaner looking HTML (before and after):

  • "title": "Facts about C\u00f4te d\u0026#039;Ivoire"
    "title": "Facts about Côte d&#039;Ivoire"
    
  • "field_body": "\u003Ch2\u003E1. Some facts\u003C\/h2\u003E ..."
    "field_body": "<h2>1. Some facts<\/h2> ..."
    

There's also the related issue #2701129: single quote character not escaped in REST output about single quotes (') which I believe don't need to be HTML encoded into &#039;, since single quotes don't need escaping because proper JSON output is in double quotes.

Should it be looked at here, or in the other issue?
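
The point about single quotes can be checked with plain json_encode(). This is a sketch with a hard-coded sample value: when JSON_HEX_APOS is not set, the apostrophe is emitted unchanged and the document still parses.

```php
<?php

// "\u{f4}" is PHP's own escape syntax for ô (U+00F4).
$name = "C\u{f4}te d'Ivoire";

// Without JSON_HEX_APOS, the apostrophe is emitted unchanged.
$json = json_encode(['name' => $name], JSON_UNESCAPED_UNICODE);
echo $json, PHP_EOL; // {"name":"Côte d'Ivoire"}

// The document is valid JSON and round-trips to the original string.
var_dump(json_decode($json, TRUE)['name'] === $name); // bool(true)
```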

I am attaching a re-rolled patch for Drupal 10.1, since I have had bad experiences with rebasing Drupal core MRs in Drupal's GitLab. Also, since it is static, this patch can then be applied via Composer.

ressa’s picture

Also, fixing #3355796: Allow JSON format when "Accepted request formats" is not defined would get REST and Views export in a great state, working out-of-the-box.

smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +Needs issue summary update, +Needs Review Queue Initiative

Can the issue summary be updated to include the proposed resolution?

Also, a test showing the problem will be needed, please.

Thanks!

ressa’s picture

Issue summary: View changes

Thanks for reviewing @smustgrave. I would also be interested in a description of what the regex actually does.

@thidd or @RichardDavies, perhaps you can help with this? And maybe even a test? :)

ressa’s picture

Issue summary: View changes

I also now see that the preview is still escaped, so we probably should do the same there? I'll add the tasks in the issue summary.

richarddavies’s picture

@ressa The regex searches the $output string for all occurrences of the escape sequence \uXXXX (written as \\u in the regex so that it matches a literal backslash), where each X is a hexadecimal digit: 0-9 or A-F, case insensitive. e.g. \u0026

For each match that it finds, it uses pack() and mb_convert_encoding() to convert the character from UCS-2BE (the two bytes given by the four hex digits) to UTF-8. Then any resulting double quote character (") is prefixed with a backslash (\) so that it stays properly escaped, as JSON strings require.

ressa’s picture

Issue summary: View changes

Thanks @RichardDavies! Both for working on this solution, and explaining the regex. I have added it in the Issue Summary.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

nicolasgraph’s picture

Patch #46 causes malformed UTF-8 characters for emojis.
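
A likely cause: JSON writes characters outside the Basic Multilingual Plane, such as emojis, as a surrogate pair of two \uXXXX escapes, and the regex callback converts each half on its own. One possible alternative, sketched here and not part of any patch in this issue, is to let json_decode() combine the pairs and then re-encode with JSON_UNESCAPED_UNICODE. The function name reencodeUnescaped() is hypothetical, and the sketch assumes the output is a complete, valid JSON document.

```php
<?php

/**
 * Hypothetical alternative: decode the JSON and re-encode it unescaped.
 *
 * Assumption: $json is a complete, valid JSON document.
 */
function reencodeUnescaped(string $json): string {
  // Decode to objects (not arrays) so that {} and [] stay distinct.
  $data = json_decode($json, FALSE, 512, JSON_THROW_ON_ERROR);
  return json_encode($data, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

// Converting each surrogate half separately cannot yield U+1F600.
$halves = mb_convert_encoding(pack('H*', 'd83d'), 'UTF-8', 'UCS-2BE')
  . mb_convert_encoding(pack('H*', 'de00'), 'UTF-8', 'UCS-2BE');
var_dump($halves === "\u{1F600}"); // bool(false)

// json_decode() combines the pair \ud83d\ude00 into a single character.
$json = '{"title":"Smile \ud83d\ude00","quote":"\u0022hi\u0022"}';
$result = reencodeUnescaped($json);
echo $result, PHP_EOL;
// Prints: {"title":"Smile 😀","quote":"\"hi\""}
```

Because the encoder produces the string escapes itself, double quotes and backslashes stay valid without a separate str_replace(). A round trip through json_decode() can change how some numbers are represented (for example, integers beyond PHP_INT_MAX), so this would need testing against real view output.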

_renify_’s picture

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.