While testing, I found that two nodes on a Drupal site that contain control characters (specifically backspace 0x08).

This caused a JSON:API parse to fail with this error message:

SyntaxError: Bad control character in string literal in JSON.

Per the JSON specification (RFC 8259, Section 7), JSON:API should escape or sanitize control characters to ensure valid JSON output.

Drupal is not currently escaping control characters in JSON:API output. Control characters have no purpose in web output, so we should consider stripping them.

Steps to reproduce

  1. Create content containing control characters (e.g., backspace, form feed, null byte).
  2. Request the content via JSON:API.
  3. Inspect the JSON output for unescaped control characters.

Issue fork drupal-3549107

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

dries created an issue. See original summary.

scottatdrake’s picture

I took an initial look and want to report my findings for my future self or the next person who looks into this issue.

  • The JSON:API encoder (core/modules/jsonapi/src/Encoder/JsonEncoder.php) is a mimimal class that only extends the serialization module's JsonEncoder
  • The parent encoder uses these JSON encoding flags: JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_AMP | JSON_HEX_QUOT
  • These flags only escape HTML-unsafe characters (<, >, ', &, ")
  • But there is no explicit handling for invalid UTF-8 sequences or edge cases with control characters

A solution may be to modify core/modules/jsonapi/src/Encoder/JsonEncoder.php to override the constructor and add the JSON_INVALID_UTF8_SUBSTITUTE flag, like:

  public function __construct(?JsonEncode $encodingImpl = NULL, ?JsonDecode $decodingImpl = NULL) {
    // Encode <, >, ', &, and " for RFC4627-compliant JSON, which may also be
    // embedded into HTML.
    // @see \Symfony\Component\HttpFoundation\JsonResponse
    // Additionally, substitute invalid UTF-8 sequences to ensure control
    // characters and malformed data are properly handled.
    // @see https://www.drupal.org/project/drupal/issues/3549107
    $json_encoding_options = JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_AMP | JSON_HEX_QUOT | JSON_INVALID_UTF8_SUBSTITUTE;
    $this->encodingImpl = $encodingImpl ?: new JsonEncode([JsonEncode::OPTIONS => $json_encoding_options]);
    $this->decodingImpl = $decodingImpl ?: new JsonDecode([JsonDecode::ASSOCIATIVE => TRUE]);
  }
scottatdrake’s picture

Upon a second look, I think moving the fix above (ie adding JSON_INVALID_UTF8_SUBSTITUTE ) to the parent encoder (core/modules/serialization/src/Encoder/JsonEncoder.php) may be the way to go so that REST JSON, AJAX, and JSON:API are all protected the same way via inheritance. However, this would broaden the scope of the issue.

bbrala’s picture

I would probably go up one yeah, although it might be easier to start in jsonapi and then follow up the serializer above.

I will review and push through if we can get the work going here :)

scottatdrake’s picture

Assigned: Unassigned » scottatdrake

scottatdrake’s picture

Status: Active » Needs review

I've opened MR !13910 with the fix applied at the parent serialization encoder level. This protects REST JSON, AJAX, and JSON:API responses via inheritance.

The change adds `JSON_INVALID_UTF8_SUBSTITUTE` to the existing encoding flags while preserving the HTML-safety behavior. I've included unit tests for both the serialization and JSON:API encoders that verify:

  • Invalid UTF-8 sequences are substituted (not fatal)
  • HTML-unsafe character escaping still works
  • The JSON:API encoder correctly inherits the behavior

I added "ufffd" to the cspell dictionary to get Spell-checking step in the pipeline to pass.

The merge request passes with warnings. I'm unsure if those are expected. They don't immediately seem related to my changes.

Ready for review when you have a chance.

bbrala’s picture

Status: Needs review » Reviewed & tested by the community

This looks good. I considered if this is breaking, but this is mostly unbreaking ;) Coverage looks good. We might want a change record but I'm leaning to no since this will not change output, even if developers did their own substitution.

  • longwave committed e4057833 on 11.3.x
    fix: #3549107 Escape or strip control characters in JSON:API
    
    By: dries...

  • longwave committed 8d5894c6 on 11.x
    fix: #3549107 Escape or strip control characters in JSON:API
    
    By: dries...
longwave’s picture

Version: 11.x-dev » 10.6.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Backported down to 11.3.x as an eligible bug fix.

Committed and pushed 8d5894c66b9 to 11.x and e4057833893 to 11.3.x. Thanks!

This also seems worthwhile to backport to 10.6.x, but the MR doesn't apply due to test changes, so marking for backport if anyone wants to work on that.

scottatdrake’s picture

I will investigate a backport for 10.6.x.

scottatdrake’s picture

Status: Patch (to be ported) » Needs review

@longwave I have backported the fix down to 10.6.x in MR 14333. The tests passed.

I am not 110% confident I followed the correct procedure and would welcome any feedback you might have. I have changed this issue status back to "Needs Review" again.

Also, a fix for related issue https://www.drupal.org/project/drupal/issues/3549269 has been awaiting review since December 1. If anyone has time or attention to spare, I'd appreciate any feedback there, too.

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Seems like a fine backport.

  • longwave committed 88062f49 on 10.6.x
    fix: #3549107 Escape or strip control characters in JSON:API
    
    By: dries...
longwave’s picture

Status: Reviewed & tested by the community » Fixed

Also looks good to me!

Committed and pushed 88062f490a4 to 10.6.x. Thanks!

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.