While testing, I found that two nodes on a Drupal site that contain control characters (specifically backspace 0x08).
This caused a JSON:API parse to fail with this error message:
SyntaxError: Bad control character in string literal in JSON.
Per the JSON specification (RFC 8259, Section 7), JSON:API should escape or sanitize control characters to ensure valid JSON output.
Drupal is not currently escaping control characters in JSON:API output. Control characters have no purpose in web output, so we should consider stripping them.
Steps to reproduce
- Create content containing control characters (e.g., backspace, form feed, null byte).
- Request the content via JSON:API.
- Inspect the JSON output for unescaped control characters.
Issue fork drupal-3549107
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 10.6.x
changes, plain diff MR !14333
- 3549107-escape-control-chars
changes, plain diff MR !13910
Comments
Comment #2
scottatdrake commentedI took an initial look and want to report my findings for my future self or the next person who looks into this issue.
JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_AMP | JSON_HEX_QUOTA solution may be to modify
core/modules/jsonapi/src/Encoder/JsonEncoder.phpto override the constructor and add theJSON_INVALID_UTF8_SUBSTITUTEflag, like:Comment #3
scottatdrake commentedUpon a second look, I think moving the fix above (ie adding JSON_INVALID_UTF8_SUBSTITUTE ) to the parent encoder (core/modules/serialization/src/Encoder/JsonEncoder.php) may be the way to go so that REST JSON, AJAX, and JSON:API are all protected the same way via inheritance. However, this would broaden the scope of the issue.
Comment #4
bbralaI would probably go up one yeah, although it might be easier to start in jsonapi and then follow up the serializer above.
I will review and push through if we can get the work going here :)
Comment #5
scottatdrake commentedComment #7
scottatdrake commentedI've opened MR !13910 with the fix applied at the parent serialization encoder level. This protects REST JSON, AJAX, and JSON:API responses via inheritance.
The change adds `JSON_INVALID_UTF8_SUBSTITUTE` to the existing encoding flags while preserving the HTML-safety behavior. I've included unit tests for both the serialization and JSON:API encoders that verify:
I added "ufffd" to the cspell dictionary to get Spell-checking step in the pipeline to pass.
The merge request passes with warnings. I'm unsure if those are expected. They don't immediately seem related to my changes.
Ready for review when you have a chance.
Comment #8
bbralaThis looks good. I considered if this is breaking, but this is mostly unbreaking ;) Coverage looks good. We might want a change record but I'm leaning to no since this will not change output, even if developers did their own substitution.
Comment #11
longwaveBackported down to 11.3.x as an eligible bug fix.
Committed and pushed 8d5894c66b9 to 11.x and e4057833893 to 11.3.x. Thanks!
This also seems worthwhile to backport to 10.6.x, but the MR doesn't apply due to test changes, so marking for backport if anyone wants to work on that.
Comment #13
scottatdrake commentedI will investigate a backport for 10.6.x.
Comment #15
scottatdrake commented@longwave I have backported the fix down to 10.6.x in MR 14333. The tests passed.
I am not 110% confident I followed the correct procedure and would welcome any feedback you might have. I have changed this issue status back to "Needs Review" again.
Also, a fix for related issue https://www.drupal.org/project/drupal/issues/3549269 has been awaiting review since December 1. If anyone has time or attention to spare, I'd appreciate any feedback there, too.
Comment #16
smustgrave commentedSeems like a fine backport.
Comment #18
longwaveAlso looks good to me!
Committed and pushed 88062f490a4 to 10.6.x. Thanks!