Split from #479368: D7: Create RFC compliant HTML safe JSON as the issue is much less problematic on D7, and nobody gives a damn.

The problem: our JSON is not RFC 4627 compliant, which makes it unparsable by a large number of JSON parsers (.NET, iPhone, browsers and newer jQuery versions). The most common problem is that we escape characters with \xNN instead of \uNNNN.

Background:

We are dealing with a number of circumstances:

  • At least two different parsers may process the JSON (drupal_add_js setting):
    • HTML parser
    • JSON parser
  • Clients we have no control over need to interpret the JSON: jquery versions, iphone apps, .NET classes, world+dog

This means we need to produce RFC4627 compliant JSON that, at the same time, will not have certain 'special characters' such as ', ", <, > and & be interpreted by an HTML parser.

As RFC4627-2.5 clearly states that "Any character may be escaped", we can avoid special treatment of characters ', ", <, > and & by an HTML parser through simple substitution with a Unicode escape sequence (\uXXXX).

A number of characters MUST be escaped for the JSON parser. These are:

  • "
  • \
  • U+0000 - U+001F

For a number of characters (eg "), it is possible to choose the escape form and either precede them with a backslash (JSON Escape Sequence, eg \"), or use the \uXXXX (Unicode escape sequence, eg \u0027) form. HOWEVER, because we also need to deal with the HTML parser, _it_ may interpret the quote before the the JSON parser even has the chance to run. Because of this, it is advisable to use the Unicode escape sequence (\uXXXX) here as well.

Right now, drupal_to_js is not RFC4627 compliant. This causes problems with a number of non-core JSON consumers.

Violations:

  • Control characters such as U+001F are not escaped
  • It uses the undefined escape sequence \xXX, which only works on certain JSON implementations (likely, only JS eval)
  • It improperly uses the non-existing JSON escape sequence \0 for the 0 byte instead of the Unicode escape sequence \u0000

Additional problem:

  • It risks interpretation of certain characters as special by the HTML parser due to its use of addslashes (\")

Now, enough about the problems of the current implementation, on to the patch.

It correctly escapes U+0000 - U+001F (Ascii 0 - 31) and the potential special HTML characters.

AFAIK we do not have to escape /, U+2028 and U+2029 according to the specs, but naive JSON parsers (eval) may have trouble with it.

As there's no harm in escaping these characters, I've kept them in the revised patch. All this reroll did was to expand the U+0000 - U+001f range and add comments.

Files: 
CommentFileSizeAuthor
#104 drupal6-json_encode_fix-1086098-105.patch3.97 KBjenlampton
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]
#95 json-encode-fix-1086098-95.patch3.19 KBmdupont
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]
#90 json-encode-fix-1086098-90.patch3.62 KBmdupont
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]
#84 json-encode-fix-1086098-84.patch3.69 KBmdupont
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]
#62 drupal-json_encode_fix-1086098-62.patch566 bytesyannickoo
FAILED: [[SimpleTest]]: [MySQL] Invalid patch format in drupal-json_encode_fix-1086098-62.patch.
[ View ]
#58 json-encode-fix-1086098-58.patch3.47 KBmdupont
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]
#57 json-encode-fix-1086098-57.patch3.47 KBmdupont
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]
#54 json-encode-fix-1086098-52.patch553 bytesmdupont
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]
#49 json-encode-fix-1086098-49.patch585 bytesmdupont
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch json-encode-fix-1086098-49.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]
#45 json-encode-fix-1086098-45.patch488 bytesyannickoo
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch json-encode-fix-1086098-45.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]
#43 json-encode-fix-1086098-43.patch566 bytesyannickoo
FAILED: [[SimpleTest]]: [MySQL] Invalid patch format in json-encode-fix-1086098-43.patch.
[ View ]
#1 drupal_to_js_proper_json-D6.patch2.6 KBHeine
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch drupal_to_js_proper_json-D6_0.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

Comments

Heine’s picture

Status:Active» Needs review
StatusFileSize
new2.6 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch drupal_to_js_proper_json-D6_0.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

Note: all subscribes from non-core maintainers will be unpublished.

klonos’s picture

Thanx Heine!

Edit: sorry just saw that this is a core maintainer only topic. If I my post is not unpublished, let me know how I can help with testing or otherwise.

Heine’s picture

Only "subscribes" :)

Testing / reviewing the patch against the restrictions laid out in the RFC or using it with another parser that gave you trouble before is very much appreciated.

EugenMayer’s picture

Having JSON validation issues on a ~1kb json result validated ( or rather not validated ) using http://www.jsonlint.com/ with the unpatched version.

Applying the patch of #1 Heine, validation was successful. I expect the json request to cover all the freaking characters. Its a office document converted to HTML ( german ones, so also special chars ).

I tend to say the patch works great :)

fabianderijk’s picture

subscribe

Dave-B’s picture

subscribe

I've just applied the patch against D6.20, and it fixed (per http://www.jsonlint.com/) the JSON validation problem I had found with an exposed filter in Views.

mikeytown2’s picture

Is there a reason we are using strtr instead of str_replace? According to this benchmark using str_replace for arrays is quicker http://www.cznp.com/blog/3/strtr-vs-str_replace-a-battle-for-speed-and-d... and in D7 we now use str_replace http://api.drupal.org/api/drupal/includes--common.inc/function/drupal_js...

By the way I've incorporated this patch into advagg http://drupalcode.org/project/advagg.git/commit/db78495 but I use the PHP json_encode function if it exists.

EugenMayer’s picture

@7 Heine told me, that also json_encode in php5 has bugs in its encoding.

buzzman’s picture

I read the entire issue over @ http://drupal.org/node/479368

I notice in comment #129 brad.bulger suggested a workaround to Views Ajax filtering in a D6 site breaking AFTER applying this Patch. Reproducing below for quick ref:

He's essentially set the following 2 lines in the Patch to blanks:

        // While these are allowed unescaped according to ECMA-262, section
        // 15.12.2, they cause problems in some JSON parsers.
        "\xe2\x80\xa8" => '\u2028', // U+2028, Line Separator.
        "\xe2\x80\xa9" => '\u2029', // U+2029, Paragraph Separator.

- Perhaps Brad or anyone else here could add feedback on whether that still applies considering #6 above tested a problem with an "exposed filter in Views" using www.jsonlint.com and it passed there? So does Brad's fix need to go into this current Patch?

- Also per #7 above, should "strtr" still be used instead of "str_replace"?

Big thanks to Heine for all the gr8 work to resolve this frustrating issue :-)

Cheers.

Heine’s picture

What we use in D7 is not an indication for what we should use here; there's a patch pending for D7 as well.

Microbenchmarking strtr vs str_replace makes it clear that strtr is faster on short subject strings up to about 155 bytes. For strings with a length comparable to usernames it is about 2.5x faster. str_replace still takes only about 10 microseconds (vs 4 microseconds).

The performance of strtr gets progressively worse with increasing string lenght; For long strings (12348 bytes) strtr is about 2.8x slower: about 574 microseconds vs 205 microseconds.

The advantage of strtr over str_replace is that

  • the replacement array is more maintainable - though we could use array_keys / array_values on it to generate static from & to arrays for str_replace
  • there's no chance a replacement happens twice - though careful ordering of the replacement arrays can mitigate this for str_replace; \ must be replaced first

This clearly needs some highlevel benchmarking on 'typical' workloads (AHAH requests and JS settings).

mikeytown2’s picture

A good place to bench mark this is on a page that is used to build a feature: admin/build/features/create it has a massive js setting array.

mikeytown2’s picture

What happened to the replacement pair for the Solidus? It's mentioned in the comments but I don't see it. Found the last D7 patch for this issue.

Another thing to benchmark: json_encode if not php5.3 and then doing str_replace to do the Unicode escape sequences vs switch (gettype($var)) {...}

Heine’s picture

We cannot do that because significant quotes in JSON should not be escaped.

+        // Prevent browsers from interpreting the solidus as special and
+        // non-compliant JSON parsers from interpreting // as a comment.
+        '/' => '\u002F',
mfer’s picture

Are there any libraries (as an example or possible point of integration) we can look at?

mfer’s picture

From a pure code style point of view the only thing I noticed was:

@@ -2523,6 +2574,7 @@ function drupal_to_js($var) {
   }
}

+
/**

Notice the extra line being added. This shouldn't be there.

After reading the spec everything looks right and the patch applies against the 6.x branch. Does anyone have a code snippet that can be used to create failing JSON before and working JSON after. I'd like to test this.

EvanDonovan’s picture

Have noticed this in the past using a library called xd.js (Facebook-style cross-domain receiver for Javascript). Will try to test sometime in the next few weeks.

EugenMayer’s picture

@ #15:
I actually had broken json code, (not) validated by the json validator. Applying the pachfixed it. We are talking about a huge json result, arround severa KB, with a _lot_ of variations, umlauts and all this ( its actually a word-document as HTML ). So i expect this path to be pretty robust.

Using this patch on a drupal installation with arround ~130 modules, actually several of those. We are using json extensivly and it all works with this patch.

so for my taste, this patch should ge applied ASAP, no matter the performance-analysis results. To be honest, bootstrapping the full drupal core with using index.php is the real performance hit here, with a huge factor compared to strstr or str_replace. using a light-bootstrap has an enormous effect, so rather optimze at this point :)

mfer’s picture

I understand D7 does this... but why do we have

+        // Prevent browsers from interpreting these as as special.
+        "'" => '\u0027',
+        '<' => '\u003C',
+        '>' => '\u003E',
+        '&' => '\u0026',

The <, >, and & specifically.

Bevan’s picture

Priority:Normal» Major
Status:Needs review» Needs work

I tested this using the following code in a temporary php file with drush's php-script command;

<?php
// Declare an object with different types.
// Do not include an associative array since JSON can not differentiate an associative array from an object.
$object = (Object) array(
 
'integer' => 1234,
 
'float' => 999999.99999999999,
 
'string' => 'foobar',
 
'string_with_quotes' => 'World\'s "Hello"',
 
'array' => array('hi', 'there', 3, 4),
 
'boolean_false' => FALSE,
 
'boolean_true' => TRUE,
);

// Encode it with each.
$drupal_to_js = drupal_to_js($object);
$json_encode = json_encode($object);

// Check if the encoded strings match.
$encodes_match = $json_encode == $drupal_to_js;

// Decode each one.
$drupal_to_js_decoded = json_decode($drupal_to_js);
$json_encode_decoded = json_decode($json_encode);

// Check if the decoded object matches the original object before encoding.
$drupal_to_js_match = $object == $drupal_to_js_decoded;
$json_encode_match = $object == $json_encode_decoded;

// Compare and export the results.
var_export(compact('encodes_match', 'drupal_to_js_match', 'json_encode_match'));
?>

Without the patch the result is;

<?php
array (
 
'encodes_match' => false,
 
'drupal_to_js_match' => false,
 
'json_encode_match' => true,
)
?>

This confirms that drupal_to_js() is broken.

With the patch, the result is;

<?php
array (
 
'encodes_match' => false,
 
'drupal_to_js_match' => true,
 
'json_encode_match' => true,
)
?>

This confirms that the patch fixes drupal_to_js() at least to a point where json_decode() can parse it.

Note however that in both cases $encodes_match is false. On closer inspection the difference is in the handling of double and single quote characters in string values (as well as space characters);

(Note that var_export() escaped backslashes and single-quotes, so there is some double-escaping in the CLI output below.)

<?php
array (
 
'drupal_to_js' => '{ "integer": 1234, "float": 1000000, "string": "foobar", "string_with_quotes": "World\\u0027s \\u0022Hello\\u0022", "array": [ "hi", "there", 3, 4 ], "boolean_false": false, "boolean_true": true }',
 
'json_encode' => '{"integer":1234,"float":1000000,"string":"foobar","string_with_quotes":"World\'s "Hello"","array":["hi","there",3,4],"boolean_false":false,"boolean_true":true}',
 
'encodes_match' => false,
 
'drupal_to_js_match' => true,
 
'json_encode_match' => true,
)
?>

drupal_to_js() encodes single and double quotes to \u0027 and \u0022 respectively. json_encode() escapes double quotes with a backslash and leaves single quotes, as they don't need to be escaped.

Although PHP's json_decode() does not care, there is at least one system which fails to parse \u0027 in JSON strings; Redmine's REST API. (You can test it with the 6.x-2.x branch of the Drupal Redmine API module if you wish. The test to update a redmine issue fails if rest_api_query::execute() is reverted back to use the patched drupal_to_js() instead of json_encode().)

What exactly is the problem with PHP's json_encode() function? If the only problem is that it is only supported in PHP versions 5.2+, then perhaps we can workaround this bug with something like this at the top of drupal_to_json();

<?php
if (function_exists('json_encode')) {
  return
json_encode($var);
}
?>

This is consistent with Drupal's recommended system requirements for Drupal 6 to use PHP 5.2, and does not break Drupal 6 on older versions of PHP any more than it is already broken (and has been for a long time).

If that is not acceptable, perhaps we should look at PHP's implementation of json_encode and copy/mimick that?

mikeytown2’s picture

@Bevan
In terms of php versions, this is how I currently do json encoding in advagg: advagg_drupal_to_js()

Heine’s picture

Status:Needs work» Closed (won't fix)

Gah.

We don't have to match arbitrarily produced JSON. We have to produce VALID JSON. \u0027 is valid. In fact, we could escape EVERY value to \uXXXX and have valid JSON.

Bevan’s picture

Heine;

Thanks for the clarification. In that case it would seem Redmine's REST API fails to parse JSON correctly.

What exactly is the problem with using json_decode() and/or an approach like advagg's?

Regardless of the answer to that. This patch is good; The JSON is valid and the patch is suitable for Drupal 6 and any version of PHP.

Why did you mark this as "Won't Fix"? This is still a major bug in Drupal 6 core's drupal_to_js() function, and this patch fixes it cleanly. I am marking RTBC, at least till we have an explanation as to why "Won't fix" might be appropriate, or if that was an error.

EugenMayer’s picture

Status:Closed (won't fix)» Reviewed & tested by the community

I confirm that withtout this patch, D6 drupal_to_js is not JSON valid.

mparker17’s picture

Subscribe

gateway69’s picture

any updates to this, I just ran into this for a game we are working some of our titles have ' or " , right now we are having to remove them to make the game work because its spitting out improper json data..

we have the latest 6.x drupal running.. ideas?

Heine’s picture

gatway69, what you see is what you get. Apply the patch if you need its functionality now.

vishun’s picture

Just wanted to confirm that this fixed the validity of JSON responses from a Services module API response as well.

Bevan’s picture

vishun; This is still a bug in Drupal 6. It has not been fixed.

vishun’s picture

Indeed, just meant to leave confirmation that this patch fixed the Services module issue I was experiencing on 6.22 with invalid JSON responses due to &, <, and >.

Bevan’s picture

Oh. I misunderstood your comment #27 as a question, not a statement. Ignore me. Sorry.

gateway69’s picture

Is this going to be pushed to core?

Bevan’s picture

If a core maintainer looks at it, I expect so. Go nudge them. ;)

gateway69’s picture

Looks up at the Core gods, here our cry! :)

gateway69’s picture

Some reason I cant get patch in #1 to work, im on pressflow 6.22 and my common.inc has the following code

function drupal_to_js($var) {
  // json_encode() does not escape <, > and &, so we do it with str_replace()
  return str_replace(array("<", ">", "&"), array('\u003c', '\u003e', '\u0026'), json_encode($var));
}

the function starts on line 2489 and ends at 2492

Im trying to manually patch it but it looks a bit differnt than mine, can anyone post the full function ?

MustangGB’s picture

Tested (but not reviewed) and clears up all the errors I was experiencing with jQuery 1.6.1

gateway69’s picture

Question here to all, are you using the patch created or the code that advaggs posted? Whats the major difference?

mikeytown2’s picture

For the curious here is advagg_drupal_to_js() and a link to #1 that contains the patch.

MustangGB’s picture

Of course @mikeytown2's advagg work is massively appreciated, but the project I was working on just didn't have the need for it, so I used only the patch from #1

aliyayasir’s picture

you can try this inside your module. This works for me.

<?php
  
return drupal_json(utf8_encode($output));
?>

i am also using this in my javascript to decode UTF8

http://snippets.dzone.com/posts/show/5294

MrMaksimize’s picture

Hi all,

Just so everyone knows, i put in a patch for jquery_update here. and the patch from #1 is required for it to run in d6 core. However, d6 pressflow does work without any core modifications

EugenMayer’s picture

pressflow uses json_decode / json_encode, as they stopeed 4.x support - they do this out of performance reasons AFAIR.

sin’s picture

#1 patch works to fix Hierarchical Select "Received an invalid response from the server." error with recent jQuery. Tnx!

yannickoo’s picture

StatusFileSize
new566 bytes
FAILED: [[SimpleTest]]: [MySQL] Invalid patch format in json-encode-fix-1086098-43.patch.
[ View ]

Why not just using the attached patch? Works for PHP >= 5.2.1

Status:Reviewed & tested by the community» Needs work

The last submitted patch, json-encode-fix-1086098-43.patch, failed testing.

yannickoo’s picture

Status:Needs work» Needs review
StatusFileSize
new488 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch json-encode-fix-1086098-45.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

Status:Needs review» Needs work

The last submitted patch, json-encode-fix-1086098-45.patch, failed testing.

yannickoo’s picture

Can anybody tell me why my last patch failed testing?

klonos’s picture

Seems obvious to me from the testbot message:

Unable to apply patch json-encode-fix-1086098-45.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.

Take a look here:

...
Note 2: If you choose to create patches with a tool other than Git, be sure to produce a -p1 patch; the old -p0 format was phased out in 2011
...

mdupont’s picture

StatusFileSize
new585 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch json-encode-fix-1086098-49.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

Rerolling #45 patch.

mikeytown2’s picture

Status:Needs work» Needs review

Status:Needs review» Needs work

The last submitted patch, json-encode-fix-1086098-49.patch, failed testing.

klonos’s picture

The patch in #49 seems to be in the right format (-p1). I don't know what might be wrong.

dman’s picture

Patch #45 started like this

--- /Users/yannick/Downloads/drupal-6.22/includes/common.inc
+++ /Users/yannick/Desktop/common_patched.inc

... It was not built relative to drupal root.

Patch #49 wasn't built relative to Drupal root either.

--- a/website/includes/common.inc
+++ b/website/includes/common.inc

A working patch will be expected to start like:

--- a/includes/common.inc
+++ b/includes/common.inc
mdupont’s picture

StatusFileSize
new553 bytes
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]

This one should pass. As a reminder, it is a copy of the implementation currently in use in Pressflow.

mdupont’s picture

Status:Needs work» Needs review
dman’s picture

Yay :)
Good one, mdupont

mdupont’s picture

StatusFileSize
new3.47 KB
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]

This is another version of the patch, using the more complete approach developed by mikeytown2 in advagg module (see #20). It generates valid json even for PHP before version 5.2.0. The only thing I changed from advagg code are some comments.

mdupont’s picture

StatusFileSize
new3.47 KB
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]

Updating comments to follow more closely the Druapl 7 version committed in #479368: D7: Create RFC compliant HTML safe JSON. The code is almost identical.

gateway69’s picture

be nice if this was included into pressflow as well. ?

mdupont’s picture

I don't think Pressflow supports PHP < 5.2.0, that would make the fallback part irrelevant. However using json_encode() options if PHP >= 5.3.0 may be faster than always using the str_replace() + json_encode() solution.

EugenMayer’s picture

@ #60 that is a solution for D7 but not for D6 ( as its has 5.2 support ). So we have to stick to the manual way for the "non pressflow" version of D6 .. which should still be valid.

Anyhow this issue becomes a useless discussion anyway, due to D6 being dead as it can be. People should just start patching there core manually ( thats what we do, as we are stuck with D6 ). Waiting for D6 non security patches to be part of a D6 release is rather irrealistic :) (just my opinion)

yannickoo’s picture

Status:Needs work» Needs review
StatusFileSize
new566 bytes
FAILED: [[SimpleTest]]: [MySQL] Invalid patch format in drupal-json_encode_fix-1086098-62.patch.
[ View ]

Why we are not using this patch here? Oh, this was added in the patch above.

Status:Needs review» Needs work

The last submitted patch, drupal-json_encode_fix-1086098-62.patch, failed testing.

yannickoo’s picture

donquixote’s picture

khosman’s picture

subscribe

mdupont’s picture

Patch in #58 uses code from Drupal 7 and Pressflow 6 to cover all use cases:
- PHP >= 5.3: use json_encode() with options
- PHP < 5.3 and json_encode() is available: use it with additional escaping (Pressflow code)
- PHP < 5.3 and no json_encode(): fallback to manual encoding (Drupal 7 code with proper \uXXXX escaping)

klonos’s picture

...I'm not qualified to give this a proper review, but this logic seems perfect to me. All possible cases are covered.

mattconnolly’s picture

If you update to a later version of jQuery (I tried current 1.7) then all AHAH functionality breaks because of the non-conformant JSON. In particular, the '\x' escaping is a problem.

+1 for this needs to be fixed.

lawik’s picture

What can we do to aid in this. I have iOS client libraries that have been choking on the JSON. We have workarounds, but they are rather awful.

I'm not well read on the JSON RFC so I doubt my review would give anything. But if there's something needed I'd like to help.

Following this.

mikeytown2’s picture

Status:Needs review» Reviewed & tested by the community

#58 is RTBC. It's been tested in D7 & AdvAgg.

klonos’s picture

Yeah but this issue is against 6.x. #479368: D7: Create RFC compliant HTML safe JSON is for D7.

mikeytown2’s picture

#58 is a patch for D6

klonos’s picture

I know, but (emphasis is mine):

It's been tested in D7 & AdvAgg.

???

klonos’s picture

...let me clarify this: Sure, the D7 equivalent of the patch on #58 was tested and was RTBC'ed in D7. That doesn't mean that its D6 version automatically "inherits" its RTBC status. That's no reason to RTBC something - it should be tested in D6. Specifically, we need more people to confirm that it doesn't break (other) things for them in various environments (php versions for example). That'd qualify it as RTBC.

mikeytown2’s picture

I would also add that most of what this code does is already in pressflow. I don't know how to test this even more. If this isn't RTBC I don't know what to change if we want RFC compliant JSON.

EugenMayer’s picture

@76: D6 is basically dead is it can be. There is no sense in really waiting for fixes which are not-sec fixes anymore, at least thats what we do. This issue is open for "ages" and nobody took any actions on releasing this - i dont think this will change.

People should just go forward and "core hack" fixes. Our list of non-yet-applied core-fixes is way to long to even wait actively to see it ever getting smaller :)

Most of us, at least those who are able, should move to D7. It seems we dont have the bandwith, as a community, to maintain 2 core at the same time.

Bevan’s picture

Confirming RTBC: I have been using this patch on D6 sites and it works fine.

mdupont’s picture

D6 is still far from EOL, and this bug is hitting more and more people that are using modern JSON parsers. It is important a fix is committed.

Gábor Hojtsy’s picture

Status:Reviewed & tested by the community» Needs review

@EugenMayer: the way I'm seeing it, we lack comprehensive testing power to verify Drupal 6 patches. Some people report some success in their environments, but its more often than not hard to gauge if a certain patch was well tested. Drupal 6 does not have any automated tests and relies on crowdsourced testing. With people starting any new Drupal projects on Drupal 7, there is almost nobody testing the development version of Drupal 6 and the chance of errors with fixes is high. Look at the last Drupal 6 release that only fixed two issues *introduced with the previous version*.

Now for this concrete issue:

1. The coding style in the latest patch does not match Drupal. It is surprising it had been around in RTBC for this long.
2. It is not clear if people commenting after that tested the latest patch or #58.
3. The code in #58 is somewhat similar, but is sufficiently different from the D7 code (http://api.drupal.org/api/drupal/includes%21common.inc/function/drupal_j...) which does not use str_replace() for example in any condition. Saying the code was tested in D7 therefore does not stand.

As with many Drupal 6 issues it is very hard to tell who tested which patch and therefore which patch can be relatively trusted for review.

mdupont’s picture

@Gábor : the patch in #58 takes the str_replace fallback code from D7's drupal_json_encode_helper(), so we can be quite confident it works for PHP versions that don't have json_encode().

donquixote’s picture

I do still have a few D6 sites to maintain.
I am more likely to test this, if someone can explain in the issue summary what exactly needs to be tested.

EugenMayer’s picture

@Gabor, read you.

Iam using the patch of 58@ for a long time already and we are doing extremly much with ajax, also pretty huge things and it works, yet, flawlessly. A D6 core without this patch, though, is completly useless ( for us ).

So even if there might be a edge-case, the current status is a lot worse.

mdupont’s picture

StatusFileSize
new3.69 KB
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]

This patch is #58 with some code style fixes. Please review and test.

There are 3 ways to encode data in JSON with this patch:

  1. With PHP 5.3+, it uses json_encode() with additional options. This bit of code is taken verbatim from D7's drupal_json_encode().
  2. With PHP 5.2.x or PECL json, it uses json_encode() which doesn't support options, so str_replace() is used to escape some characters. This bit of code is taken from Pressflow 6's common.inc and a little customized for code readability and for escaping apostrophes and quotes as well, to behave exactly the same than the PHP 5.3 version.
  3. Without json_encode(), the encoding is completely emulated. This bit of code is taken verbatim (apart from function names) from D7's drupal_json_encode_helper().

In short, every piece of this code has already been deployed and tested in either Pressflow 6 or D7, so there's little chance of a serious bug.

lawik’s picture

I can't claim I have reviewed the patch in #84 but I did just apply it to a problematic installation using json_server and it cleared up the problem (unescaped or badly escaped HTML) our JSON parsers were choking on. So I like it and hope it gets in sometime soon. I still have plenty of projects running Drupal 6 and there are still every now and then reason to still use it :)

Update: Instead of posting another comment.
Applied this on another problematic JSON Server Services2-based installation. Fixed it right up. I'll report if I experience problems stemming from this patch.

mdupont’s picture

One good way to test this patch is to make an AJAX request with a recent jquery version (1.4.2 or later IIRC) to retrieve JSON output from a Drupal 6 installation. Without this patch, it should break on special characters. With it, it should work.

You can also try under Firefox by calling a Drupal path that will produce JSON output of a text containing U+2028 (Line Separator) or U+2029 (Paragraph separator) .

krishworks’s picture

looking forward to have this patch committed. see my related issue with heartbeat module (http://drupal.org/node/1626882) because of the problem described in this issue.

deekayen’s picture

Status:Needs review» Needs work

Changing status to reflect a problem with #84. When I did update.php from core 6.22 to 6.26, the batch api UI bar wouldn't load. I've had better luck with #58.

matkeane’s picture

Similar problem here. While patch #86 has been running flawlessly for me on one site on a server with PHP 5.3, when I updated a different site (on a server with PHP 5.2.17) to Drupal 6.26, all the Javascript elements (Google Maps, FCKeditor) died due to JS errors with the encoding of the settings in the page header. Firefox error console shows:

Error: SyntaxError: invalid property id
Line: 305, Column: 32
Source Code:
jQuery.extend(Drupal.settings, {\u0022basePath\u0022:\u0022\/\u0022,\u0022googleanalytics\u0022: [etc...]

I've rolled back to the default Drupal 6.26 common.inc file and all is well again.

mdupont’s picture

Status:Needs work» Needs review
StatusFileSize
new3.62 KB
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]

Updated the patch not to encode single and double quotes. It is what is breaking Drupal.settings encoding.

These are encoded in D7 so I thought it would be a good idea to do the same on D6. I stand corrected. I reverted the behavior to be the same than the original D6 and Pressflow code: escape "<", ">" and "&", but don't escape quotes.

matkeane’s picture

OK, I've just tested patch #90 on my site running D6.26 and PHP 5.2.17 and Javascript elements are appearing as expected. Thanks for that!

Another site, also running 6.26 but under PHP 5.3.3, is running fine with the patch in #84 though!

Maedi’s picture

Path #90 works well for my D6.26 too, commit this baby to core?

mdupont’s picture

Status:Needs review» Reviewed & tested by the community

As per #91 and #92.

Heine’s picture

#84, bullet 2 is incorrect. If json_encode doesn't support the desired escaping strategy, just use the custom PHP code implementation, don't replace the significant quotes.

mdupont’s picture

StatusFileSize
new3.19 KB
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]

Edited the patch so the custom PHP implementation is used if json_encode() doesn't support the desired escaping strategy.

matkeane’s picture

With patch #95 and PHP 5.2.6, Javascript settings are being encoded as:
jQuery.extend(Drupal.settings, { "basePath": "\u002F", "query": "node\u002F132\u002Fedit", etc...

but with patch #90, they were showing as:
jQuery.extend(Drupal.settings, {"basePath":"\/", "query":"node\/132\/edit", etc

I'm not seeing the same errors as with patch #84 (no JS console errors in Firefox, and the text editor is working OK), so I was just wondering whether the encoding of the slashes in the path was intended and/or necessary.

mdupont’s picture

Status:Reviewed & tested by the community» Needs review

Patch #95 dropped support for json_encode() in PHP 5.2, following Heine's advice and sticking more closely to how JSON encoding is done in D7. I think that's why you are seeing a difference in the output with PHP 5.2.

The encoding of slashes is intended and is done when using the fallback encoding method. You can also see the same encoding applied in http://api.drupal.org/api/drupal/includes%21json-encode.inc/function/dru...

<?php
       
// Prevent browsers from interpreting the solidus as special and
        // non-compliant JSON parsers from interpreting // as a comment.
       
'/' => '\u002F',
?>

However what you noticed is interesting: when using native json_encode(), slashes are escaped as "\/" whereas we are encoding it as '\u002F'. I've reproduced the same behavior on a D7 site running on PHP 5.3.8. Both are valid JSON output and do not seem to create any problem, even if there is no point not to do the same thing than json_encode(). Anyway, if ever there is an issue with this, it should be fixed in D7 first, then backported in D6.

Note: forgot to switch back to Needs Review after posting #95.

matkeane’s picture

Thanks for the explanation! I was curious as to why the output was different, but the important thing is that things are working OK with PHP 5.2. Haven't had a chance yet to retest on the server with 5.3, but the patch shouldn't have changed anything there, so patch #95 looks good to me.

EugenMayer’s picture

iam using it with PHP 5.3.4 and 5.3.19 - it works perfectly. And yes, there is quiet a difference between those to for drupal (6), even a pretty significant one

sp3boy’s picture

I am trying #95 on PHP 5.2.13 and it is not working. I'm still investigating but it appears to give problems with the eu-cookie-compliance module, which puts HTML into its js settings output. Double quotes around attribute declarations in the HTML are not being encoded so breaking the json structure. Re-introducing encoding of double quote to \u0022 eliminates the javascript errors I'm seeing.

c960657’s picture

Status:Needs review» Needs work

I can confirm comment #100 that the patch in #95 is broken.

Updated the patch not to encode single and double quotes. It is what is breaking Drupal.settings encoding.

No, AFAICT the problem with the patch in #84 is the following piece of code that escapes the output of json_encode(), including the enclosing pair of quotes.

  if (!isset($json_encode)) {
    $json_encode = function_exists('json_encode');
  }

  if ($json_encode) {
    $replace_pairs = array(
      '<' => '\u003C',
      '>' => '\u003E',
      "'" => '\u0027',
      '&' => '\u0026',
      '"' => '\u0022',
    );
    return strtr(json_encode($var), $replace_pairs);
  }

So I think it was a wrong conclusion that we should not escape quotes. I think we should do like in D7, i.e. we should avoid introducing a new (third) code path using the function_exists('json_encode') call, and we should keep the following lines from the D7 version:

        // ", \ and U+0000 - U+001F must be escaped according to RFC 4627.
        '\\' => '\u005C',
        '"' => '\u0022',
Heine’s picture

So, what was wrong with the earlier patches? Why the need to keep using json_encode on php versions where it is not appropriate?

c960657’s picture

So, what was wrong with the earlier patches?

The patch in #54 is a copy of that in Pressflow but was rejected because it requires PHP 5.

Later patches include a check for function_exists('json_encode')). That is not included in the D7 version, and the corresponding logic wasn't completely right. I don't know why this was introduced in the first place. I checked a few of the preliminary patches in the D7 ticket (#479368: D7: Create RFC compliant HTML safe JSON), but it wasn't used there.

The D7 approach has two code paths: one for PHP 5.3 and one for earlier versions. AFAICT the latter is also compatible with PHP 4, so I don't see why we shouldn't just make the solution in D6 identical to that in D7.

jenlampton’s picture

Status:Needs work» Needs review
StatusFileSize
new3.97 KB
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es).
[ View ]

So it looks like the closest patch (the one we should start from) was the one in #58.

I've removed the check for function_exists('json_encode')) and made the solution look a lot more like what Drupal 7 is doing.

The attached patchsolves the problems with jQuery Update (1.4+) and views ajax handlers on my site (which was how I got here) but needs further review.

c960657’s picture

Thanks for picking this up.

I manually compared the code to the D7 counterpart, and did some rudimentary testing (on a site where all body texts are passed through drupal_to_js()). So far everything looks fine, except one nit:

+  // json_encode() escapes <, >, ', &, and " using its options parameter, but
+  // does not support this parameter prior to PHP 5.3.0. Use str_replace to do
+  // it.

The comment mentions str_replace(), but in fact strtr() is used.

milovan’s picture

All these patches are breaking for example module webfm.

How to reproduce:
- setup wysiwyg module with ckeditor, and turn webfm button in ckeditor on
- click on image icon, then in popup click on Browse server button
- webfm popup will open up
- upload image through webfm popup
- right click on image and choose "Send to rich text editor"
- file path will be copied to image popup (where Browse server button is) BUT with incorrect links:

\/webfm_send\/3

or

\u002Fwebfm_send\u002F3

instead:

/webfm_send/3

Without this patch, all works as expected.

debaryadas’s picture

Can anyone suggest me which patch among the above lists of patches I should use if the output string contain special characters ', ", & only with alphabets and numbers. It seems that every patches has its own advantages and disadvantages.