Problem/Motivation

Stumbled upon this in my travels at http://stackoverflow.com/a/7723730/80281

http://nathanjbrauer.com/playground/serialize-vs-json.php

serialize() w/ md5() took: 0.27976584434509 sec
json_encode() w/ md5() took: 0.071495056152344 sec

json_encode is 391.3% faster with a difference of 0.20827078819275 seconds

Proposed resolution

Consider replacing the md5(serialize()) hashing with md5(json_encode()).

Remaining tasks

User interface changes

API changes

Comments

joelpittet’s picture

Status: Active » Needs review
Attached file (status: new), size: 3.36 KB

Status: Needs review » Needs work

The last submitted patch, 1: 200_improvement_with-2503261-1.patch, failed testing.

joelpittet’s picture

Version: 7.x-2.x-dev » 7.x-2.10
Status: Needs work » Needs review

Things must have changed... let's test against stable.

mikeytown2’s picture

According to this, json_encode() is faster once you're dealing with a LARGE amount of data (70+ MB):
http://techblog.procurios.nl/k/n618/news/view/34972/14863/cache-a-large-...

All said and done, we're talking about a sub-millisecond improvement here, if json_encode is indeed faster.

In this case I'll need some proof that json_encode is faster than serialize inside advagg.

Status: Needs review » Needs work

The last submitted patch, 1: 200_improvement_with-2503261-1.patch, failed testing.

mikeytown2’s picture

If you flip json and serialize around so json goes first, serialize ends up being faster:


//The json test
$b4_j = microtime(1);
for ($i=0;$i<10000;$i++) {
    $serial = md5(json_encode($array));
}

//The serialize test
$b4_s = microtime(1);
for ($i=0;$i<10000;$i++) {
    $serial = md5(serialize($array));
}
echo 'serialize() w/ md5() took: '.($sTime = microtime(1)-$b4_s).' sec<br/>';


echo 'json_encode() w/ md5() took: '.($jTime = microtime(1)-$b4_j).' sec<br/><br/>';
echo 'json_encode is <strong>'.( round(($sTime/$jTime)*100,1) ).'%</strong> faster with a difference of <strong>'.($sTime-$jTime).' seconds</strong>';

Running each one by itself is a better way to test. With that, I get that json_encode takes half the time: an improvement, but not as big as what was reported.

mikeytown2’s picture

Title: 200%+ improvement with json_encode over serialize() » Micro Optimization (sub millisecond improvement): json_encode() is faster than serialize() for hashing

joelpittet’s picture

Status: Needs work » Closed (works as designed)

@mikeytown2 Sorry, you are totally right: reversing the order makes that change moot! Thanks for showing me!

Here are my test results, which seem more consistent and show json_encode slower!

PHP 5.5 + Opcache

Serialize Run First

serialize(): 96 ms
json_encode(): 235 ms

serialize(): 105 ms
json_encode(): 249 ms

JSON Run First

serialize(): 100 ms
json_encode(): 202 ms

serialize(): 95 ms
json_encode(): 218 ms

json_encode is now consistently 100-150 ms slower, regardless of the order in which it's run.


// The json_encode test.
function test_json($data) {
  $start = microtime(TRUE);
  for ($i = 0; $i < 10000; $i++) {
    $serial = md5(json_encode($data));
  }
  return microtime(TRUE) - $start;
}

// The serialize test.
function test_serialize($data) {
  $start = microtime(TRUE);
  for ($i = 0; $i < 10000; $i++) {
    $serial = md5(serialize($data));
  }
  return microtime(TRUE) - $start;
}

// Format results.
function print_results($sTime, $jTime) {
  $sTime = ($sTime * 1000);
  $jTime = ($jTime * 1000);
  echo 'serialize(): ' . round($sTime) . ' ms' . "\n";
  echo 'json_encode(): ' . round($jTime) . ' ms' . "\n";
  echo 'Difference between serialize() and json_encode() ' . round($sTime - $jTime) . ' ms ' . "\n\n";
}

$sTime = test_json($data);
$jTime = test_serialize($data);

$j2Time = test_serialize($data);
$s2Time = test_json($data);

print_results($sTime, $jTime);
print_results($s2Time, $j2Time);

nathanbrauer’s picture

Both of your tests have errors.

@mikeytown2's calculates the end time for json_encode after BOTH json_encode and serialize have run.
Here is the corrected code: http://nathanbrauer.com/playground/plain-text/json-vs-serialize.php
In action: http://nathanbrauer.com/playground/json-vs-serialize.php

@joelpittet's saves test_json to $sTime and test_serialize to $jTime (backwards).
Here is the corrected code: http://nathanjbrauer.com/playground/plain-text/drupal-calculation.php
In action: http://nathanjbrauer.com/playground/drupal-calculation.php

:)
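
For readers following along, here is a minimal sketch of a harness that avoids both mistakes described above: each timer stops right after its own loop, and each result is keyed by the function name so the labels cannot be swapped. It is not the corrected code from the links; the time_encoder() helper and the sample $data payload are assumptions made up for illustration.

```php
<?php
// Hypothetical sample payload; the thread never shows the contents of $data.
$data = array('css' => array('a.css', 'b.css'), 'js' => array('c.js'), 'weight' => 10);

/**
 * Times $iterations rounds of md5($encoder($data)); returns seconds elapsed.
 */
function time_encoder($encoder, $data, $iterations = 10000) {
  $start = microtime(TRUE);
  for ($i = 0; $i < $iterations; $i++) {
    $hash = md5(call_user_func($encoder, $data));
  }
  // Stop the clock here, before any other test has a chance to run.
  return microtime(TRUE) - $start;
}

// Key each result by the function name so the labels cannot be mixed up.
$results = array();
foreach (array('serialize', 'json_encode') as $encoder) {
  $results[$encoder] = time_encoder($encoder, $data);
}

foreach ($results as $encoder => $seconds) {
  echo $encoder . '(): ' . round($seconds * 1000) . ' ms' . "\n";
}
```

Both tests still run in the same process here, so the run-order effect discussed earlier in the thread may still influence the numbers.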

mikeytown2’s picture

What happens when you just do json_encode for one request and serialize for another? Don't run them back to back in the same script. When doing this I found that json_encode is only 50% faster (see the bottom of #7).
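
One way to run each function in its own process is a small command-line script that benchmarks only the encoder named in its first argument. This is a sketch under assumptions: the file name bench.php and the sample payload are hypothetical, and the results will depend on the data and the PHP version.

```php
<?php
// Usage (hypothetical file name), one encoder per process:
//   php bench.php serialize
//   php bench.php json_encode
$allowed = array('serialize', 'json_encode');
$encoder = (isset($argv[1]) && in_array($argv[1], $allowed, TRUE)) ? $argv[1] : 'serialize';

// Hypothetical sample payload; substitute the data you actually hash.
$data = array_fill(0, 50, array('name' => 'example.css', 'weight' => 10));

$start = microtime(TRUE);
for ($i = 0; $i < 10000; $i++) {
  // Variable function call: invokes serialize() or json_encode().
  $hash = md5($encoder($data));
}
$elapsed = microtime(TRUE) - $start;

echo $encoder . '(): ' . round($elapsed * 1000) . ' ms' . "\n";
```

Running the script several times per encoder and comparing the spread gives a fairer picture than a single back-to-back run.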

joelpittet’s picture

Status: Closed (works as designed) » Needs review

Edit: more typos... arg. Disregard.

joelpittet’s picture

Some individual results, as asked by @mikeytown2:

serialize(): 260 ms
serialize(): 248 ms
serialize(): 264 ms
serialize(): 253 ms
serialize(): 262 ms
serialize(): 259 ms
serialize(): 247 ms
serialize(): 283 ms
serialize(): 277 ms
serialize(): 263 ms
json_encode(): 109 ms
json_encode(): 109 ms
json_encode(): 98 ms
json_encode(): 100 ms
json_encode(): 107 ms
json_encode(): 98 ms
json_encode(): 116 ms
json_encode(): 104 ms
json_encode(): 101 ms

Code used to generate the results:

// The json_encode test.
function test_json($data) {
  $start = microtime(TRUE);
  for ($i = 0; $i < 10000; $i++) {
    $serial = md5(json_encode($data));
  }
  return microtime(TRUE) - $start;
}

// The serialize test.
function test_serialize($data) {
  $start = microtime(TRUE);
  for ($i = 0; $i < 10000; $i++) {
    $serial = md5(serialize($data));
  }
  return microtime(TRUE) - $start;
}

foreach (range(1, 10) as $value) {
  // echo 'serialize(): ' . round(test_serialize($data) * 1000) . ' ms' . "\n";
  echo 'json_encode(): ' . round(test_json($data) * 1000) . ' ms' . "\n";
}

joelpittet’s picture

Still, for this module's use case I think it's only doing the operation maybe ~10 times, not 10,000 times. So at best we could save ~1 ms.

Fun experiment nonetheless :)
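
As a rough check of that estimate, the sketch below averages the individual timings listed earlier in this comment thread and scales them from 10,000 iterations down to a per-request figure. The 10 calls per request is the guess from the comment above, not a measured value.

```php
<?php
// Timings in ms per 10,000 iterations, copied from the individual runs in the thread.
$serialize_ms = array(260, 248, 264, 253, 262, 259, 247, 283, 277, 263);
$json_ms = array(109, 109, 98, 100, 107, 98, 116, 104, 101);

$iterations = 10000;
// Assumption taken from the comment: roughly 10 hashing calls per request.
$calls_per_request = 10;

// Mean cost of a single md5(encoder($data)) call, in ms.
$serialize_per_call = array_sum($serialize_ms) / count($serialize_ms) / $iterations;
$json_per_call = array_sum($json_ms) / count($json_ms) / $iterations;

$saved_ms = ($serialize_per_call - $json_per_call) * $calls_per_request;

echo 'Estimated saving per request: ' . round($saved_ms, 2) . ' ms' . "\n";
```

With these numbers the saving comes out at roughly 0.16 ms per request, comfortably inside the "at best ~1 ms" bound.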

mikeytown2’s picture

Version: 7.x-2.10 » 7.x-2.x-dev
Status: Needs review » Fixed
Attached file (status: new), size: 3.66 KB

This is the patch that I committed. It lets you change the serialization function by setting a variable, so if https://github.com/igbinary/igbinary is installed and is quicker, you can switch to it.
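
The general idea of a swappable serializer can be sketched as follows. This is not the code from the committed patch: the example_hash() helper and the variable name example_serialize_function are hypothetical, and only serialize(), json_encode(), igbinary_serialize() and Drupal 7's variable_get() are real functions.

```php
<?php
/**
 * Hashes $data with whichever serializer $function_name names.
 *
 * Hypothetical helper for illustration; not the code from the committed patch.
 */
function example_hash($data, $function_name = 'serialize') {
  // Fall back to serialize() when the configured function is unavailable,
  // for example igbinary_serialize() without the igbinary extension loaded.
  if (!function_exists($function_name)) {
    $function_name = 'serialize';
  }
  return md5($function_name($data));
}

// In Drupal 7 the name could come from variable_get(); the variable name
// 'example_serialize_function' is made up for this sketch.
// $function_name = variable_get('example_serialize_function', 'serialize');

$data = array('a.css', 'b.css');
echo example_hash($data, 'json_encode') . "\n";
echo example_hash($data, 'igbinary_serialize') . "\n";
```

Note that changing the serializer changes every resulting hash, so anything keyed on the old hashes would need to be rebuilt after switching.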

joelpittet’s picture

@mikeytown2 haha, you could just say no, but thank you for the feature :)

mikeytown2’s picture

I'll take a 1ms improvement :)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.