When inserting users, our default password generator consumes too much time. This is slowing down data migrations into Drupal. More data is forthcoming.

Comments

catch’s picture

Component: comment.module » user system

So I did some profiling on this but not benchmarking yet. Here's the profiling though:

Using devel generate to create the users, and xhprof to profile the request, 10 users created.

Default core settings:
function | calls | percent calls | incl. wall time
md5 163,857 89.7% 7,410,969

That's 16,386 calls to md5() per password hash. And nearly 700ms to generate the hash per user - at least when profiling via xhprof on my laptop.

If I set $conf['password_count_log2'] = 7; in settings.php (the minimum core allows), then generating 10 users takes:

function | calls | percent calls | incl. wall time
md5 1,297 6.4% 58,097

Hat tip to Doug Green for reminding me of the variable, which I couldn't find otherwise.

That's 126 times as fast.

Attached xhprof output to show the difference

Overall time for the requst is reduce from 10.3 seconds to 1.4 seconds - so a 10* improvement in user insert performance.

So I remember the original issue, but I'm not qualified to comment on the security aspects of this (#29706: More secure password hashing for reference), however:

1. sha1, hash() and other (presumably much faster) alternatives were removed during the course of that issue to enable people to move between shared hosts easily. I would rather we set a variable somewhere to indicate that one or the other extension was used to hash passwords, then have a hook _requirements() to warn people if that extension becomes unavailable. It's not worth nearly a second of processing every user_save() just to account for (very) crappy hosting.

2. I see at least some reference in that issue to support for PHP4 and possibly PHP < 5.2.3 - would need to read over it in depth which I don't have time for now, but possibly we could take advantage of something a bit quicker now we've firmly dropped support for older PHP versions. This is much the same as #1.

3. At the very, very least, simpletest should be setting this variable to the absolute minimum while doing test runs. I dread to think how many users we create during testing but imagine it's in the dozens or hundreds.

I will try to do some benchmarks of this to verify the numbers aren't swayed too much by xhprof profiling, but seems we ought to be able to make some kind of optimization here without sacrificing security.

catch’s picture

Title: Improve performance of password generation » Simpletest should set $conf['password_count_log2'] = 7;
Component: user system » simpletest.module

So I was profiling with xhprof, but managed to leave xdebug on in php.ini while doing so, mea culpa :/

Microbenchmarks speak for themselves:

Bootstrapped Drupal, included password.php, did 100 iterations of user_hash_password() with default variable, then 100 iterations with $conf['password_count_log2'] = 7;

xdebug enabled and profiling:
default :79.76 seconds
log2 = 7: 1.04

xhprof enabled and profiling:
4.70
0.23

xhprof enabled but not profiling:
2.37 seconds
0.24 seconds

xdebug and xhprof completely disabled:
1.82 seconds
0.20 seconds

So if this was in the critical path it'd be pretty bad, but that's about 55 hashes per second with the default setting, and 500 set to the lowest possible, instead of around 1. I'll think I'll be forgetting to enable xdebug permanently from now on and stick with my new friend xhprof...

Despite all this though - it would still make a tonne of sense for devel generate and simpletest to set this variable - since that's precisely when you're likely to have xdebug running. So re-purposing this and apologies for the somewhat false alarm.

catch’s picture

Also I think we should consider setting the MIN_HASH_COUNT constant lower or removing it altogether - could have a warning in hook_requirements() if the variable is set low, but simpletest and devel generate are valid reasons to keep it flexible IMO.

pwolanin’s picture

The whole point of this stretching is to make it hard to crack passwords, so it's supposed to be slow. I supposed we could lower the constant for speedier development and imports, but it's something you don't want in production.