Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Password expirations and expiration warnings are handled by cron. For sites with many users (1000s or more), the cron code is problematically inefficient--it uses too much memory and takes too long to execute. The result is password expiration becomes unusable because memory limits are exceeded or execution timeouts occur. All current release branches are affected: 7.x-1.x, 7.x-2.x, and 8.x-3.x.
Need to make the expiration cron code capable of efficiently handling 1000s (if not 10000s and 100000s) of users.
Comments
Comment #1
AohRveTPV CreditAttribution: AohRveTPV commentedThis looks like a design issue: A condition plug-in acts on a single user, and the condition callback expects a user object as an argument. So to evaluate a condition for many users, their user objects must all be loaded and then individually passed to the condition, which is very inefficient for large numbers of users.
Instead I think a condition should act as a filter on an array of candidate uids. The condition could take as an input the array of uids, retrieve only the data it needs to evaluate the condition, and return the uids that match the condition. For instance, for the role condition, you need only do some table joins to determine whether users have some roles, not
user_load()
each user. A condition could stilluser_load()
if it really needed to.I have little familiarity with the code, and have only cursorily looked at this problem, so I am unsure whether this suggested design change is sensible. Setting issue to Needs Review.
For this specific bug, it looks like a fix could alternatively be hacked by building faux user objects without
user_load()
that have only the information needed for the role condition to be evaluated.Comment #2
deekayen CreditAttribution: deekayen commentedthere's no patch to review...
Comment #3
AohRveTPV CreditAttribution: AohRveTPV commentedComment #4
colanComment #5
Elijah LynnRead that this was a blocker over at #2331599: 2.0 release roadmap?. Think this should be bumped to a major then if not critical.
Comment #6
gregglesI think one solution to this could be to modify the query so it only runs on users who have a role that has a policy. At least in 1.x it appears it runs for everyone.
A workaround would be to make it possible to run this via drush instead of via the normal cron interface. Then it can be run in a place where timeouts and RAM are less likely to be an issue.
I'm interested in working on improving both of those things for the 7.x-1.x branch which seems like it would mostly be be portable to 7.x-2.x. If it seems off-topic for this issue then I can move it elsewhere.
Comment #7
gregglesI started with an explain on the main query that seems to take a long time for me:
I noticed that it's using an orderby and doesn't join to password_policy_role/users_roles to reduce the set of uids to query. So, I thought I'd try tweaking those things to see if they can help the plan.
Removing the orderby seems to dramatically improve the plan removing the "Using temporary" and "Using filesort":
And here's the plan without the orderby and with joining to users_roles and password_policy_role. My sense is that Using temporary/Using filesort is not great, but if you look at how many rows it is returning...they are *drastically* reduced.
In practice on my site, running that query returned 384 rows in set (0.01 sec). Whereas the previous query (without orderby) returns 921720 rows in set (7.67 sec). The query with orderby returns 921720 rows in set (8.27 sec)
We can even take it a little further and add that the policy has to be enabled and has to involve an expiration. This would work well on a site where the number of users affected by an enabled expiration policy is much lower than those affected by a general policy:
If we can reduce the number of rows returned, we'll reduce the size of the arrays that are built which will reduce the memory consumed. It also seems to be a much more efficient query for the database to run, so we're not shifting load to the database but in fact reducing load on the database.
This isn't a catch-all solution: if a site has a policy that affects basically every user on the site they are still stuck returning all the data and managing it in php. I think there are probably some ways to handle the data more efficiently as well, but I think the query improvements are worthwhile on their own.
Comment #8
gregglesUntested patch because I have to stop working on this for the day - needs review and testing.
Comment #10
gregglesWell, the patch doesn't apply to 7.x-1.x. Really the question is if this approach makes sense to people before I go further with it.
Another approach I think could be taken at the same time is just pushing this out of cron so that it only runs on a nightly basis.
Really, I think that doing both of these ideas is the right way to go. Any feedback on them? I use the 7.x-1.x branch so this is against that as well, but if people like it conceptually I can make a 7.x-2.x patch and then backport.
Comment #11
AohRveTPV CreditAttribution: AohRveTPV commentedFiltering to roles to which expiration policies apply seems sensible to me (why not?), even if it will not help the problem where there are just very many users to which expiration policies apply.
Would it hurt performance much to always filter on roles that have an expiration policy? So instead of the following as in your patch:
Query 1: Does an expiration policy apply to the 'authenticated user' role?
Query 2: If not, filter to roles to which expiration policies apply.
Do:
Query 1: Filter to roles to which expiration policies apply (even if 'authenticated user' is one of the roles).
I ask because the expiration code in Password Policy 7.x-1.x is already overcomplicated and needing refactoring, IMHO. Wary of adding more complexity.
Comment #12
gregglesIf it applies to all roles then I believe there is a performance decrease from adding 2 unnecessary joins to the process, even if the joins are on indexed columns.
I'd be fine doing your proposed way...people who have a password policy that applies to all users and who have a lot of users are probably already doing something custom (e.g. running in Drush) to make this work at all.
Any thoughts on the code in #10 to make this all basically optional?
Comment #13
gregglesOK, here's a 7.x-1.x patch that combines the two concepts and explains the use of the hidden variable.
Moving the version to 7.x-1.x for automated tests.
I'm happy to provide a similar patch for 7.x-2.x - just let me know.
Comment #14
gregglesComment #17
greggles[] vs. array() mistake.
Comment #18
gregglesTo help benchmark the performance/scalability issues here I ran the following tests on a brand new site. The result is that PHP memory consumption goes from ~71MB to ~33MB if the password policy applies to a subset of users. The time to run the command goes from ~6.6 seconds to ~.5 seconds. In the case where the policy applies to basically authenticated users then performance/resource consumption is not affected (no surprise). In the case where password policy applies to a role that is present on a large number of users (~3,000) the query is still much faster than trying to load all the users.
Steps:
* created a new role, that does not have a password policy associated with it.
* created a password policy associated with admin account
* ran "drush genu --roles=3 100" and then "drush genu --roles=4 3000" and then "drush genu 30000"
* ran the command:
/usr/bin/time -l drush -v ev "password_policy_cron();"
Current 7.x-1.x-dev:
With this patch:
With this patch, and apply password policy to all auth users:
With this patch, and apply password policy to a large number of users with a role:
Then I did "drush genu --roles=4 10000" to test out the scenario of a "lot" of users with roles and it's still dramatically better.
Comment #19
AohRveTPV CreditAttribution: AohRveTPV commentedLooks good to me.
What throws an exception in
password_policy_process_expirations()
? The docblock says it throws\Exception
.Comment #20
gregglesNot sure, that was auto-generated by phpstorm. I think any db query *can* throw a PDOException, but if you'd like to remove that I can easily do so.
Please let me know if you want me to port this to 7.x-2.x. I can say that on a site I manage cron runs went from 21 minutes to 1 minute after we deployed this.
Comment #21
AohRveTPV CreditAttribution: AohRveTPV commentedMade several minor changes to patch:
- Removed couple lines changing table aliases. Also removed "We don't" -> "Don't". Agree with the changes but think they are unrelated so would like to commit separately. Please correct if wrong.
- "with a command like" -> "with a Drush command like" (Administrators may not know about Drush and this may help a little to figure out it is not a native Drupal command.)
-
DRUPAL_AUTHENTICATED_RID
in place of 2- Capitalized "Password Policy" in watchdog message.
I have not tested this functionality.
Comment #24
AohRveTPV CreditAttribution: AohRveTPV commentedLet's see if the patch from #17 fails testing too. (Using the "Retest" link does not seem to work for patches submitted before the CI switch.)
Comment #25
AohRveTPV CreditAttribution: AohRveTPV commentedTests won't run because the automated tests are "Waiting for branch to pass".
The 7.x-1.x branch isn't passing because it is being tested on D6 for some reason, and I can't find a way to delete the D6 test configuration. Any ideas?
It looks like a CI bug. Will try to figure out how to report an issue.
Comment #26
AohRveTPV CreditAttribution: AohRveTPV commentedSubmitted DrupalCI support request:
https://www.drupal.org/node/2638452
Comment #27
gregglesThanks for identifying the problem. I subscribed to the other issue. I took a look at the settings and agree with what you see, but don't see a way to fix it.
Comment #29
AohRveTPV CreditAttribution: AohRveTPV commentedHaving trouble seeing the functional difference between greggles patch in #17 and my patch in #21.
Comment #30
AohRveTPV CreditAttribution: AohRveTPV commentedI see it now. Re-using the 'p' alias for different tables. I wrongly thought changing the alias for
password_policy_history
to 'h' was a cosmetic change. Will fix.Comment #31
AohRveTPV CreditAttribution: AohRveTPV commentedComment #33
AohRveTPV CreditAttribution: AohRveTPV commentedComment #34
AohRveTPV CreditAttribution: AohRveTPV commentedCommitted "We don't" -> "Don't" in comment separately:
http://drupalcode.org/project/password_policy.git/commit/8b95a31
http://drupalcode.org/project/password_policy.git/commit/63c12aa
Comment #35
gregglesThanks, AohRveTPV. I'll try to work on the 7.x-2.x if you don't beat me to it.
Comment #37
AohRveTPV CreditAttribution: AohRveTPV commented#31 breaks expiration.
Comment #38
AohRveTPV CreditAttribution: AohRveTPV commentedThe patch changes:
to:
However, this line remains:
The result is
$row->p_created
is always empty, and the user creation time ($row->created
) is used instead of the most recent password change time for determining expiration.The tests should have caught this, IMO, and need improvement.
Comment #39
AohRveTPV CreditAttribution: AohRveTPV commentedComment #40
AohRveTPV CreditAttribution: AohRveTPV commentedComment #41
AohRveTPV CreditAttribution: AohRveTPV commentedComment #43
AohRveTPV CreditAttribution: AohRveTPV commentedAccidentally tested patch on 7.x-2.x instead of 7.x-1.x. Re-testing.
Comment #44
AohRveTPV CreditAttribution: AohRveTPV commentedMade improvement to test method that would have caught bug in #31:
http://cgit.drupalcode.org/password_policy/commit/?id=ee096470a271ab3298...
After the user whose password is expired changes their password, again run cron, then attempt login, to check that the password is not immediately re-expired.
Comment #46
AohRveTPV CreditAttribution: AohRveTPV commentedComment #48
AohRveTPV CreditAttribution: AohRveTPV commentedRemoving
orderBy('p.created')
might be problematic, because of this code:The purpose of the statement in the for loop is to get the time of the user's most recent password change.
But there can be multiple rows for each user, each with a different p_created. (The password_policy_history table stores the times of all password changes for a user, not just the most recent.) So the code is relying on the last iteration of the for loop for a given user to provide their most recent password change time in
$row->p_created
. I'm not sure we can depend on the last iteration being the most recent password change time unless the results are ordered ascending onp.created
.I've reverted the patch out of caution. Will recommit if convinced removing
orderBy()
is safe.(The intent of the code I excerpted above is not very clear. The expiration code in general is complicated and hard to read, in my opinion. I am working on refactoring it.)
Comment #49
andypostComment #50
AohRveTPV CreditAttribution: AohRveTPV commentedorderBy('p.created')
is needed, so #40 will not work. See #2833776: Multiple warning mails sent per day in certain condition for an example of a problem that may occur when rows are not ordered as expected.Comment #51
Sebastien M. CreditAttribution: Sebastien M. commentedThis performance issue seems to be fixed just by changing a flag in the 'user_load()' function in the 'PasswordPolicyExpire::cron' method.
Indeed, User entity controller use by default a static cache to store already loaded entities.
In a batch process, all users are different, so no need to use this kind of cache.
Instead, you can 'reset' at each call this cache and it won't be fill.
File 'plugins/item/expire.inc' at line '327' :
What do you think about it ?
Should I create a patch or a new issue ?
Many thanks
Comment #52
AohRveTPV CreditAttribution: AohRveTPV commentedSebastien, thanks for the idea, I'll plan to review it and try it out soon! A patch would be helpful.
Your suggestion applies to 7.x-2.x, so updating issue version accordingly. Both 7.x-1.x and 7.x-2.x have this problem.
Comment #53
Sebastien M. CreditAttribution: Sebastien M. commentedThanks for your feedback.
Here is the patch for the 7.x-2.x branch.
Comment #54
Sebastien M. CreditAttribution: Sebastien M. commentedHere is the patch for the 7.x-1.x branch.
Comment #55
Sebastien M. CreditAttribution: Sebastien M. commentedIn the 8.x-3.x branch, all users are loaded at once using this code:
Instead, users should be loaded one by one in the following 'foreach' loop:
Comment #56
Sebastien M. CreditAttribution: Sebastien M. commentedHere is my patch for 8.x-3.x branch.
Comment #57
AohRveTPV CreditAttribution: AohRveTPV commentedThanks for the patches. Do you or anyone have a suggestion on how to reproduce the problem, so we can verify that the patch fixes it?
My current idea is to use Drush to create a lot of users--maybe 100,000. I can then see if the PHP memory limit is exceeded, or measure the memory usage using PHP functions. Then, I'll apply the patch and see either the memory limit is not exceeded, or the memory usage is reduced.
Comment #58
Sebastien M. CreditAttribution: Sebastien M. commentedSure. I used Devel to create a batch of 1000 users.
I have added a call to `memory_get_usage` before and in the foreach loop.
Comment #59
AohRveTPV CreditAttribution: AohRveTPV commentedDid some testing with 7.x-2.x. Unfortunately I think memory consumption is just one facet of this performance problem. The cron execution time is extremely long, probably problematically so for sites with 100,000s of users. This seems to be a design issue as I wrote in #1.
Here was my test:
- Fresh Drupal 7 site.
- Password Policy 7.x-2.x-dev, default policy plus password expiration enabled.
- 10,000 users created with Drush and Devel:
drush user-generate 10000
.- 10,000
password_policy_history
entries, 1 for each user, created using Bash/MySQL.1. Ran
drush cron
.2. Cron completed successfully, but only after maybe 15-30 minutes. There was no out of memory error.
If cron takes 15-30 minutes for 10000 users, it could take hours or days for 100000s of users. That is problematic. Passing
$reset = TRUE
touser_load()
won't help the execution speed.Comment #60
AohRveTPV CreditAttribution: AohRveTPV commentedSetting this back to 7.x-1.x-dev because it helps me with bug triage. (I filter on 7.x-1.x bug reports because I am more interested in fixing bugs in the stable branch than 7.x-2.x or 8.x-3.x.)
Please feel free to change the version as needed, though.
Comment #61
Sebastien M. CreditAttribution: Sebastien M. commentedThis issue seems to concern a 'out of memory' issue.
That's why I passed 'TRUE' to avoid any static memory consuption.
Concerning the time of the cron, it's quite normal because it loads 100'000 users from database which is really huge.
I saw in your 8.x-3.x branch that's you use the 'range()' function to restrict database query.
May be it could be done on the 7.x branches.
Simple suggest.
Comment #62
AohRveTPV CreditAttribution: AohRveTPV commentedYou're right, the issue summary only mentions the memory usage problem. Execution time is also a problem, as discussed in the comments. If we fix memory usage, execution time will still be a problem. We will ultimately need a solution for to the general problem of processing 1000s of users efficiently.
I'll rewrite the issue summary.
We could go ahead and commit this change. There are possibly sites where memory usage is a problem but execution time is not a problem. So your change would be an improvement, even if not a fix for the general inefficiency problem.
Yes, we need to somehow avoid calling
user_load()
for 1000s of users in a single cron run.I'll plan to look into this. We probably need to process all users in a single cron run. Many sites are configured to run cron only once per day. In 7.x, expiration notifications are only sent when cron runs on the day they are due to be sent: #2833764: Send expiration warnings even if cron not run daily. So any users not processed in a daily cron run may miss expiration notifications.
Comment #63
AohRveTPV CreditAttribution: AohRveTPV commented(Removed comment that was supposed to be summary edit.)
Comment #64
AohRveTPV CreditAttribution: AohRveTPV commentedComment #67
AohRveTPV CreditAttribution: AohRveTPV commentedThanks for the memory patches. They should help some sites. Leaving 8.x-3.x memory patch for nerdstein to review/commit. (I've been working primarily on the D7 branches.)
Only the exceeded memory limits are fixed by the previous commits. We will ultimately need to fix both expiration memory usage and execution time.
Comment #68
malcomio CreditAttribution: malcomio as a volunteer and at Capgemini commentedRunning 7.x-1.14 with the patch from #54 applied on a site with 462091 users and 512M memory, we're still getting out of memory issues.
Comment #69
aaron.ferris CreditAttribution: aaron.ferris commentedI've been looking at a way of reducing the result set of users based on whether they have a password policy applicable role, patch attached.
This won't help sites that have a large user base who all have a password policy applicable role, but it helps in my implementation.
Im running 7.x-1.14.
Thanks
Aaron
Comment #70
AohRveTPV CreditAttribution: AohRveTPV commentedI'm working on a 7.x-1.x patch to change the expiration processing in two ways:
- Processes users in chunks (e.g., 1000 at a time).
- Replaces per-user
user_load()
calls with single database queries that operate on the chunk of users.I think it will solve the out of memory issues, but execution time may still be long depending on the number of users. Will have to do some testing to quantify. Will perhaps also use the Batch API.
Just wanted to mention in case there's something basically wrong with this idea, or something else that could be done.
Comment #71
joachim CreditAttribution: joachim commented> I think it will solve the out of memory issues, but execution time may still be long depending on the number of users. Will have to do some testing to quantify. Will perhaps also use the Batch API.
For something that runs in cron, you should use the batch queue API.
Comment #72
johnhanley CreditAttribution: johnhanley as a volunteer commentedThe following patch is a modified version of #69:
1) When building user role query, added inner join to table password_policy to retrieve roles for enabled password policy roles only
2) Exit password_policy_cron() if no roles enabled
3) Replaced potential multiple OR conditions with single IN condition
4) Restored missing 'name' field from select criteria
Comment #73
johnhanley CreditAttribution: johnhanley as a volunteer commentedSame patch as above sans extraneous lines.
Applied successfully to 7.x-1.15 (and maybe dev, but we're not running dev version on production).
Comment #74
nerdsteinI'd like to recommend we split out two separate issues for ongoing D7 improvements and D8 work because there are two separate threads getting munged together and it's hard to follow.
I'm recommending we take https://www.drupal.org/project/password_policy/issues/2252541#comment-12... as a candidate patch for a new D8 issue.
Comment #75
andypostGood idea to separate d8 issue!
Comment #76
johnhanley CreditAttribution: johnhanley as a volunteer commentedI've actually rewritten the cron hook to use Queue API. This includes consolidating several queries.
We're in the midst of testing, but it looks good so far. I'll try to post the code at some point in the near future.
Comment #77
banviktor CreditAttribution: banviktor at CARD.com commentedThe last few patches don't take into consideration that users_roles don't hold references to rid 2 (authenticated role) and as a result they filter out a bit too much.
I think that the variable password_policy_process_on_cron that was included in patches <= #40 is a useful addition. This patch resembles #40 the most and cleanly applies to 7.x-1.x (and 7.x-1.15)
Comment #78
AohRveTPV CreditAttribution: AohRveTPV commentedPatch refactors expiration code and removes calls to
user_load()
.The expiration code is currently overly complex and in serious need of refactoring, in my opinion. For instance, the expiration cron code is a single 140+ line function with many variables, deep nesting, and hard-to-understand logic. It makes it difficult to work on making improvements to the expiration code. So I think the top priority is to simplify that code, and this patch attempts to do that.
Code pertaining to expiration is moved from
password_policy.module
topassword_policy.expiration.inc
. This reduces the length ofpassword_policy.module
from 1835 lines (ridiculous) to 1473 lines (a little less ridiculous). The long cron handling code is broken into several functions.After refactoring the code some, the
user_load()
calls were removed. I believe theuser_load()
are the costliest part of the expiration cron. I have not tested the changes for a large number of users, but I believe they should help.A difficulty is that warning emails can be customized for the user: user tokens can be used in the email, and the user's language should be used for the email. Loading the user is normally required for both those customizations, but I came up with workarounds that do not require loading the user. Not sure they're entirely correct, but all expiration tests pass.
Perhaps removing the
user_load()
calls is problematic or not enough of a performance improvement, and queueing is a better solution (as suggested in #71, #76, etc.) Or maybe both could be done. Either way, I think it's important to refactor the expiration code as a first step. I plan to make a patch that is just the refactoring without any attempts at performance improvements.(I hope this comment is not insulting to the authors of the existing expiration code. I appreciate them getting it working to the extent that it does! The code probably grew in its complexity and scope over time.)
Comment #79
AohRveTPV CreditAttribution: AohRveTPV commentedI have some time I could spend on this issue. I have some questions:
1. Does this issue still affect anyone using 7.x-1.x? (Or has everyone moved on to D8?)
2. If so, how many users does your site have?
3. If so, is the problem running out of memory, that cron takes too long to run, or both?
I wouldn't want to put effort into fixing this problem if no one really cares.
Comment #80
AohRveTPV CreditAttribution: AohRveTPV commentedNo one seems to care about fixing this in 7.x.
Comment #81
greggles1. Yes.
2. over a million.
3. I'm not sure any more. I think it was chewing up RAM to the point that the drush process running cron would exhaust RAM on the server and get killed. The site is using the patch in #77 to stay under those limits. The time that it takes to run cron isn't as important when running it via drush.
IMO the ideas presented in #78 are good, but make a big departure from how things were done. Big departures in architecture are great to do in D8. The changes proposed in #77 are small and hopefully easy to review. Did you consider them? Moving this to needs review for #77 or #78.
Comment #84
jasonawantI think this is still an issue with 8.x version, any one else?
I think it can occur here in password_policy_cron() when there are too many users that have been loaded and then attempted to save.
Comment #85
wellsRe: #84 --
There are efforts to improve this in the 8.x branch, but none have been fully reviewed and merged at this point.
See:
Comment #86
gregglesFor anyone that this affects in Drupal 7, does the fix from #77 help you?
Comment #87
AohRveTPV CreditAttribution: AohRveTPV commentedgreggles, I'll review #77. A few years back I felt like refactoring the expiration code was the best way forward, and I was working on that, but I guess I didn't get it done. So #77 may be the most realistic solution. Sorry for the delay.