Problem/Motivation

When deleting large datasets, the [node|comment|user]_delete_multiple() API does not invoke all dependent hook_[node|comment|user]_delete() implementations.
e.g. deleting a node does not invoke hook_comment_delete() for its comments

Proposed resolution

Use the queue API to delete large datasets so we can invoke all hooks.

Remaining tasks

This issue is looking for further direction. Some of the questions/issues are:

  • Does this issue cause problems/confusion for users that are expecting all records to be deleted?
  • Should the user be aware that the operation is happening via queues/cron?
  • If the deletes are happening through the UI, the batch API could process everything. Could the UI create a queue and pass it to the batch API for processing, so you don't have to wait for cron to complete?

User interface changes

When deleting large datasets, you are advised that only a subset of the objects has been deleted and that the remainder has been queued to be deleted on cron. A link to run cron manually is shown if the user has the "administer site configuration" permission.

API changes

[node|comment|user]_delete_multiple() will no longer always delete all items instantly; instead it deletes the first 20 (or the defined queue threshold number), with the remaining items deleted on cron through a queue.

If the number of items being deleted is less than the queue threshold number, then the current behavior of the system will remain unchanged.
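The proposed behavior could be sketched roughly as follows. This is an illustration only: the threshold variable name, the queue name, and the _node_delete_multiple() helper (the renamed internal implementation discussed later in the thread) are assumptions, not code from a committed patch.

```php
<?php

/**
 * Sketch: delete the first chunk now, queue the rest for cron.
 */
function node_delete_multiple($nids) {
  $threshold = variable_get('node_delete_queue_threshold', 20);
  // Below the threshold, behavior is unchanged: delete everything now.
  if (count($nids) <= $threshold) {
    _node_delete_multiple($nids);
    return;
  }
  // Delete the first $threshold items immediately...
  _node_delete_multiple(array_slice($nids, 0, $threshold));
  // ...and queue the remainder in chunks for processing on cron.
  $queue = DrupalQueue::get('node_delete_queue');
  foreach (array_chunk(array_slice($nids, $threshold), $threshold) as $chunk) {
    $queue->createItem($chunk);
  }
}
```

Queueing whole chunks rather than individual nids keeps the per-item createItem() cost proportional to the number of chunks, not the number of nodes.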

Original report by fajerstarter

Deleting a node that has comments deletes its comments. The problem is that this doesn't invoke hook_comment($op = 'delete'). Modules using hook_comment() need this. This might be the case for Drupal CVS (HEAD) as well, though this hasn't yet been tested.

Files:

#140 · Delete-later.png · 34.32 KB · yoroy
#140 · Delete-later-1.png · 48.44 KB · yoroy
#140 · Delete-now-2.png · 58.08 KB · yoroy
#140 · Delete-later-3.png · 45.99 KB · yoroy
#132 · job-queue-89181-132.patch · 10.34 KB · xjm · FAILED: unable to apply patch
#132 · interdiff-126-132.txt · 6.41 KB · xjm
#126 · job-queue-89181-126.patch · 8.69 KB · xjm · PASSED: 33,985 pass(es)
#126 · interdiff-112-126.txt · 3.21 KB · xjm
#123 · interdiff-112-121.txt · 3.2 KB · xjm
#122 · job-queue-89181-121.patch · 8.68 KB · xjm · PASSED: 33,973 pass(es)
#112 · job-queue-89181-111.patch · 8.42 KB · xjm · PASSED: 33,984 pass(es)
#112 · interdiff.txt · 441 bytes · xjm
#108 · 89181-bulk-operations-using-reliable-job-queue.patch · 8.41 KB · franz · PASSED: 33,980 pass(es)
#106 · 89181-bulk-operations-using-reliable-job-queue.patch · 8.44 KB · franz · PASSED: 33,796 pass(es)
#105 · 89181-bulk-operations-using-reliable-job-queue.patch · 8.46 KB · franz · PASSED: 33,808 pass(es)
#101 · 89181-bulk-operations-using-reliable-job-queue-101.patch · 7.95 KB · Désiré · PASSED: 33,802 pass(es)
#98 · 89181-bulk-operations-using-reliable-job-queue.patch · 7.81 KB · franz · PASSED: 33,064 pass(es)
#93 · 89181-bulk-operations-using-reliable-job-queue.patch · 5.29 KB · franz · PASSED: 32,886 pass(es)
#93 · queue_delete.png · 20.82 KB · franz
#86 · 89181-bulk-operations-using-reliable-job-queue.patch · 3.62 KB · franz · PASSED: 32,866 pass(es)
#78 · 89181-bulk-delete-nodes-comments-users-using-reliable-job-queue-4.patch · 4.25 KB · tizzo · PASSED: 23,324 pass(es)
#76 · 89181-bulk-delete-nodes-comments-users-using-reliable-job-queue-4.patch · 4.59 KB · tizzo · PASSED: 23,320 pass(es)
#75 · 89181-bulk-delete-nodes-comments-users-using-reliable-job-queue-2.patch · 8.31 KB · tizzo · PASSED: 23,312 pass(es)
#73 · 89181-bulk-delete-nodes-comments-users-using-reliable-job-queue.patch · 9.17 KB · tizzo · FAILED: 23,268 pass(es), 6 fail(s), 4 exception(s)
#63 · queued_deletes_5.patch · 5.44 KB · YesCT · FAILED: unable to apply patch
#57 · queued_deletes.patch · 5.44 KB · mr.baileys · FAILED: unable to apply patch
#51 · queued_deletes.patch · 4.63 KB · mr.baileys · PASSED: 20,044 pass(es)
#48 · queued_deletes.patch · 6.71 KB · mr.baileys · FAILED: 20,147 pass(es), 3 fail(s), 863 exception(s)
#42 · queued_deletes.patch · 7.74 KB · mr.baileys · PASSED: 19,925 pass(es)
#41 · queued_deletes.patch · 7.13 KB · mr.baileys · PASSED: 19,928 pass(es)
#39 · queued_deletes.patch · 7.22 KB · mr.baileys · PASSED: 19,934 pass(es)
#37 · queued_deletes.patch · 7.38 KB · mr.baileys · PASSED: 19,936 pass(es)
#28 · comment-89181-28.patch · 7.88 KB · dev.cmswebsiteservices · PASSED: 17,857 pass(es)
#22 · comment-89181-22.patch · 8.82 KB · dev.cmswebsiteservices · FAILED: invalid patch format
#25 · comment-89181-24.patch · 8.57 KB · dev.cmswebsiteservices · PASSED: 0 pass(es)
#12 · comment_nodeapi_delete-batch-89181-11.patch · 4.46 KB · catch · FAILED: unable to apply patch
#11 · comment_nodeapi_delete-batch-89181-11.patch · 4.54 KB · dropcube · FAILED: invalid patch format
#7 · comment_nodeapi_delete.patch · 964 bytes · Pancho · FAILED: unable to apply patch
#1 · comment.module_59.patch · 760 bytes · Arto · FAILED: unable to apply patch

Comments

Arto’s picture

Status: Active » Needs review

Attached is a patch against DRUPAL-4-7 CVS that invokes hook_comment('delete') on each of the node's comments before they are deleted.

Zen’s picture

I don't understand the need for $comment->name in the patch. Please explain and add comments to the patch if it is non-obvious :)

Thanks,
-K

Arto’s picture

The $comment->name handling isn't my idea - the patch is simply replicating the way comments are loaded everywhere else in the comment module code. I might add that none of the other identical lines (of which there are half a dozen) are commented, either ;-)

That said, its purpose seems simple enough: if the comment was created by a registered user, set the name property to the user's proper name, otherwise use whatever name was provided by the anonymous user who submitted the comment.

So this is simply API consistency towards third-party code.

Zen’s picture

Version: 4.7.3 » 4.7.4
Status: Needs review » Needs work

Cool. Won't it be better if the ORDER BY was c.cid DESC rather than timestamp? It's the primary key and this will also ensure that comments are deleted in reverse order of creation thus ensuring that no "hierarchical" mishaps occur in any contrib module.

Please also submit patches against root.

Cheers,
-K

killes@www.drop.org’s picture

Version: 4.7.4 » 5.x-dev

Please fix in HEAD first.

StevenPatz’s picture

Version: 5.x-dev » 6.x-dev

HEAD is now 6.x, right?

Pancho’s picture

Status: Needs work » Needs review

Rerolled against HEAD, also incorporated Zen's suggestion to use ORDER BY c.cid DESC.
Please test.

catch’s picture

Agree with ordering by cid instead of timestamp - this recently fixed a bug with comment display.

Added two nodes with some comments, deleted one via edit tab, one via admin/content/node. Comments went too. Don't have a hook_comment example to test with but it seems fine.

Having said that, is that while loop going to eat memory on hundreds or thousands of comments? Does it need to be batched instead?

moshe weitzman’s picture

Priority: Normal » Critical

it is a bad API omission to skip this hook. promoting to critical. we could get away without batching here but it would be nice.

jvandyk’s picture

Just an observation that on a large comment tree if actions are assigned to the "after deleting a comment" trigger, that's going to be a lot of actions running.

dropcube’s picture


The patch follows all the suggestions above, implementing a batch processing to prevent timeouts when deleting a large number of comments where actions may be triggered, etc.

I have generated with devel_generate several nodes each with 50 comments. Tested deleting a node from the node edit form and the mass deletion of nodes from the content admin pages. The comments are being deleted, the hook is being invoked and assigned actions to 'After deleting a comment' are being executed as well.

The problem I see here for the batch processing is when comment_nodeapi() is reached without a form submission (programmatically). Any suggestions? Please test.

catch’s picture


Batch processing on lots of nodes and comments looks fine, also tested with 1 node/1 comment, but

1. It introduces strings
2. the strings needed some work

Rather than set to needs work, I've edited the patch to change some of the strings and tidied up some of the phpdoc since these are all trivial changes.

Gábor Hojtsy’s picture

Version: 6.x-dev » 7.x-dev
Priority: Critical » Normal
Status: Needs review » Needs work

- I don't think the batch processing stuff deserves a "NOTE:", it can be a simple explanation sentence there

- t('Deleting node comments') should not use the word "node", it might be clear enough still if we remove that word

- what's an item in "@count items successfully processed"? Are we talking about single comments, or blocks of 10 comments?

- there is some inconsistency in how this works... we call the delete operation on comments but do not delete any data until we called all delete hooks... this means that we might end up in the middle with a broken batch with modules informed about comments being deleted, which are in fact not. unfortunately it is far from convenient to delete comments one by one, because we would need to keep the threading information, etc.

After all I am not convinced that this is a serious API omission and deserves to be critical. Those implementing hook_comment() could just as well implement hook_nodeapi() and act on its delete operation. We are consistent in calling these hooks before the act (i.e. node deletion, comment deletion), so the given module could just as well act there.

I believe that it would be great to polish up this part of the API in Drupal 7. More consistent APIs was one of Dries' Drupal 7 goals as well :)

dropcube’s picture

I agree with Gábor.

Also, as I stated at #11 there is a problem with batch processing when comment_nodeapi is not called from a form submit handler.

yched’s picture

Moshe pointed me to that thread a few days ago, but I've been mostly afk since then.

Yes, 'nested' batch processing is indeed problematic. Batches are a neat tool for simple workflows (typically, form submission), but they're probably not (yet?) the answer to all timeout issues, especially in API functions or hooks.
Batches are highly context-sensitive:
- they require a browser to iterate the requests (through ahah or redirect) - thus, non cron-friendly; and any API / hook could potentially run in cron...
- they require that the current page execution is interrupted at some point to be redirected (drupal_goto) to the batch processing page. FAPI takes care of doing this at the right time when the batch is set during a form submission, but nothing 'catches' a batch set during an API call.
- they can make a convoluted execution sequence (your function returns without having its publicized task performed yet).

We're facing the same sort of issues in using batches for CCK cleanup (which, er, was the original reason I started the work on batches...).

dropcube’s picture

Title: Deleting a node with comments doesn't initiate hook_comment($op='delete') » Deleting a node with comments doesn't invoke hook_comment_delete()
dropcube’s picture

Priority: Normal » Critical

I still think this is critical as it is an API omission.

Deciphered’s picture

Subscribing to this issue due to the creation of a duplicate issue... I did search, I swear.

catch’s picture

Issue tags: +Performance

Some of this went in with http://api.drupal.org/api/function/comment_delete_multiple/7 vs. http://api.drupal.org/api/function/comment_node_delete/7

However:

1. hook_comment_delete() isn't multiple
2. field_attach_delete() isn't multiple
3. We don't use batch there.

So bumping this and tagging with performance. Actual patch will need a lot of updating, but would be good to clear up these cases which could easily bring your site down if you have a node with a few thousand comments that needs deleting.

catch’s picture

Title: Deleting a node with comments doesn't invoke hook_comment_delete() » Batch node_delete_multiple() and comment_delete_multiple()
sun’s picture

Yes, this code is horribly broken. Not sure whether a proper fix can be backported, so we should get this done before ALPHA1.

I'll try to work on a patch, if no one beats me to it.

dev.cmswebsiteservices’s picture

Assigned: dev.cmswebsiteservices » Unassigned
Status: Needs review » Needs work

Batch processing has been implemented in this code.
I found that comments are also not deleted when a node is deleted from the system, so I updated the code for this too.

dev.cmswebsiteservices’s picture

Assigned: Unassigned » dev.cmswebsiteservices
Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, comment-89181-22.patch, failed testing.

dev.cmswebsiteservices’s picture

Status: Needs work » Needs review
sun’s picture

Assigned: Unassigned » dev.cmswebsiteservices
Status: Needs work » Needs review

This issue is critical, as it seems to solve the following issues:

#355905: node_delete_multiple() has to use batch API
#355926: Add comment_mass_update()

If one of those is not fixed by this patch, then the corresponding issue needs to be re-opened, since both are critical for D7.

sun’s picture

Status: Needs review » Needs work
+++ modules/comment/comment.module	12 Feb 2010 13:07:14 -0000
@@ -1522,13 +1548,56 @@ function comment_delete_multiple($cids) 
+function _comment_mass_delete_batch_process($cids, &$context) {
...
+  // Process comments by groups of 5.
+  $count = min(5, count($context['sandbox']['cids']));
+  for ($i = 1; $i <= $count; $i++) {

+++ modules/node/node.module	12 Feb 2010 13:07:29 -0000
@@ -1101,40 +1101,107 @@ function node_delete($nid) {
+function _node_mass_delete_batch_process($nids, &$context) {
...
+  // Process nodes by groups of 5.
+  $count = min(5, count($context['sandbox']['nids']));
+  for ($i = 1; $i <= $count; $i++) {

Batch API takes over this "splitting" for you, no need to do it manually. Just process one item per invocation and be done with it.
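sun's suggestion can be sketched as a Drupal 7 batch definition with one operation per comment, letting Batch API split the work across requests. This is an illustrative sketch, not code from the patch; in particular, the per-item callback signature here differs from the patch's chunk-based one.

```php
<?php

// Sketch only: one batch operation per comment ID; Batch API handles
// iterating the operations across HTTP requests, so there is no need
// for manual "groups of 5" splitting inside the process callback.
$batch = array(
  'title' => t('Deleting comments'),
  'operations' => array(),
  'finished' => '_comment_mass_delete_batch_finished',
);
foreach ($cids as $cid) {
  $batch['operations'][] = array('_comment_mass_delete_batch_process', array($cid));
}
batch_set($batch);
```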


dev.cmswebsiteservices’s picture


Thanks Sun,
the attached patch is updated as per the review in #27.

dev.cmswebsiteservices’s picture

Status: Needs work » Needs review
moshe weitzman’s picture

we are about to add user_delete_multiple in #705306: user_cancel_delete method calls into a "standard" user_delete_multiple API. if that gets in first, then this issue should batch user deletion as well.

catch’s picture

Status: Needs review » Needs work

That's in now, so we need an update here (or decide to commit #28 as is and do users in yet another issue).

sun’s picture

+++ modules/comment/comment.module	16 Feb 2010 11:02:44 -0000
@@ -1512,6 +1512,32 @@ function comment_delete($cid) {
+    batch_set($batch);  ¶

+++ modules/node/node.module	16 Feb 2010 11:02:55 -0000
@@ -1103,37 +1103,99 @@ function node_delete($nid) {
+    drupal_set_message(t('The delete has been performed.'));  ¶
...
+ ¶

Trailing white-space here.

+++ modules/comment/comment.module	16 Feb 2010 11:02:44 -0000
@@ -1512,6 +1512,32 @@ function comment_delete($cid) {
+ */
+
+function _comment_mass_delete_helper($cids) {

Superfluous blank line here.

+++ modules/comment/comment.module	16 Feb 2010 11:02:44 -0000
@@ -1525,13 +1551,52 @@ function comment_delete_multiple($cids) 
+function _comment_mass_delete_batch_process($cids, &$context) {
...
+
+

Duplicate newline here.

+++ modules/comment/comment.module	16 Feb 2010 11:02:44 -0000
@@ -1525,13 +1551,52 @@ function comment_delete_multiple($cids) 
+function _comment_mass_delete_batch_finished($success, $results, $operations) {
+  if (!$success) {
+    drupal_set_message(t('An error occurred and processing did not complete.'), 'error');
+    $message = format_plural(count($results), '1 item successfully processed:', '@count items successfully processed:');

+++ modules/node/node.module	16 Feb 2010 11:02:55 -0000
@@ -1103,37 +1103,99 @@ function node_delete($nid) {
+function _node_mass_delete_batch_finished($success, $results, $operations) {

The finished callbacks display both error and success messages in the error case (only).

+++ modules/node/node.module	16 Feb 2010 11:02:55 -0000
@@ -1103,37 +1103,99 @@ function node_delete($nid) {
+    ->condition('nid', $nid, '=')
...
+    ->condition('nid', $nid, '=')
...
+    ->condition('nid', $nid, '=')

The 3rd parameters can be removed here, '=' is the default.

+++ modules/node/node.module	16 Feb 2010 11:02:55 -0000
@@ -1103,37 +1103,99 @@ function node_delete($nid) {
+function _node_mass_delete_batch_process($nids, &$context) {
...
+  // Store result for post-processing in the finished callback.
+  $context['results'][] = l($node->title, 'node/' . $node->nid);

Does it make sense to output a link to a node that has been deleted? :)


yched’s picture

As I wrote in #15, using Batch API in an API function is dangerous...
Batch API assumes a client browser handle the redirects -> no batch in cron, for instance.
If your API func relies on batch API, then it's not an API func anymore (cannot be called safely from anywhere)

catch’s picture

#34 makes sense, and we have a shiny new queue API which could handle this instead.

catch’s picture

Title: Batch node_delete_multiple() and comment_delete_multiple() » Use queue API for node and comment multiple deletes
mr.baileys’s picture

Status: Needs work » Needs review

Here is an attempt at converting this to use the Queue API instead of the Batch API. Processing for multiple node and comment deletes starts immediately, but if for some reason (execution time?) the processing gets interrupted, the next cron invocation performs the remaining deletes.
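For context, the cron side of a queue-based approach is wired up via hook_cron_queue_info(). A minimal sketch follows; the queue name and worker callback name are illustrative assumptions, not necessarily what the patch uses.

```php
<?php

/**
 * Implements hook_cron_queue_info().
 *
 * Sketch: registers a worker that cron calls for each queued item.
 */
function node_cron_queue_info() {
  $queues['node_delete_queue'] = array(
    // Called once per claimed queue item during cron runs.
    'worker callback' => '_node_delete_queue_worker',
    // Maximum time (in seconds) cron may spend draining this queue.
    'time' => 30,
  );
  return $queues;
}
```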

moshe weitzman’s picture

Status: Needs review » Needs work

Nice approach.

We use the more terse 'Implements hook_cron_queue_info().' instead of 'Implementation of hook_cron_queue_info().'

mr.baileys’s picture

Status: Needs work » Needs review
  • Changed "Implementation of ..." to "Implements ..." in the Doxygen blocks for the hook_cron_queue_info() implementations;
  • Removed the hard-coded variable_set('queue_class_comment_deletes', 'BatchQueue');. I guess the default Queue implementation is adequate, and we should not force a certain implementation.
yched’s picture

should not use BatchQueue indeed.

I'm wondering about the "process til timeout and do the rest on cron" approach, though.
- what if cron runs while the initial pass is still running ?
- timeout might happen in the middle of a node/comment deletion. Unless deletion is atomic, this semi-deleted node / comment might cause trouble ?

mr.baileys’s picture


- what if cron runs while the initial pass is still running ?

My understanding is that since we are using queues, cron and the running process will both be emptying the queue simultaneously, and given the fact that claimed items cannot be claimed by the other process, this should work fine, right?

With the current patch, node_delete_multiple is basically saying: a) I'll delete all the nodes, b) as quickly as possible, but c) might not have finished when I return(*)

(*) Actually, not yet implemented. I guess we should cap the number of items to process during each batch to actually prevent time-outs, and then have node_delete_multiple() return a value indicating whether or not all nodes were deleted? This way the user can be warned through the interface ("xx nodes were deleted, the remaining nodes will be deleted during the next cron run").

- timeout might happen in the middle of a node/comment deletion. Unless deletion is atomic, this semi-deleted node / comment might cause trouble ?

Yes, it might, and unfortunately the delete operations are not atomic. Additionally, while the expiring lease on the queue item will ensure the item is processed again later on, the current patch prevents this from working since the deletion logic is wrapped in an if ($comment = comment_load($cid)) { block (so if aborted halfway through, the comment itself has been removed, but might still have field values). Attached patch removes the load check so that aborted deletions can be processed during a subsequent run.

The deletion of the actual node/comment record is the first action taken when processing each item, so worst case scenario the processing is aborted and we end up with, for example, an orphaned node revision or field values which no longer have a node to attach to (although just temporarily, it'll be cleaned up automatically later on). Not sure if this poses an issue, and if it does how to solve this. Capping the number of items to process as indicated earlier in this comment might provide the solution, at least when it comes to timeouts.

It also does not seem to be specific to the queue implementation: both in the current situation (looping through all nodes/comments without batch/queue) and in the batch implementation there is the risk of the operation being aborted; only when working with the queue, though, are these half-deleted nodes cleaned up automatically at a later stage.
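The lease semantics described in this comment can be sketched against the DrupalQueue interface. The queue name, the 60-second lease, and the _node_delete_multiple() helper are arbitrary choices for illustration.

```php
<?php

// Sketch: drain a delete queue using claim leases. claimItem() hides
// the item from other consumers until the lease expires; if this
// process dies mid-delete, the item becomes claimable again later,
// so cron and a web request can safely empty the queue concurrently.
$queue = DrupalQueue::get('node_delete_queue');
while ($item = $queue->claimItem(60)) {
  // $item->data holds whatever was passed to createItem().
  _node_delete_multiple($item->data);
  // Remove the item only after processing has succeeded.
  $queue->deleteItem($item);
}
```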

mr.baileys’s picture


Attached patch should take care of aborted node/comment deletions due to timeouts during the initial invocation by limiting the number of comments/nodes to process.

moshe weitzman’s picture

I might be misreading this patch (hopefully).

I'm not OK with the amount of defensiveness we are introducing. If we want protection against half deleted nodes, we should add a transaction like node_save() has. We worked hard during this release to speed up deletes with node_delete_multiple which deletes many nodes from 3 node tables in just one query. AFAICT, this patch is neutering that.
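A node_save()-style transaction, as suggested here, would make each deletion atomic. A sketch using the D7 database layer; the function name is hypothetical and the table list is abbreviated.

```php
<?php

/**
 * Sketch: delete one node's database records atomically.
 */
function _node_delete_atomic($nid) {
  $transaction = db_transaction();
  try {
    db_delete('node')->condition('nid', $nid)->execute();
    db_delete('node_revision')->condition('nid', $nid)->execute();
    // ... other per-node cleanup would go here ...
  }
  catch (Exception $e) {
    // Roll back so a failure cannot leave a half-deleted node behind.
    $transaction->rollback();
    throw $e;
  }
  // The transaction commits when $transaction goes out of scope.
}
```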

mr.baileys’s picture

Status: Needs review » Needs work

Needs work for #43:
@moshe

I'm not OK with the amount of defensiveness we are introducing.

are you referring to the fact that we're blowing the cache on each delete (whereas node_delete_multiple only cleared the cache after deleting all nodes) or the batch size limit?

catch’s picture

Status: Needs work » Needs review

It feels like we could keep the existing node_delete_multiple() logic up to a certain number of items - let that be set by a variable, then use the queue for any nids or cids left over, at which point we only want to do one at a time for simplicity.

catch’s picture

Status: Needs review » Needs work

cross-posted.

moshe weitzman’s picture

I'm not too keen on dropping into one-at-a-time mode, even during cron. We should batch there too... I have no opinion about when we clear the static cache.

mr.baileys’s picture

Status: Needs work » Needs review

Ok, what about this approach (for node deletions):

  1. Load all the nodes to be deleted via node_load_multiple(), keeping the performance benefit over individual node_load() calls;
  2. Queue all the nodes loaded in the previous step;
  3. Delete all the nodes in one go, keeping the performance benefit over individual deletes;
  4. Individually process "clean-up" for each node (clean-up: invoke hook_node_delete(), remove from search index, etc.)

This basically follows the current approach in HEAD, where all nodes are deleted from the database first, and then clean-up is done node-per-node. The only difference is that this node-per-node post-delete-processing is now batched/queued.

Advantages:

  • We keep the performance benefits of node_delete_multiple() introduced in #457080: Add node_delete_multiple()
  • More robust: if processing is halted during the clean-up phase, it'll be resumed at the next cron run

Disadvantages:

  • Nodes are deleted immediately, but in some cases artefacts of those nodes might linger until clean-up during a later cron run. Still better than the current situation though, where these artefacts never get removed if processing is halted.
  • Since we need to queue the full node object for each node, we are basically undoing much of the performance improvements from the node_delete_multiple() call, since there's a single DB call for each createItem call. Maybe we should introduce DrupalQueueInterface::createMultipleItems() ;)

Comparing this to catch's suggestion (do XX deletes in one go, process the rest individually) or to not queuing at all, I honestly don't know what the best solution would be.

Also, this issue was originally marked critical because of the missing hook_comment('delete') invocation on node deletes. It seems that this has since been addressed via comment_node_delete(), so should this issue still be considered critical?

catch’s picture

If we can stick arbitrary stuff into the queue items, what about chunking the $nids array at a certain size, then adding an array of nids for each queue item instead of one at a time? With a variable batch size that could then be 50 or 500 at a time - we'd want a different variable for nodes and comments.

On creating queue items, it's a shame that it requires a db query per item, but it's also possible to use a non-SQL queue backend (and one already exists in mongodb module), so high performance sites can skip some of that cost. I'm not sure that's too much of an issue.

I think the reason this is critical is because of user_cancel() which can potentially lead to the deletion of every node and every comment a user has ever posted. Cancelling a user with thousands of nodes and comments is an edge case though, so certainly this would be major if and when we get that priority.

Status: Needs review » Needs work

The last submitted patch, queued_deletes.patch, failed testing.

mr.baileys’s picture

Status: Needs work » Needs review

Ok, reworked this using catch's approach (just doing nodes for now, comments are easy once we've settled on an approach):

  • The array of node IDs to be deleted is split up into chunks of size node_delete_batch_size (variable with a default of 50) or smaller;
  • the first batch is processed immediately, all further batches are queued for processing during cron;

Advantages:

  • Since instead of queueing every nid/node we are queueing batches of 50 nids, the performance hit of the queueing should be negligible (one query per batch of nodes). It is possible to
    • Increase the batch size;
    • Use/implement a queue that's faster than the SystemQueue (for example the MongoDB implementation catch refers to)

    to further improve performance.

Disadvantages:

  • Dead node walking: since only the first batch is processed immediately, there can be a number of nodes lingering until the next cron run(s). Compare this to #48 where all nodes were deleted immediately, but related info was cleaned up later. This might lead to "WTF, I deleted 500 nodes and 450 are still there!"-moments.
  • This probably requires string changes in the interface to alert users of the above issue.
catch’s picture

Thanks for rolling this one, overall this feels like a better balance of performance and scaling to me.

I really like that _node_delete_multiple() is just a renamed node_delete_multiple(). On the other hand, node_delete_multiple() starts to look a bit weird when all it does is build a queue and hand it off to a worker. This makes me begin to think we might want to instead add a new node_delete_batch() function - which we call when there's not a finite amount of nodes to delete, leaving node_delete_multiple() as a top level API function. Depends whether we want to give people the choice or not.

String changes - yeah it'd definitely be weird to see comments or nodes hanging around after you deleted them, we might want to even give people a run-cron link as well?

chx’s picture

Status: Needs review » Needs work

I am deeply concerned by the fundamental misunderstanding of what this queue API is good for. Did we not document it enough? We mention that "there is no guarantee that items will be delivered on claim in the order they were sent" and that "The system also makes no guarantees about a task only being executed once" -- hm, that needs to be changed to "The system also makes no guarantees about a task being executed only once, or at all".

It's awesome for cron-ish tasks that no one cares exactly when they get executed, like aggregator refresh. It's less awesome when you want some content to disappear for good. If you want to use the queue for that, you had better bolt down the queue backend to SQL or face some frightening consequences, like the delete simply not happening.

catch’s picture

Assigned: dev.cmswebsiteservices » Unassigned

Well, currently the delete won't happen if you nuke a user with thousands of comments and hit the PHP max execution time. However, I agree that trading "might not delete if you hit max execution time" for "might not delete due to a known limitation in the API" isn't a good plan.

That leaves us creating a comment_delete_batch() which only ever gets triggered from submit handlers - would solve the user_cancel() case. If you trigger a delete from command line max_execution_time isn't an issue anyway. The other option would be moving this to D8 and creating a distinction between lossy and non-lossy queue implementations (and possibly some cleverness to try to maximise work done on the form submission in cases like this where batch processing is a last resort anyway - mikey_p mentioned that jobqueue module has something like this already).

chx’s picture

Assigned: dev.cmswebsiteservices » Unassigned
mr.baileys’s picture


@chx: While I was aware of the "no guarantee about order" and "possible duplicate execution" risks with the Queue API (and I don't think either of those poses a problem for this issue), I did not know that the API offers no guarantee of execution at all (in which case, yes, that should be documented ;) ). I've opened #784962: DrupalQueueInterface offers no guarantee about execution. for this.

Re-rolled the patch with catch's suggestion of introducing node_delete_batch. Leaving this to needs work to address the queue issue. If I understand it correctly, we have the following options:

  1. Go with the node_delete_batch option, but ensure that SystemQueue is used (or at least ensure that the used Queue implementation guarantees execution). Using the variable queue_class_node_deletes allows us to set the Queue implementation for the mass node deletion while still allowing it to be overridden.
  2. Go with the batch implementation, but ensure that it's only used on form submits.

I like the visual feedback offered by the batch solution, although when deleting multiple users and a significant number of nodes on large sites, the queued approach might be better.

yched’s picture

As a nitpick, '*_batch' in function names is currently always related to "Batch API" batches. Batch and Queue APIs being "kind of adjacent but clearly distinct", using the word 'batch' for a function using queues is really confusing.
I know, an API stealing a common-language word is always unfortunate (Views, Field :-p). No other candidate showed up when designing the Batch API...

jbrown’s picture

Doing all mass node / comment operations with batch api / progress bar would be very cool.

chx’s picture

Would be cool -- but we do not want API functions to fire up batch API.

jbrown’s picture

Progress bar could be implemented for the Update button on admin/content as part of hook_node_operations() .

admin/content/comment also

YesCT’s picture

Status: Needs work » Needs review

wondering if the test bot will notice the patch

YesCT’s picture

FileSize
5.44 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch queued_deletes_5_0.patch. View

re-attaching the patch from #57

YesCT’s picture

Oops, I might not have needed to re-attach it. (Sorry, trying to learn the test bot.)

Status: Needs review » Needs work

The last submitted patch, queued_deletes_5.patch, failed testing.

catch’s picture

There's a whole series of related issues to this:

#814990: node_mass_update() can't rely on a browser session means that node_mass_update() was changed from a form submit helper to an API function, but API functions can't rely on batch API.

#814884: comment_user_cancel() should use comment_save() needs to happen, but would be better with the equivalent of a node_mass_update(), if node_mass_update() was in a fit state to be copied.

So... we need all multiple updates, for all entities, to go through a helper, which at this point in the cycle should probably use batch if it's possible, and if not just load and update/delete everything and hope.

YesCT’s picture

In preparation for the new "major" issue priority, I'm tagging this (not actually critical) issue as priority-major.
See: http://drupal.org/node/669048#comment-3019824 (#669048: We should have four priorities not three (introduce a new 'major' priority)) for more information about the coming major issue priority.

chx’s picture

DrupalReliableQueueInterface at your service.
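For reference, Drupal 7 lets a caller insist on a backend implementing DrupalReliableQueueInterface by passing TRUE as the second argument to DrupalQueue::get(); if the site-configured queue class is lossy, core falls back to the database-backed SystemQueue. A minimal sketch (the queue name 'entity_delete' is illustrative, not from any patch here):

```php
<?php
// Request a queue whose backend must implement
// DrupalReliableQueueInterface. If the configured queue class for this
// name is a lossy backend, DrupalQueue::get() falls back to the
// reliable SystemQueue implementation.
$queue = DrupalQueue::get('entity_delete', TRUE);
$queue->createItem(array('entity_type' => 'node', 'ids' => array(1, 2, 3)));
```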

catch’s picture

Version: 7.x-dev » 8.x-dev
Category: bug » task

I've given this a lot of thought, and we can neither use batch API nor queue API for this as they currently exist in D7 - both are at least as risky as just a PHP timeout, so I'm moving this to Drupal 8.

Damien Tournoud’s picture

I opened #832572: Add a standard worker queue as a push forward to provide a way for modules to easily register long-running bulk operations.

marcingy’s picture

Priority: Critical » Major

Changing to major as per tag.

MustangGB’s picture

Tag update

tizzo’s picture

FileSize
9.17 KB
FAILED: [[SimpleTest]]: [MySQL] 23,268 pass(es), 6 fail(s), and 4 exception(es). View

This patch is based on the central job queue chx posted at #832572: Add a standard worker queue.

It adds a simple bit of logic to node, user and comment delete to split up deletes of more than 100 items and add the excess to the central job queue for a later delete_multiple, without otherwise altering the guts of any of the delete_multiple stuff and without introducing any new functions (aside from the ones added by chx's patch).
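The split described here might look roughly like the following inside node_delete_multiple(); the threshold of 100 and the 'node_delete' queue name are taken from this comment's description, not a final API:

```php
<?php
function node_delete_multiple($nids) {
  if (count($nids) > 100) {
    // Queue everything beyond the first 100 items for processing on
    // cron, in chunks so each worker run stays small.
    $remaining = array_slice($nids, 100);
    $queue = DrupalQueue::get('node_delete', TRUE);
    foreach (array_chunk($remaining, 100) as $chunk) {
      $queue->createItem($chunk);
    }
    // Delete only the first 100 items immediately.
    $nids = array_slice($nids, 0, 100);
  }
  // ... existing deletion logic for $nids ...
}
```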

tizzo’s picture

Status: Needs work » Needs review
tizzo’s picture

FileSize
8.31 KB
PASSED: [[SimpleTest]]: [MySQL] 23,312 pass(es). View

I inadvertently added some code to delete all nodes of a type with this patch; that should really be separate, at this issue: #874000: Automatically remove module's nodes upon module uninstallation. Updating without that portion.

tizzo’s picture

FileSize
4.59 KB
PASSED: [[SimpleTest]]: [MySQL] 23,320 pass(es). View

Cleaned up the patch a bit and removed one more thing that found its way in from #874000: Automatically remove module's nodes upon module uninstallation.

moshe weitzman’s picture

Title: Use queue API for node and comment multiple deletes » Use queue API for node, comment, and user multiple deletes
Version: 8.x-dev » 7.x-dev

Now this is a terrific bug fix that addresses the root cause. I'm going to ask chx to take a look.

tizzo’s picture

FileSize
4.25 KB
PASSED: [[SimpleTest]]: [MySQL] 23,324 pass(es). View

Forgot to take out the stuff I had added to node_cron before using chx's central queue. Updating...

sun’s picture

+++ modules/comment/comment.module	27 Aug 2010 17:43:55 -0000
@@ -1578,6 +1578,14 @@ function comment_delete($cid) {
+  if (count($cids) > 100) {

Ideally, we'd perhaps use a (single) 'queue_bulk_threshold' (or similar) system variable for that 100 and set the default to 20. We should also account for a variable value of 0, which would mean that the entire update goes through the queue.

Powered by Dreditor.
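A sketch of that suggestion (variable name from the comment above; the 0-means-queue-everything semantics are sun's proposal, not implemented code):

```php
<?php
// Site-wide threshold above which deletes go through the queue.
// A value of 0 would mean the entire operation is queued.
$threshold = variable_get('queue_bulk_threshold', 20);
if ($threshold == 0 || count($cids) > $threshold) {
  // Send all items (threshold 0) or only the excess beyond the
  // threshold into the queue for processing on cron.
}
```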

chx’s picture

Status: Needs review » Needs work

#79 already needed work, but this definitely needs comments to mention that after the if we have fewer than 100 or 20 or whatever.

tizzo’s picture

Will reroll tomorrow as per feedback.

sun.core’s picture

Still a major problem. @tizzo, any chance to resurrect the issue?

tizzo’s picture

@sun.core Yeah, this totally fell off my radar. I'll pick it up and run with it tomorrow AM.

I also have a fix that uses a batch operation to delete all the nodes of a type when you delete the type at the time. So this would work for programmatic deletions and that would work on manual ones. Assuming that hasn't been fixed.

catch’s picture

Version: 7.x-dev » 8.x-dev

Moving to Drupal 8, we could backport if it goes in there.

catch’s picture

Component: comment.module » entity system
franz’s picture

Status: Needs work » Needs review
FileSize
3.62 KB
PASSED: [[SimpleTest]]: [MySQL] 32,866 pass(es). View

I've reworked the patch and refactored the code. I believe this way we simplify and improve clarity. Also, the new function added contains documentation to explain #80 and a single reference to the variable suggested in #79.

This can be widely used for a whole range of mass operations. I've been directed to this issue, as an example, from #1065814: Vocabulary terms remain after removing field from content type which I'd like to make use of this implementation. This can become a robust way to work with any mass operation.

franz’s picture

I've read the whole thread, and now I'm confused. Why does the code in here duplicate what is being addressed at #832572: Add a standard worker queue?

franz’s picture

This would be easier if I had a better summary =)

moshe weitzman’s picture

Status: Needs review » Needs work

This still applies fine but doesn't seem to work quite right. I generated 40 nodes with devel_generate and then attempted to delete them all in one click using admin/content. That worked without this patch but with the patch it takes two tries. It seems to delete half of them during each submission. If this is expected behavior, the drupal_set_message() is wrong (it says that we deleted 40 during the first try).

franz’s picture

Status: Needs work » Needs review
franz’s picture

Issue tags: +Needs tests

I'll try this manually and we can then write some automated tests with this.

Well, it should do 20 each time, and it should run the 2nd time on cron.

franz’s picture

Ignore this comment, wrong issue...

franz’s picture

Status: Needs work » Needs review
FileSize
20.82 KB
5.29 KB
PASSED: [[SimpleTest]]: [MySQL] 32,886 pass(es). View

I changed a couple of things here.

First, the original worker split function was returning the last set and not the first (the last set is usually smaller than threshold).

Second, I changed the node_delete_multiple to return the number of nodes actually deleted. Then I used this to warn the user if content removal was enqueued. I'm providing a screenshot to demonstrate.

franz’s picture

Status: Needs review » Needs work

If that's a good logic, need to apply to other bulk deletes...

moshe weitzman’s picture

Status: Needs review » Needs work

In general, this is a good approach IMO. The wording and code comments could use some work; I will do that in a follow-up. But first, I am having doubts about a global variable 'queue_bulk_threshold'. Some multiple operations will be very light and thus their threshold could be much higher. Could we not leave this up to the caller somehow?

franz’s picture

When working on wording, you can make the "run cron" a link for that, I was too lazy to do it. =P

For comments, nodes and users the threshold can be the same, but it's true, sometimes it would be best to have an override as an argument.

Devin Carlson’s picture

I believe this issue addresses part of #1270100: Search module loads every comment into memory.

franz’s picture

Status: Needs work » Needs review
FileSize
7.81 KB
PASSED: [[SimpleTest]]: [MySQL] 33,064 pass(es). View

I've applied same logic to comments and users. I'll set this to needs review to get a review on the current code first, then we can work on wording and on tests.

dixon_’s picture

subscribing

moshe weitzman’s picture

I think the approach is good.

The word 'enqueued' is rarely used. I would say 'Y comments have been scheduled for deletion. You may run cron to delete them immediately.'

Désiré’s picture

FileSize
7.95 KB
PASSED: [[SimpleTest]]: [MySQL] 33,802 pass(es). View

Here is a rerolled patch for further work.

Status: Needs review » Needs work
Issue tags: -Performance, -Needs tests, -Needs issue summary update

The last submitted patch, 89181-bulk-operations-using-reliable-job-queue-101.patch, failed testing.

franz’s picture

Status: Needs work » Needs review
Issue tags: +Performance, +Needs tests, +Needs issue summary update
xjm’s picture

Status: Needs review » Needs work

Thanks for working on this patch. Looks like the rerolled version now passes testbot.

I took a close look at the documentation and UI text and found several things that should be cleaned up a bit. Reference: http://drupal.org/node/1354#functions

+++ b/core/modules/comment/comment.admin.inc
@@ -161,7 +161,10 @@ function comment_admin_overview_submit($form, &$form_state) {
+      drupal_set_message(format_plural(count($cids) - $deleted, 'Removal of 1 comment was enqueued to be run in background. Run cron to finish removal.', 'Removal of @count comments was enqueued to be run in background. Run cron to finish removal.'), 'warning');

@@ -228,11 +231,16 @@ function comment_multiple_delete_confirm($form, &$form_state) {
+      drupal_set_message(format_plural($count - $deleted, 'Removal of 1 comment was enqueued to be run in background. Run cron to finish removal.', 'Removal of @count comments was enqueued to be run in background. Run cron to finish removal.'), 'warning');

+++ b/core/modules/node/node.admin.inc
@@ -587,10 +587,15 @@ function node_multiple_delete_confirm($form, &$form_state, $nodes) {
+      drupal_set_message(format_plural($count - $deleted, 'Removal of 1 post was enqueued to be run in background. Run cron to finish removal.', 'Removal of @count posts was enqueued to be run in background. Run cron to finish removal.'), 'warning');

As has been mentioned above, the word "enqueue" is uncommon (even if it is delicious). Also, the message should be clarified a bit. Perhaps:

@count posts have been scheduled for deletion on the next cron run. Run cron now to finish deleting these posts immediately.

+++ b/core/modules/comment/comment.admin.inc
@@ -228,11 +231,16 @@ function comment_multiple_delete_confirm($form, &$form_state) {
+    // In case the jobs get enqueued, we warn user.

+++ b/core/modules/node/node.admin.inc
@@ -587,10 +587,15 @@ function node_multiple_delete_confirm($form, &$form_state, $nodes) {
+    // In case the jobs get enqueued, we warn user.

I would change this to:

Notify the user if items were added to the queue instead of being processed immediately.

+++ b/core/modules/comment/comment.module
@@ -1598,8 +1598,14 @@ function comment_delete($cid) {
+  // Enqueue larger operations.

+++ b/core/modules/node/node.module
@@ -1189,10 +1189,16 @@ function node_delete($nid) {
+    // Enqueue larger operations.

+++ b/core/modules/user/user.module
@@ -2414,9 +2414,15 @@ function user_delete($uid) {
+    // Enqueue larger operations.

I'd change these to:

Use a queue to process large datasets.

+++ b/core/modules/node/node.module
@@ -1234,7 +1240,11 @@ function node_delete_multiple($nids) {
+    // Return the number of nodes deleted
+    return count($nids);
   }
+  return 0;

This comment should have a period. Also, let's add an inline comment above the return 0 line. Perhaps:

No nodes were passed to delete.

+++ b/core/modules/system/system.module
@@ -3031,6 +3031,50 @@ function system_cron() {
+ * Simple job queue worker.
...
+ * Helper function. Conditionally splits big sets of data into chunks before

Both of these need one-line summaries, 80 characters or less and beginning with a third-person verb, that explain what the function does.

For the second, I'd suggest:

Splits large datasets into smaller subsets for processing.

The words "helper function" can also be removed.

+++ b/core/modules/system/system.module
@@ -3031,6 +3031,50 @@ function system_cron() {
+ * processing them. In case they are bigger than the allowed threshold -
+ * variable 'queue_bulk_threshold', we add the remaining to the queue and
+ * return just the first set.

I'd add a blank line after the summary I suggest above, and then reword this as:

When the dataset is larger than the allowed threshold defined by the 'queue_bulk_threshold' configuration variable, a subset is returned and the remaining items are added to the queue for later processing.

+++ b/core/modules/system/system.module
@@ -3031,6 +3031,50 @@ function system_cron() {
+ * @param $callback - The callback for the job being added
+ * @param $items - An array of items to be processed
+ *
+ * @return chunk with items to be processed, not bigger than the threshold

The descriptions for @param and @return should be on separate lines, indented by two spaces. Also, these should be capitalized and have periods. For the @return, I'd suggest changing the text to:

A subset of items to be processed that does not exceed the bulk threshold.

+++ b/core/modules/system/system.queue.inc
@@ -94,6 +94,18 @@ class DrupalQueue {
+   * Add a simple job to the queue.

Should begin with "Adds..."

+++ b/core/modules/system/system.queue.inc
@@ -94,6 +94,18 @@ class DrupalQueue {
+   *   The name of the function to be called when the queue is processed next.

I think this should be "when the queue is next processed" or just "when the queue is processed"?

+++ b/core/modules/system/system.queue.inc
@@ -94,6 +94,18 @@ class DrupalQueue {
+   *   The arguments to the callback.

I'd change this to:

(optional) An array of arguments to be passed to the callback function. Defaults to array().
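Putting xjm's documentation suggestions together, the helper the patch adds to DrupalQueue might read roughly as follows. The method name addJob() and the 'system_jobs' queue name are guesses based on this review, not confirmed API:

```php
<?php
/**
 * Adds a simple job to the queue.
 *
 * @param $callback
 *   The name of the function to be called when the queue is processed.
 * @param $arguments
 *   (optional) An array of arguments to be passed to the callback
 *   function. Defaults to array().
 */
public static function addJob($callback, array $arguments = array()) {
  // Use a reliable backend so queued jobs are guaranteed to run.
  $queue = DrupalQueue::get('system_jobs', TRUE);
  return $queue->createItem(array(
    'callback' => $callback,
    'arguments' => $arguments,
  ));
}
```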

franz’s picture

Status: Needs work » Needs review
FileSize
8.46 KB
PASSED: [[SimpleTest]]: [MySQL] 33,808 pass(es). View

Let's see, now with those suggested changes.

franz’s picture

FileSize
8.44 KB
PASSED: [[SimpleTest]]: [MySQL] 33,796 pass(es). View

Removing whitespace...

moshe weitzman’s picture

Status: Needs review » Needs work

"1 comment have been scheduled" should be s/have/has.

We should not use 'warning' when content has been enqueued. That's a normal occurrence. Just remove the second param to drupal_set_message().

franz’s picture

Status: Needs work » Needs review
FileSize
8.41 KB
PASSED: [[SimpleTest]]: [MySQL] 33,980 pass(es). View

Ok, fixed those minor issues.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

Thanks. 'Tis ready.

xjm’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/system/system.queue.inc
@@ -94,6 +94,18 @@ class DrupalQueue {
+   * Add a simple job to the queue.

Mmm, somehow this got to be "Add" again instead of "Adds." (Sorry!) Totally RTBC when that is fixed.

Dave Reid’s picture

I think this might need some usability review, since we'd have a regression where content would still be listed and visible even though the user ran the delete operation, especially since we do not have a concept of 'trashed' in core.

xjm’s picture

Status: Needs work » Needs review
FileSize
441 bytes
8.42 KB
PASSED: [[SimpleTest]]: [MySQL] 33,984 pass(es). View

1-character adjustment. :P

xjm’s picture

Issue tags: +Needs usability review

So to clarify Dave Reid's comment, if you delete 500 nodes and they're still visible (until cron), that could be confusing, even if there's a message. (Because who reads UI text really?) So we might want to consider that.

Tagging.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

What is a "usability review"? Do we have to wave a magic wand and ask Yoroyjan to come in and bless this node?

Anyway, we don't really have a choice but to do this. We are crippling Drupal by not having this. We don't allow you to delete a text format because of this problem. We can't, in the context of a real-time web request, find and migrate all your content that uses that text format. Similarly, we can't really find and migrate all the content that might have been authored by a user that is cancelling.

As for usability improvement, we might consider polishing Queue UI and putting it into core. That way folks can monitor the progress of their deletes.

Sure, this is a change for long time users of Drupal. They cope with this sort of change all the time.

franz’s picture

I agree with moshe.

The thing is, we can work on the UX later after this comes in (another issue). We have many issues that are just waiting for this to be pushed.

aspilicious’s picture

What if you want them to disappear now? Do you always have to unpublish first?

catch’s picture

See #1189464: Add a 'poor mans queue runner' to core for instant deletes. I haven't reviewed the patch here yet.

webchick’s picture

Could we link the "Run cron" text here? As of D7+ you can no longer just manually go to cron.php, so that would be a helpful shortcut.

xjm’s picture

Assigned: Unassigned » xjm

Hrm, I swear I suggested that earlier. I'll reroll with that.

xjm’s picture

+++ b/core/modules/system/system.module
@@ -3032,6 +3032,54 @@ function system_cron() {
+function system_workers_bulk_split($callback, $items) {
+  $threshold = variable_get('queue_bulk_threshold', 20);

Might be better to define a constant rather than hardcoding 20. Also, beejeebus suggested it might be better to do something like

$site_wide_threshold = variable_get('queue_bulk_threshold', 20);
$threshold = variable_get('queue_bulk_threshold_' . $callback, $site_wide_threshold);

or

function system_workers_bulk_split($callback, $items, $threshold = 20)
beejeebus’s picture

Status: Reviewed & tested by the community » Needs review

following up on 120 - how many items we want to chunk really depends on the operation - it's really not a one-size-fits-all deal.

i'm not fussed about whether we pass in the threshold with a default, or allow for a per-callback override, as long as we don't force a single site-wide default.

i'm happy to reroll with those changes.

xjm’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
8.68 KB
PASSED: [[SimpleTest]]: [MySQL] 33,973 pass(es). View

This is just the message change with the cron link, because it's before 6:00a.

xjm’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
3.2 KB

Doh crossposts.

beejeebus’s picture

Status: Needs review » Reviewed & tested by the community

ok, cross post. we can add the overrideableness later if need be.

xjm’s picture

Status: Reviewed & tested by the community » Needs review

Okay, kids, don't make patches before 6:00a. Links in #122 are missing url(). Fixing that now.

xjm’s picture

FileSize
3.21 KB
8.69 KB
PASSED: [[SimpleTest]]: [MySQL] 33,985 pass(es). View
Dave Reid’s picture

Not sure what the path admin/logs/status/run-cron is but I'd assume we want the path admin/reports/status/run-cron? Also, we'll need a destination parameter on the link otherwise users clicking on 'run cron' will get redirected to 'admin/reports/status' (due to the drupal_goto() call). This could also pose some permission problems as users without the 'administer site configuration' permission will get an access denied when clicking the run cron link. This is very likely considering how separate the node edit/delete permissions are.

xjm’s picture

I copied the path admin/logs/status/run-cron from a similar message in system.api.php. Was that the path in some previous iteration of D7? If so we should fix it there as well.

The cron path itself already has:

    'access arguments' => array('administer site configuration'),

So I think actually, rather than a destination param, we need to only append that part of the message if the user has that permission?
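A sketch of the permission-gated message being discussed, using the real D7 helpers user_access(), url(), and drupal_get_destination(); the exact wording and path mirror the thread, not a committed patch:

```php
<?php
// Tell the user how many items were queued instead of deleted now.
$message = t('@count posts have been scheduled for deletion on the next cron run.', array('@count' => $count));
// Only offer the "run cron" link to users who can actually access it.
if (user_access('administer site configuration')) {
  $message .= ' ' . t('<a href="@cron">Run cron now</a> to finish deleting these posts immediately.', array(
    // The destination query parameter returns the user to the current
    // page instead of admin/reports/status after cron runs.
    '@cron' => url('admin/reports/status/run-cron', array('query' => drupal_get_destination())),
  ));
}
drupal_set_message($message);
```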

Dave Reid’s picture

I'll file an issue because that path in system.api.php is dead wrong. We will still need a destination parameter to avoid the unintentional redirect after clicking the run cron link.

xjm’s picture

Confirmed that admin/logs/status/run-cron does not exist, and opened #1339292: hook_requirements() documentation references incorrect path for cron in system.api.php.

xjm’s picture

Assigned: xjm » Unassigned
Status: Needs review » Needs work

I'll test/correct this tomorrowish if no one gets to this first. Unassigning for now.

xjm’s picture

Status: Needs work » Needs review
Issue tags: +Novice, +Needs manual testing
FileSize
6.41 KB
10.34 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch job-queue-89181-132.patch. Unable to apply patch. See the log in the details link for more information. View

Attached:

  1. Cleans up what looked to be a copypasta error where the word "comment" was used in place of "node."
  2. No longer invents non-existent paths for cron.
  3. Only shows the prompt to "run cron now" if the user has the administer site configuration permission.
  4. Uses drupal_get_destination() so that the cron link redirects properly.
  5. Adds a constant DRUPAL_QUEUE_DEFAULT_THRESHOLD for the default threshold of 20.
  6. Adds an optional $threshold parameter to system_workers_bulk_split()
  7. Clarifies a misleading comment (my fault).

Based on the past 20 comments, I think we should have a few people test this patch manually and report the results (concisely!). To test:

  1. Generate some nodes and comments with Devel.
  2. Log in as an administrator.
  3. Test deleting a few different things: nodes with comments, nodes without comments, comments by themselves or with children, etc.
  4. Test deleting a few different numbers of items at a time: 1, 10, 20, 30, 50.
  5. Ensure that the message text is accurate in each case.
  6. Test the link to run cron.
  7. Confirm that the items are deleted.

Screenshots would be helpful for reviewing the issue (but use a small browser window and crop images to only the relevant portion of the screen). Make note of anything that is confusing or unexpected.

I'll have people work on this during office hours on Wednesday if no one gets to it before then.

Dave Reid’s picture

What is a "usability review"? Do we have to wave a magic wand and ask Yoroyjan to come in and bless this node?

For you, it is easy to understand what you need to do to 'run cron' but for a lot of our end users who will be the ones actually doing the deleting of content in production, it is a much harder concept. I feel adding this queue functionality without first implementing #1026908: Trash bin or #147723: Deletion API for core is flawed and doing our users a disservice.

Anyway, we don't really have a choice but to do this. We are crippling Drupal by not having this.

We do have a choice. We can step back and ask how we should actually be handling deletions. We've had how many major versions of Drupal with the current behavior so far, and we've all been fine?

droplet’s picture

big usability issue.

I read the code; it looks like there are no UI changes, only an info message. The first time I used VBO + admin_view, which use the Queue API for deletion, I was totally lost and thought the module was broken.

EDIT: I did a real test, and it has UI changes.

tizzo’s picture

It seems like anywhere we're triggering this activity from the UI we should be running a batch operation in real time.

bojanz’s picture

I agree with #135.

yoroy’s picture

Issue tags: -Needs usability review

So the move from where deleting actually means deleting to this new situation where deleting means 'will delete soonish' (press here for now) is a step back in UX.

beejeebus was kind enough to check for me whether content gets unpublished immediately, and it seems not.

Relying on a link in a message is brittle then, because I will of course simply not see that. If these content items are not unpublished and/or not somehow labeled and styled as being in a scheduled-for-delete state, that's a serious UXWTF, because I just deleted all those and there they still are?

Do we want to move towards a model where you have a trash can with automatic deletion options? Does cron not run too often for maintaining a reliable trash can? I mean, on my computer I manually empty trash, spam in gmail gets auto-deleted after what is it, 30 days? If there's only a likely *short* intermediate stage where stuff is not deleted yet I don't think that ultimately maps to a trash can metaphor. Triggering an automatic delete would be more fitting then.

I think we need a clearer action plan in follow-ups on what needs to be done if this bit of plumbing is to be put in place first. Right now this intermediate state seems to introduce a temporary black hole.

## About the user interface text

Good to see the 'enqueued' word being designed away already :)

The messages can be less wordy. Currently:

1 comment has been scheduled for deletion on the next cron run. <a href="@cron">Run cron now</a> to finish deleting this comment immediately.

Shorter:

1 comment scheduled for deletion. <a href="@cron">Run cron to delete it now</a>.

Is 'trigger an immediate batch when done through the ui' a viable option or can of worms?

yoroy’s picture

#132: job-queue-89181-132.patch queued for re-testing.

xjm’s picture

So yoroy and I talked about this. It would probably be better from a UX standpoint to divorce this a bit from cron. Workflow:

  1. User selects a large number of nodes to delete.
  2. Confirmation form:

    Are you sure you want to delete 300 nodes? This operation may take a long time. (Or something. See node access rebuild for an existing example.)

    This action cannot be undone.
    ( Delete now ) Cancel
    Schedule for automatic deletion later

    That's the general idea, though the exact text/markup/layout of the form needs refinement. yoroy said he'd look into it a bit.

    • If the user clicks the "Delete now" button, it initializes and runs a progressive batch.
    • Cancel is obvious.
    • If the 3rd link, then it uses the cron queue in the current patch.
yoroy’s picture

FileSize
45.99 KB
58.08 KB
48.44 KB
34.32 KB

For example:

Confirmation page with a 'Delete' button, a 'cancel' link, and 'Schedule big deletions with the [link]Queue API[/link]'.

By all means move the plumbing in, but at this time show restraint and keep current core workflows unchanged, or mimic them to appear so. That 'Queue API' link has to go somewhere too; I don't know where yet.

catch’s picture

To make the deletes instant (or nearly instant) we can do #1189464: Add a 'poor mans queue runner' to core.

The workflow then would be:

1. Submit the form to delete the stuff (same as now)
2. Form submit puts the entities into a deletion queue (same as the patch here)
3. Form submit makes a non-blocking http request to a queue runner, which then deletes as many entities as possible within 30 seconds.
4. While #3 is running, the form submission continues and the next page gets loaded. It would also be possible to then set a message if there are still some items left in that queue to inform people that they'll be deleted when cron runs.

For UX, when deleting < 100 items or possibly more this is probably going to look exactly the same as now (depends how many items the queue manages to delete before the next page loads).

This still means that if you're trying to delete several hundred, or thousands of nodes/comments at once, that some will be sitting there waiting for the next cron run, but that is only going to happen for extremely long comment threads.

So, if we're able to put that in, then there would be no UX regression 90% of the time. For the other 10% of the time, it's balancing having some items left in the queue, compared to what would happen now which is the form submit hangs for 30 seconds then times out or wsods.
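One common way to implement the non-blocking request in step 3 is a fire-and-forget socket: write the request and close without reading the response. The '/queue-runner' path and this helper name are assumptions for illustration; the actual mechanism is being worked out in #1189464:

```php
<?php
/**
 * Pings a hypothetical queue runner endpoint without waiting for it.
 */
function _queue_runner_ping($host) {
  // One-second connect timeout so the form submission never hangs.
  $fp = @fsockopen($host, 80, $errno, $errstr, 1);
  if ($fp) {
    fwrite($fp, "GET /queue-runner HTTP/1.0\r\nHost: $host\r\n\r\n");
    // Deliberately skip reading the response: the runner keeps deleting
    // queued entities while the current page request continues.
    fclose($fp);
  }
}
```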

xjm’s picture

Issue tags: -Novice, -Needs manual testing

@catch: So are you saying you'd like to see #1189464: Add a 'poor mans queue runner' to core fixed first? I do think the workflow in #140 is much better whatever we do, and it wouldn't require too much additional work on the existing patch here.

Untagging, 'cause we have a new direction or two to follow before this is ready for testing. (Also, we need test coverage.)

bojanz’s picture

The question is, how much do we want to solve in one issue.

One thing that we started doing here is:
Modify delete_multiple API functions to queue excess items
That's cool. And that's where #1189464: Add a 'poor mans queue runner' to core comes in handy, though it's not critical to have it right away.
I'm wondering, though, if the API changes we are proposing are good, because they change the meaning of "delete multiple" to "delete some now and do some magic to delete the others later in order to make your life easier". Perhaps it might make sense to have it in separate functions? I'm growing more and more used to the idea in the latest patch, but it still feels a bit unexpected.

I see the UI multiple deletion as a separate thing. I think the user should never know about queues, delayed deletions, scheduling. There's no reason for that.
When multiple items are being deleted from the UI, it still makes sense to fill the queue, but the queue should then automatically be processed by batch api.
The problem is, we are filling one queue with all delete operations across the system, which makes sense for non-UI deletions, but doesn't for UI deletions, because batch api can't be precise. If I'm deleting 20 nodes I don't want to have it delete pending files or products as well.
Would make sense to generate a queue per operation, then pass that queue name to batch api. Which means that we'd have to go around the code in delete_multiple, once again raising the question of whether that code should be separate.

There's also the question of queueing and batching any core action, not just deletion, so this really smells like "VBO in core" (at least the BO part :P).

moshe weitzman’s picture

Anyone care to push this along or state what they think should happen (as few dependencies as possible, please).

YesCT’s picture

I think we still need direction and next steps here.

YesCT’s picture

@univate thanks for the issue summary update. (updating tags)

I think VBO got in. Does that help here? (reading the end of #143)

bojanz’s picture

VBO in core currently uses no batching / queueing.
It's definitely relevant though (if we use views + vbo for every core listing, we only implement the batching / queueing once)

Berdir’s picture

Issue tags: -Performance, -Needs tests

#132: job-queue-89181-132.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, job-queue-89181-132.patch, failed testing.

Anonymous’s picture

Issue summary: View changes
joachim’s picture

What happens when one of the queued deletions then incurs more multiple deletions?

Say I queue 200 nodes for deletion, and each node has 200 comments.

Do a node's comments need to be deleted before the node itself? If so, how can one of the 20 items in the current queued item handle that?

vijaycs85’s picture

+++ b/core/modules/system/system.module
@@ -3032,6 +3037,57 @@ function system_cron() {
+  $threshold = $threshold ? $threshold : variable_get('queue_bulk_threshold', DRUPAL_QUEUE_DEFAULT_THRESHOLD);

Maybe config('system.settings')->get('queue_threshold')?
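For reference, the two styles being compared (the system.settings config object and queue_threshold key are the reviewer's suggestion, not an existing API):

```php
// Drupal 7 style, as in the current patch:
$threshold = variable_get('queue_bulk_threshold', DRUPAL_QUEUE_DEFAULT_THRESHOLD);

// Drupal 8 configuration system style being suggested:
$threshold = config('system.settings')->get('queue_threshold');
```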

joachim’s picture

I've been pondering this problem, as it's come up over at Flag module, where we possibly need to delete a large number of flagging entities when a user account is deleted, or when a node and its comments are deleted.

What I've come up with is this: the entity delete system should have two phases: a simulation phase, and an actual deletion phase. In the simulation phase, the hooks are invoked, but with a parameter to tell implementations not to delete anything, but to merely assemble the entities and update a count of how many entities are involved in total. If this count reaches a predefined limit (from experience on large sites I'd say 50; maybe 200 if your site's timeout is set high), the whole process breaks off and reports that the limit has been reached.

This would get used for both the UI and API routes:

For the UI:

- The user edits a node and presses the 'delete' button
- They are taken to the delete confirmation form. This form runs the entity delete simulation. If the count is less than the limit, it's a plain delete form. If it's over the limit, then the form gives them the choice to either run a batch now, or queue the entities for deletion.

For the API, the caller of entity_delete() would be expected to call the simulation first, and would get a count back. Based on this, the caller can either run the deletion immediately, or queue it. Or for easier DX, we could add an option where you just say: entity_delete(ENTITY), do the best thing for it, and let me know what you've chosen.
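The two-phase proposal can be modelled in plain PHP (this is a runnable toy, not real Drupal code; hook implementations are stood in for by callbacks that take a $simulate flag and return how many entities they would touch):

```php
<?php

// Two-phase delete: ask every handler to simulate first and report a
// count; once the running total exceeds $limit, bail out and recommend
// queueing. Only if everything fits under the limit is the real
// deletion phase run.
function plan_entity_delete(array $handlers, $limit = 50) {
  $total = 0;
  foreach ($handlers as $handler) {
    // Simulation phase: count only, delete nothing.
    $total += $handler(TRUE);
    if ($total > $limit) {
      return array('mode' => 'queue', 'count' => $total);
    }
  }
  // Under the limit: run the actual deletion phase.
  foreach ($handlers as $handler) {
    $handler(FALSE);
  }
  return array('mode' => 'deleted', 'count' => $total);
}

// Example: deleting a node that has 10 comments and 5 flaggings.
$handlers = array(
  function ($simulate) { return 10; },  // stand-in for the comment hook
  function ($simulate) { return 5; },   // stand-in for the flag hook
);
$plan = plan_entity_delete($handlers, 50);
// 15 entities total, under the limit, so they are deleted immediately.
```

With a lower limit (say 8), the first handler alone pushes the total over, and the planner returns 'queue' without invoking the deletion phase at all.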

joachim’s picture

Thinking about this more, the threshold count should possibly be something passed in by the caller.

For example, on Flag module, if the flag is an AJAX flag, you really don't want to be doing more than deleting one entity, so the site feels responsive -- so anything more than that should be queued. If the flag is using the page reload method, you can delete a few more. If it's a confirmation form, then we can show the user the option of running the batch or queueing.

I think the idea of deleting some entities right away and queueing the rest warrants more thought. It certainly seems like it could be useful -- eg delete just the flagging now and queue up any chain reactions for later. But I'm concerned it could leave the relationships between entities in an unstable (or at least unexpected) state. The way round this might be for the simulation phase hook implementations to be able to say whether their entities are essential or ancillary.

xjm’s picture

Version: 8.0.x-dev » 8.1.x-dev
Status: Needs work » Postponed

This improvement should now be targeted for the next minor version per https://www.drupal.org/core/d8-allowed-changes#minor. Thanks!

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

lliss’s picture

Issue summary: View changes
joachim’s picture

Status: Postponed » Active

Unpostponing now that we have 8.2.x.

catch’s picture

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

alexpott’s picture

Priority: Critical » Major

We've decided to unblock #1239558: Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger by deleting non-default revisions prior to an entity delete. I think this is no longer critical because it is not blocking that issue.

jonathanshaw’s picture

Very similar issues occur with #2723323: When deleting an entity, references to the deleted entity remain in entity reference fields, suggesting there is a wider need for an architecture that can navigate the tension between providing immediate vs scalable cleanups.

jonathanshaw’s picture

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.