I've added ini_set('memory_limit', '1800M'); to the top of the function so it will run. The update takes a LONG time (15+ minutes), and in MySQL's SHOW PROCESSLIST the status says "System lock". I have 173,721 unpublished nodes in this database.

Files:
#20: drupal-2174551-18-taxonomy-update-7011-fix.patch (1.64 KB, David_Rothstein) — FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch drupal-2174551-18-taxonomy-update-7011-fix.patch. See the log in the details link for more information.
#14: drupal-2174551-14-taxonomy-update-7011-fix.patch (2.54 KB, mikeytown2) — PASSED: [[SimpleTest]]: [MySQL] 40,735 pass(es).
#13: drupal-fix_taxonomy_update_7011-2174551-13.patch (1.29 KB, hswong3i) — PASSED: [[SimpleTest]]: [MySQL] 40,724 pass(es).

Comments

mikeytown2’s picture

Issue summary: View changes
mikeytown2’s picture

Issue summary: View changes
heddn’s picture

To help with memory consumption, it should probably be rewritten as a count() query then loop over the results using db_query_range() to page through a thousand rows at a time.
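A rough sketch of that suggestion (hypothetical, not the committed patch; the 1000-row chunk size is illustrative). Since the deletes touch {taxonomy_index} but the paging query reads {node}, offset-based paging stays consistent:

```php
<?php
// Count once, then page through {node} with db_query_range() so the
// full nid list is never held in memory at once.
$count = db_query('SELECT COUNT(*) FROM {node} WHERE status = :status', array(':status' => NODE_NOT_PUBLISHED))->fetchField();
for ($offset = 0; $offset < $count; $offset += 1000) {
  $nids = db_query_range('SELECT nid FROM {node} WHERE status = :status', $offset, 1000, array(':status' => NODE_NOT_PUBLISHED))->fetchCol();
  if (!empty($nids)) {
    db_delete('taxonomy_index')
      ->condition('nid', $nids)
      ->execute();
  }
}
?>
```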

David_Rothstein’s picture

Priority: Normal » Major

Ugh, yeah. That's a lot of unpublished nodes. Hopefully most sites don't have anywhere near that many.

We could probably change the update function to run as a batch (with $sandbox), and also the initial query could be made to join against the {taxonomy_index} table (so it only pulls node IDs that actually need deleting; for many Drupal 7 sites this will be 0 anyway).
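A hedged sketch of that combined approach (not the actual patch; the chunk size of 1000 and the exact query are illustrative). In a Drupal 7 hook_update_N(), the &$sandbox parameter makes the update re-entrant:

```php
<?php
/**
 * Drop unpublished nodes from the index (batched sketch).
 */
function taxonomy_update_7011(&$sandbox) {
  // Join against {taxonomy_index} so only nids that actually need
  // deleting are selected; many D7 sites will get zero rows here.
  $nids = db_query_range('SELECT t.nid FROM {taxonomy_index} t INNER JOIN {node} n ON t.nid = n.nid WHERE n.status = :status', 0, 1000, array(':status' => NODE_NOT_PUBLISHED))->fetchCol();
  if (!empty($nids)) {
    db_delete('taxonomy_index')
      ->condition('nid', $nids)
      ->execute();
  }
  // Rows are deleted as we go, so each pass reads from offset 0 and
  // the batch ends when nothing comes back.
  $sandbox['#finished'] = empty($nids) ? 1 : 0;
}
?>
```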

David_Rothstein’s picture

I added this to the "known issues" section of the release notes and release announcement for now.

greggles’s picture

@mikeytown2 - did you run the update via update.php or drush updb? Do you have a memory limit on drush?

mikeytown2’s picture

First try was via update.php; that failed. Next was drush updb, which also failed until I gave PHP a lot of RAM. I then ran the update again from the CLI (drush), and it worked after waiting over 15 minutes for the query to complete.

Garrett Albright’s picture

For the record, and to save others from having to look the function up:

<?php
/**
 * Drop unpublished nodes from the index.
 */
function taxonomy_update_7011() {
  $nids = db_query('SELECT nid FROM {node} WHERE status = :status', array(':status' => NODE_NOT_PUBLISHED))->fetchCol();
  if (!empty($nids)) {
    db_delete('taxonomy_index')
      ->condition('nid', $nids)
      ->execute();
  }
}
?>
Jamie Holly’s picture

So converting this to the regular ole' SQL syntax that MySQL would ultimately get, using @mikeytown2's 173,721 nodes, you end up with something like:

DELETE FROM taxonomy_index WHERE nid IN (###, ...)

And just keep adding to that ### until you have 173,721 values in your IN statement.

Two areas I can see that this could be greatly improved upon. The first is using a batch, as @David_Rothstein suggested.

The second is eliminating any nodes that aren't in taxonomy_index on the initial query:

SELECT n.nid FROM {node} n LEFT JOIN {taxonomy_index} t ON n.nid=t.nid WHERE n.status=0 AND t.nid IS NULL;

That would really help out sites with a lot of unpublished nodes that also don't have any taxonomy assigned.

The other option is a subquery, but I'm not sure about support in other databases. This should perform better, though (if Mikey has a copy of that database around and could test, that would be great):

DELETE FROM taxonomy_index WHERE nid IN (select nid from node where status=0)
mikeytown2’s picture

SELECT n.nid
FROM node AS n
LEFT JOIN taxonomy_index AS t
  ON n.nid=t.nid
WHERE n.status=0
AND t.nid IS NULL;

Returns 161,262 nodes on one of our test servers with an older db backup. So in this case it most likely doesn't help reduce the total number of nids to remove.

SELECT *
FROM taxonomy_index
WHERE nid IN (
  SELECT nid
  FROM node
  WHERE status=0
)

Returns an empty result (which is correct as this is not a D6->D7 site).

mikeytown2’s picture

We would want an INNER JOIN, correct? This gives back an empty result:

SELECT n.nid
FROM node AS n
INNER JOIN taxonomy_index AS t
  ON n.nid=t.nid
WHERE n.status=0
AND t.nid IS NULL
Jamie Holly’s picture

HAHA you're right. That's what I get for typing without the proper amounts of coffee in me.

We would want nodes where an entry exists on taxonomy_index. No need for the null check either.

SELECT n.nid
FROM node AS n
INNER JOIN taxonomy_index AS t
ON n.nid=t.nid
WHERE n.status=0;

That way if there isn't a corresponding taxonomy_index then it won't even get returned.

hswong3i’s picture

Version: 7.26 » 7.x-dev
Assigned: Unassigned » hswong3i
Status: Active » Needs review
New file: 1.29 KB — PASSED: [[SimpleTest]]: [MySQL] 40,724 pass(es).

A quick fix that only deletes 50 nids per execute(), against Git 7.x:

<?php
/**
 * Drop unpublished nodes from the index.
 */
function taxonomy_update_7011() {
  $nids = db_query('SELECT nid FROM {node} WHERE status = :status', array(
    ':status' => NODE_NOT_PUBLISHED,
  ))->fetchCol();

  // Prevent running out of memory by deleting 50 nids at a time.
  // @see https://drupal.org/node/2174551
  while (!empty($nids)) {
    $slice = array_slice($nids, 0, 50);
    $nids = array_slice($nids, 50);
    db_delete('taxonomy_index')
      ->condition('nid', $slice)
      ->execute();
  }
}
?>
mikeytown2’s picture

New file: 2.54 KB — PASSED: [[SimpleTest]]: [MySQL] 40,735 pass(es).

Changed this so it uses batch processing. It also uses the new query from #12. Tested with the old query, and the update takes about 30 seconds to run through 161k records; with the new query it doesn't try to delete anything. The limit parameters should probably be passed to the query properly instead of via string concatenation.
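For reference, db_query_range() is the usual way to get a LIMIT clause without string concatenation; a hedged sketch using the #12-style join (the 0, 5000 range is illustrative):

```php
<?php
// Instead of appending something like ' LIMIT 0, 5000' to the SQL
// string, let the database layer add the range clause:
$nids = db_query_range('SELECT n.nid FROM {node} n INNER JOIN {taxonomy_index} t ON n.nid = t.nid WHERE n.status = :status', 0, 5000, array(':status' => NODE_NOT_PUBLISHED))->fetchCol();
?>
```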

Status: Needs review » Needs work

The last submitted patch, 14: drupal-2174551-14-taxonomy-update-7011-fix.patch, failed testing.

mikeytown2’s picture

Status: Needs work » Needs review

14: drupal-2174551-14-taxonomy-update-7011-fix.patch queued for re-testing.

Previous result: FAILED: [[SimpleTest]]: [MySQL] Setup environment: failed to drop checkout database.

mikeytown2’s picture

Assigned: hswong3i » Unassigned

Thinking about this more, I should make the limit always be 0, 5000 and run it until $nids is empty. Will wait until this first patch passes before adjusting it... won't happen till Monday, so feel free to modify it.

hswong3i’s picture

I question whether the delete query's performance is really so bad that it requires background batch processing.

IMHO the delete query itself shouldn't be memory-expensive; the problem is that we put every nid into a single IN () statement, which produces a huge result matrix (i.e. running out of memory).

Is #13 enough to fix this issue?

hswong3i’s picture

David_Rothstein’s picture

New file: 1.64 KB — FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch drupal-2174551-18-taxonomy-update-7011-fix.patch. See the log in the details link for more information.

Rerolled to fix a few things; this should now be working correctly, I believe.

David_Rothstein’s picture

Cross-posted there, but I would say, since we don't know how many nodes the site has, batch processing is still a good idea.

If I understand mikeytown2's comment above correctly, this still takes 30 seconds to run in his situation (or rather, it would if all the unpublished nodes came from an upgraded Drupal 6 site with this issue), which is long enough to warrant a batch.

mikeytown2’s picture

Status: Needs review » Reviewed & tested by the community

@David_Rothstein
That is correct. It takes 30 seconds to issue the delete command for 160k rows in the DB.

Patch in #20 works for me so going to RTBC it as well.

prakharcse’s picture

Hi,

I want to upgrade my Drupal 6.30 site to the latest version of Drupal 7. Could somebody please guide me through this migration?

Garrett Albright’s picture

prakharcse, you can learn more about upgrading Drupal core here.

marcingy’s picture

Also ran into this issue (on a non-D6→D7 site) with a lot of unpublished nodes (751,357). The patch here does the trick for me and prevents a nasty fatal error.

Kristen Pol’s picture

#20 is working for me.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: +7.27 release notes

Committed to 7.x - thanks! http://drupalcode.org/project/drupal.git/commit/5be1de3

I'll also update the 7.26 release notes and release announcement to mention this is now fixed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

vipcar’s picture

Status: Closed (fixed) » Needs review
marcingy’s picture

Status: Needs review » Closed (fixed)

This is already committed.

Status: Closed (fixed) » Needs work

The last submitted patch, 20: drupal-2174551-18-taxonomy-update-7011-fix.patch, failed testing.

marcingy’s picture

Status: Needs work » Closed (fixed)
David_Rothstein’s picture

Drupal 7.27 was a security release, so this is now scheduled to go out in Drupal 7.28 instead.