It's a bit clumsy having to press 'bulk update' 20 times in order to update 2000 node paths.

How can I allow more than 50 entities per bulk update? Should it be done with cron or with AJAX?

Amnon


Comments

druvision’s picture

It's 40 times, not 20.

greggles’s picture

Title: Allow more then 50 entities for bulk-update » perform bulk updates during cron and/or via the batch API
Version: 5.x-2.0 » 6.x-1.x-dev

At the top of the page, under general settings, is "Maximum number of objects to alias in a bulk update:", which lets you experiment with higher numbers. 50 aliases at a time is generally "safe", which is why I chose it (but some sites can't even do 50!).

I do agree: ideally this could be checked off as "scheduled to run during cron" and could also use the Batch API (which is Drupal 6.x only, so I'm moving the version).

lefnire’s picture

I have 10,000 pages, so I would like a cron job to run this bulk update once every 5-10 minutes. I want a separate cron job specifically for this, so that all the other Drupal cron tasks aren't run every 5 minutes.

What I tried was creating a new book page, setting its input format to "PHP", and adding the following code:

<?php node_pathauto_bulkupdate(); ?>

and then setting up a custom crontab entry (in cPanel, on Linux, or what have you) that loads this page. But for some reason the page times out every time it loads, no matter what the bulk-update amount is set to (it's still at 50), even though running the update from the Pathauto admin page does NOT time out.

ashtronaut’s picture

I was able to get this working on a site with 179,000 nodes with the bulk-update amount set to 100. At 100 nodes per run it takes my box almost 2 min 45 sec to finish and eats up a big percentage of CPU, so these bulk updates are resource hungry. My Pathauto version is 5.x-2.1, and I run my cron job every 3 minutes. If you have root access, you may want to check the settings in your php.ini file. In the past I had to change memory_limit and max_execution_time to prevent xmlsitemap from timing out while populating the sitemap. This is just a shot in the dark, and I'm not sure whether it is your problem, but it might help prevent your bulk update from timing out as well. If you check your Apache logs and see something about memory running out, this could well be the fix.
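A minimal sketch for checking those two settings, not taken from the original post. The function name example_php_limits() is made up for illustration. Run it the same way the bulk update runs (through the web server or from the command line), since the two can load different configuration; command-line PHP normally reports max_execution_time as 0, meaning no limit.

```php
<?php
/**
 * Returns the current values of the two php.ini settings mentioned above.
 * Hypothetical helper, named here only for illustration.
 */
function example_php_limits() {
  return array(
    'memory_limit' => ini_get('memory_limit'),
    'max_execution_time' => ini_get('max_execution_time'),
  );
}

// Print each setting on its own line, e.g. "memory_limit = 128M".
foreach (example_php_limits() as $name => $value) {
  print $name . ' = ' . $value . "\n";
}
```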

ash

stennie’s picture

@ashtronaut: you don't mention whether you are using MySQL or PostgreSQL,
but try the suggested patch at:
http://drupal.org/node/212327#comment-729893

For my configuration the main bottleneck was not the number
of nodes being updated in each run, but rather the query to
find the nodes to update.

Cheers,
Stephen

stennie’s picture

@lefnire: I had the same requirement to run Pathauto without
the full Drupal cron, so that Pathauto can run more frequently.

To do so, I set up a cron-update-pathauto.php script which contains:

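<?php
// cron-update-pathauto.php: run from the Drupal root directory.

// Load the bootstrap code and the Pathauto files that define the
// node bulk update function.
include_once './includes/bootstrap.inc';
include_once './modules/pathauto/pathauto.inc';
include_once './modules/pathauto/pathauto_node.inc';

// Bootstrap Drupal fully so the database and modules are available.
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Alias the next set of nodes, up to the bulk update limit in Pathauto's settings.
node_pathauto_bulkupdate();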

Then used a modified version of this cron-php.sh to call
the script from cron:
http://drupal.org/node/65307

Calling the script via cron avoids any web server timeout/memory
issues that might otherwise cause this to bail out early.
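Because the script lives in the Drupal root, it can also be requested through the web server. A hedged sketch of a guard that could sit at the top of such a script, not part of the original: it assumes the cron job invokes the command-line PHP binary, whose SAPI name is 'cli' (a CGI binary reports a different name), and example_called_from_cli() is a made-up helper.

```php
<?php
/**
 * Returns TRUE when the given SAPI name is the command-line one.
 * Hypothetical helper, named here only for illustration.
 */
function example_called_from_cli($sapi_name) {
  return $sapi_name === 'cli';
}

// Refuse web requests so only cron (or a shell user) can start the update.
if (!example_called_from_cli(php_sapi_name())) {
  header('HTTP/1.0 403 Forbidden');
  exit('This script can only be run from the command line.');
}

print "Running from the command line.\n";
```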

Cheers,
Stephen

greggles’s picture

@stephen - could you provide that script and a write-up of how to use it in the handbook? That would be really useful!

ashtronaut’s picture

I agree with the handbook write-up; that is a nice approach to bulk updating large sites. Thanks for sharing it.

@stephen, thanks for pointing me in the direction of that patch (I am running mysql). I will give it a shot this afternoon.

ash

stennie’s picture

@greggles: happy to add something to the handbook, but I'm not familiar with how the contribution process works. Could you point me in the right direction? Do I just need to log in and add a child page under the appropriate handbook node?

I guess that would be either : http://drupal.org/handbook/modules/pathauto or http://drupal.org/node/86994.

Cheers,
Stephen

greggles’s picture

Exactly - just log in and create a child page. You should have the permission to do that. I think adding it as a child of the top-level page would make sense (i.e. a child of http://drupal.org/handbook/modules/pathauto ).

stennie’s picture

FYI, added handbook write-up of cron update:
http://drupal.org/node/236304

Cheers,
Stephen

giorgio79’s picture

I tried using this cron script, but for some reason the script would just hang, my url_alias table would get corrupted, and my site would not load. It would hang on a page request with no response from the server and no error log entries.

After finally figuring out that the url_alias table and a few cache tables were corrupted, I fixed them with myisamchk.

I can, however, do bulk updates from the admin page without any trouble, although that is manual. I will be looking around for a few more ideas.

Cheers,
G

moshe weitzman’s picture

Subscribe. Batch API here would be lovely.

moshe weitzman’s picture

For that matter, a drush command would be welcome as well. Those are not subject to the PHP timeout.

David Lesieur’s picture

Status: Active » Needs work
Attached patch: 4.17 KB

This patch uses Batch API to bulk-generate node aliases. We'll want to generalize the thing a little bit to use Batch API for any type of alias, not just nodes, so let's consider this as just a first draft. :-)
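For readers unfamiliar with the pattern, here is a rough sketch of how a Drupal 6 Batch API operation callback works through a large job in small passes. It is not the code from the attached patch. The names example_alias_batch_operation(), example_create_alias() and example_run_batch() are made up, the alias generation is a placeholder, and example_run_batch() is only a simplified stand-in for the Batch API engine so the sketch runs without Drupal.

```php
<?php
/**
 * Batch operation callback in the Drupal 6 Batch API style.
 * Each call handles at most $per_pass items and records its progress in
 * $context['sandbox'], so no single request has to do the whole job.
 */
function example_alias_batch_operation($ids, $per_pass, &$context) {
  if (!isset($context['sandbox']['progress'])) {
    // First pass: remember where we are and how much work there is.
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($ids);
  }
  $slice = array_slice($ids, $context['sandbox']['progress'], $per_pass);
  foreach ($slice as $id) {
    $context['results'][] = example_create_alias($id);
    $context['sandbox']['progress']++;
  }
  // A value below 1 asks the engine to call this operation again.
  $context['finished'] = $context['sandbox']['max'] > 0
    ? $context['sandbox']['progress'] / $context['sandbox']['max']
    : 1;
}

/**
 * Placeholder for the real alias generation (hypothetical).
 */
function example_create_alias($id) {
  return 'content/item-' . $id;
}

/**
 * Simplified stand-in for the Batch API engine: repeats the operation
 * until it reports that it is finished.
 */
function example_run_batch($ids, $per_pass) {
  $context = array('sandbox' => array(), 'results' => array(), 'finished' => 0);
  $passes = 0;
  do {
    example_alias_batch_operation($ids, $per_pass, $context);
    $passes++;
  } while ($context['finished'] < 1);
  return array('passes' => $passes, 'results' => $context['results']);
}

$run = example_run_batch(range(1, 120), 50);
print $run['passes'] . ' passes, ' . count($run['results']) . " aliases\n";
// Prints "3 passes, 120 aliases".
```

In a real module the callback would be listed in the 'operations' array of a batch definition passed to batch_set(), and Drupal would spread the passes over several page requests instead of looping in one.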

greggles’s picture

I wish I had posted sooner...

Basically I think we should not have this Batch API support in Pathauto.

We already have bulk generation for nodes via the node action and Views Bulk Operations. If we expose two more actions (one for terms, one for users), then the http://drupal.org/project/views_bulk_operations module gives us this for free.

moshe weitzman’s picture

Further, I think that VBO got a 're-save' operation, which was also needed for auto_nodetitle. Since that operation will work fine for Pathauto, this issue might be considered done. Note that re-saving each node is a lot slower than adding or editing an alias, but such is life.

David Lesieur’s picture

Considering that Pathauto's current bulk generation feature is not so useful without batch operations, would it be a good idea to remove it from the module completely and provide proper documentation about the VBO trick?

David Lesieur’s picture

I have not tried VBO for some time. I know it lets you select some nodes and apply an action to them. But what if I wish to update 30k nodes? Does VBO let you select "all nodes" of the view, or better, "all nodes without an alias" (the patch I have provided does the latter)? I'm not aware of any Views filters for URL aliases.

drewish’s picture

David Lesieur, yes it does. There's an option to use the batch, and one to select all.

hass’s picture

+

ccshannon’s picture

Just a word about VBO. Yes, it lets you select "ALL" and run a batch operation, but the batch operation times out when updating more than, say, 500 nodes at a time. The result is a white screen after running the operations, plus you really don't know which items got updated and which didn't. Not a great solution when, as David said, you are updating 30K+ nodes. I was only trying to update 1800 nodes, and it has been a nightmare performing any batch operation on more than a few hundred nodes through Drupal.

drewish’s picture

I think the limits ccshannon has identified are going to vary by server. Things like memory limit, speed and the like will play into it.

greggles’s picture

If someone needs to update 5k+ nodes (much less 30k) they should use a command line script for the job. There are certain things for which browser solutions are non-ideal. The command line script is documented in the handbook.

Froggie-2’s picture

#24: "The command line script is documented in the handbook."

Thanks for the information.

However, I could not locate the command line script for bulk updates on Drupal 6.x in the documentation.
I would be grateful if you could provide a link to the exact documentation page for the Pathauto bulk update command line script for Drupal 6.x.
Thanks

chrisschaub’s picture

The VBO batch method is very server friendly, since it uses JavaScript callbacks to process a very small number of records at a time. It's basically the same as doing a small set of updates manually: it breaks the large job up into small updates and runs each one separately via JavaScript. Currently, Pathauto works with VBO but not in "batch" mode, which is a shame; batch mode shows a 404 Not Found page. This seems to be a better approach than forcing users to use a command line (which is fine with me, but not everybody knows Unix).

sun’s picture

Status: Needs work » Closed (duplicate)

leanazulyoro’s picture

This does not work for me: the paths aren't getting their patterns replaced, and I get stuff like "content/[title-raw]". What am I missing?

JCB’s picture

I had some issues updating term aliases, as I have a vocabulary with 50,000 terms.
This led to internal server errors.

I can confirm that updating URL aliases works with Views Bulk Operations (VBO).

I installed the latest dev version (6.x-2.x-dev, 2012-Sep-28), which supplied the required options to make this possible.