Problem/Motivation

When we have a large number of nodes (like 200k) to run through Scheduler, we run out of memory, depending on the server configuration. This occurs when Scheduler merges the results returned from the query and extracts the unique values.

  // Allow other modules to add to the list of nodes to be published.
  $nids = array_unique(array_merge($nids, _scheduler_scheduler_nid_list($action)));

Proposed resolution

Add a configuration option to limit the number of nodes processed on each cron run.
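A minimal sketch of the idea in plain PHP, not the actual patch. The helper name `scheduler_limit_nids()` and its `$limit` parameter are hypothetical, and a limit of 0 is assumed to mean "no limit". It de-duplicates by using the IDs as array keys and stops as soon as the limit is reached, so the full merged list is never built.

```php
<?php

/**
 * Collects unique node IDs from two lists, stopping at a per-cron limit.
 *
 * Hypothetical helper for illustration only; it is not part of Scheduler.
 *
 * @param int[] $nids
 *   Node IDs returned by the scheduler query.
 * @param int[] $extra
 *   Node IDs added by other modules.
 * @param int $limit
 *   Maximum number of IDs to return; 0 means no limit (an assumption).
 *
 * @return int[]
 *   Unique node IDs in first-seen order.
 */
function scheduler_limit_nids(array $nids, array $extra, int $limit = 0): array {
  $unique = [];
  foreach ([$nids, $extra] as $list) {
    foreach ($list as $nid) {
      // Array keys are unique, so repeated IDs collapse automatically.
      $unique[$nid] = TRUE;
      if ($limit > 0 && count($unique) >= $limit) {
        // Limit reached: leave both loops.
        break 2;
      }
    }
  }
  return array_keys($unique);
}

// Example: at most five nodes per cron run.
$nids = scheduler_limit_nids([1, 2, 3, 4], [3, 5, 6], 5);
// $nids is [1, 2, 3, 4, 5]
```

This only caps the list after the query has run. For very large sites the limit would probably also need to be applied to the query itself, which this sketch does not cover.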

User interface changes

Add a field to the Scheduler settings to set this number.

Issue fork scheduler-2907382

Comments

chgasparoto created an issue.

chgasparoto’s picture

Title: Limits the number of nodes when run cron » Limit the selected number of nodes when run cron
chgasparoto’s picture

Status: Active » Needs review
jonathan1055’s picture

Hi chgasparoto,

Thanks for your interest in Scheduler and for raising this issue.

"When we have a large number of nodes (like 200k) to run through Scheduler, we run out of memory"

Yes, I am not surprised at running out of memory, but I have to ask why you want to publish so many nodes at one time - that is a huge number! I think there may be other ways for you to do this, using batch processing, as Scheduler was never designed to tackle such a big task.

In addition, new features will first have to be added to Scheduler 8.x, and then they may get ported to 7.x. We are just about to release Scheduler 8.x next month (see #2662476: Progress towards 8.x release of Scheduler), so we won't be adding anything new at this point.

Jonathan

Liam Morland’s picture

Version: 7.x-1.x-dev » 8.x-1.x-dev
jonathan1055’s picture

If this feature really is wanted then a new patch will be required. Setting this to Needs Work, as we can't review the current patch on 8.x.

Liam Morland’s picture

Status: Needs review » Needs work
jonathan1055’s picture

Version: 8.x-1.x-dev » 2.x-dev

New features go into the 2.x branch now.

geoffreyr made their first commit to this issue’s fork.

geoffreyr’s picture

I had a bit of a go at porting this to D9/10. Not sure that we should call the parameter max_nodes_per_cron anymore; maybe it should be renamed to reference entities in general. The publish and unpublish methods take the limit as a parameter, but it defaults to 0, so the original invocation should continue to work (albeit with no limit).
Will iterate on this when I have the opportunity.

jonathan1055’s picture

Thanks geoffreyr for opening the MR. Looks good so far, but I wonder if we can simplify it? The SchedulerManager publish() and unpublish() functions are never going to be called with varying values; the limit will always be the value set in the config options. So instead of adding a parameter, could you just get the $limit setting at the start of the function?

Also, we could introduce a default of, say, 1,000 (?) rather than 'everything', so that we automatically prevent problems if no limit has been set? Then the processing could start at the $limit, $remaining_limit would get decremented, and the processing would stop when it is <= 0. Just an idea, I have not actually proved this is right, but worth a try.
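Roughly, that idea could look like the sketch below. It is plain PHP rather than the real SchedulerManager code: `process_scheduled()`, the `$config` array and its `max_per_cron` key are hypothetical stand-ins for the service and its config option, and grouping the IDs by entity type is an assumption too.

```php
<?php

// Default used when no limit has been configured (the 1,000 floated above).
const SCHEDULER_DEFAULT_LIMIT = 1000;

/**
 * Processes scheduled IDs until the per-cron limit is used up.
 *
 * Hypothetical function for illustration; not the SchedulerManager API.
 *
 * @param array $config
 *   Stand-in for the module settings.
 * @param array $ids_by_type
 *   IDs due for processing, keyed by entity type.
 *
 * @return array
 *   The IDs that would be processed in this run, keyed by entity type.
 */
function process_scheduled(array $config, array $ids_by_type): array {
  // Read the limit once at the start instead of passing it as a parameter.
  $limit = $config['max_per_cron'] ?? SCHEDULER_DEFAULT_LIMIT;
  $remaining_limit = $limit;
  $processed = [];

  foreach ($ids_by_type as $type => $ids) {
    // Stop as soon as the budget for this cron run is spent.
    if ($remaining_limit <= 0) {
      break;
    }
    $batch = array_slice($ids, 0, $remaining_limit);
    $processed[$type] = $batch;
    // Decrement by the number of items actually taken.
    $remaining_limit -= count($batch);
  }

  return $processed;
}

// With a limit of 3, two nodes and then one media item are processed.
$result = process_scheduled(
  ['max_per_cron' => 3],
  ['node' => [1, 2], 'media' => [7, 8, 9]]
);
// $result is ['node' => [1, 2], 'media' => [7]]
```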

We will need test coverage too, but don't let that hinder your progress.