I'm having scenario when 100k rows (or more) are being migrated using simple node_save() call, however by default (by not specifying any pathauto settings), Pathauto generating the aliases by it-self (which I guess is fine).

Note: We know that when using node_save(), we can specify the same title, so no validation is performed as usually from GUI, so the same title can be repeated as many times as possible.

So when migration is run, I've the issue when single call to node_save() generates over 5000 SQL queries to _pathauto_alias_exists (1k rows = 5mln SQL queries).

The reason why this happens, because when the node with the same title is saved, Pathauto incrementally assigning the numbers, so it can end up with the following surprise for single node:

SELECT pid FROM url_alias WHERE source <> 'node/13073' AND alias = 'foo/bar' AND language IN ('und', 'und') ORDER BY language DESC, pid DESC LIMIT 0, 1", 170, MSG_DONTWAIT, NULL, NULL)
SELECT pid FROM url_alias WHERE source <> 'node/13073' AND alias = 'foo/bar-0' AND language IN ('und', 'und') ORDER BY language DESC, pid DESC LIMIT 0, 1", 172, MSG_DONTWAIT, NULL, NULL)
... 4470 more queries repeated like that ...
SELECT pid FROM url_alias WHERE source <> 'node/13073' AND alias = 'foo/bar-4472' AND language IN ('und', 'und') ORDER BY language DESC, pid DESC LIMIT 0, 1", 175, MSG_DONTWAIT, NULL, NULL)

so in other words, it keeps sending the queries until find its unique available value.

Here is affected code in pathauto_alias_uniquify():

  $i = 0;
  do {
    // Append an incrementing numeric suffix until we find a unique alias.
    $unique_suffix = $separator . $i;
    $alias = truncate_utf8($original_alias, $maxlength - drupal_strlen($unique_suffix, TRUE)) . $unique_suffix;
    $i++;
  } while (_pathauto_alias_exists($alias, $source, $langcode));

So if 10k nodes are with the same title, then this single loop will execute 10k SQL queries for single node_save() and it seems that _pathauto_alias_exists() doesn't even cache anything (like static variable or something).

So this is something for improvement, as I'm sure there is some better way of searching the unique value by using single SQL query. For example using like: "LIKE "foo/bar%" then sort decrementally by number, or something else.

As for workaround, $node->path = array('pathauto' => FALSE); needs to be added before node_save(), to avoid the processing.

Comments

kenorb created an issue. See original summary.

kenorb’s picture

Issue summary: View changes
Dave Reid’s picture

Issue tags: +Performance
.bert’s picture

I realize this issue is old, but we recently ran into this causing severe performance issues on an old D7 site for us.

For anyone else coming here, you can also work around this by setting a unique path alias pattern for the entity in question by going to admin/config/search/path/patterns.

For example, we had nodes that were automatically being generated with the same title and the path alias was set to the default content/[node:title]. There were nearly 35,000 nodes with that title and pathauto had to cycle through them all each time it created a new one. We changed the pattern to content/[node:title]-[node:nid] and our automated process that creates a few hundred nodes went from taking 2.5 hours to 46 seconds, since they were all unique.

I can see that this code is the same in the D8 version. Does anyone have any thoughts on ways to improve this?

Perhaps an initial check if the original alias is unique, if not an SQL query that searches for the last item and increment 1 beyond that. Example query in D7:

<?php
$query = db_select('url_alias', 'ua')
  ->condition('alias', $original_alias . $separator . '%', 'LIKE')
  ->orderBy('counter', 'DESC')
  ->range(0, 1);
$query->addExpression("CAST(SUBSTRING_INDEX(alias, :separator, -1) as unsigned)", 'counter', array(
  ':separator' => $separator,
));
$counter = $query->execute()->fetchField();
?>