Hello,

What are the chances I get two generated users getting the same name, the same day on the same Drupal website ?
I first generated 1000 users, then, the second time, tried to generate 10000 users, on the same database, I got a MySQL duplicate entry error for users.name at around 3200 created users.
Should I go play to casino today, or is it bound to happen sometimes ?
I know that would impact performances, but my suggestion would be to check users names unicity, in the batch itself, and compared to existing users.
Example of code (needs review and improvement)

$existing_names =   db_select('users')
    ->fields('users', array('name'))
    ->execute()->fetchAllKeyed(0, 0);
while (count($names) < $num) {
  $names = array();
  while (count($names) < $num) {
    $names[] = devel_generate_word(mt_rand(6, 12));
  }
  $names = array_unique($names);
  $names = array_diff($names, $existing_names);
}

Regards,

David

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

salvis’s picture

I haven't really looked into this, but I would expect that the "MySQL duplicate entry error" is a DBTNG exception that we should be able to catch. Even developing and testing should be straight-forward: just pass a constant username and the error will pop up at user 3.

D8 first, please...

David Stosik’s picture

Version: 7.x-1.x-dev » 8.x-1.x-dev

Switching to D8 then...
I did some test and in a 2000 generated users population, one or two are usually duplicates. The new generated users also generally contain name that are already used in my 3000 previously generated users...
I need to generate 20000 users so...
Catching DBTNG Exception sounds good, but what would you do then ? Generate another user name, and so on until you get a good one ?

David Stosik’s picture

Priority: Minor » Normal
Status: Active » Needs review
FileSize
1.05 KB

Applies to 7.x-1.x-dev and 8.x-1.x-dev.

salvis’s picture

Priority: Normal » Minor
Status: Needs review » Active

Catching DBTNG Exception sounds good, but what would you do then ? Generate another user name, and so on until you get a good one ?

Yes, catch the exception inside the loop and just continue looping without incrementing the count. Catch the specific type of the exception (hopefully a DuplicateKeyException or something like that), so that we don't go into an endless loop for some other unrecoverable error.

And hope that our string generator doesn't run out of distinct strings... Otherwise we'll have to fix that, too — no matter how we treat the collisions.

salvis’s picture

Priority: Minor » Normal
Status: Active » Needs review

Sorry, crosspost...

salvis’s picture

I've looked at the current code. It's unnecessarily complex. There's no point in pre-creating the $names array.

Replace

  if ($num > 0) {
    $names = array();
    while (count($names) < $num) {
      $name = devel_generate_word(mt_rand(6, 12));
      $names[$name] = '';
    }

    if (empty($roles)) {
      $roles = array(DRUPAL_AUTHENTICATED_RID);
    }
    foreach ($names as $name => $value) {

with

  if (empty($roles)) {
    $roles = array(DRUPAL_AUTHENTICATED_RID);
  }

  $count = 0;
  while ($count < $num) {
    $name = devel_generate_word(mt_rand(6, 12));
    // ...
    $count++;

Then you can just

      catch ( ... ) {
        if ( ... ) {
          continue;
        }
        throw;
      }

Also, move the module_load_include() out of the loop.

Please check what happens if we keep getting the same error over and over again. We may or may not need to check for that case.

David Stosik’s picture

I was just fixing the error, aiming to as few code lines as possible. :)
I once read that a provided patch should not "fix" other things than the problem it's dealing with. ("Keeping things organized" in the general patch guidelines, in http://drupal.org/node/707484).

Anyway, I guess attached patch is better.

Regards,

David

salvis’s picture

Status: Needs review » Needs work

I generally agree. But if the structure is bad to begin with and forces you to add more quirky code and further obfuscate the meaning of the whole thing, then we have to give priority to the quality of the code over minimizing the patch.

If you like to take the extra step, I'll gladly accept two separate patches, one to refactor the code and a second one with the actual fix.

Some minor comments:

+++ b/devel_generate/devel_generate.inc
@@ -49,18 +48,30 @@ function devel_create_users($num, $kill, $age = 0, $roles = array()) {
+        $count++;

Please move this to the end of the while block so that the control flow is easier to grasp.

+++ b/devel_generate/devel_generate.inc
@@ -49,18 +48,30 @@ function devel_create_users($num, $kill, $age = 0, $roles = array()) {
+        // 23000 is "Integrity constraint violation" : user name is probably already used.

Comments must be wrapped at col 80.

+++ b/devel_generate/devel_generate.inc
@@ -49,18 +48,30 @@ function devel_create_users($num, $kill, $age = 0, $roles = array()) {
+        else {
+          throw($e);
+        }

There's no need for an else here — we want to re-throw unconditionally at the end of the catch block. Also, throw is not a function (it should not look like a function call), and there's no need for parentheses.

David Stosik’s picture

Took your comments into account.

David Stosik’s picture

Status: Needs work » Needs review
salvis’s picture

That looks good to me, thank you.

Now we need someone to test this... A PostgreSQL test would be nice, too...

From #6:

Please check what happens if we keep getting the same error over and over again. We may or may not need to check for that case.

Have you tried to generate your 20000 users (#2)?

David Stosik’s picture

16000 users, no problem. :)
I created them with custom code implying Batch API (which means many small passes).

salvis’s picture

Great.

What about the endless loop problem? What happens if you set

      $name = 'constant';

? Do we time out? Do we need to count consecutive exceptions and abort if we have no intervening successes? I'm not saying we need to do that, just wondering how it behaves if we don't. Will Batch API detect the timeout?

David Stosik’s picture

Good question, but the problem is deeper. How many different names (N vowel/consonant combinations) can the algorythm provide?
Should the function abort if the passed number is greater?

salvis’s picture

Yeah, that's what I meant in #4. If the number of remaining names gets low, it takes progressively longer to find one, and the algorithm will slow down to a crawl long before they are completely exhausted.

Running out of names is just a special case. I'm more concerned with the same exception being thrown due to some other cause (maybe in a third-party contrib) that doesn't go away just because we try a different name.

If we handle the general case (or determine that it behaves reasonably well as it is), we don't need to worry about the special case.

David Stosik’s picture

I'm no SQL / PDO expert, so I can't say which cases may throw the same exception.

By the way there's an alternative, consisting in getting a list of all user names before starting, and testing if the user name is in the list before creating it.

salvis’s picture

I can't say which cases may throw the same exception.

That's not the question. We can't know. A number of hooks are processed when saving a user, and any number of core and contrib modules may implement these hooks and do all sorts of things which have the potential to fail in a myriad of ways. Our change is to neutralize the PDOException 23000 no matter where it comes from. We can simulate an imaginary contrib persistently throwing PDOException 23000 (the worst case for our change) by passing the same username over and over again (causing core to exhibit the nasty behavior within the try block) and checking what happens. That's what I'm asking you to try out.

marvil07’s picture

I made a couple of changes:

  • Use entity_create() instead of user_save(). (following core)
  • Use a variable to define maximum retries. My solution for the mentioned case about infinite loop.
DrupalChimp’s picture

Status: Needs review » Needs work

Error seems to occur if generated users already exist. e.g. If you generate 2000 users, then generate another 2000 users. The first generation seems to run fine. Second generation gets a 23000 Integrity constraint violation.

Here is the generated error

Error message
Drupal\Core\Entity\EntityStorageException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'uanodr' for key 'name': INSERT INTO {users} (uid, uuid, name, langcode, pass, mail, theme, signature, signature_format, created, access, login, status, timezone, preferred_langcode, preferred_admin_langcode, init) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9, :db_insert_placeholder_10, :db_insert_placeholder_11, :db_insert_placeholder_12, :db_insert_placeholder_13, :db_insert_placeholder_14, :db_insert_placeholder_15, :db_insert_placeholder_16); Array ( [:db_insert_placeholder_0] => 3750 [:db_insert_placeholder_1] => d9bf6efd-2130-47c8-bf05-056f0c7cbb51 [:db_insert_placeholder_2] => uanodr [:db_insert_placeholder_3] => und [:db_insert_placeholder_4] => $S$E9nBaHxImlSim7FhQ6XiKX1Ijk1DPE1DoJVcBK.0uuVrkuiPfSc6 [:db_insert_placeholder_5] => uanodr@drupalcore.dev.invalid [:db_insert_placeholder_6] => [:db_insert_placeholder_7] => [:db_insert_placeholder_8] => [:db_insert_placeholder_9] => 1366856175 [:db_insert_placeholder_10] => 0 [:db_insert_placeholder_11] => 0 [:db_insert_placeholder_12] => 1 [:db_insert_placeholder_13] => Europe/London [:db_insert_placeholder_14] => und [:db_insert_placeholder_15] => und [:db_insert_placeholder_16] => ) in Drupal\Core\Entity\DatabaseStorageController->save() (line 565 of /www/drupal/core/lib/Drupal/Core/Entity/DatabaseStorageController.php).

marvil07’s picture

Status: Needs work » Needs review

@James Linnell: Can you please confirm the error happens *after* applying the patch on comment 18?

DrupalChimp’s picture

Yes, it happens after applying the patch.

pcambra’s picture

Issue summary: View changes
Status: Needs review » Needs work

Devel generation has changed to be pluggable in #2154141: DevelGenerate as a D8 plugin type so this needs a re roll.

pcambra’s picture

Issue tags: +low hanging fruit
willzyx’s picture

Status: Needs work » Closed (outdated)

Closing for lack of activity. Feel free to reopen if the issue still exists