Problem/Motivation

I have a site where the admin user uses the migration UI to update specific content via the idlist option, with data that comes from an outside source. The idlist form field has a strict #pattern attached to it, which prevents the migration from running because the browser blocks the form submission. Could we update that pattern, or remove it altogether from that form field?

Note: The migration runs fine when using drush but our user in this case does not have access to drush or the command line.

In MigrationExecuteForm.php Line 160:

    $form['options']['idlist'] = [
      '#type' => 'textfield',
      '#title' => $this->t('ID List'),
      '#maxlength' => 255,
      '#size' => 60,
      '#pattern' => '^[0-9]+(' . MigrateTools::DEFAULT_ID_LIST_DELIMITER . '[0-9]+)?(,?[0-9]+(' . MigrateTools::DEFAULT_ID_LIST_DELIMITER . '[0-9]+)?)*$',
      '#description' => $this->t('Comma-separated list of IDs to process.'),
      '#states' => [
        'enabled' => [
          ':input[name="operation"]' => [['value' => 'import'], 'or', ['value' => 'rollback']],
        ],
      ],
    ];

Steps to reproduce

  1. Visit any migration through the UI.
  2. Enter any alphabetic string in the idlist option.
  3. Try to execute the migration. The browser blocks the form submission, so the migration never runs.

Proposed resolution

Remove/edit the pattern for the idlist form field

Comments

danielkim7755 created an issue. See original summary.

2dareis2do’s picture

I have this issue as well. When the source is an RSS feed, it is common to use the guid as the source ID, and a guid is a string.

Here is an example where guid is used.

<item>
            <title>Roots: (Overseas) and Is Any Body Home? at Streatham Space Project - London Theatre 1</title>
            <link>https://news.google.com/rss/articles/CBMipAFBVV95cUxNTENNa3EtZ182WEJURjRpZmJXelc3YklsLXFIWEludWlqNVBYZFBUbF81UW0xRDNwVW1WYVJGNDFmV3NTSWlBeXgwcGJCcExRc1EwUHJYZTlTV0x6M2lOQjROa21PYjNKTlhQbnZlNkZKd3FJRG5LYnFaSEZYRFhsc3l2bHZwbk9UZFpqb1BkY3kxeDZnSXd1b0lkMkJoM2NWd1UzWQ?oc=5</link>
            <guid isPermaLink="false">CBMipAFBVV95cUxNTENNa3EtZ182WEJURjRpZmJXelc3YklsLXFIWEludWlqNVBYZFBUbF81UW0xRDNwVW1WYVJGNDFmV3NTSWlBeXgwcGJCcExRc1EwUHJYZTlTV0x6M2lOQjROa21PYjNKTlhQbnZlNkZKd3FJRG5LYnFaSEZYRFhsc3l2bHZwbk9UZFpqb1BkY3kxeDZnSXd1b0lkMkJoM2NWd1UzWQ</guid>
            <pubDate>Wed, 01 May 2024 07:00:00 GMT</pubDate>
            <description>&lt;a href="https://news.google.com/rss/articles/CBMipAFBVV95cUxNTENNa3EtZ182WEJURjRpZmJXelc3YklsLXFIWEludWlqNVBYZFBUbF81UW0xRDNwVW1WYVJGNDFmV3NTSWlBeXgwcGJCcExRc1EwUHJYZTlTV0x6M2lOQjROa21PYjNKTlhQbnZlNkZKd3FJRG5LYnFaSEZYRFhsc3l2bHZwbk9UZFpqb1BkY3kxeDZnSXd1b0lkMkJoM2NWd1UzWQ?oc=5" target="_blank"&gt;Roots: (Overseas) and Is Any Body Home? at Streatham Space Project&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;font color="#6f6f6f"&gt;London Theatre 1&lt;/font&gt;</description>
            <source url="https://www.londontheatre1.com">London Theatre 1</source>
        </item>

From what I can see, the HTML validation pattern only accepts d:d or d,d, where d is one or more digits. It also does not accept multiple IDs separated by a space.
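To make that concrete, here is a minimal sketch that applies the same expression with preg_match(). The helper name and the sample values are made up for illustration, and a browser's pattern matching is only approximated here by anchoring the expression at both ends.

```php
<?php

declare(strict_types=1);

// Mirrors MigrateTools::DEFAULT_ID_LIST_DELIMITER (':').
const ID_LIST_DELIMITER = ':';

/**
 * Hypothetical helper: checks a value against the idlist #pattern.
 */
function matches_idlist_pattern(string $value): bool {
  $pattern = '^[0-9]+(' . ID_LIST_DELIMITER . '[0-9]+)?(,?[0-9]+(' . ID_LIST_DELIMITER . '[0-9]+)?)*$';
  // The D modifier stops "$" from matching before a trailing newline.
  return preg_match('/' . $pattern . '/D', $value) === 1;
}

// Sample inputs: numeric IDs pass, anything else is rejected.
foreach (['12', '12:34', '1,2,3', '1:2,3:4', '1 2', 'abc', 'CBMipAFB'] as $candidate) {
  printf("%-10s %s\n", $candidate, matches_idlist_pattern($candidate) ? 'accepted' : 'rejected');
}
```

Only the purely numeric inputs get through, which is why a guid can never be submitted from the form.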

The other thing I noticed is that there does not appear to be a date or time unless the migration was successful. For failed migrations reported in the message tab, it would be useful to see when the migration was first started.

One more thing: it might be useful to have the ability to remove a failed migration row, especially if it is no longer available. I am not sure how this is handled at the moment. I do notice there are quite a few. You do have the option to update, but doing so from the UI will invariably run the whole batch (you do have the option to ignore dependencies, though).

In phpMyAdmin I can see there are quite a few migrations with a destination id, a source_row_status of 1 (needs update), and no last imported timestamp. There are also others that do have a last imported timestamp but are also set to source_row_status 1. Am I right in thinking these will only be updated if the migration is run with the update flag?

2dareis2do’s picture

Also, long IDs are restricted to 255 characters or so. Many guids are longer.

Removing the pattern from the form field does allow you to use non-numeric IDs.

I guess it may be possible to mark a row to be ignored rather than deleted?

2dareis2do’s picture

Regarding the regular expression, I believe it is like so:

^[0-9]+(:[0-9]+)?(,?[0-9]+(:[0-9]+)?)*$

I believe this can be rewritten like so to support all upper and lower case letters as well as digits:

^[0-9A-Za-z]+(:[0-9A-Za-z]+)?(,?[0-9A-Za-z]+(:[0-9A-Za-z]+)?)*$

https://regex101.com/r/ZctV9r/1
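As a rough check of the two expressions above, the sketch below builds both from a character class and runs a few made-up sample values through them. The helper name build_idlist_pattern() is hypothetical and not part of migrate_tools.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds the ID list pattern for a character class.
 */
function build_idlist_pattern(string $char_class, string $delimiter = ':'): string {
  // One ID, optionally followed by the delimiter and a second part.
  $id = $char_class . '+(' . preg_quote($delimiter, '/') . $char_class . '+)?';
  // Further IDs may follow, each optionally preceded by a comma.
  return '/^' . $id . '(,?' . $id . ')*$/D';
}

$digits_only = build_idlist_pattern('[0-9]');
$alphanumeric = build_idlist_pattern('[0-9A-Za-z]');

foreach (['123', 'abc123', 'abc:def,ghi', 'CBMi-AFB', 'a_b'] as $candidate) {
  printf(
    "%-12s digits-only: %-3s alphanumeric: %s\n",
    $candidate,
    preg_match($digits_only, $candidate) === 1 ? 'yes' : 'no',
    preg_match($alphanumeric, $candidate) === 1 ? 'yes' : 'no'
  );
}
```

The alphanumeric class accepts plain letters and digits, but any value containing a hyphen or an underscore is still rejected.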

2dareis2do’s picture

Looking at the field, I can see the limit is set to 255 chars

    $form['options']['idlist'] = [
      '#type' => 'textfield',
      '#title' => $this->t('ID List'),
      '#maxlength' => 255,
      '#size' => 60,
      '#pattern' => '^[0-9]+(' . MigrateTools::DEFAULT_ID_LIST_DELIMITER . '[0-9]+)?(,?[0-9]+(' . MigrateTools::DEFAULT_ID_LIST_DELIMITER . '[0-9]+)?)*$',
      '#description' => $this->t('Comma-separated list of IDs to process.'),
      '#states' => [
        'enabled' => [
          ':input[name="operation"]' => [['value' => 'import'], 'or', ['value' => 'rollback']],
        ],
      ],
    ];

For my purposes, I think it makes sense to bump this considerably. In fact, it might make even more sense to change this to a text area so that a user can see what has been pasted.

2dareis2do’s picture

OK, it looks like #pattern is not supported by text areas: https://www.drupal.org/node/1953036

I added an issue for this:

https://www.drupal.org/project/drupal/issues/3465443

2dareis2do’s picture

Attached file (status: new, size: 944 bytes)
2dareis2do’s picture

Attached file (status: new, size: 948 bytes)

Updating the patch to also support underscores as part of the string.

2dareis2do’s picture

Attached file (status: new, size: 952 bytes)

The key can also contain a hyphen (-),

e.g.

CBMi-AFBVV95cUxOR0FZOVowd1U2ZTA5X0diS0RqcjBCT2hqS1NNQ3Y4UkFlUF9IUHkzTTFGUFF6ZV9Ja3daTEV5UXlEaGZvNE5Hc2ZqVTAzVndiUG1CckUxYjBkcGgzdW9NM2J4cUV3dnZKUTE3N0VtUHNzTjlRZXRwZldIMGFKdlBVMWNHT2kzcmxuaWhWeXBiZEZHeG5QWlpCV0Nja2xManhfcEpOQWFlLWcxVVJGb2ZuN2wzcGZoRlNMQjlIRWZSNW12N2c0Yl8wV0lnbThtVGo1bWFQM1BRamwwVVZBSDFiN29EWlQ4TmRmOXFBblBOQzRLM2hHX213SNIB_gFBVV95cUxPSnZudGI2elVjVTRtQVRpNElkRjJPRFBqRFhVNHVEVGxObUlsZzVXRjNlSmZUdHlsbDl6cWg0OHh2T1dUMVNjWXJRNmtVNTJvT2ZpUml5bXc1eV9lUC15V1U1TUFFWUR3bTlEZTlwcGVqZkpUY2hObVNDYm5uSTlTTENMWTFfbE5CQi1ObVhVTlE4N0ppWFNnN3RxOEFCSjVpc2d0UkxXbmxkRG1vZ21mZ3lmd2Q1UkRVN0RoSHNfRUExaGlEMENxWUE0RVpFRXZrdGo5Y1hxcjZmZ3pGakJUWEhzc2VneTJWTnUxSk01RUtoa3B3ZzlRUWtIYVBaUQ

Updated regular expression:

https://regex101.com/r/bIn6z2/1

2dareis2do’s picture

Looking at the Flickr API, they use the following syntax for the guid:

tag:flickr.com,2004:/photo/53927832048

Here is an example:

https://api.flickr.com/services/feeds/photos_public.gne?tags=streatham&f...

So a couple of things here:

  1. Use of : and , in key. These are currently used as delimiters for entering multiple values
  2. Use of . and / in key. These characters are not currently recognised

Furthermore, as mentioned previously, the use of a single-line input is restrictive, especially when entering multiple ID values of any length.

I am thinking it might be better to use a multi-line input (text area), although this does not currently support #pattern. That said, I am thinking we should accept virtually any value as a key, and maybe use a newline as the way of demarcating multiple entries. That would also be a UX improvement.

2dareis2do’s picture

OK, if I remove the HTML5 pattern check, I can see that the Flickr ID gets split into an array of arrays, e.g.

0 = array(2)
  0 = "tag"
  1 = "flickr.com"
1 = array(2)
  0 = "2004"
  1 = "/photo/53729127947"

This will culminate in the following error:

ValueError: array_combine(): Argument #1 ($keys) and argument #2 ($values) must have the same number of elements in array_combine() (line 152 of modules/contrib/migrate_tools/src/MigrateBatchExecutable.php).

So what seems to happen is that the string is first exploded on , and then each resulting piece is split on :

The code for this is in the main MigrateTools class:

<?php

declare(strict_types = 1);

namespace Drupal\migrate_tools;

/**
 * Utility functionality for use in migrate_tools.
 */
class MigrateTools {

  /**
   * Default ID list delimiter.
   */
  public const DEFAULT_ID_LIST_DELIMITER = ':';

  /**
   * Build the list of specific source IDs to import.
   *
   * @param array $options
   *   The migration executable options.
   *
   * @return array
   *   The ID list.
   */
  public static function buildIdList(array $options): array {
    $options += [
      'idlist' => NULL,
      'idlist-delimiter' => self::DEFAULT_ID_LIST_DELIMITER,
    ];
    $id_list = [];
    if (is_scalar($options['idlist'])) {
      $id_list = explode(',', (string) $options['idlist']);
      array_walk($id_list, function (&$value) use ($options): void {
        $value = str_getcsv($value, $options['idlist-delimiter']);
      });
    }
    return $id_list;
  }

}
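
To see the split in isolation, here is a standalone sketch of the same explode() and str_getcsv() steps, followed by the array_combine() call that fails. The function name split_id_list() is made up, and the single source key named guid is an assumption about the migration; the real call lives in MigrateBatchExecutable and is not reproduced here.

```php
<?php

declare(strict_types=1);

/**
 * Standalone copy of the splitting steps in MigrateTools::buildIdList().
 */
function split_id_list(string $idlist, string $delimiter = ':'): array {
  // First split the whole string into rows on commas.
  $id_list = explode(',', $idlist);
  foreach ($id_list as &$value) {
    // Then split each row into key parts on the delimiter.
    $value = str_getcsv($value, $delimiter, '"', '\\');
  }
  unset($value);
  return $id_list;
}

$rows = split_id_list('tag:flickr.com,2004:/photo/53729127947');
// $rows is [['tag', 'flickr.com'], ['2004', '/photo/53729127947']].

// Assumption: the migration declares a single source key named "guid".
$source_keys = ['guid'];

try {
  array_combine($source_keys, $rows[0]);
}
catch (ValueError $e) {
  // PHP 8 throws because one key cannot be paired with two values.
  echo get_class($e), ': ', $e->getMessage(), "\n";
}
```

One guid therefore becomes two rows of two values each, and neither row lines up with a single source key.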
2dareis2do’s picture

Modified patch to:

  1. Use text area to enter ids
  2. Allow the existing option as the default (use of , and :)
  3. Add option to disable the default and use one entry per line

See attached screenshot for example of how this looks.
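
For anyone who wants the idea without reading the patch, here is a minimal sketch of how one entry per line could be parsed. This is not the patch itself: split_id_list_by_line() is a hypothetical helper, and it assumes each line is one complete single-key source ID.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: treats each non-empty line as one complete source ID.
 */
function split_id_list_by_line(string $input): array {
  // \R matches any newline sequence, so Windows and Unix line endings both work.
  $lines = preg_split('/\R/', $input);
  $ids = [];
  foreach ($lines as $line) {
    $line = trim($line);
    if ($line !== '') {
      // Wrap in an array to keep the one-row-per-ID shape of buildIdList().
      $ids[] = [$line];
    }
  }
  return $ids;
}

$pasted = "tag:flickr.com,2004:/photo/53927832048\r\n\r\nCBMi-AFBVV95cUxOR0FZ\n";
print_r(split_id_list_by_line($pasted));
```

Because nothing is split on , or : in this mode, a Flickr-style guid survives intact. The trade-off is that composite (multi-key) IDs would still need the existing delimiter-based mode.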

tstoeckler’s picture

Hit this as well, thanks for the patch!