Problem/Motivation

  • Process plugin d6_field_link is used to migrate the content of Link fields from an older version of Drupal. This same process plugin is used also with D7-D8 migrations since the d7 link_field plugin extends d6.
  • The migration works correctly if the source URL is a full absolute URL with a protocol prefix (e.g. http://).
  • In Drupal 6/7, the Link fields also accept an URL without the protocol prefix, in other words a Link field with URL value www.example.com will work in Drupal 7. When this value is migrated to Drupal 8, the link is broken because Drupal 8 Link field expects the URL value to be a full absolute URL or a relative URL to the Drupal site itself.

Proposed resolution

To avoid breaking links that were working Drupal 7, add the following logic to d6_field_link process plugin:

  • If the URL starts with "www.", add a "http://" prefix

Remaining tasks

Patch
Add tests
Review
Commit

User interface changes

None.

API changes

None.

Data model changes

None.

CommentFileSizeAuthor
#53 interdiff-49-53.txt1.75 KBjofitz
#53 2897254-53.patch14.16 KBjofitz
#49 interdiff-47-49.txt4.34 KBjofitz
#49 2897254-49.patch14.38 KBjofitz
#47 interdiff-44-47.txt3.57 KBjofitz
#47 2897254-47.patch15.6 KBjofitz
#44 interdiff-38-44.txt1.64 KBjofitz
#44 2897254-44.patch14.07 KBjofitz
#38 interdiff-2897254-34-38.txt2.42 KBmaxocub
#38 2897254-38.patch13.74 KBmaxocub
#34 interdiff-31-34.txt3.43 KBjofitz
#34 2897254-34.patch14.22 KBjofitz
#31 interdiff_29-31.txt4.17 KBheddn
#31 2897254-31.patch15.03 KBheddn
#29 interdiff_26-29.txt9.5 KBheddn
#29 2897254-29.patch11.91 KBheddn
#27 interdiff-2897254-22-26.txt11.75 KBrakesh.gectcr
#26 interdiff-2897254-22-26.txt0 bytesrakesh.gectcr
#26 2897254-26.patch10.39 KBrakesh.gectcr
#22 2897254-22.patch6.45 KBjofitz
#22 interdiff-16-22.txt2.87 KBjofitz
#16 2897254-15.patch3.72 KBjofitz
#16 interdiff-11-15.txt5.41 KBjofitz
#11 2897254-11.patch4.96 KBjofitz
#11 interdiff-5-11.txt4.98 KBjofitz
#5 2897254-5-complete.patch1.43 KBjofitz
#5 2897254-5-test_only.patch751 bytesjofitz
#2 2897254-2.patch710 bytesmasipila
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

masipila created an issue. See original summary.

masipila’s picture

Issue summary: View changes
Status: Active » Needs review
Issue tags: +Needs tests
FileSize
710 bytes

Patch attached.

We should probably add automated tests for this and other Link migrations, I'll leave that to someone with more experience with tests.

masipila’s picture

Issue summary: View changes
Status: Needs review » Needs work

Changing the status to Needs work because we need to add tests.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

jofitz’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
751 bytes
1.43 KB

Added tests.

No interdiff because the only change is adding the tests.

The last submitted patch, 5: 2897254-5-test_only.patch, failed testing. View results

masipila’s picture

Issue summary: View changes

Updated issue summary / remaining tasks to help reviewers.

Current status:

  • Jo Fitzgerald added the missing tests
  • Remaining tasks: review and commit
heddn’s picture

Status: Needs review » Needs work
+++ b/core/modules/link/src/Plugin/migrate/process/d6/FieldLink.php
@@ -53,6 +53,12 @@ protected function canonicalizeUri($uri) {
+    // If the URL starts with 'www.' without scheme, add 'http://' prefix.
+    if (strpos($uri, 'www.') === 0) {

Why are we stopping at www? Why not anything that doesn't have a protocol? But then what about https? This seems like a slippery slope. Did a link field in d6/d7 handle this natively? What was the state there? How did it handle it? We would want to re-process in a similar manner. Otherwise, I feel this is something that shouldn't go into core.

jofitz’s picture

I had a look at D7's link_validate_url() for some inspiration, but that may be overkill. 'All' we need is something like:

if (!$uri_is_internal && $uri_is_external) {
  return 'http://' . $uri;
}
heddn’s picture

Link was in contrib in d7 and I recall it had some very fancy means for storing and building links. So, I'm OK with the simple, check in #9. I am concerned if passing along a malformed URL is going to cause any issue.

What if the url is /foo or //bar? Can we add this as an additional configuration parameter to auto-prefix links? I'm worried this will not catch all the cases and I wonder it would be good to disable this by default, or something.

But, did d7 link auto convert www.example.com into http://www.example.com? Or what? If it didn't do any of that, then I don't see what we need to fix it on migration. GIGO. If someone needs to fix it for /their/ garbage, they can do that in a custom process plugin.

jofitz’s picture

Status: Needs work » Needs review
FileSize
4.98 KB
4.96 KB

A quick experiment concludes that '/foo' transforms to 'internal:/foo', while '//bar' transforms to 'internal:/bar' which is acceptable, imo.

Yes, D7 prepended 'http://' onto external links without a protocol.

Here is a quick work-in-progress that I've thrown together before stopping for the day.

heddn’s picture

re #11 this is great news. I'll take a look here, but I'm much more comfortable with tacking it on in the migration if that's how d7 link handled things.

heddn’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

Does this only effect d6 and not D7?

  1. +++ b/core/modules/link/src/Plugin/migrate/process/d6/FieldLink.php
    @@ -57,6 +57,98 @@ protected function canonicalizeUri($uri) {
    +      // @TODO Complete letters.
    

    Should we remove the @TODO and add all the letters? And a couple of those new characters should have a test added for them, no?

  2. +++ b/core/modules/link/src/Plugin/migrate/process/d6/FieldLink.php
    @@ -57,6 +57,98 @@ protected function canonicalizeUri($uri) {
    +      $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", [
    

    Usually we use lowercase variable names.

  3. +++ b/core/modules/link/src/Plugin/migrate/process/d6/FieldLink.php
    @@ -57,6 +57,98 @@ protected function canonicalizeUri($uri) {
    +      $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", [
    ...
    +      $internal_pattern = "/^(?:[a-z0-9" . $LINK_ICHARS . "_\-+\[\] ]+)";
    ...
    +      $directories = "(?:\/[a-z0-9" . $LINK_ICHARS . "_\-\.~+%=&,$'#!():;*@\[\]]*)*";
    ...
    +      $query = "(?:\/?\?([?a-z0-9" . $LINK_ICHARS . "+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
    +      $anchor = "(?:#[a-z0-9" . $LINK_ICHARS . "_\-\.~+%=&,$'():;*@\[\]\/\?]*)";
    ...
    +        $authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w" . $LINK_ICHARS . "\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
    +        $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*(' . $LINK_DOMAINS . '|[a-z]{2}))?)';
    

    Same here. Lowercase.

masipila’s picture

heddn, this same process plugin is used with both d6-d8 and d7-d8 migrations. Like the first sentence of the issue summary says :)

heddn’s picture

Then, we need move the namespace around here on this process plugin. Or use a trait, or something.

jofitz’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
5.41 KB
3.72 KB
  1. Completed the list of characters allowed in a url (and simplified the code to obtain them). Added a test for these alternative characters.
  2. Set variable name to lower case.
  3. Set variable name to lower case.

EDIT: mentioned new test.

jofitz’s picture

@masipila Where is d6_field_link used for D7 migrations? I can't find it in the codebase.

masipila’s picture

Re: #17

When I'm looking at the migrations for node type that has a link field, I see this:

process:
  field_url:
    plugin: d6_field_link
    source: field_url

I didn't back track this all the way to check which piece of code is handling the Link field mappings.

Markus

masipila’s picture

@heddn, @Jo Fitzgerald: I guess D7 uses this process process plugin because class LinkField extends D6LinkField.

https://api.drupal.org/api/drupal/core%21modules%21link%21src%21Plugin%2...

Hope this helps,
Markus

heddn’s picture

Issue summary: View changes
Status: Needs review » Needs work

Looking very close.

+++ b/core/modules/link/src/Plugin/migrate/process/d6/FieldLink.php
@@ -57,6 +57,40 @@ protected function canonicalizeUri($uri) {
+      $anchor = "(?:#[a-z0-9" . $link_ichars . "_\-\.~+%=&,$'():;*@\[\]\/\?]*)";
...
+          return 'http://' . $uri;

This should be configurable. Some sites might opt to use SSL only. Let's make it a configurable option to the process plugin. Defaulting to http, since that is what was assumed in d6/d7.

heddn’s picture

Issue tags: +Vienna2017

tagging

jofitz’s picture

Status: Needs work » Needs review
FileSize
2.87 KB
6.45 KB

Since FieldLink is not d6 specific I have moved the file and updated the namespace as suggested by @heddn in #15.

jofitz’s picture

Status: Needs review » Needs work

I'm not sure how to make the configurable option recommended in #20. Can someone provide a link to an example or some guidance, please?

Setting back to Needs Work for that change.

maxocub’s picture

I think he means a configurable option like "bypass" and "default_value" are optional configuration of the static_map process plugin.

It looks like in core we are only using the d6_field_link plugin in a field plugin, not in any yml files. But since we are going to use http by default, we won't need to change the field plugins using it.

The only change needed is in the d6_field_link plugin itself, where we need to check if the configurable option is set, or if not, use the default.

rakesh.gectcr’s picture

Assigned: Unassigned » rakesh.gectcr
rakesh.gectcr’s picture

Status: Needs work » Needs review
Issue tags: +Needs tests, +Needs documentation
FileSize
10.39 KB
0 bytes

Discussed with @urmaxocub @Drupalcon vienna. We need to have a test coverage for the same and we need Api documentation for the process plugin, will be working on it.

rakesh.gectcr’s picture

FileSize
11.75 KB

OOPS! the interdiff was wrong... her is the one....

heddn’s picture

Assigned: rakesh.gectcr » Unassigned
Status: Needs review » Needs work

I'm going to work on my own feedback.

  1. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,125 @@
    +          if (!empty($this->configuration['url_scheme']) && ($this->configuration['url_scheme'] === "https://")) {
    

    Hmm, I don't think this is necessary.

  2. +++ /dev/null
    @@ -1,86 +0,0 @@
    -class FieldLink extends ProcessPluginBase implements ContainerFactoryPluginInterface {
    

    We cannot just delete this plugin, we need to leave it with trigger_warnings and an @deprecated added to it. And we will want to retain test coverage of it as well.

  3. +++ b/core/modules/link/tests/src/Unit/Plugin/migrate/field/d6/LinkFieldTest.php
    index 4b4eda2249..5b272e95d2 100644
    --- a/core/modules/link/tests/src/Unit/Plugin/migrate/process/d6/FieldLinkTest.php
    
    --- a/core/modules/link/tests/src/Unit/Plugin/migrate/process/d6/FieldLinkTest.php
    +++ b/core/modules/link/tests/src/Unit/Plugin/migrate/process/d6/FieldLinkTest.php
    

    This should change namespaces.

heddn’s picture

I've picked up my nits from #28. And we are now using this in d7 field plugin too. Lastly, I added test coverage for making the uri scheme configurable.

maxocub’s picture

Status: Needs review » Needs work
Issue tags: -Needs tests

I can't find a nit.

The only thing I find missing is a doc block on the field_link process plugin to document the new uri_scheme configuration and to show a code example.

Oh, and we also need the change record.

So back to NW for those reasons.

heddn’s picture

Title: Process plugin d6_field_link: URLs without http:// prefix are broken after migration » URLs without http:// are broken after migration from d6 or d8
Status: Needs work » Needs review
Issue tags: -Needs documentation, -Needs change record +migrate-d6-d8
FileSize
15.03 KB
4.17 KB

Here's a start at docs and change record. I'm not feeling very inspired, so if anyone see any areas for improvement, fix things up for me.

I also noticed that our deprecated 'd6_field_link' process plugin wasn't extending the new process plugin. That is fixed too in the patch.

heddn’s picture

Title: URLs without http:// are broken after migration from d6 or d8 » URLs without http:// are broken after migration from d6 or d7

Fixing title

maxocub’s picture

Status: Needs review » Needs work

I think the docs and the CR are great. I just found a few nits:

  1. +++ b/core/modules/link/src/Plugin/migrate/field/d7/LinkField.php
    @@ -45,4 +45,15 @@ public function processFieldInstance(MigrationInterface $migration) {
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function processFieldValues(MigrationInterface $migration, $field_name, $data) {
    +    $process = [
    +      'plugin' => 'field_link',
    +      'source' => $field_name,
    +    ];
    +    $migration->mergeProcessOfProperty($field_name, $process);
    +  }
    +
    

    This is identical to the parent method, so it can be removed.

  2. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,145 @@
    + * - uri_scheme: The URI scheme prefix to use for missing scheme urls. Defaults
    + *   to 'http://', which was the default in Drupal 6 and Drupal 7.
    

    We should add '(optional)'.

  3. +++ b/core/modules/link/tests/src/Unit/Plugin/migrate/process/FieldLinkTest.php
    @@ -0,0 +1,84 @@
    +  public function testCanonicalizeUri($url, $expected, $configuration) {
    ...
    +        [],
    ...
    +        [],
    ...
    +        [],
    ...
    +        [],
    ...
    +        [],
    ...
    +        [],
    ...
    +        [],
    ...
    +        [],
    

    We could add an empty array as the default value for the $configuration argument and remove all those empty arrays.

jofitz’s picture

Status: Needs work » Needs review
FileSize
14.22 KB
3.43 KB

Addressed @maxocub's nits from #33.

maxocub’s picture

Status: Needs review » Reviewed & tested by the community

Everything have been addressed, this looks ready, providing the tests are green.

catch’s picture

Priority: Normal » Critical
Issue tags: +Migrate critical

Since this is corrupted data by the migration, bumping to critical.

phenaproxima’s picture

Status: Reviewed & tested by the community » Needs work

So close! Just a few things...

  1. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,146 @@
    + *   urls. Defaults to 'http://', which was the default in Drupal 6 and Drupal
    

    Nit: Can this say "...URLs without a scheme"?

  2. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,146 @@
    +class FieldLink extends ProcessPluginBase implements ContainerFactoryPluginInterface {
    

    I see no need for this plugin to implement ContainerFactoryPluginInterface. It's not using any injected services.

  3. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,146 @@
    +    $this->migration = $migration;
    

    $this->migration is not used, so we should probably not bother mentioning it.

  4. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,146 @@
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition, MigrationInterface $migration = NULL) {
    +    return new static(
    +      $configuration,
    +      $plugin_id,
    +      $plugin_definition,
    +      $migration
    +    );
    +  }
    

    We're not using any injected dependencies, so we don't need this.

  5. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,146 @@
    +    if (empty($uri) || !is_null(parse_url($uri, \PHP_URL_SCHEME))) {
    

    Namespaced constants are a PHP 5.6 thing, so \PHP_URL_SCHEME violates Drupal's minimum PHP version requirement. This should just be PHP_URL_SCHEME.

    Also, we can lose the is_null() check, since parse_url() will return a truthy value if a scheme is present.

  6. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,146 @@
    +    if (strpos($uri, '<front>') === 0) {
    

    As far as I can tell, there is no need for strpos(); <front> is a full Drupal URI, so we could probably just straight-up return 'internal:/', right?

maxocub’s picture

Status: Needs work » Needs review
FileSize
13.74 KB
2.42 KB
  1. Done.
  2. Done.
  3. Done.
  4. Done.
  5. Done.
  6. In the Unit test, we verify that <front>?query=1 becomes internal:/?query=1, so we cannot just return internal:/ in case we have query parameters.
phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

Sa-weet. Back to RTBC on the assumption that Drupal CI will pass it.

larowlan’s picture

Status: Reviewed & tested by the community » Needs review
  1. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,131 @@
    +        $link_domains = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
    

    Are we sure we wants to hardcode this list?

    Will this break intranet links such as http://intranet.lan?

    Is there anything in UrlHelper we can reuse here instead of maintaining our own code?

  2. +++ b/core/modules/link/tests/src/Unit/Plugin/migrate/process/FieldLinkTest.php
    @@ -0,0 +1,76 @@
    +  public function canonicalizeUriDataProvider() {
    

    There are no tests for fragments here.

phenaproxima’s picture

Status: Needs review » Needs work

Good points. Let's fix those.

heddn’s picture

I thought the same thing about the list of domain endings. But this is for a migration from d6/d7 and that was the list allowed previously in contrib. So, we can improve, but the source data already follows that pattern. This is specifically about updating in a prices plugin from previous to the new URL format in d8. And if someone altered or hacked in previous versions of drupal, they can extend and add their own custom plugin. We just need to deal with standard upgrade paths.

larowlan’s picture

I thought the same thing about the list of domain endings. But this is for a migration from d6/d7 and that was the list allowed previously in contrib

Ok - can we get a comment to that effect ?

jofitz’s picture

Status: Needs work » Needs review
FileSize
14.07 KB
1.64 KB
  1. Added comment about hard-coded domain endings.
  2. Added tests for fragments.
phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

Fantastic. Let's land this.

catch’s picture

Two questions. The logic is relatively tricky to follow, but I don't have suggestions to simplify it.

  1. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,133 @@
    +      $link_ichars = '¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿŒœŠšŸŽžƒ';
    

    Is this copypasted from somewhere? If so can we provide a link to the source?

  2. +++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
    @@ -0,0 +1,133 @@
    +        $link_domains = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
    

    This wasn't configurable alterable in 6 or 7?

jofitz’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
15.6 KB
3.57 KB
  1. I have added an @see and tweaked the format to make it clear where $link_ichars came from.
  2. Good point, $link_domains was configurable so I have replaced the list with a regex.
catch’s picture

Status: Needs review » Needs work
+++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
@@ -69,8 +69,79 @@ protected function canonicalizeUri($uri) {
+        "¿", // &#x00BF;

The inline comments will fail coding standards checks, need to be in the line above.

Thanks for the domain change. Can we include a link that would previously have failed due to the domain check in the test coverage?

jofitz’s picture

Status: Needs work » Needs review
FileSize
14.38 KB
4.34 KB

Since the coding standards will not allow a similar array layout to that used in the Drupal 7 Link module I have returned to a single line string with explanation above (rather than have a 130+ line array!). This doesn't change the functionality at all.

I have added a link that would previously have failed due to the domain check (.xxx) and a link with non-standard characters and without protocol prefix (because the other link with non-standard characters was not testing the regex.

I have also corrected a few other coding standards errors.

phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me.

heddn’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/link/src/Plugin/migrate/process/FieldLink.php
@@ -0,0 +1,135 @@
+        $allowed_protocols = ['http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal'];

This was also configurable in d7. See http://cgit.drupalcode.org/link/tree/link.module?h=7.x-1.5-beta2#n1458

phenaproxima’s picture

The Drupal 8 counterpart is mentioned in filter.module, in the _filter_url() function:

  $protocols = \Drupal::getContainer()->getParameter('filter_protocols');

I don't know if the filter_allowed_protocols variable is migrate-able. Container parameters aren't like configuration; if I'm not mistaken, you have to write a service provider to modify them. That's light-years out of Migrate's scope.

IMHO, the correct approach here is to simply have the plugin use the container parameter (obtained via dependency injection, of course) and be done with it.

jofitz’s picture

Status: Needs work » Needs review
FileSize
14.16 KB
1.75 KB

After looking more closely, I realised that the $protocols variable is in fact a red herring: the whole else case is about dealing with URIs without a protocol (and internal links). Any valid links with a protocol are handled by parse_url() in the first conditional. So I have removed the protocol section of the regex.

phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

After looking more closely, I realised that the $protocols variable is in fact a red herring: the whole else case is about dealing with URIs without a protocol (and internal links).

Point. Dayum.

catch’s picture

Version: 8.5.x-dev » 8.4.x-dev
Status: Reviewed & tested by the community » Fixed

Committed/pushed to 8.5.x and cherry-picked to 8.4.x. Thanks!

  • catch committed df1dd60 on 8.5.x
    Issue #2897254 by Jo Fitzgerald, heddn, rakesh.gectcr, maxocub, masipila...

  • catch committed 9b6801e on 8.4.x
    Issue #2897254 by Jo Fitzgerald, heddn, rakesh.gectcr, maxocub, masipila...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

quietone’s picture

Publish change record.