Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
- Process plugin d6_field_link is used to migrate the content of Link fields from an older version of Drupal. This same process plugin is used also with D7-D8 migrations since the d7 link_field plugin extends d6.
- The migration works correctly if the source URL is a full absolute URL with a protocol prefix (e.g. http://).
- In Drupal 6/7, the Link fields also accept an URL without the protocol prefix, in other words a Link field with URL value www.example.com will work in Drupal 7. When this value is migrated to Drupal 8, the link is broken because Drupal 8 Link field expects the URL value to be a full absolute URL or a relative URL to the Drupal site itself.
Proposed resolution
To avoid breaking links that were working Drupal 7, add the following logic to d6_field_link process plugin:
- If the URL starts with "www.", add a "http://" prefix
Remaining tasks
Patch
Add tests
Review
Commit
User interface changes
None.
API changes
None.
Data model changes
None.
Comment | File | Size | Author |
---|---|---|---|
#53 | interdiff-49-53.txt | 1.75 KB | jofitz |
#53 | 2897254-53.patch | 14.16 KB | jofitz |
#49 | interdiff-47-49.txt | 4.34 KB | jofitz |
#49 | 2897254-49.patch | 14.38 KB | jofitz |
#47 | interdiff-44-47.txt | 3.57 KB | jofitz |
Comments
Comment #2
masipila CreditAttribution: masipila as a volunteer commentedPatch attached.
We should probably add automated tests for this and other Link migrations, I'll leave that to someone with more experience with tests.
Comment #3
masipila CreditAttribution: masipila as a volunteer commentedChanging the status to Needs work because we need to add tests.
Comment #5
jofitz CreditAttribution: jofitz at ComputerMinds commentedAdded tests.
No interdiff because the only change is adding the tests.
Comment #7
masipila CreditAttribution: masipila as a volunteer commentedUpdated issue summary / remaining tasks to help reviewers.
Current status:
Comment #8
heddnWhy are we stopping at www? Why not anything that doesn't have a protocol? But then what about https? This seems like a slippery slope. Did a link field in d6/d7 handle this natively? What was the state there? How did it handle it? We would want to re-process in a similar manner. Otherwise, I feel this is something that shouldn't go into core.
Comment #9
jofitz CreditAttribution: jofitz at ComputerMinds commentedI had a look at D7's
link_validate_url()
for some inspiration, but that may be overkill. 'All' we need is something like:Comment #10
heddnLink was in contrib in d7 and I recall it had some very fancy means for storing and building links. So, I'm OK with the simple, check in #9. I am concerned if passing along a malformed URL is going to cause any issue.
What if the url is
/foo
or//bar
? Can we add this as an additional configuration parameter to auto-prefix links? I'm worried this will not catch all the cases and I wonder it would be good to disable this by default, or something.But, did d7 link auto convert www.example.com into http://www.example.com? Or what? If it didn't do any of that, then I don't see what we need to fix it on migration. GIGO. If someone needs to fix it for /their/ garbage, they can do that in a custom process plugin.
Comment #11
jofitz CreditAttribution: jofitz at ComputerMinds commentedA quick experiment concludes that '/foo' transforms to 'internal:/foo', while '//bar' transforms to 'internal:/bar' which is acceptable, imo.
Yes, D7 prepended 'http://' onto external links without a protocol.
Here is a quick work-in-progress that I've thrown together before stopping for the day.
Comment #12
heddnre #11 this is great news. I'll take a look here, but I'm much more comfortable with tacking it on in the migration if that's how d7 link handled things.
Comment #13
heddnDoes this only effect d6 and not D7?
Should we remove the @TODO and add all the letters? And a couple of those new characters should have a test added for them, no?
Usually we use lowercase variable names.
Same here. Lowercase.
Comment #14
masipila CreditAttribution: masipila as a volunteer commentedheddn, this same process plugin is used with both d6-d8 and d7-d8 migrations. Like the first sentence of the issue summary says :)
Comment #15
heddnThen, we need move the namespace around here on this process plugin. Or use a trait, or something.
Comment #16
jofitz CreditAttribution: jofitz at ComputerMinds commentedEDIT: mentioned new test.
Comment #17
jofitz CreditAttribution: jofitz at ComputerMinds commented@masipila Where is d6_field_link used for D7 migrations? I can't find it in the codebase.
Comment #18
masipila CreditAttribution: masipila as a volunteer commentedRe: #17
When I'm looking at the migrations for node type that has a link field, I see this:
I didn't back track this all the way to check which piece of code is handling the Link field mappings.
Markus
Comment #19
masipila CreditAttribution: masipila as a volunteer commented@heddn, @Jo Fitzgerald: I guess D7 uses this process process plugin because class LinkField extends D6LinkField.
https://api.drupal.org/api/drupal/core%21modules%21link%21src%21Plugin%2...
Hope this helps,
Markus
Comment #20
heddnLooking very close.
This should be configurable. Some sites might opt to use SSL only. Let's make it a configurable option to the process plugin. Defaulting to http, since that is what was assumed in d6/d7.
Comment #21
heddntagging
Comment #22
jofitz CreditAttribution: jofitz at ComputerMinds commentedSince FieldLink is not d6 specific I have moved the file and updated the namespace as suggested by @heddn in #15.
Comment #23
jofitz CreditAttribution: jofitz at ComputerMinds commentedI'm not sure how to make the configurable option recommended in #20. Can someone provide a link to an example or some guidance, please?
Setting back to Needs Work for that change.
Comment #24
maxocub CreditAttribution: maxocub for Acquia commentedI think he means a configurable option like "bypass" and "default_value" are optional configuration of the static_map process plugin.
It looks like in core we are only using the d6_field_link plugin in a field plugin, not in any yml files. But since we are going to use http by default, we won't need to change the field plugins using it.
The only change needed is in the d6_field_link plugin itself, where we need to check if the configurable option is set, or if not, use the default.
Comment #25
rakesh.gectcrComment #26
rakesh.gectcrDiscussed with @urmaxocub @Drupalcon vienna. We need to have a test coverage for the same and we need Api documentation for the process plugin, will be working on it.
Comment #27
rakesh.gectcrOOPS! the interdiff was wrong... her is the one....
Comment #28
heddnI'm going to work on my own feedback.
Hmm, I don't think this is necessary.
We cannot just delete this plugin, we need to leave it with trigger_warnings and an @deprecated added to it. And we will want to retain test coverage of it as well.
This should change namespaces.
Comment #29
heddnI've picked up my nits from #28. And we are now using this in d7 field plugin too. Lastly, I added test coverage for making the uri scheme configurable.
Comment #30
maxocub CreditAttribution: maxocub for Acquia commentedI can't find a nit.
The only thing I find missing is a doc block on the field_link process plugin to document the new uri_scheme configuration and to show a code example.
Oh, and we also need the change record.
So back to NW for those reasons.
Comment #31
heddnHere's a start at docs and change record. I'm not feeling very inspired, so if anyone see any areas for improvement, fix things up for me.
I also noticed that our deprecated 'd6_field_link' process plugin wasn't extending the new process plugin. That is fixed too in the patch.
Comment #32
heddnFixing title
Comment #33
maxocub CreditAttribution: maxocub for Acquia commentedI think the docs and the CR are great. I just found a few nits:
This is identical to the parent method, so it can be removed.
We should add '(optional)'.
We could add an empty array as the default value for the $configuration argument and remove all those empty arrays.
Comment #34
jofitz CreditAttribution: jofitz at ComputerMinds commentedAddressed @maxocub's nits from #33.
Comment #35
maxocub CreditAttribution: maxocub for Acquia commentedEverything have been addressed, this looks ready, providing the tests are green.
Comment #36
catchSince this is corrupted data by the migration, bumping to critical.
Comment #37
phenaproximaSo close! Just a few things...
Nit: Can this say "...URLs without a scheme"?
I see no need for this plugin to implement ContainerFactoryPluginInterface. It's not using any injected services.
$this->migration is not used, so we should probably not bother mentioning it.
We're not using any injected dependencies, so we don't need this.
Namespaced constants are a PHP 5.6 thing, so \PHP_URL_SCHEME violates Drupal's minimum PHP version requirement. This should just be PHP_URL_SCHEME.
Also, we can lose the is_null() check, since parse_url() will return a truthy value if a scheme is present.
As far as I can tell, there is no need for strpos(); <front> is a full Drupal URI, so we could probably just straight-up return 'internal:/', right?
Comment #38
maxocub CreditAttribution: maxocub as a volunteer commented<front>?query=1
becomesinternal:/?query=1
, so we cannot just returninternal:/
in case we have query parameters.Comment #39
phenaproximaSa-weet. Back to RTBC on the assumption that Drupal CI will pass it.
Comment #40
larowlanAre we sure we wants to hardcode this list?
Will this break intranet links such as http://intranet.lan?
Is there anything in UrlHelper we can reuse here instead of maintaining our own code?
There are no tests for fragments here.
Comment #41
phenaproximaGood points. Let's fix those.
Comment #42
heddnI thought the same thing about the list of domain endings. But this is for a migration from d6/d7 and that was the list allowed previously in contrib. So, we can improve, but the source data already follows that pattern. This is specifically about updating in a prices plugin from previous to the new URL format in d8. And if someone altered or hacked in previous versions of drupal, they can extend and add their own custom plugin. We just need to deal with standard upgrade paths.
Comment #43
larowlanOk - can we get a comment to that effect ?
Comment #44
jofitz CreditAttribution: jofitz at ComputerMinds commentedComment #45
phenaproximaFantastic. Let's land this.
Comment #46
catchTwo questions. The logic is relatively tricky to follow, but I don't have suggestions to simplify it.
Is this copypasted from somewhere? If so can we provide a link to the source?
This wasn't configurable alterable in 6 or 7?
Comment #47
jofitz CreditAttribution: jofitz at ComputerMinds commentedComment #48
catchThe inline comments will fail coding standards checks, need to be in the line above.
Thanks for the domain change. Can we include a link that would previously have failed due to the domain check in the test coverage?
Comment #49
jofitz CreditAttribution: jofitz at ComputerMinds commentedSince the coding standards will not allow a similar array layout to that used in the Drupal 7 Link module I have returned to a single line string with explanation above (rather than have a 130+ line array!). This doesn't change the functionality at all.
I have added a link that would previously have failed due to the domain check (.xxx) and a link with non-standard characters and without protocol prefix (because the other link with non-standard characters was not testing the regex.
I have also corrected a few other coding standards errors.
Comment #50
phenaproximaLooks good to me.
Comment #51
heddnThis was also configurable in d7. See http://cgit.drupalcode.org/link/tree/link.module?h=7.x-1.5-beta2#n1458
Comment #52
phenaproximaThe Drupal 8 counterpart is mentioned in filter.module, in the _filter_url() function:
I don't know if the filter_allowed_protocols variable is migrate-able. Container parameters aren't like configuration; if I'm not mistaken, you have to write a service provider to modify them. That's light-years out of Migrate's scope.
IMHO, the correct approach here is to simply have the plugin use the container parameter (obtained via dependency injection, of course) and be done with it.
Comment #53
jofitz CreditAttribution: jofitz at ComputerMinds commentedAfter looking more closely, I realised that the $protocols variable is in fact a red herring: the whole else case is about dealing with URIs without a protocol (and internal links). Any valid links with a protocol are handled by parse_url() in the first conditional. So I have removed the protocol section of the regex.
Comment #54
phenaproximaPoint. Dayum.
Comment #55
catchCommitted/pushed to 8.5.x and cherry-picked to 8.4.x. Thanks!
Comment #59
quietone CreditAttribution: quietone at PreviousNext commentedPublish change record.