Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
If some link field data is migrated in from D6/D7 and the rules for the link back there weren't as strict... then this can lead to a WSOD and InvalidArgumentException thrown from LinkFormatter->buildUrl() => LinkItem->getUrl => URI->fromUri().
This was discovered in a Search API scenario where the exception was breaking all of searching because one of the search results had bad link data.
Proposed resolution
Try/catch that thing and be a little more friendly in rendering the link field.
Remaining tasks
User interface changes
API changes
Data model changes
Comment | File | Size | Author |
---|---|---|---|
#73 | interdiff-2745179-50-73.txt | 536 bytes | acbramley |
#73 | 2745179-73.patch | 7.42 KB | acbramley |
|
Comments
Comment #2
Mac_Weber CreditAttribution: Mac_Weber as a volunteer commentedChanging the status to
major
as it leads to WSOD and it is also a system update issue.@juancasantovenia may you provide some examples and steps to reproduce?
Comment #3
dawehnerMh, so we have some kind of non existing validation as part of saving some data with our API. While I agree that the actual site should not fail, we should also validate in the first place and ensure that the migrations are written in a way, that this cannot cause an issue.
Comment #4
heddnPart of the issue is that the current code in LinkFormatter is doing a ternary. It shouldn't. Because the only valid returned values from
getUrl()
are InvalidArgumentException and a valid URL.This site was an early adopter of D8. So it seems like perhaps the link validation either 1) wasn't fired in a migration because validation didn't exist outside of the FAPI or 2) it still doesn't exist. I'm also guessing that the source data was a text field that was used to generate a wrapper link around some image or text. Which isn't all that uncommon in D6. I've opened #2745779: Migration into Link field doesn't fire link validation to triage the migrate issue.
Comment #5
heddnAnd now closing it duplicate to #2698083: D6->D8: Migrating links without leading slash leads to fatal error
Comment #6
heddnBased on the findings in #2698083: D6->D8: Migrating links without leading slash leads to fatal error, we need to a fix here in link module. There are too many live D8 sites at this point. Sure, we should fix migrate moving forward. But we also need to wrap that URL building in a try/catch. Are there other options I'm missing? The site I'm dealing with has over 300 nodes that had malformed URLs. Causing significant issues with Search API and general navigation of the site.
Comment #7
juancasantito CreditAttribution: juancasantito at MTech, LLC commentedComment #13
dawehnerYou are missing a @group statement :(
Comment #14
juancasantito CreditAttribution: juancasantito at MTech, LLC commentedComment #16
heddnThis is one possible solution.
Comment #20
leymannx@asolero, and I are working on triaging this issue at DrupalCon Vienna 2017 following the instructions in https://www.drupal.org/node/2474049. @ekl1773 is helping as our mentor.
Comment #21
leymannxSuccessfully applied https://www.drupal.org/files/issues/2745179-14.patch from #14 in 8.5.x and sent it for retesting.
Still unclear what "malformed data" is exactly in this context. How does a faulty link look like for example? How to actually reproduce this? Tagged for summary update.
Possibly related to #2233883: Link migration needs to convert source url into the appropriate route format for storage.
Comment #23
cbeier CreditAttribution: cbeier commentedI have successfully applied the patch from #14 with Drupal 8.4.
In my case, I had import some data from a csv. This data was not in the best quality and some urls was not a valid url. Example: "http://Mo.-Do. 8:30-16:00; Fr. 8:30-12:00". This has lead in the "InvalidArgumentException" error.
With the patch, the problem was not triggered anymore.
Comment #24
dawehnerDo we still need to handle the case that an empty URL is returned?
Comment #25
cbeier CreditAttribution: cbeier commentedI think this is not the problem. Because, the buildUrl() function is not executed if the url field is empty. Even, if I remove the url value from the field in the database, this function is not executed.
Comment #26
dawehnerThank you for clarifying!
Comment #27
catchCould we assert the log message as well here?
Comment #28
heddnI'm not sure if this was what was expected for the logging. I don't see any other tests in core that are doing this, so I had to guess this is what you mean.
Comment #29
XanoThe log message needs some love, as there is absolutely no information in there that will contribute to finding and fixing the root cause of the problem. Seeing as we catch the exception and convert it to a log message in the field formatter, can we add information about which entity and link field this is about, or perhaps even link to the edit page?
Comment #30
XanoHow about something like this? :)
Comment #32
joachim CreditAttribution: joachim as a volunteer commentedShouldn't we output a warning message as well?
Comment #33
heddnAs in a drupal_set_message type warning?
Comment #34
joachim CreditAttribution: joachim as a volunteer commentedYup. IIRC if the type is set to 'error', then only admins see it?
Comment #35
XanoSetting a message is for the current user, regardless of message severity. What we can do is see if the user had edit permissions for the entity, and if they do, we display the error with instructions to fix it, and a link to the edit page?
Comment #37
MegaChriz CreditAttribution: MegaChriz at WebCoo commentedA similar error occurs for the UriLinkFormatter "uri_link" formatter.
If a (base) field is defined as "uri" and the field contains for example "private://hello.txt" and you try to render that with "uri_link" formatter an exception of type InvalidArgumentException is thrown with the following message:
This can lead to fatal errors if an user accidentally selects this formatter for an uri field. For background, see #3043433: Error when trying to render field 'source' as uri.
I can open a new issue for this, but it seems at least closely related to this issue.
Comment #38
dpiThe getUrl method already guarantees a
\Drupal\Core\Url
object is returned, we should move exception handling there, and return theUrl::fromRoute('<none>')
instance when the exception is thrownI dont think we should log any watchdog messages, we cant be expected to log messages whenever malformed data is loaded. Besides, the Url utility is already designed to throw exceptions in these cases, and we can safely ignore them. The existing line of code made the assumption that getUrl could return NULL, and falls back to the
Url::fromRoute('<none>')
instance, without logging any errors.Comment #39
dpiComment #40
dpiComment #41
dpiInstead of doing that, updated
\Drupal\link\Plugin\Field\FieldType\LinkItem::getUrl
to document it potentially throwing exceptions.\Drupal\link\Plugin\Validation\Constraint\LinkTypeConstraintValidator::validate
already catches this exception directly, and doesn't log or cause any other noise.Major rewrite to tests from Kernel to Unit, as we're testing internals, not a functional site or the render system.
Test cases include expected exception, unexpected exception, valid URL.
Comment #42
jibranThanks, the fix makes sense.
Comment #43
alexpottThis change makes sense. I can't see how $item->getUrl() can ever return anything other than a Url object or throw an exception.
I think I'm +1 on the new approach but will wait a bit before committing to see what others think.
Comment #44
MegaChriz CreditAttribution: MegaChriz at WebCoo commentedThe change looks okay (though I haven't studied the tests), but this issue probably would require follow-ups for other field formatters that try to generate urls.
For example:
I haven't checked TelephoneLinkFormatter, but if an uri is something like
private://hello.txt
, then trying to display that with the field formatter "uri_link" results into a fatal error. For background, see #3043433: Error when trying to render field 'source' as uri.Comment #46
heddnComment #48
heddnQuick re-roll now that #3059090: Deprecate \Drupal\Tests\PhpunitCompatibilityTrait::setExpectedException() has landed. Back to RTBC.
Comment #49
heddnComment #50
heddnGoofed that up. Here's a better re-roll.
Comment #51
alexpottSo I guess one thing we should be thinking about is should this result in a log message? @dpi argues that we shouldn't be logging something here but entities with invalid data are an error state that we should inform site owners about. How else are they going to know? At the very least we should have a comment here as to why we are not logging and eating the exception.
Comment #52
MegaChriz CreditAttribution: MegaChriz at WebCoo commented@alexpott
I would argue that the data does not have to be invalid, just not renderable by the selected format. In my opinion, something like
private://hello.txt
is a valid uri as it complies with the definition of a uri: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Definition. It is just that the formatter can't handle that type of uri currently.Perhaps we could add an option on the formatter for logging not renderable data? This way people can choose whether or not they want to catch data that the formatter can't handle.
Comment #53
dpiI'm not particularly fond of logging messages to admin/system logs due to errors potentially caused by user invalid user input.
There is the existing
LinkTypeConstraintValidator
validator which eats the exception, which I'm guessing is a higher traffic code path than this widget, seemingly without any problem since at least 2016.Comment #55
bucefal91 CreditAttribution: bucefal91 at Ocelot commentedWe are also experiencing the same issue on our website (D7 data migrated into D8, too)
For what it's worth, in my opinion committing this patch with or without log entry is better than not committing it. I opinion just a PHP comment about why we eat the exception is good enough, I do not think a log entry about malformed data is necessary. In fact, whenever you attempt to edit such malformed entity, Drupal validation API will not let you save it any more with an error message like "Web links entered must begin with ?, /, #, http:// or https://".
As for the patch itself #50, I find it impeccable :)
Comment #59
kim.pepperComment #60
kevinquillen CreditAttribution: kevinquillen at Velir commentedLogging is preferable with a handled exception so you are at least aware there is a problem and know where to look. If you are migrating data frequently (think feeds, APIs) into fields, this can happen often - so it isn't just when someone enters data into the Drupal admin form UI.
Here is one example of how it can get into Drupal:
Where 'uri' is bad, only its stored into the field. Then, when rendering, you get:
.
Comment #61
heddnre #60: one suggestion in this specific case is to turn on entity validation on the destination entity plugin.
Comment #62
kevinquillen CreditAttribution: kevinquillen at Velir commentedWill note that in the future. Might be hard to go that route though since we are over 100,000 items.
The patch in #50 did apply though in 9.2.x, and fixed the issue I mentioned in #60. I can't replicate the error from that anymore.
Comment #65
xjmComment #67
acbramley CreditAttribution: acbramley at PreviousNext for Service NSW commentedI'm wondering if there's appetite to commit the patch as is and move the logging discussion into a follow-up? I can see there being a bit of churn to get that right and this has been sitting untouched for quite a while now.
Comment #68
acbramley CreditAttribution: acbramley at PreviousNext for Service NSW commentedSetting to NR to see if anyone else agrees with #67
Comment #69
smustgrave CreditAttribution: smustgrave at Mobomo commentedBeen 5 months. @acbramley lets do what you suggested. Moving to NW for the follow up.
Comment #70
acbramley CreditAttribution: acbramley at PreviousNext for Service NSW commentedCreated #3348020: Add logging to LinkFormatter and LinkWidget when the URL is invalid
Comment #71
smustgrave CreditAttribution: smustgrave at Mobomo commentedMoving to NW as the patch matches the issue summary.
Test fails and shows the issue with error InvalidArgumentException
Think this is good. Patch #50 applies to 10.1 cleanly also.
Comment #72
larowlanIn which case, let's add a comment referring to the followup (e.g. @todo) as per #51
Comment #73
acbramley CreditAttribution: acbramley at PreviousNext for Service NSW commentedAdded todo.
Comment #74
smustgrave CreditAttribution: smustgrave at Mobomo commentedTodo was added.
Comment #75
godotislate CreditAttribution: godotislate at Digital Polygon commentedJust noting there is a similar issue with the Link Widget for users who don't have "link to any page" permissions: #3340154: Link-widget throws exception when rebuilding a form with an invalid uri
Comment #76
larowlanCommitted to 10.1.x and backported to 10.0.x
Ran the new test locally on 9.5.x (PHP 7.3)
Comment #81
gaddman CreditAttribution: gaddman commentedA similar problem exists with the MailToFormatter where an invalid email address is present, eg '000' - fromUri throws an InvalidArgumentException resulting in WSOD. Although the UI prevents this, our system ingests data from other systems which hasn't been validated. Is it worth raising a separate issue for that? Or is it worth addressing at all, since really the data should be valid in the first place?