Problem/Motivation
If a text field contains a link to a managed file, e.g. href='/sites/default/files/foo.pdf', the HtmlLink tracker does not identify the link target as a file entity, so no usage record is created for that file. If you rely on usage information to remove every reference to a file, this link will be missed and will end up returning a 404 or other error.
I believe the problem is that the UrlToEntity code depends on converting URLs to routes, and file URLs are direct links that do not correspond to routes, so they are never identified.
Steps to reproduce
- Files should be allowed as targets and content as sources
- Upload or use an existing managed file
- Get the direct URL to the file and its fid
- Add a link to the file as a manually entered URL, e.g. <a href='/sites/default/files/foo.pdf'>Test link</a>
- Note the node id
- Save the page.
- Query the usage database table for target_id = the fid and source_id = the node id
- There will be no entry
Proposed resolution
The UrlToEntity class should check whether:
- file entities are allowed as targets, and
- the URL starts with the public file system path.
If both are true, convert the URL to a public:// URI (public:// followed by the path with the public file path prefix removed). That URI can then be used to query the uri field in the file_managed table to find the fid.
Return this file entity info.
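The conversion step described above can be sketched in a language-neutral way. The following Python snippet is only an illustration of the logic, not module code: the function name `url_to_public_uri`, the `files_allowed_as_targets` flag, and the hard-coded `PUBLIC_FILES_PATH` are all assumptions made for this example. On a real site the public file path is configurable, and the resulting URI would still need to be looked up to find the fid.

```python
from typing import Optional
from urllib.parse import unquote, urlparse

# Assumption for this sketch: the site's public file path.
# On a real site this value is configurable and must not be hard-coded.
PUBLIC_FILES_PATH = "/sites/default/files"


def url_to_public_uri(url: str, files_allowed_as_targets: bool = True) -> Optional[str]:
    """Return a public:// URI for a direct public file URL, or None."""
    # Check 1: file entities must be allowed as targets.
    if not files_allowed_as_targets:
        return None

    # Keep only the path (drops scheme, host, query string, fragment)
    # and decode percent-encoding such as %20.
    path = unquote(urlparse(url).path)

    # Check 2: the path must start with the public file system path.
    prefix = PUBLIC_FILES_PATH.rstrip("/") + "/"
    if not path.startswith(prefix):
        return None

    relative = path[len(prefix):]
    if not relative:
        return None

    # This URI is what would be matched against the uri field.
    return "public://" + relative
```

Note that this sketch ignores the host of absolute URLs, so a real implementation would also have to confirm that the link points at the current site.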
Remaining tasks
Proposed plan discussion
Write the code.
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #2 | 3537701-2-expand-public-file-pattern.patch | 752 bytes | chewi3 |
Comments
Comment #2
chewi3 commented
I had a similar issue after updates where direct file links in body texts were no longer tracked for usage through HtmlLink. After debugging this issue, it turned out that the public file regex pattern in PublicFileIntegration was changed as part of https://www.drupal.org/project/entity_usage/issues/3514883 and now no longer matches file paths like "/sites/default/files/imagename.jpeg".
I made a patch that re-adds the optional leading slash which was present in earlier versions. I also made sure we never get a double trailing slash (which I got at one point during testing).
This new pattern should now handle both external and local file systems. However, I did not actually test this with an external file system, so it would be great if someone could confirm that this still works with S3 or similar.
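To illustrate the two properties mentioned here (an optional leading slash and no double trailing slash), a rough Python sketch follows. It is not the pattern from the attached patch or from PublicFileIntegration. The helper name `build_public_file_pattern` and the example base paths are hypothetical and only show the general idea.

```python
import re


def build_public_file_pattern(base_path: str) -> "re.Pattern[str]":
    """Build a pattern matching file paths under base_path (hypothetical helper)."""
    # Strip slashes from both ends so that exactly one separator slash is
    # appended below, whether or not the configured path ends with "/".
    normalized = base_path.strip("/")
    # "^/?" makes the leading slash optional, so both
    # "/sites/default/files/x.jpeg" and "sites/default/files/x.jpeg" match.
    return re.compile(r"^/?" + re.escape(normalized) + r"/(?P<rest>.+)$")


local = build_public_file_pattern("sites/default/files/")
match = local.match("/sites/default/files/imagename.jpeg")
relative_path = match.group("rest") if match else None

# An absolute base URL (e.g. an external file system) goes through the same
# normalization; the bucket host name here is made up for illustration.
external = build_public_file_pattern("https://bucket.example.com/files/")
```

Stripping the slashes before appending one keeps the pattern identical whether the base path is configured with or without a trailing slash.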
Comment #3
idebr commented
Comment #4
james.williams
Looks like this is the same as #3521603: Local linked files are no longer tracked, which is older and has an MR?