Problem/Motivation

Quoting @effulgentsia in #2335661-23: Outbound path & route processors must specify cacheability metadata:

See also #2417793-46: Allow entity: URIs to be entered in link fields, where changes to URL aliases don't result in a cache clear of rendered entities containing formatted links. Maybe that needs its own issue, but seems highly related to this, so just noting it here for now.

Reply by @Wim Leers in #2335661-98: Outbound path & route processors must specify cacheability metadata:

This is exactly what I feared was the case. Note that when changing the URL alias of a node, the node updates a MenuLinkContent entity. Which means everything is updated correctly. It is only when updating URL aliases via /admin/config/search/path, that no invalidation happens. But, to be honest, that's really a problem independent of this issue: URL aliases don't track changes and hence also not invalidations in any way. This issue is about fixing the more general problem, that sits a level higher: to make it possible for those outbound path/route processors that do have cacheability metadata, to be able to associate that cacheability with generated URLs.
So we should fix URL aliases in a separate issue.

This is that separate issue.

Proposed resolution

As written in #12:

When rendering:
  1. For all route params that are used to generate the link, associate the cache tags of those that implement CacheableDependencyInterface with the rendered link.
When saving an alias in the URL alias UI:
  1. Find the route name + params associated with the source path (=== "run routing").
  2. Get the set of cache tags from the route params.
  3. If empty set of cache tags: invalidate the 'rendered' cache tag.
  4. If non-empty set of cache tags: invalidate those cache tags.
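
The steps above can be modelled in a few lines. What follows is a minimal, language-neutral sketch in Python, not Drupal code: `Entity`, `RenderCache`, `match_route`, `link_tags`, `render_with_link` and `tags_to_invalidate_on_alias_save` are all hypothetical names made up for illustration, and the routing step is reduced to parsing paths of the form `/type/id`. It assumes every render-cached item also carries the 'rendered' tag, so that the fallback invalidation reaches it.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a route parameter that exposes cache tags."""
    entity_type: str
    entity_id: int

    def cache_tags(self):
        return {f"{self.entity_type}:{self.entity_id}"}


@dataclass
class RenderCache:
    """Toy tag-aware cache: cache ID -> (markup, tags)."""
    items: dict = field(default_factory=dict)

    def set(self, cid, markup, tags):
        self.items[cid] = (markup, set(tags))

    def invalidate_tags(self, tags):
        # Drop every item that carries at least one of the given tags.
        tags = set(tags)
        self.items = {
            cid: item for cid, item in self.items.items()
            if not (item[1] & tags)
        }


def match_route(source_path, entities):
    """Hypothetical 'run routing' step: map a source path to route params."""
    parts = source_path.strip("/").split("/")
    if len(parts) == 2 and parts[1].isdigit():
        entity = entities.get((parts[0], int(parts[1])))
        return [entity] if entity else []
    return []


def link_tags(route_params):
    """When rendering: collect the tags of params that expose cache tags."""
    tags = set()
    for param in route_params:
        if hasattr(param, "cache_tags"):
            tags |= param.cache_tags()
    return tags


def render_with_link(cache, cid, markup, route_params):
    """Cache markup containing a link, tagged with the link's dependencies."""
    cache.set(cid, markup, {"rendered"} | link_tags(route_params))


def tags_to_invalidate_on_alias_save(source_path, entities):
    """When saving an alias: the params' tags, or 'rendered' as a fallback."""
    tags = link_tags(match_route(source_path, entities))
    return tags or {"rendered"}
```

In this model, saving an alias for `/node/2` only drops cached items that were rendered with a link to node 2, while saving an alias for a path whose route has no cacheable params falls back to dropping everything tagged 'rendered'.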

Remaining tasks

Do it.

User interface changes

None.

API changes

None.


Comments

Wim Leers’s picture

Berdir’s picture

Let's continue the discussion from #2336597: Convert path aliases to full featured entities here.

Right, but aliases being entities means that we don't have to retrofit aliases with cache tags in some hacky way; they'll be able to use cache tags in exactly the same way as others can today.

I disagree :)

Having a cache tag is one line of code. But you can't actually have a cache tag on the *alias*; we somehow need a cache tag for the source path or something like that. How else can you invalidate something when you *add* a new alias for something that didn't have one before?

And just using a list cache tag or so would invalidate most of the render cache every time you add an alias, not fun.

This is a pretty hard one...

mikeytown2’s picture

Is this something where we would want to use a system similar to the expire module where it does a search for things that need to be expired like a URL alias given a path?

Wim Leers’s picture

#3 Can you explain what the Expire module does?

mikeytown2’s picture

The original idea behind the expire module is when a node changes find all the related paths where that node can show up and output a list of URLs to clear from something like varnish. So if the node is tagged with front, be sure to include that in the list, or if it has the alias of about-us include that.

What made me think of this is that aliases can be used in reverse. I'm assuming we know the machine name but need to find the aliases associated with it: just do a reverse lookup of the alias. Or did I completely misread the issue summary?

My guess on the current issue:
Node 1 (contact us) contains a link to node 2 (about us). Someone goes to /admin/config/search/path and decides to change the about-us alias to team-awesome. Node 1 needs to be invalidated because its cached output is still using the old alias. My current assumption is that linking node 1 to node 2 is kept somewhere besides a raw a href so we can use that to do a reverse lookup.

Wim Leers’s picture

My guess on the current issue: [...]

Exactly!

My current assumption is that linking node 1 to node 2 is kept somewhere besides a raw a href so we can use that to do a reverse lookup.

So, that means maintaining a reverse look-up table, even for every possible link, including links generated by filters. How can we possibly do that?
Only when generating URLs, we can know about all the dependencies involved in generating that link, including a dependency on a path alias.
But that then points to the next problem, which is the problem I was getting at above: creating a cache tag per URL alias would result in

  1. (arguably) far too many cache tags overall
  2. (unquestionably) far too many cache tags per response (one per URL alias linked to in a response)

While writing this, I thought of a possibly viable strategy: rather than having a cache tag per URL alias, we can choose to have a fixed number of cache tags: N (10, 50, 100). We hash the source path into the corresponding Ni value, and that's our cache tag. The downside is that we'd always invalidate (100/N) percent of cached responses. This is why we want to choose a higher N rather than a lower one. But that also translates into more cache tags being associated with the cached response.
In other words: there's always some level of pain: either we invalidate more than we'd like, or we have huge cache tags headers. This is the downside of having a system of URL aliases. We're just making the actual cost of that visible for the first time.
(Well, the second time, my very first Drupal core contribution was #106559: drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path(), over 8 years ago!)
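
To make the fixed-N idea concrete, here is a minimal sketch in Python (not Drupal code). The tag prefix `path_alias_bucket` and the function names are hypothetical, and SHA-1 is just one possible choice of stable hash; any hash that stays the same across requests would do.

```python
import hashlib


def alias_bucket_tag(source_path, n_buckets=100):
    """Map a source path to one of n_buckets cache tags (hypothetical naming)."""
    digest = hashlib.sha1(source_path.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce it to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % n_buckets
    return f"path_alias_bucket:{bucket}"


def response_tags(linked_source_paths, n_buckets=100):
    """Tags a response would carry for the aliases it links to."""
    return {alias_bucket_tag(path, n_buckets) for path in linked_source_paths}
```

However many aliased paths a response links to, it carries at most N such tags. Changing the alias for one source path invalidates the single bucket tag for that path, and with it every cached response that links to any path hashed into the same bucket.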

mikeytown2’s picture

Is the url_alias table viable?

Here's a query that joins the url_alias table to the node table in D7

SELECT 
  COUNT(*) as counter, 
  url_alias.*,
  node.*
FROM url_alias AS url_alias 
INNER JOIN node AS node 
  ON node.nid = CAST(substring(url_alias.source, 6) AS UNSIGNED) 
  AND node.status = 1
GROUP BY node.nid
ORDER BY counter DESC

It takes about 10 seconds with 1.7M aliases. This is just to show that one can use the url_alias table to find the nodes associated with them. Maybe we add some fields to the url_alias table to make this a lot faster and less brittle.

I really haven't taken a look at this issue in D8; just throwing some ideas out :)

In terms of cache tags I haven't looked at the D8 implementation of it yet but I will say that GROUP_CONCAT in MySQL is powerful. Using it you can take what would have taken 2+ queries and make it only take one query. An example of this is the role retrieval of the user on a session load; in D7 this is always 2 queries but by using GROUP_CONCAT, CONCAT & GROUP BY one can use just one query if you do some minimal processing after the fact. Would something like this help with the issue of having 100 aliases grouped together? Use SHA1() to hash the GROUP_CONCAT field to make the output smaller (see advagg bundler for an example of this).
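
As an illustration of the single-query pattern described above, here is a small sketch using Python's built-in sqlite3 module, whose GROUP_CONCAT aggregate behaves much like MySQL's for this purpose. The `users` and `users_roles` tables are simplified stand-ins, and `load_user_with_roles` is a hypothetical helper, not a Drupal function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users_roles (uid INTEGER, rid INTEGER);
INSERT INTO users VALUES (1, 'admin'), (2, 'editor'), (3, 'guest');
INSERT INTO users_roles VALUES (1, 3), (1, 4), (2, 4);
""")


def load_user_with_roles(conn, uid):
    """Fetch a user and all of their role IDs in one query."""
    row = conn.execute(
        """
        SELECT u.uid, u.name, GROUP_CONCAT(ur.rid) AS rids
        FROM users AS u
        LEFT JOIN users_roles AS ur ON ur.uid = u.uid
        WHERE u.uid = ?
        GROUP BY u.uid
        """,
        (uid,),
    ).fetchone()
    if row is None:
        return None
    user_id, name, rids = row
    # Minimal post-processing: split the concatenated role IDs back into a list.
    roles = sorted(int(rid) for rid in rids.split(",")) if rids else []
    return {"uid": user_id, "name": name, "roles": roles}
```

The post-processing step is the "minimal processing after the fact": the database returns one row per user, and the application splits the concatenated column back into individual values.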

Berdir’s picture

The substring query is a known issue, fixed in D8 with a D7 patch: #1209226: Avoid slow query for path alias whitelists. And it is not related to this issue, I think.

Wim Leers’s picture

Wim Leers’s picture

Wim Leers’s picture

Title: URL aliases don't track changes, hence don't do necessary invalidations » Using the URL alias UI to change aliases doesn't do necessary invalidations

Discussed with catch. While doing so, I thought of a solution:

When rendering:
  1. For all route params that are used to generate the link, associate the cache tags of those that implement CacheableDependencyInterface with the rendered link.
When saving an alias in the URL alias UI:
  1. Find the route name + params associated with the source path (=== "run routing").
  2. Get the set of cache tags from the route params.
  3. If empty set of cache tags: invalidate the 'rendered' cache tag.
  4. If non-empty set of cache tags: invalidate those cache tags.

That makes it so that even render cached nodes that link to themselves and whose URL alias is changed in the URL alias UI are kept up-to-date.

The quick 'n dirty solution for now, which catch finds acceptable too: always invalidate the 'rendered' cache tag when saving an alias in the URL alias UI. Personally, I think it's better to keep it as "broken" as it is today until we fix it in the above way, which has far more precise cache invalidations.

Wim Leers’s picture

Issue summary: View changes
dawehner’s picture

Component: base system » path.module
Wim Leers’s picture

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

effulgentsia’s picture

I also just noticed that the 4xx-response cache tag, introduced in #2472281 (404/403 responses for non-existing nodes are cached in Page Cache/reverse proxy, are not invalidated when the node is created), doesn't get invalidated when a URL alias is updated via /admin/config/search/path. It only gets cleared when an entity is saved.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Wim Leers’s picture

Wim Leers’s picture

Title: Using the URL alias UI to change aliases doesn't do necessary invalidations » Using the URL alias UI to change aliases doesn't do necessary invalidations: path aliases don't have cache tags
Berdir’s picture

Sure that this is the correct issue reference?

What you need first is #2539634: PathItem::delete() never runs because the path field type is a computed field in disguise, and then a follow-up of that to actually compute/load the alias when accessed, which is a better approach than what that other issue is doing. I don't want to always load them, only when necessary.

Wim Leers’s picture

davidwbarratt’s picture

This issue is blocking #2835528: Path module is missing a REST plugin from being cacheable. :(

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

andypost’s picture

Version: 8.6.x-dev » 8.7.x-dev

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

prabhu9484’s picture

Just curious, does this issue cover this problem statement?

Actual:
When we change the URL alias of an already created node, then refreshing the old URL returns a 404

Expected:
The new URL and page content should be returned and not a 404 error.

Approach:
See if it is possible to seamlessly remap the URL.

Use Case: Content author created a new page of content and then the SEO team asks for the URL to be changed. Based on SEO recommendations, the content author changes the node title and URL alias. When the author refreshes the old URL, automagically the new URL and page content is returned and not a 404 error.

FYI - WordPress seems to have resolved this.

jwilson3’s picture

Isn't what #33 describes perfectly handled by the Redirect module setting "Automatically create redirects when URL aliases are changed"?

On /admin/config/search/path/settings

On /admin/config/search/redirect/settings

If I understand correctly (possibly I'm wrong) the scope of this issue seems that if you have a manually entered link to a URL alias page in the body content of node A that points to the url alias of node B, and node B's url alias gets updated, then they want a way to somehow invalidate the cache of node A and dynamically update that link in the body to point to the right alias.

I can potentially see this happening, particularly when using the Linkit module, which gives you a fancy pants way to link to nodes in the wysiwyg of the body field. But with the URL Alias + Redirect settings dialed in, the old alias becomes a 301 redirect (which results in an expected behavior of taking you to the page you intended).

prabhu9484’s picture

Many thanks jwilson3. My understanding of the scope of this issue is that, for a given node, what your comment describes should be default (core) functionality, without having to download, install and configure a contributed module. I'm also wondering if this is related to https://www.drupal.org/project/drupal/issues/133552. If there is a clear consensus by the community on these issues, maybe update the status of the issues?

eelkeblok’s picture

jwilson3 is being polite, but spot on. The difference between what you are describing and what this issue is about is whether the URL is embedded in actual content on the site (what this issue is about; if it is, it needs to be updated when the alias changes) or whether the URL is visited directly, e.g. because another site links to it, or because a user has bookmarked it or just typed it into the address bar (what you are describing).

With redirect module in place, this issue would hurt less, because the redirect would handle the problem (but it would still be something to fix; it is not very tidy to have your own site rely on redirects; those are for *other* parties that can not reasonably know you changed a URL; your own site can know that and should just give you the right URL in the first place).

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.