Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
When a site is configured to use "Path prefix with language fallback" in the language configuration, the absolute URLs in the update screen, as well as many others including the language configuration page itself, are built incorrectly using the relative URL model: they are output as: /<language code>/http://www.example.com
.
Seeing how widespread this error appears to be, it might be located in a low-level function like l
or url
Comment | File | Size | Author |
---|---|---|---|
#20 | document.external.patch | 2.76 KB | Gábor Hojtsy |
#16 | 162738.png | 143.38 KB | fgm |
#7 | language.inc_.patch | 1.27 KB | fgm |
#1 | update.report.inc_.patch | 1.53 KB | fgm |
update_0.png | 9.06 KB | fgm |
Comments
Comment #1
fgmTracing what happens, the logic in
l
,t
, andlanguage_url_rewrite
does not seem faulty, so I made this patch, which forcibly adds the"absolute" = TRUE
attribute to links withinupdate-report.inc
.This fixes the problem with update.module downloads, but it might be desired to have to have the module decide for itself whether downloads are using relative (can this happen ?) or absolute URLs.
Going further, even
language_url_rewrite
could parse links to check whether they match the absolute URL format in common schemes like(http[s]|ftp)
, and thus automatically identify absolute links to avoid generating incorrect links when path prefixing has been activated. The problem, of course, would be the added CPU load to all link rewrites, which could make this unappealing.Comment #2
Gábor HojtsyIf something is passed into l() and "absolute" is not set, the link generated is considered relative, that is how it works. For a long time, l() was not even able to output absolute links, so relative is the actual default. Any such error should be fixed on the caller side.
Please test.
Comment #3
fgmHi Gabor.
The suggested patch does indeed fix things from the caller side : it only touches
update.report.inc
, and appears to work, so I'm not sure what you mean by "please test".Regarding
l
history, this is indeed interesting, but you know even better than me Drupal's well-publicized stance regarding backwards compatibility.l
itself evolved, as you mention yourself, so I'm not sure we could not at some point reconsider implementing such safeguards within these lower-level functions, if it could be done without a performance impact, which does not seem to be the case here, regrettably.Comment #4
Gábor HojtsyWell, it is not Drupal policy either, to leave bugs alone and try to work around them elsewhere, especially not in Drupal core code.
Comment #5
fgmI'm sorry to say this, but I do not understand the meaning of your answers: as I understand them, your first comment seems to imply this specific issue should be fixed in update module, which the patch put out for review does, and not touching the
l
,url
,language_url_rewrite
chain ; while the second one seems to imply that the current behaviour in said chain is a bug and should be fixed. Maybe I am misinterpreting what you are saying, so could you be more explicit ?I thought about it since my latest post, and it seems that it might be interesting to add this detection in
language_url_rewrite
and document it as such, but not inl
ort
, then benchmark. It is obvious that adding this detection will slow sites using language prefixing, since added code will be run on each link; but on the other hand, it also means that callers would no longer have to check URLs for relativity, possibly reducing the overall amount of code by refactoring all such checks at only one place. Since this is a trade-off between low code volume/ease of maintenance (language_url_rewrite
checks) versus code speed (caller checks), only hard numbers would probably tell what is the best choice : reducing the code volume tends to speed up code a bit too.The fact remains that this bug has to be fixed, though. Maybe you actually mean that the issue should be fixed in all the places where it occurs, not just in update module ? For this, all occurrences of the problem would have to be found (which is not the case currently), or the problem could be fixed at a single stroke at the level of
language_url_rewrite
. It would be nice to see other devs chiming in to see what must be done.Comment #6
Dries CreditAttribution: Dries commentedfgm, it seems that without the i18n stuff, the links work properly. However, as soon you turn on i18n, the links break. The solution is to figure out why that happens, rather than working around the problem by adding
absolute => true
. Links should always work, regardless of whether i18n is enabled or not. Enabling i18n should not affect the API of l() or url().Comment #7
fgmHere is a different patch, which modifies
language.inc/language_url_rewrite
to add a sanity check for theabsolute
option and avoids rewriting received URLs if they are valid absolute URLs.For those who don't want to check the code causing the problem, here is what happens.
theme_update_report
andtheme_update_version
) invokesl('sometitle', 'someabsoluteurl')
, without passing it theabsolute
parameter, because they do not require Drupal to forcibly emit an absolute URL, presumably because it is already an absolute ULl
invokesurl
, not passing itabsolute
since it did not receive iturl
receives noabsolute
option, and initializes it toFALSE
, according to its own API speclanguage_url_rewrite('someabsoluteurl', $options)
, in which$options
now holds aabsolute => FALSE
optionlanguage_url_rewrite
considers the link to be relative by examining theabsolute
option, and rewrites the absolute URL because it hasn't received theabsolute => TRUE
option, generating an incorrect link..url
, then onwards tol
, then to the callerWith this patch,
language_url_rewrite
performs a sanity check, notices actual valid absolute URLs, and avoids rewriting them.Basically, the caller relies on a not really specified behaviour in which drupal doesn't actually transform URLs passed to
l
andurl
ifabsolute
is not set, and the activation of i18n changes this behaviour. This appears to be fairly common in existing code, including core.In other words, there might be a confusion at work between the link formatting property specified by the
absolute
attribute, and the intrinsic relative/absolute nature of the passed URL, which is not addressed by this attribute.I can't but wonder what happens with
custom_url_rewrite_outbound
, which might face a similar problem, but no contrib code in DRUPAL-5--1-1 makes use of it to see how it interprets this attribute. In that regard, it might be interesting to consider moving this detection one step up intourl
instead : this is currently the only function invokinglanguage_url_rewrite
in core; that way any such rewriting function could benefit from the detection.Comment #8
Dries CreditAttribution: Dries commentedThanks for the careful analysis. It helps me understand the problem. Although this seems like a possible fix, it's a bit awkward and a performance killer. Because url() is often called hundreds of times per page load, we need to optimize the hell out of this. Calling valid_url() doesn't look like a fast solution.
It might be better to always pass the absolute flag when passing in an absolute URL, but it's not clear how much of an API change that would be. Do you have any idea of how many times we pass an absolute URL without settings the 'absolute' flag?
In fact, this might also solve another problem. I've seen people pass in incorrectly formated relative URLs (i.e.
url('foo#bar')
instead ofurl('foo', array('fragment' => 'bar'))
which might be a security issue as it performs a different set of security steps. That is, it will be perceived as an absolute URL. Even long time core contributors -- including me -- get this wrong every once in a while ...(We have both 'external' and 'absolute'?)
I'm not 100% yet but we might be better of with a slight API change. Certainly needs some additional investigation and discussion.
Comment #9
Gábor HojtsyPlease look out for the bigger picture. All I have said before, but I try to be more understandable now:
- l() was *originally* designed to only accept relative URLs, which will become *q=* values in the URL, or clean URL paths
- a parameter was added in Drupal 5(?) to make it accept a flag if the link is absolute
function l($text, $path, $attributes = array(), $query = NULL, $fragment = NULL, $absolute = FALSE, $html = FALSE) {
- Steven's l() refactoring made it possible to accept an array with the 'absolute' key in Drupal 6
function l($text, $path, $options = array()) {
Passing $absolute (or 'absolute' in D6) is *required* for absolute links, because otherwise l() and url() end up with links like:
- ?q=http://external.example.com/dsgfdg (without clean URLs)
- ?q=it/http://external.example.com/dsgfdg (without clean URLs with language features)
- it/http://external.example.com/dsgfdg (with clean URLs with language features)
- http://external.example.com/dsgfdg (with clean URLs without language features)
The fact that fgm only realized the third one as being a problem is rooted in that he turned clean URLs on. l() and url() makes different modifications on the URL depending on whether it is absolute or an onsite link. Not passing on 'absolute' for an absolute link is an error with the calling code.
We can obviously fix this with passing on 'absolute', which should be done as I have indicated above. We can surely fix it with adding some heuristics to url() to identify absolute links, but using such (expensive or not) heuristics was not a usual practice for Drupal before, and I don't think it is now. Then the 'absolute' parameter would become obsolote and this modificaton would require an API change to remove this unused attribute too, for which it is quite late in the release cycle now.
Maybe I am cleaner now?
Comment #10
Gábor HojtsySo not a language issue. I mark this for the 'base system', as the bogus 'external' cases mentioned by Dries should also be fixed.
Comment #11
fgmGabor,
I'm afraid that your interpretation of the meaning of absolute does not match the current drupal API reference for D4.6, D4.7, D5 and D6, which an all cases states
$absolute
(default FALSE) Whether to force the output to be an absolute link.This means that
$absolute
, at least as specified since D4.6, is not used to describe the fact that a URL parameter is an absolute link, but the fact that any URL passed with this parameter set to TRUE must be output (so may have to be rewritten) as an absolute URL, even if the URL is initially relative. Any different behaviour is in breach of the specification.As Dries said, this needs to be discussed and the result put in the API specification, because the current situation feels vaguely undecided:
We could obviously go fully on the drupal array-way, by passing URLs as the familiar arrays already used by
theme_links
instead of the current mix of string, individual flags (inl
) and options array (language_url_rewrite
and internal use byurl
). This would probably improve performance by removing the existing URL parsing performed inurl
(including filter_xss_bad_protocol and the code used to limit its usage), but would entail significant rewrites on any code using these functionsOr we could go fully the "markup" way, by passing complete URLs (no more separate fragment and query, no "absolute" or "external" hint), and just using directives like
absolute
andhtml
for rendering, not for parsing/rewriting.But obviously, either would be a very significant, probably unwelcome API change at this point in time. So overloading absolute with the meaning you suggest, and documenting it as such might be a way to tame the problem for the D6 lifecycle.
But in that case, doing as you suggest (quoting you:
), it seems we come back my the initial patch (passingabsolute
in the module) which neither you not Dries found good. What alternative are you suggesting ?Comment #12
Gábor Hojtsyfgm, I said the following:
In short: your patch which fixes this issue on the caller side looked OK to me, but it obviously needed some testing. Was this unclear?
You are right that we seem like interpreting $absolute differently. The documentation on l() says that $absolute means it should *end up* being an absolute link, not that it *is* an absolute link. And there is some heuristic in url() *already* to identify absolute links, and some logic which bypasses most of the other stuff in url().
So it seems like I was mistaken, and 'absolute' has a different meaning, and the language rewrite is mistaken in interpreting the meaning of the 'absolute' flag. So this is really a language issue. Seems like the language url rewrite could safely be moved after the *already* absolute URLs are handled (after the static and global declarations), and that should solve the problem.
Comment #13
Gábor HojtsyDoh, language_url_rewrite() is not mistaken in interpreting 'absolute', it does just right: when rewriting the URL to be absolute, it marks the flag that it is absolute. BUT if we move this call after the already done absolute link handling in url(), we should carefully test the result with domain based URL rewrites (which make the links absolute, and thus could break with the remaining parts of url()).
Comment #14
profix898 CreditAttribution: profix898 commentedI first thought it was my stupidity, but it actually seems like a more general problem with absolute urls (http://..) passed into l() or url(). Subscribing.
Comment #15
Gábor HojtsyThe absolute/external URL handling of the system recently changed, and this bug is probably fixed now, or at least only required a patch as posted in #1 above. It would be great if someone could test.
Comment #16
fgmFor the specific case outlined in this issue, the problem is fixed (see attachment).
Not sure where else we should check ?
Comment #17
Gábor HojtsyAFAIR, all external links we found are now marked as external.
Comment #18
fgmThen maybe we need to change the doc for l() somehow: it currently does not define the
external
attribute, although it passes it tourl()
, and it claims " you provide the full URL, it will be considered an external URL." while nothing appears to actually perform that check:However, since url() actually performs the external detection, it looks like just a doc issue: l() DOES accept an "external" attribute, and it is advisable to use it because testing for it is faster than relying on the detection in url() like Dries mentioned in this thread.
Comment #19
Gábor HojtsyRetitled. No patch here yet.
Comment #20
Gábor HojtsyOK, here is some docs patch to review. I hope that I have integrated all suggestions (and also improved the general documentation of $path on l()). Please review.
Comment #21
Dries CreditAttribution: Dries commentedReviewing this patch, and looking at the code in url(), I wondered why url() still tries to detect the protocol automatically. Is this something that can or should be removed? If not, it is probably worth documenting.
Right now, the documentation is confusing. If url() will attempt to detect the protocol, why bother with the external-flag?
Comment #22
fgmI think there were several considerations made when choosing the way
l()
andurl()
were modified:url()
autodetection essentially provides a safety net for the use of links withoutexternal
url()
withexternal
is still faster than without: ifexternal
is present, whetherTRUE
orFALSE
(which is why it isn't being added in the defaults), the rather slowfilter_xss_bad_protocol()
call is simply skipped, so it's a net speed gain in these cases, even on local links, for which the pretests (strpos()
) are still slower than no test at allfilter_xss_protocol
, like "irc:" scheme links, which would otherwise be dependent on the site admin not modifying the list of allowable protocols to allow such linksDoing things that way gives us the advantage of the safety net provided by autodetection for authors, while providing a faster processing for situations where the nature of links is known at the time of url() invocation.
A quick (and quite probably inaccurate) checkout of today's core shows 363 calls of url() and none of these defining
external
which means all of these trigger at least the first step of autodetection, but probably nothing beyond the pretests since most links in core are internal. It still suggests a potential optimization, though: adding the appropriate value ofexternal
where applicable.Comment #23
Gábor Hojtsyfgm: we added some "external" attributes to url() and/or l() calls, so there should be some.
Dries: url() autodetects external links, so you can enter external links as *menu items*. This is a feature of the menu system for some time now, but is implemented here for some reason (I guess to be broad enough to cover other use cases, when a path should be provided but a link is acceptable). I cannot think of more use cases though, but removing the detection would be a regression for the menu system. Moving the detection to the menu system might not be a regression though, I am not sure.
Comment #24
fgmGabor: you're right of course, my check was too quick (a simple grep on url( and external) and missed cases where external was not on the same line as the call. To be more accurate, within core, I checked again found 'external' to be currently used in:
...leaving 100% of direct url() calls without it and 98.6% of l() calls within core without it either.
Changing component, since this is visibly no longer a language system issue.
Comment #25
Gábor HojtsyDries, is my explanation clear? What needs to be changed here?
Comment #26
moshe weitzman CreditAttribution: moshe weitzman commentedauto-detect of external links is useful for fapi #redirect as well. it should not be moved to menu system IMO.
Comment #27
dpearcefl CreditAttribution: dpearcefl commentedIs this a need?