When a site is configured to use "Path prefix with language fallback" in the language configuration, the absolute URLs in the update screen, as well as many others including the language configuration page itself, are built incorrectly using the relative URL model: they are output as: /<language code>/http://www.example.com.

Seeing how widespread this error appears to be, it might be located in a low-level function like l or url

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

fgm’s picture

Status: Active » Needs review
FileSize
1.53 KB

Tracing what happens, the logic in l, t, and language_url_rewrite does not seem faulty, so I made this patch, which forcibly adds the "absolute" = TRUE attribute to links within update-report.inc.

This fixes the problem with update.module downloads, but it might be desired to have to have the module decide for itself whether downloads are using relative (can this happen ?) or absolute URLs.

Going further, even language_url_rewrite could parse links to check whether they match the absolute URL format in common schemes like (http[s]|ftp), and thus automatically identify absolute links to avoid generating incorrect links when path prefixing has been activated. The problem, of course, would be the added CPU load to all link rewrites, which could make this unappealing.

Gábor Hojtsy’s picture

If something is passed into l() and "absolute" is not set, the link generated is considered relative, that is how it works. For a long time, l() was not even able to output absolute links, so relative is the actual default. Any such error should be fixed on the caller side.

Please test.

fgm’s picture

Hi Gabor.

The suggested patch does indeed fix things from the caller side : it only touches update.report.inc, and appears to work, so I'm not sure what you mean by "please test".

Regarding l history, this is indeed interesting, but you know even better than me Drupal's well-publicized stance regarding backwards compatibility. l itself evolved, as you mention yourself, so I'm not sure we could not at some point reconsider implementing such safeguards within these lower-level functions, if it could be done without a performance impact, which does not seem to be the case here, regrettably.

Gábor Hojtsy’s picture

Well, it is not Drupal policy either, to leave bugs alone and try to work around them elsewhere, especially not in Drupal core code.

fgm’s picture

Component: base system » update system

I'm sorry to say this, but I do not understand the meaning of your answers: as I understand them, your first comment seems to imply this specific issue should be fixed in update module, which the patch put out for review does, and not touching the l, url, language_url_rewrite chain ; while the second one seems to imply that the current behaviour in said chain is a bug and should be fixed. Maybe I am misinterpreting what you are saying, so could you be more explicit ?

I thought about it since my latest post, and it seems that it might be interesting to add this detection in language_url_rewrite and document it as such, but not in l or t, then benchmark. It is obvious that adding this detection will slow sites using language prefixing, since added code will be run on each link; but on the other hand, it also means that callers would no longer have to check URLs for relativity, possibly reducing the overall amount of code by refactoring all such checks at only one place. Since this is a trade-off between low code volume/ease of maintenance (language_url_rewrite checks) versus code speed (caller checks), only hard numbers would probably tell what is the best choice : reducing the code volume tends to speed up code a bit too.

The fact remains that this bug has to be fixed, though. Maybe you actually mean that the issue should be fixed in all the places where it occurs, not just in update module ? For this, all occurrences of the problem would have to be found (which is not the case currently), or the problem could be fixed at a single stroke at the level of language_url_rewrite. It would be nice to see other devs chiming in to see what must be done.

Dries’s picture

fgm, it seems that without the i18n stuff, the links work properly. However, as soon you turn on i18n, the links break. The solution is to figure out why that happens, rather than working around the problem by adding absolute => true. Links should always work, regardless of whether i18n is enabled or not. Enabling i18n should not affect the API of l() or url().

fgm’s picture

Component: update system » language system
FileSize
1.27 KB

Here is a different patch, which modifies language.inc/language_url_rewrite to add a sanity check for the absolute option and avoids rewriting received URLs if they are valid absolute URLs.

For those who don't want to check the code causing the problem, here is what happens.

  1. a function (in the original case here, theme_update_report and theme_update_version) invokes l('sometitle', 'someabsoluteurl'), without passing it the absolute parameter, because they do not require Drupal to forcibly emit an absolute URL, presumably because it is already an absolute UL
  2. l invokes url, not passing it absolute since it did not receive it
  3. url receives no absolute option, and initializes it to FALSE, according to its own API spec
  4. since i18n is enabled, it invokes language_url_rewrite('someabsoluteurl', $options), in which $options now holds a absolute => FALSE option
  5. without this patch, language_url_rewrite considers the link to be relative by examining the absolute option, and rewrites the absolute URL because it hasn't received the absolute => TRUE option, generating an incorrect link..
  6. ...which returns to url, then onwards to l, then to the caller

With this patch, language_url_rewrite performs a sanity check, notices actual valid absolute URLs, and avoids rewriting them.

Basically, the caller relies on a not really specified behaviour in which drupal doesn't actually transform URLs passed to l and url if absolute is not set, and the activation of i18n changes this behaviour. This appears to be fairly common in existing code, including core.

In other words, there might be a confusion at work between the link formatting property specified by the absolute attribute, and the intrinsic relative/absolute nature of the passed URL, which is not addressed by this attribute.

I can't but wonder what happens with custom_url_rewrite_outbound, which might face a similar problem, but no contrib code in DRUPAL-5--1-1 makes use of it to see how it interprets this attribute. In that regard, it might be interesting to consider moving this detection one step up into url instead : this is currently the only function invoking language_url_rewrite in core; that way any such rewriting function could benefit from the detection.

Dries’s picture

Thanks for the careful analysis. It helps me understand the problem. Although this seems like a possible fix, it's a bit awkward and a performance killer. Because url() is often called hundreds of times per page load, we need to optimize the hell out of this. Calling valid_url() doesn't look like a fast solution.

It might be better to always pass the absolute flag when passing in an absolute URL, but it's not clear how much of an API change that would be. Do you have any idea of how many times we pass an absolute URL without settings the 'absolute' flag?

In fact, this might also solve another problem. I've seen people pass in incorrectly formated relative URLs (i.e. url('foo#bar') instead of url('foo', array('fragment' => 'bar')) which might be a security issue as it performs a different set of security steps. That is, it will be perceived as an absolute URL. Even long time core contributors -- including me -- get this wrong every once in a while ...

drupal-cvs $ grep -r "url('http" * | wc -l
       2
drupal-cvs $ grep -r 'url("http' * | wc -l
       0
drupal-cvs $ grep -r " l(" * | grep "http://"
modules/openid/openid.module:      '#description' => l(t('What is OpenID?'), 'http://openid.net/', array('external' => TRUE))
modules/syslog/syslog.module:      '!php'         => l("PHP's syslog", 'http://www.php.net/manual/en/function.openlog.php'),
modules/syslog/syslog.module:      '!syslog_conf' => l('UNIX/Linux syslog.conf', 'http://www.rt.com/man/syslog.5.html'),

(We have both 'external' and 'absolute'?)

I'm not 100% yet but we might be better of with a slight API change. Certainly needs some additional investigation and discussion.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

Please look out for the bigger picture. All I have said before, but I try to be more understandable now:

- l() was *originally* designed to only accept relative URLs, which will become *q=* values in the URL, or clean URL paths
- a parameter was added in Drupal 5(?) to make it accept a flag if the link is absolute
function l($text, $path, $attributes = array(), $query = NULL, $fragment = NULL, $absolute = FALSE, $html = FALSE) {
- Steven's l() refactoring made it possible to accept an array with the 'absolute' key in Drupal 6
function l($text, $path, $options = array()) {

Passing $absolute (or 'absolute' in D6) is *required* for absolute links, because otherwise l() and url() end up with links like:

- ?q=http://external.example.com/dsgfdg (without clean URLs)
- ?q=it/http://external.example.com/dsgfdg (without clean URLs with language features)
- it/http://external.example.com/dsgfdg (with clean URLs with language features)
- http://external.example.com/dsgfdg (with clean URLs without language features)

The fact that fgm only realized the third one as being a problem is rooted in that he turned clean URLs on. l() and url() makes different modifications on the URL depending on whether it is absolute or an onsite link. Not passing on 'absolute' for an absolute link is an error with the calling code.

We can obviously fix this with passing on 'absolute', which should be done as I have indicated above. We can surely fix it with adding some heuristics to url() to identify absolute links, but using such (expensive or not) heuristics was not a usual practice for Drupal before, and I don't think it is now. Then the 'absolute' parameter would become obsolote and this modificaton would require an API change to remove this unused attribute too, for which it is quite late in the release cycle now.

Maybe I am cleaner now?

Gábor Hojtsy’s picture

Component: language system » base system
Priority: Critical » Normal

So not a language issue. I mark this for the 'base system', as the bogus 'external' cases mentioned by Dries should also be fixed.

fgm’s picture

Gabor,

I'm afraid that your interpretation of the meaning of absolute does not match the current drupal API reference for D4.6, D4.7, D5 and D6, which an all cases states $absolute (default FALSE) Whether to force the output to be an absolute link.

This means that $absolute, at least as specified since D4.6, is not used to describe the fact that a URL parameter is an absolute link, but the fact that any URL passed with this parameter set to TRUE must be output (so may have to be rewritten) as an absolute URL, even if the URL is initially relative. Any different behaviour is in breach of the specification.

As Dries said, this needs to be discussed and the result put in the API specification, because the current situation feels vaguely undecided:

  • we currently pass URL parts separately: "path", fragment, query, and formatting hints (html, absolute)
  • we currently keep other parts of the URLs together as "path": scheme, host, path
  • the suggestion would have drupal superimpose two meanings on the absolute hint:
    • format any link as absolute
    • the "path" is actually an absolute URL of either the http, https, of ftp schemes

We could obviously go fully on the drupal array-way, by passing URLs as the familiar arrays already used by theme_links instead of the current mix of string, individual flags (in l) and options array (language_url_rewrite and internal use by url). This would probably improve performance by removing the existing URL parsing performed in url (including filter_xss_bad_protocol and the code used to limit its usage), but would entail significant rewrites on any code using these functions

Or we could go fully the "markup" way, by passing complete URLs (no more separate fragment and query, no "absolute" or "external" hint), and just using directives like absolute and html for rendering, not for parsing/rewriting.

But obviously, either would be a very significant, probably unwelcome API change at this point in time. So overloading absolute with the meaning you suggest, and documenting it as such might be a way to tame the problem for the D6 lifecycle.

But in that case, doing as you suggest (quoting you: We can obviously fix this with passing on 'absolute', which should be done), it seems we come back my the initial patch (passing absolute in the module) which neither you not Dries found good. What alternative are you suggesting ?

Gábor Hojtsy’s picture

Component: base system » language system

But in that case, doing as you suggest (quoting you: We can obviously fix this with passing on 'absolute', which should be done), it seems we come back my the initial patch (passing absolute in the module) which neither you not Dries found good. What alternative are you suggesting?

fgm, I said the following:

If something is passed into l() and "absolute" is not set, the link generated is considered relative, that is how it works. For a long time, l() was not even able to output absolute links, so relative is the actual default. Any such error should be fixed on the caller side.

Please test.

In short: your patch which fixes this issue on the caller side looked OK to me, but it obviously needed some testing. Was this unclear?

You are right that we seem like interpreting $absolute differently. The documentation on l() says that $absolute means it should *end up* being an absolute link, not that it *is* an absolute link. And there is some heuristic in url() *already* to identify absolute links, and some logic which bypasses most of the other stuff in url().

So it seems like I was mistaken, and 'absolute' has a different meaning, and the language rewrite is mistaken in interpreting the meaning of the 'absolute' flag. So this is really a language issue. Seems like the language url rewrite could safely be moved after the *already* absolute URLs are handled (after the static and global declarations), and that should solve the problem.

Gábor Hojtsy’s picture

Doh, language_url_rewrite() is not mistaken in interpreting 'absolute', it does just right: when rewriting the URL to be absolute, it marks the flag that it is absolute. BUT if we move this call after the already done absolute link handling in url(), we should carefully test the result with domain based URL rewrites (which make the links absolute, and thus could break with the remaining parts of url()).

profix898’s picture

I first thought it was my stupidity, but it actually seems like a more general problem with absolute urls (http://..) passed into l() or url(). Subscribing.

Gábor Hojtsy’s picture

The absolute/external URL handling of the system recently changed, and this bug is probably fixed now, or at least only required a patch as posted in #1 above. It would be great if someone could test.

fgm’s picture

Status: Needs work » Needs review
FileSize
143.38 KB

For the specific case outlined in this issue, the problem is fixed (see attachment).

Not sure where else we should check ?

Gábor Hojtsy’s picture

Status: Needs review » Fixed

AFAIR, all external links we found are now marked as external.

fgm’s picture

Status: Fixed » Needs work

Then maybe we need to change the doc for l() somehow: it currently does not define the external attribute, although it passes it to url(), and it claims " you provide the full URL, it will be considered an external URL." while nothing appears to actually perform that check:

  • l() itself just processes the text and "attributes" option before passing the URL to url()
  • url() expects the presence of the "external" attribute, which l() does not pass it

However, since url() actually performs the external detection, it looks like just a doc issue: l() DOES accept an "external" attribute, and it is advisable to use it because testing for it is faster than relying on the detection in url() like Dries mentioned in this thread.

Gábor Hojtsy’s picture

Title: Incorrect update URLs on update screen » Update l() documentation to include 'external' attribute
Status: Needs work » Active

Retitled. No patch here yet.

Gábor Hojtsy’s picture

Status: Active » Needs review
FileSize
2.76 KB

OK, here is some docs patch to review. I hope that I have integrated all suggestions (and also improved the general documentation of $path on l()). Please review.

Dries’s picture

Status: Needs review » Needs work

Reviewing this patch, and looking at the code in url(), I wondered why url() still tries to detect the protocol automatically. Is this something that can or should be removed? If not, it is probably worth documenting.

Right now, the documentation is confusing. If url() will attempt to detect the protocol, why bother with the external-flag?

fgm’s picture

I think there were several considerations made when choosing the way l() and url() were modified:

  1. url() autodetection essentially provides a safety net for the use of links without external
  2. url() with external is still faster than without: if external is present, whether TRUE or FALSE (which is why it isn't being added in the defaults), the rather slow filter_xss_bad_protocol() call is simply skipped, so it's a net speed gain in these cases, even on local links, for which the pretests (strpos()) are still slower than no test at all
  3. it allows the programmer to pass external URLs that will not trigger filter_xss_protocol, like "irc:" scheme links, which would otherwise be dependent on the site admin not modifying the list of allowable protocols to allow such links

Doing things that way gives us the advantage of the safety net provided by autodetection for authors, while providing a faster processing for situations where the nature of links is known at the time of url() invocation.

A quick (and quite probably inaccurate) checkout of today's core shows 363 calls of url() and none of these defining external which means all of these trigger at least the first step of autodetection, but probably nothing beyond the pretests since most links in core are internal. It still suggests a potential optimization, though: adding the appropriate value of external where applicable.

Gábor Hojtsy’s picture

fgm: we added some "external" attributes to url() and/or l() calls, so there should be some.

Dries: url() autodetects external links, so you can enter external links as *menu items*. This is a feature of the menu system for some time now, but is implemented here for some reason (I guess to be broad enough to cover other use cases, when a path should be provided but a link is acceptable). I cannot think of more use cases though, but removing the detection would be a regression for the menu system. Moving the detection to the menu system might not be a regression though, I am not sure.

fgm’s picture

Component: language system » base system

Gabor: you're right of course, my check was too quick (a simple grep on url( and external) and missed cases where external was not on the same line as the call. To be more accurate, within core, I checked again found 'external' to be currently used in:

  • openid.module : passed to l(), set to TRUE
  • syslog.module : passed to l(), set to TRUE, twice
  • system.module : passed to l(), set to TRUE
  • menu system : does some complicated processing taking it into account

...leaving 100% of direct url() calls without it and 98.6% of l() calls within core without it either.

Changing component, since this is visibly no longer a language system issue.

Gábor Hojtsy’s picture

Status: Needs work » Needs review

Dries, is my explanation clear? What needs to be changed here?

moshe weitzman’s picture

auto-detect of external links is useful for fapi #redirect as well. it should not be moved to menu system IMO.

dpearcefl’s picture

Is this a need?

Status: Needs review » Needs work

The last submitted patch, document.external.patch, failed testing.

Status: Needs work » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.