I am using the Secure Pages module and currently have all my HTTPS paths set up how I need them, so that all secure content passes through HTTPS for SSL encryption.

The issue I am having is that IE gives the warning on HTTPS pages that "This page contains both secure & insecure items..." The warning is generated because the page content is served via HTTPS, but the image src paths are being rewritten by Pathologic to use http://.

Can Pathologic determine that the page is served via HTTPS and then rewrite the img src paths to HTTPS as well, to avoid the IE warning?

Thanks,
Jason

Comments

Garrett Albright’s picture

Hmm. I think what might be happening here is that the pages were first viewed over a standard HTTP connection. When Drupal went to apply the input format to the text and Pathologic kicked into action, it built the image paths using the base URL at the time - which would have had http:// instead of https:// - and Drupal then cached the results. Now, when someone is served the page over an HTTPS connection, they're served the cached, formatted text with the HTTP paths.

To fix this, make sure your site is available *only* over an HTTPS connection if possible; then, while logged in to your site over HTTPS, clear the site's cache (Administer > Site settings > Performance (or something like that) and find the button at the bottom). When the pages are rebuilt and re-cached for visitors over an HTTPS connection, they should be using the https:// prefix. If it's not feasible to keep your entire site behind an HTTPS connection, then avoid this problem in the future by always being logged in over an HTTPS connection and viewing the node/block/whatever you're editing yourself immediately after editing it. That way, the paths should have the https:// prefix.
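Roughly what's going on, as a self-contained PHP sketch (this is not Drupal's or Pathologic's actual code; the function names are made up): the filter builds absolute URLs from the base URL of the *first* request, and that result is cached, so later HTTPS requests are served the stale http:// prefix.

```php
<?php
// Illustrative sketch only -- not Drupal or Pathologic code; all
// function names here are made up. It shows how an input filter
// bakes the scheme of the *first* request into cached output.

$cache = array(); // stands in for Drupal's cache_filter table

// Prefix root-relative src/href paths with the current base URL,
// the way Pathologic produces fully qualified URLs.
function rewrite_paths($html, $base_url) {
  return preg_replace('/(src|href)="\/([^"]*)"/', '$1="' . $base_url . '/$2"', $html);
}

// Filter once, then serve the cached result on every later request.
function filter_cached($html, $base_url, &$cache) {
  $key = md5($html);
  if (!isset($cache[$key])) {
    $cache[$key] = rewrite_paths($html, $base_url); // scheme baked in here
  }
  return $cache[$key];
}

$node = '<img src="/files/pic.png">';
$first  = filter_cached($node, 'http://example.com', $cache);  // first view over HTTP
$second = filter_cached($node, 'https://example.com', $cache); // later view over HTTPS
// $second still carries the http:// URL -- hence the IE warning.
```

Clearing the cache while browsing over HTTPS works because it forces the "first view" to happen with an https:// base URL.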

You may find my Secure by Role module convenient for forcing certain roles - such as all authenticated users - to use an HTTPS connection when accessing the site.

jrosen’s picture

So, what you mentioned is what happened. When I first viewed the site it was over HTTP. Then when I switched to HTTPS, it continued to use HTTP for the images. Now that all admin pages are behind HTTPS, the images are all cached under HTTPS.

But I am a bit concerned that if the cache were somehow regenerated, it might mess up the HTTP/HTTPS paths again. Is there a way to ensure that Secure Pages always re-writes attributes as HTTPS if the page request was HTTPS? Then it wouldn't matter what the cache says (I hope).

Garrett Albright’s picture

Is there a way to ensure that Secure Pages always re-writes attributes as HTTPS if the page request was HTTPS? Then it wouldn't matter what the cache says (I hope).

Not really. Pathologic is an input filter, and the output of input filters is cached.

Perhaps you may want to not use Pathologic and instead stick to relative paths in your content.

jrosen’s picture

No, I definitely want to use Pathologic... I think it is awesome for managing content that moves between servers and domains, like Dev to Test to Production.

I'll take a look at the caching and play around a bit to figure out what happens when I clear the cache.

I also have it configured now where all my admin pages are automatically redirected to HTTPS, so whenever new page content is added, it is done via HTTPS. This should force the first page rendering to always be via HTTPS, so that should implicitly fix the issue I had before.

If I figure anything else out, I'll post here.

corporatebastard’s picture

What about an option to make Pathologic output absolute URLs rather than fully qualified ones? That way the image would be accessed with whatever protocol the page was accessed with.

Garrett Albright’s picture

You mean "/foo" instead of "http://example.com/foo"? One of the explicit goals of Pathologic is to use full paths so that links and images in content read via RSS feeds are not broken. The protocol and host fragments must be included in order for that to work.
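To illustrate the feed problem (a sketch with made-up domain names, not module code): a root-relative link inside a syndicated post resolves against whatever site is displaying it, so only a fully qualified URL survives aggregation intact.

```php
<?php
// Sketch with made-up domain names, showing how a browser or feed
// reader resolves a link found in aggregated content.

function resolve($url, $page_base) {
  // A root-relative URL inherits scheme and host from wherever the
  // HTML is being displayed; a fully qualified URL does not.
  if ($url[0] === '/') {
    return $page_base . $url;
  }
  return $url;
}

// The same post body, read inside someone else's aggregator:
$relative = resolve('/files/pic.png', 'http://aggregator.example.net');
$full     = resolve('http://example.com/files/pic.png', 'http://aggregator.example.net');
// $relative now points at the aggregator's host (broken image);
// $full still points back at the original site.
```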

hass’s picture

Version: 6.x-2.0-beta19 » 6.x-2.0-beta22
Category: support » bug
Priority: Normal » Critical

If Pathologic changes every URL into a fully qualified URL, all links break and browsers report insecure content on SSL pages. This happens when the filtered content is saved in cache_filter (the normal case) and then shown on SSL pages with fully qualified http:// links. It also adds a useless icon in front of all my internal URLs, which should be absolute to the site root but not include the domain name.

CSS definition example:

  #main a[href^="http:"],
  #main a[href^="https:"]
  {
    padding-left: 12px;
    background-image: url('your_image.gif');
    background-repeat: no-repeat;
    background-position: 0 0.45em;
  }

If URLs are broken in RSS feeds, there needs to be a special case for RSS that only uses absolute links when feeds are rendered.

Garrett Albright’s picture

I am aware of the http/https problem; see the "Caching issues" section of the documentation. Unfortunately, there's nothing really that can be done about that without making Pathologic something more/other than an input filter, which I don't wish to do. My best suggestion is that your site be all-or-nothing as far as security goes - don't let it be accessible at all via HTTP if any part of it requires HTTPS.

As for only using the protocol fragment when feeds are used, input filters don't work that way - there's no context about how the node is going to be viewed when it's filtered, and feeds and web pages will use the same filtered output anyway.

hass’s picture

Sorry, I do not understand your comment... Pathologic is an output filter... so the only bug is that it needs to create links absolute from the web root and, if an RSS feed is shown, with a fully qualified hostname. Where is the problem with implementing this?

There seems to be more modules with this issue:
#361926: Use relative URL
#395764: Aggregator: Convert all relative URLs to absolute URLs in feed items
#88183: Relative URLs in feeds should be converted to absolute ones

Going by those issues, it's an aggregator bug that needs to be solved, if I understood everything correctly... so it's better to implement Pathologic the way core works, not differently, and wait for or solve the issue - or start working on the core issue if you need to see this fixed. I'd like to migrate from Path Filter to Pathologic, but these fully qualified URLs are a show-stopper.

Garrett Albright’s picture

Sorry, I do not understand your comment... Pathologic is an output filter... so the only bug is that it needs to create links absolute from the web root and, if an RSS feed is shown, with a fully qualified hostname. Where is the problem with implementing this?

Pathologic is an input filter. When input filters are run, they are not told whether they are creating output for a web page or an RSS feed item - partially because it doesn't really matter, since the output is going to be cached for use in both feeds and web pages anyway. There's no way to differentiate without getting all hacky. Pathologic is magical, not hacky.
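For illustration, here's the rough shape of a Drupal 6 filter callback (simplified, with a made-up module name; the real hook takes a couple more arguments): nothing passed in tells the filter whether its output will end up on a page or in a feed.

```php
<?php
// Simplified sketch of a Drupal 6 hook_filter() implementation
// (made-up module name; the real hook takes additional arguments).
// The filter only ever receives the text itself -- there is no
// argument saying "this is for a feed" or "this is for a page".

function mymodule_filter($op, $delta = 0, $format = -1, $text = '') {
  switch ($op) {
    case 'process':
      // All the context we have is $text.
      return str_replace('internal:', '/', $text);
  }
  return $text;
}

$out = mymodule_filter('process', 0, 1, '<a href="internal:node/1">link</a>');
// $out is cached and reused verbatim for both web pages and feed items.
```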

If the feed serving code is ever going to be tweaked to output full URLs, that's not going to happen for D6 anyway, so Pathologic will continue to do this. If this is really a show-stopper for you, then feel free to stick with Path Filter - if it ain't broke, etc, etc.

hass’s picture

I'd like to stop using Path Filter's "internal:" prefix in URLs. This is the main reason to migrate.

Can you explain why you don't want to make Pathologic work correctly with both http and https URLs? Trying to keep users on non-SSL is not an option at all; it has proven to be faulty and unrealistic.

The module should work in a general way. Calling Pathologic "magical" doesn't sound like a solution, and I don't see that much magic inside the module... it doesn't do all that much except a bit of filtering and URL prefixing. Not much magic, but very helpful for running production sites in development subfolders...

I believe that if the bug in feeds is fixed for D7, it can be backported quickly. It doesn't look very important, or it would already have been fixed... I only changed 3 lines to make Pathologic work as it should. I'll take a look at how "hackish" it would be to detect feeds, as I don't want to maintain these 3 lines on every release myself. That would be more error-prone if I missed the changes...

Garrett Albright’s picture

Can you explain why you don't want to make Pathologic work correctly with both http and https URLs? Trying to keep users on non-SSL is not an option at all; it has proven to be faulty and unrealistic.

So keep users on SSL connections all the time. As mentioned several times above, Pathologic will work just fine if it's always one or the other; it's when there's the chance for mixing that problems occur.

I think maybe you're thinking two things are true which are actually false; one, that input filters are capable of being aware of the context in which their output will be used (whether in a feed item or a web page or…), and two, that input filters are run on site content every time it is requested. If both those two things were true - or heck, just one or the other - then an input filter doing things one way for feed items and another way for web page content, or one way for pages behind HTTP connections and another way for pages behind HTTPS connections, would be more plausible. But that's not the case without resorting to dirty tricks. I'm simply working within the limitations that input filters inherently have. I certainly bear no hostility towards people with secure sites.

hass’s picture

Keeping thousands of concurrent users on SSL causes *high* CPU load for encryption. This eats power, slows down browsing, may overload an infrastructure, and so on. A site should use SSL only where required - e.g. the shopping process, personal user data transfer, credit card data transfer, etc.

It sounds like we are talking past each other... these are my points:

1. A node needs to have all links absolute to the site root to allow switching between SSL and non-SSL without any harm.
2. The reason is Drupal's caching - it works the way it works.
3. RSS feeds have a bug, as they do not add links with a fully qualified domain name.
4. This module works around a bug and does not FIX the bug.
5. The module breaks other things via the workaround implemented.
6. Breaking things only for a workaround is not the Drupal way. Bugs need to be fixed.
7. The output of a node view is not the same code as the RSS code, therefore different caches (not sure about RSS). It would be no problem to split the URL logic.

So - just to clarify - Pathologic hacks around an RSS bug in core? Again, hacking around a core bug is the wrong way by design.

PS: No need to teach me the basics of the filter system. It may be named an input filter, but it filters on output; an input filter would change code on save, and that is not the Drupal way. Drupal only filters on output - therefore I say it's an output filter.

Garrett Albright’s picture

3. RSS feeds have a bug, as they do not add links with a fully qualified domain name.

I disagree that this is really a bug. It's outputting the HTML it is given, without filtering it in any way. That's not being buggy; it's being simple. But for the sake of argument, we'll say it's a bug for the rest of this post.

4. This module works around a bug and does not FIX the bug.

The bug is in core. Have you seen how tough it is to get a bugfix into an already-released core branch? - especially when you consider that this isn't really a bug, and even if it were, it's a rather minor one with an easy workaround. Besides that, fixing this bug is not Pathologic's sole reason for existence - it's more of a bonus.

5. The module breaks other things via the workaround implemented.

Arguably, it does. However, these statistics which I'm making up say that twenty times more people are publishing their site's content via a feed than are concerned about securing it with HTTPS.

6. Breaking things only for a workaround is not the Drupal way. Bugs need to be fixed.

Again, the bug's in core, and this is a contrib module. I'm doing what I can to alleviate the issue from a contrib module which implements an input filter and nothing more.

7. The output of a node view is not the same code as the RSS code, therefore different caches (not sure about RSS). It would be no problem to split the URL logic.

…No, this is wrong, as I've mentioned above. Both posts in feeds and posts viewed via the web are going to be drawing their filtered HTML from the cache_filter table in the database. And the code will be the same as far as input filters are concerned, because there's no way to tell where its output is going to appear. I'm not lying about this…

PS: No need to teach me the basics of the filter system. It may be named an input filter, but it filters on output; an input filter would change code on save, and that is not the Drupal way. Drupal only filters on output - therefore I say it's an output filter.

But what does it filter? It filters input. Therefore it's an input filter. :P

With all due respect, the odds of me changing my mind on this are pretty minuscule. If this is still a dealbreaker for you, your time will likely be better spent either continuing to use Path Filter or making your own hacks/patches to Pathologic.

hass’s picture

Have you seen how tough it is to get a bugfix into an already-released core branch?

For sure, I've got all my patches in, but sometimes it wasn't that easy... :-) The core thread that deals with the bug doesn't seem to be very active... likely because RSS feeds are much less popular than HTML pages.

I don't want to compare SSL with non-SSL and the expected usage... that is pointless. It's the implementation that is not correct, as it starts breaking CSS and gives users security warnings in the browser. The security warning is much more important than an RSS feed, as users lose confidence in your site if they see it only *once*. A broken link in RSS doesn't cause this loss of confidence.

I have no plans to maintain another duplicate module. You may remember that I asked you many months ago why you haven't joined the pathfilter team and merged both projects.

It would be great if the user were able to decide how the module works. Fully qualified versus absolute to the webroot - e.g. with a radio box.

Garrett Albright’s picture

You may remember that I asked you many months ago why you haven't joined the pathfilter team and merged both projects.

I'm still up for that, though in my opinion it should be less of a merger and more like Pathologic clobbering Path Filter. But maybe I'm a bit biased. I posted again in the relevant issue (#272778: Deprecate Path Filter in favour of Pathologic). Maybe we can move forward with it.

It would be great if the users is able to decide how the module works. Full qualified vesus / absolute to the webroot - for e.g. with a radio box.

I'll promise to consider it if you promise to never write "for e.g." ever again ever.

hass’s picture

Fixed :-)))

hass’s picture

In D6 the node feeds also use xml:base, so the issue is more minor... and the aggregator output is themable.

Anonymous’s picture

Just thought I'd throw in a vote for the ability to choose between fully qualified absolute URLs and 'absolute' URLs via base_path() (or similar).

Currently I'm using a custom module to prepend base_path() to all relative links. It works fine, but I'd rather be using a supported, contributed module than custom code. Currently I can't use Pathologic because my site(s) can be accessed via http/https and via different domains (although the path is identical).

hass’s picture

The line:

  return url($path, array('query' => $query, 'fragment' => $fragment, 'absolute' => TRUE));

needs to be changed to:

  return url($path, array('query' => $query, 'fragment' => $fragment));

Otherwise the module behaves incorrectly.
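For anyone following along, here's a rough stand-in for what those two calls return (this is not Drupal's real url() implementation; the base URL and base path below are assumptions for illustration): with 'absolute' => TRUE the current scheme and host get prepended, and without it the path stays relative to the web root, inheriting whatever scheme the page is served over.

```php
<?php
// Rough stand-in for Drupal's url() -- not the real implementation;
// the base URL and base path below are assumptions for illustration.

function url_sketch($path, array $options = array()) {
  $base_path = '/';                  // site installed at the web root
  $base_url  = 'http://example.com'; // base URL of the current request
  $url = $base_path . $path;
  if (!empty($options['query'])) {
    $url .= '?' . $options['query'];
  }
  if (!empty($options['fragment'])) {
    $url .= '#' . $options['fragment'];
  }
  // 'absolute' => TRUE is what hard-codes the scheme into the output.
  return empty($options['absolute']) ? $url : $base_url . $url;
}

$absolute = url_sketch('node/1', array('absolute' => TRUE)); // http://example.com/node/1
$relative = url_sketch('node/1');                            // /node/1
```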

Garrett Albright’s picture

Status: Active » Closed (works as designed)

hass, you rascal, you.

hass’s picture

Status: Closed (works as designed) » Active

Bug hasn't been fixed, sorry.

Garrett Albright’s picture

Status: Active » Closed (works as designed)

There is no bug. Please don't play with my status menu - I can really go without any more petty Drupal-related drama at the moment.

hass’s picture

What is the problem with implementing the filter correctly?

I don't want to fork this module just to get this bug fixed - that clutters my site with every new release. I really hate buggy modules that need custom patches applied to work correctly.

Garrett Albright’s picture

Version: 6.x-2.0-beta22 » 6.x-3.x-dev
Status: Closed (works as designed) » Needs review

The just-released 6.x-3.x branch is a backport of the 7.x-1.x branch, where the filter settings have a check box which allows you to toggle between outputting the path with or without the http://example.com part. Anyone interested in this issue, please back up your database and give it a try. Especially you, hass.
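In other words (a sketch of the behavior only; the setting name below is made up, not Pathologic's actual variable): one setting decides whether the rewritten path keeps the scheme-and-host part.

```php
<?php
// Sketch of the toggle described above. The setting name is made
// up for illustration; it is not Pathologic's actual variable.

function pathologic_sketch_url($path, $keep_host) {
  $base_url = 'http://example.com'; // assumed current base URL
  return ($keep_host ? $base_url : '') . '/' . $path;
}

$qualified = pathologic_sketch_url('files/pic.png', TRUE);  // http://example.com/files/pic.png
$rooted    = pathologic_sketch_url('files/pic.png', FALSE); // /files/pic.png
```

With the box unchecked, pages served over HTTPS simply inherit the https:// scheme for their images, which resolves the original IE warning.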

Garrett Albright’s picture

Status: Needs review » Fixed
hass’s picture

Tested 3.0 in production and it looks good to me. THX.

Garrett Albright’s picture

Great! Thanks for the feedback.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.