I was playing around with the IMCE module last night. I realized that the module inserts HTML into the node for the images, which is great; however, the src attribute of these images is an absolute link from the server's root (some say: a server-root-relative path). Indeed, that is the only way it will "work", because of Drupal's weakness with regard to its clean (not so smart) URL management.

This is a problem, as far as I can see. It is brought up regularly in the forums, but never dealt with.

Here is the problem:

When clean URLs are enabled, Drupal is unable, through its very own "rewrite rules", to correctly reference any file in the "files" directory unless the link to it is absolute. Relative links from index.php to the files folder, whose location and name the admin defines explicitly in the administration panel, aren't resolved correctly.

We usually want to have a test site.

We even move our web sites around when we are no longer satisfied with our hosting companies.

Hence, if you move a web site or transfer a database, and the new Drupal installation does not have exactly the same path from the web server's root, all image links in all nodes are broken.

Here is a simple, reasonable request:

Make it possible for end users to enter HTML image links in nodes with PHP input disabled, which won't break when (a) the site is in a subdirectory, (b) clean URLs are enabled and (c) the site is moved.

I do believe that this is a bug, however.

Clean URLs are dealt with by Drupal.

It is Drupal that takes care of understanding clean URLs.

It is Drupal that creates rewrite rules in an .htaccess file in the root of the Drupal folder, so that when an HTTP request for a page such as http://www.myWebSite.com/node/5 is received, that request is translated to http://www.myWebSite.com/index.php?q=node/5.
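
For readers who want to see what that translation amounts to, here is a rough model in JavaScript. It assumes the stock rules have the shape `RewriteCond %{REQUEST_FILENAME} !-f`, `RewriteCond %{REQUEST_FILENAME} !-d`, `RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]`; the function `rewriteCleanUrl` and the `isFileOrDir` callback are hypothetical stand-ins for Apache, not Drupal code. Under that assumption, a request that names a real file, such as files/images/image.jpg, is served directly and never reaches index.php.

```javascript
// Rough model of the clean-URL rewrite, assuming the stock .htaccess:
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteCond %{REQUEST_FILENAME} !-d
//   RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
// `isFileOrDir` is a hypothetical stand-in for Apache's filesystem checks.
function rewriteCleanUrl(path, query, isFileOrDir) {
  // Existing files and directories (e.g. files/images/image.jpg) are served as-is.
  if (isFileOrDir(path)) {
    return query ? `${path}?${query}` : path;
  }
  // Everything else goes to index.php; QSA appends the original query string.
  const target = `index.php?q=${path}`;
  return query ? `${target}&${query}` : target;
}

// Pretend filesystem containing a single uploaded image.
const onDisk = new Set(["files/images/image.jpg"]);
const exists = (p) => onDisk.has(p);

console.log(rewriteCleanUrl("node/5", "", exists));                 // index.php?q=node/5
console.log(rewriteCleanUrl("files/images/image.jpg", "", exists)); // files/images/image.jpg
```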

Since the path to the "files" folder is set by the user in Administer > Settings > File system settings and recorded as the File system path, i.e. since Drupal knows the path to that folder, Drupal should add a special rewrite rule to its own rewrite rules when clean URLs are enabled, so that it can understand a path such as http://www.myWebSite.com/..../files/... to any file.

Hard-coding links to the web root folder is bad practice.

A rewrite rule can be written by someone who understands regular expressions well.
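
As a sketch of what such a rule would have to express, here is the matching logic written as a JavaScript regular expression rather than in Apache syntax. `mapToFilesDir` and `escapeRegExp` are hypothetical helpers, and `filesDir` stands for the File system path setting. The pattern strips any virtual clean-URL prefix in front of the files directory.

```javascript
// Hypothetical helper: escape a literal string for use inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical model of the proposed rule: strip any virtual clean-URL
// prefix in front of the files directory. `filesDir` mirrors the
// File system path setting; null means "leave the request alone".
function mapToFilesDir(path, filesDir) {
  const re = new RegExp(`^(?:.*/)?(${escapeRegExp(filesDir)}/.+)$`);
  const m = path.match(re);
  return m ? m[1] : null;
}

console.log(mapToFilesDir("node/files/images/image.jpg", "files")); // files/images/image.jpg
console.log(mapToFilesDir("node/5", "files"));                      // null
```

A real rule would also catch legitimate paths that merely contain the directory name (an alias such as about/files/report would be remapped to files/report), which is part of why customizing it is risky.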

Let's talk about this.

What are we planning to do about this "by-design" bug?

Comments

xjm

Agreed that absolute links are not the answer. I use the nice pathfilter module to make relative URLs work no matter where my site is currently living, and my users are under the strictest orders to always use it for internal links in nodes, and never absolute or root-relative URLs.

I've never encountered the issue with the files/ directory because I've disallowed uploading of files, but I am going to have to allow uploads in the near future, so I'm eager to hear others' thoughts on the issue.

Agreed also that it would be fairly straightforward to add a RewriteRule to .htaccess to make an exception for a files directory. It would be more complicated, though, to customize this rule: certainly dangerous territory for any admin not wise in the ways of Apache and regular expressions.

What about the case where the files/ directory is not inside the Drupal directory?

=======================
Just another newbie.
XHTML Strict: it's the way to be.
=======================
Feature request: HTML Source Formatting in TinyMCE

xjm

I was able to successfully add images in the files directory to my nodes using pathfilter:
<img src="internal:files/images/image.jpg" alt="" />

Worked fine for me.
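
To show the idea behind an "internal:" prefix, here is a deliberately simplified filter. It is not pathfilter's code: `expandInternal` is a hypothetical name, it only handles double-quoted href and src attributes, and it assumes clean URLs are on, so an internal path only needs the site's base path prepended.

```javascript
// Simplified "internal:" filter (hypothetical; not pathfilter's code).
// Assumes clean URLs are on, so an internal path only needs the base path.
function expandInternal(html, basePath) {
  return html.replace(
    /\b(href|src)="internal:([^"]*)"/g,
    (_match, attr, path) => `${attr}="${basePath}${path}"`
  );
}

const body = '<img src="internal:files/images/image.jpg" alt="" />';
console.log(expandInternal(body, "/drupal/"));
// <img src="/drupal/files/images/image.jpg" alt="" />
```

Moving the site then only means changing the base path passed in, which Drupal itself knows, rather than editing every node.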

cog.rusty

I agree that Drupal's handling of standard HTML img src could use some tinkering to cover the case you mentioned.

I am a bit skeptical of rewrite rules in .htaccess as a solution. Rewrite rules are already a user-support hell, although sometimes they make entertaining puzzles for me (at least when I am not the one having the problem).

So, I think at least something like the "internal:" filter mentioned by xjm should be in core if a cleaner solution is not available.

About IMCE, what I have found is that it does use paths relative to your Drupal installation if you have specified your files path as just 'files' in Drupal's general file system settings, while it uses absolute paths if you have specified an absolute Drupal files path.

xjm

It would be nice for something that does what pathfilter does to be integrated into core. That would mean that I wouldn't have to keep hacking modules to make them cooperate with pathfilter. :) (E.g. http://drupal.org/node/106695)

Chill35

> So, I think at least something like the "internal:" filter mentioned by xjm should be in core if a cleaner solution is not available.

Me too.

Caroline

styro

> Clean URLs are dealt with by Drupal.
>
> It is Drupal that takes care of understanding clean URLs.
>
> It is Drupal that creates rewrite rules in an .htaccess file in the root of the Drupal folder, so that when an HTTP request for a page such as http://www.myWebSite.com/node/5 is received, that request is translated to http://www.myWebSite.com/index.php?q=node/5.
>
> Since the path to the "files" folder is set by the user in Administer > Settings > File system settings and recorded as the File system path, i.e. since Drupal knows the path to that folder, Drupal should add a special rewrite rule to its own rewrite rules when clean URLs are enabled, so that it can understand a path such as http://www.myWebSite.com/..../files/... to any file.

I'm not sure that would be possible. Drupal doesn't control the rewrite rules at all; they are just an Apache config include file that gets bundled with Drupal. Clean URLs are handled by Apache. In any halfway secure site, Drupal would have no way of altering the Apache configuration to suit settings in its database. Generally, the web server process (i.e. what Drupal runs as) shouldn't have write access to any of the application's or the server's files.

Although I'm not completely sure what the problem is (I've never used subdirectories), I suppose one option is that some sort of commented-out example rewrite rule could be added to .htaccess, which the webmaster could edit to suit their site.

But I don't see any way Drupal could do this from the database settings without severely lowering the security of your web server.

--
Anton
New to Drupal? | Forum posting tips | Troubleshooting FAQ

cog.rusty

@styro, this is about the behavior which Heine explains in this issue http://drupal.org/node/102667, comment #8, last case. The question was whether it should work this way.

styro

and private downloads, so I'm not really familiar with the issue.

So the problem scenario is:

location: /drupal/node/4
link: files/my_file.png

The browser tries to fetch /drupal/node/files/my_file.png, because the current document location is /drupal/node/4 and a relative link replaces everything after its last slash.

Isn't that just how browsers and the web work with relative vs absolute links? You'd get the same problem on a static HTML site.
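
The WHATWG URL API in Node.js applies the same resolution rules as browsers, so it is a quick way to check where a relative link actually points. The host example.com and the /drupal/ subdirectory are assumptions for illustration. The last path segment of the current page is dropped before the link is appended, and a trailing slash on the page URL changes the result.

```javascript
// Browser-style resolution via the WHATWG URL API (global in Node.js).
// Assumed setup: Drupal in /drupal/ on example.com, clean URLs on.
const page = "http://example.com/drupal/node/4";

// Document-relative: "4" is dropped, then the link is appended.
console.log(new URL("files/my_file.png", page).pathname);
// /drupal/node/files/my_file.png

// With a trailing slash on the page URL, nothing is dropped.
console.log(new URL("files/my_file.png", page + "/").pathname);
// /drupal/node/4/files/my_file.png

// Root-relative: works from any page, but hard-codes the /drupal/ prefix.
console.log(new URL("/drupal/files/my_file.png", page).pathname);
// /drupal/files/my_file.png
```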

The problem possibly (I'm just guessing) stems from the removal of the <base href="..." /> element after 4.6. That was a slightly controversial change at the time, if I remember correctly. Does adding a <base href="..." /> back into the <head>..</head> of page.tpl.php resolve this?

So if I understand correctly, the feature request is for some automatic filter / token replacement in Drupal core to handle aliasing the files directory properly in this scenario?

If that was able to happen in core it would be for 6.0 at the earliest, and I suspect it would need to happen as part of a larger debate anyway.

In the meantime, I think the best approach would be to look for contrib filter modules for that. Or to use <base href="..." /> (if that works). Or (the cop-out answer) to avoid subdirectories wherever possible. You can still use multiple virtual hosts when developing on localhost so that each site gets a root URL.

cog.rusty

What I was thinking is that Drupal should never append the "location" (which is a virtual one anyway with clean URLs) but only the base path. Not sure about the side effects.

styro

doing that, it's the browsers following the standards.

In the absence of a <base href="..." /> element or some sort of HTTP header metadata, the HTML specs and RFCs state that the current document's URL is the base URL. Any relative URLs are resolved against that.

With clean urls the browsers don't know that /drupal/node/4 is really /drupal/index.php?q=node/4, mod_rewrite hides that from them. As far as the browsers are concerned, /drupal/node/4 is the current document - they don't know it gets rewritten at the server. The rewriting is done by Apache for the server, not for the clients.

The HTML options in this scenario are to either use <base href="..." /> with relative urls or absolute urls. Drupal has for whatever reasons in 4.7 changed to favour the absolute urls approach.
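
The effect of a <base href> can be modelled the same way: when present, it replaces the document URL as the starting point for resolution. `resolveLink` is a hypothetical helper, and the /drupal/ base is an assumed subdirectory install.

```javascript
// Hypothetical helper: a <base href> (itself resolved against the page)
// replaces the document URL as the base for relative links.
function resolveLink(link, documentUrl, baseHref) {
  const base = baseHref ? new URL(baseHref, documentUrl) : new URL(documentUrl);
  return new URL(link, base).pathname;
}

const nodePage = "http://example.com/drupal/node/4";
const adminPage = "http://example.com/drupal/admin/settings";

console.log(resolveLink("files/my_file.png", nodePage));             // /drupal/node/files/my_file.png
console.log(resolveLink("files/my_file.png", nodePage, "/drupal/"));  // /drupal/files/my_file.png
console.log(resolveLink("files/my_file.png", adminPage, "/drupal/")); // /drupal/files/my_file.png
```

With the base set once in the theme, the same relative link works from every page; only the base has to change when the site moves.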

Chill35

How are other CVS handling this?

Caroline

styro

Do you mean CMS? If so, I'm not sure.

The way I handle it in Drupal is to use the image_filter module. If I've got an image node with a node ID of 44, I can insert it in a node by entering [image:44] in my node body. You can also control other aspects of the image...

Here is the help text for image_filter:

You may quickly link to image nodes using a special syntax. Each image code will be replaced by a thumbnail linked to the full-size image node. Syntax:

[image:node_id align=alignment hspace=n vspace=n border=n size=label width=n height=n nolink=(0|1) class=name style=style-data]

Every parameter except node_id is optional.

Typically, you will specify one of size, width, or height, or none of them. If you use size=label, where label is one of the image size labels specified on the image settings page, the size associated with that label will be used. The sizes "thumbnail", "preview", and "original" are always available. If you use width=n or height=n, the image will be scaled to fit the specified width or height. If you use none of them, the thumbnail image size will be used.

If you specify nolink=1, no link will be created to the image node. The default is to create a link.

The align, hspace, vspace, border, class, and style parameters set the corresponding attributes in the generated img tag.
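
For illustration, a token syntax like this can be parsed with a single regular expression. This is a hypothetical sketch, not image_filter's implementation; it assumes parameter values contain no whitespace or closing bracket, so style values with spaces would need more work.

```javascript
// Hypothetical parser for [image:node_id key=value ...] tokens; a sketch,
// not image_filter's implementation. Values may not contain spaces or "]".
function parseImageTokens(text) {
  const tokenRe = /\[image:(\d+)((?:\s+[a-z]+=[^\s\]]+)*)\s*\]/g;
  const tokens = [];
  for (const m of text.matchAll(tokenRe)) {
    const params = {};
    // Split the optional key=value list; the regex guarantees each pair has "=".
    for (const pair of m[2].trim().split(/\s+/).filter(Boolean)) {
      const eq = pair.indexOf("=");
      params[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
    tokens.push({ nid: Number(m[1]), params });
  }
  return tokens;
}

console.log(JSON.stringify(parseImageTokens("See [image:44 align=left size=preview] here.")));
// [{"nid":44,"params":{"align":"left","size":"preview"}}]
```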

image_filter is a little dated and neglected though (I'm using the latest 4.6 version directly in 4.7 without any problems), and I'm not up with the play on many of the other image handling solutions these days. It just suits how I want to deal with inline images.

Chill35

When I finish downloading the latest version of Joomla, I will try & see how they insert images into their content with relative paths, if they use relative paths at all. So far I have looked at the Joomla .htaccess file (from an old version I had) and I can see many differences in the rewrite rules. I must say that Joomla makes excellent use of comments in its .htaccess file, although I dislike Joomla for just about everything else I know about it.

Caroline

Chill35

> In the absence of a <base href="..." /> element or some sort of HTTP header metadata, the HTML specs and RFCs state that current document is the base url. Any relative urls get appended to that.

That makes sense.

I saw something in the Drupal .htaccess file:

  # Modify the RewriteBase if you are using Drupal in a subdirectory and
  # the rewrite rules are not working properly.
  #RewriteBase /drupal

I have to play around with this.

> With clean urls the browsers don't know that /drupal/node/4 is really /drupal/index.php?q=node/4, mod_rewrite hides that from them. As far as the browsers are concerned, /drupal/node/4 is the current document - they don't know it gets rewritten at the server. The rewriting is done by Apache for the server, not for the clients.

Agreed, the rewriting is done by Apache, but it follows the Rewrite Rules set by Drupal in Drupal-made .htaccess file, which is part of the Drupal package, in the Drupal's folder.

When you turn clean URLs on and off in the Drupal admin panel, you are effectively editing this .htaccess file.

> The HTML options in this scenario are to either use <base href="..." /> with relative urls or absolute urls. Drupal has for whatever reasons in 4.7 changed to favour the absolute urls approach.

Here are links to these discussions, I think:

http://drupal.org/node/58106
http://drupal.org/node/58446

styro

> I saw something in the Drupal .htaccess file:
>
>   # Modify the RewriteBase if you are using Drupal in a subdirectory and
>   # the rewrite rules are not working properly.
>   #RewriteBase /drupal
>
> I have to play around with this.

Whoops sorry, I had mistakenly assumed you had already tried that and it wasn't working for you. It does help some things, but I'm not sure it will solve all your issues.

> Agreed, the rewriting is done by Apache, but it follows the Rewrite Rules set by Drupal in Drupal-made .htaccess file, which is part of the Drupal package, in the Drupal's folder.

Slight clarification: The rules are set by the Drupal dev team (not including any alterations made by the webmaster) - not by the Drupal application itself.

> When you turn on and off clean URL in the Drupal Admin Control Panel, you are effectively editing this .htaccess file.

No not quite, the clean url setting has no effect on the rewriting rules. If Apache loads the rewrite rules, they will be in place no matter what the setting is within Drupal. The Drupal setting alters how Drupal outputs any internal links it creates programmatically.

As an experiment, try it yourself:

With clean urls on (in the Drupal settings), manually try both these links: http://yoursite.com/index.php?q=user and http://yoursite.com/user

Then with clean urls off (in the Drupal settings), try them both again.

As you can see, both types of URLs always work, and the rewriting done by Apache always happens (provided it is working at all, of course). Drupal (the application) has no control over how Apache works or what is in the .htaccess file. The only change you will notice is in the HTML output from Drupal: the clean URLs setting changes the format of the links Drupal creates for itself.

mod_rewrite handles the incoming requests, and the Drupal clean URLs setting handles the outgoing HTML.

Or, in more detail: mod_rewrite is for getting Apache to convert incoming HTTP requests from the clean format to the standard one so that PHP can do something with them. And the clean URL setting in Drupal is for changing the page output so that browsers or crawlers following the links will use the clean format for their future requests.

They are independent of each other and used for different sides of the puzzle.
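
The two halves can be sketched separately. Both functions are hypothetical: `buildLink` only approximates how a url()-style helper formats links (assuming the site is at the server root and that the non-clean form is "?q=..."), and `incomingQ` models the request side, assuming the stock rewrite rule is active. Note that `incomingQ` never looks at the clean URL setting.

```javascript
// Outgoing side (hypothetical approximation of a url()-style helper):
// only the format of generated links depends on the clean URL setting.
function buildLink(path, basePath, cleanUrls) {
  return cleanUrls ? basePath + path : basePath + "?q=" + path;
}

// Incoming side (model of Apache + PHP, assuming the stock rewrite rule):
// both URL forms yield the same q, and the Drupal setting is never consulted.
function incomingQ(requestPath) {
  const url = new URL(requestPath, "http://example.com");
  if (url.pathname === "/" || url.pathname === "/index.php") {
    return url.searchParams.get("q") || "";
  }
  return url.pathname.slice(1); // clean form, rewritten to index.php?q=...
}

console.log(buildLink("user", "/", true));   // /user
console.log(buildLink("user", "/", false));  // /?q=user
console.log(incomingQ("/index.php?q=user")); // user
console.log(incomingQ("/user"));             // user
```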

Was that helpful? Hopefully that will give you a better understanding of what's happening with clean urls.

Good luck with getting some sort of internal filter into core for 6.0 - I think it is a needed component :)

Chill35

> With clean urls on (in the Drupal settings), manually try both these links: http://yoursite.com/index.php?q=user and http://yoursite.com/user

Yes, I tried that, but I didn't think it meant that the .htaccess file wasn't accessed.

Given the "rules", I thought that both types of link would work.

If Drupal is responsible for creating the links programmatically, then surely something can be implemented.
Although for now, whatever the content of a node (its $body) is, it is dropped into the dynamically created page as is
(if there's no "macro"...), which has its advantages. Total... control.

> Was that helpful? Hopefully that will give you a better understanding of what's happening with clean urls.

Very useful and I stand corrected. You explain very well by the way. It's worthy of mention.

I believe you about the .htaccess access. I guess that if I was in doubt I could look at the .htaccess file when clean URLs is on, and look at it when it's off - but I believe you.

Thank you

Caroline

styro

> If Drupal is responsible for creating the link programmatically, then surely something can be implemented.

I can understand the desire for that to happen, but I can also see that there would be resistance from some core devs.

They might be reluctant to implement a solution for a problem that is a bit of a corner case, especially when the solutions can be a bit site-specific, would probably rely on parsing rules, and there are other ways around the problem.

Some would probably just see it as something that is perfectly well handled by contrib modules for those that need them.

If a solution was to be created in core it would probably need to be part of a wider more general framework aimed at handling references to things - but I'm only guessing :)

In the meantime, I think a good way to avoid lots of little issues like this is to not install Drupal in subdirectories and to use internal URLs that start from the root, i.e. "/node/4" instead of "node/4". Not installing in subdirectories also makes multisite easier, IMO.

xjm

Unfortunately sometimes "not installing in subdirectories" is not an option for some. :)

Everything else in Drupal continues to work wherever you put Drupal, so internal links in node content should too.

I can think of a somewhat persuasive reason for internal URLs to be handled by core rather than contrib modules: a standard that all things Drupal could use. In a sense, it's already there internally: Drupal automatically converts URLs we enter in various places all over the site's config to the appropriate Drupal-base-relative path. The url() function is there for all contrib modules to use. What's missing is a standard for referencing this important url() in site content.

For example: I use pathfilter to reference internal URLs when linking in nodes, as I mentioned before. I also use the relatedlinks module to retrieve a list of all the links in certain node types (since my site includes a knowledge base, this is very useful functionality). On install, relatedlinks of course knows nothing about pathfilter. It dutifully fetches the internal:my/path/alias/xyz link text from the node content and adds it to the relatedlinks block as <a href="internal:my/path/alias/xyz">, which obviously is not going to work if it doesn't go through pathfilter's filter. While the node itself has those links interpreted by pathfilter, the relatedlinks block isn't even a node, so it naturally doesn't go through any filters at all.

To get relatedlinks to print working links for those already-defined-en-masse internal links, I added a couple lines to make a call to pathfilter's converter if appropriate. Well, that's fine as far as I'm concerned (I'm a big girl, I can handle it), but I'm less enthusiastic about having to add and maintain similar hacks to new functionalities in the future.

Summary. This is:

  • a very basic and oft-used functionality (linking to content on one's own site)
  • something that Drupal can already do for everything except human-entered node content
  • something that needs to be standardized

Chill35

Don't ask me why but it is possible to add relative paths that use the "path" system of Drupal in this fashion, clean URL enabled or not.

<a href="?q=node/6">Hello</a>

This will link, from location node/8 for example, to:

http://www.myWebSite.com/node/8?q=node/6

With that address in the browser, you will see node 6!

That's why I keep talking about the files directory... the 'files' directory cannot be referenced with a query...

Very odd.
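
A likely explanation, assuming the stock rule `RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]`: the browser resolves a query-only link against the current page and keeps its path; the QSA flag then appends that query string to the rewritten one; and PHP keeps the last value when a key repeats in the query string. The sketch walks through those three steps; `phpGet` is a hypothetical helper mimicking PHP's last-wins behaviour.

```javascript
// Step 1: the browser resolves a query-only link against the current page.
const page = "http://www.myWebSite.com/node/8";
const requested = new URL("?q=node/6", page);
console.log(requested.pathname + requested.search); // /node/8?q=node/6

// Step 2: assumed stock rule RewriteRule ^(.*)$ index.php?q=$1 [L,QSA];
// QSA appends the original query string after the rewritten q.
const rewritten = "q=" + requested.pathname.slice(1) + "&" + requested.search.slice(1);
console.log(rewritten); // q=node/8&q=node/6

// Step 3 (hypothetical helper): PHP fills $_GET left to right, so the
// last occurrence of a repeated key wins.
function phpGet(query, key) {
  const values = new URLSearchParams(query).getAll(key);
  return values.length ? values[values.length - 1] : undefined;
}
console.log(phpGet(rewritten, "q")); // node/6
```

Files behave differently because, under the same assumed rules, a request for an existing file is served directly by Apache and never reaches index.php, so there is no q to override.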

Chill35

However, let us agree that as an address to bookmark, it's awful.

Caroline

Chill35

> If that was able to happen in core it would be for 6.0 at the earliest, and I suspect it would need to happen as part of a larger debate anyway.

It's my intention to open a larger debate over this, in the long term (drupal 6.0), with this forum topic.

kdebaas

I assume the problem manifests itself with URL aliases as well?

I am in the middle of finding ways to import an old ColdFusion/HTML site into Drupal. I had thought of just dumping the full HTML content of each page, stripped of its navigation, into Drupal nodes with one of the various import modules that exist (thank god there is a database with the contents of the old site). My initial, optimistic assumption was that if I could keep the original relative URLs by creating URL aliases during import, the links inside the node bodies to other nodes would still be valid.

Well, if optimism won't have its way, let it be replaced by warm support for this debate.
Subscribing...

Chill35

> I assume the problem manifests itself with URL aliases as well?

I think so.

Chill35

I have been testing the base element for a few days now.

In page.tpl.php of my theme, I put this in the <head>:

<base href="http://www.myWebSite.com/" />

(One should use php instead... not hard-code it like I have...)

About the issues John Albin reports here: http://drupal.org/node/58106

> Named links (<a href="#mylink">) don't work in Drupal 4.6 and before. If you are putting a named link in a node, you have to know the node's url before writing part of the node's content (which is kind of a pain). But more importantly, if you want to put a named link in your template, you'll need to learn php in order to insert the current url into the named link (i.e. it's difficult for template writers to do something simple.)

1. Named links don't work, indeed. After adding the base element back into my Drupal 4.7.4 installation, I realized that all collapsible fieldsets (such as those on the node edit page) rely on the "href" attribute in DOM scripting. To make the fieldset title clickable, collapse.js creates a link element and assigns to its onclick property the function that collapses the fieldset, instead of adding an onclick event handler to the existing element. The code looks like this, in collapse.js:

a = document.createElement("a");
a.href = "#";
a.onclick = function() { ...

In Internet Explorer 7, even when the event handler returns false, the link is still followed in the usual non-JavaScript way after the handler runs. As a result, when you try to open and close the collapsible fieldsets, you are redirected to the web site's home page: http://www.myWebSite.com/#

I fixed collapse.js because it's bad practice anyway to add an href attribute just to make the pointer cursor appear when you hover over the element. Here's what I did (I don't think it's the absolute best fix, though; the best would be to avoid creating that anchor element):

// a.href = "#";  (removed: with the base element, "#" points at the home page)
a.className = "clickable";
a.onclick = function() { ...

And in my stylesheet, I added:

.clickable {
  cursor: pointer;
}

Now, about the so-called limitation of having to write href="node/6#mylink" instead of href="#mylink" when trying to reach an element with an ID on the same page: I find it a non-issue. You do not need PHP.

And if you’re using a template with "#" links to the same page, you are using php already and you would use base_path() anyway. I am not aware of any template throwing in relative links.
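
The same resolution rules explain both the collapsible-fieldset jump to the home page and the need to spell out the node path. A sketch, assuming the base element from page.tpl.php above and a current page of node/6:

```javascript
// Assumed setup: <base href="http://www.myWebSite.com/" /> and current page node/6.
const base = "http://www.myWebSite.com/";

// Without <base>, a bare fragment stays on the current document.
console.log(new URL("#mylink", "http://www.myWebSite.com/node/6").pathname); // /node/6

// With <base>, "#" and "#mylink" point at the home page instead.
console.log(new URL("#", base).pathname);             // /
console.log(new URL("#mylink", base).pathname);       // /

// Spelling out the node path keeps the link on node/6.
console.log(new URL("node/6#mylink", base).pathname); // /node/6
```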

> Broken links: many broken pieces of software ignore the <base href> and fill the watchdog and apache logs full of 404 errors.

2. As for the second problem you mention, I am not aware of having experienced it so far. No errors in my Drupal logs, anyway.

What I know, though, is that the base element is XHTML-strict compliant. So the argument that the removal of the base element is necessary to be more W3C-compliant is wrong. That argument is used in the documentation : http://drupal.org/node/58446

Chill35

I forgot to mention:

- After fixing collapse.js for Internet Explorer 7 (clicking on the fieldset name also caused the page to roll back to the top, without the base element, which is a total annoyance...), I haven't experienced any problem so far.

EXCEPT:

- TinyMCE must be using the href property as well, because TinyMCE is now broken with my base element. Clicking on "rich text" brings me to the Drupal home page: http://www.myWebSite.com/#.

I don't care about TinyMCE, though.

But if you test my solution, beware of this.

Regards,

Caroline

Tom-182

I've been looking for the solution to this problem; I've tried the base URL thing and it works like magic.
Thank you Chill35 and others for having this discussion :)

nekobul

Thanks for the valuable info. For Drupal 5, to avoid having the site hard-coded in the template you can set base to:

<base href="<?php print $GLOBALS['base_root']?>" />

smandal

I think it is base_url and not base_root. See more at http://api.drupal.org/api/file/developer/globals.php/6

You also have to use the trailing "/" in the base href tag, and no leading "/" in the HTML code, for your browser to combine the base URL with your relative paths. Here is a piece of code you might want to put in your theme's page.tpl.php file:

<base href="<?php print $GLOBALS['base_url'] . '/'; ?>" />

And then, in the HTML code for the content, do not use the leading "/". Your code might look like src="sites/default/files/....../.jpg".
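
The trailing-slash and leading-slash rules can be checked with the same URL API browsers follow. The values are assumptions for a site installed under /drupal/: base_url is taken to have no trailing slash, base_root to be scheme plus host only, and photo.jpg is a made-up file name.

```javascript
// Assumed globals for a site installed under /drupal/.
const baseRoot = "http://example.com";        // like $GLOBALS['base_root']
const baseUrl = "http://example.com/drupal";  // like $GLOBALS['base_url'], no trailing slash
const link = "sites/default/files/photo.jpg"; // hypothetical file

// base_root: the /drupal/ subdirectory is lost.
console.log(new URL(link, baseRoot).pathname);            // /sites/default/files/photo.jpg
// base_url without "/": "drupal" is treated as the last segment and dropped.
console.log(new URL(link, baseUrl).pathname);             // /sites/default/files/photo.jpg
// base_url plus "/": the link lands inside the Drupal directory.
console.log(new URL(link, baseUrl + "/").pathname);       // /drupal/sites/default/files/photo.jpg
// A leading "/" in the content ignores the base path entirely.
console.log(new URL("/" + link, baseUrl + "/").pathname); // /sites/default/files/photo.jpg
```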

This is not ideal, because your rich-text editor (e.g. FCKeditor) will not show the image in its editing area. You have to preview to see the content, unless someone knows how to configure FCKeditor or another editor to use the base URL for its text area.

paganwinter

Subscribing...