Pathologic
Note: This documentation is written for the 7.x-3.x branch of Pathologic. The 7.x-2.x branch works similarly, except that the global configuration method is not available; only per-format configuration is possible. The 6.x-3.x branch is “current” for those still using Drupal 6; however, because the current Drupal 7 code works in fundamentally different ways, this documentation won’t match it exactly.
Pathologic is an input filter that attempts to alter paths in your content so that they remain correct in situations that would otherwise cause them to “break”; for example, if the URL of the site changes, or the content is moved to a different server. Pathologic can also solve the problem of missing images and broken links in your site’s RSS feeds.
Example use cases
Here are some hypothetical situations in which Pathologic can save the day.
- Links and/or images in your site content use relative paths (e.g., <a href="tag/food/pizza"> instead of <a href="http://example.com/tag/food/pizza">). These work fine for people reading content on your site, but break gracelessly when content is syndicated via RSS, Atom, or perhaps a REST interface or the like. Pathologic can ensure that those paths are always full URLs, including the protocol and host name, so that they work no matter how or where the content is consumed.
- The address of your site has changed. Perhaps you moved to a shiny new domain name, or perhaps you moved the Drupal installation from one subdirectory to another. Now none of the images and internal links in your content work. Using Pathologic is an alternative to going through all of the site’s content and correcting the paths manually.
- Your site exists as more than one copy at separate URLs; for example, on testing and production servers. Or perhaps it is accessible via both HTTP and HTTPS, and when links or images switch between the two on the same page, web browsers throw scary warnings at visitors. Perhaps copy editors edit content on the testing server, and that content eventually gets pushed over to the production server. When the editors link to other content on the site, they sometimes use the test server’s URL; these links break when the content is published to the production server. Pathologic can correct those paths so that they always point at the “current” correct URL.
- Your Drupal site has been up for a while, but you’ve recently discovered the Clean URLs feature and enabled it. Your links still work, but they still have that ugly ?q= thing in them, and you have better things to do with your time than go through all your content to prettify the links. Or maybe you’re going the other way: you used to have Clean URLs enabled, but you’ve had to disable it, and now your links are broken. Pathologic to the rescue!
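Two of the scenarios above come down to simple string transformations:

- resolving a relative path against the site’s base URL, as a feed needs;
- folding a ?q= link into its Clean URL form.

The Python sketch below illustrates both. It is not Pathologic’s own code: the module is PHP and builds its URLs through Drupal. The function names `absolutize` and `to_clean_url` and the base URL http://example.com/ are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# Hypothetical base URL of the Drupal installation. The trailing slash matters:
# urljoin() replaces the last path segment of a base that lacks one.
BASE_URL = "http://example.com/"


def absolutize(path: str, base: str = BASE_URL) -> str:
    """Resolve a relative path against the site base so it also works in feeds."""
    return urljoin(base, path)


def to_clean_url(url: str) -> str:
    """Rewrite a '?q=some/path' link into its Clean URL equivalent.

    Links without a 'q' query parameter are returned unchanged.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    drupal_path = next((value for key, value in params if key == "q"), None)
    if drupal_path is None:
        return url
    # Keep the directory in front of the front controller (e.g. "/drupal/" in
    # "/drupal/index.php") and drop the "index.php" itself.
    prefix = parts.path
    if prefix.endswith("index.php"):
        prefix = prefix[: -len("index.php")]
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    # Other query parameters survive; only "q" is folded into the path.
    others = urlencode([(key, value) for key, value in params if key != "q"])
    return urlunsplit((parts.scheme, parts.netloc, prefix + drupal_path, others, parts.fragment))


print(absolutize("tag/food/pizza"))  # http://example.com/tag/food/pizza
print(to_clean_url("http://example.com/drupal/index.php?q=node/5&page=2#comments"))
# http://example.com/drupal/node/5?page=2#comments
```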
Installation
Pathologic is an input filter, so installing and configuring it is a little more involved than for standard modules, but the instructions below will walk you through the process.
- Install the Pathologic module as normal. (If you’re a total Drupal newbie, you can read up on how to install Drupal modules to your site – and welcome to the community, by the way!)
- Go to Administration » Configuration » Content authoring » Text formats (admin/config/content/formats). A list of the text formats your site uses will appear. Find one you want to use with Pathologic, and click the “configure” link for that format. If you’re unfamiliar with them, you can learn more about text formats and input filters.
- On the next page, find the section labeled “Enabled filters.” Check the box next to “Pathologic.” Scroll down a bit to the “Filter processing order” section and ensure that Pathologic is at the bottom of the list. If it is not, rearrange the filters so that it is, using the draggable arrows in the left column (or the Weight menus, if you have JavaScript disabled). Click the “Save configuration” button at the bottom. (If your browser has JavaScript disabled, you’ll have to click “Save configuration” after each step.)
- If you wish to use Pathologic with other text formats, go back to the second step (the Text formats page) and repeat the process.
- Pathologic is now working on all old and new content which uses the input format(s) you added it to.
Pathologic should almost always be the last input filter to run on the text, because it only works properly on pure HTML. Any input filters that convert some other markup (BBCode, Markdown, Textile, etc.) to HTML need to run first.
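The ordering requirement can be illustrated with a toy pipeline in Python. Both filters are deliberately tiny stand-ins, not the real Markdown or Pathologic filters:

- `toy_markdown` only converts [label](path) links;
- `toy_pathologic` only prefixes relative href values with a hypothetical base URL.

Because the Pathologic stand-in understands only HTML attributes, it has nothing to act on if it runs before the Markdown conversion.

```python
import re
from typing import Callable, List, Tuple

# A filter takes text and returns text; a format applies its filters by weight.
Filter = Callable[[str], str]


def toy_markdown(text: str) -> str:
    """Stand-in for a Markdown filter: converts only [label](path) links."""
    return re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r'<a href="\2">\1</a>', text)


def toy_pathologic(text: str, base: str = "http://example.com/") -> str:
    """Stand-in for Pathologic: prefixes relative href values with the base URL.

    Values that start with a scheme, a slash or "#" are left alone, and
    non-HTML markup passes through untouched.
    """
    return re.sub(
        r'href="(?![A-Za-z][A-Za-z0-9+.-]*:|/|#)([^"]*)"',
        lambda m: f'href="{base}{m.group(1)}"',
        text,
    )


def run_format(text: str, filters: List[Tuple[int, Filter]]) -> str:
    """Apply filters in ascending weight order, like a format's processing order."""
    for _weight, fn in sorted(filters, key=lambda pair: pair[0]):
        text = fn(text)
    return text


source = "[Pizza](tag/food/pizza)"
print(run_format(source, [(0, toy_markdown), (10, toy_pathologic)]))
# <a href="http://example.com/tag/food/pizza">Pizza</a>
print(run_format(source, [(0, toy_pathologic), (10, toy_markdown)]))
# <a href="tag/food/pizza">Pizza</a>  (Pathologic ran too early)
```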
How Pathologic works
Depending on how you intend to use Pathologic and how the paths in your existing content are formed, further configuration may not be necessary. To help you decide whether it is necessary in your case, and to explain how to go about it, allow me to take a moment to explain how Pathologic works.
Pathologic looks at paths located in the href attributes of links (<a> tags), as well as the src attributes of image tags and tags for other embedded media (<img>, <embed>, etc.). After finding a path in an attribute, Pathologic determines whether the path is “local.” It does its magic on local paths but leaves other paths alone.
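The scanning step can be pictured with Python’s standard html.parser module. This is only an illustration: Pathologic does its own scanning in PHP. The sketch also simplifies by collecting every href and src value regardless of tag, and `PathCollector` and `collect_paths` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class PathCollector(HTMLParser):
    """Collects (tag, attribute, value) for every href and src attribute seen."""

    ATTRS = {"href", "src"}

    def __init__(self) -> None:
        super().__init__()
        self.paths: List[Tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by default.
        for name, value in attrs:
            if name in self.ATTRS and value is not None:
                self.paths.append((tag, name, value))


def collect_paths(html: str) -> List[Tuple[str, str, str]]:
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.paths


print(collect_paths('<p><a href="tag/food/pizza">Pizza</a> <img src="files/pizza.jpg" /></p>'))
# [('a', 'href', 'tag/food/pizza'), ('img', 'src', 'files/pizza.jpg')]
```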
Let’s assume that your Drupal site is up and running at http://example.com/drupal/. Pathologic considers a path local if:
- The path is a relative path. That is, it does not have a protocol fragment (such as http://) and does not begin with a slash. For example, tags/food/pizza will be considered a local path, but /tags/food/pizza and http://drupal.org/tags/food/pizza are not.
- The path is an absolute path that points to a resource located within your Drupal installation. Our example is located at http://example.com/drupal/, so http://example.com/drupal/tags/food/pizza is considered a local path. However, while http://example.com/not_drupal/ points to a resource on the same domain name, it points to something outside of the Drupal installation, so it is not considered local.
- The path contains only an anchor fragment, such as #pizza.
- The path is an absolute path which begins with a URI of another Drupal installation which you’ve instructed Pathologic to consider local.
Aha! That last one is where things start getting interesting. Let’s say you’ve grown tired of using http://example.com/drupal/, so you’ve moved your site over to http://example.net/. (For those interested in using Drupal in a test/production server paradigm, imagine that example.com is the test server and example.net is the production server.) If all the paths in your content are relative paths, then Pathologic will handle them perfectly, with no need for further configuration. However, if they are absolute paths that begin with http://example.com/drupal/, then Pathologic will not consider them local and will ignore them. Fortunately, we can tell Pathologic to treat such paths as local and fix them.
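The move from example.com to example.net can be modeled as "strip a recognized installation prefix, then rebuild against the current base." The minimal Python sketch below covers only the absolute-URL case, since relative paths need no extra configuration. `relocate` and `to_internal` are hypothetical helpers, not Pathologic functions.

```python
from typing import Optional, Sequence


def to_internal(path: str, prefixes: Sequence[str]) -> Optional[str]:
    """Return the Drupal-internal part of an absolute URL, or None if no prefix matches."""
    for prefix in prefixes:
        if path.startswith(prefix):
            return path[len(prefix):]
    return None


def relocate(path: str, current_base: str, also_local: Sequence[str] = ()) -> str:
    """Point URLs from a listed (old or test) installation at the current one."""
    internal = to_internal(path, [current_base, *also_local])
    if internal is None:
        return path  # not recognized as local: leave it alone
    return current_base + internal


NEW_BASE = "http://example.net/"
OLD = "http://example.com/drupal/tags/food/pizza"
print(relocate(OLD, NEW_BASE))  # unchanged: example.com is not yet listed as local
print(relocate(OLD, NEW_BASE, ["http://example.com/drupal/"]))  # http://example.net/tags/food/pizza
```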
Configuring Pathologic
Pathologic stores configuration in two ways: globally, and per text format. By default, when you add the Pathologic filter to a format, it will use the global configuration unless you configure it otherwise. In most cases, sticking with the global configuration works fine. It also reduces the potential for confusion when Pathologic works one way with one text format and a different way with another. However, you may want to use per-format configuration in certain cases; for example, if you want content on your site to use protocol-relative URLs, but content syndicated via RSS to use absolute URLs.
To modify the global configuration for Pathologic:
- Go to Administration » Configuration » Content authoring » Pathologic (admin/config/content/pathologic).
- Select the desired output format of Pathologic-processed paths from the “Processed URL format” field. The explanatory text for the field should explain the consequences of each option.
- Enter the paths of other/previous Drupal installations which should be considered local in the “Also considered local” text field. Enter one path per line. For the above example, we’d want to enter http://example.com/drupal/.
- Click the “Save configuration” button when done.
(Note that it’s fine to put the path for the “current” server in the “Also considered local” field. Pathologic will simply remove it when it does its trick. In other words, both the example.com and example.net servers can have both http://example.com/drupal/ and http://example.net/ in the field. This means that each server can be configured identically. This will make life easier if you’re using Features to manage configuration.)
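The “Processed URL format” field controls how a rebuilt URL is written out, and the field’s own help text is the authority on the available choices. The Python sketch below only assumes three plausible styles:

- full URL;
- protocol-relative URL;
- path relative to the server root.

The style keys and the `render` function are hypothetical, not Pathologic’s option names.

```python
from urllib.parse import urlsplit


def render(internal_path: str, current_base: str, url_format: str) -> str:
    """Build the final URL for a Drupal-internal path in one of three styles.

    Assumes current_base ends with "/". The style names are hypothetical.
    """
    base = urlsplit(current_base)
    if url_format == "full":
        return current_base + internal_path
    if url_format == "protocol_relative":
        # No scheme: the browser reuses whichever of HTTP/HTTPS the page used.
        return f"//{base.netloc}{base.path}{internal_path}"
    if url_format == "root_relative":
        return f"{base.path}{internal_path}"
    raise ValueError(f"unknown url_format: {url_format!r}")


for style in ("full", "protocol_relative", "root_relative"):
    print(render("tags/food/pizza", "https://example.net/drupal/", style))
```

Full URLs suit content that leaves the site, such as feeds. The shorter forms avoid mixed HTTP/HTTPS warnings on pages served under both protocols.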
To set per-text format options for Pathologic:
- Go to Administration » Configuration » Content authoring » Text formats (admin/config/content/formats). Find the format you wish to set Pathologic options for, and click the corresponding “configure” link.
- Find Pathologic’s settings in the “Filter settings” section of the format settings form. You’ll see a “Settings source” radio button with two options: “Use global Pathologic settings” and “Use custom settings for this text format.” Select the latter option.
- In the “Custom settings for this text format” section, configure Pathologic as above.
- Click the “Save configuration” button when done.
- Should you decide you want Pathologic to use the global settings for this text format again, simply edit the format settings and change “Settings source” back to “Use global Pathologic settings.” The custom settings will be ignored and Pathologic will return to using the global settings.
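The choice between global and per-format settings amounts to a simple lookup. The Python data model below is a hypothetical sketch, not Pathologic’s storage format: the class names, field names and values such as "protocol_relative" are illustrative only. It shows a format using custom settings while its source is “custom.” Once the source is switched back, those custom values are ignored, though in this model they are still kept.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PathologicSettings:
    url_format: str = "full"  # hypothetical value names
    also_local: List[str] = field(default_factory=list)


@dataclass
class TextFormat:
    name: str
    settings_source: str = "global"  # "global" or "custom"
    custom: PathologicSettings = field(default_factory=PathologicSettings)


def effective_settings(fmt: TextFormat, global_settings: PathologicSettings) -> PathologicSettings:
    """Custom settings apply only while the format's source is 'custom'."""
    if fmt.settings_source == "custom":
        return fmt.custom
    return global_settings


site_wide = PathologicSettings(url_format="protocol_relative", also_local=["http://example.com/drupal/"])
filtered_html = TextFormat("filtered_html")
rss = TextFormat("rss", settings_source="custom", custom=PathologicSettings(url_format="full"))

print(effective_settings(filtered_html, site_wide).url_format)  # protocol_relative
print(effective_settings(rss, site_wide).url_format)            # full
```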
Now sit back and enjoy the fruits of Pathologic’s labor.
WYSIWYG editor compatibility
If the site uses a WYSIWYG content editor such as CKEditor or TinyMCE and Pathologic doesn’t seem to be doing anything, it may be because such editors often output paths that begin with a slash character. Pathologic usually ignores such paths, because it considers them to be absolute. However, you can trick Pathologic into working with such paths by using the “Also considered local” field (which your version of the settings form may label “All base paths for this site”). If the Drupal installation is at the root level of a web site (such as http://example.com/), simply enter a single slash into that field. If it’s in a subdirectory (such as http://example.com/foo/drupal/), enter the full subdirectory path, with slashes at both the beginning and end (so /foo/drupal/ in this case). See the “Configuring Pathologic” section above for more information.
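The slash trick works because a root-relative path then begins with one of the listed prefixes. The Python sketch below is a hypothetical model of that prefix check. Trying longer entries first, so that /foo/drupal/ wins over a bare /, is a design choice of the sketch, not documented Pathologic behavior.

```python
from typing import Iterable, Optional


def strip_base_path(path: str, base_paths: Iterable[str]) -> Optional[str]:
    """Return the Drupal-internal path for a root-relative path, or None.

    Longer entries are tried first so the most specific prefix wins.
    """
    for prefix in sorted(base_paths, key=len, reverse=True):
        if path.startswith(prefix):
            return path[len(prefix):]
    return None


print(strip_base_path("/tags/food/pizza", ["/"]))                       # tags/food/pizza
print(strip_base_path("/foo/drupal/tags/food/pizza", ["/foo/drupal/"]))  # tags/food/pizza
print(strip_base_path("/elsewhere/page.html", ["/foo/drupal/"]))         # None
```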
Migrating from Path Filter
Path Filter is an input filter that works similarly to Pathologic, but requires you to type a prefix of “internal:” or “files:” before every internal path you want Path Filter to act on. A downside to this is that a site’s content becomes strewn with these bits. If Path Filter is disabled, those “internal:” prefixes are spat out to web browsers that won’t know what to do with them. That’s one of the reasons I avoided using such “hints” in Pathologic.
If you are interested in migrating from Path Filter to Pathologic, be aware that Pathologic will automatically look for a prefix of “internal:” or “files:” in your paths, and behave appropriately. This means you should be able to use Pathologic as a drop-in replacement for Path Filter, with no additional configuration.
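Conceptually, handling these prefixes is a small normalization step before the usual processing. In this Python sketch, mapping “files:” to sites/default/files is an assumption standing in for the site’s configured public files directory. The function name is also hypothetical.

```python
# Assumption: the site's public files directory; the real location is a per-site setting.
FILES_DIR = "sites/default/files"


def strip_path_filter_prefix(path: str) -> str:
    """Turn Path Filter-style 'internal:' and 'files:' paths into plain site paths."""
    if path.startswith("internal:"):
        return path[len("internal:"):]
    if path.startswith("files:"):
        return f"{FILES_DIR}/{path[len('files:'):]}"
    return path  # no hint: handled by the normal local-path rules


print(strip_path_filter_prefix("internal:node/1"))         # node/1
print(strip_path_filter_prefix("files:images/pizza.jpg"))  # sites/default/files/images/pizza.jpg
```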
Alter Pathologic’s behavior - hook_pathologic_alter()
If you are a developer, you may be interested to know that Pathologic invokes a hook that allows you to alter how it constructs a new URL, or even to bypass constructing a new URL entirely. Check out the pathologic.api.php file in the module directory for documentation and example code for hook_pathologic_alter(). Some examples of things you could do by implementing this hook include:
- Have Pathologic bypass constructing a new URL if the path would be to a particular file, or to a file in a particular subdirectory (handy if you have a non-Drupal directory under your root Drupal directory which you want to link to).
- Have paths to images altered so that they point to a copy of the image on your site’s CDN instead of its main server.
- Remove or add query parameters to the URL that will be generated.
- Alter older path structures to reflect newer ones. For example, suppose your articles used to have paths like articles/new-pizza-trends.html, but your paths now look like magazine/articles/new-pizza-trends. That alteration could be done in a hook_pathologic_alter() implementation so that links in the old format in site content would continue to work.
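hook_pathologic_alter() itself is a PHP hook documented in pathologic.api.php, and its real parameters are not reproduced here. As an illustration only, this Python sketch models the same idea: each alter callback receives a mutable bag of URL parts. A callback may change the path, query or base, or ask for the original URL to be kept.

The following are all hypothetical:

- the keys `path`, `query`, `base` and `use_original`;
- the static/ directory;
- the cdn.example.com host.

The callbacks mirror the examples in the list above.

```python
import re
from typing import Any, Callable, Dict, List, Optional
from urllib.parse import urlencode

# Hypothetical parameter bag handed to each alter callback.
Params = Dict[str, Any]
AlterHook = Callable[[Params], None]

OLD_ARTICLE = re.compile(r"^articles/([^/]+)\.html$")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def skip_static_dir(params: Params) -> None:
    # Leave links into a non-Drupal directory untouched.
    if params["path"].startswith("static/"):
        params["use_original"] = True


def images_to_cdn(params: Params) -> None:
    # Serve images from a (hypothetical) CDN host instead of the main server.
    if params["path"].lower().endswith(IMAGE_EXTENSIONS):
        params["base"] = "https://cdn.example.com/"


def drop_tracking_query(params: Params) -> None:
    # Remove a query parameter from the generated URL.
    params["query"].pop("utm_source", None)


def upgrade_old_articles(params: Params) -> None:
    # articles/foo.html -> magazine/articles/foo
    match = OLD_ARTICLE.match(params["path"])
    if match:
        params["path"] = f"magazine/articles/{match.group(1)}"


HOOKS: List[AlterHook] = [skip_static_dir, images_to_cdn, drop_tracking_query, upgrade_old_articles]


def build_url(path: str, query: Dict[str, str], base: str, hooks: List[AlterHook] = HOOKS) -> Optional[str]:
    """Run every alter callback, then assemble the URL (None means keep the original)."""
    params: Params = {"path": path, "query": dict(query), "base": base, "use_original": False}
    for hook in hooks:
        hook(params)
    if params["use_original"]:
        return None
    url = params["base"] + params["path"]
    if params["query"]:
        url += "?" + urlencode(params["query"])
    return url


print(build_url("articles/new-pizza-trends.html", {}, "http://example.com/"))
# http://example.com/magazine/articles/new-pizza-trends
```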
Caching issues
Drupal caches the output of input formats for speed. This can cause stale-data problems with the paths that Pathologic creates if circumstances change and make those paths incorrect. See this issue and this issue for examples of this sort of problem which have come up in real-world use. Unfortunately, there’s no really good way to fix this without making Pathologic something other than a standard (and cacheable) input filter. To avoid these sorts of problems, consider these tips:
- Do not change the URL path of established nodes, particularly if you have linked to them in your site content. Decide on a good URL path when the node is created and keep it. If changing the path is truly necessary, change it on the node editing form as normal. Then go to Administration » Configuration » Search and metadata » URL aliases (admin/config/search/path) and create a new alias that points the “old” path to the node, to avoid breaking both internal and external links.
- When migrating Drupal database contents from one site to another, exclude the contents of the cache tables (basically, all tables whose names begin with “cache”). This is a good idea whether you’re using Pathologic or not. If you are unable to exclude cached data from your dumps or otherwise avoid migrating cache data, clear your site’s cache after importing the data. To do so, go to Administration » Configuration » Development » Performance (admin/config/development/performance) and click the “Clear all caches” button near the top of the page. Alternatively, run drush cc all if your server has Drush installed.
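The stale-data problem can be reproduced with a toy model in Python. Cached output is looked up by the input text alone, so output built for one base URL keeps being served after the site moves, until the cache is cleared. This is only an illustration and does not show how Drupal actually keys its filter cache. `CachedFormat` and `toy_absolutize` are hypothetical.

```python
from typing import Callable, Dict

FilterFn = Callable[[str, str], str]


def toy_absolutize(text: str, base: str) -> str:
    """Stand-in filter: prefixes relative node links with the current base URL."""
    return text.replace('href="node/', f'href="{base}node/')


class CachedFormat:
    """Toy text format whose output is cached by input text alone."""

    def __init__(self, filter_fn: FilterFn) -> None:
        self.filter_fn = filter_fn
        self.cache: Dict[str, str] = {}

    def render(self, text: str, base: str) -> str:
        # The base URL is not part of the cache key, which is what lets output go stale.
        if text not in self.cache:
            self.cache[text] = self.filter_fn(text, base)
        return self.cache[text]

    def clear(self) -> None:
        self.cache.clear()


fmt = CachedFormat(toy_absolutize)
body = '<a href="node/1">Pizza</a>'
print(fmt.render(body, "http://example.com/"))  # built for example.com
print(fmt.render(body, "http://example.net/"))  # stale: still example.com
fmt.clear()
print(fmt.render(body, "http://example.net/"))  # rebuilt for example.net
```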
Upgrading from Pathologic 7.x-2.x
If you're already familiar with the 7.x-2.x branch of Pathologic, this brief section of the manual will cover important changes in the 7.x-3.x branch, and, by extension, the 8.x-1.x branch. If you intend to upgrade a Drupal 7 site using Pathologic to Drupal 8, you must upgrade Pathologic to 7.x-3.x first; an upgrade path from 7.x-2.x will not be supported.
As of Pathologic 7.x-3.x, Pathologic can now be configured both on a per-text format basis, as in 7.x-2.x, and globally. The global configuration allows several or all instances of Pathologic across different text formats to easily be configured with the same settings without having to configure every instance of the filter across every format the same way. By default, your Pathologic instances on your site’s currently-existing text formats will continue to use per-text format settings, so behavior shouldn’t change at all between 7.x-2.x and 7.x-3.x just by upgrading (if it does, please consider it a bug and report it in Pathologic’s issue queue). However, you can convert those filter instances to use the global configuration instead, and adding Pathologic to a new format will cause it to use the global settings by default.
To modify the global settings, check out the new form at Administration » Configuration » Content authoring » Pathologic (admin/config/content/pathologic). The form is very similar to the per-text format form that existed before.
To toggle Pathologic between using global and per-format settings for a format, edit the Pathologic settings for the format. You’ll see a new “Settings source” radio button with two options: “Use global Pathologic settings” and “Use custom settings for this text format.” Select the former option if you want Pathologic to use the global settings for that format, or the latter one to use the per-format settings and thus behave as it did in 7.x-2.x.
Questions? Suggestions? Need help?
Please open an issue on Pathologic’s issue queue or contact the author and I’ll get back to you soon. Thanks for trying Pathologic!