Has anyone played with adding an input filter for links in content? In my case, I have a persistent query string element that is carried along on all links. This works great until a link appears in content, for example a link to "/contact" in a node body. That link isn't run through l(), so the persistent URL element is dropped.

An input filter that runs before link processing, looks for <a> tags, and transforms local links might solve this nicely. Are there any potential pitfalls here, such as performance? A small contrib module might just need to:

  1. Define an input filter designed to run before link transformation.
  2. This input filter would scan for any <a> tags where the link is not absolute/external.
  3. The href for any of these links would then be transformed as any other link running through l().
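
Step 2 above hinges on telling local links from absolute or external ones. A minimal sketch of that test in plain PHP, assuming that only root-relative hrefs such as "/contact" count as local. The function name purl_filter_href_is_local() is hypothetical and is not part of PURL or Drupal core:

```php
<?php
/**
 * Hypothetical helper: TRUE if $href is a root-relative local link.
 *
 * Assumption: only hrefs of the form "/internal/path" are treated as local.
 */
function purl_filter_href_is_local($href) {
  // Anything that does not start with a slash (absolute URLs, mailto:,
  // #anchors, relative paths, empty values) is not handled here.
  if ($href === '' || $href[0] !== '/') {
    return FALSE;
  }
  // "//example.com/path" is protocol-relative, so it points at another host.
  if (substr($href, 0, 2) === '//') {
    return FALSE;
  }
  return TRUE;
}
```

With this check, "/contact" and "/node/1?page=2" would be rewritten, while "http://example.com/contact", "//example.com/contact" and "#top" would be left untouched.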

Also of note is the internal links module (http://drupal.org/project/intlinks). I was hoping that PURL would get called when that module's filter processed links, but it seems the module just adds titles to the links.

Comments

seanbfuller commented:

I spent some time on this and have a partial solution, but I don't think it is viable because of performance concerns. To apply PURL to internal links in content, the filter has to declare itself uncacheable ('cache' => FALSE). That disables caching for the entire text format the filter is enabled on, so all of that format's filters are re-run on every page request for logged-in users. Anonymous users get a page-level cache, so they do not have this issue.

Here's my code for reference. Things to note:

  • The module that I created for this is called purl_filter.
  • Obviously this is just a prototype to explore the feasibility of this and test performance. Some of what I did was based on things in the internal links module.
  • Note that I'm only worried about links in the form of "/internal/path" for now.
  • This filter should be placed at the end of the filter processing list. For my testing, it was turned on for filtered html and placed at the end.

First the info file:

name = "Purl Filter"
description = "Apply PURL link processing to links in content"
core = 7.x
dependencies[] = purl

Next the actual module file (note that I left my dsm calls in there but commented them out):

<?php
/**
 * @file
 * Apply PURL link processing to links in content
 */

/**
 * Implements hook_filter_info().
 */
function purl_filter_filter_info() {
  $filters['purl_filter_links'] = array(
    'title' => t('Apply PURL to internal links'),
    'description' => t('Processes persistent url logic to internal links in content.'),
    'process callback' => '_purl_filter_links_process',
    'settings callback' => '_purl_filter_links_settings',
    'default settings' => array(),
    'cache' => purl_filter_is_cachable_callback(),
    'tips callback' => '_purl_filter_links_tips',
  );
  return $filters;
}

/**
 * Callback to see if we should cache this filter.
 *
 * hook_filter_info() expects a plain boolean for 'cache', so this is called
 * with no arguments when the filter info is built. That means we can't
 * assess the text on a case-by-case basis.
 */
function purl_filter_is_cachable_callback($a = NULL, $b = NULL) {
  //dsm('called purl_filter_is_cachable_callback');
  //dsm($a);
  //dsm($b);
  return FALSE;
}

/**
 * Filter process callback
 * Types of links:
 * - <a href="/internal/path">
 * - Just scanning for "/internal/path" is not supported
 * - Scanning for <a href="http://domain.com/internal/path"> is not supported
 */
function _purl_filter_links_process($text, $filter, $format, $langcode, $cache, $cache_id) {
  // Find each <a> tag with a double-quoted href and pass it to a preg replace
  // callback for processing. The \s keeps tags such as <abbr> from matching.
  $pattern = '%<a\s([^>]*?href="([^"]+?)"[^>]*?)>%i';
  return preg_replace_callback($pattern, "_purl_filter_links_process_link", $text);
}

/**
 * Preg replace callback function to process each link
 */
function _purl_filter_links_process_link($matches) {
  //dsm('called _purl_filter_process_link');
  //dsm($matches);
  
  // Create a local variable for our raw url
  $original_href_full = $matches[2];

  // Return the tag unchanged unless the href starts with a single slash.
  // A leading "//" is a protocol-relative link to another host.
  if ($original_href_full[0] != '/' || substr($original_href_full, 0, 2) == '//') {
    return $matches[0];
  }

  // Remove the leading slash so that drupal will recognize the path, and
  // decode entities such as &amp; so the query string parses correctly.
  $trimmed_path = ltrim(decode_entities($original_href_full), '/');

  // Build the path url by letting drupal_parse_url pull it apart
  $options = drupal_parse_url($trimmed_path);
  $path = $options['path'];
  //dsm($options);
  //dsm('path is '. $path);

  // Get purl to alter it by passing it through url(), then escape the
  // result for use in an attribute, as l() does.
  $altered = check_plain(url($path, $options));
  //dsm('altered is '. $altered);

  // Replace the url with the new one
  $new_output = str_replace($original_href_full, $altered, $matches[0]);
  //dsm($new_output);
  return $new_output;

}

/**
 * Filter settings callback
 */
function _purl_filter_links_settings($form, &$form_state, $filter, $format, $defaults, $filters) {
  // Would we want to add a way to define multiple domains to check against?
  // No settings yet, so return an empty form.
  return array();
}


/**
 * Filter tips callback
 */
function _purl_filter_links_tips($filter, $format, $long = FALSE) {
  return '';
}
?>

In some quick tests, I didn't notice a huge performance hit. The page timer and memory usage stayed about the same for a logged-in user. However, I would expect the impact to become more noticeable as the number of concurrent users scales.

Next steps

To make this a viable solution, the filtered output would need to be cacheable whenever no internal links are present. I have yet to find a way to do this, but I'm not an expert in working with the input filters. If someone with more knowledge of those systems has a thought on how to make that happen, it would be great to hear.
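
Short of a per-text cache decision, one way to limit the per-request cost is to bail out of the process callback before the regular expression runs when the text cannot contain a root-relative link. A minimal sketch, assuming double-quoted href attributes as in the prototype. The helper name is hypothetical, and this does not make the text format cacheable; it only makes the uncached pass cheaper:

```php
<?php
/**
 * Hypothetical pre-check: could $text contain a link like href="/path"?
 */
function purl_filter_may_have_internal_links($text) {
  // stripos() is a cheap case-insensitive substring scan. 'href="/' also
  // matches protocol-relative links, so TRUE only means "run the regex".
  return stripos($text, 'href="/') !== FALSE;
}
```

_purl_filter_links_process() could call this first and return $text unchanged when it yields FALSE.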

An alternative method that I want to explore next is to move this to the preprocess level. That would allow the text format to stay cached, with the link processing applied to the cached output just before it is passed to the theme. I'll be digging into that over the next few days.
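
A rough sketch of what the preprocess approach could look like, not a tested implementation. It assumes the link callback _purl_filter_links_process_link() from the module above is available, that only the core text field types are of interest, and that their formatters put the filtered output in #markup. The filter itself would be removed from the text format so the format can be cached again:

```php
<?php
/**
 * Implements hook_preprocess_field().
 *
 * Sketch only: rewrites links in already-filtered text field output.
 */
function purl_filter_preprocess_field(&$variables) {
  // Assumption: only core text field types carry filtered markup.
  $text_types = array('text', 'text_long', 'text_with_summary');
  if (!in_array($variables['element']['#field_type'], $text_types)) {
    return;
  }
  // Matches <a> tags with a double-quoted href attribute.
  $pattern = '%<a\s([^>]*?href="([^"]+?)"[^>]*?)>%i';
  foreach ($variables['items'] as $delta => $item) {
    // Core text formatters return the filtered output as #markup.
    if (isset($item['#markup'])) {
      $variables['items'][$delta]['#markup'] = preg_replace_callback($pattern, '_purl_filter_links_process_link', $item['#markup']);
    }
  }
}
```

This still runs the regular expression on every request, but only over the text fields being rendered; the rest of the filter chain stays cached.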

Any feedback is appreciated.