This is a great module and API. I am looking to add it to a high availability site. As part of our review we wanted to be able to queue requests for short urls.

Queuing the requests has the following advantages:

  1. A page render from a cold cache does not have to wait for a third party shortening service to respond before returning the page.
  2. A site shouldn’t be affected by a third party service going down. If it was attempting to wait for the full timeout on a high traffic site then server could easily max its network connections and possibly crash.

The disadvantages are:

  1. The page requests won’t include the shorten URL until the queue is processed.
  2. If not already done a cron needs to be setup to process the queue.
Members fund testing for the Drupal project. Drupal Association Learn more

Comments

JoeMcGuire’s picture

FileSize
12.63 KB

This patch allows the site builder to choose between the two request methods. By default it uses the current workflow requesting the short URL on page render but if the Queue API is enabled then the site builder can switch to queuing requests.

The main API function shorten_url() has a new third parameter $options array. Which can be used to bypass cache checking and creating queue items.

IceCreamYou’s picture

Status: Active » Needs work

Queuing requests will not reduce the server load; in fact, there is no implementation that will not increase it, and yours increases it substantially. I'm interested in this approach though because it does minimize load time for the first user who requests a shortened URL -- since they won't get back a shortened URL at all. I suppose there could be situations where this is desirable, if the goal is to minimize the load time for pages with an unprimed cache.

That said, new features need to be written for and committed to Drupal 7 before I will consider backporting them. There should never be a feature in an older version of the module that is not in a newer one.

But anyway, did this review, got halfway through it, realized this is a compound patch. This needs to be a single patch with no revisions to previous (uncommitted) patches before I will review it again.

+++ b/shorten.admin.inc
@@ -87,6 +87,32 @@ function shorten_admin($form_state) {
+  $form['shorten_workflow'] = array(

This option should only appear if drupal_queue is enabled. If it's there but disabled, people can still manipulate the DOM to submit (and change) the value illegally.

+++ b/shorten.admin.inc
@@ -87,6 +87,32 @@ function shorten_admin($form_state) {
+      'queue' => t('Queue requests to fetch at a later time. Requires !queue_module.', array('!queue_module' => $queue_module)),

If the form item only shows when drupal_queue is enabled then we don't need to say that this option requires that module. But regardless, the correct way to include links in translated content is t('This is some <a href="!url">linked text</a>', array('!url' => url('my-url')));

+++ b/shorten.admin.inc
@@ -87,6 +87,32 @@ function shorten_admin($form_state) {
+    '#default_value' => module_exists('drupal_queue') ? variable_get('shorten_workflow', 'render') : 'render',

Any new variable you introduce needs to be deleted during hook_uninstall().

+++ b/shorten.admin.inc
@@ -87,6 +87,32 @@ function shorten_admin($form_state) {
+    $form['shorten_cache_queue_duration'] = array(
+      '#type' => 'textfield',
+      '#title' => t('After creating a queue item do not create another for'),
+      '#field_suffix' => ' '. t('seconds'),
+      '#description' => t('If "When to process" is set to "queue" then new URL requests will be added to the queue to process later. The same URL will not be added to the queue for this grace period.') .' '.
+      t('You should set this to the same interval that you process the queue. The default value is 300 (5 minutes).'),
+      '#size' => 11,
+      '#required' => TRUE,
+      '#default_value' => variable_get('shorten_cache_queue_duration', 300),
+    );

I don't understand this option. Shorten URLs' default behavior is to retrieve URLs once and then cache them for a very long time. If you are only caching them for 5 minutes, you have to request the same URL over and over again, which is much more draining on your server than just requesting the URL once during rendering. Additionally, if you keep throwing away the cached URL every 5 minutes, you're going to get a lot of page requests that have to use the original (long) URL. We don't need yet another option for how long to cache stuff; just use the existing ones.

+++ b/shorten.module
@@ -109,6 +109,17 @@ function shorten_flush_caches() {
+    'worker callback' => 'shorten_url_cron_queue', ¶

Trailing whitespace

+++ b/shorten.module
@@ -121,19 +132,58 @@ function shorten_flush_caches() {
-function shorten_url($original = '', $service = '') {
+function shorten_url($original = '', $service = '', $may_queue = TRUE) {

$use_queue is better than $may_queue. The word "may" means "is allowed to" which implies that queueing might not happen. If it is requested and allowed, it should always happen.

+++ b/shorten.module
@@ -121,19 +132,58 @@ function shorten_flush_caches() {
+    // If queuing requests is supported and enabled then create an item and

"queueing"

JoeMcGuire’s picture

FileSize
8.06 KB

Thanks for such a quick response.

Attached is a collated patch which should make reviewing easier.

I see your point about patching D7 first. This patch is also for D6 simply because it what I'm working on. Ideally how do you want the issues organized? - Change this issue to version D7 and create a new issue to track a backport patch file for D6?

Sorry I have not I explained the purpose of this new feature very well. Its not about reducing load on servers but instead to have control over when they are waiting for external services. Whenever you wait for a response from an external service it introduces an unknown wait time - some times fast and some times slow or possibly a timeout. For the majority of sites these influxes can be easily handled but when you have tens of thousands of pages and thousands of visitors a minute these wait times can easily bring a site down. Simply because a page cannot be returned until the external service has returned or timed out. I have explored the option of warming the cache but that introduced a number of other complexities so I think adding a queuing feature is our best option.

I have made the following changes:

  • I have removed one of the variables shorten_cache_queue_duration from the previous patch as I do not think it is required to have this grace time any more. Instead I have used the constant CACHE_TEMPORARY and the cache is overridden when the queue it processed.
  • Added variable shorten_workflow to hook_uninstall().
  • Change option name may queue to use queue.
  • Removed trailing whitespace.
  • Corrected comment the typo 'queuing'.

Regarding the workflow radio buttons:

  • I displayed the option for the queue workflow to help site builders discover the feature and avoid any frustration of options only sometimes being visible depending on what modules are currently enabled. The link to the Queue API module was put in to help show the required dependancy. Another option would be to outline the feature in the README.TXT but I do not think these are accessible as the UI.
  • I think all the checks for module_exists('drupal_queue') are still required as a site builder could either disable the module when the shorten_workflow is set to 'queue' or force the variable to be 'queue' when the Queue API is not installed which would result in PHP errors.
  • I tested DOM manipulation and the field is safe. The Form API will only return the #default_value for a #disabled field. See: http://api.drupal.org/api/drupal/includes--form.inc/function/form_type_c...
  • I am happy to set the field to only be visible when the module is enabled under your discretion but first wanted to explain why I had displayed the options.

I have also removed some of condition checking around the use of get_cache() as the function is already validating the expiry and the logic did not look to be respecting CACHE_PERMANENT.

JoeMcGuire’s picture

FileSize
8.03 KB

Improved patch attached.

Only diff is to set the hook_cron_queue_info() time value to 60. It was being miss assigned to an unsuitable variable before.

IceCreamYou’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev

Ideally how do you want the issues organized? - Change this issue to version D7 and create a new issue to track a backport patch file for D6?

Ideally D7 work happens in this issue and then when it's committed the status changes to "patch (to be ported)" and then it's backported to D6. Nothing will get committed to D6 until it's been committed to D7.

Its not about reducing load on servers but instead to have control over when they are waiting for external services.

This also assumes that you have control over which processes run during cron, since the only reason this would matter for performance is if you could delay shortening URLs during a traffic spike. (Or I guess if you could run cron processes on servers not responsible for serving page requests.)

I don't know what numbers your site will be seeing, but I know this module runs on really huge government sites and other large sites without performance problems, so I would be interested in seeing the tests you've done that show that there are potential scaling issues, if you can share them. (Really I should be requiring performance numbers since this is primarily a performance patch.)

Whenever you wait for a response from an external service it introduces an unknown wait time - some times fast and some times slow or possibly a timeout.

It's not entirely unknown -- you can set the maximum timeout, and the default is 3 seconds. I guess if you're really concerned about the timeout you can use a module like shorturl or ShURLy to build your own URL shortening service in Drupal, and Shorten URLs will then use native function calls instead of making a page request.

I have removed one of the variables shorten_cache_queue_duration from the previous patch as I do not think it is required to have this grace time any more. Instead I have used the constant CACHE_TEMPORARY and the cache is overridden when the queue it processed.

Why cache the original URL at all at that point? Ideally you end up overwriting the cache when the queue is processed, but there is the possibility that the cache expires before the queue gets processed and you end up queueing the same URL multiple times. Instead, I'd just check whether the original URL we're dealing with is already in the queue (example of how to retrieve queue items manually).

Corrected comment the typo 'queuing'.

I checked this and I was actually wrong and you were right. The correct spelling is "queuing;" "queueing" is incorrect. Sorry about that

  • I displayed the option for the queue workflow to help site builders discover the feature and avoid any frustration of options only sometimes being visible depending on what modules are currently enabled. The link to the Queue API module was put in to help show the required dependancy. Another option would be to outline the feature in the README.TXT but I do not think these are accessible as the UI.
  • I think all the checks for module_exists('drupal_queue') are still required as a site builder could either disable the module when the shorten_workflow is set to 'queue' or force the variable to be 'queue' when the Queue API is not installed which would result in PHP errors.
  • I tested DOM manipulation and the field is safe. The Form API will only return the #default_value for a #disabled field. See: http://api.drupal.org/api/drupal/includes--form.inc/function/form_type_c...
  • I am happy to set the field to only be visible when the module is enabled under your discretion but first wanted to explain why I had displayed the options.

You're right about the module_exists() checks; this is more about not showing people options that they have no access to or way to change. It would be like if Windows had a Microsoft Word icon on the desktop by default, just to remind you that you can buy Office. Feature discovery belongs in the project description and documentation because it should happen before the module is installed, not during use. In general there is no guarantee that anything on your site will remain consistent when you install a new module, so I don't consider this behavior to be inconsistent. Additionally this is a pretty edge-case feature that I expect to be very rarely used, especially in D6, so there is no need to clutter the UI for the majority of users.

I have also removed some of condition checking around the use of get_cache() as the function is already validating the expiry and the logic did not look to be respecting CACHE_PERMANENT.

I believe that in general the cache only expires during cron; that check makes it happen closer to the actual expiration date. It was committed fairly recently and I think (although I'm not 100% sure) that it's harmless. It only makes a significant difference for sites that run cron relatively infrequently but have set the cache for failed requests to expire more frequently than cron runs.

+++ b/shorten.admin.inc
@@ -87,6 +87,19 @@ function shorten_admin($form_state) {
+    '#title' => t('When to process'),
+    '#description' => t('The default is to request short URLs when a page is rendered which can slow down the time it take to deliver a page. If you have the Queue API module enabled then you can also queue request to process at a later time.'),
+    '#options' => array(
+      'render' => t('Request while the page is rendered.'),
+      'queue' => t('Queue requests to fetch at a later time. Requires <a href="@queue_module_url">Queue API</a>.', array('@queue_module_url' => $queue_module_url)),
+    ),

I would prefer the wording to be like this:

+    '#title' => t('Delay requests to URL shortening services'),

+    '#description' => t('To retrieve shortened URLs, the Shorten URLs module makes requests to the designated URL shortening services. When a shortened URL is retrieved, it is cached so that the request does not have to be made again. However, sometimes URL shortening services are slow or fail to respond when Shorten URLs requests a shortened URL, in which case Shorten URLs will time out so that pages can still return quickly. You can configure the maximum request time, but if this timeout is still too slow for you, you can choose to queue requests for shortened URLs. <strong>URLs that have been queued will return the original (long) URL, not a shortened URL, until the queue has been processed.</strong> By default the queue gets processed when cron runs. However, queuing the URL shortening process is really only useful if you can delay processing the queue during traffic spikes or URL shortening service downtime, or if cron tasks can run on servers that are not responsible for serving pages to users, so you should make sure you have a module that allows you to do this before enabling this feature.'),

+    '#options' => array(

+      'render' => t('Request shortened URLs during page rendering'),

+      'queue' => t('Queue requests for shortened URLs'),

+    ),
+++ b/shorten.module
@@ -109,6 +109,17 @@ function shorten_flush_caches() {
+    'time' => 60,

Am I correct in assuming that this is the maximum amount of time that this queue should be processed during cron? If so, this should be configurable (unless drupal_queue lets you do that already?)

+++ b/shorten.module
@@ -118,22 +129,87 @@ function shorten_flush_caches() {
+  $default_options = array(
+    'use queue' => TRUE,
+    'check cache' => TRUE,
+  );
+  $options = array_merge($default_options, $options);

Simpler to use += syntax, e.g. $options += array('use queue' => TRUE, 'check cache' => TRUE,);

+++ b/shorten.module
@@ -118,22 +129,87 @@ function shorten_flush_caches() {
-  //First try to retrieve a value from the cache.

No need to remove this comment...

+++ b/shorten.module
@@ -118,22 +129,87 @@ function shorten_flush_caches() {
+    $cache_duration = variable_get('shorten_cache_fail_duration', 1800);
+    $expire = time() + $cache_duration;

These lines can be combined, no need for an extra variable there

+++ b/shorten.module
@@ -118,22 +129,87 @@ function shorten_flush_caches() {
+ * Helper function to call the shortening service.

Function descriptions should start with a verb, e.g. "Retrieve the shortened URL from the shortening service."

+++ b/shorten.module
@@ -118,22 +129,87 @@ function shorten_flush_caches() {
+ * Method also check a secondary service if the first fails.

"Falls back to a secondary service if the primary service fails."

+++ b/shorten.module
@@ -118,22 +129,87 @@ function shorten_flush_caches() {
+ *   An abbreviated URL or FALSE if one could not be generated.

"An abbreviated URL or FALSE if retrieving a shortened URL from both the primary and secondary services failed."

IceCreamYou’s picture

Are you still working on this? Or do you have statistics available demonstrating that this patch improves scalability?

JoeMcGuire’s picture

The patch is deployed on economist.com - it wouldn't have been possible to use the module without the patch for the reasons I highlighted earlier in the thread.

I stress again this is is not about load or scalability. It's required for stability - a system should never rely on an external services - smaller system may get away with waiting for the 1 second timeout and then continuing to return their own page requests. But for larger systems this extra second would be disastrous causing apache server to hold mysql connections open longer than the infrastructure had been tested for followed by reaching the maximum number of connections.

I do hope to make your changes and submit another patch again D7 when I find the time.

IceCreamYou’s picture

Okay, got it. Sounds good. If submitting a patch against D6 is easier I'll accept that -- porting to D7 should be pretty straightforward since I believe the queue API is a straight backport.

IceCreamYou’s picture

5 months later -- any progress on changes to this patch?