Now that we have entity path callback in core, we can go one step further in performance optimization and code cleanup. This was originally motivated by #687712: Optimize rdf.module for displaying multiple comments but it turns out that the rest of core can benefit from it as well. Moshe suggested something similar at #687712-26: Optimize rdf.module for displaying multiple comments and I'd like to investigate a more generic approach on a per entity basis (not only for comments).

Throughout core there are a lot of times where url() calls are used to generate a link to the entity (node, comment, term, etc.). The whole idea of this issue is to centralize these calls during the entity loading phase using the defined path callback, and add this to the entity object so it can be used later in the theme layer or when generating other output formats like RSS, RDF/XML, etc. Each call to url() is expensive, and saving a few of them per page load can help with performance. The RDF module makes extra calls to url() to generate this uri, so having it as part of the entity object allows us to optimize it and only generate the uri only once per entity load.

An example where core can benefit is in template_preprocess_comment():

  $variables['title']     = l($comment->subject, 'comment/' . $comment->cid, array('fragment' => "comment-$comment->cid"));
  $variables['permalink'] = l('#', 'comment/' . $comment->cid, array('fragment' => "comment-$comment->cid"));

where url is called twice - via l() - to generate the very same link and duplicate the logic of comment_path(). On a page with 50 comments, that would 50 calls to url() saved right away if we were to only call url() once per comment.

Furthermore, having this uri as part of the entity objects allows modules like entitycache to cache it as part of the entity data, which should increased performance even more.

CommentFileSizeAuthor
#101 entity_uri-use-699440-101.patch53.7 KBeffulgentsia
#94 entity_uri-use-699440-94.patch53.35 KBeffulgentsia
#90 entity_uri-use-699440-89.patch52.23 KBeffulgentsia
#85 entity_uri-use-699440-84.patch52.22 KBeffulgentsia
#84 entity_uri-use-699440-84.patch52.23 KBeffulgentsia
#83 entity_uri-use-699440-83.patch50.89 KBeffulgentsia
#75 entity_uri-use-699440-75.patch50.54 KBeffulgentsia
#72 entity_uri-use-699440-72.patch49.77 KBeffulgentsia
#70 entity_uri-use-699440-70.patch37.68 KBeffulgentsia
#66 699440_entity_uri_66.patch14.22 KBscor
#60 entity_uri.patch15.66 KBcatch
#57 699440_entity_uri_57.patch20.18 KBeffulgentsia
#55 699440_entity_uri_55.patch19.68 KBeffulgentsia
#50 699440_entity_uri_43.patch19.7 KBeffulgentsia
#49 entity_uri.patch18.98 KBbcn
#44 699440_entity_uri_43.patch19.7 KBeffulgentsia
#39 head.png36.56 KBcatch
#39 patch.png34.99 KBcatch
#38 699440_entity_uri_38.patch17.82 KBscor
#37 699440_entity_uri_37.patch19.37 KBeffulgentsia
#34 699440_entity_uri_34.patch13.9 KBscor
#31 699440_entity_uri_31.patch14.57 KBscor
#29 699440_entity_uri_28.patch10.72 KBscor
#19 699440_entity_uri_19.patch7.68 KBscor
#17 699440_entity_uri_17.patch7.61 KBscor
#14 699440_entity_uri_14.patch7.61 KBscor
#12 699440_entity_uri_12.patch9.34 KBscor
#11 699440_11_url_caching.patch9.5 KBscor
#10 front_10_nodes_c_anon.txt6.04 KBscor
#10 front_10_nodes_c_admin.txt18.13 KBscor
#10 node_50_c_anon.txt24.8 KBscor
#10 node_50_c_admin.txt45.06 KBscor
#10 admin_content_comment.txt31.85 KBscor
#10 699440_10_url_caching.patch1.6 KBscor
#9 699440_entity_uri_9.patch9.34 KBscor
#5 699440_entity_uri_5.patch8.07 KBscor
#4 699440_entity_uri_3.patch7.42 KBscor
#1 699440_entity_uri_1.patch7.66 KBscor

Comments

scor’s picture

Status: Active » Needs review
StatusFileSize
new7.66 KB

A patch should help to understand this issue a bit more! This patch relies and includes the patch I posted at #525622-34: 'as link' formatters need a generic way to build the url of an 'entity'. The entity URI is added in attachLoad() and might be better located somewhere else, but it works as it is (proof of concept).

Status: Needs review » Needs work

The last submitted patch, 699440_entity_uri_1.patch, failed testing.

moshe weitzman’s picture

I like this a lot. Lets fix up those test failures and move ahead.

scor’s picture

Status: Needs work » Needs review
StatusFileSize
new7.42 KB

Sometimes the entity does not exist yet (like in a node and comment preview for example). This patch should pass all node and comment tests at least.

Note I left out plenty of calls to url() which are either with the absolute option or hidden behind l(). We should try to catch these suckers as well. I'll open an issue to allow to pass an already existing url to l() unless someone else beats me to it. The absolute option for url() could be reproduced outside url().

scor’s picture

StatusFileSize
new8.07 KB

The reason for the other tests to fail was that forum_url_outbound_alter() contains a nasty term load, which leads to an infinite recursion when calling url() during the entity load:

function forum_url_outbound_alter(&$path, &$options, $original_path) {
  if (preg_match('!^taxonomy/term/(\d+)!', $path, $matches)) {
    $term = taxonomy_term_load($matches[1]);
    if ($term && $term->vocabulary_machine_name == 'forums') {
      $path = 'forum/' . $matches[1];
    }
  }
}

Side note: The regular expression feels like a hack and I wonder how this perform on pages with many calls to url() when the forum module is enabled. I'm not sure why this cannot be done in the same fashion as the comment permalinks have been implemented, i.e. via a forum/%forum menu item which loads the term page. For reference, this was introduced in #320331: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks.

For now I've added a condition to prevent infinite recursion, but hopefully we can improve this.

We'll need benchmarks for HEAD, HEAD with entitycache, and number of calls to url() saved on pages like node w/ 50 comments, front page, etc.

catch’s picture

I'd think that forum.module could hook_entity_info_alter() taxonomy.module to add it's own uri_callback (or whatever we end up calling it) - which then checks for forum vocabulary.

Additionally the check on machine name is wrong, it should use variable_get('forum_nav_vocabulary');

scor’s picture

Quick benchmark of #5:

displaying 50 comments
HEAD 142.5ms
#5   139.5ms
2.1%

displaying 300 comments
HEAD 642ms
#5   621ms
3.2%
scor’s picture

I did some micro benchmarks of l() and url(). Times were measured by calling

l('Praesent Elit Blandit Qui', 'comment/1990', array('fragment' => 'comment-1990'));

1000 times. times are in microseconds (us). each line removes an element of the return at the end of l() and shows in parenthesis the time each removed element takes.

31.0 total for l()
29.5 ( 1.5us) w/o drupal_attributes()
 9.3 (20.2us) w/o url()
 7.1 ( 2.2us) w/o check_plain() url
 4.6 ( 2.5us) w/o check_plain() text
 0.0 ( 4.6us) overheads

note: add about 8us to url() when the forum module is enabled.

url() represents 65% (20us) of the l() execution time - that means that if we have already built a url via url() prior to calling l(), we should be able to save a lot when calling l(). Unfortunately, l() does not accept pre-built urls. l() does not alter the input parameters of url() at all. It does alter $options, but only the 'attributes' and 'html' keys which are not used by url(). That means it would be possible to pass a prebuilt url to l() and include it in the return A tag (modulo an API change of course). This would be an alternative to hard-coding the A tag like we're doing in the last patch in template_preprocess_comment(). However using l() will always have more overhead than the hard-coded alternative.

When displaying 50 comments, HEAD calls l() twice per comment. The idea is to prebuild the comment uri and pass it to l() - or hardcode the A tag like we do in #5. That saves 50 calls to url(), which counts for 1ms - compared to a page load of 140ms, it starts to matter. (and they are more call to l()/url() to be saved).

For future reference, other random benchmarks I did outside the context of l(), which confirm the numbers above.

2.2us   check_plain('Praesent Elit Blandit Qui');
2.3us   check_plain('/d7gitfr/comment/1990#comment-1990');
4.8us   '<a href="' . check_plain('/d7gitfr/comment/1990#comment-1990') . '">' . check_plain('Praesent Elit Blandit Qui') . '</a>';
31.0us   l('Praesent Elit Blandit Qui', 'comment/1990', array('fragment' => 'comment-1990'));
18.4us   url('comment/1990', array('fragment' => 'comment-1990'));
scor’s picture

StatusFileSize
new9.34 KB

prototype patch for l() accepting new 'url' option, which will bypass the call to url(), similar to the 'alias' option which bypasses the call to drupal_get_path_alias().

31.1us l('Praesent Elit Blandit Qui', 'comment/1990', array('fragment' => 'comment-1990'));

11.1us l('Praesent Elit Blandit Qui', '/d7gitfr/comment/1990#comment-1990', array('url' => TRUE));

5.17us '<a href="' . check_plain('/d7gitfr/comment/1990#comment-1990') . '">' . check_plain('Praesent Elit Blandit Qui') . '</a>';

so optimized l() still has an overhead, and we might want to use the hard-coded version in performance critical contexts like template_preprocess_comment(). I've made use of the new l() in template_preprocess_comment() to show how to use it.

scor’s picture

Status: Needs review » Needs work
StatusFileSize
new1.6 KB
new31.85 KB
new45.06 KB
new24.8 KB
new18.13 KB
new6.04 KB

Investigating a different alternative here: provide an opt-in caching in url(). I've logged the calls to url() for certain pages, and many of them are redundant. Since many others are not, adding a key to $options triggers the cache.

A few examples:

comment/137 | a:1:{s:8:"fragment";s:11:"comment-137";}
comment/137 | a:1:{s:8:"fragment";s:11:"comment-137";}
comment/137 | a:3:{s:4:"html";b:0;s:8:"fragment";s:11:"comment-137";s:10:"attributes";a:0:{}}
comment/137 | a:3:{s:4:"html";b:0;s:8:"fragment";s:11:"comment-137";s:10:"attributes";a:0:{}}

The 2 last lines look different, but they are in fact treated the same by url(), the extra keys are crap that l() left behind. Once removed, all these 4 can be put in the same cache item.

node with 50 comments as anonymous:

user/login | a:1:{s:5:"query";a:1:{s:11:"destination";s:19:"node/1#comment-form";}}

is duplicated 50 times... same for user/register.

scor’s picture

Status: Needs work » Needs review
StatusFileSize
new9.5 KB

This patch introduces url_fast(), a static cache atop url(). By using a separate function as opposed to a key in $options, we ensure all normal calls to url() do not have any extra overhead at all.

scor’s picture

StatusFileSize
new9.34 KB

Let's refocus on the original goal of this issue and continue investigating url caching in a separate issue.

This is a reroll of #9, which introduced a new 'url' option to skip the call to url(). It's used when we already have the value return by url(), aka the entity URI. See micro-benchmark at #9 which showed a gain of about 20us when skipping url() in l().

We're mostly saving on comments here. In HEAD we're calling l() twice and url() once in rdf_preprocess_comment(). With the patch we're only calling url() during the entity load and outputing it thereafter. That's 2 calls to url() saved per comment.

There is also one call to url() saved when viewing a full node (template_preprocess_node and node_page_view).

The alternative to adding this new key to l() is to hard-code the A tag, which is in fact even faster. So we need to choose between a non-breaking API change or hard-coded A tag.

This patch depends on #525622: 'as link' formatters need a generic way to build the url of an 'entity' which hopefully will get fixed soon, it's mostly about agreeing on the right terminology...

scor’s picture

In HEAD we're calling l() twice and url() once in rdf_preprocess_comment(). With the patch we're only calling url() during the entity load and outputing it thereafter directly or through l(). That's 2 calls to url() saved per comment, which are about 20us each on my machine, total is 40us / comment. There is also one call to url() saved on a full node view. For 50 comments, that's 20 + 50 x 40 = 2020 us (~2ms). For 300 comments, it's 20 + 300 x 40 = 12ms.

Actual page load benchmarks look good:

Displaying 50 comments:
HEAD  140.0ms  
patch 137.9ms
1.5% faster

Displaying 300 comments:
HEAD  747ms
patch 727ms
2.6% faster
scor’s picture

catch’s picture

This looks great to me except for the hunk which has to account for the brokenness in http://api.drupal.org/api/function/forum_url_outbound_alter/7

1. Doing a regexp on every url.

2. Doing an taxonomy_term_load() on every url that matches.

Instead we should have forum_entity_info_alter() change the entity_path() callback to it's own, handle the forums vocabulary properly, then ensure entity_path() is used in field formatters and other places consistently as the replacement to taxonomy_term_path().

That doesn't have to be dealt with in this issue, but the infinite recursion possibilities this currently has can happen anywhere. Note that's not just a problem with this patch - any module anywhere running url('node/1'); inside a hook_$something_load() could end up triggering infinite recursion.

Status: Needs review » Needs work

The last submitted patch, 699440_entity_uri_14.patch, failed testing.

scor’s picture

Status: Needs work » Needs review
StatusFileSize
new7.61 KB

Thanks bot for catching that. yes, entity_path() is now entity_uri().

catch’s picture

scor’s picture

StatusFileSize
new7.68 KB

added a @todo to remove the infinite loop prevention code when #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter() is fixed.

scor’s picture

#19: 699440_entity_uri_19.patch queued for re-testing.

effulgentsia’s picture

subscribe

catch’s picture

Title: Centralize entity URI in entity object and save on url() calls » Add bundle support to entity_uri callback, use for taxonomy terms, save url() calls (regression from D6)
Priority: Normal » Critical

I'm bumping this to critical. We have a major performance / infinite recursion regression in trying to use hook_url_outbound_alter() instead of taxonomy_term_path() between D6 and D7. Fixing that mainly means a change from taxonomy_term_path($term) to taxonomy_term_uri($term), and adding bundle support to the entity uri callbacks key, which is only an internal change in entity api which will affect very few, if any, entity modules in the wild. See #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter() which has a workaround for both issues but needs to be fixed fully here, marking that other issue RTBC so we can make progress in this one.

Also retitling.

catch’s picture

Issue tags: -code cleanup +Performance
webchick’s picture

Here's a summary of some general discussion tonight in #drupal. It kinda spans multiple issues, but figured I'd stick it here as kind of a central 'bridging' place.

catch first pointed me to #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter() as an "easy" patch to commit. :P (Note to self: never trust catch ;))

What that patch does, however, is take 6 or so lines of easily understandable code and turn them into a 25 line kitten killing monster which includes its own recursion detection. When I see code like that in a one-off module it creates the distinct impression that other modules are going to need similar code to handle that. Ick. So I would actually recommend not removing the recursion detection from this one and instead keeping it there. (and marking the other issue won't fix)

However, this approach is only really going to solve the issue for entities. There are tons of other modules out there (Dave Reid mentioned feedburner) that also will implement hook_url_outbound_alter() and don't get to benefit from this approach. It makes me seriously question whether we ought to roll that patch back since it was fairly late to begin with, and this makes at least 2-3 serious performance regressions as a result...

On this patch in particular, my concern is that lines like this:

- drupal_add_html_head_link(array('rel' => 'canonical', 'href' => url('node/' . $node->nid)), TRUE);
+ drupal_add_html_head_link(array('rel' => 'canonical', 'href' => $node->uri), TRUE);

...are really confusing, because nowhere else do we do this sort of "magic" pre-url()ing of urls, so I could see module developers scratching their head a good bit, and ending up doing double-duty on calling url(). I'm also not real keen on that 'url' parameter to $options. It feels like we ought to be consistent everywhere (but catch informs me this may not be possible).

In the forum issue, Dave Reid mentioned an idea about different modules passing various "contexts" they need in that $options array, to make hook_url_alter() more performant. So for example:

url('taxonomy/term/' . $term->tid, array('term' => $term));

and forum_url_outbound_alter could check for $options['term']

This, however, strikes me as a complete nightmare since there'd be no way to document the possible values in that $options array in the url() documentation, and creates a precedent for any random module sticking whatever random crap they want in a honking array that gets passed around a lot.

So.. not a lot of answers here, I'm afraid, but those are my first impressions.

chx’s picture

Rollback++ apparently http://drupal.org/node/320331#comment-2188534 is not so cached after all.

effulgentsia’s picture

Status: Needs review » Needs work

Whether or not to rollback hook_url_(in|out)bound_alter() should be discussed in a separate issue. But I don't think it's fair to treat forum.module's inappropriate use of the hook as an indicator that the hook is bad. hook_url_outbound_alter() is architecturally sound when used correctly. Correct uses include the locale module. Also, http://drupal.org/project/domain. And to handle proxy issues as described in #535612-14: Fix Drupal 6 url() function to call custom_url_rewrite_outbound for external links too. That there are at least 3 independent legitimate uses of this hook makes me like it as a hook rather than a hard-coded function name like custom_url_rewrite_outbound(), but I can also appreciate chx's perspective that it's not worth the performance cost. Anyway, that cost-benefit analysis belongs in a separate issue. If we decide to keep the hook, we should document it better, so that mistakes like forum_url_outbound_alter() don't happen again in core, and are minimized in contrib.

Regardless whether we rollback that hook, I think we all agree here that forum_url_outbound_alter() needs to go. And for that to happen, we need entity_uri() to support per-bundle 'uri callback'. This is just an abstraction of what D6's taxonomy_term_path() does, so it will give us everything that function gave us, but for all entities.

Some of the code in #19 still needs work and can be made much cleaner once we have per-bundle uri callbacks supported.

However, this approach is only really going to solve the issue for entities. There are tons of other modules out there (Dave Reid mentioned feedburner) that also will implement hook_url_outbound_alter() and don't get to benefit from this approach.

Flexible path creation for entities is a problem that Views and Panels have been struggling with for years. Solving it is a huge leap forward that we should be massively proud of. Sure, flexible path creation for non-entities would be great, but we're not gonna get that in D7 at this point. If that leads to some contrib modules using hook_url_outbound_alter() to hack around that limitation, that's the price we pay for not having achieved perfect path architecture yet. But these are contrib modules that are already hacking in this way via the http://drupal.org/project/url_alter module.

catch’s picture

Yeah I think the main thing here is to create a division of responsibility between altering arbitrary paths, and allowing paths which have a relationship to something we already know about, and have their own API attached. So if we can only do entity_path() here then that fixes the regressions, we'll find out during the D7 cycle what crazy implementations are done to hook_url_outbound_alter() which could be handled better via another method. The main purpose of this issue should be to not ship with any of those.

scor’s picture

Status: Needs work » Needs review

This patch extends entity_uri() to allow for optional bundle specific uri callback. It also adds forum_entity_info_alter() and forum_uri() to forum.module so forum_url_outbound_alter() can be removed. I've fixed a few places in taxonomy.module to take advantage of the new term->uri so it plays well with forum.module, but I might have missed some places. I'm hoping the testbot can help me here.

scor’s picture

StatusFileSize
new10.72 KB

here goes the patch...

Status: Needs review » Needs work

The last submitted patch, 699440_entity_uri_28.patch, failed testing.

scor’s picture

Status: Needs work » Needs review
StatusFileSize
new14.57 KB

path.test tests the hook_url_outbound_alter() functionality so I moved it from forum.module to a dedicated test module forum_url.module.

effulgentsia’s picture

Status: Needs review » Needs work
+++ includes/common.inc
@@ -2165,6 +2165,8 @@ function drupal_attributes(array $attributes = array()) {
  *       escaped HTML.
  *     - 'alias' (default FALSE)
  *       Whether the given path is an alias already.
+ *     - 'url' (default FALSE)
+ *       Whether the given path has been passed to url() already.
  * @return
  *   an HTML string containing a link to the given path.
  */
@@ -2176,6 +2178,7 @@ function l($text, $path, array $options = array()) {
   $options += array(
       'attributes' => array(),
       'html' => FALSE,
+      'url' => FALSE,
     );
 
   // Append active class.
@@ -2221,7 +2224,7 @@ function l($text, $path, array $options = array()) {
   }
   // The result of url() is a plain-text URL. Because we are using it here
   // in an HTML argument context, we need to encode it properly.
-  return '<a href="' . check_plain(url($path, $options)) . '"' . drupal_attributes($options['attributes']) . '>' . ($options['html'] ? $text : check_plain($text)) . '</a>';
+  return '<a href="' . check_plain(($options['url'] ? $path : url($path, $options))) . '"' . drupal_attributes($options['attributes']) . '>' . ($options['html'] ? $text : check_plain($text)) . '</a>';
 }

How do you guys (and gals) feel about pulling back a bit on the scope of this issue and taking all this l() optimization out? I'd like to see this issue be about bundle support for entity_uri(), making sure all places that make links to entities use entity_uri() (either directly or via a $entity->uri) instead of hard-coding a path (see for example, template_preprocess_username()), and removing forum_url_outbound_alter(). Maybe once we accomplish all that, we can open a new issue for optimizing url() calls.

+++ includes/entity.inc
@@ -229,6 +229,30 @@ class DrupalDefaultEntityController implements DrupalEntityControllerInterface {
+    // Add the URI of each entity if available.
+    foreach ($queried_entities as $queried_entity) {
+      if ($url_structure = entity_uri($this->entityType, $queried_entity)) {
+        // Prepare variables to be passed to url().
+        $path = $url_structure['path'];
+        $options = isset($url_structure['options']) ? $url_structure['options'] : array();
+        // Implementations of hook_url_outbound_alter() might try to load the
+        // same entity, so there is a risk of infinite recursion when calling
+        // url(), see forum.module for example. To avoid this recursion, we only
+        // call url() the first time an entity is being loaded. An empty uri is
+        // assigned on eventual subsequent recursive loads.
+        // @todo remove when http://drupal.org/node/712076 is fixed.
+        static $loaded_entities;
+        if (empty($loaded_entities[$path])) {
+          $loaded_entities[$path] = TRUE;
+          $queried_entity->uri = url($path, $options);
+          unset($loaded_entities[$path]);
+        }
+        else {
+          $queried_entity->uri = NULL;
+        }
+      }
+    }
+

If we're getting rid of forum_url_outbound_alter(), do we need all this? Also, as per above, can we make $queried_entity->uri be the result of calling entity_uri() rather than flattening it with a url() call?

+++ modules/forum/forum.module
@@ -219,6 +219,24 @@ function forum_init() {
+ * Implements hook_entity_info_alter().
+ */
+function forum_entity_info_alter(&$entity_info) {
+  // Overrides the generic uri callback of the taxonomy_term entity type to
+  // provide a specific uri for each term of the forums vocabulary.
+  $entity_info['taxonomy_term']['bundles']['forums']['uri callback'] = 'forum_uri';
+}

module_invoke_all() does array_merge_recursive(). So, can this be forum_entity_info() instead? Alter hooks are for making changes, not for adding the primary info data. Or is there something special about entities that makes doing this in an alter hook preferable?

143 critical left. Go review some!

effulgentsia’s picture

Status: Needs work » Needs review

#32 was x-post with #31. CNR for bot.

scor’s picture

StatusFileSize
new13.9 KB

How do you guys (and gals) feel about pulling back a bit on the scope of this issue and taking all this l() optimization out?

I'd like to keep it in here because it accounts for more than half the performance we gain from centralizing entity uris into the each entity object. If others really want to, we could split it into a different issue...

Also, as per above, can we make $queried_entity->uri be the result of calling entity_uri() rather than flattening it with a url() call?

Unless you also store its flatten, "ready to use" value, that would defeat the initial purpose of this issue which was to save on url calls. If you don't flatten it right away and store it in the entity object, then you have to call url() everytime you want to use it. However, we could also keep the structured value of uri so that when you use a variant of url() like 'absolute' for consistency and good design, but that would not bring any performance benefit (and that was the initial goal of this issue).

I removed the infinite loop prevention code, and changed forum_entity_info_alter() to forum_entity_info() per effulgentsia comments.

Status: Needs review » Needs work

The last submitted patch, 699440_entity_uri_34.patch, failed testing.

yched’s picture

The $info['bundles'][$bundle]['uri callback'] property needs to be documented in the phpdoc for hook_entity_info() ?

+++ includes/common.inc
@@ -6437,10 +6440,17 @@ function entity_prepare_view($entity_type, $entities) {
+  // Uses the bundle specific uri callback if it exists.
+  if (isset($info['bundles'][$bundle]['uri callback']) && function_exists($info['bundles'][$bundle]['uri callback'])) {
+    return $info['bundles'][$bundle]['uri callback']($entity) + array('options' => array());
+  }
+  // Otherwise uses the generic entity uri callback for this entity type.
   if (isset($info['uri callback']) && function_exists($info['uri callback'])) {
     return $info['uri callback']($entity) + array('options' => array());
   }

Nitpick : can we refactor this into :
- some logic ending up with $function = something;
- return $fuction($entity) + array('options' => array());

Powered by Dreditor.

effulgentsia’s picture

Status: Needs work » Needs review
Issue tags: +API change
StatusFileSize
new19.37 KB

Well, if we have to optimize url(), how about this way? This changes entity_uri() to return an object instead of an array. That's why I'm adding the "API change" tag to this issue. But, there's only one place (image.field.inc) in HEAD that currently calls entity_uri() (this patch adds more), and I doubt that that many contrib modules have started calling it either, so I think it's an acceptable trade-off considering it allows the desired url() optimization in what I think is a nice way.

This patch doesn't include the forum_url module from #34, and instead changes path.test to be caught up to the removal of forum_url_outbound_alter().

This patch converts all the places that #34 converted from hard-coded urls to entity_uri() driven urls, but this is not complete. There's a larger sweep that's needed. But, I'd like to get a check from people here on whether this is looking like a good route before doing that sweep.

scor’s picture

StatusFileSize
new17.82 KB

Good job effulgentsia! Initial micro-benchmarking on url() is promising:
url('comment/' . $comment->cid, array('fragment' => 'comment-' . $comment->cid)); HEAD = 26.5us
url(entity_uri('comment', $comment)); with #37 = 4.2us

and a regular uri defined as an object such as

$register_uri->path = 'user/register';
$register_uri->options = array();

will also show great performance:
url('user/register'); HEAD = 25us
url($register_uri); with #37 = 2.3us
This particular one is not part of #37 but could be part of a larger sweep effulgentsia mentions (#10 contains some examples).

I could not find any improvement on l() upfront. This was due to a typo in one of the unset for the 'attributes' key in url() necessary for the proper merge. With that fixed, this is what we get:
l('Praesent Elit Blandit Qui', 'comment/1', array('fragment' => 'comment-1')); HEAD = 40us
l('Praesent Elit Blandit Qui', entity_uri('comment', $comment)); with #38 = 18us

catch’s picture

StatusFileSize
new34.99 KB
new36.56 KB

microbenchmarks look good. I also profiled with xhprof on a node with 48 comments and standard display settings, and while it looks like only about .5% improvement in time, general profile of URL looks much more encouraging. Attached screenshots of the url() detail for comparison.

Overall code looks good too. Only thing which concerns me if the caching in url depends on always passing in the same object and acting on it by resource - if we end up cloning by default in #154859: Document that cached entity objects are not necessarily cloned any more then couldn't we end up with different objects being passed in each time, and loose the benefit of caching there?

catch’s picture

Status: Needs review » Reviewed & tested by the community

Thought about this some more, and it's extremely likely that the caching would be on the object passed around the various node view / build functions, so even if it's cloned from node_load() eventually, it's still going to be the same object passed by resource in between all those functions when we run URL on it - certainly it would be for comment listings and similar where this optimization is most pressing.

This is a very important patch in the sense we absolutely need to remove forum_url_outbound_alter() (as opposed to simply rolling back hook_url_outbound_alter() altogether), and because RDF can add up to 20% overhead to a node + 50 comment listing, a lot of which comes from url(). While it has API changes, they're either additions or contained to entity_info() which has few implementations outside core (and which will run into the same problems we have here so it's a critical bug for the lack of bundle support just by itself).

I think we should go ahead and commit this asap to get the API change out of the way, then do the much larger sweep which effulgentsia mentions in #37 afterwards. The microbenchmarks look great, the profile looks much better url(entity_uri('taxonomy_term', $term)) is a lot better than other ideas we had like having $options['entity'] or similar. As mentioned above, this pattern could also be used for non-entities if we found suitable places for it, which it looks like there are, so it's a generic solution as well. RTBC.

chx’s picture

Status: Reviewed & tested by the community » Needs work

No. Just no. This can not be. Move that hunk out from url() into a wrapper, say entity_url(). You can't possibly suggest a core function that accepts an entity or an arbitrary path. do_everything($foo, $bar, $universe); is the next?

effulgentsia’s picture

You can't possibly suggest a core function that accepts an entity or an arbitrary path.

That's not what #38 suggests. It suggests that url() and l(), which both take $path and $options as two separate parameters also take a single parameter that is an object containing ->path and ->options properties. That object is not an entity, but a uri (I guess, a "uri" being something that contains enough structured information to become a "url"). The problem trying to be solved is this: how to make it so that when url() is called multiple times for the same 'path' and 'options', that its result can be cached? If overloading the first parameter of url() is unacceptable, do you have any other ideas? Static cache within url() keyed on a serialized combination of $path and $options?

catch’s picture

i spoke to chx in irc, he suggested an entity_url() wrapper around url(). So keep the caching / is_object() (we wouldn't need a check because could guarantee object) outside url(). Didn't play with that yet though.

effulgentsia’s picture

Status: Needs work » Needs review
Issue tags: -API change
StatusFileSize
new19.7 KB

This removes the is_object() from url() that chx disapproves of. This also removes the need for the BC break of changing entity_uri() to return an object instead of an array, so removing the "API change" tag: yay!

chx’s picture

Status: Needs review » Needs work

Getting better! Thanks. But array('fragment' => "comment-$comment->cid") this is gone and i can't see where it's added back.

effulgentsia’s picture

Status: Needs work » Needs review
chx’s picture

OK, one last question , why forum_entity_info and not entity_info_alter?

effulgentsia’s picture

why forum_entity_info and not entity_info_alter?

From #32: Alter hooks are for making changes, not for adding the primary info data.

forum.module is the primary module responsible for the forum bundle. If it were making a change to anything about a taxonomy_term entity other than the forum bundle, then alter() would make more sense. Do you agree or am I missing something in our hook_*_info()/hook_*_info_alter() pattern?

bcn’s picture

StatusFileSize
new18.98 KB

Edited to add. Never mind, i didn't know about comment_uri()

Ignore this patch.

effulgentsia’s picture

StatusFileSize
new19.7 KB

@noahb: Thanks for the good intentions :). Re-uploading prior patch to clarify that it's the one that's on the table.

yched’s picture

re effulgentsia #48 : well, taxonomy_entity_info() does take care of exposing info for all taxo bundles (i.e taxo vocabs), including the forum vocab, right ?
Then it does look like forum_entity_info() in the patch only alters what taxonomy_entity_info() defined - for that matter it doesn't declare any of the other 'bundle' properties (label, admin), but relies on those set by taxonomy_entity_info().
Or am I missing something ?

yched’s picture

About the comment in forum_entity_info() - would it make sense to have the 'forum_nav_vocabulary' variable store the vocab machine name instead of the vid ?

yched’s picture

Sorry for the spam - I re-read #32, and I'd tend to disagree. Relying on array_merge_recursive(), and having two separate hook implementations define (partial) primary data for an entry, happens to work but is kind of confusing IMO. forum.module is not defining the primary data, it is tweaking the primary data exposed by taxonomy.module. Sounds very much like an alter hook to me.

catch’s picture

Patch looks good. I'm not sure why we need to call url() recursively though, why not check at the end of the function if we had a cache miss and add to cache there?

I think we should use forum_entity_info_alter() as well, there's no particular benefit to using either, but if it'd been _alter() in the patch no-one would have mentioned it, so that automatically leans me towards that.

forum_nav_vocabulary() having a machine name instead of vid sounds good, not necessary for this patch though.

effulgentsia’s picture

StatusFileSize
new19.68 KB

Changed to forum_entity_info_alter().

I'm not sure why we need to call url() recursively though, why not check at the end of the function if we had a cache miss and add to cache there?

url() returns in 5 different places, so that would need to be refactored or adding to the cache would need to be duplicated 5 times. An extra stack call adds 1us to the cache miss (and only when the cache is being employed), but we're already paying a couple us for the drupal_static(). Is it worth attempting the optimization you suggest as part of this issue?

catch’s picture

Argh I'd forgotten url() returns so many times, NIH then, let's not get into a completely url() refactoring here, keeping this to less than 20k is nice ;)

Will try to run some benchmarks later today.

effulgentsia’s picture

StatusFileSize
new20.18 KB

Added an @todo explaining the situation so that someone who's inspired to can attempt a url() refactor in a separate issue.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Node with some comments on it. Enabled forum module so we can see the impact of removing forum_url_outbound_alter()

Before patch:

Concurrency Level:      1
Time taken for tests:   65.782 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      10514000 bytes
HTML transferred:       9984000 bytes
Requests per second:    15.20 [#/sec] (mean)
Time per request:       65.782 [ms] (mean)
Time per request:       65.782 [ms] (mean, across all concurrent requests)
Transfer rate:          156.08 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    55   66  18.2     60     523
Waiting:       49   59  17.2     54     497
Total:         55   66  18.2     60     523

After patch:

Concurrency Level:      1
Time taken for tests:   64.785 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      10514000 bytes
HTML transferred:       9984000 bytes
Requests per second:    15.44 [#/sec] (mean)
Time per request:       64.785 [ms] (mean)
Time per request:       64.785 [ms] (mean, across all concurrent requests)
Transfer rate:          158.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    55   65  10.3     60      96
Waiting:       49   58   9.4     54      88
Total:         55   65  10.3     60      96

In xhprof, total cpu used for url() and the functions it calls went down from 40,000us to 28,000us

RTBC.

dries’s picture

Status: Reviewed & tested by the community » Needs work
+++ includes/common.inc	2 Apr 2010 03:07:42 -0000
@@ -6430,11 +6467,49 @@ function entity_prepare_view($entity_typ
+      // By default, url() does not statically cache its results, because the
+      // risk of cache miss overhead outweighs the benefit of cache hit savings.
+      // Entity URIs, however, are more likely to pass through url() multiple
+      // times, especially with RDF enabled.
+      if (!isset($entity->uri['options']['cache'])) {
+        $entity->uri['options']['cache'] = TRUE;
+      }

Personally, I'm not a big fan of the cache handling in this patch, and the fact that people can turn on/off caching. It adds nuances and complexity to url() that are not easily understood without a lot of benchmarking. I recommend that we remove that for now.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new15.66 KB

url() is one of the worst CPU hogs in core at the moment, top 5 or so, and the main use for this cache is to be invoked transparently when people use entity_uri().

However, the critical bug and API change here (bundle support, removing forum_url_outbound_alter()) doesn't require the caching to be added, so I'm uploading a patch without that hunk, and I've opened #763574: Add caching for url() so we can work on it separately. Patch is untested.

effulgentsia’s picture

Status: Needs review » Reviewed & tested by the community

Diff confirms that the only change from #57 is the removal of any modification to url() and the removal of adding a 'cache' key to what entity_uri() returns. Bot's happy, so RTBC.

effulgentsia’s picture

Title: Add bundle support to entity_uri callback, use for taxonomy terms, save url() calls (regression from D6) » Add bundle support to entity_uri callback to remove performance overhead of forum_url_outbound_alter() (regression from D6)

Title change to reflect #60.

dries’s picture

Fails to apply locally. Asking for a re-test.

dries’s picture

Issue tags: -Performance, -RDF, -Entity system

#60: entity_uri.patch queued for re-testing.

Status: Reviewed & tested by the community » Needs work
Issue tags: +Performance, +RDF, +Entity system

The last submitted patch, entity_uri.patch, failed testing.

scor’s picture

Status: Needs work » Needs review
StatusFileSize
new14.22 KB
effulgentsia’s picture

Status: Needs review » Reviewed & tested by the community

Diff confirms #66 as equivalent to #60.

dries’s picture

Status: Reviewed & tested by the community » Fixed

Great. Committed to CVS HEAD.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

effulgentsia’s picture

Title: Add bundle support to entity_uri callback to remove performance overhead of forum_url_outbound_alter() (regression from D6) » Add bundle support to entity_uri callback, remove forum_url_outbound_alter(), and use entity_uri() instead of hard-coded paths
Status: Closed (fixed) » Needs review
StatusFileSize
new37.68 KB

Here's the sweep I promised in #37. It might not be complete, but can this be reviewed? I'd like confirmation that this is in fact what we intended when we introduced a entity_uri() function.

Status: Needs review » Needs work

The last submitted patch, entity_uri-use-699440-70.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
StatusFileSize
new49.77 KB

Another try.

catch’s picture

To actually allow rewriting of links, this is exactly what we need to do. Patch looks good, on the @todos, it looks to be only comment module which is causing problems? Generally we only render comments from a single node, although D6 used to call node load 3-4 times for each comment, which with drupal_static() was still needlessly expensive when viewing a lot (now it'd be getting them from the entity loader which is probably not much better), the answer to this was to pass $node around with $comments to the functions that need it - http://api-drupal.pajunas.com/api/function/comment_view_multiple for example. For the admin case, we can probably do something similar, for RDF less so.

Status: Needs review » Needs work

The last submitted patch, entity_uri-use-699440-72.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
StatusFileSize
new50.54 KB

Once again. This also solves the @todos by using entity_create_stub_entity(), though catch's idea (#73) of getting a real $node object into the necessary comment functions may be a better way to go.

fago’s picture

oh, I really like this. As of now we have entity_uri() and the possibility to override it for bundles, however most places in core hard-code the url. It also catches some hard-coded taxonomy urls, which cleans up the forum-alter-url use case. ++

@array_merge($uri['options'], $options)
Perhaps we should use array_merge_recursive() here? What if entity uri callback makes use of 'attributes' or 'query' and the caller wants to do so too?

effulgentsia’s picture

Perhaps we should use array_merge_recursive() here?

I like the idea, but: array_merge_recursive(array('language' => 'en'), array('language' => 'es')) results in array('language' => array('en', 'es')). This seems broken. How do we deal with this in other places that we use array_merge_recursive()? Scalars should replace. Only arrays should merge.

effulgentsia’s picture

scor’s picture

nice cleanup to remove all hard-coded entity paths!

+++ includes/bootstrap.inc	4 May 2010 02:02:14 -0000
@@ -1211,7 +1211,7 @@ function drupal_unpack($obj, $field = 'd
- *     $message[] = t("If you don't want to receive such e-mails, you can change your settings at !url.", array('!url' => url("user/$account->uid", array('absolute' => TRUE))));
+ *     $message[] = t("If you don't want to receive such e-mails, you can change your settings at !url.", array('!url' => url("user/$account->uid/edit", array('absolute' => TRUE))));

this seems like an unrelated hunk.

+++ includes/common.inc	4 May 2010 02:02:15 -0000
@@ -6539,16 +6542,58 @@ function entity_uri($entity_type, $entit
+function entity_url($entity_type, $entity, $options = array()) {

we should explain the difference between entity_uri() and entity_url()

fago’s picture

>How do we deal with this in other places that we use array_merge_recursive()? Scalars should replace. Only arrays should merge.
We just use array_merge_recursive() and hope nothing breaks :D Anyway, we can incorporate drupal(_array)?_merge() here once it's fixed.

scor’s picture

@fago: not related to entity_uri, but just in case it interests you, an example is #721082-21: Prevent conflicting namespaces and move hook_rdf_namespaces() invocation into rdf.module where had to work around array_merge_recursive() to prevent namespaces wreck, see the rdf_get_namespaces() hunk in the patch.

effulgentsia’s picture

StatusFileSize
new50.89 KB

This is a re-roll only. I'll follow-up with implementing #76 and #79 later today.

effulgentsia’s picture

StatusFileSize
new52.23 KB

This implements #76 (see #78/#81).

Re #79.1: It's related in that the example code includes a hard-coded path to 'user/UID'. This could be fixed by using entity_url(), but the example is wrong: it should be linking to 'user/UID/edit', so that's what the patch does.

Re #79.2: I really struggled with this, because I'm not really sure if we're intending to support entity types whose URIs are not URLs. Since Drupal is about making things available on the web, are we anticipating entity types whose URI is a URN rather than a URL? And if so, does our entity_uri() function handle that? But then we have things like the PHPDoc of entity_uri() saying @return: An array containing the 'path' and 'options' keys used to build the uri of the entity, and matching the signature of url() implying that we're gonna use the url() function to build a uri. So please see if the PHPDoc I added to entity_url() makes sense.

effulgentsia’s picture

StatusFileSize
new52.22 KB
+++ includes/common.inc	15 May 2010 23:04:29 -0000
@@ -6593,16 +6596,82 @@ function entity_uri($entity_type, $entit
+ *   // Returns 'http://example.com/node/1?fragment=comment-form.
+ *   entity_url('node', $node, array('absolute' => TRUE, 'fragment' => 'comment-form'));

No, that's not how fragments work :)

83 critical left. Go review some!

effulgentsia’s picture

Title: Add bundle support to entity_uri callback, remove forum_url_outbound_alter(), and use entity_uri() instead of hard-coded paths » Remove hard-coded entity paths from core: instead, use entity_uri()

Changing issue title to focus attention on the part that hasn't already landed.

aspilicious’s picture

+++ includes/common.inc	15 May 2010 23:04:29 -0000
@@ -6593,16 +6596,82 @@ function entity_uri($entity_type, $entit
+ * @param $options
+ *   Options to pass to url() in addition to the ones returned by entity_uri().
+ * @return
+ *   A URL string as returned by url().
+ *
+ * @see entity_uri()

put a newline between the different doc blocks (in this case between @param and @return)

+++ includes/common.inc	15 May 2010 23:04:29 -0000
@@ -6593,16 +6596,82 @@ function entity_uri($entity_type, $entit
+ * @param $options
+ *   Options to pass to l() in addition to the ones returned by entity_uri().
+ * @return
+ *   An HTML anchor tag as returned by l().

put newline between @param and @return

Powered by Dreditor.

aspilicious’s picture

Status: Needs review » Needs work
effulgentsia’s picture

Status: Needs work » Needs review

thanks.

effulgentsia’s picture

StatusFileSize
new52.23 KB

with the file.

aspilicious’s picture

Status: Needs review » Needs work

No patch?

aspilicious’s picture

Status: Needs work » Needs review

Damnit crossposting

catch’s picture

This looks great. We need this to get the functionality of hook_term_path() - which was removed and replaced by hook_url_outbound_alter() in a way which caused a whole range of issues, back into Drupal 7.

Only question on the patch so far is do we really need those entity_create_stub_entity() calls in there. It seems like in some cases (comment deletion) we could just do a node_load() instead - those pages aren't in the critical path at all.

effulgentsia’s picture

StatusFileSize
new53.35 KB

Agreed with #93. I left entity_create_stub_entity() in rdf_comment_load(), because that does seem to be critical path. But I commented it. I'm a little confused why we need to generate those variables there at all (do we ever need them other than when rendering a comment, in which case should we move that code to rdf_preprocess_comment()?), but perhaps that should be a separate issue?

scor’s picture

@effulgentisa: those variables are added to the entity during hook_comment_load() simply so they can be cached by modules like entity_cache. It doesn't really provide any performance benefit in core alone, but allow better performance with modules providing specialized caching. With all the work going into entity_url(), this could be revisited and we could move these back into rdf_preprocess_comment(), though this does not affect the speed of the date markup whose time relies on PHP's date().

effulgentsia’s picture

@scor: thanks! Given #95, we should leave rdf_comment_load() as it is in #94. Potentially, someone who feels inspired can open a separate issue about removing those variables from rdf_comment_load(), but the performance analysis of that issue would need to include the considerations you mention. On another topic, can you please comment on #94's PHPDoc for entity_url() and whether it addresses #79.2 without introducing anything inaccurate. Please see #84 about my general concerns about how we use URI and URL. It's beyond the scope of this issue to fix incorrect usage of the terms in HEAD, but I don't want this patch adding any new misinformation. Thanks!

yesct’s picture

Status: Needs review » Needs work

needs work to address the points in #96

effulgentsia’s picture

Status: Needs work » Needs review

I didn't intend #96 to imply "needs work". I would like a review from scor on the PHPDoc of entity_url(). Other than that, catch may want to review if #94 adequately addresses #93. Unless one of them (or another reviewer) has further changes to recommend, I think this can be RTBC.

catch’s picture

The changes + comment in #94 looks good to me. Didn't give the patch another full once-over, but didn't see any other issues last time either.

scor’s picture

+++ includes/common.inc	17 May 2010 20:11:44 -0000
@@ -6593,16 +6596,84 @@ function entity_uri($entity_type, $entit
+ * The returned URL can either be the entity's URI, or can be a context-specific
+ * URL that's different than the entity's URI, but that can still be used for

do you have an example for "context-specific URL that's different than the entity's URI"? re #84 I don't think we should try to support URNs at this stage, if we do, calling entity_url() on this entity would break so we would need to extend entity_url (too late). I'm personally quite happy already with the level we're reached with dealing with non-URN URIs.

+++ includes/common.inc	17 May 2010 20:11:44 -0000
@@ -6593,16 +6596,84 @@ function entity_uri($entity_type, $entit
+ * information about the difference between a URI and a URL. It is assumed that
+ * if this function is being called for an entity, then the entity_uri()
+ * function will return elements that make sense in a URL.

I can't think of any case where this function would not be called for an entity. What am I missing?

PS: leaving this needs review. the points above are not major concerns and we need people to review this patch in the meantime.

effulgentsia’s picture

StatusFileSize
new53.7 KB

Thanks! Is this better? If so, then given catch's and fago's reviews, I think we're RTBC.

effulgentsia’s picture

Anyone up for RTBCing this? Sure would be nice to get this into alpha5 to help contrib authors be aware of this change.

catch’s picture

Status: Needs review » Reviewed & tested by the community

I think we're good now.

Just a reminder of why this is needed:

* The hook_url_outbound_alter() patch removed hook_term_path(). This was used by forum module. forum_url_inbound_alter() led to a high potential for infinite recursion alongside performance issues.
* Rather than just put back hook_term_path(), we already had a need of entity URI for RDF, so we added the entity_uri() function instead.
* This patch just applies the use of that function consistently across core, introducing entity_l() and entity_url() as additional helpers, after the initial API change went in.

yesct’s picture

#101: entity_uri-use-699440-101.patch queued for re-testing.

scor’s picture

To address effulgentsia's question in #94 about why we need to generate those variables in rdf_comment_load(), I've added some documentation as part of #815866: Improve rdf.module documentation.

webchick’s picture

Status: Reviewed & tested by the community » Needs work

Thanks for the summary. But changes made throughout this patch leave me scratching my head as to when I'm supposed to use which function.

1. Do we strictly need the entity_uri() function? Most people wouldn't be able to tell you what URI vs URL is without consulting Wikipedia (which, to prove the point, is linked in the description of entity_url()), and we don't have a corresponding uri() function for symmetry.

2. I think we need far better, and cross-referenced, documentation as to when to use l() or url() vs. entity_l() or entity_url(). The documentation that exists now is technically accurate, but doesn't help me as a module author to not screw up my links to things.

effulgentsia’s picture

Title: Remove hard-coded entity paths from core: instead, use entity_uri() » Add bundle support to entity_uri callback to remove performance overhead of forum_url_outbound_alter() (regression from D6)
Status: Needs work » Closed (fixed)

Reverting issue status to as it was in #69. Opened a new issue to continue this discussion (so people can join without wading through all the back story): #823428: Impossible to alter node URLs efficiently.

Re #106:

Do we strictly need the entity_uri() function?

I wrote in the new issue that I'm leaning towards removing it in D8, but we already have it in D7, and don't have a hook_url_outbound_TAG_alter() system in place that could make entity_uri() obsolete.

Most people wouldn't be able to tell you what URI vs URL is

Indeed. And I agree #101 didn't help address that. URIs are just a superset of URLs, but for Drupal entities, they're really the same thing, as all Drupal entity URIs are also URLs. So, I think we're just slowly changing terminology within Drupal from URL to URI, because URLs are so 1990s, and all the hip web technologists prefer the word URI. The patch in the new issue tries to make this more clear.