By not using the filter cache we are wasting precious CPU cycles.

http://note.io/18qSiW6

We can see that each batch job takes around 16 seconds. If we dive a little deeper, we find the following data:
http://note.io/18cjA0c

Three of the top 10 most expensive functions by exclusive wall time (the time spent only in that specific function) are filter functions.
Example without the patch, timed in Drush:

10848 items successfully processed. 10848 documents successfully sent to Solr. [status]

real	9m50.093s
user	7m29.995s
sys	0m11.233s

Now, if we apply the patch, we get the following xhprof results. We can clearly see that some functions are no longer called as frequently but, more importantly, the total time has been cut by more than half.

http://note.io/18cl1fa
http://note.io/18clz4N

And if we check this with Drush:

10848 items successfully processed. 10848 documents successfully sent to Solr. [status]

real	4m26.749s
user	3m13.977s
sys	0m5.224s
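
To put the two Drush timings side by side, here is a small sketch of the arithmetic. It only uses the "real" times and the item count quoted above; the helper name `sketch_duration_to_seconds()` is made up for this illustration and is not part of Drush or the apachesolr module. By this calculation the patched run is roughly 2.2 times faster, which is about 55% less wall time, or roughly 54 ms versus 25 ms per item.

```php
<?php
// Hypothetical helper: convert a `time`-style duration such as "9m50.093s"
// into seconds.
function sketch_duration_to_seconds($duration) {
  if (!preg_match('/^(\d+)m(\d+(?:\.\d+)?)s$/', $duration, $m)) {
    throw new InvalidArgumentException("Unrecognised duration: $duration");
  }
  return ((int) $m[1]) * 60 + (float) $m[2];
}

// Figures copied from the two Drush runs above ("real" time only).
$items = 10848;
$before = sketch_duration_to_seconds('9m50.093s');
$after = sketch_duration_to_seconds('4m26.749s');

// Speedup factor and the fraction of wall time saved by the patch.
$speedup = $before / $after;
$saved_fraction = 1 - $after / $before;

// Average wall time per indexed item, in milliseconds.
$ms_per_item_before = 1000 * $before / $items;
$ms_per_item_after = 1000 * $after / $items;

printf("Speedup: %.2fx, wall time saved: %.1f%%\n", $speedup, 100 * $saved_fraction);
printf("Per item: %.1f ms before, %.1f ms after\n", $ms_per_item_before, $ms_per_item_after);
```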

The filter cache documentation states that the cache is infinitely valid:

  // Cache the filtered text. This cache is infinitely valid. It becomes
  // obsolete when $text changes (which leads to a new $cache_id). It is
  // automatically flushed when the text format is updated.
  // @see filter_format_save()

So I think we can safely use the filter cache in the indexing code, which would significantly speed up the indexing process.
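
The reason such a cache can be "infinitely valid" is that the cache id is derived from the text itself, so changed text never hits a stale entry. Below is a minimal, self-contained sketch of that idea. It is not the code from Drupal core or from the patch: the function names, the in-memory array standing in for the cache, and the trivial stand-in filter are all assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for an expensive filter chain. It counts how often
// it actually runs so the effect of the cache is visible.
function sketch_expensive_filter($text, &$calls) {
  $calls++;
  return strtoupper(strip_tags($text));
}

// Content-keyed cache: the id is built from the format id and a hash of the
// text, so a change to either produces a new id and the old entry is simply
// never looked up again.
function sketch_cached_filter($text, $format_id, array &$cache, &$calls) {
  $cache_id = $format_id . ':' . hash('sha256', $text);
  if (!isset($cache[$cache_id])) {
    $cache[$cache_id] = sketch_expensive_filter($text, $calls);
  }
  return $cache[$cache_id];
}

$cache = array();
$calls = 0;

// Same text twice: the filter runs once, the second call is a cache hit.
$first = sketch_cached_filter('<p>Hello</p>', 'filtered_html', $cache, $calls);
$second = sketch_cached_filter('<p>Hello</p>', 'filtered_html', $cache, $calls);

// Changed text: new cache id, so the filter runs again.
$third = sketch_cached_filter('<p>Hello again</p>', 'filtered_html', $cache, $calls);
```

In this sketch nothing ever needs to be invalidated when content changes. The only case that needs an explicit flush is a change to the filters themselves, which matches the note above that the cache is flushed when the text format is updated.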

Comment  File             Size       Author
#1       2093031-1.patch  603 bytes  Nick_vh

Comments

Nick_vh

File size: 603 bytes
Nick_vh

Status: Active » Needs review
Nick_vh

If we could somehow speed up

$document->content = apachesolr_clean_text($text);

it could give us another performance gain, but I understand that is not so easy to avoid.

10848 items successfully processed. 10848 documents successfully sent to Solr. [status]

real	3m32.385s
user	2m1.001s
sys	0m5.560s
Nick_vh

Version: 7.x-1.x-dev » 6.x-3.x-dev
Status: Needs review » Patch (to be ported)

Committed to 7.x-1.x. We should figure out whether this also applies to 6.x-3.x.

Nick_vh

Issue summary: View changes

Changing markup