By not using the filter cache we are wasting precious CPU cycles.
We can see that each batch job takes around 16 seconds. If we dive a little deeper, we discover the following data:
http://note.io/18cjA0c
3 of the top 10 most expensive functions, measured by exclusive wall time (the time spent only in that specific function), come from the filter functions.
Timing without the patch in Drush:
10848 items successfully processed. 10848 documents successfully sent to Solr. [status]
real 9m50.093s
user 7m29.995s
sys 0m11.233s
Now, if we apply the patch, we get the following xhprof results. We can clearly see that some functions are not called as frequently anymore, but more importantly, the time it takes has almost been cut in half:
http://note.io/18cl1fa
http://note.io/18clz4N
And if we check this with Drush:
10848 items successfully processed. 10848 documents successfully sent to Solr. [status]
real 4m26.749s
user 3m13.977s
sys 0m5.224s
The filter cache documentation states that the cache is infinitely valid:
// Cache the filtered text. This cache is infinitely valid. It becomes
// obsolete when $text changes (which leads to a new $cache_id). It is
// automatically flushed when the text format is updated.
// @see filter_format_save()
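For context, the comment above describes the caching path inside Drupal 7's check_markup(). A simplified sketch of that path, paraphrased from memory rather than copied from core (the hypothetical _run_filters_sketch() helper stands in for the real filter loop):

```php
// Simplified sketch of D7 check_markup() caching, not verbatim core code.
// The cache id is derived from the text itself, so any change to $text
// automatically produces a new cache entry.
function check_markup_sketch($text, $format_id, $langcode = '', $cache = FALSE) {
  $format = filter_format_load($format_id);
  $cache_id = $format->format . ':' . $langcode . ':' . hash('sha256', $text);
  if ($cache && ($cached = cache_get($cache_id, 'cache_filter'))) {
    return $cached->data;
  }
  // Run every enabled filter for this format on $text.
  $processed = _run_filters_sketch($text, $format, $langcode); // hypothetical helper
  if ($cache) {
    cache_set($cache_id, $processed, 'cache_filter', CACHE_PERMANENT);
  }
  return $processed;
}
```

The key point for indexing is the early return: when the cached copy exists, none of the (expensive) filter functions run at all.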
So I think we can safely state that we should push this into the indexing code and significantly speed up the indexing process.
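The attached patch is only 603 bytes, so presumably the change in the indexing code amounts to opting into the filter cache rather than re-running the filters on every pass. A guess at the shape of the change, not the literal patch:

```php
// Hypothetical before/after in the indexing path (illustrative only).
// Before: $cache = FALSE, so the filters run on every index pass.
$text = check_markup($node_text, $format_id, $langcode, FALSE);
// After: the filtered text is fetched from (and stored in) the
// cache_filter bin, so re-indexing unchanged text skips the filters.
$text = check_markup($node_text, $format_id, $langcode, TRUE);
```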
Comment | File | Size | Author
---|---|---|---
#1 | 2093031-1.patch | 603 bytes | Nick_vh
Comments
Comment #1
Nick_vh

Comment #2
Nick_vh

Comment #3
Nick_vh

If we could somehow speed up
$document->content = apachesolr_clean_text($text);
it could give us another performance gain, but I understand that it is not so easy to avoid.
Comment #4
Nick_vh

Committed to 7.x-1.x. We should figure out whether this also applies to 6.x-3.x.
Comment #4.0
Nick_vh

Changing markup