I'm trying to reduce load on our server, much of which is from Tika indexing PDFs. We have a situation whereby nodes are often edited and resaved without the file content changing, but Tika then goes ahead and reindexes all the PDFs.

Is there an existing way to tell search_api_attachments not to reindex such attachments?

If this doesn't exist already, I'm happy to code it myself. Is there any reason it wouldn't be possible?

Thanks,
Shiraz


Comments

izus’s picture

Hi,
There is the cache_search_api_attachments cache table.
The getFileContent method already does this work.
The cache is updated during hook_file_update, hook_file_delete, and when clearing the cache (take a look at the .module file).
Doesn't this work for you? Or maybe in your case some code forces a cache rebuild during the node update?
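
For readers following along, the caching behaviour described above can be sketched roughly as below. This is a language-neutral illustration in Python, not the module's actual PHP code: the class name, the `extractor` callable, and the `extractions` counter are all hypothetical, and only the general idea (extract once, reuse until the file is updated or deleted or the cache is cleared) comes from the comment above.

```python
class AttachmentTextCache:
    """Hypothetical stand-in for a cache table of extracted file text."""

    def __init__(self, extractor):
        # extractor: any callable taking a file path and returning its text
        # (in the real setup this would be the expensive Tika call).
        self._extractor = extractor
        self._store = {}
        self.extractions = 0  # counts how often the extractor actually ran

    def get_file_content(self, file_id, path):
        # Cache hit: reuse the stored text, no extraction needed.
        if file_id in self._store:
            return self._store[file_id]
        # Cache miss: run the extractor once and remember the result.
        text = self._extractor(path)
        self.extractions += 1
        self._store[file_id] = text
        return text

    def invalidate(self, file_id):
        # Called when a file is updated or deleted.
        self._store.pop(file_id, None)

    def flush(self):
        # Called when the whole cache is cleared; every file is re-extracted
        # on its next lookup.
        self._store.clear()
```

In this sketch, re-saving a node without touching the file only ever hits `get_file_content`, so the extractor does not run again. A full `flush`, on the other hand, forces re-extraction of every file.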

Shiraz Dindar’s picture

Thanks izus.

The problem is that the cache tables get cleared too frequently in our setup. I wonder if it's kosher to skip search_api_attachments_flush_caches and only flush manually (drush command, button on a settings page, etc.). If you're open to that, let me know and I'll provide a patch. It could be made an option.

Shiraz Dindar’s picture

So this is the start, and it does work for us.

I'd also add a button on the settings form to clear the cache table manually.
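
A rough sketch of the idea, for anyone reading without the patch at hand. This is an illustration in Python, not the patch itself: the setting name `preserve_attachments_cache` and both function names are hypothetical. The only things taken from this thread are the cache table name and the plan to skip the automatic flush when an option is set.

```python
ATTACHMENTS_BIN = "cache_search_api_attachments"


def bins_to_flush(settings):
    """Return the cache bins to clear during a general cache flush."""
    # Hypothetical setting name. A missing key falls back to False, so
    # sites that never configured the option keep flushing as before.
    if settings.get("preserve_attachments_cache", False):
        return []
    return [ATTACHMENTS_BIN]


def manual_flush(caches):
    """Explicitly empty the attachments bin (e.g. from a button or command)."""
    caches[ATTACHMENTS_BIN] = {}
```

With the option off, a general cache clear still empties the attachments bin. With it on, the extracted text survives until someone flushes it by hand.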

Grimreaper’s picture

Status: Active » Needs review
Issue tags: +drupaldevdays

Hello,

I have reviewed your patch.

I added a default value to your variable_get call to preserve the existing behaviour on sites that already use the module.

I don't know of a simple way to measure a significant speed improvement when testing.

Thanks for your patch.

Grimreaper’s picture

Component: Documentation » Code

  • izus committed 93b55ec on 7.x-1.x authored by Shiraz Dindar
    Issue #2307225 by Shiraz Dindar, Grimreaper, izus: Possible to only...
izus’s picture

Status: Needs review » Fixed

hi,
Thank you guys, this is merged now :)

  • izus committed ff9b054 on 7.x-1.x
    Issues #2307225 by izus: Better documentation of caching.
    

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Shiraz Dindar’s picture

thank you!