Hi,

as far as I understand how this module works, the data extracted from the images is only updated when the node is edited. I'm currently trying to get the 'EXIF' module working on a site with over 20,000 images, so manually editing the image nodes is not a realistic option.

For at least some nodes, the extracted data seems to be wrong and/or the matching CCK field might be misconfigured. Either way, I have to "refresh" the data; a similar problem occurs when CCK fields are added at a later time. In the "static" 5.x version this was all no problem, but with the new "dynamic" approach, a way to "bulk update" and manage the extracted data is needed.

Ideally I would like to have a plug-in for the 'Views Bulk Operations' (VBO) module. But another solution would be greatly appreciated.

Thanks & greetings, -asb

Comments

jessia’s picture

I agree; a VBO plugin would be extremely helpful. I'm working on an image-heavy site and currently having problems getting the EXIF data to read correctly, but I don't want to wait on uploading the images. If I could upload them first and then update them once I get EXIF working properly, that'd be great...

asb’s picture

Hi,

theoretically you could make a dummy edit with VBO on all image nodes that need to be refreshed (e.g. add a taxonomy term or change the contents of a CCK field). The EXIF module should refresh its data when saving the image node (if "Refresh on node update" is checked at ./admin/settings/exif/config).

However, I haven't figured out a method to safely identify image nodes with stale or wrong EXIF data (e.g. after adding a CCK field, or after changing a field's configuration), and such dummy bulk edits will definitely have a price (e.g. messing up the recent-changes log, messing up views that rely on "real" node edits, forcing the search module to reindex thousands of nodes, etc.).

Example: I have a site with about 35,000 image nodes and changed a few EXIF fields a few weeks ago; it takes literally months for the search module to update the search index after such an operation (currently at 65%, still 11,694 nodes to go). If you index 50 items per cron run, it will take 234 cron runs to finish (= 10 days). Because of timeouts I had to limit the indexer to 20 nodes per cron run, so I need 585 cron runs to finish (= 25 days). If another bulk operation comes in between, you'll have to start again from zero, so basically your site might never have a complete and up-to-date search index - if you follow this path.
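A quick back-of-the-envelope check of the reindexing estimate above, assuming one cron run per hour (the node count and batch sizes are the ones mentioned in this comment):

```shell
# Nodes still to be indexed, from the comment above.
remaining=11694

# Cron runs needed at 50 nodes per run (rounded up), and the days
# that takes with hourly cron (24 runs/day).
echo $(( (remaining + 49) / 50 ))        # -> 234 runs
echo $(( ((remaining + 49) / 50) / 24 )) # -> ~10 days

# The same at 20 nodes per run.
echo $(( (remaining + 19) / 20 ))        # -> 585 runs
echo $(( ((remaining + 19) / 20) / 24 )) # -> ~24 days
```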

Greetings, -asb

rapsli’s picture

asb is right.

Proposals for different (more direct) methods are welcome.

jphautin’s picture

I think we should split the issue into 4 main tasks:
- Creation of a metadata cache table in MySQL. On upload of an image file, insert its metadata into the cache. This allows a "metadata CCK field" to be modified without re-reading the files for each node, and allows all affected nodes to be updated synchronously (for a few images) without reading any file.
- A batch to update the nodes of a modified content type after a metadata field is added (big sites), without reading files.
- A batch to update the metadata cache if an image file changes. This should not happen if the user uploads files through Drupal.
- Optimization of the cache using the cache API (memcache?).
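A minimal sketch of what the proposed cache table might look like, assuming one row per file/tag pair. Table and column names here are hypothetical, not taken from the module:

```sql
-- Hypothetical metadata cache table; names are illustrative only.
CREATE TABLE exif_metadata_cache (
  fid INT UNSIGNED NOT NULL,       -- Drupal file ID of the image
  tag VARCHAR(64) NOT NULL,        -- metadata tag name, e.g. 'exif_datetimeoriginal'
  value TEXT,                      -- extracted value
  changed INT NOT NULL DEFAULT 0,  -- Unix timestamp of the last extraction
  PRIMARY KEY (fid, tag)
);
```

With something like this, adding a metadata CCK field to a content type would only require reading rows from the cache, not reopening every image file.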

I could give it a try in the 7.x branch, but only if the other committers (rapsli) think it is the right way to go.

rapsli’s picture

@jphautin: Go ahead. btw: What's the architecture of the D7 branch? Does it follow the same concepts as the D6 version?

jphautin’s picture

Yes, I did not change the concept, but the merge is not so easy because the CCK/Fields APIs are not the same.

rapsli’s picture

Cool. Yeah, I bet there are lots of changes ;)

jphautin’s picture

Version: master » 7.x-1.x-dev
Assigned: Unassigned » jphautin
jphautin’s picture

Issue tags: +bulk drush exit update

Hi,
I am changing the approach to the bulk update.
I will not implement the complex solution described in #4.
To make things a little better quickly, I am working on a drush integration.
The first version will allow an administrator to run an update on all content types with the Exif module enabled.
The second version will display the list of content types where the Exif module is enabled and then let you choose one content type to update.

jphautin’s picture

Status: Active » Needs review

First drush integration provided in HEAD. Please review.
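For reference, a usage sketch. The command name `exif_update` is the one mentioned later in this thread; any options it may take are assumptions and should be checked with `drush help exif_update`:

```shell
# Run the bulk EXIF update on all content types with the Exif module enabled.
# Must be run against a bootstrapped Drupal site (from the site root or via an alias).
drush exif_update

# Since drush runs PHP from the CLI, the web server's execution timeout does not
# apply; PHP limits can be raised for long batches if needed, e.g.:
php -d max_execution_time=0 -d memory_limit=512M $(which drush) exif_update
```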

jphautin’s picture

Status: Needs review » Fixed
asb’s picture

Regarding #9:

Does the new mechanism provide the means (i.e. an action) to plug into VBO, Rules, and/or the Batch API, or how does it otherwise prevent timeouts during bulk updates?

In many cases, there will be exactly one content type with the Exif module enabled: 'Image'. There might be thousands of nodes of this content type; the original issue from December 27, 2009 was about selectively updating the EXIF data of portions of these image nodes. Does the new mechanism accomplish this?

Will these changes be backported to the D6 version?

Thanks & greetings, -asb

jphautin’s picture

No, this first version is a drush command that updates all nodes. But since drush allows you to set the execution timeout for each batch, you can run the update of all nodes overnight. Moreover, the next version will update a node ONLY if the image has been updated more recently than the node. This should make updates quicker, as I do not think all image nodes will be updated on a single day.

asb’s picture

Status: Fixed » Active

Thanks for the clarification and for your work. However, I do not think that this fixes the original issue (an action for Rules and VBO).

firfin’s picture

Title: Bulk update of extracted EXIF data » Drush support for Bulk update of extracted EXIF data

I think it might be better to split this issue in two; drush support isn't the same as an action for VBO/Rules.
So I created #1980550: Action for VBO/Rules - let's deal with that part over there?

Meanwhile we can focus on fixing the drush commands in this issue, because (at least for me) they seem broken atm. I am getting multiple errors when running drush exif_update:
array_flip(): Can only flip STRING and INTEGER values! entity.inc:178
and also PDOExceptions about duplicate entries.

firfin’s picture

Category: feature » bug

As this feature is now in the module, this should probably be considered a bug. Only affecting a stand-alone part of the module though, so normal priority.
Gonna look into this some more.

firfin’s picture

Status: Active » Closed (fixed)

Ok, the problem I was having in #15 was caused by my content having illegal values in the file entity fields, probably a result of me messing about with private files, which has known issues with media / file entity.
The drush commands are working fine for node types. They should, however, be improved in my opinion. See: