Mostly this configuration works, but multiple attachments of the same name will result in an integrity constraint violation as the managed temporary file is inserted into the file_managed table. Specifically this condition results when the file reference is in the file_managed table but the file is no longer in the temporary directory (in my case because the subsequent request is processed by a different webnode).
The sequence looks like this:
1. cron runs on webnode 1, an email is processed and an attachment is saved (image.jpg). This results in both a temporary managed file temporary://image.jpg and a permanent managed file named public://image.jpg
2 At some point later, cron runs on webnode 2, an email is processed and an attachment is saved (image.jpg again). Because this is a different webnode the temporary://image.jpg file does not exist on the filesystem but it does still exist in the file_managed table, so the filename is not changed to make it unique. This results in an integrity constraint violation when the managed temporary file is saved, so the processing of the attachment stops before the permanent file is saved, thus causing data loss in this configuration.
Unfortunately in my case it isn't practical to run cron from a single machine, which would address this by always processing the mail on a single box by always writing temporary files to the same directory. Also file replication across webnodes is slow enough that moving the temporary directory into the site directory or other space that is shared across all webnodes would destroy performance completely.
The fact that the directory is hard coded to the temporary:// directory in MailhandlerCommandsFiles.class.php means I have to move all temporary files to a common directory or none.
Temporary files generally shouldn't be managed because they can disappear at any time. Managed temporary files should go into a different directory so it is possible to guarantee that they don't get deleted, since such a loss can result in data loss.
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | mailhandler_1482966.patch | 965 bytes | grendzy |
| #2 | mailhandler-1482966-2.patch | 1.16 KB | danepowell |
| #1 | 1482966_1.patch | 1.04 KB | paul.lovvik |
Comments
Comment #1
paul.lovvik commentedThis patch addresses the problem by introducing a variable that allows the configuration to be changed while not requiring the variable to be set if the current behavior is sufficient for your architecture.
Comment #2
danepowell commentedI haven't had a chance to look at this yet, I'm just reposting so that the automated tests can check it.
Comment #3
danepowell commentedThanks for the patch- I see how this is a problem, however I don't think creating a variable that's not exposed in the frontend is the way to solve it.
Comment #4
grendzy commentedThis is what I've been using. system_cron() will automatically clean up the temporary files.
Comment #6
danepowell commentedI like #4, thanks for contributing:
http://drupalcode.org/project/mailhandler.git/commit/5eb78a8