This is a spin-off of #1329218: How to prevent duplicate posts.
Basically, there are several situations in which a message might be imported (or at least fetched) multiple times when that is not wanted. For instance, this happens when using POP without deleting messages (POP messages cannot be marked as read on the server), or when using IMAP and a message is marked as unread again after it has already been imported once.
This can be partially alleviated by mapping the "Message ID" source to a unique target, so that duplicate nodes don't get created. However, Feeds will still fetch and parse each duplicate message before it is rejected at that point, which adds unnecessary overhead and may cause performance issues.
It might be desirable to write a filter plugin that filters messages by whether or not they have already been imported. This would require a module that creates a message-tracking table and provides the filter plugin to Mailhandler. I don't have the resources to write such a filter plugin at the moment, but I would accept patches or sponsorship for it.
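For illustration, here is a minimal sketch of the message-tracking idea, written in Python rather than as an actual Mailhandler filter plugin. The `imported_messages` table, the `MessageTracker` class, and `filter_new_messages()` are all hypothetical and are not part of Mailhandler or Feeds. The sketch assumes raw messages arrive as strings and that most carry a Message-ID header. Only the headers are parsed (with the standard library's `email.parser.HeaderParser`), so already-imported messages are skipped before any full parsing happens.

```python
import sqlite3
from email.parser import HeaderParser


class MessageTracker:
    """Hypothetical message-tracking table keyed on the Message-ID header."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS imported_messages ("
            "message_id TEXT PRIMARY KEY, "
            "imported_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )

    def is_imported(self, message_id):
        row = self.conn.execute(
            "SELECT 1 FROM imported_messages WHERE message_id = ?",
            (message_id,),
        ).fetchone()
        return row is not None

    def mark_imported(self, message_id):
        # INSERT OR IGNORE makes recording the same message twice harmless.
        self.conn.execute(
            "INSERT OR IGNORE INTO imported_messages (message_id) VALUES (?)",
            (message_id,),
        )
        self.conn.commit()


def filter_new_messages(raw_messages, tracker):
    """Return (message_id, raw) pairs for messages not yet imported.

    Only headers are parsed, so duplicates are dropped cheaply. Messages
    repeated within the same batch are also dropped.
    """
    header_parser = HeaderParser()
    seen_in_batch = set()
    new = []
    for raw in raw_messages:
        message_id = header_parser.parsestr(raw).get("Message-ID")
        if message_id is None:
            # Nothing to deduplicate on, so let the message through.
            new.append((None, raw))
            continue
        message_id = message_id.strip()
        if message_id in seen_in_batch or tracker.is_imported(message_id):
            continue
        seen_in_batch.add(message_id)
        new.append((message_id, raw))
    return new


if __name__ == "__main__":
    tracker = MessageTracker()
    batch = [
        "Message-ID: <a@example.com>\nSubject: first\n\nbody",
        "Message-ID: <b@example.com>\nSubject: second\n\nbody",
        "Message-ID: <a@example.com>\nSubject: first again\n\nbody",
    ]
    new = filter_new_messages(batch, tracker)
    for message_id, raw in new:
        # ... import the message here, then record it ...
        if message_id is not None:
            tracker.mark_imported(message_id)
    print([mid for mid, _ in new])  # ['<a@example.com>', '<b@example.com>']
    print(filter_new_messages(batch, tracker))  # []
```

In this sketch the filter only checks the table, and the caller records a Message-ID after the import succeeds, so a message whose import fails would be picked up again on the next fetch rather than silently lost.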
Comments
Comment #1
danepowell commented: Actually, rather than a filter plugin, this should probably go straight into core.
Comment #2
danepowell commented: After careful consideration, I've decided that there are already good mechanisms for preventing duplicate messages: deleting messages after import where possible, and marking messages as read for IMAP mailboxes. I also recommend mapping the Message ID to GUID in the Feeds processor to stop potential duplication at that level; the Mailhandler quick-start importer now does this by default.
I welcome further debate on this, but since no one else seems to be interested at the moment, I am marking it "won't fix".