Support CCK creation using Feeds and Mailhandler [#921774]

I'm working in a project to create nodes using Mailhandler 2.x, we are using cck field for bodies instead of node bodies, so I'm really interested in fix #141211: CCK custom content type fields and scheduling using mailhandler

After review the Mailhandler 2.x code I think we can achieve this task making MailhandlerNodeProcessor inherit from FeedsNodeProcessor instead of FeedsProcessor and consider the special case for comments.

There are several issues to fix. In first place, I'm not sure if a Processor Plugin is the right place to process commands, maybe the Parser should 'parse' the origbody, and provide parsed commands to the Processor.

For other side, an single method should check if the user can or cannot create the node (or comment) and use this method to determine if the node can be created.

MailhandlerNodeProcessor::submitNode() should be removed, FeedsNodeProcessor should take care of the node creation. For the comment we can provide another method to save the comment.

MailhandlerNodeProcessorget:MappingTargets() should be removed too, FeedsNodeProcessor already provides all the fields for a node. This is the reason because I'm suggesting to process commands in the parser, at this moment '_mailhandler_node_processor_target_body' is processing commands, and now, this function will be removed.

I started a patch but It is not working yet, I would like to know if this ideas are workable, thoughts?

Comment	File	Size	Author
#19	mailhandlernodeprocessor-refactor-921774-19.patch	33.9 KB	Ian Ward
#15	mailhandlernodeprocessor-refactor-921774-15.patch	36.62 KB	Ian Ward
#13	mailhandlernodeprocessor-refactor-921774-13.patch	30.73 KB	Ian Ward

Comments

Comment #1

ekes commented 26 September 2010 at 15:30

I've just started looking at this and had much the same thought re: using the FeedNodeProcessor. (I was especially interested in inheritance from the 'feed' node).

The biggest challenge seems to be the threading, and that's special for comment handling [or even nodecomment (!) handling]. So a more generic and separate mapper for comments seems to be a good idea. Or is there some case for having to map to both nodes and comments?

Comment #2

Jean-Philippe Fleury commented 26 September 2010 at 18:46

subscription

Comment #3

dagmar

Spanish

Argentina

commented 26 September 2010 at 19:04

@ekes, yes, I think the best option is use a different mapper for comments, I didn't looked at it: #915146: Comment Processor.

To avoid use two different mailboxes to create new posts or comments maybe the fetcher could have an option where we can define to fetch only emails that create comments or only emails that create new posts (based on the email header)

Comment #4

Ian Ward commented 29 September 2010 at 01:31

@dagmar, I think the only method that would not have to be completely overridden if MailhanderNodeProcessor inherits from FeedsNodeProcessor would be getMappingTargets. I think there's too much custom processing that needs to happen in ::process to use FeedsNodeProcessor::process. That said, if we can delete at least that getMappingTargets code, it could make sense.

To comment on your other points

For other side, an single method should check if the user can or cannot create the node (or comment) and use this method to determine if the node can be created.

Right now a single method is used to execute authentication, but multiple types of authentication plugins can be written in order to support authentication based on email address, based on token, etc. Mailcomment module is an example where custom authentication method is needed.

I'm not sure if a Processor Plugin is the right place to process commands, maybe the Parser should 'parse' the origbody, and provide parsed commands to the Processor. [...] This is the reason because I'm suggesting to process commands in the parser, at this moment '_mailhandler_node_processor_target_body' is processing commands, and now, this function will be removed.

Basically, instead of having _mailhandler_node_processor_target_body you'd just trim the commands out and signature line out in the parser when processing the commands? I see that'd also let us delete some code and I cannot think of any issues OTOH (I've taken a few weeks off working on this code b/c of vacation/travels).

Am I understanding this all correctly? Please elaborate if you see clean paths to inheriting from FeedsNodeProcessor. I have the use case of sending emails to the same mailbox and those emails could get turned into new nodes or comments (mailcomment, plus something like http://github.com/ianshward/Mail-Group Thanks.

Comment #5

dagmar

Spanish

Argentina

commented 29 September 2010 at 02:28

Hi Ian good to see you on this issue.

I was looking for the feed glosary, here is what I found:

Fetcher

A plugin responsible for downloading, loading or receiving a feed.

Parser

A plugin responsible for bringing a fetched feed into a normalized format for processors.

Processor

A plugin that "does stuff" with a parsed feed. Usually a processor stores a feed in one or the other form (a node, a user, a taxonomy term, a simple DB record).

Following those concepts, I think Mailhandler shouldn't provide a processor, in theory with a Fetcher and a Parser would be enough.

I think we can achieve the same features that Mailhandler 1.x has doing this:

Fetcher:
Download mails from a mailbox. This fetcher may have a UI option to determine if the email has a msg-id in their header (to create comments) or it is a email to create new nodes. (Or maybe two kind of fetchers, fetch Mail for comments, and fetch Mails for new nodes)

Parser:
Process body commands, strip replies from bodies and sanitize the title.

And then using the NodeFeedProccessor we can create nodes based on processed data, or using #915146: Comment Processor (I didn't test it) to create comments.

So, users have to create two feeds in order to create new nodes or comments from the same Mailbox, but this method reuse a lot of already existing code and IMO is a good thing.

Related to the authentication method, I was trying to say that the authentication method should go into the fetcher, due NodeFeedProccessor doesn't take care about the author of the node, the email should be rejected before Proccesor processes it.

I hope this clarify a bit the idea, maybe we should ask to @alex_b about this kind of implementation, I don't know.

Comment #6

Ian Ward commented 29 September 2010 at 13:35

@dagmar - the main complication to be aware of is that of handling the imap stream. Between fetching and parsing, multiple http requests are made, so if you open a stream in the fetcher, it will not be available in the parser; you have to open it, close it, then open it again. These multiple requests occur because the parsing phase uses the batch API. This is why the fetcher just checks whether there's any new mail, and then the parser opens the stream again and parses the mails in the imap stream. If you parse in the fetcher then you'll need to store what's fetched somewhere as opposed to just in memory so that it's not lost between http connections. If there are attachments, you'll also need to store those properly. This is an architecture I contemplated, and I think makes sense to switch to at some point because fetching is now (or will be at some time in the future...I need to catch up on the latest) also batch-able, which will mean that in order to take advantage of this (and it's a good thing if you have mailboxes w/ tons of mail) we'll probably need some kind of intermittent storage before the fetched mails find their final storage as nodes, comments, or other (like simple structures w/ their own db storage).

The parsing phase in the case of mailhandler includes parsing the imap stream in order to build an array of mailbox items which can then be processed. The lines are a little fuzzy. One problem I see with authentication in the fetching phase is that someone could send an email with the command "type: story" and you would not know what type the node is supposed to be turned into until you parse the commands. The sender may have permission to create stories, but not blog posts. I don't have a good idea of how many people depend on this feature, but it is questionable which commands should be supported. I think determination of the node type defined in the mailbox configuration or importer is more common than specification in the mail commands in the email itself. In short, discontinuing the capability to define certain commands in the email could be acceptable to achieve superior architecture.

Comment #7

nally commented 29 September 2010 at 16:48

Comment #8

mark trapp

he/him

English

commented 29 September 2010 at 17:51

One portion of MailHandlerNodeProcessor that I'm currently relying on is _mailhandler_node_hooks to provide hook_mailhandler_node_* so the object data can be modified before the node is saved. Without it, modules like Mailsave will have a hard time acting upon data that Mailhandler doesn't handle, like $object->mimedata.

It's possible for modules to provide their own implementation of FeedsProcessor to process parsed data, but that would only allow one module at a time to act on the parsed data (you can only select one processor per importer).

If nothing else, it'd be really useful if there was a bare-bones MailHandlerNodeProcessor that provided a hook for dependent modules.

Comment #9

mark trapp

he/him

English

commented 29 September 2010 at 23:52

Actually, I retract my previous request to keep MailhandlerNodeProcessor: the functionality needed can easily be done in hook_feeds_after_parse() instead of Mailhandler's custom hooks. See #927014: Provide e-mail attachments as FeedEnclosure objects to support FileField mappings for implementation.

Comment #10

Ian Ward commented 30 September 2010 at 13:29

@Mark Trap I didn't see hook_feeds_after_parse until now. To recap, I think if MailhandlerNodeProcessor can subclass FeedsNodeProcessor without losing much functionality then that's a good goal. Batch fetching and parsing will be backported http://drupal.org/node/744660 but look like they'll live in a feeds 6.2. I think support for batch fetching can be a later goal. So in sum:

* Switch MailhandlerNodeProcessor to subclass FeedsNodeProcessor
* What happens to MailhandlerNodeProcessor::commands()? This alters the node object before save.
* Need to maintain capability which _mailhandler_node_hooks('presave') and _mailhandler_node_hooks('postsave') provide (alter points on the node object before saved, and after saved)

For the last two, maybe all that's needed is an alter hook after `$node = $this->buildNode($nid, $source->feed_nid);` in FeedsNodeProcessor::process() ... By this point $node->uid is set to the logged in user (probably 0, or an admin) $node->created is also set to time(). So, an alter at this point may be the ideal place to run:

* MailhandlerNodeProcessor::commands()
* MailhandlerNodeProcessor::authenticate()
* MalihandlerNodeProcessor::save() (does not exist, could take place of _mailhandler_node_hooks('presave') so we could control this running only after commands() and authenticate() have been run

That still leaves open how to deal with comments, which may need another patch, or, we go the two separate importers way as @dagmar suggests (clearner code, but a little extra UI setup).

@all, is this sounding reasonable?

Comment #11

Ian Ward commented 30 September 2010 at 13:42

Another note - mailhandler hook 'postsave' may no longer be needed based on http://drupal.org/node/927014#comment-3512992

Comment #12

mark trapp

he/him

English

commented 30 September 2010 at 18:20

I'm not really familiar with the comment part of MailhandlerNodeProcessor (I don't use Drupal comments), but I've gotten pretty familiar with the node part. This is what I've found:

hook_mailhandler_node_presave() should be replaced with hook_feeds_after_parse(): there really isn't anything a dependent module can do with it since it occurs after MailhandlerNodeProcessor::map().

In fact, most of the code in MailhandlerNodeProcessor::process() can be scrapped once MailhandlerNodeProcessor extends FeedsNodeProcessor. Any custom work on processing each of the sources should really be done in FeedsNodeProcessor::setTargetElement() and/or hook_feeds_set_target() callbacks so that by the time FeedsNodeProcessor::process() is called, there is very little that needs to be done to the actual source fields. And FeedsNodeProcessor::process() already handles preparing and saving the node.

If a module needs to act upon the node object after it's been built, there's hook_feeds_after_import(), which would replace hook_mailhandler_node_postsave().

That leaves hook_mailhandler_node_midsave(), which seems like it could be replaced with common hook_nodeapi operations.

If you ignore comments for a moment, MailhandlerNodeProcessor can be simplified a lot. I wonder if it's worth exploring splitting the two if comments do require the additional complexity.

Comment #13

Ian Ward commented 1 October 2010 at 04:16

Status:

Active

» Needs review

Status	File	Size
new	mailhandlernodeprocessor-refactor-921774-13.patch	30.73 KB

Mark, thanks for pointing those things out. Attached is a patch that moves toward the goal. It may have some bugs, but I have it basically working. The way commands are applied is a little hackish, and FeedsNodeProcessor will need a patch in order to let MailhandlerNodeProcessor set a uid based on the ::authenticate result (it gets overwritten twice from within ::buildNode). This patch also completely removes comment support, which could be tackled after this refactor lands.

Comment #14

skolesnyk commented 14 October 2010 at 14:22

Why not leave this function to Media Mover's mm_node contributed module? It does the same function.

Comment #15

Ian Ward commented 16 October 2010 at 19:28

Status	File	Size
new	mailhandlernodeprocessor-refactor-921774-15.patch	36.62 KB

Attached is a re-roll of the patch in #13. ::process() is reimplemented in order to invoke hook_mailhandler which lets mailcomment work because it can set the type to comment. There's conditional handling for comments. I am leaning toward this processor doing both nodes and comments. If there are two separate processors I think it'll be more confusing to set up in the UI than necessary. Then there's the problem of deleting messages after they're read, so the node processor deletes the mails which should be comments and then the comment processor never gets to them. Solvable but may just lead to even more convoluted code. If anyone's got a cleaner idea for that implementation that'd be great. The main use case I have in mind is using mailhandler + mailcomment to both create new nodes and respond via comments from a single mailbox.

One alternative could be to get a hook that's invoked in FeedsNodeProcessor::process() after map() is called but before the node_save(), which lets $node be modified (basically, what hook_mailhandler does in the attached patch. Once we get to node_save, it's just too late to stop the creation of the node. nodeapi presave is too late. So this patch to FeedsNodeProcessor would optionally bail out on calling the node_save. Whether this is too implmentation-specific is suspect.

I'd like to commit this soon, so please review if you're interested. Thanks.

Comment #16

mark trapp

he/him

English

commented 18 October 2010 at 19:28

Status:

Needs review

» Needs work

There's a lot to this patch, and I'm slowly working my way through it, but I do have some initial feedback:

Firstly, both patches cleanly apply and grant CCK support almost for free. Awesome. The amount of code removed to provide the same and more functionality gives me chills.

Secondly, there's no way to extend the mapping sources MailhandlerParser provides. Feeds provides hook_feeds_parser_sources_alter, which is invoked in FeedParser::getMappingSources(). MailhandlerParser::getMappingSources() just needs to check with its parent class when building the $sources array:

public function getMappingSources() {
  $sources = parent::getMappingSources();
  // ...
}

Finally, the additions you added to the patch in #15 give me a little bit of a pause in that they assume only one module would want to handle the submitted data via hook_mailhandler in MailhandlerNodeProcessor::process():

foreach (module_list() as $name) {
  if (module_hook($name, 'mailhandler')) {
    $function = $name .'_mailhandler';
    if (!($node = $function($node, $item))) {
      // Exit if a module has handled the submitted data.
      break;
    }
  }
  // ...
}

Once one module gets it (whether it's comments or another module), no other modules can touch it. I like the hook here, but what about module_invoke_all() instead?

And I'm struggling to identify the use case where you'd want both node creation and comment creation handled by the same importer: it seems like, given Feeds workflow, choosing how you want to process the data (either you can create nodes with it, or you can create comments with it) is pretty standard, and Mailhandler ought to conform to that. I'd rather see an extension of FeedsNodeProcessor than the conditional handling.

I have some time in the next few days to test a few other use cases for mailhandler_node, but they don't hold up the patch in #13 at least, and I'm on-board with those changes given the change to MailhandlerParser::getMappingSources() above.

Comment #17

Ian Ward commented 19 October 2010 at 13:17

Mark, thanks for the review. I'll fix `MailhandlerParser::getMappingSources()`. With hook_mailhandler "Exit if a module has handled the submitted data." this would just exit if the implementation of hook_mailhandler does not return $node and the implementation should pass $node by reference.

For the use case of making nodes and comments from the same processor, the user story I have in mind is the following - you have a drupal site w/ mailhandler and mail comment. You allow people to post new posts to the blog using mailhandler, which uses the tokenauth authentication plugin for increased security. People are also subscribed to receive email notifications from the blog. The notifications run through mailcomment module and get a special signature which contains info that if someone replies to the notification, the signature contains threading information. The reply is read by the mailhandler importer, mailcomment implements hook_mailhandler which alters $node so that $node->type is 'comment' and adds parent id information, and the message gets posted as a comment. In this scenario, a message sent to the mailbox could either become a new blog post, or a comment on an existing blog post.

Now, to do this with two different processors, I guess there would need to be a few assumptions taken by the node processor and comment processor:

* Node processor does not process messages which contain in-reply-to information
* Comment processor does not process messages which *do not* contain in-reply-to information
* Neither processor should mark messages for deletion if they're not actually processed. This would need to be handled in the parser which reads the imap stream.
* If someone wants the mailcomment workflow with ability to also create new posts (not just comments) they'll need to setup two processor, but, mailcomment could ship with two importers (the site builder would just need to setup a mailbox and assign the importers to that mailbox)

I actually like this design too, thinking about it more. I only wonder whether these assumptions are all safe, which they seem to be. Worst case, how to handle a message if it contains or does not contain in-reply-to headers could be a setting on each importer.

Does this sound sane?

Comment #18

mark trapp

he/him

English

commented 19 October 2010 at 22:14

Thanks for the explanation, Ian. I now understand the hook_mailhandler() conditional, and I'm good with that.

The user story you provided seems a little too like magic: I'm of the mind to go with the assumptions you are making in the second part of your post. This way, the user explicitly defines what it wants Mailhandler to do. This would also allow modules to extend MailhandlerNodeProcessor without having to inherit the comment-handling logic.

This specific assumption:

Neither processor should mark messages for deletion if they're not actually processed. This would need to be handled in the parser which reads the imap stream.

Should be in Mailhandler regardless of anything else: I could've sworn it was.

Providing default importers would be great to hide some of the complexity to new users setting up the importers they need.

Comment #19

Ian Ward commented 20 October 2010 at 13:12

Status	File	Size
new	mailhandlernodeprocessor-refactor-921774-19.patch	33.9 KB

Attached is a patch w/ the fix to getMappingSources(). I also removed the process() override (which is great). Having a processor that just does nodes and one that just does comments means that hook_node_presave and hook_comment_presave are likely all that's needed, so the hook_mailhandler would not be needed which was the only reason to override process() (because it basically halted the node creation and switched to comment creation which is something you could not just do w/ hook_node_presave because by then it was too late to stop the node submission). This patch should go in and then we can deal w/ comment processing on another issue. That same issue can cover how to differentiate a node from a comment in mail headers and associated settings.

Comment #20

mark trapp

he/him

English

commented 21 October 2010 at 00:38

Status:

Needs work

» Reviewed & tested by the community

Patch works and applies cleanly. I tested it with Mailhandler Attachments and a few other things, and everything's working. Can't wait to see it committed.

Comment #21

Ian Ward commented 21 October 2010 at 13:17

Status:

Reviewed & tested by the community

» Fixed

Committed. Thanks Mark and Dagmar.

Comment #22

dagmar

Spanish

Argentina

commented 21 October 2010 at 21:46

Awesome! I have tested this a bit late, but it works fine. I have posted a follow up here http://drupal.org/node/927654#comment-3606850 (wich fix the @TODO in the committed patch)

Comment #23

4 November 2010 at 21:50

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Comment #24

Ian Ward commented 7 January 2011 at 23:44

We discussed how to handle comments briefly on this thread. If you're interested in those decisions please see http://drupal.org/node/947266 ... @dagmar I'm thinking your idea here http://drupal.org/node/921774#comment-3495476 seems the cleanest overall, without much sacrifice to usability.

Support CCK creation using Feeds and Mailhandler

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Comment #16

Comment #17

Comment #18

Comment #19

Comment #20

Comment #21

Comment #22

Comment #23

Comment #24

News items

Our community

Documentation

Drupal code base

Governance of community