I can verify, using 6.x-2.x-dev, that commands are stripped out of emails as long as they are sent as plain text. But for emails sent as HTML or rich text (from MS Outlook, at least), the commands are neither processed nor stripped. This is true whether the Feed Importer's Node Processor is set to use an input format of Default, Full HTML, or Filtered HTML.

Comments

danepowell’s picture

Thanks, I'll have to take a closer look at this.

squinternata’s picture

Version: 6.x-2.x-dev » 6.x-1.11

I have the same problem, but with the recommended version, 6.x-1.11.
I am using Outlook 2002 and the commands are not stripped from my emails.
Can someone help me?
Thanks,
A

ilo’s picture

I can't take a look right now, but I'll work on it in the coming days. I don't have Outlook 2002 to test with. In the meantime, if you want, please post the full source of an email sent with Outlook 2002 (with sensitive data removed) so I can try to work out the problem.

Raul Cano’s picture

Hi,
I came across the same problem using Mailhandler 6.x-1.11 and sending my emails from Gmail (with rich text enabled). To solve this, I wrote my own function to parse the commands, which I paste below. I just replaced the original code with mine, though I guess this is not the most orthodox way (suggestions are welcome).
Some considerations:
1.- The commands must now be placed between the tags ##COMMANDS_START## and ##COMMANDS_END##.
2.- Every command, even the default ones, must now be preceded by the character "-" (no quotes).

This is what it would look like:

##COMMANDS_START##
-taxonomy: [term1,term2]
-type: wiki
-og_groups: [307]
-og_public: 0
##COMMANDS_END##
Email text here.

And this is the function I made, replacing the one in the file mailhandler.retrieve.inc:

function mailhandler_commands_parse($body, $sep) {
	// Note: $sep is not used by this replacement.
	$commands = array();
	$start = strpos($body, "##COMMANDS_START##");
	$end = strpos($body, "##COMMANDS_END##");
	// Without both markers, in order, there are no commands: leave the body untouched.
	if ($start === FALSE || $end === FALSE || $end < $start) {
		$lines = explode("\n", $body);
		return array('commands' => $commands, 'lines' => $lines, 'i' => count($lines), 'endcommands' => 0);
	}
	$body_parts = explode("##COMMANDS_START##", $body, 2);
	$body_parts = explode("##COMMANDS_END##", $body_parts[0] . $body_parts[1], 2);
	// Remove every HTML tag in the commands
	$body_parts[0] = strip_tags($body_parts[0]);
	// Splitting on "-" always leaves a first element that is not a command, so it is dropped.
	// Note that a "-" inside a command value also splits that value.
	$aux = explode("-", $body_parts[0]);
	array_shift($aux);
	$endcommands = count($aux);
	$lines = array_merge($aux, explode("\n", $body_parts[1]));
	for ($i = 0; $i < $endcommands; $i++) {
		// Split on the first ":" only, so that values may contain colons
		$words = explode(':', trim($lines[$i]), 2);
		$name = trim($words[0]);
		$value = isset($words[1]) ? trim($words[1]) : "";
		// If the name or the value is empty, the command is wrong, so it is ignored
		if ($name !== "" && $value !== "") {
			$commands[$i] = array($name, $value);
		}
	}
	// Returns an array with the commands
	$res = array('commands' => $commands, 'lines' => $lines, 'i' => count($lines), 'endcommands' => $endcommands);
	return $res;
}

So, as I said, this may be a slightly dirty solution, but it works fine.
I hope it helps.
Have a nice day!

danepowell’s picture

Version: 6.x-1.11 » 6.x-2.x-dev

This will be fixed in 6.x-2.x before 6.x-1.x

danepowell’s picture

Version: 6.x-2.x-dev » 6.x-1.x-dev

I am not able to reproduce this in 6.x-2.x. If you are still having this issue with 6.x-2.x-dev, please post a sample message. Here is the message I used (from gmail - sensitive headers removed):

Content-Type: multipart/alternative; boundary=90e6ba6e8bd666bc8d04af708cd2

--90e6ba6e8bd666bc8d04af708cd2
Content-Type: text/plain; charset=ISO-8859-1

status: 0

testing

-- 
Dane Powell
Graduate Student / Research Assistant
Rice University Mechatronics and Haptic Interfaces Lab
danepowell.com - mahilab.rice.edu

--90e6ba6e8bd666bc8d04af708cd2
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

status: 0<div><br></div><div>testing<br clear=3D"all"><div><br></div>-- <br=
>Dane Powell<br>Graduate Student / Research Assistant<br>Rice University Me=
chatronics and Haptic Interfaces Lab<br><a href=3D"http://danepowell.com" t=
arget=3D"_blank">danepowell.com</a> - <a href=3D"http://mahilab.rice.edu" t=
arget=3D"_blank">mahilab.rice.edu</a><br>

</div>

--90e6ba6e8bd666bc8d04af708cd2--

Anonymous’s picture

Hi all

Same for 7.2.x

:-(

Best regards
ArchGalileu

danepowell’s picture

@ArchGalileu - see my post #6 - please post an example problematic message. Otherwise I have no way of reproducing or troubleshooting this.

MtRoxx’s picture

Version: 6.x-1.x-dev » 7.x-2.0-rc1

I am having this same issue. Below are the headers. When I tested with both "Test" lines unformatted, it worked great. When I made the second "Test" bold, the "tid: 22" command showed up in the node. Any suggestions would be appreciated.

Subject:Fitness Center
From:"My Name"
Date:11/15/2011 3:28 PM
To:toemail@communitywebsite.com
Message-ID:<4EC2E7A3.70103@mywebsite.com>
Reply-To:email@mywebsite.com
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20111105 Thunderbird/8.0
MIME-Version:1.0
Content-Type:multipart/alternative; boundary="------------000002080106040601000109"

tid: 22
Test
Test

danepowell’s picture

Title: Commands are not stripped out if email is sent as HTML or rich text » Commands not processed for some HTML emails

@Mt_Roxx - that is still not helpful... I need the problematic message, i.e. the one where you said "Test" is bold, and I need the entire message including headers. For instance, in gmail, click the arrow to the right of the message and "show original".

At any rate, I have a hunch as to why this is happening. If you don't put the commands on the very first line of the message or if your mail client does screwy stuff like inserting junk HTML before the first line of content, AND your client does not insert newlines but only HTML breaks, then this can happen.

Possible workarounds for you to try...

  • Make sure you are putting commands on the very first line of the message
  • Try sending using different mail clients
  • Change the MIME preference in the Mailhandler Mailbox settings to 'Plain text'

Possible solutions to think about for Mailhandler (none of them pretty...)

  • Strip out HTML tags before processing commands (could be finicky, as mail clients, especially nonconforming ones like Outlook, may insert all sorts of junk before content, not just simple line breaks)
  • Always use the plain text MIME part for command processing (easier, but I don't know if text/plain is always present, and this still leaves the problem of having to strip the commands from the HTML portion of the message)
  • Change the way commands are searched for: instead of searching line-by-line, do a search for all available command strings. Again, this leaves the problem of stripping... we could get rid of the commands, but can't get rid of all the HTML junk around them, which might leave blank lines or who knows what else where the commands used to be
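
To make the hunch and the first idea above concrete, here is a minimal sketch in Python (not Mailhandler's PHP, and not how the module is actually implemented). The names parse_leading_commands and html_to_text are hypothetical. It assumes commands are "name: value" lines at the top of the body, and that turning HTML breaks into newlines before parsing is good enough, which may not hold for messier clients such as Outlook.

```python
import html
import re

# Assumption for this sketch: a command is a single word, a colon, then a value.
COMMAND_RE = re.compile(r"^(\w+):\s*(.+)$")
# A <br> tag or a closing block tag, plus the literal newline that often follows it.
BREAK_RE = re.compile(r"(?:<br\b[^>]*>|</(?:div|p)>)\n?", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def parse_leading_commands(body):
    """Collect 'name: value' lines from the top of body, line by line.

    Parsing stops at the first line that is not a command; the rest is the body.
    """
    commands = {}
    lines = body.split("\n")
    for index, line in enumerate(lines):
        match = COMMAND_RE.match(line.strip())
        if not match:
            break
        commands[match.group(1)] = match.group(2).strip()
    else:
        index = len(lines)
    return commands, "\n".join(lines[index:])


def html_to_text(fragment):
    """Crude HTML-to-text: breaks become newlines, every other tag is dropped."""
    text = BREAK_RE.sub("\n", fragment)
    text = TAG_RE.sub("", text)
    return html.unescape(text)


html_body = (
    '<span style="color: rgb(34, 34, 34);">status: 0</span><br style="x">\n'
    "<span>sala: Teatro Bruto</span><br>\n"
    "<b>Body text</b>"
)

# Line-by-line parsing of the raw HTML finds nothing, because the first line starts with a tag.
print(parse_leading_commands(html_body)[0])
# After converting breaks and stripping tags, the commands are at the start of their own lines.
print(parse_leading_commands(html_to_text(html_body))[0])
```

Note that the second call returns a body with all formatting removed, so this only shows how to find the commands, not how to keep the HTML intact.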

Anonymous’s picture

Hi @Dane Powell

For some reason the taxonomy terms from obras: [bla,bla] are not created, and the body has the * characters rather than the HTML.

Here is my original:

MIME-Version: 1.0
Received: by 10.68.52.226 with HTTP; Mon, 21 Nov 2011 17:40:36 -0800 (PST)
Date: Tue, 22 Nov 2011 01:40:36 +0000
Delivered-To: geral@gasparsantos.eu
Message-ID: <CAMXbdrXN5P7sB696DAhJ-ZHpVcSH9_EsLzSk01Ub7tjSLw5yvA@mail.gmail.com>
Subject: teste drupal
From: "Gaspar Santos, violinista" <geral@gasparsantos.eu>
To: concertos@quartetodouro.eu
Content-Type: multipart/alternative; boundary=bcaec51f9697d3212404b248e3d9

--bcaec51f9697d3212404b248e3d9
Content-Type: text/plain; charset=ISO-8859-1

status: 0
obras: [Pedro Carneiro,Trolita Troliti,Joaquin Rodrigo]
sala: Teatro Bruto
*Este espero que corra bem :-)*
*
*
*Gaspar Santos*

--bcaec51f9697d3212404b248e3d9
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<span class=3D"Apple-style-span" style=3D"color: rgb(34, 34, 34); font-fami=
ly: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 25=
5, 0.917969); ">status: 0</span><br style=3D"color: rgb(34, 34, 34); font-f=
amily: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255,=
 255, 0.917969); ">
<span class=3D"Apple-style-span" style=3D"color: rgb(34, 34, 34); font-fami=
ly: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 25=
5, 0.917969); ">obras: [Pedro Carneiro,Trolita Troliti,Joaquin Rodrigo]</sp=
an><br style=3D"color: rgb(34, 34, 34); font-family: arial, sans-serif; fon=
t-size: 13px; background-color: rgba(255, 255, 255, 0.917969); ">
<span class=3D"Apple-style-span" style=3D"color: rgb(34, 34, 34); font-fami=
ly: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 25=
5, 0.917969); ">sala: Teatro Bruto</span><br style=3D"color: rgb(34, 34, 34=
); font-family: arial, sans-serif; font-size: 13px; background-color: rgba(=
255, 255, 255, 0.917969); ">
<span class=3D"Apple-style-span" style=3D"color: rgb(34, 34, 34); font-fami=
ly: arial, sans-serif; font-size: 13px; background-color: rgba(255, 255, 25=
5, 0.917969); "><b>Este espero que corra bem :-)</b></span><font class=3D"A=
pple-style-span" color=3D"#222222" face=3D"arial, sans-serif"><br>
</font>
<div><span class=3D"Apple-style-span" style=3D"color: rgb(34, 34, 34); font=
-family: arial, sans-serif; font-size: 13px; background-color: rgba(255, 25=
5, 255, 0.917969); "><b><br></b></span></div><div><span class=3D"Apple-styl=
e-span" style=3D"color: rgb(34, 34, 34); font-family: arial, sans-serif; fo=
nt-size: 13px; background-color: rgba(255, 255, 255, 0.917969); "><i>Gaspar=
 Santos</i></span></div>

--bcaec51f9697d3212404b248e3d9--

Best regards
ArchGalileu

danepowell’s picture

Thanks @ArchGalileu, that confirms my suspicions in #10. As a workaround I suggest setting the mailbox to use "plain text" for the node body.
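
For anyone wondering what "use the plain text part" amounts to, here is a rough sketch using Python's standard email package (Mailhandler itself is PHP and does this differently). The function name plain_text_body is hypothetical. It returns None when a message has no text/plain part, which is the case a real implementation would still have to handle.

```python
from email import message_from_string
from email.policy import default


def plain_text_body(raw_message):
    """Return the decoded text/plain part of a raw message, or None if there is none."""
    message = message_from_string(raw_message, policy=default)
    # Only accept a text/plain body; do not fall back to the HTML part.
    part = message.get_body(preferencelist=("plain",))
    if part is None:
        return None
    return part.get_content()
```

With a multipart/alternative message like the ones posted in this issue, the commands then sit on their own lines at the top of the returned text, free of any HTML markup.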

danepowell’s picture

I think the way to implement this is to use the plain text part to find the commands, then search for the same commands in the html part and use an HTML parser to eliminate all of the immediately enclosing tags. It might take some work to get right, but I don't see any better option.

danepowell’s picture

Status: Active » Postponed

Okay, first this needs to happen: #1370096: Remove 'MIME preference' from Mailbox config, change 'Body' mapping source

Then, we can get the commands from the plain-text part, and search for the first occurrences of the same commands in the HTML body and remove them. Finally, for n commands, we can remove the first n line breaks (<br*>).

It's not perfect, but I *think* it will work.
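
A rough sketch of that plan, written in Python for illustration only (it is not the code that went into Mailhandler). The function name strip_commands_from_html is hypothetical, and it assumes the command lines found in the plain-text part appear verbatim in the HTML part, which will not hold if the client escapes characters or wraps words in tags.

```python
import re

# Matches <br>, <br/>, <br clear="all"> and similar.
BR_RE = re.compile(r"<br\b[^>]*>", re.IGNORECASE)


def strip_commands_from_html(html_body, command_lines):
    """Remove the first occurrence of each command, then the first n line breaks."""
    if not command_lines:
        # Nothing to strip; also avoids count=0, which would mean "replace all" in re.sub.
        return html_body
    for line in command_lines:
        html_body = html_body.replace(line, "", 1)
    return BR_RE.sub("", html_body, count=len(command_lines))


html_body = 'status: 0<div><br></div><div>testing<br clear="all"></div>'
print(strip_commands_from_html(html_body, ["status: 0"]))
```

As expected, the enclosing tags are left behind as empty elements, which is the "not perfect" part.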

danepowell’s picture

Component: Code » Mailhandler
Status: Postponed » Active

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.