(Any regex experts willing to tackle this?)
When users paste text from MS Word documents, you sometimes end up with HTML markup like this:
<a href="mailto:user@example.com"><span>user@example.com</span></a>
By the time spamspan has chewed this you get a pretty ugly result. It comes out looking like this:
userexample [dot] net
and the mailto: href is also badly mangled.
Help!
| Comment | File | Size | Author |
|---|---|---|---|
| #6 | spamspan-embedded_tags-1167084-6.patch | 2.68 KB | vitalie |
Comments
Comment #1
gpk commented
I should add that I have disabled all filters except spamspan.
Comment #2
peterx commented
@gpk, the email address is processed by a regex. Changing a regex is a pain, and there are dozens of possible combinations to handle.
Comment #3
gpk commented
Thanks for working on this module, Peter. This is quite a major bug for me, and I'm a little surprised if it's not affecting others, given that a lot of content must get pasted into websites from word processing apps. It is a bit of a pain having to trawl through the raw HTML cleaning it up, and this is beyond most content creators.
I appreciate a fix might not make it into 6.x but maybe this should be flagged up in 7.x for a regex expert. I have a rough workaround which helps a bit on our site, though it's not really production-ready. Maybe I should post it up here when I get a moment.
Thanks!
Comment #4
peterx commented
@gpk Post it.
Comment #5
peterx commented
Comment #6
vitalie commented
The attached patch should partially fix this: it just strips the tags. It includes the patch for issue #2386967: Link text replaced with email address, since without it testing this very issue becomes problematic.
Keeping the tags needs more work, which I am postponing until the community actually requests it.
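To illustrate the tag-stripping idea only (this is not the attached patch, which targets the module's own code), here is a minimal Python sketch. The pattern names and the `strip_embedded_tags` helper are hypothetical, and it assumes the simple case of a double-quoted `mailto:` href with inline tags nested in the link text:

```python
import re

# Hypothetical pattern: a mailto: anchor, capturing the opening tag and the
# link text separately so that only the link text is cleaned.
MAILTO_ANCHOR = re.compile(
    r'(?P<open><a\s[^>]*href="mailto:[^"]+"[^>]*>)(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
# Any HTML tag, e.g. <span> or </span>.
TAG = re.compile(r'<[^>]+>')


def strip_embedded_tags(html: str) -> str:
    """Remove tags nested inside the link text of mailto: anchors."""
    def _clean(match: re.Match) -> str:
        # Keep the opening <a ...> tag untouched; drop tags from the text.
        text = TAG.sub('', match.group('text'))
        return match.group('open') + text + '</a>'

    return MAILTO_ANCHOR.sub(_clean, html)


markup = '<a href="mailto:user@example.com"><span>user@example.com</span></a>'
print(strip_embedded_tags(markup))
# prints: <a href="mailto:user@example.com">user@example.com</a>
```

Running a cleanup like this before the address-matching regex means that regex only ever sees plain link text. The trade-off is the one noted above: any formatting inside the link is lost.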
Comment #7
vitalie commented
Comment #10
vitalie commented