When I publish using ecto, every apostrophe is returned to ecto as &#039;.

* The escaped code does not appear on the Drupal site.
* The escaped code does not appear on ecto itself ... until it downloads the node back to its local archive.

In other words, the apostrophes are passed back to ecto as &#039;.

This in and of itself may seem like no big deal. However, if I edit the node and re-upload it, I have to search and replace all of the codes first, or they will appear literally as &#039; on the Drupal site itself.
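For anyone trying to reproduce the round trip described above, here is a minimal sketch. It assumes the server-side escaping behaves like PHP's htmlspecialchars() with ENT_QUOTES and that the client stores the returned text without decoding it; the sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical post body as typed in the client.
$typed = "It's done";

// What the client receives back if quotes are escaped on the way out
// (assumed equivalent to htmlspecialchars() with ENT_QUOTES).
$downloaded = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8');

// Re-uploading that text unchanged and escaping it again for display
// turns the leading "&" into "&amp;", so the browser shows the entity
// text itself instead of an apostrophe.
$redisplayed = htmlspecialchars($downloaded, ENT_QUOTES, 'UTF-8');

echo $downloaded, "\n";   // It&#039;s done
echo $redisplayed, "\n";  // It&amp;#039;s done
```

The second line of output is what a browser would render as the literal text &#039;.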

This does not happen with quotation marks, smart quotes or any other character (as far as I know). I went back and forth with Adriaan at ecto about this, and he determined it to be Drupal's doing.

I never noticed this before, and suspect it might have crept in during the xmlrpc-related updates. I should have reported this earlier. My apologies.

Comment   File                     Size        Author
#7        xmlrpc-quotes.patch      624 bytes   walkah
#2        xmlrpc-entquotes.patch   634 bytes   walkah

Comments

laura s’s picture

Version: 4.6.0 » 4.6.3

Changing to correct version, noting that I did not notice this behavior until 4.6.3, perhaps 4.6.2 (not sure on that part). Sorry for the mismarking.

walkah’s picture

Status: Active » Reviewed & tested by the community
File: xmlrpc-entquotes.patch (634 bytes)

Hm... I had thought we fixed this. The error is that xmlrpc.inc uses check_plain (which calls htmlentities() with ENT_QUOTES). While this is good for XSS prevention, it's a bit too strong for XML-RPC. Attached is a patch which fixes the issue (by replacing check_plain with just a raw htmlentities).
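A rough illustration of the difference the quote flag makes, not the patch itself. It uses htmlspecialchars() with the flags passed explicitly, since the default flags changed in PHP 8.1 and a bare call would now escape apostrophes too; the sample string is made up.

```php
<?php
$text = "Laura's <post> & more";

// ENT_QUOTES escapes single quotes as well as double quotes.
$with_quotes = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// ENT_COMPAT escapes double quotes only and leaves apostrophes alone.
$compat = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');

echo $with_quotes, "\n";  // Laura&#039;s &lt;post&gt; &amp; more
echo $compat, "\n";       // Laura's &lt;post&gt; &amp; more
```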

Please apply to both HEAD & DRUPAL-4-6

Dries’s picture

We decided to undo that patch for reasons I can't remember. Maybe it's worth searching for that old issue.

walkah’s picture

I did search for it and couldn't find it... I actually thought we applied this patch.

The issue is that check_plain is HTML-encoding single quotes ('), which is a) unnecessary for XML and b) very confusing for users trying to use blogapi.

Steven’s picture

-1, htmlentities() should never be used because it breaks UTF-8 (and escapes a bunch of stuff unnecessarily). Use htmlspecialchars() instead.
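A small sketch of the UTF-8 breakage, assuming the text is UTF-8 but the function treats it as ISO-8859-1 (the default charset for these functions before PHP 5.4). The charset is passed explicitly here so the result is the same on current PHP; the sample word is made up.

```php
<?php
// "é" in UTF-8 is the two bytes 0xC3 0xA9.
$text = "caf\xC3\xA9";

// htmlentities() converts each byte it believes is a Latin-1 character,
// so the single UTF-8 character becomes two unrelated entities.
$entities = htmlentities($text, ENT_COMPAT, 'ISO-8859-1');

// htmlspecialchars() only touches &, <, > and quotes, so the UTF-8
// bytes pass through unchanged.
$special = htmlspecialchars($text, ENT_COMPAT, 'ISO-8859-1');

echo $entities, "\n";  // caf&Atilde;&copy;
echo $special, "\n";   // café
```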

I still believe that clients which cannot interpret entities are messed up and need to be fixed. It means they are not valid XML parsers, but hackjobs. And XML is very specific about what an XML parser should support.
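To illustrate the point about conforming parsers: a minimal check, assuming PHP's SimpleXML extension is available. The element and its content are made up and only mimic an XML-RPC string value.

```php
<?php
// A string value containing a numeric character reference for an apostrophe.
$xml = '<string>It&#039;s done</string>';

// A real XML parser resolves the reference while parsing.
$element = simplexml_load_string($xml);
$decoded = (string) $element;

echo $decoded, "\n";  // It's done
```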

What happens when you use non-ASCII characters in these broken clients? If they don't handle entities, I assume the program's never even heard of encodings?

Oh and the reason we undid that patch is XSS protection. If we don't escape apostrophes, there might be issues in contrib modules or themes.

Steven’s picture

Status: Reviewed & tested by the community » Needs work

walkah’s picture

File: xmlrpc-quotes.patch (624 bytes)

OK, this issue still needs fixing... I have changed the patch to use htmlspecialchars rather than htmlentities.

A quick check of some UTF-8 characters: they seem to pass OK.

This still needs to be committed. Allow me to quote the XML-RPC spec (http://www.xmlrpc.com/spec):

Any characters are allowed in a string except < and &, which are encoded as &lt; and &amp;. A string can be used to encode binary data.
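Read literally, the quoted rule comes down to something like the sketch below. The function name is hypothetical and this is not the code in xmlrpc.inc or in the attached patch; it only shows the minimum escaping the spec asks for.

```php
<?php
// Hypothetical helper: escape only the two characters the spec names.
function xmlrpc_escape_string_sketch($value) {
  // "&" is replaced first so the "&" in "&lt;" is not escaped a second time.
  return str_replace(array('&', '<'), array('&amp;', '&lt;'), $value);
}

echo xmlrpc_escape_string_sketch("Tom's <b> & co"), "\n";
// Tom's &lt;b> &amp; co
```

Apostrophes and quotation marks come through untouched, which is what the blogging clients expect.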

this issue shows up in a vast number of blogging clients. please commit.

walkah’s picture

Status: Needs work » Reviewed & tested by the community

feeling bold :)

walkah’s picture

Assigned: Unassigned » walkah
Steven’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD. Ho, ho, ho.

Anonymous’s picture

Status: Fixed » Closed (fixed)
sun’s picture

5 years later, I'm going to revert this patch over in #882298: XML-RPC request (string) values are not safe for UTF-8