Tweets containing special characters, such as Spanish tildes, are displayed incorrectly in the tweet contents field. They display correctly in the title, though.

Removing the utf8_encode() call on line 786 of tweet_feed.module appears to fix it; the contents now display properly.

Lines 784 to 788 in tweet_feed.module:

  // The tweet itself goes into the tweet contents field
  $node->field_tweet_contents[$node->language][0] = array(
    'value' => utf8_encode(htmlspecialchars_decode($tweet_html)),
    'format' => 'full_html',
  );

With the utf8_encode() call removed, tweets import correctly:

  // The tweet itself goes into the tweet contents field
  $node->field_tweet_contents[$node->language][0] = array(
    'value' => htmlspecialchars_decode($tweet_html),
    'format' => 'full_html',
  );
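
A likely explanation, assuming $tweet_html is already UTF-8 when it reaches this line: utf8_encode() treats its input as ISO-8859-1, so every byte of a multi-byte character gets encoded a second time. The sketch below illustrates this with mb_convert_encoding(), which performs the same ISO-8859-1 to UTF-8 conversion. The helper tweet_feed_example_ensure_utf8() is hypothetical and not part of the module; it only shows one way to convert text when it is not already valid UTF-8.

```php
<?php
// "ñ" encoded as UTF-8 is the two bytes C3 B1.
$already_utf8 = "\xC3\xB1";

// Same conversion utf8_encode() performs: read each byte as ISO-8859-1
// and write it back out as UTF-8.
$double_encoded = mb_convert_encoding($already_utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($already_utf8), "\n";   // c3b1
echo bin2hex($double_encoded), "\n"; // c383c2b1 (renders as "Ã±")

/**
 * Hypothetical helper: convert to UTF-8 only if the text is not valid UTF-8.
 */
function tweet_feed_example_ensure_utf8($text) {
  if (mb_check_encoding($text, 'UTF-8')) {
    // Already valid UTF-8; converting again would double-encode it.
    return $text;
  }
  // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
  return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}
```

This requires the mbstring extension.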

Comments

codibit created an issue. See original summary.

cebasqueira’s picture

Assigned: Unassigned » cebasqueira
cebasqueira’s picture

Assigned: cebasqueira » Unassigned
Status: Active » Needs review
FileSize
509 bytes
codibit’s picture

I've been testing without utf8_encode(), and it now fails when it encounters an emoji character. The database needs to be migrated to 4-byte UTF-8 (utf8mb4) to store these, or, I guess, emojis need to be filtered out or replaced. (I used the utf8mb4_convert drush module.)
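
For the "filter out" option, here is a minimal sketch, assuming the input is valid UTF-8. It removes every character above U+FFFF, which are the ones that need 4 bytes in UTF-8 and that a 3-byte MySQL utf8 column rejects. The function name is hypothetical and is not part of the module.

```php
<?php
/**
 * Hypothetical helper: strip characters that need 4 bytes in UTF-8.
 */
function tweet_feed_example_strip_4byte($text) {
  // The /u modifier makes PCRE treat pattern and subject as UTF-8.
  $stripped = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);
  // preg_replace() returns NULL on error (for example, invalid UTF-8 input);
  // return the text unchanged in that case.
  return $stripped === NULL ? $text : $stripped;
}

// U+1F600 is F0 9F 98 80 in UTF-8; "ñ" (C3 B1) must survive.
echo tweet_feed_example_strip_4byte("ma\xC3\xB1ana \xF0\x9F\x98\x80");
```

Note that some emoji-like symbols sit below U+10000 (for example U+263A) and are 3 bytes long, so they pass through this filter and are stored without trouble.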

Grimreaper’s picture

Hello,

Swapping the order of htmlspecialchars_decode() and utf8_encode() solved the problem for me.

Here is a patch for that.

Thanks for the review.

PaulDinelle’s picture

Since this is somewhat related, I also noticed that mention names were not being stripped from the contents the way they are from the title, so I have included that here in addition to @Grimreaper's fix. This was patched against 7.x-3.x-dev so that it can be merged more easily.

ElusiveMind’s picture

I have to play catch-up here. I also need to deal with the now-unsupported OAuth module. I will likely be working on updates to the module this weekend. I'll try to have something ready for testing Sunday night with additions in it. Sorry for the delays.

rsmylski’s picture

I was trying this out, hoping it would help clean up some tweets that were getting garbled during import. I found that utf8_encode() was mangling characters such as ’. I added a fix I had applied in a different project: converting such characters to their ASCII equivalents.
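
A rough sketch of that kind of conversion, assuming UTF-8 input. This is not the code from the attached patch; the function name and the choice of characters in the map are illustrative only.

```php
<?php
/**
 * Hypothetical helper: replace common "smart" punctuation with ASCII.
 */
function tweet_feed_example_ascii_punctuation($text) {
  // Keys are written as UTF-8 byte sequences so the source file's own
  // encoding does not matter.
  $map = array(
    "\xE2\x80\x98" => "'",   // U+2018 left single quotation mark
    "\xE2\x80\x99" => "'",   // U+2019 right single quotation mark
    "\xE2\x80\x9C" => '"',   // U+201C left double quotation mark
    "\xE2\x80\x9D" => '"',   // U+201D right double quotation mark
    "\xE2\x80\x93" => '-',   // U+2013 en dash
    "\xE2\x80\x94" => '-',   // U+2014 em dash
    "\xE2\x80\xA6" => '...', // U+2026 horizontal ellipsis
  );
  // strtr() with an array replaces every key found in a single pass.
  return strtr($text, $map);
}

echo tweet_feed_example_ascii_punctuation("It\xE2\x80\x99s here\xE2\x80\xA6");
```

The trade-off is that the original typography is lost, but the result can be stored regardless of the table's character set.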

ElusiveMind’s picture

Could this be solved via the UTF-8 fix in Drupal and MySQL? Granted, it requires MySQL 5.5.3 or higher. I will be submitting a new dev release that may fix this and the tilde/emoji issue.
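
For reference, a sketch of what enabling 4-byte UTF-8 looks like in settings.php, assuming Drupal 7.50 or later on MySQL. The connection values are placeholders. Existing tables still have to be converted separately, for example with the utf8mb4_convert drush command mentioned earlier in this thread.

```php
<?php
// settings.php: database name, user, password and host are placeholders.
$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => 'databasename',
  'username' => 'username',
  'password' => 'password',
  'host' => 'localhost',
  // Use 4-byte UTF-8 so characters above U+FFFF (such as emoji) can be stored.
  'charset' => 'utf8mb4',
  'collation' => 'utf8mb4_general_ci',
);
```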

  • ElusiveMind committed 04f7da0 on 7.x-3.x
    Issue #2765215 by cyclone321: Drush Commands To Import Tweets
    Issue #...
ElusiveMind’s picture

There is a test fix for this in the dev branch. I did not use the patch, but rather leveraged Drupal's support for UTF-8.

ElusiveMind’s picture

I have taken Patch #8 and made support for UTF-8 multibyte conditional. This means that if the user has UTF-8 multibyte tables, the module will take advantage of them. Otherwise, it will fall back to the patch's logic.

  • ElusiveMind committed fe66deb on 7.x-3.x
    Issue #2828756 by cebasqueira, PaulDinelle, Grimreaper, rsmylski,...
ElusiveMind’s picture

Status: Needs review » Reviewed & tested by the community