Hi, I've just installed OpenCalais on my Drupal site. Since I use CKEditor to enter content, all accents and special characters (it's a francophone site) are converted to HTML entities in the node. I'm wondering whether this could have a negative effect on OpenCalais and prevent it from picking up some terms.

Comments

lavamind’s picture

Just to be clear, I'm not getting an error message of any kind.

lavamind’s picture

Component: Code » Documentation

After testing with the help of http://viewer.opencalais.com/ I found that WYSIWYG editors that convert special characters into HTML entities (e.g. é becomes &eacute;) do break things for OpenCalais. Fortunately, the module provides an 'alter' hook that makes it possible to modify the body before it's submitted for analysis:

function mymodule_calais_body_alter(&$body, $node) {
  // $body is passed by reference, so assigning to it is enough; no return value is needed.
  $body = html_entity_decode($body, ENT_NOQUOTES, 'UTF-8');
}

Perhaps this should be documented in the README. I know CKEditor's default behavior is to convert to HTML entities, but I don't know about other editors.
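
For anyone who wants to check what the decode step does outside of Drupal, here is a minimal standalone sketch. The helper name decode_entities_for_analysis() and the sample sentence are made up for illustration; only the html_entity_decode() call is the same as in the hook above.

```php
<?php
// Hypothetical helper (not part of the Calais module): applies the same
// html_entity_decode() call used in the alter hook to a plain string.
function decode_entities_for_analysis($body) {
  // ENT_NOQUOTES leaves encoded quotes such as &quot; untouched;
  // 'UTF-8' makes the decoded characters come out as UTF-8.
  return html_entity_decode($body, ENT_NOQUOTES, 'UTF-8');
}

// Made-up sample of what an entity-encoding editor might save.
$sample = 'Le caf&eacute; de la for&ecirc;t est ferm&eacute; &agrave; No&euml;l, dit &quot;Zo&eacute;&quot;.';

echo decode_entities_for_analysis($sample), PHP_EOL;
```

The accented characters come back as real UTF-8 letters, which is what the text analysis needs, while the encoded quotes are left as they were because of ENT_NOQUOTES.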

febbraro’s picture

Status: Active » Closed (works as designed)

We have not run into WYSIWYG editors breaking Calais processing. Were you getting no tags returned?

I agree, though, that the hook needs some documentation. Will add that shortly.

febbraro’s picture