Sometimes Drupal returns HTML entities, and blueprint_trim_text() doesn't account for them. If the text is cut in the middle of an entity, you can end up with something like <meta content="bla blu blub &nbsp" /> in your template. That makes the document invalid, because the entity is not properly terminated (it is missing its closing semicolon).

To prevent this, we have to decode all HTML entities before the text is trimmed. The following should do the trick:

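  // decode entities, then remove any HTML or line breaks so these don't appear in the text
  // "\r\n" must come first: str_replace() applies the search values in order
  $text = trim(str_replace(array("\r\n", "\n", "\r"), ' ', strip_tags(html_entity_decode($text, ENT_QUOTES, 'UTF-8'))));

Additionally, I replace \r\n as well. It has to be listed before \n and \r, because str_replace() works through the search array in order; otherwise \r\n would already have been turned into two spaces.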
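A minimal standalone sketch of both points, runnable with plain PHP 7 or later outside Drupal. decode_and_flatten() is a hypothetical name used only for illustration, the sample string is made up, and the 17-character cut merely stands in for the truncation that blueprint_trim_text() performs.

```php
<?php
// Hypothetical helper (not part of Blueprint): the same steps as the cleanup
// line above, with "\r\n" listed first in the search array.
function decode_and_flatten(string $text): string {
  // Decode entities first, so &nbsp; or &quot; can no longer be cut in half later.
  $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  $text = strip_tags($text);
  // str_replace() works through the search array in order, so "\r\n" goes first.
  $text = str_replace(array("\r\n", "\n", "\r"), ' ', $text);
  return trim($text);
}

$raw = "<p>bla blu blub&nbsp;more</p>\r\nnext";

// Truncating the still-encoded text leaves a dangling entity: "bla blu blub&nbsp"
$naive = substr(strip_tags($raw), 0, 17);

// After decoding there is no entity left to cut (&nbsp; is now a UTF-8 non-breaking space).
$clean = decode_and_flatten($raw);

// Order matters: "\r\n" listed last becomes two spaces, listed first becomes one.
$wrongOrder = str_replace(array("\n", "\r", "\r\n"), ' ', "a\r\nb"); // "a  b"
$rightOrder = str_replace(array("\r\n", "\n", "\r"), ' ', "a\r\nb"); // "a b"

var_dump($naive, $clean, $wrongOrder, $rightOrder);
```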

Comments

designerbrent

Status: Active » Fixed

Thanks for this suggestion. I've committed it, and it should show up in the download within 24 hours.

http://drupal.org/cvs?commit=382450

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

VisualFox

Hi, sorry to bump this thread. This is a quick fix I needed to remove double quotes and to keep correct spacing between words after strip_tags() has done its job. I am using the 1.x branch.

It's a little ugly, but it fixed my problem with the meta description.

/**
 * Trim a post to a certain number of characters, removing all HTML.
 */
function blueprint_trim_text($text, $length = 150) {
  // Replace line breaks and tabs with a space.
  $pat[0] = "/[\n\r\t]/";
  // Remove every character that is not alphanumeric, whitespace, ?, ! or .
  $pat[1] = "/[^0-9A-Za-z\!\?\.\s]/";
  // Collapse runs of whitespace into a single space.
  $pat[2] = "/\s\s+/";

  $rep[0] = " ";
  $rep[1] = "";
  $rep[2] = " ";

  // str_replace() is ugly, but needed to keep a space between words that were
  // only separated by tags once strip_tags() has done its job.
  // The outer trim() removes any space left at the edges after the character filter.
  $text = trim(preg_replace($pat, $rep, strip_tags(str_replace("><", "> <", html_entity_decode($text, ENT_QUOTES, 'UTF-8')))));

  if (strlen($text) > $length) {
    $text = trim(substr($text, 0, $length));

    // Check whether the last character is non-alphanumeric (other than ? ! and .);
    // if it is, strip it off so you don't get strange-looking titles.
    $lastchar = substr($text, -1, 1);
    if (preg_match('/[^0-9A-Za-z\!\?\.]/', $lastchar)) {
      $text = substr($text, 0, -1);
    }

    // ? ! and . are OK to end a title with, since they make sense.
    if ($lastchar == '!' || $lastchar == '?' || $lastchar == '.') {
      return $text;
    }

    $text .= '...';
  }

  return $text;
}

FYI: I didn't actually modify Blueprint's template.php, but my subtheme's. I call this new function in sub_theme_preprocess_page(). If needed, I can provide a working example.
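The cleanup part of the function above can be tried outside Drupal with a minimal sketch like this one (plain PHP 7 or later). clean_for_meta() is a hypothetical name; it repeats only the decode, tag-spacing and regex steps and leaves out the length handling. The sample HTML is made up.

```php
<?php
// Hypothetical helper (not part of Blueprint): repeats only the cleanup
// steps of the patched blueprint_trim_text(), without the length handling.
function clean_for_meta(string $text): string {
  $patterns = array(
    "/[\n\r\t]/",             // line breaks and tabs become spaces
    "/[^0-9A-Za-z\!\?\.\s]/", // drop everything except letters, digits, ! ? . and whitespace
    "/\s\s+/",                // collapse runs of whitespace
  );
  $replacements = array(' ', '', ' ');

  $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  // Put a space between adjacent tags so text from separate elements stays apart.
  $spaced = str_replace('><', '> <', $decoded);
  return trim(preg_replace($patterns, $replacements, strip_tags($spaced)));
}

$html = '<h2>Title</h2><p>Say &quot;hello&quot;!</p>';

// Without the "><" trick, the heading and paragraph run together: TitleSay "hello"!
$without = trim(strip_tags(html_entity_decode($html, ENT_QUOTES, 'UTF-8')));
// With it, the words stay apart and the quotes are gone: Title Say hello!
$with = clean_for_meta($html);

echo $without, "\n", $with, "\n";
```

The character filter is deliberately blunt: it also removes commas, apostrophes and any non-ASCII letters (clean_for_meta("caf\xC3\xA9"), i.e. "café", returns "caf"), so it may be too aggressive for sites with non-English content.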