1. Node contains legal and valid html
2. Teaser is extracted as N first words
All the open tags are not closed and page formatting is messed up.

I found the bug when I was moving (copy-pasting) my Blogger blog to Drupal. Blogger's posts often include &ltdiv xmlns="http://www.w3.org/1999/xhtml"&gt. Not closing this tag moves the right column to the bottom, below the main column

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

webchick’s picture

Priority: Critical » Normal

Not critical imo -- you can always go in manually and close the tags. Although it is inconvenient.

Artem’s picture

To me it is critical, because a single user can easily mess up the whole cite without even using any cool hacker technique. Drupal is a "community plumbing" tool and not only "your posts" are published.

Mirrorball’s picture

Version: 4.7.0-beta6 » 4.7.0

I also think this bug is critical. No site should break because a user posted perfectly well-formed HTML.

Mirrorball’s picture

There is this function from Wordpress, which could be used to solve this.

/*
 balanceTags
 
 Balances Tags of string using a modified stack.
 
 @param text      Text to be balanced
 @return          Returns balanced text
 @author          Leonard Lin (leonard@acm.org)
 @version         v1.1
 @date            November 4, 2001
 @license         GPL v2.0
 @notes           
 @changelog       
 ---  Modified by Scott Reilly (coffee2code) 02 Aug 2004
             1.2  ***TODO*** Make better - change loop condition to $text
             1.1  Fixed handling of append/stack pop order of end text
                  Added Cleaning Hooks
             1.0  First Version
*/
function balanceTags($text, $is_comment = 0) {
	
	if ( get_option('use_balanceTags') == 0)
		return $text;

	$tagstack = array(); $stacksize = 0; $tagqueue = ''; $newtext = '';

	# WP bug fix for comments - in case you REALLY meant to type '< !--'
	$text = str_replace('< !--', '<    !--', $text);
	# WP bug fix for LOVE <3 (and other situations with '<' before a number)
	$text = preg_replace('#<([0-9]{1})#', '&lt;$1', $text);

	while (preg_match("/<(\/?\w*)\s*([^>]*)>/",$text,$regex)) {
		$newtext .= $tagqueue;

		$i = strpos($text,$regex[0]);
		$l = strlen($regex[0]);

		// clear the shifter
		$tagqueue = '';
		// Pop or Push
		if ($regex[1][0] == "/") { // End Tag
			$tag = strtolower(substr($regex[1],1));
			// if too many closing tags
			if($stacksize <= 0) { 
				$tag = '';
				//or close to be safe $tag = '/' . $tag;
			}
			// if stacktop value = tag close value then pop
			else if ($tagstack[$stacksize - 1] == $tag) { // found closing tag
				$tag = '</' . $tag . '>'; // Close Tag
				// Pop
				array_pop ($tagstack);
				$stacksize--;
			} else { // closing tag not at top, search for it
				for ($j=$stacksize-1;$j>=0;$j--) {
					if ($tagstack[$j] == $tag) {
					// add tag to tagqueue
						for ($k=$stacksize-1;$k>=$j;$k--){
							$tagqueue .= '</' . array_pop ($tagstack) . '>';
							$stacksize--;
						}
						break;
					}
				}
				$tag = '';
			}
		} else { // Begin Tag
			$tag = strtolower($regex[1]);

			// Tag Cleaning

			// If self-closing or '', don't do anything.
			if((substr($regex[2],-1) == '/') || ($tag == '')) {
			}
			// ElseIf it's a known single-entity tag but it doesn't close itself, do so
			elseif ($tag == 'br' || $tag == 'img' || $tag == 'hr' || $tag == 'input') {
				$regex[2] .= '/';
			} else {	// Push the tag onto the stack
				// If the top of the stack is the same as the tag we want to push, close previous tag
				if (($stacksize > 0) && ($tag != 'div') && ($tagstack[$stacksize - 1] == $tag)) {
					$tagqueue = '</' . array_pop ($tagstack) . '>';
					$stacksize--;
				}
				$stacksize = array_push ($tagstack, $tag);
			}

			// Attributes
			$attributes = $regex[2];
			if($attributes) {
				$attributes = ' '.$attributes;
			}
			$tag = '<'.$tag.$attributes.'>';
			//If already queuing a close tag, then put this tag on, too
			if ($tagqueue) {
				$tagqueue .= $tag;
				$tag = '';
			}
		}
		$newtext .= substr($text,0,$i) . $tag;
		$text = substr($text,$i+$l);
	}  

	// Clear Tag Queue
	$newtext .= $tagqueue;

	// Add Remaining text
	$newtext .= $text;

	// Empty Stack
	while($x = array_pop($tagstack)) {
		$newtext .= '</' . $x . '>'; // Add remaining tags to close
	}

	// WP fix for the bug with HTML comments
	$newtext = str_replace("< !--","<!--",$newtext);
	$newtext = str_replace("<    !--","< !--",$newtext);

	return $newtext;
}
Mirrorball’s picture

If anyone wants to use the above function to fix the teasers (it works as far as I can tell), just call it in a hook_nodeapi function:

function hook_nodeapi(&$node, $op, $teaser = NULL, $page = NULL) {
  switch ($op) {
    case 'view':
      $node->teaser = balanceTags($node->teaser);
      break;
  }
}

But delete the first two lines of balanceTags because it's for WP only.

Artem’s picture

Should we raise the priority back to critical?

profix898’s picture

For a simpler algo look at Superteaser module, function superteaser_close_tags().
http://cvs.drupal.org/viewcvs/drupal/contributions/modules/superteaser/

Mirrorball’s picture

Version: 4.7.0 » x.y.z
Artem’s picture

There was a similar bug in Flexinode teasers and they managed to solve it somehow.

pwolanin’s picture

I found this related code, though I haven't had a chance to test it yet:

http://www.iamcal.com/publish/articles/php/processing_html_part_2/

Some other related forum/issue posts:

http://drupal.org/node/22833
http://drupal.org/node/28211
http://drupal.org/node/37048

I agree that this is a major usability issue, but I think a tag balancing algorithm should be applied to *all* text where HTML is allowed. Again, I want my site to be community friendly, and I think that allowing a user submitting a comment to easily accidentally mis-format all following comments , content, etc. does not support this goal.

At this post: http://drupal.org/node/41503

chx marked the request to close all tags as "won't fix" and points to the HTML corrector module. However, this module is not available for 4.7 (or at least not tagged as such):

http://drupal.org/node/3254/release

jadwigo’s picture

Project: Drupal core » HTML corrector
Version: x.y.z » master
Component: node.module » Code

the HTML corrector module is not tagged as 4.7 but at least the cvs version works on my homepage running 4.7..

pwolanin’s picture

Title: Tags are not closed, when teaser is extracted » Tags are not closed when teaser is extracted
Project: HTML corrector » Drupal core
Version: master » x.y.z
Component: Code » node.module

Changing this back to a Drupal issue. As far as I know, the HTML corrector module only fixes the node body, not the teaser (please correct me if I'm wrong). I also think this issue needs to be fixed in core for at least the teasers, since this teaser problem affects all sites.

JohnG-1’s picture

vote +1 to fix at core

website content legibility is a critical issue.

Steven’s picture

The only way to fix it is in the XSS filter, on output. When extracting the teaser, we have no idea of what format the content is in.

adixon’s picture

Status: Active » Needs review
FileSize
2.87 KB

Yes, I think this is worth fixing in core. I've had the experience of getting the whole site mysteriously mangled because of unclosed div's in teasers. It's probably mainly a problem when using tinymce or pasting in html from a word document, for example, but I think this is pretty mainstream unfortunately...

I've attached a simple patch to node.module that just attaches a renamed '_htmlcorrector_process' function and calls it whenever the node module is auto-generating teasers. I used it instead of the word press one because it's more drupalish in code style, though it looks pretty similar at it's heart.

It doesn't apply it for cases where a poster is specifically using a command (assuming they'll know what they're doing?), but it could with a small edit.

I don't think this patch should be applied as is to core, but I'm putting it here for those that want an interim solution, and to get the attention of those with more responsibility and power than time ...

I think it is worth moving the html corrector process function into one of the drupal include files and then it could be accessed here and in the html corrector module.

adixon’s picture

Hey Steven:

Well, since you're the author of the htmlcorrector module, you oughta know - and thanks for your code that i renamed above ...but I'm curious about your comment - why do we need to know about the format? i.e., what's the risk in closing them regardless? Having to filter all the teasers on output sounds like a heavy code load.

profix898’s picture

I havent looked at the code too closely yet, but shouldnt that html_corrector thing run only if html is selected for input format? We dont need to correct any tags for plain text input, right?

Steven’s picture

Attempting to close HTML tags in a non-HTML format can result in the text becoming damaged or garbled. This is unacceptable.

As for only doing it for the "HTML format"... there is actually no filter that uniquely signals HTML. You could just as well add in a Textile filter first. Core cannot know this.

This is why the closing of HTML tags can only happen on output, during the filtering process. At that point, we are sure we are dealing with HTML, and not some other format.

pwolanin’s picture

I need to look at the code, but I assume that it is th filtered/HTML output form of the teaser that gets stored in the database?

adixon’s picture

Okay, I understand now why we shouldn't be mangling our teaser within core, but I do think this problem merits a solution...

The problem is really about a certain use of drupal which is common and even recommended by a user interface cohort of mine (really, I'm not making this up) - where the default and only input format is html and tinymce or some other wysiwig input is on by default.

So here are three options:

1. put a hook_nodeapi into the tinymce module to clean up the messes that it's enabling.
2. create a separate optional module to do the cleanup.
3. put a hook_nodeapi into the htmlcorrector module to do automatically do the cleanup, with a setting to enable/disable this function (i.e. so you can disable it if you're using textile, though I don't know why you'd use the htmlcorrector module is such a case ...).

I think I like 3. the best, since the cleanup function is already in there (and since chx has identified htmlcorrector as the solution).

pwolanin’s picture

Component: node.module » filter.module
Category: bug » feature
Status: Needs review » Needs work

Ok, after looking at the code from node.module, etc. I see that I was wrong in #19. The teaser is run through check_markup (or maybe not if a module implements hook_view) every time it's viewed. Thus, it seems I was also wrong in #12- I just confirmed for myself that htmlcorrector will correct both the teaser and the body.

So, I think the a solution I would suggest is to make htmlcorrector part of the core filter.module and enable it in any new installation for the default filtered and full HTML input formats. That preserves maximum flexibility, while solving the unclosed tags problems for most installations "out of the box".

Steven’s picture

We do not filter teasers into HTML when extracting them, we keep the original format. But, HTMLcorrector should work just fine already. It is applied to the extracted teaser or the full body on output.

alex_b’s picture

FileSize
3.48 KB

It's one thing to use an add on module to fix the bad input of a user, it's another thing to have the add on module fix the bad output of a core function. I think node_teaser() should be fixed.

I cast the wordpress function mirrorball posted in a quick and dirty module. it works fine for me.

cheers -alex

Boris Mann’s picture

We set default input to Filtered HTML, which does fun stuff like not allow posting of image tags (yet another reason people complain about image support in Drupal).

As part of Filtered HTML, we replace line breaks by default.

I think a setting that is "Automatically close tags" is a valuable, useful function that no one would object to.

adixon’s picture

Sounds like we're still looking for a solution to close this issue.

Thinking again about Steven's comments about not being able to close tags on teasers because we don't know what the format is - yes, that means to me that these filters aren't the right tool for this problem.

Instead, I think that what we need is an entirely separate checkbox on the input filter page that says "fix html in teasers". I'd even vote for making this checked by default for the Full HTML and Filtered HTML in the distributed tables.

I'll think about a patch when time allows unless someone beats me to it.

adixon’s picture

Okay, i've put one together, it was pretty straightforward. My current solution involves:

1. a new field in the filter_formats table:
teaser_close_html tinyint(2) NOT NULL default '0'

2. a couple of modifications to the filter.module to allow a by-filter enabling of the teaser_close_html property.

svn diff filter.module
Index: filter.module
===================================================================
--- filter.module       (revision 212)
+++ filter.module       (working copy)
@@ -21,7 +21,7 @@
   switch ($section) {
     case 'admin/help#filter':
       $output = '<p>'. t('The filter module allows administrators to configure  text input formats for the site.  For example, an administrator may want a filt er to strip out malicious HTML from user\'s comments.  Administrators may also w ant to make URLs linkable even if they are only entered in an unlinked format.')  .'</p>';
-      $output .= '<p>'. t('Users can choose between the available input formats  when creating or editing content.  Administrators can configure which input for mats are available to which user roles, as well as choose a default input format .  Administrators can also create new input formats.  Each input format can be c onfigured to use a selection of filters.') .'</p>';
+      $output .= '<p>'. t('Users can choose between the available input formats  when creating or editing content.  Administrators can configure which input for mats are available to which user roles, as well as choose a default input format .  Administrators can also create new input formats.  Each input format can be c onfigured to use a selection of filters. Administrators can also enable an "Automatically close tags" option.') .'</p>';
       $output .= t('<p>You can</p>
 <ul>
 <li>administer input format permissions and settings  at <a href="%admin-filter s">administer &gt;&gt; input formats</a>.</li>
@@ -439,6 +439,12 @@
       '#description' => module_invoke($filter->module, 'filter', 'description',  $filter->delta),
     );
   }
+
+  $form['teaser_close_html'] = array('#type' => 'checkbox',
+      '#title' => t('Automatically close html tags in excerpts'),
+      '#default_value' => $format->teaser_close_html,
+      '#description' => t('This feature requires the html corrector module to be installed'),
+    );
   $form['submit'] = array('#type' => 'submit', '#value' => t('Save configuratio n'));

   if (isset($format)) {
@@ -520,7 +526,7 @@
   }
   $roles = ','. implode(',', ($form_values['default_format'] ? user_roles() : $ roles)) .',';

-  db_query("UPDATE {filter_formats} SET cache = %d, name='%s', roles = '%s' WHERE format = %d", $cache, $name, $roles, $format);
+  db_query("UPDATE {filter_formats} SET cache = %d, name='%s', roles = '%s', teaser_close_html = %d WHERE format = %d", $cache, $name, $roles, $form_values['teaser_close_html'], $format);

   cache_clear_all('filter:'. $format, true);

3. A small change to node.module:

 svn diff node.module
Index: node.module
===================================================================
--- node.module (revision 218)
+++ node.module (working copy)
@@ -168,11 +168,11 @@
     if (isset($filters['filter/1']) && strpos($body, '<?') !== FALSE) {
       return $body;
     }
+    $format = db_fetch_object(db_query('SELECT * FROM {filter_formats} WHERE format = %d', (integer) $format));
   }
   if ($delimiter !== FALSE) {
-    return substr($body, 0, $delimiter);
+    return _node_teaser_process(substr($body, 0, $delimiter));
   }

   // If we have a short body, the entire body is the teaser.
@@ -187,14 +187,21 @@
   $breakpoints = array('</p>' => 4, '<br />' => 0, '<br>' => 0, "\n" => 0, '. ' => 1, '! ' => 1, '? ' => 1, '。' => 3, '؟ ' => 1);
   foreach ($breakpoints as $point => $charnum) {
     if ($length = strpos($body, $point, $size)) {
-      return substr($body, 0, $length + $charnum);
+      return _node_teaser_process(substr($body, 0, $length + $charnum),$format);
     }
   }

   // If all else fails, we simply truncate the string.
-  return truncate_utf8($body, $size);
+  return _node_teaser_process(truncate_utf8($body, $size),$format);
 }

+function _node_teaser_process($teaser,$format) {
+  if ($format->excerpt_close_html && function_exists('_htmlcorrector_process')) {
+    $teaser = _htmlcorrector_process($teaser);
+  }
+  return $teaser;
+}
+
 function _node_names($op = '', $node = NULL) {
   static $node_names = array();
   static $node_list = array();

I haven't bundled this up as a patch since it's just an idea that I'd like some feedback on. In particular, for a real patch, we'd want that html corrector code moved into core.

Also - i'm getting all the format information for the new _node_teaser_process function, even though i only care about the teaser_close_html value - I'm thinking that there may be other teaser filtering to be done based on the format perhaps? Or even - could filters be extended to operate on the teaser?

Steven’s picture

Um. Filters are the right tool for this problem as they are applied on output. We just need to place it in the HTML formats as a separate item so admins can remove it if needed. I don't see why this needs to be so complicated.

pwolanin’s picture

See #21- essentailly this means including HTML corrector module in core and applying it to the HTML-based formats by default.

I think this is a good idea...

funana’s picture

Version: x.y.z » 4.7.x-dev

html corrector module seems to work fine. It really has to become part of the core!

killes@www.drop.org’s picture

Version: 4.7.x-dev » 6.x-dev

features go into development version

isaac77’s picture

Version: 6.x-dev » 5.x-dev
Category: feature » bug

Sorry, but IMHO this seems like a bug that needs fixing rather than a feature request. Correctly formatted/coded teasers should not wait until drupal 6 as previously suggested. This is a pretty basic function; at the least it should be fixed in the next release of Drupal 5 (if possible).

Note: this is not just an issue for poorly coded content. Correctly coded HTML can still result in an unclosed

tag in the teaser... and this can throw off CSS formatting of the page (a bit).

yched’s picture

Version: 5.x-dev » 6.x-dev

Even if you consider it a bug, then this bug is also present in D6/HEAD, and the proper workflow is still to fix it there, and _then_ possibly maybe backport to D5.

Assigning stuff to D5 won't serve your cause, since it means the thread will get less attention.

mcarbone’s picture

Whatever version this gets fixed on, I hope that it's also tested on the situation I mentioned here: http://drupal.org/node/127076

In that case, it's an escaped character sequence messing things up, rather than a tag.

sun’s picture

Subscribing and agreeing with Steven, this should be a new (core) filter that may be enabled on community sites or sites where (HTML) comments are enabled. D.org issue queues partially suffer from this issue, too.

moonray’s picture

Subscribing.
I would really like a working version of this implemented. It breaks many CSS layouts.
Besides, badly formed HTML offends me on a personal level. ;-)

moonray’s picture

Subscribing.
I would really like a working version of this implemented. It breaks many CSS layouts.
Besides, badly formed HTML offends me on a personal level. ;-)

JohnG-1’s picture

wouldn't it make sense for the teaser to 'break' by default at the first </p> tag ? or the first . full stop ? ... after n lines would be nice but more difficult?

Actually, if you were clever enough, you could probably write a module that would parse the body content onsubmit, and insert the <!--break--> before the first instance of whatever $string (html tag, punctuation mark, linebreak, etc) is specified by admin in the module settings ... (probably per content-type) ... anyone clever enough ?

sun’s picture

Just throwing in another possible cause for this issue:

With CCK a teaser can contain the first body lines and other fields that may be or may be not in HTML format. Each field is able to break a site's layout if tags are not closed. A CCK node does not need to have a body field.

If that would be covered by an input filter, fine. Otherwise we should think of more generic options like filter_xss, like Steven suggested.

dman’s picture

This has been a problem since forever.
Half the time it just messes up validations for no justifiable reason.
The other half it totally screws up somebodys layout, and I have to explain to the newbies that it really wasn't their fault at all.

The problem is enormously amplified when trying to use pure-css themes. They just break that much easier now. Web pundits blame the early browsers for being too forgiving - well, default cropped Drupal teasers should be punished harshly!

I've never actually used html corrector, as my gut feeling was that it was a shortfall that shouldn't happen in the first place, and installing an extra module to fix broken behaviour was ugly.
Boris' point about the kinky things that already happen by default is a good one. The least we can do is unkink a known, bad behaviour in an effort to protect ourselves and everyone else.

I don't care where in the code it happens, but please please lets fix this gaping (old) error.

funana’s picture

Actually, if you were clever enough, you could probably write a module that would parse the body content onsubmit, and insert the

before the first instance of whatever $string (html tag, punctuation mark, linebreak, etc) is specified by admin in the module settings ... (probably per content-type) ... anyone clever enough ?

That's exactly what HTML corrector module does.

But anyways... I absolutely second that this is a MUST for core. Every new Drupal user I know asked me for a solution for that problem...

moonray’s picture

Actually, if you were clever enough, you could probably write a module that would parse the body content onsubmit, and insert the

before the first instance of whatever $string (html tag, punctuation mark, linebreak, etc) is specified by admin in the module settings ... (probably per content-type) ... anyone clever enough ?

What if the first thinb a person enters is a table tag that doesn't close until the end of the body? No, we need to properly close any tag that gets opened.

The other option could be to just strip (most) html.

webchick’s picture

Status: Needs work » Needs review
FileSize
5.15 KB

K, well. Here is a patch that just literally takes the code from htmlcorrector and puts it into core, like we did for urlfilter. I haven't tested it because I'm not sure how to get it to trigger this, but hopefully someone with a test site can.

webchick’s picture

Actually, it does seem to work.. to reproduce, paste the following into a story or whatever (thanks, wikipedia!):

<strong>Mars is the fourth planet from the Sun and is known as the Red Planet due to its reddish appearance as seen from Earth. The planet is named after Mars, the Roman god of war. A terrestrial planet, Mars has a thin atmosphere and surface features reminiscent both of the impact craters of the Moon and the volcanoes, valleys, deserts and polar ice caps of Earth. It has the highest mountain in the solar system, Olympus Mons, and the largest canyon, Valles Marineris. Mars' rotational period and seasonal cycles are also similar to those of the Earth. Of all the planets in our solar system other than Earth, Mars is the most likely to harbor liquid water, and perhaps life. Mars is currently host to three functional orbiting spacecraft: Mars Odyssey, Mars Express, and Mars Reconnaissance Orbiter. This is more than any planet except Earth. The surface is also home to the two Mars Exploration Rovers (Spirit and Opportunity). Geological evidence gathered by these and preceding missions suggests that Mars previously had large-scale water coverage, while observations also indicate that small geyser-like water flows have occurred in recent years. Mars has two moons, Phobos and Deimos, which are small and irregularly shaped. (more...)

moonray’s picture

Looks like it's working properly.

I did get a notice, though. But I'm not sure it was caused by this patch:
notice: Undefined index: default_format in /home/moonray/wwwroot/drupal-6/modules/filter/filter.module on line 498.

moonray’s picture

Just for clarification, I tested it with a large table, and it closed the table cells and rows properly.

yched’s picture

Great - I just suggested going along with HTML corrector in core on the devel ML a few days ago (this got positive feedback, FWIW).

I can't actually test the patch right now, but HTML corrector works quite well as is, so I guess there shouldn't be too many hardships.

One question : it ssems HTML corrector does not fix the case of tags (or HTML entities) that get cut in the middle by teaser break.
Meaning you could get teasers ending with, for instance,

Steven, if you're around : do you know if this has been tempted before in the original HTML corrector module ?

drewish’s picture

trying this out i ran into one small bug, if you feed in invalid html:

Sun and is known as the Red Planet due to its reddish appearance as seen from Earth. The planet is named after Mars, the<em> Roman <strong>god </em>of war. </strong>

you'll get the following warning:

notice: Undefined offset: 0 in modules/filter/filter.module on line 1133.

and the output ends up being:

<p>Sun and is known as the Red Planet due to its reddish appearance as seen from Earth. The planet is named after Mars, the<em> Roman <strong>god </strong></em>of war. </p> Geological evidence gathered by these and preceding missions suggests that Mars previously had large-scale water coverage, while observations also indicate that small geyser-like water flows have occurred in recent years.

the only part that's annoying is that it inserts < /p >

Steven’s picture

FileSize
6.53 KB

It's good to keep htmlcorrector separate from e.g. the XSS filter. That way, even Full HTML nodes will get cleaned up (which are more likely use complex tags). The HTML corrector also doesn't look at all for exploits, so its parser is much simpler and faster. And because it needs to analyze tags for determining the proper nesting, you get free XHTML closing slashes for tags like <img> too. I'm all for getting this into core.

As for the bad closing tag: I tweaked the algorithm so that it will not close mismatched tags on the stack if there is no opening tag to match it up with. In the case above, the original </strong> tag has no matching tag anymore (as it has already been closed earlier), so it is now just discarded instead of closing </p> in search of the strong tag.

Note that the final result is still probably not what the user intended, but the goal is to get correctly nested HTML, not to match user intentions. However, with the latest tweak, we only touch the absolute minimum of tags needed to fix it, and the most common oversights and lazy shortcuts are covered.

Finally, I also added an update, similar to what we did with urlfilter.module, however with an encore. As this is essentially a bugfix for a common problem rather than just an enhancement, it should be installed for users who didn't use htmlcorrector before too.

The update will look for htmlcorrector.module, disable it and replace its filter if found; otherwise it will look for any format with "HTML" in the name and will add the filter at the end. Upgrading from a default 5 to 6 will get you an identical filter setup as a clean 6 install.

Please test the tweaked algorithm.

Steven’s picture

FileSize
6.41 KB

This one has no debug code, but you may want to try the earlier patch for testing.

drewish’s picture

it looks really good to me. the update worked as expected. i tried throwing some other broken html at it and everything seemed to work correctly. the only suggestion i'd make would be setting the default weight to 10. would you really want to be running a filter after this? it seems like a good final cleanup.

yched’s picture

I also tested, it looks great.

Contrary to what I wrote in #46, it seems 'cut off tags' are handled to some extent :
foo<stron is rightly filtered to foo
yet strangely, foo<li does seem to generate an empty list item tag.

For truncated HTML entities, foo&nbs is displayed foo&nbs, but I think there is probably no good fix for this...

webchick’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
6.72 KB

Re-rolled with drewish's weight suggestion. I think either this or #49 are rtbc.

moonray’s picture

I tested the patch on Drupal 5 as well, and it works great with some minor adjustments. A backport should be fairly easy to immplement.

yched’s picture

FileSize
6.27 KB

bump + rerolled.

yched’s picture

Title: Tags are not closed when teaser is extracted » HTML corrector in core

more descriptive title

Dries’s picture

This looks good. We'll want to add an entry to CHANGELOG.txt when this gets committed.

drewish’s picture

FileSize
7.09 KB

added an entry to the changelog

Dries’s picture

Status: Reviewed & tested by the community » Fixed

I re-rolled this patch against CVS HEAD, and committed it. Thanks all.

dman’s picture

Woot! Yay!
That's an old bugbear finally gone. Drupal is now approaching perfection, I can't think of anything I don't like about it now ... ;-)

Artem’s picture

Thank you all guys for fixing it. I didn't even hope that this problem gets *really* fixed anytime this year.

Is there any chance to backport the fix to 4.7 or at least 5.1? After all it is quite a critical problem isn't it?

Best regards,
Artem, this issue reporter

yched’s picture

new features do not get added to released versions, so there's no way this will go in D5 _core_.

But you can download the 'original' HTML corrector module separately, it's exactly the same code as the one that's just been included in D6 - only as a separate module.

Anonymous’s picture

Status: Fixed » Closed (fixed)