When very long text is entered in the body of a node and the input format includes the Line break converter filter, the content is not displayed in the View tab, although it is still available in the Edit tab.

This seems to happen inside the "_filter_autop" function.

Steps to reproduce:
1- Go to Create content > Page (or any other node type)
2- Enter a title
3- For body, enter a text that is longer than 40000 characters
4- Submit
5- Now in the view tab, body is not displayed

If you edit the body and shorten the content (to under 30000 characters), it is displayed again.

My configurations:
- Windows XP
- Apache 2.2.3
- PHP 5.2.0
- Drupal 5.1

Comments

chx’s picture

Priority: Critical » Normal
Status: Active » Postponed (maintainer needs more info)

Hardly critical. I would like to see this reproduced on various OSes and PHP versions before trying to find the regex among the many which runs out (if it's indeed autop).

Cherrr’s picture

I have the same problem as described. This is a very disagreeable bug. I traced the problem on a FreeBSD dedicated server. On my home Windows 2000 PC (with Apache, PHP, MySQL) everything works fine.

Cherrr’s picture

The problem is not in the Drupal filter module but in the PHP settings.
Find and uncomment these lines in php.ini:
;pcre.backtrack_limit=100000
;pcre.recursion_limit=100000
then set them to
pcre.backtrack_limit=1000000
pcre.recursion_limit=1000000
for example.
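
A minimal sketch of why the body comes out empty instead of producing an error, assuming PHP 5.2.0 or later: when a PCRE limit is hit, preg_replace() returns NULL, and the reason is only visible through preg_last_error(). The helper name checked_preg_replace() is hypothetical and is not part of Drupal.

```php
<?php
// Hypothetical helper (not Drupal code): run preg_replace() and report a
// PCRE limit failure instead of silently passing NULL along, which would
// later print as an empty body.
function checked_preg_replace($pattern, $replacement, $subject) {
  $result = preg_replace($pattern, $replacement, $subject);
  if ($result === NULL) {
    $error = preg_last_error();
    if ($error === PREG_BACKTRACK_LIMIT_ERROR) {
      return array(FALSE, 'pcre.backtrack_limit exceeded');
    }
    if ($error === PREG_RECURSION_LIMIT_ERROR) {
      return array(FALSE, 'pcre.recursion_limit exceeded');
    }
    return array(FALSE, 'PCRE error code ' . $error);
  }
  // Success: the second element is the replaced text.
  return array(TRUE, $result);
}
```

Calling it as list($ok, $value) = checked_preg_replace(...) gives either the filtered text or a short reason, which can help to find out which regex is running out.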

artem_sokolov’s picture

Version: 5.1 » 5.3

Confirming this issue on Drupal 5.3 (PHP 5.2.4)
The #3 recipe has worked for me by putting in settings.php:

ini_set('pcre.backtrack_limit', 1000000);
ini_set('pcre.recursion_limit', 1000000);

ricabrantes’s picture

Version: 5.3 » 5.x-dev

I entered 147000 characters and it works very well.

D.5-dev apache on windows xp and mysql, browser Firefox 2

gpk’s picture

Title: Line break converter has a bug with very long texts » Line break converter can trigger (in PHP 5.2.0+) PCRE processing limits with very long texts
Component: filter.module » base system
Status: Postponed (maintainer needs more info) » Active

OK so it appears from the above that the problem is with config of PHP (actually only in PHP 5.2.0 or later, which introduced the PCRE limits http://uk3.php.net/manual/en/ref.pcre.php). See also http://bugs.php.net/bug.php?id=40846.

Solution appears to be to increase the limits as per #4 in settings.php. The link to the PHP bug above actually suggests 10,000,000 as a more sensible limit (i.e. 100 times the PHP default of 100,000, and 10 times the suggested 1,000,000 at #4). Might want to check that we are in fact increasing the system's current limits...
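
A small sketch of the "check that we are in fact increasing the limits" idea, for settings.php or similar. The function name raise_pcre_limit() is hypothetical, and 10,000,000 is simply the value suggested in the PHP bug report linked above.

```php
<?php
// Hypothetical helper: raise a PCRE limit only when the current value is
// lower than $minimum, so a server that already allows more is never reduced.
function raise_pcre_limit($name, $minimum) {
  $current = (int) ini_get($name);
  if ($current < $minimum) {
    ini_set($name, (string) $minimum);
  }
  // Return the value now in effect so the caller can verify it.
  return (int) ini_get($name);
}

raise_pcre_limit('pcre.backtrack_limit', 10000000);
raise_pcre_limit('pcre.recursion_limit', 10000000);
```

Reading the value back after ini_set() also shows whether the host allows the setting to be changed at run time.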

Would probably need to be addressed in 6.x/7.x first and then backported to 5.x but as there's no patch yet I'll leave it against 5.x since that's where most people will be hitting this problem at the moment.

@ricabrantes: what version of PHP are you using? What are the values of pcre.backtrack_limit and pcre.recursion_limit (e.g. from phpinfo)? Are you saying that you reproduced this bug, and that #4 fixed it?

ricabrantes’s picture

My versions are: Windows xp sp3(beta), PHP 5.2.5, MySql 5.0.51, Apache/2.2.6, pcre.backtrack_limit 100000 and pcre.recursion_limit 100000..

I tested on Firefox 2.0.0.12, ie6, opera 9.26 and Safari 3.0.4 for windows..

I can't reproduce the bug; the text is shown correctly.

hubris’s picture

Upping the limits to:

ini_set('pcre.backtrack_limit', 10000000);
ini_set('pcre.recursion_limit', 10000000);

doesn't seem to solve the problem. I've entered these values in settings.php, and the changes are confirmed in phpinfo().

I have a 70,535 character count node and it will not display under the view tab...

Drupal 5.7
PHP 5.2.5
Apache (Unix)
Shared hosting environment

-Chris

hubris’s picture

Title: Line break converter can result in empty node display - PCRE limits » Line break converter can trigger (in PHP 5.2.0+) PCRE processing limits with very long texts
Version: 7.x-dev » 5.x-dev

Some additional information:
I've found some errors listed in my hosted site's control panel error log regarding these long character nodes that I'm trying to edit/submit:

[Mon Mar 31 14:27:33 2008] [error] [client 70.137.148.72] ALERT - configured request variable value length limit exceeded - dropped variable 'field_body_of_chapter[0][value]' (attacker 'IP.address', file '/example_home_directory/index.php'), referer: http://www.examplesite.com/en/node/2170/edit
[Mon Mar 31 13:51:30 2008] [error] [client 70.137.148.72] ALERT - configured request variable value length limit exceeded - dropped variable 'field_body_of_chapter[0][value]' (attacker 'IP.address', file '/example_home_directory/index.php'), referer: http://www.examplesite.com/en/node/1481/edit

After some searching I found this error is related to the Suhosin Hardened PHP extension, specifically the suhosin.request.max_value_length value of 65000. My problem node/post of 70,535 characters exceeds this limit and the field 'field_body_of_chapter[0][value]' is being dropped. I didn't fully determine when it's being dropped (when I submit the node, when I view the node, etc.), but somewhere in the process of creating/viewing the node it's getting killed by this limit.

I've tried increasing these limits via ini_set, but they don't take hold; phpinfo() returns the same 65000 limit. I tried:

ini_set('hphp.post.max_value_length', 180000); // how I've seen it described on other forums
ini_set('hphp.request.max_value_length', 180000);

and

ini_set('suhosin.post.max_value_length', 180000); // how the variable actually appears in my phpinfo()
ini_set('suhosin.request.max_value_length', 180000);

My host provider recently upgraded to PHP 5.2.5 (which may have included Suhosin Hardened PHP). I have existing long nodes in the database, and they are displayed under the View tab. But any edits I try to submit to these existing long text nodes are not saved, so the problem appears to be occurring during the Submit phase.

So, for those people (like myself) who first try to solve the problem with
ini_set('pcre.backtrack_limit', 10000000);
ini_set('pcre.recursion_limit', 10000000);
and still don't see the 'unable to view long text nodes problem' go away, try looking at your PHP setup to see if the same Hardened PHP restrictions are in place.
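
A rough diagnostic sketch for that check, assuming the two directive names are the ones shown in my phpinfo() above. The function exceeds_value_limit() is a hypothetical helper, and ini_get() returns FALSE for a directive the server does not know (for example when Suhosin is not loaded).

```php
<?php
// Hypothetical helper: compare a body length against a per-variable limit.
// $limit is the raw ini_get() result, which is FALSE when the directive
// does not exist on this server.
function exceeds_value_limit($limit, $length) {
  if ($limit === FALSE || $limit === '') {
    return FALSE; // No such setting reported, so nothing to exceed here.
  }
  return $length > (int) $limit;
}

// Print what the server reports for the two directives named above.
$names = array('suhosin.post.max_value_length', 'suhosin.request.max_value_length');
foreach ($names as $name) {
  $limit = ini_get($name);
  print $name . ': ' . ($limit === FALSE ? 'not set' : $limit) . "\n";
}
```

With a reported limit of 65000, exceeds_value_limit('65000', 70535) is TRUE, which matches the node that was being dropped.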

-Chris

catch’s picture

Version: 5.x-dev » 7.x-dev

I ran into this because I kept getting completely empty node contents displayed on my site seemingly at random.

http://drupal.org/node/225335 was duplicate. This is a nasty one.

catch’s picture

Title: Line break converter can trigger (in PHP 5.2.0+) PCRE processing limits with very long texts » Line break converter can result in empty node display - PCRE limits

gpk’s picture

Title: Line break converter can trigger (in PHP 5.2.0+) PCRE processing limits with very long texts » Line break converter can result in empty node display - PCRE limits
Version: 5.x-dev » 7.x-dev

@hubris: Just to clarify/confirm: I conclude that in your case the problem you were having is unrelated to the original problem of PCRE limits but a specific restriction on your server's PHP setup which I can't imagine Drupal should try to work round (i.e. there is no way on your server of POSTing more than 65k in the node body). Thanks for the update since that does at least clarify the situation.

Also just to note that the problem is much less likely to occur prior to PHP 5.2.0 since the PCRE limits were essentially reduced with this version of PHP.

catch’s picture

Yeah I should note my install is standard debian etch on 5.2, and we have a bunch of articles which break this limit. Bumping this back to critical since it's a pig to track down and we had a load of visitors asking about 'missing pages' etc.

Not to mention everyone will be running 5.2+ when we release.

hubris’s picture

Priority: Normal » Critical

@gpk: You are correct on the clarification/confirmation: the problem I'm having is not the PCRE limits. The 65k POST limit I'm experiencing is due to the Suhosin Hardened PHP POST limits as set up by my host provider, and isn't something that Drupal development needs to take into account.

-Chris

yngens’s picture

i guess i am having the same issue here. #4 did not help.

gpk’s picture

@yngens: what are the values of pcre.backtrack_limit and pcre.recursion_limit reported by phpinfo() on your server? Also, is it running suhosin (again, this should be reported by phpinfo())?

Also what is the size of the post you are trying to make?

yngens’s picture

gpk, i don't remember, but i am sure i tried even bigger numbers than the ones recommended here. not sure about suhosin either - as a workaround i decided to require users to divide big posts into chapters instead of putting everything into one post. but the problem is still there, and when i have a little more time i will try to test again and report here. thanks

gpk’s picture

Status: Active » Postponed (maintainer needs more info)

OK awaiting your input ...

Renirtor’s picture

I have had this empty node problem after upgrading to php5.x from php4.x
I put this code in my php.ini file in root directory:

pcre.backtrack_limit = 150000
pcre.recursion_limit = 150000

the longest node of my site (53896 characters including spaces, 55704 bytes) is now shown again.
Since it solved my needs, I did not set a higher limit because I read here about some side effects:

http://de.php.net/manual/en/pcre.configuration.php

but it's good to know that it worked and that a higher value can solve the problem for longer nodes.

Thanks,

Renirtor

ajayg’s picture

Alternative solution
Just want to confirm I saw this issue when I upgraded from php 4.x to php 5.2.6
drupal 5.12
php 5.2.6
Linux Fedora FC8
Resolved the issue by trying solution in comment #3.

But you can also use the paging module which solved the problem without making changes to PCRE limit.

John Morahan’s picture

Status: Postponed (maintainer needs more info) » Needs review
FileSize: 1.29 KB

ajayg’s picture

@John Morahan
Do you mean that by applying the patch you don't need to update PCRE limits?

John Morahan’s picture

That's the idea, yes.

John Morahan’s picture

Component: base system » filter.module

the idea is that the new regex just replaces the \n\n and variants without trying to remember the bits in between.
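
To make the difference concrete, here is a hedged sketch contrasting the two styles. The lazy-capture pattern in paragraphs_by_capture() is an illustrative assumption about the kind of regex being replaced, not a quote from the patch, and both function names are hypothetical.

```php
<?php
// Illustrative lazy-capture style (an assumption, not quoted from the
// patch): each paragraph is captured by extending the match one character
// at a time, so the work grows with the paragraph length.
function paragraphs_by_capture($text) {
  return preg_replace('/\n?(.+?)(?:\n\s*\n|\z)/s', "<p>$1</p>\n", $text);
}

// Separator-only style: match just the blank-line break plus the first
// character after it, then wrap the whole text once. The text between
// breaks is never captured.
function paragraphs_by_separator($text) {
  return '<p>' . preg_replace('/\n\s*\n\n?(.)/', "</p>\n<p>$1", $text) . "</p>\n";
}
```

On short input such as "a\n\nb" both return the same markup. With a low pcre.backtrack_limit and one paragraph longer than that limit, the first is expected to return NULL while the second still returns the text.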

moving this issue back to filter.module

John Morahan’s picture

FileSize: 2.06 KB

with test

mr.baileys’s picture

Regarding the test:

  1. I think $this->randomName() is preferred over str_repeat to generate the long string.
  2. 100000 is an arbitrary number: some servers might allow higher values. Would it be possible to read out the actual value of pcre.backtrack_limit & pcre.recursion_limit and then just add 1 and use that value (or something similar)?
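
A sketch of the second suggestion, assuming a standalone script rather than the SimpleTest class, so it uses str_repeat() instead of $this->randomName(). The function name string_past_pcre_limits() is hypothetical.

```php
<?php
// Hypothetical helper: build a test string one character longer than the
// larger of the two PCRE limits currently in effect, instead of
// hard-coding 100000.
function string_past_pcre_limits() {
  $limit = max(
    (int) ini_get('pcre.backtrack_limit'),
    (int) ini_get('pcre.recursion_limit')
  );
  return str_repeat('a', $limit + 1);
}
```

A test would then pass this string through the filter and check that the result is not empty.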

John Morahan’s picture

FileSize: 2.17 KB

Status: Needs review » Needs work

The last submitted patch failed testing.

John Morahan’s picture

Status: Needs work » Needs review

apparently an installer change confused the testbot

ajayg’s picture

I made a mistake (sorry, I don't know why) and resubmitted the patch in #25 for retesting as well. I hope this does not affect the testing already started for the patch in #27; if it conflicts, my apologies. I should be more careful next time. I suspect that even if the tests themselves do not conflict, the system message about the result may, since it only reports on the "last patch submitted" rather than on the time/date the retest was requested. In that case my request would be the last one.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Nicely done.

webchick’s picture

Status: Reviewed & tested by the community » Needs review

Eh. Can we please have a couple of the 20 people or so who reported having this issue test the patch?

frega’s picture

FileSize: 3.72 KB

Hmm, chx "assigned" me this issue to review ... but unlike chx i can be (and was) distracted ... so i am not sure whether my input is still relevant ...

Well ... the last patch replaces a backtracking regex with a simpler regex. Testing strings of pcre.backtrack_limit-length is kinda superfluous now, as there is no backtracking or recursive regex in the _filter_autop function left, that could run into that "limit". I would suggest removing the addition to filter.test, and can re-roll the patch if needed.

Yet the new regex also leads to slightly different output than the old regex: there's whitespace in the last <p> tag (which has no impact in HTML). This could be trivial, but as I am no regex-ninja, there could also be other implications I don't see ... I have attached a demo script illustrating the (trivial?) difference.

John Morahan’s picture

Status: Needs review » Needs work

Thanks for the review frega!

Yeah, I forgot to handle the ending \n's as a special case like the beginning (and also dropped a \n from the final </p>). Will fix later.

I do think the test (or something like it) should stay, so that it will fail if someone later makes a change that unintentionally runs into these limits again. It's not always immediately obvious from looking at a regex how it will behave in these situations.

John Morahan’s picture

FileSize: 1.66 KB

John Morahan’s picture

Status: Needs work » Needs review

cburschka’s picture

Status: Needs review » Reviewed & tested by the community

Good patch. Assuming it is enough to test the string exactly at the limit, rather than a longer string...

John Morahan’s picture

well, $this->randomName() adds a short prefix too

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

catch’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)

John Morahan’s picture

Status: Patch (to be ported) » Needs review
FileSize: 1.47 KB

Untested backport.

abu3abdalla’s picture

thank you

Damien Tournoud’s picture

Dave Reid’s picture

Status: Needs review » Reviewed & tested by the community

This fixed a problem I had on my local install, which had a backtrack limit of 1000. I also tested that increasing the backtrack limit solves the problem, but this is a good fix. It took me 30 minutes to debug that it was the line break filter, which led me to this issue.

tuffnatty’s picture

+1 for patch in #41.

soxofaan’s picture

I can also confirm that patch from #41 fixes the problem on my setup
(marked duplicate: #711056: Max Number of Lines in Body)

lilyzm’s picture

Version: 6.x-dev » 6.16

I have had this problem since updating from 6.15 to 6.16.
The patch fixes the problem.

mr.baileys’s picture

Version: 6.16 » 6.x-dev

@lilyzm: thanks for testing and confirming that the patch works. Version needs to remain at 6.x-dev though, as that's where fixes are applied.

varkenshand’s picture

Does this mean (April 4 today) that D 6.16 has a problem that won't let me save long nodes? As that is what's been happening to me last week. And I can't get it solved.
Also tried to use the filter module of the 6.x-dev version to no avail.

varkenshand’s picture

The long nodes problem has gone away. In my case I narrowed it down to a combination of php5 and MySQL4. Upgrading to MySQL5 solved the issue. Also changed pcre settings just to be on the safe side :o)

joachim’s picture

Confirming this patch fixes the problem and marking #794256: Page "Body" limit? as a duplicate.

Ready to commit! :D

sanduhrs’s picture

The patch in #41 is working well.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs review

There is not much talk about the output differences of before and after the patch. Who tested that apart from just looking at whether PHP chokes or not? Changed output could break sites, themes, etc.

Dave Reid’s picture

Status: Needs review » Reviewed & tested by the community

I tested it locally before I used an .htaccess solution and the patch was working just fine. Seeing as the exact same fix went into D7 and we haven't had any regression problems, I'd say it's back to RTBC.

Gábor Hojtsy’s picture

Uhm given that D7 is not out and the upgrade path is spotty at places, people did not update their D6 sites either, right? Are you sure we can consider that a proof of regression testing?

Dave Reid’s picture

I've said I've tested it manually with my D6 install and I'm using the patch currently on several sites with no problem, I'm not sure what more you need. :/

Gábor Hojtsy’s picture

Taking a quick look at the regular expression being replaced, I think it did not always add paragraph tags for example, while the new one adds at least one paragraph wrapper. This made me suspect we are breaking some backwards compatibility here. Am I missing something?

John Morahan’s picture

Yes that's correct, if you pass it an empty string it will add an empty <p></p>. Originally the very next regex immediately removed that paragraph tag, so I didn't think it was a problem. Now three new rules have been added in between. Still, they all search for specific tags (li/blockquote), so I don't think they should affect this special case.

joachim’s picture

> Yes that's correct, if you pass it an empty string it will add an empty <p></p>.

Won't that break themes that test on the content of the body text being empty?

John Morahan’s picture

no, because it's removed again before it gets anywhere near the theme

John Morahan’s picture

Let me clarify that.

First, this creates the empty <p></p>:

      $chunk = preg_replace('/^\n|\n\s*\n$/', '', $chunk);
      $chunk = '<p>'. preg_replace('/\n\s*\n\n?(.)/', "</p>\n<p>$1", $chunk) ."</p>\n"; // make paragraphs, including one at the end

Next, these three fix up some wrongly nested tags, none of which occur in our <p></p> chunk:

      $chunk = preg_replace("|<p>(<li.+?)</p>|", "$1", $chunk); // problem with nested lists
      $chunk = preg_replace('|<p><blockquote([^>]*)>|i', "<blockquote$1><p>", $chunk);
      $chunk = str_replace('</blockquote></p>', '</p></blockquote>', $chunk);

Finally, this removes the <p></p>:

      $chunk = preg_replace('|<p>\s*</p>\n?|', '', $chunk); // under certain strange conditions it could create a P of entirely whitespace

So that we are left with an empty string, as before.
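
For anyone who wants to try it, the steps quoted above can be run on their own as a quick check. The wrapper function autop_tail() is hypothetical; the real _filter_autop() does more work before this point.

```php
<?php
// Only the steps quoted above, wrapped in a hypothetical function so they
// can be run in isolation.
function autop_tail($chunk) {
  $chunk = preg_replace('/^\n|\n\s*\n$/', '', $chunk);
  // Make paragraphs, including one at the end (this creates the <p></p>).
  $chunk = '<p>' . preg_replace('/\n\s*\n\n?(.)/', "</p>\n<p>$1", $chunk) . "</p>\n";
  // Fix up wrongly nested tags; none of these match an empty <p></p>.
  $chunk = preg_replace("|<p>(<li.+?)</p>|", "$1", $chunk);
  $chunk = preg_replace('|<p><blockquote([^>]*)>|i', "<blockquote$1><p>", $chunk);
  $chunk = str_replace('</blockquote></p>', '</p></blockquote>', $chunk);
  // Remove a paragraph made entirely of whitespace (or nothing).
  $chunk = preg_replace('|<p>\s*</p>\n?|', '', $chunk);
  return $chunk;
}

var_dump(autop_tail('') === ''); // bool(true)
```

Non-empty input still gets its paragraph tags: "a\n\nb" comes back as two paragraphs.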

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

Ok, looks like other filter related critical issue commits invalidated this patch recently:

$ patch -p0 < autop-pcre-limit_3.patch 
patching file modules/filter/filter.module
Hunk #1 FAILED at 911.
1 out of 1 hunk FAILED -- saving rejects to file modules/filter/filter.module.rej

John Morahan’s picture

Status: Needs work » Needs review
FileSize: 1.28 KB

It still applies with -F3 which is what I used for my description above. Here is a clean reroll.

Gábor Hojtsy’s picture

Status: Needs review » Fixed

Ok, this was already reviewed and explained before so committed, thanks.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Ludo.R’s picture

Version: 6.x-dev » 6.15
Status: Closed (fixed) » Active

The #4 solved the problem for displaying the node.

However, the content of the body is not indexed in the drupal search.

Is this issue fixed in version 6.16 or 6.17?

I may consider doing the upgrade from 6.15 then.

UPDATE : Correction, i just forgot to re-index the search! There is no problem anymore!

ajayg’s picture

Status: Active » Closed (fixed)

Next time, could you please close the issue that you activated if it is no longer happening?