Hi

Recently I've run into problems with some long feed URLs which exceeded the VARCHAR(255) limit of the 'link' field in the aggregator_item table.

I've changed this data type to TEXT and it works now.

Another solution would be changing to VARCHAR(65535), but this is only available in MySQL 5.0.3 and later. (http://dev.mysql.com/doc/refman/5.0/en/char.html)

I suggest changing this data type to avoid these limitation issues.

Regards
Pedro

Files:
Comment | File | Size | Author | Test status
#95 | aggregator-218004-95.patch | 30.83 KB | xjm | PASSED: [[SimpleTest]]: [MySQL] 39,544 pass(es).
#95 | interdiff.txt | 3.34 KB | xjm
#91 | aggregator_update_path-218004-91.patch | 22.73 KB | cam8001 | PASSED: [[SimpleTest]]: [MySQL] 39,401 pass(es).
#82 | aggregator_text.patch | 6.98 KB | catch | PASSED: [[SimpleTest]]: [MySQL] 38,662 pass(es).
#66 | 218004.patch | 5.8 KB | heyrocker | PASSED: [[SimpleTest]]: [MySQL] 34,668 pass(es).
#64 | just-the-tests-maam.patch | 2.04 KB | no_commit_credit | FAILED: [[SimpleTest]]: [MySQL] 34,660 pass(es), 14 fail(s), and 11 exception(s).
#63 | 218004.patch | 5.81 KB | heyrocker | PASSED: [[SimpleTest]]: [MySQL] 34,676 pass(es).
#60 | 218004.patch | 6.31 KB | heyrocker | PASSED: [[SimpleTest]]: [MySQL] 34,668 pass(es).
#40 | 218004-35_aggregator_url.patch | 7.16 KB | twistor | PASSED: [[SimpleTest]]: [MySQL] 19,931 pass(es).
#38 | 218004-34-reroll.patch | 7.15 KB | beeradb | FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in aggregator.install.
#37 | 218004-34-reroll.patch | 7.16 KB | beeradb | FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in aggregator.install.
#34 | 218004-34_aggregator_url.patch | 7.03 KB | alex_b | Passed: 14779 passes, 0 fails, 0 exceptions
#30 | 218004_aggregator_url_30.patch | 6.31 KB | neclimdul | Passed: 14773 passes, 0 fails, 0 exceptions
#23 | aggregator.patch | 7.41 KB | mfb | Failed: Failed to apply patch.
#21 | 218004-aggregator-longer-urls-21.patch | 6.68 KB | neclimdul | Failed: Failed to apply patch.
#19 | 218004-aggregator-longer-urls-D7.patch | 7.11 KB | Dave Reid | Failed: Failed to apply patch.
#18 | 218004-aggregator-longer-urls-D7.patch | 7.59 KB | Dave Reid | Failed: Failed to apply patch.
#17 | 218004-17_aggregator_long_urls.patch | 4.62 KB | alex_b | Failed: Invalid PHP syntax in modules/aggregator/aggregator.install.
#16 | aggregator-longer-urls-D7-merge.patch | 4.45 KB | pcambra | Failed: Failed to apply patch.
#14 | aggregator-longer-urls-D7-1.patch | 3.34 KB | pcambra | Failed: Failed to apply patch.
#10 | 218004-aggregator-longer-urls-D7.patch | 3.32 KB | Dave Reid | Failed: 10954 passes, 148 fails, 136 exceptions

Comments

cburschka’s picture

Title:Aggregator feed character limit» Allow aggregator feed URLs longer than 255 characters
Version:5.7» 7.x-dev

Moved to 7.x.

Seriously though, are insane URLs worth changing the data structure like that? This sets a precedent - we have URLs stored elsewhere, and email addresses can also get pretty long. Turning a VARCHAR into a TEXT could be a big deal for high-traffic sites.

pcambra’s picture

Well, then I propose two alternatives:

-> Detect "insane" URLs (I think this is a must, to avoid going mad looking for the error) and correct them, maybe with a tinyurl-style function, or by letting the user decide through the settings whether to store the field in long or short format (warning about the loss of data and the performance impact, of course).

-> If Drupal 7 will no longer support MySQL 4.x (I am not sure), let the user choose the maximum length of the field, keeping it as a VARCHAR.

I ran into this error myself, and I don't control the length of external URLs, even when they have an absurd length!

Regards and Thanks
Pedro

cburschka’s picture

Oh. That's interesting, MySQL 5.x allows VARCHAR up to 65,535. I must confess that the VARCHAR(255) convention is still deeply ingrained in me. I don't know how long we have to support MySQL 4. Drupal 7 will require PHP 5, but what about the database?

What's the limit on Postgres? Perhaps a combination of VARCHAR(512) and intelligent filtering could do the trick. I admit that silently truncating on database insert is not the best way to handle such things.

If we have to keep MySQL 4 support for the foreseeable future, then perhaps a warning or a validation function would be sufficient. Refusing to handle longer URLs is still better than failing silently.

alex_b’s picture

Issue tags:+Novice

This is a novice task.

alex_b’s picture

#3: we use a MySQL TEXT column with a 255-character index in feedapi for sites with hundreds of thousands of feed items. This is doable (see http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/feedapi/fee...)

There are a bunch of URLs out there that are longer than 255 characters (and some that are longer than 512 for that matter). There is no specification for the length of a URL.

I'd say let's use TEXT with an index limited to the first 255 characters. (I'm thinking in MySQL terms; I hope this is supported by Drupal's DB layer.)
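For reference, a rough sketch of what that suggestion looks like in Drupal's Schema API (the field and index names mirror {aggregator_feed}; this is illustrative only, not the committed patch):

```php
<?php
// Sketch only: a TEXT column whose index covers just its first
// 255 characters, as MySQL requires for TEXT/BLOB indexes.
$schema['aggregator_feed'] = array(
  'fields' => array(
    'url' => array(
      'type' => 'text',   // previously 'varchar' with 'length' => 255
      'not null' => TRUE,
      'description' => 'URL to the feed.',
    ),
    // ... other fields unchanged ...
  ),
  'indexes' => array(
    // The inner array('url', 255) form tells the DB layer to index
    // only the first 255 characters of the column.
    'url' => array(array('url', 255)),
  ),
);
```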

pcambra’s picture

Assigned:Unassigned» pcambra

Begin to work on this.

I am comparing whether it's better to use TEXT (max 65,535 characters) or a 65,535-length VARCHAR (new in MySQL 5), but I've seen that VARCHAR(65535) is only available from 5.0.3 on. I'm not sure whether Postgres supports 'huge' VARCHARs, so I think the link and url fields should be changed to TEXT.

aggregator_feed contains these fields for storing URLs: url (indexed) and link (not indexed).
aggregator_item contains a link field (not indexed).
All three are varchar(255); I think all of them should be converted to TEXT.

The link field of the aggregator_feed table is not used in any SQL condition, so I don't think it needs an index.
The url field of the aggregator_feed table is currently indexed, but it only appears as a SQL condition in one admin process (aggregator.admin.inc), so I am not sure this index provides much benefit.
The link field of aggregator_item is not indexed but is used in a WHERE clause by hook_aggregator_process; maybe it should be indexed.

I'll look into indexes on text columns. As far as I know, you must specify an index length when indexing MySQL text columns. I'll also look at Drupal's DB layer and the Postgres specs.

alex_b’s picture

Let's also add indexes to URL/guid - currently aggregator_item does not have any.

greggles’s picture

Subscribe. We should really fix this and backport it. Technically, URLs have no length limit. Practically, I have heard of limits around 2,000 characters in some layers of internet technology. We should certainly be closer to 2,000 than 255.

Regarding data types: that is handled somewhat automatically by the Schema API (see the data types). I think this should use the "text" type, which stores up to 64KB worth of characters. That should be plenty.

Dave Reid’s picture

Subscribing.

Dave Reid’s picture

Status:Active» Needs review
Issue tags:+needs backport to D6
New file: 3.32 KB
Failed: 10954 passes, 148 fails, 136 exceptions

Drupal 7 patch ready for review.

Status:Needs review» Needs work

The last submitted patch failed testing.

cburschka’s picture

It seems the various CRUD functions won't work now.

cburschka’s picture

Aha! I can see the problem.

         'not null' => TRUE,
-        'default' => '',
         'description' => 'URL to the feed.',
       ),
       'refresh' => array(
@@ -143,10 +141,8 @@ function aggregator_schema() {
         'description' => 'Last time feed was checked for new items, as Unix timestamp.',
       ),
       'link' => array(
-        'type' => 'varchar',
-        'length' => 255,
+        'type' => 'text',
         'not null' => TRUE,
-        'default' => '',

You removed the empty string default value, but you left in the NOT NULL constraint.

The link is an optional field, and need not be passed. But with the updated schema, omitting this value will violate the not-null constraint.

If you're going to remove the empty default, you should probably remove the not-null as well.

pcambra’s picture

New file: 3.34 KB
Failed: Failed to apply patch.

Oops, I was also working on the issue, but I have patched the aggregator.install, aggregator.module and aggregator.admin.inc files.

I don't know how to merge both patches, so I am attaching mine.

greggles’s picture

So, the patches are similar, but not the same.

pcambra's misses a space in the url,255 near line 191, adds an index to link (also missing a space after the comma), and as stated modifies aggregator.module and aggregator.admin.inc.

Dave Reid's handles updating the tables by adding the indexes that are in the schema and adds an index on guid.

Aside from the whitespace fixes, I think they can just be merged and we'll be set, right?

pcambra’s picture

New file: 4.45 KB
Failed: Failed to apply patch.

I've compared Dave Reid's patch with mine and I think I've merged our changes into this new patch.

I've included the guid index and the update in the install file, and the missing space that greggles pointed out is now fixed.

alex_b’s picture

New file: 4.62 KB
Failed: Invalid PHP syntax in modules/aggregator/aggregator.install.

Reviewed and tested on PHP 5.2.6 and MySQL 5.0.77

This is coming together now. Nice work. Minor fixes:

- minor code style fixes (mostly missing spaces)
- adjusted comment to latest update to reflect that there's also an index on aggregator_item.guid being created
- aggregator_remove() should not reset link or url; it does need to use db_update() instead of db_merge(), though - otherwise this patch breaks the remove functionality.
- description is not necessary in db_change_field() - removed

pcambra: the patch in #16 was created on modules/aggregator - please create future patches from Drupal's root.

There are two outstanding issues from my point of view:

1) Looking at this again, we should convert the guid field to text as well. Many times feed producers use the feed item's URL for the guid and there is no specified character limit to it: http://cyber.law.harvard.edu/rss/rss.html
2) Needs testing on pgsql.

Dave Reid’s picture

Assigned:pcambra» Dave Reid
Status:Needs work» Needs review
New file: 7.59 KB
Failed: Failed to apply patch.

Re-rolled patch for head containing improvements from the previous few versions.

In summary:
- Converts {aggregator_feed}.url and .link to text fields since they are URLs and shouldn't be limited to 255 characters.
- Converts {aggregator_item}.link and .guid to text fields for the same reason.
- Drops the {aggregator_item}.fid index and uses (fid, guid) instead, since there are a couple of queries that use WHERE fid = :fid AND guid = :guid. See http://api.drupal.org/api/function/aggregator_aggregator_process/7.
- Removes the #maxlength limitation on the feed URL textfield (tested) instead of using a hardcoded limit of 65535.
- Also ran a coder check to be sure there weren't any major things wrong.

I tested the upgrade path and it worked just fine (also found a bug in the previous aggregator update function not returning an array). I'm currently being bitten by the 'cache_update table does not exist' testing error so I'll let the bot test this one.

Dave Reid’s picture

New file: 7.11 KB
Failed: Failed to apply patch.

After review of http://api.drupal.org/api/function/aggregator_aggregator_process/7 we do not need to have an index on guid or link. That is the only place where we get those fields in a WHERE condition and it's simply used to find duplicate feeds. If we really wanted to, we'd have to add an index on title as well since that is used in a third query, but I don't think that we should add those indexes here. If it's necessary we should have a separate issue for revising aggregator_aggregator_process().

EDIT: Summary (again)
- Converts {aggregator_feed}.url and .link to text fields since they are URLs and shouldn't be limited to 255 characters.
- Converts {aggregator_item}.link and .guid to text fields for the same reason.
- Removes the #maxlength limitation on the feed URL textfield (tested) instead of using a hardcoded limit of 65535 or 255.
- Changes aggregator_remove() to use db_update() instead of db_merge(), because the feed item should already be in the database or there are problems. Removes the description and image from the update since those aren't going to change.
- Also ran a coder check to be sure there weren't any major things wrong.

mfb’s picture

Status:Needs review» Needs work

Tested this patch after running into the issue on a d7 sandbox site and it works well. Coding style also looks good. There is just one problem. When calling db_change_field() you must include the field description in the schema definition. Otherwise, the column comment will be lost.

neclimdul’s picture

Category:feature» bug
Status:Needs work» Needs review
New file: 6.68 KB
Failed: Failed to apply patch.

Ran into this while testing aggregator's OPML import function. I tried http://opml.tagjag.com/all and was mystified that it thought that URL was longer than 255 characters. Of course it isn't; the error is actually triggered by the feeds contained in that OPML file, but wow...

So yeah, I'd say this is actually a bug more than a feature request. It's starting to cause weird problems. See #70201: Aggregator: Feed URL Length Is A Limitation, where it actually is a closed(?) bug.

Rerolled to sync with core and updated with the field descriptions requested by mfb. Any chance of still getting this into 7?

Status:Needs review» Needs work

The last submitted patch failed testing.

mfb’s picture

Status:Needs work» Needs review
New file: 7.41 KB
Failed: Failed to apply patch.

Apparently needed another reroll.

One question I have about this patch: we now allow feed URLs > 255 characters. However, there is a unique key on the first 255 chars of the URL column. What if you want two distinct URLs that have identical first 255 chars?

neclimdul’s picture

I don't have a good answer for that.

alex_b’s picture

Status:Needs review» Needs work

However, there is a unique key on the first 255 chars of the URL column. What if you want two distinct URLs that have identical first 255 chars

The unique key length is about the index length on the column. If you have X URLs that are identical in their first 255 characters, comparisons between them will get slow. This is an unlikely case.

1)
#maxlength => NULL - I think this does not break anything, but it doesn't have a precedent in Drupal either. I'd just kill this line.

2)
There are changes in aggregator.install that don't seem related.

Dave Reid’s picture

If you don't manually specify #maxlength => NULL, the textfield will get a limit of 64 characters. If we're changing that field to a text, we shouldn't limit the input.

alex_b’s picture

"If you don't manually specify #maxlength => NULL, the textfield will get a limit of 64 characters."

- wasn't aware of that. Thanks for the clarification.

neclimdul’s picture

Yeah, I think the only really odd change is the $ret change in update_7001. If that were removed, I think we can agree to RTBC this.

mfb’s picture

Definitely needs a reroll, $ret is gone now.

neclimdul’s picture

New file: 6.31 KB
Passed: 14773 passes, 0 fails, 0 exceptions

Easy enough.

mfb’s picture

Status:Needs work» Needs review
MichaelCole’s picture

Crash on an uncaught PDO exception. Very close; maybe test a bit in the UI before submitting the next patch. See below:

Example very long URL (476 chars)
http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-...

How I tested:
- mysql
- apply patch
- enable module
- clicked help
- started to test all links on that page
- CRUD a long link.
- add through UI succeeded - http://d7p1.dev/admin/config/services/aggregator/add/feed
- add duplicate - appropriate error
- add similar long (append "test2" to above URL to test 255 index) - CRASH
PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubun' for key 3: INSERT INTO {aggregator_feed} (title, url, refresh, block, description, image, link) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => test2 [:db_insert_placeholder_1] => http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-... [:db_insert_placeholder_2] => 3600 [:db_insert_placeholder_3] => 5 [:db_insert_placeholder_4] => [:db_insert_placeholder_5] => [:db_insert_placeholder_6] => ) in aggregator_save_feed() (line 522 of /home/drupaldev/websites/d7p1.dev/modules/aggregator/aggregator.module).

NOT TESTED:
- Postgres
- Update script

NOTES:

Postgres 8.3 can have long varchar: "In any case, the longest possible character string that can be stored is about 1 GB. "
- http://www.postgresql.org/docs/8.3/static/datatype-character.html

MichaelCole’s picture

Status:Needs review» Needs work
alex_b’s picture

Status:Needs work» Needs review
New file: 7.03 KB
Passed: 14779 passes, 0 fails, 0 exceptions

The problem is the unique flag on url. This patch replaces it with an index.

Freso’s picture

Status:Needs review» Needs work
Issue tags:-Novice+Needs tests

I'd be hard pressed to call this a "novice" issue. It should have tests though.

mr.baileys’s picture

No longer applies cleanly to HEAD, so will need to be rerolled.

+++ modules/aggregator/aggregator.install 1 Nov 2009 16:30:18 -0000
@@ -191,8 +187,10 @@ function aggregator_schema() {
+    'indexes' => array(
+      'url'  => array(array('url', 255)),
+      ),

Wrong indentation on the closing parenthesis


beeradb’s picture

New file: 7.16 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in aggregator.install.

Here's a re-roll.

beeradb’s picture

Status:Needs work» Needs review
New file: 7.15 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in aggregator.install.

With indentation fix.

Status:Needs review» Needs work

The last submitted patch, 218004-34-reroll.patch, failed testing.

twistor’s picture

Status:Needs work» Needs review
New file: 7.16 KB
PASSED: [[SimpleTest]]: [MySQL] 19,931 pass(es).

Changed the redeclaration of aggregator_update_7002 to aggregator_update_7003.

mr.baileys’s picture

I don't think the unique key on url should be converted to a regular index: it's a sensible constraint to prevent duplicate URLs. Instead, what about leaving this field as varchar and increasing the maximum length? See {file}_uri and {file}_filename length limitations for a similar issue.

twistor’s picture

I agree with the logic. It would solve the hypothetical index problem of the first 255 characters being the same. Also simplifies the patch.

Jody Lynn’s picture

Version:7.x-dev» 8.x-dev
lpirotte’s picture

Why "Version: 7.x-dev » 8.x-dev"?
Does it mean that it will not be patched in 7.x, or simply that it's fixed in 7.x and still has to be fixed in 8.x?
I'm running 7.0 and am affected. Do I have to apply the last patch, wait for 7.1, or wait for 8.0?
Regards

neclimdul’s picture

All fixes to core go through the latest version before being backported. That said, this is a database change, so it will not be backported to 7; you'll have to deal with this problem in some way until 8.x.

lucuhb’s picture

I ran into the same problem with the 6.x version.
I suppose it will unfortunately not be backported to 6.x either?

I updated the Drupal aggregator_item table with a SQL query like this:
ALTER TABLE `drupal_aggregator_item` CHANGE `link` `link` VARCHAR( 500 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT ''
Links longer than 255 characters are now displayed correctly.
Can this change to the database cause problems?
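For what it's worth, the same change expressed as a Drupal 6 update function would look roughly like this (a sketch only: it assumes D6's update API, where db_change_field() takes a $ret array by reference, and the update number 6001 is purely illustrative):

```php
<?php
// Hypothetical D6-style update sketch; not part of any committed patch.
function aggregator_update_6001() {
  $ret = array();
  // Widen {aggregator_item}.link so links over 255 characters fit.
  db_change_field($ret, 'aggregator_item', 'link', 'link', array(
    'type' => 'varchar',
    'length' => 500,
    'not null' => TRUE,
    'default' => '',
  ));
  return $ret;
}
```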

effulgentsia’s picture

subscribe

Wim Leers’s picture

Subscribing. Still a problem in D7.

@neclimdul (#45): may I ask why database changes are not backported? Why can't we support this through a schema change?

neclimdul’s picture

Wim Leers: unless something's changed, schema changes were locked except for critical and security issues. Sadly, I don't think this counts.

Wim Leers’s picture

Well, on any site that uses aggregator, this causes PDO Exceptions that are always shown… It doesn't break any website, but it's a severe WTF for the average user.

neclimdul’s picture

Priority:Normal» Major

Fair enough, I think that does upgrade this to major. The wording around critical issues is pretty strict, though, and I can't say I think this would fall into it. Don't get me wrong, I have been bitten by this issue to the point of building a small URL proxy for a D6 site to make it work, so I feel your pain, but these are the rules we hold ourselves to. @webchick or @dries could always make an exception though.

My concern would be that, with this exception, we're changing an indexed varchar to a prefix-indexed text field. There are performance and implementation concerns that I'm not sure have been fully tested. D8 being open gives us the ability to test and fix this should it be a problem; D7 could be opening a string of regressions. There's a lot of "could" in that, so take it as you will.

catch’s picture

Status:Needs review» Needs work
Issue tags:+needs backport to D7

We can change the schema in 8.x. #1284658: MySQL strict mode is causing PDO Exceptions when lengths of varchar are exceeded has guidelines on what should be a schema change and what should be truncated in PHP now. I agree with mr.baileys that removing the unique key on url isn't a good plan here, although I'm pretty sure varchar(255) is the limit for some of the database engines we support, so a longer varchar may not be an option either.

xjm’s picture

Edit: never mind. Sheesh.

Do we have a list of these 255 char limit issues somewhere? There's aggregator title (fixed), statistics path (NR), and this one, that I've confused so far this morning.

Edit 70 billion: Found it in the summary at #1284658: MySQL strict mode is causing PDO Exceptions when lengths of varchar are exceeded. So only those three currently?

neclimdul’s picture

@xjm maybe? I can't think of any.

@catch, it just occurred to me: maybe we take the unique key off url so we can make it a full text field, and then have a parallel url_hash column holding a SHA hash of the URL that we do enforce uniqueness on. If it's vital that we enforce uniqueness, this fixes the problem and preserves the functionality, albeit in a bit of a hacky way. I'm not really sure what the gain of the unique index is, but there's a solution.
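A sketch of that idea (the url_hash column name and the choice of SHA-256 are illustrative; nothing like this is in the patches above):

```php
<?php
// Sketch: keep url as TEXT, and enforce uniqueness on a
// fixed-length hash column instead of the URL itself.
$schema['aggregator_feed']['fields']['url_hash'] = array(
  'type' => 'varchar',
  'length' => 64,   // hex-encoded SHA-256
  'not null' => TRUE,
  'description' => 'SHA-256 hash of the feed URL, used only for the unique key.',
);
$schema['aggregator_feed']['unique keys'] = array(
  'url_hash' => array('url_hash'),
);

// Wherever the feed is saved, the hash would be derived from the full URL:
$record['url_hash'] = hash('sha256', $record['url']);
```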

catch’s picture

Could we make this a long varchar (I think we can do 1024 at least), then have a unique index on a substring of that maybe?

Makes sense to me to avoid having two aggregator feeds with exactly the same URL, but also in terms of things that can get horribly broken, a duplicate feed url is not the worst.

neclimdul’s picture

@catch, no, definitely not. A unique index on a substring would prevent someone from adding two long but different URLs under similar paths, since they are going to have the distinguishing part at the end.

xjm’s picture

Well, if anything, I think the end of the path rather than the beginning would be a more reasonable substring for uniquification.

http://longsubdomain.reallylongdomainname.com/my/really/long/path/to/content/articles/how-i-learned-to-stop-worrying-and-love-the-bomb
http://longsubdomain.reallylongdomainname.com/my/really/long/path/to/content/articles/how-i-learned-to-cook

But I still think that would be a bit fragile.

catch’s picture

Yes that's a good point. We should probably just drop the unique then.

xjm’s picture

@heyrocker found this issue again today, yay. So based on the recent discussion, #40 (with a plain, non-unique index on the field) is the correct solution. So, that patch should be rerolled for current 8.x. I'd also suggest adding test coverage for a long URL, possibly for a pair of long URLs that are similar for the first 255 like in #57.

Assuming this is backportable, the 8.x patch should have no update hook (as we aren't supporting head2head at present). Then, the 7.x backport should include the update hook as well as a 7.0 -> 7.x upgrade path test. See the upgrade path test docs for info (and #1280792: Trigger upgrade path: Node triggers removed when upgrading to 7-dev from 6.25 for a very simple example of the sort of tests we should add.)

heyrocker said he'd take a look at this issue. :)

heyrocker’s picture

Status:Needs work» Needs review
New file: 6.31 KB
PASSED: [[SimpleTest]]: [MySQL] 34,668 pass(es).

This is a reroll of the previous patch against head; apparently there have been quite a few changes to aggregator since this time last year. I also added a test where two very long URLs are added and deleted, the only difference between them being the last few characters. Do we need anything else?

oriol_e9g’s picture

Status:Needs work» Needs review

To speed up the commit, it's useful to send a tests-only patch file that should fail, to prove the bug.

xjm’s picture

Status:Needs review» Needs work

Thanks! The test coverage in #60 looks complete to me, and I think that's the complete solution for 8.x.

A few code style cleanups:

  1. +++ b/core/modules/aggregator/aggregator.parser.incundefined
    @@ -111,7 +111,7 @@ function aggregator_parse_feed(&$data, $feed) {
    -    // Resolve the items link.
    +    // Resolve the item's link.

    Unrelated cleanup; this should be removed from the patch.

  2. +++ b/core/modules/aggregator/aggregator.testundefined
    @@ -349,6 +349,37 @@ class AddFeedTestCase extends AggregatorTestCase {
         // Delete feed.
         $this->deleteFeed($feed);
       }
    +  ¶
    +  /**
    ...
    +    $feed_2 = $this->createFeed($long_url_2);
    +    ¶

    Trailing whitespace here.

  3. +++ b/core/modules/aggregator/aggregator.testundefined
    @@ -349,6 +349,37 @@ class AddFeedTestCase extends AggregatorTestCase {
    +   * Tests for feeds with very long URLs.

    I'd remove the word "for" here so that "Tests" becomes a verb rather than a noun. Reference: http://drupal.org/node/1354#functions

  4. +++ b/core/modules/aggregator/aggregator.testundefined
    @@ -349,6 +349,37 @@ class AddFeedTestCase extends AggregatorTestCase {
    +    $this->assertTrue($this->uniqueFeed($feed->title, $feed->url), t('The first long URL feed is unique.'));

    (and subsequent). t() should not be used for the assertion message text. Reference: http://drupal.org/simpletest-tutorial-drupal7#t

  5. +++ b/core/modules/aggregator/aggregator.testundefined
    @@ -349,6 +349,37 @@ class AddFeedTestCase extends AggregatorTestCase {
    +    $this->assertTrue($this->uniqueFeed($feed_2->title, $feed_2->url), t('The second long URL feed is unique.'));
    +
    +
    ...
    +    $this->deleteFeed($feed_2);
    +    ¶

    Unneeded blank lines after both these (plus the trailing whitespace).

When you post an updated patch, could you also post a separate patch that contains only the test method (and not the fix) to expose the failures on testbot?

heyrocker’s picture

New file: 5.81 KB
PASSED: [[SimpleTest]]: [MySQL] 34,676 pass(es).

Rerolled with xjm's comments.

no_commit_credit’s picture

New file: 2.04 KB
FAILED: [[SimpleTest]]: [MySQL] 34,660 pass(es), 14 fail(s), and 11 exception(s).

heyrocker's test by itself to expose the failures.

xjm’s picture

Status:Needs review» Needs work
+++ b/core/modules/aggregator/aggregator.testundefined
@@ -349,6 +349,35 @@ class AddFeedTestCase extends AggregatorTestCase {
+    // Check feed source.
+    $this->drupalGet('aggregator/sources/' . $feed->fid);
+    $this->assertResponse(200, t('Long URL feed source exists.'));
+    $this->assertText($feed->title, t('Page title'));
+    $this->drupalGet('aggregator/sources/' . $feed->fid . '/categorize');
+    $this->assertResponse(200, t('Long URL feed categorization page exists.'));

Couple more assertion messages missed here.

Outside of that, I think this patch is ready, assuming the patches above pass and fail as expected.

heyrocker’s picture

New file: 5.8 KB
PASSED: [[SimpleTest]]: [MySQL] 34,668 pass(es).

OK, everything above looks good and I rerolled for the last t() assertion messages.

xjm’s picture

Status:Needs work» Reviewed & tested by the community
Issue tags:-Needs tests

Yay!

For the backport we'll need to add an update hook + tests, but this is RTBC for 8.x.

catch’s picture

Status:Reviewed & tested by the community» Needs review

Why text and not varchar(1024) or similar?

xjm’s picture

pretty sure varchar(255) is the limit for some of the database engines we support so a longer varchar may not be an option either.

@catch in #52. ;) Am I misunderstanding the comment?

catch’s picture

Yeah I was wrong in #52. It was only MySQL that couldn't handle the longer varchars, and we've required a minimum version high enough to ignore that limitation since 7.x.

xjm’s picture

Reference:
http://drupal.org/node/159605

If I understand the table correctly we can only do 255. I could be totally wrong though.

Edit: Crosspost. We should update that doc I guess. Actually, the doc is mostly correct. I'll update it to make it a little more clear.

xjm’s picture

So, references:
http://dev.mysql.com/doc/refman/5.0/en/char.html
says anything up to really big works for the varchar, which I would have seen if I'd read the whole issue.
And, D7 requires at least MySQL 5.0.15 per http://drupal.org/requirements.

heyrocker’s picture

To me the bigger issue was greggles' point in #8 that URLs have no real length limit, and that he has seen limits up to 2,000 in practice. I just don't see what we gain by setting another arbitrary limit that is merely bigger.

xjm’s picture

Status:Needs review» Reviewed & tested by the community

Okay, the case for using a text is up in #8; technically there is no limit to the URL size. So setting back to RTBC.

If we really wanted to enforce uniqueness, we could add fields with hashes of the feed or item URLs or something, but I'm not seeing a strong case for that. aggregator_form_feed_validate() does do a check for uniqueness on the feed URL.

neclimdul’s picture

Yeah, if we're not validating, then we were just opening a weird usability bug anyway. If someone wants to add the feature back, let's have it happen in a new issue and done correctly. +1 RTBC

webchick’s picture

Assigned:Dave Reid» catch

Bumping back to catch to verify. My understanding was that TEXT causes performance regressions over VARCHAR because it can't be indexed.

effulgentsia’s picture

IE8 has a URL length limit of 2083 (oh, Microsoft). So, if we want to use varchar, I suggest varchar(2083), since my guess is that most feed providers would want to have their feeds work with IE8.

catch’s picture

Version:8.x-dev» 7.x-dev
Status:Reviewed & tested by the community» Patch (to be ported)

text vs. varchar is a performance issue when the query involves creating a temporary table - since it'll force an on-disk table, whereas varchar doesn't have that limitation.

However with aggregator feed URLs while there's a chance someone will make a horrible query involving those, it seems a lot less critical than many other places to worry about this, so I'm OK with just moving to text.

Committed/pushed to 8.x. As mentioned this will need an upgrade path for D7 (and it's tagged for backport to D6, so possibly one for D6 and then some trickiness in D7 later).

catch’s picture

Assigned:catch» Unassigned
xjm’s picture

Committed/pushed to 8.x. As mentioned this will need an upgrade path for D7 (and it's tagged for backport to D6, so possibly one for D6 and then some trickiness in D7 later).

Hrm. How does this work? Say we add a hook_update_700X() to the patch for D7. Folks upgrading to 7.14 from either D6 or D7 get that update. Then maybe later we backport to D6 and add hook_update_600X() to D6, which makes the same schema changes that are needed for D7. When the people still on D6 after this release of D6 update to D7, their schema is already what we want it to be; but there will still be folks around on 7.12 who need the update, so we can't remove the D7 update hook. How do we prevent the D6 update from causing later problems for those sites' D7 upgrade path?

For reference, this is the update hook body from the earlier patch:

<?php
db_drop_unique_key('aggregator_feed', 'url');
db_change_field('aggregator_feed', 'url', 'url', array('type' => 'text', 'not null' => TRUE, 'description' => 'URL to the feed.'));
db_change_field('aggregator_feed', 'link', 'link', array('type' => 'text', 'not null' => TRUE, 'description' => 'The parent website of the feed; comes from the <link> element in the feed.'));
db_change_field('aggregator_item', 'link', 'link', array('type' => 'text', 'not null' => TRUE, 'description' => 'Link to the feed item.'));
db_change_field('aggregator_item', 'guid', 'guid', array('type' => 'text', 'not null' => TRUE, 'description' => 'Unique identifier for the feed item.'));
db_add_index('aggregator_feed', 'url', array(array('url', 255)));
?>
catch’s picture

I think with this one, since it's just a changed field, the schema changes would be a no-op (pointless but no harm). So it may be possible to just leave 7.x as is in this case even after a 6.x backport.
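To illustrate, a D7 update along these lines should be safe to re-run after a D6 backport has already made the change (the update number and exact body here are hypothetical, not the committed patch):

<?php
function aggregator_update_7004() {
  // Harmless if a D6 backport already dropped the key; dropping a
  // nonexistent key returns FALSE rather than failing.
  db_drop_unique_key('aggregator_feed', 'url');
  // Re-applying the same 'text' definition is a no-op.
  db_change_field('aggregator_feed', 'url', 'url', array('type' => 'text', 'not null' => TRUE, 'description' => 'URL to the feed.'));
  // A 255-character prefix index, since text can't be fully indexed;
  // guarded so a second run doesn't try to create it twice.
  if (!db_index_exists('aggregator_feed', 'url')) {
    db_add_index('aggregator_feed', 'url', array(array('url', 255)));
  }
}
?>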

catch’s picture

Status:Patch (to be ported)» Needs review
File: aggregator_text.patch (6.98 KB)
PASSED: [[SimpleTest]]: [MySQL] 38,662 pass(es).

Trying to knock off some of the lower-hanging fruit from the major bugs queue, here's a re-roll for D7, untested. The patch applied more or less cleanly with -p2, so it's mainly adding the update.

xjm’s picture

The update hook looks good to me. Still need the upgrade path tests. Let's also have some folks test the upgrade manually on a site with some populated aggregator data.

irunflower’s picture

Tested this patch in office hours today:

Local environment (Acquia Dev). Upgraded from 6.25 to 7.x-dev.

Steps:
Installed 6.25 fresh
Manually upgraded to 7.x-dev (drag and drop files)
Applied patch (above) using "patch -p1 < aggregator_text.patch"
Ran upgrade.php
Checked database, "type" was changed from "varchar(255)" to "text" in link field.
Patch successful.

xjm’s picture

Issue tags:-Needs manual testing

Thanks @irunflower for confirming the update hook worked properly!

So now we just need upgrade path tests.

Québec’s picture

Hi,

after getting this message while updating a feed in aggregator

«PDOException : SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'link' at row 1: INSERT INTO {aggregator_item} (title, link, author, description, guid, timestamp, fid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => ... »

and after doing some research, I've landed here. Just to be sure that I understand everything (and that I'm in the right place!): the patch in #82 will fix the current limit that makes the cron job trip over. There were some comments about the upgrade path (#80); not sure if they concern this patch. And then it will probably be in D7.14 (is 7.14 about to come out?)?

Thanks for lighting my way.

sdmr’s picture

Tested this patch on a production site running 7.10, where the aggregator module had ceased working because of oversized URLs in a feed.

1. Uploaded patch file to drupal install root directory.
2. Applied patch using "patch -p1 < aggregator_text.patch".
3. Patch succeeded.
4. Ran update.php.
5. Did manual "remove items" then "update items" for the broken feed at admin/config/services/aggregator.

Positive result. Feed now works correctly, including importing posts with URLs which were over 255 characters long and previously breaking the aggregator import.

Québec’s picture

I applied the patch (manually) to a 7.12 install (test site), then ran update.php and did a manual cron run.
The cron run took an eternity and gave me a white page. I reloaded the browser page and the cron worked.
But I got this message at the end of the feed list (around 100):

Notice: Undefined offset: 2 in drupal_http_request() (line 930 of /home/hangar217model/public_html/includes/common.inc).

Is it related to the patch?

awebmanager’s picture

So, without wishing to read 90 posts dating back 4 years (to be fair), does this problem boil down to the length of the URL entered in the "URL" field at admin/config/services/aggregator/edit/feed/1 ? Because I have shortened the URL I originally entered (which was 193 characters) using TinyURL and am still getting the error, even though the block displays the content from the feed anyway. (The front end isn't affected.) The only reason the original URL was that long was because it was generated by a custom search in Google News.

cam8001’s picture

File: aggregator_update_path-218004-91.patch (22.73 KB)
PASSED: [[SimpleTest]]: [MySQL] 39,401 pass(es).

awebmanager, you are correct. The issue is that the title and link fields could not be longer than 255 characters, which has now been fixed for Drupal 8 but needs to be backported to 7 and 6.

Here's an update path test, but I am having trouble getting simpletest to recognise the new file locally, so I can't get the tests to run. Any advice would be much appreciated.

When things are all good I'll roll in catch's patch from 82.

xjm’s picture

@cam8001, you probably just need to clear your site's cache. The upgrade path tests look great to me, so go ahead and roll the patch. Thanks!

xjm’s picture

Assigned:Unassigned» xjm
Issue tags:-Needs tests
+++ b/modules/simpletest/tests/upgrade/update.aggregator.test
@@ -0,0 +1,46 @@
+class AggregatorUpdatePathTestCase extends UpdatePathTestCase {
+  public static function getInfo() {
+    return array(

Missing blank line here.

+++ b/modules/simpletest/tests/upgrade/update.aggregator.test
@@ -0,0 +1,46 @@
+    // Our test data only relies on aggregator.module.
+    $this->uninstallModulesExcept(array('aggregator'));

I didn't know about this function. Nice!

+++ b/modules/simpletest/tests/upgrade/update.aggregator.test
@@ -0,0 +1,46 @@
+    $this->assertTrue($this->performUpgrade(), t('The update was completed successfully.'));

Just need to remove the t() around the assertion message text here.

I'm going to apply this with the old patch locally to test it anyway, so I'll fix those things.

xjm’s picture

Ohhh, we need a files[] entry for the test file. Ahh, the quaint old days before PSR-0.
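For anyone following along, registering a test file with the D7 class registry is just an .info entry; something like this in modules/simpletest/simpletest.info (path taken from the patch above):

; Register the upgrade-path test so the registry picks up the class.
files[] = tests/upgrade/update.aggregator.test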

xjm’s picture

Status:Needs review» Reviewed & tested by the community
Files: interdiff.txt (3.34 KB), aggregator-218004-95.patch (30.83 KB)
PASSED: [[SimpleTest]]: [MySQL] 39,544 pass(es).

Attached cleans up a few things, adds the files[] entry, and reorders the files[] so the update ones are all in a group (before they were mixed in with the upgrade ones and it was confusing). I also ran the test locally with the patch and confirmed that it worked fine.

I've reviewed this issue repeatedly before, and it's undergone lots of manual testing, so marking RTBC. Thanks @cam8001 for rescuing this issue!

Pomliane’s picture

Yeah! Less than 5 years for an RTBC status. Awesome.

xjm’s picture

Yeah! Less than 5 years for an RTBC status. Awesome.

@Pomliane, I don't see any comments from you earlier in the issue helping to write the patch, review the patch, or test the patch.

xjm’s picture

Assigned:xjm» Unassigned
webchick’s picture

Version:7.x-dev» 6.x-dev
Status:Reviewed & tested by the community» Patch (to be ported)

Great work on this, folks!

Committed and pushed #95 to 7.x.

Moving down to 6.x for backport.

Pomliane: Consider the fact that all of the people involved here are volunteering their time to help push Drupal forward, and consider the effect disparaging comments like yours have on the motivation of said volunteers to continue doing so. Also consider apologizing for your remarks, and joining up with these sorts of efforts, rather than attacking them from the sidelines in the future.

greggles’s picture

I appreciate webchick's comments to @Pomliane. It's worth mentioning that they are a short version of our community's shared code of conduct - http://drupal.org/dcoc

cam8001’s picture

Issue tags:-needs backport to D7

Removing tag.

rbennion’s picture

Thanks a bunch for fixing this folks. Appreciate the effort!

David_Rothstein’s picture

Issue tags:+7.17 release notes

Yikes, I hope this one doesn't come back to bite us. I'm pretty sure you can build Views of aggregator feed items, so there are all sorts of opportunities to have a query whose performance characteristics are changed by this patch, I would think.

We're probably saved by the fact that in practice no one is using the Aggregator module for a site with a large enough number of feeds to have a serious performance impact... hopefully :)

Definitely adding this one to the release notes and CHANGELOG.txt, though.

neclimdul’s picture

Issue tags:-7.17 release notes

@David_Rothstein: or if they were, they had enough feeds to have run into this and had to patch anyway, like I did way back in the 6.x days when this was posted. :)

neclimdul’s picture

Issue tags:+7.17 release notes

stupid tags

drupalshrek’s picture

Issue summary:View changes

We're still on 7.14 on one of our sites, and I see from #104/#105 that this is fixed in 7.17.

For the record, here is an example of a URL which caused our aggregator feed to blow up:
http://www.ilsole24ore.com/art/notizie/2014-09-11/openexpo-on-line-tutti...

And here is the related error message:
PDOException: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'link' at row 1: INSERT INTO {aggregator_item} (title, link, author, description, guid, timestamp, fid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => OpenExpo: on line tutti i dati dell'Esposizione. Appaltati lavori per 574 milioni. Renzi: «Trasparenza totale» [:db_insert_placeholder_1] => http://feeds.ilsole24ore.com/c/32276/f/438662/s/3e60bea5/sc/33/l/0L0Sils... [:db_insert_placeholder_2] => [:db_insert_placeholder_3] => Contro gli scandali e l'opacità un sito che informa in tempo reale sullo stato di avanzamento dei cantieri e delle spese. Il 68% delle opere è ancora in corso... Only local images are allowed.
[:db_insert_placeholder_4] => ABWSrrsB [:db_insert_placeholder_5] => 1410464367 [:db_insert_placeholder_6] => 8 ) in aggregator_save_item() (line 156 of /aggregator/aggregator.processor.inc).

Another reason for us to upgrade, I know...