While updating the nodeformpopup module to Drupal 5, I was confronted with a few very long referers (due to heaps of GETs), and that resulted in this PHP MySQL warning (it occured while creating a node via the nodeformpopup module):

Warning: Data too long for column 'referer' at row 1 query: INSERT INTO watchdog (uid, type, message, severity, link, location, referer, hostname, timestamp)

The attached patch checks the string length of the referer when creating a watchdog entry and trims it to 128 chars if it is longer.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drumm’s picture

Status: Needs review » Needs work

This should have a comment explaining why we are throwing away data.

Any patches for HEAD should make the referrer column a text column which has plenty of characters.

gdd’s picture

Component: base system » install system
FileSize
876 bytes

I agree with drumm that this is better fixed at the database level. I notice in system.install that in system_update_162 referer is changed to a text field for pgsql, so really there should be another update for mysql.

Patch attached. This is my first core patch submission, please be gentle.

gdd’s picture

Status: Needs work » Needs review

Just realized this should have been changed to CNR on patch submission

ivanSB@drupal.org’s picture

This stuff still didn't get in 5.7.

I'm not that convinced that this should be turned into a text field since somehow referer and REMOTE_ADDR are a sort of external input.
If you're running on limited web/DB space it could be a serious problem.

http://drupal.org/node/198233

I'd call this a serious bug because if someone is trying to tampering your site he just have to put rubbish in referer to stay under radars.
I'm already applying my patch to all my pg sites.

drumm’s picture

Version: 5.x-dev » 6.x-dev
Component: install system » dblog.module
Status: Needs review » Needs work

Since we can't make a decision on truncate or lengthen, this should go to 6.x, which has more active reviewers. 6.x's dblog module has a varchar(128), which is not long enough for URLs. It should be either varchar(128) or text.

On client projects, I always make URL fields text. I used to use varchar(255), but I got asked about truncation every time. I think long URLs are something that should be expected. All the other software works with longer URLs: http://www.boutell.com/newfaq/misc/urllength.html.

ivanSB@drupal.org’s picture

Thanks for the pointer to "URL" lenght. I thought about it and I thought that different programs may have different limits but after all it'd be a matter of what the users of drupal consider reasonable.
I'd think that they can't make good forecasts about how sane other webmasters are and they will prefer to conserve a "long" referer but they should know in advance what reasonable requested_uri() should be.

What about making them longer *and* truncating them?

BTW it is not just the referer but the requested_uri() that should be truncated (they both are external input).

People may not have the option to fine tune Apache and saving in the logs 8K most of the time is not what people would like to do on Drupal.
ISP space may not take into account the web server logs but most of the time it consider DB space.

If you really would like to be paranoid you'd check if the URLs are too long, trim them and *log* they were too long.
If someone is trying to access a not existent page with a 3K URL on my site I'd like to know it.

The lenght at which they will be truncated could be configured in settings.php or in a variable.

I consider important that watchdog works always.

cburschka’s picture

So... 255 or text? And if text, what arbitrary length limit to truncate on?

ivanSB@drupal.org’s picture

As drumm pointed out Apache support URL as long as 8K (for me it is longer than reasonable in most cases)... other software support different length. Some URL from google may be pretty long but I definitively doubt they can reach 2K.

Will it add too much overhead to make it configurable?

I'd set the default length to something as long as 2K.

I really don't know if it is worth to log that the log was truncated... it could be useful to spot if someone is trying to tamper your site or tune the length at which you're trimming... I don't know if it could be considered a serious veicle of attack once you trim... surely I would consider it a possible hostile activity... but maybe if you're that serious, you may tune apache or whatever... and not use drupal as an IDS... that's start to be too phylosophical and will make the decision about truncating delayed further...
Making it truncating at a configurable lenght should be surely better than the current situation and it is easier to decide upon.

If we are checking if the length is above that limit and log it too.... I wouldn't make 2 log entries since there is a chance you lose relationship between the 2 entries.
Better flag a too long entry in the entry itself to make the log atomic.

ivanSB@drupal.org’s picture

locales.module seems to be affected by the same problem.
Going to open a separate issue. URL max length should be treated uniformly across drupal.

$lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", ...

Robin Monks’s picture

Version: 6.x-dev » 7.x-dev

Personally, I don't like the idea of throwing away data.

-1 to cropping URLs
+1 to configuration options for this

At all rates, this is now a discussion for 7.x; doesn't time fly? If we come to a decision we can backport later.

Robin

ivanSB@drupal.org’s picture

You're going to throw away data anyway since there are hardcoded limits in: browsers and servers.
Now... the current limit is too small and data get thrown away in MySQL (without notice?) anyway.

What's going to happen in real life?
The url should be trimmed by the browser or the server and most realistically if you've a very long url
1) you've made some mistake in assigning a too long url to your pages (pathauto or some of your editors)
2) some crawler got crazy
3) someone is trying to exploit something or cause a DOS

If your web server is working properly... it will trim the url. If it is not working properly you may be in trouble.
Now suppose you'd like to understand what's going on from the logs...
There are very few chances you need 8Kb of data to "debug" 1)
Most people (all?) will be hardly interested in collecting crawlers garbage.
You'd be better to protect yourself from 3).
And all this will happen just if... your web server is not working as expected... otherwise the url will get trimmed anyway and won't reach php at all.
If you want to turn drupal into an IDS you may want to write somewhere that the url was trimmed. I'd think that this task should be left to the logging facility of the web server or a real IDS, not to drupal.

[1]I think people doing forensic that would like to collect a 100K executable sent through a URL are a very small minority of drupal users and still there would be better system to intercept an extraordinary long url that doesn't put you at risk of exceptional traffic.

BTW I think there is a similar limit in access table as well.

Dave Reid’s picture

Status: Needs work » Needs review
FileSize
2.31 KB

Attached patch changes the referrer fields in watchdog and accesslog tables to type text. Upgrade paths provided.

AlexisWilke’s picture

Title: Check referer string length when creating a watchdog entry » Check referer string length when creating a watchdog OR accesslog entry
Version: 7.x-dev » 6.x-dev
Component: dblog.module » statistics.module
FileSize
492 bytes

I noticed that the accesslog in the statistics module has the same problem. (i.e. once the watchdog works, you bump in this one!)

I guess D7 is affected. I'm working mainly with D6.

Thank you.
Alexis Wilke

Dave Reid’s picture

Version: 6.x-dev » 7.x-dev

For now, let's keep this issue assigned to 7.x-dev. Once it gets cleared for that, we can start backporting this. Patch in #12 is good for 7.x review.

ivanSB@drupal.org’s picture

We've 2 kind of URLS we should take care of... request and referer.

Surely Apache trims requested URL and that is sane... but is it going to trim referrer too?

It seems it does:
192.168.1.12 - - [10/Oct/2008:11:00:57 +0200] "GET /drupal/ HTTP/1.0" 400 8391 "-" "Wget/1.11.4"
at around 8K default...
I didn't dig further but it seems you can even tune those limits in Apache cfg.

So Apache (or any other reasonable web server) seems the place where to trim (it's the first place where you may incur in resource exhaustion), not drupal.

To make it clearer, all my motivations for trimming in drupal are gone and I'd say text is enough.

Anonymous’s picture

Code in #12 looks good. Are we able to provide a test?

AlexisWilke’s picture

I indeed think that #12 is a good solution and that's what I was thinking to do first. On the other hand, it will use a lot more space in the database if you get a lot of hits with long referrers.

hass’s picture

subscribe

Status: Needs review » Needs work

The last submitted patch failed testing.

AlexisWilke’s picture

I thought this was fixed in D7 by changing the type to TEXT instead of a VARCHAR. Why is this bug still open? I'd like to know.

Thank you.
Alexis W.

Dave Reid’s picture

Because the patch in this issue hasn't been committed yet and both fields are still 'varchar' types. See {accesslog}.url and {watchdog}.referer.

mfb’s picture

For reference here's the error message (no longer just a warning) d7 is spitting out:

Error
PDOException: INSERT INTO {watchdog} (uid, type, message, variables, severity, link, location, referer, hostname, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9) - Array ( [:db_insert_placeholder_0] => 1 [:db_insert_placeholder_1] => user [:db_insert_placeholder_2] => Session closed for %name. [:db_insert_placeholder_3] => a:1:{s:5:"%name";s:7:"mfb";} [:db_insert_placeholder_4] => 5 [:db_insert_placeholder_5] => [:db_insert_placeholder_6] => http://localhost/d7/user/logout [:db_insert_placeholder_7] => ***very long referer removed*** [:db_insert_placeholder_8] => 127.0.0.1 [:db_insert_placeholder_9] => 1228299052 ) SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'referer' at row 1 in dblog_watchdog() (line 135 of /modules/dblog/dblog.module).
The website encountered an unexpected error. Please try again later. 
Dave Reid’s picture

Title: Check referer string length when creating a watchdog OR accesslog entry » Convert {dblog}.referer and {statisics}.url from varchar to text
Assigned: Unassigned » Dave Reid
Status: Needs work » Needs review
FileSize
2.35 KB

Re-rolled for head.

Dave Reid’s picture

Title: Convert {dblog}.referer and {statisics}.url from varchar to text » Convert {dblog}.referer and {watchdog}.url from VARCHAR to TEXT

Table name was wrong in title...

mfb’s picture

Title: Convert {dblog}.referer and {watchdog}.url from VARCHAR to TEXT » Convert {watchdog}.referer and {accesslog}.url from VARCHAR to TEXT

It seems preferable to just add another update script in dblog.install, so those who have d7 installed can upgrade?

Dave Reid’s picture

Status: Fixed » Needs review
FileSize
13.18 KB

Yeah I suppose that's better. I also went in and fixed my big pet peeve of the "referer" misspelling. It should be "referrer." Renamed the {accesslog}.url to {accesslog}.referrer for consistency as well (url implies the current url, and not a referrer). All dblog, syslog, and statistics tests passed.

From the Wikipedia article on "referrer":

Referer is a common misspelling of the word referrer. It is so common, in fact, that it made it into the official specification of HTTP – the communication protocol of the World Wide Web – and has therefore become a widely used industry spelling when discussing HTTP referrers[citation needed]. The misspelling usage is not universal, the correct spelling of "referrer" is used in some web specifications such as the Document Object Model.

Dries’s picture

Status: Needs review » Fixed

Committed to CVS HEAD. Thanks.

Dave Reid’s picture

Status: Needs review » Fixed

Dries, could you roll back and review #26? I'd prefer to fix the whole misspelling of "referrer" as well.

mfb’s picture

Status: Fixed » Needs review

aww I always thought 'referer' was kinda cute.. :p
Can u fix up the whitespace on the arrays so the values all line up? several of them are now one char off.

Dave Reid’s picture

Whitespace fixed. Again note, that #23 will need to be rolled back from core, and this applied instead.

Dave Reid’s picture

FileSize
12.46 KB

New follow-up patchthat will apply to head without any rolling back of patches.

Dave Reid’s picture

FileSize
13.08 KB

Forgot the initial changes in the _schema functions to the column names.

Dave Reid’s picture

Status: Needs review » Fixed

I'm going to mark this as fixed and file a separate issue for fixing the "referer" spelling error.

andypost’s picture

Where is an issue with backport this into 6.x ?

Dave Reid’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)

This is on my hitlist to backport today...

veriKami’s picture

Version: 6.x-dev » 7.x-dev
Status: Patch (to be ported) » Fixed

subscribing :-)

Dave Reid’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)

Ah! Stop cross-posting! :)

Dave Reid’s picture

Status: Patch (to be ported) » Needs review
FileSize
2 KB

Patch for 6.x. If this is committed, I will have to change the upgrade paths on D7 with a followup-patch.

andypost’s picture

Updates should be 6001 (6000 is default)

Dave Reid’s picture

andypost, not true. From http://api.drupal.org/api/function/hook_update_N/6

Database updates consist of 3 parts:

* 1 digit for Drupal core compatibility
* 1 digit for your module's major release version (e.g. is this the 5.x-1.* (1) or 5.x-2.* (2) series of your module?)
* 2 digits for sequential counting starting with 00

I'll bet that if you look in your {system} table right now, the dblog and statistics modules will have a schema of '0' before this patch. There's even a system_update_6000.

Dave Reid’s picture

Here's an updated D6 patch and updated follow-up D7 patch that provides the update comment blocks that I forgot to put in...

andypost’s picture

@Dave Reid you right with version numbers but it (6000) works only if update from 5-branch
I've tried to run update.php and only 6001 shown - 6000 ignored by update.php on 6.x

Dave Reid’s picture

@andypost. It works just fine for me. You've probably applied some patch, ran the update so now the schema for dblog and statistics is set to '6000'. You'll need to reset your install, or reset the 'schema' column in the {system} table for the dblog and statistics module to '0', which simulates what all the current 6.x installs have out there. I have attached screenshots of the update on a newly-patched 6.x install on my machine.

andypost’s picture

@Dave Reid sorry - you right, {system}.schema_version was 6000

The following queries were executed
dblog module
Update #6000

    * ALTER TABLE {watchdog} RENAME referer TO referer_old
    * ALTER TABLE {watchdog} ADD COLUMN referer text
    * UPDATE {watchdog} SET referer = referer_old
    * ALTER TABLE {watchdog} DROP COLUMN referer_old
Pancho’s picture

Status: Needs review » Needs work

Doesn't matter whether it works for you guys, but we do never ever substantially change an update function. This ruins the upgrade path within the D7 trunk.

We can add an update_6000 function to D6 and a new update_7002 to D7.
Then we need to do the following:
in update_7001: wrap "db_change_field(... 'referer', ...)" in "if (db_column_exists('referer') {...} else {...}" or (better) check the column name first, and then do the "db_change_field" on the appropriate column.
in update_7002: wrap "db_change_field(... 'referer', ...)" in "if (db_column_exists('referer') {...}"

This is ugly but necessary. Complaints that the upgrade path doesn't work are even uglier.

Also, please create a new issue for the forgotten comment blocks update. Don't roll that in as this is an independent followup, and unnecessarily complicates this already complicated issue.

Dave Reid’s picture

Status: Needs work » Needs review

Pancho, that's why when the 6.x patch is committed, the 7.x follow-up patch in #41 will need to be committed. Don't worry about the 7.x upgrade path. It's not finalized. We don't need to wrap everything in db_column_exists, the column names are still the same. db_change_column will still work in the 7.x paths.

Pancho’s picture

Oh, I'm really sorry - I mixed it up with #343626: Fix "referrer" spelling errors and rename {accesslog}.url to {accesslog}.referrer.
What remains valid from #45 is the last point to split off the forgotten comment blocks issue. I don't see it in the patch either way, so please make sure it doesn't get forgotten.
Sorry again.

fractile81’s picture

Subscribing. Running into long URLs with a search appliance: faceted search is cool, but makes for some really long URLs. It would be great to get this resolved! Thanks!

andypost’s picture

bug #221020: locales_source.location is just varchar(255) and it is not truncated is the same nature but Gábor Hojtsy has opinion do not change database so it's different

andypost’s picture

Status: Needs review » Reviewed & tested by the community

Patch #41 for d6 works fine on live sites so RTBC

andypost’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
2.48 KB

Sorry, previous was wrong - here patch for d6 last patch outdated because t() removed from table descriptions

Patch for D7 (migration) should be applied latter

AlexisWilke’s picture

Status: Reviewed & tested by the community » Needs review

Andy,

You have to choose one or the other method. Either you do it by making sure you never go over the limit or you remove the limit. I think it is better to remove the limit since in 99% of the cases it won't go over the usual size of about 50 to 100 characters. But whenever it happens that a URL is 300, it still works as expected.

Entry #13 (by me) is the same solution as #221020: locales_source.location is just varchar(255) and it is not truncated i.e. truncate the data.

Thank you.
Alexis

andypost’s picture

@Alexis, as #26 already in CVS by #27 - limit should be removed

better review my last patch

fractile81’s picture

I applied the patch (#51) by hand, and it upgraded and is working successfully!

If someone can verify the patch applies, I think this could be considered reviewed!

Dave Reid’s picture

Marked #371668: watchdog table referer column not big enough as a duplicate of this issue.

fractile81’s picture

It's unfortunate this didn't make it into D6.10. If no one else can test that the patch applies, I'll do it (just can't right now).

mattlc’s picture

Hi,

I got the same problem : a vrey long uri called by another server I do not control.

I applied the patch and now it work fine.
Hopefully, this issue will be fixed in next core versions.

Tks!

andypost’s picture

As for me patch from #51 applies cleanly but nobody test it except me

Josh Waihi’s picture

Status: Needs review » Reviewed & tested by the community

This looks good to me. I noticed this problem for some ago in PostgreSQL but haven't bothered fixing it since it was only a db logging bug. I'm happy to RTBC this but we'll need to fix the upgrade in D7 too. #41's patch http://drupal.org/files/issues/107824-followup-D7_1.patch looks like it'll do the trick.

hctom’s picture

Status: Needs review » Reviewed & tested by the community

Hi @all,

will this make it into the next Drupal 6.x core update?

Cheers

hctom

Anonymous’s picture

Tested patch posted #51 on Drupal 6.12 MySQL, works fine thanks, subscribing as this'd be nice to see in 6x core

andypost’s picture

About URL length in browsers
IE 2083 - http://support.microsoft.com/kb/208427
Summary - http://www.boutell.com/newfaq/misc/urllength.html

Suppose this numbers help Gabor to include this patch into 6.x

Gábor Hojtsy’s picture

Version: 6.x-dev » 7.x-dev
Priority: Normal » Critical
Status: Reviewed & tested by the community » Needs work

Finally committed #51 to Drupal 6.x! Now open again for the follow up patch on Drupal 7. Opening as critical to make sure it gets resolved before release.

Gábor Hojtsy’s picture

FileSize
1.55 KB

In fact, we should not mark the two added updates as 5.x to 6.x updates, because then they would be removed in the D7 release. All D6 -> D6 updates are kept though to allow for updating from any D6 version to any D7 version by tradition. So I've fixed the grouping comments to reflect that with our standard 6.x-extra group. Committed this to 6.x too.

andypost’s picture

Status: Needs work » Reviewed & tested by the community

so trivial, rtbc

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

@andypost: as said, I did commit #64. So this issue is left needs work for the fixup of the 7.x update path.

andypost’s picture

Status: Needs work » Needs review
FileSize
1.36 KB

Patch to fix upgrade path

Should we leave empty defgroup updates-6.x-to-7.x section in statistics.install?

Dave Reid’s picture

Status: Needs review » Needs work

current patch breaks the upgrade path. We need to rename statistics_update_7000 to statistics_update_6000 and add dblog_update_6000() to match what is currently in D6. Otherwise people upgrading from a non-updated version of D6 to D7 still get the full upgrade path.

andypost’s picture

Status: Needs work » Needs review
Dave Reid’s picture

Status: Needs review » Needs work

@andypost: That applies for 5.x to 6.x updates. This is a 6.x to 6.x update. Read #64 above: "All D6 -> D6 updates are kept though to allow for updating from any D6 version to any D7 version by tradition." That's what we should be doing here.

andypost’s picture

Status: Needs work » Needs review
FileSize
2.22 KB

New patch adds updates-6.x-extra section

andypost’s picture

FileSize
2.57 KB

Added formating for d6->d7

andypost’s picture

Status: Needs review » Reviewed & tested by the community

Changed comment, RTBC as minor change

andypost’s picture

FileSize
2.6 KB

patch was lost, re attach here

andypost’s picture

Issue tags: +D7 upgrade path

taggin

andypost’s picture

Version: 7.x-dev » 6.x-dev
Status: Reviewed & tested by the community » Postponed
andypost’s picture

Status: Postponed » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

ivanSB@drupal.org’s picture

I just downloaded D7 and the "short url" problem is plaguing the cache too:

$schema['cache'] = array(
    'description' => 'Generic cache table for caching things not separated out into their own tables. Contributed modules may also use this to store cached items.',
    'fields' => array(
      'cid' => array(
        'description' => 'Primary Key: Unique cache ID.',
        'type' => 'varchar',
        'length' => 255,
        'not null' => TRUE,
        'default' => '',
      ),

cache_page inherits from cache...

Dave Reid’s picture

@ivanSB: This has nothing to do with the cache tables and that's a whole nother can of worms. Open a new core issue.

andypost’s picture

effulgentsia’s picture