Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
While updating the nodeformpopup module to Drupal 5, I was confronted with a few very long referers (due to heaps of GETs), and that resulted in this PHP MySQL warning (it occured while creating a node via the nodeformpopup module):
Warning: Data too long for column 'referer' at row 1 query: INSERT INTO watchdog (uid, type, message, severity, link, location, referer, hostname, timestamp)
The attached patch checks the string length of the referer when creating a watchdog entry and trims it to 128 chars if it is longer.
Comment | File | Size | Author |
---|---|---|---|
#74 | 107824_updates.patch | 2.6 KB | andypost |
#72 | 107824_updates.patch | 2.57 KB | andypost |
#71 | 107824_updates.patch | 2.22 KB | andypost |
#67 | 107824_updates.patch | 1.36 KB | andypost |
#64 | updates-extra.patch | 1.55 KB | Gábor Hojtsy |
Comments
Comment #1
drummThis should have a comment explaining why we are throwing away data.
Any patches for HEAD should make the referrer column a text column which has plenty of characters.
Comment #2
gddI agree with drumm that this is better fixed at the database level. I notice in system.install that in system_update_162 referer is changed to a text field for pgsql, so really there should be another update for mysql.
Patch attached. This is my first core patch submission, please be gentle.
Comment #3
gddJust realized this should have been changed to CNR on patch submission
Comment #4
ivanSB@drupal.orgThis stuff still didn't get in 5.7.
I'm not that convinced that this should be turned into a text field since somehow referer and REMOTE_ADDR are a sort of external input.
If you're running on limited web/DB space it could be a serious problem.
http://drupal.org/node/198233
I'd call this a serious bug because if someone is trying to tampering your site he just have to put rubbish in referer to stay under radars.
I'm already applying my patch to all my pg sites.
Comment #5
drummSince we can't make a decision on truncate or lengthen, this should go to 6.x, which has more active reviewers. 6.x's dblog module has a varchar(128), which is not long enough for URLs. It should be either varchar(128) or text.
On client projects, I always make URL fields text. I used to use varchar(255), but I got asked about truncation every time. I think long URLs are something that should be expected. All the other software works with longer URLs: http://www.boutell.com/newfaq/misc/urllength.html.
Comment #6
ivanSB@drupal.orgThanks for the pointer to "URL" lenght. I thought about it and I thought that different programs may have different limits but after all it'd be a matter of what the users of drupal consider reasonable.
I'd think that they can't make good forecasts about how sane other webmasters are and they will prefer to conserve a "long" referer but they should know in advance what reasonable requested_uri() should be.
What about making them longer *and* truncating them?
BTW it is not just the referer but the requested_uri() that should be truncated (they both are external input).
People may not have the option to fine tune Apache and saving in the logs 8K most of the time is not what people would like to do on Drupal.
ISP space may not take into account the web server logs but most of the time it consider DB space.
If you really would like to be paranoid you'd check if the URLs are too long, trim them and *log* they were too long.
If someone is trying to access a not existent page with a 3K URL on my site I'd like to know it.
The lenght at which they will be truncated could be configured in settings.php or in a variable.
I consider important that watchdog works always.
Comment #7
cburschkaSo... 255 or text? And if text, what arbitrary length limit to truncate on?
Comment #8
ivanSB@drupal.orgAs drumm pointed out Apache support URL as long as 8K (for me it is longer than reasonable in most cases)... other software support different length. Some URL from google may be pretty long but I definitively doubt they can reach 2K.
Will it add too much overhead to make it configurable?
I'd set the default length to something as long as 2K.
I really don't know if it is worth to log that the log was truncated... it could be useful to spot if someone is trying to tamper your site or tune the length at which you're trimming... I don't know if it could be considered a serious veicle of attack once you trim... surely I would consider it a possible hostile activity... but maybe if you're that serious, you may tune apache or whatever... and not use drupal as an IDS... that's start to be too phylosophical and will make the decision about truncating delayed further...
Making it truncating at a configurable lenght should be surely better than the current situation and it is easier to decide upon.
If we are checking if the length is above that limit and log it too.... I wouldn't make 2 log entries since there is a chance you lose relationship between the 2 entries.
Better flag a too long entry in the entry itself to make the log atomic.
Comment #9
ivanSB@drupal.orglocales.module seems to be affected by the same problem.
Going to open a separate issue. URL max length should be treated uniformly across drupal.
$lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", ...
Comment #10
Robin Monks CreditAttribution: Robin Monks commentedPersonally, I don't like the idea of throwing away data.
-1 to cropping URLs
+1 to configuration options for this
At all rates, this is now a discussion for 7.x; doesn't time fly? If we come to a decision we can backport later.
Robin
Comment #11
ivanSB@drupal.orgYou're going to throw away data anyway since there are hardcoded limits in: browsers and servers.
Now... the current limit is too small and data get thrown away in MySQL (without notice?) anyway.
What's going to happen in real life?
The url should be trimmed by the browser or the server and most realistically if you've a very long url
1) you've made some mistake in assigning a too long url to your pages (pathauto or some of your editors)
2) some crawler got crazy
3) someone is trying to exploit something or cause a DOS
If your web server is working properly... it will trim the url. If it is not working properly you may be in trouble.
Now suppose you'd like to understand what's going on from the logs...
There are very few chances you need 8Kb of data to "debug" 1)
Most people (all?) will be hardly interested in collecting crawlers garbage.
You'd be better to protect yourself from 3).
And all this will happen just if... your web server is not working as expected... otherwise the url will get trimmed anyway and won't reach php at all.
If you want to turn drupal into an IDS you may want to write somewhere that the url was trimmed. I'd think that this task should be left to the logging facility of the web server or a real IDS, not to drupal.
[1]I think people doing forensic that would like to collect a 100K executable sent through a URL are a very small minority of drupal users and still there would be better system to intercept an extraordinary long url that doesn't put you at risk of exceptional traffic.
BTW I think there is a similar limit in access table as well.
Comment #12
Dave ReidAttached patch changes the referrer fields in watchdog and accesslog tables to type text. Upgrade paths provided.
Comment #13
AlexisWilke CreditAttribution: AlexisWilke commentedI noticed that the accesslog in the statistics module has the same problem. (i.e. once the watchdog works, you bump in this one!)
I guess D7 is affected. I'm working mainly with D6.
Thank you.
Alexis Wilke
Comment #14
Dave ReidFor now, let's keep this issue assigned to 7.x-dev. Once it gets cleared for that, we can start backporting this. Patch in #12 is good for 7.x review.
Comment #15
ivanSB@drupal.orgWe've 2 kind of URLS we should take care of... request and referer.
Surely Apache trims requested URL and that is sane... but is it going to trim referrer too?
It seems it does:
192.168.1.12 - - [10/Oct/2008:11:00:57 +0200] "GET /drupal/ HTTP/1.0" 400 8391 "-" "Wget/1.11.4"
at around 8K default...
I didn't dig further but it seems you can even tune those limits in Apache cfg.
So Apache (or any other reasonable web server) seems the place where to trim (it's the first place where you may incur in resource exhaustion), not drupal.
To make it clearer, all my motivations for trimming in drupal are gone and I'd say text is enough.
Comment #16
Anonymous (not verified) CreditAttribution: Anonymous commentedCode in #12 looks good. Are we able to provide a test?
Comment #17
AlexisWilke CreditAttribution: AlexisWilke commentedI indeed think that #12 is a good solution and that's what I was thinking to do first. On the other hand, it will use a lot more space in the database if you get a lot of hits with long referrers.
Comment #18
hass CreditAttribution: hass commentedsubscribe
Comment #20
AlexisWilke CreditAttribution: AlexisWilke commentedI thought this was fixed in D7 by changing the type to TEXT instead of a VARCHAR. Why is this bug still open? I'd like to know.
Thank you.
Alexis W.
Comment #21
Dave ReidBecause the patch in this issue hasn't been committed yet and both fields are still 'varchar' types. See {accesslog}.url and {watchdog}.referer.
Comment #22
mfbFor reference here's the error message (no longer just a warning) d7 is spitting out:
Comment #23
Dave ReidRe-rolled for head.
Comment #24
Dave ReidTable name was wrong in title...
Comment #25
mfbIt seems preferable to just add another update script in dblog.install, so those who have d7 installed can upgrade?
Comment #26
Dave ReidYeah I suppose that's better. I also went in and fixed my big pet peeve of the "referer" misspelling. It should be "referrer." Renamed the {accesslog}.url to {accesslog}.referrer for consistency as well (url implies the current url, and not a referrer). All dblog, syslog, and statistics tests passed.
From the Wikipedia article on "referrer":
Comment #27
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks.
Comment #28
Dave ReidDries, could you roll back and review #26? I'd prefer to fix the whole misspelling of "referrer" as well.
Comment #29
mfbaww I always thought 'referer' was kinda cute.. :p
Can u fix up the whitespace on the arrays so the values all line up? several of them are now one char off.
Comment #30
Dave ReidWhitespace fixed. Again note, that #23 will need to be rolled back from core, and this applied instead.
Comment #31
Dave ReidNew follow-up patchthat will apply to head without any rolling back of patches.
Comment #32
Dave ReidForgot the initial changes in the _schema functions to the column names.
Comment #33
Dave ReidI'm going to mark this as fixed and file a separate issue for fixing the "referer" spelling error.
Comment #34
andypostWhere is an issue with backport this into 6.x ?
Comment #35
Dave ReidThis is on my hitlist to backport today...
Comment #36
veriKami CreditAttribution: veriKami commentedsubscribing :-)
Comment #37
Dave ReidAh! Stop cross-posting! :)
Comment #38
Dave ReidPatch for 6.x. If this is committed, I will have to change the upgrade paths on D7 with a followup-patch.
Comment #39
andypostUpdates should be 6001 (6000 is default)
Comment #40
Dave Reidandypost, not true. From http://api.drupal.org/api/function/hook_update_N/6
I'll bet that if you look in your {system} table right now, the dblog and statistics modules will have a schema of '0' before this patch. There's even a system_update_6000.
Comment #41
Dave ReidHere's an updated D6 patch and updated follow-up D7 patch that provides the update comment blocks that I forgot to put in...
Comment #42
andypost@Dave Reid you right with version numbers but it (6000) works only if update from 5-branch
I've tried to run update.php and only 6001 shown - 6000 ignored by update.php on 6.x
Comment #43
Dave Reid@andypost. It works just fine for me. You've probably applied some patch, ran the update so now the schema for dblog and statistics is set to '6000'. You'll need to reset your install, or reset the 'schema' column in the {system} table for the dblog and statistics module to '0', which simulates what all the current 6.x installs have out there. I have attached screenshots of the update on a newly-patched 6.x install on my machine.
Comment #44
andypost@Dave Reid sorry - you right, {system}.schema_version was 6000
Comment #45
PanchoDoesn't matter whether it works for you guys, but we do never ever substantially change an update function. This ruins the upgrade path within the D7 trunk.
We can add an update_6000 function to D6 and a new update_7002 to D7.
Then we need to do the following:
in update_7001: wrap "db_change_field(... 'referer', ...)" in "if (db_column_exists('referer') {...} else {...}" or (better) check the column name first, and then do the "db_change_field" on the appropriate column.
in update_7002: wrap "db_change_field(... 'referer', ...)" in "if (db_column_exists('referer') {...}"
This is ugly but necessary. Complaints that the upgrade path doesn't work are even uglier.
Also, please create a new issue for the forgotten comment blocks update. Don't roll that in as this is an independent followup, and unnecessarily complicates this already complicated issue.
Comment #46
Dave ReidPancho, that's why when the 6.x patch is committed, the 7.x follow-up patch in #41 will need to be committed. Don't worry about the 7.x upgrade path. It's not finalized. We don't need to wrap everything in db_column_exists, the column names are still the same. db_change_column will still work in the 7.x paths.
Comment #47
PanchoOh, I'm really sorry - I mixed it up with #343626: Fix "referrer" spelling errors and rename {accesslog}.url to {accesslog}.referrer.
What remains valid from #45 is the last point to split off the forgotten comment blocks issue. I don't see it in the patch either way, so please make sure it doesn't get forgotten.
Sorry again.
Comment #48
fractile81 CreditAttribution: fractile81 commentedSubscribing. Running into long URLs with a search appliance: faceted search is cool, but makes for some really long URLs. It would be great to get this resolved! Thanks!
Comment #49
andypostbug #221020: locales_source.location is just varchar(255) and it is not truncated is the same nature but Gábor Hojtsy has opinion do not change database so it's different
Comment #50
andypostPatch #41 for d6 works fine on live sites so RTBC
Comment #51
andypostSorry, previous was wrong - here patch for d6 last patch outdated because t() removed from table descriptions
Patch for D7 (migration) should be applied latter
Comment #52
AlexisWilke CreditAttribution: AlexisWilke commentedAndy,
You have to choose one or the other method. Either you do it by making sure you never go over the limit or you remove the limit. I think it is better to remove the limit since in 99% of the cases it won't go over the usual size of about 50 to 100 characters. But whenever it happens that a URL is 300, it still works as expected.
Entry #13 (by me) is the same solution as #221020: locales_source.location is just varchar(255) and it is not truncated i.e. truncate the data.
Thank you.
Alexis
Comment #53
andypost@Alexis, as #26 already in CVS by #27 - limit should be removed
better review my last patch
Comment #54
fractile81 CreditAttribution: fractile81 commentedI applied the patch (#51) by hand, and it upgraded and is working successfully!
If someone can verify the patch applies, I think this could be considered reviewed!
Comment #55
Dave ReidMarked #371668: watchdog table referer column not big enough as a duplicate of this issue.
Comment #56
fractile81 CreditAttribution: fractile81 commentedIt's unfortunate this didn't make it into D6.10. If no one else can test that the patch applies, I'll do it (just can't right now).
Comment #57
mattlc CreditAttribution: mattlc commentedHi,
I got the same problem : a vrey long uri called by another server I do not control.
I applied the patch and now it work fine.
Hopefully, this issue will be fixed in next core versions.
Tks!
Comment #58
andypostAs for me patch from #51 applies cleanly but nobody test it except me
Comment #59
Josh Waihi CreditAttribution: Josh Waihi commentedThis looks good to me. I noticed this problem for some ago in PostgreSQL but haven't bothered fixing it since it was only a db logging bug. I'm happy to RTBC this but we'll need to fix the upgrade in D7 too. #41's patch http://drupal.org/files/issues/107824-followup-D7_1.patch looks like it'll do the trick.
Comment #60
hctomHi @all,
will this make it into the next Drupal 6.x core update?
Cheers
hctom
Comment #61
Anonymous (not verified) CreditAttribution: Anonymous commentedTested patch posted #51 on Drupal 6.12 MySQL, works fine thanks, subscribing as this'd be nice to see in 6x core
Comment #62
andypostAbout URL length in browsers
IE 2083 - http://support.microsoft.com/kb/208427
Summary - http://www.boutell.com/newfaq/misc/urllength.html
Suppose this numbers help Gabor to include this patch into 6.x
Comment #63
Gábor HojtsyFinally committed #51 to Drupal 6.x! Now open again for the follow up patch on Drupal 7. Opening as critical to make sure it gets resolved before release.
Comment #64
Gábor HojtsyIn fact, we should not mark the two added updates as 5.x to 6.x updates, because then they would be removed in the D7 release. All D6 -> D6 updates are kept though to allow for updating from any D6 version to any D7 version by tradition. So I've fixed the grouping comments to reflect that with our standard 6.x-extra group. Committed this to 6.x too.
Comment #65
andypostso trivial, rtbc
Comment #66
Gábor Hojtsy@andypost: as said, I did commit #64. So this issue is left needs work for the fixup of the 7.x update path.
Comment #67
andypostPatch to fix upgrade path
Should we leave empty
defgroup updates-6.x-to-7.x
section in statistics.install?Comment #68
Dave Reidcurrent patch breaks the upgrade path. We need to rename statistics_update_7000 to statistics_update_6000 and add dblog_update_6000() to match what is currently in D6. Otherwise people upgrading from a non-updated version of D6 to D7 still get the full upgrade path.
Comment #69
andypost@Dave take a look at #278592: Sync 6.x extra updates with HEAD and Gabor's comment http://drupal.org/node/221020#comment-2038462
Comment #70
Dave Reid@andypost: That applies for 5.x to 6.x updates. This is a 6.x to 6.x update. Read #64 above: "All D6 -> D6 updates are kept though to allow for updating from any D6 version to any D7 version by tradition." That's what we should be doing here.
Comment #71
andypostNew patch adds
updates-6.x-extra
sectionComment #72
andypostAdded formating for d6->d7
Comment #73
andypostChanged comment, RTBC as minor change
Comment #74
andypostpatch was lost, re attach here
Comment #75
andyposttaggin
Comment #76
andypostShould wait til #278592: Sync 6.x extra updates with HEAD
Comment #77
andypostFixed in #278592: Sync 6.x extra updates with HEAD
Comment #79
ivanSB@drupal.orgI just downloaded D7 and the "short url" problem is plaguing the cache too:
cache_page inherits from cache...
Comment #80
Dave Reid@ivanSB: This has nothing to do with the cache tables and that's a whole nother can of worms. Open a new core issue.
Comment #81
andypostJust a link to cause of the problem under Mysql 5.0 #39 #221020: locales_source.location is just varchar(255) and it is not truncated
Comment #82
effulgentsia CreditAttribution: effulgentsia commentedSimilar issue for {accesslog}.path: #1180642: PDOException in statistics_exit() when path longer than 255 characters