The problem with the current watchdog() calls is that the messages are stored in the actual locale used to present that page. Ie. if you have a German, French and English section (version) of your site, watchdog logs will be recorded in all three languages intermixed with each other. Administrators reviewing logs are supposed to understand all languages used on the site to understand possible problems. This is not the desired user experience.
To make the watchdog logs admin locale aware, we need to postpone the t() call to the time the log is presented to the administrator and should not t() the string on storage time. This also means that we should store the message in English, and the variables separately in a different field. I have chosen serialize()/unserialize() to store the array.
We should note that for backward compatibility the previous watchdog messages should have an 'N;' value in the field (the serialization of NULL). This needs an upgrade path, which is not yet implemented in the patch, but is quite simple. Some messages in system.install for example also log stuff which is user defined, so t()-ing it is not desired. With specifying N; in the variables field, it is possible to keep those away from t().
Pros of this approach:
- Actual locale used to present messages to the administrator
- Messages stored before any locale is available (ie. install time messages) are later translated so easier to review and understand
Cons:
- Tablesorting of watchdog messages on the admin interface is not directly done using the displayed value. So eg. 404 messages with different file names will not get sorted by the filename, which is only added after the tablesort is done.
- Slightly more storage requirement for the watchdog table.
In my patch, I have updated the watchdog() function used to store messages and the two watchdog review interfaces in watchdog.module. As an example, I also updated the user.module (also fixing some missing theme placeholder calls). This is not a complete patch, updating of other watchdog calls and the upgrade path is also required.
Since this change requires all watchdog() calls changed to run without errors, I have not tested the change in runtime. First I would like to gather opinions on the viability of the patch to put more work into it.
Please review!
Comment | File | Size | Author |
---|---|---|---|
#62 | syslog_plaintext-76588.patch | 1.77 KB | AmrMostafa |
#48 | watchdog-syslog_2.patch | 3.69 KB | kbahey |
#45 | syslog.patch | 3.69 KB | RobRoy |
#44 | watchdog-syslog_1.patch | 3.7 KB | kbahey |
#42 | watchdog-syslog_0.patch | 3.7 KB | kbahey |
Comments
Comment #1
Jose Reyero CreditAttribution: Jose Reyero commentedHi Goba,
I think this is a great idea and looks to me like the best of the solutions.
About the patch, just an idea: we could keep function arguments as they are now, but having either an string or an array for the second one. The watchdog funcion could look which type is passed and then log the message, or the message plus arguments depending on that.
This way it would be backwards compatible -which I know is not the main goal- but it will also simplify the function call when no arguments are needed. It would look like:
A simple array shifting will do to separate the main string from arguments array...
Comment #2
Gábor HojtsyJose, it is not usual to pass these kind of structured arrays in Drupal, and changing the API was never a problem. In fact it enforces people to not follow the usage of t() in watchdog messages. Your solution would silenty accept the t() usage. Either way, let's hear some core comitter comments, so I can continue preparing the patch.
Comment #3
drummSince this changes watchdog()'s arguments, it can't be done in 5.0.
Comment #4
Gábor HojtsyComing back to the issue to give a short summary. Basically the problem is that watchdog() messages are always t()-ed at runtime, so if you have a multilanguage site, you have messages in different languages sprinkled around your admin interface, and it is not actually possible to overview, search, sort or do any other kind of useful action on them.
The patch surely does not apply, but the idea is simple:
A possible problem is if there are embedded calls to t() when creating the replacement array. I would say these instances are only a small minority, so we should not put off the gain we get from the above solution. But maybe you have a better idea of implementing this!? Or just a few encouraging thoghts, so someone has the courage to change all watchdog calls in core :)
Comment #5
kbahey CreditAttribution: kbahey commentedThere is a fly in the ointment: dblog (nee watchdog) is not the only mechanism that uses the watchdog hook. Other modules now use that hook to route messages, even outside of Drupal (e.g. email, syslog).
For these, there is no way for applying a delayed t().
So, while your solution will address the dblog part, it will leave an unaddressed area for the rest of the module.
Don't know offhand what the best solution would be.
Comment #6
Gábor Hojtsy@kbahey: as far as I see, it does not matter, if other loggers cannot do a t() later. The reality now is that they get spaghetti log messages, which are not useful for people overviewing, generating reports from those, etc. If you have five languages enabled on the site, you will get log messages "randomly" in all five languages saved. I think it is highly acceptable to get only English messages, compared to having messages in five arbitrary languages (which might not include English, for sure). If you have the ability to translate on display time then fine, if not, then it is not our problem.
Comment #7
kbahey CreditAttribution: kbahey commentedGabor
The severity column needs to be moved after the uid column for performance reasons.
Comment #8
magico CreditAttribution: magico commentedSee also #40657.
Comment #9
Chris Johnson CreditAttribution: Chris Johnson commentedIf I understand correctly, Gabor's proposal would result in the watchdog hook always calling consumers with English messages, and dblog would then store that data in his modified format, allowing translation at viewing time. If that's essentially correct, then I agree.
Alternatively, there could be an admin selected language to which all messages are translated before passing to consumers. That would allow an admin to choose to have all watchdog messages in Spanish (for example), whether stored in the Drupal dblog database table, or issued to syslog or sent via SMS. I'm not sure what kind of technical effort that might require. it might be simple, or not.
Comment #10
Gábor Hojtsy@kbahey: I am absolutely fine with moving the severity column after the uid column, and I don't really think it belongs in here :)
@magico: Thanks for the issue tip. There chx noted that it might be a performance problem to unseralize the stored replacement array for every message on an admin log page view. As far as I see, this change would help performance on "public" pages, where t() would not be executed on log messages. Log admin pages are visited a lot less. I don't see other points discussed there (apart from whether this is a bug or feature request). I think it is a task :)
@Chris Johnson: Yes, you understand it right. What hook_watchdog() would get is not only a string with a log message, but a string and a replacement array, which can be stored separately, or parsed to a single string for systems where it is not possible to later t() the string with stored replacements. A simple strtr($string, $replacements) does the trick for syslog, so readable English log messages are logged consistently to syslog.
Comment #11
Jose Reyero CreditAttribution: Jose Reyero commentedOh, I thought this had been committed time ago :-(
Anyway, I think Gabor's patch is great, and not only a good solution for the problem, but 'the good one'.
So I think we should go ahead with it and let's leave all the other performance fixes for a different patch. Sure there are many other issues we can fix, but one step at a time...
Comment #12
Wesley Tanaka CreditAttribution: Wesley Tanaka commentedsubscribing via http://drupal.org/node/40657 (which was marked as duplicate of this)
Comment #13
Gábor HojtsyOk, here is a complete and updated patch for latest HEAD to get this change comitted soon.
What it does:
- adds a 'variables' column to the watchdog table
- adds a $variables parameter to watchdog() just between message and severity
- dblog stores the $variables serialized in the database
- syslog replaces variables and logs the resulting message in English
- dblog translates the message and replaces variables on the admin pages
Compromises of this patch:
- node type and language names are not stored translated in the variables array, so not translated in the final output
- a few places needed the message duplicated in code to have drupal_set_message() called with t() and watchdog without t()
- the only message still having translated parts is the throttle message, where format_plural() is used, and I see no clean way to work around that now
- as a result of this patch, although there is an option to order by message text on the overview table, the English text of the stored message is used for ordering, not the translated text
Future work after this patch gets comitted:
- provide an upgrade path, so previous watchdog entries have their messages with "N;" variables (which is the serialization of NULL)
- find a way to abstract the 'link' field of the watchdog messages/table to be language independent... now l(t('view'), ...), l(t('results'), ...) and l(t('edit'), ...) generated HTML links are stored there (in Drupal core). Maybe find a better single word instead of these three to always print, or store the link target and link type separately, showing text according to the link type on display
I have tested the included patch and it seems to work nicely, storing arrays properly and translating on display (with dblog). As far as I see, the compromises made are worth the gains: so that people can now have a single (English) language to log messages with syslog, and any language to review all of their messages with dblog.
I would appreciate reviews and encouragement, so we can step forward :)
Comment #14
Gábor HojtsyWhoops, I have gone against my own documentation at two places. When the log message is not something to translate, like a $_GET['q'] as used in the 'access denied' and 'page not found' error messages, NULL should be sent in the $variables parameter, so dblog knows that the string is not to be translated. Updated patch attached.
Comment #15
kbahey CreditAttribution: kbahey commentedGabor
Few comments:
1. We need an update_2008() in system.install so the watchdog table schema is upgraded with the new column.
2. The severity column needs to move after type and before the other text fields for the sake of performance.
3. We can change the watchdog() function signature to make the $variables parameter optional. This leaves only the first two parameters as required ($type and $message). This way modules/sites that don't use care about language can just send their message in and have it be AS IS.
So:
function watchdog($type, $message, $severity = WATCHDOG_NOTICE, $variables = array(), $link = NULL)
Also, why don't we take the plunge and just make the argument a keyed array. Every other part of Drupal is moving in that direction, so why not consider it for this too?
4. I also do not understand why we have + before and after in this patch:
I will test it after you address the above.
Comment #16
Gábor HojtsyFine with me. Unless the database model is decided, there is no point in doing that update (which is most error prone to fail when you apply a patch, given that possibly other changes will get sooner in then this).
I did not do anything with the severity column, it was there after the text column before (for quite some time). If you think moving that is required, it is still out of scope for this patch. I got too many objections on such side effects of my core patches before, so I would not include that here. Feel free to open a new issue.
Erm? What do you mean? I have the $variables optional already, it is just not in the order you have shown here (but you did not object against that it seems). Only the first two are required. I don't think an associative array is right here. It was introduced at other places, where there were too many arguments, or arguments were often omitted in mixed places. Also we *are* going to parse the watchdog messages out of the Drupal code for translation, despite they being out of t() after this patch. We still need them for translation. If we introduce a "nice" keyed array here, it makes that extraction a lot more harder (if not impossible in many cases).
This is a unified diff, everything included is diffed inline, there is no "before and after part" in there (if this is what you mean). The parts you cited are for mysql and pgsql respectively.
Comment #17
Gábor HojtsyFor consistency, we can go on the 'callback' and 'arguments' storage route, to support log messages with format_plural(), as in throttle module. Menu module already takes this approach for menu item title translation (http://drupal.org/node/128082), so it seems logical to apply this for consistency here too. Question is where should we put that callback name into the watchdog() parameter list... Hm...
Comment #18
Gábor HojtsyBecause of security concerns with the menu approach I linked earlier, I am sticking with the simpler implementation. The attached patch now includes an update function too, so Khalid's concerns are addressed (except the performance concern he raised about the severity column which is AFAIS offtopic here). I hope he or someone else will find the time to test the patch.
Comment #19
Gábor HojtsyDoh, attached patch I mean...
Comment #20
Dries CreditAttribution: Dries commentedI'm in support for this patch.
I think the benefit of this patch outweight the drawback (not being able to sort messages). I think this drawback can be overcome by implementing a vastly superior filtering system for the watchdog table. Also, if watchdog tables can't be sorted correctly, it probably makes sense not to make them sortable.
Should we document in the PHPdoc API why we are taking the arguments separately?
I'd like to see this patch committed, but it is probably worth thinking it through a bit more. What is going to be the impact on a filter mechanism? Things might become non-trivial.
Comment #21
Dries CreditAttribution: Dries commentedSee http://drupal.org/node/67893 for the filter discussion. It would be good to keep that issue in mind when developing this patch.
Comment #22
AmrMostafa CreditAttribution: AmrMostafa commentedGreat patch, I just tested on a recent HEAD and it works fine.
Comment #23
Gábor HojtsyNew parth addressing Dries' concerns about documenting how $message and $variables interact, also removing sortability on the message column as he suggests. Rerolled against HEAD (which of course needed a conflict fix around the update code in system.install).
I hope we can get this patch in soon, as the (lack of) issues connected to the filtering patch are already cleaned up, I addressed all concerns here (except kbahey's suggestion about moving watchdog columns around, which is not related to localization).
Comment #24
Gábor HojtsyNew patch I mean, not parth...
Comment #25
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD.
I'm marking this 'code needs work' until the upgrade documentation has been updated.
Feel free to mark this fixed once the handbook is up-to-date.
Thanks Gabor!
Comment #26
Gábor HojtsyOK, updated http://drupal.org/node/114774#watchdog_logging to reflect our changes, and added http://drupal.org/node/114774#watchdog_parameters to document how things changed here, and what is the proper usage of watchdog().
Comment #27
Gábor HojtsyCommitted three more small fixes to core to handle NULL values in syslog as well as adding a missing array() in ping.module and fixing a typo in my phpdoc.
However I noticed an interesting (superflous?) feature of syslog module, which somewhat counters the language independentness of watchdog logging. We now have this code there:
Why is t() used here? Is it really desired to have different log formats per language? What if I have four languages on my site, each with different log formats defined. Now depending on what language a page was displayed with, my syslog will have messages in different formats. Why is this feature in there? What good does that bring to us?
This is already a theme function, so the format can be customized on a site-by-site basis. Why would we need per-language modification?
Kbahey could probably enlighten us here.
Comment #28
AmrMostafa CreditAttribution: AmrMostafa commentedFew localization-unfriendly watchdogs ;-)
Comment #29
kbahey CreditAttribution: kbahey commentedGabor,
Do we really need the third parameter if we are not translating, or if there are no tokens?
I mean can't we just do:
watchdog('user', 'User has logged in');
Or do we have to specify a NULL or array() on every call?
It would be better to default the third parameter and not force everyone to pass something, even if it is an empty array or a NULL.
Comment #30
kbahey CreditAttribution: kbahey commentedGabor
The only reason syslog used t() was that this is a theme function for token substitution, not really a language translation.
The t() can be removed.
Comment #31
Gábor Hojtsy@alienbrain: Thanks for spotting these missing conversions, comitted to core! I don't know how I missed them myself...
@kbahey: I still don't know what code you are looking at. As I explained earlier, every patch I submitted here included what you suggest about $variables having a default empty array parameter, it was like this from the start, and it is like this in the comitted patch.
@kbahey: t() does language translation, not only token substitution, and that side effect is undersired here. I see you used @ placeholders, so they are all check_plain()-d. Is that required for syslog (or even desired?). I intentionally stepped special handling of @ ! and % over by using strtr() in the $message vars substitution because syslog output is supposed to end up in a character based log, doesn't it? We don't need HTML escaping or wrapping for values. Do we?
Comment #32
kbahey CreditAttribution: kbahey commentedGabor
Re: your first point,
My comment was for the documentation here http://drupal.org/node/114774#watchdog_parameters
Specifically this part;
And the examples:
Can we see the most used case and default the 3rd parameter to it (whether NULL or array()), so we don't have to pass it in every call?
Am I being clear?
Re: your second point for syslog theme function, the @ was used to strip out all HTML markup, since it is very unsightly (makes my eyes bleed) in syslog.
What I can do is clean up that function and use strip_tags() everywhere instead and remove all references to t(). How about that?
Comment #33
Gábor HojtsyKbahey: sure, the update documentation was not clear enough. Now updated to this version:
The "most used case" for the $variables parameter from core examples is an actual associative array (other values are very rare), but an empty is the default anyway, so the update documentation needed an update (not the code).
@ does not strip HTML but escapes it, so instead of
<info@example.com>
you get<info@example.com>
, which I don't think is better. If you strip_tags(), you would loose this part of the log message entirely, which might not be a good idea. Contact module seems to have such messages, where important information might be lost.(These kind of message details were already lost before my patch landed). True, maybe these can me modified to better suite syslog, or syslog can live with logging some HTML(-like stuff).
Comment #34
kbahey CreditAttribution: kbahey commentedGoba
Re: syslog.
How about something like this? (not sure about the last segment about $message and $variables, need your input on that part).
Remember that we just need a decent default, not perfect nor all encompassing, since this function is themable and people can override it anyway they like.
Comment #35
kbahey CreditAttribution: kbahey commentedI meant to say that I am not sure about the last line which contains "$entry['message'] and $entry['variables']" not $message and $variables.
Comment #36
Dries CreditAttribution: Dries commentedsyslog output can be displayed, processed and converted with a variety of tools. Syslog messages are often shown in broswer windows and have to be sanitized. See http://216.147.18.102/ams/ , http://www.dangermen.com/smt/screens.html, http://www.phpwizardry.com/php-syslog-ng.php or http://www.googlog.org/.
Comment #37
Gábor HojtsyDries, are you sure *we* need to take care of HTML escaping? These log messages can appear in any environment, depending on what viewer you use to watch your syslog messages. Other software adding log messages into the syslog escapes for HTML based display? I would doubt.
Kbahey: we are getting there AFAIS. But strip_tags()-ing the message does remove possibly important stuff from the message, like the email address I have shown above.
Comment #38
AmrMostafa CreditAttribution: AmrMostafa commentedThis replaces
t()
withstrtr()
. It could be as well useimplode()
actually, but I feelstrtr()
is easier to read/customize.And adds/uses
_syslog_strip_html_tags()
as a safe alternative forstrip_tags()
.Comment #39
Dries CreditAttribution: Dries commentedI'm not saying we should escape the strings. I'm just saying that they could be used unescaped because I don't think syslog expect to see HTML code. Not sure what to do with it, but it seems best to send non-HTML text to syslog.
Comment #40
kbahey CreditAttribution: kbahey commentedsyslog can be used in many different ways. The most common case is just to tail the log, or use less or similar to view it. Viewing it in a browser is not the only way (and I should say not the most common way either, having worked with monitoring systems before).
So, any markup in it (HTML or otherwise) will be distracting. Plain text is the most common way to view/process syslog.
Again, this is a themable function. We just provide a reasonable default. It can be overridden by anyone who does not like the way we process the text by a simple function in template.php or other means.
Alienbrain's patch is a step in the right direction. For the sake of consistency, @link should be treated the same as @message. The part I do not like about it is the _syslog_strip_html_tags(). Although it does the job in this case, it should be a generic function not specific to this module. Perhaps making it strip_html_tags() in one of the includes would be best?
Comment #41
Dries CreditAttribution: Dries commentedI agree with kbahey: (1) let's strip the HTML and (2) the provided HTML stripper is not particularly elegant. Can't we use the built-in PHP functionality or something that is a bit more future proof?
Comment #42
kbahey CreditAttribution: kbahey commentedHere is a new patch.
It strips the HTML, and eliminates the t() altogether so we are in sync with Goba's previous patch.
it also fixes the few cases in core where we have the email enclosed in angled brackets, so that info does not get lost.
Comment #43
Gábor HojtsyI like this version, looked over the code, and it seems to be elegant and to the point.
Comment #44
kbahey CreditAttribution: kbahey commentedExcept for the comma that needs to be a semicolon, that causes a PHP syntax error!
Bad Khalid, no biscuit ...
Here is a fixed version.
Comment #45
RobRoy CreditAttribution: RobRoy commentedJust edited the patch to conform to code style. '|' . became '|'.
Comment #46
AmrMostafa CreditAttribution: AmrMostafa commentedI've searched conrib and here is a list of modules that has tags which are not HTML in any of their watchdog messages, if this patch goes in, I will make sure to submit issues for them to change the syntax of their affected watchdog messages:
Comment #47
Dries CreditAttribution: Dries commentedPersonally, I find the use of square brackets strange. The rest of the world use less than and greater than signs for e-mail address. This might even be a standard. Round brackets are better than square brackets, IMO.
Also, why don't we move those brackets in the string itself?
Comment #48
kbahey CreditAttribution: kbahey commentedRerolled with parens as per Dries.
Comment #49
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks! :)
Comment #50
kbahey CreditAttribution: kbahey commentedThis is not "to be ported" in the usual sense.
Rather, it is a reminder for alienbrain to open the appropriate issues for the modules mentioned in #46 above.
You may want to explicitly mention parenthesis instead of angled brackets.
Please mark as fixed after this is done.
Comment #51
Steven CreditAttribution: Steven commentedThe tag stripping does not make sense. Either syslog information is plain text, or it is HTML. In the latter case, we don't need to do anything. In the former case, we need to remove all tags and convert the rest to plain text.
This patch only removes tags, but doesn't actually convert to plain text. Needs work.
Why on earth is it so hard to get this right people?
Comment #52
Gábor HojtsySteven, how would you define: "convert to plain text"? Probably an exact definition of this is missing to get this right. (A possible definition would be to convert to plain text, like project module converts HTML to send in email, but I am unsure you are interested in that for log messages.)
Comment #53
Gábor HojtsySteven, can you elaborate?
Comment #54
recidive CreditAttribution: recidive commentedI think what Steven is talking about is that previously strings was being passed with the '@' placeholder, which causes
t()
to format them usingcheck_plain()
. Also we currently don't strip out '|' characters, which could cause watchdog to break with messages containing this character.Comment #55
Gábor Hojtsykbahey: can you look into fixing this?
Comment #56
kbahey CreditAttribution: kbahey commentedHow about this simple patch that replaces strip_tags with check_plain?
If you want it as an attachement, let me know.
Comment #57
Gábor HojtsyAny of the referer, request URI and so on could contain a pipe (|), so those need some kind of "escaping" (not sure in what form).
Comment #58
PanchoEven if it is late again in the release cycle, this time this bug must be fixed.
We are having an incredibly powerful new translation system in D6. Therefore D6 will be used a lot for multilingual sites. Imagine an administrator at the EU or U.N. How useful is for him a log with Basque, Estonian and Greek entries intermixed?
For this reason this bug is rather critical than normal IMHO.
Comment #59
PanchoAnd yes, I set this to critical, because it does break functionality, namely the watchdog/syslog, for multilingual sites.
Comment #60
Gábor HojtsyPacho: first take some time to read up on what happened here. Watchdog() logs are in English and localized on display for quite some time now. The remaining issue here is in syslog module's serialization of the log components.
Comment #61
Pancho@Gábor: I'm sorry, you're absolutely right. Maybe I'm being too bold today... I'm gonna take some days break :)
I just stumbled upon this issue, and was surprised that would be still not fixed. Sorry again...
Comment #62
AmrMostafa CreditAttribution: AmrMostafa commentedPatch attached. It provides the following improvements:
|
(pipe) is indeed not the best separator as it can exist in URLs. Every field is now double-quoted and a single whitespace is used as a separator. (more on why this was chosen below). That should address #57$link
parameter is passed as is after stripping its tags, so the link<a href="http://example.com/user/123">View user profile</a> will end up as plain text <code>View user profile
in the syslog entry. The patch converts such links toView user profile: http://example.com/user/123
.On why a white space with double quoting as the separator, we needed something which can't exist in our values. We can discard the message as it comes last. That leaves mainly the URIs. URIs cannot have double quotes nor whitespaces, i.e. they are always encoded. Unfortunately the watchdog
$type
can contain whitespaces so we can't use them alone for separating, also, whitespaces can make it difficult to notice when a field has an empty value (i.e. when you are reading the line, there is no indicator that there is a field here with an empty value). Thus, we double-quote values and separate with a whitespace. There were other options, like>
which can be used and is pretty safe, however is very bad readability wise. Syslogs are often read as is, not necessarily parsed.As an example of the final output, the following URI:
Produces the following syslog entry:
Comment #63
kbahey CreditAttribution: kbahey commentedLooks good overall, except I am -1 on using space as a delimiter. The reason is that this does not lend itself to command line tools, e.g. awk ...etc. when you are looking for things in the syslog.
For example, to get a quick count of each type of errors I do:
Space would not allow these things. I can settle for semicolon or anything that is less likely to be in the strings of the messages.
Comment #64
AmrMostafa CreditAttribution: AmrMostafa commentedThe double-quoted whitespace-separated format is certainly slightly harder to digest from cli, but really, if we want to get this right then this is it. (Or we could use
>
but that looks horrible when read :)Did you know that replacing
-F'|'
with-F'"(^"|" "|"$)'
does the trick for the (rather nifty) command you showed? (You have to use $4 instead of $3 though, because with that field separate $1 will always be empty, so the first field is $2, the second is $3, and so on)... and for PHP CLI:
echo '"field one" "field two" "field three"' | php -r '$l=fgetcsv(STDIN, 1000, " "); var_dump($l);'
prints:Comment #65
kbahey CreditAttribution: kbahey commentedI guess since this is a theme function, I can override it and use | as a delimited.
But it is still too cryptic as a default implementation (me thinks).
Comment #66
ilikepie CreditAttribution: ilikepie commented#62: syslog_plaintext-76588.patch queued for re-testing.