The problem with the current watchdog() calls is that the messages are stored in the actual locale used to present that page. Ie. if you have a German, French and English section (version) of your site, watchdog logs will be recorded in all three languages intermixed with each other. Administrators reviewing logs are supposed to understand all languages used on the site to understand possible problems. This is not the desired user experience.

To make the watchdog logs admin locale aware, we need to postpone the t() call to the time the log is presented to the administrator and should not t() the string on storage time. This also means that we should store the message in English, and the variables separately in a different field. I have chosen serialize()/unserialize() to store the array.

We should note that for backward compatibility the previous watchdog messages should have an 'N;' value in the field (the serialization of NULL). This needs an upgrade path, which is not yet implemented in the patch, but is quite simple. Some messages in system.install for example also log stuff which is user defined, so t()-ing it is not desired. With specifying N; in the variables field, it is possible to keep those away from t().

Pros of this approach:

  • Actual locale used to present messages to the administrator
  • Messages stored before any locale is available (ie. install time messages) are later translated so easier to review and understand

Cons:

  • Tablesorting of watchdog messages on the admin interface is not directly done using the displayed value. So eg. 404 messages with different file names will not get sorted by the filename, which is only added after the tablesort is done.
  • Slightly more storage requirement for the watchdog table.

In my patch, I have updated the watchdog() function used to store messages and the two watchdog review interfaces in watchdog.module. As an example, I also updated the user.module (also fixing some missing theme placeholder calls). This is not a complete patch, updating of other watchdog calls and the upgrade path is also required.

Since this change requires all watchdog() calls changed to run without errors, I have not tested the change in runtime. First I would like to gather opinions on the viability of the patch to put more work into it.

Please review!

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Jose Reyero’s picture

Hi Goba,

I think this is a great idea and looks to me like the best of the solutions.

About the patch, just an idea: we could keep function arguments as they are now, but having either an string or an array for the second one. The watchdog funcion could look which type is passed and then log the message, or the message plus arguments depending on that.

This way it would be backwards compatible -which I know is not the main goal- but it will also simplify the function call when no arguments are needed. It would look like:

watchdog('user', array('External load by %user using module %module.', '%user' => ..., '%module' =>..., )

A simple array shifting will do to separate the main string from arguments array...

Gábor Hojtsy’s picture

Jose, it is not usual to pass these kind of structured arrays in Drupal, and changing the API was never a problem. In fact it enforces people to not follow the usage of t() in watchdog messages. Your solution would silenty accept the t() usage. Either way, let's hear some core comitter comments, so I can continue preparing the patch.

drumm’s picture

Version: x.y.z » 6.x-dev

Since this changes watchdog()'s arguments, it can't be done in 5.0.

Gábor Hojtsy’s picture

Title: Administrator locale aware watchdog logs » t() watchdog messages in display time, instead of at the time of logging
Component: watchdog.module » language system
Status: Needs review » Active

Coming back to the issue to give a short summary. Basically the problem is that watchdog() messages are always t()-ed at runtime, so if you have a multilanguage site, you have messages in different languages sprinkled around your admin interface, and it is not actually possible to overview, search, sort or do any other kind of useful action on them.

The patch surely does not apply, but the idea is simple:

1. Remove t() calls from watchdog calls, add a new array parameter to watchdog with the replacements:
-  watchdog('user', t('Session closed for %name.', array('%name' => $user->name)));
+  watchdog('user', 'Session closed for %name.', array('%name' => $user->name));

2. Store the array of replacements in the watchdog table (as a serialized array):
         uid int(10) NOT NULL default '0',
         type varchar(16) NOT NULL default '',
         message longtext NOT NULL,
+        vars longtext NOT NULL,
         severity tinyint(3) unsigned NOT NULL default '0',
         link varchar(255) NOT NULL default '',
         location varchar(128) NOT NULL default '',

3. t() when displaying the text.
4. For legacy messages (where the 'vars' array is NULL), just display what message was in the database.

A possible problem is if there are embedded calls to t() when creating the replacement array. I would say these instances are only a small minority, so we should not put off the gain we get from the above solution. But maybe you have a better idea of implementing this!? Or just a few encouraging thoghts, so someone has the courage to change all watchdog calls in core :)

kbahey’s picture

There is a fly in the ointment: dblog (nee watchdog) is not the only mechanism that uses the watchdog hook. Other modules now use that hook to route messages, even outside of Drupal (e.g. email, syslog).

For these, there is no way for applying a delayed t().

So, while your solution will address the dblog part, it will leave an unaddressed area for the rest of the module.

Don't know offhand what the best solution would be.

Gábor Hojtsy’s picture

@kbahey: as far as I see, it does not matter, if other loggers cannot do a t() later. The reality now is that they get spaghetti log messages, which are not useful for people overviewing, generating reports from those, etc. If you have five languages enabled on the site, you will get log messages "randomly" in all five languages saved. I think it is highly acceptable to get only English messages, compared to having messages in five arbitrary languages (which might not include English, for sure). If you have the ability to translate on display time then fine, if not, then it is not our problem.

kbahey’s picture

Gabor

The severity column needs to be moved after the uid column for performance reasons.

magico’s picture

See also #40657.

Chris Johnson’s picture

If I understand correctly, Gabor's proposal would result in the watchdog hook always calling consumers with English messages, and dblog would then store that data in his modified format, allowing translation at viewing time. If that's essentially correct, then I agree.

Alternatively, there could be an admin selected language to which all messages are translated before passing to consumers. That would allow an admin to choose to have all watchdog messages in Spanish (for example), whether stored in the Drupal dblog database table, or issued to syslog or sent via SMS. I'm not sure what kind of technical effort that might require. it might be simple, or not.

Gábor Hojtsy’s picture

@kbahey: I am absolutely fine with moving the severity column after the uid column, and I don't really think it belongs in here :)

@magico: Thanks for the issue tip. There chx noted that it might be a performance problem to unseralize the stored replacement array for every message on an admin log page view. As far as I see, this change would help performance on "public" pages, where t() would not be executed on log messages. Log admin pages are visited a lot less. I don't see other points discussed there (apart from whether this is a bug or feature request). I think it is a task :)

@Chris Johnson: Yes, you understand it right. What hook_watchdog() would get is not only a string with a log message, but a string and a replacement array, which can be stored separately, or parsed to a single string for systems where it is not possible to later t() the string with stored replacements. A simple strtr($string, $replacements) does the trick for syslog, so readable English log messages are logged consistently to syslog.

Jose Reyero’s picture

Oh, I thought this had been committed time ago :-(

Anyway, I think Gabor's patch is great, and not only a good solution for the problem, but 'the good one'.

So I think we should go ahead with it and let's leave all the other performance fixes for a different patch. Sure there are many other issues we can fix, but one step at a time...

Wesley Tanaka’s picture

subscribing via http://drupal.org/node/40657 (which was marked as duplicate of this)

Gábor Hojtsy’s picture

Status: Active » Needs review
FileSize
55.6 KB

Ok, here is a complete and updated patch for latest HEAD to get this change comitted soon.

What it does:

- adds a 'variables' column to the watchdog table
- adds a $variables parameter to watchdog() just between message and severity
- dblog stores the $variables serialized in the database
- syslog replaces variables and logs the resulting message in English
- dblog translates the message and replaces variables on the admin pages

Compromises of this patch:

- node type and language names are not stored translated in the variables array, so not translated in the final output
- a few places needed the message duplicated in code to have drupal_set_message() called with t() and watchdog without t()
- the only message still having translated parts is the throttle message, where format_plural() is used, and I see no clean way to work around that now
- as a result of this patch, although there is an option to order by message text on the overview table, the English text of the stored message is used for ordering, not the translated text

Future work after this patch gets comitted:

- provide an upgrade path, so previous watchdog entries have their messages with "N;" variables (which is the serialization of NULL)
- find a way to abstract the 'link' field of the watchdog messages/table to be language independent... now l(t('view'), ...), l(t('results'), ...) and l(t('edit'), ...) generated HTML links are stored there (in Drupal core). Maybe find a better single word instead of these three to always print, or store the link target and link type separately, showing text according to the link type on display

I have tested the included patch and it seems to work nicely, storing arrays properly and translating on display (with dblog). As far as I see, the compromises made are worth the gains: so that people can now have a single (English) language to log messages with syslog, and any language to review all of their messages with dblog.

I would appreciate reviews and encouragement, so we can step forward :)

Gábor Hojtsy’s picture

FileSize
55.6 KB

Whoops, I have gone against my own documentation at two places. When the log message is not something to translate, like a $_GET['q'] as used in the 'access denied' and 'page not found' error messages, NULL should be sent in the $variables parameter, so dblog knows that the string is not to be translated. Updated patch attached.

kbahey’s picture

Status: Needs review » Needs work

Gabor

Few comments:

1. We need an update_2008() in system.install so the watchdog table schema is upgraded with the new column.

2. The severity column needs to move after type and before the other text fields for the sake of performance.

3. We can change the watchdog() function signature to make the $variables parameter optional. This leaves only the first two parameters as required ($type and $message). This way modules/sites that don't use care about language can just send their message in and have it be AS IS.

So:

function watchdog($type, $message, $severity = WATCHDOG_NOTICE, $variables = array(), $link = NULL)

Also, why don't we take the plunge and just make the argument a keyed array. Every other part of Drupal is moving in that direction, so why not consider it for this too?

4. I also do not understand why we have + before and after in this patch:

RCS file: /cvs/drupal/drupal/modules/dblog/dblog.install,v
retrieving revision 1.1
diff -u -p -r1.1 dblog.install
--- modules/dblog/dblog.install	10 Apr 2007 10:10:27 -0000	1.1
+++ modules/dblog/dblog.install	12 Apr 2007 14:57:53 -0000
@@ -13,6 +13,7 @@ function dblog_install() {
         uid int NOT NULL default '0',
         type varchar(16) NOT NULL default '',
         message longtext NOT NULL,
+        variables longtext NOT NULL,
         severity tinyint unsigned NOT NULL default '0',
         link varchar(255) NOT NULL default '',
         location text NOT NULL,
@@ -30,6 +31,7 @@ function dblog_install() {
         uid int NOT NULL default '0',
         type varchar(16) NOT NULL default '',
         message text NOT NULL,
+        variables text NOT NULL,
         severity smallint_unsigned NOT NULL default '0',
         link varchar(255) NOT NULL default '',
         location text NOT NULL default '',

I will test it after you address the above.

Gábor Hojtsy’s picture

1. We need an update_2008() in system.install so the watchdog table schema is upgraded with the new column.

Fine with me. Unless the database model is decided, there is no point in doing that update (which is most error prone to fail when you apply a patch, given that possibly other changes will get sooner in then this).

2. The severity column needs to move after type and before the other text fields for the sake of performance.

I did not do anything with the severity column, it was there after the text column before (for quite some time). If you think moving that is required, it is still out of scope for this patch. I got too many objections on such side effects of my core patches before, so I would not include that here. Feel free to open a new issue.

3. We can change the watchdog() function signature to make the $variables parameter optional. This leaves only the first two parameters as required ($type and $message). This way modules/sites that don't use care about language can just send their message in and have it be AS IS.

So:

function watchdog($type, $message, $severity = WATCHDOG_NOTICE, $variables = array(), $link = NULL)

Also, why don't we take the plunge and just make the argument a keyed array. Every other part of Drupal is moving in that direction, so why not consider it for this too?

Erm? What do you mean? I have the $variables optional already, it is just not in the order you have shown here (but you did not object against that it seems). Only the first two are required. I don't think an associative array is right here. It was introduced at other places, where there were too many arguments, or arguments were often omitted in mixed places. Also we *are* going to parse the watchdog messages out of the Drupal code for translation, despite they being out of t() after this patch. We still need them for translation. If we introduce a "nice" keyed array here, it makes that extraction a lot more harder (if not impossible in many cases).

4. I also do not understand why we have + before and after in this patch:

This is a unified diff, everything included is diffed inline, there is no "before and after part" in there (if this is what you mean). The parts you cited are for mysql and pgsql respectively.

Gábor Hojtsy’s picture

For consistency, we can go on the 'callback' and 'arguments' storage route, to support log messages with format_plural(), as in throttle module. Menu module already takes this approach for menu item title translation (http://drupal.org/node/128082), so it seems logical to apply this for consistency here too. Question is where should we put that callback name into the watchdog() parameter list... Hm...

Gábor Hojtsy’s picture

Status: Needs work » Needs review

Because of security concerns with the menu approach I linked earlier, I am sticking with the simpler implementation. The attached patch now includes an update function too, so Khalid's concerns are addressed (except the performance concern he raised about the severity column which is AFAIS offtopic here). I hope he or someone else will find the time to test the patch.

Gábor Hojtsy’s picture

FileSize
56.39 KB

Doh, attached patch I mean...

Dries’s picture

I'm in support for this patch.

I think the benefit of this patch outweight the drawback (not being able to sort messages). I think this drawback can be overcome by implementing a vastly superior filtering system for the watchdog table. Also, if watchdog tables can't be sorted correctly, it probably makes sense not to make them sortable.

Should we document in the PHPdoc API why we are taking the arguments separately?

I'd like to see this patch committed, but it is probably worth thinking it through a bit more. What is going to be the impact on a filter mechanism? Things might become non-trivial.

Dries’s picture

See http://drupal.org/node/67893 for the filter discussion. It would be good to keep that issue in mind when developing this patch.

AmrMostafa’s picture

Great patch, I just tested on a recent HEAD and it works fine.

Gábor Hojtsy’s picture

New parth addressing Dries' concerns about documenting how $message and $variables interact, also removing sortability on the message column as he suggests. Rerolled against HEAD (which of course needed a conflict fix around the update code in system.install).

I hope we can get this patch in soon, as the (lack of) issues connected to the filtering patch are already cleaned up, I addressed all concerns here (except kbahey's suggestion about moving watchdog columns around, which is not related to localization).

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
56.96 KB

New patch I mean, not parth...

Dries’s picture

Status: Reviewed & tested by the community » Needs work

Committed to CVS HEAD.

I'm marking this 'code needs work' until the upgrade documentation has been updated.

Feel free to mark this fixed once the handbook is up-to-date.

Thanks Gabor!

Gábor Hojtsy’s picture

Status: Needs work » Fixed

OK, updated http://drupal.org/node/114774#watchdog_logging to reflect our changes, and added http://drupal.org/node/114774#watchdog_parameters to document how things changed here, and what is the proper usage of watchdog().

Gábor Hojtsy’s picture

Assigned: Gábor Hojtsy » Unassigned
Category: task » bug
Status: Fixed » Postponed (maintainer needs more info)

Committed three more small fixes to core to handle NULL values in syslog as well as adding a missing array() in ping.module and fixing a typo in my phpdoc.

However I noticed an interesting (superflous?) feature of syslog module, which somewhat counters the language independentness of watchdog logging. We now have this code there:

function theme_syslog_format($entry) {
  global $base_url;

  $message = t('@base_url|@timestamp|@type|@uid|@ip|@request_uri|@referer_uri|@link|@message',
    array(
      '@base_url'    => $base_url,
      '@timestamp'   => $entry['timestamp'],
      '@type'        => $entry['type'],
      '@ip'          => $entry['ip'],
      '@request_uri' => $entry['request_uri'],
      '@referer_uri' => $entry['referer'],
      '@uid'         => $entry['user']->uid,
      '@link'        => strip_tags($entry['link']),
      // Keep message English, but replace variable components, if present.
      '@message'     => strip_tags(is_null($entry['variables']) ? $entry['message'] : strtr($entry['message'], $entry['variables'
])),
    ));
  return $message;
}

Why is t() used here? Is it really desired to have different log formats per language? What if I have four languages on my site, each with different log formats defined. Now depending on what language a page was displayed with, my syslog will have messages in different formats. Why is this feature in there? What good does that bring to us?

This is already a theme function, so the format can be customized on a site-by-site basis. Why would we need per-language modification?

Kbahey could probably enlighten us here.

AmrMostafa’s picture

FileSize
7.91 KB

Few localization-unfriendly watchdogs ;-)

kbahey’s picture

Status: Postponed (maintainer needs more info) » Needs work

Gabor,

Do we really need the third parameter if we are not translating, or if there are no tokens?

I mean can't we just do:

watchdog('user', 'User has logged in');

Or do we have to specify a NULL or array() on every call?

It would be better to default the third parameter and not force everyone to pass something, even if it is an empty array or a NULL.

kbahey’s picture

Gabor

The only reason syslog used t() was that this is a theme function for token substitution, not really a language translation.

The t() can be removed.

Gábor Hojtsy’s picture

Status: Needs work » Postponed (maintainer needs more info)

@alienbrain: Thanks for spotting these missing conversions, comitted to core! I don't know how I missed them myself...

@kbahey: I still don't know what code you are looking at. As I explained earlier, every patch I submitted here included what you suggest about $variables having a default empty array parameter, it was like this from the start, and it is like this in the comitted patch.

@kbahey: t() does language translation, not only token substitution, and that side effect is undersired here. I see you used @ placeholders, so they are all check_plain()-d. Is that required for syslog (or even desired?). I intentionally stepped special handling of @ ! and % over by using strtr() in the $message vars substitution because syslog output is supposed to end up in a character based log, doesn't it? We don't need HTML escaping or wrapping for values. Do we?

kbahey’s picture

Gabor

Re: your first point,

My comment was for the documentation here http://drupal.org/node/114774#watchdog_parameters

Specifically this part;

If you have a message which has no placeholders to replace with values, pass on an empty array(). If you have a dynamically generated message (a PHP error message, a message coming from a remote service or some other type of message, whose literal text cannot be included in the second parameter), please pass on NULL in place of the variables parameter, so Drupal knows that the message is not for translation.

And the examples:

// Translatable message, but no placeholders to replace
watchdog('soap', 'Soap message sent.', array());
// Not translatable message
watchdog('soap', $remote_soap_message, NULL);

Can we see the most used case and default the 3rd parameter to it (whether NULL or array()), so we don't have to pass it in every call?

Am I being clear?

Re: your second point for syslog theme function, the @ was used to strip out all HTML markup, since it is very unsightly (makes my eyes bleed) in syslog.

What I can do is clean up that function and use strip_tags() everywhere instead and remove all references to t(). How about that?

Gábor Hojtsy’s picture

Kbahey: sure, the update documentation was not clear enough. Now updated to this version:

If you have a message which has no placeholders to replace with values, pass on an empty array() - which is also the default value for the variable array, so you can omit it, if you have no more parameters to pass. If you have a dynamically generated message (a PHP error message, a message coming from a remote service or some other type of message, whose literal text cannot be included in the second parameter), please always pass on NULL in place of the variables parameter, so Drupal knows that the message is not for translation.

// Translatable message, no placeholders to replace, but watchdog type different from default
watchdog('soap', 'Soap message sent.', array(), WATCHDOG_DEBUG);
// Translatable message, no placeholders to replace, and default watchdog type used, so array() can be omitted.
watchdog('soap', 'Remote host seems to be slow.');
// Not translatable message. Always pass on NULL in these cases!
watchdog('soap', $remote_soap_message, NULL);

The "most used case" for the $variables parameter from core examples is an actual associative array (other values are very rare), but an empty is the default anyway, so the update documentation needed an update (not the code).

@ does not strip HTML but escapes it, so instead of <info@example.com> you get &lt;info@example.com&gt;, which I don't think is better. If you strip_tags(), you would loose this part of the log message entirely, which might not be a good idea. Contact module seems to have such messages, where important information might be lost.

watchdog('mail', '%name-from sent an e-mail regarding %category.', array('%name-from' => $form_values['name'] ." <$from>", '%category' => $contact->category));

(These kind of message details were already lost before my patch landed). True, maybe these can me modified to better suite syslog, or syslog can live with logging some HTML(-like stuff).

kbahey’s picture

Status: Postponed (maintainer needs more info) » Needs review

Goba

Re: syslog.

How about something like this? (not sure about the last segment about $message and $variables, need your input on that part).

Remember that we just need a decent default, not perfect nor all encompassing, since this function is themable and people can override it anyway they like.

function theme_syslog_format($entry) {
  global $base_url;

  $message  = $base_url;
  $message .= '|' . $entry['timestamp'];
  $message .= '|' . $entry['type'];
  $message .= '|' . $entry['ip'];
  $message .= '|' . $entry['request_uri'];
  $message .= '|' . $entry['referer'];
  $message .= '|' . $entry['user']->uid;
  $message .= '|' . strip_tags($entry['link']);
  $message .= '|' . strip_tags(is_null($entry['variables']) ? $entry['message'] : strtr($entry['message'], $entry['variables'])),

  return $message;
}
kbahey’s picture

I meant to say that I am not sure about the last line which contains "$entry['message'] and $entry['variables']" not $message and $variables.

Dries’s picture

syslog output can be displayed, processed and converted with a variety of tools. Syslog messages are often shown in broswer windows and have to be sanitized. See http://216.147.18.102/ams/ , http://www.dangermen.com/smt/screens.html, http://www.phpwizardry.com/php-syslog-ng.php or http://www.googlog.org/.

Gábor Hojtsy’s picture

Dries, are you sure *we* need to take care of HTML escaping? These log messages can appear in any environment, depending on what viewer you use to watch your syslog messages. Other software adding log messages into the syslog escapes for HTML based display? I would doubt.

Kbahey: we are getting there AFAIS. But strip_tags()-ing the message does remove possibly important stuff from the message, like the email address I have shown above.

AmrMostafa’s picture

FileSize
1.95 KB

This replaces t() with strtr(). It could be as well use implode() actually, but I feel strtr() is easier to read/customize.

And adds/uses _syslog_strip_html_tags() as a safe alternative for strip_tags().

Dries’s picture

I'm not saying we should escape the strings. I'm just saying that they could be used unescaped because I don't think syslog expect to see HTML code. Not sure what to do with it, but it seems best to send non-HTML text to syslog.

kbahey’s picture

syslog can be used in many different ways. The most common case is just to tail the log, or use less or similar to view it. Viewing it in a browser is not the only way (and I should say not the most common way either, having worked with monitoring systems before).

So, any markup in it (HTML or otherwise) will be distracting. Plain text is the most common way to view/process syslog.

Again, this is a themable function. We just provide a reasonable default. It can be overridden by anyone who does not like the way we process the text by a simple function in template.php or other means.

Alienbrain's patch is a step in the right direction. For the sake of consistency, @link should be treated the same as @message. The part I do not like about it is the _syslog_strip_html_tags(). Although it does the job in this case, it should be a generic function not specific to this module. Perhaps making it strip_html_tags() in one of the includes would be best?

Dries’s picture

I agree with kbahey: (1) let's strip the HTML and (2) the provided HTML stripper is not particularly elegant. Can't we use the built-in PHP functionality or something that is a bit more future proof?

kbahey’s picture

FileSize
3.7 KB

Here is a new patch.

It strips the HTML, and eliminates the t() altogether so we are in sync with Goba's previous patch.

it also fixes the few cases in core where we have the email enclosed in angled brackets, so that info does not get lost.

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

I like this version, looked over the code, and it seems to be elegant and to the point.

kbahey’s picture

FileSize
3.7 KB

Except for the comma that needs to be a semicolon, that causes a PHP syntax error!

Bad Khalid, no biscuit ...

Here is a fixed version.

RobRoy’s picture

FileSize
3.69 KB

Just edited the patch to conform to code style. '|' . became '|'.

AmrMostafa’s picture

I've searched conrib and here is a list of modules that has tags which are not HTML in any of their watchdog messages, if this patch goes in, I will make sure to submit issues for them to change the syntax of their affected watchdog messages:

  • advuser
  • email
  • feedback
  • gojoingo
  • logintoboggan
  • user_import
  • welcome
Dries’s picture

Personally, I find the use of square brackets strange. The rest of the world use less than and greater than signs for e-mail address. This might even be a standard. Round brackets are better than square brackets, IMO.

Also, why don't we move those brackets in the string itself?

kbahey’s picture

FileSize
3.69 KB

Rerolled with parens as per Dries.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks! :)

kbahey’s picture

Status: Fixed » Patch (to be ported)

This is not "to be ported" in the usual sense.

Rather, it is a reminder for alienbrain to open the appropriate issues for the modules mentioned in #46 above.

You may want to explicitly mention parenthesis instead of angled brackets.

Please mark as fixed after this is done.

Steven’s picture

Status: Patch (to be ported) » Needs work

The tag stripping does not make sense. Either syslog information is plain text, or it is HTML. In the latter case, we don't need to do anything. In the former case, we need to remove all tags and convert the rest to plain text.

This patch only removes tags, but doesn't actually convert to plain text. Needs work.

Why on earth is it so hard to get this right people?

Gábor Hojtsy’s picture

Steven, how would you define: "convert to plain text"? Probably an exact definition of this is missing to get this right. (A possible definition would be to convert to plain text, like project module converts HTML to send in email, but I am unsure you are interested in that for log messages.)

Gábor Hojtsy’s picture

Steven, can you elaborate?

recidive’s picture

I think what Steven is talking about is that previously strings was being passed with the '@' placeholder, which causes t() to format them using check_plain(). Also we currently don't strip out '|' characters, which could cause watchdog to break with messages containing this character.

Gábor Hojtsy’s picture

kbahey: can you look into fixing this?

kbahey’s picture

Status: Needs work » Needs review

How about this simple patch that replaces strip_tags with check_plain?

If you want it as an attachement, let me know.

Index: modules/syslog/syslog.module
===================================================================
RCS file: /cvs/drupal/drupal/modules/syslog/syslog.module,v
retrieving revision 1.9
diff -u -F^f -r1.9 syslog.module
--- modules/syslog/syslog.module        4 Jul 2007 21:35:24 -0000       1.9
+++ modules/syslog/syslog.module        2 Oct 2007 17:35:37 -0000
@@ -95,8 +95,8 @@ function theme_syslog_format($entry) {
   $message .= '|'. $entry['request_uri'];
   $message .= '|'. $entry['referer'];
   $message .= '|'. $entry['user']->uid;
-  $message .= '|'. strip_tags($entry['link']);
-  $message .= '|'. strip_tags(is_null($entry['variables']) ? $entry['message'] : strtr($entry['message'], $entry['variables']));
+  $message .= '|'. check_plain($entry['link']);
+  $message .= '|'. check_plain(is_null($entry['variables']) ? $entry['message'] : strtr($entry['message'], $entry['variables']));

   return $message;
 }
Gábor Hojtsy’s picture

Any of the referer, request URI and so on could contain a pipe (|), so those need some kind of "escaping" (not sure in what form).

Pancho’s picture

Even if it is late again in the release cycle, this time this bug must be fixed.

We are having an incredibly powerful new translation system in D6. Therefore D6 will be used a lot for multilingual sites. Imagine an administrator at the EU or U.N. How useful is for him a log with Basque, Estonian and Greek entries intermixed?

For this reason this bug is rather critical than normal IMHO.

Pancho’s picture

Priority: Normal » Critical

And yes, I set this to critical, because it does break functionality, namely the watchdog/syslog, for multilingual sites.

Gábor Hojtsy’s picture

Title: t() watchdog messages in display time, instead of at the time of logging » Syslog module's serialization of log components possibly broken
Priority: Critical » Normal

Pacho: first take some time to read up on what happened here. Watchdog() logs are in English and localized on display for quite some time now. The remaining issue here is in syslog module's serialization of the log components.

Pancho’s picture

@Gábor: I'm sorry, you're absolutely right. Maybe I'm being too bold today... I'm gonna take some days break :)
I just stumbled upon this issue, and was surprised that would be still not fixed. Sorry again...

AmrMostafa’s picture

Version: 6.x-dev » 7.x-dev
Component: language system » syslog.module
FileSize
1.77 KB

Patch attached. It provides the following improvements:

  • Message entities are decoded. That should address #51.
  • The | (pipe) is indeed not the best separator as it can exist in URLs. Every field is now double-quoted and a single whitespace is used as a separator. (more on why this was chosen below). That should address #57
  • Another new issue is that currently the $link parameter is passed as is after stripping its tags, so the link <a href="http://example.com/user/123">View user profile</a> will end up as plain text <code>View user profile in the syslog entry. The patch converts such links to View user profile: http://example.com/user/123.

On why a white space with double quoting as the separator, we needed something which can't exist in our values. We can discard the message as it comes last. That leaves mainly the URIs. URIs cannot have double quotes nor whitespaces, i.e. they are always encoded. Unfortunately the watchdog $type can contain whitespaces so we can't use them alone for separating, also, whitespaces can make it difficult to notice when a field has an empty value (i.e. when you are reading the line, there is no indicator that there is a field here with an empty value). Thus, we double-quote values and separate with a whitespace. There were other options, like > which can be used and is pretty safe, however is very bad readability wise. Syslogs are often read as is, not necessarily parsed.

As an example of the final output, the following URI:

http://pipeline/~alienbrain/drupal/?q=gimme some 404s&foo="@$bar"

Produces the following syslog entry:

Apr 18 21:06:16 pipeline drupal: "http://pipeline/~alienbrain/drupal" "1240081576" "page not found" "127.0.0.1" "http://pipeline/~alienbrain/drupal/?q=gimme%20some%20404s&foo=%22@$bar%22" "" "1" "" gimme some 404s
kbahey’s picture

Status: Needs review » Needs work

Looks good overall, except I am -1 on using space as a delimiter. The reason is that this does not lend itself to command line tools, e.g. awk ...etc. when you are looking for things in the syslog.

For example, to get a quick count of each type of errors I do:

awk -F'|' '{data[$3]++ } END {for (i in data) print data[i], i}' $SYSLOG | sort -n

Space would not allow these things. I can settle for semicolon or anything that is less likely to be in the strings of the messages.

AmrMostafa’s picture

The double-quoted whitespace-separated format is certainly slightly harder to digest from cli, but really, if we want to get this right then this is it. (Or we could use > but that looks horrible when read :)

Did you know that replacing -F'|' with -F'"(^"|" "|"$)' does the trick for the (rather nifty) command you showed? (You have to use $4 instead of $3 though, because with that field separate $1 will always be empty, so the first field is $2, the second is $3, and so on)

... and for PHP CLI: echo '"field one" "field two" "field three"' | php -r '$l=fgetcsv(STDIN, 1000, " "); var_dump($l);' prints:

array(3) {
  [0]=>
  string(9) "field one"
  [1]=>
  string(9) "field two"
  [2]=>
  string(11) "field three"
}
kbahey’s picture

I guess since this is a theme function, I can override it and use | as a delimited.

But it is still too cryptic as a default implementation (me thinks).

ilikepie’s picture

Status: Needs work » Needs review

#62: syslog_plaintext-76588.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, syslog_plaintext-76588.patch, failed testing.

  • Gábor Hojtsy committed 0a5ec93 on 8.3.x
    #76588: Small fixes after that patch landed  - typo fix in watchdog()...
  • Dries committed 2487034 on 8.3.x
    - Patch #76588 by kbahey and RobRoy: made the syslog deamon strip HTML...
  • Gábor Hojtsy committed 359ace3 on 8.3.x
    #76588 patch by alienbrain: some localization-unfriendly watchdog()...
  • Dries committed 9a6099c on 8.3.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    
  • Dries committed f73610b on 8.3.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    

  • Gábor Hojtsy committed 0a5ec93 on 8.3.x
    #76588: Small fixes after that patch landed  - typo fix in watchdog()...
  • Dries committed 2487034 on 8.3.x
    - Patch #76588 by kbahey and RobRoy: made the syslog deamon strip HTML...
  • Gábor Hojtsy committed 359ace3 on 8.3.x
    #76588 patch by alienbrain: some localization-unfriendly watchdog()...
  • Dries committed 9a6099c on 8.3.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    
  • Dries committed f73610b on 8.3.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    

  • Gábor Hojtsy committed 0a5ec93 on 8.4.x
    #76588: Small fixes after that patch landed  - typo fix in watchdog()...
  • Dries committed 2487034 on 8.4.x
    - Patch #76588 by kbahey and RobRoy: made the syslog deamon strip HTML...
  • Gábor Hojtsy committed 359ace3 on 8.4.x
    #76588 patch by alienbrain: some localization-unfriendly watchdog()...
  • Dries committed 9a6099c on 8.4.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    
  • Dries committed f73610b on 8.4.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    

  • Gábor Hojtsy committed 0a5ec93 on 8.4.x
    #76588: Small fixes after that patch landed  - typo fix in watchdog()...
  • Dries committed 2487034 on 8.4.x
    - Patch #76588 by kbahey and RobRoy: made the syslog deamon strip HTML...
  • Gábor Hojtsy committed 359ace3 on 8.4.x
    #76588 patch by alienbrain: some localization-unfriendly watchdog()...
  • Dries committed 9a6099c on 8.4.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    
  • Dries committed f73610b on 8.4.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    

  • Gábor Hojtsy committed 0a5ec93 on 9.1.x
    #76588: Small fixes after that patch landed  - typo fix in watchdog()...
  • Dries committed 2487034 on 9.1.x
    - Patch #76588 by kbahey and RobRoy: made the syslog deamon strip HTML...
  • Gábor Hojtsy committed 359ace3 on 9.1.x
    #76588 patch by alienbrain: some localization-unfriendly watchdog()...
  • Dries committed 9a6099c on 9.1.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.
    
    
  • Dries committed f73610b on 9.1.x
    - Patch #76588 by Gabor: made log messages translatable.  Yay.