Hello, I installed the Boost module on my Russian website and enabled it, but all the cached pages display characters in the wrong charset. I checked the files in the "cache" directory and found that they are all saved in "win1251" instead of "utf-8". Is this a bug? Please help! I have spent a LOT of time trying to fix it. If you need additional information about my Drupal installation, I can provide it.

Comment | File | Size | Author
#11 | boost-601698.patch | 1.87 KB | mikeytown2
#7 | boost-601698.patch | 2.64 KB | mikeytown2

Comments

mikeytown2’s picture

Was this a problem with Boost 1.03, or is this a new install? You want to save in utf-8, correct?

Is there any way to detect win-1251?

Possible Solutions Today:
http://php.net/utf8-encode before saving file
http://php.net/iconv to convert it
http://php.net/utf8-encode#74905 - Example

Possible Solutions With PHP 6.0:
http://php.net/stream-encoding
http://php.net/file-put-contents with the FILE_TEXT flag
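
As a rough illustration of the iconv option above, here is a minimal sketch. The function name example_to_utf8() and the Windows-1251 fallback are assumptions made for this example and are not part of Boost. It uses iconv() rather than utf8_encode() because utf8_encode() assumes ISO-8859-1 input. There is no fully reliable way to detect win-1251, so the sketch only checks whether the data is already valid UTF-8 and converts when it is not.

```php
<?php
/**
 * Returns $data as UTF-8. Hypothetical helper, not part of Boost.
 *
 * Assumption: any input that is not valid UTF-8 is in $fallback_charset.
 */
function example_to_utf8($data, $fallback_charset = 'Windows-1251') {
  // With the /u modifier, preg_match() only succeeds on valid UTF-8 subjects.
  if ($data === '' || preg_match('//u', $data) === 1) {
    return $data;
  }
  // iconv() returns FALSE if the input cannot be converted.
  $converted = iconv($fallback_charset, 'UTF-8', $data);
  return $converted === FALSE ? $data : $converted;
}
```

Such a helper would be called on the page data just before it is written to disk. Data that is already UTF-8 passes through unchanged, so calling it twice should be harmless.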

mikeytown2’s picture

You can play around with what is written to the cache by implementing this hook:
hook_boost_preprocess()

/**
 * Edit document before it is put into the boost cache.
 *
 * This hook runs right before the page is cached by boost.
 *
 * $GLOBALS['_boost_cache_this'] and $GLOBALS['_boost_router_item'] are useful.
 * Set $GLOBALS['_boost_cache_this'] = FALSE if you do not want to cache this page.
 *
 * @param $path
 *   URL path of the document
 * @param $data
 *   String containing the data
 * @param $extension
 *   File extension type. Use it to detect what type of document you're operating on.
 * @return
 *   $data string containing the document
 */
function hook_boost_preprocess($path, $data, $extension) {
  return $data;
}
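
For experimenting with the hook, a minimal sketch of an implementation follows. The module name "mymodule" and the HTML comment it appends are hypothetical. The exact form of $extension ('.html' or 'html') is an assumption, so the sketch accepts both. It relies on the hook documentation above for the meaning of $GLOBALS['_boost_cache_this'].

```php
<?php
/**
 * Implements hook_boost_preprocess() for a hypothetical module "mymodule".
 *
 * Marks cached HTML pages that are valid UTF-8 and asks Boost to skip
 * pages that are not, which makes encoding problems easy to spot.
 */
function mymodule_boost_preprocess($path, $data, $extension) {
  // Only touch HTML documents; leave CSS, JS, XML and so on alone.
  if (ltrim($extension, '.') !== 'html') {
    return $data;
  }
  // With the /u modifier, preg_match() only succeeds on valid UTF-8 subjects.
  $is_utf8 = ($data === '' || preg_match('//u', $data) === 1);
  if (!$is_utf8) {
    // Per the hook documentation, FALSE means "do not cache this page".
    $GLOBALS['_boost_cache_this'] = FALSE;
    return $data;
  }
  return $data . "\n<!-- mymodule: body was valid UTF-8 when cached -->";
}
```

If the marker comment shows up in the cached file, the data handed to Boost was valid UTF-8 at that point, and any encoding problem is more likely introduced later, when the file is written or served.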
Martynov-1’s picture

Thank you for the reply. Here are the answers to your questions:
1) I used 6.x-1.11 version of Boost.
2) yes, I want to save the cached files in utf-8.
3) Sorry, but I don't understand this question. What exactly do you mean?
Maybe I am wrong, but I think the problem occurs at the moment the data is written to the server, not when it is written to the cache. It may also be a problem with the server setup. Which PHP function is used to write the files to the server? Can Boost save the files in a specific charset (utf-8)? Can I force Boost to do it?

Martynov-1’s picture

By the way, what function name should I use in my theme's template.php file to override the hook_boost_preprocess function of the Boost module: hook_boost_preprocess, mytheme_hook_boost_preprocess or mytheme_boost_preprocess?

mikeytown2’s picture

mytheme_boost_preprocess() should do the trick

PHP6 allows for saving files in a specific charset. Boost uses file_put_contents() to save the cached file.

What if I made Boost always save in utf-8 by default? Can you see any issues with that?

Martynov-1’s picture

Hello, mikeytown2! I am almost sure this will solve my problem. Please make this improvement in the Boost module. I hope it will not take a lot of your time. I look forward to hearing from you about the new version or patch including this new feature. Thank you.

mikeytown2’s picture

Status: Active » Needs review
Status | File | Size
new | boost-601698.patch | 2.64 KB
Martynov-1’s picture

Priority: Critical » Normal
Status: Needs review » Active

Hello, mikeytown2. I applied your patch manually and tested it. Unfortunately, it didn't help. All the files are still saved in the win-1251 charset. I phoned my hosting support. They advised me to add an "AddDefaultCharset Off" line to my .htaccess file, and this solved the problem for now, although I hope you will find a better solution.

mikeytown2’s picture

http://httpd.apache.org/docs/trunk/mod/core.html#adddefaultcharset

The actual file is not the problem; it's your web server. It's configured to send out win-1251 by default, and turning this off solves your issue. You could also do this, and it would have the same effect:

AddDefaultCharset utf-8 

This is something that I might want to add to the default .htaccess rules, but I'm not 100% certain.
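
One way to check this kind of diagnosis is to look at the charset the server announces in its Content-Type response header. The sketch below is only an illustration: example_charset_from_headers() is a hypothetical helper, and it expects an array of raw header lines such as the one PHP's get_headers() returns.

```php
<?php
/**
 * Extracts the charset from a list of raw HTTP header lines.
 *
 * Hypothetical helper for illustration. Returns the charset in lowercase,
 * or NULL if no Content-Type header declares one.
 */
function example_charset_from_headers(array $headers) {
  foreach ($headers as $line) {
    // Matches e.g. "Content-Type: text/html; charset=windows-1251".
    if (preg_match('/^Content-Type:.*;\s*charset=\s*"?([^";\s]+)/i', $line, $m)) {
      return strtolower($m[1]);
    }
  }
  return NULL;
}
```

For a live check, the result of get_headers() for a cached page's URL could be passed in. If the helper reports windows-1251 while the cached file on disk is utf-8, the web server configuration is the likely cause.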

mikeytown2’s picture

Title: Cached pages contain incorrect charset characters » Hosting Issue: Apache serving pages not as utf-8
Status: Active » Needs review

Could you let me know if adding this to your .htaccess file fixes the issue?

AddDefaultCharset utf-8 
mikeytown2’s picture

Status | File | Size
new | boost-601698.patch | 1.87 KB

Code patch for the change proposed above.

mikeytown2’s picture

Status: Needs review » Fixed

committed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.