When users search for non-English keywords and the search engine results page is not in UTF-8, I sometimes get referers that are not UTF-8 encoded, for example: http://www.yandex.ru/yandsearch?text=%E7%E0%E3%F0%F3%E7%EA%E0+Runtu+2.0
In this case the link on the search engine referers page is effectively missing. The following code returns a link whose href is correct but whose anchor text is empty:
l($query_data[$engine['query_variable']], $r->url)
This could be solved by also using the href as the anchor text whenever the call to l() returns an empty link. The link could then still be followed to the search page to see the original keywords.
A feature that converts the referer to UTF-8 so the keywords are readable would also be nice, but I have no idea how to autodetect the exact encoding. Maybe we could only detect whether the query is UTF-8 or not, and add a default-encoding option that non-UTF-8 queries are converted from?
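A minimal sketch of that fallback in plain PHP, outside Drupal. It assumes (this is not confirmed in the thread) that the anchor text disappears because l() escapes it with check_plain(), which uses htmlspecialchars() with the UTF-8 charset and so yields an empty string for invalid UTF-8. `referer_anchor_text()` and `is_valid_utf8()` are hypothetical helper names, not part of the module:

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false for invalid UTF-8 input.
function is_valid_utf8(string $s): bool {
  return preg_match('//u', $s) === 1;
}

// Hypothetical helper: use the decoded query as anchor text when it can be
// displayed, otherwise fall back to the (ASCII, percent-encoded) referer URL.
function referer_anchor_text(string $query, string $url): string {
  if ($query !== '' && is_valid_utf8($query)) {
    return $query;
  }
  return $url;
}

$url = 'http://www.yandex.ru/yandsearch?text=%E7%E0%E3%F0%F3%E7%EA%E0+Runtu+2.0';
// parse_str() percent-decodes the query, leaving raw Windows-1251 bytes here.
parse_str(parse_url($url, PHP_URL_QUERY), $params);
echo referer_anchor_text($params['text'], $url), "\n"; // falls back to $url
```

Inside the module, the returned string would be what gets passed to l() as the link text in place of the raw query.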
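A minimal sketch of the "UTF-8 or not, plus a default encoding" idea. It assumes a single fallback encoding stands in for the proposed option (Windows-1251 here, since that is the encoding discussed later in this thread) and that PHP's iconv extension is available; `query_to_utf8()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the query as UTF-8, converting from a
// site-wide fallback encoding when it is not already valid UTF-8.
function query_to_utf8(string $query, string $fallback_encoding = 'WINDOWS-1251'): string {
  // Plain ASCII also passes this check, so it is returned unchanged.
  if (preg_match('//u', $query) === 1) {
    return $query;
  }
  // iconv() returns false if the bytes are not valid in the fallback encoding.
  $converted = iconv($fallback_encoding, 'UTF-8', $query);
  return $converted === false ? '' : $converted;
}

echo query_to_utf8(urldecode('%E7%E0%E3%F0%F3%E7%EA%E0+Runtu+2.0')), "\n";
// prints "загрузка Runtu 2.0"
```

Only the UTF-8 check is real detection. Almost any byte string is valid Windows-1251, so if the fallback guess is wrong the conversion still succeeds and produces garbled text rather than an error.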
Comments
Comment #1
sin commented
I use this workaround to see non-UTF-8 referers on the search engine referers page:
Comment #2
soxofaan commented
Committed, thanks.
http://drupal.org/cvs?commit=115830
http://drupal.org/cvs?commit=115832
Encoding autodetection is out of scope for this simple module, IMHO.
Have you any idea what the encoding of "%E7%E0%E3%F0%F3%E7%EA%E0" is?
If I press search again on that Yandex page, the query changes to "%D0%B7%D0%B0%D0%B3%D1%80%D1%83%D0%B7%D0%BA%D0%B0", which is UTF-8 encoded and displays nicely on the search engine referers page.
google.com does not know what to do with it: http://www.google.com/search?q=%E7%E0%E3%F0%F3%E7%EA%E0
google.ru handles it better: http://www.google.ru/search?q=%E7%E0%E3%F0%F3%E7%EA%E0, but if I press search again I get "%D0%B7%D0%B0%D0%B3%D1%80%D1%83%D0%B7%D0%BA%D0%B0" again.
Comment #3
sin commented
Thanks for the fix, works great.
>Have you any idea what the encoding of "%E7%E0%E3%F0%F3%E7%EA%E0" is?
I think it is Russian Windows-1251.
I suspect search engines serve non-UTF-8 pages to users whose browser has a non-UTF-8 default encoding. About 10% of the referers on my Russian sites are in Windows-1251.
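The Windows-1251 guess can be checked by percent-decoding both Yandex query strings from comment #2 and comparing them after conversion. A minimal sketch, assuming PHP with the iconv extension available:

```php
<?php
// Percent-decode both query strings quoted in comment #2.
$legacy = urldecode('%E7%E0%E3%F0%F3%E7%EA%E0');                         // 8 bytes
$utf8   = urldecode('%D0%B7%D0%B0%D0%B3%D1%80%D1%83%D0%B7%D0%BA%D0%B0'); // 16 bytes

// If the first string really is Windows-1251, converting it yields the second.
$converted = iconv('WINDOWS-1251', 'UTF-8', $legacy);
var_dump($converted === $utf8); // bool(true)
echo $utf8, "\n";               // загрузка
```

Both decode to "загрузка". Each of these Cyrillic letters takes one byte in Windows-1251 and two bytes in UTF-8, which is why the second query string is exactly twice as long.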
Comment #4
sin commented
Oops... it looks like all Rambler referers are Windows-1251 encoded; every Rambler referer shows up as [n/a] for me. Yandex can be either UTF-8 or Windows-1251. But that is another issue.
Comment #5
Anonymous (not verified) commented
Automatically closed -- issue fixed for two weeks with no activity.