Sometimes, when users search for non-English keywords and the search engine results page is not in UTF-8, I get non-UTF-8 encoded referers, like http://www.yandex.ru/yandsearch?text=%E7%E0%E3%F0%F3%E7%EA%E0+Runtu+2.0

In this case there is no visible link at all on the search engine referers page. The following code returns an empty link (the href is OK, but the anchor text is empty):

l($query_data[$engine['query_variable']], $r->url)

This could be solved by using the href as the anchor text whenever the call to l() returns an empty link. We could then follow this link to the search page and see the original keywords.

A feature to recode the referer to UTF-8 so the keywords are readable would be nice too, but I have no idea how to autodetect the exact encoding. Maybe only detect whether the referer is UTF-8 or not, and introduce a default encoding option to convert from when it is not?
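A minimal sketch of that idea, assuming PHP's mbstring extension is available. The function name `referer_keywords_to_utf8` and the `$default_encoding` parameter are hypothetical and not part of the module:

```php
<?php
/**
 * Hypothetical helper (not part of the module): decode a raw query value
 * from a referer URL and return it as UTF-8.
 *
 * $default_encoding is the assumed source encoding for values that are
 * not valid UTF-8; it would come from a module setting.
 */
function referer_keywords_to_utf8($raw, $default_encoding = 'Windows-1251') {
  // Undo percent-encoding; urldecode() also turns '+' into a space.
  $decoded = urldecode($raw);

  // Valid UTF-8 is returned unchanged.
  if (mb_check_encoding($decoded, 'UTF-8')) {
    return $decoded;
  }

  // Otherwise assume the configured default encoding and convert.
  return mb_convert_encoding($decoded, 'UTF-8', $default_encoding);
}
```

This only answers "UTF-8 or not". A short string in a legacy encoding can, in rare cases, also happen to be valid UTF-8, and would then be left unconverted.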

Comments

sin’s picture

I use this workaround to see non-UTF-8 referers on the search engine referers page:

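        $l = l($query_data[$engine['query_variable']], $r->url);
        // If l() produced empty anchor text, fall back to a generic '[search]' label.
        $keywords_link = substr($l, -5) == '></a>' ? l('[search]', $r->url) : $l;
        $rows[$r->aid] = array(format_date($r->timestamp, 'small'), $url_data['host'], $keywords_link);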
soxofaan’s picture

Title: Non utf-8 encoded referers results in emty link » Non utf-8 encoded referers results in empty link
Status: Active » Fixed

committed, thanks
http://drupal.org/cvs?commit=115830
http://drupal.org/cvs?commit=115832

Encoding autodetection is out of the scope of this simple module, IMHO.

Have you any idea what the encoding of "%E7%E0%E3%F0%F3%E7%EA%E0" is?
If I press search again on that Yandex page, the query changes to "%D0%B7%D0%B0%D0%B3%D1%80%D1%83%D0%B7%D0%BA%D0%B0", which is UTF-8 encoded and displays nicely on the search engine referers page.

google.com does not know what to do with it: http://www.google.com/search?q=%E7%E0%E3%F0%F3%E7%EA%E0
google.ru, however, handles it better: http://www.google.ru/search?q=%E7%E0%E3%F0%F3%E7%EA%E0 but if I press search again, I get "%D0%B7%D0%B0%D0%B3%D1%80%D1%83%D0%B7%D0%BA%D0%B0" again

sin’s picture

Thanks for the fix, works great.

>Have you any idea what the encoding of "%E7%E0%E3%F0%F3%E7%EA%E0" is?

I think it is Russian Windows-1251.

I suspect search engines serve non-UTF-8 pages to users who have a non-UTF-8 encoding set as the default in their browser. About 10% of the referers on my Russian sites are in Windows-1251.
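The Windows-1251 guess can be checked with a few lines of PHP, assuming the mbstring extension is available: convert the bytes from the first query and compare them with the UTF-8 query Yandex produced on the second search. The variable names are only illustrative:

```php
<?php
// Bytes from the original referer (suspected Windows-1251).
$legacy = urldecode('%E7%E0%E3%F0%F3%E7%EA%E0');

// Bytes from the query after searching again (UTF-8).
$utf8 = urldecode('%D0%B7%D0%B0%D0%B3%D1%80%D1%83%D0%B7%D0%BA%D0%B0');

// Reinterpret the legacy bytes as Windows-1251 and convert to UTF-8.
$converted = mb_convert_encoding($legacy, 'UTF-8', 'Windows-1251');

// Prints bool(true): both queries spell the same word, "загрузка".
var_dump($converted === $utf8);
```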

sin’s picture

Oops... it looks like all Rambler referers are Windows-1251 encoded; I get [n/a] for every Rambler referer. Yandex can be either UTF-8 or Windows-1251. But this is another issue.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.