Problem/Motivation

Search engine crawlers and bots are inflating the statistics.

Steps to reproduce

Proposed resolution

The Device Detector library can detect bots. We could either stop logging visits from bots entirely, or add a column (tinyint) that flags bot visits.

With a column, bot visits can be excluded from reports, and we can also track which pages have been indexed by which bots.
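A minimal sketch of the "add a column" option, using Python's built-in `sqlite3` module and a hypothetical, trimmed-down `visitors` table (`path`, `user_agent`, `bot`), not the module's real schema. It shows the two uses described above: excluding bot rows from a report, and listing which pages each bot has requested. Treating `NULL` (rows logged before the column existed) as a human visit is an assumption of this sketch.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, trimmed-down table with only the columns this sketch needs.
    # `bot` mirrors the proposed nullable tinyint(1): 1 = bot, 0 = human.
    conn.execute(
        "CREATE TABLE visitors ("
        "  path TEXT NOT NULL,"
        "  user_agent TEXT NOT NULL,"
        "  bot INTEGER DEFAULT NULL"
        ")"
    )


def log_visit(conn: sqlite3.Connection, path: str, user_agent: str, is_bot: bool) -> None:
    # Store every visit and flag bots instead of discarding them.
    conn.execute(
        "INSERT INTO visitors (path, user_agent, bot) VALUES (?, ?, ?)",
        (path, user_agent, 1 if is_bot else 0),
    )


def human_page_counts(conn: sqlite3.Connection) -> dict:
    # Assumption: rows with bot IS NULL (logged before the column existed) count as human.
    rows = conn.execute(
        "SELECT path, COUNT(*) FROM visitors "
        "WHERE COALESCE(bot, 0) = 0 GROUP BY path ORDER BY path"
    ).fetchall()
    return dict(rows)


def pages_seen_by_bots(conn: sqlite3.Connection) -> dict:
    # Map each bot user agent to the distinct pages it requested.
    rows = conn.execute(
        "SELECT DISTINCT user_agent, path FROM visitors "
        "WHERE bot = 1 ORDER BY user_agent, path"
    ).fetchall()
    result: dict = {}
    for user_agent, path in rows:
        result.setdefault(user_agent, []).append(path)
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    log_visit(conn, "/node/1", "Mozilla/5.0 (Windows NT 10.0)", False)
    log_visit(conn, "/node/1", "Googlebot/2.1", True)
    log_visit(conn, "/node/2", "Googlebot/2.1", True)
    print(human_page_counts(conn))   # {'/node/1': 1}
    print(pages_seen_by_bots(conn))  # {'Googlebot/2.1': ['/node/1', '/node/2']}
```

Keeping bot rows and filtering at query time, rather than not logging them, is what makes the second query possible.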

Remaining tasks

User interface changes

API changes

Data model changes

New column.

+-----------------------------+-----------------------+------+-----+---------+----------------+
| Field                       | Type                  | Null | Key | Default | Extra          |
+-----------------------------+-----------------------+------+-----+---------+----------------+
| bot                         | tinyint(1)            | YES  |     | NULL    |                |
+-----------------------------+-----------------------+------+-----+---------+----------------+


Comments

roseba

I second that suggestion. I have no interest in seeing all the bot activity, and the log fills up quickly with it.

I would also like to be able to ignore certain IP addresses.
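A sketch of how an IP exclusion list could be checked before a visit is logged, using Python's standard `ipaddress` module. The `EXCLUDED_NETWORKS` list and the `is_excluded` helper are hypothetical, and the example ranges are reserved documentation networks, not real addresses to block.

```python
import ipaddress

# Hypothetical exclusion list; a real module would read this from its settings.
# Both ranges are reserved for documentation (RFC 5737 / RFC 3849).
EXCLUDED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_excluded(ip: str, networks=EXCLUDED_NETWORKS) -> bool:
    """Return True if `ip` falls inside any excluded network."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Unparseable address: do not exclude it, so the visit is still logged.
        return False
    # An IPv4 address is never "in" an IPv6 network (and vice versa); that check returns False.
    return any(addr in net for net in networks)


print(is_excluded("203.0.113.7"))   # True
print(is_excluded("198.51.100.1"))  # False
```

Accepting CIDR ranges rather than single addresses lets one entry cover a whole office or monitoring network.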

roball

Title: Web Crawlers » Exclude robots from the statistics
Version: 6.x-0.32 » 7.x-2.0-alpha9
Component: Miscellaneous » Code
Category: Support request » Feature request

I am also interested in a way to exclude robot visits from the statistics. It seems the module still does not support that, right? The original module author answered the feature request "how can I sort out Google and all the other search engines?" at #558306: Visitors or viewed pages? with "Not now. Current version does not support it." Any suggestions on how to handle this?

Thanks.

bluegeek9

Version: 7.x-2.0-alpha9 » 8.x-2.x-dev
Status: Active » Postponed

There is no reliable way to filter out web crawlers with the current design. The visit is logged as the last thing done, after the HTML has been sent, and the user agent is not a reliable way to filter results.

Attaching JavaScript to every page that sends an AJAX request to log the visit is a more reliable way to eliminate web crawlers, because most crawlers do not execute JavaScript. The JavaScript approach also allows reporting the OS, browser, screen resolution, and other details that the current server-side approach cannot capture.

Refactoring would be a significant effort.

bluegeek9

The Device Detector library has a method for checking whether a visitor is a bot.
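To illustrate the idea behind a bot check (not Device Detector's actual implementation or API), here is a naive Python stand-in that matches a few common keywords in the user agent. `BOT_PATTERNS` and `looks_like_bot` are hypothetical names, and treating a missing user agent as a bot is an assumption of this sketch.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny keyword list. Real detectors ship large,
# maintained pattern databases; this only illustrates the principle.
BOT_PATTERNS = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_bot(user_agent: Optional[str]) -> bool:
    """Naive check: True if the user agent contains a common bot keyword."""
    if not user_agent:
        # Assumption: many scripted clients send no user agent; count them as bots.
        return True
    return bool(BOT_PATTERNS.search(user_agent))


print(looks_like_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(looks_like_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"))  # False
```

This also shows the limitation raised earlier in the thread: a crawler that sends a browser-like user agent passes the check, and keyword matching can misclassify legitimate clients. A maintained detection database narrows the first problem but cannot eliminate it.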

bluegeek9

Status: Postponed » Active

bluegeek9

Version: 8.x-2.x-dev » 7.x-1.x-dev

bluegeek9

Version: 7.x-1.x-dev » 8.x-2.x-dev

bluegeek9

bluegeek9

Status: Active » Needs review

  • bluegeek9 committed e2de6887 on 8.x-2.x
    Issue #1087268 by bluegeek9: Exclude robots from the statistics
    
bluegeek9

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

bluegeek9

This issue has been resolved. A new release, 8.x-2.17, is planned for May 31st.

This is an excellent opportunity to beta test the dev branch and report any issues.

Contributors (5)

bluegeek9, roshni27, abhishek_gupta1, sarwan_verma, SandeepSingh199

Changelog

Issues: 32 issues resolved.

Changes since 8.x-2.16:

Bug

Feature

Task