Exclude robots from the statistics [#1087268]

Problem/Motivation

Search engine crawlers and bots are inflating the statistics.

Steps to reproduce

Proposed resolution

The Device Detector library can detect bots. We could either skip logging visits from bots or add a column (tinyint) that flags them.

If we add a column, bot visits can be excluded from reports, and we can also track which pages have been indexed by which bots.
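
The column option can be sketched roughly as follows. This is a minimal illustration in Python with an in-memory SQLite database, not the module's code. The `visits` table, its `path` and `user_agent` columns, the `looks_like_bot` helper, and its token list are all hypothetical. The substring check only stands in for whatever Device Detector reports.

```python
import sqlite3

# Hypothetical stand-in for a real bot detector such as Device Detector.
BOT_TOKENS = ("googlebot", "bingbot", "crawler", "spider")


def looks_like_bot(user_agent: str) -> bool:
    """Return True when the user agent contains a known bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


def log_visit(db: sqlite3.Connection, path: str, user_agent: str) -> None:
    """Store every visit, flagging bots instead of discarding them."""
    db.execute(
        "INSERT INTO visits (path, user_agent, bot) VALUES (?, ?, ?)",
        (path, user_agent, int(looks_like_bot(user_agent))),
    )


def human_hits(db: sqlite3.Connection) -> list:
    """Hits per page with bot traffic excluded."""
    rows = db.execute(
        "SELECT path, COUNT(*) FROM visits WHERE bot = 0 "
        "GROUP BY path ORDER BY path"
    )
    return rows.fetchall()


def pages_by_bot(db: sqlite3.Connection) -> list:
    """Which pages each bot user agent has requested."""
    rows = db.execute(
        "SELECT DISTINCT user_agent, path FROM visits WHERE bot = 1 "
        "ORDER BY user_agent, path"
    )
    return rows.fetchall()


# Hypothetical table; the module's real schema has many more columns.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visits (path TEXT, user_agent TEXT, bot TINYINT)")
log_visit(db, "/node/1", "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0")
log_visit(db, "/node/1", "Mozilla/5.0 (compatible; Googlebot/2.1)")
log_visit(db, "/node/2", "Mozilla/5.0 (compatible; bingbot/2.0)")
```

Keeping the bot rows and filtering at query time is what makes both uses possible: `human_hits(db)` serves the reports, while `pages_by_bot(db)` answers which pages each crawler has requested.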

Remaining tasks

User interface changes

API changes

Data model changes

New column.

+-----------------------------+-----------------------+------+-----+---------+----------------+
| Field                       | Type                  | Null | Key | Default | Extra          |
+-----------------------------+-----------------------+------+-----+---------+----------------+
| bot                         | tinyint(1)            | YES  |     | NULL    |                |
+-----------------------------+-----------------------+------+-----+---------+----------------+
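
Because the column is nullable with a NULL default, rows logged before it existed would carry NULL rather than 0. The sketch below shows what that could mean for report queries. It uses Python with SQLite and a hypothetical one-column `visits` table, so the real schema and queries may differ.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visits (path TEXT)")  # hypothetical legacy table
db.execute("INSERT INTO visits (path) VALUES ('/old-page')")

# Add the nullable flag; the pre-existing row keeps bot = NULL.
db.execute("ALTER TABLE visits ADD COLUMN bot TINYINT DEFAULT NULL")
db.execute("INSERT INTO visits (path, bot) VALUES ('/new-page', 0)")
db.execute("INSERT INTO visits (path, bot) VALUES ('/new-page', 1)")

# A strict comparison silently drops rows whose flag is NULL.
strict = db.execute("SELECT COUNT(*) FROM visits WHERE bot = 0").fetchone()[0]

# Treating NULL as "not a bot" keeps historical visits in the report.
lenient = db.execute(
    "SELECT COUNT(*) FROM visits WHERE COALESCE(bot, 0) = 0"
).fetchone()[0]
```

Whether historical rows should count as human traffic is a design choice. The point is only that `bot = 0` and `COALESCE(bot, 0) = 0` give different totals once NULL values are present.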

Issue fork visitors-1087268

Branch: 1087268-exclude-robots-from (MR !88)

Comments

Comment #1

roseba commented 11 October 2011 at 15:35

I second that suggestion. I have no interest in seeing all the bot activity and it fills up quickly with that.

Additionally, I would like to set it to ignore certain IPs.

Comment #2

roball commented 25 November 2015 at 17:47

Title:	Web Crawlers	» Exclude robots from the statistics
Version:	6.x-0.32	» 7.x-2.0-alpha9
Component:	Miscellaneous	» Code
Category:	Support request	» Feature request
Issue summary:	View changes

I am also interested in a way to exclude robot traffic from the statistics. It seems the module still does not support that, right? In #558306: Visitors or viewed pages?, the original module author answered the question "how can I sort out Google and all the other search engines?" with "Not now. Current version does not support it." Any suggestions on how to handle this?

Thanks.

Comment #3

bluegeek9 commented 27 July 2022 at 00:37

Version:	7.x-2.0-alpha9	» 8.x-2.x-dev
Status:	Active	» Postponed

There is no reliable way to filter out web crawlers with the current design. The visit is logged as the last thing done, after the HTML has been sent. The user agent is not a reliable way to filter results.

Attaching JavaScript to every page that performs an Ajax request is a more reliable way of eliminating web crawlers, because most web crawlers do not execute JavaScript. The JavaScript approach also allows reporting the OS, browser, screen resolution, and other things not possible with the current server-side approach.

Refactoring would be a significant effort.

Comment #4

bluegeek9 commented 3 August 2022 at 17:53

The Device Detector library has a method for checking whether a visitor is a bot.

Comment #5

bluegeek9 commented 17 August 2022 at 15:07

Status:	Postponed	» Active

Comment #6

bluegeek9 commented 17 August 2022 at 15:07

Version:	8.x-2.x-dev	» 7.x-1.x-dev

Comment #7

bluegeek9 commented 28 July 2023 at 21:56

Version:	7.x-1.x-dev	» 8.x-2.x-dev

Comment #8

bluegeek9 commented 29 July 2023 at 20:34

Issue summary:	View changes
Related issues:		+#3369318: Log OS and device infomation - BrowserCap replacement

Comment #9

bluegeek9 commented 10 August 2023 at 20:11

Status:	Active	» Needs review

Comment #10

10 August 2023 at 23:05

bluegeek9 opened merge request !88

Comment #11

10 August 2023 at 23:10

bluegeek9 committed e2de6887 on 8.x-2.x

Issue #1087268 by bluegeek9: Exclude robots from the statistics

Comment #12

bluegeek9 commented 10 August 2023 at 23:10

Status:	Needs review	» Fixed

Comment #13

24 August 2023 at 23:14

Status:	Fixed	» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Comment #14

bluegeek9 commented 22 May 2024 at 15:46

This issue was resolved. A new release, 8.x-2.17, will be made soon, on May 31st.

This is an excellent opportunity to beta test the dev branch, and report any issues.

Contributors (5)

bluegeek9, roshni27, abhishek_gupta1, sarwan_verma, SandeepSingh199

Changelog

Issues: 32 issues resolved.

Changes since 8.x-2.16:

Bug

#3432630: Remove Not rendering HTML
#3395217 by bluegeek9: On a fresh install of a Drupal site with this module included, we get dependency errors
#3400985 by bluegeek9: Site break when placed visitors module
#3413155 by abhishek_gupta1, bluegeek9: Give Default value to all $agent keys
#3383142 by bluegeek9: ParseError: syntax error, unexpected
#3380760: Warning: Undefined array key

Feature

#3376256: Page Performance Metrics
#3378568: Ajax Replace Report
#3376234: Route reports
#3378580: Browser Report(s)
#1087268 by bluegeek9: Exclude robots from the statistics
#3376233 by bluegeek9: View filter date range
#3369318: Log OS and device infomation - BrowserCap replacement
#3376235 by bluegeek9: Drush command: Download MaxMind database
#3250285 by bluegeek9: Performance: Add db indexes

Task

#3389685: Three different errors displayed
#3376397 by bluegeek9, roshni27: Remove block settings from visitors.config
#3376392: Move Visitors Report menu
#3444385: Module path
#3443031 by bluegeek9: Issues reported by PHPCS
#3392006 by bluegeek9, sarwan_verma: watchdog_exception() deprecated
#3423013: Deprecated function user_role_names()
#3423001 by bluegeek9: composer.json missing "repositories"
#3397326 by bluegeek9, SandeepSingh199: Issues reported by PHPStan
#3401384: Remove Drupal 9 from GitLab CI
#3393046 by bluegeek9: Code Coverage
#3377961: Replace visitors/hosts reports with views
#3377962: Replace visitors/hits with Views
#3377964: Replace /visitors/pages with Views
#3377960 by bluegeek9: Log visitor local time
#3376241: Drush Command: Rebuild Geo Location
#3377958: Tracking cookies
