Problem/Motivation

I'm working on a very large newspaper site with lots of historical content (> 100000 nodes) and found two important limitations when attempting to create a "most read" news block. I will explain the first one in this thread and open a separate issue for the other:

1. The way the module currently queries the Google Analytics API makes it very hard for large sites to retrieve data without causing a timeout error, especially if the linked profile was created a long time ago and has lots of historical entries. For instance, I kept getting the following error even though it was clear that the GA profile was correctly authorized:

Problem fetching data from Google Analytics: Code: -1 - Error: request timed out - Message: . Did you authenticate any Google Analytics profile? See here.

Since the module does not allow fewer than 1000 entries to be requested each time cron runs, I was not able to reduce the number of requested records any further to avoid this error. In the end I had to ask the person who manages our Analytics account to create a new View (Profile) with additional filters to limit the number of pages tracked. As a consequence, I also lost the ability to retrieve any historical data from before that new Profile was created. However, with the new profile the timeout error did finally stop.

I think the main reason we kept getting those errors was that the start-date parameter that the module sends to GA is hard-coded in google_analytics_counter_data.inc, line 47:

  // The earliest valid start-date for Google Analytics is 2005-01-01.
  $request = array(
    'dimensions' => array('ga:pagePath'), // date would not be necessary for totals, but we also calculate stats of views per day, so we need it
    'metrics' => array('ga:pageviews'),
    'start_date' => strtotime('2005-01-01'),
    'end_date' => strtotime('tomorrow'),

Proposed resolution

The start date should be configurable by the user, because in some cases getting the complete range of data since 2005 is not necessary or relevant for the context in which the counter will be used. In my case, for example, the data from the last month was more than enough to create a "most read" block, since news that is too old is simply not relevant to readers anymore.

Also, the module should allow values lower than the default 1000 for the "Number of items to fetch from Google Analytics in one request" setting.


Comments

Anonymous’s picture

Version: 7.x-2.1 » 7.x-2.x-dev
Status: Active » Needs review

I think it's a good idea to let users enter a start date.
This is my first patch. It should work, but it now makes use of the Date API.
I hope this helps.

Anonymous’s picture

OK, I have added some improvements: you can now choose between a fixed and a variable start date.
A variable date is for when you only want the data for, say, the last month or the last day; it moves forward as time passes.
A fixed (static) date is a custom date entered through the Date API module, and it does not change over time.
I'm not an expert, and this patch probably needs a lot of improvement, but it could be a good starting point. It seems to work on my server.
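
For readers following along, the difference between the two kinds of start date can be sketched in plain PHP. This is only an illustration, not the code from the patch: the function name example_resolve_start_date(), the mode names and the period keys are all hypothetical, and nothing beyond PHP's own strtotime() is used.

```php
<?php
/**
 * Resolves a start date setting to a Unix timestamp (hypothetical helper).
 *
 * @param string $mode  'variable' (relative to now) or 'fixed' (custom date).
 * @param string $value A period key such as 'month', or a 'YYYY-MM-DD' date.
 * @param int    $now   Reference time, injectable so the result is testable.
 */
function example_resolve_start_date($mode, $value, $now) {
  // Assumed period keys; the actual patch may offer different ones.
  $periods = array(
    'day' => '-1 day',
    'week' => '-1 week',
    'month' => '-1 month',
    'year' => '-1 year',
  );
  if ($mode === 'variable' && isset($periods[$value])) {
    // Relative to $now, so the result moves forward with every cron run.
    return strtotime($periods[$value], $now);
  }
  // A fixed date stays the same no matter when cron runs.
  $timestamp = strtotime($value);
  // Fall back to the currently hard-coded start date on unparsable input.
  return $timestamp !== FALSE ? $timestamp : strtotime('2005-01-01');
}
```

With a variable setting such as ('variable', 'month') the returned timestamp depends on when the function is called, while ('fixed', '2013-01-01') always yields the same timestamp.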

kscheirer’s picture

Yeah, the patch needs some improvement, but the idea looks good. I didn't see anything that addresses

the module should allow values lower than the default 1000 for the "Number of items to fetch from Google Analytics in one request" setting.

Just code for altering the start date.

mavimo’s picture

Patch refactoring and some coding standard fixes.

lzimmerman’s picture

Awesome - this patch code seems to be working well in 7.x-3.x-dev also.

Vacilando’s picture

Status: Needs review » Needs work

Looks interesting, thanks.

Could someone please re-roll this for 7.x-3.x-dev?
While doing that, the variable "ga_counter_advanced_date_checkbox" should be renamed to "google_analytics_counter_advanced_date_checkbox".

Anonymous’s picture

But we still need to validate the date entered in "GA Fixed Starting Date" so that it cannot be later than today.

Anonymous’s picture

Version: 7.x-2.x-dev » 7.x-3.x-dev
Anonymous’s picture

Status: Needs work » Needs review

OK, I added some small visual enhancements (the advanced settings panel is now always open and visible if the checkbox is active), and I added settings form validation for "GA Fixed Starting Date" so that it cannot be later than today and cause bugs.
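
The validation rule described above can be sketched roughly like this. It is a standalone illustration rather than the patch's code: example_validate_fixed_start_date() is a hypothetical name, and in a Drupal 7 form validation handler the returned message would typically be handed to form_set_error().

```php
<?php
/**
 * Checks a fixed start date (hypothetical helper).
 *
 * @param string $date A date string such as '2014-06-15'.
 * @param int    $now  Reference time, injectable so the check is testable.
 *
 * @return string|null An error message, or NULL if the date is acceptable.
 */
function example_validate_fixed_start_date($date, $now) {
  $timestamp = strtotime($date);
  if ($timestamp === FALSE) {
    // strtotime() could not parse the input at all.
    return 'The start date is not a valid date.';
  }
  if ($timestamp > $now) {
    // A start date in the future cannot return any data.
    return 'The start date cannot be later than today.';
  }
  return NULL;
}
```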

Vacilando’s picture

Status: Needs review » Needs work

Tested on a few systems, looks OK. A nice solution with the predefined periods plus an override.

1) The change I recommend is to provide an "All time" option and have it selected by default. The reason is that when people update to a version with this patch, their setting would silently change from the expected whole history to "Last year". It is better to give them the "All time" option and let them change it if they want.

Two other small comments:
2) Missing space in "Starting Datequery below"
3) Don't capitalize titles -- it does not match the rest of the module. E.g. "Last Year" -> "Last year", "GA Variable Starting Date" -> "GA variable starting date", ...

cthshabel’s picture

First off, the docs on this module are extremely impressive. Whoever wrote the front page, thank you so much. Great information everywhere.

I am in the process of trying to use something like this start date.

Is there any plan to allow multiple start dates? So we could display blocks similar to other options with Statistic modules.

Examples: Views today, Views yesterday, Views this week, Views last week, Views this month, Views this year, All views

Is there a way someone could create hooks or something so we can define start dates per block based on a separate variable in the function call?

Thanks so much for everything.

bibo’s picture

Status: Needs work » Needs review

Ahem, the patch doesn't work with the current 3.1 module version or the latest dev, which are required for Google setups nowadays (OAuth2 etc.).

I just re-rolled the patch in #10 above so that it works with the latest dev. I haven't had a chance to test it yet, but I'm posting it here.

wundo’s picture

I've tested and it works on my environment

bibo’s picture

The previous patch also worked for me, but here is an updated version with one addition: it no longer deletes the redirect URI variable during the authentication process. I'm not sure whether that has been causing problems for others, but in our use case it seemed to help with authenticating and staying authenticated.

PS: Some tips for anyone having trouble with authentication on one or multiple sites:
- It's best to leave some or all gac variables out of features to avoid trouble.
- The redirect/hostname is best set to the same protocol you use to access the page.
- For me, the hostname with just http://domain.com without any protocol is what always worked, but only after I made sure secure_pages and other modules weren't creating redirects based on protocol and URL.

ooystein’s picture

I have used the patch in #14 on several sites for 3 months now with no issues. This patch makes the module a lot more useful! Getting the statistics for a user-specified period instead of being locked to the full Google Analytics history makes it possible to use this module in new and better ways.

As for the removal of variable_del('google_analytics_counter_redirect_uri'); in the #17 patch, I cannot say I have experienced the same problems with authenticating or staying authenticated, and I don't think keeping that variable will change anything. But I guess keeping it doesn't create any problems either.

  • Vacilando committed 02173c3 on 7.x-3.x authored by bibo
    Issue #2103199 by hanser, bibo, mavimo, Vacilando: Scalability issues in...
Vacilando’s picture

Status: Reviewed & tested by the community » Fixed

The patch had a number of issues (some of which I had reported in #11 above):
* It broke the upgrade path for existing users, in the sense that it would switch everybody from all-time stats to stats based on a variable start date. Instead, I made the fixed date starting 2005-01-01 (= launch of Google Analytics) the default, so current users will not have any surprises.
* Help text, language and formatting issues. Especially precision about the variable time periods -- e.g. "Last month" can be understood as "preceding calendar month" or "last 31 days"; I opted for the latter kind of description.
* For further clarity, I've visually disabled the variable date field when it is overridden.
* Etc.

I've fixed all of these and tested both variable and fixed start dates.
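
To illustrate why the wording of the periods matters, here is a small standalone sketch of the two readings of "Last month" mentioned in the list above. The function names are hypothetical and this is not the committed code; it only uses PHP's strtotime(), mktime() and date().

```php
<?php
/**
 * "Last month" read as a rolling window: exactly 31 days before $now.
 */
function example_last_31_days_start($now) {
  return strtotime('-31 days', $now);
}

/**
 * "Last month" read as the preceding calendar month: midnight on its first day.
 */
function example_previous_calendar_month_start($now) {
  // mktime() normalises month 0 to December of the previous year.
  return mktime(0, 0, 0, date('n', $now) - 1, 1, date('Y', $now));
}
```

For a reference time of 2014-03-15 12:00 the first reading starts on 2014-02-12 at 12:00, while the second starts on 2014-02-01 at 00:00, so the two interpretations can differ by well over a week.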

The code has just been committed to 7.x-3.x.

Please test all options and report before I tag a new stable release!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Vacilando’s picture

OK, pushed to 7.x-3.3.
