Over the last two weeks, usage stats for all my projects have fallen dramatically. Doing some random sampling, it looks like this is not only affecting my projects.
For instance, Advanced help had a 14% drop compared to the previous week in the June 7th stats, and a 49% drop compared to the previous week in the June 14th stats.
I don't believe such huge drops are because the module (or Drupal itself) has become significantly less popular in just one week, so the most likely explanation is that this is caused by an error.
Project usage overview
Comment | File | Size | Author |
---|---|---|---|
#32 | Screen Shot 10-15-15 at 09.59 AM.PNG | 60.83 KB | pingwin4eg |
#23 | project-usage-overview-drupal.org_.png | 345.31 KB | derjochenmeyer |
#5 | 2015-06-22.10-23-43.png | 32.05 KB | DerekAhmedzai |
Comments
Comment #1
gisle commented:
Moving to right queue.
Comment #2
antongp (as a volunteer and at ADCI Solutions) commented:
Checked a few projects I maintain - each decreased by ~50%. Yep, looks like it's some error; I don't believe Views, Token, ctools and others really lost about half of their installations.
Comment #3
sylus commented:
Yeah, noticed this as well for large distros such as panopoly and commerce_kickstart.
Comment #4
David_Rothstein (as a volunteer) commented:
To add another data point, Drupal core doesn't have any usage stats at all listed from that time period (the most recent data displayed at https://www.drupal.org/project/usage/drupal is from May 31) so something does seem to be wrong...
Comment #5
DerekAhmedzai commented:
My module (Fitvids) had a 10% drop too. I checked some other projects (Panels, Views) and they had dropped too.
I just checked today to see if it was fixed, now it's a 50% drop! What's going on?!
This needs to be sorted out asap, because if the numbers aren't accurate, then they are meaningless.
Comment #6
sylus commented:
Increasing the priority. Ideally, if usage statistics are incorrect, we could just disable the metrics until they are working again. I believe there was a related issue about that but can't seem to find it now.
Comment #7
gisle commented, in reply to sylus:
Maybe you're thinking about: #2270127: Show info about incorrect usage stats
Comment #8
basic (at Drupal Association) commented:
We've found the issue on our end: the update stats job is processing incomplete log files, and isn't smart enough to notice when those incomplete files later change in size. We are going to be rewriting and reprocessing the usage stats to account for this. In the meantime, usage stats will be broken and will take some time to catch up to the current day once the processing is fixed.
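A minimal sketch of the kind of check described above, written in Python purely for illustration: a log file is queued again whenever its current size differs from the size recorded the last time it was processed. The function name and the `processed_sizes` bookkeeping are hypothetical and are not the actual drupal.org code.

```python
import os


def files_needing_processing(paths, processed_sizes):
    """Return the log files that are new or have changed size since last processed.

    processed_sizes is a hypothetical dict mapping path -> size in bytes,
    recorded when that file was last processed.
    """
    stale = []
    for path in paths:
        current_size = os.path.getsize(path)
        # A file that changed size after we read it was incomplete at the
        # time, so its counts are too low and it must be processed again.
        if processed_sizes.get(path) != current_size:
            stale.append(path)
    return stale
```

Comparing sizes is only one possible signal; a checksum or an explicit "upload complete" marker would serve the same purpose.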
Comment #9
sylus commented:
I was curious about whether we have started to reindex the incomplete files. Already had a few emails about the usage drop for a few projects.
Thanks @gisle for the link; we definitely should try to push that issue forward should this happen again.
Of course, thank you very much for looking into this :)
Comment #10
basic (at Drupal Association) commented:
I've begun indexing files starting at June 10th this morning. We are looking at ~1.5 hours per day of logs, so it will take some time to catch up. Hopefully by tomorrow morning we'll have caught up with most of the processing and the stats will start to come back to life a bit.
Comment #11
mlhess commented:
Marking this fixed.
Comment #12
anrikun commented:
Looks broken again...
Comment #13
Mixologic commented:
Thanks @Anrikun - we had another issue with our loghost, and were missing a couple of days of log data. That data has been pulled from our backup source and is processing now.
Comment #14
markhalliwell commented:
I appreciate the diligence and quick responses the Infra team has shown over this issue; it is quite a headache.
I also feel, however, that it is becoming increasingly obvious that the current method by which these logs are processed is quite unreliable.
Is there or will there be any thought/action into providing a more stable system for parsing these logs (e.g. placing proper contingencies for handling random errors, missing logs, etc.)? The entire way these are being parsed (from my perspective) seems rather ambiguous and manual when something goes awry.
This has been an issue for over a year now (seemingly since D7/server upgrades maybe?), spanning several issues.
Isn't it time that this issue becomes properly fixed, instead of just slapping band-aids on it?
Comment #15
lolandese commented:
Please. And meanwhile #2270127: Show info about incorrect usage stats.
Comment #16
basic (at Drupal Association) commented:
Of course, we've made about 6 changes over the last 6 months to fix each individual problem we've had with the usage stats. These changes included:
Since all of this has happened, we (specifically @Mixologic) have also begun work on removing the mongodb processing component, which is inefficient, and replacing it with awk and other GNU tools.
Comment #17
markhalliwell commented:
Awesome. Yeah, not dogging what y'all have done (I know it's been a lot), just wasn't sure exactly _what_ the plan was.
FWIW, I actually just visited https://www.drupal.org/project/usage/bootstrap to see current stats and it appears to be really drastic still?
24k drop and no data since 7/18?
Comment #18
Mixologic commented:
The usage statistics are a canary in the coal mine: a drop is symptomatic of any number of failures elsewhere in the system. The stats are derived from our updates traffic, which is a firehose of data - it's responsible for about 75% of the bandwidth we use each month (about 12 TB or so). In an effort to reduce costs and provide a more reliable updates service, we moved updates.drupal.org to a CDN (edgecast) in May of last year. Recently we've been transitioning to a different CDN (fastly), and therefore had to update the processing methodology: we moved from rsyncing logs from edgecast's servers to having fastly communicate directly with our loghost. As with any change, there is a risk that new bugs will surface and that assumptions made in the former process no longer apply in the new one.
Each time the process has failed it has been for a different reason that was not previously anticipated. Each time we run into those failures we apply a proper fix so it does not happen again in exactly the same way.
This particular instance was a result of the loghost not responding to fastly's direct logging. Our monitoring and alerting was not tracking that process specifically, so we didn't know the loghost was non-responsive - that will be rectified today. In anticipation of this sort of contingency, we were already dual logging fastly logs to S3. Had we not anticipated that loghost might fail, we would not have backups of that data.
In any case, we have been looking at ways to improve the processing of these stats. Currently we end up with about 40 million records a day, which get moved to another server to process with drush -> mongodb -> drupal.org database. This process takes about 3-4 hours to run each day, and is a big reason why when something breaks (like loghost failing over 4th of july) it becomes difficult to catch up.
So I've begun a rewrite of the process that removes mongodb from the equation (it was really only acting as a deduplicating key-value store). The rewrite gets back to some unix file processing roots - awk, cut, uniq -c and sort are going to be the tools we use to handle the fastly and edgecast log files. That part is done, and I was able to reformat every file from mid-February to date. This reformat ran in about 6 hours - i.e. about a month's worth of data an hour.
The next step is to do the deduplication and aggregation - preliminary tests show that these processes take about 30 minutes per week's worth of data to run.
The step after that will be to load those pre-aggregated count files into the drupal.org database - this is essentially taking the code that's already there, removing 90% of it, and changing one or two things around.
The final step is to ensure this process is running on jenkins.
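To illustrate the deduplication and aggregation step described above, here is a rough sketch in Python rather than the awk/sort/uniq pipeline the rewrite actually uses. The record layout `(site_key, project, version)` and the function name are assumptions made for the example, not the real log format.

```python
from collections import Counter


def weekly_counts(records):
    """Count distinct sites per (project, version) for one week of records.

    records is an iterable of (site_key, project, version) tuples; this
    layout is an assumption, not the real log format.
    """
    # Deduplicate: a site that checks in many times in a week counts once
    # per project release (roughly what `sort -u` would do).
    unique = set(records)
    # Aggregate: count sites per project release (roughly `uniq -c`).
    return Counter((project, version) for _site, project, version in unique)
```

Holding a week of records in a set only works while they fit in memory; sorting files on disk, as the unix tools do, avoids that limit.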
Some additional value we're going to get out of this: we were discarding a *lot* of data that didn't perfectly match our release names. For example, drupal 8.0.0-dev is being reported to us, but drupal 8.0.x-dev is what is in the database - so we are not counting the 30,000 or so users who have a d8 dev install up and running. Same goes for virtually every development version of a module in contrib. So, we're going to see a good bump in the numbers that were always there.
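The release-name mismatch described above could be handled with a small normalization step. The sketch below covers only the one example given (a reported `8.0.0-dev` mapped to the stored `8.0.x-dev`). The regular expression and function name are assumptions, and contrib dev versions may well need different rules.

```python
import re

# Assumed pattern: core-style dev versions reported as "MAJOR.MINOR.PATCH-dev".
_DEV_VERSION = re.compile(r"(\d+\.\d+)\.\d+-dev")


def normalize_dev_version(version):
    """Map e.g. '8.0.0-dev' to '8.0.x-dev'; leave anything else unchanged."""
    match = _DEV_VERSION.fullmatch(version)
    if match:
        return f"{match.group(1)}.x-dev"
    return version
```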
Secondly, a feature was added a long time ago to provide stats on submodule usage. That data was being sent to drupal.org but ignored; now we've at least got it parsed and counted. When the priorities align, we'll be able to do something like this: https://www.drupal.org/node/1627676#comment-7683233 with it...
Anyhow, hope that helps - we have been focused on these and want to get away from cleanup efforts that take half a day when any little thing derails the update stats freight train. Thanks for being patient.
Comment #19
markhalliwell commented:
Thanks @Mixologic! That is an awesome overview, definitely helps clarify quite a few questions (and answers some that haven't even been asked yet).
Like I said above, I definitely appreciate y'alls attentiveness to this issue. I wasn't suggesting otherwise.
From what I gathered in that reply, it sounds like y'all are still processing ~6mo worth of data. That will obviously take a bit, understood. One more question though: I'm assuming based on what you said above that the issue I described in #17 will automatically resolve itself once d.o's db stats have been re-imported with proper numbers, yes?
Comment #20
lolandese commented:
That is definitely news and should be made public to a wider audience than just the twenty-something followers of this issue. A slightly edited version of #18 could go into Drupal News. Being informed makes inconveniences easier to bear. It is proven that travellers who know the reason for a train delay are less likely to complain about it. See also #2270127: Show info about incorrect usage stats.
That is good news.
Thanks for the efforts on this.
Comment #21
sylus commented:
Looks like all projects across the board have lost 60% of their usage again.
https://www.drupal.org/project/usage/views
https://www.drupal.org/project/usage/ctools
https://www.drupal.org/project/usage/bootstrap
Comment #22
mqanneh commented:
+1
Comment #23
derjochenmeyer (at forward-media.de) commented:
Project usage (including Drupal core) has dropped 50%. This seems to be a bug, not a task.
Comment #24
sylus commented:
This has been happening off and on for a year, and it is starting to get very frustrating that we can't seem to fix this.
Comment #25
drumm commented:
We're working on a new method for aggregating usage statistics as we fix this week's issue.
Comment #26
sylus commented:
Thanks @drumm, that is exciting news, appreciate it!
Comment #27
markhalliwell commented:
Progress update? Both for this latest issue, whatever it may be, as well as an overall update. It's been like this for over a week now and it does seem like it's taking longer and longer to fix these errors when they do appear. Maybe that's just the side-effect of developing/implementing the "new method", idk.
I know this is probably like the last thing on y'alls list, but I have to be honest... it's not very reassuring to consistently just hear "we're working on it" when it breaks.
Comment #28
drumm
Comment #29
Mixologic commented:
If you'd like to follow along, the issue and code where this is being reworked is here: #2575425: Project side of new d.o usage processing method, which has been implemented on the following dev site:
https://mixologic-updatestats-drupal.redesign.devdrupal.org/project/usage (There is a sharp drop in Feb because that was the earliest data we had and it only covered a partial week; that partial week will not be used in production.)
All data from Feb onwards has been reprocessed on that site utilizing the new method (which also properly accounts for -dev sites, which the former method did not).
There is one other big caveat as to why this has been "taking over a week". Earlier in this thread I said:
It turned out that pushing all of our log data from fastly to S3 once a day resulted in a silent API timeout from AWS that we were unaware of. The loghost stopped responding again, and we had to rely on the S3 backups - except the S3 backups were only 80-85% complete as a result of the timeout. Additionally, we lost an entire day's worth of data for the 20th of September: our S3 backups were only kept for 14 days, the failure happened right at the start of DrupalCon Barcelona, and everybody took time off after the con, so we missed the backup window. We've since extended retention to 45 days in S3 plus permanent storage in Amazon Glacier.
In other words, the update stats will not be accurate for the weeks of:
Sep 19th-26th (missing data for the 20th)
Sep 27th-Oct 2nd
Oct 3rd-10th (using S3 backups from the 28th to the 6th, with only about 80% of the data).
We're still working with fastly to get the backup log data sorted out (despite switching from saving every 24 hours to every 2 hours, we're still getting lost data).
Comment #30
Mixologic commented:
The new process is in place, and the stats have been updated.
Please let us know if there are any wild discrepancies and we can adjust the process further (in a new issue, of course).
Comment #31
greggles commented:
Thanks, Mixologic! I reviewed a few projects and they look reasonably good.
Comment #32
pingwin4eg commented:
Some trouble has happened again. Or is there some work still in progress?
Please see this project's stats: https://www.drupal.org/project/usage/styleswitcher . Yesterday (after the fix was provided) the last 2 weeks' numbers were normal, more than 3 hundred usages in total, but now they have changed - they fell dramatically (as before the fix).
Numbers for September 20, 2015 and earlier seem OK.
As far as I know, stats were previously never recounted for earlier weeks. This latest fix recounted stats once again from Feb of this year (again, as far as I know) and the stats were OK (yesterday).
Comment #33
Mixologic commented:
Thanks @pingwin4eg, there was still a job configured to run the old process, and it overwrote the better stats. I've rerun the correct process and shut off the old one so that won't happen again. Have a look again.
Comment #34
pingwin4eg commented:
Yes, stats seem OK now. Also checked other popular contribs - they are OK too. Thank you!
Comment #35
mqanneh commented:
It happened again: the usage stats weren't updated for October 11, 2015, and all projects have 0 reported installs for the last week.
Check https://www.drupal.org/project/usage/facebook_comments_block
Comment #36
hass commented:
There is also a bug in the "project" module stats. It has peaks of 10,000 and an install base of ~500.
Comment #37
markhalliwell commented:
Edit: found an existing issue: #2176757: Hide the last date row in project stats page when it has no values.
Comment #38
hass commented:
How should this fix the project peaks that make these graphs unreadable?
Comment #39
markhalliwell commented:
I linked that issue and closed this one in response to #35 (which is what re-opened this issue); that is a separate issue.
@hass, the issue you brought up is also a separate issue. Create a new issue (as @Mixologic has asked people to do); that appears to be bad data for a single project and should be treated as such. It does not indicate that the entire process is broken, which would affect all projects and is what this issue has been about.
Comment #40
Mixologic commented:
It is not bad data, it is a bad definition of what "usage" really means. Peaks like that are sometimes created when a lot of sites get created as part of some testing issue, or when a class is learning drupal and they create tons of sites. We do not currently have a way to separate a "real" site from a CI or testing site. Lest you think this is new, have a look at project stats on an old dev site with the old processing methodology - the spikes are there too: https://composer-drupal.redesign.devdrupal.org/project/usage/project
Someday, we will take the update stats data and turn it into a proper dataset - as it is right now, what we really have is a precomputed aggregate report that has some built-in limitations. It is *only* data that is reported to us, and sites sometimes do not report everything (I'll see the same site ask for a slightly different group of modules every five minutes, always omitting a couple - i.e. not every request always gets through).
When we have it as a full dataset, we can add better rules such as "exclude sites that have only contacted us once" or "only count sites that have been around longer than a month". Or, "how many drupal sites use *both* rules and context" - things our current reporting can't answer because it is bound to our limited definition of 'usage'.
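As a rough illustration of what such rules could look like once per-site records exist, here is a Python sketch. The `Site` fields, the 30-day threshold standing in for "a month", and the function names are all hypothetical; nothing like this exists in the current aggregate report.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Site:
    # All fields are hypothetical; the current report keeps no per-site records.
    key: str
    first_seen: date
    last_seen: date
    checkins: int
    projects: set = field(default_factory=set)


def looks_real(site, min_age=timedelta(days=30)):
    """Apply both example rules: more than one check-in, and seen for over a month."""
    return site.checkins > 1 and (site.last_seen - site.first_seen) > min_age


def count_sites_using_all(sites, required_projects):
    """Count 'real' sites that report every project in required_projects."""
    required = set(required_projects)
    return sum(1 for s in sites if looks_real(s) and required <= s.projects)
```

With records like these, a question such as "how many sites use both rules and context" becomes `count_sites_using_all(sites, ["rules", "context"])`.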