Thanks for this super-helpful module!

Problem/Motivation

There are a couple of situations where you can end up making unnecessary requests via JS for fresh site alerts.

  1. If the workaround is being used and a timeout hasn't been configured, then in site-alert.js there's a call to loadAlert($('.site-alert', context)); that automatically sets a timeout to call itself again using drupalSettings.siteAlert.timeout - but that's not set, so the next request is made immediately, and schedules another request to be made immediately and so on, like a DOS attack.
  2. If Drupal.attachBehaviors() is called on .site-alert more than once it will cause a fresh update and a fresh setTimeout().

Proposed resolution

  1. Stop loadAlert() from automatically recalling itself after a timeout.
  2. Use jQuery once to only run the site alert behavior once.

I realise it's not ideal to fix multiple issues in one ticket, but there are some other minorish fixes:

  1. Attach behaviors to newly inserted markup.
  2. Add drupalSettings as an explicit dependency.
  3. Use var instead of let for compatibility. I can't find any examples of core using let or const, though I appreciate it's well supported these days.
  4. Use Drupal.url().

Remaining tasks

  1. The test SiteAlertCacheWorkaroundTest needs updating with this patch. The test schedules an alert for 2 seconds in the future, gets the page before that (ie without the alert) and then waits for the message to show without issuing a page refresh. I wonder if it only passes currently because the bug means it's continually issuing fresh requests? If not then presumably this patch is introducing a bug that needs to be addressed.

Comments

AndyF created an issue. See original summary.

andyf’s picture

Assigned: andyf » Unassigned
Status: Active » Needs review
StatusFileSize
new2.94 KB

Thanks!

Status: Needs review » Needs work

The last submitted patch, 2: 3154138-2.patch, failed testing. View results

andyf’s picture

Status: Needs work » Needs review
StatusFileSize
new3.86 KB
new792 bytes

Hmmmn, I'm not 100% sure what the test failure is down to. But it looks like despite the comment Refresh the page for a few seconds until the alert appears the test doesn't actually issue a refresh, it just waits. I wonder if when there were two AJAX requests, the second one was processed after 2 seconds which enabled the test to pass.

Testing the hypothesis by adding a delay and refresh.

Thanks!

andyf’s picture

StatusFileSize
new3.96 KB
new656 bytes

Only attach behaviors after the new content's inserted (: Also check for error.

jwilson3’s picture

Priority: Normal » Major
Status: Needs review » Reviewed & tested by the community

Patch in #5 is working for me in production. This is a critical error. Without the patch the site alert js ends up pinging continuously to the backend, basically causing a significant unusable slow down on the frontend, and what amounts to a denial of service on the backend.

For example, all resources being taken up to respond to the ajax request several times a second causes login actions of > 1 minute.

I'm marking this major because there is a workaround -- i.e., just don't use the scheduling aspect of the module OR set your block to ping the server ever 5 minutes. But unwitting site admins that are not aware of the problem could easily break their site.

pfrenssen’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests
-    // Refresh the page for a few seconds until the alert appears. Thanks to the
+    // Ensure enough time has passed and reload the page. Thanks to the
     // workaround this will work regardless of the fact that the page is cached.
+    sleep(2);
+    $this->drupalGet('<front>');

This change is not correct, reloading the front page invalidates this test.

The documentation is indeed confusing. What this does is that it will poll the DOM 10x per second for a period of 10 seconds to check if the alert appears. If this test fails and needs a refresh of the front page then this means that the patch is breaking the current functionality.

Can we get a test case also that proves that the JS code is loaded twice unnecessarily and that the fix solves it?

pfrenssen’s picture

andyf’s picture

Status: Needs work » Needs review

Thanks for the feedback @pfrenssen!

This change is not correct, reloading the front page invalidates this test.

I guess I don't understand what the test is doing then, would you possibly be able to expand how it actually verifies anything? From my limited understanding it seems it's meant to:

  1. schedule an alert for 2 seconds in the future
  2. get the page before the 2 seconds have elapsed, ie there are no messages
  3. wait to see if anything happens with no refreshes

But why would that ever work? The page load was before the scheduled time, I know there's a single AJAX update, but I wouldn't expect it to arrive a full 2 seconds later and so return the site alert. Sorry if I'm being slow, here. I wonder if actually it's just the bug that enables it to pass - it keeps issuing AJAX requests until it gets the update.

Thanks!

pfrenssen’s picture

I am sorry but I cannot look into this further right now. This is not affecting my paid work so this review is on my spare time. I unfortunately have less than 30 minutes per week to spend on reviewing patches and there are 20+ modules that I maintain. I manage this by prioritizing the ones that are RTBC.

At the moment this is not ready because the patch contains unrelated changes and there is no failing test that showcases the bug and the fix. Normally small unrelated changes are fine for me but this is JS code which I am not familiar with. My Drupal time budget doesn't include studying JS best practices.

I would suggest the best way forward would be to make separate issues for the JS best practices and the change that is required in the test. You can postpone this issue on those if you want.

I'm marking this major because there is a workaround -- i.e., just don't use the scheduling aspect of the module OR set your block to ping the server ever 5 minutes. But unwitting site admins that are not aware of the problem could easily break their site.

This is by design but I agree that we should turn off this behavior by default. The default option for new users should be to not do requests in the background. It is out of scope for this issue though. If you could make a feature request to default this behavior to off then we can investigate whether it can be done in a way that doesn't break B/C.

andyf’s picture

Status: Needs review » Needs work

Thanks once again @pfrenssen! Moving back to Needs work. I too don't have extra time to spend on this, so hopefully someone will be able to step up. Thanks!

andyf’s picture

Issue summary: View changes

IS update with my thoughts about the test, thanks!

jwilson3’s picture

This is by design but I agree that we should turn off this behavior by default.

I don't think it's by design that a valid configuration of the site alert has an unintended consequence as significant as this. I too, and all of us here, have limited time on the issue queues.

I agree that the patch should be reworked to remove the unrelated changes so that A) its easier to review and B) doesn't accidentally trigger another error, and I should have said as much before RTBCing this. For that I apologize, I've been using this module for years, and was part of the team that originally wrote the module with Ccers and Nicolo. Since I thought they were still maintaining they would be find committing the changes here. Thank you for clarifying your position on this.

jwilson3’s picture

From issue summary:

I wonder if [the test in SiteAlertCacheWorkaroundTest] only passes currently because the bug means it's continually issuing fresh requests? If not then presumably this patch is introducing a bug that needs to be addressed.

I think we all agree that it is critical that there be a test that identifies this bug. Whether it is SiteAlertCacheWorkaroundTest that is broken or whether we need another new test, I cannot entirely say. It looks like the hardest part of this issue is figuring out whether SiteAlertCacheWorkaround is "broken" (and therefore somehow hiding this bug in the wild) or not.

pfrenssen’s picture

I think we all agree that it is critical that there be a test that identifies this bug. Whether it is SiteAlertCacheWorkaroundTest that is broken or whether we need another new test, I cannot entirely say. It looks like the hardest part of this issue is figuring out whether SiteAlertCacheWorkaround is "broken" (and therefore somehow hiding this bug in the wild) or not.

I think this is a very good idea. I just read the test in full and it seems to me that there is indeed a problem in it. We need to investigate this.

At the start of the test this sets the timeout to 0:

      // Disable the timeout. Under normal circumstances this would disable AJAX
      // calls completely, but due to the caching bug we still need to load
      // alerts using JS.
      'timeout' => 0,

The comment explains that alerts need to be loaded using JS, but if I understand this correctly, this should happen only one single time. The timeout should not be set, and after the initial retrieval of the currently active alerts, no further requests should be made.

But this section that was changed in the patch for this issue seems to conflict with this:

    // Load the front page. The scheduled alert should not be visible yet.
    $this->drupalGet('<front>');
    $this->assertSiteAlertNotVisible('Scheduled alert');

    // Since the workaround is in effect, the JavaScript code that is
    // responsible for refreshing the alerts should be loaded in the page.
    $this->assertJavaScriptPresent();

    // Refresh the page for a few seconds until the alert appears. Thanks to the
    // workaround this will work regardless of the fact that the page is cached.
    $this->assertSiteAlertAppears('Scheduled alert');

This last assert seems wrong, it seems to me this should be "Wait for a few seconds and check that the new alert DOES NOT appear." For the coverage to be complete, it should then reload the page and check that the alert now appears correctly.

But to make this even more complicated, at the end of the SiteAlertTimeoutTest::testSiteAlertTimeoutsDisabled() there is a check that the JS code is not included in case that the timeout is set to 0. This conflicts with the SiteAlertCacheWorkaroundTest which tests that the JS should be there. Does this mean that both these tests are broken?

pfrenssen’s picture

OK so the difference between the two tests is in whether there are alerts scheduled in the future. This lines up with the comment in #6.

Unfortunately I cannot work on this any more right now, but if anyone can confirm that the way to replicate this bug is as follows then this would be very much appreciated:

* Set the timeout in the site alert block to 0.
* Create a site alert which is scheduled to appear in the future.
* Clear caches and load the frontpage.
* Check in the browser developer toolbar that there are continuous requests happening.

jwilson3’s picture

I can confirm that this ^ is indeed how you replicate the bug. In step 2, the site alert doesn't have to be scheduled to appear in the future -- the bug is also triggered if the alert is schedule with a start and end date and is currently active. I haven't tested whether past / historical / expired scheduled alerts also trigger this, but I assume they do.

I might recommend leaving the test that sets 'timeout' => 0 but then add another test that doesn't set a timeout, which should default to 0, I think this might be where the bug lies.

rpsu’s picture

Status: Needs work » Needs review
StatusFileSize
new3.77 KB

updated patch from #5 against current HEAD (failed due to updated comment in tests/src/FunctionalJavascript/SiteAlertCacheWorkaroundTest.php

pfrenssen’s picture

Priority: Major » Critical

Thanks for the confirmation. Raising this to critical since it can cause very serious problems which can be triggered unexpectedly during normal usage.

I am in favor of getting a quick fix in for this and get a new release out ASAP. We can handle the tests and proper fix in a followup.

Is there a way we can get this fixed in as little lines of code as possible? For example to do a check if the timeout is lower than 1000ms in JS, and setting it to 1000ms if this is the case.

I'm asking because the larger the patch is the lower the chance I will have availability in my spare time to review and commit.

pfrenssen’s picture

Assigned: Unassigned » pfrenssen

I'm going to take a look.

pfrenssen’s picture

Issue tags: -Needs tests
StatusFileSize
new2.72 KB
new3.04 KB
new2.72 KB

Provided a test that shows that unexpected JS requests are taking place when the timeout is set to 0 and a scheduled alert is present. The patch fixes the problem correctly.

While the tests are running I will review the JS code to the best of my ability.

Status: Needs review » Needs work

The last submitted patch, 21: 3154138-21.patch, failed testing. View results

pfrenssen’s picture

Status: Needs work » Needs review
StatusFileSize
new3.49 KB

Here is a minimal fix, just adding a check that the timeout is not 0 is enough. jQuery once does not seem to be needed.

I looked into the JS changes but I read conflicting advice on whether to use "var" or "let" and I did not feel qualified to make a good judgement without doing more reading. I would propose to do the JS refactoring in a followup issue. Let's only fix the issue at hand here.

  • pfrenssen committed f63eb0b on 8.x-1.x authored by AndyF
    Issue #3154138 by AndyF, jwilson3, pfrenssen, rpsu: Large number of AJAX...
pfrenssen’s picture

Title: Don't make unnecessary AJAX requests » Large number of AJAX requests when scheduling an alert with a timeout of 0 seconds
Assigned: pfrenssen » Unassigned
Status: Needs review » Fixed

Thanks all for the bug report, patches and advice. I will immediately issue a new release that includes this fix.

My apologies for the slow reaction, I did not initially recognize the severity of the issue.

@AndyF if you feel like opening a followup issue to modernize our JS code and apply best practices, this would be very much appreciated!

jwilson3’s picture

👍thanks Pieter

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.