Currently, this module automatically updates the static page of a node when the node is updated, and deletes it when the node is deleted.
It would be great if all other node pages that reference it were updated as well.
| Comment | File | Size | Author |
|---|---|---|---|
| #8 | static_generator-3024169-8.patch | 10.94 KB | mingsong |
Comments
Comment #2
captaindav commented
There are a lot of places a node can be referenced from, e.g. an entity reference field on the node, an embed in the body, etc. Were you mainly concerned with entity references embedded in the body? You could easily create a custom module that uses the same hooks as the staticgenerator.module file (entity add/update/delete). In the hooks you could get the node IDs from the body field and then issue a \Drupal::service('static_generator')->generatePage('/path/to/page') command for each referenced node ID. In the near future I will be adding \Drupal::service('static_generator')->generatePages(array_of_node_ids_to_generate) so you can generate all of them at once, which will be faster.
Comment #3
mingsong
Thanks @captaindav for your suggestion.
Yes, I totally agree. There are potential performance issues if we update all related content or pages.
I developed a custom module that provides a new Drupal Console command. It uses the HTTP header 'X-Drupal-Dynamic-Cache', set by the Drupal 8 Dynamic Page Cache module (https://www.drupal.org/docs/8/core/modules/dynamic-page-cache/overview), to work out whether a page has changed since its static page was generated.
The logic behind this content (node) page update command is that we go through all published content pages and check whether 'X-Drupal-Dynamic-Cache' in the response header is HIT. If so, we ignore the page, since it hasn't changed in the Drupal cache. Otherwise, we generate a new static HTML page for it, as it has changed.
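The check described above can be sketched in a few lines. This is only an illustration in Python, not the sandbox module's code: the function names `needs_regeneration` and `pages_to_regenerate` are hypothetical, and the response headers are assumed to have been fetched already and passed in as plain dictionaries.

```python
from typing import List, Mapping


def needs_regeneration(headers: Mapping[str, str]) -> bool:
    """Return True unless the X-Drupal-Dynamic-Cache header reports HIT.

    Header names are matched case-insensitively. A missing header, MISS, or
    UNCACHEABLE are all treated as "regenerate", following the rule above.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("x-drupal-dynamic-cache", "").strip().upper()
    return status != "HIT"


def pages_to_regenerate(responses: Mapping[str, Mapping[str, str]]) -> List[str]:
    """Given {path: response headers}, list the paths whose static file is stale."""
    return sorted(
        path for path, headers in responses.items() if needs_regeneration(headers)
    )
```

For example, `pages_to_regenerate({"/node/1": {"X-Drupal-Dynamic-Cache": "HIT"}, "/node/2": {"X-Drupal-Dynamic-Cache": "MISS"}})` would select only `/node/2`.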
This page update feature is critical for my project. Our site has more than 100K pages, so we have to minimise the number of static pages regenerated when a piece of content is updated, while still keeping the entire static site up to date: not only the updated node page, but also all other related pages.
Comment #4
captaindav commented
That sounds like a great way to determine which nodes to generate. Do you think this module (Static Generator) could determine which pages to generate using that method? I don't understand the use of HIT: on a large site lots of pages are getting hits on the CMS, so is there any telling whether that value is set? Of course this does not address non-node entities, e.g. media or taxonomy, which also need to be generated when changed.
Originally I wanted to generate the pages by getting their markup from either the page cache or the dynamic page cache, rather than using the Core request/rendering services. However, I abandoned that idea, as there wasn't a clear way to get complete pages (as sent to the browser) from the page cache or dynamic page cache and be certain all of the metadata etc. on the page would be correct. In fact, using a Core SUB_REQUEST is very fast, but I am hitting some rendering bugs, so I am having to use Drupal Core's Guzzle implementation instead.
Couldn't your method be used with Static Generator with some sort of hook or event that fires whenever a page / dynamic page cache entry is created/updated/deleted? That way changes to the cache would result in immediate page generation, which should work for other entity types too, like media or taxonomy. Of course the other entity types do not use the page / dynamic page cache; they use the render cache.
Comment #5
mingsong
I uploaded the custom module to https://www.drupal.org/sandbox/amds/3025199
As there is a lot of work still to be done, this is just a sandbox project at an early stage.
I was thinking along the same lines as your original idea, which is to get the complete page from the Drupal cache. Unfortunately, I couldn't work it out. So I borrowed your idea of getting the markup from a Core request and then fetching the response. I think we are facing the same issue here.
The only difference from your method is that my module checks 'X-Drupal-Dynamic-Cache' in the response HTTP header to determine whether the node page has changed. If so, it goes ahead and generates a new static file for it. Otherwise, it ignores the page. That would save a lot of file I/O on a big site. I think it is still too heavy for the entity update hook.
I think you just raised a brilliant idea regarding the cache entry event. My understanding is that Drupal 8 introduced two cache modules, Internal Page Cache (for anonymous users only) and Dynamic Page Cache (for all users).
The Internal Page Cache (page_cache) module uses Drupal\page_cache\StackMiddleware\PageCache, which implements Symfony\Component\HttpKernel\HttpKernelInterface, to cache the response for a page the first time it is requested. Until a node is updated or the cache is cleared, the stored response for a page won't be invalidated, unless there is an Expires header with the request. The cache ID for a page is 'URL:html', for instance 'http://local-d8.net/node/12:html'. I think we can get the markup for a node page from the cache rather than calling a Core request, which is heavier. We can also easily invalidate the cache for a page once we know it should be updated. The question is how to find all referencing pages once a node is updated.
The Dynamic Page Cache uses a different approach. It uses an event subscriber (Drupal\dynamic_page_cache\EventSubscriber) to handle the KernelEvents::REQUEST and KernelEvents::RESPONSE events. It relies on render cache tags to keep all cached render arrays up to date. The cache tag for node 5 is 'node:5', which is invalidated whenever the node is changed. Is there an event that is fired once a cache tag is invalidated? If so, I think we can work out an efficient way to generate static files only for updated entities.
Comment #6
mingsong
I figured out an efficient way to do so.
Basically, when we generate a page via the Core method (not Guzzle), all cache tags for that page are returned with the response object.
We need to store the cache tags for each generated page in a custom database table (static_generated_files). Then, when a node or block is updated, we can easily work out which pages need updating by searching the database table for pages whose stored cache tags include the updated entity's tag.
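As a rough illustration of that lookup, here is a minimal sketch in Python using an in-memory SQLite database. It is not the code from the attached patch: only the table name static_generated_files comes from the description above, while the column names (`path`, `cache_tag`) and the helper functions are assumptions made for the example.

```python
import sqlite3
from typing import Iterable, List


def create_index() -> sqlite3.Connection:
    """Create an in-memory table with one row per (page, cache tag) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE static_generated_files ("
        " path TEXT NOT NULL,"
        " cache_tag TEXT NOT NULL,"
        " PRIMARY KEY (path, cache_tag))"
    )
    return conn


def record_page(conn: sqlite3.Connection, path: str, tags: Iterable[str]) -> None:
    """Replace the stored cache tags for a page that was just generated."""
    conn.execute("DELETE FROM static_generated_files WHERE path = ?", (path,))
    conn.executemany(
        "INSERT INTO static_generated_files (path, cache_tag) VALUES (?, ?)",
        [(path, tag) for tag in set(tags)],
    )


def pages_for_tags(conn: sqlite3.Connection, tags: Iterable[str]) -> List[str]:
    """Return every generated page that carries at least one of the given tags."""
    tag_list = list(tags)
    if not tag_list:
        return []
    placeholders = ", ".join("?" for _ in tag_list)
    rows = conn.execute(
        "SELECT DISTINCT path FROM static_generated_files"
        f" WHERE cache_tag IN ({placeholders}) ORDER BY path",
        tag_list,
    )
    return [row[0] for row in rows]
```

When node 5 is saved, calling `pages_for_tags(conn, ["node:5"])` would return the node's own page plus any listing or referencing page whose stored tags include 'node:5', and only those pages would need regenerating.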
Patch based on the Dev version released on 2019-01-23 is attached.
Regarding Guzzle, we can still get the cache tags from the 'X-Drupal-Cache-Tags' response header. But we have to enable http.response.debug_cacheability_headers in services.yml. Otherwise, there won't be any cache tags in the response header.
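To show what reading that header involves, here is a small Python sketch. It assumes the header value is a space-separated list of tags, which is how Drupal emits it when the debug headers are enabled; the function name `parse_cache_tags` is hypothetical and the headers are passed in as a plain dictionary rather than fetched.

```python
from typing import Mapping, Set


def parse_cache_tags(headers: Mapping[str, str]) -> Set[str]:
    """Extract the cache tags from an X-Drupal-Cache-Tags response header.

    Returns an empty set when the header is absent, which is what happens
    when http.response.debug_cacheability_headers is not enabled.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    return set(normalized.get("x-drupal-cache-tags", "").split())
```

An empty result is ambiguous: it can mean the page has no tags or that the debug headers are switched off, so a caller would probably want to treat it as "unknown" rather than "no dependencies".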
Comment #7
mingsong
Comment #8
mingsong
Patch based on the Dev version of 2019-01-27
Comment #9
captaindav commented
This looks like great progress. I will try to incorporate the patch this week. Sorry, I have been very busy on another project and can't get to these right away.
Do you have any idea why the Core rendering does not work for some pages? It would be great if that could be figured out, as the Core rendering is much faster than Guzzle.
Comment #10
mingsong
Sorry for the late reply.
It is quite difficult to reproduce this issue, as it happens randomly. One thing I am pretty sure of is that when this issue occurs, the anonymous view of that page has the same issue, even if we don't generate a static page for that URL.