Last updated 31 August 2016. Created on 31 July 2005.
Edited by cilefen, mvc, eosrei, doitDave.

How to produce a static mirror of a Drupal website?

Note: You should only use this technique on your own sites.

Prepare the Drupal website

Create a custom block and/or post a node to the front page that notes that the site has been archived from Drupal to static HTML. Be sure to include the date of the archiving. Consider including a link to the future versions of the site (e.g. if you are archiving a 2008 event, link to the URL of the next event).

Disable interactive elements which will be nonfunctional in the static HTML version. The Disable All Forms module can disable all forms at once; elements to disable include:

  • the login block
  • the "Who's online" block
  • user registration
  • anonymous commenting
  • links to the search module and/or any search boxes in the header
  • comment controls which allow the user to select the comment display format
  • Ajax requests such as Views pagers
  • Views exposed filters
  • Update all nodes by setting their comments to read only. This eliminates the "login or register to post comments" link that would otherwise accompany each of your posts. You can do this through phpMyAdmin by running the following SQL command against the node table (see the drush sketch after this list):
    UPDATE node SET comment = '1';
  • It can also be a good idea to disable any third-party, dynamically generated blocks; once the site is archived, it would be difficult to remove these blocks if the third-party services are no longer available.
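If you have Drush available, some of these preparation steps can also be scripted. The following is a minimal sketch assuming a Drupal 7 site and Drush 7/8; the module machine name disable_all_forms is an assumption, so check it against the project page before running:

# Close comments on all existing nodes (1 = read only in Drupal 7).
drush sql-query "UPDATE node SET comment = 1;"
# Download and enable the Disable All Forms module (machine name assumed).
drush pm-download disable_all_forms && drush pm-enable -y disable_all_forms
# Clear caches so the crawler sees the changes.
drush cache-clear all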

Create a static clone

Wget (UNIX, Linux, OSX, ...)

Wget is generally available on almost any 'nix machine and can produce the mirror from the command line. However, wget seems to have problems converting the relative stylesheet URLs properly on many Drupal site pages. Modify your theme template to produce hardcoded absolute links to the stylesheets and try the following command:

wget -q --mirror -p --adjust-extension -e robots=off --base=./ -k -P ./ http://example.com

By default wget respects robots.txt, so it might not download some of the files in /sites/ or elsewhere; the -e robots=off option (included in the command above) disables this behavior.

wget keeps query strings in the saved filenames, for example an image file ending in "?itok=qRoiFlnG". Recursively strip the query strings with:

find . -type f -name "*\?*" | while read -r filename; do mv "$filename" "${filename%%\?*}"; done
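Be aware that if the mirror contains both a plain file and a query-string variant (for example style.css and style.css?a), the plain mv above will overwrite one with the other. A slightly more defensive variant of the same clean-up, as a sketch:

find . -type f -name "*\?*" | while read -r filename; do
  target="${filename%%\?*}"
  # Only rename when no file with the bare name exists yet.
  [ -e "$target" ] || mv "$filename" "$target"
done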

HTTrack (UNIX, Windows, and Mac/Homebrew)

HTTrack is available as a command-line tool and as a Windows GUI client; the GUI version will produce the mirror with almost no configuration on your part. One potential command to use is:

httrack http://2011.example.com -K -w -O . -%v --robots=0 -c1 -%e0

Note that the -K option creates absolute links; this is only useful if you are hosting a public mirror on the same domain. Otherwise, omit -K to produce relative links.

The -c1 option makes only one request at a time, so the mirror becomes rather slow. The default is -c10, so you might consider something closer to that value when archiving your own site.
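For example, a variant of the command above that keeps relative links and lets httrack use its default connection count could look like this (the same options as above, just without -K and -c1; adjust the URL to your site):

httrack http://2011.example.com -w -O . -%v --robots=0 -%e0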

With HTTrack properly configured, you don't have to hack on common.inc to get all of your stylesheets to work correctly. However, with the default robots.txt settings in Drupal 5 and the "good citizen" default HTTrack settings, you won't get any module or theme CSS files or JavaScript files.

If you're working from a local installation of Drupal and want to grab ALL of your files in a way that you can just copy them up to a server, try the following command:

httrack http://localhost/ -W -O "~/static_cache"  -%v --robots=0 
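To publish the result you could then copy the generated files to your web server. Here is a sketch using rsync, where the user, host, and paths are placeholders and the exact subdirectory under ~/static_cache depends on HTTrack's file-naming structure:

rsync -avz ~/static_cache/localhost/ user@example.com:/var/www/archive/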

Advanced httrack dumps

KarenS has created a very helpful description of how to "statify" a Drupal site with httrack, where she suggests the following command (on a Linux console):

httrack "http://${root_uri}" -O "$targetdir" -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0

(where $root_uri is the place to start grabbing, most likely your public Drupal root, and $targetdir is the target directory for the backup files.)
In the same article, she further suggests running a regex on all files to fix link issues with index.html; her regex can even be improved to something like:

find . -name "*.html" -type f -print0 | xargs -0 perl -i -pe '/((?<![\'"])\/index.html|(?<=[\'"]\/)index.html)\b//g'

This way it leaves Drupal's no-trailing-slash paradigm intact and avoids "duplicate content" issues while preserving absolute paths. Note that this only works with a web server configured to add the necessary trailing slash again and resolve to the actual index.html file.

You can even copy the [dump_root]/[yourhost.tld]/index/index.html file to [dump_root]/[yourhost.tld]/index.html; in the copied file, all '../' must be removed from the source. If you do this, you can change your DOCUMENT_ROOT from [dump_root] to [dump_root]/[yourhost.tld]. This way you preserve even more of the former site structure and make sure that "/" requests will not fail. (The '../' clean-up could, then again, also be done by .htaccess rules.)
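A rough sketch of that copy step (the paths are the placeholders used above, and the sed pass assumes GNU sed and is deliberately blunt, so review the resulting file):

cp dump_root/yourhost.tld/index/index.html dump_root/yourhost.tld/index.html
# Strip the relative '../' prefixes so the copied page resolves paths from its new location.
sed -i 's~\.\./~~g' dump_root/yourhost.tld/index.html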

SiteSucker (OS X)

SiteSucker is a Mac GUI option for downloading a site.

Drupal modules

You can use a Drupal module to export some or all of your site as static HTML.

Verify that the offline version of your site works

Verify that the offline version of your site works in your browser. Test to make sure you properly turned off any interactive elements in Drupal that would otherwise confuse site users.
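Opening the files straight from disk can behave differently from a real web server, especially where absolute paths are involved, so it can help to serve the mirror locally. A quick sketch, assuming Python 3 is installed and that ./mirror/example.com is where your crawler put the files:

cd ./mirror/example.com
python3 -m http.server 8000
# Then browse to http://localhost:8000/ and click around.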

Why create a static site archive?

  • Perhaps over time your website has essentially become static. Because such a site still requires security administration, an administrator has to keep applying security patches or consider removing the site altogether.
  • You want to ensure that the site is preserved on Drupal.org infrastructure (without direct cost to you)
  • Alternatively, you may want to produce an offline copy for archiving, or for convenient reference when you don't have access to the Internet. Before simply removing a site, consider another alternative: maintain the Drupal site inside a firewall, then periodically cache its output to static HTML files and copy them to public servers (see the cron sketch after this list).
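As a sketch of that periodic workflow, assuming cron on the internal machine (host names, paths, and the schedule are placeholders):

# Refresh the static copy at 03:00 every night, then sync it to the public server.
0 3 * * * wget -q --mirror -p --adjust-extension -k -e robots=off -P /var/archive http://intranet.example.com && rsync -a /var/archive/ user@public.example.com:/var/www/archive/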


Comments

hughbris’s picture

I've always found wget pretty reliable when you figure out the right flags ;)

Before I found this documentation, I also had problems with site-relative stylesheet links, but not all of them. If there's a pattern, it seems wget successfully converted @href values in the "/sites/[mysite]" directory to "sites/...", but actually converted others to full URLs including protocol and domain (http://example.com/...). For example, "/modules/node/node.css?b" came out as "http://example.com/modules/node/node.css?b". Hmmph??! I guess the problem links came from sites/all on the filesystem, but I expect this to be irrelevant to a client (like wget).

Here are the flags I used, -Erkp are probably the only relevant ones:

wget -w 3 --random-wait --user-agent=hugh -Erkp http://example.com -o example.wget.log

I can work around this with other tools for now. Just on the off-chance, does anyone happen to know why this even happens? I think -np ("no parent") is default behaviour just in case this has anything to do with it, though it shouldn't.

greg.1.anderson’s picture

wget uses the full URL to save the file -- e.g. modules/node/node.css?b becomes the literal filename, including the ?b. When you try to fetch the page that contains a reference to that file, the browser will stop before the ?, and request the file modules/node/node.css, which will not match with modules/node/node.css?b. Your options are:

  1. Alter Drupal (temporarily?) to not include the ?b in css links
  2. Post-process the files downloaded by wget, renaming all foo?bar to foo
  3. Use a different tool, like httrack
tangobravo’s picture

I think it's because the robots.txt file disallows crawlers from going into the /sites/ directory, among others. I included -e robots=off in my command line and that seemed to pull in all the expected files. I've edited the wget section above to add this info.

My full command line was wget -w 1 -Erkp -e robots=off [URL] -o wget.log

krueschi’s picture

Thanks a lot for these command line options! This command works fine for me for delivering a local development version of the site as a zip archive to my client for viewing purposes only.

reptilex’s picture

If, like me, you had the problem that SiteSucker was not downloading CSS or the site was not being displayed correctly, try enabling the "ignore robots exclusions" option in the settings. After that the site was downloaded and displayed as expected.

joachim’s picture

Does anyone have any suggestions of how to deal with filtered views on a site to archive?

the_g_bomb’s picture

I would imagine they would have to be disabled. Httrack handles sorted table views but doesn't handle the exposed filters as they require a form response.

--
G

brentratliff’s picture

There most likely won't be anything to handle the callback. Ensure you disable Views Ajax pagers in particular.

jcisio’s picture

Is it possible to keep URLs unchanged? Currently, with the default httrack options, node/12 is changed to node/12.html, test is changed to test.html, the query string (the one added to JS files for versioning) is merged into the filename, etc.

Ideally, we should be able to:
- save node/12 as node/12/index.html instead of node/12.html
- remove simple query strings: misc/drupal.js?m4kqgj would be kept as misc/drupal.js instead of misc/drupald4c4.js. Currently we can use -N "%h%p/%n.%t"

yan’s picture

I also wanted to keep the URLs unchanged and I found this nice article by KarenS where she writes:

One of the biggest problems of transforming a dynamic site into static pages is that the urls must change. The 'real' url of a Drupal page is 'index.php?q=/news', or 'index.php?q=/about', i.e. there is really only one HTML page that dynamically re-renders itself depending on the requested path. A static site has to have one HTML page for every page of the site, so the new url has to be '/news.html' or '/news/index.html'. The good thing about the second option is that incoming links to '/news' will automatically be routed to '/news/index.html' if it exists, so that second pattern is the one I want to use.

The -N flag in the command will rewrite the pages of the site, including pager pages, into the pattern "/about/index.html". Without the -N flag, the page at "/about" would have been transformed into a file called "about.html".

I followed her instructions using
httrack http://example.com -O . -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0

And it worked, at least with the correction she also suggests:
find . -name "*.html" -type f -print0 | xargs -0 perl -i -pe "s/\/index.html/\//g"

The downside: images and other files are also put to their respective [filename]/index.[filetype] directories, so their URLs do change.

doitDave’s picture

Thanks a lot for this important link; I have instantly added it to the document body because KarenS' solution is (in my eyes) closest to no-frills. I am not sure, however, what you are referring to with images. I really had no issue with that, or are you talking about image nodes?

I also added some hints on tweaking the regex and stuff. With that done, I managed to get a good-working static site dump of an averagely-built site with almost no issues.

It may also be worth a try to run a second regex to get rid of the lousy /index/index.html construct entirely (although this might be configurable in httrack directly). This should not even be too complicated, but I have to check side effects. Correction: This won't work. The /index folder is referred to by relative links (which makes sense), so it's probably better left as is. However, creating an /index.html is still not a bad idea.

hth

yan’s picture

It's been a while now, but regarding the images, I think I meant that it does work, but that their URL changes, i.e. if they were referenced somewhere, that might lead to a broken link. But I don't think that it's such a big problem.

chyatt’s picture

I researched the wget command and preferred the following instead:

wget -q --mirror -p --no-check-certificate --html-extension -e robots=off --base=./ -nd -k -P ./ <URL>

Here's what each argument means:

-q                      Don't write any wget output messages
--mirror                Turn on options suitable for mirroring, i.e. -r -N -l inf --no-remove-listing
-p                      Download images, scripts, & stylesheets so that everything works offline
--no-check-certificate  Ignore certificate warnings
--html-extension        Append .html to downloaded HTML files so that they can be viewed offline. E.g. www.example.com/example becomes example.html
-e robots=off           Disable robot exclusion so that you get everything Drupal needs
--base=./               Set the base URL to best resolve relative links
-nd                     Do not create a hierarchy of directories
-k                      Convert links to make them suitable for local viewing
-P ./                   Download here

bwooster47’s picture

So I've followed the instructions above, and all user content-creation links are gone. Except for one.
After the above steps, the Forum module still showed "Post new forum topic" for anonymous users.
Easily fixed that - go into User Permissions and remove Create Topic from Anonymous users.
But then that link changed to "Login to post new content in the forum."!

Not yet sure how to disable that...

bwooster47’s picture

This was a very helpful page with helpful comments; I have now completed the transition of a site to an archive.
Here are some other useful links:
Park your old Drupal site
and
Creating a static Drupal site
The latter shows how to handle the tricky Acidfree module, as well as shows .htaccess rules to keep Drupal archive in a sub-dir and still allow some other package to be used at the root URL.

doitDave’s picture

Since my actual workflow for a larger site (60k nodes) was a mix of multiple links/howtos I found here, I'd like to share a log of it. Maybe it helps one or two of you.

Thanks again to all helpers I already had!

wangzb265’s picture

I tried the advanced solution and it changes all the URLs so they have no .html extension. How can we keep the original URLs with the .html extension?

jimafisk’s picture

Thanks for this guide! I used HTTrack to deploy a former D7 site to GitHub Pages. Here's a quick video tutorial I made in case it's helpful to someone: https://www.youtube.com/watch?v=SDEUW4UVS8c&list=UUpGmkFt8EgnMAaZ2eJ8mRi...