Problem/Motivation

When using CPS, a site can collect many revisions over time. Most often the revised fields are long text fields, and over time these take up a lot of database disk space. Purging legitimate revisions isn't an option. One option is to export old values to "cold storage", such as the disk, and leave a placeholder for them in the database.

Proposed resolution

Given a revision, replace supported field values with a placeholder token. The original value is written to the private filesystem. When the revision is loaded later on, the cps_archiver module checks whether the field values exist on disk and, if so, replaces the placeholders with them.
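The round trip described above can be sketched as follows. This is a language-agnostic illustration in Python, not the module's code: the `PLACEHOLDER` token, the one-JSON-file-per-revision layout, and the function names `archive_revision` and `load_revision` are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical token; the real placeholder is not given in this issue.
PLACEHOLDER = "[cps-archived]"


def archive_revision(revision: dict, fields: list, root: Path) -> dict:
    """Write the listed field values to disk and return a copy holding placeholders."""
    stored = {name: revision[name] for name in fields if name in revision}
    # Assumed layout: one JSON file per revision, named after the revision id.
    path = root / f"{revision['revision_id']}.json"
    path.write_text(json.dumps(stored), encoding="utf-8")
    archived = dict(revision)
    for name in stored:
        archived[name] = PLACEHOLDER
    return archived


def load_revision(revision: dict, root: Path) -> dict:
    """Swap placeholders back for the on-disk values, if an archive file exists."""
    path = root / f"{revision['revision_id']}.json"
    if not path.exists():
        return revision  # nothing was archived for this revision
    stored = json.loads(path.read_text(encoding="utf-8"))
    restored = dict(revision)
    for name, value in stored.items():
        # Only overwrite fields that still hold the placeholder.
        if restored.get(name) == PLACEHOLDER:
            restored[name] = value
    return restored


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    rev = {"revision_id": 42, "title": "Hello", "body": "a very long text"}
    cold = archive_revision(rev, ["body"], root)
    print(cold["body"])                      # the placeholder token
    print(load_revision(cold, root) == rev)  # the original value comes back
```

Restoring only fields that still contain the placeholder keeps the load step from clobbering a value that was edited after archiving.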

Remaining tasks

  • A warning that disabling the cps_archiver module can lead to data loss unless archived revision field values are restored first
  • A cps_archiver_archived table to track revisions which have been exported
  • Archived field values must be encrypted

Issue fork cps-3212174

Comments

mglaman created an issue. See original summary.

mglaman’s picture

Status: Active » Needs work

Ready for an initial review, but there is still work to be done.

mglaman’s picture

There are a few problems here. Technically, Drupal always loads the latest field revision value, and when a site version is published, that version is also archived. With the current code, both the field_data_body row and the matching row in the field revision table end up holding the placeholder.

douggreen’s picture

If there is any archived data, couldn't we just make the archiver a hard dependency, and thus not have to warn the user?

douggreen’s picture

Why do we need to encrypt the file data? Where does this requirement come from? If the data sits unencrypted in the database, storing it unencrypted on the file system is no less secure. Maybe what we should do is make sure that the directory permissions are 700 and the file permissions are 400, so that only the web server can read it. We might need to use some umask, because it's possible that we'll need 770 and 440 respectively.
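A minimal sketch of that permission scheme, written in Python for illustration rather than as Drupal code. The helper name `write_private` and the `group_readable` switch are hypothetical. It sets the modes with an explicit chmod, which is one way to get exact values regardless of the process umask.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private(directory: Path, name: str, data: bytes,
                  group_readable: bool = False) -> Path:
    """Write data so only the owner (optionally also the group) can read it."""
    # 700/400 by default; 770/440 when the group also needs access.
    dir_mode = 0o770 if group_readable else 0o700
    file_mode = 0o440 if group_readable else 0o400

    directory.mkdir(parents=True, exist_ok=True)
    os.chmod(directory, dir_mode)  # chmod ignores the umask, mkdir does not

    target = directory / name
    # O_EXCL refuses to overwrite; the write still succeeds on a read-only
    # mode because permissions are only checked when the file is opened.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, file_mode)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.chmod(target, file_mode)  # enforce the exact mode after creation
    return target


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp()) / "fields"
    path = write_private(base, "42.bin", b"archived value")
    print(oct(stat.S_IMODE(path.stat().st_mode)))
    print(oct(stat.S_IMODE(base.stat().st_mode)))
```

This assumes a POSIX filesystem; the mode bits do not carry the same meaning on Windows.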

douggreen’s picture

I think that we need a way to restore the archived fields, mainly so that this module can be safely disabled.

douggreen’s picture

I've pushed a commit to this PR that does the following:

* uses a shorter placeholder to take up less space
* removes the cron queue, because archiving already runs inside cron or drush (see #3226803: Add a new drush command that archive's everything that can be archived)
* prevents disabling the module if anything is archived
* adds encryption
* adds a directory hierarchy hashed on the revision id, so that we have at most 1000 files per directory
* stores files in the /fields subdirectory of cps_get_archive_location(), which works best with #3226803: Add a new drush command that archive's everything that can be archived
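The commit's actual hashing scheme is not shown in this thread. One simple way to get at most 1000 files per directory from a revision id is integer division, sketched here in Python under the assumption of one file per revision. The function name `archive_path` and the `fields` base directory are illustrative only.

```python
from pathlib import PurePosixPath

FILES_PER_DIR = 1000  # the cap mentioned in the list above


def archive_path(revision_id: int, base: str = "fields") -> PurePosixPath:
    """Map a revision id to <base>/<bucket>/<revision_id>.

    Ids 0-999 land in bucket 0, 1000-1999 in bucket 1, and so on, so no
    bucket directory ever holds more than FILES_PER_DIR entries (assuming
    one file per revision).
    """
    bucket = revision_id // FILES_PER_DIR
    return PurePosixPath(base) / str(bucket) / str(revision_id)


if __name__ == "__main__":
    print(archive_path(999))     # fields/0/999
    print(archive_path(1000))    # fields/1/1000
    print(archive_path(123456))  # fields/123/123456
```

With this layout the number of bucket directories still grows with the highest revision id; a second level of bucketing would bound that too if it ever became a concern.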

douggreen’s picture

I think we should rename this to cps_field_archiver so that it isn't confused with the cps_entity archiving that already happens as part of cps.module.

douggreen’s picture

Title: CPS Archiver » CPS Field Archiver