Drupalize.Me (who hosts a CC-licensed version of the User Guide) received the following support request/feature request for 11.8. Making a Development Site:

Tutorial Title: 11.8. Making a Development Site
Was this helpful? No
Any additional feedback?
You should add explainaition [sic] on the arguments to add to the command to
compress the db backup using mysql or drush.
Try to be as complete as possible

The current instructions result in a non-compressed .sql file, which works fine for small databases. However, it is often practical to produce a compressed backup.

So, the question is, should we add instructions to this tutorial for generating a compressed database backup?

Comments

Amber Himes Matz created an issue. See original summary.

baluertl’s picture

Just to add an extra viewpoint to the argument supporting the requester, SQL in plain text (in .sql file format) can be quite vulnerable when any its parts get corrupted (for example the end of the file gets chopped off during transmission over poor network conditions).

Compressing such plain-text files supposed to be sent over the network is always a good idea not only because of the smaller file size but also because the built-in error-detecting functions (eg. CRC) modern compressing algorithms usually feature.

Introducing User Guide readers to the more common industry standards is always a better option I'd vote for in support.

jhodgdon’s picture

Sounds like a good idea to me. Patches welcome. :)

baluertl’s picture

Status: Active » Needs review
StatusFileSize
new2.17 KB

Here's a patch as a first round. List of recommended changes:

  1. Fix the mistake of drush sql-dump > BACKUPFILE.sql format of the command (note the > used to print into a file) resulting a zero-length file. This is probably because Drush has its own --result-file argument to specify the file path, name and extension instead.
  2. Compose a sentence suggesting don't forget to flush the caches before dumping the database (as cache tables can significantly increase the output size with no effect when importing back into an other database).
  3. Add options to compress the output files based on Marius Ducea's article. Regarding importing compressed dump file I took Mike Crittenden's words: "Note that --file supports both compressed and uncompressed files". Did not tried. An other possible option would be to pipe together with gunzip command: gunzip < BACKUPFILE.sql.gz | drush sql:cli
  4. Unify the form of how values are being added in -u and -p arguments (use single quote marks around the value and remove separating space between name and value)
  5. Unify the format of DATABASENAME arguments to not separate with an underscore character
  6. Add hint before the filenames that those can be complete paths, not filenames only.
jhodgdon’s picture

Status: Needs review » Active

This looks like a good start!

One thought: if you clear the cache in the admin UI, it will fill back up with a bunch of stuff once the page reloads. So, we should definitely tell people to use Drush to clear the cache, not the UI.

Another thought:

drush sql-dump --gzip --result-file='PATH/TO/BACKUPFILE.sql

Shouldn't the file name have a .gz extension here?

Third thought: We should test these commands and verify that they work.

baluertl’s picture

Assigned: Unassigned » baluertl
eojthebrave’s picture

The drush sql:dump command has a --structure-tables-list option which can be used to provide a list of tables for which you want to include the structure in the dump, but no the data. So you could skip the step about clearing the cache before making the dump.

So I think something like this would work:

drush sql:dump --gzip --structure-tables-list="cache,cache_*" --result-file=BACKUP.sql

baluertl’s picture

Status: Active » Needs review
StatusFileSize
new3.29 KB

@jhodgdon #5

  1. Absolutely valid addition, thanks for pointing out. Following Joe's suggestion above points exactly into this direction.
  2. Now I tried and confirmed that the Drush command automatically suffixes .gz file extension to the end of the file name provided as the string argument.
  3. Definitely, now I tested all of them importing in and exporting out on a local stack (MySQL 5.7.26, PHP 7.3.8, pMA 4.9.0.1, Drupal 8.8.0-dev, Drush 9.7.1).

@eojthebrave #7
Good idea, thanks for calling attention.

Also, regarding #3054031-17: Use long versions of Drush commands instead of abbreviated commands I replaced occurrences to use the colon-separated syntax variants (eg. drush pm:enable) instead of dash-separated (drush pm-enable) ones.

Some more points I have:

  • This User Guide pagereferences to the settings.php file by saying "The lines look something like this (before editing)". However, Drupal has adopted a new coding standard direction in favor of square brackets instead PHP's built-in array() function, so the original default.settings.php file also features this form. Therefore I'd suggest to update in the User Guide as well. My patch contains this change. Grepped in entire source/en/ folder, no more occurrences found.
  • phpMyAdmin, as the name of a software product officially uses (regarding its website) this form of casing: small initial letter and acronym PHP written entirely small. Also updated in my patch. Grepped in entire en/ folder, no more occurrences found.
jhodgdon’s picture

Looking good -- thanks for the new patch! I think a bit of the wording can be cleaned up, but let's get the content exactly right first. My only comment in that regard is about this:

+. As Drupal's internal caching system uses the database as temporary storage,
+dozen of these caching tables are unnecessary to contain data when a dump is
+exported. Therefore it is always a good idea to empty the cache before dumping
+the database. See <<prevent-cache-clear>>.

This still doesn't mention that if you use the UI method to clear the cache, the cache tables will be filled up again when the page reloads. So, you should use Drush to empty the cache to prevent that... although we should look and see if that even works (maybe drush cr would still end up caching some things at the end? not sure.)

I see that the drush command a few lines down does exclude the cache tables' contents from the dump, but not the mysql command.

Thoughts?

baluertl’s picture

Correct, I forgot to reword these sentences in my previous patch, sorry. However, at this moment I don't feel this extra step necessary at all anymore. The topic page this paragraph points at its end covers both UI- and CLI-based ways of flushing cache. So it would be awkwardly difficult to precisely reference only one method exactly.

Joe's idea of using a proper argument with the command is much clearer and easy to follow for the Reader, I think.

So I'd revoke my suggestion of adding this extra step before.

jhodgdon’s picture

OK... So if we don't have them clear the cache at all, then the mysql command shown there will get all of the data in the cache tables. This first one:

  command (substituting in your site's database name, user name, and password):
 +
 ----
-mysqldump -u USERNAME -p'PASSWORD' DATABASENAME > BACKUPFILE.sql
+mysqldump -u'USERNAME' -p'PASSWORD' DATABASENAME | gzip > PATH/TO/BACKUPFILE.sql.gz
 ----
 
   * If you prefer to use Drush, use this command:
 +
 ----
-drush sql:dump --result-file=BACKUPFILE.sql
+drush sql:dump --gzip --structure-tables-list="cache,cache_*" --result-file='PATH/TO/BACKUPFILE.sql'
 ----

I agree that the drush command below will omit the data in the cache tables, but if someone doesn't have drush, their database dump will include the cache, right?

baluertl’s picture

A valid observation, indeed, the Drush-less way outputs a potentially bigger filesize in comparison with the Drushy way. Therefore we could inform the Reader about this difference? Do we want to "promote" the Drush way by describing it as an added advantage in plus?

I'd love to hear what others (Amber, Joe, anyone) think also about which way should we describe these options.

baluertl’s picture

By a short research I found traces promising mysqldump program also takes an argument to skip specified tables:

--ignore-table=db_name.tbl_name
Do not dump the given table, which must be specified using both the database and table names. To ignore multiple tables, use this option multiple times. This option also can be used to ignore views.

The only disadvantage with this way that does not accept regex pattern to match with (see docs page), but all table names must be specified in full length together with database name. So in our use case it would be quite long in the User Guide to list such like this:

mysqldump -u'USERNAME' -p'PASSWORD' DATABASENAME --ignore-table=DATABASENAME.cache_bootstrap, --ignore-table=DATABASENAME.cache_config, --ignore-table=DATABASENAME.cache_container, --ignore-table=DATABASENAME.cache_data, --ignore-table=DATABASENAME.cache_default, --ignore-table=DATABASENAME.cache_discovery, --ignore-table=DATABASENAME.cache_entity, --ignore-table=DATABASENAME.cache_library, --ignore-table=DATABASENAME.cache_menu, --ignore-table=DATABASENAME.cache_rest, --ignore-table=DATABASENAME.cache_toolbar | gzip > PATH/TO/BACKUPFILE.sql.gz

jhodgdon’s picture

Hm.... We don't want to totally ignore the cache tables, though -- we need the table structure, just not the contents.

So, what if we did this:
- Explain briefly about the cache tables taking up a lot of space in the database dumps
- Present the Drush command first
- Use 2 commands for the non-drush command -- first truncate the cache* tables using a query, and then do the dump.

eojthebrave’s picture

I like the suggestion in #14. I think probably the most important part of this issues is letting people know that there are some tables, cache_*, sessions, etc. that you don't necessarily need to copy the data from. Though it doesn't do any real harm if you do. The benefit is reduced filesize, and dump/import times if those tables are large.

Use 2 commands for the non-drush command -- first truncate the cache* tables using a query, and then do the dump.

Personally I think that mentioning that these are not required is probably enough. And figuring out how they want to do this with SQL is something they can figure out on their own if they want to. It's nice that the drush command has a built in way to do it, but it is by no means required. Maybe in the explanation section we can just mention, "The data from these tables is not required, and they can optionally all be safely truncated before making a database dump. Drush has some handy features to make this easier."

jhodgdon’s picture

#15 sounds good to me. That's a good point that truncating the tables is not required, it only reduces the file size.

eojthebrave’s picture

Status: Needs review » Needs work

Setting this to needs work since there's still some changes we want to make based on the last couple of comments.

jhodgdon’s picture

Weird, this issue still was showing up as Needs Review in the issue list. Saving again to see if that helps.

batigolix’s picture

Assigned: baluertl » batigolix
batigolix’s picture

This patch addresses the feedback from #14 - #17:

- added explanation about cache tables
- put drush as first example and mysql as second

avpaderno’s picture

Status: Needs work » Needs review
jhodgdon’s picture

Status: Needs review » Needs work

Thanks for the patch! Much improved... I'm not sure it is exactly right yet though... Some thoughts, questions, and nitipcks:

a)

+. As Drupal's internal caching system uses the database as temporary storage,
+dozen of these caching tables are unnecessary to contain data when a dump is

The wording seems a bit off, confusing, or something here? Maybe:

As Drupal's internal caching system uses the database as temporary storage, the data in the caching tables is not needed when...

b)

+size. Therefore it is always a good idea to empty the cache before dumping the
+database. See <<prevent-cache-clear>>. The cache tables can be safely
+truncated before dumping the database. The Drush command sql:dump provides the
+--structure-tables-list option to exclude the data of these tables in the
+export.

I don't think that if you clear the cache using the UI method on the prevent-cache-clear topic, that your cache tables will end up empty. The reason is that after clearing out the data (which does empty the tables), you then are presented with the Performance page again. As soon as you or anyone else visits a page, a bunch of data is cached and the tables are not empty.

So I think we should take you the "See prevent-cache-clear" sentence -- I don't think that is an effective way to truncate the cache tables.

jhodgdon’s picture

Status: Needs work » Needs review
StatusFileSize
new5.28 KB
new4.57 KB

I thought about this some more, and decided the easiest way to suggest changes would be to make a new patch, so here it is. Thoughts?

One note: I noticed several of the command lines were impossible to read in the PDF, so I split them into two lines.

batigolix’s picture

Assigned: batigolix » Unassigned
batigolix’s picture

Status: Needs review » Reviewed & tested by the community

This seems fine to me.

  • eojthebrave committed 348f2fc on 8.x-8.x authored by jhodgdon
    Issue #3082471 by Balu Ertl, jhodgdon, batigolix: Add instructions for...
eojthebrave’s picture

Version: 8.x-7.x-dev » 8.x-8.x-dev
Status: Reviewed & tested by the community » Fixed
Issue tags: +Translation updates needed

I've gone ahead and committed the patch from #23. Thanks everyone for helping to sort this all out. I think this makes for a great update to the documentation. I'll post a message to the translators mailing list about this update.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.