Problem/Motivation

When a field config entity is deleted it deletes the config file and moves the configuration into the state field.field.deleted - see Field::preDelete(). This is also true for field instances - state field.instance.deleted and FieldInstance::preDelete(). This actually breaks dependency management since the data in those fields actually depends on the configuration entity existing. We then do all sorts of hacks in order to load the deleted field and purge the items - see field_purge_batch(). At the moment field/field instance data purging occurs during cron.

This causes massive issues for #2080823: Create API to discover config entities' soft dependencies and use this to present a confirm form on module uninstall as we need to be able to clean up fields sanely on module uninstall.

Proposed resolution

  • Enforce data removal before deleting field instances.
  • Ensure all field instances can be deleted before deleting fields.

Remaining tasks

  • Agree approach
  • Write patch

User interface changes

Yes. Deleting a field instance should start a batch that deletes all the data.

API changes

Yes.

Postponed until

#1808248: Add a separate module install/uninstall step to the config import process

Comments

alexpott’s picture

alexpott’s picture

Issue summary: View changes
swentel’s picture

Issue tags: +Field API

Tagging with field api as well. Should be 'relatively' easy to code.

amateescu’s picture

The proposal in the OP of doing a batch with something like

$entities = entity_load_multiple($entity_type, $ids_chunk);
foreach ($entities as $entity) {
  $entity->field_name->setValue(array());
  $entity->save();
}

is easy enough, but doesn't cover deleting a field instance in code.. and that's the hard part :)

xjm’s picture

Three thoughts:

  1. Whatever batching shouldn't be field-specific. #89181: Use queue API for node and comment, user, node multiple deletes

  2. Deletion expectations. If I uninstall a module containing a plugin that provides API that's required for a field to work, the data needs to go away, yeah. But if I uninstall something that provides a widget or formatter, the instances using them should stick around. Analogous to the optional handler stuff we've added to Views.

  3. It is valuable to be able to flag stuff for deletion and let it get purged later. Being able to batch is most important, and that's what we should treat as a requirement, but anyone who's had to sit through an hours-long node access rebuild batch knows why we have node_acces_needs_rebuild().

yched’s picture

Ouch. Long standing debate... We have rehashed this multiple times over the D7 cycle, and I'm sorry to say that I still see no better alternative :-/

Yes, we can wire a batch processing on UI field instance deletion.
- As @xjm points, waiting X minutes for the batch to proceed before being able to do anything else (e.g. deleting the next field ? sigh...) is extra tedious.
- This won't catch other UI actions that lead to instance deletion too, e.g deletion of a bundle (deletion of a node type, of a vocabulary... deletes N field instances). There's no generic way to catch those UI actions atm.

More importantly, same old issue - there's no good way to trigger batch processing on a direct $instance->delete() / $field->delete() API call : drush commands, hook_update_N, features, whatever other custom commissioning tools are used to automate config deployment...

catch’s picture

This comes back to #1797526: Rewrite batch api to be more....sane again.

On the waiting around problem, I don't think that's inherent to batching were the above issue in place. It's not impossible to start multiple batches, then show progress in a block as each runs down if they were marked as non-blocking.

xjm’s picture

xjm’s picture

Issue tags: +Entity Field API
xjm’s picture

Issue tags: -beta blocker

Discussed with @alexpott. While this is important to do config dependency management right for fields in #2080823: Create API to discover config entities' soft dependencies and use this to present a confirm form on module uninstall, we can move forward with that issue by leaving field_system_info_alter() as it is for now. This issue itself would not change the data model or public API for CMI, so if it doesn't blocк #2080823: Create API to discover config entities' soft dependencies and use this to present a confirm form on module uninstall, it's no longer beta-blocking.

xjm’s picture

More importantly, same old issue - there's no good way to trigger batch processing on a direct $instance->delete() / $field->delete() API call

This is why node access has the "needs rebuild" flag, too.

So I think there are three options:

  1. Do #10, and leave it that way. This seems dangerous to me; I think it would come back to bite us. After all, the same problem exists for other config entity types; just fields are (as usual) the most in our faces.
  2. Do #10, at first, but then expand the dependency management stuff to include an API for generic deleted stuff, essentially Deleted Fields: The Next Generation, and remove the field-specific D7 code handling it.
  3. Do #10, bump #1797526: Rewrite batch api to be more....sane to critical, and postpone this issue on non-blocking batches that can run in response to API calls.

From the summary:

This actually breaks dependency management since the data in those fields actually depends on the configuration entity existing

It would be helpful to outline exactly how it breaks dependency management to decide what to do here when we discuss this issue later today.

catch’s picture

I don't think #1 is an option, Fields as you say are the primary use case for that API.

I'd be OK with either #2 or #3 though.

andypost’s picture

Suppose better implement #3 because contrib could attach own clean-ups when field is deleted.
At least when comment field is deleted a lot of records needs to be deleted from {comment} table with all fields/failes attached for each comment

andypost’s picture

yched’s picture

Woah, right.

- Deleting a comment field means deleting a bundle of the 'comment' entity type, which in turns means deleting a bunch of other field instances...
That should still be fine if, as per today's hangout, we keep a notion of "deleted field pending batched purge". Just means a field delete cascades to more field deletes that queue up.

- Deleting a 'comment' field instance triggers the deletion of all the comment entities it held (well, referenced).
That is already flawed without bringing config sync in the mix IMO : comment_field_instance_delete() fetches a list of *all* impacted comment ids (could be thousands), and runs entity_delete_multiple() on it, which will first try to load all the corresponding comment entities. Easy to explode memory with that.

effulgentsia’s picture

Given today's call, should this issue be closed as "works as designed"?

xjm’s picture

Title: Remove deleted fields concept » Make deleted fields work with config synch

Well, we still need to make it work with synchronization, and we do want to eventually make this and #89181: Use queue API for node and comment, user, node multiple deletes cleaner. So rescoping the issue.

Here are the notes from the call, for those who weren't able to attend:
https://docs.google.com/a/acquia.com/document/d/17eXlfDuNwhHUMvQdUZZVU1X...

xjm’s picture

Issue tags: +beta blocker

Also, I think the re-scoped version is beta-blocking.

alexpott’s picture

I agree this is beta blocking abut I do not think this is blocked anything. It is obviously related to #1808248: Add a separate module install/uninstall step to the config import process since it is module uninstallation that causes the issue but the importer is already capable to doing field deletes so I think this is blocked on that.

beejeebus’s picture

can we check if a disabled module will require a field delete/purge?

in that case, we can say 'hey, please delete this field data first, then retry?'.

xjm’s picture

@alexpott:

I agree this is beta blocking abut I do not think this is blocked anything.

I think this is blocked on that.

Er. Which? :)

alexpott’s picture

Well if we go for @beejeebus's solution in #20 then this is definitely blocked on #1808248: Add a separate module install/uninstall step to the config import process

Thinking about this some more.... in order for this to be a problem we need to be able to uninstall modules as part on the config sync - atm we can't so writing a test for this is impossible. Therefore I'm postponing this on #1808248: Add a separate module install/uninstall step to the config import process

jessebeach’s picture

Title: Make deleted fields work with config synch » [PP-1] Make deleted fields work with config synch
Issue summary: View changes
jessebeach’s picture

Title: [PP-1] Make deleted fields work with config synch » Make deleted fields work with config synch

Unpostponed!

jessebeach’s picture

Status: Postponed » Active
xjm’s picture

Issue tags: +Configuration system
alexpott’s picture

Status: Active » Needs review
FileSize
9.28 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 67,586 pass(es). View

Patch attached implements solution based on @beejeebus's idea in #20.

Created #2236553: Config import validation needs to be able to list multiple issues and the messages should be translatable to address issue with working with the validate event found whilst working on this patch.

yched’s picture

+++ b/core/modules/field/lib/Drupal/field/FieldConfigSubscriber.php
@@ -0,0 +1,78 @@
+      if (strpos($config_name, 'field.field.') === 0) {
+        // A field instance is being deleted. We have to ensure that the module

field.field.* = field, not field instance ?
Other than that, patch looks fine for the approach.

On the approach itself - this means I can do stuff on my dev site, and then be told that the resulting config is undeployable to prod ? What do I do then ?

xjm’s picture

Status: Needs review » Needs work

Per @alexpott.

xjm’s picture

Status: Needs work » Postponed
xjm’s picture

Title: Make deleted fields work with config synch » [PP=1] Make deleted fields work with config synch
xjm’s picture

Title: [PP=1] Make deleted fields work with config synch » [PP-1] Make deleted fields work with config synch
xjm’s picture

Title: [PP-1] Make deleted fields work with config synch » Make deleted fields work with config synch
Status: Postponed » Needs work

Unpostponed!

alexpott’s picture

Status: Needs work » Needs review
FileSize
14.16 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 67,804 pass(es). View

New approach that purges field data before the configuration. The field module adds an extra step to configuration synchronisation IF a module is being uninstalled AND it provides a field type AND EITHER fields of its type have already been deleted OR are being deleted as part of the synchronisation.

alexpott’s picture

FileSize
11.22 KB
18.59 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 67,889 pass(es), 0 fail(s), and 4 exception(s). View

Improved the calculation of how many steps the field purging will take and added a warning about deleting fields to the config sync screen. Now we need to add a UI test for this. Also I don't think using dsm() is a good idea since this means the message will unnecessarily appear at the end of the batch processing.

Status: Needs review » Needs work

The last submitted patch, 35: 2198429.35.patch, failed testing.

alexpott’s picture

Status: Needs work » Needs review
FileSize
527 bytes
18.6 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 67,976 pass(es). View

Fixing patch.

swentel’s picture

Wow, this is cool so far. Didn't do an in-depth review yet because one thing suddenly popped up again which I completely forgot for a while:

+++ b/core/modules/field/lib/Drupal/field/Entity/FieldConfig.php
@@ -660,32 +660,57 @@ public static function getReservedColumns() {
   public function hasData() {
-    if ($this->getBundles()) {
+    return (bool) $this->numberOfRows(TRUE);
+  }
+

numberOfRows() (and hasData() before too) only works if the entity type has a base table. So that would mean that field purging won't work at all in case the entity type has a different storage backend. Not the fault of this patch though.

So the question is, shouldn't this method live somewhere on ContentEntityStorage ? ContentEntityDatabaseStorage can then implement the method for entity query. If then mongo comes around, it can also signal if entities still have data and then use purgeFieldItems() later on to update the entities to remove the field entries. This isn't something FieldConfig should know about, at all IMO.

Note, could be a follow up though maybe, because I'm not sure exactly how for instance Mongo works in detail, but I can imagine the stored entities should be cleaned up too no ?

yched’s picture

  1. +++ b/core/modules/field/field.module
    @@ -362,3 +363,31 @@ function field_hook_info() {
    +    array_unshift($sync_steps, array('\Drupal\field\FieldConfigImporter', 'processDeletedFields'));
    

    processDeletedFields() is added at the beginning of the steps ?
    Oh, that's because processDeletedFields() will take care of actually deleting the field, in advance of the actual sync process, right ? Might be worth a comment ?

    Also, this means field deletion gets done first thing, bypassing whatever other dependency / ordering mechanism might exist.
    I thought the initial plan was to let the sync proceed, *then* do the purges, *then* uninstall modules ?
    Inserting a sync step between two other steps might require a weighting convention though.

  2. +++ b/core/modules/field/field.module
    @@ -362,3 +363,31 @@ function field_hook_info() {
    +  if (isset($form_state['storage_comparer'])) {
    +    $fields = \Drupal\field\FieldConfigImporter::getFieldsToPurge($form_state['storage_comparer']);
    +    if (count($fields)) {
    +      // @todo improve message and don't use dsm.
    +      drupal_set_message(t('Field data will be deleted by this synchronisation.'), 'warning');
    +    }
    +  }
    

    Not too familiar with that form, and not sure when we want that message to show, but it looks like the $form would need to cater for a "messages" area ?

    Also - sounds surprising that $form_state['storage_comparer'] might not be set ?

  3. +++ b/core/modules/field/lib/Drupal/field/FieldConfigImporter.php
    @@ -0,0 +1,129 @@
    +class FieldConfigImporter {
    

    Nitpick, but name is misleading, the class contains some steps for the sync process, but it is not in charge of "importing the field config" as a whole.

  4. +++ b/core/modules/field/lib/Drupal/field/FieldConfigImporter.php
    @@ -0,0 +1,129 @@
    +  public static function getFieldsToPurge(StorageComparerInterface $storage_comparer, array $deletes = NULL) {
    +    $fields_to_delete = array();
    +    if (!$deletes) {
    +      $deletes = $storage_comparer->getChangelist('delete');
    +    }
    +    $staged_extensions = $storage_comparer->getSourceStorage()->read('core.extension');
    

    The params of getFieldsToPurge() feel a little confusing. The method receives a storage comparer, which holds a list of fields to be deleted, but can also be passed a different list of fields that will then be used instead.
    So the params neeeded for the business logic of the method (a list of modules and a list of fields) are sometimes semi-injected and semi-pulled from the $storage_comparer, and sometimes fully pulled.

    It might be clearer if the method simply received ($modules, $fields) fully injected, and its up to the caller to determine where they come from ? (well, maybe a list of $deletes rather than a list of $fields since figuring out actual fields records is a bit tedious to be left to callers)

  5. +++ b/core/modules/field/lib/Drupal/field/FieldConfigImporter.php
    @@ -0,0 +1,129 @@
    +    // Gather fields that will be deleted during configuration synchronization
    +    // where the module that provides the field type is also being uninstalled.
    

    Not sure if this makes a real difference but strictly speaking it's "where the module is not present in the final set of modules".
    I mean, the module could be already gone for some reason ?
    Well, then we try to purge a field whose module is absent, not sure what happens these days TBH :-/

  6. +++ b/core/modules/field/lib/Drupal/field/FieldConfigImporter.php
    @@ -0,0 +1,129 @@
    +    foreach ($deletes as $config_name) {
    +      if (strpos($config_name, 'field.field.') === 0) {
    +        $id = ConfigEntityStorage::getIDFromConfigName($config_name, 'field.field');
    +        /** @var \Drupal\field\Entity\FieldConfig $field */
    +        $field = entity_load('field_config', $id);
    +        if ($field && !isset($staged_extensions['module'][$field->module])) {
    +          $fields_to_delete[] = $field;
    +        }
    +      }
    +    }
    

    Wondering if this could be done with an EFQ. They are optimized if you can narrow a list of IDs, which is the case here.
    Hmm - can EFQs do "where module NOT IN array()" ?

  7. +++ b/core/modules/field/lib/Drupal/field/FieldConfigImporter.php
    @@ -0,0 +1,129 @@
    +  public static function processDeletedFields(&$context, ConfigImporter $config_importer) {
    

    So this does a series of "delete a field, then run field_purge_batch(N), rinse and repeat"
    But field_purge_batch(N) only deletes N records, and it's not even guaranteed to be records for the field you just deleted.

    So the code gives a false impression of sequencial ordering ("I'll deal with field A, and when I'm done I'll move to field B"), and has do do complex computation in order to figure how many times field_purge_batch(N) needs to run, including the shaky "numberOfRows()".
    + those computations are not guaranteed to be correct anyway since field_purge_batch() might start by catching unrelated fields already pending purge, and then leave off data for the fields we're actually deleting here.

    --> We probably need to add a way to optionally hint field_purge_batch() at an explicit field - purge *this* field, and no other.

    A more natural way to setup the batch operations would then be:
    - one op to delete a field
    - one (multistep) op to do as many passes of field_purge_batch() as needed on that field - hmm, still requires some notion of numberOfRows() if we want to be able to report progress feedback on the batch UI :-/
    - one op to delete the next field
    - etc...
    Which is 2x(# of deleted fields) ops - that's not really an issue Batch API wise, but the way $sync_steps works, each step only gets one multipass op, right ? I guess the above logic can still be smushed into one single multistep op, but that might be more complex to write and maintain.

  8. +++ b/core/modules/field/lib/Drupal/field/Entity/FieldConfig.php
    @@ -660,32 +660,57 @@ public static function getReservedColumns() {
    +  public function numberOfRows($has_data = FALSE) {
    

    Yeah, numberOfRows() is tricky... As swentel says, I'm not sure a Mongo storage would be able to compute that information.

    "get the number of entities that have records for this field" might be more doable - that would mean changing field_purge_batch(N) to mean "purge N entities" rather than N field values. Makes the duration of one call a bit more unpredictable, but given the default value of N (10), I think we'd still be safe regarding timeouts.

    Mongo / EFQ : I guess we need @chx on that one...

yched’s picture

So, I asked @chx on IRC :
is there any way a Mongo entity storage could answer either :
1) how many records exist (including multiple field values) for a given field across entities
2) how many entities have actual records for that field

He said:
2) is trivial
1) would be doable in mongodb 2.6, which he intends to require

So it seems we're safe staying with field_purge_batch(N) = "purge N values".

What's highly unsure is whether such a query can be written in FieldConfig::numberOfRows() in a storage-agnostic way using smthg like entityQueryAggrate, or whether this will require a dedicated method on FieldableEntityStorageInterface. That latter seems more likely.

sun’s picture

re: #40:

mmm, in case our intention is to officially support the field storage to be swappable with a key/value store (or document store? - it's actually not clear what we intend to support...), then we really should not rely on implementation-specific answers to questions like that, but instead we should have an actual, abstract, canonical implementation in core.

cf. #2208617: Add key value entity storage (which seemingly presents the ~50% "outer" dependency for doing that?)

alexpott’s picture

FileSize
18.82 KB
25.56 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 68,056 pass(es). View

Thanks for the reviews and thoughts @yched and @swentel

From the review in #39

  1. Fixed
  2. Look at ConfigSync - it is possible for storageComparer to not be defined. Adding code to only show the message at the right time.
  3. Changed name to ConfigImporterFieldPurger
  4. Fixed to pass extensions and deletes to getFieldsToPurge() - nice idea
  5. The module could not be already gone - if it was then the field would be too - see field_system_info_alter() :)
  6. Switched to use EFQ
  7. Fixed field_purge_batch to (optionally) only handle one field - since we can't pass in arguments to sync steps because of the dance with the ConfigSync form and the ConfigImporter I see no option but to smush.
  8. If numberOfRows is tricky isn't hasData() tricky too?

Patch adds a test of doing all of this through the UI.

yched’s picture

If numberOfRows is tricky isn't hasData() tricky too?

hasData() is way simpler currently, we just need a 'yes/no', not a count - it's just a basic EFQ with a condition group, nothing too problematic for alternate storages.
(well, aside from the getBaseTable() check...)

Also, I don't think hasData() is ever called on a deleted field currently - and I doubt it would work as is, since it just queries using the field name ($this->name).

Couldn't review the interdiff in depth but :

+++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
@@ -0,0 +1,139 @@
+   * @param array $deletes
+   *   The configuration that delete will be deleted by the configuration
+   *   synchronisation.

Yo dawg, I heard you like deletes...
(sorry, couldn't resist)

yched’s picture

@sun : well, we have to assume minimal stuff...
Any SQL-based storage would very likely know how to answer "how many records exist for this field ?"
And if a Mongo storage can too, then I'd be fine with considering document storages are covered.

The current conclusion is already that numberOfRows() would need to be supported by a specific ad-hoc method implementation in each storage. Worst case: if a storage doesn't support numberOfRows(), then we just won't be able to provide accurate progress on batch UI, the progress bar will just stick to the "purge started" state and then at some point jump to the "purge finished" state with no visible progress meanwhile. Tough :-)

Also, I kind of doubt #2208617: Add key value entity storage is intended to be a full-fledged, EFQ-enabled, fieldable entity storage ? At any rate, not in the current RTBC patch (KeyValueEntityStorage doesn't implement FieldableEntityStorageInterface)

alexpott’s picture

Status: Needs review » Postponed

I must admit I'm still confused about the need to move the numberOfRows functionality to a storage controller. All the current implementation does is borrow from hasData() and field_purge_batch() so if this does not work then neither does HEAD currently. Everything is using EFQ. Anyhow I've extract this part of the change to another issue so we can add specific tests and make easy to run these tests on other storage engines.

Postponing on #2244885: Add method to count number of entities with a particular field

yched’s picture

Reading #2244885: Add method to count number of entities with a particular field made it clearer that numberOfRows() was actually counting entities :-). This did fool me, I thought we were counting rows (i.e # of field values including multiples), and I was skeptical about Mongo's ability to do that.

If we are just counting entities, then yes, no doubt a single EFQ can handle that across storage types.
And right, field_purge_batch() does operate entity by entity, so field_purge_batch(N) means "purge N entities", and what we need to count is the total # of entities, not the number of values.

Sorry for adding confusion here - I guess that's what hibernation does to you :-/

The remaining issue is then the getBaseTable() check, but that's already the case in HEAD.

yched’s picture

  1. +++ b/core/modules/field/field.purge.inc
    @@ -68,11 +68,18 @@
    -function field_purge_batch($batch_size) {
    +function field_purge_batch($batch_size, $field_uuid = NULL) {
       // Retrieve all deleted field instances. We cannot use field_info_instances()
       // because that function does not return deleted instances.
    -  $instances = entity_load_multiple_by_properties('field_instance_config', array('deleted' => TRUE, 'include_deleted' => TRUE));
    +  if ($field_uuid) {
    +    $instances = entity_load_multiple_by_properties('field_instance_config', array('deleted' => TRUE, 'include_deleted' => TRUE, 'field_uuid' => $field_uuid));
    +  }
    +  else {
    +    $instances = entity_load_multiple_by_properties('field_instance_config', array('deleted' => TRUE, 'include_deleted' => TRUE));
    +  }
    

    The second half of the function, that deals with the $fields, also needs to account for the $field_uuid hint ?

  2. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +        $how_many_steps = (int) ($row_count / $context['sandbox']['field']['purge_batch_size']) + 1;
    

    Nitpick :
    ceil(x / y)
    would be clearer than
    (int) (x / y) + 1
    ?
    [edit: and also more accurate ? for $row_count = 20 and $batch_size = 10, we want 2 steps, not 3 ?]

    Also, actually, that number is the worst case scenario, in good cases less calls will be needed.
    Example: field has 2 instances, and numberOfRows() counts 12 entities.
    - If that's 6 entities per instance, one single call to field_purge_batch(10) will purge all
    - If that's 1 entity on the 1st instance and 11 entities on the 2nd, then two calls are needed.

    This cannot really be determined based on the mere # of overall entities, and it's no biggie, the batch will just do extraneous calls to field_purge_batch(), which will basically be no-ops.

    It just means field_purge_batch(N, $field_uuid) needs to be bullet-proofed against $field_uuid possibly being already gone.

alexpott’s picture

Status: Postponed » Needs review
FileSize
4.39 KB
23.92 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 68,203 pass(es). View

#2244885: Add method to count number of entities with a particular field is in.

  1. Fixed
  2. Fixed - nice catch. Also I fixed field_purge_batch() to only purge the number of entities in $batch_size. I think the deleting more records if there are multiple instances is a bug - since we then make a nonsense of the param docs :
     * @param $batch_size
     *   The maximum number of field data records to purge before returning.
    

    Also afaics field_purge_batch(N, $field_uuid) is already bullet-proofed against $field_uuid possibly being already gone.

yched’s picture

+++ b/core/modules/field/field.purge.inc
@@ -104,6 +111,11 @@ function field_purge_batch($batch_size) {
+        $batch_size--;
+      }
+      // Only delete up to the maximum number of records.
+      if ($batch_size == 0) {
+        break;
       }

Not sure about that - it does make the function behave more like you'd expect from the phpdoc, but it also means each call purges a lot less data.
Make no difference for our case of config sync here, but cron purges will take many more cron runs.

I'd think we should keep the current behavior and adjust the phpdoc instead :-) (the inconstency exists in HEAD, so doesn't have to be done in this patch)

afaics field_purge_batch(N, $field_uuid) is already bullet-proofed against $field_uuid possibly being already gone

Yup - that was depending on how the second half of field_purge_batch() would handle the $field_uuid hint, but it's fine that way.

IMO would be worth adding a comment to ConfigImporterFieldPurger::calculateNumberOfBatchSteps() clarifying that the code will likely do needless calls to field_purge_batch() but that it's OK. Can save potential head scratching later on.

alexpott’s picture

@yched yes but - then the $batch_size really is controlling nothing. If you have a field with 100 instances this could break your cron.

yched’s picture

Yeah I guess you're right.

Then I think we need to bump purge_batch_size's default value to compensate. Something like 50 should be safe IMO.

alexpott’s picture

FileSize
201 bytes
24.23 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 68,186 pass(es). View

Agreed.

yched’s picture

OK, final nitpicky review :-)

  1. +++ b/core/modules/field/field.module
    @@ -362,3 +363,42 @@ function field_hook_info() {
    +function field_config_import_steps_alter(&$sync_steps, ConfigImporter $config_importer) {
    ...
    +  if (count($fields)) {
    

    we usually just do if ($array) rather than if (count($array))

    (same in field_form_config_admin_import_form_alter())

  2. +++ b/core/modules/field/field.module
    @@ -362,3 +363,42 @@ function field_hook_info() {
    +      // @todo improve message and don't use dsm.
    +      drupal_set_message(t('Field data will be deleted by this synchronisation.'), 'warning');
    

    Still a @todo ?

    (at any rate, the trailing ':' currently hangs in the air)
    [edit: scratch that, either I was on crack or that was a dirt spot on my screen, there's no ':' here...]

  3. +++ b/core/modules/field/field.purge.inc
    @@ -116,6 +128,10 @@ function field_purge_batch($batch_size) {
    +      // If a specific UUID provided only operate on that UUID.
    

    Not a real sentence :-)

  4. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +  /**
    +   * Processes deleted fields.
    ...
    +  public static function processDeletedFields(&$context, ConfigImporter $config_importer) {
    

    Slightly confusing - the method doesn't process (already) deleted fields, it deletes (and purges) them :-)
    --> maybe processFieldDeletion() : "Processes fields targeted for deletion as part of a config sync.\n\nThis takes care of deleting the field and purging the data on the fly." ?

    + missing "array" typehint for $context (& same for next method)

  5. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +    $fields = static::getFieldsToPurge($context['sandbox']['field']['extensions'], $config_importer->getUnprocessedConfiguration('delete'));
    ...
    +    $fields_to_delete_count = count(static::getFieldsToPurge($context['sandbox']['field']['extensions'], $config_importer->getUnprocessedConfiguration('delete')));
    

    Is there a way we could avoid calling getFieldsToPurge() both before and after ?

    (more on that below)

  6. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +   * Calculates the number of steps necessary to purge all the field data.
    ...
    +  protected static function calculateNumberOfBatchSteps(&$context, ConfigImporter $config_importer) {
    

    Aside from just calculating the number of steps, this does other stuff to initialize the $context['sandbox'] for the multipass logic.

    --> init() ? initSandbox() ?

  7. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +    // @todo should we use the staged batch size or the currently active?
    

    Good question :-) Should config about to be deployed affect the deploy itself ?

    No real opinion in this specific case, but more generally this sounds like a slippery slope ? If you haven't been deployed yet, well then you haven't been deployed yet...

  8. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +    // There will be one step to delete the instances and field.
    +    $context['sandbox']['field']['steps_to_delete'] = count($fields);
    

    More precisely, it's "Each field needs one last field_purge_batch_call() to remove the last instance and the field itself.".
    (sorry for being painful, but I just re-scratched my head to re-check whether this was OK...)

    Also, since it's about the end of the process, it might be slightly easier to understand if this was added at the end :-)

  9. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +   * Gets the list of fields to purge before configuration synchronisation.
    ...
    +  public static function getFieldsToPurge(array $extensions, array $deletes) {
    

    It's actually "a list of fields to delete and purge" (besides, the var name used internally is $fields_to_delete)

    --> getFieldsToDelete() ?

    Also : this gets called over and over during the batch, permanently re-doing the queries just to get the next field to process. Could the list be stored in $context at the 1st pass, and then simply unstacked as we progress ?

  10. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +   * data must be purged before the module is uninstalled. Also, if deleted
    +   * fields exist whose field types are provided by modules that are being
    +   * uninstalled their data need to be purged too.
    ...
    +    foreach($fields as $field) {
    

    missing space after foreach

  11. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +   * @param array $deletes
    +   *   The configuration that delete will be deleted by the configuration
    +   *   synchronisation.
    

    see #43 :-p

  12. +++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
    @@ -0,0 +1,139 @@
    +  }
    +}
    

    Missing empty line after last method

  13. +++ b/core/modules/field/lib/Drupal/field/Tests/FieldImportDeleteUninstallTest.php
    @@ -0,0 +1,181 @@
    +      'description' => 'Delete field and instances during config synchronsiaton and uninstall module that provides the field type.',
    

    typo: synchronsiation

    (same in the other test class)

  14. +++ b/core/modules/field/lib/Drupal/field/Tests/FieldImportDeleteUninstallTest.php
    @@ -0,0 +1,181 @@
    +   * Tests purging already fields and instances as part of config import.
    

    some words are missing

  15. +++ b/core/modules/field/lib/Drupal/field/Tests/FieldImportDeleteUninstallTest.php
    @@ -0,0 +1,181 @@
    +  }
    +}
    

    Missing empty line after last method

yched’s picture

Also, wondering one final thing :-/ :
ConfigImporterFieldPurger deletes fields (and thus their instances) in advance of the "regular" config sync step, but they're still on the list of "things to process". So what happens when the "regular" sync step tries to deal with those deletions later on ? They're just silently skipped, like "dunno what you're talking about, those items don't exist anyway" ?

swentel’s picture

@yched - I guess the regular process won't care anymore because a new comparison between active and staging will not see deletions anymore, no ?

- edit - I think I talked about that during dev days with alex, when this pre step is done, the process list needs to be updated or simply recalculated, then there's is no problem anymore. Not sure if that is happening already.

alexpott’s picture

FileSize
11.71 KB
24.35 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 68,220 pass(es). View

Thanks @yched!

  1. Fixed in both places
  2. Fixed by removing @todo
  3. Fixed
  4. Fixed - changed to "process" since Drupal\field\ConfigImporterFieldPurger::process() is descriptive enough
  5. Don't think so
  6. Fixed
  7. Fixed
  8. Well every field is purged and only some are deleted that's why I went for getFieldsToPurge() / how would we know when to unstack?
  9. Fixed
  10. Fixed
  11. Fixed
  12. Fixed
  13. Fixed
  14. Fixed
  15. Fixed

@yched, @swentel - At the beginning of the processConfiguration step in the ConfigImporter the changelist is recalculated so we are good.

xjm’s picture

+++ b/core/modules/field/lib/Drupal/field/ConfigImporterFieldPurger.php
@@ -81,11 +84,15 @@ protected static function calculateNumberOfBatchSteps(&$context, ConfigImporter
-   * Gets the list of fields to purge before configuration synchronisation.
+   * Gets the list of fields to purge before configuration synchronization.

So proud! ;)

yched’s picture

LOL @ #57. All those efforts payed, @xjm :-)

yched’s picture

Status: Needs review » Reviewed & tested by the community

@alexpott: thanks !

About avoiding getFieldsToPurge() being recomputed over and over - I *think* it would be doable if :
- instead of returning an array of FieldConfig entities, getFieldsToPurge() returned a lighter list of info allowing just later loading of each of the fields. That light list would be stored in the $context (no need then to store & serialize/unserialize the list of extensions), and process() would then pop the first one, load it, process it, unstack when done.
- field_purge_batch(N, $field_uuid) returned TRUE when it reached the stage where it has actually removed the $field_uuid field. Then we know when to unstack.

"a lighter list of info allowing just later loading of each of the field" would probably require a bit of thinking, since efficiently loading a field depends on whether it's already deleted or not...

So i guess optimizing this could be left to a followup if we want to move forward sooner on the critical here :-) If so, RTBC is fine by me.

At any rate, found some optimization for the current shape of the code : #2247095: Optimize loading of deleted fields

yched’s picture

Oh, sorry - actually we probably need to work on the phrasing of the UI message a bit more.
"Field data will be deleted by this synchronisation" could be read as "*all* data for *all* existing fields is going to be deleted". Would sound like an improbable thing to do, but would probably make people a bit nervous anyway :-)

We're only deleting / purging some fields that are either :
- already deleted on the current target site
- targetted for deletion as part of the sync process we're about to launch

In both cases, a site admin already took an explicit decision at some earlier point to delete the field and say goodbye to the data. We currently don't warn when deploying a field delete, which de facto is enough to make the data inaccessible and as good as gone. Here we "just" add on-the-fly purge to the process.

So IMO it's not that much about warning about data deletion, but rather warning that the sync process will take a (possibly large) bit longer than what you'd expect because of some behind-the-scenes housekeeping.

xjm’s picture

Status: Reviewed & tested by the community » Needs review
Related issues: +#2247095: Optimize loading of deleted fields

Sounds like #59 needs an additional followup other than #2247095: Optimize loading of deleted fields?

+++ b/core/modules/field/field.module
@@ -362,3 +363,41 @@ function field_hook_info() {
+      drupal_set_message(t('Field data will be deleted by this synchronization.'), 'warning');

Well, I think it's more than just behind-the-scenes cleanup, because they are in fact deleting maybe a lot of data. Presumably the user has already at this point seen the list of deletions. What we want to tell them is that deleting these specific fields also deletes their data. $fields already contains this information I assume?

A dsm() warning is potentially bad UX, though. People see the colored box and think they broke something. And we're not giving them a choice to back out of it if they don't like the colored box, so they might just hit the back button. So setting back to NR for #60. I think to really understand the UX I need to test it. Will try to do so later.

alexpott’s picture

FileSize
5.97 KB
26.18 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] Setup environment: Test cancelled by admin prior to completion. View

Improved the message. What is additional here is that we are purging field data because modules are being uninstalled. So the message now explicit mentions this and which fields are affected. I agree that dsm() may not be the best choice here and I'm up for UX review. I'll ping people in #drupal-usability and see if anyone wants to have a look. The obvious problem being that the config sync screen needs UX work already.

alexpott’s picture

FileSize
4.21 KB
30.77 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] Setup environment: Test cancelled by admin prior to completion. View
4.21 KB
30.77 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] Setup environment: Test cancelled by admin prior to completion. View
196.17 KB

Lost the "warning" on the message type. Screenshot attached.

Simplest way to test this:

  1. Install standard profile
  2. From the command line drush cex -y
  3. Install telephone module
  4. Create telephone field on the article node type
  5. Create an article with the new telephone field populated
  6. Go to admin/config/development/configuration
alexpott’s picture

alexpott’s picture

Status: Needs work » Needs review
FileSize
452 bytes
26.19 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 68,137 pass(es). View

Patches uploaded to #63 were totally wrong :(

alexpott’s picture

alexpott’s picture

Issue summary: View changes
Bojhan’s picture

Issue summary: View changes

This should be a red message, not a warning. I'd like this to be a little bit more clear, to me its unclear what will happen when we say "purged". Can't we just say deleted or removed? We have a similar message on module disabling with losing data.

Also this should be a "list" error, when it touches upon several fields.

yched’s picture

@Bojhan : a red message indicates that something is going / has gone wrong ? That's totally not the case here, it's just an information, but everything's as it should.

[edit: again, clarifying that this is not about data loss. Data has *already* been moved away without a way of putting it back when the user decided to delete the field]

yoroy’s picture

1. Having this as a message is ok. Not sure it needs to be a red one, warning seems fine. (EDIT: but if there's a similar one on modules page we should be consistent with that)
2. “will have data purged” is relatively difficult wording. “This synchronisation will delete data from the node.field_tel”
3. There's no clear way to back out of this screen. Relies on people to hit the back button or use breadcrumbs. (out of scope for this issue, thanks @alexpott for creating #2247291: Reorder tabs in configuration UI
4. We might want to add a confirmation dialog. But reading yched in #71, the "damage" has mostly been done already? If that's the case, an additional confirmation seems unneccessary.

xjm’s picture

Yeah, we definitely don't want a red message -- people will see it and hit back, thinking they broke something, whereas it's just there to inform the user that they're about to start a possibly long-running data deletion operation.

alexpott’s picture

FileSize
2.04 KB
26.14 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 68,193 pass(es). View
  1. The reason I chose "warning" is due to data loss and having a big green tick next to something like this seems odd
  2. Fixed
  3. Agreed
  4. Well maybe - if the field configuration is to be deleted then the damage is yet to be done

Hopefully what we have for now is good enough for the beta and we can improve on it in #2247291: Reorder tabs in configuration UI

yched’s picture

if the field configuration is to be deleted then the damage is yet to be done

true, but we don't provide specific warnings for other deletes (or for field deletes when the field type module stays around). This message really only is displayed because we're about to start a long operation, not because we're deleting stuff ?

Bojhan’s picture

I'm fine with this being an warning. We should really figure out our coloring, for status updates - because that always confuses me.

For consistency sake you would want to move this to a confirm page, for convenience/usability sake you would want to display it before the user clicks "go" so torn between what is best here. But the latter should work fine, as you inform them at the right point in time.

xjm’s picture

Alright, I am comfortable with this as it stands. The workflow improvements are somewhat out of scope (broader/existing problem) and can be handled in #2247291: Reorder tabs in configuration UI. Thanks @yoroy and @Bojhan for the 11th-hour help! :)

xjm’s picture

Status: Needs review » Reviewed & tested by the community

Meant to do this. :)

  • Commit 5e4b8a1 on 8.x by webchick:
    Issue #2198429 by alexpott: Make deleted fields work with config synch.
    
webchick’s picture

Status: Reviewed & tested by the community » Fixed

Reviewed this for awhile with xjm and effulgentsia. I remember distinctly this logic going in in D7, and the reason for this was because on a site with multiple hundred thousand/nodes, this would be a blocking operation. Imagine deleting the issue tags field from Drupal.org. :\

My big concern with undoing this "delete on cron whenever you get around to it" logic is that from a user POV, 90%+ of the time, when you do a config sync from the config UI or from Drush, the screen/CLI is back in a minute or so and you go on with your day. This potentially makes it so your screen is frozen in "batch land" for 5+ hours, seemingly totally randomly, and heaven forbid that happens while you're trying to stage something urgent at the same time. (This is totally plausible: on dev, you removed that field just to clean some old data and then suddenly marketing needs a new block deployed on the front page for a product launch. Both of them go in the same staging batch, and you're effed.)

While the screen technically has a warning, it is pretty generic "blah blah your'e about to delete things" warning, and not nearly bold enough to inform people that this scenario could happen, and definitely do NOT click that button at 5pm on a Friday. :P On node access rebuild (which is a similar situation), we have a more specific warning, like:

return confirm_form(array(), t('Are you sure you want to rebuild the permissions on site content?'), 'admin/reports/status', t('This action rebuilds all permissions on site content, and may be a lengthy process. This action cannot be undone.'), t('Rebuild permissions'), t('Cancel'));

This is still easily skipped over when you're in a hurry, but at least we tried to warn people what they're about to do.

But I guess I can get behind this being sorted out in a follow-up issue, so we can make progress here. Just… thar be dragons. I predict we're not quite done with this problem space yet.

Elsewise, the patch has great test coverage and looks great.

Committed and pushed to 8.x. Thanks!

xjm’s picture

One thing omitted from @webchick's comment, that we discussed on the call, is that this long-running, data-purging batching is an edgecase that only happens when you're staging config that completely uninstalls the module providing the field type.

A question @webchick asked that I didn't have an answer for was: what if you first stage a field deletion, but forget to uninstall the module... synch that, and field deletion starts happening on cron. You delete 1/4 of the field data... and then remember to uninstall the module. Does the dependency management still handle adding the field data purging step as in this patch? This wouldn't actually be too too hard to test; just wanted to make sure the question wasn't lost.

xjm’s picture

Adding the followup to the referenced issues. I also updated it to include a point for @webchick's feedback about warning the user about the long-running batch.

yched’s picture

re @xjm #81 :

Yes, @alex took care of that :-). Fields that are already deleted and are progressively purged on cron are still caught if a later config sync removes the field type module - that config sync will include running the remaining purges in the batch.
See "// Gather deleted fields from modules that are being uninstalled." in ConfigImporterFieldPurger::getFieldsToPurge()

AFAICT this should even be safe if cron runs while the batch is running - will just result in extraneous calls to field_purge_batch(), which will just be no-ops.

swentel’s picture

wow, all I can say is alexpott-to-infinity-plus

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.