Problem/Motivation
Deletion of a taxonomy field from a content type leaves references to affected nodes in taxonomy_index, which results in vocabulary terms remaining attached to those nodes.
Steps to replicate:
1. Go to admin/structure/taxonomy/tags and add a new term.
2. Go to admin/structure/types and add a new content type.
3. On the "Manage fields" tab for the new content type, add a new term reference field using the select widget. Save, and select the "tags" vocabulary on the field settings form.
4. Go to node/add and add a node of the content type created in Step #2. Select the term you added in Step #1 in the field from Step #3 and save.
5. Go to admin/structure/types, click the "Manage fields" tab, and delete the term reference field created in Step #3.
6. Go to admin/structure/taxonomy/tags and click on the term created in Step #1. Observe that the node from Step #4 is still listed, even though the field containing this term was removed from the content type.
Expected behaviour
You are given the message "If you have any content left in this field, it will be lost. This action cannot be undone." indicating that any tags given to content of this content type will be removed.
Actual behaviour
Content remains tagged with terms from the vocabulary even though the field no longer exists. The stale tags can be removed by re-adding the term reference field, removing the tags from the affected content, and then removing the field again.
Proposed resolution
Deleting references by vocabulary is complicated by the potential for data loss if two taxonomy fields on a content type reference the same vocabulary. The proposed resolution is to clear and rebuild taxonomy_index. This may cause slowdowns on large sites, so it is suggested that the rebuild run through the Queue API.
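The clear-and-rebuild approach could be sketched roughly as follows. This is a minimal Drupal 7-style sketch under stated assumptions, not the actual patch: the function name example_rebuild_taxonomy_index() and its $bundle parameter are hypothetical, while db_select(), db_delete(), node_load_multiple() and the core helper taxonomy_build_node_index() are existing D7 APIs.

```php
/**
 * Hypothetical helper: clear and rebuild taxonomy_index for one bundle.
 *
 * @param string $bundle
 *   Machine name of the content type whose term reference field was deleted.
 */
function example_rebuild_taxonomy_index($bundle) {
  // Collect the IDs of all nodes of the affected content type.
  $nids = db_select('node', 'n')
    ->fields('n', array('nid'))
    ->condition('type', $bundle)
    ->execute()
    ->fetchCol();

  if ($nids) {
    // Drop every index row for these nodes in a single query.
    db_delete('taxonomy_index')
      ->condition('nid', $nids, 'IN')
      ->execute();

    // Rebuild the index from the remaining term reference fields.
    foreach (node_load_multiple($nids) as $node) {
      taxonomy_build_node_index($node);
    }
  }
}
```

Loading every node is the expensive step on a large site, which is why the summary proposes running the rebuild through the Queue API.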
Remaining tasks
This issue is currently waiting on a patch in #89181: Use queue API for node and comment, user, node multiple deletes.
User interface changes
A warning message noting the potential delay in taxonomy re-indexing, with a link to run cron for an immediate rebuild if desired.
Comment | File | Size | Author |
---|---|---|---|
#90 | interdiff_88-90.txt | 3.51 KB | vsujeetkumar |
#90 | 1065814-90.patch | 4.62 KB | vsujeetkumar |
#88 | interdiff_86-88.txt | 967 bytes | ravi.shankar |
#88 | 1065814-88.patch | 3.41 KB | ravi.shankar |
#86 | 1065814-86.patch | 3.41 KB | Medha Kumari |
Issue fork drupal-1065814
Comments
Comment #1
franz
Need to implement hook_field_delete_instance().
Patch coming...
Comment #2
franz
This is not simple, actually.
When the default taxonomy_index table is used to store the relationship, the field instance is not stored alongside it. So when deleting a field instance, the only way to determine which rows to delete is to collect all tids from the vocabulary specified by the term reference field.
The only edge case that would cause unexpected results is the one in which you have 2 fields referencing terms from the same vocabulary. Although stupid, it needs attention; consider this scenario:
1) Dev1 creates complex content type with term_reference Field1 referencing Tags vocabulary
2) Dev2 is a stupid dev that added another term_reference Field2 also referencing Tags
3) Content is added (let's say Field2 is ignored).
4) Someone gets annoyed by Field2 and proceeds to remove it. This might cause data from Field1 to be deleted as well.
I think a good solution would be to not allow 2 fields to reference the same vocabulary on the same bundle (nodetype).
Patch implements removal of entries on taxonomy_index table only.
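For illustration, the delete-by-vocabulary approach described in this comment might look like the sketch below (D7 APIs; the function name and parameters are hypothetical). As noted above, this is the variant that could also wipe Field1's index rows in the two-field scenario.

```php
/**
 * Hypothetical helper: delete taxonomy_index rows for one vocabulary,
 * limited to nodes of the bundle whose field instance was deleted.
 */
function example_delete_index_by_vocabulary($vid, $bundle) {
  // All term IDs belonging to the vocabulary the deleted field referenced.
  $tids = db_select('taxonomy_term_data', 't')
    ->fields('t', array('tid'))
    ->condition('vid', $vid)
    ->execute()
    ->fetchCol();

  // All nodes of the affected content type.
  $nids = db_select('node', 'n')
    ->fields('n', array('nid'))
    ->condition('type', $bundle)
    ->execute()
    ->fetchCol();

  if ($tids && $nids) {
    // Remove only index rows matching both the vocabulary and the bundle.
    db_delete('taxonomy_index')
      ->condition('tid', $tids, 'IN')
      ->condition('nid', $nids, 'IN')
      ->execute();
  }
}
```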
Comment #3
franz
Comment #4
franz
Comment #6
franz
I was trying to access field info in a field instance array. Fixed.
Comment #7
joachim CreditAttribution: joachim commented
> I think a good solution would be to not allow 2 fields to reference the same vocabulary on the same bundle (nodetype).
I disagree -- one of the reasons http://drupal.org/project/content_taxonomy was so popular was that it allowed this, so there are obviously use cases for it.
Comment #8
joachim CreditAttribution: joachim commented
I think the better solution would be to clear taxonomy_index for the affected nodes and rebuild it.
Comment #9
franz
@joachim, I had a misunderstanding; I didn't know taxonomy_index was only there to speed things up while the values really live in proper field storage. However, the patch attached above only deletes the values from taxonomy_index based on vocabulary settings. Except it was missing an execute().
But I changed the logic completely to include a rebuild, so now, as far as I tested (several setups), the index is kept consistent. However, this could perhaps be improved in terms of performance.
Comment #10
franz
In short, I ended up implementing the suggestion in #8.
Comment #11
franz
And it needs to be better documented as well...
Comment #12
marcingy CreditAttribution: marcingy commented
Surely this has to be a batch operation, as doing it inline on a large site could potentially result in thousands of items being updated. Maybe the Queue API could be utilised to do this batch processing. It is really a variation on #89181: Use queue API for node and comment, user, node multiple deletes in some ways, by no means a duplicate, but the discussions there might help with coming up with a robust solution.
Comment #13
marcingy CreditAttribution: marcingy commented
Duplicate post.
Comment #14
franz
Since this data doesn't need to be critically updated, I think using a queue and letting it run on cron sounds like a good idea.
Comment #15
franz
Or maybe it makes sense to just run the batch at that time, so that the admin deleting the field can be confident that the data is consistent.
Comment #16
marcingy CreditAttribution: marcingy commented
That could take forever on a large site, and if the browser dies the batch won't complete.
Comment #17
franz
OK, Queue sounds better, along with a warning message explaining it, like: "Taxonomy index needs rebuilding to reflect the changes. A job has been scheduled to do so. If you want to rebuild it right away, run cron (link)." I'll give it a shot later (need to educate myself properly about the Queue API).
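The queue-plus-message idea could be wired up roughly like this (D7 sketch; the example_ module name, queue name and worker callback are hypothetical, while DrupalQueue::get(), hook_field_delete_instance() and hook_cron_queue_info() are real D7 APIs):

```php
/**
 * Implements hook_field_delete_instance() (sketch).
 */
function example_field_delete_instance($instance) {
  // Defer the index rebuild to cron instead of doing it inline.
  $queue = DrupalQueue::get('example_taxonomy_index_rebuild');
  $queue->createItem(array('bundle' => $instance['bundle']));
  drupal_set_message(t('The taxonomy index needs rebuilding to reflect the changes. A job has been scheduled; run cron to rebuild it right away.'));
}

/**
 * Implements hook_cron_queue_info() (sketch).
 */
function example_cron_queue_info() {
  return array(
    'example_taxonomy_index_rebuild' => array(
      // Worker performs the actual delete and rebuild for one bundle.
      'worker callback' => 'example_taxonomy_index_rebuild_worker',
      'time' => 60,
    ),
  );
}
```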
Comment #18
franz
We may use a better summary as well.
Comment #19
franz
Are you sure about this? I just checked, and Batch implements the Queue API.
Otherwise, I'm still confused by the Queue API: should we make use of SystemQueue or create a new one?
Comment #20
marcingy CreditAttribution: marcingy commented
Batch API runs in a browser session by default and triggers based on Ajax; there is no queue involved at all, simply a sandboxed form post.
Comment #21
franz
From http://api.drupal.org/api/drupal/includes--form.inc/function/batch_proce...
There is also http://api.drupal.org/api/drupal/includes--batch.queue.inc/7 and plenty of queue calls during batch execution.
If that's not Batch using Queue, then there is a bunch of confusion in the docs or in my head. =)
Comment #22
marcingy CreditAttribution: marcingy commented
It depends on how you are calling the batch process: if you do anything via drupal_form_submit(), progressive will be set to FALSE and the batch will not utilise the queue. Also, if you look at node_access_rebuild(), it calls batch_set(), which results in a memory queue being used for the batch process. In a similar way, user_cancel() also populates a memory queue. So the operations in question are not persisted to the database; there might be a way around it, but core doesn't seem to utilise it on interactive screens.
Comment #23
franz
I've submitted a patch in #89181: Use queue API for node and comment, user, node multiple deletes. If that gets implemented, this issue will be easily fixed, and in a consistent way.
Comment #24
Caligan CreditAttribution: Caligan commented
Revised issue summary.
Comment #25
xjm
I discovered today that this issue has pretty serious implications for TAC. (And yay for the summary that helped me figure this out quickly!) Scenario:
There is absolutely nothing the site admin can do to fix it. Editing the node doesn't have any effect. Rebuilding permissions is no help because as far as the database is concerned, the node is still tagged. The user's only options are to either delete records from the database manually, or to delete the old term and update every single node in content types that still use the old term with a new one.
Based on this, I'm not sure that "on cron" is soon enough, although it's certainly better than nothing. However, if a batch is available for the queue, TAC could probably implement the same hook and set a message asking the user to run the batch, à la node_access_needs_rebuild(). That way it would still be a background task on the majority of sites where it doesn't matter, but modules that need a faster turnaround for taxonomy changes could prompt the admin to fix it.
Also, we are already updating the index en masse when a taxonomy term or vocabulary is deleted. I guess the difference is explained in #2. But instead of loading all those nodes, couldn't we run a query before removing the data, to find out which node/term combinations should be removed and which are duplicated by different fields?
Comment #26
joachim CreditAttribution: joachim commented
I've had a thought about how this is all working.
On D6, I am pretty sure that removing a vocabulary's association to a node type would leave all the data in term_node for nodes of that type intact.
The difference presumably would have been that when something like TAC asked if node X had term Y, the first filter would be checking the vocabs that apply to its type.
Can someone confirm this?
If it's the case, perhaps it's that step that's missing here rather than the data clean-up?
Comment #27
joachim CreditAttribution: joachim commented
Duplicate post.
Comment #28
franz
+1 for the awesome summary.
@xjm, can you confirm that the current patch works for that case? The logic should be the same, but we're waiting so that we can implement a queue job. I don't think it's still a problem, as long as the site admin gets a message explaining that they need to run cron to rebuild the data.
Also, if you can hop in #89181: Use queue API for node and comment, user, node multiple deletes to help there, we can resume the work needed here.
Comment #29
xjm
Can someone test Damien's patch in #687180-44: Deleting a taxonomy vocabulary leaves term reference fields still pointing to it, and a PDO Exception when creating content, with the steps to reproduce in the summary, and see if it resolves this issue?
Tagging as novice for this testing.
Comment #30
xjm
And tagging for manual testing.
Comment #30.0
xjm
Updated issue summary.
Comment #30.1
aLearner CreditAttribution: aLearner commented
Updated issue summary.
Comment #30.2
aLearner CreditAttribution: aLearner commented
Updated issue summary.
Comment #30.3
aLearner CreditAttribution: aLearner commented
Updated issue summary.
Comment #31
aLearner CreditAttribution: aLearner commented
I'm on it, xjm. Thank you for all your help in updating the 'Steps to Replicate' in the issue summary. Does the update look OK?
Per your instructions in comment #51 of the following thread:
http://drupal.org/node/687180
I'm proceeding to test if the patch referenced there resolves this issue. Thanks again.
Comment #32
aLearner CreditAttribution: aLearner commented
Test
Here are the steps I took to test the patch:
[1] Downloaded and installed the patch http://drupal.org/files/687180-taxonomy-field-cleanup.patch on a fresh install of D8.
[2] Ran through the steps in the section titled 'Updated Steps to Replicate'.
[3] Observed that the same bug still exists.
Result
The patch http://drupal.org/files/687180-taxonomy-field-cleanup.patch does not fix this issue.
Comment #32.0
aLearner CreditAttribution: aLearner commented
Updated issue summary.
Comment #33
xjm
Thanks, aLearner!
I'll write an automated test that demonstrates the bug.
Comment #34
xjm
Looks like the patch still needs work, even aside from the bulk-operation question. Scenario:
The result with the patch applied is that there are no index entries for this node.
Attached test demonstrates the bug in the original post; the combined patch demonstrates the issue with #9. Sending the bot to demonstrate the fails.
Comment #35
xjm
Comment #36
xjm
Oh, also, this comment should end in a period (and could probably be clarified a little as well).
Comment #37
schiavone CreditAttribution: schiavone commented
Cannot reproduce...
Applied the combined patch
Followed the steps listed at #34
Ran taxonomy.test with no errors
Comment #38
schiavone CreditAttribution: schiavone commented
#34: 1065814-34-combined.patch queued for re-testing.
Comment #40
xjm
You could not reproduce the bug, or you could not reproduce the test failures? I clearly didn't explain it well above, since I forgot myself, but the combined patch fails locally for me, as above, demonstrating that it still needs work.
Comment #41
Albert Volkman CreditAttribution: Albert Volkman commented
Re-rolled against current HEAD. The test doesn't run locally, so trying the testbot.
Comment #43
Albert Volkman CreditAttribution: Albert Volkman commented
The test showed "Call to undefined method stdClass::save". I'm not super familiar with testing and how D8 saves nodes, so I just copied how another test did it.
Comment #45
Albert Volkman CreditAttribution: Albert Volkman commented
This appears to do the trick. It passes the simpletests and passed manual testing. I replaced field_attach_update() with taxonomy_build_node_index().
Interdiff'd against #41.
Comment #46
iainp999 CreditAttribution: iainp999 commented
Tested the patch using the steps above, and stepped through using xdebug for good measure. Works as expected.
Comment #47
corvus_ch CreditAttribution: corvus_ch commented
Confirmed, looks good.
Comment #48
gaas CreditAttribution: gaas commented
I verified that the patch works as expected; the taxonomy page loses the nodes after the field is deleted (and it didn't do that without the patch).
The patch makes it possible to not update the index by setting the 'taxonomy_maintain_index_table' variable to FALSE. What's the use case for not cleaning up the index? It seems more correct to just remove this option. If the option is kept, shouldn't the variable be documented somewhere?
The patch also runs a separate db_delete() for each node that was tagged and then rebuilds the index for each. This seems very expensive if there are a lot of tagged nodes. Would it not be possible to clean them all up at once?
Comment #49
Berdir
The handling of taxonomy_maintain_index_table is correct; this setting is not introduced in this patch, it already exists and is just (correctly) applied here as well.
The performance remarks, however, are true. This is a problem and therefore IMHO not ready. There might be a relatively easy way to fix it, though. When a field instance is deleted, the field data is not actually deleted but just marked as such.
Drupal will then continuously delete those field items in the background. That background process, however, does not invoke hook_node_delete(), only hook_field_attach_purge()/hook_field_delete() (which is a callback, not a hook). So taxonomy module could additionally implement that hook. The downside of that approach is that the index entries will only be deleted over time, and it will probably take a few cron runs to get rid of them; with a lot of data, more than just a few. But that's better than not being able to delete a field or having Drupal crash in between. And there could be a contrib module that provides a batch process to trigger field data deletion.
Setting back to needs work; this at least needs a scalability test (generate 100k articles, try to delete the tags taxonomy term reference). I don't think we can ship with something like this.
Naming the query object variable $nids is very uncommon; that variable is in 99% of cases called $query.
The loop could use fetchCol() to just get an array of nids, and could also use node_load_multiple() to load those nodes in a single query. That scales just as badly as node_load() due to the static cache, so we might actually need to do node_load() and reset the static cache after every call to at least save memory.
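The memory-conscious loop suggested in #49 could be sketched like this (hypothetical fragment; assumes $tids already holds the term IDs of the deleted field's vocabulary, and uses the D7 helpers taxonomy_delete_node_index() and taxonomy_build_node_index()):

```php
// Distinct IDs of nodes currently indexed against the affected terms.
$nids = db_select('taxonomy_index', 'ti')
  ->fields('ti', array('nid'))
  ->condition('tid', $tids, 'IN')
  ->distinct()
  ->execute()
  ->fetchCol();

foreach ($nids as $nid) {
  $node = node_load($nid);
  // Drop the node's old index rows, then rebuild from its remaining fields.
  taxonomy_delete_node_index($node);
  taxonomy_build_node_index($node);
  // Reset the entity static cache after each node so memory stays flat.
  entity_get_controller('node')->resetCache(array($nid));
}
```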
Comment #50
Albert Volkman CreditAttribution: Albert Volkman commented
I was starting to refactor this and was considering doing something similar; however, I wasn't aware of node_load_multiple() not scaling well. Also, I wasn't entirely sure how to turn the query result from an iterable object into an array.
Comment #51
Berdir
You can use ->execute()->fetchCol() to get an array of the node IDs, for example.
Comment #52
Albert Volkman CreditAttribution: Albert Volkman commented
Here's another go at the patch. I removed the node_load() from the loop and replaced it with node_load_multiple(). Also, I consolidated the looped db_delete() queries into a single query.
Comment #53.0
(not verified) CreditAttribution: commented
Updated issue summary.
Comment #55
Wtower CreditAttribution: Wtower commented
So I came across this bug. I deleted a term reference field from a content type, and the nodes that belonged to specific terms still appear under the respective term pages.
If this helps anyone: until the bug is solved, in order to get rid of these term values from the nodes, I re-added a taxonomy term reference field to this content type. I then created a view with VBO and used "modify entity values" to set the taxonomy term of all the nodes of this type to nothing. This caused the taxonomy module to rebuild the index. Then I was able to remove the field at last.
Comment #56
mgifford
Is this still a concern in D8? It's an unassigned issue, too.
Comment #64
effulgentsia CreditAttribution: effulgentsia at Acquia commented
Yep, it does appear that the described bug still exists in D8. As it's a data integrity bug, I'm raising this to Major, tagging it, and adding more specific information to the title.
Comment #69
catch
Agreed with the #50/#52 approach; the patch needs a re-roll.
Field deletion is a good candidate for the background batch/job queue runner being discussed in #3242216: Support JSON:API asynchronic tasks.
Comment #70
yogeshmpawar
.
Comment #71
yogeshmpawar
Comment #74
pooja saraah CreditAttribution: pooja saraah at Srijan | A Material+ Company for Drupal India Association commented
Attached patch against 9.5.x.
Attached reroll diff against #52
Comment #75
pooja saraah CreditAttribution: pooja saraah at Srijan | A Material+ Company for Drupal India Association commented
Comment #76
pooja saraah CreditAttribution: pooja saraah at Srijan | A Material+ Company for Drupal India Association commented
Fixed failed commands against #74.
Attached interdiff
Comment #77
ranjith_kumar_k_u CreditAttribution: ranjith_kumar_k_u at Zyxware Technologies commented
Tried to fix the test failure.
Comment #79
ranjith_kumar_k_u CreditAttribution: ranjith_kumar_k_u at Zyxware Technologies commented
Comment #81
Rassoni
Rerolled patch against #78.
Attached interdiff
Comment #83
Rassoni
Comment #84
pooja saraah CreditAttribution: pooja saraah at Srijan | A Material+ Company for Drupal India Association commented
Fixed failed commands on #83.
Attached patch against Drupal 9.5.x
Comment #85
ameymudras CreditAttribution: ameymudras at Salsa Digital commented
Reviewed the last three patches: firstly, they don't seem to apply to 9.5.x. I manually applied the code, but the functionality doesn't seem to work. Moving the issue to needs work.
The patch makes use of
+function taxonomy_field_delete_instance($instance) {
i.e. hook_field_delete_instance(), to perform the cleanup, but I couldn't find a reference to this hook in D9. We might need to rework this, because the issue still exists.
Comment #86
Medha Kumari
Rerolled patch #76 against Drupal 9.5.x.
Comment #88
ravi.shankar CreditAttribution: ravi.shankar at OpenSense Labs commented
Trying to fix the failed tests of patch #86.
Comment #90
vsujeetkumar CreditAttribution: vsujeetkumar at Srijan | A Material+ Company for Drupal India Association commented
Updated tests based on 9.5.x.