Improve comments for the taxonomy upgrade path [#706842]

Comment	File	Size	Author
#103	taxonomy-upgrade-706842-103.patch	666 KB	bjaspan
#97	taxonomy-upgrade-706842-97.patch	665.31 KB	bjaspan
#96	taxonomy-upgrade-706842-96.patch	665.36 KB	bjaspan
#92	field_sql_storage.patch	707 bytes	mrfelton
#76	706842-76-taxonomy-update-path-broken.patch	16.14 KB	Anonymous (not verified)
#74	706842-73-taxonomy-update-path-broken.patch	16.16 KB	Anonymous (not verified)
#71	706842-71-taxonomy-update-path-broken.patch	16.04 KB	Anonymous (not verified)
#70	706842-70-taxonomy-update-path-broken.patch	16.06 KB	Anonymous (not verified)
#63	706842-63-taxonomy-update-path-broken.patch	16.26 KB	Anonymous (not verified)
#58	d6local.sql_.gz	1.13 MB	mlncn
#48	706842-48-taxonomy-update-path-broken.patch	16.31 KB	Anonymous (not verified)
#46	706842-46-taxonomy-update-path-broken.patch	15.37 KB	Anonymous (not verified)
#43	706842-43-taxonomy-update-path-broken.patch	15.92 KB	Anonymous (not verified)
#38	706842-38-taxonomy-update-path-broken.patch	12.81 KB	Anonymous (not verified)
#33	706842-taxonomy-update-path-broken.patch	9.42 KB	damien tournoud
#21	generate-d6.txt	3.99 KB	chx
#21	706842-taxonomy-update-broken.patch	589.74 KB	chx
#20	generate-d6.sh_.txt	3.91 KB	chx
#17	706842-taxonomy-update-broken.patch	493.79 KB	damien tournoud
#13	706842-taxonomy_update_7005.patch	3.07 KB	aaronbauman

Comment #1

yched commented 7 February 2010 at 00:43

Well, the current upgrade path happens to be broken anyway.

- taxonomy_update_7005() has a query with a ':vocabulary_id' => $vocabulary->vid placeholder, while $vocabulary is undefined. I guess at some point the code in there got factored out of taxonomy_update_7004(), which iterates on vocabularies to create the fields and instances.

- taxonomy_update_7005() also fails to set $sandbox['#finished'], which actually makes it single step, batch API assumes it's done after the first iteration - which is probably why the above didn't raise any warnings so far. The function currently is a no-op (doesn't even wipe the taxonomy_term_node table :-p)

Log in or register to post comments

Comment #2

yched commented 9 February 2010 at 00:23

Title:

"taxo as field" update path wipes some node / term associations

» "taxo as field" update broken + wipes some node/term associations

Massage title a bit.

+ for clarity :
Comment #1 is "taxo as field upgrade path is broken".
OP is "even if it successfully did what it intended to do, it would ditch some user data".

Log in or register to post comments

Comment #3

catch

he/him

English

commented 9 February 2010 at 08:19

OK so we should fix #1 first.

For #2 that's going to be tricky. We need to get all the vocabularies, then get the node type of every node attached a term in each vocabulary - can be done, but it's a big DISTINCT query joining on three tables so not nice. Then we could just use that to create the fields and instances instead of $vocabulary->node_types

Log in or register to post comments

Comment #4

joachim commented 6 March 2010 at 20:11

Why do you need the type of nodes? Surely you only need to know which types a vocabulary has been assigned to, irrespective of whether any nodes actually have terms from it.

If my D6 site had vocab Foo set to story and page types, and because I'd not tagged any pages yet I didn't get that vocab applied to pages after upgrade, I would consider the upgrade to have lost something of my site's data.

See #366364: [meta] Data migration from D6 contrib CCK fields to D7 Field API for code that does the field creation for vocabs based on the existing data.

Log in or register to post comments

Comment #5

catch

he/him

English

commented 7 March 2010 at 02:13

You need the actual node types because various contrib modules allow you to assign terms to nodes without the node type being associated with the vocabulary. Also removing a vocabulary from a node type doesn't remove term_node records either. Possibly we need to check both the empty and unassigned vocabulary cases though.

Log in or register to post comments

Comment #6

bradweikel commented 17 April 2010 at 03:46

Are there other modules besides Taxonomy Defaults that do this? If not, maybe this issue belongs in the Taxonomy Defaults queue. I adopted that module late in its life-cycle and discovered its "hidden vocabulary" approach to be a nasty hack that broke compatibility with many other taxonomy modules, and I was planning to refactor it before fields-in-core happened. (And the latest 6.x-dev version of Taxonomy Defaults might actually pave the way for a clean D7 taxonomy upgrade already -- not sure, though, because I haven't looked at it in months).

Log in or register to post comments

Comment #7

douggreen commented 19 April 2010 at 23:28

As part of DrupalCon SF, I promised to work a D7 critical issue to completion, I asked catch for one, and this is it.... so subscribing, I'll be back later to say more ;)

Log in or register to post comments

Comment #8

jpmckinney commented 20 April 2010 at 00:01

@#6: Even if no modules besides Taxonomy Defaults do this, you can be sure many people have been doing it in import scripts. You don't need a module to programmatically add a term to a node, even if that node's type is not associated to that term's vocabulary. That's just part of the D6 API. It's not part of the D7 API, but we shouldn't leave data behind just for that reason.

Log in or register to post comments

Comment #9

hefox commented 20 April 2010 at 05:17

(Bit late, but Content taxonomy does not enforce vocabulary node types settings; http://drupal.org/project/usage/content_taxonomy 40k.)

Log in or register to post comments

Comment #10

sime

any/any

Melbourne

commented 23 April 2010 at 03:32

Content taxonomy (CT for convenience) should be ok. In any configuration it always has it's tids in cck fields. This means the upgrade path is separate. CT does also have an option to keep tids *additionally* in term_data table. We're going to end up creating a field for these, not really knowing where these orphaned tids are coming from but, to be clear, CT will get it's data from the cck fields.

Log in or register to post comments

Comment #11

catch

he/him

English

commented 26 April 2010 at 04:06

Issue tags:

+D7 upgrade path

tagging.

Log in or register to post comments

Comment #12

yoroy commented 8 May 2010 at 07:09

- taxonomy_update_7005() has a query with a ':vocabulary_id' => $vocabulary->vid placeholder, while $vocabulary is undefined. I guess at some point the code in there got factored out of taxonomy_update_7004(), which iterates on vocabularies to create the fields and instances.

douggreen, any news on this? :)

Log in or register to post comments

Comment #13

aaronbauman

he/him

commented 4 June 2010 at 19:45

Status	File	Size
new	706842-taxonomy_update_7005.patch	3.07 KB

This patch addresses the issues in comment #1, taxonomy_update_7005 is a no-op, by implementing the batch logic that was overlooked when update 7004 was re-factored.

Collect all vocabularies in initial batch step and dump into sandbox.
When terms for one vocabulary are exhausted, remove vocabulary from sandbox.
Update $sandbox['#finished'] on each pass, setting to TRUE when done.

Log in or register to post comments

Comment #14

aaronbauman

he/him

commented 4 June 2010 at 19:46

Status:

Active

» Needs review

This is "needs review" until comment #1 is addressed, when it should go back to "needs work" or "active" to address the OP

Log in or register to post comments

Comment #15

marcingy commented 25 June 2010 at 03:12

#13: 706842-taxonomy_update_7005.patch queued for re-testing.

Log in or register to post comments

Comment #16

Anonymous (not verified) commented 26 June 2010 at 20:24

I've made a DB for D6 to work on this issue. It has 51000 terms because I'm also hearing reports of WSODs on upgrades with deep vocabularies. All the terms and nodes are generated by devel_generate. http://drop.io/bangpound/asset/drupal6-mysql-bz2 (Feel free to use or suggest things I should add to make it useful for other issues.)

To test the upgrade path, use the DB as is.

To test the upgrade path with the problem reported in this issue (terms applied to nodes whose vocabulary doesn't include the node type), delete a row or two from the vocabulary_node_types table before running upgrade.php.

Now it's time for me to dig into this issue.

Log in or register to post comments

Comment #17

damien tournoud commented 28 June 2010 at 07:23

Status	File	Size
new	706842-taxonomy-update-broken.patch	493.79 KB

Here is what it actually takes to fix this upgrade path.

The test database is only random content generated with Devel generate for now, chx is working on a way to make this repeatable. Just skip the test database when reviewing the actual patch.

Log in or register to post comments

Comment #18

28 June 2010 at 16:20

Status:

Needs review

» Needs work

The last submitted patch, 706842-taxonomy-update-broken.patch, failed testing.

Log in or register to post comments

Comment #19

catch

he/him

English

commented 29 June 2010 at 06:16

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #20

chx commented 29 June 2010 at 07:02

Status	File	Size
new	generate-d6.sh_.txt	3.91 KB

Here is a d6 generate script. Still needs to generate comments and poll. But can be used to test this issue just right. Creates 6 users, 24 vocabularies, 48 terms, 24 nodes, 36 node revisions and 1656 rows in term_node.

Edit: the node_revisions table uid is all 0. I will debug why later. Until then, before dump update node_revisions nr set uid = 3+(select max(uid) from node n where nr.nid=n.nid);

Log in or register to post comments

Comment #21

chx commented 6 July 2010 at 04:48

Status	File	Size
new	706842-taxonomy-update-broken.patch	589.74 KB
new	generate-d6.txt	3.99 KB

This is the generate script which IMO does not belong to D7 and then the tested upgrade path. The dump contains 24 vocabularies, 48 terms (not two vocab, it's 1 or 2 or 3 depending on hierarchy, multiple parents need three etc), 24 nodes, 48 node revisions, and 1656 entries in term_node.

Other than that, it's mostly DamZ's patch, rerolled against HEAD, removing the occassional fatal :)

Edit: props to SteveL for figuring out that it's better to have the actual node author loaded in the generating script.

Log in or register to post comments

Comment #22

moshe weitzman commented 7 July 2010 at 03:18

So now we have to store a whole drupal database as proof that a single update function works as designed? At this rate, we are going to balloon the drupal download even further. This is a 500K patch that fixes a handful of lines. I'm not saying that this is definately wrong, but I am not sure its right either.

Log in or register to post comments

Comment #23

chx commented 7 July 2010 at 03:21

This database dump can be used for many other updates that involve nodes and user and taxonomy and path aliases.

Log in or register to post comments

Comment #24

chx commented 7 July 2010 at 03:23

Also we might want to consider excluding this directory from the Drupal tarball and only those who checkout will get it.

Log in or register to post comments

Comment #25

webchick

she/they

English

Vancouver 🇨🇦

commented 7 July 2010 at 18:17

Looked more into my webchick.net upgrade and saw that I lost term associations, so checking this out later.

Moshe's concern is legitimate. I talked to chx about this on IRC. A couple of things:

1. We're only going to have two database dumps: a "barebones" one (for catching things like the upgrade path fails because it assumes the term_data table exists) and a "fancy" one with lots of data (for exhaustively checking all the various core modules' paths). So every test that tests some aspect of the upgrade path will NOT have a 500KB dump associated with it, though we probably will have a single, 1MB dump at the end.

2. mikey_p suggested gzipping the file and having SimpleTest read it in as part of the test. I think this would mitigate filesize concerns. The downside is diffs then become useless. :\

3. We could also do chx's suggestion of excluding this one mega-file from the tarball. I'm not a fan of tarballs and CVS mis-matching in this way, though, but it seems most core developers would not be affected since we all work from CVS (or bzr/git whatever).

Log in or register to post comments

Comment #26

moshe weitzman commented 7 July 2010 at 18:35

two database dumps as described is a reasonable approach.

i'm jumping the gun perhaps, but it would be nice if our db dump also served as a representative dataset for performance testing. See http://drupal.org/project/northdrop

Log in or register to post comments

Comment #27

chx commented 8 July 2010 at 04:41

I talked to dww and it's possible to have a 'normal' and a 'developers' tarball attached to every node. We special case core already apparently anyways.

Log in or register to post comments

Comment #28

Stevel commented 8 July 2010 at 18:34

+++ modules/simpletest/simpletest.info	2010-07-06 04:37:14 +0000
@@ -39,3 +39,4 @@ files[] = tests/unicode.test
=== added file 'modules/simpletest/tests/upgrade/drupal-6.taxonomy.database.php'

=== added file 'modules/simpletest/tests/upgrade/drupal-6.taxonomy.database.php'
--- modules/simpletest/tests/upgrade/drupal-6.taxonomy.database.php	1970-01-01 00:00:00 +0000

--- modules/simpletest/tests/upgrade/drupal-6.taxonomy.database.php	1970-01-01 00:00:00 +0000
+++ modules/simpletest/tests/upgrade/drupal-6.taxonomy.database.php	2010-07-06 04:37:53 +0000

As the generated database file will be one big database that will serve for different tests, it should be renamed to something more generic.

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test	2010-07-06 04:37:14 +0000
@@ -0,0 +1,32 @@
+    $this->drupalGet('node/89');
+    $this->drupalGet('node/89/edit');

The node id can vary when the database is updated to contain more (other) types of nodes. Using a static path (i.e. term/foo or something) in the generation code would remedy this.

+++ modules/taxonomy/taxonomy.install	2010-07-06 04:37:14 +0000
@@ -423,14 +414,51 @@ function taxonomy_update_7005(&$sandbox)
+    } ¶

trailing whitespace

+++ modules/taxonomy/taxonomy.install	2010-07-06 04:37:14 +0000
@@ -449,20 +477,38 @@ function taxonomy_update_7005(&$sandbox)
+    ¶
+    // If there were no rows returned, we're finished with the current vocab. ¶

Trailing whitespace

Apart from the -mostly- coding style glitches, the generation-script should probably be included in the patch as well. I would prefer to have it in the same directory as the .test-file (so we can add "on-first-test-using-it" database generation using it later)

Log in or register to post comments

Comment #29

Stevel commented 8 July 2010 at 18:35

Status:

Needs review

» Needs work

This thus needs a little bit of work.

Log in or register to post comments

Comment #30

int commented 11 July 2010 at 13:52

Issue tags:

+beta blocker

add beta blocker tag

Log in or register to post comments

Comment #31

int commented 23 July 2010 at 20:26

bump... this is the last critical "D7 upgrade path" to be fixed

Log in or register to post comments

Comment #32

chx commented 30 July 2010 at 08:04

Whee, poll is in now we can fix this one! Next ~~patch~~issue can gzip down the db dumps and include compress.zlib://filename i checked it works , the manual explicitly says it's local, not restricted by allow_url_fopen.

Log in or register to post comments

Comment #33

damien tournoud commented 31 July 2010 at 10:45

Status	File	Size
new	706842-taxonomy-update-path-broken.patch	9.42 KB

Rerolled now that the standard database is in.

Needs work, for the following:

we need to add a taxonomy reference field to node types that aren't explicitly associated to a vocabulary but have some associations in term_node
we need to modify the generation script so as not to associate nodes of type 'poll' to any term, so that we can test the three cases: (1 - 'page') node type is associated to a vocabulary, (2 - 'story') node type is not associated to a vocabulary but have associations in term_node, (3 - 'poll') node type is not associated to a vocabulary and have no associations in term_node
we need to implement the tests that I outlined as TODOs in UpgradePathTaxonomyTestCase::testTaxonomyUpgrade()

Have fun!

Log in or register to post comments

Comment #34

webchick

she/they

English

Vancouver 🇨🇦

commented 4 August 2010 at 15:38

Last I heard, bangpound was working on this. Any update, bangpound? :)

Log in or register to post comments

Comment #35

int commented 4 August 2010 at 17:16

Status:

Needs work

» Needs review

let's the bot review the last patch

Log in or register to post comments

Comment #36

chx commented 4 August 2010 at 16:39

bangpound promised a patch today. there is nothing to review.

Log in or register to post comments

Comment #37

webchick

she/they

English

Vancouver 🇨🇦

commented 11 August 2010 at 16:20

Status:

Needs review

» Needs work

sad ping. :(

Log in or register to post comments

Comment #38

Anonymous (not verified) commented 12 August 2010 at 12:22

Status:

Needs work

» Needs review

Status	File	Size
new	706842-38-taxonomy-update-path-broken.patch	12.81 KB

Here's the justification for the one field that captures all of the "extra" term node associations that are not "valid" according to the relationships between vocabularies and node types. The values in D6's vocabulary_node_type were translated into a combination of field (cardinality, required), widget (autocomplete or select) and instance (help, bundle) settings.

The only thing that enforced these constraints in D6 were the form elements themselves--through form validation. If the vocabulary "sections" was only allowed to have 1 term and was configured for the "story" node type, extra term_node values for a particular story would be wiped out if the node were edited and saved again. If "page" node type were not allowed according to core's vocabulary_node_type table, who knows if it should accept multiple values or one value? It's not supposed to accept any values, but obviously we're trying to cope with this reality in this patch. So this is why I can't just add new instances of the regular taxonomy field (which I'll call the "valid" field) to the extra node types.

I'm not keen to create double the number of taxonomy term fields either. They'd all have to be the same: multiple values, autocomplete widget, not required. If I create extra fields for each vocabulary to cope with the "overflow" of term_node associations, I'm saying "I know how these terms are related to those nodes." That's not true. I don't know anything.

So the outcome is a field that can capture all of the "excess" values and which can allow nodes to be loaded and saved without losing data. It's named "taxonomyextra_upgrade." It can't be named "taxonomy_something", because there's potential for collision with a machine name. Perhaps I can use taxonomy__something? Whatever, for now let's just call it the "excess" field.

The fun part of this patch was trying to figure out how to handle deltas. If I want to translate from D6 to D7 in groups defined by vocabulary ID and node type, I'm going to switch back and forth between the valid and the excess fields. I don't want to be passing around a larger and larger array of deltas in the batch sandbox... it's entirely possible that every term_node value for an upgrade is going to be put into the "excess" field, and it could be a database with millions of terms. So to avoid an impossible rewriting the upgrade code that Catch worked hard on (and that I remember he also struggled with handling deltas), I just rely on the maximum delta value for that particular entity type, entity ID/revision ID and language combination. This requires extra DB hits but ONLY for the excess field.

Oh and I noticed that the allowed values setting for taxonomy term reference fields are using database primary keys (vocabulary ID) instead of vocabulary machine names. Of course, it also uses a parent term ID for which there is no machine name equivalent. But for common taxonomy term reference fields whose allowed values are one or more complete vocabularies, this is an exportability problem. Bug?

Remaining work needed:

Tests
Code comments

Log in or register to post comments

Comment #39

chx commented 12 August 2010 at 14:38

Taxonomy settings do not matter. What matters is that every node shows every term it had. We can't do otherwise. Broken or not, we can't get away with some nodes not showing terms hoping for a mythical contrib to smooth this out. Between extending your taxonomy settings and not showing some of your node-term associations I have no doubts which one to choose.

If we need to iterate over every node twice, first to figure out the configuration and then second to actually upgrade, so be it.

Log in or register to post comments

Comment #40

chx commented 12 August 2010 at 14:48

Sorry! I misunderstood the situation. OK. I get it. That extra field is still a term reference so we always show all terms it's just in a different field.

The question is this.

Do we want this excess field - type upgrade or do we want to adjust the cardinality of the taxonomy field so that it can accomodate the excess? Will post more later as I talk with bangpound.

Log in or register to post comments

Comment #41

Anonymous (not verified) commented 12 August 2010 at 15:02

Just to clarify:

The patch does not lose any data.
All the term values are/can be displayed in the node.
The "excess" values are stored in a new field together with all the other excess values.
The values could be edited in the node form, too. But the widget for that field should be disabled. If a contrib module was exposing values without Drupal core's node form integration, bypassing its validation and storing these associations in Drupal core's taxonomy term tables, Drupal core can't know whether these values should have a UI widget.
"Excess" values are still added to taxonomy_index table even though they're in a different field. This means they still appear for taxonomy/term/%term paths.
Excess values are segregated. It's easier to analyze. Contrib modules can decide what to do with those values without stepping data that core could migrate correctly.

Vocabulary settings in D6 are migrated to D7 widget and instance settings. While it is possible to change the cardinality and the 'required' setting on an existing field, it means users who upgrade to D7 may go through an apparently fault-free upgrade only to discover later that a field that was in D6 single valued has been mysteriously upgraded to multiple values.

But something tells me this can be worked around with Field API: taxonomy term reference field cardinality should always be unlimited? Widget/instance settings should control number of values allowed?

Log in or register to post comments

Comment #42

chx commented 12 August 2010 at 21:58

OK, that was a great summary. It's never going to be "ideal" but this is still the best we can do, I agree. Limiting stuff in widgets is against every core field API and it won't work really. Let's finish this w tests.

Log in or register to post comments

Comment #43

Anonymous (not verified) commented 14 August 2010 at 12:54

Status	File	Size
new	706842-43-taxonomy-update-path-broken.patch	15.92 KB

Here's a new patch. I touches all the bases EXCEPT not all tests pass.

Why don't all tests pass?

The demo data provided by Damien includes data that is invalid for the valid fields derived from the taxonomy_vocabulary_node_type table. The generated data doesn't seem to care that some of the vocabularies are defined as "multiple" and some are not. However, the migration script DOES care... sorta. It creates single value fields for these vocabularies that are not defined as multiple, but then it goes ahead and puts multiple values from taxonomy_term_node in those field data tables anyway.

The final test uses a simple algorithm to determine which terms should be in a field and which should not. If the term ID is the same as the node ID, it shouldn't be on the field. Likewise, if the term ID and the node ID add up to 49, the value shouldn't be in the field. However, every other term should be on every page and story node in the test data set. This isn't possible when many of the fields do not support multiple values.

Result: 1020 passes, 224 fails, 0 exceptions, and 40 debug messages

I tried to create all the vocabularies as multiple, but that didn't work.

Result: 1140 passes, 92 fails, 0 exceptions, and 34 debug messages

I'll still finish this up today, though! Not gonna let this last another day.

I'm beginning to think though that we're going to break many hearts with this upgrade path. Perhaps the best solution is the exceedingly careful one: migrate values that we can and delete rows of valid migrated values from the taxonomy_term_node table. If the table is empty? Hooray, drop the table. If not? Someone has a mess to clean up... but hey! Your data is "safe."

Log in or register to post comments

Comment #44

14 August 2010 at 13:30

Status:

Needs review

» Needs work

The last submitted patch, 706842-43-taxonomy-update-path-broken.patch, failed testing.

Log in or register to post comments

Comment #45

int commented 14 August 2010 at 13:57

the #43 patch work fine in my d6 to d7 upgrade.

All terms were migrated fine.

This patch is all what I need to my site upgrade work.

Log in or register to post comments

Comment #46

Anonymous (not verified) commented 15 August 2010 at 07:01

Status:

Needs work

» Needs review

Status	File	Size
new	706842-46-taxonomy-update-path-broken.patch	15.37 KB

This patch is 100% functional.

It needs extensive commenting and it could probably be a bit more elegant. Give me another pass through it.

It produces a wave of warnings:

Undefined property: stdClass::$vocabulary_machine_name

These only appear during one iteration of a loop of tests.

I'm so curious to find out how many people will end up with the "excess" field after they upgrade. I suspect it will be quite a lot.

Log in or register to post comments

Comment #47

15 August 2010 at 07:40

Status:

Needs review

» Needs work

The last submitted patch, 706842-46-taxonomy-update-path-broken.patch, failed testing.

Log in or register to post comments

Comment #48

Anonymous (not verified) commented 15 August 2010 at 15:56

Status:

Needs work

» Needs review

Status	File	Size
new	706842-48-taxonomy-update-path-broken.patch	16.31 KB

Here's a patch. I avoided the "warnings" so the patch would pass all tests. I've added comments to explain what's going on in the new taxonomy_upgrade_7005(). I hope it's not going to ruin performance.

TODO: We can just assume that an overflow field is necessary and create it with all vocabularies. This saves us one expensive query.

TODO: Name this overflow field. It can't be named 'taxonomy_' + something because that will collide with other potential names.

Log in or register to post comments

Comment #49

15 August 2010 at 16:40

Status:	Needs review	» Needs work
Issue tags:	-D7 upgrade path, -beta blocker

The last submitted patch, 706842-48-taxonomy-update-path-broken.patch, failed testing.

Log in or register to post comments

Comment #50

Anonymous (not verified) commented 15 August 2010 at 17:25

Status:	Needs work	» Needs review
Issue tags:		+D7 upgrade path, +beta blocker

#48: 706842-48-taxonomy-update-path-broken.patch queued for re-testing.

Log in or register to post comments

Comment #51

Anonymous (not verified) commented 17 August 2010 at 13:48

The patch basically rewrites taxonomy_update_7005() so it's probably hard to read without applying it. See http://drupalbin.com/node/15761 for the full function.

EDIT: Corrected URL

Log in or register to post comments

Comment #52

tsvenson commented 17 August 2010 at 09:48

@bangpound

That link gives Page not found. I tried http://drupalbin.com/node/15761 and it opens the update 7005 page. Is the correct one?

Log in or register to post comments

Comment #53

Anonymous (not verified) commented 17 August 2010 at 13:47

Hmm.. that's odd! Yes. http://drupalbin.com/node/15761 is the right one.

Log in or register to post comments

Comment #54

chx commented 17 August 2010 at 18:25

Status:

Needs review

» Reviewed & tested by the community

The solution is the best we can get out of such a mess (ie converting things not adhering to a foreign key constraint into a stricter system), the new code is nice and the tests are solid. Go!

Log in or register to post comments

Comment #55

mlncn commented 17 August 2010 at 22:07

Status:

Reviewed & tested by the community

» Needs work

With both an existing real-world Drupal 6 site and a freshly minted d6 site with some devel generated taxonomy terms and nodes fail when upgrading with this patch:

Failed: PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(etid, entity_id, revision_id, bundle, language, delta, ) VALUES ('1', NULL, NUL' at line 1: INSERT INTO {} (etid, entity_id, revision_id, bundle, language, delta, ) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => 1 [:db_insert_placeholder_1] => [:db_insert_placeholder_2] => [:db_insert_placeholder_3] => [:db_insert_placeholder_4] => und [:db_insert_placeholder_5] => 0 [:db_insert_placeholder_6] => 2922 ) in taxonomy_update_7005() (line 571 of /home/ben/workspace/d7b/modules/taxonomy/taxonomy.install).

This variable:

      $value_column = $field['storage']['details']['sql'][FIELD_LOAD_REVISION][$table]['tid'];

is not getting any value. Haven't figured out yet.

Trying to list terms in any vocabulary results in this, but this is just a symptom:

Error message
PDOException: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'base.format' in 'field list': SELECT base.tid AS tid, base.vid AS vid, base.name AS name, base.description AS description, base.format AS format, base.weight AS weight, v.machine_name AS vocabulary_machine_name FROM {taxonomy_term_data} base INNER JOIN {taxonomy_vocabulary} v ON base.vid = v.vid WHERE (base.tid IN (:db_condition_placeholder_0, :db_condition_placeholder_1)) ; Array ( [:db_condition_placeholder_0] => 30 [:db_condition_placeholder_1] => 31 ) in DrupalDefaultEntityController->load() (line 185 of /home/ben/workspace/d7b/includes/entity.inc).

Log in or register to post comments

Comment #56

mlncn commented 18 August 2010 at 02:20

Pretty sure the failure is that 'taxonomyextra_upgrade' is sometimes assigned here, about line 527 of the patched code:

      // Use the valid field for this vocabulary and node type or use the
      // overflow vocabulary if there is no valid field.
      $field_name = isset($sandbox['vocabularies'][$record->vocab_id][$record->type]) ? $sandbox['vocabularies'][$record->vocab_id][$record->type] : 'taxonomyextra_upgrade';
      $field = $field_info[$field_name];

when it has not been created within the if (!empty($extra_vocabularies)) { statement.

Specifically, given the failure on normal upgrades but not with the test data, this seems to happen when all terms should be fitting into normal fields.

Log in or register to post comments

Comment #57

webchick

she/they

English

Vancouver 🇨🇦

commented 18 August 2010 at 02:53

This means we need further additions to those tests to catch this case.

Log in or register to post comments

Comment #58

mlncn commented 18 August 2010 at 17:10

Status	File	Size
new	d6local.sql_.gz	1.13 MB

Here is the mostly devel-generated database that causes errors on upgrade.

Log in or register to post comments

Comment #59

catch

he/him

English

commented 19 August 2010 at 10:58

chx asked me to review this, but I'm flat out at work and haven't had time to do a proper review. Just to say I'm completely happy with the 'extras' field and don't see any other way apart from leaving the rows in term_node and letting contrib handle it, which doesn't seem so good anyway.

On naming can we call it something like taxonomyextras (no underscore) to keep it namespaced without the collisions?

Log in or register to post comments

Comment #60

bjaspan commented 20 August 2010 at 23:30

Subscribe. The basic approach based on reading only comments seems sound. I'll try to look at the code on the plane to CPH.

Log in or register to post comments

Comment #61

Anonymous (not verified) commented 21 August 2010 at 01:28

I'll be updating the patch tomorrow (Saturday, Central US) to try to fix the problems found by Benjamin Melançon in #56-58.

Log in or register to post comments

Comment #62

Anonymous (not verified) commented 22 August 2010 at 14:45

I'm having inexplicable trouble with this patch now. When I run the upgrade path tests, I am given this error (via the "verbose response"):

Unresolved dependency Text (Disabled)
Comment requires this module.
Unresolved dependency Field (Disabled)
Comment requires this module.
Unresolved dependency Field SQL storage (Disabled)
Comment requires this module.

EDIT: This issue is blocked by http://drupal.org/node/877690

Log in or register to post comments

Comment #63

Anonymous (not verified) commented 22 August 2010 at 15:17

Status:

Needs work

» Needs review

Status	File	Size
new	706842-63-taxonomy-update-path-broken.patch	16.26 KB

New patch.

The error Benjamin Melançon confronted was that his database had terms that were not applied to any nodes. This led to the SQL results for the second query in taxonomy_update_7005() including rows that had no reason to be migrated... no node ID, no type, thus no value column name.

This cannot be tested with the current test suite's upgrade database because EVERY term has a link to almost every node.

The field 'taxonomyextra_upgrade' has been renamed to 'taxonomyextra' so that it's namespace-less.

Log in or register to post comments

Comment #64

Anonymous (not verified) commented 22 August 2010 at 15:38

I just realized that if the ONLY excess values are those which exceed the cardinality of the otherwise valid destination field, there will be no "taxonomyextra" field created (because the vocabulary node type table and the extras calculation query in taxonomy_upgrade_7004 would have matched up identically) but during the upgrade, the excess values would be found and they'd be added to a non-existant taxonomyextra field.

What's the best way to handle this?

1. Always create the "taxonomyextra" field with every vocabulary as an allowed value even if it ends up with no data.

2. Create the taxonomyextra field at the first instance when it's needed in taxonomy_upgrade_7005.

Log in or register to post comments

Comment #65

Anonymous (not verified) commented 22 August 2010 at 15:44

Assigned:

Unassigned

»

Log in or register to post comments

Comment #66

mlncn commented 22 August 2010 at 15:57

Assigned:

» Unassigned

Patch in #63 did not work for me. Or rather, it allowed the upgrade, but it didn't do right by the data (using the same test database attached in #58). The term_node table has quite a lot of things in that database, but after upgrade the taxonomy_index table had nothing at all. (Which ended a long time of clicking to see if i could find any piece of content with a taxonomy term, or vice versa!)

Log in or register to post comments

Comment #67

bjaspan commented 22 August 2010 at 16:13

re #64: Why not always create the "taxonomyextra" field with every vocabulary as an allowed value even if it ends up with no data, and then when all done do a field query and delete the field if it has no data items?

Log in or register to post comments

Comment #68

mlncn commented 22 August 2010 at 16:25

I take #66 back. The patch in #63 works for everything i can test.

Log in or register to post comments

Comment #69

mlncn commented 22 August 2010 at 16:44

Status:

Needs review

» Needs work

Apologies again for the noise in #66. Upgrading a real Drupal 6 site that does crazy things with taxonomy still has the problems reported in #55. Given my theory in #56, that may be a vote for Barry's catch-all approach in #67.

Got that? In short, i'll test Bangpound's next patch, but for everyone's info the test database in #58 and Drupal's core tests don't catch all the awful possibilities out there, so i at least would like to be able to test again before this patch is committed.

Log in or register to post comments

Comment #70

Anonymous (not verified) commented 22 August 2010 at 18:28

Assigned:	Unassigned	»
Status:	Needs work	» Needs review

Status	File	Size
new	706842-70-taxonomy-update-path-broken.patch	16.06 KB

New patch!

Now the upgrade process creates the overflow field and migrates the data. Then it deletes instances of the overflow field when they aren't necessary. If the overflow field's revision table contains NO data, the whole field is deleted.

Log in or register to post comments

Comment #71

Anonymous (not verified) commented 22 August 2010 at 18:29

Status	File	Size
new	706842-71-taxonomy-update-path-broken.patch	16.04 KB

I wish I could undo that last patch. It's not going to work because I made it from git without --no-prefix.

Log in or register to post comments

Comment #72

mlncn commented 22 August 2010 at 20:42

My complex real-world database still fails, Update #7005, but the error is more precise?

Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'delta' cannot be null: INSERT INTO {field_revision_taxonomyextra} (etid, entity_id, revision_id, bundle, language, delta, taxonomyextra_tid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => 1 [:db_insert_placeholder_1] => 3 [:db_insert_placeholder_2] => 16 [:db_insert_placeholder_3] => project [:db_insert_placeholder_4] => und [:db_insert_placeholder_5] => [:db_insert_placeholder_6] => 31 ) in taxonomy_update_7005() (line 565 of /home/ben/workspace/d7t/modules/taxonomy/taxonomy.install).

Log in or register to post comments

Comment #73

Anonymous (not verified) commented 23 August 2010 at 00:09

Fixed.

I don't really know how to create tests for all these different problems unless there's data in the test upgrade database.

The current dataset is very formulaic and it has a huge number of tests. If we want to rewrite the test database, it needs to have these cases covered:

All the current tests in testTaxonomyUpgrade().
A node which has 2 term_node entries for terms in a vocabulary which only supports single values.
Terms which have no entries in the term_node table.
A node which has term X, Y and Z in its first revision but only term X and Y in the current one.

Log in or register to post comments

Comment #74

Anonymous (not verified) commented 23 August 2010 at 00:10

Status	File	Size
new	706842-73-taxonomy-update-path-broken.patch	16.16 KB

And the patch...

Log in or register to post comments

Comment #75

23 August 2010 at 00:20

Status:

Needs review

» Needs work

The last submitted patch, 706842-73-taxonomy-update-path-broken.patch, failed testing.

Log in or register to post comments

Comment #76

Anonymous (not verified) commented 23 August 2010 at 00:21

Status:

Needs work

» Needs review

Status	File	Size
new	706842-76-taxonomy-update-path-broken.patch	16.14 KB

ARGH. No prefix bit me again.

Log in or register to post comments

Comment #77

mlncn commented 23 August 2010 at 00:46

Status:

Needs review

» Reviewed & tested by the community

So without my mentioning to him in advance, Bangpound has upgraded a site that did many unusual things with Taxonomy module, and then merged in a site that went back to Drupal 4.7 and... the contrib Category module. I dare say, that having upgraded all the vocabularies on each node in this monster means that there's not much out there in the world that the taxonomy upgrade could have trouble with.

My hat is off to Benjamin Doherty.

Log in or register to post comments

Comment #78

mlncn commented 23 August 2010 at 00:51

Status:

Reviewed & tested by the community

» Needs work

Oh hell the database attached in #58 no longer works:

* Failed: DatabaseSchemaObjectExistsException: Table field_data_taxonomyextra already exists. in DatabaseSchema->createTable() (line 621 of /home/ben/workspace/d7t/includes/database/schema.inc).

Log in or register to post comments

Comment #79

mlncn commented 23 August 2010 at 00:55

Status:

Needs work

» Reviewed & tested by the community

Never mind, i hadn't cleared the database (went to, but never completed that step) before bringing in the test database for a new try. It does work!! Please commit this before i mess up another test.

Log in or register to post comments

Comment #80

23 August 2010 at 01:00

Status:	Reviewed & tested by the community	» Needs work
Issue tags:	-D7 upgrade path, -beta blocker

The last submitted patch, 706842-76-taxonomy-update-path-broken.patch, failed testing.

Log in or register to post comments

Comment #81

Anonymous (not verified) commented 23 August 2010 at 01:02

Status:

Needs work

» Needs review

#76: 706842-76-taxonomy-update-path-broken.patch queued for re-testing.

Log in or register to post comments

Comment #82

23 August 2010 at 01:50

Status:	Needs review	» Needs work
Issue tags:		+D7 upgrade path, +beta blocker

The last submitted patch, 706842-76-taxonomy-update-path-broken.patch, failed testing.

Log in or register to post comments

Comment #83

chx commented 23 August 2010 at 04:35

That's a weird failure. I have instructed the testbot to retest HEAD.

Log in or register to post comments

Comment #84

chx commented 23 August 2010 at 05:08

Status:

Needs work

» Needs review

#76: 706842-76-taxonomy-update-path-broken.patch queued for re-testing.

Log in or register to post comments

Comment #85

bjaspan commented 23 August 2010 at 09:28

Partial review; next I will apply the patch to review update 7005.

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test
@@ -0,0 +1,122 @@
+          // Prefer valid taxonomy term reference fields for a given vocabulary
+          // when they exist.
+          if (empty($instances[$tree['vid']]) || $instances[$tree['vid']] == 'taxonomyextra') {
+            $instances[$tree['vid']] = $field['field_name'];

$instances is not initialized to array(). It shouldn't matter b/c page will always have a taxo reference, but if it ever doesn't this will cause an E_NOTICE exception.

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test
@@ -0,0 +1,122 @@
+    // Check that the node type 'story' has been associated to a taxonomy
+    // reference field for each vocabulary. It was not explicitely in
+    // $vocabulary->nodes but each node of type 'story' was associated to
+    // one or more terms.

The assertions following this comment say that story nodes should only be associated with the taxonomyextra field. Okay, I see that the second sentence of this comment is saying that, but only indirectly, and the first sentence seems to say the opposite. So, the comment should probably be clarified.

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test
@@ -0,0 +1,122 @@
+    $this->assertEqual(count($field_names), 1, t('Only one taxonomy term field is used for on story nodes'));
+    $this->assertEqual(key($field_names), 'taxonomyextra', t('Only the excess taxonomy term field is used for on story nodes'));

Grammatical error: "used for" in both lines.

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test
@@ -0,0 +1,122 @@
+    $instances = array();
+    foreach (field_info_instances('node', 'poll') as $instance) {

This loop appears three times identically. It could stand to be factored out. You knew someone would say that, right? :-)

+++ modules/taxonomy/taxonomy.install
@@ -412,7 +412,54 @@ function taxonomy_update_7004() {
+  // Allowed values for this extra vocabs field is every vocabulary.
+  foreach (taxonomy_get_vocabularies() as $vocabulary) {
+    $allowed_values[] = array(
+      'vid' => $vocabulary->vid,
+      'parent' => 0,
+    );
+  }

$allowed_values is not initialized; later code will fail if there are no vocs.

Powered by Dreditor.

Log in or register to post comments

Comment #86

bjaspan commented 23 August 2010 at 10:06

Status:

Needs review

» Needs work

Review of 7005:

    $result = db_query('SELECT v.vid, v.machine_name, n.type FROM {taxonomy_vocabulary} v INNER JOIN {taxonomy_vocabulary_node_type} n ON v.vid = n.vid');
    $vocabularies = array();
    foreach ($result as $record) {
      // If no node types are associated with a vocabulary, the LEFT JOIN will
      // return a NULL value for type.
      if (isset($record->type)) {
        $vocabularies[$record->vid][$record->type] = 'taxonomy_'. $record->machine_name;
      }

The query is an INNER JOIN, not a LEFT JOIN. Is the comment and if statement a holdover from previous "extra" field approaches?

    // Query selects all revisions at once and processes them in revision and
    // term weight order.
    $query = 'SELECT td.vid AS vocab_id, td.tid, tn.nid, tn.vid, n.type, n2.created, n2.sticky, n2.nid AS is_current FROM {taxonomy_term_data} td INNER JOIN {taxonomy_term_node} tn ON td.tid = tn.tid LEFT JOIN {node} n ON tn.nid = n.nid LEFT JOIN {node} n2 ON tn.vid = n2.vid ORDER BY tn.vid, td.weight ASC';

Confused. LEFT JOIN {node} n2 ON vid will only get the most current revision, not all revisions at once. Thinking... the LEFT JOIN {node} ON nid should grab the node the term is associated with, which presumably should always exist unless a bad node deletion occurred, so INNER JOIN should be safe but LEFT JOIN is probably prudence. But the LEFT JOIN {node} n2 ON vid will only ever match if the term is associated to the most current revision of the node. Okay, so this means it is possible for {node} n to match with data and {node} n2 not to. Maybe I'll see from the code below that this was intended, though I do not see how it "selects all revisions at once." More explanation, please?

    if (isset($sandbox['cursor'])) {
      $values = $sandbox['cursor']['values'];
      $deltas = $sandbox['cursor']['deltas'];
    }

It would be helpful to document the "rep invariant" for $sandbox, e.g.: all the possible keys of $sandbox, their format, and what they are guaranteed to look like after each invocation of the function. Otherwise, at this point in the code, I have no idea what $values and $deltas are.

      if (isset($values)) {

        // If the last inserted revision_id is the same as the current record,
        // use the previous deltas to calculate the next delta.
        if ($record->vid == $values[2]) {

"Huh?" See also, please document $values. :-)

          if ($field['cardinality'] != FIELD_CARDINALITY_UNLIMITED && ($deltas[$field_name] + 1) >= $field['cardinality']) {

            // For excess values for a single value field, switch over to the
            // overflow field.

Comment should be "For excess values for a fixed-cardinality field" since you are (properly) handling N>1 fixed cardinality fields.

      // Table and column found in the field's storage details. During upgrades,
      // it's always SQL.
      $table = key($field['storage']['details']['sql'][FIELD_LOAD_REVISION]);
      $value_column = $field['storage']['details']['sql'][FIELD_LOAD_REVISION][$table]['tid'];

I understand that it is safe to assume SQL storage here, but why do it? It seems like it would be simpler and clearer to construct a pseudo-entity and use field_attach_insert().

Hmmm, okay, the reason it is not easier is that you do not always have all the the terms for a single nid/vid at a time, which the Field Attach API requires. I think this whole update function would be simpler if you based it on "SELECT DISTINCT vid FROM {taxonomy_term_node}" then queried and handled all terms for each nid/vid as a separate pass. It would clearly require more queries, but who cares? Of course it would also be a complete rewrite of the function. So if we achieve done-ness soon, don't bother. If not, consider it. :-)

        // Update the {taxonomy_index} table.
        db_insert('taxonomy_index')
          ->fields(array('nid', 'tid', 'sticky', 'created',))
          ->values(array($record->nid, $record->tid, $record->sticky, $record->created))
          ->execute();

Hey, now I know what {node} n2 in the primary query is for! But now I also really think it should be {node_revision} n2 since otherwise won't created and sticky always be NULL for a term attached to a non-current revision?

    // Determine necessity of taxonomyextras field.
    $field = $field_info['taxonomyextra'];
    $table = key($field['storage']['details']['sql'][FIELD_LOAD_REVISION]);
    $node_types = db_select($table)->distinct()->fields($table, array('bundle'))
      ->execute()->fetchCol();

It seems like this could use a FieldEntityQuery in either count or limit-1 mode for each bundle separately, dropping the instance on no data and then no dropping the field if all instances had no data. Again, I guess it isn't necessary/important if the current code really is done.

Setting to CNW because unless I'm mistaken at least the {node} n2 vs {node_revisions} n2 is a real bug.

Log in or register to post comments

Comment #87

bjaspan commented 27 August 2010 at 10:46

Assigned:

» bjaspan

I'm working on this today.

Log in or register to post comments

Comment #88

pbuyle commented 27 August 2010 at 11:05

While testing the patch in #76 on a real D6 database, each taxonomy_update_7005 step took about 2min. Starting a transaction in taxonomy_update_7005 reduced each step of 1000 inserts to 2sec. I started the transaction at line 518 (just before starting the foreach loop). I guess it can be started at the begin of taxonomy_update_7005. Or if we don't want to run everything in taxonomy_update_7005 in a transaction, move the loop in an helper function to only do the taxonomy_index inserts in the transaction.

Log in or register to post comments

Comment #89

pfrenssen

Sofia

commented 27 August 2010 at 11:25

Assigned:

bjaspan

» Unassigned

small problem in the patch: $field_info is used on line 602:

    // Determine necessity of taxonomyextras field.
    $field = $field_info['taxonomyextra'];

but it is only defined when $sandbox['total'] is set. It caused my upgrade to fail. Moving it back up to the top of the function fixes this:

***************
*** 471,476 ****
--- 471,478 ----
   */
  function taxonomy_update_7005(&$sandbox) {
  
+   $field_info = field_info_fields();
+     
    // This is a multi-pass update. On the first call we need to initialize some
    // variables.
    if (!isset($sandbox['total'])) {
***************
*** 498,504 ****
      }
    }
    else {
-     $field_info = field_info_fields();
      $etid = _field_sql_storage_etid('node');
  
      // We do each pass in batches of 1000.
--- 500,505 ----

Log in or register to post comments

Comment #90

marvil07 commented 27 August 2010 at 12:01

Just to mention that relations between terms seems to not being conserved.

To reproduce it, just create a relation between two terms, upgrade and the relation is not going to be there.

Log in or register to post comments

Comment #91

mrfelton commented 27 August 2010 at 13:29

+++ modules/taxonomy/taxonomy.install
@@ -412,7 +412,54 @@ function taxonomy_update_7004() {
+
+  // Some contrib projects stored term node associations without regard for the
+  // selections in the taxonomy_vocabulary_node_types table.
+
+  // Allowed values for this extra vocabs field is every vocabulary.
+  foreach (taxonomy_get_vocabularies() as $vocabulary) {
+    $allowed_values[] = array(
+      'vid' => $vocabulary->vid,
+      'parent' => 0,
+    );
+  }

This part is failing my upgrade of a clean D6 site, where no vocbs have been defined. What should allowed values default to?

Powered by Dreditor.

EDIT: To clarify, the $allowed_values problem is only a notice, but still. And to confirm, the change in #89 gets me past this update hook.

Log in or register to post comments

Comment #92

mrfelton commented 27 August 2010 at 13:44

Status	File	Size
new	field_sql_storage.patch	707 bytes

I honestly don't know if this is related. But, with the patch applies, and the change made from #89. I get the following error:

Fatal error: Call to undefined function field_read_fields() in /home/tom/workspace/d7/modules/field/modules/field_sql_storage/field_sql_storage.install on line 39

The following patch resolves.

Log in or register to post comments

Comment #93

mrfelton commented 27 August 2010 at 14:34

More debugging:

For what ever reason, the site that I am upgrading has some entries in the term_node table, which relate to a non-existant node. This is screwing up the upgrade process with the following error:

Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'bundle' cannot be null: INSERT INTO {field_revision_taxonomyextra} (etid, entity_id, revision_id, bundle, language, delta, taxonomyextra_tid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => 1 [:db_insert_placeholder_1] => 176 [:db_insert_placeholder_2] => 497 [:db_insert_placeholder_3] => [:db_insert_placeholder_4] => und [:db_insert_placeholder_5] => 0 [:db_insert_placeholder_6] => 80 ) in taxonomy_update_7005() (line 569 of /home/tom/workspace/d7/modules/taxonomy/taxonomy.install).

so it is attempting to insert a record with the bundle field as null. This is because the following query in taxonomy.install tries to join term_node on node.

$query = 'SELECT td.vid AS vocab_id, td.tid, tn.nid, tn.vid, n.type, n2.created, n2.sticky, n2.nid AS is_current FROM {taxonomy_term_data} td INNER JOIN {taxonomy_term_node} tn ON td.tid = tn.tid LEFT JOIN {node} n ON tn.nid = n.nid LEFT JOIN {node} n2 ON tn.vid = n2.vid ORDER BY tn.vid, td.weight ASC';

Since there is no entry in the node table in this case, n.type comes back blank, and so when we attempt to insert into field_revision_taxonomyextra, the bundle field is blank - which causes the error.

Note: I can only get this far with the patches from #84 and #92 applied. I also have http://drupal.org/node/895014#comment-3379240 applied

EDIT: And just to confirm, that after doing all the above, and then manually removing those rouge entries from the term_node table just prior to running the upgrade, I am able to get all the way to the end of the upgrade process - for the first time all day :D

Log in or register to post comments

Comment #94

mrfelton commented 27 August 2010 at 15:38

+  // Allowed values for this extra vocabs field is every vocabulary.
+  foreach (taxonomy_get_vocabularies() as $vocabulary) {
+    $allowed_values[] = array(
+      'vid' => $vocabulary->vid,

I don't think we can use taxonomy_get_vocabularies() here. Not with the current state of things anyway. If someone has disabled taxonomy.module, the upgrade results in the following error:

Fatal error: Call to undefined function taxonomy_get_vocabularies() in /home/tom/workspace/d7/modules/taxonomy/taxonomy.install on line 420

Or at least, we need explicitly to load that function first.

Powered by Dreditor.

Log in or register to post comments

Comment #95

pfrenssen

Sofia

commented 27 August 2010 at 15:50

I discovered an issue when investigating this: the upgrade fails if taxonomy terms are present in the database but the taxonomy module was disabled in D6. If filed a new issue for this: #895386: Clean-up the upgrade path: taxonomy

Log in or register to post comments

Comment #96

bjaspan commented 27 August 2010 at 19:27

Status	File	Size
new	taxonomy-upgrade-706842-96.patch	665.36 KB

New patch. FYI, I declare myself very impressed with bangpound's initial patch from #76; several of the problems I "identified" in my review were not in fact problems.

Changes:

#85, #86: All of my correctly-identified problems are fixed, so anything from those comments was either not a problem or is fixed.
#89: Undefined $field_info problem fixed.
#91: I identified this issue in my review in #85, and it is fixed.
#93: term node relations to non-existence nodes are no longer migrated, so this is fixed.
#94, #95: Moved to #895386: Clean-up the upgrade path: taxonomy.

Listed as @todos:

#88: Wrapping a transaction around update 7005. This is a performance improvement that is not critical and can be a separate issue.
#90: Term relations are still not migrated.

Not done:

#92: mrfelton, can you reproduce this?

Log in or register to post comments

Comment #97

bjaspan commented 27 August 2010 at 19:35

Status	File	Size
new	taxonomy-upgrade-706842-97.patch	665.31 KB

re #90: Apparently core no longer supports term relations. taxo update 7001 renames term_relation table to taxonomy_term_relation, and beyond that core ignores it. So yes, the upgrade path does not preserve it.

I removed the @todo. New patch.

Log in or register to post comments

Comment #98

bjaspan commented 27 August 2010 at 19:39

Actually, there are more changes in #96/97:

I added new data to the D6 upgrade test database for nodes that have different terms associated to different versions. thought I was doing this to reveal a flaw in bangpound's update function but it turns out his code handled it perfectly. :-) So, this added small changes to generate-d6-content.sh, and ENORMOUS changes to drupal-6.filled.database.php because dump-database-d6.sh does not order things and I guess my system is dumping in a different order than the previous submitter's did.

With the removal of the @todo from #97, I think this code is ready for a followup review, then RTBC.

Log in or register to post comments

Comment #99

bjaspan commented 27 August 2010 at 19:40

Status:

Needs work

» Needs review

Oops, CNW. To be clear, it is the patch in #97 that is CNR and I think RTBC.

Log in or register to post comments

Comment #100

c960657 commented 28 August 2010 at 10:05

This is not a complete review, but it did notice two minor issues:

+    $query = 'SELECT td.vid AS vocab_id, td.tid, tn.nid, tn.vid, n.type, n2.created, n2.sticky, n2.nid AS is_current FROM {taxonomy_term_data} td INNER JOIN {taxonomy_term_node} tn ON td.tid = tn.tid INNER JOIN {node} n ON tn.nid = n.nid LEFT JOIN {node} n2 ON tn.vid = n2.vid ORDER BY tn.vid, td.weight ASC';
+    $result = db_query_range($query, $sandbox['last'], $batch);

The tn.vid-td.weight tuple is not guaranteed to be unique (AFAICT weight is not used in Tags taxonomies, and even in non-Tags taxonomies there may be duplicates due to weight being just a tinyint), so consecutive queries are not guaranteed to return the terms in the same order. This is required when iterating over the whole result using multiple range queries.

+    $this->assertEqual(sort(array_keys($vocabularies)), sort(array_keys($instances)), t('Node type page has instances for every vocabulary.'));

This does not do what it claims to to. sort() works by reference and just returns a boolean. But the keys of the two arrays are in fact equal, though.

Log in or register to post comments

Comment #101

bjaspan commented 28 August 2010 at 10:17

Status:

Needs review

» Needs work

Sounds like CNW to me.

Log in or register to post comments

Comment #102

ctmattice1 commented 28 August 2010 at 23:40

Tried #97 on a D6 upgrade after applying the patch in #891028: Orphan node types lose bodies on upgrade. After applying #97 compared the D6 taxonomy to newly update D7. It caught all of the terms and even one that was not showing up in D6 which should have. Still may not be complete though dblog is showing

Type php
Date Saturday, August 28, 2010 - 5:37pm
User ctmattice
Location http://d6d7.local/taxonomy/term/9/edit?destination=admin/structure/taxon...
Referrer http://d6d7.local/admin/structure/taxonomy/vocabulary_7
Message Warning: Invalid argument supplied for foreach() in filter_list_format() (line 637 of C:\sites\d6d7\modules\filter\filter.module).
Severity warning
Hostname 127.0.0.1

Log in or register to post comments

Comment #103

bjaspan commented 31 August 2010 at 18:45

Status:

Needs work

» Needs review

Status	File	Size
new	taxonomy-upgrade-706842-103.patch	666 KB

New patch, fixes both issues from #100, and includes a lengthy comment for one since it is not otherwise obvious what is going on:

    // This query must return a consistent ordering across multiple calls.  We
    // need them ordered by node vid (since we use that to decide when to reset
    // the delta counters) and by term weight so they appear within each node
    // in weight order. However, tn.vid,td.weight is not guaranteed to be
    // unique, so we add tn.tid as an additional sort key because tn.tid,tn.vid
    // is the primary key of the D6 term_node table and so is guaranteed
    // unique. Unfortunately it also happens to be in the wrong order which is
    // less efficient, but c'est la vie.
    $query = 'SELECT td.vid AS vocab_id, td.tid, tn.nid, tn.vid, n.type, n2.created, n2.sticky, n2.nid AS is_current FROM {taxonomy_term_data} td INNER JOIN {taxonomy_term_node} tn ON td.tid = tn.tid INNER JOIN {node} n ON tn.nid = n.nid LEFT JOIN {node} n2 ON tn.vid = n2.vid ORDER BY tn.vid, td.weight ASC, tn.tid';

Log in or register to post comments

Comment #104

berdir

German

Switzerland

commented 1 September 2010 at 01:08

I haven't really read the patch so I don't know how complex the queries actually are but I've confirmed that they pass on PostgreSQL (with the patch from the changeField() issue...) and SQLite... :)

Log in or register to post comments

Comment #105

chx commented 5 September 2010 at 15:48

Status:

Needs review

» Reviewed & tested by the community

I think we are ready to go. It'll probably need a slight fixup once the field helpers are in for updates BUT right now it's impossible for a normal user to test taxonomy update w/o this patch. Let's get this in. Even if we find problems later, that can be fixed on. This is an incredible good step ahead.

Log in or register to post comments

Comment #106

mlncn commented 5 September 2010 at 16:18

Tested with the real-world and devel generated databases i've been working with. The improved patch still works for them, too. Beautifully. Agree with chx that this should go in.

RTBC++

Log in or register to post comments

Comment #107

webchick

she/they

English

Vancouver 🇨🇦

commented 5 September 2010 at 17:00

Priority:	Critical	» Normal
Status:	Reviewed & tested by the community	» Needs work

AWESOME work, folks!!! So glad to see this one polished off. :)

The comments in some of these update functions are works of poetry.

Committed to HEAD. Here's some minor clean-up I spotted going through, which would be nice to have a quick follow-up patch for.

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test	31 Aug 2010 18:44:11 -0000
@@ -0,0 +1,145 @@
+   * Retrieve an array mapping allowed vocabulary id to field name for
+   * all taxonomy_term_reference fields for which an instance exists
+   * for the specified entity type and bundle.
...
+  function instanceVocabularies($entity_type, $bundle) {

This function name appears to be missing a verb. "getInstanceVocabularies"?

Additionally, is there a way to tighten up this doc in a summary that fits in one line?

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test	31 Aug 2010 18:44:11 -0000
@@ -0,0 +1,145 @@
+          if (empty($instances[$tree['vid']]) || $instances[$tree['vid']] == 'taxonomyextra') {

What is "taxonomyextra"? Is that a contrib module we're special casing? A comment here wouldn't hurt.

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test	31 Aug 2010 18:44:11 -0000
@@ -0,0 +1,145 @@
+    // Check that taxonomy_vocabulary_node_type and taxonomy_term_node have been
+    // removed.
+    $this->assertFalse(db_table_exists('taxonomy_vocabulary_node_type'), t('taxonomy_vocabulary_node_type has been removed.'));
+    $this->assertFalse(db_table_exists('taxonomy_term_node'), t('taxonomy_term_node has been removed.'));

Is this a good idea? Might contrib modules need to do something with the data in these tables?

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test	31 Aug 2010 18:44:11 -0000
@@ -0,0 +1,145 @@
+      debug("Testing node $nid");

Debug.

+++ modules/simpletest/tests/upgrade/upgrade.taxonomy.test	31 Aug 2010 18:44:11 -0000
@@ -0,0 +1,145 @@
+      // The first 12 nodes have two revisions. For nodes with
+      // revisions, check that the oldest revision is associated only
+      // to terms whose ID is equal to the node ID or 49 less the node ID.

Uh. That might explain exactly what the code below is doing, but doesn't help me at all with "why".

+++ modules/taxonomy/taxonomy.install	31 Aug 2010 18:44:11 -0000
@@ -412,7 +412,57 @@ function taxonomy_update_7004() {
+  // Some contrib projects stored term node associations without regard for the
+  // selections in the taxonomy_vocabulary_node_types table, or have more terms
+  // for a single node than the vocabulary allowed. We construct the
+  // taxonomyextra field to store all the extra stuff.
...
+      'description' => 'Debris left over after upgrade from Drupal 6',

Oh, ok. Now taxonomyextra is explained. Maybe just a pointer to it in that other place, then.

Also, what exactly does "store all the extra stuff" mean? The description here could use some work, to help inform people who end up with this field in their database what to do next.

+++ modules/taxonomy/taxonomy.install	31 Aug 2010 18:44:11 -0000
@@ -440,47 +520,164 @@ function taxonomy_update_7005(&$sandbox)
+        $vocabularies[$record->vid][$record->type] = 'taxonomy_'. $record->machine_name;

Concatenation.

+++ modules/taxonomy/taxonomy.install	31 Aug 2010 18:44:11 -0000
@@ -440,47 +520,164 @@ function taxonomy_update_7005(&$sandbox)
+      // Table and column found in the field's storage details. During upgrades,
+      // it's always SQL.
+      $table = key($field['storage']['details']['sql'][FIELD_LOAD_REVISION]);
+      $value_column = $field['storage']['details']['sql'][FIELD_LOAD_REVISION][$table]['tid'];

Interesting. So what do MongoDB people do?

+++ modules/taxonomy/taxonomy.install	31 Aug 2010 18:44:11 -0000
--- scripts/generate-d6-content.sh	30 Jul 2010 01:28:00 -0000	1.1
+++ scripts/generate-d6-content.sh	31 Aug 2010 18:44:11 -0000

This file could use a heck of a lot more comments to explain what it's generating and why.

As a side-note, it's weird to have a file that ends in .sh and have it not be shell script code, but PHP. Not a huge deal, either way, but odd.

+++ scripts/generate-d6-content.sh	31 Aug 2010 18:44:11 -0000
@@ -113,16 +121,26 @@ for ($i = 0; $i < 24; $i++) {
+  // Make every term association different a little. For nodes with revisions,

"a little different"

Powered by Dreditor.

Log in or register to post comments

Comment #108

webchick

she/they

English

Vancouver 🇨🇦

commented 5 September 2010 at 17:00

Issue tags:

-beta blocker

no longer a beta blocker. yay! :D

Log in or register to post comments

Comment #109

catch

he/him

English

commented 6 September 2010 at 05:44

At some point there's going to have to be a module/script to migrate from from field_sql_storage to mongodb, that'd make the D6/CCK -> D7/mongodb upgrade process D6/CCK -> d7/field_sql_storage -> mongodb, but it'd also account for D7sql->mongodb at the same time. It's going to be the same for any other pluggable storage like per-bundle too.

I wasn't able to give this issue half the attention I wanted to, so thanks to those who actually found time to work on it.

Log in or register to post comments

Comment #110

chx commented 6 September 2010 at 06:07

A migration script from SQL to any other field storage is very possible -- once you have upped to D7, it's not too hard, the biggest challenge is keeping the field names, you likely want to up the field definitions and then run bare SQL to load and run field_attach_save and batch the process. However, on upgrade it's next to impossible.

Log in or register to post comments

Comment #111

yched commented 6 September 2010 at 11:09

or, for a generic storage migration tool (no assumption on SQL as a starting point) :
update the field definitions,
[batch]
fetch field data with a direct call to [old_storage]_field_storage_load(),
save to new storage with field_attach_save(),
remove old data with [old_storage]_field_storage_purge()
[/batch]
remove old storage (e.g SQL tables) with [old_storage]_field_storage_purge_field()

Log in or register to post comments

Comment #112

andypost

he/him

Russian

commented 11 September 2010 at 03:48

Is Migrate module a better place to discus a field storage data migration?

Log in or register to post comments

Comment #113

catch

he/him

English

commented 8 June 2011 at 05:09

Title:	"taxo as field" update broken + wipes some node/term associations	» Improve comments for the taxonomy upgrade path
Category:	bug	» task
Priority:	Normal	» Minor
Issue tags:	-D7 upgrade path	+Novice

Keep seeing this and thinking it is still broken.

Log in or register to post comments

Comment #114

lionel.a commented 30 September 2011 at 13:29

Version:	7.x-dev	» 7.8
Priority:	Minor	» Normal

Hello,
I'm getting exactly the same message than post #55 when I upgrade from D6 to 7:
PDOException: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'base.format' in 'field list': SELECT base.tid AS tid, base.vid AS vid, base.name AS name, base.description AS description, base.format AS format, base.weight AS weight, v.machine_name AS vocabulary_machine_name FROM {taxonomy_term_data} base INNER JOIN {taxonomy_vocabulary} v ON base.vid = v.vid WHERE (base.tid IN (:db_condition_placeholder_0)) ; Array ( [:db_condition_placeholder_0] => 29 ) in DrupalDefaultEntityController->load() (line 196 of /xxxxx/includes/entity.inc).

But what I don't understand is that I've upgraded succesfully my multisite, except one site. This one is using taxomony and I got an error message while upgrade :

Update #7005
Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'node-64-78-0-0-und' for key 'PRIMARY': INSERT INTO {field_revision_taxonomy_vocabulary_16} (entity_type, entity_id, revision_id, bundle, language, delta, taxonomy_vocabulary_16_tid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => node [:db_insert_placeholder_1] => 64 [:db_insert_placeholder_2] => 78 [:db_insert_placeholder_3] => story [:db_insert_placeholder_4] => und [:db_insert_placeholder_5] => 0 [:db_insert_placeholder_6] => 30 ) in taxonomy_update_7005() (line 686 of /xxxxx/modules/taxonomy/taxonomy.install).

I still remain with 6 warnings, but not critical.

What is strange, is that I got exactly the same message for another of my sites, but I succeed to upgrade it (and then to apply migration fields).

Thanks if you get any idea,
lionel

---
Post edit : I finaly avoid this error by deleting all taxonomy and upgrading the site.

Log in or register to post comments

Comment #115

xjm

she/her

English

commented 18 November 2011 at 00:16

Status:

Needs work

» Closed (fixed)

#114: If this still occurs when upgrading to Drupal 7.9, I'd suggest searching for an open issue or filing a new one. This one was open for comment cleanup. Thanks!

I opened #1345420: Improve comments for the taxonomy upgrade path to avoid further confusion here, and since folks who might otherwise do this cleanup might be scared off by the 115 comments. :)

Log in or register to post comments

Improve comments for the taxonomy upgrade path

Comments