I deleted and bulk-rebuilt all aliases. Every alias type worked fine except taxonomy term paths.

The taxonomy term aliases rebuilt fine until the last set (that is, the last batch of items, whose size is set in the General section under "Maximum number of objects to alias in a bulk update"). To repeat and extend, here's what I learned...

1. The last "set" of bulk-generated taxonomy term aliases hung, with a white screen of death and a PHP timeout error: "PHP Fatal error: Maximum execution time of 240 seconds exceeded ... taxonomy.module on line 791".
2. All aliases generated fine except for the last taxonomy term, so it seems to hang on that term. Perhaps it's a loop-control issue, but I don't know.
3. Once only one term remained unaliased, I tried setting the maximum number of objects to alias to 1, but bulk generation still hung at that point.
4. I deleted and rebuilt the taxonomy aliases again using smaller batches. It worked until the last batch, which hung with the same timeout; all but the last term were aliased.

Hope this helps and that there's not an issue for it already.

Roadskater.net

Comments

greggles

Do you have a hierarchical taxonomy? Does it have circular or multiple parent hierarchies?

Roadskater.net

I must confess to an out-of-control taxonomy. I have 8 vocabularies (forums, defs, places, tags, and years, not counting ads, images, and userpoints), mostly focused on tags. Tags is hierarchical, and I have tried to use WordNet as a source of suggested structure, such as "abstractions" and "physical entities", but this has been occasional and inconsistent at best (there's no project that does this, and I'm still looking for an autotagging tool that uses a structure and has terms of service I can like...but I digress).

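vid | name   | relations | hierarchy | multiple | required | tags | module   | weight
--- | ------ | --------- | --------- | -------- | -------- | ---- | -------- | ------
1   | forums | 0         | 1         | 0        | 0        | 0    | taxonomy | -10
2   | places | 1         | 2         | 1        | 0        | 1    | taxonomy | 0
5   | defs   | 0         | 0         | 1        | 0        | 0    | taxonomy | 0
6   | tags   | 1         | 2         | 1        | 0        | 1    | taxonomy | 0
7   | years  | 0         | 1         | 1        | 0        | 1    | taxonomy | 0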

So I have a structure but allow freetagging, which of course leads to all sorts of impatiently made, inconsistent tags (when I'm lucky enough to get any from authors). As for circular references, I would not be surprised, but I've never looked for them. I suppose they could happen easily enough when using related terms or parent-child terms.

I'm not sure it will help, but here's what I found looking in the database. The unaliased term, which as of the bulk update has the highest tid in that vocabulary (tags):

* is not in term_relation as either tid1 or tid2 (neither is the term with the tid one lower, which was aliased).
* is not in term_synonym as either tid or tsid (nor is the tid one lower).
* is in term_node, but is not linked to the highest nid in the table, and not linked to the same nid as the tid one lower (which is also not linked to the highest nid).
* in term_hierarchy, both of these tids have parent 0.

Some of these facts are surely irrelevant, but if I knew which ones, I'd probably know how to dig further. I have NOT tried editing the node that contains this tag, adding another tag, or anything like that. I realize those might help but would not answer this question.

I now realize that while the last item is not aliased, it is probably not the only unaliased item. In the vocabularies owned by the taxonomy module, I have...

47 forum items
1,433 places
36 defs
1,627 tags
90 years
should be 3,233 taxonomy items

There's also 1 term in vid 8, for 1 image gallery.

Just looking at tags, or vid = 6...

SELECT * FROM `term_data` WHERE `vid` =6 yields 1,627 total

SELECT * FROM `url_alias` WHERE `src` LIKE '%taxonomy\\/term\\/%' yields 4,214 total (divided by 2 for the term page and feed aliases, so it looks like 2,107 terms were aliased)

SELECT * FROM `url_alias` WHERE `dst` LIKE '%tags\\/%' yields 1,276 total (or 638 vid 6 terms aliased, one alias for the term and one for the term's feed)

SELECT count(*) FROM `url_alias` WHERE `dst` REGEXP '^forums\\/'
47 (must not be generating forum feed urlaliases?)

SELECT count(*) FROM `url_alias` WHERE `dst` REGEXP '^defs\\/'
72 (2x36=ok)

SELECT COUNT( * ) FROM `url_alias` WHERE `dst` REGEXP '^places\\/'
2866 (2x1433=ok)

SELECT COUNT( * ) FROM `url_alias` WHERE `dst` REGEXP '^tags\\/'
1276 (should be 1627x2=3254)

SELECT COUNT( * ) FROM `url_alias` WHERE `dst` REGEXP '^years\\/'
0! (so it looks like the urlalias generation hung in tags and never got to years?).
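The counts above can be cross-checked with a few lines of Python. This sketch only re-does the arithmetic from the numbers in this comment; it assumes two aliases per term (term page plus feed), except for forums, where the result of 47 suggests one alias per term.

```python
# Counts reported above: terms per vocabulary (term_data) and aliases per dst prefix (url_alias).
terms = {"forums": 47, "places": 1433, "defs": 36, "tags": 1627, "years": 90}
aliases = {"forums": 47, "places": 2866, "defs": 72, "tags": 1276, "years": 0}

# Assumption: two aliases per term (page + feed), except forums, which show one per term.
ALIASES_PER_TERM = {"forums": 1}


def shortfall(terms, aliases):
    """Return {vocab: (expected, found, missing_terms)} for vocabularies short of aliases."""
    report = {}
    for vocab, n_terms in terms.items():
        per_term = ALIASES_PER_TERM.get(vocab, 2)
        expected = n_terms * per_term
        found = aliases[vocab]
        if found < expected:
            report[vocab] = (expected, found, (expected - found) // per_term)
    return report


print(sum(terms.values()))          # 3233 terms in total
print(shortfall(terms, aliases))    # {'tags': (3254, 1276, 989), 'years': (180, 0, 90)}
# The non-forum prefixes add up to the same total as the taxonomy/term/ query.
print(sum(n for vocab, n in aliases.items() if vocab != "forums"))  # 4214
```

So roughly 989 tags terms and all 90 years terms were never reached, which fits the idea that the run stopped partway through tags.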

I wonder if an illegal character or other data could cause the routine to stop? Probably not. I do see how a circular reference might get caught in a, uh, loop.
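A minimal sketch of how a self-parented term can hang a loop. It models term_hierarchy as a Python dict from tid to a single parent (0 meaning top level); this is an assumption for illustration, not the code in taxonomy.module, and the helper name parent_path is hypothetical. A naive "step up until the parent is 0" loop never exits for a term whose parent is itself; tracking visited tids turns that hang into an error.

```python
def parent_path(tid, parent_of):
    """Return [tid, parent, grandparent, ...] up to the top level.

    parent_of maps tid -> one parent tid, with 0 meaning "top level".
    A naive loop such as `while parent != 0: parent = parent_of[parent]`
    never ends if a term is its own parent; the `seen` set prevents that.
    """
    chain = [tid]
    seen = {tid}
    parent = parent_of.get(tid, 0)
    while parent != 0:
        if parent in seen:
            raise ValueError(f"circular hierarchy at term {parent}")
        chain.append(parent)
        seen.add(parent)
        parent = parent_of.get(parent, 0)
    return chain


# Illustrative data: 2212 ("food") is top level, 2126 ("honey") is its own
# parent as found later in this thread, and 2300 is a made-up child of 2212.
parent_of = {2212: 0, 2126: 2126, 2300: 2212}

print(parent_path(2300, parent_of))  # [2300, 2212]
try:
    parent_path(2126, parent_of)
except ValueError as err:
    print(err)  # circular hierarchy at term 2126
```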

Hope this helps someone. Perhaps there could at least be a way to check for circular references and either inform the user or skip the term and continue crawling the rest of the terms.
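As a sketch of such a check (not something Pathauto is known to do), Kahn's topological sort over the (tid, parent) rows of term_hierarchy finds every term that cannot be traced back to the top level. That covers self-parenting, longer loops, and terms descending from a loop, and it copes with multiple parents. The function name and every tid other than 2126 and 2212 are made up for illustration.

```python
from collections import defaultdict, deque


def unresolvable_terms(rows):
    """Return tids that cannot be traced back to the top level (parent 0).

    rows are (tid, parent) pairs as in term_hierarchy; a term may have several
    parents. Kahn's algorithm over child -> parent edges: a term is resolved
    once all of its parents are resolved. Whatever is left is on a cycle,
    descends from one, or points at a parent tid that has no row of its own.
    """
    pending = defaultdict(int)    # tid -> number of unresolved non-zero parents
    children = defaultdict(list)  # parent tid -> child tids
    for tid, parent in rows:
        pending[tid] += 0         # make sure every tid gets an entry
        if parent != 0:
            pending[tid] += 1
            children[parent].append(tid)

    queue = deque(tid for tid, n in pending.items() if n == 0)
    resolved = set()
    while queue:
        tid = queue.popleft()
        resolved.add(tid)
        for child in children[tid]:
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    return set(pending) - resolved


rows = [
    (2212, 0), (2300, 2212),
    (2500, 2212), (2500, 2300),  # two parents: fine, not a cycle
    (2126, 2126),                # self-parent, like "honey"
    (2400, 2126),                # child of a looping term
    (10, 11), (11, 10),          # two-term loop
]
print(sorted(unresolvable_terms(rows)))  # [10, 11, 2126, 2400]
```

Each row is visited a constant number of times, so this stays fast even for a few thousand terms.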

In any case thanks for all the work on Pathauto.

greggles

Title: Taxonomy Bulk Generation Hangs Times Out Does Not Alias Last Term » Taxonomy Bulk Generation Hangs Times Out Does Not Alias All Terms

That is very good research!

If you can identify the terms which cause the problem, try editing them at admin/content/taxonomy/edit/term/TID and when you save it should create the aliases.

If not, that page could be an easier place to debug what is going on.

Roadskater.net

I solved my problem, but perhaps this could be changed to a feature request. I'd like to suggest some simple integrity checks this module could run to alert users to basic taxonomy disasters. Failing that, perhaps there's a need for a taxonomy cleaner/advisor module?

I'd like to share one failure from my fumbling around, so it can perhaps save others some time. I used the Taxonomy Manager module (which I like) to export CSV so I could examine relations and use formulas in OpenOffice Calc to check for data nastiness (circular references, self-parenting). However, I didn't find anything.

It turns out that what I DID find later was NOT in the Taxonomy Manager CSV export, so this cost me some time tracking down the offending data. Perhaps Taxonomy Manager filters out bad records; I'm not sure.

Anyway, in an attempt to help others, here's how I found the answer.

I decided to go back to using MySQL queries. If you're new to this, you can try phpMyAdmin as a GUI to run searches and to browse and change records. This is not for everyone, so let's hope for some help from this module, or another one, so we're not doing by hand and eye what a computer could do easily.

THE FIX

I knew there were various errors that could cause circular references, but I started with perhaps the simplest: I looked for a term that was its own parent. This would have been an obvious thing to try earlier! So if Pathauto hangs with a white screen when generating taxonomy term aliases, try...

SELECT * FROM `term_hierarchy` WHERE tid = parent;

This yielded one entry with tid (and parent) 2126...
tid parent
2126 2126
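The same query can be tried against a throwaway SQLite table from Python. In this sketch, the rows other than 2126 and 2212 are made up. The second query is an addition not used in this thread: it catches two terms that are each other's parent. Neither query catches loops of three or more terms. Both statements are plain SQL that should also work in MySQL, but the example only runs them against SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER, PRIMARY KEY (tid, parent))"
)
conn.executemany(
    "INSERT INTO term_hierarchy VALUES (?, ?)",
    [(2212, 0), (2126, 2126), (10, 11), (11, 10)],  # 10 and 11 are made up
)

# The query from this thread: terms that are their own parent.
self_parents = conn.execute(
    "SELECT tid, parent FROM term_hierarchy WHERE tid = parent"
).fetchall()

# Addition: pairs of terms that are each other's parent (each pair reported once).
two_cycles = conn.execute(
    "SELECT a.tid, a.parent FROM term_hierarchy a "
    "JOIN term_hierarchy b ON b.tid = a.parent AND b.parent = a.tid "
    "WHERE a.tid < a.parent"
).fetchall()

print(self_parents)  # [(2126, 2126)]
print(two_cycles)    # [(10, 11)]
```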

To find out the name...
SELECT * FROM `term_data` WHERE tid =2126;

yielded...
tid vid name
2126 6 honey

Funny: bulk generation had stopped at the term alphabetically just before this one (Honda Civic LX). I had looked at the next term visible in the CSV export, but again, honey was not in the CSV export.

Out of caution I wanted to look for this tid in the other term-related tables. (This would be more important were I going to remove the item rather than change its parent.)

SELECT * FROM `term_node` WHERE tid =2126;
nid tid vid
681 2126 681
So this term only affects one node.

I looked for relations...
SELECT * FROM `term_relation` WHERE tid1 =2126 OR tid2 =2126;
Empty set so OK, no relations to worry over.

I looked for synonyms...
SELECT * FROM `term_synonym` WHERE tid = 2126;
Empty set so OK, no synonyms to worry about.

Since honey is food, at least in one context, I checked whether vocabulary 6 (tags) had a "food" tag that could serve as a parent.

SELECT * FROM `term_data` WHERE `name` LIKE 'food' AND vid =6;
tid vid name
2212 6 food

I changed the parent to 2212 for the item with tid of 2126...
UPDATE `drupaldb`.`term_hierarchy` SET `parent` = '2212' WHERE `term_hierarchy`.`tid` =2126 AND `term_hierarchy`.`parent` =2126 LIMIT 1
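A sketch of the same repair with a safety check, again using an in-memory SQLite table as a stand-in for the Drupal database. The helper reparent is hypothetical: it wraps the UPDATE in a transaction and rolls back unless exactly one row changed (standing in for MySQL's LIMIT 1, which SQLite's UPDATE does not accept by default). Afterwards, the self-parent query is re-run to confirm nothing is left.

```python
import sqlite3


def reparent(conn, tid, old_parent, new_parent):
    """Move one term_hierarchy row from old_parent to new_parent (hypothetical helper).

    `with conn` commits on success and rolls back if an exception escapes,
    so a mismatched row count leaves the table untouched.
    """
    with conn:
        cur = conn.execute(
            "UPDATE term_hierarchy SET parent = ? WHERE tid = ? AND parent = ?",
            (new_parent, tid, old_parent),
        )
        if cur.rowcount != 1:
            raise RuntimeError(f"expected 1 row for tid {tid}, updated {cur.rowcount}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER, PRIMARY KEY (tid, parent))"
)
conn.executemany("INSERT INTO term_hierarchy VALUES (?, ?)", [(2212, 0), (2126, 2126)])
conn.commit()

reparent(conn, 2126, 2126, 2212)  # honey now sits under food
print(conn.execute("SELECT tid, parent FROM term_hierarchy WHERE tid = parent").fetchall())  # []
print(conn.execute("SELECT parent FROM term_hierarchy WHERE tid = 2126").fetchone())  # (2212,)
```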

After changing the parent of the item to a valid tag id instead of the item's own tag id, Pathauto's bulk generation of terms worked. Thanks to greggles for the replies.

Again, I'd like to thank the maintainers for a great module, and to encourage those who fully and easily understand these relations to include some simple, fast-running checks for taxonomy integrity somewhere along the way, perhaps with links to simple instructions for fixing problems.

Pathauto is an essential assistant (for 60,000 users of 6.x-1.3 alone!) and is especially important for improving the user and administrator experience, and the success of websites built by those who choose Drupal. Ideally, every module should fail as gracefully as possible, whether or not it is responsible for any errors in the data it parses. Kudos to Pathauto, and I hope this info can contribute to some simple improvements. Perhaps the module could offer an optional verbose mode in its settings, so performance would not be affected when it is used with well-tuned data sets.

I hope this info helps someone. Also, I just did a search for circular references, so see also http://drupal.org/node/482304 for more clues.

greggles

Title: Taxonomy Bulk Generation Hangs Times Out Does Not Alias All Terms » Warn users (and don't try to alias) with circular hierarchies
Version: 6.x-1.3 » 7.x-1.x-dev
Category: bug » feature

Once again, awesome research.

I think we should watchdog and keep working if this case is identified.
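A sketch of the "watchdog and keep working" idea, in Python rather than Drupal PHP. Python's logging module stands in for watchdog, parent_of is an assumed tid-to-single-parent mapping, and make_alias is a hypothetical callback; none of this is Pathauto's actual code. Terms whose parent chain loops, including descendants of a looping term, are logged and skipped instead of hanging the batch.

```python
import logging

log = logging.getLogger("pathauto_sketch")


def bulk_alias_terms(tids, parent_of, make_alias):
    """Alias each term, logging and skipping any whose parent chain loops.

    parent_of maps tid -> single parent (0 = top level); make_alias is a
    hypothetical callback receiving the tid and its root-first path.
    Returns the list of skipped tids.
    """
    skipped = []
    for tid in tids:
        path, seen, current = [], set(), tid
        while current != 0 and current not in seen:
            seen.add(current)
            path.append(current)
            current = parent_of.get(current, 0)
        if current != 0:  # stopped on a revisited term: circular hierarchy
            log.warning("Skipping term %d: circular hierarchy at term %d", tid, current)
            skipped.append(tid)
            continue
        make_alias(tid, list(reversed(path)))
    return skipped


logging.basicConfig(level=logging.WARNING)
parent_of = {2212: 0, 2300: 2212, 2126: 2126, 2400: 2126}  # 2300 and 2400 are made up
made = []
skipped = bulk_alias_terms([2212, 2300, 2126, 2400], parent_of,
                           lambda tid, path: made.append((tid, path)))
print(made)     # [(2212, [2212]), (2300, [2212, 2300])]
print(skipped)  # [2126, 2400]
```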

klonos

...now remember to backport to 6.x if it still applies ;)

gilzero

subscribe

klonos

Issue tags: +Needs backport to D6

...as per #6, so we do remember.