Hi, I have over 500K taxonomy terms, and this piece of code is killing me:
function pathauto_taxonomy() {
  ...
  // For all children generate new alias (important if [catpath] used)
  foreach (taxonomy_get_tree($category->vid, $category->tid) as $subcategory) {
    $count = _taxonomy_pathauto_alias($subcategory, $op);
  }
  ...
It easily puts 1000M of $count into my PHP memory. It's also very slow, taking 2-4 seconds to run.
What are we trying to achieve here? Can it be done more efficiently, that is, faster and with less memory?
Comments
Comment #1
Anonymous (not verified) commented
It's not $count, it's taxonomy_get_tree($category->vid, $category->tid) that is the culprit.
In my case I don't even have "children" / "subcategories", so this loop is not needed for me at all.
May I suggest a change in logic here:
Only run the taxonomy_get_tree($category->vid, $category->tid) loop if someone is actually using [catpath] (which I am not) and the term has children at all.
Then also place a warning in the documentation not to use [catpath] for large taxonomies.
That would prevent & solve my issue.
Comment #2
dave reid
There are many possible tokens that could be using the term's parent, so we can't just skip that chunk when [catpath] isn't used. This will be solved with #810294: Add an 'URL alias update queue' for entities associated with an updated entity, so I'm tempted to mark it as a duplicate.
Comment #3
Anonymous (not verified) commented
The thing is, the foreach(taxonomy_get_tree as $subcategory) did not iterate in my case. I mean, it had no results to begin with. Is there no check we can do before trying this foreach loop? Check if there are subcategories?
Otherwise go ahead. I guess the real problem is taxonomy_get_tree(), and I believe they are addressing this in D7 (just found this: #556842: taxonomy_get_tree() memory issues)
(But I'd at least mention this issue somewhere in the documentation perhaps, or as a help tip. Because it took me a day to find out,... ;-))
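A guard of the kind asked for above (check for subcategories before building the tree) might look roughly like this. It is only a sketch: example_realias_children() and both callables are hypothetical names, not Pathauto functions. In Drupal 6 the cheap lookup could presumably be a single-row query against the {term_hierarchy} table, and the expensive step would be the existing taxonomy_get_tree() loop.

```php
<?php
/**
 * Sketch: re-alias child terms only when the term actually has children.
 *
 * $count_children and $realias_tree are hypothetical callables. The first
 * stands in for a cheap "how many children does this term have?" lookup,
 * the second for the existing taxonomy_get_tree() loop. Neither is part
 * of Pathauto.
 */
function example_realias_children($tid, $count_children, $realias_tree) {
  // Cheap guard first: a term without children needs no tree at all.
  if (call_user_func($count_children, $tid) == 0) {
    return 0;
  }
  // Only now pay the cost of building and walking the subtree.
  return call_user_func($realias_tree, $tid);
}
```

With a guard like this, a term with no children never triggers the tree build, which is the case described in this comment.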
Comment #4
AntiNSA commented
subscribe
Comment #5
Yoran commented
Perhaps it is just my personal use of taxonomies, but a huge number of terms is generally linked with free tagging. And free tagging vocabularies don't have any hierarchy...
Following this idea, a solution could be to generate children URL aliases only when the associated vocabulary is not defined as a "free tagging" vocabulary. This can be done by modifying the code like this:
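A minimal sketch of such a check (not Yoran's actual patch, which is not shown in this thread). It assumes a Drupal 6 style vocabulary object whose tags property is non-zero for free-tagging vocabularies; example_should_alias_children() is a hypothetical helper name.

```php
<?php
/**
 * Sketch: decide whether child aliases should be regenerated.
 *
 * Assumes $vocabulary is a Drupal 6 style vocabulary object, where the
 * 'tags' property is 1 for free-tagging vocabularies and 0 otherwise.
 * The function name is hypothetical.
 */
function example_should_alias_children($vocabulary) {
  // Free-tagging vocabularies are treated as flat, so the child loop
  // (and its taxonomy_get_tree() call) is skipped for them.
  return empty($vocabulary->tags);
}
```

The caller would wrap the existing foreach in an if on this result. This is a heuristic: it skips the work for free-tagging vocabularies rather than proving that no term in them has children.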
Comment #6
rmjiv commented
I agree with Yoran. There needs to be some way of handling large free tag vocabularies. His solution works for me.
Comment #7
dave reid
Free tagged terms can still be organized into parents and children in the term listing screen, right?
Comment #8
rszrama commented
Necroing this sucker like a madman (as I recently worked on a site with 350k+ terms in a vocab).
Yes, even if you don't use a hierarchy, there's technically no way to forcefully disable hierarchical terms in core. Vocabularies have a hierarchy property on them, but it's purely reflective of whether or not any term in the vocabulary has one ($vocabulary->hierarchy = 1) or more ($vocabulary->hierarchy = 2) parents. It gets updated when the term form is submitted and a term's parents have changed or when a term is deleted.
I implemented a variety of workarounds on that site (including disabling changing parents on terms and deleting terms since they were remotely managed). I also had to work around this use of taxonomy_get_tree() in Pathauto by unregistering the hook and then selectively invoking it for unmanaged vocabularies:
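A rough sketch of that kind of workaround (not rszrama's actual code, which is not shown in this thread). It assumes Drupal 7, where hook_module_implements_alter() can remove another module's hook implementation, and it assumes Pathauto's term update implementation is named pathauto_taxonomy_term_update(). The module name "mymodule" and the vocabulary name "remote_catalog" are hypothetical.

```php
<?php
/**
 * Implements hook_module_implements_alter() (Drupal 7).
 */
function mymodule_module_implements_alter(&$implementations, $hook) {
  if ($hook == 'taxonomy_term_update' && array_key_exists('pathauto', $implementations)) {
    // Stop Drupal from invoking Pathauto's implementation directly.
    unset($implementations['pathauto']);
  }
}

/**
 * Hypothetical list of remotely managed vocabularies to skip.
 */
function mymodule_managed_vocabularies() {
  return array('remote_catalog');
}

/**
 * Implements hook_taxonomy_term_update().
 */
function mymodule_taxonomy_term_update($term) {
  // Remotely managed vocabularies never go through Pathauto's child update.
  if (in_array($term->vocabulary_machine_name, mymodule_managed_vocabularies())) {
    return;
  }
  // Everything else is handed back to Pathauto (assumed function name).
  if (function_exists('pathauto_taxonomy_term_update')) {
    pathauto_taxonomy_term_update($term);
  }
}
```

Because the alter hook removes only Pathauto's entry for this one hook, the rest of Pathauto keeps working, and the large vocabulary never reaches the taxonomy_get_tree() call on term update.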
Leaving it here in case it helps someone else and turning my attention toward the referenced issue. (I may also look at updating child terms without the use of taxonomy_get_tree().)
Comment #9
dave reid
#2114323: Field based aliases incorrectly generated for taxonomy term children when updating top level term has been committed, which switches to use taxonomy_get_children(). The rest will be handled by #810294: Add an 'URL alias update queue' for entities associated with an updated entity.