By robertdouglass on
It must be late at night. I'm stumped by what should be a really simple query. Here's the db schema and data:
mysql> select * from mms_node;
+-----+--------+-------+
| nid | mms_id | realm |
+-----+--------+-------+
| 1 | 20 | 2 |
| 1 | 4 | 1 |
| 6 | 7 | 2 |
| 6 | 1 | 1 |
| 7 | 10 | 2 |
| 7 | 2 | 1 |
| 5 | 8 | 2 |
| 5 | 1 | 1 |
+-----+--------+-------+
mysql> select * from manufacturer;
+----+------------+
| id | name |
+----+------------+
| 1 | Alabama |
| 2 | Arkansas |
| 3 | Canada |
| 4 | Delaware |
| 5 | California |
+----+------------+
mysql> select * from model;
+----+--------+----------+
| id | man_id | name |
+----+--------+----------+
| 6 | 1 | Aardvark |
| 7 | 1 | Anteater |
| 8 | 1 | Animal |
| 9 | 2 | Albino |
| 10 | 2 | Banana |
| 11 | 2 | Book |
| 12 | 5 | Cookie |
| 13 | 5 | Dog |
| 14 | 3 | Canoe |
| 15 | 5 | Eagle |
| 16 | 3 | Elefant |
| 17 | 4 | Duck |
| 18 | 4 | Dingo |
| 19 | 4 | Dragon |
| 20 | 4 | Dolphin |
+----+--------+----------+What I want is this:
+-----+-------------------+------------+
| nid | manufacturer_name | model_name |
+-----+-------------------+------------+
| 5 | Alabama | Animal |
| 6 | Alabama | Anteater |
| 7 | Arkansas | Banana |
| 1 | Delaware | Dolphin |
+-----+-------------------+------------+The query that I thought would get this is this:
SELECT mms_node.nid,
manufacturer.name AS manufacturer_name, model.name model_name
FROM mms_node mms_node
LEFT JOIN manufacturer manufacturer ON mms_node.mms_id = manufacturer.id
LEFT JOIN model model ON mms_node.mms_id = model.idThis query returns this result set:
+-----+-------------------+------------+
| nid | manufacturer_name | model_name |
+-----+-------------------+------------+
| 5 | Alabama | NULL |
| 6 | Alabama | NULL |
| 7 | Arkansas | NULL |
| 1 | Delaware | NULL |
| 6 | NULL | Anteater |
| 5 | NULL | Animal |
| 7 | NULL | Banana |
| 1 | NULL | Dolphin |
+-----+-------------------+------------+Note that the use of LEFT JOIN (instead of INNER) is mandatory for me because this query will ultimately be built by the Views module which doesn't yet support INNER JOINs. There is the option of IS NOT NULL.
So help me out. How do I get the result set I want? And can someone explain to me where my thinking broke down in building the query above? Thanks!
| Attachment | Size |
|---|---|
| man_mod.zip.txt | 802 bytes |
Comments
SQL dump attached
http://drupal.org/files/man_mod.zip.txt
You'll have to remove the .txt extension.
- Robert Douglass
-----
My Drupal book: Building Online Communities with Drupal, phpBB and WordPress
Sorry
I am not sure if this is poor DB design or if I don't understand the scenario.
----
http://LandCondos.com
A couple of things
Regarding the SQL, given the data you have defined I do not think you can get the results you expect. I see two issues, first in the sql it's self, you are using mms_id to join to both manufacturer.id and model.id which logically does not make sense since mms_id should match one or the other ids but not both. Given the data you have I think you want this for the SQL
This assumes the mms_id in mms_node is the same as model.id and it joins them as such. It then joins model and manfacturer using model.man_id and manufacturer.id
The second concern is there are twice as many entries in mms_node as I would expect from what you show for expected results. Since you show each nid only once in the expected results, I would expect to only see each need in mms_once also. I would also expect mms_id to correspond to model.id.
compensate for LEFT JOINS
It *was* late at night =) The only addition to the SQL above is the compensation for the unnatural LEFT JOIN:
- Robert Douglass
-----
My Drupal book: Building Online Communities with Drupal, phpBB and WordPress
Why track each term<->node relation?
I'd like to address this concern:
Lets assume this Drupal taxonomy:
In Drupal, this would generate the following widget:
Since the goal is choosing one manufacturer-model pair (and only those that logically go with each other), this widget is just fine. Except that if you choose Model Y, only the term id for Model Y is saved to the Drupal term_node table. No information about Manufacturer 2 is saved whatsoever. In Drupal, if you needed a node with a relationship to Model Y to also have a relationship to Manufacturer 2 (at the database level), you would need to handle it programmatically (use hook_taxonomy to look up Model Y's parent and do another insert into the term_node table), or let the user interface select more than one term:
The problem with the UI solution is that you can get bogus data from the user:
So my data model solves these problems by tracking the node-manufacturer relationship and the node-model relationship in the mms_node table. This is great when I want to ask "What nodes belong to Manufacturer Q?". Using Drupal's taxonomy model, I'd have to get *all* of the models, select all of their manufacturers and join on nid. Using my model I just do this:
- Robert Douglass
-----
My Drupal book: Building Online Communities with Drupal, phpBB and WordPress
Why this data model?
Seems like people are wondering why I am using a data model that looks like this. The data is filling a role that the Drupal taxonomy system cannot handle, namely, the case where a hierarchical vocabulary grows to have tens of thousands of terms. In such a scenario, the position of the term in the hierarchy is hard to calculate using Drupal's model because the term_hierarchy table only maintains a parent-child relationship and there is no way to easily know how many ancestors a term has:
The result is that to get all terms with a depth of 2, you have to select the entire table, loop over it a couple times, build the hierarchy in PHP and then return the PHP array with those terms that end up having the right number of parents. We call this taxonomy_build_tree. That explains the 'realm' field in the mms_node table from my original post.
But more efficiency was desired in my design. In the application that this is designed for, the query "Get all models with parent manufacturer = X" is very common. To express that with Drupal taxonomy, one is dependent on the term_data table:
The query that expresses "Get all models with parent manufacturer = X" is this:
It's not a horrible query, but with it you can guarantee neither that the results are at the "model" level of the hierarchy nor that the parents are at the "manufacturer" level. You are dependent on the people using the system to never make "General Motors" a child belonging to "Ford". To be sure at the query level that you are getting only terms that are really at the correct depth you have to add the 'realm' (or I could have called it 'depth') field to the term_hierarchy table, and then it would look like this:
My data model it reduces to this:
So now at least you can see the motivation for such a schema.
- Robert Douglass
-----
My Drupal book: Building Online Communities with Drupal, phpBB and WordPress