In testing 4.2 today, I was using the user profile module. Then I decided to write my own custom user data module, and stop using the Drupal supplied profile module. That worked fine, but in testing it, I was examing the database. I noticed that all of the old data for the now-unused profile module is still in the "data" column of the "users" table.
So I spent some time thinking about this design, and why it makes it hard to get rid of data stored in a multi-element column like "data" in the "users" table. In a sense, it is serialized data, like that used by PHP sessions and by lots of other applications. The advantage to using this method for storing data is it makes it easy for the programmer to add new data fields because no database change is required. it also makes upgrading or installing patches a little bit easier, for the same reason.
However, I believe this is a bad method in general.
Software should be designed and developed with an eye to providing what the end users need, and to making their jobs easy. It should not be designed to make the programmers' jobs easier at a cost to the end user.
Unfortunately, using serialized, persistent data makes exactly that mistake. It is easier for the programmer, but for the person or people managing a web site, it makes their life more difficult because it wastes data space and reduces performance. This is the mistake that the profile module makes, and it appears to be a flaw in the core Drupal design as well, since the "users" table is part of the core.
The correct way to do this, in my view, is to use normalized database design. For example, in the case of the profile module, the core "users" table should contain columns only necessary for the core system to function. Serialized, persistent data should be avoided unless there is a set of very good arguments to storing persistent data that way (persistent meaning over several sessions). When an administrator enabled the profile module, it should result in the creation of an auxilliary profile table with a primary key tied to the primary key of the users table.
If the administrator no longer wanted to use the profile module, she could disable it, and optionally, delete the table in the database.
This has several advantages:
1. No database space is wasted on storing data that is no longer used as is currently the case.
2. Performance does not suffer from having to parse and carry data which will never be used.
3. The database will be easier to understand.
4. It may actually make it easier to develop modules.
Question: given the current structure, is there any easy way to clean out the old profile module data from the users.data column without having to write something to cleverly parse it out and rewrite the column for all users with only the data I want?
--
Slavica
Comments
normalization would be good, indeed
drupals database design isn't really a design, it's more ad-hoc database hacking. there has never been a systematic approach to it, no naming conventions, no ERD diagram (i really appreciate yours - tried this a few times myself, but always gave up when confronted with 40 unrelated entities ...), no normalization ... i would really appreciate if someone (you? ;)) could have a look at it and improve it.
about the profile.module and it's use of the users.data table: from a designers point of view, it definitely is better to have the profile data in an extra table. on the other hand, and as you noticed, it's easier for developers and administrators who want to add custom fields if it doesn't need this separate table. an interesting read regarding this is marco's (author of profile.module) first post about it:
as you see, we have gone with 2. maybe we should go with 1. (an admin interface for tables, e.g. the profile table, to be changed)? question is: who implements this? (hint hint ;))
Combine the best of both worlds?
about the profile.module and it's use of the users.data table: from a designers point of view, it definitely is better to have the profile data in an extra table. on the other hand, and as you noticed, it's easier for developers and administrators who want to add custom fields if it doesn't need this separate table.
Maybe we can combine the best of both worlds and use a SQL table like this:
... where
name
could be 'birthday', 'age', etc.Altering the
user
table at the SQL level might be a viable alternative but it could get hairy as you don't know in advance whether a certain field will exist or not. For example, you can't blindly do... WHERE u.age > 18 ...
as theage
field might not be present. Then again, this might not be a problem as most modules know what they are doing.Re: Combine the best of both worlds?
I like this approach, used something like it in the agents module too, so I could easily add new counters. It might work for the user module too, and for the node module? I (as just an avarge drupal module developer) am still in a doubt, for small tweaks, I would rather not use a second table, and have an extra data row to put (serialized) data into. But if I know I want to search the data, I would use a second table. Would like to see both options supported, or as you suggested the best of both worlds, I love how the variable and cache are working, a simple like wise hook to safe some extra user info (var=value) might be enough in many situations, the same for nodes. This simple hook could of course also be used for this second table, like you described.
Robert
The realm of Relational DBs
This is getting into the realm of creating a relational Database design. Which in my opinion is the best way to go. I've tackled this problem in the past with MySQL, where you need to have a variable number of elements or structure to a table, and the amount of data contained within that structure is also variable. To do that, I use 3 tables:
Here's an example which allows the User tables to be extended in a flexible manner (this is meta stuff, I don't want to sit down and figure out the actual structure!):
table user
uid
name
realname
website
...
...
table user_extended_attributes_fields
fid (eg Field ID)
name
default
table user_extended_attributes_data
uid
fid
value
The "default" row is there to define a default value, in the event that nothing user specific is provided, therefor the system has an "acceptable default" to fall back on if you want to use it.
Then the actual extended data has it's structure defined in a table that can grow/shrink as need arises, and the actual amount of user data that uses that extended system can grow/shrink.
Of course, your modules would need to check to see if a specific field in the extended_attributes table existed, and do the right thing, if it didn't (eg don't actually try and query against it).
Just a thought.
v/r
Shane
Database management for Drupal
In reply to:
question is: who implements this?
I simply use phpMyAdmin to manage my MySQL databse. It's got everything and more you'd ever want for managing the database. This could be included into the drupal system, with a custom config file that simply allows for management of the Drupal database. Then you don't have to write anything your self.
I simply install phpMyAdmin in my drupal install under a standard location in the document tree.