In testing 4.2 today, I was using the user profile module. Then I decided to write my own custom user data module, and stop using the Drupal supplied profile module. That worked fine, but in testing it, I was examing the database. I noticed that all of the old data for the now-unused profile module is still in the "data" column of the "users" table.

So I spent some time thinking about this design, and why it makes it hard to get rid of data stored in a multi-element column like "data" in the "users" table. In a sense, it is serialized data, like that used by PHP sessions and by lots of other applications. The advantage to using this method for storing data is it makes it easy for the programmer to add new data fields because no database change is required. it also makes upgrading or installing patches a little bit easier, for the same reason.

However, I believe this is a bad method in general.

Software should be designed and developed with an eye to providing what the end users need, and to making their jobs easy. It should not be designed to make the programmers' jobs easier at a cost to the end user.

Unfortunately, using serialized, persistent data makes exactly that mistake. It is easier for the programmer, but for the person or people managing a web site, it makes their life more difficult because it wastes data space and reduces performance. This is the mistake that the profile module makes, and it appears to be a flaw in the core Drupal design as well, since the "users" table is part of the core.

The correct way to do this, in my view, is to use normalized database design. For example, in the case of the profile module, the core "users" table should contain columns only necessary for the core system to function. Serialized, persistent data should be avoided unless there is a set of very good arguments to storing persistent data that way (persistent meaning over several sessions). When an administrator enabled the profile module, it should result in the creation of an auxilliary profile table with a primary key tied to the primary key of the users table.

If the administrator no longer wanted to use the profile module, she could disable it, and optionally, delete the table in the database.

This has several advantages:
1. No database space is wasted on storing data that is no longer used as is currently the case.
2. Performance does not suffer from having to parse and carry data which will never be used.
3. The database will be easier to understand.
4. It may actually make it easier to develop modules.

Question: given the current structure, is there any easy way to clean out the old profile module data from the users.data column without having to write something to cleverly parse it out and rewrite the column for all users with only the data I want?

--
Slavica

Comments

ax’s picture

drupals database design isn't really a design, it's more ad-hoc database hacking. there has never been a systematic approach to it, no naming conventions, no ERD diagram (i really appreciate yours - tried this a few times myself, but always gave up when confronted with 40 unrelated entities ...), no normalization ... i would really appreciate if someone (you? ;)) could have a look at it and improve it.

about the profile.module and it's use of the users.data table: from a designers point of view, it definitely is better to have the profile data in an extra table. on the other hand, and as you noticed, it's easier for developers and administrators who want to add custom fields if it doesn't need this separate table. an interesting read regarding this is marco's (author of profile.module) first post about it:

- webmaster defined user properties: for example I need to let users set their football team (quite a popular sport here :); I'd use them in themes, for example. Possible approches:
  1. provide a way to alter users table (or an extra-properties specific table) via administration; I mean "alter table" in the sql way; have an admin interface to edit these extra properties: add/del properties; properties should have name, type (some types could need an extra info, such as set/select, or textfield length), default value, description; control layout in some way (advanced).
  2. provide the same admin interface but store extra properties in a serialized array in sql blob. easier (no db table bothering) but less flexible, more difficult to have stats or do strange queries
  3. don't provide a user interface and use module defined properties (as defined above): if a webmaster wants more properties he would define a new module. storing as 1 or 2.
  4. don't do this patch :) seriously, one could just custom patch. I tried in v3 adding 4 custom function calls in user.inc/User and account_info_edit, account_info_save, account_user in account.php and it was easy

as you see, we have gone with 2. maybe we should go with 1. (an admin interface for tables, e.g. the profile table, to be changed)? question is: who implements this? (hint hint ;))

Dries’s picture

about the profile.module and it's use of the users.data table: from a designers point of view, it definitely is better to have the profile data in an extra table. on the other hand, and as you noticed, it's easier for developers and administrators who want to add custom fields if it doesn't need this separate table.

Maybe we can combine the best of both worlds and use a SQL table like this:

CREATE TABLE profile (
  uid int(10) unsigned default '0',
  name VARCHAR(255) DEFAULT NULL,
  value text DEFAULT NULL
);

... where name could be 'birthday', 'age', etc.

Altering the user table at the SQL level might be a viable alternative but it could get hairy as you don't know in advance whether a certain field will exist or not. For example, you can't blindly do ... WHERE u.age > 18 ... as the age field might not be present. Then again, this might not be a problem as most modules know what they are doing.

Robert L’s picture

I like this approach, used something like it in the agents module too, so I could easily add new counters. It might work for the user module too, and for the node module? I (as just an avarge drupal module developer) am still in a doubt, for small tweaks, I would rather not use a second table, and have an extra data row to put (serialized) data into. But if I know I want to search the data, I would use a second table. Would like to see both options supported, or as you suggested the best of both worlds, I love how the variable and cache are working, a simple like wise hook to safe some extra user info (var=value) might be enough in many situations, the same for nodes. This simple hook could of course also be used for this second table, like you described.

Robert

shane’s picture

This is getting into the realm of creating a relational Database design. Which in my opinion is the best way to go. I've tackled this problem in the past with MySQL, where you need to have a variable number of elements or structure to a table, and the amount of data contained within that structure is also variable. To do that, I use 3 tables:

  1. the table that defines the core fields and values
  2. the table that defines the extended fields and a default
  3. the table that contains the extended content, as referenced by the extended fields definition

Here's an example which allows the User tables to be extended in a flexible manner (this is meta stuff, I don't want to sit down and figure out the actual structure!):


    table user
      uid

      name

      realname

      website

      ...

      ...

    table user_extended_attributes_fields

      fid (eg Field ID)

      name

      default

    table user_extended_attributes_data

      uid

      fid

      value

The "default" row is there to define a default value, in the event that nothing user specific is provided, therefor the system has an "acceptable default" to fall back on if you want to use it.

Then the actual extended data has it's structure defined in a table that can grow/shrink as need arises, and the actual amount of user data that uses that extended system can grow/shrink.

Of course, your modules would need to check to see if a specific field in the extended_attributes table existed, and do the right thing, if it didn't (eg don't actually try and query against it).

Just a thought.

v/r
Shane

shane’s picture

In reply to:

    question is: who implements this?

I simply use phpMyAdmin to manage my MySQL databse. It's got everything and more you'd ever want for managing the database. This could be included into the drupal system, with a custom config file that simply allows for management of the Drupal database. Then you don't have to write anything your self.

I simply install phpMyAdmin in my drupal install under a standard location in the document tree.