Closed (won't fix)
Project:
Drupal core
Version:
4.6.x-dev
Component:
database system
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
2 Aug 2005 at 15:41 UTC
Updated:
14 Nov 2009 at 21:28 UTC
Jump to comment: Most recent file
Comments
Comment #1
killes@www.drop.org commentedShowing the comments is only supported as of mysql 4.1, but earlier versions will simply ignore it.
http://dev.mysql.com/doc/mysql/en/create-table.html
Comment #2
Bèr Kessels commentedfrom the manual:
"A comment for a column can be specified with the COMMENT option. The comment is displayed by the SHOW CREATE TABLE and SHOW FULL COLUMNS statements. This option is operational as of MySQL 4.1. (It is allowed but ignored in earlier versions.)"
So for example
Comment #3
killes@www.drop.org commentedHere is an initial patch for the first two tables. It is embarrassing to admit that I actually needed to read the code to understand some of the columns...
Comment #4
sepeck commentedcomments in the files is a good and encouraged thing I thought, so +1 for me.
Comment #5
Cvbge commentedI'd prefer if the comments were done with --, so for example:
Yes, it won't be displayed by the mysql, but you can allways look in the database.mysql.
Comment #6
Zen commentedkilles: is this such a good idea to document everything in the DB? There seem to be character restrictions, possible MySQL version issues [I've not seen field comments in MySQL < 5], no pgSQL support etc. etc. I think it might be better to prepare proper documentation in drupal.org and maybe provide the link in each table's comments section. This might also be best for any translations etc.
Zen
Comment #7
killes@www.drop.org commentedZen: The idea is that we put the comments into database.mysql and then parse that file, extract the comments, and put them on drupaldocs.org. Comments are shown for mysql 4.x where x = 0 or 1, not sure. Postgres does not support this, but since the tables are identical, I cannot see any harm in that.
Adding commented tables into the modules would work, too, but I prefer to have everything in one file.
Comment #8
webchickJust a note that tables can also be commented. For example:
CREATE TABLE access (
aid tinyint(10) NOT NULL auto_increment COMMENT 'unique ID for each accesslog entry',
mask varchar(255) NOT NULL default '' COMMENT 'simply regular expression in MySQL style to match access requests',
type varchar(255) NOT NULL default '' COMMENT 'type of accessrule',
status tinyint(2) NOT NULL default '0' COMMENT 'access type: 0 = deny, 1 = allow',
PRIMARY KEY (aid)
) TYPE=MyISAM COMMENT='Stores user account access rules' ;
This would be great since it's not immediately obvious what some of these tables are used for.
I'm very interested in this patch and will try and do some work on it next week or so.
Comment #9
harry slaughteri'd like to see references explained too. this could probably most easily be done with diagrams.
here's an example of a drupal db diagram created with a CASE tool i'm playing with (don't look at it too closely, it was generated in a few seconds without any tweaking): http://slaughters.com/images/drupal_schema.png
being a visual person, i find these charts very useful for quick reference. of course they don't included field definitions, but they aren't meant to. we could probably create cool posters from this image when it's finished (look ma, no DESCRIBEs! :)
i'll play with this portion as i couldn't help define any but the most obvious fields in the actual schema.
ps - i believe mysql comments are limited in size (may be wrong) and may not be the best place to document (esp since you can't link to other info, etc... I like the idea of a separate document that describes the schema.
these case tools i'm looking at may be of some use in that way. i'll post here if i find anything interesting.
the tools i've just discovered (that seem exceptionally cool so far) come from here: http://www.microolap.com/
Comment #10
webchickFor diagramming, also give DB Designer a try. Though at times it can be a bit clunky, it's a nice piece of software for doing that kind of thing.
I agree that diagrams are nice for visual people, as is a separate page with really drawn-out explanations of what each field does if required, but it is no substitute for documenting the code itself. The reason for that is that is each time a new field is added, we need to track down "that guy with the diagramming software" and ask him to do a new diagram up, or "that girl who administers the db schema handbook page" and ask her to update it. Realistically, this isn't going to happen, it'll constantly be a catch-up game on the part of those people maintaining the other resources. Versus 'I can't add a field/table to database.mysql unless it has a comment attached' guarantees that at least *some* level of accurate portrayal of the current picture of the database is there.
Comment #11
niooi commentedYour DB visualisation is excellent. By the way this file needs to be put inside the core CVS. Clients want to have SQL schema as part of their technical specifications and your graph is very helpful. Thank you. Could you submit the Microolap file or put it in the CVS? The advantage of DBDesigner over Microolap is its uses of a xml files that are easier to maintain in a CVS environment.
Niooi
Comment #12
harry slaughterForiegn Keys
There are basically two good ways to handle FKs. 1: make literal FKs using constraints. 2: using a naming convention for PKs/FKs that makes relationships fairly obvious.
MyISAM doesn't currently support real foriegn keys, and because the schema is full of duplicate key names, there's no way to intuitively determine FK relationships.
Field Comments
mysql comments are restricted to a small number of words, so they aren't really suitable for documenting fields.
So what do we do?
There are many very cool DB tools that allow for management of DB documentation, including ERM diagrams (my pet fave). But all that I've seen are fairly 'proprietary', so they wouldn't really be suited to our environment. There are some options I haven't looked into, namely OpenOffice Base. OOBase supports mysql via ODBC/JDBC and supposedly creates ERM. This might be an option for us.
Meanwhile, I think we should begin the documentation with some manageable open format.
Here's a link to a data dictionary created with phpmyadmin:
http://slaughters.com/drupal_data_dictionary.html
FK Links were documented using phpmyadmin (with PMA tables enabled). We could use this HTML as a basis for our document, filling in the comments field with complete documentation of fields. A bit ugly, but also very complete and can be managed by the group.
Here's a similar document in PDF format (also created with PMA):
http://slaughters.com/drupal_data_dictionary.pdf
And here's an example of an ERM diagram (but because of duplicate key names, my tool could not reverse engineer FKs accurately):
http://slaughters.com/images/drupal_schema.png
At this point, I think one of us should take charge of this documentation effort and simply decide how it's going to be. This person would manage the official document and solicit feedback, etc...
Comment #13
bslade commentedSee my forum posting, Possible job offer: how much to document the Drupal database schema?
Ben in DC
PublicMailbox at BenSlade dot com
Comment #14
niooi commentedWhat is the reasoning for not including FKs?
Thanks
Comment #15
tostinni commentedAs mentionned previously, MySQL ISAM tables doesn't support FK...
Comment #16
niooi commentedHello,
But why using MyISAM if MySQL supports Innodb with Foreign Keys? Was Drupal database developped with MyISAM in head?
Last question: How would you call Drupal's way to "serialize" data for custom fields (eg: profile_fields and profile_values)? Is this serialization a bottleneck compared to strucutred tables?
Thanks,
Niooi
Comment #17
tostinni commentedInnoDB was not decided to be implemented when Drupal first went off.
Then it was discussed pretty much about adding such support, but regarding the numerous problem users encounters with MySQL scheme (they don't get granted for LOCK and creation of temporary tables), I think this move is very difficult to do (very few host support InnoDB tables in my mind).
There's quite some discussion about this here ;)
Comment #18
webchickI think a three-pronged approach to this would be good:
1. Comments added to the MySQL file, per killes' method. Even if those comments say "See http://www.drupal.org/node/12345" as a field description (thus abiding by the < 60 chars rule). No new changes to database.mysql until the additions/changes are commented. That way we always retain a base level of documentation about each field in the database.
2. A mysqldump of hlslaughter's pma_relation table which he's used to output the data dictionary. Coupled with comments on every field, this will be an extremely useful document, and the mysqldump will ensure that someone from the docs team can keep this updated if hlslaughter doesn't have time. Both the mysqldump and the HTML output should be stored in CVS somewhere (probably contrib/docs/developer/db or something) so they can be diff'ed version controlled and such.
3. The XML output of an E-R diagram done up on DBDesigner, located in the same spot on CVS. This is a free, GPL'd program which runs on both windows and linux, so we don't have to rely on one individual who could afford ProprietaryModeler 5.0 to do the diagram. It's a bit clumsy to work with at first, but once you get used to it it's not too bad. I've attached a sample output of the quiz module so you can see how it looks. DBD will attempt to make the relations itself based on key names, but you'll need to tweak the rest to get it the way you want it. The nice thing though is if this XML diagram is centralized, a few of us can work on it at the same time.
Comment #19
webchickJust as an update, I've created http://cvs.drupal.org/viewcvs/drupal/contributions/docs/developer/database/ as a centralized working point for database documentation.
However, this will work best if we *don't* have 80,000 people submitting updates at the same time. ;) Therefore, please submit corrections/changes to this thread (or, if that's not acceptable, I'll start a new issue in the documentation project for this) and either myself or hlslaughter will get it updated.
Things that need to be done:
1. Check over all the relations and make sure they're correct (there are still a couple outstanding ones I'm not sure about).
2. Document database.mysql with comments, per killes's original post. This is A LOT of work.
3. Identify the "trouble" fields that can't be summed up in 60 characters or less. I've created a new handbook page under the Module Developer's Guide > Reference for database documentation (http://drupal.org/node/38977), and we can centralize documentation of the "tricky" bits here. Then the only comment in these fields will be a reference to the handbook page.
4. Generate an E-R model in DBDesigner. Probably doesn't make sense to do this until 4.7 RC is out, since database changes are still very possible before then (and even after, but less likely). Once finished, upload the XML output to the CVS repository where we can keep track of changes.
So at the end of the day, we'll have 3 documents:
1. An E-R model for the more "visual" folks.
2. A data dictionary which provides thorough documentation of every table and field
3. A fully documented database.mysql file that STAYS documented, for those who just want to take a quick peek and remind themselves what 'somefield' is for.
So if you'd like to help out with anything on that list, please by all means. ;) Once we all of get this done, it will be pretty trivial to maintain -- just a matter of keeping up with the changes to database.mysql.
Comment #20
Zen commentedAs in my previous comment, I'm not a big fan of adding everything in the SQL files. The best way to do this imo is to just add a link in the COMMENT field for the table, linking to the pages on drupaldocs.org. So the table declarations would be something like:
CREATE TABLE access (
aid tinyint(10) NOT NULL auto_increment,
mask varchar(255) NOT NULL default '',
type varchar(255) NOT NULL default '',
status tinyint(2) NOT NULL default '0',
PRIMARY KEY (aid)
) TYPE=MyISAM COMMENT='http://drupaldocs.org/dbd/4.7/#table-access';
This will greatly reduce the amount of maintenance, make things more consistent as everything is in one place etc. etc. The diagrams can be linked from the drupaldocs pages.
Over the last couple of weeks, I've been .. sedately preparing the dictionary page, which might help as a good base for this. The entire thing is scripted (locally) and the comments are populated via an array. I upload the rendered page to the site.. Still under heavy construction, and I will need help with the tables I'm unfamiliar with..
Also, with the diagrams I don't think they are going to be useful unless they are accurate. As mentioned previously, the current naming conventions don't help when it comes to identifying foreign keys. Perhaps the pgSQL DB can be used as a base for the diagrams? Also, will it be a good idea to add FOREIGN KEY, CHECK and REFERENCE syntax (to the mysql db) for completeness? I'm unsure of 3.23 compatibility..
I also distinctly remember the DBDesigner project being shelved in favour of another similar but official product from MySQL AB - looks like they're back :)
Cheers
-K
Comment #21
harry slaughteri guess interest in this waned.
i'd still like to see this and am willing to help.
i think zen's idea is good. if we simply created a link as the comment for each table and pointed it to something like /db/schema/4.7/tablename, and used a standard format for each table definition (like: http://www.quillem.com/drupal/db-dictionary ), we'd at least have a place for folks to put the data that is known about each table. we might even extend it to include non-core modules, authors could contribute schema definitions for there own tables.
Comment #22
killes@www.drop.org commentedI am still yvery much interested in this.
I think the way we should proceed is as follows:
1) commit an annotated version (not neccessarily complete) to contrib cvs in a place where api.drupal.org can pick it up
2) educate api.module to parse the mysql comments and present them in a way usefull to api.drupal.org.
Comment #23
john.currier commentedHave you guys looked at using an open source tool like SchemaSpy (http://schemaspy.sourceforge.net) to document your database schema? And no, it doesn't currently display comments in the generated html, but I'm working on that now.
Comment #24
killes@www.drop.org commentedJohn, we'd be delighted to use your tool, however I think it will have problems with the fact that we don't declare foreign keys, etc. Could you generate us some example output from our database.mysql file?
Comment #25
wulff commentedhttp://www.ratatosk.net/tmp/schemaspy/
Comment #26
killes@www.drop.org commentedThanks wulff!
So I guess we need to wait untill either we add foreign keys or John figures out a way to use some pseudo foreign keys which are stored in comments.
Comment #27
Zen commentedIs there per-field/table comment support? I didn't see any in the examples.
Thanks :)
-K
Comment #28
killes@www.drop.org commentedWell, since we noe ship with an extra file for mysql 4.1 and up we coul dtake a copy of that, add foreign keys, and store it in contrib/docs. Then we would need to install John's software on api.drupal.org to display it.
Comment #29
webchick> Well, since we noe ship with an extra file for mysql 4.1 and up we coul dtake a copy of that, add foreign keys, and store it in contrib/docs.
Sorry, but -1. :( See the long list of Forms API reference bugs for a shining example of why you don't want to have docs for stuff on their own, separate from source code-generated ones. One person is going to get socked with the responsiblity to keep it updated and when they get busy the docs are going to fall by the wayside and become inaccurate, causing confusion.
The only way, imo, for this to work is to embed the documentation *in the source file* because only then will we be able to ensure it gets updated (no new columns/tables will be allowed to get created without also having documentation -- same as we do with PHPDoc comments and new functions).
Harry Slaughter and I worked on this problem a bit late last year, using PHPMyAdmin's pma_relation table to store the primary/foreign key relationships. (see: http://cvs.drupal.org/viewcvs/drupal/contributions/docs/developer/database/). How this works is it embeds in comments how the tables are related, for example:
--
-- RELATIONS FOR TABLE accesslog:
-- sid
-- sessions -> sid
-- uid
-- users -> uid
--
Because these will be in the same format all the time, they could probably be parsed out with a tool such as schemaspy.
Comment #30
john.currier commentedZen, you're right, it doesn't have comment support yet. I've got it all written, but I'm having trouble with MySQL's JDBC driver not returning table comments (it returns column comments though).
And yes, without true foreign key constraints SchemaSpy won't be of much use for analyzing relationships. If the primary keys had unique names that matched the column names that referred to them then its implied relationship logic would 'connect the dots.' The Columns page (http://www.ratatosk.net/tmp/schemaspy/columns.html), however, shows that your primary key names are far from unique. That page does, however, show some interesting inconsistencies that you might want to look into.
Take a look at the filter_formats table (http://www.ratatosk.net/tmp/schemaspy/tables/filter_formats.html) to see what the implied relationships can try to deduce from column names. You'll need to select the 'Implied relationships' checkbox to see anything useful.
If you store the relationships in some non-standard way then you'd have to retrieve them with some customized version of SchemaSpy. Note, however, that it was never designed for that type of thing.
John
Comment #31
john.currier commentedI've stepped through the MySQL JDBC driver and it doesn't even attempt to retrieve comments for tables/views. It's strange that they retrieve comments for columns but skip the high level comments.
I've thought about writing my own extension of their driver, but...
John
Comment #32
john.currier commentedI just published SchemaSpy release 3.0.0 that displays comments for both tables and columns.
Note that MySQL does some screwy things with table comments in that it that they've seriously overloaded the concept. Their inconsistent overloading makes it somewhat impossible to programatically extract the intended details.
Note, however, that SchemaSpy lets you specify your own SQL for retrieving table and column comments, so you can store comments wherever you want in the database.
John
http://schemaspy.sourceforge.net
Comment #33
adrian commentedGuys. The database schemas are due for removal , as soon as the installer gets in.
The install system uses .install files to create schemas, which already gets rid of the schemas.
After that, we want to switch to using a db_create_table function, which will be a wrapper to the mysql / pgsql equivalent.
This would mean that we get rid of all of the schema files, and just have the one .install.
there's even further talk of splitting up all the module's tables to their own separate .install files.
Comment #34
webchickThe tables/columns still need to get documented, though. ;)
Comment #35
killes@www.drop.org commentedActually I am not at all happy with the proposal to split up the database files.
This is ok if a module does have its "own" table (like book.module) but for a core set of tables (node, node_revisions, node_comment_statistics, ....) they should stay together.
I also see no reason to rm the .sql file, they are useful in order to get q quick overview over Drupal's database schemes, even if they aren't actually used due to the installer.
Comment #36
adrian commentedhttp://drupal.org/node/63049
Comment #37
harry slaughterok, i've looked extensively at all sorts of tools for documenting DB schema. they are all share one or more of the following qualities: - too complicated - too sucky - unmaintainable - lots and lots of work
I'm now thinking there is only one way to go, and I believe it has already been suggested. Document the schema in the sql itself.
here's a good example of a fully self-documented schema: http://brg.ai.sri.com/biowarehouse/repos/schema/mysql-schema.sql
the only drawback to this i see is that we'd need to choose only one of the current schemas to contain the documentation (4.1, 4.0, postgres...).
this solution would work regardless of how we divide up the sql (one master, one per table etc).
if we had a convention for writing this documentation, anyone who creates a db table (core or contrib) would be responsible for documenting that schema.
the schema documentation would follow the sql whether it's in a .install file or not.
we'd just use standard sql comments:
i'd be happy to take the existing info we have and incorporate it into one of the existing sql files.
i think the key will be having this documentation inline, pretty much ensuring that it will always be accurate.
Comment #38
Bèr Kessels commentedWhat is holding us from slowly (incremental) starting this documentation?
I know close to nothing about pgsql, but dont mind adding some comments to mysql. FWIW: those in need of a GUI to do so: mysql admininistrator is a crossplatform GUI that allows you to add comments in a rather friendly way!
Comment #39
drummI'm going to guess the status should be active.
Comment #40
harry slaughterattaching a commented version of the mysql4.1 schema.
this is not finished, but it contains most of the important key relationships that i'm aware of. a lot of the fields are self-explanatory, but it would be nice to have comment/definitions for those that aren't.
if this format is acceptable, i can continue working on this. it's a good chance to talk to different folks and understand the schema completely myself :)
Comment #41
brad.bulger commentedwhy add them to the mysql file? since postgres a) supports foreign keys in all versions people will be using, and b) has a comment syntax for tables and columns (eg, COMMENT ON TABLE foo IS 'my foo is good'), and the tables are the same for postgres and mysql, why not use the postgres file as the commented file? is there a sense that actually creating the referential integrity in the database would break stuff?
Comment #42
harry slaughterI have checked this into CVS.
Hopefully the following will happen:
1) Harry will ping appropriate developers who will then dutifully make contributions to this documentation.
2) Harry will look into the appropriate means of extracting this documentation for presentation via drupal.org documentation pages (api.module).
3) When documentation is completed, this annotated schema will live in the /database/ directory.
Harry Slaughter
Comment #43
simeI would appreciate it if someone would review some Handbook pages.
It is a database table summary that is targetted at non-technical users:
http://drupal.org/node/70825
Comment #44
bs commentedSime,
You have done an excellent job. I know the agony of newbie’s in Understanding Drupal Innards. It would be more helpful if any one add appropriate module code snippets along with database schema to understand the Drupal core API.
Thanks for your great work.
Comment #45
harry slaughterthe effort in this thread is to document the database schema for developers.
we won't be addressing any other issues, including multisite installs.
regarding multisite installs, sharing databases can get tricky very quickly. some of the most important tables have dependencies on other tables which may not be obvious to a non-developer. just be careful out there!
Comment #46
harry slaughterthis effort is dead in the water
Comment #47
harry slaughtersee: http://drupal.org/node/79874
Comment #48
gpk commentedFor a diagram for D6 see http://drupal.org/node/184586.