This patch introduces a registry of functions, classes and interfaces. What this is good for?
- No longer need to specify files in hook_menu. It Just Works (TM).
- Forms can be reused from inc files without manual includes which is necessary in D6.
- Preprocess functions can live in inc files.
- module_implements becomes faster and allows us to use a better weighting system later.
- Most hooks can live in inc files (minus hook_boot, hook_exit and the new hook_hooks)
- Every page load can have (almost) exactly only the inc files it needs preloaded. This can speed up bootstrap tremendously by not loading unnecessary code.
There is some nice magic in hook_hooks
: this registers the dynamic hooks. We cache the implementation of these, for example {$form_id}_form_alter
during registry build, so when module_implements
is called with system_themes_form_form_alter
and there is no implementation cached for it then it knows there is no need to look for implementations of this hook. This speeds up things tremendously for such dynamic hooks -- of which I hope we can introduce a few more during D7.
The registry itself and the parser is Crell's idea and code originally please credit him when committing.
Comment | File | Size | Author |
---|---|---|---|
#199 | registry-eall.patch | 719 bytes | robertDouglass |
#196 | module_load_install.patch | 848 bytes | Anonymous (not verified) |
#190 | registry_changelog.patch | 805 bytes | Crell |
#185 | registry_46.patch | 110.03 KB | Anonymous (not verified) |
#175 | registry.patch | 105.28 KB | chx |
Comments
Comment #1
Crell CreditAttribution: Crell commentedTo give some background, the problem with my original registry design was that it hit the database too often. In fact, it ended up hitting the database a few hundred times per request, which is double-plus-ungood. chx solved that by the additional magic in module_implements() which lets Drupal learn about its own hook implementations and cache that knowledge, which not only gives us the lazy-include logic without making the database cry but speeds up module_invoke_all() to boot. The rest is just a logical extension along the same lines.
chx is the one who made the whole thing actually work. :-)
Comment #2
cburschkaOn a cursory reading of the patch, I notice that you are calling module_list with FALSE, TRUE, FALSE, which are the default arguments anyway. From a developer's point of view, to find out what FALSE TRUE FALSE does (no refresh, bootstrap mode, default sort), I'd need to look up the function anyway - so including these arguments explicitly does not add any readability. And since this is core accessing core, we don't need to protect ourselves against sudden changes in the other function.
Comment #3
cburschka(Also, I presume that this patch will require the other core modules to implement hook_hooks too, correct?)
Comment #4
catchsubscribing so I can test a bit later.
Comment #5
chx CreditAttribution: chx commentedYou need hook_hooks only to register dynamic hooks. I am unaware of any other than {$form_id}_form_alter.
Comment #6
chx CreditAttribution: chx commentedFixed those module_list calls.
Comment #7
chx CreditAttribution: chx commentedSupressed hooks for the includes directory so tablesort_init does not fire on hook_init. content_type.inc is now parsed as belonging to node because it is inside the directory 'node'. install files are parsed so drupal_write_record works.
Comment #8
chx CreditAttribution: chx commentedThe module dir setting was wrong for content_type.inc and other similar named files. There is something good here now: if you want to implement a hook in includes, you can, though it'd be awkward a bit: hook_block can be implemented as includes_block.
Comment #9
chx CreditAttribution: chx commentedmodule_implements did not load the include files for the cached stuff. Bummer.
Comment #10
Wim LeersSubscribing.
Comment #11
chx CreditAttribution: chx commentedFixed the definition of hook_form_{$form_id}_alter. Fixed update.php WSOD'ing.
Comment #12
webchickI've been testing this a bit, and haven't run into any problems other than when I enabled all modules, I got an error:
"call_user_func_array() [function.call-user-func-array]: First argument is expected to be a valid callback, 'node_node_type' was given in /Users/webchick/Sites/head/includes/module.inc on line 505."
Comment #13
moshe weitzman CreditAttribution: moshe weitzman commentedLooks nice from the description and reading the code. I'll give it a try soon.
Comment #14
sinasalek CreditAttribution: sinasalek commentedGreat idea thanks, i will give it a try
Comment #15
profix898 CreditAttribution: profix898 commentedSubscribing.
Comment #16
cwgordon7 CreditAttribution: cwgordon7 commentedSubscribing, may be parallel to Drupal pipes...
Comment #17
demeester_roel CreditAttribution: demeester_roel commentedSubscribing.
Comment #18
breyten CreditAttribution: breyten commentedsubscribing.
Comment #19
NancyDruFrom what I can see so far this looks really cool. Is there any better documentation on everything that is included here?
Comment #20
dopry CreditAttribution: dopry commentedI tested the patch with Apache 2, PHP 5.2.5, MySQL 5.0 through a clean install and didn't experience any errors. I enabled upload.module, created some vocabularies and created some stories with terms and attachments without an error.
@chx: it would be nice if you could point out some specific things to test.
The concept is sound and saves from from the dynamic include insanity I deal with in D6, and provides automatic class loading. The code style is sound, and documentation is copious.
+1...
The attached patch is re-rolled to remove a debugging line and fix a system_update conflict. I'd say it's really close to RTBC with a few more positive tests.
Comment #21
Crell CreditAttribution: Crell commentedMake sure you have devel installed and are watching the query log. The various caching systems are designed to avoid ballooning the query count, so make sure it's not showing too many queries against the registry table.
To test the auto-load, try moving the class in aggregator into its own file and rebuilding the registry. (Happens on module enable or when the "clear all caches" button is clicked on the Performance page.) Then setup a feed or two and update them. It should load the file with the class and function normally rather then breaking. :-)
That should cover the two main tests at this point. If those work, the rest should be fine.
Comment #22
dopry CreditAttribution: dopry commentedHere is an updated patch which moves class update_xml_parser to a new file to make testing easier.
Comment #23
chx CreditAttribution: chx commentedTesting is rather simple, no need to move files around -- the file handling code is removed from menu.inc so if Drupal works, admin pages, node submission etc then all is well. I have also removed the file declarations from menu hook. Also included is a
$_SERVER['REQUEST_METHOD'] == 'GET'
for the caching, and that's good: rarely used included files necessary for form submissions holding lots of rarely running code are no longer cached. This was the only big remaining concern of Crell (please do not forget crediting him for this subsystem) and mine.I fixed the node_node_type error on the modules screen -- the error was in module_implements and meant that hooks in include files were not included.
Note that this patch is a continuation of my patch, it uses nothing from dopry's -- there was more debug code to be removed and the system update conflict was solved in an unclean way and also the aforementioned big bug was not fixed.
Comment #24
KarenS CreditAttribution: KarenS commentedThis seems like a great idea. Just to test that I understand this correctly (and to maybe help others new to the issue), are the following correct statements?
1) The files that hold the functions should be named $module.$something.inc, so content.admin.inc would work correctly as an additional registry location for the content module, but there might be a problem using a file named content_admin.inc. In other words, some files may need to be renamed.
2) To implement this in most modules, you would just need to :
a) find and replace if(function_exists('my_function')) {..} with if(registry_check_function('my_function')) {...}.
b) remove any 'file' => 'custom.inc' lines from hook_menu(), since the file name will be discovered automatically.
d) remove 'include_once(custom.inc)' from your code since it won't be needed any more :)
Comment #25
chx CreditAttribution: chx commentedComment #26
keith.smith CreditAttribution: keith.smith commentedNew patch attached, with very minor code comment changes (a couple of typos, periods at end of sentences, capitalization, etc.) Absolutely nothing major, and I diff'd it with the patch in #23 to make sure that I didn't inadvertently change any of the code.
(Please do not credit me on commit.)
Comment #27
Crell CreditAttribution: Crell commentedOne addition on conversion, some standard conventions are a good idea. Eg, page handlers in module.pages.inc and module.admin.inc as now. Hooks that are called once in a blue moon like hook_menu and hook_theme are "registry hooks", and should probably all end up in module.registry.inc. That sort of refactoring will be done to core after the main patch lands and we can establish some de facto best practices that way.
Ideally, code should be grouped in such a way that loading one file will get all of the functions a particular action is going to need. This is optimizing by cut and paste. :-)
Comment #28
webchick@Crell: Hm.
At a certain point though, I fear that we're going to be causing more of a performance overhead including all of these wacky files everywhere. A commenter on the D6 announcement post pointed out that for *really* high-traffic sites that need to use things like opcode caches, more code in *fewer* files is what you want. Additionally, it makes the module really hard to grok when code is here, there, and everywhere. The current hints in the menu system are nice, because they tell me when I can't find function x_y what file it lives in. This is going away now, so my only resource is to grep and have 11 files open in front of me to figure out how a module works.
node.pages.inc and node.admin.inc is a nice, logical separation. But if we start having node.registry.inc and node.node-pages.inc and node.node-teaser-pages.inc and node.admin-all-nodes.inc and node.admin-edit-nodes.inc and so on that's a little bit... insane. Let's shoot for a happy medium that gives us the fewest number of files per module with the biggest performance benefit.
Comment #29
webernet CreditAttribution: webernet commentedAttached patch removes the 'file' lines from forum_menu(). Also fixes a comment, 'enabled', not 'installed' modules.
Comment #30
Crell CreditAttribution: Crell commented@webchick: Absolutely. While this setup could support one-function-per-file, that would be an very stupid way to do it. :-) The advantage here is that we can experiment and find what a good balance would be (registry hooks are simply low-hanging fruit) without having to change the API or the engine itself. Optimize by cut and paste. :-)
That said, if you simply set your opcode cache to not recheck the file system then it shouldn't matter how many files you have. I also find 3-4 well-organized files much easier to deal with than one 1000 line behemoth. I frequently break modules up already into multiple files with a manual include just to make them more manageable. For a module that only has 4 small functions in it, it's probably best to just leave it all in one .module file. Again, it's fully dynamic so modules can organize themselves in whatever way makes the most sense for them.
Comment #31
BioALIEN CreditAttribution: BioALIEN commentedI agree with Crell's statement. Obviously there's a sane limit to how you should break a module's functions. May need to add a "best practice" guide once this lands? Either way, the flexibility is there!
Comment #32
theborg CreditAttribution: theborg commentedSubscribing.
Comment #33
chx CreditAttribution: chx commentedSo what else is needed to get this in?
Comment #34
dropcube CreditAttribution: dropcube commentedSubscribing.
Comment #35
moshe weitzman CreditAttribution: moshe weitzman commentedI tested a clean install and really clicked around. No errors.
When is update_create_registry_tables() supposed to be called? I don't see it in the upgrade path. Once it is there, this is RTBC.
Comment #36
meba CreditAttribution: meba commentedSubscribing
Comment #37
marcingy CreditAttribution: marcingy commentedThis patch seems to work fine and apart from the performance hit on modules page (registry creation) it certainly hasn't had a negative impact on page generation.
+1
Comment #38
moshe weitzman CreditAttribution: moshe weitzman commentedWe haven't discussed much a big benefit of this patch - we can rework the bootstrap so that we hand over control to the menu handlers when only the required modules are loaded. No longer will we indiscrimitately do a full bootstrap for most requests. This system will carefully load modules when they are needed. This has big implications for performance sensitive callbacks like ajax handlers, ahah handlers, private files handlers, etc. This bootstrap refactoring will be my next patch after this gets in.
Also, chx mentioned at the top that this paves the way for allowing modules to order hook invocation on a hook by hook basis instead of the crude system.weight we have now. thats nice.
Comment #39
Wim Leersmoshe: darn you! You got me drooling! :D
Comment #40
Crell CreditAttribution: Crell commented@moshe: Yep, that's exactly the end goal. A well-factored module will have nothing in .module aside from hook_boot, hook_exit, hook_hooks, and methods that have to be called directly. (And even some of those can be moved out.) Pretty much everything else can be loaded on-demand. That makes even a BOOTSTRAP_FULL tiny.
There's still the question of the includes directory, but I suspect we can clean out much of that, too. Phase 2. :-)
I'm not sure what you mean in #35. Can you clarify?
Comment #41
chx CreditAttribution: chx commented#35 says, there is a nice update function which creates the registry tables early but the function is not called. Tomorrow I roll a new one.
Comment #42
chx CreditAttribution: chx commentedSo there. When the Database: TNG patch gets in, we can do better with the update code as then the schema will not be in common.inc.
Comment #43
moshe weitzman CreditAttribution: moshe weitzman commented* need to rename system_update_7000 to system_update_7001
* even after that, i am getting "Call to undefined function cache_get() in /Users/mw/htd/pr/includes/module.inc on line 427" on update.php
Comment #44
chx CreditAttribution: chx commentedThe previous try was rolled from an older version -- sorry. This one seems working.
Comment #45
moshe weitzman CreditAttribution: moshe weitzman commentedok, upgrade path works now. rtbc
Comment #46
chx CreditAttribution: chx commentedMoshe in private raised a concern that we are now INSERTing on every page with the cache_set. The basic assumption is that every GET to a given router path uses the same include files. When this assumption fails, then the safety net is that the cache is incremental, so if there are two sets of includes for a given router path then both will be included always.
Comment #47
chx CreditAttribution: chx commentedLooking at the caching code I found a space error. I have edited the patch to fix this so there is absolutely no change just a code style fix.
Comment #48
chx CreditAttribution: chx commentedAnother reroll. This now really preloads the original_files as I wanted.
Comment #49
chx CreditAttribution: chx commenteda) I removed the possibility to implement hooks in includes/ it was silly anyways b) We no longer include information from disabled modules sharing a directory with an enabled one.
Comment #50
chx CreditAttribution: chx commentedMoshe said it's not silly and he likes it. Easy to move around the check.
Comment #51
starbow CreditAttribution: starbow commentedsubscribing
Comment #52
chx CreditAttribution: chx commentedMore goodness: module_hook ties into module_implements, reusing its cache. Thus node_invoke now can use this cache to save a few queries from the registry but then we have the problem that what node_hook calls $module is not necessarily a module. I circumvented this by adding recognition code for these pseudo modules into the registry build phase.
Comment #53
chx CreditAttribution: chx commentedAnd the patch.
Comment #54
moshe weitzman CreditAttribution: moshe weitzman commentedI tested and this is RTBC.
I found one minor glitch in that fapi often produces registry cache misses that result in a quick query. The calls are:
Until we have a directory of all $form_id on the site, I can't think of a clean solution. It is such a minor issue that the patch should proceed as is.
Comment #55
Dries CreditAttribution: Dries commented$ diffstat
I'm still trying to figure out what the killer advantage is of this patch ... I understand it could potentially speed-up Drupal but do we have any proof of that yet? The advantages listed in the body of the issue are nice, but don't blow me away. I'm a bit sceptic because it doesn't necessarily make the module code easier -- as you can see from diffstat's output, it quite a bit more complex. I'd like to see us unleash more of this patch potential to help us evaluate its usefulness.
Comment #56
chx CreditAttribution: chx commentedWe have wanted to make Drupal boot faster. Once this is in we can explore that so much easier -- most of the current boot process can be removed. Also, currently if you want to reuse a form, you need to figure it out first which file it is in. Also, this patch will make it possible to weight the hooks one by one, even relative to each other, which is a much wanted feature.
As you said it's complex so I would rather not make it even more complex with changing the bootstrap right in this one.
Comment #57
Crell CreditAttribution: Crell commentedI just benchmarked the Drupal 6 page split effort: http://www.garfieldtech.com/blog/benchmark-page-split
Short version: 20% speed improvement for uncached pages. The registry here should let us split off even more code. That's the killer advantage of this patch.
Comment #58
chx CreditAttribution: chx commentedAnother good thing is that hooks become really really cheap. Even dynamic hooks. We can add a lot more dynamic hooks here and there once this is in -- we can even add a hook_generic_function_name_alter so that you can do something with every function there is (altering internal data structures to your hearts content) and this costs us extremely little performance-wise... lots of other craziness is opened up with this.
Comment #59
moshe weitzman CreditAttribution: moshe weitzman commentedrerolled to fix 2 broken hunks in system.install
Comment #60
moshe weitzman CreditAttribution: moshe weitzman commentedoops. i had a local change in there. i *think* his one is same as chx' last patch.
Comment #61
moshe weitzman CreditAttribution: moshe weitzman commentedOK, I experimented tonight in an attempt to demonstrate the bootstrap advantages of the patch. I had some great success. To see for yourself, apply the patch and do this quick test:
1. Enable all the core modules you feel like testing.Then browse to /admin page.
2. Go to module_load_all() in module.inc] and replace module_list() with array('filter', 'block').
3. Refresh /admin page. Yell out "holy shit - my site still works!". The above step cut a gaping wound into the heart of the bootstrap. Only block and filter modules were automatically loaded. The rest are being loaded as needed. Consider the advantage this has once you have 50+ Contrib modules as most sites do now. Thats a lot of code that is not being parsed and loaded into memory.
4. Click around and your site mostly works. You can add/edit/view nodes, manage posts on admin/content/node, add feeds and retrieve feed items, and so on. You can't do funky stuff like install modules and flush all caches. That will take some refactoring. The error messages you see here are trivial compared to the bootstrap change we just made.
As I said earlier, this patch lets us hand over control to the menu callback much sooner. This is great for ajax handlers, autocomplete, web services, private file handling, etc.
I'd love for this patch to be committed before we fully rework the bootstrap. I hope the above demo satisfies the need to show speed.
Comment #62
NancyDruBut, golly, Moshe, that means fewer WSOD's too! We can't have that - how would all those contribs get debugged? ;-)
Comment #63
pwolanin CreditAttribution: pwolanin commentedsubscribe
Comment #64
Anonymous (not verified) CreditAttribution: Anonymous commentedsubscribing
Comment #65
jaydub CreditAttribution: jaydub commentedsubscribe
Comment #66
Dries CreditAttribution: Dries commentedWhile I'd still like to see more of this patch potential evaluated, I decided to move forward and to have a first good look at the code.
My main concern is that there is a lot of stuff going on, and the code is poorly documented. This requires the reviewer to reverse engineer a lot of what is going on. Also, certain function's PHPdoc (i.e. _registry_save_resource()) is out of sync with the declaration of the function.
Please better document the code.
Comment #67
robertDouglass CreditAttribution: robertDouglass commentedAnd that's how Dries reviews patches while skiing. Subscribing.
Comment #68
pwolanin CreditAttribution: pwolanin commentedA question also regarding files like cache.inc - we allow the file to be included to be determined via a variable setting. Is that going to break here? In other words, this registry code will always include the file includes/cache.inc based on the function names?
CNW based on #66
Comment #69
Crell CreditAttribution: Crell commented@pwolanin: The registry only pre-loads files that were lazy-loaded in the first place, I believe. As long as the cache system is initialized before the registry is, and the appropriate file is included, the lazy-loader will never kick in and there should be no problem.
Comment #70
maartenvg CreditAttribution: maartenvg commentedI can patch current CVS, but system.install patches with hunk #4 with fuzz 2 (and a few hunks with offsets), and this results in a
Fatal error: Cannot redeclare system_update_7001() (previously declared in drupal/modules/system/system.install:2702) in drupal/modules/system/system.install on line 2714
Can someone re-roll the patch, please?
Comment #71
Crell CreditAttribution: Crell commentedThere seems to be a problem with both moshe's and chx's latest versions in the installer. I tried to track it down, but couldn't figure out what the issue is. It seems to be trying to run the schema setup twice, the second time with NULL $schema, and that's breaking horribly.
That said, the attached patch refactors _registry_save_resource() out into a series of utility functions that make more sense, and increases the patch size by a third by adding a lot more documentation. Hopefully it should be much easier to follow now. It even adds a new registry @defgroup. :-)
There should be no functional changes in this patch, but I can't confirm that for sure because of the problem with the installer. I'm therefore leaving this as CNW and turning it back over to chx (or whoever else wants to figure out what happened to the installer).
Comment #72
chx CreditAttribution: chx commentedAnother nice thing we can do after the registry is in, there is no longer a problem that you can't invoke hooks relatively early -- so you can actually have a hook for variable defaults. w00t w00t.
Comment #73
Dries CreditAttribution: Dries commentedCrell: thanks for improving the documentation. I'll schedule another review. Glad to see this moving forward.
chx and moshe: thanks for posting 'motivations'. I'm sufficiently convinced now. At this point, I'd like to make sure that (i) these motivations are captures in the patch, (ii) that the code is self-explanatory and (iii) that this patch does not introduce bugs (i.e. caching).
all: please review the code and help make sure that the code is easy to understand. This is a key building block of Drupal, and it is important that it is both elegant and transparent. Your critic eye is required to make this a beautiful piece of design and code. Thanks.
Comment #74
recidive CreditAttribution: recidive commentedI did a quick code review. Here are some observations:
Code documentation formatting is inconsistent, some
@param
's declare data-types. Parameter descriptions are wrongly indented: some are single spaced, some have two spaces (the correct one) and others have tabs.Why the database update function is in update.php instead of system.install? I'm sure there are good reasons for this, I think they should be documented. Also its comment says "If batch table exists, update is not necessary".
I'll test the patch when I get a chance.
Comment #75
cwgordon7 CreditAttribution: cwgordon7 commentedThe last patch seems to be running drupal_rebuild_code_registry() in index.php. I'm pretty sure this is debug code left in there by mistake.
Comment #76
chx CreditAttribution: chx commentedThat was not too hard. As the fallback
module_implements
usesmodule_list
and during install, the module to be installed is not yet in the list but its install is loaded bymodule_load_include
and thenmodule_invoke
is used -- so I added a few lines of code tomodule_load_include
which ensures this works. To balance, I ripped the fallback code frommodule_invoke
it was a) useless b) broken.Comment #77
chx CreditAttribution: chx commentedSomewhere during the rerolls TAB characters got added to my beautiful little patch. Sacrilege! (definitely not me, my editor is set up properly.)
Comment #78
catchGetting a WSOD on a clean install.
Comment #79
Anonymous (not verified) CreditAttribution: Anonymous commentedWSOD is caused by a rebuild of the registry calling back into code that relies on the registry already being built:
_registry_parse_directory --> _registry_get_resource_module --> node_get_types --> _node_types_build --> module_invoke_all --> WSOD
specifically, according to this comment drupal_rebuild_code_registry:
too late for me to fix (3am in sydney), hope this helps someone else get a working patch going.
Comment #80
chx CreditAttribution: chx commentedWell that comment and the surrounding code is outdated, module_invoke_all calls module_implements which has a safeguard for the registry being rebuilt. I'll investigate.
Comment #81
chx CreditAttribution: chx commentedCalling
_registry_get_resource_hook
makes the conversion to utility functions and the patch actually work.Comment #82
catchOk no WSOD this time, but I'm getting table doesn't exist errors after enabling all modules.
Comment #83
chx CreditAttribution: chx commentedInstead of trying to figure out which version of module_implements to run we now make it possible to set a "dumb" mode which is basically what module_implements is now. This is set during registry rebuild (as in the previous patches) and during install of modules.
Comment #84
Anonymous (not verified) CreditAttribution: Anonymous commentedw00t! works for me after installing new modules.
updated patch to get rid of fuzz against HEAD.
Comment #85
keith.smith CreditAttribution: keith.smith commentedAttached is a version of the patch in #83 with substantially reworked comments in most places, a few minor typos corrected, and double spaces changed to single spaces between sentences. There should be no functional changes, if I did this correctly.
(Also, as in #26, please do not credit me on commit.)
Comment #86
chx CreditAttribution: chx commentedRegistry allows http://drupal.org/node/237959#comment-790661
Comment #87
moshe weitzman CreditAttribution: moshe weitzman commentedI could find only 2 problems
- right after i installed, i followed a link to node/add and got access denied. that cleared up though after i submitted the modules page
- i am getting a NOTICE and blank node form the notice is "notice: Undefined index: blog_node_form in /Users/mw/htd/registry/includes/form.inc on line 344.". perhaps the registry is missing those dynamic form ids again.
Comment #88
chx CreditAttribution: chx commentedAs we check for disabled modules, adding node_content to the module list in _registry_parse_directory helped the problem. Also, I added access to the node hooks list. Pseudo modules are a pain. I have grepped for node_invoke in node directory and these are all the hooks. I have updated some of the code comments in node.module so that now you can actually search for every node_invoke and get a list of hooks.
Comment #89
floretan CreditAttribution: floretan commentedRe-rolled this patch (there was a conflict in common.inc), and fixed a misnamed variable in system_update_7002() that was keeping the update from being successfully executed.
Comment #90
Anonymous (not verified) CreditAttribution: Anonymous commentedtried patches at #88 and #89, both need work.
- create content menus are missing after fresh install
- going to node/add/page --> access denied for uid 1
- installing a module --> WSOD for all pages on the site
Comment #91
Anonymous (not verified) CreditAttribution: Anonymous commentedthis is a nasty hack in module_hook makes the create content menus appear on a fresh install:
the real problem is that there seem to be some namespace hacks around the module names 'node' and 'node_content'. not sure the best way to go here, but currently module_implements doesn't return 'node_content' as a module that implements the 'form' hook.
also, the current implementation seems to rely on the hook cache too much. to illustrate:
1. fresh install plus patch
2. clear the hook row in cache table
3. visit admin/build/modules --> WSOD
Comment #92
pwolanin CreditAttribution: pwolanin commentedI had a patch to clean up some of the "node_content" stuff at one point, though I'm not sure if it would help here.
Comment #93
chx CreditAttribution: chx commentedI went back to #85 enrolled all the fixes, moved the hooks to cache_registry -- note that if the hooks cache entry is not there, the registry gets rebuilt -- and changed
node_hook
toregistry_check_function($module .'_'. $hook);
because that's what it is, node_content is not a module.Comment #94
chx CreditAttribution: chx commentedAnd the patch. The above comment / solution was wrong, the problem with content types was that we did not rebuild the registry after install, there is a comment in install.php which says: Rebuild menu to get content type links registered by the profile, and possibly any other menu items created through the tasks. I added registry there.
Edit: I realized it was wrong because module_hook deliberately does not check whether $module is really a module. That's enforced only in dumb mode because then we have no other way.
Comment #95
Anonymous (not verified) CreditAttribution: Anonymous commentedthe patch at #94 applies with some fuzz, but all the issues from #90 are fixed.
creating content works, enabling and disabling modules and themes works.
i also couldn't get things to fail by deleting the registry or hook cache, or emptying the registry table between page requests.
nice work!
Comment #96
chx CreditAttribution: chx commentedRerolled against HEAD. Absolutely no changes.
Comment #97
Anonymous (not verified) CreditAttribution: Anonymous commentedi've spent some more time trying out most of the admin section, enabling all modules, turning caching on and off.
i think this patch is ready to go.
Comment #98
pwolanin CreditAttribution: pwolanin commentedI'd like to test still - I'm concerned about what happens if (for example) several modules have a .inc file that define the same functions.
Comment #99
moshe weitzman CreditAttribution: moshe weitzman commented@pwolanin - there is no new problem with function duplication. this new system just loads less code on every request than the current drupal. if you get a duplicate function error with the new code, you would have certainly gotten it with the existing code.
Comment #100
Crell CreditAttribution: Crell commentedDuplicate function names indicate a bug in one or more modules for failure to namespace their code properly. That's true now, and it's true after this patch. No change. :-)
The only place that's not true is the cache and database systems. Both of those initialize before the registry so the registry wouldn't break on those since it never loads them itself. And the database will not be that way for long if the new Database API patch ever gets in. :-)
Comment #101
NancyDruI was just working on something that makes me wonder if this new facility will address: With the movement to "compartmentalize" code (which I like), would this allow hook_block code to be split from a module? This could, conceivably, greatly speed up block usage, which I can see slowing down my page rendering.
Comment #102
pwolanin CreditAttribution: pwolanin commented@Crell - well, in core already is the new password.inc file which may also be substituted based on a variable.
Comment #103
floretan CreditAttribution: floretan commentedWhen running update.php, the system update 7002 generates the following error on the "select updates" page:
The errors don't seem to affect the upgrade, but we can't commit something that will require users to "just ignore" an error while updating their site.
Comment #104
Crell CreditAttribution: Crell commented@nancyw: Yes, if we can compartmentalize block code more than it is now. One of my later goals for Drupal 7, time permitting, is to refactor the way blocks work to be more hook_menu()-ish, with a registry function that specifies callbacks to call for each block. That would allow the registry function, like hook_menu and hook_themes, to live in call-once files ($module.registry.inc) and the callback function to live "wherever", which could be conditionally loaded. However, any block that is displayed would still have to have its code loaded unless it's cached. The D6 block caching system is separate from either the Registry or any block refactoring. (Both the registry and the push for testing benefit from a move to smaller, more self-contained chunks of code, which is an overall API-betterment anyway.)
@pwolanin: Well, if password.inc is loaded before the registry then there's no problem. If not, that *may* be an issue. My preference, though, is like the database to move these "swappable subsystems" to a factory/object structure like the new database API uses. That provides a single front-end, lazy-code-loading, and no duplicate resource names.
Comment #105
NancyDruThanks, Larry. That's what I wanted to hear. I'd rather load 500 bytes than 50,000. (Now, if I can convince people that they don't need comments...)
Comment #106
Crell CreditAttribution: Crell commented*Crell stabs nancyw...
It's not the file size that's the issue, it's the code compile weight that we're trying to eliminate. Comments have virtually no compile weight. :-)
Comment #107
pwolanin CreditAttribution: pwolanin commented@Crell - that file is now loaded on demand during user login or user account edit submit. i.e. it would probably be well after the registry step. If it needs to be refactored, you'll have to explain to me how such an object would work and such a switch should be part of this patch.
Comment #108
chx CreditAttribution: chx commentedThe file include part of registry_check_file is now safegaurded against maintenance_mode. I have also simplified the function.
Comment #109
floretan CreditAttribution: floretan commentedYou probably meant defined() (http://us2.php.net/manual/en/function.defined.php) rather than is_defined() which is not a php function. No other changes. update.php runs successfully, if I wasn't making a fix I'd set this as RTBC.
Comment #110
Mongi CreditAttribution: Mongi commentedsubscribing so I can test a bit later...
Mongi Daghbouj
Malermester
http://www.holtemalerfirma.dk
Comment #111
Crell CreditAttribution: Crell commentedAhem.
Comment #112
Anonymous (not verified) CreditAttribution: Anonymous commentedthe patch at #109 applies and cleanly, and i couldn't find any problems from a fresh install.
however, looking at the code for 'what if' type problems, i tried out a 'what if the process rebuilding the registry fails before completion'. eg, put in a die() at some point after the registry is flushed and before all the new registry rows (over 1385 separate inserts queries on a standard install) are written. or just delete all/half of the registry in between two requests.
hilarity ensues - pages are randomly inaccessable, and its not obvious how to get the registry cache rebuilt without hackery. so, setting this back to patch needs work. i think we need to rework the registry rebuild so that we don't rely on processes not die()ing during a 1000+ insert run.
two suggestions:
- read the new registry data into memory *before* the flush and inserts. still leaves us open, but for a shorter time and we could do batch inserts. might mean larger RAM use for that request though
- keep track of the files we are parsing, and don't delete registry items that haven't changed (which would be the vast bulk). i think this is the cleanest and fastest way, but would add some complexity.
thoughts?
Comment #113
chx CreditAttribution: chx commentedI should not set this to RTBC but two people already said it works. justinrandell's problem IMO is a follow up patch, making the registry rebuild more robust.
Comment #114
Crell CreditAttribution: Crell commentedBatch inserts will be trivial with the new Database API, as it supports multi-insert statements (using whatever database-specific syntax is appropriate). That is still in development, though.
File-by-file rebuilding is interesting, and would offer a smaller total-death window, but would still leave the system in an unstable state should the process die halfway through system.module for some reason. Considering how rare a registry rebuild is, I don't know how huge a problem that would be.
The new Database API is also designed to support transactions where possible. Wrapping the entire delete/rebuild cycle in a transaction would offer far better robustness on databases that support transactions, as the system could (in theory) never be left in an unstable state. That's almost a text-book use case for transactions, actually. That's also pending the Database work.
So in short, yes it's a problem, but not a huge one, and two of the three ways to ameliorate it are pending on the new Database API. We're still working on it, but it's getting steadily closer! :-) Leaving as RTBC.
Comment #115
Anonymous (not verified) CreditAttribution: Anonymous commentedi'm happy for this patch to go in, and robustness to be added later in a follow up issue.
i'll volunteer to create some patches - even though the registry rebuild is rare, the current implementation is internsive on the webserver and the db, which always makes Bad Things more likely.
Comment #116
carlmcdade CreditAttribution: carlmcdade commentedsubscribing
Comment #117
pwolanin CreditAttribution: pwolanin commentedI discussed with Crell last night - if the patch goes in as now, you will be unable to use an alternate password hashing framework. So, that code will need to be refactored either as part of this patch, or immediately after.
Comment #118
ksenzeeI tested the patch in #109 and it applies with fuzz 2 against HEAD, but it works fine. Creating and deleting users, profiles, and nodes all worked.
Comment #119
Crell CreditAttribution: Crell commentedAlso as discussed with pwolanin last night, the underlying problem is that the Registry will get confused by pluggable systems that use duplicate function names that initialize after the registry in the bootstrap pipeline. Right now, there is only one such system: the password hashing. (Database and cache initialize before it.) The solution is to tweak the password hashing framework to use a non-duplication-based pluggable mechanism, which is actually quite easy to do in at least two ways (one OOP, one not). Since there are no additional hashing libraries right now AFAIK, that is not a problem right now. Password can be refactored after this goes in as a sample for how to do such pluggable systems. I've already discussed with pwolanin how to do so. Therefore, that is not a blocker on this patch. :-)
Comment #120
coupet CreditAttribution: coupet commentedsubscribing
Comment #121
Anonymous (not verified) CreditAttribution: Anonymous commentedwas testing a soon-to-be-submitted patch that adds don't-parse-a-file-that-hasn't-changed functionality to the registry, i came across this path to a WSOD. at first i thought it was the new code, but it turns out to apply to the patch at #109 as well:
1. clean install plus patch
2. enable statistics module
3. go to performance page and clear all pages -->
Fatal error: Call to undefined function drupal_rebuild_code_registry()
the call to drupal_rebuild_code_registry was triggered by a failed cache lookup in module_implements:
attached patch includes a require call for common.inc in the bootstrap, and makes
drupal_rebuild_cache_registry
check forfile_scan_directory
andnode_get_type
.Comment #122
chx CreditAttribution: chx commentedLoading the whole of common.inc during this stage of bootstrap would be a big performance blow. I will inspect what can we do instead.
Comment #123
Anonymous (not verified) CreditAttribution: Anonymous commented@chx: yes, i agree.
feels a bit like we need a registry.inc.
Comment #124
Crell CreditAttribution: Crell commentedYargh.
A registry.inc sounds good, as it should be quite small (under 400 lines) and we can load that very very early in the bootstrap process. That will also mean we can rely on it for more of the early bootstrap phase. However, it does need to initialize after the database, since it relies on the database to work. Perhaps it should get its own bootstrap phase, right after the database phase?
Comment #125
chx CreditAttribution: chx commentedI went back to #109 , moved registry_check_function and registry_mark_function into bootstrap, the registry build functions into a .inc , written a two line loader for that also in bootstrap. I believe this properly fixes the issues. So most of the added new code is now in a separate, on-demand loaded inc.
Comment #126
chx CreditAttribution: chx commentedI have refactored registry_check_class and interface into one function and moved everything those to bootstrp too.
Comment #127
Dries CreditAttribution: Dries commentedI had another good look at the registry patch. Here are my thoughts:
* I'd like to understand/measure better how many SQL queries this adds to the average page view. It's not clear how well the caching works in certain situations.
* One of the things I always really liked about Drupal is that it loads things based on convention. Conventions are a powerful mechanism to avoid expensive database operations (including caching), function body parsing, etc. This patch is a bit of a regression in terms of the use of conventions and, frankly, sheer elegance. While I think the registry is a great idea, I think we should think more about the implementation. My gut feeling tells me we can work on making this more elegant and less intrusive.
* Since the last review, a lot of progress has been made on the documentation. That was helpful.
* I found 'registry_check_function' and friends to be poorly named. Based on the function name, I couldn't figure out what this function did. I had to skip forward and read its PHPdoc instead. This is the function that does the actual loading so I'd rename it accordingly. Something like 'registry_load_function()', 'registry_load_code()', or 'registry_include_function_file()' seems more appropriate to me.
* drupal_rebuild_code_registry() needs more code comments or better named helper functions. It's not clear what is going on. The difference between 'patterns' and 'implementations' is unclear and would benefit from documentation. Also, the calls to module_implements() are awkward.
* Why can _registry_get_resource_module() return 'includes' -- that looks a bit like a hack to me? From an API's point of view, if it is not a valid module name, it should never be returned by _registry_get_resource_module().
* I think the word 'resource' is somewhat poorly chosen -- it is too generic and it isn't very descriptive which make the code /flow/ less fluent. From a compiler, linker or loader's point of view, a 'resource' is a 'symbol table entry', or 'symbol' for short. It might be better to use the correct terminology. I know it would have helped me understand things.
* _registry_get_resource_name() returns FALSE in case the resource is already processed. I found that to be odd. You'd expect it to return FALSE when the resource is not found. It think we should fix that up as it is going to be a source of confusion.
* "That is, if module foo provides the implementation of hook_example() on behalf of module bar, then the function must reside in foo/bar.something.inc for it to associate with module bar correctly." -- I don't get this. Can you elaborate on that or reformulate that in the code comments?
Comment #128
chx CreditAttribution: chx commentedComment #129
Crell CreditAttribution: Crell commented2. This system does rely on convention; specifically, file naming convention. See point #7. We cannot reliably derive a module name from a function name, because have no way of telling whether foo_bar_baz is the baz hook of module foo_bar or the bar_baz hook of module foo. (This was discussed at great length in the original page split thread.) However, with the dot-based file names we can derive the module from that (since module names cannot contain a period). So it is convention-based.
3. I actually disagree about renaming registry_check_function(). As an API, it's purpose is not to load a function. It is a drop-in replacement for function_exists(). It has two possible return values: No the function isn't callable, Yes the function is callable, and (pause while I load that file) now it is! A calling routine shouldn't know about loading, just "can I call this function, yes or no?" registry_check_class() and registry_check_interface() I can understand renaming, since they are never called directly but only from the autoload mechanism in PHP.
7. What don't you get? We can revise that text. It should probably also be moved to the @group block so that all of the documentation is centralized.
Comment #130
moshe weitzman CreditAttribution: moshe weitzman commentedThe motivation here is to make sane the dumping ground that is system.module. That module implements lots of hooks on behalf of files in the includes directory. For example, hook_elements() is in system.module even though it more properly belongs in form.inc. The code for deleting old temp files is in system_cron() even though it rightfully belongs in file.inc. I would not cry if we lost this feature, but I do think it is a step forward.
Comment #131
Crell CreditAttribution: Crell commentedOne possibility this approach opens up is to move code from /includes/ to the system module, but NOT system.module. That way we still get easy conditional loading, but it's more inline with Drupal's module-centric approach to things. That's something to deal with in a later follow-up, though.
Comment #132
Anonymous (not verified) CreditAttribution: Anonymous commentedi completely agree with this. i've been having a hard time with the coupling between module_implements and the registry code. i think we can use the registry to make it simpler, not more complex. some ideas:
1. keep the registry's bootstrap problems out of module_implements. as far as i can see, the only code that needs a non-registry version of module_implements is the registry code, and the only reason the registry code needs this is to get at hook_hooks. so, instead of adding the 'dumb' switch and more complexity to module_implements, lets create a function called _registry_get_dynamic_hook_patterns or similar, and put whatever the registry needs to get the list of patterns from hook_hooks without module_implements in there.
2. get rid of of the 'refresh' flag to module_implements. currently, we only refresh the list in core from module_enable and module_disable, both of which should refresh the list by a rebuild of the code registry. i think the registry code should manage caching (per-request and between requests) and resetting of sorted and sorted-by-name implementation lists, and have module_implements as a read-only call into the registry
(its currently not possible to call into module_implements and get the list for a hook sorted in a different way than that used the first time a hook's list is fetched without refreshing it, so we could fix that as well. and maybe rename the $sort flag to $sort_by_name?)
3. module_implements would look like this:
4. store the registry data such that it can handle a single resource as a candidate for multiple hooks, so registry_get_hook_implementations can do simple sql for fetching the data and caching like the rest of drupal. leave the pattern matching for dynamic hooks to the registry rebuild code.
Comment #133
Anonymous (not verified) CreditAttribution: Anonymous commentedhmmm, that's wrong. the call to node_get_types leads back into module_implements, so keeping registry rebuild out of module_implements is not so simple.
Comment #134
chx CreditAttribution: chx commentedRight. Drupal is a bit entangled. I would rather see this committed and work on followup patches which make things simpler. It is not a trivial matter, involves lot more than the registry to make it simpler.
Comment #135
Anonymous (not verified) CreditAttribution: Anonymous commentedok, so how about we just make registry 'dumber' about drupal's module concepts? so, the registry parsing pulls out information about a file and its functions, classes and interfaces, and thats it.
the patch just drops in the lazy-load stuff, and the existing module/pseudo-module code can remain mostly unchanged. attached is a patch based on this, and even with the registry-rebuild now being 'don't parse it if it hasn't changed' aware, it made things much, much simpler.
i haven't done anything to address Dries' 'how many time does this hit the db' question yet, but that feels like a simpler thing to do than getting this to work right.
Comment #136
chx CreditAttribution: chx commentedYou should have the read issue from the beginning. The reason we tie in to module_implements is to lower the number of queries. It's a massive hit otherwise. Check with devel module.
Comment #137
chx CreditAttribution: chx commentedI shall reroll tomorrow morning.
Comment #138
chx CreditAttribution: chx commentedIt's a reroll from #126 with a few search-and-replaces: registry_check_function is now drupal_function_exists and there is drupal_autoload_class and drupal_autoload_interface. Hope Dries likes these.
Comment #139
Anonymous (not verified) CreditAttribution: Anonymous commentedWSOD on this patch. update_7004 we redeclared, and there was an errant space in the require_once in drupal_rebuild_code_registry.
attached patch fixes these issues.
the path to a WSOD from #121 via enabling statistics and clearing caches is still evident in this patch.
@chx: any objections to working the 'don't delete unless something has changed' parts of my patch from #135 into this? i'm assuming we both agree on those parts?
yeah, according to my count the patch at #135 added over a hundred extra queries. not nice, and i guess i didn't make my point about not having addressed the db queries well enough. i wrote some naive-but-simple hook candidate collecting and caching, (based soley on a module_list call and the collected function names) and the db hits go away. its as quick according to ab and as easy on the db as the patch attached to this comment.
i'm still in favour of this general approach, but i'm not going to die in a ditch about it :-)
Comment #140
chx CreditAttribution: chx commentedI clarified with Justin that he only fixed a stray space in bootstrap and system_update_7004->7005. To avoid the WSOD I am now loading the necessary files: common.inc for drupal_get_path, node.module for node_get_types and file for file_scan_directory. All three utility functions are standalone and now I do get no errors with statistics enabled and cache_registry truncated.
Comment #141
Anonymous (not verified) CreditAttribution: Anonymous commented@chx: the new registry.inc is missing from this patch?
Comment #142
chx CreditAttribution: chx commentedBah!
Comment #143
Anonymous (not verified) CreditAttribution: Anonymous commentedok, this patch survives the WSOD on a cache clear when modules that use hook_boot are enabled.
w00t!
Comment #144
Anonymous (not verified) CreditAttribution: Anonymous commentedwoops! didn't mean to set this to code needs work.
/me runs away before chx finds out...
Comment #145
Anonymous (not verified) CreditAttribution: Anonymous commentedkicked the tires some more, can't break it, setting to RTBC.
Comment #146
Anonymous (not verified) CreditAttribution: Anonymous commentednew patch to keep up with HEAD, no functional changes.
Comment #147
Anonymous (not verified) CreditAttribution: Anonymous commentednew patch to keep up with HEAD, no functional changes.
Comment #148
Crell CreditAttribution: Crell commentedSo this patch has been in the "Patch Spotlight" for weeks now. It's now just tracking HEAD. What still needs to be done before we can land this patch and move on to cutting and pasting our way to a faster Drupal? :-)
Comment #149
dmitrig01 CreditAttribution: dmitrig01 commentedIn
install.inc
, you call function_exists instead of drupal_function_exists. Is this intentional? My gut wants me to mark as CNWComment #150
Crell CreditAttribution: Crell commentedThe registry hasn't been built yet in most of install.inc, so drupal_function_exists() and module_implements() don't work yet.
Comment #151
Anonymous (not verified) CreditAttribution: Anonymous commented@Crell: i'd like to see the registry only parse files that haven't changed, as per my question in #139, and refactoring to disentangle the registry from drupal's module_* functions. both could be addressed in a follow up patch(es) though, so i guess we need some feedback from chx and dries about this.
i'd prefer they go in with the initial patch, but i'll defer to their opinions.
irrespective of either of the above, the next thing this patch needs is unit tests.
Comment #152
Anonymous (not verified) CreditAttribution: Anonymous commentedsetting this to code needs work because applying the registry patch kills simpletest, which makes it kinda hard to write tests for the registry :-(
will post more when i figure out why.
Comment #153
Crell CreditAttribution: Crell commentedI don't see how the registry could be disengaged from module_*(), since part of the point is to be able to lazy-load code rapidly in module_*().
Breaking simpletests is, um, yeah, a problem. In what way does it break?
Comment #154
Stefan Nagtegaal CreditAttribution: Stefan Nagtegaal commentedNeeds a re-roll on my HEAD..
unfortunatly I could not fix the failing hunks myself..
Comment #155
Anonymous (not verified) CreditAttribution: Anonymous commentedthere are two ways i see the registry requiring knowledge about modules.
first, at build time, the registry has to know about which modules are enabled to figure out which files to parse, so it has to call into module_list.
second, at 'runtime' (where we are just reading the registry data for a page request), we want to build some module-aware caches, most importantly a cache of hook implementations.
we can uncouple the rebuilding part of the registry from everything except module list, if we accept that the registry build should just do one thing well - scan a bunch of files for functions/classes/interfaces and record them.
at the moment, we are trying to build the hook implementations cache at registry build time, which requires:
- some icky 'which resource does this module belong to' and 'which hook could this function be a candidate implementation for' code
- calls from drupal_rebuild_code_registry into module_implements which can call back into drupal_rebuild_code_registry. this is what was killing simpletest - module_implements --> drupal_rebuild_code_registry --> module_implements --> kaboom
making drupal go faster by putting some caches in front of the registry data doesn't require that we build those caches at registry build time. instead, we can generate the hook implementation cache like the menu-router cache - collect information during the request, cache it at the end of the request.
the attached patch separates building the hook implementation cache from building the registry and can run simpletests without blowing up. (ab also says its a bit faster than the patch at #147.)
i've also put the per-file registry build code in, so the registry won't reparse files that haven't changed since the last run. i'm happy to split that out into a separate patch if people don't like the idea 'separate building the registry from its caches' stuff.
there's more to do (including tests), but i'm setting to code needs review to get some feedback.
Comment #156
chx CreditAttribution: chx commentedThe general approach is fine. There are a number of details though....
array_key_exists
one of the things PHP should not have. Useisset
instead, it's almost always the same, unless the array value in question can be NULL which is not the case._registry_parse_files
needs a restructure, moving things into utility functions, I can't believe we can't make this without an if-else-if-continue structure.drupal_function_exists
you are doing a JOIN. I would like to see that gone even at the cost of denormalizing._registry_check_code
picks the file from the registry table anyways.registry_cache_path_files
suffers from the same. A JOIN will always be slower than a SELECT from a well-indexed single table.module_implements
you changed themodule_list
arguments, why?All in all: congratulations, very nice cleanup!
Comment #157
chx CreditAttribution: chx commentedI am working on this.
Comment #158
Anonymous (not verified) CreditAttribution: Anonymous commented@chx: thanks for the quick review! i should wait for you to post an updated patch?
1 and 2: ok and ok
3: not sure what you mean here - i think the problem is that i'm not using proper query placeholders. any problems here are my own, not sure Crell's good name should be brought into it :-)
4: not sure i agree on denormalising until we get more of an idea of the performance impact. i'd prefer this to be correct first, then lets see what follow up patches can do to take advantage of the registry to speed up drupal, and denormalise when its clear thats the only way to go.
5: didn't see the need for a special case :-)
6: i don't understand the current args.
first param, refresh, is false, so unless this is the first call in the request, the second param, bootstrap, will just be ignored. and i don't think we want just the bootstrap modules anyway? am i missing something? i'm curious as to how the module list patch will impact on this patch.
7: that theme call can go, was some left over code from taking the
module_load_all
call out of_drupal_bootstrap_full
to see how drupal would run ;-)Comment #159
Crell CreditAttribution: Crell commentedRegarding item 3: In Drupal 6, you can call db_query() with either of the following forms:
In the DB TNG patch, if I can get that damned thing finished :-), the first form is going away and the second is replaced by, preferably, an associative array of named placeholders. (Unnamed placeholders will still work for db_query()). I have therefore started getting in the habit of using the array form already. Eventually it will have to be an array, so I changed that at some point during cleanup to be more forward-looking. For right now, there's no real difference between the two.
Comment #160
chx CreditAttribution: chx commentedI do not want to hold up the issue -- this patch does not work at this moment, but it should be close. I did the denormalization and the reorganization. The registry table ends up empty, I wonder why.
Comment #161
chx CreditAttribution: chx commentedOh. Figured it out.
Comment #162
chx CreditAttribution: chx commentedplus registry.inc
Comment #163
Anonymous (not verified) CreditAttribution: Anonymous commented@chx: nice job simplifying the code.
attached patch with some corrected/extra docs only - no functional changes.
i'd like to add a
registry.inc.test
file with some unit tests, but currently run DrupalUnitTestCases are broken and the test framework doesn't run non-module tests.Comment #164
pwolanin CreditAttribution: pwolanin commentedMaybe this is or a later patch, but in the registry build part, why not just require that the module .info file specify the list of all files that are to be scanned? Otherwise I think you may need broader API/doc changes to indicate that an active and non-active module may not share the same directory, right?
The current code (for example) would seem not to allow a 3rd party library with a *.php name, right?
Comment #165
Crell CreditAttribution: Crell commentedHm. It took a bit of thinking and discussion with chx to decide if #164 is a very bad idea or a brilliant idea. We settled on brilliant. :-) It would put more work on the module developer (albeit not much), but hugely simplify the parsing code. To wit:
files[mymodule.module] = mymodule
files[mymodule.stuff.inc] = mymodule
files[mymodule.other.inc] = user
files[includes/mymodule.things.inc] = mymodule
That queues up 4 files that make up the module; 1 is in a subdirectory; 3 are "owned" by mymodule, while one is declared "on behalf of" the user module, thus any hooks in it will be presumed to be hook implementations "by" user module rather than by mymodule. (E.g., node.views.inc, which defines node module's views hooks.)
That kills the module detection code:
I am kinda sorry to lose the fully introspective functionality, but I can see where it is worth the simplification. chx said he'd try to write this after a good night's sleep. :-)
Comment #166
Anonymous (not verified) CreditAttribution: Anonymous commentedthe module detection code is dead! long live the module detection code! ;-)
we don't try to gather the hook implementations at build time anymore, so we don't need or use the module name at registry build time, so the module detection code is already dead.
i like the idea at #164 because it solves the 'more than one module sharing a directory' problem. i think we could get away with just:
files[] = mymodule.module
files[] = mymodule.stuff.inc
files[] = mymodule.other.inc
do we want a module to be able to specify a directory and a regex to pass to file_scan_directory? going to far?
Comment #167
chx CreditAttribution: chx commentedtoo far. yeah, the module detection code is nicely gone, but still the file list is needed to avoid the problem Moshe discovered somewhere earlier, namely if a directory contains a ton o' files belonging to several modules. I indeed go to sleep now, if someone surprises me with coding this come next mornin' great, if not, I will do it.
Comment #168
chx CreditAttribution: chx commentedThis did not need too big changes as our info parser automatically picks up whatever is in info. So I just needed to iterate this new array. I have also written a qucik script which adds the files information to the info files , I will add it to the update documentation, it's really a simple one.
Comment #169
recidive CreditAttribution: recidive commentedIs it really necessary to list the .module file on .info? Aren't the .info file always related to the .module? This way can we always process the .module and skip it from the .info file? Or are we willing to be able to make a 'pseudo module' with just a few .inc files but not a .module? Is it currently doable, i.e. can you enable a module that consists of just a .info file?
Comment #170
chx CreditAttribution: chx commentedSure, you are right.
Comment #171
Crell CreditAttribution: Crell commentedI'd actually be fine with listing the .module file in the .info. It only makes sense for .module to be special if we're going to load it always, which I believe chx said at one point he didn't want to have to do (and neither do I :-) ).
Comment #172
pwolanin CreditAttribution: pwolanin commentedhmm, can we have a module without a .module? On the one hand it sounds like heresy, on the other, it seems natural if the .info file is really the container for key information about what a module (or theme) is composed of.
Comment #173
moshe weitzman CreditAttribution: moshe weitzman commentedseems like a moot point to me. lets just auto-register a .module file if one exists.
Comment #174
Anonymous (not verified) CreditAttribution: Anonymous commentedi'm in favour of being more explicit in the .info file - don't think it costs us much to add the .module file, and it keeps the code a tiny bit dumber, folds more into the data.
as to whether we can have a module without .module - woah, that's for another issue ;-)
Comment #175
chx CreditAttribution: chx commentedOkay modules are back but in .info what's more important, if you run
drupal_rebuild_code_registry
more than once in a page, like simpletest does then now we do not fail -- and simpletests indeed pass._registry_get_resource_name
had a static cache we needed to reset. I am back to presuming that thefiles[]
array in the info files is not empty. That'd be a very weirdo module which has zero files. Currently, having a module is a must because that's how Drupal figures out it's a module but in the future we might want to find .info files instead and put type = Drupal module inside...Comment #176
Anonymous (not verified) CreditAttribution: Anonymous commentedtested the patch at #175, can confirm tests now work as per a clean install.
regarding tests for the registry: we're now waiting for simpletest fixes i mentioned in #163.
perhaps we need should add the tests in a follow up issue? if people are happy with this, then i think this is RTBC.
completely off topic re. static caches - sometimes i wish drupal had a container object for these. functions that need to create a static cache call into the container object, rather than declaring a static var withtin the function body. having to add switches/options to the function signature where we want to clear or fetch the data in a static cache makes me feel dirty. anyway, yeah, i'll go take my medication now...
Comment #177
Anonymous (not verified) CreditAttribution: Anonymous commenteddiscussed this with chx - marking to RTBC, and changing the focus of this follow up issue to track tests for the registry.
Comment #178
Dries CreditAttribution: Dries commentedGreat work all. However, IMO, this patch cannot be committed without some SimpleTests.
Comment #179
chx CreditAttribution: chx commentedNote that http://drupal.org/node/221964#comment-827580 points out our problems with testing, namely includes and unit tests both have open issues. Those are less coding problems than community agreements. This issue will be three months old soon. Please.
Comment #180
Dries CreditAttribution: Dries commentedThere are already tests in
includes/
-- I think SimpleTest should pick these up. As such, I recommend we add aincludes/registry.test
. Decision made. We can always change/move things around later. :-)Given that this patch pre-dates SimpleTest being included in Drupal core, I might be willing to make an exception. That said, I'd rather see us take ourselves serious with regard to testing.
Comment #181
chx CreditAttribution: chx commentedEdit: note that I have edited this comment quite a bit.
a) note that as the 'file' keys are removed from the menu system, everything would die if the registry would be completely broken. b) my unit test plans involving mock functions has been ... let's say postponed . Even then, testing stateful operations, such as scanning for files that exist on disk would be ... problematic. Do you really want to make this patch dependent on writing a unit test framework based on runkit?? Even, do you want to discuss this in the #181+ followup in an issue? Just because I am working hard on this and now this is the first big patch to be ready after simpletest was committed.
Also, quoting myself from earlier
you have no idea how hard it was to hunt this bug which only surfaces w/ simpletest. So I am not playing down simpletest, I worked on making all the tests pass but that's how far I can get now.
Edit: and so another day passed without this committed and I can go to sleep frustrated over this. How many of these am I going to able to take? This is not the first patch I need to fight over and often I feel that if it'd not be me, it'd get in easier.
Comment #182
Anonymous (not verified) CreditAttribution: Anonymous commentedthere's one test file in includes - xmlrpc.inc.test. simpletest code doesn't load it at the moment.
which is a good thing, because loading it via a module test file:
equals WSOD. i've submitted a patch for that - can we get that in?
maybe simpletest picked up non-module files for testing in the past, but it doesn't now. i've submitted a patch to fix that - can we get that in?
so, we can create a registry test file in includes, but it won't be loaded until the fixes above are applied.
Comment #183
Dries CreditAttribution: Dries commentedchx, I apologize for any frustration caused. However, not committing this patch four weeks ago was one of the best things I did. The patch has come a long way since. It would not likely have happened incrementally after the patch has been committed. I'll review this patch once more tomorrow morning. Thanks for your continued efforts! We both know that they'll pay off.
Comment #184
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedAs Dries requested I had a look at the patch. I can't claim to have understood it fully and I didn't have time to test it. However, it somehow feels right.
I've only seen two minor glitches which should not hold up the patch:
1) the db query in drupal_function_exists should have either no aliases at all or the aliases need to be used everywhere
2) I don't like the query builder in registry_cache_path_files(), it should use placeholders in a proper way.
In general, it is a bit sad that we have to implement such a registry at all. PHP should provide something like this.
Comment #185
Anonymous (not verified) CreditAttribution: Anonymous commentedthanks for the review killes.
attached patch removes the errant aliases in drupal_function_exists and uses db_placeholders in registry_cache_path_files.
no other changes.
Comment #186
chx CreditAttribution: chx commented@Dries, four weeks ago, yes, but Sunday? @killes, this is not the first (and the last) time Drupal codes around some weakness of PHP. Remember Steven's critique of PHP. It'd so much nicer if we could use autoload for functions, yes.
Comment #187
owahab CreditAttribution: owahab commentedsubscribing.
Comment #188
Dries CreditAttribution: Dries commentedkilles: PHP does this already (somewhat). It's called op-code caching and is an in-memory cache of all functions.
Comment #189
Dries CreditAttribution: Dries commentedI gave it another full review and it looks good now. The code has come a long way since, and is certainly a ton more readable and easier to understand. Good job everyone -- and thanks for your persistence chx. The patch is committed, and we can tackle incremental improvements in future patches.
Comment #190
Crell CreditAttribution: Crell commentedHere's a quick mention for the change log. w00t!
Comment #191
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. I apologize for not giving Larry/Crell credit in the original commit message. Thanks for the hard work, Larry!
Comment #192
Susurrus CreditAttribution: Susurrus commentedShouldn't this change be added to Converting 6.x modules to 7.x?
Comment #193
Crell CreditAttribution: Crell commentedYes it should. I will go do so. (We should also write proper docs somewhere; where the right location is, I'm not sure.)
Comment #194
Crell CreditAttribution: Crell commentedConversion docs updated.
Comment #195
ax CreditAttribution: ax commentednot good.
module_load_install()
is called bydrupal_load_updates()
for all modules, including disabled ones. so the patch above marks disabled modules enabled inmodule_list()
. not sure about the best way to fix this.Comment #196
Anonymous (not verified) CreditAttribution: Anonymous commented@ax: this code snippet is a left over from previous versions of the registry patch.
the reason given for the snippet - being able to module_invoke something during install - is no longer valid. once a modules install file is loaded, then module_invoke will work for any hooks defined in that file.
attached patch removes the offending snippet from
module_load_install
. the module tests in system still all pass.Comment #197
chx CreditAttribution: chx commentedThis is indeed a leftover of an overoptimization.
Comment #198
Dries CreditAttribution: Dries commentedThanks ax and justin -- good catch. Committed to CVS HEAD.
Comment #199
robertDouglass CreditAttribution: robertDouglass commentedE_ALL warning when enabling modules. Do we need a new issue for this small patch?
Comment #200
robertDouglass CreditAttribution: robertDouglass commentedAlso see http://drupal.org/node/255658
Comment #201
robertDouglass CreditAttribution: robertDouglass commentedNote: the E_ALL errors from #199 first appear when using HEAD devel module. Don't know if it is a devel module bug.
Comment #202
cwgordon7 CreditAttribution: cwgordon7 commentedNo, I suppose it's possible that a module could have no additional files..
Comment #203
chx CreditAttribution: chx commentedBut as you need to list even the module file, this is not a corebug, I confirmed with Robert.
Comment #204
Anonymous (not verified) CreditAttribution: Anonymous commentedAutomatically closed -- issue fixed for two weeks with no activity.
Comment #211
donquixote CreditAttribution: donquixote commentedTwo ideas that are somewhat related to the hook registry concept, and will work for D6
---------
#584024: speed up _theme_process_registry(): get_defined_functions() instead of function_exists()
or better: #519940: Performance: optimize building theme registry by using get_defined_functions() instead of function_exists()
get_defined_functions() is a perfect way to quickly find all implementations of existing or imaginary hooks. It works nicely in D6, and I think it will work in D7 if all files in question are included.
The hooks can be stored in the DB (which is the purpose of the registry, I assume), and they can be kept in an array at runtime, keyed like this:
$hook_implementations[$hook][$suffix][] = $module;
(where $suffix can be an empty string)
The system does not know if something is a hook or not. It simply assumes that any function beginning with a module name should be remembered as a possible hook implementation. This is maybe not the most clever hook system, but it has always been the way that D6 works: It does not care if something has been explicitly defined as a hook.
----
And another one:
#569646: New hook that is called whenever a hook implementation is added, removed or modified.
This would work quite well if combined with the above approach: The hook implementations can be stored in a DB table, and with each rebuild, the system can check if there is anything new. For each hook with new implementations, it can then do:
There can be some bogus data in $add or $remove, if some occasionally included *.inc files contain functions that are recognized as hooks. This should not make a problem, however, because those hooks that matter are all in *.module files, and anything else should usually not try to look like a hook (otherwise it would disturb the old D6 hook system).
One problem to think about:
- module 'x' has a hook implementation 'x_y()' in 'x.inc'
- the module decides at runtime if 'x.inc' should be included or not, to dynamically include or exclude it from calls to hook_y().
This is hard to do with a cached list of hook implementations. I don't know if such modules exist in D6, but theoretically they can.
----
I am not familiar enough with D7, to decide if any of this makes sense for D7. But proposal one is definitely worth it for D6, if only to gain performance.
I don't see that any of this would break the D6 api. It extends the api, but doesn't break it for existing modules.
Comment #212
NancyDruYes, there are modules that do this. Web Links, for example checks for the presence of at least 3 other modules to decide whether or not to include those hook files.
Your "get_defined_functions()" method is interesting, but just checking the name starting with a module name doesn't really tell me that something is a hook. Do you have any way to narrow it down?
Comment #213
donquixote CreditAttribution: donquixote commented@NancyDru
You don't need to know if it's a hook.
If you call
module_invoke_all('bogus')
, then it will not care if "bogus" has ever been explicitly defined as a hook. In fact there is no way to define hooks. Instead, it will just call all functions with $modulename . '_bogus'.The new system would preemptively remember anything that could be a hook implementation, so it will also store $modulename . '_bogus'. Entries for non-existing hooks will just sit in the cache and do nothing.
In fact, one function can be stored several times:
This looks stupid, but it's the way that D6 is supposed to work.
Comment #214
donquixote CreditAttribution: donquixote commentedWhat we would need from a module is a list of possible function names from include files that should be considered as hook implementations, and/or a list of files to include before doing the get_defined_functions. The most robust would be doing it in the module's *.info file.
This would only be necessary for those rare modules where hook includes depend on a dynamic condition. But it is still an API change, which could kill this idea :(
I think for Web Links it would not even be necessary, because the condition (other modules exists or not) is rather static. The hook cache is rebuilt every time you enable/disable a module.
Comment #215
NancyDruWell, actually, in the Site Documentation module, I want to add a list of which modules invoke which hooks. That presupposes that I know what hooks are available so the admin can select from the list.
Comment #216
donquixote CreditAttribution: donquixote commentedWhat about this:
1) the hook cache I proposed does not care if something is a hook.
2) If you want, the system could remember the hook names used with every call to module_invoke_all or similar functions.
3) If you want, Drupal could define whatever rules to allow explicit definition of hooks. These rules are not used in the hook cache system, but can be used in your "Site Documentation" module. To my knowledge, such a system does not exist in D6 yet, but is the purpose of the D7 registry.
4) Information about hooks could be retrieved from drupal.org.
EDIT:
5) Information about hooks in dedicated files in module folders (such as help files). This could include documentation about each hook. On the other hand, a hook is not "owned" by a module, it is a free thing...
6) A list of hook names shipped with the "Site Documentation" module. It would be the module maintainer's task to keep it up to date.
Most of the system will work quite ok (and fast!) with solution (1). You can make a site documentation module, but it will contain a lot of noise.
Solution (2) needs a bit of extra resources, and its only use case would be modules like "Site Documentation".
Solution (3) means extra work for module developers, and again the only use case would be modules like "Site Documentation".
Solution (4) means extra work drupal.org and api documentation, but it doesn't add code complexity or performance penalties. After all, the info would only be retrieved for modules like "Site Documentation".
I doubt if keeping track of explicitly defined hook names is worth the trouble, even if a "Site documentation" module would be nice.
Comment #217
NancyDruI have thought about essentially doing #3, but can't count on any module developers to do it. That's the main reason why this feature has never been implemented even though it is easy to do.
I don't see DO admins including contrib APIs, so #4 won't happen.
Comment #218
donquixote CreditAttribution: donquixote commented(added point 5 and 6 to the above list)
---
True, but anyway, this is not really related to the get_defined_functions -based hook cache idea.
I am very concerned about performance in D6, and would appreciate if this solution would be implemented.
Comment #219
NancyDruThis issue is marked "closed" (#204) so the Drupal maintainers aren't even looking at it.
Comment #220
donquixote CreditAttribution: donquixote commentedso... should I reopen it, or is this considered hijacking?
I think the discussion, if any, should rather happen in the other issue,
http://drupal.org/node/584024
although I get the feeling the other issue is a bit too specific to '_preprocess_' hooks.
Comment #221
Crell CreditAttribution: Crell commentedThis is very much thread hijacking and not the place for this sort of discussion. The registry has already been rolled back and is no longer part of Drupal 7, save for a cache of class/interface definitions. There is no more registry for functions.
For further discussion of reducing code weight, see (but don't take off topic) #557542: Cache module_implements() .
Comment #222
donquixote CreditAttribution: donquixote commentedDamn, I played with the reopen thing but then didn't actually want to do it. But then I didn't switch it back. My apologies.
In the meantime I have found some other discussions which I should have checked first, including the one you are pointing to. And including one older issue I started myself months ago... which I totally forgot. I am duplicating myself!
#519940: Performance: optimize building theme registry by using get_defined_functions() instead of function_exists()
Btw:
This sidetracking would not have happened if someone had added links to the more up-to-date discussions.
- #497118: Remove the registry (for functions)
- #557542: Cache module_implements()
Comment #224
etaroza CreditAttribution: etaroza commentedI'm experiencing a problem, that I can't rebuild the registry before bootstrap and bootstrap is failing because of illegal registry...
The situation is as follows:
I might be moving the modules in a wrong way, but this worked for Drupal 6.x as far as I know. Now the bootstraping mechanism is looking at the stale registry data and I can't do much about it?
I guess I need to rebuild the registry after moving modules around. But how to do that?