This patch introduces a registry of functions, classes and interfaces. What this is good for?

  • No longer need to specify files in hook_menu. It Just Works (TM).
  • Forms can be reused from inc files without manual includes which is necessary in D6.
  • Preprocess functions can live in inc files.
  • module_implements becomes faster and allows us to use a better weighting system later.
  • Most hooks can live in inc files (minus hook_boot, hook_exit and the new hook_hooks)
  • Every page load can have (almost) exactly only the inc files it needs preloaded. This can speed up bootstrap tremendously by not loading unnecessary code.

There is some nice magic in hook_hooks: this registers the dynamic hooks. We cache the implementation of these, for example {$form_id}_form_alter during registry build, so when module_implements is called with system_themes_form_form_alter and there is no implementation cached for it then it knows there is no need to look for implementations of this hook. This speeds up things tremendously for such dynamic hooks -- of which I hope we can introduce a few more during D7.

The registry itself and the parser is Crell's idea and code originally please credit him when committing.

Files: 
CommentFileSizeAuthor
#199 registry-eall.patch719 bytesrobertDouglass
#196 module_load_install.patch848 bytesbeejeebus
#190 registry_changelog.patch805 bytesCrell
#185 registry_46.patch110.03 KBbeejeebus
#175 registry.patch105.28 KBchx
#170 registry.patch102.76 KBchx
#168 registry.patch105.94 KBchx
#163 registry_42.patch95.2 KBbeejeebus
#162 registry.patch94.26 KBchx
#161 registry.patch88.68 KBchx
#160 registry.patch88.97 KBchx
#155 registry_38.patch96.13 KBbeejeebus
#147 registry_37.patch99.36 KBbeejeebus
#146 registry_36.patch96.93 KBbeejeebus
#142 registry.patch95.79 KBchx
#140 registry.patch85.43 KBchx
#139 registry_33.patch97.36 KBbeejeebus
#138 registry.patch95.61 KBchx
#135 registry_33.patch91.56 KBbeejeebus
#126 registry.patch95.6 KBchx
#125 registry.patch95.91 KBchx
#121 registry_29.patch98.86 KBbeejeebus
#109 registry.patch96.38 KBfloretan
#108 registry.patch96.32 KBchx
#96 registry.patch96.29 KBchx
#94 registry.patch96.31 KBchx
#89 registry.patch102.13 KBfloretan
#88 registry.patch95.76 KBchx
#85 registry_22.patch100.51 KBkeith.smith
#84 registry_21.patch95.83 KBbeejeebus
#83 registry.patch94.4 KBchx
#81 registry.patch93.78 KBchx
#77 registry.patch93.58 KBchx
#76 registry.patch93.96 KBchx
#71 registry_lg.patch101.55 KBCrell
#60 mw.patch94.51 KBmoshe weitzman
#59 mw.patch94.95 KBmoshe weitzman
#53 registry.patch88.36 KBchx
#50 registry.patch87.47 KBchx
#49 registry.patch87.47 KBchx
#48 registry.patch86.88 KBchx
#47 registry.patch86.8 KBchx
#44 registry.patch86.8 KBchx
#42 registry.patch28.27 KBchx
#29 registry_9.patch92.9 KBwebernet
#26 registry_8.patch84.3 KBkeith.smith
#23 registry.patch84.29 KBchx
#22 221964.patch36.82 KBdopry
#20 221964.patch29.94 KBdopry
#11 registry.patch28.68 KBchx
#9 registry.patch28.58 KBchx
#8 registry.patch28.36 KBchx
#7 registry.patch28.36 KBchx
#6 registry.patch27.94 KBchx
registry.patch27.96 KBchx

Comments

Crell’s picture

To give some background, the problem with my original registry design was that it hit the database too often. In fact, it ended up hitting the database a few hundred times per request, which is double-plus-ungood. chx solved that by the additional magic in module_implements() which lets Drupal learn about its own hook implementations and cache that knowledge, which not only gives us the lazy-include logic without making the database cry but speeds up module_invoke_all() to boot. The rest is just a logical extension along the same lines.

chx is the one who made the whole thing actually work. :-)

cburschka’s picture

On a cursory reading of the patch, I notice that you are calling module_list with FALSE, TRUE, FALSE, which are the default arguments anyway. From a developer's point of view, to find out what FALSE TRUE FALSE does (no refresh, bootstrap mode, default sort), I'd need to look up the function anyway - so including these arguments explicitly does not add any readability. And since this is core accessing core, we don't need to protect ourselves against sudden changes in the other function.

cburschka’s picture

(Also, I presume that this patch will require the other core modules to implement hook_hooks too, correct?)

catch’s picture

subscribing so I can test a bit later.

chx’s picture

You need hook_hooks only to register dynamic hooks. I am unaware of any other than {$form_id}_form_alter.

chx’s picture

FileSize
27.94 KB

Fixed those module_list calls.

chx’s picture

FileSize
28.36 KB

Supressed hooks for the includes directory so tablesort_init does not fire on hook_init. content_type.inc is now parsed as belonging to node because it is inside the directory 'node'. install files are parsed so drupal_write_record works.

chx’s picture

FileSize
28.36 KB

The module dir setting was wrong for content_type.inc and other similar named files. There is something good here now: if you want to implement a hook in includes, you can, though it'd be awkward a bit: hook_block can be implemented as includes_block.

chx’s picture

FileSize
28.58 KB

module_implements did not load the include files for the cached stuff. Bummer.

Wim Leers’s picture

Subscribing.

chx’s picture

FileSize
28.68 KB

Fixed the definition of hook_form_{$form_id}_alter. Fixed update.php WSOD'ing.

webchick’s picture

I've been testing this a bit, and haven't run into any problems other than when I enabled all modules, I got an error:

"call_user_func_array() [function.call-user-func-array]: First argument is expected to be a valid callback, 'node_node_type' was given in /Users/webchick/Sites/head/includes/module.inc on line 505."

moshe weitzman’s picture

Looks nice from the description and reading the code. I'll give it a try soon.

sinasalek’s picture

Great idea thanks, i will give it a try

profix898’s picture

Subscribing.

cwgordon7’s picture

Subscribing, may be parallel to Drupal pipes...

demeester_roel’s picture

Subscribing.

breyten’s picture

subscribing.

NancyDru’s picture

From what I can see so far this looks really cool. Is there any better documentation on everything that is included here?

dopry’s picture

FileSize
29.94 KB

I tested the patch with Apache 2, PHP 5.2.5, MySQL 5.0 through a clean install and didn't experience any errors. I enabled upload.module, created some vocabularies and created some stories with terms and attachments without an error.

@chx: it would be nice if you could point out some specific things to test.

The concept is sound and saves from from the dynamic include insanity I deal with in D6, and provides automatic class loading. The code style is sound, and documentation is copious.

+1...

The attached patch is re-rolled to remove a debugging line and fix a system_update conflict. I'd say it's really close to RTBC with a few more positive tests.

Crell’s picture

Make sure you have devel installed and are watching the query log. The various caching systems are designed to avoid ballooning the query count, so make sure it's not showing too many queries against the registry table.

To test the auto-load, try moving the class in aggregator into its own file and rebuilding the registry. (Happens on module enable or when the "clear all caches" button is clicked on the Performance page.) Then setup a feed or two and update them. It should load the file with the class and function normally rather then breaking. :-)

That should cover the two main tests at this point. If those work, the rest should be fine.

dopry’s picture

FileSize
36.82 KB

Here is an updated patch which moves class update_xml_parser to a new file to make testing easier.

chx’s picture

FileSize
84.29 KB

Testing is rather simple, no need to move files around -- the file handling code is removed from menu.inc so if Drupal works, admin pages, node submission etc then all is well. I have also removed the file declarations from menu hook. Also included is a $_SERVER['REQUEST_METHOD'] == 'GET' for the caching, and that's good: rarely used included files necessary for form submissions holding lots of rarely running code are no longer cached. This was the only big remaining concern of Crell (please do not forget crediting him for this subsystem) and mine.

I fixed the node_node_type error on the modules screen -- the error was in module_implements and meant that hooks in include files were not included.

Note that this patch is a continuation of my patch, it uses nothing from dopry's -- there was more debug code to be removed and the system update conflict was solved in an unclean way and also the aforementioned big bug was not fixed.

KarenS’s picture

This seems like a great idea. Just to test that I understand this correctly (and to maybe help others new to the issue), are the following correct statements?

1) The files that hold the functions should be named $module.$something.inc, so content.admin.inc would work correctly as an additional registry location for the content module, but there might be a problem using a file named content_admin.inc. In other words, some files may need to be renamed.

2) To implement this in most modules, you would just need to :

a) find and replace if(function_exists('my_function')) {..} with if(registry_check_function('my_function')) {...}.

b) remove any 'file' => 'custom.inc' lines from hook_menu(), since the file name will be discovered automatically.

d) remove 'include_once(custom.inc)' from your code since it won't be needed any more :)

chx’s picture

  1. Naming conventions.
    1. node/content_type.inc belongs to node because it's something.inc residing in a dir called 'node'
    2. node.pages.inc belongs to node module because it's foo.bar.inc and foo == node
    3. node.module belongs to node module because duh
    4. Conversion notes, correct. Feel free to move hook implementations to include files, too because module_implements (and module_invoke_all relying on that) auto includes.
    keith.smith’s picture

    FileSize
    84.3 KB

    New patch attached, with very minor code comment changes (a couple of typos, periods at end of sentences, capitalization, etc.) Absolutely nothing major, and I diff'd it with the patch in #23 to make sure that I didn't inadvertently change any of the code.

    (Please do not credit me on commit.)

    Crell’s picture

    One addition on conversion, some standard conventions are a good idea. Eg, page handlers in module.pages.inc and module.admin.inc as now. Hooks that are called once in a blue moon like hook_menu and hook_theme are "registry hooks", and should probably all end up in module.registry.inc. That sort of refactoring will be done to core after the main patch lands and we can establish some de facto best practices that way.

    Ideally, code should be grouped in such a way that loading one file will get all of the functions a particular action is going to need. This is optimizing by cut and paste. :-)

    webchick’s picture

    @Crell: Hm.

    At a certain point though, I fear that we're going to be causing more of a performance overhead including all of these wacky files everywhere. A commenter on the D6 announcement post pointed out that for *really* high-traffic sites that need to use things like opcode caches, more code in *fewer* files is what you want. Additionally, it makes the module really hard to grok when code is here, there, and everywhere. The current hints in the menu system are nice, because they tell me when I can't find function x_y what file it lives in. This is going away now, so my only resource is to grep and have 11 files open in front of me to figure out how a module works.

    node.pages.inc and node.admin.inc is a nice, logical separation. But if we start having node.registry.inc and node.node-pages.inc and node.node-teaser-pages.inc and node.admin-all-nodes.inc and node.admin-edit-nodes.inc and so on that's a little bit... insane. Let's shoot for a happy medium that gives us the fewest number of files per module with the biggest performance benefit.

    webernet’s picture

    FileSize
    92.9 KB

    Attached patch removes the 'file' lines from forum_menu(). Also fixes a comment, 'enabled', not 'installed' modules.

    Crell’s picture

    @webchick: Absolutely. While this setup could support one-function-per-file, that would be an very stupid way to do it. :-) The advantage here is that we can experiment and find what a good balance would be (registry hooks are simply low-hanging fruit) without having to change the API or the engine itself. Optimize by cut and paste. :-)

    That said, if you simply set your opcode cache to not recheck the file system then it shouldn't matter how many files you have. I also find 3-4 well-organized files much easier to deal with than one 1000 line behemoth. I frequently break modules up already into multiple files with a manual include just to make them more manageable. For a module that only has 4 small functions in it, it's probably best to just leave it all in one .module file. Again, it's fully dynamic so modules can organize themselves in whatever way makes the most sense for them.

    BioALIEN’s picture

    I agree with Crell's statement. Obviously there's a sane limit to how you should break a module's functions. May need to add a "best practice" guide once this lands? Either way, the flexibility is there!

    theborg’s picture

    Subscribing.

    chx’s picture

    So what else is needed to get this in?

    dropcube’s picture

    Subscribing.

    moshe weitzman’s picture

    Status: Needs review » Needs work

    I tested a clean install and really clicked around. No errors.

    When is update_create_registry_tables() supposed to be called? I don't see it in the upgrade path. Once it is there, this is RTBC.

    meba’s picture

    Subscribing

    marcingy’s picture

    This patch seems to work fine and apart from the performance hit on modules page (registry creation) it certainly hasn't had a negative impact on page generation.

    +1

    moshe weitzman’s picture

    We haven't discussed much a big benefit of this patch - we can rework the bootstrap so that we hand over control to the menu handlers when only the required modules are loaded. No longer will we indiscrimitately do a full bootstrap for most requests. This system will carefully load modules when they are needed. This has big implications for performance sensitive callbacks like ajax handlers, ahah handlers, private files handlers, etc. This bootstrap refactoring will be my next patch after this gets in.

    Also, chx mentioned at the top that this paves the way for allowing modules to order hook invocation on a hook by hook basis instead of the crude system.weight we have now. thats nice.

    Wim Leers’s picture

    moshe: darn you! You got me drooling! :D

    Crell’s picture

    @moshe: Yep, that's exactly the end goal. A well-factored module will have nothing in .module aside from hook_boot, hook_exit, hook_hooks, and methods that have to be called directly. (And even some of those can be moved out.) Pretty much everything else can be loaded on-demand. That makes even a BOOTSTRAP_FULL tiny.

    There's still the question of the includes directory, but I suspect we can clean out much of that, too. Phase 2. :-)

    I'm not sure what you mean in #35. Can you clarify?

    chx’s picture

    #35 says, there is a nice update function which creates the registry tables early but the function is not called. Tomorrow I roll a new one.

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    28.27 KB

    So there. When the Database: TNG patch gets in, we can do better with the update code as then the schema will not be in common.inc.

    moshe weitzman’s picture

    Status: Needs review » Needs work

    * need to rename system_update_7000 to system_update_7001
    * even after that, i am getting "Call to undefined function cache_get() in /Users/mw/htd/pr/includes/module.inc on line 427" on update.php

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    86.8 KB

    The previous try was rolled from an older version -- sorry. This one seems working.

    moshe weitzman’s picture

    Status: Needs review » Reviewed & tested by the community

    ok, upgrade path works now. rtbc

    chx’s picture

    Moshe in private raised a concern that we are now INSERTing on every page with the cache_set. The basic assumption is that every GET to a given router path uses the same include files. When this assumption fails, then the safety net is that the cache is incremental, so if there are two sets of includes for a given router path then both will be included always.

    chx’s picture

    FileSize
    86.8 KB

    Looking at the caching code I found a space error. I have edited the patch to fix this so there is absolutely no change just a code style fix.

    chx’s picture

    FileSize
    86.88 KB

    Another reroll. This now really preloads the original_files as I wanted.

    chx’s picture

    Status: Reviewed & tested by the community » Needs review
    FileSize
    87.47 KB

    a) I removed the possibility to implement hooks in includes/ it was silly anyways b) We no longer include information from disabled modules sharing a directory with an enabled one.

    chx’s picture

    FileSize
    87.47 KB

    Moshe said it's not silly and he likes it. Easy to move around the check.

    starbow’s picture

    subscribing

    chx’s picture

    More goodness: module_hook ties into module_implements, reusing its cache. Thus node_invoke now can use this cache to save a few queries from the registry but then we have the problem that what node_hook calls $module is not necessarily a module. I circumvented this by adding recognition code for these pseudo modules into the registry build phase.

    chx’s picture

    FileSize
    88.36 KB

    And the patch.

    moshe weitzman’s picture

    Status: Needs review » Reviewed & tested by the community

    I tested and this is RTBC.

    I found one minor glitch in that fapi often produces registry cache misses that result in a quick query. The calls are:

    registry_check_function($form_id)
    registry_check_function($form_id .'_validate')
    registry_check_function($form_id .'_submit')
    

    Until we have a directory of all $form_id on the site, I can't think of a clean solution. It is such a minor issue that the patch should proceed as is.

    Dries’s picture

    $ diffstat

    38 files changed, 457 insertions(+), 264 deletions(-)
    

    I'm still trying to figure out what the killer advantage is of this patch ... I understand it could potentially speed-up Drupal but do we have any proof of that yet? The advantages listed in the body of the issue are nice, but don't blow me away. I'm a bit sceptic because it doesn't necessarily make the module code easier -- as you can see from diffstat's output, it quite a bit more complex. I'd like to see us unleash more of this patch potential to help us evaluate its usefulness.

    chx’s picture

    We have wanted to make Drupal boot faster. Once this is in we can explore that so much easier -- most of the current boot process can be removed. Also, currently if you want to reuse a form, you need to figure it out first which file it is in. Also, this patch will make it possible to weight the hooks one by one, even relative to each other, which is a much wanted feature.

    As you said it's complex so I would rather not make it even more complex with changing the bootstrap right in this one.

    Crell’s picture

    I just benchmarked the Drupal 6 page split effort: http://www.garfieldtech.com/blog/benchmark-page-split

    Short version: 20% speed improvement for uncached pages. The registry here should let us split off even more code. That's the killer advantage of this patch.

    chx’s picture

    Another good thing is that hooks become really really cheap. Even dynamic hooks. We can add a lot more dynamic hooks here and there once this is in -- we can even add a hook_generic_function_name_alter so that you can do something with every function there is (altering internal data structures to your hearts content) and this costs us extremely little performance-wise... lots of other craziness is opened up with this.

    moshe weitzman’s picture

    FileSize
    94.95 KB

    rerolled to fix 2 broken hunks in system.install

    moshe weitzman’s picture

    FileSize
    94.51 KB

    oops. i had a local change in there. i *think* his one is same as chx' last patch.

    moshe weitzman’s picture

    OK, I experimented tonight in an attempt to demonstrate the bootstrap advantages of the patch. I had some great success. To see for yourself, apply the patch and do this quick test:

    1. Enable all the core modules you feel like testing.Then browse to /admin page.
    2. Go to module_load_all() in module.inc] and replace module_list() with array('filter', 'block').
    3. Refresh /admin page. Yell out "holy shit - my site still works!". The above step cut a gaping wound into the heart of the bootstrap. Only block and filter modules were automatically loaded. The rest are being loaded as needed. Consider the advantage this has once you have 50+ Contrib modules as most sites do now. Thats a lot of code that is not being parsed and loaded into memory.
    4. Click around and your site mostly works. You can add/edit/view nodes, manage posts on admin/content/node, add feeds and retrieve feed items, and so on. You can't do funky stuff like install modules and flush all caches. That will take some refactoring. The error messages you see here are trivial compared to the bootstrap change we just made.

    As I said earlier, this patch lets us hand over control to the menu callback much sooner. This is great for ajax handlers, autocomplete, web services, private file handling, etc.

    I'd love for this patch to be committed before we fully rework the bootstrap. I hope the above demo satisfies the need to show speed.

    NancyDru’s picture

    But, golly, Moshe, that means fewer WSOD's too! We can't have that - how would all those contribs get debugged? ;-)

    pwolanin’s picture

    subscribe

    beejeebus’s picture

    subscribing

    jaydub’s picture

    subscribe

    Dries’s picture

    While I'd still like to see more of this patch potential evaluated, I decided to move forward and to have a first good look at the code.

    My main concern is that there is a lot of stuff going on, and the code is poorly documented. This requires the reviewer to reverse engineer a lot of what is going on. Also, certain function's PHPdoc (i.e. _registry_save_resource()) is out of sync with the declaration of the function.

    Please better document the code.

    robertDouglass’s picture

    And that's how Dries reviews patches while skiing. Subscribing.

    pwolanin’s picture

    Status: Reviewed & tested by the community » Needs work

    A question also regarding files like cache.inc - we allow the file to be included to be determined via a variable setting. Is that going to break here? In other words, this registry code will always include the file includes/cache.inc based on the function names?

    CNW based on #66

    Crell’s picture

    @pwolanin: The registry only pre-loads files that were lazy-loaded in the first place, I believe. As long as the cache system is initialized before the registry is, and the appropriate file is included, the lazy-loader will never kick in and there should be no problem.

    maartenvg’s picture

    I can patch current CVS, but system.install patches with hunk #4 with fuzz 2 (and a few hunks with offsets), and this results in a
    Fatal error: Cannot redeclare system_update_7001() (previously declared in drupal/modules/system/system.install:2702) in drupal/modules/system/system.install on line 2714

    Can someone re-roll the patch, please?

    Crell’s picture

    FileSize
    101.55 KB

    There seems to be a problem with both moshe's and chx's latest versions in the installer. I tried to track it down, but couldn't figure out what the issue is. It seems to be trying to run the schema setup twice, the second time with NULL $schema, and that's breaking horribly.

    That said, the attached patch refactors _registry_save_resource() out into a series of utility functions that make more sense, and increases the patch size by a third by adding a lot more documentation. Hopefully it should be much easier to follow now. It even adds a new registry @defgroup. :-)

    There should be no functional changes in this patch, but I can't confirm that for sure because of the problem with the installer. I'm therefore leaving this as CNW and turning it back over to chx (or whoever else wants to figure out what happened to the installer).

    chx’s picture

    Another nice thing we can do after the registry is in, there is no longer a problem that you can't invoke hooks relatively early -- so you can actually have a hook for variable defaults. w00t w00t.

    Dries’s picture

    Crell: thanks for improving the documentation. I'll schedule another review. Glad to see this moving forward.

    chx and moshe: thanks for posting 'motivations'. I'm sufficiently convinced now. At this point, I'd like to make sure that (i) these motivations are captures in the patch, (ii) that the code is self-explanatory and (iii) that this patch does not introduce bugs (i.e. caching).

    all: please review the code and help make sure that the code is easy to understand. This is a key building block of Drupal, and it is important that it is both elegant and transparent. Your critic eye is required to make this a beautiful piece of design and code. Thanks.

    recidive’s picture

    I did a quick code review. Here are some observations:

    Code documentation formatting is inconsistent, some @param's declare data-types. Parameter descriptions are wrongly indented: some are single spaced, some have two spaces (the correct one) and others have tabs.

    Why the database update function is in update.php instead of system.install? I'm sure there are good reasons for this, I think they should be documented. Also its comment says "If batch table exists, update is not necessary".

    I'll test the patch when I get a chance.

    cwgordon7’s picture

    The last patch seems to be running drupal_rebuild_code_registry() in index.php. I'm pretty sure this is debug code left in there by mistake.

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    93.96 KB

    That was not too hard. As the fallback module_implements uses module_list and during install, the module to be installed is not yet in the list but its install is loaded by module_load_include and then module_invoke is used -- so I added a few lines of code to module_load_include which ensures this works. To balance, I ripped the fallback code from module_invoke it was a) useless b) broken.

    chx’s picture

    FileSize
    93.58 KB

    Somewhere during the rerolls TAB characters got added to my beautiful little patch. Sacrilege! (definitely not me, my editor is set up properly.)

    catch’s picture

    Status: Needs review » Needs work

    Getting a WSOD on a clean install.

    beejeebus’s picture

    WSOD is caused by a rebuild of the registry calling back into code that relies on the registry already being built:

    _registry_parse_directory --> _registry_get_resource_module --> node_get_types --> _node_types_build --> module_invoke_all --> WSOD

    specifically, according to this comment drupal_rebuild_code_registry:

    // Flush the old registry.
    db_query("DELETE FROM {registry}");
    // We can't use module_invoke_all here because it depends on the registry
    // which is currently being rebuilt.
    

    too late for me to fix (3am in sydney), hope this helps someone else get a working patch going.

    chx’s picture

    Well that comment and the surrounding code is outdated, module_invoke_all calls module_implements which has a safeguard for the registry being rebuilt. I'll investigate.

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    93.78 KB

    Calling _registry_get_resource_hook makes the conversion to utility functions and the patch actually work.

    catch’s picture

    Status: Needs review » Needs work

    Ok no WSOD this time, but I'm getting table doesn't exist errors after enabling all modules.

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    94.4 KB

    Instead of trying to figure out which version of module_implements to run we now make it possible to set a "dumb" mode which is basically what module_implements is now. This is set during registry rebuild (as in the previous patches) and during install of modules.

    beejeebus’s picture

    FileSize
    95.83 KB

    w00t! works for me after installing new modules.

    updated patch to get rid of fuzz against HEAD.

    keith.smith’s picture

    FileSize
    100.51 KB

    Attached is a version of the patch in #83 with substantially reworked comments in most places, a few minor typos corrected, and double spaces changed to single spaces between sentences. There should be no functional changes, if I did this correctly.

    (Also, as in #26, please do not credit me on commit.)

    chx’s picture

    moshe weitzman’s picture

    Status: Needs review » Needs work

    I could find only 2 problems
    - right after i installed, i followed a link to node/add and got access denied. that cleared up though after i submitted the modules page
    - i am getting a NOTICE and blank node form the notice is "notice: Undefined index: blog_node_form in /Users/mw/htd/registry/includes/form.inc on line 344.". perhaps the registry is missing those dynamic form ids again.

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    95.76 KB

    As we check for disabled modules, adding node_content to the module list in _registry_parse_directory helped the problem. Also, I added access to the node hooks list. Pseudo modules are a pain. I have grepped for node_invoke in node directory and these are all the hooks. I have updated some of the code comments in node.module so that now you can actually search for every node_invoke and get a list of hooks.

    floretan’s picture

    FileSize
    102.13 KB

    Re-rolled this patch (there was a conflict in common.inc), and fixed a misnamed variable in system_update_7002() that was keeping the update from being successfully executed.

    beejeebus’s picture

    Status: Needs review » Needs work

    tried patches at #88 and #89, both need work.

    - create content menus are missing after fresh install
    - going to node/add/page --> access denied for uid 1
    - installing a module --> WSOD for all pages on the site

    beejeebus’s picture

    this is a nasty hack in module_hook makes the create content menus appear on a fresh install:

    function module_hook($module, $hook) {
      $list = module_implements($hook);
      if ($hook == 'form' || $hook == 'access') {
        $list[] = 'node_content';
      }
      return in_array($module, $list);
    }
    

    the real problem is that there seem to be some namespace hacks around the module names 'node' and 'node_content'. not sure the best way to go here, but currently module_implements doesn't return 'node_content' as a module that implements the 'form' hook.

    also, the current implementation seems to rely on the hook cache too much. to illustrate:
    1. fresh install plus patch
    2. clear the hook row in cache table
    3. visit admin/build/modules --> WSOD

    pwolanin’s picture

    I had a patch to clean up some of the "node_content" stuff at one point, though I'm not sure if it would help here.

    chx’s picture

    Status: Needs work » Needs review

    I went back to #85 enrolled all the fixes, moved the hooks to cache_registry -- note that if the hooks cache entry is not there, the registry gets rebuilt -- and changed node_hook to registry_check_function($module .'_'. $hook); because that's what it is, node_content is not a module.

    chx’s picture

    FileSize
    96.31 KB

    And the patch. The above comment / solution was wrong, the problem with content types was that we did not rebuild the registry after install, there is a comment in install.php which says: Rebuild menu to get content type links registered by the profile, and possibly any other menu items created through the tasks. I added registry there.

    Edit: I realized it was wrong because module_hook deliberately does not check whether $module is really a module. That's enforced only in dumb mode because then we have no other way.

    beejeebus’s picture

    the patch at #94 applies with some fuzz, but all the issues from #90 are fixed.

    creating content works, enabling and disabling modules and themes works.

    i also couldn't get things to fail by deleting the registry or hook cache, or emptying the registry table between page requests.

    nice work!

    chx’s picture

    FileSize
    96.29 KB

    Rerolled against HEAD. Absolutely no changes.

    beejeebus’s picture

    Status: Needs review » Reviewed & tested by the community

    i've spent some more time trying out most of the admin section, enabling all modules, turning caching on and off.

    i think this patch is ready to go.

    pwolanin’s picture

    I'd like to test still - I'm concerned about what happens if (for example) several modules have a .inc file that define the same functions.

    moshe weitzman’s picture

    @pwolanin - there is no new problem with function duplication. this new system just loads less code on every request than the current drupal. if you get a duplicate function error with the new code, you would have certainly gotten it with the existing code.

    Crell’s picture

    Duplicate function names indicate a bug in one or more modules for failure to namespace their code properly. That's true now, and it's true after this patch. No change. :-)

    The only place that's not true is the cache and database systems. Both of those initialize before the registry so the registry wouldn't break on those since it never loads them itself. And the database will not be that way for long if the new Database API patch ever gets in. :-)

    NancyDru’s picture

    I was just working on something that makes me wonder if this new facility will address: With the movement to "compartmentalize" code (which I like), would this allow hook_block code to be split from a module? This could, conceivably, greatly speed up block usage, which I can see slowing down my page rendering.

    pwolanin’s picture

    @Crell - well, in core already is the new password.inc file which may also be substituted based on a variable.

    floretan’s picture

    Status: Reviewed & tested by the community » Needs work

    When running update.php, the system update 7002 generates the following error on the "select updates" page:

    user warning: Table 'drupal7.registry' doesn't exist query: SELECT file FROM registry WHERE name = 'update_script_selection_form_validate' AND type = 'function' in /home/florian/workspace/drupal7/includes/common.inc on line 3911.
    

    The errors don't seem to affect the upgrade, but we can't commit something that will require users to "just ignore" an error while updating their site.

    Crell’s picture

    @nancyw: Yes, if we can compartmentalize block code more than it is now. One of my later goals for Drupal 7, time permitting, is to refactor the way blocks work to be more hook_menu()-ish, with a registry function that specifies callbacks to call for each block. That would allow the registry function, like hook_menu and hook_themes, to live in call-once files ($module.registry.inc) and the callback function to live "wherever", which could be conditionally loaded. However, any block that is displayed would still have to have its code loaded unless it's cached. The D6 block caching system is separate from either the Registry or any block refactoring. (Both the registry and the push for testing benefit from a move to smaller, more self-contained chunks of code, which is an overall API-betterment anyway.)

    @pwolanin: Well, if password.inc is loaded before the registry then there's no problem. If not, that *may* be an issue. My preference, though, is like the database to move these "swappable subsystems" to a factory/object structure like the new database API uses. That provides a single front-end, lazy-code-loading, and no duplicate resource names.

    NancyDru’s picture

    Thanks, Larry. That's what I wanted to hear. I'd rather load 500 bytes than 50,000. (Now, if I can convince people that they don't need comments...)

    Crell’s picture

    *Crell stabs nancyw...

    It's not the file size that's the issue, it's the code compile weight that we're trying to eliminate. Comments have virtually no compile weight. :-)

    pwolanin’s picture

    @Crell - that file is now loaded on demand during user login or user account edit submit. i.e. it would probably be well after the registry step. If it needs to be refactored, you'll have to explain to me how such an object would work and such a switch should be part of this patch.

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    96.32 KB

    The file include part of registry_check_file is now safegaurded against maintenance_mode. I have also simplified the function.

    floretan’s picture

    FileSize
    96.38 KB

    You probably meant defined() (http://us2.php.net/manual/en/function.defined.php) rather than is_defined() which is not a php function. No other changes. update.php runs successfully, if I wasn't making a fix I'd set this as RTBC.

    Mongi’s picture

    Assigned: Unassigned » Mongi

    subscribing so I can test a bit later...

    Mongi Daghbouj
    Malermester

    http://www.holtemalerfirma.dk

    Crell’s picture

    Assigned: Mongi » Unassigned

    Ahem.

    beejeebus’s picture

    Status: Needs review » Needs work

    the patch at #109 applies and cleanly, and i couldn't find any problems from a fresh install.

    however, looking at the code for 'what if' type problems, i tried out a 'what if the process rebuilding the registry fails before completion'. eg, put in a die() at some point after the registry is flushed and before all the new registry rows (over 1385 separate inserts queries on a standard install) are written. or just delete all/half of the registry in between two requests.

    hilarity ensues - pages are randomly inaccessable, and its not obvious how to get the registry cache rebuilt without hackery. so, setting this back to patch needs work. i think we need to rework the registry rebuild so that we don't rely on processes not die()ing during a 1000+ insert run.

    two suggestions:

    - read the new registry data into memory *before* the flush and inserts. still leaves us open, but for a shorter time and we could do batch inserts. might mean larger RAM use for that request though
    - keep track of the files we are parsing, and don't delete registry items that haven't changed (which would be the vast bulk). i think this is the cleanest and fastest way, but would add some complexity.

    thoughts?

    chx’s picture

    Status: Needs work » Reviewed & tested by the community

    I should not set this to RTBC but two people already said it works. justinrandell's problem IMO is a follow up patch, making the registry rebuild more robust.

    Crell’s picture

    Batch inserts will be trivial with the new Database API, as it supports multi-insert statements (using whatever database-specific syntax is appropriate). That is still in development, though.

    File-by-file rebuilding is interesting, and would offer a smaller total-death window, but would still leave the system in an unstable state should the process die halfway through system.module for some reason. Considering how rare a registry rebuild is, I don't know how huge a problem that would be.

    The new Database API is also designed to support transactions where possible. Wrapping the entire delete/rebuild cycle in a transaction would offer far better robustness on databases that support transactions, as the system could (in theory) never be left in an unstable state. That's almost a text-book use case for transactions, actually. That's also pending the Database work.

    So in short, yes it's a problem, but not a huge one, and two of the three ways to ameliorate it are pending on the new Database API. We're still working on it, but it's getting steadily closer! :-) Leaving as RTBC.

    beejeebus’s picture

    i'm happy for this patch to go in, and robustness to be added later in a follow up issue.

    i'll volunteer to create some patches - even though the registry rebuild is rare, the current implementation is internsive on the webserver and the db, which always makes Bad Things more likely.

    carlmcdade’s picture

    subscribing

    pwolanin’s picture

    I discussed with Crell last night - if the patch goes in as now, you will be unable to use an alternate password hashing framework. So, that code will need to be refactored either as part of this patch, or immediately after.

    ksenzee’s picture

    I tested the patch in #109 and it applies with fuzz 2 against HEAD, but it works fine. Creating and deleting users, profiles, and nodes all worked.

    Crell’s picture

    Also as discussed with pwolanin last night, the underlying problem is that the Registry will get confused by pluggable systems that use duplicate function names that initialize after the registry in the bootstrap pipeline. Right now, there is only one such system: the password hashing. (Database and cache initialize before it.) The solution is to tweak the password hashing framework to use a non-duplication-based pluggable mechanism, which is actually quite easy to do in at least two ways (one OOP, one not). Since there are no additional hashing libraries right now AFAIK, that is not a problem right now. Password can be refactored after this goes in as a sample for how to do such pluggable systems. I've already discussed with pwolanin how to do so. Therefore, that is not a blocker on this patch. :-)

    coupet’s picture

    subscribing

    beejeebus’s picture

    Status: Reviewed & tested by the community » Needs review
    FileSize
    98.86 KB

    was testing a soon-to-be-submitted patch that adds don't-parse-a-file-that-hasn't-changed functionality to the registry, i came across this path to a WSOD. at first i thought it was the new code, but it turns out to apply to the patch at #109 as well:

    1. clean install plus patch
    2. enable statistics module
    3. go to performance page and clear all pages --> Fatal error: Call to undefined function drupal_rebuild_code_registry()

    the call to drupal_rebuild_code_registry was triggered by a failed cache lookup in module_implements:

    $cache = cache_get('hooks', 'cache_registry');
    if (!$cache) {
      drupal_rebuild_code_registry();
      $cache = cache_get('hooks', 'cache_registry');
    }
    

    attached patch includes a require call for common.inc in the bootstrap, and makes drupal_rebuild_cache_registry check for file_scan_directory and node_get_type.

    chx’s picture

    Status: Needs review » Needs work

    Loading the whole of common.inc during this stage of bootstrap would be a big performance blow. I will inspect what can we do instead.

    beejeebus’s picture

    @chx: yes, i agree.

    feels a bit like we need a registry.inc.

    Crell’s picture

    Yargh.

    A registry.inc sounds good, as it should be quite small (under 400 lines) and we can load that very very early in the bootstrap process. That will also mean we can rely on it for more of the early bootstrap phase. However, it does need to initialize after the database, since it relies on the database to work. Perhaps it should get its own bootstrap phase, right after the database phase?

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    95.91 KB

    I went back to #109 , moved registry_check_function and registry_mark_function into bootstrap, the registry build functions into a .inc , written a two line loader for that also in bootstrap. I believe this properly fixes the issues. So most of the added new code is now in a separate, on-demand loaded inc.

    chx’s picture

    FileSize
    95.6 KB

    I have refactored registry_check_class and interface into one function and moved everything those to bootstrp too.

    Dries’s picture

    I had another good look at the registry patch. Here are my thoughts:

    * I'd like to understand/measure better how many SQL queries this adds to the average page view. It's not clear how well the caching works in certain situations.

    * One of the things I always really liked about Drupal is that it loads things based on convention. Conventions are a powerful mechanism to avoid expensive database operations (including caching), function body parsing, etc. This patch is a bit of a regression in terms of the use of conventions and, frankly, sheer elegance. While I think the registry is a great idea, I think we should think more about the implementation. My gut feeling tells me we can work on making this more elegant and less intrusive.

    * Since the last review, a lot of progress has been made on the documentation. That was helpful.

    * I found 'registry_check_function' and friends to be poorly named. Based on the function name, I couldn't figure out what this function did. I had to skip forward and read its PHPdoc instead. This is the function that does the actual loading so I'd rename it accordingly. Something like 'registry_load_function()', 'registry_load_code()', or 'registry_include_function_file()' seems more appropriate to me.

    * drupal_rebuild_code_registry() needs more code comments or better named helper functions. It's not clear what is going on. The difference between 'patterns' and 'implementations' is unclear and would benefit from documentation. Also, the calls to module_implements() are awkward.

    * Why can _registry_get_resource_module() return 'includes' -- that looks a bit like a hack to me? From an API's point of view, if it is not a valid module name, it should never be returned by _registry_get_resource_module().

    * I think the word 'resource' is somewhat poorly chosen -- it is too generic and it isn't very descriptive which make the code /flow/ less fluent. From a compiler, linker or loader's point of view, a 'resource' is a 'symbol table entry', or 'symbol' for short. It might be better to use the correct terminology. I know it would have helped me understand things.

    * _registry_get_resource_name() returns FALSE in case the resource is already processed. I found that to be odd. You'd expect it to return FALSE when the resource is not found. It think we should fix that up as it is going to be a source of confusion.

    * "That is, if module foo provides the implementation of hook_example() on behalf of module bar, then the function must reside in foo/bar.something.inc for it to associate with module bar correctly." -- I don't get this. Can you elaborate on that or reformulate that in the code comments?

    chx’s picture

    1. In what situations this is uncertain? We presume GET requests going to a router path are using one (or very few) sets of includes.
    2. Conventions. Is it possible to discover whch functions are used as form callbacks and which are used as menu callbacks? I am a bit unsure here. Drupal is very dynamic. You have hook_forms which can dynamically set a callback. You can run a menu_get_item in hook_init followed by a menu_set_item to change the callbacks runtime. This is not contradictory to the previous presumption -- it just means it's not trivial to find out which sets of includes are needed.
    3. Rename, sure, trivial.
    4. node_content and anything else set by node info is not a module either. Moshe above liked the possibility of hooks in includes.
    5. Rename, sure.
    6. _registry_get_resource_name , i will look into that.
    7. There is more comment there: " In order for a module to provide a hook on behalf of another module, the name of the file containing the implementation must match the module the hook applies to, not the providing module". The example can be enhanced for sure, I agree but this part tells you what's up: when views module implements hook_views_foo for user module, then user_views_foo needs to be in user.XXXXXX.inc because the module part based on the naming convention (we use co) ..inc
    Crell’s picture

    2. This system does rely on convention; specifically, file naming convention. See point #7. We cannot reliably derive a module name from a function name, because have no way of telling whether foo_bar_baz is the baz hook of module foo_bar or the bar_baz hook of module foo. (This was discussed at great length in the original page split thread.) However, with the dot-based file names we can derive the module from that (since module names cannot contain a period). So it is convention-based.

    3. I actually disagree about renaming registry_check_function(). As an API, it's purpose is not to load a function. It is a drop-in replacement for function_exists(). It has two possible return values: No the function isn't callable, Yes the function is callable, and (pause while I load that file) now it is! A calling routine shouldn't know about loading, just "can I call this function, yes or no?" registry_check_class() and registry_check_interface() I can understand renaming, since they are never called directly but only from the autoload mechanism in PHP.

    7. What don't you get? We can revise that text. It should probably also be moved to the @group block so that all of the documentation is centralized.

    moshe weitzman’s picture

    * Why can _registry_get_resource_module() return 'includes' -- that looks a bit like a hack to me? From an API's point of view, if it is not a valid module name, it should never be returned by _registry_get_resource_module().

    The motivation here is to make sane the dumping ground that is system.module. That module implements lots of hooks on behalf of files in the includes directory. For example, hook_elements() is in system.module even though it more properly belongs in form.inc. The code for deleting old temp files is in system_cron() even though it rightfully belongs in file.inc. I would not cry if we lost this feature, but I do think it is a step forward.

    Crell’s picture

    One possibility this approach opens up is to move code from /includes/ to the system module, but NOT system.module. That way we still get easy conditional loading, but it's more inline with Drupal's module-centric approach to things. That's something to deal with in a later follow-up, though.

    beejeebus’s picture

    ... Also, the calls to module_implements() are awkward.

    i completely agree with this. i've been having a hard time with the coupling between module_implements and the registry code. i think we can use the registry to make it simpler, not more complex. some ideas:

    1. keep the registry's bootstrap problems out of module_implements. as far as i can see, the only code that needs a non-registry version of module_implements is the registry code, and the only reason the registry code needs this is to get at hook_hooks. so, instead of adding the 'dumb' switch and more complexity to module_implements, lets create a function called _registry_get_dynamic_hook_patterns or similar, and put whatever the registry needs to get the list of patterns from hook_hooks without module_implements in there.

    2. get rid of of the 'refresh' flag to module_implements. currently, we only refresh the list in core from module_enable and module_disable, both of which should refresh the list by a rebuild of the code registry. i think the registry code should manage caching (per-request and between requests) and resetting of sorted and sorted-by-name implementation lists, and have module_implements as a read-only call into the registry

    (its currently not possible to call into module_implements and get the list for a hook sorted in a different way than that used the first time a hook's list is fetched without refreshing it, so we could fix that as well. and maybe rename the $sort flag to $sort_by_name?)

    3. module_implements would look like this:

    function module_implements($hook, $sort_by_name = FALSE) {
      return registry_get_hook_implementations($hook, $sort_by_name);
    }
    

    4. store the registry data such that it can handle a single resource as a candidate for multiple hooks, so registry_get_hook_implementations can do simple sql for fetching the data and caching like the rest of drupal. leave the pattern matching for dynamic hooks to the registry rebuild code.

    beejeebus’s picture

    1. keep the registry's bootstrap problems out of module_implements. as far as i can see, the only code that needs a non-registry version of module_implements is the registry code, and the only reason the registry code needs this is to get at hook_hooks. so, instead of adding the 'dumb' switch and more complexity to module_implements, lets create a function called _registry_get_dynamic_hook_patterns or similar, and put whatever the registry needs to get the list of patterns from hook_hooks without module_implements in there.

    hmmm, that's wrong. the call to node_get_types leads back into module_implements, so keeping registry rebuild out of module_implements is not so simple.

    chx’s picture

    Right. Drupal is a bit entangled. I would rather see this committed and work on followup patches which make things simpler. It is not a trivial matter, involves lot more than the registry to make it simpler.

    beejeebus’s picture

    FileSize
    91.56 KB

    Right. Drupal is a bit entangled. I would rather see this committed and work on followup patches which make things simpler. It is not a trivial matter, involves lot more than the registry to make it simpler.

    ok, so how about we just make registry 'dumber' about drupal's module concepts? so, the registry parsing pulls out information about a file and its functions, classes and interfaces, and thats it.

    the patch just drops in the lazy-load stuff, and the existing module/pseudo-module code can remain mostly unchanged. attached is a patch based on this, and even with the registry-rebuild now being 'don't parse it if it hasn't changed' aware, it made things much, much simpler.

    i haven't done anything to address Dries' 'how many time does this hit the db' question yet, but that feels like a simpler thing to do than getting this to work right.

    chx’s picture

    You should have the read issue from the beginning. The reason we tie in to module_implements is to lower the number of queries. It's a massive hit otherwise. Check with devel module.

    chx’s picture

    Status: Needs review » Needs work

    I shall reroll tomorrow morning.

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    95.61 KB

    It's a reroll from #126 with a few search-and-replaces: registry_check_function is now drupal_function_exists and there is drupal_autoload_class and drupal_autoload_interface. Hope Dries likes these.

    beejeebus’s picture

    FileSize
    97.36 KB

    WSOD on this patch. update_7004 we redeclared, and there was an errant space in the require_once in drupal_rebuild_code_registry.

    attached patch fixes these issues.

    the path to a WSOD from #121 via enabling statistics and clearing caches is still evident in this patch.

    @chx: any objections to working the 'don't delete unless something has changed' parts of my patch from #135 into this? i'm assuming we both agree on those parts?

    You should have the read issue from the beginning. The reason we tie in to module_implements is to lower the number of queries. It's a massive hit otherwise. Check with devel module.

    yeah, according to my count the patch at #135 added over a hundred extra queries. not nice, and i guess i didn't make my point about not having addressed the db queries well enough. i wrote some naive-but-simple hook candidate collecting and caching, (based soley on a module_list call and the collected function names) and the db hits go away. its as quick according to ab and as easy on the db as the patch attached to this comment.

    i'm still in favour of this general approach, but i'm not going to die in a ditch about it :-)

    chx’s picture

    FileSize
    85.43 KB

    I clarified with Justin that he only fixed a stray space in bootstrap and system_update_7004->7005. To avoid the WSOD I am now loading the necessary files: common.inc for drupal_get_path, node.module for node_get_types and file for file_scan_directory. All three utility functions are standalone and now I do get no errors with statistics enabled and cache_registry truncated.

    beejeebus’s picture

    Status: Needs review » Needs work

    @chx: the new registry.inc is missing from this patch?

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    95.79 KB

    Bah!

    beejeebus’s picture

    Status: Needs review » Needs work

    ok, this patch survives the WSOD on a cache clear when modules that use hook_boot are enabled.

    w00t!

    beejeebus’s picture

    Status: Needs work » Needs review

    woops! didn't mean to set this to code needs work.

    /me runs away before chx finds out...

    beejeebus’s picture

    Status: Needs review » Reviewed & tested by the community

    kicked the tires some more, can't break it, setting to RTBC.

    beejeebus’s picture

    FileSize
    96.93 KB

    new patch to keep up with HEAD, no functional changes.

    beejeebus’s picture

    FileSize
    99.36 KB

    new patch to keep up with HEAD, no functional changes.

    Crell’s picture

    So this patch has been in the "Patch Spotlight" for weeks now. It's now just tracking HEAD. What still needs to be done before we can land this patch and move on to cutting and pasting our way to a faster Drupal? :-)

    dmitrig01’s picture

    In install.inc, you call function_exists instead of drupal_function_exists. Is this intentional? My gut wants me to mark as CNW

    Crell’s picture

    The registry hasn't been built yet in most of install.inc, so drupal_function_exists() and module_implements() don't work yet.

    beejeebus’s picture

    @Crell: i'd like to see the registry only parse files that haven't changed, as per my question in #139, and refactoring to disentangle the registry from drupal's module_* functions. both could be addressed in a follow up patch(es) though, so i guess we need some feedback from chx and dries about this.

    i'd prefer they go in with the initial patch, but i'll defer to their opinions.

    irrespective of either of the above, the next thing this patch needs is unit tests.

    beejeebus’s picture

    Status: Reviewed & tested by the community » Needs work

    setting this to code needs work because applying the registry patch kills simpletest, which makes it kinda hard to write tests for the registry :-(

    will post more when i figure out why.

    Crell’s picture

    I don't see how the registry could be disengaged from module_*(), since part of the point is to be able to lazy-load code rapidly in module_*().

    Breaking simpletests is, um, yeah, a problem. In what way does it break?

    Stefan Nagtegaal’s picture

    Needs a re-roll on my HEAD..
    unfortunatly I could not fix the failing hunks myself..

    beejeebus’s picture

    Status: Needs work » Needs review
    FileSize
    96.13 KB

    I don't see how the registry could be disengaged from module_*(), since part of the point is to be able to lazy-load code rapidly in module_*().

    there are two ways i see the registry requiring knowledge about modules.

    first, at build time, the registry has to know about which modules are enabled to figure out which files to parse, so it has to call into module_list.

    second, at 'runtime' (where we are just reading the registry data for a page request), we want to build some module-aware caches, most importantly a cache of hook implementations.

    we can uncouple the rebuilding part of the registry from everything except module list, if we accept that the registry build should just do one thing well - scan a bunch of files for functions/classes/interfaces and record them.

    at the moment, we are trying to build the hook implementations cache at registry build time, which requires:
    - some icky 'which resource does this module belong to' and 'which hook could this function be a candidate implementation for' code
    - calls from drupal_rebuild_code_registry into module_implements which can call back into drupal_rebuild_code_registry. this is what was killing simpletest - module_implements --> drupal_rebuild_code_registry --> module_implements --> kaboom

    making drupal go faster by putting some caches in front of the registry data doesn't require that we build those caches at registry build time. instead, we can generate the hook implementation cache like the menu-router cache - collect information during the request, cache it at the end of the request.

    the attached patch separates building the hook implementation cache from building the registry and can run simpletests without blowing up. (ab also says its a bit faster than the patch at #147.)

    i've also put the per-file registry build code in, so the registry won't reparse files that haven't changed since the last run. i'm happy to split that out into a separate patch if people don't like the idea 'separate building the registry from its caches' stuff.

    there's more to do (including tests), but i'm setting to code needs review to get some feedback.

    chx’s picture

    Status: Needs review » Needs work

    The general approach is fine. There are a number of details though....

    1. array_key_exists one of the things PHP should not have. Use isset instead, it's almost always the same, unless the array value in question can be NULL which is not the case.
    2. _registry_parse_files needs a restructure, moving things into utility functions, I can't believe we can't make this without an if-else-if-continue structure.
    3. db_query does not need an array made out of its arguments. The Database TNG patch has that so it might be that Crell used that and you copied that -- but still, it's not needed yet.
    4. In drupal_function_exists you are doing a JOIN. I would like to see that gone even at the cost of denormalizing._registry_check_code picks the file from the registry table anyways.registry_cache_path_filessuffers from the same. A JOIN will always be slower than a SELECT from a well-indexed single table.
    5. I can see you are storing empty hook implementations. That's good and makes my dynamic hook cache stuff unneeded.
    6. In module_implements you changed the module_list arguments, why?
    7. I see registry calls in theme.inc. Are you sure?

    All in all: congratulations, very nice cleanup!

    chx’s picture

    I am working on this.

    beejeebus’s picture

    @chx: thanks for the quick review! i should wait for you to post an updated patch?

    1 and 2: ok and ok

    3: not sure what you mean here - i think the problem is that i'm not using proper query placeholders. any problems here are my own, not sure Crell's good name should be brought into it :-)

    4: not sure i agree on denormalising until we get more of an idea of the performance impact. i'd prefer this to be correct first, then lets see what follow up patches can do to take advantage of the registry to speed up drupal, and denormalise when its clear thats the only way to go.

    5: didn't see the need for a special case :-)

    6: i don't understand the current args.

    $list = module_list(FALSE, TRUE, $sort);
    

    first param, refresh, is false, so unless this is the first call in the request, the second param, bootstrap, will just be ignored. and i don't think we want just the bootstrap modules anyway? am i missing something? i'm curious as to how the module list patch will impact on this patch.

    7: that theme call can go, was some left over code from taking the module_load_all call out of _drupal_bootstrap_full to see how drupal would run ;-)

    Crell’s picture

    Regarding item 3: In Drupal 6, you can call db_query() with either of the following forms:

    db_query("SELECT a, b FROM {mytable} WHERE c=%d and d='%s'", 5, 'hello');
    
    db_query("SELECT a, b FROM {mytable} WHERE c=%d and d='%s'", array(5, 'hello'));
    

    In the DB TNG patch, if I can get that damned thing finished :-), the first form is going away and the second is replaced by, preferably, an associative array of named placeholders. (Unnamed placeholders will still work for db_query()). I have therefore started getting in the habit of using the array form already. Eventually it will have to be an array, so I changed that at some point during cleanup to be more forward-looking. For right now, there's no real difference between the two.

    chx’s picture

    FileSize
    88.97 KB

    I do not want to hold up the issue -- this patch does not work at this moment, but it should be close. I did the denormalization and the reorganization. The registry table ends up empty, I wonder why.

    chx’s picture

    Status: Needs work » Needs review
    FileSize
    88.68 KB

    Oh. Figured it out.

    chx’s picture

    FileSize
    94.26 KB

    plus registry.inc

    beejeebus’s picture

    FileSize
    95.2 KB

    @chx: nice job simplifying the code.

    attached patch with some corrected/extra docs only - no functional changes.

    i'd like to add a registry.inc.test file with some unit tests, but currently run DrupalUnitTestCases are broken and the test framework doesn't run non-module tests.

    pwolanin’s picture

    Maybe this is or a later patch, but in the registry build part, why not just require that the module .info file specify the list of all files that are to be scanned? Otherwise I think you may need broader API/doc changes to indicate that an active and non-active module may not share the same directory, right?

    The current code (for example) would seem not to allow a 3rd party library with a *.php name, right?

    foreach (file_scan_directory('includes', '\.(inc|module|install)$') as $filename => $file) {
    
    Crell’s picture

    Hm. It took a bit of thinking and discussion with chx to decide if #164 is a very bad idea or a brilliant idea. We settled on brilliant. :-) It would put more work on the module developer (albeit not much), but hugely simplify the parsing code. To wit:

    files[mymodule.module] = mymodule
    files[mymodule.stuff.inc] = mymodule
    files[mymodule.other.inc] = user
    files[includes/mymodule.things.inc] = mymodule

    That queues up 4 files that make up the module; 1 is in a subdirectory; 3 are "owned" by mymodule, while one is declared "on behalf of" the user module, thus any hooks in it will be presumed to be hook implementations "by" user module rather than by mymodule. (E.g., node.views.inc, which defines node module's views hooks.)

    That kills the module detection code:

    // get info files
    foreach ($info_file as $module => $files) {
      foreach ($files as $file => $owner) {
        // parse the file and save it, using $owner instead of a detected module from the file name.
      }
    }
    

    I am kinda sorry to lose the fully introspective functionality, but I can see where it is worth the simplification. chx said he'd try to write this after a good night's sleep. :-)

    beejeebus’s picture

    That kills the module detection code:

    the module detection code is dead! long live the module detection code! ;-)

    we don't try to gather the hook implementations at build time anymore, so we don't need or use the module name at registry build time, so the module detection code is already dead.

    i like the idea at #164 because it solves the 'more than one module sharing a directory' problem. i think we could get away with just:

    files[] = mymodule.module
    files[] = mymodule.stuff.inc
    files[] = mymodule.other.inc

    The current code (for example) would seem not to allow a 3rd party library with a *.php name, right?

    do we want a module to be able to specify a directory and a regex to pass to file_scan_directory? going to far?

    chx’s picture

    too far. yeah, the module detection code is nicely gone, but still the file list is needed to avoid the problem Moshe discovered somewhere earlier, namely if a directory contains a ton o' files belonging to several modules. I indeed go to sleep now, if someone surprises me with coding this come next mornin' great, if not, I will do it.

    chx’s picture

    FileSize
    105.94 KB

    This did not need too big changes as our info parser automatically picks up whatever is in info. So I just needed to iterate this new array. I have also written a qucik script which adds the files information to the info files , I will add it to the update documentation, it's really a simple one.

    recidive’s picture

    Is it really necessary to list the .module file on .info? Aren't the .info file always related to the .module? This way can we always process the .module and skip it from the .info file? Or are we willing to be able to make a 'pseudo module' with just a few .inc files but not a .module? Is it currently doable, i.e. can you enable a module that consists of just a .info file?

    chx’s picture

    FileSize
    102.76 KB

    Sure, you are right.

    Crell’s picture

    I'd actually be fine with listing the .module file in the .info. It only makes sense for .module to be special if we're going to load it always, which I believe chx said at one point he didn't want to have to do (and neither do I :-) ).

    pwolanin’s picture

    hmm, can we have a module without a .module? On the one hand it sounds like heresy, on the other, it seems natural if the .info file is really the container for key information about what a module (or theme) is composed of.

    moshe weitzman’s picture

    seems like a moot point to me. lets just auto-register a .module file if one exists.

    beejeebus’s picture

    i'm in favour of being more explicit in the .info file - don't think it costs us much to add the .module file, and it keeps the code a tiny bit dumber, folds more into the data.

    as to whether we can have a module without .module - woah, that's for another issue ;-)

    chx’s picture

    FileSize
    105.28 KB

    Okay modules are back but in .info what's more important, if you run drupal_rebuild_code_registry more than once in a page, like simpletest does then now we do not fail -- and simpletests indeed pass. _registry_get_resource_name had a static cache we needed to reset. I am back to presuming that the files[] array in the info files is not empty. That'd be a very weirdo module which has zero files. Currently, having a module is a must because that's how Drupal figures out it's a module but in the future we might want to find .info files instead and put type = Drupal module inside...

    beejeebus’s picture

    tested the patch at #175, can confirm tests now work as per a clean install.

    regarding tests for the registry: we're now waiting for simpletest fixes i mentioned in #163.

    perhaps we need should add the tests in a follow up issue? if people are happy with this, then i think this is RTBC.

    completely off topic re. static caches - sometimes i wish drupal had a container object for these. functions that need to create a static cache call into the container object, rather than declaring a static var withtin the function body. having to add switches/options to the function signature where we want to clear or fetch the data in a static cache makes me feel dirty. anyway, yeah, i'll go take my medication now...

    beejeebus’s picture

    Status: Needs review » Reviewed & tested by the community

    discussed this with chx - marking to RTBC, and changing the focus of this follow up issue to track tests for the registry.

    Dries’s picture

    Great work all. However, IMO, this patch cannot be committed without some SimpleTests.

    chx’s picture

    Note that http://drupal.org/node/221964#comment-827580 points out our problems with testing, namely includes and unit tests both have open issues. Those are less coding problems than community agreements. This issue will be three months old soon. Please.

    Dries’s picture

    There are already tests in includes/ -- I think SimpleTest should pick these up. As such, I recommend we add a includes/registry.test. Decision made. We can always change/move things around later. :-)

    Given that this patch pre-dates SimpleTest being included in Drupal core, I might be willing to make an exception. That said, I'd rather see us take ourselves serious with regard to testing.

    chx’s picture

    Edit: note that I have edited this comment quite a bit.

    a) note that as the 'file' keys are removed from the menu system, everything would die if the registry would be completely broken. b) my unit test plans involving mock functions has been ... let's say postponed . Even then, testing stateful operations, such as scanning for files that exist on disk would be ... problematic. Do you really want to make this patch dependent on writing a unit test framework based on runkit?? Even, do you want to discuss this in the #181+ followup in an issue? Just because I am working hard on this and now this is the first big patch to be ready after simpletest was committed.
    Also, quoting myself from earlier

    , if you run drupal_rebuild_code_registry more than once in a page, like simpletest does then now we do not fail -- and simpletests indeed pass

    you have no idea how hard it was to hunt this bug which only surfaces w/ simpletest. So I am not playing down simpletest, I worked on making all the tests pass but that's how far I can get now.

    Edit: and so another day passed without this committed and I can go to sleep frustrated over this. How many of these am I going to able to take? This is not the first patch I need to fight over and often I feel that if it'd not be me, it'd get in easier.

    beejeebus’s picture

    There are already tests in includes/ -- I think SimpleTest should pick these up. As such, I recommend we add a includes/registry.test. Decision made. We can always change/move things around later. :-)

    there's one test file in includes - xmlrpc.inc.test. simpletest code doesn't load it at the moment.

    which is a good thing, because loading it via a module test file:

    [justin@chavez clean]$ tail --lines=+2 includes/xmlrpc.inc.test >> modules/node/node.test
    

    equals WSOD. i've submitted a patch for that - can we get that in?

    maybe simpletest picked up non-module files for testing in the past, but it doesn't now. i've submitted a patch to fix that - can we get that in?

    so, we can create a registry test file in includes, but it won't be loaded until the fixes above are applied.

    Dries’s picture

    chx, I apologize for any frustration caused. However, not committing this patch four weeks ago was one of the best things I did. The patch has come a long way since. It would not likely have happened incrementally after the patch has been committed. I'll review this patch once more tomorrow morning. Thanks for your continued efforts! We both know that they'll pay off.

    killes@www.drop.org’s picture

    As Dries requested I had a look at the patch. I can't claim to have understood it fully and I didn't have time to test it. However, it somehow feels right.

    I've only seen two minor glitches which should not hold up the patch:

    1) the db query in drupal_function_exists should have either no aliases at all or the aliases need to be used everywhere

    2) I don't like the query builder in registry_cache_path_files(), it should use placeholders in a proper way.

    In general, it is a bit sad that we have to implement such a registry at all. PHP should provide something like this.

    beejeebus’s picture

    FileSize
    110.03 KB

    thanks for the review killes.

    attached patch removes the errant aliases in drupal_function_exists and uses db_placeholders in registry_cache_path_files.

    no other changes.

    chx’s picture

    @Dries, four weeks ago, yes, but Sunday? @killes, this is not the first (and the last) time Drupal codes around some weakness of PHP. Remember Steven's critique of PHP. It'd so much nicer if we could use autoload for functions, yes.

    owahab’s picture

    subscribing.

    Dries’s picture

    killes: PHP does this already (somewhat). It's called op-code caching and is an in-memory cache of all functions.

    Dries’s picture

    Status: Reviewed & tested by the community » Fixed

    I gave it another full review and it looks good now. The code has come a long way since, and is certainly a ton more readable and easier to understand. Good job everyone -- and thanks for your persistence chx. The patch is committed, and we can tackle incremental improvements in future patches.

    Crell’s picture

    Status: Fixed » Reviewed & tested by the community
    FileSize
    805 bytes

    Here's a quick mention for the change log. w00t!

    Dries’s picture

    Status: Reviewed & tested by the community » Fixed

    Committed to CVS HEAD. I apologize for not giving Larry/Crell credit in the original commit message. Thanks for the hard work, Larry!

    Susurrus’s picture

    Shouldn't this change be added to Converting 6.x modules to 7.x?

    Crell’s picture

    Yes it should. I will go do so. (We should also write proper docs somewhere; where the right location is, I'm not sure.)

    Crell’s picture

    Conversion docs updated.

    ax’s picture

    Status: Fixed » Needs work
    @@ -231,7 +231,14 @@ function module_load_install(
       // Make sure the installation API is available
       include_once './includes/install.inc';
     
    -  module_load_include('install', $module);
    +  $file = module_load_include('install', $module);
    +  // Ensure that you can module_invoke something inside the newly-loaded
    +  // file during install.
    +  $module_list = module_list();
    +  if (!isset($module_list[$module])) {
    +    $module_list[$module]['filename'] = $file;
    +    module_list(TRUE, FALSE, FALSE, $module_list);
    +  }
     }
     
     /**
    
    

    not good. module_load_install() is called by drupal_load_updates() for all modules, including disabled ones. so the patch above marks disabled modules enabled in module_list(). not sure about the best way to fix this.

    beejeebus’s picture

    Status: Needs work » Needs review
    FileSize
    848 bytes

    @ax: this code snippet is a left over from previous versions of the registry patch.

    the reason given for the snippet - being able to module_invoke something during install - is no longer valid. once a modules install file is loaded, then module_invoke will work for any hooks defined in that file.

    attached patch removes the offending snippet from module_load_install. the module tests in system still all pass.

    chx’s picture

    Status: Needs review » Reviewed & tested by the community

    This is indeed a leftover of an overoptimization.

    Dries’s picture

    Status: Reviewed & tested by the community » Fixed

    Thanks ax and justin -- good catch. Committed to CVS HEAD.

    robertDouglass’s picture

    FileSize
    719 bytes

    E_ALL warning when enabling modules. Do we need a new issue for this small patch?

    robertDouglass’s picture

    robertDouglass’s picture

    Note: the E_ALL errors from #199 first appear when using HEAD devel module. Don't know if it is a devel module bug.

    cwgordon7’s picture

    No, I suppose it's possible that a module could have no additional files..

    chx’s picture

    But as you need to list even the module file, this is not a corebug, I confirmed with Robert.

    Anonymous’s picture

    Status: Fixed » Closed (fixed)

    Automatically closed -- issue fixed for two weeks with no activity.

    donquixote’s picture

    Two ideas that are somewhat related to the hook registry concept, and will work for D6

    ---------

    #584024: speed up _theme_process_registry(): get_defined_functions() instead of function_exists()
    or better: #519940: Performance: optimize building theme registry by using get_defined_functions() instead of function_exists()
    get_defined_functions() is a perfect way to quickly find all implementations of existing or imaginary hooks. It works nicely in D6, and I think it will work in D7 if all files in question are included.

    The hooks can be stored in the DB (which is the purpose of the registry, I assume), and they can be kept in an array at runtime, keyed like this:
    $hook_implementations[$hook][$suffix][] = $module;
    (where $suffix can be an empty string)

    The system does not know if something is a hook or not. It simply assumes that any function beginning with a module name should be remembered as a possible hook implementation. This is maybe not the most clever hook system, but it has always been the way that D6 works: It does not care if something has been explicitly defined as a hook.

    ----

    And another one:
    #569646: New hook that is called whenever a hook implementation is added, removed or modified.
    This would work quite well if combined with the above approach: The hook implementations can be stored in a DB table, and with each rebuild, the system can check if there is anything new. For each hook with new implementations, it can then do:

    <?php
    $hook = '...';         // name of the hook with a modified list of implementations
    $update = array(...);  // hook implementations in modules with a modified version number,
                           //   or where an update is explicitly requested.
    $add = array(...);     // added hook implementations
    $remove = array(...);  // removed hook implementations
    foreach ($hook_implementations['update_hook_implementations']['__' . $hook] as $module) {
      $function = $module . '_update_hook_implementations__' . $hook;
      $function($update, $add, $remove);
    }
    ?>

    There can be some bogus data in $add or $remove, if some occasionally included *.inc files contain functions that are recognized as hooks. This should not make a problem, however, because those hooks that matter are all in *.module files, and anything else should usually not try to look like a hook (otherwise it would disturb the old D6 hook system).

    One problem to think about:
    - module 'x' has a hook implementation 'x_y()' in 'x.inc'
    - the module decides at runtime if 'x.inc' should be included or not, to dynamically include or exclude it from calls to hook_y().

    This is hard to do with a cached list of hook implementations. I don't know if such modules exist in D6, but theoretically they can.
    ----

    I am not familiar enough with D7, to decide if any of this makes sense for D7. But proposal one is definitely worth it for D6, if only to gain performance.

    I don't see that any of this would break the D6 api. It extends the api, but doesn't break it for existing modules.

    NancyDru’s picture

    One problem to think about:
    - module 'x' has a hook implementation 'x_y()' in 'x.inc'
    - the module decides at runtime if 'x.inc' should be included or not, to dynamically include or exclude it from calls to hook_y().

    This is hard to do with a cached list of hook implementations. I don't know if such modules exist in D6, but theoretically they can.

    Yes, there are modules that do this. Web Links, for example checks for the presence of at least 3 other modules to decide whether or not to include those hook files.

    Your "get_defined_functions()" method is interesting, but just checking the name starting with a module name doesn't really tell me that something is a hook. Do you have any way to narrow it down?

    donquixote’s picture

    @NancyDru
    You don't need to know if it's a hook.

    If you call module_invoke_all('bogus'), then it will not care if "bogus" has ever been explicitly defined as a hook. In fact there is no way to define hooks. Instead, it will just call all functions with $modulename . '_bogus'.

    The new system would preemptively remember anything that could be a hook implementation, so it will also store $modulename . '_bogus'. Entries for non-existing hooks will just sit in the cache and do nothing.

    In fact, one function can be stored several times:

    // function name was 'x_y_z'.
    $hook_implementations['y_z'][''][] = 'x';  // module 'x' implements hook 'y_z'.
    $hook_implementations['y']['_z'][] = 'x';  // module 'x' implements hook 'y', with suffix '_z'.
    $hook_implementations['z'][''][] = 'x_y';  // module 'x_y' implements hook 'z'.
    

    This looks stupid, but it's the way that D6 is supposed to work.

    donquixote’s picture

    One problem to think about:
    - module 'x' has a hook implementation 'x_y()' in 'x.inc'
    - the module decides at runtime if 'x.inc' should be included or not, to dynamically include or exclude it from calls to hook_y().

    This is hard to do with a cached list of hook implementations. I don't know if such modules exist in D6, but theoretically they can.

    Yes, there are modules that do this. Web Links, for example checks for the presence of at least 3 other modules to decide whether or not to include those hook files.

    What we would need from a module is a list of possible function names from include files that should be considered as hook implementations, and/or a list of files to include before doing the get_defined_functions. The most robust would be doing it in the module's *.info file.

    ; files other than *.module that contain hook implementations,
    ; and that are not included by default in *.module
    hook_includes[] = includefile.inc
    hook_includes[] = includefile2.inc
    
    ; hook implementations that are neither in *.module,
    ; nor in any of the files listed in hook_includes,
    ; nor in any of the files included by default in *.module
    hook_functions[] = modulename_hook1
    hook_functions[] = modulename_hook2
    

    This would only be necessary for those rare modules where hook includes depend on a dynamic condition. But it is still an API change, which could kill this idea :(

    I think for Web Links it would not even be necessary, because the condition (other modules exists or not) is rather static. The hook cache is rebuilt every time you enable/disable a module.

    NancyDru’s picture

    You don't need to know if it's a hook.

    Well, actually, in the Site Documentation module, I want to add a list of which modules invoke which hooks. That presupposes that I know what hooks are available so the admin can select from the list.

    donquixote’s picture

    What about this:
    1) the hook cache I proposed does not care if something is a hook.
    2) If you want, the system could remember the hook names used with every call to module_invoke_all or similar functions.
    3) If you want, Drupal could define whatever rules to allow explicit definition of hooks. These rules are not used in the hook cache system, but can be used in your "Site Documentation" module. To my knowledge, such a system does not exist in D6 yet, but is the purpose of the D7 registry.
    4) Information about hooks could be retrieved from drupal.org.

    EDIT:
    5) Information about hooks in dedicated files in module folders (such as help files). This could include documentation about each hook. On the other hand, a hook is not "owned" by a module, it is a free thing...
    6) A list of hook names shipped with the "Site Documentation" module. It would be the module maintainer's task to keep it up to date.

    Most of the system will work quite ok (and fast!) with solution (1). You can make a site documentation module, but it will contain a lot of noise.
    Solution (2) needs a bit of extra resources, and its only use case would be modules like "Site Documentation".
    Solution (3) means extra work for module developers, and again the only use case would be modules like "Site Documentation".
    Solution (4) means extra work drupal.org and api documentation, but it doesn't add code complexity or performance penalties. After all, the info would only be retrieved for modules like "Site Documentation".

    I doubt if keeping track of explicitly defined hook names is worth the trouble, even if a "Site documentation" module would be nice.

    NancyDru’s picture

    I have thought about essentially doing #3, but can't count on any module developers to do it. That's the main reason why this feature has never been implemented even though it is easy to do.

    I don't see DO admins including contrib APIs, so #4 won't happen.

    donquixote’s picture

    (added point 5 and 6 to the above list)

    ---

    True, but anyway, this is not really related to the get_defined_functions -based hook cache idea.
    I am very concerned about performance in D6, and would appreciate if this solution would be implemented.

    NancyDru’s picture

    This issue is marked "closed" (#204) so the Drupal maintainers aren't even looking at it.

    donquixote’s picture

    Status: Closed (fixed) » Active

    so... should I reopen it, or is this considered hijacking?
    I think the discussion, if any, should rather happen in the other issue,
    http://drupal.org/node/584024
    although I get the feeling the other issue is a bit too specific to '_preprocess_' hooks.

    Crell’s picture

    Status: Active » Closed (fixed)

    This is very much thread hijacking and not the place for this sort of discussion. The registry has already been rolled back and is no longer part of Drupal 7, save for a cache of class/interface definitions. There is no more registry for functions.

    For further discussion of reducing code weight, see (but don't take off topic) #557542: Cache module_implements() .

    donquixote’s picture

    Damn, I played with the reopen thing but then didn't actually want to do it. But then I didn't switch it back. My apologies.

    In the meantime I have found some other discussions which I should have checked first, including the one you are pointing to. And including one older issue I started myself months ago... which I totally forgot. I am duplicating myself!
    #519940: Performance: optimize building theme registry by using get_defined_functions() instead of function_exists()

    Btw:
    This sidetracking would not have happened if someone had added links to the more up-to-date discussions.
    - #497118: Remove the registry (for functions)
    - #557542: Cache module_implements()

    etaroza’s picture

    Issue tags: +bootstrap

    I'm experiencing a problem, that I can't rebuild the registry before bootstrap and bootstrap is failing because of illegal registry...

    The situation is as follows:

    • I've got all the modules in sites/all/modules
    • Then I move those modules to sites/all/modules/_contrib
    • The site doesn't load because of some *.inc files aren't found where the root of the problem is at bootstrap.inc

    I might be moving the modules in a wrong way, but this worked for Drupal 6.x as far as I know. Now the bootstraping mechanism is looking at the stale registry data and I can't do much about it?

    I guess I need to rebuild the registry after moving modules around. But how to do that?