This is a commentary on the process Drupal goes through when serving a page. For convenience, we will choose the following URL, which asks Drupal to display the first node for us. (A node is a thing, usually a web page.)
A visual companion to this narration can be found here; you may want to print it out and follow along. Before we start, let's dissect the URL. I'm running on an OS X machine, so the site I'm serving lives at /Users/vandyk/Sites/. The drupal directory contains a checkout of the latest Drupal CVS tree. It looks like this:
CHANGELOG.txt cron.php CVS/ database/ favicon.ico includes/ index.php INSTALL.txt LICENSE.txt MAINTAINERS.txt misc/ modules/ phpinfo.php scripts/ themes/ tiptoe.txt update.php xmlrpc.php
So the URL above will be requesting the root directory / of the Drupal site. Apache translates that into index.php. One variable/value pair is passed along with the request: the variable 'q' is set to the value 'node/1'.
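To make the dissection concrete, here is a minimal sketch of how the q variable could be pulled out of such a URL with PHP's standard parse_url() and parse_str() functions. The host name example.com is a placeholder and not part of the original setup; during a real request PHP has already done this work and exposes the value as $_GET['q'].

```php
<?php
// Hypothetical full URL: only the path and query string come from the text.
$url = 'http://example.com/~vandyk/drupal/?q=node/1';

// Split the URL into components, then decode the query string into an array.
$parts = parse_url($url);
parse_str($parts['query'], $query);

echo $parts['path'], "\n";  // /~vandyk/drupal/  (the directory Apache maps to index.php)
echo $query['q'], "\n";     // node/1            (the Drupal path)
```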
So, let's pick up the show with the execution of index.php, which looks very simple and is only a few lines long.
Let's take a broad look at what happens during the execution of index.php. First, the includes/bootstrap.inc file is included, bringing in all the functions that are necessary to get Drupal's machinery up and running. There's a call to drupal_page_header(), which starts a timer, sets up caching, and notifies interested modules that the request is beginning. Next, the includes/common.inc file is included, giving access to a wide variety of utility functions such as path formatting functions, form generation and validation, etc. The call to fix_gpc_magic() is there to check on the status of PHP "magic quotes" and to ensure that all escaped quotes enter Drupal's database consistently. Drupal then builds its navigation menu and sets the variable $status to the result of that operation. In the switch statement, Drupal checks for cases in which a Not Found or Access Denied message needs to be generated. Finally, there is a call to drupal_page_footer(), which notifies all interested modules that the request is ending. Drupal closes up shop and the page is served. Simple, eh?
Let's delve a little more deeply into the process outlined above.
The first line of index.php includes the includes/bootstrap.inc file, and that inclusion also executes code towards the end of bootstrap.inc. First, it destroys any previous variable named $conf. Next, it calls conf_init(). This function allows Drupal to use site-specific configuration files, if it finds them. The name of the site-specific configuration file is based on the hostname of the server, as reported by PHP. conf_init() returns the name of the site-specific configuration file; if no site-specific configuration file is found, it sets the variable $config equal to the string $confdir/default. Next, it includes the named configuration file. Thus, in the default case it will include sites/default/settings.php. The code in conf_init() would be easier to understand if its variables had more descriptive names: $file could be renamed, and $conf_filename would be a better choice for the variable that holds the configuration file name.
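The fallback behaviour described here can be sketched in a few lines. This is not Drupal's conf_init(): the function name conf_dir_sketch(), the $existing list (which stands in for checks against the filesystem) and the example.com host are all assumptions made for illustration, and the real function may try more host name variants than this.

```php
<?php
// Sketch only: pick a configuration directory from the host name and
// fall back to "default" when no site-specific settings file exists.
function conf_dir_sketch(string $host, array $existing, string $confdir = 'sites'): string {
  $candidate = "$confdir/$host";
  if (in_array("$candidate/settings.php", $existing, true)) {
    return $candidate;
  }
  return "$confdir/default";
}

// Hypothetical set of settings files present on disk.
$existing = ['sites/example.com/settings.php', 'sites/default/settings.php'];

echo conf_dir_sketch('example.com', $existing), "/settings.php\n";  // sites/example.com/settings.php
echo conf_dir_sketch('localhost', $existing), "/settings.php\n";    // sites/default/settings.php
```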
The selected configuration file (normally /sites/default/settings.php) is now parsed, setting the $db_url variable, the optional $db_prefix variable, the $base_url for the website, and the $languages array, which has a default value.
The database.inc file is now parsed, with the primary goal of initializing a connection to the database. If MySQL is being used, the database.mysql.inc file is brought in. Although global variables such as $db_url are set, the most useful result of parsing database.inc is a global variable called $active_db, which contains the database connection handle.
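As a rough illustration of how a connection string like $db_url can drive the choice of database include, here is a sketch that takes the string apart with PHP's parse_url(). The URL format and every value in it are assumptions for the example; no connection is opened, and this is not the code in database.inc.

```php
<?php
// Hypothetical connection string: all credentials and names are placeholders.
$db_url = 'mysql://username:password@localhost/database';

$url = parse_url($db_url);
$db_type = $url['scheme'];             // "mysql" selects the MySQL-specific include
$include = "database.$db_type.inc";
$db_name = ltrim($url['path'], '/');   // database name without the leading slash

echo "$include\n";
echo "connect to {$url['host']} as {$url['user']}, database $db_name\n";
```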
Now that the database connection is set up, it's time to start a session by including the includes/session.inc file. Oddly, in this include file the executable code is located at the top of the file instead of the bottom. The code tells PHP to use Drupal's own session storage functions (located in this file) instead of the default PHP session code. A call to PHP's session_start() function thus ends up calling Drupal's session functions, including sess_read(). The sess_read() function creates a global $user object and sets the $user->roles array appropriately. Since I am running as an anonymous user, the $user->roles array contains a single entry.
We have a database connection, a session has been set up... now it's time to get things set up for modules. The includes/module.inc file is included but no actual code is executed.
The last thing bootstrap.inc does is to set up the global variable $conf, an array of configuration options. It does this by calling the variable_init() function. If a per-site configuration file exists and has already populated the $conf variable, this populated array is passed in to variable_init(). Otherwise, the $conf variable is null and an empty array is passed in. In both cases, a populated array of name-value pairs is returned and assigned to the global $conf variable, where it will live for the duration of this request. It should be noted that name-value pairs in the per-site configuration file have precedence over name-value pairs retrieved from the "variable" table by variable_init().
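The precedence rule is easy to show in miniature. In this sketch, variable_init_sketch() is a made-up stand-in for variable_init(), $db_variables plays the role of the rows read from the "variable" table, and site_name is a hypothetical variable name; the real function reads from the database rather than taking an array.

```php
<?php
// Sketch only: merge database variables into $conf without overwriting
// anything the per-site configuration file has already set.
function variable_init_sketch(array $conf, array $db_variables): array {
  foreach ($db_variables as $name => $value) {
    if (!isset($conf[$name])) {
      $conf[$name] = $value;
    }
  }
  return $conf;
}

$from_settings = ['site_name' => 'Set in settings.php'];
$from_database = ['site_name' => 'Set in database', 'theme_default' => 'xtemplate'];

$conf = variable_init_sketch($from_settings, $from_database);
print_r($conf);  // site_name keeps the settings.php value; theme_default comes from the database
```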
We're done with bootstrap.inc! Now it's time to go back to index.php and call drupal_page_header(). This function has two responsibilities. First, it starts a timer if $conf['dev_timer'] is set; that is, if you are keeping track of page execution times. Second, if caching has been enabled, it retrieves the cached page, calls module_invoke_all() for the 'init' and 'exit' hooks, and exits. If caching is not enabled or the page is not being served to an anonymous user (or in several other special cases, like when feedback needs to be sent to a user), it simply returns control to index.php.
Back in index.php, we find an include statement for common.inc. This file is chock-full of miscellaneous utility goodness, all kept in one file for performance reasons. But in addition to putting all these utility functions into our namespace, common.inc includes some files on its own. They include theme.inc, for theme support; pager.inc, for paging through large datasets (it has nothing to do with calling your pager); and menu.inc, in which many constants are defined that are used later by the menu system.
The next inclusion that common.inc makes is xmlrpc.inc, with all sorts of functions for dealing with XML-RPC calls. Although one would expect a quick check of whether or not this request is actually an XML-RPC call, no such check is done here. Instead, over 30 variable assignments are made, apparently so that if this request turns out to actually be an XML-RPC call, they will be ready. Perhaps an xmlrpc_init() function would help performance here?
The tablesort.inc file is included as well, containing functions that help behind the scenes with sortable tables. Given the paucity of code here, a performance boost could be gained by moving these functions into a file that is already being loaded.
The last include done by common.inc is file.inc, which contains common file handling functions. The constants FILE_DOWNLOADS_PUBLIC = 1 and FILE_DOWNLOADS_PRIVATE = 2 are set here, as well as the FILE_SEPARATOR, which is \\ for Windows machines and / for all others.
Finally, with includes finished, common.inc sets PHP's error handler to the error_handler() function in the common.inc file. This error handler creates a watchdog entry to record the error and, if any error reporting is enabled via the error_reporting directive in PHP's configuration file (php.ini), it prints the error message to the screen. Drupal's error_handler() does not use the last parameter $variables, which is an array that points to the active symbol table at the point the error occurred. The comment "// set error handler:" at the end of common.inc is redundant, as it is readily apparent what the function call that follows it does.
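For readers who have not installed a custom error handler before, here is a small sketch of the pattern using PHP's set_error_handler(). It is not Drupal's error_handler(): the name error_handler_sketch() and the in-memory $GLOBALS['watchdog_sketch'] array (standing in for the watchdog table) are assumptions. Note that the optional fifth argument the text calls $variables was removed in PHP 8, so the sketch takes only four.

```php
<?php
// Stand-in for the watchdog log; the real handler records to the database.
$GLOBALS['watchdog_sketch'] = [];

function error_handler_sketch(int $errno, string $message, string $file, int $line): bool {
  $entry = "$message in $file on line $line";
  $GLOBALS['watchdog_sketch'][] = $entry;  // always record the error
  if (error_reporting()) {                 // print only if any error reporting is enabled
    echo "error: $entry\n";
  }
  return true;                             // tell PHP the error has been handled
}

set_error_handler('error_handler_sketch');
trigger_error('Something went wrong', E_USER_WARNING);
```

After the trigger_error() call, the log array holds one entry describing the warning, whether or not anything was printed.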
The Content-Type header is now sent to the browser as a hard-coded string: "Content-Type: text/html; charset=utf-8".
If you remember that the URL we are serving ends with /~vandyk/drupal/?q=node/1, you'll note that the variable q has been set. Drupal now parses this out and checks for any path aliasing for the value of q. If the value of q is a path alias, Drupal replaces it with the actual path that it is aliased to. This sleight-of-hand happens before any modules see the value of q.
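The alias substitution amounts to a lookup with a fall-through. A minimal sketch, assuming a hypothetical alias "about" that points at node/1 and an in-memory table where a real site would consult its stored aliases:

```php
<?php
// Hypothetical alias table: alias => actual Drupal path.
$aliases_sketch = ['about' => 'node/1'];

function resolve_path_sketch(string $q, array $aliases): string {
  // Replace an alias with the path it points to; leave other paths unchanged.
  return $aliases[$q] ?? $q;
}

echo resolve_path_sketch('about', $aliases_sketch), "\n";   // node/1
echo resolve_path_sketch('node/1', $aliases_sketch), "\n";  // node/1
```

Because the swap happens first, everything downstream only ever sees the actual path.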
Module initialization now happens via the module_init() function. This function runs require_once() on a handful of required modules, including the filter and watchdog modules. The filter module defines FILTER_STYLE* constants while being included. Next, other modules are loaded by module_list(). In order to be loaded, a module must (1) be enabled (that is, the status column of the "system" database table must be set to 1), and (2) not be excluded by Drupal's throttle mechanism when load is high. The throttle mechanism first determines whether the module is eligible for exclusion by looking at the throttle column of the "system" database table; then, if the module is eligible, it looks at $conf["throttle_level"] to see whether the load is high enough to exclude the module. Once all modules have been include_once'd and their names added to the $list local array, the array is sorted by module name and returned. The returned $list is discarded because the module_list() invocation is not part of an assignment (i.e., it is simply module_list() and not $module_list = module_list()). The strategy here is to keep the module list inside a static variable called $list inside the module_list() function. The next time module_list() is called, it will simply return its static variable $list rather than rebuilding the whole array. We see that as we follow the final objective of module_init(); that is, to send all modules the "init" callback.
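The static-variable trick is worth seeing in isolation. In the sketch below, module_list_sketch() is a made-up stand-in for module_list(), $rows stands in for the "system" table, the module names other than admin are examples, and treating any throttle level above 0 as "high load" is an assumption. Nothing is actually included from disk.

```php
<?php
// Sketch only: build the module list once, then serve it from a static variable.
function module_list_sketch(array $rows = [], int $throttle_level = 0): array {
  static $list = null;
  if ($list === null) {
    $list = [];
    foreach ($rows as $row) {
      // A throttle-eligible module is skipped when load is high.
      $excluded = $row['throttle'] && $throttle_level > 0;
      if ($row['status'] == 1 && !$excluded) {
        $list[$row['name']] = $row['name'];
      }
    }
    ksort($list);  // sort by module name
  }
  return $list;
}

$rows = [
  ['name' => 'node',       'status' => 1, 'throttle' => 0],
  ['name' => 'admin',      'status' => 1, 'throttle' => 0],
  ['name' => 'statistics', 'status' => 1, 'throttle' => 1],
  ['name' => 'forum',      'status' => 0, 'throttle' => 0],
];

module_list_sketch($rows, 1);   // first call builds the list; the result is discarded
print_r(module_list_sketch());  // later calls reuse the static copy: admin, node
```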
To see how the callbacks work, let's step through the init callback for the first module. First, module_invoke_all() is called and passed the string naming which callback is to be called. This string could be anything; it is simply a symbol that all modules have agreed to abide by, by convention. In this case it is the string "init". The module_invoke_all() function now steps through the list of modules it got from calling module_list(). The first one is "admin", so it calls module_invoke() with the module name and the callback name. The module_invoke() function simply puts the two together to get the name of the function it will call. In this case the name of the function to call is "admin_init()". If a function by this name exists, the function is called and the returned result, if any, ends up in an array called $return, which is returned after all modules have been invoked. The lesson learned here is that if you are writing a module and intend to return a value from a callback, you must return it as an array. [Jonathan Chaffer: Each "hook" (our word for what you call a callback) defines its own return type. See the full list of hooks available to module developers, with documentation about what they are expected to return.]
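Here is a stripped-down sketch of that naming convention at work. The functions module_invoke_sketch() and module_invoke_all_sketch() are illustrative stand-ins, the body of admin_init() is invented, and "block" is just an example of a module with no init callback; how the real code combines results may differ.

```php
<?php
// A made-up "init" callback for the admin module; it returns its value as an array.
function admin_init() {
  return ['admin initialized'];
}

function module_invoke_sketch(string $module, string $hook) {
  $function = $module . '_' . $hook;  // "admin" + "init" => "admin_init"
  if (function_exists($function)) {
    return $function();
  }
  return null;                        // the module does not implement this callback
}

function module_invoke_all_sketch(array $modules, string $hook): array {
  $return = [];
  foreach ($modules as $module) {
    $result = module_invoke_sketch($module, $hook);
    if (is_array($result)) {
      $return = array_merge($return, $result);
    }
  }
  return $return;
}

print_r(module_invoke_all_sketch(['admin', 'block'], 'init'));  // one entry: "admin initialized"
```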
Back in common.inc, there is a check for suspicious input data. To find out whether or not the user has permission to bypass this check, user_access() is called. This retrieves the user's permissions and stashes them in a static variable called $perm. Whether or not a user has permission for a given action is determined by a simple substring search for the name of the permission (e.g., "bypass input data check") within the $perm string. Our $perm string, as an anonymous user, is currently "0access content, ". Why the 0 at the beginning of the string? Because $perm is initialized to 0 by user_access().
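The leading 0 falls out of PHP's string concatenation, as this sketch shows. Both function names are made up, and searching for the permission name followed by ", " is an assumption about how the substring match might be written.

```php
<?php
// Build the permission string the way the text describes: start at 0, then append.
function perm_string_sketch(array $granted): string {
  $perm = 0;
  foreach ($granted as $name) {
    $perm .= "$name, ";  // the integer 0 becomes the string "0" on first append
  }
  return (string) $perm;
}

// A plain substring search decides whether the permission is present.
function user_access_sketch(string $permission, string $perm): bool {
  return strpos($perm, "$permission, ") !== false;
}

$perm = perm_string_sketch(['access content']);
echo '"', $perm, '"', "\n";  // "0access content, "
var_dump(user_access_sketch('access content', $perm));           // bool(true)
var_dump(user_access_sketch('bypass input data check', $perm));  // bool(false)
```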
The actual check for suspicious input data is carried out by valid_input_data(), which lives in common.inc. It simply goes through an array it's been handed; in this case, the contents of the $_REQUEST array are examined. This seems very time-consuming. Also, would Drupal die if my URL ended with "/?xml=true"?
The next step in common.inc's executable code is a call to locale_init() to set up locale data. If the user is not an anonymous user and has a language preference set up, the two-character language key is returned; otherwise, the key of the single-entry global array $language is returned. In our case, that's "en".
The last gasp of common.inc is to call init_theme(). You'd think that for consistency this would be called theme_init() (of course, that would be a namespace clash with a callback of the same name). This finds out which themes are available and which one the user has selected, and then include_once's the chosen theme. If the user's selected theme is not available, the value at $conf["theme_default"] is used. In our case, we are an anonymous user with no theme selected, so the default xtemplate theme is used. Thus, the xtemplate theme file is include_once'd. Its inclusion in turn calls include_once("themes/xtemplate/xtemplate.inc") and creates a new object called xtemplate as a global variable. Inside this object is an xtemplate object called "template" with lots of attributes. Then there is a nonfunctional line where SetNullBlock is called. A comment indicates that someone is aware that this doesn't work.
Now we're back to index.php! A call to fix_gpc_magic() is in order. The "gpc" stands for Get, Post, Cookie: the three places where unescaped quotes may be found. If deemed necessary by the status of the boolean magic_quotes_gpc directive in PHP's configuration file (php.ini), slashes will be stripped from the request arrays, including $_REQUEST. It seems odd that the function is not called fix_gpc_magic_quotes, since it is the "magic quotes" that are being fixed, not the magic. In my distribution of PHP, the magic_quotes_gpc directive is set to "Off", so slashes do not need to be stripped.
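Stripping slashes from nested input is a short recursive job. The sketch below is not Drupal's fix_gpc_magic(); both function names are made up, and because the magic_quotes_gpc setting no longer exists in current PHP versions, a boolean parameter stands in for it.

```php
<?php
// Recursively strip slashes from a value or a nested array of values.
function strip_slashes_deep_sketch($value) {
  return is_array($value)
    ? array_map('strip_slashes_deep_sketch', $value)
    : stripslashes($value);
}

// $magic_quotes_on stands in for the magic_quotes_gpc directive.
function fix_gpc_magic_sketch(array $input, bool $magic_quotes_on): array {
  return $magic_quotes_on ? strip_slashes_deep_sketch($input) : $input;
}

// Input as it would arrive with magic quotes on: quotes carry a backslash.
$request = ['q' => 'node/1', 'title' => 'It\\\'s here', 'tags' => ['a\\"b']];

print_r(fix_gpc_magic_sketch($request, true));   // title => It's here, tags => a"b
print_r(fix_gpc_magic_sketch($request, false));  // returned unchanged
```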
The next step is to set up menus. This step is crucial. The menu system doesn't just handle displaying menus to the user, but also determines which function will be handed the responsibility of displaying the page. The "q" variable (which we usually call the Drupal path) is matched against the available menu items to find the appropriate callback to use. Much more information on this topic is available in the menu system documentation for developers. We jump to menu.inc. This sets up a $_menu array consisting of items, local tasks, path index, and visible arrays. Then the system realizes that we're not going to be building any menus for an anonymous user and bows out. The real meat of the node creation and formatting happens here, but is complex enough for a separate commentary: Drupal's node building mechanism. Back in index.php, the switch statement doesn't match either case and we approach the last call in the file, to drupal_page_footer() in common.inc. This takes care of caching the page we've built if caching is enabled (it's not) and calls module_invoke_all() with the "exit" callback symbol.
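To give a feel for how a path can be matched to a callback and how a status value can feed a switch statement, here is a toy dispatcher. Everything in it is hypothetical: the constant names, the menu table, node_page_sketch(), and the longest-prefix matching rule are assumptions for illustration and are not taken from menu.inc.

```php
<?php
// Hypothetical status codes, for illustration only.
define('MENU_SKETCH_FOUND', 1);
define('MENU_SKETCH_NOT_FOUND', 2);

// Hypothetical page callback: receives the remaining path parts as arguments.
function node_page_sketch(string ...$args): string {
  return 'showing node ' . implode('/', $args);
}

function menu_execute_sketch(string $q, array $menu, ?string &$output = null): int {
  $parts = explode('/', $q);
  // Try the full path first, then progressively shorter prefixes.
  for ($n = count($parts); $n > 0; $n--) {
    $path = implode('/', array_slice($parts, 0, $n));
    if (isset($menu[$path])) {
      $output = $menu[$path](...array_slice($parts, $n));
      return MENU_SKETCH_FOUND;
    }
  }
  return MENU_SKETCH_NOT_FOUND;
}

$menu = ['node' => 'node_page_sketch'];
$status = menu_execute_sketch('node/1', $menu, $output);

switch ($status) {
  case MENU_SKETCH_NOT_FOUND:
    $output = 'Page not found';
    break;
}
echo $output, "\n";  // showing node 1
```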
Although you may think we're done, PHP's session handler still needs to tidy up. It calls into session.inc to update the session database table, then calls sess_close(), which simply returns 1.