Drupal Module Upgrader for Contributors

DMU’s Architecture

It all starts with a target module. A target module is a Drupal 7 module that is being analyzed or converted.

Target modules are represented by the TargetInterface interface. The primary purpose of this interface is to provide methods for looping through the module’s files and parsing their code. This concept of a target module (TargetInterface) is omnipresent at pretty much every level of the DMU.

Digging a little bit deeper, all targets are associated with an index and a code manager. The index is a database of information about the target module (like what functions it defines, what functions it calls, what hooks it implements, what classes or interfaces it provides, and so forth). The code manager, on the other hand, is essentially a thin I/O wrapper. (I’ll talk more about it later.)

The target, its index, and its code manager comprise what I call the "global state". They’re the top-level objects of DMU’s API. The real meat and potatoes are the plugins. DMU is 95% plugins [citation needed]. All the heavy lifting in DMU is done by plugins. DMU defines four plugin types: analyzers, converters, cleaners, and indexers.

Really, when it comes down to it, DMU’s plugins are in the business of parsing, examining, and manipulating PHP code. Just like jQuery is in the business of parsing, examining, and manipulating HTML.

But, you’re thinking, there’s no jQuery for PHP code. Is there?

Enter the Dragon -- er, Pharborist

Pharborist is a PHP parser, itself written in PHP. What makes it great is how remarkably easy it is to use, once you “get” it. At times, it even feels rather jQuery-ish. For my money, it’s the best library out there for parsing and modifying PHP code. And it has a cool name, so there. The syntax looks something like this (a huge step up from parsing PHP with regex or a giant array of PHP tokens! ;)):

$manager = ClassMethodCallNode::create('\Drupal', 'entityManager');

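// $call is the function call node being rewritten.
$arguments = $call->getArguments();
// count() works whether getArguments() hands back a plain array or a
// countable collection object; empty() is always FALSE for an object.
if (count($arguments) === 0) {
  return $manager->appendMethodCall('getDefinitions');
}
elseif (count($arguments) === 1) {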
  return $manager
    ->appendMethodCall('getDefinition')
    ->appendArgument(clone $arguments[0]);
}

If that makes sense to you, you’re ahead of the curve; it is initially hard to grok. Because when you look at PHP code, it doesn’t really look or feel like a tree. Consider HTML: it’s obviously tree-like. Just by glancing at well-formed HTML, you can readily discern the tree structure of the document. Well, the truth is that, from the interpreter’s point of view, a PHP source file is also a tree (known as a syntax tree). From a human’s point of view, it sure doesn’t seem like one. Too bad: Pharborist sees PHP code as a tree. This, right here, is the single biggest hurdle to grokking the Pharborist way -- you need to learn to think about PHP code the way computers do. But if you’ve read this far, I think you can handle it ;)

Now, what has any of this got to do with DMU?

To do the grunt work of manipulating code, DMU relies on Pharborist, utterly. Let’s say you want to write a plugin to examine and/or modify your module’s implementation of hook_permission: the good news is that DMU will not hand you a string containing the code of your hook and expect you to pull it apart with regexes. Nor will it hand you a Geneva Convention-violating array of tokens (which is what PHP_CodeSniffer does).
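To get a feel for what that array of tokens looks like, here is a minimal sketch that uses nothing but PHP's built-in tokenizer (token_get_all() and token_name()). It involves neither DMU nor Pharborist, and foo_permission() is just a made-up hook implementation for illustration.

```php
<?php
// Plain PHP only: tokenize a tiny, hypothetical hook implementation.
$code = <<<'CODE'
<?php
function foo_permission() {
  return array();
}
CODE;

$tokens = token_get_all($code);

// Each token is either a one-character string such as "{", or an array of
// [token id, text, line number]. There is no nesting at all: it is one
// flat list, and finding where the function body ends is your problem.
$lines = [];
foreach ($tokens as $token) {
  if (is_array($token)) {
    $lines[] = token_name($token[0]) . ': ' . json_encode($token[1]);
  }
  else {
    $lines[] = 'char: ' . json_encode($token);
  }
}

echo implode("\n", $lines), "\n";
```

Even this three-line function comes out as a couple dozen entries, whitespace included, which is why working with a tree is so much more pleasant.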

Instead, DMU will first figure out which file contains the function you want. Then it’ll use Pharborist to parse that file into a syntax tree. Finally, it’ll search through that syntax tree until it finds the function you want, which in Pharborist terminology is a node (meaning a “node in a tree”, not the traditional Drupal definition of a node). That node -- which is just an object descending from Pharborist's Node class -- is what you work with.

Different snippets of code are represented by different node types. For example, a function -- say, this one:

function foo_hello() {
    // C for Dummies fans, rejoice.
    return t('Goodbye, cruel world!');
}

...would be represented by a FunctionDeclarationNode object. Or take a variable, like $foo. In Pharborist, that’s a VariableNode. There are many, many different node types.
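If the tree idea still feels abstract, here is a toy picture of foo_hello() as nested arrays, with a small recursive function that prints it as an outline. To be clear, this is not how Pharborist stores anything: the 'type' labels and the toy_outline() helper are invented for this sketch, and real Pharborist nodes are objects, not arrays.

```php
<?php
// A hand-built toy tree for foo_hello(). The 'type' labels are made up for
// illustration and are not Pharborist class names.
$tree = [
  'type' => 'function_declaration',
  'text' => 'foo_hello',
  'children' => [
    [
      'type' => 'return_statement',
      'text' => 'return',
      'children' => [
        [
          'type' => 'function_call',
          'text' => 't',
          'children' => [
            [
              'type' => 'string',
              'text' => "'Goodbye, cruel world!'",
              'children' => [],
            ],
          ],
        ],
      ],
    ],
  ],
];

/**
 * Renders a toy node and all of its descendants as an indented outline.
 */
function toy_outline(array $node, $depth = 0) {
  $out = str_repeat('  ', $depth) . $node['type'] . ': ' . $node['text'] . "\n";
  foreach ($node['children'] as $child) {
    $out .= toy_outline($child, $depth + 1);
  }
  return $out;
}

echo toy_outline($tree);
```

Each level of indentation in the output is one level deeper in the tree: the function contains a return statement, which contains a call to t(), which contains a string.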

With any given node, I can search for its children and/or descendants. Sticking with the stupid foo_hello() function above, let’s say I want to get the node for that call to t(). Here’s how I’d find it:

// $foo_hello is a FunctionDeclarationNode
$foo_hello->find(\Pharborist\Filter::isFunctionCall('t'));

This will return a collection (an array-like object) containing a node for every call to t() in foo_hello(). So this will work:

$t_calls = $foo_hello->find(\Pharborist\Filter::isFunctionCall('t'));
// $t_calls[0] isn't a string -- it's a FunctionCallNode. But when you
// cast it (or any Pharborist node) to a string, you get its original PHP
// code -- right down to the whitespace.
echo $t_calls[0];    // Produces "t('Goodbye, cruel world!')"
echo $t_calls[0]->getName();    // Produces "t"
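If you're curious how a find-with-a-filter search can work under the hood, here is a rough sketch of the idea on a toy tree made of plain arrays. I'm not claiming this is how Pharborist implements it: toy_find() and toy_is_function_call() are hypothetical stand-ins that only mimic the shape of find() and Filter::isFunctionCall().

```php
<?php
// Toy stand-ins only. Nothing here comes from Pharborist or DMU.

/**
 * Builds a filter: a closure that says whether a toy node is a call to $name.
 */
function toy_is_function_call($name) {
  return function (array $node) use ($name) {
    return $node['type'] === 'function_call' && $node['name'] === $name;
  };
}

/**
 * Collects every descendant of $node that passes $filter, depth first.
 */
function toy_find(array $node, callable $filter) {
  $matches = [];
  foreach ($node['children'] as $child) {
    if ($filter($child)) {
      $matches[] = $child;
    }
    // Keep descending: a match can have further matches nested inside it.
    $matches = array_merge($matches, toy_find($child, $filter));
  }
  return $matches;
}

// A toy version of foo_hello(): function > return statement > call to t().
$foo_hello = [
  'type' => 'function_declaration', 'name' => 'foo_hello', 'children' => [
    ['type' => 'return_statement', 'name' => NULL, 'children' => [
      ['type' => 'function_call', 'name' => 't', 'children' => [
        ['type' => 'string', 'name' => NULL, 'children' => []],
      ]],
    ]],
  ],
];

$t_calls = toy_find($foo_hello, toy_is_function_call('t'));
echo count($t_calls), "\n";
echo $t_calls[0]['name'], "\n";
```

The filter is just a yes-or-no question asked of every node, and the search walks the whole subtree asking it. That separation is what makes the jQuery comparison apt: you describe what you want, and the tree walk is somebody else's job.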

Pharborist is very deep, but I think you get the general idea. The entire Pharborist API is documented at Pharborist.
