Programming Best Practices

Summary

This page outlines our Best Practice guidelines for programming within Drupal. It concerns higher level issues than those dealt with in the Coding Standards document or the Configuration & Usage Best Practices page(s).

The purpose of these pages is not to teach one how to program, but to teach one how to become a better programmer within this framework. Many people can write code, but many of those people do not know that there is a science to programming for Drupal. Why are speed and performance so important? Why document code? Why is this commonly used code snippet actually a mistake? These questions and so many more are answered and explained below.

Proposed Topical Outline

  • Writing efficient code
    • Using recursive functions
    • The concept of run-time and Big-O
    • Avoiding unnecessary coupling
  • Writing code that comments itself
  • The necessity of documentation
  • Naming your functions
    • Using private functions
  • Using global variables
  • Making your modules extensible
    • Writing hook functions and theme functions
    • When should you build an API for your module and how to do this
  • OO code versus procedural and where Drupal stands on this
  • Links to useful documentation on PHP programming best practices, e.g. http://www.odi.ch/prog/design/php/guide.php
  • Database specific considerations
    • e.g. how to store information efficiently, use of the system table, the watchdog table, etc
  • Programming pitfalls to avoid
    • Creating code that already exists
    • Overusing if-else, switch-case, and try-catch
    • The dichotomy of the ternary operator

Don't preemptively optimize your code

dww - July 26, 2008 - 19:30

Sure, you can/should avoid needlessly inefficient code. However, don't make the mistake of preemptively optimizing everything. Many of your functions won't be called in performance critical situations. It's far more important for your code to be clear and understandable than it is for every last function to be optimized with obfuscated logic and code.

A) When you start writing the code, you're likely to be wrong about where the bottlenecks are, and you'll spend time optimizing something completely irrelevant when some other function is what really needs your attention.

B) The complicated logic for your "optimization" might be optimizing for the wrong problem. Maybe the really slow thing is you have a query that needs a DB index, and the processing on the results of that query are dwarfed by the cost of retrieving the data in the first place. You really need to understand what the problem is before you try to fix it.

C) The additional complication in your code will make debugging or extending the code more difficult. In the vast majority of cases, the extra cost in human developer time far outweighs the minor savings in computational power that your "optimization" made. Computers are fast and cheap. Humans are slow and expensive.

D) Correctness is always more important than speed. First write your code so it provides the correct results. Often, that's enough of a challenge. ;) Once it actually does what it's supposed to, you can use a profiler to see if there are problem spots that can be fine tuned. But focus on correctness first.

Caching is often the best optimization

dww - July 26, 2008 - 19:33

If something needs to be optimized, in a large majority of cases, the only optimization it needs is to cache the expensive result instead of recomputing it all the time. Learn how to use static variables (same function being called multiple times for the same page load) and the DB cache (same answer being used on multiple page loads) to save expensive results and reuse them. It's pretty rare that it's worth spending time speeding up the expensive computation -- just save the result and reuse it.

Pay extra attention to I/O

dww - July 26, 2008 - 19:35

User input and what your code prints out are basically the root of all security problems. Pay EXTRA attention any time your code is touching user-supplied data or is generating output.

If your code is touching the filesystem at all you should be extra careful about that, too.

Oh, and whenever your code is generating output, consider using a theme function. ;)

return comment in update function

beginner - July 28, 2008 - 08:46

in hook_update_N() functions, we usually have queries to performs. Sometimes however, there is no query but some other operation is performed (like updating some variables). Instead of returning an empty array, return some useful message this way:

<?php
mymodule_update_6000
() { // Number your update function according to core compatibility.

// do what you have to do.

return array('success' => TRUE, 'sql' => 'We upgraded this and that...'); // don't do: return array();
}
?>

This will prevent having an unhelpful No queries message printed when performing the upgrade.

parsing errors are easy to debug

beginner - July 28, 2008 - 08:56

A parsing error is immediately visible and easily fixed. A wrong variable attribution can remain hidden for a long time and be hard to debug.

How often have you written if ($foo = 'bar') {} when you meant: if ($foo == 'bar') {}?? Me? Very often.

<?php
// If you mean this:
if ($foo == 'bar') {}
// get into the habit of writing this instead:
if ('bar' == $foo) {}
// because this will cause a hard to find bug that may remain hidden for a long time:
if ($foo = 'bar') {}
// while this will immediately be noticed:
if ('bar' = $foo) {}
// and I don't know any *good* reason to follow the more traditional order.
?>

This one isn't universally agreed...

dww - July 28, 2008 - 17:24

Many people advocate this, but I know there's not universal agreement that this is good practice. The fact that Core Doesn't Do It(tm) is already enough evidence that this isn't a clear win that belongs in any "Best Practices" document.

Personally, I find it jarring to the logical flow of reading code. It reminds me of top-posting, which I find incredibly annoying and hard to read. Yes, the compiler/interpreter can handle $A == B or B == $A, but to the human reading the code, you're checking if some variable is holding a value you're looking for. You're not testing if some value is the same as the variable you're interested in.

Of course, a thread like this is bound to generate some flames and bikeshed arguments, and reasonable people can agree to disagree about some of these things. In cases like those, if there's not a clear majority that supports one side of it, let's just not put them in the document.

maybe core should adopt this practice.

beginner - July 30, 2008 - 03:54

In any case, this is one instance where I don't care what core does. I use it everywhere in my modules. This practice saves me time and headaches and that's what matters to me.

The error of putting one equal sign instead of two is a very common one, especially for beginners like me and for people who don't spent their whole day, every day programming. Throughout our school education we are taught that = is the equal sign, i.e. things on the right are equal to the things on the left. When we start programming we are taught to use == for the same operation. Previous conditioning during school education make it easy to forget. I have wasted too much time because of this.

Now that I remember to put things the other way around, I remember why I taught myself to do so in the first place, so I don't do the same mistake anymore. It's just a question of habit and one gets used to it very easily. And it makes sense too: if I can say A equals B, then in English I can also say B equals A. In computerese: if I can write A == B, then I can write B == A.

It is my opinion that core should use this practice as well (but I am not going to push for it). After all, recently some mega patch was committed to add a simple extra space between a quotation mark and the concatenation operator. Drupal is used to such changes to adopt the practice that makes more sense. Avoiding bugs and saving developer time make sense to me.

[Edit: I am not interested in "winning" an argument. I have now presented my best arguments and have nothing to add. I use this myself, but I don't care about convicing other people or about what core does.]

Not to start a flamewar

aj045 - July 30, 2008 - 16:49

Not to start a flamewar here, but I equate (no pun) confusing = with == on the same issue of this mortal sin, for which all non-new programmers who make this mistake should be punished for making:

<?php
if (someFunctionThatOnlyReturnsTrueOrFalse() == TRUE) {
 
// do something...
}
// or...
if (someFunctionThatOnlyReturnsTrueOrFalse() == FALSE) {
 
// do something
}
?>

It is simply a lack of understanding on the part of the programmer. The solution is to write more code and more complex code-- and to keep doing so until the differences are actively etched into the programmers brain and the likelihood of making such a mistake is near 0%. Just my opinion. But I think this topic deserves a page of its own, as well.

Then don't

Michelle - August 3, 2008 - 21:39

Beginner's argument was calm and well reasoned. Yours is uncalled for. I am not a beginning programmer but I spent so many years using a language where you use one = that it's taking me some time to retrain my brain. To say I should be "punished" for that is ridiculous. If you don't want to start a flamewar, don't make inflammatory comments.

As to the issue itself, I could go either way. I do make the mistake a lot but am just working hard to watch out for it rather than change the order. I find 'foo' == $bar hard to read as well but will follow whatever core does.

Michelle

--------------------------------------
See my Drupal articles and tutorials or come check out life in the Coulee Region.

Static Caching

metzlerd - July 29, 2008 - 04:31

Particularly with OO. Create static caching factory functions rather than global variables to avoid name space collisions.

Would also recommend teaching people when to use static variable caching instead of Session variables.

When to use/not use check_plain

earnie - July 29, 2008 - 11:48

For instance the the l() function already uses check_plain so do not use check_plain for the text parameter you pass it.

One should store any keyed or aggregated text in the database using check_plain so that check_plain isn't needed to display the data from the DB.

One should store any keyed

Heine - July 29, 2008 - 12:11

One should store any keyed or aggregated text in the database using check_plain so that check_plain isn't needed to display the data from the DB.

This is incorrect. You should store data verbatim and use check_plain on plaintext data when inserting it in an HTML context.

See also
http://drupal.org/node/263002
http://drupal.org/node/28984
http://acko.net/blog/safe-string-theory-for-the-web

--
The Manual | Troubleshooting FAQ | Tips for posting | How to report a security issue.

We're still working out the

aj045 - July 29, 2008 - 20:49

We're still working out the ideas for "Best Programming Practices" but this is a bit "lower level" than the purpose of this subsection of the handbook. Really what we are looking for is conceptual ideas: optimization of code, caching, documentation, etc.

But now that I think about it, this could fit under "Best Programming Practices" because it isn't always clear when and when not to use this. If you use it but don't need to, you just blew away a few nano seconds of processing time needlessly. If you didn't use it but should have, you opened a security hole.

Undoubtedly there will be a gray-area of topics for this subsection.

This comment is mainly in

katbailey - July 30, 2008 - 17:54

This comment is mainly in relation to the comment above about the practice of writing "if ('bar' == $foo) {}" for easier debugging but it applies generally to what this document should be about. A Best Practices guide should focus on explaining what constitutes superior code - I take superior to mean things like "more efficient", "more logical". In the case the "if ('bar' == $foo) {}" practice, any efficiency gained is efficiency in the face of error-prone-ness, rather than efficiency of the code itself, and so combined with the fact that it is logically awkward, I would say it does not belong in a Best Practices guide but more in a different, probably equally helpful but less authoritative, sort of document. It would be great if we could keep this one for more universally agreed upon practices.

It belongs (or doesn't) in the coding standards guidelines

dww - July 30, 2008 - 18:21

I agree. The question of $A == B vs. B == $A isn't a "Programming best practices" issue, it belongs (or doesn't) in the http://drupal.org/coding-standards area of the handbook. It's a low-level syntax question that should be agreed upon and adopted (or not) via the coding standards. It's not a high-level best practice like "don't prematurely optimize your code" or "don't write a 500 line function that accomplishes 10 things all at once", which are the sorts of things we're aiming for here.

While I agree with dww I

earnie - August 3, 2008 - 19:48

While I agree with dww I want to state that $A == B is more natural for me but I understand the syntax catch benefit of B == $A when forgetting the == and setting the variable instead. This is one of those ahah items that one should get used to when coding because it helps the coder. It is like always adding the comma for the last item in an array list. So while I currently write $A == B my vote would be for B == $A as the standard best practice.

As for the coding standards document; I believe it to be the beginning place for any best practice. However, that document cannot contain all convention because it would be too large. The best practice document is a common sense type of document that makes suggestions but doesn't set in stone the practice like the coding standards do. So my question back to community is; does this need to be set in stone?

Any thoughts on the ternary

aj045 - August 11, 2008 - 23:01

Any thoughts on the ternary operator? Does it make code harder to read? Is it faster than a regular if-else statement?

it depends, and seems out of scope for this doc...

dww - August 11, 2008 - 23:33

I think the readability of it really depends on the specific situation.

Furthermore, I think that's the sort of low-level style-guide material that we should avoid in this page.

 
 

Drupal is a registered trademark of Dries Buytaert.