bmj.com Drupal success story

The British Medical Journal website was migrated to Drupal as part of a two-year redesign project that went live in November 2011. It was a co-development project involving BMJ Group’s Technology team and HighWire Press, the digital publishing arm of Stanford University Libraries and bmj.com’s technical hosting partner since 1997.

About BMJ

The BMJ (British Medical Journal) is one of the world’s top five international peer reviewed medical journals. It was the first general medical journal with an online presence, and its “web first” publishing model means that bmj.com gets updated daily with original research papers, education, news, and comment articles, plus podcasts, videos, and article based readers’ responses. Each month bmj.com gets more than 1.5 million unique visitors and 6.8 million page views.

Why Drupal was chosen: 
  1. Drupal has a lot of out-of-the-box content management functionality that delivers most of our requirements. The Drupal node concept has been a good fit for BMJ article content types (research, news, education, and comment). So have the easier multi-user admin tools for content creation, editing, and maintenance, the Path module for renaming URLs, RSS feed generation, and the Taxonomy module for the specialty pages. These are sponsored pages aimed at our non-general medical readers (cardiologists, diabetologists, psychiatrists, etc.), and pull in content from across BMJ Group.
  2. Drupal has thousands of contributed modules. CCK, Views, and Panels 3 were particularly useful for building bmj.com key pages such as the article, home, channel, and medical specialty pages.
  3. Drupal core is modular. It has a flexible and very well documented hooks API. This makes it easy to extend with new modules and to integrate with other services.
  4. Drupal has one of the biggest, most vibrant, and responsive open source communities with a knowledge database to mine and learn from.
Describe the project (goals, requirements and outcome): 

bmj.com before Drupal

Between 1998 and November 2011 bmj.com ran on an old, custom-made proprietary platform made up of Java Servlets and CGI-based application and templating systems. It served the BMJ well but was expensive to maintain, and updating the site with new features was often difficult.

In 2009 we decided to migrate bmj.com to a platform that would make it easier to adopt new cutting edge technologies and integrate more seamlessly with other services and BMJ Group product sites.

We wanted a platform that allowed us to address business requirements more efficiently and add new features and functionalities quickly. We also wanted a platform that made it easier for colleagues in the BMJ editorial and marketing teams to create and edit website content.

After a careful analysis of leading open source content management systems, we selected Drupal 6.

The challenge

The BMJ’s huge online archive dates back to 1840 when the weekly print journal launched. More than 80% of online traffic is directly to article pages, so the main challenge was to boost discoverability and online engagement by promoting related article and multimedia content from across BMJ Group, links to resources from similar medical specialty collections, and tabbed widgets displaying most read/searched/shared/commented (where applicable).

Article pages consist of more than 50 metadata elements in total. They also need to display automatically in the appropriate channel (research, news, comment, education) and to show publication date, citation information, metrics, post publication peer review (pre-moderated “rapid responses,” which the journal pioneered in the mid 1990s), and supplementary data files. This was all accomplished using Drupal taxonomy and CCK fields in combination with Panels 3 and the CTools API. The content is stored in NLM XML, but the underlying schema had been amended a few times to accommodate new features and functionality as part of the BMJ’s drive for digital innovation.

The core team

Three Drupal developers from the BMJ technology team undertook primarily front end development, along with two web designer colleagues, a project manager, business analyst and tester. The bmj.com editor represented the business owner (BMJ editorial team). Both he and the project manager were part of the project steering group. Its membership also included the BMJ editor-in-chief, deputy editor, head of technology, publishing director, and head of marketing. The HighWire team’s focus was on back end integration with HighWire services and comprised two Drupal developers, a project manager, and strategic business lead.

Overview of the HighWire infrastructure

The chart below illustrates the HighWire back end infrastructure, beginning with the F5 load balancer and continuing to the Varnish accelerator, Apache instances, CDN, database server, etc.

[Figure: Overview of the HighWire infrastructure]

Project management

We used Agile to manage the development of this project. Agile uses an iterative and emergent approach to development. We felt this best suited the project’s business owner, the BMJ editorial team, which has a very consultative and reflective culture where decisions are often revisited and refined. Also, the business owner needed to liaise heavily with other stakeholders (BMJ marketing, corporate sales, customer service and institutional sales teams) both during and after the requirements gathering process, to ensure the new site maximized commercial opportunities.

It took two years to complete the project with the initial requirements phase lasting approximately 10 months (elapsed) and the rest of the time spent on developing the software. The site requirements were broken down into user stories. These were then estimated and prioritised. A great deal of focus was placed on the article page. As mentioned earlier, 80% of our users land on this page, the majority of them directly from Google searches. The old article page contained a lot of legacy features built up over 10 years. We had to ensure that the page provided a better user experience and met our key business objectives. Breaking down page features and questioning the validity and priority of each item allowed us to build the most important features first and test these early with users. This process was also helpful at the end of the project, when we had to decide what items might be removed or postponed to meet the deadline.

Agile was also chosen due to the complexity of the project, not just in terms of the site features and data set but also the fact that this was a co-development project between two organisations in different time zones and with limited experience in the technology chosen. Each organisation had to bring in additional resources with specific domain knowledge at short notice. There were a lot of unknowns and risks in this project and Agile provided a way of managing these as well as providing a solid overall structure to manage the project.

Good communication and a positive, collaborative approach were also critical. We worked to fortnightly iterations with meetings between BMJ and HighWire taking place at the end and start of each organisation’s working day. Show and tells with the business provided useful feedback, some of which would filter into the next iteration planning meeting via the business lead who was responsible for prioritisation. Underpinning this approach was a powerful but simple cloud hosted Agile tool called Pivotal Tracker which allowed us to manage stories through their lifecycle, allowing developers and stakeholders to add info and comments, and providing progress reports based on historical performance (velocity).

Source control management

In the first two weeks of the project we established a flexible software development environment for the two teams. This comprised up to 10 developers and designers who would be working on the same code base from various locations.

We decided to use the Git distributed source control system. Git has made software development easy, efficient, and fast. It gives each software developer or designer the complete repository, with full history and revision tracking capabilities, without depending on network or central repository access.

We divided the project code base into three repositories: Drupal core and contrib modules, HighWire modules, and BMJ modules. We also adopted a standard Git workflow branching model, where each repository has three Git branches. These were “dev,” “test / stage,” and “stable.”

We set up a GitHub account as a central repository to make interaction between the BMJ Group Technology and HighWire development teams easier. The GitHub web interface makes code reviews easier and simplifies some version control features. For example, we were able to suggest code changes using the fork feature for the HighWire repository, to which we had only read access.

Configuration management

One of the biggest Drupal development challenges was that configuration information is saved in the database when changes are made in Views, Panels 3, CCK, and other modules. This makes it more difficult to implement continuous development processes where several developers work across multiple environments. There is, however, the Features module, which allows the configuration to be exported to code. This makes version control more straightforward. Once we had created and set up Features-based modules, it was efficient to use Drush shell commands (drush fu module-name) to export configuration changes to the source code, where they could be tracked via Git.

Data modelling and content types

One of the most important challenges was how to design our content types. The BMJ article content type in particular is quite complex, with a lot of metadata elements (author details, publication date, volume/issue number, section, series, category, taxonomy, relations to other articles, open access flag, etc.). The Drupal CCK module makes it easy and efficient to create new content types. We used it to create more than 50 CCK fields for the article content type, covering the data fields captured during the data modelling process.

We went through several long, iterative rounds of data modelling to develop and optimise the article content type. We also needed to decide how many content types and field types were needed, and how to choose between CCK fields and taxonomy terms.

Data migration and integration to content data database

The key business content of bmj.com is the article content (research, news, education, and comment). This is stored directly in an XML-based database called the ATOM store. This is managed by HighWire and contains hundreds of thousands of articles dating back to October 1840, when the journal launched. Since July 2008, when the BMJ adopted a fully online first publishing workflow (choosing a UK-focused subsection of the content to appear weekly in print), the ATOM store has been updated many times a day.

HighWire decided to migrate all article content metadata (article title, authors, sections, citation information, publication date, plus other metadata), and taxonomy terms, to Drupal. The main body of the article continues to reside in the HighWire based Markup service. The Markup service is integrated with Drupal so it can directly serve and cache the article body to the end user.

Drupal and HighWire article data store integration

Panels 3, Views, and CCK are the most important and powerful Drupal add-on modules for our project. They are used by almost all pages on bmj.com. After the HighWire team created the article content node type with all the relevant metadata elements mapped to CCK fields and taxonomy terms, the article content archive in the ATOM store was migrated to the Drupal article node database. We then started the process of assembling article, home, and channel pages using Drupal admin tools. It was incredibly efficient and flexible to use Views to filter queries by CCK fields and taxonomy terms and render the returned results into panels using Panels 3. These panels are then assembled into article, home, channel, and specialty pages.

Article content

Article content is edited by BMJ editorial staff and published into the XML ATOM store database. HighWire deployed a service that automatically feeds article metadata (article title, authors, publication date, citation information, etc.) to the Drupal database, while the body of the article content is kept in the Markup service. The Panels 3 module and the CTools plugin API were used to integrate the article content page with the Markup service. This makes it possible to serve and cache the body content in the Drupal panel layer.
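
The split between locally stored metadata and a remotely held body can be illustrated with a small read-through cache. This is only a sketch in Python under stated assumptions, not the Panels 3 or CTools integration that runs on bmj.com: the names BodyCache, fetch_body, and render_article are hypothetical, and fetch_body stands in for whatever call retrieves a body from the Markup service.

```python
import time
from typing import Callable, Dict, Tuple


class BodyCache:
    """Read-through cache for article bodies held in a remote service."""

    def __init__(self, fetch_body: Callable[[str], str],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_body = fetch_body  # hypothetical stand-in for the remote call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, article_id: str) -> str:
        now = self._clock()
        entry = self._entries.get(article_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve the cached body
        body = self._fetch_body(article_id)  # miss or stale: ask the service
        self._entries[article_id] = (now, body)
        return body


def render_article(metadata: dict, cache: BodyCache) -> str:
    """Combine locally stored metadata with the remotely held body."""
    body = cache.get(metadata["id"])
    authors = ", ".join(metadata["authors"])
    return f"{metadata['title']}\n{authors}\n\n{body}"
```

The page can be listed, filtered, and searched from the metadata alone, and the remote service is contacted only when a body is actually requested and the cached copy has expired.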

The article content page structure is complex and contains various panel tabs to display contextual information about related content, metrics, rapid responses, and a submission form for article responses. These subpages are created using the Panels 3 variants feature.

The diagram below details HighWire services, BMJ Group and Drupal platform integration.

[Figure: BMJ Group and Drupal platform integration]

Article comments

The BMJ pioneered post publication peer review when it launched pre-moderated article based “rapid responses” in the late 1990s. Soon afterwards it broke with more than 150 years of tradition by ending the practice of publishing print based letters to the editor. Each week the “cream of the crop” is chosen to populate the print and online letters section. We decided to create our own commenting feature for this high value user-generated scholarly comment by using the CCK module. We felt the default Drupal commenting system did not have all the required flexibility and features (the ability to add custom fields and specific moderation features, for example). The rapid response content type has custom fields (name, email, address, affiliation, terms and conditions, competing interests, etc.), plus the ability to upload supplementary files (images, tables, etc.). Previously these had to be submitted separately by email.

Home and channel pages

The main landing pages of the site are the home, research, news, education, and comment channels. These pages are updated at least twice daily. They include a carousel/slideshow and “channel highlight” panels, both of which are updated manually by editorial staff. The Drupal Nodequeue module allows editorial staff to pick any article content they want to feature in these panels. The rest of the home and channel pages are updated automatically as the Drupal database gets updated via an automated feed from the ATOM store.

Specialty pages

As mentioned earlier, bmj.com contains more than 230 specialty pages. These pages provide users with alternative access to more than 50,000 articles from across BMJ Group, grouped around medical topics and clinical disciplines. They are marketed as landing pages for non-general readers, where they can access the latest articles and other resources related to their medical specialty.

All article content is tagged by a taxonomy system during the publication process. The Drupal Taxonomy module, in conjunction with Views, is used to query and filter by taxonomy term, and Panels 3 is used to assemble the specialty portal pages. As a new article gets published and added to the Drupal database, it automatically appears in the relevant specialty portal page.
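
The idea of a listing that fills itself from taxonomy tags can be sketched in a few lines of Python. This is an illustration only, not how Views builds its queries: the Article class, its field names, and the specialty_listing function are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Article:
    title: str
    published: date
    terms: FrozenSet[str] = field(default_factory=frozenset)


def specialty_listing(articles: Iterable[Article], term: str,
                      limit: int = 10) -> List[Article]:
    """Return the newest articles tagged with the given taxonomy term."""
    matching = [a for a in articles if term in a.terms]
    matching.sort(key=lambda a: a.published, reverse=True)  # newest first
    return matching[:limit]
```

Because the listing is computed from the tags, a newly added article tagged with "cardiology" would show up in the cardiology listing without any manual curation.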

Multimedia pages

In 2008 the BMJ began publishing more video and audio content, including the launch of a weekly podcast on a WordPress site. Again, a key requirement of the project was to boost discoverability of our growing multimedia archive. This necessitated the migration of all video and audio to Drupal (making them searchable from bmj.com), and the creation of a new multimedia channel with an exciting white on black design to boost the colour contrast of our films. We created podcast and video content types using the CCK module. Their CCK fields allow the multimedia producer to enter the title, body, audio or video location (the audio and video files are uploaded to our RTMP streaming server), and chapter information. We developed a custom multimedia player using the Flowplayer API and Drupal CCK fields.

Contextual widgets

One of the objectives of the redesign project was to integrate bmj.com with other BMJ Group sites and third party services. Most of this third party information is syndicated in RSS (XML) or JSON format. As we are using Panels 3 to assemble pages, we developed a custom module which converts the RSS and JSON feed headlines and renders them into page panels using the CTools plugin API.

You will see plenty of widgets on the home and channel pages (bottom and right side) that are displayed using this module. The latest jobs and latest from BMJ Group tabbed widgets (located near the bottom of the home page) are generated using this module. The Google Analytics Core Reporting API service provides JSON feeds, which this module renders as tabbed widgets displaying most read, shared, and searched (bottom right of the home page).

Some of the widgets are contextual, which is implemented using Panels 3 and the CTools API. The jobs widget at the bottom right of the cardiology specialty page, for example, displays related jobs in cardiology.
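
The conversion step described in this section, turning RSS and JSON feeds into one common list of headlines that a widget can render, can be sketched in Python as follows. This is not the custom module itself, which is a Drupal module built on the CTools plugin API. The function names are hypothetical, and the JSON layout (an "items" list of objects with "title" and "url" keys) is an assumption made for the example; real feeds will differ.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

Headline = Dict[str, str]


def headlines_from_rss(rss_xml: str, limit: int = 5) -> List[Headline]:
    """Extract title/link pairs from the <item> elements of an RSS feed."""
    root = ET.fromstring(rss_xml)
    headlines: List[Headline] = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:  # skip incomplete items
            headlines.append({"title": title, "url": link})
    return headlines[:limit]


def headlines_from_json(json_text: str, limit: int = 5) -> List[Headline]:
    """Extract headlines from a JSON feed (assumed layout, see lead-in)."""
    data = json.loads(json_text)
    headlines: List[Headline] = []
    for entry in data.get("items", []):
        title = str(entry.get("title", "")).strip()
        url = str(entry.get("url", "")).strip()
        if title and url:
            headlines.append({"title": title, "url": url})
    return headlines[:limit]


def render_widget(heading: str, headlines: List[Headline]) -> str:
    """Render a plain-text widget: a heading followed by one line per headline."""
    lines = [heading]
    lines += [f"- {h['title']} ({h['url']})" for h in headlines]
    return "\n".join(lines)
```

Normalising both formats to the same headline structure means the rendering code does not need to know which kind of feed a widget came from.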

Theming the site

The BMJ Group web design team decided to use the 960 grid CSS framework. As all pages are made up of Panels 3 panels, we had to modify the default Panels 3 layout and integrate it with the 960gs framework. The design team preferred to use BMJ Group standard CSS class and ID names, for consistency and so that global CSS could be reused across sites. The Drupal theme devel tool, in combination with Drupal’s theme overriding functionality, made the task of changing templates easier.

Search

There are two ways of searching BMJ content: simple and advanced search. Simple search is driven by the Drupal module Apache Solr Search which allowed us to index and search all BMJ content types (article content, video, podcasts, article responses, etc). We decided to use HighWire’s managed search system for the advanced search of article content.

Performance and scalability

bmj.com is a very busy site, with more than 6 million page views and 1.5 million unique visitors a month. The performance and scalability of the new redesigned site in Drupal was a key requirement of the project. Again, thanks to the open source model of the Drupal community, there is a Drupal 6 distribution for high traffic sites that is optimised for performance and scalability.

We used Pressflow from Four Kitchens. Pressflow has built-in performance optimisation support for all the different layers of the Drupal application stack, and was used on bmj.com for PHP 5 code optimisation, database optimisation, replication of the MySQL database, and Varnish reverse proxying.

Database queries are the main bottleneck for busy sites such as bmj.com. The Drupal solution to this problem is the Memcache module, which is used to shift MySQL caching operations into fast memory to speed up the site.

In the performance optimisation phase of the project, we spent a few days load testing all key pages such as the article, home, and channel pages. The PHP profiler XHProf, in conjunction with the Drupal devel module, allowed us to spot specific performance problems in PHP code or MySQL queries. This helped us review and further optimise the PHP code, MySQL queries, Views, and Panels 3 settings.

In addition to database caching (Memcached) and page level caching (Varnish), we were able to implement panel level caching per page as all pages were built using Panels 3. We developed a new caching module called Panels Hash Cache that gave us very fine-grained control over Panels caching. It has since been ported to Drupal 7.
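
One common way to get fine-grained control over panel caching is to derive each cache key from a hash of the pane and the context it depends on. The Python sketch below illustrates that general technique only. It is not the algorithm used by Panels Hash Cache, and the names pane_cache_key and PaneCache are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def pane_cache_key(pane_id: str, context: Dict[str, Any]) -> str:
    """Build a stable key from the pane and the context it varies on."""
    # sort_keys makes the key independent of dictionary ordering.
    payload = json.dumps({"pane": pane_id, "context": context}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PaneCache:
    """In-memory cache of rendered panes, keyed by pane and context."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def render(self, pane_id: str, context: Dict[str, Any],
               build: Callable[[], str]) -> str:
        key = pane_cache_key(pane_id, context)
        if key not in self._store:
            self._store[key] = build()  # only built on a cache miss
        return self._store[key]
```

With this approach a pane that varies only by taxonomy term is cached once per term, while a pane that does not vary at all is cached once for the whole site.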

To make page downloads even faster, we decided to store all static assets (images, CSS, and JavaScript) in our CDN. The Drupal CDN module made integrating the Drupal site with our CDN straightforward: with a single switch it rewrites all static asset URLs to point to the CDN.
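
The URL rewriting that such a switch performs can be sketched as follows. This Python example is an illustration of the idea, not the CDN module's code: the function name, the extension list, and the cdn.example.org host are all assumptions.

```python
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


def to_cdn_url(path: str, cdn_base: str, enabled: bool = True) -> str:
    """Point a site-relative static asset path at the CDN host."""
    if not enabled:
        return path  # the single switch: off means URLs are left alone
    if not path.startswith("/") or path.startswith("//"):
        return path  # only rewrite site-relative paths
    bare = path.split("?", 1)[0].split("#", 1)[0]
    if not bare.lower().endswith(STATIC_EXTENSIONS):
        return path  # dynamic pages keep going to the origin
    return cdn_base.rstrip("/") + path
```

For example, to_cdn_url("/misc/drupal.js?v=1", "https://cdn.example.org") yields "https://cdn.example.org/misc/drupal.js?v=1", while a page path such as "/content/123" is returned unchanged.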

Drupal provides additional page download speed improvements: it can automatically minimise JS and CSS files, cache them, and reduce HTTP requests by combining multiple CSS and JS files.
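
Combining files reduces HTTP requests because the browser fetches one bundle instead of many small files. The Python sketch below shows the general idea under stated assumptions. It is not Drupal's aggregation code, and the aggregate_assets name and the bundle naming scheme are hypothetical.

```python
import hashlib
from typing import List, Tuple


def aggregate_assets(files: List[Tuple[str, str]],
                     extension: str) -> Tuple[str, str]:
    """Concatenate (name, content) pairs in order and name the bundle by content."""
    # Order is preserved because later CSS rules and scripts can depend on earlier ones.
    combined = "\n".join(content for _name, content in files)
    digest = hashlib.sha256(combined.encode("utf-8")).hexdigest()[:12]
    # A content-derived name changes whenever any file changes, so stale
    # copies held by browsers or proxies are bypassed automatically.
    return f"{extension}_{digest}.{extension}", combined
```

Two stylesheets passed to aggregate_assets become a single file to download, and the bundle name stays the same until the content of either stylesheet changes.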

Conclusion, challenges, and next steps

The new bmj.com launched on 8 November 2011. When the project was first conceived, we had five main objectives:

  • To boost discoverability of new content as it gets published each day, particularly on the homepage
  • Increase user engagement
  • Increase overall traffic
  • Promote the four existing and two new content channels (research, education, comment, news, multimedia, specialties)
  • Drive referrals to other BMJ Group product sites

In November and December bmj.com recorded the highest number of homepage views for 2011 (497,419 and 523,305 views respectively), excluding robot/crawler counts. Previously the highest was in January (473,575), when the site published a series of articles about MMR vaccination which attracted global attention. Total full text views reached a peak of 1,843,484 in December 2011, almost 300,000 higher than the second highest month (1,558,227, in October).

A good index of user engagement is voting on our weekly online poll, which before November 2011 usually had between 200 and 300 votes per week. Polls now routinely attract between 700 and 800 votes, no doubt due to a more prominent homepage slot. Four polls since November 2011 have received more than 1500 votes.

The new-look channel pages with automated content feeds are also proving effective at promoting new content. In January 2011, for instance, there were 301,294 visits to current articles from the channel pages. In January 2012 this figure was 1,144,588.

Finally, bmj.com, as BMJ Group’s flagship product website, appears to be driving more traffic to sister sites. In January 2011 there were 55,409 clicks on links to our specialist journal sites, modules and Masterclasses on BMJ Learning, and the BMJ Careers UK jobs site. A year later there were 108,037.

User feedback has been broadly positive. We launched with a video guide to the new site, a feedback email address, and contact information for BMJ Group’s customer services team. We continue to refine the site and have identified a number of development projects to meet the journal’s strategic priorities. These include a fully mobile-optimised site in 2012, geo-targeting content for UK visitors, and real-time, post-moderated commenting on some journalistic content.

Modules/Themes/Distributions
Why these modules/theme/distribution were chosen: 

All bmj.com article content types are made using CCK. All channel and article pages are made up of components built using CCK fields, Views, Panels, and Pathauto. The Features module is used to export configuration changes made in Drupal and add-on modules to source code, to be tracked via Git. Editorial staff use the Nodequeue module to promote headlines into the slideshow panel.

Community contributions: 

While working on bmj.com we developed the following contributed modules:
Drupal 6 Entity Cache
Panels Hash Cache
Is Robot?
Syslog NG
NLM Field

Comments

fishfree

A huge and prolific project!

Prague man

I agree, very motivating.

jasondaniels455

This was definitely worth the time and effort given the awesomeness of the result.

F.Abdou

One of my main sources for medical information. Thank you!

Php mysql developper.

itserich

Is the carousel slider - in the upper left of the front page - a contributed module or a custom module?

I am now using Drupal 7, but spent a lot of time in Drupal 6 searching for such a slider and tried many modules.

Thank you and congratulations on a nice looking site.

trisketonni

Thanks for your feedback.

The carousel is a custom module. Initially we thought of doing it with the views_slideshow module, but it didn't meet all the requirements, so we decided to implement our own style plugin.

You can find more info here http://views-help.doc.logrus.com/help/views/api-plugins

However, I always find it more useful to read the code. You can read the default Views style plugin implementation.

Best

haggins

Great project!
Could you please provide more details about how you managed different apache instances (especially syncing files for redundancy)?

pwaterz

We have an F5 load balancer in front, which balances between two Varnish instances. Varnish balances between three Apache instances. We use NFS for storage, and our NFS storage is on RAID. The infrastructure behind this site is massive.

PJW

tsaorin

It seems a well positioned competitor to PLOS's Project Ambra, and far better than a journal published using Open Journal Systems (OJS). Scientific journals have to blend news, sections, and articles, and this case is really powerful. I hope I'll see more and more publishers going this way.
Thanks!

jacopo3001

I am doing research for a similar project. But there are a few things I don't understand, mainly because of my ignorance of the topic.

I understand that article XML and markup are managed on the HighWire system, which works as a back-end editorial repository system. Fine.

Then some data are synced to Drupal's DB, and some other data (the body?) are cached only on request?
Why are some copied and others not? Is that just to save DB space? Or is there something else I don't see?

Then one stupid question due to my ignorance: why is HighWire used on the back end? What does it do that could not be done in Drupal directly?

thanks

Andrés-B

It is a shame, but it seems like WordPress wins more and more sites every day.

dberhane

The BMJ site is still using Drupal. In fact, we recently migrated from Drupal 6 to Drupal 7. There is a separate case study for it at:

https://www.drupal.org/node/2323999

The Journal site URL has been changed to: http://www.bmj.com/theBMJ