Genomeweb, a print and online publisher for the molecular biology research community, has successfully migrated its web site to Drupal. Cyrve built the Drupal site and migrated data into it from a legacy Microsoft SQL Server application. Drupal craftsmen Moshe Weitzman and Mike Ryan developed the site, and Maureen Lyons authored the theme.

  1. Data Migration
  2. Premium Content
  3. Email Domain Authorization
  4. Email Newsletter Integration
  5. Member Messages
  6. Challenges
  7. GIVEBACK

Data Migration

The Genomeweb data migration marked the coming of age of the Migrate and Table Wizard (TW) modules. Cyrve has contributed both modules on drupal.org for your migration pleasure. The methodology goes like this:

  • Get your legacy data into comma-separated values (CSV) files or MySQL tables.
  • Run Table Wizard analysis on these tables. TW creates a default view for each table, including every column. That's right: instant Views integration.
  • Review each column with the client and annotate it using the textboxes in the TW admin pages. At this point, there is a common understanding of what each legacy column does.
  • Each set of data to be migrated needs to be represented by a view. The default view TW created for one of your tables can be used, but you can also create custom views to filter the data, join multiple source tables, etc.
  • In the migrate module, create content sets based on these views for each distinct destination step. For example, different content types will usually be migrated using different content sets. Similarly, taxonomy term migrations, comment migrations, and user migrations will be distinct content sets from nodes.
  • Use the Migrate web UI to map legacy columns to Drupal properties, such as old.date => node.created. Columns which don't map cleanly are handled in a Migrate module prepare hook.
  • Run and re-run migrations for each content set. Migrate can run a full migration or subsets like “Next 20 story nodes from 2008” or “Records 20, 22, 29 from the User content set”. Migrate provides a quick and accurate way to roll back migrations so you can tweak the code and re-run. Migrations may be run via drush or via the web.
  • Migrate provides a dashboard in drush and on the web so you can monitor progress.
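
The mapping and subset steps above can be sketched outside Drupal. The Python below is only an illustration of the idea, not the Migrate module's API: the legacy column names (headline, old_date, is_live), the FIELD_MAP table, and the prepare() and migrate() functions are all hypothetical stand-ins for the web UI mappings, a prepare hook, and a partial migration run.

```python
# Hypothetical one-to-one mappings, standing in for what the web UI records.
# Assumes the legacy date column already holds a Unix timestamp.
FIELD_MAP = {
    "headline": "title",
    "old_date": "created",
}


def prepare(node, row):
    """Stand-in for a prepare hook: handle columns that do not map cleanly."""
    # Assumed legacy flag: 'Y'/'N' becomes a published status of 1/0.
    node["status"] = 1 if row.get("is_live") == "Y" else 0
    return node


def migrate_row(row):
    """Apply the simple mappings first, then the custom preparation."""
    node = {dest: row[src] for src, dest in FIELD_MAP.items()}
    return prepare(node, row)


def migrate(rows, limit=None, ids=None):
    """Run a full migration, or a subset chosen by record ids and/or count."""
    selected = [r for r in rows if ids is None or r["id"] in ids]
    if limit is not None:
        selected = selected[:limit]
    return [migrate_row(r) for r in selected]
```

Calling migrate(rows, ids={20, 22, 29}) mirrors the "Records 20, 22, 29" style of run, while migrate(rows, limit=20) mirrors "Next 20".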

Planning the Genomeweb migration was complex as content, users, terms, comments, and Ubercart transaction records all had to be created in Drupal. In the end, the migration went quite smoothly on launch day.

Premium Content

All stories with a lock icon beside the title are protected by a premium membership requirement. When an unauthorized user follows such a link, she sees a teaser and then an offer for a trial membership.

Cyrve chose to implement this requirement without using the node access control API. That API is perfect for cases where unauthorized users should never see premium content titles or teasers. In our case, we borrowed ideas from the Premium module, which uses the nodeapi(‘view’) hook to replace the full body of the premium node with a custom “upgrade” message.

Customers and companies may buy access to each newsletter on the site separately, so the access control calculation considers both the newsletter for a given piece of content and the user's roles. Users buy these memberships in an Ubercart-powered store and are assigned role(s) at the end of the transaction. These roles expire after 3 months or 12 months, depending on the offer that was purchased.
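
The view-time check described in this section can be sketched roughly as follows. This is a language-neutral illustration in Python, not the site's code: the node fields, the "<newsletter> subscriber" role naming, and the upgrade text are assumptions made for the example.

```python
# Hypothetical upgrade text shown in place of the full body.
UPGRADE_MESSAGE = "Start a trial membership to read the full story."


def can_view_premium(node, user_roles):
    """True when the story is free or the user holds a role for its newsletter."""
    if not node.get("premium"):
        return True
    # Assumed naming scheme: one role per newsletter, e.g. 'BioInform subscriber'.
    return f"{node['newsletter']} subscriber" in user_roles


def view(node, user_roles):
    """Return the text to render: the full body, or teaser plus upgrade offer."""
    if can_view_premium(node, user_roles):
        return node["body"]
    return node["teaser"] + "\n\n" + UPGRADE_MESSAGE
```

Because the title and teaser stay visible to everyone, only the body is swapped at view time, which is why a full node access implementation is not needed for this case.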

Email Domain Authorization

Genomeweb supports an alternate way to gain access to premium content. Genomeweb wants to give free access to all members of academic institutions, so users whose email addresses end in .edu are granted premium access. In addition, companies may purchase a company-wide license for all their staff. Thus, domains like foo.com or bar.net can be manually added to a ‘premium domains’ table once their payment arrives.

To support this requirement, Cyrve uses the Email registration and Email change confirmation modules. Email registration lets users log in using an email address instead of a username, and Email change confirmation requires that users click on a link in a verification email when they choose to change their email address.
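
The domain rule in this section reduces to a small check on the verified email address. The sketch below is a hedged illustration in Python: PREMIUM_DOMAINS stands in for the ‘premium domains’ table, the function names are hypothetical, and exact matching of licensed domains (no subdomains) is an assumption.

```python
# Stand-in for the 'premium domains' table; entries are added after payment.
PREMIUM_DOMAINS = {"foo.com", "bar.net"}


def email_domain(email):
    """Return the lower-cased part of the address after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()


def has_domain_access(email, premium_domains=PREMIUM_DOMAINS):
    """Grant access to .edu addresses and to explicitly licensed domains."""
    domain = email_domain(email)
    return domain.endswith(".edu") or domain in premium_domains
```

A check like this is only trustworthy when the address has been verified, which is why confirming email changes matters here.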

Email Newsletter Integration

Genomeweb maintains 14 very popular email newsletters. Genomeweb Daily News boasts 30,000 subscribers and publishes twice a day. Mass email delivery is a specialized discipline and is technically ill-suited to PHP and Drupal. After a vendor search, Genomeweb chose Lyris as its email partner. Cyrve then built full integration between www.genomeweb.com and the Lyris API. Editors author their content in Drupal and never copy/paste content into Lyris. Instead, they use a custom Drupal form to schedule newsletters and customize their contents.

Similarly, all email subscribe/unsubscribe activity is handled in Drupal. This way, Drupal has complete knowledge of the payment status for any given user or domain. Drupal periodically POSTs a full, up-to-date email subscriber list to Lyris.

Member Messages

In the upper right-hand corner of each page, Genomeweb can broadcast small messages to its users. Cyrve built an audience targeting application to meet this requirement. Examples of such messages are ‘Your BioInform subscription is about to expire’ or ‘Get ProteoMonitor headlines delivered to your Inbox’. These messages are content specific (e.g. a ProteoMonitor message appears on a Proteo story) and user specific (e.g. the expiration date is less than 30 days away). Further, messages have an interval during which they may not be repeated. This prevents barraging the user with excessive messages. Any given message can be permanently dismissed; that's useful if a user never intends to subscribe via email, for example. Messages may also be targeted at particular email domains and may have custom expiration dates.
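
The targeting rules above combine into a single yes/no decision per message and page view. The following Python is a minimal sketch under stated assumptions, not the application Cyrve built: the Message and Viewer structures, their field names, and the 7-day default repeat interval are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

DAY = 86400  # seconds


@dataclass
class Message:
    text: str
    newsletter: Optional[str] = None           # content targeting
    expires_within_days: Optional[int] = None  # user targeting
    repeat_interval: int = 7 * DAY             # assumed default
    domains: frozenset = frozenset()           # email domain targeting
    expires_at: Optional[int] = None           # custom expiration date


@dataclass
class Viewer:
    email: str
    subscription_expiry: dict = field(default_factory=dict)  # newsletter -> timestamp
    last_shown: dict = field(default_factory=dict)           # message text -> timestamp
    dismissed: set = field(default_factory=set)


def should_show(msg, viewer, page_newsletter, now):
    """Apply each targeting rule in turn; any failed rule hides the message."""
    if msg.text in viewer.dismissed:
        return False
    if msg.expires_at is not None and now >= msg.expires_at:
        return False
    if msg.newsletter and msg.newsletter != page_newsletter:
        return False
    if msg.domains and viewer.email.rsplit("@", 1)[-1].lower() not in msg.domains:
        return False
    if msg.expires_within_days is not None:
        expiry = viewer.subscription_expiry.get(msg.newsletter)
        if expiry is None or not (0 <= expiry - now < msg.expires_within_days * DAY):
            return False
    last = viewer.last_shown.get(msg.text)
    if last is not None and now - last < msg.repeat_interval:
        return False
    return True
```

Ordering the cheap checks (dismissal, expiration) first keeps the per-page cost low when many messages are defined.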

Challenges

Genomeweb’s site features lots of dense information blocks. The Drupal 6 block caching feature is instrumental to serving up fast content to authenticated users. The default caching strategy of BLOCK_CACHE_PER_ROLE is perfect for this site, as lock icons are role specific, depending on each user's access to premium content. Given this heavy reliance on the block cache, the site does wobble a bit when all caches are cleared. Custom and preemptive cache clearing is a real need here and Cyrve intends to work on this during the Drupal 8 development cycle.
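
Per-role block caching works because users with the same set of roles can share one rendered copy of a block. The Python below is a rough sketch of that idea only; it is not Drupal's implementation, and the cache_id() format, the in-memory _cache dictionary, and the build callback are assumptions for illustration.

```python
# In-memory stand-in for the block cache table.
_cache = {}


def cache_id(block, role_ids):
    """Build a key shared by every user holding the same set of roles."""
    return f"{block}:r." + ",".join(str(r) for r in sorted(role_ids))


def render_block(block, role_ids, build):
    """Return the cached markup for this role set, building it on a miss."""
    cid = cache_id(block, role_ids)
    if cid not in _cache:
        _cache[cid] = build(role_ids)
    return _cache[cid]


def clear_all():
    """Clearing everything forces each role set to rebuild on its next request."""
    _cache.clear()
```

After clear_all(), every distinct role combination misses at once, which is the wobble described above.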

menu_rebuild() is another performance problem for most Drupal sites, including genomeweb.com. Let's all help get this locking issue committed to Drupal 6.

GIVEBACK

Genomeweb and Cyrve are committed to GIVEBACK to the Drupal community. We have received so much, and want to keep the flow of contributions growing. As such, we contributed the following during the project:

Comments

ludo1960

Excellent work, guys and gals!!

sunward

Wow. Well done.

What did you use for the most viewed/most emailed/blog section? And for the menu sections that open up (Young Investigator profile is the first)?

Also, considering it is such a big site with many users, did you have any concerns regarding speed? Was Boost used?

moshe weitzman

Most viewed/Most emailed/Most blog are each Views that are grouped into a tab component by the jQuery UI module's Tabs feature. The Young Investigator widget uses the jQuery UI accordion feature.

Speed is OK since we used block caching extensively. When the cache clears, though, we do experience stampede problems. Drupal 7's custom block cache feature will be a useful fix for this. We can't use Boost on this site since we have a '3 free clicks' feature which lets you view premium content 3 times before hitting the pay barrier. Drupal has to see each page view to count it, so no Boost, unfortunately.

J. Daglees

Looks awesome, good job. :)

cels

Thanks a lot for your work to port Primary Term to Drupal 6.

I have one suggestion. There is a maintainer requesting commit access (ericduran http://drupal.org/user/244460). He has made good patches for version 6; can you help him get these patches committed?

#641560: Request for commit access
#588166: Pending Patches for Primary Term module

And thanks for your work!

JBI

From what I can see, the Premium module is still on Drupal 5, with a growing user base on 6:
http://drupal.org/project/usage/premium

Even so, some users are unsure whether it's stable enough: http://drupal.org/node/645984

The last dev version from jerdavis http://drupal.org/user/228997 dates back to January 2009.

As moshe is a first-class contributor, his input could be useful to the maintainer.
The community would also benefit from finding another, cleaner way to handle permissions with OG.

MarlonRibunal

What did you use for the "slideshow" (image + title + teaser)? Or is it a custom module?

frankcarey

Mike and moshe, excellent work! How long did the migration process take?

Frank Carey

dheeraj.dagliya

Great work and contributions!!

I will be exploring the Table Wizard and Migrate modules soon. These modules will be a real help for everyone looking to move over to Drupal.

datenrettung

Sorry guys, but you should be blushing with shame ;-)

Have a look: http://validator.w3.org/check?uri=http://www.genomeweb.com/&charset=(detect+automatically)&doctype=Inline&group=0

221 errors

But it looks good, that's it.

jaochoo

The "Member Messages" look nice. I was looking for something similar for our project. What module did you use for that, or did you develop one of your own? Have you considered contributing that module, too?

giorgio79

Great stuff!

I was just checking out the Premium module last week, but was not happy with it, as I could not find where I can set which user role is meant to see the premium nodes...

I ended up using the Contemplate module, and inside the body template, I check for the user's role with an if/else.

If the user is anonymous or simply registered, they see the teaser; if their role is customer, they see the full content... And sayonara, Premium :)

Drupal9.9

Some URLs still end with node/12345, like stock Drupal. Making it look nice is all that matters for some people. What really matters is SEO, which many developers know nothing about or don't even believe exists. Very scary!!