This project is not covered by Drupal’s security advisory policy.
Serves flat HTML files, working PHP files, or even remote website mirrors (realtime or cached) in a Drupal context.
Originally designed just to serve a subsection of legacy, malformed HTML pages under a Drupal theme, this module has since been extended to wrap a Drupal shell around legacy applications (PHP, ASP, Perl), allowing most of the old functionality to keep working without much code review.
Say you have a set of old pages that just can't, won't, or shouldn't be migrated into Drupal nodes.
- A discussion board archive
- Exported presentation slides
- Files that are updated from an external tool (e.g. statistics dumps)
- Custom, crappy HTML, or an external PHP script
- Any web application that's too much work to rewrite
- Even an old CGI script in another language can be 'wrapped' in Drupal by entering the URL as a source.
Wrapper tries to take most of the hard work out of getting this done.
(By putting past mistakes on a life-support system)
Instructions
- Visit [ Administer : Site Configuration : Wrapper ] to adjust global settings.
- Create a 'wrapper' instance. Wrappers are managed as nodes, so one is created at [ Create content : Wrapper ] .
- You must set a path value for a wrapper node to indicate where wrapped content will appear on your site.
- When creating a wrapper node, you must choose the *method* used to extract content (from the list shown on the node edit form) and also the *source* that content will be fetched from.
- Different extraction methods require different parameters, so configure them individually.
- A wrapper can be set up to 'wrap' just a single page, but it's most powerful when wrapping a whole section of external content.
URL paths
A wrapper node without a path would mean that your whole site was being wrapped - this is also possible, but advanced. It can have some unexpected side-effects, as it necessarily acts by intercepting all your 404 requests and trying them on the target system.
Many different virtual locations can be wrapped with different rules on one installation, if needed. This happens by just creating another 'wrapper' type node.
Input formats & filters
For most imports, you don't need to change the filter format. You probably want full (unfiltered) HTML most of the time. It is really inadvisable (and a security risk) to 'wrap' a site you don't trust without filtering.
The HTML we are working with is expected to be ready to display already, but you can add any filter enhancements you like.
You may wish to point a wrapper at a directory of text files - logs, statistics or even a readme. When doing that, you should probably add a line-break-converter.
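To illustrate what a line-break converter does to a wrapped text file, here is a minimal sketch in plain PHP. The function name `text_to_html` is hypothetical, and escaping the text before conversion is an extra assumption of this sketch; Drupal's own line break filter does more (it also builds paragraphs), so treat this only as an approximation.

```php
<?php
/**
 * Convert plain text (a log, a readme) to displayable HTML.
 * Hypothetical helper for illustration; not part of wrapper.module.
 */
function text_to_html($text) {
  // Escape markup characters first so file contents cannot inject HTML.
  $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
  // Insert a <br /> before every newline so line structure survives.
  return nl2br($escaped);
}

$log = "10:01 start\n10:02 <done>";
$html = text_to_html($log);
```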
Features
"Features.module" support
"Wrapper" instances can be exported and imported via features module.
Module features
A number of extraction methods are provided -
- string token matching
- xpath patterns
- regular expression
- XSL templates
- and integration with the import_html module for full semantic extraction.
Which one is appropriate for you depends on your source data. XPath is the preferred method, though regular expressions or tokens are probably the fastest.
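As an illustration of the XPath approach, here is a minimal sketch in plain PHP using the standard DOM extension. The function name `extract_by_xpath`, the sample page, and the `//div[@id="content"]` pattern are assumptions made up for this example; they are not the module's actual API or defaults.

```php
<?php
/**
 * Return the inner HTML of the first node matching an XPath pattern.
 * Hypothetical helper for illustration; not part of wrapper.module.
 */
function extract_by_xpath($html, $pattern) {
  $doc = new DOMDocument();
  // Legacy pages are often malformed, so collect parser warnings quietly.
  $previous = libxml_use_internal_errors(TRUE);
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($doc);
  $nodes = $xpath->query($pattern);
  if ($nodes === FALSE || $nodes->length === 0) {
    return NULL;
  }
  // Serialize the children of the matched node, dropping its own tag.
  $out = '';
  foreach ($nodes->item(0)->childNodes as $child) {
    $out .= $doc->saveHTML($child);
  }
  return $out;
}

// Sample legacy page: only the "content" div should survive.
$page = '<html><body><div id="nav">Old menu</div>'
  . '<div id="content"><p>Hello</p></div></body></html>';
$content = extract_by_xpath($page, '//div[@id="content"]');
```

If the pattern matches nothing, the sketch returns NULL so a caller could fall back to passing the page through unchanged.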
Transactional forms, including POSTs, GETs, uploads, sessions, redirects and cookies, should mostly be supported. The module has been tested against a number of strange targets and upgraded to handle a number of form-handling scenarios. There are probably cases where it would fail, however. Authentication and forms over SSL, for example, are not yet tested.
Theory
This whole method is admittedly inefficient.
It is not a long-term solution for real site-serving, but is an emulation layer that helps old things just keep working under a new system.
- Local file serving through wrapper - is not too bad performance-wise.
- Locally executed server code - can be slower, as the legacy code often does its own layer of template rendering and layout, which then gets stripped and discarded, so that's a waste.
- Remote URL fetching and re-rendering - can never be efficient, as it triggers a remote page request each time: the remote site has to build the page and respond, then that content is stripped and reformatted into the Drupal context. If the results are only twice as slow as the original, you'd be lucky.
TODO - work on the caching options. Caching cannot help in cases where remote applications need to work with forms and active content.
Examples : The test directory
A set of files is available in the 'tests' subdirectory of the wrapper module.
They illustrate a number of common cases that can be wrapped, and contain some instructions on the configurations that should be used to wrap them.
The examples are all available as 'Features' that can be enabled (if you use features) for experimentation. They will be in the 'Testing' section of Features admin.
SCENARIO : Emulating an old Guestbook Script
Say, for example, you have an old, but trustworthy 'guestbookplus'
script on your site. And you like it. But you also want to run Drupal,
and have installed Drupal over top of your site, leaving the old
/guestbookplus directory alone.
Your first problem is that /guestbookplus link may no longer work,
due to what Drupal did to your root .htaccess.
You may want to fix that first: see http://drupal.org/node/30334
Now we want to put the Drupal-style chrome, theme and navigation
around that set of pages. Due to naming issues, we cannot use exactly
the same URL as before, so you either have to rename the old one,
or use a new name for the 'wrapped' version.
We'll just call it /guestbook .
Visiting [ Create content : Wrapper ]
will allow us to create a wrapper instance.
In the node url alias settings, enter the local path -
the virtual alias for the new section
- 'guestbook'
In the 'source' field of the wrapper node, enter the real filepath of the
HTML/PHP subsite files that will be read - 'guestbookplus'
By default, a file directory will be looked for relative to drupal root.
"files/oldsite" or "/var/www/oldsite" are also valid filepaths.
If you save now (in 'Pass through' mode), you can probably immediately
start to access the files that used to be in /guestbookplus from the
new /guestbook location.
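The source path rule described above (relative paths are looked for
under the Drupal root, absolute paths are used as given) can be
sketched as follows. The function name resolve_source_path and the
example root /var/www/drupal are assumptions for illustration only,
and remote URL sources are not handled here.

```php
<?php
/**
 * Resolve a wrapper 'source' value to a filesystem path.
 * Hypothetical helper for illustration; not the module's own code.
 */
function resolve_source_path($source, $drupal_root) {
  // Absolute paths such as /var/www/oldsite are used as given.
  if (strpos($source, '/') === 0) {
    return $source;
  }
  // Anything else is looked for relative to the Drupal root.
  return rtrim($drupal_root, '/') . '/' . $source;
}

// The Drupal root here is an assumed example location.
$root = '/var/www/drupal';
$relative = resolve_source_path('guestbookplus', $root);
$absolute = resolve_source_path('/var/www/oldsite', $root);
```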
However, what you'll be seeing is the WHOLE of the old pages,
including old banners or headers or menubars, crammed into the Drupal
page area. Probably breaking your layout unless the original pages
were really lightweight.
The next step is to set the rule to extract the actual page content.
You'll have to find a string/marker of some kind that can reliably
mark the beginning and the end of the content you want to frame.
This is not always easy, but many pages, hopefully, will have
something like
[!--BEGIN CONTENT--] ... [!--END CONTENT--] in the source.
If not, you'll have to figure something out.
Just removing or blanking old [!--INCLUDE header.inc--]
pragmas may be good enough.
The import_html module provides a much more advanced way of
extracting content, but for now, we'll just hope that simple
token-matching can work.
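A minimal sketch of what token matching amounts to, in plain PHP.
The function name extract_between is hypothetical and the markers
are just the example ones from above; the module's own
implementation may differ.

```php
<?php
/**
 * Return the text between two marker strings, or NULL if either
 * marker is missing. Hypothetical helper for illustration only.
 */
function extract_between($html, $start_token, $end_token) {
  $start = strpos($html, $start_token);
  if ($start === FALSE) {
    return NULL;
  }
  // Skip past the start marker itself.
  $start += strlen($start_token);
  $end = strpos($html, $end_token, $start);
  if ($end === FALSE) {
    return NULL;
  }
  return substr($html, $start, $end - $start);
}

$page = 'old header [!--BEGIN CONTENT--]<p>Guestbook entries</p>'
  . '[!--END CONTENT--] old footer';
$content = extract_between($page, '[!--BEGIN CONTENT--]', '[!--END CONTENT--]');
```

Returning NULL when a marker is missing lets a caller decide whether
to show the whole page or nothing at all.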
Preparing legacy PHP code for wrapper
As the pages are served from virtual URLs corresponding to their
original pathnames, relative links - to nearby pages, images, and
javascripts - should continue to work - but only under clean-urls.
Making old files' relative links work under non-clean-urls would
require much source rewriting.
Files that are not in the 'rewritable' file extension list in the
settings will be passed through directly, so binaries should be safe.
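The pass-through decision could look something like this sketch.
The function name is_rewritable and the extension list shown are
assumptions for illustration; the real list is whatever is
configured in the module settings.

```php
<?php
/**
 * Decide whether a file should be parsed and rewritten, or passed
 * through untouched. Hypothetical helper for illustration only.
 */
function is_rewritable($path, array $rewritable_extensions) {
  // Compare the lower-cased extension against the configured list.
  $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
  return in_array($extension, $rewritable_extensions, TRUE);
}

// Assumed example list, not the module's actual default.
$rewritable = array('html', 'htm', 'php');
$page_is_rewritten = is_rewritable('guestbookplus/index.php', $rewritable);
$image_is_rewritten = is_rewritable('guestbookplus/logo.PNG', $rewritable);
```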
However, if you create Drupal URL Aliases to specific pages within
the wrapper context, things can get confused. The best way to resolve
these problems is to check the error logs and see what files are
being missed by figuring out what is wrong with the URL path of the
request being made.
Before a wrapped PHP page is run, the effective directory it would
be run from (the directory where it was found) is added to the
php_include path. This means that relatively included PHP files, like
local routine libraries, SHOULD be found as if the php page was
running unwrapped.
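In plain PHP, that kind of include path adjustment can be sketched
like this. The function name and the example script path are
hypothetical, and whether the module prepends or appends the
directory is an assumption here (this sketch prepends).

```php
<?php
/**
 * Put the directory of a legacy script on the PHP include path so
 * its relative include/require calls can be resolved.
 * Hypothetical helper for illustration; not the module's own code.
 */
function add_script_dir_to_include_path($script_path) {
  $dir = dirname($script_path);
  $paths = explode(PATH_SEPARATOR, get_include_path());
  // Avoid adding the same directory twice on repeated requests.
  if (!in_array($dir, $paths, TRUE)) {
    set_include_path($dir . PATH_SEPARATOR . get_include_path());
  }
  return get_include_path();
}

// Assumed example location of the legacy script.
$include_path = add_script_dir_to_include_path('/var/www/drupal/guestbookplus/index.php');
```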
Be aware that most PHP execution is done in Drupal in an eval()
context.
This means that file level 'global' variables
(vars simply declared at the top of a file, not inside any function)
are not really globals, and have to be explicitly declared global to
act like that.
This will not work:
-------original-file.php-----
<?php
// Top of the file
$settings = array('repeat' => 5, 'fallback' => TRUE);
function get_settings() {
  global $settings;
  return $settings;
}
// etc
-----------------------------
Must be changed to:
-------original-file.php-----
<?php
// Top of the file
global $settings;
$settings = array('repeat' => 5, 'fallback' => TRUE);
function get_settings() {
  global $settings;
  return $settings;
}
// etc
-----------------------------
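The scoping effect can be reproduced outside Drupal with a small
stand-alone script. This is only a sketch of the general PHP
behaviour: run_wrapped() is a hypothetical stand-in for whatever
function ends up calling eval(), and the variable names are made up.

```php
<?php
/**
 * Run PHP source inside a function, roughly as an eval()-based
 * wrapper would. Hypothetical stand-in, not the module's own code.
 */
function run_wrapped($code) {
  eval($code);
}

// Without the declaration, $settings_a is local to run_wrapped(),
// so the function's "global $settings_a" finds nothing.
run_wrapped('
  $settings_a = array("repeat" => 5);
  function get_settings_a() { global $settings_a; return $settings_a; }
');

// With "global" first, the assignment reaches the real global scope.
run_wrapped('
  global $settings_b;
  $settings_b = array("repeat" => 5);
  function get_settings_b() { global $settings_b; return $settings_b; }
');

$without_declaration = get_settings_a(); // NULL
$with_declaration = get_settings_b();    // array("repeat" => 5)
```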
Globals are of course bad in general.
It's possible that a few terms (like global $user) may conflict
with Drupal's version of the same name. Results will be unpredictable.
SCENARIO : Wrapping sections of external sites - realtime proxy
As an example, wrapper.module comes preloaded with a proxy configuration
that mirrors the whole of drupal.org!
There is a wrapper 'instance' configured to answer requests under
the path 'drupal.org' eg http://localhost/drupal.org/project
will return the contents of http://drupal.org/project in your site.
Note that this is a realtime, non-caching proxy and the remote page is
fetched every time! Pretty inefficient.
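The path-to-URL mapping behind such a proxy can be sketched as
below. The function name map_to_remote_url is hypothetical, and the
sketch covers only the URL translation, not the fetching, content
extraction or link rewriting.

```php
<?php
/**
 * Translate a local request path under a wrapper alias into the
 * remote URL to fetch. Returns NULL if the path is outside the alias.
 * Hypothetical helper for illustration; not the module's own code.
 */
function map_to_remote_url($request_path, $alias, $source_url) {
  $request_path = trim($request_path, '/');
  $alias = trim($alias, '/');
  // Only the alias itself, or paths below it, belong to this wrapper.
  if ($request_path !== $alias && strpos($request_path, $alias . '/') !== 0) {
    return NULL;
  }
  // Whatever follows the alias is requested from the remote site.
  $remainder = ltrim((string) substr($request_path, strlen($alias)), '/');
  return rtrim($source_url, '/') . '/' . $remainder;
}

$remote = map_to_remote_url('drupal.org/project', 'drupal.org', 'http://drupal.org');
```

Here $remote is http://drupal.org/project, matching the example
above.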
Issues
Menu items for virtual pages are not created automatically, and breadcrumbs and menu expansion behaviour for nodes that are not really there may be imperfect.
The current behaviour is that the URL is trimmed back upwards until a match in the menu is found; that entry is then used as the navigational point.
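That trimming can be pictured with a short sketch. The function name `find_menu_ancestor` and the flat list of menu paths are assumptions for illustration, as is trimming exactly one path segment per step; the module itself works against Drupal's menu system rather than a plain array.

```php
<?php
/**
 * Walk a virtual path upwards until one of the known menu paths
 * matches. Returns the matching path, or NULL if none is found.
 * Hypothetical helper for illustration; not the module's own code.
 */
function find_menu_ancestor($path, array $menu_paths) {
  $segments = explode('/', trim($path, '/'));
  while (count($segments) > 0) {
    $candidate = implode('/', $segments);
    if (in_array($candidate, $menu_paths, TRUE)) {
      return $candidate;
    }
    // No match at this level: drop the last segment and try again.
    array_pop($segments);
  }
  return NULL;
}

// Assumed example menu entries.
$menu = array('guestbook', 'about');
$active = find_menu_ancestor('guestbook/archive/2004/index.html', $menu);
```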
Remote site wrapping doesn't always perfectly rewrite the embedded links as would really be needed to run a full proxy.
This module was originally written for Drupal 4, and has been upgraded roughly once a year since.
TODO
Extract titles or other page elements?
Test more session-passing issues when trying to proxy remote sites.
Credits
Development of this module has been sponsored and supported by Sparks Interactive
Project information
Minimally maintained
Maintainers monitor issues, but fast responses are not guaranteed.
- Project categories: Content editing experience, Content display
- By dman

