This filter cleans up HTML generated by Microsoft Office. It can remove header tags (<style>, <script>, etc.) together with their contents, and it can convert HTML entities to their plain-text equivalents. Used in conjunction with the core HTML filter, it filters out the plethora of HTML generated by Microsoft Office.
To deal with Office-generated HTML, you must strip not only the offending tags but also the markup between them. The core HTML filter can easily strip the tags if you use a whitelist such as
<a> <i> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <img> <h1> <h2> <h3> <h4> <h5> <h6> <table> <tr> <td> <thead> <tbody> <tfoot> <br> <p> <b> and choose to strip disallowed tags. However, due to a bug/feature, it does not strip the content between the tags. This is the gaping void that this module seeks to fill by stripping out that offending content. It also converts some HTML entities to their plain-text equivalents.
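To illustrate the difference between stripping tags and stripping tags together with their contents, here is a minimal sketch in Python. It is not the module's own code (the module runs inside Drupal), and the tag list, the entity map, and the function names are all assumptions chosen for the example.

```python
import re

# Hypothetical list of elements to drop together with their contents.
STRIP_WITH_CONTENT = ("style", "script", "xml")

# Hypothetical subset of entities to convert to plain-text equivalents.
ENTITY_MAP = {
    "&nbsp;": " ",
    "&ldquo;": '"',
    "&rdquo;": '"',
    "&lsquo;": "'",
    "&rsquo;": "'",
    "&ndash;": "-",
    "&mdash;": "-",
}


def strip_elements_with_content(markup, tags=STRIP_WITH_CONTENT):
    """Remove each listed element and everything between its tags."""
    for tag in tags:
        pattern = re.compile(
            r"<{0}\b[^>]*>.*?</{0}\s*>".format(re.escape(tag)),
            flags=re.IGNORECASE | re.DOTALL,
        )
        markup = pattern.sub("", markup)
    return markup


def convert_entities(markup, entity_map=ENTITY_MAP):
    """Replace the mapped entities with their plain-text equivalents."""
    for entity, plain in entity_map.items():
        markup = markup.replace(entity, plain)
    return markup


def clean_office_html(markup):
    """Strip unwanted elements first, then convert entities."""
    return convert_entities(strip_elements_with_content(markup))


sample = (
    "<style>p.MsoNormal {margin:0}</style>"
    "<p>Hello&nbsp;&ldquo;world&rdquo;</p>"
    "<script>alert(1)</script>"
)
print(clean_office_html(sample))
```

A tag-only filter applied to the same sample would leave the CSS rule and the script body behind as visible text. Removing the whole element avoids that. A regular expression is enough for a sketch like this, but it does not handle nested or malformed markup the way a real HTML parser would.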
Development / maintenance / issue queue policy
I have no immediate plans / funding for further development. However, I will happily accept RTBC patches.
- Maintenance status: Minimally maintained
- Development status: Maintenance fixes only
- Module categories: Filters/Editors
- Reported installs: 370 sites currently report using this module.
- Downloads: 8,109
- Last modified: December 2, 2014
- Stable releases receive coverage from the Drupal Security Team.