Hi,

I'm trying to import the following XHTML file (which was converted from a DITA XML sample file):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en-us" lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta name="DC.Format" content="XHTML"/>
<link rel="stylesheet" type="text/css" href="../commonltr.css"/>
<title>Changing the oil in your car</title>
</head>

<body id="changeoil">
  <h1 class="title topictitle1">Changing the oil in your car</h1>
  <div class="body taskbody">
  <p class="shortdesc">
    Once every 6000 kilometers or three months, change the oil in your car.
  </p>
  <div class="section context">
    <p class="p">Changing the oil regularly 
will help keep the engine in good condition. 
    </p>
  </div>
  <p class="li stepsection">To change the oil:</p>
    <ol class="ol steps">
      <li class="li step"><span class="ph cmd">Remove the old oil filter.</span></li>
      <li class="li step"><span class="ph cmd">Drain the old oil.</span></li>
      <li class="li step"><span class="ph cmd">Install a new oil filter and gasket.</span></li>
      <li class="li step"><span class="ph cmd">Add new oil to the engine.</span></li>
      <li class="li step"><span class="ph cmd">Check the air filter and replace or clean it.</span></li>
      <li class="li step"><span class="ph cmd">Top up the windshield washer fluid.</span></li>
    </ol>
</div>
</body>
</html>

However, the HTML file isn't recognized as a page and therefore isn't shown in blue in step 2. I also get the following error message:

I think (due to file suffix 'document') that 'sites/default/files/garage/tasks/changingtheoil.html' is not a html page I can process.

Is this due to some misconfiguration (I tried various combinations of the settings, all with the same result) or a bug?

Any pointers welcome.

Frank


Comments

dman

Since the file is called 'changingtheoil.html' on the system, I'd expect it to be picked up as usual.
Looking at what happens inside import_html_guess_file_class() and _import_html_file_classes(), my first guess is that the MIME type registry on your system may be recognising it not as "text/html" but as something else.

Can you check what result you get if you run the PHP function:

    $mime = mime_content_type($filename);
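To run that check outside Drupal, a small stand-alone script along these lines may help. It is a sketch that assumes the fileinfo extension is enabled (mime_content_type() is provided by it); the helper name report_mime_type() is hypothetical and not part of import_html.

```php
<?php
// Hypothetical helper: report what PHP's fileinfo-based detection says about a file.
// Assumes the fileinfo extension is loaded; returns null if it is not,
// or if detection fails (e.g. the file does not exist).
function report_mime_type(string $filename): ?string {
    if (!function_exists('mime_content_type')) {
        return null; // fileinfo not available on this PHP build
    }
    $mime = mime_content_type($filename);
    return $mime === false ? null : $mime;
}

// Command-line usage: php check_mime.php path/to/changingtheoil.html
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $mime = report_mime_type($argv[1]);
    echo $argv[1], ': ', $mime ?? '(could not detect)', PHP_EOL;
}
```

Running it on the file with and without the XML declaration at the top should show whether the declaration is what tips detection towards a type other than "text/html".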
Frank Ralf

Hi dman,

Thanks for the quick reply. I ran your snippet on my hosted server and it did indeed return "application/xml" instead of "text/html" as the MIME type. When I comment out the XML and doctype declarations, the returned MIME type is "text/html" and the module works properly.

So is this an error or a misconfiguration on the server side? How can I fix this?

TIA
Frank

JFTR, what the W3C says:

Frank Ralf

I've had a closer look at the functions you mentioned. import_html_guess_file_class() doesn't cater for the "application/xml" MIME type, so the generic class "document" is returned:

function import_html_guess_file_class($filename) {
  ...
  if ($mime_type == 'application') {
      return 'document'; // gross generalization
  }
}

So I'd suggest either adding the following, more specific check to that function, placed before the generic 'application' test so that it is actually reached:

  if ($mime == 'application/xml') {
      return 'html';
  }

Or give the file extension precedence over the MIME type when guessing the file format.
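Both options can be sketched as stand-alone PHP. This is not the import_html code: guess_class_by_mime() and guess_class_with_extension_first() are hypothetical names, the class labels 'html' and 'document' are taken from the messages above, and the extension list and the 'unknown' fallback are assumptions. For the first option the ordering matters: the application/xml check has to run before the generic 'application' branch, otherwise it can never be reached.

```php
<?php
// Option 1 (hypothetical): classify by MIME type, testing the specific
// 'application/xml' case before the generic 'application' fallback.
function guess_class_by_mime(string $mime): string {
    // Drop any parameters such as "; charset=utf-8" and normalise case.
    $mime = strtolower(trim(explode(';', $mime, 2)[0]));
    // Major type, e.g. "application" from "application/xml".
    $mime_type = explode('/', $mime, 2)[0];

    if ($mime === 'text/html') {
        return 'html';
    }
    // Must come before the generic 'application' branch below.
    if ($mime === 'application/xml') {
        return 'html';
    }
    if ($mime_type === 'application') {
        return 'document'; // gross generalization, as in the original
    }
    return 'unknown'; // placeholder for the other classes the real code handles
}

// Option 2 (hypothetical): let a known page extension win over the MIME type.
function guess_class_with_extension_first(string $filename, string $mime): string {
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    // Assumed list of extensions that should always be treated as pages.
    if (in_array($ext, ['html', 'htm', 'xhtml'], true)) {
        return 'html';
    }
    return guess_class_by_mime($mime);
}
```

Option 1 keeps MIME detection authoritative and only widens what counts as a page; option 2 trusts the file name, which fixes this case even when the server's MIME registry is odd, but would also accept a misnamed non-HTML file.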

Frank

dman

Category: Bug report » Task

OK, I guess we can do that.
I'm an XHTML standards freak, but I never thought that 'application' was a good description for an XHTML document.
import_html can, however, run happily on pure XML documents without much effort, so it's fine to support application/xml for that reason.

I think your additional case would be a fine patch.

Frank Ralf

Status: Active » Needs review
File size: 930 bytes

Here's the patch ;-)

Frank

Frank Ralf

File size: 464 bytes

Re-uploaded the patch as UTF-8 without a BOM.
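For reference, a UTF-8 byte order mark (BOM) is the three-byte sequence EF BB BF at the very start of a file; left in a patch, it sits as invisible bytes in front of the first line, which can confuse tools that parse patch headers. A minimal sketch for stripping it in PHP (strip_utf8_bom() is a hypothetical helper, not a Drupal API):

```php
<?php
// Hypothetical helper: remove a leading UTF-8 byte order mark (EF BB BF), if present.
function strip_utf8_bom(string $contents): string {
    $bom = "\xEF\xBB\xBF";
    // Compare only the first three bytes; shorter strings simply don't match.
    if (strncmp($contents, $bom, 3) === 0) {
        return substr($contents, 3);
    }
    return $contents;
}
```

Usage, with $path standing in for the patch file: `file_put_contents($path, strip_utf8_bom(file_get_contents($path)));`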

  • dman committed 0e6c924 on 7.x-1.x authored by Frank Ralf
    Issue #2448437 by Frank Ralf: HTML file not processed "due to file...
dman

Thanks! Committed to 7.x-1.x. I'll see if I can get it into 7.x-2.x dev as well.

  • dman committed 5e9c811 on 7.x-2.x authored by Frank Ralf
    Issue #2448437 by Frank Ralf: HTML file not processed "due to file...
dman

Status: Needs review » Fixed
Frank Ralf

Status: Fixed » Closed (fixed)

Thanks! I'm closing this issue then ;-)