Strange markup in book render functions [#1898]

Snippet from book_render():

$output .= "<dt>". l($node->title, "node/view/$node->nid") ."</dt><dd>". book_body($node) ."<br /><br /></dd>";

Is using definition list here is really correct? This should be replaced with appropiate div's or even better, funnel through theme()

Same goes for the function book_print():

$output .= "$node->title";

if ($node->body) {
$output .= "<ul>". book_body($node) ."</ul>";
}

Why wrap body inside <ul>?

Comment	File	Size	Author
#22	xml-export-06.patch	616 bytes	puregin
#22
#20	xml-export-05.patch	23.89 KB	puregin
#20
#19	xml-export-04.patch	23.89 KB	puregin
#19
#18	opml.png	39.94 KB	Dries
#11	print_css_div.patch	1.36 KB	puregin
#11
#9	02_book_printer_friendly_div.patch	1.78 KB	puregin
#9
#6	book_printer_friendly_div.patch	1.57 KB	puregin
#6

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Comment #1

al CreditAttribution: al commented 11 June 2003 at 00:31

Assigned:

Unassigned

» al

Yeah, it's all nasty. I patched this ages ago, but no one seemed interested. The patch now won't apply due to all the changes since then.

I'm on this - expect a new patch to hit my sandbox shortly.

Comment #2

ax CreditAttribution: ax commented 11 June 2003 at 06:03

please remove the superfluous <ul>s and <li> [1], too.

to solve the problem of invalid nested <hx> tags, maybe we should change <h$depth> to <h1 class="book-h$depth">? this way, we could use <h2> - <h6> in book pages. alternatively, we should only use <h5> and <h6> in book pages - we shouldn't have nestings deeper than 4 in printed books, then.

additionally, i think it would be a good idea to add id's to all book headings so they can be jumped to via fragment identifiers (url#fragment-identifier). i often wish this would be possible when referencing the last paragraph of a long page of documentation ... these id's should be of the form <hx id="foo" name="foo"> [2], with foo probably being the node-id of the single book page.

adding id's to headings would also be a good idea for headings / sections in single book nodes. for example, it would be nice if i could link to list point 10 in the drupal installation instructions [3] via something like url#10. these id's had to be added manually. the question is: how can these id's be named to be unique within a /printed/ book? probably <node_id>_<section_id> ...

[1] [ [drupal-devel] Drupal handbook formatting and heading tags | http://lists.drupal.org/pipermail/drupal-devel/2003-June/025860.html ]
[2] [ XHTML Spec, HTML Compatibility Guidelines, Fragment Identifiers | http://www.w3.org/TR/xhtml1/#C_8 ]
[3] http://drupal.org/node/view/260

Comment #3

al CreditAttribution: al commented 15 June 2003 at 14:23

Fixed in latest CVS.
Uses all of Ax's suggestions:
- <h1 class="depthX">
- Fixed to use CSS rather than tables.
- Changes link at top to standard "breadcrumb" style, rather than hierarchy in tree view.
- Removes odd and superfluous <ul> tags and the like.
- Makes printed docs have proper <html> and <body> wrapping, etc.

Comment #4

(not verified) CreditAttribution: commented 28 September 2003 at 20:12

Comment #5

puregin CreditAttribution: puregin commented 5 May 2005 at 07:20

I'm re-opening this, because the problem persists in CVS.

1) The current CVS version (1.288.2.4) of book module still wraps $node->body in an <ul> element in book_print_recurse(). This produces non-conformant HTML.

I'm not sure why this was done in the first place - I'm guessing someone wanted to insulate from having bad markup in a book page throw off the enclosing page.

Getting rid of the offending UL tags is a one-line solution, but what is the right thing to do here?

I'd like to see something very structural - wrapping the entire output (including the generated sectional header) of the book_print_recurse() wrapped in a <div> element. This will properly nest true structural sections, making the structure accessible to applications via DOM. It will also make it easier to use XSLT, for example, to transform into various XML based export formats, for example. Any reason not to do so?

Also... when we get the 'printer-friendly' version of a page, the subtree rooted at that page is returned, and the root page is marked up with a class="book-h1". Is there any reason why we don't compute the depth of the root page, and use that to generate the "in-place" header? In other words, if a section is "book-h3" relative to the top-level book, shouldn't it be "book-h3" whenever it is viewed?

In this case, we could apply the same principal to generating the <div> tags around each section.

Comment #6

puregin CreditAttribution: puregin commented 5 May 2005 at 22:58

Assigned:	al	» puregin
Priority:	Normal	» Critical

File	Size
book_printer_friendly_div.patch	1.57 KB

I am attaching a patch which fixes improper printer-friendly output, as discussed.

The patch fixes book_print and book_print_recurse() to insert a <div> start tag before emitting the header, and to close this

after subsections have been recursively generated.

The <div> tag has an id attribute of the form id="sect-123", where the node's nid is 123, and a class attribute of form class="sectn" where the depth of the section is n.

I'm marking this issue as critical, since the present HTML output plays havoc with any attempts at sane PDF generation.

I will address the issue of where printer-friendly subsections should be 'rooted' in a follow-up.

Djun

Comment #7

Dries CreditAttribution: Dries commented 6 May 2005 at 08:53

Let's not abbreviate 'section' to 'sect'. Also, we use '-' to separate words in CSS names.

- If the section ID uses the node ID, maybe we should use 'node-' instead of 'sect-'?

- If the class uses depth, maybe we should use 'depth-' instead of 'sect'?

The same is true for your other patch where you suggest to use 'n'. Better to make that 'node-'. (Feel free to merge both patches.)

Looks like the book.module generates a _lot_ of CSS IDs/classes. If the introduction of these new IDs/classes allows us to remove some of the existing IDs/classes, please do.

Comment #8

Steven CreditAttribution: Steven commented 7 May 2005 at 04:26

Here's my idea.. why not use regular h tags for the book structure, and renumber the header tags in the content, starting one level after the current book depth?

Here's a function which does this:

function fix_up_headers($text, $level = 1) {
  // Find all header tags that are used (if any)
  if (preg_match_all('/<h[0-9]+/i', $text, $tags) == 0) {
    return $text;
  }
  // Discard duplicates and sort them by number
  $tags = array_unique(array_map('strtolower', $tags[0]));
  natsort($tags);
  
  // Renumber them and replace them
  $i = $level;
  foreach ($tags as $tag) {
    $from[] = '@'. preg_quote($tag) .'(?![0-9])@i';
    $from[] = '@'. preg_quote(str_replace('<', '</', $tag)) .'(?![0-9])@i';
    $to[] = '<h'. $i;
    $to[] = '</h'. $i;
    $i++;
  }
  $text = preg_replace($from, $to, $text);
  
  // Change level 7 headers and higher to divs.
  return preg_replace(array('!<h([7-9]|[1-9][0-9]+)!i', '!</h([7-9]|[1-9][0-9]+)!i'), array('<div class="header-\1"', '</div'), $text);
}

That way you can use header tags safely in the content (which is useful in the context of a single page) without messing up the book output.

Comment #9

puregin CreditAttribution: puregin commented 7 May 2005 at 07:50

File	Size
02_book_printer_friendly_div.patch	1.78 KB

The attached patch changes the div id to the form 'node-123'. I was concerned that a document fragment might be created with both a book section and a non-book section from the same node, leading to non-unique id's, but at this point it seems like an unlikely scenario.

I've changed 'class=sect1' to 'class=section-1'. I was trying to be lazy and use the DocBook section name convention to make XSLT conversion a little easier, but it's really not saving any effort, in the end, and this is clearer.

I think 'section' rather than 'depth' is appropriate - it describes what the role of the div element is.

I've closed the other issue regarding id's in header elements, because I've replaced the whole 'class="book-hx"' business in headers with 'class="book-header"'. Since the level can now be selected by context with respect to the enclosing div, separate classes for headers are not required. That is, we can style with CSS like the following:

.section-1 h1.book-header {
    font-weight: bold;
    font-size: 2.2em;
}
.section-2 h1.book-header {
    font-size: 2em;
}
.section-1 p  {
    text-align: justified;
}
.section-3 p  {
    margin-left: 3em;
    margin-right: 3em;
}

Note that the section-2 h1 will be bold, since the section-2 div is nested within the section-1 div. Also, all paragraphs regardless of section will be justified; and section-3 paragraphs will be indented left and right.

Because we can style everything by section, not just headers, and because the styling specifications respect the sectional hierarchy, it's possible to create very elegant and sophisticated book pages with very little CSS. Even I can do it! :) I'm definitely not a designer, but I've played around a bit with this - have a look at http://www.puregin.org/book/print/129 for an example.

I will attach patches for misc/print.css and misc/drupal.css separately. I don't understand why, but the present version (cvs) of these files has all of the h1.book-hx selectors in drupal.css - the printer-friendly page actually looks at print.css. Am I missing something?

For the record, this approach seems to have been suggested by clairem in http://drupal.org/node/8049 where I have a little rant about headings in book pages. Twice :)

Djun

Comment #10

puregin CreditAttribution: puregin commented 7 May 2005 at 08:10

Steven - that's a neat idea, and very clever, clean code! But it still doesn't catch problems such as, for example, an h5 inside of an h1 section with no intervening h2..h4 headers, or?

I'm inclined to be a bit suspicious of the approach, though - I rather think it's 'doing magic behind the user's back'. If people are going to insist on putting headings inside of pages, it should be possible. And it would be odd for users to discover that what they typed isn't what they get.

Comment #11

puregin CreditAttribution: puregin commented 7 May 2005 at 08:32

File	Size
print_css_div.patch	1.36 KB

Here's the patch for print.css.

There's a couple of commented-out lines - these show how to make section headings 'run in' as part of the following paragraph (or other inline element). I am assuming that print.css is supposed to be something of an example that people would use to experiment with? Please feel free to improve this in any way - as I said, I'm far from being a graphic designer, and I'm also not really sure what the intention of this file is.

Comment #12

Steven CreditAttribution: Steven commented 7 May 2005 at 18:40

Puregin: nope, the code correctly handles missing h numbers. If the book page only uses h2, then that will be the first new level. If it uses h2 and h4, and it is inside a 3rd level book page for example (whose title would be h3), they they get renumbered to respectively h4 and h5 (no missing number).

From a semantic point of view it is crazy not to use h# for book titles.

Comment #13

Steven CreditAttribution: Steven commented 7 May 2005 at 18:44

A bit more verbose: I think the overwhelming majority of users browse a book on the site itself. Within a single page, there is a h1 tag for the page title. So it makes sense for a writer to use h2, h3, etc inside a single page. This is semantically valid HTML.

On the other hand, the print output places the content in a whole new context. I would expect the book module to take whatever steps necessary 'behind the user's back' to ensure it displays correctly.

Comment #14

puregin CreditAttribution: puregin commented 7 May 2005 at 20:43

Sorry, I replied to the last post on the list - I'll include the post here for the record, and expand a little.

Steven, I think that the issue you're addressing is actually a different one than I am trying to fix in my patch, which addresses how sections are wrapped.

My patch would still allow users to write headers in their book pages - it would then be up to post-processing - or something like your approach - to deal with how embedded headers are dealt with.

Perhaps we should take up your suggestion in another of the book.module issues dealing with the sequences of headers? How about http://drupal.org/node/8049, or http://drupal.org/node/4118?

With respect to your suggestion - I understand now that you're proposing to apply the re-writing only to the printer-friendly output. I'm conflicted about this idea - on the one hand, the output which it produces is clearly an improvement over the present output. On the other hand, it changes what the user types to make it 'right' - I guess this is in the long tradition of HTML being interpreted very forgivingly by web-browsers and other applications. On the balance, I support your idea - I guess it's in the spirit of 'make things easier for the user'.

With respect to the actual code - I don't think it will handle the case where h2, h3, and h4 are all used in the text, but the sequence h2, h4 occurs. To put it another way, the problem of checking whether headers are in a legal sequence is context sensitive, while a regular-expression search and replace (even if the replacement sequence remaps the targets) is context-free. When I say legal sequence, I mean legal in terms of book hierarchy, not in terms of the XHTML definition, which doesn't address hierarchy, because headers are titles, not sectional elements. SGML/XML publishing DTDs define such things via inclusion/exclusion on true sectional elements)

The case I described is, admittedly, a relatively infrequent occurrence, and the general solution would be way to difficult to deal with - it would involve constructing and manipulating a DOM, or something along those lines. Ugh!

Djun

Comment #15

cel4145 CreditAttribution: cel4145 commented 18 May 2005 at 17:24

The Drupal doc team is very interested in being able to import/export book content. I'm sure that others would be interested in it, too. Djun has indicated that this patch is a good step toward having that functionality

So +1 if the developers can help him to get this patch so that it can be implemented.

Comment #16

puregin CreditAttribution: puregin commented 1 June 2005 at 10:37

This patch is subsumed in my patch for issue http://drupal.org/project/comments/add/1482, which provides for XML export of books.

Note that the patch for print.css in this issue still applies with the new patch.

Comment #17

puregin CreditAttribution: puregin commented 6 June 2005 at 09:30

This issue has been fixed with the application of patch from http://drupal.org/node/1482.

However, I'm not sure if the print.css file patch has been commited.

Dries?

Also, as I mentioned, someone with real page/graphic designer's expertise should probably look over this and tweak/replace with better layout.

Comment #18

Dries CreditAttribution: Dries commented 6 June 2005 at 20:30

File	Size
opml.png	39.94 KB

I think this version generates invalid OPML and invalid Docbook code. The Docbook does not seems to validate as XML because sometimes a book-tag is closed with a section-tag. In OPML's case, the section-classes look weird (see screenshot). I'm guessing something is wrong with the computation of the depth.

Are the class-sections part of the OPML standard? I'm having a hard time guessing what they are about or why one wants to use these.

A couple of code comments:

1. Don't use tabs. Even in absence of tabs, the spacing isn't always 100% consistent -- sometimes two spaces are used.

2. The use of " (double quotes) and ' (single quotes) is sometimes confusing. Only use single quotes when it shortens the code. Often your code can be shortened by using double quotes (and escaping double quotes in the embedded string). Example: $output .= '<div id="node-'.$node->nid. '" class="section-'.$depth.'">'."\n";

3. Write $depth == 2, not 2 == $depth.

Comment #19

puregin CreditAttribution: puregin commented 7 June 2005 at 00:07

File	Size
xml-export-04.patch	23.89 KB

I am now supplying the correct number of parameters to the '_post_node' visitor definitions, so the depth is now passed correctly. This fixes the problems with mismatched tags and strange section depths.

The DocBook output for top-level nodes validates against Docbook XML V4.1.2 (using onsgmls). Level 2 and deeper nodes generate Chapters and sections as document fragments and so won't validate per-se. I've checked that they are well-formed, and that they validate on a manual embedding.

OPML now validates against the 1.0 DTD (though there seem to be some problems with the DTD itself)

I now encode the "id" attribute as string in type attribute of outline for OPML, in order to store this metadata in an existing attribute. The use case I have in mind is someone exporting the book structure (hierarchy) via OPML, editing it and re-importing to modify the structure of the book. I have removed the "class" metadata.

Checked for tabs/whitespace - I hope this is now OK.

Now using ($depth == 2) rather than (2 == $depth).

I've replace single quotes with double where code is simplified by this.