I've got a fairly long HTML document with a long list of internal anchors. I *believe* what's happening is that after print_pdf_generate_path converts the HREF elements to absolute links (by calling print.pages.inc's _print_rewrite_urls), it tries to restore anchor links to relative links again using a preg_replace call that is somehow failing.
I'm using the wkhtmltopdf-amd64 static binary, version 0.11.0 rc1 with the following options: -d 300 -s Letter --page-size Letter --outline --enable-internal-links --footer-font-size 7 --footer-right '[page]'. If I try to generate the page by calling the binary from the command line, my internal anchors are respected.
Here's the page I'm trying to render as PDF with internal links intact: http://www.oasis-open.org/policies-guidelines/tc-process
Thanks,
Jose
Comment | File | Size | Author |
---|---|---|---|
#2 | internal-anchor-1313754-7x.patch | 1.07 KB | hackwater |
#2 | internal-anchor-1313754-6x.patch | 1.06 KB | hackwater |
Comments
Comment #1
jcnventura commented
No, what's happening is that the rewrite successfully keeps the links navigable in the print version (as you can see in the print link at the bottom of the page you indicated).
However, this is clearly not working correctly for the PDF case (using at least wkhtmltopdf). The annoying part (for me) is that everything is based on the output of the print version, so I'll have to see where that's being done and disable it in this case.
Comment #2
hackwater commented
I'm focusing pretty tightly on print_pdf.pages.inc and the print_pdf_generate_path function therein. Rather than disabling the calls to _print_rewrite_urls, I think I have a solution. It works for this particular PDF, anyway; let's see if we can validate it for a more general case.
I noted a couple of things after adding a debugging function to the code (dumping the contents of $html to a file so I could do visual diffs against two points: after we convert the anchor elements, and after we attempt to make internal anchors relative again):
After dumping the $html into a file after each preg_replace, I found that the second preg_replace was not doing anything, at least in the case of my test HTML file: the two files were binary equal. This means that there isn't a double-encoded "#" in the initial $html; changing the %2523 to a # fixes this conversion. But it still wasn't working. More interestingly, opening the HTML output in a browser and clicking on the internal links ALSO wasn't working properly. I traced this to the <base> tag the Print conversion adds to the file; at least in the case of wkhtmltopdf, which respects the base tag, getting rid of the base tag solves the internal anchor problem.
Patches rolled against 7.x-1.x and 6.x-1.x.
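To make the %2523 versus # point concrete, here is a minimal sketch in Python rather than the module's PHP. It is not the code from the patch: the base URL, the printpdf path prefix, the sample link, and the helper name make_anchors_relative are all assumptions chosen for illustration. It only shows why a pattern that expects a double-encoded "#" leaves the HTML untouched when the links contain a literal "#".

```python
import re

# Hypothetical values; the real base URL and path prefix depend on the site.
BASE_URL = "http://example.com"
PDF_PATH = "printpdf"


def make_anchors_relative(html: str, marker: str) -> str:
    """Replace 'BASE_URL/PDF_PATH/<path><marker>' with a bare '#'.

    `marker` is the text expected right before the fragment name:
    either the double-encoded '%2523' or a literal '#'.
    """
    pattern = (
        re.escape(f"{BASE_URL}/{PDF_PATH}/")
        + r'[^"#]*?'          # the page path, stopping at a quote or '#'
        + re.escape(marker)
    )
    return re.sub(pattern, "#", html)


# Assumed shape of a link after the absolute-URL rewrite.
html = '<a href="http://example.com/printpdf/node/1#section-2">Section 2</a>'

# Looking for a double-encoded '#' matches nothing, so the input is unchanged.
unchanged = make_anchors_relative(html, "%2523")

# Looking for a literal '#' turns the link back into a relative anchor.
fixed = make_anchors_relative(html, "#")

print(unchanged == html)
print(fixed)
```

If the absolute links in a given setup really did contain %2523, the first call would be the one that matches, so it is worth checking the dumped HTML before picking a marker.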
Comment #3
hackwater commented
Comment #4
jcnventura commented
The base tag is absolutely necessary because of images and other media resources.
However, changing the %2523 does indeed make it work on wkhtmltopdf, but I don't think it's possible to make it work properly in tcpdf or dompdf.
I've committed the patch to git.
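The trade-off around the base tag discussed in this thread can be sketched with standard URL resolution, again in Python and purely as an illustration. The base href is modelled on the page from the original report, and the fragment name and image path are hypothetical. A renderer that respects the base tag resolves relative references roughly the way urljoin does: a fragment-only link ends up pointing at the remote page rather than at the local document, while a relative image path needs the base to resolve at all.

```python
from urllib.parse import urljoin

# Assumed base href, modelled on the page from the original report.
base_href = "http://www.oasis-open.org/policies-guidelines/tc-process"

# A fragment-only link is resolved against the base, so it no longer
# refers to a location inside the locally rendered document.
anchor_target = urljoin(base_href, "#section-2")

# A root-relative image path (hypothetical) depends on the base to
# become a fetchable URL.
image_target = urljoin(base_href, "/sites/default/files/logo.png")

print(anchor_target)
print(image_target)
```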