Problem/Motivation
When league/html-to-markdown converts an <a> tag that carries a title attribute (common in Drupal-rendered HTML), it produces a Markdown link with an inline title:
[Link text](/en/some/path " See Link text tooltip")
When the export is run with --rewrite-links, the rewriteMarkdownLinks() method uses the regex ([^)]+) to capture the URL. Because that pattern matches everything up to the closing ), it captures the entire string including the title:
/en/some/path " See Link text tooltip"
That string is passed to pathToFilename(), which sanitises all non-word characters to underscores, producing a garbled slug:
en_some_path___See_Link_text_tooltip_
The exported file then contains a broken local link instead of the expected en_some_path.
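The failure can be reproduced in isolation with a minimal sketch. Both the full link pattern and the pathToFilenameSketch() helper below are assumptions made for illustration: only the ([^)]+) capture group and the "non-word characters become underscores" behaviour come from the description above, and the module's real code may differ.

```php
<?php
// Hypothetical stand-in for pathToFilename(); the real method may differ.
function pathToFilenameSketch(string $path): string {
    // Drop leading slashes, then replace every non-word character with "_".
    return preg_replace('/\W/', '_', ltrim($path, '/'));
}

$markdown = '[Link text](/en/some/path " See Link text tooltip")';

// Assumed shape of the link pattern: group 1 is the link text,
// group 2 is everything up to the closing ")", title included.
preg_match('/\[([^\]]*)\]\(([^)]+)\)/', $markdown, $matches);

$captured = $matches[2];
$slug = pathToFilenameSketch($captured);

echo $captured, PHP_EOL; // /en/some/path " See Link text tooltip"
echo $slug, PHP_EOL;     // en_some_path___See_Link_text_tooltip_
```

The three consecutive underscores come from the space, the opening quote, and the leading space inside the title. The trailing underscore comes from the closing quote.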
Steps to reproduce
- Have a node whose rendered HTML includes an <a> with a title attribute, e.g. <a href="/en/services/foo" title="See Foo">Foo</a>.
- Run drush content-first:export --rewrite-links.
- Open the exported .md file and observe the link target is a garbled underscore string instead of the sanitised path.
Proposed resolution
In rewriteMarkdownLinks(), strip the optional Markdown link title from the captured URL string before passing it to pathToFilename() or prepending the base URL. The title follows the URL after whitespace and is wrapped in double or single quotes.
The fix is applied to both callbacks inside the method (the --rewrite-links block and the --assets-base-url block):
$url = trim(preg_replace('/\s+(?:"[^"]*"|\'[^\']*\')\s*$/', '', $matches[2]));
Two unit tests are added to ContentFirstCommandsTest covering:
- A plain link with a double-quoted title.
- A link whose title contains HTML entities (e.g. &amp;).
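As a rough illustration of what the stripping expression does, the sketch below applies it to a few captured link targets. The stripLinkTitle() wrapper and the sample inputs are hypothetical and are not taken from the module or its tests. Only the regular expression itself is copied from the proposed fix.

```php
<?php
// Hypothetical wrapper around the expression from the proposed fix.
function stripLinkTitle(string $target): string {
    // Remove a trailing quoted title (double or single quotes) plus the
    // whitespace that separates it from the URL, then trim what is left.
    return trim(preg_replace('/\s+(?:"[^"]*"|\'[^\']*\')\s*$/', '', $target));
}

$results = [
    stripLinkTitle('/en/some/path " See Link text tooltip"'), // double-quoted title
    stripLinkTitle("/en/services/foo 'See Foo'"),              // single-quoted title
    stripLinkTitle('/en/a "Terms &amp; conditions"'),          // title with an HTML entity
    stripLinkTitle('/en/plain/path'),                          // no title: left unchanged
];

foreach ($results as $url) {
    echo $url, PHP_EOL;
}
// /en/some/path
// /en/services/foo
// /en/a
// /en/plain/path
```

Because the pattern is anchored at the end of the string and requires whitespace before the opening quote, a target without a title passes through untouched.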
Issue fork content_first-3589157