Problem/Motivation

When league/html-to-markdown converts an <a> tag that carries a title attribute (common in Drupal-rendered HTML), it produces a Markdown link with an inline title:

[Link text](/en/some/path " See Link text tooltip")

When the export is run with --rewrite-links, the rewriteMarkdownLinks() method uses the regex ([^)]+) to capture the URL. Because that pattern matches everything up to the closing ), it captures the entire string including the title:

/en/some/path " See Link text tooltip"

That string is passed to pathToFilename(), which sanitises all non-word characters to underscores, producing a garbled slug:

en_some_path___See_Link_text_tooltip_

The exported file then contains a broken local link instead of the expected en_some_path.
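
The failure can be reproduced outside Drupal with a minimal sketch. The overall link pattern and the path_to_filename_sketch() helper below are assumptions for illustration: only the ([^)]+) URL group is quoted in this issue, and the real pathToFilename() is not shown. The helper assumes leading slashes are dropped and every non-word character becomes an underscore, which reproduces the slug above.

```php
<?php
// Hypothetical stand-in for pathToFilename(); the real method is not shown
// in this issue. Assumed behaviour: drop leading slashes, then replace each
// non-word character with an underscore.
function path_to_filename_sketch(string $path): string {
  return preg_replace('/\W/', '_', ltrim($path, '/'));
}

$markdown = '[Link text](/en/some/path " See Link text tooltip")';

// Assumed full link pattern; group 2 is the ([^)]+) capture described above.
preg_match('/\[([^\]]+)\]\(([^)]+)\)/', $markdown, $matches);

// The capture runs up to the closing parenthesis, so it includes the title.
$captured = $matches[2];
$slug = path_to_filename_sketch($captured);

echo $captured, PHP_EOL; // /en/some/path " See Link text tooltip"
echo $slug, PHP_EOL;     // en_some_path___See_Link_text_tooltip_
```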

Steps to reproduce

  1. Have a node whose rendered HTML includes an <a> with a title attribute, e.g. <a href="/en/services/foo" title="See Foo">Foo</a>.
  2. Run drush content-first:export --rewrite-links.
  3. Open the exported .md file and observe the link target is a garbled underscore string instead of the sanitised path.

Proposed resolution

In rewriteMarkdownLinks(), strip the optional Markdown link title from the captured URL string before passing it to pathToFilename() or prepending the base URL. The title follows the URL after whitespace and is wrapped in double or single quotes.

The fix is applied to both callbacks inside the method (the --rewrite-links block and the --assets-base-url block):

$url = trim(preg_replace('/\s+(?:"[^"]*"|\'[^\']*\')\s*$/', '', $matches[2]));
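
As a quick check of that expression in isolation, the sketch below wraps it in a hypothetical helper named strip_link_title() (not part of the module) and runs it against a double-quoted title, a single-quoted title, and a URL with no title.

```php
<?php
// Hypothetical helper wrapping the expression from the proposed fix.
function strip_link_title(string $captured): string {
  // Remove a trailing quoted title (double or single quotes) that is
  // separated from the URL by whitespace, then trim what remains.
  return trim(preg_replace('/\s+(?:"[^"]*"|\'[^\']*\')\s*$/', '', $captured));
}

$cases = [
  '/en/some/path " See Link text tooltip"', // double-quoted title
  "/en/some/path 'See Link text tooltip'",  // single-quoted title
  '/en/some/path',                          // no title, left unchanged
];

foreach ($cases as $captured) {
  echo strip_link_title($captured), PHP_EOL; // /en/some/path each time
}
```

Because the pattern is anchored to the end of the string and requires leading whitespace, a URL without a title passes through untouched.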

Two unit tests are added to ContentFirstCommandsTest covering:

  • A plain link with a double-quoted title.
  • A link whose title contains HTML entities (e.g. &amp;).
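
The two cases can be sketched as plain PHP checks. This is not the code in ContentFirstCommandsTest, which is not shown here. Both functions are hypothetical stand-ins for rewriteMarkdownLinks() and pathToFilename(), and the sample links and expected slugs follow from the assumed sanitising rule (drop leading slashes, replace non-word characters with underscores).

```php
<?php
// Hypothetical stand-in for pathToFilename(); assumed behaviour only.
function path_to_filename_sketch(string $path): string {
  return preg_replace('/\W/', '_', ltrim($path, '/'));
}

// Hypothetical stand-in for the --rewrite-links callback, with the fix applied.
function rewrite_markdown_links_sketch(string $markdown): string {
  return preg_replace_callback(
    '/\[([^\]]+)\]\(([^)]+)\)/',
    function (array $matches): string {
      // Strip the optional quoted title before sanitising the URL.
      $url = trim(preg_replace('/\s+(?:"[^"]*"|\'[^\']*\')\s*$/', '', $matches[2]));
      return '[' . $matches[1] . '](' . path_to_filename_sketch($url) . ')';
    },
    $markdown
  );
}

// Input Markdown => expected rewritten link.
$cases = [
  '[Foo](/en/services/foo "See Foo")' => '[Foo](en_services_foo)',
  '[Bar & Baz](/en/services/bar "Bar &amp; Baz")' => '[Bar & Baz](en_services_bar)',
];

foreach ($cases as $input => $expected) {
  $actual = rewrite_markdown_links_sketch($input);
  echo ($actual === $expected ? 'ok   ' : 'FAIL ') . $actual, PHP_EOL;
}
```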

Comments

gedur created an issue.

  • gedur committed ace7bb96 on 2.x
    Issue #3589157: Fix --rewrite-links garbling URLs when anchor has a...

Status: Active » Fixed
