In the core openid Drupal module (modules/openid/openid.inc), the functions _openid_link_href and _openid_meta_httpequiv perform simple parsing of HTML in order to locate openid provider information; these calls contain regular expressions which depend on the order of the attributes for link and meta tags, respectively.

This breaks, for one, compatibility with Community-ID (http://source.keyboard-monkeys.org/projects/show/communityid), which outputs it's link tags with the "rel" attribute _following_ the "href" attribute - which is valid HTML, but gets missed by the existing parser code.

Example:

Works (Drupal finds the provider information):

<link rel="openid2.provider" href="https://open.login.yahooapis.com/openid/op/auth" />

Doesn't work (Drupal does not find the provider information and returns an error):

<link href="https://open.login.yahooapis.com/openid/op/auth" rel="openid2.provider" />

To reproduce, either install Community-ID and try to use an OpenID account with it to sign into Drupal; Or copy the contents of any OpenID URL page; find the link tag which has rel="openid2.provider" and reverse the order of the "rel" and "href" attributes as above. Doing this produces a "Sorry, that is not a valid OpenID..." message in response to the login attempt.

The fix is fairly simple: I rewrote these two functions to be more robust by not depending on the sequence of the attributes and being more lenient about white space in a couple places.

These patches take into account this issue - http://drupal.org/node/206476 - and afaict will not regress that. I also tested these changes against my own Community-ID RC3 install, Flickr.com and myid.net which all worked without issue (previously only Community-ID was broken - just showing that I did do some checking to try to ensure I didn't break anything else).

Applies to Drupal 6.12 and Drupal 7.x-dev as of 2009-05-21 (applicable section of code is the same, except for minor source formatting differences - which is why two different patches).

Please include in main drupal/openid 7.x tree - or let me know if anything about the above is not correct or confusing, etc.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

j.somers’s picture

Status: Active » Needs review

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Would it be better to parse the page using DOMDocument::loadHTML() and then find the elements by traversing the DOM tree? Parsing HTML using regular expressions is generally not as easy as it may seem.

jslag’s picture

is the 6.x version failing because the patch is marked as applying to 7? Just tried it on the current DRUPAL-6 & it applies properly.

jslag’s picture

Status: Needs work » Needs review

Just tried applying vs. HEAD, and it goes smoothly. All tests are passing for me, with 0 exceptions. Perhaps something elsewhere in the codebase was causing the exceptions when this was tested in June?

Status: Needs review » Needs work
Issue tags: -link, -content, -html, -rel, -parsing, -parse

The last submitted patch failed testing.

Anonymous’s picture

Status: Needs work » Needs review
Issue tags: +link, +content, +html, +rel, +parsing, +parse

openid-link-meta-parsing-6.x.patch queued for re-testing.

c960657’s picture

FileSize
5.73 KB

This patch uses DOMDocument::loadHTML() for parsing the HTML. Parsing HTML using regular expressions is generally a bad idea.

c960657’s picture

FileSize
5.83 KB

Reroll.

basvredeling’s picture

The problem with the first drupal 6.x patch is that the source file name is incorrect. Applying the hereby attached patch should work correctly and fix problems with the community id openid link provided.

Status: Needs review » Needs work

The last submitted patch, openid-link-meta-parsing-6.x.patch, failed testing.

basvredeling’s picture

The patch I just uploaded shouldn't be tested against the latest revision from CVS. So I'm not surprised it doesn't pass. It's a drupal 6 patch, HEAD is drupal 7. I created a new issue for the 6.x branch: http://drupal.org/node/865098

c960657’s picture

Status: Needs work » Needs review

For now the D7 patch in #9 is the one up for review.

c960657’s picture

#9: openid-parse-html-2.patch queued for re-testing.

nocturn’s picture

No progress after 2010? D7 and D6 without this patch still fail on my CommunityID provider

c960657’s picture

Version: 7.x-dev » 8.x-dev
FileSize
5.33 KB

Rolled new patch with git. Applies to D7 and D8.

c960657’s picture

FileSize
3.7 KB
sun’s picture

Status: Needs review » Needs work
+++ b/core/modules/openid/openid.inc
@@ -371,24 +371,32 @@ function _openid_nonce() {
+    foreach ($html_element->head->link as $link) {
...
+    foreach ($html_element->head->meta as $meta) {

If the DOM unexpectedly does not contain a HEAD or LINK tags, this will trigger notices. Let's add a !empty() condition here before attempting to iterate.

+++ b/core/modules/openid/openid.inc
@@ -371,24 +371,32 @@ function _openid_nonce() {
+      if (preg_match('/(\s|^)' . $rel . '(\s|$)/i', $link['rel'])) {
+        return trim($link['href']);

Missing preg_quote() to $rel. Note that the second argument to it should always be specified ('/' in this case).

For the actual regex, I believe you rather want:

@^(?:\s*)?([rel])(?:\s*)?$@/i

Lastly, I'd rather return the more explicit $matches[1].

9 days to next Drupal core point release.

c960657’s picture

Status: Needs work » Needs review
FileSize
3.8 KB
3.86 KB

Let's add a !empty() condition here before attempting to iterate.

Good point. I added an isset() (it seems that empty() checks for text content inside the element).

For the actual regex, I believe you rather want:
@^(?:\s*)?([rel])(?:\s*)?$@/i

No (if I understand you correctly). The rel attribute may contain a space-separated list of link types (see the HTML5 spec), so if rel is e.g. "foo", it should match <link rel="foo bar">. That is not supported by your pattern, right? But I did add the ?:.

Lastly, I'd rather return the more explicit $matches[1].

I'm not sure what you mean? The regexp operates on the rel attribute, but we return the href attribute. The only $matches[1] that appear in the patch is the old code that is removed.

sun’s picture

Title: patch for openid parsing of link rel=... and meta http-equiv=... to improve robustness » Improve robustness of OpenID parsing for link 'rel' and meta 'http-equiv' attributes
Category: bug » task
Status: Needs review » Reviewed & tested by the community
Issue tags: -link, -content, -html, -rel, -parsing, -parse +External Identities Initiative

Thanks for clarifying and adjusting!

Looks ready to fly for me.

catch’s picture

Version: 8.x-dev » 7.x-dev
Category: task » bug
Status: Reviewed & tested by the community » Needs review
Issue tags: -External Identities Initiative +Needs backport to D7

Committed/pushed to 8.x.

Moving to CNR against Drupal 7 since there's already a patch. Reports above suggest this is a bug rather than a task.

mathieu’s picture

I wonder if this will also fix the issue where `rel="OLDopenid.server"' (where there are other characters before the "openid.server" string) is positively matched by the _openid_link_href() function on D7...? It seems like it will, but I haven't seen this issue reported anywhere else.

c960657’s picture

@mathieu: I believe it will. Feel free to test and review the patch, and then mark it “Reviewed and tested by the community” if everything looks fine.

  • catch committed 1fe62f2 on 8.3.x
    Issue #469590 by c960657, basvredeling, bradpeabody: Improve robustness...

  • catch committed 1fe62f2 on 8.3.x
    Issue #469590 by c960657, basvredeling, bradpeabody: Improve robustness...

  • catch committed 1fe62f2 on 8.4.x
    Issue #469590 by c960657, basvredeling, bradpeabody: Improve robustness...

  • catch committed 1fe62f2 on 8.4.x
    Issue #469590 by c960657, basvredeling, bradpeabody: Improve robustness...

  • catch committed 1fe62f2 on 9.1.x
    Issue #469590 by c960657, basvredeling, bradpeabody: Improve robustness...