In the core openid Drupal module (modules/openid/openid.inc), the functions _openid_link_href and _openid_meta_httpequiv perform simple parsing of HTML in order to locate openid provider information; these calls contain regular expressions which depend on the order of the attributes for link and meta tags, respectively.
This breaks, for one, compatibility with Community-ID (http://source.keyboard-monkeys.org/projects/show/communityid), which outputs it's link tags with the "rel" attribute _following_ the "href" attribute - which is valid HTML, but gets missed by the existing parser code.
Example:
Works (Drupal finds the provider information):
<link rel="openid2.provider" href="https://open.login.yahooapis.com/openid/op/auth" />
Doesn't work (Drupal does not find the provider information and returns an error):
<link href="https://open.login.yahooapis.com/openid/op/auth" rel="openid2.provider" />
To reproduce, either install Community-ID and try to use an OpenID account with it to sign into Drupal; Or copy the contents of any OpenID URL page; find the link tag which has rel="openid2.provider" and reverse the order of the "rel" and "href" attributes as above. Doing this produces a "Sorry, that is not a valid OpenID..." message in response to the login attempt.
The fix is fairly simple: I rewrote these two functions to be more robust by not depending on the sequence of the attributes and being more lenient about white space in a couple places.
These patches take into account this issue - http://drupal.org/node/206476 - and afaict will not regress that. I also tested these changes against my own Community-ID RC3 install, Flickr.com and myid.net which all worked without issue (previously only Community-ID was broken - just showing that I did do some checking to try to ensure I didn't break anything else).
Applies to Drupal 6.12 and Drupal 7.x-dev as of 2009-05-21 (applicable section of code is the same, except for minor source formatting differences - which is why two different patches).
Please include in main drupal/openid 7.x tree - or let me know if anything about the above is not correct or confusing, etc.
Comment | File | Size | Author |
---|---|---|---|
#22 | openid-parse-html-7-D7.patch | 3.8 KB | c960657 |
#19 | openid-parse-html-6.patch | 3.86 KB | c960657 |
#19 | openid-parse-html-6-D7.patch | 3.8 KB | c960657 |
#17 | openid-parse-html-5.patch | 3.7 KB | c960657 |
#16 | openid-parse-html-4.patch | 5.33 KB | c960657 |
Comments
Comment #1
j.somers CreditAttribution: j.somers commentedComment #3
c960657 CreditAttribution: c960657 commentedWould it be better to parse the page using DOMDocument::loadHTML() and then find the elements by traversing the DOM tree? Parsing HTML using regular expressions is generally not as easy as it may seem.
Comment #4
jslag CreditAttribution: jslag commentedis the 6.x version failing because the patch is marked as applying to 7? Just tried it on the current DRUPAL-6 & it applies properly.
Comment #5
jslag CreditAttribution: jslag commentedJust tried applying vs. HEAD, and it goes smoothly. All tests are passing for me, with 0 exceptions. Perhaps something elsewhere in the codebase was causing the exceptions when this was tested in June?
Comment #7
Anonymous (not verified) CreditAttribution: Anonymous commentedopenid-link-meta-parsing-6.x.patch queued for re-testing.
Comment #8
c960657 CreditAttribution: c960657 commentedThis patch uses DOMDocument::loadHTML() for parsing the HTML. Parsing HTML using regular expressions is generally a bad idea.
Comment #9
c960657 CreditAttribution: c960657 commentedReroll.
Comment #10
basvredelingThe problem with the first drupal 6.x patch is that the source file name is incorrect. Applying the hereby attached patch should work correctly and fix problems with the community id openid link provided.
Comment #12
basvredelingThe patch I just uploaded shouldn't be tested against the latest revision from CVS. So I'm not surprised it doesn't pass. It's a drupal 6 patch, HEAD is drupal 7. I created a new issue for the 6.x branch: http://drupal.org/node/865098
Comment #13
c960657 CreditAttribution: c960657 commentedFor now the D7 patch in #9 is the one up for review.
Comment #14
c960657 CreditAttribution: c960657 commented#9: openid-parse-html-2.patch queued for re-testing.
Comment #15
nocturn CreditAttribution: nocturn commentedNo progress after 2010? D7 and D6 without this patch still fail on my CommunityID provider
Comment #16
c960657 CreditAttribution: c960657 commentedRolled new patch with git. Applies to D7 and D8.
Comment #17
c960657 CreditAttribution: c960657 commentedComment #18
sunIf the DOM unexpectedly does not contain a HEAD or LINK tags, this will trigger notices. Let's add a !empty() condition here before attempting to iterate.
Missing preg_quote() to $rel. Note that the second argument to it should always be specified ('/' in this case).
For the actual regex, I believe you rather want:
@^(?:\s*)?([rel])(?:\s*)?$@/i
Lastly, I'd rather return the more explicit $matches[1].
9 days to next Drupal core point release.
Comment #19
c960657 CreditAttribution: c960657 commentedGood point. I added an isset() (it seems that empty() checks for text content inside the element).
No (if I understand you correctly). The rel attribute may contain a space-separated list of link types (see the HTML5 spec), so if rel is e.g. "foo", it should match <link rel="foo bar">. That is not supported by your pattern, right? But I did add the
?:
.I'm not sure what you mean? The regexp operates on the rel attribute, but we return the href attribute. The only $matches[1] that appear in the patch is the old code that is removed.
Comment #20
sunThanks for clarifying and adjusting!
Looks ready to fly for me.
Comment #21
catchCommitted/pushed to 8.x.
Moving to CNR against Drupal 7 since there's already a patch. Reports above suggest this is a bug rather than a task.
Comment #22
c960657 CreditAttribution: c960657 commentedReuploaded latest D7 for the sake of the test bot.
Comment #23
mathieu CreditAttribution: mathieu commentedI wonder if this will also fix the issue where `rel="OLDopenid.server"' (where there are other characters before the "openid.server" string) is positively matched by the _openid_link_href() function on D7...? It seems like it will, but I haven't seen this issue reported anywhere else.
Comment #24
c960657 CreditAttribution: c960657 commented@mathieu: I believe it will. Feel free to test and review the patch, and then mark it “Reviewed and tested by the community” if everything looks fine.