This issue shall be the hub for the issues that have kept popping up over the last couple of days. The feeds are perfectly valid, and Aggregator seems not to choke but merely throws a warning (although this might just be the module lying to me). Currently "infected":
Marzee Labs
Gizra
Darren Mothersele
Lin Clark
Stéphane Corlosquet

These feeds will remain unsuspended for now, unless they actually make Aggregator choke, but I really would like to know the cause here.

Update: Scor's feed mysteriously self-repaired, so only the original four remain on the list. Looking through the error logs, it appears that no other feeds have shown this behavior in recent weeks. Sadly, the problem seems to be more than just annoying, as Gizra's latest posts aren't showing up on the Planet. The "infected" feeds validate fine, btw. #3 contains a debugging idea.

Comments

amitaibu’s picture

At least you have Lin Clark in the "infected" list -- probably one of the few people who really understand what the RSS schema is ;)

dddave’s picture

Title: Meta: Feeds failing due to -1002 missing schema » Meta: Feeds "looking" broken due to -1002 missing schema

Better title and here comes the error message: The feed from XYXYXY.com seems to be broken, due to "-1002 missing schema".

linclark’s picture

I'm pretty sure that this error isn't actually an RSS schema issue, but instead is a poorly worded error message about a URI that is missing its scheme (such as http://).

From drupal_http_request in common.inc

  if (!isset($uri['scheme'])) {
    $result->error = 'missing schema';
    $result->code = -1002;
    return $result;
  }

Could you add a line there to log the URIs which result in this error?
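
A minimal sketch of the kind of check-and-log step suggested here, written in Python so it can run on its own rather than inside Drupal. The function name `check_feed_url` and the logger name are hypothetical; only the "missing schema" message and the -1002 code come from the snippet above.

```python
import logging
from urllib.parse import urlsplit

log = logging.getLogger("feed_debug")  # hypothetical logger name


def check_feed_url(url):
    """Mirror the scheme check quoted above and log any URI that fails it.

    Assumption: this only inspects the string; it performs no HTTP request.
    Returns (code, error), using -1002 / "missing schema" for a failure.
    """
    parts = urlsplit(url)
    if not parts.scheme:
        # Record the exact value so the offending URI shows up in the logs.
        log.warning("missing scheme in requested URI: %r", url)
        return -1002, "missing schema"
    return 0, None
```

Logging the raw value with `%r` makes empty strings and bare paths easy to tell apart when reading the log afterwards.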

dddave’s picture

I'll suggest that debugging advice to tvn. Might take a while.
Added a new feed btw. I wonder what triggered this as I cannot recall seeing this issue before.

linclark’s picture

It looks like Marzee's, Darren's, and my feeds all had XML elements with href attributes. I checked Wunderkraut's and Wim Leers' feeds, and neither of them had XML elements with href attributes as far as I could see.

It might be that Aggregator doesn't know how to handle XML elements with href attributes and thus passes an empty value into drupal_http_request?

I've changed my feed so that it no longer uses an href on an element. If there is no error for my blog on the next pass, then that's likely the issue.

linclark’s picture

I see that you added scor's blog. I'm pretty sure that he uses Drupal's (or Views') default RSS output, so I don't know what could be causing it.

dddave’s picture

re #6: The error message persists. ;(

dddave’s picture

It seems this does create some real trouble as Gizra's latest content isn't showing up. Going to talk to tvn today.

dddave’s picture

Project: Drupal.org content » Drupal.org infrastructure
Component: Planet Drupal » Other
dddave’s picture

Issue summary: View changes

Updated and clarified the summary.

darrenmothersele’s picture

Perhaps this is something to do with the HTTP headers? Perhaps the content type?

For example, these are the HTTP headers from the Get Pantheon blog (picked randomly from the working feeds on Drupal Planet)...

HTTP/1.1 200 OK
Server: nginx
Date: Fri, 17 Jan 2014 07:36:12 GMT
Content-Type: application/rss+xml; charset=utf-8
Content-Length: 80022
Connection: keep-alive
X-Pantheon-Styx-Hostname: styx1560bba9.chios.panth.io
X-Drupal-Cache: HIT
Etag: "1389927433-0"
Content-Language: en
Cache-Control: public, max-age=600
Last-Modified: Fri, 17 Jan 2014 02:57:13 +0000
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Vary: Cookie
x-pantheon-endpoint: 3565b3da-77f9-485c-ad17-c495c4c7d2dd
Accept-Ranges: bytes
X-Varnish: 3941556195
Age: 0
Via: 1.1 varnish

But these are the HTTP headers from my feed:

HTTP/1.1 304 Not Modified
Server: GitHub.com
Date: Fri, 17 Jan 2014 07:32:40 GMT
Connection: keep-alive
Last-Modified: Thu, 19 Dec 2013 21:50:40 GMT
Expires: Fri, 17 Jan 2014 07:42:40 GMT
Cache-Control: max-age=600
Vary: Accept-Encoding, Accept-Encoding

And these are the HTTP headers from Gizra's feed:

HTTP/1.1 304 Not Modified
Server: GitHub.com
Date: Fri, 17 Jan 2014 07:33:58 GMT
Connection: keep-alive
Last-Modified: Thu, 16 Jan 2014 14:15:43 GMT
Expires: Fri, 17 Jan 2014 07:43:58 GMT
Cache-Control: max-age=600
Vary: Accept-Encoding, Accept-Encoding

As you can see, the failing feeds (Gizra's and mine) are both hosted on GitHub Pages, which doesn't send the expected content type of 'application/rss+xml'.
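
To compare header dumps like the ones above without reading them by eye, here is a small Python sketch. The helper name `content_type_of` is hypothetical, and the sample strings are shortened from the dumps in this comment. Note that both GitHub responses shown are 304s, which carry no body, so a missing Content-Type there may not be conclusive on its own.

```python
def content_type_of(raw_headers):
    """Return the media type from a raw header block, or None if absent."""
    for line in raw_headers.splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip().lower()
    return None


# Shortened copies of the header dumps quoted above.
working = (
    "HTTP/1.1 200 OK\n"
    "Server: nginx\n"
    "Content-Type: application/rss+xml; charset=utf-8\n"
)
not_modified = (
    "HTTP/1.1 304 Not Modified\n"
    "Server: GitHub.com\n"
    "Cache-Control: max-age=600\n"
)

print(content_type_of(working))
print(content_type_of(not_modified))
```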

darrenmothersele’s picture

I just checked Marzee Labs and Lin Clark's feed and they're both hosted on GitHub pages too.

linclark’s picture

It doesn't seem to get in the way of processing. I posted something yesterday and it is showing up on the Planet without any problems.

dddave’s picture

mmh, Gizra's new Planet content is not showing up though. But at least we have narrowed it down a bit.

amitaibu’s picture

@linclark,
Are you using Jekyll as well? If so, can you share the format of your RSS file? Mine is this

linclark’s picture

I'm using Middleman with the Builder gem. Here's a version of the code I'm using. I technically don't need to make the link absolute in the item because I have the base namespace set, but I do it anyway.

xml.instruct!
xml.rss :version => "2.0", :"xml:base" => "http://example.com", :"xmlns:dc" => "http://purl.org/dc/elements/1.1/" do
  xml.channel do
    xml.title "Your Name"
    xml.link "href" => "http://example.com"
    xml.language "en"
    xml.updated blog.articles.first.date.to_time.iso8601
    xml.author { xml.name "Your Name" }

    blog.articles[0..10].each do |article|
      if article.tags.include? "drupal-planet"
        xml.item do
          xml.title article.title
          xml.link "http://example.com" + article.url
          xml.description article.summary
          xml.pubDate (article.date).to_time.rfc822()
          xml.guid "http://example.com" + article.url
          xml.dc :creator, "Your Name"
        end
      end
    end
  end
end
amitaibu’s picture

I was able to reproduce the error locally.
In Aggregator, I changed the feed URL from

http://www.gizra.com/taxonomy/term/1/all/feed/ => http://www.gizra.com/taxonomy/term/1/all/feed/index.html (i.e. added /index.html) and the error was gone.

@dddave can you please try making this change on d.o as well?

amitaibu’s picture

bump. I have a blog post in the pipe -- would love it to reach Drupal planet ;)

dddave’s picture

First off: sorry I missed this in the first place. But I am sorry to report that this change did not solve the issue.

amitaibu’s picture

> First off: sorry I missed this in the first place

No problem, you are probably swamped with issues :)

> But I am sorry to report that this change did not solve the issue

Hmm, I hoped it would "just" work. So it seems it's not just the Aggregator module in the way. Is there a dev server I can get admin access to so I can try to debug it there?

dddave’s picture

Best catch tvn on #drupalorg for such requests.

tvn’s picture

Hi Amitai, you can use the following dev site:
http://links-drupal.redesign.devdrupal.org/

(I created it for this issue https://drupal.org/node/2125757, but no one used it yet and the 2 issues should not interfere with each other anyway).

Here are some instructions on how to work on our dev server: https://drupal.org/node/1018084

I added both of your SSH keys already.

amitaibu’s picture

@tvn, thank you for the dev site.

Some insights:
I've changed the RSS link to an Atom link, now served from http://www.gizra.com/atom-drupal.xml. Clicking "update items" gives me an error, but clicking again works fine. So it seems that the URL is valid, but sometimes, for some reason, it chokes.

I've done the same test on Lin's link and got the same behavior -- sometimes it errors, sometimes it works.

amitaibu’s picture

@dddave
Can you please try the change to http://www.gizra.com/atom-drupal.xml (with the known issues as mentioned in #25)

dddave’s picture

I am sad to report that I still get this error consistently even after trying multiple times.

amitaibu’s picture

@dddave,
When you try to re-import Lin's feeds -- does it work ok?

amitaibu’s picture

Debugging more locally, I think I've spotted the problem:

Occasionally we get a 302 response, and the new location is extracted with $location = $result->headers['location'];

However, the location returned by GitHub is the relative path /drupal-atom.xml.
So on the next call to drupal_http_request() the URI isn't a complete URL -- it has a path but no scheme or host.
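
As an illustration of the failure mode and one possible remedy (a Python sketch, not Drupal's actual code): a Location value such as /drupal-atom.xml has no scheme of its own, but it can be resolved against the URL that was originally requested. The helper name `resolve_redirect` is hypothetical, and the request URL is the one from the test earlier in this thread.

```python
from urllib.parse import urljoin, urlsplit


def resolve_redirect(request_url, location):
    """Resolve a Location header value against the URL that was requested."""
    return urljoin(request_url, location)


request_url = "http://www.gizra.com/atom-drupal.xml"
location = "/drupal-atom.xml"  # relative value, as returned in the 302

# Used as-is, the value has no scheme, which is what trips the -1002 check.
print(repr(urlsplit(location).scheme))

# Resolved against the original request, it becomes a complete URL again.
print(resolve_redirect(request_url, location))
```

An absolute Location value passes through `urljoin` unchanged, so the same call would cover both kinds of redirect.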

amitaibu’s picture

And here's a blog post about the 302 response from Github

dddave’s picture

Lin's feed was still throwing the error yet had content from January fetched. I emptied and refetched which threw the error and didn't catch the content. #meh

I am on vacation until next week so I won't be able to help out here for a while.

amitaibu’s picture

I have followed https://help.github.com/articles/setting-up-a-custom-domain-with-pages so now my GitHub page is using the CDN.

This means the issue should be solved, as Drupal shouldn't be getting a 302. I have tried it on the dev site, and got no error.

@tvn can you please re-add Gizra to the Drupal planet?

dddave’s picture

You are on, were the whole time. The issue is that you didn't get aggregated correctly. Which is the feed url we should use btw. Currently it uses the one provided in #26. It also seems your images are broken upon aggregation: https://drupal.org/aggregator/sources/552

edit: The error is indeed gone. Just hard refreshed your feed.

amitaibu’s picture

Hi @dddave , how was the vacation? :)

> Which is the feed url we should use btw.

Can you change it back to http://www.gizra.com/taxonomy/term/1/all/feed/ please

dddave’s picture

Status: Active » Fixed

I had a very good time.

The feed is working fine now; only the images are broken, but #2030877: Review Planet posts for relative URLs could be the cause. Let's discuss any other issues directly in the issue regarding your feed. I'll link to this issue on the Planet docs, btw.

amitaibu’s picture

Thanks! I've pushed a fix for the images.

darrenmothersele’s picture

I updated my feed a) so it doesn't redirect, and b) to add the .xml extension so it serves with the correct content type from GitHub pages.

Yesterday it was serving without a redirect, but today I noticed this:

curl -I http://darrenmothersele.com/drupal-planet.xml

HTTP/1.1 302 Found
Connection: close
Pragma: no-cache
cache-control: no-cache
Location: /drupal-planet.xml

Then I tried, for comparison:

curl -I http://www.gizra.com/taxonomy/term/1/all/feed/

HTTP/1.1 200 OK
Server: GitHub.com
Content-Type: text/html; charset=utf-8
Last-Modified: Mon, 17 Feb 2014 19:16:42 GMT
Expires: Thu, 20 Feb 2014 10:28:45 GMT
Cache-Control: max-age=600
Content-Length: 89246
Accept-Ranges: bytes
Date: Thu, 20 Feb 2014 10:18:45 GMT
Via: 1.1 varnish
Age: 0
Connection: keep-alive
X-Served-By: cache-am71-AMS
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1392891525.039837599,VS0,VE493
Vary: Accept-Encoding

Then I tried again,

curl -I http://darrenmothersele.com/drupal-planet.xml

HTTP/1.1 200 OK
Server: GitHub.com
Date: Thu, 20 Feb 2014 10:19:22 GMT
Content-Type: text/xml; charset=utf-8
Connection: keep-alive
Content-Length: 118871
Last-Modified: Thu, 20 Feb 2014 09:33:26 GMT
Expires: Thu, 20 Feb 2014 10:29:22 GMT
Cache-Control: max-age=600
Vary: Accept-Encoding
Accept-Ranges: bytes
Vary: Accept-Encoding

Second time running it I always get a 200 direct reply. But sometimes I get 302?

Does fixing the DNS to use ALIAS always reply with a 200? If so, then I'll have to move to a different DNS provider, as it's not supported where I am now.

amitaibu’s picture

>Does fixing the DNS to use ALIAS always reply with a 200?

On Github we actually removed the ALIAS and use CNAME instead - and it fixed the problem.

darrenmothersele’s picture

I decided to move off GitHub Pages. As a side effect, hopefully that fixes this issue.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

kostajh’s picture

Quick note for anyone else who has a blog on GitHub pages that they want on Drupal Planet and is running into the "missing schema" error. We resolved the issue by proxying the RSS feed on GitHub pages through FeedBurner. Not ideal but it works. See https://www.drupal.org/node/2553551#comment-10252437 for more info.