I don't know where the problem is - I just know there's a problem.

I am working with two Drupal 6.22 sites. One site has news articles for our entire university. Our departmental website needs to pull in news articles from the news site that apply to our own department.

Both sites have undergone a lot of changes recently. Before the changes, the news website generated an RSS feed for us using a php script. The departmental site processed the RSS feed and generated new nodes on the departmental site. This processing was done using Feed API and Feed Element Mapper.

Because Feed API is now obsolete, I am trying to get the equivalent functionality to work using the "Feeds" module. Additionally, the RSS feed is now being generated on the news site using a "view" created via the "Views" module.

Everything seems to be fine regarding the generation of the RSS feed - at least as far as I can tell. When I look at the feed, it appears to be a perfectly valid XML file with the proper headers and such. Additionally, when I set up Outlook to read this feed, Outlook recognizes it as a valid feed. I see all of the individual items in the feed within Outlook, plus the names and values for each field within each feed item. IE 9 and Firefox 3.6 are also able to display the feed items, field names, and field values.

However, the Feed Importer that I have created does not appear to be processing any of the items in the feed. There is no indication whatsoever as to what the problem is. I click on the "import" button and just get a re-display of the page. I am using the standalone form at mysite.com/import.

I have tried both the "Common Syndication Parser" and the "SimplePie" parser. The result is the same either way - no new webpages are created from the feed.

I then tried going the "fast feed" route, to see if I could create database records from the RSS feed. I was just hoping to gain troubleshooting insight by doing this - it isn't a solution. At that point, when I click on the "import" button, I get the following error message: "cURL error (47) SSL read: error:00000000:lib(0):func(0):reason(0), errno 104 for http://asunews.asu.edu/feed/engext_v3/rss.xml."

I cannot find anything on google.com or drupal.org regarding the above message. There were no hits on "error (47)." The hits on "errno 104" did not seem to be relevant.

One of the requirements for running "Feeds" is "cURL" support. When I run phpinfo() on my site, I get the following:

cURL support      enabled
cURL Information  libcurl/7.21.6 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5

Is this sufficient cURL support?

Also, I am not 100% sure how I should be naming the items in the generated RSS feed. I didn't see anything in the documentation about this.

So I was wondering - what names should I be using for the following items in the RSS feed?
1) Author Name
2) Published Date
3) Item URL (link)
4) Item GUID

Does anyone have any ideas as to what could be going wrong, or where I could be going wrong, besides within the above questions?

Thank you in advance for your help, and for reading all of this.

Comments

SteveMM

Just wanted to follow up, as I found answers to many of the above questions with help from others.

First, I began using "Feeds XML XPath" instead of "Simple Pie" and "Common Syndication Parser" to parse the feed.

Second, it turned out that I had two basic problems.

Problem one: my employer wrote their own authentication module for Drupal - "webauth." It was getting in the way. On the source website, on the webauth configuration page, I had to specify the URLs of the RSS feeds so that they can be accessed without authentication. Without this, the importer apparently hits cURL errors when it fetches the feed; the end result is that none of the feed items appear to be processed.
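For anyone debugging something similar, here is a rough Python sketch of the kind of check that would have pointed me at this sooner. It only tests whether a response looks like an XML feed rather than a redirect or an HTML login page. The function name is made up for illustration, and I am assuming that a blocked request shows up as a non-200 status or an HTML page; I do not know exactly what webauth returns.

```python
import xml.etree.ElementTree as ET


def looks_like_xml_feed(status_code, body):
    """Return True when a response looks like a parseable, non-HTML XML document.

    Hypothetical helper: status_code and body are whatever your HTTP client
    returned for an unauthenticated request to the feed URL.
    """
    # Anything other than 200 (for example a redirect to a login page)
    # means the importer is not getting the feed itself.
    if status_code != 200:
        return False
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, e.g. a typical HTML login page.
        return False
    # A well-formed XHTML page would still parse, so rule out an html root.
    return not root.tag.lower().endswith("html")
```

If this returns False for a request made without logging in, the feed URL is probably still behind authentication.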

Problem two: on the destination website I made some errors in how I configured the feed mappings.

The example below shows the format of the XML data supplied in the RSS feed:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
  <node>
    <guid>17595</guid>
    <title>Growing volume of electronic data threatens privacy</title>
    <date_published>January 20, 2011</date_published>
    <description><p>With ever-increasing online transactions and electronic social networking, ...</p></description>
    <source_name>Nextgov.com</source_name>
    <source_url>http://www.nextgov.com/nextgov/ng_20110107_8262.php?oref=topnews</source_url>
    <categories>
      <a href="/taxonomy/term/306">Computer Science</a>,
      <a href="/taxonomy/term/21">Engineering</a>,
      <a href="/taxonomy/term/35">Research</a>,
      <a href="/taxonomy/term/104">Science</a>
    </categories>
  </node>
</xml>

I have changed some of the field names since the original post, so that they make more sense. Additionally, there are lots of feed items - not just one.

Under the feed importer definition, in the mappings, set the source of every field to be "XPath Expression." Then choose the appropriate destination field.

Under example.com/import, choose the feed importer that you have set up. There, link the fields in the RSS feed to the XPath expressions.

In the above case, the "XPath Parser Settings" are set as follows:

Field                Value
Context              /xml/node
guid                 guid
title                title
field_pubdate:start  date_published
body                 description

Et cetera. I do not check any of the boxes under "Select the queries you would like to return raw XML or HTML."

The above configuration works. Nodes are created, and fields are properly populated with the correct values.
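To illustrate how the context and the per-field queries fit together, here is a small Python sketch. It is not how the Feeds parser is implemented; it only mimics the idea that the context selects each feed item and the other expressions are evaluated relative to that item. The function name and the shortened sample feed are my own, and Python's ElementTree supports only a subset of XPath, so the absolute context is split into a root name plus a relative path.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the feed format shown above (one item only).
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<xml>
  <node>
    <guid>17595</guid>
    <title>Growing volume of electronic data threatens privacy</title>
    <date_published>January 20, 2011</date_published>
    <description><p>With ever-increasing online transactions ...</p></description>
  </node>
</xml>"""

CONTEXT = "/xml/node"
# Destination field -> query evaluated relative to each context node.
MAPPINGS = {
    "guid": "guid",
    "title": "title",
    "field_pubdate:start": "date_published",
    "body": "description",
}


def parse_items(xml_bytes, context=CONTEXT, mappings=MAPPINGS):
    """Return one dict per context node, keyed by destination field."""
    root = ET.fromstring(xml_bytes)
    # ElementTree rejects absolute paths on an element, so "/xml/node"
    # becomes a check on the root tag plus the relative path "node".
    root_name, _, relative = context.lstrip("/").partition("/")
    if root.tag != root_name:
        return []
    nodes = root.findall(relative) if relative else [root]
    items = []
    for node in nodes:
        item = {}
        for target, query in mappings.items():
            element = node.find(query)
            # Join all nested text so markup inside a field is flattened.
            item[target] = (
                "".join(element.itertext()).strip() if element is not None else None
            )
        items.append(item)
    return items
```

Calling parse_items(SAMPLE_FEED) gives a list with one dictionary per node element, which is roughly what the importer then turns into one Drupal node per feed item.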

I am still working out bugs with the taxonomy terms. I may try out the solution at http://drupal.org/node/1193272. I just wanted to post what I had found, since people reached out to help me.

SteveMM

The solution at http://drupal.org/node/1193272 worked for me. The taxonomy terms are all being set correctly now. Note that I have changed what is supplied in the "categories" field. The new format is as follows.

<categories>Computer Science||Engineering||Research||Science</categories>

The "||" characters are specified as the delimiter between taxonomy terms in Feeds Tamper.
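In Drupal the splitting is done by Feeds Tamper, but the effect is easy to picture with a few lines of Python. This is only an illustration of the idea, and the function name is my own:

```python
def split_terms(raw, delimiter="||"):
    """Split a delimited categories string into a list of term names."""
    # Strip whitespace around each term and drop empty pieces.
    return [term.strip() for term in raw.split(delimiter) if term.strip()]
```

Applied to the categories value above, this yields the four separate term names, each of which can then be matched against a taxonomy term.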

twistor

Component: Feeds Import » Code
Status: Active » Closed (outdated)