We really need some examples in the docs for the different use cases.

Comments

dman’s picture

Some notes - put here FYI as I work through some issues I encounter.
Not good docs yet, but here's what I managed to make work:

Step 1 - find and analyze your source data

Where are you wanting to get the feed from? What does it look like?

Here is an example of a shop product feed - Flight of the Conchords Merchandise. It uses Google Merchant additions to publish price and product ID, and I want them.
It also has an image I'd like to import.

<item>
<title>Ladies - Too Many Dicks Tee</title>
<link>http://shop.flightoftheconchords.co.nz/node/11</link>
<description></description>
<category domain="http://shop.flightoftheconchords.co.nz/taxonomy/term/2">Apparel</category>
<category domain="http://shop.flightoftheconchords.co.nz/taxonomy/term/3">Ladies Size (US)</category>
<g:price>45.00000</g:price>
<g:id>FOC04</g:id>
<g:image_link>http://shop.flightoftheconchords.co.nz/sites/fotc/files/FOC04-01.jpg</g:image_link>
<g:image_link>http://shop.flightoftheconchords.co.nz/sites/fotc/files/FOC04-02.jpg</g:image_link>
<pubDate>Sun, 29 Nov 2009 11:31:03 +0000</pubDate>
<dc:creator>admin</dc:creator>
<guid isPermaLink="false">11 at http://shop.flightoftheconchords.co.nz</guid>
</item>
<item>

Step 2 - Set up your target Product Content Type with the fields you want to save.

That's pretty normal.

This content type defines the final destination of the data you want to import. Get this schema straightened out first.

Step 3 - Set up a 'Product Feed' Content Type

We need a node type that represents a type of feed, in this case a 'Product Feed'. Individual suppliers or shops will be instances of it: there will be one feed definition (a single node of this type) for our "Flight of the Conchords" data source.
Just make the content type. Don't make the shop yet.
I may be confused here, but it seemed I needed to create various feed types to support various rules and parsers. This step MAY be redundant.

Step 4 - Start the mapping rules

Install and set up Feeds, Feeds UI, and any dependencies.
Create a new Feed importer profile in the admin /admin/build/feeds/

Here we set up the rules for parsing product feeds

  • Name: "Google Merchant Feed importer" - these are the rules
  • Attached to: "Product Feed" - this is the type of data source it will process.
  • Fetcher: default, "HTTP Fetcher"
  • Parser: change to "XPath parser"
  • Processor: default, "Node processor"
  • Node processor settings : Content type: "Product" - these are the things that will be created upon import.
    I choose to 'update' imports when testing, but that's not important right now.
  • Mapping - here's the interesting bit.

You need to make a label for each bit of data you want to extract from the source. eg, "price". Then map that label with a CCK field on the right. This may seem a little redundant, but there are reasons.

The labels you make here will define what values can be set when you go to extract data from the source.
... which is next.

Step 5 - Create a 'feed' node that will use the xpath rules.

"Create Content" of type "Product Feed", named, eg "FOTC Shop". Finally we get to work with the datasource itself.

Figure out your xpath rules


Here, //item/title means that the value of the <title> tags will be parsed as a value, and our mapping rules mean that that value will be put into the node title field. Easy.

Repeat as needed.
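
To make the mapping concrete, here is a minimal sketch in Python of what a query such as //item/title selects. It uses the standard library's xml.etree.ElementTree, which implements only a subset of XPath and is not the engine the module uses. The trimmed feed, its rss/channel wrapper and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free copy of the sample item. The <rss>/<channel>
# wrapper is an assumption; the thread only shows the <item> itself.
FEED = """<rss><channel>
<item>
<title>Ladies - Too Many Dicks Tee</title>
<link>http://shop.flightoftheconchords.co.nz/node/11</link>
</item>
</channel></rss>"""

root = ET.fromstring(FEED)

# ElementTree spells the document-wide search ".//item/title"
# where full XPath would accept "//item/title".
titles = [el.text for el in root.findall(".//item/title")]
links = [el.text for el in root.findall(".//item/link")]

print(titles)
print(links)
```

Each query returns one value per matching element, which is what the mapping then copies into the target field.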

Step 6 - Save, import, test, repeat

About now you should be able to start seeing results, but this is still in development, so your mileage may vary.

Troubleshooting

First, ensure that your source data is indeed valid XML.
Watch out for namespaces, they get tricky.

Some still-in-development code was used in the screenshots here. Namespaces (g:image_link) do not work without an unstable patch, and the "Image FIG" field likewise comes from an in-development patch to Feeds ImageGrabber.

twistor’s picture

First off, wow, thank you. I think FOTC will be our official example feed. This is very thorough; I'll probably copy it more or less into the docs.

The dev version, which will become a release soon, contains an extra configuration field called 'context' (ideas on a better name welcome). The current release has a severe limitation: each query must return the same number of items. The context solves this problem, although in your use case it seems not to be an issue.

Namespaces are indeed tricky. PHP's XPath support doesn't handle default namespaces, although this is really an issue with XPath in general. I've got a workaround that attempts to solve the problem, but it needs more testing.

The dev version also has other parsing capabilities, REGEX and QueryPath, which need to be documented as well. I've been reluctant to post the docs I have because this project is moving fast. However, it should slow down soon. Thanks again, and sorry for the brain dump.

twistor’s picture

Title: Add document and examples. » Document XPath features.
dman’s picture

* I was working on the namespace troubles yesterday, and solved them (for me) by rewriting the scraper to use proper DOM+XPath, not SimpleXML (which I really don't like, now that I have grokked namespaces). It's still unstable and needs review and test cases though. I added my method as an alternative in the UI, so both old and new parsing will be supported.

* Yeah, you've seen the need for what you call 'contexts'.
For me, I think we need to break it down into two bits: one step to identify each 'item' in a feed or scrape with XPath, and a second sub-pattern to get the individual values.
The 'requires the same number of values' check did hit me a few times in testing, and felt like a real weakness in the logic. We need to handle a feed like this one, which doesn't always have images, etc. That required quite a rewrite to achieve though, so let's start with a coder.module syntax review, then I can join in with the CVS, if you'd like a hand.
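
The two-step idea can be sketched outside the module. The Python below is only an illustration, using the standard library's ElementTree rather than the module's parser; the two-item feed and the field names are made up. It shows why per-field queries over the whole document cannot be paired up once a value is missing, and how selecting each item first avoids that.

```python
import xml.etree.ElementTree as ET

# Two hypothetical items; the second one has no image.
FEED = """<channel>
<item><title>Tee</title><image>tee.jpg</image></item>
<item><title>Poster</title></item>
</channel>"""

root = ET.fromstring(FEED)

# One query per field over the whole document: the result lists differ
# in length, so values can no longer be matched up by position.
all_titles = root.findall(".//item/title")
all_images = root.findall(".//item/image")

# Two-step approach: select each item first, then query inside it.
records = []
for item in root.findall(".//item"):
    records.append({
        "title": item.findtext("title"),
        "image": item.findtext("image"),  # None when the element is missing
    })

print(len(all_titles), len(all_images))
print(records)
```

With the item as the unit of work, a missing element simply yields an empty value for that one record.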

brycesenz’s picture

Dman -

Can you post an example of how you solve the issue of contexts? It seems like you were facing a problem similar to mine. What I have is an XML document laid out as:

  <Products>
    <Product>
      <Title>Product 1</Title>
      <Field_1>Value 1</Field_1>
      <Field_2>Value 2</Field_2>
    </Product>
    <Product>
      <Title>Product 2</Title>
      <Field_1>Value 1</Field_1>
      <Field_2>Value 2</Field_2>
    </Product>
  </Products>

For the most part, I can get the fields to populate, but the Feed will only generate one node with the code that I am using. I feel like this may be due to using the context field incorrectly, but so far my guessing and checking towards the right solution has not been successful.

Any help would be greatly appreciated.

Cheers,
Bryce

dman’s picture

I have not yet solved contexts or items.
I just note that it is needed.
So far I'm working within the limitation of all items needing all fields present. And was on the last release, not current -dev, so I'm a bit behind there I think.

iantresman’s picture

Here's a basic contributed hook_help snippet, to provide basic help information. $output is fine, but the $path variable (which should contain admin/help) does not appear to work, so $output is not displayed. But once that's fixed, it will give novices something to start with.

/**
 * Implementation of hook_help().
 */
function feeds_xpathparser_help($path, $arg) {
  // Start with an empty string so that every path returns a defined value.
  $output = '';

  switch ($path) {
    case 'admin/help#feeds_xpathparser':
      // Overview paragraph.
      $output = '<p>' . t('<b>Feeds XPath Parser</b> is a plugin for the Feeds module that allows you to extract information from XML- and HTML-formatted Web sites and feeds, parsing them with XPath, and optionally with Regex and QueryPath.') . '</p>';
      // Configuration steps. The list sits outside the paragraph and is closed.
      $output .= t('<h3>Configuring</h3><p>A detailed version of the summary below can be found here: <a href="http://drupal.org/node/845018#comment-3220604">http://drupal.org/node/845018</a></p>
<ul>
<li>Optionally create a new <a href="/admin/content/types">content type</a></li>
<li>Go to: Administer | Site building | <a href="/admin/build/feeds">Feed importers</a>, and create or edit a feed.</li>
<li>In the Fetcher section, select HTTP Fetcher</li>
<li>In the Parser section, select "XPath parser"</li>
<li>In the Processor section, select Node Processor</li>
<li>In Node processor settings, match your Content type</li>
<li>In Node processor mapping, create your own source field name, and select a target</li>
<li>Click <a href="/node/add">Create content</a> and add parsing/mapping rules to the fields, e.g. into your title field, enter //item/title, which retrieves the contents of the source &lt;title&gt; tag.</li>
</ul>');
      // Links to related projects and documentation.
      $output .= t('<h3>Further information</h3>
<ul>
<li><a href="http://drupal.org/project/feeds_xpathparser"><b>Feeds XPath Parser</b> Drupal project page</a></li>
<li><a href="http://drupal.org/project/feeds"><b>Feeds</b> Drupal project page</a> | <a href="http://drupal.org/node/622696"><b>Feeds</b> documentation</a></li>
<li><a href="http://drupal.org/project/querypath"><b>QueryPath</b> Drupal project page</a></li>
</ul>');
      break;
  }

  return $output;
}

Links are hard-wired, and probably need converting to a Drupal friendly approved format.

twistor’s picture

brycesenz,

If you're using the dev version, your queries should look something like this:
context: //Product
title field: /Title
field_1 field: /Field_1

this could also work:
context: //Product
title field: //Title
field_1 field: //Field_1

or maybe:
context: /Products
title field: /Product/Title
field_1 field: /Product/Field_1

If you have an attribute that you want, maybe Value 3 you'd use:
field_3 field: /Field_3[@attribute_i_want]

dman’s picture

<quote style="mr-burns">
Excellent
</quote>

brycesenz’s picture

Twistor,

I've tried all of those combinations, but none has given the outcome that I want (or expected). Attached is the XML file I've used, and below are the outcomes for each of the possible solutions attempted.

Test 1
context: //Product
title field: /Title
field_1 field: /Field_1

Outcome: Results in 1 new node created, with no fields populated.

Test 2
context: //Product
title field: //Title
field_1 field: //Field_1

Outcome: Results in 2 new nodes created, with the fields populated correctly.

Test 3
context: /Products
title field: /Product/Title
field_1 field: /Product/Field_1

Outcome: Results in 1 new node created, with no fields populated.

Test 4
context: //Products
title field: //Product//Title
field_1 field: //Product//Field_1

Outcome: Results in 1 new node created, with all fields populated as the two node values concatenated (e.g. Title = "Product 1Product 2").

I'm submitting this largely as an FYI & for documentation purposes, though I'm curious why the double slash (//) seems to work while the single slash (/) breaks the functionality. If nothing else, this should clearly be noted in future releases.

dman’s picture

Try

context: /Products/Product

- that says that the lumps of data you are looking for are the individual 'Product' nodes.
//Product would also work: it matches Product elements at any depth, so for this document it selects the same nodes as /*/Product.

Then your context is that 'Product' node, and that's at the top of the next phase. It seems you have to name it, however (I guess this means you'll be able to get attributes off it if you need to).
So

title field: /Product/Title
field_1 field: /Product/Field_1

//Title would also work for the above reasons, but it's untidy

brycesenz’s picture

I can confirm that the suggested syntax works as well. Thanks for all of the help!

iantresman’s picture

Can someone suggest some syntax for HTML extraction. I'm not sure whether I enter full HTML, just the tag names, and/or can use wildcards such as * and ?

dman’s picture

@iantresman
Based on the level of detail given in your question, I'd suggest checking out an xpath syntax reference

iantresman’s picture

Thanks for that, I found that this XPath Syntax summary looks useful, and I would suggest that a link to it is placed in the Help text, and/or where you mention that Regex expressions should be in the form /.*/.

I just assumed that the query format would be different for HTML and XML.

Thanks.

rbrownell’s picture

This took waaaayyy too long to find, especially for a newbie to XML. This thread should be boiled down into documentation, with the link to the W3Schools explanation of XPath Syntax right at the top.

Anyway, thanks for posting this; at the least, it really, really helped me get started after two days of head-bashing frustration. Whoever did this is the best!

-Ryan

twistor’s picture

I'm very close to boiling this thread down into a handbook. It should apply more or less to both XML and HTML. Thanks to everyone who contributed.

iantresman’s picture

This post includes some links to Firefox/Firebug add-ons that help with XPaths.

iantresman’s picture

The Xpath example in #1 above obviously uses Xpath queries to indicate the specified data required from the source. But I'm having problems retrieving anything from non-RSS sources.

If I select the parsing engine to be HTML, I assume that I don't use Xpath queries, so what do I enter, and how does it deal with variably formatted HTML? For example, if the required title is described: <h3 class="title"><a href="/story336.htm">Xpath takes world by storm</a></h3>, do I enter:

  • <h3><a>*</a></h3>, or:
  • <h3 class="title"><a href="/story336.htm">*</a></h3>

What if there is also HTML: <h3 class="subhead">Sport stories</h3>. How do I distinguish between <h3 class="title"> and <h3 class="subhead">

It would be very useful if an example could be provided using a non-RSS source. Eg. the setup to retrieve (a) title (b) teaser (c) data (d) category (e) URL, from, for example, the news page at The Register.

dman’s picture

You assume wrong.
Xpath works on XML. Parsed HTML, or XHTML is XML. Therefore xpath works on HTML.
That's the point.

The selection of the parsing engine is just about the options used and the strictness it expects. Both end up with a DOM structure that you can run extractions on, using the same syntax.

I really don't know where you got those examples from. See the references on XPath syntax above.
The RSS example is used just because it's free of clutter. The process is the same either way; you just have to formulate the search to ignore the fluff when looking at HTML pages.
It does take some intuition to pick out the right filter to select the right target sometimes, which is why no example will work for all cases.

iantresman’s picture

>I really don't know where you got those examples from

I guessed. Wrong. Just trying to get some feeds working on HTML pages.

dman’s picture

FYI
//*[@class='subhead']
would select the element classed as 'subhead' anywhere in the HTML page. [@...] filters by attribute value.
So would
//h3[@class='subhead']
with a little more precision if needed.
Often it takes a few trials to get it just right.
There is a Firefox plugin that helps scan page structure with XPath, I think.
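
As a rough illustration of the attribute filter, the sketch below runs the same kind of query over the two headings quoted earlier in the thread. It is Python with the standard library's ElementTree, so it only works because this fragment is well-formed; real pages usually are not, and the module's HTML parsing mode is a separate thing. The wrapping div is an assumption.

```python
import xml.etree.ElementTree as ET

# Well-formed XHTML fragment built from the two headings quoted earlier.
# The surrounding <div> is an assumption for illustration.
PAGE = """<div>
<h3 class="title"><a href="/story336.htm">Xpath takes world by storm</a></h3>
<h3 class="subhead">Sport stories</h3>
</div>"""

root = ET.fromstring(PAGE)

# Any element with class="subhead", and the narrower h3-only variant.
any_subhead = [el.text for el in root.findall(".//*[@class='subhead']")]
h3_subhead = [el.text for el in root.findall(".//h3[@class='subhead']")]

# The link inside the h3 classed "title": its text and its href attribute.
title_links = root.findall(".//h3[@class='title']/a")
story_titles = [a.text for a in title_links]
story_urls = [a.get("href") for a in title_links]

print(any_subhead, h3_subhead)
print(story_titles, story_urls)
```

The class value is what tells the two h3 elements apart, so no wildcard text pattern is needed.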

nonsie’s picture

Based on #1 how would one import all categories though? Normal RSS Feed importer that ships with Feeds knows how to handle multiple values but I cannot figure out how to make it work with XPath. Is there a way to store multiple values in one field in Drupal?

yasmen’s picture

Hi!

I wonder if there is any special configuration needed to have XPath parser enabled?! I tried the regular way to enable any Drupal module, yet I can't see it at the parser options.

Any idea why this module isn't visible?

Hanno’s picture

@yasmen: probably the proposed patch helps: #863732: clear cache at enabling the module

elliotcapelo’s picture

Help please!

My xml feed isn't coming through and displaying on the page. I followed all the above steps but:

I'm confused about Step 3 - Set up a 'Product Feed' Content Type, what kind of content type do we add?

I tried adding just a "page" but this doesn't work, also tried adding a content type the same type as the target product content type but this doesn't work either. Can someone help me and tell me what they did for Step 3.

Thanks Heaps!

dman’s picture

It's a new content type, different from the actual node contents you will be creating.
Just make it. Don't do anything with it until step 5

elliotcapelo’s picture

thanks dman I was able to get it to work, I had to play with the data fields a bit to work out which worked best.

I'm new to xml stuff, but noticed that this creates an "admin" node to import the data then a page is created with the actual content each time the user clicks on import. But looks like I can change this feature in the "Update existing nodes" area.

this is great info though because it saves us a headache. :-)

dman’s picture

Yes. This admin node is just a place to store the config info. Not a user facing content type usually

If you have several sources, shop A, shop B, that need different parse rules, you would have more than one but both would produce product nodes

elliotcapelo’s picture

Ok a new node is created each time I click import on the "admin" node, even though I set node processor to replace existing nodes.

Is there any way to get it to update the page that was created on first import?

Many thanks!

dman’s picture

In theory, if the GUID (probably the URL) is the same, then setting import to replace or update should work. It does for me, so there could be something special about your source, or something you just need to do.
I do it above by setting my 'link' to be mapped from the URL, and as a 'Unique target'. That is the default I think.

rtbrown’s picture

Hi folks, I've been able to install XPath Parser, Feeds, etc. and can now ingest a feed into nodes. However, I have a field that is problematic! When displayed in the browser (by selection within views), I can see the field. The field however, never imports. Xpath stable provides a query error (same if a field is not present), but Xpath dev will proceed with the import of other fields and create the nodes without any error.

I've been using a tool to test my XPath query strings during this process: http://www.mizar.dk/XPath/Default.aspx (a handy little util!).

When I test the feed using this tool, the result set, regardless of a successful match, will display the XML processed, but I don't see my mystery field. I've created a new display, checked to see that the field is not excluded, created a new view, checked the content type, and checked the DB structure, but can see nothing odd about it. It's a simple text (longtext) field. I have another text field that maps / imports fine.

Any guidance is appreciated!

Best --

rtb

rbrownell’s picture

From my experience the dev version is the best one to work with, but the difference between stable and dev is that you have to specify a context. In issue #900632: Grab value from outside the context section of XML... At all possible? I've been trying to reach outside of the context area, which it doesn't seem capable of doing... yet.

Essentially, the context is what you are going to be populating. Here is an example:


<Grandparent>
  <Parent id="1">
    <Child id="1">
      <toy type="favourite"></toy>
      <toy type="leastfavourite"></toy>
    </Child>
    <Child id="2">
      <toy type="favourite"></toy>
      <toy type="leastfavourite"></toy>
    </Child>
  </Parent>
  <Parent id="2">
    <Child id="1">
      <toy type="favourite"></toy>
      <toy type="leastfavourite"></toy>
    </Child>
    <Child id="2">
      <toy type="favourite"></toy>
      <toy type="leastfavourite"></toy>
    </Child>
  </Parent>
</Grandparent>

So if you were to list the Children of a particular Parent (Parent ID 2), your context would be /Grandparent/Parent[@id="2"]; it should then list the children of the Parent with id 2.

The field expressions are then run against the context. So if you wanted to pull the favourite toy, the expression would be Child/toy[@type="favourite"] (note how the leading part of the path is already handled by the context field).
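
Here is a small Python sketch of the same context-then-field evaluation, using the standard library's ElementTree instead of the module. The document is a shortened version of the example above and the toy names are invented so that there is something to print.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example above; the toy names are made up.
DOC = """<Grandparent>
  <Parent id="1">
    <Child id="1"><toy type="favourite">kite</toy><toy type="leastfavourite">sock</toy></Child>
  </Parent>
  <Parent id="2">
    <Child id="1"><toy type="favourite">ball</toy><toy type="leastfavourite">drum</toy></Child>
    <Child id="2"><toy type="favourite">robot</toy><toy type="leastfavourite">puzzle</toy></Child>
  </Parent>
</Grandparent>"""

root = ET.fromstring(DOC)

# Context "/Grandparent/Parent[@id='2']": root already is <Grandparent>,
# so the rest of the path is evaluated relative to it.
contexts = root.findall("./Parent[@id='2']")

# The field expression runs inside each context node.
favourites = [
    [toy.text for toy in parent.findall("Child/toy[@type='favourite']")]
    for parent in contexts
]

print(favourites)
```

Only the toys under Parent 2 come back, because the field expression never looks outside its context node.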

Hope this helps, I am still not completely sure what your question is but this is a brief intro to the current Dev version.

-Ryan

rtbrown’s picture

Hey Ryan,

I'm not sure you were responding to my post. If so, the issue for me isn't the syntax of the dev version... I've got that working fine. It's a strange issue with a field not displaying in the feed, but it displays in the browser when you select it from within the view.

I can send you the view display & XML Validation (to show how it doesn't display) privately if needed.

thanks --

rtb

rbrownell’s picture

@rtb

This should be made into a new issue and not be part of this one. (Please start one and we can continue on from there)

Yes I was responding to you... Can you confirm if the field is actually making it to the node (i.e. when you view the node is it being displayed to just you and no one else (CCK Permissions Issue)). Thing is when a field does not import it is not displayed so it could either be syntax or something else.

-Ryan

rtbrown’s picture

@Ryan

After, I'm embarrassed to admit, a good number of hours, I thought I'd check the field level permissions on my mysterious field. Sure enough, it was not set for anonymous viewing.

I've been humbled once again...

Best --

rtb

tyler-durden’s picture

My feed has an odd format shown below. I can get the title to parse, but not the other elements.

<Title>AMC</Title>
<ItemSpecifics>
<NameValueList>
<Name>Make</Name>
<Value>AMC</Value>
</NameValueList>
<NameValueList>
<Name>Model</Name>
<Value>
        </Value>
</NameValueList>
<NameValueList>
<Name>Seller guarantee</Name>
<Value>Not selected</Value>
</NameValueList>

Any suggestions? I'm just learning xpath functions and I'm stuck on this. Thanks!

sunchaser’s picture

First off, thanks for the mini tutorial writeup.
The only thing is that at Step 4, I run into something that is different from what you have.

See the attached screenshot.
Your screenshot seems to allow you to enter "name of source field" at the mapping stage.
In my flow, this is not possible. I just have a dropdown list with 1 option (see screenshot).

I have created a new Content Type with my XML as the source... but I cannot map anything...

dman’s picture

@sunchaser
That screenshot looks like it has no connection with any valid content type.
Not sure how you can get to that stage. May be a bug.
It's supposed to list all the content type fields that the system knows about.

@tyler-durden
To parse that you need an expression that says "the Value of the NameValueList that has a Name = {n} "
So I think that can be done...
//NameValueList[Name = 'Make']/Value
Should return the "Make" value for you. At least it does in full Xpath. Not sure if it'll work for this context, but it should.
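
For anyone who wants to try the name/value lookup outside Drupal, here is a minimal Python sketch. It uses the standard library's ElementTree, which happens to support this child-text predicate; the value_of helper and the cut-down document are assumptions for illustration, not part of the module.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the feed fragment above.
ITEM = """<ItemSpecifics>
<NameValueList><Name>Make</Name><Value>AMC</Value></NameValueList>
<NameValueList><Name>Seller guarantee</Name><Value>Not selected</Value></NameValueList>
</ItemSpecifics>"""

root = ET.fromstring(ITEM)


def value_of(name):
    """Return the <Value> text of the NameValueList whose <Name> equals name.

    Hypothetical helper; names containing a single quote would need escaping.
    """
    return root.findtext(f".//NameValueList[Name='{name}']/Value")


make = value_of("Make")
guarantee = value_of("Seller guarantee")
missing = value_of("Colour")  # no such entry, so this is None

print(make, guarantee, missing)
```

The predicate picks the NameValueList by the text of its Name child, and the trailing /Value step then returns the sibling value.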

tyler-durden’s picture

Sweet, that worked perfectly dman!!!

Another quick question: to avoid confusing myself, I am running tests with only one item in my test XML file. I know the XML file I will be importing will not always be standard (extra fields, maybe missing fields, etc., depending on the item). I believe I read somewhere that this can cause problems; is this still true? Is there a way I can parse the XML file to a strict standard, with missing fields put in as blank records, so that all items in the feed are uniform? If so, could someone point me in the right direction of where to learn how to do this, or to another workaround?

rbrownell’s picture

@sunchaser. This was a result of the newest release of Feeds setting new standards for parsers (essentially breaking all parsers and forcing them to adapt). At this point though, the interface for Feeds parsers needs to be improved to accommodate both types.

medden’s picture

Having the same issue as @sunchaser.... no options showing in my SOURCE drop down except one.

Would the best solution be to use the older version of the feeds module? Or to wait for an upgrade to feeds Xpath parser?

Seems odd that Feeds would change all the parser settings without first checking that its supporting modules won't break.

twistor’s picture

The latest dev release should solve this problem. You will have to flush your caches to get it to work.

medden’s picture

Tried it with the latest dev release and the latest version of feeds...

Still only get a drop down available for entry, although now it has XPATH Expression as a drop down entry, but nowhere to put any tags in.

I included a screen shot.

Can you tell me which version of the main Feeds module worked?

twistor’s picture

That is the correct behavior. You put in the XPath expression when you configure the source.

medden’s picture

Got it working again using Feeds-6.x-1.0-beta4 and Feeds XpathParser-6.x-1.01

Would rather be using more up to date versions, but until this works think I'm stuck.

Is there anything I can do to help get it working with the newer release?

twistor’s picture

The workflow has changed a little bit. You no longer have to name the source field, the target name is used automatically. The source configuration form will still work the same as before, it just pulls in the names from the target fields.

medden’s picture

Understood. So you set the actual source fields when you create the feed-api type node, correct?

It's not very clear to new users where they set the source fields; perhaps add a short line to the readme file, or a description under the Mapping select box.

Thanks for pointing it out.... looks like it works perfectly...

twistor’s picture

Wow, you just pointed out a bug. Make sure you select a unique field: URL or GUID.

brycesenz’s picture

I'm playing with the latest -dev version (also using Feeds beta7), but it's causing an odd problem. I am reading a data feed in the form of:

  <Products>
    <Product>
      <Title>Product 1</Title>
      <Field_1>Value 1</Field_1>
      <Field_2>Value 2</Field_2>
    </Product>
    <Product>
      <Title>Product 2</Title>
      <Field_1>Value 1</Field_1>
      <Field_2>Value 2</Field_2>
    </Product>
  </Products>

I used to parse this successfully with the context query "/Products/Product", title query "/Product/Title", etc.

With the new -dev version though, those queries don't work; instead of returning a new node for each item, the feed just creates one empty node.

Is anyone else facing this issue?

twistor’s picture

Try,

"/Products/Product"

"Title"

brycesenz’s picture

@twistor -

Thanks again man, that worked perfectly. I wish I'd just posted the question before I spent 5 hours guessing, checking, and failing.
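
For reference, the combination that worked here (context "/Products/Product" with a relative "Title" query) behaves the same way in a plain XPath-style evaluation. The Python sketch below uses the standard library's ElementTree, not the module, and the dictionary keys are just illustrative names.

```python
import xml.etree.ElementTree as ET

DOC = """<Products>
  <Product><Title>Product 1</Title><Field_1>Value 1</Field_1><Field_2>Value 2</Field_2></Product>
  <Product><Title>Product 2</Title><Field_1>Value 1</Field_1><Field_2>Value 2</Field_2></Product>
</Products>"""

root = ET.fromstring(DOC)

# Context "/Products/Product": root is <Products>, so select its Product children.
products = root.findall("./Product")

# Field queries are relative to each context node: "Title", not "/Product/Title".
nodes = [
    {"title": p.findtext("Title"), "field_1": p.findtext("Field_1")}
    for p in products
]

# Repeating the context element name in a field query finds nothing,
# because a <Product> has no <Product> child.
wrong = [p.findtext("Product/Title") for p in products]

print(nodes)
print(wrong)
```

One record per Product comes out of the relative queries, while the query that repeats the Product step comes back empty for every context node.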

medden’s picture

If I had an XML structure like this:

<r>
<t>MyTitle</t>
<src url="http://www.mydomain.com">Link Text</src>
</r>

How would I get the url mapped?

I thought it would be:

context:/r
title:t
url:src[@url]

But that seems to map the 'Link Text' not the 'URL'.

medden’s picture

Cracked it!

I needed
context:/r
title:t
url:src/@url
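
The difference between the two queries is that src[@url] selects the src element (filtered on having a url attribute), while src/@url selects the attribute itself. A rough Python illustration follows; note that the standard library's ElementTree cannot evaluate an attribute step such as src/@url, so the attribute is read from the element instead, which is an ElementTree idiom and not what the module does.

```python
import xml.etree.ElementTree as ET

DOC = """<r>
<t>MyTitle</t>
<src url="http://www.mydomain.com">Link Text</src>
</r>"""

root = ET.fromstring(DOC)

# "src[@url]" selects the <src> ELEMENT that has a url attribute,
# so the value you get from it is the element's text.
element = root.find("src[@url]")
element_text = element.text

# "src/@url" in full XPath selects the attribute node. ElementTree has no
# attribute step, so read the attribute from the element directly.
url = root.find("src").get("url")

print(element_text)
print(url)
```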

This module is excellent.

medden’s picture

I just added this child page to the feeds_Xpath documentation page

http://drupal.org/node/919448

Hope it helps anyone new to XML or new to this module. Please edit or change anything I got wrong.

claudejanz’s picture

Hi,

I have a structure like this one:

 <service-response name="millionlot/winlist" status="ok">

  <win-list>

   <win type="swiss-award">
    <date day="27" month="9" year="2010"/>
    <winning-code>51111118</winning-code>
    <prize-description>Tisch 01 SwissAward + Fr. 200 F</prize-description>

    <player>
     <salutation>1</salutation>
     <firstName>Laurent</firstName>
     <name>Dupont</name>
     <street>Rue de Fenetre</street>
     <housenumber>20</housenumber>
     <zip>1200</zip>
     <city>Genf</city>
     <phonenumber>022 322 22 22</phonenumber>
     <email>urs.duenner@sltest2.ch</email>
     <language>fr</language>
    </player>
   </win>

   <win type="swiss-award">

How do I map the date in d/m/Y format to populate my datefield?
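
One possible approach, sketched in Python outside the module: read the three attributes of the date element and format them. Whether the parser can do this in a single query is not established in this thread; full XPath 1.0 does have a concat() function, e.g. concat(date/@day, '/', date/@month, '/', date/@year), but that is untested against this module. The win_date helper and the trimmed document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Trimmed, well-formed version of the structure above.
DOC = """<win-list>
<win type="swiss-award">
<date day="27" month="9" year="2010"/>
<winning-code>51111118</winning-code>
</win>
</win-list>"""

root = ET.fromstring(DOC)


def win_date(win):
    """Build a date from the day/month/year attributes of the <date> child."""
    d = win.find("date")
    return date(int(d.get("year")), int(d.get("month")), int(d.get("day")))


# %d/%m/%Y gives zero-padded day and month, like PHP's d/m/Y.
formatted = [win_date(w).strftime("%d/%m/%Y") for w in root.findall("win")]

print(formatted)
```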

tyler-durden’s picture

dman solved my mapping issue in #39, but now that I have upgraded to Feeds 9 and the latest XPath parser dev update, it is not working anymore. Here is the data feed.

<GetMultipleItemsResponse>
  <Item>
    <Description>description field</Description>
    <ItemID>123456789</ItemID>
    <ItemSpecifics>
      <NameValueList>
        <Name>Make</Name>
        <Value>AMC</Value>

My context is "//GetMultipleItemsResponse/Item" and using "ItemID" and "Description" as the XPath queries to run, it pulls in the data just fine.

But I have spent nearly 2 hours trying to get the Value field of the "Make". "ItemSpecifics/NameValueList[Name = 'Make']/Value" is what worked before (I may have had a "/" in the beginning I don't remember), but I have tried EVERY conceivable way possible with no success...

Any suggestions? Thanks! I believe this was working when I was testing with Feeds beta 4 and XPath Parser 1.01.

twistor’s picture

should be

ItemSpecifics/NameValueList/Value

If you're trying to match against attributes, that looks like [@attr='Value']

tyler-durden’s picture

Thanks for the quick response, and please excuse my ignorance, but I am still a bit confused. I should add that there are multiple ItemSpecifics. Here is a better view of the XML file:

<GetMultipleItemsResponse>
  <Item>
    <Description>description field</Description>
    <ItemID>123456789</ItemID>
    <ItemSpecifics>
      <NameValueList>
        <Name>Make</Name>
        <Value>AMC</Value>
      </NameValueList>
      <NameValueList>
        <Name>Model</Name>
        <Value />
      </NameValueList>
      <NameValueList>
        <Name>Mileage</Name>
        <Value>100000</Value>
      </NameValueList>

I am getting errors using "ItemSpecifics/NameValueList/[@Make = 'Value']". Am I close?

twistor’s picture

I did some experimenting, your original value "ItemSpecifics/NameValueList[Name = 'Make']/Value" works for me. Assuming the feed has multiple [item]'s in it you might want to try "/GetMultipleItemsResponse//Item" as the context.

tyler-durden’s picture

Thanks for the help, I'll have to experiment later but I have yet to get it. "/GetMultipleItemsResponse//Item" did not help.

tyler-durden’s picture

Twistor,
Could I convince you to help me for 20 to 30 minutes today? I could float you $50 through PayPal for the time, as I need to get this project completed by tomorrow midday. I have spent hours unsuccessfully trying to get this last problem to work, so I decided to use exactly the code I posted above (I removed other unnecessary info), and it works. But when I use the actual feed from my source, it doesn't. I can pull the standard fields like title, itemid, etc. just fine, but these "Name - Value" things I cannot get to work. I know I had it working on a previous version. Who knows, maybe it's some little bug, but it's likely just me being stupid.

If interested, let me know where I can send you the URLs, etc. Thanks

twistor’s picture

Shot you an email.

tyler-durden’s picture

Shot you an email back - thanks.

tyler-durden’s picture

Works like a charm now twistor. Thanks for the help!

As an FYI to everyone, 1.8 was just released and fixes a few minor bugs...

lancerkind’s picture

Category: support » feature

This URL explains namespaces and how MS handled them in their XPath parser. I'm not familiar with PHP's XPath implementation. Since namespaces are a common phenomenon, I imagine the PHP library has an interface for expressing namespaces. We just need to add a view so people can declare the namespaces for the "blah:" prefixes that occur in their XPath expressions.

http://msdn.microsoft.com/en-us/library/ms950779.aspx

lancerkind’s picture

Title: Document XPath features. » Xpath namespaces support
Category: task » support

@yasmen
You need to install both the Feeds module and the XPath for Feeds module, and enable both of them via Administer | Site building | Modules (roughly).

lancerkind’s picture

Category: feature » support

Hmmm. Am I the only one having trouble creating XPaths containing namespace prefixes? E.g. "/f:parent/f:child"
In this thread, I see a lot of people just not using namespace prefixes.

I'm still on the stable release but it seems everyone on this thread is using Dev. Perhaps that is my problem.

lancerkind’s picture

The XML document in this example can't be validated without the namespace declarations for g: and dc:. (It may not even be "well formed" XML, but I'm fuzzy on that right now.)
(To the audience: there are two levels of XML "goodness": well formed, meaning each start tag has an end tag, and valid, meaning that a validating parser will be happy that the XML follows its schema definition.)

Maybe the namespaces got removed when building the example? Once you have the namespace information, you can inline the namespace into the XPATH and replace the "g:" and "dc:" with the full namespace. (Note that I'm still beating my head against this module and can't confirm this, but that's how XML and namespaces are supposed to work.)
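As a rough illustration of both points, here is a sketch using Python's standard xml.etree.ElementTree, not the module. The namespace URI for g: is an assumption on my part (check the xmlns:g attribute in the real feed), and the item is a cut-down, made-up version of the one quoted in this thread. The "{uri}tag" form is ElementTree's own notation for an inlined namespace, not standard XPath 1.0, so it may not carry over to the module as-is.

```python
import xml.etree.ElementTree as ET

# Assumed URI for the g: prefix; verify against the real feed's xmlns:g.
G_NS = "http://base.google.com/ns/1.0"

# Cut-down, hypothetical item body using the g: prefix.
ITEM_BODY = "<title>Example Tee</title><g:price>45.00000</g:price><g:id>FOC04</g:id>"

# Same item twice: once without and once with the namespace declaration.
bare = "<item>" + ITEM_BODY + "</item>"
declared = '<item xmlns:g="' + G_NS + '">' + ITEM_BODY + "</item>"


def parses(xml_text):
    """Return True if a namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def price(xml_text):
    """Look up the price with the namespace URI inlined into the tag name."""
    item = ET.fromstring(xml_text)
    return item.findtext("{" + G_NS + "}price")


print(parses(bare))      # False: the g: prefix is unbound
print(parses(declared))  # True
print(price(declared))   # 45.00000
```

So a namespace-aware parser rejects the snippet on its own, and accepts it once the prefix is declared on an enclosing element.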

(I'm trying to reply to this comment. I'm not sure what to do with all the "Category," "Assigned," etc. fields in the edit issue settings area. Sorry if I'm using it incorrectly.)

<item>
<title>Ladies - Too Many Dicks Tee</title>
<link>http://shop.flightoftheconchords.co.nz/node/11</link>
<description></description>
<category domain="http://shop.flightoftheconchords.co.nz/taxonomy/term/2">Apparel</category>
<category domain="http://shop.flightoftheconchords.co.nz/taxonomy/term/3">Ladies Size (US)</category>
<g:price>45.00000</g:price>
<g:id>FOC04</g:id>
<g:image_link>http://shop.flightoftheconchords.co.nz/sites/fotc/files/FOC04-01.jpg</g:image_link>
<g:image_link>http://shop.flightoftheconchords.co.nz/sites/fotc/files/FOC04-02.jpg</g:image_link>
<pubDate>Sun, 29 Nov 2009 11:31:03 +0000</pubDate>
<dc:creator>admin</dc:creator>
<guid isPermaLink="false">11 at http://shop.flightoftheconchords.co.nz</guid>
</item>
<item>
dman’s picture

Yes, the example text pasted there is a snippet, one of a dozen items within the full document, which is linked to directly above it.
http://shop.flightoftheconchords.co.nz/rss.xml
You'll find it is valid when viewed in full.

'category' is in the default namespace, which is RSS2, although the RSS feed that comes from Drupal doesn't bother defining the xmlns, which seems to be quite standard, though I'd prefer if it did.

NeoID’s picture

So there's no way to map <media:content url="..." /> from RSS?

twistor’s picture

@lancerkind,
The namespaces declared in a document are automatically registered before performing an xpath query. So any prefixed queries should work without user intervention. Are you saying that you want to declare new namespaces or use new prefixes that aren't defined in the document?

@NeoID,
That should be as easy as //media:content or //media:content/@url.
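For comparison outside the module, here is a small sketch of the same lookup with Python's standard xml.etree.ElementTree. The feed below is made up, and the URI bound to media: is the Media RSS namespace as I understand it, so confirm it against your own feed. Unlike what is described above, ElementTree does not pick up the document's prefixes automatically, so the prefix map is passed in explicitly; it also cannot return attribute nodes, so the sketch selects the elements and reads url from each.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; the media: URI is assumed to match the real one.
FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First</title>
      <media:content url="http://example.com/a.jpg" />
    </item>
    <item>
      <title>Second</title>
      <media:content url="http://example.com/b.jpg" />
    </item>
  </channel>
</rss>"""

# Prefix-to-URI map handed to the query; the prefix name itself is arbitrary.
NS = {"media": "http://search.yahoo.com/mrss/"}


def media_urls(xml_text):
    """Return the url attribute of every media:content element."""
    root = ET.fromstring(xml_text)
    # Stand-in for //media:content/@url: select elements, then read the attribute.
    return [el.get("url") for el in root.findall(".//media:content", NS)]


print(media_urls(FEED))
```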

mitchell’s picture

Title: Xpath namespaces support » Improve documentation (examples, faq, user stories)
Category: support » task
Status: Active » Fixed

Marking this as fixed. Please open a new issue rather than reopening this one.

I made some updates to the Basic XML Tutorial, started a FAQ, and started a page for contributed importers and examples. I'd very much appreciate some reviews and edits.

Additional todos:
@iantresman, Re: #7: This help text wasn't committed. Please roll a patch and submit a new issue.

@claudejanz, Re: #56: "How do I map the date in d/m/Y format to populate my datefield?" I believe this is fixed in #831470: Import fails due to dates not converted correctly or should be added as another feeds issue.

@lancerkind, Re: #66: Please open a new issue for "Namespaces support" with a reply to twistor's comment, "Are you saying that you want to declare new namespaces or use new prefixes that aren't defined in the document?". Also, please be mindful about hijacking existing threads.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

JMOmandown’s picture

So here's a question regarding your explanations of how the context property is used within the module. My understanding is that the context basically determines how many nodes are created and what defines a new node. My question is if a child node in the xml can be the context.

For Example:

<Product>
  <ID>1</ID>
  <name>Blah</name>
</Product>
<Product>
  <ID>1</ID>
  <type>green</type>
</Product>
<Product>
  <ID>2</ID>
  <name>Yeah</name>
</Product>
<Product>
  <ID>2</ID>
  <type>blue</type>
</Product>

My question is whether the context can be defined as //ID, so that two nodes are created but the XPath expressions still allow the fields name and type to be combined in their respective nodes.

The use case for this is somewhat more complex but solves a problem for systems dealing with 1) Microsoft Access and 2) Systems that allow for content to be added, revised, extended, and aggregated from multiple sources.
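To make the desired result concrete, here is a sketch of the merge I have in mind, written in Python with the standard xml.etree.ElementTree and run outside the module. It says nothing about whether the module's context setting can do this. The Products wrapper element is hypothetical, added only so the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# The fragment above, wrapped in a hypothetical <Products> root element.
SAMPLE = """<Products>
  <Product><ID>1</ID><name>Blah</name></Product>
  <Product><ID>1</ID><type>green</type></Product>
  <Product><ID>2</ID><name>Yeah</name></Product>
  <Product><ID>2</ID><type>blue</type></Product>
</Products>"""


def merge_by_id(xml_text):
    """Combine the fields of all <Product> elements that share an <ID>."""
    merged = {}
    for product in ET.fromstring(xml_text).findall("Product"):
        # One record per distinct ID, created the first time the ID is seen.
        record = merged.setdefault(product.findtext("ID"), {})
        for child in product:
            if child.tag != "ID":
                record[child.tag] = child.text
    return merged


print(merge_by_id(SAMPLE))
```

That is, four Product elements in, two combined records out, keyed on ID.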