I am doing a large data import from several custom tables and converting each entry to a node.

Essentially, I am assigning the primary key of each table to the GUID when I run the Feeds import. But now that I'm further along, I've examined the "feeds_item" table in my database and noticed that everything I've imported goes into this one table.

I will need Feeds to be able to update some of these nodes eventually. However, some of my tables just use an auto-incrementing integer field as the primary key. With multiple tables doing this, will I run into problems when, say, I import table 1 with a row whose primary key is 1 and table 2 with a row whose primary key is also 1?

I could probably figure this out by trial and error. I just don't want to have to go into all of these tables and create new primary keys that are unique across every table.
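One way to sidestep renumbering, sketched here as an assumption rather than anything Feeds requires, is to build the GUID column of each exported CSV from the table name plus the key, so that row 1 of table 1 and row 1 of table 2 get different GUIDs. The helpers `make_guid` and `export_rows` and the table names are hypothetical.

```python
import csv
import io


def make_guid(table: str, pk: int) -> str:
    # Hypothetical helper: namespace the per-table key with its table name.
    return f"{table}:{pk}"


def export_rows(table: str, rows: list) -> str:
    """Return CSV text with an extra leading 'guid' column for the Feeds mapping."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["guid", *rows[0].keys()])
    writer.writeheader()
    for row in rows:
        writer.writerow({"guid": make_guid(table, row["id"]), **row})
    return out.getvalue()


t1 = export_rows("table1", [{"id": 1, "title": "First"}])
t2 = export_rows("table2", [{"id": 1, "title": "Other"}])
print(t1.splitlines()[1])  # table1:1,1,First
print(t2.splitlines()[1])  # table2:1,1,Other
```

The prefix only has to be stable between exports, so that a later update CSV produces the same GUID for the same source row.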

Comments

Jaypan

Yes. A primary key must be unique. If you are trying to import the number 1 twice in the primary key column, you will get errors.

RyFo18

My question is more along these lines:

I have two tables. Each will be imported by Feeds as a different content type. I am mapping each table's primary key to the GUID in its Feeds importer.

If both of these distinct tables have a primary key value of 1, will this screw things up in the Feeds importer down the line? Or is it smart enough to keep the GUIDs distinct for different content types?

I will need to update some of these nodes eventually, and I believe you can use this GUID value so that the Feeds Importer can determine if imported data from a new CSV file is an update to an existing node or if it should be a new node.

Jaypan

I'm not 100% following you, but maybe this will help. Each table has a primary key (or is supposed to). Primary keys need to be unique for that table. Different tables may have the same primary key.

So if table 1 has a primary key column called 'pk', and one of its rows already has a 'pk' value of 1, then trying to insert another row with a 'pk' value of 1 will give me an error, as the value must be unique.

If I have another table, table 2, with the primary key column also being named 'pk', I can insert a value of 1 for 'pk' even if there is a value of 1 for 'pk' in table 1. That is because these are separate tables, so the key does not need to be unique between the tables, only within an individual table.
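A quick way to see this is with an in-memory SQLite database standing in for the custom tables (the table and column names are placeholders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (pk INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE table2 (pk INTEGER PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO table1 (pk, name) VALUES (1, 'a')")
# The same key value in a *different* table is allowed.
conn.execute("INSERT INTO table2 (pk, name) VALUES (1, 'b')")

# The same key value again in the *same* table is rejected.
try:
    conn.execute("INSERT INTO table1 (pk, name) VALUES (1, 'c')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```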

Does that answer your question?

RyFo18

My biggest problem is that these tables are getting imported using Feeds Importer, where each table row in Table 1 is a node in a content type I have created. Each row in table 2 is a node in a different content type I have created.

Feeds importer uses the concept of a "GUID" to distinguish between items that have been imported. So here is my setup:

  • I have two Feeds importers: one imports table 1 into Content Type 1, the other imports table 2 into Content Type 2
  • For both tables, the primary key is mapped to this "GUID" field that Feeds importer uses

Now, say I import a CSV file of table 1's data and create a bunch of nodes of Content Type 1 (one for each row). Later, I need to import another CSV file that provides updates to these nodes. Feeds Importer uses the GUID to determine whether that node already exists: if it does, it updates it; if it doesn't, it creates a new node.

Basically, what I don't know, and still haven't had time to experiment with, is whether the GUID is unique per Feeds importer or not. If not, then I will have to make sure all of my tables have unique primary keys, even across different tables.
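The worry can be illustrated with a toy model, not Feeds' actual code, that assumes existing items are looked up by GUID alone. Under that assumption, row 1 of table 2 is treated as an update to the node created from row 1 of table 1. The class name and content type names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class GuidOnlyStore:
    """Toy model: one lookup table keyed by GUID alone (no importer scoping)."""
    by_guid: dict = field(default_factory=dict)   # guid -> node id
    nodes: dict = field(default_factory=dict)     # node id -> node data
    ids: count = field(default_factory=lambda: count(1))

    def import_item(self, guid, content_type, data):
        if guid in self.by_guid:
            # GUID seen before: treat the row as an update to that node.
            nid = self.by_guid[guid]
            self.nodes[nid].update(data)
            return nid, "updated"
        # Unknown GUID: create a new node.
        nid = next(self.ids)
        self.by_guid[guid] = nid
        self.nodes[nid] = {"type": content_type, **data}
        return nid, "created"


store = GuidOnlyStore()
r1 = store.import_item("1", "type_1", {"title": "Table 1, row 1"})
r2 = store.import_item("1", "type_1", {"title": "Table 1, row 1 (revised)"})
r3 = store.import_item("1", "type_2", {"title": "Table 2, row 1"})
print(r1, r2, r3)  # (1, 'created') (1, 'updated') (1, 'updated')
```

In this model the third import overwrites the title of the Content Type 1 node instead of creating a Content Type 2 node, which is exactly the collision being asked about.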

Jaypan

I still don't understand, seeing as nodes save to their own table (the {node} table), so I'm not sure what you are referring to.

The GUID is supposed to be unique, but it is chosen arbitrarily by the creator of the feed. I have a feed importer on one of my sites, and the GUID is just the URL of the original article: this will be unique from the feed I'm getting it from, but theoretically, someone else could use the same GUID in a different feed somewhere, though it would be pretty doubtful.

Going back to your comments, here is why I'm confused. The first thing you say:

My biggest problem is that these tables are getting imported using Feeds Importer

Followed by this:

where each table row in Table 1 is a node in a content type I have created. Each row in table 2 is a node in a different content type I have created.

How are these two statements connected?

RyFo18

this will be unique from the feed I'm getting it from, but theoretically, someone else could use the same GUID in a different feed somewhere

So my main question is: can I use the same GUID for two separate feeds?

The two statements are connected because I have imported table 1 using Feeds Importer 1, mapping the primary key of table 1 to the GUID. I would also like to import table 2 using Feeds Importer 2, mapping the primary key of table 2 to the GUID as well.

My concern is then if I were to need to update an existing node using the Feeds Importer, will I run into problems when two imported items from different feeds have the same GUID? Or is Feeds able to distinguish between items that have the same GUID, but were imported with different Feeds Importers?

Hopefully that clears it up, lol... Thanks for your patience.

Jaypan

You will run into issues. Feeds will not know which importer imported the feed, it will just check the GUID regardless of what feed saved the info originally.

RyFo18

Appreciate all the help mate! Time to look for a different solution.

steven.wichers

This doesn't appear to be correct (at least on Drupal 7). Line 527 of FeedsProcessor.inc adds the source ID to the query it uses to look for existing entries.
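If the lookup is scoped by importer as described here, the same GUID coming from two importers maps to two separate items. The sketch below is a simplified model keyed by (importer id, GUID); the importer ids are hypothetical, and the real query may include further conditions.

```python
from itertools import count


class ScopedItemIndex:
    """Toy model: existing items are looked up by (importer id, GUID)."""

    def __init__(self):
        self._index = {}          # (importer_id, guid) -> entity id
        self._ids = count(1)

    def import_item(self, importer_id, guid):
        key = (importer_id, guid)
        if key in self._index:
            return self._index[key], "updated"
        entity_id = next(self._ids)
        self._index[key] = entity_id
        return entity_id, "created"


idx = ScopedItemIndex()
a = idx.import_item("importer_1", "1")  # new node from table 1
b = idx.import_item("importer_2", "1")  # same GUID, other importer: separate node
c = idx.import_item("importer_1", "1")  # matches the first node again
print(a, b, c)  # (1, 'created') (2, 'created') (1, 'updated')
```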

davemaxg

If you are importing a lot of related tables, the GUID is used to find a previously imported item, get its nid, and then relate the new record to that previously imported one. The problem is that Feeds does not check content types at all, so this GUID, which presumably comes from an external system, MUST be globally unique. If it isn't, you'll end up with incorrect entries in entity reference fields.

I've run into this myself on some rather large imports that I didn't want to reimport. Because each GUID is unique when combined with the feed id, I was able to construct a query like the one below that fixes the wrong nids with the right ones without the need to reimport anything.

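-- Repoint references that resolved to an entity from the wrong importer.
update drupal_field_data_field_license_status_code_1 sc
  -- i: the feeds_item row of the entity currently referenced
  inner join drupal_feeds_item i
    on sc.field_license_status_code_1_target_id = i.entity_id
  -- statuses: items created by the license status importer
  inner join (select * from drupal_feeds_item
              where id = 'import_license_status') statuses
    on i.guid = statuses.guid
   and i.id <> 'import_license_status'
set sc.field_license_status_code_1_target_id = statuses.entity_id;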
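To try the same repair on a throwaway database before running it against MySQL, here is a rough equivalent on SQLite via Python. The schema is cut down to the columns the query touches, the table name `field_status` is a hypothetical stand-in for the real field table, and since older SQLite versions lack `UPDATE ... JOIN`, the joins are rewritten as correlated subqueries. The `EXISTS` guard mirrors the inner joins, so rows without a matching status GUID are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feeds_item (id TEXT, entity_id INTEGER, guid TEXT);
CREATE TABLE field_status (entity_id INTEGER, target_id INTEGER);

-- Two importers both used GUID '1'; a third item has no status counterpart.
INSERT INTO feeds_item VALUES ('import_license_status', 10, '1');
INSERT INTO feeds_item VALUES ('import_other', 20, '1');
INSERT INTO feeds_item VALUES ('import_other', 30, '2');

-- Node 100 should point at status 10 but was linked to 20 by mistake;
-- node 101 points at 30, which has no matching status and must stay as-is.
INSERT INTO field_status VALUES (100, 20);
INSERT INTO field_status VALUES (101, 30);
""")

# Status entity with the same GUID as the referenced entity, but only when the
# referenced entity was created by a different importer.
MATCH = """
    FROM feeds_item i
    JOIN feeds_item s
      ON s.guid = i.guid AND s.id = 'import_license_status'
    WHERE i.entity_id = field_status.target_id
      AND i.id <> 'import_license_status'
"""
conn.execute(f"""
UPDATE field_status
SET target_id = (SELECT s.entity_id {MATCH})
WHERE EXISTS (SELECT 1 {MATCH})
""")

rows = conn.execute(
    "SELECT entity_id, target_id FROM field_status ORDER BY entity_id"
).fetchall()
print(rows)  # [(100, 10), (101, 30)]
```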
