I will not be deleting nodes through the feeds import page, but I will be updating nodes. I believe updating is done with the GUID I have set in the feeds importer.

The feeds_item table takes up 50 MB in my database. Is it safe to delete all the rows in it, or are they required?

Comments

Collins405’s picture

Component: Feeds Import » Code

Also interested in an answer for this. I'll test now.

Collins405’s picture

It seems that truncating the feeds_item table deletes all of the GUID data for the imported items, so when you enable "skip hash check" and import again, all items are created again as duplicates.

This is disappointing, as it means that on a big site you have to keep a hefty table in the database if you ever want to import again, even if you only import occasionally.

Perhaps a workaround is not to use the GUID mapping target, and instead to use another field as the "Unique" target.

You can use the Field Validation module to make a field on your content type "Unique", and it will then be available as a "Unique" target in the Feeds mapping.

Collins405’s picture

Just tested this as well. I can confirm that if you use your own unique field instead of the GUID, and then truncate the feeds_item table, you can successfully re-import the items and they will update.

If you use the GUID mapping, you will end up importing a duplicate item for every row.

MegaChriz’s picture

Status: Active » Fixed

The feeds_item table is used to keep track of items imported via Feeds. This also includes items that were only updated by Feeds, but not originally created by Feeds.

If you use either of the targets "GUID (guid)" or "URL (url)" as the unique target, then truncating the feeds_item table can result in duplicates being imported.

The feeds_item table also serves another purpose: for every imported item it stores a hash value. With that value Feeds can check whether an item to import has changed since the previous import. If it has not changed, Feeds skips the item. This can save a lot of entity save calls and also a lot of file downloading if you are importing images, for example.
Say you have a source with 1000 items and you have already imported them all once. Then you change three items in the source. Feeds will then update only those three items on the site, saving 997 entity save calls. Truncating the feeds_item table in this case will result in all 1000 items being re-imported, changed or not.
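To make the 1000-item example concrete, here is a minimal sketch of hash-based change detection. It is not the actual Feeds code: the `item_hash` and `run_import` helpers, the use of MD5 and the `tracking` dictionary standing in for the feeds_item table are all assumptions made for illustration.

```python
import hashlib


def item_hash(item):
    """Hypothetical helper: hash the item's content, independent of key order."""
    payload = repr(sorted(item.items())).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def run_import(source, tracking):
    """Return how many entity saves this import run would need.

    `tracking` maps GUID -> stored hash and plays the role of feeds_item.
    """
    saves = 0
    for item in source:
        new_hash = item_hash(item)
        if tracking.get(item["guid"]) == new_hash:
            continue  # unchanged since the previous import: skip it
        tracking[item["guid"]] = new_hash
        saves += 1  # new or changed item: the entity would be saved
    return saves


source = [{"guid": str(i), "title": f"Item {i}"} for i in range(1000)]
tracking = {}

first_run = run_import(source, tracking)   # everything is new

for i in (3, 14, 159):                     # change three items in the source
    source[i]["title"] += " (edited)"
second_run = run_import(source, tracking)  # only the changed items are saved

tracking.clear()                           # stand-in for truncating feeds_item
third_run = run_import(source, tracking)   # every item is processed again

print(first_run, second_run, third_run)
```

In this sketch the stored hashes are the only thing that lets the second run skip 997 items, which is why clearing them forces a full re-import.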

If you do not use GUID or URL as the unique target and you don't mind that all items are re-imported, then it is safe to truncate the feeds_item table. It will not trigger errors or render your site unusable. In fact, in some cases it can be a good thing to do, for instance if you have imported a lot of items that you don't plan to import again. In such a case I usually delete only the records that belong to the importer in question:
DELETE FROM feeds_item WHERE id = 'my_importer';
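Before running a delete like that, it can help to see how many rows each importer holds. The following is a rough sketch that tries the idea on an in-memory SQLite database. The table here is a simplified stand-in with only a few columns (the real feeds_item table has more), and the importer names and the `rows_per_importer` helper are made up for the example.

```python
import sqlite3

# Simplified stand-in for feeds_item; "id" holds the importer's machine name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feeds_item (entity_id INTEGER, id TEXT, guid TEXT, hash TEXT)"
)
rows = [(n, "my_importer", f"a-{n}", "h") for n in range(5)]
rows += [(n, "other_importer", f"b-{n}", "h") for n in range(5, 8)]
conn.executemany("INSERT INTO feeds_item VALUES (?, ?, ?, ?)", rows)


def rows_per_importer(conn):
    """Hypothetical helper: count tracking rows for each importer."""
    query = "SELECT id, COUNT(*) FROM feeds_item GROUP BY id"
    return dict(conn.execute(query))


before = rows_per_importer(conn)

# Remove only the rows that belong to one importer; other importers keep theirs.
conn.execute("DELETE FROM feeds_item WHERE id = ?", ("my_importer",))
conn.commit()

after = rows_per_importer(conn)
print(before, after)
```

The same GROUP BY query can be run directly against the site's database to decide which importers are worth cleaning up.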

To sum up:

  1. Truncating the feeds_item table can result in duplicates when using the GUID or URL target as the unique target.
  2. Truncating the feeds_item table can result in a lot of items being imported again that have not changed in the source, potentially costing a lot of resources.
  3. Truncating the feeds_item table will not result in errors or render your site unusable.
  4. To save space, it can be a good thing to clean up the records of items that you do not intend to import again.

Collins405’s picture

Thanks for taking the time to clear that up.

In my case, we do the imports offline, as we import over a million rows at a time through 30 different importers! So it is pointless having those rows in the feeds_item table on the live site anyway. As we are importing ASCII data into custom ECK types, we do the import offline, then just truncate and dump the SQL tables from offline to online. Much faster :-)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.