In cases where more than the batch limit of 50 nodes need to be expired, expiry is not scheduled ASAP but after the next 3600. This leads to a back up in nodes that need to be deleted after a while.

Solution: move expiry to source level (integrate it in clear()?), use FeedsSource' batch state tracking.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Anonymous’s picture

I've moved node expiry into FeedsSource and everything appears to be working now. I've attached a patch.

Anonymous’s picture

Here is an updated patch with an upgrade path and I've removed some other overlooked cruft so I've removed that.

Anonymous’s picture

Missed that I added feeds_node_expiry.test to info file this shouldn't be added in this patch. Here's an updated patch.

Anonymous’s picture

Might as well roll expiry tests into this patch as well. Here is an updated patch with expiry tests included.

alex_b’s picture

Status: Active » Needs work

- Update hook should remove feeds_importer_expire from job_scheduler
- Why is scheduleImport() removed from feeds_source_import()?
- Also scheduleExpire() should be invoked at the end of feeds_source_export().
- scheduleExpire() is using fetcherPeriod() where it shouldn't.
- scheduleExpire() should only schedule a job if FEEDS_EXPIRE_NEVER != $this->processor->expiryTime()
- scheduleExpire() should remove a job otherwise.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
19.31 KB

Alright I've fixed scheduleExpire() and I've added back in somethings I removed accidentally while troubleshooting, like scheduleImport(). I've also updated the update hook.

Anonymous’s picture

Just noticed one of the comments in my expiry tests were not accurate just a quick update.

Dave Reid’s picture

Priority: Critical » Major
twistor’s picture

Assigned: Unassigned » twistor
FileSize
18.71 KB

Picking up the torch. Let's see what testbot says.

twistor’s picture

Forgot the new test file.

Status: Needs review » Needs work

The last submitted patch, feeds-expiry-batch-930652-10.patch, failed testing.

twistor’s picture

Status: Needs work » Needs review
FileSize
21.12 KB

Second stab. This should pass.

Status: Needs review » Needs work

The last submitted patch, feeds-expiry-batch-930652-12.patch, failed testing.

twistor’s picture

Status: Needs work » Needs review
FileSize
20.99 KB

I moved the new test class into the node processor tests.

C'mon testbot!

ressa’s picture

Is this patch about ready to be included in the dev version? The "Expire nodes" doesn't work in the current release, as can be seen in these posts.

Will this patch delete nodes before import? That would make sense to me.

ressa’s picture

I spoke to soon. I just ran cron, after setting "Expire nodes" to one hour, and after a bit of a wait I went to the log and saw this entry "Type: job_scheduler" - "Finished processing scheduled jobs (0 sek. s, 1 total, 0 failed)."

And indeed, all nodes older than one hour are gone, so it seems "Expire nodes" actually works.

byronveale’s picture

Just wanted to point out a sort-of similar discussion with patch going on in the "Unpublish/Delete nodes not included in feed" feature request.

I notice 7.x-2.0-alpha8 was last updated on April 22nd; presumably this patch made it in, and this issue can be closed??

Thanks for everyone's efforts on this!

byronveale’s picture

Actually, it looks like this code did not make it in to the latest dev version. Just want to cross-reference other related issues, for users coming here: #1470530 and #1363088.

(Twistor, if you're getting notified every time I leave a comment, sorry if it feels like I'm hounding you!)

byronveale’s picture

I tried the patch in #14, and am having issues, not exactly sure where they are coming from:

- when manually running feeds import, no feed items were expired, regardless of the expiry setting (1 year, 1 month, 1 hour, etc.), or process setting (background on or off) specified

- when running automatically (periodically, every 15 minutes, scheduled cron entry), feeds items were deleted and recreated, seemingly at random, and could not find a single specific instance of an item that should have been deleted (no longer in feed) being deleted

I tested in in a local Acquia dev instance, Drupal 7.23, MySQL 5.1.66, PHP 5.3.18, Mac OS X 10.8.4. There are 767 items in my feed (coming from LDAP, using the LDAP module, "ldap_feeds" component).

So, can't say specifically whether or not the patch successfully fixed expiry "batching"; but the whole expiry process seems to have issues.

byronveale’s picture

Okay, another day of testing, and some enlightenment:

- because I was working with the issue of unpublishing nodes no longer in a feed source (#1470530), I was conflating that issue with this one, that is expecting nodes no longer in a feed to be expired, thinking that the "expire nodes" setting would expire a node only after it was no longer in a feed for the specified duration (yeah, I know, I know...!)

- I now get that the "expire nodes" setting removes an item after the specified duration regardless of whether or not it is still in the feed; if the item is still in the feed, it is recreated as new (this explains the "random" behavior I reported in my last comment)

- it appears there may be an issue still with expiration, as brought up in issue #1433170, as after a morning of updates, things have been quiet; whether or not this is part of this batching issue or something else, I couldn't say...

I'm going to let it go overnight (periodic import/cron running every 15 minutes, nodes expiring after an hour, feeds_process_limit = 100). Will report results tomorrow...

byronveale’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
184.62 KB

Looks like it's working just fine, had an evening full of expiring and creating 100, 200 nodes at a shot (see attached). Will file under a separate issue, but I think it would be nice if Feeds logged when it expires nodes, in addition to logging their creation. Dare I say this has been "reviewed and tested"...?!

twistor’s picture

Issue summary: View changes
Status: Reviewed & tested by the community » Needs review
FileSize
24.5 KB

Alright, some minor tweaks and fixes.

If this passes, I'm ready to commit.

Status: Needs review » Needs work

The last submitted patch, 22: feeds-expiry-batch-930652-22.patch, failed testing.

twistor’s picture

Status: Needs work » Needs review
FileSize
24.55 KB

That's why we have tests.

twistor’s picture

Assigned: twistor » Unassigned
Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Yorgg’s picture

Status: Closed (fixed) » Active

It may have passed the tests but suddenly nodes don't get expired any longer. Nothing that I'm aware has changed.
This is the second time that it happens since #24 though somehow I managed to get it back to work once.
I have disabled and uninstalled the feeds module, then and (re)-installed. I thought this was the fix but this time it was not.

Should I open another issue?

MegaChriz’s picture

Status: Active » Postponed (maintainer needs more info)

@j_nunes
Are you using the latest dev version of the module? If so, can you provide the minimal steps to reproduce the issue? With the current information it can be anything from misconfiguration to bug.

Yorgg’s picture

@MegaChriz: I am using the latest dev now with xpath parser and json parser.
In the Node Processor, the expiration date is set to 3 hours from "The node's published date will be used for determining the node's age"
My mapping is set so expiration occurs 3 hours from the current published date, meaning nodes should expire 3 hours from import.
I don't see any other configuration other then this.

I went ahead to see the timestamp values with devel on some feed-items and I believe I must have 200 or 300 candidates pending expire action, but they don't.
Am I missing something?

MegaChriz’s picture

@j_nunes
Can it be that cron didn't run? I think that expiring nodes depend on cron.

Yorgg’s picture

That's not the case. I even run cron manually on my devel machine to make sure its running properly every time.

Yorgg’s picture

Is there any other information you require to reopen this issue again?

donquixote’s picture

Status: Postponed (maintainer needs more info) » Active

Fellows,
the variable $reschedule in function feeds_reschedule() is messed up.
Sometimes it is a boolean, sometimes it is an array (list of importer ids), sometimes a string (importer id).
This is asking for trouble.

The $importer_id is not much better. But at least this one is documented.
But the combination of different states of these two variables is hard to follow.
And probably buggy.

twistor’s picture

Status: Active » Closed (fixed)