Marking this as a support request as I may be missing something.
The skip_hash_check boolean is set on the Processor. From what I can tell a feed that hasn't been modified throws an exception in the fetcher and dies before reaching the processor or parser.
My settings:
Fetcher: Download
Parser: custom loosely based on the xml parser
Processor: Node
Content Type: custom

Processor settings:
Update existing contents: Update existing contents
Advanced - Force update: checked

I dropped some debug statements into the fetcher and that is definitely throwing the exception. Then I threw some debugs into my parser and it was never reached.

The Feed URL of this particular item is the first time I have been using a static xml instead of one generated from a DB. I'm looking to test some image parsing and wanted full control over the feed source.

I got around the issue for now by adding the skip_hash_check boolean as a setting on the fetcher, and skipping the changed test if the skip hash check has been checked off. If that is how this should work then I can work up a patch that moves the boolean from the processor to the fetcher, but I wanted to see if I was missing something before moving forward with that.

Comments

wolffereast created an issue. See original summary.

wolffereast’s picture

I also had to modify the fetcher's get cache key function to avoid the cached response from failing. Could this be an issue with my file system not correctly holding on to the cached file?

megachriz’s picture

When a source feed has not been modified (and it returns a 304 HTTP status code), Feeds will throw an EmptyFeedException and the import will be aborted. The HTTP Fetcher has no knowledge of any settings on the processor, so the "Force update" setting will have no effect in this case.

I think that the "Force update" setting should be respected by the HTTP Fetcher. Maybe the setting needs to be moved to the general settings?

megachriz’s picture

Title: unchanged feed with skip_hash_check enabled not getting to parser » Always redownload source (even if not modified) when "Force update" is checked
Category: Support request » Feature request
joelpittet’s picture

Status: Active » Reviewed & tested by the community
StatusFileSize
new2.38 KB

How about something like this on the fetcher?

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 5: 2850888-5.patch, failed testing. View results

joelpittet’s picture

Status: Needs work » Needs review
StatusFileSize
new2.97 KB

That's a bit of an annoying test, is it needed?

megachriz’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

@joelpittet
This is a good idea, though I think it would be better to not use the cache at all when this option is enabled. We could then call the option "cache_http_result" as it is called in the D7 version:

$form['advanced']['cache_http_result'] = array(
  '#type' => 'checkbox',
  '#title' => t('Cache HTTP result of request'),
  '#description' => '<p>' . t('Disabling this cache means that the downloaded source will not be cached (for example: on the file system or on memcache), but will be redownloaded on every feeds import attempt. This can be helpful if the source to download is dynamically generated and will always be different, or if it is very large (50 MB+) which would cost additional webspace.') . '</p><p>' . t("If you're having issues with Feeds not processing changes from the source file, or if you are experiencing caching issues, you can disable the caching of this feeds content.") . '</p>',
  '#default_value' => $this->config['cache_http_result'],
);

Also needs two tests:

  1. A functional or kernel test that doesn't fetch the same content twice when the option is disabled.
  2. A functional or kernel test that fetches the same content twice when the option is enabled.

I looked if perhaps some tests from the D7 could be ported, but caching downloaded content in the D7 version works slightly different there. In the D8 version, only the headers are cached. In the D7 version also the complete file contents. Anyway, similar tests would be:

  • FeedsFileHTTPTestCase::testHTTPCacheDisabled()
  • FeedsFileHTTPTestCase::testSourceCaching()
  • FeedsFileHTTPTestCase::testChangedSource()
megachriz’s picture

This probably a slightly better option for the annoying HTTP Fetcher unit test:
$this->assertInternalType('array', $form);

nchristensen’s picture

#7 worked for me.

Was running into issue where my product's stock field data wouldn't update if the CSV I was importing hadn't been changed/data hadn't been updated, even when I had the "Force update" checkbox selected.

My workflow:
-> Initial stock of product set at 100
-> user then purchases 10 items of the product
-> remaining stock shows 90
-> cron runs, and feeds attempts to pull csv to update products (where product's stock is listed at 100)
-> feeds fails because stock data is the same as initial stock (100) so remaining stock remains at 90. (so even if client gets another shipment in to keep the stock at 100, feeds doesn't care)

After applying patch and leaving the "Force update" box checked, Feeds is now updating my product stock as hoped.

caesius’s picture

StatusFileSize
new2.97 KB

Rerolling.

caesius’s picture

Status: Needs work » Needs review
joshua.boltz’s picture

This patch is working great on Drupal 8.7.11 using Feeds and Feeds Extensible Parsers to import from 3rd party JSON API.
Previously, I had to append a query string parameter to the end of the API URL, such as https://jsonplaceholder.typicode.com/posts?abc, but now, with the "Always Download" setting added with this patch, that removes the need to do that each and every time I wanted to run the import, even if the previous import was unsuccessful and Drupal actually didn't have any feed items from the API yet, but due to caching of the headers, it was still requiring to use a query string so Feeds could try a re-import.

megachriz’s picture

Issue tags: +beta target

Adding this as a nice to have before beta phase.

ollie222’s picture

The patch in #11 applied cleanly to the latest dev feeds and so far things look and it's solved a problem for me.

I'd like to see this feature make it to the release version too.

megachriz’s picture

Status: Needs review » Needs work

@Ollie222
Do you want to write an automated test for this feature? This is needed to ensure that the feature keeps working in future versions. See also #8.

ollie222’s picture

@MegaChriz

While I've been using Drupal and creating custom modules and functions for many years now writing tests is not something that I've had huge exposure to although I'm aware it's becoming more and more important.

My understanding from the examples I've seen previously and the Drupal PHPUnit testing page is that there is quite a steep learning curve to get up to speed with writing tests. Is this thinking correct?

I don't have a great deal of free time at the moment and I don't think enough to learn something like this.

megachriz’s picture

@Ollie222

My understanding from the examples I've seen previously and the Drupal PHPUnit testing page is that there is quite a steep learning curve to get up to speed with writing tests. Is this thinking correct?

I think you are mostly correct with that, yes. Writing tests can take some time to master. I remember mocking things in PHPUnit took me some time to understand. I think browser tests are the easiest to start with as these are the closest to what you would do as an user in the browser. Kernel tests are useful for testing API's with database backend and unit tests for testing a specific class.

For this issue, I think we need a kernel test: it needs to be tested how imports behave with this option turned on and I don't think we need to test any user interaction.

bbox1’s picture

For those looking for an easy drop-in solution, please see my solution below. Our use case was such that we wanted a new entity created on every Feeds fetch, even if the remote data had not changed.

I defined a new FeedsFetcher plugin, that extends the HttpFetcher plugin, and placed it in my module's directory under: my_module/src/Feeds/Fetcher/HttpUnconditionalFetcher.php

The code:

use Drupal\feeds\FeedInterface;
use Drupal\feeds\Feeds\Fetcher\HttpFetcher;

/**
 * Class HttpUnconditionalFetcher
 *
 * @FeedsFetcher(
 *   id = "unconditional_http",
 *   title = @Translation("Download unconditionally from url"),
 *   description = @Translation("Downloads data from a URL using Drupal's HTTP request handler. Data will be downloaded even when there has been no change since the last request."),
 *   form = {
 *     "configuration" = "Drupal\feeds\Feeds\Fetcher\Form\HttpFetcherForm",
 *     "feed" = "Drupal\feeds\Feeds\Fetcher\Form\HttpFetcherFeedForm",
 *   }
 * )
 */
class HttpUnconditionalFetcher extends HttpFetcher {

  /**
   * @inheritDoc
   */
  protected function getCacheKey(FeedInterface $feed) {
    // Prevent attaching If-None-Match and If-Modified-Since headers so that
    // the target file will always be downloaded.
    // @see \Drupal\feeds\Feeds\Fetcher\HttpFetcher::get()
    // @see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since
    return FALSE;
  }

Hope it's helpful to someone somewhere.

megachriz’s picture

Status: Needs work » Needs review
StatusFileSize
new6.06 KB
new5.52 KB

In addition of the patch in #11, this skips saving data to the cache when the option "Always download" is enabled.

Also adds tests, ported from the D7 version:

  • FeedsFileHTTPTestCase::testHTTPCacheDisabled()
  • FeedsFileHTTPTestCase::testChangedSource()

megachriz’s picture

Status: Needs review » Fixed

Committed #20. Thanks to everyone who posted or reviewed patches!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.