Hi,

Are there any plans yet to update this module to Drupal 8?

Best regards,
Frank

Comments

Frank Ralf created an issue.

dman’s picture

Status: Active » Postponed

Sorry, no.
I haven't been able to come up with a game plan for upgrading the current system to Drupal 8. Too many things have changed, and it would be a total rebuild.
Along the way, I'd be moving towards closer compatibility with either the Feeds API or the Migrate module, and towards a plugin system for the processors.

Import_html in its current form cannot be ported to Drupal 8, and it won't have a new form until I spend at least another year getting up to speed with D8.

Frank Ralf’s picture

Hi Dan,

Thanks for your quick reply. I suspected as much. I'm just starting to take a closer look at Drupal 8 myself. IMO making this module compatible with Feeds and/or Migrate makes a lot of sense. Is there anything I can do to move things in that direction? I had a look at the Drupal 8 roadmap for Feeds (#1960800: [meta] Feeds 8.x roadmap) and there's still a lot to do at that end as well. So I could have a look at the parts of Feeds where Import HTML might fit in. What do you think?

Best regards,
Frank

dman’s picture

I did formulate something of a roadmap - up to the point where I had so many plans that I didn't know where to start.

I even did a prototype at one point.

Key ideas included:

  1. There is probably too much that needs to be done at the 'scraping' end to be able to configure the import_html magic as just a Feeds 'fetcher'... So I was considering an alternate UI that dropped in to emulate part of the Feeds UI, *by consuming Feeds plugins where possible* but just presenting them in a different context.
  2. I already had much of the vast collection of data-massaging cleanups abstracted into plugins in D7, and many of these would become feeds_tamper-like plugins. Though I'd also need pre-parse and post-import events to hook into.
  3. I really need to revisit the whole thing using the advanced Batch API and real, fully-managed task queues - so that scrapes can be paused, monitored, backgrounded and re-run.
  4. I have a trial overhaul of the user interface - a totally graphical point-and-click DOM selector (somewhat like Yahoo! Pipes - or these days, scraper.io and similar). That was actually doing well until the project I was sponsored for fell through.
  5. I thought of hooking in to the https://www.readability.com/ API as one of the plugins to try. I had a proof-of-concept that really worked there.
  6. The D8/Symfony web components and DOM scrapers look like they will let me cut out a load of code.
  7. I plan to drop the XSL-centric process that lives at the heart of the system entirely. XSL hasn't proven to be as popular as I'd predicted, and in reality, DOMXPath alone does what we most often want. That's what the later plugins I added really brought to the table.
  8. However, a major reason for choosing XSL in the old days was the idea that we would be able to share a 'scraper template' for common website shapes (e.g. other common CMSs). I'd have to surface our "import configurations" as exportables. Apparently this will be pretty much required by D8 CMI, so that's another thing I don't have to build by hand. Likewise, I can drop the existing 'Features' support.
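
To make points 7 and 8 a bit more concrete, here is a rough sketch of what an XPath-only, exportable 'scraper template' could look like. It is an illustration only, not import_html code: it is written in Python rather than PHP, the template format and the names `TEMPLATE_JSON`, `PAGE` and `extract` are made up for the example, and it uses the standard library's ElementTree, which only understands a subset of XPath and needs well-formed markup (real-world HTML would need a more tolerant parser).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical exportable "scraper template": field name -> XPath expression.
# Stored as JSON so it could be shared between sites with the same page shape.
TEMPLATE_JSON = """
{
  "title": ".//h1",
  "body": ".//div[@id='content']/p"
}
"""

# A small, well-formed sample page (assumption: the input parses as XML).
PAGE = """<html><body>
<div id="nav"><p>Home</p></div>
<h1>About us</h1>
<div id="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>"""


def extract(page_source, template):
    """Apply each XPath in the template and collect the matched text."""
    root = ET.fromstring(page_source)
    result = {}
    for field, xpath in template.items():
        # itertext() gathers the text of the node and all of its descendants.
        result[field] = [
            "".join(node.itertext()).strip() for node in root.findall(xpath)
        ]
    return result


template = json.loads(TEMPLATE_JSON)
fields = extract(PAGE, template)
print(fields)
```

The point of keeping the template as plain data is that the same mapping can be exported, versioned and reused for another site with the same layout, without any XSL involved.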

However, all of these ideas require getting most of the 'basics' in place first. And I've not even started that journey with D8, and am going to try to do a few much simpler D8 modules first.

Frank Ralf’s picture

Title: Update to Drupal 8? » Drupal 8 port - roadmap sketch

Thanks for this detailed information. I've updated the title of this issue to better match the current content so others interested will find it.

Best regards,
Frank