This project is not covered by Drupal’s security advisory policy.

Overview

Utility for populating content entities from HTML using plugins.

On its own, this module does nothing. It is a tool to assist with writing custom code for tasks such as migration, where you need to modify an entity based on the contents of an HTML document.

To use this module you will need to write one or more plugins, and then call the service to execute the code within these plugins.

There is no user interface.

Usage

Step 1 - write a plugin

A plugin is responsible for taking a parsed HTML document and populating an entity in some way. Plugins typically populate a single content type, or a single field across multiple content types. However, this module does not place any restrictions on what a plugin can do.

Example:

#[HtmlToEntity('node_title')]
class SetNodeTitleFromH1 extends HtmlToEntityPluginBase {

  // A plugin decides whether to act on a particular entity.
  // In this case, we can set the title of any node.
  public function appliesToEntity(ContentEntityInterface $entity): bool {
    return $entity->getEntityTypeId() === 'node';
  }

  // A plugin decides whether to act upon or to ignore a document
  // based on the document's URI. In this case we only act on a
  // subset of pages within a website being scraped.
  public function appliesToUri(string $uri): bool {
    return str_starts_with($uri, 'https://www.example.com/news/');
  }

  // A plugin takes a document and does something to the entity.
  // In this case we use the text within the <h1> for the node's title.
  public function populate(ContentEntityInterface $entity, HTMLDocument $document): void {
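    // Extract the text of the first <h1>, if any. This assumes $document is
    // a PHP 8.4 \Dom\HTMLDocument, which provides querySelector().
    $h1_text = trim($document->querySelector('h1')?->textContent ?? '');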
    if ($h1_text) {
      $entity->setTitle($h1_text);
    }
  }

}
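The extraction step inside populate() can be tried outside Drupal. The following is a minimal sketch, assuming that HTMLDocument refers to PHP 8.4's \Dom\HTMLDocument (the source does not state the namespace). The helper name extract_h1_text() is hypothetical and is not part of this module.

```php
<?php

declare(strict_types=1);

use Dom\HTMLDocument;

/**
 * Returns the whitespace-normalized text of the first <h1>, or NULL if the
 * document has no <h1> or the heading is empty.
 *
 * Hypothetical helper for illustration only; not provided by this module.
 */
function extract_h1_text(HTMLDocument $document): ?string {
  $h1 = $document->querySelector('h1');
  if ($h1 === NULL) {
    return NULL;
  }
  // Collapse runs of whitespace, including newlines from source formatting.
  $text = trim(preg_replace('/\s+/u', ' ', $h1->textContent ?? '') ?? '');
  return $text === '' ? NULL : $text;
}

$html = '<!DOCTYPE html><html><head><title>t</title></head><body><h1> Council
  <em>news</em> </h1><p>Body</p></body></html>';
$document = HTMLDocument::createFromString($html, LIBXML_NOERROR);

// Prints "Council news".
echo extract_h1_text($document), PHP_EOL;
```

Returning NULL for a missing or empty heading lets the calling plugin skip setTitle() rather than overwrite an existing title with an empty string.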

Step 2 - apply plugins

Call the service with an existing entity (which may or may not already be saved) and an HTML document:

$entity   = ... ;  // load or create a ContentEntityInterface
$document = ... ;  // obtain an HTMLDocument object
$logger   = ... ;  // optional LoggerInterface. If set, this is passed to plugins

\Drupal::service(\Drupal\html_to_entity\HtmlToEntityInterface::class)
  ->populate($entity, $document, $logger);
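One way to obtain the document object is to parse a string of markup. This is a minimal sketch, assuming the HTMLDocument type above is PHP 8.4's \Dom\HTMLDocument. The function name parse_html() and the sample markup are illustrative and are not part of this module.

```php
<?php

declare(strict_types=1);

use Dom\HTMLDocument;

/**
 * Parses an HTML string into a document object.
 *
 * Hypothetical helper for illustration only; not provided by this module.
 */
function parse_html(string $html): HTMLDocument {
  // LIBXML_NOERROR suppresses parse warnings, which are common with
  // malformed real-world markup.
  return HTMLDocument::createFromString($html, LIBXML_NOERROR);
}

// Markup as it might arrive from an HTTP response body.
$html = <<<'HTML'
<!DOCTYPE html>
<html lang="en">
  <head><title>Example</title></head>
  <body><h1>Bin collection dates</h1><p>Updated weekly.</p></body>
</html>
HTML;

$document = parse_html($html);

// Prints "Bin collection dates".
echo $document->querySelector('h1')?->textContent, PHP_EOL;
```

For markup stored on disk, \Dom\HTMLDocument::createFromFile() takes a path instead of a string.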

Requirements

PHP 8.4 or higher.

Related modules

You may find it useful to combine this module with:

Supporting organizations: 
Development as part of the LocalGov Drupal project
