This project is not covered by Drupal’s security advisory policy.
Overview
Utility for populating content entities from HTML using plugins.
On its own, this module does nothing. It is a tool to assist with writing custom code for tasks such as migration, where there is a need to do something to an entity based on the contents of an HTML document.
To use this module you will need to write one or more plugins, and then call the service to execute the code within these plugins.
There is no user interface.
Usage
Step 1 - write a plugin
A plugin is responsible for taking a parsed HTML document and populating an entity in some way. Plugins typically populate a single content type, or a single field across multiple content types. However, this module does not place any restrictions on what a plugin can do.
Example:
#[HtmlToEntity('node_title')]
class SetNodeTitleFromH1 extends HtmlToEntityPluginBase {

  // A plugin decides whether to act on a particular entity.
  // In this case, we can set the title of any node.
  public function appliesToEntity(ContentEntityInterface $entity): bool {
    return $entity->getEntityTypeId() === 'node';
  }

  // A plugin decides whether to act upon or to ignore a document
  // based on the document's URI. In this case we only act on a
  // subset of pages within a website being scraped.
  public function appliesToUri(string $uri): bool {
    return str_starts_with($uri, 'https://www.example.com/news/');
  }

  // A plugin takes a document and does something to the entity.
  // In this case we use the text within the <h1> for the node's title.
  public function populate(ContentEntityInterface $entity, HTMLDocument $document): void {
    // Assumes HTMLDocument is PHP 8.4's Dom\HTMLDocument, which provides querySelector().
    $h1_text = trim($document->querySelector('h1')?->textContent ?? '');
    if ($h1_text !== '') {
      $entity->setTitle($h1_text);
    }
  }

}
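The extraction step inside populate() can also be tried outside Drupal. The following is a minimal sketch, assuming the HTMLDocument type above is PHP 8.4's Dom\HTMLDocument (the page does not say so explicitly). The function name first_h1_text() is hypothetical and not part of this module.

```php
<?php

declare(strict_types=1);

use Dom\HTMLDocument;

/**
 * Hypothetical helper (not part of the module): returns the text of the
 * first <h1> in the document, or NULL when there is none or it is empty.
 */
function first_h1_text(HTMLDocument $document): ?string {
  $h1 = $document->querySelector('h1');
  if ($h1 === NULL) {
    return NULL;
  }
  // Collapse runs of whitespace so line breaks in the markup do not end up
  // in the extracted text, then trim the ends.
  $text = trim(preg_replace('/\s+/u', ' ', $h1->textContent ?? '') ?? '');
  return $text === '' ? NULL : $text;
}

// Example input; LIBXML_NOERROR silences warnings about imperfect markup.
$document = HTMLDocument::createFromString(
  "<!DOCTYPE html><html><body><h1> Budget <em>approved</em>\n </h1></body></html>",
  LIBXML_NOERROR,
);
echo first_h1_text($document), PHP_EOL; // prints "Budget approved"
```

Returning NULL for a missing or empty heading lets the plugin leave the existing title untouched, which matches the guard in the example above.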
Step 2 - apply plugins
Call the service with an entity (which may or may not already be saved) and an HTML document:
$entity = ... ; // load or create a ContentEntityInterface
$document = ... ; // obtain an HTMLDocument object
$logger = ... ; // optional LoggerInterface; if set, it is passed to plugins
\Drupal::service(\Drupal\html_to_entity\HtmlToEntityInterface::class)
  ->populate($entity, $document, $logger);
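One way to obtain the document is to parse an HTML string that has already been fetched. This is a sketch only, again assuming that HTMLDocument refers to PHP 8.4's Dom\HTMLDocument. The helper name html_document_from_string() is hypothetical.

```php
<?php

declare(strict_types=1);

use Dom\HTMLDocument;

/**
 * Hypothetical helper: parses an HTML string into a Dom\HTMLDocument.
 */
function html_document_from_string(string $html): HTMLDocument {
  // LIBXML_NOERROR suppresses the warnings PHP would otherwise raise for
  // markup that is not valid HTML5, which is common in scraped pages.
  return HTMLDocument::createFromString($html, LIBXML_NOERROR);
}

// The parser repairs the missing doctype and the unclosed <p>.
$document = html_document_from_string('<title>News</title><p>Unclosed paragraph');
echo $document->querySelector('p')->textContent, PHP_EOL; // prints "Unclosed paragraph"
```

Dom\HTMLDocument::createFromFile() is the equivalent entry point when the HTML is read from a path rather than held in a string.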
Requirements
PHP 8.4 or higher.
Related modules
You may find it useful to combine this module with:
- HTML Transformer API, for modifying HTML prior to populating formatted text fields.
- Media on Demand, for turning ad-hoc media URLs into media entities.
Project information
- Project categories: Developer tools
- Created by erik.erskine
