Writing Migrations
This documentation needs review. See "Help improve this page" in the sidebar.
The AI Prompt
For now, basic system and user prompts are hard-coded into the module. The prompts work along these lines:
System
- "You are an API that attempts to parse HTML content and find the important parts of a web page. You will return this as a JSON object with this schema..."
- The module appends a json serialized schema describing the bundle you're migrating to.
User
- Along the lines of, "Here are some basic instructions to follow ... include HTML, make urls absolute, etc."
- The module appends the processed HTML from the source url.
In the near future, prompts will be configurable in the migration yml.
Migration Configuration
Source Parsers
AI Migration contains a data parser Ai plugin that sends the source HTML content (and prompts) to the AI provider. You will need to specify both the http data fetcher plugin (provided by Migrate Plus) and the Ai data parser and your migration.
Under source.urls in the migration yml, provide a list of urls that you would like to migrate. We recommend starting with one or two to test out the system. Migrating many source urls without proper testing and tweaking could be quite expensive and time-consuming.
Here's an example:
source:
plugin: url
data_fetcher_plugin: http
data_parser_plugin: ai
urls:
- https://accessibility.civicactions.com/posts/prioritizing-accessibility-bugs-for-maximum-impact
- https://accessibility.civicactions.com/posts/delivering-digital-first-turning-21st-century-idea-into-actionHTML Processing/Sanitzation
Removing superfluous HTML will result in fewer hallucinations and lower costs! HTML processing settings are available under the source.ai section. These allow you to limit/sanitize the amount of source HTML sent to the AI provider. See the simple_content_migration.yml file in the ai_migration_example submodule for an example.
Process Plugins
There is one required process plugin: row_passthrough, which is provided by AI Migration. This plugin passes data from the source plugin directly through to the destination plugin. Note the key has to be row.
process:
row:
plugin: row_passthrough
source: urlThe need for process plugins is potentially eliminated, depending on the prompt. However, process plugins do work as you would expect them to. However, they should be included after the row_passthrough plugin.
process:
row:
plugin: row_passthrough
source: url
status:
plugin: default_value
default_value: 0
promote:
plugin: default_value
default_value: 0
sticky:
plugin: default_value
default_value: 0
langcode:
plugin: default_value
default_value: enHelp improve this page
You can:
- Log in, click Edit, and edit this page
- Log in, click Discuss, update the Page status value, and suggest an improvement
- Log in and create a Documentation issue with your suggestion