Problem/Motivation

The extractor modules overlap a lot, which is why we created the Document Loader: it loads any type of file into text (or Markdown/HTML).

Once the plugins are added (Unstructured and any future document loaders), anyone can use the Document Loader to abstract the loading of documents.

So, in Automator, you say that you want a document loader that converts DocX to MD. Unless you go into the advanced settings because you need a specific loader, you do not have to say which one to use, only that one is available. The same applies to a Tool API Tool.

That way, we can move agents or automators to recipes that do not have to care which document loader is being used, only that one exists.
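
The idea of asking for a capability ("DocX to MD") rather than a specific loader can be sketched roughly as follows. This is only an illustration in Python, not the module's actual API: the `Loader` and `LoaderRegistry` classes, the `resolve()` method, the `preferred` argument, and the `example_docx` plugin ID are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Loader:
    """Hypothetical stand-in for one document loader plugin."""

    plugin_id: str
    # (source format, target format) pairs this loader can handle.
    conversions: Tuple[Tuple[str, str], ...]

    def supports(self, source: str, target: str) -> bool:
        return (source, target) in self.conversions


class LoaderRegistry:
    """Hypothetical registry that picks a loader by capability."""

    def __init__(self) -> None:
        self._loaders: List[Loader] = []

    def register(self, loader: Loader) -> None:
        self._loaders.append(loader)

    def resolve(
        self, source: str, target: str, preferred: Optional[str] = None
    ) -> Loader:
        # Keep every loader that can do the requested conversion.
        candidates = [l for l in self._loaders if l.supports(source, target)]
        # "Advanced" case: the caller pins one specific loader.
        if preferred is not None:
            candidates = [l for l in candidates if l.plugin_id == preferred]
        if not candidates:
            raise LookupError(f"No document loader for {source} -> {target}")
        # Default case: the first registered match is good enough.
        return candidates[0]


registry = LoaderRegistry()
registry.register(Loader("unstructured", (("docx", "md"), ("pdf", "md"))))
registry.register(Loader("example_docx", (("docx", "md"), ("docx", "html"))))

# A recipe only states the capability it needs.
default_loader = registry.resolve("docx", "md")
# An advanced user can still pin a specific loader.
pinned_loader = registry.resolve("docx", "md", preferred="example_docx")
```

In this sketch a recipe would only carry the `("docx", "md")` requirement, so it keeps working whichever loader plugin happens to be installed, as long as at least one supports that conversion.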

Proposed resolution

  • Create the Document Loader Plugin

Remaining tasks

User interface changes

API changes

Data model changes

Comments

ahmad khader created an issue.

ahmad khader

Issue summary: View changes
marcus_johansson

Issue tags: +sprint candidate
arianraeesi

ahmad khader

We may need to implement #3577857: Allow Document Loader to define LLM inputs, or something similar, before we can integrate with the Document Loader.

ahmad khader

Status: Active » Needs review

The text fields have been migrated to the Document Loader. However, tables and images still use our existing automation processes. To migrate all automation processes to the Document Loader, we need to add support for these fields/outputs to the Document Loader.

ahmad khader

This issue also depends on #3577857: Allow Document Loader to define LLM inputs, since we need to configure the options per loader so that we can pass our configuration to the automators.

ahmad khader

Assigned: ahmad khader » Unassigned
marcus_johansson

Assigned: Unassigned » marcus_johansson
marcus_johansson

Assigned: marcus_johansson » Unassigned
Status: Needs review » Needs work
ahmad khader

Status: Needs work » Needs review
marcus_johansson

Assigned: Unassigned » marcus_johansson

Reviewing.

marcus_johansson

Assigned: marcus_johansson » Unassigned
Status: Needs review » Reviewed & tested by the community

Ready to merge from my side.