Problem/Motivation

When a content type has a media field containing PDF files, we want to index the text inside each PDF and store it in Typesense's vector database to enable AI embeddings.

Comments

robertoperuzzo created an issue.


robertoperuzzo’s picture

@lussoluca I'm not able to fix this PHPStan error:

$ php vendor/bin/phpstan analyze $_WEB_ROOT/modules/custom/$CI_PROJECT_NAME $PHPSTAN_CONFIGURATION --no-progress || EXIT_CODE=$?
 ------ --------------------------------------------------------------------- 
  Line   src/Attribute/EmbeddingModel.php                                     
 ------ --------------------------------------------------------------------- 
  32     Drupal\search_api_typesense\Attribute\EmbeddingModel::__construct()  
         does not call parent constructor from                                
         Drupal\Component\Plugin\Attribute\Plugin.                            
 ------ --------------------------------------------------------------------- 
 [ERROR] Found 1 error  

Any advice?

lussoluca’s picture

This has been fixed in the latest 1.0.x version.

robertoperuzzo’s picture

Assigned: robertoperuzzo » Unassigned
Status: Active » Needs review
robertoperuzzo’s picture

Assigned: Unassigned » robertoperuzzo
Status: Needs review » Needs work
robertoperuzzo’s picture

I'm closing this issue because we can approach this in two steps:

  1. first, we can use the search_api_attachments module to extract the text from PDFs (and not only PDFs; see #3519494: Add a plugin for Unstructured.io)
  2. then, create the embeddings from the extracted text
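
As a rough illustration of step 2, the Python sketch below shows what the Typesense side could look like once the text has already been extracted. It is not code from this module. The field names (`title`, `extracted_text`, `embedding`), the collection name, and the helper functions are hypothetical, and the model `ts/all-MiniLM-L12-v2` is only an example choice. It assumes Typesense's auto-embedding feature, where a `float[]` field declares an `embed` block and the server generates the vector from the listed source fields.

```python
from typing import Any

# Hypothetical field name for the text produced by the extraction step.
TEXT_FIELD = "extracted_text"


def build_collection_schema(name: str) -> dict[str, Any]:
    """Return a Typesense collection schema with an auto-embedding field.

    The "embedding" field is filled in by Typesense itself from TEXT_FIELD,
    so the client only has to send the extracted text.
    """
    return {
        "name": name,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": TEXT_FIELD, "type": "string"},
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": [TEXT_FIELD],
                    # Example model only; any supported model could be used.
                    "model_config": {"model_name": "ts/all-MiniLM-L12-v2"},
                },
            },
        ],
    }


def build_document(doc_id: str, title: str, extracted_text: str) -> dict[str, str]:
    """Return a document payload for text that was already extracted."""
    # Collapse the whitespace runs that PDF extraction often leaves behind.
    cleaned = " ".join(extracted_text.split())
    return {"id": doc_id, "title": title, TEXT_FIELD: cleaned}


schema = build_collection_schema("pdf_content")
document = build_document("1", "Annual report", "Revenue grew\n\n  in 2024. ")
```

The schema and document are plain dictionaries, so they can be passed to whichever Typesense client is in use. Keeping extraction separate from the schema mirrors the two-step split described above.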
robertoperuzzo’s picture

Assigned: robertoperuzzo » Unassigned
Status: Needs work » Closed (outdated)