Problem
This module does not appear to support multiple languages. The external_id sent to Vragen.ai contains the language code, but the module does not take the query language into account when retrieving results.
As a result, results may be in Dutch while the user is viewing the site in English. This is especially unfortunate when only a small part of a site's content is translated.
Steps to reproduce
- Set up a site with two languages.
- Index content in both languages.
- Search the content: results in mixed languages are returned.
Proposed resolution
The current behaviour within Vragen.ai seems fine: content can be indexed in multiple languages, and it is returned when it matches. Semantically, the language does not matter much.
I propose that, when transforming the search results back into SAPI items, we take the language into account and filter out results in the wrong language. The language could also be sent to Vragen.ai as metadata, so that requests to Vragen.ai can pass the current language and filter on it.
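To illustrate the filtering step, here is a minimal sketch in Python (not the module's PHP code). It assumes the external_id follows a Search API style item ID such as `entity:node/12:en`, with the language code after the last colon. That format, and the names `Result`, `langcode_of` and `filter_by_language`, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    # Hypothetical shape of a result returned by the search backend.
    external_id: str  # assumed format: "entity:node/12:en"
    score: float


def langcode_of(external_id: str) -> str:
    """Return the part after the last colon (assumed to be the language code)."""
    return external_id.rsplit(":", 1)[-1]


def filter_by_language(results: List[Result], langcode: str) -> List[Result]:
    """Keep only results whose external_id ends in the requested language code."""
    return [r for r in results if langcode_of(r.external_id) == langcode]
```

Filtering after retrieval is simple, but it can shrink a page of results below the requested size. Passing the language as metadata and filtering on the Vragen.ai side would avoid that.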
Remaining tasks
- Adding languages as standard metadata
- Filtering search results based on languages
User interface changes
We might want to introduce an option for the languages of results: return mixed languages, or return only strict matches.
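A rough sketch of how such an option could behave, written in Python for illustration. The names `LanguageMode` and `apply_language_mode` and the two setting values are hypothetical and do not come from the module.

```python
from enum import Enum
from typing import List, Tuple


class LanguageMode(Enum):
    # Hypothetical setting values for the proposed option.
    MIXED = "mixed"    # return results in any language
    STRICT = "strict"  # return only results in the current language


def apply_language_mode(
    results: List[Tuple[str, str]],
    current_langcode: str,
    mode: LanguageMode,
) -> List[Tuple[str, str]]:
    """Filter (item_id, langcode) pairs according to the configured mode."""
    if mode is LanguageMode.MIXED:
        return list(results)
    return [r for r in results if r[1] == current_langcode]
```

Which mode should be the default is a separate question. Defaulting to mixed would preserve the current behaviour for existing sites.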
API changes
Additional metadata is sent to Vragen.ai by default.
Data model changes
None that I can see.
| Comment | File | Size | Author |
|---|---|---|---|
| #5 | 3559026-2.patch | 12.85 KB | jelleglebbeek |
| #2 | 3559026-1.patch | 1.62 KB | jelleglebbeek |
Issue fork search_api_vragen_ai-3559026
Comments
Comment #2
jelleglebbeek commented
I've created an initial approach to adding language support and attached the patch here.
Comment #4
jelleglebbeek commented
Comment #5
jelleglebbeek commented
I've changed the approach to work with metadata and to send only content in its canonical language to Vragen.ai, since language does not matter much for semantic search. This assumes that a translation is semantically similar to the canonical version.
This way we can properly fall back to a default language, or show only content in the user's language.
I've attached the new patch and updated the MR.
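The fallback logic described in this comment could look roughly like the following Python sketch. It is not taken from the patch. It assumes each indexed item carries metadata listing the languages it is available in. The function name `resolve_language` and the `allow_fallback` flag are hypothetical.

```python
from typing import Optional, Set


def resolve_language(
    available: Set[str],
    user_lang: str,
    default_lang: str,
    allow_fallback: bool,
) -> Optional[str]:
    """Pick the language in which to display a result, or None to hide it.

    `available` is the assumed set of language codes the item exists in,
    taken from the metadata stored alongside the canonical content.
    """
    if user_lang in available:
        return user_lang
    if allow_fallback and default_lang in available:
        return default_lang
    return None
```

With `allow_fallback` disabled, only items that exist in the user's language are shown. With it enabled, untranslated items appear in the default language instead.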
Comment #6
bbrala