Problem

This module does not seem to support multiple languages. The external_id sent to Vragen.ai does contain the language code, but when retrieving results the module does not take the language of the current request into account.

This means that the results might be in Dutch while the user is currently viewing the site in English. This is especially unfortunate when only a small amount of content on a site is translated.

Steps to reproduce

  • Set up a site with two languages
  • Index content in both languages
  • Search through the content: results in mixed languages are returned

Proposed resolution

The current behaviour within Vragen.ai seems fine: content can be indexed in multiple languages, and it is returned in whatever language it was found. Semantically, the language does not matter much.

I propose that, when transforming the search results back into SAPI items, we take the language into account and filter out results that are in the wrong language. The language could also be sent to Vragen.ai as metadata, so that requests to Vragen.ai can pass the current language and filter on it.
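The filtering step could look roughly like the sketch below. It is written in Python for brevity rather than in the module's PHP, and it assumes the external_id ends in a colon-separated language code (for example "entity:node/12:nl"); that shape, the Result class and the function names are illustrative assumptions, not the module's actual API.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for one hit returned by Vragen.ai."""
    external_id: str  # assumed shape: "entity:node/12:nl"
    score: float


def langcode_of(external_id: str) -> str:
    """Return the trailing language code of a ':'-separated ID."""
    return external_id.rsplit(":", 1)[-1]


def filter_by_language(results: list[Result], langcode: str) -> list[Result]:
    """Keep only results whose ID ends in the requested language code."""
    return [r for r in results if langcode_of(r.external_id) == langcode]


results = [
    Result("entity:node/12:nl", 0.91),
    Result("entity:node/12:en", 0.88),
    Result("entity:node/7:nl", 0.75),
]

# Only the English translation of node 12 survives the filter.
english_only = filter_by_language(results, "en")
```

Filtering after retrieval is simple, but it can leave fewer results than were requested, which is one reason to also send the language as metadata and filter on the Vragen.ai side.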

Remaining tasks

  • Adding languages as standard metadata
  • Filtering search results based on languages

User interface changes

We might want to introduce an option for the languages of results: either return mixed languages or return only strict matches for the current language.

API changes

Additional metadata being sent to Vragen.ai by default.

Data model changes

None I can see.

Comment  File             Size      Author
#5       3559026-2.patch  12.85 KB  jelleglebbeek
#2       3559026-1.patch  1.62 KB   jelleglebbeek

Comments

jelleglebbeek created an issue. See original summary.

jelleglebbeek (comment #2, attached 3559026-1.patch, 1.62 KB):

I've created an initial approach to adding language support, attached the patch here.

jelleglebbeek:

Assigned: Unassigned » jelleglebbeek
Status: Active » Needs review

jelleglebbeek (comment #5, attached 3559026-2.patch, 12.85 KB):

I've changed the approach to work with metadata and to send only content in its canonical language to Vragen.ai, since language does not matter much for semantic search. This assumes that a translation is semantically similar to its canonical version.

This way we can properly fall back to a default language, or show only content in the user's language.
I've attached the new patch and updated the MR.
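
A rough sketch of that fallback logic, in Python for brevity rather than the module's PHP. It assumes each matched canonical item exposes the set of language codes it is translated into; the function name, the strict flag and the sample data are hypothetical and are not taken from the patch.

```python
from typing import Optional


def pick_translation(
    available: set[str], current: str, default: str, strict: bool = False
) -> Optional[str]:
    """Choose which translation of a matched item to show, or None to drop it."""
    if current in available:
        return current
    # Non-strict mode falls back to the site's default language.
    if not strict and default in available:
        return default
    return None


# Hypothetical data: canonical item ID -> language codes it is translated into.
translations = {
    "node/12": {"nl", "en"},
    "node/7": {"nl"},
}

for item_id, langs in translations.items():
    print(item_id, pick_translation(langs, current="en", default="nl"))
```

With strict=True the untranslated item would be dropped instead of shown in the default language, which corresponds to the "only strict matches" option mentioned in the summary.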

bbrala:

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.