Problem/Motivation

I can't get consistent results when filtering my search by Taxonomy tags. Some tags do not work and simply remove all content from the result set.

Steps to reproduce

My search server is a Milvus DB on a free account hosted on Zilliz.

I have an AI Search Index that is currently indexing 34 pieces of content from two content types.

Both content types have a body field and use the existing field_tags field referencing the Tags taxonomy. There are 19 tags; each piece of content can have multiple tags, but most have only a single tag. Content moderation has been disabled on my site. The content fields indexed are Body and Tags; the Tags field is identified as 'filterable attributes' with type Integer.

I have a search view that uses the index. It has a full-text search filter and a content filter on the Tags, both exposed to the user. The filter is a block that is displayed on my home page.

Throughout development, seemingly random tags (I have not identified a pattern) just don't work. After the last index rebuild, two tags did not work. To test, I select a tag in the filter and leave the search box empty; two of the tags return no results. If I set the filter to 'Any' and use the text search with something very specific to return a piece of content that is tagged with one of the non-working tags, I get the expected result. If I then set the filter to that tag, which is on the content, no results are returned.

It feels random because I have had issues with different tags. Sometimes (not always) I can 'fix' the tag by editing and saving the relevant content or by adding a working tag to it and then removing that tag.

In all cases, investigating the index in the Zilliz playground the field_tags meta field looks correct, regardless of whether the tag is working on the Drupal front-end or not.

Comments

chris_hall_hu_cheng created an issue. See original summary.

scott_euser’s picture

Project: AI (Artificial Intelligence) » Milvus VDB Provider
Version: 1.0.2 » 1.0.x-dev
Component: AI Search » Code
Priority: Major » Normal

You can try setting different types, eg integer instead of string and see if it helps, but Views Filtering support is in any case very limited as filtering options directly in providers like Milvus/Zilliz are not nearly as feature rich as Views from Drupal Core.

Moved to Milvus module since filtering support is specific per provider.

Beyond that, you can combine with a database or Solr index and use the Boost with AI Search plugins instead for full filter support. Downgrading priority given this valid workaround.

tim corkerton’s picture

@scott_euser I can confirm that I am seeing the same kind of random behavior when trying to filter an AI Search view using exposed taxonomy terms.
I have added the taxonomy field to the Search API field list and set "Filterable attributes" as the indexing option.

This issue, however, is not confined to Milvus/Zilliz. I have both a Zilliz account and a Pinecone account, and I am seeing identical behavior using Pinecone. It feels like an issue with the AI Search module. This issue might be better moved to another queue, or at least duplicated in the Pinecone provider.

Scott, I have just watched your video where you discuss the workaround suggested above. https://youtu.be/WZEh4JOGhhM?t=5989 (Great video by the way!) Can you confirm how this approach works? Does it simply prepend results from the AI search to the results of a Solr search? If so, I don't think that really solves my use case. Any suggestions on how we can help fix this? Can you point to where the exposed filters are handled in the code base?

I forgot to say that I have tried different types as suggested above, and setting integer for the filterable attributes seems to be the only one that works.

scott_euser’s picture

Filters still work on top of it, but the filtering needs to be in a View using the Search API Database/SOLR backend, not the AI Search backend.

It's impossible to maintain (or even achieve in the first place) feature parity with Views filters, so you really need to use the Database or Solr backend in the first place.

It doesn't prepend, it adds them into the query. Here is a pseudo query example

SELECT *
FROM index
WHERE
( keyword in :search_terms OR entity_id in :results_from_ai_search)
AND status is published
AND exposed filter is example
AND etc
ORDER BY CASE (order from AI Search).., relevance, etc

I.e., for keywords it adds an OR, but filters are still applied. So if you get 10 results from the vector database, 5 of those might still be subsequently excluded by filters. BUT those results also might not have been found without the vector database, if there is a semantic match but no actual keyword match (Solr has a known bug in the queue, so it isn't 100% like database yet).
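To make the pseudo query above concrete, here is a minimal Python sketch of the same logic. It is not the module's code: the `Item` class, `boosted_search()` function, and sample data are all hypothetical, and a simple substring check stands in for real keyword search.

```python
from dataclasses import dataclass


@dataclass
class Item:
    entity_id: int
    text: str
    published: bool
    tags: set


def boosted_search(index, keywords, ai_ids, tag=None):
    """(keyword match OR AI hit) AND filters, with AI hits ordered first."""
    # Position of each entity in the (hypothetical) vector search result.
    ai_rank = {entity_id: pos for pos, entity_id in enumerate(ai_ids)}
    results = []
    for item in index:
        keyword_hit = any(word.lower() in item.text.lower() for word in keywords)
        ai_hit = item.entity_id in ai_rank
        if not (keyword_hit or ai_hit):
            continue
        # Filters apply to every candidate, including the AI hits.
        if not item.published:
            continue
        if tag is not None and tag not in item.tags:
            continue
        results.append(item)
    # AI hits come first in AI order; the stable sort keeps the rest as found.
    results.sort(key=lambda item: ai_rank.get(item.entity_id, len(ai_rank)))
    return [item.entity_id for item in results]


index = [
    Item(1, "Feeding your cat", True, {10}),
    Item(2, "Kitten care basics", True, {10}),
    Item(3, "Feline nutrition", True, {20}),
    Item(4, "Cat toys", False, {10}),
]
ai_ids = [3, 2]  # pretend the vector database returned these for "cat"

print(boosted_search(index, ["cat"], ai_ids))          # [3, 2, 1]
print(boosted_search(index, ["cat"], ai_ids, tag=10))  # [2, 1]
```

Items 2 and 3 have no keyword match and are only found through the AI hits, while item 3 is still dropped once the tag filter is applied.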

In any case, if you want to use filterable attributes, it does need to be in the VDB provider issue queue, as that's where the code sits for the basic (nowhere near feature parity) filtering per provider. Pinecone filtering in their API is completely different from Milvus/Zilliz. For Milvus/Zilliz the code is at MilvusProvider::prepareFilters(), which attempts to convert the query conditions to the filter format described in the Milvus/Zilliz API documentation (again, very basic though).

scott_euser’s picture

For those stumbling across this, AI Search has a 'Boost with AI' search api processor plugin you can use on a database or SOLR search that allows a vector database to enhance the ranking [database search + solr search] and breadth of results (ie, finding results that would not otherwise be found) [database search only until #3491446: Solr 'boost' of results should find results that are not found by traditional Solr search is completed].

erykmynn’s picture

I think I ran into this same issue today. It seems to have to do with cardinality (multiple values vs single) and the way Milvus stores multi-value fields. Milvus seems to store values as an array only when there are multiple values, but as a raw integer ID (in my case) when there is only one (even if the field is multi-value).

What I think is happening is that the filtering in Views causes it to only search the multi-value entries if Drupal thinks the field is multi-valued. It should be possible to remedy this by having the query search both the array and the scalar form.
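The suspected mismatch can be illustrated with a small Python sketch. This only simulates the behavior described above and is not the provider's code: the `records` layout and all function names are hypothetical, and the stored shapes (list for several tags, bare integer for one) follow the observation in this comment.

```python
def array_only_match(stored, tag_id):
    """Match only when the stored value is a list containing tag_id."""
    return isinstance(stored, list) and tag_id in stored


def array_or_scalar_match(stored, tag_id):
    """Match a list containing tag_id, or a bare scalar equal to tag_id."""
    if isinstance(stored, list):
        return tag_id in stored
    return stored == tag_id


# Hypothetical stored metadata: a list when there are several tags,
# a raw integer when there is only one.
records = {
    "node/1": {"field_tags": [3, 7]},
    "node/2": {"field_tags": 7},
    "node/3": {"field_tags": 3},
}


def filter_by_tag(match, tag_id):
    """Return the keys of all records whose field_tags satisfy match."""
    return sorted(
        key for key, meta in records.items() if match(meta["field_tags"], tag_id)
    )


print(filter_by_tag(array_only_match, 7))       # ['node/1']
print(filter_by_tag(array_or_scalar_match, 7))  # ['node/1', 'node/2']
```

With the array-only check, content carrying a single tag silently drops out of the filtered results even though its stored value looks correct when inspected.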

I plan to work on a patch for this, but would love to know if this description could also explain the prior posters' conundrums.

erykmynn’s picture

Status: Active » Needs review
Attached file: status new, size 1.52 KB

Alright, I believe this is the fix for this, at least for what I experienced. Milvus stores the value as an array only when there are actually multiple values, regardless of the field definition. So to filter on multi-value fields we need a fallback to a non-array query, or it will miss things.

Bit rusty on Drupal patches as I mostly worked in 7.x before, but here's something.

scott_euser’s picture

Status: Needs review » Needs work

Thanks for working on this! Can you make it a merge request please? Hard to review or test like that.

erykmynn’s picture

Status: Needs work » Needs review

Thanks for any patience needed while I stumble through this process. I went ahead and ported to the other branches.

FWIW, the errors in 1.0.x and 2.0.x are in files I didn't touch. 1.0.x is a minor linting thing in a YAML file and 2.0.x is the whole AI 2 vs 1 changeover, so I think both of those already existed in their branches.