I have created a custom field to store a single fulltext value. However, Search API is automatically creating that field in the Solr schema with the prefix "tm", meaning fulltext multi-valued. This is not what I want, as you cannot do fulltext searching on a multi-valued field.

I believe this issue is related to #2903920: Can not sort on aggregated fields. After looking at the code in SearchApiSolrBackend::getSolrFieldNames(), I see that if a field has no datasource, this method throws an exception and assumes that the field should be treated as multi-valued. My custom field is implemented as having no datasource, thus it triggers the exception and multi-value handling.

As far as I know, custom fields are supposed to be implemented as having no datasource, so I'm not sure how I should address this issue. However, if I'm misunderstanding something and there is a way to implement a custom field using a datasource so that the correct cardinality can be enforced, please let me know.

For now, I am working around the issue using hook_search_api_solr_field_mapping_alter to manually override the Solr schema field name for my custom field.
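A minimal sketch of that workaround, assuming a hypothetical module named mymodule and a hypothetical Search API field ID my_custom_field. The helper function and its regular expression are illustrative only and are not part of Search API Solr. The hook name comes from this issue; older releases may not pass $language_id, hence the default value. The sketch swaps the cardinality letter in the generated name, for example tm_ becomes ts_:

```php
<?php

/**
 * Hypothetical helper: turns a multi-valued Solr field name into its
 * single-valued form by swapping the cardinality letter (tm_ -> ts_).
 */
function mymodule_single_valued_name(string $solr_name): string {
  // Assumption: the name starts with a short type prefix followed by "m_".
  return preg_replace('/^([a-z]{1,3})m_/', '$1s_', $solr_name);
}

/**
 * Implements hook_search_api_solr_field_mapping_alter().
 *
 * $index is left untyped so the sketch runs outside Drupal; in a module it
 * would be a \Drupal\search_api\IndexInterface.
 */
function mymodule_search_api_solr_field_mapping_alter($index, array &$fields, string $language_id = '') {
  // 'my_custom_field' is a placeholder for the Search API field ID.
  if (isset($fields['my_custom_field'])) {
    $fields['my_custom_field'] = mymodule_single_valued_name($fields['my_custom_field']);
  }
}
```

Documents that were indexed under the old name would presumably need to be re-indexed before the renamed field contains data.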

Comments

mrkdboyd created an issue. See original summary.

mkalkbrenner’s picture

Status: Active » Postponed (maintainer needs more info)

> This is not what I want, as you cannot do fulltext searching on a multi-valued field.

What do you mean exactly? I don't think that this is the case.

mrkdboyd’s picture

Yes, you are correct. I think I was moving a little too fast when I wrote that sentence.

Regardless, my use case is that I needed to create a single-value text field so that I could do field boosting based on whether the value in this field matches the search query. And while I was able to create a new custom Solr field without any issue, by default the field was created in Solr as "tm_*", thus multi-valued.

So the real issue is that I could only seem to control single value vs multi-value behavior for my custom field using hook_search_api_solr_field_mapping_alter, but I wasn't sure if there is a better way to go about it.

mrkdboyd’s picture

Status: Postponed (maintainer needs more info) » Needs review
mkalkbrenner’s picture

Status: Needs review » Active

There's no patch to review.

mikemadison’s picture

I ran into this issue recently, but in a very specific context (and with an aggregated field).

Use case: create a search results view that contains both content entities and media entities. This works great, but has limitations given the hard separation between fields across entity types. Aggregated fields work as a way to bridge that gap, giving facets and views the ability to filter / sort on data from multiple entities.

My problem occurs with Views SORTING oddly, not with filtering or displaying fields.
Locally, in Drupal VM running Solr 5.x, Views has no problem sorting on the aggregated field.
On a cloud server running Solr 4.x, Views acts as if there are no results at all. The instant I remove the aggregated field from the sorting on the view, the view returns results.

As far as I can tell, there is no difference in behavior after flushing caches, re-indexing, etc.

I have confirmed with devel that the data IS being indexed into the aggregated field.

Using hook_search_api_solr_field_mapping_alter to rewrite the $fields object to treat the aggregated field as a single instance field instead of a multi-instance field solves the problem.

I would propose that we need some method of flagging aggregated fields as single or multi-value fields (or default to single) so that they function properly in views.

FYI this is, as far as I can tell, an entirely silent failure. Nothing in any logs tripped. We just got no results.

mkalkbrenner’s picture

> I would propose that we need some method of flagging aggregated fields as single or multi-value fields (or default to single) so that they function properly in views.

That should not be required. For every multi-valued field, the framework already adds a corresponding single-valued field.
So there must be an error somewhere. Could you please verify that this single-valued field is created in the index and contains data?
If it is filled correctly, please check the query that is sent to Solr regarding the sort parameter. Does it target the correct field?

froboy’s picture

I'm seeing similar behavior: custom fields always get written to the index as multivalued, and sometimes they get duplicated as single and sometimes not. When you say:

> For every multi-valued field, the framework already adds a corresponding single-valued field.

do you know where that functionality is taking place? Is it at the Drupal layer or in the solr schema?

I'm on Drupal 8.3.7, Search API Solr 8.x-1.2, indexing to hosted Solr 6.1.0. I have a custom processor and I'm logging $field->getValues() and seeing Array ( [0] => wut wut why? ) but when I inspect the index directly I only see "sm_federated_field":["wut wut why?"], and not the single valued counterpart.

mkalkbrenner’s picture

> but when I inspect the index directly I only see "sm_federated_field":["wut wut why?"], and not the single valued counterpart.

The single value counterpart for sorting should be named "sort_federated_field".

mkalkbrenner’s picture

Version: 8.x-1.x-dev » 8.x-2.x-dev
Status: Active » Postponed (maintainer needs more info)

Does the issue still exist in 8.x-2.x?
We need to be precise about the term "custom field" because there's a defined meaning for custom field types in 8.x-2.x.
Can you describe exactly how you add such a custom field?

marcvangend’s picture

Status: Postponed (maintainer needs more info) » Active

I am still seeing this behavior. I created a processor following the documentation to index the entity type of the indexed item. My plugin looks like this:

namespace Drupal\my_custom_search\Plugin\search_api\processor;

use Drupal\search_api\Datasource\DatasourceInterface;
use Drupal\search_api\Item\ItemInterface;
use Drupal\search_api\Processor\ProcessorPluginBase;
use Drupal\search_api\Processor\ProcessorProperty;

/**
 * Adds the item's Entity Type to the indexed data.
 *
 * @SearchApiProcessor(
 *   id = "add_entity_type",
 *   label = @Translation("Entity Type field"),
 *   description = @Translation("Adds the item's Entity Type to the indexed data."),
 *   stages = {
 *     "add_properties" = 0,
 *   },
 *   locked = true,
 *   hidden = true,
 * )
 */
class AddEntityType extends ProcessorPluginBase {

  /**
   * {@inheritdoc}
   */
  public function getPropertyDefinitions(DatasourceInterface $datasource = NULL) {
    $properties = [];

    if (!$datasource) {
      $definition = [
        'label' => $this->t('Entity type'),
        'description' => $this->t('The type of the entity'),
        'type' => 'string',
        'is_list' => FALSE,
        'processor_id' => $this->getPluginId(),
      ];
      $properties['search_api_entity_type'] = new ProcessorProperty($definition);
    }

    return $properties;
  }

  /**
   * {@inheritdoc}
   */
  public function addFieldValues(ItemInterface $item) {
    $entity_type = $item->getDatasource()->getEntityTypeId();
    if ($entity_type) {
      $fields = $item->getFields(FALSE);
      $fields = $this->getFieldsHelper()
        ->filterForPropertyPath($fields, NULL, 'search_api_entity_type');
      foreach ($fields as $field) {
        $field->addValue($entity_type);
      }
    }
  }

}

As you can see I already added 'is_list' => FALSE but in Solr I see "sm_entity_type": [ "media" ] where I was expecting "ss_entity_type": "media".

I did not yet run into problems because of this, I just noticed the unexpected outcome.

nsciacca’s picture

I have a similar problem and confirm that on Solr 5 the results are returned whereas on Solr 4 they are not when I'm sorting on my custom processor field. My value is an integer and when it is set to 0 then only the multivalued version of the field is indexed, not the single sort. When it is set to anything other than 0, it properly indexes both the multivalued and single valued.

@mikemadison - can you explain what you did in "Using hook_search_api_solr_field_mapping_alter to rewrite the $fields object to treat the aggregated field as a single instance field instead of a multi-instance field solves the problem." to fix?

mkalkbrenner’s picture

> When it is set to anything other than 0, it properly indexes both the multivalued and single valued.

This sounds like a different issue. It might be that we check for empty somewhere to eliminate empty strings and remove 0 integers by accident.
Can you open an issue for that?
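The suspicion above fits how PHP's empty() behaves: it returns TRUE for the integer 0 and the string '0' as well as for '' and NULL. The sketch below is not taken from the module; example_has_value() is a hypothetical helper that only illustrates how a stricter check would keep zero values:

```php
<?php

/**
 * Hypothetical check that rejects only NULL and the empty string.
 */
function example_has_value($value): bool {
  return $value !== NULL && $value !== '';
}

$values = [0, '0', '', NULL, 5];

// empty() treats 0 and '0' as empty, so they are filtered out here.
$kept_with_empty = array_values(array_filter($values, function ($value) {
  return !empty($value);
}));

// The stricter check keeps 0 and '0' and drops only '' and NULL.
$kept_with_strict_check = array_values(array_filter($values, 'example_has_value'));
```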

mkalkbrenner’s picture

> As you can see I already added 'is_list' => FALSE but in Solr I see "sm_entity_type": [ "media" ] where I was expecting "ss_entity_type": "media".

Ok, it turned out that there's an issue in Search API itself! See #3053603: Entity-typed processor properties don’t support isList().
This issue has been fixed now but there's no new release of Search API yet.

Once released we have to integrate the pending patch in #3050475: SearchApiSolrBackend is setting incorrect prefix to Search API reverse entity references.

Therefore this issue here is now a duplicate.

marcvangend’s picture

Thanks @mkalkbrenner, good to know the problem was found and fixed.

Has the dev version of Search API been tested with existing Search API Solr installations? I am concerned that upgrading to the next Search API release (probably 8.x-1.14) will cause problems. When the generated field names (currently sm_*, tm_*, etc) suddenly change (becoming ss_*, ts_*, etc), will the search index be updated accordingly? If not, will searches break because the query tries to select ss_entity_type while the index still contains sm_entity_type? If yes, what if my custom code contains a hard-coded Solr query on the sm_entity_type field which no longer exists?

mkalkbrenner’s picture

Re-indexing should transparently fix any issues.

But if you hardcoded something like 'sm_foo' somewhere in your code, that will break!

I have already seen such *wrong* implementations. If you used Search API the usual way for customizations, re-indexing should be enough. For example:

  $query->addCondition('foo', ...);

If you can't use a Search API query because you need to customize at a deeper layer such as Solarium, you should not hardcode field names either.
In this case you could call SearchApiSolrBackend::getSolrFieldNames() to get a list of the real Solr field names.

Or, more elegantly, use Drupal\search_api_solr\Utility\StreamingExpressionBuilder, which translates into the real Solr field name:

  $builder->_field('foo');
mkalkbrenner’s picture

If you can't re-index and need to keep your hardcoded stuff for whatever reason, you can force the name mapping by implementing an alter hook:

/**
 * Change the way the index's field names are mapped to Solr field names.
 *
 * @param \Drupal\search_api\IndexInterface $index
 *   The index whose field mappings are altered.
 * @param array $fields
 *   An associative array containing the index field names mapped to their Solr
 *   counterparts. The special fields 'search_api_id' and 'search_api_relevance'
 *   are also included.
 * @param string $language_id
 *   The language ID that applies for this field mapping.
 */
function hook_search_api_solr_field_mapping_alter(\Drupal\search_api\IndexInterface $index, array &$fields, string $language_id) {
  $fields['foo'] = 'sm_foo';
}

marcvangend’s picture

Thank you for the quick response and additional tips, Markus! I didn't know hook_search_api_solr_field_mapping_alter yet, good to know it exists although I hope I'll never need it ;-)

I think it's acceptable that hard-coded field names would break, field names are not an API after all. It would be good to mention this in the Search API release notes though: reindexing may be needed, hard coded queries may break.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

iyyappan.govind’s picture

Same problem here. I am using the following versions of Drupal core and modules:

Drupal core - 8.7.5
Search API - 8.x-1.4
Search API Solr - 8.x-3.2

The problem is that I have a first name field in the user profile. It is a single-value field. I have indexed this field as fulltext ngram. It is indexed in Solr as

tm_X3b_en_field_first_name

But it should be indexed as ts_X3b_en_field_first_name

When I retrieve the Solr field names in Drupal, the field is shown as ts_X3b_en_field_first_name. But why is it indexed as a multi-valued field in Solr? Please help me fix this issue.

Thanks

rahul231’s picture

I am facing the same problem.

The problem is that I have created a date field. It is a single-value field. I have indexed this field as fulltext. It is indexed in Solr as

dt_date_created_1 and ds_date_created_2

But it should be indexed as dt_date_created_1 / ds_date_created_2

When I retrieve the Solr field names in Drupal, the fields are shown as dt_date_created_1 / ds_date_created_2. But why is it indexed as a multi-valued field in Solr? Please help me fix this issue.

Thanks

artem_sylchuk’s picture

Posting a patch to a closed issue doesn't make much sense, but I faced exactly the same issue as described by @mrkdboyd.
It wasn't fixed by updating search_api to the most recent version.
I checked the code and, as @mrkdboyd said in the original post, fields without a datasource will always be treated as multi-valued, no matter whether is_list is set to TRUE or FALSE.
Here is the obvious patch, but as I'm not very familiar with the module's code I may be doing something wrong (perhaps the documentation can be improved?).
I hope it helps somebody else looking for a solution to the same problem.

mkalkbrenner’s picture

Version: 8.x-3.x-dev » 4.x-dev
Status: Closed (fixed) » Needs review
marcvangend’s picture

Thanks for the patch, James. I didn't test the code but I found the nested if-else structure a bit hard to read. May I suggest something like this?

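try {
  $datasource = $field->getDatasource();
  $definition = $field->getDataDefinition();
  if ($datasource) {
    // Derive the cardinality from the property path of the datasource.
    $pref .= $this->getPropertyPathCardinality($field->getPropertyPath(), $datasource->getPropertyDefinitions()) != 1 ? 'm' : 's';
  }
  elseif ($definition) {
    // No datasource: respect the is_list flag of the data definition.
    $pref .= $definition->isList() ? 'm' : 's';
  }
  else {
    throw new SearchApiException();
  }
}
catch (SearchApiException $e) {
  // Existing fallback as described in the issue summary: treat the field
  // as multi-valued when its cardinality cannot be determined.
  $pref .= 'm';
}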
mkalkbrenner’s picture

Status: Needs review » Needs work

It would be great to have a test for this.

mkalkbrenner’s picture

Status: Needs work » Needs review
Attached file (status: new, size: 1.58 KB)

I think the patch in #22 is wrong because it doesn't take into account that the property path could overrule the field data definition.

Taking this into account, the patch could be simplified further if we remove the "safety net" that is meant to avoid Solr index-time exceptions.

mkalkbrenner’s picture

Attached file (status: new, size: 8.73 KB)

To maintain BC the behavior should be configurable. The new default will be 'single'. The existing configs will be migrated to use 'multiple' as fallback to not change the existing behavior.
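The configurable behavior described above can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not the committed code: the function name, its parameters, and the use of the strings 'single' and 'multiple' as an argument are hypothetical. The cardinality convention (1 means single-valued, anything else multi-valued) mirrors the != 1 check quoted earlier in this thread, and NULL stands for a field whose cardinality cannot be determined, for example one without a datasource:

```php
<?php

/**
 * Hypothetical model of choosing the cardinality letter of a Solr prefix.
 *
 * @param string $type_prefix
 *   The type letter(s), for example 't' for fulltext or 's' for string.
 * @param int|null $cardinality
 *   1 for single-valued, any other integer for multi-valued, NULL if unknown.
 * @param string $fallback
 *   'single' or 'multiple'; only used when the cardinality is unknown.
 */
function example_solr_prefix(string $type_prefix, ?int $cardinality, string $fallback = 'single'): string {
  if ($cardinality !== NULL) {
    // Known cardinality always wins over the configured fallback.
    return $type_prefix . ($cardinality !== 1 ? 'm' : 's');
  }
  // Unknown cardinality: use the configured fallback.
  return $type_prefix . ($fallback === 'multiple' ? 'm' : 's');
}
```

With the default fallback, a string field of unknown cardinality maps to ss; with 'multiple' it keeps the older sm behavior.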

  • mkalkbrenner committed 42afa2a on 4.x
    Issue #2906905 by mkalkbrenner, james_kerrigan, marcvangend, Murz:...

mkalkbrenner credited Murz.

mkalkbrenner’s picture

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.