solr 5.x teaser fix when using i18n entity config from apachesolr_confgen

I used the config from apachesolr_confgen to generate my i18n schema.xml (apachesolr_confgen helped a lot, thanks).

However, on the results page I am using custom_search, which outputs a teaser text. The teaser works in English but doesn't work in the other language(s) (in my case French).

I made this patch and it's working for me.
Quite simple: it works now in English AND French using the "teaser" field.

This might also be the case with other versions of Solr (4.x). In any case, this is what I did and it worked.

See patch.

Comment  File                            Size     Author
#1       solr5x_teaser-2538774-1.patch   1.28 KB  joseph.olstad

Comments

joseph.olstad

Title: solr 5.x teaser fix when using i18n entity config from apachesolr_confgen » teaser fix
File size: 1.28 KB
joseph.olstad

To test this use case:

1) Set up the apachesolr_multilingual module and its dependencies (in my test I was using custom_search for the search results; not sure if that makes a difference).
2) Make sure entity translation / field translation is enabled for the content type used for testing.
3) Translate your test node from the source language (say English) to another language (the test language).
4) Index your content, then check the search results in English; you'll see the teaser.
5) Switch to the test language and search for your test node. You'll see it in the results list, but the teaser text is just "...", so the teaser is missing.

To fix this and test: apply the patch, DELETE YOUR SOLR INDEX, then repeat steps 1 through 4. On step 5 you will see the teaser text working for the test language.

Enjoy.

mkalkbrenner

Status: Needs review » Postponed (maintainer needs more info)
+++ b/apachesolr_multilingual.module
@@ -274,9 +274,17 @@ function apachesolr_multilingual_copy_common_to_i18n_fields($src_document, $dst_
+        $dst_document->{$field_name} =
...
+        truncate_utf8($src_document->{'content'}, 300, TRUE);

In your setup $src_document->teaser doesn't seem to be set.
Therefore this looks like the wrong place to fix the issue.

I don't see why this problem should be related to solr 5.x. Please tell us more about your setup.

joseph.olstad

Status: Postponed (maintainer needs more info) » Needs review

@mkalkbrenner I updated the issue title and comments to clarify the use case a bit more.

We're using field translation, not node translation. If we were using node translation the teaser might work, but we're not. I suspect this was an easy oversight, something missed during QA/testing, or perhaps a regression.

Most people these days would want to use field translation, as we do in our use case, so we need to read the translation from field_body_data when the language is French, German, Dutch, Swedish or Spanish rather than the default language. The teaser might work unpatched with node translation, but I haven't tested this module with node translation recently. With node translation each node always has a single language and the field languages don't matter, because each translation has its own entity_id and its fields follow the node language. With entity translation / field translation fully enabled, the node language is always the source language, so we cannot base all our field reads on the node language or we'll miss the other languages.

With field translation / entity translation it's the fields that carry the language. Field translation does not use tnid in these cases, nor does the node language really matter anymore (in fact we should not use the node language for much of anything at this point).
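To illustrate the difference described above, here is a minimal standalone sketch in plain PHP (no Drupal bootstrap). The node object is a hand-built stand-in that only imitates the Drupal 7 layout of a translatable field, and example_body_value() is a hypothetical helper, not part of Drupal, apachesolr or apachesolr_multilingual.

```php
<?php
// Hypothetical node shaped like a Drupal 7 node under field translation:
// a single node with one node-level language, and field values keyed by
// language code.
$node = new stdClass();
$node->nid = 1;
$node->language = 'en'; // Always the source language in this mode.
$node->body = array(
  'en' => array(array('value' => 'Hello world')),
  'fr' => array(array('value' => 'Bonjour le monde')),
);

// Illustrative helper (an assumption for this example only): read the
// body value for one language code, or NULL when there is none.
function example_body_value($node, $langcode) {
  return isset($node->body[$langcode][0]['value'])
    ? $node->body[$langcode][0]['value']
    : NULL;
}

// Reading by node language only ever yields the source-language value.
$by_node_language = example_body_value($node, $node->language);

// Reading by the requested field language yields the translation.
$by_field_language = example_body_value($node, 'fr');
```

Any code that keys its field reads on $node->language alone will behave like the first call and never see the French value.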

This patch works in field translation mode. A more elegant fix might live somewhere else, or might even involve rewriting parts of the apachesolr module, but the patch I created does the job without touching other apachesolr modules. The way I figure, only those using apachesolr_multilingual really care that much about multilingual.

joseph.olstad

In my tests prior to the patch, the teaser worked only in the source language; it doesn't work in other languages until you apply the patch. (My multilingual settings use field translation mode, and it is enabled for the content type.)

joseph.olstad

From my investigation of the code, the problem is upstream in the apachesolr module: it ignores the field language on the node when it generates the teaser for the index from the "document", assuming the node language is what matters when in fact the node language doesn't help at all. apachesolr_multilingual must deal with this by grabbing the field values in the appropriate language, and previously it didn't handle the teaser except for the default language.

So as far as my investigation is concerned, the patch is required. It generates the teaser (missing except in the source language) from the content field for the appropriate language.

joseph.olstad

Title: teaser fix » teaser fix for languages other than "source language"
mkalkbrenner

Status: Needs review » Needs work
  1. +++ b/apachesolr_multilingual.module
    @@ -274,9 +274,17 @@ function apachesolr_multilingual_copy_common_to_i18n_fields($src_document, $dst_
    +        //Solr 5.x with this module apachesolr_multilingual, apachesolr_search, custom_search added this to get the teaser.
    

    Can you confirm that this issue isn't related to Solr 5.x at all?

  2. +++ b/apachesolr_multilingual.module
    @@ -274,9 +274,17 @@ function apachesolr_multilingual_copy_common_to_i18n_fields($src_document, $dst_
    +        truncate_utf8($src_document->{'content'}, 300, TRUE);
    

    I don't think that this hard-coded creation of a teaser is correct. In apachesolr_convert_entity_to_documents() this is only the fallback! There are different ways to build a teaser that we have to deal with.

mkalkbrenner

function apachesolr_index_node_solr_document(ApacheSolrDocument $document, $node, $entity_type, $env_id) {
  
  ...

  // Build the node body.
  $language = !empty($node->language) ? $node->language : LANGUAGE_NONE;
  $build = node_view($node, 'search_index', $language);
  // Remove useless html crap out of the render.
  unset($build['#theme']);
  // Allow cache if it's present
  $build['#cache'] = true;
  // Render it into html
  $text = drupal_render($build);
  $document->content = apachesolr_clean_text($text);

  // Adding the teaser
  if (isset($node->teaser)) {
    $document->teaser = apachesolr_clean_text($node->teaser);
  }
  else {
    // If there is no node teaser we will have to generate the teaser
    // ourselves. We have to be careful to not leak the author and other
    // information that is normally also not visible.
    if (isset($node->body[$language][0]['safe_summary'])) {
      $document->teaser = apachesolr_clean_text($node->body[$language][0]['safe_summary']);
    }
    else {
      $document->teaser = truncate_utf8($document->content, 300, TRUE);
    }
  }
  
  ...

}

The teaser field creation in apachesolr.index.inc seems to be correct. Please note that the apachesolr module doesn't deal with entity translation at all, but apachesolr_multilingual already handles that. It's most obvious in the context of CLIR:

// CLIR for Entity Translation.

// Temporarily switch the language context for apachesolr_index_node_solr_document(),
// which is not aware of entity_translation.
$entity->language = $langcode;
$language_content = $languages[$langcode];
apachesolr_multilingual_index_node_translation($document, $entity, $env_id);
$entity->language = $original_langcode;
$language_content = $languages[$original_langcode];

But the same logic is applied later in function apachesolr_multilingual_apachesolr_index_documents_alter():

foreach ($additional_documents_langcodes as $langcode) {
  
  ...
  
  // Temporarily switch the language context for apachesolr_index_node_solr_document(),
  // which is not aware of entity_translation.
  $original_langcode = $entity->language;
  $entity->language = $langcode;
  $language_content = $languages[$langcode];

  ...

}

I checked some sites that use entity translation. The teaser is indexed correctly for each language.
So I assume that some other module is causing the trouble or we missed a use-case in apachesolr_multilingual_apachesolr_index_documents_alter().
Please have a look at that code and debug it in your environment. Your current patch only fixes the symptom in your environment in a non-generic way.

joseph.olstad

Hi @mkalkbrenner, thanks for looking into this issue.

You've pointed out the code upstream in apachesolr that causes the teaser problem:

function apachesolr_index_node_solr_document(ApacheSolrDocument $document, $node, $entity_type, $env_id) {
  
  ...
// Build the node body.
  $language = !empty($node->language) ? $node->language : LANGUAGE_NONE;
...
 else {
    // If there is no node teaser we will have to generate the teaser
    // ourselves. We have to be careful to not leak the author and other
    // information that is normally also not visible.

Here in this code snippet the $language is always going to be the "source language" because the node only has one language in node translation. In field translation there is only one node but with 1 or more languages for the fields.

apachesolr_multilingual must fix this problem; that is why I created the patch.

If you look at the code in apachesolr_multilingual_copy_common_to_i18n_fields, it copies fields that exist, but the teaser is an optional field (see the code comments in apachesolr: when the teaser doesn't exist, it is created from the content). It is created in apachesolr from the content field, it has no field translation, and the teaser we do see is only created for the "source language", because apachesolr is unaware of field translation, as seen in the code snippet above.

This is why we have to take the field value for "content" and generate a teaser using the truncate_utf8 function, as apachesolr does. We have to do this because in Drupal the teaser is not a field. Content is a field, but teaser is not, so there is no field value for teaser; we have to create it from the field value for "content".

My patch does exactly this.

I don't think this is a Solr 5.x problem. Have you tested this in Solr 4.x?
Can you confirm that the teaser is output in the non-source languages?

According to all the dpm and debug output I collected on the $dst_document array, the teaser was not there except for the "source language", but the content field was. That's why I recreated the teaser from the content field; the teaser only exists in the "source language".

joseph.olstad

So in my case I have no node teaser at all (not enabled in the content type). apachesolr creates it from the node content field, but only in the source language.

snippet from apachesolr:

...
 else {
    // If there is no node teaser we will have to generate the teaser
    // ourselves. We have to be careful to not leak the author and other
    // information that is normally also not visible.

So then apachesolr_multilingual_copy_common_to_i18n_fields tries to grab the teaser, which doesn't exist except in the source language, where it was created by copying/truncating the content in the apachesolr_index_node_solr_document function in apachesolr.

The patch above copies/truncates the i18n content field into the empty teaser in apachesolr_multilingual_copy_common_to_i18n_fields.

Make sense?

joseph.olstad

Status: Needs work » Needs review

I'm not sure what work needs to be done. Maybe someone can test the patch with two or more languages enabled in ET mode when the teaser is not enabled (so the teaser is generated by apachesolr, or by apachesolr_multilingual with the patch).

mkalkbrenner

Status: Needs review » Needs work

Please read through my comment #9 again.

But I'll try to explain it once more.

Here in this code snippet the $language is always going to be the "source language" because the node only has one language in node translation. In field translation there is only one node but with 1 or more languages for the fields.

That's absolutely correct! And we both agree on the problematic part in apachesolr module itself:

// Build the node body.
$language = !empty($node->language) ? $node->language : LANGUAGE_NONE;

But we already deal with that in apachesolr_multilingual to support entity or field translation. We iterate over all (entity) translations and "fake" the source language dynamically:

// Temporarily switch the language context for apachesolr_index_node_solr_document(),
// which is not aware of entity_translation.
$entity->language = $langcode;

This code runs before function apachesolr_index_node_solr_document(). So in combination with apachesolr_multilingual the "source language" isn't static anymore.
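The switch-and-restore pattern described here can be sketched in plain PHP as follows. This is only a simplified illustration under stated assumptions: example_index_teaser() is a hypothetical stand-in for apachesolr_index_node_solr_document() that looks at nothing but $node->language, and example_teasers_per_language() is not the actual apachesolr_multilingual code.

```php
<?php
// Hypothetical stand-in for the upstream indexer: it is not aware of
// field translation and only consults $node->language ('und' mirrors
// Drupal's LANGUAGE_NONE).
function example_index_teaser($node) {
  $language = !empty($node->language) ? $node->language : 'und';
  return isset($node->body[$language][0]['value'])
    ? $node->body[$language][0]['value']
    : '';
}

// Illustrative loop (an assumption, not the module's code): temporarily
// "fake" the node language for each translation, call the indexer, then
// restore the original language.
function example_teasers_per_language($node, array $langcodes) {
  $teasers = array();
  $original_langcode = $node->language;
  foreach ($langcodes as $langcode) {
    $node->language = $langcode;
    $teasers[$langcode] = example_index_teaser($node);
    $node->language = $original_langcode;
  }
  return $teasers;
}

$node = new stdClass();
$node->language = 'en';
$node->body = array(
  'en' => array(array('value' => 'English body')),
  'fr' => array(array('value' => 'Corps français')),
);
$teasers = example_teasers_per_language($node, array('en', 'fr'));
```

In this sketch each language gets its own teaser even though the indexer itself never learns about field translation, and the node language is back to its original value once the loop finishes.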

Please don't get me wrong. You definitely encountered an issue in your setup, but the patch is simply wrong. We need to figure out why the already existing teaser creation for entity translation doesn't work instead of just creating it again.

joseph.olstad

Hi mkalkbrenner, I believe the answer is right here in this code snippet from apachesolr:

   if (isset($node->body[$language][0]['safe_summary'])) {
      $document->teaser = apachesolr_clean_text($node->body[$language][0]['safe_summary']);
    }
    else {
      $document->teaser = truncate_utf8($document->content, 300, TRUE);
    }
  }

In my configuration, safe_summary is disabled.

apachesolr then takes $document->content, truncates it to 300 characters, and copies it to $document->teaser.

safe_summary is keyed by $language, but $document->content somehow ends up empty for documents in a language other than the source language, according to my observations (notice that in the code snippet above there is no $language "key" on $document->content).

So although, as you say, we "iterate over all (entity) translations and "fake" the source language dynamically", there is no $language key used or "faked" in this case. Hence the need for the patch to make this work when "safe_summary" is disabled on the content type.

So I should probably add this explanation to the issue summary or issue title. The teaser problem arises when "safe_summary" is not set (disabled in the content type structure).

OK, so I can see how this became a bit of an edge case, as "safe_summary" is, I believe, a content type structure setting. Perhaps in my patch we could add a check for an empty teaser value, so that the execution path only changes when the teaser is empty or a zero-length string. I'd have to re-run the debug to see what my actual value was (a zero-length string, an empty array, or something like that). Unfortunately I'm on vacation and have no access to my dev box where I have this set up.
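A guard of the kind suggested above might look roughly like this. It is a standalone sketch, not the patch itself: example_truncate() is a hypothetical stand-in for Drupal's truncate_utf8() that cuts at a character count only, example_fill_missing_teaser() is an invented name, and the document is a plain stdClass rather than an ApacheSolrDocument.

```php
<?php
// Hypothetical stand-in for truncate_utf8(): cuts at a character count
// only and falls back to substr() when mbstring is not available.
function example_truncate($text, $max) {
  return function_exists('mb_substr')
    ? mb_substr($text, 0, $max, 'UTF-8')
    : substr($text, 0, $max);
}

// Only generate a teaser from the content when the existing teaser is
// missing, an empty array, or an empty / whitespace-only string.
function example_fill_missing_teaser($document) {
  $teaser = isset($document->teaser) ? $document->teaser : '';
  if (is_array($teaser)) {
    $teaser = implode(' ', $teaser);
  }
  if (trim($teaser) === '' && !empty($document->content)) {
    $document->teaser = example_truncate($document->content, 300);
  }
  return $document;
}

// A document that already has a teaser is left alone.
$with_teaser = new stdClass();
$with_teaser->teaser = 'Existing teaser';
$with_teaser->content = 'Full content';
example_fill_missing_teaser($with_teaser);

// A document with an empty teaser gets one generated from its content.
$without_teaser = new stdClass();
$without_teaser->teaser = '';
$without_teaser->content = str_repeat('a', 400);
example_fill_missing_teaser($without_teaser);
```

With a check like this the execution path would only change for documents whose teaser is effectively empty, whichever of the empty shapes the debug output turns out to show.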

Does this help?