When indexing multilingual content (pages) created with the i18n module (using language prefixes), both the site and url fields in the Solr index are wrong: they always point to the node in the default language instead of the language specified for the node in question. This can be fixed by explicitly passing the entity language in all url() calls in _apachesolr_index_process_entity_get_document() (in apachesolr.index.inc).

I changed it as follows (this might not be the most performant way to do it, but it seems to work):

function _apachesolr_index_process_entity_get_document($entity, $entity_type) {
  list($entity_id, $vid, $bundle) = entity_extract_ids($entity_type, $entity);

  $document = new ApacheSolrDocument();
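  // Look up the entity's language object so url() builds language-specific links.
  $languages = language_list();
  $urlOptions = array('absolute' => TRUE);
  if (!empty($entity->language) && isset($languages[$entity->language])) {
    // Language-neutral entities ('und') have no entry in language_list().
    $urlOptions['language'] = $languages[$entity->language];
  }
  $document->id = apachesolr_document_id($entity_id, $entity_type);
  $document->site = url(NULL, $urlOptions);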
  $document->hash = apachesolr_site_hash();

  $document->entity_id = $entity_id;
  $document->entity_type = $entity_type;
  $document->bundle = $bundle;
  $document->bundle_name = entity_bundle_label($entity_type, $bundle);

  $path = entity_uri($entity_type, $entity);
  // A path is not a requirement of an entity
  if (!empty($path)) {
    $document->path = $path['path'];
    $document->url = url($path['path'], $path['options'] + $urlOptions);
  }
  if (empty($entity->language)) {
    // 'und' is the language-neutral code in Drupal 7.
    $document->language = LANGUAGE_NONE;
  }
  else {
    $document->language = $entity->language;
  }

  // Path aliases can have important information about the content.
  // Add them to the index as well.
  if (function_exists('drupal_get_path_alias')) {
    // Add any path alias to the index, looking first for language specific
    // aliases but using language neutral aliases otherwise.
    $output = drupal_get_path_alias($document->path, $document->language);
    if ($output && $output != $document->path) {
      $document->path_alias = $output;
    }
  }
  return $document;
}
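
To show the effect of the change in isolation, here is a minimal standalone sketch. It does not use Drupal: example_build_url() and the $prefixes table are hypothetical stand-ins for url() and language_list(), and the prefix values are made up. It only illustrates why a URL built without the entity's language ends up pointing at the default-language page.

```php
<?php
// Hypothetical language code => path prefix table (made-up values).
// In Drupal 7 the real information would come from language_list().
$prefixes = array('en' => '', 'nl' => 'nl', 'fr' => 'fr');

/**
 * Illustrative helper (not part of Drupal or apachesolr): builds an absolute
 * URL and adds the path prefix that belongs to the given language code.
 */
function example_build_url($base, $path, $langcode, array $prefixes) {
  $prefix = isset($prefixes[$langcode]) ? $prefixes[$langcode] : '';
  // Drop empty segments so the default language gets no prefix.
  $parts = array_filter(array($prefix, $path), 'strlen');
  return rtrim($base, '/') . '/' . implode('/', $parts);
}

// Without the entity language, every node falls back to the default language.
echo example_build_url('http://example.com', 'node/12', 'en', $prefixes), "\n";
// Passing the entity language yields the prefixed URL.
echo example_build_url('http://example.com', 'node/12', 'nl', $prefixes), "\n";
```

With these assumed prefixes, the first call prints http://example.com/node/12 and the second prints http://example.com/nl/node/12.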

Comments

nick_vh’s picture

Status: Active » Postponed (maintainer needs more info)

Can you highlight the changes or post a patch with your changes?

How to make a patch -> http://drupal.org/node/707484

wimvds’s picture

Will do. I also fixed a small related issue on the results page: URLs were regenerated there instead of using those already stored in the Solr index, which caused the same problem.

wimvds’s picture

nick_vh’s picture

Status: Postponed (maintainer needs more info) » Needs review
Attached file: new, 2.3 KB

Updated the code to be a bit more robust and Drupal API friendly

nick_vh’s picture

Attached file: new, 2.29 KB

Fixed a whitespace issue.

nick_vh’s picture

Status: Needs review » Patch (to be ported)

Does this apply to 6.x-3.x? I'm sure i18n is not as developed in D6 as it is in D7?

nick_vh’s picture

Version: 7.x-1.x-dev » 6.x-3.x-dev
nick_vh’s picture

Status: Patch (to be ported) » Needs review
Attached file: new, 2.18 KB
nick_vh’s picture

Status: Needs review » Fixed

fixed

nick_vh’s picture

Status: Fixed » Closed (fixed)

Closing to clear out the issue queue a bit.

marc angles’s picture

Version: 6.x-3.x-dev » 7.x-1.1
Status: Closed (fixed) » Active

Hi,
I'm running 7.x-1.1.

I still don't get the right URL for non-English nodes. When a non-English node appears in the search results, the URL displayed for it is the URL of the source (English) node.

Is this supposed to work correctly? If so, I'll look elsewhere.

Thanks

nick_vh’s picture

Status: Active » Closed (works as designed)

This is supposed to work correctly. Could it be that you are using a different translation system?

If you can reproduce this problem, please tell us what you are working with and give us as much information as possible. I'd also check the apachesolr_multilingual project to see if it has some additional information. I'm closing this; please open a follow-up ticket.

pwolanin’s picture

Title: Site and url wrong when indexing multilingual content (using i18n module) » Site and url wrong when indexing multilingual content (using i18n module) - causes REGRESSION
Version: 7.x-1.1 » 7.x-1.x-dev
Status: Closed (works as designed) » Active

This is a serious regression caused by indexing the absolute URL: it breaks things for anyone who indexes over http and views over https, or who indexes on a back-end server.

pwolanin’s picture

Status: Active » Needs review
Attached file: new, 2.47 KB

This rolls back part of the change and renders the URL from the path at display time, which is the only correct way.
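
A rough standalone sketch of the display-time idea, not the patch itself: only the relative path and a language code are kept in the result, and the full URL is built against whatever base URL the current request uses. example_result_url(), the 'langcode' key and the prefix table are hypothetical names chosen for illustration. In Drupal 7 the equivalent step would presumably be a url() call on the stored path at render time.

```php
<?php
/**
 * Illustrative helper (hypothetical, not apachesolr code): turns a stored
 * relative path into a URL for the site that is currently being viewed.
 *
 * $doc holds a relative 'path' and an assumed 'langcode' key, $base_url is
 * the base URL of the current request, and $prefixes maps language codes to
 * path prefixes (made-up values).
 */
function example_result_url(array $doc, $base_url, array $prefixes) {
  $langcode = isset($doc['langcode']) ? $doc['langcode'] : '';
  $prefix = isset($prefixes[$langcode]) ? $prefixes[$langcode] : '';
  $path = $prefix === '' ? $doc['path'] : $prefix . '/' . $doc['path'];
  return rtrim($base_url, '/') . '/' . ltrim($path, '/');
}

// Only the path was stored at index time, not an absolute URL.
$doc = array('path' => 'node/12', 'langcode' => 'fr');
$prefixes = array('en' => '', 'fr' => 'fr');

// The same stored document renders correctly for either host and scheme.
echo example_result_url($doc, 'https://www.example.com', $prefixes), "\n";
echo example_result_url($doc, 'http://backend.example.com', $prefixes), "\n";
```

Because nothing host-specific is stored, indexing over http on a back-end server and viewing over https no longer conflict in this sketch.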

nick_vh’s picture

Good to go - this also fixes https://drupal.org/node/1852088

cilefen’s picture

#14 works for me in terms of returning https links when searching on https.

pwolanin’s picture

Status: Needs review » Fixed

committed to 7.x and 6.x-3.x

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

pwolanin’s picture

Doh, that was wrong too: the common schema won't index $doc->language.

pwolanin’s picture

Status: Closed (fixed) » Needs review
Attached file: new, 3.56 KB
pwolanin’s picture

Attached file: new, 3.55 KB
pwolanin’s picture

Status: Needs review » Fixed

committed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.