Hi there,
I tried to find out if this supports Arabic, but I couldn't. Can you please let me know if this can work just fine with Arabic as well?
Thanks
Comments
Comment #1
mkalkbrenner commented:
According to http://wiki.apache.org/solr/LanguageAnalysis#Arabic, it should work from a Solr perspective.
However, the current implementation of Apache Solr Multilingual does not support swapping out the stemmer. I'm turning this issue into a feature request.
Comment #2
cspitzlay commented:
That wiki page is marked as "mostly obsolete" (although I'd expect language support to improve rather than get worse over time).
They suggest looking at the example config file at
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/solr... instead.
That doesn't change the fact that the stemmer would need replacing, though.
Comment #3
memoday commented:
Thanks for your replies. When I compare the default Solr schema.xml against the one that comes with the Apache Solr integration module, I see that the default schema file has a section for most languages, including Arabic. I can search for Arabic words, but if a word contains diacritics, the search returns no results. Is there an easy way to force Apache Solr to ignore diacritics in Arabic?
Comment #4
memoday commented:
Thanks cspitzlay. As you can see in the example schema file you provided, there is an Arabic section.
My question is: if I added this to the schema file that comes with the Apache Solr integration module, would Arabic search work fine? I am not sure why languages were removed from the integration module's schema file.
All I need for now is the ability to ignore diacritics. Do you think adding that section would resolve the issue?
Comment #5
cspitzlay commented:
The file I linked to is an example configuration from the Solr project. If I understand correctly, it's not a suggested default configuration.
It's not like the Apache Solr Multilingual project removed any languages from a standard config.
Apache Solr Multilingual works the other way around.
It extends the schema provided by the Apachesolr Search Integration project which connects Drupal and Apache Solr but which supports only English well.
Apache Solr Multilingual makes it possible to have multiple languages at once and to tune things like stop words on a per-language basis.
So Apache Solr Multilingual actually adds multilingual features to a monolingual and somewhat hard-coded integration.
Your request made it clear that there are still some missing features, though. That's why mkalkbrenner accepted your issue as feature request.
What might work for you:
Configure Arabic via the Apache Solr Multilingual interface (for example, add the stop words you need), then *change* the generated schema so that it contains the filter configuration from the Arabic section mentioned above.
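To illustrate what "ignoring diacritics" means at the text level, here is a minimal Python sketch. It is not Solr code and not what the module does; the function name `strip_diacritics` is hypothetical, and the character range is a simplifying assumption that covers only the common short-vowel marks and tatweel. The idea is that the same stripping is applied to both the indexed text and the query, so that a word written with diacritics matches the same word written without them.

```python
import re

# Assumption for this sketch: treat U+064B..U+0652 (fathatan through sukun)
# and U+0640 (tatweel, the elongation character) as removable.
_ARABIC_MARKS = re.compile("[\u064B-\u0652\u0640]")


def strip_diacritics(text: str) -> str:
    """Return text with Arabic short-vowel marks and tatweel removed."""
    return _ARABIC_MARKS.sub("", text)


# "kitab" (book) written with a kasra and a fatha, and written bare.
indexed_word = "\u0643\u0650\u062A\u064E\u0627\u0628"
query_word = "\u0643\u062A\u0627\u0628"

# Without normalization the two strings differ, so an exact match fails.
print(indexed_word == query_word)

# With the same normalization applied on both sides they compare equal.
print(strip_diacritics(indexed_word) == strip_diacritics(query_word))
```

In Solr itself this kind of normalization would happen inside the analyzer chain of the field type, at both index and query time, which is why the generated schema has to be changed rather than the Drupal side.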
Comment #6
memoday commented:
Thanks Christian for your reply, and sorry for my belated one.
I will be available for testing the Arabic integration once it is available.
Thanks for your help!
Comment #7
mkalkbrenner commented:
This issue will be solved by #2361393: Stemmers supported in Solr 3.x and updated stopwords