Problem

For certain common short (one, two, or three word) searches, the "best" or "most important" result doesn't appear first or possibly not even in the first page of results.

Current Status

Much improved - per [#73] many of our exemplar searches are now returning the desired result.

We are currently evaluating additional configuration tweaks to improve this further.

Input / Expected results

To assist with fixing this issue, the D.o search maintainers need a variety of example searches - both searches that are working now (so we can test to make sure those do not regress) and searches which do not yet work.

For each example search we need:

  • The search terms
  • The ideal result
  • Any 'okay' results (not ideal, but still good results)
  • Any 'bad' results that might currently be dominant for that term

Those search terms will be added to this script which is used to evaluate the changing rankings of search term results:

http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php

Please provide your suggestions as comments to this issue, or as patches for this script.
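
As a rough illustration of the information requested above, one example search could be recorded and checked along these lines. This is a minimal sketch in Python, not the contents of site-search-test.php; the `SearchCase` structure and `report` function are hypothetical names, and a page size of 10 results is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class SearchCase:
    """One example search: terms, ideal result, and okay/bad results."""
    terms: str
    ideal: str
    okay: list = field(default_factory=list)
    bad: list = field(default_factory=list)


def report(case, result_urls, page_size=10):
    """Summarize where the ideal, okay, and bad results landed."""
    if case.ideal in result_urls:
        ideal_rank = result_urls.index(case.ideal) + 1  # 1-based rank
    else:
        ideal_rank = None
    first_page = result_urls[:page_size]
    return {
        "ideal_rank": ideal_rank,
        "ideal_on_first_page": ideal_rank is not None and ideal_rank <= page_size,
        "okay_on_first_page": [u for u in first_page if u in case.okay],
        "bad_on_first_page": [u for u in first_page if u in case.bad],
    }


# Example using URLs that appear later in this issue.
case = SearchCase(
    terms="coding standards",
    ideal="https://www.drupal.org/coding-standards",
    okay=["https://www.drupal.org/project/coding_standards"],
)
results = [
    "https://www.drupal.org/project/coding_standards",
    "https://www.drupal.org/coding-standards",
]
summary = report(case, results)
```

Recording the 'okay' and 'bad' URLs alongside the ideal one makes it possible to notice regressions even when the ideal result has not moved.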

Discussion / Solutions

DA Staff Recommendations

Recommendation (with related issues, where they exist)

  • Tune our biases
      #2566617: Review our site search biases
      #2566587: Make machine name Solr bias configurable
  • Favor exact matches
      #2558663: Favor exact title matches in site search
  • Remove search refinements from d.o header (use facets on search page)
      needs research/issue
  • Each D.o content type should have a well-designed search display mode
  • Index important content
      #2584011: Include api.drupal.org in Drupal.org search results
      Include events.drupal.org results in search
      Include jobs.drupal.org results in search
      Include important taxonomies/views in the index
  • Create a blacklist for known undesirable results
      needs research/issue
  • Consider including an 'Is this the result you're looking for?' function
      needs research
  • Update the do-not-stem list as needed
  • Use elevations (sparingly) as needed
  • Add synonyms (sparingly) as needed

Possible new ranking factors

  • Project has a current release version (7.x, or shortly, 8.x)
  • Project usage stats - more widely used projects should rank higher
  • Forum topics and some other content should potentially lose relevancy in a more dramatic way a certain amount of time after being posted (regardless of last comment time, since that may be spammy or off-topic?)
  • See comment #33: node rendering needs to look for the search_index view mode to avoid adding garbage keywords like "View" to each node.
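
The time-based relevancy loss suggested for forum topics could take a shape like the following. This is only a sketch: it mirrors the form of Solr's `recip(x, m, a, b)` function query, which computes a / (m*x + b), and the one-year "half point" is an illustrative assumption rather than a proposed setting. Age is measured from posting time, not from the last comment, as the bullet above suggests.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def recip(x, m, a, b):
    """Same shape as Solr's recip(x, m, a, b): a / (m*x + b)."""
    return a / (m * x + b)


def age_multiplier(age_seconds, half_point_years=1.0):
    """Score multiplier based on time since the content was posted.

    Returns 1.0 for brand-new content and about 0.5 once the content
    is half_point_years old (a hypothetical tuning value).
    """
    m = 1.0 / (half_point_years * SECONDS_PER_YEAR)
    return recip(age_seconds, m, 1.0, 1.0)


new_topic = age_multiplier(0)
year_old_topic = age_multiplier(SECONDS_PER_YEAR)
three_year_old_topic = age_multiplier(3 * SECONDS_PER_YEAR)
```

The curve never reaches zero, so an old forum topic can still appear when nothing better matches.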

Incorporate Popularity?

We could add a "popularity" field to the Solr index, and then either boost on it in the search results or make it a sorting criterion. "Popularity" can be defined by the number of clicks collected by Google Analytics or the "access log" module.
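
One way a popularity boost might be combined with the text relevance score is sketched below. The logarithmic damping and the `weight` value are assumptions made for illustration; nothing here reflects an actual Drupal.org configuration.

```python
import math


def popularity_boosted(score, clicks, weight=0.2):
    """Scale a relevance score by a damped popularity factor.

    log1p keeps a page with 100x the clicks from getting 100x the score.
    """
    return score * (1.0 + weight * math.log1p(clicks))


def rank(results):
    """Sort (url, score, clicks) tuples by boosted score, best first."""
    return sorted(
        results,
        key=lambda r: popularity_boosted(r[1], r[2]),
        reverse=True,
    )


# Hypothetical numbers: a slightly weaker text match with many more clicks.
ranked = rank([
    ("forum-post", 1.0, 10),
    ("handbook-page", 0.9, 5000),
])
```

Using popularity as a multiplier rather than as the sort key keeps text relevance in play, so a popular but unrelated page cannot outrank a good match.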


Comments

rfay’s picture

Thanks for following this up!

rfay’s picture

I guess I have to ask why, even without popularity:

Why does a search for "Coding Standards" fail to find the single most important page, whose title is "Coding Standards" and whose URL is /coding-standards?

Dave Reid’s picture

Adding URL alias to the index with a high ranking should help right? We don't have statistics.module enabled on d.org so I'm not sure how we'd get popularity indexed.

danithaca’s picture

If statistics.module is not enabled, we can still get popularity data from Google Analytics using the GA Data Export API. I believe there's already a module that can do that.

danithaca’s picture

Oh, forgot to mention that I'm working on another issue, #479812: Remove "Related projects" block until it provides relevant suggestions, that will retrieve some GA data to improve the "related projects" recommendations. If needed, I can write some customized code that dumps GA popularity data into a d.o database table, which can then be fed into the Solr index.

cyberswat’s picture

Couldn't you just modify the content bias settings so that the content type page takes precedence over issues? Maybe combine that with a field bias that gives the h1 tag a little more relevance. Either adjustment should produce the desired result for this use case.

rfay’s picture

It seems like we should be using all of these: path, title, H1 tags. Any one of these would help this case. All of them seems like it would give much better results.

rfay’s picture

Title: add "popularity" to solr index/sorting on d.o. » Drupal.org search (solr) fails an enormous number of simple searches
Priority: Normal » Critical

I think this is a really critical usability issue on drupal.org.

Today I tried searching for the great handbook page on Clean URLs. I unfortunately tried searching for it on Drupal.org itself, and was in no way able to find the page. (The page is http://drupal.org/node/15365 and its title is "Clean URLs".)

The search for "Clean URLs" on drupal.org (http://drupal.org/search/apachesolr_search/Clean%20URLs) retrieves an enormous clutter of useless posts.

Google gives us the correct page as #1 with no effort.

Should we just use google search? If not, we should come up with a way to at least find relevant information with the vaunted solr search.

I suspect that if we made a list of the 10 most important searches on Drupal.org, the d.o search would not return useful results for very many of them.

pwolanin’s picture

Has no one commenting on this thread looked at the Apache Solr module admin interface?

You could give a big weight to url alias for example, or increase the weight of the title. These settings surely need to be tuned for any specific site, and the complaints here suggest the first increments.

Dave Reid’s picture

Title and URL alias already have the highest values possible (21.0). Not sure what else we can do?

pwolanin’s picture

I just tweaked the settings a little (url alias was not considered before) - but an interesting effect there is that we chose to set omitNorms="true" for the title field in schema.xml. If we set this to false, a short title that matched exactly would get a much bigger boost. Perhaps it would be worth comparing the effect of setting this to normed - the only normed field currently is the body. A change would require reindexing.
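
To illustrate why norms matter for short titles, here is a small sketch assuming the classic Lucene TF-IDF similarity, where the length norm of a field is 1 / sqrt(number of terms). Real indexes store norms in a lossy compressed form and may use a different similarity, so treat the numbers as indicative only; the function names are made up for this example.

```python
import math


def length_norm(num_terms):
    """Classic Lucene TF-IDF length norm: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


def title_score(base_score, num_title_terms, omit_norms):
    """Score contribution of a title match, with or without norms."""
    if omit_norms:
        return base_score  # title length has no effect
    return base_score * length_norm(num_title_terms)


# A 2-word title such as "Clean URLs" versus an 8-word forum post title.
short_without_norms = title_score(1.0, 2, omit_norms=True)
long_without_norms = title_score(1.0, 8, omit_norms=True)
short_with_norms = title_score(1.0, 2, omit_norms=False)
long_with_norms = title_score(1.0, 8, omit_norms=False)
```

With norms omitted the two titles score the same for a matching term. With norms enabled the 2-word title scores twice as high as the 8-word one.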

pwolanin’s picture

I also just set the url alias of that page to http://drupal.org/getting-started/clean-urls

I set the Apache Solr boosts to add a score for matching the url alias, so that would also tend to bring the correct pages to the fore.

pwolanin’s picture

Note - we can also reduce the score for the body field relative to others. I changed it from 1.0 to 0.5.

Since the body is the only normed field, it actually has its score multiplied by 40x additionally by our module. I think (maybe) what this means is that matching one word in a 40 word body is the same as matching any word in a title, but I'm still a bit fuzzy about the Solr internals. I just picked this 40x scale as a fast and dirty heuristic.

Damien Tournoud’s picture

The main issue is that Apache Solr boosts recently created pages. For some reason, I'm not able to tweak that parameter.

pwolanin’s picture

Note for per-content type biases:

read the description: "Any value except Ignore will increase the score of the given type in search results."

pwolanin’s picture

tweaking some more - seems that biasing by more recent comment/update is the trick + url alias biasing that brings it to the top of the search with no facet filtering.

pwolanin’s picture

@DamZ - I set "More recently created: " bias to "Ignore"

pwolanin’s picture

relates also to the discussion here w.r.t the redesign: http://drupal.org/node/665722#comment-2452900

Seems like we need to consider additional fields that can be used for weighting - such as how many children a book page has? or something like the "sticky" toggle that can provide a big additional boost?

rfay’s picture

This is *so much* better.

I just searched for "Module Developer" and the #1 hit was the "Module Developer's guide".

apaderno’s picture

I agree; it is much better now. I tried some searches that first didn't get the most intuitive result, and now they do.

rfay’s picture

Status: Active » Fixed

Excellent work on this. A huge improvement to d.o search with this.

Congrats and thanks.

Marking fixed,
-Randy

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

adanielyan’s picture

Issue summary: View changes

I believe this issue should be re-opened. Searching such a basic thing as "views" doesn't return the Views module or documentation page in results. https://drupal.org/search/site/views

The google search on the other hand returns Views module page as a first result: https://www.google.com/#q=site:drupal.org+views

adanielyan’s picture

Status: Closed (fixed) » Active
pwolanin’s picture

We need more than 1 example of what's "wrong".

projects already have a high bias, but a simple word like Views is apparently hard to match unless we up that bias even more or use some other boost for popular projects.

adanielyan’s picture

Here are more examples.

Searching for "display suite module" doesn't return the module page (but the group page instead): https://drupal.org/search/site/display%20suite%20module

Same for CKEditor: https://drupal.org/search/site/ckeditor%20module

There are a bunch of other examples, but the point is that the search doesn't really work as most users would expect it to. I don't think the issues should be solved on a case-by-case basis; rather, a fundamental change should be made to the way the content is weighted.

drumm’s picture

Since titles and projects are already boosted 21x, I think the next steps are along the lines of:

- Boost on exact title match
- Boost field_project_type == full (instead of sandbox)
- Boost on project maintenance taxonomy, for example Maintenance status == Actively maintained
- Boost on project usage

drumm’s picture

We also need to collect specific searches & expected results so we can measure the effectiveness of changes.

rfay’s picture

IMO this kind of search optimization should be a perpetual task assigned to somebody in the DA group running d.o. It's a never-ending problem and is not going to go away and is really quite important.

pwolanin’s picture

Also - we should possibly lower some of the boosts (a lot are at 21) and/or use vset to make the key ones even higher.

Sorting out the relative boosts impact is not always easy - probably need debug output.

@rfay - ya, I agree. Or at least we should have a set of scenarios describing input and expected results (or at least type of results) so we understand what is wanted.

drumm’s picture

The current Solr config is on any dev site, https://drupal.org/node/1018084. All dev sites are set up to connect to a full read-only Solr index, so debugging can be turned up and configuration changes previewed.

(This likely isn't an indexing problem. If it is, we can spin up a r/w Solr index.)

pwolanin’s picture

So, davidhernandez and I found one possible problem that makes Views module in particular hard to find. This is sort of a bug in project_release module:

http://cgit.drupalcode.org/project/tree/release/project_release.module#n982

function project_release_node_view() should not append links - especially not 'View all releases' to the node if the view mode is 'search_index'

This View string is matching every project when someone looks for Views!
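
A toy illustration of the effect, in Python rather than the module's PHP: the function names, the example node, and the crude trailing-"s" stemming are all assumptions made for the sketch, not the project_release code.

```python
def text_for_index(title, body, links, view_mode):
    """Build the text handed to the indexer for one node."""
    parts = [title, body]
    if view_mode != "search_index":
        # Navigation links such as "View all releases" belong on rendered
        # pages only; in the index they just add noise keywords.
        parts.extend(links)
    return " ".join(parts)


def strip_s(word):
    """Crude stand-in for stemming: drop one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


def naive_match(query, text):
    """True if the stemmed query equals any stemmed word in the text."""
    words = {strip_s(w) for w in text.lower().split()}
    return strip_s(query.lower()) in words


links = ["View all releases"]
with_links = text_for_index("Example project", "An unrelated module", links, "full")
without_links = text_for_index("Example project", "An unrelated module", links, "search_index")
```

Here `naive_match("Views", with_links)` is true although the project has nothing to do with Views, while the `search_index` text no longer matches.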

pwolanin’s picture

re: #27 How about excluding modules that don't have a 7.x release and e.g. forum topics older than a certain age?

For integrating BDD testing, if you want a more readily parseable search result, you can take a look at what I did at: https://www.drupal.org/sandbox/pwolanin/2134321

That was for a POC of integrating with https://quepid.com/secure/#/

pwolanin’s picture

Using commongrams might also help since you could increase the number of stop words: http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-comm...

drumm’s picture

Issue summary: View changes
Issue tags: +Needs issue summary update

To really improve this, we need to be a bit more methodical. I started a Search / Expected result table in the issue summary. Having a summary will be much easier than reading through 30 comments. Later, we can automate BDD testing or even use something like https://quepid.com/secure/#/.

webchick’s picture

Issue summary: View changes

Adding a bit more info to the issue summary for "outsiders."

webchick’s picture

Also, tweeted here: https://twitter.com/webchick/status/480745059164233729 This seems like something that should also be broadcast via @drupal or similar channels to get better visibility.

drumm’s picture

Issue summary: View changes

#33 is implemented and re-indexing. The result count at https://www.drupal.org/search/site/views is gradually going down as projects are indexed.

It looks like the examples so far are in 2 categories - documentation & modules. Documentation so far is okay, we can use it to set up automated monitoring. Modules are where work can be concentrated.

Component: Solr » Servers
mgifford’s picture

Issue summary: View changes

@drumm - I'm sure this is done indexing. Can you confirm that the solution from #33 is implemented?

Views still isn't on page one of this search https://www.drupal.org/search/site/views

Can't we just make a given search (like "Views") show a given result at the top? Something like what Defacto Search offers. There must be similar solutions to look at.

I've checked that the summary is still accurate.

drumm’s picture

#33 is implemented. That comment was about removing irrelevant results for "views" because "View all releases" was being indexed all the time. It was not about making sure Views was showing as high as it should be.

As I said in #36, this needs systematic improvement. If we artificially pin the 9 examples in the issue summary to the top, we have not solved the problem.

webchick’s picture

So what are the next steps then, and how can we help? This was one of the top 3 pain points in the ideation process last year, it came up in user research, it grates on contributors who use the site every day, etc.

mgifford’s picture

We have to be able to take small, iterative steps to improve drupal.org. So much of the time it seems issues like this sit for years (this one is coming up to 5 years now), while folks wait for a perfect solution. Perfect is the enemy of the good. We might be able to wait for perfect for some elements of Core, but drupal.org isn't like that and we have to be able to experiment.

Good to know that #33 was specific about that phrase.

We can start right now to implement those 9 examples. We can look at the server logs to determine common searches and ensure we've got good results for the top 50 requests. We can look out to common themes & modules and ensure that if there are child projects that we always pin the parent as the top result. We can ask the community for examples of what they think should be a logical first hit for common requests that they make.

Leaving it as it is though is just a really bad option. It discourages everyone.

If we have a means to fix these first few known issues, we can then set up a pattern to repeat it.

This is a solvable problem. Others in the community have dealt with it before. Let's not be afraid to try something out on this old issue and see if we can't improve it.

Nick_vh’s picture

I suggest as a small iterative step to install apachesolr_proximity. (https://www.drupal.org/project/apachesolr_proximity)

This module applies proximity boosting to Solr searches so that the distance between two or more terms is factored into the relevancy calculation. For example, a search for "blue drop" will rank documents that have the terms "blue" and "drop" next to each other in the source text higher than documents where the words are far away from one another.

When used in conjunction with the OR operator, which is the default setting used by the Apache Solr Search Integration module, the relevancy of documents returned in search results is likely to be improved.
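
The idea behind proximity boosting can be sketched as below. This is not how apachesolr_proximity is implemented (Solr handles proximity inside the query itself); it is only a minimal model of "closer terms score higher", and the boost formula is an assumption.

```python
def min_distance(tokens, a, b):
    """Smallest positional distance between terms a and b, or None."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


def proximity_boost(text, a, b, max_boost=2.0):
    """Multiplier between 1.0 and max_boost; adjacent terms get max_boost."""
    distance = min_distance(text.lower().split(), a, b)
    if distance is None:
        return 1.0  # one of the terms is missing: no proximity boost
    # Guard against a == b, where the distance would be 0.
    return 1.0 + (max_boost - 1.0) / max(distance, 1)


adjacent = proximity_boost("Blue drop theme", "blue", "drop")
far_apart = proximity_boost("blue is a color and drop is a verb", "blue", "drop")
only_one = proximity_boost("blue theme", "blue", "drop")
```

Under the OR operator a document containing only "blue" still matches, and the boost then separates documents that contain the phrase from documents that merely mention both words.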

I highly suggest we install this on drupal.org and take some example searches and see what results they have.

As a follow up, we should most likely add the module name as a separate field, use it to search on and boost that field. We also need to look at the search_index display mode and make sure there is no useless information in there that we don't want to appear in the index.

Secondly, I suggest we take a look at the Quepid platform. We can use this to quantitatively check that the tweaks we make to, for example, synonyms and protwords.txt, as well as other config files, do not break the expected results. A presentation on the topic can be found here: http://www.slideshare.net/DougTurnbull2/test-driven-search-relevancy
and more information here: http://opensourceconnections.com/blog/2013/10/07/quepid-give-your-search...

drumm’s picture

Title: Drupal.org search (solr) fails an enormous number of simple searches » Drupal.org site search (solr) fails on some searches

"we should most likely add the module name as a separate field"

We have the machine name as a separate field, field_project_machine_name. The human-readable name is the project node's title property.

The "Needs issue summary update" tag still applies. Potential improvements should be in the issue summary, so they don't get lost in the comments again. (As well as more example searches.)

webchick’s picture

If more human input is needed in order to fix this, what about adding a link to this issue on all search results pages saying "Were these results helpful? Help us improve here: XXX"

Else it seems like it'd be possible to look at analytics for this. Figure out the top N search terms (https://www.drupal.org/admin/reports/search has some data but it seems to be totally wrong; maybe it's only user search data?), and add them to the table and let people fill in the blanks, maybe?

mgifford’s picture

@webchick I think we need either of these to be installed/enabled to get valuable analytics:

https://www.drupal.org/project/search_log
https://www.drupal.org/project/apachesolr_stats

That would help with the most common problems. Particularly if it were reviewed every few months.

Like the idea of a webform that would allow us to track what folks searched for and what they had hoped to find. That would be a big step forward and allow us to take on some of the long tail searches.

joshuami’s picture

Issue tags: +drupal.org search
webchick’s picture

Title: Drupal.org site search (solr) fails on some searches » Drupal.org site search (solr) fails on the most common searches
Category: Feature request » Bug report
Issue summary: View changes

Got annoyed again today trying to use Drupal.org search, so taking my own suggestion at #55, here are the top search terms on Drupal.org, at least according to Google Analytics (Drupal's "top searches" report is useless because Solr searches bypass it):

[Image: series of module searches]

Interestingly, every single one of them corresponds to a module name, with the exception of "responsive" (most likely). And yet:

- https://www.drupal.org/search/site/views - Views module is nowhere on the first page. (The first result is "View" module.. talk about confusing.)
- https://www.drupal.org/search/site/ctools - CTools module is at least on the list, but not until about halfway down. Also for whatever reason the description is truncated towards the end? so it says:

[Image: CTools weird description truncation]

...which would never present itself to me as a new user as the thing I was supposed to click on. Another swing and a miss.

- https://www.drupal.org/search/site/webform - Webform module nowhere on the first page.
- https://www.drupal.org/search/site/bootstrap - First result is Twitter Bootstrap 3.0 which you'd think was good, except when you look and find out it's a) an unapproved sandbox project last touched in 2013 and b) actually a theme engine, not a theme. Presumably people coming here want https://www.drupal.org/project/bootstrap which like Ctools is about mid-way down and has a weirdly truncated description.
- https://www.drupal.org/search/site/ckeditor - Halfway down the first page.
- https://www.drupal.org/search/site/entity - Entity API module nowhere on the first page. One could argue that maybe they're looking for documentation about entity instead of the module, but all 10 results are all theme/module projects so the results still don't help.
- https://www.drupal.org/search/site/commerce - At least Commerce Kickstart is there, once again about halfway down the page, but Drupal Commerce is nowhere to be found.
- https://www.drupal.org/search/site/responsive - This one's difficult to gauge what people are looking for, so I can't really tell if these results are useful or not. I'm guessing what they're actually looking for is more like a resource guide on Mobile, tho.

Anyway, the bottom line. This isn't a feature request, and this isn't "some" searches. Drupal.org's search is just flat-out busted for all of the most common searches, as far as I can tell.

I don't know anything about configuring Solr, but if it's possible to say "if the search term is an exact match of a project short name, shoot it to the top of the list" that would certainly help.

webchick’s picture

It's probably also worth noting that you need to drill down all the way to search term ~34/35 before you start seeing terms like "theme" and "view" that are not well-known project names, where people are more likely starting to look for "documentation about X" versus "project X." The next such search term isn't until "seo" in #59. Basically, people come to Drupal.org looking for modules/themes. ;)

This seems to be further evidence of the importance of #1243332: Deploy Project Browser Server and drupalorg_pbs on d.o, and that we should definitely try and target this for a minor release of Drupal 8.

joshuami’s picture

It's on the list. We can probably fit in some short term fixes to address the most egregious errors in the search results. As we roll out new content types from the content strategy work, that will be a great opportunity to drop in some new Solr configuration.

webchick’s picture

Another example that came up tonight.

https://www.drupal.org/search/site/acquia

Expected to find: Acquia's organization page: https://www.drupal.org/marketplace/acquia

Instead found: a series of job postings from 2014. :) Then some themes. Because there are no facets for organizations you are kind of SOL. Luckily it does appear on the first page of results, just like 3/4 down.

The pattern here seems to be "if it's an exact title match, make it the first thing in the list."

Also just a note that https://www.drupal.org/roadmap/search says "Expect more to come in early 2015." So this should probably be updated with a new ETA.

joshuami’s picture

The initiative to improve search has been driving me a bit batty. While it is on the roadmap, we have not been able to give it much time.

A few weeks back, we had a developer I used to work with on large library websites, @bob-tricoski, come in and give us a crash course on steps we could take to make Drupal.org search better. Today, I did a little additional research to help me get my head wrapped around the "views" results that @webchick pointed out. I also tried to take a lot of what Bob covered with us and turn it into something that could be molded into an actionable plan.

TL;DR = Search is hard to configure well on sites with lots of specific jargon and millions of "documents". (Documents is a Solr term that roughly equates to entities or rendered pages/paths.)

These 7 changes will fix some—but not all—of the issues with our primary search on D.o. The next step will be turning this into a work plan.

1. We rely on bias too much. By weighting things by a bias, we are making assumptions that may not always be correct. For instance, we bias "Drupal Core" projects... but there is really only one and we should use elevation for that. Another example is that organizations are ignored from biasing, but they would be great to bias because of the likelihood of unique strings. The bias we currently have is only a little off so tweaking those settings is relatively low risk.

2. We have a lot of words that should not be stemmed by default. Views and Ctools both end in an "s". So Solr sees those as "View" and "Ctool". The good news is that Solr can deal with this if we update protwords.txt. This protected words file will remove certain words from the stemming filter. The catch is that we will need to schedule this as a reindex event. (It takes a while to reindex D.o.)

We probably need to look at a list of all modules, and key jargon specific to Drupal, and include them in the protwords.txt if they end in "ing", "er", "s" or "ed". That would immediately make results better for a lot of our modules.

3. We need to carefully add some elevations. "Drupal Core" should likely be elevated to have the Drupal Core project as the first result. This would be a better alternative to biasing the Drupal core content type. There may be some other exact match searches that we simply need to make give better first results. This should be limited to really important words that essentially need sponsored elevation to the top spot.

4. We should roll back our custom facets and let the Solr module do this for us. In digging through the code, I found that we have facet blocks that are enabled on our search that are not displaying on the search page. This is a bunch of custom code so that we can combine facets like we do with the "or filter by…" block. That means we are leaving out content types—like organization and case studies—from our facets.

This one is a little tricky as we have some content types that we don't really expect people to search for, such as "theme engine" and "book page". (We have 17 content types—two of which are essentially deprecated.) The content model currently proposed in #2481519: [META] Content Model for Drupal.org will address this a little so that "documentation" will be a real content type rather than a use case for book pages. That content model is also going to make the list of facets longer.

As a side note, just turning on issue tags as a facet option would really give some cool results for tracking down issues that are hard to find.

5. We need to carefully add some synonyms. Very carefully. This will also require a re-index, but synonyms can be a great way to tie together jargon that is very site specific. Drupal has a lot of site specific jargon. The danger with synonyms is that it can hide results for incorrectly associated words. I'd love some feedback on the best way to group edit a list of synonyms. We can't change this file very often because of the reindex required, so we need to get it as close to correct as possible on the first go.

6. We need to make exact match score higher than contains. Exact match in Solr is a bit of a deeper dive. According to the Solr wiki, we need to:

index the content twice, using different fields with different fieldTypes (and different analyzers associated with those fieldTypes). One analyzer will contain a lowercase filter for case-insensitive matches, and one will preserve case for exact-case matches.

Ironically, this would help with searches for terms like "apachesolr"—we might actually get back the apachesolr module with that search.

We could take this a step further and bias—I know I said we had to be careful with this—a project shortname field to be more important than even title. (That might cause some weird outcomes though.)

7. We should index a couple of important views and taxonomies. By default, the Solr module does not index term pages. As we add topic pages and possibly issue tag page views, we will need to include those into the index if we want them to show up as a search result. Likewise, a view page display—or really any view—does not get indexed. There are ways to add specific "documents" into the Solr index to help get better results for these non-node things.
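
Items 2, 3 and 5 above (protected words, elevations, synonyms) can each be modeled in a few lines. The sketch below uses a toy suffix-stripping stemmer rather than Solr's real one, and the dictionaries stand in for protwords.txt, the elevation file and the synonyms file; the helper names are hypothetical, and the example entries are taken from searches mentioned in this issue.

```python
SUFFIXES = ("ing", "ed", "er", "s")


def stem(word, protected=frozenset()):
    """Toy stemmer: strip one common suffix unless the word is protected."""
    w = word.lower()
    if w in protected:
        return w
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def normalize(query):
    """Lowercase and collapse whitespace."""
    return " ".join(query.lower().split())


def expand(query, synonyms):
    """Return the normalized query followed by any configured synonyms."""
    q = normalize(query)
    return [q] + synonyms.get(q, [])


def elevate(query, results, elevations):
    """Pin configured documents to the top for an exact query match."""
    pinned = elevations.get(normalize(query), [])
    return pinned + [r for r in results if r not in pinned]


protected = frozenset({"views", "ctools"})
synonyms = {"apache solr": ["apachesolr"], "xml sitemap": ["xmlsitemap"]}
elevations = {"drupal core": ["https://www.drupal.org/project/drupal"]}
```

Without protection, `stem("Views")` gives "view" and `stem("Ctools")` gives "ctool", which is the collision described in item 2. With the protected set both words survive intact.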

There are some other tweaks we could make, but this list would go a long way to making many of the results closer to what we expect. Most of these changes are to configuration files in Solr, so I'm leaving this in infrastructure for now.
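
For item 6, the "index the content twice" approach quoted from the Solr wiki might be pictured like this. The two analyzers below are simplified stand-ins for Solr field types, and the field names and boost values are invented for the example.

```python
def analyze_exact(text):
    """Case-preserving analyzer: only collapses whitespace."""
    return " ".join(text.split())


def analyze_lower(text):
    """Case-insensitive analyzer: collapses whitespace and lowercases."""
    return " ".join(text.lower().split())


def index_title(title):
    """Index the same title into two fields with different analysis."""
    return {
        "title_exact": analyze_exact(title),
        "title_lower": analyze_lower(title),
    }


def exact_match_score(query, doc, exact_boost=3.0, lower_boost=2.0):
    """Extra score for whole-title matches; 0.0 when only part matches."""
    if analyze_exact(query) == doc["title_exact"]:
        return exact_boost
    if analyze_lower(query) == doc["title_lower"]:
        return lower_boost
    return 0.0


doc = index_title("Coding Standards")
```

A query of "Coding Standards" earns the larger boost, "coding standards" the smaller one, and "coding" alone earns neither, so ordinary term matching decides its rank.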

webchick’s picture

Wow, that looks like a great list! Thanks for digging into this one. I realize it's further down the roadmap but nonetheless it's one of those things that erodes the d.o experience for all target audiences, so great that a plan is being put together.

mgifford’s picture

Here's another one. Searching for a user's name should result in a quick and easy link to their user profile with either of these:

https://www.drupal.org/search/site/Angie+Byron
https://www.drupal.org/search/user/Angie+Byron

tvn’s picture

Title: Drupal.org site search (solr) fails on the most common searches » Users should get an expected result when using search
Category: Bug report » Plan
basic’s picture

Project: Drupal.org infrastructure » Drupal.org customizations
Version: » 7.x-3.x-dev
Component: Servers » Miscellaneous

I am moving this to Drupal.org customizations, because as far as the infrastructure is concerned the Solr servers have been functioning without issue. This issue is related to optimizing the Drupal.org search functionality and the customizations that are required for this.

drumm’s picture

I'm pulling specific solutions into child issues as we tackle this:

drumm’s picture

Acquia Slate ranks artificially high because More comments is biased to 6 and the theme has comments. I'm not even sure why a theme has comments and am removing them, #2558859: Remove comments from project_(module|release|theme) nodes.

If I bias the Organization type the same as module/theme/etc on dev, Acquia the organization does beat out everything except Acquia Slate.

For biases, we should consider:

  • Backing module/theme/etc down, increasing organization
  • Backing More comments down, increasing More recent comments
drumm’s picture

Those comments are now closed and unpublished. I also took the liberty of removing the jobs on groups from the index.

drumm’s picture

Issue summary: View changes

Tracking a bunch of searches will really help make sure we're making progress without bad regressions. I made a little script to track these searches and highlight the interesting results http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php.

The initial set of searches is from the table on this page, plus a few added by myself and hestenet. Patches to fill out this list are welcome.

drumm’s picture

I deployed #2566587: Make machine name Solr bias configurable and went ahead and configured the project machine name bias up to 8, the lowest that was really effective along with my current test settings for #2566617: Review our site search biases. Until the boosts are generally reset, this won't be completely effective, but there are already some good results:

  • rules is now on page 3
  • draggableviews is now #1
  • zen is now #1
  • views is now on page 2
  • apachesolr is now on page 2
  • media is now on page 3
  • redirect is now on page 1
  • ctools is now on page 1

This bias, unlike the path alias bias, only takes effect if there is a complete, exact match, so there is no effect on the ranking of non-matching projects. The only possible downside is if a module's short name is the same as a common 1-word search that has a better non-project result.
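
The all-or-nothing behavior described here can be modeled in a couple of lines. This is a sketch of the described behavior, not the code deployed in #2566587; the function name is hypothetical and 8 is the bias value mentioned above.

```python
def machine_name_bias(query, machine_name, bias=8.0):
    """Add the bias only when the whole query equals the machine name."""
    return bias if query.strip().lower() == machine_name else 0.0


exact = machine_name_bias("Views", "views")
with_extra_word = machine_name_bias("views module", "views")
other_project = machine_name_bias("views", "draggableviews")
```

Only the first call returns the bias. A longer query or a project whose machine name merely contains the term gets nothing, which is why non-matching projects keep their existing ranking.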

drumm’s picture

Status: Active » Needs review

#2558663: Favor exact title matches in site search & #2566617: Review our site search biases are now in production.

The searches I'm currently tracking are: http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php. Of those, we have some good improvements:

coding standards:
#1 https://www.drupal.org/project/coding_standards - it isn't a bad result, but it isn't ideal
#2 https://www.drupal.org/coding-standards

installation guide:
#1 https://www.drupal.org/documentation/install

glossary:
#1 https://www.drupal.org/glossary

rules:
#1 https://www.drupal.org/documentation/modules/rules
#2 https://www.drupal.org/project/rules

draggableviews:
#1 https://www.drupal.org/project/draggableviews
#3 https://www.drupal.org/node/283498 - documentation about the module

zen:
#1 https://www.drupal.org/project/zen
#8 https://www.drupal.org/documentation/theme/zen

views:
#2 https://www.drupal.org/project/views

apachesolr:
#1 https://www.drupal.org/project/apachesolr

apache solr:
#51 https://www.drupal.org/project/apachesolr

media:
#5 https://www.drupal.org/project/media
#9 https://www.drupal.org/resource-guides/managing-media

redirect:
#1 https://www.drupal.org/project/redirect
#63 https://www.drupal.org/project/path_redirect

xml sitemap:
#7 https://www.drupal.org/project/xmlsitemap

ctools:
#12 https://www.drupal.org/project/ctools

core:
#64 https://www.drupal.org/project/drupal

drupal core:
#1 https://www.drupal.org/project/drupal

drupal:
#1 https://www.drupal.org/project/drupal

tag1:
#1 https://www.drupal.org/marketplace/tag1consulting

mediacurrent:
#1 https://www.drupal.org/marketplace/mediacurrent

acquia:
#17 https://www.drupal.org/marketplace/acquia

drupal geeks:
#16 https://www.drupal.org/node/2013897

dddave’s picture

Glad to see we are making progress with our broken search. However, it seems to me that we are breaking parts of search that were working before. I often use search to find old Planet issues. This usually works best when using a Planet feed's full URL (i.e. the URL the user submitted for the Planet) in the search for the webmaster or content queue (it never worked on general site search). This no longer works.

Issue queue search in general seems to be broken (or slow at indexing) because the search in the attached picture returns nothing. The issue in question at the time of search was well over six hours old.

drumm’s picture

This issue hasn't touched issue queue search, which uses a separate index, and there hasn't been other work there lately. https://www.drupal.org/project/search_api_db and our configuration are what we have to work with there. Searching with punctuation there is tough, since it doesn't (currently) have a great way to tell the end of a sentence from part of a URL, if I recall correctly. Issue search's index is updated immediately.

Now that we're on Solr 5, we have more of an option to switch to Solr for issue queue search. However, our setup still waits to index on cron, and Solr still takes up to ~2 minutes for the index to update. People would rightfully get antsy if https://www.drupal.org/project/issues/search/drupal?status[0]=1&status[1... took up to 12 min to update. However, I hear the options to make the updates near-instant are a whole lot better in Solr 5 than 3.

drumm’s picture

That said, I think https://www.drupal.org/search/site/barnettech.com?f[0]=ss_meta_type%3Afo... might be better.

I do want to get some multi-word searches into our test cases, so we can try out apachesolr_proximity.

dddave’s picture

Thanks for the clarification. I'll keep an eye on it, but I feel like this is regressing.

hestenet’s picture

@dddave - any specific examples you can provide of searches, single or multi word, and what the ideal results should be would be very helpful. We can add those examples into drumm's script for evaluating whether search has improved: http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php

For the 20 or so kinds of searches we're tracking right now (and we tried to be representative with the types of searches) we're seeing some strong improvements. But more eyeballs will help.

dddave’s picture

@hestenet My "issues" are not with site search, which is what this issue is about, isn't it? If I notice issue queue search going downhill, I'll create a new issue.

hestenet’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update
kristofferwiklund’s picture

I have added an issue for User searches.

pale177’s picture

Were you guys able to incorporate Google Analytics into the search results?

We recently moved from GSA to Solr and users have been complaining all month long. I have the results biased by 1. Title, 2. URL, and 3. Content.

We have over 75,000 basic pages, news articles, and profiles. The news articles are the worst, since they have titles that contain "human resources" and they push our HR page way down. Other pages like the xyz.com/president page are also pushed back in the results because the news articles contain the word "president" many times in the title and content.

I am guessing GSA was referencing page rank somehow; that's the only piece missing from this puzzle.