Problem
For certain common short (one, two, or three word) searches, the "best" or "most important" result doesn't appear first, and sometimes not even on the first page of results.
Current Status
Much improved - per [#73] many of our exemplar searches are now returning the desired result.
We are currently evaluating additional configuration tweaks to improve this further.
Input / Expected results
To assist with fixing this issue, the D.o search maintainers need a variety of example searches - both searches that are working now (so we can test to make sure those do not regress) and searches which do not yet work.
For each example search we need:
- The search terms
- The ideal result
- Any 'okay' results (not ideal, but still good results)
- Any 'bad' results that might currently be dominant for that term
Those search terms will be added to this script which is used to evaluate the changing rankings of search term results:
http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php
Please provide your suggestions as comments to this issue, or as patches for this script.
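As a minimal sketch of the information requested above, one example search could be recorded and checked like this. The `SearchCase` structure and the `evaluate` helper are hypothetical and written in Python purely for illustration; the actual site-search-test.php script may be organized quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SearchCase:
    """One example search (hypothetical structure, not the real script's format)."""
    terms: str
    ideal: str
    okay: List[str] = field(default_factory=list)
    bad: List[str] = field(default_factory=list)


def rank_of(url: str, results: List[str]) -> Optional[int]:
    """Return the 1-based position of url in results, or None if it is absent."""
    try:
        return results.index(url) + 1
    except ValueError:
        return None


def evaluate(case: SearchCase, results: List[str]) -> Dict:
    """Summarize where the ideal, okay, and bad results landed."""
    return {
        "terms": case.terms,
        "ideal_rank": rank_of(case.ideal, results),
        "okay_ranks": {u: rank_of(u, results) for u in case.okay},
        "bad_ranks": {u: rank_of(u, results) for u in case.bad},
    }


# Example using the "coding standards" search discussed in this issue.
case = SearchCase(
    terms="coding standards",
    ideal="https://www.drupal.org/coding-standards",
    okay=["https://www.drupal.org/project/coding_standards"],
)
results = [
    "https://www.drupal.org/project/coding_standards",
    "https://www.drupal.org/coding-standards",
]
report = evaluate(case, results)
```

Recording the 'okay' and 'bad' results alongside the ideal one makes it possible to notice regressions even when the ideal result has not moved.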
Discussion / Solutions
Recommendation | Issues |
---|---|
Tune our biases | #2566617: Review our site search biases; #2566587: Make machine name Solr bias configurable |
Favor Exact Matches | #2558663: Favor exact title matches in site search |
Remove Search refinements from d.o header (use facets on search page) | needs research/issue; each D.O content type should have a well-designed search display mode |
Index important content | #2584011: Include api.drupal.org in Drupal.org search results; Include events.drupal.org results in search; Include jobs.drupal.org results in search; Include important taxonomies/views in the index |
Create a blacklist for known undesirable results | needs research/issue |
Consider including an 'is this the result you're looking for?' function | needs research |
Update the do-not-stem list | as needed |
Use elevations (sparingly) | as needed |
Add synonyms (sparingly) | as needed |
Possible new ranking factors
- Project has a current release version (7.x, or shortly, 8.x)
- Project usage stats - more widely used projects should rank higher
- Forum topics and some other content should potentially lose relevancy in a more dramatic way a certain amount of time after being posted (regardless of last comment time, since that may be spammy or off-topic?)
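A rough sketch of what a time-based decay on creation date could look like. It uses the reciprocal shape a / (m * x + b), which is the same shape as Solr's `recip` function query; the constants (halving after one year) and the `age_multiplier` name are assumptions chosen for illustration, not values anyone has proposed for Drupal.org.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # roughly 3.16e10 milliseconds


def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal decay a / (m * x + b), the same shape as Solr's recip()."""
    return a / (m * x + b)


def age_multiplier(age_ms: float) -> float:
    """Score multiplier from creation age: 1.0 when new, 0.5 after one year.

    Uses the creation date only, so later comments do not refresh the score.
    """
    return recip(age_ms, 1.0 / MS_PER_YEAR, 1.0, 1.0)
```

A steeper drop-off for forum topics would mean a larger `m` for that content type, while documentation pages could use a much smaller one or no decay at all.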
See comment #33: node rendering needs to look for the search_index view mode to avoid adding garbage keywords like "View" to each node.
Incorporate Popularity?
We could add a "popularity" field to the Solr index, and then boost on it in the search results or make it a sorting criterion. "Popularity" can be defined by the number of clicks collected by Google Analytics or the "access log" module.
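To make the idea concrete, here is a minimal sketch of a log-damped popularity bonus added to a text relevance score. The function name, the `weight` default, and the sample documents and numbers are all made up for illustration; how a boost would actually be combined with the Solr score depends on the query configuration.

```python
import math
from typing import List, Tuple


def boosted_score(relevance: float, popularity: int, weight: float = 0.5) -> float:
    """Add a log-damped popularity bonus to a text relevance score."""
    return relevance + weight * math.log10(1 + popularity)


def rank(docs: List[Tuple[str, float, int]]) -> List[str]:
    """Sort (title, relevance, popularity) tuples by boosted score, best first."""
    ordered = sorted(docs, key=lambda d: boosted_score(d[1], d[2]), reverse=True)
    return [title for title, _, _ in ordered]


# Hypothetical numbers: a forum post with slightly better text relevance
# loses to a heavily visited project page once popularity is counted.
docs = [("Old forum post", 3.0, 9), ("Views project page", 2.5, 99999)]
ordered_titles = rank(docs)
```

The logarithm keeps a page with a million clicks from completely drowning out text relevance, which is one reason to prefer a boost over sorting purely by popularity.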
Comments
Comment #1
rfay
Thanks for following this up!
Comment #2
rfay
I guess I have to ask why, even without popularity:
Why does a search for "Coding Standards" fail to find the single most important page, whose title is "Coding Standards" and whose URL is /coding-standards?
Comment #3
Dave Reid
Adding URL alias to the index with a high ranking should help, right? We don't have statistics.module enabled on d.org so I'm not sure how we'd get popularity indexed.
Comment #4
danithaca
If statistics.module is not enabled, we can still get popularity data from Google Analytics using the GA Data Export API. I believe there's already a module that can do that.
Comment #5
danithaca
Oh, forgot to mention that I'm working on another issue, #479812: Remove "Related projects" block until it provides relevant suggestions, that will retrieve some GA data to improve the "related projects" recommendation. If needed, I can write some customized code that dumps GA popularity data into some d.o. database table, which can then be fed into the Solr index.
Comment #6
cyberswat
Couldn't you just modify the content bias settings so that the content type page takes precedence over issues? Maybe combine that with a field bias that gives the h1 tag a little more relevance. Either adjustment should produce the desired result for this use case.
Comment #7
rfay
It seems like we should be using all of these: path, title, H1 tags. Any one of these would help this case. Using all of them seems like it would give much better results.
Comment #8
rfay
I think this is a really critical usability issue on drupal.org.
Today I tried searching for the great handbook page on Clean URLs. I unfortunately tried searching for it on Drupal.org itself, and was in no way able to find the page. (The page is http://drupal.org/node/15365 and its title is "Clean URLs".)
The search for "Clean URLs" on drupal.org (http://drupal.org/search/apachesolr_search/Clean%20URLs) retrieves an enormous clutter of useless posts.
Google gives us the correct page as #1 with no effort.
Should we just use google search? If not, we should come up with a way to at least find relevant information with the vaunted solr search.
I suspect that if we made a list of the 10 most important searches on Drupal.org, the d.o search would not return useful results for very many of them.
Comment #9
pwolanin
Has no one commenting on this thread looked at the Apache Solr module admin interface?
You could give a big weight to url alias for example, or increase the weight of the title. These settings surely need to be tuned for any specific site, and the complaints here suggest the first increments.
Comment #10
Dave Reid
Title and URL alias already have the highest values possible (21.0). Not sure what else we can do?
Comment #11
pwolanin
I just tweaked the settings a little (url alias was not considered before) - but an interesting effect there is that we chose to set omitNorms="true" for the title field in schema.xml. If we set this to false, a short title that matched exactly would get a much bigger boost. Perhaps it would be worth comparing the effect of setting this to normed - the only normed field currently is the body. A change would require reindexing.
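To illustrate why norms favor short titles, here is a small sketch of the length normalization used by Lucene's classic TF-IDF similarity, where the norm is 1 / sqrt(number of terms in the field). This assumes the classic similarity is in use and ignores the lossy encoding Lucene applies when it stores norms, so the numbers are indicative only.

```python
import math


def length_norm(num_terms: int) -> float:
    """Classic TF-IDF length norm: shorter fields get a larger multiplier."""
    return 1.0 / math.sqrt(num_terms)


# A two-word title such as "Clean URLs" versus a hypothetical eight-word title.
short_title_norm = length_norm(2)
long_title_norm = length_norm(8)
ratio = short_title_norm / long_title_norm
```

With norms enabled, the two-word title's matches are weighted twice as heavily as the eight-word title's in this simplified model; with omitNorms="true" both titles are treated the same.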
Comment #12
pwolanin
I also just set the url alias of that page to http://drupal.org/getting-started/clean-urls
I set the Apache Solr boosts to add a score for matching the url alias, so that would also tend to bring the correct pages to the fore.
Comment #13
pwolanin
Note - we can also reduce the score for the body field relative to others. I changed it from 1.0 to 0.5.
Since the body is the only normed field, it actually has its score multiplied by 40x additionally by our module. I think (maybe) what this means is that matching one word in a 40 word body is the same as matching any word in a title, but I'm still a bit fuzzy about the Solr internals. I just picked this 40x scale as a fast and dirty heuristic.
Comment #14
Damien Tournoud
The main issue is that Apache Solr boosts recently created pages. For some reason, I'm not able to tweak that parameter.
Comment #15
pwolanin
Note for per-content type biases:
read the description: "Any value except Ignore will increase the score of the given type in search results."
Comment #16
pwolanin
Tweaking some more - it seems that biasing by more recent comment/update, plus url alias biasing, is the trick that brings it to the top of the search with no facet filtering.
Comment #17
pwolanin
@DamZ - I set "More recently created: " bias to "Ignore"
Comment #18
pwolanin
Relates also to the discussion here w.r.t. the redesign: http://drupal.org/node/665722#comment-2452900
Seems like we need to consider additional fields that can be used for weighting - such as how many children a book page has? or something like the "sticky" toggle that can provide a big additional boost?
Comment #19
rfay
This is *so much* better.
I just searched for "Module Developer" and the #1 hit was the "Module Developer's guide".
Comment #20
apaderno
I agree; it is much better now. I tried some searches that first didn't get the most intuitive result, and now they do.
Comment #21
rfay
Excellent work on this. A huge improvement to d.o search with this.
Congrats and thanks.
Marking fixed,
-Randy
Comment #23
adanielyan
I believe this issue should be re-opened. Searching for such a basic thing as "views" doesn't return the Views module or documentation page in results. https://drupal.org/search/site/views
The google search on the other hand returns Views module page as a first result: https://www.google.com/#q=site:drupal.org+views
Comment #24
adanielyan
Comment #25
pwolanin
We need more than one example of what's "wrong".
Projects already have a high bias, but a simple word like Views is apparently hard to match unless we up that bias even more or use some other boost for popular projects.
Comment #26
adanielyan
Here are more examples.
Searching for "display suite module" doesn't return the module page (but the group page instead): https://drupal.org/search/site/display%20suite%20module
Same for CKEditor: https://drupal.org/search/site/ckeditor%20module
There are a bunch of other examples, but the point is that the search doesn't really work as most users would expect it to work. I don't think the issues should be solved on a case-by-case basis; rather, a fundamental change should be made to the way the content is weighted.
Comment #27
drumm
Since titles and projects are already boosted 21x, I think the next steps are along the lines of:
- Boost on exact title match
- Boost field_project_type == full (instead of sandbox)
- Boost on project maintenance taxonomy, for example Maintenance status == Actively maintained
- Boost on project usage
Comment #28
drumm
We also need to collect specific searches & expected results so we can measure the effectiveness of changes.
Comment #29
rfay
IMO this kind of search optimization should be a perpetual task assigned to somebody in the DA group running d.o. It's a never-ending problem and is not going to go away and is really quite important.
Comment #30
pwolanin
Also - we should possibly lower some of the boosts (a lot are at 21) and/or use vset to make the key ones even higher.
Sorting out the relative impact of the boosts is not always easy - we probably need debug output.
@rfay - ya, I agree. Or at least we should have a set of scenarios describing input and expected results (or at least type of results) so we understand what is wanted.
Comment #31
rcross
[#1727576: Case sensitive search causes bad results.]
Comment #32
drumm
The current Solr config is on any dev site, https://drupal.org/node/1018084. All dev sites are set up to connect to a full read-only Solr index, so debugging can be turned up and configuration changes previewed.
(This likely isn't an indexing problem. If it is, we can spin up a r/w Solr index.)
Comment #33
pwolanin
So, davidhernandez and I found one possible problem that makes Views module in particular hard to find. This is sort of a bug in project_release module
http://cgit.drupalcode.org/project/tree/release/project_release.module#n982
function project_release_node_view() should not append links - especially not 'View all releases' to the node if the view mode is 'search_index'
This View string is matching every project when someone looks for Views!
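A toy illustration of why that happens: once tokens are stemmed, "View" and "Views" collapse to the same term. The stemmer below is a deliberately naive stand-in written for this example (it is not the stemmer Solr uses), and the `protected` argument mimics the idea of a protected-words list that exempts specific terms from stemming.

```python
from typing import FrozenSet


def naive_stem(token: str, protected: FrozenSet[str] = frozenset()) -> str:
    """Strip a plural 's' unless the token is protected (toy stemmer, not Solr's)."""
    word = token.lower()
    if word in protected:
        return word
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# Without protection, a query for "Views" matches the link text "View all releases".
query_term = naive_stem("Views")
link_terms = [naive_stem(t) for t in "View all releases".split()]
collides = query_term in link_terms

# With "views" protected, the query term no longer collapses to "view".
protected_term = naive_stem("Views", frozenset({"views", "ctools"}))
```

Removing the link text from the search_index view mode and protecting project names from stemming attack the same collision from two different sides.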
Comment #34
pwolanin
re: #27 How about excluding modules that don't have a 7.x release and e.g. forum topics older than a certain age?
For integrating BDD testing, if you want a more readily parseable search result, you can take a look at what I did at: https://www.drupal.org/sandbox/pwolanin/2134321
That was for a POC of integrating with https://quepid.com/secure/#/
Comment #35
pwolanin
Using commongrams might also help since you could increase the number of stop words: http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-comm...
Comment #36
drumm
To really improve this, we need to be a bit more methodical. I started a Search / Expected result table in the issue summary. Having a summary will be much easier than reading through 30 comments. Later, we can automate BDD testing or even use something like https://quepid.com/secure/#/.
Comment #37
webchick
Adding a bit more info to the issue summary for "outsiders."
Comment #38
webchick
Also, tweeted here: https://twitter.com/webchick/status/480745059164233729 This seems like something that should also be broadcast via @drupal or similar channels to get better visibility.
Comment #39
pwolanin
Comment #40
pwolanin
Comment #41
pwolanin
Comment #42
pwolanin
Comment #43
Dave Reid
Comment #44
Dave Reid
Comment #45
Dave Reid
Comment #46
drumm
Comment #47
drumm
#33 is implemented and re-indexing. The result count at https://www.drupal.org/search/site/views is gradually going down as projects are indexed.
It looks like the examples so far are in 2 categories - documentation & modules. Documentation so far is okay, we can use it to set up automated monitoring. Modules are where work can be concentrated.
Comment #49
mgifford
@drumm - I'm sure this is done indexing. Can you confirm that the solution from #33 is implemented?
Views still isn't on page one of this search https://www.drupal.org/search/site/views
Can't we just make a given search (like "Views") show a given result at the top. Something like what Defacto Search offers. There must be similar solutions to look at.
I've checked that the summary is still accurate.
Comment #50
drumm
#33 is implemented. That comment was about removing irrelevant results for "views" because "View all releases" was being indexed all the time. It was not about making sure Views was showing as high as it should be.
As I said in #36, this needs systematic improvement. If we artificially pin the 9 examples in the issue summary to the top, we have not solved the problem.
Comment #51
webchick
So what are the next steps then, and how can we help? This was one of the top 3 pain points in the ideation process last year, it came up in user research, it grates on contributors who use the site every day, etc.
Comment #52
mgifford
We have to be able to take small, iterative steps to improve drupal.org. So much of the time it seems issues like this sit for years (this one is coming up to 5 years now), while folks wait for a perfect solution. Perfect is the enemy of the good. We might be able to wait for perfect for some elements of Core, but drupal.org isn't like that and we have to be able to experiment.
Good to know that #33 was specific about that phrase.
We can start right now to implement those 9 examples. We can look at the server logs to determine common searches and ensure we've got good results for the top 50 requests. We can look out to common themes & modules and ensure that if there are child projects that we always pin the parent as the top result. We can ask the community for examples of what they think should be a logical first hit for common requests that they make.
Leaving it as it is though is just a really bad option. It discourages everyone.
If we have a means to fix these first few known issues, we can then set up a pattern to repeat it.
This is a solvable problem. Others in the community have dealt with it before. Let's not be afraid to try something out on this old issue and see if we can't improve it.
Comment #53
Nick_vh
I suggest as a small iterative step to install apachesolr_proximity. (https://www.drupal.org/project/apachesolr_proximity)
I highly suggest we install this on drupal.org and take some example searches and see what results they have.
As a follow up, we should most likely add the module name as a separate field, use it to search on and boost that field. We also need to look at the search_index display mode and make sure there is no useless information in there that we don't want to appear in the index.
Secondly, I suggest we take a look at the Quepid platform. We can use this to quantitatively check that the tweaks we make to, for example, synonyms and protwords.txt, but also other config files, do not break the expected results. A presentation on the topic can be found here: http://www.slideshare.net/DougTurnbull2/test-driven-search-relevancy
and more information here: http://opensourceconnections.com/blog/2013/10/07/quepid-give-your-search...
Comment #54
drumm
We have the machine name as a separate field, field_project_machine_name. The human-readable name is the project node's title property.
The "Needs issue summary update" tag still applies. Potential improvements should be in the issue summary, so they don't get lost in the comments again. (As well as more example searches.)
Comment #55
webchick
If more human input is needed in order to fix this, what about adding a link to this issue on all search results pages saying "Were these results helpful? Help us improve here: XXX"
Else it seems like it'd be possible to look at analytics for this. Figure out the top N search terms (https://www.drupal.org/admin/reports/search has some data but it seems to be totally wrong; maybe it's only user search data?), and add them to the table and let people fill in the blanks, maybe?
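A minimal sketch of the "top N search terms" step, assuming the raw query strings have already been exported from analytics or a search log into a plain list. The `top_terms` helper and the sample queries are invented for illustration; no particular export format is implied.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_terms(queries: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Count queries after lowercasing and collapsing whitespace; return the top n."""
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)


# Hypothetical sample of logged queries.
sample = ["Views", "views ", "ctools", "   ", "Views", "webform", "ctools"]
top_two = top_terms(sample, 2)
```

Normalizing case and whitespace first keeps "Views" and "views " from being counted as different searches, which matters when the list is used to pick test cases.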
Comment #56
mgifford
@webchick I think we need either of these to be installed/enabled to get valuable analytics:
https://www.drupal.org/project/search_log
https://www.drupal.org/project/apachesolr_stats
That would help with the most common problems. Particularly if it were reviewed every few months.
Like the idea of a webform that would allow us to track what folks searched for and what they had hoped to find. That would be a big step forward and allow us to take on some of the long tail searches.
Comment #57
joshuami
Comment #58
webchick
Got annoyed again today trying to use Drupal.org search, so taking my own suggestion at #55, here are the top search terms on Drupal.org, at least according to Google Analytics (Drupal's "top searches" report is useless because Solr searches bypass it):
Interestingly, every single one of them corresponds to a module name, with the exception of "responsive" (most likely). And yet:
- https://www.drupal.org/search/site/views - Views module is nowhere on the first page. (The first result is "View" module.. talk about confusing.)
- https://www.drupal.org/search/site/ctools - CTools module is at least on the list, but not until about halfway down. Also for whatever reason the description is truncated towards the end? so it says:
...which would never present itself to me as a new user as the thing I was supposed to click on. Another swing and a miss.
- https://www.drupal.org/search/site/webform - Webform module nowhere on the first page.
- https://www.drupal.org/search/site/bootstrap - First result is Twitter Bootstrap 3.0 which you'd think was good, except when you look and find out it's a) an unapproved sandbox project last touched in 2013 and b) actually a theme engine, not a theme. Presumably people coming here want https://www.drupal.org/project/bootstrap which like Ctools is about mid-way down and has a weirdly truncated description.
- https://www.drupal.org/search/site/ckeditor - Halfway down the first page.
- https://www.drupal.org/search/site/entity - Entity API module nowhere on the first page. One could argue that maybe they're looking for documentation about entity instead of the module, but all 10 results are all theme/module projects so the results still don't help.
- https://www.drupal.org/search/site/commerce - At least Commerce Kickstart is there, once again about halfway down the page, but Drupal Commerce is nowhere to be found.
- https://www.drupal.org/search/site/responsive - This one's difficult to gauge what people are looking for, so I can't really tell if these results are useful or not. I'm guessing what they're actually looking for is more like a resource guide on Mobile, tho.
Anyway, the bottom line. This isn't a feature request, and this isn't "some" searches. Drupal.org's search is just flat-out busted for all of the most common searches, as far as I can tell.
I don't know anything about configuring Solr, but if it's possible to say "if the search term is an exact match of a project short name, shoot it to the top of the list" that would certainly help.
Comment #59
webchick
It's probably also worth noting that you need to drill down all the way to search term ~34/35 before you start seeing terms like "theme" and "view" that are not well-known project names, where people are more likely starting to look for "documentation about X" versus "project X." The next such search term isn't until "seo" in #59. Basically, people come to Drupal.org looking for modules/themes. ;)
This seems to be further evidence of the importance of #1243332: Deploy Project Browser Server and drupalorg_pbs on d.o, and that we should definitely try and target this for a minor release of Drupal 8.
Comment #60
joshuami
It's on the list. We can probably fit in some short term fixes to address the most egregious errors in the search results. As we roll out new content types from the content strategy work, that will be a great opportunity to drop in some new Solr configuration.
Comment #61
webchick
Another example that came up tonight.
https://www.drupal.org/search/site/acquia
Expected to find: Acquia's organization page: https://www.drupal.org/marketplace/acquia
Instead found: a series of job postings from 2014. :) Then some themes. Because there are no facets for organizations you are kind of SOL. Luckily it does appear on the first page of results, just like 3/4 down.
The pattern here seems to be "if it's an exact title match, make it the first thing in the list."
Also just a note that https://www.drupal.org/roadmap/search says "Expect more to come in early 2015." So this should probably be updated with a new ETA.
Comment #62
joshuami
The initiative to improve search has been driving me a bit batty. While it is on the roadmap, we have not been able to give it much time.
A few weeks back, we had a developer I used to work with on large library websites, @bob-tricoski, come in and give us a crash course on steps we could take to make Drupal.org search better. Today, I did a little additional research to help me get my head wrapped around the "views" results that @webchick pointed out. I also tried to take a lot of what Bob covered with us and turn it into something that could be molded into an actionable plan.
TL;DR = Search is hard to configure well on sites with lots of specific jargon and millions of "documents". (Documents is a Solr term that roughly equates to entities or rendered pages/paths.)
These 7 changes will fix some—but not all—of the issues with our primary search on D.o. The next step will be turning this into a work plan.
1. We rely on bias too much. By weighting things by a bias, we are making assumptions that may not always be correct. For instance, we bias "Drupal Core" projects... but there is really only one and we should use elevation for that. Another example is that organizations are ignored from biasing, but they would be great to bias because of the likelihood of unique strings. The bias we currently have is only a little off so tweaking those settings is relatively low risk.
2. We have a lot of words that should not be stemmed by default. Views and Ctools both end in an "s". So Solr sees those as "View" and "Ctool". The good news is that Solr can deal with this if we update protwords.txt. This protected words file will remove certain words from the stemming filter. The catch is that we will need to schedule this as a reindex event. (It takes a while to reindex D.o.)
We probably need to look at a list of all modules, and key jargon specific to Drupal, and include them in the protwords.txt if they end in "ing", "er", "s" or "ed". That would immediately make results better for a lot of our modules.
3. We need to carefully add some elevations. "Drupal Core" should likely be elevated to have the Drupal Core project as the first result. This would be a better alternative to biasing the Drupal core content type. There may be some other exact match searches that we simply need to make give better first results. This should be limited to really important words that essentially need sponsored elevation to the top spot.
4. We should roll back our custom facets and let the Solr module do this for us. In digging through the code, I found that we have facet blocks that are enabled on our search that are not displaying on the search page. This is a bunch of custom code so that we can combine facets like we do with the "or filter by…" block. That means we are leaving out content types—like organization and case studies—from our facets.
This one is a little tricky as we have some content types that we don't really expect people to search for, such as "theme engine" and "book page". (We have 17 content types—two of which are essentially deprecated.) The content model currently proposed in #2481519: [META] Content Model for Drupal.org will address this a little so that "documentation" will be a real content type rather than a use case for book pages. That content model is also going to make the list of facets longer.
As a side note, just turning on issue tags as a facet option would really give some cool results for tracking down issues that are hard to find.
5. We need to carefully add some synonyms. Very carefully. This will also require a re-index, but synonyms can be a great way to tie together jargon that is very site specific. Drupal has a lot of site specific jargon. The danger with synonyms is that it can hide results for incorrectly associated words. I'd love some feedback on the best way to group edit a list of synonyms. We can't change this file very often because of the reindex required, so we need to get it as close to correct as possible on the first go.
6. We need to make exact match score higher than contains. Exact match in Solr is a bit of a deeper dive. According to the Solr wiki, we need to:
Ironically, this would help with searches for terms like "apachesolr"—we might actually get back the apachesolr module with that search.
We could take this a step further and bias—I know I said we had to be careful with this—a project shortname field to be more important than even title. (That might cause some weird outcomes though.)
7. We should index a couple of important views and taxonomies. By default, the Solr module does not index term pages. As we add topic pages and possibly issue tag page views, we will need to include those into the index if we want them to show up as a search result. Likewise, a view page display—or really any view—does not get indexed. There are ways to add specific "documents" into the Solr index to help get better results for these non-node things.
There are some other tweaks we could make, but this list would go a long way to making many of the results closer to what we expect. Most of these changes are to configuration files in Solr, so I'm leaving this in infrastructure for now.
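For readers unfamiliar with elevation (item 3 in the list above), the idea can be sketched as a small lookup table that pins chosen results to the top for specific queries, leaving every other query untouched. The table contents and the `apply_elevation` helper are hypothetical and only model the concept; in Solr itself this would be handled through its elevation configuration rather than application code.

```python
from typing import Dict, List

# Hypothetical elevation table: normalized query -> URLs pinned to the top.
ELEVATIONS: Dict[str, List[str]] = {
    "drupal core": ["https://www.drupal.org/project/drupal"],
}


def apply_elevation(
    query: str,
    ranked: List[str],
    elevations: Dict[str, List[str]] = ELEVATIONS,
) -> List[str]:
    """Move any pinned URLs for this exact query to the front of the ranking."""
    key = " ".join(query.lower().split())
    pinned = elevations.get(key, [])
    rest = [url for url in ranked if url not in pinned]
    return pinned + rest


organic = [
    "https://www.drupal.org/node/1",
    "https://www.drupal.org/project/drupal",
    "https://www.drupal.org/node/2",
]
elevated = apply_elevation("Drupal  Core", organic)
untouched = apply_elevation("views", organic)
```

Because every entry is a hand-maintained exception, the table is best kept to a handful of queries, which matches the "sparingly" caveat in the issue summary.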
Comment #63
webchick
Wow, that looks like a great list! Thanks for digging into this one. I realize it's further down the roadmap but nonetheless it's one of those things that erodes the d.o experience for all target audiences, so great that a plan is being put together.
Comment #64
mgifford
Here's another one. Searching for a user's name should result in a quick and easy link to their user profile with either of these:
https://www.drupal.org/search/site/Angie+Byron
https://www.drupal.org/search/user/Angie+Byron
Comment #65
tvn (Drupal Association)
Comment #66
basic (Drupal Association)
I am moving this to Drupal.org customizations, because as far as the infrastructure is concerned the Solr servers have been functioning without issue. This issue is related to optimizing the Drupal.org search functionality and the customizations that are required for this.
Comment #67
drumm
Comment #68
drumm
I'm pulling specific solutions into child issues as we tackle this:
Comment #69
drumm
Acquia Slate ranks artificially high because More comments is biased to 6 and the theme has comments. I'm not even sure why a theme has comments and am removing them, #2558859: Remove comments from project_(module|release|theme) nodes.
If I bias the Organization type the same as module/theme/etc on dev, Acquia the organization does beat out everything except Acquia Slate.
For biases, we should consider:
Comment #70
drumm
Those comments are now closed and unpublished. I also took the liberty of removing the jobs on groups from the index.
Comment #71
drumm
Tracking a bunch of searches will really help make sure we're making progress without bad regressions. I made a little script to track these searches and highlight the interesting results: http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php.
The initial set of searches is from the table on this page, plus a few added by myself and hestenet. Patches to fill out this list are welcome.
Comment #72
drumm
I deployed #2566587: Make machine name Solr bias configurable and went ahead and configured the project machine name bias up to 8, the lowest that was really effective along with my current test settings for #2566617: Review our site search biases. Until the boosts are generally reset, this won't be completely effective, but there are already some good results:
This bias, unlike the path alias bias, only takes effect if there is a complete, exact match. So there is no effect on the ranking of non-matching projects, the only possible downside is if a module's short name is the same as a common 1-word search that has a better non-project result.
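The all-or-nothing behavior described here can be sketched as follows. The boost value of 8 comes from the comment above; the additive combination with the base score, the document fields, and the sample scores are simplifying assumptions for illustration and do not reflect how Solr actually combines the bias with other factors.

```python
from typing import Dict, List


def machine_name_boost(query: str, machine_name: str, boost: float = 8.0) -> float:
    """Return the boost only when the whole query equals the machine name."""
    q = query.strip().lower()
    return boost if q and q == machine_name.lower() else 0.0


def rerank(query: str, docs: List[Dict]) -> List[Dict]:
    """Sort documents by base score plus any exact machine-name boost."""
    return sorted(
        docs,
        key=lambda d: d["score"] + machine_name_boost(query, d.get("machine_name", "")),
        reverse=True,
    )


# Hypothetical base scores: "View" outranks "Views" until the exact match is boosted.
docs = [
    {"title": "View", "machine_name": "view", "score": 5.0},
    {"title": "Views", "machine_name": "views", "score": 3.0},
    {"title": "A forum post about views", "score": 4.0},
]
reranked_titles = [d["title"] for d in rerank("views", docs)]
```

A partial match such as "views module" earns no boost at all, which is why this bias cannot disturb the ordering of non-matching projects.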
Comment #73
drumm
#2558663: Favor exact title matches in site search & #2566617: Review our site search biases are now in production.
The searches I'm currently tracking are: http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php. Of those, we have some good improvements:
coding standards:
#1 https://www.drupal.org/project/coding_standards - it isn't a bad result, but it isn't ideal
#2 https://www.drupal.org/coding-standards
installation guide:
#1 https://www.drupal.org/documentation/install
glossary:
#1 https://www.drupal.org/glossary
rules:
#1 https://www.drupal.org/documentation/modules/rules
#2 https://www.drupal.org/project/rules
draggableviews:
#1 https://www.drupal.org/project/draggableviews
#3 https://www.drupal.org/node/283498 - documentation about the module
zen:
#1 https://www.drupal.org/project/zen
#8 https://www.drupal.org/documentation/theme/zen
views:
#2 https://www.drupal.org/project/views
apachesolr:
#1 https://www.drupal.org/project/apachesolr
apache solr:
#51 https://www.drupal.org/project/apachesolr
media:
#5 https://www.drupal.org/project/media
#9 https://www.drupal.org/resource-guides/managing-media
redirect:
#1 https://www.drupal.org/project/redirect
#63 https://www.drupal.org/project/path_redirect
xml sitemap:
#7 https://www.drupal.org/project/xmlsitemap
ctools:
#12 https://www.drupal.org/project/ctools
core:
#64 https://www.drupal.org/project/drupal
drupal core:
#1 https://www.drupal.org/project/drupal
drupal:
#1 https://www.drupal.org/project/drupal
tag1:
#1 https://www.drupal.org/marketplace/tag1consulting
mediacurrent:
#1 https://www.drupal.org/marketplace/mediacurrent
acquia:
#17 https://www.drupal.org/marketplace/acquia
drupal geeks:
#16 https://www.drupal.org/node/2013897
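One possible way to boil a list like this down to a couple of numbers that can be compared between configuration changes is sketched below. The metrics (mean reciprocal rank and the share of targets on the first page) are a suggestion and are not something the tracking script is known to compute; the page size of 10 is an assumption, and the ranks are a subset of the project and organization results listed above.

```python
from typing import Dict


def mean_reciprocal_rank(ranks: Dict[str, int]) -> float:
    """Average of 1/rank over all tracked searches (1.0 means always first)."""
    return sum(1.0 / r for r in ranks.values()) / len(ranks)


def first_page_rate(ranks: Dict[str, int], page_size: int = 10) -> float:
    """Fraction of tracked searches whose target appears on the first page."""
    return sum(1 for r in ranks.values() if r <= page_size) / len(ranks)


# Rank of the target page for a few searches, taken from the list above.
tracked = {"views": 2, "apachesolr": 1, "ctools": 12, "acquia": 17}
mrr = mean_reciprocal_rank(tracked)
on_first_page = first_page_rate(tracked)
```

Mean reciprocal rank rewards moving a result from #2 to #1 much more than moving it from #64 to #51, which fits the goal of getting the best result into the top position.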
Comment #74
dddave
Glad to see we are making progress with our broken search. However, it seems to me that we are breaking parts of search that were working before. I often use search to find old Planet issues. This usually works best when using a Planet feed's full url (i.e. the url the user submitted for the Planet) in the search for the webmaster or content queue (never worked on general site search). This no longer works.
Issue queue search in general seems to be broken (or slow at indexing) because the search in the attached picture returns nothing. The issue in question at the time of search was well over six hours old.
Comment #75
drumm
This issue hasn't touched issue queue search, which uses a separate index; and there hasn't been other work there lately. https://www.drupal.org/project/search_api_db and our configuration is what we have to work with there. Searching with punctuation there is tough, since it doesn't (currently) have a great way to know the end of a sentence from part of a URL, if I recall correctly. Issue search's index is updated immediately.
Now that we're on Solr 5, we have more of an option to switch to Solr for issue queue search. However, our setup still waits to index on cron, and Solr still takes up to ~2 minutes for the index to update. People would rightfully get antsy if https://www.drupal.org/project/issues/search/drupal?status[0]=1&status[1... took up to 12 min to update. However, I hear the options to make the updates near-instant are a whole lot better in Solr 5 than 3.
Comment #76
drumm
That said, I think https://www.drupal.org/search/site/barnettech.com?f[0]=ss_meta_type%3Afo... might be better.
I do want to get some multi-word searches into our test cases, so we can try out apachesolr_proximity.
Comment #77
dddave
Thanks for the clarification. I'll keep an eye on it, but I feel like this is regressing.
Comment #78
hestenet
@dddave - any specific examples you can provide of searches, single or multi word, and what the ideal results should be would be very helpful. We can add those examples into drumm's script for evaluating whether search has improved: http://cgit.drupalcode.org/infrastructure/tree/Misc/site-search-test.php
For the 20 or so kinds of searches we're tracking right now (and we tried to be representative with the types of searches) we're seeing some strong improvements. But more eyeballs will help.
Comment #79
dddave
@hestenet My "issues" are not with site search, which is what this issue is about, isn't it? If I notice issue queue search going downhill I will create a new issue.
Comment #80
hestenet
Comment #81
hestenet
Comment #82
hestenet
Comment #83
kristofferwiklund
I have added an issue for user searches.
Comment #84
hestenet
Comment #85
pale177
Were you guys able to incorporate Google Analytics into the search results?
We recently moved from GSA to Solr and users have been complaining all month long. I have the results biased by 1.Title 2.URL and 3.Content
We have over 75000 basic pages, news articles and profiles. The news articles are the worst, since they have titles that contain "human resources" and they push our HR page way off. Other pages like the xyz.com/president page are also pushed back in the results due to the news articles containing the word "president" a lot of times in the title and content.
I am guessing GSA was referencing page rank somehow; that's the only piece missing from this puzzle.