Problem/Motivation
This issue attempts to solve a couple of inter-related problems with the Search module:
1. Submitting a search form (either from the Search block or the Search page) is currently redirecting you to a page with URL "search/(specific_page_URL)/(keywords)". This is not great because:
- If you submit the form again, you have a mixture of the previous keywords and the new submission, which caused various problems, including sometimes not being able to get rid of a problematic search submission.
- It wasn't a very standard URL pattern. Most search engines use GET parameters for the keywords.
- It causes problems if you try to search for something like "../../admin" and that gets into the URL.
2. When submitting a new search from a search results page, you ended up doing the old search query first, and then the new one, leading to inefficiency and sometimes carrying over error/warning messages from the previous search to the new results page.
3. The SearchQuery extender was calling drupal_set_message() and form_set_error(). form_set_error() calls would not work anyway (it is not a form and doesn't know what form the error should go to), and other query extenders do not call drupal_set_message(), so this is non-standard behavior. So it needs to have a different method for passing information back to users of the search query that there were problems parsing or executing the search query.
The warnings and errors are as follows:
a) In parseSearchExpression() if someone searches for something like "cat or dog" instead of using "cat OR dog", they get a warning message.
b) In executeFirstPass() it calls form_set_error() if there are no "positive keywords" (as opposed to "negative keywords" that are excluded) to search for, and aborts the query. We need to fix that, but see also #1126688: Allow users to use advanced search without keywords entered -- this message is not actually desirable in all cases, such as an advanced search.
c) In executeFirstPass() it calls drupal_set_message() if there was a large number of search terms (to prevent DOS attacks). The query can still work in this case, but the user needs to be warned. (Came from #1265946: DoS against core using a large number of OR search query terms.)
As a note... These issues may not seem related, but it proved to be impossible to fix them separately, so they were combined into one issue.
Proposed resolution
1. Change the redirection so that instead of going to "search/(specific_page_url)/(keywords)" we instead redirect to "search/(type of search)" with a GET param named "keys", and other GET parameters as needed for certain search plugins. See also #894486: Use the query string for search keys rather than appending them to the URL.
2. SearchQuery needs to not call any warning/error/message functions. Instead, it should keep track of query "status" and provide a method for users of SearchQuery to check and see what happened after a search query, and generate the appropriate messages itself. This will also take care of #1789768: search_box_form_submit() improperly calls form_set_error().
3. We need to be more flexible than in the past about generating the "Please enter some keywords" message, because in a Node advanced search, keywords can come from the main keywords field or some of the "advanced" sections (where you can enter "or" words and phrases). But we still need the message if no keywords or advanced stuff is entered at all. We have a searchIsExecutable() method for search plugins, which was set up for this exact purpose; we need NodeSearch to implement this method. See #2170411: Please enter some keywords message has vanished and #1126688: Allow users to use advanced search without keywords entered, which are fixed by this issue's patch.
[As a note, it turned out that there wasn't really a good way to fix #2 and #3 here without also fixing #1. We needed to separate out the query, the plugin, the page, the form, and the messages properly. This has since then became the main goal of this patch.]
4. The SearchQuery class methods and documentation needed to be somewhat overhauled, after which we probably won't need to do any more in #582938: Fix up documentation and member names in SearchQuery, and it may also be possible to take care of #1435834: Cannot alter search queries, tag added too late during the overhaul.
Remaining tasks
a) Get the patch fully working and reviewed. As of the patch in #34, it works when tested manually and all the Search module tests also pass. [Patch still needs a review.]
b) Add tests for all the Related Issues that are supposedly fixed by this patch, to verify that they are in fact fixed. [This has been done]
c) Note that there is one @todo added to in the SearchBlockForm::submitForm() -- but it's been filed as a separate issue and it was not introduced by this patch: [This was removed because SearchBlockForm was changed to use GET so it could directly go to the search page instead of using submit/redirect.]
#2174521: Page processing order: form submits vs. main page content is inefficient if form submit is a redirect
d) Commit patch and update the change notices (see the UI changes and API changes "Notes" sections for instructions).
User interface changes
a) The URL for searching when you submit keywords is changing from (page url)/(keywords)?(additional query params) to (page url)?keys=(keywords)&(additional query params). This is more in line with standard industry practice for web searches.
b) All the error and warning messages from searching are generated with drupal_set_message and not form_set_error, so they might look slightly different.
c) Some warning/error messages that weren't working in 8 due to regression bugs will reappear.
d) The form element in the Search block form where you type in your keywords is changed from having its HTML 'name' attribute being 'search_block_form' to being 'keys', and the search block form submits directly to the default search page with GET instead of using a redirect in the submit method (so its 'action' and 'method' attributes changed).
Change notice note
There is a draft change notice that covers (a) and (d), plus some other UI changes that are described on the previous change notice (https://drupal.org/node/2083471 -- I thought it wouldn't hurt to repeat them, since that one is mostly about the hook-to-plugin conversion):
https://drupal.org/node/2231633
This change notice will need to be published when the patch is committed.
API changes
(See also the UI changes, which may affect themers and modules that use form alters on the search form.)
a) Users of SearchQuery will need to check query status using the new getStatus() method and warn the user accordingly, instead of having the messages generated in SearchQuery itself.
b) Search plugins using the searchFormAlter() method to add fields to the search form (e.g., for advanced search) no longer need to define their own form submit, and in fact are discouraged from doing so. Instead, they should implement the new buildSearchUrlQuery() method. Search plugins that do not use searchFormAlter do not need to change in any way.
c) The misleadingly-named mostly-internal executeFirstPass() method in SearchQuery is renamed prepareAndNormalize(). After this patch, there are actually no core calls to this method, and there are probably not going to need to be contrib calls to it either.
Change notice note
Items (a) and (c) are included in draft change notice: https://drupal.org/node/2231623
This change notice will need to be published when the patch is committed.
Item (b) needs to be added to the existing change notice https://drupal.org/node/2083471 . This change notice didn't explicitly mention searchFormAlter() before. I think the easiest thing would be to copy in the new NodeSearch class after this patch lands (the whole class is included in the change notice). And then add this to the long bullet list:
Search plugins can use the plugin methods searchFormAlter() and buildSearchUrlQuery() to alter the search form, for "Advanced" searching or other parameters (previously they would have needed to use hook_form_alter() or hook_form_FORM_ID_alter() to do this).
These edits (pasting in the new NodeSearch class and adding this bullet) will need to be done after the patch is committed.
| Comment | File | Size | Author |
|---|---|---|---|
| #114 | search-1366020-114.patch | 73.22 KB | jhodgdon |
| #114 | interdiff.txt | 2.91 KB | jhodgdon |
| #102 | search-1366020-102.patch | 73.8 KB | jhodgdon |
Comments
Comment #1
jhodgdonI took a look at this today, and am updating the issue summary with some additional information and a proposed resolution.
Comment #2
jhodgdonI'm going to work on this one.
I think in the process I'll most likely end up resolving
#2170411: Please enter some keywords message has vanished
#1126688: Allow users to use advanced search without keywords entered
#1789768: search_box_form_submit() improperly calls form_set_error()
because I'll need to redo how we handle errors and warnings in searching. I'm going to postpone all of these until this is done.
I might also end up taking care of
#582938: Fix up documentation and member names in SearchQuery
which is already postponed on this and other issues.
Comment #3
jhodgdonIn working on this issue, I realized that I cannot get it to work properly without also fixing #894486: Use the query string for search keys rather than appending them to the URL as well as the issues above, so I'm also postponing that issue on this one.
Comment #4
jhodgdonOh, and the reason for needing to fix that other issue is that the redirects that the search module is doing mean that NodeSearch is not even accessed in many cases. We need to get rid of the redirection that is happening in the search module, which is currently redirecting the search to (search URL)/(keywords), in order for NodeSearch to handle the search properly and do its own error handling.
Comment #5
jhodgdonHere's a patch, which seems to work on my test site. I'm assuming that some of the tests will need updates, because this is a pretty major refactoring (including changing URLs for searches), but I'm going to go ahead and upload the patch so that anyone interested can look at it, and then go work on the tests.
Here are some notes about this patch, which I'll also summarize in the issue summary:
- The search forms (block and page) are now redirecting to the search page using URL (page url)?keys=(keywords)&(additional query params), instead of (page url)/(keywords)?(additional query params). This takes care of #894486: Use the query string for search keys rather than appending them to the URL.
- You will get a "Please enter some keywords" message if the plugin's isSearchExecutable() method returns FALSE. For UserSearch, that will be if there are no keywords in the main search keywords box. For NodeSearch, if some advanced search params are chosen, you will not get this message, and it will proceed with processing the search query, which can read words from some of the "advanced" section of the form. Then after processing, if you didn't enter either some sufficiently-long "or" words or a phrase or keywords, you'll get a message about that. And you'll also get a warning if you use "or" instead of "OR" in your search. This takes care of #2170411: Please enter some keywords message has vanished and #1126688: Allow users to use advanced search without keywords entered. Part of the fix was overriding the default isSearchExecutable() method in NodeSearch and using that to decide if a search can proceed or not, since that is what it was intended to do.
- Unlike before, these messages are generated from the NodeSearch class or the page controller, and not from SearchQuery and not in either of the form submit methods. This takes care of #1789768: search_box_form_submit() improperly calls form_set_error().
- I cleaned up the SearchQuery class and its documentation, fixing #582938: Fix up documentation and member names in SearchQuery, since a lot of it didn't make much sense with the new paradigm, and I also needed to add some new stuff for keeping track of "status" of the query and letting NodeSearch or other users of the query check the status after querying. In the process, I also moved the setting of the query tag into the searchExpression() method, since all the information needed for the tag would be there at this point. This may fix #1435834: Cannot alter search queries, tag added too late
- There is one To Do in the SearchBlockForm::submitForm(). In the case where you are already on a search results page and you search using the search block, you will end up building the main page content first (including search results from the previous search that was executed), and then the block form submit is processed, triggering a redirect, which then builds the new page, including search results from the new search. This is obviously bad -- terribly inefficient -- but I am not sure how to prioritize. We need the SearchBlockForm submit to be processed before the main page content is generated. Is that possible to accomplish?
I also want to thank berdir and larowlan for helping me figure some of this stuff out yesterday in IRC. It was at berdir's suggestion that the SearchPageForm stopped being an EntityForm per se and became just a regular Form, with the search page ID being passed in, and it was also berdir's suggestion to make the new Search access class.
Comment #7
jhodgdonWell, that's about what I expected. 16 of the Search tests had failures, and 9 passed. Probably 90% of it can be fixed by fixing the URLs from search/node/(word) to search/node?keys=(word). I'll take a look but possibly not until tomorrow.
Comment #8
tim.plunkettPlease use $this->l().
This change is not okay.
This needs to be an entity.
#2166863: Add a formBuilder() method to ControllerBase
Please use $this->l()
This is just the one method. I'll be back for the rest.
Comment #9
jhodgdonYes, I'm going to reverse that change that made the SearchPageForm not any more an EntityForm. That was done during the iterations I went through to get this working, and then I reversed the change I had done that made that necessary, so it's no longer necessary. I agree it should still be an EntityForm. I'll take care of that. Might as well not review the rest until I make that change.
Comment #10
berdirI suggested to make it a normal form because from what I saw in that discussion, an entity form didn't seem like a good match, we're not editing an entity and the entity is not passed in as an argument. Looked as a lot of what makes up an entity form behavior would have to be overwritten so you end up with more code to overwrite it again than the necessary glue code to use a normal form.
If you can get it working, fine with me ;)
Comment #11
jhodgdonThanks for clarifying that Berdir! Yeah at the point where we were discussing things in IRC on Monday, I was trying to get it to work without using a page controller, and it being an entity form was a pain. But I found I needed to put the page controller back in the loop, so the entity form is no longer such a pain, and I think putting it back to an entity form now will allow me to take the access controller back out of the equation and also to skip the entity load calls.
I also woke up this morning with a lot of ideas on how to simplify this patch further. Don't bother reviewing what's there now... I'm going to overhaul it today.
Comment #12
jhodgdonOK, here's a new patch:
- I made SearchPageForm back into an entity form, with corresponding changes in the Controller and Route generator. The net change to the Route generator is just that it doesn't have /{keys} in the route.
- I didn't need the custom access service any more.
- I added a method to the plugin interface (SearchInterface), to generate the URL query from $form_state. I felt this was cleaner than making NodeSearch have its own form submit handler -- instead it just generates the URL GET query parameters it needs, which are what it gets in setSearch(). So now the redirection is handled in a uniform way by SearchPageForm. No changes were needed to UserSearch (it just uses the default from the base Plugin class).
I still haven't fixed up the tests but I thought it would be best to do it in two incrementals. I have tested this one manually and it's working fine. The next step will be to fix the URLs in the tests and get them to pass.
Also note that I made an interdiff but it doesn't show that the Access class is no longer in the patch. Reading the new patch directly may be easier than an interdiff anyway, because a lot of the interdiff is reversing changes that I now am not making at all.
Comment #13
jhodgdonMinor stuff -- sorry forgot to do another git add and my git diff --cached didn't get these into the patch file.
[EDIT] Ignore the .htaccess stuff in that interdiff. I can't seem to do anything right this morning. :)
Comment #16
jhodgdonOK, I have made some changes to the tests. They all now pass locally, so hopefully the bot will be happy as well. Here's a new patch and an interdiff.
Notes:
- Most of the fixes were changes to the URLs in tests -- we don't put keywords at the end of URLs any more; instead it's a GET keys= parameter.
- SearchKeywordsAndConditions - I took out a few lines because search results are no longer generated unless keys= is in the GET so that part wouldn't work any more, and now because keys= is in the GET one line above that had to be changed where it compared a print_r with what was on the screen.
- A functionality bug uncovered by one of the tests: the page title of the Search pages had changed from being "Search" to being the tab title. A change to SearchPageRoutes fixed that.
- Another bug: in NodeSearch when it tried to make the "too many and/or expressions" it used $searchSettings instead of $this->searchSettings. Oops.
Comment #19
jhodgdon16: 1366020-search-query-url-refactor-16.patch queued for re-testing.
Comment #20
jhodgdonThe test that failed there was a random bot failure in an Image test, definitely not related to this patch. It looks like all the search-related tests passed. !!
So... I just updated the "remaining tasks" section of the issue summary. I think if we can get this patch to RTBC the rest of the remaining tasks can probably be follow-up issues, but I'm open to opinions on that. One thing that this issue might benefit from is writing tests for all the issues in the summary that it purports to (at least maybe) fix, and seeing if they are in fact fixed by this patch.
And if anyone has ideas about the @todo in SearchBlockForm, I'd love to hear them. The problem there is that if you are on, for instance, search/node and have just done a search, and then you submit a new search by typing keywords into the Block form, you will end up executing queries first for the previous search, and then for the new search. This is not a new problem, actually -- and in fact before this patch you would also do that when submitting a new search from the main search form; it's slightly better now in that the problem only occurs from the search block. The reason is that the main page content is rendering (at the old URL and therefore with the old search query) before the block's form submit happens, which then does a redirect to a different URL, which then renders the new search query and results. Any ideas?
Comment #21
tim.plunkettThis is the only unfortunate change I can see. I'm working to remove path-baed redirects, and this should continue to use route names. Can't the getUri() calls be replaced by checking the current route name?
@todo should be indented two spaces on following lines
Seems unneeded now?
Out of scope, but I don't blame you...
Comment #22
jhodgdonRegarding #20, I talked to dawehner and Crell in IRC and filed
#2174521: Page processing order: form submits vs. main page content is inefficient if form submit is a redirect
Comment #23
jhodgdonRegarding #21:
1. Ummm... The redirects are all using routes. The only thing getUri() is used for is to verify that it's actually a redirect to a new URL (to avoid redirecting to the same URL that you were already on, which is kind of pointless). In both cases, it's doing something like:
So... Oh I see, the RedirectResponse itself is getting the full URL. Yes, I guess I could pass the route and query in there. Not a problem to change, but I still think I need to generate the URL to compare to the current URL and only redirect if they're different.
2-3 - sure, will fix.
4. I can take out the out of scope changes and file a separate issue. I fixed up a few docs comments in classes that I had to look at for this patch... can reverse them even though it's so hard for me to do and put on separate issues if you want. I had to look at all the uses of SearchQuery and I just noticed a few other things... sigh, don't want to kill kittens!
Comment #24
jhodgdonWait though. It looks like RedirectResponse::construct() expects a full URL? I don't see how I can change what the Search forms are doing?
Comment #25
jhodgdonComment #26
tim.plunkett$form_state['redirect'] = new RedirectResponse($url); wouldn't actually work AFAIK.
Just $form_state['redirect'] = $url;
Or what I actually want:
$form_state['redirect_route']['route_name'] = $route_name;
Comment #27
jhodgdonAh, OK. Thanks for the clarification and patience with my lack of knowledge of all the new 8.x APIs. I'll see if I can make that work in the next iteration.
Anyone else have comments before tomorrow and the next patch?
Comment #28
jhodgdonUpdated summary.
I think my next tasks are (a) address review comments in #21, and (b) make some tests for all the issues this patch is supposed to fix, verify that the tests (without the patch) fail, and verify that with the patch, they pass.
Comment #29
jhodgdonLots of details here... summary below
In reading over
#1126688: Allow users to use advanced search without keywords entered
I realized it was not really as related to this issue as I thought (yesterday I changed the title there to reflect what it is actually doing). So I've un-postponed that issue and taken it out of "related issues". I am, however, adding a couple of tests to make sure that what I thought that issue was about doesn't come back (namely, that if you enter OR keywords or a phrase in the Advanced search area, you do not get the "please enter some keywords" message).
I am also creating tests for the following issues, which should be fixed by this patch:
#2170411: Please enter some keywords message has vanished
#2017095: After searching for a too-short word, original search keyword is retained in subsequent searches (which is marked as a duplicate of either #1789768: search_box_form_submit() improperly calls form_set_error() or [#894486)
#1529778: Search with validation error (e.g. small word error) blocks new search from search block form (which is marked as a duplicate of #894486: Use the query string for search keys rather than appending them to the URL)
#890058: Searching for ../../admin takes me to the admin page (ditto)
#1421560: searches beginning with a . lead to apache Forbidden error (ditto)
#1435834: Cannot alter search queries, tag added too late
I've checked over all of these tests carefully on my local box, including the debug output, and they mostly doing what I had hoped.
There is one exception: the test for searching for "../../admin", which didn't fail on my local box although a manual test did illustrate the bug is still there in the current D8. There may be something in drupalPostForm that is doing something weird? Anyway, I also manually tested this search with the patch in #16 applied, and as expected, this patch does fix the problem.
Summary
So... I'm attaching:
a) A test-only patch, addressing all the Related Issues. At least on my local box, this has multiple failures and exceptions in the added tests in SearchBlockTest, SearchPageTextTest, and the new SearchQueryAlterTest. [Note: this is a patch for Core as it exists now. It applies after the patch in #16 only with some fuzz.]
b) I also addressed the issues in #21 and am attaching an interdiff for those changes (I didn't include the new tests in that interdiff file, since they're already in a separate patch file anyway). I also rewrote that @todo in SearchBlockForm and included a link to the issue.
c) A new patch containing both a and b on top of the patch in #16. On my local box, this now passes all of the tests from (a). Hooray! Let's see what the test bot thinks, as well as any reviewers.
And also... Several of the issues resolved by this patch are marked Major so I am upping the status of this one, since it should resolve all of them.
Comment #31
jhodgdonWell, good. The "fail" test failed; the full patch with new tests passed. I think this is ready to go. Any other opinions?
Comment #32
tim.plunkettSo this is a new bit of logic, right? I don't 100% understand why we're adding that.
And what are we actually trying to compare, the path with query string? Or just like /search vs /search/node?
If the querystring doesn't matter, we should just check the current route name and compare it. That way we don't generate the URL just to throw it away.
Comment #33
jhodgdonRE #32: Good question.
Yes, it's a new bit of logic... There are several redirects happening here.
a) The redirect from path search to search/(default search page's URL) is not changed by this patch. That happens in SearchPageRoutes in route search.view when it picks up the 'search' path and reroutes it to the default configured search page by calling \SearchController::redirectSearchPage().
b) Then once you're on a particular search page like search/node or search/user, if you submit the search form that is in the main page body, that is where the code you are looking into happens (at least, the version in SearchPageForm). It needs to pick up the keywords and other stuff from the POST, make it into a GET query, and redirect to the same route (which is now search.view_(search page entity ID)) with the appropriate GET stuff. [Before this patch, it redirected by putting the keywords after the base URL, and the NodeSearch submit function's redirect stuck the other POST info into the GET query.]
I figured there was not a need to redirect if the URL + GET query was the same as the previous URL, because we're already on that page, and a redirect means starting the whole process again. It seemed more efficient. If you object, we can take that out. It's not saving a huge amount of time, because I reordered the order of rendering in the page controller, so at least it's attempting to build the form first, and the redirect (if it happens) comes before it tries to build the search results at least.
c) Then there is nearly identical code in the submit handler for SearchBlockForm. That form could appear on any page on the site -- it's in a block. The code there figures out the route for the default search page, and redirects to that with a GET query that has the search keywords in it.
The block has an extra efficiency problem if your starting point was a search results page, as discussed in #5, #20, #22, and #2174521: Page processing order: form submits vs. main page content is inefficient if form submit is a redirect, because the form in the block is never built (and its submit processed) until the main page has already been built. So in the case of doing the same search again, not doing the redirect because the URL is the same would save you quite a bit of time (if you redirected you would need to do the same search query again that you just did in the first build of the page). So, I think it definitely makes sense to keep the "is the URL different" check in SearchBlockForm, but again if you object we can take that out.
On the other hand, it does make the code for redirecting slightly less efficient in the case that the search is different (which is likely the usual case), because it does have to generate the URL and compare to the current URL. I'm OK with either way -- if you think it's unnecessary I can definitely take the check out.
Comment #34
jhodgdonAfter thinking about this for a few days, I decided that the edge case of someone doing the same search they just did isn't worth the extra logic. We should just keep it simple and always redirect.
Here's a new patch with interdiff.
Comment #35
jhodgdonupdated summary and simplified title
Comment #36
nick_vhSteps to reproduce the problem as a user
a) In parseSearchExpression() if someone searches for something like "cat or dog" instead of using "cat OR dog", they get a warning message. The query can still work in this case, but the user needs to be warned.
Without Patch
1) Install drupal using normal profile
2) Enable Search
3) Create Article content with some random words and include "cat" and "dog". Create 3 articles, 1 with just "cat", 1 with just "dog" and 1 with "cat dog"
4) Search for "cat or dog"
5) no results should show up with the lowercase and a warning is shown.
6) The query fails to present relevant data as it does not detect the lowercase or.
Search for cat or dog => https://www.evernote.com/shard/s44/sh/4604c795-2c97-4af9-9fba-dc46ae04b5...
Search for cat OR dof => https://www.evernote.com/shard/s44/sh/533ebe5f-6e89-4119-b885-f770679ca6...
With Patch
1) Install drupal using normal profile
2) Enable Search
3) Create Article content with some random words and include "cat" and "dog". Create 3 articles, 1 with just "cat", 1 with just "dog" and 1 with "cat dog"
4) Search for "cat or dog"
5) no results show up with the lowercase and a warning is shown. I thought the idea was to let the query still execute?
6) Search for cat or dog => https://www.evernote.com/shard/s44/sh/ca8a1260-0cd1-4473-a206-3bab6c2170...
Search for cat OR dof => https://www.evernote.com/shard/s44/sh/0fee77c7-495c-41dd-84c1-76bcfd62ce...
7) I might have misunderstood the purpose of this patch, so tested with "cat or dog" in the content and as search. https://www.evernote.com/shard/s44/sh/e9171bbb-ba3d-4577-9a75-8bede6877a.... This does show up so if that is the intent of the patch, that's good.
b) In executeFirstPass() it calls form_set_error() if there are no "positive keywords" (as opposed to "negative keywords" that are excluded) to search for, and aborts the query. It's not a form and this call is never going to work. We need to fix that, but see also #1126688: Allow users to use advanced search without keywords entered -- this message is not actually desirable in all cases, such as an advanced search.
Without Patch
1) Install drupal using normal profile
2) Enable Search
3) Create Article content with some random words and include "cat" and "dog". Create 3 articles, 1 with just "cat", 1 with just "dog" and 1 with "cat dog"
4) Search for "-cat AND -dog"
5) It actually shows You must include at least one positive keyword with 3 characters or more.
https://www.evernote.com/shard/s44/sh/b89cc171-9097-4421-a4ba-eaa1690d35...
6) I assume what is expected here is that it works with negative searches?
With Patch
1) Install drupal using normal profile
2) Enable Search
3) Create Article content with some random words and include "cat" and "dog". Create 3 articles, 1 with just "cat", 1 with just "dog" and 1 with "cat dog"
4) Search for "-cat AND -dog"
5) It actually shows You must include at least one positive keyword with 3 characters or more.
https://www.evernote.com/shard/s44/sh/c945422a-8408-4ac4-9f7e-32a4e28d23...
6) My assumption from this issue was wrong. However code wise the form_set_error is still there. It is deprecated now so we should probably convert that in this issue if that is what the issue wants to tackle?
c) In executeFirstPass() it calls drupal_set_message() if there was a large number of search terms (to prevent DOS attacks). The query can still work in this case, but the user needs to be warned. (Came from #1265946: DoS against core using a large number of OR search query terms.)
Without Patch
1) Install drupal using normal profile
2) Enable Search
3) Create Article content with some random words and include "cat" and "dog". Create 3 articles, 1 with just "cat", 1 with just "dog" and 1 with "cat dog"
4) Search for "cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog". Doing this takes ages so I warn you. This does not show the error of too many expressions. It is because an OR is expected to be there and thus it does not error out but leaves the drupal site occupied and could cause the DDOS.
5) Search for "cat OR dog OR cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog". Doing this takes ages so I warn you. This does not show the error of too many expressions as it timed out on my machine. Probably because the previous search was still executed during the form redirect.
6) Search for "cat OR dog AND cat OR cat AND dog" shows you the message but it looks like it showed the message from the search before.
7) Search for "cat OR dog AND cat OR cat AND dog OR cat AND dog OR dog" shows you the message, finally!
With Patch
1) Install drupal using normal profile
2) Enable Search
3) Create Article content with some random words and include "cat" and "dog". Create 3 articles, 1 with just "cat", 1 with just "dog" and 1 with "cat dog"
4) Search for "cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog". Doing this takes ages so I warn you. This does not show the error of too many expressions. It is because an OR is expected to be there and thus it does not error out but leaves the drupal site occupied and could cause the DDOS
5) Search for "cat OR dog OR cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog AND cat AND dog". Doing this takes ages so I warn you. This does not show the error of too many expressions it times out.
6) Search for "cat OR dog AND cat OR cat AND dog" shows you the message but it looks like it showed the message from the search before. When doing this, the message showed up because the previous search was still running. I assume the text "Your search used too many AND/OR expressions. Only the first 7 terms were included in this search." is not true somehow as it times out with the big statement while it does not with the small statement.
7) Search for "cat OR dog AND cat OR cat AND dog OR cat AND dog OR dog" shows you the message, finally!
I would not allow the user to put in more than 7 capitalized OR or AND statements. It really times out and causes heavy load on the website.
Comment #37
nick_vhComment #38
nick_vhComment #39
pwolanin commentedAs a follow-up we should try to change the form method to GET instead of POST, so we bypass the need to redirect at all.
Comment #40
nick_vhThat said, this patch does solve different issues, so I think it's a matter of cleaning up the issue summary to better reflect what was changed here and what the problem was. I'm going to take a stab at this, as the messages are still there.
Comment #41
nick_vhWe should split up
into its own issue so this can be focussed on changing the keys to a GET param. I see it has already a link to an issue of its own but I'mnotsure if it fully reflects what is needed here.
Aside of that, I think this patch is really awesome and a good start for a problem I encountered during the SprintWeekend2014 in Boston. I'd like to find out what else is needed from this patch to continue now that the issue summary has been adjusted and more accurately reflects what the aim is here. I've tested it and it works, and it looks like all tests also pass so I'm almost ok in marking this patch as reviewed so we can tackle the other issues such as that block that does a POST and executes the previous search first etc...
Comment #42
nick_vhSetting it back to needs review now that the goals are back aligned.
Comment #43
jhodgdonCleaned up the issue summary.
Comment #44
jhodgdonOK so... Can someone please review this patch? It doesn't seem like Nick_vh came up with any objections, right?
Comment #45
jhodgdon34: 1366020-search-query-url-refactor-34.patch queued for re-testing.
Comment #47
jhodgdonLooks like this needs a reroll. Not sure why that tag is there either since I wasn't involved in the sprint weekend.
Comment #48
rcaracaus commentedre-rolled
Comment #49
rcaracaus commentedComment #51
jhodgdonThe reroll needs to be done differently, because another issue was committed that made some changes to the searching that we need to keep. I'll try to get to it soon. Thanks for trying!
Comment #52
jhodgdonHere is a new reroll.
Comment #53
jhodgdonComment #56
jhodgdonMessed up that reroll. Here's a new one. The problem was in SearchController where it built the form. Apparently the entity form building stuff had changed since the last time the patch was done.
Anyway, not really an interdiff, this is still really just a reroll of the earlier patch.
Comment #58
jhodgdonLooks like that reroll didn't quite work either -- two Search tests are failing. I'll see what I can do.
Comment #59
jhodgdonSo one of the tests that is failing is SearchBlockTest.
The line where it is failing is the test where it verifies that you can search from the Search block on a 404 page (like going to example.com/foo and using the Search block there). This was working fine on Jan 25th with the patch in #34, it is currently failing in the test bot as well as manual testing.
What's happening in this latest patch is that SearchBlockForm in its submit method is setting
Then FormBuilder is transforming this successfully into a response that would redirect to a page like
example.com/search/node?keys=test. But in FormBuilder::sendResponse(), somewhere in these lines of code:the response is being discarded and the effect is that you just get back to example.com/foo and do not search.
I discussed this in IRC with damiankloip and timplunkett, and it is most probably due to the patch from this issue:
#2147153: Replace the last instance of $_GET/$_POST; Create a special exception listener / exception controller which allows to use POST requests
But I don't have time to test this today... We need to check if with that patch reverted, you can search from a 404 page with the patch in #56 on this issue installed.
I also haven't yet looked into the other test failure, but it also looks like it is related to the search block not redirecting correctly, and may also be due to this problem.
Comment #60
dawehner@jhodgon
I will have a look at that tomorrow, unless someone beets me, given that I also debugged the other issue.
Comment #61
dawehnerAfter quite some debugging I think this is the proper solution. We want to let the subrequest not break redirect.
Comment #62
jhodgdonOK... dawhener, should that code added in the interdiff be put into a completely different issue, or a reopen of #2147153: Replace the last instance of $_GET/$_POST; Create a special exception listener / exception controller which allows to use POST requests? Because it definitely doesn't relate to this issue specifically!
Let's see if the tests pass: the ones in Search that were failing here, the ones that the commit on that other issue introduced, and not to mention the rest of core. I have no way of evaluating the change here to see if it makes any sense or not. But thanks!
Comment #64
jhodgdonWell @dawhener, that seems to have mostly worked, except it failed in Drupal\system\Tests\System\AccessDeniedTest->testAccessDenied() line 96. Which seems like it was probably related to this patch?
Again, I think the modifications to the core system should probably be moved to another issue, but it does look like your patch fixes the Search problems at least.
Comment #65
jhodgdonI've spun the form redirect problem off into a new issue:
#2206909: Regression: Form submit redirects from 403/404 pages are no longer possible
Comment #66
jhodgdonWe had a discussion in IRC about this issue today. pwolanin made a good suggestion: the search block form should really be submitting directly to the search page with a GET, rather than doing a form submit / redirect (which is inefficient). I'm working on this.
Comment #67
jhodgdonOK, here is a new patch that uses GET to submit the Search Block form directly to the Search page, which should be more efficient than the redirect, and avoids running into the problem of #2206909: Regression: Form submit redirects from 403/404 pages are no longer possible. It seems to work on my test site (I can submit the search from from page example.com/foo as well as regular pages); let's see what the Bot says.
I also updated the issue summary, because this change required a slight UI change: the name of the keywords form element in the Search Block Form was changed from 'search_block_form' to 'keys' so that the form could be submitted directly to the search page with GET.
Note: The interdiff is to patch from #56, not the patch from #61 that had dawehner's fixes for that other issue in it.
Comment #69
jhodgdonThe test failures...
The Search Block Form is now using GET. Meaning that the tests that were submitting the form, which use WebTestBase::drupalPostForm() fail, because drupalPostForm() uses POST to submit the form values, which is not supported any more by this form. So the search block actually works fine, but the tests do not work.
So, all the tests can really do is verify that the form action/method are set up correctly (using xpath), and then send to the proper URL with drupalGet(). I'm reworking to do this.
Comment #70
jhodgdonOK, fixed the test failures (at least on my local box). One more try!
Comment #71
dawehnerI put a search form block onto the left sidebar, typed in nothing and landed on the following url:
Comment #72
jhodgdonRE #71... Did you clear the cache? There is a form alter in search.module
which takes care of this, at least on my test installation.
Comment #73
jhodgdonAlso on #71, there is a test that verifies that the URL you get to after submitting the search box form is exactly correct, in SearchBlockTest:
So... I'm not sure why you were seeing this (probably cache clear or not fully applied patch?) but in my install and in the test, the same result is not happening.
Comment #74
dawehnerOh I am sorry :(
Comment #75
pwolanin commentedneeds re-roll. working on it.
Comment #76
pwolanin commentedI was a bit confused with the changes made in #2195907: Search ranking for number of views fails in PostgreSQL
So, hopefully this is the right resolution.
Comment #77
pwolanin commentedposted a 1st patch at #1191278: Simplify DX of removing form_id, form_build_id, and token from RESTful GET forms
Comment #78
jhodgdonVery slightly different reroll attached.
Comment #79
pwolanin commentedThis uses a trick from Views exposed filters to set
$form['#name'] = '';so that you do not get &op=Search added to the URL.Also needed to fix the test code to handle that properly, adds a missing use for String, and fixes some docs.
Comment #80
jhodgdonI think the incremental changes there look good!
Comment #82
pwolanin commentedfixes the other test to use the correct query string.
Comment #83
pwolanin commentedComment #84
jhodgdonYes, correct. So I am happy with the latest two incremental changes from my last patch, which got rid of the &op=Search in the URL when redirecting from the search block. As far as I am concerned, this is ready for RTBC. Are there any other concerns?
I also made a pass through the issue summary and updated it to the latest status.
Comment #85
jhodgdonpwolanin: This is I think the highest-priority issue in search.module, since we've been talking about this change for years and it fixes about 10 long-standing bugs. I think it is ready for RTBC (I have reviewed the parts that you worked on)... What do you think?
Comment #86
jhodgdon82: 1366020-search-query-url-refactor-82.patch queued for re-testing.
Comment #88
jhodgdonPatch apply failure was in SearchCommentToggleTest... fixed with clean reroll. (Patch actually applied fine with a bit of "fuzz".)
Comment #89
pwolanin commentedIt would be nice to get this in 1st to make this patch cleaner: https://drupal.org/node/1191278
Comment #90
effulgentsia commentedJust coming in new to this issue from #1191278: Simplify DX of removing form_id, form_build_id, and token from RESTful GET forms. #88 needs a reroll. Re #89: what would be cleaner about this patch if/when that issue lands?
Comment #91
jhodgdonRE #90 - we would not need this bit in search.module:
Comment #92
effulgentsia commentedIf that's it, then I don't see that as worth holding this issue up on at all. Maybe the other issue can simplify the DX of that, but those lines of code seem like pretty clear ways of asking the form to not output those hidden elements.
Comment #93
pwolanin commentedre-rolled
Comment #94
klausiSo we need this to make progress with #2228217: Further optimize RouteProvider and add web test for large number of path parts. Search module uses the routing system in a not supported way by adding arbitrary path components with slashes at the end of URLs.
I reviewed the patch and it mostly looks good, but I'm not that familiar with the search module code, so I guess someone else should review this as well.
This should be $this->urlGenerator->generateFromRoute().
Comment #95
pwolanin commentedThanks for the review. Fixed that one call.
Comment #97
pwolanin commentedoops, need to call the method, not use the property.
Comment #98
jhodgdonOK... So what do we need to do to get this to RTBC?
There were early reviews and suggestions by tim.plunkett and Nick_vh prior to #70, and I believe they were all addressed by the time the patch in #70 was finished. All the code up through there was mine, so I cannot mark this RTBC. (The patch by dawehner in #61 was not actually used.)
Then pwolanin made some changes, and I rerolled once or twice. I officially approve all of the changes pwolanin made. Then klausi made one review comment in #94, which was addressed by pwolanin in #97. That change looks good to me.
So... maybe pwolanin, tim.plunkett, Nick_vh, or klausi can set this to RTBC, or back to "needs work" if there is something else we need to address?
Let's get this done!
Comment #99
pwolanin commentedFrom discussion with fago, seems there is some more work to do on the clean get FAPI patch, so let's unblock this and follow-up once we have that fix.
Comment #100
jhodgdonGreat, thanks! Unassigning to avoid "this will be committed by jhodgdon" confusion in RTBC queue.
Comment #102
jhodgdonHere's a reroll. Something changed in SearchInterface that was incompatible.
I made a couple of minor docs changes in the reroll of that section of the patch. Nothing else needed any attention. Please check over the docs in SearchInterface...
Comment #103
pwolanin commenteddocs look fine.
Comment #104
jhodgdonI have just updated the issue summary to deal with change notices, and added a couple of draft notices.
Comment #106
pwolanin commented102: search-1366020-102.patch queued for re-testing.
Comment #107
pwolanin commentedI don't know why that 1 test run failed to install, but I think it was a problem with the bot or HEAD, so putting back to RTBC. I also manually installed locally with the patch and there was no problem.
Comment #108
catchI haven't reviewed the whole patch, but I'm not sure about removing CSRF protection here, there's at least a couple of cases that could go wrong:
1. A contrib module adding flood protection for the search form.
2. A contrib module that tracks user searches - i.e. showing recent search history or recommendations or similar.
Core does neither of these at the moment, but it wasn't discussed in this issue yet that I can see.
I expect the answer to be that you can already just navigate to the search page directly and there's no CSRF protection there already.
Comment #109
pwolanin commentedSince the target page takes parameters from the query string, any CSRF protection on the form is pointless.
Any of these uses cases should be addressed at the point when the search runs (using params from the query string), not in the form.
Comment #110
webchickLet's see what catch says about that.
I also don't see any hook_update_N() functions in here, though I'm not exactly sure how we could do that. But this change will definitely break menus, paths, links within content, etc. that are pointing to "old style" search URLs. It's probably worth a separate change notice about this itself, targeted at site builders vs. developers.
Comment #111
catchI think that's OK about the CSRF protection on the form, pwolanin's right we'd need to do any handling in the controller rather than the form. Unassigning me (busy week this week and haven't been able to review properly still).
Comment #112
alexpottDuplicate array meaning you are not testing what you think you are.
Unused
Unused
Unused
Comment #113
jhodgdonOh, yeah, the redirects were removed from the patch and not from the "use" statements. Good catch! And on the tests. I'll make a new patch... just got back from 10 days off so I'm working through about 200 updated issues... give me a day or two.
Comment #114
jhodgdonOK. I had to first reroll this patch a bit to get it to apply. Interdiff is starting from the reroll, and addresses #112.
Regarding the "needs change record" tag, there are draft change notices here and one note in the issue summary about the edits needed to one existing change record when/if this ever gets committed. So the change records are ready to go when this gets committed.
Comment #115
pwolanin commentedChanges look good.
Comment #116
xjmThanks for the clear instructions regarding change records in the issue summary.
Comment #118
webchickLooks like Alex's feedback was addressed. I've been picking away at reviewing this patch about 20 minutes at a time for the past week or so. I'll admit a lot of it goes over my head (especially the query builder stuff - yikes :)), and the whole thing makes me a bit sad because I really love our search/with/clean/urls/that/are/easy/to/remember/and/hack. :( But this has buy-in from all the right folks, and it blocks a number of other things, so we can probably go ahead with this.
Committed and pushed to 8.x. Thanks!
We still need a change notice targeted for site builders on how to fix their links, IMO. It's not a beta blocker, however, so not really sure what to do with the tags. Hacking the title instead.
Comment #119
jhodgdonTHANK YOU @webchick!
I have taken care of the change notices, as detailed in the issue summary:
a) Published the two "draft" notices that I had written before (isn't that supposed to be done at commit time?). This actually took care of the "site builder" change record, which was/is: https://drupal.org/node/2231633
b) Made the changes detailed in the issue summary to the existing change notice (adding a bullet point and copying in the new NodeSearch plugin as the current "D8" code).
Comment #120
jhodgdonPhew! I also went through and looked at all the issues we had postponed on this one. Marked them as "duplicate" or "fixed" as appropriate. Soooooooo glad we have this done!
Comment #121
xjmAwesome work! Thanks @jhodgdon.
Comment #123
cilefen commented#2637680: Submit buttons for GET forms in search/views are not W3C valid due to empty 'name' attribute