I believe the fix is pretty simple I think the (int) cast is not really needed, see following

search.module

function search_index($sid, $type, $text)

$word = (int)ltrim($word, '-0'); //original line, causes overflow with large numbers
$word = ltrim($word, '-0'); //fixed line

function _search_parse_query(&$word, &$scores, $not = false)

$s = $num ? ((int)ltrim($s, '-0')) : $s; //original line, causes overflow with large numbers
$s = $num ? (ltrim($s, '-0')) : $s; //fixed line

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

magico’s picture

Version: 4.7.2 » x.y.z

@stilldodge: could you make a real patch for this (it is still present in HEAD)?

Thanks!

Jo Wouters’s picture

I tested this with head, and could not repeat it.
Numbers up to 19 digits (bigger then the maxint on my machine) are correctly indexed, and I can search for them (and find them).

magico’s picture

Status: Active » Fixed

Thanks Jo!

Anonymous’s picture

Status: Fixed » Closed (fixed)
marikeeler’s picture

Version: x.y.z » 6.x-dev
Assigned: Unassigned » marikeeler
Priority: Normal » Critical
Status: Closed (fixed) » Active

Hi, I know this is 'closed' But I have an issue with searching for #'s (digits). I tried the first posts fix (removing int) and it's still not working. Not sure what the HEAD aspect of this thread means, can someone explain?

I'm not a php expert.

Main issue: website is a forum, posts will be made with Order #'s, example. post title = 'PO Order # 455667' .
(Order # will also be in the body of post).

I would like users to type in the order # in the search box, and have it come up in results. no luck just yet.
(words work, just no numbers)

please please help.

Thanks

jhodgdon’s picture

Category: bug » support
Status: Active » Postponed (maintainer needs more info)

I am not having this problem in Drupal 6.15. If I put a number in some content, I can search for that number.

I tried it in the title as well as the body, and I tried it with and without a space between the # and the number.

Can you provide more information on how your site is set up, which modules you are using, etc.?

jhodgdon’s picture

Version: 6.x-dev » 7.x-dev
Category: support » bug
Status: Postponed (maintainer needs more info) » Active

Actually, I was able to duplicate this in Drupal 7 with a sufficiently large number. My page content is:

9999999999999888888888888877777777777776666666666666
978-0446365383
111
1111111111111111
1111111111111111111111111111111111
11111111111111111111111111111111111111111111111

Everything except the first number is searchable, but the first number isn't, and appears in my search index as 2147483647.

catch’s picture

Priority: Critical » Normal

Downgrading from critical, strange bug though!

jhodgdon’s picture

See the original bug report to see why it is happening (int cast of the word, for some unknown reason?). Should be an easy fix, but we need a patch and probably a test would be good too.

sin’s picture

This (int) cuts fractional part of a float numbers to not to be indexed and searched. We need to replace it like this:

$word = ltrim($word, '-0')
$dot_pos = strpos($word, '.');
if ($dot_pos) {
$word = substr($word , 0, $dot_pos);
}

jhodgdon’s picture

Good idea. We still need a patch and a test.

sin’s picture

Assigned: marikeeler » sin

I'll do the patch.

sin’s picture

Status: Active » Needs review
FileSize
1.86 KB
sin’s picture

Other thread for reference:
http://drupal.org/node/323299

Status: Needs review » Needs work

The last submitted patch, search.patch, failed testing.

jhodgdon’s picture

Detect invalid patch format:
Ensure the patch only contains unix-style line endings.

It looks like you need to set your line endings on your development enviroment/editor to Unix-style, and redo the patch.

jhodgdon’s picture

Issue tags: +Needs tests

Other comments on the patch:
a)

+          $s = ltrim($s, '-0');

Could use a comment saying it is stripping off leading zeros and - signs, and why this is a good idea.

b) Since the code that is preprocessing numbers is duplicated in search.extender.inc and search.module, I really think it should be made into a little function. The main reason for that is that it's essential it gets done the same way both times, and since the code has grown to 5-6 lines, it could easily be modified one place and not the other.

c) Needs a test to verify we can index/search for large numbers, so that this doesn't happen again. The test should include negative numbers, numbers starting with zero, and large numbers. I think the easiest thing would be to make a node including all three of these, run cron to index it, and then search for all of the numbers and verify that the node shows up in the search results.

jhodgdon’s picture

Assigned: sin » jhodgdon

I'm working on this - new patch plus tests.

jhodgdon’s picture

I did some testing... a couple of notes:

a) It turns out that truncating to integer is totally unnecessary. The reason is that search_simplify() is removing all punctuation (including decimal points) inside numbers.

b) I think that the real reason that large number searching is failing (at least in Drupal 7) is that the search index word field is only 50 characters. So words are truncated to 50 characters during search indexing. However, they weren't being truncated in the same way during searching, so they weren't matching the values in the index and were not found.

I'm still working on this... will try to get a working patch with a test. Right now my test isn't passing, so I have a bit of work to do (tomorrow).

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
9.47 KB

A few more notes:

1) Removing the initial - sign on a number is unnecessary, because all punctuation including hyphens is removed during search_simplify anyway.

2) There is no way to search for negative numbers. The - signs are removed during search indexing, so the page is indexed as if it was a positive number. Then during searching, -(text) indicates to find a page without (text) in it. That's just a note; this patch doesn't do anything about it or change that in any way.

3) The ltrim() that removes leading zeros was not allowing you to search for 0234 and have it match 234. It only enabled that these words were scored the same. This seemed like it wasn't really the intention, so I made it so 0234 matches 234.

So, I've made a patch that makes sure truncation and leading 0 removal happens the same way during search indexing as searching, so that numbers can be searched more successfully. It also removes unnecessary (int) casts and redundant removal of -.

I've also added some tests. There are two sets of tests: (a) test that various numbers can be searched for, and (b) test that numbers with internal punctuation and leading zeros match each other.

Without the patches in search.module and search.extender.inc, my new tests all pass except the larger-than-50-characters number search test, and the leading-zero-matching tests. With the patches, all tests in search.test pass.

Gosh this is an old bug (July 2006?!?) with only a 5-digit issue number...

sin’s picture

Thank you for working on it!

You right about (int), search_simplify indeed removes a decimal point, (int) looks useless, now I have no idea what it was intended for, +1 to drop it.

Search works fine with your patch. I noticed that searching for 0.777 returns no results before the patch but works with your patch as expected.

Ancient bug indeed. The fix may worth backporting.

jhodgdon’s picture

Issue tags: -Needs tests

#21: 74673.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +Needs tests

The last submitted patch, 74673.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
9.48 KB

I don't know why that patch wouldn't apply, but here's a reroll (it applied fine for me).

douggreen’s picture

FileSize
9.52 KB
  • Why the extra argument to search_simplify, when we always pass TRUE here?
  • The comment changed from Simplify keyword according to indexing rules and external preprocessors. to Simplify keyword as it would be simplified during indexing. makes less sense to me now.
  • Once you remove the extra check for $num, we can don't need to set the variable.
  • Nice tests!

Some suggested changes attached in patch, feel free to use, or go back to your last one...

Status: Needs review » Needs work

The last submitted patch, 74673.patch, failed testing.

douggreen’s picture

Failure has something to do with removing the search_simplify argument. Since we don't need the argument, this might be a legitimate failure, but we might need to fix the test

jhodgdon’s picture

FileSize
9.78 KB

There's an additional call to search_simplify that is not in the patch (unchanged line), and it doesn't pass TRUE. So we need the argument. I've put it back. I also revised that comment so hopefully it is better, and otherwise used your patch. Hopefully it will pass tests this time...

jhodgdon’s picture

Status: Needs work » Needs review
douggreen’s picture

Status: Needs review » Reviewed & tested by the community

@jhodgdon, sorry for being a pain here, but the only other call to search_simplify I see is in search.test. Where is this call that doesn't pass TRUE?

douggreen’s picture

Status: Reviewed & tested by the community » Needs review

oops, didn't mean to mark it RTBC yet, sorry :(

jhodgdon’s picture

Duh. I was searching the wrong repository (D6). You are correct that in the actual code, it's always TRUE.

And now I am remembering that I left that argument in there just for that test, because I was too afraid of splitting up the test strings, which I couldn't really edit very well (they have all kinds of interesting arguments in them). We could make the default be FALSE, although that would mean yet another thing to document as a change from D6 to D7. Or you could take a stab at breaking up those strings. I can't do it in my editing setup.

duellj’s picture

Status: Needs review » Reviewed & tested by the community

I've tested the patch, and both the search changes and the testing are working properly. I can verify the following:

-Search correctly handles complex numbers
-Tests pass correctly with patch
-Tests fail without patch

douggreen’s picture

@jhodgdon if you have the energy, I'd prefer that remove the extra argument and fix the test, but I'll leave this RTBC and not fight very hard.

jhodgdon’s picture

Status: Reviewed & tested by the community » Needs work

OK, I'll see what I can do. If it's impossible/nearly so, I may decide not to though. Last time I looked, there were many non-Latin characters in that part of the tests that I couldn't edit properly, and I didn't see a way to break it up into smaller chunks, but we'll see.

jhodgdon’s picture

I guess another option would be to make the default of the argument go the other way, and pass in FALSE (for no truncate) from the test. In that case, we would need to document this function change on the module upgrade guide (since the default behavior would be different from D6).

douggreen’s picture

My suggestion to remove the argument is that we have extra code in core, that is only there because our tests need it. Any extra function argument, whether the default or not, has this same problem.

jhodgdon’s picture

OK, point taken. I'll see what I can do about splitting up those strings in the test.

jhodgdon’s picture

Hmmm...

The problem is the file modules/search/tests/UnicodeTest.txt

I don't think I have an editor that can edit this file, because it just looks like a bunch of non-printable characters on my screen (I tried several editors in both Windows and Ubuntu Linux). It would need to be truncated to at most 50-character lines to work with the proposed changes to search_simplify.

Does anyone have an editor/system that can handle this file? Maybe we should ask chx, who put these characters in on this issue:
#604002: Poor search support of some Unicode scripts
to see if he can perhaps truncate them to 50 character lines?

jhodgdon’s picture

FileSize
15.66 KB

OK, I've fixed the test so that it breaks the strings up into shorter ones before trying search_simplify() on them. This way we don't have to modify that unicode text file. I also added a bunch of comments to the test, because it took me a bit to figure out what was going on.

The search simplify test now passes for me. Let's see what the test bot says...

jhodgdon’s picture

Status: Needs work » Needs review
jhodgdon’s picture

Oh. I also found some trailing spaces in search.test and removed those (I changed my editor settings so they are showing up in BRIGHT RED and they are hard to miss...)

jhodgdon’s picture

Issue tags: -Needs tests

#41: 74673-v41.patch queued for re-testing.

jhodgdon’s picture

#41: 74673-v41.patch queued for re-testing.

jhodgdon’s picture

#41: 74673-v41.patch queued for re-testing.

jhodgdon’s picture

#41: 74673-v41.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +Needs tests

The last submitted patch, 74673-v41.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review

Here's a reroll. Patch still could use a review...

jhodgdon’s picture

FileSize
15.78 KB

Whoops. I guess I didn't attach that last reroll. Here's a reroll.

jhodgdon’s picture

In case someone wants to review this patch, here's a summary of what the patch does:
- search_simplify() now truncates everything to 50 characters, because the word index db table only stores 50 characters. This means that both at indexing time and search time, characters past 50 are ignored, and they can match. As things were previously, that was only done at indexing time (implicitly when you saved the word in the index db), so longer words wouldn't come up as search matches.
- Tests for unicode in search_simplify() had to be rewritten to take this into account.
- Changes were made when indexing/searching numbers, so that leading zeros were removed in numbers, but numbers were not cast using (int), as that was causing this bug.
- Tests were added to test number searching/matching.
- A few spaces at ends of lines were removed.

mcarbone’s picture

Component: search.module » shortcut.module
Status: Needs review » Needs work

This is a fantastic patch. Tests are thorough, and the change is very much needed. Two minor things you may want to add:

1) While it's true that negative numbers can't be searched (e.g., -100) because of the not functionality, it actually does work fine if you use phrase searching with quotes (e.g., "-100"). I created a node with -123456.7890 as the body and was able to find it with both "-123456.7890" and "-1234567890" -- so I think an example like this might be worth adding to the tests.

2) However, a minor bug popped up when searching for that number. If I search for 123456.7890 it comes up in the result, and the occurrence of that number in the node is highlighted in bold. If I search for 1234567890, the result comes up, but it is *not* highlighted in bold. In other words, the search_excerpt() function that generates the search snippet isn't using search_simplify() before highlighting the word match.

mcarbone’s picture

#2 above also applies to numbers with > 50 characters. I created a node with 666666666666666666666666666666666666666666666666666666666666 as the body. If I search for the whole thing, it gets highlighted in the snippet. If I remove a few characters so that it's still longer than 50 characters, it pulls up the correct result but doesn't bold the original number string.

David_Rothstein’s picture

Component: shortcut.module » search.module

I'm guessing this got moved between queues accidentally :)

Looks like a great patch, but no time for a real review at the moment, unfortunately.

mcarbone’s picture

I think I might take a stab at resolving the issues in #52 and #53 -- jhodgdon, ping me on freenode if you've already started on them but hopefully you haven't.

mcarbone’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests

#50: 74673-50.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +Needs tests

The last submitted patch, 74673-50.patch, failed testing.

mcarbone’s picture

Status: Needs work » Needs review

I reran the test because I discovered locally that a recent change broke slashes in search. See #915214: Regression: Slashes broken in search.

mcarbone’s picture

Issue tags: -Needs tests
FileSize
20.41 KB

All right, patch re-rolled to account for issues in #52 and #53 above, with additional tests for both quoted negatives and for highlighting simplified matches in search snippets.

The code to support simplified matches in search snippets is somewhat non-trivial, but I'm fairly sure it's performant. It runs only on keywords that required simplification for matching, and so is O(kn) where k is the number of such keywords and n is the number of words in the original text. There's no reason to expect k to be larger than a handful.

This will fail testing until #915214: Regression: Slashes broken in search is resolved.

Status: Needs review » Needs work

The last submitted patch, 74673-59.patch, failed testing.

jhodgdon’s picture

Supporting more matches in search results is a separate issue, so please don't combine the patches... that issue has been around for a while. I'm on vacation in Europe for the next three weeks, not checking email much or doing patches... sorry I can't be of more help at the moment!

mcarbone’s picture

Status: Needs work » Needs review
FileSize
16.45 KB

jhodgdon, you're totally correct -- the search_excerpt issue I mentioned above exists without your original patch. (Newly created issue here: #916086: search_excerpt() doesn't highlight words that are matched via search_simplify())

I've re-attached your original patch with one line added with an additional test for a quoted negative. Minus the irrelevant search_excerpt issue, my original review from #52 then finds this patch RTBC.

However, the original patch will still fail until #298561: menu_tail should automatically act as a load function as well to allow slashes in search is committed so I won't switch it to that until this passes tests.

Have an excellent vacation.

Status: Needs review » Needs work

The last submitted patch, 74673-62.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review

#62: 74673-62.patch queued for re-testing.

jhodgdon’s picture

I took a quick look at this patch, and the code part looks reasonable (didn't really look at the tests yet)... what exactly in here is dependent on that other issue? Can we take that part out and get this in?

Status: Needs review » Needs work

The last submitted patch, 74673-62.patch, failed testing.

mcarbone’s picture

Yeah, it's just the test for "12/34/56789" -- the slashes need the menu_tail_load fix to work. I'll strip that out and put up a new patch later today.

mcarbone’s picture

Oh, this is coming back to me. Those tests are pretty much all yours, except for one I added with a negative number in quotes.

mcarbone’s picture

Status: Needs work » Needs review
FileSize
16.32 KB

I simply commented out the two tests that were failing, as I really hope they get added in as soon as the menu_tail_load fix goes in. (And I really hope that fix goes in soon, as date search is broken without it.)

This patch, jhodgdon, was pretty much entirely created by you, and I reviewed it awhile back, but I'm not sure it'd be appropriate for me to RTBC it or not at this point.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Well, I think both of us like the patch quite a bit, so it can probably be marked RTBC (assuming the test bot agrees). Thanks for all the work you did!

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 74673-69.patch, failed testing.

jhodgdon’s picture

Looks like the patch fails 8 tests...

jhodgdon’s picture

Assigned: jhodgdon » Unassigned

punting back to you, mcarbone...

mcarbone’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
16.32 KB

Weird, I somehow uploaded the wrong patch.

webchick’s picture

Version: 7.x-dev » 6.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Great bug fix, great tests. What's not to love?

Committed to HEAD. Thanks!

Moving back to 6.x for consideration.

mcarbone’s picture

Version: 6.x-dev » 7.x-dev
Status: Patch (to be ported) » Reviewed & tested by the community
FileSize
974 bytes

Before we switch this to 6.x-dev to be ported, can we get in these two small tests that we commented out before because they were waiting for #298561: menu_tail should automatically act as a load function as well to allow slashes in search?

jhodgdon’s picture

Good idea, thanks -- assuming the tests pass of course. :)

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

jhodgdon’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)

Back to 6.x for consideration.

jhodgdon’s picture

Version: 6.x-dev » 7.x-dev
Issue summary: View changes
Status: Patch (to be ported) » Fixed

Talked with Gabor (the Drupal 6 branch maintainer) and D6 issues are really not being committed unless they're really essential -- we really don't have a test system for Drupal 6 and it's too dangerous. So... putting this back to D7 / fixed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Alan D.’s picture

A simple patch if you do need this on D6 :/