I noticed that indexing my site using the search module is really slow. I have around 7000 nodes but I can't get the index to run to completion from cron.php in the 240 seconds allotted for the minimal 10 nodes. I've made sure that I was using innodb, turned on the query_cache, and assigned lots of memory to the CPU to no avail. I mostly followed the mysql tuning recommendations.

I did notice two things. One, there is a fairly slow query (10+ seconds) that is used at the end of the search module to determine if there are any words/terms to delete (a Left join of search_total and search_index). While slow, it isn't my problem.

I did notice that the search_index table has no primary key. Since more than 50% of the queries during indexing hit this table (lots of insert for new nodes), I made a change to the table to make the sid, word, and type fields the primary key (my presumption is that the combined fields are unique which so far seems to hold true).

The result is that the speed of indexing improved by around 60% or so. I'd like to know if this is a reasonable change to make and if I should expect to see any issues in the future. Even with these changes, I could only index 20 nodes at a time without PHP timing out the cron request.

I also noticed that I have a lot of numbers as "words" in the index. A lot of the data in the node contains numeric information, most of which I have no interest in searching (nor would anyone else). I noticed that there is a lot of special handling for numeric data. Is there someway to easily circumvent that, and if so, would I likely see some kind of performance increase in the indexing process?

Comments

marcingy’s picture

Version: 5.1 » 8.x-dev

Bumping version

damien tournoud’s picture

Version: 8.x-dev » 5.1
Category: feature » support
Status: Active » Closed (fixed)

Reclassifying.