Hello there,

I'm writing some customizations for the search engine and would like to take into account the number of times a node has been viewed or read. That part is already implemented, but I'm running into trouble setting up reindexing.

Our cron runs 4 times an hour, and our server can safely process 50 nodes per cron run with Apache Solr, so 50 × 4 = 200 nodes per hour. The problem is that we average about 300 page views an hour, and that number is growing. If I submit a node for reindexing every time it's viewed, the queue will always be full, which causes all kinds of problems.
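The capacity arithmetic above can be sketched as a tiny simulation. It assumes a constant 300 views per hour and treats every view as a separate reindex request, which is the worst case; a queue that deduplicates by node would grow more slowly.

```python
# Figures from the post: 4 cron runs/hour, 50 nodes/run, ~300 views/hour
# (the view rate is assumed constant here for simplicity).
CRON_RUNS_PER_HOUR = 4
NODES_PER_RUN = 50
VIEWS_PER_HOUR = 300


def backlog_after(hours: int) -> int:
    """Queue length after `hours` if every view submits one reindex request."""
    capacity = CRON_RUNS_PER_HOUR * NODES_PER_RUN  # 200 nodes per hour
    backlog = 0
    for _ in range(hours):
        # Each hour adds 300 requests and drains at most 200.
        backlog = max(0, backlog + VIEWS_PER_HOUR - capacity)
    return backlog


if __name__ == "__main__":
    for h in (1, 6, 24):
        print(f"after {h:>2} h: {backlog_after(h)} nodes waiting")
```

Under these assumptions the backlog grows by 100 nodes every hour and never drains.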

So I need a way to throttle reindex submissions. A couple of options:
* Only allow a node to be submitted for reindexing once a day
* Only submit a node for reindexing if it has had 10+ views/reads since its last reindex
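The two throttling options above can be expressed as simple predicates. This is a minimal sketch; the `IndexState` record and its field names are hypothetical stand-ins for whatever is stored about the node's last indexing.

```python
import time
from dataclasses import dataclass
from typing import Optional

ONE_DAY = 24 * 60 * 60  # seconds
MIN_NEW_VIEWS = 10


@dataclass
class IndexState:
    """Hypothetical record of what was stored when the node was last indexed."""
    last_indexed: float   # Unix timestamp of the last reindex
    views_at_index: int   # node's view count at that time


def once_a_day(state: IndexState, now: Optional[float] = None) -> bool:
    """Option 1: allow at most one reindex submission per day."""
    now = time.time() if now is None else now
    return now - state.last_indexed >= ONE_DAY


def enough_new_views(state: IndexState, current_views: int) -> bool:
    """Option 2: resubmit only after 10+ views since the last reindex."""
    return current_views - state.views_at_index >= MIN_NEW_VIEWS
```

For example, with `IndexState(last_indexed=0, views_at_index=40)`, `enough_new_views(state, 49)` is `False` and `enough_new_views(state, 50)` is `True`. The two checks can also be combined with `and` if both limits should apply.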

I can store the timestamp of the node's last indexing as one of its Solr fields, and likewise the node's view count at the time it was last indexed.

I think it makes sense to do the check and the reindex submission at node load time.

My question is: how do I retrieve a particular node's values from the Apache Solr index?
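One way to read a single node's values back is to query Solr's `/select` handler with a filter on the node ID and limit the returned fields with `fl`. This is a minimal sketch: the base URL and the field names (`nid`, `ds_last_indexed`, `is_views_at_index`) are hypothetical and must be replaced with whatever your schema actually uses. Fields only come back in `fl` if they are declared `stored="true"` in `schema.xml`.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr endpoint and field names; adjust to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"
FIELDS = ["nid", "ds_last_indexed", "is_views_at_index"]


def build_query_url(nid: int) -> str:
    """Build a /select URL that fetches only the needed stored fields for one node."""
    params = {
        "q": f"nid:{nid}",
        "fl": ",".join(FIELDS),
        "rows": 1,
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def parse_doc(body: str) -> Optional[dict]:
    """Return the first matching document from a Solr JSON response, or None."""
    docs = json.loads(body)["response"]["docs"]
    return docs[0] if docs else None


def fetch_node_doc(nid: int) -> Optional[dict]:
    """Query Solr over HTTP and return the node's stored fields, if indexed."""
    with urlopen(build_query_url(nid)) as resp:
        return parse_doc(resp.read().decode("utf-8"))
```

Keeping URL building and response parsing separate from the HTTP call makes both easy to check without a running Solr instance.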

Thanks,
Andrey.

P.S. If you have a better idea of how to accomplish what I'm trying to do, please let me know.

Files: Screen shot 2011-05-06 at 9.46.30 AM.png (281.96 KB, uploaded by mr.andrey)

Comments

mr.andrey

Status: Active » Fixed

I ended up just using a custom table with hook_nodeapi().
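The custom-table approach can be illustrated with a small SQLite sketch (the real implementation would live in a Drupal module's hook_nodeapi() in PHP). The table name `reindex_tracker` and its columns are hypothetical; the idea is one row per node holding the view counter value at the time the node was last submitted for reindexing.

```python
import sqlite3


def create_tracker(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per node with its counter at last reindex.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reindex_tracker ("
        " nid INTEGER PRIMARY KEY,"
        " views_at_last_index INTEGER NOT NULL)"
    )


def views_at_last_index(conn: sqlite3.Connection, nid: int) -> int:
    """Stored counter for the node, or 0 if it has never been tracked."""
    row = conn.execute(
        "SELECT views_at_last_index FROM reindex_tracker WHERE nid = ?", (nid,)
    ).fetchone()
    return row[0] if row else 0


def record_reindex(conn: sqlite3.Connection, nid: int, current_views: int) -> None:
    # Insert a new row or overwrite the stored counter for this node.
    conn.execute(
        "INSERT OR REPLACE INTO reindex_tracker (nid, views_at_last_index)"
        " VALUES (?, ?)",
        (nid, current_views),
    )
```

Usage: open `sqlite3.connect(":memory:")`, call `create_tracker`, then call `record_reindex(conn, nid, views)` whenever a node is submitted, and compare the current counter against `views_at_last_index(conn, nid)` on each view.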

Now a node is resubmitted for reindexing when:
* its view counter is below 10 and has advanced by 2+ views since the last reindex
* its view counter is below 100 and has advanced by 5+ views
* its view counter is 100 or more and has advanced by 10+ views
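The tiered rule above can be written as a threshold lookup followed by a comparison. This sketch assumes the tier is chosen from the node's current counter (the post does not say whether the current or the last-indexed counter decides the tier), and the function names are hypothetical.

```python
def required_new_views(counter: int) -> int:
    """Views a node must gain before resubmission, per the tiers above."""
    if counter < 10:
        return 2
    if counter < 100:
        return 5
    return 10


def should_reindex(counter: int, counter_at_last_index: int) -> bool:
    """True when the counter has advanced enough since the last reindex."""
    return counter - counter_at_last_index >= required_new_views(counter)
```

Scaling the threshold with popularity keeps rarely viewed nodes fresh while preventing heavily viewed nodes from flooding the queue.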

Seems reasonable for now.

Cheers,
Andrey.

pwolanin

You must not be doing any caching?

Anyhow, that sounds like a reasonable approach.

mr.andrey

What do you mean?

I'm not entirely clear on how caching interacts with the read counter. I'm using Boost and Statistics Advanced Settings.

I've read that Boost isn't fully compatible with the read counter, but I haven't looked into it deeply yet.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.