Recommender algorithms are time-consuming. One concern is that a single PHP request usually times out after 20 minutes, so if the computation cannot finish within that time, there will be a problem. For D5, one workaround is to use the drush module and run all the time-consuming algorithms offline. For D6, another workaround is to use drupal.sh.
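Both workarounds come down to the same split: a command-line job, which is not bound by the web request timeout, does the expensive computation and stores the results, and page requests only read what was stored. The Python sketch below illustrates that split with SQLite. The `recommendations` table, the function names and the trivial popularity-based "recommender" are hypothetical placeholders, not this module's schema or API.

```python
import sqlite3
from collections import Counter


def run_offline(conn, votes, top_n=3):
    """Expensive step, meant for a CLI job (cron, drush, drupal.sh), not a page request.

    `votes` is a list of (user_id, item_id) pairs. As a placeholder for a real
    recommender, each user is offered the most-voted items they have not voted on.
    """
    popularity = Counter(item for _, item in votes)
    seen = {}
    for user, item in votes:
        seen.setdefault(user, set()).add(item)

    conn.execute("DROP TABLE IF EXISTS recommendations")
    conn.execute(
        "CREATE TABLE recommendations (user_id INTEGER, item_id INTEGER, score REAL)"
    )
    for user, items in seen.items():
        candidates = [(i, n) for i, n in popularity.most_common() if i not in items]
        conn.executemany(
            "INSERT INTO recommendations VALUES (?, ?, ?)",
            [(user, i, float(n)) for i, n in candidates[:top_n]],
        )
    conn.commit()


def recommend_for_request(conn, user_id):
    """Cheap step, done per page request: read the precomputed rows only."""
    rows = conn.execute(
        "SELECT item_id FROM recommendations WHERE user_id = ? "
        "ORDER BY score DESC, item_id",
        (user_id,),
    )
    return [item for (item,) in rows]
```

In this setup `run_offline` would be scheduled from the command line, and each page view would call only `recommend_for_request`, so the request time no longer depends on how long the algorithm takes.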

I haven't done much performance testing on the algorithms. Some performance testing is needed to see how the algorithms scale.

Comments

patchak

Version: » 6.x-1.1

Hello, we ran a quick test on a test site with about 14,000 votes to process. Using the classical algorithm, the query was still running two hours after we started, and we eventually had to stop it...

I would say the classical algorithm's performance is really poor. I even wonder whether it is usable on a live site that would eventually have a couple of thousand users and votes...

What would you recommend for handling a large number of users and nodes? Using another algorithm?

Is there anything we can do to improve the performance of the classical algorithm, or is that just how it works?

Thanks for the great work, and I'm looking forward to seeing your work in GSoC!
Patchak
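One likely reason for the slowdown, assuming the classical algorithm is neighbourhood-based collaborative filtering that computes a similarity for every pair of users (an assumption about the module's internals, not something stated in this thread), is that the number of pairs grows quadratically. The Python sketch below uses Pearson correlation over co-rated items as the similarity; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    common = a.keys() & b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def all_pairs_similarity(ratings):
    """ratings: {user: {item: value}}. One similarity per user pair, n*(n-1)/2 in total."""
    return {
        (u, v): pearson(ratings[u], ratings[v])
        for u, v in combinations(sorted(ratings), 2)
    }
```

With a couple of thousand users that is already 2000 × 1999 / 2 = 1,999,000 similarity computations per run, each one touching every co-rated item, which would help explain a run lasting hours in PHP.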

danithaca

Thanks for the performance report!
I suspect PHP will never be very fast here, no matter how much we optimize. One solution is to wrap an existing Java implementation. That's another direction I need to work on in GSoC, because you're right: if it doesn't scale, it won't get used on any production site.

I released this module before doing any serious performance testing in order to get feedback like this. Thanks!

patchak

We are now adding indexes to some tables to see if that helps, and we're also trying the SlopeOne algorithm. We're in the middle of it right now, so if you have any ideas, for example on how to build that wrapper, let me know!

Thanks,
Patchak
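Whether a new index helps depends on the exact queries the algorithm issues, and the database can report how it plans to run them. The Python sketch below does this with SQLite's `EXPLAIN QUERY PLAN`; the `votes(uid, nid, value)` table and the `votes_uid_nid` index are hypothetical stand-ins, not the module's actual tables. On MySQL the equivalent check is `EXPLAIN`.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's query-plan detail strings for `sql` (last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (uid INTEGER, nid INTEGER, value REAL)")

# A per-user lookup of the kind a pairwise algorithm might run many times.
query = "SELECT nid, value FROM votes WHERE uid = ?"

before = plan(conn, query, (1,))
# Composite index covering the lookup column and the columns it returns.
conn.execute("CREATE INDEX votes_uid_nid ON votes (uid, nid, value)")
after = plan(conn, query, (1,))
```

Before the index the plan should report a full-table scan; afterwards it should report a search using `votes_uid_nid`, which is the change worth looking for on the real tables.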

danithaca

Another possible approach is to use the 'performance'=>'memory' option, so that all the computation is done in memory. It could be much faster, but it requires a lot of memory (you might need to raise PHP's memory_limit setting).
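The idea behind a memory mode is to trade RAM for database round trips: read every vote once, then run the algorithm against an in-memory structure instead of querying per user or per pair. The Python sketch below shows only that loading step; the `votes(uid, nid, value)` table and `load_ratings` are hypothetical, not how the module stores votes.

```python
import sqlite3
from collections import defaultdict


def load_ratings(conn):
    """Read every vote with a single query and index it as {user: {item: value}}."""
    ratings = defaultdict(dict)
    for uid, nid, value in conn.execute("SELECT uid, nid, value FROM votes"):
        ratings[uid][nid] = value
    return dict(ratings)


if __name__ == "__main__":
    # Demo on a throwaway in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE votes (uid INTEGER, nid INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO votes VALUES (?, ?, ?)",
        [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)],
    )
    print(load_ratings(conn))
```

The memory cost grows with the total number of votes, which is why a larger memory limit may be needed on big sites.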

SlopeOne might be faster, but I'm not sure if the results are precise. Not many sites I know of use this algorithm.
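For comparison, here is a minimal sketch of weighted Slope One (the variant described by Lemire and Maclachlan), written in Python for illustration; it is not the module's own implementation, and the function names are hypothetical. The average rating difference between each pair of items is computed once, and a prediction is then just a weighted average over the items the user has already rated. Whether this is faster than the classical algorithm in practice depends on the data, and the precision question above still applies.

```python
from collections import defaultdict


def build_deviations(ratings):
    """ratings: {user: {item: value}}.

    Returns dev[j][i], the average of (rating of j - rating of i) over users who
    rated both items, and freq[j][i], the number of such users.
    """
    diff = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, rj in user_ratings.items():
            for i, ri in user_ratings.items():
                if i != j:
                    diff[j][i] += rj - ri
                    freq[j][i] += 1
    dev = {j: {i: diff[j][i] / freq[j][i] for i in diff[j]} for j in diff}
    return dev, freq


def predict(user_ratings, item, dev, freq):
    """Weighted Slope One prediction of `item` for one user; None if no overlap."""
    num = den = 0.0
    for i, ri in user_ratings.items():
        n = freq.get(item, {}).get(i, 0)  # .get avoids creating empty entries
        if i != item and n:
            num += (dev[item][i] + ri) * n
            den += n
    return num / den if den else None
```

For example, with `ratings = {"john": {"A": 5, "B": 3, "C": 2}, "mark": {"A": 3, "B": 4}, "lucy": {"B": 2, "C": 5}}`, predicting Lucy's rating for A gives ((0.5 + 2) × 2 + (3 + 5) × 1) / 3 = 13/3 ≈ 4.33.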

Since my research focus is recommender systems, I'm going to work on this for at least the next three-plus years of my PhD program. You can be assured that I'll try to make this module the #1 provider of recommender algorithms for Drupal (only the algorithms). Even though performance is not good now, I hope to make significant improvements during the summer.

Other suggestions are much appreciated.

patchak

I just found this project:
http://lucene.apache.org/mahout/

It seems to do the kind of calculations we need. Or is it a replacement for what your module does? If it's not really a replacement but can accept input and output results, it could be a nice fit...

danithaca

I'm now trying to fix #414570: add local java (Mahout) support as part of my GSoC'09 task. Hopefully it'll improve performance.
I'm aware of http://lucene.apache.org/mahout/ and other Java implementations. The problem is that it needs to be integrated into Drupal.

danithaca

Version: 6.x-1.1 » 6.x-2.x-dev
Status: Postponed » Closed (fixed)

As far as I know, the best PHP can do in terms of performance is to run under the PHP CLI. Support for this was added in #448696: Add drush/drupal.sh support and hook_run_recommender(). I don't think there's much room for further performance improvement in the current architecture.

Further performance improvements will have to come from integration with a Java implementation. Refer to #414570: add local java (Mahout) support and #503212: add Apache Mahout web services support.