Apache Mahout support has been added to the 7.x-3.x release. It will be backported to D6 as 6.x-3.x, perhaps after the summer.
Mahout is very fast. Computing recommendations for about 100K node-user ratings only takes a few minutes (compared to a few hours in PHP). So the performance issue is finally solved through Mahout.

Attachment: Recommender.png (67.14 KB), comment #6, by angusmccloud

Comments

angusmccloud’s picture

Thanks Dan!

I'm more than happy to use the 6.3 release when it's ready (instead of this code we have hacked together), particularly if it can do incremental indexing and recommendation creation.

At this point I'll probably wait for 6.3 to do this -- but the next piece I was going to work on adding was a way to rebuild the similarities table without destroying the table (until the rebuild is complete). That way if it takes 12-24 hours to recalculate those values, you can keep making predictions using the old similarities until it finishes (just have separate threads running).

Right now I use two computers to do this, but ideally I'll keep it all on one box (without having two instances of drupal installed).

Anyway -- very excited to switch to Mahout with incremental builds!

Still on target for the end of August?

Cheers!

danithaca’s picture

Thanks for the feedback, Connor. Mahout is super fast, so it won't take 12-24 hours of computation anymore :)

I'll make sure not to destroy the old similarity data before the new data is generated in the 3.x releases.
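To illustrate the idea of keeping the old similarity data until the new data is ready, here is a minimal sketch, not the module's actual code. It assumes a hypothetical table named `similarity` and a hypothetical staging table `similarity_staging`, and uses Python's built-in `sqlite3` as a stand-in for the real database. The new results are written to the staging table first, and only then swapped in, so readers can keep using the old table during the long computation.

```python
import sqlite3


def rebuild_similarity(conn, new_rows):
    """Stage new similarity rows, then swap them in for the live table.

    Table names are hypothetical; `conn` must be in autocommit mode
    (isolation_level=None) because transactions are managed explicitly.
    """
    # Step 1: fill a staging table. The live table is untouched here,
    # so predictions can keep reading the old similarities.
    conn.execute("DROP TABLE IF EXISTS similarity_staging")
    conn.execute(
        "CREATE TABLE similarity_staging (item1 INTEGER, item2 INTEGER, score REAL)"
    )
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO similarity_staging VALUES (?, ?, ?)", new_rows)
    conn.execute("COMMIT")

    # Step 2: swap inside one transaction, so the old data is only
    # dropped once the replacement is complete.
    conn.execute("BEGIN")
    conn.execute("DROP TABLE IF EXISTS similarity")
    conn.execute("ALTER TABLE similarity_staging RENAME TO similarity")
    conn.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE similarity (item1 INTEGER, item2 INTEGER, score REAL)")
conn.execute("INSERT INTO similarity VALUES (1, 2, 0.5)")  # old data

rebuild_similarity(conn, [(1, 2, 0.9), (1, 3, 0.4)])
rows = conn.execute(
    "SELECT item1, item2, score FROM similarity ORDER BY item2"
).fetchall()
print(rows)
```

On MySQL the swap step would more likely use `RENAME TABLE`, which can rename several tables in one statement; the sketch above only shows the general build-then-swap pattern.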

The target date is Aug or Sep.

Cheers~

angusmccloud’s picture

That's awesome! Very much looking forward to this one Dan.

Let me know if there's anything I can do to help out (keeping in mind you're WAY better at this than I am)

danithaca’s picture

Assigned: Unassigned » danithaca
Priority: Normal » Major

Working on this now. The D6 backport of the dependency async_command is done.

danithaca’s picture

Version: 6.x-2.x-dev » 6.x-3.0-alpha1
Status: Active » Fixed

Done. Most of the helper modules (history_rec, fivestar_rec, uc_rec, similargroups) have been updated to use 6.x-3.0-alpha1.
I haven't done much testing though. Please test it on a dev environment first and then on production. Let me know if there are any problems.

angusmccloud’s picture

Attachment: Recommender.png (67.14 KB)

Dan,

I installed the newest version of the recommender (on a fresh d6 install -- so nothing installed except modules related to the recommender).

I'm using the fivestar recommender, running item-to-item. I ran it first for two users' worth of ratings (about 500 ratings). It finished really quickly (whew).

I then tried to run it for 10k users (270k ratings), and it's erroring out. I can't tell what the error is: it looks like it's scrolling through ratings for a while, then it throws an error. A screenshot of the error is attached.

Let me know if you have any ideas,
-Connor

angusmccloud’s picture

I tried with smaller chunks of ratings:
* 5000 = failed (same error)
* 1000 = failed (same error)
* 500 = failed (same error)
* 100 = failed (same error)
* 50 = succeeded (1400 ratings)

danithaca’s picture

Connor,

The computation ran without problems. The failure happened because there was too much data to save back to the database all at once. You can set "db_max_batch_size" in "config.properties" to 5000 so that it saves the computed results in batches of 5000. Or, you can increase "max_allowed_packet" in your MySQL database (http://dev.mysql.com/doc/refman/5.5/en/packet-too-large.html).

I've seen this problem in my dev environment before, so either change should fix it.
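As a rough illustration of what a batch size limit does, here is a minimal sketch in Python. It is not the recommender's actual implementation (which runs on Mahout); the `similarity` table, the `save_in_batches` helper, and the use of `sqlite3` in place of MySQL are all assumptions made for the example. The point is only that writing rows in fixed-size chunks keeps each write small instead of sending everything in one oversized statement.

```python
import sqlite3


def chunks(rows, batch_size):
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def save_in_batches(conn, rows, batch_size=5000):
    """Insert rows in batches and return how many batches were written.

    Hypothetical helper: batch_size plays the role that
    "db_max_batch_size" is described as playing above.
    """
    batch_count = 0
    for batch in chunks(rows, batch_size):
        conn.executemany("INSERT INTO similarity VALUES (?, ?, ?)", batch)
        conn.commit()  # each batch is saved on its own
        batch_count += 1
    return batch_count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE similarity (item1 INTEGER, item2 INTEGER, score REAL)")

# 12,000 made-up results, purely to exercise the batching logic.
computed = [(i, i + 1, 0.5) for i in range(12000)]
batch_count = save_in_batches(conn, computed, batch_size=5000)
saved = conn.execute("SELECT COUNT(*) FROM similarity").fetchone()[0]
print(batch_count, saved)
```

With 12,000 rows and a batch size of 5000, the rows go out as two full batches and one partial batch. A smaller batch size means more round trips but smaller individual writes.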

Another thing to note is that the Mahout algorithm implementation is slightly different from RecAPI 2.x, so you might see some differences in the recommendations it generates. Let me know if the recommendations make sense or not. Thanks.

Btw, please submit new issues for other bugs. It'll be easier for me to keep track. Thanks.

--daniel

angusmccloud’s picture

Sorry for using this thread, will use a new one next time and let you merge if they're related.

Neither of those things fixed it -- will open in a new thread instead of continuing here.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.