When I run the recommender, I get an error while it's loading data (screenshot attached).
On Dan's recommendation, I tried making a few modifications:
* In config.properties, I added "db_max_batch_size" and tried values between 5,000 and 50,000,000. I get the same error in each case; it just happens sooner with a smaller value.
* In my.ini (the MySQL config), I tried increasing max_allowed_packet up to 800M. It was at 64M, and I tried 512M and 800M; it looks like I could go up to 1G, but we're not dealing with that much data yet. Changing this value made no difference to the result.
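For context, a setting like db_max_batch_size typically caps how many rows go to MySQL in each batched INSERT round-trip. The sketch below illustrates that splitting under this assumption. It is not the recommender's actual code, and `BatchSplit` and `chunk` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates how a max batch size bounds each batched INSERT.
final class BatchSplit {
    /** Splits rows into consecutive batches holding at most maxBatchSize rows each. */
    static <T> List<List<T>> chunk(List<T> rows, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        int start = 0;
        while (start < rows.size()) {
            // Widen to long so start + maxBatchSize cannot overflow for very large caps.
            int end = (int) Math.min((long) start + maxBatchSize, rows.size());
            batches.add(rows.subList(start, end)); // a view into rows, not a copy
            start = end;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(i);
        }
        System.out.println(chunk(rows, 200).size() + " batches with a cap of 200");
        System.out.println(chunk(rows, 50_000_000).size() + " batch with a cap of 50,000,000");
    }
}
```

Under this model, any cap larger than the row count sends everything as one batch. That would explain why the very large values behave the same way.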
I'm running the 6.3alpha of the recommender with the latest async_command as of 9/17, all on a fresh D6 install from last night.
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | 200 and 500.txt | 1.52 KB | angusmccloud |
| | Recommender.png | 67.14 KB | angusmccloud |
Comments
Comment #1
danithaca commented:
Thanks Connor. I'm looking at it now. Can you let me know your MySQL version? Thanks.
Comment #2
angusmccloud commented:
Dan,
My MySQL version is 5.1.36-community.
Let me know if sharing my desktop (or letting you remote in) would be helpful in any way.
Cheers,
-Connor
Comment #3
danithaca commented:
Thanks Connor. Can you try setting db_max_batch_size to 200, 500, or 1000 to see if it works?
I'm going to add some debugging code to RecAPI so it generates more information to help me debug.
Comment #4
angusmccloud commented:
Tried all three. At 1000 it gets the same error; 200 and 500 don't really run (log from the command prompt attached).
Comment #5
danithaca commented:
Got it.
When you try 200 and 500, please issue a new "run recommender" command each time. The last time you ran with 1000, the command was marked as "failure" and won't get executed again, so you have to issue a new "run recommender" to test the new cases.
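Dan's explanation implies a simple status model: the Java app only picks up commands that are still pending, and a command that ends in failure stays failed. The sketch below models that rule on its own. `CommandQueueSketch`, its `Status` values, and `runPending` are hypothetical names, not async_command's real classes or table schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of "failed commands are not re-executed"; not async_command's code.
final class CommandQueueSketch {
    enum Status { PENDING, SUCCESS, FAILURE }

    static final class Command {
        final String name;
        Status status = Status.PENDING;

        Command(String name) {
            this.name = name;
        }
    }

    private final List<Command> commands = new ArrayList<>();

    /** Like issuing a new "run recommender" command: it always starts as PENDING. */
    Command issue(String name) {
        Command c = new Command(name);
        commands.add(c);
        return c;
    }

    /** Executes only PENDING commands and records each outcome; returns how many ran. */
    int runPending(Predicate<Command> body) {
        int executed = 0;
        for (Command c : commands) {
            if (c.status != Status.PENDING) {
                continue; // SUCCESS and FAILURE are final, so a failed command is skipped
            }
            executed++;
            c.status = body.test(c) ? Status.SUCCESS : Status.FAILURE;
        }
        return executed;
    }
}
```

Under this model, running run.bat again after the failed 1000-row attempt would execute nothing. Only a newly issued command gets picked up.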
Comment #6
angusmccloud commented:
Dan,
Same issue with 200, but I can see the error text now. It looks like there's a unique constraint issue?
```
Command evaluation error. See script log for details. Error: Sourced file: inline evaluation of: ``app.runRecommender(); //fivestar_rec_i2i;'' : Method Invocation app.runRecommender : at Line: 1 : in file: inline evaluation of: ``app.runRecommender(); //fivestar_rec_i2i;'' : app .runRecommender ( ) Target exception: org.drupal.project.async_command.DrupalRuntimeException: java.sql.SQLException: Duplicate entry '164-57040' for key 'PRIMARY' Query: INSERT INTO drupal_recommender_preference_staging(source_eid, target_eid, score, updated) VALUES(?, ?, ?, ?) Parameters: [lots and lots of values; I removed them]
```
Comment #7
angusmccloud commented:
I'm going to try reloading all the ratings because I want to make sure there's no bad or duplicate data in there (I think Fivestar let in some dupes that I had to clean up manually). One minute.
Comment #8
angusmccloud commented:
That was it: there was duplicate data in the ratings. I removed all the dupes, and it's running now (at least it has gotten farther than ever before).
Thanks, Dan, you are the man!
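The error in comment #6 appears to come from the staging table's primary key on (source_eid, target_eid). Two ratings for the same pair produce the same key, and MySQL reports it as '164-57040'. Removing such duplicates before loading avoids the collision. The sketch below keeps one row per pair and, as an assumption, prefers the most recently updated rating. `RatingDedup` and `Rating` are hypothetical names, and `updated` is assumed to be a Unix timestamp.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-load cleanup; field names follow the INSERT columns in the error message.
final class RatingDedup {
    static final class Rating {
        final long sourceEid;
        final long targetEid;
        final double score;
        final long updated; // assumed: Unix timestamp

        Rating(long sourceEid, long targetEid, double score, long updated) {
            this.sourceEid = sourceEid;
            this.targetEid = targetEid;
            this.score = score;
            this.updated = updated;
        }

        /** Same "source-target" shape MySQL prints for the duplicate key, e.g. 164-57040. */
        String key() {
            return sourceEid + "-" + targetEid;
        }
    }

    /** Keeps one rating per (source_eid, target_eid), preferring the latest updated value. */
    static List<Rating> dedupe(List<Rating> rows) {
        Map<String, Rating> byKey = new LinkedHashMap<>(); // preserves first-seen order
        for (Rating r : rows) {
            Rating kept = byKey.get(r.key());
            if (kept == null || r.updated > kept.updated) {
                byKey.put(r.key(), r);
            }
        }
        return new ArrayList<>(byKey.values());
    }
}
```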
Comment #9
danithaca commented:
Cool, thanks. Can you tell me how long it takes to compute the recommendations? In my experience the computation itself is fast, but saving to the database takes quite a while (maybe because of slow disk operations). The run.bat output logs the time of each step.
I'll try to fix the dup data problem (all data from a failed runRecommender command should be purged before a new execution).
Let me know if there are any other problems.
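One way to get the behavior Dan describes is to clear the staging data before each run and again when a run fails. That way leftovers can never collide with fresh rows. The sketch below uses a hypothetical in-memory stand-in that rejects duplicate keys the way the PRIMARY key does. `StagingPurgeSketch`, `InMemoryStaging`, and `load` are illustrative names. A real fix would presumably issue a DELETE or use a transaction against the database instead.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of "purge before a new execution"; not the recommender's code.
final class StagingPurgeSketch {
    /** In-memory stand-in for the staging table that enforces a (source, target) primary key. */
    static final class InMemoryStaging {
        private final Set<String> keys = new HashSet<>();

        void clear() {
            keys.clear();
        }

        void insert(long source, long target) {
            String key = source + "-" + target;
            if (!keys.add(key)) {
                throw new IllegalStateException("Duplicate entry '" + key + "' for key 'PRIMARY'");
            }
        }

        int size() {
            return keys.size();
        }
    }

    /** Loads rows after purging leftovers, and purges again if this run fails part-way. */
    static void load(InMemoryStaging staging, List<long[]> rows) {
        staging.clear(); // remove anything an earlier failed run left behind
        try {
            for (long[] r : rows) {
                staging.insert(r[0], r[1]);
            }
        } catch (RuntimeException e) {
            staging.clear(); // don't leave partial data for the next run
            throw e;
        }
    }
}
```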
Comment #10
angusmccloud commented:
Dan,
It finished in about 6 hours for 271k ratings. The summary message was: "Users: 8985. Items: 11602. (Time spent: 6h11m5s)"
I've also included the full command prompt printout in case you want to see where the time was spent.
As we discussed, I'm also going to try without the max batch size turned on and see what happens.
-Connor
```
C:\wamp\www\sites\all\modules\recommender>run
PHP 5.3.0 (cli) (built: Jun 29 2009 21:25:23)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2009 Zend Technologies
Sep 19, 2011 10:21:31 AM org.drupal.project.async_command.AbstractDrupalApp handleCLI
INFO: DrupalApp VERSION: 6_0
Sep 19, 2011 10:21:31 AM org.drupal.project.async_command.AbstractDrupalApp getDefaultSettingsPhpFile
INFO: Jar file location: C:\wamp\www\sites\all\modules\recommender\recommender.jar
Sep 19, 2011 10:21:31 AM org.drupal.project.async_command.AbstractDrupalApp initDrupalConnection
INFO: Batch SQL size: 200
Sep 19, 2011 10:21:31 AM org.drupal.project.async_command.AbstractDrupalApp runApp
INFO: Total commands to be executed: 1
Sep 19, 2011 10:21:31 AM org.drupal.project.async_command.AbstractDrupalApp runApp
INFO: Running async_command: runRecommender(); //fivestar_rec_i2i
Sep 19, 2011 10:21:31 AM org.drupal.project.recommender.RecommenderApp processTable
INFO: Using {recommender_preference_staging} table. Loading data.
Sep 19, 2011 10:21:53 AM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl run
INFO: Initializing data model, similarity and recommender.
Sep 19, 2011 10:21:53 AM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl initDataModel
INFO: Initializing data model.
Sep 19, 2011 10:21:54 AM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl initDataModel
INFO: Switching to MEMORY mode. Load all data from database into memory first.
Sep 19, 2011 10:21:54 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Loading new JDBC delegate data...
Sep 19, 2011 10:21:54 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 8985 users
Sep 19, 2011 10:21:54 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: New data loaded.
Sep 19, 2011 10:21:54 AM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl run
INFO: Using similarity class: org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity
Sep 19, 2011 10:21:54 AM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl run
INFO: Using recommender class: org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender
Sep 19, 2011 10:21:54 AM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl run
INFO: Computing and saving similarity data.
Sep 19, 2011 10:22:20 AM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl genericComputeSave
INFO: Finished computing. Saving to database with rows: 320567
Sep 19, 2011 10:22:44 AM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl run
INFO: Computing and saving prediction data.
Sep 19, 2011 4:31:36 PM org.drupal.project.recommender.RecommenderApp$AlgorithmImpl genericComputeSave
INFO: Finished computing. Saving to database with rows: 824494
Sep 19, 2011 4:32:36 PM org.drupal.project.async_command.AbstractDrupalApp runApp
INFO: Result: true
C:\wamp\www\sites\all\modules\recommender>
```
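You can read each log line as the start of the next phase and use the timestamps to see where the 6h11m5s went. The short program below does that arithmetic. `LogPhaseTimes` is a hypothetical helper name, and the phase labels are an interpretation of the log messages. By this reading, the similarity step takes under a minute and the final save about a minute. The interval between "Computing and saving prediction data" and the second "Finished computing" accounts for 6h8m52s. That suggests the prediction computation, not the database save, dominated this run.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for timing the phases of a run.bat log.
final class LogPhaseTimes {
    // Matches timestamps such as "Sep 19, 2011 10:22:44 AM" in the log above.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("MMM d, yyyy h:mm:ss a", Locale.US);

    static Duration between(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, LOG_TIME), LocalDateTime.parse(end, LOG_TIME));
    }

    /** Formats a duration like the recommender's summary line, e.g. 6h11m5s. */
    static String hms(Duration d) {
        long s = d.getSeconds();
        return String.format(Locale.ROOT, "%dh%dm%ds", s / 3600, (s % 3600) / 60, s % 60);
    }

    public static void main(String[] args) {
        String[][] phases = {
            {"similarity compute", "Sep 19, 2011 10:21:54 AM", "Sep 19, 2011 10:22:20 AM"},
            {"similarity save",    "Sep 19, 2011 10:22:20 AM", "Sep 19, 2011 10:22:44 AM"},
            {"prediction compute", "Sep 19, 2011 10:22:44 AM", "Sep 19, 2011 4:31:36 PM"},
            {"prediction save",    "Sep 19, 2011 4:31:36 PM",  "Sep 19, 2011 4:32:36 PM"},
            {"whole run",          "Sep 19, 2011 10:21:31 AM", "Sep 19, 2011 4:32:36 PM"},
        };
        for (String[] p : phases) {
            System.out.println(p[0] + ": " + hms(between(p[1], p[2])));
        }
    }
}
```

The "whole run" line reproduces the 6h11m5s from the summary message, which is a quick check that the timestamps are parsed correctly.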
Comment #11
danithaca commented:
Sounds good, thanks Connor. Also, do you think it's faster than the original PHP implementation?
Comment #12
angusmccloud commented:
Dan,
This is WAY faster than the PHP version. I haven't actually tried to run a full set of recommendations in a while because the PHP version was so slow.
The last time I ran one, there were about 150k ratings, and it ran for 30-something hours. As you know, the run time grows much faster than linearly with the number of ratings, so if I had to guess, I'd say 270k ratings would take 75 hours or more on the PHP version (I don't want to test this theory).
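To put the 75-plus-hour guess in context, the sketch below scales the earlier PHP run to 270k ratings under two assumed growth models: linear and quadratic in the number of ratings. It uses about 150k ratings and "30-something" hours, taking 30 as a lower bound. Both exponents are assumptions for illustration, not measured properties of the PHP implementation, and `RuntimeExtrapolation` is a hypothetical name. Linear scaling gives 54 hours and quadratic scaling about 97 hours, so the guess falls between the two.

```java
import java.util.Locale;

// Hypothetical back-of-the-envelope scaling; the exponents are assumptions, not measurements.
final class RuntimeExtrapolation {
    /** Scales a measured runtime to a new input size, assuming runtime ~ size^exponent. */
    static double extrapolateHours(double measuredHours, double measuredRatings,
                                   double newRatings, double exponent) {
        return measuredHours * Math.pow(newRatings / measuredRatings, exponent);
    }

    public static void main(String[] args) {
        // "30-something hours" for about 150k ratings; 30 is used as a lower bound.
        double linear = extrapolateHours(30, 150_000, 270_000, 1.0);
        double quadratic = extrapolateHours(30, 150_000, 270_000, 2.0);
        System.out.printf(Locale.ROOT, "linear: %.1f h, quadratic: %.1f h%n", linear, quadratic);
    }
}
```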
Comment #13
danithaca commented:
Cool, sounds good. I believe JDBC tuning can improve the current performance even further. I'll look into that.
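One example of the kind of JDBC tuning Dan mentions comes from MySQL Connector/J, which has a `rewriteBatchedStatements` connection property. This is not necessarily what the recommender ended up using. When the property is true, the driver can combine a batch of single-row INSERTs into multi-row statements, which usually cuts round-trips. The sketch below adds that property to a JDBC URL and shows a batched insert into the staging table with auto-commit turned off. `JdbcBatchTuning`, `Row`, and `saveBatched` are hypothetical names. The table and column names come from the error message in comment #6, where "drupal_" is this install's table prefix. The rollback only helps if the table uses a transactional engine such as InnoDB, because MyISAM tables keep their rows after a rollback.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch of JDBC-side tuning for the staging-table inserts.
final class JdbcBatchTuning {
    static final class Row {
        final long sourceEid;
        final long targetEid;
        final double score;
        final long updated; // assumed: Unix timestamp

        Row(long sourceEid, long targetEid, double score, long updated) {
            this.sourceEid = sourceEid;
            this.targetEid = targetEid;
            this.score = score;
            this.updated = updated;
        }
    }

    /** Enables Connector/J's multi-row rewriting of batched INSERTs unless the URL already sets it. */
    static String withBatchRewrite(String jdbcUrl) {
        if (jdbcUrl.contains("rewriteBatchedStatements=")) {
            return jdbcUrl; // respect an explicit setting
        }
        return jdbcUrl + (jdbcUrl.contains("?") ? "&" : "?") + "rewriteBatchedStatements=true";
    }

    /** Sends rows in batches of batchSize and commits once at the end; rolls back on error. */
    static void saveBatched(Connection conn, List<Row> rows, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        String sql = "INSERT INTO drupal_recommender_preference_staging"
                + "(source_eid, target_eid, score, updated) VALUES(?, ?, ?, ?)";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Row r : rows) {
                ps.setLong(1, r.sourceEid);
                ps.setLong(2, r.targetEid);
                ps.setDouble(3, r.score);
                ps.setLong(4, r.updated);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // one round-trip per full batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Committing once at the end keeps a failed run all-or-nothing. Committing after each batch is the alternative if one very large transaction becomes a problem.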
Comment #14
danithaca commented:
I can't reproduce the "dup" data scenario, so I'm marking this as closed. New reports about the "dup" data problem can go in a separate issue.