I assume a UI for it will be coming at some point, but I see there is a "train" API function in the naive_bayes.inc file of the MachineLearningAPI module. How would one go about using it to categorize or tag memes?

Comments

kyle_mathews

Project: Memetracker » Machine Learning API

Ummm... right now, without custom programming, you can't use the naive_bayes function to categorize or tag memes. At the moment Memetracker uses naive_bayes solely to score how "interesting" a feed item is. When someone clicks on a node, the click is saved. Then naive_bayes is trained using the content from the clicked-on node.

Do you have an upcoming project where you need to categorize or tag memes? Or is this just an exploratory question? I can provide more help if needed, but memetracker/machinelearningapi isn't set up at the moment to categorize memes. This is certainly something I hope memetracker supports in the future, but it's not a very high priority task for me right now.

gemini

One of the projects I'm working on is a local news aggregator for a specific industry. Something like Topix, but limited to one state and one industry. I would like it to work similarly to eScienceNews.com.

Since there are no specific feeds for this industry, I would have to subscribe to different local feeds and scrape their content for better topic/industry recognition (which I think Naive Bayes would do just fine), then maybe do more granular categorization by area. Of course I wouldn't publish the whole articles. I think I could publish small snippets (or custom summaries) like Topix does and link back to the source.

That's basically what I'm looking into.

kyle_mathews

That sounds like a cool project -- something Memetracker could definitely help with. Naive Bayes would do a good job of categorizing articles by topic / industry.

But like I said, Memetracker won't do what you want at the moment. Since the direction you'd like to take Memetracker is one I think it should go in, I'll definitely help you program that functionality if you (or someone who works with you) have the programming chops. The Naive Bayes implementation is already set up to sort content into different categories -- you'd just need to build a UI to train the algorithm on which content fits into which categories.
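To give a feel for the logic involved, here is a rough sketch of a Naive Bayes categorizer, written in Python rather than PHP to keep it short. It is not the code in naive_bayes.inc: the class, the method names, the category labels, and the sample headlines are all made up for illustration, and it assumes a plain bag-of-words model with add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes text categorizer (illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocabulary = set()                  # every word seen during training

    def train(self, text, category):
        """Record the words of one text as an example of the given category."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the trained category with the highest log-probability."""
        if not self.doc_counts:
            raise ValueError("train the classifier before calling classify()")
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        # The extra +1 reserves a slot for words never seen in training.
        vocab_size = len(self.vocabulary) + 1
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Prior: how common this category is among the training texts.
            score = math.log(n_docs / total_docs)
            # Likelihood: add-one smoothed probability of each word.
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


classifier = NaiveBayes()
classifier.train("city council approves new hospital funding for nurses", "healthcare")
classifier.train("hospital opens new clinic and hires doctors", "healthcare")
classifier.train("local farm reports record corn harvest", "agriculture")
classifier.train("farmers market expands as dairy prices rise", "agriculture")
print(classifier.classify("new clinic hires nurses"))  # prints "healthcare"
```

The training UI would then amount to a form that lets an editor pick a category for an article and calls something like `train()` with the article text. Every new feed item would go through something like `classify()`.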

What's your timeline for moving forward with this?

gemini

Kyle,

The timeline is not defined, so it's more or less flexible. My programming skills are not great. I work alone on all of our projects and write fairly simple things in PHP quite a bit. I'm running a few sites on Drupal as well. If someone could help me out with the business logic for how things should operate (an algorithm or at least some pointers), I could probably start tweaking things on my own and see how far I get.