Active
Project: Machine Learning API
Version: 6.x-1.1-alpha3
Component: Documentation
Priority: Normal
Category: Support request
Assigned: Unassigned
Reporter:
Created: 7 Aug 2008 at 19:17 UTC
Updated: 15 Aug 2008 at 05:14 UTC
I assume a UI for it will come at some point, but I see there is a "train" API function in the naive_bayes.inc file of the MachineLearningAPI module. How would one go about using it to categorize or tag memes?
Comments
Comment #1
kyle_mathews commented:
Ummm... without custom programming, you can't currently use the naive_bayes function to categorize or tag memes. At the moment Memetracker uses naive_bayes solely to decide how "interesting" a feed item is. When someone clicks on a node, the click is saved; naive_bayes is then trained on the content of the clicked node.
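The comment above only says that clicks are saved and that the clicked node's content is used for training. A two-class classifier also needs examples of what is *not* interesting, so the minimal sketch below assumes that items shown to a user but never clicked serve as negative examples. `ViewEvent`, `build_training_examples`, and the label names are hypothetical; this is not Memetracker's actual code.

```python
from dataclasses import dataclass


@dataclass
class ViewEvent:
    """One feed item shown to a user and whether it was clicked (hypothetical log format)."""
    node_id: int
    clicked: bool


def build_training_examples(events, node_text):
    """Turn a click log into (text, label) pairs for a two-class classifier.

    Assumption: a node clicked at least once is "interesting"; a node that
    was shown but never clicked is "not_interesting".
    """
    clicked_ids = {e.node_id for e in events if e.clicked}
    shown_ids = {e.node_id for e in events}
    examples = []
    for node_id in sorted(shown_ids):  # stable order by node id
        label = "interesting" if node_id in clicked_ids else "not_interesting"
        examples.append((node_text[node_id], label))
    return examples
```

For example, with events for node 1 (clicked once, skipped once), node 2 (skipped) and node 3 (skipped), only node 1 is labelled "interesting"; a node counts as interesting if any view of it was clicked.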
Do you have an upcoming project where you need to categorize or tag memes, or is this just an exploratory question? I can provide more help if needed, but memetracker/machinelearningapi isn't set up to categorize memes at the moment. This is certainly something I hope Memetracker supports in the future, but it's not a high-priority task for me right now.
Comment #2
gemini commented:
One of the projects I'm working on is a local news aggregator for a specific industry: something like Topix, but limited to one state and one industry. I would like it to work similarly to eScienceNews.com.
Since there are no feeds specific to this industry, I would have to subscribe to various local feeds, scrape their content for better topic/industry recognition (which I think Naive Bayes would handle just fine), and then maybe do more granular categorization by area. Of course I wouldn't publish whole articles; I think I could publish small snippets (or custom summaries), as Topix does, and link back to the source.
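The last two steps of that plan (area categorization and snippets) can be sketched without any machine learning. The sketch below assumes a hand-maintained gazetteer mapping each area of the state to its place names, and uses simple whole-word keyword matching in place of a second classifier; the topic/industry filtering would still be done by Naive Bayes beforehand. `make_snippet`, `tag_areas`, and the gazetteer format are hypothetical.

```python
import re


def make_snippet(text, max_words=40):
    """Return the first max_words words of an article, adding "..." if it was truncated."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


def tag_areas(text, areas):
    """Return every area label whose place names appear in the text as whole words.

    `areas` maps an area label to a list of place names (a hypothetical,
    hand-maintained gazetteer). Matching is case-insensitive.
    """
    found = []
    for area, places in areas.items():
        for place in places:
            if re.search(r"\b" + re.escape(place) + r"\b", text, re.IGNORECASE):
                found.append(area)
                break  # one matching place is enough to tag this area
    return found
```

Keyword matching is crude (it cannot tell two towns with the same name apart), but it is transparent and easy for a single maintainer to correct, which may matter more than accuracy in a first version.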
That's basically what I'm looking into.
Comment #3
kyle_mathews commented:
That sounds like a cool project, and something Memetracker could definitely help with. Naive Bayes would do a good job of categorizing articles by topic/industry.
But as I said, Memetracker won't do what you want at the moment. Since the direction you'd like to take Memetracker is one I think it should go in, I'll definitely help you program that functionality if you (or someone who works with you) have the programming chops. The Naive Bayes implementation is already set up to sort content into different categories; you'd just need to build a UI to train the algorithm on which content fits into which categories.
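To show what "training the algorithm on which content fits into which categories" involves, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. It is a generic textbook version in Python, not the code in naive_bayes.inc, and the `NaiveBayes` class with its `train(text, category)` / `classify(text)` methods is hypothetical. A category-training UI would essentially call `train` once for each editor decision; Memetracker's "interesting" scoring is the special case with two categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.total_words = Counter()             # total words seen per category
        self.vocab = set()                       # every word seen in training

    def train(self, text, category):
        """Record one labelled document (e.g. one editor decision in a training UI)."""
        words = tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.total_words[category] += len(words)
        self.vocab.update(words)

    def classify(self, text):
        """Return the category with the highest log-posterior for the text."""
        if not self.doc_counts:
            raise ValueError("classifier has not been trained")
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for category in self.doc_counts:
            # log P(category) + sum of log P(word | category)
            score = math.log(self.doc_counts[category] / n_docs)
            denom = self.total_words[category] + v
            for w in words:
                score += math.log((self.word_counts[category][w] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best
```

Scores are summed in log space so that long articles do not underflow to zero, and words never seen in training are skipped because they carry no evidence for any category.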
What's your timeline for moving forward with this?
Comment #4
gemini commented:
Kyle,
The timeline isn't fixed, so it's fairly flexible. My programming skills are not great: I work alone on all of our projects and write fairly simple PHP quite a bit. I'm running a few sites on Drupal as well. If someone could help me with the business logic for how things should work (an algorithm, or at least some pointers), I could probably start tweaking things on my own and see how far I get.