I've been thinking about feed aggregation inside a company intranet and am developing some ideas about how company weblogs can be aggregated and archived. It seems to me that this might be possible with import.module and node.module. I haven't really created Drupal modules on my own yet, except for one tiny module that makes my home page show recent weblinks, blogs, and images in separate lists. Still, I am hoping to develop the concept further and perhaps get some suggestions on how to proceed.
My concept:
A business weblog aggregator is different from a client-side news feed reader. A business weblog aggregator would need to...
1) archive collected weblog entries for long-term storage and retrieval
2) provide a richer set of metadata for each entered record. A minimal set of metadata might include: author, title, publisher, URL, subject. Some of these metadata elements should be entered in an automated fashion.
3) provide some semi-automated means of classification via a controlled vocabulary (taxonomy), e.g. where terms occurring in the blog entry text (title, description) are compared with synonyms, phrases, or more complex boolean expressions that map to terms in the controlled vocabulary.
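To make features 2 and 3 a bit more concrete, here is a rough sketch of the idea. It is plain Python rather than Drupal code, and everything in it is hypothetical: the ArchivedEntry record, the VOCABULARY table and its terms, and the classify function are made up for illustration only. It covers only the synonym and phrase matching part of feature 3, not boolean expressions, and it only suggests terms that a person would still have to confirm.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ArchivedEntry:
    """One archived weblog entry with the minimal metadata set (hypothetical)."""
    author: str
    title: str
    publisher: str
    url: str
    description: str = ""
    # "subject" metadata: terms from the controlled vocabulary
    subjects: list = field(default_factory=list)


# Hypothetical controlled vocabulary: each term maps to synonyms and phrases.
VOCABULARY = {
    "Human Resources": ["hr", "hiring", "recruitment"],
    "Product Launch": ["launch", "release date", "go-live"],
}


def classify(entry, vocabulary=VOCABULARY):
    """Suggest vocabulary terms whose name or synonyms occur in the entry text."""
    text = f"{entry.title} {entry.description}".lower()
    suggested = []
    for term, phrases in vocabulary.items():
        for phrase in [term] + phrases:
            # Whole-word match, so "hr" does not match inside "thread".
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            if re.search(pattern, text):
                suggested.append(term)
                break  # one hit is enough for this term
    return suggested
```

In use, an entry pulled from a feed would be turned into an ArchivedEntry, and the result of classify(entry) would be shown to an editor as suggested subjects before being saved to entry.subjects.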
Now, while I think I might be able to handle feature 1 above, I suspect that, as a non-programmer, there is no way I can do 2 and 3. So before investigating this further, I want to get your ideas about the possibility of doing this in Drupal.