CVS edit link for benoit.borrel
Hello, I am a senior PHP developer with more than 2 years of experience with Drupal. I have already contributed a (very) few patches to some modules and would now like to contribute my own new module to the community.
The Semantic Similarity module automatically computes semantic similarity scores between nodes. These scores are then used to display four blocks (I plan to refactor them to integrate with Views):
- Most/Least semantically similar nodes, on every page
- Node's most/least semantically similar nodes, on node pages
The first two blocks list the specified number of node pairs that are the
most/least semantically similar, site-wide. The last two blocks contain
links to the specified number of nodes that are the most/least semantically
similar to the viewed node only.
These scores are computed by integrating Drupal with the R Project for Statistical Computing and its Latent Semantic Analysis package. The semantic similarity score, obtained from the Latent Semantic Analysis algorithm (http://en.wikipedia.org/wiki/Latent_semantic_analysis), a well-established one in text mining, is an effective measure of semantic relatedness (http://en.wikipedia.org/wiki/Semantic_relatedness).
As I stated in Methods to detect relations of similarity between nodes (http://groups.drupal.org/node/45340), many modules offer, using different methods, functionality for detecting similarity relations between nodes (sometimes named "more like this", relevant, similar...). I classified these methods as taxonomy/CCK-based and content-based.
The first method relies on term-matching between nodes (like module http://drupal.org/project/similarterms) or even lets users create complex, custom-defined weights and compound associations (like module http://drupal.org/project/Associated_nodes). An (incomplete) list of modules using this method is here: http://drupal.org/node/323329.
The second method relies on content-matching between nodes. The only existing module using this method, as far as I know, is http://drupal.org/project/similar, which relies on MySQL full-text searching to perform basic natural language processing.
My proposed module, Semantic Similarity, also uses a content-based method but relies on advanced text mining. It offers a truly semantic approach that applies the Latent Semantic Analysis (LSA) algorithm to approximate the meaning of texts, thereby exposing semantic structure to computation. LSA combines the classical vector-space model, well known in computational linguistics, with a singular value decomposition (SVD), a two-mode factor analysis. Thus, bag-of-words representations of texts can be mapped into a modified vector space that is assumed to reflect semantic structure. The module then computes the distance between the vectors. This distance is in fact a measure of semantic relatedness between texts.
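To make the pipeline described above concrete, here is a minimal sketch in Python with NumPy. It is not the module's code (the module delegates this work to R and its LSA package); the function name `lsa_similarity`, the `rank` parameter, the whitespace tokenization and the use of cosine similarity as the vector comparison are all assumptions made for illustration.

```python
import numpy as np

def lsa_similarity(texts, rank=2):
    """Return a documents x documents cosine-similarity matrix in LSA space.

    Illustrative only: tokenization is a plain whitespace split and no
    weighting scheme is applied to the counts.
    """
    # Bag-of-words: one row per term, one column per document.
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(texts)))
    for j, t in enumerate(texts):
        for w in t.lower().split():
            counts[index[w], j] += 1

    # Truncated SVD: keep only the `rank` largest singular values.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(rank, len(s))
    docs = (np.diag(s[:k]) @ vt[:k]).T  # one k-dimensional vector per document

    # Cosine similarity between the reduced document vectors.
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty vectors
    unit = docs / norms
    return unit @ unit.T

# Hypothetical node bodies.
texts = [
    "drupal module for nodes",
    "drupal module for blocks",
    "statistics with the r project",
]
sim = lsa_similarity(texts)
```

With these three texts, the first two share most of their words and end up much closer to each other in the reduced space than either is to the third. Sorting a row of `sim` would give the most/least similar nodes for one node, which is roughly what the per-node blocks display.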
The algorithm, divided into 3 steps (pre-process, process and post-process) and simply configurable, lets the user choose among 2400 combinations of factors to fine-tune the relevance of the scores.
Thank you for considering my application.
Benoit Borrel
| Comment | File | Size | Author |
|---|---|---|---|
| #1 | semantic_similarity-6.x-0.2.tar_.gz | 33.91 KB | benoit.borrel |
Comments
Comment #1
benoit.borrel commented
Here is the code.
Comment #2
avpaderno
Hello, and thanks for applying for a CVS account. I am adding the review tags, and some volunteers will review the code, pointing out what needs to be changed.
Comment #4
avpaderno
Deleting Drupal variables with a query that matches any variable whose name starts with the module name would also remove the Drupal variables of other modules.
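The point above can be illustrated with a small, self-contained sketch. It uses Python's `sqlite3` with a stand-in `variable` table rather than Drupal itself, and the variable names, including the second module `semantic_similarity_views`, are hypothetical. The submitted module's actual query is not shown in this issue, so the `LIKE` pattern below is an assumption about its shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (name TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO variable VALUES (?, ?)",
    [
        ("semantic_similarity_block_count", "5"),   # owned by this module
        ("semantic_similarity_views_limit", "10"),  # owned by a hypothetical other module
    ],
)

# A prefix pattern matches both rows, including the one this module does not own.
matched_by_prefix = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM variable "
        "WHERE name LIKE 'semantic_similarity%' ORDER BY name"
    )
]

# Safer: delete only an explicit list of the variables the module defines.
OWN_VARIABLES = ["semantic_similarity_block_count"]
conn.executemany(
    "DELETE FROM variable WHERE name = ?", [(name,) for name in OWN_VARIABLES]
)
remaining = [
    row[0] for row in conn.execute("SELECT name FROM variable ORDER BY name")
]
```

In a Drupal uninstall hook, the equivalent of the explicit list would be one `variable_del()` call per variable the module defines.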
Thank you for your contribution! I am going to update your account.
These are some recommended readings to help with excellent maintainership:
You can find more contributors chatting on the IRC #drupal-contribute channel. So, come hang out and stay involved.
Thank you, also, for your patience with the review process.
Anyone is welcome to participate in the review process. Please consider reviewing other projects that are pending review. I encourage you to learn more about that process and join the group of reviewers.
I thank all the dedicated reviewers as well.
Comment #5
benoit.borrel commented
Code fixed and module published: http://www.drupal.org/project/semantic_similarity.
Thanks,