The Authenticate module provides a mechanism to check whether site content has been plagiarized.
The module is a framework which supports various search APIs (plugins) that scour the net for possibly plagiarized content. The framework supports 2 different types of APIs: Standard APIs (Google and Yahoo plugins are included here) and Custom APIs (such as the 3rd party paid authentication service from iThenticate (www.ithenticate.com)).
The Standard API process is basically:
- split the BODY of the node to be checked into "chunks" of a configurable number of consecutive words
- use the API's search engine (Google or Yahoo, for example) to search for any URLs which match each chunk
- load the full page content for each matching URL and do a complex comparison against the entire body of the node
- come up with a comparison score based on how closely the scraped URL's content matches the body of the node
- provide a report of all matching URLs whose comparison score exceeds a configurable threshold
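The chunking, scoring and threshold steps above can be sketched roughly as follows. This is a minimal Python illustration, not the module's actual PHP code: the function names are hypothetical, the chunks are assumed to be non-overlapping, and Python's standard difflib stands in for the PEAR text_diff comparison. The search and page-fetching steps are left out, so the pages are passed in as already-fetched text.

```python
from difflib import SequenceMatcher


def split_into_chunks(body, words_per_chunk):
    """Split text into consecutive chunks of N words (assumed non-overlapping)."""
    words = body.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def comparison_score(node_body, page_text):
    """Return a 0-100 word-level similarity score (difflib, not text_diff)."""
    matcher = SequenceMatcher(
        None, node_body.split(), page_text.split(), autojunk=False
    )
    return round(matcher.ratio() * 100, 1)


def build_report(node_body, pages, threshold):
    """Score each page and keep those above the threshold, best match first.

    `pages` is a hypothetical mapping of URL -> page text that was already
    fetched for the URLs returned by the search step.
    """
    scored = (
        (url, comparison_score(node_body, text)) for url, text in pages.items()
    )
    hits = [(url, score) for url, score in scored if score > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, split_into_chunks("a b c d e", 2) returns ["a b", "c d", "e"], and each of those chunks would be sent to the search engine as a separate query.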
Custom APIs like iThenticate's do their own search offsite from the user's Drupal site and return a report in any fashion they prefer (for example, embedded in an iframe within the Drupal site).
Each API (Google, Yahoo, iThenticate or others which may be added) requires an API account with the respective company.
This module is expected to be popular amongst schools, colleges and publishing companies.
NOTE: As per the README, this module requires the PEAR text_diff class to be installed.
It also requires the simple_html_dom class file, which is included here or can be downloaded from http://sourceforge.net/projects/simplehtmldom/files/. You only need the simple_html_dom.php file from that download. Rename it to simple_html_dom.inc and place it in the root folder of this module.
NOTE: The iThenticate module was removed from the release as it contained code which was not allowed in Drupal CVS due to licensing issues. If you need this API, feel free to contact me or iThenticate directly.
Drupal 6 version now available
I have extensively tested the Google API but not the Yahoo API. I was getting connection errors with the Yahoo API, which may be due to changes in their API, since the API code is relatively untouched between the Drupal 5 and Drupal 6 versions.
Funding for the Drupal 6 version was provided by ConsumerSearch.com, a NY Times Company.
Drupal 7 version now available
The D7 version ships with both the Google plugin and the Yahoo plugin. The Yahoo plugin, although converted to D7, has not been updated to support the changes to the Yahoo Search API, so it does not work. If someone can track down a link to their new API, I am sure the update would be very simple.
The iThenticate plugin is still not included. I believe I was forced to remove it from the D5 version due to a silly idea that it was not an open source product. That was just someone on Drupal exec who was confused, and I am sure there is no reason it cannot be included here. I do not, however, have a valid iThenticate account, so it would be impossible for me to test. If someone can offer a test account, I can likely port over the iThenticate plugin and include it here.