- Various core features need diff functionality.
- Add a Diff library to Drupal core.
Originally crafted by @heyrocker in
- GPL2+ compatible license. This is a hard requirement. In the past we have approached authors about dual or re-licensing their libraries and had success.
- Fully OO and PSR-0 compatible. This is not a hard requirement, but it will be tough to sell any library to the community that doesn't meet it. Another possibility would be to refactor the library and push the changes upstream if it's not too much work.
- Horde_Text: LGPL 2.1 (compatible). But the classes are not PSR-0 compatible.
- phpspec/php-diff: Modified BSD license (compatible). PSR-0 compatible.
- tyler-king/php-diff: Freely licensed with no restrictions (incompatible).
- PhpWiki DiffEngine: GPL (compatible). OO, but not PSR-0 compatible.
- Diff module's DiffEngine originates from PhpWiki, and MediaWiki (Wikipedia) apparently uses the exact same code.
- Diff module's DiffEngine.php was refactored in major ways over the past years — although major parts of the code still look identical to the original library source.
- MediaWiki's DairikiDiff.php is actively maintained and shows very recent changes.
- PhpWiki actually still seems to have recent releases — however, I checked out their SVN repository and none of the contained library code has been changed since 2003.
A very interesting aspect is how the original PhpWiki diff library was split up and structured. It essentially consists of 3 files:
difflib.php (low-level diff engine classes),
diff.php (high-level entry point for all two-way diffs), and
diff3.php (high-level entry point for 3-way diffs).
This clear organization unfortunately got lost in all forks.
- The project is very young.
- There haven't been many contributors, and it is not very clear which algorithms it uses (it only references Python difflib).
- It may be PSR-0 compatible, but the code is not aware of PSR-0 classloaders and performs its own loading.
- PhpWiki's Diff code looks superior. Even though it is old (2003), the origins of the actual diff algorithms are clearly documented, and the algorithms have not been re-invented from scratch:
* Class used internally by Diff to actually compute the diffs.
* The algorithm used here is mostly lifted from the perl module
* Algorithm::Diff (version 1.06) by Ned Konz, which is available at:
* More ideas are taken from:
* Some ideas are (and a bit of code) are from from analyze.c, from GNU
* diffutils-2.7, which can be found at:
* Finally, some ideas (subdivision by NCHUNKS > 2, and some optimizations)
* are my own.
* @author Geoffrey T. Dairiki
* @access private
This is not the only documentation in the code that references external resources. Those same comments still exist in MediaWiki's fork of DiffEngine.
- MediaWiki's fork has been actively maintained since 2003, since it powers the diff engine on one of the world's largest web sites, Wikipedia, which means it is mature and battle-tested in the field.
- Neither library has any tests. (Not even MediaWiki wrote any.)
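For readers unfamiliar with what these engines compute: Algorithm::Diff, which the docblock quoted above credits, is built around finding a longest common subsequence (LCS) of the two inputs. The following is only a minimal sketch of that idea, using a plain quadratic dynamic-programming table. It is not the optimized algorithm found in PhpWiki's or MediaWiki's DiffEngine, and the function name `lcs_line_diff()` is hypothetical.

```php
<?php

/**
 * Computes a line diff via a longest common subsequence (LCS) table.
 *
 * Illustrative sketch only: O(n * m) time and memory, unlike the
 * optimized engines discussed in this issue.
 *
 * @return array
 *   A list of array(op, line) pairs, where op is ' ' (unchanged),
 *   '-' (only in $from), or '+' (only in $to).
 */
function lcs_line_diff(array $from, array $to) {
  // Normalize keys so both inputs are indexed 0..n-1.
  $from = array_values($from);
  $to = array_values($to);
  $n = count($from);
  $m = count($to);

  // $len[$i][$j] holds the LCS length of $from[$i..] and $to[$j..].
  $len = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
  for ($i = $n - 1; $i >= 0; $i--) {
    for ($j = $m - 1; $j >= 0; $j--) {
      $len[$i][$j] = ($from[$i] === $to[$j])
        ? $len[$i + 1][$j + 1] + 1
        : max($len[$i + 1][$j], $len[$i][$j + 1]);
    }
  }

  // Walk the table from the start, emitting one operation per step.
  $ops = array();
  $i = 0;
  $j = 0;
  while ($i < $n && $j < $m) {
    if ($from[$i] === $to[$j]) {
      $ops[] = array(' ', $from[$i]);
      $i++;
      $j++;
    }
    elseif ($len[$i + 1][$j] >= $len[$i][$j + 1]) {
      $ops[] = array('-', $from[$i++]);
    }
    else {
      $ops[] = array('+', $to[$j++]);
    }
  }
  // Whatever remains on either side is a pure deletion or addition.
  while ($i < $n) {
    $ops[] = array('-', $from[$i++]);
  }
  while ($j < $m) {
    $ops[] = array('+', $to[$j++]);
  }
  return $ops;
}

foreach (lcs_line_diff(array('a', 'b', 'c'), array('a', 'c', 'd')) as $op) {
  print $op[0] . ' ' . $op[1] . "\n";
}
// Prints:
//   a
// - b
//   c
// + d
```

A sketch like this could also serve as a reference implementation for tests: any merged DiffEngine should produce an edit script of the same length for small inputs, even if it picks a different but equally short one.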
PhpWiki/MediaWiki's DiffEngine looks superior and much more mature.
The code needs to be refactored into a proper, PSR-0 compatible PHP component though.
@alexpott already started with that in, but this should be approached a bit differently:
- First, we want to compare Diff module's DiffEngine with MediaWiki's DiffEngine and essentially merge them.
- Second, it would be a good idea to restore the original structure/organization of the PhpWiki diff classes.
- Only afterwards, we want to perform the PSR-0/component conversion.
If done well, this refactored PHP component would be a very interesting and useful contribution to the wider PHP framework developer community, so by providing it as a proper and decoupled library in
Drupal\Component\Diff, Drupal could actively help others.
(not verified) The refactoring could happen later on — the classloader should be able to load the current Diff* classes from
Diff* as a namespace prefix (essentially identical to how Twig classes are loaded). Alternatively, as a temporary measure, we could simply put the .php files into their intended final filesystem location and manually include them where needed, until the refactoring has happened.
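To make the prefix idea concrete, here is a rough sketch of a PSR-0 style prefix autoloader for un-namespaced classes whose names start with "Diff". Everything in it is an assumption for illustration: the function name `diff_prefix_autoloader()`, the temporary base directory, and the `DiffExample` class are hypothetical, and core's actual classloader would be configured differently.

```php
<?php

/**
 * Builds an autoload callback for un-namespaced classes with a name prefix.
 *
 * Hypothetical helper: follows the PSR-0 rule that underscores in the
 * class name map to directory separators below $base_dir.
 */
function diff_prefix_autoloader($prefix, $base_dir) {
  return function ($class) use ($prefix, $base_dir) {
    // Ignore namespaced classes and classes that lack the prefix.
    if (strpos($class, '\\') !== FALSE || strpos($class, $prefix) !== 0) {
      return;
    }
    $file = $base_dir . DIRECTORY_SEPARATOR
      . str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
    if (is_file($file)) {
      require $file;
    }
  };
}

// Demo with a throwaway directory; the real location is not decided here.
$base_dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('diff_autoload_demo_');
mkdir($base_dir);
file_put_contents($base_dir . DIRECTORY_SEPARATOR . 'DiffExample.php', "<?php\nclass DiffExample {}\n");

spl_autoload_register(diff_prefix_autoloader('Diff', $base_dir));

// class_exists() triggers the autoloader for classes not yet defined.
var_dump(class_exists('DiffExample'));
```

With this approach the existing Diff* files would not need to change before the PSR-0/component conversion; only the registration would be replaced later.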
In any case, the merging and refactoring appears to be a larger job, but there is no real reason why other core efforts (i.e., Config Import UI + Node/Entity Revisions UI) should be held up by it.
If you use revisions on your site, Diff module is very quickly becoming an *absolutely* necessary module. Revisions are actually pretty much entirely useless without this functionality, unless you're really good at opening two windows, and scanning through the text line by line manually trying to find the missing punctuation or whatever. :P
Diff module has one major "flaw," however... it uses a third-party diff library. This means:
a) it's not code we wrote
b) it doesn't make use of the Drupal API
c) its output is not themable
d) it uses classes. hiss. ;)
This is an issue to centralize efforts around getting this library either better integrated with Drupal or else re-written so that this functionality can live in core.