Problem

  • Various core features need a diff functionality.

Goal

  • Add a Diff library to Drupal core.

Requirements

Originally crafted by @heyrocker in #1821548-6: Add a "diff" of some kind to the CMI UI

  1. GPL2+ compatible license. This is a hard requirement. In the past we have approached authors about dual or re-licensing their libraries and had success.
  2. Fully OO and PSR-0 compatible. This is not a hard requirement, but it will be tough to sell any library to the community that doesn't meet it. Another possibility would be to refactor the library and push the changes upstream if it's not too much work.

Candidates

Details

  • Diff module's DiffEngine originates from PhpWiki, and MediaWiki (Wikipedia) apparently uses the exact same code.
  • Diff module's DiffEngine.php was refactored in major ways over the past years — although major parts of the code still look identical to the original library source.
  • MediaWiki's DairikiDiff.php is actively maintained and shows very recent changes.
  • PhpWiki actually still seems to have recent releases — however, I checked out their SVN repository and none of the contained library code has been changed since 2003.
  • A very interesting aspect is how the original PhpWiki diff library was split up and structured — essentially 3 files consisting of:

    - difflib.php (low-level diff engine classes),
    - diff.php (high-level entry point for all two-way diffs), and
    - diff3.php (high-level entry point for 3-way diffs).

    This clear organization unfortunately got lost in all forks.

  • Regarding phpspec/php-diff:

    • The project is very young.
    • There haven't been many contributors, and it is not very clear which algorithms it uses (it only references Python difflib).
    • It may be PSR-0 compatible, but the code is not aware of PSR-0 classloaders and performs its own loading.
  • PhpWiki's Diff code looks superior — even though it is old (2003), the origin of the actual diff algorithms are clearly documented and have not been re-invented from scratch:
    /**
     * Class used internally by Diff to actually compute the diffs.
     *
     * The algorithm used here is mostly lifted from the perl module
     * Algorithm::Diff (version 1.06) by Ned Konz, which is available at:
     *   http://www.perl.com/CPAN/authors/id/N/NE/NEDKONZ/Algorithm-Diff-1.06.zip
     *
     * More ideas are taken from:
     *   http://www.ics.uci.edu/~eppstein/161/960229.html
     *
     * Some ideas are (and a bit of code) are from from analyze.c, from GNU
     * diffutils-2.7, which can be found at:
     *   ftp://gnudist.gnu.org/pub/gnu/diffutils/diffutils-2.7.tar.gz
     *
     * Finally, some ideas (subdivision by NCHUNKS > 2, and some optimizations)
     * are my own.
     *
     * @author Geoffrey T. Dairiki
     * @access private
     */
    

    This is not the only documentation in the code that references external resources. Those same comments still exist in MediaWiki's fork of DiffEngine.

  • MediaWiki's fork has been actively maintained since 2003, since it powers the diff engine on one of the world's largest web sites, Wikipedia. (which means it is mature and battle-tested in the field)
  • Neither of both libraries has any tests. (Not even MediaWiki wrote any.)

Proposed solution

  1. PhpWiki/MediaWiki's DiffEngine looks superior and much more mature.

  2. The code needs to be refactored into a proper, PSR-0 compatible PHP component though.

    @alexpott already started with that in #1821548: Add a "diff" of some kind to the CMI UI, but this should be approached a bit differently:

    1. First, we want to compare Diff module's DiffEngine with MediaWiki's DiffEngine and essentially merge them.
    2. Second, it would be a good idea to re-instantiate the original structure/organization of the PhpWiki diff classes.
    3. Only afterwards, we want to perform the PSR-0/component conversion.
  3. If done well, this refactored PHP component would be a very interesting and useful contribution to the wider PHP framework developer community, so by providing it has a proper and decoupled library in Drupal\Component\Diff, Drupal could actively help others.

  4. (not verified) The refactoring could happen later on — the classloader should be able to load the current Diff* classes from Drupal\Component\Diff\ by registering Diff* as a namespace prefix (essentially, identical to how Twig classes are loaded). Alternatively, as a temporary measure, we could simply stuff the .php files into the designated later filesystem location, but manually include the files where needed for now, until the refactoring has happened.

    In any case, the merging and refactoring appears to be a larger job, but there's not really a reason for why other core efforts should be held off by that (i.e., Config Import UI + Node/Entity Revisions UI).


Original summary

If you use revisions on your site, Diff module is very quickly becoming an *absolutely* necessary module. Revisions are actually pretty much entirely useless without this functionality, unless you're really good at opening two windows, and scanning through the text line by line manually trying to find the missing punctuation or whatever. :P

Diff module has one major "flaw," however... it uses a third-party diff library. This means:
a) it's not code we wrote
b) it doesn't make use of the Drupal API
c) its output is not themable
d) it uses classes. hiss. ;)

This is an issue to centralize efforts around getting this library either better integrated with Drupal or else re-written so that this functionality can live in core.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

oriol_e9g’s picture

Version: 6.x-dev » 7.x-dev
moshe weitzman’s picture

I have committed a nearly core worthy version of diff to diff module's HEAD branch. I need a little feedback before making a final patch.

This version keeps a separate diff.module with its diff engine and implements hook_diff() on behalf of node.module, upload.module, and taxonomy.module.

  1. Should we move those node.inc, upload.inc, taxonomy.inc into core modules? I say yes. They are very small.
  2. Should we let diff.module be switched off so one can get the current the UI? The alternative is to simply integrate the diff powered revisions tab into node module and be done with the less capable UI we have today. In this case, we would move the diff engine to a node.diff.inc that only gets loaded when viewing a diff.
  3. Should the Preview changes button be default enabled for all content types? What about upgraded sites?

In general, I'd support full integration of diff so the Revisions tab is always useful. I'd like some confirmation that this makes sense to others before proceeding. If folks are feeling cautious, the diff that is in HEAD is completely committable. It takes over the Revisions tab using hook_menu_alter() which is quite reasonable.

As for webchick's list:
a) neither is jquery, phpass, etc. not a concern anymore. the diff engine code has not change in years.
b) now it does.
c) now it is.
d) we no longer fear classes. See PDO and simpletest.

webchick’s picture

So my original idea for this was to #120967: Separate revisions from node.module so that it lived in node_revisions.module and Diff could be incorporated into that. I still think that's a good idea, since many sites don't use revisions and it clutters up the UI to have them.

But. I think it's more important to make revisions actually useful, so I don't want to hold up this effort for that one. Unless we could do both at once, in which case I would pretty much pee my pants with excitement. ;)

1. If you mean take the code from those and integrate them into their respective modules, yes.
2. No separate module (unless you take up the "node_revisions.module" torch). This should be built into the revisions system. Separate inc though, definitely.
3. I actually hate the preview changes button. So no, definitely not enabled for all content types. Possibly for any content types with revisions enabled by default. Let's ping some usability folks about that.

moshe weitzman’s picture

Assigned: Unassigned » moshe weitzman
Status: Active » Needs review
FileSize
30.4 KB
13.93 KB
13.08 KB

OK, Here is the patch and 2 new files which belongs in the node.module directory. I know of 1 small bug in taxonomy_diff() and I have not yet run the simpletests. Meanwhile, feel free to start reviewing, folks.

moshe weitzman’s picture

FileSize
59.52 KB

I fixed the taxonomy_diff bug and cleaned up a touch. The patch now includes the 2 new files.

drewish’s picture

subscribing.

catch’s picture

I'd like to see this get in, and be part of node module so it owns the revisions tab.

I also dislike the "preview changes" button.

BioALIEN’s picture

I like some of webchick's comments in #3 but yes maybe for another issue.

What will happen to "Preview" and "Preview changes" buttons?

maartenvg’s picture

subscribing for future review. I'm all for making revisions better.

Dave Reid’s picture

I'm in favor of the proposal in #3 as well. I'd rather see revision split off and joined with diff. I have no need for revisions or diff on most of my sites.

Pancho’s picture

@webchick, Dave Reid: Same here. I propose splitting it off first and merging diff in second. See my comment there.

moshe weitzman’s picture

Sorry, it is silly to make this issue dependant on vaporware. Just because a given order is "more logical" does not mean it shakes down that way.

We are in "Patch needs Review". Please.

Pancho’s picture

Oh sorry Moshe, I certainly don't wanna hold back this patch! Go forward, please!
In the meanwhile I'm gonna see if I can review your patch in #5.

Pancho’s picture

+1
I gave the patch in #5 some testing, and found out that diffing title, body, tags and attachments works like a charm!
Some more testing with CCK fields and possibly some edge cases would be great.

One glitch that comes to my mind: the view revision pages have no proper menu entry, so parent title, tabs and breadcrumb get lost and navigating back is only possible via the browser button. But this has also been the case before, so it's a separate issue.

It would also be awesome to include menu, path and other data in the revisioning system, but this is also beyond the scope of this issue.

So overall this patch seems to work exactly as advertised! Some more reviews would be good, so we can move forward.

moonray’s picture

Subscribing.

sym’s picture

FileSize
13.04 KB

Applied fine with a few offsets, so I remade it.

1. I like the idea of diff in core.

2. On node add/edit pages I find having 'preview' and 'preview changes' confusing. Maybe something like 'preview published version' and 'preview changes' would be better? The text is getting far too long though.

3. The IU on the revision page is confusing too:

* As far as I can tell 2 radio buttons aren't needed on the first and last revision. When you only have 2 revisions, is isn't clear what they do.

* The 'Show diff' button is at the top of the form. Is this a good idea? I know the IU has to be complex, but we don't want to emulate the horrid form mediawiki uses -- I still get confused by that.

I know some of these points apply to the diff module even if it doesn't get in to core.

BioALIEN’s picture

Very good suggestions from sym. With regards to the button names, it got me the very first time I used the module too.

How about we remove "Pre" to give us: "Preview" and "View Changes"?

catch’s picture

I like 'Preview' and 'View Changes', you're not really 'Previewing changes' anyway. sym's point #3 is also valid

SeanBannister’s picture

I also like 'Preview' and 'View Changes'. My users get confussed with the 'Preview changes" button.

moonray’s picture

I agree. 'Preview' and 'Changes' look like the best way to add some clarity.

moshe weitzman’s picture

OK, I get the point about text on the button. Will change that ... I would love a more substantial review here.

maartenvg’s picture

Status: Needs review » Needs work

The patch in #16 doesn't contain the new files, but #5 still applies (with some offset). Attached is a reroll without offset, but with the new files.

This is exactly what I expected, great improvement of the revision system! Most things work as they should, and behavior of the functions also agrees with my expectations. I haven't tested the workings and code of the diff engine too much, but it seems to work and look fine. About the rest of the module changes I have the following remarks:
- an improvement would be if the diff also mentions changes in the user. If this is possible of course (do revisions have authors?)
- There is s a bug when you've added or changed the taxonomy terms when creating a new revision. If you then want to watch the diff at the revisions overview of that node, you'll get a Fatal error: Call to undefined function taxonomy_get_term() in /var/www/home/d7/modules/taxonomy/taxonomy.module on line 1391
- All simpletests run without fails. :-) That means we need more tests! Testing during and after editing the diffs of taxonomy, title, body and uploads, and testing the output of the diff engine. Do we want to create them in this issue or first put this part in and bother with tests later?

moshe weitzman’s picture

Status: Needs work » Needs review
FileSize
59.49 KB

Changed the button to read 'View changes' and fixed the taxonomy_get_term() error. This patch again includes the 2 new files.

It would indeed be nice to show changes in url alias and menu settings but we don't version those items yet. We do version the author and I'd like to see that be exposed in these diffs. I'm hoping that can go in with a subsequent patch.

The revisions listing page does indeed have a few radio button combinations which don't make sense. See #114308: add jQuery for hiding radios that shouldn't show diffs. I think that kind of UI optimization can go into a subsequent patch. This one is already substantial.

Thanks for the reviews, folks. Lets keep pushing this one toward core.

moshe weitzman’s picture

FileSize
59.32 KB

Synched with HEAD. Let us please review this soon. I suggest you not spend too many cycles reviewing node.diff.inc. Thats the diff engine itself and has not changed in many years (except for code formatting).

Status: Needs review » Needs work

The last submitted patch failed testing.

lilou’s picture

Status: Needs work » Needs review
Pancho’s picture

FileSize
60.08 KB

Rerolled removing some fuzz.

Status: Needs review » Needs work

The last submitted patch failed testing.

moshe weitzman’s picture

thanks. note that your patch has eclipse file in it

Pancho’s picture

Status: Needs work » Needs review
FileSize
58.95 KB

Well, now here's another reroll. Let's see if it passes the testbed this time?

kelvinc’s picture

When u view an old revision, there a only 3 options shown on message field: edit, revert and delete.
My suggestion is to add one more option "view all revisions" to go back to previous page or make the "tabs" visible ^^.

thx

David Strauss’s picture

@kelvinc Please file a new issue for functional changes to the diff module.

Status: Needs review » Needs work

The last submitted patch failed testing.

Pancho’s picture

@kelvinc:
I'm not convinced of the approach you suggest - this would linearize or denormalize a hierarchical relation 1∈n.
Getting back from a child page to a parent page is what breadcrumbs are for.

Anonymous’s picture

Subscribing

Dries’s picture

Personally, I'm not convinced that diff.module belongs into core ...

webchick’s picture

The reason I'd be supportive of this in core is that node revision functionality is, quite literally, absolutely useless without this functionality. Unless of course you're *really* good at staring at two pages in two browser windows and spotting the differences and have a notebook and quill pen handy where you can jot down notes as you go through. ;)

I'd be fine with simplifying it more if that's what your hesitation is, but I really do feel that most people would consider this "core" functionality in a CMS that offers content revisioning.

FiReaNGeL’s picture

As previously discussed, maybe content revisioning doesn't belong in core either

catch’s picture

I think we need support for revisioned content in core - the underlying schema and on/off variables for content types if nothing else. It looks like it's going to be built into Field API as well, and the 'body' field removed from special casing, so that end of it is here to stay most likely if expanded and unified a bit.

But as webchick says the UI in core is pretty useless - I do have a site which uses revisions without diff, but it's 1. only as a safeguard against vandalism 2. very clunky and I keep meaning to install diff ;) I still think it'd be good to put some effort into #120967: Separate revisions from node.module - probably into a revisions_ui module since node_revisions might get refactored by fields anyway. Then we have three choices - a very limited and clunky revisions_ui module in core and diff in contrib, diff in core, or no UI for revisions in core and everything is handled by diff in contrib.

SeanBannister’s picture

Content revision is such an important part of content creation because when a user makes a change to a node in Drupal that change is "forever". I don't believe users should be required to wade threw SQL dumps to retrieve content changes. This becomes even more important on sites where multiple users are editing the same node. This seems like such a core feature to me because Drupal is all about content creation, and as a Content Managment System users need a way to revise edits on that content, but without Diff content revision is a usability nightmare.

Dries’s picture

Maybe the solution is to leave revisions in core, but to move the revisions UI out of core, and to create a revisions_ui project that comes with diff.module. I'm not necessarily advocating this solution, but it might be worthy of exploration.

David Strauss’s picture

Revisions absolutely need to be in core. I continue to advocate making revisions required and removing the current ability to directly overwrite existing revisions with new revisions.

The work on fields in core has made me skeptical of how effectively we can embed most APIs in core without corresponding UIs. I agree that Diff itself does not necessarily have to be in core; it's currently a nice example of how flexible Drupal allows decorators to extend core capabilities.

webchick’s picture

Yes, I don't think anyone's arguing that Diff can't live in contrib. It has for several releases now.

What myself and others are arguing is that as a user, I would find it absolutely bizarre to use a CMS that touted this as one of its features, where I had to go download an *add-on* to actually see differences in revisions. Especially when "finding the add-ons for Drupal that you need to accomplish X" is the toughest task that both new and experienced users alike face.

Let's look at how other projects that way more people use than Drupal offer revisioned content:

Wikipedia:
Wikipedia

Google Docs:
Google Docs

WordPress:
WordPress

Now, let's look at Drupal:

Drupal

We keep talking about how we want the usability in D7 to be better than any other release before it. Jakob Nielsen's Law of the Web User Experience is, "users spend most of their time on other sites, so that's where they form their expectations for how the Web works." So core matching user expectations is a critical piece of Drupal being more usable, IMO.

Dries, is there any chance you could elaborate on why you think this wouldn't be a good thing for core revisions to offer out of the box?

webchick’s picture

Issue tags: +Usability
sun’s picture

I absolutely agree that Diff should be in core. If we don't add Diff, then we can remove Poll. As webchick pointed out in #43, almost every other competing CMS provides revisions and revision differences out of the box.

Hence, subscribing, and marking for later review.

SeanBannister’s picture

Awesome post webchick, that can be Drupal's new usability slogan "Where's my button!"

Your spot on, users expect this ability to compare revisions and the ability to compare content in a content management system.

jstoller’s picture

This talk of removing revisions from core scares me. For me the main reason to keep revisions is another missing Drupal feature: node moderation. While Diff may exist comfortably in contrib, the Revision Moderation module is far to limited by its contrib status. As I see it, that is the feature that makes revisions necessary and needs to be brought into core (#218755: Support revisions in different states), which will be difficult to do if revisions themselves are not in core.

That being said, if you're going to include revisions it does seem odd not to provide a way to compare them. So, show me the button!

babbage’s picture

+1 to diff in core. webchick's screenshots and attached argument are exactly what my expectation of revisions in core was and is.

vtsan’s picture

+1 vote for core.

moshe weitzman’s picture

Assigned: moshe weitzman » Unassigned
wretched sinner - saved by grace’s picture

Would making revisions a standalone, optional core module, which integrates diff, and serves as an API for contrib to extend the abilities of revisions (allowing #218755: Support revisions in different states) be the best solution? Then if someone doesn't need revisions, they can be turned off and removed completely, making the UI simpler for those who have no need for, and currently don't use revisions, and then having a more fully featured and capable starting point for revisions to build from.

And then we could "addz muh button!!11!"

---Edit---
Duh, I've been beaten to this suggestion by webchick in #3! Ah well, that was a few comments ago - worth the reminder I guess.

cburschka’s picture

Status: Needs work » Needs review
FileSize
59.24 KB

I downloaded this big patch expecting much difficulty getting it to apply - but since this patch merely adds code, the three failures were only caused by trivial changes in the surrounding context. :) Here's a fresh patch.

(Naturally, this upload only fixes the obvious problem of applying the changes to the code. I'll leave it to testbot to decide whether the changes still work.)

steve02476’s picture

Revision Moderation and Diff are both requirements for a "serious" CMS I think. Would programmers in a large project use a source code control system that didn't have that kind of functionality? There must be a way for an 'amateur' on a project to edit existing material, and a way for the 'professional' on the project to check the edits and then optionally approve them before they become public.

Status: Needs review » Needs work

The last submitted patch failed testing.

cburschka’s picture

Ceterum censeo, we need more detailed failure output on the testing site. We have such lovely automated testing, and all it accomplishes in this case is that I know which test I have to run locally to see what broke.

I wasn't able to replicate the testbot's results (my test site runs SQLite by the way).

I haven't got the time to examine the error right now, but here is a full dump of the failed test case (Node revisions, naturally).

[removed overly verbose output; attach such stuff as .txt file instead -- d.o moderator]

It's interesting to note that the deletion and reverting asserts failed, but their actions seem to have taken effect.

Also some stuff about the log messages.

cburschka’s picture

Status: Needs work » Needs review
FileSize
59.21 KB

So, looks like node/1/revisions runs into an infinite recursion.

This is because the node_revisions_overview form is set to be rendered by theme_node_revisions_overview, which passes the form array to drupal_render, which dispatches it to the designated renderer theme_node_revisions_overview, which... and so on.

I don't know why theme_node_revisions_overview tries to render the entire form anyway - it already renders a perfectly functional table from the form's elements. I have removed the recursive call in this patch. The revision overview is now displayed with no problems.

cburschka’s picture

All tests pass on my site. I still get the notice, but since testbot didn't get it I rather suspect it's caused by one of my contrib addons.

Dries’s picture

Revisions need to be in core, but diff doesn't have to be (although it could). I'm not 100% convinced yet, it should.

Bojhan’s picture

What is usefull about revisions without diff, for a user?

jstoller’s picture

@Bojhan - I've said it before, but I'll say it again: In my humble opinion, the most important reason for revisions to be in core is the (potential) ability to support content moderation and advanced workflows. (#218755: Support revisions in different states)

That being said, I support the inclusion of Diff in core 100%.

catch’s picture

@Bojhan: it can be useful for 'backups' of content if you have a few different editors but don't need to actually revert very often - that's more a site policy thing than actually useful though.

It'd make sense to tie revisions and diff to fieldable objects (or entities if that patch lands), and have it work for more than just nodes, but I don't think whether diff in core is a good idea depends on doing something like that.

Dries’s picture

Exactly, it is like Ctrl-Z (Undo). My undo feature doesn't sport a diff-feature either.

steve02476’s picture

Maybe not a great analogy, because the idea of Undo or ctrl-z exists in the world of one person working on one project. On a CMS, various people (with various levels of competence and authority) may be be editing the same page. Same as a SCCS?

But, the more I think about it, I agree that DIFF doesn't need to be in core, because as long as revisions / revision moderation is in core, it should be easy to implement diff as a contributed module. Plus, diff is exactly the sort of thing where people may disagree about the appearance etc, so there might be a good reason to have more than one contributed DIFF module?

stella’s picture

I agree diff could be done as a contributed module, but I don't think that's a good idea. Coming purely from a usability stand point, I think it's something that a lot of people expect to be present when you say there's a revisions feature in core, and are left scratching their heads not realising they need a contrib module. So +1 for adding it to core.

Though if we could make it work for fields other than the body, then I think it'd be a really fantastic feature.

Status: Needs review » Needs work

The last submitted patch failed testing.

sun.core’s picture

Version: 7.x-dev » 8.x-dev
joachim’s picture

Revisions are mostly useless without diff. Even with diff they're pretty bad.

Today I found that a drupal.org documentation page I wrote had had some content removed from it. I wanted to get it back.

On wikipedia, that's a two second job once you've found the right revision: you edit the past revision, you select the bit you want back, you paste it into the current page edit form.

On d.org, that was impossible.

lpalgarvio’s picture

makes sense. its a simple and small api, 24kb

and features uses it ;)

mitchell’s picture

Component: node system » entity system
Status: Needs work » Active

> its a simple and small api, 24kb
7.x-3.x isn't as small, but it's significantly improved and still in alpha, so definitely would benefit from more attention and involvement.

> #61: It'd make sense to tie revisions and diff to fieldable objects (or entities if that patch lands), and have it work for more than just nodes
#1365750: Generalize API and Integrate with core field types included this and several other updates. Also, I reopened #1371916: Restructure around the Entity / Field system to expose diffs as entities to help with Rules and Views integrations.

Setting back to active, since #56 is from 2009.

mitchell’s picture

Title: Integrate diff into core » Integrate Diff into Core
Priority: Normal » Major

Raising priority for the needs in:
* #1697256: Create a UI for importing new configuration
* #1821548: Add a "diff" of some kind to the CMI UI
* #1608912: (Re-)adding keys to configuration breaks the intended goal of being able to diff files
* #1776796: Provide a better UX for creating, editing & managing draft revisions.

Since #69, Diff has moved along very well in preparation of the drupal.org D7 upgrade. Still, this issue could use some serious attention.

mitchell’s picture

Issue tags: +revision

tagging

mitchell’s picture

I added a separate issue for #1826156: Move to Text_Diff library, thinking that #1821548: Add a "diff" of some kind to the CMI UI would share it too.

It's a really nice library, so please take a moment to review the code.

mitchell’s picture

For client side diffs, there is this library https://github.com/cemerick/jsdifflib

It might serve well for the intersection of #1826688: REST module: PATCH/update and #1149866: Add Backbone.js and Underscore.js to core.

gdd’s picture

At #1821548-15: Add a "diff" of some kind to the CMI UI we are talking about integrating the diff module's engine and library into core for the purposes of using it with CMI. There's a patch posted there and I'd encourage people interested in this issue to go over there and comment / review the patch.

gdd’s picture

Issue tags: +Configuration system

Moving conversation about actual library integration from #1821548: Add a "diff" of some kind to the CMI UI to here.

So over at #1821548-6: Add a "diff" of some kind to the CMI UI I outlined the diff libraries I found and the pros and cons of each. A big problem with the Horde_Text_Diff library discussed above is that it is not actually completely self-contained. It depends on the libraries Horde_Exception and Horde_Util (which themselves have their own dependencies.) I'm not a fan of including all this code just to get a Diff library. Note that it appears my original evaluation of this library as not PSR-0 compliant appears to be untrue, I had just misunderstood an alternate naming convention for the classes.

I had then been leading towards the phpspec/php-diff library however alexpott argued we should integrate the existing diff module engine. He took the existing Diff module classes, converted them to PSR-0 standards, and rolled it into a core patch for use with CMI. I started to agree with him. While we're taking on some extra code to maintain, once we get it where it needs to be then it won't need much work to keep up. It mostly needs a lot of docs and coding standards help and I also made the following comments at that time

- It seems like the formatter classes could use some architecture help. (creating and implementing a proper interface, potentially making them all descend from a common abstract base class.)
- I'm not sure how much sense it makes to include classes like MappedDiff and WordLevelDiff in core if we aren't using them. While this code may not change much, we should also be mindful not to take on maintenance of code we aren't using. They could easily be supplied by contrib.

I'd like to take that patch and roll it into a new one for this issue (just the library plus an API function for creating diffs) but I'd like to hear thoughts about the library issues too. Swiftly though, I consider this feature freeze limited and we have 12 days left.

Thanks

sun’s picture

One surprisingly simple and extremely reasonably lesson to be learned from @Dries very recently translates into the following for this issue:

I don't think we should be in the business of maintaining a Diff library.

I do not want to put any words in @Dries' mouth, but at least for me, the very same principle applies here. Developing and maintaining a proper diff library takes a significant amount of time and requires plenty of field expertise.

As mentioned in #1821548-20: Add a "diff" of some kind to the CMI UI, Diff module's DiffEngine class was apparently refactored in major ways and thus heavily maintained over the years (instead of contributing to whichever upstream library it originally derived from).

Due to that, I only see two options here:

  1. Convert that Diff module code into a PSR-0 library in Drupal\Component\Diff, and ensure it is properly decoupled from Drupal, so it can be leveraged by other projects.
  2. Use and adopt whichever library is PSR-0 and fulfills basic vendor library requirements of Drupal core. Contribute any missing features to upstream, and update the library accordingly, so as to achieve a better diff performance in D8.

1) clearly puts us into library maintainer business. Personally, I think this is a task that Drupal core should not take on, regardless of how good our fork became over the years. However, only recently, we added a fork of a fork of a fork of a Transliteration library to core, so my opinion doesn't mean too much. ;)

gdd’s picture

I definitely agree with the two options being our only appropriate choices, and based on the research I've done that leaves us with

1) Use the DiffEngine from Diff module.
2) Use the phpspec library.

Ultimately I admit I'm not too fussy about either decision. I agree in principle that we should not be in the library business, but I am still in favor of using the Diff module engine it is ready now, and I don't think it will add much maintainership burden despite its recent refactoring. If one of the Diff module maintainers agreed to be in MAINTAINERS.txt would that help relieve some concerns? Would be nice to have some others weighing in here as well.

sun’s picture

So... I dug a bit deeper. :)

Diff module's DiffEngine originates from PhpWiki, and MediaWiki (Wikipedia) apparently uses the exact same code.

As mentioned before, Diff module's DiffEngine.php was refactored in major ways over the past years — although major parts of the code still look identical to the original library source.

MediaWiki's DairikiDiff.php is actively maintained and shows very recent changes.

PhpWiki actually still seems to have recent releases — however, I checked out their SVN repository and none of the contained library code has been changed since 2003.

However, a very interesting aspect is how the diff library was originally split up and structured — essentially 3 files consisting of:

- difflib.php (low-level diff engine classes),
- diff.php (high-level entry point for all two-way diffs), and
- diff3.php (high-level entry point for 3-way diffs).

This clear organization unfortunately got lost in all forks.


Now, on to phpspec/php-diff:

  • The project is very young.
  • There haven't been many contributors, and it is not very clear which algorithms it uses (it only references Python difflib).
  • It may be PSR-0 compatible, but the code is not aware of PSR-0 classloaders and performs its own loading.

My personal evaluation results:

PhpWiki's Diff code looks superior to me — even though it is old (2003), the origin of the actual diff algorithms are clearly documented and have not been re-invented from scratch:

/**
 * Class used internally by Diff to actually compute the diffs.
 *
 * The algorithm used here is mostly lifted from the perl module
 * Algorithm::Diff (version 1.06) by Ned Konz, which is available at:
 *   http://www.perl.com/CPAN/authors/id/N/NE/NEDKONZ/Algorithm-Diff-1.06.zip
 *
 * More ideas are taken from:
 *   http://www.ics.uci.edu/~eppstein/161/960229.html
 *
 * Some ideas are (and a bit of code) are from from analyze.c, from GNU
 * diffutils-2.7, which can be found at:
 *   ftp://gnudist.gnu.org/pub/gnu/diffutils/diffutils-2.7.tar.gz
 *
 * Finally, some ideas (subdivision by NCHUNKS > 2, and some optimizations)
 * are my own.
 *
 * @author Geoffrey T. Dairiki
 * @access private
 */

(This is not the only documentation in the code that references external resources.)

Those same comments still exist in MediaWiki's fork of DiffEngine.

Furthermore, MediaWiki's fork has been actively maintained since 2003, since it powers the diff engine on one of the world's largest web sites, Wikipedia. (Dare I say I doubt there's any way to battle-test a library any harder? ;))

To wrap up, my conclusion is:

  1. Neither of both libraries has any tests. (Not even MediaWiki wrote some.)

  2. PhpWiki/MediaWiki's DiffEngine looks superior and much more mature to me.

  3. The code needs to be refactored into a proper, PSR-0 compatible PHP component though.

    @alexpott already started with that in #1821548: Add a "diff" of some kind to the CMI UI, but I think this has to be approached a bit differently: First, we want to compare Diff module's DiffEngine with MediaWiki's DiffEngine and essentially merge them. Second, the original structure/organization of the diff classes should be re-instantiated. Only afterwards, we want to perform the PSR-0/component conversion.

  4. If done well, it appears to me that this refactored PHP component would be a very interesting and useful contribution to the wider PHP framework developer community, so by providing it has a proper and decoupled library in Drupal\Component\Diff, Drupal could actively help others.

  5. I didn't verify it yet, but I believe all of the refactoring could happen later on — the classloader should be able to load the current Diff* classes from Drupal\Component\Diff\ by registering Diff* as a namespace prefix (essentially, identical to how Twig classes are loaded).

    Alternatively, I personally wouldn't mind to just stuff the .php files into the designated later filesystem location, but manually include the files where needed for now.

    The merging and refactoring appears to be a larger job from my perspective, but I don't see why other core efforts should be held off by that (i.e., Config Import UI + Node/Entity Revisions UI).

What do you think?

sun’s picture

Issue summary: View changes

Updated issue summary.

sun’s picture

FWIW, I've merged a couple of comments and background info into the issue summary and essentially took over #78 as the proposed solution, since I think @heyrocker and I seem to be in line, merely based on different arguments. ;)

It should be noted though that the effective summary of my analysis is that there would in fact be quite some maintenance work for us, at minimum in the short-term. Long-term is hard to predict — MediaWiki shows no sign of upgrading to PHP 5.3+, so at this point it is unlikely that they will convert their diff class into a component anytime soon (but they will still continue to patch/improve their current). Sorta circling back into what I already mentioned — if we do it properly, then MediaWiki could simply pull in and incorporate the Diff component from Drupal, since it's going to be more or less exactly the same as their current code, just as a proper PSR-0 component.

gdd’s picture

Thanks for doing a deeper analysis of the solutions available, it is really helpful. I think your plan of merging the components to create a modern reusable library is a thoughtful and ideal choice. Rearchitecting and merging the code bases is some bit of work, but it shouldn't be too bad, especially if we can get the current Diff maintainers involved.

The question in the short-term then becomes what do we do for this patch? Either of the solutions you gave above would work for me, but given that we agree to a greater re-architecture down the road, wouldn't it be just as good to take alexpott's existing patch and use that for now?

gdd’s picture

Issue summary: View changes

Formatting.

sun’s picture

Any premature PSR-ification will make comparisons and merges of the forks significantly harder, so I'd really prefer to simply do this in that case:

require_once DRUPAL_ROOT . '/core/lib/Drupal/Component/Diff/DiffEngine.php';

i.e., just simply dump the existing library file into the designated later filesystem location, load it, and use it. Remove the manual loading as soon as it has been merged and refactored.

alexpott’s picture

Status: Active » Needs review
FileSize
36.26 KB

Here a patch to do just that! (#81 suggestion).

This patch just takes the DiffEngine.php file from the latest and greatest Diff module for Drupal 7 and plonks it where @sun suggests.

sun’s picture

Assigned: Unassigned » Dries
Status: Needs review » Reviewed & tested by the community

Thanks!

@Dries had reservations about this in the past, but I think @webchick addressed those in a very convincing way in #43 :P ;) — but regardless of that, assigning to @Dries.

Also, I'd personally volunteer to co-work on that refactoring into a modern PHP component, so as to help to ensure that we're 1) incorporating and merging MediaWiki's improvements and 2) make the new component truly useful for other PHP frameworks. I'd consider that post-feature-freeze material though.

gdd’s picture

We could just do that too :)

I'm on board with this as well. Thanks a lot alex and sun for helping push this forward quickly!

Dries’s picture

Status: Reviewed & tested by the community » Fixed

I fought this one before, but I think now is a good time. Happy to admit I was wrong about this in the past.

Committed to 8.x. Thanks.

Let's continue to make some progress on CMI.

lpalgarvio’s picture

yay, another issue bites the dust =P

catch’s picture

Title: Integrate Diff into Core » Change notice: Integrate Diff into Core
Category: feature » task
Priority: Major » Critical
Status: Fixed » Active

Could use a change notice.

YesCT’s picture

I think this put in just the Engine. So #1821548: Add a "diff" of some kind to the CMI UI can get restarted building on this.

Alan D.’s picture

Sorry coming in late here. Regarding core Diff class code stability as brought up in #1821548: Add a "diff" of some kind to the CMI UI.

If you exclude coding standards and strlen() to drupal_strlen(), there have only been about 6 loc added in the core phpwiki classes since this file was added in 2007.

The only issue with it that I know of is with long strings without newlines that are greater than 10,000 characters. These don't compare that well. This is only really an issue with WYSIWYG output that strips newlines on save. See #1828346: Diff seems to work incorrectly on large pages if your interested. As this is a performance based limit, I'm considering this a GIGO issue better resolved independently of the core classes.

sun’s picture

Title: Change notice: Integrate Diff into Core » Integrate Diff into Core
Assigned: Dries » Unassigned
Category: task » feature
Priority: Critical » Normal
Status: Active » Fixed

Status: Fixed » Closed (fixed)
Issue tags: -Usability, -revision, -Configuration system

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated issue summary.