We've got Archive_Tar in core now!

But we also want to support other Archive operations. The Plugin Manager and Update Status want this as we start supporting .zip files.

To make this simple to achieve, here is the anticipated call pattern:

$file = "myfile.tar.gz" // (or .zip, .bz2, rar, etc).
$archiver = Archiver::factory($file); 
$files = $archiver->list();
$archiver->extract($destination);
$files[] = '/tmp/newfile';
$archiver->archive($files);

Obviously, this can evolve and become richer, but for now, these are the basic operations all archive formats should support. This doesn't mean Archive_tar won't be able to be accessed normally, it just provides a more logical way to instantiate, consume and extend Archive Formats.

So:


abstract class Archiver {
  abstract public function extract($destination);

  static function factory($file) {
    $archivers = system_get_archivers(); // Returns a list of possible Archiver classes
    foreach ($archivers as $archiver) {
     if ($archiver->supports($file)) {
       return new $archiver($file);
     }
    }
  }
}

And an archiver would look like this:

class ArchiverTar extends Archiver() {
  function extract($destination)...
  function list()...
}
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dww’s picture

Yes, we really need this or we lock ourselves into only supporting .tar.gz for the next 2+ years. :(

Crell’s picture

Status: Active » Needs review
FileSize
12.7 KB

OK, this is by no means done, and probably don't quite work yet, but here's the planned approach.

1) Define ArchiverInterface, which is a near-exact mirror of Archive_Tar. It even has the same docblocks, even though they don't conform to Drupal standard. :-) Will fix that later.

2) Define another info hook to declare Archiver implementations. Standard stuff here, mostly.

3) Create a factory function for routing a file path to the correct class.

4) System module declares 3 (yes, 3!) archivers: Tar, TarGz, and TarBz2. Why? Because that made supporting the different options to Archive_Tar easier.

Oh, why's that? Because we're leveraging The Power Of Interfaces (TPOI) so that each of the three archivers EXTENDS Archive_Tar, but then implements the required interface. Because we're just defining a base interface, NOT a base abstract class, we are able to do that and wrap virtually any foreign class into our system cleanly just by ensuring that the right methods are implemented. Those happen to happen in this case easily because the interface was defined to match Archive_tar in the first place. :-)

Anyone who says they don't understand why interfaces are useful, send them to this patch and don't let them leave until they understand it.

As I said, this is still incomplete so will probably not even pass, but I wanted to get the current status published for review. Tag dww!

JacobSingh’s picture

@Crell: Very nice start!

I agree about TPOI, and I think it is potentially useful here, but not the way it is implemented. I agree we could do this instead of an abstract class because I don't see how a Template pattern would benefit us in this case (although we will likely find out later) :)

Gripes

1. The Interface you've defined is too exhaustive. Most archive providers will not support all those methods, don't force them to.

2. The Interface mimicking Archive_Tar is not necessarily a good thing. We should define a logical interface for us, not map it to Archive_Tar for the sake of convenience. That interface is full of wtfs. I still suggest the interface I defined above. (extract, list, create) - we can always extend this later.

3. There is absolutely no need to define 3 different classes for handling gz vs bz2 vs tar. These are all handled by Archive_Tar natively, with no flags passed in. There is just one we need to define for tar: ArchiverTar and it will wrap Archive_Tar. I think we should build 3:

ArchiveZip
ArchiveTar
ArchiveRar* Maybe

I can't think of any other archive formats that are needed here.

+++ modules/system/system.archiver.inc	2009-10-15 03:38:26 +0000
@@ -0,0 +1,39 @@
+/**
+ * Archiver for .tar files.
+ */
+class ArchiverTar extends Archive_Tar implements ArchiverInterface {
+  public function __construct($filename) {
+    parent::__construct($filename);
+  }
+}
+
+/**
+ * Archiver for .tar.gz files.
+ */
+class ArchiverTarGz extends Archive_Tar implements ArchiverInterface {
+  public function __construct($filename) {
+    parent::__construct($filename, 'gz');
+  }
+}
+

Why do this, it's just bloat IMO, Archive_Tar will take any of those without the flag, we already do it.

If you look at the Archive_Tar class, you can see this:

if (($p_compress === null) || ($p_compress == '')) {
            if (@file_exists($p_tarname)) {
                if ($fp = @fopen($p_tarname, "rb")) {
                    // look for gzip magic cookie
                    $data = fread($fp, 2);
                    fclose($fp);
<code>

Also, we're wrapping it, so if we want to provide a flag for whatever reason (can't think of any) we can.

<code>
+++ includes/common.inc	2009-10-15 03:24:12 +0000
@@ -5797,3 +5797,47 @@ function xmlrpc($url) {
+    if (in_array($extension, $implementation['extensions'])) {
+      return new $implementation['class']($file);
+    }

This assumes that extensions are the only way to tell what class should handle it. Should call a static detection method in all of the implementers of ArchiverInterface until one says it can handle it. More flexibility. If you want to make a base class with a convenience method for extensions, go for it. Archive_tar already has one btw: _is_Archive().

Summary
Interfaces are useful in many applications, and this is not a misuse. I just think building separate classes for all tar compression formats to show off interfaces with no other reason is overkill.

My todo list here is:

a) simplify. One implementer of the interface for now: ArchiverTar().

b) flexible. All implementers need to implement ::supports($file_path); @return bool.

c) suitable. Whack the Archive_Tar interface, make it better for our needs, follow our coding standards (can't believe I just said that) and easier to plug into and put it in ArchiverTar(), and be done with it.

d) ready. Build ArchiverZip. I think applying this pattern without building out a basic zip implementation is a mistake. Even if it might have low-lying bugs in it, at least is there and proves the usefulness. I recommend gutting http://pear.php.net/package/Archive_Zip.

Okay, hope that helps.

-J

JacobSingh’s picture

In our factory, I think we should make sure the stream is local no Archiving tools we use are going to operate on remote streams AFAIK. Archive_Tar (or rather zlib) doesn't.

drupal_real_path() translates a stream to a local path (if it is local).
if it is not, copy() should be able to copy it to a local temporary location.

Of course this means copying it back if it is modified, and additional complexity. At a first go, I think throwin an exception saying "You can't work with remote archives is good enough", but the drupal_real_path will need to be in the factory to check that.

-J

Crell’s picture

FileSize
10.23 KB

After much discussion with Jacob in IRC, here's round 2. We now compose Archive_Tar rather than extending it, and the interface is much simpler. We're keeping the simple class-external mapping, though, to avoid lots of autoloading and instantiation and we can't get much out of it anyway since we may be dealing with just bare file name strings, not files that exist yet.

Crell’s picture

FileSize
10.23 KB

Oops, silly typo.

JacobSingh’s picture

+++ includes/common.inc	2009-10-15 08:10:09 +0000
@@ -5797,3 +5797,53 @@ function xmlrpc($url) {
+     if (strrpos($file, $extension) === strlen($file) - strlen($ext)) {
+       return new $implementation['class']($file);
+     }

This probably doesn't matter, but it doesn't check for the occurrence of a . in the filename. So blacktar.gz will match.

Also, "myzip" would match of course.

There are also a lot of whitespace changes. I don't know if you want to deal with them here or not. Either way, without testing it RTBC for code, docs and structure.

Crell’s picture

FileSize
7.47 KB

Easy enough to fix the extension matching. Also removed the whitespace blocks, because I know that I have other patches pending that fix them, too. :-) (Seriously, people, strip whitespace in your IDE!)

Bot, what do you think?

JacobSingh’s picture

Status: Needs review » Reviewed & tested by the community

Okay, this looks good to me, and will be a nice addition to Plugin manager so that it can handle Zip files, etc

#605408: Provide an implementation of Archiver_Interface to handle .zip files

I think this is also very unit testable, at least the factory logic. Testing the performance of the actual archive operations is kinda silly in the case of Archive_Tar, but could be done for others, so:

#605412: ArchiverInterface needs unit tests.

Dries’s picture

This looks nice and easy. One comment:

+++ includes/archiver.inc	2009-10-15 07:08:32 +0000
@@ -0,0 +1,67 @@
+  public function listContents();

Why listContents() and not just list()? We use extract() and not extractContents()? Just wondering.

JacobSingh’s picture

->list is a system construct :)

We tried that first, but it's a parse error.

Dries’s picture

Status: Reviewed & tested by the community » Needs work

Ok, makes sense. The patch needs a reroll though.

JacobSingh’s picture

Status: Needs work » Needs review
FileSize
7.64 KB

Okay, here it is.

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Bot still approves. Woot!

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Thanks. Committed.

We should consider a combined CHANGELOG.txt entry for all the archive stuff?

sun’s picture

Status: Fixed » Needs work

whoa... that is what I'd call quick! ;)

Looks good except for a couple coding-style issues...

+++ includes/archiver.inc
@@ -0,0 +1,67 @@
+<?php
+
+++ modules/system/system.archiver.inc
@@ -0,0 +1,67 @@
+<?php
+

Missing CVS $Id$

+++ includes/common.inc
@@ -5937,3 +5937,53 @@ function xmlrpc($url) {
+function archiver_get_info() {
+  $archiver_info = &drupal_static(__FUNCTION__, array());
+
+  if (empty($archiver_info)) {
+    $cache = cache_get('archiver_info');
+    if ($cache === FALSE) {
+      // Rebuild the cache and save it.
+      $archiver_info = module_invoke_all('archiver_info');
+      drupal_alter('archiver_info', $archiver_info);
+      uasort($archiver_info, 'drupal_sort_weight');
+      cache_set('archiver_info', $archiver_info);
+    }
+    else {
+      $archiver_info = $cache->data;
+    }
+  }
+
+  return $archiver_info;
+}

In D8, we should think about a common info registry helper function. I think I've seen THIS function with one more or less comment and a single string change a couple of times in core now. ;)

+++ includes/common.inc
@@ -5937,3 +5937,53 @@ function xmlrpc($url) {
+  foreach ($archiver_info as $implementation) {
+   foreach ($implementation['extensions'] as $extension) {
+     // Because extensions may be multi-part, such as .tar.gz,
+     // we cannot use simpler approaches like substr() or pathinfo().
+     // This method isn't quite as clean but gets the job done.
+     // Also note that the file may not yet exist, so we cannot rely
+     // on fileinfo() or other disk-level utilities.
+     if (strrpos($file, '.' . $extension) === strlen($file) - strlen('.' . $extension)) {
+       return new $implementation['class']($file);
+     }
+   }
+  }

Wrong indentation here.

+++ modules/system/system.api.php
@@ -2403,6 +2403,34 @@ function hook_action_info_alter(&$actions) {
 /**
+ * Declare archivers to the system.
+ *
...
+ * When mapping a
+ *

uhm? a what? :)

+++ modules/system/system.api.php
@@ -2403,6 +2403,34 @@ function hook_action_info_alter(&$actions) {
+ * Each entry should be keyed on a unique value, and specify three
+ * additional keys:
+ *   - class: The name of the PHP class for this archiver.
+ *   - extensions: An array of file extensions that this archiver supports.
+ *   - weight: This optional key specifies the weight of this archiver.
+ *     When mapping file extensions to archivers, the first archiver by
+ *     weight found that supports the requested extension will be used.

Please no extra indentation for Doxygen lists. Please read http://drupal.org/node/1354 on proper formatting.

+++ modules/system/system.api.php
@@ -2403,6 +2403,34 @@ function hook_action_info_alter(&$actions) {
+function system_archiver_info() {

Wrong function name, this should be hook_archiver_info().

This review is powered by Dreditor.

axyjo’s picture

Status: Needs work » Needs review
FileSize
3.95 KB

Fixes all of sun's comments in #17. I removed the "When mapping a" comment since I think the code authors do go on and complete the comment in the list ("When mapping file extensions to archivers, the first archiver by...").

EDIT: Looks like my editor caught some trailing spaces as well.

JacobSingh’s picture

Status: Needs review » Reviewed & tested by the community

White space killing spree!

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks!

JacobSingh’s picture

FileSize
1.58 KB

Another follow up.

The error handling is bad in archiver_get_archiver since we can't operate on remote archives because zlib, etc won't support them. This change makes that clear, also improves the docs a little bit to indicate a return of FALSE on failure.

JacobSingh’s picture

Status: Fixed » Needs review
JacobSingh’s picture

This is actually blocking implementation of
#605918: Port update manager to use the new Archiver class, not directly Archive_Tar (the original motivation)

because passing temproary://update-cache/whatever.tar.gz causes it to bork now.

Crell’s picture

Looks good to me, but shouldn't we have a properly typed exception rather than just the generic one?

dww and I started on some naming conventions for exceptions here: http://drupal.org/node/608166

We aren't following them entirely in core yet, but we should start doing so. :-)

JacobSingh’s picture

I thought about that as I wrote it, but copped to "Well, we don't have any good classifications, so implementing one here is not worth it."

I don't want to hold up this patch for this issue, although it is important to get right in D8. I think in this case, it is not really an ArchiverException, because it doesn't really have much to do with Archiving, it's something like a "FileAccessException", but that just sounds too meta and weird. I guess Archiver would be okay... perhaps a BadFileStreamException, for cases were the wrong type of stream is passed in? But then it isn't implemented anywhere in the stream handling operations, so this would be weird to make a precedent here.

Either way, let's just take a call on it. I'd just leave it as is, but if you want to throw a type on it, just re-roll and let's RTBC this so we can progress on implementing it.

Dries’s picture

Status: Needs review » Fixed

Committed to CVS HEAD. We can follow-up about the permission later.

JacobSingh’s picture

eh? Permission?

dww’s picture

I think he meant to say "Exception"...

Status: Fixed » Closed (fixed)
Issue tags: -D7 API clean-up, -Update manager

Automatically closed -- issue fixed for 2 weeks with no activity.