Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
We've got Archive_Tar in core now!
But we also want to support other Archive operations. The Plugin Manager and Update Status want this as we start supporting .zip files.
To make this simple to achieve, here is the anticipated call pattern:
$file = "myfile.tar.gz" // (or .zip, .bz2, rar, etc).
$archiver = Archiver::factory($file);
$files = $archiver->list();
$archiver->extract($destination);
$files[] = '/tmp/newfile';
$archiver->archive($files);
Obviously, this can evolve and become richer, but for now, these are the basic operations all archive formats should support. This doesn't mean Archive_tar won't be able to be accessed normally, it just provides a more logical way to instantiate, consume and extend Archive Formats.
So:
abstract class Archiver {
abstract public function extract($destination);
static function factory($file) {
$archivers = system_get_archivers(); // Returns a list of possible Archiver classes
foreach ($archivers as $archiver) {
if ($archiver->supports($file)) {
return new $archiver($file);
}
}
}
}
And an archiver would look like this:
class ArchiverTar extends Archiver() {
function extract($destination)...
function list()...
}
Comment | File | Size | Author |
---|---|---|---|
#21 | archiver-604618-21.patch | 1.58 KB | JacobSingh |
#18 | archive-604618-18-axyjo.patch | 3.95 KB | axyjo |
#14 | archiver-604618-14.patch | 7.64 KB | JacobSingh |
#8 | archiver.patch | 7.47 KB | Crell |
#6 | archiver.patch | 10.23 KB | Crell |
Comments
Comment #1
dwwYes, we really need this or we lock ourselves into only supporting .tar.gz for the next 2+ years. :(
Comment #2
Crell CreditAttribution: Crell commentedOK, this is by no means done, and probably don't quite work yet, but here's the planned approach.
1) Define ArchiverInterface, which is a near-exact mirror of Archive_Tar. It even has the same docblocks, even though they don't conform to Drupal standard. :-) Will fix that later.
2) Define another info hook to declare Archiver implementations. Standard stuff here, mostly.
3) Create a factory function for routing a file path to the correct class.
4) System module declares 3 (yes, 3!) archivers: Tar, TarGz, and TarBz2. Why? Because that made supporting the different options to Archive_Tar easier.
Oh, why's that? Because we're leveraging The Power Of Interfaces (TPOI) so that each of the three archivers EXTENDS Archive_Tar, but then implements the required interface. Because we're just defining a base interface, NOT a base abstract class, we are able to do that and wrap virtually any foreign class into our system cleanly just by ensuring that the right methods are implemented. Those happen to happen in this case easily because the interface was defined to match Archive_tar in the first place. :-)
Anyone who says they don't understand why interfaces are useful, send them to this patch and don't let them leave until they understand it.
As I said, this is still incomplete so will probably not even pass, but I wanted to get the current status published for review. Tag dww!
Comment #3
JacobSingh CreditAttribution: JacobSingh commented@Crell: Very nice start!
I agree about TPOI, and I think it is potentially useful here, but not the way it is implemented. I agree we could do this instead of an abstract class because I don't see how a Template pattern would benefit us in this case (although we will likely find out later) :)
Gripes
1. The Interface you've defined is too exhaustive. Most archive providers will not support all those methods, don't force them to.
2. The Interface mimicking Archive_Tar is not necessarily a good thing. We should define a logical interface for us, not map it to Archive_Tar for the sake of convenience. That interface is full of wtfs. I still suggest the interface I defined above. (extract, list, create) - we can always extend this later.
3. There is absolutely no need to define 3 different classes for handling gz vs bz2 vs tar. These are all handled by Archive_Tar natively, with no flags passed in. There is just one we need to define for tar: ArchiverTar and it will wrap Archive_Tar. I think we should build 3:
ArchiveZip
ArchiveTar
ArchiveRar* Maybe
I can't think of any other archive formats that are needed here.
Why do this, it's just bloat IMO, Archive_Tar will take any of those without the flag, we already do it.
If you look at the Archive_Tar class, you can see this:
This assumes that extensions are the only way to tell what class should handle it. Should call a static detection method in all of the implementers of ArchiverInterface until one says it can handle it. More flexibility. If you want to make a base class with a convenience method for extensions, go for it. Archive_tar already has one btw: _is_Archive().
Summary
Interfaces are useful in many applications, and this is not a misuse. I just think building separate classes for all tar compression formats to show off interfaces with no other reason is overkill.
My todo list here is:
a) simplify. One implementer of the interface for now: ArchiverTar().
b) flexible. All implementers need to implement ::supports($file_path); @return bool.
c) suitable. Whack the Archive_Tar interface, make it better for our needs, follow our coding standards (can't believe I just said that) and easier to plug into and put it in ArchiverTar(), and be done with it.
d) ready. Build ArchiverZip. I think applying this pattern without building out a basic zip implementation is a mistake. Even if it might have low-lying bugs in it, at least is there and proves the usefulness. I recommend gutting http://pear.php.net/package/Archive_Zip.
Okay, hope that helps.
-J
Comment #4
JacobSingh CreditAttribution: JacobSingh commentedIn our factory, I think we should make sure the stream is local no Archiving tools we use are going to operate on remote streams AFAIK. Archive_Tar (or rather zlib) doesn't.
drupal_real_path() translates a stream to a local path (if it is local).
if it is not, copy() should be able to copy it to a local temporary location.
Of course this means copying it back if it is modified, and additional complexity. At a first go, I think throwin an exception saying "You can't work with remote archives is good enough", but the drupal_real_path will need to be in the factory to check that.
-J
Comment #5
Crell CreditAttribution: Crell commentedAfter much discussion with Jacob in IRC, here's round 2. We now compose Archive_Tar rather than extending it, and the interface is much simpler. We're keeping the simple class-external mapping, though, to avoid lots of autoloading and instantiation and we can't get much out of it anyway since we may be dealing with just bare file name strings, not files that exist yet.
Comment #6
Crell CreditAttribution: Crell commentedOops, silly typo.
Comment #7
JacobSingh CreditAttribution: JacobSingh commentedThis probably doesn't matter, but it doesn't check for the occurrence of a . in the filename. So blacktar.gz will match.
Also, "myzip" would match of course.
There are also a lot of whitespace changes. I don't know if you want to deal with them here or not. Either way, without testing it RTBC for code, docs and structure.
Comment #8
Crell CreditAttribution: Crell commentedEasy enough to fix the extension matching. Also removed the whitespace blocks, because I know that I have other patches pending that fix them, too. :-) (Seriously, people, strip whitespace in your IDE!)
Bot, what do you think?
Comment #10
JacobSingh CreditAttribution: JacobSingh commentedOkay, this looks good to me, and will be a nice addition to Plugin manager so that it can handle Zip files, etc
#605408: Provide an implementation of Archiver_Interface to handle .zip files
I think this is also very unit testable, at least the factory logic. Testing the performance of the actual archive operations is kinda silly in the case of Archive_Tar, but could be done for others, so:
#605412: ArchiverInterface needs unit tests.
Comment #11
Dries CreditAttribution: Dries commentedThis looks nice and easy. One comment:
Why listContents() and not just list()? We use extract() and not extractContents()? Just wondering.
Comment #12
JacobSingh CreditAttribution: JacobSingh commented->list is a system construct :)
We tried that first, but it's a parse error.
Comment #13
Dries CreditAttribution: Dries commentedOk, makes sense. The patch needs a reroll though.
Comment #14
JacobSingh CreditAttribution: JacobSingh commentedOkay, here it is.
Comment #15
Crell CreditAttribution: Crell commentedBot still approves. Woot!
Comment #16
Dries CreditAttribution: Dries commentedThanks. Committed.
We should consider a combined CHANGELOG.txt entry for all the archive stuff?
Comment #17
sunwhoa... that is what I'd call quick! ;)
Looks good except for a couple coding-style issues...
Missing CVS $Id$
In D8, we should think about a common info registry helper function. I think I've seen THIS function with one more or less comment and a single string change a couple of times in core now. ;)
Wrong indentation here.
uhm? a what? :)
Please no extra indentation for Doxygen lists. Please read http://drupal.org/node/1354 on proper formatting.
Wrong function name, this should be hook_archiver_info().
This review is powered by Dreditor.
Comment #18
axyjo CreditAttribution: axyjo commentedFixes all of sun's comments in #17. I removed the "When mapping a" comment since I think the code authors do go on and complete the comment in the list ("When mapping file extensions to archivers, the first archiver by...").
EDIT: Looks like my editor caught some trailing spaces as well.
Comment #19
JacobSingh CreditAttribution: JacobSingh commentedWhite space killing spree!
Comment #20
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks!
Comment #21
JacobSingh CreditAttribution: JacobSingh commentedAnother follow up.
The error handling is bad in archiver_get_archiver since we can't operate on remote archives because zlib, etc won't support them. This change makes that clear, also improves the docs a little bit to indicate a return of FALSE on failure.
Comment #22
JacobSingh CreditAttribution: JacobSingh commentedComment #23
JacobSingh CreditAttribution: JacobSingh commentedThis is actually blocking implementation of
#605918: Port update manager to use the new Archiver class, not directly Archive_Tar (the original motivation)
because passing temproary://update-cache/whatever.tar.gz causes it to bork now.
Comment #24
Crell CreditAttribution: Crell commentedLooks good to me, but shouldn't we have a properly typed exception rather than just the generic one?
dww and I started on some naming conventions for exceptions here: http://drupal.org/node/608166
We aren't following them entirely in core yet, but we should start doing so. :-)
Comment #25
JacobSingh CreditAttribution: JacobSingh commentedI thought about that as I wrote it, but copped to "Well, we don't have any good classifications, so implementing one here is not worth it."
I don't want to hold up this patch for this issue, although it is important to get right in D8. I think in this case, it is not really an ArchiverException, because it doesn't really have much to do with Archiving, it's something like a "FileAccessException", but that just sounds too meta and weird. I guess Archiver would be okay... perhaps a BadFileStreamException, for cases were the wrong type of stream is passed in? But then it isn't implemented anywhere in the stream handling operations, so this would be weird to make a precedent here.
Either way, let's just take a call on it. I'd just leave it as is, but if you want to throw a type on it, just re-roll and let's RTBC this so we can progress on implementing it.
Comment #26
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. We can follow-up about the permission later.
Comment #27
JacobSingh CreditAttribution: JacobSingh commentedeh? Permission?
Comment #28
dwwI think he meant to say "Exception"...