Drupal provides the format_size function to format a file size, but it is not flexible enough to meet all requirements. It cannot handle suffixes larger than megabytes, and it does not let you choose the precision or the format used to render the result.

The following code handles the three most common file size formats, up to the yottabyte. It is feature complete and could be considered for a future version of Drupal.

function format_filesize($filesize, $precision = 2, $format = "byte") {
  switch ($format) {
    case "byte":
      $suffixes = array("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB");
      $power_div = 1024;
      break;
    case "byte-si":
      $suffixes = array("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB");
      $power_div = 1000;
      break;
    case "bits":
      $suffixes = array("b", "kb", "Mb", "Gb", "Tb", "Pb", "Eb", "Zb", "Yb");
      $filesize *= 8;
      $power_div = 1000;
      break;
    default:
      die(t("No valid file size format specified."));
  }
  $power = floor(log($filesize, $power_div));
  $suffix = t($suffixes[$power]);
  if ($power > 0) {
    $filesize = number_format($filesize / pow($power_div, $power), $precision);
  }
  return t(
    "@filesize @suffix",
    array(
      '@filesize' => $filesize,
      '@suffix' => $suffix
    )
  );
}

Comments

bdragon’s picture

Status: Needs review » Active

Please attach a patch in -up format before setting the status to "code needs review". Thanks.

ntiostle’s picture

The case for SI is wrong. SI defines k = kilo = 1000 (see http://en.wikipedia.org/wiki/SI_prefix#List_of_SI_prefixes). The KiB, MiB, etc. units are defined in IEC 60027-2 (http://en.wikipedia.org/wiki/Binary_prefix#IEC_standard_prefixes), which defines 1 KiB = 1024 bytes.
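To make the distinction concrete, here is a minimal sketch of how each prefix family could be paired with the divisor its standard defines. The function name `example_unit_table` and the format keys `'si'` and `'iec'` are hypothetical and are not part of Drupal or of the code posted above.

```php
<?php
// Hypothetical helper, not part of Drupal: returns the divisor and the
// suffix list for a given prefix family.
function example_unit_table($format) {
  switch ($format) {
    case 'si':
      // SI decimal prefixes: each step is a factor of 1000.
      return array(1000, array('B', 'kB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'));

    case 'iec':
      // IEC binary prefixes: each step is a factor of 1024.
      return array(1024, array('B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB'));
  }
  // Unknown format: let the caller decide how to react.
  return NULL;
}

list($divisor, $suffixes) = example_unit_table('iec');
echo 1536 / $divisor, ' ', $suffixes[1], "\n"; // 1.5 KiB
```

Keeping the divisor and the suffixes in one place makes it harder to mix a 1000 divisor with binary suffixes, or the reverse.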

mrharolda’s picture

Version: 6.x-dev » 7.x-dev
Component: comment.module » base system
Status: Active » Needs review
Attachment (new): 1.12 KB

I've attached a patch without the bits and SI code, but as close to the existing code as possible.

It adds support for 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB' and 'YiB' using the correct naming scheme as defined in http://en.wikipedia.org/wiki/Binary_prefix#IEC_standard_prefixes

Edit: removed 'B' from the list, as it is handled by the first if statement...

dries’s picture

Status: Needs review » Needs work

I know that technically, one has to use KiB and MiB but in practice, people don't know what these stand for (unless you have a CS degree). I'd prefer to stick with MB instead of MiB.

mrharolda’s picture

Attachment (new): 1.25 KB

How about treating 1000 bytes as a kilobyte (KB) then? That would be compliant with the SI prefixes: http://en.wikipedia.org/wiki/SI_prefixes#List_of_SI_prefixes and thus not a 'lie'... ;)

mrharolda’s picture

Status: Needs work » Needs review

reset patch to 'code needs review'...

dries’s picture

I'm OK with the patch in #5 but it could use a SimpleTest ...

mrharolda’s picture

Status: Needs work » Needs review

After running the following test, I found 2 issues with my patch...

1: yottabytes don't fit in a 32-bit integer ;)
2: there is a bug in locale_get_plural() which uses $langcode = NULL as index in line 422: $locale_formula[$langcode] = $language_list[$langcode]->formula;

The test:

$test_sizes = array(
  '1 byte' =>      '1', // byte
  '1 KB' =>        '1'.'000', // kilobyte
  '1 MB' =>        '1'.'000'.'000', // megabyte
  '1 GB' =>        '1'.'000'.'000'.'000', // gigabyte
  '1 TB' =>        '1'.'000'.'000'.'000'.'000', // terabyte
  '1 PB' =>        '1'.'000'.'000'.'000'.'000'.'000', // petabyte
  '1 EB' =>        '1'.'000'.'000'.'000'.'000'.'000'.'000', // exabyte
  '1 ZB' =>        '1'.'000'.'000'.'000'.'000'.'000'.'000'.'000', // zettabyte
  '1 YB' =>        '1'.'000'.'000'.'000'.'000'.'000'.'000'.'000'.'000', // yottabyte
  '2 bytes' =>     '2', // bytes
  '3.62 MB' =>     '3'.'623'.'651', // megabytes
  '67.23 PB' =>   '67'.'234'.'178'.'751'.'368'.'124', // petabytes
  '235.35 YB' => '235'.'346'.'823'.'821'.'125'.'814'.'962'.'843'.'827', // yottabyte
);
foreach ($test_sizes as $expected => $size) {
  $this->assertTrue(
    ($result = format_size($size, NULL)) == $expected,
    "format_size(): '". $expected ."' == '". $result ."'. %s"
  );
}

Edit: removed attached patch to prevent confusion

mrharolda’s picture

I've just created a separate issue for issue 2 in comment #8: http://drupal.org/node/263259

damien tournoud’s picture

Hum. The handling of file sizes as strings is ugly. Do we really need to support sizes up to yottabytes?

mrharolda’s picture

The maximum value of a signed 32-bit integer is 2147483647, which is equal to 2.15 GB.

Support up to yottabyte is probably a bit too much, but it comes for free with this function. I'd guess support of terabyte is quite reasonable these days...

We @ madcap are working on a video platform capable of handling multiple terabytes of data, hence this function enhancement. Filesize is retrieved from a SUM() function in mysql and fetched as a string, because of the limit of 32-bit integers as mentioned above.
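As a quick check of the figure above, the following sketch converts the signed 32-bit maximum to decimal gigabytes. The variable name `$int32_max` is illustrative; the value is written out literally so the result does not depend on the platform.

```php
<?php
// Largest signed 32-bit integer, written out so the sketch behaves the
// same on 32-bit and 64-bit builds.
$int32_max = 2147483647;

// Convert to decimal gigabytes (1000^3 bytes) and round for display.
echo round($int32_max / pow(1000, 3), 2), " GB\n"; // 2.15 GB

// PHP_INT_SIZE reports the integer width of the running build in bytes.
echo PHP_INT_SIZE * 8, "-bit integers\n";
```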

damien tournoud’s picture

Why not treat everything as a float instead? We just don't need arbitrary precision here.
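A minimal sketch of that idea, assuming the size arrives as a numeric string (for example from a database query): cast it to a float and divide. The byte count is the petabyte value from the test earlier in this thread; the variable names are illustrative.

```php
<?php
// A byte count too large for a 32-bit integer, held as a numeric string.
$sum = '67234178751368124';

// Casting to float drops some low-order digits but keeps roughly 15
// significant digits, far more than a two-decimal display needs.
$bytes = (float) $sum;

// Divide down to decimal petabytes (1000^5 bytes) and round for display.
echo round($bytes / pow(1000, 5), 2), " PB\n"; // 67.23 PB
```

The digits lost in the cast sit well below the second decimal place of the displayed value, which is why arbitrary precision is not required for this purpose.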

mrharolda’s picture

You're absolutely right! ;)

I've just run the test code below on both a 32-bit and a 64-bit platform and it passed without any problems using the non-string code as posted in comment #5. Integers larger than 2147483647 are automatically converted to floats on a 32-bit platform, and the code in comment #5 does not have a problem with that...

$test_sizes = array(
  '1 byte' =>    1, // byte
  '1 KB' =>      1000, // kilobyte
  '1 MB' =>      1000000, // megabyte
  '1 GB' =>      1000000000, // gigabyte
  '1 TB' =>      1000000000000, // terabyte
  '1 PB' =>      1000000000000000, // petabyte
  '1 EB' =>      1000000000000000000, // exabyte
  '1 ZB' =>      1000000000000000000000, // zettabyte
  '1 YB' =>      1000000000000000000000000, // yottabyte
  '2 bytes' =>   2, // bytes
  '3.62 MB' =>   3623651, // megabytes
  '67.23 PB' =>  67234178751368124, // petabytes
  '235.35 YB' => 235346823821125814962843827, // yottabyte
);
foreach ($test_sizes as $expected => $size) {
  $this->assertTrue(
    ($result = format_size($size, NULL)) == $expected,
    "format_size(): '". $expected ."' == '". $result ."'. %s"
  );
}

mrharolda’s picture

If no-one has any other objections, can this be committed to D7?

Patch @ comment #5: http://drupal.org/node/151902#comment-858506

-tnx!-

Harold.

mrharolda’s picture

Status: Needs review » Reviewed & tested by the community

Dries: I'm OK with the patch in #5 but it could use a SimpleTest ...

Simpletest code in comment #13...

dries’s picture

Status: Reviewed & tested by the community » Needs work

The way we use strings to do this is, in fact, ugly. I'd like to see us brainstorm some more about something that is a bit more elegant.

dries’s picture

Also, please add the simpletest code as a patch. We commit these to core as well nowadays.

damien tournoud’s picture

@Dries: The brainstorming is already done. The patch in #5 does not use strings at all, but floats.

mrharolda’s picture

Attachment (new): 1.62 KB

Please review the attached patch, which will create a 'common.test' file in which the test for format_size() resides...

The patch for the format_size() function can be found at comment #5.

mrharolda’s picture

Off topic: I've changed my username from Harold@Madcap to MadHarold, since IRC doesn't allow '@' in a username...

mrharolda’s picture

I've updated format_size() to never show 1000 'whatevers'. I've also added the following test to prove it! ;)
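One way to get that behaviour is to round before deciding whether to move up a unit. The sketch below illustrates the idea with decimal (1000-based) units. It is not the attached patch: the function name `example_format_size` is hypothetical, and translation and plural handling are left out.

```php
<?php
// Hypothetical sketch, not the committed Drupal code: format a byte count
// with decimal (1000-based) units, rounding before choosing the unit.
function example_format_size($size) {
  if ($size < 1000) {
    return $size == 1 ? '1 byte' : $size . ' bytes';
  }
  $units = array('KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB');
  $last = count($units) - 1;
  $size = $size / 1000;
  $i = 0;
  // Step up while the rounded value would still display as 1000 or more.
  while ($i < $last && round($size, 2) >= 1000) {
    $size = $size / 1000;
    $i++;
  }
  return round($size, 2) . ' ' . $units[$i];
}

echo example_format_size(999999), "\n";  // 1 MB
echo example_format_size(3623651), "\n"; // 3.62 MB
```

Comparing the rounded value rather than the raw one is what prevents 999999 bytes from being shown as "1000 KB".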

'1 MB'      => 999999, // 1 MB (not 1000 kilobyte!)

Edit:
Removed the 2 patches in favor of the single patch attached below...

mrharolda’s picture

Attachment (new): 3.02 KB

Replaced the above patches with a single one, diffed from the root of the Drupal install...

This single patch includes the creation of the common.test file.

coupet’s picture

Should calculations be in binary or decimal (SI)?

Byte
http://en.wikipedia.org/wiki/Byte

mrharolda’s picture

@coupet: Dries said that using decimal (1000) calculations would be best. Using binary (1024) calculations would force the output to use KiB, MiB, etc., and users would be confused by that...

Using a mix of binary(1024) and KB, MB etc would be 'lying'... :)

catch’s picture

Category: feature » task
Priority: Normal » Critical
Status: Needs review » Reviewed & tested by the community

Test passes fine and I think it addresses Dries' questions, bumping this to critical as well since it introduces common.test - and a lack of common.test is blocking at least 3-4 patches elsewhere in the queue.

dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks folks. :-)

kkaefer’s picture

Please make sure that "KB", "MB", ... are translatable. While they are identical to the English version in most cases, Germans usually use "kB" instead of "KB".
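To illustrate what translatable units could look like, here is a self-contained sketch. The function `example_translate` is a hypothetical stand-in for a translation call such as Drupal's t(), and the lookup table is made up for the example.

```php
<?php
// Hypothetical stand-in for a translation function such as Drupal's t();
// the lookup table below is illustrative only.
function example_translate($string, $langcode) {
  $translations = array(
    'de' => array('@size KB' => '@size kB'),
  );
  if (isset($translations[$langcode][$string])) {
    return $translations[$langcode][$string];
  }
  // No translation available: fall back to the source string.
  return $string;
}

// Translate the whole pattern first, then substitute the number.
echo str_replace('@size', '1.5', example_translate('@size KB', 'de')), "\n"; // 1.5 kB
echo str_replace('@size', '1.5', example_translate('@size KB', 'en')), "\n"; // 1.5 KB
```

Translating the whole pattern, rather than the bare unit, also lets a translation change the order or spacing of the number and the unit.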

mrharolda’s picture

@kkaefer: good catch! I've created a separate issue (+patch) for the missing t() call in http://drupal.org/node/268477

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

mrharolda’s picture

I'll leave this one closed, but I want to add that I've created a new patch for format_size() which can be found here: http://drupal.org/node/267883#comment-1095221