Problem/Motivation
According to the SI standard, there are 1000 bytes in a kilobyte.
There is another standard called IEC that has 1024 bytes in a kibibyte, but this is only useful when measuring things that are naturally a power of two, e.g. a stick of RAM.
https://en.wikipedia.org/wiki/Kilobyte
https://en.wikipedia.org/wiki/Kibibyte
https://en.wikipedia.org/wiki/Binary_prefix
Currently Drupal renders IEC quantities with SI units which is incorrect.
Most people find it very confusing and surprising for files to use IEC (1024 bytes in a kibibyte) because everything else in the world uses 1000 (litres, kilograms). Apple is leading the way by switching to SI: https://en.wikipedia.org/wiki/Binary_prefix#Operating_systems
Steps to reproduce
N/A
Proposed resolution
Update documentation to indicate why we handle the units the way we do. Read #91 for supporting reasons.
Remaining tasks
Locate the documentation that should have
User interface changes
N/A
API changes
N/A
Data model changes
N/A
Release notes snippet
Comment | File | Size | Author |
---|---|---|---|
#93 | handle-si-iec-units.patch | 49.11 KB | jbrown |
#93 | interdiff.txt | 15.86 KB | jbrown |
#89 | handle_si_iec_units-1114538-89.patch | 47.09 KB | deekayen |
#87 | handle-si-iec-units.patch | 48.7 KB | jbrown |
#87 | interdiff.txt | 3.15 KB | jbrown |
Comments
Comment #1
jbrown CreditAttribution: jbrown commentedmake comment lines < 80
Comment #3
jbrown CreditAttribution: jbrown commented#1: si-units.patch queued for re-testing.
Comment #5
lambic CreditAttribution: lambic commentedThis was posted 2 days late, right?
Comment #6
alexanderpas CreditAttribution: alexanderpas commentedFor reference, not only Apple is using the SI standard correctly, Linux kernel is too, as you can read in units(7)
It is my opinion, Drupal should always be using the correct SI and IEC units, except when handling a interface that does not handle those units correctly
Comment #7
jbrown CreditAttribution: jbrown commentedComment #8
jbrown CreditAttribution: jbrown commentedComment #9
pingers CreditAttribution: pingers commentedThis makes no sense to me.
Primary hosting platform is linux. It uses IEC (unless you go out of your way not to).
IMO it would be more confusing to have Drupal say one thing and command line say something else.
I.e.
$ dd if=/dev/zero of=testfile.txt bs=1000 count=1
$ dd if=/dev/zero of=testfile2.txt bs=1024 count=1
$ ls -al testfile*
-rw-r--r-- 1 michael michael 1024 2012-01-27 18:11 testfile2.txt
-rw-r--r-- 1 michael michael 1000 2012-01-27 18:10 testfile.txt
$ ls -alh testfile*
-rw-r--r-- 1 michael michael 1.0K 2012-01-27 18:11 testfile2.txt
-rw-r--r-- 1 michael michael 1000 2012-01-27 18:10 testfile.txt
Comment #10
oriol_e9gFor me we should follow the international standards, rather than be like a IE6. Drupal = Early Adopters.
Extra whitespace at the end:
Gubibytes? :)
The patch does not apply.
Comment #11
jbrown CreditAttribution: jbrown commentedComment #12
oriol_e9gWe are still using the word 'Gubibytes', sorry for my ignorance but, shouldn't it be 'Gibibytes'?
Comment #13
jbrown CreditAttribution: jbrown commentedAs documented in the comments, the purpose of that test is for a misspelt unit. Perhaps this test is unnecessary.
Comment #14
oriol_e9gOh, sorry! You're right, I had not seen this!
Maybe we need more opinions before the RTBC.
Comment #15
droplet CreditAttribution: droplet commentedConfusing.
It's patch going to get drupal accepts every units, right ?
user input :
10 KiB = 10 x 1024
10 kB = 10 x 1000
10 KB = (error ?)
Comment #16
jbrown CreditAttribution: jbrown commentedThe first thing parse_byte_count() does is strtolower() the input, so
10 KB = 10 x 1000
Comment #17
jbrown CreditAttribution: jbrown commentedreroll
Comment #19
jbrown CreditAttribution: jbrown commentedComment #21
oriol_e9g#19: si-units.patch queued for re-testing.
Comment #23
oriol_e9gFixing test!
Comment #24
oriol_e9gComment #26
oriol_e9gGGrrrrr
Comment #28
jbrown CreditAttribution: jbrown commentedI'll have an updated patch within an hour.
Comment #29
jbrown CreditAttribution: jbrown commentedComment #31
jbrown CreditAttribution: jbrown commentedComment #31.0
jbrown CreditAttribution: jbrown commentedFix wikipedia link
Comment #32
jbrown CreditAttribution: jbrown commented#31: si-units.patch queued for re-testing.
Comment #33
jbrown CreditAttribution: jbrown commentedImprove comments.
Comment #34
oriol_e9gLess than minor: for consistency I think we should always use a space between the number and its units:
7.4kB --> 7.4 kB, 5MB --> 5 MB, 40gibibytes --> 40 gibibytes, ...
For me it's RTBC
Comment #35
jbrown CreditAttribution: jbrown commentedThe reason some of the examples have a space and some of them don't is to illustrate that it works either way.
I fixed the ~1kB sized struct comment.
Comment #36
oriol_e9gGo with this, 17 months reviewing without substantial changes it's sufficient time.
Comment #37
Bojhan CreditAttribution: Bojhan commentedThe argumentation for this change is a bit poor, most people find it very confusing?
I am fine with following the standard that better matches file storage. But we don't really allow changes in, with this argumentation because do we know - what standard people expect here?
Comment #38
MrHaroldA CreditAttribution: MrHaroldA commentedThis issue fixes Drupal to not lie about how many bytes there are in a kilobyte. 1 kilogram of cheese is also 1000 grams so I'd expect that normal (cheese eating) users expect. Some will expect the lie, others just don't care.
I'm putting it back as RBTC since it is in fact reviewed by the community ;)
Comment #39
jbrown CreditAttribution: jbrown commentedI know this is controversial, so I don't mind if we use IEC or SI. Currently we are using IEC quantities with SI units which is totally wrong. We need to render one or the other properly by default.
Comment #40
catchI'm going to mark this back to needs review, the patch itself has had lots, but the implications definitely need more discussion.
Comment #41
MrHaroldA CreditAttribution: MrHaroldA commented@catch: Dries already said he was up for the 1000 bytes in a kilobyte back in 2008: http://drupal.org/node/151902#comment-858570
Comment #42
catch@MrHaroldA: what people said in an issue four years ago isn't the same as discussing it now.
Comment #43
pingers CreditAttribution: pingers commentedI think getting a patch in for IEC units would be a whole lot easier than changing the calculated values. If there's any code out there relying on this, we're only making string changes, rather than API changes. My 2 cents.
Comment #44
MrHaroldA CreditAttribution: MrHaroldA commented@catch: How can we get an updated opinion from Dries on this?
Comment #45
jbrown CreditAttribution: jbrown commentedI'm gonna re-roll this to change the default output to IEC.
Comment #45.0
jbrown CreditAttribution: jbrown commentedupdate function names
format output can now be IEC
Comment #46
jbrown CreditAttribution: jbrown commented@Dries also said that he didn't like the IEC units: http://drupal.org/node/151902#comment-857964 so I'm not going to update the patch to format IEC by default quite yet.
Basically the community needs to decide one way or the other, but we shouldn't leave it the way it is.
Comment #46.0
jbrown CreditAttribution: jbrown commentedUse summary template
Comment #47
droplet CreditAttribution: droplet commentedEither IEC/SI unit by default. It's patch will bring a new problem to Windows Users.
For example,
Select a file and check the size in windows explorer. It shows 500KB. and then upload it.. Oh ouch. Error message popup.
Add a note to recommend site builders use IEC unit would make the site friendly for most users.
Comment #48
MrHaroldA CreditAttribution: MrHaroldA commentedSame goes for real KB/kB/KiB users: select a 500kB file and upload it. It then shows as 488.2kB in Drupal so the user may assume that something went wrong in the upload.
As long as there are different implementation of the "same" unit, there will be confusion so it's probably best to settle on 1 definition of a kB. Future-wise that definitely must be 1kB = 1000 Bytes (SI) as Drupal 8 will be around for quite a while.
@jbrown I also got confused with the standards! ;) Thanks for pointing out: all KB's replaced with kB's!
Comment #49
jbrown CreditAttribution: jbrown commentedYeah - I think we should do it correctly (either SI or IEC), not work around other peoples bugs.
@MrHaroldA KB is not a unit anywhere. 1 kB = 1000 Bytes (SI), 1 KiB = 1024 Bytes (IEC)
Comment #50
jbrown CreditAttribution: jbrown commented@MrHaroldA your post in #48 is still not correct: 1kB = 1000 Bytes is SI, not IEC
Edit: okay - fixed now!
Comment #51
MrHaroldA CreditAttribution: MrHaroldA commented@jbrown: already fixed!
Need ... more ... coffee ...
Comment #52
jbrown CreditAttribution: jbrown commentedComment #53
jbrown CreditAttribution: jbrown commentedComment #54
msonnabaum CreditAttribution: msonnabaum commented#53: handle-si-iec-units.patch queued for re-testing.
Comment #56
jbrown CreditAttribution: jbrown commentedComment #58
jbrown CreditAttribution: jbrown commentedComment #60
jbrown CreditAttribution: jbrown commentedComment #61
jbrown CreditAttribution: jbrown commented#60: handle-si-iec-units.patch queued for re-testing.
Comment #62
jbrown CreditAttribution: jbrown commented#60: handle-si-iec-units.patch queued for re-testing.
Comment #64
MrHaroldA CreditAttribution: MrHaroldA commentedStill rooting for this to be committed to D8 ;)
Comment #65
jbrown CreditAttribution: jbrown commentedComment #66
oriol_e9g#65: handle-si-iec-units.patch queued for re-testing.
Comment #68
oriol_e9gWe still need a decision about using SI or IEC.
Comment #69
jbrown CreditAttribution: jbrown commentedReroll.
Comment #70
jbrown CreditAttribution: jbrown commented#69: handle-si-iec-units.patch queued for re-testing.
Comment #71.0
(not verified) CreditAttribution: commentedFix unit.
Comment #71.1
jbrown CreditAttribution: jbrown commentedFix links
Comment #71.2
jbrown CreditAttribution: jbrown commentedRemove Remaining Tasks section.
Comment #72
jbrown CreditAttribution: jbrown commentedReroll.
Comment #73
jbrown CreditAttribution: jbrown commentedComment #73.0
jbrown CreditAttribution: jbrown commentedAdd default formatting comment.
Comment #73.1
jbrown CreditAttribution: jbrown commentedFix e.g.
Comment #73.2
jbrown CreditAttribution: jbrown commentedUpdate description.
Comment #74
oriol_e9gCurrently we use IEC numbers with SI units. We only need to use IEC numbers/units or SI numbers/units for consistency.
Nobody can choose between SI or IEC for default formatting? 2 years for a simple decision is a lot of time O.o
Comment #75
mikl CreditAttribution: mikl commentedI think this is a good example of code bloat. Adding hundreds of extra lines of code that has to be maintained, parsed and run on every page load on every Drupal sites, all for the sake of some a minor unit difference that makes no difference what so ever to the user.
Worse, I think the changes around the PHP memory size are actively harmful:
"32M" is the value you need to set in php.ini. If you try to input "32 MiB", you will get a weird error.
And what's the great benefit of showing '31.5344424 MiB' (or whatever) on the user status page, rather than a simple '32M'?
Isn't there something more important to work on for Drupal 8?
Comment #76
MrHaroldA CreditAttribution: MrHaroldA commentedNot being able to tell how much MiB or MB a file is, is an utter faillure for a serious framework like Drupal. Postponing this to D9 in 2015/2016 seems like a great loss for a relatively straight forward bugfix.
But I agree, there are other importent issues too ;)
Comment #77
jbrown CreditAttribution: jbrown commented@mikl the purpose of this issue is to fix parsing and formatting of byte counts. How it interacts with PHP settings can be re-arranged.
Note that the formatting of PHP memory limit on the status page is in IEC format and so would not have the digits to the right of the decimal point.
Also, the patch does not add hundreds of lines of code. It adds 64 lines of code, most of which are in tests.
Comment #78
jbrown CreditAttribution: jbrown commentedUpdated the patch so DRUPAL_MINIMUM_PHP_MEMORY_LIMIT is not changed. This means that when Drupal is recommending the PHP memory limit to the developer it is in the same format required by php.ini
Comment #79
mikl CreditAttribution: mikl commentedI don't think this is a bugfix at all. The old way (where MB means 1024^2 bytes) is still the way most people are used to seeing it. Like it or not, but Windows is still the largest desktop OS out there, and what the huge majority of our end users are using. And be honest, when did you last use the word gibibyte in a sentence?
This change adds a whole lot of code weight, accomplishes nothing useful, and it violates the principle of least astonishment.
Comment #80
MrHaroldA CreditAttribution: MrHaroldA commentedSo if a populair operating system has the same bug, this isn't a bug?
People clearly need to be educated, not be kept stupid for thinking 1 kilo = 1024 grams.
Comment #81
tim.plunkettThis adds 40 lines of code outside of tests. Forty lines.
Usefulness is in the eye of the beholder.
Comment #82
mikl CreditAttribution: mikl commented…are forty too many.
This prefix mess is a mess, sure, but it's not our mess to fix, so the principle of least astonishment dictates that we do what the end users expect, namely the old way, even if it's not techically a correct use do SI prefixes.
Comment #83
MrHaroldA CreditAttribution: MrHaroldA commentedIt's still a bug. You can't mix IEC numbers with SI units. That's not an opinion, that's a fact.
Comment #84
deekayen CreditAttribution: deekayen commentedEven if Apple isn't dominating the computer OS landscape entirely yet, they're dominating mobile, and they seem to be dominating in attracting web developers. It seems odd to me that there'd be any discussion of not following the lead of Apple so that at least the web developers, who I've interpreted as mostly using Apple products these days, can have a confirmation on the website of what they see in Finder. Otherwise, if you can't reliably use it as a confirmation of what file you've likely uploaded, that defeats half the point of showing the size in the first place.
Comment #85
deekayen CreditAttribution: deekayen commentedThe @params in the parse_byte_count() and format_byte_count() docblocks should have type hints I think, e.g. @param int|string $count, @param bool $force_iec.
Comment #86
boombatower CreditAttribution: boombatower commentedRelated discussion from a while back #267883: parse_size() and format_size() do not use same kilo standard but using proper units seems like the thing to do.
Comment #87
jbrown CreditAttribution: jbrown commentedAdded the type hinting.
Comment #88
alexanderpas CreditAttribution: alexanderpas commentedIs this an intended change, considering the documentation?
Comment #89
deekayen CreditAttribution: deekayen commented#87 didn't apply for me anymore. File module stuff got moved around two days ago.
I re-rolled it and removed the (int) declaration noted in #88.
Comment #90
jbrown CreditAttribution: jbrown commented#2050133: Refactor drupal_check_memory_limit() to use Symfony's MemoryDataCollector class really needs to go in before we can make progress on this issue.
Comment #91
mikl CreditAttribution: mikl commentedOkay, let's just stop for a second, and review how these units are currently used across the industry:
Windows
Consistently binary (the old way)
OS X/iOS
Inconsistent. Decimal used for hard disk space in the Finder (in accordance with SI), binary used for RAM (in accordance with JEDEC). Binary used almost anywhere else (ls -lah, for example).
Linux/FreeBSD
Inconsistent, depending on which graphical shell (if any) you're using. CLI tools still using binary.
I don't own any Android devices, so I haven't been able to test those.
Bottom line
I don't there's a really good solution here.
Additionally, we would be doing the wrong thing for memory sizes (ie. RAM limits), since RAM measurement follows the JEDEC standard, which specifies that RAM should be measured binarily.
mebibytes and gibibytes), we will introduce a unit for disk size measurements that our users have likely never seen before, thereby confusing them.
In summary, I think we should leave this alone. It's a nasty mess, and there's no solution that's correct everywhere.
Comment #92
deekayen CreditAttribution: deekayen commentedI think it probably makes sense to have the memory limits set in php.ini match the output on the Drupal side. A max upload of 10M in php.ini ought show as 10M in Drupal, lest create confusion; same for memory_limit. I don't think displaying "32M" for memory_limit from php.ini, showing as "32M" on the status page, means we can't show SI prefixes for metadata on file uploads, though. That would seem to be a similar conclusion to what Apple already implemented.
Comment #93
jbrown CreditAttribution: jbrown commentedThe patch already renders the PHP memory limit as IEC, so there is no problem there.
If we want filesizes to be rendered as SI (yes) then max upload sizes must also be in SI. This is fine if the max is defined in field admin, but what if it comes from a PHP setting? The solution is to interpret these as SI, so when they are rendered as SI the value is the same as in the setting. Technically this is incorrect, but it is the same behaviour as we have at the moment and it means we can have correct behaviour everywhere else without any weird numbers popping out.
I've introduced Symfony\Component\HttpKernel\DataCollector\MemoryDataCollector from #2050133: Refactor drupal_check_memory_limit() to use Symfony's MemoryDataCollector class to the patch as it makes everything a lot simpler.
I removed the changes relating to checking memory_get_usage() exists as it is in #2049167: Don't use function_exists() on functions that are built into PHP 5.3.
Interdiff from #89.
Comment #94
mikl CreditAttribution: mikl commented#93: Don't try to skirt the issue here. There is no correct here. We can choose between following SI, IEC or JEDEC rules. I outlined the problems in #91.
No, that's actually incorrect. It should use JEDEC, ie. classic binary units.
Comment #95
jbrown CreditAttribution: jbrown commentedThis issue is about moving away from JEDEC. We need to use the modern SI standard so that there isn't 1024 bytes in a kilobyte (which is something 99% of people think is totally weird).
The reason we need to use IEC for the PHP memory limit is because this value always has a base of 1024 and we need to distinguish it from SI.
JEDEC is really for physical memory, not a quantity of memory defined in software.
The JEDEC Wikipedia page says:
Comment #96
mikl CreditAttribution: mikl commentedWe don't actually need to change anything. The current state of things works fine, and I've never heard of any problems.
Additonally, SI isn't particularly modern. ~200 years old. Classic appeal to authority fallacy.
So instead of just showing the actual value from the php.ini configuration, you want to use obscure prefixes that are not used on any major computing platforms.
I understand that you'd like Drupal to be more technically correct, but the goal of the Drupal project is not to educate end users on scientific measurement standards, and as such I don't think the added costs of code complexity and user confusion are justified.
Comment #97
LarsKramer CreditAttribution: LarsKramer commentedNot complying with measurement standards is not necessarily a bug, especially not in the case when such standards are not already in widespread use.
Comment #98
alexanderpas CreditAttribution: alexanderpas commentedThe SI standards are already in widespread use.
Bit Rate: http://en.wikipedia.org/wiki/Bit_rate
Linux Kernel: http://man7.org/linux/man-pages/man7/units.7.html
Apple: http://support.apple.com/kb/TS2419
Ubuntu: https://wiki.ubuntu.com/UnitsPolicy
Comment #99
mikl CreditAttribution: mikl commentedFirst of all, SI is not a standard, it is a unit system. Secondly, I don't think anyone's arguing that SI itself is not widely used.
However, as I outlined in #91, the majority of computing platforms still use binary units to some extent, and Windows still use binary units exclusively.
And the IEC kibibyte/mebibytes are not used on any system a normal Drupal end-user would have been exposed to.
Comment #99.0
mikl CreditAttribution: mikl commentedAdd bold.
Comment #108
quietone CreditAttribution: quietone as a volunteer commentedComment #109
quietone CreditAttribution: quietone as a volunteer commentedComment #113
quietone CreditAttribution: quietone at PreviousNext commentedThe related issue, #451404: Incorrect abbreviations in format_size, was discussed in #bugsmash, 3 months ago and we (darvanen, lendude, borisson_, catch and myself) all forgot to update the issue. I am doing so now.
The comment by miki in #91 provides good reason to leave this alone. The people in the meeting agreed as well. That makes this issue a won't fix. However, instead maybe this issue should be used to update documentation to explain what we are doing.