Problem/Motivation

Drupal will upload files into one directory by default. Users can affect this by changing configuration of file/image fields, but real-world experience shows that they don't. Uploading thousands or millions of files into one directory is a very bad idea, which can have performance implications too. There are reports of sites going down just as a result of this issue.

Proposed resolution

Let's change default location configuration variable in file/image fields. This variable is currently empty, which results in all files being uploaded in a root folder (usually public://). Since use of tokens is supported we can provide configuration that will dynamically adapt for different fields.

[field-storage:name]/[date:custom:Y]-[date:custom:m] was preferred but couldn't make it in time for RC1 so we've decided to simplify the patch to minimize the disruption and file #2592251: Uploading files should have more robust default tokens instead.

We're now going with [date:custom:Y]-[date:custom:m] only for all fields. On top of that we've decided to upload user avatars under pictures/[date:custom:Y]-[date:custom:m] and to remove file_directory: field/image completely from field.field.user.user.user_picture.yml since it would be duplicating what the default is anyway.

Remaining tasks

  • Evaluate (and document in the issue summary) impact of showing internals like field machine names to end users whose browsers load an asset and expose the filename (and thus the path in the url) Attempt at answering this in #141
  • Validate the approach is confirm the rc target requirements
  • Review change record

User interface changes

Site builders now see the default tokens in the File directory field for files/images. This can still be completely customized to suite their needs.

API changes

None.

Original issue summary by @meba

It is a well known fact that if you upload thousands of files into a single directory you will have performance issues on certain systems such as Cloud systems using GlusterFS, NFS or other shared filesystems or simply slow hard drives. The reason being that list operations on that single directory take too much overhead over network.

The solution typically is to ensure there is no more than a couple of thousands of files in a single directory in your files/. This can be easily achieved through dividing directories by date:

30/12/2013/file.jpg
29/12/2013/file2.jpg

https://drupal.org/project/filefield_paths provides this functionality but many people do not install it when they create their website and migration is complicated.

I propose that we automatically add date based tokens to all file system paths in Drupal core and allow modules such as FileField Paths to override these tokens to provide advanced functionality.

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Task because it's not really new functionality, but it's not really a bug as nothing is broken.
Issue priority Normal because it's not a blocker for anything.
Prioritized changes The main goal of this issue is usability.
Disruption This issue should have no disruption for any existing APIs, modules, or sites.
CommentFileSizeAuthor
#231 drupal-n2128055-231.patch2.65 KBDamienMcKenna
#25 Added_data_pattern_in_destination_folder-2128055-25.patch1021 bytesestoyausente
#28 Added_data_pattern_in_destination_folder-2128055_2.patch1005 bytesestoyausente
#29 added_data_pattern_in_destination_folder-2128055_28.patch1.02 KBstijntilleman
#42 2128055_42.patch1.09 KBslashrsm
#44 2128055_44.patch4.55 KBslashrsm
#44 interdiff.txt3.75 KBslashrsm
#50 2128055_50.patch6.16 KBslashrsm
#50 interdiff.txt3.57 KBslashrsm
#52 interdiff.txt1.28 KBslashrsm
#52 2128055_52.patch6.28 KBslashrsm
#56 interdiff.txt949 bytesslashrsm
#56 2128055_55.patch6.29 KBslashrsm
#64 2128055_65.patch6.26 KBrteijeiro
#64 interdiff.txt849 bytesrteijeiro
#69 2128055_69.patch6.26 KBjonhattan
#88 2128055_88.patch5.27 KBslashrsm
#93 2128055_93.patch5.26 KBrootwork
#98 2128055-98.patch5.43 KBanavarre
#103 2128055-102.patch5.46 KBanavarre
#103 interdiff-98-102.txt1.63 KBanavarre
#107 2128055-107.patch5.74 KBanavarre
#108 2128055-108.patch5.47 KBanavarre
#108 interdiff-102-108.txt1.52 KBanavarre
#109 2128055-109.patch5.25 KBanavarre
#109 interdiff-108-109.txt2.92 KBanavarre
#121 2128055-121.patch6.51 KBnielsvm
#121 interdiff-109-121.txt3.69 KBnielsvm
#126 2128055-126.patch6.4 KBanavarre
#126 interdiff-121-126.txt2.62 KBanavarre
#132 2128055-132.patch6.4 KBanavarre
#132 interdiff-126-132.txt2.62 KBanavarre
#145 machinename.png40.93 KBnielsvm
#162 2128055_162.patch4.31 KBslashrsm
#162 interdiff_132_162.patch5.02 KBslashrsm
#168 2128055_169.patch4.31 KBslashrsm
#176 2128055-176.patch4.33 KBtimhilliard
#176 interdiff-168-176.patch1.3 KBtimhilliard
#182 2128055-181.patch4.33 KBtimhilliard
#182 interdiff-168-181-do-not-test.patch1.3 KBtimhilliard
#185 2128055_185.patch4.32 KBtimhilliard
#185 interdiff-168-185-do-not-test.patch663 bytestimhilliard
#188 2128055-188.patch4.27 KBtimhilliard
#188 interdiff-168-188.txt1.24 KBtimhilliard
#191 2128055-191.patch5.28 KBeffulgentsia
#191 interdiff.txt1.99 KBeffulgentsia
#207 2128055-207.patch3.27 KBvijaycs85
#210 2128055-210.patch3.36 KBzopa
#210 interdiff-2128055-207-210.txt2.77 KBzopa
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

attiks’s picture

I don't think it has to be that many levels, we could use 20131230 so we only create one extra sub directory.

In the past we used something like substr(md5($file_name), 0, 4);, but maybe a date is easier for all use cases.

Dries’s picture

Yesterday, a big Drupal site was down for several hours because Drupal uploaded hundreds of thousands of files into a single directory. The amount of files caused severe problems for the underlying filesystem, leading to data corruption. We should have safe defaults, so I propose providing a mechanism in core to prevent this kind of data corruption.

Dave Reid’s picture

I think just [current-date:custom:Y]/[current-date:custom:m] (e.g. 2013/11) would be good.

attiks’s picture

#3 We did a project last month where we imported 30.000 files, so Y/m isn't enough, can we settle for Ymd (without /)?

pdrake’s picture

One problem with using dates in the URL for files is that users may assume that it is a timestamp of sorts for the file, which may lead to confusion about the freshness of content. The date in the URL has no relation to the creation time of the content itself, just the date on which the file was uploaded (or perhaps the date on which it was migrated). Additionally, as attiks said, certain situations (such as content migration) may result in the creation of too many files on a given day.

Crell’s picture

I agree that we should have a dynamic default, but it should not be date-based. I would recommend something like the node type or field type (or both). That makes it very easy and predictable, but still helps to reduce the "crapload of files" problem. It doesn't help if you're going to have 300,000 nodes of the same type, all with files attached, but if you're building that scale site you should set up custom rules anyway. (Perhaps better help text in the latter case could be helpful?)

meba’s picture

You would be surprised about the amount of sites that are not "enterprise" but have this many files.

Let's discuss *how* to make this happen and choose a pattern later. Right now we agree on the need for the pattern, we have the ability to choose it later.

saltednut’s picture

I agree with #6 that sorting by date makes things less searchable. I am less concerned about whats suggested in #5 since we are going to have a confusing URL in making things granular in most cases. I would suggest a pattern of [entity]/[bundle]/[field]/[id]/my_snarky_meme.gif for a file. This allows developers to easily find files and should prevent too many files from being stored in a single directory. The edge case would be a node with a multifield with millions of attachments to that field. It doesn't seem a likely scenario but its possible someone could do it.

Dave Reid’s picture

We already have a way to make this happen - there is a setting in the file field configuration, but it's default is nothing, which means all files get dumped into the public directory. Discussing how to make this happen means choosing a pattern now.

Dave Reid’s picture

@brantwynn: Always consider that adding files to new content means we don't have context such as ID. Also, we'd need to fix that the token replacement when the file uploaded has no context related to it's parent entity at all (see file_field_widget_uri() and nothing that calls this function uses the $data parameter.

saltednut’s picture

@Dave Reid Could we get around this by storing newly saved files in a tmp directory and then moving them after the entity is saved?

Dave Reid’s picture

@brantwynn: Sure, it would require some major changes to file fields, but it might be worth a try in order to get that context. I'm sure the maintainers of filefield paths would appricate that work going into core.

slashrsm’s picture

I think we should think about this out of field context. We can create default pattern for fields, but we should not forget that files can come from other sources (there is support for WYSIWYG in D8 now, modules like Media, etc.).

Dave Reid’s picture

@slashrsm Indeed. Also images could via contrib be attached to multiple entities, so I think the same cons for a date approach also apply to an entity/ID approach.

meba’s picture

Any pattern that is not going to be based on either semi-random / date information will fail the same way as no pattern. Remember - most sites have one major content type and one major image field.

Let's summarize:

  • Field based token: Simple and understandable to end users but has potential issues with performance
  • Date based token: Might confuse users who are unsure whether this is date of being uploaded, updated or anything else. Migrate module might complicate things if many images uploaded at the same time
  • Random (think first couple of letters of md5 of filename): Performant, does not confuse users as date does but I wonder if it will still confuse users because searching a filesystem will now be harder if I SSH in. People will ask "What are those strange characters in my URL?"
attiks’s picture

There;s another use case, people using IMCE plugin for wysiwyg, they probably will have a hard time finding their images. No idea if these type of module still exists in Drupal 8, but in general if we start moving images, we will make it hard for all modules trying to access the file system directly.

Crell’s picture

A value unique to the entity isn't going to help much. Instead of 500,000 image files in one directory, you get a directory with 500,000 directories in it, each of which has one file. This is not an improvement.

I think we need to be careful of our goal here; it shouldn't be to "solve" the performance problem. It should be to mitigate it. Whatever we do, we're going to have patterns that result in epic fail in some use cases. We should try to have a default pattern that isn't going to epic fail in the most common use cases, and *suggests to users that they can setup their own rules* based on their use case. ("Teachable moment", as they say in the Ed biz.)

slashrsm’s picture

It looks that we have two options here:

- We use something file-related for sub-folder token. Upload date seems the most appropriate in this group, but it has some obvious problems. Pros: semantically nice urls, easy to understand, seen elsewhere (Wordpress?); Cons: possibility of migration problems (not if timestamp is also migrated over), may cause confusion among users and less experienced developers.
- Pseudo-random sub-folder token. I personally like the approach that hash_wrapper takes, since it ensures good dispersion by design (seen it in action on a site with tens millions of files and it works great). We could also use parts of UUID or filesize's modulo. Those two methods also ensure quite good dispersion of files and seem easier to calculate (compared to hashing) and probably a bit easier to understand (compared to hash_wrapper). Pros: good dispersion in most use-cases, no problems with migrations; Cons: harder to understand, does not provide nice/semantic URLs, harder to directly browse file system.

I completely agree with @Crell about the "teachable moment". People should be aware about implications of huge file libraries and not simply follow defaults we provide. While this might work for smaller sites it is definitely not acceptable for high-end projects.

Fidelix’s picture

How about these:

{field_machinename}/{year}-{weekoftheyear}
{field_machinename}/{year}-{dayoftheyear}
{field_machinename}/{year}-{month}-{day}/

Crell’s picture

field name/year-month seems like a reasonable "common default case" to me. (Most sites won't need per-day, I suspect.)

Fidelix’s picture

I proposed "week of the year" because it won't be recognizable by the average user looking at the folders thinking that's the image timestamp, neither it's too random/dirty like a MD5 or another type of hash.

But yeah, it we take that aside then field_name/year-month is good enough, I think.

askibinski’s picture

I would prefer not using field/entity machine names in the paths because files can be uploaded from outside a field (wysiwyg editor) or media module.

Year + Month + Day (eg. /2014/01/21/) should probably be enough for most cases.

slashrsm’s picture

How do we want to implement that? As as default configuration on image/file fields or on a lower level (more general approach)?

Crell’s picture

Just a default on newly-created file fields, IMO. KISS. This is not quite a Novice issue, but not far from it.

estoyausente’s picture

Issue tags: +D8SVQ, +#SprintWeekend2014
FileSize
1021 bytes

Something like this?

I put the next pattern: {field_machinename}/{year}/{month}/

If somewhere needs a better implementation for this topic because he has thousand of files in a month... If you want I can try other implementation. I accepts suggestion.

stijntilleman’s picture

Shouldn't it be the other way around?

If the user set an upload location you add /{year}/{month} to that location. The way I understand this issue is that if nothing was entered by the user the token will automatically filled with {field_machinename}/{year}/{month}. That way the files root will not be cluttered with files.

estoyausente’s picture

@stijntilleman mmm... I'm not sure. It's posible, yes. I'm going to change and test again.
Do you think that is a correct pattern?

estoyausente’s picture

I change the patch like #26 told me.

stijntilleman’s picture

@estoyausente I tryed applying your patch but it didn't work. I manually created the patch and this one should apply.

Crell’s picture

Hm, this is doing the work at runtime. I was thinking we just set the pattern by default in filefield, so it shows in the UI. That helps suggest to people that it's changeable.

Also, remember to set the issue to "needs review" when you post a patch so that the testbot can find it.

estoyausente’s picture

Status: Active » Needs review

Thanks @stijntilleman (I'm novice and sometimes have problem with the patchs ^^).

@Cell done. Now it have need review status.

And... I'm not sure about your suggest. In the filetype fileld form, show an input with a pattern, for configure the fieldpattern? It sound great. But I'm not sure if I can do it. I Never work with form in D8. I will try it.

The last submitted patch, 28: Added_data_pattern_in_destination_folder-2128055_2.patch, failed testing.

The last submitted patch, 25: Added_data_pattern_in_destination_folder-2128055-25.patch, failed testing.

Anonymous’s picture

Has any decision been reached about how things will be done in D8?

- Pseudo-random sub-folder token. I personally like the approach that hash_wrapper takes, since it ensures good dispersion by design (seen it in action on a site with tens millions of files and it works great). We could also use parts of UUID or filesize's modulo. Those two methods also ensure quite good dispersion of files and seem easier to calculate (compared to hashing) and probably a bit easier to understand (compared to hash_wrapper). Pros: good dispersion in most use-cases, no problems with migrations; Cons: harder to understand, does not provide nice/semantic URLs, harder to directly browse file system.

I love this one! I think the cons are quite minor, as you should be handling files through Drupal and not directly through the filesystem. This way of doing it would also help to identify and not duplicate identical files, a nice benefit.

erwangel’s picture

From user's side experience, I can say that an alphabetical/lexicographical order on directory tree is more meaningful to a user than a day or any other number split, although this second option has some sens to news sites. I am operating both news and e-commerce sites.
- For news sites based on drupal I have this system :
$files/file-type/optional-content-type/year/month/day/filename.ext
- For user generated content sections like picture galleries I have this system :
$files/file-type/user/year/month/day/filename.ext
- For e-commerce sites based on Magento I use this architecture :
$files/optional-category-or-brand/firstletter/secondletter/filename.ext
where firstletter and secondletter derive from filename, i.e. "f" and "i" => f/i/filename.ext

So architecture make sens depending to the use case. We sometimes have to locate a file in a filesystem architecture and then it should be intuitive. There is no ideal situation, but we can give the choice in initial settings of uploading file path to use a numerical or an alphabetical subdivision system along with some tokens about file-type, user-name or content-type. I think this can be easy to configure and understand even to starters and it covers a large number of usage cases.

pbattino’s picture

Pardon the question, but why this is not considered a "D7 also" issue? I know nothing about the main structural differences between D7 and D8 as far as files are concerned, but having a look at this patch seems no D8 specific.

Is is possible to have this issues backported?

batigol’s picture

Heh, yea it was one of the biggest problem for me - multidomains and multilanguage markets site - real pain in the ass.

Media was totally out, I did use:

https://www.drupal.org/project/insert
https://www.drupal.org/project/imce_filefield
https://www.drupal.org/project/filefield_paths
https://www.drupal.org/project/filefield_sources
https://www.drupal.org/project/colorbox

and File (Field) Path settings was something... like this - [current-date:custom:Y-m]/[node:nid]-[node:language]-[node:domain:id]

kerios83’s picture

Please remember that some images need to used twice or more times on the same page.

alimac’s picture

Issue tags: -#SprintWeekend2014 +SprintWeekend2014

Minor tag cleanup - please ignore.

Status: Needs review » Needs work

The last submitted patch, 29: added_data_pattern_in_destination_folder-2128055_28.patch, failed testing.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
1.09 KB

Attached patch follows the suggestion that @Crell gave many times in this thread. It simply adds a default file directory pattern on a file/image field. It uses a workaround for field name replacement as we don't have a token for that in core. Implementing that in this patch would be a) completely out of scope and b) would make the patch much, much bigger. If the workaround is not acceptable I'd rather simplify the default pattern instead.

Status: Needs review » Needs work

The last submitted patch, 42: 2128055_42.patch, failed testing.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
4.55 KB
3.75 KB

Test fixes.

Crell’s picture

Status: Needs review » Needs work
+++ b/core/modules/file/src/Plugin/Field/FieldType/FileItem.php
@@ -269,6 +269,9 @@ public function getUploadLocation($data = array()) {
+    // There is no token for field name so we have to use a little workaround.
+    $destination = str_replace('[field_name]', $this->parent->getName(), $destination);

Let's go ahead and add the token if we can. This sort of code debt is a time bomb, and the sort of "good enough for now" that has gotten us in trouble before. :-) If we're going to do it, let's do it right.

I also don't really care for the year/month being in the default path. That feels unnecessary to me.

slashrsm’s picture

I agree. That is a nasty hack :).

How do we want to approach this? Add proper token support for fields in a separate issue and postpone this until that gets in? Which field-related tokens we want?

Crell’s picture

I'd have to see what field related tokens exist now; My knee-jerk thought is to add just what we need for this issue, in this issue. We'd also need a beta eval added...

slashrsm’s picture

I believe there are none at the moment. That was the main reason for the nasty hack in #44. We only need field name I believe.

Crell’s picture

Let's add it then. Hacks--

slashrsm’s picture

Status: Needs work » Needs review
FileSize
6.16 KB
3.57 KB

Done.

Crell’s picture

Status: Needs review » Needs work
  1. +++ b/core/modules/file/src/Plugin/Field/FieldType/FileItem.php
    @@ -269,6 +269,8 @@ public function getUploadLocation($data = array()) {
    +    $data += ['field-storage' => $this->getFieldDefinition()->getFieldStorageDefinition()];
    +
    

    Do we still need this part now?

  2. +++ b/core/modules/file/src/Plugin/Field/FieldType/FileItem.php
    @@ -269,6 +269,8 @@ public function getUploadLocation($data = array()) {
    @@ -310,8 +312,16 @@ public static function generateSampleValue(FieldDefinitionInterface $field_defin
    
    @@ -310,8 +312,16 @@ public static function generateSampleValue(FieldDefinitionInterface $field_defin
         $random = new Random();
         $settings = $field_definition->getSettings();
     
    +    /**
    +     * Prepare upload destination.
    +     *
    +     * @see FileItem::getUploadLocation()
    +     */
    +    $destination = trim($settings['file_directory'], '/');
    +    $destination = \Drupal::token()->replace($destination, ['field-storage' => $field_definition->getFieldStorageDefinition()]);
    

    This may be out of scope as well, but... \Drupal:: inside a class is a code smell. Always, without exception. What's this method doing that it needs to run a token replacement? It says "generate sample value", which... shouldn't need a service for that, should it? Is that a pre-existing design flaw in the field system?

  3. +++ b/core/modules/file/src/Tests/FileFieldPathTest.php
    @@ -17,20 +17,28 @@ class FileFieldPathTest extends FileFieldTestBase {
    +      $this->container->get('date.formatter')->format(REQUEST_TIME, 'custom', 'Y') . '/' .
    +      $this->container->get('date.formatter')->format(REQUEST_TIME, 'custom', 'm') . '/' .
    

    Grab this service once and use it twice. Doesn't make a big difference in a test but it's a good habit to get into. :-)

slashrsm’s picture

Status: Needs work » Needs review
FileSize
1.28 KB
6.28 KB

Thank you. Glad to see we're moving this into the right direction. :)

  1. Do we still need this part now?

    This was added with field tokens. Token service needs field storage definition in order to process them and here we're making sure the service is always there.

  2. This may be out of scope as well, but... \Drupal:: inside a class is a code smell. Always, without exception. What's this method doing that it needs to run a token replacement? It says "generate sample value", which... shouldn't need a service for that, should it? Is that a pre-existing design flaw in the field system?

    Since this is static context we can't inject. We could just skip token replacement (as we do now), but that assumes we'll never use field definition with non-empy "file_directoy" configuration variable in here. Sounds wrong to me.

  3. Grab this service once and use it twice. Doesn't make a big difference in a test but it's a good habit to get into. :-)

    Done. Also noticed unnecessary file_load() few lines above. Fixed that too.

Crell’s picture

Category: Feature request » Task
Issue summary: View changes
Status: Needs review » Reviewed & tested by the community

I'm not going to try and critique the design flaws of the token API at this time, so I guess let's move forward. :-) Thanks!

Adding beta evaluation, too. I don't think we need a change notice for this.

willzyx’s picture

+++ b/core/modules/field/field.module
@@ -13,6 +13,7 @@
+use Drupal\Component\Utility\String;

@@ -339,3 +340,49 @@ function field_form_config_admin_import_form_alter(&$form, FormStateInterface $f
+          $replacements[$original] = $sanitize ? String::checkPlain($field_storage->getName()) : $field_storage->getName();

String::checkPlain was deprecated (will be removed before Drupal 8.0.). Can we start to use \Drupal\Component\Utility\SafeMarkup's methods?

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs issue summary update, +Needs change record

Re #54 yep we shouldn't be adding new usages.

Also the issue summary could be updated to say what the patch is actually doing. Plus it might be good to have a site builder CR to tell people that this default is changing.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
949 bytes
6.29 KB

Fixed #54.

slashrsm’s picture

Issue summary: View changes
slashrsm’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs issue summary update, -Needs change record

Issue summary updated and change record draft created. This was technically already reviewed and was RTBC in the past.

Crell’s picture

I must have missed SafeMarkup becoming public in favor of String... Latest patch still RTBC IMO.

Berdir’s picture

@Crell: #2454447: Split Utility\String class to support PHP 7 (String is a reserved word)

Patch looks fine to me, would be useful to have a "by default" or so in the issue title/commit message I think.

slashrsm’s picture

@Berdir: like this?

alexpott’s picture

Title: Files should automatically be uploaded to directories based on a token » Files should be uploaded to directories based on tokens by default

@Berdir: like this?

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/file/src/Plugin/Field/FieldType/FileItem.php
@@ -310,8 +312,16 @@ public static function generateSampleValue(FieldDefinitionInterface $field_defin
+    /**
+     * Prepare upload destination.
+     *
+     * @see FileItem::getUploadLocation()
+     */

This is an odd inline comment.

rteijeiro’s picture

Status: Needs work » Needs review
FileSize
6.26 KB
849 bytes

Fixed. Is this way correct? Could not find anything in [#1354]

toamit’s picture

I did not see this issue until now, so I'd request you folks to take a look at this proposal as well for file handling.
https://www.drupal.org/node/2485807

toamit’s picture

In current patch, which tokens will be available to user besides year/month. Will they include system wide tokens like [nid], [uid] etc?

Year token is mostly useless, as we will get handful of folders. Month gives us 12 folders, a bit of improvement, but does this really solve the original problem?
field-storage/[uid]/[nid] approach will be more scalable for authenticated uploads (if anonymous uploads are permitted that is a different can of worm)

Crell’s picture

#64 looks fine to me; not RTBCing myself because I'd prefer to have someone from token and/or entity API sign off on it. Who should we get for that?

woprrr’s picture

+++ b/core/modules/field/field.module
@@ -339,3 +340,49 @@ function field_form_config_admin_import_form_alter(&$form, FormStateInterface $f
+        'needs-data' => 'field-stoarge',

A small error i think ;)

jonhattan’s picture

FileSize
6.26 KB

Patch doesn't apply any more due to core changes. Rerolled and fixed #68

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Going out on a limb...

webchick’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs usability review

Hm. A bit torn here. :\

I've definitely hit this problem myself on sites with tons of file uploads, and it's extremely painful to deal with if it happens to you.

However, I'm concerned that the beta evaluation may be underselling the disruption here, especially to end users using modules such as Media Browser or IMCE or what have you that provide a browser to add existing uploaded files to your content. These content authors (aka, the least technically savvy users of a Drupal site) are now going to have to click 3+ times into deeply nested folders, named firstly after system internal details (particular filefield/imagefield machine name, which is not exposed to them anywhere), and secondly after "what day month did you upload it?" (as if anyone can remember that :P) in order to find files/images that they uploaded in the past.

So I wonder if it'd be better to kick this to 8.1.x instead, and postpone it in on having a file browser built into core with a search box by file name?

Not doing that postponing yet so we can discuss it a bit more, but for now tagging for usability review, since the beta eval says this is ostensibly for usability and there's no one from the UX team who's chimed in here yet.

Crell’s picture

webchick: Honestly, I don't recall the last time I used the file system directly via any module, and I've recommended against doing so for the past 8 years or so. The internal organization of the files directory is a Drupal implementation detail, and anyone relying on that is still thinking in Drupal 5 terms at best. Drupal manages files for you, and provides categorization, tagging, and whatever other organization. That modules still exist that encourage you to break through that abstraction and dig into the file system directly is both surprising and depressing.

IMO, we should view the organization within the files directory the same way we do the DB schema now in D8: An implementation detail that you should not rely on, and if you do then you're on your own.

slashrsm’s picture

I agree with @Crell. There are two additional arguments that are in favour of this patch IMO:

1. It is still possible to change file_directory pattern, which allows people that might be using IMCE or anything similar to simplify their folder structure.
2. If I have to choose between default configuration that works nicely with solutions that we're trying to avoid in the future anyway and avoiding potential serious performance problems I'd definitely pick the latter.

toamit’s picture

This patch is quite useful, not only it provides customization as earlier, but provides scalability those who need it. The usability issue can simply be handled with a short description in UI.

Its also high time that IMCE and its kin add a feature which uses restrictions imposed by content type rather than provide global browsable/upload solutions. Ofcourse those who need or like global files as spaghetti could change the path to /files, etc

webchick’s picture

A second question would be if there are existing patterns we can pull from other CMSes for this. I have a feeling none of them would store files in top-level folders called field_txn_img (for example), but rather by date or whatever instead. That would make this a lot less controversial if it became a "bring Drupal inline with existing CMS standards" than "Let's make up a file directory structure that will result in fewer large file repos, but pushes system internals to content authors."

Crell’s picture

See #3-#17 for a discussion of what the default path should be. We already went over this 2 years ago.

The current path *is* storing by date, but also by field. Field is a decent substitute for "type" or "use case" in a typical setup. (All, no, but all setups was never the goal of this issue.)

Also, this default does not "pushes system internals to content authors." As noted in #72-#74, content authors should be blissfully unaware of this entire detail. If they somehow need to click through these directories to get to a file, it's because they're using a contrib module that is, quite frankly, doing it wrong and it's that contrib module's problem. Using just core, a content editor will never see this path, period, ever. So "what other CMSes do" is completely irrelevant here.

All this patch does, in the end, is this:

+++ b/core/modules/file/src/Plugin/Field/FieldType/FileItem.php
+++ b/core/modules/file/src/Plugin/Field/FieldType/FileItem.php
@@ -50,7 +50,7 @@ public static function defaultStorageSettings() {

@@ -50,7 +50,7 @@ public static function defaultStorageSettings() {
   public static function defaultFieldSettings() {
     return array(
       'file_extensions' => 'txt',
-      'file_directory' => '',
+      'file_directory' => 'field/[field-storage:name]/[date:custom:Y]/[date:custom:m]',

Everything else is just support code for changing the default value in the UI from an empty string to default that is less likely to result in giant directories (less likely, not guaranteed), and by extension shows site builders (not content editors, they're entirely irrelevant here) that they can (and should) set up a path structure for their files (which can be whatever the heck they want). That's it.

It only took us 2 years to add a default string because tokens. :-)

webchick’s picture

If it's the default, that means we are making a statement that we think it'll work for 80% of sites, and so I don't think asking for 10 minutes of research into what a couple of other CMSes do so we can (possibly) match them is out of line.

slashrsm’s picture

Wordpress uses [Y]/[m]/filename.ext as far as I know.

rootwork’s picture

Here's what I've found:

  • WordPress uses uploads/Y/m. source
  • Joomla puts everything in images. source Using the popular Joomla Content Editor it separates out all article-related uploads but puts them all in images/stories. source
  • Squarespace uploads files to a CDN and uses a hash. Most users never refer directly to the filename. source
  • Concrete5 puts everything into an upload directory, which may or may not be on the same server. Most users never refer directly to the filename. source, detailed source
  • Adobe Experience Manager puts everything in Digital Assets/[sitename]/[content_type] but within the content types everything is in the same folder by default. source
  • Sitecore puts everything in Media Library. source
  • EZ Publish puts uploads into storage/original/[mime_type] and then stores files with a hash. Most users never refer directly to the filename. source
  • Typo3 appears to put everything into uploads, though I'm not completely sure if it creates subfolders. source
  • OpenText requires registration to see any documentation. Sorry. Here's a vague source and some more vague information on their cloud storage.
  • SharePoint 2013 stores all files in the database. The only access to uploaded files is through the SharePoint interface. source Alternatively the files can be uploaded to cloud storage; access is still (only) through the SharePoint interface. source

Hope that helps!

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Thanks, rootwork! My read of #79:

* The hosted services use a CDN or similar with file name hashes.
* Several use one-big-dumping-folder, which we're trying to avoid here.
* Among those that use some sort of segmenting, dates, content type, and file mime type(!) are all used by somebody but there's no consistent convention.
* "Most users never refer directly to the filename" seems to be popular, and is what Drupal has moved to as well. If anything, that's the only attribute that even approaches consistency across them.

Based on that, I think the current proposal of field_type/date is reasonable and achieves the goal of "a typical user in the default case won't have a million files in one directory and make his OS cry".

Based on that, setting back to RTBC for webchick.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 69: 2128055_69.patch, failed testing.

Status: Needs work » Needs review

jonhattan queued 69: 2128055_69.patch for re-testing.

slashrsm’s picture

Status: Needs review » Reviewed & tested by the community

Looks green.

Bojhan’s picture

Issue tags: -Needs usability review

I assume in the ideal WYSIWYG file browsing integration the map structure is not directly tied to what you show in your browser. If we expect no issues on this front, I don't think we should keep this patch from going in. I would like to have some assurances on this front though, as we should indeed not introduce a very big UX problem this late in the cycle.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 69: 2128055_69.patch, failed testing.

Status: Needs work » Needs review

Lucasljj queued 69: 2128055_69.patch for re-testing.

Status: Needs review » Needs work

The last submitted patch, 69: 2128055_69.patch, failed testing.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
5.27 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch, 88: 2128055_88.patch, failed testing.

Status: Needs work » Needs review

slashrsm queued 88: 2128055_88.patch for re-testing.

slashrsm’s picture

Status: Needs review » Reviewed & tested by the community
attiks’s picture

Small nit pick

+++ b/core/modules/field/field.module
@@ -304,3 +305,49 @@ function field_form_config_admin_import_form_alter(&$form, FormStateInterface $f
+          'name' => t("Field name"),
+          'description' => t("Machine name of the field."),

Use single quotes

rootwork’s picture

Small nit pick addressed :)

DrupalRanger’s picture

Will there be a back port for this addition for D7?

Pls’s picture

Yep, would be great to have it as default setting for new D7 installs. Is this possible?

anavarre’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/field/field.module
@@ -304,3 +305,49 @@ function field_form_config_admin_import_form_alter(&$form, FormStateInterface $f
+          $replacements[$original] = $sanitize ? SafeMarkup::checkPlain($field_storage->getName()) : $field_storage->getName();

Didn't we just deprecate SafeMarkup::checkPlain()?

Applied #93 and there's a problem with the [field-storage:name] token that is not being computed:

/var/www/html/head/sites/default/files/field/[field-storage:name]/2015/09/myfile.txt

anavarre’s picture

Issue tags: +Needs reroll
anavarre’s picture

Attempting a straight reroll to chase HEAD.

anavarre’s picture

Status: Needs work » Needs review
Pls’s picture

+      'file_directory' => 'field/[field-storage:name]/[date:custom:Y]/[date:custom:m]',

Why we are using only Year and Month as date tokens? Year/month/day format looks reasonable, especially when site is big and files are uploaded frequently (month directory can get big fast). Actually, I would be totally cool with Y/m/d (for ex. 2015/10/05) directory format, field name prefix just seems unnecessary and drupal'y :)

Why do we need field prefix? Does it make better directory separation?

anavarre’s picture

re:#100 - #98 is a straight reroll only and there's more work to do as #93 pointed out.

To your comments, I agree with Y/m/d or even Ymd (if we're concerned with the number of folders Y/m/d would create in the long run), that's what I always do, too (and it's also what @attiks suggested in #4). I don't do field/field-name at all, mostly because it's ugly in the URL + reasons listed in #22

Large media sites will potentially upload hundreds of assets every day and from experience I've never seen issues with as many as ~2000 files per dir, which gives quite some room for a day-based organization. Let's make a guess: 2000 assets x 31 days = 62000 assets - If those were to sit under a Y/m organization, then this would not be optimal.

Migrations can definitely be causing issues too but I'd consider this to be an edge case that most likely should be planned ahead anyway. Also, we shouldn't wait on committing this based on edge cases IMHO. This is just too important for too many sites to have sane defaults for assets.

Pls’s picture

Right, Y/m/d or Ymd looks reasonable choices. Can we vote on directory format selection? This should be settled before making a patch.

I think that if majority of interested drupalers agree on one format, then there is no need to make ugly default with field prefixes, just because we can.

anavarre’s picture

Here's an opinionated approach that organizes files under /assets/YYYYMMDD

EDIT: nope, that doesn't fly as well as assets/YYYY/MM/DD - Thanks @stefan.r for talking this through.

anavarre’s picture

#102 - Quickly created https://docs.google.com/spreadsheets/d/1trOcEvW4Yl7iH8VJ1MTlrhlD0cYhQrDa... for people to vote.

EDIT: issue is closed, Google Doc has been deleted because it doesn't serve any purpose any longer.

anavarre’s picture

Priority: Normal » Major
Issue summary: View changes
Issue tags: -Needs reroll

Changed priority because this *has* to land before D8 is out to the masses.

tstoeckler’s picture

Status: Needs review » Needs work
+++ b/core/modules/field/field.module
@@ -280,3 +281,49 @@ function field_form_config_admin_import_form_alter(&$form, FormStateInterface $f
+function field_token_info() {
...
+function field_tokens($type, $tokens, array $data = array(), array $options = array()) {

Can we move those to a new field.tokens.inc please? That would be great.

anavarre’s picture

Status: Needs work » Needs review
FileSize
5.74 KB

Let's try with #106. Also reverted to the below organization as it seems to be preferred for now:

'file_directory' => 'assets/[date:custom:Y]/[date:custom:m]/[date:custom:d]',

anavarre’s picture

Messed up with the previous patch. Let's break this down in 2 steps. First, the new files reorg.

anavarre’s picture

And now reverting changes to file.module to address #106. Hopefully the patch makes sense now?

The last submitted patch, 107: 2128055-107.patch, failed testing.

webchick’s picture

FWIW if this changes to just date parts vs. exposing internals (field machine names) to site visitors (in URLs) / content authors (in browse windows), many of my concerns go down a significant amount. I assume that option was thrown out for some kind of valid reason, but the issue summary doesn't really explain how the proposed resolution was arrived at.

anavarre’s picture

#111 - As I understand things, #72 (using the file browser) and #79 (research on how other CMS'es do it) have tried to answer your concerns. Is there anything else that is still causing you concerns that we should address?

Also, stepping back a bit we have to take into account what is the current state of things and where we're headed:

  • Currently: by default all files are uploaded under files/ unless you have your own token-based files strategy - This is a non-sustainable strategy that will bite sites at some point (we all have 10s of horror stories to share I'm sure). Also nobody can visually search for files after a certain number of files have been uploaded (enter assets search field)
  • Suggested: by default, all files will now be uploaded under files/ but with a clearly defined token-based strategy (e.g. assets/YYYY/MM/DD) - There is no anticipated long-term issue and only edge cases that we can't take into account will have to handle issues when/if they arise

We *might* be losing a bit of UX via the WYSIWYG's file browser (click click click) but this UI would anyway become moot at some point because simply listing files will no longer be possible when the file system chokes on too many assets to load. Also, we're just suggesting sane defaults that would satisfy the vast majority of sites (aka most common denominator) and also would help people who don't know what they're doing from having FS issues. It might not be in scope for this issue but ideally I would also like to enforce sane defaults for the maximum size of uploads (2M? 5M?).

If people are not happy with the default files structure, they can totally go and change this to something that suits them better. I think that's pretty reasonable considering the current state of files in D6/D7.

Status: Needs review » Needs work

The last submitted patch, 109: 2128055-109.patch, failed testing.

pwiniacki’s picture

Hmm I remember at 2010 when I did create multilingual domain access page, that I used file_field_path module to exclude this drupal behavior - fixing it by YYYY/MM would be nice.

I did use [current-date:custom:Y-m]/[node:nid]-[node:language]-[node:domain:id] token. I also changed image folder to custom MEDIA inside drupal folder (shorter path ->yousite/media/token/picture).

anavarre queued 109: 2128055-109.patch for re-testing.

anavarre’s picture

Status: Needs work » Needs review
Crell’s picture

I have no idea why we're adding day. For most sites that will spread files out so far you'll never be able to find them. The typical site is NOT uploading thousands of files a day. Also, not grouping by field name means that I can't easily tell when auditing the file system what node or field the file came from, and thus how to know if it's left over flotsam.

I'm very tempted to unsubscribe from this issue at this point, as a simple minor UX and performance improvement is now at 117 messages and 2 years, and now we're talking about polls and votes and other pointless crap. It's like a microcosm of why I'm less involved in the core queues these days. The ROI of even trying to help on issues like this one is frankly negative more often than not.

nielsvm’s picture

+1 for FIELDNAME/ prefix, this is absolutely required as it doesn't only help auditing, it also clearly groups files by functional role and for instance allows manual removal during development and allows users to do "du -hs FIELD/" to assess the upload size by field.

I'm very tempted to unsubscribe from this issue at this point, as a simple minor UX and performance improvement is now at 117 messages and 2 years, and now we're talking about polls and votes and other pointless crap. It's like a microcosm of why I'm less involved in the core queues these days. The ROI of even trying to help on issues like this one is frankly negative more often than not.

Hearing you crell, let's instead just cut a decision and get this in and dealt with.

FIELDNAME/YYYY/MM/

Let's not forget that the lack of this sane default, caused important sites to go down years after the fact badly reflecting on Drupal. Such a simple change, such a great improvement.

nielsvm’s picture

Again folks, we're talking defaults here.

High-profile news sites simply requiring their site builders to append DD/ to my proposed default...
everyone else should be A LOT safer!

anavarre’s picture

I have no idea why we're adding day. For most sites that will spread files out so far you'll never be able to find them. The typical site is NOT uploading thousands of files a day.

Based on experience, many, many sites have been bitten by too many files under one directory (in regular day-to-day operations or with migrations). I'm not saying it's DD or nothing but YYYY/MM could not be enough for large Drupal sites. If people really think YYYY/MM is enough, well let's go for it for the sake of committing this sooner than later.

Also, not grouping by field name means that I can't easily tell when auditing the file system what node or field the file came from, and thus how to know if it's left over flotsam.

That is a very good point.

I'm very tempted to unsubscribe from this issue at this point, as a simple minor UX and performance improvement is now at 117 messages and 2 years, and now we're talking about polls and votes and other pointless crap.

That's called democracy. Sorry you don't like it :-/

It's like a microcosm of why I'm less involved in the core queues these days. The ROI of even trying to help on issues like this one is frankly negative more often than not.

This issue is no small change, while it looks like it. Whether or not we do it right will impact so many sites for the foreseeable future and I'd rather have 50 more comments on this issue and still get it committed before the GA as long as we get this right once and for all. Honestly, we should all be rejoicing because we're so close to fixing one of the main pain points with Drupal sites.

nielsvm’s picture

Let's get this accelerated:

https://www.drupal.org/files/issues/2128055-121.patch
https://www.drupal.org/files/issues/interdiff-109-121.txt

This patch:

  1. [field-storage:name]/[date:custom:Y]/[date:custom:m]/[date:custom:d]
  2. Puts core/p...install/field.field.user.user.user_picture.yml in line.
  3. Puts core/p...install/field.field.node.article.field_image.yml in line.
  4. Works, test changed too, just having to wait on testbot.
  5. Works on the most advanced sites, causes no issues on small sites.

Opinions on format still differ, but please let's focus on the patch code and go for consensus so that we can get this in before 8.0 gets out. Thing is, when 8.0 is out people already building production sites, will start off with messy files directory already risking the first harm..

So opinions aside, this fixes the problem.

anavarre’s picture

Applied #121 successfully. Confirmed that the behavior works as advertised, not only for new files but for the default image field, too (which originally defaulted to field/image). Really like the approach. +1

YesCT’s picture

day seems too fine grained to me.

slashrsm’s picture

I agree with @YesCT.

YesCT’s picture

Issue summary: View changes

added a remaining task to the summary to address the concern about exposure of machine names

anavarre’s picture

Addresses @slashrsm, @YesCT and @Crell's concern (and probably others in this issue I've missed to mention). Back to [field-storage:name]/[date:custom:Y]/[date:custom:m] - Hopefully we have a consensus.

anavarre’s picture

Tested with Article's default image field + a new file field added to the Article content type. Here's what I see in the URL bar:

http://head.local/sites/default/files/field_file/2015/10/test.txt
http://head.local/sites/default/files/styles/large/public/field_image/2015/10/test.jpg?itok=qu7PXrk7

And from the FS:

$ ls -l
total 20K
drwxrwxr-x 3 anavarre anavarre 4.0K Oct  6 19:02 config_vAjaCrz23zBtGFARA45EkZOPUNg_1J1tGlP_qKKQOv4ug4vYbG-I_FsRpQsfkUC_5y7Jk4DfGg
drwxrwxr-x 3 anavarre anavarre 4.0K Oct  6 19:06 field_file
drwxrwxr-x 3 anavarre anavarre 4.0K Oct  6 19:06 field_image
drwxrwxr-x 3 anavarre anavarre 4.0K Oct  6 19:05 simpletest
drwxrwxrwx 4 anavarre anavarre 4.0K Oct  6 19:06 styles

In more details:

$ ls -l field_*
field_file:
total 4
drwxrwxr-x 3 anavarre anavarre 4096 Oct  6 19:06 2015

field_image:
total 4
drwxrwxr-x 3 anavarre anavarre 4096 Oct  6 19:06 2015
$ ls -l field_*/2015/
field_file/2015/:
total 4
drwxrwxr-x 2 anavarre anavarre 4096 Oct  6 19:06 10

field_image/2015/:
total 4
drwxrwxr-x 2 anavarre anavarre 4096 Oct  6 19:06 10
$ ls -l field_*/2015/10/*
-rw-rw-r-- 1 anavarre anavarre     88 Oct  6 19:06 field_file/2015/10/test.txt
-rw-rw-r-- 1 anavarre anavarre 731324 Oct  6 19:06 field_image/2015/10/test.jpg
nielsvm’s picture

I fail to see the 'field name exposure' concern, we're exposing machine names at many other places (e.g. in views markup) too after all. Even besides that, who would design a site with a field name containing a secret too big to expose?

anavarre’s picture

Besides the concerns with exposing machine names (that I don't find to be problematic), I wonder if it's likely this will make it to core for 8.0.0 (with the rc eligible tag?) or if this should be tagged minor version target instead?

I honestly think we'd be missing a big opportunity to help users and companies help themselves by having this in ASAP and promote best practices. Else we're in for years of troubles to fix messed up file systems.

anavarre’s picture

Issue tags: +scalability

Also tagging scalability as this does impact distributed file storage systems.

Fidelix’s picture

I still think this is too deep: [field-storage:name]/[date:custom:Y]/[date:custom:m]/[date:custom:d]

Instead we should use [field-storage:name]/[date:custom:Y]-[date:custom:m] as a sane default.
That would work for the great majority of installations, and it's simpler.

anavarre’s picture

#131 - Doing this would mean 12 folders per year x N years and nothing to be concerned with indeed. Sorry I had overlooked your previous attempt at suggesting this in #19.

Status: Needs review » Needs work

The last submitted patch, 132: 2128055-132.patch, failed testing.

anavarre’s picture

Status: Needs work » Needs review

anavarre queued 132: 2128055-132.patch for re-testing.

Status: Needs review » Needs work

The last submitted patch, 132: 2128055-132.patch, failed testing.

anavarre queued 132: 2128055-132.patch for re-testing.

anavarre’s picture

Status: Needs work » Needs review
anavarre’s picture

Issue summary: View changes
anavarre’s picture

Issue summary: View changes
anavarre’s picture

To answer @YesCT's concern in #125 about fields/machine names being exposed in the URL, here's the HTML source for the default image field:

<div data-quickedit-field-id="node/2/field_image/en/full" class="field field--name-field-image field--type-image field--label-hidden field__item quickedit-field"> <img property="schema:image" src="http://head.local/sites/default/files/styles/large/public/field_image/2015-10/photo.JPG?itok=YMZpC79M" alt="My image" typeof="foaf:Image" class="image-style-large" height="360" width="480">

field_image is already exposed in data-quickedit-field-id and didn't seem to be a concern before? IMHO this would be security through obscurity but there are always ways to know what you want to know (e.g. Views is exposing quite a lot)

anavarre’s picture

Issue summary: View changes
pwolanin’s picture

For this use case you would probably want to have a machine name token that gives you _ replaced with - ?

It's too bad FieldItemBase seems to be using \Drupal and not better dependency injection, so there maybe it not another option for this patch to get the token service.

It would be good to have some more test coverage here and to think about how to handle tokens with odd characters.

nielsvm’s picture

For this use case you would probably want to have a machine name token that gives you _ replaced with - ?

It's one simple str_replace() away, but I wonder if we should at all. All it does is adding slight runtime overhead and the machine names aren't exactly user input either... the UI-level input-validation is very agressive for machine names, only machine names through imported yml remains a vector: and that attack route allows a lot, lot, lot more danger.

It's too bad FieldItemBase seems to be using \Drupal and not better dependency injection, so there maybe it not another option for this patch to get the token service

I gave it a stab to see if we can get rid of \Drupal::token()->replace, no way, unless we start doing setter injection deeper down, but that's way more disruptive than this in my opinion. Having that said, all higher level classes seem to use \Drupal quite a lot.

It would be good to have some more test coverage here and to think about how to handle tokens with odd characters

I'm trying to think of what more we need, happy to get the patch improved further.

nielsvm’s picture

FileSize
40.93 KB

@pwolanin just tested through the UI again and made a screenshot of it, there is no way that it lets through anything besides lowercase letters, numbers, and underscores. Even when clicking edit after the autogenerated machine name, the edit field gets validated.

machine name

This is good, as it means all ui-generated machine names are safe to use on the filesystem. The only open vector remains imported YML config, but as said before, that's not up to this patch.

Crell’s picture

Yet another reminder, since people still seem to be missing this part: All we're adding here is a default. We're adding a default value for a text field. That is all. A site still has complete freedom to organize its files however it wants using whatever black magic it can make tokens support. 70% is the target here, NOT "all sites big and small".

anavarre: No, Drupal is not a democracy, nor has it ever been, not even a little. We have a governance structure defined, and it's not democratic. In case of a bikeshed we should be deferring to the subsystem maintainer.

webchick’s picture

And because it's a default, if we make the default just Y/M date parts, we are both more inline with what other CMSes do, and don't have to deal with any of the stuff in #143, etc. People like Crell can always add machine name into the directory pattern if they want. (FTR, I'm not concerned from a security aspects of field name, but the UX aspect. URLs are part of the UX of files, and legibility of URLs to non-technical people matters.)

I do think it's too late for this in 8.0.0 now and would recommend postponing to 8.1.x, but if you want to tag it "rc target triage" given the benefits to sites with massive amounts of files, feel free. Core commiters can then discuss committing it during RC.

Crell’s picture

Issue tags: +rc target triage

Tagging as suggested. Note that as a side effect we're adding token definitions here. (Whether that makes it more or less likely to be accepted in RC I don't know.)

webchick: I really don't see any UX issue with the URL of the file containing the machine name. Image styles do more to make the URLs long and hard to copy already than anything we're doing here. :-)

anavarre’s picture

Okay, taking it from here, I'm tentatively adding the rc target triage tag to hopefully make this issue eligible to RC's guideline item #5 "certain other issues with high impact and low disruption at committer discretion only" - Thanks for offering that path forward! EDIT: cross-posted with #148

I've updated the issue summary yesterday to list the 3 current approaches. Here they are again for clarity:

  • #121 suggests [field-storage:name]/[date:custom:Y]/[date:custom:m]/[date:custom:d]
  • #126 suggests [field-storage:name]/[date:custom:Y]/[date:custom:m] to prevent too many directories
  • #132 suggests [field-storage:name]/[date:custom:Y]-[date:custom:m] and goes one step further by having even less recursion

I think/hope we've addressed everybody's concerns in terms of:

  • Providing a solution that is better than what we have today (which is everything under files/ except for field/image and pictures)
  • Making sure the suggested files organization is sustainable for the next 5 to 10 years (IMHO it's pretty safe to say so)
  • Exposing machine names is not a concern (#141, #145, #147)
  • Tests - Validation (e.g. special chars - See #144 & #145) is taken care of when creating the field so shouldn't be an issue here

@webchick - You've expressed concerns about the UX but IDK if anybody here is really qualified to answer. Should we tag the issue accordingly? IMHO http://site.tld/sites/default/files/styles/large/public/field_image/2015... is not much worse than http://site.tld/sites/default/files/styles/large/public/field/image/phot...

  • We have the typical Drupal path to files: sites/[sitename]/files
  • We also have the typical Drupal image style: styles/large/public

So maybe too much is too much and it's "one more thing" that degrades the UX? I'm not competent to tell but just wanted to bring up that we might already have things to be concerned about from a UX standpoint. If we step back and think about what we have today, the file paths only make sense for experienced site builders anyway.

My next question is: what can we do that is still missing from the current patch to ensure we have something as solid as we can?

rootwork’s picture

I'm supportive of any of the approaches, because they're all way better than the current situation. I would lean toward #126 but since these are just defaults and people can change them, I don't feel strongly about one versus another as long as something gets in. And I hope it does happen during 8.0.0-rc.

@webchick I do think having the field name in the pattern is helpful for (again, default) organization, and like anavarre I don't completely see the UX problem with those file names, though I'd love to see someone review that specifically. But where you say having just the Y/m date parts would be more in line with other CMSes -- I was surprised that in the research I did in #79 that only WordPress appears to use that pattern. The other CMSes either throw everything into one folder and put a hash on it, or group the files into much larger categories like mime type or (in Drupal terms) entity type of where it was uploaded.

I also agree with anavarre that it'd be great to strategize about what else could make the patch even stronger so when folks come back to review it it's solid. I suspect "pick an option" would make this a stronger case at least to new-to-the-issue reviewer. Given what crell said in #146 are there relevant maintainers that might be able to take a look and recommend one option over the others -- just for the purposes of putting forward a single patch?

Crell’s picture

Maintainers.txt's listed file system maintainers are... sadly out of date. :-(

File system
- Andrew Morton 'drewish' https://www.drupal.org/u/drewish
- Aaron Winborn 'aaron' https://www.drupal.org/u/aaron
anavarre’s picture

webchick’s picture

I do understand the desire to subdivide files beyond just date to something more semantically meaningful, so what about a compromise of files/[entity:machine-name]/[date:custom:Y]-[date:custom:m]. (Note that I don't know if [entity:machine-name] exists yet, but then again neither does [field-storage:name] so ;))

So files uploaded to issues on D.o would get the default path:

files/issue/2015/10/patchy.patch

User pictures would get the path:

files/user/2015/10/stinkyfoot.png

Commerce products would get the path:

files/product/2015/10/buymenow.png

...and so on.

This is much better for end users because they would've seen those words in the UI already, when they created the content from node/add or whatever in the first place, and generally-speaking content types are named something in plain english vs. field machine names which might be anything (and generally start with an annoying "field_" prefix which is a waste of 6 bytes on every file path).

Crell’s picture

The only downside of #153 is that it's a larger change to the patch, I suspect, to expose a new token. :-) I could live with it, though.

Also, your suggestion uses a dash, but the examples use a /. Which did you intend? (I don't have a strong enough preference between 2015-10 and 2015/10 to fight for either one, I think.)

anavarre’s picture

Note that [date:custom:Y]-[date:custom:m] was suggested in #19 and again in #132 + is listed as an option in the issue summary. The main advantage being to limit the number of directories that will have to be created. While I've never seen any issue with [date:custom:Y]/[date:custom:m] or [date:custom:Y]/[date:custom:m]/[date:custom:d] in the past, I also understand the need for simplicity and the low risk it represents makes [date:custom:Y]-[date:custom:m] a pretty good candidate.

Not opinionated on [entity:machine-name] either. Maybe it makes individual fields auditability a bit more challenging though?

Crell’s picture

anavarre: I suppose it's a question of whether we want to default to field-based auditing or entity-based auditing. Both have valid cases, I suppose. I don't feel strongly enough either way there, as both are logical and both are more logical than pure date.

webchick’s picture

Sorry, I missed that detail; no strong feelings one way or another about "-" so that's fine. :)

timhilliard’s picture

I prefer #121 over #126 over #132.

I think using hyphen is unnecessary and most people are used to the extra directory for month. I also think adding the day as #121 does is worth it as I see the problem where a feed importer or something to the like might ingest thousands of files every day. This could easily add up to a very large number of files for a whole month but would be manageable if separated by day. I realize that if the client does end up with a feed imported into their site they should be able to adjust the file path but it is often over looked and only dealt with once it becomes a problem, why not just handle this with the default since one extra folder layer will not be much overhead.

[entity:machine-name] sounds great but no strong opinion of using that over [field-storage:name], especially given it doesn't yet exist. Maybe we go with [field-storage:name] since we already have that and don't have to wait for / build a new token. So #121 as is with #126 as is as second preference.

slashrsm’s picture

I'd just like to remind everyone once again: we are talking defaults here. Every site can choose to take it's own approach, every field can use something different, distributions can ship with their own defaults.

Let's try not to spend few more months discussing trivial questions. Please, please :)

webchick’s picture

Just to be clear, unless we go with raw date parts (e.g. YYYY-MM, which is the simplest thing that could possibly work, but there was pushback on that), we're inventing a new token either way. This is why I was pushing for raw date parts, because that would cut this patch down to like one line (ok, maybe 10 lines ;)) and make it much more palatable as an RC target (because, as slashrsm says, these are just defaults).

anavarre’s picture

@webchick: are you saying that we should go with YYYY-MM defaults only if we want to see this land before the GA and we should pursue more solid defaults to target 8.1 instead? Or is there any chance it could make it 'as is' for the GA, still?

slashrsm’s picture

I think that @webchick tried to say exactly that. It is much more likely this will go in if simplified and something is better than nothing IMO.

It is not optimal, but it is a start. We can still add token support for fields and improve default setting in follow-ups.

Status: Needs review » Needs work

The last submitted patch, 162: interdiff_132_162.patch, failed testing.

slashrsm’s picture

Status: Needs work » Needs review

Meh... :)

Status: Needs review » Needs work

The last submitted patch, 162: interdiff_132_162.patch, failed testing.

timhilliard’s picture

Status: Needs work » Reviewed & tested by the community

Looks good. I still prefer YYYY/MM over hyphen but I'm not going to hold this up over that so RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 162: interdiff_132_162.patch, failed testing.

slashrsm’s picture

Status: Needs work » Needs review
FileSize
4.31 KB

This patch is identical to the one on #162. Hopefully this fixes the mess that I created.

nielsvm’s picture

Although I'm a strong advocate of the field name token, I'm backing the idea of 'getting it in' over 'getting it perfect'. For any high-traffic site, the #168 patch will do it, period.

As long as we're not shipping 8.0 with a ever-exploding sites/default/files by default, I'm more than happy to wait for even better defaults post 8.0.

+1

nielsvm’s picture

Status: Needs review » Reviewed & tested by the community

Reviewed #168 once more and still looking good and sane, let's hope testbot doesn't take long either.

The last submitted patch, 162: interdiff_132_162.patch, failed testing.

catch’s picture

I just scanned the past 50 or so comments.

Just a general point that it's only core where a file is tied to a specific field name. It's quite possibly to reference the same fid from multiple fields, and when that happens the field name will not make any sense - the file may even switch from one field name to another if the original entity it was uploaded to was deleted.

For auditing files, you can do that with code by comparing the contents of the files directory to the database, I'm not sure how naming helps there at all.

Year and month seems great for the default.

anavarre’s picture

alexpott’s picture

Issue tags: -rc target triage +rc target

Discussed with @effulgentsia - we've had a problem with default file organisation since forever. It makes sense to provide some defaults that make it easier to manage sites especially for people new to Drupal who don't know about such settings. Considering we're just changing the defaults here it is not a risky change especially now we're doing the minimum possible change so let's do this in RC - it makes the out-of-box experience of Drupal better.

+++ b/core/profiles/standard/config/install/field.field.user.user.user_picture.yml
@@ -18,7 +18,7 @@ default_value: {  }
-  file_directory: pictures
+  file_directory: '[date:custom:Y]-[date:custom:m]'

I was going to say I'm not sure about this change because user pictures are different - but pictures is hardly that helpful a name either.

webchick’s picture

Status: Reviewed & tested by the community » Needs work

While I've seen the lack of separating folders within the files directory bring down sites, I've never encountered any real-world problems with having user pictures segmented off in their own folder before. So this seems like an unnecessary change, that might introduce more problems than it solves. Should probably be updated to pictures/YYYY-MM instead.

timhilliard’s picture

I rerolled with @webchick's suggestion as I think it makes sense. I also updated node article field_image to prefix with field/image as this was the directory previously and I think it makes sense to make it consistent with user pictures.

timhilliard’s picture

Status: Needs work » Needs review

The last submitted patch, 162: interdiff_132_162.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 176: interdiff-168-176.patch, failed testing.

timhilliard’s picture

Status: Needs work » Needs review

The last submitted patch, 162: interdiff_132_162.patch, failed testing.

timhilliard’s picture

Re-uploading patches so interdiff doesn't trigger testing. Sorry for confusion.

The last submitted patch, 176: interdiff-168-176.patch, failed testing.

anavarre’s picture

I'm really confused now. field/image/[date:custom:Y]-[date:custom:m] makes field_image a special snowflake. Do we really, truly want that?

timhilliard’s picture

ok, re-rolled with the field_image snowflake removed.

timhilliard’s picture

+++ b/core/profiles/standard/config/install/field.field.node.article.field_image.yml
@@ -17,7 +17,7 @@ translatable: true
 settings:
-  file_directory: field/image
+  file_directory: '[date:custom:Y]-[date:custom:m]'
   file_extensions: 'png gif jpg jpeg'

Should we actually be omitting this since it is the same as the default?

anavarre’s picture

#186: seems fair to do it. Tested removing file_directory: '[date:custom:Y]-[date:custom:m]' from core/profiles/standard/config/install/field.field.node.article.field_image.yml and the default tokens are picked up fine in the UI and files are successfully uploaded in the expected location.

timhilliard’s picture

Re-rolled with field_image override removed.

anavarre’s picture

Issue summary: View changes

Status: Needs review » Needs work

The last submitted patch, 188: 2128055-188.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
5.28 KB
1.99 KB
  1. +++ b/core/modules/file/src/Plugin/Field/FieldType/FileItem.php
    @@ -313,6 +313,7 @@ public static function generateSampleValue(FieldDefinitionInterface $field_defin
    +    $dirname = \Drupal::token()->replace($dirname);
    

    I don't like the duplication of this line with a similar line in getUploadLocation(), especially since this doesn't fully duplicate it (it doesn't convert to plain-text).

  2. +++ b/core/profiles/standard/config/install/field.field.node.article.field_image.yml
    @@ -17,7 +17,6 @@ translatable: true
    -  file_directory: field/image
    

    Even though this works per #187, it's good practice to include the field defaults in the config/install .yml anyway, as it results in exported config not having extraneous differences from the initially imported config. And StandardInstallerTest tests for that via assertInstalledConfig(), which is the test failure in #188.

This patch addresses both of the above.

catch’s picture

Status: Needs review » Reviewed & tested by the community

#191 looks great to me now, I think this is a good balance for both the default, the pictures and the article image field.

The last submitted patch, 162: interdiff_132_162.patch, failed testing.

The last submitted patch, 188: 2128055-188.patch, failed testing.

effulgentsia’s picture

Ticking credit boxes for people who helped to substantially move this issue forward. Apologies if I missed anyone.

  • effulgentsia committed f081fa8 on 8.0.x
    Issue #2128055 by slashrsm, anavarre, timhilliard, estoyausente, nielsvm...
effulgentsia’s picture

Title: Files should be uploaded to directories based on tokens by default » Files should be uploaded to per year/month directories by default
Version: 8.0.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)
Issue tags: +Fast By Default

Discussed with @effulgentsia - we've had a problem with default file organisation since forever. It makes sense to provide some defaults that make it easier to manage sites especially for people new to Drupal who don't know about such settings. Considering we're just changing the defaults here it is not a risky change especially now we're doing the minimum possible change so let's do this in RC - it makes the out-of-box experience of Drupal better.

To give more background on this, when I first discussed this with @alexpott, I was initially against doing this during RC, because I don't think RC is the right phase in the development cycle to be changing default configuration away from what's been in use throughout all of Drupal 7's lifetime and Drupal 8's development phase. However, in this case, we have plenty of real world experience to know that the existing defaults cause problems for many site owners who don't know that that's what's causing the problem and how to improve their configuration to fix it. So with that and with our other efforts in "Fast By Default" issues, I think it does make sense to still allow this in. Therefore, pushed to 8.0.x and updated the change record to reflect the committed patch and published it. Thank you, everyone, who helped on this.

There were some comments on this issue that asked for a backport to 7.x, so moving to that queue for that to be discussed.

Crell’s picture

Should we add a new issue about adding the tokens to core that this issue was doing originally? (Probably postponed for 8.1)

rootwork’s picture

@crell I mentioned that in #2592251-4: Uploading files should have more robust default tokens -- so that could be the followup issue unless there needs to be a separate meta-issue for deciding on the tokens prior to an issue to actually create them.

mikeker’s picture

Anonymous’s picture

So, I am running D8RC2 site and this change is fine with me and for the future it is really good choice. But one issue I have with this is that the default path field for File-based Field API fields has no token tree nor any mention of available tokens in its description. Is there an issue for this or we just assume the user is all knowing? Because to me this looks like a huge UX/UI fail.

  • effulgentsia committed f081fa8 on 8.1.x
    Issue #2128055 by slashrsm, anavarre, timhilliard, estoyausente, nielsvm...

  • effulgentsia committed f081fa8 on 8.3.x
    Issue #2128055 by slashrsm, anavarre, timhilliard, estoyausente, nielsvm...

  • effulgentsia committed f081fa8 on 8.3.x
    Issue #2128055 by slashrsm, anavarre, timhilliard, estoyausente, nielsvm...

  • effulgentsia committed f081fa8 on 8.4.x
    Issue #2128055 by slashrsm, anavarre, timhilliard, estoyausente, nielsvm...

  • effulgentsia committed f081fa8 on 8.4.x
    Issue #2128055 by slashrsm, anavarre, timhilliard, estoyausente, nielsvm...
vijaycs85’s picture

Status: Patch (to be ported) » Needs review
FileSize
3.27 KB

Direct port of D8. Probably we might need an upgrade path? wondering if we need to move existing files. Surely the change would work for new fields.

rahul.nahar001’s picture

Hi @vijaycs85,

I have applied patch#207. it fixed the issue.
For new image and file fields, the image and file are uploaded in the proper location(files\year-month).

Thanks
Rahul Nahar

Berdir’s picture

Definitely not moving existing files or even changing existing settings. However, some tests would probably make sense to verify this works?

zopa’s picture

Y-m folder convention still causes too many files - we need files saved to Y/m/d folders.

Refactored patch #207 for Y/m/d file structure use case.

Berdir’s picture

Status: Needs review » Needs work

It is a default configuration, big sites with many files can always change it. It's meant to be a sane default for most sites, which do not create dozens or hundreds of files per day. This is what has been discussed and agreed upon and how it has been implemented in Drupal 8. So lets keep that.

Still needs tests, so needs work for that.

garth.powell’s picture

I have a major issue with the file being placed withing automatically generated folders by date

The problem we have with this is that when we update a file it remains it the folder it was created in, for example the may folder, and then places the new file into the July folder.

The issue we have with this is that all the links on the site to that file still points to the old location.

This means that when we updated files the links on pages will just point to the old file which is a major issue

We are mitigating this by using field redirect and linking to the parent node, not the file but users will always open the file and copy the URL from the file, not the node.

Has anyone else encountered this? If so is there a better solution?

ksenzee’s picture

Status: Needs work » Needs review

The patch at #207 does modify the existing tests to make sure that the upload location is correct. I'm not sure what else makes sense to test.

The last submitted patch, 182: interdiff-168-181-do-not-test.patch, failed testing. View results

The last submitted patch, 185: interdiff-168-185-do-not-test.patch, failed testing. View results

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

I am blind :)

Yes, #207 seems fine.

ajayg’s picture

This has been RTBC for a while. What is preventing commit for this?

cilefen’s picture

marloAZ’s picture

What about using YYYYW (year + week number, no separator) ; e.g. "file/201830/file01.txt"
In a 5 day week, this allows 500 uploads/day = 2500/folder.
This would suit *most* sites, with the benefit of giving an idea of the file (upload) date without suggesting to users that it's the file version. Whereas if you add hyphen or slash separators, it appears to users that it's the file date.

rootwork’s picture

@marloAZ I'm going to quote @Berdir in #211:

It's meant to be a sane default for most sites, which do not create dozens or hundreds of files per day. This is what has been discussed and agreed upon and how it has been implemented in Drupal 8. So lets keep that.

This issue has gone a couple of hundred comments, and we're working on implementation. If you want to propose a new plan, you can create a follow-up issue.

rootwork’s picture

leventdal’s picture

I think this not the best arbitrary solution that is neither universal nor robust. Also it is misleading in many aspects.

It is implemented as if it is a photo album or a screenshot software. Using dates is a bit like an excuse for not using totally random hashes.

I also couldn't find any documentation to apply a custom folder other than using private folders.

MustangGB’s picture

Issue tags: +Drupal 7.60 target
joseph.olstad’s picture

Issue tags: -Drupal 7.60 target +Drupal 7.70 target

Bumping to 7.70, this didn't make it into 7.60

** Edit **
I have no objections to putting this as 7.61,
for the file_entity module what we did is provide a helper message in the default folder setting but we did not enforce it, either way I think it's good.

see related file_entity commit:
https://cgit.drupalcode.org/file_entity/commit/?id=073d1c06a6dc83f48e6de...

  $form['file_entity_default_file_directory'] = array(
    '#type' => 'textfield',
    '#title' => t('Default file directory'),
    '#default_value' => variable_get('file_entity_default_file_directory', ''),
    '#maxlength' => NULL,
  );
  if (module_exists('token')) {
    $form['file_entity_default_file_directory']['#description'] = t('Optional subdirectory within the upload destination where files will be stored if the file is uploaded through the file entity overview page and the directory is not specified otherwise. Do not include preceding or trailing slashes. This field supports tokens.  Suggest using: [current-date:custom:Y]/[current-date:custom:m]/[current-date:custom:d]');
    $form['file_entity_default_file_directory']['tokens'] = array(
      '#theme' => 'token_tree',
      '#dialog' => TRUE,
    );
  }

this is already taken care of by the file_entity module version 7.x-2.x / 7.x-3.x in a soft way without actually putting in a default value

but I agree with this change to core, core should provide a good default.

** End Edit **

ajayg’s picture

Why bump to 7.70? Why not 7.61?

joseph.olstad’s picture

I'd love to see this in 7.69, however 7.70 might be just around the corner.

With that said, file_entity provides a helper message suggesting to do default foldername by date and provides the token to do it.

If you need this right away, just upgrade to the latest releases of file_entity.

joseph.olstad’s picture

Those using file_entity get this already in recent releases but not all installations use file_entity. It is a good idea for core to do this also by default.

izmeez’s picture

Patch in #207 is the direct port from D8 using folder names Y-m for files.
Patch in #210 uses folder names Y/m/d for sites with large volume of files.

Meanwhile, comment in #212 identifies a problem,

The problem we have with this is that when we update a file it remains it the folder it was created in, for example the May folder, and then places the new file into the July folder.

The issue we have with this is that all the links on the site to that file still points to the old location.

Although it will not address the problem in comment #212, I was wondering if having a conf for settings.php might be useful to allow sites to decide whether they want to use 'Y-m' or 'Y/m/d' or leave empty '' for sites with fewer files that don't need this.

mcdruid’s picture

Issue tags: -Drupal 7.74 target

@Fabianx and I had a look at this for D7; generally backporting the D8/9 approach seems good although it's not perfect there either.

This change would only affect new fields, and most D7 sites where the "everything in one directory" setting caused problems will most likely have addressed this some other way.

mcdruid’s picture

One thing to remove the patch - we shouldn't change a hook_update_N implementation.

That could be fixed on commit, but ideally we'd test a patch without it.

#207 is preferred as it's a straight backport. Any sites that need slightly different token patterns can set those up for themselves.

  • mcdruid committed 664d391 on 7.x
    Issue #2128055 by slashrsm, anavarre, timhilliard, estoyausente, nielsvm...
mcdruid’s picture

Status: Reviewed & tested by the community » Fixed

Thank you!

izmeez’s picture

Does this require or have a change record?

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

jigarius’s picture

I wish the files were uploaded by default to a directory like "YYYY/MM" instead of "YYYY-MM". I'll see if I can find an open issue for that.