Comments

neclimdul’s picture

What's the status of this? Should #2504815: d6 to d8 migration throws integrity contraint warning with duplicate file paths expect a unique constraint to exist in the database target?

frob’s picture

Is this going to make it for 8.0 or should this be moved to 8.1?

xjm’s picture

Version: 8.0.x-dev » 8.1.x-dev
Status: Active » Postponed

Yep, this is 8.1.x material at this point, so contrib could provide it for 8.0.x. Thanks!

andypost’s picture

Status: Postponed » Active
Related issues: +#1925684: Provide a "original filename" field for file entity.

Suppose it's time, but maybe this blocked on #1925684: Provide a "original filename" field for file entity.

frob’s picture

I wouldn't consider this blocked on that one. They should be able to be worked on in parallel.

Deciphered’s picture

It's worth mentioning that this functionality is part of File (Field) Paths module, which has a D8 beta release.

Jon@s’s picture

It's worth mentioning that this functionality is part of File (Field) Paths module, which has a D8 beta release.

True but it seems like this functionality should be in core. This would also require the transliteration in core to support replacing spaces with a symbol which it, to my knowledge, currently does not. Also why the transliteration functionality in the File (Field) Paths module is limited compared to the old Transliteration module.

Deciphered’s picture

I'm not at all debating that it should be in core, but given @xjm's comment that it's not going to happen in 8.0.x, and that it needs to be dealt with in contrib, this is it being dealt with in contrib.

File (Field) Paths does also provide replacing of spaces with a symbol, as it has since inception (or not long afterwards) via the Pathauto integration.

I'm more than happy to remove the functionality from F(F)P as soon as it's no longer needed.

g089h515r806’s picture

I have test with File (Field) Paths with Drupal8.0.1, it does not works at first. it works correctly at last.
I want to upload chinese pdf file using file field module.

Deciphered’s picture

That seems like a poorly worded statement. But I will be happy to look into it further in the appropriate issue queue.

g089h515r806’s picture

Sorry for my English.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

eigentor’s picture

I have tried filefield paths to solve the problem of transliterating filenames. For me it dit not work. Some comments for filefield paths:

  1. It seems to keep the original filename and create the new filename, which is some kind of alias. I think this is by design and in the sense of the module, but unneccessary for the simple goal of getting web-safe filenames.
  2. One has do the setting "transliterate filename" in every single filefield, maybe even instance of every filefield. That is quite some work and can cause oversights.
  3. There is a major problem with filefield paths: it cannot be uninstalled https://www.drupal.org/node/2685731 The module is not alone with this problem. The new uninstall system of Drupal 8 requires all database content of a module to be removed before you can uninstall it. If this content is not deletable by hand, there is a problem.
imadalin’s picture

I don't think that the issue's topic is about File (Field) Paths and we should concentrate on implementing Transliteration for filenames manipulation in the Core.

pingwin4eg’s picture

Status: Active » Needs review
FileSize
1.34 KB

Let's start with something.

Here's a patch using code similar to that from a transliteration module.

Berdir’s picture

"Something" looks like a good start to me, at least it doesn't break any existing tests. Next step would be tests with file names that trigger this.

Also wondering about how to introduce this exactly, is there any reason to make this a setting, so it could be disabled? It will be a behavior change, although I suspect a very welcome one for most (all?) sites?

SKAUGHT’s picture

i'll be bold enough to say all sites should use.

The only other request I've ever had is about storing the original file name, that could be used for Title href. As people often name a file because they expect it would be saved. Same for Search API related issues. At this time, I haven't directly looked to see if the File API is yet doing that..

^I think there would be no need for an 'on/off' switch of transliteration if org. name is stored for about points.

$clean_filename = \Drupal::transliteration()->transliterate($file->getFilename(), 'en', '');
I should think 'en' shouldn't be hardcoded. sites will not all be english first.

stefan.r’s picture

This probably should be opt-in (at least for existing sites), so as to not change behavior.

SKAUGHT’s picture

Status: Needs review » Needs work
andypost’s picture

opt-in

Makes sense to have that configurable via container param

hgoto’s picture

I updated the patch #15.

1.

$clean_filename = \Drupal::transliteration()->transliterate($file->getFilename(), 'en', '');
I should think 'en' shouldn't be hardcoded. sites will not all be english first.

I replaced the hardcoded 'en' to $langcode with the following code:

      $langcode = \Drupal::languageManager()->getCurrentLanguage()->getId();

      // Transliterate and sanitize the destination filename.
      $filename = \Drupal::transliteration()->transliterate($file->getFilename(), $langcode, '');

2.

I changed the strtolower() to Drupal\Component\Utility\Unicode\Unicode::strtolower(). I consulted #1842718: Use new Transliteration functionality in core for machine names for this point. This may be unnecessary.

      // Force lowercase to prevent issues on case-insensitive file systems.
      $filename = Unicode::strtolower($filename);

3.

I introduced an option config to toggle the transliteration for the file names.

This probably should be opt-in (at least for existing sites), so as to not change behavior.

Probably it depends on the language that people want to make this setting opt-in or opt-out (of course, it shouldn't be changed for existing sites with update). In my small personal experience, the current transliteration function for Japanese is not very good and I'd like it to be off by default...

Tests are not yet added.

hgoto’s picture

Status: Needs work » Needs review

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

arshadcn’s picture

Patch in #21 tested and works. Thanks.

dawehner’s picture

+++ b/core/modules/file/file.module
@@ -842,7 +842,28 @@ function file_save_upload($form_field_name, $validators = array(), $destination
+    if (\Drupal::config('system.file')->get('filename_transliteration')) {
+      $langcode = \Drupal::languageManager()->getCurrentLanguage()->getId();
+
+      // Transliterate and sanitize the destination filename.
+      $filename = \Drupal::transliteration()->transliterate($file->getFilename(), $langcode, '');
+
+      // Replace whitespace.
+      $filename = str_replace(' ', '_', $filename);
+      // Remove remaining unsafe characters.
+      $filename = preg_replace('![^0-9A-Za-z_.-]!', '', $filename);
+      // Remove multiple consecutive non-alphabetical characters.
+      $filename = preg_replace('/(_)_+|(\.)\.+|(-)-+/', '\\1\\2\\3', $filename);
+      // Force lowercase to prevent issues on case-insensitive file systems.
+      $filename = Unicode::strtolower($filename);
+    }

I'm wondering why this is part of file_save_upload(), which is part of an API which is related to the form, not some actual underlying API level. IMHO you would like to have the same for potential files uploaded via RESt one day.

Maybe it would be enough for now to move all that logic into its own API function and call it from here?

hgoto’s picture

Thanks for your reviews. As dawehner told, surely it's better to make this logic an independent API function.

Maybe it would be enough for now to move all that logic into its own API function and call it from here?

I tried creating a new method getTransliteratedFilename() in \Drupal\file\Entity\File class and moved this new transliteration logic into it. I'm not sure where we should put this API... I'd like this patch to be fully reviewed. Thank you.

Status: Needs review » Needs work

The last submitted patch, 26: drupal-use_new_transliteration-2492171-26.patch, failed testing.

hgoto’s picture

The patch #26 failed the auto test because I didn't change the code properly when I extracted the logic to a method. I'll try it again.

tobiberlin’s picture

Tested patch in #28 with a file named with spaces and German umlaut -> works

tobiberlin’s picture

If this is added to core it would be very helpful to offer an update function to transliterate the already uploaded files, wouldn't it?

Berdir’s picture

Nope, that doesn't work. you don't know how and where those files are used, could be hardcoded in content. We can only offer it for new files.

tobiberlin’s picture

In my eyes this answer makes no sense as this patch is about using transliteration on images uploaded by a form. Whenever an image uploaded for a field for example is used somewhere hardcoded the same problem as described in #31 would appear when simply a new image is uploaded. Whenever an image path/ name is used hardcoded although the image can be changed by a form it is a wrong programming, isn't it?

ressa’s picture

It would be good to have transliteration of file names in Drupal 8 - and really great to get it into Drupal 8.3, if possible.

sense-design’s picture

One of the features why I have not yet completely switched to D8 for all my projects.
Would be great to have this in the upcoming 8.3 release.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

hampercm’s picture

Status: Needs review » Needs work

In response to #30 and #31: perhaps add a drush command to transliterate existing files, if a site admin needs to.

+++ b/core/modules/file/file.module
@@ -842,7 +842,10 @@ function file_save_upload($form_field_name, $validators = array(), $destination
+    $filename = $file->getTransliteratedFilename();
+
+    $file->destination = file_destination($destination . $filename, $replace);

Earlier code in this function does a setFilename() if the filename is changed (see code that handles insecure file extensions at line 818). I'd recommend doing that here, too.

+++ b/core/modules/file/src/Entity/File.php
@@ -196,6 +197,38 @@ public function preSave(EntityStorageInterface $storage) {
+      // Remove multiple consecutive non-alphabetical characters.
+      $filename = preg_replace('/(_)_+|(\.)\.+|(-)-+/', '\\1\\2\\3', $filename);

This seems extraneous. Remove it?

kovtunos’s picture

Updated the patch #28 to work with Drupal 8.3.0.
Changes: new PHP array syntax.

hgoto’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
borisson_’s picture

I tried this patch and it works, thanks!

I think the failure in #37 is unrelated, but this does need it's own tests. Since we're adding a new public function for the file name transliteration, all we'd need to do is call that function with different file names in the test, I think?

Berdir’s picture

We can test it directly in a kernel test with diferent variations. I think we also need at least one integration test, upload an image with some spaces or so to make sure the function is actually called.

borisson_’s picture

This patch adds the kernel tests. Not sure if the test-class is the correct one.

Status: Needs review » Needs work

The last submitted patch, 42: use_new_transliteration-2492171-42.patch, failed testing.

borisson_’s picture

Ah, those migrate tests are related, I don't see how those should be fixed though.

hgoto’s picture

I believe the failed tests for #42 can be fixed just by providing the added configuration into $expectedConfig['system.file'] in the test classes.

       // temporary_maximum_age is not handled by the migration.
       'temporary_maximum_age' => 21600,
+      // filename_transliteration is not handled by migration.
+      'filename_transliteration' => FALSE,
     ],
     'system.image.gd' => [
       'jpeg_quality' => 75,

The test passed in my environment with this change and I'd like to see how the testbot reacts.

hgoto’s picture

Oops, I'm sorry, the extension of the interdiff is wrong.

Status: Needs review » Needs work

The last submitted patch, 45: interdiff-2492171-42-45.patch, failed testing.

hgoto’s picture

Status: Needs work » Needs review

The tests passed. (The failure was just caused by the wrong extension of the interdiff.)

J-Lee’s picture

@@ -274,4 +275,37 @@ public static function baseFieldDefinitions(EntityTypeInterface $entity_type) {
     return $fields;
   }
 
+  /**
+   * Gets the transliterated filename.
+   *
+   * @return string
+   *   The transliterated filename. If the filename transliteration options is
+   *   disabled, the raw filename of the file.
+   */

Just found a typo:

If the filename transliteration options is disabled, the raw filename of the file.

"options" should be "option"

The patch works for me. Thank you.