I am building an international Drupal 6 site and we need attachments with Unicode filenames [e.g. Czech, Russian, Georgian, Armenian]. It worked without problems locally on Windows with XAMPP installed, but on Linux hosting [HostMonster] it failed: the file names are truncated and the first word of the name is lost. The hosting server itself does support Unicode filenames, as I verified by transferring the same files there with the FileZilla FTP client with UTF-8 forced. Uploading attachments with Unicode filenames failed the same way in tests with the JumpBox Drupal 6 VMware appliance.
Attachments
list-file.png: files used for the test
FileZilla-results.png: results of the FileZilla upload
Drupal6-results.png: results after attaching the files via Drupal 6.6

File | Size | Author
Drupal6-results.png | 31.54 KB | salava
FileZilla-results.png | 23.34 KB | salava
file_list.png | 9.01 KB | salava

Comments

salava commented:

Title: Unicode filenames test » Upload of files with Unicode in filenames failed

A more descriptive title.

dpearcefl commented:

Is this still an issue using current Drupal 6?

osha commented:

I think I am having the same or a very similar problem in Drupal 7.7.

I have a very ordinary webform that allows file uploads. When uploading files whose names contain non-ASCII characters (code points above 127), and not very exotic ones at that, Drupal systematically misreads the file's extension. This happens on the live version of the site (in all browsers), but not when running on localhost, which at first led me to think it wasn't a Drupal problem; I now tend to think it is a Drupal issue.

Specifically, if the user tries to upload:
zzzqqq.pdf -- works fine
zózqqq.pdf -- Drupal posts a message saying "files with the 'df' extension are not allowed"
zózqóq.pdf -- Drupal posts a message saying "files with the 'f' extension are not allowed"
zózóqóq.pdf -- Drupal posts a message saying "files with the '' extension are not allowed"

So the number of non-ASCII characters determines how the extension is misread. With one such character it thinks the extension is "df", with two it thinks the extension is "f", and with three it thinks the extension is empty.
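One mechanism that would produce exactly this pattern is a mix-up between byte offsets and character offsets: each "ó" occupies two bytes in UTF-8, so a dot position counted in bytes lands one character too far to the right for every such character. The sketch below only reproduces the symptom under that assumption. The function names are hypothetical, and nothing in this thread confirms that this is what Drupal or PHP actually does.

```python
def misread_extension(filename: str) -> str:
    """Hypothetical bug: find the dot by UTF-8 byte offset, then slice by character."""
    # Offset of the first byte after the last dot, counted in bytes.
    byte_offset = filename.encode("utf-8").rfind(b".") + 1
    # The byte offset is (wrongly) applied as a character index.
    return filename[byte_offset:]


def correct_extension(filename: str) -> str:
    """Find the dot and slice in the same unit (characters)."""
    return filename[filename.rfind(".") + 1:]


if __name__ == "__main__":
    names = [
        "zzzqqq.pdf",
        "z\u00f3zqqq.pdf",            # zózqqq.pdf
        "z\u00f3zq\u00f3q.pdf",       # zózqóq.pdf
        "z\u00f3z\u00f3q\u00f3q.pdf", # zózóqóq.pdf
    ]
    for name in names:
        print(name, repr(misread_extension(name)), repr(correct_extension(name)))
```

With the four test names above, the mismatched version yields "pdf", "df", "f" and an empty string, while the consistent version always yields "pdf".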

The same non-ASCII characters in other (text) fields of the webform are submitted correctly and e-mailed correctly. This leads me to think there might be a bug in the way Drupal's file upload widget processes filenames containing such characters.

This affects filenames in very widely spoken languages such as French and Spanish. Here are some other threads, not very detailed, that appear to deal with this issue:
http://drupal.org/node/137034
http://drupal.org/node/100766

Has anyone else experienced this? How can it be dealt with?

osha commented:

I didn't mean to change the first post into an "issue summary"; I don't know how that happened!

osha commented:

I solved my problem by installing the Transliteration module, but it would be nice to have support for multi-byte characters in filenames. There are cases where reducing the filename to single-byte characters only is just not good enough. I recognize this might be a PHP issue too.
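To illustrate what reducing a filename to single-byte characters involves, and why it is not always good enough, here is a minimal sketch. It is not the module's algorithm: the function name `ascii_filename` and the placeholder character are assumptions made for this example. It strips accents through Unicode decomposition and replaces everything else outside ASCII with a placeholder.

```python
import unicodedata


def ascii_filename(filename: str, placeholder: str = "_") -> str:
    """Reduce a filename to ASCII: drop accents, replace other non-ASCII characters."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", filename)
    result = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent itself
        result.append(ch if ord(ch) < 128 else placeholder)
    return "".join(result)


if __name__ == "__main__":
    print(ascii_filename("z\u00f3zq\u00f3q.pdf"))                      # accented Latin name
    print(ascii_filename("\u03b1\u03c1\u03c7\u03b5\u03af\u03bf.pdf"))  # Greek name
```

Accented Latin names survive in readable form, but a Greek, Russian, Georgian or Armenian name collapses into placeholders with this approach. A real transliteration tool would need per-script mapping tables, and even then the original name is lost.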

jiakomo commented:

I have this problem, too. The first Greek word of the file name is cut off without notice. If the file name consists of only one word, this is a serious problem.

I am curious whether this bug exists only under certain setups. I expected to see more comments here because, as I understand it, it affects ALL uploads with non-English filenames.

The Transliteration module can work around this, but it should also be addressed in core.

forpost commented:

Version: 6.6 » 7.7

I confirm this bug for Unicode characters in file names.
The first word is omitted completely.
Temporary workaround: add some Latin characters to the beginning of the file name.
This way you can preserve the original phrase in the name.
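The workaround above could be automated before upload with something like the following sketch. The function name `add_latin_prefix` and the default prefix `file_` are assumptions made for illustration. The thread does not establish why the first word is dropped, so this only applies the renaming and does not address the cause.

```python
def add_latin_prefix(filename: str, prefix: str = "file_") -> str:
    """Prepend an ASCII prefix when the name starts with a non-ASCII character."""
    if filename and ord(filename[0]) > 127:
        return prefix + filename
    return filename


if __name__ == "__main__":
    print(add_latin_prefix("\u03b1\u03c1\u03c7\u03b5\u03af\u03bf.pdf"))  # Greek name gets the prefix
    print(add_latin_prefix("report.pdf"))                                # ASCII name is left unchanged
```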

Version: 7.7 » 7.x-dev

Core issues are now filed against the dev versions where changes will be made. Document the specific release you are using in your issue comment.