Problem/Motivation

system_update_7061() fails if there are files with the same name but different case, e.g.: Example.txt and example.txt

The error produced is:

Update #7061
Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://Example.txt' for key 'uri': INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7); Array ( [:db_insert_placeholder_0] => 18341 [:db_insert_placeholder_1] => 0 [:db_insert_placeholder_2] => Example.txt [:db_insert_placeholder_3] => public://Example.txt [:db_insert_placeholder_4] => text/plain [:db_insert_placeholder_5] => 447 [:db_insert_placeholder_6] => 1 [:db_insert_placeholder_7] => 1287789425 ) in system_update_7061() (line 2836 of modules/system/system.install).

Proposed resolution

Possible solutions include:

  1. Add the binary attribute to {file_managed}.uri so that it is case sensitive on MySQL -- see #1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL
  2. Affirm that file uri should be case-insensitive unique. Catch the exception and rename files as required, although this would break existing links to the files. (See #5).

Remaining tasks

  1. #1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL is blocking this issue.
  2. Once that patch is committed, the patches here (in #34) should be rerolled to include just the file test.
  3. The upgrade path still needs to be fixed.

Other related issues:

User interface changes

Some files might be renamed, breaking existing links, depending on which solution is used.

API changes

None from this issue. (The DB API addition has been moved to #1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL.)

Original report by @mfb

system_update_7061() fails if there are files with the same name but different case, for example: Example.txt and example.txt

I guess file_managed.uri column needs the binary attribute so that it is case sensitive? Or else, system_update_7061() needs to rename files that violate the unique key constraint.

Update #7061
Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://Example.txt' for key 'uri': INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7); Array ( [:db_insert_placeholder_0] => 18341 [:db_insert_placeholder_1] => 0 [:db_insert_placeholder_2] => Example.txt [:db_insert_placeholder_3] => public://Example.txt [:db_insert_placeholder_4] => text/plain [:db_insert_placeholder_5] => 447 [:db_insert_placeholder_6] => 1 [:db_insert_placeholder_7] => 1287789425 ) in system_update_7061() (line 2836 of modules/system/system.install).

Files: 
CommentFileSizeAuthor
#110 boooboo.patch803 byteschx
PASSED: [[SimpleTest]]: [MySQL] 39,006 pass(es). View
#104 966210-d7.patch5.39 KBdrumm
PASSED: [[SimpleTest]]: [MySQL] 38,707 pass(es). View
#93 966210-d7.patch4.15 KBdrumm
FAILED: [[SimpleTest]]: [MySQL] 38,531 pass(es), 72 fail(s), and 98 exception(s). View
#90 966210-d7.patch2.19 KBdrumm
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 966210-d7.patch. Unable to apply patch. See the log in the details link for more information. View
#86 core-file_managed_case-5707824-86.patch1.73 KBiamEAP
PASSED: [[SimpleTest]]: [MySQL] 35,353 pass(es). View
#72 966210_72.patch1.49 KBBTMash
FAILED: [[SimpleTest]]: [MySQL] 36,323 pass(es), 0 fail(s), and 1 exception(s). View
#68 drupal8.file-uri-case.68.patch1.51 KBsun
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch drupal8.file-uri-case.68.patch. Unable to apply patch. See the log in the details link for more information. View
#60 files_966210-D7.patch1.72 KBdrewish
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch files_966210-D7.patch. Unable to apply patch. See the log in the details link for more information. View
#54 files_966210.patch1.93 KBdrewish
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch files_966210.patch. Unable to apply patch. See the log in the details link for more information. View
#45 test_plus_patch.patch3.3 KBcatch
PASSED: [[SimpleTest]]: [MySQL] 33,620 pass(es). View
#43 test_only.patch1.06 KBcatch
FAILED: [[SimpleTest]]: [MySQL] 33,623 pass(es), 0 fail(s), and 1 exception(es). View
#39 test.patch889 bytescatch
PASSED: [[SimpleTest]]: [MySQL] 35,801 pass(es). View
#34 966210_db_case_sensitivity_D7.patch4.84 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 966210_db_case_sensitivity_D7.patch. Unable to apply patch. See the log in the details link for more information. View
#34 966210-by-mfb-bfroehle.patch2.68 KBcatch
PASSED: [[SimpleTest]]: [MySQL] 33,591 pass(es). View
#31 966210_db_case_sensitivity.patch4.84 KBmakara
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 966210_db_case_sensitivity.patch. View
#23 966210-by-mfb-bfroehle-Fixed-DB-Case-Sensitivi.patch2.68 KBbfroehle
PASSED: [[SimpleTest]]: [MySQL] 29,394 pass(es). View
#22 966210-database-binary.patch5.46 KBmfb
PASSED: [[SimpleTest]]: [MySQL] 31,571 pass(es). View
#21 966210-database-binary-D7.patch5.46 KBmfb
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 966210-database-binary-D7.patch. Unable to apply patch. See the log in the details link for more information. View
#10 966210-tests-only.patch1.74 KBbfroehle
FAILED: [[SimpleTest]]: [MySQL] 31,481 pass(es), 2 fail(s), and 1 exception(es). View
#10 966210-tests-and-fix.patch4.53 KBbfroehle
PASSED: [[SimpleTest]]: [MySQL] 31,565 pass(es). View
#8 966210-varchar-binary.patch3.27 KBmfb
PASSED: [[SimpleTest]]: [MySQL] 27,312 pass(es). View
#7 varchar-binary.patch3.02 KBmfb
PASSED: [[SimpleTest]]: [MySQL] 27,024 pass(es). View

Comments

catch’s picture

Priority: Major » Critical
Issue tags: +D7 upgrade path, +beta blocker

Bumping this to critical as well, seems like a common enough scenario that we should account for it.

mfb’s picture

Is it even possible to add the binary attribute to a varchar column via schema api?

moshe weitzman’s picture

Priority: Critical » Major
Status: Active » Needs work
Issue tags: -beta blocker

We should account for it, sure. That does not mean it is critical. Seems like a plain old bug to me, but I will meet in the middle.

mfb’s picture

Ideas so far:

  • Change file_managed.uri column to TINYBLOB so it's case-sensitive [edit: schema api supports TINYTEXT but not TINYBLOB]
  • Patch schema api to support VARCHAR BINARY attribute on mysql via 'binary' => TRUE. I believe this attribute could be ignored on non-mysql dbs as VARCHAR is already case-sensitive on postgres.
  • Affirm that file uri should be case-insensitive unique. Catch the exception and rename files as required, although this would break existing links to the files. By the way, I noticed that this same exception pops up when you upload a file via ajax file field that collides with an existing file, so we need to catch it there too.

Some filesystems that Drupal runs on are case-insensitive so might be a good idea to enforce that, as files could end up being moved from case-sensitive to case-insensitive filesystem. But this will require active enforcement as case-sensitive filesystems will happily create a new file which collides in the database but not on the filesystem.

catch’s picture

Status: Needs work » Active

I would go for this - "Affirm that file uri should be case-insensitive unique. Catch the exception and rename files as required, although this would break existing links to the files. By the way, I noticed that this same exception pops up when you upload a file via ajax file field that collides with an existing file, so we need to catch it there too."

file_save_upload() and file http://api.drupal.org/api/drupal/includes--file.inc/function/file_destin... already handle duplicate file names, is there any reason not to just use those here? I can live with breaking links compared to actually losing file upload records. There are plenty of other reasons why hard coded links to a file might break.

mfb’s picture

file_destination() uses file_exists() which is case-sensitive (on case-sensitive filesystems). we'd need a case-insensitive version of it. Or else we have to catch the database exception and then go back and rename the file after the fact.

mfb’s picture

Status: Active » Needs review
FileSize
3.02 KB
PASSED: [[SimpleTest]]: [MySQL] 27,024 pass(es). View

This is what it would take to support binary collation. With this patch there's a way to bring MySQL's case-insensitive unique keys, = operator, etc. in line with the default case-sensitive behavior in other databases. Supporting case-insensitive unique key + case-sensitive filesystem would be a bit more effort..

mfb’s picture

FileSize
3.27 KB
PASSED: [[SimpleTest]]: [MySQL] 27,312 pass(es). View

Well, except that the schema would have to be patched in system_update_7034 too to fix this bug :p

Berdir’s picture

I discovered a related problem with date formats, see #1012620: Unique key on date_formats.(format|type) is problematic for case insensitive collations. Maybe other tables/columns could be affected as well as this is quite a generic problem?

bfroehle’s picture

FileSize
4.53 KB
PASSED: [[SimpleTest]]: [MySQL] 31,565 pass(es). View
1.74 KB
FAILED: [[SimpleTest]]: [MySQL] 31,481 pass(es), 2 fail(s), and 1 exception(es). View

I've updated mfb's patch in #8 to account for additional schema updates since that patch was written. Also I've included an upgrade test for this, which I have run successfully on a MySQL-based Drupal installation. (Edit: tests also pass on a sqlite installation, so it must treat varchar as case sensitive as well).

bfroehle’s picture

Title: system_update_7061() fails on inserting files with same name but different case » DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case
Component: system.module » database system

Kicking this over to the database folks so one of them can look at the proposed BINARY attribute.

Berdir’s picture

+++ modules/system/system.install
@@ -818,6 +818,7 @@ function system_schema() {
@@ -2203,6 +2204,7 @@ function system_update_7034() {

@@ -2203,6 +2204,7 @@ function system_update_7034() {
         'length' => 255,
         'not null' => TRUE,
         'default' => '',
+        'binary' => TRUE,
       ),
       'filemime' => array(
         'description' => "The file's MIME type.",

No point in changing an existing update function, it won't be run again for already upgraded systems and the others will run 7070 anyway.

Powered by Dreditor.

mfb’s picture

@Berdir, the reason for this is to patch the schema prior to inserting data in the file_managed table @ system_update_7061 (to prevent the bug this issue is about)

bfroehle’s picture

Component: database system » system.module

Just marked #1049938: upgrade 6->7 system_update_7061 failed as a duplicate. +1 to RTBC'ing this.

justdave’s picture

I just encountered this in my d6 -> d7 upgrade. Had to delete a file (fortunately it was one a user had accidentally uploaded twice and somehow changed the case of the extension in between, so it really was a duplicate and I could just nuke it from the files table) to make my upgrade work.

Alan D.’s picture

This is possible a more widespread issue. Along with #1068132: Duplicate files are not renamed successfully, I found a similar thing in hook_date_formats() that has an unique format index in the database. Thus this:

/**
 * Implements hook_date_formats().
 */
function almanac_date_formats() {
  return array(
    // MySQL throws an error here if both are returned.
    array('type' => 'time', 'format' => 'g:i a', 'locales' => array()),
    array('type' => 'time', 'format' => 'g:i A', 'locales' => array()),
  );
}

threw a PDO exception.

Are there ways to make the db layer case sensitive rather than refactoring the data types?

bfroehle’s picture

#10: 966210-tests-and-fix.patch queued for re-testing.

Berdir’s picture

@Alan D.: There is already an issue open about the date issue, see #1012620: Unique key on date_formats.(format|type) is problematic for case insensitive collations. We do need this patch to get in before that can be fixed.

swentel’s picture

Annoying bug, can confirm the patch works, but I'll let the database guys set it on RTBC.

- edit - Note, this is also needed for vannilla D7 installs.

bfroehle’s picture

Component: system.module » database system

Any of the database folks want to chime in?

mfb’s picture

FileSize
5.46 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 966210-database-binary-D7.patch. Unable to apply patch. See the log in the details link for more information. View

Rerolled with just a slight tweak to the logic in createFieldSql() method, to support BINARY attribute whether or not length is present. VARCHAR columns require length attribute, but we also want to support other column types (TEXT etc.) for which length is not required.

mfb’s picture

FileSize
5.46 KB
PASSED: [[SimpleTest]]: [MySQL] 31,571 pass(es). View

renaming file so it gets tested...

bfroehle’s picture

Version: 7.x-dev » 8.x-dev
Issue tags: +needs backport to D7
FileSize
2.68 KB
PASSED: [[SimpleTest]]: [MySQL] 29,394 pass(es). View

I've created a version of #22 for D8. This omits the changes to system_update_7061() and the new system_update_7070(), but does include the additional 'binary' option in the DB schema for the URI column of {file_managed}.

THIS PATCH SHOULD NOT BE APPLIED TO D7 --- instead use the patch in #22.

Damien Tournoud’s picture

Status: Needs review » Needs work
Issue tags: +API change
+ *     - 'binary': For type 'char', 'varchar' or 'text' fields on MySQL, forces
+ *       a case-sensitive binary collation. This has no effect on other database
+ *       types for which case sensitivity is already the default behavior.

Ok on the principle for this API change, but this is not at all specific to MySQL. A binary column should be able to store any sequence of bytes. On PostgreSQL, those columns should be implemented as bytea (PostgreSQL doesn't have a fixed-length binary type). Nothing is needed for SQLite (the default collation there is BINARY).

So needs work on:

  • Adding the PostgreSQL implementation,
  • Fixing the doc block to remove any mention of MySQL,
  • Removing 'text' from this list: we already have a blob type.
mfb’s picture

Status: Needs work » Needs review

The BINARY attribute in MySQL is not to store binary data, that would be VARBINARY and BLOB. It's to store text with a defined character set but a binary case-sensitive collation. The BINARY attribute automatically chooses an appropriate collation for the column's character set. E.g. varchar(255) BINARY is equivalent to varchar(255) CHARACTER SET utf8 COLLATE utf8_bin if the table has DEFAULT CHARSET=utf8.

For Postgres and Sqlite, operations on varchar and text columns are case-sensitive by default, so as far as I know setting binary collation is only useful for MySQL.

mdm’s picture

I'm encountering this bug trying to upgrade 6.22 to 7.2 and it's blocking a full upgrade. I've tried applying the patch in #22 to a copy of Drupal 7.2 and it causes update.php to return an HTTP 416 error (Requested Range not satisfiable) when I point my browser at it. Is there any way I can work around this problem with system_update_7061?

catch’s picture

That sounds like a chrome bug and not related to this patch. Try clearing browser history or running the update with firefox.

sun’s picture

cmsproducer’s picture

Issue tags: +D6 upgrade

I have just worked through a very similar issue, but for me the problem did not seem to be rooted in the existence of multiple filenames with the same name but different case. To be sure, I even truncated the DB table 'upload' and was still not able to go beyond that DB schema update issue.

Looking in schema.inc and system.install, I see that the error is meant to be thrown if the upload DB table exists... which points to deleting the table as a solutions. Here is what I did to complete the update - Solution to Drupal 6 to 7 Upload/File module Database Schema update error. In my case, I did not have many/any uploads to worry about loosing, so I just dropped the table. I am wondering if this check for the table in system.install is preceded by some code to migrate the data from the upload module to the file CCK - Should a file-CCK field be created during installation/upgrade and associated with the data already in the upload table?

bfroehle’s picture

... any more feedback? There does need to be a way to force case sensitive collation in MySQL, for both this issue and #1012620: Unique key on date_formats.(format|type) is problematic for case insensitive collations.

makara’s picture

FileSize
4.84 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 966210_db_case_sensitivity.patch. View

Rebuilt the patch in #22 for D7, because we are now on system_update_7072().

Status: Needs review » Needs work

The last submitted patch, 966210_db_case_sensitivity.patch, failed testing.

makara’s picture

How can we get it tested on 7.x?

catch’s picture

Priority: Major » Critical
Status: Needs work » Needs review
FileSize
2.68 KB
PASSED: [[SimpleTest]]: [MySQL] 33,591 pass(es). View
4.84 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 966210_db_case_sensitivity_D7.patch. Unable to apply patch. See the log in the details link for more information. View

Since this means fatal errors in the Drupal 7 upgrade path, and we have no idea how many sites are affected, I'm bumping this back to critical.

I'm re-uploading both D8 and D7 patches, but named so the bot doesn't choke. No commit credit please.

@makara - you need to apply the D7 patch to Drupal 7, see http://drupal.org/patch/apply

sun’s picture

Issue tags: -D6 upgrade

Hm. This will potentially lead to an inconsistent/unpredictable behavior on Windows-based servers, since Windows' filesystem is case-insensitive. :-/

bfroehle’s picture

There are some other uses for this that don't relate to the filesystem --- see, for example, #1012620: Unique key on date_formats.(format|type) is problematic for case insensitive collations. So regardless of the schema change to the {file_managed} table I think allowing the option of forcing case insensitive collation is important.

xjm’s picture

Tagging issues not yet using summary template.

catch’s picture

FileSize
889 bytes
PASSED: [[SimpleTest]]: [MySQL] 35,801 pass(es). View

Wonder if this is enough to make tests fail.

Status: Needs review » Needs work

The last submitted patch, test.patch, failed testing.

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Needs work » Needs review

Moving down to 7 so the test can run. Likely we need a different test that just tests file insertion, but might as well let the bot run this while I'm on my way home.

catch’s picture

#39: test.patch queued for re-testing.

catch’s picture

Version: 7.x-dev » 8.x-dev
FileSize
1.06 KB
FAILED: [[SimpleTest]]: [MySQL] 33,623 pass(es), 0 fail(s), and 1 exception(es). View

Here's a test that's not tied to the upgrade path so could be applied to both D8 and D7, also fails locally with utf8_general_ci collation.

Status: Needs review » Needs work

The last submitted patch, test_only.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
3.3 KB
PASSED: [[SimpleTest]]: [MySQL] 33,620 pass(es). View

Drupal 8 patch combined.

sun’s picture

+++ b/modules/system/system.install
@@ -808,6 +808,7 @@ function system_schema() {
+        'binary' => TRUE,

My remark about Windows case-insensitive filesystems still stands though... or do we already check OS-level validity elsewhere before we reach the DB-level?

29 days to next Drupal core point release.

catch’s picture

Looks like we don't to me.

We could go back to #5 and open a separate issue for the schema addition?

sun’s picture

yeah, I think there are two issues combined in one here: 1) Allow to use BINARY, and 2) Fix the upgrade path somehow. The former could be moved into a separate issue as an API addition (to be backported).

catch’s picture

So if windows is case-insensitive, then maybe file_unmanaged_copy() should handle this somewhere. Probably requires normalizing names to lowercase or similar before doing anything with them.

That still leaves the upgrade path.

mfb’s picture

The only problem I know of re: case-insensitive filesystems would be if a Drupal site is migrated from a case-sensitive filesystem to a case-insensitive filesystem. Then there will be filename collisions when the files directory is copied over. However, this has been a problem with all previous Drupal versions, so it's arguably not being introduced by this patch.. :)

bfroehle’s picture

In the spirit of

yeah, I think there are two issues combined in one here: 1) Allow to use BINARY, and 2) Fix the upgrade path somehow. The former could be moved into a separate issue as an API addition (to be backported).

I've created a separate issue (#1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL) which hopefully will be less controversial and adopted more quickly.

xjm’s picture

Summary added.

xjm’s picture

Status: Needs review » Needs work

Setting NW based on #47-#50.

drewish’s picture

FileSize
1.93 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch files_966210.patch. Unable to apply patch. See the log in the details link for more information. View

Here's a re-roll that omits the bits that have been moved over to #1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL.

xjm’s picture

Status: Needs work » Needs review
bfroehle’s picture

The test will fail, of course, because #1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL isn't in yet.

Status: Needs review » Needs work

The last submitted patch, files_966210.patch, failed testing.

Jody Lynn’s picture

I think the patch will also need a change to system_update_7061 as I still got

Integrity constraint violation: 1062 Duplicate entry 'public://foo.jpg' for key 'uri'

I added binary attribute to files_managed.uri and files.filename and that seemed to do the trick. We'll need to do this at the start of 7061.

Jody Lynn’s picture

Status: Needs work » Needs review

NM, it's just that the above patch is for D8. For the D7 backport we also need to set files_managed.uri as binary when that table is created in system_update_7034.

drewish’s picture

Status: Needs review » Needs work
FileSize
1.72 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch files_966210-D7.patch. Unable to apply patch. See the log in the details link for more information. View

Here's a D7 version of the patch which changes system_update_7034() to include the binary flag. I suppose this is necessary to get D6 sites upgraded but it seems like we'll need another update to catch sites that have already launched on D7 or were already upgraded.

xjm’s picture

Status: Needs work » Needs review

Edit: Reading comprehension fail.

xjm’s picture

Status: Needs review » Needs work

And setting status back. Sorry for noise.

thill_’s picture

The patch in #60 did not work for me in solving the failure in 7061

clemens.tolboom’s picture

Not sure whether this is related but my D6 files table has multiple rows for the same filename.

Only the uid differ in this case. From the D6 253 entries I have an undubbed amount of 179 from which 119 end within D7.

/* D6 (undubbed) */
select filepath, filename, count(*) from files group by filepath, filename;
/ * D7 */
select count(*) from file_managed;

Example D6 output

mysql> select * from files where filename like 'aa%';
+-----+-----+-------------------------+-------------------------------+--------------------+----------+--------+-----------+
| fid | uid | filename                | filepath                      | filemime           | filesize | status | timestamp |
+-----+-----+-------------------------+-------------------------------+--------------------+----------+--------+-----------+
|  56 |   1 | AAR_Programme_clean.doc | files/AAR_Programme_clean.doc | application/msword |    37376 |      1 |         0 | 
| 274 |   2 | AAR_Programme_clean.doc | files/AAR_Programme_clean.doc | application/msword |    37376 |      1 |         0 | 
+-----+-----+-------------------------+-------------------------------+--------------------+----------+--------+-----------+
2 rows in set (0.00 sec)

leading to

Update #7061

    Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://AAR_Programme_clean.doc' for key 2: INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7); Array ( [:db_insert_placeholder_0] => 274 [:db_insert_placeholder_1] => 2 [:db_insert_placeholder_2] => AAR_Programme_clean.doc [:db_insert_placeholder_3] => public://AAR_Programme_clean.doc [:db_insert_placeholder_4] => application/msword [:db_insert_placeholder_5] => 37376 [:db_insert_placeholder_6] => 1 [:db_insert_placeholder_7] => 0 ) in system_update_7061() (line 2816 of /home/.../modules/system/system.install).
clemens.tolboom’s picture

xanchez’s picture

Thx #64 that really helped me

clemens.tolboom’s picture

@xanchez in what way did #65 helped you? What did you do to solve your problems?

I rechecked my site migration code to 'discover' my duplicate files seem to come from unpublished nodes. These come from a Drupal 5 site and are now deleted when upgrading to D6 thus preventing the D7 upgrade problem for system_update_7061 to fail.

I just deleted the files for unpublished nodes like 'DELETE FROM files INNER JOIN node n ON files.nid = n.nid WHERE n.status = 0'

Searching for 'duplicate files' unpublished I found a D6 mention how rules could produce the duplicates #457540: FileFramework + Rules = Duplicate Files.

sun’s picture

FileSize
1.51 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch drupal8.file-uri-case.68.patch. Unable to apply patch. See the log in the details link for more information. View
giorgio79’s picture

#29 solved it for me thanks

asoqa’s picture

Status: Needs work » Needs review
Issue tags: -API change, -D7 upgrade path, -needs backport to D7

#54: files_966210.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +API change, +D7 upgrade path, +needs backport to D7

The last submitted patch, drupal8.file-uri-case.68.patch, failed testing.

BTMash’s picture

Status: Needs work » Needs review
FileSize
1.49 KB
FAILED: [[SimpleTest]]: [MySQL] 36,323 pass(es), 0 fail(s), and 1 exception(s). View

Reroll of #68. As @sun stated, it still depends on #1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL.

Status: Needs review » Needs work

The last submitted patch, 966210_72.patch, failed testing.

bvanmeurs’s picture

clemens.tolboom’s picture

@bvanmeurs: Please add your steps from #1475124: Wrong charset for file_managed.uri causes Integrity constraint violation when inserting two filenames with different casing. to reproduce to this issues summary as there are great.

And :p ...
[Stock response from Dreditor templates and macros.]

Please do not subscribe anymore. Just click the Follow button next to the issue summary.

bvanmeurs’s picture

To reproduce:

  • Create two files on your desktop 'tree.jpg' and 'Tree.jpg' in separate folders.
  • On a Linux installation (or another case-sensitive file system) of Drupal, open a (new) node that has a field of type 'file'
  • Add the file 'tree.jpg' (don't save yet)
  • Then add the file 'Tree.jpg'
  • Now you get the following error:
    PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://Bomen.jpg' for key 2: INSERT INTO {file_managed} (uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array
    (
    [:db_insert_placeholder_0] => 1
    [:db_insert_placeholder_1] => Bomen.jpg
    [:db_insert_placeholder_2] => public://Bomen.jpg
    [:db_insert_placeholder_3] => image/jpeg
    [:db_insert_placeholder_4] => 770042
    [:db_insert_placeholder_5] => 0
    [:db_insert_placeholder_6] => 1331298860
    )

    (notice I used the files bomen.jpg and Bomen.jpg, the Dutch translation of trees ;-))

The error is obviously due to the fact that the column file_managed.uri is case insensitive (collation utf_general_ci) whereas it should have a case sensitive collation.

Solution: change the column to utf8_bin. This collation is case sensitive.

shumushin’s picture

Thanks bvanmeurs, you saved my day! Perfect fix.

And seriously, guys, this is one of the issues that blocks many people from moving from D6 to D7. Almost everyone use image field in D6.

iamEAP’s picture

Don't mean to add noise, but there was a very lengthy queue in CCK's content migrate script that I believe is a duplicate of this: #1176186: D6 -> D7 upgrade: Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'.

Just reiterating that this is indeed critical.

Berdir’s picture

Status: Needs work » Needs review
Issue tags: -API change, -D7 upgrade path, -needs backport to D7

#72: 966210_72.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +API change, +D7 upgrade path, +needs backport to D7

The last submitted patch, 966210_72.patch, failed testing.

iamEAP’s picture

Note that #72 was not a perfect re-roll; #68 has binary set on uri while #72 has it set on filename.

apratt’s picture

@iamEAP Doesn't it need to be set on both uri and filename?

Does system_update_7034 also need to have binary set in both uri and filename since it is creating the same table?

iamEAP’s picture

@apratt: yes to all of the above.

Berdir’s picture

Status: Needs work » Needs review
Issue tags: -API change, -D7 upgrade path, -needs backport to D7

#72: 966210_72.patch queued for re-testing.

Edit: The BINARY patch got in, triggering a re-test to see if this passed now. Will still need to be extend as discussed above.

Status: Needs review » Needs work
Issue tags: +API change, +D7 upgrade path, +needs backport to D7

The last submitted patch, 966210_72.patch, failed testing.

iamEAP’s picture

Status: Needs work » Needs review
FileSize
1.73 KB
PASSED: [[SimpleTest]]: [MySQL] 35,353 pass(es). View

Re-rolled for new test location, adding binary attribute to both URI and filename.

bfroehle’s picture

This will need an update script, right? (So that existing installations will also gain the proper collation?)

iamEAP’s picture

That's correct, but only in the 7 backport. This 8.x version won't need it.

webchick’s picture

Issue tags: +Drupal.org D7
drumm’s picture

FileSize
2.19 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 966210-d7.patch. Unable to apply patch. See the log in the details link for more information. View

Status: Needs review » Needs work

The last submitted patch, 966210-d7.patch, failed testing.

drumm’s picture

Status: Needs work » Needs review

8.x is still good, and 7.x should be good with the dependency. I haven't actually tested it on 8.x myself, so I won't say rtbc.

drumm’s picture

Status: Needs review » Needs work
FileSize
4.15 KB
FAILED: [[SimpleTest]]: [MySQL] 38,531 pass(es), 72 fail(s), and 98 exception(s). View

We also need to add binary to system_update_7061() and add another update to ensure it happens on previously-upgraded sites. 8.x needs these. I think the 7.x patch here is okay.

iamEAP’s picture

@drumm What does 8.x need? Anyone installing D8 fresh would get the binary attribute by virtue of it being in the schema definition, and anyone upgrading from 7.x would have to update to the latest 7.x as part of the upgrade process anyway, thereby getting your system_update_7073() anyway.

Unless I'm mistaken in my assumptions about what changes need to be made in 8.x.

drumm’s picture

Status: Needs work » Reviewed & tested by the community

That makes sense.

Status: Reviewed & tested by the community » Needs work
Issue tags: -API change, -D7 upgrade path, -needs backport to D7, -Drupal.org D7

The last submitted patch, 966210-d7.patch, failed testing.

iamEAP’s picture

Status: Needs work » Needs review
Issue tags: +API change, +D7 upgrade path, +needs backport to D7, +Drupal.org D7
webchick’s picture

#1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL has been committed now, so I think this should be unblocked.

Looks like #86 is the patch for D8?

iamEAP’s picture

Correct!

catch’s picture

Component: database system » file system
Status: Needs review » Reviewed & tested by the community

Marking #86 RTBC.

D7 patches posted against Drupal 8 now need to have a do_not_test suffix to avoid the test bot fwiw.

webchick’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Needs review

Committed and pushed #86 to 8.x.

Marking needs review for #93 on 7.x.

webchick’s picture

Status: Needs review » Needs work
Issue tags: +API change, +D7 upgrade path, +needs backport to D7, +Drupal.org D7

The last submitted patch, 966210-d7.patch, failed testing.

drumm’s picture

Assigned: Unassigned » drumm
Status: Needs work » Needs review
FileSize
5.39 KB
PASSED: [[SimpleTest]]: [MySQL] 38,707 pass(es). View

Better patch.

iamEAP’s picture

Picked a good time to upgrade my site. Glad these issues are getting plenty of attention via d.o upgrade.

#104 has my vote. Additional +1s for RTBC would be great.

Oleksa’s picture

patch #104 worked for me.

BTMash’s picture

I had a chance to look through the patch (which was a bit confusing since it build changing lines that were being added by the patch itself). It looks very good. I'd like to say this is +1 on rtbc, but I'd like to know what is going to happen to those that are using sqlite or postgres. #1518506: Normalize how case sensitivity is handled across database engines has had no movement so we should try and atleast have some manual testing on sqlite/postgres to see if it works properly. I'm testing out sqlite now so I'll update with what happens from that.

BTMash’s picture

I had a chance to test things out with sqlite and that worked well. I'll have to see if I can install postgres for testing purposes but if someone else is able to get to it before then, I would appreciate it a lot :)

webchick’s picture

Status: Needs review » Fixed

Perfect. If SQLite works I'm fairly confident that pgsql will work, since this patch was mainly to work around MySQL weirdness, and SQLite is un-weird in the same way as pgsql is un-weird, as far as I know. According to http://drupaltesting.org/ we're not currently passing tests in any of our alternate database systems. :( So I don't imagine this will make things worse, though it could. OTOH, there are a lot of people hitting this and it'd be great to get this critical closed. So I'm going to take a risk and commit this.

Committed and pushed to 7.x. THANK YOU!!!!

chx’s picture

Status: Fixed » Needs review
FileSize
803 bytes
PASSED: [[SimpleTest]]: [MySQL] 39,006 pass(es). View

OPSIE!

webchick’s picture

Status: Needs review » Fixed

Whew. Thanks. Committed and pushed to 7.x.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Dave Reid’s picture

I'm really really sorry to add this comment on a closed issue, but there appears to be no actual explanation or reasoning why {file_managed}.filename was made a binary column as well. Unlike {file_managed}.uri, it is not a unique index or primary key. It simply stores the file name. WHY WHY WHY was this converted? This makes it impossible for us to add Views filters for {file_managed}.filename that are case insensitive without having to hack around the system. This makes the Media library much more difficult for users. I don't see any reasoning for making this change.

Dave Reid’s picture

Follow up with #1691438: Regression: {file_managed}.filename should NOT be binary (and case-sensitive by default) please. I have no idea why comment #82/86 was allowed to add to the patch without any validation.

bfroehle’s picture

Yep, big wtf.

webchick’s picture

In the future, I'd recommend not commenting on a closed issue, but rather re-opening it, since I never saw this and only caught #1691438: Regression: {file_managed}.filename should NOT be binary (and case-sensitive by default) because it made it to the RTBC queue.

Dave Reid’s picture

I already apologized for commenting on a closed issue, but it was important to notify people that participated in this issue about something that should not have made it in and got missed in review.

webchick’s picture

Right. I'm saying that I don't think it actually notified people. :) I almost missed this reply again! Sneaky closed status. :)

So in the future if a problem like this comes up, and it can be cleanly linked to one particular commit that already has a bunch of people on it, probably makes more sense to re-open and comment there w/ follow-ups/patches. That's all.

webchick’s picture

Issue summary: View changes

Updated issue summary.

kenorb’s picture

kenorb’s picture

I'm experiencing still similar issue, but somehow I've identical files in files table (Private Upload?).

SQL Query to identify the duplicates:

SELECT * FROM files f WHERE filepath LIKE "%my_filename%"

So here is SQL query to remove these duplicates leaving the highest fid:

DELETE n1 FROM files n1, files n2 WHERE n1.fid > n2.fid AND n1.filepath = n2.filepath

Note that you can't run system_update_7061() twice, as it'll generate some other error (e.g. 1062 Duplicate entry '10' for key 'PRIMARY), so you should do this only at the first time and only once.

turbogeek’s picture

Thanks kenorb, your solution worked. As mentioned, system_update_7061() will generate another error if it's ran again, so I started fresh with my backed up DB then checked for filepath duplicates in the files table (there was only one in my case) and ran the update.