Problem/Motivation
You get an error while updating from a D6 site
system module
Update #7061
Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://sites/default/files/HotspotFinder.apk' for key 'uri': INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3,
Proposed resolution
This issue has no solutions yet.
Workaround / analysis
As boran suggests in #2, you can analyze your files table for the duplicate fids.
Some SQL to help you identify the duplicate filepath.
SELECT count(*), min(fid), max(fid), filepath FROM files GROUP BY filepath HAVING count(*) > 1 ORDER BY count(*);
Next, you can delete the lowest fids with the following SQL statements. Be aware that this does not fix the underlying problem, but it lets you test other parts of the upgrade process. Some nodes/users/etc. may end up missing their fids.
CREATE TABLE temp_fids AS SELECT min(fid) fid FROM files GROUP BY filepath HAVING count(*)>1;
DELETE FROM files WHERE fid in (SELECT fid FROM temp_fids);
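For illustration, the two-step workaround above can be sketched against a throwaway SQLite copy of the D6 files table. This is a hypothetical model with invented sample rows, not a tested migration script; on a real site, back up the database and run the equivalent MySQL statements.

```python
# Sketch of the workaround: keep the highest fid per duplicated filepath,
# delete the lower ones. Column names mirror the D6 {files} table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (fid INTEGER PRIMARY KEY, filepath TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [
    (54, "sites/default/files/HotspotFinder.apk"),   # older duplicate
    (56, "sites/default/files/HotspotFinder.apk"),   # newer duplicate
    (60, "sites/default/files/other.pdf"),
])

# Step 1: collect the lowest fid of every duplicated filepath.
conn.execute("""CREATE TABLE temp_fids AS
    SELECT min(fid) AS fid FROM files
    GROUP BY filepath HAVING count(*) > 1""")

# Step 2: delete those rows, keeping the highest fid per filepath.
conn.execute("DELETE FROM files WHERE fid IN (SELECT fid FROM temp_fids)")

print(sorted(fid for (fid,) in conn.execute("SELECT fid FROM files")))
# → [56, 60]: fid 54 (the older duplicate) is gone
```

As the summary notes, this only unblocks the upgrade for testing; any node or field still referencing the deleted fid 54 is left dangling.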
As observed in #11, a relation between published and unpublished nodes could be a cause.
Remaining tasks
Is this related to #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case
Original report by boran
system module
Update #7061
Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://sites/default/files/HotspotFinder.apk' for key 'uri': INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7); Array ( [:db_insert_placeholder_0] => 56 [:db_insert_placeholder_1] => 38 [:db_insert_placeholder_2] => HotspotFinder.apk [:db_insert_placeholder_3] => public://sites/default/files/HotspotFinder.apk [:db_insert_placeholder_4] => application/octet-stream [:db_insert_placeholder_5] => 507667 [:db_insert_placeholder_6] => 1 [:db_insert_placeholder_7] => 1253524245 ) in system_update_7061() (line 2808 of /disk2/www/drupal7/modules/system/system.install).
On the D6 site there are entries in the file table with the same filepath:
| fid | uid | filename | filepath | filemime | filesize | status | timestamp
| 54 | 38 | HotspotFinder.apk | sites/default/files/HotspotFinder.apk | application/octet-stream | 178090 | 1 | 1252920928 |
| 56 | 38 | HotspotFinder.apk | sites/default/files/HotspotFinder.apk | application/octet-stream | 507667 | 1 | 1253524245 |
On D7 the file_managed table has the uri column as a unique key.
So the upgrade script fails, as noted above, when one tries to insert a file with the same path.
Possible solutions:
- If there are duplicates, only upgrade the first one, reference that, and issue a warning to the user. The user will need to manually update any non-core modules that might reference the duplicate file entries.
- Make a copy of the file for each duplicate and reference that.
Comment | File | Size | Author |
---|---|---|---|
#135 | log.txt | 8.54 KB | drzraf |
#124 | interdiff.txt | 2.62 KB | David_Rothstein |
#124 | 1260938-124.patch | 7.97 KB | David_Rothstein |
#123 | interdiff.txt | 3.57 KB | dawehner |
#123 | 1260938-123.patch | 8.01 KB | dawehner |
Comments
Comment #1
wessex CreditAttribution: wessex commented
I have just experienced this error too, upgrading from 6.22 > 7.8.
Is there a workaround for the time being?
Comment #2
boran CreditAttribution: boran commented
I was working on a patch to handle the first file and warn about the others, but am lacking time right now.
Suggestion:
Query your file table to see what duplicates you have (do a GROUP BY on the filepath). If these are very few:
- note the fid, and find out which other tables reference the fid (CCK fields, nodes, ...)
- manually delete the duplicates from the file table,
- upgrade to 7.x,
- manually correct the affected nodes/fields, or whatever the affected file was attached to.
Comment #3
catch
Duplicate of #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case.
Comment #4
clemens.tolboom
@catch: this report does not have different-case file names.
I reopened this as I guess the root cause here is different.
As I commented on #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case (http://drupal.org/node/966210#comment-5116402), my system had no case-sensitivity problem but duplicate entries based on uid in the files table.
What worries me here is that the mentioned files have different sizes. Which one should be used?
Comment #5
clemens.tolboom
I tried to delete the duplicate file rows by first checking how many duplicates there are
[edit: replaced filename by filepath, see comment #6]
then defining a delete query,
which is not allowed by MySQL, so I created a temp table holding the lowest fid to delete,
then deleting the rows to further test the upgrade.
My 2 cents.
Comment #5.0
clemens.tolboom
Fix closing tag.
Comment #6
clemens.tolboom
Hmmm, comment #5 should mention filepath instead of filename, as this is about the D7 file URI schemes like public://img.png.
Comment #7
Deciphered CreditAttribution: Deciphered commented
#5,
Based on information on the MySQL site, this is the query I used to work around this issue myself:
Basically it does the same thing.
I still don't know what effect removing rows from the files table will have on the site, but I would guess that it's not really a good thing. If anything, there should be some sort of find-and-replace functionality that replaces the removed fids with the leftover fids.
Comment #8
clemens.tolboom
@Deciphered #7
What you did was drop the unique constraint on the filepath column. And others.
(I would keep the files table by renaming it instead of dropping it.)
So yes ... that is a nice temporary workaround for testing the rest of the upgrade process.
But no ... I guess when you delete one of the fids through the UI, the file will be deleted too. So the other fid becomes useless.
Comment #9
catch
So there was never a unique key on filepath in Drupal 6, but we have a unique key on uri in D7. This looks like at least major to me. Also tagging with upgrade path.
Comment #10
catch
Comment #10.0
clemens.tolboom
Updated issue summary.
Comment #11
clemens.tolboom
A very weird observation I have is this SQL. Note this is data from a D5 site.
This gives a list like this:
That is, _ALL_ duplicate filepath entries have different node.status.
Maybe the cause was #92820: uploaded file attached to multiple nodes when submitting with two node/adds open
So my fix could be deleting all the files of all unpublished nodes and letting the site owner repair them based on status and title.
[edit]
DELETE files FROM files INNER JOIN node n ON files.nid = n.nid WHERE n.status = 0;
[/edit]
(Puzzle for this particular site is why they created duplicate node entries in the first place)
Comment #12
clemens.tolboom
A problem with #11 is that the nids of the unpublished and published nodes are not next to or near each other. They differ by between 94 and 220. But the published nodes all have consecutive nids.
I should analyze the created and updated times. Maybe next time, as I still have to upgrade a site :)
Comment #13
giorgio79 CreditAttribution: giorgio79 commented
Appreciated, Clemens; your solution at #5 allowed me to execute the system update, so far so good. I have other errors now with other modules, but hopefully I am a step closer to a functioning D7 site :)
Comment #14
markwk CreditAttribution: markwk commented
Running into the same error. I'd say this is major.
Comment #15
clemens.tolboom
@markwk: it is quite useless to just say "me too" :)
Please tell us more about your particular site. And ... :)
[Stock response from Dreditor templates and macros.]
Please update the issue summary by editing the issue. This helps people to understand this issue quicker.
Comment #16
markwk CreditAttribution: markwk commented
@clemens.tolboom: it's even more "quite useless" to say what is useless or useful for someone else :P
I'm updating a site to D7 that started as a hybrid vBulletin/D5 setup and is now in D6. It has lots of files, and due to the nature of uploads quite a few have the same names with different cases. I'm currently just deleting the duplicates in the db so I can move forward with other upgrade issues.
I don't really feel like digging into a core issue/patch to solve this, so I'm revising things on my end to make the Drupal upgrade happy.
I haven't gotten around to coding a solution but since it involves only 60-100 files, I'll probably add two more steps to the bash script:
1. Take the duplicate files and append _1 to the filename, so where filename.doc and FileName.doc are creating problems, the more recent one will become Filename_1.doc.
2. Rename the filepath with the revised file name.
Comment #17
Heine CreditAttribution: Heine commented
@markwk, this issue is not about files that only differ in case. Issue #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case deals with those.
I too encountered this on a D6-to-D7 upgrade of a site that was upgraded from D5. The issue seems to be caused by node revisions created on the D5 site. Even though the revisions in my case point to the same filepath, they are all linked to different fids via the upload module's db table.
If others can confirm, and we can indeed pinpoint upload as the culprit, a proper course of action would be to find all fids pointing to the same filepath, keep one record, and update the upload table to replace the spurious fids with the saved one. A good place to do so would IMO be in a new hook_update_N for a future D6 release.
Comment #18
boran CreditAttribution: boran commented
I don't see how we can do a generic fix, unless one decides to take the first file, ignore the duplicates, and print a warning.
The impact of ignoring the duplicate will be site-specific, i.e. it could make little difference or mean content is lost.
However, by not providing an automated fix, sysadmins are obliged to dive into the SQL and make the fixes manually. Although they lose time, they are at least more aware of the data-loss risks. This issue does not seem to affect that many sites? (It would be nice if one could see the number of followers of this thread :)
Comment #19
Heine CreditAttribution: Heine commented
@boran, if the cause of this is the D5 upload module (and not the filesystem itself), it should be possible to provide a generic fix.
Comment #20
boran CreditAttribution: boran commented
Ah. In my case there is only D6, no D5.
Comment #21
scott.whittaker CreditAttribution: scott.whittaker commented
I'm also having this issue as a result of upgrading a large site from Drupal 5 to 6 and finally 7. I have over 9,000 files with duplicate names, between 2 and 2,500 copies of each, leaving a total of over 28,000 conflicting files.
None of these are due to revisions. Simply deleting references to copies is not an option since that will result in thousands of pages with missing content. That only leaves a few options, none of them good:
1. Change the definition of the file_managed table to allow the uri to be non-unique. That would be quick and easy, but I have no idea what problems that might cause down the line.
2. Create a table mapping duplicate file paths to the first fid with that path, reduce the file_managed table, and work through all fields that reference files, replacing the fids. This might also have issues down the line since multiple content items will reference the same file row, and who knows what will happen if one of those content items deletes a referenced file?
3. Make sure I have a copy of all the referenced files and make a script to duplicate and rename all the referenced files, then update the files table with the new filenames and filepaths before migrating again. This would probably be the safest but will result in thousands of duplicate files, not to mention being time consuming to implement.
Any advice?
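For reference, option 2 (a mapping table plus fid rewriting) can be sketched roughly as follows. This is an illustrative Python/SQLite model with invented sample data, not something tested against a real Drupal database; the real job would also have to cover every CCK/field table that stores fids, not just {upload}.

```python
# Sketch: map every duplicate fid to the first (lowest) fid sharing the
# same filepath, rewrite references in {upload}, then drop the duplicates.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (fid INTEGER PRIMARY KEY, filepath TEXT);
CREATE TABLE upload (fid INTEGER, vid INTEGER);
INSERT INTO files VALUES (129, 'sites/foo/files/bar.jpg'),
                         (134, 'sites/foo/files/bar.jpg'),
                         (140, 'sites/foo/files/baz.png');
INSERT INTO upload VALUES (129, 786), (134, 792), (140, 800);
""")

# fid -> canonical fid, for every non-canonical duplicate.
mapping = {fid: keep for fid, keep in conn.execute("""
    SELECT f.fid, m.keep FROM files f
    JOIN (SELECT filepath, min(fid) AS keep FROM files
          GROUP BY filepath HAVING count(*) > 1) m
      ON f.filepath = m.filepath
    WHERE f.fid <> m.keep""")}

for old, new in mapping.items():
    conn.execute("UPDATE upload SET fid = ? WHERE fid = ?", (new, old))
    conn.execute("DELETE FROM files WHERE fid = ?", (old,))

print(mapping)  # → {134: 129}
print(list(conn.execute("SELECT fid, vid FROM upload ORDER BY vid")))
# → [(129, 786), (129, 792), (140, 800)]
```

The mapping dict is worth persisting (e.g. as a table), since it is the only record of which old fids were collapsed into which survivors.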
Comment #22
Heine CreditAttribution: Heine commented
Scott, can you check if those files were managed by the upload module?
Comment #23
clemens.tolboom
@Heine: How could one check for that? In #17 you mentioned the upload module. Could you elaborate a little on that, i.e. with some SQL example?
Comment #24
Jumoke CreditAttribution: Jumoke commented
Scott, I am upgrading a site from D5 and have run into the same issue with multiple duplicate files. What was your resolution?
Comment #25
James Tipler CreditAttribution: James Tipler commented
I hope I'm not about to post this twice; the post failed last time.
This script resolves the issue by passing over the database, generating symlinks for each duplicate file, and then updating the "filepath" attribute, so that when the URIs are generated they all point to unique locations. Not an ideal solution, but if you have less than a couple of thousand duplicates it only causes 2 or 3 megabytes of fluff in the file system.
I was a bit lazy and created it with absolute paths, so you'll need to adjust it a bit to be appropriate to your specific Drupal installation.
Obviously this assumes a Linux-based install.
Hope that helps someone.
Comment #26
Rhino CreditAttribution: Rhino commented
Is there no better solution for this? Like Scott, I have several thousand files and am moving up from Drupal 4.7 to 5 to 6, and now I want to upgrade to 7. Should I even bother? What happens between 7 and 8?
Comment #27
Danny Englander
@James Tipler: I am curious about your code in #25. Where do you put it, and at what point does it get run? Thanks.
Comment #28
splash112 CreditAttribution: splash112 commented
Had the same problem upgrading a module.
Could a solution be something like:
At least the upgrade now works and the file can be associated with the right entity.
Comment #29
fgm
Same problem here, on a site upgraded from 4.7 to 5, then 6, and now upgrading to 7. After looking into it a bit, I think it happens because in earlier versions upload could allow different revisions of the same node to use the same uploaded file, whereas more recent versions (D6) renamed duplicate files. In my case, I notice the entries in {upload} have the same description, belong to the same nid, and different file objects have been created: notice how the first entry is for fid 129 and all the later ones for fid 134, bound to the same actual file path. (Note: file and site names anonymized.)
fid vid description list nid weight
129 786 bar.jpg 0 217 0
134 792 bar.jpg 0 217 0
134 795 bar.jpg 0 217 0
134 796 bar.jpg 0 217 0
134 799 bar.jpg 0 217 0
134 803 bar.jpg 0 217 0
134 813 bar.jpg 0 217 0
134 815 bar.jpg 0 217 0
In turn, just before update 7061, this gives these entries in {files}
fid uid filename filepath filemime filesize status timestamp
129 57 bar.jpg sites/foo/files/bar.jpg image/jpeg 36131 1 0
134 57 bar.jpg sites/foo/files/bar.jpg image/jpeg 36131 1 0
And the duplicate happens because the update tries to insert an entry for fid 134 in {file_managed} after it has first inserted this row:
File ID uid filename uri filemime filesize status timestamp
129 57 bar.jpg public://sites/foo/files/bar.jpg image/jpeg 36131 1 0
But, obviously, this URI would be a duplicate of the original one.
Considering this, I think there cannot be a database-only solution: any completely correct solution should work around the different file-reuse cases by replicating reused files, renaming the copies as needed, referencing the updated file name instead of the original one, and only then applying the DB update strategy.
In my case, this happened for only one fid, so I just removed the offending node revision before running update 7061, removed fid 129 manually from files and upload, and all went like a charm.
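The "replicate and rename" strategy described above can be sketched as follows. This is a simplified illustration; the `deduplicate_paths` helper is a hypothetical name, it only computes new filepaths, and the real scripts in #25 and #39 also copy or symlink the files on disk and update the files table.

```python
# Sketch: give every repeated filepath a unique _1, _2, ... suffix before
# the extension, keeping the first occurrence unchanged. Does not check
# whether e.g. bar_1.jpg already exists on disk.
import os
from collections import Counter

def deduplicate_paths(paths):
    """Return a list where repeated paths get _1, _2, ... before the extension."""
    seen = Counter()
    result = []
    for path in paths:
        n = seen[path]
        seen[path] += 1
        if n == 0:
            result.append(path)  # first occurrence keeps its name
        else:
            root, ext = os.path.splitext(path)
            result.append(f"{root}_{n}{ext}")
    return result

print(deduplicate_paths([
    "sites/foo/files/bar.jpg",
    "sites/foo/files/bar.jpg",
    "sites/foo/files/baz.png",
]))
# → ['sites/foo/files/bar.jpg', 'sites/foo/files/bar_1.jpg', 'sites/foo/files/baz.png']
```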
Comment #30
David_Rothstein CreditAttribution: David_Rothstein commented
This sounds like a critical bug to me.
Comment #31
flaviovs CreditAttribution: flaviovs commented
This is how I managed to work around this issue and upgrade a site from D6 to D7:
After migrating to D7, don't forget to perform the D6 CCK to D7 fields migration.
Comment #32
flaviovs CreditAttribution: flaviovs commented
WRT the last post, one should note that the presented workaround only allows upgrading from D6 to D7. When the CCK migration is run, migrating the (FileField) fields to core files will abort if the non-unique files are not addressed.
Comment #33
iamEAP CreditAttribution: iamEAP commented
Resetting to "active," since there's no actual patch for review.
Before anyone else spends time looking for solutions, we actually need to better understand the roots of the problem; and more importantly, because this issue is now blocking D8, determine whether or not Drupal core itself is responsible for the duplicate file path entries. Best comment I've seen so far in this vein was #17.
Next step should be to verify that a D5 site with revisions and files generates duplicate entries. Then, upgrade to D6 and verify that the duplicate entries persist. If that's the case, we can push this back to Drupal 6 and write the proposed update function. If not, we either continue searching for a Core problem that produces the issue, or we cede that the problem is coming from an API-incompatible contrib module and close this.
Comment #34
iamEAP CreditAttribution: iamEAP commented
Installed D5, enabled node revisions, created a node with a file uploaded. Created a new revision of the node (which creates a new file revision). Created a new revision of the node with the file deleted, and then created a final revision of the node, uploading the exact same file/filename (which creates a new file entry, appending _0 to the file path).
Then, upgraded to D6. There is no duplicate entry created in the files table.
Conclusion: it's probably not a core process that produces this, at least with the last D5 and the latest D6.
Unless we find a common thread that's core-related, this may be a won't fix.
Comment #35
flaviovs CreditAttribution: flaviovs commented
I experienced this issue when doing a D6->D7 upgrade of a site that never had a D5 version, so I doubt that the issue is D5-related.
However, the site makes heavy use of revisions (by the use of the Revisioning module), so maybe there's other revision-related use cases that trigger the bug?
Comment #36
jmrivero CreditAttribution: jmrivero commented
#7 worked to complete the upgrade... almost.
The double grouping keeps files with the same name but different paths in the table, so I used this instead:
And then cleared the system_update table:
Comment #36.0
jmrivero CreditAttribution: jmrivero commented
Added another analysis for sites originating in D5 and upgrading to D7.
Comment #37
mheinke CreditAttribution: mheinke commented
Experiencing this exact issue with a site upgrade from 6.x to 7.x. Like others, it's impossible for me to just delete duplicates. Is there a solution to this?
Comment #38
mheinke CreditAttribution: mheinke commented
I got around this by doing a fresh install of Drupal 7, inserting the Drupal 6 content tables, using CCK migrate to migrate content types, then Migrate D2D to port content.
Not ideal, and it took about 7-8 hours, but it worked.
Comment #39
Pinolo
Having a limited set of duplicated filepaths (~200), I created a modified version of the script proposed in #25. Instead of creating symlinks, it copies the relevant files, adding a suffix to the filename and updating the files table accordingly.
Also, the script is now a bit more portable, while still being rather inefficient…
Comment #40
mjcarter CreditAttribution: mjcarter commented
The script in #39 works great.
I have made some improvements with regard to error handling and behaviour if there are problems (such as if the file referenced in the DB doesn't exist) as well as upgrading the script to use the mysqli database layer.
I also added the necessary function call to invoke the script at the bottom - this is where you should edit in your database details.
Comment #41
Trejkaz CreditAttribution: Trejkaz commented
The additional suggestion in #36 nearly worked for us. We did execute all the statements in #36:
But we also found that the file_managed table had partial updates in it, so we also had to run: DELETE FROM file_managed;
And on the next attempt, the migration finally worked.
So I think there are a number of bugs here:
Comment #42
jessZ CreditAttribution: jessZ commented
I am having this issue. I'm not much of a code person, and I could live with re-uploading some of the files, but there are over 1000 nodes with attachments. How can I instruct the ISP/tech people doing the upgrade to run a script to resolve the following error and get past 7061?
PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '70' for key 'PRIMARY': INSERT INTO {system_update_7061} (vid) SELECT DISTINCT u.vid AS vid FROM {upload} u; Array ( ) in system_update_7061()
Tried the script but same error.
Comment #43
jessZ CreditAttribution: jessZ commented
SELECT count(*), min(fid), max(fid), filename FROM files GROUP BY filename HAVING count(*) > 1 ORDER BY count(*)
I only found 3 x2 (6 files with the same filepath) and fewer than 25 x2 with the same filename. How can I group via columns in phpMyAdmin for a fast delete?
I deleted all the duplicate files, but still get the same error. Any other suggestions on workarounds?
• Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '70' for key 'PRIMARY': INSERT INTO {system_update_7061} (vid) SELECT DISTINCT u.vid AS vid FROM {upload} u; Array ( ) in system_update_7061() (line 2804 of /home/reimagi8/public_html/new/modules/system/system.install).
Comment #44
jessZ CreditAttribution: jessZ commented
Aha! Clear system update!
DELETE FROM system_update_7061;
New message!
The following updates returned messages
system module
Update #7061
Upload module has been migrated to File module.
Now let's go see what I got.
Yes! Now I can see the PDF attachments. Now on to the image module headache.
Comment #45
kbahey CreditAttribution: kbahey commented
This is a modified version of the script in #40 with some fixes.
First, a fix to handle UTF-8 in file names, and to not abort when accessing a file as stored in the database.
Second, a fix to rename the files with the unique fid, rather than repeatedly adding _fclean_ until the length of the column is exhausted and the name gets truncated.
I also removed the HTML tags, so it runs from the command line.
Ideally, this script should be Drupalized, using the proper DB API, rather than accessing MySQL directly. But that is for another day ...
Comment #46
David_Rothstein CreditAttribution: David_Rothstein commented
Does this issue affect the Drupal 6-to-8 migration path too?
Given the nature of it (data in Drupal 6 databases that doesn't work with the uniqueness assumptions of later Drupal versions), I would expect that it does, but I haven't tested it.
Comment #47
dawehner
@amateesc and myself discussed this, and we are wondering why we can't fix the problem on the DB layer and "ignore" where it's coming from.
From reading the issue it's clear you have some duplicate filepaths inside the {files} table. We should be able to
group by them and then update the upload records to point to just the first fid for all of them. That by itself should be pretty reasonable and doable.
Do people think that this is enough to fix this critical upgrade-path issue? Yes, it's not clear why exactly this happens, but at least it fixes the issue.
Comment #48
fgm
IIRC this appears only on D6 sites upgraded from D5, when file attachments were changed from being per-node to being per-revision, causing inconsistent behaviors for nodes where files changed over the course of revisions.
Comment #49
C-Logemann
@fgm: I am currently working on a customer upgrade project with duplicate paths. This site was not upgraded from D5.
Comment #50
dawehner
@C_Logemann
Can you think of another reason, maybe the usage of some filefield module, why you ended up with those duplicated paths?
Comment #51
C-Logemann
@dawehner
Especially in this case it is possible that the old programmers did something wrong when they imported existing data from an old custom CMS. But I don't have access to the old import code to check.
In my current test upgrade process I run some MySQL commands to solve the problem. When I found this issue I thought there was maybe something wrong with D6, like the duplicate entries in the search table which sometimes break an import in MySQL.
Comment #52
boran CreditAttribution: boran commented
My original post was for a D6 site which had never been on D5. There was little custom programming. I no longer have the site or DB to experiment with, though.
Comment #53
Fabianx CreditAttribution: Fabianx commented
I agree with dawehner; it's the references that need to be updated when de-duplicating the {files} table, not the files themselves.
Comment #54
David_Rothstein CreditAttribution: David_Rothstein commented
So that would basically be the first option listed in the issue summary:
With the downside mentioned there.
(Basically, system_update_7061() is trying to preserve the existing file IDs, but it can't if we throw away some.)
But I guess that might still be preferable to the alternative of creating lots of copies of the same file... And we could consider keeping a record somewhere of the old file IDs and which new ID they point to, in case any other code out there needs it.
Comment #55
jelo CreditAttribution: jelo commented
My site was not upgraded from D5 to D6. My problem might have come from the filefield_sources module. A feature of that module was that you could reference existing files from any other node. I am not sure how it was implemented, but it sounds like a logical candidate to have stored a duplicate file path.
Comment #56
dawehner
This so far is just some example data which could look like the problematic data, right?
Comment #57
Fabianx CreditAttribution: Fabianx as a volunteer commented
Testbot: Go!
Yes, the example data looks great!
Comment #61
dawehner
Comment #63
Fabianx CreditAttribution: Fabianx as a volunteer commented
#61: Without looking too closely at the current patch, I think it should be a two-step process and certainly a pre-step to doing any import/conversion.
This can work purely on the D6 data structures:
a) Create a mapping table from the duplicates (that is always useful later if you need to clean up more data manually) and remove the duplicates; now {files} is sane.
b) Clean up field records => now those are sane too, and map to the right de-duplicated {files} rows again based on the mapping table.
No need to deal with it in the middle of the conversion, this could be one of the first steps really.
Comment #64
dawehner
Comment #65
dawehner
I'd prefer not to introduce a new update method in the middle of it, given that this will trigger at least one new schema update for all the D7 sites out there.
Given that, the easiest way for #63 would be to not make it a two-step process but simply fill up that mapping table over time.
Comment #67
dawehner
Back to green; I was not able to use DBTNG correctly.
PS: Uploading a patch against D7 using git diff 8.0.x might take you a while.
Comment #69
dawehner
PHP 5.4 does not have short array syntax.
Comment #70
fgm
@dawehner, yes it does: http://php.net/manual/fr/migration54.new-features.php
But you probably meant some earlier version, since D7 does not require PHP 5.4 (only recommends it); the minimum is 5.2.5.
Comment #71
dawehner
Hehe, yeah, I meant to say that versions before 5.4 did not have the feature yet.
@fgm
Do you think this simple patch makes some kind of sense?
Comment #72
neclimdul
The patch makes some sense to me, but we don't ever use the mapping table. Is that on purpose?
Comment #73
fgm
/me cringes seeing a db_query in a for loop, but... I'm not sure of the whole logic. It seems the goal is to find duplicate fids for filepaths, so wouldn't it be possible to just query for all duplicates at once, like
SELECT count(fid), filepath FROM {files} GROUP BY 2 HAVING count(fid) > 1
to avoid having such a loop?
Comment #74
neclimdul
You could use a HAVING like that to limit the iterations, but you'd still need a select in the for loop.
Comment #75
dawehner
Well, we certainly want to avoid running just one query, because that won't scale. Iterating over fids makes it easier. Feel free to improve it, though.
Well, Fabianx asked for one.
Comment #76
dawehner
Thank you @neclimdul again for sharing the database dump. Sadly, it does not contain the problems described as part of this issue.
That database dump did not contain some of the problems described pretty well in #29, as the number of entries in {upload} wasn't enough, though I have seen the corresponding data in {files}, see query:
Aside from that, the patch fixes the problem that at least @fgm has, so it can get in ...
Comment #77
vijaycs85
We might want to make sure that the table doesn't already exist?
I don't have a solution, but it looks like a lot of queries. Isn't there any other way to find all the duplicates, increment progress, and delete from files?
Comment #78
dawehner
Thank you a lot!!
Can't be a bad thing.
Yeah, that is a bit unfortunate, though on the other hand the site won't be online while executing them.
How worried are you about that? This way it's at least easier to read, IMHO.
Comment #79
Berdir
Nitpick: needs a . or , or something after the issue link, as the next line starts a new sentence.
Could also remove the "Therefore" here and below, I think.
Naming is a bit weird here, e.g. $object; that's usually $rows and $row or so?
I also don't really like temporary variables like $filepath; it seems easy enough to just inline that in the one place it is used? Maybe also use $duplicate_fids or so to separate it from $fid?
I was also writing a very long comment about what happens when we delete files and then progress will never reach max and we'd never finish. But the !empty() check prevents that....
However, something else can happen. If you have exactly 100 rows (or 200, 300, .. and no deletions), then #finished will be 1 and update.php will consider this batch function to be complete and won't call it again?
You might be able to simulate this in the test by setting batch size to count($files)?
So we need to make sure to never set #finished to 1, just .99 or something?
Comment #80
dawehner
Fixed.
So $sandbox['max'] is the following bit:
so even if we have that many rows, max will still be much higher.
Comment #81
Berdir
Nice, only noticed now that this was overriding $fid... that would have been a fun bug ;)
Right, now I understand the max change, nice trick ;)
I haven't manually tested this, but it looks good to me after an extensive review, and we have tests. Let's get this in.
Comment #82
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
I think given the number of actual sites affected by this bug, the patch could use some testing on real sites.
Is that safe? The {upload} table in Drupal 6 has a primary key on vid + fid, so if the same node revision (vid) referenced two files with the same filepath, this looks like it would blow up. (I have no idea if that could have happened in reality, but given the way this issue has gone it wouldn't be shocking at all.)
Overall it seems this code might be a lot simpler if we integrated it with the main update rather than running it out in front; then we should avoid risks like the above. Would be great to have the simplest possible change in this patch given how many problems this particular update function has had over the years.
Minor - code style/spacing is wrong on the second line.
Comment #83
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
In other words, I'm wondering if this test (which should fail) needs to pass.
Comment #85
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
Actually I think that test is wrong - Drupal 7's File module doesn't prevent a node revision from referencing the same file more than once, so I guess if you are in that situation in Drupal 6, you should see separate attachments on the node revision after the upgrade, not just one.
Interdiff is against #80. This should still fail, though, for the same reasons.
Comment #87
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
And does the fix actually have to involve anything more than this? This is the "integrate it with the main update rather than running it out in front" approach I mentioned in #82. It passes for me locally.
No interdiff, since essentially all the code is moved compared to the earlier patches.
Comment #88
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
As written this is not very efficient though - definitely more database queries than necessary (even for sites that have no duplicate files).
If we care, we could try to build a list of which file IDs actually have duplicates once up front (and store it in the $sandbox), then ensure we only do the extra stuff once for each of those.
Comment #89
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
Actually I just realized something. The upgrade path never actually deletes the Drupal 6 {files} table. I think it's deliberately left behind so that contrib modules can use the data in it. So we don't need to introduce a new "system_update_7061_fid_mapping" table after all. All the information anyone needs about the duplicate files is already in the {files} table.
Removing that, combined with a query out front that builds a list of duplicate filepaths, should make this a really simple patch.
I will write a new patch and post it, assuming it works.
Comment #90
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
It seemed to work. The new patch just changes system.install, but now it requires only a few lines of code.
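For reference, the kind of up-front duplicate detection described in #89 could look something like this. This is a sketch against the Drupal 6 {files} schema, not the actual patch code:

```sql
-- Build the list of filepaths that appear more than once in the D6
-- {files} table (illustrative; the real patch may phrase it differently):
SELECT filepath, COUNT(*) AS copies, MIN(fid) AS first_fid
FROM files
GROUP BY filepath
HAVING COUNT(*) > 1;
```

With that list in hand, the update only needs to do extra work for the handful of fids whose filepath is in the result set.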
Comment #91
fgm
Following @dawehner's request off-issue, I tried to reproduce the problem I had two years ago, but could not find a DB backup still containing it: there is no way to ascertain that this specific problem has been fixed, sorry about that.
Comment #92
steinmb CreditAttribution: steinmb commented
I also tried to dig back through backups but was unable to find a valid D6 DB backup, though I'll do some more digging.
Comment #93
neclimdul
The latest patch regresses system_update_7061_fid_mapping, which might be needed by things like filefield migrations.
Comment #94
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@neclimdul, see my explanation in #89 for that. To get the mapping, they can essentially just do the same few lines of code that core is doing here, so it didn't seem worth it to me to cache it in a separate database table anymore.
Comment #95
webchick
I think one of the problems we're running into here regarding real-world usage is that a lot of people use https://www.drupal.org/project/migrate_d2d for major version upgrades, so if they hit this issue they probably went a different route.
The fact that two of the people who previously had this issue no longer do, and can't find a backup old enough to show the problem, might demonstrate that this does not actually need to be critical. :P Or at least, that it doesn't need to hold up D8, since obviously there is a workaround.
Comment #96
dawehner
Thank you David for picking this up!
You never know :)
Totally agree. I just went with the version whose flow during the update I was able to parse, and I certainly don't like the old version of the patch.
Good idea to ensure that.
Such a lookup could potentially return a lot of data. Should we maybe do that inline?
Oh wonderful, this is indeed much easier to understand!
Comment #97
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@webchick as far as I can see from reading this thread, the only workarounds people have come up with involve writing custom SQL queries or custom PHP scripts to massage their Drupal 6 site's data into a format that won't crash. If that's enough to downgrade a bug from critical, we wouldn't have too many critical bugs :)
Since this is an issue with the Drupal 6 database, anyone using migrate_d2d would presumably face something like it (although the symptoms might look different?). I don't see code in that module for dealing with it.
Comment #98
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
However, since neither I nor anyone else actually knew that core leaves behind the {files} table for this purpose, it wouldn't be a bad idea to have a code comment here explaining that. I tried adding one in the attached.
Thanks, though I'm pretty sure that was code that you wrote :)
Well, how big do you think it would have to be before it's a problem? I think it could be large, but not incredibly large. It is a one-dimensional array and I think the maximum number of elements is half the number of rows in the {files} table (so if the site has 1,000,000 files this array could have up to 500,000 elements). That maximum would presumably require a completely pathological setup to achieve, and probably it would be much smaller for real sites. Overall it doesn't sound like a huge problem to me for this to wind up in the batch sandbox (or in memory all at once), but I could be convinced otherwise.
Comment #99
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
I take that back - #38 mentions using Migrate D2D. I think with that solution it won't crash the site, though I'm still skeptical that it actually migrates the duplicate references correctly - I would guess it probably winds up dropping some data. Anyway, it's a whole other ballgame :)
Comment #100
catch
@David I've also not been able to find anything in migrate to handle this, so I opened #2504553: Ensure that Drupal 6 duplicate file names are correctly migrated for the core 6-8 migration, which will also have to deal with this issue.
Comment #101
catch
On finding a database dump: I believe Drupal.org ran into this.
#966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case made the column binary (case sensitive) and was worked on by drumm as part of the Drupal.org upgrade, because it ran into this issue.
#1691438: Regression: {file_managed}.filename should NOT be binary (and case-sensitive by default) reverted that patch, because it broke filename searches or similar.
Which left this issue to fix the duplicates (whether due to case insensitivity or not).
So if there is a Drupal.org Drupal 6 database dump still around, that should hit the error when running update.php to Drupal 7.
Comment #102
scorchio CreditAttribution: scorchio commented
I don't know whether this will help anyone working on the issue, but I'm actually working on a real-world D6 -> D7 migration which triggers this error. I will test #98 ASAP and update.
Comment #103
catch
@scorchio that would be incredibly helpful - if we're able to confirm this successfully fixes the issue on a real database with the problem, that should allow the patch to get committed to 7.x.
Comment #104
flaviovs CreditAttribution: flaviovs commented
I am not so sure that fixing this in the D7 upgrade script is the right approach. There is no reliable way to map duplicate fids by just looking at {files}. We have many contrib modules out there using managed files, not to mention user-developed custom modules.
Ignoring duplicate fids means ignoring data. Surely, this data is hidden in the leftover {files} table (which, for me, is a bug by itself on the D7 upgrade script, but that's another issue), but until this data is also migrated by the corresponding module, it is ignored data -- and information will be missing on your site (i.e. it will be broken).
Yes, we can change "all" modules to handle entries in the leftover {files} table. This may look like a straightforward process, but the devil is in the details... For instance, the Upload module is gone. So who is going to fix fids that were left behind in the {upload} table? Also, we should question ourselves: at this point in the Drupal timeline, should we bloat D7 modules update scripts to handle (invalid) duplicate files that may be present in {files}? Can we fix all modules? I do not think so.
IMHO, it is a D6 (and below?) bug, so it deserves to be fixed there.
That said, and since I still have a very long list of D6 sites to upgrade to D7, I developed the Duplicate file fixer module that de-dups entries by generating copies of duplicated files (well, it actually hard links the files, so disk space is not duplicated). A straightforward and pragmatic approach that, at the end of the day, will let you upgrade to D7 without much hassle and loss of data (hopefully).
The module is still, and probably always will be, in alpha, so back up before running it. The module has some heuristics to show more details about a duplicate file (such as checking for node attachments and FileField fields) that may let you fix things by hand, but if you can think of a check that can be done against other modules, open an issue and explain your case. Of course, feedback is more than welcome.
TL;DR version: if "Update #7061" is failing for you, go to https://www.drupal.org/project/d6dupfix and check out a module that may help you.
Comment #105
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@scorchio it would definitely be very helpful if you can do that and report back here!
@catch's idea of testing with a Drupal.org database would be a good idea too (in retrospect Drupal.org is basically the poster-child site for running into an issue like this)... but maybe harder to actually arrange in practice.
I think with one manual test on a real site we're definitely in good shape here. Especially since the newer patch is simpler than the previous ones (and looks like it has less of a chance of accidentally breaking something else as a side effect) I would feel better about committing it with less testing than before. But knowing that it actually works on at least one real site and doesn't cause any bad side effects would definitely be helpful.
Comment #106
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@flaviovs, thanks for posting that module.
I think we do need some fix for this in Drupal 7 core. I'm not sure if this was actually a bug in Drupal 5 core (and Drupal 6 contrib) or just the way they worked... but either way nothing prevented it so we need to have the code do something to handle this during the upgrade.
On the risk of data loss, I took a look at the Drupal 6 files table and this doesn't seem like a big concern for many of the columns since they are properties of the file itself. I could imagine it being an issue for columns like 'uid' and 'filename' (and maybe 'status', although hopefully in practice none of these duplicates are temporary files). In comparison to a fatal error on upgrade it seems like a lesser concern :) Ultimately we have to do something to fix this, and the alternative of having lots of versions of the same file doesn't sound good either - especially if the site ever installed a module like https://www.drupal.org/project/file_entity which would expose all of them in the user interface.
Perhaps instead of keeping the record from the first file we should keep the last? In other words, change MIN to MAX in this:
That might be better if we expect that in a typical case the newer record is associated with a newer node revision and therefore the most up-to-date representation of the file's actual current properties.
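Concretely, the suggestion is to switch which duplicate survives. In illustrative SQL (not the actual patch code):

```sql
-- Keep the newest record per duplicated filepath instead of the oldest:
SELECT MAX(fid) AS keep_fid, filepath
FROM files
GROUP BY filepath
HAVING COUNT(*) > 1;
```

The trade-off discussed here is that MAX(fid) picks up the most recently written uid/filename/status, which is more likely to reflect the file's current properties.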
Comment #107
flaviovs CreditAttribution: flaviovs commented
David, see my comment #35. I faced this problem on a site that never had a D5 version, so this is definitely an issue that pops up by itself on D6.
As for the loss of information, perhaps I didn't make myself clear. The information that is going to be missing is that in the ignored rows of the {files} table. Granted, the underlying data is still there, but unless it gets migrated somehow (NB: and before a conscientious DBA drops the leftover table after the upgrade is "done"), it is lost in the sense that it will not appear on your site.
It does not help to get MAX(fid) instead of MIN(fid).
Let me show you an example to illustrate the issue better: in one of the D6 sites I had to migrate, there were a couple of duplicated entries.
Now, let's try to see what's on {files} about this entry:
(NB: for some unknown reason, filesize differs.)
Now, after some investigation, I found that those fids are managed by an image field called "field_foo". So, let's try to see where those files were used in:
As you can see, the same physical file has two entries in {files}. But those entries point to two separate nodes. We're going to have problems here if we do not migrate all entries, e.g. if you migrate only fid = 510, then everything will appear OK when you visit node/112147, but when you go to node/112148, your image file won't be there.
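The queries from the original comment were lost; something like the following reproduces the inspection being described. The table name content_field_foo and column field_foo_fid follow D6 CCK's naming convention for a shared field called "field_foo", but the exact names vary per site:

```sql
-- What does {files} say about the duplicated entry?
SELECT fid, filepath, filesize FROM files WHERE fid IN (510, 530);

-- Which nodes reference those fids through the image field?
SELECT nid, vid, field_foo_fid
FROM content_field_foo
WHERE field_foo_fid IN (510, 530);
```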
✦ ✦ ✦
The Duplicate file fixer (DFF) module takes the approach that, since technically two rows in {files} are meant to be two separate files -- but aren't being kept that way due to a bug somewhere -- it makes sense to fix the issue by duplicating the files under different names and de-duping the entries. Sure, we all know it is the same file, but why bother? In our case, we certainly have a bunch of image and PDF files that content managers keep uploading again and again to the same site for use in different contexts, so we are screwed already. Well... at least DFF hard-links the files, so disk space is not wasted.
Yes, this is definitely a hack, but I am afraid that it is the most sensible thing to do.
Comment #108
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@flaviovs, as far as I can see, the D6->D7 core upgrade only moves over files that are controlled by core (i.e. those that were added to the site by the core Upload module).
So in your example, neither the file with fid 510 nor the one with 530 will be moved from the {files} table to the {file_managed} table. They are both left behind, and it's up to the CCK/Filefield/Imagefield upgrade path to decide how to move them over, along with all other files that are associated with a CCK field. That behavior is the same with or without the patch in this issue.
Comment #109
Fabianx CreditAttribution: Fabianx for Drupal Association commented
#107: You are right that Duplicate File Fixer is the right approach for a contrib, one-size-fits-all solution.
However, core cannot and will not support the CCK upgrade path; it will only fix the {upload} table upgrade path.
We still need to provide a patch here so that people can upgrade their sites.
Compromise proposal:
- What if we showed a message when duplicate files have been detected:
Duplicate file xyz had duplicate records 1,4,6 and was reduced to 15. [warning]
There had been duplicate files in the {files} table. If you only used the core {upload} module those have been fixed automatically, if you used contrib modules like the contributed file_field or image_field modules, please see {hand book page}. [warning]
In the handbook page we can then legitimately point to Duplicate file fixer as another possibility that can be run pre-upgrade...
Thoughts?
Comment #110
flaviovs CreditAttribution: flaviovs commented
David, I do not think that letting non-core modules do low-level upgrades from legacy {files} entries to core tables is a good idea. IMHO, {files} must be migrated by the core update, with all fids kept unchanged. Only then can contrib/custom modules update their tables. Since fids don't change, they don't even need to touch the core tables (just use file_usage_add() to signal their files as being used).
Maybe it's the difference in POV that makes us disagree. In my POV, if we ignore some fids, how can a contrib module upgrade?
Sounds good to me. The important thing here is to pause and warn the user that dups were found, and that the upgrade process is unable to handle all cases. We definitely do not want an upgrade that leaves a site with random files "missing" without a warning!
Comment #111
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@flaviovs, I think you should create another issue if you want to propose that core migrate files it doesn't control. Again, the patch here is not changing that one way or another.
A drupal_set_message() that indicates the duplicate files sounds OK as there is some small possibility that the data from duplicate files isn't combined correctly (see my comment in #106). The message could get long though; maybe it should just be in the return value of the update function instead? Also since we're not touching the contrib-controlled files at all I don't think it needs to mention duplicates found there. I.e. if this part of the suggested message were 100% true:
...then there wouldn't be a point having a message at all. But as I pointed out above there is a chance that the upgrade can't fix them perfectly.
I don't see it as a blocker for this issue personally, but if someone wants to work on a message and a handbook page it sounds reasonable.
Comment #112
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
Also, any thoughts on my question in #106 about switching from MIN to MAX?
In other words, if the Drupal 6 data looks like this:
Should the migrated file in Drupal 7 actually use uid 2 and filename "file-new.jpg" (based on the idea that the most recent entry is more likely to be the up-to-date one)?
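The data table from the original comment was lost. As a purely illustrative reconstruction (the fids and the first row's values are invented; uid 2 and "file-new.jpg" come from the question above), the situation is something like:

```
fid | uid | filename     | filepath
 10 |   1 | file.jpg     | sites/default/files/file.jpg   <- oldest record (MIN)
 20 |   2 | file-new.jpg | sites/default/files/file.jpg   <- newest record (MAX)
```

With MIN(fid) the migrated row keeps uid 1 and "file.jpg"; with MAX(fid) it keeps uid 2 and "file-new.jpg".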
Comment #113
Fabianx CreditAttribution: Fabianx for Drupal Association commented
#110: We cannot stop easily, but we can at least show the warning after the upgrade is done.
We do leave {files} alone; we only change {file_managed}. Also, you need to make a backup before upgrading D6 anyway, and we can add that to the handbook page.
So the message should be fine, changing to CNW for that change.
Comment #114
flaviovs CreditAttribution: flaviovs commented
Well... {files} is a core table used to control all files in a Drupal site. Drupal gives fids to contrib so that they can store and refer to managed files later, so I do not agree with the "files it doesn't control" part.
But...
Migrating {files} is what I thought it already did. I was inspecting the upgrade routines and had a shocking surprise -- it does not. So my POV (although technically sound, IMHO) does not apply. And there's no room to change this at this point in time. So let's move on here...
That's my point. You cannot safely migrate by just looking at {files}. It does not matter which entry you migrate; what matters is the entries that you do not migrate, because there may be some module, somewhere, which expects a fid that won't be there. The uid issue you pointed out just highlights the problem.
See my example above, which I consider more complex:
Comment #115
Fabianx CreditAttribution: Fabianx for Drupal Association commented
We look at nodes - if you read the patch.
We just only take care of the files uploaded by upload module as that is the only thing that Drupal 6 Core supported.
Therefore I still think a quick warning (not error) is a good compromise.
Comment #116
pfrenssen
I'm going to dig into this fun issue :P
Comment #117
pfrenssen
I fired up a clean installation of Drupal 6.36 and can easily generate different entries with the same filepath in the files table:
I now end up with two identical entries in the files table, only the fid and timestamp are different.
Comment #118
izmeez CreditAttribution: izmeez commented
@pfrenssen The results in #117 sound like a bug in D6, don't you think?
Comment #119
pfrenssen
@izmeez, no, in Drupal 6 this was allowed, but not in Drupal 7, which is the reason we have this issue :)
I just provided that example to illustrate how easy it is to end up with a database in D6 that would expose this issue.
Comment #120
pfrenssen
Actually, the example that I provided puts duplicate file entries in the table, but these do not trigger the integrity constraint violation: the user pictures are updated in user_update_7012(), but this gets its data from the {users} table, not the {files} table.
Comment #121
pfrenssen
I found a way to replicate it in D6 core with the Upload module too, by re-uploading a file after deleting it from the sites/default/files folder. The image will then be present twice in the database with the same filepath. If the site is then upgraded to Drupal 7, the integrity constraint violation is thrown.
It is of course not recommended to delete uploaded files, but you can imagine this happening in reality, especially if the site administrators have access to the files on the production server, or if a module such as IMCE is installed.
Comment #122
pfrenssen
Tested the patch with this real-life D6 database and it solves the issue. The corrected file ID is visible in both nodes. The entries in the {file_managed} and {file_usage} tables are correct.
I reviewed the patch and it looks really solid. As mentioned by others, I'm also not so concerned that the queries are done in a loop. This is a batched operation, and the duplicated files can be expected to affect only a small percentage of the total number of files. This will have little impact on the total duration of the upgrade.
I also think that @flaviovs' concerns are not applicable in this context. The patch solves the problem responsibly: it only loops over the files from the Upload module and fixes any duplicates properly. Any duplicate files that might be relied upon by contrib modules are still intact in the legacy {files} table. Contrib modules can use this data in their own update hooks.
Better to use the @see syntax on a new line, so that it will be linked properly in the documentation on api.drupal.org.
The word 'reduced' seems to imply that the problem is improved but not completely solved. Maybe better to say "Ensure that filepaths are deduplicated".
I would indeed change this to use MAX() instead, as was suggested by @David_Rothstein, so that we will have the filesize and timestamp of the latest version of the file, not of the original one. This is more likely to be correct.
If we change this, then the documentation of this line should also be updated, along with the fids in the test coverage.
Comment #123
dawehner
That is a good point, IMHO. Let's ensure that we have the needed test coverage just to be sure it actually works that way.
Comment #124
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
I'm pretty sure @see is only for docblocks - see https://www.drupal.org/coding-standards/docs#see (or should I say "@see https://www.drupal.org/coding-standards/docs#see"? :)
For an inline code comment there's no reason @see would be used for anything on api.drupal.org, so it's better to have it be a readable sentence.
I think we actually just need to fix the documentation elsewhere (which is now incorrect) rather than add a new sentence.
I fixed both the above in the attached. The rest of the changes look good to me.
Comment #125
Fabianx CreditAttribution: Fabianx for Drupal Association commented
As there is consensus that this is good enough for core, we should go ahead.
=> RTBC (yes!)
Can we still update the handbook page to at least mention the other module?
Thanks,
Fabian
Comment #126
pfrenssen
Changes look good, +1 to RTBC.
Comment #127
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@Fabianx, so just to make sure: earlier you thought this needed some messaging to the user who is running update.php to inform them that duplicate files were replaced - do you not think this anymore?
Would still be great to see the final version tested on a real site that experienced this problem, but if that can't happen at some point we'll need to commit it without that.
Comment #128
Fabianx CreditAttribution: Fabianx for Drupal Association commented
I would still love a message and/or a handbook update, but neither is critical for me, and the majority thinks that core should fix what it supports and mainly inform contrib that they need to adjust their upgrade path too.
e.g. CCK
So I think a message could be a follow-up, but at least a handbook update should be made, as it's otherwise very hard to find solutions to the problem.
Comment #129
scorchio CreditAttribution: scorchio commented
Sorry for not reporting back earlier - I had to apply a manual workaround to let the mentioned project continue. (BTW, thanks to webchick for pinging me.)
Today I tried to redo the upgrade from scratch using 7.38 + #124 but when I started to migrate the fields with content_migrate, I hit the same issue again. In the {files} table, I have entries like this:
Take fid 192 for example:
What am I missing here?
Comment #130
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@scorchio and I talked about this on IRC earlier. The above is a CCK field, so it's migrated by CCK rather than by Drupal core - that's why the patch here doesn't fix it.
However, the above definitely does prove that CCK needs a similar fix. @scorchio filed #2529562: Filefield D6 -> D7 migration: Integrity constraint violation - Duplicate entry for key 'uri' and I think the existing issue #1176186: D6 -> D7 upgrade: Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY' might be about this also.
Comment #131
webchick
Good catch. Thanks so much for doing the testing!!
However, I believe that since those issues are coming from a contrib module, they do not qualify as blockers for D8, yes?
Comment #132
flaviovs CreditAttribution: flaviovs commented
@scorchio,
As David said, this problem is generated by a CCK field, so you will have to wait for a patch for that issue. The patch discussed here only migrates files managed by the D6 core Upload module.
You may want to take a look at the Duplicate file fixer module, which takes another approach. It lets you fix the duplicates manually, or it can be used to de-dup the entries automatically in a module-agnostic way (disclaimer: I am the author of DFF).
In your case, with DFF you will have easy access to the old node revisions that have duplicated file pointers, and may choose to delete them if they are not needed anymore. If you want to keep them, the module can de-dup the entries by linking new files so that you can proceed with the upgrade.
Comment #133
webchick
@David_Rothstein, just curious if you think this one might make the 7.x bug fix release window for Sept. 2?
We'd all feel more comfortable with real-world site testing, but after publicizing this issue on Twitter, the one person we could find who a) reported hitting this in the real world and b) still had a backup recent enough to include the problem (others had long since used workarounds to bypass the issue) turned out not to actually have this specific issue (or rather they do, but with a non-core module). There are also 48 subscribers to this issue, presumably most of them with a vested interest in having this fixed, and no one else has yet spoken up to say they've tried the patch and it works, despite it being available for literally a month. :\ I'm kind of out of ideas on how to solicit more real-world testing.
There was testing done by @pfrenssen on a real D6 database back in #122, so we at least have some sort of believable evidence it works. Do you feel that's sufficient to move forward here?
Comment #134
drzraf CreditAttribution: drzraf commented
I was bitten by this bug. The root issue is that filenames were duplicated, and this was possible due to (I don't remember exactly) either non-uniqueness of the MySQL column or case sensitivity of the filename.
I once guessed these duplicated filenames were caused by image_attach, but I'm not sure either.
I gave up on this upgrade path long ago (although I spent hundreds of hours, over almost two years of my free time, for an NGO trying to overcome all the issues related just to a D6-D7 upgrade path).
But with almost 40 modules it was just impossible. I kept the site running as best it could on this ageing D6, hoping for a better future.
I won't put a single finger into this Drupal-upgrade-hell again, or look at an SQL dump, unless I'm strongly confident it will lead to a successful upgrade.
And if that time does not come, I won't touch Drupal anymore until I'm forced to, and will move to something else. [The frustration over the lost time remains undiminished.]
Side note: the time needed for maintainers/code authors to get an eye on bugs is the main issue here.
Many bugs were opened 2+ years ago, but the strategy was "ignore them so that their importance is reduced as workarounds are discovered".
Starting to resolve them two (or more) years later will just cost 2x or 3x more time than if the issue had been considered in the first place.
Good luck anyway.
Comment #135
drzraf CreditAttribution: drzraf commentedanyway...
but can't guarantee about column join correctness
Comment #136
catch
@drzraf people getting stuck on 6-7 upgrades is a big reason why #2030501: [meta] Ensure that Drupal 6 sites have a functional upgrade path to either Drupal 7 or 8 before Drupal 6 loses security support (and hence this issue) was made a release blocker for 8.x. It's also why the 6.x-8.x and 7.x-8.x upgrade paths will be using migrate rather than hook_update_N().
Thanks for the sample data.
Comment #137
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@webchick: Yeah, I think this should definitely go into the next release either way (that's basically what I was aiming for in #127).
@drzraf: This issue was marked critical in June 2013 - if you ran into it before that you could have marked it critical earlier? As far as I know it is the last remaining critical issue with the Drupal 6-to-7 core upgrade path (and even then it only seems to affect a relatively small number of sites) but if there are others you know of that would qualify as critical you should definitely mark them as such. Unfortunately it sounds like you may have run into problems with contrib modules too, which is a whole other ballgame...
Comment #138
jelo CreditAttribution: jelo commented
I commented in #55 that I had this issue. However, it may have been caused by filefield_sources. Anyway, I have a D6 backup that has duplicate file paths in the files table. When I run this query:
SELECT count(*), min(fid), max(fid), filepath FROM files GROUP BY filepath HAVING count(*) > 1 ORDER BY count(*);
I get 42 duplicate files in the backup. I upgraded from 6.22 to 7.32 or something like that at the time. So I went and rebuilt the old site and prepared everything for the upgrade (following the steps closely), which now puts me at an update from 6.36 to 7.38. As soon as I want to run the main update with update.php, I run into this error:
PDOException: SQLSTATE[42S22]: Column not found: 1054 Unknown column 's.ssid' in 'where clause': SELECT u.*, s.* FROM {users} u INNER JOIN {sessions} s ON u.uid = s.uid WHERE s.ssid = :ssid; Array ( [:ssid] => xxxxxxxxxxxxxxxxx ) in _drupal_session_read() (line 92 of /includes/session.inc).
Is this a bug that was introduced? My D6 sessions table has a column called sid, but not ssid.
Jens
Comment #139
dawehner
I think you need to use the alternative way to get to /update.php:
- Open settings.php with a text editor.
- Find the line that says: $update_free_access = FALSE;
- Change it to: $update_free_access = TRUE;
Comment #140
jelo CreditAttribution: jelo commented
@dawehner: that doesn't make a difference - same error. Due to this error, I cannot test the patch provided here. I may just have to log a new issue with regard to the upgrade path and this error, which I did not encounter when upgrading the site from a previous 6 version to a previous 7 version.
Comment #141
webchick
The update_fix_d7_requirements() function should be taking care of this. The code is at http://cgit.drupalcode.org/drupal/tree/includes/update.inc?h=7.x#n763.
There do seem to be some other people reporting this ssid problem at #1008352: Upgrade D6.20 to D7 fails with PDO Exception. Comment #57 there especially looks like a good list. In general, it seems to happen if you've tried to re-do the D6 -> D7 upgrade multiple times without restoring from a clean D6 backup in between each run. None of the logic in that function will fire if it thinks the system is already upgraded (has a system.schema value >= 7000).
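For readers unfamiliar with what update_fix_d7_requirements() does for the sessions table, here is a sketch of the general pattern (this is illustrative Python over sqlite3, not Drupal's actual PHP code, and the column type is an assumption): add the D7 {sessions}.ssid column only if it is missing, so the fix is idempotent across repeated runs.

```python
import sqlite3

# D6-shaped sessions table: has sid, but no ssid column yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (sid TEXT, uid INTEGER)")

def ensure_ssid_column(conn):
    # Add the ssid column only when it is absent, so re-running is harmless.
    cols = [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]
    if "ssid" not in cols:
        conn.execute(
            "ALTER TABLE sessions ADD COLUMN ssid TEXT NOT NULL DEFAULT ''"
        )

ensure_ssid_column(conn)
ensure_ssid_column(conn)  # second run is a no-op
cols = [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]
print(cols)  # -> ['sid', 'uid', 'ssid']
```

The catch webchick describes is the gating: if the system already looks upgraded (system.schema >= 7000), this fixup never runs, so a half-finished earlier attempt can leave the table without ssid and produce the "Unknown column 's.ssid'" error.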
Comment #142
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
@jelo, thanks for trying to test this!
Are you running updates over an https:// URL, by any chance? Looking through #1008352: Upgrade D6.20 to D7 fails with PDO Exception it seems like that's the actual condition that triggers the bug. The two possible workarounds are:
- Run the updates over http:// instead, or
- Set $update_free_access = TRUE in settings.php and log out of the Drupal 6 site before running the updates, so that you run them as the anonymous user (just changing settings.php by itself isn't enough).
If this is the problem, we really should retitle the above issue to focus on it and up the priority to "Major". (I don't have time to test it right now, but I assume the fix would involve moving the code that's currently in update_fix_d7_requirements() to somewhere that runs a little earlier in the bootstrap.) But I don't believe it's related to this issue.
Comment #143
jelo CreditAttribution: jelo commented
@webchick: the backup I am using has never been upgraded before, and I get the error the first time hitting update.php.
@David: that may be it. I did run it over https. I am on my way out on holiday; if I can find some time to fire up that old backup again, I will test it over http to check whether I can get past this problem and test the patch from this issue.
Comment #144
webchick
Spoke to David today; he said there will not be a Sep. 2 bug fix release of D7. :( The earliest opportunity is now in October. He did say it could potentially be committed sooner than that, however.
Comment #145
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented
OK, people are really itching to get this committed and it has been RTBC for long enough. It would still be great to see real-world testing (feel free to post here with that even afterwards - it would still be very helpful). But it looks like it's probably ready to go.
Therefore, committed to 7.x - thanks!
Above there was some discussion of having a handbook page about this issue (which could mention https://www.drupal.org/project/d6dupfix or other solutions as well). I'm not sure what the status of that is, but anyone should feel free to create one and if necessary we could link to it from somewhere within core (for example, a message displayed to sites that have this problem, as was suggested earlier).