I'm trying to migrate fields from D6->D7, and I'm using the current cck dev.

I originally posted this bug in a couple other forums. But all those issues appear to have been resolved. Since this issue appears to be persistent even with the latest dev I thought it deserved it's own thread.
#781088: Updating CCK Fields and Data from D6 to D7

Scroll to the bottom for my ugly workaround
#1157234: Various PHP fatal errors, typos, etc.

When first running the field migration I was getting a bunch of errors, about six similar in nature to the ones below. After crawling the issue que for hits I noticed several other people mentioning problems when trying to migrate revisions. Since I don't need my revisions the first thing I did was remove all of them. Since I had already migrated from D6 I can't use this module (http://drupal.org/project/revision_deletion). Based off of that sites code I ran the following SQL commands to remove revisions.

DELETE FROM node_revision WHERE vid NOT IN (SELECT vid FROM node)

and I ran this for each file field

DELETE FROM content_field_image_upload WHERE vid NOT IN (SELECT vid FROM node_revision)

and I ran this for each node type

DELETE FROM content_type_blog WHERE vid NOT IN (SELECT vid FROM node_revision)

Now when I test cck field migration I receive this error.

An AJAX HTTP error occurred. HTTP Result Code: 500 Debugging information follows. Path: /sandboxsite4/batch?id=1991&op=do StatusText: Service unavailable (with message) ResponseText: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '46' for key 'PRIMARY': INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7); Array ( [:db_insert_placeholder_0] => 46 [:db_insert_placeholder_1] => 1 [:db_insert_placeholder_2] => IVAW_Bylaws_as_of_09_AUG_09.doc [:db_insert_placeholder_3] => public://IVAW_Bylaws_as_of_09_AUG_09_1.doc [:db_insert_placeholder_4] => application/msword [:db_insert_placeholder_5] => 101376 [:db_insert_placeholder_6] => 1 [:db_insert_placeholder_7] => 1269023616 ) in content_migrate_filefield_data_record_alter() (line 174 of C:\xampp\htdocs\sandboxsite4\sites\all\modules\cck\modules\content_migrate\modules\content_migrate.filefield.inc).

I run it again and I get another similar error

An AJAX HTTP error occurred. HTTP Result Code: 500 Debugging information follows. Path: /sandboxsite4/batch?id=1993&op=do StatusText: Service unavailable (with message) ResponseText: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '152' for key 'PRIMARY': INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7); Array ( [:db_insert_placeholder_0] => 152 [:db_insert_placeholder_1] => 1 [:db_insert_placeholder_2] => media.jpg [:db_insert_placeholder_3] => public://images/header-images/media.jpg [:db_insert_placeholder_4] => image/jpeg [:db_insert_placeholder_5] => 21616 [:db_insert_placeholder_6] => 1 [:db_insert_placeholder_7] => 1270510994 ) in content_migrate_filefield_data_record_alter() (line 174 of C:\xampp\htdocs\sandboxsite4\sites\all\modules\cck\modules\content_migrate\modules\content_migrate.filefield.inc).

I run it a third time and it completes.

So now I want to narrow in on the problem files causing these error's

field_document_upload, fid 46, IVAW_Bylaws_as_of_09_AUG_09.doc
field_header_image_upload, fid 152, media.jpg

If I go to the file's table and delete the offending FID's and then rerun cck migration, I still get errors like above. Then if you delete those new FID's in the error you get more new FID's. Since it seemed to be endless I tried something that ended up working.

Moderate Success Achieved
Before migrating any fields empty your file_managed table. Now run the cck field migration. Everything should work and you should now have a bunch of files listed in the file_managed table. Copy this table and then restore your db to pre migration. What you have to do is make sure every FID listed, in that saved file_managed table, is deleted from the pre migration file_managed table. I had to delete around 60 rows. Now re-run the migration.

At this point you may continue to get more errors with FID's. Start writing these down. After each one shows up go in the file_managed table and delete. Then on the migration "Roll back selected fields". Now when you run it again everything will be successful or you will get another FID to record. Repeat this process until you stop receiving errors.

Now restore your pre-migration DB. Delete out all the offending FID's from the file_managed table. Run the CCK field migration. Now you're done!

Note: You do not need to delete revisions for this to work, but I do believe it reduced the number of FID's I ultimately had to remove.

While this isn't quick or optimal, it works. If anyone has a better/simpler solution for this, please share.

Comments

solotandem’s picture

In your post at http://drupal.org/node/781088#comment-4545832 you wrote, "Unfortunantly I've already migrated to D7, so I ran this against my D7 DB." Do you have the D6 database to run the query against? That is the one that needs to be scrubbed before attempting the field migration. If you eliminate the duplicate filepaths in the D6 table, then the integrity constraint issue should be eliminated.

I wouldn't term this a bug against this module. It is a data issue related to the API change between D6 and D7.

asb’s picture

@animelion: Do you have any idea where the duplicate filepaths come from?

bryancasler’s picture

I just ran that query against my D6 DB and there is not a single duplicate filepath being returned. I do have several like this

fid uid filename filepath
211 1 banner-membership.jpg sites/default/files/images/header-images/banner-membership.jpg image/jpeg
212 1 banner-membership2.jpg sites/default/files/images/header-images/banner-membership2.jpg image/jpeg
213 1 banner-membership3.jpg sites/default/files/images/header-images/banner-membership3.jpg image/jpeg
214 1 banner-membership4.jpg sites/default/files/images/header-images/banner-membership4.jpg image/jpeg

But none have duplicate filepaths.

bryancasler’s picture

asb, I'm not able to narrow it down, but I'm open to suggestions on how to do this.

When trying the migration on my D7 site I kept running into the duplicate fid "49". So I just did a search on my D6 DB for any fid = "46". I would expect multiple results because of the conflict in D7, but only one was found. I get the same results if I search for the filename instead of the FID.

bryancasler’s picture

I think I just got closer to figuring this out. I recorded all the conflicting FID's in D7. They were (33,34,40,42-97) and (152,157). The first group of FID's was generated when trying to migrate "field_document_upload". The second group was generated when trying to migrate "field_header_image_upload".

So in the D6 DB I looked in the "content_field_document_upload" table. It has the "field_document_upload_fid" field. Many of the values are null, but if I sort those values numerically, they match perfectly with the first group of duplicate FID's .
http://awesomescreenshot.com/068e4k550
http://awesomescreenshot.com/0d5e4klf9

If I search the "field_header_image_upload_fid" field in the "content_field_header_image_upload" table, I find my remaining 2 conflicting FID's
http://awesomescreenshot.com/02ee4ktd9

Seems like this is where the duplicates are coming from. Should I go ahead and delete them?

solotandem’s picture

@asb, In my case, the duplicate filepaths came about because of case insenstivity. Let me explain.

The site had multiple image fields stored in separate content_field_xxx tables. An image file with the same name (on a case insensitive basis) was uploaded to two different fields, resulting in two different fids. For example, the image file names were foo.png and Foo.png. On conversion to D7, content migrate merges the rows using the fid. In other words, if three of my content_field_xxx tables all referenced fid = 100, then only one row with fid = 100 would be found in file_managed table. However, the foo.png and Foo.png filepaths have different fids but are considered the same uri and the attempt to insert the second one results in a duplicate to file_managed. Its unique index on uri is case insensitive (with MySQL at least).

There may be other ways this occurs.

For example, the D6 file table does not have a unique index on its filepath. So, it would accept duplicate filepaths (e.g. if inserted programmatically), resulting in the same uri associated with different fids. The second of these rows would not be allowed on conversion to the D7 file_managed table, but would be attempted as Karen's code does not check for this condition (and admittedly it would seem to be an edge case).

solotandem’s picture

@animelion: The duplicate fid references in the content_field_xxx tables should not be a problem. Karen's code checks for this condition (as mentioned in #6 above).

Rather, the problem should be different fids pointing to the same filepath. That condition would not be caught by Karen's code and would cause the insert error when running content_migrate. Could you confirm that none of those fids have the same filepath (even on a case insensitive basis)? From the screenshots, you appear to be working on Windows. Is there any difference in MySQL's handling of case between Windows and Linux?

bryancasler’s picture

I just threw this SQL dump of my D6 DB into excel.

SELECT COUNT(*) AS Rows, filepath
FROM files
GROUP BY filepath
ORDER BY Rows DESC

I converted all the filepaths to lowercase and checked for duplicates. None found.

Not sure if this matter, but I did find one row without a filepath. http://awesomescreenshot.com/043e4u0e2 Nothing else to report.

bryancasler’s picture

solotandem,

Do you think it's possibly that we're encountering two different bugs, resulting in the same/similar error?

bryancasler’s picture

I've gone back over everything and found a few mistakes I've made. I've also found some additional insight into what's happening, but I still need help to figure out what we can do about it.

After reviewing other issue que's it's my belief we're running into two errors that will have to be addressed separately. Duplicate uri's and duplicate fid's.

Integrity constraint violation: 1062 Duplicate entry 'public://IVAWChapterapplication1.doc' for key 'uri'...
Integrity constraint violation: 1062 Duplicate entry 'node-2381-0-0-und' for key 'PRIMARY'...

Duplicate 'URI' (filepath's)

After running the following query.

SELECT COUNT(*) AS Rows, filepath
FROM files
GROUP BY filepath
ORDER BY Rows DESC

I exported the results and searched for duplicates, ignoring case sensitivity. No duplicate URI's were found. That's when I started comparing D6 and D7 URI fields and I realized that I needed to be searching for duplicate URI exluding "sites/default/files/" from the path. Once that was removed I found several duplicate URI's.

I also found this SQL snippet, it can help track down duplicate URI's

SELECT filepath, COUNT(filepath) as count FROM files GROUP BY filepath HAVING count > 1

This patch may be another solution for this problem http://drupal.org/node/781088#comment-3995354

Unfortunantly, no one has provided any feedback on it and the issue que there was closed.

Duplicate 'PRIMARY' (fid's)

I used the following sql query to find duplicate FID's in my CCK fields and in my Files table. You'll have to adjust and run this for each of your CCK fields.

SELECT * FROM cck_table_name INNER JOIN files on cck_table_name.cck_fid_name = files.fid WHERE files.fid

An example of how this looks in practice

SELECT * FROM content_field_document_upload INNER JOIN files on content_field_document_upload.field_document_upload_fid = files.fid WHERE files.fid

This method returned all the duplicate fid's I had found before as well as some new ones. Because of all the new duplicate fid's I'm even more confused than before. Not really sure how to explain this.

What's to be done

Ultimately I can't solve this alone, I don't have a deep understanding of D6 and D7 file and DB structures. I'll keep working on this problem to the best of my ability since I see a lot of folks reporting the same issues. If anyone new can help contribute to this I would be greatly appreciative.

KarenS’s picture

Title:PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'» Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'
Priority:Normal» Critical

Changing the title to reflect the fact that this problem is specifically related to files. I need clarification about what is duplicated in the original files. There shouldn't be any way to have duplicate file ids, the fid is an auto_increment field, there should be no way to have duplicate fids unless you have done something that forced bad values in there (some migration gone awry or something). I'm not sure the migration code should be trying to fix bad data, although I suppose we could.

Nothing in the database structure prevents duplicate paths. But I'm not sure how they would have gotten in there in the first place. If you upload files through the normal process I think Drupal automatically creates a unique file name, and hence a unique path. So again, I'm wondering if this would only come about if you created files in some non-standard way, like through a migration or some custom code or whatever.

If it is possible to have duplicate values in the D6 table using normal methods of creating files (doing it through the UI where values are validated and massaged as necessary), then the migration absolutely has to have a way to clean that up. If the only way this can happen is if you have manually inserted these values, I'm not sure.

Core is migrating files created with the Upload module. I wonder if there is any possibility of duplicates using that module, and if so, if the core upgrade properly cleans it up.

KarenS’s picture

Actually, if you have duplicate fids, I'm not sure how we could clean this up safely. You need to look at these duplicates. Do they have all the same data or is one pointing to one file and the other to a different one? If they all point to the same paths, you need to just remove all but one of the dups. If they point to different paths, you need to figure out which path is the 'right' path and remove the other. I think you will have to do this yourself before attempting a migration.

KarenS’s picture

And reading your comment more carefully, I assumed you were finding duplicate fids in the files table alone. But you are joining in the content table, so that is muddying the waters. See if you have duplicate fids in the files table. If not, that is OK. Duplicate *references* to fids is fine (which I think is what you're seeing by joining in the content table).

Take the value you are seeing in that error 'IVAWChapterapplication1.doc'. Look for that in your files table. See if it shows up more than once in the files table. Get the fid. See what content is using that fid. Look at the content and see what patterns there are. Is this a shared field across content types? Is this something that has multiple revisions? How is the field configured (you can enable the Content Copy module in D6 and export the field and paste it here).

bryancasler’s picture

On my D6 site I have not altered any DB values directly. Everything that is in my DB has been created and put there by various modules. One of which is a custom module that I mention later.

These are the fields I am working with on my D6 site.
http://awesomescreenshot.com/051e9klee

To clear something up, I'm sorry if I gave the impression that I'm trying to join the files table with content tables. I am not, but I thought the fields migration in D7 was. The reason I thought this was because I'm getting errors like Integrity constraint violation: 1062 Duplicate entry '46' for key 'PRIMARY'.... My files table only has one entry for FID '46' so I assumed the migration process was trying to import another field with a duplicate ID. That's when I started searching other content tables for the same FID's throwing errors, and I started finding them all in my (content_field_document_upload, content_field_header_image_upload) tables. To me there seemed to be correlation between the errors I was getting, and the content table's FID's compared to the file table's FID's.

Per your instructions I looked for "IVAWChapterapplication1.doc" in my fields table, it appears twice (fid 83, fid 93). I then searched each content field for these two FID's in their respective content *FID fields. I also repeated this process for all the duplicate URI's I found in my files table.

IVAWChapterapplication1.doc (fid 83, fid 93)

content_field_document_upload

vid nid fid
240 115 83
240 115 93

NID 115 is a book content type. Here is an export of my book content type. It has one cck file field (field_document_upload).
http://pastebin.com/1kpfhK2P
http://awesomescreenshot.com/030e9ob50

Iris_Chicago_V-day_2010.jpg (fid 777, fid 780)

content_field_document_upload

vid nid fid
2005 824 777
2006 824 777
2007 824 777
2008 824 777

NID 824 is a page content type.
http://pastebin.com/UXim5bxH
http://awesomescreenshot.com/054e9of77

Iris_Chicago_V-day_2010.jpg (fid 777, fid 780)

content_field_photo_upload

vid nid fid
2009 924 777
2010 924 777

NID 824 is a photo gallery content type.
http://pastebin.com/xBTK7Ztc
http://awesomescreenshot.com/066e9oi49

The rest of the duplicate URI's were not found.

28Aug2010_Minutes_0.pdf (519,589)
Hot_Chip_-_Ready_For_The_Floor.avi (101,286)
dd214.jpg (459,819,1165)
DD214.pdf (790,1101,1163)
DD214_0.pdf (1066, 1103)
dd214_1.jpg (831,876,1032)
dd214_2.jpg (849,985)

*Note: "28Aug2010_Minutes_0.pdf" was uploaded to "/tmp/28Aug2010_Minutes_0.pdf". I believe this is when I was testing out a new content type I had created and then I ultimately changed the file upload path.

*Note "Hot_Chip_-_Ready_For_The_Floor.avi" was a test video I uploaded for a content type that no longer exist.

*Note all of the "DD214..." files were uploaded by a custom module we had built for our website.

Karen I hope I was able to answer all your questions and clear up any confusion I might have caused.

KarenS’s picture

Then you do have duplicates in your file table (two file entries pointing to the same file). D7 will not allow duplicates, hence the problem in the migration. D6 allowed this, D7 does not.

You cannot create duplicates using the normal Drupal UI, you created them by using a custom module to create the files, so that's how they got there. As solotandem pointed out, that is your problem.

bryancasler’s picture

Thanks for everyone's help on this. Ultimately I had to remove the offending fields before migrating. No problems since then. Thanks again for everyone's time on this.

steinmb’s picture

Status:Active» Fixed

Status:Fixed» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

eosrei’s picture

Priority:Critical» Major
Status:Closed (fixed)» Needs review
StatusFileSize
new4.15 KB

I ran into this issue upgrading a client's large site. Rather than risk data loss, I decided check for duplication before saving the file to file_managed.

A single drupal_set_message() is set if duplication is found, and watchdog() called for each instance of duplication. The found FID is used in place of the old FID. The watchdog() calls for duplicate URIs are warnings. The watchdog() calls for duplicate FID, but different URIs are errors. I changed the code from db_merge() to drupal_write_record() for brevity, plus my code editor removed some extraneous whitespace.

Example of one of the watchdog() error messages:

Filefield FID duplication found for nid: 2765, field:field_blog_image. Old fid:9268. Old uri:public://picture_11.png. New fid:7780. New uri:public://Picture_11.png. File data may be incorrect.
eosrei’s picture

StatusFileSize
new4.61 KB

Bugfix: modified patch to update the filefield's fid value.

bryancasler’s picture

eosrei, Thanks so much for sharing your solution back with the community

eosrei’s picture

You're welcome.
Just noticed, the patch above includes the patch from #1293858: Typo in content_migrate.filefield.inc.

squiggy’s picture

Status:Needs review» Reviewed & tested by the community

Confirmed that the patch in #20 works. Thank you! :)

SangersDrupalDude’s picture

I have lots of issues with this, a file can be deleted, leaving a path in the revision info. A new file with the same path can be added and perhaps attached to a different node, with a different fid, I will have a look at running again, and perhaps empty files_managed, or try the patches, been trying to clean up duplicate path entries, and just leave the most current fid:

  $query = db_select('files', 'f');
  $query->addField('f', 'filepath', 'filepath');
  $query->addField('f', 'fid', 'fid');
  $query->addExpression('COUNT(*)', 'count');
  $query->groupBy('filepath');
  $query->orderBy('fid', 'ASC');
  $results = $query->execute();
  foreach ($results as $result) {
    if($result->count>1){
      db_delete('files')->condition('fid', $result->fid, '!=')->condition('filepath', $result->filepath, '=')->execute();
    }
  }
jenlampton’s picture

Status:Reviewed & tested by the community» Needs work

@eosrei thanks for the patch.

I've got a site with files named "Podium.jpg" and "podium.jpg". I think the ideal situation would be that on upgrade Drupal would rename one of the files - and the reference to it - podium_0.jpg rather than throwing out one set of data entirely and just logging it.

Even with logging errors - there's a huge possibility for data loss here if the logs are misunderstood, ignored, or overlooked. I think we can find a better solution - at least to the case insensitive duplicate filename issue mentioned by solotandem in #7.

eosrei’s picture

@jenlampton This patch fixes the error. It does not fix your data. It would be great to fix your data too, but that can be subjective.

My workflow once I built this patch was: backup site, run CCK migrate, restore backup, fix the errors in the db, and run CCK migrate again. I know this isn't the ideal solution, but we had a deadline ;) and discovered only six bad files in 3GB of sites/default/files.

jenlampton’s picture

@esori, The error is there for a reason - because there's a problem with your data. Let's fix the data problem and not just remove the errors :)

Drupal "subjectively" fixes your data already, when people try to upload a file with the same name as another, Drupal renames it with an _0, _1, _2, etc. There's no reason we can't do the same thing on upgrade - and retain the relationships between those files and their data.

You are lucky that you didn't have that many bad files, probably smartish end users? But I still think this patch in it's current state should not be applied to the cck content_migrate module as-is, it's just too dangerous.

I'm working on a data clean-up version now. Will post here when I'm done.

jenlampton’s picture

Status:Needs work» Needs review
StatusFileSize
new2.85 KB

Attached is a patch that does a buch of cleanup on the files table (and actual files) before the content_migrate process - to assure that files with upper/lower variations of the same name will not create a conflict with the URI.

There's one TODO in there - I still need to figure out how drupal determines weather these files should be public or private on upgrade. If anyone can shed any light on that I'd be grateful :)

jenlampton’s picture

Found it. :-)

SangersDrupalDude’s picture

FYI file_directory_path is not in D7 although you will have carried it over from D6.
I also found an issue with user pics, which I converted to private downloads, wouldn't display unless filename started with "picture-xxx.jpg". Heres my update fix:

/**
* Fix user pictures for private downloads.
*/
function startupseven_update_7047() {
  $path_to_user_pics = realpath('sites/default/files/domains/wtsi/userpics/');
  $userfiles = file_scan_directory($path_to_user_pics, '/.*\.jpg$/');
  foreach($userfiles as $key => $value){
    if( ($value->uri != '') && ($value->filename != '') && ($value->name != '') ){
      $uri = $value->uri;
      $filename = $value->filename;
      $name = $value->name;
      $newuri =  (str_replace($filename, '', $uri) . 'picture-' . $filename);
      if(strpos($filename, 'picture-')!==0){
        rename($uri, $newuri);
      }
    }
  }
  variable_set('user_picture_default','private://domains/wtsi/userpics/picture-avatar.jpg');
  $query = db_select('file_managed', 'fm');
  $query->fields('fm', array('fid', 'uid', 'filename', 'uri', 'filemime', 'filesize', 'status', 'timestamp'));
  $query->condition('uri', 'sites/default/%', 'LIKE');
  $results = $query->execute();
  foreach($results as $result){
    if($result->fid!=''){
      db_delete('file_usage')->condition('fid', (int) $result->fid)->execute();
      $result->uri = str_replace('sites/default/files/', 'private://', $result->uri);
      if(strpos($result->filename, 'picture-')!==0){
        $result->uri = str_replace($result->filename, 'picture-'.$result->filename, $result->uri);
        $result->filename = 'picture-'. $result->filename;
      }
      db_merge('file_managed')->key(
        array(
          'fid' => (int) $result->fid,
        )
      )->fields(
        array(
          'fid'   =>  (int) $result->fid,
          'uid'   => (int) 1,
          'filename' => (string) $result->filename,
          'uri'     => (string) $result->uri,
          'filemime' => (string) 'image/jpeg',
          'filesize' => (int) $result->filesize,
          'status' => (int) 1,
          'timestamp' => REQUEST_TIME,
        )
      )->execute();
      db_merge('file_usage')->key(
        array(
          'fid' => (int) $result->uid,
        )
      )->fields(
        array(
          'fid'   =>  (int) $result->fid,
          'module'   => (string) 'user',
          'type'   => (string) 'user',
          'id'   => (int) 1,
          'count'     => (int) 1,
        )
      )->execute();
    }
  }
  node_access_rebuild(TRUE);
  drupal_flush_all_caches();
  watchdog('startupseven', 'update_7047 - Fixed user pictures for private downloads.');
  return 'update_7047 - Fixed user pictures for private downloads.';
}
jenlampton’s picture

Status:Needs review» Needs work

Hm, then this patch still needs work.

SangersDrupalDude’s picture

I did my own migrate in the end as I got fed up of looking for bugs in other peoples code:

/**
* Migrate file_managed and file_usage values before file migrate breaks everything by overriting the values.
*/
function startupseven_update_7049() {
  $maxid = db_select('files','f')->fields('f', array('fid'))->orderby('fid','DESC')->range(0,1)->execute()->fetchCol();
  $maxid = $maxid[0]+1;
  $query = db_select('users', 'u');
  $query->fields('u', array('uid', 'picture'));
  $query->condition('picture', 0, '>');
  $query->orderBy('uid', 'ASC');
  $results = $query->execute();
  foreach($results as $result){
    $query3 = db_select('file_managed', 'fm');
    $query3->fields('fm', array('fid', 'uid', 'filename', 'uri', 'filemime', 'filesize', 'status', 'timestamp'));
    $query3->condition('fid', $result->picture);
    $results3 = $query3->execute();
    foreach($results3 as $result3){
      db_delete('file_managed')->condition('fid', $result->picture)->execute();
      if(strpos($result3->filename, 'picture-')===0){
        db_delete('files')->condition('fid', $result->picture)->execute();
        db_merge('file_managed')->key(
          array(
            'fid' => (int) $maxid,
          )
        )->fields(
          array(
            'fid' => (int) $maxid,
            'uid' => (int) $result->uid,
            'filename' => (string) $result3->filename,
            'uri' => (string) $result3->uri,
            'filemime' => (string) $result3->filemime,
            'filesize' => (int) $result3->filesize,
            'status' => (int) $result3->status,
            'timestamp' => (int) $result3->timestamp,
          )
        )->execute();
      }
      db_delete('file_usage')->condition('fid', $result->picture)->execute();
      if(strpos($result3->filename, 'picture-')===0){
        db_merge('file_usage')->key(
          array(
            'fid' => (int) $maxid,
          )
        )->fields(
          array(
            'fid' => (int) $maxid,
            'module' => (string) 'user',
            'type' => (string) 'user',
            'id' => (int) $result->uid,
            'count' => (int) 1,
          )
        )->execute();
      }
      db_merge('users')->key(
        array(
          'uid' => (int) $result->uid,
        )
      )->fields(
        array(
         'picture' => (int) $maxid,
        )
      )->execute();
      $maxid++;
    }
  }
  $query = db_select('files', 'f');
  $query->fields('f', array('fid', 'uid', 'filename', 'filepath', 'filemime', 'filesize', 'status', 'timestamp'));
  $query->condition('status', 0, '>');
  $query->orderBy('fid', 'ASC');
  $results = $query->execute();
  foreach($results as $result){
    $result->filepath = str_replace('sites/default/files/', 'private://', $result->filepath);
    //db_delete('files')->condition('fid', $result->picture)->execute();
    db_delete('file_managed')->condition('uri', $result->filepath)->execute();
    db_merge('file_managed')->key(
      array(
        'fid' => (int) $result->fid,
      )
    )->fields(
      array(
        'fid' => (int) $result->fid,
        'uid' => (int) $result->uid,
        'filename' => (string) $result->filename,
        'uri' => (string) $result->filepath,
        'filemime' => (string) $result3->filemime,
        'filesize' => (int) $result3->filesize,
        'status' => (int) $result3->status,
        'timestamp' => (int) $result3->timestamp,
      )
    )->execute();
    db_merge('file_usage')->key(
      array(
        'fid' => $result->fid,
      )
    )->fields(
      array(
        'fid' => (int) $result->fid,
        'module' => (string) 'imce',
        'type ' => (string) 'file',
        'id' => (int) $result->fid,
        'count' => (int) $result->count,
      )
    )->execute();
  }
  watchdog('startupseven', 'update_7049 - Migrated file_managed and file_usage values before file migrate breaks everything by overriting the values.');
  return 'update_7049 - Migrated file_managed and file_usage values before file migrate breaks everything by overriting the values.';
}
SangersDrupalDude’s picture

Title:D6 -> D7 upgrade: Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'» Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'
Status:Needs review» Needs work

This fixes about 80% of any missing file managed entries (as long as the uri does not already exist in the files_managed table, or the wrong fid has been added to the files_managed table.

/**
* Fix missing file_managed entries.
*/

function postmigrate_update_7009() {
  $checktables = array(
    'field_data_field_book_docs' => 'field_book_docs_fid',
    'field_data_field_chitchat_docs' => 'field_chitchat_docs_fid',
    'field_data_field_forum_docs' => 'field_forum_docs_fid',
    'field_data_field_group_docs' => 'field_group_docs_fid',
    'field_data_field_services_docs' => 'field_services_docs_fid',
  );
  $missingdocs=0;
  $cantfixmissingdocs=0;
  foreach($checktables as $key => $value){
    if(isset($value)){
      $query = db_select($key, 'ttt');
      $query->fields('ttt', array($value, 'entity_id') );
      $query->orderBy($value, 'ASC');
      $records = $query->execute();
      foreach($records as $record){
        if(isset($record->$value)){
          $query2 = db_select('file_managed', 'fm');
          $query2->fields('fm', array('fid', 'uri') );
          $query2->condition('fid', $record->$value);
          $query2->orderBy('fid', 'ASC');
          $records2 = $query2->execute();
          $valid=0;
          if(is_object($records2)){
            foreach($records2 as $record2){
              if(is_object($record2)){
                if(isset($record2->fid)){
                  $valid=1;
                }
              }
            }
          }
          if($valid==0){
            $query3 = db_select('files', 'f');
            $query3->fields('f', array('fid', 'uid', 'filename', 'filepath', 'filemime', 'filesize', 'status', 'timestamp') );
            $query3->orderBy('fid', 'ASC');
            $query3->condition('fid', $record->$value);
            $records3 = $query3->execute();
            foreach($records3 as $record3){
              if(is_object($record3)){
                $filepathok=1;
                $query4 = db_select('file_managed', 'fm');
                $query4->fields('fm', array('uri','fid') );
                $query4->condition(  'uri', str_replace('sites/default/files/', 'private://', $record3->filepath), '=' );
                $records4 = $query4->execute();
                foreach($records4 as $record4){
                  if(isset($record4->uri)){
                    if($record4->uri == str_replace('sites/default/files/', 'private://', $record3->filepath)){
                      $filepathok=0;
                      $cantfixmissingdocs++;
                    }
                  }
                }
                if($filepathok==1){
                  $missingdocs++;
                  $newfid = db_insert('file_managed')->fields(
                   array(
                     'uid' => (int) $record3->uid,
                     'filename' => (string) $record3->filename,
                     'uri' => (string) str_replace('sites/default/files/', 'private://', $record3->filepath),
                     'filemime' => (string) $record3->filemime,
                     'filesize' => (int) $record3->filesize,
                     'status' => (int) $record3->status,
                     'timestamp' => (int) $record3->timestamp,
                   )
                  )->execute(); 
                  db_merge($key)->key(
                    array(
                      $value => (int) $record3->fid,
                    )
                  )->fields(
                    array(
                      $value => (int) $newfid,
                    )
                  )->execute();
                }
                else{
                  $dupenode=1;
                  foreach($checktables as $key => $value){
                    if(isset($value)){
                      $query5 = db_select($key, $key);
                      $query5->fields($key, array($value, 'entity_id') );
                      $query5->condition('entity_id', $record->entity_id, '=');
                      $query5->orderBy($value, 'ASC');
                      $records5 = $query5->execute();
                      $isadupe=1;
                      foreach($records5 as $record5){
                        if( ($record->entity_id==$record5->entity_id)  && ($record5->$value==$record4->fid) ){
                          $query9 = db_select($key, $key);
                          $query9->addField($key, $value);
                          $query9->addField($key, 'entity_id');
                          $query9->condition('entity_id', $record5->entity_id, '=');
                          $query9->condition($value, $record4->fid, '=');
                          $records9 = $query9->execute();
                          $doupdate=0;
                          foreach ($records9 as $record9) {
                            if($record9->$value){
                              $fid='';
                              $query222 = db_select($key, $key);
                              $query222->addField($key, $value);
                              $query222->condition($value, $record4->fid, '=');
                              $records222 = $query222->execute();
                              foreach ($records222 as $record222) {
                                $fid = $record222->$value;
                                if($fid>0){
                                  // this went wrong and changed all fid's to point at one fid.
                                }
                                else{
                                  $uri='';
                                  $query223 = db_select('file_managed', 'file_managed');
                                  $query223->addField('file_managed', 'uri');
                                  $query223->addField('file_managed', 'fid');
                                  $query223->condition('uri', str_replace('sites/default/files/', 'private://', $record3->filepath), '=');
                                  $query223->orderBy('uri', 'ASC'); 
                                  $records223 = $query223->execute();
                                  foreach ($records223 as $record223) {
                                    $uri = $record223->uri;
                                  }
                                  if($uri==''){
                                    db_insert('file_managed')->fields(
                                      array(
                                        'fid' => (int) $record3->fid,
                                        'uid' => (int) $record3->uid,
                                        'filename' => (string) $record3->filename,
                                        'uri' => (string) str_replace('sites/default/files/', 'private://', $record3->filepath),
                                        'filemime' => (string) $record3->filemime,
                                        'filesize' => (string) $record3->filesize,
                                        'status' => (int) $record3->status,
                                        'timestamp' => (int) $record3->timestamp,
                                      )
                                    )->execute();
                                  }
                                  else{
                                    $query = db_merge($key)->key(
                                      array(
                                        $value => (int) $record3->fid,
                                      )
                                    )->fields(
                                      array(
                                        $value => (int) $record223->fid,
                                      )
                                    )->execute();
                                  }
                                }
                              }
                            }
                          }
                        }
                      }  // end foreach $records5
                    }  // end if(isset($value)){
                  }  // end foreach $checktables
                  if($dupenode==1){
                    $query33 = db_select('files', 'f');
                    $query33->fields('f', array('fid', 'uid', 'filename', 'filepath', 'filemime', 'filesize', 'status', 'timestamp') );
                    $query33->orderBy('fid', 'ASC');
                    $query33->condition('fid', $record->$value);
                    $records33 = $query33->execute();
                    foreach($records33 as $record33){
                      if(is_object($record33)){
                        if(is_numeric($record33->fid)){
                          $uri='';
                          $query22 = db_select('file_managed', 'file_managed');
                          $query22->addField('file_managed', 'uri');
                          $query22->addField('file_managed', 'fid');
                          $query22->condition('uri', str_replace('sites/default/files/', 'private://', $record33->filepath), '=');
                          $query22->orderBy('uri', 'ASC'); 
                          $records22 = $query22->execute();
                          foreach ($records22 as $record22) {
                            $uri = $record22->uri;
                          }
                          if($uri==''){
                            db_insert('file_managed')->fields(
                              array(
                                'fid' => (int) $record33->fid,
                                'uid' => (int) $record33->uid,
                                'filename' => (string) $record33->filename,
                                'uri' => (string) str_replace('sites/default/files/', 'private://', $record33->filepath),
                                'filemime' => (string) $record33->filemime,
                                'filesize' => (string) $record33->filesize,
                                'status' => (int) $record33->status,
                                'timestamp' => (int) $record33->timestamp,
                              )
                            )->execute();
                          }
                          else{
                            db_update($key)->fields(array($value => (int) $record22->fid))->condition($value, $record->$value, '=')->execute();
                          }
                        }
                      }
                    }
                    //watchdog('postmigrate', 'node ' . $record->entity_id . ' has a revision_id of ' . $revision_id . ' in table=' . $value . ' is fid=' . $record->$value);
                  } // end if($dupenode==1){
                } // end if($filepathok==0){
              } // if $filepathok=0
            } // end foreach $record3
          } // end valid=0
        } // end if $value
      } // end foreach $record
    }  // end if $value
  } // end foreach $checktables
  watchdog('postmigrate', 'update_7009 - Fixed missing file_managed entries, corrected ' . $missingdocs . ' entries. Could not fix ' . $cantfixmissingdocs . ' entries.');
  return 'update_7009 - Fixed missing file_managed entries, corrected ' . $missingdocs . ' entries. Could not fix ' . $cantfixmissingdocs . ' entries.';
}

SangersDrupalDude’s picture

also can past this into a php page to check for dupes in the files table, and missing entries in the files_managed table, modify $checktables array as required

<?php
  $dupecount
=0;
 
$dupecountfinder=0;
 
$checktables = array(
   
'field_data_field_book_docs' => 'field_book_docs_fid',
   
'field_data_field_chitchat_docs' => 'field_chitchat_docs_fid',
   
'field_data_field_forum_docs' => 'field_forum_docs_fid',
   
'field_data_field_group_docs' => 'field_group_docs_fid',
   
'field_data_field_services_docs' => 'field_services_docs_fid',
  );
 
$query = db_select('files', 'files');
 
$query->addField('files', 'fid');
 
$query->addField('files', 'filepath');
 
$query->addExpression('COUNT(*)', 'count');
 
$query->groupBy('filepath');
 
$query->orderBy('fid', 'ASC');
 
$results = $query->execute();
  foreach (
$results as $result) {
    if(
$result->count>1){
     
$dupecountfinder++;
     
$query4 = db_select('files', 'files');
     
$query4->addField('files', 'fid');
     
$query4->addField('files', 'filepath');
     
$query4->condition('filepath', $result->filepath, '=');
     
$query4->orderBy('fid', 'ASC');
     
$results4 = $query4->execute();
      foreach (
$results4 as $result4) {
        print
'checking path=' . $result->filepath  . '<br />';
       
$deletethisdupe=1;
        foreach(
$checktables as $key => $value){
         
$query3 = db_select($key, $key);
         
$query3->fields($key, array($value) );
         
$query3->condition($value, $result4->fid, '=');
         
$records3 = $query3->execute();
          if(
is_object($records3)){
            foreach(
$records3 as $record3){
              if(
$record3->$value == $result4->fid){
               
$deletethisdupe=0;
                print
'Do not need to delete ' . $result4->fid . ' path=' . $result4->filepath . ', record found in ' . $key . ' table.<br/>';
              }
            }
          }
        }
        if(
$deletethisdupe==1){
          print
'need to delete ' . $result4->fid . ' path=' . $result4->filepath . '<br/>';
         
$dupecount++;
        }
      }
      print
'<br/><br/>';
    }
  }
  print
'found ' . $dupecountfinder . ' entries with duplicates, need to check  ' . $dupecount . ' records.';
?>

<hr />

<?php
  $checktables
= array(
   
'field_data_field_book_docs' => 'field_book_docs_fid',
   
'field_data_field_chitchat_docs' => 'field_chitchat_docs_fid',
   
'field_data_field_forum_docs' => 'field_forum_docs_fid',
   
'field_data_field_group_docs' => 'field_group_docs_fid',
   
'field_data_field_services_docs' => 'field_services_docs_fid',
  );
 
$missingdocs=0;
  foreach(
$checktables as $key => $value){
    if(isset(
$value)){
     
$query = db_select($key, 'ttt');
     
$query->fields('ttt', array($value) );
     
$query->orderBy($value, 'ASC');
     
$records = $query->execute();
      foreach(
$records as $record){
        if(isset(
$record->$value)){
         
$query2 = db_select('file_managed', 'fm');
         
$query2->fields('fm', array('fid', 'uri') );
         
$query2->condition('fid', $record->$value);
         
$query2->orderBy('fid', 'ASC');
         
$records2 = $query2->execute();
         
$valid=0;
          if(
is_object($records2)){
            foreach(
$records2 as $record2){
              if(
is_object($record2)){
                if(isset(
$record2->fid)){
                 
$valid=1;
                }
              }
            }
          }
          if(
$valid==0){
           
$missingdocs++;
           
$query3 = db_select('files', 'f');
           
$query3->fields('f', array('fid', 'uid', 'filename', 'filepath', 'filemime', 'filesize', 'status', 'timestamp') );
           
$query3->orderBy('fid', 'ASC');
           
$query3->condition('fid', $record->$value);
           
$records3 = $query3->execute();
            foreach(
$records3 as $record3){
              if(
is_object($record3)){
                print
str_replace('sites/default/files/', 'private://', $record3->filepath) . '<br />';
              }
            }
          }
        }
      }
    }
  }
  print
'<br />missing docs='.$missingdocs;
?>

jenlampton’s picture

Title:Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'» D6 -> D7 upgrade: Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'
Status:Needs work» Needs review

@SangersDrupalDude since this issue is about upgrading from D6 to D7, then I'm not sure using file_directory_path is a bad idea. I'd love to get some other opinions on that. For now I'm re-marking #31 as needs review.

There's already an issue about turning user pictures into managed files, as well as one about making it possible for user pictures to be private instead of forcing them to always be public. I think maybe it would be good to note the work you've done where people are working on solutions to those problems too.

SangersDrupalDude’s picture

Title:Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'» D6 -> D7 upgrade: Duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'
Status:Needs work» Needs review

and a bit more work...
Just need to compare content_field_*_docs from D6 and check that all the D7 entries are present.

/**
* Fix missing file_managed entries - wrong fid put in file_managed, bad path to file in in file_managed
*/
function postmigrate_update_7010() {
  $checktables = array(
    'field_data_field_book_docs' => 'field_book_docs_fid',
    'field_data_field_chitchat_docs' => 'field_chitchat_docs_fid',
    'field_data_field_forum_docs' => 'field_forum_docs_fid',
    'field_data_field_group_docs' => 'field_group_docs_fid',
    'field_data_field_services_docs' => 'field_services_docs_fid',
  );
  $missingdocs=0;
  foreach($checktables as $key => $value){
    $type=str_replace('_docs_fid', '', $value);
    $type=str_replace('field_', '', $type);
    if(isset($value)){
      $query = db_select($key, 'ttt');
      $query->fields('ttt', array($value, 'entity_id') );
      $query->orderBy($value, 'ASC');
      $records = $query->execute();
      foreach($records as $record){
        if(isset($record->$value)){
          $query2 = db_select('file_managed', 'fm');
          $query2->fields('fm', array('fid', 'uri') );
          $query2->condition('fid', $record->$value);
          $query2->orderBy('fid', 'ASC');
          $records2 = $query2->execute();
          $valid=0;
          if(is_object($records2)){
            foreach($records2 as $record2){
              if(is_object($record2)){
                if(isset($record2->fid)){
                  $valid=1;
                }
              }
            }
          }
          if($valid==0){
            $missingdocs++;
            $query3 = db_select('files', 'f');
            $query3->fields('f', array('fid', 'uid', 'filename', 'filepath', 'filemime', 'filesize', 'status', 'timestamp') );
            $query3->orderBy('fid', 'ASC');
            $query3->condition('fid', $record->$value);
            $records3 = $query3->execute();
            foreach($records3 as $record3){
              if(is_object($record3)){
                $query4 = db_select('file_managed', 'file_managed');
                $query4->fields('file_managed', array('fid', 'uri') );
                $query4->condition('uri', str_replace('sites/default/files/', 'private://', $record3->filepath));
                $badfid=0;
                $badpath=0;
                $records4 = $query4->execute();
                if(is_object($records4)){
                  foreach($records4 as $record4){
                    watchdog('postmigrate', 'changing file_managed fid=' . $record4->fid . ' to fid=' . $record3->fid);
                    db_merge('file_managed')->key(
                      array(
                        'fid' => (int) $record4->fid,
                      )
                    )->fields(
                      array(
                        'fid' => (int) $record3->fid,
                      )
                    )->execute();
                    $badfid=$record4->fid;
                  }
                }
                if($badfid<1){
                  $query5 = db_select('file_managed', 'file_managed');
                  $query5->fields('file_managed', array('fid', 'filename', 'uri') );
                  $query5->condition('filename', $record3->filename);
                  $records5 = $query5->execute();
                  if(is_object($records5)){
                    $numentries=0;
                    foreach($records5 as $record5){
                      if($record5->fid){
                        $numentries++;
                        if($numentries==1){
                          watchdog('postmigrate', 'wrong path and fid ' . $record5->fid . ':' . $record5->uri . ', updated to ' . str_replace('sites/default/files/', 'private://', $record3->filepath) . ':' . $record3->fid);
                          db_merge('file_managed')->key(
                            array(
                              'uri' => (string) str_replace('sites/default/files/', 'private://', $record5->uri),
                            )
                          )->fields(
                            array(
                              'uri' => (string)  str_replace('sites/default/files/', 'private://', $record3->filepath),
                              'fid' => (int) $record3->fid,
                            )
                          )->execute();
                        }
                        else{
                          db_delete('file_managed')->condition('fid', $record5->fid, '=')->execute();
                          watchdog('postmigrate', 'incorrect duplicate path deleted - ' . $record5->fid . ':' . str_replace('sites/default/files/', 'private://', $record5->uri));
                        }
                      }
                      $badpath=1;
                    }
                  }
                  if($badpath==0){
                    db_insert('file_managed')->fields(
                     array(
                       'fid' => (int) $record3->fid,
                       'uid' => (int) $record3->uid,
                       'filename' => (string) $record3->filename,
                       'uri' => (string) str_replace('sites/default/files/', 'private://', $record3->filepath),
                       'filemime' => (string) $record3->filemime,
                       'filesize' => (int) $record3->filesize,
                       'status' => (int) $record3->status,
                       'timestamp' => (int) $record3->timestamp,
                     )
                    )->execute(); 
                  }
                }
              }
            }
          }
        }
      }
    }
    $dupecount=0;
    $dupecountfinder=0;
    $checktables = array(
      'field_data_field_book_docs' => 'field_book_docs_fid',
      'field_data_field_chitchat_docs' => 'field_chitchat_docs_fid',
      'field_data_field_forum_docs' => 'field_forum_docs_fid',
      'field_data_field_group_docs' => 'field_group_docs_fid',
      'field_data_field_services_docs' => 'field_services_docs_fid',
    );
    $queryxx = db_select('files', 'files');
    $queryxx->addField('files', 'fid');
    $queryxx->addField('files', 'filepath');
    $queryxx->addExpression('COUNT(*)', 'count');
    $queryxx->groupBy('filepath');
    $queryxx->orderBy('fid', 'ASC');
    $resultsxx = $queryxx->execute();
    foreach ($resultsxx as $resultxx) {
      if($resultxx->count>1){
        $dupecountfinder++;
        $query44 = db_select('files', 'files');
        $query44->addField('files', 'fid');
        $query44->addField('files', 'filepath');
        $query44->addField('files', 'filename');
        $query44->condition('filepath', $resultxx->filepath, '=');
        $query44->orderBy('fid', 'ASC');
        $results44 = $query44->execute();
        foreach ($results44 as $result44) {
          $deletethisdupe=1;
          foreach($checktables as $key => $value){
            $query33 = db_select($key, $key);
            $query33->fields($key, array($value, 'entity_id') );
            $query33->condition($value, $result44->fid, '=');
            $records33 = $query3->execute();
            if(is_object($records33)){
              foreach($records33 as $record33){
                if($record33->$value == $result44->fid){
                  $deletethisdupe=0;
                }
              }
            }
          }
          if($deletethisdupe==1){
            db_delete('files')->condition('fid', $result44->fid, '=')->execute();
            $dupecount++;
          }
        }
      }
    }
  }
  watchdog('postmigrate', 'update_7010 - Fixed missing file_managed entries  - wrong fid put in file_managed, bad path to file in in file_managed');
  return 'update_7010 - Fixed missing file_managed entries  - wrong fid put in file_managed, bad path to file in in file_managed';
}
SangersDrupalDude’s picture

My file data is now correct, here is my script to confirm that migration has worked.

function startupseven_migrate_page(){
  $content='';
  $contenthasmigrated='';
  $themissing=0;
  $thetotal=0;
  $newtables = array(
    'field_data_field_book_docs' => 'book_docs',
    'field_data_field_chitchat_docs' => 'chitchat_docs',
    'field_data_field_forum_docs' => 'forum_docs',
    'field_data_field_group_docs' => 'group_docs',
    'field_data_field_services_docs' => 'services_docs',
    'field_data_field_page_uploads'  =>  'page_uploads',
  );
  $oldtables = array(
    'field_data_field_book_docs' => 'book_docs',
    'field_data_field_group_docs' => 'group_docs',
    'field_data_field_services_docs' => 'services_docs',
    'field_data_field_page_uploads' => 'page_uploads',
  );
  foreach($oldtables as $oldtableskey => $oldtablesvalue){
    $query = db_select('content_field_'.$oldtablesvalue, 'content_field_'.$oldtablesvalue);
    $query->fields('content_field_'.$oldtablesvalue, array('vid', 'nid', 'delta', 'field_' . $oldtablesvalue . '_fid', 'field_' . $oldtablesvalue . '_list', 'field_' . $oldtablesvalue . '_data') );
    $thefield3 = 'field_' . $oldtablesvalue . '_fid';
    $thefield4 = 'field_' . $oldtablesvalue . '_list';
    $query->condition($thefield3, 0, '>');
    $query->condition($thefield4, 0, '>');
    $query->orderBy('vid', 'ASC');
    $query->groupBy($thefield3);
    $results = $query->execute();
    $thefield6 = 'field_' . $oldtablesvalue . '_fid';
    foreach ($results as $result) {
      if($result->$thefield6>1){
        $thetotal++;
        $thefield2 = 'field_' . $oldtablesvalue . '_fid';
        $thefilename='';
        $thefilepath='';
        $query3 = db_select('files', 'files');
        $query3->fields('files', array('filename', 'filepath') );
        $query3->condition('fid', $result->$thefield2, '=');
        $results3 = $query3->execute();
        foreach ($results3 as $result3) {
          $thefilename = $result3->filename;
          $thefilepath = $result3->filepath;
        }
        if($thefilename!=''){
          $thefield5 =  'field_' . $newtables[$oldtableskey] . '_fid';
          $query5 = db_select('file_managed', 'file_managed');
          $query5->fields('file_managed', array('fid', 'filename', 'uri') );
          $query5->condition('uri',  str_replace('sites/default/files/', 'private://', $thefilepath), '=');
          $results5 = $query5->execute();
          $thefid='';
          foreach ($results5 as $result5) {
            $thefid = $result5->fid;
          }
          if($thefid<1){
            $thefid = $result->$thefield5;
          }
          $query6 = db_select('node', 'node');
          $query6->fields('node', array('nid', 'vid', 'status') );
          $query6->condition('status',  1, '=');
          $query6->condition('vid',  $result->vid, '=');         
          $results6 = $query6->execute();
          $livevid='';
          foreach ($results6 as $result6) {
            $livevid = $result6->vid;
          }
          if($livevid>0){
            $query2 = db_select($oldtableskey, $oldtableskey);
            $query2->fields($oldtableskey, array('entity_type', 'bundle', 'deleted', 'entity_id', 'revision_id', 'language', 'delta', 'field_' . $newtables[$oldtableskey] . '_fid', 'field_' . $newtables[$oldtableskey] . '_display',  'field_' . $newtables[$oldtableskey] . '_description') );
            $query2->orderBy('entity_id', 'ASC');
            //$query2->condition($thefield5, $result->$thefield5, '=');
            $query2->condition($thefield5, $thefid, '=');
            $results2 = $query2->execute();
            $hasmigrated=0;
            foreach ($results2 as $result2) {
              $thefield1 = 'field_' . $newtables[$oldtableskey] . '_fid';
              if($result2->$thefield1 > 0){
                $hasmigrated=1;
              }
            } // end foreach ($results2 as $result2) {
            if($hasmigrated==1){
              $contenthasmigrated .= '';
              $contenthasmigrated .='<br /><br />Passed: <a href="/node/' . $result->nid . '">' . $result->nid . '</a>[<a href="https://helix.wtgc.org/node/' . $result->nid . '">LIVE</a>]. ';
              $contenthasmigrated .='Field I.D. ' . $result->$thefield2 . ' from ' . 'content_field_'.$oldtablesvalue . ' was not migrated to ' . $oldtableskey;
              $contenthasmigrated .= '<br />path=' . $thefilepath . ', filename=' . $thefilename;
            }   
            if($hasmigrated==0){
              $themissing++;
              $content .='<br /><br />Failed: <a href="/node/' . $result->nid . '">' . $result->nid . '</a>[<a href="https://helix.wtgc.org/node/' . $result->nid . '">LIVE</a>]. ';
              $content .='Field I.D. ' . $result->$thefield2 . ' from ' . 'content_field_'.$oldtablesvalue . ' was not migrated to ' . $oldtableskey;
              $content .= '<br />path=' . $thefilepath . ', filename=' . $thefilename;
            }
          }
        }
      } // end if($result$thefield6>1){
    } // end foreach ($results as $result) {
  }  // end foreach($oldtables
  $content.='<br /><br />Total missing references: ' . $themissing  . ' out of ' . $thetotal;
  $content.= '<hr/>'.$contenthasmigrated;
  return $content;
}
SangersDrupalDude’s picture

and then fixed file usage table.

/**
* Fix missing file_usage entries.
*/
function postmigrate_update_7011() {
  //$content='';
  $notfound=0;
  $found=0;
  $contentgood='';
  $checktables = array(
    'field_data_field_book_docs' => 'field_book_docs_fid',
    'field_data_field_chitchat_docs' => 'field_chitchat_docs_fid',
    'field_data_field_forum_docs' => 'field_forum_docs_fid',
    'field_data_field_group_docs' => 'field_group_docs_fid',
    'field_data_field_services_docs' => 'field_services_docs_fid',
    'field_data_field_page_uploads'  =>  'field_page_uploads_fid',
  );
  $query = db_select('file_managed', 'file_managed');
  $query->fields('file_managed', array('fid', 'filename', 'uri') );
  $query->condition('status', 1, '=');
  $query->orderBy('fid', 'ASC');
  $records = $query->execute();
  if(is_object($records)){
    foreach($records as $record){
      if($record->fid){
        $entrycount=0;
        $revisioncount=0;
        $nids=array();
        $revisionids=array();
        foreach($checktables as $key => $value){
          $query2 = db_select($key, $key);
          $query2->fields($key, array($value, 'entity_id', 'entity_type', 'revision_id') );
          $query2->condition($value, $record->fid, '=');
          $query2->condition('entity_type', 'node', '=');
          $query2->orderBy('entity_id', 'ASC');
          $records2 = $query2->execute();
          if(is_object($records2)){
            foreach($records2 as $record2){
              if($record2->$value == $record->fid){
                $entrycount++;
                $nids[$record2->entity_id]=$record2->entity_id;
              } // end if($record2->$value == $record->fid){
            } // end foreach($records2 as $record2){
          } // end if(is_object($records2)){
        } // end  foreach($checktables as $key => $value){
        foreach($nids as $nidkey => $nidvalue){
          $query3 = db_select('node_revision', 'node_revision');
          $query3->fields('node_revision', array('nid', 'vid') );
          $query3->condition('nid', $nidvalue, '=');
          $records3 = $query3->execute();
          if(is_object($records3)){
            foreach($records3 as $record3){
              if($record3->nid>0){
                $revisioncount++;
              }
            } // end foreach($records3 as $record3){
          } // end if(is_object($records3)){
        }
        $fileusagenodeentryfound=0;
        $fileusageimceentryfound=0;
        if($entrycount>0){
          $query4 = db_select('file_usage', 'file_usage');
          $query4->fields('file_usage', array('fid', 'module', 'type', 'id', 'count') );
          $query4->condition('fid', $record->fid, '=');
          $records4 = $query4->execute();
          if(is_object($records4)){
            foreach($records4 as $record4){
              if($record4->fid>0){
                if($record4->module=='file'){
                  $fileusagenodeentryfound=1;
                }
                if($record4->module=='imce'){
                  $fileusageimceentryfound=1;
                }
              }
            }
          }
          $usagecount=0;
          foreach($nids as $nidkey => $nidvalue){
            $query6 = db_select('node_revision', 'node_revision');
            $query6->fields('node_revision', array('nid', 'vid', 'status') );
            $query6->condition('status', 1, '=');
            $query6->condition('nid', $nidkey, '=');
            $records6 = $query6->execute();
            if(is_object($records6)){
              foreach($records6 as $record6){
                if($record6->vid>0){
                  $revisionids[]=$record6->vid;
                }
              }
            }
          }
          if( ($fileusagenodeentryfound==1) && ($fileusageimceentryfound==1) ){
            $found++;
            //$contentgood .= '<br />Found: fid=' . $record->fid . ' nid\'s=' . implode(', ', $nids) . ' node count=' . $entrycount; 
          }
          else{
            $tmpfoundrids=array();
            $usagecount=0;
            foreach($nids as $nidkey => $nidvalue){
              foreach($checktables as $tableskey => $tablesvalue){
                $tablename = str_replace('field_data_field_', 'field_revision_field_', $tableskey);
                //$content .= '<br /><br />checking table ' . $tablename . ' with nid=' . $nidvalue;
                $query5 = db_select($tablename, $tablename);
                $query5->fields($tablename, array('entity_id', 'revision_id', $tablesvalue) );
                $query5->condition('entity_id', $nidvalue, '=');
                $query5->condition($tablesvalue, $record->fid, '=');
                $records5 = $query5->execute();
                if(is_object($records5)){
                  foreach($records5 as $record5){
                    if($record5->revision_id>0){
                      $tmpfoundrids[]=$record5->revision_id;
                      if(in_array($record5->revision_id, $revisionids)){
                        $usagecount++;
                      }
                    }
                  }
                }
              }
            }
            $query = db_merge('file_usage')->key(
              array(
                'fid' => (int) $record->fid,
                'module' => (string) 'imce',
                'type ' => (string) 'file',
              )
            )->fields(
              array(
                'fid' => (int) $record->fid,
                'module' => (string) 'imce',
                'type ' => (string) 'file',
                'id ' => (int) $record->fid,
                'count ' => (int) 0,
              )
            )->execute();
            $thenid=0;
            foreach($nids as $nidkey => $nidvalue){
              if($nidvalue>0){
                $thenid=$nidvalue;
              }
            }
            $usagecount=(int)$usagecount;
            if(!is_numeric($usagecount)){
              $usagecount= (int)1;
            }
            if($usagecount<1){
              $usagecount=(int)1;
            }
            if($usagecount>0){
              $query = db_merge('file_usage')->key(
                array(
                  'fid' => (int) $record->fid,
                  'module' => (string) 'file',
                  'type ' => (string) 'node',
                )
              )->fields(
                array(
                  'fid' => (int) $record->fid,
                  'module' => (string) 'file',
                  'type ' => (string) 'node',
                  'id ' => (int) $thenid,
                  'count ' => (int) $usagecount,
                )
              )->execute();
            }
            else{
              $query = db_merge('file_usage')->key(
                array(
                  'fid' => (int) $record->fid,
                  'module' => (string) 'file',
                  'type ' => (string) 'node',
                )
              )->fields(
                array(
                  'fid' => (int) $record->fid,
                  'module' => (string) 'file',
                  'type ' => (string) 'node',
                  'id ' => (int) $thenid,
                  'count ' => 1,
                )
              )->execute();
            }
            //$content .= '<br />Not found: fid=' . $record->fid . ' nid\'s=' . implode(', ', $nids) . ' node count=' . $entrycount . ' vid count=' .$usagecount;
            $notfound++;
          }
        }  // end if($entrycount>1){
      } // end if($record->fid){
    } // end foreach($records as $record){
  } // end if(is_object($records)){
  //return 'Found: ' . $found . ' entries. Not found: ' . $notfound . ' entries. Total: ' . ($found+$notfound) . ' entries.<hr/><br /><br />' . $contentgood.'<br /><br /><hr/>'.$content;
  watchdog('postmigrate', 'update_7011 - missing file_usage entries.');
  return 'update_7010 - missing file_usage entries.';
}
philsward’s picture

I know this is pretty much already in the works to be taken care of, but I want to share that I too had this same issue this evening and confirm that node revisions caused all of my issues.

Not being a database guru, nor a php programmer, I decided to use drupal the only way I know how: trial and error.

What I did to work around this issue was:

(pre-step) Make backup of database

1) D6 Download & Install Revision Deletion
2) In the settings, set the absolute lowest times for it to check for revisions. (15 min & always)
3) Mass deleted all revisions. (The most current "revision" will be unchecked & grayed which is normal)
4) Unchecked Create Revision in the workflow of EVERY content type.
5) Disabled & Deleted Revision Deletion. (there shouldn't be an uninstall, but check to make sure)
---- (The rest is standard procedure)
6) Made backup of database (optional)
7) Disabled all contrib & core modules
8) Performed the standard procedure to upgrade to D7

AFTER UPGRADING THE DATABASE TO D7, BEFORE DOING ANYTHING ELSE, I downloaded and installed the 7.x-dev of CCK, enabling it and Content Migrate. I personally migrated all of my fields one-by-one, doing image last, but it did finally go through without a hitch!

Hope that helps someone else who needs to upgrade but can't wait for the fix.

jshenry’s picture

philsward > Firstly, Thank you!

I've probably spent about 2 days trying to get a production D6.25 database upgraded into D7.12... I'm not a coder either and was going crazy getting this upgrade to run.

I followed your steps and the upgrade was successful!

FYI - I also had to use the http://drupal.org/node/1371216#comment-5600570 to get the upgrade to take, in case it helps anyone!

mandreato’s picture

I also encountered this issue during D7 upgrade.
In my case the duplications were due to Taxonomy Image module which saves the same filepath for same image files.
For example: if you load the same term_img.png to both term1 and term2, it will create two rows into the files table with the same "sites/default/files/category_pictures/term_img.png" filepath.

I resolved by duplicating the image files, giving them a different name and reassociating to terms.
Hope this helps !

iamEAP’s picture

Status:Needs review» Closed (duplicate)

Unless I'm misunderstanding, the primary issue going on in this thread appears to be related to the fact that names/paths in the D6 files table are case sensitive while names/files in the D7 file_managed table are case insensitive.

If that's the case, this is a duplicate of a critical core bug, which is currently being blocked by another critical core bug.

Please see:
#966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case
#1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL

philsward’s picture

@iamEAP out of curiosity, how are two sets of the same field in the database, related to case sensitivity?

I have confirmed that if a node has revisions, it causes this issue. Remove the duplicate revision, and the problem goes away.

Then you do have duplicates in your file table (two file entries pointing to the same file). D7 will not allow duplicates, hence the problem in the migration. D6 allowed this, D7 does not.

In my case the duplications were due to Taxonomy Image module which saves the same filepath for same image files.

(as long as the uri does not already exist in the files_managed table, or the wrong fid has been added to the files_managed table.

I only saw one comment referencing case sensitivity so I guess I don't understand why this issue was randomly closed and pointed to core case sensitivity issues when the majority of this issue deals with duplicate file names in the database...??

Am I totally missing something?

iamEAP’s picture

@philsward

Sorry, I may have been hasty in closing the issue, but maybe we can take this opportunity to split this issue into a few actionable tasks; it's clear there are a number of contributing factors which may end up being separate issues entirely.

Certainly the case sensitivity issue can cause this error and is being worked on in separate threads.

Regarding the Taxonomy Image module causing issues: I believe this is beyond the scope of Content Migrate. Also, given that it has no D7 version, my understanding is that it should have been disabled and uninstalled during an earlier step in the upgrade process, which may have made it a non-issue during the upgrade process. If the user did do that and Taxonomy Image didn't scrub that data, then it could arguably be said to be a bug in Taxonomy Image's uninstall processes.

Regarding duplicates in the file table: this is again, I believe, beyond the scope of Content Migrate. This, and the Taxonomy Image issue above are more or less the same: the user has bad data. D6 "allowed" this insofar as it didn't enforce a key constraint at the database level, but it certainly didn't encourage it because it appended numbers to the ends of existing filenames. While Content Migrate isn't responsible for cleaning up this bad data, it may be useful to create some documentation around offending modules (like Taxonomy Image) and offer workarounds (including some of the above) there.

As for SangersDrupalDude's work, I can't really make heads or tales of it. It'd be great to see it as either a patch against Content Migrate or made into another contrib module that could be used to massage file data in the upgrade process.

philsward’s picture

No worries... : )

I suppose the revisions being an issue could probably be separated out into it's own queue and I think might be a pretty easy documentation workaround. I have a feeling there are a lot of folks using node revisions, especially in environments where there are lots of hands on the site and at least having it noted in the upgrade path that revisions are known to cause issues with the work around of deleting them, might help out quite a bit. Same goes for some of the other issues that might have been encountered.

Grab my instructions from above and throw it somewhere it's easy to find : )

radiobuzzer’s picture

Status:Closed (duplicate)» Needs work

You are right to mention that this should be split in other queues. Though, I just bumped on this issue and I keep hearing that more people have the same problem as upgrading to 7 is a need. So if this queue is to be closed, new queues should be opened regarding all aspects.

The issues with uppercasing mentioned in #43 when this ticket was closed, are only one aspect. @philsward conflicting revisions is not solved by deleting the conflicting revisions as they may be actually needed.

iamEAP’s picture

Status:Needs work» Active

Let's set this to active for now, since "needs review" implies a patch is available; we can close it once we have created and linked separate issues.

I guess step one is to identify the separate bugs. Case sensitivity issues for MySQL should be resolved as long as you're upgrading using 7.14 or higher.

radiobuzzer’s picture

I just dealt with my version of the problem, which was apparently a bug in the image field convert (image_legacy module) process.
#1577648: Upgrade to D7 fails: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry

I don't have any other separate bug myself so I won't stop you from closing the ticket. Can we elaborate more about the revisions issue?

attheshow’s picture

Here's an alternative to the steps mentioned in #40 related to installing/using/uninstalling the Revision Deletion module. If you feel comfortable using a database query, just delete all old revisions yourself via SQL:

DELETE
FROM `node_revisions`
WHERE `node_revisions`.`vid` NOT IN(SELECT `vid` FROM `node`);
attheshow’s picture

None of the above patches were working for me either. I'm using Drush to make a repeatable upgrade script for my site. I had duplicates in my files table too. I've attached the drush PHP script below that I'm using just in case anyone else finds it useful. I'm basically telling MySQL to delete all duplicate entries in the files table except for the most recent one. I do this on the legacy table before I run my actual field migrations using D7 CCK/Content Migrate module.

Here's the Drush command I use to execute the PHP script:

drush php-script apsu-d7-upgrade-content-migrate-prep --script-path=~/Sites/d7upgrade

And here is the content of the script (named "apsu-d7-upgrade-content-migrate-prep.php" and placed in my ~/Sites/d7upgrade directory):

<?php

/**
* A script for the D7 upgrade. Meant to be run using the drush php-script command.
* This script deletes duplicates from the D6 files table.
*/

$link = mysql_connect('HOST', 'USERNAME', 'PASSWORD');
if (!$link) {
    die('Could not connect: ' . mysql_error());
}
//echo 'Connected successfully';
$db_selected = mysql_select_db('LEGACY_DB_NAME') or die('Unable to connect to database. ' . mysql_error());
$query = "SELECT f1.fid, f1.filename, f1.filepath FROM files f1 WHERE (SELECT count(f2.filepath) FROM files f2 WHERE f2.filepath = f1.filepath) > 1 AND (filename LIKE '%.pdf' OR filename LIKE '%.doc%' OR filename LIKE '%.ppt%') ORDER BY f1.filename, f1.fid DESC";
$result = mysql_query($query) or die ("Query failed. <p>query = $query ".mysql_error());
$last_file = '';
$current_file = '';
while ($row = mysql_fetch_array($result)) {
  $current_file = $row['filepath'];
  if ($current_file == $last_file) {
    // It's a duplicate. Delete it!
    $query2 = "DELETE FROM files
    WHERE fid = '" . $row['fid'] . "'";
    $result2 = mysql_query($query2) or die ("Query failed. <p>query = $query2 ".mysql_error());
  }
  $last_file = $current_file;
}

mysql_close($link);

plachance’s picture

I'm trying to update from Drupal 6.25 to 7.15 on pgsql 8.3 after an upgrade from Drupal 5 and I'm getting this error message :

Update #7061
Failed: PDOException: SQLSTATE[25P02]: In failed sql transaction: 7 ERROR: current transaction is aborted, commands ignored until end of transaction block: SELECT 1 AS expression FROM {file_managed} file_managed WHERE ( (fid = :db_condition_placeholder_0) ) FOR UPDATE; Array ( [:db_condition_placeholder_0] => 21872 ) in system_update_7061() (line 2816 of /var/www/suivis7/modules/system/system.install).

I've found in my db that many files and upload records doesn't have any existing node record. It looks like a bug caused by the lack of referential integrity. So, I deleted these records but it didn't solve the problem.

I've then found that I still had duplicate filepaths and that they are associated on different nid. I havent figured why and how it's possible. Anyway, I've deleted duplicates with the lowest fid (oldest records) before the update and I don't get the error anymore. I had 50 duplicated on 15k files, excluding rows not associated with a node.

Here's my solution for pgsql. A little bit of work need to be done if you wish to use it on MySQL.

-- (Optionnal) Delete rows from files and upload tables that aren't associated with a node.
DELETE FROM files
WHERE fid NOT IN (SELECT u.fid
FROM upload u
INNER JOIN node_revisions nr ON nr.vid = u.vid
INNER JOIN node n ON n.nid = nr.nid
INNER JOIN files f ON f.fid = u.fid);

-- Deleting from the upload table is very slow without a temporary table.
CREATE TEMPORARY TABLE tmp_upload_delete
ON COMMIT DROP
AS SELECT u.fid, u.vid
FROM upload u
LEFT JOIN (SELECT u.fid, u.vid
FROM upload u
INNER JOIN node_revisions nr ON nr.vid = u.vid
INNER JOIN node n ON n.nid = nr.nid
INNER JOIN files f ON f.fid = u.fid) AS f ON f.fid = u.fid AND f.vid = u.vid
WHERE f.fid IS NULL
AND f.vid IS NULL;

DELETE FROM upload u
USING tmp_upload_delete AS f
WHERE u.fid = f.fid
AND u.vid = f.vid;

-- Delete duplicate filepaths with the lowest fid.
DELETE FROM files f1
USING (SELECT filepath, COUNT(*), MAX(fid) AS maxid
FROM files
GROUP BY filepath
HAVING COUNT(*) > 1
ORDER BY filepath) AS f2
WHERE f2.filepath = f1.filepath
AND f2.maxid != f1.fid;

jenlampton’s picture

I haven't seen any patches since #30, and there's been a ton of code dumped into this issue. Does anyone have a patch against CCK that solves the upgrade problems, or should I just start with the patch in #30?

kamil0788’s picture

I found one solution to this problem. I needed to transfer database of 60,000 users and about 5000 nodes from Drupal 6 to Drupal 7 and had similar issue with the dublicate content & field error

When you do CCK migrate and get the message "duplicate files cause Integrity constraint violation: 1062 Duplicate entry 'xx' for key 'PRIMARY'"

1)copy the file name
2)go to your database -> files
3)in files tables search for "filename.jpg"
4)delete the tables which contain "filename.jpg"
5)reverse CCK migrate and CCK migrate again

I had to import 5000 nodes with many various content types and field and I had only 2 files duplication out of 5000. I am aware that it is not the ideal solution, however 99% of the content has no errors and there will be only 2 nodes with missing files.

steinmb’s picture

@jenlamption I think #30 is the best starting point, have not seen any patches floating around.

radiobuzzer’s picture

jenlampton’s picture

Status:Active» Needs review
StatusFileSize
new2.83 KB

I haven't seen any progress on this issue since #30. Re uploading patch and marking needs review to get more eyes on it.

jenlampton’s picture

Here's one that will batch. I took a stab at a DBTNG query instead of using db_query_range, but couldn't get it working. It's in here in comments if anyone else wants to take a stab at it.

sblessley’s picture

Issue summary:View changes

Removed invalid link ("Luckily I found this site http://www.fivepaths.com/drupal-revision-removal-and-database-cleanup-by...)". Contacted fivepaths.com for replacement, if they provide one I'll re-add the sentence with the correct link.

colan’s picture

Version:7.x-2.x-dev» 7.x-3.x-dev
Issue summary:View changes
Status:Needs review» Needs work

If this is still an issue, we need to remove the commented code before this goes in.

@jenlampton: So your patch works without that code block, right? It just does it a different way?

organicwire’s picture

@jenlampton: I tried your patch but it didn't work for me. The problem is that the renamed new path is wrong (something like files/sites/default/myfile instead of sites/default/files/myfile).

$ drush updatedb
Cck  7001  7001 -
Do you wish to run all pending updates? (y/n): y
copy(files/sites/default/myfile.pdf): failed to open stream: No such file or directory     [warning]
cck.install:140

I reworked the way how the new filename is determined and how the copying is done:

$new_path = str_replace("public://", "", file_destination($destination . $record->filepath, FILE_EXISTS_RENAME));
$saved_filepath = file_unmanaged_copy($record->filepath, $new_path, FILE_EXISTS_RENAME);

I also fixed the batch size to 50 which was set to 5.

An open issue remains: the batch mechanism seems not to work properly so that only one iteration is executed. In order to have all my files fixed, I need to re-run the update manually.

organicwire’s picture

StatusFileSize
new3.18 KB

Corrected paths in patch file.

organicwire’s picture

@jenlampton: For you convenience here's an interdiff for the changes I made to your patch from #59.

organicwire’s picture

the batch mechanism seems not to work properly so that only one iteration is executed.

I fixed that. Here's the patch and interdiff.

organicwire’s picture

Status:Needs work» Needs review