From today update 7005 failed due to non-existant table column. Line number causing the fail #280 which is

db_drop_field($ret, 'filter_format', 'roles');

This may not be the place for it but I've seen this error over the past several years when running update.php no matter what the failed column is. In the function db_drop_field isn't it possible to check for the field and if it does not exist (such as with a prior update) to return and continue the update process.

Here's the issue, running off HEAD you use update.php quite consistently though not recommended or you run a fresh install. Fresh installs can be a pain as you have to reset all your modules and setting preferences. After a while you try running update.php in hopes that no errors are thrown.

Even when I commented out the line and ran update.php, it continued the update process but gave an error on the non-existant column. I saw an example in user.install that uses db_column_exist() such as

function user_update_7002(&$sandbox) {
  $ret = array('#finished' => 0);

  // Multi-part update.
  if (!isset($sandbox['user_from'])) {
    db_change_field($ret, 'users', 'timezone', 'timezone', array('type' => 'varchar', 'length' => 32, 'not null' => FALSE));
    $sandbox['user_from'] = 0;
    $sandbox['user_count'] = db_query("SELECT COUNT(uid) FROM {users}")->fetchField();
    $sandbox['user_not_migrated'] = 0;
  }
  else {
    $timezones = system_time_zones();
    // Update this many per page load.
    $count = 10000;
    $contributed_date_module = db_column_exists('users', 'timezone_name');
    $contributed_event_module = db_column_exists('users', 'timezone_id');

    $results = db_query_range("SELECT uid FROM {users} ORDER BY uid", $sandbox['user_from'], $count);
    foreach ($results as $account) {
      $timezone = NULL;
      // If the contributed Date module has created a users.timezone_name
      // column, use this data to set each user's time zone.
      if ($contributed_date_module) {
        $date_timezone = db_query("SELECT timezone_name FROM {users} WHERE uid = :uid", array(':uid' => $account->uid))->fetchField();
        if (isset($timezones[$date_timezone])) {
          $timezone = $date_timezone;
        }
      }
      // If the contributed Event module has stored user time zone information
      // use that information to update the user accounts.
      if (!$timezone && $contributed_event_module) {
        try {
          $event_timezone = db_query("SELECT t.name FROM {users} u LEFT JOIN {event_timezones} t ON u.timezone_id = t.timezone WHERE u.uid = :uid", array(':uid' => $account->uid))->fetchField();
          $event_timezone = str_replace(' ', '_', $event_timezone);
          if (isset($timezones[$event_timezone])) {
            $timezone = $event_timezone;
          }
        }
        catch (PDOException $e) {
          // Ignore error if event_timezones table does not exist or unexpected
          // schema found.
        }
      }
      if ($timezone) {
        db_query("UPDATE {users} SET timezone = :timezone WHERE uid = :uid", array(':timezone' => $timezone, ':uid' => $account->uid));
      }
      else {
        $sandbox['user_not_migrated']++;
        db_query("UPDATE {users} SET timezone = NULL WHERE uid = :uid", array(':uid' => $account->uid));
      }
      $sandbox['user_from']++;
    }

    $ret['#finished'] = $sandbox['user_from'] / $sandbox['user_count'];
    if ($sandbox['user_from'] == $sandbox['user_count']) {
      $ret[] = array('success' => TRUE, 'query' => "Migrate user time zones.");
      if ($sandbox['user_not_migrated'] > 0) {
        variable_set('empty_timezone_message', 1);
        drupal_set_message('Some user time zones have been emptied and need to be set to the correct values. Use the new ' . l('time zone options', 'admin/config/regional/settings') . ' to choose whether to remind users at login to set the correct time zone.', 'warning');
      }
    }
  }
  return $ret;
}

Couldn't db_drop_field() do a check with db_column_exist first and if the column exist then execute db_drop_field(), the error could still be thrown that the column didn't exist but why fatally kill the update process on a non-existent column.

Comments

sun’s picture

Priority: Normal » Critical
Issue tags: +D7 upgrade path

Tagging.

agentrickard’s picture

Component: filter.module » database system
Status: Active » Needs review
StatusFileSize
new2.99 KB

Core seems to be very inconsistent about this. user.install runs db_column_exists() before attempting these statements. Here's a patch that puts the burden on the API, not the caller.

I'm sure Crell will have a reason why this check should be done by the caller, though.

agentrickard’s picture

Option #2 is to wrap the filter_update in db_column_exists(), which seems wrong to me.

dries’s picture

Personally, I'm not a big fan of having APIs die silently.

 function db_add_field($table, $field, $spec, $keys_new = array()) {
-  return Database::getConnection()->schema()->addField($table, $field, $spec, $keys_new);
+  if (!db_column_exists($table, $field)) {
+    return Database::getConnection()->schema()->addField($table, $field, $spec, $keys_new);
+  }
 }

That seems wrong, not? We're trying to create the field so it wouldn't exist yet.

sun’s picture

Yes, we need Crell's input here, but ultimately, I disagree with the logic. For the record, I also disagree with the existing logic in http://api.drupal.org/api/function/db_create_table/7

IMHO, module update functions and all other functions that are trying to perform schema changes should know what they are doing - and why. Therefore, this logic belongs into the calling code, not into the low-level API functions.

agentrickard’s picture

I can go with sun on this. Just trying to close issues :-). But this means wrapping 20 or so update calls in IF db_column_exists().

@sun However, I think db_create_table() gets it right. Devs don't need to know that these things fail, they just need to fail gracefully.

@Dries See this part of the patch, too.

-    // Check that the field hasn't been updated in an aborted run of this
-    // update.
-    if (!db_column_exists('users', 'picture_fid')) {
-      // Add a new field for the fid.
-      db_add_field('users', 'picture_fid', $picture_field);
-    }
+    // Add a new field for the fid.
+    db_add_field('users', 'picture_fid', $picture_field);

There are reasons why we need to check fields before adding them, because things can explode.

/me goes to poke Crell.

David_Rothstein’s picture

I'm not sure I understand the original bug report here. It seems like the only way this can fail is if you run the same update function twice or otherwise mess with the schema_version in the database? If so, is this even a bug?

Anyway, better error handling in these database functions might not be a bad idea... and in the particular case of db_drop_field() I guess I can imagine it failing more gracefully (since if the database column already is gone, its work is basically done anyway). But in general I agree with the idea that it should be up to the caller to handle these things. Graceful failure should only happen if we can guarantee that what the caller wants has already been done exactly. For example, db_create_table() checks if the table already exists, but I think that's wrong - it should really also check if the existing table has the exact schema that was requested before deciding to fail gracefully.

agentrickard’s picture

Status: Needs work » Needs review

@David_Rothstein

It is a bug, as the example from user_update_7004() shows. Sometimes updates get corrupted and run partially, so some core update functions run checks.

But let's consider a code cruft standpoint.

1) db_ altering functions are almost always limited to .install and _update hooks.

2) There are currently 2 examples in core where those alter functions are wrapped in IF. (Two others are there to check for advanced conditions that need to be handled by the update function.) These were expressly put there because not having it broke something, due to a faulty update that someone, somewhere had a problem with.

3) There are 20 or so more that are not, all of which could be brittle in the same way as this original issue.

4) Adding the IF logic inside the API function (as it is in db_create_table()) gives us _much_ less code and leads to much less developer WTF. Your table changes Just Work, and can be run without prior knowledge of the target db.

This all seems much cleaner to use and, more importantly, easier to maintain.

From a DX perspective, putting the burden on the caller always seems like a lazy choice to me. APIs should be wrappers around patterns that enforce the logic of those patterns. drupal_write_record(), I'm looking squarely at you and calling you a bad API. The database layer can tell _me_ if this is an insert or update. Why else would we return SAVED_NEW or SAVED_UPDATED from that call?

By not running the check in the API function, which is generally < 5 lines of code (in this patch, its 2 lines per function), we instead bloat all of core and contrib with repeated checks for obvious conditions that could be handled at the API level.

Its not like adding these checks to the API are going to have negative impact, like obvious slowdown of execution times. We're shifting conditional checks from one layer (the caller) to another (the API), and in doing so, ensuring a uniform application of those conditionals.

We need to define a standard and stick to it. Right now, we have a mixed bag.

/me re-titles the issue to reflect the debate point.

damien tournoud’s picture

Title: update 7005 fails » Wrap database alteration APIs in proper validation
Status: Needs review » Needs work
Issue tags: +D7DX

I agree that we should prevent those functions from returning errors: db_drop_field(), db_drop_table(), db_drop_index().

*But* functions *creating* or *modifying* stuff should fail (db_create_field(), db_create_table(), db_create_index(), db_change_field(), etc.), because the existing object in the database might be different then the one that the caller wants to create.

damien tournoud’s picture

Status: Needs review » Needs work

To make my point, this is *really* wrong:

 function db_change_field($table, $field, $field_new, $spec, $keys_new = array()) {
-  return Database::getConnection()->schema()->changeField($table, $field, $field_new, $spec, $keys_new);
+  if (db_column_exists($table, $field)) {
+    return Database::getConnection()->schema()->changeField($table, $field, $field_new, $spec, $keys_new);
+  }
 }

D'oh. What if the original field doesn't exist? We just don't change it? Dumb.

ctmattice1’s picture

Title: Wrap database alteration APIs in proper validation » update 7005 fails
Status: Needs work » Needs review
Issue tags: -D7DX

The reason this was posted David is that the issue of sql errors on updating a system could be avoided. I agree that on dev releases this should not be considered a bug but what happens when either core or contrib change the schema and forget about the change in hook_update_N

@David_Rothstein; "Anyway, better error handling in these database functions might not be a bad idea... and in the particular case of db_drop_field() I guess I can imagine it failing more gracefully (since if the database column already is gone, its work is basically done anyway). But in general I agree with the idea that it should be up to the caller to handle these things. Graceful failure should only happen if we can guarantee that what the caller wants has already been done exactly. For example, db_create_table() checks if the table already exists, but I think that's wrong - it should really also check if the existing table has the exact schema that was requested before deciding to fail gracefully."

other functions I've seen screw an update on is db_index_exists(), db_drop_unique_key(), db_add_unique_key(). There is a issue now on db_index_exists() #722912: db_index_exists() must conform schemaAPI indexExists().

If you wish to take this off critical, I'll leave that up to those who know the innards of core.

damien tournoud’s picture

Status: Needs review » Needs work

Fixing cross-post.

Crell’s picture

Title: update 7005 fails » Schema safety: Decide!
Status: Needs work » Needs review
Issue tags: +D7DX

Restoring title and tags.

I'm going to think aloud a bit here. Please bear with me. :-)

For schema operations, we have the following groups of operations (on fields, tables, or indexes):

- Create
- Modify
- Delete
- Exists

Exists is not a modifier and already has its own issue (linked from #11), so we'll ignore that here.

I think it's safe to say that for all three types of schema objects (tables, fields, indexes) we want a consistent set of error handling rules. For the others:

- Create will succeed iff the object does not yet exist, and does exist after the operation.
- Modify will succeed iff the object does exist, and now has a new definition after the operation.
- Delete will succeed iff the object does exist and does not exist after the operation. OR, it could be argued that it will succeed only on the second condition, that is, if you delete a table that doesn't exist, that could arguably be a success condition since the state you wanted is now true. (Eg, unset($does_not_exist) is a safe operation.)

If we check for a failure of one of those conditions before we execute the query, we could do any of the following:

1) Do not check and let SQL throw a PDO exception for us.

2) Check for the conditions and silently do nothing if we should not execute.

3) Check for the error and throw an exception if we should not execute.

4) Check for the error and return something useful, for some definition of useful, to indicate success or the type of failure.

5)

Arguably #1 is the "don't babysit broken code" answer, although I've never been the biggest fan of that approach and none of these operations are used along a critical path. So let's reject that one now. I certainly hope we can also say that #2 is horribly bad, as fixing bugs when the system refuses to tell you what bug you have is a nightmare.

So that leaves us with two options: Don't explode but return something useful (eg, TRUE if the operation succeeded, 0 if it failed for an SQL reason, -1 if the table/field already existed, etc.) and Don't explode but throw an Exception instead.

I disagree with agentrickard in #6: "Devs don't need to know that these things fail, they just need to fail gracefully." On the contrary, if I'm doing something that fails I DO want to know that it fails... if it affects what I do next (which could be to halt entirely). That's where Delete becomes somewhat different, because whether the object was deleted or didn't exist in the first place doesn't generally affect what I do next. The post-condition is the same: The object does not exist.

For Create and Modify, though, the different pre-conditions do affect the post-condition. Modify on a field that doesn't exist will result in no field at the end rather than a field with the new properties. Create on a table that already exists will have the same post-condition iff the old table and new table have the exact same definition, which we cannot reasonably check for. (At least not at this point, that's out of scope for this issue.)

So, I'd recommend a standard of:

If a pre-condition fails in a way that would change the desired post-condition, throw an exception. If a pre-condition fails in a way that the post-condition is the same, return a useful value.

That means:

- Create: if the object already exists, throw DatabaseSchemaTableExistsException. (And equivalent for fields and indexes.)
- Modify: if the object does not already exist, throw DatabaseSchemaTableDoesNotExistException. (Ibid.)
- Delete: if the object exists, delete it and return TRUE. If not, return FALSE.
- Exists: Return TRUE/FALSE (as discussed in the other issue).

If for some reason we want to continue the same way anyway on a failed Create or Modify, the caller can catch and ignore the exception. If not, which I argue is the degenerate case, the caller ignores the exception and lets it bubble up, and it can be caught by the update system, which already has a try-catch block in it since the update system now uses exceptions for error handling.

Does that make sense to everyone?

Also note, the exists checks should NOT be in the wrapper functions. They belong back in the methods in the schema classes so that they work even if you use the schema object directly. The function wrappers should never have any real logic in them.

agentrickard’s picture

And this is why Crell is teh db maintainer ;-p. I like it.

andypost’s picture

Exactly agree with #13 + more consistent exceptions.

There is a lot of troubles with D6->D7 migration. But don't forget about databases that been migrated from D5 to D6 and now going to D7, they could have a lot of inconsistency in schema because D5->D6 was not clear.

So throwing exceptions is a great help with migration else database inconsistency never stop! Exceptions should make people to use schema module and find bugs!

Also all php-doc blocks should be examined because db layer is purely documented.

dries’s picture

I support Crell's proposal in #13. Let's go for it. Good write-up, Crell. This would be another step in getting the upgrade path fixed, and hence getting us to beta -- see http://buytaert.net/drupal-7-status-update-and-release-plan.

damien tournoud’s picture

Assigned: Unassigned » damien tournoud

Ok, on it. Thanks Crell for saying exactly what I was saying in so many more words :)

agentrickard’s picture

While we're at it, can we clean up the "fields" vs "columns" inconsistency in the API, which this issue highlights? #728338: Standardize on 'field' or 'column' in function names.

damien tournoud’s picture

Title: Schema safety: Decide! » Improve safety of schema manipulation
StatusFileSize
new40.63 KB

This should be feature complete on all the three database engines.

Offering the patch as an offrande to the test bot.

sun’s picture

Not sure what you think, but this looks like yet another awesome patch by DamZ - RTBC if bot passes. :)

Status: Needs review » Needs work

The last submitted patch, 582948-schema-safety.patch, failed testing.

damien tournoud’s picture

Status: Needs work » Needs review
StatusFileSize
new39.59 KB

Reverted the bits that belong in #722912: db_index_exists() must conform schemaAPI indexExists(). And fixed the test failure (confusion between fieldExists() and columnExists()).

@agentrickard: renaming columnExists() or all the field*() is not in the scope of this issue.

Crell’s picture

DamZ gets bonus points for including @throws directives in the docblocks. Rock! I almost wonder if it would be possible to move the error checking up into the DatabaseSchema class, but I suspect in practice it would be more trouble than it's worth.

#22 looks good to me as soon as the bot green-lights it.

dries’s picture

Status: Needs review » Fixed

Nice job. Committed to CVS HEAD. Thanks!

Let's create a follow-up issue for the 'field vs column' task.

David_Rothstein’s picture

Priority: Critical » Normal
Status: Fixed » Needs review
StatusFileSize
new20.03 KB

Followup patch for problems found while reviewing this - most should be pretty straightforward.

I'm also downgrading this from critical, since neither the original patch nor this followup patch affect the D6->D7 upgrade (unless there are some steps to reproduce that have not been mentioned here yet...) But making the error handling better is a great idea anyway, of course - nice patch :)

David_Rothstein’s picture

Now that we have this, by the way, another great followup would be to try to get rid of the db_table_exists() check here:

function db_create_table($name, $table) {
  if (!db_table_exists($name)) {
    return Database::getConnection()->schema()->createTable($name, $table);
  }
}

since as discussed above that is problematic, and especially since that invalidates the new error handling in createTable().

However, when I tried to do that, all of Drupal fell apart, I think due to the problem mentioned here: http://drupal.org/node/306151#comment-1777998 ....

Oh well. I guess that's for another followup patch somewhere else, then :)

damien tournoud’s picture

@David_Rothstein: thanks for those fixes, they all looks good, except the few of them where you added a check for the existence of a table before checking the existence of a field on that table, in fieldSetDefault(), fieldSetNoDefault() and changeField().

I consider that to be pure cruft: (1) if a table doesn't exist, a field doesn't exist, (2) there is no way for the caller to programmatically differentiate between the two conditions, (3) it doesn't really help the user.

Could you get those out?

Status: Needs review » Needs work

The last submitted patch, schema-582948-25.patch, failed testing.

David_Rothstein’s picture

Status: Needs work » Needs review
StatusFileSize
new17.95 KB

OK, thinking about that a bit more, I agree with you. If we had different exception classes for tables and columns then it would make sense, but we don't, and it's probably not worth adding them just for this. So, the attached patch takes those table existence checks back out.

Let's also see what the testbot thinks of this one. I'm not sure I believe that last set of failures...

agentrickard’s picture

damien tournoud’s picture

Status: Needs review » Reviewed & tested by the community

Many thanks for this clean-up.

David_Rothstein’s picture

Thanks!

By the way, I think I may have actually gotten the db_create_table() fix working, although it gets interesting enough that it certainly requires its own issue: #728820: Clean up installation of required modules

dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks, David.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

David_Rothstein’s picture

Status: Closed (fixed) » Needs review
StatusFileSize
new5.27 KB

OK, #728820: Clean up installation of required modules was committed, but with the db_create_table() fix removed from it - turns out people were more interested in the cleanup itself than the reason for the cleanup :)

However, #686196: Meta issue: Install failures on various hosting environments indicates that we really need to remove the db_table_exists() call here because, among other reasons, it's a performance hit. This patch should do it while preserving the D6->D7 upgrade path - note that the change in system.install is because that duplicates code in update_fix_d7_requirements() and therefore led to a double attempt to create that table.

In general, update_fix_d7_requirements() becomes a bit fragile as a result of this patch (because if someone hits the stop button before that finishes creating its tables, they might now be in a lot of trouble...), so perhaps we want to also add some db_table_exists() checks in there? But the attached is the minimal patch that seems to work.

Crell’s picture

Dear god... David, please just open a new issue. There's no need to bring this thread back from the dead. Link to it if you must, but not everything related to the schema needs to be in the same thread here.

David_Rothstein’s picture

Uh, I did try to open a new issue - that's what #728820: Clean up installation of required modules started off as. And I was specifically told by the patch reviewer to leave the db_table_exists() part out of that patch and do it here instead:

Has nothing to do with this, should be dealt with at #582948: Improve safety of schema manipulation.

I'm not moving it again :)

Status: Needs review » Needs work

The last submitted patch, db-create-table-582948-35.patch, failed testing.

David_Rothstein’s picture

Status: Needs work » Needs review
StatusFileSize
new6 KB

Tracker module was apparently still trying to install its own schema. Untested patch, but I think this ought to take care of the failing tests.

Crell’s picture

I'm confused. What have tracker or cache_bootstrap to do with this issue? This should be a 3 line patch to remove an if statement.

catch’s picture

Status: Needs review » Reviewed & tested by the community

If we're checking db_table_exists() in createTable(), then that has hidden the tracker bug - since that table creation code has been running tice. Not sure why {cache_bootstrap} would be failing though - that hunk in update.inc is only supposed to run once anyway no?

This makes loads of sense but we should open a (new ;)) issue somewhere to find out why the db_table_exists() is so bad in the first place per the installer issue. I'm wondering if it's due to shared MySQL database with thousands of tables in it - would not surprise me, but still, would be good to know why.

For some reason I assumed the fix on the cleanup issue was going to be rolled into here easily once that was committed, but looks like they both got in a bit too quickly for that to happen. I stick by fundamental changes to schema API not being hidden in 'cleanup' issues though ;)

Either way the patch here is fine.

dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Hopefully, this patch can be closed forever now! :)

David_Rothstein’s picture

Yes, it is indeed time to let this die forever :) But just to follow up:

Not sure why {cache_bootstrap} would be failing though - that hunk in update.inc is only supposed to run once anyway no?

See the difference between http://api.drupal.org/api/function/update_prepare_d7_bootstrap/7 and http://api.drupal.org/api/function/update_fix_d7_requirements/7. The latter is guaranteed to run only once. But {cache_bootstrap} is a special case which is created in the former function.

So basically, yes, all the cleanup in #728820: Clean up installation of required modules as well as the additional cleanup here were all cases where db_create_table() was being called on a table that already existed, so all needed to be changed in order to get the three line fix here working :)

mikeryan’s picture

An observation, in case someone suddenly finds contrib modules failing to enable with DatabaseSchemaObjectExistsException - the removal of db_table_exists() from db_create_table() will catch modules which have lingering drupal_install_schema() calls in their install hooks. Three contrib modules guilty on our current project!

Status: Fixed » Closed (fixed)
Issue tags: -D7DX, -D7 upgrade path

Automatically closed -- issue fixed for 2 weeks with no activity.