User module has a perfectly good API for loading data into the user object, it's called hook_user_load(). user->data needs to go and die somewhere.

Proposed solution -

create table user_data;
INSERT INTO {user_data} (uid, data) SELECT uid, data FROM user;
db_drop_field(&$ret, 'user', 'data');

the rest of the patch would be removing everything from user module which involves user->data.

If no objections, patch forthcoming.

CommentFileSizeAuthor
#117 emergency_index.patch475 byteschx
#109 user-data-347988-109.patch40.32 KBBerdir
#102 user-data-347988-102.patch40.32 KBBerdir
#100 user-data-upgrade-tests-347988-100.patch40.48 KBBerdir
#100 user-data-upgrade-tests-347988-100-interdiff.txt5.77 KBBerdir
#98 user-data-upgrade-tests-347988-98-tests-only.patch40.53 KBBerdir
#98 user-data-upgrade-tests-347988-98.patch41 KBBerdir
#98 user-data-upgrade-tests-347988-98-interdiff.txt6.17 KBBerdir
#95 drupal-347988-95.patch37.16 KBdawehner
#92 drupal-347988-92.patch37.15 KBdawehner
#90 drupal-347988-90.patch37.23 KBdawehner
#83 user.data_.83.patch38.32 KBsun
#83 interdiff.txt1.09 KBsun
#76 user.data_.76.patch37.23 KBsun
#76 interdiff.txt4.48 KBsun
#73 user.data_.73.patch36.67 KBsun
#73 interdiff.txt1.19 KBsun
#71 user.data_.71.patch35.48 KBsun
#71 interdiff.txt419 bytessun
#68 user.data_.68.patch35.07 KBsun
#68 interdiff.txt1.18 KBsun
#66 user.data_.66.patch35 KBsun
#66 interdiff.txt5.81 KBsun
#61 user.data_.61.patch33.83 KBsun
#61 interdiff.txt17.81 KBsun
#59 user.data_.59.patch31.63 KBsun
#59 interdiff.txt6.29 KBsun
#57 user.data_.57.patch32.11 KBsun
#57 interdiff.txt6.59 KBsun
#55 user.data_.55.patch31.51 KBsun
#55 interdiff.txt27.82 KBsun
#52 user.data_.52.patch9.53 KBsun
#52 interdiff.txt709 bytessun
#43 drupal8.user-data.43.patch9.33 KBsun
#35 drupal8.user-data.35.patch9.88 KBsun
#25 users.data_.diff12.33 KBmoshe weitzman
#20 drupal.user-data.patch12.38 KBsun
#14 347988-remove-user-data.patch10.36 KBDamien Tournoud
#10 347988-remove-user-data.patch9.45 KBDamien Tournoud
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Just to make it clear, user_data is a table created just for the purposes of preserving user data during the 6 - 7 upgrade process - ideally we'll do the whole schema creation and install within an update forum to save an empty table on new sites. Then in Drupal 8 or 9 add another update to drop the table if it hasn't been dropped already - similar to the approach taken to sequences for 5->6.

sun’s picture

Issue tags: +DIE

- Move users.data into dedicated user_data table.
- Use columns uid, [module,] property, value.
- Upon user_load(), fetch that into $account.
- Done.

I'll let Morbus bikeshed this issue with his thoughts.

Morbus Iff’s picture

> I'll let Morbus bikeshed this issue with his thoughts.

What the fuck?

sun’s picture

@catch: On a second thought, I see where #1 is coming from - seems you thought of users being cached anyway, so hook_user_load() would be sufficient. However, many modules in contrib store data bits per-user, but do not really need a separate database table, and using fields for simple data bits would be a major overhead...

@Morbus: Didn't we talk in IRC about something like enhancing the variables table to store both module and user variables there? IIRC, you argued that content-type variables prefixed/suffixed with the type name are similar (ugly) workarounds already.

Morbus Iff’s picture

Sun: by using the term "bikeshed", you are suggesting that my (as yet written) comments are trivial and stop-motion.

catch’s picture

content type variables are indeed extremely ugly, especially when we have some settings in the content type table too.

Moving this stuff to a user_data table with proper storage works for me too by the way.

sun’s picture

Issue tags: +Favorite of sun

Tagging.

Seems like the main motivation is missing here: Contrib modules currently have no way to easily update or remove their data for users. On sites with 100,000 of users and a few contrib modules installed, you can take a quick peek at users.data to see all kind of garbage in there. All of that gets serialized and unserialized whenever a user is loaded or saved. And that happens often.

moshe weitzman’s picture

Issue tags: -Favorite of sun

I very much agree with the approach of removing the code that reads and writes from this column. I'm fine with either migrating the data to a new table or leaving it where it is. I guess it is clearer to move it.

I do think that contact module uses this column so there is a little more work to do.

Contrib modules can use own table to relate to uid. There is no reason core needs to provide that. It doesn't for nodes, comments, terms, etc. And now we have the fields system which puts the final nail. Custom sites are welcome to add columns to users table using schema alter. And user module will happily load and save for you. So, all in all we don't need this column anymore.

Removing personal tag. Only one person gets to abuse the tagging system like that. If not, we'll have stupid tagging emails going out, and barely meaningful rows in our term_node table.

kenorb’s picture

Some inconvenience with $user->data object on 6.x
#542844: how to update $user->data object

Damien Tournoud’s picture

Status: Active » Needs review
FileSize
9.45 KB

A very careful first stab at this. Not really tested yet. Handle carefully.

Status: Needs review » Needs work

The last submitted patch failed testing.

sun’s picture

Status: Needs work » Needs review

wow, simply dropping {user}.data will remove all data stored by contributed modules upon upgrade.

Adding a separate field for every single property will also bloat that table immensely on a site with many contributed modules installed.

What happened to the {user_data} table idea?

sun’s picture

Issue tags: +API clean-up

Tagging.

Damien Tournoud’s picture

Whooo, user cancel uses {user}.data too... awesome. Here is a new patch.

wow, simply dropping {user}.data will remove all data stored by contributed modules upon upgrade.

We are not dropping anything, it's up to the contrib modules to migrate their data.

sun’s picture

+++ modules/user/user.install
@@ -198,7 +198,21 @@ function user_schema() {
+      'user_cancel_method' => array(
+        'type' => 'varchar',
+        'length' => 255,
+        'not null' => FALSE,
+        'default' => '',
+        'description' => 'The cancellation method this user have chosen.',
+      ),
+      'user_cancel_notify' => array(
+        'type' => 'varchar',
+        'length' => 255,
+        'not null' => FALSE,
+        'default' => '',
+        'description' => 'The cancellation notification method this user have chosen.',
       ),

Well. That's exactly the reason why I'm arguing for a separate {user_data} table.

These two values will be needed for 0.1% or even less users of a site. They will only be inserted when a user starts the user account cancellation process. Otherwise, they do not exist.

For the other 99.9% of users, we add garbage to every single user load - for the entire lifetime of a site.

Technically, these examples could even work as system variables. But then again, also system variables should be deleted when the user is deleted, but maintaining a relationship between system variables and users sounds totally wonky. Which again brings me back to {user_data}.

This review is powered by Dreditor.

Damien Tournoud’s picture

I suggest we stop half-way through: keep the user.data serialized column (and automatically serialize/unserialize it), but stop merging the values from this column in the $user object, and stop automatically storing unknown values from $user into user.data.

moshe weitzman’s picture

Actually, I am quite happy with the latest patches.

@sun - I think you are micro optimizing. If a site installs contact module, it has every reason to think that $user->contact is in the user object. And no matter what table holds that info, it would still be in the user object. Not sure why you think thats garbage. So the only argument then is whether the contact column belongs in users table or not. I think it is fine there, personally. We have a nice schema alter api and we might as well use. No need to add joins if we don't have to. We have supported automatically loading/saving custom columns in the user table for many many years.

sun’s picture

Status: Needs review » Needs work

well... one of the major problems we also wanted to solve with this issue is that contributed modules have no way to update their data when it is stored in a serialized column. They have to iterate over all existing users, checking every single one, to properly determine whether the user needs to be updated. Not being able to easily maintain the stored data is the flaw of the current system.

There are many contributed modules that just want to add a single property for certain users. We don't want to force them to create an own table just to be able to properly maintain that single property per user.

And there are many other contributed modules that want to add a couple of settings for several users. Some of them might be converted into fields, but some of them not.

Likewise, we have modules that want to temporarily or dynamically attach data to a user. Those do not need an own, permanent storage/table/column, but they still want to carry on certain user data for a limited time-frame. The user account cancellation process in core is a valid example for that. And. The user account cancellation process currently has no way to clean up this data in case the user did NOT confirm the cancellation, since it would need to iterate over all users to find those with that temporary data assigned.

Hence the trivial suggestion in #2. A {user_data} table, keyed by uid, with variable property names and values - similar to the {variable} table, but simply with relation to uid.

moshe weitzman’s picture

So, will each module do its own loading and writing to this new table? I'm don't think core needs to provide this service. If we did do this, we should add a column (or two) to the variable table. it could be module and realm columns. in this case, the realm is uid.

sun’s picture

Status: Needs work » Needs review
FileSize
12.38 KB

Your suggestion sounds similar to Morbus', who raised the fact that node type specific settings have the same problem (though maybe not as apparent as users.data). However, introducing a realm + realm_id to variable_get/_set/_del and {variable} cannot be considered for D7.

Fresh patch. Including working upgrade path.

$object->data removed from a couple of places, not sure about those removals. However, this also clearly shows how important #585838: Consider a generic $entity->user property really is. I totally hate those half-baked fake "user" objects passed to arbitrary functions and can't count the number of times I needed to hack core to achieve a proper and scalable user loading.

Status: Needs review » Needs work

The last submitted patch failed testing.

sun’s picture

Guys, we quickly need an agreement here. Discussed this issue with moshe in IRC and privately with smk-ka.

We have two approaches here:

  1. Removing auto-load and auto-storage of {users}.data. Modules either use their own tables and load their user data when they need it, or they add columns to {users}.data, so it is always loaded.

    Problem: There is the use-case of having to temporarily store data for a user. The only example in core are the two user account cancellation flags - they will only ever be inserted and needed, when a user tries to cancel the own account. For all the other 100% of users on a Drupal site, developers will see cruft in the $user object:

    $user->user_cancel_method = '';
    $user->user_cancel_notify = 0;
    

    We also cannot remove the {users}.data column during upgrade, because modules have to migrate their data.

  2. Keeping auto-load and auto-storage of user data, but store it in a separate {users_data} table, basically following the {variable}/{cache} table schema:
    uid   name                 value                     serialized
    3     user_cancel_method   delete                    0
    3     user_cancel_notify   1                         0
    3     node_options         a:1:{i:0;s:6:"status";}   1
    

    This nicely leverages the new user_load_multiple() and allows for easy querying and updating of the data. We also neither store nor query nor load any data for users that do not need it.

    Note that also in this approach, we could drop the auto-storage in user_save(), so we just auto-load, but make user account form submit handlers responsible for saving/deleting.

    Also note that the 'serialized' column wasn't yet contained in my last patch.

moshe weitzman’s picture

Thanks for revisiting this.

I'm pretty firmly in #1. We should add new tables for both user cancel and contact. If we don't need those properties frequently, we can just access those tables when we need them, and not on every user load. But really these are indexed queries that are lightening fast - I see no need to avoid them.

And yes, the users.data column can stay where it is for a release. We should not load/save at all.

I object to #2 for the same reasons that everyone objects to profile.module storage.

sun’s picture

I'm opposed to #1, because I'm opposed to forcing modules to add a separate table for values like those user_cancel values, because that table will be empty all of the time. The values are only needed for a couple of minutes/hours, days at max, until the user confirmed the account cancellation.

I also do not agree that we're rebuilding Profile module here. That had a totally different purpose, and for that purpose, yes, I completely agree its storage was wrong.

However, if it's really only me who disagrees with #1, then please, let's not hold off this issue. Either way is much better than the {users}.data we have now, because that blocks a proper User API over in #118345: Revamp hook_user_form/_register/_validate/_submit/_insert/_update.

moshe weitzman’s picture

Status: Needs work » Needs review
FileSize
12.33 KB

reroll of #20. i'm going to agree with this approach now. in d8, i really do want to offer per user storage of variables so lets not force modules to handle the schema for themselves for d7. this helps with upgrade path to d8.

unfortunately, i am seeing a query to the data column during install but do won't tell me what query. anyway, here is reroll. i probably won't work on this more at this time.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Version: 7.x-dev » 8.x-dev
Priority: Normal » Critical
catch’s picture

Also this caused a very nasty security issue in #86299: Add "current password" field to "change password form" which was fortunately found during alpha.

andypost’s picture

Why not use FieldAPI to store users' variables and {cache_user} (or similar) to old {user}-data

catch’s picture

Priority: Critical » Major

(grinding my teeth on this one and) Downgrading all D8 criticals to major per http://drupal.org/node/45111

pillarsdotnet’s picture

moshe weitzman’s picture

Title: Remove user->data » Move $user->data into own table

We made a lot of progress in #721436: Remove magical fairy saving of cruft from user_save() but this is still needed.

Are people satisfied with the approach in the last patch? In other words, create a new table which stores name/value per user. I guess I can live it. I'm thinking that contact, user_cancel, and Contrib modules should actively migrate their bits into the new table. That way, we leave behind cruft behind.

moshe weitzman’s picture

I'm thinking that we could add a module column so that uninstall can wipe cruft from the new table. I also think the new table could replace authmap table. Some discussion of the future of authmap at #817118: Remove {authmap} and migrate OpenID entries to their own table

andypost’s picture

I still think we should use cache_user_data or similar to minimize performance impact of loading many items. On large user base this table (key-value) could grow to tremendous sizes and I think merge queries is overkill

sun’s picture

Status: Needs work » Needs review
FileSize
9.88 KB

Yeah, I still think that this is needed.

Agreed on the 'module' column. However, I'm opposed to auto-loading/attaching + saving any data. Just provide a per-user storage for simple or temporary variables.

Re-rolled the patch with a couple of adjustments in light of that -- didn't work on any hook_user_load() and hook_user_insert()/_update() implementations.

Status: Needs review » Needs work

The last submitted patch, drupal8.user-data.35.patch, failed testing.

moshe weitzman’s picture

Looks good to me so far.

I'm not sure, but I think BLOB and has restrictions that we may not like such as not supporting LIKE queries. If so, pick a different one from http://drupal.org/node/159605

bfroehle’s picture

andypost’s picture

Now we have key/value storage probably better to store this data there?

moshe weitzman’s picture

Yeah - good call

andypost’s picture

I see 2 ways of implementation:

1) User module adds {user_data} table and wraps its usage into UserDataStorage as DatabaseStorageExpirable does. It's needed to define a database table
2) Use current {key_value} table but define own $collection user.data

In case of 1) we could take #35 approach and query required values by demand having $collection per user using user.uuid as collection name

sun’s picture

I don't think that a key/value store is appropriate for this. This is about persistent site content, and I believe it should and must be queryable in all ways.

I'd recommend to pursue with #35.

sun’s picture

Status: Needs work » Needs review
FileSize
9.33 KB

FWIW, re-rolled against HEAD.

Status: Needs review » Needs work

The last submitted patch, drupal8.user-data.43.patch, failed testing.

andypost’s picture

Suppose at minimum we need a consistent API at this point, I think better to re-use of newly introduced entity-property-iapi and make psedo-property to describe and provide crud for this kind of data

chx’s picture

The data column was never queryable. It's a PHP serialized blob. So, moving it to the K-V store (not the expireable one) is something that might make sense. I would use a 'user.data' namespace, the uuid as key and data as value. Should border trivial to implement it.

andypost’s picture

@chx do you mean to make uuid as $collection name?

sun’s picture

Quoting myself from #7 in 2009:

Contrib modules currently have no way to easily update or remove their data for users. On sites with 100,000 of users and a few contrib modules installed, you can take a quick peek at users.data to see all kind of garbage in there. All of that gets serialized and unserialized whenever a user is loaded or saved. And that happens often.

The situation did not change or improve in any way since then. All contributed modules that want to store too simple or temporary data for a particular user account have no way to do so. They could create an own table, each, but enforcing that would lead to a significant increase of tables.

We have a typical example for that in core with User module's account cancellation process, which needs to store two simple values for a limited time-frame for the user that may or may not cancel the account.

The other typical example in core is Contact module. If the module happens to be enabled for some time but gets uninstalled later on, many user accounts will have orphan 'contact' properties for eternity.

I strongly believe this data needs to be queryable, especially to allow the system to clean up obsolete/orphan values.

At the same time, requiring individual tables appears to be a giant overhead for a majority of use-cases throughout core and contrib. I don't like that 1/0 and black/white thinking. Flexinode/Profile module's simple storage design is suitable for certain use-cases.

chx’s picture

Sure, stuff that data into K-V. If you need them temporarily then stuff it into the K-V Expirable. users.data can easily become a legacy thing, if we so want.

#47, new DatabaseStorage('user.data')->set($user->uuid, $user->data). I know it's not valid code but it described what I mean.

sun’s picture

That doesn't solve the maintenance problem:

How do you delete all stored values of a certain module in all user accounts upon uninstallation of a certain module?

The approach in the patch in #43 solves that in a simple and straightforward way.

Aside from that, I also don't see why a KeyValueStore is considered at all - the stored data is clearly content, there's no need to make the storage backend pluggable, and the data is usually accessed by a specific module for a specific user ID (stopping the anti-pattern of unconditionally loading/saving a serialized dumping ground bin is one major aspect to this after all).

As @Dries tends to say, "the Drupal community is very good in applying patterns", but a KeyValueStore would clearly apply the wrong pattern to something that has different requirements. Let's not make that mistake.

catch’s picture

How do you delete all stored values of a certain module in all user accounts upon uninstallation of a certain module?

Separate collections?

The account cancellation stuff does feel like KeyValueExpirable to me, but not some of the other things stuffed in user data.

sun’s picture

Assigned: Unassigned » sun
Status: Needs work » Needs review
FileSize
709 bytes
9.53 KB

Fixed duplicate user module update function name.
Fixed stale insert query in user_install().

Status: Needs review » Needs work

The last submitted patch, user.data_.52.patch, failed testing.

moshe weitzman’s picture

Tests need some love. I'd be fine with proceeding with latest patch.

sun’s picture

Status: Needs work » Needs review
FileSize
27.82 KB
31.51 KB

Let's see how far we get with this.

Added update sandbox to process users.data in chunks of 20 records.
Added user_data_get(), user_data_set(), user_data_del() API functions.
Converted Block, Contact, and Overlay module to User Data API.
Removed $account->data assertions from OpenIDRegistrationTest.
Updated hook_user_presave() phpDoc.
Updated user cancel process for new user data.

Status: Needs review » Needs work

The last submitted patch, user.data_.55.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
6.59 KB
32.11 KB

Fixed upgrade path failures.
Fixed Contact module's personal contact form user data handling.
Moved handling of user_cancel_* data values in UserStorageController.

Remaining: I wasn't able to figure out yet what's wrong with the user cancel process. Only user.cancel_notify is saved to users_data, but user.cancel_method is not.

Status: Needs review » Needs work

The last submitted patch, user.data_.57.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
6.29 KB
31.63 KB
  1. Merged HEAD + Fixed stale Block/Contact module update dependency numbers.
  2. Moved user data saving back into preSave().
  3. Improved user_data_get() logic.
  4. Use User::id() method instead of $uid in code we are touching.

This patch should pass tests.

chx’s picture

Status: Needs review » Needs work

This is serious scope creep -- instead of moving to a separate table you are adding a whole new API around it. An API which is not pluggable, not using injection, a totally Drupal 7 API. The API needs to move into a class. Also, I still believe the K-V store is entirely adequate here: the collection could be called user.data.$module , the key could be $uid and then you would get a small blob perhaps an array that you can look at for $name and $value. If you go with this approach then here are useful snippets for you. UserBundle.php:

    $container->register('user.data', 'Drupal\user\DataManager')
      ->addArgument(new Reference('keyvalue'));

In DataManager:

  public function __construct(KeyValueFactory $key_value_factory) {
    $this->keyValueFactory = $key_value_factory;
  }
  function get($module, $uid, $name = NULL) {
    $data = $this->keyValueFactory->get("user.data.$module")->get($uid);
    if (isset($name)) {
      return isset($data[$name]) ? $data[$name] : NULL;
    }
    return $data;
  }

And so on. Deleting a module is simple, deleting a user would require iterating over the modules to delete the value, alas, that's unfortunate, I know. I would be glad to help further.

And $query->addExpression("'block'", 'module'); is invalid SQL. Edit: no, it isn't.

sun’s picture

Status: Needs work » Needs review
FileSize
17.81 KB
33.83 KB
  1. Added UserBundle.
  2. Replaced procedural functions with user.data service.

Status: Needs review » Needs work
Issue tags: -DIE, -API clean-up

The last submitted patch, user.data_.61.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review

#61: user.data_.61.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +DIE, +API clean-up

The last submitted patch, user.data_.61.patch, failed testing.

chx’s picture

Great! After talking to others on IRC (wish you'd be there) "a specialized API does have advantages instead of trying to fit everything into one generic one". One small thing however: could you please add an interface?

sun’s picture

Status: Needs work » Needs review
FileSize
5.81 KB
35 KB

Added UserDataInterface.

Status: Needs review » Needs work

The last submitted patch, user.data_.66.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
1.18 KB
35.07 KB

Fixed lame mistakes.

Status: Needs review » Needs work

The last submitted patch, user.data_.68.patch, failed testing.

Berdir’s picture

Hm, that's unfortunate. The API is used in hook_init(), which is called before we initialize the kernel.

sun’s picture

Status: Needs work » Needs review
FileSize
419 bytes
35.48 KB

Nothing easier to fix than that.

Fixed overlay_init() is invoked before kernel bootstrap.

Status: Needs review » Needs work

The last submitted patch, user.data_.71.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
1.19 KB
36.67 KB

Fixed TestBase makes too many assumptions on test runner environment.

(i.e., merged in the relevant fixes from #1808220: Remove run-tests.sh dependency on existing/installed parent site)

Status: Needs review » Needs work

The last submitted patch, user.data_.73.patch, failed testing.

moshe weitzman’s picture

Patch looks great. My one thought is that the new user_data API has no hooks of its own. I guess changes to user_data are expected to happen within a $user->save()? Works for me.

f
function overlay_user_presave($account) {
   if (isset($account->overlay)) {
-    $account->data['overlay'] = $account->overlay;
+    drupal_container()->get('user.data')->set('overlay', $account->uid, 'enabled', $account->overlay);
   }
 }

I think this belongs in update hook, not presave.

sun’s picture

Status: Needs work » Needs review
FileSize
4.48 KB
37.23 KB

Yeah, I don't see any use-case for invoking hooks there. Modules can integrate with the user entity CRUD operations already.

Final changes:

  1. Removed Views data handler for {users}.data.
  2. Changed overlay_user_presave() into overlay_user_update().
  3. Use $account->id() instead of $uid property where possible. (global $user is not a fully loaded entity.)

This should come back green. The BreadcrumbsTest failures seem to be random test failures in HEAD currently.

Status: Needs review » Needs work
Issue tags: -DIE, -API clean-up

The last submitted patch, user.data_.76.patch, failed testing.

Fabianx’s picture

Status: Needs work » Needs review

#76: user.data_.76.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, user.data_.76.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
Issue tags: +DIE, +API clean-up

#76: user.data_.76.patch queued for re-testing.

sun’s picture

Status: Needs review » Needs work

The last submitted patch, user.data_.76.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
1.09 KB
38.32 KB

Fixed random test failure in Menu\BreadcrumbTest.

This should come back green and be ready.

chx’s picture

#71, #73 and #76 contained that failure and it stayed during retest and I didn't see it occuring anywhere else -- it doesn't seem like a test fluke. If you need, I can give you access to a box which extremely closely mimicks a testbot (Debian squeeze, php 5.3.5 from dotdeb) to countercheck.

sun’s picture

Thanks, but I don't think we need that. The randomness is actually reproducible, Simpletest UI vs. run-tests.sh.

The difference is in the {menu_links}.mlid of /node/add — which seems to be caused by the drupal_bootstrap() level change to run-tests.sh.

Quite potentially, something of the test runner leaked into the test environment, which causes hooks to be invoked in a slightly different order, and we never noticed. The only change here is the DRUPAL_BOOTSTRAP_FULL to _CODE in the code path of run-tests.sh that eventually executes TestBase::run(). _FULL only sets up language, path, custom theme, and hook_init().

The bootstrap change is required, since PIFR installs a full-blown parent site with the Standard profile, and the Standard profile contains Overlay module, which in turn implements hook_init(), and within overlay_init(), it checks whether the current user has overlay enabled. That, however, is converted into UserData here, and the UserData service is registered through UserBundle on the kernel/container. DRUPAL_BOOTSTRAP_FULL, however, does not bootstrap a kernel. Consequently, overlay_init() blows up.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

Seems like all feedback has been addressed.

catch’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests

There's no new data added to the upgrade test databases here, nor any new test assertions for the upgrade tests to ensure a successful migration of existing data. Am I missing something? We should at least ensure that there is a user or two with $user->data set for the core modules in the D7 dump.

Reading through the patch I'm left wondering how this fits in with the Entity/Property API, the answer is probably that it doesn't, but nor does $user->data now, and by formalizing this it should be an easier move to some kind of property-ish thing if someone wanted to try that later.

Since the data is no longer part of the object, the only way to modify it is to have your own hook_user_update() hook that runs later than the module you want to change, then delete or update the value again. It would've been nice to have kept that access in hook_user_presave() (i.e. updating a userData object that then writes to the database itself later on), but we could discuss that in a follow-up since this stuff is so broken now and is going to be a dump whatever we do with it.

Apart from lack of test coverage for the update path and not being 100% comfortable with the API, I noticed a few more minor issues:

-  $use_overlay = !isset($user->data['overlay']) || $user->data['overlay'];
+  $user_data = drupal_container()->get('user.data')->get('overlay', $user->uid, 'enabled');

This is now an extra query every request, bit of a shame but then it's Overlay module, so can't get excited about it compared to the extra queries elsewhere in core atm.

+  /**
+   * Implements \Drupal\user\UserDataInterface::get().
+   */
+  public function get($module, $uid = NULL, $name = NULL) {

There's a lot of arguments here, and a lot of if/else in the method, why not separate getters?

+  // Process 20 user records at a time. E.g., if there are 10 data keys per user
+  // record, that leads to an insert query with 200 values.

This could use a $conf variable so that sites can tweak it if it's a problem, I can imagine this taking hours to run on some sites. Also wonder if there's a more efficient way to do this - i.e. MySQL has CREATE TABLE LIKE - so we could do that to the {users} table, then drop the extraneous columns maybe?

However, I also love that there's a backup table so we can actually do this in the patch:

+/**
+ * Drop {users}.data column.
+ */
+function user_update_8012() {
+  db_drop_field('users', 'data');
+}
moshe weitzman’s picture

@sun - do you think you will implement catch's feedback. would love to review your reroll and push this in. i can't look at users.data for another release!

sun’s picture

Status: Needs work » Needs review

I'll try. Upgrade path tests are a pain to write, but @catch only seems to ask for some basic/minimal coverage, so hopefully that's not too hard. The part I struggle most with is the "API" part — the current method names and arguments just simply make sense to me from a DX/user perspective. I don't know how they could be separated. The only remotely closest pattern I know of are getBy*, setBy*, deleteBy* methods, but when thinking through that option, 1) there'd be a dozen of methods, 2) each taking different arguments, resulting in 3) an unpredictable/ungrokable API. Compare that to the current patch, which has 3 methods that always take 3 identical arguments in the same order. Hardly beatable?

dawehner’s picture

FileSize
37.23 KB

Sorry for the noise.
Opened a follow up for the removed views integration: #1845526: User data service handler for views
and rerolled against the breadcrumb fix

Status: Needs review » Needs work

The last submitted patch, drupal-347988-90.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
37.15 KB

Another recent reroll.

Status: Needs review » Needs work

The last submitted patch, drupal-347988-92.patch, failed testing.

catch’s picture

@sun: I think minimal testing is mostly OK for this one, stuff saved in {users}.data is usually personalised settings rather than content, and we're saving a backup of everything anyway for contrib modules that might have something important in there.

dawehner’s picture

Status: Needs work » Needs review
FileSize
37.16 KB

shame.

sun’s picture

Since the data is no longer part of the object, the only way to modify it is to have your own hook_user_update() hook that runs later than the module you want to change, then delete or update the value again. It would've been nice to have kept that access in hook_user_presave() (i.e. updating a userData object that then writes to the database itself later on), but we could discuss that in a follow-up since this stuff is so broken now and is going to be a dump whatever we do with it.

I think I agree that this is a potential issue. IIRC, for some reason I wanted to retain usages of hook_user_presave() in close relation to that, but in hindsight, that wouldn't change a thing.

My own, primary reason for trying to retain _presave() was that it's the only way to support customized Drupal installations, which might opt to expose certain user account settings on the user registration form already (and I personally had many situations in the past in which I was tasked to do exactly that — and, due to our current user account data handling, exposing form controls on the user registration form that normally appear on the edit/profile form only, most often was an easy task.)

That said, I actually think the topic has to be split up some more:

1) We don't want to end up with _presave() tacking arbitrary properties onto an entity, in order to have some fancy collector to collect them in _insert() or _update(). I think we can all agree on that.

2) If 1) is not an option, then we'd be left with a single trashbin à la $entity['data'] in _presave(), which would follow a certain structure and would be consumed by _insert() and/or _update() to perform the necessary writes [potentially even by user.module itself on behalf of all others].

3) The really interesting part about that however is that $entity['data'] technically won't exist unless there is a user account form involved. That is, because we have no reason to load all of those potentially existing per-account settings for every user account on every load. Consequently, they do not exist on $account when it is saved programmatically. Individual user account data values only exist if they have been explicitly loaded and/or assigned.

4) Given that, it almost sounds to me that the only and most natural answer for that problem space is hook_form_alter(). (although I'm not sure how bad or broken that might sound — I'm too involved in this patch to have an outsider perspective)

Feedback welcome.

+ // Process 20 user records at a time. E.g., if there are 10 data keys per user
+ // record, that leads to an insert query with 200 values.

This [module update] could use a $conf variable so that sites can tweak it if it's a problem, I can imagine this taking hours to run on some sites.

To my knowledge, we are not doing that in any other module update. And TBH, I'd leave possible performance optimizations for the upgrade path to later follow-up issues — there's a big risk of premature optimization in that field.

catch’s picture

1, 2, 3 and 4 I mostly agree with those, it's a weird system in the first place and can't really be helped. Fine discussing it more in a normal followup task, this patch is already a big step forward.

On the update, see system_update_7061(),

  // Process all files attached to a given revision during the same batch.
  $limit = variable_get('upload_update_batch_size', 100);

There's no such thing as premature optimization when you are working on a code base that has known installs with millions of users so let's not introduce code that those installs will definitely have to hack in order to get upgraded. Those installs may well use migrate module but we don't support that in core unfortunately so we have to at least pretend to support actually updating the sites that exist.

Berdir’s picture

Awww, the warm fuzzy feeling of knowing why you wrote tests :) Found two bugs, the first was that the primary key of the helper table was just on uid and the second was that all queries where missing a name condition. So all these queries failed due to existing primary key exceptions.

Some notes:

- Haven't yet made the limit configurable. There's no variable_get() anymore (will no more be) so I'm not sure what to use there...
- Also added delete statements to remove actually migrated data, so it's easier to see what's left in there.
- Noticed that the upgrade functions are quite fragile. They require the values to be serialized strings and not integers. Not sure how reliable that is.
- Did not add test coverage for block stuff, I think that is being dropped anyway in the block => plugin confirmation so we'd just have to remove it again there.

Berdir’s picture

Just checked the data column manually in 7.x:

data: a:2:{s:7:"overlay";i:0;s:7:"contact";i:1;}

So, integers, not strings. Which means that the upgrade path is wrong. And as I said, switching to integer might be wrong as well. I guess to be save we have to use batch to load all entries and process them in PHP :( Or do some LIKE queries that check on 0/1 only... Or take over the serialized values.

Berdir’s picture

Here's a new patch that keeps those values serialized. That's not as fast, but it's safe. The values will be converted to a scalar once updated by the users.

sun’s picture

Issue tags: -Needs tests

Thanks for adding upgrade path tests, @Berdir! :)

Can we just use a 'user_update_user_data' variable for now that defaults to 20, 50, or 100?

(20 might have been too defensive on my part... given that these are plain database queries without any other module/PHP code involved, I'd guess that they execute quickly, so 100 might be more appropriate.)

I looked through the other changes, and with that minimal variable adjustment, I think we're back to RTBC here.

Berdir’s picture

FileSize
40.32 KB

Re-roll after the picture and router access issues got it. No interdiff, just fixed the conflicts and moved the update functions to the end.

No variable. Let me explain why:

- update_variable_get() only reads from the database, so you would need to set this with variable_set() in your d7 site before you start the upgrade.
- variable_get() will go a way, so sooner or later, we would need to find a solution and if we don't have one now, then why should we then? The only thing that I can think if is global $conf and that's kinda ugly.
- Not sure why this function needs such a variable and the uuid update right above it (which defaults to just 10 users) doesn't (the same uuid update also runs on nodes and comments with the same limit)
- batch is intelligent enough to automatically call our function again until the time limit of 1s is reached, before it does an actual page refresh. So the difference between doing 20 or 50 for example is really not that big IMHO.
- I don't think it's required. For testing, I created 500k users with db_insert (much faster than entity_save()), added 10 dummy data arrays keys for each one. users table grew to 300+MB. Running the update created the _d7_user_data table with 3 million rows in ~1 minute ( I don't have an exact number but it was fast enough for me to not get bored and watch the progress bar until it was through ;)). This is a fast machine, so it might be slower for most others, but still, that sounds fast enough to me?

sun’s picture

Status: Needs review » Reviewed & tested by the community

That makes total sense to me. I'd even go one step further and say - if you have a gazillion of users in your DB, will you really use the interactive update.php frontend for upgrading that site? Or won't you much rather use non-interactive tools like Drush instead, which are using the non-progressive Batch API to begin with? :)

That said:
Batch API++
yched++
whoever-else-who-originally-designed-it++ ;)

webchick’s picture

Assigned: sun » catch

Wow, great job on the upgrade path testing, Berdir!

Looks like catch has sunk his teeth pretty deep into this one, so tossing this on his plate.

catch’s picture

@Berdir: Just reading from $conf would be fine, #1833516: Add a new top-level global for settings.php - move things out of $conf is supposed to cover things in $conf that were never in the variables table but it's had very little attention as yet Needs to be dealt with though.

The uuid update above also needs to be configurable batch size (and larger by default). I didn't commit that patch so didn't pick it up at the time.

@sun: The batch size will affect the size of the query, so it's going to affect drush too. The number of queries executed can really matter with innodb - transaction flush settings etc.

However it's not worth holding this patch up, so let's please create a (major) followup to review all batched updates in core and ensure they've got a sufficiently large batch size by default and that the batch size itself is configurable via settings.php. We could use a single variable possibly since I doubt anyone would want to tune each individual update individually. I'll either open it myself or commit this once it's open (but not right now).

moshe weitzman’s picture

I opened #1851902: Review batch size for update functions and make configurable. I made it a normal task with 'revisit before release' tag. I think that adequately describes the current importance. Please up the priority if you think thats not sufficient.

catch’s picture

Issue tags: -DIE, -API clean-up

#102: user-data-347988-102.patch queued for re-testing.

Status: Reviewed & tested by the community » Needs work
Issue tags: +DIE, +API clean-up

The last submitted patch, user-data-347988-102.patch, failed testing.

Berdir’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
40.32 KB

Re-roll, diff on the patch files show that it is identical except of a fieldset/details change that git was able to merge just fine. So back to RTBC. Going to create the follow-up issue now.

Berdir’s picture

Here we go: #1852200: Standardize batch size in update functions and add a global configuration.

Edit: Ups. Missed that there already is an issue for this.

catch’s picture

Title: Move $user->data into own table » Change notice: Move $user->data into own table
Priority: Major » Critical
Status: Reviewed & tested by the community » Active

Committing this while it's hot. Just short of four years for this one!

Will need a change notice.

Berdir’s picture

Assigned: catch » Berdir

Working on one.

Berdir’s picture

Status: Active » Needs review

Change notice created: http://drupal.org/node/1852360.

Quick review? :)

moshe weitzman’s picture

In the example code, I would prefer is one checked for the success of db_insert() before doing a db_delete(). Might be that our real upgrade path needs same protection.

moshe weitzman’s picture

Priority: Critical » Normal
Status: Needs review » Fixed

Berdir let me know that db_insert() throws an exception on failure, not return FALSE. So we are done here.

amateescu’s picture

Title: Change notice: Move $user->data into own table » Move $user->data into own table
Priority: Normal » Major

Restoring more metadata.

chx’s picture

Title: Move $user->data into own table » [HEAD BROKEN] Move $user->data into own table
Category: task » bug
Priority: Major » Critical
Status: Fixed » Needs review
FileSize
475 bytes
xjm’s picture

This is the "The total length of a key has to be less than 333 characters" issue, for those following along at home.

xjm’s picture

+++ b/core/modules/user/user.installundefined
@@ -267,6 +260,54 @@ function user_schema() {
+    'indexes' => array(
+      'module' => array('module'),
+      'name' => array('name'),
+    ),

@@ -788,6 +828,142 @@ function user_update_8013() {
+      'module' => array(
...
+        'length' => 255,
...
+      'name' => array(
...
+        'length' => 128,

Like so.

chx’s picture

Note that the update function does not fill in the value for module so you will not break anyone's data this way. We might want a followup to bikeshed how long a modulename we want to allow versus the name field -- say, authmap has a 128 char long module field which I think should be enough but name comes from a PHP array key which had no limits previously...

The previous "official" limit was the 255 characters filename in system table (while we had one. sniff. poor system table.) which would theoretically allow for very long module names. Consider, however sites/somedomain.com/contrib/.module which is already 36 characters allowing only for 219 long character module names. I do not think 204 will hurt anyone.

chx’s picture

Re #119 the problem is > PRIMARY KEY (`uid`, `module`, `name`),

xjm’s picture

Okay apparently I fail at dreditor. :) But yeah, I agree with #120. I think this is the safest way to resolve this by far.

xjm’s picture

Status: Needs review » Reviewed & tested by the community

RTBC if it comes back green. We should file a change notice issue for an official module name length cap.

xjm’s picture

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed to 8.x. Thanks!

webchick’s picture

Title: [HEAD BROKEN] Move $user->data into own table » Move $user->data into own table
Category: bug » task
Priority: Critical » Major
xjm’s picture

fago’s picture

Coming late to the party :/ -but checking the patch I wonder:

+function contact_user_update($account) {
+ if (isset($account->contact)) {
+ drupal_container()->get('user.data')->set('contact', $account->id(), 'enabled', (int) $account->contact);
+ }

- If this is just storing entity properties/fields, why isn't that issued from the storage controller?
- Why do we expose a separate API for storing user data? When do I use that and when a field?

If that has been clarified already, please provide pointers. I've not been able to find it.

catch’s picture

Assigned: Berdir » catch
Status: Fixed » Reviewed & tested by the community

@fago: see #87 for thoughts on that.

If we want to move this to the storage controller and/or move some of the information to fields, then those would be good follow-up issues to this one. The main purpose here was to move 'random user settings' out of the serialized column. If we were to move this to fields, we might for example want to allow for fields that don't normally load their data onto the entity object- I don't need to know contact settings 99.999% of the time when loading a user.

catch’s picture

Assigned: catch » Unassigned
Status: Reviewed & tested by the community » Fixed

Whoops.

fago’s picture

Assigned: Unassigned » catch
Status: Fixed » Reviewed & tested by the community

Yeah. I think this should be either moved to be an proper entity field - maybe with lazy-loading, or be removed from $account and various hook_user hooks. Not sure what's the desired direction to take? berdir argued for being able to load it without the entity on IRC.

fago’s picture

Assigned: catch » Unassigned
Status: Reviewed & tested by the community » Fixed

cross post

xjm’s picture

Status: Fixed » Closed (fixed)
Issue tags: -DIE, -API clean-up

Automatically closed -- issue fixed for 2 weeks with no activity.