In the following example, it's possible to see that the Sub_part for the unique key 'u_provider_type_base_id' is limited to 64 characters:

mysql> show index from scald_atoms;
+-------------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table       | Non_unique | Key_name                | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| scald_atoms |          0 | PRIMARY                 |            1 | sid         | A         |          40 |     NULL | NULL   |      | BTREE      |         |
| scald_atoms |          0 | u_provider_type_base_id |            1 | provider    | A         |           2 |       64 | NULL   |      | BTREE      |         |
| scald_atoms |          0 | u_provider_type_base_id |            2 | type        | A         |           2 |       64 | NULL   |      | BTREE      |         |
| scald_atoms |          0 | u_provider_type_base_id |            3 | base_id     | A         |          40 |       64 | NULL   | YES  | BTREE      |         |

This can cause Duplicate Entry / Integrity constraint violations for a significantly long provider, type, or base_id. For example, we encountered this bug when creating two separate DMCloud videos, one with a base_id (URL, in this case) of:
http://api.dmcloud.net/player/embed/reallylonghashcodeherethatissomewhat...
and another with a base_id of:
http://api.dmcloud.net/player/embed/reallylonghashcodeherethatisnolonger...

For our case, with a provider of 'scald_sgv' and a type of 'video,' (and keeping in mind the 64 char limit), we thus have a unique key of:
scald_sgv-video-http://api.dmcloud.net/player/embed/reallylongha
for the first, and for the second:
scald_sgv-video-http://api.dmcloud.net/player/embed/reallylongha

Thus the integrity constant violation. It seems that the 'easy' solution is to increase the unique key to, at a minimum, 128 characters each. I am rolling a patch to the scald.install file that will handle this (posting soon).

Comments

jcisio’s picture

We should have limited base_id to 64 instead of 255. But it is too late now.

The key can't not be too long. In MySQL using UTF-8, it could not be longer than 333 characters (Drupal does not take care of it, but it should: #1852896: Throw an exception if a schema defines a key that would be over 1000 bytes in MySQL).

I suggest you use shorter base_id then. I don't think url should be used as base_id.

phillamb168’s picture

I would argue that 64 characters is much too limiting when it comes to a concatenated string of three different variables - especially when provider is limited varchar 255, type is varchar 128, and base_id is varchar 255. If any two elements are sufficiently long (in a human-readable context) it becomes impossible to handle anything of even moderate complexity. Example:

scald_external_source-articlewimage-december_24_2012_album_dad_and_mom

would be too long by 6 characters. It's easy to see that even with something as specific as this, there are other permutations of this string that would exceed the 64 character limit. Given that there is nothing in the documentation that flags this as a potential problem, it puts rather serious limitations on the use of the scald module in a high-volume production environment.

It is indeed too late now to fix this error from a character limitation perspective - perhaps the solution then is to change u_provider_type_base_id from unique to a standard index?

jcisio’s picture

Version: 7.x-1.0 » 7.x-1.x-dev

I think we will change this unique key to a normal key. The duplication control, if necessary, could be done at the validation.

jcisio’s picture

Title: scald_atoms unique key (u_provider_type_base_id) does not provide enough characters » Change scald_atoms unique key u_provider_type_base_id into normal key
Status: Active » Fixed

Discussed it in the #dcparis13 sprint yesterday and the consensus was to use a normal index. Committed in 0269d37.

About the very long key, well, it should not be. But if it is the case, then you can use for example the key hash for base_id.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.