S3FS Does not currently limit the length of URI's before they attempt to be placed into the database. This applies to refresh-cache as well as writeCache.

Original issue text below:

When running drush s3fs-refresh-cache I am getting the following:
exception 'PDOException' with message 'SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'uri' at row 54' in [error]
/var/www/vhosts/xyz.com/httpdocs/includes/database/database.inc:2204
Stack trace:
#0 /var/www/vhosts/xyz.com/httpdocs/includes/database/database.inc(2204): PDOStatement->execute(Array)
#1 /var/www/vhosts/xyz.com/httpdocs/includes/database/database.inc(683): DatabaseStatementBase->execute(Array, Array)
#2 /var/www/vhosts/xyz.com/httpdocs/includes/database/mysql/query.inc(36): DatabaseConnection->query('INSERT INTO {s3...', Array, Array)
#3 /var/www/vhosts/xyz.com/httpdocs/sites/all/modules/s3fs/s3fs.module(516): InsertQuery_mysql->execute()
#4 /var/www/vhosts/xyz.com/httpdocs/sites/all/modules/s3fs/s3fs.module(400): _s3fs_write_metadata(Array, Array)
#5 [internal function]: {closure}(Object(Guzzle\Common\Event), 'resource_iterat...', Object(Symfony\Component\EventDispatcher\EventDispatcher))
#6 /var/www/vhosts/xyz.com/httpdocs/sites/all/libraries/awssdk2/Symfony/Component/EventDispatcher/EventDispatcher.php(164):
call_user_func(Object(Closure), Object(Guzzle\Common\Event), 'resource_iterat...', Object(Symfony\Component\EventDispatcher\EventDispatcher))
#7 /var/www/vhosts/xyz.com/httpdocs/sites/all/libraries/awssdk2/Symfony/Component/EventDispatcher/EventDispatcher.php(53):
Symfony\Component\EventDispatcher\EventDispatcher->doDispatch(Array, 'resource_iterat...', Object(Guzzle\Common\Event))
#8 /var/www/vhosts/xyz.com/httpdocs/sites/all/libraries/awssdk2/Guzzle/Common/AbstractHasDispatcher.php(40):
Symfony\Component\EventDispatcher\EventDispatcher->dispatch('resource_iterat...', Object(Guzzle\Common\Event))
#9 /var/www/vhosts/xyz.com/httpdocs/sites/all/libraries/awssdk2/Guzzle/Service/Resource/ResourceIterator.php(185):
Guzzle\Common\AbstractHasDispatcher->dispatch('resource_iterat...', Array)
#10 /var/www/vhosts/xyz.com/httpdocs/sites/all/modules/s3fs/s3fs.module(403): Guzzle\Service\Resource\ResourceIterator->next()
#11 /var/www/vhosts/xyz.com/httpdocs/sites/all/modules/s3fs/s3fs.drush.inc(40): _s3fs_refresh_cache(Array)
#12 [internal function]: drush_s3fs_refresh_cache()
#13 /root/.composer/vendor/drush/drush/includes/command.inc(361): call_user_func_array('drush_s3fs_refr...', Array)
#14 /root/.composer/vendor/drush/drush/includes/command.inc(212): _drush_invoke_hooks(Array, Array)
#15 [internal function]: drush_command()
#16 /root/.composer/vendor/drush/drush/includes/command.inc(180): call_user_func_array('drush_command', Array)
#17 /root/.composer/vendor/drush/drush/lib/Drush/Boot/BaseBoot.php(64): drush_dispatch(Array)
#18 /root/.composer/vendor/drush/drush/drush.php(70): Drush\Boot\BaseBoot->bootstrap_and_dispatch()
#19 /root/.composer/vendor/drush/drush/drush.php(11): drush_main()
#20 {main}

I tried clearing all caches, but still get the error. I also don't have any files with filenames > 256. Any help in resolving would be greatly appreciated.

Comments

Anonymous’s picture

josephgut created an issue. See original summary.

Anonymous’s picture

Issue summary: View changes
Anonymous’s picture

Issue summary: View changes
coredumperror’s picture

When you say you don't have any filenames with > 256 characters, are you including the full path to the file within your bucket? The entire path is stored in the URI, including "s3://" at the start, and that's what needs be be < 256 characters.

Anonymous’s picture

That's a good question and something I'm not 100% sure about. So, I've gone in and deleted all files with names > 200 chars and recreating the bucket. I'll know in about 3 days how that works. Thanks.

Anonymous’s picture

Question, beyond the scope of your excellent module, but is there a good way to ensure that you limit to a certain file name + path length on upload? I don't see a way to do that within the field or field settings in Drupal 7. Thanks.

coredumperror’s picture

If it's not easily doable from the field settings page in the Drupal admin, I wouldn't know any way to go about it.

However, I just realized that you might not even have a choice in the matter of filename + path length in some cases. For instance, deeply nested images will end up with even deeper-nested derivative paths for the image styles.

Unfortunately, even if I were to change the 255-character limit on the s3fs_file table (s3 actually allows 1024-byte paths), that wouldn't help, because Drupal core's file_managed table is limited to 255 as well. :(

mikeryan’s picture

Version: 7.x-2.6 » 8.x-3.x-dev
Priority: Normal » Major
Status: Active » Needs review
StatusFileSize
new1 KB

I'd call this major, since it prevents *any* files from being cached if there's one that's too long.

This is about as good as we can do, I think...

sebas5384’s picture

Hi,

here's a different approach to this problem, since derived images (styles) uri can get bigger than 255 chars.

What this patch does is creating an "id" field which is a truncated version of the actual uri so in that way we can increase the varchar limit to 655 and in that way is more unlikely that this problem will ever occur.

I guess I could add the limit verification so in that way we don't sacrifice all the files from being mapped to the metadata table.
Another thing is that I didn't create a hook_update_N() in order to upgrade where this module is already running, which is not my case right now.

Let me know any thoughts about this patch, so maybe I could reroll the patch with those "missing parts".

Cheers.

mikeryan’s picture

I don't think simple truncation is safe enough - with deeply-nested folders you'll get dupes on everything in the same folder.

How about a hash of the full uri?

sebas5384’s picture

hi @mikeryan !

yes! you are right! I guess a sha1 hash should do the work.

sebas5384’s picture

Attached is the patch using the sha1 hash as id.

mikeryan’s picture

Status: Needs review » Needs work

A few comments:

  1. There will need to be an update function to update the table.
  2. Since we're not concerned with security here, ideally we should choose a hash algorithm based on minimal collision likelihood and speed. I looked at https://stackoverflow.com/questions/3665247/fastest-hash-for-non-cryptog... and https://softwareengineering.stackexchange.com/questions/49550/which-hash..., and MurmurHash looks promising. But, damn, it's not one of PHP's built-in algorithms, we would need to require lastguest/murmurhash.
  3. Adding the new function to the interface would technically be a BC break, because any other implementations of the interface would break. In practice, has anyone produced their own implementations?
mikeryan’s picture

Oh, and also - as @coredumperror pointed out previously, Drupal's file_managed table limits URIs to a length of 255, so there isn't much point in expanding s3fs_file.uri beyond that.

aguilarm’s picture

Ran into this today when I was testing out switching to using s3fs for public, with a pretty large amount of files.

One of the uris happened to clock in at 254 characters with the public:// scheme, but s3fs prepends s3://s3fs-public when you run drush s3fs:copy-local.

I found this issue against core for limiting the length: https://www.drupal.org/project/drupal/issues/193954

Unless core file_managed gets updated, I think logging the message and ignoring the file is perhaps more appropriate? The message, imo, should happen every time you refresh cache so it's clear that a file rename needs to happen.

It's anecdotal, but my error was a gratuitously long filename that should just be changed. On a 12 year old site with several hundreds of thousands of files I've only run into one error so far.

Of course, ideally, the core table length is increased at some point and this should be adjusted to match. It's probably this way because 255 was the max size for varchars in mysql pre-5.0.3 but that was released in 2005, and this probably happens so infrequently it has not been updated despite that.

jordan8037310’s picture

The patch in #8 doesn't handle the cases where writeCache() is called, which is to say when a filename URI path is short enough, but the Image Style path is too long. The image file fails with an integrity constraint on upload. I tried applying a patching to writeCache where we would skip it if the URI was too long, but the imageStyle just suffered an integrity constraint upon insert into file_managed table. This needs some more review

kapil17’s picture

For a work around i have directly updated the schema via sql query and added that in .install file so that can be updated through drush but you need to make your database settings is configured else you can get limit byte memory error on sql.

kapil17’s picture

kapil17’s picture

StatusFileSize
new671 bytes
kapil17’s picture

eddie_c’s picture

db_table_exists() is removed in Drupal 9, so I've re-rolled the patch in #19 for compatibility.

eddie_c’s picture

Actually having read the thread properly and tested some more I've realised that logging is the best strategy in my case. The patch in #8 does that.

cmlara’s picture

Title: Getting Exception 'PDOException'SQLSTATE[22001] When Running drush s3fs-refresh-cache » S3FS Ignores its database limit on URI length.
Issue summary: View changes

Changing title to better reflect the root cause, that we are not enforcing a length limit on files before we get to the cache writing stage.

I'm going to work this issue as a "prevent files from being created that exceed our database limit" (255 characters) and leave the length increase to be discussed in a separate issue.

cmlara’s picture

I believe this should solve this issue.

Key calls of stream_open(), rename(), mkdir(), getS3fsObject(), and writeCache() are being checked for URI length. writeCache should never trigger, but since this is where the database will throw an error a final check is included there. stream_open, rename(), and mkdir should cover all the normal operations of creating files. getS3fsObject should stop us from even trying to lookup the files data on S3 or the local cache.

Because of #3200867: Move/Rename 'directory' fails. It is not possible to be sure the directory rename portion of the restrictions are working properly and they are commented out in the tests.

  • cmlara committed 94e32d6 on 8.x-3.x
    Issue #2823409 by kapil17, sebas5384, cmlara, mikeryan, eddie_c,...
cmlara’s picture

Status: Needs work » Fixed

Directories are being treated as not able to be renamed.

Committed without the directory rename checks and small cleanups.

Moved the MAX_URI_LENGTH constant to S3fsServiceInterface as it is an S3fsService limitation.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.