This patch modifies upload.module to associate files to uid's instead of nid's.

The benefits of this patch

  • Files won't be deleted when nodes are deleted.
  • You can have a filebrowser to manage files without needing nodes.
  • Simplify preview and upload handling for nodes with attached files.

The current version is incomplete. I still need to address file deletion. There are two options I have in mind.

  • clean up unused files with a cron job (possible 'undelete' opportunity)
  • let user clean up their own files via a file browser.

I prefer the filebrowser and should have an updated patch withit tomorrow evening.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dopry’s picture

Status: Active » Needs review
FileSize
17.5 KB

Typo in the system update function.

moshe weitzman’s picture

subscribing. looks promising.

Dries’s picture

Associating files to users is useful! Great. However, does it have to be one or the other? Associating files with nodes can also be useful? Can you elaborate a little bit more about why you think we should no longer associate files with nodes?

dopry’s picture

I'm not precluding relating files to nodes, which the included patch maintains. Upload module work as expected in my tests, including after upgrading a test installation. I just think we should change the ownership to users. I think ownership does have to be one or the other. A lot of the reasoning I have revolves around the idea that user.inc and users are a lower level object than nodes which come from a module. It also simplifies issues that arise around files dependence on nodes.

What happens if multiple nodes want to refer to the same file owned by nodeB, and you delete nodeB. You lose that file, other nodes no longer look the same. Your archive is broken. If you delete userA is is easy to default the ownership of his files to another user, say uid 1, and referring nodes will be unaffected. An exaggeration of the idea would be you can run a drupal site without nodes, say just providing xml-rpc services. You cannot run a drupal site without users since the menu system uses access control that depends on users.

It also solves most of the problems around the what do we do with a file before we create a node. We would no longer need to store file metadata in the session until submit. We would no longer need to jump through hoops to get file previews to work before node submission. Which would make working with files simpler for drupal developers.

dopry’s picture

I also plan to sink the file related database functions into file.inc to maintain database sanity as people operate on files through code. The dependence on node currently prevents this for file_save_upload. I'm working on the filebrowser and fileupload form element currenlty. I hope to have the updated patch later this evening.

dopry’s picture

Title: Convert files to be associated to a user id instead of a node id. » Convert files to be associated to a user id instead of a node

This is an updated patch that moves all the work of saving an upload to file.inc instead of hacking it into upload.module. To test this patch you may need to ALTER TABLE files ADD COLUMN uid AFTER nid.

This patch offers us the ability to remove the session and menu hacks for displaying file_previews. It also allows us to store files without a node relationship.

Upload.module works as usual, only its work in the database is limited to the file_revisions table and maintaining the relationships between files and nodes.

With this patch you could theoretically write a module that alters the upload form adding a select existing file option and injecting it into _SESSION['upload_files'] once selected to add existing files.

It also opens the option to merge more file db handling into file.inc making it easier for developers to move and manipulate files with single function calls.

dopry’s picture

FileSize
47.95 KB

This is an updated patch that moves all the work of saving an upload to file.inc instead of hacking it into upload.module. To test this patch you may need to ALTER TABLE files ADD COLUMN uid AFTER nid.

This patch offers us the ability to remove the session and menu hacks for displaying file_previews. It also allows us to store files without a node relationship.

Upload.module works as usual, only its work in the database is limited to the file_revisions table and maintaining the relationships between files and nodes.

With this patch you could theoretically write a module that alters the upload form adding a select existing file option and injecting it into _SESSION['upload_files'] once selected to add existing files.

It also opens the option to merge more file db handling into file.inc making it easier for developers to move and manipulate files with single function calls.

dopry’s picture

FileSize
42.96 KB

Here is re-rolled version against current HEAD with system_install modifications and system_update.

dopry’s picture

FileSize
47.94 KB

hrm.. last patch was the wrong format...

dldege’s picture

I did a fresh get from CVS, then install.php, then applied this patch, and finally update.php.

1. 6009 schema version is a collision now.
2. install.php doesn't work if you apply your patch and try to do a fresh install

Next I created a page node and tried attaching a simple .png file and got there errors when clicking attach.

* notice: Undefined offset: 1 in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 514.
* notice: Undefined property: stdClass::$fid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 189.
* notice: Undefined property: stdClass::$fid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 190.
* notice: Undefined property: stdClass::$fid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 428.
* notice: Undefined property: stdClass::$fid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 444.

but the file was listed in the form.

I then submitted the form and got

* notice: Undefined offset: 1 in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 514.
* notice: Undefined property: stdClass::$fid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 189.
* notice: Undefined property: stdClass::$fid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 190.
* notice: Undefined property: stdClass::$fid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 428.
* notice: Undefined property: stdClass::$fid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 444.
* notice: Undefined property: stdClass::$old_vid in C:\Program Files\xampp\htdocs\drupalhead\modules\upload\upload.module on line 397.

No file attachments are listed for the node and there are no rows in {files}.

{file_revisions} did have one row

--
-- Dumping data for table `file_revisions`
--

INSERT INTO `file_revisions` VALUES (0, 0, 2, 'acc1.png', 0);

quicksketch’s picture

Status: Needs review » Needs work

Here's an updated patch which takes the 1013 update spot and cleans up some comments and notices.
- Changed update function
- Comments made to follow code standards (start with Capital, end with period, line wrap)
- Fixed PHP strict notices
- The file validation looks like it's being skipped entirely in _file_save() I think correcting the non-existent $file variable to $src corrected this.

Some comments:
- Uploading multiple of the same file replaces previous one rather than changing the name
- In the file_revisions table, the nid column is always 0

All in all, a completely necessary patch. Can't wait to see this going in.

quicksketch’s picture

FileSize
51.19 KB

Sorry the patch meant for the last post here.

dopry’s picture

FileSize
53.31 KB

fixed items pointed out by quicksketch...

removed some incomplete code and file_check_upload that is no longer used.

dopry’s picture

Status: Needs work » Needs review

hrm... yeah status.

dldege’s picture

I tried the most recent patch and was able to attach 3 files to a new page node. I then deleted the node and got some unrelated node errors but the node was deleted and the files table still had the file entries but file_revisions was properly updated. So, it looks like the current patch works as advertised but that was the extent of my testing.

dopry’s picture

Title: Convert files to be associated to a user id instead of a node » Simplify File Uploads, Centralize File Validation and Quotas, Fix File Previews....
FileSize
55.52 KB

UnConeD came up with a novel way to solve the dangling files situation. Using session garbage collection hooks that don't exist yet and a temp flag in the table. Since those sessions hooks don't exist, yet, I added a timestamp and status column to the files table and garbage collection in system_cron.

modules that implement uploads, after this patch hits core, will need to UPDATE {files} SET status = %s WHERE fid = %d, FILE_STATUS_STORED, once the file hits a permanent state. upload.module does this in upload_save.

quicksketch’s picture

That sounds *awesome*. I'll test and review.

quicksketch’s picture

First impressions:

- Has a minor applying conflict in system_cron
- Removed file_check_upload still in user.module, system.module, and locale.inc
- No mechanism added for moving file from temporary status to permanent
- Status field probably only needs to be 1 length
- Indexes on status and timestamp?
- Delete checkbox should be renamed "Remove" as it doesn't actually delete the file

quicksketch’s picture

Per my post above, I'm calling the "Delete" checkbox "Remove" in this post. Also in my previous post "Removed file_check_upload..." should be "file_check_upload needs to be removed...".

Things that do work:
- Files immediately moved to final location. On save, they're verified and status set correctly.
- Node IDs are now saved correctly in file_revisions.
- File previews. Awesome! Here we come imagecache previews :)
- Revisions! Also very cool. Adding new files and creating new revisions (mostly) works as expected. This a great coup.

More things that could be improved:

- If the remove checkbox is checked AND a new revision is made simultaneously, the file isn't removed from the new revision. No action occurs and it remains in both the new and previous revision.

- If a file is uploaded, then immediately removed from the post, it could be deleted right away rather than waiting for cron... Hmm, actually it looks like the file status is set to 1, even if it is removed from the post immediately after uploading. So the file gets uploaded, even though it never was used anywhere. Immediately removing all files with status = 0 and the remove checkbox checked would solve both problems.

- Hooks in various places. A few that would be usefult 1) when the file is uploaded 2) when the file is confirmed or when the status set. These could probably go in upload.module, and let other modules modify as necessary. One of my long standing peeves has been upload.module dumping files directly in the root /files directory. Adding the possibility to modify it's behavior is a must. Dopry has mentioned that these will go in a later patch, but we should be thinking about where we'd like to include them.

dopry’s picture

Here is an updated patch that...

1) Removes file_check_upload in user.module, system.module, and locale.inc
2) add a mechanism for changing file status, file_set_status.
3) adds indexes to timestamp and status
4) renames the delete checkbox to remove.
5) fixes revision support.
6) Adds garbage collection delay as a setting.

known bugs:

theme logo and favicon uploads put files in the right place, but the new files do not appear in the theme. I believe this to be the fault of Garland and not my patch.

retorts:

- I don't want status field as int(1) so additional status can be added in the future.
- I will not include hooks, there is already enough in this patch, create a new task for them.

dopry’s picture

FileSize
61.28 KB

Here is an updated patch that...

1) Removes file_check_upload in user.module, system.module, and locale.inc
2) add a mechanism for changing file status, file_set_status.
3) adds indexes to timestamp and status
4) renames the delete checkbox to remove.
5) fixes revision support.
6) Adds garbage collection delay as a setting.

known bugs:

theme logo and favicon uploads put files in the right place, but the new files do not appear in the theme. I believe this to be the fault of Garland and not my patch.

retorts:

- I don't want status field as int(1) so additional status can be added in the future.
- I will not include hooks, there is already enough in this patch, create a new task for them.

Dries’s picture

1. Maybe FILE_STATUS_STORED should be FILE_STATUS_PERMANENT? Not sure.

2. In the code comments, I would explain what it means for a file to be 'temporary'. The current documentation makes too many assumptions about the developer knowing how PHP's file handling work. I think we could drastically help new developers by adding one or two more lines of documentation.

3. The line-wrapping of code comments is inconsistent.

4. file_destination has no PHPdoc and the filename does not communicate what the function does. The function seesm to do various things (creating a file, outputting a message, etc). Add some PHPdoc comment to explain when, why and by whom this function should be used.

5. Some code comments (sentences) start with a lower-case letter, whereas all others start with a capital case letter. Consistency!

6. In code comments we write 'Drupal', not 'drupal'.

7. What is " Drupal's file temp"? I know what it is, but it's not very readable for non-native English people, or those not familiar with PHP's file handling.

8. "+ // If we got this fat lets get our fid save this puppy to the db.". Fat should be far.

9. if(db_query('UPDATE ... -- coding style. We write a space after the 'if'.

10. I'd remove the file_gc_ttl setting. Just make this a define. Just pick a good default and no one will need to modify this. By adding this settings, we're exposing too much confusing implementation details to the user.

That's all for now ...

BioALIEN’s picture

Subscribing... dopry, you the man!

drewish’s picture

FileSize
61.81 KB

I've:

  • removed file_gc_ttl setting and replaced it with a constant DRUPAL_MAXIMUM_TEMP_FILE_AGE.
  • Tried to clean up the comments and fix the style issues Dries pointed out.
  • renamed FILE_STATUS_STORED to FILE_STATUS_PERMANENT. I think it's a better description.
  • replaced a bunch of needlessly double quoted strings with single quotes.

the things i'd still like to see resolved with this patch:

  • rename file_destination() to something that better describes what it does. we've already got a ton of badly named functions in file.inc. i'd rather not add another.
  • file_validate() shouldn't have error messages about files "attached to this post", it could be uploading a user's profile image for all it knows.
  • shouldn't user.module system.module be calling file_set_status() to mark their files as permanent? rather than just having a comment?
drewish’s picture

The more I play around with this the more I think we're going to have to make some major changes before it's core worthy.

We're not deleting files now, just removing the row from {file_revisions}. So it seems like the {file_revisions} table should really be named something else, {node_files}? {upload}? All it's doing is letting the upload module associate files to nodes. Changing the name to {upload_something} would make it clear to other module that they shouldn't put their files in it unless they want upload monkeying around with them.

I don't like all the validation that's occurring in file_validate(), it'd be much better off in a hook so the module that's handling the upload can enforce it's own requirements. This would move all the upload limits from the system module back to the upload module and put them into a hook_file_validate (or something like that) handler.

It seems like there's a choice, we can preserve the old upload behavior of many-files-to-one-node in which case deleting nodes results in deleted files. Or, allow a file to be attached to multiple nodes in which case we probably need to provide a UI for users to attach their existing files to nodes. And a UI or managing and deleting files. I'm assuming that dopry is in favor of the later.

drewish’s picture

so after talking with dopry on IRC, i'll try and work on the following:
- renaming {file_revisions} to {upload} since it's upload specific attachments.
- restore the the previous upload.module behavior of deleting the node deletes the files.
- work out a better comment for user.module and system.module to explain why they're copying the temp files rather than calling file_set_status().

and i'll shelve my ideas for file_validate() and get that into the subsequent hooks patch. though i don't want to rename any of the upload_* quota variables because I hope to put at least half of them back into upload ;)

drewish’s picture

FileSize
64.69 KB

So this patch restores the previous behavior for the upload module. When you delete a node, attached files are deleted too.

It renames {file_revisions} to {upload} to make it clear that the table is just for the upload module to work with.

I realized that the update function wasn't adding the status or timestamp columns. I fixed that and spent quite a bit of time trying to make sure that the update is correct. I filled the part for PostgreSQL and I'm hoping someone can test it. The MySQL half is a bit more verbose than it could be but I wanted it to match up to the Postgres part.

It also adds a better comment to places that are using file_save_upload() and then making a copy.

drewish’s picture

oh, i just noticed that i'd left in a little test code that changes DRUPAL_MAXIMUM_TEMP_FILE_AGE to 1 second. if you're going to test this before i post another patch you'll want to fix that.

drewish’s picture

FileSize
64.69 KB

okay well this fixes that constant. i'm going to hold off on doing any more until i can talk to dopry and figure out where our grand visions overlap.

forngren’s picture

subscribing.

drewish’s picture

I created a new issue for hook_file (or what ever we end up calling it): http://drupal.org/node/142995

dopry’s picture

FileSize
65.81 KB

Here is the same patch with some doc updates to the status column and hey drewish alread doc'd file_destination. I also fixed some grammar in the phpdoc for file_validate.

dopry’s picture

Status: Needs review » Reviewed & tested by the community

I believe the last of the doc issues to be addressed. I'll revisit them if necessary, but there are still more file patches hoping to land we can clean up docs in.

Dries’s picture

Status: Reviewed & tested by the community » Needs work

A couple of things:

1. system_update_6014 should be system_update_6015. I fixed that on my local copy.

2. When I upload a file using the upload module, I no longer have to click 'Attach'. It will be uploaded automatically on 'Preview'. Is that new?

3. The PostgreSQL upgrade path is not tested (according to the TODO). We can test it later.

4. This might warrant an entry in the CHANGELOG.txt.

5. "Filesystem Limits" should be "Filesystem limits". Actually, I think it's also "File system" and not "Filesystem".

6. On ?q=admin/settings/file-system, there is inconsistent use of fieldsets. The top-part should probably also be wrapped in a fieldset.

7. My configuration settings now has 'File system' and 'File uploads'. I originally went to look in 'File uploads' to find the per-role things. I wonder if it would be better to consolidate these into one page or something. Food for thought. The current behavior is confusing because I upload the files to the database, so the quota have nothing to do with the 'file system' (well, indirectly it has, of course). At least, that was my reasoning but maybe I'm biased (as a techie).

8. The settings per role don't have the 'MB'-postfix after the textfield.

9. We don't explain (on the configuration screen) how the 'File system limits' impact the per-role settings. It's not clear to me what settings take preference. *confused*

10. Also, what happens when I'm in 2 or 3 roles? What settings would apply to me?

drewish’s picture

Status: Needs work » Needs review
FileSize
67.14 KB

1. system_update_6014 should be system_update_6015. I fixed that on my local copy.

Done.

2. When I upload a file using the upload module, I no longer have to click 'Attach'. It will be uploaded automatically on 'Preview'. Is that new?

No, that's the been the behavior since at least 5.0. Attach just gives you the nice javascript upload and avoids a page refresh.

4. This might warrant an entry in the CHANGELOG.txt.

I think you're right. I've added a few lines.

5. "Filesystem Limits" should be "Filesystem limits". Actually, I think it's also "File system" and not "Filesystem".

I changed to "File system limits"

6. On ?q=admin/settings/file-system, there is inconsistent use of fieldsets. The top-part should probably also be wrapped in a fieldset.

7. My configuration settings now has 'File system' and 'File uploads'. I originally went to look in 'File uploads' to find the per-role things. I wonder if it would be better to consolidate these into one page or something. Food for thought. The current behavior is confusing because I upload the files to the database, so the quota have nothing to do with the 'file system' (well, indirectly it has, of course). At least, that was my reasoning but maybe I'm biased (as a techie).

I'd be into moving those settings back to the upload myself. Other than the total disk quota I think all those setting should be upload module specific. Other modules will want to implement their own extension white lists and size/dimension requirements. I'll wait for dopry to weigh in on this.

8. The settings per role don't have the 'MB'-postfix after the textfield.

Fixed.

9. We don't explain (on the configuration screen) how the 'File system limits' impact the per-role settings. It's not clear to me what settings take preference. *confused*

10. Also, what happens when I'm in 2 or 3 roles? What settings would apply to me?

It's not clear to me either. I'll try to follow up on this later.

I've also moved the creation of the upload table from the system.install file to a new upload.install file.

drewish’s picture

FileSize
57.68 KB

i'll admit the per-role file system limits is wonky but it's what's in core now. to lessen the impact of dealing with it i've returned the upload validation settings back to the upload module. there's still a lot of upload.module specific logic in file_validate() that we'll need to take care of once this is in but that's best done via the the proposed hook_file.

this patch is a huge improvement over where we're at now. moving to user, rather than node owned, files makes it possible for a much more diverse range of uses for core's files table. putting the file munging functions into files.inc also makes it much easier for contrib modules to securely handle files.

drewish’s picture

FileSize
5.97 KB

re-roll to keep up with HEAD.

drewish’s picture

FileSize
56.55 KB

obviously, that was the wrong patch.

drewish’s picture

looking at this i feel like file_validate() should be passed an array of callback functions and parameters so that the callers can determine what restrictions should be placed on the file, valid extensions, dimensions, sizes, that sort of thing.

$file = file_validate('image', array(
    'file_size_compare' => array('lt', 1048),
    'file_valid_extensions' => array('img jpg gif'),
    'file_is_image' => array(),
  )
);
Owen Barton’s picture

Wow, this looks nice :)

I agree with pushing off the role stuff into a hook_file, and focusing on the important stuff here (i.e. the uid association, and all the good stuff that it brings along with it).

Dries’s picture

Looking better, but still not up to core-quality. For example, in PHPdoc/documentation we write "Drupal" and not "drupal". We usually don't glue words together (i.e. we use user_size instead of usersize).

Give this patch some more love, and it is ready to go in. :-)

drewish’s picture

Thanks for the review Dries, I'll fix the capitalization issues and try to clean up those variables. In our defense, the $total_usersize bit was already in core, we just moved it around. But the right thing to do would be fix it when we touch it.

drewish’s picture

FileSize
60.16 KB

make some pretty big changes here. file validation is now an array of callbacks passed into file_save_upload() as part of it's new options parameter. i'm starting to like where this is going. all that's really left is figuring out how to pass in a list of valid extensions for file_munge.

drewish’s picture

FileSize
60.6 KB

did some cleanup in the file name munging function. i need to figure out if file_save_upload() really has to be passing it a whitelist of file extensions.

also, i want to point out that the previewing the node after uploading a file doesn't work (you get empty form controls on the second preview) but it's not the fault of this patch, it's broken in core right now.

Dave Cohen’s picture

I will undoubtably be writing modules that prompt for uploads and associate the files with a node. Here are the assumptions I am making, can someone confirm?

  • The {files} database table is part of core, and every module (including the ones I write) can and should list their files there. And this happens if the APIs in file.inc are used.
  • The {upload} database table belongs to the upload module. My modules should not touch that table, even if they do want to associate files with nodes.

If I am correct, I have a suggestion:

The code in upload_file_download is problematic because it allows all users with 'view uploaded files' permission to download any file in the {files} table. It should instead query the {upload} table. I've had problems in the past where download hooks like this break any security I've imposed on a site. (In short, upload_file_download is allowing users to download files it really knows nothing about.

drewish’s picture

yogadex, your assumptions are correct and your concern about upload_file_download() is correct. the query should join to the upload table to only allow permissions for its files.

this ties into another issue i've been trying to address: adding module field to the files table. it'll make much apparent what module put the entry into the files table and who's responsible for it. that issue changes the query to include a WHERE module='upload'. the issue is marked RTBC but if you wanted to weigh in with your support it i'd appreciate it.

drewish’s picture

yogadex, i didn't realize that i was preaching to the choir, i just noticed that you'd commented on that issue.

drewish’s picture

FileSize
60.86 KB

updated all the upload.module queries to join {files} to {upload} to ensure they're only counting upload's files.

Dave Cohen’s picture

In these patches, the {upload} table is little more than the {file_revisions} table which the patches do away with. I predict many modules will want to map files to nodes (revisions). So I suggest file.inc provide, in addition to the {file} table, a {file_revision} table. And an API for easily writing and reading from that table. Upload module would use that API to store its file <--> node mappings, and a table of its own to store upload-specific data, like the 'list' column.

The table would look something like:

     db_query("CREATE TABLE {file_revision} (
       fid int unsigned NOT NULL default 0,
       vid int unsigned NOT NULL default 0,
       module varchar(255) NOT NULL default '',
       PRIMARY KEY (fid, vid),
       KEY (vid)
       ) /*!40100 DEFAULT CHARACTER SET UTF8 */ ");

The current core (before this patch) maps files to nodes. This is useful to upload module and various third-party modules. So why remove that nice feature, just because we want to associate files with users? I'm not saying every file has to be associated with a node. But in my experience, most of them will be. Including a table like this in Drupal core will save developers like me the trouble of re-inventing it later.

drewish’s picture

yogadex, I'm glad to get your feedback, it seems like you've also given this topic a lot of though. Looking back up the thread I realize that there are quite a few assumptions that haven't been explained very clearly.

The point of this change is to start centralizing the file handing that contrib modules have been continually re-inventing. But we want to so in a way that allows for a variety of different relations between files and Drupal objects. This change doesn't prevent users from mapping files to nodes, it just removes the assumption that files will map to nodes.

The reason for renaming file_revisions to upload, is to make it clear that that table is for the upload module's exclusive use. I've seen quite a few modules that insert their records and then expend considerable effort to prevent the upload module from displaying those files in its output (see upload_image, image, etc). In large part I think it was because the generic name led developers (myself included) to believe that that was the correct thing to do.

You ask about creating a default table for mappings. The problem with such a default is that it's a one-size-fits-all, one-to-one relationship. If you want to be able to map a file to multiple nodes, you'll have to create your own table. If you want to map files to taxonomy terms, you'll have to create your own table. If you want to map files to comments, you'll have to create your own table.

What it comes down to is, if you're writing a module that wants to map files to nodes, odds are you'll also have some additional metadata to store along with that relation. Creating your own table gives you the ability to do that and is much cleaner that than trying to repurpose the upload modules table. And with the .install files it's very easy to create tables.

drewish’s picture

FileSize
71.09 KB

I'm really getting excited about this patch. I've consolidated all the image validation and resizing from the user and upload modules into: file_validate_image().

While I was doing that I made some changes to the upload module adding a _upload_file_limits() function to determine the effective upload permissions from all of a users roles. This gets used in two places, validating the upload and in a new message to the user explaining what restrictions are in effect before they start the upload. IMHO it's a very nice usability improvement.

Owen Barton’s picture

FileSize
70.93 KB

Just reviewing this, but it seems the patch was already out of sync...so here is a reroll!

Owen Barton’s picture

Here is my review:
* Tested node uploads (with and without image resizing), user avatars, site logo/icon, file system/upload admin forms. I didn't test a site upgrade.
* Everything worked fine, with the exception of two files that remained in my files table after all was done. Others I managed to delete easily by bumping the timestamp down and hitting cron, but these just seem to stick - they are status 1, there are no corresponding entries in the uploads table, and the files are still on disk. I got some popups with the upload AJAX (xdebug throwing unrelated notices), so perhaps this was it (although I don't see how...a race condition somewhere?). Let me know if there is anything else you would like me to check!
* Code style and commenting look great

drewish’s picture

FileSize
70.03 KB

here's a re-roll post schema api. i haven't been able to test the upgrade it because it seems like something in update.php is broken.

drewish’s picture

FileSize
70.49 KB

okay this changes the parameters of file_save_upload:

function file_save_upload($source, $validators = array(), $dest = FALSE, $replace = FILE_EXISTS_RENAME) {

which is much closer to the original signature.

it also splits the image validation in to two functions: file_validate_is_image() and file_validate_image_resolution(). this lets upload.module enforce dimensions but not require image files.

from here on i'll heed the advice dopry gave me on irc and stop changing thing so we can get this committed ;)

drewish’s picture

FileSize
70.39 KB

small change, the filesizes of upload.module attachments weren't being checked, i'd put in the wrong function name.

dmitrig01’s picture

ummm

+    * Entries in the files table are now keyed to a user, not a node. Modules need to have their own table to join files to nodes.

I consider myself a pretty decent coder :P but I can't understand 'Modules need to have their own table to join files to nodes'
maybe it should be *link* files to nodes

drewish’s picture

dmitrig01, that's a good point, i was using join in the query sense of {node} INNER JOIN {files}. link is probably a more widely understood term. i'll make sure to updated that in the next re-roll of this patch.

drewish’s picture

FileSize
70.59 KB

Bumped up the update number.
Updated the change log to take into account dmitrig01's comment.

pwolanin’s picture

FileSize
72.92 KB

In general, this looks great. However, patch doesn't apply cleanly. Fuzz and one conflicit. I made a manual merge for function upload_js(), but YMMV. Patch attached.

I also see the problem where a file marked for removal during node creation is still uploaded.

Is the name munging working right? For example, if I attach cron.php, in the upload list it shows cron.php.txt, but the URL to the file is http:/example.com/files/cron.php

In D.5, the filename itself seems to be changed (though maybe I don't understand what's going on). For example, this script I posted:
http://drupal.org/node/146425#comment-251104 the listed filename and URL match.

drewish’s picture

pwolanin, thanks for the review and update.

I also see the problem where a file marked for removal during node creation is still uploaded.

i'm not totally clear on what you mean here. how can a file marked for removal be uploaded? it'd have to be uploaded before the form could have a remove checkbox...

Is the name munging working right?

it should be... i'll do some testing on this as soon as i'm able to reinstall HEAD. i've been caught by #55516.

ChrisKennedy’s picture

file_validate_name_length() - $errors[] = t('Its name is too long.', array('%name' => $file->filename)); -> do you mean to use "%name's" instead of "its"?

file_space_used() - the current IF statement precludes the use of 0 as the uid, which might be useful to support.

file_validate_is_image() - checking if the extension is empty isn't necessary, nor is it done in the next function (file_validate_image_resolution) to check if the file is an image. I would just do if (!image_get_info($file->filename)). The internal comment can be removed too.

if (preg_match('/\.(php|pl|py|cgi|asp|js)$/i', ... -> doesn't the regex remove the need for the substr on that same line?

adrian’s picture

I've read through the patch, and tested the functionality.

A big +1 from me.

drewish’s picture

FileSize
70.63 KB

ChrisKennedy, thanks for the review you raised some good issues.

file_validate_name_length() - $errors[] = t('Its name is too long.', array('%name' => $file->filename)); -> do you mean to use "%name's" instead of "its"?

I just dropped the unused %name parameter. The file name is inserted into the message in file_save_upload() and I found it distracting when it was repeated.

file_space_used() - the current IF statement precludes the use of 0 as the uid, which might be useful to support.

Good call, I've changed that from 0 to NULL.

file_validate_is_image() - checking if the extension is empty isn't necessary, nor is it done in the next function (file_validate_image_resolution) to check if the file is an image. I would just do if (!image_get_info($file->filename)). The internal comment can be removed too.

I'm unsure about this change... Looking at the way that image_get_info() is implemented by calling getimagesize() which can return more values than the 'gif', 'jpg', and 'png' that Drupal supports. Functions like image_gd_resize() use the extension to use to use the correct function to load the image. So effectively if it doesn't have an image we can't resize or manipulate it. In my opinion we're better of filtering these images out sooner... but I'm not absolutely set on this I'll drop it if it hold up the patch.

if (preg_match('/\.(php|pl|py|cgi|asp|js)$/i', ... -> doesn't the regex remove the need for the substr on that same line?

I've got no idea. It looked like black magic to me so I didn't touch it. Looking at the patch I'mm fairly certain that it was copied bit-for-bit from upload.module.

Stefan Nagtegaal’s picture

Status: Needs review » Needs work

OKay, another review here.

My marks:
+ drupal_set_message(t('Your upload has been renamed to %filename to conform to the site policy.', array('%filename' => $filename)));

Ehm, perhaps this is nitpicking but I think this isn't the right description for the problem.
This message would suggest that people could read some site policy somewhere, which isn't included on every website out there.
Unfortunatly I can't come up with a better description. Maybe that's because the functionality of file_munge_filename() isn't clear to me.

+ drupal_set_message(t('The file %file could not be saved, because it exceeds the maximum allowed size for uploads.', array('%file' => $source)), 'error');
Shouldn't we give feedback to the user what the allowed size for uploaded files is in the description?

+ drupal_set_message(t('The file %file could not be saved, because the upload did not complete.', array('%file' => $source)), 'error');
Not sure about this one, but it feels like a weird description.

+ watchdog('file', t('Upload Error. Could not move uploaded file (%file) to destination (%destination).', array('%file' => $file->filename, '%destination', $file->filepath)));
'Upload Error', should at least be 'Upload error'. Besides that i would suggest removing the () around the variables.

+ $errors[] = t('Its name is too long.');
Would be better if it was:
+ $errors[] = t('The filename selected for upload exceeds the 255 characters limit. Please rename it.');

Furthermore, I would love to see a watchdog message for:
+ case UPLOAD_ERR_INI_SIZE:
History tells us that people are setting the drupal filesize variable to 0, and still max_upload_size in php.ini interfers with their upload sizes. Having a notice/error inside the watchdog would be pretty easy to implement and will save us a couple of 100 support issues a month. ;-)

Afterall, i definatly like the functionality of the patch!
It works great, allthough i think the above *small* additions/changes would make this RTBC.
I'll keep an eye on this issue, and review it again when you did an update.

If English was my native language, I would have updated the patch myself, but unfortunatly I don't think it would be in favour of this patch. ;-)

Keep up the good work!

drewish’s picture

Status: Needs work » Needs review
FileSize
70.31 KB

In response to pwolanin's issue on #60 I gave file_save_upload() a close look and noticed that the addition of '.txt' onto the end of dangerous file names was happening after the destination filename was computed. I re-orded the flow to fix this but I'd like to have some other eyes on it before we commit it.

Now for Stefan Nagtegaal's coments:

+ drupal_set_message(t('Your upload has been renamed to %filename to conform to the site policy.', array('%filename' => $filename)));

Ehm, perhaps this is nitpicking but I think this isn't the right description for the problem.
This message would suggest that people could read some site policy somewhere, which isn't included on every website out there. Unfortunatly I can't come up with a better description. Maybe that's because the functionality of file_munge_filename() isn't clear to me.

I've changed it to: 'For security reasons, your upload has been renamed to %filename.'

+ drupal_set_message(t('The file %file could not be saved, because it exceeds the maximum allowed size for uploads.', array('%file' => $source)), 'error');
Shouldn't we give feedback to the user what the allowed size for uploaded files is in the description?

Yeah, good point. How about 'The file %file could not be saved, because it exceeds %maxsize, the maximum allowed size for uploads.'

+ drupal_set_message(t('The file %file could not be saved, because the upload did not complete.', array('%file' => $source)), 'error');
Not sure about this one, but it feels like a weird description.

I think it's alright but I'm open to suggestions.

+ watchdog('file', t('Upload Error. Could not move uploaded file (%file) to destination (%destination).', array('%file' => $file->filename, '%destination', $file->filepath)));
'Upload Error', should at least be 'Upload error'. Besides that i would suggest removing the () around the variables.

Done.

+ $errors[] = t('Its name is too long.');
Would be better if it was:
+ $errors[] = t('The filename selected for upload exceeds the 255 characters limit. Please rename it.');

Excellent point, I've changed that to: 'Its name exceeds the 255 characters limit. Please rename the file and try again.'

Furthermore, I would love to see a watchdog message for:
+ case UPLOAD_ERR_INI_SIZE:
History tells us that people are setting the drupal filesize variable to 0, and still max_upload_size in php.ini interfers with their upload sizes. Having a notice/error inside the watchdog would be pretty easy to implement and will save us a couple of 100 support issues a month. ;-)

I was toying with changing the message to "The file %file could not be saved, because it exceeds %maxsize, PHP's maximum allowed size for uploads." It would help admins but be too technical for end users. Maybe a watch dog message is the way to go... There's already a note on the upload_admin_settings() form so the admin knows what PHP's hard limit is when they're adjusting the upload module's settings... perhaps a separate patch could add some theming to make it more visible. It does sort of lurk at the bottom of the page. For now I'm going to leave it alone.

I've made a few other small changes with the patch:
- Fixed a bug in updates that crept in when converting to the schema functions.
- Removed all the trailing whitespace I was introducing.

drewish’s picture

FileSize
70.32 KB
<Dries_> drewish: i just spent another 10 minutes looking at your patch and it looks sane
...
<Dries_> + *       The total number of bytes for all a users files.
<Dries_> should be "for all of the user's files" ?
<drewish> Dries_, yes you're correct
<Dries_> if the patch is well-tested, it looks like it might be ready to go

So if I can get a few more people to bang on it, I think its RTBC. The attached patch fixes the punctuation error Dries found.

ChrisKennedy’s picture

I think it's fine to keep file_validate_is_image() the way it is. The substr() can be removed from that line with the preg_match though, because the $ anchor makes it impossible for the substr test to ever be used/executed.

Stefan Nagtegaal’s picture

Status: Needs review » Reviewed & tested by the community

Did another eview and tested functionality. Looks very good..
Everything works as expected, so IMO it's a go!

When this hits the trunk, hook_file() is the only thing left for much improved file handling for drupal 6! :-)

drewish’s picture

on IRC ChrisKennedy mentioned that the substr issue should hold this up. It can go in as a separate patch.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. This is a killer patch. ;-)

dww’s picture

Status: Fixed » Needs work

very cool work, folks. glad to see this in core. however, the D6 upgrade docs need work:
http://drupal.org/node/114774
feel free to mark this "fixed" again once all API changes are mentioned there.

drewish’s picture

dww, good call. i've taken a first pass at the docs. i'd love some feedback.

drewish’s picture

Status: Needs work » Fixed

i think this is taken care of at this point.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Chill35’s picture

Status: Closed (fixed) » Active

The benefits of this patch:
* Files won't be deleted when nodes are deleted.

In what situation can this be a benefit? It's not altogether clear how file handling has improved.
Is there a place where we can read more than merely "files are keyed to users instead of nodes".
What practical consequences does that have?

In the change that was brought, a bug was introduced: see issue http://drupal.org/node/257716

coupet’s picture

Status: Active » Closed (fixed)

Excellent traction in file system!

You ask about creating a default table for mappings. The problem with such a default is that it's a one-size-fits-all, one-to-one relationship. If you want to be able to map a file to multiple nodes, you'll have to create your own table. If you want to map files to taxonomy terms, you'll have to create your own table. If you want to map files to comments, you'll have to create your own table.

Post #50

We need to document this in api and docs.

drewish’s picture

coupet, it could be expanded upon but it's mentioned in the updating docs: http://drupal.org/node/114774#upload-table