Status: Active
Project: File Tree
Version: 6.x-1.0
Component: Code
Priority: Normal
Category: Support request
Assigned: Unassigned
Reporter:
Created: 27 Apr 2011 at 11:56 UTC
Updated: 8 Dec 2011 at 15:31 UTC
I set up filetree, dumped 200 finance documents in a series of upload subdirectories, and configured filetree, and it worked perfectly as advertised. The speed and user experience are great! However, I then realized that every time I save the node, the filetree directories and document names seem to get cached somewhere.
The problem is that I would like to automate the transfer of finance documents into filetree directories at the end of each month. It seems that unless I refresh each node, the filetree will not be up to date.
So, is there a way to make the filetree show documents in directories in real time, or to trigger the refresh by cron job? Thanks!
Comments
Comment #1
itp commented
I think that I just found the answer...
I just ran cron.php and the files were refreshed. The issue seems to be the Drupal cache, which can be remedied by turning off caching, clearing the cache, or running a cron job.
Comment #2
Toongenius commented
If you want your page to update on refresh, add PHP to your input filter and then add the following line to your body:
$GLOBALS['conf']['cache'] = FALSE;
This will disable caching for that particular page.
Comment #3
joelstein commented
The problem here is that the HTML generated by all your filters gets cached, as you discovered. Flushing the caches works, which is what happens when cron.php runs. I'm not a big fan of the PHP filter (for security reasons), and there may be other ways to disable caching per node.
What would be really cool is to set a watcher on your folder and, when its contents change, run a small script which flushes the cache for any content using a filetree filter. But that sounds pretty specific... I can't envision a way to manage it through the module.
If anyone comes up with an idea, share it here and we'll try to make it work. Otherwise, I'm marking this as "works as designed."
Comment #4
RSpliet commented
hook_filter_info() accepts the following optional array element in its return value (Drupal 7):
"cache (default TRUE): Specifies whether the filtered text can be cached. Note that setting this to FALSE makes the entire text format not cacheable, which may have an impact on the site's overall performance. See filter_format_allowcache() for details."
Setting cache to FALSE (or exposing it as a setting in the newly invented settings menu) should solve the problem experienced. Having said that, it does not seem to work. Possibly this cache setting is broken in Drupal 7.7. However, this might be a way to solve this problem from a Filetree point of view, at least for Drupal 7+.
Comment #5
joelstein commented
I don't think that making File Tree a non-cacheable filter is a good solution, because it would make the entire text format to which it belongs un-cacheable. Since most people will probably add File Tree to their main text format, that would make nearly every node on their site un-cacheable.
One solution you could use in a custom module is to flush the node's cache when it's loaded and/or viewed (but only if it contains a [filetree] token), via hook_node_load or hook_node_view. The downside, obviously, is that File Tree'd nodes would never benefit from caching.
There's probably a better solution. The best approach I can think of is to respond to a change in the file system (somehow), and then flush the cache entry for any corresponding nodes. Or, use Rules + Cache Actions to flush your node caches periodically, or in response to an action.
I think this is a good problem to solve in the File Tree module itself, so I'm re-opening this issue. Keep the ideas coming.
Comment #6
RSpliet commented
In a way I (personally) think there are few use cases that involve a cacheable file tree. I can think of no case where you would want a deleted file to remain available in your system as a link. As for adding files that should be visible from date X: there are different methods to achieve behaviour like that. Must that be filetree's responsibility? I would say there is no problem with filetree-ified nodes never being cached.
I'm pretty sure we cannot count on triggers based on file system changes ever happening in Drupal or PHP in general. That leaves timed events or uncached data as options.
For timed events (and similarly for filesystem events) the main problem is finding the corresponding nodes. This probably must be achieved by checking every entity (not just nodes?) you can think of, and then some more, every time the event occurs. Checking millions of entities for a filetree is, I would say, infeasible and undesirable. An alternative would be keeping an index, but that would not get updated on a node delete and thus would gather superfluous data over time.
As for selectively disabling the caching altogether: indeed, many people will probably enable filetree in their default (admins) text format, because they have no reason not to. From a technical point of view, however, there is nothing wrong with defining a "special" text format for filetree, and only using that when you actually want to add a filetree. The main problem then becomes telling the admins/users that this is preferred and even important for caching reasons.
Comment #7
joelstein commented
Thanks for your feedback.
Though it's not easy, there do exist ways to make your file system respond to changes and trigger Drupal to do stuff (such as trigger a URL specially designed to flush the cache).
Or, looking at it another way, we could make File Tree "know" which files are associated with which entities, periodically scan the file system to check for changes, and flush the appropriate caches when needed. Yes, building that mapping would take a while the first time we do it, but it's very manageable with a batch job. And we can simply respond to entity insertions, updates, and deletions to keep the File Tree mapping up to date. (Yes, we can respond to a node delete... there's a hook for that.) I believe Hotfolder does folder monitoring. Perhaps we can implement this within File Tree, or see if File Tree and something like Hotfolder could work well together.
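The mapping idea above can be sketched outside Drupal to show the bookkeeping involved. This is only an illustration in Python, not module code: the class name, the method names, and the token pattern `[filetree dir="..."]` are assumptions, and in Drupal the `on_save` and `on_delete` calls would be driven by the entity insert, update, and delete hooks.

```python
import re

# Assumed token shape: [filetree dir="some/path"]; adjust the pattern to the real syntax.
TOKEN = re.compile(r'\[filetree\s+[^\]]*?dir="([^"]+)"[^\]]*\]')


class FileTreeIndex:
    """Hypothetical index mapping entity keys to the directories their tokens reference."""

    def __init__(self):
        self._dirs_by_entity = {}

    def on_save(self, entity_key, text):
        """Call on insert and update; drops the entry if no tokens remain."""
        dirs = set(TOKEN.findall(text))
        if dirs:
            self._dirs_by_entity[entity_key] = dirs
        else:
            self._dirs_by_entity.pop(entity_key, None)

    def on_delete(self, entity_key):
        """Call on entity delete so the index does not gather stale rows."""
        self._dirs_by_entity.pop(entity_key, None)

    def entities_for(self, changed_dir):
        """Entities whose cache should be flushed when changed_dir changes."""
        return sorted(
            key for key, dirs in self._dirs_by_entity.items() if changed_dir in dirs
        )
```

Because every save replaces the stored set of directories, an entity with several file trees, or one whose tokens are edited or removed, stays consistent without special handling.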
However, I'm not going to make File Tree an uncached filter. Disabling the cache will introduce a performance problem. I don't want to trade one problem for another. If you don't want to cache your nodes, there are ways to accomplish this.
I really think we can solve this problem, without introducing more problems. Let's give more thought to the solution I outlined here, or feel free to suggest another.
Comment #8
RSpliet commented
You're welcome :-).
I think the first thing we need to consider is the following. What would take more time: "rebuilding" an entity, or checking (part of) the filesystem for changes? Of course this depends a lot on the size of both the entity and the file system. However, for a larger number of folders and files (and I'm thinking in hundreds right now) I feel that checking the fs for changes is a more intensive job than rebuilding a large entity (is 50 fields large?). Checking whether the filesystem has changed is at least as time-consuming as running the "build filetree" routine, given that it does nearly the same task plus database access. I (clearly) cannot back this up with actual data though; it's merely an assumption based on the task of "storing a cache of the entire tree, and comparing it periodically".
If you agree on this, we can reduce this problem to "periodically flushing caches for entities that contain a filetree". All we need then is a list of entities (bundle_type and id?) that contain one or more file trees, carefully maintained by all the involved hooks. Of course this also requires a little cron job, and perhaps a way of controlling how often this job is performed. For that last bit there might be modules doing that though.
If you do wish to check whether a filesystem has changed, I propose having a table with bundle_type, id, requested path and a specially crafted hash. This hash should be generated from the file and folder names together with their modified dates. A SHA-1 sum of a string like this should suffice. Or do you think the module should account for the odd case of a collision? And do you think it's possible to cover all the corner cases (multiple filetrees in one entity? In one field? Path/definition changing on edit? One of several in a field removed on edit while another stays?)
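A minimal sketch of such a hash, written in Python purely to illustrate the idea (the function name `tree_fingerprint` and the exact string layout are assumptions, not anything File Tree provides): it walks the requested path in a fixed order and feeds each relative name and modification time into SHA-1.

```python
import hashlib
import os


def tree_fingerprint(root):
    """Return a SHA-1 hex digest over relative names and modification times under root."""
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so the digest is reproducible
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            mtime = os.lstat(full).st_mtime_ns  # lstat: do not follow broken symlinks
            digest.update(f"{rel}\0{mtime}\n".encode("utf-8", "surrogateescape"))
    return digest.hexdigest()
```

Comparing the stored digest with a freshly computed one tells you whether anything was added, removed, renamed, or modified under that path; the cost is one full walk of the tree per check.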
As for not making the filter uncached: there's always the "make it optional" alternative (with "advanced help" documentation as a clear warning). That only works if the return value of hook_filter_info() isn't itself cached (and... on second thought, it's pretty likely Drupal caches it). If it isn't, this leaves control entirely with the website maintainer, and it is still the easiest workaround. However, I cannot judge the impact of this as I don't have much insight into the use cases. How many accesses a minute does one of these entities get on a large website? How many such entities are we talking about in a typical "large website", tops?
Food for thought...
Comment #9
RSpliet commented
A little amendment to the earlier proposal to store a hash to check whether the filesystem has changed. It's probably a lot easier to just store the highest modified date you can find in the folder. That saves a lot of time and cycles, while avoiding any possible collisions.
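As a rough Python sketch of that simpler check (the names `latest_mtime` and `has_changed` are made up for illustration): directories are included alongside files on the assumption that, as on typical POSIX filesystems, adding or deleting a file updates the modification time of the directory that contains it.

```python
import os


def latest_mtime(root):
    """Return the newest modification time (in ns) of root and everything below it."""
    newest = os.stat(root).st_mtime_ns
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entry_mtime = os.lstat(os.path.join(dirpath, name)).st_mtime_ns
            newest = max(newest, entry_mtime)
    return newest


def has_changed(root, stored_mtime):
    """True if anything under root is newer than the value stored at the last check."""
    return latest_mtime(root) > stored_mtime
```

Only a single integer per requested path needs to be stored, and the comparison is a plain greater-than rather than a digest match.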
Comment #10
RSpliet commented
I have solved this issue by creating the "Node cache expire" module. This sandbox project can be found here. It allows users to set an expiry time for node types by using the node type settings form.
Please note that this module will likely only work with MySQL and MariaDB, as it uses the "replace into" SQL statement.