Hi Guys

After working with WebFM for a while, I've come to realize that the biggest downside it has for me is the negative effect it has on site performance: first, because WebFM sends a no-cache header, and second, because of the time WebFM spends processing each request.

For sites where you have one or two files here and there, there is no problem, but if you use the module a lot, this becomes a problem.

I created an input filter that will change this:
webfm_send/123
to
path/to/file.ext
On any "img" or "a" tags.

This is brilliant because it lets you keep using the great drag'n'drop functionality of WebFM, while avoiding the extra code from being run.
And well, it is WAAAAY faster when the server serves the file directly.

On some sites where I've already implemented this, page load time has been reduced by more than 4 seconds, from 7 seconds to 2.

This of course will not work for sites that use WebFM for file access permissions.

I'm attaching the files here for now. If you want to take over this, include it in WebFM and give it some love, let me know. If not, I will create a new project on drupal.org and make it its own module.

Comments

nhck’s picture

Category: task » feature

Hello jm.federico,

thank you for posting this and helping to make webfm better.

As far as I understand, you are saying your installation is faster when linking to img files and the like directly? I understand, and that might be correct. From my point of view: isn't it a bit of overkill to first create webfm_send/XX just to filter it afterwards to get the "real" path back? In my humble opinion it would be much cleaner to have an option to bypass webfm_send in those cases, wouldn't it?

Again thank you.

jm.federico’s picture

Hi!

Not really. One of the reasons I use WebFM a lot is that it lets my clients manage their assets (files) more easily, and lets them move them around. And with this filter you get the best of both worlds:

  1. Really nice file management
  2. freedom to reorganize files

Scenarios where files might be moved around, and it wouldn't be possible without webfm:

  • I know for a fact that very few companies have a strong asset management policy, so people just upload files with no real knowledge of where they should go, and later on someone moves them around just because they were uploaded to the wrong folder.
  • A company might change its content structure, and that might imply reorganizing its assets. What a pain in the ass it would be to do that without webfm_send/123 as the original file path/link. With this module, all links get updated immediately. It also has a great advantage over modules like http://drupal.org/project/pathfilter, which use [internal:123]: the original path still works even if the filter fails.

So, yes, this is not for everyone, but in my experience it is more common to see sites that need a good asset management system than sites that need file access control. That's where this module is just perfect.

I use webfm everywhere, it is just so much better than any other asset manager out there, but for high traffic sites it is just a no-go.

I guess my point is:

WebFM + MyPatch = Very flexible asset management + brilliant performance.

Cheers

nhck’s picture

Status: Needs review » Needs work

Okay, I understand now - sounds good. Thanks for providing this and helping to make webfm better.

Now it would be nice if you could follow the coding standards, and the file types this is applied to should really be selectable from the backend.

jm.federico’s picture

Sure, I will review the code, make sure Coder does not complain, and add comments where appropriate.

Will be back in a few days.

jm.federico’s picture

For the original module I used the DOM extension. It makes things a bit easier, but I'm changing it to regex.

Thing is, the DOM extension was giving me more trouble than ease, and lately I've been reading (yeah!) about regex. I think this works well.

Any objections? Opinions?

Will attach the module in a couple of days.

webservant316’s picture

coolness, thanks for this module addition to webfm.
can't wait to see if it solves my slow pdf loading problem.

nhck’s picture

Dear jm.federico,

thank you for continuing your work on this.

In my opinion regex is better than the DOM solution, because it seems more generic. You should check whether it also works if you use pathauto with different webfm-path-patterns.

Thank you for working on this.

jm.federico’s picture

Right, code attached.

It uses regex. It does not work with pathauto aliases. In the help I recommend using it before pathologic if pathauto is aliasing the paths.

I have never really worked with tokens, and because the aliases can be so incredibly diverse, I wouldn't know where to start.

There is one BIG TODO: create the settings form where the user can select which tags should be filtered. For now it filters "a" and "img" tags and only changes the content of the "href" and "src" attributes; any other attribute stays unchanged (e.g. alt, title).
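
For readers following along, here is a minimal sketch of the kind of regex replacement described above. It is not the attached module's code. The function name example_webfm_rewrite() and the $webfm_files array are hypothetical stand-ins (the real module would get paths from WebFM), the sketch assumes double-quoted attributes, and it uses a closure, so it needs PHP 5.3 or later.

```php
<?php
// Hypothetical lookup table standing in for WebFM's file records (fid => path).
$webfm_files = array(123 => 'sites/default/files/webfm/report.pdf');

/**
 * Rewrites webfm_send/<fid> in the href/src attribute of <a> and <img> tags.
 * Other attributes (alt, title, ...) are left untouched.
 */
function example_webfm_rewrite($html, array $files, $base_path = '/') {
  // Group 1: tag start up to the opening quote of href/src.
  // Group 2: the numeric file ID. Group 3: the closing quote.
  $pattern = '/(<(?:a|img)\b[^>]*?\s(?:href|src)=")'
    . preg_quote($base_path, '/')
    . 'webfm_send\/(\d+)(")/i';
  return preg_replace_callback($pattern, function ($m) use ($files, $base_path) {
    $fid = (int) $m[2];
    // Unknown file ID: return the match exactly as it was.
    if (!isset($files[$fid])) {
      return $m[0];
    }
    return $m[1] . $base_path . $files[$fid] . $m[3];
  }, $html);
}

$in = '<a href="/webfm_send/123" title="webfm_send/123">Report</a>';
echo example_webfm_rewrite($in, $webfm_files), "\n";
```

Only the href value changes here; the title attribute keeps its text because the pattern requires "href=" or "src=" directly before the quoted path.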

jm.federico’s picture

Status: Needs work » Needs review

nhck’s picture

FileSize
3.01 KB

Better to upload this as a *.patch so it can be commented on with Dreditor.

nhck’s picture

      $help =  '<p>' . t('Substitutes WebFM paths (webfm_send) for real server paths on any tag that includes "href" or "src" attributes.') . '</p>';
      $help .= '<p>' . t('e.g. &lt;a href="/webfm_path/123" alt="a webfm link"&gt; : &lt;a href="/base/path/path/to/file.ext" alt="a webfm link"&gt;') . '</p>';

Make this one t() function - otherwise it's impossible to translate this.
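
One way to read this suggestion, as a hedged sketch only: pass the whole help text to a single t() call and move the literal example markup into placeholders. The t() defined below is a minimal stand-in so the snippet runs outside Drupal (it substitutes placeholders and nothing else); inside Drupal the real t() would be used. The placeholder names !before and !after and the example strings are illustrative, not taken from the patch.

```php
<?php
// Minimal stand-in for Drupal's t(), only for running this sketch standalone.
// It does no translation and no escaping.
if (!function_exists('t')) {
  function t($string, array $args = array()) {
    return strtr($string, $args);
  }
}

// One translatable string; the example markup is passed in as placeholders
// so translators see a single sentence with two slots.
$help = t('<p>Substitutes WebFM paths (webfm_send) for real server paths on any tag that includes "href" or "src" attributes.</p><p>e.g. !before becomes !after</p>', array(
  '!before' => '&lt;a href="/webfm_send/123"&gt;',
  '!after' => '&lt;a href="/base/path/path/to/file.ext"&gt;',
));

echo $help, "\n";
```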

      $help .= '<p>' . t('If you use "<a href="http://drupal.org/project/pathologic">Pathologic</a>" in conjunction with "<a href="http://drupal.org/project/pathauto">Pathauto</a>" to create webfm aliases, it is recommended to run it after WebFM Path Filter or it will fail to replace aliased paths.') . '</p>';

I don't think we should make Pathologic mandatory with Pathauto. We should use drupal_get_normal_path().

            $escaped_base_path = preg_quote(base_path(),'/');

add drupal_get_normal_path here?

Also, in my humble opinion the logic should be somewhat different: why don't we do a string match on the normal path, or check in some other way whether it's a webfm_send path? If it is, we should just replace the URL that carries webfm_send with the real URL.

cgmonroe’s picture

A couple of top of mind thoughts:

The security issues related to this module need to be documented. E.g., this will probably only work if you have not set up a .htaccess file to deny direct access to the files. A lot of people use WebFM with .htaccess deny settings because even if it's slow, it's secure, as opposed to normal attachments, which anyone with the URL can access directly without logging on. I.e. security by obscurity = no security...

Bottom line is that people need to consider their security requirements before using this. But as a separate module, they have the option of weighing what they are protecting vs performance. It just needs to be clear that better performance = less secure.

Also, can this be limited to different "role stores"? E.g., I might want to use this for files related to the anonymous or authenticated roles, but not for files related to an internal role.

jm.federico’s picture

FileSize
2.76 KB

Hello,

About pathologic, I wasn't making it mandatory at all. It is just that if it is being used, it will change non-aliased paths to aliased paths; without drupal_get_normal_path() that would cause webfm_pf to not find the match. But with drupal_get_normal_path() it is all solved.

Now, it might be a good idea to suggest the use of pathologic, not make it mandatory, but suggest it. Badly formatted links and img-src are common and pathologic is just brilliant at fixing most of the problems.

I'm not mentioning it in the help anymore; I will leave that to your discretion.
I would put something like:
Using pathologic with this filter could improve the results. If using it, please make sure this filter runs after pathologic.

Right, after some good thinking I have this solution, me like it:

  1. REGEX parses any tag we want, so far "a" and "img" are hard-coded.
  2. Get the path part and remove domain and base_path if present
  3. Pass path through drupal_get_normal_path()
  4. Check if we have a webfm_send/123 path
  5. Get real file path
  6. Replace and return
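
Steps 2 to 4 above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the attachment: example_webfm_fid_from_url() is a hypothetical name, $resolve_alias stands in for drupal_get_normal_path(), and $base_path stands in for base_path(), so that the snippet runs outside Drupal.

```php
<?php
/**
 * Reduces a link target to a Drupal path and returns the WebFM file ID,
 * or NULL if the target is not a webfm_send path.
 */
function example_webfm_fid_from_url($url, $base_path, $resolve_alias) {
  // Step 2: drop scheme and host, if present.
  $path = preg_replace('#^https?://[^/]+#i', '', $url);
  // Drop any query string or fragment.
  $path = preg_replace('/[?#].*$/', '', $path);
  // Step 2, continued: remove the base path, if present.
  if (strpos($path, $base_path) === 0) {
    $path = substr($path, strlen($base_path));
  }
  $path = ltrim($path, '/');
  // Step 3: turn an alias back into the internal path.
  $path = call_user_func($resolve_alias, $path);
  // Step 4: check whether we have a webfm_send/<fid> path.
  if (preg_match('#^webfm_send/(\d+)$#', $path, $m)) {
    return (int) $m[1];
  }
  return NULL;
}

// Hypothetical alias table; in Drupal, drupal_get_normal_path() does this lookup.
$aliases = array('downloads/annual-report' => 'webfm_send/123');
$resolve = function ($path) use ($aliases) {
  return isset($aliases[$path]) ? $aliases[$path] : $path;
};

var_dump(example_webfm_fid_from_url('http://example.com/drupal/downloads/annual-report', '/drupal/', $resolve));
var_dump(example_webfm_fid_from_url('/drupal/webfm_send/42', '/drupal/', $resolve));
var_dump(example_webfm_fid_from_url('/drupal/node/1', '/drupal/', $resolve));
```

Steps 5 and 6 (looking up the real file path and substituting it) depend on WebFM's own data and are left out here.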

Comments?

jm.federico’s picture

@cgmonroe
Hmm, saw your post after posting mine.

Yep, security is a concern, will include it in the documentation.

As for how we limit which files we change, there are limitless possibilities. After all, we are getting a file object from webfm which (I haven't checked) I guess includes plenty of info about the file. It is possible to do some checks and limit the files using any info provided by webfm. NICE!!!

But I will leave that for later, first I want to make sure the Regex works flawlessly. Once that part is covered we can extend the module more and more.

pillarsdotnet’s picture

FileSize
3.49 KB

#13 as a patch.

notasheep’s picture

subscribing

pillarsdotnet’s picture

Re-rolled #15 according to current patch standards. No code changes; needs testing.

webservant316’s picture

I'm trying to get this patch installed and learn how to use it. Any help?
I have copied in the code and enabled the sub-module.
However, the code doesn't appear to be called in any circumstance.
Do I need to configure it somewhere?
Also, my webfm root folder is 'webfm' in my sites/files directory.

webservant316’s picture

I would love to use the patch above, but couldn't figure out how to get it to work.
So I just did this instead...

>>added to line 2814 of webfm.module

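   // WEBSERVANT316 - Direct access hack.
   // Direct access to all my webfm files, unless in the 'authenticated' folder.
   // stripos() returns 0 for a match at position 0, so compare with === FALSE.
   if (stripos($f->fpath, 'authenticated') === FALSE) {
     global $base_url;
     header('Location: ' . $base_url . '/' . $f->fpath);
     // Stop here so the rest of the function does not also send the file.
     exit;
   }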