If I have a link to an image derivative (images that get created by applying the effects of an image style) and the derivative has not yet been created, Pathologic handles the link as if it is a Drupal link and adds the language prefix.

This error won't be noticed easily because these links arrive at Drupal and are handled by the image module (because the language part is stripped off by then). The image module will detect that the image already exists and will deliver it. However, this leads to a full Drupal bootstrap and less optimized file transfer then when the call would have been handled directly by the web server.

I think that everything under sites/default/files (or actually the directory the public:\\ scheme points to) should be handled as a file, as not only the image module generates files under that directory (aggregated css and js, js language files). Note that also links to the private file system should be seen as files, though in that case you will always get a full Drupal bootstrap to determine access rights and transfer the file from a directory (that is hopefully outside the Drupal root, outside the document root and even not reachable for the webserver (via aliases)).

Note that the idea of user azinck in #1763696: Paths outside of Drupal root incorrectly detected to use menu_get_item() to see if a link points to a file is (further) invalidated by this issue.

Comments

Garrett Albright’s picture

Priority: Normal » Major

The problem with treating a path like it's a file is that, on servers not using clean URLs, the path won't get the ?q= parameter added to the URL. Which is generally fine and dandy, except that now there's no way for Drupal to notice that the derivative needs to be created, and any query to that path will result in a 404 from the daemon. Welcome to Pathologicville; Population 100,000 Edge Cases.

I suppose a fix for Pathologic could be to not pass a language to url() if it looks like the path that will be created will be under the public files directory. There could theoretically be edge cases where an image derivative filter could be language-dependent, but that would probably be pretty rare…

For now, if you're really worried about this and have some development chops, you could implement hook_pathologic_alter() to tell Pathologic to stop adding language prefixes to file paths, at least until this unmotivated developer gets around to fixing it properly. See the pathologic.api.php file.

fietserwin’s picture

I'll try to have a look at it. I will assign the issue to myself when I really start working on it.

Garrett Albright’s picture

Issue summary: View changes
Status: Active » Postponed (maintainer needs more info)

Currently, Pathologic tries to be smarter about not trying to add a language prefix if it looks like the path is to a file. Is this still a problem?

fietserwin’s picture

Version: 7.x-2.x-dev » 7.x-3.x-dev
Status: Postponed (maintainer needs more info) » Needs review
StatusFileSize
new1.52 KB
new5.42 KB

OK, I gave it a try as I ran again into this problem. Attached patch solves the problems I had with the current dev version of pathologic:

1 patch is just for this issue, the "full" patch solves:
- this issue.
- contains the most recent patch (and thus solves) as in #348421: Multilingual links with Pathologic.
- solves something that, I think, resembles or is equal to #2071695: local URLs don't work when Drupal is in a subdirectory.

To explain the last point:

  • my settings for "also consider local"
/mysite/
/
  
  • I work locally on http://localhost/mysite
  • So the first line is not added to 'local_paths_exploded' in the foreach loop ...
  • but is added after the loop.
  • When matching the path with local exploded paths, a path like "/mysite/sites/default/files/styles/..." now matches the "/" first (instead of "/mysite/") and is thus further processed as "mysite/sites/default/files/styles/..." instead of "sites/default/files/styles/..." which leads to failures further on, especially with the 'is_file' setting.
  • The problem is thus that due to the giant if statement the order is not preserved. To solve, I think it is better that the order of local paths (as configured) should not be changed so that site managers can put sub-paths before parent paths (this is a bit difficult to program.

    To not add the current paths twice, I added some variables that keeps track of that, though I doubt if it really matters (besides a slightly longer matching loop), it would further clean up though.

    fietserwin’s picture

    StatusFileSize
    new1.52 KB

    Attached a rerolled patch (only linenumbers and a some doc punctuation and only or this issue, not a combined patch like above).

    Furthermore, I think that #1 is not true:
    - Links to image derivatives are always generated like sites/default/files/styles/large/my-image.jpg. This to prevent a full Drupal bootstrap when the file already exists. this is also on systems that do not use clean URL's.
    - When the image derivative (or any other file in the public files directory that gets created on first access) gets requested and the file does not yet exist, Drupal gets bootstrapped and the function drupal_environment_initialize() will assign the q parameter. See that function and the comment in .htaccess:

      # Pass all requests not referring directly to files in the filesystem to
      # index.php. Clean URLs are handled in drupal_environment_initialize().
    

    Thus, Drupal will create the derivative, even on non clean URL systems.

    Therefore this patch will work on both clean and non-clean URL systems. So, I would like to ask you to consider it for committing.

    fietserwin’s picture

    The bug mentioned in comment #4 has been filed separately under #2545396: Pathologic fails to correctly recognize the current local path when on a sub-directory.

    herved’s picture

    StatusFileSize
    new1.56 KB

    Hello,

    Thanks for the patch.
    I have an issue (after a migration) where all paths going to files in content are as follows: sites/default/files/...
    However my public directory ended up being sites/something/files/... which I didn't expect.

    So additionally to checking if the path starts with the public files directory I suggest to apply a regular expression on the path:
    something like preg_match('|^sites/.+?/files/.+|i', $path)

    It's an edge case but this doesn't hurt and is quite safe.
    What do you think?

    Thank you,
    Hervé

    fietserwin’s picture

    Doesn't seem to be a good idea to me.

    The public files directory is a system variable and doesn't have to comply to your regexp at all. Might it be that this setting was corrupted during your migration? Or did you go from a multi to a single site install?

    The image derivatives get created using the value of the public file system variable and all other modules that do crate files that have to be accessible should do so as well. If not the error is on their side.

    So let's stay with using the public file directory.

    If you insist and want to check for both: I don't get the ? in the middle of the regexp and the .+ (or at least the +) at the end seem superfluous (you don't have an "end of string" marker).

    Garrett Albright’s picture

    Status: Needs review » Closed (won't fix)

    I don't get the ? in the middle of the regexp

    It's the non-greedy operator. It stops the . from matching the /files/ that follows. For example, if you have /a(.+)c/ and you're matching against "abcc", the capture will be "bc", but if you change it to /a(.+?)c/ the capture will be just "b" because the regex engine sees that the first "c" matches the next part of the pattern, so it doesn't let the dot be greedy and match it. (It's so totally awesome how regex has different meanings for its own special characters depending on their position like this, right?)

    That being said, I side with fietserwin here - we can't assume that a site's old files directory was "sites/something/files", even though that will be the case 95% of the time at least with a D6 or D7 site. For your case, implementing hook_pathologic_alter() would probably be the best approach. See the pathologic.api.php file in the Pathologic module directory for more info - it's pretty straightforward.

    herved’s picture

    Hello and thanks for you explanation.
    I'm coming from a single and going to a multisite instance.

    Indeed hook_pathologic_alter() would be a good solution for me but I don't see any way to mark the url as a file.
    Any suggestion are welcome.

    Garrett Albright’s picture

    Status: Closed (won't fix) » Needs work

    Hmm, yeah, there's no way to alter "this is a file" from an alter implementation, is there? That's an oversight. Whelp, still interested in giving us a patch? :P

    fietserwin’s picture

    @herved: it seems you have a problem that is bigger than just this issue. You have content pointing to sites/default/files/... but the resources are not there anymore and won't be created there anymore, so I think you better change all your content (export sql dump, use text editor to search and replace, import sql dump) to point to the new public files directory and move all you files over there. Maybe you could also just create a link at the file system level.

    herved’s picture

    Hello and thanks for your time.

    I finally decided to do it as follows:

    /**
     * Implements hook_pathologic_alter().
     */
    function mymodule_pathologic_alter(&$url_params, $parts, $cached_settings) {
      if (preg_match('|^/?sites/.+?/files/.+|i', $url_params['path'])) {
        // Rewrites sites/default/files in paths to our public folder.
        $stream = file_stream_wrapper_get_instance_by_scheme('public');
        $dir = $stream->getDirectoryPath();
        $url_params['path'] = str_replace('sites/default/files', $dir, $url_params['path']);
    
        // We're linking to a file, use a fake LANGUAGE_NONE language object.
        $url_params['options']['language'] = (object) array('language' => LANGUAGE_NONE, 'prefix' => '');
      }
    }
    

    I saw the LANGUAGE_NONE trick above is used in pathologic.module:386 (on version 7.x-3.1) so I used the same approach for now but tagging an url as a file (using 'is_file') would be the best. I'll try to find some time after new year to submit a patch for this.

    Eventually I should indeed replace all occurrences of sites/default/files in DB. But that's actually another issue (on my side).

    Happy new year ;)

    fietserwin’s picture

    This almost sounds like you would be better off writing your own custom filter, I guess the boiler plate code shouldn't be too difficult nor the filter itself (that would even be very easy). But if this is temporary, so you can continue working on your site, this solution will do as well.

    fietserwin’s picture

    Status: Needs work » Needs review

    So, this issue is back to Needs review for the patch in #5.

    Garrett Albright’s picture

    fietserwin, could you step me through some steps with which to test your patch? Give me an exact scenario which is broken before the patch and fixed after it.

    fietserwin’s picture

    Environment:

    • multilingual site
    • pathologic installed and enabled
    1. Upload an image or have a link to an image available.
    2. Create some content with an image tag linking to a not yet existing image style derivative. E.g.<img alt="" class="image-thumbnail" src="/drupal7-test/sites/default/files/styles/thumbnail/public/field/image/20120914_151628.jpg?itok=Pkq1OKdU" height="113" width="150">
    3. View content and view source of it: <img alt="" class="image-thumbnail" src="/drupal7-test/en/sites/default/files/styles/thumbnail/public/field/image/20120914_151628.jpg?itok=Pkq1OKdU" height="113" width="150"> Note the .../en/...
    4. Remove created image derivative.
    5. Install patch.
    6. Clear all caches.
    7. View content again and view source of it: <img alt="" class="image-thumbnail" src="/drupal7-test/sites/default/files/styles/thumbnail/public/field/image/20120914_151628.jpg?itok=Pkq1OKdU" height="113" width="150"> Note that the .../en/... is gone.

    Thus this patch tries to prevent adding a language prefix to links that point to an image derivative even if that image derivative has not yet been created.

    Notes:

    • The site will seem to work correctly as the image gets created and showed in the browser. But as the filter result gets cached, subsequent page views will result in an additional Drupal bootstrap to process the request to .../en/sites/default/files/styles/... as this is not pointing to an existing file and Drupal will serve the file contents instead of Apache.
    • IMO, given that the Drupal public files directory is only meant for all kinds of files to be accessed directly by the public and is never meant for paths to be processed by Drupal (unless it is for file creation) a language prefix should never be added to links pointing to that part of the website.

    Garrett Albright’s picture

    Status: Needs review » Fixed

    Okay, thanks for that - that helped. I could replicate the problem, though I couldn't entirely test the solution because I couldn't figure out a simple way for Drupal to give me the path of an image derivative with a correct "itok" parameter. I could see where Pathologic stopped affecting the URL, though.

    Patch in #5 accepted and pushed. Thanks!

    fietserwin’s picture

    Thanks for adding this patch.

    BTW: You can test image derivatives without itok by adding this line to your settings.php:
    $conf['image_allow_insecure_derivatives'] = TRUE;

    Status: Fixed » Closed (fixed)

    Automatically closed - issue fixed for 2 weeks with no activity.