I ran into issue when source url was like this: en/products/tshirts/black+and+white/ from old site and was need to redirect to new store. But there were problem with this because Drupal received url without these "+" just with spaces between. Therefore hash that is stored in DB did not matched with source urls hash..
I created really quick and easy solution in patch, everyone welcome to create better and more Drupal way solution. --This has been already improved.
Problem/Motivation
The Redirect module fails to match redirects when the source URL contains encoded characters such as
+, %20, or other special characters.
When a request comes in for a URL like en/products/tshirts/black+and+white/, Drupal decodes
the URL before processing, converting + to spaces. This causes a hash mismatch between the
hash stored in the database (generated from the encoded source path) and the hash generated from the
decoded incoming request. As a result, the redirect is never matched and the user receives a 404.
This affects any site migrating from a legacy platform where old URLs contain encoded characters that
need to redirect to new paths.
Steps to reproduce
- Create a redirect with source path
products/tshirts/black+and+white - Visit
/products/tshirts/black+and+white/in the browser - Expected: User is redirected to the configured destination
- Actual: The redirect is not matched and the user gets a 404 page
Proposed resolution
Normalize the source path using urldecode() before generating the hash in
Redirect::generateHash(). This ensures the stored hash matches the hash generated from the
decoded incoming request.
A post-update hook is included to regenerate hashes for existing redirects that contain encoded
characters.
Remaining tasks
- Fix PHPCS failures in
src/RedirectRepository.phpfrom href="https://git.drupalcode.org/issue/redirect-2884630/-/pipelines/585283/fai...">pipeline #585283:- Line 266: Missing newline after closing brace
(Drupal.ControlStructures.ControlSignature.NewlineAfterCloseBrace) - Line 307: Missing blank line after last function
(Squiz.WhiteSpace.FunctionSpacing.AfterLast) - Line 308: Missing empty line before class closing brace
(Drupal.Classes.ClassDeclaration.CloseBraceAfterBody)
All 3 are auto-fixable via
phpcbf. - Line 266: Missing newline after closing brace
- Review and test the latest patch (#36) against redirect 1.12.x
- Address performance concerns for sites with large numbers of redirects during the post-update hash
regeneration - Ensure no duplicate hash collisions after normalization
User interface changes
None.
API changes
Redirect::generateHash() now applies urldecode() to the source path before
generating the hash.
Data model changes
No schema changes. Existing redirect hash values are regenerated via a post-update hook to reflect the
new normalized hash calculation.
| Comment | File | Size | Author |
|---|---|---|---|
| #36 | redirect-support-encoded-urls-2884630-36.patch | 7.26 KB | esod |
| #32 | redirect-support-encoded-urls-2884630-32.patch | 6.39 KB | cmd87 |
| #31 | reroll_diff_25_31.patch | 1.26 KB | richardbporter |
| #31 | redirect-support-encoded-urls-2884630-31.patch | 6.4 KB | richardbporter |
| #25 | interdiff_20-25.txt | 3.38 KB | codebymikey |
Issue fork redirect-2884630
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
amaisano commentedSame issue (I think):
Source (old site):
/files/directory/file%20name.pdf
To:
/sites/default/files/new-file.pdf
The %20 is messing things up in the source - I'm getting a 404 when trying to test the redirect.
Comment #3
amaisano commentedSince this affects %20, the above patch doesn't work. I will try and create a patch that covers both situations!
Comment #4
amaisano commentedHere is one solution.
Given the following params...
the vanilla Redirect code generates a hash of:
_i_oMuvBlA4Qsn-VEClBOj-_Tt8iMZsvRENKZyg3s7E.This however does not match the hash generated by trying to compare/find a match, which ends up being:
m42bDy29-bg4mbmKGJfTZoNlpFbnuS031Wb63k-n2Zo.With this new patch, the hash that is initially stored/created in the database now matches the lookup comparison hash in these situations.
The other solution would to do the inverse, and change the lookup/comparison hash check to *en*code the path prior to hash generation.
Comment #5
amaisano commented@maris.abols
If you substitute
rawurldecodewithurldecode, it will also fix situations with "+" spaces, like your sample URL. I did not include that however because of RFC 3986, which is the current standard. The issue with this is there may be non-conformant encoding/decoding going on elsewhere in the Redirect module or in core.Comment #6
idebr commentedComment #8
idebr commentedReroll against 8.x-1.x after #2995075: Unicode methods are deprecated in Drupal 8.6.0 was committed.
Comment #9
berdirShouldn't be hard to extend our test coverage to cover this?
Comment #10
akshayadhavPatch #8 works as expected.
Successfully redirecting URL aliases with spaces to mentioned URLs.
Steps:
1. Applied patch.
2. Checked the URL redirection.
3. The redirection was not working. It was showing 404 page.
4. Deleted the old redirect.
5. Cleared cache.
6. Added URL redirect again.
And now the URL redirection for URL with spaces works.
Comment #11
paulocsIt needs work because of #9.
Comment #12
paulocsBtw I'm not sure if it works because
rawurldecode()does not decode '+' into spaces.See: https://www.php.net/manual/en/function.rawurldecode.php#refsect1-function.rawurldecode-notes
Comment #13
paulocsHere is a patch that proves that the solution #8 does not work. The tests will fail.
Comment #14
paulocsHere is a patch that will work using
urldecode()Comment #16
marcos_lima commentedHi everyone. I did a review of patch #14. I reproduced the error before applying it, using spaces in the redirect, both with %20 on the source and with a + on it. Then I applied the patch and tested the redirects, as well as created new ones to test. It worked well in all cases.
I think it is important to state that the old redirects (created before the patch) won’t automatically work. You have to update them by going to the edit page and then saving, or deleting and creating them again.
Then I did a code review of the changes in the redirect entity class and in the test, and everything looks good to me.
Comment #17
paulocsComment #18
berdirCan we include the functional test from #2364881: Redirect not working when path contains a (+) symbol here (note that it needs some coding standard improvements). And maybe test some other special characters too. This looks like something that could result in regressions.
Comment #19
paulocsAddressed #18.
Comment #20
hchonovre-roll only.
Comment #21
lucienchalom commentedi will review it
Comment #22
lucienchalom commentedI reviewed checking the CS, no new problem added.
just like #10 and #16, creating redirects before and after the patch.
tested with space, + and %20 in both fields To and From, and all work fine after the patch.
As was said on #16 the ones created before the path need to be saved again, deleted, or edited.
maybe add a note about it on the page update?
Anyway, all looks good
Comment #25
codebymikey commentedAddressing #22, I've added a post update hook which regenerates the redirect hash for existing content.
It also includes a helper static function which can be reused for future updates to the redirect hash.
Comment #26
codebymikey commentedComment #27
codebymikey commentedComment #28
paulocsLooks good @codebymikey!
Thanks!
Comment #29
dpagini commented+1 for this patch, it solves the same problem I'm facing... Url's in the "From" field with a special character, like `path/to/a%20file.pdf` won't work.
This issue has been RTBC for almost a year now, is there anything left the maintainers need on this issue? I see that, it appears, the last requested changes by Berdir were completed, so I think this just needs eyes again from an maintainer.
Comment #30
berdirI'm concerned about the upgrade path. I maintain sites with tens and hundreds of thousands of redirects, a forced upgrade path that will recalculate the hash for all of them is going to take time and could possibly mean quite a long downtime for a site of that size. What about adding a button to the settings page to do recalculate hashes, and the update function could just add a message that people might want to run that? We could reuse that if other changes need to happen.
The cache tag handling is also not correct, the list cache tag is mostly just for render caching, the entity storage will still cache those entries, should either invalidate them all through EntityStorageInterface::resetCache() or alternatively the specific entities that are being changed here.
this isn't testing much because it would also pass on HEAD, it's just testing that the hash is generated. We probably want to hardcode the hash for that case.
Comment #31
richardbporter commentedRerolled #25 for 8.x-1.9.
Comment #32
cmd87 commentedUploading the patch for version 8.x-1.10
Comment #33
dark05 commentedThanks for patch! It works for me.
Comment #34
joao.ramos.costa commentedHi @cmd87, I tend to agree in what @berdir concerns, also I'd like to add that currently the upgrade doesn't take care when a given hash already exists.
Comment #36
esod commentedHere's a patch reroll for Drupal 10.5.x, redirect v1.12.
Comment #37
anicoto