Duplicate entry error for paths containing trailing spaces in redirect_404 table [#3201335]

Comment	File	Size	Author
#38	redirect-3201335-unique-key-3.patch	1.91 KB	loopy1492
#32	redirect_404_mysql8.png	88.55 KB	sonnykt
#7	Screenshot 2021-08-15 at 11.51.21.png	50.61 KB	darrenwh

Comment #1

3 March 2021 at 08:46

volkerk created an issue. See original summary.

Log in or register to post comments

Comment #2

arjunonemail@gmail.com commented 1 April 2021 at 08:56

I am also having a similar kind of issue When I am importing the Database.

ERROR 1062 (23000) at line 48278: Duplicate entry '/' -en' for key 'PRIMARY'

Due to this import failed.

Log in or register to post comments

Comment #3

rmitt commented 8 April 2021 at 06:31

I have same issue when importing mysql 5.7 database dump to mysql 8.

Log in or register to post comments

Comment #4

aluzzardi

Pelotas, RS

commented 23 April 2021 at 15:09

I have a similar issue with strings containing emojis:

the database saved 2 times:
/kurse/outdoor-kurse?

but for the cms shows:
/kurse/outdoor-kurse?
and:
/kurse/outdoor-kurse🆕

when I try to import I have and duplicate error.
My suggestion is to remove the idea of do primary key being only path and langcode, and add an id together, that will avoid errors errors when we search for a string with emoji or whitespaces.

Log in or register to post comments

Comment #5

joe huggans

English

Kiel, Germany

commented 24 May 2021 at 10:57

Also seeing this issue.

Log in or register to post comments

Comment #6

bjackson commented 24 May 2021 at 21:32

I have a client that is also encountering this issue when trying to restore a dump from a 5.7 MySQL server to 8.0:

ERROR 1062 (23000) at line 6465: Duplicate entry 'xxxxx' for key 'redirect_404.PRIMARY'

Entry ID intentionally xxx'd out. Anyone have a workaround?

Log in or register to post comments

Comment #7

darrenwh commented 15 August 2021 at 10:52

Priority:

Normal

» Major

Status	File	Size
new	Screenshot 2021-08-15 at 11.51.21.png	50.61 KB

I can't restore backups of my database because of this issue, escalating to Major

Error

Log in or register to post comments

Comment #8

darrenwh commented 15 August 2021 at 11:16

As a interim fix when you export your database add the extra flag --insert-ignore to the dump command

drush sql-dump --extra-dump='--no-tablespaces --insert-ignore' | gzip > ~/dbbacks/site_db.$(date +%Y-%m-%d-%H%M).sql.gz

The code should be fixed to prevent these records being entered into the database.

Log in or register to post comments

Comment #9

marianojp commented 24 September 2021 at 12:55

@darrenwh this solution works for me. Thanks for the awesome flag tips

Log in or register to post comments

Comment #10

joegl commented 27 April 2022 at 20:24

Stuck on this. We don't create the database dumps ourselves and can't use the workarounds provided.

Log in or register to post comments

Comment #11

joegl commented 28 April 2022 at 16:56

I believe the issue we are having is due to Pantheon using MariaDB 10.4. We are importing the database into MySQL 8 locally.

Log in or register to post comments

Comment #12

sharmaraunak1998 commented 28 July 2022 at 12:26

@joegl,
I am stuck in this situation.
Did you get any solution?

Log in or register to post comments

Comment #13

joegl commented 29 July 2022 at 14:52

I did not have a solution, however as a workaround I've been using the -f parameter when importing the database locally. This will keep importing and display the errors but ignore them. Usually there's only 3-4 errors related to this and everything else works fine. It's not ideal but it works. E.g.,

gunzip < database.sql.gz | mysql -u dbuser -p -f dbname

Log in or register to post comments

Comment #14

loopy1492 commented 10 February 2023 at 16:55

Similar error on both sql-sync and fin db import on our site.

> ERROR 1062 (23000) at line 25846: Duplicate entry '/consumer/ca_ren.htm-en' for key 'redirect_404.PRIMARY'

I ran sql-sync with --extra-dump='--no-tablespaces --insert-ignore' but got:
> ERROR 1227 (42000) at line 18: Access denied; you need (at least one of) the SUPER, SYSTEM_VARIABLES_ADMIN or SESSION_VARIABLES_ADMIN privilege(s) for this operation

Clearly I do not have permissions for on our shared hosting environment.

At any rate, I went into admin/config/search/redirect/404 and actually searched for this redirect. I was shocked/not shocked to see that there actually were two of these in there.

Apparently, this module uses the url as the unique key for this field instead of a unique ID. That probably should be changed; either that or some kind of check needs to be performed that denies the user the creation of a duplicate entry.

Log in or register to post comments

Comment #15

loopy1492 commented 10 February 2023 at 18:40

So, it seems that telling the module to do nothing with one of these urls does not delete it from the table. Instead, it marks it as resolved. Interestingly, when you mark one as resolved, that stops both it and the duplicate from displaying on the page. You cannot resolve the other one. It wouldn't matter anyways because both are still in the database table. Time to break out the big guns:

fin drush @oursite.env sql-query "SELECT * FROM redirect_404 WHERE path LIKE '%consumer\/ca_ren%'"

The response:

/consumer/ca_ren.htm en 1 1675924164 0 0
/consumer/ca_ren.htm en 1 1675924164 1 0

Okay!

fin drush @oursite.env sql-query "DELETE FROM redirect_404 WHERE path LIKE '%consumer\/ccra_registration.htm%

Of course, when I do that, ANOTHER site comes up as a duplicate. What's a boy to do? Oh right....

fin drush @oursite.env sql-query "DELETE FROM redirect_404" && echo "GET REKT!"

Now I can sql-sync without error.

Log in or register to post comments

Comment #16

joegl commented 10 February 2023 at 18:46

Apparently, this module uses the url as the unique key for this field instead of a unique ID. That probably should be changed; either that or some kind of check needs to be performed that denies the user the creation of a duplicate entry.

I am guessing the issue has to do with collation of a database and how it stores unique characters. Because if the unique key is the URL, then duplicate entries by definition shouldn't exist. The only reason they exist is because they're not duplicates, and one is probably using a special character of some kind. The problem is when the database gets dumped and imported, somewhere along that process the unique character which makes two entries separate gets stripped or modified and thus the entries now match and are considered duplicates.

In my case, it was because the difference in MySQL versions between the production site and my local where I was importing and how certain characters were treated between those versions.

As a crude, made-up example, let's say on production there are two entries like this:
1. /path/my-path
2. /path/my-path😀

These are considered unique paths and then stored in the database. Then let's say I export the DB and import into a database that doesn't allow emoji's. So when importing, it strips out the emoji from example #2, and now I'm left with two "DUPLICATE ENTRIES".

Log in or register to post comments

Comment #17

berdir

German

Switzerland

commented 4 March 2023 at 10:17

Priority:	Major	» Normal
Status:	Active	» Postponed (maintainer needs more info)

I don't see what this module could change to fix that, as #16 explains, this is a collation problem and different databases with different settings.

If you get this on local environments for development, it's perfectly fine to just ignore this table unless you for some reason want to work specifically on this. On staging/test environments, you likely want to ensure that you use the same collation.

Log in or register to post comments

Comment #18

joelpittet

he/him

English

Vancouver

commented 10 March 2023 at 18:39

Status:

Postponed (maintainer needs more info)

» Active

@Berdir, we are running into this, the path is used as an index, and MySQL 8 changes what can be in an index and seems to normalize the values where MySQL 5.7 didn't care that there was some weird character some hacker tried to put in the path should also be an index.

Maybe we can either remove the index/primary key on path, or apply a filter to remove characters that can't be real paths as per some RFC (didn't google yet)

Log in or register to post comments

Comment #19

joelpittet

he/him

English

Vancouver

commented 10 March 2023 at 18:56

Some examples of paths where this MySQL 8 key collision occurred

/programs/vancouver）
/programs/vancouver)

/∼user/demos/mdp/vi.html
/~user/demos/mdp/vi.html

/user/demos/mdp/vi.html
/user/demos/mdp/vi.html

Can you spot the funny characters?

Edit: The last one doesn't show on the page but looks like this in sublime text <0x18>

Log in or register to post comments

Comment #20

berdir

German

Switzerland

commented 10 March 2023 at 21:31

> that can't be real paths as per some RFC (didn't google yet)

Well, they can be in the path or the server wouldn't be able to receive and log them.

But yeah, fair enough, I'm open to trimming and otherwise removing characters, but not sure how easy that part is, there's plenty of international characters that might actually be used in an URL.

Log in or register to post comments

Comment #21

joelpittet

he/him

English

Vancouver

commented 13 March 2023 at 20:30

@Berdir, how about URL encoding them?

There may be some that are worth stripping, I'll look around and see if there is a spec to reference.

Log in or register to post comments

Comment #22

joelpittet

he/him

English

Vancouver

commented 13 March 2023 at 20:38

This is a really good answer, but still trying to grok how to implement it. https://stackoverflow.com/a/36667242/80281 Basically what you said written as RFC 3987 is out-of-date from reality.

Log in or register to post comments

Comment #23

joelpittet

he/him

English

Vancouver

commented 28 March 2023 at 19:30

Issue tags:

+Needs tests

We should have tests to show a sampling of characters that should work. And maybe if they are valid characters for URLs we ensure they save properly in MySQL 8.0?
Like maybe we don't index the path, which would be a performance issue I imagine.

There is probably a nice middle ground here... but lets start with test coverage to set the expectations.

Log in or register to post comments

Comment #24

sseto commented 22 May 2023 at 20:19

I'm also getting this from trying to dump from 5.7 to 8.0. Is it safe to just delete the data from the table `redirect_404` and mysqldump that way?

Log in or register to post comments

Comment #25

loopy1492 commented 25 August 2023 at 12:16

Just got it on another site. Need to copy the db to a different environment upstream, and wipe the redirect_404 table before I can download it and use it locally.

Log in or register to post comments

Comment #26

taran2l

Ukrainian

Lviv

commented 18 December 2023 at 16:27

URL encoding should be the way here, as @joelpittet suggested

Log in or register to post comments

Comment #27

scott_earnest commented 19 December 2023 at 16:55

We have been copying our database to a lower environment, then running a database update, then sql-sync to local.

This command will clear the problem records:
drush @stage sql-query "DELETE FROM redirect_404 WHERE path LIKE '%my-duplicate-key-redirect%'"

Or we have dropped the PRIMARY KEY, this also works:
drush @stage sql-query 'ALTER TABLE redirect_404 DROP PRIMARY KEY'

This is our current workaround.

Log in or register to post comments

Comment #28

tedfordgif commented 24 January 2024 at 16:17

Issue tags:

+MySQL 8

Log in or register to post comments

Comment #29

tedfordgif commented 26 January 2024 at 18:06

Adding core issue for reference on handling case sensitivity, but with only one mention of "collation".

Log in or register to post comments

Comment #30

tedfordgif commented 26 January 2024 at 18:43

Another approach might be to simply diverge from core's handling of URLs, and make redirect_404 everything-sensitive, e.g. throw 'binary' on the 'path' field in the redirect_404 schema. That would be unexpected for users/devs, and is probably not correct, but would have the same effect as the URL-encoding suggestion (absent normalization).

This is not the right ticket to discuss it, but another place we end up with inconsistencies is that public files don't get served by Drupal, but the 404 page for the same does. Should core have a redirect if the file_managed uri matches the path modulo collation? That would also mean fewer redirect_404 records would get created.

The issue with allowing the database collation to determine matches is that we may want different behavior in different circumstances. For example, if a single character in /some/long-URL-slüg has a diacritic mark like that, we probably want to serve the same page, especially if the canonical path has the diacritic and the request path does not. However, in a short url slug, such as you might see on /a/view-page-with/tags/slüg, some languages probably have word collisions we wouldn't want. Maybe the correct approach is to always have unambiguous short slugs (add context to the "machine name", e.g. garden_slug).

Log in or register to post comments

Comment #31

loopy1492 commented 16 April 2024 at 13:18

Still busted.

Log in or register to post comments

Comment #32

sonnykt

Melbourne, Australia

commented 13 August 2024 at 10:51

Status	File	Size
new	redirect_404_mysql8.png	88.55 KB

The issue is still occurring with MySQL 8:

MySQL [v8032]> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.32    |
+-----------+
1 row in set (0.000 sec)

MySQL [v8032]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_0900_ai_ci   |
+--------------------------+----------------------+
1 row in set (0.001 sec)

MySQL [v8032]> SHOW CREATE TABLE redirect_404\G
*************************** 1. row ***************************
       Table: redirect_404
Create Table: CREATE TABLE `redirect_404` (
  `path` varchar(191) NOT NULL COMMENT 'The path of the request.',
  `langcode` varchar(12) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'und' COMMENT 'The language of this request.',
  `count` int unsigned NOT NULL DEFAULT '0' COMMENT 'The number of requests with that path and language.',
  `timestamp` int unsigned NOT NULL DEFAULT '0' COMMENT 'The timestamp of the last request with that path and language.',
  `resolved` int NOT NULL DEFAULT '0' COMMENT 'Boolean indicating whether or not this path has a redirect assigned.',
  `daily_count` int unsigned NOT NULL DEFAULT '0' COMMENT 'The number of requests with that path and language in a day.',
  PRIMARY KEY (`path`,`langcode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Stores 404 requests.'
1 row in set (0.001 sec)

#*** one trailing space 0x20
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing ','en',0,0,0,0);
Query OK, 1 row affected (0.004 sec)

#** two trailing spaces 0x20
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing  ','en',0,0,0,0);
Query OK, 1 row affected (0.002 sec)

#** one trailing nbsp 0xC2A0
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing ','en',0,0,0,0);
ERROR 1062 (23000): Duplicate entry '/test-trailing -en' for key 'redirect_404.PRIMARY'

MySQL [v8032]> SELECT "a " = "a  ";
+--------------+
| "a " = "a  " |
+--------------+
|            1 |
+--------------+
1 row in set (0.001 sec)

MySQL [v8032]> SELECT "a " = "a ";
+--------------+
| "a " = "a "  |
+--------------+
|            0 |
+--------------+
1 row in set (0.001 sec)

The special character is 0xC2A0 (UTF-8 non-breaking space).
redirect_404_mysql8

Log in or register to post comments

Comment #33

joao.ramos.costa commented 11 September 2024 at 09:11

Faced the same issue with mysql 8.

ERROR 1062 (23000) at line 11049: Duplicate entry '/mypage page/example/1.pdf-en' for key 'redirect_404.PRIMARY'

Log in or register to post comments

Comment #34

loopy1492 commented 7 October 2024 at 15:22

With Acquia moving over to mysql 8 soon, folks on that platform may want to petition Acquia to contribute to this module.

Log in or register to post comments

Comment #35

loopy1492 commented 26 November 2024 at 15:57

still busted

Log in or register to post comments

Comment #36

29 November 2024 at 12:21

loopy1492 opened merge request !126

Log in or register to post comments

Comment #37

loopy1492 commented 29 November 2024 at 14:52

While I've been developing a solution, I can confirm @joao.ramos.costa that we're getting this issue still with upstream MySQL 8. For some reason, locally with Docksal running mysql 8.0.28, it thinks the full-width kana characters are the same as the medium width ones.

Log in or register to post comments

Comment #38

loopy1492 commented 29 November 2024 at 19:55

Status	File	Size
new	redirect-3201335-unique-key-3.patch	1.91 KB

I started by changing how the URLs were stored, but then I realized we'd have to change how they were accessed as well. Different characters that happen to look the same are still different after all. I didn't want to risk the possibility that there were intentional uses of these characters.

I went back to my original thought that we simply need a proper unique key. The merge request/patch creates an update that creates a new id field, assigns an id value to all the existing 404 entries, then sets that ID as the primary key, then sets the value as auto-incrementing.

Log in or register to post comments

Comment #39

loopy1492 commented 3 December 2024 at 12:33

Testing steps:

Deploy the previous tag to dev
Copy the database and files from prod to dev
Fire up a local site. You will might get a database sync error if your local build includes a sql-sync. That is fine.
Now, on the upstream DEV SITE, log into the remote site
From the admin screen, paste these urls into the address bar and visit them one at a time:

/example/path/
/example/path
/example%2Fpath
/example%20path/
/example%20path
/example path
/example/path）
/example/path)
/example）/path
/example)/path
/∼example/path
/~example/path

Note that the two parentheses are different… U+FF09 and U+0029
Note that the two tildes are different as well… U+223C and U+007E
Check the ./admin/config/search/redirect/404 page. There should be a bunch of new records in there.
Deploy the branch with the patch to the dev server
Run drush cr && drush updb -y on the dev server
Test to ensure sql-sync works now:
- Now run drush sql-sync @thesite.dev @self -y
- You should not get a database sync error – it should successfully import the database.
Ensure the path still increments when you hit it multiple times:
- Locally, run drush sql-query "DESCRIBE redirect_404;", then…
- Run drush sql-query "select * from redirect_404;"
- Make note of the count (the third field).
- Hit /~example/path followed by running drush cr several times. We have to clear cache every time we hit the url because cache protects the resource from incrementing after the first hit.
- Run drush sql-query "select * from redirect_404;" and check to make sure that path has incremented.
Ensure redirects for that path still work
- Go to the /admin/config/search/redirect/404 page.
- Create a redirect from /~example/path to a page that actually exists on the site.
- Try going to /~example/path again in the browser. It should redirect you.

Log in or register to post comments

Comment #40

loopy1492 commented 3 December 2024 at 12:39

Status:

Active

» Needs review

Log in or register to post comments

Comment #41

joao.ramos.costa commented 3 December 2024 at 16:06

Hi @loopy1492, I will soon test your steps, although I think we should add them as kernel/unit tests. As for the table auto-incremental unique id, here's my two cents: Seems indeed like the most wise choice; otherwise, we might get lost in the infinite creativity of character combinations.

Log in or register to post comments

Comment #42

grayle commented 7 May 2025 at 13:01

Besides any possible performance issues with joins which should be investigated, and writing an update hook to update the existing column, adding `binary` to the hook schema should "fix it" no?

https://www.drupal.org/node/1237252

https://git.drupalcode.org/project/redirect/-/blob/8.x-1.x/modules/redir...

That will result in binary collation. So if the database default charset is utf8 the collation would essentially be utf8_bin. If the charset is utf8mb4 the collation would then be utf8mb4_bin, etc.

Log in or register to post comments

Duplicate entry error for paths containing trailing spaces in redirect_404 table

Problem/Motivation

Proposed resolution

Issue fork redirect-3201335

Comments

Related issues

Referenced by