Problem/Motivation

ERROR 1062 (23000) at line 16479: Duplicate entry '/newsletter -de' for key 'PRIMARY'

Duplicate entry error for paths containing trailing spaces in redirect_404 table when importing a database dump from amazon rds into mariadb/mysql.

Documentation for Mariadb / mysql database states behavior for varchar keys https://mariadb.com/kb/en/varchar/
If a unique index consists of a column where trailing pad characters are stripped or ignored, inserts into that column where values differ only by the number of trailing pad characters will result in a duplicate-key error

Proposed resolution

Escape path value before doing the upsert on the redirect_404 table

Issue fork redirect-3201335

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

volkerk created an issue. See original summary.

arjunonemail@gmail.com’s picture

I am also having a similar kind of issue When I am importing the Database.

ERROR 1062 (23000) at line 48278: Duplicate entry '/' -en' for key 'PRIMARY'

Due to this import failed.

rmitt’s picture

I have same issue when importing mysql 5.7 database dump to mysql 8.

aluzzardi’s picture

I have a similar issue with strings containing emojis:

the database saved 2 times:
/kurse/outdoor-kurse?

but for the cms shows:
/kurse/outdoor-kurse?
and:
/kurse/outdoor-kurse🆕

when I try to import I have and duplicate error.
My suggestion is to remove the idea of do primary key being only path and langcode, and add an id together, that will avoid errors errors when we search for a string with emoji or whitespaces.

joe huggans’s picture

Also seeing this issue.

bjackson’s picture

I have a client that is also encountering this issue when trying to restore a dump from a 5.7 MySQL server to 8.0:

ERROR 1062 (23000) at line 6465: Duplicate entry 'xxxxx' for key 'redirect_404.PRIMARY'

Entry ID intentionally xxx'd out. Anyone have a workaround?

darrenwh’s picture

Priority: Normal » Major
StatusFileSize
new50.61 KB

I can't restore backups of my database because of this issue, escalating to Major

Error

darrenwh’s picture

As a interim fix when you export your database add the extra flag --insert-ignore to the dump command

drush sql-dump --extra-dump='--no-tablespaces --insert-ignore' | gzip > ~/dbbacks/site_db.$(date +%Y-%m-%d-%H%M).sql.gz

The code should be fixed to prevent these records being entered into the database.

marianojp’s picture

@darrenwh this solution works for me. Thanks for the awesome flag tips

joegl’s picture

Stuck on this. We don't create the database dumps ourselves and can't use the workarounds provided.

joegl’s picture

I believe the issue we are having is due to Pantheon using MariaDB 10.4. We are importing the database into MySQL 8 locally.

sharmaraunak1998’s picture

@joegl,
I am stuck in this situation.
Did you get any solution?

joegl’s picture

I did not have a solution, however as a workaround I've been using the -f parameter when importing the database locally. This will keep importing and display the errors but ignore them. Usually there's only 3-4 errors related to this and everything else works fine. It's not ideal but it works. E.g.,

gunzip < database.sql.gz | mysql -u dbuser -p -f dbname

loopy1492’s picture

Similar error on both sql-sync and fin db import on our site.

> ERROR 1062 (23000) at line 25846: Duplicate entry '/consumer/ca_ren.htm​-en' for key 'redirect_404.PRIMARY'

I ran sql-sync with --extra-dump='--no-tablespaces --insert-ignore' but got:
> ERROR 1227 (42000) at line 18: Access denied; you need (at least one of) the SUPER, SYSTEM_VARIABLES_ADMIN or SESSION_VARIABLES_ADMIN privilege(s) for this operation

Clearly I do not have permissions for on our shared hosting environment.

At any rate, I went into admin/config/search/redirect/404 and actually searched for this redirect. I was shocked/not shocked to see that there actually were two of these in there.

Apparently, this module uses the url as the unique key for this field instead of a unique ID. That probably should be changed; either that or some kind of check needs to be performed that denies the user the creation of a duplicate entry.

loopy1492’s picture

So, it seems that telling the module to do nothing with one of these urls does not delete it from the table. Instead, it marks it as resolved. Interestingly, when you mark one as resolved, that stops both it and the duplicate from displaying on the page. You cannot resolve the other one. It wouldn't matter anyways because both are still in the database table. Time to break out the big guns:

fin drush @oursite.env sql-query "SELECT * FROM redirect_404 WHERE path LIKE '%consumer\/ca_ren%'"

The response:

/consumer/ca_ren.htm en 1 1675924164 0 0
/consumer/ca_ren.htm​ en 1 1675924164 1 0

Okay!

fin drush @oursite.env sql-query "DELETE FROM redirect_404 WHERE path LIKE '%consumer\/ccra_registration.htm%

Of course, when I do that, ANOTHER site comes up as a duplicate. What's a boy to do? Oh right....

fin drush @oursite.env sql-query "DELETE FROM redirect_404" && echo "GET REKT!"

Now I can sql-sync without error.

joegl’s picture

Apparently, this module uses the url as the unique key for this field instead of a unique ID. That probably should be changed; either that or some kind of check needs to be performed that denies the user the creation of a duplicate entry.

I am guessing the issue has to do with collation of a database and how it stores unique characters. Because if the unique key is the URL, then duplicate entries by definition shouldn't exist. The only reason they exist is because they're not duplicates, and one is probably using a special character of some kind. The problem is when the database gets dumped and imported, somewhere along that process the unique character which makes two entries separate gets stripped or modified and thus the entries now match and are considered duplicates.

In my case, it was because the difference in MySQL versions between the production site and my local where I was importing and how certain characters were treated between those versions.

As a crude, made-up example, let's say on production there are two entries like this:
1. /path/my-path
2. /path/my-path😀

These are considered unique paths and then stored in the database. Then let's say I export the DB and import into a database that doesn't allow emoji's. So when importing, it strips out the emoji from example #2, and now I'm left with two "DUPLICATE ENTRIES".

berdir’s picture

Priority: Major » Normal
Status: Active » Postponed (maintainer needs more info)

I don't see what this module could change to fix that, as #16 explains, this is a collation problem and different databases with different settings.

If you get this on local environments for development, it's perfectly fine to just ignore this table unless you for some reason want to work specifically on this. On staging/test environments, you likely want to ensure that you use the same collation.

joelpittet’s picture

Status: Postponed (maintainer needs more info) » Active

@Berdir, we are running into this, the path is used as an index, and MySQL 8 changes what can be in an index and seems to normalize the values where MySQL 5.7 didn't care that there was some weird character some hacker tried to put in the path should also be an index.

Maybe we can either remove the index/primary key on path, or apply a filter to remove characters that can't be real paths as per some RFC (didn't google yet)

joelpittet’s picture

Some examples of paths where this MySQL 8 key collision occurred

/programs/vancouver)
/programs/vancouver)

/∼user/demos/mdp/vi.html
/~user/demos/mdp/vi.html

/user/demos/mdp/vi.html
/user/demos/mdp/vi.html

Can you spot the funny characters?

Edit: The last one doesn't show on the page but looks like this in sublime text <0x18>

berdir’s picture

> that can't be real paths as per some RFC (didn't google yet)

Well, they can be in the path or the server wouldn't be able to receive and log them.

But yeah, fair enough, I'm open to trimming and otherwise removing characters, but not sure how easy that part is, there's plenty of international characters that might actually be used in an URL.

joelpittet’s picture

@Berdir, how about URL encoding them?

There may be some that are worth stripping, I'll look around and see if there is a spec to reference.

joelpittet’s picture

This is a really good answer, but still trying to grok how to implement it. https://stackoverflow.com/a/36667242/80281 Basically what you said written as RFC 3987 is out-of-date from reality.

joelpittet’s picture

Issue tags: +Needs tests

We should have tests to show a sampling of characters that should work. And maybe if they are valid characters for URLs we ensure they save properly in MySQL 8.0?
Like maybe we don't index the path, which would be a performance issue I imagine.

There is probably a nice middle ground here... but lets start with test coverage to set the expectations.

sseto’s picture

I'm also getting this from trying to dump from 5.7 to 8.0. Is it safe to just delete the data from the table `redirect_404` and mysqldump that way?

loopy1492’s picture

Just got it on another site. Need to copy the db to a different environment upstream, and wipe the redirect_404 table before I can download it and use it locally.

taran2l’s picture

URL encoding should be the way here, as @joelpittet suggested

scott_earnest’s picture

We have been copying our database to a lower environment, then running a database update, then sql-sync to local.

This command will clear the problem records:
drush @stage sql-query "DELETE FROM redirect_404 WHERE path LIKE '%my-duplicate-key-redirect%'"

Or we have dropped the PRIMARY KEY, this also works:
drush @stage sql-query 'ALTER TABLE redirect_404 DROP PRIMARY KEY'

This is our current workaround.

tedfordgif’s picture

Issue tags: +MySQL 8
tedfordgif’s picture

Adding core issue for reference on handling case sensitivity, but with only one mention of "collation".

tedfordgif’s picture

Another approach might be to simply diverge from core's handling of URLs, and make redirect_404 everything-sensitive, e.g. throw 'binary' on the 'path' field in the redirect_404 schema. That would be unexpected for users/devs, and is probably not correct, but would have the same effect as the URL-encoding suggestion (absent normalization).

This is not the right ticket to discuss it, but another place we end up with inconsistencies is that public files don't get served by Drupal, but the 404 page for the same does. Should core have a redirect if the file_managed uri matches the path modulo collation? That would also mean fewer redirect_404 records would get created.

The issue with allowing the database collation to determine matches is that we may want different behavior in different circumstances. For example, if a single character in /some/long-URL-slüg has a diacritic mark like that, we probably want to serve the same page, especially if the canonical path has the diacritic and the request path does not. However, in a short url slug, such as you might see on /a/view-page-with/tags/slüg, some languages probably have word collisions we wouldn't want. Maybe the correct approach is to always have unambiguous short slugs (add context to the "machine name", e.g. garden_slug).

loopy1492’s picture

Still busted.

sonnykt’s picture

StatusFileSize
new88.55 KB

The issue is still occurring with MySQL 8:

MySQL [v8032]> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.32    |
+-----------+
1 row in set (0.000 sec)

MySQL [v8032]> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_0900_ai_ci   |
+--------------------------+----------------------+
1 row in set (0.001 sec)

MySQL [v8032]> SHOW CREATE TABLE redirect_404\G
*************************** 1. row ***************************
       Table: redirect_404
Create Table: CREATE TABLE `redirect_404` (
  `path` varchar(191) NOT NULL COMMENT 'The path of the request.',
  `langcode` varchar(12) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT 'und' COMMENT 'The language of this request.',
  `count` int unsigned NOT NULL DEFAULT '0' COMMENT 'The number of requests with that path and language.',
  `timestamp` int unsigned NOT NULL DEFAULT '0' COMMENT 'The timestamp of the last request with that path and language.',
  `resolved` int NOT NULL DEFAULT '0' COMMENT 'Boolean indicating whether or not this path has a redirect assigned.',
  `daily_count` int unsigned NOT NULL DEFAULT '0' COMMENT 'The number of requests with that path and language in a day.',
  PRIMARY KEY (`path`,`langcode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Stores 404 requests.'
1 row in set (0.001 sec)

#*** one trailing space 0x20
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing ','en',0,0,0,0);
Query OK, 1 row affected (0.004 sec)

#** two trailing spaces 0x20
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing  ','en',0,0,0,0);
Query OK, 1 row affected (0.002 sec)

#** one trailing nbsp 0xC2A0
MySQL [v8032]> INSERT INTO redirect_404 VALUES ('/test-trailing ','en',0,0,0,0);
ERROR 1062 (23000): Duplicate entry '/test-trailing -en' for key 'redirect_404.PRIMARY'

MySQL [v8032]> SELECT "a " = "a  ";
+--------------+
| "a " = "a  " |
+--------------+
|            1 |
+--------------+
1 row in set (0.001 sec)

MySQL [v8032]> SELECT "a " = "a ";
+--------------+
| "a " = "a "  |
+--------------+
|            0 |
+--------------+
1 row in set (0.001 sec)

The special character is 0xC2A0 (UTF-8 non-breaking space).
redirect_404_mysql8

joao.ramos.costa’s picture

Faced the same issue with mysql 8.

ERROR 1062 (23000) at line 11049: Duplicate entry '/mypage page/example/1.pdf-en' for key 'redirect_404.PRIMARY'

loopy1492’s picture

With Acquia moving over to mysql 8 soon, folks on that platform may want to petition Acquia to contribute to this module.

loopy1492’s picture

still busted

loopy1492’s picture

While I've been developing a solution, I can confirm @joao.ramos.costa that we're getting this issue still with upstream MySQL 8. For some reason, locally with Docksal running mysql 8.0.28, it thinks the full-width kana characters are the same as the medium width ones.

loopy1492’s picture

StatusFileSize
new1.91 KB

I started by changing how the URLs were stored, but then I realized we'd have to change how they were accessed as well. Different characters that happen to look the same are still different after all. I didn't want to risk the possibility that there were intentional uses of these characters.

I went back to my original thought that we simply need a proper unique key. The merge request/patch creates an update that creates a new id field, assigns an id value to all the existing 404 entries, then sets that ID as the primary key, then sets the value as auto-incrementing.

loopy1492’s picture

Testing steps:

  • Deploy the previous tag to dev
  • Copy the database and files from prod to dev
  • Fire up a local site. You will might get a database sync error if your local build includes a sql-sync. That is fine.
  • Now, on the upstream DEV SITE, log into the remote site
  • From the admin screen, paste these urls into the address bar and visit them one at a time:

/example/path/
/example/path
/example%2Fpath
/example%20path/
/example%20path
/example path
/example/path)
/example/path)
/example)/path
/example)/path
/∼example/path
/~example/path

  • Note that the two parentheses are different… U+FF09 and U+0029
  • Note that the two tildes are different as well… U+223C and U+007E
  • Check the ./admin/config/search/redirect/404 page. There should be a bunch of new records in there.
  • Deploy the branch with the patch to the dev server
  • Run drush cr && drush updb -y on the dev server
  • Test to ensure sql-sync works now:
    • Now run drush sql-sync @thesite.dev @self -y
    • You should not get a database sync error – it should successfully import the database.
  • Ensure the path still increments when you hit it multiple times:
    • Locally, run drush sql-query "DESCRIBE redirect_404;", then…
    • Run drush sql-query "select * from redirect_404;"
    • Make note of the count (the third field).
    • Hit /~example/path followed by running drush cr several times. We have to clear cache every time we hit the url because cache protects the resource from incrementing after the first hit.
    • Run drush sql-query "select * from redirect_404;" and check to make sure that path has incremented.
  • Ensure redirects for that path still work
    • Go to the /admin/config/search/redirect/404 page.
    • Create a redirect from /~example/path to a page that actually exists on the site.
    • Try going to /~example/path again in the browser. It should redirect you.
loopy1492’s picture

Status: Active » Needs review
joao.ramos.costa’s picture

Hi @loopy1492, I will soon test your steps, although I think we should add them as kernel/unit tests. As for the table auto-incremental unique id, here's my two cents: Seems indeed like the most wise choice; otherwise, we might get lost in the infinite creativity of character combinations.

grayle’s picture

Besides any possible performance issues with joins which should be investigated, and writing an update hook to update the existing column, adding `binary` to the hook schema should "fix it" no?

https://www.drupal.org/node/1237252

https://git.drupalcode.org/project/redirect/-/blob/8.x-1.x/modules/redir...

That will result in binary collation. So if the database default charset is utf8 the collation would essentially be utf8_bin. If the charset is utf8mb4 the collation would then be utf8mb4_bin, etc.