In #2931611: Honeypot key value entries never expire (effectively), a bug that allowed an unbounded number of honeypot_time_restriction entries to be written was fixed. However, any Drupal 8 site that ran Honeypot prior to the version with that fix likely has hundreds, if not thousands or more, rows in the database that are set to never expire.

An update hook should be written that deletes all entries without an expiration set. The update hook could take a long time to run on sites with a huge number of entries, so it may need to process them using the Batch API.
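
For illustration, a minimal sketch of what such a batched update hook might look like, using the update $sandbox and a select-keys-then-delete approach (the hook number, chunk size, and the 2147483647 "never expires" sentinel check are placeholders/assumptions, not committed code):

/**
 * Deletes stale honeypot_time_restriction entries in batches.
 */
function honeypot_update_8101(&$sandbox) {
  $connection = \Drupal::database();

  // On the first pass, count how many stale rows there are.
  if (!isset($sandbox['total'])) {
    $sandbox['total'] = (int) $connection->select('key_value_expire', 'kve')
      ->condition('collection', 'honeypot_time_restriction')
      ->condition('expire', 2147483647)
      ->countQuery()
      ->execute()
      ->fetchField();
    $sandbox['deleted'] = 0;
  }

  // Fetch one chunk of primary keys, since DELETE queries cannot be ranged.
  $names = $connection->select('key_value_expire', 'kve')
    ->fields('kve', ['name'])
    ->condition('collection', 'honeypot_time_restriction')
    ->condition('expire', 2147483647)
    ->range(0, 1000)
    ->execute()
    ->fetchCol();

  if ($names) {
    // Delete the whole chunk with a single query, keyed by name.
    $connection->delete('key_value_expire')
      ->condition('collection', 'honeypot_time_restriction')
      ->condition('name', $names, 'IN')
      ->execute();
    $sandbox['deleted'] += count($names);
  }

  // Finished once no more stale rows are found.
  $sandbox['#finished'] = empty($names) ? 1 : min(0.99, $sandbox['deleted'] / max(1, $sandbox['total']));
}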

Issue fork honeypot-2997609


Comments

geerlingguy created an issue. See original summary.

C-Logemann’s picture

We have a customer system where we already have more than 6.7 million honeypot entries, blowing up the DB by several GB. I will create a direct database command to delete them. But I agree that there should be a solution for all the unneeded honeypot data on all systems.

geerlingguy’s picture

@C_Logemann - Thanks for the information! Once you have that DB query, could you share it here in case someone else needs to do the same thing until this update hook is written?

C-Logemann’s picture

@geerlingguy Yes, I already planned to share my SQL snippet once it's tested.
What's your idea for the cleanup in the update hook? Maybe also a database call deleting all honeypot entries in key_value_expire with too long a time to expiration? This is something I can develop during the upcoming Drupal Global Contribution Weekend, January 25-27, 2019.

C-Logemann’s picture

All of the following SQL statements are wrapped in Drush 8 commands.

It's good to find out how many table rows you have before deciding what to do next. Maybe we can also analyze the data in the update hook and not delete everything at once if the count is too high (see below):

drush sql:query 'SELECT collection, COUNT(*) AS num FROM key_value_expire WHERE collection = "honeypot_time_restriction";'

Because all "expire" values are identical in my case I removed an additional condition to filter time based. But if you like to operate more granular you can calculate a unix timestamp and use it in this way:

drush sql:query 'SELECT collection, COUNT(*) AS num FROM key_value_expire WHERE collection = "honeypot_time_restriction" AND expire > "1147483647";'

Before you decide which deletion strategy to use, it's better to test it first, because deletion can take a long time, and this may affect the normal usage of your system.
In our case the following command ran for some minutes, so I decided not to use it on the production system, but it was helpful on the dev system for analyzing the data. Because of this performance impact, I think we need to warn people if we put this in an update hook. It should only be run with drush, without a PHP time limit, and decoupled from the SSH connection (screen, tmux, byobu ...):

drush sql:query 'DELETE FROM key_value_expire WHERE collection = "honeypot_time_restriction";'

After analyzing the data that was not deleted by the command above, I decided to truncate the whole table, because that ran much faster and freed disk space very quickly.
(Edit:) Maybe take a look at this post on "Drupal Answers": https://drupal.stackexchange.com/questions/221985/what-is-the-purpose-of....

But this is nothing for an update hook, and it should only be used if you can afford to lose the data in key_value_expire that is not from the Honeypot module:

drush sql:query 'TRUNCATE TABLE key_value_expire;'

Anybody’s picture

Whoa, key_value_expire grew to 2.3 GB with honeypot entries on a customer site... So this is still quite important. ;)

Manuel Garcia’s picture

Status: Active » Needs review
FileSize
876 bytes

Had a go at this; it turns out we can't do it in batches, as far as I know - though let me know if you have ideas:

  • key_value_expire does not have any id-like columns
  • you can't use range/limit on delete queries

So here is the update hook deleting all entries with an expiry date older than what we set by default. I also removed an unused use statement while I was at it :)

Manuel Garcia’s picture

+++ b/honeypot.install
@@ -73,3 +73,14 @@ function honeypot_update_8100() {
+    ->where('collection = honeypot_time_restriction AND expire > ' . \Drupal::time()->getRequestTime() - 3600*24)

Actually, this should be a <, I believe.
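
For reference, the corrected condition written with the query builder and placeholders might look like this (a sketch only; note the quoted line also needs parentheses around the arithmetic, since the concatenation/subtraction mix is otherwise ambiguous in PHP):

// Delete entries whose expiry timestamp is more than a day in the past.
$cutoff = \Drupal::time()->getRequestTime() - 3600 * 24;
\Drupal::database()->delete('key_value_expire')
  ->condition('collection', 'honeypot_time_restriction')
  ->condition('expire', $cutoff, '<')
  ->execute();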

geerlingguy’s picture

Status: Needs review » Needs work

It looks like this adds 3 new coding standards issues... https://www.drupal.org/pift-ci-job/1265461

Manuel Garcia’s picture

Status: Needs work » Needs review
FileSize
743 bytes
869 bytes

Addressing #8 & #9

Manuel Garcia’s picture

Forgot one.

Anybody’s picture

Wonderful, @Manuel Garcia, RTBC+1 for #11. Works perfectly for me. Thanks!!

C-Logemann’s picture

Status: Needs review » Needs work

The current patch has no solution for situations where there are too many wrong entries in the database. There is a big risk that this update hook can fail in such situations, especially when triggered by update.php. So I strongly recommend adding a strategy for communicating with the site admin before starting a big cleanup. Maybe it's better to move the cleanup away from the update hook into the main code with a UI, e.g. a warning and a corresponding "cleanup link" on the status page.

Spokje’s picture

Status: Needs work » Needs review
FileSize
2.58 KB

Added a patch that uses the Batch API to delete the incorrect/never-expiring entries for honeypot.

If there are a _lot_ of those entries, this update will take a _long_ time, but it _shouldn't_ take down the site doing the cleanup.

I think the decision to move this to the UI as suggested in #13 could be better; we (with around 20 D8 sites) managed to get away with the batched hook_update_N approach as in the patch, since we didn't manage a D8 site with "mega enormous" amounts of the mentioned entries.

Spokje’s picture

Hiding previous patches, since they _will_ take a site down due to the DB becoming unresponsive while running the update.

Spokje’s picture

Note to self: first make sure you uploaded the correct patch, and then click 'Save'...

Also another reason to do this as a UI operation: since triggering a hook_update (at least through the command line with drush) automatically puts the site into Maintenance Mode, and this can be a _long_ update, the Maintenance Mode "down time" can be more than just annoying.

Spokje’s picture

Sorry, incorrect index line in previous patch

geerlingguy’s picture

The proper way to do this cleanup is via an update hook, though. If we don't automate it and instead leave it to the site owner, in 90%+ of cases it will never get done... I haven't had issues with the Batch API in a well-tested update hook on many different sites, often operating on hundreds of thousands or millions of database rows.

Maybe I'm missing something major, but what is the main risk with this being in an update hook?

Since triggering a hook_update (at least through the command line with drush) puts the site automatically in Maintenance Mode, and this can be a _long_ update, the Maintenance Mode "down time" can be more than just annoying.

Is that true? I don't believe running updates via drush up or even via Drupal's update.php ever puts the site into maintenance mode, unless you explicitly do that _before_ the updates...

Spokje’s picture

@geerlingguy

If we don't automate it or leave it to the site owner to do it, in 90%+ of cases it will never get done

QFT

I haven't had issues with batch API on a well-tested update hook on many different sites, often operating on hundreds of thousands or millions of database rows.

Maybe I'm missing something major, but what is the main risk with this being in an update hook?

Same here. I just ran the update with the patch in #18 to wipe out around 525,000 rows; it took around 10 minutes, and the site behaved normally the whole time.

Is that true? I don't believe running updates via drush up or even via Drupal's update.php ever puts the site into maintenance mode, unless you explicitly do that _before_ the updates...

That would be the only caveat I can see.

Also, I never noticed this before, but this hook_update_N took much longer than the "normal" ones; the reason for not noticing before could be that "normal" ones finish before my usual browser refresh to check on the site is done.

I couldn't remember seeing a CR for this so I did some research:

Turns out when using the Batch API in a hook_update_N and using drush, Maintenance Mode is triggered: https://drupal.stackexchange.com/a/270795

So basically I would say: hook_update_N should be the way to go, and batching seems the only way to keep the database from choking whilst doing the purge, so my patch in #18 seems, at the very least IMHO, the correct way of tackling the purge.

Manuel Garcia’s picture

Glad to see there's a way to batch this; nice work figuring that out.

The proposed batch update is going to be very slow, since it does a delete query per row. This would be solved by using LIMIT on the delete query per batch, but unfortunately, since Postgres does not support this feature, neither does Drupal.

Spokje’s picture

@Manuel Garcia

The proposed batch update is going to be very slow since it does a delete query per row. This would be solved by using LIMIT on the delete query per batch, but unfortunately since Postgres does not support his feature, neither does Drupal.

I agree on the slowness, but like you, I found there's no LIMIT available on a DELETE query in Drupal.
I sadly think this is the fastest way possible with the tools available, but I would love to be proven wrong.

gambry’s picture

I'm not sure - but I presume - that a DELETE WHERE name IN ([..]) query is faster than looping and deleting 500 single rows per batch?
i.e.

$database = \Drupal::database();

// Collect the primary keys (name) of one batch of stale entries.
$query = $database->select('key_value_expire', 'kve');
$query->fields('kve', ['name']);
$query->condition('collection', 'honeypot_time_restriction');
$query->condition('expire', 2147483647);
$query->range(0, $deletable_honeypot_time_restriction_entries_per_batch);
$result = $query->execute();

// Delete the whole batch with a single query, keyed by name.
$database->delete('key_value_expire')
  ->condition('collection', 'honeypot_time_restriction')
  ->condition('expire', 2147483647) // Not needed.
  ->condition('name', $result->fetchCol(), 'IN')
  ->execute();

It runs in ms on mine, so the limit can probably be bumped up to 1k, and I presume even higher.

UPDATE: to optimise and speed up the delete query, we can at this point also remove the ->condition('expire', 2147483647) bit, as the condition has already run above and the primary key of the table is collection + name.

jwilson3’s picture

+++ b/honeypot.install
@@ -6,7 +6,6 @@
-use Drupal\Core\Link;

@@ -48,7 +47,7 @@ function honeypot_schema() {
-    \Drupal::messenger()->addMessage(t(
+    drupal_set_message(t(

These unrelated lines should be removed from this patch.

See https://www.drupal.org/node/2774931

jwilson3’s picture

Status: Needs review » Needs work

marthinal’s picture

Status: Needs work » Needs review
FileSize
2.04 KB

Rerolled

Manuel Garcia’s picture

Thanks for the reroll @marthinal

I had a thought about what @gambry said in #22, and to me it makes a lot of sense, so I went ahead and updated the patch to use that approach. I've not actually tested this, so please kindly do so on your environments and report back :)

C-Logemann’s picture

Triggering maintenance mode is not a caveat for me, because it's recommended to use maintenance mode when running updates (e.g. as mentioned in /core/UPDATE.txt). Especially on sites where user data is collected, a backup (taken after activating maintenance mode) is highly recommended, so admins can switch back when an update fails. I just had this situation some days ago.

geerlingguy’s picture

I'm willing to merge this into the next release if I can get someone to review it with a live site's backup and make sure it doesn't cause any weird problems on running it.

aleksip’s picture

Status: Needs review » Reviewed & tested by the community

I tested the latest patch (#26) on a real production site (Drupal 8.8.1, Honeypot 1.30). The size of the key_value_expire table was 1,101 MB, and it contained 4,059,873 honeypot_time_restriction entries that needed to be deleted.

The update did put the site in maintenance mode. It took 3 hours 46 minutes on a relatively modest DigitalOcean Droplet setup and finished successfully. The size of key_value_expire went down to 5.55 MB.

The patch looks fine to me and seems to work as intended so marking as RTBC.

Manuel Garcia’s picture

Thanks @aleksip for reporting on that, great to hear it worked fine for you.

+++ b/honeypot.install
@@ -73,3 +73,52 @@ function honeypot_update_8100() {
+  $deletable_honeypot_time_restriction_entries_per_batch = 500;

(re #29) I think 3 h 46 min is a very long time for an update hook to run. I'm guessing we could do much better here, as @gambry mentioned in #22. We may want to bump this to 1000 or even higher.

Any chance you still have that DB from before running the upgrade, @aleksip? It would be great to test whether it runs fine with 1000 or 2000 etc.

geerlingguy’s picture

Yikes; deleting 1 GB of db records shouldn't take 3 hours. I agree a larger batch size may be appropriate: at least 1000, maybe even 5000? It's a matter of balancing RAM usage against time per batch.

gambry’s picture

If it took 226 min with 500 deletions per batch, increasing this to 5000 (x10) should reduce the time to about 22 min, which is much shorter, but still an eternity for some websites to stay in maintenance mode... Have we investigated the possibility of doing this cleanup in a cron job?

Another option could be to make $deletable_honeypot_time_restriction_entries_per_batch configurable, e.g. using the Settings or State API. We could set a default of 1000 (about 2 hours) or 5000 (about 22 min), and site managers could test the batch on staging websites and decide on a better value for them. I presume websites that cannot accept 22 min of downtime have servers that should easily support 7.5k - 10k entries per batch.
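
If we went the Settings API route, the read side could be as simple as this (a sketch; the setting name is a placeholder):

use Drupal\Core\Site\Settings;

// Falls back to 1000 when settings.php does not override it.
$batch_size = Settings::get('honeypot_cleanup_batch_size', 1000);

Site managers could then put e.g. $settings['honeypot_cleanup_batch_size'] = 10000; into settings.php after testing on staging.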

geerlingguy’s picture

@gambry - We could also, in the release notes for the next version of the module that has this fix, add a warning like "if you would like to avoid having your site in maintenance mode for a while, you can manually remove all the entries using this process: (link here)." That might sidestep the issue entirely.

gambry’s picture

@geerlingguy that is a nice option we could do anyway. But I'm wondering how many site managers follow the best practice of reading release notes before upgrading a module? 🤔

Also, an additional option - one that could sidestep the issue entirely AND is more likely to be seen by site managers - is to add a hook_requirements() check that queries the key_value_expire table and, if any "bad" record is found, raises a warning or error and links to a documentation page, or directly shows the query to clean up the table.
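
For illustration, a minimal sketch of such a check (names, wording, and severity are placeholders, and it assumes the key_value_expire table exists):

/**
 * Implements hook_requirements().
 */
function honeypot_requirements($phase) {
  $requirements = [];
  if ($phase === 'runtime') {
    // Count the "never expiring" entries left behind by the old bug.
    $count = \Drupal::database()->select('key_value_expire', 'kve')
      ->condition('collection', 'honeypot_time_restriction')
      ->condition('expire', 2147483647)
      ->countQuery()
      ->execute()
      ->fetchField();
    if ($count > 0) {
      $requirements['honeypot_stale_records'] = [
        'title' => t('Honeypot'),
        'value' => t('@count stale time restriction entries found.', ['@count' => $count]),
        'description' => t('Clean up the key_value_expire table to reclaim disk space.'),
        'severity' => REQUIREMENT_WARNING,
      ];
    }
  }
  return $requirements;
}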

Then the work done in here can still be added:
a. as an API/helper for anyone to extend and use
b. as a custom batch form, for non-technical site managers
c. as a cron job task that site managers can opt in to or out of.

Thoughts?

Manuel Garcia’s picture

I think we should go MVP on this and first write the update hook to clean up people's sites before they run into (more) trouble.

We can always do follow ups if people are interested in the ideas on #34.

jonathanshaw’s picture

The point of #34 is to not use an update hook at all, but to flag the problem and make the solution opt-in. It seems like a good direction to me. People won't read the release notes; they will see their site is offline for some time, they will try to restart their server partway through the update, and the issue queues here will get flooded with various kinds of weirdness...

Hook_requirements and a custom batch form might be safer.

geerlingguy’s picture

@gambry - I think I like your suggestion, specifically:

as an API/helper for anyone to extend and use

We could just put the functionality into a helper function and say:

Run the following drush command to clean up the table: drush ev "honeypot_clean_up_old_entries();"

(Not that exact command/name, but basically if someone can just run one drush command (or one database query), it would be sufficient.)
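
A hypothetical helper along those lines (the name and chunk size are placeholders, per the caveat above):

/**
 * Deletes never-expiring honeypot_time_restriction entries in chunks.
 *
 * Hypothetical; run via: drush ev "honeypot_clean_up_old_entries();"
 */
function honeypot_clean_up_old_entries() {
  $connection = \Drupal::database();
  do {
    // Fetch a chunk of primary keys, since DELETE cannot be ranged.
    $names = $connection->select('key_value_expire', 'kve')
      ->fields('kve', ['name'])
      ->condition('collection', 'honeypot_time_restriction')
      ->condition('expire', 2147483647)
      ->range(0, 10000)
      ->execute()
      ->fetchCol();
    if ($names) {
      $connection->delete('key_value_expire')
        ->condition('collection', 'honeypot_time_restriction')
        ->condition('name', $names, 'IN')
        ->execute();
    }
  } while ($names);
}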

PapaGrande’s picture

Status: Reviewed & tested by the community » Needs work

We could just put the functionality into a helper function and say:

Run the following drush command to clean up the table: drush ev "honeypot_clean_up_old_entries();"

I agree with this approach and with pulling the code out of an update hook.

As a benchmark, I changed $deletable_honeypot_time_restriction_entries_per_batch to 10000 and ran drush updatedb on a Pantheon Multidev server. 7,584,423 rows were removed in 6 minutes 53 seconds.

If I had deployed the update unawares and a production site was in maintenance mode for almost 7 minutes, I'd have probably had an aneurysm.

pfrenssen’s picture

Assigned: Unassigned » pfrenssen

I'll do an iteration on this. I'll start by implementing the hook_requirements() entry.

pfrenssen’s picture

Version: 8.x-1.x-dev » 2.0.x-dev

Switching to the latest version so I can make an issue fork.

pfrenssen’s picture

Status: Needs work » Needs review

Created a batch process that can be run in the UI through a link in a hook_requirements() warning on the status report. I did several tests, and indeed, the higher the number of items deleted in a single batch, the faster the process goes. I settled on 50000 records per batch, which roughly matched the refresh time of the batch UI on my machine; if I set it any lower, it processes multiple batches per page refresh. RAM usage seemed OK; we are retrieving 50000 integer values from the database after all, and these don't take up much space.
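
The wiring for such a batch boils down to something like this (a sketch; the function names are placeholders and the actual merge request may differ in details):

// Assemble and trigger the batch from the confirmation form submit.
$batch = [
  'title' => t('Deleting stale Honeypot records'),
  'operations' => [
    ['honeypot_delete_expired_records_operation', []],
  ],
  'finished' => 'honeypot_delete_expired_records_finished',
];
batch_set($batch);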

Next step is to create a drush command to trigger the batch process.

pfrenssen’s picture

Title: Write update hook to clean up old honeypot_time_restriction values » 2997609-delete-expired-records
FileSize
25.99 KB
36.35 KB
18.21 KB

Here's how it looks in the UI:

Status report:
Screenshot of the status report

Confirmation form:
Screenshot of the confirmation form

Batch operation:
Screenshot of the batch operation

pfrenssen’s picture

Assigned: pfrenssen » Unassigned

Added a drush command that triggers the batch cleanup; it goes something like this:

$ ./vendor/bin/drush honeypot:delete-expired-records
>  [notice] Deleted 50000 of 2068589 records (2.41711% complete)
>  [notice] Deleted 100000 of 2068589 records (4.83421% complete)
>  [notice] Deleted 150000 of 2068589 records (7.25132% complete)
>  [notice] Deleted 200000 of 2068589 records (9.66843% complete)
>  [notice] Deleted 250000 of 2068589 records (12.08553% complete)
>  [notice] Deleted 300000 of 2068589 records (14.50264% complete)
>  [notice] Deleted 350000 of 2068589 records (16.91975% complete)
>  [notice] Deleted 400000 of 2068589 records (19.33685% complete)
>  [notice] Deleted 450000 of 2068589 records (21.75396% complete)
>  [notice] Deleted 500000 of 2068589 records (24.17107% complete)
>  [notice] Deleted 550000 of 2068589 records (26.58817% complete)
>  [notice] Deleted 600000 of 2068589 records (29.00528% complete)
>  [notice] Deleted 650000 of 2068589 records (31.42239% complete)
>  [notice] Deleted 700000 of 2068589 records (33.83949% complete)
>  [notice] Deleted 750000 of 2068589 records (36.25660% complete)
>  [notice] Deleted 800000 of 2068589 records (38.67370% complete)
>  [notice] Deleted 850000 of 2068589 records (41.09081% complete)
>  [notice] Deleted 900000 of 2068589 records (43.50792% complete)
>  [notice] Deleted 950000 of 2068589 records (45.92502% complete)
>  [notice] Deleted 1000000 of 2068589 records (48.34213% complete)
>  [notice] Deleted 1050000 of 2068589 records (50.75924% complete)
>  [notice] Deleted 1100000 of 2068589 records (53.17634% complete)
>  [notice] Deleted 1150000 of 2068589 records (55.59345% complete)
>  [notice] Deleted 1200000 of 2068589 records (58.01056% complete)
>  [notice] Deleted 1250000 of 2068589 records (60.42766% complete)
>  [notice] Deleted 1300000 of 2068589 records (62.84477% complete)
>  [notice] Deleted 1350000 of 2068589 records (65.26188% complete)
>  [notice] Deleted 1400000 of 2068589 records (67.67898% complete)
>  [notice] Deleted 1450000 of 2068589 records (70.09609% complete)
>  [notice] Deleted 1500000 of 2068589 records (72.51320% complete)
>  [notice] Deleted 1550000 of 2068589 records (74.93030% complete)
>  [notice] Deleted 1600000 of 2068589 records (77.34741% complete)
>  [notice] Deleted 1650000 of 2068589 records (79.76452% complete)
>  [notice] Deleted 1700000 of 2068589 records (82.18162% complete)
>  [notice] Deleted 1750000 of 2068589 records (84.59873% complete)
>  [notice] Deleted 1800000 of 2068589 records (87.01584% complete)
>  [notice] Deleted 1850000 of 2068589 records (89.43294% complete)
>  [notice] Deleted 1900000 of 2068589 records (91.85005% complete)
>  [notice] Deleted 1950000 of 2068589 records (94.26716% complete)
>  [notice] Deleted 2000000 of 2068589 records (96.68426% complete)
>  [notice] Deleted 2050000 of 2068589 records (99.10137% complete)
>  [notice] Deleted 2068589 of 2068589 records (100% complete)
>  [notice] Message: Successfully deleted 2068589 expired records.

pfrenssen’s picture

Title: 2997609-delete-expired-records » Provide ways to clean up old honeypot_time_restriction values
idimopoulos’s picture

Status: Needs review » Reviewed & tested by the community

To get the easy part out of the way: the code looks fine. I tested the UI part and it works fine. The status report reports fine.
I must say I don't really enjoy this approach. We have some stale data, and we are creating a form to delete it. This form is a one-time form which we will have to ditch later, as it only works on records with the expiration time set to 2147483647. But then again, we don't know when to remove it, because we cannot know when everyone will have removed their stale data.

However, given
A) the comments starting from #34,
B) that I tried to manually expire the records on our side - it seems that system_cron is what takes care of expired records, but not in batches; system_cron will attempt to delete them all at once. Since key_value_expire is a standalone table (not a fieldable entity or anything complex), it is quite fast, but not fast enough nonetheless. It took me 331 seconds for < 3M records to be deleted (I also had XDEBUG enabled), and
C) that creating a separate cron run would more or less do the same thing - it would work on stale data until no more stale data exists, and then would be called forever without purpose,

I am going to give the approach a green.

joseph.olstad’s picture

we have 225 megabytes in this table .... just noticed!

***EDIT*** our entries were actually from access_unpublished

joseph.olstad’s picture

Status: Reviewed & tested by the community » Needs work

Patch #26 is OK, but the latest merge request doesn't provide a drush command for Drush 10.x.

It does not work with Drush 10.x, so I'm going with patch #26 instead as a temporary workaround.

joseph.olstad’s picture

Status: Needs work » Needs review

OOPS,

honeypot was not enabled; that's probably why I couldn't find the drush command.
Need to review.

gambry’s picture

Status: Needs review » Reviewed & tested by the community

I think this was RTBC before your comment @joseph.olstad?
Move it back if I'm missing something.

joseph.olstad’s picture

Yes, the drush command seems to work fine, thanks. My problem was actually access_unpublished expired keys, not honeypot.

loopy1492’s picture

@joseph.olstad I think I'm having the same issue. Did you just use the same command, except with 'access_unpublished' instead of 'honeypot'?

Anybody’s picture

Is there any active maintainer? Would anyone like to contact the maintainer and tell them about this issue, or offer co-maintainership?

TR’s picture

@Anybody: WTF is that supposed to mean?

This may be RTBC, but it's a terrible solution, which is acknowledged by the person who RTBC'd it above.

It's great to have a patch here for those who need to use it for a one-time cleanup, but IMO it's not something that should be added to Honeypot permanently.

Anybody’s picture

Status: Reviewed & tested by the community » Needs work

@TR: No reason for strong language. After reading my comment again, I totally understand that it was too short and should have pointed out why I thought it would make sense to have a maintainer look at this, as it's RTBC. But that still doesn't justify your unfriendly reply.

I read the comments, had a look at the status, and had the impression it would be useful to have a maintainer look at this. And it seemed to me that there had been no maintainer activity in this issue for a long period of time.

So let's set this back to NW, as it should be per your comment. Anyway, thanks for your reply... but please let's stay friendly.

To be honest, it was just meant to ask for help with maintainership, but as I see now, you probably read it as criticism.

claudiu.cristea’s picture

If I run drush core:requirements immediately after a clean site install, I get:

  SQLSTATE[42S02]: Base table or view not found: 1146 Table '9ToAslWwz5.key_value_expire' doesn't exist: SELECT COUNT(*) AS "expression"
  FROM
  (SELECT 1 AS "expression"
  FROM
  "key_value_expire" "kve"
  WHERE ("collection" = :db_condition_placeholder_0) AND ("expire" = :db_condition_placeholder_1)) "subquery"; Array
  (
      [:db_condition_placeholder_0] => honeypot_time_restriction
      [:db_condition_placeholder_1] => 2147483647
  )

In StatementWrapper.php line 145:

  SQLSTATE[42S02]: Base table or view not found: 1146 Table '9ToAslWwz5.key_value_expire' doesn't exist

The key_value_expire table is created on demand. This explains the failure.
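
The fix is presumably to guard the count query, e.g.:

// Skip the check when the table has not been created on demand yet.
if (!\Drupal::database()->schema()->tableExists('key_value_expire')) {
  return $requirements;
}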

claudiu.cristea’s picture

Status: Needs work » Needs review
FileSize
993 bytes

Trying the "test only" changes again as a patch.

TR’s picture

Version: 2.0.x-dev » 2.1.x-dev