In IRC just now, I was asked to forcibly unsubscribe someone from some project issue queue email subscriptions, since they had an evil "if you're really a human, you must reply again" spam filter thing, which was bouncing back to the project owners. ( if you happen to care).

In the process, I noticed this rather shocking fact:

mysql> select count(*) from project_subscriptions;
| count(*) |
|  1514537 | 
1 row in set (1.14 sec)

Good god. No wonder the issue queue seems sluggish. :( Every time someone creates or follows up to an issue, we have to query that enormous table to see who should get email about it. Yikes.

I suppose this really belongs in the project_issue queue, but I wanted it to at least start life as a d.o infra issue so that other folks might see it and decide they want to help... I'm not really sure what we can do about it, in fact. I suppose we could be smarter about how we store "subscribe to all issues" than adding different rows for each project. That'd probably help.

mysql> select count(distinct uid) from project_subscriptions;
| count(distinct uid) |
|                1897 | 

There's only on the order or 2K users subscribed to any queues. They're all probably subscribed to either less than 10 queues, or subscribed to "all of my issues in any queue". There are some open issues about the crappy behavior of "subscribe to all" anyway, since that only subscribes you to all the existing projects... new projects that are added don't take effect.

Anyway, this seems like an area where we could definitely improve the load on the DB and the performance of the issue queues without too much hacking. Any volunteers?


kbahey’s picture

Would be interesting to see what this says:

SELECT uid, COUNT(*) AS cnt FROM project_subscriptions GROUP BY uid ORDER BY cnt DESC LIMIT 10;

Would tell us if a few are over represented.

Also, check if there are dups for (nid, uid). I see that the schema is:

CREATE TABLE IF NOT EXISTS {project_subscriptions} (
          nid int(10) unsigned NOT NULL default '0',
          uid int(10) unsigned NOT NULL default '0',
          level tinyint(3) unsigned NOT NULL default '0',
          KEY project_subscriptions_nid_uid_level (nid, uid, level)

If that turns out to be the case, then maybe a unique key on (nid, uid) so no one can subscribe more than once?

dww’s picture

Yeah, I already looked into something like your query, kbahey. There are a *huge* number of users with 1000+ subscriptions.

I'm nearly positive that we need to just split out the 'individual' subscriptions and the "subscribe all" stuff. The schema and the code isn't even really the problem -- it's the upgrade path. :(

Should we just send out a bulk email to every uniq uid in {project_subscriptions} and say "The email subscriptions have changed in a drastic way that requires you to resubscribe"?

Should we use some heuristic like if you're subscribed to less than 100 issue queues we assume those are individual, and more than 100, it must be a 'subscribe all' from some point in time?

Fixing this in the backend might be a good chance to also fix the subscribe UI, too.

Some related issues worth looking at:

Moshe's offer to use job_queue.module would be perfect for this:

This is the root of the problem for the "subscribe all" vs. individual subscriptions.

Crappy UI for the project/issue/subscribe page:

A way to specify site-wide defaults for issue subscription:

Any help with any/all of the above would be most welcome.

David Strauss’s picture

From a performance perspective, the only thing that should matter is the number of subscriptions per issue. The overall count and the number of subscriptions per user should not matter.

David Strauss’s picture

Oh, and the primary key for {project_subscriptions} ought to be (nid, uid). I have no idea why it would include level.’s picture

mysql> select count(*) from project_subscriptions where level = 0;
| count(*) |
| 1042096 |

That means that 2/3 of all "subscriptions" aren't really subscriptions.

mysql> select count(*) from project_subscriptions where uid = 0;
| count(*) |
| 1724 |

Probably provides us witha nice amount of bounces...

David Strauss’s picture

If level = 0 or uid = 0 means that it's not really a subscription, then we need to clean out that table. As a plan B, we can add an index on (nid, level, uid) and query the subscriptions with the criteria "nid = %d AND level > 0 AND uid > 0".

Jose Reyero’s picture

I think for life subscription to issues you comment on is fair enough, so people just comments on issues that matter to them. However there should be some warning about it.

(I just wish we could set up a different email for the contact tab, as these are the only ones I read anyway)

philbar’s picture

geerlingguy’s picture

Subscribe.’s picture

Title: 1.5 million rows in {project_subscriptions} » 6 million rows in {project_subscriptions}

I've put the current numer into the title. However, since 0 is indeed no subscribtion, I plan to delete the 3.3 Mio rows that have level = 0.

I have already deleted the 1.7k rows with uid = 0.’s picture

delete from project_subscriptions where level = 0;
Query OK, 3725244 rows affected (1 min 16.22 sec)

There is a bug in the code which uses !== 0 do stop inserting these rows into the DB.

dww’s picture

Assigned: Unassigned » dww
Status: Active » Fixed
Issue tags: notifications

Thanks to #34496: [meta] Add Flag module to allow users to subscribe/unsubscribe without posting a comment and in particular #397458-78: Revamp mailing logic to leverage flag module the {project_subscriptions} table no longer exists. It's been replaced with a pair of new tables:

mysql> select count(*) from project_issue_notification_project;
| count(*) |
|    42795 | 
1 row in set (0.01 sec)

mysql> select count(*) from project_issue_notification_global;
| count(*) |
|     1040 | 
1 row in set (0.00 sec)

Status: Fixed » Closed (fixed)
Issue tags: -flag integration, notifications

Automatically closed -- issue fixed for 2 weeks with no activity.

Component: Database » Servers