Can we remove the row created for the author of a message in pm_index? currently when a message is created, a row is created for each recipient, and also one for the author. This seems like it complicates things without really adding any functionality.

As a preliminary attempt to patch this, I simply removed the query that adds this row from pm_send(). All tests still pass after this, so I don't see any red flags.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

litwol’s picture

Tests are not completely encompasing all PMSG functionality so major decisions should not be made on behalf of *all tests passing*, just FYI.

On the other hand, various non intuitive behavior of PMSG such as one described above do make developer life difficult. So this is worthy for a more thorough discussion.

JirkaRybka’s picture

This is VERY related to: #490650: Message sent to oneself breaks new/#replies data loading (although it's NOT a duplicate)

NaheemSays’s picture

Status: Needs review » Needs work

If this ever goes in, we will also need an update function to clear up the past duplicates.

Not tested, but does this patch break either "Sent messages" or "inbox" (or even "all messages") from privatemsg filter? I would suspect that this change would alter the nature of one two or all three of those listings.

(it may be just a case of fixing the code there to use the new methods etc, and it may simplify it too, but without it, this patch is incomplete).

A functional problem with this change is that an authored message would have no thread_id for the author (I think this would break reading/replying to unreplied message sent to another). How do we overcome that? would we need more extensive schema changes?

@litwol - you mentioned in another issue about a better schema from a speed perspective... if we do change schema, what were your thoughts there (or was it just a generic idea?) if we do change schema, we should try to do it as few times as possible IMO, so it may be a good idea to roll all ideas into one.

Berdir’s picture

I agree that we should remove it.

However, it is not as easy as just removing that line. This wil currently break several things, for example the all messages and sent messages tab, because they expect that to be there.

@nbz: regarding schema change. We need to think about that, but I would prefer to do it after we have a 1.x release, either 6.x-2.x (not the current one, obviously..) or even 7.x-1.x...

NaheemSays’s picture

If we do do anything, the approach by JirkaRybka in #490650-1: Message sent to oneself breaks new/#replies data loading is another one to evaluate - subtly different and this should not have the same problem about the thread_id as the patch in this issue.

A question about the complexity this issue is trying to solve - what exactly is it? (just to make sure we are not trrading one set of issues for another set that is the same or even worse...)

@nbz: regarding schema change. We need to think about that, but I would prefer to do it after we have a 1.x release, either 6.x-2.x (not the current one, obviously..) or even 7.x-1.x...

I prefer waiting for 7.x-1.x-dev too, and if we use fields api, there will necessarily be one there. However if we do decide to change how things are stores, it may aswell not be moving from one sub optimal approach to another suboptimal approach.

litwol’s picture

I am big +1 on doing schema changes after 7.x, but none the less discussion on it need to happen early in the game. Seems that we are sticking to what we have right now until we are able to find a concrete solution to our problems, i am fine with that.

NaheemSays’s picture

Just to add here, I don't think it is a god idea to remove the author from the index - that could cause threading problems.

What I think we need is to remove duplicate data insertion and also have an update function that removes existing duplicates - so one row per participant per message.

Berdir’s picture

I agree. I did a short test some time ago and it turned out to be quite a hassle.

Removing the duplicates could really help though, we could then add a UNIQUE (or even primary key) on (uid, mid) to be able to access each entry explicitly. It might be a bit of a hassle for PostgreSQL, since that doesn't support DELETE ... LIMIT statements but I'm sure we can find a way.

PS: @nbz: If you have time, can you join #drupal-games?

Berdir’s picture

I did some more tests with this, short version: This is more complicated than it looks ;)

The problem is that we can't figure out if the message has been sent to the current user explicitly. It still works, but the message won't show up in the inbox, for example.

YK85’s picture

subscribing

Berdir’s picture

Title: remove author uid from pm_index » remove duplicate records from pm_index

Setting correct title...

crea’s picture

I don't see a problem here. Actually having author in the pm_index.uid column helps to find all user's threads i.e. threads where he participates (otherwise you need to make extra joins). It's confusing when you think about it as "recipient user" but if you change it to "participant" it makes perfect sense.
Privatemsg Views already uses this column and this change will totally break it.
I suggest to "won't fix" this.

Berdir’s picture

See #7 and #8. We agreed already that it's too complicated to remove the author from pm_index. However, we want to remove duplicates which can for example be created if you explicitly send a message to yourself. Removing these will allow us to add a primary key on mid, uid.

Berdir’s picture

Title: remove duplicate records from pm_index and add primary key » remove duplicate records from pm_index
Version: » 6.x-1.x-dev
Status: Needs review » Needs work
FileSize
5.09 KB

Attaching a first version of the patch.

The actual changes are very simply, we just need to conditionally create an entry for the author only when necessary.

The tricky part is the upgrade function. First I tried some fancy sub-query stuff to be able to delete all entries minus one for PostgreSQL too (It's easy for MySQL). But that doesn't work well when there are some entries that are unread and some are. So the current patch deletes all entries and then re-insert a single one, if existing, unread and undeleted.

I tested this on the data from #717876: PM_Index ridiculous size. and the upgrade function managed to get the entries from 2,128,682 down to 907,464, mostly because of bugs in older versions of privatemsg.module.

However, testing of other data is very important for this issue, we *are* deleting data here (Probably unecessary data now, but still). Also, the update function tries to avoid running before privatemsg_filter_update_6003 if privatemsg_filter is enabled to tag messages sent to themself first. Not sure if that works correctly, so please check in which order the updates are run. If my code works, privatemsg_update_6008 should display message that you need to re-run update.php if privatemsg_filter_update_6003 has not been run yet.

Something other to test is whats better: PK on uid, mid or PK on uid, mid, thread_id. uid, mid is unique, but we often query on thread_id. The order might be important too.

Berdir’s picture

Version: 6.x-1.x-dev »
Status: Needs work » Needs review
Berdir’s picture

Title: remove duplicate records from pm_index » remove duplicate records from pm_index and add primary key
Berdir’s picture

Title: remove duplicate records from pm_index » remove duplicate records from pm_index and add primary key
Version: 6.x-1.x-dev »
Status: Needs work » Needs review

#14: privatemsg_remove_duplicates.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, privatemsg_remove_duplicates.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
5.26 KB

Re-roll.

I'd *love* to see some testing of the upgrade path.

BenK’s picture

I tried the patch in #19, but I couldn't get it to apply.... But since I'm not using the most up-to-date version of Private Message on my D6 test site, the problem could be on my end. I'll try again on a clean D6 install... just may need to manually make a bunch of test messages to test this properly.

--Ben

Berdir’s picture

#19: duplicate_recipient3.patch queued for re-testing.

Edit: Still passes the tests, so this should apply.

BenK’s picture

Status: Needs review » Reviewed & tested by the community

Hey Berdir,

On a fresh D6 install (with the latest 6.x-2.x-dev version of Private Message) the patch applied fine and everything worked great.

After applying the patch, in addition to all of the successful patching messages, I did get a command line message that said "You have mail in /var/mail/benkaplan". I'm not sure why that would suddenly pop up after applying a patch, but maybe that's something else happening on my laptop.

I successfully ran update.php and the database update worked fine, with no error messages.

Before running the patch I created about 25-30 test messages between 4-5 users and after the patch all of those messages still worked as expected (with them properly labeled "new" or not depending if I had already looked at them). And then I sent another 10-15 messages after the patch and all of those were properly labeled "new" too. Reading the messages removed the "new" label so everything seems to be fine in that regard.

So as far as I can tell everything is working just as expected after applying the patch. I think this is ready to be committed to D6 (and then ported to D7).

--Ben

Berdir’s picture

Version: » 7.x-1.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Awesome, commited to 6.x-2.x. I guess if there are issues with big amounts of private messages, we'll get an issue about it soon. Will port this soon.

Berdir’s picture

Status: Patch (to be ported) » Needs review

Re-roll, will commit this once the tests passed.

Berdir’s picture

Not only is the patch missing, the comment is actually the wrong way round. Will need to fix this in 6.x-2.x too ;)

Berdir’s picture

Erm, no, the comment is correct.

Anyway, the code is the same.

Status: Needs review » Needs work

The last submitted patch, duplicate_recipient3_d7.patch, failed testing.

BenK’s picture

I'll test this manually as soon as the automated tests pass....

Berdir’s picture

Status: Needs work » Needs review
FileSize
2.16 KB

Ups, the code was actually the wrong way round too ;)

This should pass.

Status: Needs review » Needs work

The last submitted patch, duplicate_recipient3_d7-3.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
2.17 KB

Well, but now...

BenK’s picture

Status: Needs review » Needs work

Hey Berdir,

I did some testing of the new patch in #31 and most things worked as expected, but I noticed one bug: If you write a new message and then send it, the message is showing up as "new" in both your "Sent Messages" and "All Messages" pages.

This is problematic because it also means that in the "Messages" link in the user menu (or anywhere else there is a new message notification), it is displaying the message you just sent as a new message ("Messages (1 new)") even though it was the message that you sent yourself. Also, confusing to the user because when they go to their inbox, nothing is new (even though they have been notified that there is a new message).

I've confirmed that the problem exists both for sending to users and sending to roles.

I've also confirmed that this bug didn't exist until I applied the patch in #31. I'm not sure if we missed this in the D6 version of the patch or if this is a bug that is exclusive to the D7 version.

--Ben

Berdir’s picture

Status: Needs work » Needs review
FileSize
2.17 KB

Nice catch.

No, this bug was introduced when porting. That entry obviously needs to be added as not new.

I must have been smoking something really bad when I ported the patch. So many tries for something so simple....

BenK’s picture

Status: Needs review » Reviewed & tested by the community

The patch in #33 works great. The new message bug on sent messages has been solved. So this is ready to be committed!

Thanks,
Ben

Berdir’s picture

Status: Reviewed & tested by the community » Fixed

Thanks for testing, commited.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

bdlangton’s picture

Version: 7.x-1.x-dev » 6.x-1.x-dev
FileSize
3.73 KB

We need this patch on 6.x-1.x, so here it is.

pdrake’s picture

As a followup to #37, this patch just increases the number of threads from 20 to 20,000 because when executing updatedb batches using drush, it rebuilds the module cache twice for each iteration. For very large numbers of threads (1M) this causes the update hook to take... forever. With the change to 20,000 the update hook takes only a few minutes.

fuzzy76’s picture

Is this actually commited to 6.x as well, or does the port need testing?