Support from Acquia helps fund testing for Drupal Acquia logo

Comments

tvn’s picture

Project: Drupal.org site moderators » Project issue tracking
Version: » 7.x-2.x-dev
Component: Redesign » Issues
Priority: Major » Critical
Issue tags: +Drupal.org 7.1

Thanks for the report, I also noticed this on a couple of issues. I think this might be related to what we did to fix #2119133: Body field populated by past revision's content by migration.

jonathan1055’s picture

Here are two more examples. Both of these were closed in the D6 version.

#1854388: Scheduler settings fields empty on content type edit - now shows as 'active' due to comment #7 being dated last, even though we have #8 and #9

#270219: Nodes are not being published. - now shows as 'fixed' due to comment #9 being dated later, even though we have comment #10 closing the issue.

It seems to be taking the status of the latest-dated comment, even though that comment was not the last added. Maybe due to the old comments being able to be editted, thus making them out of sequence date-wise?

Jonathan

jonathan1055’s picture

Title: Issue RTBC when last comment set to needs work » Issue status not taken from last comment

Changed title to be more generic, and therefore more easily searchable.

webchick’s picture

Found another example at #1285304: Get tags from a pagerDefault instance. Was very surprised to see the oldest RTBC issue in the Drupal core queue from 1+ years ago. :D

jonathan1055’s picture

Yes, in #1285304: Get tags from a pagerDefault instance it looks like the status has been taken from comment #2 which was RTBC. That had been editted and had a timestamp of 6:12. Comments #3-5 were later than the original addition of #2, but they had timestamps of 6:03 and 6:04. I think the issue status needs to be taken from the highest comment number, not the latest comment timestamp.

Jonathan

alexpott’s picture

Looks like #1005004: Editing a comment destroys its creation date was never fixed in D6 :)

jonathan1055’s picture

Also, for reference, in #122098: Split comment.timestamp into 'created' and 'updated' columns for D7, we have two comment timestamp columns.

tvn’s picture

jonathan1055’s picture

Status: Active » Needs review

Four people have confirmed this behaviour, therefore setting to 'needs review' as per project guidelines.

drumm’s picture

Status: Needs review » Active

Since this has been triaged out to a contrib project, we follow the usual customs for issue status. Sorry for any confusion.

dww’s picture

Honestly, I'm not sure what can be done about this now. This was a bug in the data migration code. The migration already happened, and no one noticed during the weeks of QA, nor in the middle of the night when the d.o upgrade was happening and we had a chance to abort and restore the D6 site. Now that the D7 version has been live for multiple days, there's no way to go back and re-migrate the issues. I hate to say it, but I think "can't fix" would be the most appropriate status here. :/

I suppose we could spend many hours trying to write some code to try to go back and re-process every issue, attempt to find the correct status, and update everything that's not matching the expected values. But that seems like a huge use of resources for something potentially very risky and error prone that might be worse than the disease. If we were to go this route, we'd probably need to schedule more d.o downtime, too.

I'm curious what drumm thinks about this. But sadly, I believe there's basically nothing we can do to solve this now...

webchick’s picture

Hm. I'm not sure why it would matter how long the dataset has been incorrect. Isn't it basically just:

foreach ($issue) {
  $last_comment = comment_load(db_result("SELECT max(cid) FROM comment WHERE nid = $issue->nid"));
  if ($issue->status != $last_comment->status) {
    $issue->status = $last_comment->status;
    node_save($issue);
  }
}

Totally handwaving pseudocode, obviously. ;)

I guess it's probably true that there would be some downtime associated with fixing this, though probably more in the minutes than the hours range. But OTOH, between this bug and the one where issue queues aren't showing the right statuses (can't find it atm), it's getting really hard to trust what we're seeing atm. :\

drumm’s picture

My theory is that the field cache is latching on to stale data from the slave DB server.

drumm’s picture

Whoops, wrong issue for #13.

jonathan1055’s picture

I've just been looking through the project_issue.migrate.inc file. I can see on line 685 where it is iterating over all issue data to create the sequence of changes the sort is by nid and sort_timestamp, which for comments was taken from the 'created' timestamp. But I guess that because on D6 there was only one timestamp field in the comments table, this must have got updated when a comment was edited. It looked like the code was doing the correct thing, but because the D6 to D7 migration of the comment table had to replicate the one field into both columns you were working with a 'created' field which actually held the 'changed' timestamp data. Bad luck ;-)

Are the migration data tables still available for you to use, for example project_issue_migration_original_issue_data and project_issue_migration_timeline_init, etc? If so, we could write the kind of code that webchick suggests and at least find out how big the problem is, before actually fixing the real data. If the results were made available in this thread, we could check a few projects to see if we agree with the changes.

Jonathan

guy_schneerson’s picture

Got the same issue with commerce_stock can just update the state but will wait to see how this issue resolves.

jonathan1055: sounds like a good idea to find out the scope of the issue first.

MustangGB’s picture

+1 (sorry, follow button has disappeared)

I do have something useful to contribute though.

In #434562: Node revision does not get created when BODY field is omitted or left blank comment #4 has a timestamp after #5 instead of before, so could this be the cause, or is this a separate issue.

jonathan1055’s picture

Thanks. Yes that is exactly the cause - you have found one of the affected issues - it follows exactly the pattern I described in #2 and #5. I presume the status of that issue was 'Closed(duplicate)' before the upgrade.

May I suggest you do not alter it, so that we can use it alongside the other examples for testing potential fixes.

TR’s picture

Here's another: a very old core issue, closed almost 5 years ago, re-opened by the upgrade: #357304: Cron seems not to be working despite it says so...

There are also other re-opened issues detailed in #2126789: Issue marked Closed (fixed) > three years ago is now showing up as Active.

I think it's fair to say there are hundreds of these problem issues cluttering up the queues, so I don't think it's something that should be ignored and marked as won't fix. The fix doesn't have to go back and correct all the history that was improperly changed by the upgrade, but at a minimum it should:
1) Search the queue and identify all issues with the wrong status.
2) Reset those statuses to their correct values.

jonathan1055’s picture

Hi TR, thanks for pointing to that old one - yes, exactly like the pattern we've seen, the comment timestamp in #2 is later than the final comment #4 causing the wrong ultimate status.

I agree that we should not let this issue go unfixed. From the comments in #12 we have an idea of how to tackle this, but the data is not available to ordinary users, so I can't do anything about it. If the migration tables I mentioned in #15 were somehow made available somewhere, or zipped and added to this thread for download, then we could atleast attempt to get an idea of the scope of the problem. I guess the tables are quite large but I'm sure we could get round that problem.

Jonathan

jonathan1055’s picture

Title: Issue status not taken from last comment » Issue status not taken from last comment during D7 migration
Related issues: -#2125729: Old already closed issues are closed again (and notifications are sent), -#2126789: Issue marked Closed (fixed) > three years ago is now showing up as Active

Clarified title. Removed two related issues, as those already refer to this one. Do not need to see them repeated in the sidebar, just slows down comprehension.

jonathan1055’s picture

Now that some of the other issues have been resolved, is anyone able to provide the issue tables (or a sample of) that were used during the migration. We do not even know the scope of the problem yet.

Jonathan

TR’s picture

I keep running into this when trying to manage my issue queues and when trolling the core issue queue for problems to fix - there are a LOT of issues that had their status improperly changed from closed to open during the migration. I found three new ones just today.

I'd be willing to write a script to identify the broken issues and correct the problem (i.e. reset the issue status on those issues affected by the migration). But I would need a portion of the database like jonathan1055 requested in #20 and #22 in order to develop code to identify the problem issues. Once the issues are identified, it should be easy to reset the status.

drumm’s picture

See https://drupal.org/node/1018084 for access to dev, which is production-like data. We do trim out the tables constructed during the upgrade, so it may not have what's needed here. It does have the D6 tables, like project_issues, which may be better anyway.

Since this is just data collection initially, you don't need a whole dev site. The most recently build dev site is always the best one for read-only queries.

mgifford’s picture

would it be good to have these listed? I just ran into one here #874370: System messages need identifying icons (WCAG 2.0) that should be fixed. I updated this manually, but it really helps erode confidence.

I don't know how to fix this but it should be made a higher priority I think. It's hard to manage your queues when stuff is supposed to be fixed, but isn't.

xjm’s picture

#1000784: Another Fatal error: Out of memory is an issue with that behavior, pointed out by @helmo in #2130059: Issue status change not consistently saved in node revisions. It hasn't been updated yet so probably a good candidate to investigate.

xjm’s picture

This is very interesting.

mysql> mysql> select from_unixtime(n.changed), s.field_issue_status_value 
    -> from field_data_field_issue_status s
    -> inner join node n on n.nid = s.entity_id
    -> where n.nid='1000784';
+--------------------------+--------------------------+
| from_unixtime(n.changed) | field_issue_status_value |
+--------------------------+--------------------------+
| 2011-01-09 15:44:20      |                        1 | 
+--------------------------+--------------------------+
1 row in set (0.00 sec)
mysql> select nr.vid, from_unixtime(nr.timestamp), sr.field_issue_status_value
    -> from field_revision_field_issue_status sr
    -> inner join node_revision nr on nr.nid = sr.entity_id and nr.vid = sr.revision_id
    -> where nr.nid='1000784'
    -> order by nr.vid desc;
+---------+-----------------------------+--------------------------+
| vid     | from_unixtime(nr.timestamp) | field_issue_status_value |
+---------+-----------------------------+--------------------------+
| 4952701 | 2011-01-09 15:44:19         |                        1 | 
| 4952699 | 2011-01-01 01:50:02         |                        7 | 
| 4952697 | 2010-12-18 01:43:52         |                        2 | 
| 4952695 | 2010-12-18 01:42:19         |                        1 | 
| 4952693 | 2010-12-18 01:37:26         |                        2 | 
| 4952691 | 2010-12-17 04:47:05         |                        1 | 
| 4952689 | 2010-12-17 03:39:50         |                       16 | 
| 1274612 | 2011-01-09 15:44:20         |                        1 | 
+---------+-----------------------------+--------------------------+
8 rows in set (0.23 sec)
mysql> select cid, from_unixtime(created), from_unixtime(changed) from comment where nid='1000784' order by cid desc;
+---------+------------------------+------------------------+
| cid     | from_unixtime(created) | from_unixtime(changed) |
+---------+------------------------+------------------------+
| 3886846 | 2011-01-01 01:50:02    | 2011-01-01 01:50:02    | 
| 3842262 | 2010-12-18 01:43:52    | 2010-12-18 01:43:52    | 
| 3842256 | 2010-12-18 01:42:19    | 2010-12-18 01:42:19    | 
| 3842254 | 2010-12-18 01:37:26    | 2010-12-18 01:37:26    | 
| 3842198 | 2010-12-18 01:15:34    | 2010-12-18 01:15:34    | 
| 3841772 | 2010-12-17 22:53:01    | 2010-12-17 22:53:01    | 
| 3841726 | 2011-01-09 15:44:19    | 2011-01-09 15:44:19    | 
| 3841618 | 2010-12-17 22:13:12    | 2010-12-17 22:13:12    | 
| 3841596 | 2010-12-17 22:08:27    | 2010-12-17 22:08:27    | 
| 3841542 | 2010-12-17 21:55:15    | 2010-12-17 21:55:15    | 
| 3841516 | 2010-12-17 21:53:45    | 2010-12-17 21:53:45    | 
| 3841316 | 2010-12-17 20:59:05    | 2010-12-17 20:59:05    | 
| 3841236 | 2010-12-17 20:37:01    | 2010-12-17 20:37:01    | 
| 3841218 | 2010-12-17 20:31:40    | 2010-12-17 20:31:40    | 
| 3838576 | 2010-12-17 04:51:36    | 2010-12-17 04:51:36    | 
| 3838566 | 2010-12-17 04:47:05    | 2010-12-17 04:47:05    | 
| 3838546 | 2010-12-17 04:35:56    | 2010-12-17 04:35:56    | 
| 3838542 | 2010-12-17 04:32:20    | 2010-12-17 04:32:20    | 
| 3838538 | 2010-12-17 04:30:30    | 2010-12-17 04:30:30    | 
| 3838520 | 2010-12-17 04:25:14    | 2010-12-17 04:25:14    | 
| 3838500 | 2010-12-17 04:14:44    | 2010-12-17 04:14:44    | 
| 3838440 | 2010-12-17 03:39:50    | 2010-12-17 03:39:50    | 
+---------+------------------------+------------------------+
22 rows in set (0.00 sec)

So somehow, mysteriously, the first and last revisions for this issue are mysteriously identical with the same date of 2011-01-09. Edit: which somehow is in the middle of the comment thread.

xjm’s picture

Issue summary: View changes

Adding a list of all the affected issues I've found to the summary, so that the different variations of this bug can be investigated.

xjm’s picture

(moving this up into the other comment so the queries for the same issue are together)

jonathan1055’s picture

Hi xjm
In #27 why is it so mysterious? In the issue #1000784: Another Fatal error: Out of memory comment 16 is on 9th Jan 2011 (ie editted on that date), which caused the status of the issue to be taken from #16 not #22. Isn't that exactly the same behaviour we have been seeing on all the other issues listed here?

Jonathan
[edit I wrote this comment before the table of data above in #29 was posted]

xjm’s picture

@jonathan1055 yes, I just noticed that as well and edited my comment. But it is mysterious because I have no clue how that happened to any of these issues. :) I'm just trying to sort out how to get them all from a query so that I can work around this bug in core issue metrics.

jonathan1055’s picture

But it is mysterious because I have no clue how that happened to any of these issues

What do you mean, exactly? We know precisely how it happens, as explained in #2 and followed up in #15.

Jonathan

xjm’s picture

For #1854388: Scheduler settings fields empty on content type edit:

mysql> select from_unixtime(n.changed), s.field_issue_status_value
    -> from field_data_field_issue_status s
    -> inner join node n on n.nid = s.entity_id
    -> where n.nid='1854388';
+--------------------------+--------------------------+
| from_unixtime(n.changed) | field_issue_status_value |
+--------------------------+--------------------------+
| 2012-12-08 11:38:58      |                        1 | 
+--------------------------+--------------------------+
1 row in set (0.96 sec)

mysql> select nr.vid, from_unixtime(nr.timestamp), sr.field_issue_status_value
    -> from field_revision_field_issue_status sr
    -> inner join node_revision nr on nr.nid = sr.entity_id and nr.vid = sr.revision_id
    -> where nr.nid='1854388'
    -> order by nr.vid desc;
+---------+-----------------------------+--------------------------+
| vid     | from_unixtime(nr.timestamp) | field_issue_status_value |
+---------+-----------------------------+--------------------------+
| 6211773 | 2012-12-08 11:38:58         |                        1 | 
| 6211771 | 2012-12-08 11:36:54         |                        3 | 
| 6211769 | 2012-12-06 15:45:38         |                        1 | 
| 6211767 | 2012-12-05 21:42:52         |                        1 | 
| 2465652 | 2012-12-08 11:38:58         |                        1 | 
+---------+-----------------------------+--------------------------+

mysql> select cid, from_unixtime(created), from_unixtime(changed) from comment where nid='1854388' order by cid desc;
+---------+------------------------+------------------------+
| cid     | from_unixtime(created) | from_unixtime(changed) |
+---------+------------------------+------------------------+
| 6823902 | 2012-12-08 11:36:54    | 2012-12-08 11:36:54    | 
| 6820854 | 2012-12-07 12:26:08    | 2012-12-07 12:26:08    | 
| 6820256 | 2012-12-08 11:38:58    | 2012-12-08 11:38:58    | 
| 6817550 | 2012-12-06 15:45:38    | 2012-12-06 15:45:38    | 
| 6817434 | 2012-12-06 15:16:32    | 2012-12-06 15:16:32    | 
| 6814876 | 2012-12-05 22:01:51    | 2012-12-05 22:01:51    | 
| 6814708 | 2012-12-05 21:42:52    | 2012-12-05 21:42:52    | 
| 6809788 | 2012-12-04 18:16:35    | 2012-12-04 18:16:35    | 
| 6801404 | 2012-12-02 15:04:16    | 2012-12-02 15:04:16    | 
+---------+------------------------+------------------------+
9 rows in set (1.77 sec)

So this one has the same pattern: A comment in the middle of the list when ordered by ID has a newer date than the final one, and so that comment's status ended up on the node and the final node revision. Let's compare to a "normal" (uncrazified) node, #440040: file_check_directory chmod unsets sticky bit.:

mysql> select from_unixtime(n.changed), s.field_issue_status_value
    -> from field_data_field_issue_status s
    -> inner join node n on n.nid = s.entity_id
    -> where n.nid='440040';
+--------------------------+--------------------------+
| from_unixtime(n.changed) | field_issue_status_value |
+--------------------------+--------------------------+
| 2011-10-12 06:20:24      |                        7 | 
+--------------------------+--------------------------+
1 row in set (0.11 sec)

mysql> select nr.vid, from_unixtime(nr.timestamp), sr.field_issue_status_value
    -> from field_revision_field_issue_status sr
    -> inner join node_revision nr on nr.nid = sr.entity_id and nr.vid = sr.revision_id
    -> where nr.nid='440040'
    -> order by nr.vid desc;
+---------+-----------------------------+--------------------------+
| vid     | from_unixtime(nr.timestamp) | field_issue_status_value |
+---------+-----------------------------+--------------------------+
| 3985877 | 2011-10-12 06:20:24         |                        7 | 
| 3985875 | 2011-06-26 02:40:05         |                       13 | 
| 3985873 | 2011-06-15 14:11:11         |                       16 | 
|  524410 | 2011-10-12 06:20:24         |                        1 | 
+---------+-----------------------------+--------------------------+
4 rows in set (0.04 sec)

mysql> select cid, from_unixtime(created), from_unixtime(changed) from comment where nid='440040' order by cid desc;
+---------+------------------------+------------------------+
| cid     | from_unixtime(created) | from_unixtime(changed) |
+---------+------------------------+------------------------+
| 5106144 | 2011-10-12 06:20:24    | 2011-10-12 06:20:24    | 
| 4655362 | 2011-06-26 02:40:05    | 2011-06-26 02:40:05    | 
| 4614278 | 2011-06-17 05:50:00    | 2011-06-17 05:50:00    | 
| 4605360 | 2011-06-15 14:11:11    | 2011-06-15 14:11:11    | 
| 1710670 | 2009-06-17 06:43:36    | 2009-06-17 06:43:36    | 
| 1637570 | 2009-05-28 09:28:53    | 2009-05-28 09:28:53    | 
| 1500498 | 2009-04-21 07:51:26    | 2009-04-21 07:51:26    | 
+---------+------------------------+------------------------+
7 rows in set (0.44 sec)

So the first and last revisions having the same date isn't an indicator -- only the mysterious comment ordering bit.

xjm’s picture

What do you mean, exactly? We know precisely how it happens, as explained in #2 and followed up in #15.

Jonathan

You are reading way too much into my use of the word mysterious. ;) It is clearly caused by a bizarre bug in the D6 to D7 migration. But really, that's not the point... I'm just working to find a way to identify all of the affected issues with a query so that I can work around the bug.

xjm’s picture

Oh, also, it'd be great to add your findings from #15 to the issue summary. Totally missed them.

xjm’s picture

So this (expensive) query will find all the nodes affected by this bug:

select max_cid.nid, from_unixtime(max_cid.created) as max_cid_created, from_unixtime(max_timestamp.created) as max_timestamp_created
from
(
select c.nid, c.cid, c.created
from comment c
inner join node n on n.nid = c.nid
left join comment c2 on (c.nid = c2.nid and c.created < c2.created)
where n.type = 'project_issue'
and c2.nid is null
) max_timestamp
inner join
(
select c.nid, c.cid, c.created
from comment c
inner join node n on n.nid = c.nid
left join comment c2 on (c.nid = c2.nid and c.cid < c2.cid)
where n.type = 'project_issue'
and c2.nid is null
) max_cid on max_cid.nid = max_timestamp.nid
where max_cid.created <> max_timestamp.created
limit 10; -- sanity

First result is:
#4354: Select from various users for assigning issues
which shows the bug.

Gotcha.

Now to just find the subset of those where it affects the node status. (E.g., if the issue has only ever been active, it doesn't matter which comment assigned the node's current revision and therefore its status.)

xjm’s picture

Query in #36 misses issues that were updated since the upgrade. This abomination of a query finds all potentially affected nodes. (Edit: It could probably be made less horrible but getting a one-time query down from 20 minutes to 10 or whatever is not worth half an hour of my time).

select broken_nodes.nid from (
  select max_cid.nid, max_cid.created as last_c_time, max_timestamp.created as last_time from (
    select c.nid, c.created
    from (select nid, cid, created from comment where from_unixtime(changed) < '2013-11-01 00:00:00') c
    left join  (select nid, cid, created from comment where from_unixtime(changed) < '2013-11-01 00:00:00')  c2 on c.nid = c2.nid and c.cid < c2.cid
    where c2.nid is null
  ) max_timestamp
  inner join (
    select c.nid, c.created
    from  (select nid, cid, created from comment where from_unixtime(changed) < '2013-11-01 00:00:00')  c
    left join  (select nid, cid, created from comment where from_unixtime(changed) < '2013-11-01 00:00:00')  c2 on c.nid = c2.nid and c.created < c2.created
    where c2.nid is null
  ) max_cid on max_cid.nid = max_timestamp.nid
  inner join node n on n.nid = max_cid.nid and n.type = 'project_issue'
  where max_cid.created <> max_timestamp.created
) broken_nodes
inner join node_revision last_c_nr on last_c_nr.timestamp = broken_nodes.last_c_time
inner join node_revision last_time_nr on last_time_nr.timestamp = broken_nodes.last_time
inner join field_revision_field_issue_status last_c_s on last_c_s.entity_id = last_c_nr.nid and last_c_s.revision_id = last_c_nr.vid
inner join field_revision_field_issue_status last_time_s on last_time_s.entity_id = last_time_nr.nid and last_time_s.revision_id = last_time_nr.vid
where last_time_s.field_issue_status_value <> last_c_s.field_issue_status_value
group by broken_nodes.nid
order by broken_nodes.nid desc
limit 1000
;

I currently have that monstrosity running on staging and I'll dump the results to CSV if it doesn't have to be killed. For the record, though, out of over 16,000 nodes filed against core's 8.x branch, it only appeared 20 or so were affected, unless I made a mistake somewhere. So I don't think this bug is worth taking d.o down to fix, especially since it only affects past data.

xjm’s picture

FileSize
7.35 KB

So more than 1000 (my query limit) issue nodes across all projects are potentially affected -- but the annoying thing is I couldn't figure out a way to actually say for sure if they were or not; I had to look at them in a browser. Sometimes the comment timestamp thing doesn't seem to cause the issue; sometimes it doesn't matter; sometimes it's been fixed since. (Of the core 8.x nodes I looked at before, maybe half of the issues matched by the query had actually been affected by the bug.)

But here are the results of the query above, the 1000 most recent issue nodes that meet those criteria. And with that I've had quite enough of this issue. :)

(Oh, in the process of trying to upload the query results, I noticed that if you try to upload a non-allowed extension, it just fails silently. Do we have an issue for that yet?) :P

jonathan1055’s picture

Issue summary: View changes

Thanks for all this, it is very useful to see your results. Several of the issues in the summary are in your list, but I've not checked them all.

... but the annoying thing is I couldn't figure out a way to actually say for sure if they were [affected] or not; I had to look at them in a browser

I thought I would try to tackle this not from the starting point (where comments have been edited) but from the end point (where issues have the wrong status), to give a different angle on the problem. I've had a look at the project_issue migration code but not studied it in detail enough to understand exactly what it does, but I think it creates node revision rows from the comments then independently sets the final node status from the latest datestamped comment. If this is the case, then we have a simpler way to detect at least some of the incorrect issues, without having to read the actual comments and times.

-- Join node, field_data_field_issue_status and field_revision_field_issue_status
-- to get status according to last revision, and return where this differs from
-- the current node status. This will only detect problem issues where there was
-- a status change on the final comment in D6, but this will be the case for many
-- so at least it will prove the concept. If it works, then the query can be 
-- expanded to extract the last occurrence of 'field_issue_status_value' even if
-- this was not changed on the current node vid.

select n.title
  ,n.nid
  ,n.vid
  ,fd.field_issue_status_value as current_status
  ,fr.field_issue_status_value as revision_status
from node n
left join field_data_field_issue_status fd
  on fd.entity_id = n.nid
left join field_revision_field_issue_status fr
  on fr.entity_id = n.nid
  and fr.revision_id = n.vid
where n.type = 'project_issue'
  and fd.field_issue_status_value <> fr.field_issue_status_value
order by nid, vid
limit 1000;

I do not have access to the Drupal tables as you do, but I wrote this using my own local dev site, with mocked-up incorrect issues. Do you think you could take a look and see if you could run it. I know that even if my assumptions are correct it will only pick up the bad issues where there was a status change on the final D6 comment, but that should include many of the issues, as lots should really be 'closed(fixed)' which is likely to be the final comment. If this sql does find some issues, then it proves the concept, and we can expand it to find those where the latest chronological status change is not in the last node_revision row.

Jonathan
[I also edited the summary to put the issues in node order and to remove two or three duplicates. I also split them into nodes which still have the problem and nodes which have since been fixed]

drumm’s picture

Assigned: Unassigned » drumm

The query I'm working with is:

SELECT n.nid, t.cur_sid, pi.sid FROM project_issue_migration_timeline t INNER JOIN node n ON n.nid = t.nid AND n.vid = t.vid INNER JOIN project_issues pi ON pi.nid = t.nid AND pi.sid <> t.cur_sid INNER JOIN field_data_field_issue_status fis ON fis.entity_id = t.nid AND fis.field_issue_status_value = t.cur_sid;

The project_issue_migration_timeline table was populated as part of the migration. Joining on vid limits to nodes which are still at the same revision. I don't really want to attempt updating nodes that have been updated by people in the meantime. Unfortunately, this table is huge and not generally useful in staging, so it it production only.

The join on field_data_field_issue_status double checks that the status from migration is the issues current status. We don't want to change the issue status if something else, like the issue auto-closer, has changed it before.

project_issues has the final issue statuses before migration, and should be the correct issue values.

drumm’s picture

FileSize
9.54 KB

Attached are the nodes my query would update, their current status, and the correct status.

drumm’s picture

The 75 core issues this will update are:

#62306: no php closing tag in some inc and module files in 4.7, #70722: Search results should respect the content type's "Display author and date information." option, #83738: LOWER(name) queries perform badly; add column name_lower and index., #93030: Who's online block reporting wrong, #107061: Add jQuery Teaser Splitter, #139290: When a user has no blog posts, a simple message should be present, #161354: Disabling a module which provides a text format leaves the text format behind, #172398: Improve security notice in installer, #185302: Take care of X-Forwarded-For header only if Remote-Addr is known, in ip_address(), #200586: file_check_location and non-existing directories, #227273: Use cache headers also for private downloads, #251868: Menu doesn't play well with taxonomy/term/%/all pages, #253577: Allow new comment timestamp to be passed, #280310: Force {users}.mail to lowercase, #300900: disable register_globals?, #301926: taxonomy_get_vocabularies and taxonomy_get_children overload my server, #302237: Longer string creates linebreak > text overlap on admin/user/user, #305653: Themes disabled during update, #332481: drupal_add_css doesn't seem to allow .php files parsed as .css files with variables, #357304: Cron seems not to be working despite it says so..., #362178: Add support for Firebird/Interbase, #362361: Drupal changing my form in block even I've got Full HTML, #394694: "content has been modified" message confusing, #425676: Unable to edit user account with PHP 5.3, #440344: Tests needed for forum blocks, #468534: Add a region at the top of the page above the header region., #477176: Invalid argument supplied for foreach() ... taxonomy.module on line 1214, #540340: Unable to change the field titles for Password fields on the user_profile_form., #580970: drupalsite.com/index.php should redirect drupalsite.com, #582414: English and native name for the Kyrgyz language has to be changed from 'Kirghiz(Кыргыз)' to 'Kyrgyz(Кыргыз тили)'., #588436: How can a user update a page that has been promoted to front page, #603210: Select different theme for anonymous user, #611480: _locale_import_one_string_db() should look at location for non-default textgroups, #658148: The "access user profiles" permission is poorly named, #679888: Upgrade to Mysql 5 breaks tables, #687588: Remove access check from submit() in UserCancelForm, #738132: form API doesn't properly update its cache, #752554: Content displays ok in editing mode and doesn't show in full node. How to change maximum node (not teaser) length limit?, #772328: Optionally disable using css_js_query_string, #835132: BreadCrumbs aren't showing in users profiles, only Admin name is --> remedy / solved, #838450: Migrated mysql db to localhost , #849862: Bartik code problems, #861136: Add ajaxSubmit reset and clear options, #899644: Explicit user account blocked check never called in user_pass_reset, #902736: cache_path fills up with invalid entries, #953722: Create/edit content type: Translation info missing in vertical tabs, #966530: Securing the default login in Drupal, #1008352: Upgrade D6.20 to D7 fails with PDO Exception, #1023748: D7 "Install From A URL" Error., #1036348: Collapsible Fieldgroups don't react on clicking anymore, #1117000: DatabaseSchemaObjectExistsException: Table field_data_field_my_field already exists, #1152132: Menu links of secondary menus moving to primary menu on basic page editing, #1269878: Missing closing " in .htaccess, #1319736: SimpleTest: When using drupalGet to send the cookie, it's sending Cookie header twice, so the $_COOKIE is not valid, #1404762: Undefined property: stdClass::$langcode all over the place, #1490336: Weight the system CSS at the top of the CSS_DEFAULT group instead of placing it in a separate aggregation group (CSS_SYSTEM), #1500302: Make the toolbar menu configurable per role/per user (D7), #1559678: Periodically lose default theme setting, #1574760: Add a drupalCreateComment() helper function, #1611280: SlideShow Help, #1622952: The right way to programatically render the preview of node object before the creation, #1690228: Where is the nodes of the site in the FTP?, #1814120: Comment put closed when status select closed.... , #1852026: Can I define the user id instead of having drupal automatically put it in during user create?, #1854672: remove seven_node_add_list() from core (update the markup in theme_node_add_list() instead), #1878982: Filter name 'default', #1920666: Allow additional external libraries with composer, #1935938: Drupal ahah implementation in form alter, #1977028: Remove RDF module from core, #2009672: [META] CodeSprint UA 2013, #2017497: Multilingual tour for content translation process, #2032773: Use Libricons (icon font) in Seven, consider using it more broadly in core, #2037549: Notice: Undefined index: display_field in file_field_widget_value() (line 587 of */modules/file/file.field.inc), #2042239: Fix DeleteQuery::execute() return value documentation and DeleteQuery_sqlite documentation (fix Delete::execute() documentation in Drupal 8), #2050531: Empty page in index.php/block/add

drumm’s picture

Status: Active » Fixed
FileSize
20.52 KB

I ran the attached script to correct the issue statuses.

webchick’s picture

Thanks!

Given that #2130059: Issue status change not consistently saved in node revisions is still an open issue though, I think we might continue to introduce new instances of this as time goes on. I hit a couple this morning where testbot didn't mark an issue back to needs work properly. (Unless that's different, not sure.)

drumm’s picture

TR’s picture

Thanks to everyone who worked on fixing this. It did fix all the re-opened issues I knew about, as well as a bunch that I hadn't found yet.

jonathan1055’s picture

Awsome work drumm, thanks very much. I had wanted to help, and got as far as creating my ssh key and getting it added to the dev server, but I could not connect and get any further, so could not test my queries.

Interesting to see the number fixed was 827.

drumm’s picture

jonathan1055 - Please confirm the key at https://drupal.org/user/92645/ssh-keys is good and (re?)open an issue in the infrastructure queue with the output of ssh -v jonathan1055@devwww.drupal.org. We do block IPs with multiple failed SSH attempts, so include your public IP address for potential unblocking. Or catch me in #drupal-infrastructure IRC, I'm in US Pacific time.

jonathan1055’s picture

Thanks for the help drumm, I will check and try again, then and follow up on the original #2163025: I would like my public key added to devwww rather than here.

Do you have text files for the drush log output your script created? It would be very informative for others who are following this issue to learn from your work, and to see how this type of problem can be fixed.

I noticed that your original node list posted on 31st Dec 01:19 [all times are UTC/GMT] had 836 nodes but the final script posted on 2nd Jan 19:43 only had 827. I compared the two lists and of the 9 nodes dropped from the second list:

However, I'm not saying that there is any more work to here, just recording the results of my checking. Once again thanks for all your effort on this, it is appreciated.

Jonathan

drumm’s picture

I didn't save the logs of these scripts.

Your list looks accurate. I do think #73019: Missing t()s in og_views was part of a test run.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.