Hi,

On the module page it says:

"IMPORTANT: For nodes that were published BEFORE the installation of this module, we can not know the exact date of publication, so $node->published_at will initially contain the creation date."

However after installing the module the published_at does not contain the creation date, it is empty.

Not sure if this is relevant but I'm also using the workflow module.

Thanks,
Bez

CommentFileSizeAuthor
#61 publication_date--2983348-61.patch13.1 KBesolitos
#61 interdiff.txt577 bytesesolitos
#60 interdiff-2983348-56-60.txt354 bytesLeon Kessler
#60 publication_date-fill_published_at-2983348-60-for-composer.patch13.52 KBLeon Kessler
#56 publication_date-fill_published_at-2983348-56-without-infoyml.patch13.1 KBeelkeblok
#55 interdiff_54_55.txt352 bytesdutchyoda
#55 2983348-55.patch13.41 KBdutchyoda
#54 interdiff-53-54.patch2.31 KBadamzimmermann
#54 2983348-54.patch13.41 KBadamzimmermann
#53 interdiff_46-53.txt5.85 KBmsielski
#53 2983348-53.patch13.4 KBmsielski
#46 2983348-46.patch11.35 KBpfrenssen
#46 interdiff.txt346 bytespfrenssen
#45 interdiff.txt1.2 KBpfrenssen
#45 2983348-45.patch11.01 KBpfrenssen
#44 interdiff.txt3.89 KBpfrenssen
#44 2983348-44.patch10.95 KBpfrenssen
#39 interdiff.txt348 bytespfrenssen
#39 2983348-39.patch10.87 KBpfrenssen
#38 interdiff.txt3.32 KBpfrenssen
#38 2983348-38.patch10.87 KBpfrenssen
#36 2983348-36.patch9.66 KBidimopoulos
#36 interdiff.txt673 bytesidimopoulos
#32 interdiff.txt769 bytespfrenssen
#32 2983348-32.patch9.66 KBpfrenssen
#29 interdiff.txt9.17 KBpfrenssen
#29 2983348-29.patch9.63 KBpfrenssen
#27 interdiff.txt4.65 KBpfrenssen
#27 2983348-27.patch9.56 KBpfrenssen
#19 2983348-19.patch6.63 KBpfrenssen
#19 interdiff.txt488 bytespfrenssen
#18 interdiff.txt1.3 KBpfrenssen
#18 2983348-18.patch6.63 KBpfrenssen
#17 interdiff.txt4.63 KBpfrenssen
#17 2983348-17.patch5.33 KBpfrenssen
#16 interdiff.txt3.04 KBpfrenssen
#16 2983348-16.patch3.49 KBpfrenssen
#13 interdiff_8-13.txt1.38 KBszato
#13 2983348-published-on-empty-for-existing-nodes-13.patch6.52 KBszato
#8 2983348-8.patch6.17 KBwebflo
#7 2983348-7.patch4.9 KBwebflo
#5 2983348-5.patch1.71 KBwebflo
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

bezlash@gmail.com created an issue. See original summary.

iuana’s picture

Assigned: Unassigned » iuana
bezlash@gmail.com’s picture

Hi,

Just wondering what the progress with this issue is or if I am misunderstanding something?

May thanks,
Bez

webflo’s picture

The D8 Version does not provide this feature atm, because its not easy to build it in a scalable which works on a site with many thousands nodes.

webflo’s picture

Status: Active » Needs review
FileSize
1.71 KB

This should work as long the node module is used with the default storage backend (SqlContentEntityStorage).

Status: Needs review » Needs work

The last submitted patch, 5: 2983348-5.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

webflo’s picture

Status: Needs work » Needs review
FileSize
4.9 KB
webflo’s picture

And a test.

esolitos’s picture

The changes to _publication_date_update_existing() seems fine, but I don't see that's the relevancy of the change in PublicationDateNodePermissions/PublicationDatePermissions .

iuana’s picture

Assigned: iuana » Unassigned
jhuhta’s picture

Status: Needs review » Reviewed & tested by the community

Tested #8 and it works as expected.

@esolitos, you're right, the permissions change seems unrelated and out of scope of this issue. However, the change itself looks reasonable: don't just assume Node module is installed and then fail if the assumption goes wrong.

Etroid’s picture

This is great, might I suggest adding this part of an update hook as well to ensure people don't need to uninstall/install the module for this to work?

szato’s picture

Version: 8.x-2.0-beta1 » 8.x-2.x-dev
Component: Miscellaneous » Code
Status: Reviewed & tested by the community » Needs review
FileSize
6.52 KB
1.38 KB

I agree with @Etroid.

I added an update hook (skipping nodes which have already set published_at) + some PHPCS fixes.

pfrenssen’s picture

Status: Needs review » Needs work

Instead of creating our own code to copy the database values we should leverage the initial_from_field feature from the Schema API that is designed for this exact use case. See this change record for more info: Change record #2755201: Added support for a 'initial_from_field' field schema specification key.

In the install hook we should only set the values of the unpublished entities to NULL.

It is unfortunate that this module is already in beta status even though it is not feature complete. This means we still need the current code from the install hook and move it to the post update hook since initial_from_field only runs on installation and not on update.

pfrenssen’s picture

There are some unrelated changes in the patch that deal with permissions. We should remove these. I think it is fine to do small fixes such as addressing spelling errors and coding standards violations, but these should be limited to the files that are being touched by the patch. We should avoid changing any unrelated files.

pfrenssen’s picture

Assigned: Unassigned » pfrenssen
FileSize
3.49 KB
3.04 KB

Removed unrelated changes from the patch.

pfrenssen’s picture

Assigned: pfrenssen » Unassigned
Status: Needs work » Needs review
FileSize
5.33 KB
4.63 KB

Something like this could do the trick.

pfrenssen’s picture

This patch includes a post update hook that mimics the working of the initial_from_field. I took the approach from the earlier install hook.

pfrenssen’s picture

Forgot a semicolon.

idimopoulos’s picture

+++ b/publication_date.install
@@ -5,32 +5,45 @@
+      \Drupal::database()
+        ->update($node_storage->getDataTable())
+        ->expression('published_at', PUBLICATION_DATE_DEFAULT)
+        ->condition('status', 0)
+        ->execute();
+      \Drupal::database()
+        ->update($node_storage->getRevisionDataTable())
+        ->expression('published_at', PUBLICATION_DATE_DEFAULT)
+        ->condition('status', 0)
+        ->execute();

According to this, in the data table, if an entity is unpublished, it will take the PUBLICATION_DATE_DEFAULT value. The same will happen for all revisions.

However, I think this will not work as intended. The published_at is supposed to hold the initial publication date. After an entity is initially published, this publication date will persist throughout the lifetime of the entity.

However, if we consider an entity with 5 revisions:

| revision | status | publication date |
| 1        | 0      | PUBLICATION_DATE_DEFAULT |
| 2        | 1      | <actual publication date> |
| 3        | 0      | <actual publication date> |
| 4        | 1      | <actual publication date> |
| 5        | 0      | <actual publication date> |

The latest revision, 5, is not published but it still has a publication date. Same applies for revision 3. However, with this code, both of them will get the default date.

EDIT and disclaimer: I saw that the previous update was doing exactly the same, but I still think it was also wrong. If we go like this, loading an unpublished revision can report that the entity was never published while that is wrong.

idimopoulos’s picture

+++ b/publication_date.post_update.php
@@ -0,0 +1,36 @@
+  _publication_date_set_default_publication_date_for_unpublished_nodes();

If this runs on an update, it will fail with an exception that the function is not found. A require_once for the publication_date.install file must be included here as well.

In general, the update path is based on the previous already existing code which was also wrong.
As I noted above, an entity with multiple versions (published and unpublished) has to pass the publication date to every version regardless of status, after the initial publication date.
The publication date itself can be derived from the first version of the entity that was published.

I could not think of a way to do this with a quick direct query (I am not a mastermind in SQL queries) thus I do not think that this can be done on the hook_install method (from what I read, hook_install cannot be batchified.

What I propose we do is that:

  • On publication_isnstall we set a flag to the state like PUBLICATION_DATE_INITIALIZED to FALSE.
  • We create the publication_date_init hook that checks for the state and sets a warning for the administrator that the published_at property is not initialized.
  • We create a route with a form that calls for field initialization.
  • We loop through all nodes individually (I don't think we can avoid this)
  • Loops of 300 are a good candidate. I did update them with the following function
    /**
     * Initialize the published_at column.
     *
     * This function will iterate through all content and will apply the default
     * value to all revisions until the first published one and then, it will apply
     * the created date to all other revisions. When an entity is initially
     * published, then every subsequent revision should cary the same value
     * regardless of whether it is published or not.
     *
     * In the below example, the entity is created as an unpublished and is
     * published in the second version. All other versions retain the value.
     * @codingStandardsIgnoreStart
     * | revision | status | publication date          |
     * | 1        | 0      | PUBLICATION_DATE_DEFAULT  |
     * | 2        | 1      | <actual publication date> |
     * | 3        | 0      | <actual publication date> |
     * | 4        | 1      | <actual publication date> |
     * | 5        | 0      | <actual publication date> |
     * @codingStandardsIgnoreEnd
     *
     * The above approach is applied to the revision table. For the data table, the
     * published_at value will have 2 cases. It will receive the default value if
     * the entity has never been published or the publication date if the entity has
     * been published at least once, regardless of the current state.
     *
     * @param array $nids
     *   A list of nids to process
     */
    function _publication_date_initialize_published_at($nids) {
      $database = \Drupal::database();
      $node_storage = \Drupal::entityTypeManager()->getStorage('node');
      
      $query = $database->select('node', 'n');
      $query->leftJoin('node_field_revision', 'nr', 'n.nid = nr.nid');
      $query->fields('nr', ['nid', 'vid', 'changed']);
      $query->condition('status', 1);
      $query->condition('n.nid', $nids, 'IN');
      $query->groupBy('nr.nid, nr.vid, nr.changed');
      $query->orderBy('nr.vid', 'DESC');
    
      $node_data = $query->execute()->fetchAllAssoc('nid');
      foreach ($nids as $nid) {
        if (empty($node_data[$nid])) {
          // Node has never been published. Set the default value to all revisions.
          $database
            ->update($node_storage->getDataTable())
            ->expression('published_at', PUBLICATION_DATE_DEFAULT)
            ->condition('nid', $nid)
            ->execute();
          $database
            ->update($node_storage->getRevisionDataTable())
            ->expression('published_at', PUBLICATION_DATE_DEFAULT)
            ->condition('nid', $nid)
            ->execute();
        }
        else {
          // Node has been published at least once. The data table will receive the
          // publication date while for the revision table, the publication date
          // will be set for all versions including and following the first
          // published version. All others will receive the default value.
          $first_published_timestamp = $node_data[$nid]->changed;
          $first_published_vid = $node_data[$nid]->vid;
    
          // Set the revisions that will receive the publication timestamp.
          $database
            ->update($node_storage->getDataTable())
            ->expression('published_at', $first_published_timestamp)
            ->condition('nid', $nid)
            ->execute();
          $database
            ->update($node_storage->getRevisionDataTable())
            ->expression('published_at', $first_published_timestamp)
            ->condition('nid', $nid)
            ->condition('vid', $first_published_vid, '>=')
            ->execute();
    
          // The revisions that are left have a null publication date and need to
          // receive the default value.
          $database
            ->update($node_storage->getRevisionDataTable())
            ->expression('published_at', PUBLICATION_DATE_DEFAULT)
            ->condition('nid', $nid)
            ->isNull('published_at')
            ->execute();
        }
      }
    }
    

I did not implement the rest as I am not sure if you like the solution. However, this is the one thing that worked for me. The current solution created wrong data in my database.

idimopoulos’s picture

+++ b/tests/src/Functional/PublicationDateInstallationTest.php
@@ -0,0 +1,30 @@
+  public function testPublicationDateInstall() {
+    $publishedNode = $this->createNode(['status' => 1]);
+    $unpublishedNode = $this->createNode(['status' => 0]);

We should also have more tests for this,e.g. entity that has been updated a couple of times with different status.

pfrenssen’s picture

Looping over all nodes is a possible solution but it is not very efficient. I have looked into this and it seems to be possible to update the revisions table in a single query when using a subquery:

UPDATE node_field_revision r, (
  SELECT nid, vid, changed 
  FROM node_field_revision
  WHERE status = 1
  GROUP BY nid
  ORDER BY vid
) s
SET r.published_at = s.changed
WHERE r.nid = s.nid AND r.vid >= s.vid;

I tested this on our production database and it updated 23000 records in less than half a second:

MariaDB [joinup]> update node_field_revision r, (select nid, vid, changed as published from node_field_revision where status = 1 group by nid order by vid) s set r.published_at = s.published where r.nid = s.nid and r.vid >= s.vid;
Query OK, 23601 rows affected (0.441 sec)
Rows matched: 23611  Changed: 23601  Warnings: 0

I'll assign this to me to roll this into a patch.

idimopoulos’s picture

@pfrenssen, keep in mind that probably, in order to be more compatible with the databases, add all fields in the GROUP BY clause, like GROUP BY nid, vid, changed because most mysql instances now have it as a forced option to have all fields in the SELECT clause listed also under GROUP BY.

Edit: Since the vid is the unique key, your query is safe to not produce different results than your current query. For more info (even though your probably already know about it) search for the ONLY_FULL_GROUP_BY flag in SQL.

idimopoulos’s picture

Sorry, good job BTW, I was simply trying to avoid direct queries because I am not sure whether it will affect different database systems.

pfrenssen’s picture

Assigned: Unassigned » pfrenssen

Going to work on this. I am going to start by update the test coverage to prove that our previous approach is wrong.

pfrenssen’s picture

Here is a test. Next up I am going to implement the update query.

pfrenssen’s picture

@idimopoulos I researched the ONLY_FULL_GROUP_BY limitation you mentioned (more specifically, only_full_group_by Improved, Recognizing Functional Dependencies, Enabled by Default.

So there is a history to this feature and as far as I can see my query is compatible with the SQL-2011 standard but not with the SQL-92 standard. It should work on most current databases including MariaDB (however this might depend on this issue), MySQL and postgres.

I think to be sure we will need some testing on different databases. I am running MariaDB 10.4.7.

pfrenssen’s picture

Here is a patch. My initially proposed query wasn't working on MariaDB as I feared. I came up with an alternative after some iterations. In testing this updates ~23000 nodes in around half a second.

SocialNicheGuru’s picture

Status: Needs review » Needs work

I get this error after applying the latest patch in #29

ParseError: syntax error, unexpected '' (T_ENCAPSED_AND_WHITESPACE), expecting '-' or identifier (T_STRING) or variable (T_VARIABLE) or number (T_NUM_STRING) in drupal8.7.5/html/modules/contrib/publication_date/publication_date.install on line 89

pfrenssen’s picture

@SocialNicheGuru How does line 89 look for you? I checked it but when I apply the patch on the latest 8.x-2.x branch then line 89 is the following:

      \Drupal::database()->query($query_data['query'], $query_data['arguments']);

This looks correct to me.

pfrenssen’s picture

Status: Needs work » Needs review
FileSize
9.66 KB
769 bytes

It was the HEREDOC syntax, the trailing comma is only allowed in PHP 7.3. In PHP 7.2 and earlier it needs to be on a separate line.

idimopoulos’s picture

Status: Needs review » Needs work

The test fails with an error about composer being not on a proper version :/

idimopoulos’s picture

  1. +++ b/publication_date.install
    @@ -5,32 +5,95 @@
    + * Helper function to clear the published on date for unpublished nodes.
    

    to clear => to populate.

  2. +++ b/publication_date.install
    @@ -5,32 +5,95 @@
    + * This function makes sure that any existing nodes have the publication date
    

    I think the any existing nodes have is not correct (sorry for the grammar nazism..)

    any existing nodes have => any existing node has

  3. +++ b/publication_date.install
    @@ -5,32 +5,95 @@
    +UPDATE {node_field_revision} r, (
    +  SELECT
    +    nid,
    +    MIN(vid) as vid,
    +    MIN(changed) as changed
    +  FROM {node_field_revision}
    +  WHERE status = 1
    +  GROUP BY nid
    +  ORDER BY vid
    +) s
    

    I have tested the query and confirm that it is working. Nice one! However, these queries need to be tested as well in other systems, right?

  4. +++ b/publication_date.post_update.php
    @@ -0,0 +1,16 @@
    +function publication_date_post_update_populate_publication_dates() {
    +  $module_path = DRUPAL_ROOT . '/' . drupal_get_path('module', 'publication_date') . '/publication_date.install';
    +  require_once $module_path;
    +
    +  _publication_date_populate_database_field();
    +}
    

    Why do we need this? The update is set to run in the hook_install.
    Also, if these queries do not run in the hook install successfully, it will be a broken database.
    If for any reason, this would be to be called in the post update, the values would need to be cleared before run again.

    Also, for projects that have the patch applied already, it will update some values somehow. Either we should drop this entirely, or clean the values beforehand if you insist that we should have an update path after the patch for some reason. Still, the patch is built to configure this on module install which was not being done up to now.

pfrenssen’s picture

  1. to clear => to populate.

    Absolutely agree, good find!

  2. I think the any existing nodes have is not correct (sorry for the grammar nazism..)

    I agree, it should be all existing nodes have.

  3. It is only tested on MySQL and MariaDB now, and as a precaution I added a switch that will skip the queries and output a warning when run on a system that doesn't run MySQL/MariaDB. If someone can test this on postgres we can add this.
  4. The module has a stable release, but anyone that was using this on an system with pre-existing nodes will not have the published on field populated. The idea behind the update hook is to make sure that when people update to the next version that their existing nodes will have receive the right publication date.
idimopoulos’s picture

Status: Needs work » Needs review
FileSize
673 bytes
9.66 KB

I agree with the latest two points. Fixing the 2 first really quick. It is RTBC from me for MySQL and MariaDB. Possibly, it still needs checks from other db systems.

pfrenssen’s picture

Status: Needs review » Needs work

I have found a case in which this code results in the wrong publication date: if revisioning is turned off and a published node is edited then the changed date will be newer than the creation date. In these cases we cannot be certain of the publication date, but it is safer to go with the creation date than the changed date.

pfrenssen’s picture

Assigned: pfrenssen » Unassigned
Status: Needs work » Needs review
FileSize
10.87 KB
3.32 KB

I added a test for the missing case, and updated the patch to fix it.

pfrenssen’s picture

Small cleanup.

idimopoulos’s picture

Status: Needs review » Reviewed & tested by the community

Wow, @pfrenssen thanks for the update. Indeed this edge case slipped our attention! Nice to have an extension in our tests. I will RTBC it since I don't see any movement for other db systems.

idimopoulos’s picture

Status: Reviewed & tested by the community » Needs work

I am taking back the RTBC.
Reading from the Issue scope

"IMPORTANT: For nodes that were published BEFORE the installation of this module, we can not know the exact date of publication, so $node->published_at will initially contain the creation date."

However, the main property that is used in the patch to derive the publication date is the changed date.

This can lead on multiple issues. First, the changed date of the revision is not permanent either. We created an extension of the patch to cover cases where the entity has only one revision. In these cases, we take the created time because it it the "best approach".
However, the created time of an entity that has only one revision, can be far in the past, way further than the last update time. On the other side, on a normal site, an entity can be updated multiple times before adding a new revision, either manually or through some kind of workflow.
I don't think it is a very good practice to assign the changed property to cases where the entity has multiple revisions and the created date where entities have but a single revision.

What I would suggest is to always use the creation time of the version that has the actual publication time. This can be done by indeed using the created property in case the entity has only one revision or revisions are disabled, and use the revision_timestamp property from the node_revision of the first revision that was published in case we have more than one revisions.

pfrenssen’s picture

Assigned: Unassigned » pfrenssen

Thanks for the review! I will investigate this and provide a fix + additional test coverage.

pfrenssen’s picture

I can confirm that the approach suggested by @idimopoulus in #2983348-42: Published on empty for existing nodes is correct. The current patch is mistakenly using the created timestamp from the revision, but this does not contain the creation time of the revision. Instead it contains the creation time of the node, as it was edited in the node form when the user submitted the form.

We should use the revision_timestamp column from the node_revision table. The Entity API method for this is NodeInterface::setRevisionCreationTime().

I will update the test to use this API call and update the code accordingly.

pfrenssen’s picture

FileSize
10.95 KB
3.89 KB

Updated the test. It is failing now as expected. Will update the queries next.

pfrenssen’s picture

pfrenssen’s picture

Some of the other tests fail with the message that the node revisions tables don't exist. This is possibly because the dependency on the node module is missing.

pfrenssen’s picture

Assigned: pfrenssen » Unassigned

And it's green! The composer error is because Drupal 8.8 is not compatible with PHP 5.5 any more, this is out of scope for this issue and handled in #3076577: Run 8.x-2.x branch tests with >= PHP7.

idimopoulos’s picture

Now we are talking about RTBC :D Thanks @pfrenssen for the changes. Sorry for the wrong initial RTBC.

idimopoulos’s picture

Status: Needs review » Reviewed & tested by the community

Oops. Forgot the status change.

webflo’s picture

Thanks for the hard work. The patch in #8 hat an additional check for SqlContentEntityStorage. Should we add this checks? Since the queries are specific to this schema.

pfrenssen’s picture

Yes it would be a good idea to check that we are on SqlContentEntityStorage.

adamzimmermann’s picture

This patch in 46 was failing to apply for me when I was adding it using Composer.

It would apply perfectly when I cloned this module's code though.

I eventually tracked this down to the information that Composer adds to a module's .info.yml file.

I fixed it by adding this to my Composer file.

        "preferred-install": {
            "drupal/publication_date": "source",

#3036459: Packaging info from .info.yml often creates conflicts when patching explains this more. Figured I would share here in case this helped anyone else and to see if I was missing an alternate approach.

The patch seems to be working great though otherwise. Thanks!

msielski’s picture

adamzimmermann’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
13.41 KB
2.31 KB

Re-rolled the patch for the new 8.x-2.0-beta2 release.

dutchyoda’s picture

I encountered a problem applying #54

../2983348-54.patch:136: trailing whitespace.
  } 
warning: 1 line adds whitespace errors.

I only removed the trailing whitespace.

eelkeblok’s picture

Co-worker of dutchyoda here. I think actually the main problem is that the patch contains changes to the .info.yml and since we are installing the module through composer, the .info.yml will be different from the Git version. Here is a patch without the change to .info.yml (just to be able to apply though composer, the added dependency on the node module is actually required, strictly speaking).

Am I correct in saying that since the start of this issue, the scope has actually reduced? It looks like the latest version actually contains a mechanism to fill the DB field upon installation. That might leave the Postgres compatibility and the post_update hook.

jlancaster’s picture

#56 seems to work great for me on a very old D7 -> D8 site conversion where I really needed to import an assumed published on date that matched the authored on date. Thanks.

chefigueroa’s picture

#56 worked like a charm, thanks!

couloir007’s picture

As far as can tell the most recent publication_date:^2.0@beta has this patch added but it doesn't work on 9.3.12. I tried applying this patch first and it sort of worked, it the date applied was not the authored on date.

Leon Kessler’s picture

Attaching patch from #54/#55/#56, including the .info.yml changes (which are required for the test runner to work).

The patch is based off the packagist version of the module, so applies cleanly. This was suggested in #2858245: Patching .info.yml files.

@couloir007 I don't think your comment is correct. The patch is not included yet in the module repo, it is working for me using latest Drupal 9.3.

esolitos’s picture

The update hook is not working on Drupal core 9.4 because of the change Core provided database drivers moved to their own modules.

This is a re-roll of #56 compatible with core >= 9.4

  • webflo committed 665015d on 8.x-2.x
    Issue #2983348: Convert code to short array
    
  • webflo committed 88dfbc4 on 8.x-2.x
    Issue #2983348 by pfrenssen, webflo, adamzimmermann, idimopoulos,...
webflo’s picture

Status: Needs review » Fixed

Fixed the failing test and converted the whole code base to short array syntax. Thanks to all!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.