I have storage set up to work with Amazon S3, and everything works fine except that xmlsitemap complains on the status report:

The XML cached files are out of date and need to be regenerated. You can run cron manually to regenerate the sitemap files.

When I run cron I can see it regenerating the sites/default/files/storage/xmlsitemap/RAndOmTokeN/1.xml file but the status report still complains and http://mydomain.com/sitemap.xml doesn't exist.

Any suggestions?

P.S. I'm also using https://www.drupal.org/project/storage_api_stream_wrapper

Comments

emmonsaz created an issue. See original summary.

perignon’s picture

Hrm... Do you see anything in the Drupal watchdog logs or in your PHP server logs?

perignon’s picture

Can you load the sitemap in the browser?

emmonsaz’s picture

Logs:

[NOTICE]
File instance destroyed: xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml
container: Filesystem,
class: Cloud,
storage_id: 54, file_id: 49, size: 946 B

[NOTICE]
The file was not deleted, because it does not exist.

[NOTICE]
File instance created: xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml
container: Filesystem,
class: Cloud,
storage_id: 55, file_id: 49, size: 946 B

[NOTICE]
File added: xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml
class: Cloud,
storage_id: 55, file_id: 49, size: 946 B

[ERROR]
PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'storage-api-public://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25Koj' for key 'uri': INSERT INTO {storage_stream_wrapper} (storage_id, uri) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => 55 [:db_insert_placeholder_1] => storage-api-public://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml ) in StorageApiStreamWrapper->stream_close() (line 215 of ...\sites\all\modules\contrib\storage_api_stream_wrapper\StorageApiStreamWrapper.inc).

[NOTICE]
Cron run completed.

emmonsaz’s picture

I can load sites/default/files/storage/xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml in a text editor and it has correct XML data but http://mydomain.com/sitemap.xml loads a generic browser page that says

This XML file does not appear to have any style information associated with it. The document tree is shown below.

And when I view source it's completely empty.

emmonsaz’s picture

I also get this Drupal message:

Warning: curl_setopt_array(): DrupalTemporaryStreamWrapper::stream_cast is not implemented! in StorageS3->request() (line 209 of ...\sites\all\modules\contrib\storage_api\services\s3.inc).

emmonsaz’s picture

That last warning may be due to my using PHP 5.5.30. See: https://www.drupal.org/node/1698110#comment-8553965

emmonsaz’s picture

When I try to regenerate the sitemap (/admin/config/search/xmlsitemap) I get this error:

An AJAX HTTP error occurred. HTTP Result Code: 500 Debugging information follows. Path: /batch?id=108&op=do StatusText: Service unavailable (with message) ResponseText: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'storage-api-public://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25Koj' for key 'uri': INSERT INTO {storage_stream_wrapper} (storage_id, uri) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => 56 [:db_insert_placeholder_1] => storage-api-public://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml ) in StorageApiStreamWrapper->stream_close() (line 215 of ...\sites\all\modules\contrib\storage_api_stream_wrapper\StorageApiStreamWrapper.inc).

perignon’s picture

First we have to get back to a clean state. The table has a duplicate entry. Manually delete the conflicting stream wrapper entry from the storage_stream_wrapper table. I will ping the developer of the stream wrapper helper module.
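The failure and the manual cleanup can be reproduced in miniature. This is only a sketch: it uses Python's sqlite3 as a stand-in for MySQL, the two-column table is a simplified assumption based on the columns named in the error message (not the module's real schema), and the token in the URI is hypothetical.

```python
import sqlite3

# Stand-in for the {storage_stream_wrapper} table. The schema is assumed from
# the error message: columns (storage_id, uri) with a unique key on uri.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE storage_stream_wrapper ("
    "storage_id INTEGER NOT NULL, uri TEXT NOT NULL UNIQUE)"
)

URI = "storage-api-public://xmlsitemap/TOKEN/1.xml"  # hypothetical token


def register(storage_id, uri):
    """Plain INSERT, shaped like the failing query in the log."""
    with conn:
        conn.execute(
            "INSERT INTO storage_stream_wrapper (storage_id, uri) VALUES (?, ?)",
            (storage_id, uri),
        )


def second_insert_fails(storage_id, uri):
    """Return True if registering an already-known URI violates the unique key."""
    try:
        register(storage_id, uri)
    except sqlite3.IntegrityError:
        return True
    return False


def delete_conflicting(uri):
    """Manual cleanup: remove the stale row so the next INSERT can succeed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM storage_stream_wrapper WHERE uri = ?", (uri,)
        )
    return cur.rowcount


register(55, URI)                       # first regeneration registers the file
failed = second_insert_fails(56, URI)   # second regeneration hits the unique key
removed = delete_conflicting(URI)       # manual cleanup of the conflicting row
register(56, URI)                       # works again, until the next regeneration
```

Note that the cleanup only restores a clean state once; if the old row is never removed before the next regeneration, the same violation comes back.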

perignon’s picture

If you don't mind, can you give me the steps to reproduce this so I can simulate your setup exactly?

jonhattan’s picture

Project: Storage API » Storage API stream wrappers

Moving over to "Storage API stream wrappers" queue.

If it is reproducible, the problem is probably in the way xmlsitemap checks the existence or file size of the sitemap file, in xmlsitemap_sitemap_get_max_filesize().

emmonsaz’s picture

Steps to reproduce:

1. Start with new vanilla Drupal 7 site (Standard install) using PHP 5.4.45
2. $ drush en -y xmlsitemap
3. $ drush xmlsitemap-regenerate
4. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
5. $ drush dl storage_api-7.x-1.x-dev && drush en -y storage storage_core_bridge
6. $ drush xmlsitemap-regenerate
7. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
8. $ drush dl storage_api_stream_wrapper-7.x-1.x-dev && drush dis -y storage_core_bridge && drush en -y storage_stream_wrapper
9. /admin/structure/storage/create-class --> Name = myclass (all other fields left alone)
10. /admin/structure/storage/stream-wrappers/storage-api-public/edit --> switch Storage class to "myclass"
11. /admin/config/media/file-system --> change Default download method to "Storage API Public (class: myclass)"
12. $ rm -rf sites/default/files/xmlsitemap && drush xmlsitemap-regenerate

WD storage_api_stream_wrapper: Operation not implemented dir_opendir()                       [error]
WD storage_api_stream_wrapper: Operation not implemented dir_readdir()                       [error]
WD storage_api_stream_wrapper: Operation not implemented dir_closedir()                      [error]
XML sitemap files regenerated in 279.03 ms. Peak memory usage: 15.5 MB.

13. $ curl http://myactualdomain.com/sitemap.xml --> Drupal's page not found error
14. $ drush xmlsitemap-regenerate

exception 'PDOException' with message 'SQLSTATE[23000]: Integrity constraint violation:      [error]
1062 Duplicate entry 'storage-api-public://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25Koj'
for key 'uri'' in
DRUPALROOT\includes\database\database.inc:2171
Stack trace:
#0 DRUPALROOT\includes\database\database.inc(2171):
PDOStatement->execute(Array)
#1 DRUPALROOT\includes\database\database.inc(683):
DatabaseStatementBase->execute(Array, Array)
#2 DRUPALROOT\includes\database\mysql\query.inc(36):
DatabaseConnection->query('INSERT INTO {st...', Array, Array)
#3
DRUPALROOT\sites\all\modules\storage_api_stream_wrapper\StorageApiStreamWrapper.inc(215):
InsertQuery_mysql->execute()
#4
DRUPALROOT\sites\all\modules\xmlsitemap\xmlsitemap.generate.inc(320):
StorageApiStreamWrapper->stream_close()
#5
DRUPALROOT\sites\all\modules\xmlsitemap\xmlsitemap.generate.inc(320):
xmlsitemap_generate_page(Object(stdClass), 1)
#6 [internal function]: xmlsitemap_regenerate_batch_generate('NXhscRe0440PFpI...', Array)
#7 DRUPALROOT\includes\batch.inc(284):
call_user_func_array('xmlsitemap_rege...', Array)
#8 DRUPALROOT\includes\form.inc(4704): _batch_process()
#9
DRUPALROOT\sites\all\modules\xmlsitemap\xmlsitemap.module(1513):
batch_process()
#10
DRUPALROOT\sites\all\modules\xmlsitemap\xmlsitemap.drush.inc(49):
xmlsitemap_run_unprogressive_batch('xmlsitemap_rege...')
#11 [internal function]: drush_xmlsitemap_regenerate()
#12 ...\drush\drush\includes\command.inc(364):
call_user_func_array('drush_xmlsitema...', Array)
#13 ...\drush\drush\includes\command.inc(215):
_drush_invoke_hooks(Array, Array)
#14 [internal function]: drush_command()
#15 ...\drush\drush\includes\command.inc(183):
call_user_func_array('drush_command', Array)
#16 ...\drush\drush\lib\Drush\Boot\BaseBoot.php(65):
drush_dispatch(Array)
#17 ...\drush\drush\includes\preflight.inc(64):
Drush\Boot\BaseBoot->bootstrap_and_dispatch()
#18 ...\drush\drush\drush.php(12):
drush_main()
#19 {main}
emmonsaz’s picture

15. $ drush sql-query "delete from storage_stream_wrapper where uri like '%xmlsitemap%'"
16. $ drush xmlsitemap-regenerate

WD storage_api_stream_wrapper: Operation not implemented dir_opendir()  [error]
WD storage_api_stream_wrapper: Operation not implemented dir_readdir()                [error]
WD storage_api_stream_wrapper: Operation not implemented dir_closedir()               [error]
XML sitemap files regenerated in 79 ms. Peak memory usage: 14.5 MB.

17. $ drush xmlsitemap-regenerate --> same PDOException as above

Note: 1.xml file was being generated with storage_core_bridge but not with storage_stream_wrapper

emmonsaz’s picture

Warning: fopen(sites/default/files/storage/xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml): failed to open stream: No such file or directory in StorageApiStreamWrapper->stream_open() (line 268 of ...\sites\all\modules\storage_api_stream_wrapper\StorageApiStreamWrapper.inc).

emmonsaz’s picture

18. /admin/config/media/file-system --> Default download method = Public local files served by the webserver
19. $ drush dis -y storage_stream_wrapper && drush en -y storage_core_bridge
20. $ drush xmlsitemap-regenerate

XML sitemap files regenerated in 34 ms. Peak memory usage: 14.5 MB.

21. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage

emmonsaz’s picture

22. $ drush xmlsitemap-regenerate

XML sitemap files regenerated in 33 ms. Peak memory usage: 14.5 MB.

23. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage

emmonsaz’s picture

See attached for a proposed fix.

emmonsaz’s picture

Status: Active » Needs review
emmonsaz’s picture

Easier test plan:

1. Start with new vanilla Drupal 7 site (Standard install)
2. $ drush dl storage_api-7.x-1.x-dev storage_api_stream_wrapper-7.x-1.x-dev
3. $ drush en -y storage storage_stream_wrapper
4. /admin/config/media/file-system --> change Default download method to "Storage API Public (class: Everything)"
5. $ drush en -y xmlsitemap
6. $ drush xmlsitemap-regenerate
7. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
8. $ drush xmlsitemap-regenerate --> no PDOException or other database error occurs
9. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage

emmonsaz’s picture

P.S. Any reason you add an "Operation not implemented dir_*dir()" watchdog error message for every cron? storage_core_bridge has placeholders but doesn't throw watchdog error messages: https://github.com/taz77/drupal_storage_api/blob/7.x-1.x/core_bridge/sto...

jonhattan’s picture

@emmonsaz this watchdog message is added to find out when, by whom, and why this method is called, so that we have a reference for implementing it. Please create a new issue if you have any clues.

I'll look at your patch ASAP.

ckng’s picture

I'm having the same issue. However, the patch in #18 does nothing for me; I'm still getting a PDOException.
In fact, I get the same PDOException when doing a migrate update.

$ drush xmlsitemap-regenerate
PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry                                                              [error]
'96014ef47c119ee77df2755980660854-xmlsitemap/NXhscRe0440PFpI5dSzn' for key 'fingerprint' in
/var/www/drupal/includes/database/database.inc:2171
Stack trace:
#0 /var/www/drupal/includes/database/database.inc(2171): PDOStatement->execute(Array)
#1 /var/www/drupal/includes/database/database.inc(683): DatabaseStatementBase->execute(Array, Array)
#2 /var/www/drupal/includes/database/mysql/query.inc(36): DatabaseConnection->query('INSERT INTO {st...', Array, Array)
#3 /var/www/drupal/includes/common.inc(7334): InsertQuery_mysql->execute()
#4 /var/www/drupal/sites/default/modules/contrib/storage_api/storage.module(693): drupal_write_record('storage_file', Array)
#5 /var/www/drupal/sites/default/modules/contrib/storage_api/selector.inc(294): _storage_file_id(Object(StorageTempURI),
'xmlsitemap/NXhs...', NULL)
#6 /var/www/drupal/sites/default/modules/contrib/storage_api_stream_wrapper/StorageApiStreamWrapper.inc(208):
StorageSelector->storageAdd(Array)
#7 /var/www/drupal/sites/default/modules/contrib/xmlsitemap/xmlsitemap.generate.inc(153): StorageApiStreamWrapper->stream_close()
#8 /var/www/drupal/sites/default/modules/contrib/xmlsitemap/xmlsitemap.generate.inc(320): xmlsitemap_generate_page(Object(stdClass), 1)
#9 /var/www/drupal/includes/batch.inc(284): xmlsitemap_regenerate_batch_generate('NXhscRe0440PFpI...', Array)
#10 /var/www/drupal/includes/form.inc(4712): _batch_process()
#11 /var/www/drupal/sites/default/modules/contrib/xmlsitemap/xmlsitemap.module(1513): batch_process()
#12 /var/www/drupal/sites/default/modules/contrib/xmlsitemap/xmlsitemap.drush.inc(49):
xmlsitemap_run_unprogressive_batch('xmlsitemap_rege...')
#13 phar:///usr/local/bin/drush/includes/command.inc(364): drush_xmlsitemap_regenerate()
#14 phar:///usr/local/bin/drush/includes/command.inc(215): _drush_invoke_hooks(Array, Array)
#15 phar:///usr/local/bin/drush/includes/command.inc(183): drush_command()
#16 phar:///usr/local/bin/drush/lib/Drush/Boot/BaseBoot.php(65): drush_dispatch(Array)
#17 phar:///usr/local/bin/drush/includes/preflight.inc(64): Drush\Boot\BaseBoot->bootstrap_and_dispatch()
#18 phar:///usr/local/bin/drush/includes/startup.inc(289): drush_main()
#19 phar:///usr/local/bin/drush/drush(114): drush_startup(Array)
#20 /usr/local/bin/drush(10): require('phar:///usr/loc...')
#21 {main}
ckng’s picture

Does the issue appear to be upstream, in storage_api? The fingerprint unique key is causing an integrity constraint violation in both of my cases, where files are being updated using the same name and, most likely, the same content:
- xmlsitemap re-generation
- migrate update on file or file field

    'unique keys' => [
      'fingerprint' => ['whirlpool', 'filename'],
    ],

It looks like the deduplication mechanism is not handling this properly. When a duplicate is found, shouldn't the existing record be reused instead of throwing a PDOException?
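The "reuse instead of throwing" idea can be sketched as a look-up-then-insert against the unique key. This is an illustration under assumptions, not storage_api's code: sqlite3 stands in for MySQL, the table is reduced to the two key columns, SHA-256 stands in for the Whirlpool hash (which Python's hashlib does not guarantee), and file_id_for() is a hypothetical helper name.

```python
import hashlib
import sqlite3

# Reduced stand-in for {storage_file}: only the columns of the unique key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE storage_file ("
    "file_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "content_hash TEXT NOT NULL, "
    "filename TEXT NOT NULL, "
    "UNIQUE (content_hash, filename))"
)


def file_id_for(content, filename):
    """Return the file_id for (content, filename), reusing an existing row.

    SHA-256 is used here only as a stand-in for the Whirlpool fingerprint.
    """
    digest = hashlib.sha256(content).hexdigest()
    row = conn.execute(
        "SELECT file_id FROM storage_file "
        "WHERE content_hash = ? AND filename = ?",
        (digest, filename),
    ).fetchone()
    if row is not None:
        # Same name and same content: reuse the record instead of inserting.
        return row[0]
    with conn:
        cur = conn.execute(
            "INSERT INTO storage_file (content_hash, filename) VALUES (?, ?)",
            (digest, filename),
        )
    return cur.lastrowid


NAME = "xmlsitemap/TOKEN/1.xml"  # hypothetical token
first = file_id_for(b"<urlset/>", NAME)
again = file_id_for(b"<urlset/>", NAME)                  # identical regeneration
changed = file_id_for(b"<urlset><url/></urlset>", NAME)  # content changed
```

Regenerating an identical sitemap returns the existing id, while changed content under the same name gets a new row, so the unique key is never violated in either case.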

ckng’s picture

It's still not completely working. If I split the XML sitemap into smaller chunks, creating multiple files, the sub-files are not accessible. I have more than 50K links, with a maximum of 50K per file.
E.g.
http://mysite.com/sitemap.xml, with links to
http://mysite.com/sitemap.xml?page=1
http://mysite.com/sitemap.xml?page=2
http://mysite.com/sitemap.xml?page=3
...
http://mysite.com/sitemap.xml?page=11

All the subpages are returning errors to Google.

This turned out to be unrelated; it was due to the web server configuration. The patch works great.

smaz’s picture

I'm having the same issue with:

xmlsitemap
storage_api
storage_api_stream_wrapper
Amazon S3 for file storage

When the sitemap regenerates, it tries to clear the sitemap directory first.

storage_stream_wrapper_load() is being given this directory (in my case: storage-api://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM), but fails to find any results as files are stored in the db with the filename - i.e. storage-api://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml.

So it fails to delete the current sitemap, meaning it can't save the new one due to the file still existing.

It looks like the patch in #18 will update/merge when trying to add a file that already exists in the table. I'm not sure that's the best approach: if a sitemap goes from 4 pages to 2, for example, the last two files will be left behind.
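The two concerns above (an exact lookup on the directory URI finds nothing, and merging alone leaves old pages behind) can be sketched as follows. This is a minimal illustration under assumptions: sqlite3 stands in for MySQL, the table is simplified to (storage_id, uri), the token and storage ids are hypothetical, and merge() and clear_directory() are illustrative helpers rather than functions from either module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE storage_stream_wrapper ("
    "storage_id INTEGER NOT NULL, uri TEXT NOT NULL UNIQUE)"
)

DIRECTORY = "storage-api://xmlsitemap/TOKEN"  # hypothetical token


def merge(storage_id, uri):
    """Insert the row, or overwrite it if the URI is already registered."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO storage_stream_wrapper (storage_id, uri) "
            "VALUES (?, ?)",
            (storage_id, uri),
        )


def clear_directory(directory_uri):
    """Delete every row whose URI lies under the directory (prefix match)."""
    prefix = directory_uri.rstrip("/") + "/"
    with conn:
        cur = conn.execute(
            "DELETE FROM storage_stream_wrapper WHERE substr(uri, 1, ?) = ?",
            (len(prefix), prefix),
        )
    return cur.rowcount


def generate(pages, first_storage_id):
    """Register sitemap pages 1..pages under DIRECTORY."""
    for n in range(1, pages + 1):
        merge(first_storage_id + n, f"{DIRECTORY}/{n}.xml")


def uris():
    rows = conn.execute("SELECT uri FROM storage_stream_wrapper ORDER BY uri")
    return [row[0] for row in rows]


# An exact match on the directory URI never finds the per-file rows.
generate(4, 100)
exact_hits = conn.execute(
    "SELECT COUNT(*) FROM storage_stream_wrapper WHERE uri = ?", (DIRECTORY,)
).fetchone()[0]

# Merge only: shrinking from 4 pages to 2 leaves 3.xml and 4.xml behind.
generate(2, 200)
after_merge_only = uris()

# Clear by prefix first, then regenerate: only the current pages remain.
removed = clear_directory(DIRECTORY)
generate(2, 300)
after_clear = uris()
```

The trailing slash in the prefix matters: without it, a sibling directory whose name merely starts with the same token would be cleared as well.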

rahul_sankrit’s picture

I'm using xmlsitemap 7.x-2.3 and S3 File System (s3fs) 7.x-2.8, but when I use Amazon S3 File System for storage I get a 404 (Page Not Found) on /sitemap.xml.

In my case, the sitemap seems to work fine on another host, but with Amazon S3 File System I get the following message:

1. The requested page "/sitemap.xml" could not be found.

I ran "drush xmlsitemap-regenerate", but I get the following warnings:

XML sitemap files regenerated in 874.77 ms. Peak memory usage: 52 MB.
PHP Warning: fseek(): supplied resource is not a valid stream resource in /var/www/html/Project/sites/all/libraries/awssdk2/Guzzle/Stream/Stream.php on line 232
PHP Warning: array_key_exists() expects parameter 2 to be array, null given in /var/www/html/Project/includes/bootstrap.inc on line 3654
PHP Fatal error: Uncaught Error: Access to undeclared static property: Database::$activeKey in /var/www/html/Project/includes/database/database.inc:1522
Stack trace:
#0 /var/www/html/Project/includes/database/database.inc(2626): Database::getConnection()
#1 /var/www/html/Project/includes/cache.inc(349): db_escape_table('cache_bootstrap')
#2 /var/www/html/Project/includes/cache.inc(330): DrupalDatabaseCache->getMultiple(Array)
#3 /var/www/html/Project/includes/cache.inc(57): DrupalDatabaseCache->get('module_implemen...')
#4 /var/www/html/Project/includes/module.inc(754): cache_get('module_implemen...', 'cache_bootstrap')
#5 /var/www/html/Project/includes/module.inc(1083): module_implements('file_mimetype_m...')
#6 /var/www/html/Project/includes/file.mimetypes.inc(23): drupal_alter('file_mimetype_m...', Array)
#7 /var/www/html/Project/sites/all/modules/contrib/s3fs/S3fsStreamWrapper.inc(244): file_mimetype_mapping()
#8 /v in /var/www/html/Project/includes/database/database.inc on line 1522
PHP Notice: Undefined index: seekable in /var/www/html/Project/sites/all/libraries/awssdk2/Guzzle/Stream/Stream.php on line 232
PHP Fatal error: Uncaught Error: Access to undeclared static property: S3fsStreamWrapper::$mimeTypeMapping in /var/www/html/Project/sites/all/modules/contrib/s3fs/S3fsStreamWrapper.inc:244
Stack trace:
#0 /var/www/html/Project/sites/all/modules/contrib/s3fs/S3fsStreamWrapper.inc(768): S3fsStreamWrapper::getMimeType('s3://xmlsitemap...')
#1 [internal function]: S3fsStreamWrapper->stream_flush()
#2 {main}
thrown in /var/www/html/Project/sites/all/modules/contrib/s3fs/S3fsStreamWrapper.inc on line 244

Any ideas?

Thanks

jollysolutions’s picture

Status: Needs review » Reviewed & tested by the community