I have storage set up to work with Amazon S3 and everything is working fine, except that xmlsitemap complains on the status report:
The XML cached files are out of date and need to be regenerated. You can run cron manually to regenerate the sitemap files.
When I run cron I can see it regenerating the sites/default/files/storage/xmlsitemap/RAndOmTokeN/1.xml file but the status report still complains and http://mydomain.com/sitemap.xml doesn't exist.
Any suggestions?
P.S. I'm also using https://www.drupal.org/project/storage_api_stream_wrapper
| Comment | File | Size | Author |
|---|---|---|---|
| #18 | storage_api_stream_wrapper-xmlsitemap-compatibility-2632036-18.patch | 806 bytes | emmonsaz |
Comments
Comment #2
emmonsaz commented
Comment #3
perignon commented
Hrm... Do you see anything in the Drupal watchdog logs or in your PHP server logs?
Comment #4
perignon commented
Can you load the sitemap in the browser?
Comment #5
emmonsaz commented
Logs:
[NOTICE]
File instance destroyed: xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml
container: Filesystem,
class: Cloud,
storage_id: 54, file_id: 49, size: 946 B
[NOTICE]
The file was not deleted, because it does not exist.
[NOTICE]
File instance created: xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml
container: Filesystem,
class: Cloud,
storage_id: 55, file_id: 49, size: 946 B
[NOTICE]
File added: xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml
class: Cloud,
storage_id: 55, file_id: 49, size: 946 B
[ERROR]
PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'storage-api-public://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25Koj' for key 'uri': INSERT INTO {storage_stream_wrapper} (storage_id, uri) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1); Array ( [:db_insert_placeholder_0] => 55 [:db_insert_placeholder_1] => storage-api-public://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml ) in StorageApiStreamWrapper->stream_close() (line 215 of ...\sites\all\modules\contrib\storage_api_stream_wrapper\StorageApiStreamWrapper.inc).
[NOTICE]
Cron run completed.
Comment #6
emmonsaz commented
I can load sites/default/files/storage/xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml in a text editor and it has correct XML data, but http://mydomain.com/sitemap.xml loads a generic browser page that says
And when I view source it's completely empty.
Comment #7
emmonsaz commented
I also get this Drupal message:
Comment #8
emmonsaz commented
That last warning may be due to me using PHP 5.5.30. See: https://www.drupal.org/node/1698110#comment-8553965
Comment #9
emmonsaz commented
When I try to regenerate the sitemap (/admin/config/search/xmlsitemap) I get this error:
Comment #10
perignon commented
First we have to get back to a clean state. There is a duplicate entry in the storage_stream_wrapper table. Manually delete the conflicting entry for the stream wrapper. I will ping the developer of the stream wrapper helper module.
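As an illustration of the duplicate-entry failure and the manual cleanup described above, here is a minimal Python sketch. It uses an in-memory SQLite table as a stand-in for {storage_stream_wrapper}; the only detail taken from the log is that `uri` carries a unique key. The real schema, the MySQL behavior, and the module's PHP code may differ, and `TOKEN` is a placeholder for the random directory name.

```python
import sqlite3

# Stand-in table: only the unique key on `uri` is taken from the log above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE storage_stream_wrapper (storage_id INTEGER, uri TEXT UNIQUE)"
)

# Placeholder URI; TOKEN stands for the random xmlsitemap directory name.
URI = "storage-api-public://xmlsitemap/TOKEN/1.xml"


def register(storage_id, uri):
    """Plain INSERT, mirroring the shape of the failing query in the log."""
    conn.execute(
        "INSERT INTO storage_stream_wrapper (storage_id, uri) VALUES (?, ?)",
        (storage_id, uri),
    )


# First regeneration registers the file.
register(54, URI)

# Second regeneration writes a new storage under the same URI and collides.
try:
    register(55, URI)
    second_insert_failed = False
except sqlite3.IntegrityError:
    second_insert_failed = True

# Manual cleanup: remove the conflicting row, then the insert succeeds again.
conn.execute("DELETE FROM storage_stream_wrapper WHERE uri = ?", (URI,))
register(55, URI)
rows = conn.execute(
    "SELECT storage_id, uri FROM storage_stream_wrapper"
).fetchall()
```

Deleting the row only restores a clean state for one run; unless the old row is removed or merged on every regeneration, the next plain insert for the same URI would presumably collide again.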
Comment #11
perignon commented
If you don't mind, can you give me the steps to reproduce this so I can simulate your setup exactly?
Comment #12
jonhattan
Moving over to "Storage API stream wrappers" queue.
In case it is reproducible, the problem is probably in the way xmlsitemap checks the existence or file size of the sitemap file, in xmlsitemap_sitemap_get_max_filesize().
Comment #13
emmonsaz commented
Steps to reproduce:
1. Start with new vanilla Drupal 7 site (Standard install) using PHP 5.4.45
2. $ drush en -y xmlsitemap
3. $ drush xmlsitemap-regenerate
4. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
5. $ drush dl storage_api-7.x-1.x-dev && drush en -y storage storage_core_bridge
6. $ drush xmlsitemap-regenerate
7. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
8. $ drush dl storage_api_stream_wrapper-7.x-1.x-dev && drush dis -y storage_core_bridge && drush en -y storage_stream_wrapper
9. /admin/structure/storage/create-class --> Name = myclass (all other fields left alone)
10. /admin/structure/storage/stream-wrappers/storage-api-public/edit --> switch Storage class to "myclass"
11. /admin/config/media/file-system --> change Default download method to "Storage API Public (class: myclass)"
12. $ rm -rf sites/default/files/xmlsitemap && drush xmlsitemap-regenerate
13. $ curl http://myactualdomain.com/sitemap.xml --> Drupal's page not found error
14. $ drush xmlsitemap-regenerate
Comment #14
emmonsaz commented
15. $ drush sql-query "delete from storage_stream_wrapper where uri like '%xmlsitemap%'"
16. $ drush xmlsitemap-regenerate
17. $ drush xmlsitemap-regenerate --> same PDOException as above
Note: 1.xml file was being generated with storage_core_bridge but not with storage_stream_wrapper
Comment #15
emmonsaz commented
Warning: fopen(sites/default/files/storage/xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml): failed to open stream: No such file or directory in StorageApiStreamWrapper->stream_open() (line 268 of ...\sites\all\modules\storage_api_stream_wrapper\StorageApiStreamWrapper.inc).
Comment #16
emmonsaz commented
18. /admin/config/media/file-system --> Default download method = Public local files served by the webserver
19. $ drush dis -y storage_stream_wrapper && drush en -y storage_core_bridge
20. $ drush xmlsitemap-regenerate
XML sitemap files regenerated in 34 ms. Peak memory usage: 14.5 MB.
21. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
Comment #17
emmonsaz commented
22. $ drush xmlsitemap-regenerate
XML sitemap files regenerated in 33 ms. Peak memory usage: 14.5 MB.
23. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
Comment #18
emmonsaz commented
See attached for a proposed fix.
Comment #19
emmonsaz commented
Comment #20
emmonsaz commented
Easier test plan:
1. Start with new vanilla Drupal 7 site (Standard install)
2. $ drush dl storage_api-7.x-1.x-dev storage_api_stream_wrapper-7.x-1.x-dev
3. $ drush en -y storage storage_stream_wrapper
4. /admin/config/media/file-system --> change Default download method to "Storage API Public (class: Everything)"
5. $ drush en -y xmlsitemap
6. $ drush xmlsitemap-regenerate
7. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
8. $ drush xmlsitemap-regenerate --> no PDOException or other database error occurs
9. $ curl http://myactualdomain.com/sitemap.xml --> displays an XML sitemap with an entry for the homepage
Comment #21
emmonsaz commented
P.S. Is there a reason you add an "Operation not implemented dir_*dir()" watchdog error message on every cron run? storage_core_bridge has placeholders but doesn't throw watchdog error messages: https://github.com/taz77/drupal_storage_api/blob/7.x-1.x/core_bridge/sto...
Comment #22
jonhattan
@emmonsaz this watchdog message is added to find out when, by whom, and why this method is called, and to have a reference for implementing it. Please create a new issue if you have any clue.
I'll look at your patch asap.
Comment #23
ckng
Having the same issue. However, patch #18 does nothing for me. I'm still getting the PDOException.
In fact, I get the same PDOException when doing a migrate update.
Comment #24
ckng
The issue appears to be upstream in storage_api? The fingerprint unique key is causing an integrity constraint violation in both of my cases, where files are being updated using the same name, likely with the same content:
- xmlsitemap re-generation
- migrate update on file or file field
It looks like the deduplication mechanism is not handled properly? When a file is a duplicate, shouldn't the existing record be reused instead of throwing a PDOException?
Comment #25
ckng
Patch #18 works fine, my issues are related to storage api, see #2713761: File deduplication broken + PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry.
Comment #26
ckng
Still not completely working. If I split the XML sitemap into smaller chunks, creating multiple files, the sub-files are not accessible. I have more than 50K links, with a maximum of 50K per file. E.g.
http://mysite.com/sitemap.xml, with links to
http://mysite.com/sitemap.xml?page=1
http://mysite.com/sitemap.xml?page=2
http://mysite.com/sitemap.xml?page=3
...
http://mysite.com/sitemap.xml?page=11
All the subpages are giving errors to Google.
This turns out to be unrelated, due to webserver config. Patch works great.
Comment #27
smaz
I'm having the same issue with:
xmlsitemap
storage_api
storage_api_stream_wrapper
Amazon S3 for file storage
When the sitemap regenerates, it tries to clear the sitemap directory first.
storage_stream_wrapper_load() is being given this directory (in my case: storage-api://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM), but fails to find any results as files are stored in the db with the filename - i.e. storage-api://xmlsitemap/NXhscRe0440PFpI5dSznEVgmauL25KojD7u4e9aZwOM/1.xml.
So it fails to delete the current sitemap, meaning it can't save the new one due to the file still existing.
It looks like the patch in #18 will update/merge when trying to add a file that already exists to the table. I'm not sure if that's the best way, as if a sitemap goes from 4 pages to 2 for example, the last two files will be left behind.
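The two concerns above can be sketched in Python, again with an in-memory SQLite table standing in for {storage_stream_wrapper}. This is an illustration under assumptions, not the module's code: `load_exact`, `delete_under`, and `merge` are hypothetical helpers, `INSERT OR REPLACE` stands in for whatever merge the patch performs, and `TOKEN` is a placeholder directory name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE storage_stream_wrapper (storage_id INTEGER, uri TEXT UNIQUE)"
)

DIRECTORY = "storage-api://xmlsitemap/TOKEN"  # placeholder directory URI


def merge(storage_id, uri):
    """Hypothetical merge: replace the row if the URI already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO storage_stream_wrapper (storage_id, uri) "
        "VALUES (?, ?)",
        (storage_id, uri),
    )


def load_exact(uri):
    """Exact-match lookup, as a directory URI would be looked up."""
    return conn.execute(
        "SELECT uri FROM storage_stream_wrapper WHERE uri = ?", (uri,)
    ).fetchall()


def delete_under(directory):
    """Delete every row whose URI starts with '<directory>/'."""
    prefix = directory.rstrip("/") + "/"
    cur = conn.execute(
        "DELETE FROM storage_stream_wrapper WHERE substr(uri, 1, ?) = ?",
        (len(prefix), prefix),
    )
    return cur.rowcount


def uris():
    """All registered URIs, sorted for stable comparison."""
    query = "SELECT uri FROM storage_stream_wrapper ORDER BY uri"
    return [row[0] for row in conn.execute(query)]


# First generation: a sitemap split into 4 pages.
for page in range(1, 5):
    merge(page, f"{DIRECTORY}/{page}.xml")

# Looking up the directory itself finds nothing, so nothing gets cleared.
exact_matches = load_exact(DIRECTORY)

# Regenerate with only 2 pages, relying on merge alone: pages 3 and 4 linger.
for page in range(1, 3):
    merge(10 + page, f"{DIRECTORY}/{page}.xml")
after_merge_only = uris()

# Clearing by prefix first leaves exactly the pages that were regenerated.
removed = delete_under(DIRECTORY)
for page in range(1, 3):
    merge(20 + page, f"{DIRECTORY}/{page}.xml")
after_prefix_delete = uris()
```

In this sketch the merge avoids the duplicate-key error but leaves the rows for 3.xml and 4.xml behind, while clearing by prefix before writing removes them. Whether the real module can resolve a directory URI to its files this way is an open question here.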
Comment #28
rahul_sankrit commented
I'm using xmlsitemap-7.x-2.3 and S3 File System (s3fs) 7.x-2.8, but when I use Amazon S3 File System for storage I get a 404 - Page Not Found on /sitemap.xml.
In my case, the sitemap seems to work fine on another host, but with Amazon S3 File System I get the following message:
1. The requested page "/sitemap.xml" could not be found.
I ran "drush xmlsitemap-regenerate" but I get the following warnings:
XML sitemap files regenerated in 874.77 ms. Peak memory usage: 52 MB.
PHP Warning: fseek(): supplied resource is not a valid stream resource in /var/www/html/Project/sites/all/libraries/awssdk2/Guzzle/Stream/Stream.php on line 232
PHP Warning: array_key_exists() expects parameter 2 to be array, null given in /var/www/html/Project/includes/bootstrap.inc on line 3654
PHP Fatal error: Uncaught Error: Access to undeclared static property: Database::$activeKey in /var/www/html/Project/includes/database/database.inc:1522
Stack trace:
#0 /var/www/html/Project/includes/database/database.inc(2626): Database::getConnection()
#1 /var/www/html/Project/includes/cache.inc(349): db_escape_table('cache_bootstrap')
#2 /var/www/html/Project/includes/cache.inc(330): DrupalDatabaseCache->getMultiple(Array)
#3 /var/www/html/Project/includes/cache.inc(57): DrupalDatabaseCache->get('module_implemen...')
#4 /var/www/html/Project/includes/module.inc(754): cache_get('module_implemen...', 'cache_bootstrap')
#5 /var/www/html/Project/includes/module.inc(1083): module_implements('file_mimetype_m...')
#6 /var/www/html/Project/includes/file.mimetypes.inc(23): drupal_alter('file_mimetype_m...', Array)
#7 /var/www/html/Project/sites/all/modules/contrib/s3fs/S3fsStreamWrapper.inc(244): file_mimetype_mapping()
#8 /v in /var/www/html/Project/includes/database/database.inc on line 1522
PHP Notice: Undefined index: seekable in /var/www/html/Project/sites/all/libraries/awssdk2/Guzzle/Stream/Stream.php on line 232
PHP Fatal error: Uncaught Error: Access to undeclared static property: S3fsStreamWrapper::$mimeTypeMapping in /var/www/html/Project/sites/all/modules/contrib/s3fs/S3fsStreamWrapper.inc:244
Stack trace:
#0 /var/www/html/Project/sites/all/modules/contrib/s3fs/S3fsStreamWrapper.inc(768): S3fsStreamWrapper::getMimeType('s3://xmlsitemap...')
#1 [internal function]: S3fsStreamWrapper->stream_flush()
#2 {main}
thrown in /var/www/html/Project/sites/all/modules/contrib/s3fs/S3fsStreamWrapper.inc on line 244
Any ideas?
Thanks
Comment #29
jollysolutions