Problem/Motivation
When heading ID is generated, ToC API only takes into account IDs it generates, but not those already present in the
source. There is a chance that user generated content would have a duplicating ID. The chances are higher for the IDs
being generated based on the heading title.
Steps to reproduce
Generate ToC for the following HTML:
<p><a id="documents"></a></p>
<h3>Documents</h3>
Proposed resolution
Take into account IDs from the source for ToC.
Remaining tasks
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #8 | toc_api-3360540-5-8-interdiff.txt | 3.38 KB | rosk0 |
| #8 | toc_api-3360540-8.patch | 6.33 KB | rosk0 |
Comments
Comment #2
rosk0Here is a patch which fixes and issue. Together with test. Test passes locally , but it can't be run in CI as all the other tests in the module are outdated and broken.
Comment #3
ericgsmith commentedPatch looks good - fixes the error with duplicate IDs.
Test is good too - as noted by you it could be a unit test, I think given currently the tests are not running that this shouldn't prevent it being RTBC and a follow up issue can be created to get the existingtests running and passing.
Comment #4
ericgsmith commentedSorry, was a bit premature on setting RTBC.
On closer look the test and code assumes that the ID precedes the generated ID, e.g
Will still produce duplicate IDs.
Comment #5
rosk0Thanks for the review Eric and picking that issue up!
Also spotted this.
When ToC API faces this type of HTML:
with now deprecated,
nameattribute.It produces this:
and throws this warning:
So I added a workaround and test for that as well.
Comment #6
ericgsmith commentedNice work - looks good to me.
Ran test locally on D10 / PHP 8.1 and all green.
Comment #7
rosk0Nice , thanks!
I was testing on D9.5, PHP 8.0 so we have a good test coverage!
Comment #8
rosk0We've identified a huge performance hit with introduced ID deduplication code on larger HTML documents. For the context, the documents we were testing this are around 5 Mb of plain HTML and are very complex , with lots of tables , headings and ID attributes.
Rendering the page in development environment was taking around 15 min to render and failing at times.
With this new approach based on XPath filtering , rather than whole DOM processing, it is now way more performance and we saw results of under 60 seconds to render on a cold cache.
Based on the XPath spec:
So we shouldn't loose heading order here. Tests to prove this would be nice to have surely, I don't have capacity for that at the moment.
Comment #9
ericgsmith commentedAwesome work! Significant improvement when dealing with large content.
Comment #10
rosk0We have this patch running in production since 2023-06-27 with no negative effects found to date.
Comment #12
vladimirausThank you! Commited and released! 🍻