Problem/Motivation
Characters with Māori macrons are stripped from ids, so the heading `Tā mātou` becomes `t-mtou`. It would be best to let the macrons through, but convertStringToId() relies on the regex class \w to decide which characters to keep, and \w as used there matches only unaccented Latin letters, digits and the underscore. Letting macrons through would therefore require ToC API to offer a per-site preference for convertStringToId().
Steps to reproduce
Create content with the heading <h2>Tā mātou</h2>. Apply the ToC filter. Observe that the generated anchor id is `t-mtou`.
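The behaviour can be illustrated outside Drupal with a rough sketch. It is written in Python purely for illustration and is not the module's PHP code; the function name `ascii_only_id` and the exact cleaning steps are assumptions, chosen only to show how an ASCII-only \w drops macron vowels.

```python
import re


def ascii_only_id(heading: str) -> str:
    """Hypothetical stand-in for convertStringToId(); not the module's code."""
    text = heading.strip().lower()
    # re.ASCII restricts \w to [a-zA-Z0-9_], so "ā" is treated as a
    # non-word character and removed along with punctuation.
    text = re.sub(r"[^\w\s-]", "", text, flags=re.ASCII)
    # Collapse each run of whitespace into a single hyphen.
    return re.sub(r"\s+", "-", text)


print(ascii_only_id("Tā mātou"))  # t-mtou
```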
Proposed resolution
Refactor convertStringToId() to support a per-site preference that allows specific non-Latin characters to remain. This would likely entail revising the use of regex \w. A less desirable work-around is to map characters with macrons to Latin characters, but this can change the meaning of a word, so it is not ideal. A patch for the work-around is offered below.
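The two options can be compared in a minimal sketch, again in Python for illustration only. Everything named here is hypothetical: `ALLOWED_EXTRA` stands in for the proposed per-site preference, and the two functions are not part of ToC API. The second function strips the combining macron after Unicode decomposition, which is one possible way to map macron vowels to plain Latin letters and is not necessarily how the attached patch does it.

```python
import re
import unicodedata

# Hypothetical per-site setting: extra characters allowed to survive in ids.
ALLOWED_EXTRA = "āēīōū"


def id_keeping_allowed(heading: str, allowed: str = ALLOWED_EXTRA) -> str:
    """Preferred option (sketch): keep the characters the site has allowed."""
    # NFC makes sure "a" + combining macron is compared as the single "ā".
    text = unicodedata.normalize("NFC", heading).strip().lower()
    # Remove everything except ASCII word characters, whitespace,
    # the allowed extras and the hyphen.
    pattern = r"[^\w\s" + re.escape(allowed) + r"-]"
    text = re.sub(pattern, "", text, flags=re.ASCII)
    return re.sub(r"\s+", "-", text)


def id_with_macrons_mapped(heading: str) -> str:
    """Work-around (sketch): map macron vowels to plain Latin letters."""
    # NFD splits "ā" into "a" + U+0304 (combining macron); drop only that mark.
    text = unicodedata.normalize("NFD", heading).replace("\u0304", "")
    text = text.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text, flags=re.ASCII)
    return re.sub(r"\s+", "-", text)


print(id_keeping_allowed("Tā mātou"))      # tā-mātou
print(id_with_macrons_mapped("Tā mātou"))  # ta-matou
```

With an empty `allowed` string the first function falls back to ASCII-only ids, which is roughly how a site that does not opt in would behave under this sketch.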
Remaining tasks
Implement site preference described above.
User interface changes
Add settings form for per-site preference.
API changes
Unsure.
Data model changes
Would need to save site preference in config.
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | map-macrons-to-latin-3348964-4.patch | 2.38 KB | jonathan_hunt |
Comments
Comment #2
jonathan_hunt commented
Comment #3
jonathan_hunt commented
Comment #4
jonathan_hunt commented
Patch to map macrons to Latin as a work-around until site preferences for non-Latin characters can be implemented.
Comment #5
rosk0
I believe the implementation currently suggested in the patch is the best way forward. Mozilla suggests using only ASCII characters in the ID attribute value, and I completely support this recommendation.
Comment #7
vladimiraus
Thank you! Released and committed! 🥂
Comment #8
vladimiraus