Problem/Motivation
The “HTML Filter” processor doesn’t work as expected with the configured HTML tags (even with the default tags). Specifically, the tags for HTML headings (<h1>, <h2>, <h3>, …) are not recognized.
Proposed resolution
The cause of the problem is a faulty regular expression for matching HTML tags in the input: '#^(/?)([-:_a-zA-Z]+)#'. Its character class contains no digits, so for <h1> it captures only "h" as the tag name.
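The mismatch can be reproduced in a few lines. This is a sketch in Python's `re` module rather than PHP's PCRE (the pattern body is identical once the `#` delimiters are dropped), and it assumes the pattern is applied to the text that follows a `<`; `old_tag_name` is a hypothetical helper, not part of the module.

```python
import re

# Original pattern from the processor, minus PHP's '#' delimiters.
# The character class has no digits, so matching stops before the '1' in "h1".
OLD_TAG_PATTERN = re.compile(r"^(/?)([-:_a-zA-Z]+)")

def old_tag_name(fragment: str) -> str:
    """Return the tag name the old pattern captures from the text after a '<'."""
    match = OLD_TAG_PATTERN.match(fragment)
    return match.group(2) if match else ""

print(old_tag_name("h1>"))      # h   (not "h1", so a configured "h1" never matches)
print(old_tag_name("/h2>"))     # h
print(old_tag_name("strong>"))  # strong  (digit-free names are unaffected)
```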
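I propose changing the regular expression to the following, which requires the name to start with a letter, ":" or "_" and then also allows digits, "-" and "." (see also the patch that I will attach in a minute): '#^(/?)([:_a-zA-Z][-:_a-zA-Z0-9.]*)#'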
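The proposed pattern appears to follow the ASCII subset of the XML 1.0 `Name` production. A quick check of its behaviour, again sketched in Python under the assumption that it is matched against the text after a `<`; `parse_tag` is a hypothetical helper for illustration only.

```python
import re

# Proposed pattern: a name starts with a letter, ':' or '_', and may then
# contain letters, digits, '-', ':', '_' and '.'.
NEW_TAG_PATTERN = re.compile(r"^(/?)([:_a-zA-Z][-:_a-zA-Z0-9.]*)")

def parse_tag(fragment: str):
    """Return (is_closing, tag_name) for the text after a '<', or None if no name matches."""
    match = NEW_TAG_PATTERN.match(fragment)
    if match is None:
        return None
    return (match.group(1) == "/", match.group(2))

print(parse_tag("h1>"))             # (False, 'h1')
print(parse_tag("/h3>"))            # (True, 'h3')
print(parse_tag('h2 class="x">'))   # (False, 'h2')  attributes are not part of the name
print(parse_tag("1abc>"))           # None  names may not start with a digit
```

Unlike the old pattern, this one also rejects names that begin with "-", since "-" is only allowed after the first character.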
Remaining tasks
Someone with more insight into the module should review my patch. Perhaps some kind of unit test would be useful, too.
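A test along these lines could push headings through the tag matcher and check that each heading's configured weight is applied. The sketch below is hypothetical throughout: `TAG_WEIGHTS`, `weights_for` and the split-on-`<` tokenizer are simplified stand-ins (assuming well-formed, properly nested tags), not the processor's real code or the test attached to this issue.

```python
import re
import unittest

# Proposed tag-name pattern from this issue (PHP '#' delimiters dropped).
TAG_PATTERN = re.compile(r"^(/?)([:_a-zA-Z][-:_a-zA-Z0-9.]*)")

# Hypothetical configuration: boost factor per tag name.
TAG_WEIGHTS = {"h1": 5.0, "h2": 3.0, "h3": 2.0, "strong": 2.0}

def weights_for(html: str) -> dict:
    """Map each word to the weight of the innermost configured tag around it (default 1.0)."""
    result = {}
    stack = []  # weights of the configured tags that are currently open
    # After splitting on '<', every chunk but the first starts with a tag, then '>', then text.
    for i, chunk in enumerate(html.split("<")):
        text = chunk
        if i > 0:
            tag_part, _, text = chunk.partition(">")
            match = TAG_PATTERN.match(tag_part)
            if match and match.group(2).lower() in TAG_WEIGHTS:
                if match.group(1):
                    # Closing tag: leave the innermost configured tag.
                    if stack:
                        stack.pop()
                else:
                    stack.append(TAG_WEIGHTS[match.group(2).lower()])
        for word in text.split():
            result[word] = stack[-1] if stack else 1.0
    return result

class HeadingWeightTest(unittest.TestCase):
    def test_headings_get_configured_weights(self):
        weights = weights_for("<h1>title</h1><h2>section</h2><h3>detail</h3><p>body</p>")
        self.assertEqual(weights["title"], 5.0)
        self.assertEqual(weights["section"], 3.0)
        self.assertEqual(weights["detail"], 2.0)
        self.assertEqual(weights["body"], 1.0)
```

With the old pattern swapped in, "h1" would be read as "h", so the heading words would fall back to the default weight of 1.0 and the test would fail. Saved as, say, `test_html_filter.py`, it can be run with `python -m unittest test_html_filter`.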
User interface changes
N/A
API changes
There are no API changes. However, sites using this processor may want to re-index their content after this change, since existing index data was built with the faulty pattern.
Comments
Comment #1
cspurk
Comment #2
Yaron Tal (at One Shoe): Took me a while to figure this out. Will try to create a test to get this moving again.
Comment #3
Yaron Tal (at One Shoe): Added a test-only patch and a fix patch that apply to the latest dev version.
Comment #6
Yaron Tal (at One Shoe): Forgot to add the h3 score to the test. Here is a new set of patches.
Comment #7
Yaron Tal (at One Shoe)
Comment #9
eelkeblok: I reviewed the patch and it looks good to me.
Comment #11
drunken monkey: Oh, sorry for not seeing this earlier! Thanks for reviving this, and for even providing test coverage – awesome!
Committed the patch now.
Thanks again, everyone, and my apologies for the (considerable) delay!