In my opinion (and I fully expect to get shot down, but that's what suggestions are for) the name of check_markup() is problematic in several ways:
1.) False friends: check_markup()'s nominal counterpart is historically the check_plain() function, which fits on some level since you call them in similar ways and sometimes for similar reasons. But check_plain() is in common.inc, while check_markup() is part of filter.module. This distinction goes very deep; check_plain() is a basic library function while check_markup() is high-level and tied to a site's input format settings.
2.) Code refactoring: As mentioned above, check_markup() is part of filter.module. Almost all non-theme module functions are prefixed with the name of the module as a general policy.
3.) Purpose: When you "check" something, you make sure it is safe, correct, valid, etc. This may work for check_plain(), which is an output security feature. check_markup() can take the same role with certain filters (HTML correction, HTML restriction). But it can also do something completely different, like evaluate PHP, render BBCode, generate syntax-highlighted source code... none of which have to do with output security.
-
I don't have an alternative suggestion myself, except that it should probably begin with "filter_". But I do think what we have now could be improved, and since we are renaming functions already (drupal_execute to drupal_form_submit, for example) it might be a good idea to look at it for Drupal 7.
Comment | File | Size | Author |
---|---|---|---|
#62 | filter-rename-check_markup-455724-62.patch | 12.99 KB | cburschka |
#60 | filter-rename-check_markup-455724-60.patch | 12.83 KB | cburschka |
#42 | 455724-filter-markup.patch | 12.77 KB | agentrickard |
#12 | filter-rename-check_markup-455724-12.patch | 9.56 KB | cburschka |
#2 | filter-rename-check_markup-455724-1.patch | 9.55 KB | cburschka |
Comments
Comment #1
gregglescheck_plain - sanitizes text
check_markup - filters text, may sanitize, may not
filter_xss_admin - sanitizes text more liberally
I agree, this is a naming inconsistency and a DX issue that leads to a potential security issue.
Maybe check_markup could be changed to something about "running the filter" or filter_text or apply_filter.
Comment #2
cburschkafilter_output()
is probably not a perfect name either, but here is a patch to get things rolling.Edit: This comes from not reloading the page in time! :)
We have a number of possibilties beside the one in this patch, including
filter_apply()
,filter_run()
,filter_text()
,filter_markup()
orfilter_execute()
.Comment #3
Heine CreditAttribution: Heine commentedAs filtertext and filterformat are always needed both, why not make a richtext object offered by filter module with a method that outputs HTML?
Comment #4
gregglesI think in 7.x we are moving to "[verb]_[noun]" style, which is where I got my suggestions. filter_output is in that format as well, and does a good job of communicating the idea that Drupal filters on output, but we aren't really filtering "output" we're filtering input and making it output.
If we do this, then perhaps we should reevaluate filter_xss and any other helper functions (maybe there aren't any, but I don't know) as well.
Comment #5
Heine CreditAttribution: Heine commentedcheck_plain converts plain text to HTML.
check_markup converts richtext (RT) to HTML.
filter_output doesn't tell me that either.
Comment #6
gregglescheck_markup does a ton of other things as well.
Another related function worth reconsidering:
drupal_html_to_text
Heine pointed out in irc (and I somewhat agree) that check_markup will always return HTML. That's kind of true, though it's possible that a chunk of php would return an empty string or that an input format which has no filters assigned to it can return any text instead of HTML. To be really accurate the new name for check_markup should either be agnostic or we should change the way it works to always return HTML (I prefer the first solution).
I'd be happy with something like:
drupal_html_to_text -> leave alone
check_plain -> drupal_text_to_html
filter_xss, filter_xss_admin leave alone.
check_markup -> filter_text
This follows the logic that the functions with known output be called drupal_[something]_to_[otherthingthing] and that functions which do a more complex filtering or which have a specific filtering purpose are called filter_*
Comment #7
cburschkaIt makes sense for drupal_html_to_text to be named this way, and it also makes sense for drupal_text_to_html to be named this way. But they will need comments to clarify that their operation is not symmetric or even related (check_plain escapes HTML-relevant characters, while d_h_t_t generates a "close-enough" plaintext version of HTML specifically for emails - it lives in mail.inc).
filter_text() sounds good, though.
Still leaving on NR for more comments.
Comment #8
Heine CreditAttribution: Heine commentedThat's kind of beside the point as the output is slung into the content as if it _was_ HTML.
Comment #9
catchfilter_text($text, $format) is really tidy.
And we have 'text formats' now instead of input formats so it's pretty self explanatory there as well.
Comment #11
cburschka#367214: hook_node_alter almost disappeared ? broke HEAD. Retest.
Comment #12
cburschkaHere is a new patch using filter_text().
Note that check_markup is a popular function that contrib may call as well, so this change should be mentioned in the module updating guide.
Comment #14
cburschkaTestbot is silly.
Comment #15
sunGood.
This API change would be pretty nonsense on its own, but luckily we can accompany this change with some real value: #446518: Remove $check argument from check_markup()
Most probably, however, this means that we need to merge this patch into the other issue.
Comment #16
dmitrig01 CreditAttribution: dmitrig01 commentedCheck_plain doesn't check if something is plain - it converts it.
Comment #17
sunAnd?
Comment #18
Dries CreditAttribution: Dries commented(Somewhat off-topic: newer versions of PHP sports some filter mechanisms. I haven't looked at them recently, but I wonder if it makes sense to bring ours closer to PHP's? It could help developers.)
Comment #19
agentrickardSemantically, I suppose I'd like to see some consistent use of
filter_[name]
where the second element indicates what will be output. That might give us:And so forth. Not sure about the last function, since it effectively returns HTML.
Comment #20
dmitrig01 CreditAttribution: dmitrig01 commentedto me, filter_X implies that I want the input to be X (filter text means I want to do some filtering on a piece of text, not that I want to output text).
Comment #21
cburschkaIn fact, all of these functions return HTML.
check_plain()
takes plain text and escapes characters that are reserved in HTML. The result is valid HTML markup that will show the plain content (albeit with collapsed whitespace) when viewed in a browser.Consistency is good, but "
filter_[type]
" sounds like [type] is what goes in, not what comes out. One would expectfilter_html
to accept html to be filtered (which it may do, but which isn't its defining purpose). filter_markup() may be better.There is one point this change *wouldn't* address, though: check_plain() is not a filter.module function. It is a common.inc function. It is used at levels that don't even know what a module or a database is - see conf_init() in bootstrap.inc. check_plain() cannot be moved out of common.inc for this reason. If we rename it to filter_plain(), we've got the opposite of our current problem: Instead of a filter.module function being named as if it was elsewhere, we'd have a common.inc function named as it if was part of filter.module.
Comment #22
agentrickard/me almost concedes the point.
PHP uses filter_* for these functions -- http://us.php.net/manual/en/book.filter.php -- which effectively negates dmitri's argument.
If we did not have Drupal-specific functions for this, we would use something like:
See -- http://us.php.net/manual/en/filter.filters.sanitize.php -- for the sanitize tokens.
Would it make sense to use
secure_
as the function namespace?e.g.
This namespace is not used by PHP or Drupal ATM.
The other option, it seems to me is to use the new PHP 5 functions and dustbin our old Drupal-specific ones.
Comment #23
sunMixing check_plain() in here and trying to unify these function names is wrong. check_plain() and check_markup() are completely different functions. The resulting content of a variable processed by check_markup() can be anything but not necessarily safe for HTML. What you really mean are the filter_xss*() functions, which are about to be moved out of filter.module: #470632: Move filter_xss*() into common.inc
That's why I agree that check_markup() is poorly named and should using a prefix of filter_ instead. It applies a configurable set of filters - or none at all - to some arbitrary input. The input - as well as the output - can be plain text, ASCII, raw HTML, safe HTML, XML, PHP, BBCode or whatever. The result can be safe HTML, but can also be the original, unaltered input. There are modules that use filters to not output HTML, but plain-text that is safe for sending via e-mail.
Comment #24
agentrickard@sun -- cross post while I was gathering info. Does #22 change anything for you?
Comment #25
sunNope.
1) Renaming of filter_xss*() as well as check_plain(), check_url(), check_file(), valid_email_address(), valid_url(), and possible more should happen in a separate issue. Those functions validate and/or sanitize (ensure) a certain syntax or format. check_markup() has nothing to do with them.
2) Which also means that we'll simply move the filter_xss*() functions in #470632: Move filter_xss*() into common.inc without renaming, so the renaming may or may not happen later on.
3) This issue is solely about renaming check_markup() in filter.module.
Comment #26
agentrickardI am in agreement with sun.
On this patch, I like
filter_text
, so this is a vote for RTBC.Other issues moved to #471184: Reconcile Drupal's input security functions with PHP filter_*, including the renaming of other functions and possibly replacing this function with
filter_var
.Comment #27
sunRenaming is now: #471264: Consistently name validation/sanitation functions
Comment #28
cburschkaShould all discussion of this issue move there?
Comment #29
sunNo - I repeat: check_markup() is not a validation/sanitation function. It is a tool to apply a set of configurable filters to a string.
Comment #30
Heine CreditAttribution: Heine commentedcheck_plain is also not just a validation/sanitation function. It is a tool to convert plain text strings (
Ladies & Gentlemen
) to HTML (Ladies & Gentlemen
).While not apparant in the PHPDoc for check_markup, its intended output is HTML. See Handle text in a secure fashion. I consider it a hack that some modules use this in another way.
Instead of renaming check_markup to a very ambiguous filter_text, it is time for a good look at how Drupal handles different string types, how to make the relation between function and the conversion being done can be made obvious. and how this all can be made easier for developers to prevent foe #1: XSS.
Comment #31
dmitrig01 CreditAttribution: dmitrig01 commentedas per http://acko.net/blog/safe-string-theory-for-the-web what about filter($context, $string, $param) which would do everything?
Comment #32
cburschkaIntriguing idea - but what is "everything"?
- filtering the request's cookie domain while finding the site's settings.php file on bootstrap
- querying an in input format from the database, getting its component filters and dispatching the content to the various modules
There's no way to do all of this in the same function without playing havoc with encapsulation.
Comment #33
agentrickardWe are also moving away from $context style functions (replace $context with $op to get a clear idea). So having
filter('email', $input_string, $param);
seems like a step backwards to me.Comment #34
gregglesJust some more brainstorming on this... we also have db_escape_table and db_escape_string that will sanitize strings for use in different contexts.
Regardless of whether the $context is in the function name or the first argument to the function, there is a potential solidify a whole lot of different "escape strings for use in context X" functions.
Comment #35
sunGood catches! However, those should go into #471264: Consistently name validation/sanitation functions.
Even if the purpose of check_markup() was (or is) to produce HTML, it still applies a configurable set of filters to the value (the input format). The major difference to all other functions is that a developer cannot rely on an expected, validated and sanitized result in a certain syntax or format.
Let's keep this issue on focus. I would be very happy to discuss potential ideas for changes to check_markup() in a separate issue (please leave a pointer here).
Comment #36
Heine CreditAttribution: Heine commentedRight, it converts richtext (with an associated format) to HTML.
Nor does the developer have to.
Changing check_markup to filter_text doesn't add a thing to understanding its function. And change for the sake of change is IMO never a good argument for an API change.
Comment #37
sunThat's why I referred to #446518: Remove $check argument from check_markup() in #15.
If we can find a better name than filter_text(), sure. However, putting the function into the namespace of filter.module makes a lot sense to me.
Comment #38
agentrickardWell, thinking about this, I would call it
filter_output()
, since it is designed to sanitize data before outputting markup.Comment #40
sunBumping to critical, since #446518: Remove $check argument from check_markup() has been committed.
We need a new function name for check_markup() ASAP now.
Comment #41
cburschkaWe should probably avoid getting caught on the output/text bikeshed. What makes this issue critical now seems to be the "check_".
The proposals so far have been, unless I glossed over something:
1. Renaming
s/check_markup/filter_[foobar]/g
(where foobar is still to be determined)1b) Accompanied by renaming other assorted filtering functions like check_plain, drupal_html_to_text, filter_xss_admin, etc.
2. Restructure filtering into an OO form, with a
RichText::html()
class/method.3. Consolidate all filtering into a single
filter($op, ...)
function.The last is a deprecated pattern being removed from D7, so we're not using that I guess.
What about the others?
Comment #42
agentrickardNow is not the time to introduce new OO, IMHO, so I would rule #2 out.
So let's just check for namespace collisions on filter_foo() --> http://us2.php.net/manual/en/book.filter.php
filter_markup() is fine by me, and legal. Patch attached.
Comment #43
agentrickardSee also #557148: Rename check_plain() to drupal_htmlspecialchars() for discussion of check_plain() and a patch.
Comment #44
sunI thought about this and I think that the proposed change is very good, because
- "markup" in filter_markup() can mean both input and output, which is actually the case for this function. You can filter text to HTML, other markup languages to HTML, and HTML to HTML; and even if it was not intended to allow for it, it also allows to filter HTML to text.
- filter_markup() is in the namespace of the implementing module.
- filter_markup() is both understandable by existing developers and new developers, while it's an easily recallable API change for existing developers.
Comment #45
Damien Tournoud CreditAttribution: Damien Tournoud commentedI'm with Heine on the richtext_to_html() proposal (http://drupal.org/node/557148#comment-1959718).
Comment #46
agentrickardBut the function is _in_ filter module.
So namespaces without filter_ are no good.
This is a horrible bikeshed debate, IMO. filter_markup is a perfectly good incremental improvement. If you want a name that doesn't start with filter_, then you have to move the function to common.inc.
Comment #47
sunok, seems like we need a bit more discussion.
richtext_to_html() is wrong, because
- it is not guaranteed that it returns HTML; whether it returns HTML or not depends on the enabled filters in a format
- you can pass plain text, HTML, or other markup languages in; whether the input is properly converted into the intended output language depends on the enabled filters in a format
- the function is, unlike all other filter_* and check_* functions, a highly configurable converter that applies multiple string/text processors to the input; only each (enabled) text processor is supposed to return some expected result, while not all processors need to be enabled.
- it's not in the namespace of filter.module
Comment #48
catchI like filter_markup().
Comment #49
agentrickardThat's 3 votes for filter_markup().
Any other filter_* suggestions?
Comment #50
sunComment #51
Heine CreditAttribution: Heine commentedThis is not a bikeshed debate. Proper text handling appears to be one of the things most taxing on the Drupal developer's brain. Just count the XSS issues reported in the last two years.
I'm disappointed by the focus on the namespace and the imo callous disregard for central role of the check_* and filter_* functions in string handling. Getting these function names right could minimize developer confusion and prevent countless XSS issues in the coming years.
Even in this issue I see confusion about what check_markup does: check_markup takes richtext (an 'arbitrary' format defined by the associated INPUT FORMAT) and converts it to HTML. That the resulting string doesn't have to contain a single tag doesn't mean the resultant string type isn't HTML; the function implicitly returns a string that is destined to be used as HTML. If check_markup did not return HTML, how could we show nodes and comments to visitors?
(BTW Filter use by the messaging module is a hack; the module will be rewritten.)
Comment #52
cburschkaFrom the DX standpoint, Drupal is a web interface. Everything it outputs is converted to HTML (and if some day we want alternatives to that, the entire filtering system would need a rewrite anyway). In that sense, "Filter this" is equivalent to "[convert] this to HTML".
If "filter_richtext" and "richtext_to_html" are equivalent in meaning, then the first fits a lot better.
Comment #53
sunYou seem to insist on the mysterious assumption that check_markup() always returns HTML that is safe for output.
Please have one more look at http://api.drupal.org/api/function/check_markup/7 and tell me where filter_xss() or any other sanitation function is invoked there. It is not.
check_markup() applies an arbitrary set of text processors to some string. That set can contain some processors or can be empty. Whatever string or formatted text you pass in may or may not be converted to something else.
Sanitation only happens if at least one security filter is enabled in the given text format, for example "HTML filter" (but there are more advanced security filters in contrib).
check_markup() is able to process many markup languages, especially lightweight markup languages, such as Markdown or BBCode. It is processing markup and is supposed to return HTML. The result may or may not be sanitized, depending on whether any security filter is additionally enabled in the given format.
For that sake, the "check" in check_markup() referred to the user access check for the given format that was previously contained. The $check parameter is gone now, so all what this function does now is to apply arbitrary filters to some potentially formatted text in some markup language and potentially convert that to another markup language (HTML is supposed). Therefore, filter_markup().
Comment #54
agentrickardsun made the argument much better than I ever could. This is RTBC still.
Comment #55
cburschkaFor the sake of completeness, whether we call something "markup" or "richtext" really /is/ a bikeshed. However, markup (as in Hyper-Text Markup Language) is the term we're already using throughout Drupal, whereas the first thing anyone connects with "richtext" is Microsofts completely unrelated RTF format.
So yeah, +1 for [filter_] and +1 for [_markup] from me.
Comment #56
Heine CreditAttribution: Heine commentedDid I?
That is absolutely NOT my "assumption". Check_markup returns HTML. That's it. Whether or not it is "safe" is immaterial and something for filter_access to resolve.
Excellent so we agree. check_markup processes a markup language defined by the input format to HTML.
I didn't think I pledged for keeping the check_* function names. Those are pretty lame. filter_markup IMO sucks just as bad.
Comment #57
agentrickardIs there a filter_* named function you thinks sucks less?
Comment #58
Heine CreditAttribution: Heine commentedOn this question:
I can only answer with a quote from my earlier post (emphasis added):
Comment #60
cburschkaBroken by a change in the function code.
Comment #62
cburschkaRenaming {box} to {block_custom} changed one line too close to a check_markup call.
Comment #64
cburschka#394182: DBTNG search.module broke HEAD. Retest.
Comment #65
agentrickardPersonally, I still like one of the following:
filter_output
filter_markup
drupal_filter
drupal_sanitize
drupal_secure_text
drupal_secure_output
drupal_filter_markup
filter_markup
Any chance we can pick something and move forward? And if the namespace is not filter_*, do we have to move the function to common.inc?
Comment #66
roychri CreditAttribution: roychri commentedI personally do not like *_secure_* *_sanitize because the output may not be secure at all.
We can also specify in the PHPDoc that the output will be valid (perhaps insecure) HTML format (even if no tags are present in the output). Security depends on the actual filters that were applied based on access permissions.
The documentation says "Run all the enabled filters on a piece of text."
What about drupal_apply_filters(), apply_filters() or filters_apply()?
This is very technical name and refer to the implementation details. It performs its job by applying the necessary filters.
As mentioned in the original description, this is a very high-level functions.
Why do we need to apply all the filters anyway? To prepare the input and making it ready to display/output. But why? What is the ultimate goal from a user's perspective? Remove this function and the input will not be formatted the way you intended. It seems to me this is more about formatting than filtering. Filtering is only a mean to an end.
format_output()? format_input()?
There are other format_* function in common.inc which takes an input and sometimes some formatting option and return an altered input based on the formatting options. Surprisingly, they (almost) all take the $langcode as well.
Just my 2 cents.
Comment #68
catchD8.
Comment #69
catchNo such thing as a critical task.
Comment #70
pillarsdotnet CreditAttribution: pillarsdotnet commentedHow about "filter_apply" ?
Comment #71
quicksketchEven "major" tasks are now preventing development of new features per the new policy at http://drupal.org/node/1201874. This would be a nice change, but largely just to fulfill OCD naming conventions. Considering we don't even have a consensus here, this shouldn't hold up development of new features.
Comment #72
gregglesI disagree, it's about a consistent API being easier to understand when first encountered which makes it easier to apply _safely_.
That said, I'm fine with the new priority.
Comment #73
Wim LeersComment #74
catchI think we could add the newly named function in a minor release and leave check_markup() as a deprecated wrapper if we wanted to.
We've halved the calls to check_markup() in core since 7.x - 16 down to 9 https://api.drupal.org/api/drupal/core%21modules%21filter%21filter.modul...
So contrib interaction with this should be minimal now.
Comment #87
quietone CreditAttribution: quietone at PreviousNext commentedI think no longer postponed.
Comment #88
catchVery unlikely we'd rename this to another procedural function now, but the problem is still there 13 years later.
Reading back through some of the terminology discussions, a big problem is that filter module is called filter module. Filter module now provides 'text formats' which are configured collections of 'filter plugins', so it should be called 'format' module or 'text_format' module. You could then have a service like format.formatter or similar with a ::formatMarkup() or ::applyFormat() method.
Comment #89
Wim LeersAmen!
And a related big problem is: #3231354: [PP-1] [META] Discuss: merge the Editor config entity into the FilterFormat config entity.
IMHO we should finally do that, and merge
filter
andeditor
module intotext_format
module. Neither https://www.drupal.org/project/format nor https://www.drupal.org/project/text_format are taken. Perhaps I should pre-emptively register both? 🤓