Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
- Misuse of the method
DrupalTestBase::randomString()
is frequently responsible for random, hard-to-debug test failures, because it generates a list of random characters that includes special characters like#, *, /, \, ?
, etc. - Developers do not expect this from the name.
- #1672764: Improve documentation of randomString() and randomName() has been committed to clarify when each of these methods should and should not be used, but the names are still not necessarily intiuitive and developers may use them without having read the documentation.
Proposed resolution
- More explicit method names would prompt developers to read the documentation and/or consider the purpose of their randomized strings more carefully.
- Rename
randomName()
torandomMachineName()
. - Potentially rename
randomString()
torandomUnsanitizedString()
.
Remaining tasks
- Need consensus on whether to rename one or both of these methods, and what the names should be,
- @sun and @djdevin have pointed out that the name
randomString()
really is exactly accurate; "a string is unsanitized by design". - @sun also pointed out that the name
randomUnsafeString()
would dissuade developers from using this method when they should (whenever unsanitized input is possible). - Any renames will need a change notice.
Related issues
- #1676910: Rename randomName() to randomMachineName()
- #1672764: Improve documentation of randomString() and randomName()
API changes
TBD
Comment | File | Size | Author |
---|---|---|---|
#6 | random-unsafe-string-1659682-6.patch | 324.24 KB | xjm |
#3 | random-unsafe-string-1659682-2.patch | 15.34 KB | xjm |
Comments
Comment #1
xjmInstances of
randomString()
:Comment #2
donutdan4114 CreditAttribution: donutdan4114 commentedI would argue that
randomString()
should remain the same, based on the name, which clearly states random. I don't think there is confusion there.However, a 'safe string' variant would be nice as well, so what about
randomSafeString()
.Also, you could make it a parameter
public static function randomString($length = 8, $safe = FALSE) {
, and then tweak the function to not include special characters if$safe == TRUE
.Comment #3
xjmFirst pass.
Comment #5
xjmOh right, can't do this because it's static.
It might just be better to rename rather than providing the wrapper anyway.
Comment #6
xjmReplacing both
randomString()
andrandomName()
. This is a way bigger patch. ;)Comment #7
xjmFor reviewers, this is the actual API change. The rest is string replace.
Comment #8
sunugh, I couldn't disagree more.
The method names randomString() and randomName() are technically correct. The actual problem are:
1) People do not read their documentation.
2) People do not properly account for character escaping in their tests.
It is not randomString() that is causing test failures. It is poorly written test and assertion code that is causing the test failures.
Tests absolutely must take string escaping/sanitization into account. Renaming the methods in the way proposed will only lead to developers using the wrong "safe" function everywhere.
This won't fix for me.
Comment #9
andypostThis correct only for some developers, so let's make it more saner for others
Comment #10
xjm@sun is correct in #8 that the problem is carbon- and not silicon-based. However:
Giving developers more obvious method names is good DX. In a perfect world devs would always consider the implications of using any random character vs. alphanumeric. However, people are only human, and experience shows us that patches with misuse of
randomString()
slip past reviewers all the time and break QA all the time. Adding the word "unsafe" improves DX by making it abundantly clear that they need to take something else into consideration.The current method documentation is sadly lacking, so it's not just people being stupid. I'd like to backport the docs improvements I have here.
I'm open to only renaming
randomString()
, but I think a docs fix by itself would not remove this headache.Comment #11
djdevinI agree with sun in #8, String is inherently unsafe in any language (outside of Drupal, outside of PHP), and that shouldn't have to be stated in the function name.
Also, I don't think it's possible to define "unsafe" or "safe" without context (markup, mysql...). Renaming random* to either of these means that there is a chance that the function will not do what it says it will do.
I would only agree with randomPossiblyUnsafeString() and randomProbablySafeString() :p
Comment #12
sunThinking about this some more, I personally could only accept a name change for randomName() [if that helps in any way].
The problem is that people should use randomString() wherever appropriate. And that means almost everywhere.
Each of those random test failures means that character escaping/sanitization was not considered at all in a test. But developers have to write tests that properly take security into account.
The only exception to randomString() are places in which a machine name input field or the like needs to be populated, which has a limited range of allowed characters. That is the only reason why randomName() exists.
I strongly object to the original presumption that is made here. Developers (and also reviewers) need to use randomString(), and everyone has to think about security. randomString() is what you use, always. randomName() is the exception, and needs a valid reason for being used.
We can:
Lastly, I do not see how this can be backported - these functions are heavily used in tests and a rename is an API change.
Comment #13
xjmSorry, what I meant only was that the documentation fixes included in my patch can and should be backported. Not the renaming. I'll factor that out into its own issue.
Comment #14
xjmMoved the docs improvements into #1672764: Improve documentation of randomString() and randomName().
Comment #15
merlinofchaos CreditAttribution: merlinofchaos commentedWhile randomString() technically correct, it is not complete, so I disagree with sun's assertion. I believe that clarity in what that method is going to return is valuable to the developer, and if you want people to be testing for security you should be telling them explicitly when they're using unsecure strings vs secure strings and not leaving it up to the chance of them reading the documentation to see what kind of random string the random string is.
Comment #16
merlinofchaos CreditAttribution: merlinofchaos commentedrandomSanitizedString() and randomUnsanitizedString() would be very clear.
Comment #17
sunThat doesn't really change the fact that randomName() has nothing to do with randomSanitizedString(), because it does not sanitize the string in any way. :) randomName() could/should be renamed to randomMachineName() to resolve that confusion, since the function's clearly documented purpose is to create a string that is compound of characters that are supposed to be compatible with most of the machine name cases in Drupal.
As @djdevin mentioned in #11, a string is unsafe/unsanitized by design. Whether you need to escape/sanitize it always depends on the context you're using the string in.
We could add a separate randomASCIIString() or randomLatinString() helper method, but then again, there are so many possible different character class cases in the first place that it's going to be hard to explain why one case is privileged over others.
Comment #18
xjmDuh. Let's do this. :)
I'm still of the opinion that we should also rename
randomString()
torandomUnsanitizedString()
as well. I think that name is more neutral thanrandomUnsafeString()
and reminds the developer exactly what they're supposed to keep in mind: whether or not the string has been sanitized. #1672764: Improve documentation of randomString() and randomName() is in as well, so hopefully the renames will prompt people to read those docs and think about sanitization when they write their tests.Comment #18.0
xjmUpdated issue summary.
Comment #19
xjmSummary updated.
Comment #19.0
xjmUpdated issue summary.
Comment #19.1
xjmUpdated issue summary.
Comment #19.2
xjmUpdated issue summary.
Comment #19.3
xjmUpdated issue summary.
Comment #20
andypostI think
randomUnsafeString()
is shorter and more sane thenrandomUnsanitizedString()
There's no sanitization just generation
Comment #21
sunI will also admit and agree that randomName() is currently being (ab)used by many tests, which "just need to implant some value for some other module."
I almost guess that most of the confusion stems from that usage actually.
Random example:
While this is a dedicated helper function, and dedicated helper functions could/should actually take the extra mile of doing things properly, it's really just one of many examples for code that does not want to care for the actual allowed and valid characters in usernames as well as e-mail addresses.
The essence is that all code like this is not testing the actual user entity, account, and registration functionality - instead, it just needs a new user account to work with. So the test code wrongly assumes it could simply get away with ignoring the possible string escaping/sanitization issues. (which especially in case of this generic helper function pretty much means that no module tests are taking possible special characters in usernames and e-mails into account...)
Now one could start to think about dedicated helper functions like randomAlphaString(), randomNumbers(), randomAlphaNumericString(), etc and lastly randomUnicodeString() [== randomString()].
In the end, however, use-case specific helper functions à la randomUserName() and randomEmail() would be the most appropriate thing to do.
Only such dedicated functions are able to generate random results that actively leverage built-in knowledge about the minimum and maximum bounds of possible values. For example, both of these are valid random examples:
a@à.ca
someone.who.dare.not.be.named+drupal@super.institute.united-kingdom.emea...
With our current, eight character limited alphanumeric machine name strings, no tests in core and contributed modules actually ensure that they are able to properly handle the real range of possible values - neither regarding data storage (schema), nor regarding secure string escaping/sanitization, nor in the user interface regarding possible layout breakage.
(All of this is closely related to #913086: Allow modules to provide default configuration for running tests, so thanks for bringing up this additional need.)
Comment #22
xjm#21 is a much bigger scope than I'd originally thought but it sounds like a good direction, and we could add these helpers one at a time without difficulty.
randomMachineName()
is then actually the first specific helper. The bigger task would be converting existing tests so that core provides good examples (instead of using machine names all over the place to sidestep the question of prior validation by Drupal, the filesystem, or whatever).Aside, our current
randomString()
is actuallyrandomBasePrintableASCIIString()
, I think. :)Comment #23
xjmI spun #1676910: Rename randomName() to randomMachineName() off into its own issue since it seems non-controversial and is easy to do by itself. Updating the summary.
Comment #23.0
xjmUpdated issue summary.
Comment #24
xjmAlso, it occurs to me that new helpers (and the conversion of existing tests to them) could be backportable.
Comment #24.0
xjmUpdated issue summary.
Comment #25
xjmBetween #1676910: Rename randomName() to randomMachineName(), #1672764: Improve documentation of randomString() and randomName(), and #1860594: Ensure that randomString() always returns a character that needs to be escaped for HTML, I think the original goal of this issue has been met. We could have other context-specific methods, but that's really a separate issue scope.