Currently, the Access Control page contains the following code:

<td class="permission">administer blocks</td><td align="center" title="administer blocks"><div class="form-item">
 <label class="option"><input type="checkbox" name="edit[1][administer blocks]" id="edit-1-administer blocks" value="administer blocks"   class="form-checkbox" /> </label>
</div>
</td>

Notice id="edit-1-administer blocks". This is an illegal id value because it has a space in it.

I'm guessing that this happens other places in Drupal as well, so I've written a function that can be used to 'filter' id values before they get written to the page:

/**
 * Converts a string to a suitable html ID attribute.
 * - Preceeds initial numeric with 'n' character.
 * - Replaces space and underscore with dash.
 * - Converts entire string to lowercase.
 * 
 * @param string $string
 *   the string
 * @return
 *   the converted string
 */
function drupal_id_safe($string){
  if(is_numeric($string{0})){
    $string = 'n'. $string;
  }
  return strtolower(str_replace(array(' ', '_'), '-'));
}

Does this seem like the right approach? Is this a good name for this function?

I'm working on a "select all" checkbox form element, but it can't proceed until this problem is solved in core.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Steven’s picture

I think it would be safer to use:

  return strtolower(preg_replace('/[^a-zA-Z0-9_-]+/, '-', $string));
jjeff’s picture

Yes, but your regex still leaves underscores in tact. So it should probably be:

return strtolower(preg_replace('/[^a-zA-Z0-9-]+/, '-', $string));

I like this approach. But keep in mind that all non-alpha-numeric characters will get converted to dashes. This means that:

title: "War And Peace" will become title---war-and-peace
M&M's will become m-m-s

I guess that's okay with me.

webchick’s picture

What about a second regex that does replaces multiple instances of - with just a single -? Or too much processing for a little function that's going to be used everywhere? Would doing preg_match on non-alphanumeric/hyphen characters and _then_ execute the replace logic improve this?

webchick’s picture

Here's the current version with Steven's suggested changes and jjeff's revision, plus a fix to a parse error:

/**
* Converts a string to a suitable html ID attribute.
* - Preceeds initial numeric with 'n' character.
* - Replaces space and underscore with dash.
* - Converts entire string to lowercase.
* 
* @param string $string
*  the string
* @return
*  the converted string
*/
function drupal_id_safe($string) {
  if (is_numeric($string{0})) {
    $string = 'n'. $string;
  }
  return strtolower(preg_replace('/[^a-zA-Z0-9-]+/', '-', $string));
}

Handy little function! A couple queries:

1. Should we be using ctype_digit rather than is_numeric?
2. Is the {0} syntax supported in PHP 6? I remember either that or [0] wasn't, but can't remember which one.

jvandyk’s picture

Status: Active » Needs work
FileSize
883 bytes

We already have form_clean_id() in form.inc. Here's a patch that modifies that function with the above code.

But after sleeping on it, and thinking about the number of times this will be called on a page, I think this is a bug in the module that is creating a form with a checkbox with a bad id in it. So -1 for this patch, and +1 for just having code that creates forms use a correct id in the first place.

moonray’s picture

It seems to me that underscores need to be converted to dashes in id names.
Take this as an example (from issue.inc):

$group .= drupal_get_form('project_issue_query_result_quick_search', $form);

Now, the underscore version here is required to make the call to theme_project_issue_query_result_quick_search($form), but now we're left with a form with the id project_issue_query_result_quick_search

<form action="/project/issues" method="post" id="project_issue_query_result_quick_search">

So, even a preg is too expensive, there should still be an underscore replace added to the form_clean_id() function.

moonray’s picture

Version: 4.7.2 » 4.7.4

Still valid in 4.7.4

webchick’s picture

Version: 4.7.4 » 5.x-dev

API changes won't happen in a stable release.

Still a useful function though. :)

moonray’s picture

Priority: Normal » Critical
Status: Needs work » Needs review
FileSize
6.07 KB

Here's a patch that should fix the underscores in IDs and CLASSes so that the css doesn't need to use underscores.

Hardcoded HTML in modules will obviously need to be manually fixed. Also CSS files will need to be updated for those modules that use underscores in CLASS and ID attributes. I did a quick search, and there don't appear to be very many in the modules ported to DRUPAL-5: calendar (patch submitted), flickrhood, gallery, greybox, mysite, panels, views_bonus, and widgeditor.

Drupal core itself has no CLASS and ID attributes that contain underscores.

The only thing I haven't checked is javascripts that might possibly reference IDs or CLASSes with underscores.

I believe this should be implemented in DRUPAL-5 before release, or we'll have to wait for DRUPAL-6 before getting this in...

webchick’s picture

Priority: Critical » Normal

"critical bug" status is restricted to bugs which severely cripple the functionality of Drupal. This does not qualify.

RobRoy’s picture

Status: Needs review » Needs work

We probably don't need a form_clean_class as well. Why not just use the form_clean_id as the selectors should have the same validation.

moonray’s picture

The reason I added a separate function to clean the classes, was because I'm not sure whether the following like would apply to the classes as well:

$id = str_replace('][', '-', $id);

I'll be happy to have them all run through one function, provided it won't mess anything up.
I think you are right, though, and they probably could run through the same class... however, I'd like confirmation, to be sure. :-)

RobRoy’s picture

They can run through the same function. Go for it!

moonray’s picture

Assigned: Unassigned » moonray
Status: Needs work » Needs review
FileSize
5.72 KB

Updated patch using the form_clean_id function instead of the form_clean_class funtion for CLASSes.

RobRoy’s picture

Looking good. It seems like there is a lot of repetition, not sure if we can focus on this at a higher level to clean ALL $element['#id']s or if we want to do it this way like right before display like check_plain(). Thoughts?

moonray’s picture

If there's a possibilty that the id's and class's values can be changed AFTER cleanup and BEFORE being displayed (by theme_xxx()) then I think it needs to be done as it is now.

If, however, there is no possibility of changes between the one cleanup (maybe form_builder() is good place?) and the call to theme_xxx() then that would be a lot more elegant, and probably less prone to errors...

Steven’s picture

Status: Needs review » Needs work

Definitely needs to happen at a higher level. We already assign id's for those elements that don't have one yet, and really, only the autogenerated ones should be cleaned.

But, it seems the latest patch doesn't actaully do anything about spaces, which is what the issue was originally about?

moonray’s picture

The question is: do we trust developers to be aware of the issue with underscores and spaces in the CLASS and ID names? Personally, I'm paranoid; I'd rather prevent (auto fix). ;-)

Either way, I think the code can be rewritten to only call the form_clean_id function from the form_builder function.

QUESTION: do we leave user assigned ids and classes alone, or de we clean them?

RobRoy’s picture

Steven's right about spaces and that we should only be cleaning auto-generated class/ID names, not user defined.

moonray’s picture

What if a developer enters the following (which will determine the id and classnames)?

  $form['cover fid'] = array(
    '#type' => 'file',
    '#title' => t('Cover Image'),
    '#size' => 47,
  );

Shouldn't this raise an error somewhere? (different issue, but still related)

RobRoy’s picture

No, that goes under the GIGO (Garbage In - Garbage Out) philosophy. Developers are expected to know how to properly construct a form.

moonray’s picture

Status: Needs work » Needs review
FileSize
2.26 KB

New patch that doesn't affect pre-defined ids.
Classes are all user defined, so no longer affected by form_clean_id.
Spaces are now being cleaned up by form_clean_id as well.
Function form_clean_id is now called from function form_builder (central location), so removed all instances in theme_xxx (redundant).
Had to add a call to form_clean_id in drupal_prepare_form to fix the form's id.

Hopefully this addresses all issues while adding the smallest amount of overhead. :-)

webchick’s picture

No, that goes under the GIGO (Garbage In - Garbage Out) philosophy. Developers are expected to know how to properly construct a form.

Hm. Really? That reduces the usefulness of this patch then imo.. I'd love to be able to do stuff like:

foreach ($categories as $category) {
 $form[$category] = something
}

and have form_builder be smart enough to convert a form ID of "Stories about food" to "stories-about-food." Not doing this in the "root" form builder means an awful lot of duplicated code cleaning up IDs in various contrib modules, etc. before they're passed to the form.

But whatever the consensus is, I guess...

RobRoy’s picture

Yeah, I could be wrong. I guess I shoulda put a IMHO in my comment. ;) I see your point about increasing the utility of this if we go after contrib names as well. I'm fine with doing it webchick's way. Now I'll shut my mouth.

moonray’s picture

From a coding point of view, it's very easy to add fixing of existing IDs (only one more call to form_clean_id is required).

Now, if a contrib module uses spaces or underscores in their "name" and they're it gets auto-fixed, their CSS ID and CLASS names need to be updated. But I guess they'll figure that out when their presumed IDs and CLASSes don't get styled properly, and fix it.

Now, I guess the question is, do we raise an error, or do we auto-fix? Raising an error will promote cleaner code, whereas auto-fixing is more of a "I don't need to think about it" approach.

Steven’s picture

webchick: as far as the element name goes, it does seem that PHP does restrict the valid names for form elements, though the XHTML specs do not. I can't seem to find anything specific on php.net about which characters are valid for GPC keys though.

But this has little to do with this patch. The only thing that we're concerned about here is the XHTML id, which must be unique and is limited to a set of characters. In your example, you don't specify #id, so it is autogenerated based on the element name / parents.

I think the number of times when you need to dynamically supply names or id is quite small, and when you do, it is easy for a module to call form_clean_id() itself.

I'd still prefer it if we removed all characters except an allowed set, rather than just removing 3 disallowed ones though.

moonray’s picture

Steven: I'd still prefer it if we removed all characters except an allowed set, rather than just removing 3 disallowed ones though.

That, I agree, would be the best. I thought there were concerns about performance (regexes are cpu intensive), which is why I went for the cheaper str_replace solution.

Is the overhead worth the safe IDs?

p.s.
From the description of str_replace on php.net: If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().
I'm assuming this means it's much more CPU/memory intensive to use preg_replace than using str_replace. But how much more expensive is it really?

dvessel’s picture

What's wrong with underscores in ID's & classes? They are perfectly legal within CSS2. Just get rid of the spaces. There are some instances of underscores used outside of this anyway. I know taxonomy links do.

moonray’s picture

Using underscores wasn't always part of the CSS2 spec.

Underscores are legal in HTML attributes according to the HTML standards but not according to the original CSS standards (this has been amended in the CSS2 errata document so that CSS now allows the underscore).

Between mistakes in implementation and changes to the specification, browser behavior with regard to underscores is rather convoluted.

  • Netscape 6.x permits underscores and escaped underscores.
  • Navigator 4.x honored the restriction against underscores, and so any class or ID name with an underscore will be ignored by Navigator 4.x, so the associated styles will never be applied. It also ignored escaped underscores.
  • Internet Explorer 4.x and 5.x for Windows erroneously allowed underscores, and so were both non-conformant on this point until the errata was published. The same is true of IE4.x and 5.x for Macintosh.
  • Internet Explorer 6 for Windows, published after the errata, permits underscores and escaped underscores.
  • Opera 3.x through 5.x does not recognize underscores or escaped underscores, and so acts the same as Navigator 4.x in this regard.

dvessel’s picture

If underscores can be prevented easily then cool, make it happen. But look at the list of browsers.. Since when was Drupal concerned with ancient browsers? There would be larger issues if that were the case.

Steven’s picture

Status: Needs review » Fixed

Tested and verified. The spaces are gone.

Let's assume people only use code-like id's and names for now. That way form_clean_id() can keep the str_replace, which is faster (remember, it's called for every element).

I also replaced two other instances of str_replace with form_clean_id (form id, form token). This is more consistent.

Committed to HEAD, thanks.

Anonymous’s picture

Status: Fixed » Closed (fixed)