I recently noticed Drupal was writting two meta content-type tags to my pages:

  <head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="shortcut icon" href="/is118/misc/favicon.ico" type="image/x-icon" />

I believe the issue is with drupal_final_markup and drupal_get_html_head functions in common.inc.
I fixed the issue by preventing drupal_get_html_head() from adding another meta content type, however,
i dont think this is ideal.

is drupal_final_markup necessary?

Files: 
CommentFileSizeAuthor
#132 drupal-n451304-131.patch1.58 KBDamienMcKenna
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es). View
#90 duplicate.patch992 bytessreynen
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch duplicate_1.patch. View
#80 one_return_statement.patch964 bytesmcrittenden
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch one_return_statement.patch. View
#75 duplicate.patch960 bytessreynen
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch duplicate_0.patch. View
fix.patch389 bytesiamfil
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch fix_8.patch. View

Comments

loze’s picture

subscribing. Just noticed this happening after upgrading to 6.11.

ainigma32’s picture

Status: Active » Needs work

Even drupal.org is outputting the content-type meta tag twice...

drupal_final_markup was added to prevent certain attacks (see the diff here http://cvs.drupal.org/viewvc.py/drupal/drupal/includes/common.inc?r1=1.7...)

As for the patch; I suggest removing the first line of the function as well.
After that, set this issue to patch needs review so the your patch will be tested automagically.

- Arie

Damien Tournoud’s picture

Status: Needs work » Closed (works as designed)

This was introduced by SA-CORE-2009-005 on purpose, to prevent non updated themes to cause issues. The duplication of the meta content-type header is harmless.

BENNYSOFT’s picture

The duplicate of the meta content-type header is harmless.

Harmless but inelegant.

Gábor Hojtsy’s picture

Yes, it is inelegant, we (at the security team) all agree it is inelegant, ugly, or whatever you want to call it. We are not going to argue over it. Unfortunately, we found most themes vulnerable to this bug, so we needed a fix which solves the vulnerability with all themes regardless of their release schedules. Thus we needed to make sure that the content type value is output no matter what. We ensured that the output still validates against the W3C validator and that we found no standard saying we should not output the same meta tag two times. We also looked at the output in various browsers and found that it had no issues in practice.

jcnventura’s picture

Status: Closed (works as designed) » Needs work

Just one thing.. It seems that there's this thing called FastCGI that seems to have an issue with duplicate content headers. See #462336: TCPDF: Pdf generator Problem with Fastcgi for more details.

I would actually prefer if we could apply the patch in #1 with the removal of the first line as suggested in #2 to remove one of the content-type lines, at least to Drupal 5 and 6.

Knowing how drupal_final_markup() is working I see no conflict in applying this patch and still complying with all the security concerns expressed in #5.

João

Gábor Hojtsy’s picture

jcnventura: First, did you verify that when you remove one of the content type headers, it is indeed working right with TCPDF?

Peacog’s picture

My website delivers HTML4, not XHTML, so this new feature means that my site doesn't validate because of the trailing slash. Is there some configuration setting that tells Drupal whether to output HTML4 or XHTML, so that drupal_final_markup() can check which to use?

Gábor Hojtsy’s picture

Drupal outputs XHTML, and the suggested XHTML notation for HTML backwards compatibility (from the W3C) is the slash separated with a space before the closing >. Drupal hardwires outputting XHTML in so many places, that it is hard to believe that you managed to output straight HTML without heavy hacking.

Damien Tournoud’s picture

Status: Needs work » Closed (works as designed)

#462336: TCPDF: Pdf generator Problem with Fastcgi is unrelated. The "duplicate" headers in the error message is not the HTML one, but a true HTTP header.

Back to "by design".

jcnventura’s picture

Status: Closed (works as designed) » Needs work

Gábor: I am the print module maintainer, that bug was reported by one of the users, so no I can't try to reproduce it with only one tag.. For me TCPDF works fine with one or two tags.

The problem is with FastCGI which is stupidly aborting when it detects a duplicate header. That's a known problem and is already solved in their CVS snapshot (of November 2008).. If you follow one of the links in the other issue, there's a patch there which simply removes the duplicate detection code to avoid that problem. (http://www.fastcgi.com/archives/fastcgi-developers/2008-September/000048...)

However, I would guess that most host providers which use FastCGI will be using the official release version of 2007 and not the CVS snapshot, so it may be that Drupal itself may run into problems in that setup. On the other hand, I don't understand how he can be running print 6.x-1.7 (which because it uses drupal_final_markup() requires D6.11 or later), and not have the same problem with core Drupal...

Anyway, how bad is it to commit the patch in this thread and have a single Content-Type tag??

João

Peacog’s picture

I didn't have to do much hacking, just a couple of lines in phptemplate_preprocess_page to replace '/>' with '>'. But I will no doubt run into more problems as I add more modules. I'm using HTML4 for historical reasons because I ported an existing site to Drupal. I guess it's time to bite the bullet and move to XHTML.

Slightly off-topic, are there any plans to support HTML5 if it ever takes off? At that stage there might be a need for a Drupal setting to allow a choice between XHTML or HTML5.

Damien Tournoud’s picture

Status: Needs work » Closed (works as designed)

I take it that jcnventura cross-posted.

Gábor Hojtsy’s picture

jcnventura: The patch you posted deals with HTTP headers, not meta tags in HTML. Therefore looks unrelated.

jcnventura’s picture

Status: Closed (works as designed) » Needs work

Yes, I cross-posted..

And indeed, you're right, that's an HTTP header, so I very much doubt that he'll be able to generate PDFs as it is. I will investigate the issue further with the user on the other queue..

João

jcnventura’s picture

Status: Needs work » Closed (works as designed)

Yes, I cross-posted..

And indeed, you're right, that's an HTTP header, so I very much doubt that he'll be able to generate PDFs as it is. I will investigate the issue further with the user on the other queue..

João

ao2’s picture

Hi, I also came across this one.

For me how the fix is implemented is actually an issue because I would like to serve my pages with content type application/xhtml+xml and having two conflicting content types is not correct, not only inelegant, which I could well live with.

I can't even preg_match in my THEME_preprocess_page() hook because drupal_final_markup() seems to skip all the output mangling paths.

Could you please suggest a way to work around that? Otherwise I will have to hack the core (sounds like a mild menace, eh?) :)

Thanks for the attention.

With Kind Regards,
Antonio

markus_petrux’s picture

Status: Closed (works as designed) » Needs work

Sorry, to re-open, but I disagree on the status and I think it could be easily fixed. :)

Any reason for not fixing drupal_get_html_head() ?

This was introduced by SA-CORE-2009-005 on purpose, to prevent non updated themes to cause issues. The duplicate of the meta content-type header is harmless.

s/harmless/hack-ish, I would say, and it could be easily fixed with the patch provided on top of the issue. It is completely unnecessary that drupal_get_html_head() prepends the Content-type meta while this job is now done by drupal_final_markup().

As a follow up to #17, I would say drupal_final_markup() would have to check if a Content-type meta exists in the header and ensure it is placed on top of it. So maybe drupal_get_html_head() does not need to be fixed, but add a bit more logic to drupal_final_markup(). And maybe even it wouldn't have to be restricted to certain values of $hook ('page' and 'book_export_html'). Think about alternative templates to 'page' that are used to render iframes, etc (several contrib modules do). So I think the issue status would have to be switched to "needs work" and let's try to elaborate on this please.

The security issue was fixed, so now that we have (relative) peace, I think we can try to think about a better approach.

Gábor Hojtsy’s picture

@ao2: the reason we implemented it in a way that themes cannot override is exactly to ensure that it is always present first.

@markus_petrux: the reason we left the meta in drupal_get_html_head() is that we thought we cannot be 100% sure that the regular expression matches (however simple that regular expression is), and we still need to have that meta information on the page. If you can find a way which totally ensures that all themes are fixed regardless of their state, then that would be great.

markus_petrux’s picture

Since Drupal 6 has no API for using a different document type (as requested by #17), one possible approach would be to apply the patch in #0. That would mean Drupal 6 is still forcing the eixstence of the content-type meta, but injected on a different place. ie. what was doing drupal_get_html_head() to the header output, it now would be done from drupal_final_markup() only, but not on both places, which is the cause of duplicated meta tags for content-type.

Also, at the bottom of theme(), where drupal_final_markup() is invoked, I would remove the condition that triggers the call to drupal_final_markup(), so a mini-patch for discussion would look like this.

Changes to theme.inc:

-  if ($hook == 'page' || $hook == 'book_export_html') {
-    $output = drupal_final_markup($output);
-  }
+  $output = drupal_final_markup($output);

Changes to common.inc:

 function drupal_get_html_head() {
-  $output = "\n";
-  return $output . drupal_set_html_head();
+  return drupal_set_html_head();
 }

 function drupal_final_markup($content) {
   // Make sure that the charset is always specified as the first element of the
   // head region to prevent encoding-based attacks.
-  return preg_replace('/]*>/i', "\$0\n", $content, 1);
+  if (strpos($content, ']*>/i', "\$0\n", $content, 1);
+  }
 }

And if you wanted to provide support for different mime types (rather than forcing 'text/html'), then Drupal could provide a $conf variable or something else than can be defined by the theme. But this part would really be a "feature request". Hence, probably something to take into account for D7, maybe.

Gábor Hojtsy’s picture

@markus_petrux: it is totally possible that some theme function is used to generate arbitrary XML or other markup. That could easily contain markup which matches <head[^>]*> or have <head> with a different meaning then HTML. Also, as said, we cannot ensure that <head[^>]*> is actually a character sequence present in all themes, so we cannot say we ensured that the meta tag appears, if we only rely on that being matched. Plus the regular expression is purposefully case insensitive. You now put a case sensitive strpos ahead of it, so themes using <HEAD[^>]*> would not work. We went through discussing these options in the security team, and were not at all happy to come out with the solution as we did, but we had the duty to try and ensure security with all custom themes, whoever and however did them.

markus_petrux’s picture

hmm... I see.

Maybe drupal_get_html_head() could add a placeholder instead of the content-type meta tag, and that could be used in drupal_final_markup() to catch up it's a page template that was fine with that content-type meta tag, so it can be now safely added by drupal_final_markup().

Se the mini-patch for discussion would look like:

Changes to theme.inc:

-  if ($hook == 'page' || $hook == 'book_export_html') {
-    $output = drupal_final_markup($output);
-  }
+  $output = drupal_final_markup($output);

Changes to common.inc:

 function drupal_get_html_head() {
-  $output = "\n";
-  return $output . drupal_set_html_head();
+  return ''. drupal_set_html_head();
 }

 function drupal_final_markup($content) {
   // Make sure that the charset is always specified as the first element of the
   // head region to prevent encoding-based attacks.
-  return preg_replace('/]*>/i', "\$0\n", $content, 1);
+  if (strpos($content, '') !== FALSE) {
+    return preg_replace('/]*>/i', "\$0\n", $content, 1);
+  }
 }

Now, you can be sure strpos() will match.

markus_petrux’s picture

#22 follow up:

The string '<!--head-->' could be replaced by something a bit more esoteric to ensure it's unique enough.

...or if we don't find a way to remove the check in theme(), then just apply the patch in #0, so we don't output more than one content-type meta tag?

ao2’s picture

@Gábor: I understand why you did this way, I just didn't like the hardcoded content-type. But IIUC this is a non issue for D6 since the content-type is hardcoded in other places as well, and this discussion is Off Topic here anyway. So I'll have to hack the core for the time being. I just wanted to be sure :)

Opening a feature request for D7 (#475242: Support Content-Type different from text/html).

Sorry for the noise.

Regards,
Antonio

samuele’s picture

I have solved with this code in page.tpl.php file

<?php /*print $head;*/
$headlines = explode("\n", $head);
foreach ($headlines as $line) {
if ($line == '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />') {
  $line = "" . "\n";
}
print $line ."\n";
}
?>

Is there a way to do it with a function in template.php file?

ao2’s picture

@samuele, something like:


function THEME_preprocess_page(&$vars) {
  $vars['head'] = preg_replace('/]*>/', '', $vars['head']);
}

gerritvanaaken’s picture

Thank you for the quick fix @samuele and @ao2!

This needs to be fixed in a proper way, because the HTML 5 validator throws an error with this. In this particular case the validator might be wrong, because it’s not a real error to have two identical meta tags, but still ...

shelleyp’s picture

Having to echo the above, as I found the additional meta tag was added when trying to validate HTML5. It would also be nice to be able to provide our own input with this, to meet differing standards.

Note, you can fix one of the meta tags using template.php, but not the other. The one that comes before Title. The only option seems to be to crack the core code, which I won't just to validate.

jgadrow’s picture

To paraphrase this issue using an old commercial as a base:

Themer: Hey! You got some of your model in my view!
Developer: Hey! You got some of your view in my model!
Both: *munch* Whoa! This tastes like utter crap!

The argument about allowing the extra output in html_get_head is a moot point. If you NEED to place this call into drupal_final_markup (because you can't trust html_get_head to protect you from the vulnerability) then if the theme developer has done something screwy enough to mess with your regular expression, you still haven't guaranteed that the design has the content-type as the first line of the head. This will still render sites utilizing that theme vulnerable to attack. Thus, you probably should not be claiming that you have 'solved' this vulnerability issue.

While I agree that security is critical, I'm conflicted because I also believe this violates MVC. Clearly, if the theme is where the issue lies, why should it be the model's job to fix it? Granted, there are multiple examples at how Drupal already breaks the principles of MVC, but that's another argument. At most, Drupal should put out a warning for site admins to check that their themes are outputting a content-type and place the responsibility for this back where it belongs: with the theme designers. If a site admin is doing their job, they should be looking for news updates from Drupal anyways and if they decide they can't take 30 seconds to view the source of their page and verify that they see a content-type output, they deserve to be hacked!

However, I should state that I'm very against 'Big Brother' tactics like this, the seat belt laws, and pretty much anything ever made "mandatory" for my personal safety. So take my opinion with a grain of salt. ;)

Damien Tournoud’s picture

Status: Needs work » Closed (works as designed)

@jgadrow: this is only true in Drupal 5 and 6, where is was not pratical to update all the themes distributed on drupal.org *and* all the custom themes deployed in the several hundred thousands of websites around the world. This ugly regexp is not in Drupal 7.

And by the way: (1) Drupal is not MVC (I'm still not sure what MVC means in the context of a web application anyway), (2) drupal_final_markup() is part of the theming layer, so it could be argued that it is part of the View (at least in the definition of MVC you seems to have).

jgadrow’s picture

Status: Closed (works as designed) » Needs work

Wonderful that it's not in Drupal 7. So, it's only in the currently supported, stable systems. Which is, what? MOST Drupal sites then?

And I noticed that you completely skipped the major statement that I made: the fact that this vulnerability STILL exists because you can't be 100% positive that theme designers have:

1) Placed the $head variable at the beginning of their tag (or placed it at all for that matter)
2) Not done something stupid to mess up the regular expression in drupal_final_markup

So, you're forcing themers to remove an extra declaration of the content-type (or just look inelegant and inept) on the basis that this allows you to 'solve' a security vulnerability. And it STILL does not solve the vulnerability 100% of the time.

Once again, why not just leave the declaration out of html_get_head and put out a security warning advising administrators to check the output of their sites? Site admins can then alter their themes or not (at their leisure) to fix the vulnerability. You're not forcing themers to overcome a kludgy 'fix' to a problem that may not be affecting their theme anyways. Since themes are separate from the Drupal core, future updates will not invalidate a change made by a site admin meaning they make the change only once. And if a new version of their theme comes out and they overwrite their current theme. They should either already know to check the output for the content-type, or, the theme is being actively developed and the theme developer should have already altered their theme to remove the vulnerability. Note: If they haven't that's a good indicator to go with a different theme developer!

Additionally, there's the added performance hit (albeit a small one) of constantly telling PHP to remove that extra declaration that the theme I'm using didn't need in the first place. Or the extra bandwidth being consumed by the unnecessary characters if it's left in. Some people ARE charged for bandwidth, so, technically, there IS a cost (and an actual, physical cost at that!) associated with leaving the extra declaration over-and-above future semantic compatibility. So, the statement that "declaring it twice is harmless" is also misleading.

And, as for a definition of MVC:
Model-View-Controller - An architecture for programs that have graphical user interfaces. The architecture separates those parts of the program that handle the application data (the model) from those that present that data to users (e.g. graphically, on the screen) (the views) and from those parts that respond to changes made to the data (e.g. by the user manipulating the data presented on the screen) (the controllers). The architecture defines the ways these parts of the program communicate. The parts of the architecture have high cohesion and they are connected in such a way as to give low coupling.

That definition is just as suitable for web applications as it is for 'standard' applications. And I don't think you're allowed to state that Drupal is "not MVC" and then make a claim that it still follows "MVC." It's fine for a program to NOT be MVC. I had already stated in my previous post that Drupal is not truly MVC (try making your theme output an XML document utilizing XSL(T) instead of (X)HTML and serve it correctly as text/xml without hacking core... I dare ya! lol) but has tried to, largely, not interfere in the presentational layer. However, to claim that it's not MVC and then state that it could be argued that it conforms to it is, at best, contradictory.

Also, I would like to state that I'm not 'upset' or 'angry.' I know Drupal has limitations (just like any other system does) but I do enjoy in participating in intellectual debate. So, please, I'm not intentionallly trying to step on any toes here, only give a new opinion. My native language just happens to be sarcasm. :)

Damien Tournoud’s picture

Status: Needs work » Closed (works as designed)

@jgadrow: according to your definition, the whole theming layer (and more) is part of the "View", that includes drupal_final_markup(), making your point in #29 moot.

jgadrow’s picture

I have already yielded on the grounds of MVC (see #31) as it's not terribly important. However, if you insist on making this 'by design' is there a known issue section for Drupal? I believe you should add one in that case:

Known issue:
If you are running Drupal 5 or Drupal 6 and the theme that you're using has not output the $head variable, has not supplied its own content-type declaration, and has irregular code that causes the regular expression in drupal_final_markup to fail, you will be vulnerable to an encoding type attack. Additional errors may crop up in some instances if your theme is not removing an extra content-type declaration supplied by Drupal in an attempt to safeguard MOST systems from this vulnerability.

design4effect’s picture

Whether it validates or not, having duplicate headers is a sloppy fix to anything.

There is no way to avoid putting the Content-Type meta before the TITLE tag when dynamically generating titles with user content.

If no content type is set, IE tries to interpret this:

+ADw-script+AD4-alert(1)+ADw-/script+AD4-

as UTF-7 which is this:

<script>alert(1)</script>

For a full explanation please refer to: http://code.google.com/p/doctype/wiki/ArticleUtf7

So that is not an issue.

The issue is:
- drupal_get_html_head() [/includes/common.inc] tacks it on to the top of the headers.
- then, drupal_final_markup($content) [/includes/common.inc] adds one immediately after the HEAD tag (if there is a HEAD tag)

The two functions above are called in the given order. The result... is 2 identical headers.

So
- Drupal always adds it to the top of Headers
- Drupal adds it after HEAD only if there is a HEAD tag

My solution:

If there is a HEAD tag
- If there is a CONTENT-TYPE tag, remove it
- Insert a CONTENT-TYPE tag immediately after the HEAD tag

I changed the drupal_final_markup($content) function by adding two lines at the top as follows:

/**
 * Make any final alterations to the rendered xhtml.
 */
function drupal_final_markup($content) {
  // If a HEAD tag exists and there is already a CONTENT-TYPE header, remove it
  $content = preg_replace('/(<head.*?)<meta http-equiv="Content-Type"[^>]*>[\\r\\n \\t]*/si','$1',$content,1);
  // Make sure that the charset is always specified as the first element of the
  // head region to prevent UTF-7 encoding-based attacks.
  return preg_replace('/<head[^>]*>/i', "\$0\n  <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
}

Which also removes the trailing space to keep the code good looking.

- Kent

rzelnik’s picture

subscribe

jantoine’s picture

subscribe

hass’s picture

+

robert-hartl’s picture

subscribe

security issue is okay, but for themers with standards in mind there must be a simple way to influence this. Framework!

rlmaltais’s picture

I don't have an elegant solution for this, so take this with a grain of salt. This fix just 'feels' wrong to me.
1. It is fixing a security issue in IE, not drupal.
2. It is forcing output that may not be desired.

I don't know a whole lot about charsets, but I did notice this in the link from #34: "Declaring an incorrect charset can be worse than not setting one at all."

To be honest, I never would have noticed if I wasn't working with HTML5. I generally skip over this part of the header when I view source.

Like I said, I don't have a good solution for this, my current one is to disable the action in common.inc and code my themes correctly. It's a pain, but it works.

naxoc’s picture

Subscribe

asak’s picture

subscribing...

ianchan’s picture

subscribe

Summit’s picture

Version: 6.11 » 6.14

Subscribing
I added http://drupal.org/node/451304#comment-1711102 to my page.tpl.php and it worked.
But I also want the title tag to be the first and not the content line, is this possible using this core-patch. Will that ever be committed?
Hopefully no security issues doing this?

greetings, Martijn

sreynen’s picture

After reading this thread, the two meta tags don't seem to actually be "by design," but rather just one way to accomplish the security goal here. Seems like it should be possible to find a solution that accomplishes that goal and does not involve duplicate content types. Would be nice if someone in the "by design" camp would comment on the general idea of removing existing content types before inserting the new tag in drupal_final_markup().

hedac’s picture

amazing silly bugs drupal has...
oh well...

kimblim’s picture

The duplication of the meta content-type header is harmless.

..unless you care about the HTML output of your page. Come on, outputting it twice is a hack and you know it - why not check if it's already been set and then decide whether or not to output it?

Summit’s picture

Agree, also out of SEO-purposes it is better not to show it twice!
Greetings, Martijn

Damien Tournoud’s picture

..unless you care about the HTML output of your page. Come on, outputting it twice is a hack and you know it - why not check if it's already been set and then decide whether or not to output it?

We output valid HTML (nothing in the (X)HTML specification says you should only have one of those meta-tag.

But yes, it is a hack. It was made to make everyone safer and it will be gone in Drupal 7. Overall, not a big deal at all.

Feel free to come up with an efficient code to "check if it's already been set and then decide whether or not to output it". I would be happy to review this patch.

Agree, also out of SEO-purposes it is better not to show it twice!

You should fire your SEO consultant.

kimblim’s picture

We output valid HTML (nothing in the (X)HTML specification says you should only have one of those meta-tag.

Valid HTML is not necessarily good HTML! Layouts built with tables validate, that doesn't mean that it's good HTML... The HTML validator service is a tool, not a stamp of approval.

Nagyman’s picture

it is totally possible that some theme function is used to generate arbitrary XML or other markup. That could easily contain markup which matches ]*> or have with a different meaning then HTML

This is exactly how I came across this. I was generating a NewsML feed, which defines a <HeadLine> tag. Mysteriously, a meta tag was being injected into the content. Ugh.

MatthieuP’s picture

Version: 6.14 » 6.15

subscribe

mattiasj’s picture

subscribe

gollyg’s picture

subscribe

gollyg’s picture

Clearly there are some valid reasons for putting in this code, but it is (as everyone seems to agree) an inelegant hack. The people complaining here seem to be those who actually value their html validation (I include myself amongst them).

A major issue seems to be the lack of an option to disable the behaviour, and the way that this update has silently broken validation on, ironically, any site that is concerned enough about security to apply that upgrade. This is a slight betrayal of trust which is why it is evoking a strong response. I am not criticising the security team - they do an amazing job and I would rather they do things like this than not.

I have two ideas that may be worth discussing.

  • We use the current solution, but provide a theme setting to disable the behaviour. By default it is set to add the element
  • Rather than checking final markup, we use the regex on the active/enabled themes to check for the $head variable, and modify the output based upon its absence. At least the themes that are doing the right thing would not be affected

And on the issue of xhtml output, there are valid reasons to use HMTL 4.01. It would be great to have a theme level setting that triggered the empty tag syntax. I know, scratch my own itch ;)

hendrinx’s picture

Sorry - mispost

hendrinx’s picture

Did you manage to come up with a workaround? I've got a similar issue when generating xml, which has a tag defined in it. Cheers

d3zign7reak’s picture

Subscribing!

mcrittenden’s picture

Version: 6.15 » 6.x-dev
Status: Closed (works as designed) » Needs work

Sub. Changing to CNW based on #48:

Feel free to come up with an efficient code to "check if it's already been set and then decide whether or not to output it". I would be happy to review this patch.

Please don't change back to 'by design' prematurely. Yes, it was originally done by design, but it's a hack and it's a pretty big DrupalWTF for D6 at the moment. Let's see if somebody wants to try and tackle this the right way.

brianmercer’s picture

subscribed

sartogo’s picture

Just a small tweak to #25 to still give out the character encoding for html 5

<?php //print $head;
  $headlines = explode("\n", $head);
  foreach ($headlines as $line) {
   if ($line == '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />') {
    $line = '<meta charset="utf-8" />' . "\n";
   }
   print $line ."\n";
  }
 ?>

You still need to edit drupal_final_markup($content) [/includes/common.inc] like this:

function drupal_final_markup($content) {
  // Make sure that the charset is always specified as the first element of the
  // head region to prevent encoding-based attacks.
  //return preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
  return $content;
}

-- Alex Sartogo
Hot Sauce Studios

laja’s picture

subscribe

denis_sle’s picture

It's possible to solve the problem on theming level (without patching the drupal core)
Something like next line in your page.tpl.php (in used theme folder)

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php print $language->language ?>" lang="<?php print $language->language ?>" dir="<?php print $language->dir ?>">
<head>
<?php $head = str_replace('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />','',$head);
print $head; ?>
<title><?php print $head_title; ?></title>
mcrittenden’s picture

Re: #62, first of all, something like that should go in template.php, and also, won't your example remove all instances of it?

ao2’s picture

@denis_sle, please see #26, and yes, with this _workaround_ you still need to emit your own content-type in page.tpl.php

Bye,
Antonio

sreynen’s picture

In an effort to move a fix forward, I'm going to try to summarize the constraints such a fix would need to work under. Please correct any mistakes here:

  • There are two content-type meta tags being inserted, which makes the HTML bloated.
  • The first insertion is in drupal_get_html_head
  • The second insertion is in drupal_final_markup
  • The drupal_final_markup implementation uses a regular expression to check for <head>, which is not 100% reliable, as it's possible to create an invalid theme without a <head> to match this expression.
  • The drupal_get_html_head implementation will always output the meta tag at the top of $head, but $head is not necessarily at the top of <head>, so this is also not 100% reliable.
  • Because neither addition is 100% reliable, simply removing either would make the security solution less reliable, as the two meta tags couldn't cover potential gaps left by the other.
  • It's important that the content-type meta tag be inserted as reliably as possible, as the lack of this tag is a known security problem.
  • A solution that removed one of the insertions only when certain the other has already been inserted would address the duplicate tag problem without introducing any additional security vulnerability.

Is that all correct?

Danny Englander’s picture

subscribing

System6Hosting’s picture

Subscribe.

antiorario’s picture

subscribe

ayalon’s picture

still an issue...

veriKami’s picture

...issue... and annoying thing...

I have a question for security team - what happens if we add this declaration to php header - as far as I know it takes precedence over the html declaration...


header('Content-Type: text/html; charset=utf-8');

...is UTF-7 injection (in ie) is the case then...?

sobi3ch’s picture

subscribing

codepeak’s picture

subscribing

fax8’s picture

subscribing

achton’s picture

subscribing

sreynen’s picture

Status: Needs work » Needs review
FileSize
960 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch duplicate_0.patch. View

This is a simple change to drupal_final_markup() that only adds the content-type meta tag at the top of head if we can't verify that it's already at the top of head. If we can verify it's already there, there's no benefit to adding it again, so we just return $content as-is.

perusio’s picture

The patch in #75 is working correctly. Thank you.

intyms’s picture

The patch from #75 fixes this issue.
I am using:
Drupal 6.17.

mcrittenden’s picture

Re: #75, seems like it might be a bit cleaner to only have 1 return statement, like so:

if (!preg_match('/]*>\s*/i', $content)) {
  $content = preg_replace('/]*>/i', "\$0\n", $content, 1);
}
return $content;

No?

sreynen’s picture

@mcrittenden, I don't think that's really better, but it's pretty subjective and not a significant gain in speed or readability on either side (with the exact same functional result). If you think it's better, please post a patch so this can maybe get resolved before D8 is released.

mcrittenden’s picture

FileSize
964 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch one_return_statement.patch. View

This implements #78. Not sure what the Drupal coding style preference is on one or multiple return statements, so this may or may not be necessary, but it just looks better to me.

sreynen’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me. Since this is functionally identical to the previous patch, which had 2 positive reviews, I'm going to go ahead and move the status to RTBC.

Gábor Hojtsy’s picture

Because the security team introduced this code, I've sent a request to them to help double check.

pwolanin’s picture

Looks, reasonable - though this effectively is a minor performance hit for all sites without an updated theme, right?

davidwhthomas’s picture

Similar to #26, this snippet will strip out only the duplicate metatag in template.php, without a core hack/patch.

<?php
THEME_preprocess_page(&$vars){
  // Strip duplicate head charset metatag, use limit param to strip 1 instance only.
  $vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head'], 1);
}
?>

Note: The assumes the tag is always output twice, which, as it's output in core is reasonable. But if a core patch is applied, this snippet will need to be removed.

*EDIT*

Here's a version that checks the string exists twice before stripping one:

<?php
THEME_preprocess_page(&$vars){
// Strip duplicate head charset metatag
  $matches = array();
  preg_match_all('/(<meta http-equiv=\"Content-Type\"[^>]*>)/', $vars['head'], $matches);
  if( count($matches) >= 2){
    $vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head'], 1); // strip 1 only
  }
}
?>

DT

perusio’s picture

Yes. It's a good idea. It can even be implement in a module:

clean_duplicate_meta_tag_preprocess_page(&$vars){
  // Strip duplicate head charset  tag.
  $matches = array();
  if (preg_match_all('/(]*>)/im', $vars['head'], $matches) > 1) {
    $vars['head'] = preg_replace('/]*>/', '', $vars['head'], 1); // strip 1 only
  }
}
GuyPaddock’s picture

Has anyone considered proposing a theme hook that allows the theme to certify that it correctly handles the content type? I mean, after all, the heavy-handed insertion of the content type meta tag was in response to lots of themes that exist that don't correctly set the content type. I would think that, with that in mind, it would make sense to give theme developers the option of turning off the insertion behavior simply by going the "Drupal Way" and implementing the appropriate hook.

I propose something like THEME_content_type_is_provided().

Kutakizukari’s picture

My website will display just the source code one minute or the webpage the next. My web host says it is due of the duplicate content header tags. Which I really had three because I added one to my page.tpl.php file.

Gábor Hojtsy’s picture

@Kutakizukari: did the patch solve your issue (I'm suspicious that it did not)?

@all: not sure its a good idea to preg_match case insensitively through the whole generated page content. That can easily get darn slow. Did you measure this?

sreynen’s picture

@Gábor Hojtsy, this should actually be faster than the current code in the vast majority of cases, specifically cases where themes are doing things right, the cases we want to optimize for.

The existing code also does a case-insensitive search, but with preg_replace rather than preg_match. Because preg_match stops after the first match, and in the vast majority of cases a match will be found very early in the document, preg_match will almost never search the whole document. preg_replace, on the other hand, defaults to replacing all matches, meaning it currently does look through the whole document in every case.

I do think the preg_match could be case-sensitive, though, since it would still be case-insensitive on the replace. That would mean maintaining duplicate tags on invalid (uppercase) XHTML in themes, which I think is probably fine with people who don't care about valid markup anyway.

We could also apply a $limit of 1 to the preg_replace while we're at it, and speed it up even for the cases of bad themes. I'll make another patch with all of this soon if no one else gets to it first.

[edited to fix a typo]

sreynen’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
992 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch duplicate_1.patch. View

This patch is essentially #80, with case-sensitive rather than case-insensitive preg_match. When I went to add a limit of 1 to preg_replace, I discovered it was already there, so some of what I wrote in #89 is wrong. Still, this should be faster than the current code in most cases now that it's case sensitive.

markus_petrux’s picture

Just a couple of minor improvements/suggestions over #90:

1) if the condition of the IF statement is reversed, then a) the patch is simpler, and b) the result of the preg_replace does not need to be assigned to $content.
2) preg_match could use a delimiter that does not exist in the regular expression, making it a bit easier to read.

The patch would look like this:

   // Make sure that the charset is always specified as the first element of the
   // head region to prevent encoding-based attacks.
+  if (preg_match('`<head[^>]*>\s*<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />`', $content)) {
+    return $content;
+  }
   return preg_replace('/<head[^>]*>/i', "\$0\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />", $content, 1);
sreynen’s picture

@markus_petrux, that's essentially undoing the change in #78, which is fine with me, but is starting to feel a bit like bikeshedding. Could you post an actual patch for review to keep this moving?

bleen’s picture

Subscribing

hellaswebnews’s picture

Ant help on the error below regarding printing would be appreciated.

www.hellaswebnews.com

Fatal error: Call to undefined function drupal_final_markup() in /home/.porche/myftp/hellaswebnews.com/sites/all/modules/print/print.pages.inc on line 32
array(4) { ["type"]=> int(1) ["message"]=> string(49) "Call to undefined function drupal_final_markup()" ["file"]=> string(80) "/home/.porchemyftp/hellaswebnews.com/sites/all/modules/print/print.pages.inc" ["line"]=> int(32) } n/a

sreynen’s picture

@hellaswebnews, that looks like an unrelated issue that should have a separate thread.

dtsio’s picture

Subscribing

dragan_r’s picture

Subscribing

hillaevium’s picture

Subscribing :)

psynaptic’s picture

Version: 6.x-dev » 7.x-dev

Seeing what happens...

psynaptic’s picture

Version: 7.x-dev » 6.x-dev

Only effects 6.x.

stemount’s picture

Only affects 6.x.

momper’s picture

subscribe

DamienMcKenna’s picture

Could this be wrapped in a variable_get() check to allow some sites to disable it when needed? e.g.:

function drupal_final_markup($content) {
  // Make sure that the charset is always specified as the first element of the
  // head region to prevent encoding-based attacks.
  if (variable_get('duplicate_meta_content_type', TRUE)) {
    if (!preg_match('/]*>\s*/', $content)) {
      $content = preg_replace('/]*>/i', "\$0\n", $content, 1);
    }
  }
  return $content;
}
mcrittenden’s picture

@DamienMcKenna, why would some sites need to disable it? Just curious as to what I'm missing.

sreynen’s picture

@DamienMcKenna if you're going to create a variable (and I also don't see the point of that, as no one wants duplicate meta tags), you need to also provide an interface for updating that variable.

As this is currently in "needs review" status and we're well over a year in now, can we maybe focus on reviewing the current patch before we introduce more new ideas? If there's a better solution, let's state the benefits clearly and change the status to "needs work" while we work on that better solution. Or better yet, submit a better patch. But if there's just a different solution, that's not really helping move this forward.

TD912’s picture

Issue tags: +favicon

This might also be affecting how favicons are displayed. The $head prints out a "shortcut icon" that does not change when I set a custom icon via the Theme settings. It always prints out "site/misc/favicon.ico". I could work around that by manually overwriting the favicon.ico file there, but I'd prefer not to.

hass’s picture

Issue tags: -favicon

Do not highjack this case, please.

Kutakizukari’s picture

@Gábor Hojtsy the host that I was using to get Drupal running I had to use Apache 1.2, PHP 5 Flex, CGI to get the file temporary directory right. I followed this https://members.nearlyfreespeech.net/wiki/Applications/Drupal.

File system settings and permissions

• All the Drupal files should belong to the web group (25000).
o All files you uploaded to the server will have your user ID for Owner and Group (A number like 165576, or "me"). The "web" group's number is 25000. You can change your files' group to web by typing "chgrp web filename" or "chgrp 25000 filename".
• sites/default/files or sites/default/yoursite.com/files should have 775 permissions, and again, belong to the web group.
• For the temporary directory, PHP's safe mode will keep it from writing to the system's /tmp directory. You should set this to your /home/tmp directory. To get the full path to this directory, visit your admin/reports/status/php page on your Drupal site and scroll down to the PHP Variables section and find the _SERVER["TMPDIR"] variable, which should look something like /f1/content/site-name/tmp/. Use that value for your Drupal site's temporary directory.
• Alternative: If the above doesn't work you might need to try creating a subdirectory of /home/tmp and chgrp it to web. Then in Drupal, go to Administer > Site Configuration > File System and update the temp directory path to use the subdirectory you just created. (Applicable to 6.16 and possibly other versions.)

Was able to get it to work with no problems running Apache 2.2, PHP 5.2 Fast, CGI.

Status: Needs review » Needs work

The last submitted patch, duplicate.patch, failed testing.

mattwfx’s picture

What about delete some code from line 124 in includes/common.inc?

From:
$output = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n";

To this:
$output = "";

Simple... but it is safe?

_______________
Sorry for my English

nedim.hadzimahmutovic’s picture

Small modification to #84:


function THEME_preprocess_page(&$vars){
// Strip duplicate head charset metatag
  $matches = array();
  preg_match_all('/(]*>)/', $vars['head'], $matches);
  if( count($matches) >= 2){
    $vars['head'] = preg_replace('/]*>/', '', $vars['head'], 1); // strip 1 only
	$vars['head'] = preg_replace('/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/', '', $vars['head']);
  }
}

Now there is no blank line.

anujahiniduma’s picture

try this. this will do the magic.
function drupal_get_html_head() {
//$output = "\n";
return $output . drupal_set_html_head();
}

adamcadot’s picture

subscribe

kndr’s picture

#111 works. Thanks for simple workaround!

tomass_zutis’s picture

#111 works. Thanks for simple workaround! +1

benschaaf’s picture

#111 worked to remove the second occurrence for me as well- which is great!

Is it possible for the first occurrence to output in the html5 meta tag format for charset?

<meta charset="utf-8" />

sreynen’s picture

@snowfox, that format is only a new option in HTML5. The longer version Drupal uses is still valid in HTML5.

benschaaf’s picture

@117 Correct. They would both still be valid in HTML5 (but what isn't?). But, I would prefer to use the correct(er) more valid(er) flavor of meta tag declaration.

lizac’s picture

#111 also worked for me. Is this the best solution?

Also, just for the sake of the conversation on this thread. does this issue has been solved in D7?

rosshj’s picture

@snowfox

re: <meta charset="utf-8" />

Would love to know how to do this as well.

arjun0819’s picture

Status: Needs work » Needs review

#75: duplicate.patch queued for re-testing.

ReKoNe’s picture

If you have more than 2 lines displayed (which should not append) :

<?php
function THEME_preprocess_page(&$vars){
// Strip duplicate head charset metatag
  $matches = array();
  preg_match_all('/(<meta http-equiv=\"Content-Type\"[^>]*>)/', $vars['head'], $matches);
  $nb_matches = count($matches);
  if($nb_matches >= 2){
	$vars['head'] = preg_replace('/<meta http-equiv=\"Content-Type\"[^>]*>/', '', $vars['head'], $nb_matches - 1); // strip all but one
    $vars['head'] = preg_replace('/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/', '', $vars['head']);
  }
}
?>

this is a small change from #111

Eduart’s picture

Issue tags: +Drupal, +tags, +meta, +ouput

#111 it works for me thank you!
subscribed

chinita7’s picture

Thanks #111 also works for me.

janton’s picture

thanks!

jelo’s picture

Sorry, I did not read the whole discussion, just glanced through and am wondering if 3 years after this was reported a solution has been found that
- avoids the security issue with IE
- avoids a duplicate declaration of content type?
In 6.26 core I am still having this issue... Is #111 the agreed on solution that should be implemented in the theme?

aaronmeister’s picture

Version: 6.x-dev » 6.26

The snippet from #122 worked perfectly. Thank you for providing all the information necessary to insert this code to fix this issue, especially for a non-coder like myself.

I'm not sure if this is a core issue or a theming issue, but I think, for the betterment of the Drupal community, this should be included in documentation and applied to core files.

hass’s picture

Version: 6.26 » 6.x-dev
perusio’s picture

I think that the simplest thing is just to first remove all tags and add the correct one (for preventing UTF-7 encoding attacks on IE) later:

function drupal_final_markup($content) {
  // To avoid duplicate meta tags for Content-Type we remove all before and
  // add it afterwards immediately after .
  if(preg_match('/]*>/i', $content)) {
    $content = preg_replace('/]*>/i', '', $content); // strip all
  }
  // Make sure that the charset is always specified as the first element of the
  // head region to prevent encoding-based attacks.
  return preg_replace('/]*>/i', "\$0\n", $content, 1);
}

I'll provide a patch later (I'm on a very old D6 site now).

DamienMcKenna’s picture

(fixed spelling of the "content-type" tag.

DamienMcKenna’s picture

This version of the patch removes all existing http-equiv meta tags and then adds its own properly-formatted one at the top, thus ensuring that a) there are no duplicates, b) the first tag in the HEAD section will always be a correct http-equiv meta tag. Additionally, it will use the first http-equiv 'content' definition it finds, which might be useful for sites that needed to hardcode its own version, but most sites will not notice this additional step. Lastly, it removes the meta tag that was manually included via drupal_get_html_head() as that had become superfluous.

Note: the logic will still create duplicate meta tags should an existing meta tag exist with either of the following formats:

<meta http-equiv='Content-Type' content="text/html; charset=utf-8" />
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />

Any feedback on this approach to resolve the problem?

DamienMcKenna’s picture

FileSize
1.58 KB
PASSED: [[SimpleTest]]: [MySQL] 190 pass(es). View

Doh.

mheinke’s picture

#132: drupal-n451304-131.patch queued for re-testing.

RytoEX’s picture

I hate to ask at the risk of stirring the pot, but what is the status of this issue? I notice that there is a patch that is marked "PASSED" (http://drupal.org/comment/reply/451304#comment-6483206), but then it was "queued for re-testing", and nothing has happened since then (September 2012). What exactly does that mean for this issue? Thanks!

mheinke’s picture

im not exactly sure if this was solved or not.

but it should be apart of a release.

geerlingguy’s picture

I would RTBC... but this adds some regex which would be run on every page request. It won't affect load times much, but is there any other way we could do this?

Status: Needs review » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.