In node.tokens.inc, I found a few errors with references to $node->body.

The lines in question are 139-148

        case 'body':
          if (!empty($node->body)) {
            $replacements[$original] = $sanitize ? $node->body[0]['safe'] : $node->body[0]['value'];
          }
          break;

        case 'summary':
          if (!empty($node->body)) {
            $replacements[$original] = $sanitize ? $node->body[0]['safe_summary'] : $node->body[0]['summary'];
          }
          break;

There are two problems in the above code. First, $node->body is keyed by language code, then field instance. Second, 'safe' and 'safe_summary' do not exists in the object.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

te-brian’s picture

FileSize
1.24 KB

Ok, I'm not 100% sure about this one, but here's a patch.

I keyed the $node->body by FIELD_LANGUAGE_NONE and changed the 'safe' and 'safe_summary' to actual calls to check_markup().

Patch Attached.

David_Rothstein’s picture

FileSize
1.08 KB

Adding FIELD_LANGUAGE_NONE definitely makes sense, but I looked into it a bit, and I'm not sure why we would need to call check_markup() directly?

The Field API already takes care of sanitizing the text correctly and putting it in 'safe' and 'safe_summary', so in general it seems like it would be wasteful to run it through check_markup() again. What is the use case where you found that 'safe' and 'safe_summary' don't exist? I just tried it now via a direct call to node_tokens() passing in the result of node_load() for the $node object, and the 'safe' and 'safe_summary' keys were there.

The attached patch is what I used for that. I think this is the correct patch - and anything that doesn't have 'safe' or 'safe_summary' indicates a bug in the calling code rather than in this function - but I could be totally wrong about that :)

David_Rothstein’s picture

Also, it is an interesting question whether it is even the responsibility of node.module to define these tokens, or rather they should all be defined by the Field module (since the node body is essentially just another field). However, that's probably not for this issue to solve :)

Dries’s picture

Status: Needs review » Fixed

Committed to CVS HEAD. Thanks.

plach’s picture

Status: Fixed » Needs work

By using FIELD_LANGUAGE_NONE we are skipping language negotiation/fallback here. This problem is still being discussed in #571654: Revert node titles as fields.

yched’s picture

- Plach is right. + the question of how tokens should behave wrt field values languages doesn't seem clear to me.

- 'safe' and 'safe_sumary' can indeed not be expected to be found in a loaded $node. Text module precomputes them and stores them in field cache when a text field is loaded - text_field_load() -, but *not* if the input format is not cacheable, in which case they are only added at node display time - text_field_sanitize().

te-brian’s picture

Yeah, @yched, this is what I experienced ... I think. After reading David's review I just assumed the particular version of HEAD I was using at the time had something else going on, because I can not reproduce safe and safe_summary not loading now.

Would your recommendation be to check for their existence and then conditionally sanitize the output? Or is this a bigger issue than this one instance?

plach’s picture

Issue tags: +translatable fields
fago’s picture

hm, I stumbled over the same problem when coding entity metadata integration. Previously I used hook_sanitize() to sanitize the field values if they are not present, but that's not possible any more since #658302: repurpose hook_field_sanitize() into hook_field_prepare_view(). I wonder how a fields can be sanitized now? Is there an API function for that?

My current hack works for text fields only - however a generic fied-type agnostic approach would be certainly better.

yched’s picture

hook_field_sanitize() was a flawed concept. There is no such thing as generic field data sanitation, even inside a given field type - it depends on what you want to do with the data and which formatter runs. All formatters don't have the same sanitation requirements.

The fact that formatter functions, hook_field_formatter_view(), now return render arrays instead of being theme functions mean they can do their own sanitation trhemselves.
field_format() should take care of running a given formatter on a given field value - that is, once it gets fixed in #612894: field_format() is just a pile of code that doesn't work :-(.

I don't think, however, that tokens should use the 'formatters' system. We don't want to have tokens like [node:field_foo|formatter_name], do we ? Plus, formatters can have settings, and this becomes unmanageable.

Each field type should expose its own tokens, and the code expanding its tokens should take care of doing its own sanitation if needed.

In the specific case of text fields - or in the more specific case of title and body fields (since currently we hardcode tokens for just those fields), the _text_sanitize($instance, $langcode, $item, $column) helper function can be used to return the sanitized 'value' or 'summary'.
It takes into account the fact that sanitized data is pre-computed at load time when possible, and runs the right function (check_plain or check_markup) according to the instance configuration.

fago’s picture

>There is no such thing as generic field data sanitation, even inside a given field type - it depends on what you want to do with the data and which formatter runs. All formatters don't have the same sanitation requirements.

Really? I can't think of a case where sanitation requirements are different when the formatter changes? Could you give me an example?

@entity-metadata:
What I want here is to get the actual, useful data stored in the field, e.g. for exporting to XML. In case of text fields having input formats the interesting data is the already processed data as the text input is written with certain text processing in mind. Thus it needs to be applied to get the desired output. However processing the data is tied to sanitizing the data - not a problem, but that's the cause why I ask for such an API function.

@token: Well I think it would be useful to have different formats available in tokens, but yes I see the problem with unmanageable settings. Using build modes would work, but it isn't the best option as different token formats might be desired for different uses. So ideally I think one could define multiple formatting presets depending on the field/data type which then can be applied using tokens [node:field_foo|preset_name].

yched’s picture

> "I can't think of a case where sanitation requirements are different when the formatter changes? Could you give me an example?"
text_default and text_trimmed do not need to sanitize the summary
text_plain does not need to run check_markup()
As been pointed in the 'token in core' thread, sanitation depends on the target and context.

Formatters are targeted to produce HTML output. We would need to hear from the token folks here, but IMO formatters are not appropriate for token generation.

fago’s picture

>text_default and text_trimmed do not need to sanitize the summary text_plain does not need to run check_markup()
I see, I missed text_plain. However I don't think it does well for processed text fields. If the field is configured to have text processing, the text plain formatter will bypass it - thus it won't respect the setting. I think it should run check_markup() to do text processing and then strip the tags. Actually check_markup() is a bit misnamed, as it doesn't only checks markup, it applies the filters which might do much more than just checking markup.

So well, stripping tags is also a kind of sanitation, so I have to agree with you that sanitation depends on the formatter. But still (text) processing depends only on the field instance - we already have the text_processing flag in the instance settings. So a "give me the processed data" API function would imho make much sense.

>As been pointed in the 'token in core' thread, sanitation depends on the target and context.
Whether sanitation is necessary - yes. But how the data needs to be sanitized can only know the token provider. Still for some certain use cases you need extra sanitizing steps, e.g. for use in URLs. Those then can be applied by the token consumer.

mcarbone’s picture

FYI, this will get test coverage: #701818: Several tokens are broken, especially file tokens; test coverage of every core token

Although those tests will have to wait until the issue here is resolved, as it's not very clear how to test the body and summary tokens at the moment.

Also, should this perhaps be critical? Otherwise we are potentially shipping with some broken tokens. Not quite certain of myself to flip the switch, tho.

plach’s picture

I think this could be solved by #657972: Make field fallback logic actually reusable. Another possible solution, having monolingual tokens, might be relying on #632172: Node language and field languages may differ, which should allow to use $node->language to select an always-valid field value.

Not sure about the critical status, though.

mcarbone’s picture

Re: the summary and safe_summary keys -- I'm finding that node_load loads the right value for them only if a summary was added separately when creating the node. If the summary wasn't expanded and added separately, node_load has empty values for these keys (although the summary renders properly on the front page, which means that the rendering process doesn't use the summary and safe_summary values directly).

Re: the token tests -- I was able to test the body and summary fields by making sure to explicitly add a summary, so that issue no longer relies on this. However, if those tests are committed first, and this issue ends up changing the body and summary structure, those will need to be fixed in the tests.

plach’s picture

Status: Needs work » Needs review
FileSize
1.13 KB

This should work also with locale enabled, let's leave true multilingual handling to contrib modules.

plach’s picture

Issue tags: +Quick fix
plach’s picture

FileSize
1.13 KB

Actually the previous patch was only partially correct.

plach’s picture

Priority: Normal » Critical

At the moment the body token simply does not work for nodes having language different from LANGUAGE_NONE.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Looks good.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Looks like a straight-forward fix. Committed to HEAD. Thanks!

David_Rothstein’s picture

Status: Fixed » Needs work

Is the issue with 'safe'/'safe_summary' not always being available in this token being handled in some other issue? As far as I can tell from above, it isn't, so this is not fixed yet.

catch’s picture

@David - has anyone reproduced that bug yet? As far as I can see it's not been.

plach’s picture

untagging

yched’s picture

see #6

yched’s picture

Status: Needs work » Needs review
FileSize
1.21 KB

Attached patch should fix :
- the fact that a loaded $node does not contain precomputed sanitized strings if the input format is not cacheable (PHP filter...) (see #23 and #6)
- the fact that the key for the sanitized full body is 'safe_value', not 'safe'

catch’s picture

Could we check if value and safe_value are set before calling text_sanitize? Seems like that'd save a check_markup().

I'm wondering a little if this is a duplicate of #691078: Field tokens, although that's pushed to contrib now, but still.

yched’s picture

Could we check if value and safe_value are set before calling text_sanitize? Seems like that'd save a check_markup().

that check is done within _text_sanitize() - the function exists because text.module's own formatters needed that same extra "sometimes it's already there, sometimes not" logic.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Doh, good point.

Dave Reid’s picture

We don't have tests to back this up?

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Looks like tests are covered in #701818: Several tokens are broken, especially file tokens; test coverage of every core token. So I think this is good to go, providing that one makes it in. I just bumped it to critical.

Committed to HEAD. Thanks!

catch’s picture

Status: Fixed » Closed (duplicate)

Tests are added in #701818: Several tokens are broken, especially file tokens; test coverage of every core token, however that patch doesn't have the correct fix for this particular issue. I've marked this as duplicate and iinked from that issue.

catch’s picture

Status: Closed (duplicate) » Fixed

Or that's another option :)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.