I have attached a CSV feed to a page, however the body text is being exported as plain text.

Is this by design or have I overlooked a setting?

Thanks

John

Comments

exiled_hammer’s picture

Status: Active » Closed (works as designed)

Will answer my own question!

It's by design, see template_preprocess_views_bonus_export_csv()

burningdog’s picture

Title: HTML Stripped from CSV » Body field should include html tags for CSV export
Status: Closed (works as designed) » Active

If this is "by design" then the design is wrong. The body field has a checkbox option which reads, "Strip HTML tags". That means the default behaviour should be including html tags, which views_bonus doesn't do.

exiled_hammer’s picture

I agree it's not obvious but that's what lines 38 to 46 of views_bonus_export.theme.inc imply:

  // Format row values.
  foreach ($vars['themed_rows'] as $i => $values) {
    foreach ($values as $j => $value) {
      $output = decode_entities(strip_tags($value));
      if ($vars['options']['trim']) {
        $output = trim($output);
      }
      $vars['themed_rows'][$i][$j] = $wrap . str_replace('"', $replace_value, $output) . $wrap;
    }
  }

I came across this because my client wants a CSV file for the English version of his site so that it can be translated and then imported back using the node_import module.

So, unless there is a good reason for the tags being stripped, should it be patched?

burningdog’s picture

Most of the html tags *do* need to be stripped, because by the time the above function starts dealing with them, they're in full html, which is unsuitable for a csv export. For instance, I have a CCK colorpicker field which defines the colour of a background in my node (on a per-node basis). The value of it is #b00f0f but when it hits template_preprocess_views_bonus_export_csv() its value is:

<div class="colorpicker #b00f0f">#b00f0f</div>

I can see that it was easier for the views_bonus author to simply set $output = decode_entities(strip_tags($value)); - which works well for any field that we *don't* want html in - but doesn't work for any field that we *do* want html in.
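To make the trade-off concrete, here is a minimal sketch in plain PHP of what that one line does to both kinds of value. It assumes html_entity_decode() as a stand-in for Drupal's decode_entities(), and the helper name export_clean() and the sample body value are made up for illustration.

```php
<?php
// Hypothetical helper: mimics decode_entities(strip_tags($value)) outside Drupal.
// html_entity_decode() is an assumed stand-in for Drupal's decode_entities().
function export_clean($value) {
  return html_entity_decode(strip_tags($value), ENT_QUOTES, 'UTF-8');
}

// The colorpicker value from above: stripping is what we want here.
$colorpicker = '<div class="colorpicker #b00f0f">#b00f0f</div>';

// An invented body value: stripping throws away markup we wanted to keep.
$body = '<p>Fish &amp; <strong>chips</strong></p>';

echo export_clean($colorpicker), "\n"; // #b00f0f
echo export_clean($body), "\n";        // Fish & chips
```

The colorpicker field comes out clean, but the body field loses its paragraph and strong tags on the same code path.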

@exiled_hammer: any idea *where* we can check if the "Strip HTML tags" checkbox in views for a particular field is checked, and then change the value sent to template_preprocess_views_bonus_export_csv()?

burningdog’s picture

I'm onto something - instead of working in template_preprocess_views_bonus_export_csv() to check if html tags should be stripped, it's better to work in function _views_bonus_export_shared_preprocess() in views_bonus_export.module

In there I can check the value of strip_tags per field, and if it's set, then strip the html tags there. For almost all fields we want to strip html tags; we only need to check the value of strip_tags on textareas. This involves 2 checks: the 'body' field, and any CCK fields which use an input format.
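A rough sketch of that per-field check, outside Drupal. The arrays below are hypothetical stand-ins for the Views field handlers and their rendered output (only the ['alter']['strip_tags'] option key is taken from Views), export_row() is an invented name, and html_entity_decode() again stands in for decode_entities().

```php
<?php
// Hypothetical per-field options, shaped like $fields[$id]->options in Views.
$field_options = array(
  'body'  => array('alter' => array('strip_tags' => FALSE)),
  'color' => array('alter' => array('strip_tags' => TRUE)),
);

// Hypothetical rendered values for one row.
$rendered = array(
  'body'  => '<p>Hello <em>world</em></p>',
  'color' => '<div class="colorpicker #b00f0f">#b00f0f</div>',
);

// Strip tags only for fields whose "Strip HTML tags" option is set.
function export_row(array $rendered, array $field_options) {
  $out = array();
  foreach ($rendered as $id => $html) {
    $strip = !empty($field_options[$id]['alter']['strip_tags']);
    $out[$id] = $strip
      ? html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8')
      : $html;
  }
  return $out;
}

$row = export_row($rendered, $field_options);
echo $row['body'], "\n";  // <p>Hello <em>world</em></p>
echo $row['color'], "\n"; // #b00f0f
```

Note that this version leaves html in place for any field where the option is not set, so it is not the actual patch, only an illustration of where the decision would be made.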

Patch to follow.

burningdog’s picture

Status: Active » Needs review

Patch attached. Works for me using the body field, and also a CCK textarea (which uses an input format). The patch respects the "Strip HTML" option for these fields in views.

burningdog’s picture

Oh no - the above patch doesn't strip the html within fields that it should be stripped from. It works fine on the body field and all CCK fields. Unfortunately, I was mistaken: checking for the presence of 'format' (in $fields[$id]->options['format']) does *not* differentiate between CCK fields which use input formats and those that don't. For those that don't, the value of $fields[$id]->options['format'] is simply 'default', which means that it's impossible to tell which fields need html stripped from them and which ones don't. Unless we explicitly do this in views.

Unfortunately, this significantly alters the behaviour of the csv export, because now *all* fields that generate html must be manually edited in views to have the checkbox "Strip HTML" selected. This setting makes sense to me, because the default behaviour of views is to generate html (that views_bonus then strips), but this does mean that if this patch is accepted, everyone will need to re-edit all their views upon upgrading.

Nonetheless, patch attached.

burningdog’s picture

Maybe a better approach would be to first check if the field value contains html (i.e. strip the first and last html tags that views adds by default), and if it does, then check if "Strip HTML" has been checked. That would at least preserve the current behaviour of the csv export.

However, my regex-fu is not good enough to strip the first and last html tags in a string. Anyone else?
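For what it's worth, one possible regex for this is sketched below in plain PHP. It is an untested-in-Drupal assumption rather than anything from the patches: strip_outer_tag() and has_inner_html() are invented names, and the pattern only removes a single wrapper whose opening and closing tag names match.

```php
<?php
// Hypothetical helper: remove one outer wrapper element, if the string
// starts with <tag ...> and ends with the matching </tag>.
function strip_outer_tag($html) {
  $html = trim($html);
  if (preg_match('~^<([a-z][a-z0-9]*)\b[^>]*>(.*)</\1>$~is', $html, $m)) {
    return $m[2];
  }
  return $html;
}

// Hypothetical helper: does anything tag-like remain inside the wrapper?
function has_inner_html($html) {
  $inner = strip_outer_tag($html);
  return $inner !== strip_tags($inner);
}

var_dump(has_inner_html('<div class="colorpicker #b00f0f">#b00f0f</div>'));   // bool(false)
var_dump(has_inner_html('<div class="field"><p>Hi <b>there</b></p></div>')); // bool(true)
```

The caveat is that a regex cannot tell a wrapper from two sibling elements: for <p>One</p><p>Two</p> it removes the first <p> and the last </p> and returns One</p><p>Two. A field whose value is a single element with no wrapper also gets that element removed, so this is a heuristic at best.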

burningdog’s picture

After much investigation, I can't figure out how to tell the difference programmatically between fields that legitimately have html in them (like a body field or cck textarea) and other fields (like a file field, believe it or not). The best I can do is count the number of closing html tags. If the count is greater than 2 AND views doesn't tell us to strip the tags, then leave the field alone. In every other case, strip the tags from the field.

Something like this:

    foreach ($keys as $id) {
      if (empty($fields[$id]->options['exclude'])) {
        // Render the field once and reuse the result below.
        $themed = $fields[$id]->theme($row);

        // Strip tags by default. Keep the html only if the field has more
        // than 2 closing tags and Views does not ask for tags to be stripped.
        $strip_tags = TRUE;
        if (substr_count($themed, '</') > 2 && empty($fields[$id]->options['alter']['strip_tags'])) {
          $strip_tags = FALSE;
        }

        $vars['themed_rows'][$num][$id] = $strip_tags ? decode_entities(strip_tags($themed)) : $themed;
      }
    }

At least this preserves the behaviour of the csv exports. It will break for any field whose html contains 2 or fewer closing html tags, e.g. <p><a href="http://example.com">Link!</a></p> will come out as Link! whether views says to strip it or not.
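The edge case is easy to reproduce outside Drupal. The sketch below pulls the decision into a standalone function. The name should_strip() is hypothetical, and $views_strip stands for the field's "Strip HTML tags" setting.

```php
<?php
// Hypothetical helper mirroring the heuristic above: strip by default,
// keep html only when there are more than 2 closing tags and Views
// has not been told to strip tags for this field.
function should_strip($html, $views_strip) {
  $keep_html = substr_count($html, '</') > 2 && !$views_strip;
  return !$keep_html;
}

$link = '<p><a href="http://example.com">Link!</a></p>'; // 2 closing tags
$body = '<p>One</p><p>Two <em>three</em></p>';           // 3 closing tags

var_dump(should_strip($link, FALSE)); // bool(true): stripped even though Views says keep
var_dump(should_strip($body, FALSE)); // bool(false): html kept
var_dump(should_strip($body, TRUE));  // bool(true): Views says strip
```

The first call shows the problem: the link has only two closing tags, so it never crosses the threshold and is always stripped.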

There must be a better way! Patch attached, at any rate.

exiled_hammer’s picture

Thanks for the work Roger, had to put this issue on the back burner for a bit to keep the development on track. For the moment I have hacked the views_bonus_export.theme.inc but I do need a long term solution.

Will have a look at the patches next week and get back to you.

Cheers

burningdog’s picture

I'm kind of wondering if my approach is wrong. What I'm trying to do is export content from a drupal site so that I can import it into another. I don't particularly mind how that happens, and csv seems to be a decent approach.

However, given the complexity of how to do this, it might be that a views XML feed and the feeds module is actually a much better fit for me. That also lets me move content directly from one site to another, rather than going through the intermediate step of downloading a csv file, and uploading it again.

Adding the patch at #9 might be useful in some cases, but I think it's impossible to handle the edge cases, in which case this introduces more possibility for confusion.

exiled_hammer’s picture

This project might be of interest http://drupal.org/project/content_push once it's been reviewed.

I met the maintainer, Joachim, at DrupalCamp Edinburgh last year and he did a demonstration where nodes & users from a Drupal 5 site were imported into Drupal 6.

rcahana’s picture

Are there any updates?

exiled_hammer: did you get to update the patch?

tonytosta’s picture

Searching for an update as well.

TechNikh’s picture

Status: Needs review » Reviewed & tested by the community

patch in #9 worked for me.

Abelito’s picture

patch in #9 worked for me too!

jennypanighetti’s picture

Patch in #9 worked great for me, thanks.

siva_drupal’s picture

But the patch in #9 did not work for me. Please advise.

siva_drupal’s picture

Please attach an exported csv file for reference.

captjanko’s picture

@siva_drupal what version of Views Bonus Pack are you using?

neclimdul’s picture

Status: Reviewed & tested by the community » Closed (outdated)