I was updating from 7.x-beta2 to the latest -dev in order to try to work on fixing #1738438: Unable to render media from... (error message) . I was finding that this other issue was due to html not being decoded properly when the token filter does a drupal_json_encode. I mention that because it might be related to this issue:

After this update, I'm seeing the markup in a field attached to the file being corrupted if it contains HTML. (I have a 'caption' field which accepts full html attached to my file entity) This is NOT happening when I edit the file directly. It's only happening when editing it through the Media Wysiwyg button (hence doesn't appear to be a file entity issue...but a media wysiwyg issue).

To recreate this problem:
1. Add a textarea field onto your file entity and allow full html with a wysiwyg. Be sure this field displays when viewing the image as it's embedded into the body of a node (ie. you may have to setup your view mode properly to include this field)
2. Create a new node and use the media wysiwyg button to upload a new image
3. When uploading your image, add markup to your textarea field on the file entity. For example, I added: <p>This <a href="http://google.com">contains a link</a> to google</p> using the wysiwyg link functionality.
4. Set any other options and submit your image...thereby inserting it into the body of the node THIS is point in the process where the html from the html field is incorrectly encoded!
5. Save the node

The first thing to notice is if you put 'www' in your link (ie. you link to www.google.com instead of google.com) the image doesn't appear at all. That's related to #1738438: Unable to render media from... (error message).

If you do not put in the 'www', you'll see the image appear...but the full html textarea field (which you should have set to display) contains encoded jibberish. Screenshot:
Jibberish Media Field

To go further:
6. Edit the node
7. Click the image you inserted in your body field to select it, then click the 'media wysiwyg' button to edit it.
8. Notice your textarea where you typed in the html text. It's also now being encoded improperly. Screenshot:

Jibberish Media Field

My assumption is that, during save, media wysiwyg is attempting to encode everything properly, but it's not. It simply breaks any html in these fields.

Marking as critical because this will radically break existing html field content on files if users update to this version.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Rob_Feature created an issue. See original summary.

Rob_Feature’s picture

Title: media wysiwyg tokens with html are corrupted on edit » media wysiwyg tokens with html are corrupted on save

updated title since I think the markup is corrupted on save, not edit.

Rob_Feature’s picture

Issue summary: View changes
joseph.olstad’s picture

Thanks for the issue.

A workaround would be to use media_ckeditor module and configure it accordingly.

so you're using the wysiwyg module with the ckeditor library, this is good to know, I didn't realize that the media module was being used that way. It'd be nice to make media play nice with your configuration as well.

joseph.olstad’s picture

Category: Bug report » Support request
Priority: Critical » Normal
Status: Active » Needs work

not sure if this is a bug yet, it might require a combination of documentation or some helper code to help people figure out the configuration easier.

if this does require a patch, it'd have to be written in a way that doesn't break the standard implementation that uses media_ckeditor

Rob_Feature’s picture

Category: Support request » Bug report
Priority: Normal » Major

Thanks Joseph....to put it bluntly (but I mean this in the kindest way possible), I completely disagree :) This is how Media module has always worked up until this release. Due to changes that were made recently (not sure what they were exactly since I haven't reviewed how everything changed) it now breaks existing content (permanently converting working data to broken data in the database, not just changing how something is displayed which can easily be changed/reverted). I'd suggest any changes that break existing content (unless it's a major version change) are ALWAYS a bug!

I'd really like to set this as a bug and, at least, set it to Major. This is going to literally break sites and corrupt data for anyone using this config and updates. This isn't a niche issue that really only applies to me...it's really the only method to use to add attribution to images so I don't think it's uncommon.

Can we please get some other folks to weigh in on this before you downgrade the issue? It has potential to cause real havoc on existing data!

joseph.olstad’s picture

How about we start by isolating when this started to occur, so this is since 7.x-beta2 according to the description, however if you are able to restore your database, could you go back to beta6 and see if the issue occurs there?

Note: it could be a configuration issue as well, media since beta8 no longer uses variables to store certain configuration
#1792738: Allow custom file view modes for WYSIWYG display

and this commit:
http://cgit.drupalcode.org/media/commit/?id=31e20aef657ac223f58c3d9228cd...

This change I speak of came in at beta8.

If you wanted to upgrade to something newer than beta2 then I'd suggest beta6 in the meanwhile, unless of course you're willing to continue helping us fix this.

joseph.olstad’s picture

any ideas?

Rob_Feature’s picture

Playing with this this morning...let me see what I can find.

Rob_Feature’s picture

I see why I had some confusion....it looks like when you took over development (?) the method for releases changed a little bit. I thought the version of file entity module and media module always needed to match (it had over the previous years) but now you're releasing versions of media module much 'faster' so the release numbers don't match. File entity is only at beta3 :) This must have been a philosophy change I didn't hear about (I was always trying to keep them matching which is why I was so far behind in releases).

Tested an update to beta6 and it broke before then sometime. Beta6 shows the broken text as shown in the original post. I'm going to try going back further.

Rob_Feature’s picture

It looks like it broke going all the way back to the beta3 release. Updating from beta2->beta3 causes the problem. What updates were done in that release with regards to the media token that could cause this?

Rob_Feature’s picture

OK, this is really really odd...I may not be thinking clearly:

  • I'm running beta3
  • It was still encoding the field incorrectly
  • My media with field is embedded in the 'body' field with Full HTML input filter. This input filter has the 'convert media tags to markup' checked.
  • I wanted to ensure this was an issue with the media input filter code, so I went in and UNchecked the 'media tags to markup' input filter on full html
  • I cleared caches (multiple times)
  • I'm STILL SEEING the image that was embedded into the body (even though the 'convert media tags...' is disabled).
  • BUT the text is no longer garbled...it's displaying correctly

Unless I'm confused about something in the beta2->beta3 update...the embedded media image should have disappeared, but it's still visible! (this seems really wrong...so I'm doublechecking everything....I think something may be getting encoded twice here...)

Rob_Feature’s picture

Category: Bug report » Support request
Priority: Major » Normal

OK, I'm going to eat my previous words here....the double encoding encountered above really smells like something specific to my installation (modules conflicting, etc). I'm going to move this back to a support/normal issue until I find out more. I no longer believe this to be a bug...thanks for graciously sticking with me on this even though I insisted otherwise :)

EDIT: Going further...I'm torn. This may still be a bug, trying to figure that out by removing as many modules/theme modifications I have as possible. Trying to nail it down, for sure, to it being this module doing the double encoding. I wont convert it back to a bug unless I have distinct results that it's caused within the module.

joseph.olstad’s picture

as for file entity, I recommend the latest version of file_entity.

You might also want to try these two patches for file_entity, this is what I use.

- projects[file_entity][patch][2000934]
https://www.drupal.org/files/issues/allow_selection_of-2000934-30.patch

#2000934: Allow selection of which folder a file is to on the file/add form

- projects[file_entity][patch][2198973]
https://www.drupal.org/files/issues/file_entity_override_widgets-2198973-01.patch

Not sure if this suggestion will help at all.

Rob_Feature’s picture

Early next week I'm going to continue my testing my trying to recreate all this on a fresh installation and see where I get. will report back then.

joseph.olstad’s picture

Ok, thanks, hope we can resolve this soon.

LittleRedHen’s picture

Category: Support request » Bug report
Priority: Normal » Major
FileSize
1.82 KB

I've run into this as well (or at least I think it's the same issue!)
My setup has file_entity 7.x-2.0-beta3, media 7.x-2.0-beta10 with media_wysiwyg also enabled, ckeditor 7.x-1.17 and media_ckeditor 7.x-2.0-alpha3 (i.e. latest stable versions of all those as of today) plus a couple of patches to media_ckeditor for unrelated issues.

I have a long-text caption field on my image file entity type, which I allow to be overridden in the media browser.

What appears to be happening is that the recent versions of media_wysiwyg and media_ckeditor use an identical code pattern to check for html-formatted fields in the overridable fields (identified by having both a value and a format), and then ensuring that the value-which-possibly-contains-markup is URL encoded so that it doesn't break the JSON value in the media token. This makes sense, and those tokens are swapping between token and markup much, much better now than they were in previous versions.

Unfortunately neither module seems to have implemented anything that decodes that field again. The media_wysiwyg_filter_field_parser function has a comment that "Fields that use rich-text markup will be urlencoded", but nothing I could find actually decodes those values, and so the URLEncoded value value is what appears when the file entity is actually rendered, and also when you attempt to edit the fields again.

I've made a patch to that function to notice format-and-value tags, and to URLdecode the values when appropriate. This has solved the issue for me, and should do so for people using WYSIWG rather than CKEditor and Media-CKEditor as well, due to the shared code pattern. This should work with both 2.x-dev and 2.x-beta10, since those are currently the same commit.

LittleRedHen’s picture

Status: Needs work » Needs review
Related issues: +#2720997: Overridden fields not encoded in token
LittleRedHen’s picture

joseph.olstad’s picture

ok great stuff @LittleRedHen , while you're into it and if its not too much trouble can you also confirm that this patch is safe /compatible with media_ckeditor configurations?

joseph.olstad’s picture

@LittleRedHen, wondering if this type of condition should be used?:

 if (!module_exists('media_ckeditor')) {
LittleRedHen’s picture

@joseph.olstad - Using media_wysiwyg with media_ckeditor is actually my use case. The URL decoding is definitely still needed when media_ckeditor is in use, because media_ckeditor uses an identical processing of html values in its override of Drupal.media.formatForm.getOptions.

There may be a more appropriate location for this fix, though. It does need to be part of the token-to-markup filter, but perhaps media_ckeditor ought to implement hook_media_wysiwyg_token_to_markup_alter and do its own URLDecoding in order to reduce the co-dependecy between these two modules a bit? Or else perhaps the handling of HTML-field override ought to be refactored out of Drupal.media.formatForm.getOptions into something that media_ckeditor isn't overriding?

The patch at #19 is a bit of a quick and dirty solution that will get my production system functional again (I use drush make for deployment so need a linkable patchfile). Actual integration may need a bit more design work...

joseph.olstad’s picture

@LittleRedHen , I think your patch is better than no patch. RTBC? commit?

joseph.olstad’s picture

Status: Needs review » Fixed

fixed in dev branch , beta11 will be comming soon, maybe tomorrow or a few days from now.

fuzzy76’s picture

I have a (possibly) related problem, but this patch does not help me at all. I have opened a separate issue at #2832384: media_vimeo and media_wysiwyg tokenhandling broken except when using the dev version of wysiwyg.

Rob_Feature’s picture

Confirmed! Thanks for the fix guys...this solved the original issue i posted.

jvogt’s picture

I can confirm that this patch fixes the same error if you happen to be using ckeditor_media with wysiwyg rather than media_ckeditor. Thanks for the patch!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

loze’s picture

I know this is several years old, but I am having the exact same issue.
In my case it is only happening when only one input format is available on the textarea (by using better_formats module) and there is no format select element in the form. If there is more than one format and the select box is present, it works.

im using Media 7.x-2.x-dev

joseph.olstad’s picture

@loze , if you have a screenshot handy, maybe attach it.