Unicode issue with views_natural_sort_remove_symbols [#2775643]

I use the views_natural_sort_remove_symbols transformation with the following symbols:

#"'\()[]«?!»¡¿

Somehow it manages to turn

Cuando...se abre, ¿dará algún tipo de señal?

into

Cuando...se abre, dar? algún tipo de señal

with the following result when the field is inserted into the database:

WD node: PDOException:  in views_natural_sort_store() (line 203 of /Users/nirbhasa/Documents/htdocs/libry/sites/all/modules/views_natural_sort/views_natural_sort.module).
WD php: PDOException:  in views_natural_sort_store() (line 203 of /Users/nirbhasa/Documents/htdocs/libry/sites/all/modules/views_natural_sort/views_natural_sort.module).

I fixed it by adding the Unicode (u) modifier to the preg_replace regex, but I am still not 100% sure what is happening. It does strip ¿ from other fields, but some combination of ¿ and á is making it go wrong.

My modified function:

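function views_natural_sort_remove_symbols($string) {
  $symbols = variable_get('views_natural_sort_symbols_remove', '');
  if (strlen($symbols) == 0) {
    return $string;
  }
  // The u modifier makes PCRE treat the pattern and subject as UTF-8, so the
  // character class matches whole characters rather than single bytes.
  // Passing '/' to preg_quote() also escapes the delimiter if it is a symbol.
  return preg_replace(
    '/[' . preg_quote($symbols, '/') . ']/u',
    '',
    $string
  );
}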
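
A possible explanation, offered as a sketch rather than a confirmed diagnosis: without the u modifier, PCRE treats the character class as a set of single bytes. In UTF-8, ¡ is the bytes 0xC2 0xA1 and á is 0xC3 0xA1, so a byte-wise class built from the symbols above contains 0xA1 and strips the second byte of á. That leaves a lone 0xC3, which is not valid UTF-8 and could be what the database rejects. The names remove_symbols_sketch() and is_valid_utf8() are hypothetical helpers, the symbol list is hard-coded instead of read with variable_get(), and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical standalone stand-in for the module function; the symbol list
// is passed in rather than loaded from variable_get().
function remove_symbols_sketch($string, $symbols, $unicode) {
  $pattern = '/[' . preg_quote($symbols, '/') . ']/' . ($unicode ? 'u' : '');
  return preg_replace($pattern, '', $string);
}

// Hypothetical helper: an empty pattern with the u modifier only matches
// when the subject is valid UTF-8 (preg_match() returns FALSE otherwise).
function is_valid_utf8($string) {
  return preg_match('//u', $string) === 1;
}

$symbols = "#\"'\\()[]«?!»¡¿";
$input = 'Cuando...se abre, ¿dará algún tipo de señal?';

// Byte-wise matching: 0xA1 (second byte of ¡) is also the second byte of á.
$bytewise = remove_symbols_sketch($input, $symbols, FALSE);
// UTF-8 aware matching: only whole characters from the list are removed.
$unicode = remove_symbols_sketch($input, $symbols, TRUE);

var_dump(is_valid_utf8($bytewise)); // bool(false)
var_dump(is_valid_utf8($unicode));  // bool(true)
echo $unicode, "\n";                // Cuando...se abre, dará algún tipo de señal
```

This would also account for ú and ñ surviving: their second bytes (0xBA and 0xB1) do not occur in any of the listed symbols.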

Comment	File	Size	Author
#4	views_natural_sort-unicode_save_tranformations-2775643-4.patch	1.02 KB	generalredneck

Test results for #4 on 7.x-2.x:
PHP 5.3 & MySQL 5.5, D7: 8 pass
PHP 5.4 & MySQL 5.5, D7: 8 pass
PHP 5.5 & MySQL 5.5, D7: 8 pass
PHP 5.6 & MySQL 5.5, D7: 8 pass
PHP 7 & MySQL 5.5, D7: Patch Failed to Apply


Comments

Comment #1

29 July 2016 at 18:14

nirbhasa created an issue. See original summary.

Comment #2

generalredneck (as a volunteer) commented 1 August 2016 at 00:47

Are you always able to reproduce it with that string? In other fields, too? Maybe on a fresh install? Just a quick question, to rule out a database encoding problem. If possible, can you email me a sanitized SQL dump? Generalredneck at Gmail dot com. Or at the very least, double-check the database encoding for me and make sure the collation is utf8_general_ci.

But I bet it's me using a Unicode-unsafe function somewhere.

In the meantime I'll see what I can do as far as testing. I may not get a fast turnaround on this one though. Good call on the /u anyway... it probably should be there.

Comment #3

generalredneck (as a volunteer) commented 9 April 2017 at 04:41

I was revisiting this issue. Here is some info I found on PHP.net:

If the _subject_ contains utf-8 sequences the 'u' modifier should be set, otherwise a pattern such as /./ could match a utf-8 sequence as two to four individual ASCII characters. It is not a requirement, however, as you may have a need to break apart utf-8 sequences into single bytes. Most of the time, though, if you're working with utf-8 strings you should use the 'u' modifier.

http://php.net/manual/en/reference.pcre.pattern.modifiers.php#107498

Since this is the case, it would be prudent to put the modifier on all the preg_replace calls I have. I'm going to do that and run my tests against it.

This might also explain some of the funkiness you found... Though I couldn't reproduce it at the time.
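
To make the quoted note concrete, here is a minimal sketch, assuming the file is saved as UTF-8. It counts how often /./ matches in a string that holds the single character á, with and without the u modifier. The variable names are only for illustration.

```php
<?php
// á is U+00E1, encoded in UTF-8 as the two bytes 0xC3 0xA1.
$subject = 'á';

// Without u, "." consumes one byte at a time.
$byte_matches = preg_match_all('/./', $subject, $bytes);
// With u, "." consumes one whole UTF-8 character.
$char_matches = preg_match_all('/./u', $subject, $chars);

echo $byte_matches, "\n";           // 2
echo $char_matches, "\n";           // 1
echo bin2hex($bytes[0][0]), ' ', bin2hex($bytes[0][1]), "\n"; // c3 a1
```

The same byte-at-a-time behaviour applies inside a character class, which is why a replace without the modifier can remove half of a multi-byte character.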

Comment #4

generalredneck (as a volunteer) commented 9 April 2017 at 04:44

Status: Active » Needs review

File	Size
views_natural_sort-unicode_save_tranformations-2775643-4.patch	1.02 KB

Test results on 7.x-2.x:
PHP 5.3 & MySQL 5.5, D7: 8 pass
PHP 5.4 & MySQL 5.5, D7: 8 pass
PHP 5.5 & MySQL 5.5, D7: 8 pass
PHP 5.6 & MySQL 5.5, D7: 8 pass
PHP 7 & MySQL 5.5, D7: Patch Failed to Apply

Comment #5

15 April 2017 at 02:01

generalredneck committed bdd421d on 7.x-2.x

Issue #2775643 by generalredneck, nirbhasa: Unicode issue with...

Comment #6

generalredneck (as a volunteer) commented 15 April 2017 at 02:03

Status: Needs review » Fixed

So I went ahead and committed this without review because I wrote a test to double check the removal function. See bdd421d.

Comment #7

15 April 2017 at 02:33

generalredneck committed 6b8f779 on 8.x-2.x

Issue #2775643 by generalredneck, nirbhasa: Unicode issue with...

Comment #8

generalredneck (as a volunteer) commented 15 April 2017 at 02:34

Ported to D8 as well.

Comment #9

29 April 2017 at 02:35

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.
