Radicore Forum

preg_replace operation distorts multibyte unicode characters [message #5293] Wed, 13 January 2016 19:11
kong
When a $where string includes multibyte characters, such as Chinese, the $where clause is sometimes not interpreted correctly.

This problem can be traced back to the operation
$where = trim(preg_replace("/\s+/", " ", $where));  // replace tabs and newlines with ' '
in function where2indexedArray ($where) in file include.library.inc.

It turns out that, without the /u modifier, the \s class in the regex is applied to the individual bytes of multibyte characters. When one of those bytes matches \s, it is replaced with a "space byte", which changes the multibyte character into something else. I solved this problem by adding the /u modifier to the regex:
$where = trim(preg_replace("/\s+/u", " ", $where));  // replace tabs and newlines with ' '
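A minimal sketch of the fix in isolation, not the RADICORE code itself. The helper name normalize_whitespace() and the sample $where value are made up for illustration. Whether the pattern without /u actually damages a given string depends on the PCRE build and the active locale, so the sketch only prints the bytes of one character that could be affected and shows that the /u version leaves it intact.

```php
<?php
// Hypothetical helper (not part of RADICORE): collapse runs of whitespace
// into a single space, treating the subject as UTF-8 because of /u.
function normalize_whitespace(string $where): string
{
    $result = preg_replace('/\s+/u', ' ', $where);
    if ($result === null) {
        // With /u, preg_replace() returns NULL if the subject is not valid UTF-8.
        return trim($where);
    }
    return trim($result);
}

// Made-up sample: U+4E20 is encoded in UTF-8 as the bytes E4 B8 A0.
// The last byte, 0xA0, is the no-break space in ISO-8859-1, which a
// byte-oriented \s may match under some locales.
$where = "name\t=  '\u{4E20}'\n AND  id = 1";

echo bin2hex("\u{4E20}"), "\n";           // e4b8a0
echo normalize_whitespace($where), "\n";  // name = '丠' AND id = 1
```

One side effect to be aware of: with /u the subject must be valid UTF-8, otherwise preg_replace() returns NULL, which is why the sketch falls back to the untouched string.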

For reference: http://www.regular-expressions.info/php.html
Quote:
If you want your regex to treat Far East characters as individual characters, you'll either need to use the mb_ereg functions, or the preg functions with the /u modifier.
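The other route mentioned in the quote, the mb_ereg functions, could look roughly like the sketch below. This is an illustration only: the helper name normalize_whitespace_mb() is hypothetical, the sample string is made up, and the fallback to preg_replace() with /u when the mbstring regex functions are unavailable is an assumption of this sketch, not something the thread proposes.

```php
<?php
// Hypothetical helper (not part of RADICORE): the same whitespace clean-up,
// done with the mbstring regex functions instead of preg_replace().
function normalize_whitespace_mb(string $where): string
{
    if (!function_exists('mb_ereg_replace')) {
        // Assumption: fall back to the /u variant when mbstring regex
        // support is not compiled in.
        return trim((string) preg_replace('/\s+/u', ' ', $where));
    }
    mb_regex_encoding('UTF-8');  // tell mb_ereg_* to treat subjects as UTF-8
    $result = mb_ereg_replace('\s+', ' ', $where);
    // Keep the original string if the replacement failed.
    return trim(is_string($result) ? $result : $where);
}

echo normalize_whitespace_mb("name\t=  '\u{4E20}'\n AND  id = 1"), "\n";
```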
Re: preg_replace operation distorts multibyte unicode characters [message #5294 is a reply to message #5293] Thu, 14 January 2016 04:53
AJM
Thanks for spotting that. I will include this change in the next release.
