Friday 5 August 2016

Different Unicode values displayed for the same character in PHP




I am using a function to determine the decimal Unicode value of different Bengali characters. The function is:



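    // Return the decimal Unicode value of the first character in $u (BMP only).
    function uniord($u) {
        // UCS-2LE stores each character as two bytes, low byte first.
        $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
        $k1 = ord(substr($k, 0, 1)); // low byte
        $k2 = ord(substr($k, 1, 1)); // high byte
        return $k2 * 256 + $k1;
    }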



It works for all Bengali characters except the one whose Unicode value is 09DC in hex (2524 in decimal). It works perfectly when I take this character from a console or textarea field, such as:



    $data = $_POST['data'];
    echo uniord($data);


But it displays a different Unicode value when the character comes from a variable in the source file, such as:



    $data_one = 'ড়';
    echo uniord($data_one);


This gives 09A1 in hex (2465 in decimal), which is a similar-looking letter but not the character I want.



How can I solve this? Thanks.


Answer



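U+09DC has a canonical decomposition to U+09A1 U+09BC (the base letter followed by a nukta). It sounds like your text editor is normalizing the text when it saves the file, so the literal in your source is stored as those two code points and uniord() reports only the first. Note that U+09DC is on Unicode's composition exclusion list, so both NFC and NFD produce the two-code-point sequence; see if you can turn normalization off in the editor's settings, or try a different text editor.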
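To check which form a string actually contains, it can help to list all of its code points rather than only the first. The following is a minimal sketch, assuming PHP 7.0+ (for the "\u{...}" escapes) and the mbstring extension; codePoints() is a hypothetical helper introduced here for illustration, not part of the question's code.

```php
<?php
// uniord() as given in the question: returns the first UCS-2 code unit.
function uniord($u) {
    $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
    $k1 = ord(substr($k, 0, 1));
    $k2 = ord(substr($k, 1, 1));
    return $k2 * 256 + $k1;
}

// Hypothetical helper: list every code point of a UTF-8 string as U+XXXX.
function codePoints($s) {
    // The /u flag makes the empty pattern split between UTF-8 characters.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    return array_map(function ($c) {
        return sprintf('U+%04X', uniord($c));
    }, $chars);
}

$composed   = "\u{09DC}";          // single code point
$decomposed = "\u{09A1}\u{09BC}";  // base letter + nukta

echo implode(' ', codePoints($composed)), "\n";   // U+09DC
echo implode(' ', codePoints($decomposed)), "\n"; // U+09A1 U+09BC
echo uniord($decomposed), "\n";                   // 2465
```

Running codePoints() on the variable from the question should show whether the saved file holds one code point or two.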



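Or write the character with escape codes in a double-quoted string, so the source file never contains the literal character: "\xe0\xa7\x9c" (the UTF-8 bytes of U+09DC).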
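As a rough illustration of the escape approach, the sketch below compares the byte escapes with the code point escape available from PHP 7.0. It assumes the mbstring extension, and mb_ord() additionally assumes PHP 7.2 or later; the variable names are made up for the example.

```php
<?php
// Byte escapes from the answer: the UTF-8 encoding of U+09DC.
$fromBytes = "\xe0\xa7\x9c";
// PHP 7.0+ code point escape for the same character.
$fromCodePoint = "\u{09DC}";

var_dump($fromBytes === $fromCodePoint); // bool(true)
echo bin2hex($fromCodePoint), "\n";      // e0a79c
echo mb_ord($fromBytes, 'UTF-8'), "\n";  // 2524
```

Either escape keeps the source file pure ASCII at that point, so an editor that normalizes text on save has nothing to rewrite.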


