I am using a function to determine the Unicode value, in decimal, of different Bengali characters. The function is:
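function uniord($u) {
    // Convert the UTF-8 input to little-endian UCS-2 (two bytes per code unit).
    $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
    $k1 = ord(substr($k, 0, 1)); // low byte of the first code unit
    $k2 = ord(substr($k, 1, 1)); // high byte of the first code unit
    return $k2 * 256 + $k1;
}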
It works for all the Bengali characters except ড়
whose Unicode value is 09DC in hex, or 2524 in decimal. It works perfectly when I take this character from a console or textarea field, for example:
$data = $_POST['data'];
echo uniord($data);
But it displays a different Unicode value when I use this character from a variable, for example:
$data_one = 'ড়';
echo uniord($data_one);
This gives a Unicode value of 09A1 in hex, or 2465 in decimal, which belongs to a similar-looking character but is not the value I want.
How can I solve this? Thanks.
Answer
U+09DC has a canonical decomposition as U+09A1 U+09BC. It sounds like your text editor is saving text using decomposed normal form. See if you can change the settings to be able to save using the composed normal form, or try using a different text editor.
Or use escape codes for the UTF-8 bytes of U+09DC: "\xe0\xa7\x9c"
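To see which form a string actually contains, it can help to dump its bytes. The sketch below reuses uniord() from the question and compares the two forms; the variable names $precomposed and $decomposed are made up for illustration, and both strings are written as byte escapes so the editor cannot change them. It assumes the mbstring extension is loaded.

```php
<?php
// uniord() as given in the question (needs the mbstring extension).
function uniord($u) {
    $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
    $k1 = ord(substr($k, 0, 1)); // low byte of the first code unit
    $k2 = ord(substr($k, 1, 1)); // high byte of the first code unit
    return $k2 * 256 + $k1;
}

// Hypothetical variable names; byte escapes keep the editor out of the picture.
$precomposed = "\xe0\xa7\x9c";             // U+09DC
$decomposed  = "\xe0\xa6\xa1\xe0\xa6\xbc"; // U+09A1 followed by U+09BC

// Dump the raw bytes to see which form a string really holds.
echo bin2hex($precomposed), "\n"; // e0a79c
echo bin2hex($decomposed), "\n";  // e0a6a1e0a6bc

// uniord() only reads the first code unit, so the decomposed form
// reports the base letter U+09A1 instead of U+09DC.
echo uniord($precomposed), "\n"; // 2524
echo uniord($decomposed), "\n";  // 2465
```

If bin2hex($data_one) prints e0a6a1e0a6bc, the source file holds the decomposed sequence, which would explain the 2465 result.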