Thursday 25 May 2017

C# - Comparing strings of different encodings



Using C#, I fetch a TextBox.Text value from an .ascx page. When I compare that value for equality with a regular string object inside a LINQ query, the comparison always returns false.



I have come to the conclusion that they are differently encoded, but have so far had no luck in converting or comparing them.



string docname = "Testdoc 1.docx"; //regular string created in C#

string fetchedVal = ((TextBox)e.Item.FindControl("txtSelectedDocs")).Text; //UTF-8


The two strings above look identical when displayed, but when I compare their byte[] representations they are obviously different due to the encoding.



I've tried a lot of different things, such as:



System.Text.Encoding.Default.GetString(utf8.GetBytes(fetchedVal));



but that will return the value "Testdoc 1.docx".



If I instead try



System.Text.Encoding.Default.GetString(System.Text.Encoding.Default.GetBytes(fetchedVal));


it returns "Testdoc 1.docx" but an Equals()-check still returns false.



I have also tried the following, which seems to be the recommended approach, but with no luck:




byte[] utf8Bytes = Encoding.UTF8.GetBytes(fetchedVal);
byte[] unicodeBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
string fetchedValConverted = Encoding.Unicode.GetString(unicodeBytes);


The culprit appears to be the whitespace, because when examining the byte sequence it's always the seventh byte that differs.



How do you properly convert from UTF-8 to default string encoding in C#?


Answer




Strings don't have encodings or byte arrays. An encoding only comes into play when you convert a string into a byte array (or back), and you can only do that by specifying which encoding determines the bytes.
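
A minimal sketch of that point, assuming a small console program. The `sample` value and the `RoundTrip` helper are hypothetical and not taken from the question. The same string yields different byte arrays depending on the encoding you ask for, and encoding then decoding with the same encoding gives back an equal string, which would explain why the conversions tried in the question change nothing.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Encodes and decodes with the same encoding. For text the encoding
    // can represent, the result equals the input.
    public static string RoundTrip(string value, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(value);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Hypothetical value containing a no-break space (U+00A0).
        string sample = "Testdoc\u00A01.docx";

        // Same string, different byte counts depending on the encoding chosen.
        Console.WriteLine(Encoding.UTF8.GetBytes(sample).Length);    // 15
        Console.WriteLine(Encoding.Unicode.GetBytes(sample).Length); // 28

        // The round trip returns an equal string, so it cannot "fix" a comparison.
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) == sample);           // True
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) == "Testdoc 1.docx"); // False
    }
}
```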



It sounds like your strings simply contain different characters. One of them might contain an invisible character, or they might contain different characters that look the same.



To find out, look at the Unicode code point value of each character in each string (e.g., (int)str[0]).
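
One way to do that inspection is sketched below. It assumes, purely for illustration, that the fetched value contains a no-break space (U+00A0) where the literal has a regular space (U+0020); the question does not confirm this. `DescribeCodePoints` and `FirstDifference` are hypothetical helper names.

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Formats each UTF-16 code unit as U+XXXX so that invisible or
    // look-alike characters become visible.
    public static string DescribeCodePoints(string value)
    {
        return string.Join(" ", value.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // Returns the index of the first differing position, or -1 if the strings are equal.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return a.Length == b.Length ? -1 : shared;
    }

    public static void Main()
    {
        string docname = "Testdoc 1.docx";
        string fetchedVal = "Testdoc\u00A01.docx"; // assumed stand-in for the TextBox value

        Console.WriteLine(DescribeCodePoints(docname));
        Console.WriteLine(DescribeCodePoints(fetchedVal));
        Console.WriteLine(FirstDifference(docname, fetchedVal)); // 7 for these two values
    }
}
```

If the dump does show an unexpected character such as U+00A0, replacing it before the comparison (for example with `fetchedVal.Replace('\u00A0', ' ')`) is one possible fix, though whether that is appropriate depends on where the value originates.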

