C# - Reversing the bytes of a Unicode text file
I have a file, 1.txt, saved as Unicode. The text of the file is "12345". If I read its bytes into a byte array, I get 12 bytes:
255 254 49 0 50 0 51 0 52 0 53 0
That's fine. What I can't understand is what happens if I reverse the bytes like this:
0 53 0 52 0 51 0 50 0 49 254 255
The C# method Encoding.Unicode.GetString(bytearray)
returns 㔀㐀㌀㈀�
That's correct, but Notepad shows 5 4 3 2 1юя
Why?
You can find the byte reverse method here:
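Your text file is encoded as UTF-16.
The 2 bytes at the front are the byte order mark (BOM), and they aren't part of the text.
You must not alter them. You should skip the first 2 bytes and reverse only the remainder of the bytes.
But that will still give you problems, because you can't simply reverse the bytes inside a UTF-16 code unit: doing so gives the code for a different character, or indeed an invalid code. For example, the bytes 53 0 (0x0035, "5") become 0 53, which reads as 0x3500 ("㔀").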
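If the goal is to reverse the text rather than the raw bytes, one way to sketch it is below. This is only an illustration, not the method from the question: the class name Utf16Reverser and the method ReverseText are made up for this example, and it assumes the input is little-endian UTF-16 with an optional FF FE BOM. It leaves the BOM where it is, decodes the rest, reverses by text element (so surrogate pairs and combining sequences stay together), and encodes again.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class Utf16Reverser
{
    // Reverses the text in a UTF-16 LE byte array, keeping a leading BOM (FF FE) in place.
    public static byte[] ReverseText(byte[] input)
    {
        // Assumption: a BOM, if present, is the little-endian one (255 254).
        bool hasBom = input.Length >= 2 && input[0] == 0xFF && input[1] == 0xFE;
        int offset = hasBom ? 2 : 0;

        // Decode only the bytes after the BOM.
        string text = Encoding.Unicode.GetString(input, offset, input.Length - offset);

        // Split into text elements so multi-unit characters are not torn apart.
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext())
        {
            elements.Add(e.GetTextElement());
        }
        elements.Reverse();
        string reversed = string.Concat(elements);

        // Re-encode and put the untouched BOM back in front.
        byte[] body = Encoding.Unicode.GetBytes(reversed);
        byte[] output = new byte[offset + body.Length];
        Array.Copy(input, 0, output, 0, offset);
        Array.Copy(body, 0, output, offset, body.Length);
        return output;
    }
}
```

Calling Utf16Reverser.ReverseText on the 12 bytes from the question (255 254 49 0 50 0 51 0 52 0 53 0) should give 255 254 53 0 52 0 51 0 50 0 49 0, which is "54321" with the BOM still at the front.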
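Anyway, what's happening is that when you reverse the order you wind up with the BOM stuck at the end, where it forms an invalid UTF-16 code, which happens to be the "?" character at the end that you're seeing, and the reversal messes up the encoding of the other characters as well.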
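To see why the characters change, it may help to look at how the bytes pair up into 16-bit code units when read as little-endian UTF-16. The sketch below does the pairing by hand instead of going through a decoder; the names CodeUnitDump and FormatCodeUnits are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of any library.
public static class CodeUnitDump
{
    // Pairs bytes as little-endian 16-bit code units and formats them as hex.
    public static string FormatCodeUnits(byte[] bytes)
    {
        var parts = new List<string>();
        for (int i = 0; i + 1 < bytes.Length; i += 2)
        {
            // Little-endian: the first byte is the low half, the second is the high half.
            int unit = bytes[i] | (bytes[i + 1] << 8);
            parts.Add(unit.ToString("X4"));
        }
        return string.Join(" ", parts);
    }
}
```

For the original bytes this gives FEFF 0031 0032 0033 0034 0035, i.e. the BOM followed by "12345". For the reversed bytes it gives 3500 3400 3300 3200 3100 FFFE: every code unit has its two halves swapped, and the swapped BOM (FFFE) ends up last.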
However, it looks like Notepad is opening the file using ANSI encoding, with the code page used by your current locale.
The text file contains the bytes 0 53 0 52 0 51 0 50 0 49 254 255, and Notepad is converting each 0 to a space and the other values less than 0x80 to ASCII characters, while 254 is converted to ю and 255 to я (which I assume are the values of those characters in the ANSI code page for your current locale).
I'm guessing you're in a Slavic region that uses Cyrillic script.
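The Notepad result can be roughly reproduced in code. The sketch below makes several assumptions: that the ANSI code page in question is Windows-1251 (where 254 is ю and 255 is я), that it runs on .NET Core 3.0 or later (where CodePagesEncodingProvider is available and has to be registered before code page 1251 can be used), and that Notepad draws NUL bytes as spaces. AnsiView and DecodeAsWindows1251 are names made up for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class AnsiView
{
    // Decodes bytes as Windows-1251 and shows NUL as a space, roughly as Notepad displays it.
    public static string DecodeAsWindows1251(byte[] bytes)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where legacy code pages need this registration.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Assumption: the locale's ANSI code page is Windows-1251 (Cyrillic).
        Encoding cp1251 = Encoding.GetEncoding(1251);

        // Each byte becomes one character; NUL is replaced to mimic the display.
        return cp1251.GetString(bytes).Replace('\0', ' ');
    }
}
```

With the reversed bytes 0 53 0 52 0 51 0 50 0 49 254 255 this should produce " 5 4 3 2 1юя": a space for each 0, the ASCII digits, and the two Cyrillic letters from the displaced BOM.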