C# - Reversing the bytes of a Unicode txt file

September 15, 2014

I have a file 1.txt saved as Unicode. The text of the file is "12345". If I read its bytes into a byte array, I get 12 bytes:

255 254 49 0 50 0 51 0 52 0 53 0

That's fine. What I can't understand is what happens if I reverse the bytes, like this:

0 53 0 52 0 51 0 50 0 49 254 255

The C# method Encoding.Unicode.GetString(byteArray) returns 㔀㐀㌀㈀㄀�, which is correct, but Notepad shows 5 4 3 2 1юя. Why?

You can find the byte reverse method here:

Your text file is encoded as UTF-16.

The 2 bytes at the front are the byte order mark (BOM), and they aren't part of the text.

You must not alter them. You should skip the first 2 bytes and reverse the remainder of the bytes.

But even that will give you problems, because you can't just reverse the bytes within a UTF-16 code: doing so gives you the code for a different character, or indeed an invalid code.
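One way to avoid both problems is to leave the BOM where it is and reverse the text two bytes at a time, so each 16-bit code stays intact. The following is only a minimal sketch of that idea: the names Utf16Reverser and ReverseKeepingBom are made up for illustration, and it assumes little-endian UTF-16 text without surrogate pairs or combining characters (those would need to be kept together as well).

```csharp
using System;

public static class Utf16Reverser
{
    // Hypothetical helper: reverses UTF-16 text one 16-bit code unit
    // (2 bytes) at a time and leaves a leading little-endian BOM
    // (255 254) untouched.
    // Assumption: no surrogate pairs or combining sequences in the text.
    public static byte[] ReverseKeepingBom(byte[] input)
    {
        bool hasBom = input.Length >= 2 && input[0] == 0xFF && input[1] == 0xFE;
        int start = hasBom ? 2 : 0;
        int unitCount = (input.Length - start) / 2;

        // Start from a copy so the BOM (and any odd trailing byte) stays put.
        byte[] output = (byte[])input.Clone();

        for (int i = 0; i < unitCount; i++)
        {
            int src = start + 2 * i;
            int dst = start + 2 * (unitCount - 1 - i);
            output[dst] = input[src];         // low byte stays first
            output[dst + 1] = input[src + 1]; // high byte stays second
        }
        return output;
    }
}
```

For the 12 bytes in the question this would give 255 254 53 0 52 0 51 0 50 0 49 0, which is "54321" with the BOM still at the front.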

Anyway, what's happening is that when you reverse the order, you wind up with the BOM stuck at the end, where it forms an invalid UTF-16 code. That is the "?" character at the end that you're seeing. The reversal also messes up the encoding of the other characters.
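To see where those characters come from, you can pair up the reversed bytes by hand the way a little-endian UTF-16 decoder would. This is just an illustrative sketch; Utf16Units and ReadLittleEndianUnits are hypothetical names, not part of .NET.

```csharp
using System;

public static class Utf16Units
{
    // Hypothetical helper: groups bytes into 16-bit code units,
    // low byte first (little-endian), ignoring an odd trailing byte.
    public static ushort[] ReadLittleEndianUnits(byte[] bytes)
    {
        var units = new ushort[bytes.Length / 2];
        for (int i = 0; i < units.Length; i++)
        {
            units[i] = (ushort)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        }
        return units;
    }
}
```

Fed the reversed bytes 0 53 0 52 0 51 0 50 0 49 254 255, this yields 0x3500, 0x3400, 0x3300, 0x3200, 0x3100 and 0xFFFE. The first five are the CJK characters in the GetString output, and the last one is the old BOM read backwards: U+FFFE, which Unicode reserves as a noncharacter.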

However, it looks like Notepad is opening the file using ANSI encoding, with the code page used by your current locale.

The text file contains the bytes 0 53 0 52 0 51 0 50 0 49 254 255. Notepad is converting each 0 to a space, and the other values less than 0x80 are being converted to ASCII characters, while 254 is converted to ю and 255 to я (which I assume are the values of those characters in the ANSI code page for your current locale).
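The single-byte reading described above can be imitated with a small sketch. It assumes the locale's ANSI code page is Windows-1251, where the bytes 0xC0 to 0xFF map in order to the Cyrillic letters U+0410 to U+044F, and it copies the description above by showing 0 as a space. AnsiViewSketch and DecodeLikeCp1251 are hypothetical names, and the bytes 0x80 to 0xBF are deliberately not handled.

```csharp
using System;
using System.Text;

public static class AnsiViewSketch
{
    // Hypothetical helper: decodes one byte per character, roughly the way
    // an editor using Windows-1251 would (assumed code page).
    public static string DecodeLikeCp1251(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
        {
            if (b == 0)
                sb.Append(' ');                              // shown as a space, as described above
            else if (b < 0x80)
                sb.Append((char)b);                          // plain ASCII
            else if (b >= 0xC0)
                sb.Append((char)(0x0410 + (b - 0xC0)));      // Cyrillic block of Windows-1251
            else
                sb.Append('?');                              // 0x80..0xBF not covered by this sketch
        }
        return sb.ToString();
    }
}
```

With the reversed bytes as input, 53 to 49 come out as the digits 5 to 1, each preceded by a space, and 254 and 255 come out as ю and я.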

I'm guessing you're in a Slavic region that uses Cyrillic script.
