c++ - Character Encoding independent character swap


I use this piece of code when I want to reverse a string (when I am not using std::string or other built-in functions, i.e. in C). As a beginner, when I first thought of it, I had the ASCII table in mind. I think it can work with Unicode too. I assumed that since the difference between the values (ASCII etc.) is fixed, it works.

Are there any character encodings in which this code may not work?

    char a[11], t;
    int len, i;

    strcpy(a, "particl");
    printf("%s\n", a);
    len = strlen(a);
    for (i = 0; i < (len / 2); i++)
    {
        a[i] += a[len - 1 - i];
        a[len - 1 - i] = a[i] - a[len - 1 - i];
        a[i] -= a[len - 1 - i];
    }
    printf("%s\n", a);

Update:

This link is informative in connection with this question.

This will not work with any encoding in which some (not necessarily all) codepoints require more than one char unit to represent, because you are reversing byte-by-byte instead of codepoint-by-codepoint. With the usual 8-bit char, that includes all encodings that can represent all of Unicode.

For example: in UTF-16BE, the string "hello" maps to the byte sequence 00 68 00 65 00 6c 00 6c 00 6f. Your algorithm applied to this byte sequence will produce the sequence 6f 00 6c 00 6c 00 65 00 68 00, which is the UTF-16BE encoding of the string "漀氀氀攀栀".
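To make the UTF-16BE example concrete, here is a minimal sketch of the same byte-by-byte reversal applied to a raw byte buffer. The function name reverse_bytes is made up for illustration, and it swaps through a temporary rather than with the add/subtract trick. The length is passed explicitly because UTF-16 data contains 00 bytes, so strlen would stop too early.

```c
#include <stddef.h>

/* Hypothetical helper: reverse `len` bytes in place, byte by byte.
   This is the same traversal as the loop in the question, but it takes
   an explicit length because UTF-16 text contains 0x00 bytes. */
void reverse_bytes(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        unsigned char tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
}
```

Calling reverse_bytes on the ten bytes 00 68 00 65 00 6c 00 6c 00 6f leaves the buffer holding 6f 00 6c 00 6c 00 65 00 68 00: every 16-bit unit has had its two bytes swapped as a side effect, which is why the result decodes to unrelated CJK characters.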

It gets worse: doing a codepoint-by-codepoint reversal of a Unicode string still won't produce correct results in all cases, because Unicode has many codepoints that act on their surroundings rather than standing alone as characters. As a trivial example, codepoint-reversing the string "spın̈al tap", which contains U+0308 COMBINING DIAERESIS, will produce "pat länıps"; see how the diaeresis has migrated from the n to the a? The consequences of codepoint-by-codepoint reversal on a string containing bidirectional overrides or conjoining jamo would be even more dire.
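For illustration, a codepoint-by-codepoint reversal can be sketched for UTF-8 as below. This is only a sketch under assumptions: the input is a valid, NUL-terminated UTF-8 string, and the names reverse_range and utf8_reverse_codepoints are invented for this example. It first reverses the bytes inside each multi-byte sequence, then reverses the whole buffer, which restores the byte order within each codepoint while reversing the order of the codepoints.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reverse the bytes in the half-open range [lo, hi). */
static void reverse_range(char *s, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        char tmp = s[lo];
        s[lo] = s[hi - 1];
        s[hi - 1] = tmp;
        lo++;
        hi--;
    }
}

/* Reverse a NUL-terminated string codepoint by codepoint.
   Assumes the input is valid UTF-8; malformed input is not detected. */
void utf8_reverse_codepoints(char *s)
{
    size_t len = strlen(s);
    size_t i = 0;

    while (i < len) {
        size_t j = i + 1;
        /* Continuation bytes have the bit pattern 10xxxxxx. */
        while (j < len && ((unsigned char)s[j] & 0xC0) == 0x80)
            j++;
        reverse_range(s, i, j);   /* flip the bytes of one codepoint */
        i = j;
    }
    reverse_range(s, 0, len);     /* flip everything; codepoints come out intact */
}
```

Applied to "n", U+0308, "a", "l" (bytes 6e cc 88 61 6c) this yields "l", "a", U+0308, "n": each codepoint survives intact, but the diaeresis now follows the a instead of the n, which is exactly the migration described above. Keeping combining marks attached would require reversing by grapheme cluster instead, which this sketch does not attempt.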

