Java - StringEscapeUtils.unescapeHtml doesn't work on strings read from files
I'm trying to read in a file that contains Unicode characters, convert the characters to their corresponding symbols, and print the resulting text to a new file. I'm trying to use StringEscapeUtils.unescapeHtml, but the lines are being printed as-is, with the Unicode code points still intact. I did a practice run by copying a single line from the file, making it a string, and calling StringEscapeUtils.unescapeHtml on that, and it works perfectly. My code is below:
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import org.apache.commons.lang3.StringEscapeUtils;

    class FileWrite {
        public static void main(String args[]) {
            try {
                String testString = " \"text\":\"dude knit hat @ party calls beer \u2018libations\u2019 http://t.co/rop8nsnrfu\" ";
                FileReader inStream = new FileReader("home timeline.txt");
                BufferedReader b = new BufferedReader(inStream);
                FileWriter fstream = new FileWriter("out.txt");
                BufferedWriter out = new BufferedWriter(fstream);
                // this gives the desired output,
                // with the Unicode points converted
                out.write(StringEscapeUtils.unescapeHtml3(testString) + "\n");
                String line = b.readLine();
                while (line != null) {
                    out.write(StringEscapeUtils.unescapeHtml3(line) + "\n");
                    line = b.readLine();
                }
                // close the input and output streams
                b.close();
                out.close();
            } catch (Exception e) { // catch exception if any
                System.err.println("Error: " + e.getMessage());
            }
        }
    }
    // this gives the desired output,
    // with the Unicode points converted
    out.write(StringEscapeUtils.unescapeHtml3(testString) + "\n");
You are mistaken. Java translates Unicode escapes of this form at compile time, when it compiles the source into the class file:

    "\u2018libations\u2019"
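To see the compile-time translation directly, here is a small sketch (the class name `UnicodeEscapeDemo` is made up for illustration). It compares the literal from the question with the same text written using a doubled backslash, which is roughly what a line read from the file contains at run time: the escape survives as six ordinary characters instead of becoming one.

```java
// Hypothetical demo class; not part of the question or the answer.
class UnicodeEscapeDemo {
    // The compiler replaces each backslash-u escape here with a single
    // character before the program runs, so this holds U+2018 and U+2019.
    static final String COMPILED = "\u2018libations\u2019";

    // The doubled backslash prevents that translation: the string keeps the
    // six-character escape text, just like a line read from a file.
    static final String FROM_FILE = "\\u2018libations\\u2019";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());              // 11
        System.out.println(FROM_FILE.length());             // 21
        System.out.println(COMPILED.charAt(0) == '\u2018'); // true
        System.out.println(FROM_FILE.charAt(0) == '\\');    // true
    }
}
```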
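By the time unescapeHtml3 runs, testString already holds the ‘ and ’ characters, so the method has nothing to do and the output merely looks correct. There are no HTML 3 escapes in the code. The method you have chosen is designed to unescape escape sequences of the form &lsquo;.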
you want unescapejava method.