html - Parsing XML attribute: strange encoding issue
I have a strange encoding problem when I try to parse an attribute of an XML/HTML document. Here is a reproducible example containing 2 items with 2 titles (note the use of French accents here):
    library(XML)
    doc <- htmlParse('<note> <item title="é">1</item> <item title="ï">3</item> </note>', asText=TRUE, encoding='UTF-8')

Now, using xpathApply, I can read the items like this. Note that the special accents are well formatted here.
    xpathApply(doc,'//item')
    [[1]]
    <item title="é">1</item>
    [[2]]
    <item title="ï">3</item>

But when I try to read the title attribute, I get this:
    xpathApply(doc,'//item',xmlGetAttr,'title')
    [[1]]
    [1] "é"
    [[2]]
    [1] "ï"

I tried other XPath versions, like:
    xpathApply(doc,'//item/@title')
    xmlAttrs(xpathApply(doc,'//item')[[1]])

But this doesn't work. Any help, please?
It's not pretty, and I can't test on a Linux machine, but try:
    xpathApply(doc,'//item', function(x) iconv(xmlAttrs(x,'title'), "UTF-8", "UTF-8"))
    [[1]]
    title
      "é"
    [[2]]
    title
      "ï"

xmlAttrs calls RS_XML_xmlNodeAttributes, and examining the code, there appears to be no facility for handling encoding. xmlValue calls R_xmlNodeValue, which has encoding added. Looking at ?xmlValue, we have "encoding: experimental functionality and parameter related to encoding". So maybe encoding on attributes will be added at a later date.
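
To illustrate why the iconv call can help, here is a minimal base R sketch. It assumes the attribute strings come back holding UTF-8 bytes but without a UTF-8 encoding mark. The helper name mark_utf8 is hypothetical and is not part of the XML package.

```r
# Build "é" from its UTF-8 bytes (c3 a9) so the example does not depend
# on the encoding of this script file. The result carries no encoding mark.
s <- rawToChar(as.raw(c(0xc3, 0xa9)))

# Simulate the misreading: the same two bytes declared as latin1 and then
# converted to UTF-8 become two separate characters.
wrong <- s
Encoding(wrong) <- "latin1"
garbled <- enc2utf8(wrong)

# Hypothetical helper: declare that the bytes are already UTF-8.
# No bytes are changed, only the encoding mark.
mark_utf8 <- function(x) {
  Encoding(x) <- "UTF-8"
  x
}
fixed <- mark_utf8(s)

# Converting UTF-8 to UTF-8 with iconv leaves the bytes alone as well;
# with its default mark = TRUE the result should come back marked as UTF-8.
via_iconv <- iconv(s, "UTF-8", "UTF-8")

print(utf8ToInt(garbled))  # code points of the misread string
print(utf8ToInt(fixed))    # code point of the correctly marked string
```

If that assumption holds, applying mark_utf8 to each result of xpathApply(doc, '//item', xmlGetAttr, 'title') would be an alternative to the iconv call. This has not been tested against the XML package.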