Parsing an XML attribute: strange encoding issue


I get a strange encoding problem when I try to parse an attribute of an XML/HTML document. Here is a reproducible example containing 2 items with 2 titles (note the use of French accents here):

library(XML)
doc <- htmlParse('<note>
                    <item title="é">1</item>
                    <item title="ï">3</item>
                  </note>', asText = TRUE, encoding = 'UTF-8')

Now, using xpathApply, I can read the items like this. Note that the special accents are correctly formatted here.

xpathApply(doc, '//item')
[[1]]
<item title="é">1</item>

[[2]]
<item title="ï">3</item>

But when I try to read the title attribute, I get this:

xpathApply(doc, '//item', xmlGetAttr, 'title')
[[1]]
[1] "Ã©"

[[2]]
[1] "Ã¯"

I tried other XPath variants, such as:

xpathApply(doc, '//item/@title')
xmlAttrs(xpathApply(doc, '//item')[[1]])

But these don't work either. Any help, please?

It's not pretty, and I can't test it on a Linux machine, but try:

xpathApply(doc, '//item',
           function(x) iconv(xmlAttrs(x, 'title'), "UTF-8", "UTF-8"))
[[1]]
title
  "é"

[[2]]
title
  "ï"
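As a rough illustration of why converting from UTF-8 to UTF-8 can help, here is a base-R sketch with no XML involved. It assumes the attribute comes back as UTF-8 bytes without an encoding mark, which is my reading of the symptom rather than something confirmed from the package; the variable names are made up for the example.

```r
# The two bytes that encode "é" in UTF-8, with no encoding mark attached.
raw_title <- rawToChar(as.raw(c(0xc3, 0xa9)))
Encoding(raw_title)

# Treating the same bytes as latin1 gives two characters instead of one,
# which is the kind of garbling seen in the attribute output.
as_latin1 <- raw_title
Encoding(as_latin1) <- "latin1"
nchar(as_latin1)

# iconv leaves the bytes alone here but marks the result as UTF-8,
# so R treats it as a single character again.
fixed <- iconv(raw_title, "UTF-8", "UTF-8")
Encoding(fixed)
nchar(fixed)
```

If that assumption holds, the bytes were never wrong; only the encoding mark was missing.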

xmlAttrs calls RS_XML_xmlNodeAttributes, and examining the code, there appears to be no facility for handling the encoding. xmlValue calls R_xmlNodeValue, which has had encoding added. Looking at ?xmlValue, we have "encoding: experimental functionality and parameter related to encoding". So maybe encoding on attributes is to be added at a later date.
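Until then, one possible workaround is to mark the attribute values as UTF-8 yourself. This is only a sketch: get_attr_utf8 is a hypothetical helper, not part of the XML package, and it assumes the document really is UTF-8 and that the values come back as unmarked UTF-8 bytes. The accents are written as \u escapes so the snippet does not depend on the encoding of the script file.

```r
library(XML)

doc <- htmlParse(
  '<note><item title="\u00e9">1</item><item title="\u00ef">3</item></note>',
  asText = TRUE, encoding = "UTF-8")

# Hypothetical helper: read one attribute and mark its value as UTF-8.
get_attr_utf8 <- function(node, name) {
  value <- xmlGetAttr(node, name)
  if (!is.null(value)) Encoding(value) <- "UTF-8"
  value
}

titles <- xpathSApply(doc, "//item", get_attr_utf8, "title")
titles
```

Setting Encoding() only changes the mark, not the bytes, so it should be harmless if a version of the package already returns correctly marked strings.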

