Extract HTML table with superscripts using R
I am trying to extract a table from this webpage using R, with the following code:
    library('htmltab')
    url <- "http://www.math.leidenuniv.nl/~desmit/abc/index.php?set=2"
    app.data <- htmltab(url, which = 3, rm_superscript = FALSE, rm_whitespace = FALSE, rm_invisible = FALSE)

However, the superscripts are merged into the main text, so a table entry such as 3^{10}109 comes out as 310109, which is not the same thing. If one sets rm_superscript = TRUE, the output is e.g. 3109, i.e. the superscripts are dropped entirely, which is not right either. I'd like the superscripts to be indicated, so that the output is 3^{10}109. Can anyone help? Thanks!
Here's an alternate approach.
    library(xml2)
    library(rvest)

    url <- "http://www.math.leidenuniv.nl/~desmit/abc/index.php?set=2"
    pg <- read_html(url)

    # extract the table and convert it to raw HTML
    tab <- as.character(html_nodes(pg, "table")[[3]])

    # manually replace <sup></sup> with {}, then convert and extract the table
    dat <- html_table(read_html(gsub("</sup>", "}", gsub("<sup>", "{", tab))))[[1]]

    head(dat)
    ## quality size merit on b c
    ## 1 1 1.6299 6.81 8.64 er 19870101 2 3{10}109 23{5}
    ## 2 2 1.6260 7.68 10.18 bdw 19850920 11{2} 3{2}5{6}7{3} 2{21}23
    ## 3 3 1.6235 15.70 26.86 jb jb 19940401 19·1307 7·29{2}31{8} 2{8}3{22}5{4}
    ## 4 4 1.5808 9.92 13.01 jb jb 19930312 283 5{11}13{2} 2{8}3{8}17{3}
    ## 5 5 1.5679 3.64 2.89 bdw 19880106 1 2·3{7} 5{4}7
    ## 6 6 1.5471 4.77 4.17 bdw 19880106 7{3} 3{10} 2{11}29
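The output above marks exponents as {10}, while the question asked for ^{10}. A minimal sketch of that variation follows, using base R only. The helper name mark_superscripts and the sample cell are made up for illustration; they are not part of rvest or htmltab. The pattern also assumes the page might write the tag in upper case or with attributes, which may not apply here.

```r
# Hypothetical helper (not from rvest or htmltab): rewrite <sup>...</sup>
# as ^{...} in a raw HTML string so the exponent survives text extraction.
mark_superscripts <- function(html) {
  # Opening tag, allowing attributes and either letter case
  html <- gsub("<sup[^>]*>", "^{", html, ignore.case = TRUE)
  # Closing tag
  gsub("</sup>", "}", html, ignore.case = TRUE)
}

# Made-up table cell modelled on the 3^{10}109 entry from the question
cell <- "<td>3<sup>10</sup>109</td>"
mark_superscripts(cell)
# [1] "<td>3^{10}109</td>"
```

In the approach above, this would presumably replace the two nested gsub() calls, i.e. html_table(read_html(mark_superscripts(tab)))[[1]], though that has not been run against the live page.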