R - How to submit a login form in the rvest package without a button argument
I am trying to scrape a web page that requires authentication, using html_session() and html_form() from the rvest package. I found this example provided by Hadley Wickham, but I am not able to customize it to my case.
    united <- html_session("http://www.united.com/")
    account <- united %>% follow_link("account")
    login <- account %>%
      html_nodes("form") %>%
      extract2(1) %>%
      html_form() %>%
      set_values(
        `ctl00$contentinfo$signin$onepass$txtfield` = "gy797363",
        `ctl00$contentinfo$signin$password$txtpassword` = password
      )
    account <- account %>% submit_form(login, "ctl00$contentinfo$signinsecure")

In my case, I can't find the values to set in the form, so I am trying to give the user and pass directly:

    set_values("email", "password")
I also don't know how to refer to the submit button, so I tried:

    submit_form(account, login)

The error I got from the submit_form function is:

    Error in names(submits)[[1]] : subscript out of bounds

Any idea on how to go about this is appreciated. Thank you.
Currently, this issue is the same as open issue #159 in the rvest package, which causes problems when not all fields in a form have a type value. This bug may be fixed in a future release.

However, we can work around the issue by monkey patching the underlying function rvest:::submit_request.

The core problem is the helper function is_submit. Initially, it's defined like this:
    is_submit <- function(x) tolower(x$type) %in% c("submit", "image", "button")

As logical as this is, however, it fails in two scenarios:

- There is no type element.
- The type element is NULL.
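To see why these two cases matter, here is a minimal base R sketch. The `fields` list is made-up data standing in for `form$fields`, not the real United form, and `is_submit_safe` is a hypothetical name for a guarded version of the check:

```r
# Original helper, as quoted above.
is_submit <- function(x) tolower(x$type) %in% c("submit", "image", "button")

# Guarded variant (hypothetical name): a missing or NULL type is "not a submit".
is_submit_safe <- function(x) {
  if (is.null(x$type)) {
    return(FALSE)
  }
  tolower(x$type) %in% c("submit", "image", "button")
}

# Made-up fields standing in for form$fields; "token" has no type element.
fields <- list(
  token  = list(name = "token", value = "abc"),
  user   = list(name = "user", type = "text", value = ""),
  signin = list(name = "signin", type = "submit", value = "Sign in")
)

# For "token" the original helper returns a zero-length logical, not FALSE.
length(is_submit(fields$token))

# Filter() unlists the per-field results, so the empty result is dropped and
# the remaining TRUE/FALSE values no longer line up with the fields.
unguarded <- names(Filter(is_submit, fields))
guarded   <- names(Filter(is_submit_safe, fields))
print(unguarded)
print(guarded)
```

With this mock data the unguarded helper picks `user` rather than `signin`, because the empty result for `token` shifts every later position by one. The guarded version picks `signin`.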
Both of these happen to occur on the United login form. We can resolve this by adding two checks inside the function.
    custom.submit_request <- function(form, submit = NULL) {
      # Patched helper: a field with a missing or NULL type is never a submit target.
      is_submit <- function(x) {
        if (!exists("type", x) || is.null(x$type)) {
          return(FALSE)
        }
        tolower(x$type) %in% c("submit", "image", "button")
      }

      submits <- Filter(is_submit, form$fields)
      if (length(submits) == 0) {
        stop("Could not find possible submission target.", call. = FALSE)
      }
      if (is.null(submit)) {
        submit <- names(submits)[[1]]
        message("Submitting with '", submit, "'")
      }
      if (!(submit %in% names(submits))) {
        stop("Unknown submission name '", submit, "'.\n",
             "Possible values: ", paste0(names(submits), collapse = ", "),
             call. = FALSE)
      }
      other_submits <- setdiff(names(submits), submit)

      method <- form$method
      if (!(method %in% c("POST", "GET"))) {
        warning("Invalid method (", method, "), defaulting to GET", call. = FALSE)
        method <- "GET"
      }

      url <- form$url
      # Keep only fields that carry a value, and drop the unused submit buttons.
      fields <- form$fields
      fields <- Filter(function(x) length(x$value) > 0, fields)
      fields <- fields[setdiff(names(fields), other_submits)]
      values <- pluck(fields, "value")
      names(values) <- names(fields)

      list(method = method, encode = form$enctype, url = url, values = values)
    }

To monkey patch, we need to use the R.utils package (install it via install.packages("R.utils") if you don't have it).
    library(R.utils)
    reassignInPackage("submit_request", "rvest", custom.submit_request)

From there, we can issue our own request.

    account <- account %>% submit_form(login, "ctl00$contentinfo$signinsecure")

And that works!
(Well, "works" is a misnomer. Because United employs more aggressive authentication requirements, including known browsers, this results in a 301 Unauthorized. However, it does fix the error.)

A full reproducible example involved a couple of other minor code changes:
    library(magrittr)
    library(rvest)

    url <- "https://www.united.com/web/en-us/apps/account/account.aspx"
    account <- html_session(url)
    login <- account %>%
      html_nodes("form") %>%
      extract2(1) %>%
      html_form() %>%
      set_values(
        `ctl00$contentinfo$signin$onepass$txtfield` = "user",
        `ctl00$contentinfo$signin$password$txtpassword` = "pass"
      )
    account <- account %>% submit_form(login, "ctl00$contentinfo$signinsecure")
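If it is unclear which name to pass as the submit argument, it may help to tabulate each field's name and type first. This is only a sketch in base R: it assumes the parsed form exposes a `fields` list whose elements carry `type`, as the patched function above does, while `field_types` is a hypothetical helper and `mock_form` is made-up data rather than the United form:

```r
# Hypothetical helper: one row per field, with NA where type is missing or NULL.
field_types <- function(form) {
  types <- vapply(
    form$fields,
    function(x) if (is.null(x$type)) NA_character_ else tolower(x$type),
    character(1)
  )
  data.frame(
    name = names(form$fields),
    type = unname(types),
    stringsAsFactors = FALSE
  )
}

# Made-up form; the first field deliberately has no type element.
mock_form <- list(
  fields = list(
    token  = list(name = "token", value = "abc"),
    user   = list(name = "user", type = "text", value = ""),
    signin = list(name = "signin", type = "Submit", value = "Sign in")
  )
)

tbl <- field_types(mock_form)
print(tbl)

# Names that could be passed as the submit argument.
candidates <- tbl$name[tbl$type %in% c("submit", "image", "button")]
print(candidates)
```

Fields without a type show up as NA instead of breaking the lookup, so the candidate list contains only real submit targets.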