How to submit a login form in the rvest package without a button argument


I am trying to scrape a web page that requires authentication, using html_session() and html_form() from the rvest package. I found this example provided by Hadley Wickham, but I am not able to customize it for my case:

```r
library(rvest)
library(magrittr)  # for extract2()

united <- html_session("http://www.united.com/")
account <- united %>% follow_link("account")

# `password` must already hold the account password
login <- account %>%
  html_nodes("form") %>%
  extract2(1) %>%
  html_form() %>%
  set_values(
    `ctl00$contentinfo$signin$onepass$txtfield` = "gy797363",
    `ctl00$contentinfo$signin$password$txtpassword` = password)

account <- account %>%
  submit_form(login, "ctl00$contentinfo$signinsecure")
```

In my case, I can't find the names of the fields to set in the form, so I am trying to give the user and password directly: set_values("email", "password")

I don't know how to refer to the submit button. I tried: submit_form(account, login)

The error I got from the submit_form function is: Error in names(submits)[[1]] : subscript out of bounds

Any idea on how to go about this would be appreciated. Thank you.

Currently, this is the same problem as open issue #159 in the rvest package: not all fields in a form have a type value, and the submit logic breaks on those fields. It may be fixed in a future release.

However, you can work around the issue by monkey patching the underlying function rvest:::submit_request.

The core problem is the helper function is_submit. Initially, it's defined like this:

```r
is_submit <- function(x) tolower(x$type) %in% c("submit", "image", "button")
```

This logic, however, fails in two scenarios:

  1. There is no type element.
  2. The type element is NULL.

Both of these happen to occur on the United login form. We can resolve this by adding two checks inside the function:
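To see why a missing type breaks submission, here is a minimal sketch that runs without rvest. It assumes plain named lists can stand in for rvest's form fields; the field names (email, remember, go) are hypothetical. Because tolower(NULL) is character(0), the original helper returns logical(0) instead of FALSE. Base R's Filter() unlists the per-field results, so that empty result shifts every later index and the wrong field is selected.

```r
# The helper as originally defined in rvest (quoted above)
is_submit_orig <- function(x) tolower(x$type) %in% c("submit", "image", "button")

# The same helper with the two extra checks (missing type, NULL type)
is_submit_fixed <- function(x) {
  if (!("type" %in% names(x)) || is.null(x$type)) {
    return(FALSE)
  }
  tolower(x$type) %in% c("submit", "image", "button")
}

# Hypothetical form fields, modelled as plain named lists
fields <- list(
  email    = list(name = "email", type = "text"),
  remember = list(name = "remember"),            # no type element at all
  go       = list(name = "go", type = "submit")
)

# tolower(NULL) is character(0), so the original returns logical(0), not FALSE
print(is_submit_orig(fields$remember))

# Filter() unlists the results, so the missing value shifts later indices
# and a non-submit field is picked as the "submit button"
print(names(Filter(is_submit_orig, fields)))   # "remember"
print(names(Filter(is_submit_fixed, fields)))  # "go"
```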

```r
custom.submit_request <- function(form, submit = NULL) {
  # Fields with no type element, or a NULL type, are never submit buttons
  is_submit <- function(x) {
    if (!("type" %in% names(x)) || is.null(x$type)) {
      return(FALSE)
    }
    tolower(x$type) %in% c("submit", "image", "button")
  }

  submits <- Filter(is_submit, form$fields)
  if (length(submits) == 0) {
    stop("Could not find possible submission target.", call. = FALSE)
  }
  if (is.null(submit)) {
    submit <- names(submits)[[1]]
    message("Submitting with '", submit, "'")
  }
  if (!(submit %in% names(submits))) {
    stop("Unknown submission name '", submit, "'.\n",
         "Possible values: ", paste0(names(submits), collapse = ", "),
         call. = FALSE)
  }
  other_submits <- setdiff(names(submits), submit)

  # form$method is stored in upper case (e.g. "POST")
  method <- form$method
  if (!(method %in% c("POST", "GET"))) {
    warning("Invalid method (", method, "), defaulting to GET", call. = FALSE)
    method <- "GET"
  }

  url <- form$url
  fields <- form$fields
  fields <- Filter(function(x) length(x$value) > 0, fields)
  fields <- fields[setdiff(names(fields), other_submits)]

  # lapply() instead of rvest's internal pluck(), so the patched function
  # does not depend on unexported rvest helpers
  values <- lapply(fields, function(x) x$value)
  names(values) <- names(fields)

  list(method = method, encode = form$enctype, url = url, values = values)
}
```
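The request-building part of the patched function can also be checked offline. The sketch below again uses plain named lists as stand-ins for rvest's form fields, and the field names are hypothetical. It shows which name/value pairs end up in the request body when one of two submit buttons is "clicked": empty fields and the other submit button are dropped.

```r
# Patched helper: fields without a usable type are never submit buttons
is_submit <- function(x) {
  if (!("type" %in% names(x)) || is.null(x$type)) return(FALSE)
  tolower(x$type) %in% c("submit", "image", "button")
}

# Hypothetical login form fields (plain lists standing in for rvest fields)
fields <- list(
  email    = list(name = "email", type = "text", value = "user"),
  password = list(name = "password", type = "password", value = "pass"),
  remember = list(name = "remember", value = NULL),   # no type, no value
  signin   = list(name = "signin", type = "submit", value = "Sign in"),
  cancel   = list(name = "cancel", type = "submit", value = "Cancel")
)
submit <- "signin"  # the button being "clicked"

submits <- Filter(is_submit, fields)
other_submits <- setdiff(names(submits), submit)

# Drop empty fields and every submit button except the chosen one
fields <- Filter(function(x) length(x$value) > 0, fields)
fields <- fields[setdiff(names(fields), other_submits)]

# Collect name -> value pairs for the request body
values <- lapply(fields, function(x) x$value)
str(values)
```

Only email, password and signin survive. The remember field is dropped because its value is empty, and cancel is dropped because it is a submit button other than the chosen one.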

To monkey patch, you need the R.utils package (install it via install.packages("R.utils") if you don't have it).

```r
library(rvest)
library(R.utils)

# Replace rvest's internal submit_request with the patched version
reassignInPackage("submit_request", "rvest", custom.submit_request)
```
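Replacing submit_request inside rvest's namespace changes what submit_form does because R resolves submit_request in that namespace at call time. Package namespaces also lock their bindings, so a patch has to unlock, assign and relock. The minimal base-R sketch below shows that mechanism using a toy environment with hypothetical names (ns, helper, caller) in place of the real rvest namespace. It is an illustration of the idea, not of R.utils internals.

```r
# A toy stand-in for a package namespace (hypothetical names)
ns <- new.env()
ns$helper <- function() "original"

# caller looks up `helper` in ns at call time, the way submit_form
# looks up submit_request inside the rvest namespace
caller <- function() helper()
environment(caller) <- ns
ns$caller <- caller
before <- ns$caller()

# Namespaces lock their bindings, so a plain assignment fails
lockBinding("helper", ns)
blocked <- tryCatch({
  ns$helper <- function() "patched"
  FALSE
}, error = function(e) TRUE)

# Unlock, replace, relock: the steps a monkey-patching helper must take
unlockBinding("helper", ns)
ns$helper <- function() "patched"
lockBinding("helper", ns)

after <- ns$caller()
print(c(before = before, after = after))
```

Base R's utils::assignInNamespace() also performs the unlock/assign/relock steps, so it can be used instead of R.utils if you prefer not to install an extra package.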

From there, we can issue our own request:

```r
account <- account %>%
  submit_form(login, "ctl00$contentinfo$signinsecure")
```

And it works!

(Well, "works" is a misnomer. Because United employs more aggressive authentication requirements, including checks for known browsers, the request results in a 301 Unauthorized response. However, the patch does fix the error.)

A full reproducible example involved a couple of other minor code changes:

```r
library(magrittr)
library(rvest)

url <- "https://www.united.com/web/en-us/apps/account/account.aspx"
account <- html_session(url)

login <- account %>%
  html_nodes("form") %>%
  extract2(1) %>%
  html_form() %>%
  set_values(
    `ctl00$contentinfo$signin$onepass$txtfield` = "user",
    `ctl00$contentinfo$signin$password$txtpassword` = "pass")

account <- account %>%
  submit_form(login, "ctl00$contentinfo$signinsecure")
```
