Yammer: Web scraping a password-protected website using R
I want to web scrape Yammer data using R, but to do so I first have to log in at this page (the authentication page for an app I created):
https://www.yammer.com/dialog/authenticate?client_id=ivgck1tohbzgs7zc8dpjg
I am able to get the Yammer data once I have logged in through this page in the browser, using the standard Yammer URLs (https://www.yammer.com/api/v1/messages/received.json).
I have read through similar questions and tried the suggestions, but I still can't get past this issue.
I have tried using httr, RSelenium, and rvest + SelectorGadget.
My end goal is to do everything in R (getting the data, cleaning, sentiment analysis...). The cleaning and sentiment analysis parts are done, but the getting-data part is still manual, and I want to automate it in R.
1. Trial using httr:
library(httr)
usinghttr <- GET("https://www.yammer.com/dialog/authenticate?client_id=ivgck1tohbzgs7zc8dpjg", authenticate("username", "password"))
Corresponding result:
Response [https://www.yammer.com/dialog/authenticate?client_id=ivgck1tohbzgs7zc8dpjg]
  Date: 2015-04-27 12:25
  Status: 200
  Content-Type: text/html; charset=utf-8
  Size: 15.7 kB
The content of the page showed that it had opened the login page but didn't authenticate.
2. Trial using SelectorGadget + rvest
I tried scraping Wikipedia using this method, but I couldn't apply it to Yammer, because authentication is required before I can call the HTML tags that SelectorGadget gives.
3. Trial using RSelenium
I tried using the standard browsers and PhantomJS, but got errors:
> startServer()
> remDr <- remoteDriver$new()
> remDr$open()
[1] "Connecting to remote server"
Undefined error in RCurl call.
Error in queryRD(paste0(serverURL, "/session"), "POST", qdata = toJSON(serverOpts)) :
> pjs <- phantom()
Error in phantom() : PhantomJS binary not located.
I spent a long time trying to access password-protected sites from inside R. I managed it by submitting my credentials through the HTML form. I had a quick look at the Yammer login page, and it seems similar to the case where I managed to get access.
Here is the code I used. You need to adapt it to your context: first start a session on the login page, then reach the form that collects the ID and password, and submit the form. I think that in your case, the code below should work:
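library(rvest)
library(magrittr)  # provides extract2()

# Start a session on the login page.
session <- html_session("https://www.yammer.com/dialog/authenticate?client_id=ivgck1tohbzgs7zc8dpjg")
# Locate the login form and fill in the credentials.
login_form <- session %>%
  html_nodes("form") %>%
  .... %>%  # instructions that lead to the login form, e.g. extract2(1)
  html_form() %>%
  set_values(`login` = yourid, `password` = yourpasswd)
# Submit the form; the result is the session after logging in.
logged_in <- session %>% submit_form(login_form)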
logged_in should contain the session information after logging in.
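Once logged_in exists, the same session could be reused to request the JSON URL from the question. The following is only a minimal sketch: the function name fetch_received_messages is hypothetical, it assumes the httr and jsonlite packages are installed, and it assumes that the cookies from the form login are enough for the API URL to respond (as they are in a browser, according to the question). It uses the same older rvest function names as the code above; rvest 1.0 renamed jump_to() to session_jump_to().

```r
# Sketch only: fetch a Yammer JSON endpoint through an already logged-in
# rvest session. fetch_received_messages is a hypothetical helper name.
fetch_received_messages <- function(
    logged_in,
    url = "https://www.yammer.com/api/v1/messages/received.json") {
  # Navigate within the existing session so its cookies are sent along.
  api_page <- rvest::jump_to(logged_in, url)
  # Fail with an informative error on a 4xx/5xx status
  # (for example, if the login did not actually succeed).
  httr::stop_for_status(api_page$response)
  # Read the raw body as text and parse the JSON into R lists/data frames.
  json_text <- httr::content(api_page$response, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(json_text)
}
```

It would be called as messages <- fetch_received_messages(logged_in), after which the result can go into the existing cleaning and sentiment analysis steps.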