yammer - Web scraping a password-protected website using R


I am trying to web scrape Yammer data using R, but to do so I first have to log in on this page (which is the authentication page for an app I created):

https://www.yammer.com/dialog/authenticate?client_id=ivgck1tohbzgs7zc8dpjg

I am able to get the Yammer data once I have logged in through this page in the browser, using the standard Yammer URLs (https://www.yammer.com/api/v1/messages/received.json).

I have read through similar questions and tried the suggestions, but I still can't get past this issue.

I have tried using httr, RSelenium, and rvest + SelectorGadget.

The end goal is to do everything in R (getting the data, cleaning, sentiment analysis...). The cleaning and sentiment analysis parts are done; only the getting-data part is still manual, and I want to automate it by handling it in R.

1. Trial using httr:

usinghttr <- GET("https://www.yammer.com/dialog/authenticate?client_id=ivgck1tohbzgs7zc8dpjg", authenticate("username", "password"))

Corresponding result:

Response [https://www.yammer.com/dialog/authenticate?client_id=ivgck1tohbzgs7zc8dpjg]
  Date: 2015-04-27 12:25
  Status: 200
  Content-Type: text/html; charset=utf-8
  Size: 15.7 kB

The content of the page showed that it had opened the login page but had not authenticated.

2. Trial using SelectorGadget + rvest

I tried scraping Wikipedia using this method, but I couldn't apply it to Yammer because authentication is required before I can call the HTML tag that SelectorGadget gives.

3. Trial using RSelenium

I tried using both standard browsers and PhantomJS, but got errors:

> startServer()
> remDr <- remoteDriver$new()
> remDr$open()
[1] "Connecting to remote server"
Undefined error in RCurl call.
Error in queryRD(paste0(serverURL, "/session"), "POST", qdata = toJSON(serverOpts)) :

> pJS <- phantom()
Error in phantom() : PhantomJS binary not located.

I spent a long time trying to get access to password-protected sites from inside R. I managed it by submitting the credentials through the HTML form. I had a quick look at the login page on Yammer, and it seems similar to the case where I managed to get access.

Here is the code I used. You need to adapt it to your context: first start a session on the login page, then reach the form that collects the ID and password, and submit the form. I think that in your case, the code below should work:

library(rvest)

# Start a session on the login page
session <- html_session("https://www.yammer.com/dialog/authenticate?client_id=ivgck1tohbzgs7zc8dpjg")

# Reach the login form and fill in the credentials
login_form <- session %>% html_nodes("form") %>%
    .... %>%  # instructions that lead to the login form, e.g. extract2(1)
    html_form() %>%
    set_values(`login` = yourid, `password` = yourpasswd)

# Submit the form from within the session
logged_in <- session %>% submit_form(login_form)

logged_in should contain the session information after logging in.
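As a rough sketch of the steps around the login, the two helpers below show how one might inspect the forms on the login page (to decide which one to pick in the elided step) and how one might reuse the logged-in session to request the JSON URL from the question. The names list_forms and fetch_json are hypothetical, the code assumes the same pre-1.0 rvest API as above (html_session(), jump_to()), and whether Yammer accepts the session cookies for that endpoint is an assumption that has not been verified here.

```r
library(rvest)     # assumes rvest < 1.0, matching html_session()/submit_form() above
library(jsonlite)

# Hypothetical helper: list every form on the current page of a session,
# so you can see which one contains the login and password fields.
list_forms <- function(session) {
  session %>% html_nodes("form") %>% html_form()
}

# Hypothetical helper: request a JSON URL through an existing session
# (reusing its cookies) and parse the response body.
# Assumption: the endpoint returns JSON to a cookie-authenticated session.
fetch_json <- function(session, url) {
  page <- jump_to(session, url)
  body <- httr::content(page$response, as = "text", encoding = "UTF-8")
  fromJSON(body)
}

# Usage, once `session` and `logged_in` exist:
# list_forms(session)
# received <- fetch_json(logged_in, "https://www.yammer.com/api/v1/messages/received.json")
# str(received, max.level = 1)
```

If the result of fetch_json looks like an HTML login page rather than message data, the form submission probably did not authenticate, and the field names passed to set_values are the first thing to check. In rvest 1.0 and later the functions used here were renamed (session(), session_jump_to(), html_form_set(), session_submit()).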
