python - GET HTML data from multiple URLs on a website in one connection
I have a Python script that takes a few URLs as input. The script loops through each of these URLs and prints out the HTML text of each page. Would the website see 3 separate requests, and therefore 3 "hits" on the site, or would it see one socket connection and count it as 1 "hit"?
I think it's the first option, based on checking the debug output. If so, is it possible to get data for multiple URLs on the same site so that the site only sees 1 "hit"? Can I use the keep-alive functionality in urllib3 to achieve this?
My script is below:
for u in url:
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    req = urllib2.Request(u)
    req.add_header('User-Agent', 'Mozilla/5.0')
    print urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1)).open(req)
    resp = opener.open(req)
    htmltext = resp.read()
Would the website see 3 separate requests, and therefore 3 "hits" on the site, or would it see one socket connection and count it as 1 "hit"?
Yes. Even if you reuse socket connections, there are still 3 distinct requests (over 1 socket). The server's access log will show 3 requests regardless of how many connections you've used.
The benefit of reusing connections is that creating a new TCP socket and negotiating the handshake with the server is a relatively expensive procedure; it can take more time than retrieving the HTTP response body itself. By reusing the connection, you can skip that part after the first request.
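As a minimal sketch, assuming you can switch to urllib3 as you mention: an HTTPConnectionPool keeps the underlying socket open between requests to the same host, so the handshake cost is paid only once. The host name and paths below are placeholders, not taken from your script.

import urllib3

# One pool per host; keep-alive is handled automatically, so requests
# made through the same pool reuse the underlying TCP connection.
pool = urllib3.HTTPConnectionPool('example.com', maxsize=1,
                                  headers={'User-Agent': 'Mozilla/5.0'})

paths = ['/page1', '/page2', '/page3']   # hypothetical paths on the same site
for path in paths:
    resp = pool.request('GET', path)     # each call is still one request ("hit")
    htmltext = resp.data
    print(htmltext)

Note that urllib3 does not manage a cookie jar the way urllib2's HTTPCookieProcessor does, so if you rely on cj you would need to pass cookie headers yourself or handle them separately.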