python - GET HTML data from multiple URLs on a website in one connection


I have a Python script that takes a few URLs as input. The script loops through each of these URLs and prints out the HTML text of each page. Would the website see 3 separate requests, and therefore 3 "hits" on the site, or would it see a single socket connection, and therefore only 1 "hit"?

I think it's the first option from checking the debug output. If so, is it possible to get data from multiple URLs on the same site so that the site sees only 1 "hit"? Can I use the keep-alive functionality in urllib3 to achieve this?

My script is below:

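    import cookielib
    import urllib2

    cj = cookielib.CookieJar()  # cookie jar (assumed; its definition is not shown in the original)
    # One opener with cookie handling and HTTP debug output, built once outside the loop
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj),
                                  urllib2.HTTPHandler(debuglevel=1))

    for u in url:  # url: the list of input URLs, defined elsewhere
        req = urllib2.Request(u)
        req.add_header('User-Agent', 'Mozilla/5.0')
        resp = opener.open(req)
        htmltext = resp.read()
        print htmltext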

Would the website see 3 separate requests, and therefore 3 "hits" on the site, or would it see a single socket connection, and therefore only 1 "hit"?

Yes, it sees 3 requests. Even if you reuse socket connections, these are still 3 distinct requests (over 1 socket). The server's access log will show 3 requests regardless of how many connections you've used.

The benefit of reusing connections is that creating a new TCP socket and negotiating the handshake with the server is a relatively expensive procedure. It can take more time than retrieving the HTTP response body itself. By reusing the connection, you can skip that part after the first request.
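To make the distinction between requests and connections concrete, here is a minimal sketch in Python 3 using only the standard library (not the urllib2 code from the question). It assumes a throwaway local test server; the names `CountingHandler`, `fetch_all`, and `demo`, and the paths `/a`, `/b`, `/c`, are made up for illustration. The client sends three GET requests over one `http.client.HTTPConnection`, and the server records the client port and path of every request it handles.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that records every request it serves."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open by default
    requests_seen = []  # list of (client_port, path), one entry per request

    def do_GET(self):
        CountingHandler.requests_seen.append((self.client_address[1], self.path))
        body = b"<html>page " + self.path.encode() + b"</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch_all(host, port, paths):
    """Fetch every path over a single, reused TCP connection."""
    conn = http.client.HTTPConnection(host, port)
    pages = []
    try:
        for path in paths:
            conn.request("GET", path, headers={"User-Agent": "Mozilla/5.0"})
            resp = conn.getresponse()
            # The body must be read fully before the next request is sent.
            pages.append(resp.read())
    finally:
        conn.close()
    return pages


def demo():
    """Run the local server, fetch three pages, and report what the server saw."""
    CountingHandler.requests_seen.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        pages = fetch_all("127.0.0.1", server.server_address[1], ["/a", "/b", "/c"])
    finally:
        server.shutdown()
        server.server_close()
    requests = len(CountingHandler.requests_seen)
    connections = len({port for port, _ in CountingHandler.requests_seen})
    return pages, requests, connections


if __name__ == "__main__":
    pages, requests, connections = demo()
    print("requests seen by server:", requests)
    print("distinct client connections:", connections)
```

The server handles each GET separately even though all of them arrive from the same client port, which is what the answer above describes. Higher-level clients such as urllib3's `PoolManager` or a `requests.Session` reuse connections to the same host in a similar way, so with those you normally do not manage the connection object yourself.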

