The "Max retries exceeded with url" error


While writing a script today I kept hitting this error, and adding all sorts of headers made no difference.

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6',
    # the header name is "Host", without a trailing colon
    "Host": "192.168.1.1",
    "Connection": "keep-alive",
    "Accept-Encoding": "identity",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
}

I ended up asking some more experienced people, and the only hint I got was this link:

http://panyongzheng.iteye.com/blog/1952538

Since I am using Python with the requests module, I applied the first method described there, but it did not work.

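Out of boredom I tried upgrading, and that happened to fix it: my requests module was too old and did not support the following setting.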

requests.adapters.DEFAULT_RETRIES = 5

After upgrading, it worked:

pip install --upgrade requests
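
As an alternative to assigning to requests.adapters.DEFAULT_RETRIES, retries can also be configured explicitly per session by mounting an HTTPAdapter. The following is only a minimal sketch: the function name make_retrying_session, the backoff factor, and the list of status codes are my own assumptions, not something from the original setup.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=5, backoff_factor=0.5):
    """Return a Session whose HTTP and HTTPS adapters retry failed requests.

    The function name and default values are illustrative assumptions.
    """
    retry = Retry(
        total=total,                             # overall retry budget
        backoff_factor=backoff_factor,           # growing pause between retries
        status_forcelist=(500, 502, 503, 504),   # also retry on these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Requests made through the returned session (for example session.get(url, headers=headers, timeout=10)) then use this retry policy.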

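My guess is that Zhihu restricts access to these URLs; even though I set up proxies, I still ran into this problem.

The workaround is as follows:

When fetching HTML with the requests library, if a request fails, wrap it in try-except inside a loop so that it is retried, and use sleep to keep the request rate down.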

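import requests
from time import sleep

# The original snippet sat inside a function (it uses return); the wrapper name is illustrative.
def fetch_html(url, headers, querystring, ipList):
    html = ""
    # Zhihu may refuse the request, so resend in a loop with sleep, keeping the rate low
    while html == "":
        try:
            proxies = get_random_ip(ipList)  # the author's own helper, not shown here
            print("Using proxy this time: {}".format(proxies))
            r = requests.request("GET", url, headers=headers, params=querystring, proxies=proxies)
            r.encoding = 'utf-8'
            html = r.text
            return html
        except requests.exceptions.RequestException:
            # catch only request failures rather than every exception
            print("Connection refused by the server..")
            print("Let me sleep for 5 seconds")
            print("ZZzzzz...")
            sleep(5)
            print("Was a nice sleep, now let me continue...")
            continue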

That should take care of the problem.
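
One caveat with a retry loop like this is that it never gives up if the server keeps refusing. A bounded variant with a growing delay might look like the sketch below. The helper retry_with_backoff and its parameters are hypothetical names of my own, and the sleep function is injectable only so the logic can be exercised without actually waiting.

```python
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts calls have failed.

    The delay doubles after each failure: base_delay, 2*base_delay, ...
    The last failure is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))
```

With requests this could be called as retry_with_backoff(lambda: requests.get(url, headers=headers, timeout=10).text, exceptions=(requests.exceptions.RequestException,)), where url and headers are whatever the script already uses.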

Reference: Max retries exceed with URL (requires a VPN to access from mainland China)

After a crawler has been hitting the same site repeatedly for a while, the following error appears: HTTPConnectionPool(host:XX) Max retries exceeded with url '<requests.packages.urllib3.connection.HTTPConnection object at XXXX>: Failed to establish a new connection: [Errno 99] Cannot assign requested address'

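The reason is that the client has to establish a TCP connection with the server before each data transfer. To save overhead, the default is keep-alive, that is, connect once and transfer many times. After many requests, however, the connections are not closed and returned to the connection pool, so no new connection can be created.

The Connection header defaults to keep-alive, so set the Connection entry in the headers to close: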

import requests

headers = {'Connection': 'close'}
r = requests.get(url, headers=headers)

With this change the problem went away.
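
Another way to keep the number of open sockets under control, instead of forcing Connection: close, is to share one Session across all requests and release each response when done with it. This is only a sketch under my own assumptions: the function name fetch_all, the timeout value, and the choice to record status codes are illustrative and not part of the original script.

```python
import requests


def fetch_all(urls, timeout=10):
    """Fetch each URL through a single shared Session.

    Returns a dict mapping each URL to its status code, or to the
    exception that was raised for it.
    """
    results = {}
    # The with-block closes the session and its pooled connections at the end.
    with requests.Session() as session:
        for url in urls:
            try:
                # Leaving the inner with-block releases the connection back to the pool.
                with session.get(url, timeout=timeout) as r:
                    results[url] = r.status_code
            except requests.exceptions.RequestException as exc:
                results[url] = exc
    return results
```

Because the connections are reused from one pool rather than opened afresh for every request, the crawler should need far fewer local ports over time.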