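On October 29, 2019, my crawler that scrapes the Shanghai Gold Exchange (SGE) website broke, raising ssl.SSLCertVerificationError, urllib3.exceptions.MaxRetryError, and requests.exceptions.SSLError. SGE data is essential to my quantitative hedging research, hence this debugging write-up.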

Opening the SGE website in Firefox shows the following page:

Warning: Potential Security Risk Ahead

Advanced: Error code: SEC_ERROR_UNKNOWN_ISSUER

Who cares… Accept the Risk and Continue

Connection Is Not Secure

The page works fine unless you choose Remove Exception.

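I don't care whether SGE's SSL certificate is trustworthy; I only care about China's precious-metals trading data, so I chose to ignore the warning.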

Back to the crawler. The three errors are as follows:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.sge.com.cn', port=443): Max retries exceeded with url: /graph/quotations (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)')))

requests.exceptions.SSLError: HTTPSConnectionPool(host='www.sge.com.cn', port=443): Max retries exceeded with url: /graph/quotations (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)')))

The last two errors are both reported as Caused by SSLError(SSLCertVerificationError(...)), so the first one, SSLCertVerificationError, is the one to resolve.
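
To confirm in code that the outer errors really wrap a certificate failure, one option is to walk the chain of wrapped exceptions. This is only a sketch: find_cert_error is a hypothetical helper name, and it assumes the wrapped error is reachable through __cause__, __context__, a reason attribute (as on urllib3's MaxRetryError), or the exception's args, which is how the traceback above appears to nest them.

```python
import ssl


def find_cert_error(exc):
    """Return the first ssl.SSLCertVerificationError wrapped inside exc, or None."""
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue  # guard against cycles in the chain
        seen.add(id(current))
        if isinstance(current, ssl.SSLCertVerificationError):
            return current
        # Places where a wrapped exception may be stored (assumption, see lead-in).
        candidates = [
            current.__cause__,
            current.__context__,
            getattr(current, "reason", None),
            *current.args,
        ]
        # Keep only real exceptions; "reason" can also be a plain string.
        stack.extend(c for c in candidates if isinstance(c, BaseException))
    return None


# Hypothetical usage around the failing request:
# try:
#     requests.post(url, headers=headers, cookies=cookies, data=data)
# except requests.exceptions.SSLError as err:
#     print(find_cert_error(err))
```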

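requests verifies SSL certificates by default when sending HTTPS requests, and raises SSLError when a certificate cannot be verified. Passing verify=False when sending the request turns this check off.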

response = requests.post(url, headers=headers, cookies=cookies, data=data, verify=False)

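For a sense of what switching verification off means one layer down, here is a rough standard-library analogue. It uses urllib instead of requests, so it is not what requests does internally line for line; make_unverified_context and fetch_unverified are hypothetical helper names introduced only for this illustration.

```python
import ssl
import urllib.request


def make_unverified_context():
    """Build an SSL context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_unverified(url, timeout=10):
    """GET url without verifying the server certificate and return the raw body."""
    ctx = make_unverified_context()
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.read()
```

The connection is still encrypted in this mode; what is dropped is the check that the server is who it claims to be.
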
With verification turned off the program runs, but it emits the following warning:
/usr/lib/python3.7/site-packages/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)

This is annoying too. I already know the request is insecure and don't need repeated reminders, so I simply disable the warning.

import urllib3
# Disable the insecure-request warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
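
The call above silences the warning for the whole process. If that feels too broad, a narrower variant is to ignore it only around the one request, using the standard warnings module. This is a sketch rather than part of the original fix: suppress_warning is a hypothetical helper name, and the warning category is passed in by the caller.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def suppress_warning(category):
    """Ignore warnings of the given category only inside the with-block."""
    with warnings.catch_warnings():
        # Filters changed here are restored when the block exits.
        warnings.simplefilter("ignore", category)
        yield


# Hypothetical usage with the request from this post:
# import urllib3
# with suppress_warning(urllib3.exceptions.InsecureRequestWarning):
#     response = requests.post(url, headers=headers, cookies=cookies,
#                              data=data, verify=False)
```

Note that warnings.catch_warnings modifies global state and is not thread-safe, so this suits a single-threaded crawler best.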

With that, the problem is solved. I'm recording it here for future reference.