Using Python to Fix Slow GitHub Access

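GitHub can be slow to browse and slow to download from. One workaround is to look up GitHub's current IP addresses at https://www.ipaddress.com/ and add matching records to the hosts file.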

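The following five records need to be added to the hosts file together: "github.com", "github-cloud.s3.amazonaws.com", "codeload.github.com", "github.global.ssl.fastly.net", "github-cloud.s3.amazonaws.com" (github-cloud.s3.amazonaws.com is listed twice, so there are four distinct hostnames).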
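Each hosts record is a single line: an IP address, whitespace, then the hostname. As a rough sketch of what "adding records" can look like in code, the snippet below merges new records into existing hosts-file text and drops stale lines for the same hostnames. The function name merge_hosts and the stale address 1.2.3.4 are made up for illustration, and the sketch only works on strings. Writing the real file (C:\Windows\System32\drivers\etc\hosts on Windows, /etc/hosts on Linux and macOS) needs administrator rights and is left out.

```python
def merge_hosts(existing: str, records: dict) -> str:
    """Return hosts-file text with `records` (hostname -> IP) applied.

    Lines that already map one of the hostnames are dropped, then the
    new "IP<TAB>hostname" lines are appended. Other lines are kept as-is.
    Caveat: a dropped line that named several hostnames loses all of them.
    """
    kept = []
    for line in existing.splitlines():
        # Ignore any trailing comment, then split into IP and hostnames.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and any(name in records for name in fields[1:]):
            continue  # stale entry for a hostname we are about to rewrite
        kept.append(line)
    kept.extend(f"{ip}\t{host}" for host, ip in records.items())
    return "\n".join(kept) + "\n"


# 1.2.3.4 is a hypothetical stale entry; 140.82.112.3 comes from the
# sample output later in this post and will go out of date.
current = "127.0.0.1\tlocalhost\n1.2.3.4\tgithub.com\n"
updated = merge_hosts(current, {"github.com": "140.82.112.3"})
print(updated, end="")
```

Replacing old lines instead of only appending keeps the file from accumulating conflicting entries each time the addresses change.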

 

To make looking up the addresses more convenient, I also wrote a short Python script that scrapes them.

import os
import ssl
import time
from urllib import request
from urllib.error import HTTPError, URLError

from lxml import etree


def Path(pages: str) -> list:
    # Extract the IP address strings from the lookup page.
    return etree.HTML(pages).xpath('.//ul[@class="comma-separated"]/li/text()')


def CreatUrlList(urls: list) -> list:
    # Build the lookup URL for each hostname, e.g. https://github.com.ipaddress.com
    return [f"https://{line}.ipaddress.com" for line in urls]


def GetIp(page_url: str) -> str:
    # Return the first IP address listed on the page, or "" if the lookup fails.
    try:
        # Note: this context skips TLS certificate verification.
        context1 = ssl._create_unverified_context()
        with request.urlopen(page_url, context=context1) as r:
            page = r.read().decode()
        html = Path(page)
        return html[0] if html else ""
    except HTTPError as er:
        print(er.code)
    except URLError as er:
        print(er.reason)
    return ""


# github-cloud.s3.amazonaws.com appears twice, so its record is written twice.
url_list = ["github.com", "github-cloud.s3.amazonaws.com", "codeload.github.com", "github.global.ssl.fastly.net",
            "github-cloud.s3.amazonaws.com"]


if __name__ == '__main__':
    lists_url = CreatUrlList(url_list)
    # Generate host.txt in the current working directory so the records
    # can be copied into the hosts file.
    path = os.path.join(os.getcwd(), "host.txt")
    print(path)
    now = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
    with open(path, 'w', encoding="utf-8") as file:
        file.write(f"# GitHub addresses fetched {now}\n")
        for url in lists_url:
            host = url.replace("https://", "").replace(".ipaddress.com", "")
            ip = GetIp(url)
            if not ip:
                # Skip hosts whose lookup failed instead of writing a bad record.
                print(f"no address found for {host}")
                continue
            record = ip + "\t" + host
            file.write(record + "\n")
            print(record)
    print("GitHub address lookup finished!")
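The scraping step depends on one XPath, which picks out the text of the <li> items under <ul class="comma-separated">. For anyone who wants to try that step without lxml or a network connection, here is a rough standard-library sketch using html.parser instead. This is a swapped-in technique, not what the script does. The names CommaSeparatedListParser, extract_ips and SAMPLE_HTML are made up, and the sample markup (including the second address) is an assumption about the page layout, not a copy of it. Unlike li/text(), this version also collects text nested inside child tags of an item.

```python
from html.parser import HTMLParser


class CommaSeparatedListParser(HTMLParser):
    """Collect the text of <li> items inside <ul class="comma-separated">."""

    def __init__(self):
        super().__init__()
        self.in_list = False  # currently inside the target <ul>
        self.in_item = False  # currently inside an <li> of that <ul>
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and dict(attrs).get("class") == "comma-separated":
            self.in_list = True
        elif tag == "li" and self.in_list:
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_list = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Keep only non-blank text found inside a matching list item.
        if self.in_item and data.strip():
            self.items.append(data.strip())


def extract_ips(page: str) -> list:
    parser = CommaSeparatedListParser()
    parser.feed(page)
    parser.close()
    return parser.items


# Hypothetical markup shaped like what the XPath expects.
SAMPLE_HTML = """
<ul class="other"><li>ignored</li></ul>
<ul class="comma-separated"><li>140.82.112.3</li><li>140.82.112.4</li></ul>
"""
print(extract_ips(SAMPLE_HTML))
```

If the site changes its markup, both the XPath and this sketch will return an empty list, so an empty result is worth treating as "layout changed" and not as "no address".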

Running the script produces the following output:

C:\Python\Python38\python.exe "C:/Users/Albert Yu/PycharmProjects/untitled/ipaddress.py"
C:\Users\Albert Yu\PycharmProjects\untitled\host.txt
140.82.112.3    github.com
52.217.37.164    github-cloud.s3.amazonaws.com
140.82.113.10    codeload.github.com
199.232.69.194    github.global.ssl.fastly.net
52.217.37.164    github-cloud.s3.amazonaws.com
GitHub address lookup finished!

Process finished with exit code 0
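Before copying lines like these into the hosts file, it can be worth checking that each one really is an "IP hostname" pair. Below is a minimal sketch of such a check, assuming the two-column layout shown in the output. The function name parse_host_lines is made up, and the "not-an-ip" line is a hypothetical malformed record added to show what gets skipped. One caveat: the check uses the standard ipaddress module, and a script saved as ipaddress.py in the same directory (the file name used in the run above) would shadow that module, so this sketch should live in a different directory or the script should be renamed.

```python
import ipaddress


def parse_host_lines(text: str) -> dict:
    """Map hostname -> IP for every valid "IP hostname" line in `text`."""
    records = {}
    for line in text.splitlines():
        # Ignore any comment, then split on whitespace (tabs or spaces).
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # blank line, comment, or unexpected layout
        ip, host = fields
        try:
            ipaddress.ip_address(ip)  # raises ValueError if not an IP
        except ValueError:
            continue
        records[host] = ip  # a repeated hostname keeps the last address
    return records


# Addresses copied from the output above; the third line is deliberately bad.
sample = """# header comment
140.82.112.3\tgithub.com
52.217.37.164\tgithub-cloud.s3.amazonaws.com
not-an-ip\tcodeload.github.com
52.217.37.164\tgithub-cloud.s3.amazonaws.com
"""
records = parse_host_lines(sample)
for host, ip in records.items():
    print(f"{ip}\t{host}")
```

Collecting the records in a dict also collapses the repeated github-cloud.s3.amazonaws.com line into a single entry.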

 

posted @ 2020-08-13 09:23  大黑哥哥