Python 3 Networking Case Study 4: Writing a Web Proxy

For the definition and purpose of a proxy server, see Baidu Baike.

1. How the Web Proxy works

This post builds on the previous one, "Writing a Web Server". The main logic is shown in the figure below:

 

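What we are writing is the Web Proxy in the middle. When a client sends the Web Proxy a request for some URL, the proxy first checks whether it already holds the requested file. If it does, it returns the file directly (Response). If not, it sends a request to the Web Server (the server behind that URL) to fetch the file, and then returns it to the client.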
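The flow described above can be sketched in a few lines before any sockets get involved. This is only a minimal illustration: `serve`, the in-memory `cache` dict, and `fake_fetch` are hypothetical names made up for this sketch, and the real proxy later in this post caches to disk instead.

```python
from typing import Callable, Dict


def serve(url: str, cache: Dict[str, bytes], fetch: Callable[[str], bytes]) -> bytes:
    """Return the response for url, contacting the origin server only on a cache miss."""
    if url in cache:        # cache hit: answer directly from the proxy
        return cache[url]
    body = fetch(url)       # cache miss: ask the origin server
    cache[url] = body       # keep a copy for the next request
    return body


if __name__ == "__main__":
    calls = []

    def fake_fetch(url: str) -> bytes:
        # Hypothetical stand-in for a real network request.
        calls.append(url)
        return b"<html>hello</html>"

    cache: Dict[str, bytes] = {}
    serve("http://example.com/", cache, fake_fetch)
    serve("http://example.com/", cache, fake_fetch)
    print(len(calls))  # the origin server is contacted only once
```

Passing `fetch` in as a parameter keeps the hit/miss decision separate from the networking, which makes it easy to try out without a real server.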

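2. Using the Web Proxy

First, to go through the proxy when visiting a URL, you cannot simply open a browser and type the address (that would have the client send the Request straight to the Web Server). Instead, you can download a tool called Wget, a command-line HTTP client that can be told to send its requests through a proxy. (Other tools can presumably do the same, but they are not covered here.) After downloading, unzip it and it is ready to use. Just as with the JDK, open a command prompt in the folder that contains the executable, or, if you would rather not type the directory every time, add that folder to your PATH environment variable (search for how to configure this).

Once the Web Proxy and Wget are both ready, you can run them:

Start the Web Proxy program first, then use Wget to send a Request through the proxy

(Wget command: wget xxx.xxx.xx -e use_proxy=on -e http_proxy=127.0.0.1:8000), where "xxx.xxx.xx" is the URL you want to request and 8000 must match the port you entered when starting the proxy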
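If you would rather stay in Python, the standard library can also route a request through a proxy. The sketch below is an assumption-laden alternative to the wget command, not something from the original setup: `make_proxy_opener` and `fetch_via_proxy` are hypothetical helper names, and the address 127.0.0.1:8000 is taken from the wget example. Because the proxy in this post relays only a single chunk of the server's reply, larger pages may come back truncated or raise an error on the client side.

```python
import urllib.request


def make_proxy_opener(proxy_addr: str = "127.0.0.1:8000") -> urllib.request.OpenerDirector:
    """Build an opener that routes plain-HTTP requests through proxy_addr."""
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy_addr})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url: str, proxy_addr: str = "127.0.0.1:8000") -> bytes:
    """Fetch url through the proxy; the proxy must already be running."""
    opener = make_proxy_opener(proxy_addr)
    with opener.open(url, timeout=10) as resp:
        return resp.read()
```

With the proxy listening on port 8000, calling `fetch_via_proxy` with an http:// URL would play roughly the same role as the wget command above.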

3. Results

wget's output for the request:

The path of the proxy's cache:
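For reference, here is a rough sketch of how a request line maps to that cache path. It follows the same naming idea as the source in section 4 (dots in the host name become underscores, and the page is stored as index.html), but it uses `urllib.parse.urlsplit` instead of chained `split` calls. `cache_path_for` and its `root` parameter are hypothetical names introduced for this example.

```python
import os
from urllib.parse import urlsplit


def cache_path_for(request_line: str, root: str = ".") -> str:
    """Map a proxy request line to the cache file used for its host."""
    # A proxy request line looks like: "GET http://www.example.com/ HTTP/1.1"
    url = request_line.split()[1]
    host = urlsplit(url).hostname or ""   # drops the scheme, port and path
    folder = host.replace(".", "_")       # www.example.com -> www_example_com
    return os.path.join(root, folder, "index.html")


if __name__ == "__main__":
    print(cache_path_for("GET http://www.example.com:80/ HTTP/1.1"))
```

Note that, like the original, this keeps one cached page per host, so two different paths on the same site would share a cache entry.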

  

4. Web Proxy source code

import os
import socket


def handleReq(clientSocket):
    # 1. receive the request from the client
    # 2. work out the host name and the cache path from the request line
    # 3. if the page is already cached, return the cached copy
    # 4. otherwise forward the request to the server, relay the reply and cache it

    recvData = clientSocket.recv(1024).decode("UTF-8")
    # The request line looks like "GET http://host[:port]/path HTTP/1.1".
    # Note: removing every '/' only yields a clean host name for root URLs.
    fileName = recvData.split()[1].split("//")[1].replace('/', '')
    print("fileName: " + fileName)
    filePath = "./" + fileName.split(":")[0].replace('.', '_')
    cacheFile = os.path.join(filePath, "index.html")
    try:
        with open(cacheFile, 'rb') as file:
            responseMsg = file.read()
        print("File is found in proxy server.")
        clientSocket.sendall(responseMsg)
        print("Send, done.")
    except OSError:
        print("File does not exist.\nSend request to server...")
        try:
            proxyClientSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            serverName = fileName.split(":")[0]
            proxyClientSocket.connect((serverName, 80))
            proxyClientSocket.sendall(recvData.encode("UTF-8"))
            # a single recv only returns the first chunk of the reply
            responseMsg = proxyClientSocket.recv(4096)
            proxyClientSocket.close()
            print("File is found in server.")
            clientSocket.sendall(responseMsg)
            print("Send, done.")
            # cache the raw bytes so the cached copy matches what was sent
            os.makedirs(filePath, exist_ok=True)
            with open(cacheFile, 'wb') as cache:
                cache.write(responseMsg)
            print("Cache, done.")
        except OSError as e:
            print("Request to server failed: {0}".format(e))


def startProxy(port):
    proxyServerSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    proxyServerSocket.bind(("", port))
    proxyServerSocket.listen(0)
    while True:
        try:
            print("Proxy is waiting for connecting...")
            clientSocket, addr = proxyServerSocket.accept()
            print("Connect established")
            handleReq(clientSocket)
            clientSocket.close()
        except Exception as e:
            print("error: {0}".format(e))
            break
    proxyServerSocket.close()


if __name__ == '__main__':
    while True:
        try:
            raw = input("choose a port number over 1024:")
            port = int(raw)
        except ValueError:
            # 'port' is not assigned when int() fails, so report the raw input
            print("Please input an integer rather than {0!r}".format(raw))
            continue
        else:
            if port <= 1024:
                print("Please input an integer greater than 1024")
                continue
            else:
                break
    startProxy(port)
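One limitation worth knowing about: the source above reads the server's reply with a single recv call, so anything beyond the first chunk is dropped. A possible way around this, sketched below, is to keep reading until the server closes the connection. `recv_all` is a hypothetical helper, and the loop assumes the server actually closes the connection after responding (for example HTTP/1.0 or "Connection: close"); with a keep-alive connection it would block.

```python
import socket


def recv_all(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:       # empty bytes means the peer closed its side
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Local demonstration with a connected socket pair instead of a real server.
    a, b = socket.socketpair()
    a.sendall(b"x" * 5000)
    a.close()
    print(len(recv_all(b, 1024)))
    b.close()
```

In the proxy, the call `proxyClientSocket.recv(...)` could be swapped for `recv_all(proxyClientSocket)` under the assumption stated above.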

 

5. Wget toolkit

Link: https://pan.baidu.com/s/1Ae2_Cq9SYbKnfhhyJ1VhpQ
Extraction code: awsl

 

posted @ 2020-11-04 20:25  YIYUYI