Python Web Server Study Notes (Part 4)
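Before starting on new material, let's deal with the problem "socket error 98: Address already in use".
The obvious cause is that the port is already taken. That can happen for several reasons; for example, if you shut down the server you had running and start it again right away, you will see this error.
In that case the cause seems to be TCP itself: the port has not been released yet (closed connections linger in the TIME_WAIT state for a while).
The following commands show how the port is being used and let us try to deal with it: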
kill -9 [PID]    kill that process
lsof -i:[port]    show which program owns the port
netstat -tln | grep [port]    show how the port is being used
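Well... many of the fixes found online change the socket so that the address can be reused or the server can restart quickly. To me that does not feel like something that belongs in a server's source code (to be honest, I also don't know how to do it).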
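For reference, the socket-level fix mentioned above usually means setting the SO_REUSEADDR option before calling bind. The following is only a minimal Python 3 sketch, not part of the server in these notes; the function name make_listening_socket and its default arguments are made up for illustration.

```python
import socket


def make_listening_socket(host="127.0.0.1", port=0):
    """Create a listening TCP socket that may rebind an address in TIME_WAIT.

    Hypothetical helper for illustration; port 0 lets the OS pick a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR must be set before bind() to have any effect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock
```

With servers built on socketserver.TCPServer, the same option is controlled by the allow_reuse_address class attribute rather than by calling setsockopt directly.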
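Next, the Linux commands for turning the firewall (the iptables service) on and off:
1) Permanent, survives a reboot. Enable: chkconfig iptables on    Disable: chkconfig iptables off
2) Takes effect immediately, reverts after a reboot. Enable: service iptables start    Disable: service iptables stop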
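The topic for this part is the Common Gateway Interface (CGI) mechanism, which gives a web server a standard way to run external programs.
For example, suppose I want to run a small Python program that shows the current time: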
from datetime import datetime
print '''\
<html>
<body>
<p>Generated {0}</p>
</body>
</html>'''.format(datetime.now())
And here is the server code that handles it:
import sys, os, BaseHTTPServer

#-------------------------------------------------------------------------------

class ServerException(Exception):
    '''For internal error reporting.'''
    pass

#-------------------------------------------------------------------------------

class case_no_file(object):
    '''File or directory does not exist.'''

    def test(self, handler):
        return not os.path.exists(handler.full_path)

    def act(self, handler):
        raise ServerException("'{0}' not found".format(handler.path))

#-------------------------------------------------------------------------------

class case_cgi_file(object):
    '''Something runnable.'''

    def test(self, handler):
        return os.path.isfile(handler.full_path) and \
               handler.full_path.endswith('.py')

    def act(self, handler):
        handler.run_cgi(handler.full_path)

#-------------------------------------------------------------------------------

class case_existing_file(object):
    '''File exists.'''

    def test(self, handler):
        return os.path.isfile(handler.full_path)

    def act(self, handler):
        handler.handle_file(handler.full_path)

#-------------------------------------------------------------------------------

class case_directory_index_file(object):
    '''Serve index.html page for a directory.'''

    def index_path(self, handler):
        return os.path.join(handler.full_path, 'index.html')

    def test(self, handler):
        return os.path.isdir(handler.full_path) and \
               os.path.isfile(self.index_path(handler))

    def act(self, handler):
        handler.handle_file(self.index_path(handler))

#-------------------------------------------------------------------------------

class case_directory_no_index_file(object):
    '''Serve listing for a directory without an index.html page.'''

    def index_path(self, handler):
        return os.path.join(handler.full_path, 'index.html')

    def test(self, handler):
        return os.path.isdir(handler.full_path) and \
               not os.path.isfile(self.index_path(handler))

    def act(self, handler):
        handler.list_dir(handler.full_path)

#-------------------------------------------------------------------------------

class case_always_fail(object):
    '''Base case if nothing else worked.'''

    def test(self, handler):
        return True

    def act(self, handler):
        raise ServerException("Unknown object '{0}'".format(handler.path))

#-------------------------------------------------------------------------------

class RequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    '''
    If the requested path maps to a file, that file is served.
    If anything goes wrong, an error page is constructed.
    '''

    Cases = [case_no_file(),
             case_cgi_file(),
             case_existing_file(),
             case_directory_index_file(),
             case_directory_no_index_file(),
             case_always_fail()]

    # How to display an error.
    Error_Page = """\
        <html>
        <body>
        <h1>Error accessing {path}</h1>
        <p>{msg}</p>
        </body>
        </html>
        """

    # How to display a directory listing.
    Listing_Page = '''\
        <html>
        <body>
        <ul>
        {0}
        </ul>
        </body>
        </html>
        '''

    # Classify and handle request.
    def do_GET(self):
        try:

            # Figure out what exactly is being requested.
            self.full_path = os.getcwd() + self.path

            # Figure out how to handle it.
            for case in self.Cases:
                if case.test(self):
                    case.act(self)
                    break

        # Handle errors.
        except Exception as msg:
            self.handle_error(msg)

    def handle_file(self, full_path):
        try:
            with open(full_path, 'rb') as reader:
                content = reader.read()
            self.send_content(content)
        except IOError as msg:
            msg = "'{0}' cannot be read: {1}".format(self.path, msg)
            self.handle_error(msg)

    def list_dir(self, full_path):
        try:
            entries = os.listdir(full_path)
            bullets = ['<li>{0}</li>'.format(e)
                       for e in entries if not e.startswith('.')]
            page = self.Listing_Page.format('\n'.join(bullets))
            self.send_content(page)
        except OSError as msg:
            msg = "'{0}' cannot be listed: {1}".format(self.path, msg)
            self.handle_error(msg)

    def run_cgi(self, full_path):
        cmd = "python " + full_path
        child_stdin, child_stdout = os.popen2(cmd)
        child_stdin.close()
        data = child_stdout.read()
        child_stdout.close()
        self.send_content(data)

    # Handle unknown objects.
    def handle_error(self, msg):
        content = self.Error_Page.format(path=self.path, msg=msg)
        self.send_content(content, 404)

    # Send actual content.
    def send_content(self, content, status=200):
        self.send_response(status)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-Length", str(len(content)))
        self.end_headers()
        self.wfile.write(content)

#-------------------------------------------------------------------------------

if __name__ == '__main__':
    serverAddress = ('', 8080)
    server = BaseHTTPServer.HTTPServer(serverAddress, RequestHandler)
    server.serve_forever()
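As run_cgi shows, this approach runs the .py file on the server itself, which obviously brings some problems. For example, if the server holds a program that loops forever, a client who knows about it can force it to run through CGI, which amounts to an attack on our server.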
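Our approach is:
1. Run the program in a child process.
2. Capture everything the child process sends to standard output.
3. Send that back to the client that made the request.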
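The three steps above can also be sketched with the subprocess module. This is a Python 3 sketch under my own assumptions, not the code used in these notes: the function name run_cgi_script and the timeout_seconds parameter are hypothetical, and the timeout is added only to illustrate one way of limiting a script that never finishes.

```python
import subprocess
import sys


def run_cgi_script(script_path, timeout_seconds=5):
    """Run a Python script in a child process and return its stdout as bytes.

    Hypothetical helper: raises subprocess.TimeoutExpired if the script
    runs longer than timeout_seconds (the child is killed in that case).
    """
    result = subprocess.run(
        [sys.executable, script_path],   # step 1: run in a child process
        stdin=subprocess.DEVNULL,        # the script gets no input
        stdout=subprocess.PIPE,          # step 2: capture standard output
        timeout=timeout_seconds,
    )
    return result.stdout                 # step 3: caller sends this to the client
```

Passing the command as a list avoids building a shell command string out of the requested path.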
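The full CGI protocol is richer than this: it allows parameters in the URL, and the server passes them to the program being run (this looks a lot like how a search engine works: parameters are added to the URL, handed to the engine, and an HTML page comes back), but that does not affect the overall architecture of the system.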
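As a rough illustration of URL parameters, CGI hands the part of the URL after "?" to the program in the QUERY_STRING environment variable. The Python 3 sketch below shows only that splitting step; the helper names split_request_path and cgi_environment are hypothetical, and the server in these notes does not do any of this.

```python
from urllib.parse import parse_qs, urlsplit


def split_request_path(raw_path):
    """Split a request path such as '/a.py?x=1' into (path, query string)."""
    parts = urlsplit(raw_path)
    return parts.path, parts.query


def cgi_environment(query_string, base_env=None):
    """Return a copy of base_env with the CGI variables a GET script reads."""
    env = dict(base_env or {})
    env["REQUEST_METHOD"] = "GET"
    env["QUERY_STRING"] = query_string
    return env


# What the CGI script itself might do with the value it receives:
def parse_parameters(env):
    return parse_qs(env.get("QUERY_STRING", ""))
```

A server could pass the dictionary from cgi_environment as the env argument when it starts the child process.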
So our source code now looks like this:
import sys, os, BaseHTTPServer

#-------------------------------------------------------------------------------

class ServerException(Exception):
    '''For internal error reporting.'''
    pass

#-------------------------------------------------------------------------------

class base_case(object):
    '''Parent for case handlers.'''

    def handle_file(self, handler, full_path):
        try:
            with open(full_path, 'rb') as reader:
                content = reader.read()
            handler.send_content(content)
        except IOError as msg:
            msg = "'{0}' cannot be read: {1}".format(full_path, msg)
            handler.handle_error(msg)

    def index_path(self, handler):
        return os.path.join(handler.full_path, 'index.html')

    def test(self, handler):
        assert False, 'Not implemented.'

    def act(self, handler):
        assert False, 'Not implemented.'

#-------------------------------------------------------------------------------

class case_no_file(base_case):
    '''File or directory does not exist.'''

    def test(self, handler):
        return not os.path.exists(handler.full_path)

    def act(self, handler):
        raise ServerException("'{0}' not found".format(handler.path))

#-------------------------------------------------------------------------------

class case_cgi_file(base_case):
    '''Something runnable.'''

    def run_cgi(self, handler):
        cmd = "python " + handler.full_path
        child_stdin, child_stdout = os.popen2(cmd)
        child_stdin.close()
        data = child_stdout.read()
        child_stdout.close()
        handler.send_content(data)

    def test(self, handler):
        return os.path.isfile(handler.full_path) and \
               handler.full_path.endswith('.py')

    def act(self, handler):
        self.run_cgi(handler)

#-------------------------------------------------------------------------------

class case_existing_file(base_case):
    '''File exists.'''

    def test(self, handler):
        return os.path.isfile(handler.full_path)

    def act(self, handler):
        self.handle_file(handler, handler.full_path)

#-------------------------------------------------------------------------------

class case_directory_index_file(base_case):
    '''Serve index.html page for a directory.'''

    def test(self, handler):
        return os.path.isdir(handler.full_path) and \
               os.path.isfile(self.index_path(handler))

    def act(self, handler):
        self.handle_file(handler, self.index_path(handler))

#-------------------------------------------------------------------------------

class case_directory_no_index_file(base_case):
    '''Serve listing for a directory without an index.html page.'''

    # How to display a directory listing.
    Listing_Page = '''\
        <html>
        <body>
        <ul>
        {0}
        </ul>
        </body>
        </html>
        '''

    def list_dir(self, handler, full_path):
        try:
            entries = os.listdir(full_path)
            bullets = ['<li>{0}</li>'.format(e)
                       for e in entries if not e.startswith('.')]
            page = self.Listing_Page.format('\n'.join(bullets))
            handler.send_content(page)
        except OSError as msg:
            # Use handler.path here: the case object itself has no 'path'.
            msg = "'{0}' cannot be listed: {1}".format(handler.path, msg)
            handler.handle_error(msg)

    def test(self, handler):
        return os.path.isdir(handler.full_path) and \
               not os.path.isfile(self.index_path(handler))

    def act(self, handler):
        self.list_dir(handler, handler.full_path)

#-------------------------------------------------------------------------------

class case_always_fail(base_case):
    '''Base case if nothing else worked.'''

    def test(self, handler):
        return True

    def act(self, handler):
        raise ServerException("Unknown object '{0}'".format(handler.path))

#-------------------------------------------------------------------------------

class RequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    '''
    If the requested path maps to a file, that file is served.
    If anything goes wrong, an error page is constructed.
    '''

    Cases = [case_no_file(),
             case_cgi_file(),
             case_existing_file(),
             case_directory_index_file(),
             case_directory_no_index_file(),
             case_always_fail()]

    # How to display an error.
    Error_Page = """\
        <html>
        <body>
        <h1>Error accessing {path}</h1>
        <p>{msg}</p>
        </body>
        </html>
        """

    # Classify and handle request.
    def do_GET(self):
        try:

            # Figure out what exactly is being requested.
            self.full_path = os.getcwd() + self.path

            # Figure out how to handle it.
            for case in self.Cases:
                if case.test(self):
                    case.act(self)
                    break

        # Handle errors.
        except Exception as msg:
            self.handle_error(msg)

    # Handle unknown objects.
    def handle_error(self, msg):
        content = self.Error_Page.format(path=self.path, msg=msg)
        self.send_content(content, 404)

    # Send actual content.
    def send_content(self, content, status=200):
        self.send_response(status)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-Length", str(len(content)))
        self.end_headers()
        self.wfile.write(content)

#-------------------------------------------------------------------------------

if __name__ == '__main__':
    serverAddress = ('', 8080)
    server = BaseHTTPServer.HTTPServer(serverAddress, RequestHandler)
    server.serve_forever()
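We create a parent class for all of our case handlers and move a method into the parent if, and only if, it is shared by more than one handler. Each handler then overrides test and act with its own version.
And this is the part that handles CGI: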
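To see the parent-class pattern in isolation, here is a stripped-down Python 3 sketch that can be run without a network. Everything in it is illustrative: BaseCase, the three small cases, FakeHandler and dispatch are hypothetical stand-ins, and act returns a string instead of sending a response.

```python
import os


class BaseCase:
    """Parent for case handlers; holds only helpers shared by several cases."""

    def index_path(self, handler):
        return os.path.join(handler.full_path, "index.html")

    def test(self, handler):
        raise NotImplementedError

    def act(self, handler):
        raise NotImplementedError


class CaseNoFile(BaseCase):
    def test(self, handler):
        return not os.path.exists(handler.full_path)

    def act(self, handler):
        return "not found"


class CaseExistingFile(BaseCase):
    def test(self, handler):
        return os.path.isfile(handler.full_path)

    def act(self, handler):
        return "serve file"


class CaseAlwaysFail(BaseCase):
    def test(self, handler):
        return True

    def act(self, handler):
        return "unknown object"


class FakeHandler:
    """Stand-in for the request handler; only carries the resolved path."""

    def __init__(self, full_path):
        self.full_path = full_path


def dispatch(handler, cases):
    """Run the first case whose test passes, as do_GET does."""
    for case in cases:
        if case.test(handler):
            return case.act(handler)
```

The order of the list matters: the catch-all case goes last so that it only fires when nothing more specific matched.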
class case_cgi_file(base_case):
    '''Something runnable.'''

    def run_cgi(self, handler):
        cmd = "python " + handler.full_path
        child_stdin, child_stdout = os.popen2(cmd)
        child_stdin.close()
        data = child_stdout.read()
        child_stdout.close()
        handler.send_content(data)

    def test(self, handler):
        return os.path.isfile(handler.full_path) and \
               handler.full_path.endswith('.py')

    def act(self, handler):
        self.run_cgi(handler)
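os.popen2 runs the command with pipes attached and returns file objects for the child's stdin and stdout, so the server can read whatever the script prints. I did not expect it to block, but when I tested its performance I found that it did.
I had it run a script that prints the numbers from 1 to 10000000, and while that was running nothing else on the server could be accessed.
The reason is that the subprocess PIPE has a limited size; with Python 2.6 it was 65536 bytes (the default pipe capacity on Linux). Once the PIPE is full, the script has to wait until the server reads from it, and because the server handles one request at a time and child_stdout.read() does not return until the script has finished, everything else is stuck in the meantime.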
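One common way to keep a slow script from freezing every other request is to handle each request in its own thread. The Python 3 sketch below is an assumption on my part rather than something from these notes: ThreadedHTTPServer, DemoHandler and fetch are hypothetical names, and the "/slow" path merely simulates a long-running CGI script with a sleep.

```python
import http.client
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """HTTP server that serves each request in a separate thread."""
    daemon_threads = True  # do not wait for request threads on exit


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/slow":
            time.sleep(2.0)  # stands in for a long-running CGI script
        body = ("served " + self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(port, path):
    """Fetch a path from the local demo server and return the body bytes."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

To try it, create ThreadedHTTPServer(("127.0.0.1", 8080), DemoHandler), call serve_forever(), and request /slow and any other path at the same time: the second request should come back without waiting for the first.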