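A First Look at WSGI
It's the middle of the night, and instead of sleeping I'm up reading technical documentation. What kind of dedication is this~
OK, most of this post comes from reading http://wsgi.readthedocs.org/en/latest/. Now let's start digging into WSGI.
WSGI stands for Web Server Gateway Interface. It is a Python standard that defines how Python programs should communicate with a web server. The rest of this post is divided into the following parts.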
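What is WSGI
WSGI is a Python standard that defines the interface through which a Python application communicates with a web server. It is not a module, not a program, and not a server. Put simply, if an application is written to the WSGI specification and a web server is also written to the WSGI specification, then that application can run on that server.
The WSGI server's job is very simple: it passes the client's request (the client is usually a browser) to the WSGI application, then sends the response produced by the WSGI application back to the client. That's all there is to it.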
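To make the server's role concrete, here is a toy sketch of what a WSGI server does around each call, with no networking at all: it fills an environ dict, supplies a start_response callback, calls the application, and concatenates the returned chunks. `run_app` and `hello_app` are hypothetical names made up for illustration; only `wsgiref.util.setup_testing_defaults` comes from the standard library. A real server also parses HTTP, writes to a socket and handles errors, and its start_response must return a `write()` callable, which this simplified version omits.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    # A hypothetical application: echoes the request method.
    body = ('method: %s' % environ['REQUEST_METHOD']).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


def run_app(app, method='GET'):
    """Play the server's role for one fake request (no sockets involved)."""
    environ = {'REQUEST_METHOD': method}
    setup_testing_defaults(environ)  # fills SERVER_NAME, wsgi.input, etc.

    captured = {}

    def start_response(status, headers, exc_info=None):
        # The server records the status line and headers the app sends.
        captured['status'] = status
        captured['headers'] = headers

    result = app(environ, start_response)
    try:
        # Concatenate the chunks; a real server would write each one
        # to the client connection instead.
        body = b''.join(result)
    finally:
        # PEP 3333: the server must call close() if the iterable has one.
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], captured['headers'], body


status, headers, body = run_app(hello_app, 'POST')
print(status, body)  # 200 OK b'method: POST'
```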
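WSGI applications, on the other hand, can be stacked like building blocks. For example, you can place WSGI program A on top of a WSGI server, WSGI program B on top of A, and WSGI program C on top of B. In theory the stack can grow without limit. The programs in the middle, such as A and B, are called WSGI middleware. Because they sit in the middle, they must implement the WSGI interface on both sides: toward the layer below they behave like an application, and toward the layer above they behave like a server.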
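A minimal sketch of the stacking described above, assuming a hypothetical middleware `AddHeaderMiddleware` (not from the source) that appends one response header by wrapping `start_response`. Because each wrapper is itself a WSGI callable, the server calls A, A calls B, and B calls the innermost application C.

```python
from wsgiref.util import setup_testing_defaults


def app_c(environ, start_response):
    # The innermost (hypothetical) application.
    body = b'hello'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


class AddHeaderMiddleware:
    """Hypothetical middleware that appends one header to every response."""

    def __init__(self, app, name, value):
        self.app = app              # the next layer up the stack
        self.header = (name, value)

    def __call__(self, environ, start_response):
        # To the layer below, this object looks like an application.
        def custom_start_response(status, headers, exc_info=None):
            # To the wrapped application, this looks like the server.
            return start_response(status, headers + [self.header], exc_info)
        return self.app(environ, custom_start_response)


# server -> A -> B -> C
app_b = AddHeaderMiddleware(app_c, 'X-Layer-B', 'b')
app_a = AddHeaderMiddleware(app_b, 'X-Layer-A', 'a')

seen = {}


def fake_start_response(status, headers, exc_info=None):
    # Stands in for the real server's callback.
    seen['status'] = status
    seen['headers'] = headers


environ = {}
setup_testing_defaults(environ)
body = b''.join(app_a(environ, fake_start_response))
print(seen['headers'][-2:])  # [('X-Layer-B', 'b'), ('X-Layer-A', 'a')]
```

B adds its header first because it is closer to C; A sees B's headers and appends its own.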
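Application interface
A WSGI application's interface must be a callable object, such as a function, a class, or an instance that implements the __call__ method. (I guessed that any instance with a __call__ method could be called like a function; after checking, that turns out to be true.)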
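The author's check can be reproduced with the sketch below. Both classes are hypothetical examples: `GreetingApp` instances are applications because the class defines `__call__`, while `ClassApp` is itself the application, since calling the class with `(environ, start_response)` builds an instance that is the response iterable. `call` is a made-up helper that invokes an application roughly the way a server would.

```python
from wsgiref.util import setup_testing_defaults


class GreetingApp:
    """An instance of this class is a WSGI application via __call__."""

    def __init__(self, greeting):
        self.greeting = greeting  # per-instance configuration

    def __call__(self, environ, start_response):
        body = self.greeting.encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/plain'),
                                  ('Content-Length', str(len(body)))])
        return [body]


class ClassApp:
    """The class itself is the application: calling it returns an iterable."""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start_response = start_response

    def __iter__(self):
        # start_response runs when the server starts iterating.
        body = b'from a class'
        self.start_response('200 OK', [('Content-Type', 'text/plain'),
                                       ('Content-Length', str(len(body)))])
        yield body


def call(app):
    """Invoke an application once, roughly as a server would."""
    environ = {}
    setup_testing_defaults(environ)
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    body = b''.join(app(environ, start_response))
    return statuses, body


print(callable(GreetingApp('hi')))  # True
print(call(GreetingApp('hi')))      # (['200 OK'], b'hi')
print(call(ClassApp))               # (['200 OK'], b'from a class')
```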
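- The callable must accept the following two positional arguments:
  - a dictionary containing CGI-like environment variables
  - a callback function supplied by the WSGI server, which the application uses to send its HTTP status code/message and HTTP headers to the server
- The callable must return the response body as strings wrapped in an iterable object (under Python 3, these strings must be bytes)
Here is a code example: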
# This is our application object. It could have any name,
# except when using mod_wsgi where it must be "application"
def application(
        # It accepts two arguments:
        # environ points to a dictionary containing CGI like environment variables
        # which is filled by the server for each received request from the client
        environ,
        # start_response is a callback function supplied by the server
        # which will be used to send the HTTP status and headers to the server
        start_response):

    # build the response body possibly using the environ dictionary
    # (in Python 3 the body must be bytes, so the text is encoded)
    response_body = ('The request method was %s' % environ['REQUEST_METHOD']).encode('utf-8')

    # HTTP response code and message
    status = '200 OK'

    # These are HTTP headers expected by the client.
    # They must be wrapped as a list of tupled pairs:
    # [(Header name, Header value)].
    # Content-Length is the size of the body in bytes.
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(len(response_body)))]

    # Send them to the server using the supplied function
    start_response(status, response_headers)

    # Return the response body.
    # Notice it is wrapped in a list although it could be any iterable.
    return [response_body]
This code can't run on its own yet, because we don't have a WSGI server. The next section covers that.
Environment dictionary
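The environment dictionary contains CGI-style variables (plus WSGI-specific keys such as wsgi.input); the WSGI server fills it in from each request it receives from the client. The following script prints the whole dictionary: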
#!/usr/bin/env python3
# Our tutorial's WSGI server
from wsgiref.simple_server import make_server


def application(environ, start_response):
    # Sorting and stringifying the environment key, value pairs
    response_body = ['%s: %s' % (key, value)
                     for key, value in sorted(environ.items())]
    # Join into one string and encode it: WSGI bodies are bytes in Python 3
    response_body = '\n'.join(response_body).encode('utf-8')

    status = '200 OK'
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(len(response_body)))]
    start_response(status, response_headers)

    return [response_body]


# Instantiate the WSGI server.
# It will receive the request, pass it to the application
# and send the application's response to the client
httpd = make_server(
    'localhost',  # The host name.
    8051,         # A port number where to wait for the request.
    application   # Our application object name, in this case a function.
)

# Wait for a single request, serve it and quit.
httpd.handle_request()
Response Iterable
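If you change return [response_body] in the application above to return response_body, the response becomes much slower. This is because the server treats the response_body string itself as the iterable and sends it to the client one character at a time. So always put response_body inside an iterable such as a list. (Under Python 3 it fails outright: iterating over a bytes object yields integers rather than byte strings, and servers such as wsgiref reject them.) Also, if the response body consists of several strings, Content-Length is the total length of all of them, counted in bytes rather than characters.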
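The difference is easy to see by iterating directly. In this sketch the list yields one chunk, a bare `str` yields single characters, and a bare `bytes` object yields integers, which is why the body should be wrapped. `content_length` is a hypothetical helper showing how to total several chunks; note that "世界" is 2 characters but 6 bytes in UTF-8.

```python
body = 'hello'

print(list([body]))  # ['hello']                  -> one chunk
print(list(body))    # ['h', 'e', 'l', 'l', 'o']  -> one chunk per character
print(list(b'hi'))   # [104, 105]                 -> ints, not byte strings


def content_length(chunks):
    # Content-Length counts bytes across all chunks, not characters.
    return sum(len(chunk) for chunk in chunks)


chunks = ['hello, '.encode('utf-8'), '世界'.encode('utf-8')]
print(content_length(chunks))  # 13  (7 bytes + 6 bytes)
```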
Parsing the Request - Get
Suppose you access the application above with a URL like this:
http://localhost:8051/?age=10&hobbies=software&hobbies=tunning
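Then in the environ dictionary, REQUEST_METHOD will be GET and QUERY_STRING will be age=10&hobbies=software&hobbies=tunning. Notice that hobbies appears twice. That is normal: a submitted form may contain checkboxes, for example. The parse_qs function (originally provided by the cgi module; in Python 3 it lives in urllib.parse) makes parsing the query string easy. parse_qs returns a dictionary whose keys are names such as age and hobbies and whose values are lists; for example, the value for hobbies is ['software', 'tunning'].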
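`cgi.parse_qs` and `cgi.escape` were deprecated for a long time and removed in Python 3.8; the standard-library equivalents are `urllib.parse.parse_qs` and `html.escape` (the whole `cgi` module was later removed in Python 3.13). A quick check with the query string from the URL above:

```python
from urllib.parse import parse_qs

d = parse_qs('age=10&hobbies=software&hobbies=tunning')
print(d)  # {'age': ['10'], 'hobbies': ['software', 'tunning']}

# Every value is a list, so take [0] for single-valued fields and
# supply a default list for keys that may be absent.
age = d.get('age', [''])[0]       # '10'
hobbies = d.get('hobbies', [])    # ['software', 'tunning']
missing = d.get('name', [''])[0]  # '' because the key is not present
```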
Run the code below, then visit the URL above, and the response will show the parsed query string:
#!/usr/bin/env python3
from wsgiref.simple_server import make_server
# cgi.parse_qs and cgi.escape were removed in Python 3.8;
# these are their standard-library replacements.
from urllib.parse import parse_qs
from html import escape


def application(environ, start_response):
    # Returns a dictionary containing lists as values.
    d = parse_qs(environ.get('QUERY_STRING', ''))

    # In this idiom you must issue a list containing a default value.
    age = d.get('age', [''])[0]     # Returns the first age value.
    hobbies = d.get('hobbies', [])  # Returns a list of hobbies.

    # Always escape user input to avoid script injection
    age = escape(age)
    hobbies = [escape(hobby) for hobby in hobbies]

    response_body = ('age is ' + age + ' hobbies is ' + ' '.join(hobbies)).encode('utf-8')

    status = '200 OK'

    # Now content type is text/html
    response_headers = [('Content-Type', 'text/html; charset=utf-8'),
                        ('Content-Length', str(len(response_body)))]
    start_response(status, response_headers)
    return [response_body]


httpd = make_server('localhost', 8051, application)
# Now it is serve_forever() instead of handle_request().
# In Windows you can kill it in the Task Manager (python.exe).
# In Linux a Ctrl-C will do it.
httpd.serve_forever()
Parsing the Request - Post
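If the request is a POST, the form data (in the same key=value&... format as a query string) is carried in the HTTP request body rather than in the URL. The WSGI server puts a file-like object under the wsgi.input key of the environ dictionary, and the request body can be read from it. The server also puts the length of the body under the CONTENT_LENGTH key. The following code parses a POST request: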
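Reading the body can be tried without a network by building the environ by hand: `io.BytesIO` stands in for the server's `wsgi.input`, and `read_form` is a hypothetical helper name. Note that `CONTENT_LENGTH` arrives as a string and may be empty or missing, and that `wsgi.input` returns bytes in Python 3.

```python
import io
from urllib.parse import parse_qs


def read_form(environ):
    """Parse an application/x-www-form-urlencoded POST body from environ."""
    try:
        size = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        size = 0
    # PEP 3333: an application should not read more than CONTENT_LENGTH bytes.
    raw = environ['wsgi.input'].read(size)
    return parse_qs(raw.decode('utf-8'))


body = b'age=10&hobbies=software&hobbies=tunning'
environ = {
    'REQUEST_METHOD': 'POST',
    'CONTENT_LENGTH': str(len(body)),  # a string, as servers provide it
    'wsgi.input': io.BytesIO(body),    # stands in for the request stream
}
form = read_form(environ)
print(form)  # {'age': ['10'], 'hobbies': ['software', 'tunning']}
```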
#!/usr/bin/env python3
from wsgiref.simple_server import make_server
from urllib.parse import parse_qs
from html import escape


def application(environ, start_response):
    # the environment variable CONTENT_LENGTH may be empty or missing
    try:
        request_body_size = int(environ.get('CONTENT_LENGTH', 0))
    except ValueError:
        request_body_size = 0

    # When the method is POST the query string will be sent
    # in the HTTP request body which is passed by the WSGI server
    # in the file like wsgi.input environment variable.
    # read() returns bytes in Python 3, so decode before parsing.
    request_body = environ['wsgi.input'].read(request_body_size).decode('utf-8', 'replace')
    d = parse_qs(request_body)

    age = d.get('age', [''])[0]     # Returns the first age value.
    hobbies = d.get('hobbies', [])  # Returns a list of hobbies.

    # Always escape user input to avoid script injection
    age = escape(age)
    hobbies = [escape(hobby) for hobby in hobbies]

    # age is a str and hobbies is a list, so they cannot simply be added;
    # build one string and encode it to bytes.
    response_body = ('age is ' + age + ' hobbies is ' + ' '.join(hobbies)).encode('utf-8')

    status = '200 OK'
    response_headers = [('Content-Type', 'text/html; charset=utf-8'),
                        ('Content-Length', str(len(response_body)))]
    start_response(status, response_headers)
    return [response_body]


httpd = make_server('localhost', 8051, application)
httpd.serve_forever()