High-performance static web serving with nginx's embedded Perl module
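What is the real limit of a web server? 100 r/s? 500 r/s? Or 1000 r/s?
On the StaticGenerator project page I read that even 1000 r/s is nothing special.
Is that really true? If it is, that is astonishing.
Most web sites today are dynamic. To improve performance we reach for all sorts of optimizations: reducing I/O, careful string handling, memcached, and so on. But even with every one of these in place, a fully dynamic site still only handles about 1000 to 2000 r/s (on a standard 4-core, 4 GB server).
Can we optimize further? Yes, and this is where the "static files" mentioned above come in. Web servers handle requests for static files very efficiently, nginx in particular, which claims to "support up to 50,000 concurrent connections". (Heh, this is starting to sound like an nginx advertisement.)
Serving dynamic content as static output means solving a few problems:
- How does a URL map to a static file path?
- When does a static file expire? In other words, how do we avoid serving stale data?
- When is the static file generated?
Mapping URLs to static file paths
If the web application uses friendly URLs, mapping them to static files is very easy. For example: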
http://hostname/post/some-post-detailname/
It is easy to see that the corresponding static file path could be /post/some-post-detailname/index.html, or /post/some-post-detailname.html, and so on.
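The two mappings just mentioned can be sketched in a few lines of Python. This is only an illustration: the function name `friendly_static_path` and the `directory_style` flag are made up for this example and are not part of nginx, Django or StaticGenerator.

```python
from urllib.parse import urlsplit


def friendly_static_path(url: str, directory_style: bool = True) -> str:
    """Map a friendly URL to a static file path relative to the web root.

    Hypothetical helper for illustration only.
    """
    path = urlsplit(url).path      # e.g. "/post/some-post-detailname/"
    trimmed = path.rstrip("/")     # drop the trailing slash, if any
    if not trimmed:                # the site root maps to /index.html
        return "/index.html"
    if directory_style:
        return trimmed + "/index.html"
    return trimmed + ".html"


print(friendly_static_path("http://hostname/post/some-post-detailname/"))
print(friendly_static_path("http://hostname/post/some-post-detailname/",
                           directory_style=False))
```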
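A more general approach is to compute the MD5 of the URL and derive the static file path from the hash.
Taking the URL above again, its path is /post/some-post-detailname/ and the MD5 of that path is 81982658fe1d78f51d228950babd1457, so the file path can be /8/1/982658fe1d78f51d228950babd1457: the first two hex characters become two directory levels and the remaining characters become the file name.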
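For checking the mapping outside nginx, the same scheme can be sketched in Python. The function name `md5_static_path` is hypothetical, and the sketch assumes the URI path is hashed as UTF-8 bytes and that a ".html" suffix is appended, as the Perl code in this post does.

```python
import hashlib


def md5_static_path(uri_path: str, suffix: str = ".html") -> str:
    """Return the sharded path <c1>/<c2>/<remaining 30 hex chars><suffix>.

    Hypothetical helper; assumes the path is hashed as UTF-8 bytes.
    """
    digest = hashlib.md5(uri_path.encode("utf-8")).hexdigest()
    # First hex character, second hex character, then the rest of the digest.
    return "/".join((digest[0], digest[1], digest[2:])) + suffix


print(md5_static_path("/post/some-post-detailname/"))
```

Two directory levels of one hex character each give at most 256 directories, which keeps any single directory from holding every cached page.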
How do we generate such a path in nginx? The answer is embedded Perl. Here is my example:
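perl_set $path_md5 '
    use Digest::MD5 qw(md5_hex);
    use File::stat;
    sub {
        my $r = shift;
        # Hash the request URI and shard it as <c1>/<c2>/<remaining 30 chars>.html
        my $s = md5_hex($r->uri);
        my $path_md5 = join("/", substr($s, 0, 1), substr($s, 1, 1), substr($s, 2)) . ".html";
        my $filepath = "/data/www/" . $path_md5;
        # A cached file older than 1800 seconds (30 minutes) counts as expired:
        # return the path with a ".new" suffix instead.
        if (-f $filepath) {
            my $mtime = stat($filepath)->mtime;
            if (time() - $mtime > 1800) {
                return $path_md5 . ".new";
            }
        }
        return $path_md5;
    }';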
Generating the static files
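When to generate a static file depends on your application. Content-centric applications such as blogs and news sites can generate it on the first request; others, such as web APIs, can wait until the same request has been seen a given number of times before generating the file.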
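The "generate after N identical requests" policy could look roughly like the sketch below. The class `HitThreshold` is hypothetical, and it counts hits in the memory of a single process; with several FastCGI workers the counter would need to live in a shared store such as memcached.

```python
from collections import Counter


class HitThreshold:
    """Decide whether a page is requested often enough to be cached.

    Hypothetical, in-process sketch; counts are lost on restart.
    """

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.hits = Counter()

    def should_generate(self, uri_path: str) -> bool:
        # Count this request, then report whether the threshold is reached.
        self.hits[uri_path] += 1
        return self.hits[uri_path] >= self.threshold


policy = HitThreshold(threshold=3)
for i in range(4):
    print(i + 1, policy.should_generate("/webmd5/abc/"))
```

A threshold of 1 gives the "generate on the first request" behavior described for blogs and news sites.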
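As for how to generate the static file, the simplest way is to write the response body straight to a file. If your application is built on Django, StaticGenerator can do much of the work for you.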
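A minimal sketch of "write the response to a file" follows. Everything here is an assumption made for illustration: the helper name `write_static_file`, the idea that the application receives the relative path from nginx, and in particular the choice to strip a trailing ".new" before writing. The post does not say how the application treats that suffix.

```python
import os
import tempfile


def write_static_file(root: str, relative_path: str, body: bytes) -> str:
    """Write body under root and return the final file path.

    Hypothetical helper. ASSUMPTION: a ".new" suffix only signals an expired
    cache entry, so it is stripped and the old file is overwritten.
    """
    if relative_path.endswith(".new"):
        relative_path = relative_path[: -len(".new")]
    target = os.path.join(root, relative_path.lstrip("/"))
    directory = os.path.dirname(target)
    os.makedirs(directory, exist_ok=True)

    # Write to a temporary file in the same directory, then rename it into
    # place, so the web server never sees a half-written page.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(body)
        os.chmod(tmp, 0o644)   # mkstemp creates 0600; the web server must be able to read it
        os.replace(tmp, target)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target
```

Because the rename also refreshes the modification time, rewriting an expired file restarts its lifetime.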
As for expiry, in the path-generation example above I set the lifetime of a static file to 30 minutes (the 1800-second check).
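The same 30-minute rule can be checked outside nginx, for instance from a cleanup or monitoring script. The sketch below mirrors the existence and age checks of the Perl code; the names `cache_state` and `MAX_AGE_SECONDS` are hypothetical.

```python
import os
import time

MAX_AGE_SECONDS = 1800  # 30 minutes, the lifetime used in this post


def cache_state(filepath: str, max_age: float = MAX_AGE_SECONDS) -> str:
    """Classify a cached file as "missing", "fresh" or "stale".

    Hypothetical helper that mirrors the -f and mtime checks in the Perl code.
    """
    if not os.path.isfile(filepath):
        return "missing"
    age = time.time() - os.stat(filepath).st_mtime
    return "stale" if age > max_age else "fresh"
```

Expiry based on modification time needs no extra bookkeeping: the file itself records when it was generated.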
The complete configuration file is as follows:
pid /home/test/nginx.pid;
worker_processes 8;
error_log /data/nginx/logs/error.log;
events {
    worker_connections 2048;
    use epoll;
}
http {
    # default nginx location
    include /home/test/mime.types;
    default_type text/html;
    log_format main
        '$remote_addr $host $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent" $process $request_time $sent_http_x_type';
    client_header_timeout 10s; # If the client sends nothing within this time, nginx returns "Request Time-out" (408).
    client_body_timeout 10s;
    send_timeout 10s; # If the client accepts no data within this time, nginx closes the connection.
    connection_pool_size 256;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 2k;
    request_pool_size 4k;
    output_buffers 4 32k;
    postpone_output 1460;
    sendfile on;
    tcp_nopush on;
    keepalive_timeout 20 10;
    tcp_nodelay on;
    fastcgi_connect_timeout 300;
    fastcgi_send_timeout 300;
    fastcgi_read_timeout 300;
    fastcgi_buffer_size 64k;
    fastcgi_buffers 4 64k;
    fastcgi_busy_buffers_size 128k;
    fastcgi_temp_file_write_size 128k;
    client_max_body_size 10m;
    client_body_buffer_size 256k;
    proxy_connect_timeout 90;
    proxy_send_timeout 90;
    proxy_read_timeout 90;
    client_body_temp_path /data/nginx/logs/client_body_temp;
    proxy_temp_path /data/nginx/logs/proxy_temp;
    fastcgi_temp_path /data/nginx/logs/fastcgi_temp;
    gzip off;
    gzip_min_length 1100;
    gzip_buffers 4 32k;
    gzip_types text/plain text/html application/x-javascript text/xml text/css;
    ignore_invalid_headers on;
    perl_set $path_md5 '
        use Digest::MD5 qw(md5_hex);
        use File::stat;
        sub {
            my $r = shift;
            my $s = md5_hex($r->uri);
            my $path_md5 = join("/", substr($s, 0, 1), substr($s, 1, 1), substr($s, 2)) . ".html";
            my $filepath = "/data/www/" . $path_md5;
            if (-f $filepath) {
                my $mtime = stat($filepath)->mtime;
                if (time() - $mtime > 1800) {
                    return $path_md5 . ".new";
                }
            }
            return $path_md5;
        }';
    server {
        listen 80;
        server_name 127.0.0.1;
        index index.html;
        root /data/www;
        set $process "nginx";
        # static resources
        #location ~* ^.+\.(html|jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|exe|pdf|ppt|txt|tar|mid|midi|wav|bmp|rtf|js)$
        #{
        #    expires 30d;
        #    break;
        #}
        location /site_media {
            root /home/test/web;
            access_log off;
            #expires 30d;
            rewrite ^/site_media/(.*) /media/$1 break;
        }
        location /nginx_status {
            # copied from http://blog.kovyrin.net/2006/04/29/monitoring-nginx-with-rrdtool/
            stub_status on;
            access_log off;
            allow 127.0.0.1;
            allow 192.168.0.0/16;
            allow 219.131.196.66;
            deny all;
            break;
        }
        location /request_status {
            access_log off;
            allow 127.0.0.1;
            allow 192.168.0.0/16;
            deny all;
            rewrite ^/request_status/(.*) /rrd/$1 break;
            autoindex on;
        }
        location ~* ^/(webmd5|weburl|urlsafe|website|urlnotfound|reporturl|suggesturl|receive|leak|virus|site|admin|index)/ {
            #rewrite (.*) /$path_md5 redirect;
            try_files /$path_md5 @fastcgi;
        }
        location @fastcgi {
            set $process "fcgi";
            fastcgi_pass unix:/home/test/gateway.sock;
            fastcgi_param PATH_INFO $fastcgi_script_name;
            fastcgi_param REQUEST_METHOD $request_method;
            fastcgi_param QUERY_STRING $query_string;
            fastcgi_param CONTENT_TYPE $content_type;
            fastcgi_param CONTENT_LENGTH $content_length;
            fastcgi_pass_header Authorization;
            fastcgi_param REMOTE_ADDR $remote_addr;
            fastcgi_param SERVER_PROTOCOL $server_protocol;
            fastcgi_param SERVER_PORT $server_port;
            fastcgi_param SERVER_NAME $server_name;
            fastcgi_param REQUEST_FILENAME $path_md5;
            #fastcgi_param HTTP_X_FORWARDED_FOR $proxy_add_x_forwarded_for;
            fastcgi_intercept_errors off;
            break;
        }
        location /403.html {
            root /usr/local/nginx;
            access_log off;
        }
        location /401.html {
            root /usr/local/nginx;
            access_log off;
        }
        location /404.html {
            root /usr/local/nginx;
            access_log off;
        }
        location = /_.gif {
            empty_gif;
            access_log off;
        }
        access_log /data/nginx/logs/access.log main;
    }
}
I hope this article is useful to you. ^_^