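The server previously had TCP timeouts (wget would time out after a dozen or so requests). After re-plugging the network cable and rebooting, it returned to normal.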

The site currently has close to 20,000 users online per hour, so here are some notes on what was done.

System information
======================================
Dell R410
2 x Xeon E5504 (4 cores, 2 GHz), 2 x 4 GB RAM, 2 x 146 GB 15K SAS
Intel(R) Xeon(R) CPU E5504 @ 2.00GHz

centos 5.2 64bit
nginx+php+mysql

nginx: 6 processes
php: 96 processes

Linux bora 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

sysctl kernel parameters
net.ipv4.tcp_max_syn_backlog = 65536
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 32768

net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2

net.ipv4.tcp_tw_recycle = 1
#net.ipv4.tcp_tw_len = 1
net.ipv4.tcp_tw_reuse = 1

net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_max_orphans = 3276800

#net.ipv4.tcp_fin_timeout = 30
#net.ipv4.tcp_keepalive_time = 120
net.ipv4.ip_local_port_range = 1024 65535

Raise the file handle limits
vi /etc/security/limits.conf
* soft nofile 51200
* hard nofile 51200

vi /etc/rc.local
ulimit -SHn 51200

===================
Active connections: 2419
server accepts handled requests
73668795 73668795 232420556
Reading: 11 Writing: 28 Waiting: 2380
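
The three counters in the stub_status output above are accepts, handled and requests. As a rough sketch, dividing requests by handled gives the average number of requests served per connection, which hints at how much keepalive is being reused. The function name requests_per_connection is a hypothetical helper used only for this illustration.

```shell
#!/bin/sh
# requests_per_connection HANDLED REQUESTS
# Prints REQUESTS / HANDLED with two decimals.
# (Hypothetical helper name, not part of nginx.)
requests_per_connection() {
    awk -v handled="$1" -v requests="$2" \
        'BEGIN { printf "%.2f\n", requests / handled }'
}

# Counters taken from the status page above.
requests_per_connection 73668795 232420556   # prints 3.15
```

A value around 3 means each accepted connection served about three requests on average.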

Online members - 13433 users online in total

top - 13:29:01 up 33 days, 22:53, 2 users, load average: 1.22, 1.60,
Tasks: 265 total, 1 running, 264 sleeping, 0 stopped, 0 zombie
Cpu(s): 9.3%us, 1.4%sy, 0.0%ni, 88.2%id, 0.5%wa, 0.0%hi, 0.6%si,
Mem: 8168412k total, 6691148k used, 1477264k free, 917728k buffe
Swap: 4096532k total, 228k used, 4096304k free, 3841696k cache

netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
TIME_WAIT 4845
SYN_SENT 1
FIN_WAIT1 185
ESTABLISHED 2698
FIN_WAIT2 381
SYN_RECV 162
CLOSING 5
LAST_ACK 137

netstat -n |wc -l
8441

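Changing tcp_no_metrics_save
By default, when a TCP connection closes, the kernel saves some of its parameters, such as the slow-start threshold snd_ssthresh, the congestion window snd_cwnd and srtt, into the dst_entry. As long as that dst_entry has not expired, the next connection to the same destination can be initialized from the saved values. tcp_no_metrics_save is normally off (0), which means the metrics are saved; setting it to 1 disables the saving.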

echo '1' > /proc/sys/net/ipv4/tcp_no_metrics_save

Still timing out.

vi /etc/sysctl.conf
Add one line
net.ipv4.tcp_no_metrics_save =1

sysctl -p

Five minutes after enabling it, the system load had risen slightly.

Active connections: 2520
server accepts handled requests
73715479 73715479 232593589
Reading: 11 Writing: 14 Waiting: 2495

top - 13:46:30 up 33 days, 23:10, 2 users, load average: 1.49, 1.74, 1.64
Tasks: 265 total, 3 running, 262 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.8%us, 2.4%sy, 0.0%ni, 82.9%id, 0.1%wa, 0.1%hi, 0.7%si, 0.0
Mem: 8168412k total, 7225928k used, 942484k free, 925236k buffers
Swap: 4096532k total, 228k used, 4096304k free, 3893016k cached

vi /etc/sysctl.conf
Change it back to
net.ipv4.tcp_no_metrics_save =0

sysctl -p

Five minutes later the load had dropped again; net.ipv4.tcp_no_metrics_save does not help much here.

top - 13:52:18 up 33 days, 23:16, 2 users, load average: 1.18, 1.46, 1.56
Tasks: 265 total, 1 running, 264 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.9%us, 2.1%sy, 0.0%ni, 83.5%id, 0.8%wa, 0.1%hi, 0.6%si, 0.0
Mem: 8168412k total, 7274212k used, 894200k free, 927504k buffers
Swap: 4096532k total, 228k used, 4096304k free, 3911832k cached

=======================================

cat /proc/sys/net/ipv4/tcp_fin_timeout
60
cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
vi /etc/sysctl.conf

net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120

sysctl -p

========================
ifconfig eth0 txqueuelen 1000

Checked, and it is already 1000
ifconfig

TX packets:1924158031 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:339974963014 (316.6 GiB) TX bytes:2165156209254 (1.9 TiB)

================================
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
65536

=========================
cat /proc/sys/fs/file-nr
4590 0 765985

=======================================
cat /proc/sys/net/ipv4/route/gc_interval
60
cat /proc/sys/net/ipv4/route/gc_timeout
300
cat /proc/sys/net/ipv4/route/gc_elasticity
8

eaccelerator

[eaccelerator]
zend_extension="/opt/php/lib/php/extensions/no-debug-non-zts-20060613/eaccelerator.so"
eaccelerator.shm_size="32"
eaccelerator.cache_dir="/opt/php/eaccelerator_cache"
eaccelerator.enable="1"
eaccelerator.optimizer="1"
eaccelerator.check_mtime="1"
eaccelerator.debug="0"
eaccelerator.filter=""
eaccelerator.shm_max="0"
eaccelerator.shm_ttl="3600"
eaccelerator.shm_prune_period="3600"
eaccelerator.shm_only="0"
eaccelerator.compress="1"
eaccelerator.compress_level="9"

nginx

user  www website;

worker_processes 6;

error_log  /var/log/nginx/nginx_error.log  crit;

pid        /dev/shm/nginx.pid;

#Specifies the value for maximum file descriptors that can be opened by this process.
worker_rlimit_nofile 51200;

events
{
    use epoll;

    worker_connections 51200;
}

http
{
    include       mime.types;
    default_type  application/octet-stream;

    log_format  access  '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" $http_x_forwarded_for';

    server_names_hash_bucket_size 128;
    client_header_buffer_size 32k;
    large_client_header_buffers 4 32k;
    client_body_timeout 60;
    client_max_body_size 8m;

    #linux 2.4+
    sendfile on;
    tcp_nopush     on;
    tcp_nodelay on;

    server_name_in_redirect off;

    keepalive_timeout 60;

    fastcgi_intercept_errors on;
    fastcgi_hide_header X-Powered-By;
    fastcgi_connect_timeout 180;
    fastcgi_send_timeout 180;
    fastcgi_read_timeout 180;
    fastcgi_buffer_size 128k;
    fastcgi_buffers 4 128K;
    fastcgi_busy_buffers_size 128k;
    fastcgi_temp_file_write_size 128k;
    fastcgi_temp_path /dev/shm;

    gzip on;
    gzip_min_length  1k;
    gzip_comp_level 5;
    gzip_buffers     4 16k;
    gzip_http_version 1.1;
    gzip_types       text/plain application/x-javascript text/css application/xml;

    limit_zone   one  $binary_remote_addr  10m;

    server
    {
        listen       80;
        server_name  bbs.xxx.com *.bbs.xxx.com;
        index index.html index.htm index.php;
        root  /opt/lampp/htdocs/bbs;
        error_page 404 403 /404.html;

        location ~/\.ht {
            deny all;
        }

        location ~ /bbs/attachment\.php?$ {
            include fcgi.conf;
            fastcgi_pass  127.0.0.1:9000;
            fastcgi_index index.php;
            limit_conn   one  1;
            limit_rate 30k;
        }

        location ~ .*\.php?$
        {
            #fastcgi_pass  unix:/tmp/php-cgi.sock;
            fastcgi_pass  127.0.0.1:9000;
            fastcgi_index index.php;
            include fcgi.conf;
        }

        rewrite ^(.*)/archiver/((fid|tid)-[\w\-]+\.html)$ $1/archiver/index.php?$2 last;
        rewrite ^(.*)/forum-([0-9]+)-([0-9]+)\.html$ $1/forumdisplay.php?fid=$2&page=$3 last;
        rewrite ^(.*)/thread-([0-9]+)-([0-9]+)-([0-9]+)\.html$ $1/viewthread.php?tid=$2&extra=page\%3D$4&page=$3 last;
        rewrite ^(.*)/profile-(username|uid)-(.+)\.html$ $1/viewpro.php?$2=$3 last;
        rewrite ^(.*)/space-(username|uid)-(.+)\.html$ $1/space.php?$2=$3 last;

        location ~(favicon.ico) {
            log_not_found off;
            expires 99d;
            break;
        }
        location ~(robots.txt) {
            log_not_found off;
            expires 7d;
            break;
        }

        location ~* ^.+\.(jpg|jpeg|gif|png|swf|rar|zip|css|js)$ {
            valid_referers none blocked *.xxx.com *.xxx.net localhost;
            if ($invalid_referer) {
                rewrite ^/ ;
                return 412;
            }
            access_log   off;
            root /opt/lampp/htdocs/bbs;
            expires 7d;
            break;
        }

        access_log  /var/log/nginx/bbs.xxx.com.log  access;
    }
}

==================
18518 users online in total

top - 17:40:04 up 5:49, 2 users, load average: 1.81, 2.08, 2.20
Tasks: 265 total, 6 running, 259 sleeping, 0 stopped, 0 zombie
Cpu(s): 26.0%us, 3.4%sy, 0.0%ni, 69.0%id, 0.7%wa, 0.0%hi, 0.9%si,
Mem: 8168412k total, 7173156k used, 995256k free, 486980k buffe
Swap: 4096532k total, 0k used, 4096532k free, 3559140k cache

Active connections: 2306
server accepts handled requests
297904 297904 980053
Reading: 7 Writing: 10 Waiting: 2289

netstat -n|wc -l
8157

sysctl

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv4.tcp_max_syn_backlog = 65536
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 32768
net.ipv4.tcp_max_tw_buckets = 10000
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_mem = 196608 262144 393216
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_syncookies = 1

===================================
Update 2009/11/2
The network has been somewhat unstable recently, and the system log contains the following messages
tail /var/log/messages

Nov  1 19:37:32 bora kernel: ip_conntrack: table full, dropping packet.
Nov  1 19:38:31 bora kernel: ip_conntrack: table full, dropping packet.
Nov  1 19:43:15 bora kernel: ip_conntrack: table full, dropping packet.
Nov  1 20:38:31 bora kernel: ip_conntrack: table full, dropping packet.
Nov  1 20:42:16 bora last message repeated 2 times
Nov  1 20:51:42 bora last message repeated 2 times
Nov  1 21:23:38 bora last message repeated 3 times
Nov  1 21:28:14 bora last message repeated 2 times

cat /var/log/messages |grep 'ip_conntrack: table full'
cat /var/log/messages |grep 'syslogd 1.4.1: restart'

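A pattern emerged:
the "ip_conntrack: table full" messages typically show up about every 5 days; roughly a day later the system hangs and has to be restarted by hand.
So the cause is most likely ip_conntrack.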

Check the current value

cat /proc/sys/net/ipv4/ip_conntrack_max
65536

65536 is the system default for 1 GB of RAM.
Formula for ip_conntrack_max
Reference: http://www.wallfire.org/misc/netfilter_conntrack_perf.txt

CONNTRACK_MAX = RAMSIZE (in bytes) / 16384 / (x / 32)
#where x is the number of bits in a pointer (for example, 32 or 64 bits)

With 1 GB of RAM on 32-bit: 1024*1024*1024/16384/(32/32)=65536
My machine has 8 GB and is 64-bit:
8192*1024*1024/16384/(64/32)=262144
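
The arithmetic above can be reproduced with a small shell sketch. The function name conntrack_max is a hypothetical helper introduced only for this illustration; it takes the RAM size in bytes and the pointer width in bits and applies the formula from the referenced document.

```shell
#!/bin/sh
# conntrack_max RAM_BYTES POINTER_BITS
# Prints RAM_BYTES / 16384 / (POINTER_BITS / 32).
# (Hypothetical helper name; integer arithmetic only.)
conntrack_max() {
    ram_bytes=$1
    pointer_bits=$2
    echo $(( ram_bytes / 16384 / (pointer_bits / 32) ))
}

conntrack_max $(( 1024 * 1024 * 1024 )) 32   # 1 GB, 32-bit: prints 65536
conntrack_max $(( 8192 * 1024 * 1024 )) 64   # 8 GB, 64-bit: prints 262144
```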

Check the hash table size (ip_conntrack_buckets is the number of hash buckets, not RAMSIZE)

cat /proc/sys/net/ipv4/netfilter/ip_conntrack_buckets
8192

Check the ip_conntrack timeout

cat /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_established
432000

# Change 432000 (5 days) to 36000 (10 hours)

vi /etc/sysctl.conf
Append two lines at the end of the file

net.ipv4.ip_conntrack_max = 262144
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 36000

Apply immediately
sysctl -p
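
To see how close the table is to the limit after the change, the current entry count can be compared with the maximum. This is a minimal sketch: conntrack_usage is a hypothetical helper name, and the /proc path for the current count in the usage comment is an assumption for kernels that use the ip_conntrack module, as this one does.

```shell
#!/bin/sh
# conntrack_usage COUNT MAX
# Prints how full the conntrack table is, as an integer percentage.
# (Hypothetical helper name used only in this sketch.)
conntrack_usage() {
    count=$1
    max=$2
    echo $(( count * 100 / max ))
}

# Usage on the server (path assumed to exist on this kernel):
#   conntrack_usage "$(cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count)" \
#                   "$(cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max)"
```

A value that keeps climbing toward 100 would suggest the "table full" messages are about to return.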

http://blog.c1gstudio.com/archives/870

 

posted on 2011-01-07 17:08 by Dufe王彬