Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.
Nov 8 18:07:04 dp3 /usr/sbin/gmond[1664]: [PYTHON] Can't call the metric handler function for [diskstat_sdd_reads] in the python module [diskstat].#012
Nov 8 18:07:04 dp3 /usr/sbin/gmond[1664]: [PYTHON] Can't call the metric handler function for [diskstat_sdd_writes] in the python module [diskstat].#012
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Error writing state file: No space left on device
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Cannot write to file /var/run/ConsoleKit/database~
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Unable to spawn /usr/lib/ConsoleKit/run-session.d/pam-foreground-compat.ck: Failed to fork (Cannot allocate memory)
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Error writing state file: No space left on device
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Cannot write to file /var/run/ConsoleKit/database~
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Cannot unlink /var/run/ConsoleKit/database: No such file or directory
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 kernel: [4319715.969327] gmond[1664]: segfault at ffffffffffffffff ip 00007f52e0066f34 sp 00007fff4e428620 error 4 in libganglia-3.1.2.so.0.0.0[7f52e0060000+13000]
Nov 8 18:10:01 dp3 cron[1637]: (CRON) error (can't fork)
Nov 8 18:13:53 dp3 init: tty1 main process (2341) terminated with status 1
Nov 8 18:13:53 dp3 init: tty1 main process ended, respawning
Nov 8 18:13:53 dp3 init: Temporary process spawn error: Cannot allocate memory
And in the Hadoop datanode log there are the following errors (only part of the exceptions are shown):
2012-11-08 18:07:01,283 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.18.10.56:50010, storageID=DS-1599419066-10.18.10.47-50010-1329122718923, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:480)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:171)
2012-11-08 18:07:02,163 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.18.10.56:50010, storageID=DS-1599419066-10.18.10.47-50010-1329122718923, infoPort=50075, ipcPort=50020):DataXceiverServer: Exiting due to:java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
at java.lang.Thread.run(Thread.java:662)
2012-11-08 18:07:04,964 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.18.10.56:50010, storageID=DS-1599419066-10.18.10.47-50010-1329122718923, infoPort=50075, ipcPort=50020):DataXceiver
java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:287)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:480)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:171)
2012-11-08 18:07:04,965 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-1079258682690587867_32990729 1 Exception java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:937)
at java.lang.Thread.run(Thread.java:662)
2012-11-08 18:07:05,057 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_1523791863488769175_32972264 1 Exception java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1047)
at java.lang.Thread.run(Thread.java:662)
2012-11-08 18:07:04,972 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.18.10.56:50010, storageID=DS-1599419066-10.18.10.47-50010-1329122718923, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Interrupted receiveBlock
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:622)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:480)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:171)
2012-11-08 18:08:02,003 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
2012-11-08 18:08:02,025 WARN org.apache.hadoop.util.Shell: Could not get disk usage information
java.io.IOException: Cannot run program "du": java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.fs.DU.access$200(DU.java:29)
at org.apache.hadoop.fs.DU$DURefreshThread.run(DU.java:84)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
After that it just kept printing the following log line and hung.
2012-11-08 18:08:52,015 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
1.1.7 /proc/sys/kernel
1.1.7.1 /proc/sys/kernel/hung_task_timeout_secs
Detecting hung tasks in Linux
Sometimes tasks under Linux are blocked forever (essentially hung). Recent Linux kernels have an infrastructure to detect hung tasks. When this infrastructure is active it will periodically wake up to find hung tasks and print a stack dump of those hung tasks (and maybe the locks held). Additionally we can choose to panic the system when at least one hung task is detected. I will try to explain how khungtaskd works.
The infrastructure is based on a single kernel thread named "khungtaskd". So if you do a ps on your system and see an entry like [khungtaskd], you know it is there. I have one on my system: "136 root SW [khungtaskd]"
The loop of the khungtaskd daemon is a call to the scheduler asking to be woken up every 120 seconds (the default value). The core algorithm is like this:
Iterate over all the tasks in the system that are marked TASK_UNINTERRUPTIBLE (it does not consider UNINTERRUPTIBLE frozen tasks, or UNINTERRUPTIBLE tasks that are newly created and have never been scheduled out).
If a task has not been switched out by the scheduler at least once in the last 120 seconds, it is considered a hung task and its stack dump is displayed. If CONFIG_LOCKDEP is defined then it will also show all the locks the hung task is holding.
One can change the sampling interval of khungtaskd through the sysctl interface /proc/sys/kernel/hung_task_timeout_secs.
Earlier, a disk failure occurred on one of the HDFS datanodes, and the following messages were found in syslog:
May 14 00:02:50 dp46 kernel: INFO: task jbd2/sde1-8:3411 blocked for more than 120 seconds.
May 14 00:02:50 dp46 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 14 00:02:50 dp46 kernel: jbd2/sde1-8 D 0000000000000000 0 3411 2 0x00000000
May 14 00:02:50 dp46 kernel: ffff880817a71a80 0000000000000046 ffff880096d12f00 0000000000000441
May 14 00:02:50 dp46 kernel: ffff880818052938 ffff880818052848 ffff88081805c3b8 ffff88081805c3b8
May 14 00:02:50 dp46 kernel: ffff88081b22e6b8 ffff880817a71fd8 000000000000f4e8 ffff88081b22e6b8
May 14 00:02:50 dp46 kernel: Call Trace:
May 14 00:02:50 dp46 kernel: [<ffffffff8109b809>] ? ktime_get_ts+0xa9/0xe0
May 14 00:02:50 dp46 kernel: [<ffffffff81110b10>] ? sync_page+0x0/0x50
May 14 00:02:50 dp46 kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
May 14 00:02:50 dp46 kernel: [<ffffffff81110b4d>] sync_page+0x3d/0x50
May 14 00:02:50 dp46 kernel: [<ffffffff814eda4a>] __wait_on_bit_lock+0x5a/0xc0
May 14 00:02:50 dp46 kernel: [<ffffffff81110ae7>] __lock_page+0x67/0x70
May 14 00:02:50 dp46 kernel: [<ffffffff81090c30>] ? wake_bit_function+0x0/0x50
May 14 00:02:50 dp46 kernel: [<ffffffff811271a5>] ? pagevec_lookup_tag+0x25/0x40
May 14 00:02:50 dp46 kernel: [<ffffffff811261f2>] write_cache_pages+0x392/0x4a0
May 14 00:02:50 dp46 kernel: [<ffffffff81124c80>] ? __writepage+0x0/0x40
May 14 00:02:50 dp46 kernel: [<ffffffff81126324>] generic_writepages+0x24/0x30
May 14 00:02:50 dp46 kernel: [<ffffffffa00774d7>] journal_submit_inode_data_buffers+0x47/0x50 [jbd2]
May 14 00:02:50 dp46 kernel: [<ffffffffa00779e5>] jbd2_journal_commit_transaction+0x375/0x14b0 [jbd2]
May 14 00:02:50 dp46 kernel: [<ffffffff8100975d>] ? __switch_to+0x13d/0x320
May 14 00:02:50 dp46 kernel: [<ffffffff8107c0ec>] ? lock_timer_base+0x3c/0x70
May 14 00:02:50 dp46 kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
May 14 00:02:50 dp46 kernel: [<ffffffffa007d928>] kjournald2+0xb8/0x220 [jbd2]
May 14 00:02:50 dp46 kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
May 14 00:02:50 dp46 kernel: [<ffffffffa007d870>] ? kjournald2+0x0/0x220 [jbd2]
May 14 00:02:50 dp46 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
May 14 00:02:50 dp46 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
May 14 00:02:50 dp46 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
May 14 00:02:50 dp46 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
The JBD is the journaling block device that sits between the file system and the block device driver. The jbd2 version is for ext4.
[dirlt@localhost.localdomain]$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 4 45752 33460 99324 0 0 1 1 1 9 0 0 99 0 0
0 0 4 45752 33460 99324 0 0 0 0 1 8 0 0 100 0 0
[zhangyan04@tc-hpc-dev.tc.baidu.com]$ vmstat -m
Cache Num Total Size Pages
nfs_direct_cache 0 0 168 24
nfs_write_data 69 69 704 23
Num: the number of objects currently in use
Total: the total number of objects available
Size: the size of each object
Pages: the number of pages occupied (each such page contains at least one object in use)
[zhangyan04@tc-hpc-dev.tc.baidu.com]$ vmstat -s
8191996 total memory
4519256 used memory
1760044 active memory
2327204 inactive memory
3672740 free memory
76200 buffer memory
3935788 swap cache
1020088 total swap
0 used swap
1020088 free swap
423476 non-nice user cpu ticks
91 nice user cpu ticks
295803 system cpu ticks
70621941 idle cpu ticks
39354 IO-wait cpu ticks
800 IRQ cpu ticks
52009 softirq cpu ticks
317179 pages paged in
54413375 pages paged out
0 pages swapped in
0 pages swapped out
754373489 interrupts
500998741 CPU context switches
1323083318 boot time
418742 forks
taskset is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new COMMAND with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity: the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications.
1.2.6 lsof
todo(dirlt):
1.2.7 hdparm
hdparm - get/set hard disk parameters
Usage is as follows:
/sbin/hdparm [ flags ] [device] ..
The device can be found with mount:
[dirlt@localhost.localdomain]$ mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
NOTE
This program is obsolete. Replacement for netstat is ss. Replacement for netstat -r is ip route. Replacement for netstat -i is ip -s link. Replacement for netstat -g is ip maddr.
[zhangyan04@tc-hpc-dev.tc.baidu.com]$ netstat -s
Ip:
322405625 total packets received
0 forwarded
0 incoming packets discarded
322405625 incoming packets delivered
369134846 requests sent out
33 dropped because of missing route
Icmp:
30255 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
echo requests: 30170
echo replies: 83
timestamp request: 2
30265 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 10
echo request: 83
echo replies: 30170
timestamp replies: 2
IcmpMsg:
InType0: 83
InType8: 30170
InType13: 2
OutType0: 30170
OutType3: 10
OutType8: 83
OutType14: 2
Tcp:
860322 active connections openings
199165 passive connection openings
824990 failed connection attempts
43268 connection resets received
17 connections established
322306693 segments received
368937621 segments send out
56075 segments retransmited
0 bad segments received.
423873 resets sent
Udp:
68643 packets received
10 packets to unknown port received.
0 packet receive errors
110838 packets sent
UdpLite:
TcpExt:
1999 invalid SYN cookies received
5143 resets received for embryonic SYN_RECV sockets
2925 packets pruned from receive queue because of socket buffer overrun
73337 TCP sockets finished time wait in fast timer
85 time wait sockets recycled by time stamp
4 delayed acks further delayed because of locked socket
Quick ack mode was activated 7106 times
5141 times the listen queue of a socket overflowed
5141 SYNs to LISTEN sockets ignored
81288 packets directly queued to recvmsg prequeue.
297394763 packets directly received from backlog
65102525 packets directly received from prequeue
180740292 packets header predicted
257396 packets header predicted and directly queued to user
5983677 acknowledgments not containing data received
176944382 predicted acknowledgments
2988 times recovered from packet loss due to SACK data
Detected reordering 9 times using FACK
Detected reordering 15 times using SACK
Detected reordering 179 times using time stamp
835 congestion windows fully recovered
1883 congestion windows partially recovered using Hoe heuristic
TCPDSACKUndo: 1806
1093 congestion windows recovered after partial ack
655 TCP data loss events
TCPLostRetransmit: 6
458 timeouts after SACK recovery
7 timeouts in loss state
3586 fast retransmits
178 forward retransmits
425 retransmits in slow start
51048 other TCP timeouts
37 sack retransmits failed
1610293 packets collapsed in receive queue due to low socket buffer
7094 DSACKs sent for old packets
14430 DSACKs received
4358 connections reset due to unexpected data
12564 connections reset due to early user close
29 connections aborted due to timeout
TCPDSACKIgnoredOld: 12177
TCPDSACKIgnoredNoUndo: 347
TCPSackShifted: 6421
TCPSackMerged: 5600
TCPSackShiftFallback: 119131
IpExt:
InBcastPkts: 22
InOctets: 167720101517
OutOctets: 169409102263
InBcastOctets: 8810
[zhangyan04@tc-hpc-dev.tc.baidu.com]$ netstat --ip --tcp -a -e -p
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 tc-hpc-dev.tc.baidu.c:19870 *:* LISTEN zhangyan04 30549010 28965/echo_server
tcp 1024 0 tc-hpc-dev.tc.baidu.c:19870 tc-com-test00.tc.baid:60746 ESTABLISHED zhangyan04 30549012 28965/echo_server
tcp 0 1024 tc-hpc-dev.tc.baidu.c:19870 tc-com-test00.tc.baid:60745 ESTABLISHED zhangyan04 30549011 28965/echo_server
Simple answer: you cannot. Longer answer: the uninterruptable sleep means the process will not be woken up by signals. It can be only woken up by what it's waiting for. When I get such situations eg. with CD-ROM, I usually reset the computer by using suspend-to-disk and resuming.
The D state basically means that the process is waiting for disk I/O, or other block I/O that can't be interrupted. Sometimes this means the kernel or device is feverishly trying to read a bad block (especially from an optical disk). Sometimes it means there's something else. The process cannot be killed until it gets out of the D state. Find out what it is waiting for and fix that. The easy way is to reboot. Sometimes removing the disk in question helps, but that can be rather dangerous: unfixable catastrophic hardware failure if you don't know what you're doing (read: smoke coming out).
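Since a process in the D state cannot be killed, the practical first step is usually to see what it is blocked on. A minimal sketch (not from the original notes; <pid> is a placeholder):
$ ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'   # list D-state processes and the kernel function they sleep in
$ cat /proc/<pid>/stack                             # kernel stack of one such process (needs root and a recent kernel)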
Server Software: nginx/1.2.1
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 1439 bytes
Concurrency Level: 100
Time taken for tests: 0.760 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 16500000 bytes
HTML transferred: 14390000 bytes
Requests per second: 13150.09 [#/sec] (mean)
Time per request: 7.605 [ms] (mean)
Time per request: 0.076 [ms] (mean, across all concurrent requests)
Transfer rate: 21189.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.4 0 18
Processing: 2 7 1.8 7 20
Waiting: 1 7 1.8 7 20
Total: 5 7 2.0 7 20
Percentage of the requests served within a certain time (ms)
50% 7
66% 7
75% 8
80% 8
90% 9
95% 10
98% 14
99% 19
100% 20 (longest request)
#echo set terminal postscript color > gnuplot.cmd
echo set terminal png xffffff > gnuplot.cmd
#echo set data style linespoints >> gnuplot.cmd
echo set style data linespoints >> gnuplot.cmd
The kernel hands the packet to the IP layer for processing. The IP layer assembles the data into an IP packet. If the IP packet carries TCP, it is placed into the socket backlog; if the socket backlog is full, the IP packet is dropped. copy packet data to ip buffer to form ip packet
note(dirlt): once this step is done, the IP layer can release the sk_buff structure
The TCP layer takes the TCP packet out of the socket backlog. copy ip packet to tcp recv buffer to form tcp packet
The TCP recv buffer is handed to the application layer for processing. copy tcp recv buffer to app buffer to form app packet
The application layer copies data into the TCP send buffer; if there is not enough space, the write blocks. copy app buffer to tcp send buffer as app packet
When the TCP send buffer has data, or an ACK needs to be sent, the TCP layer assembles an IP packet and pushes it down to the IP layer. copy tcp send buffer to ip send buffer as tcp packet
The IP layer allocates an sk_buff from kernel memory, wraps the IP data into packet data, and puts it into the qdisc (whose length is controlled by txqueuelen). If the queue is full, it blocks, and the back-pressure propagates up to the TCP layer. copy ip send buffer to packet data as ip packet
When the NIC driver sees data in the qdisc, it moves the packet data from the qdisc into the ring buffer and then triggers the NIC DMA engine to send the packet out. todo(dirlt): this understanding may be inaccurate
a 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status (if the signal number is zero); the high bit of the low byte is set if a core file was produced.
dp@dp8:~$ dmesg | grep eth0
[ 15.635160] eth0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express f
[ 15.736389] bnx2: eth0: using MSIX
[ 15.738263] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 37.848755] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex
[ 37.850623] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1933.934668] bnx2: eth0: using MSIX
[ 1933.936960] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1956.130773] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex
[ 1956.132625] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[4804526.542976] bnx2: eth0 NIC Copper Link is Down
[4804552.008858] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex
[4804552.008858] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex — the NIC speed on dp8 had been negotiated as only 100 Mbps. The possible causes are as follows:
Table of Contents
1 linux
1.1 proc filesystem
1.1.1 /proc
1.1.1.1 /proc/meminfo
Information about the system's current memory usage, commonly used by the free command. You can read this file directly; it has two columns, the statistic name and its value.
1.1.1.2 /proc/stat
Tracks, in real time, various statistics accumulated since the system last booted.
1.1.1.3 /proc/swaps
The swap partitions currently in use on the system and their usage. If there are multiple swap partitions, each partition's information is listed separately; the lower its priority number, the more likely it is to be used.
1.1.1.4 /proc/cmdline
The parameters passed to the kernel at boot time, usually handed over by a boot loader such as lilo or grub.
1.1.1.5 /proc/uptime
How long the system has been running since it was last booted. The first number is the total running time and the second is the idle time, both in seconds.
1.1.1.6 /proc/version
The version of the kernel the system is currently running.
1.1.1.7 /proc/mounts
All filesystems currently mounted on the system. The first column is the mounted device, the second is the mount point in the directory tree, the third is the filesystem type, the fourth is the mount options (ro or rw), and the fifth and sixth columns correspond to the dump/fsck fields in /etc/mtab.
1.1.1.8 /proc/modules
A list of all modules currently loaded into the kernel, used by lsmod and also directly viewable. The first column is the module name, the second the memory it occupies, the third how many instances are loaded, the fourth which other modules it depends on, the fifth its load state (Live: loaded; Loading: being loaded; Unloading: being unloaded), and the sixth its offset in kernel memory.
1.1.1.9 /proc/diskstats
Disk I/O statistics for each block device.
1.1.1.10 /proc/cpuinfo
1.1.1.11 /proc/crypto
A list of the cryptographic algorithms available to the installed kernel, with details for each.
1.1.1.12 /proc/loadavg
CPU and disk I/O load averages. The first three columns are the load averages over the last 1, 5, and 15 minutes, similar to the output of uptime. The fourth column is two numbers separated by a slash: the number of kernel scheduling entities (processes and threads) currently being scheduled, and the total number of scheduling entities alive on the system. The fifth column is the PID of the process most recently created before this file was read.
1.1.1.13 /proc/locks
Information about files currently locked by the kernel, including some kernel-internal debugging data. Each lock occupies one line and has a unique number. The second column of each line is the lock class: POSIX is the newer type of file lock created by the lockf/fcntl system calls, FLOCK the traditional UNIX file lock created by the flock system call. The third column usually has two values: ADVISORY means other users may not lock the file but may still read it, MANDATORY means no access of any kind by other users is allowed while the file is locked.
1.1.1.14 /proc/slabinfo
Frequently used kernel objects (such as inodes and dentries) have their own caches, i.e. slab pools, and /proc/slabinfo lists information about these slabs; see the slabinfo page in the kernel documentation for details.
1.1.1.15 /proc/vmstat
Various statistics about the system's virtual memory. The amount of information can be large and varies between systems, but it is fairly readable.
1.1.1.16 /proc/zoneinfo
Detailed information about each memory zone.
1.1.2 /proc/<pid>
Here pid is the process ID; the entries under this directory describe the corresponding process.
1.1.2.1 fd
todo(zhangyan04):
1.1.2.2 io
TODO:
1.1.2.3 limits
TODO:
1.1.2.4 maps
A list of the memory-mapped regions (and their access permissions) of every executable and library file associated with the current process.
1.1.2.5 mount
TODO:
1.1.2.6 net
todo(zhangyan04):
1.1.2.7 sched
todo(zhangyan04):
1.1.2.8 status
TODO:
1.1.2.9 statm
Provides information about memory usage, measured in pages. The columns are:
1.1.3 /proc/sys
Under /proc/sys there are kernel parameters that can be modified at runtime; there are two ways to change them.
The first is the sysctl tool. For example, to set vm.swappiness to 0 you can use the commands sketched below.
A change made this way is temporary; for a permanent change, edit /etc/sysctl.conf
and the setting will then take effect permanently across reboots.
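A rough sketch of both methods, using the swappiness example mentioned above (the value is illustrative, not a recommendation):
$ sysctl -w vm.swappiness=0          # temporary, lost at reboot
$ echo 0 > /proc/sys/vm/swappiness   # equivalent: write the proc file directly
# permanent: add this line to /etc/sysctl.conf ...
vm.swappiness = 0
# ... and apply it without rebooting
$ sysctl -p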
1.1.4 /proc/sys/vm
1.1.4.1 /proc/sys/vm/overcommit_memory
"Overcommit" means committing more memory than is actually available.
In the afternoon, after changing overcommit_memory on dp3 to 2, the first problem was that no shell command could be executed at all; the error was that fork could not allocate enough memory. After exiting the session it was impossible to log in to dp3 again. The main reason is that the JVM had essentially filled physical memory, while overcommit_ratio=0.5 and there was no swap space, so no more memory could be allocated.
From /var/log/syslog you can see that after this parameter was changed many programs were affected (ganglia died, cron could no longer fork processes, and init could not allocate more ttys, so we had no way to log in). In ganglia the memory and CPU graphs became flat lines, not because the system was stable but because gmond had died.
And in the Hadoop datanode log there are the following errors (only part of the exceptions are shown):
After that it just kept printing the following log line and hung.
The HDFS web page showed the node as dead, but the datanode process was actually still alive; presumably it ran into these problems for the same reason, because it could not allocate enough memory.
As for why we could eventually log in again, my guess is that the datanode had died and the regionserver on the machine had not yet allocated that memory, so there was enough room for init to open a tty.
The value has now been set back to its original value, 0. Fortunately, during this period the change had no real impact on the execution of online jobs.
1.1.4.2 /proc/sys/vm/overcommit_ratio
If overcommit_memory is 2, this parameter determines how much memory the system can commit. It is computed as (Physical-RAM-Size) * ratio / 100 + (Swap-Size).
So for this system the committable virtual memory is (491*50/100)+509 = 754M. note(dirlt): this is only the estimated commit limit when overcommit_memory=2; in the other modes the usable memory is still (Physical-RAM-Size) + (Swap-Size).
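When overcommit_memory=2 the kernel exposes the resulting limit in /proc/meminfo, so the formula can be checked directly; a small sketch (assuming a reasonably recent kernel):
$ cat /proc/sys/vm/overcommit_ratio
$ grep -i commit /proc/meminfo   # CommitLimit is roughly swap + RAM * overcommit_ratio / 100; Committed_AS is what is currently committed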
1.1.4.3 /proc/sys/vm/swappiness
This parameter controls how strongly the system prefers to use swap. It does not disable the swap partition; it only expresses a degree of preference. If it is set to 0, the system will avoid page swap in/out as much as possible and keep operations in physical memory.
1.1.4.4 /proc/sys/vm/dirty_*
These parameters control the policy for flushing dirty pages back to disk. For the flushing process itself, see the "file IO/write" section.
note(dirlt)@2013-05-25: I copied a piece of that content here
The write-back policy for these dirty pages is:
Note that a pdflush daemon may be started here to flush dirty pages in the background. In addition, every dirty_writeback_centisecs the system wakes a pdflush daemon to flush dirty pages to disk; a pdflush daemon works by checking for dirty pages older than dirty_expire_centisecs and, if any exist, writing them back in the background.
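A sketch of the knobs involved; the comments only summarize their documented meaning, and the values are whatever the running kernel reports:
$ cat /proc/sys/vm/dirty_background_ratio    # % of memory at which background writeback starts
$ cat /proc/sys/vm/dirty_ratio               # % at which the writing process itself is forced to write back
$ cat /proc/sys/vm/dirty_expire_centisecs    # age (1/100 s) after which a dirty page becomes eligible for writeback
$ cat /proc/sys/vm/dirty_writeback_centisecs # how often (1/100 s) the flusher/pdflush threads wake up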
1.1.4.5 /proc/sys/vm/drop_caches
Can be used to release the buffers and cached memory held by the kernel: buffers hold directory and file inodes, cached memory holds the page cache used when accessing files.
To avoid losing data, run sync to force dirty data to disk before writing to this file.
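A sketch of the usual sequence, following the kernel documentation for this file:
$ sync                                 # flush dirty data first
$ echo 1 > /proc/sys/vm/drop_caches    # free the page cache
$ echo 2 > /proc/sys/vm/drop_caches    # free dentries and inodes
$ echo 3 > /proc/sys/vm/drop_caches    # free both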
1.1.4.6 /proc/sys/vm/panic_on_oom
This enables or disables panic on out-of-memory feature.
If this is set to 0, the kernel will kill some rogue process, called oom_killer. Usually, oom_killer can kill rogue processes and system will survive.
If this is set to 1, the kernel panics when out-of-memory happens. However, if a process limits using nodes by mempolicy/cpusets, and those nodes become memory exhaustion status, one process may be killed by oom-killer. No panic occurs in this case. Because other nodes' memory may be free. This means system total status may be not fatal yet.
If this is set to 2, the kernel panics compulsorily even on the above-mentioned.
The default value is 0. 1 and 2 are for failover of clustering. Please select either according to your policy of failover.
note(dirlt): I don't fully understand settings 1 and 2; they are probably policies intended for clustered (failover) Linux systems.
1.1.5 /proc/sys/net
1.1.5.1 /proc/sys/net/ipv4/ip_local_port_range
The range of local ports available for allocation.
1.1.5.2 /proc/sys/net/ipv4/tcp_tw_reuse
Reuse sockets that are in the TIME_WAIT state.
Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint.
1.1.5.3 /proc/sys/net/ipv4/tcp_tw_recycle
Quickly recycle sockets in the TIME_WAIT state.
Enable fast recycling of TIME_WAIT sockets.
1.1.5.4 /proc/sys/net/ipv4/tcp_max_syn_backlog
The upper limit on the number of connections still waiting for the client's ACK.
1.1.5.5 /proc/sys/net/core/somaxconn
The maximum length of the listen queue for each port.
1.1.5.6 /proc/sys/net/core/netdev_max_backlog
When a network device receives packets faster than the kernel can process them, this is the maximum number of packets allowed to queue up.
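A sketch of checking and raising the three queue-related knobs from this section (the target values are only illustrative):
$ sysctl net.ipv4.tcp_max_syn_backlog net.core.somaxconn net.core.netdev_max_backlog
$ sysctl -w net.ipv4.tcp_max_syn_backlog=8192
$ sysctl -w net.core.somaxconn=1024
$ sysctl -w net.core.netdev_max_backlog=4096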
1.1.6 /proc/sys/fs
1.1.6.1 /proc/sys/fs/file-max
The maximum number of open files allowed for all processes together. note(dirlt): this should be distinct from file descriptors.
1.1.6.2 /proc/sys/fs/epoll/max_user_instances
The per-user upper limit on the number of epoll file descriptors (epoll instances). Exceeding it returns EMFILE. note(dirlt): this entry does not seem to exist on my system, though.
1.1.6.3 /proc/sys/fs/epoll/max_user_watches
The per-user upper limit on the number of file descriptors that can be watched with epoll. note(dirlt): this should be especially useful for servers, since it bounds memory usage.
This specifies a limit on the total number of file descriptors that a user can register across all epoll instances on the system. The limit is per real user ID. Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes on a 64-bit kernel. Currently, the default value for max_user_watches is 1/25 (4%) of the available low memory, divided by the registration cost in bytes.
1.1.7 /proc/sys/kernel
1.1.7.1 /proc/sys/kernel/hung_task_timeout_secs
Detecting hung tasks in Linux
Sometimes tasks under Linux are blocked forever (essentially hung). Recent Linux kernels have an infrastructure to detect hung tasks. When this infrastructure is active it will periodically wake up to find hung tasks and print a stack dump of those hung tasks (and maybe the locks held). Additionally we can choose to panic the system when at least one hung task is detected. I will try to explain how khungtaskd works.
The infrastructure is based on a single kernel thread named "khungtaskd". So if you do a ps on your system and see an entry like [khungtaskd], you know it is there. I have one on my system: "136 root SW [khungtaskd]"
The loop of the khungtaskd daemon is a call to the scheduler asking to be woken up every 120 seconds (the default value). The core algorithm is like this:
One can change the sampling interval of khungtaskd through the sysctl interface /proc/sys/kernel/hung_task_timeout_secs.
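A small sketch of the related knobs (availability depends on CONFIG_DETECT_HUNG_TASK being enabled in the kernel):
$ cat /proc/sys/kernel/hung_task_timeout_secs   # current timeout in seconds; 0 disables the warnings
$ sysctl -w kernel.hung_task_panic=1            # panic instead of only warning when a hung task is found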
Earlier, a disk failure occurred on one of the HDFS datanodes, and the following messages were found in syslog:
The JBD is the journaling block device that sits between the file system and the block device driver. The jbd2 version is for ext4.
1.1.8 /proc/net
1.1.8.1 /proc/net/tcp
Records all TCP connections; both netstat and lsof read this file. We once ran into a problem where netstat/lsof were extremely slow, and strace showed that reading this file was what took so long; the following two links give some related information
todo(dirlt):
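As a side note, when /proc/net/tcp is large, ss (which queries the kernel over netlink instead of parsing this file) is usually much faster; a quick comparison sketch:
$ time netstat -ant | wc -l
$ time ss -ant | wc -l
$ ss -s          # summary counters without listing every socket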
1.2 system utility
1.2.1 SYS DEV
1.2.2 mpstat
mpstat - Report processors related statistics.
Typical usage is "mpstat -P ALL 1".
The meaning of each field is:
1.2.3 vmstat
1.2.4 free
1.2.5 taskset
Can be used to get and set the CPU affinity of a process.
If no CPU list (-c) is given, the current affinity is retrieved. Programmatically, the sched_setaffinity/sched_getaffinity calls can be used to set and get the CPU affinity of a process; see the sketch below.
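A usage sketch (PID 12345 and ./my_server are placeholders):
$ taskset -cp 12345          # show the current affinity of PID 12345
$ taskset -cp 0-3 12345      # pin PID 12345 to CPUs 0-3
$ taskset -c 2 ./my_server   # launch a new command bound to CPU 2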
1.2.6 lsof
todo(dirlt):
1.2.7 hdparm
hdparm - get/set hard disk parameters
Usage is as follows:
The device can be found with mount.
We care about the directory we read and write, usually under /home; here the device in use is /dev/mapper/VolGroup00-LogVol00.
todo(dirlt): many of the options are still not clear to me.
1.2.8 pmap
todo(dirlt):
1.2.9 strace
todo(dirlt):
1.2.10 iostat
iostat is mainly used to observe the load on I/O devices. First let's look at some sample iostat output.
The first line shows the average CPU load, and the statistics that follow are averages since the last reboot. If iostat is run with an interval, each subsequent report is relative to the previous one. The CPU states are:
Next, the iostat command-line parameters.
interval means the output is refreshed every interval seconds, and count is how many reports to print. The meaning of each parameter is explained below:
iostat can also be told which block devices to report on.
The usual command is iostat -d -k -x 1. Let's look at some sample output
and then go through its fields:
1.2.11 vmtouch
https://github.com/hoytech/vmtouch
note(dirlt): it can be used to warm up data (pull files into the page cache); its options also look fairly simple, e.g. the sketch below
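A usage sketch (assuming the flags of current vmtouch releases; /data/somefile is a placeholder):
$ vmtouch -v /data/somefile   # show how much of the file is resident in the page cache
$ vmtouch -t /data/somefile   # touch: read it into the page cache (warm up)
$ vmtouch -e /data/somefile   # evict it from the page cache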
The source contains a few system calls that are worth noting and learning from:
1.2.12 latencytop
todo(dirlt): https://latencytop.org/
1.2.13 iotop
Can be used to observe the I/O usage of individual processes.
todo(dirlt):
1.2.14 dstat
todo(dirlt): https://github.com/dagwieers/dstat
http://weibo.com/1840408525/AdGkO3uEL dstat -lamps
1.2.15 slurm
Simple Linux Utility for Resource Management todo(dirlt): https://computing.llnl.gov/linux/slurm/
1.2.16 sar
sar - Collect, report, or save system activity information.
All the options are listed below.
For network interface statistics, these are the fields available with DEV:
These are the fields available with EDEV:
These are the fields available with SOCK:
There are a great many options, but most of them do not need to be enabled. For network programs, the options we typically use include
The command we usually use is sar -n DEV -P ALL -u 1 0 (1 means refresh every second, 0 means keep printing indefinitely).
1.2.17 netstat
netstat - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
netstat can show a great deal of information, including network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. Judging from the documentation, though, part of this work can be done with /sbin/ip.
Here we limit our use of netstat to viewing network connections and per-protocol statistics.
First, how to view the statistics for the various protocols.
We can view data related to TCP, UDP, and raw sockets; delay is the refresh interval.
There is a lot of output, so it is not analyzed in detail here.
Next, the connection-listing functionality.
address_family allows specifying the protocol family; typically we might use
and then the remaining options
Here is a usage example.
Below is an explanation of the fields for TCP sockets; the fields for UNIX domain sockets are different but are not described here.
1.2.18 tcpdump
todo(zhangyan04):
1.2.19 iftop
todo(dirlt): http://www.ex-parrot.com/~pdw/iftop/
1.2.20 iftraf
todo(dirlt): http://iptraf.seul.org/
1.2.21 rsync
Common options:
Common commands:
1.2.22 iodump
1.2.23 iopp
1.2.24 nethogs
todo(dirlt):
1.2.25 slabtop
slabtop - display kernel slab cache information in real time
1.2.26 nmon
nmon - systems administrator, tuner, benchmark tool.
http://nmon.sourceforge.net/pmwiki.php Nigel's performance Monitor for Linux
1.2.27 collectl
collectl http://collectl.sourceforge.net/ todo(dirlt): looks quite good; it collects and integrates a lot of the key information
1.2.28 numactl
todo(dirlt):
1.2.29 jnettop
todo(dirlt):
1.2.30 glances
http://nicolargo.github.io/glances/
todo(dirlt):
1.2.31 ifconfig
ifconfig - configure a network interface
/sbin/ifconfig can be used to configure and inspect network interfaces, although judging from the documentation the /sbin/ip tool is the recommended replacement.
Here we do not intend to learn how to configure network interfaces, only how to look at interface information. /sbin/ifconfig -a shows information for every interface, even ones that are down.
Let's look a bit more closely at the information for eth1.
ifconfig can also be used to create a virtual (alias) interface bound to an extra IP address; see the sketch below.
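A sketch of adding an alias interface (interface name and addresses are examples only):
$ ifconfig eth0:1 192.168.1.200 netmask 255.255.255.0 up   # bring up a second IP on eth0:1
$ ifconfig eth0:1 down                                     # remove it again
$ ip addr add 192.168.1.200/24 dev eth0                    # the iproute2 equivalent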
1.2.32 ps(process snapshot)
A process can be in one of the following states:
When using apt-get on Ubuntu, something went wrong and we simply killed apt-get. But this left apt-get itself in an abnormal state, and afterwards apt-get could not be started at all. Looking at the process list, some suspicious processes showed up.
The parent of these processes was init and their state was uninterruptible sleep; even kill -9 could not terminate them, and the only way out was to reboot the machine. See the answer on Stack Overflow: How to stop 'uninterruptible' process on Linux? http://stackoverflow.com/questions/767551/how-to-stop-uninterruptible-process-on-linux
1.2.33 ulimit
todo(dirlt)
1.2.34 sysprof
Sysprof - Statistical, system-wide Profiler for Linux : http://sysprof.com/
1.2.35 ss
1.2.36 SYS ADMIN
1.2.37 uptime
1.2.38 top
1.2.39 htop
1.2.40 ttyload
1.2.41 dmesg
Shows the kernel messages from boot time (they are also saved in /var/log/dmesg).
1.2.42 quota
http://blog.itpub.net/post/7184/488931
quota is used to edit disk quotas for users.
Edit /etc/fstab and add the usrquota and grpquota mount options.
To set a quota for a user use /usr/sbin/edquota -u testuser; for a group, /usr/sbin/edquota -g testgrp.
The meaning of each field is as follows:
/usr/sbin/edquota -t can be used to change the grace period for soft limits.
Disk quotas can be used to limit web space, FTP space, and mail space. Quota can only manage quotas per disk partition, not per directory, so the data must be stored on a partition with quotas enabled and then symlinked into the directory where it is actually used. A rough end-to-end sketch is shown below.
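A rough end-to-end sketch (device, mount point, and user names are examples):
/dev/sda3  /home  ext3  defaults,usrquota,grpquota  0  2   # /etc/fstab entry with quota options
$ quotacheck -avug     # build/refresh the quota files after remounting
$ quotaon -avug        # turn quotas on
$ edquota -u testuser  # edit limits for a user
$ edquota -g testgrp   # edit limits for a group
$ edquota -t           # edit the soft-limit grace period
$ repquota -a          # report current usage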
1.2.43 crontab
crontab automates work by running specific programs at specific times or intervals. crontab -e opens the crontab file for editing (with vim by default). The crontab file can define variables just like a shell script, followed by the task entries; each task consists of six fields: minute hour day month week command
Each field can be written in three ways
The system-level crontab file /etc/crontab apparently has one extra field, the user to run the command as. A few example entries are sketched below:
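The following entries are a sketch only; the scripts and paths are hypothetical.
# m   h  dom mon dow  command                      (user crontab, edited with crontab -e)
0     3   *   *   *   /home/dirlt/bin/backup.sh    # every day at 03:00
*/5   *   *   *   *   /home/dirlt/bin/collect.py   # every 5 minutes
30    2   *   *   1   /home/dirlt/bin/rotate.sh    # 02:30 every Monday
# /etc/crontab has an extra user field between the time fields and the command:
0     4   *   *   *   root  /usr/sbin/logrotate /etc/logrotate.conf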
1.2.44 ntp
ntp (Network Time Protocol) is used to synchronize machine clocks and consists of the following components:
One of the most important questions is at what interval the daemon synchronizes with the configured servers.
ntp chooses the synchronization interval between the bounds given by minpoll and maxpoll, starting from minpoll, which defaults to 64 seconds.
In fact, if you do not need to serve time to other machines, you can simply run ntpdate from cron to synchronize, e.g. the cron entry sketched below.
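A sketched cron entry for that approach (the server name is an example):
0 * * * * /usr/sbin/ntpdate -u ntp.example.org >/dev/null 2>&1   # sync once an hour, run as root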
1.2.45 cssh
todo(dirlt):
1.2.46 iptables
View the current filter rules with iptables -L or iptables -S.
Rules can be added with iptables -A [chain] [rule-specification].
Here chain is e.g. INPUT, and everything after it is the rule specification: -s matches the source address, -d the destination address, and -j gives the action (target).
Rules can be deleted with iptables -D; the rule can be referred to either by its rule number or by repeating the rule specification. See the sketch below.
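A few sketched invocations (addresses and ports are examples):
$ iptables -L -n --line-numbers                    # list rules with their numbers
$ iptables -A INPUT -s 10.0.0.0/8 -j ACCEPT        # append: accept traffic from 10/8
$ iptables -A INPUT -p tcp --dport 22 -j ACCEPT    # accept ssh
$ iptables -D INPUT 2                              # delete rule number 2
$ iptables -D INPUT -s 10.0.0.0/8 -j ACCEPT        # or delete by repeating the specification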
1.2.47 HTTP BENCH TOOL
1.2.48 httperf
download http://www.hpl.hp.com/research/linux/httperf/
paper http://www.hpl.hp.com/research/linux/httperf/wisp98/httperf.pdf
httperf is a tool for benchmarking HTTP server performance; it supports HTTP/1.0 and 1.1. Its command-line parameters are listed below
httperf supports several different workload models:
note(dirlt): I only really understood the session-oriented concept after reading the paper. It models the real browsing scenario: a page usually embeds many objects such as JS or CSS files. One visit is a session, and a session contains many requests; typically the first request is issued and completed (the browser parses the page), and then the remaining requests are issued concurrently.
Common options
note(dirlt): note, however, that httperf uses the select model, so there is an upper limit on the number of connections.
Interpreting the results
The Connection section
The Request section
The Reply section
If session mode is used, the results additionally contain
1.2.49 ab
ab (Apache benchmark) is an HTTP server benchmarking tool that ships with Apache httpd. Its command-line parameters are listed below
It does not have as many features as httperf, but it should be enough most of the time.
note(dirlt): ab and httperf have different working models. httperf specifies how many connections to open and how many calls to issue on each connection, while ab specifies how many requests to send in total and how many to send per batch, then computes per-batch timing statistics; ab has to wait until every request in a batch has returned, failed, or timed out. The two complement each other nicely!
The meaning of each parameter:
We can use it like this: ab -c 100 -n 10000 -r localhost/. The output is quite easy to understand; note that the percentile times at the end are measured across the 100 concurrent requests.
1.2.50 autobench
http://www.xenoclast.org/autobench/
autobench is a wrapper around httperf and also provides a tool for distributed load testing.
Here is a short introduction to single-machine usage. The autobench man page gives a very clear explanation: http://www.xenoclast.org/autobench/man/autobench.html. As you can see, autobench can compare the performance of two sites.
The default configuration file is ~/.autobench.conf, which is convenient for frequent use. The usual way to run it is
Once you have the TSV file you can use bench2graph to convert it to PNG format. bench2graph needs a few modifications.
Run bench2graph bench.tsv bench.png; it prompts for a title and then produces the comparison graph.
todo(dirlt): later I may need to learn how to use autobench for distributed testing, because of httperf's damned select model.
1.3 kernel
1.3.1 vmlinuz
vmlinuz is the bootable, compressed kernel. "vm" stands for "Virtual Memory": Linux supports virtual memory, unlike old operating systems such as DOS with its 640 KB memory limit, and can use disk space as virtual memory, hence the name. vmlinuz is the executable Linux kernel, located at /boot/vmlinuz, and is usually a symlink. vmlinux is the uncompressed kernel; vmlinuz is the compressed form of vmlinux.
vmlinuz can be built in two ways. One is to build the kernel with "make zImage" and then produce it via "cp /usr/src/linux-2.4/arch/i386/linux/boot/zImage /boot/vmlinuz". zImage is for small kernels and exists for backward compatibility. The other is to build with "make bzImage" and then "cp /usr/src/linux-2.4/arch/i386/linux/boot/bzImage /boot/vmlinuz". bzImage is a compressed kernel image; note that it is not compressed with bzip2 — the "bz" is easy to misread and means "big zImage", i.e. the "b" stands for "big".
Both zImage and bzImage (installed as vmlinuz) are compressed with gzip. They are not just compressed files: a gzip decompression stub is embedded at the start of each, so you cannot unpack vmlinuz with gunzip or gzip -dc. The image contains a tiny gzip that decompresses the kernel and boots it. The difference is that the old zImage decompresses the kernel into low memory (the first 640 KB), while bzImage decompresses it into high memory (above 1 MB). If the kernel is small, either can be used and the resulting running system is the same; large kernels must use bzImage, not zImage.
1.3.2 tcp IO
http://www.ece.virginia.edu/cheetah/documents/papers/TCPlinux.pdf
note(dirlt): I have not read the later part about the factors affecting congestion.
packet reception
The overall flow is roughly as follows:
The relevant kernel parameters are
packet transmission
The overall flow is roughly as follows:
The relevant kernel parameters are:
note(dirlt): with wangyx's help this setting was found in the ifconfig output
txqueuelen = 1000
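A sketch of inspecting and enlarging it (eth0 and 5000 are examples):
$ ifconfig eth0 | grep -i txqueuelen      # current value
$ ifconfig eth0 txqueuelen 5000           # enlarge the device transmit queue
$ ip link set dev eth0 txqueuelen 5000    # iproute2 equivalent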
1.3.3 tcp congestion control
1.3.4 kernel panic
todo(dirlt):
1.4 application
1.4.1 Return value problem
First look at the following Java program
This Java program is then called from Python, which checks the printed value.
The return value is not 1 but 256; the explanation for this is as follows
And for the following Python program, checking with echo $? gives a return value of 0 rather than 256. A sketch of the underlying wait-status encoding is shown below.
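A minimal sketch of the wait-status encoding described above (a shell exit code versus the raw status returned by os.system):
$ false; echo $?                                              # the shell already extracts the exit code
1
$ python -c 'import os; print(os.system("false"))'           # raw wait status: exit code 1 in the high byte
256
$ python -c 'import os; print(os.WEXITSTATUS(os.system("false")))'
1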
1.4.2 dp8 NIC problem
At the time, dp8's network traffic dropped from a very large value to a very small one. Checking /proc/net/netstat, the following counters on dp8 differed from other machines by one or two orders of magnitude:
Afterwards the following clue was found in dmesg:
[4804552.008858] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex — the NIC speed on dp8 had been negotiated as only 100 Mbps. The possible causes are as follows:
Our network cables are all supplied by 世xx联, so the quality should be decent; there are two situations that should be ruled out first.
1.4.3 Modifying resource limits
A temporary change can be made with ulimit; a permanent change can be made by editing /etc/security/limits.conf. A sketch is shown below.
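A sketch of both methods (the user name and limits are examples):
$ ulimit -n            # current open-file limit of this shell
$ ulimit -n 65535      # temporary, affects this shell and its children
# permanent: add lines like these to /etc/security/limits.conf
dirlt   soft   nofile   65535
dirlt   hard   nofile   65535
*       soft   core     unlimited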
1.4.4 CPU overheating
I ran into this problem on an Ubuntu PC; the obvious symptom was that everything became noticeably slower, and then the following messages showed up in syslog:
1.4.5 sync hangup
1.4.6 Replacing glibc
@2013-05-23 https://docs.google.com/a/umeng.com/document/d/12dzJ3OhVlrEax3yIdz0k08F8tM8DDQva1wdrD3K49PI/edit We suspected a problem with the glibc version and tried the change on dp45, but ran into trouble.
My planned sequence of operations was:
But after step 2, cp no longer worked, and commands such as ls could not be used either. The reason is simple: after step 2 there was no file behind libc.so.6 any more, and basic commands like cp and ls depend on that shared library.
todo(dirlt): what is the correct way to do it(change glibc)
@2013-08-03
A copy of the C library was found in an unexpected directory | Blog : http://blog.i-al.net/2013/03/a-copy-of-the-c-library-was-found-in-an-unexpected-directory/
The link above describes how to upgrade glibc.
1.4.7 Allowing sudo without a tty
Edit /etc/sudoers and comment out the relevant line (typically Defaults requiretty).
1.4.8 ssh proxy
http://serverfault.com/questions/37629/how-do-i-do-multihop-scp-transfers
Date: 2014-06-17T10:30+0800
Org version 7.9.3f with Emacs version 24