Excerpts from the official Monit documentation
Here are the legal global keywords:
Keyword         Function
----------------------------------------------------------------
set daemon      Set a background poll interval in seconds.
set init        Set Monit to run from init. Monit will not
                transform itself into a daemon process.
set logfile     Name of a file to dump error- and status-
                messages to. If syslog is specified as the
                file, Monit will utilize the syslog daemon to
                log messages. This can optionally be followed
                by 'facility <facility>' where facility is
                'log_local0' - 'log_local7' or 'log_daemon'.
                If no facility is specified, LOG_USER is used.
set mailserver  The mailserver used for sending alert
                notifications. If the mailserver is not
                defined, Monit will try to use 'localhost' as
                the smtp-server for sending mail. You can add
                more mail servers; if Monit cannot connect to
                the first server it will try the next server
                and so on.
set mail-format Set a global mail format for all alert
                messages emitted by monit.
set idfile      Explicitly set the location of the Monit id
                file. E.g. set idfile /var/monit/id.
set pidfile     Explicitly set the location of the Monit lock
                file. E.g. set pidfile /var/run/xyzmonit.pid.
set statefile   Explicitly set the location of the file Monit
                will write state data to. If not set, the
                default is $HOME/.monit.state.
set httpd port  Activates Monit http server at the given port
                number.
ssl enable      Enables ssl support for the httpd server.
                Requires the use of the pemfile statement.
ssl disable     Disables ssl support for the httpd server. It
                is equal to omitting any ssl statement.
pemfile         Set the pemfile to be used with ssl.
clientpemfile   Set the pemfile to be used when client
                certificates should be checked by monit.
address         If specified, the http server will only accept
                connect requests to this address. This
                statement is an optional part of the set httpd
                statement.
allow           Specifies a host or IP address allowed to
                connect to the http server. Can also specify a
                username and password allowed to connect to
                the server. More than one allow statement is
                allowed. This statement is also an optional
                part of the set httpd statement.
read-only       Set the user defined in username:password to
                read only. A read-only user cannot change a
                service from the Monit web interface.
include         Include a file or files matching the
                globstring.

Here are the legal service entry keywords:

Keyword         Function
----------------------------------------------------------------
check           Starts an entry and must be followed by the
                type of monitored service
                {filesystem|directory|file|host|process|
                system|program} and a descriptive name for the
                service.
pidfile         Specify the process pidfile. Every process
                must create a pidfile with its current process
                id. This statement should only be used in a
                process service entry.
path            Must be followed by a path to the block
                special file for filesystem, regular file,
                directory or a process's pidfile.
group           Specify a groupname for a service entry.
start           The program used to start the specified
                service. Full path is required. This statement
                is optional, but recommended.
stop            The program used to stop the specified
                service. Full path is required. This statement
                is optional, but recommended.
pid and ppid    These keywords may be used as standalone
                statements in a process service entry to
                override the alert action for change of
                process pid and ppid.
uid and gid     These keywords are either 1) an optional part
                of a start, stop or exec statement. They may
                be used to specify a user id and a group id
                the program (process) should switch to upon
                start. This feature can only be used if the
                superuser is running monit. 2) uid and gid may
                also be used as standalone statements in a
                file service entry to test a file's uid and
                gid attributes.
host            The hostname or IP address to test the port
                at. This keyword can only be used together
                with a port statement or in the check host
                statement.
port            Specify a TCP/IP service port number which a
                process is listening on. This statement is
                also optional. If this statement is not
                prefixed with a host-statement, localhost is
                used as the hostname to test the port at.
type            Specifies the socket type Monit should use
                when testing a connection to a port. If the
                type keyword is omitted, tcp is used. This
                keyword must be followed by either tcp, udp or
                tcpssl.
tcp             Specifies that Monit should use a TCP socket
                type (stream) when testing a port.
tcpssl          Specifies that Monit should use a TCP socket
                type (stream) and the secure socket layer
                (ssl) when testing a port connection.
udp             Specifies that Monit should use a UDP socket
                type (datagram) when testing a port.
certmd5         The md5 sum of a certificate a ssl forged
                server has to deliver.
proto(col)      This keyword specifies the type of service
                found at the port. See CONNECTION TESTING for
                a list of supported protocols. You're welcome
                to write new protocol test modules. If no
                protocol is specified Monit will use a default
                test which in most cases is good enough.
request         Specifies a server request and must come after
                the protocol keyword mentioned above.
                - for http it can contain a URL and an
                  optional query string.
                - other protocols do not support this
                  statement yet.
send/expect     These keywords specify a generic protocol.
                Both require a string, either to be sent or to
                be matched against (as extended regex if
                supported). Send/expect cannot be used
                together with the proto(col) statement.
unix(socket)    Specifies a Unix socket file and is used like
                the port statement above to test a Unix domain
                network socket connection.
URL             Specify a URL string which Monit will use for
                connection testing.
content         Optional sub-statement for the URL statement.
                Specifies that Monit should test the content
                returned by the server against a regular
                expression.
timeout x sec.  Define a network port connection timeout. Must
                be followed by a number in seconds and the
                keyword, seconds.
timeout         Define a service timeout. Must be followed by
                two digits. The first digit is the max number
                of restarts for the service. The second digit
                is the cycle interval to test restarts. This
                statement is optional.
alert           Specifies an email address for notification if
                a service event occurs. Alert can also be
                postfixed, to only send a message for certain
                events. See the configuration examples. More
                than one alert statement is allowed in an
                entry. This statement is also optional.
noalert         Specifies an email address that should not
                receive alerts. This statement is also
                optional.
restart, stop,  These keywords may be used as actions for
unmonitor,      various test statements. The exec statement is
start and exec  special in that it requires a following string
                specifying the program to be executed. You may
                also specify a UID and GID for the exec
                statement. The program executed will then run
                using the specified user id and group id.
mail-format     Specifies a mail format for an alert message.
                This statement is an optional part of the
                alert statement.
checksum        Specify that Monit should compute and monitor
                a file's md5/sha1 checksum. May only be used
                in a check file entry.
expect          Specifies a md5/sha1 checksum string Monit
                should expect when testing the checksum. This
                statement is an optional part of the checksum
                statement.
timestamp       Specifies an expected timestamp for a file or
                directory. More than one timestamp statement
                is allowed. May only be used in a check file
                or check directory entry.
changed         Part of a timestamp statement and used as an
                operator to simply test for a timestamp
                change.
every           Validate this entry only at every n poll cycle
                or per cron specification. Useful in daemon
                mode when the cycle is short and a service
                takes some time to start, or to suppress
                monitoring during backup windows.
mode            Must be followed by one of the keywords
                active, passive or manual. If active, Monit
                will restart the service if it is not running
                (this is the default behavior). If passive,
                Monit will not (re)start the service if it is
                not running; it will only monitor and send
                alerts (resource related restart and stop
                options are ignored in this mode also). If
                manual, Monit will enter active mode only if a
                service was started under monit's control,
                otherwise the service isn't monitored.
cpu             Must be followed by a compare operator, a
                number with "%" and an action. This statement
                is used to check the cpu usage in percent of a
                process with its children over a number of
                cycles. If the compare expression matches then
                the specified action is executed.
mem             The equivalent to the cpu token for memory of
                a process (w/o children!). This token must be
                followed by a compare operator, a number with
                unit {B|KB|MB|GB|%|byte|kilobyte|megabyte|
                gigabyte|percent} and an action.
swap            Token for system swap usage monitoring. This
                token must be followed by a compare operator,
                a number with unit {B|KB|MB|GB|%|byte|
                kilobyte|megabyte|gigabyte|percent} and an
                action.
loadavg         Must be followed by [1min,5min,15min] in (), a
                compare operator, a number and an action. This
                statement is used to check the system load
                average over a number of cycles. If the
                compare expression matches then the specified
                action is executed.
children        This is the number of child processes spawned
                by a process. The syntax is the same as above.
totalmem        The equivalent of mem, except totalmem is an
                aggregation of memory, not only used by a
                process but also by all its child processes.
                The syntax is the same as above.
space           Must be followed by a compare operator, a
                number, unit {B|KB|MB|GB|%|byte|kilobyte|
                megabyte|gigabyte|percent} and an action.
inode(s)        Must be followed by a compare operator, an
                integer number, optionally a percent sign (if
                not, the limit is absolute) and an action.
perm(ission)    Must be followed by an octal number describing
                the permissions.
size            Must be followed by a compare operator, a
                number, unit {B|KB|MB|GB|byte|kilobyte|
                megabyte|gigabyte} and an action.
uptime          Must be followed by a compare operator, a
                number, unit {second(s)|minute(s)|hour(s)|
                day(s)} and an action.
depends (on)    Must be followed by the name of a service this
                service depends on.

Note: every process has a pid. It is stored in a pidfile such as /var/run/monit.pid, which contains a single numeric value.
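As a rough illustration of how the global keywords listed above can be combined, here is a minimal sketch of a global section. The mail host, file locations, port 2812 and the admin:monit credentials are placeholders chosen for this example, not values taken from the excerpt; adjust them to the installation at hand.

```
# Hypothetical global section; every value below is a placeholder
set daemon 120                            # poll services every 120 seconds
set logfile syslog facility log_daemon    # log through syslog, daemon facility
set mailserver mail.example.com, localhost  # try the second server if the first fails
set idfile /var/monit/id
set statefile /var/monit/state

set httpd port 2812
    use address localhost         # only accept connections made to localhost
    allow localhost               # hosts permitted to connect
    allow admin:monit read-only   # this user may view but not change services

include /etc/monit.d/*            # pull in per-service entries by glob
```

Keeping the service entries in included files leaves the global section short and lets each service be managed in its own file.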
CONFIGURATION EXAMPLES
The simplest form is just the check statement. In this example we check to see if the server is running and log a message if not:
check process resin with pidfile /usr/local/resin/srun.pid
Checking a process without a pidfile:
check process pager matching "/sbin/dynamic_pager -F /private/var/vm/swapfile"
To have Monit start the server if it's not running, add a start statement:

check process resin with pidfile /usr/local/resin/srun.pid
    start program = "/usr/local/resin/bin/srun.sh start"
    stop program  = "/usr/local/resin/bin/srun.sh stop"

Here's a more advanced example for monitoring an apache web-server listening on the default port numbers for HTTP and HTTPS. In this example Monit will restart apache if it's not accepting connections at the port numbers. The method Monit uses for a process restart is to first execute the stop-program, wait up to 30s for the process to stop and then execute the start-program and wait up to 30s for it to start. The length of the start or stop timeout can be overridden using the 'timeout' option. If Monit was unable to stop or start the service, a failed alert message will be sent if you have requested alert messages to be sent.

check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start" with timeout 60 seconds
    stop program  = "/etc/init.d/httpd stop"
    if failed port 80 then restart
    if failed port 443 with timeout 15 seconds then restart

This example demonstrates how you can run a program as a specified user (uid) and with a specified group (gid). Many daemon programs will do the uid and gid switch by themselves, but for those programs that do not (e.g. Java programs), monit's ability to start a program as a certain user can be very useful. In this example we start the Tomcat Java Servlet Engine as the standard nobody user and group. Please note that Monit will only switch uid and gid for a program if the super-user is running monit, otherwise Monit will simply ignore the request to change uid and gid.

check process tomcat with pidfile /var/run/tomcat.pid
    start program = "/etc/init.d/tomcat start"
          as uid nobody and gid nobody
    stop program  = "/etc/init.d/tomcat stop"
          # You can also use id numbers instead and write:
          as uid 99 and with gid 99
    if failed port 8080 then alert

In this example we use udp for connection testing to check if the name-server is running and also use timeout and alert:

check process named with pidfile /var/run/named.pid
    start program = "/etc/init.d/named start"
    stop program  = "/etc/init.d/named stop"
    if failed port 53 use type udp protocol dns then restart
    if 3 restarts within 5 cycles then timeout

The following example illustrates how to check if the service 'sophie' is answering connections on its Unix domain socket:

check process sophie with pidfile /var/run/sophie.pid
    start program = "/etc/init.d/sophie start"
    stop program  = "/etc/init.d/sophie stop"
    if failed unix /var/run/sophie then restart

In this example we check an apache web-server running on localhost that answers for several IP-based virtual hosts or vhosts, hence the host statement before port:

check process apache with pidfile /var/run/httpd.pid
    start "/etc/init.d/httpd start"
    stop  "/etc/init.d/httpd stop"
    if failed host www.sol.no port 80 then alert
    if failed host shop.sol.no port 443 then alert
    if failed host chat.sol.no port 80 then alert
    if failed host www.tildeslash.com port 80 then alert

To make sure that Monit is communicating with a http server a protocol test can be added:

check process apache with pidfile /var/run/httpd.pid
    start "/etc/init.d/httpd start"
    stop  "/etc/init.d/httpd stop"
    if failed host www.sol.no port 80 protocol HTTP then alert

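The URL and content keywords from the keyword table offer another way to confirm that an HTTP server is really answering. The following is a minimal sketch rather than an excerpt from the manual: the service name, the page path and the matched string "Welcome" are assumptions made for illustration.

```
# Hypothetical entry: fetch a page and match its body against a regex
check host sol-web with address www.sol.no
    if failed url http://www.sol.no/login.cgi
       and content == "Welcome"
       with timeout 15 seconds
       then alert
```

Compared with a bare protocol test, a content match also catches the case where the server responds but serves an error page or the wrong document.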
This example shows a different way to check a webserver using the send/expect mechanism:

check process apache with pidfile /var/run/httpd.pid
    start "/etc/init.d/httpd start"
    stop  "/etc/init.d/httpd stop"
    if failed host www.sol.no port 80
       send "GET / HTTP/1.0\r\nHost: www.sol.no\r\n\r\n"
       expect "HTTP/[0-9\.]{3} 200 .*\r\n"
       then alert

To make sure that Apache is logging successfully (i.e. no more than 60 percent of child servers are logging), use its mod_status page at www.sol.no/server-status with this special protocol test:

check process apache with pidfile /var/run/httpd.pid
    start "/etc/init.d/httpd start"
    stop  "/etc/init.d/httpd stop"
    if failed host www.sol.no port 80
       protocol apache-status loglimit > 60% then restart

This configuration can be used to alert you if 25 percent or more of Apache child processes are stuck performing DNS lookups:

check process apache with pidfile /var/run/httpd.pid
    start "/etc/init.d/httpd start"
    stop  "/etc/init.d/httpd stop"
    if failed host www.sol.no port 80
       protocol apache-status dnslimit > 25% then alert

Here we use an icmp ping test to check if a remote host is up and if not send an alert:

check host www.tildeslash.com with address www.tildeslash.com
    if failed icmp type echo count 5 with timeout 15 seconds
       then alert

In the following example we ask Monit to compute and verify the checksum for the underlying apache binary used by the start and stop programs. If the checksum test fails, monitoring will be disabled to prevent possibly starting a compromised binary:

check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start"
    stop program  = "/etc/init.d/httpd stop"
    if failed host www.tildeslash.com port 80 then restart
    depends on apache_bin

check file apache_bin with path /usr/local/apache/bin/httpd
    if failed checksum then unmonitor

In this example we ask Monit to test the checksum for a document on a remote server. If the checksum was changed we send an alert:

check host tildeslash with address www.tildeslash.com
    if failed port 80 protocol http
       and request "/monit/dist/monit-4.0.tar.gz"
       with checksum f9d26b8393736b5dfad837bb13780786
       then alert

Here are a couple of tests for some popular communication servers, using the SIP protocol. First we test a FreeSWITCH server and then an Asterisk server:

check process freeswitch
    with pidfile /usr/local/freeswitch/log/freeswitch.pid
    start program = "/usr/local/freeswitch/bin/freeswitch -nc -hp"
    stop program  = "/usr/local/freeswitch/bin/freeswitch -stop"
    if totalmem > 1000.0 MB for 5 cycles then alert
    if totalmem > 1500.0 MB for 5 cycles then alert
    if totalmem > 2000.0 MB for 5 cycles then restart
    if cpu > 60% for 5 cycles then alert
    if failed port 5060 type udp protocol SIP
       target me@foo.bar and maxforward 10
       then restart
    if 5 restarts within 5 cycles then timeout

check process asterisk
    with pidfile /var/run/asterisk/asterisk.pid
    start program = "/usr/sbin/asterisk"
    stop program  = "/usr/sbin/asterisk -r -x 'shutdown now'"
    if totalmem > 1000.0 MB for 5 cycles then alert
    if totalmem > 1500.0 MB for 5 cycles then alert
    if totalmem > 2000.0 MB for 5 cycles then restart
    if cpu > 60% for 5 cycles then alert
    if failed port 5060 type udp protocol SIP
       and target me@foo.bar maxforward 10
       then restart
    if 5 restarts within 5 cycles then timeout

Some servers are slow starters, for example Java based Application Servers. So if we want to keep the poll-cycle low (i.e. < 60 seconds) but allow some services to take their time to start, the every statement is handy:

check process dynamo with pidfile /etc/dynamo.pid
    every 2 cycles
    start program = "/etc/init.d/dynamo start"
    stop program  = "/etc/init.d/dynamo stop"
    if failed port 8840 then alert

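The keyword table notes that every also accepts a cron specification, which is useful for limiting or suppressing checks during a time window. The sketch below assumes a Monit version that supports the cron-style form; the service names, the time windows and the nightly-batch pidfile path are hypothetical.

```
# Hypothetical: only check during office hours, Monday to Friday
check process dynamo with pidfile /etc/dynamo.pid
    every "* 8-19 * * 1-5"
    if failed port 8840 then alert

# Hypothetical: skip checks on Sundays between 00:00 and 03:59,
# e.g. while a backup job is running
check process nightly-batch with pidfile /var/run/nightly-batch.pid
    not every "* 0-3 * * 0"
```

The five fields are minute, hour, day of month, month and day of week, as in crontab.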
Here is an example where we group together two database entries so you can manage them together, e.g. 'monit -g database start all'. The mode statement is also illustrated in the first entry and has the effect that Monit will not try to (re)start this service if it is not running:

check process sybase with pidfile /var/run/sybase.pid
    start = "/etc/init.d/sybase start"
    stop  = "/etc/init.d/sybase stop"
    mode passive
    group database

check process oracle with pidfile /var/run/oracle.pid
    start program = "/etc/init.d/oracle start"
    stop program  = "/etc/init.d/oracle stop"
    mode active # Not necessary really, since it's the default
    if failed port 9001 then restart
    group database

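The third mode described in the keyword table, manual, can be sketched in the same way. This is an illustrative variant rather than an example from the manual; the service name and paths are hypothetical.

```
# Hypothetical: monitored only when started under Monit's control
check process reportd with pidfile /var/run/reportd.pid
    start program = "/etc/init.d/reportd start"
    stop program  = "/etc/init.d/reportd stop"
    mode manual
    group database
```

With this entry the service is ignored until it is started through Monit (for instance with 'monit start reportd'), after which it is treated as in active mode.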
Here is an example to show the usage of the resource checks. It will send an alert when the CPU usage of the http daemon rises beyond 40%, or the CPU usage of the daemon and its child processes rises beyond 60%, for two cycles. Apache is restarted if the total CPU usage is over 80% for five cycles, and stopped if the memory usage is over 100 MB for five cycles or if the machine's load average is more than 10 for 8 cycles:

check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start"
    stop program  = "/etc/init.d/httpd stop"
    if cpu > 40% for 2 cycles then alert
    if totalcpu > 60% for 2 cycles then alert
    if totalcpu > 80% for 5 cycles then restart
    if mem > 100 MB for 5 cycles then stop
    if loadavg(5min) greater than 10.0 for 8 cycles then stop

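Resource tests can also be applied to the machine as a whole through a check system entry, which is where the swap and loadavg keywords from the table fit most naturally. A minimal sketch, assuming the host name and all thresholds shown here (none of them come from the excerpt):

```
# Hypothetical system-wide resource checks
check system myhost.example.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if swap usage > 25% then alert
```

Because no single process is tied to this entry, alert is the usual action; start and stop programs are not required.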
This example demonstrates the timestamp statement with exec and how you may restart apache if its configuration file was changed:

check file httpd.conf with path /etc/httpd/httpd.conf
    if changed timestamp
       then exec "/etc/init.d/httpd graceful"

In this example we demonstrate usage of the extended alert statement and a file check dependency:

check process apache with pidfile /var/run/httpd.pid
    start = "/etc/init.d/httpd start"
    stop  = "/etc/init.d/httpd stop"
    alert admin@bar on {nonexist, timeout}
          with mail-format {
               from:     bofh@$HOST
               subject:  apache $EVENT - $ACTION
               message:  This event occurred on $HOST at $DATE.
               Your faithful employee,
               monit
          }
    if failed host www.tildeslash.com port 80 then restart
    if 3 restarts within 5 cycles then timeout
    depend httpd_bin
    group apache

check file httpd_bin with path /usr/local/apache/bin/httpd
    alert security@bar on {checksum, timestamp,
          permission, uid, gid}
          with mail-format {subject: Alaaarrm! on $HOST}
    if failed checksum
       and expect 8f7f419955cefa0b33a2ba316cba3659
       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    if changed timestamp then alert
    group apache

In this example, we demonstrate usage of the depend statement. In this case, we want to start oracle and apache. However, we've set up apache to use oracle as a back end, and if oracle is restarted, apache must be restarted as well.

check process apache with pidfile /var/run/httpd.pid
    start = "/etc/init.d/httpd start"
    stop  = "/etc/init.d/httpd stop"
    depends on oracle

check process oracle with pidfile /var/run/oracle.pid
    start = "/etc/init.d/oracle start"
    stop  = "/etc/init.d/oracle stop"
    if failed port 9001 then restart

Next, we have 2 services, oracle-import and oracle-export, that need to be restarted if oracle is restarted, but are independent of each other.

check process oracle with pidfile /var/run/oracle.pid
    start = "/etc/init.d/oracle start"
    stop  = "/etc/init.d/oracle stop"
    if failed port 9001 then restart

check process oracle-import
    with pidfile /var/run/oracle-import.pid
    start = "/etc/init.d/oracle-import start"
    stop  = "/etc/init.d/oracle-import stop"
    depends on oracle

check process oracle-export
    with pidfile /var/run/oracle-export.pid
    start = "/etc/init.d/oracle-export start"
    stop  = "/etc/init.d/oracle-export stop"
    depends on oracle

Finally an example with all statements:

check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start"
    stop program  = "/etc/init.d/httpd stop"
    if 3 restarts within 5 cycles then timeout
    if failed host www.sol.no port 80 protocol http
       and use the request "/login.cgi"
       then alert
    if failed host shop.sol.no port 443 type tcpssl
       protocol http and with timeout 15 seconds
       then restart
    if cpu is greater than 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 100 MB then stop
    if children > 200 then alert
    alert bofh@bar with mail-format {from: monit@foo.bar.no}
    every 2 cycles
    mode active
    depends on weblogic
    depends on httpd.pid
    depends on httpd.conf
    depends on httpd_bin
    depends on datafs
    group server

check file httpd.pid with path /usr/local/apache/logs/httpd.pid
    group server
    if timestamp > 7 days then restart
    every 2 cycles
    alert bofh@bar with mail-format {from: monit@foo.bar.no}
    depends on datafs

check file httpd.conf with path /etc/httpd/httpd.conf
    group server
    if timestamp was changed
       then exec "/usr/local/apache/bin/apachectl graceful"
    every 2 cycles
    alert bofh@bar with mail-format {from: monit@foo.bar.no}
    depends on datafs

check file httpd_bin with path /usr/local/apache/bin/httpd
    group server
    if failed checksum
       and expect the sum 8f7f419955cefa0b33a2ba316cba3659
       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
    if changed size then alert
    if changed timestamp then alert
    every 2 cycles
    alert bofh@bar with mail-format {from: monit@foo.bar.no}
    alert foo@bar on { checksum, size, timestamp, uid, gid }
    depends on datafs

check filesystem datafs with path /dev/sdb1
    group server
    start program = "/bin/mount /data"
    stop program  = "/bin/umount /data"
    if failed permission 660 then unmonitor
    if failed uid root then unmonitor
    if failed gid disk then unmonitor
    if space usage > 80 % then alert
    if space usage > 94 % then stop
    if inode usage > 80 % then alert
    if inode usage > 94 % then stop
    alert root@localhost

check host ftp.redhat.com with address ftp.redhat.com
    if failed icmp type echo with timeout 15 seconds
       then alert
    if failed port 21 protocol ftp
       then exec "/usr/X11R6/bin/xmessage -display :0 ftp connection failed"
    alert foo@bar.com

check host www.gnu.org with address www.gnu.org
    if failed port 80 protocol http
       and request "/pub/gnu/bash/bash-2.05b.tar.gz"
       with checksum 8f7f419955cefa0b33a2ba316cba3659
       then alert
    alert rms@gnu.org with mail-format {
          subject: The gnu server may be hacked again! }