Container Monitoring: cAdvisor

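To inspect the running state of Docker containers, you can use the docker stats command, which reports statistics for the containers running on the current host: CPU utilization, memory usage, total network I/O, and total disk I/O.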
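
A one-shot snapshot is often easier to script than the live view. The sketch below assumes a Docker host and relies on the --no-stream and --format options of docker stats; sort_by_cpu is a hypothetical helper name, not part of Docker.

```shell
# Sort "name cpu% mem%" lines by CPU percentage, highest first.
# Expects on stdin the output of:
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}'
sort_by_cpu() {
    # Drop the % signs so the second column can be compared numerically.
    tr -d '%' | LC_ALL=C sort -k2,2nr
}

# Usage on a Docker host (left commented out so the snippet has no side effects):
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}' | sort_by_cpu
```

The helper reads from standard input, so it can also be tried on saved output without a running daemon.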

Besides the command line, detailed per-container monitoring statistics are also available through the HTTP API that Docker provides.
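
As a rough illustration of the HTTP API route: the Engine API serves per-container statistics at /containers/<id>/stats, typically over the Unix socket /var/run/docker.sock. The socket path and the container name my-container below are assumptions. cpu_percent is a hypothetical helper that applies the CPU percentage formula described in the Docker API documentation to fields taken from the JSON response.

```shell
# Fetch one stats sample as JSON (commented out; needs a running Docker daemon):
#   curl -s --unix-socket /var/run/docker.sock \
#        "http://localhost/containers/my-container/stats?stream=false"

# cpu_percent CPU_TOTAL PRECPU_TOTAL SYSTEM PRESYSTEM ONLINE_CPUS
#   CPU_TOTAL / PRECPU_TOTAL : cpu_stats / precpu_stats .cpu_usage.total_usage
#   SYSTEM / PRESYSTEM       : cpu_stats / precpu_stats .system_cpu_usage
#   ONLINE_CPUS              : cpu_stats.online_cpus
cpu_percent() {
    awk -v c="$1" -v pc="$2" -v s="$3" -v ps="$4" -v n="$5" 'BEGIN {
        cpu_delta = c - pc
        sys_delta = s - ps
        # Avoid dividing by zero when the two samples are identical.
        if (sys_delta <= 0) { print "0.00"; exit }
        printf "%.2f\n", cpu_delta / sys_delta * n * 100
    }'
}
```

For example, cpu_percent 200 100 2000 1000 4 prints 40.00: the container used 100 of the 1000 system ticks that elapsed, scaled by 4 online CPUs.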

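cAdvisor is an open-source tool from Google for visualizing and analyzing the running state of containers. Running cAdvisor on a host makes it easy to collect runtime statistics for the containers on that host and to view them as charts.
Running cAdvisor locally is also simple; just run the following command: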

docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

However, port 8080 on the host was already in use, so I changed the command above to the following:

docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=9095:9095 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

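After startup, though, two ports show up: 8080 and 9095. This is consistent with cAdvisor still listening on its default port 8080 inside the container, in which case publishing 9095:9095 maps the host port to a container port that nothing is listening on.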

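Logging in to the Docker container with the steps shown below and listing the command's options reveals a -port parameter, which the official documentation also describes explicitly.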

In practice, however, I was not able to use this parameter.

I therefore gave up on the Docker deployment and switched to running the binary directly.
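
For readers who want to stay with Docker, one thing worth trying is to keep cAdvisor on its default container port and remap only the host side. The sketch below is an untested assumption, not what this post ended up using: it publishes host port 9095 (or whatever port is passed in) to container port 8080. start_cadvisor and the DOCKER override variable are hypothetical names added for illustration.

```shell
# Start cAdvisor with HOST_PORT mapped to the container's default port 8080.
# Set DOCKER=echo to print the docker arguments instead of running them.
start_cadvisor() {
    host_port="${1:-9095}"
    ${DOCKER:-docker} run \
      --volume=/:/rootfs:ro \
      --volume=/var/run:/var/run:rw \
      --volume=/sys:/sys:ro \
      --volume=/var/lib/docker/:/var/lib/docker:ro \
      --publish="${host_port}:8080" \
      --detach=true \
      --name=cadvisor \
      google/cadvisor:latest
}

# Example (commented out): start_cadvisor 9095
```

If the image's entrypoint is the cadvisor binary itself, appending -port=9095 after the image name together with --publish=9095:9095 may be another route; that is likewise unverified here.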

Viewing the command options inside the container

# docker exec -it cadvisor /bin/sh
/ # cd /usr/bin/
/usr/bin # ./cadvisor --help
Usage of ./cadvisor:
  -allow_dynamic_housekeeping
        Whether to allow the housekeeping interval to be dynamic (default true)
  -alsologtostderr
        log to standard error as well as files
  -application_metrics_count_limit int
        Max number of application metrics to store (per container) (default 100)
  -boot_id_file string
        Comma-separated list of files to check for boot-id. Use the first one that exists. (default "/proc/sys/kernel/random/boot_id")
  -bq_account string
        Service account email
  -bq_credentials_file string
        Credential Key file (pem)
  -bq_id string
        Client ID
  -bq_project_id string
        Bigquery project ID
  -bq_secret string
        Client Secret (default "notasecret")
  -collector_cert string
        Collector's certificate, exposed to endpoints for certificate based authentication.
  -collector_key string
        Key for the collector's certificate
  -container_hints string
        location of the container hints file (default "/etc/cadvisor/container_hints.json")
  -containerd string
        containerd endpoint (default "unix:///var/run/containerd.sock")
  -disable_metrics metrics
        comma-separated list of metrics to be disabled. Options are 'disk', 'network', 'tcp', 'udp', 'percpu', 'sched', 'process'. Note: tcp and udp are disabled by default due to high CPU usage. (default process,tcp,udp,sched)
  -docker string
        docker endpoint (default "unix:///var/run/docker.sock")
  -docker-tls
        use TLS to connect to docker
  -docker-tls-ca string
        path to trusted CA (default "ca.pem")
  -docker-tls-cert string
        path to client certificate (default "cert.pem")
  -docker-tls-key string
        path to private key (default "key.pem")
  -docker_env_metadata_whitelist string
        a comma-separated list of environment variable keys that needs to be collected for docker containers
  -docker_only
        Only report docker containers in addition to root stats
  -docker_root string
        DEPRECATED: docker root is read from docker info (this is a fallback, default: /var/lib/docker) (default "/var/lib/docker")
  -enable_load_reader
        Whether to enable cpu load reader
  -event_storage_age_limit string
        Max length of time for which to store events (per type). Value is a comma separated list of key values, where the keys are event types (e.g.: creation, oom) or "default" and the value is a duration. Default is applied to all non-specified event types (default "default=24h")
  -event_storage_event_limit string
        Max number of events to store (per type). Value is a comma separated list of key values, where the keys are event types (e.g.: creation, oom) or "default" and the value is an integer. Default is applied to all non-specified event types (default "default=100000")
  -global_housekeeping_interval duration
        Interval between global housekeepings (default 1m0s)
  -housekeeping_interval duration
        Interval between container housekeepings (default 1s)
  -http_auth_file string
        HTTP auth file for the web UI
  -http_auth_realm string
        HTTP auth realm for the web UI (default "localhost")
  -http_digest_file string
        HTTP digest file for the web UI
  -http_digest_realm string
        HTTP digest file for the web UI (default "localhost")
  -listen_ip string
        IP to listen on, defaults to all IPs
  -log_backtrace_at value
        when logging hits line file:N, emit a stack trace
  -log_cadvisor_usage
        Whether to log the usage of the cAdvisor container
  -log_dir string
        If non-empty, write log files in this directory
  -log_file string
        If non-empty, use this log file
  -logtostderr
        log to standard error instead of files
  -machine_id_file string
        Comma-separated list of files to check for machine-id. Use the first one that exists. (default "/etc/machine-id,/var/lib/dbus/machine-id")
  -max_housekeeping_interval duration
        Largest interval to allow between container housekeepings (default 1m0s)
  -max_procs int
        max number of CPUs that can be used simultaneously. Less than 1 for default (number of cores).
  -mesos_agent string
        Mesos agent address (default "127.0.0.1:5051")
  -mesos_agent_timeout duration
        Mesos agent timeout (default 10s)
  -port int
        port to listen (default 8080)
  -profiling
        Enable profiling via web interface host:port/debug/pprof/
  -prometheus_endpoint string
        Endpoint to expose Prometheus metrics on (default "/metrics")
  -skip_headers
        If true, avoid header prefixes in the log messages
  -stderrthreshold value
        logs at or above this threshold go to stderr (default 2)
  -storage_driver driver
        Storage driver to use. Data is always cached shortly in memory, this controls where data is pushed besides the local cache. Empty means none. Options are: <empty>, bigquery, elasticsearch, influxdb, kafka, redis, statsd, stdout
  -storage_driver_buffer_duration duration
        Writes in the storage driver will be buffered for this duration, and committed to the non memory backends as a single transaction (default 1m0s)
  -storage_driver_db string
        database name (default "cadvisor")
  -storage_driver_es_enable_sniffer
        ElasticSearch uses a sniffing process to find all nodes of your cluster by default, automatically
  -storage_driver_es_host string
        ElasticSearch host:port (default "http://localhost:9200")
  -storage_driver_es_index string
        ElasticSearch index name (default "cadvisor")
  -storage_driver_es_type string
        ElasticSearch type name (default "stats")
  -storage_driver_host string
        database host:port (default "localhost:8086")
  -storage_driver_influxdb_retention_policy string
        retention policy
  -storage_driver_kafka_broker_list string
        kafka broker(s) csv (default "localhost:9092")
  -storage_driver_kafka_ssl_ca string
        optional certificate authority file for TLS client authentication
  -storage_driver_kafka_ssl_cert string
        optional certificate file for TLS client authentication
  -storage_driver_kafka_ssl_key string
        optional key file for TLS client authentication
  -storage_driver_kafka_ssl_verify
        verify ssl certificate chain (default true)
  -storage_driver_kafka_topic string
        kafka topic (default "stats")
  -storage_driver_password string
        database password (default "root")
  -storage_driver_secure
        use secure connection with database
  -storage_driver_table string
        table name (default "stats")
  -storage_driver_user string
        database username (default "root")
  -storage_duration duration
        How long to keep data stored (Default: 2min). (default 2m0s)
  -store_container_labels
        convert container labels and environment variables into labels on prometheus metrics for each container. If flag set to false, then only metrics exported are container name, first alias, and image name (default true)
  -v value
        log level for V logs
  -version
        print cAdvisor version and exit
  -vmodule value
        comma-separated list of pattern=N settings for file-filtered logging

Deploying from the binary

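mkdir -p /home/cadvisor-0.37.0 && cd /home/cadvisor-0.37.0
wget https://github.com/google/cadvisor/releases/download/v0.37.0/cadvisor
chmod +x cadvisor   # a file fetched with wget is not executable by default
# Plain local run: ./cadvisor -port=8080 &>>/var/log/cadvisor.log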

Managing cAdvisor as a systemd service

# chown -R prometheus:prometheus /home/cadvisor-0.37.0
# chmod -R 777 /home/cadvisor-0.37.0   # avoids this startup error, which I attributed to SELinux (755 is enough if the cause is only a missing execute bit): Failed at step EXEC spawning /home/cadvisor-0.37.0/cadvisor: Permission denied

# vim /usr/lib/systemd/system/cadvisor.service
[Unit]
Description=cadvisor
Documentation=https://github.com/google/cadvisor/tree/master/docs
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/home/cadvisor-0.37.0/cadvisor -port 9096
Restart=on-failure

[Install]
WantedBy=multi-user.target
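
After saving the unit file, systemd has to reload its configuration before the service can be enabled and started. A minimal sketch, assuming the unit name cadvisor.service from the file above; enable_cadvisor_service and the SYSTEMCTL override are hypothetical names used only to make the steps easy to preview.

```shell
# Reload systemd, then enable the unit at boot and start it now.
# Nothing runs until the function is called (as root).
# Set SYSTEMCTL=echo to print the steps instead of executing them.
enable_cadvisor_service() {
    unit="${1:-cadvisor.service}"
    ${SYSTEMCTL:-systemctl} daemon-reload &&
    ${SYSTEMCTL:-systemctl} enable "$unit" &&
    ${SYSTEMCTL:-systemctl} start "$unit"
}

# Example (commented out): enable_cadvisor_service
```

Afterwards, systemctl status cadvisor should show whether the process started successfully.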

Open http://localhost:9096 to view the running state of the containers on the current host.

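The table below lists some typical monitoring metrics exposed by cAdvisor:

Metric name                             Type     Meaning
container_cpu_load_average_10s          gauge    Average CPU load of the container over the last 10 seconds
container_cpu_usage_seconds_total       counter  Cumulative CPU time consumed by the container on each CPU core (seconds)
container_cpu_system_seconds_total      counter  Cumulative system CPU time consumed (seconds)
container_cpu_user_seconds_total        counter  Cumulative user CPU time consumed (seconds)
container_fs_usage_bytes                gauge    Filesystem space used by the container (bytes)
container_fs_limit_bytes                gauge    Total filesystem space available to the container (bytes)
container_fs_reads_bytes_total          counter  Cumulative amount of data read by the container (bytes)
container_fs_writes_bytes_total         counter  Cumulative amount of data written by the container (bytes)
container_memory_max_usage_bytes        gauge    Maximum memory usage recorded for the container (bytes)
container_memory_usage_bytes            gauge    Current memory usage of the container (bytes)
container_spec_memory_limit_bytes       gauge    Memory limit configured for the container (bytes)
machine_memory_bytes                    gauge    Total memory of the current host (bytes)
container_network_receive_bytes_total   counter  Cumulative amount of data received over the network by the container (bytes)
container_network_transmit_bytes_total  counter  Cumulative amount of data transmitted over the network by the container (bytes)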
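
The counter and gauge types matter when reading these values: a gauge can be used directly, while a counter only becomes meaningful as a rate of change between two samples. The sketch below assumes cAdvisor is reachable at http://localhost:9096/metrics, matching the port used in the service file; filter_metric and counter_rate are hypothetical helper names.

```shell
# Print every sample line of metric NAME from Prometheus text format on stdin.
filter_metric() {
    awk -v name="$1" '
        /^#/ { next }                      # skip HELP and TYPE comment lines
        {
            # The metric name ends at the first "{" (labels) if there is one.
            i = index($1, "{")
            n = (i > 0) ? substr($1, 1, i - 1) : $1
            if (n == name) print
        }'
}

# counter_rate OLD NEW SECONDS: average per-second increase between two samples.
counter_rate() {
    awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN {
        if (dt <= 0) { print "0.00"; exit }
        printf "%.2f\n", (b - a) / dt
    }'
}

# Usage (commented out; needs a running cAdvisor):
#   curl -s http://localhost:9096/metrics | filter_metric container_memory_usage_bytes
```

For instance, if container_network_receive_bytes_total reads 100 and then 160 thirty seconds later, counter_rate 100 160 30 prints 2.00 bytes per second.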

Integrating with Prometheus

Edit /etc/prometheus/prometheus.yml and add cAdvisor to the scrape targets under scrape_configs:

- job_name: cadvisor
  static_configs:
  - targets:
    - localhost:9096

Restart the Prometheus service, then check that the cadvisor target is being scraped.
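
One way to verify the integration from the command line is to ask Prometheus for the up metric of the new job. This is a sketch under assumptions: Prometheus is taken to listen on its default address localhost:9090, and target_is_up is a hypothetical helper that only pattern-matches the compact JSON returned by the /api/v1/query endpoint rather than parsing it properly.

```shell
# Succeed if the instant-query response on stdin contains a sample with value "1".
target_is_up() {
    grep -q '"value":\[[0-9.]*,"1"]'
}

# Usage (commented out; needs a running Prometheus):
#   curl -s -G http://localhost:9090/api/v1/query \
#        --data-urlencode 'query=up{job="cadvisor"}' | target_is_up \
#     && echo "cadvisor target is up"
```

A value of 0, or an empty result, would suggest that Prometheus cannot reach localhost:9096 or that the job was not loaded.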
