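Installing node-exporter with ansible-galaxy and the prometheus-community/ansible Collection
Prerequisites
- Install ansible (pip3 install ansible is recommended)
Finding the prometheus collection documentation
Locate the prometheus-community open-source repository at https://github.com/prometheus-community/ansible, then follow the link in its README to the documentation page at https://prometheus-community.github.io/ansible/branch/main/
The ansible-collection maintained by the community already ships roles for many common components.
Open the introduction page of the node_exporter role. The key variables of this role are listed below: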
Parameter | Comments |
---|---|
node_exporter_basic_auth_users (dictionary) | Dictionary of users and password for basic authentication. Passwords are automatically hashed with bcrypt. |
node_exporter_binary_install_dir (string) | (Advanced) Directory to install node_exporter binary. Default: "/usr/local/bin" |
node_exporter_binary_url (string) | URL of the node exporter binaries .tar.gz file. Default: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}.tar.gz" |
node_exporter_checksums_url (string) | URL of the node exporter checksums file. Default: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/sha256sums.txt" |
node_exporter_config_dir (string) | Path to directory with node_exporter configuration. Default: "/etc/node_exporter" |
node_exporter_disabled_collectors (list / elements=string) | List of disabled collectors. By default node_exporter disables collectors listed here. |
node_exporter_enabled_collectors (list / elements=string) | List of dicts defining additionally enabled collectors and their configuration. It adds collectors to those enabled by default. Default: ["systemd", {"textfile": {"directory": "{{ node_exporter_textfile_dir }}"}}] |
node_exporter_http_server_config (dictionary) | Config for HTTP/2 support. Keys and values are the same as in node_exporter docs. |
node_exporter_local_cache_path (string) | Local path to stash the archive and its extraction. Default: "/tmp/node_exporter-{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}/{{ node_exporter_version }}" |
node_exporter_system_group (string) | (Advanced) System group for node exporter. Default: "node-exp" |
node_exporter_system_user (string) | (Advanced) Node exporter user. Default: "node-exp" |
node_exporter_textfile_dir (string) | Directory used by the Textfile Collector. To get permissions to write metrics in this directory, users must be in node-exp system group. Note: More information in TROUBLESHOOTING.md guide. Default: "/var/lib/node_exporter" |
node_exporter_tls_server_config (dictionary) | Configuration for TLS authentication. Keys and values are the same as in node_exporter docs. |
node_exporter_version (string) | Node exporter package version. Also accepts latest as parameter. Default: "1.8.2" |
node_exporter_web_disable_exporter_metrics (boolean) | Exclude metrics about the exporter itself (promhttp_*, process_*, go_*). Choices: false (default), true |
node_exporter_web_listen_address (string) | Address on which node exporter will listen. Default: "0.0.0.0:9100" |
node_exporter_web_telemetry_path (string) | Path under which to expose metrics. Default: "/metrics" |
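To make the table more concrete, here is a minimal sketch of how a few of these variables might be overridden. The values, the user name scrape_user, and the password are hypothetical and only illustrate the expected data types; the fragment can live anywhere Ansible accepts variables (for example under vars in a play).

```yaml
# Hypothetical overrides; the variable names come from the table above,
# the values are illustrative assumptions only.
node_exporter_web_listen_address: "127.0.0.1:9100"   # string: listen on loopback only
node_exporter_web_disable_exporter_metrics: true     # boolean: drop promhttp_*, process_*, go_* metrics
node_exporter_basic_auth_users:                      # dictionary: user name -> plain-text password
  scrape_user: "change-me"                           # hypothetical user; the role hashes the password with bcrypt
```

Bcrypt hashing in Ansible usually depends on the passlib Python library being available on the control node, so it is worth verifying that before enabling basic authentication.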
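Installing the Collection
There are two ways to install it; both are described below.
Method 1: install from the ansible-galaxy repository
Search for prometheus collections on https://galaxy.ansible.com/. The result is the set of Ansible Collections contributed by the prometheus-community:
On the ansible control node, install the prometheus.prometheus Collection from the galaxy repository: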
ansible-galaxy collection install prometheus.prometheus:0.23.0
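If the control node is rebuilt often, the same pinned version can also be recorded in a requirements file. This is only a sketch: the file name requirements.yml is a common convention assumed here, not something taken from the original setup.

```yaml
# requirements.yml (file name assumed for illustration)
collections:
  - name: prometheus.prometheus
    version: "0.23.0"   # same version as pinned in the command above
```

It would then be installed with ansible-galaxy collection install -r requirements.yml.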
Method 2: install from the GitHub source repository
>_ ansible-galaxy collection install git+https://github.com/prometheus-community/ansible.git
Cloning into '/root/.ansible/tmp/ansible-local-143093dqnngbq4/tmpmhi9qxg0/ansible9uq0qwat'...
remote: Enumerating objects: 774, done.
remote: Counting objects: 100% (774/774), done.
remote: Compressing objects: 100% (389/389), done.
remote: Total 774 (delta 302), reused 588 (delta 232), pack-reused 0 (from 0)
Receiving objects: 100% (774/774), 156.00 KiB | 1.53 MiB/s, done.
Resolving deltas: 100% (302/302), done.
Your branch is up to date with 'origin/main'.
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'prometheus.prometheus:0.23.1' to '/root/.ansible/collections/ansible_collections/prometheus/prometheus'
Created collection for prometheus.prometheus:0.23.1 at /root/.ansible/collections/ansible_collections/prometheus/prometheus
prometheus.prometheus:0.23.1 was installed successfully
'community.general:10.1.0' is already installed, skipping.
Listing the Collections installed on this machine
>_ ansible-galaxy collection list
# /usr/lib/python3.9/site-packages/ansible_collections
Collection Version
----------------------------- -------
amazon.aws 3.4.0
ansible.netcommon 3.1.0
....
# /root/.ansible/collections/ansible_collections
Collection Version
--------------------- -------
community.general 10.1.0
prometheus.prometheus 0.23.1
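As shown, in addition to the collections bundled with the system, the listing includes the newly installed prometheus.prometheus 0.23.1 and its dependency community.general 10.1.0.
Installing node_exporter
Prepare the group of nodes to install on in the inventory.
The ping module is commonly used to test connectivity: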
>_ ansible all -i hosts.yaml -m ping
flink-1 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"ping": "pong"
}
A pong reply means the connection works.
Preparing the playbook
>_ cat install_node_exporter.yaml
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role:
        name: node_exporter
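The playbook above resolves the short role name through the collections keyword. An equivalent sketch, assuming you prefer fully qualified names, references the role as prometheus.prometheus.node_exporter directly; the file name and task name below are hypothetical.

```yaml
# install_node_exporter_fqcn.yaml (hypothetical file name)
- hosts: flink
  tasks:
    - name: Install node_exporter via the collection role
      ansible.builtin.import_role:
        # Fully qualified role name: <namespace>.<collection>.<role>
        name: prometheus.prometheus.node_exporter
```

With fully qualified names the collections keyword is no longer needed, which avoids ambiguity if another installed collection ships a role with the same short name.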
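Trial run of the playbook
If you are not sure whether the tasks will run correctly, use the -C flag (check mode) for a dry run. Tasks report what they would change, but no files on the target node are actually modified: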
>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml -C
PLAY [flink] *************************************************************************************************************************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for flink-1
TASK [Common preflight] **************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated skip_install variable] ****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated binary_local_dir variable] ************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated archive_path variable] ****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Assert usage of systemd as an init system] *****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Install dependencies] **************************************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus._common : Gather package facts] **************************************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Naive assertion of proper listen address] ******************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus.node_exporter : Assert that used version supports listen address type] ***********************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************************************************
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************************************************
skipping: [flink-1]
TASK [Install] ***********************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Gather system user and group facts] ************************************************************************************************************************
ok: [flink-1] => (item=passwd)
ok: [flink-1] => (item=group)
TASK [prometheus.prometheus._common : Create system group node-exp] ******************************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create system user node-exp] *******************************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create localhost binary cache path] ************************************************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Get checksum list for node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Download node_exporter-1.8.2.linux-amd64.tar.gz] ***********************************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Unpack binary archive node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Check existence of binary install dir] *********************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Make sure binary install dir exists] ***********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus._common : Propagate binaries] ****************************************************************************************************************************************
changed: [flink-1] => (item=node_exporter)
TASK [SELinux] ***********************************************************************************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Configure] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/configure.yml for flink-1
TASK [Configure] *********************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Create systemd service unit node_exporter] *****************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create config dir /etc/node_exporter] **********************************************************************************************************************
[WARNING]: failed to look up user node-exp. Create user up to this point in real play
[WARNING]: failed to look up group node-exp. Create group up to this point in real play
changed: [flink-1]
TASK [prometheus.prometheus._common : Install web config for node_exporter] **********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Create textfile collector dir] ***********************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] *************************************************************************************************************
skipping: [flink-1]
RUNNING HANDLER [prometheus.prometheus.node_exporter : Restart node_exporter] ********************************************************************************************************************
skipping: [flink-1]
PLAY RECAP ***************************************************************************************************************************************************************************************
flink-1 : ok=26 changed=6 unreachable=0 failed=0 skipped=12 rescued=0 ignored=0
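Running the playbook to install node_exporter
Note: if you are concerned that the role contains extra steps that could affect the target node, run the installation with the --step flag. Each task then waits for you to type y, n, or c before it runs.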
>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml --step
PLAY [flink] *************************************************************************************************************************************************************************************
Perform task: TASK: Gathering Facts (N)o/(y)es/(c)ontinue: y
....
Here we simply run the installation directly:
>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml
PLAY [flink] *************************************************************************************************************************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for flink-1
TASK [Common preflight] **************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated skip_install variable] ****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated binary_local_dir variable] ************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Check for deprecated archive_path variable] ****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Assert usage of systemd as an init system] *****************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Install dependencies] **************************************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus._common : Gather package facts] **************************************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Naive assertion of proper listen address] ******************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus.node_exporter : Assert that used version supports listen address type] ***********************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************************************************
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************************************************
skipping: [flink-1]
TASK [Install] ***********************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Gather system user and group facts] ************************************************************************************************************************
ok: [flink-1] => (item=passwd)
ok: [flink-1] => (item=group)
TASK [prometheus.prometheus._common : Create system group node-exp] ******************************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create system user node-exp] *******************************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create localhost binary cache path] ************************************************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Get checksum list for node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Download node_exporter-1.8.2.linux-amd64.tar.gz] ***********************************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Unpack binary archive node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1 -> localhost]
TASK [prometheus.prometheus._common : Check existence of binary install dir] *********************************************************************************************************************
ok: [flink-1]
TASK [prometheus.prometheus._common : Make sure binary install dir exists] ***********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus._common : Propagate binaries] ****************************************************************************************************************************************
changed: [flink-1] => (item=node_exporter)
TASK [SELinux] ***********************************************************************************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Configure] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/configure.yml for flink-1
TASK [Configure] *********************************************************************************************************************************************************************************
TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
"changed": false
}
MSG:
All assertions passed
TASK [prometheus.prometheus._common : Create systemd service unit node_exporter] *****************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Create config dir /etc/node_exporter] **********************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus._common : Install web config for node_exporter] **********************************************************************************************************************
skipping: [flink-1]
TASK [prometheus.prometheus.node_exporter : Create textfile collector dir] ***********************************************************************************************************************
changed: [flink-1]
TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] *************************************************************************************************************
changed: [flink-1]
RUNNING HANDLER [prometheus.prometheus.node_exporter : Restart node_exporter] ********************************************************************************************************************
changed: [flink-1]
PLAY RECAP ***************************************************************************************************************************************************************************************
flink-1 : ok=28 changed=8 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0
Verifying the installation
On the target node, check the service list:
>_ systemctl list-unit-files -t service | grep node_exporter
node_exporter.service enabled disabled
>_ systemctl status node_exporter.service
● node_exporter.service - Prometheus Node Exporter
Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; preset: disabled)
Active: active (running) since Tue 2024-12-10 01:00:12 CST; 7min ago
Main PID: 5615 (node_exporter)
Tasks: 6 (limit: 48928)
Memory: 6.6M
CPU: 141ms
CGroup: /system.slice/node_exporter.service
└─5615 /usr/local/bin/node_exporter --collector.systemd --collector.textfile --collector.textfile.directory=/var/lib/node_exporter --web.listen-address=0.0.0.0:9100 --web.telemetry-path=/metrics
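The same check can be run from the control node instead of logging in to each target. The following is a sketch only: the file name is hypothetical, and it assumes the default listen address 0.0.0.0:9100 and telemetry path /metrics shown in the unit's command line above.

```yaml
# verify_node_exporter.yaml (hypothetical file name)
- hosts: flink
  gather_facts: false
  tasks:
    - name: Wait for the node_exporter port to accept connections
      ansible.builtin.wait_for:
        port: 9100        # assumes the default listen port
        timeout: 30

    - name: Fetch the metrics page from the target itself
      ansible.builtin.uri:
        url: "http://127.0.0.1:9100/metrics"   # assumes the default telemetry path
        return_content: true
      register: metrics_response

    - name: Check that node_exporter build info is exposed
      ansible.builtin.assert:
        that:
          - metrics_response.status == 200
          - "'node_exporter_build_info' in metrics_response.content"
```

Both tasks run on the target host, so the loopback address is enough here; if the listen address is changed to a specific interface, the URL has to be adjusted to match.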
Changing the node_exporter default configuration
Back on the ansible control node, the defaults of the node_exporter role are defined in the following file:
>_ cat ~/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/defaults/main.yml
---
node_exporter_version: 1.8.2
node_exporter_binary_url: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/\
                           node_exporter-{{ node_exporter_version }}.{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}.tar.gz"
node_exporter_checksums_url: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/sha256sums.txt"
node_exporter_web_disable_exporter_metrics: false
node_exporter_web_listen_address: "0.0.0.0:9100"
node_exporter_web_telemetry_path: "/metrics"
node_exporter_textfile_dir: "/var/lib/node_exporter"
node_exporter_tls_server_config: {}
node_exporter_http_server_config: {}
node_exporter_basic_auth_users: {}
node_exporter_enabled_collectors:
  - systemd
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"
#  - filesystem:
#      ignored-mount-points: "^/(sys|proc|dev)($|/)"
#      ignored-fs-types: "^(sys|proc|auto)fs$"
node_exporter_disabled_collectors: []
node_exporter_binary_install_dir: "/usr/local/bin"
node_exporter_system_group: "node-exp"
node_exporter_system_user: "{{ node_exporter_system_group }}"
node_exporter_config_dir: "/etc/node_exporter"
# Local path to stash the archive and its extraction
node_exporter_local_cache_path: "/tmp/node_exporter-{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}/{{ node_exporter_version }}"
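As shown, this is where the default variable values of the node_exporter role are defined.
To change the installation variables, set them in one of the following places:
Method 1: specify vars in the playbook file; these values override the role's defaults
>_ cat install_node_exporter.yaml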
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role:
        name: node_exporter
  vars:
    node_exporter_enabled_collectors:
      - systemd
      - textfile:
          directory: "{{ node_exporter_textfile_dir }}"
      - filesystem:
          ignored-mount-points: "^/(sys|proc|dev)($|/)"
          ignored-fs-types: "^(sys|proc|auto)fs$"
Method 2: set vars for the host group, or for an individual node, in the inventory file
>_ cat hosts.yaml
# game team test
flink:
  hosts:
    flink-1:
      ansible_host: 192.168.22.174
  vars:
    ansible_ssh_user: root
    ansible_ssh_password: "****"
    node_exporter_enabled_collectors:
      - systemd
      - textfile:
          directory: "{{ node_exporter_textfile_dir }}"
      - filesystem:
          ignored-mount-points: "^/(sys|proc|dev)($|/)"
          ignored-fs-types: "^(sys|proc|auto)fs$"
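The inventory above sets the variables at group level. For the per-node case, a variable can be placed directly under the host entry instead. The sketch below reuses the host from this article; the chosen variable and its value are assumptions for illustration.

```yaml
# hosts.yaml fragment: host-level override (illustrative values)
flink:
  hosts:
    flink-1:
      ansible_host: 192.168.22.174
      # Applies to flink-1 only; other hosts in the group keep the group or role default
      node_exporter_web_listen_address: "192.168.22.174:9100"
```

When the same variable is defined for both the group and the host in the inventory, the host-level value takes precedence.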