Installing node_exporter with the prometheus-community/ansible Collection via ansible-galaxy

Prerequisites

  • Install ansible (pip3 install ansible is recommended)

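Finding the prometheus collection documentation

Find the prometheus-community open-source repository, https://github.com/prometheus-community/ansible, and follow its README to the documentation page at https://prometheus-community.github.io/ansible/branch/main/
There you can see that the community-maintained ansible collection already includes many common roles.

Opening the introduction page of the node_exporter role, we find the following key variables for the role: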

Parameter (type)
    Comments

node_exporter_basic_auth_users (dictionary)
    Dictionary of users and password for basic authentication. Passwords are automatically hashed with bcrypt.

node_exporter_binary_install_dir (string, advanced)
    Directory to install node_exporter binary
    Default: "/usr/local/bin"

node_exporter_binary_url (string)
    URL of the node exporter binaries .tar.gz file
    Default: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}.tar.gz"

node_exporter_checksums_url (string)
    URL of the node exporter checksums file
    Default: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/sha256sums.txt"

node_exporter_config_dir (string)
    Path to directory with node_exporter configuration
    Default: "/etc/node_exporter"

node_exporter_disabled_collectors (list / elements=string)
    List of disabled collectors.
    By default node_exporter disables collectors listed here.

node_exporter_enabled_collectors (list / elements=string)
    List of dicts defining additionally enabled collectors and their configuration.
    It adds collectors to those enabled by default.
    Default: ["systemd", {"textfile": {"directory": "{{ node_exporter_textfile_dir }}"}}]

node_exporter_http_server_config (dictionary)
    Config for HTTP/2 support.
    Keys and values are the same as in node_exporter docs.

node_exporter_local_cache_path (string)
    Local path to stash the archive and its extraction
    Default: "/tmp/node_exporter-{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}/{{ node_exporter_version }}"

node_exporter_system_group (string, advanced)
    System group for node exporter
    Default: "node-exp"

node_exporter_system_user (string, advanced)
    Node exporter user
    Default: "node-exp"

node_exporter_textfile_dir (string)
    Directory used by the Textfile Collector.
    To get permissions to write metrics in this directory, users must be in node-exp system group.
    Note: More information in TROUBLESHOOTING.md guide.
    Default: "/var/lib/node_exporter"

node_exporter_tls_server_config (dictionary)
    Configuration for TLS authentication.
    Keys and values are the same as in node_exporter docs.

node_exporter_version (string)
    Node exporter package version. Also accepts latest as parameter.
    Default: "1.8.2"

node_exporter_web_disable_exporter_metrics (boolean)
    Exclude metrics about the exporter itself (promhttp_*, process_*, go_*).
    Choices: false (default), true

node_exporter_web_listen_address (string)
    Address on which node exporter will listen
    Default: "0.0.0.0:9100"

node_exporter_web_telemetry_path (string)
    Path under which to expose metrics
    Default: "/metrics"
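
To make the variable list more concrete, here is a minimal sketch of how a few of these variables might be set together. All values are hypothetical and only illustrate the types listed above (string, list of strings, dictionary). The user name metrics_reader and the choice of the mdadm collector are assumptions, not recommendations from the role's documentation. Where such variables can be placed (playbook vars or inventory) is covered in the last section of this article.

```yaml
# Hypothetical overrides for illustration only; adjust to your environment.
node_exporter_version: "1.8.2"                      # pin the release (same as the documented default)
node_exporter_web_listen_address: "127.0.0.1:9100"  # listen on loopback instead of 0.0.0.0
node_exporter_disabled_collectors:
  - mdadm                                           # example collector name (an assumption)
node_exporter_basic_auth_users:
  metrics_reader: "change-me"                       # hypothetical user; per the docs the role hashes the password with bcrypt
```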

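Installing the Collection

There are two ways to install the collection; both are described below.

Option 1: install from the ansible-galaxy repository

Search https://galaxy.ansible.com/ for the prometheus collection. The result is the set of Ansible Collections contributed by the prometheus-community:

On the ansible control node, install the prometheus.prometheus Collection from the galaxy repository: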

>_ ansible-galaxy collection install prometheus.prometheus:0.23.0
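
If the pinned version should be kept under version control rather than typed on the command line, the same install can be expressed as a requirements file. This is a sketch; the file name requirements.yml is the conventional one, but any name works. It would be installed with ansible-galaxy collection install -r requirements.yml.

```yaml
# requirements.yml: pins the collection to the version used in the command above
collections:
  - name: prometheus.prometheus
    version: "0.23.0"
```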

Option 2: install from the GitHub source repository

>_ ansible-galaxy collection install git+https://github.com/prometheus-community/ansible.git
Cloning into '/root/.ansible/tmp/ansible-local-143093dqnngbq4/tmpmhi9qxg0/ansible9uq0qwat'...
remote: Enumerating objects: 774, done.
remote: Counting objects: 100% (774/774), done.
remote: Compressing objects: 100% (389/389), done.
remote: Total 774 (delta 302), reused 588 (delta 232), pack-reused 0 (from 0)
Receiving objects: 100% (774/774), 156.00 KiB | 1.53 MiB/s, done.
Resolving deltas: 100% (302/302), done.
Your branch is up to date with 'origin/main'.
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'prometheus.prometheus:0.23.1' to '/root/.ansible/collections/ansible_collections/prometheus/prometheus'
Created collection for prometheus.prometheus:0.23.1 at /root/.ansible/collections/ansible_collections/prometheus/prometheus
prometheus.prometheus:0.23.1 was installed successfully
'community.general:10.1.0' is already installed, skipping.
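
The git-based install can also be written as a requirements file entry, which makes the branch explicit. This is a sketch under the assumption that you want to track the main branch, as the clone output above suggests; a tag or commit hash could be used as the version instead.

```yaml
# requirements.yml: install the collection straight from the git repository
collections:
  - name: https://github.com/prometheus-community/ansible.git
    type: git
    version: main   # branch to check out; assumed here, a tag or commit also works
```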

Listing the Collections installed on this machine

>_ ansible-galaxy collection list
# /usr/lib/python3.9/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    3.4.0  
ansible.netcommon             3.1.0  
....

# /root/.ansible/collections/ansible_collections
Collection            Version
--------------------- -------
community.general     10.1.0 
prometheus.prometheus 0.23.1 

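As shown, besides the collections that ship with the system, the list contains the prometheus.prometheus 0.23.1 collection we just installed, along with community.general 10.1.0, which it depends on.

Installing node_exporter

First, prepare the group of target nodes in the inventory.
The ping module is commonly used to test the connection: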

>_ ansible all -i hosts.yaml -m ping

flink-1 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

A pong reply means the connection works.
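
The same connectivity check can be kept as a small playbook so that it is repeatable. This is a sketch; the file name check_connection.yaml is hypothetical. It could be run with ansible-playbook -i hosts.yaml check_connection.yaml.

```yaml
# check_connection.yaml (hypothetical file name)
- hosts: all
  gather_facts: false          # skip fact gathering; only reachability is tested
  tasks:
    - name: Verify that each host is reachable and has a usable Python
      ansible.builtin.ping:
```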

Preparing the playbook

>_ cat install_node_exporter.yaml
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role: 
        name: node_exporter
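
An equivalent way to write this playbook, shown here as a sketch, is to drop the collections keyword and reference the role by its fully qualified name. The name prometheus.prometheus.node_exporter is the same one that appears in the task output later in this article.

```yaml
# Variant of install_node_exporter.yaml using the fully qualified role name
- hosts: flink
  roles:
    - prometheus.prometheus.node_exporter
```

Using the fully qualified name avoids ambiguity if another installed collection or a local roles directory also provides a role called node_exporter.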

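Dry-running the playbook

If we are not sure whether the tasks will run correctly, we can pass the -C (--check) flag for a dry run, which does not actually modify any files on the target nodes: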

>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml -C

PLAY [flink] *************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for flink-1

TASK [Common preflight] **************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated skip_install variable] ****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated binary_local_dir variable] ************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated archive_path variable] ****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Assert usage of systemd as an init system] *****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Install dependencies] **************************************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus._common : Gather package facts] **************************************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Naive assertion of proper listen address] ******************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus.node_exporter : Assert that used version supports listen address type] ***********************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************************************************

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************************************************
skipping: [flink-1]

TASK [Install] ***********************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Gather system user and group facts] ************************************************************************************************************************
ok: [flink-1] => (item=passwd)
ok: [flink-1] => (item=group)

TASK [prometheus.prometheus._common : Create system group node-exp] ******************************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create system user node-exp] *******************************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create localhost binary cache path] ************************************************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Get checksum list for node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Download node_exporter-1.8.2.linux-amd64.tar.gz] ***********************************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Unpack binary archive node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Check existence of binary install dir] *********************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Make sure binary install dir exists] ***********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus._common : Propagate binaries] ****************************************************************************************************************************************
changed: [flink-1] => (item=node_exporter)

TASK [SELinux] ***********************************************************************************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Configure] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/configure.yml for flink-1

TASK [Configure] *********************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Create systemd service unit node_exporter] *****************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create config dir /etc/node_exporter] **********************************************************************************************************************
[WARNING]: failed to look up user node-exp. Create user up to this point in real play
[WARNING]: failed to look up group node-exp. Create group up to this point in real play
changed: [flink-1]

TASK [prometheus.prometheus._common : Install web config for node_exporter] **********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Create textfile collector dir] ***********************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] *************************************************************************************************************
skipping: [flink-1]

RUNNING HANDLER [prometheus.prometheus.node_exporter : Restart node_exporter] ********************************************************************************************************************
skipping: [flink-1]

PLAY RECAP ***************************************************************************************************************************************************************************************
flink-1                    : ok=26   changed=6    unreachable=0    failed=0    skipped=12   rescued=0    ignored=0   

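Running the playbook to install node_exporter

Note: if you are worried that the role contains extra steps that could affect the target nodes, run the playbook with the --step flag. Each task then has to be confirmed manually by typing y or n before it is executed.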

>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml --step

PLAY [flink] *************************************************************************************************************************************************************************************
Perform task: TASK: Gathering Facts (N)o/(y)es/(c)ontinue: y
....

Here, we simply run the installation directly:

>_ ansible-playbook -i hosts.yaml install_node_exporter.yaml       

PLAY [flink] *************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus.node_exporter : Validating arguments against arg spec 'main' - Prometheus Node Exporter] *****************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus.node_exporter : Preflight] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml for flink-1

TASK [Common preflight] **************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated skip_install variable] ****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated binary_local_dir variable] ************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Check for deprecated archive_path variable] ****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Assert usage of systemd as an init system] *****************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Install dependencies] **************************************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus._common : Gather package facts] **************************************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Naive assertion of proper listen address] ******************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus.node_exporter : Assert that used version supports listen address type] ***********************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus.node_exporter : Assert collectors are not both disabled and enabled at the same time] ********************************************************************************

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert path are set] ***********************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS cert file] ********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Check existence of TLS key file] *********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Assert that TLS key and cert are present] ************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Discover latest version] *****************************************************************************************************************************
skipping: [flink-1]

TASK [Install] ***********************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Gather system user and group facts] ************************************************************************************************************************
ok: [flink-1] => (item=passwd)
ok: [flink-1] => (item=group)

TASK [prometheus.prometheus._common : Create system group node-exp] ******************************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create system user node-exp] *******************************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create localhost binary cache path] ************************************************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Get checksum list for node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Download node_exporter-1.8.2.linux-amd64.tar.gz] ***********************************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Unpack binary archive node_exporter-1.8.2.linux-amd64.tar.gz] **********************************************************************************************
ok: [flink-1 -> localhost]

TASK [prometheus.prometheus._common : Check existence of binary install dir] *********************************************************************************************************************
ok: [flink-1]

TASK [prometheus.prometheus._common : Make sure binary install dir exists] ***********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus._common : Propagate binaries] ****************************************************************************************************************************************
changed: [flink-1] => (item=node_exporter)

TASK [SELinux] ***********************************************************************************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Configure] *******************************************************************************************************************************************
included: /root/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/configure.yml for flink-1

TASK [Configure] *********************************************************************************************************************************************************************************

TASK [prometheus.prometheus._common : Validate invocation of _common role] ***********************************************************************************************************************
ok: [flink-1] => {
    "changed": false
}

MSG:

All assertions passed

TASK [prometheus.prometheus._common : Create systemd service unit node_exporter] *****************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Create config dir /etc/node_exporter] **********************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus._common : Install web config for node_exporter] **********************************************************************************************************************
skipping: [flink-1]

TASK [prometheus.prometheus.node_exporter : Create textfile collector dir] ***********************************************************************************************************************
changed: [flink-1]

TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] *************************************************************************************************************
changed: [flink-1]

RUNNING HANDLER [prometheus.prometheus.node_exporter : Restart node_exporter] ********************************************************************************************************************
changed: [flink-1]

PLAY RECAP ***************************************************************************************************************************************************************************************
flink-1                    : ok=28   changed=8    unreachable=0    failed=0    skipped=10   rescued=0    ignored=0  

Checking the result

On the target node, look at the service list:

>_ systemctl list-unit-files -t service | grep node_exporter
node_exporter.service                      enabled         disabled

>_ systemctl status node_exporter.service 
● node_exporter.service - Prometheus Node Exporter
     Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; preset: disabled)
     Active: active (running) since Tue 2024-12-10 01:00:12 CST; 7min ago
   Main PID: 5615 (node_exporter)
      Tasks: 6 (limit: 48928)
     Memory: 6.6M
        CPU: 141ms
     CGroup: /system.slice/node_exporter.service
             └─5615 /usr/local/bin/node_exporter --collector.systemd --collector.textfile --collector.textfile.directory=/var/lib/node_exporter --web.listen-address=0.0.0.0:9100 --web.telemetry-path=/metrics
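
The manual check above can also be automated from the control node. The following is a sketch of a verification playbook; the file name verify_node_exporter.yaml is hypothetical, and it assumes the role defaults shown in this article (listen address 0.0.0.0:9100, telemetry path /metrics, no basic auth or TLS).

```yaml
# verify_node_exporter.yaml (hypothetical file name)
- hosts: flink
  gather_facts: false
  tasks:
    - name: Collect the state of services on the target node
      ansible.builtin.service_facts:

    - name: Assert that node_exporter.service is running
      ansible.builtin.assert:
        that:
          - ansible_facts.services['node_exporter.service'].state == 'running'

    - name: Fetch the metrics endpoint from the target node itself
      ansible.builtin.uri:
        url: "http://127.0.0.1:9100/metrics"   # assumes the default port and path
        return_content: true
        status_code: 200
      register: metrics

    - name: Assert that node_exporter metrics are present in the response
      ansible.builtin.assert:
        that:
          - "'node_exporter_build_info' in metrics.content"
```

If basic authentication or TLS were enabled, the uri task would need matching credentials or an https URL.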

Changing the node_exporter default configuration

Back on the ansible control node, the default settings of the node_exporter role can be found in the following defaults file:

>_ cat ~/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/defaults/main.yml
---
node_exporter_version: 1.8.2
node_exporter_binary_url: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/\
                           node_exporter-{{ node_exporter_version }}.{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}.tar.gz"
node_exporter_checksums_url: "https://github.com/{{ _node_exporter_repo }}/releases/download/v{{ node_exporter_version }}/sha256sums.txt"

node_exporter_web_disable_exporter_metrics: false
node_exporter_web_listen_address: "0.0.0.0:9100"
node_exporter_web_telemetry_path: "/metrics"

node_exporter_textfile_dir: "/var/lib/node_exporter"

node_exporter_tls_server_config: {}

node_exporter_http_server_config: {}

node_exporter_basic_auth_users: {}

node_exporter_enabled_collectors:
  - systemd
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"
#  - filesystem:
#      ignored-mount-points: "^/(sys|proc|dev)($|/)"
#      ignored-fs-types: "^(sys|proc|auto)fs$"

node_exporter_disabled_collectors: []

node_exporter_binary_install_dir: "/usr/local/bin"
node_exporter_system_group: "node-exp"
node_exporter_system_user: "{{ node_exporter_system_group }}"

node_exporter_config_dir: "/etc/node_exporter"
# Local path to stash the archive and its extraction
node_exporter_local_cache_path: "/tmp/node_exporter-{{ ansible_system | lower }}-{{ _node_exporter_go_ansible_arch }}/{{ node_exporter_version }}"

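As shown, the default variable values of the node_exporter role are defined here.

To change the installation variables, there are several places where they can be set:

Method 1: set vars in the playbook file; these values override the role's defaults

>_ cat install_node_exporter.yaml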
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role: 
        name: node_exporter
  vars:
    node_exporter_enabled_collectors:
      - systemd
      - textfile:
          directory: "{{ node_exporter_textfile_dir }}"
      - filesystem:
          ignored-mount-points: "^/(sys|proc|dev)($|/)"
          ignored-fs-types: "^(sys|proc|auto)fs$"
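
A variation on Method 1, given here as a sketch, attaches vars to the import_role task instead of the play, so the overrides are scoped to that role import rather than to every task in the play. The chosen variable and value are only an illustration.

```yaml
- hosts: flink
  collections:
    - prometheus.prometheus
  tasks:
    - import_role:
        name: node_exporter
      vars:
        # illustrative override: drop the exporter's own promhttp_*, process_*, go_* metrics
        node_exporter_web_disable_exporter_metrics: true
```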

Method 2: set vars on the host group in the inventory file, or on an individual node

>_ cat hosts.yaml                
# game team test
flink:
  hosts:
    flink-1:
      ansible_host: 192.168.22.174
  vars:
    ansible_ssh_user: root
    ansible_ssh_password: ****
    node_exporter_enabled_collectors:
      - systemd
      - textfile:
          directory: "{{ node_exporter_textfile_dir }}"
      - filesystem:
          ignored-mount-points: "^/(sys|proc|dev)($|/)"
          ignored-fs-types: "^(sys|proc|auto)fs$"
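
The listing above sets the variable for the whole flink group. For the single-node case, a host-level variable can be placed directly under the host entry. This is a sketch; the listen address value is a hypothetical example that binds node_exporter to the node's own IP. In Ansible's variable precedence, host variables take priority over group variables.

```yaml
# hosts.yaml fragment: host-level override for flink-1 only
flink:
  hosts:
    flink-1:
      ansible_host: 192.168.22.174
      node_exporter_web_listen_address: "192.168.22.174:9100"  # hypothetical per-host value
  vars:
    ansible_ssh_user: root
```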
posted @ 2024-12-10 01:19  Professor哥