Filebeat series: How Filebeat works
https://www.elastic.co/guide/en/beats/filebeat/7.17/how-filebeat-works.html#how-filebeat-works
Filebeat consists of two main components: inputs and harvesters. These components work together to tail files and send event data to the output that you specify.
1. What is a harvester?
A harvester is responsible for reading the content of a single file. The harvester reads each file, line by line, and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means that the file descriptor remains open while the harvester is running. If a file is removed or renamed while it’s being harvested, Filebeat continues to read the file. This has the side effect that the space on your disk is reserved until the harvester closes. By default, Filebeat keeps the file open until close_inactive is reached.
Note: close_inactive defaults to 5 minutes. You can change it in the Filebeat configuration file; for example, 2m means 2 minutes. Setting it to 0 is sometimes described as keeping the harvester open until the Filebeat process exits; verify this against the reference for your version before relying on it.
Closing a harvester has the following consequences:
- The file handler is closed, freeing up the underlying resources if the file was deleted while the harvester was still reading the file.
- The harvesting of the file will only be started again after scan_frequency has elapsed.
- If the file is moved or removed while the harvester is closed, harvesting of the file will not continue.
To control when a harvester is closed, use the close_* configuration options.
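As a rough illustration of the close_* options mentioned above, the sketch below sets a few of them on a single log input. The path and the chosen values are assumptions picked for the example, not recommendations; the defaults noted in the comments are the ones documented for the log input.

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log       # illustrative path
  close_inactive: 2m       # close the file handle after 2 minutes without new lines (default 5m)
  close_removed: true      # close the harvester when the file is deleted (enabled by default)
  close_renamed: false     # keep reading a file after it is renamed (disabled by default)
  close_timeout: 0         # 0 disables the hard limit on harvester lifetime (the default)
  scan_frequency: 10s      # how often the input checks for files to (re)open (default 10s)
```

A shorter close_inactive frees file handles sooner, but a closed file is only picked up again on the next scan, so new lines can be delayed by up to scan_frequency.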
2. What is an input?
An input is responsible for managing the harvesters and finding all sources to read from.
If the input type is log, the input finds all files on the drive that match the defined glob paths and starts a harvester for each file. Each input runs in its own goroutine.
The following example configures Filebeat to harvest lines from all log files that match the specified glob patterns:
filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log
    - /var/path2/*.log
Filebeat currently supports several input types. Each input type can be defined multiple times. The log input checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored (see ignore_older). New lines are only picked up if the size of the file has changed since the harvester was closed.
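To make the points above concrete, here is a minimal sketch that defines the log input type twice, each with its own scan settings, and uses ignore_older to skip stale files. The durations are assumptions chosen for illustration; the one constraint taken from the Filebeat reference is that ignore_older should be larger than close_inactive.

```yaml
filebeat.inputs:
# First definition of the log input type
- type: log
  paths:
    - /var/log/*.log
  scan_frequency: 10s    # check for new or changed files every 10 seconds
  ignore_older: 48h      # skip files not modified in the last 48 hours
  close_inactive: 5m     # must stay below ignore_older

# Second definition of the same input type, with different settings
- type: log
  paths:
    - /var/path2/*.log
  scan_frequency: 30s
```

Each entry under filebeat.inputs is an independent input, so the two path sets are scanned on their own schedules.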
3. How does Filebeat keep the state of files?
Filebeat keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While Filebeat is running, the state information is also kept in memory for each input. When Filebeat is restarted, data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position.
Note: how often the state is flushed to the registry is controlled by the filebeat.registry.flush option (registry_flush in some older releases). The default interval depends on the Filebeat version, so check the configuration reference rather than assuming a fixed value. The registry records the last read offset of each file.
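A minimal sketch of the registry settings, assuming the 7.x option names filebeat.registry.path and filebeat.registry.flush. The 5s interval is an arbitrary illustrative value, not a documented default.

```yaml
# Root path of the registry; a relative path is resolved against path.data
filebeat.registry.path: registry

# How long Filebeat may wait before writing updated state to disk.
# Illustrative value only; a longer interval means fewer disk writes,
# but more events may be re-sent after an unclean shutdown.
filebeat.registry.flush: 5s
```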
For each input, Filebeat keeps a state of each file it finds. Because files can be renamed or moved, the filename and path are not enough to identify a file. For each file, Filebeat stores unique identifiers to detect whether a file was harvested previously.
If your use case involves creating a large number of new files every day, you might find that the registry file grows to be too large. See "Registry file is too large" in the Filebeat documentation for details about configuration options that you can set to resolve this issue.
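One way to keep the registry small is to let Filebeat drop state for files it no longer needs to track, using the clean_* options of the log input. The sketch below is an assumption-laden example: the path and durations are illustrative, and the only hard rule applied is the documented one that clean_inactive must be greater than ignore_older plus scan_frequency.

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log     # illustrative path
  scan_frequency: 10s
  ignore_older: 48h      # stop considering files older than 48 hours
  clean_inactive: 72h    # remove registry state after 72 hours of inactivity
                         # (must exceed ignore_older + scan_frequency)
  clean_removed: true    # remove state for files that no longer exist on disk (the default)
```

If clean_inactive were shorter than ignore_older, state could be removed for a file that is still eligible for harvesting, and the file would be re-read from the beginning.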
Filebeat guarantees that events will be delivered to the configured output at least once and with no data loss. Filebeat is able to achieve this behavior because it stores the delivery state of each event in the registry file.
In situations where the defined output is blocked and has not confirmed all events, Filebeat will keep trying to send events until the output acknowledges that it has received the events.
If Filebeat shuts down while it’s in the process of sending events, it does not wait for the output to acknowledge all events before shutting down. Any events that are sent to the output, but not acknowledged before Filebeat shuts down, are sent again when Filebeat is restarted. This ensures that each event is sent at least once, but you can end up with duplicate events being sent to the output. You can configure Filebeat to wait a specific amount of time before shutting down by setting the shutdown_timeout option.
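For example, the following sketch asks Filebeat to wait for acknowledgements before exiting. The 5s value is an assumption chosen for illustration; the option is disabled by default.

```yaml
# Wait up to 5 seconds on shutdown for the output to acknowledge in-flight events.
# Events still unacknowledged after the timeout are sent again on restart.
filebeat.shutdown_timeout: 5s
```

A longer timeout reduces duplicates after a restart at the cost of a slower shutdown.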