Three supported reliability levels: End-to-end, Store on failure, Best effort

https://github.com/cloudera/flume/blob/master/flume-docs/src/docs/UserGuide/Introduction

=== Reliability

	Reliability, the ability to continue delivering events in the face of
	failures without losing data, is a vital feature of Flume. Large
	distributed systems can and do suffer partial failures in many ways -
	physical hardware can fail, resources such as network bandwidth or
	memory can become scarce, or software can crash or run slowly. Flume
	emphasizes fault-tolerance as a core design principle and keeps
	running and collecting data even when many components have failed.

	Flume can guarantee that all data received by an agent node will
	eventually make it to the collector at the end of its flow as long as
	the agent node keeps running. That is, data can be reliably
	delivered to its eventual destination.

	However, reliable delivery can be very resource intensive and is often
	a stronger guarantee than some data sources require. Therefore, Flume
	allows the user to specify, on a per-flow basis, the level of
	reliability required. There are three supported reliability levels:

	* End-to-end
	* Store on failure
	* Best effort

	.A Note About Reliability
	******************
	Although Flume is extremely tolerant to machine, network, and software
	failures, there is never any such thing as '100% reliability'. If all
	the machines in a Flume installation were irrevocably destroyed in
	some terrible data center incident, all copies of Flume's data would
	be lost and there would be no way to recover them. Therefore, all of
	Flume's reliability levels make guarantees about data delivery 'until
	some maximum number of failures has occurred'. Flume's failure modes
	- in terms of what can fail and what keeps running when it does -
	are described in detail later in this guide.
	******************

	The end-to-end reliability level guarantees that once Flume accepts
	an event, that event will make it to the endpoint - as long as the
	agent that accepted the event remains live long enough. The first
	thing the agent does in this setting is write the event to disk in a
	''write-ahead log'' (WAL) so that, if the agent crashes and restarts,
	knowledge of the event is not lost. After the event has successfully
	made its way to the end of its flow, an acknowledgment is sent back to
	the originating agent so that it knows it no longer needs to store the
	event on disk. This reliability level can withstand any number of
	failures downstream of the initial agent.
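
	The write-then-acknowledge cycle described above can be sketched in a
	few lines of Python. This is only an illustration of the idea, not
	Flume's implementation: the `EndToEndAgent` class, its method names,
	and the one-JSON-file-per-event layout are all assumptions made for
	the example.

```python
import json
import os
import tempfile


class EndToEndAgent:
    """Toy agent (hypothetical): log each event to disk, drop it on ack."""

    def __init__(self, wal_dir):
        self.wal_dir = wal_dir
        os.makedirs(wal_dir, exist_ok=True)

    def _path(self, event_id):
        return os.path.join(self.wal_dir, f"{event_id}.json")

    def accept(self, event_id, body):
        # Write-ahead: persist the event before any attempt to forward it.
        with open(self._path(event_id), "w") as f:
            json.dump({"id": event_id, "body": body}, f)
            f.flush()
            os.fsync(f.fileno())

    def on_ack(self, event_id):
        # The end of the flow confirmed delivery, so the disk copy can go.
        path = self._path(event_id)
        if os.path.exists(path):
            os.remove(path)

    def pending(self):
        # Events still on disk were never acknowledged and must be resent,
        # including after a crash and restart.
        events = []
        for name in sorted(os.listdir(self.wal_dir)):
            with open(os.path.join(self.wal_dir, name)) as f:
                events.append(json.load(f))
        return events


wal_dir = tempfile.mkdtemp()
agent = EndToEndAgent(wal_dir)
agent.accept("e1", "first event")
agent.accept("e2", "second event")
agent.on_ack("e1")

# Simulate a crash and restart: a fresh agent object reads the same directory.
restarted = EndToEndAgent(wal_dir)
pending_ids = [e["id"] for e in restarted.pending()]
print(pending_ids)
```

	Because the log lives on disk rather than in memory, the restarted
	agent still knows that `e2` was never acknowledged and can send it
	again.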

	The store-on-failure reliability level requires each node to obtain
	an acknowledgment only from the node one hop downstream. If the sending
	node detects a failure, it stores data on its local disk until the
	downstream node is repaired or an alternate downstream destination
	can be selected. While this is effective, data can be lost if a
	compound or silent failure occurs.
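
	A rough Python sketch of this behavior follows. It is a simplified
	model under stated assumptions, not Flume code: `StoreOnFailureSender`,
	`HopDown`, and `next_hop` are hypothetical names, and an in-memory
	queue stands in for the local disk.

```python
from collections import deque


class HopDown(Exception):
    """Raised by the hypothetical next hop when it cannot take an event."""


class StoreOnFailureSender:
    def __init__(self, send):
        self.send = send      # callable that delivers to the next hop
        self.spool = deque()  # stands in for the local disk store

    def forward(self, event):
        try:
            self.send(event)  # returning normally counts as the one-hop ack
        except HopDown:
            self.spool.append(event)  # detected failure: keep the event

    def retry(self):
        # Resend spooled events in order, stopping at the first failure.
        while self.spool:
            try:
                self.send(self.spool[0])
            except HopDown:
                return
            self.spool.popleft()


delivered = []
link = {"up": False}


def next_hop(event):
    if not link["up"]:
        raise HopDown()
    delivered.append(event)


sender = StoreOnFailureSender(next_hop)
sender.forward("a")
sender.forward("b")
spooled_while_down = list(sender.spool)

link["up"] = True  # the downstream node has been repaired
sender.retry()
sender.forward("c")
print(spooled_while_down, delivered)
```

	Note that the sender only reacts to failures it can detect. A next hop
	that accepted an event and then lost it would go unnoticed in this
	sketch, which is the kind of silent failure mentioned above.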

	The best-effort reliability level sends data to the next hop and makes
	no attempt to confirm or retry delivery. If nodes fail, any data that
	they were in the process of transmitting or receiving can be
	lost. This is the weakest reliability level, but also the most
	lightweight.
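
	For contrast, a best-effort sender can be modeled as follows. As with
	any such sketch, the names (`BestEffortSender`, `next_hop`) are
	hypothetical and the code only illustrates the idea of sending without
	storing or retrying.

```python
class BestEffortSender:
    def __init__(self, send):
        self.send = send  # callable that delivers to the next hop

    def forward(self, event):
        try:
            self.send(event)
        except ConnectionError:
            pass  # no local store and no retry: the event is dropped


received = []
link = {"up": True}


def next_hop(event):
    if not link["up"]:
        raise ConnectionError("next hop unreachable")
    received.append(event)


sender = BestEffortSender(next_hop)
sender.forward("a")
link["up"] = False
sender.forward("b")  # sent while the next hop is down, so it is lost
link["up"] = True
sender.forward("c")
print(received)
```

	The sender keeps no state at all, which is what makes this level
	lightweight, and also why event `b` cannot be recovered.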

posted @ 2017-11-11 01:15 papering