Integrating Kafka with InfluxDB

With the widespread adoption of IoT, time-series databases are being used more and more in software development, and InfluxDB, the top-ranked time-series database on the market, is usually the first choice.

I recently took over a project in which the customer's existing solution used a custom connector to consume messages from specific Kafka topics, parse them, and write the results into InfluxDB. This suggested the customer was not very familiar with the InfluxDB ecosystem. The improvement I proposed was to put Telegraf between Kafka and InfluxDB, with Kafka acting as a Telegraf input plugin, so that consuming, parsing, and forwarding the Kafka messages is handled entirely by Telegraf.
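Incidentally, Telegraf can generate a commented configuration skeleton for exactly this input/output pair. Assuming a Telegraf 1.x binary on the PATH, the following command prints a sample config covering both plugins; the listing below is essentially that skeleton with the relevant options filled in:

telegraf --input-filter kafka_consumer --output-filter influxdb config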

All that is needed is a small change to the Telegraf configuration, as follows:

vi /etc/telegraf/telegraf.conf

# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
  ## The full HTTP or UDP URL for your InfluxDB instance.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  # urls = ["unix:///var/run/influxdb.sock"]
  # urls = ["udp://127.0.0.1:8089"]
  urls = ["http://127.0.0.1:8086"]
  ## The target database for metrics; will be created as needed.
  ## For UDP url endpoint database needs to be configured on server side.
  database = "telegraf"
  ## The value of this tag will be used to determine the database.  If this
  ## tag is not set the 'database' option is used as the default.
  # database_tag = ""
  ## If true, no CREATE DATABASE queries will be sent.  Set to true when using
  ## Telegraf with a user without permissions to create databases or when the
  ## database already exists.
  # skip_database_creation = false
  ## Name of existing retention policy to write to.  Empty string writes to
  ## the default retention policy.  Only takes effect when using HTTP.
  # retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
  ## Only takes effect when using HTTP.
  # write_consistency = "any"
  ## Timeout for HTTP messages.
  timeout = "5s"
  ## HTTP Basic Auth
  username = "xiaodong"
  password = "HON123well"
  ## HTTP User-Agent
  # user_agent = "telegraf"
  ## UDP payload size is the maximum packet size to send.
  # udp_payload = "512B"
  ## Optional TLS Config for use on HTTP connections.
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false
  ## HTTP Proxy override, if unset values the standard proxy environment
  ## variables are consulted to determine which proxy, if any, should be used.
  # http_proxy = "http://corporate.proxy:3128"
  ## Additional HTTP headers
  # http_headers = {"X-Special-Header" = "Special-Value"}
  ## HTTP Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  # content_encoding = "identity"
  ## When true, Telegraf will output unsigned integers as unsigned values,
  ## i.e.: "42u".  You will need a version of InfluxDB supporting unsigned
  ## integer values.  Enabling this option will result in field type errors if
  ## existing data has been written.
  # influx_uint_support = false
# Read metrics from Kafka topic(s)
[[inputs.kafka_consumer]]
  ## kafka servers
  brokers = ["localhost:9092"]
  ## topic(s) to consume
  topics = ["telegraf"]
  ## Add topic as tag if topic_tag is not empty
  # topic_tag = ""

  ## Optional Client id
  # client_id = "Telegraf"

  ## Set the minimal supported Kafka version.  Setting this enables the use of new
  ## Kafka features and APIs.  Of particular interest, lz4 compression
  ## requires at least version 0.10.0.0.
  ##   ex: version = "1.1.0"
  # version = ""

  ## Optional TLS Config
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

  ## Optional SASL Config
  sasl_username = "kafka"
  sasl_password = "secret"

  ## the name of the consumer group
  consumer_group = "telegraf_metrics_consumers"
  ## Offset (must be either "oldest" or "newest")
  offset = "oldest"
  ## Maximum length of a message to consume, in bytes (default 0/unlimited);
  ## larger messages are dropped
  max_message_len = 1000000

  ## Maximum messages to read from the broker that have not been written by an
  ## output.  For best throughput set based on the number of metrics within
  ## each message and the size of the output's metric_batch_size.
  ##
  ## For example, if each message from the queue contains 10 metrics and the
  ## output metric_batch_size is 1000, setting this to 100 will ensure that a
  ## full batch is collected and the write is triggered immediately without
  ## waiting until the next flush_interval.
  max_undelivered_messages = 1000

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "json"

  ## Measurement name.  If name_override is not set, the measurement name
  ## defaults to "kafka_consumer".
  name_override = "topicB"

  ## Tag keys is an array of keys that should be added as tags.
  tag_keys = ["first"]

  ## String fields is an array of keys that should be added as string fields.
  json_string_fields = ["last"]
  ## Query is a GJSON path that specifies a specific chunk of JSON to be
  ## parsed, if not specified the whole document will be parsed.
  ##
  ## GJSON query paths are described here:
  ##   https://github.com/tidwall/gjson#path-syntax
  json_query = "obj.friends"
  ## Time key is the key containing the time that should be used to create the metric.
  json_time_key = ""
  ## Time format is the time layout that should be used to interpret the
  ## json_time_key.  The time must be `unix`, `unix_ms` or a time in the
  ## "reference time".
  ##   ex: json_time_format = "Mon Jan 2 15:04:05 -0700 MST 2006"
  ##       json_time_format = "2006-01-02T15:04:05Z07:00"
  ##       json_time_format = "unix"
  ##       json_time_format = "unix_ms"
  json_time_format = ""

Sample Kafka message (input):

{
    "obj": {
        "name": {"first": "Tom", "last": "Anderson"},
        "mrname": "myjson",
        "age":37,
        "children": ["Sara","Alex","Jack"],
        "fav.movie": "Deer Hunter",
        "friends": [
            {"first": "Dale", "last": "Murphy", "age": 44},
            {"first": "Roger", "last": "Craig", "age": 68},
            {"first": "Jane", "last": "Murphy", "age": 47}
        ]
    }
}
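
Because json_query = "obj.friends" selects only the friends array, the parser never sees the rest of the document; effectively it parses just this fragment:

[
    {"first": "Dale", "last": "Murphy", "age": 44},
    {"first": "Roger", "last": "Craig", "age": 68},
    {"first": "Jane", "last": "Murphy", "age": 47}
]

Each element of the array becomes one point in the "topicB" measurement: "first" becomes a tag (via tag_keys), "last" a string field (via json_string_fields), and "age" a numeric field.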

Output, i.e. the resulting data in InfluxDB:

time                age first host                       last
----                --- ----- ----                       ----
1567033241593012039 44  Dale  ch71s8dev214.honeywell.com Murphy
1567033241593021507 68  Roger ch71s8dev214.honeywell.com Craig
1567033241593024972 47  Jane  ch71s8dev214.honeywell.com Murphy
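
The table above is what the InfluxDB 1.x influx CLI prints for the new measurement; assuming the defaults used in this post, it can be reproduced with the following query (add -username/-password if authentication is enabled on the server):

influx -database 'telegraf' -execute 'SELECT * FROM "topicB"'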

 
