Kafka consumer: reading data from Kafka and writing it to a file
Recently there was a requirement to consume real-time data from Kafka and write the key of each record to a file, so that it could be compared against the original points on the publishing side and we could tell whether any data was lost in transit.
Straight to the code. In our architecture the Kafka cluster hosts multiple topics, and this code targets one topic per run. Each topic has 3 partitions, so consumption is done with multiple threads, one consumer per partition, with partition assignment handled automatically by the consumer group.
Duplicate keys are filtered out while reading, because the source pushes around 200,000 points (possibly within one or a few seconds). At the time I simply used a HashMap for the filtering.
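One caveat: the points map is shared by all three consumer threads, and a plain HashMap is not safe under concurrent writes. Below is a minimal sketch of the same filter-then-write step made thread-safe with ConcurrentHashMap; the PointFilter class name is mine, not part of the original code.

import java.util.concurrent.ConcurrentHashMap;

public class PointFilter {

    // Shared across consumer threads; putIfAbsent is atomic, so each key is written at most once.
    private static final ConcurrentHashMap<String, String> SEEN = new ConcurrentHashMap<>();

    public static void handle(String cedian) {
        if (SEEN.putIfAbsent(cedian, cedian) == null) {   // null means the key had not been seen before
            WriterDataFile.writeData(cedian);
        }
    }
}

With this in place, the consumer thread could call PointFilter.handle(cedian) instead of checking the shared HashMap directly.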
1. ConsumerGroup
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class ConsumerGroup {

    private List<ConsumerRunnable> consumers;

    public ConsumerGroup(int consumerNum, String groupId, String topic, String brokerList, HashMap<String, String> points) {
        consumers = new ArrayList<>(consumerNum);
        for (int i = 0; i < consumerNum; ++i) {
            ConsumerRunnable consumerThread = new ConsumerRunnable(brokerList, groupId, topic, points);
            consumers.add(consumerThread);
        }
    }

    public void execute() {
        for (ConsumerRunnable task : consumers) {
            new Thread(task).start();
        }
    }
}
2. ConsumerRunnable
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Properties;

public class ConsumerRunnable implements Runnable {

    // Each thread maintains its own private KafkaConsumer instance
    private final KafkaConsumer<String, String> consumer;

    HashMap<String, String> points = new HashMap<>();

    public ConsumerRunnable(String brokerList, String groupId, String topic, HashMap<String, String> nodepoint) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokerList);
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "true");        // this example uses automatic offset commits
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        this.consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topic));       // partitions are assigned automatically within the consumer group
        points = nodepoint;
    }

    @Override
    public void run() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(200);
            for (ConsumerRecord<String, String> record : records) {
                // System.out.printf("Partition = %s , offset = %d, key = %s, value=%s",record.partition(),record.offset(),record.key(),record.value());

                // record.value() is expected to be JSON of the form {"list":[{"id":"..."}, ...]}
                JsonParser parse = new JsonParser();
                JsonObject jsonObject = (JsonObject) parse.parse(record.value());
                JsonArray jsonArray = jsonObject.get("list").getAsJsonArray();
                for (int i = 0; i < jsonArray.size(); i++) {
                    JsonObject subject = jsonArray.get(i).getAsJsonObject();
                    String cedian = subject.get("id").getAsString().trim();
                    if (!points.containsKey(cedian)) {
                        points.put(cedian, cedian);
                        WriterDataFile.writeData(cedian);
                    }
                    // System.out.println(subject.get("id").getAsString());
                }
            }
        }
    }
}
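A side note on the loop above: in kafka-clients 2.0 and later, poll(long) is deprecated in favor of poll(Duration), and the endless while (true) leaves no clean way to stop the thread and close the consumer. Below is a hedged sketch of how such a loop can be made stoppable; the StoppableConsumer class and its shutdown() method are illustrative additions, not part of the original code.

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class StoppableConsumer implements Runnable {

    private final KafkaConsumer<String, String> consumer;
    private final AtomicBoolean running = new AtomicBoolean(true);

    public StoppableConsumer(Properties props, String topic) {
        this.consumer = new KafkaConsumer<>(props);      // same properties as in ConsumerRunnable
        consumer.subscribe(Arrays.asList(topic));
    }

    @Override
    public void run() {
        try {
            while (running.get()) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
                for (ConsumerRecord<String, String> record : records) {
                    // ... parse record.value() and filter keys exactly as in ConsumerRunnable ...
                }
            }
        } catch (WakeupException e) {
            // expected when shutdown() is called from another thread
        } finally {
            consumer.close();   // releases the connection; with auto-commit on, final offsets are committed
        }
    }

    // Call from another thread (e.g. a JVM shutdown hook) to stop consuming.
    public void shutdown() {
        running.set(false);
        consumer.wakeup();      // makes a blocked poll() throw WakeupException
    }
}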
3. ConsumerTest
import java.util.HashMap;

public class ConsumerTest {

    public static void main(String[] args) {
        String brokerList = "172.16.10.22:9092,172.16.10.23:9092,172.16.10.21:9092";
        String groupId = "test20190722";
        String topic = "SDFD";
        int consumerNum = 3;

        HashMap<String, String> points = new HashMap<>();

        ConsumerGroup consumerGroup = new ConsumerGroup(consumerNum, groupId, topic, brokerList, points);
        consumerGroup.execute();
    }
}
4. WriterDataFile
import java.io.*;

public class WriterDataFile {

    private static String path = "E:\\kafkadata_SDFD.txt";

    public static void writeData(String strvalue) {
        FileWriter fw;
        try {
            fw = new FileWriter(path, true);
            BufferedWriter bw = new BufferedWriter(fw);
            bw.write(strvalue + "\r\n");
            bw.flush();
            bw.close();
            fw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
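Two things to keep in mind about this helper in the scenario described above (three consumer threads, up to 200,000 keys): the file is reopened and closed on every call, and concurrent calls are not synchronized, so lines from different threads can interleave. Below is a minimal alternative sketch that opens one writer and serializes access; the SharedKeyWriter name is illustrative, and the path is the same one used above.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class SharedKeyWriter {

    private static final String PATH = "E:\\kafkadata_SDFD.txt";
    private static BufferedWriter writer;

    // Open the file once, in append mode, so existing keys are kept.
    private static synchronized BufferedWriter writer() throws IOException {
        if (writer == null) {
            writer = new BufferedWriter(new FileWriter(PATH, true));
        }
        return writer;
    }

    // Synchronized so lines written by different consumer threads do not interleave.
    public static synchronized void writeData(String strvalue) {
        try {
            BufferedWriter bw = writer();
            bw.write(strvalue);
            bw.newLine();
            bw.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}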
This is all fairly basic code and I did not have time to tidy it up, so please bear with anything that looks rough.
Writing this up took some effort...