Test environment:
CDH 5.15.1
CentOS 7
Python 3.7.0
Kafka 1.1.1
kafka-python: https://pypi.org/project/kafka-python/#files
Objective:
Use a Python thread to continuously pull data from a specified interface and keep sending that data to the Kafka service.
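As a minimal sketch of the "fetch in a thread, then send" idea, the helper below runs a polling loop in a background thread. The names `start_poller`, `fetch`, `send`, and `interval_s` are hypothetical and only used for illustration; `fetch` and `send` are passed in as plain callables so the loop can be tried without a running interface or broker.

```python
import threading


def start_poller(fetch, send, interval_s=5.0):
    """Call fetch() then send() repeatedly in a background thread.

    fetch: callable returning a str, or None when the request failed (assumption).
    send:  callable taking the UTF-8 encoded bytes of the message (assumption).
    Returns (stop_event, thread); call stop_event.set() to end the loop.
    """
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            msg = fetch()
            if msg is not None:           # skip cycles where the fetch failed
                send(msg.encode("utf-8"))
            stop.wait(interval_s)         # sleeps, but wakes early on stop.set()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread
```

With kafka-python, `send` could be something like `lambda payload: producer.send("test", payload)`, and `fetch` a function that calls the interface. Using an `Event` instead of `time.sleep` lets the loop stop promptly when the program shuts down.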
Step 1:
First download and install kafka-python.
Then run a simple test of calling Kafka from Python.
Open a python3 interactive shell:
>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers=["master:9092"])
>>> producer.send("test",b"Hello world")
<kafka.producer.future.FutureRecordMetadata object at 0x7f4bf56fbda0>
>>> producer.send("test",b"Hello world")
<kafka.producer.future.FutureRecordMetadata object at 0x7f4bf5715438>
Start a Kafka console consumer:
kafka-console-consumer --zookeeper master:2181 --from-beginning --topic test
Output:
Hello world
Hello world
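The test above sends bytes literals (`b"Hello world"`) because, without a serializer, kafka-python expects message values to already be bytes. One way to send ordinary strings is a small conversion function, which `KafkaProducer` can accept through its `value_serializer` parameter. The helper name `to_bytes` is hypothetical, and the producer wiring is shown only as a comment since it needs a running broker.

```python
def to_bytes(value):
    """Encode a value as UTF-8 bytes; bytes are passed through unchanged."""
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8")


# Possible wiring with kafka-python (not executed here, requires a broker):
#   producer = KafkaProducer(bootstrap_servers=["master:9092"],
#                            value_serializer=to_bytes)
#   producer.send("test", "Hello world")

print(to_bytes("Hello world"))
```

Centralizing the encoding this way keeps `.encode('utf-8')` calls out of the sending loop.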
Step 2:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : ParsePS.py
# @Author: cjj
# @Date  : 2019/6/4
# @Desc  : Request the interface, fetch the data, and clean it
import re
import time
from urllib.error import URLError

from kafka import KafkaProducer
from kafka.errors import KafkaError
from suds.client import Client


class Data_clean:
    # Fetch and clean the data for one measurement point
    @staticmethod
    def get_data(observation_point_name):
        try:
            # Fetch the data from the interface
            user_url = 'http://xxx.xxx.xxx.xxx/ServiceSL/ServiceGetInsqlData.svc?wsdl'
            client = Client(user_url)
            result = client.service.GetSingleTagInfo(observation_point_name)
            # 1. Clean the data
            # 1.1 Convert the result to a string
            str1 = str(result)
            # 1.2 Extract everything inside double quotes, then turn the list into a string
            pattern = re.compile('"(.*)"')
            str2 = str(pattern.findall(str1))
            # 1.3 Remove the single quotes
            str3 = str2.replace('\'', '')
            # 1.4 Replace the comma separators with tabs
            str4 = str3.replace(', ', '\t')
            # 1.5 Strip the surrounding []
            str5 = str4[1:-1]
            return str5
        except TimeoutError:
            print("\033[1;31;0m>>>>>>TimeoutError ->->->->->-> request to the interface timed out<<<<<<\033[0m")
        except URLError:
            print("\033[1;31;0m>>>>>>URLError ->->->->->-> cannot connect to the SQL server<<<<<<\033[0m")
        except Exception:
            print("\033[1;31;0m>>>>>>failed for some other reason<<<<<<\033[0m")
        # Reached only when the request failed
        return None


producer = None
try:
    producer = KafkaProducer(bootstrap_servers='master:9092')
    while True:
        msg = Data_clean.get_data("SLWS_ps_1hzybqz_WD.PV")
        print(msg)
        # Specify the topic and payload, then send the data to Kafka;
        # skip this cycle if the request failed and returned None
        if msg is not None:
            producer.send('test', msg.encode('utf-8'))
        time.sleep(5)
except KafkaError as e:
    print(e)
finally:
    # The producer only exists if the connection succeeded
    if producer is not None:
        producer.close()
    print('done!!!')
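The cleaning steps in the script (1.1 to 1.5) build a list, convert it to its string form, and then strip the list punctuation again. A shorter way to get the same tab-separated result is to join the matches directly. The sketch below is an alternative under stated assumptions, not the original method: the function name `clean` is hypothetical, and the sample input is invented for illustration, since the real format of the service response is not shown here.

```python
import re


def clean(raw):
    """Return every double-quoted value in raw, joined by tabs.

    Uses the same pattern as the script: '.' does not match newlines,
    so each line contributes at most one (greedy) match.
    """
    values = re.findall(r'"(.*)"', raw)
    return "\t".join(values)


# Invented sample text; the actual service response may look different.
sample = 'TagName = "SLWS_ps_1hzybqz_WD.PV"\nValue = "23.5"'
print(repr(clean(sample)))
```

Because the pattern is greedy, a line holding two quoted values would be captured as a single match; `"(.*?)"` would split them if that matters for the real data.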
Upload the code to the Linux server.
Run it: python3 ParsePS.py
Check the Kafka consumer output: