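I wrote an HTTP Table Source for Flink.
Reference, from the official documentation: [User-defined Sources & Sinks](https://nightlies.apache.org/flink/flink-docs-release-1.14/zh/docs/dev/table/sourcessinks/)
The linked documentation illustrates the structure of a Flink Table connector, which spans a metadata layer, a planning layer, and a runtime layer.
A custom source needs the following:
1. Implement the runtime SourceFunction
2. Implement the planner-side DynamicTableSourceFactory and DynamicTableSource
First, here is the table DDL that the finished connector supports: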

    create table cust_http_source(
        id string
        ,name string
        ,sex string
    ) WITH (
        'connector' = 'http'
        ,'http.url' = 'http://localhost:8888'
        ,'http.interval' = '1000'
        ,'format' = 'csv'
    )

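## 1. Define the SourceFunction
I found a demo online that sends HTTP requests and changed it slightly: the URL is now passed in as a parameter, and the method returns the data sent back by the HTTP server.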

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class HttpClientUtil {

        /** Sends a GET request; returns the body, or null on a non-200 status or an I/O error. */
        public static String doGet(String httpurl) throws IOException {
            HttpURLConnection connection = null;
            InputStream is = null;
            BufferedReader br = null;
            // response body
            String result = null;
            try {
                // create the URL object for the remote endpoint
                URL url = new URL(httpurl);
                // open a connection and cast it to HttpURLConnection
                connection = (HttpURLConnection) url.openConnection();
                // request method: GET
                connection.setRequestMethod("GET");
                // connect timeout: 15000 ms
                connection.setConnectTimeout(15000);
                // read timeout: 60000 ms
                connection.setReadTimeout(60000);
                // send the request
                connection.connect();
                // read the body only when the server answers 200
                if (connection.getResponseCode() == 200) {
                    is = connection.getInputStream();
                    // wrap the input stream and set the charset
                    br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
                    // collects the response lines
                    StringBuilder sb = new StringBuilder();
                    String temp;
                    while ((temp = br.readLine()) != null) {
                        sb.append(temp);
                        sb.append("\r\n");
                    }
                    result = sb.toString();
                }
            } catch (IOException e) {
                // also covers MalformedURLException, a subclass of IOException
                e.printStackTrace();
            } finally {
                // release resources
                if (null != br) {
                    try {
                        br.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                if (null != is) {
                    try {
                        is.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                // connection is still null if the URL could not be parsed
                if (null != connection) {
                    connection.disconnect();
                }
            }
            return result;
        }
    }

* Apologies: I no longer know which blog I copied this from. It was a while ago and I can't find the source.
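The manual close calls in the `finally` block above could also be written with try-with-resources. Here is a minimal sketch of just the stream-reading part. The class `StreamReadSketch` and the method `readAll` are names I made up for illustration, and wrapping the `IOException` in an `UncheckedIOException` is my own choice, not something the original demo does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original demo.
class StreamReadSketch {

    /** Reads the stream line by line as UTF-8 and ends every line with "\r\n", as doGet does. */
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        // the reader (and the wrapped stream) is closed automatically, even on errors
        try (BufferedReader br =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append("\r\n");
            }
        } catch (IOException e) {
            // assumption: callers prefer an unchecked exception over a null result
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

In `doGet` this would be called as `readAll(connection.getInputStream())` inside the status-code check, leaving only `connection.disconnect()` in the `finally` block.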
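The SourceFunction itself is simple: extend RichSourceFunction and implement its methods. It receives the values of the table properties. For the format I reuse what Flink already ships, so the source also takes a deserializer.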
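
    import java.nio.charset.StandardCharsets;

    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
    import org.apache.flink.table.data.RowData;

    public class HttpSource extends RichSourceFunction<RowData> {

        private volatile boolean isRunning = true;
        private final String url;
        private final long requestInterval;
        private final DeserializationSchema<RowData> deserializer;
        // counts emitted events
        private transient Counter counter;

        public HttpSource(String url, long requestInterval, DeserializationSchema<RowData> deserializer) {
            this.url = url;
            this.requestInterval = requestInterval;
            this.deserializer = deserializer;
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            // register the counter with the operator's metric group
            this.counter = getRuntimeContext()
                    .getMetricGroup()
                    .counter("myCounter");
        }

        @Override
        public void run(SourceContext<RowData> ctx) throws Exception {
            while (isRunning) {
                try {
                    // fetch the HTTP response body (CSV in this example)
                    String message = HttpClientUtil.doGet(url);
                    // doGet returns null on failure, so skip this round
                    if (message != null) {
                        // deserialize the message into RowData and emit it
                        ctx.collect(deserializer.deserialize(message.getBytes(StandardCharsets.UTF_8)));
                        this.counter.inc();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                // wait before the next request, also after a failed one
                Thread.sleep(requestInterval);
            }
        }

        @Override
        public void cancel() {
            isRunning = false;
        }
    }
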
The source receives data in the format named by the table's 'format' property, deserializes it into RowData, and emits it from the SourceFunction.
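Stripped of the Flink types, the control flow of the source is a polling loop guarded by a volatile flag. The sketch below shows only that pattern. `PollingLoopSketch`, `fetch`, and `emit` are hypothetical names standing in for `HttpSource`, `HttpClientUtil.doGet`, and `ctx.collect`, and skipping null responses while still sleeping after a failed poll is an assumption about the desired behavior.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical, Flink-free stand-in for the loop inside a polling source.
class PollingLoopSketch {

    // volatile so that cancel(), called from another thread, is seen by run()
    private volatile boolean running = true;
    private final Supplier<String> fetch;
    private final Consumer<String> emit;
    private final long intervalMillis;

    PollingLoopSketch(Supplier<String> fetch, Consumer<String> emit, long intervalMillis) {
        this.fetch = fetch;
        this.emit = emit;
        this.intervalMillis = intervalMillis;
    }

    void run() {
        while (running) {
            try {
                String message = fetch.get();
                // a null response means "nothing this round", so nothing is emitted
                if (message != null) {
                    emit.accept(message);
                }
            } catch (RuntimeException e) {
                // a failed poll is reported and the loop carries on
                System.err.println("poll failed: " + e);
            }
            try {
                // sleep after every round, so a failing endpoint is not hammered
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // restore the interrupt flag and stop polling
                Thread.currentThread().interrupt();
                running = false;
            }
        }
    }

    void cancel() {
        running = false;
    }
}
```

Keeping the sleep outside the first `try` block is the design choice that matters here: if the request throws, the loop still waits for one interval instead of retrying immediately.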
## 2. Define the TableSource
HttpDynamicTableSource implements ScanTableSource. It receives the values of the table properties, creates the matching deserializer from the format, and creates the HttpSource.

    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;
    import org.apache.flink.table.connector.ChangelogMode;
    import org.apache.flink.table.connector.format.DecodingFormat;
    import org.apache.flink.table.connector.source.DynamicTableSource;
    import org.apache.flink.table.connector.source.ScanTableSource;
    import org.apache.flink.table.connector.source.SourceFunctionProvider;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.types.DataType;

    public class HttpDynamicTableSource implements ScanTableSource {

        private final String url;
        private final long interval;
        private final DecodingFormat<DeserializationSchema<RowData>> decodingFormat;
        private final DataType producedDataType;

        public HttpDynamicTableSource(
                String url,
                long interval,
                DecodingFormat<DeserializationSchema<RowData>> decodingFormat,
                DataType producedDataType) {
            this.url = url;
            this.interval = interval;
            this.decodingFormat = decodingFormat;
            this.producedDataType = producedDataType;
        }

        @Override
        public ChangelogMode getChangelogMode() {
            // in our example the format decides about the changelog mode
            // but it could also be the source itself
            return decodingFormat.getChangelogMode();
        }

        @Override
        public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderContext) {
            // create runtime classes that are shipped to the cluster
            final DeserializationSchema<RowData> deserializer = decodingFormat.createRuntimeDecoder(
                    runtimeProviderContext,
                    producedDataType);

            final SourceFunction<RowData> sourceFunction = new HttpSource(url, interval, deserializer);

            // false: the source is unbounded
            return SourceFunctionProvider.of(sourceFunction, false);
        }

        @Override
        public DynamicTableSource copy() {
            return new HttpDynamicTableSource(url, interval, decodingFormat, producedDataType);
        }

        @Override
        public String asSummaryString() {
            return "Http Table Source";
        }
    }

## 3. Define the TableSourceFactory
HttpDynamicTableFactory implements the DynamicTableSourceFactory interface, declares ConfigOptions for the required properties http.url and http.interval, and creates the HttpDynamicTableSource.

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;
    import org.apache.flink.configuration.ReadableConfig;
    import org.apache.flink.table.connector.format.DecodingFormat;
    import org.apache.flink.table.connector.source.DynamicTableSource;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.factories.DeserializationFormatFactory;
    import org.apache.flink.table.factories.DynamicTableSourceFactory;
    import org.apache.flink.table.factories.FactoryUtil;
    import org.apache.flink.table.types.DataType;

    public class HttpDynamicTableFactory implements DynamicTableSourceFactory {

        // define all options statically
        public static final ConfigOption<String> URL = ConfigOptions.key("http.url")
                .stringType()
                .noDefaultValue();

        public static final ConfigOption<Long> INTERVAL = ConfigOptions.key("http.interval")
                .longType()
                .noDefaultValue();

        @Override
        public String factoryIdentifier() {
            return "http"; // used for matching to `connector = '...'`
        }

        @Override
        public Set<ConfigOption<?>> requiredOptions() {
            final Set<ConfigOption<?>> options = new HashSet<>();
            options.add(URL);
            options.add(INTERVAL);
            options.add(FactoryUtil.FORMAT); // use pre-defined option for format
            return options;
        }

        @Override
        public Set<ConfigOption<?>> optionalOptions() {
            // this connector has no optional options
            return new HashSet<>();
        }

        @Override
        public DynamicTableSource createDynamicTableSource(Context context) {
            // either implement your custom validation logic here ...
            // or use the provided helper utility
            final FactoryUtil.TableFactoryHelper helper = FactoryUtil.createTableFactoryHelper(this, context);

            // discover a suitable decoding format
            final DecodingFormat<DeserializationSchema<RowData>> decodingFormat = helper.discoverDecodingFormat(
                    DeserializationFormatFactory.class,
                    FactoryUtil.FORMAT);

            // validate all options
            helper.validate();

            // get the validated options
            final ReadableConfig options = helper.getOptions();
            final String url = options.get(URL);
            final long interval = options.get(INTERVAL);

            // derive the produced data type (excluding computed columns) from the catalog table
            final DataType producedDataType =
                    context.getCatalogTable().getResolvedSchema().toPhysicalRowDataType();

            // create and return dynamic table source
            return new HttpDynamicTableSource(url, interval, decodingFormat, producedDataType);
        }
    }

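The comment in `createDynamicTableSource` mentions custom validation logic. As far as I can tell, the helper's `validate()` checks that required options are present and that no unknown keys are used, but it does not check whether the values make sense. Below is a minimal sketch of such a value check in plain Java. `HttpOptionSketch` and its rules (positive interval, http or https scheme) are my own assumptions, not something the connector in this post does.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical value checks for 'http.url' and 'http.interval'.
class HttpOptionSketch {

    /** Throws IllegalArgumentException if the option values are unusable. */
    static void validate(String url, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException(
                    "'http.interval' must be positive, got " + intervalMillis);
        }
        if (url == null) {
            throw new IllegalArgumentException("'http.url' must not be null");
        }
        final URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("'http.url' is not a valid URI: " + url, e);
        }
        // assumption: only http and https endpoints are supported
        String scheme = uri.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            throw new IllegalArgumentException("'http.url' must use http or https: " + url);
        }
    }
}
```

In the factory this could be called right after reading `url` and `interval` from the validated options, so that a bad value fails when the table is planned rather than when the job is already running.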
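By default, Flink discovers factory instances through Java's Service Provider Interface (SPI), so the fully qualified class name of HttpDynamicTableFactory has to be added to META-INF/services/org.apache.flink.table.factories.Factory. The line below is the socket connector's entry from the same project; the entry for this connector has the same form, with the package and class name of your HttpDynamicTableFactory:

    com.rookie.submit.cust.source.socket.SocketDynamicTableFactory
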
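The SPI lookup is done with `java.util.ServiceLoader`. The sketch below shows the mechanism with a stand-in interface. `ConnectorFactory` and `SpiDiscoverySketch` are hypothetical names and are not Flink classes; Flink's real interface is `org.apache.flink.table.factories.Factory`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for Flink's Factory interface.
interface ConnectorFactory {
    String factoryIdentifier();
}

class SpiDiscoverySketch {

    /** Collects the identifiers of all implementations registered under META-INF/services. */
    static List<String> discoverIdentifiers() {
        List<String> ids = new ArrayList<>();
        // ServiceLoader reads the file META-INF/services/<interface name> on the classpath
        // and instantiates every class listed in it
        for (ConnectorFactory factory : ServiceLoader.load(ConnectorFactory.class)) {
            ids.add(factory.factoryIdentifier());
        }
        return ids;
    }
}
```

Without a matching services file the loader finds nothing and the list is empty. The same thing happens in Flink when the entry is missing: the planner cannot find a factory for `'connector' = 'http'`.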
## 4. Test
The complete SQL:

    create table cust_http_source(
        id string
        ,name string
        ,sex string
    ) WITH (
        'connector' = 'http'
        ,'http.url' = 'http://localhost:8888'
        ,'http.interval' = '1000'
        ,'format' = 'csv'
    )
    ;

    create table cust_http_sink(
        id string
        ,name string
        ,sex string
    ) WITH (
        'connector' = 'print'
    )
    ;

    insert into cust_http_sink
    select id, name, sex
    from cust_http_source;

The HTTP server accepts HTTP requests and returns a concatenated string:
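
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpHandler;
    // assumption: IOUtils is the Apache Commons IO class
    import org.apache.commons.io.IOUtils;

    /**
     * Creates an HTTP server that listens for requests on a port.
     */
    public class HttpServer {

        public static void main(String[] arg) throws Exception {
            com.sun.net.httpserver.HttpServer server =
                    com.sun.net.httpserver.HttpServer.create(new InetSocketAddress(8888), 10);
            server.createContext("/", new TestHandler());
            server.start();
        }

        static class TestHandler implements HttpHandler {
            @Override
            public void handle(HttpExchange exchange) throws IOException {
                try {
                    // read the submitted form data (POST); unused in this test
                    String postString = IOUtils.toString(exchange.getRequestBody(), StandardCharsets.UTF_8);
                    // a response length of 0 means the body is sent chunked
                    exchange.sendResponseHeaders(200, 0);
                    // one CSV line: timestamp, the literal "name", a random UUID
                    String result = System.currentTimeMillis() + ",name," + UUID.randomUUID();
                    try (OutputStream os = exchange.getResponseBody()) {
                        os.write(result.getBytes(StandardCharsets.UTF_8));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
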
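Each response is a single CSV line whose three fields line up with the columns id, name, and sex of the table. The sketch below makes that mapping explicit with a naive split. `CsvLineSketch` is a hypothetical helper for illustration only; in the job itself the parsing is done by Flink's csv format, which also handles quoting and escaping that this sketch ignores.

```java
// Hypothetical helper that mirrors the id, name, sex columns of the table.
class CsvLineSketch {

    /** Splits one response line into exactly three fields: id, name, sex. */
    static String[] parse(String line) {
        // trim() drops the trailing "\r\n" that doGet appends to each line
        String[] fields = line.trim().split(",", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                    "expected 3 fields (id, name, sex) but got " + fields.length);
        }
        return fields;
    }
}
```

For the response `1633921534798,name,ce1738aa-42e4-4cad-b29a-a011db7cd91a`, the timestamp lands in id, the literal `name` in name, and the UUID in sex, which matches the rows printed by the sink.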
Start the job. The print sink shows the received data:

    +I[1633921534798, name, ce1738aa-42e4-4cad-b29a-a011db7cd91a]
    +I[1633921535813, name, e3b9e51a-f6f4-410e-b2eb-5353b2c1b294]
    +I[1633921536816, name, f0dd1f7d-d7c5-4520-a147-3db8c8d5d153]
    +I[1633921537818, name, 4b5461be-b979-48cb-ae3e-375568bfbf06]
    +I[1633921538820, name, 8c2a80e0-39f8-4f6b-b573-885d1109ac3a]
    +I[1633921539823, name, 3b324fa9-d6a6-4156-ab0a-888ee3fe02ce]
    +I[1633921540826, name, e6247826-8e54-40a4-8571-1d3b43419211]

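Done.
* Note: this HTTP table source is modeled on the [socket table source](https://nightlies.apache.org/flink/flink-docs-release-1.14/zh/docs/dev/table/sourcessinks/#full-stack-example) example in the official documentation.
* Note: the HTTP server must stay up while the job is running.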
The complete example is on GitHub: https://github.com/springMoon/sqlSubmit