FlinkCDC使用
环境
版本
flink-1.16.0-bin-scala_2.12.tgz
复制jar
flink-sql-connector-mysql-cdc-2.3.0.jar:监听MySQL数据变更。
flink-sql-connector-tidb-cdc-2.3.0.jar:监听TiDB数据变更。
flink-connector-jdbc-1.16.0.jar:连接MySQL,并将数据写入MySQL。
flink-sql-connector-kafka-1.16.2.jar:连接Kafka,用于消费、生产。
复制到${flink_home}/lib。
MySQL环境配置
用户赋权
此用户只用于监听,无写权限。
mysql> CREATE USER 'debezium'@'%' IDENTIFIED BY '1qazXSW@';
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%';
mysql> FLUSH PRIVILEGES;
日志参数配置
检查是否开启binlog
show variables like 'log_bin';
如果为OFF,需在MySQL配置文件my.ini/my.cnf中添加:
server-id=223344
log_bin=mysql-bin
binlog_format=ROW
binlog_row_image=FULL
expire_logs_days=10
gtid-mode=ON
enforce-gtid-consistency
binlog_rows_query_log_events=ON
配置 | 描述 |
---|---|
server-id | 对于MySQL集群中的每个服务器和复制客户端,server-id的值必须是唯一的。在MySQL连接器设置期间,Debezium为连接器分配一个唯一的服务器ID。 |
log_bin | log_bin的值是二进制日志文件序列的基本名称。 |
binlog_format | binlog_format必须设置为ROW或row。 |
binlog_row_image | binlog_row_image必须设置为FULL或full。 |
expire_logs_days | 自动删除binlog文件的天数。默认值为0,表示不自动删除。 |
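配置完成并重启MySQL后,可以用类似下面的语句确认各参数是否生效(仅为示意):
mysql> show variables like 'log_bin';
mysql> show variables like 'binlog_format';
mysql> show variables like 'binlog_row_image';
mysql> show variables like 'gtid_mode';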
本地环境测试
启动、配置
mysql中创建表,并向表中添加测试数据
create table source_user
(
id int auto_increment
primary key,
name varchar(10) null,
dept_id int null,
salary decimal(10, 4) null,
create_time datetime default CURRENT_TIMESTAMP not null
);
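测试数据可以用类似下面的语句插入(取值仅为示意);后文用到的字典表dept_dic以及JDBC目标表target_user_aftermap也需要提前在MySQL中创建好:
mysql> insert into source_user (name, dept_id, salary) values ('kk', 111, 1234.5600);
mysql> insert into source_user (name, dept_id, salary) values ('hh', 333, 1234.5600);
mysql> create table dept_dic (local_id int primary key, center_id int);
mysql> insert into dept_dic (local_id, center_id) values (111, 1), (333, 3);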
切换到${flink_home}/bin
[root@localhost bin]# ./start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host localhost.localdomain.
Starting taskexecutor daemon on host localhost.localdomain.
[root@localhost bin]# ./sql-client.sh
...
Command history file path: /root/.flink-sql-history
Flink SQL> show tables;
Empty set
Flink SQL> show databases;
+------------------+
| database name |
+------------------+
| default_database |
+------------------+
1 row in set
Flink SQL> use default_database;
[INFO] Execute statement succeed.
Flink SQL> show tables;
Empty set
Flink SQL>
CREATE TABLE source_user(
id INT,
name STRING,
dept_id INT,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc' ,
'hostname' = 'localhost',
'port' = '3306',
'username' = 'debezium',
'password' = '1qazXSW@',
'database-name' = 'test',
'table-name' = 'source_user'
);
[INFO] Execute statement succeed.
Flink SQL>
CREATE TABLE target_user_aftermap (
id INT,
name STRING,
dept_id INT,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://localhost:3306/test',
'driver' = 'com.mysql.cj.jdbc.Driver',
'username' = 'root',
'password' = '111111',
'table-name' = 'target_user_aftermap'
);
[INFO] Execute statement succeed.
Flink SQL>
CREATE TABLE dept_dic (
local_id INT,
center_id INT,
PRIMARY KEY (local_id) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://localhost:3306/test',
'driver' = 'com.mysql.cj.jdbc.Driver',
'username' = 'root',
'password' = '111111',
'table-name' = 'dept_dic'
);
[INFO] Execute statement succeed.
Flink SQL>
insert into target_user_aftermap
(id, name, dept_id)
select a.id,a.name,b.center_id from source_user as a ,dept_dic as b where a.dept_id=b.local_id;
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: a9ff88da532fd7fa7a6a32d4e35aee12
CDC配置参数 | 描述 |
---|---|
connector | 连接器名称 |
hostname | 监听数据库地址 |
port | 监听数据库端口 |
username | 监听数据库用户(注意用户权限,使用新创建、赋权的用户) |
password | 监听数据库密码 |
database-name | 监听数据库名(支持正则表达式) |
table-name | 监听数据表名(支持正则表达式) |
更多参数参考:debezium官网。
JDBC配置参数 | 描述 |
---|---|
connector | 连接器名称 |
url | 连接数据库url |
driver | 使用驱动 |
username | 数据库用户(如果需要输出到表,注意用户权限) |
password | 数据库密码 |
table-name | 连接数据表名 |
WEB界面
作业详细内容:
无法访问服务器的8081端口时,需要修改配置文件${flink_home}/conf/flink-conf.yaml,并重启。
修改前:
rest.bind-address: localhost
修改后:
rest.bind-address: 0.0.0.0
数据类型
Data Type | Remarks for Data Type |
---|---|
CHAR | |
VARCHAR | |
STRING | |
BOOLEAN | |
BYTES | BINARY and VARBINARY are not supported yet. |
DECIMAL | Supports fixed precision and scale. |
TINYINT | |
SMALLINT | |
INTEGER | |
BIGINT | |
FLOAT | |
DOUBLE | |
DATE | |
TIME | Supports only a precision of 0. |
TIMESTAMP | |
TIMESTAMP_LTZ | |
INTERVAL | Supports only interval of MONTH and SECOND(3). |
ARRAY | |
MULTISET | |
MAP | |
ROW | |
RAW | |
structured types | Only exposed in user-defined functions yet. |
时间类型
使用TIMESTAMP类型。
Flink SQL> CREATE TABLE target_user(
> id INT,
> name STRING,
> dept_id INT,
> salary DECIMAL(10,4),
> create_time TIMESTAMP,
> PRIMARY KEY (id) NOT ENFORCED
> ) WITH (
> 'connector' = 'mysql-cdc' ,
> 'hostname' = 'localhost',
> 'port' = '3306',
> 'username' = 'debezium',
> 'password' = '1qazXSW@',
> 'database-name' = 'test',
> 'table-name' = 'target_user'
> );
[INFO] Execute statement succeed.
Flink SQL> SELECT * FROM target_user;
id name dept_id salary create_time
2 ch 1 1234.5600 2023-08-21 15:40:51.000000
3 hh 3 1234.5600 2023-08-21 15:40:51.000000
时间字段使用STRING类型不会报错,但查询结果为时间戳类型。DECIMAL类型也需要将精度补充完整。
时间类型设置为BIGINT时,使用TO_TIMESTAMP_LTZ转换。(此时未设置时区)
select TO_TIMESTAMP_LTZ(create_time,3) as var1 from target_user3;
var1
-----------------------
2023-08-21 23:40:51.000
2023-08-21 23:40:51.000
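按上文做法,target_user3的建表语句大致如下(结构为示意,连接参数沿用前文示例,时间列按原文做法声明为BIGINT):
CREATE TABLE target_user3(
id INT,
name STRING,
dept_id INT,
salary DECIMAL(10,4),
create_time BIGINT,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc' ,
'hostname' = 'localhost',
'port' = '3306',
'username' = 'debezium',
'password' = '1qazXSW@',
'database-name' = 'test',
'table-name' = 'target_user'
);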
设置时区,获取当前时间
Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';
[INFO] Session property has been set.
Flink SQL> select current_date;
2023-08-22
Flink SQL> select current_time;
11:32:07
Flink SQL> select current_timestamp;
2023-08-22 11:32:39.888
CDC参数
可以将部分从数据库日志中解析出的内容作为表的字段。
例:
CREATE TABLE products (
db_name STRING METADATA FROM 'database_name' VIRTUAL,
table_name STRING METADATA FROM 'table_name' VIRTUAL,
operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
'connector' = 'tidb-cdc',
'tikv.grpc.timeout_in_ms' = '20000',
'pd-addresses' = 'localhost:2379',
'database-name' = 'mydb',
'table-name' = 'orders'
);
CDC参数 | 必填 | 默认值 | 类型 | 描述 |
---|---|---|---|---|
connector | Y | (none) | String | Specify what connector to use, here should be 'mysql-cdc' . |
hostname | Y | (none) | String | IP address or hostname of the MySQL database server. |
username | Y | (none) | String | Name of the MySQL database to use when connecting to the MySQL database server. |
password | Y | (none) | String | Password to use when connecting to the MySQL database server. |
database-name | Y | (none) | String | Database name of the MySQL server to monitor. The database-name also supports regular expressions to monitor multiple tables matches the regular expression. |
table-name | Y | (none) | String | Table name of the MySQL database to monitor. The table-name also supports regular expressions to monitor multiple tables that satisfy the regular expressions. Note: When the MySQL CDC connector regularly matches the table name, it will concat the database-name and table-name filled in by the user through the string \\. to form a full-path regular expression, and then use the regular expression to match the fully qualified name of the table in the MySQL database. |
port | N | 3306 | Integer | Integer port number of the MySQL database server. |
server-id | N | (none) | String | A numeric ID or a numeric ID range of this database client, The numeric ID syntax is like '5400', the numeric ID range syntax is like '5400-5408', The numeric ID range syntax is recommended when 'scan.incremental.snapshot.enabled' enabled. Every ID must be unique across all currently-running database processes in the MySQL cluster. This connector joins the MySQL cluster as another server (with this unique ID) so it can read the binlog. By default, a random number is generated between 5400 and 6400, though we recommend setting an explicit value. |
scan.incremental.snapshot.enabled | N | true | Boolean | Incremental snapshot is a new mechanism to read snapshot of a table. Compared to the old snapshot mechanism, the incremental snapshot has many advantages, including: (1) source can be parallel during snapshot reading, (2) source can perform checkpoints in the chunk granularity during snapshot reading, (3) source doesn't need to acquire global read lock (FLUSH TABLES WITH READ LOCK) before snapshot reading. If you would like the source run in parallel, each parallel reader should have an unique server id, so the 'server-id' must be a range like '5400-6400', and the range must be larger than the parallelism. Please see Incremental Snapshot Readingsection for more detailed information. |
scan.incremental.snapshot.chunk.size 读取表快照时,捕获的表的块大小(行数)被分割成多个块。 | N | 8096 | Integer | The chunk size (number of rows) of table snapshot, captured tables are split into multiple chunks when read the snapshot of table. |
scan.snapshot.fetch.size 获取快照时,每次轮询获取数据条数 | N | 1024 | Integer | The maximum fetch size for per poll when read table snapshot. |
scan.startup.mode 启动模式,如果设置获取表的全部信息作为快照,表数据量很大,需要先调整会话连接时长限制。 | N | initial | String | Optional startup mode for MySQL CDC consumer, valid enumerations are "initial", "earliest-offset", "latest-offset", "specific-offset" and "timestamp". Please see Startup Reading Position section for more detailed information. |
scan.startup.specific-offset.file | N | (none) | String | Optional binlog file name used in case of "specific-offset" startup mode |
scan.startup.specific-offset.pos | N | (none) | Long | Optional binlog file position used in case of "specific-offset" startup mode |
scan.startup.specific-offset.gtid-set | N | (none) | String | Optional GTID set used in case of "specific-offset" startup mode |
scan.startup.specific-offset.skip-events | N | (none) | Long | Optional number of events to skip after the specific starting offset |
scan.startup.specific-offset.skip-rows | N | (none) | Long | Optional number of rows to skip after the specific starting offset |
server-time-zone | N | (none) | String | The session time zone in database server, e.g. "Asia/Shanghai". It controls how the TIMESTAMP type in MYSQL converted to STRING. See more here. If not set, then ZoneId.systemDefault() is used to determine the server time zone. |
debezium.min.row. count.to.stream.result | N | 1000 | Integer | During a snapshot operation, the connector will query each included table to produce a read event for all rows in that table. This parameter determines whether the MySQL connection will pull all results for a table into memory (which is fast but requires large amounts of memory), or whether the results will instead be streamed (can be slower, but will work for very large tables). The value specifies the minimum number of rows a table must contain before the connector will stream results, and defaults to 1,000. Set this parameter to '0' to skip all table size checks and always stream all results during a snapshot. |
connect.timeout 连接器在尝试连接到 MySQL 数据库服务器之后等待超时的最长时间。 | N | 30s | Duration | The maximum time that the connector should wait after trying to connect to the MySQL database server before timing out. |
connect.max-retries 连接器重试构建 MySQL 数据库服务器连接的最大重试次数。 | N | 3 | Integer | The max retry times that the connector should retry to build MySQL database server connection. |
connection.pool.size 连接池大小 | N | 20 | Integer | The connection pool size. |
jdbc.properties.* | N | (none) | String | Option to pass custom JDBC URL properties. User can pass custom properties like 'jdbc.properties.useSSL' = 'false'. |
heartbeat.interval 心跳监测间隔 | N | 30s | Duration | The interval of sending heartbeat event for tracing the latest available binlog offsets. |
debezium.* 详见Debezium官网文档 | N | (none) | String | Pass-through Debezium's properties to Debezium Embedded Engine which is used to capture data changes from MySQL server. For example: 'debezium.snapshot.mode' = 'never' . See more about the Debezium's MySQL Connector properties |
scan.incremental.close-idle-reader.enabled | N | false | Boolean | Whether to close idle readers at the end of the snapshot phase. The flink version is required to be greater than or equal to 1.14 when 'execution.checkpointing.checkpoints-after-tasks-finish.enabled' is set to true. |
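结合上表,一个带常用可选参数的mysql-cdc建表示例如下(参数取值仅为示意):
CREATE TABLE source_user_full(
id INT,
name STRING,
dept_id INT,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'debezium',
'password' = '1qazXSW@',
'database-name' = 'test',
'table-name' = 'source_user',
'server-id' = '5400-5408',
'server-time-zone' = 'Asia/Shanghai',
'scan.startup.mode' = 'initial',
'scan.incremental.snapshot.enabled' = 'true'
);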
数据处理模式
StreamingMode:适用于连续增量处理、预计无限期保持在线的无边界作业。
BatchMode:适用于有已知固定输入、不会连续运行的有边界作业。
// EnvironmentSettings.Builder 中设置运行模式的两个方法
public EnvironmentSettings.Builder inBatchMode() {
    this.configuration.set(ExecutionOptions.RUNTIME_MODE, RuntimeExecutionMode.BATCH);
    return this;
}

public EnvironmentSettings.Builder inStreamingMode() {
    this.configuration.set(ExecutionOptions.RUNTIME_MODE, RuntimeExecutionMode.STREAMING);
    return this;
}
提交作业到集群时设置处理模式(推荐)
bin/flink run -Dexecution.runtime-mode=BATCH <jarFile>
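在sql-client中也可以通过SET切换运行模式(示意):
Flink SQL> SET 'execution.runtime-mode' = 'batch';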
报错
服务器资源不足
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
语法错误
[ERROR] Could not execute SQL statement. Reason:
org.apache.calcite.runtime.CalciteException: Non-query expression encountered in illegal context
表达式类型错误
[ERROR] Could not execute SQL statement. Reason:
org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply 'TO_TIMESTAMP_LTZ' to arguments of type 'TO_TIMESTAMP_LTZ(<TIME(0)>, <INTEGER>)'. Supported form(s): 'TO_TIMESTAMP_LTZ(<NUMERIC>, <INTEGER>)'
输入输出字段类型不一致
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Column types of query result and sink for 'default_catalog.default_database.target_user11' do not match.
Cause: Incompatible types for sink column 'create_time' at position 4.
Query schema: [id: INT NOT NULL, name: STRING, dept_id: INT, salary: DECIMAL(10, 4), create_time: TIMESTAMP_LTZ(3)]
Sink schema: [id: INT, name: STRING, dept_id: INT, salary: DECIMAL(10, 4), create_time: STRING]
类型不支持
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: The MySQL dialect doesn't support type: TIMESTAMP_LTZ(6).
字符大小写问题
TiDB
TiDB术语
CREATE TABLE products (
db_name STRING METADATA FROM 'database_name' VIRTUAL,
table_name STRING METADATA FROM 'table_name' VIRTUAL,
operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
'connector' = 'tidb-cdc',
'tikv.grpc.timeout_in_ms' = '20000',
'pd-addresses' = 'localhost:2379',
'database-name' = 'mydb',
'table-name' = 'orders'
);
参数 | 必填 | 默认值 | 类型 | 描述 |
---|---|---|---|---|
connector | Y | (none) | String | Specify what connector to use, here should be 'tidb-cdc' . |
database-name | Y | (none) | String | Database name of the TiDB server to monitor. |
table-name | Y | (none) | String | Table name of the TiDB database to monitor. |
scan.startup.mode | N | initial | String | Optional startup mode for TiDB CDC consumer, valid enumerations are "initial" and "latest-offset".是否需要加载表中历史数据。initial是,latest-offset否 |
pd-addresses | Y | (none) | String | TiKV cluster's PD address.PD地址 |
tikv.grpc.timeout_in_ms | N | (none) | Long | TiKV GRPC timeout in ms. grpc的连接超时时间。 |
tikv.grpc.scan_timeout_in_ms | N | (none) | Long | TiKV GRPC scan timeout in ms.grpc的扫描超时时间。 |
tikv.batch_get_concurrency | N | 20 | Integer | TiKV GRPC batch get concurrency.grpc批处理并发数量。 |
tikv.* | N | (none) | String | Pass-through TiDB client's properties. |
TiDB测试
[root@slave3 flink-1.16.0]# pwd
/home/flink/flink-1.16.0
[root@slave3 flink-1.16.0]# netstat -antl | grep 8081
[root@slave3 flink-1.16.0]#
[root@slave3 flink-1.16.0]# cd bin/
[root@slave3 bin]# ./start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host slave3.
Starting taskexecutor daemon on host slave3.
[root@slave3 bin]# jps
933166 jar
2186802 jar
2477040 Jps
2476712 StandaloneSessionClusterEntrypoint
1840610 jar
1822628 jar
[root@slave3 bin]#
注册监听
create table ois_reg_info
(
org_code STRING,
branch_code STRING,
opc_id STRING,
card_type STRING,
card_type_code_org STRING,
card_type_name_org STRING,
card_data STRING,
reg_type STRING,
reg_type_code_org STRING,
reg_type_name_org STRING,
reg_time TIMESTAMP,
reg_source_type STRING,
reg_source_type_code_org STRING,
reg_source_type_name_org STRING,
reg_client_ip STRING,
reg_client_no STRING,
order_source STRING,
order_source_code_org STRING,
order_source_name_org STRING,
resv_sn STRING,
msg_start STRING,
msg_end STRING,
reg_input_empid STRING,
reg_input_empid_code_org STRING,
reg_input_empid_name_org STRING,
reg_dept STRING,
reg_dept_code_org STRING,
reg_dept_name_org STRING,
regdoc_empid STRING,
regdoc_empid_code_org STRING,
regdoc_empid_name_org STRING,
clinic_class STRING,
clinic_class_code_org STRING,
clinic_class_name_org STRING,
invalid_flag STRING,
invalid_empid STRING,
invalid_empid_code_org STRING,
invalid_empid_name_org STRING,
invalid_time TIMESTAMP,
is_eme STRING,
charge_no STRING,
invoice STRING,
paper_invoice STRING,
sumfee DECIMAL(22, 2),
reg_fee DECIMAL(22, 2),
checkup_fee DECIMAL(22, 2),
experts_fee DECIMAL(22, 2),
casecard_fee DECIMAL(22, 2),
card_fee DECIMAL(22, 2),
other_fee DECIMAL(22, 2),
ins_pson_type STRING,
ins_pson_type_code_org STRING,
ins_pson_type_name_org STRING,
serve_way STRING,
serve_way_code_org STRING,
serve_way_name_org STRING,
del_flag STRING,
modify_time_sys TIMESTAMP,
modify_empid STRING,
modify_empid_code_org STRING,
modify_empid_name_org STRING,
create_time_sys TIMESTAMP,
create_empid STRING,
create_empid_code_org STRING,
create_empid_name_org STRING,
modify_time_mfs TIMESTAMP,
create_time_mfs TIMESTAMP,
batch_version STRING,
batch_type STRING,
primary key (org_code, branch_code, opc_id, reg_time) NOT ENFORCED
) WITH (
'connector' = 'tidb-cdc',
'tikv.grpc.timeout_in_ms' = '30000',
'pd-addresses' = 'x.x.x.x:2379',
'database-name' = 'xxxx',
'table-name' = 'ois_reg_info',
'scan.startup.mode' = 'latest-offset'
);
Flink SQL> show tables ;
+--------------+
| table name |
+--------------+
| ois_reg_info |
+--------------+
1 row in set
Flink SQL> select * from ois_reg_info;
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
修改内存参数
####修改前######
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
####修改后######
jobmanager.memory.process.size: 4096m
taskmanager.memory.process.size: 4096m
RESOURCE_PARAMS extraction logs:
jvm_params: -Xmx3462817376 -Xms3462817376 -XX:MaxMetaspaceSize=268435456
dynamic_configs: -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=429496736b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=3462817376b -D jobmanager.memory.jvm-overhead.max=429496736b
logs: INFO [] - Loading configuration property: taskmanager.memory.process.size, 4096m
INFO [] - Loading configuration property: jobmanager.bind-host, localhost
INFO [] - Loading configuration property: taskmanager.bind-host, localhost
INFO [] - Loading configuration property: taskmanager.host, localhost
INFO [] - Loading configuration property: parallelism.default, 1
INFO [] - Loading configuration property: jobmanager.execution.failover-strategy, region
INFO [] - Loading configuration property: jobmanager.rpc.address, localhost
INFO [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
INFO [] - Loading configuration property: rest.address, localhost
INFO [] - Loading configuration property: jobmanager.memory.process.size, 4096m
INFO [] - Loading configuration property: jobmanager.rpc.port, 6123
INFO [] - Loading configuration property: rest.bind-address, 0.0.0.0
INFO [] - Final Master Memory configuration:
INFO [] - Total Process Memory: 4.000gb (4294967296 bytes)
INFO [] - Total Flink Memory: 3.350gb (3597035104 bytes)
INFO [] - JVM Heap: 3.225gb (3462817376 bytes)
INFO [] - Off-heap: 128.000mb (134217728 bytes)
INFO [] - JVM Metaspace: 256.000mb (268435456 bytes)
INFO [] - JVM Overhead: 409.600mb (429496736 bytes)
检查日志
2023-08-23 17:20:16,633 WARN org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolBridge [] - Could not acquire the minimum required resources, failing slot requests. Acquired: []. Current slot pool status: Registered TMs: 0, registered slots: 0 free slots: 0
发现web页面中,available task slots为0,total task slots为0,task managers为0。
修改任务池数量
taskmanager.numberOfTaskSlots: 50 ##默认为1
日志检查
检查flink-root-taskexecutor-0-slave3.log中的报错
Error: VM option 'UseG1GC' is experimental and must be enabled via -XX:+UnlockExperimentalVMOptions.
Error: Could not create the Java Virtual Machine.
。。。。。
删除taskmanager.sh中对应的参数后重启。
目标表
create table flink_ois_reg_info
(
org_code STRING,
branch_code STRING,
opc_id STRING,
card_type STRING,
card_type_code_org STRING,
card_type_name_org STRING,
card_data STRING,
reg_type STRING,
reg_type_code_org STRING,
reg_type_name_org STRING,
reg_time TIMESTAMP,
reg_source_type STRING,
reg_source_type_code_org STRING,
reg_source_type_name_org STRING,
reg_client_ip STRING,
reg_client_no STRING,
order_source STRING,
order_source_code_org STRING,
order_source_name_org STRING,
resv_sn STRING,
msg_start STRING,
msg_end STRING,
reg_input_empid STRING,
reg_input_empid_code_org STRING,
reg_input_empid_name_org STRING,
reg_dept STRING,
reg_dept_code_org STRING,
reg_dept_name_org STRING,
regdoc_empid STRING,
regdoc_empid_code_org STRING,
regdoc_empid_name_org STRING,
clinic_class STRING,
clinic_class_code_org STRING,
clinic_class_name_org STRING,
invalid_flag STRING,
invalid_empid STRING,
invalid_empid_code_org STRING,
invalid_empid_name_org STRING,
invalid_time TIMESTAMP,
is_eme STRING,
charge_no STRING,
invoice STRING,
paper_invoice STRING,
sumfee DECIMAL(22, 2),
reg_fee DECIMAL(22, 2),
checkup_fee DECIMAL(22, 2),
experts_fee DECIMAL(22, 2),
casecard_fee DECIMAL(22, 2),
card_fee DECIMAL(22, 2),
other_fee DECIMAL(22, 2),
ins_pson_type STRING,
ins_pson_type_code_org STRING,
ins_pson_type_name_org STRING,
serve_way STRING,
serve_way_code_org STRING,
serve_way_name_org STRING,
del_flag STRING,
modify_time_sys TIMESTAMP,
modify_empid STRING,
modify_empid_code_org STRING,
modify_empid_name_org STRING,
create_time_sys TIMESTAMP,
create_empid STRING,
create_empid_code_org STRING,
create_empid_name_org STRING,
modify_time_mfs TIMESTAMP,
create_time_mfs TIMESTAMP,
batch_version STRING,
batch_type STRING,
primary key (org_code, branch_code, opc_id, reg_time) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://x.x.x.x:4000/test',
'driver' = 'com.mysql.cj.jdbc.Driver',
'username' = 'xxx',
'password' = 'xxx',
'table-name' = 'flink_ois_reg_info'
);
同步
insert into flink_ois_reg_info select * from ois_reg_info;
Flink SQL> show tables;
+--------------------+
| table name |
+--------------------+
| flink_ois_reg_info |
| ois_reg_info |
+--------------------+
2 rows in set
Flink SQL> insert into flink_ois_reg_info select * from ois_reg_info;
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 6b2dd0310e50d89ae5a6150ab18d6ff4
直到08.25下午查看才发现接收到一条数据,后续再未接收到任何数据。
对接Kafka
监听数据库,字典转换后写入Kafka
Flink SQL> CREATE TABLE source_user(
> id INT,
> name STRING,
> dept_id INT,
> PRIMARY KEY (id) NOT ENFORCED
> ) WITH (
> 'connector' = 'mysql-cdc' ,
> 'hostname' = 'localhost',
> 'port' = '3306',
> 'username' = 'debezium',
> 'password' = '1qazXSW@',
> 'database-name' = 'test',
> 'table-name' = 'source_user'
> );
[INFO] Execute statement succeed.
Flink SQL> select * from source_user;
[INFO] Result retrieval cancelled.
Flink SQL> CREATE TABLE target_user(
> id INT,
> name STRING,
> dept_id INT
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 'test',
> 'properties.bootstrap.servers' = 'x.x.x.x:9092',
> 'properties.group.id' = 'testGroup',
> 'scan.startup.mode' = 'earliest-offset',
> 'format' = 'json'
> );
[INFO] Execute statement succeed.
Flink SQL> insert into target_user select * from source_user;
[INFO] Submitting SQL update statement to the cluster...
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.TableException: Table sink 'default_catalog.default_database.target_user' doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database, source_user]], fields=[id, name, dept_id])
Flink SQL> CREATE TABLE dept_dic (
> local_id INT,
> center_id INT,
> PRIMARY KEY (local_id) NOT ENFORCED
> ) WITH (
> 'connector' = 'jdbc',
> 'url' = 'jdbc:mysql://localhost:3306/test',
> 'driver' = 'com.mysql.cj.jdbc.Driver',
> 'username' = 'root',
> 'password' = '111111',
> 'table-name' = 'dept_dic'
> );
[INFO] Execute statement succeed.
Flink SQL> insert into target_user
> (id, name, dept_id)
> select a.id,a.name,b.center_id from source_user as a ,dept_dic as b where a.dept_id=b.local_id;
[INFO] Submitting SQL update statement to the cluster...
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.TableException: Table sink 'default_catalog.default_database.target_user' doesn't support consuming update and delete changes which is produced by node Join(joinType=[InnerJoin], where=[(dept_id = local_id)], select=[id, name, dept_id, local_id, center_id], leftInputSpec=[HasUniqueKey], rightInputSpec=[JoinKeyContainsUniqueKey])
修改kafka连接器配置
将'format' = 'json'改为'value.format' = 'debezium-json'。
Flink SQL> CREATE TABLE target_user1(
> id INT,
> name STRING,
> dept_id INT
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 'test',
> 'properties.bootstrap.servers' = 'x.x.x.x:9092',
> 'properties.group.id' = 'testGroup',
> 'scan.startup.mode' = 'earliest-offset',
> 'value.format' = 'debezium-json'
> );
[INFO] Execute statement succeed.
Flink SQL> insert into target_user1
> (id, name, dept_id)
> select a.id,a.name,b.center_id from source_user as a ,dept_dic as b where a.dept_id=b.local_id;
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 076a9553a00fa4544b6d3b9db3953b39
模拟源端消费
[root@slave3 kafka]# ./bin/kafka-console-consumer.sh --bootstrap-server x.x.x.x:9092 --topic test --from-beginning
{"before":null,"after":{"id":2,"name":"kk","dept_id":111},"op":"c"}
{"before":null,"after":{"id":3,"name":"hh","dept_id":333},"op":"c"}
{"before":null,"after":{"id":5,"name":"cc","dept_id":222},"op":"c"}
{"before":null,"after":{"id":7,"name":"xx","dept_id":111},"op":"c"}
{"before":null,"after":{"id":8,"name":"gg","dept_id":111},"op":"c"}
{"before":null,"after":{"id":9,"name":"yy","dept_id":222},"op":"c"}
模拟flink对接OGG
模拟生成OGG日志
[root@slave3 kafka]# ./bin/kafka-topics.sh --bootstrap-server x.x.x.x:9092 --create --topic test1
Created topic test1.
[root@slave3 kafka]# ./bin/kafka-console-producer.sh --bootstrap-server 10.76.4.107:9092 --topic test1
>{"table":"target_user2","op_type":"U","op_ts":"2023-08-30 14:43:26.214713","current_ts":"2023-08-30T14:43:32.126020","pos":"00000000050008402642","before":{"id":"23509681B1","name":"B"},"after":{"id":"23509681B1","name":"B"}}
>{"table":"target_user2","op_type":"U","op_ts":"2023-08-30 14:43:26.214713","current_ts":"2023-08-30T14:43:32.126020","pos":"00000000050008402642","before":{"id":"23509681B1","name":"B"},"after":{"id":"23509681B1","name":"B"}}
>
Flink消费OGG生产到kafka日志
OGG与Flink CDC解析出的日志结构不一样,不能直接使用debezium-json格式化。
Flink SQL> CREATE TABLE target_user2(
> id STRING,
> name STRING
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 'test1',
> 'properties.bootstrap.servers' = 'x.x.x.x:9092',
> 'properties.group.id' = 'testGroup',
> 'scan.startup.mode' = 'earliest-offset',
> 'value.format' = 'debezium-json'
> );
[INFO] Execute statement succeed.
Flink SQL> select * from target_user2;
[ERROR] Could not execute SQL statement. Reason:
java.io.IOException: Corrupt Debezium JSON message '{"table":"target_user2","op_type":"U","op_ts":"2023-08-30 14:43:26.214713","current_ts":"2023-08-30T14:43:32.126020","pos":"00000000050008402642","before":{"id":"23509681B1","name":"B"},"after":{"id":"23509681B1","name":"B"}}'.
修改source的format配置参数
Flink SQL> CREATE TABLE target_user3(
> id STRING,
> name STRING
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 'test1',
> 'properties.bootstrap.servers' = 'x.x.x.x:9092',
> 'properties.group.id' = 'testGroup',
> 'scan.startup.mode' = 'earliest-offset',
> 'format' = 'json'
> );
[INFO] Execute statement succeed.
JSON和Flink SQL数据类型映射
Flink SQL 类型 | JSON 类型 |
---|---|
CHAR / VARCHAR / STRING | string |
BOOLEAN | boolean |
BINARY / VARBINARY | string with encoding: base64 |
DECIMAL | number |
TINYINT | number |
SMALLINT | number |
INT | number |
BIGINT | number |
FLOAT | number |
DOUBLE | number |
DATE | string with format: date |
TIME | string with format: time |
TIMESTAMP | string with format: date-time |
TIMESTAMP_WITH_LOCAL_TIME_ZONE | string with format: date-time (with UTC time zone) |
INTERVAL | number |
ARRAY | array |
MAP / MULTISET | object |
ROW | object |
关键字问题
需要使用反引号(`)包裹关键字。
使用数据:
{"table":"target_user","op_type":"U","op_ts":"2023-08-30 14:43:26.214713","current_ts":"2023-08-30T14:43:32.126020","pos":"00000000050008402642","before":{"id":"23509681B1","name":"B"},"after":{"id":"23509681B1","name":"B"}}
{"table":"target_user","op_type":"U","op_ts":"2023-08-30 14:43:26.214713","current_ts":"2023-08-30T14:43:32.126020","pos":"00000000050008402642","before":{"id":"23509681B1","name":"B"},"after":{"id":"23509681B1","name":"B"}}
Flink SQL> CREATE TABLE target_user(
> table STRING,
> op_type STRING,
> before ROW(ZYH STRING,MRCYSHBS STRING),
> after ROW(ZYH STRING,MRCYSHBS STRING)
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 'test1',
> 'properties.bootstrap.servers' = 'x.x.x.x:9092',
> 'properties.group.id' = 'Group0831',
> 'scan.startup.mode' = 'earliest-offset',
> 'format' = 'json'
> );
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.sql.parser.impl.ParseException: Encountered "table" at line 2, column 6.
Was expecting one of:
"CONSTRAINT" ...
"PRIMARY" ...
"UNIQUE" ...
"WATERMARK" ...
<BRACKET_QUOTED_IDENTIFIER> ...
<QUOTED_IDENTIFIER> ...
<BACK_QUOTED_IDENTIFIER> ...
<HYPHENATED_IDENTIFIER> ...
<IDENTIFIER> ...
<UNICODE_QUOTED_IDENTIFIER> ...
Flink SQL> CREATE TABLE target_user1(
> `table` STRING,
> op_type STRING,
> before ROW(),
> after ROW()
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 'test1',
> 'properties.bootstrap.servers' = 'x.x.x.x:9092',
> 'properties.group.id' = 'Group0831',
> 'scan.startup.mode' = 'earliest-offset',
> 'format' = 'json'
> );
[INFO] Execute statement succeed.
Flink SQL> CREATE TABLE target_user1(
> `table` STRING,
> op_type STRING,
> before ROW(id STRING,name STRING),
> after ROW(id STRING,name STRING)
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 'test1',
> 'properties.bootstrap.servers' = 'x.x.x.x:9092',
> 'properties.group.id' = 'Group0831',
> 'scan.startup.mode' = 'earliest-offset',
> 'format' = 'json'
> );
[INFO] Execute statement succeed.
Flink SQL> select * from target_user1;
[INFO] Result retrieval cancelled.
Flink SQL> select before.id,before.name from target_user1;
[INFO] Result retrieval cancelled.
添加CUD类型数据
[root@slave3 kafka]# ./bin/kafka-console-producer.sh --bootstrap-server x.x.x.x:9092 --topic test0831
>{"table":"target_user","op_type":"U","op_ts":"2023-08-30 14:43:26.214713","current_ts":"2023-08-30T14:43:32.126020","pos":"00000000050008402642","before":{"id":"23509681B1","name":"B"},"after":{"id":"23509681B1","name":"B"}}
>{"table":"target_user2","op_type":"I","op_ts":"2023-08-30 14:43:26.214713","current_ts":"2023-08-30T14:43:32.126020","pos":"00000000050008402642","before":null,"after":{"id":"23509681C2","name":"C"}}
>{"table":"target_user2","op_type":"D","op_ts":"2023-08-30 14:43:26.214713","current_ts":"2023-08-30T14:43:32.126020","pos":"00000000050008402642","before":{"id":"23509681A1","name":"A"},"after":null}
>
Flink SQL> CREATE TABLE target_user(
> `table` STRING,
> op_type STRING,
> before ROW(id STRING,name STRING),
> after ROW(id STRING,name STRING)
> ) WITH (
> 'connector' = 'kafka',
> 'topic' = 'test0831',
> 'properties.bootstrap.servers' = 'x.x.x.x:9092',
> 'properties.group.id' = 'Group0831',
> 'scan.startup.mode' = 'earliest-offset',
> 'format' = 'json'
> );
[INFO] Execute statement succeed.
Flink SQL> select * from target_user;
[INFO] Result retrieval cancelled.
Flink SQL> select
> a.`table`,
> a.op_type,
> case
> when op_type = 'I'
> THEN a.after.id
> when op_type = 'D'
> THEN before.id
> when op_type = 'U'
> THEN a.after.id
> ELSE '0' END AS PK1
> from target_user as a ;
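在此基础上,可以把解析出的字段落到一个print表中观察(以下仅为示意,print为Flink内置连接器,表名、字段名均为假设):
CREATE TABLE ogg_flat_print(
table_name STRING,
op_type STRING,
pk1 STRING,
name STRING
) WITH (
'connector' = 'print'
);

insert into ogg_flat_print
select
a.`table`,
a.op_type,
case when a.op_type = 'D' then a.before.id else a.after.id end as pk1,
case when a.op_type = 'D' then a.before.name else a.after.name end as name
from target_user as a;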
参数调优
重启策略
暂时没配置
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
并发问题
parallelism.default: 4
注意:
taskmanager.numberOfTaskSlots: 12
# 5*4=20 > 12,导致无法获取最小执行资源
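也可以在sql-client会话中针对当前作业调小并行度,避免超过可用slot(示意):
Flink SQL> SET 'parallelism.default' = '2';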
心跳超时时间
HEARTBEAT_INTERVAL是心跳检测间隔(默认为10秒),HEARTBEAT_TIMEOUT是心跳超时时长(默认为50秒)。
[源码解析] 从TimeoutException看Flink的心跳机制
心跳超时时间设置-Flink1.9.0源码调试介绍&增加调试超时时间
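对应flink-conf.yaml中的配置项为heartbeat.interval与heartbeat.timeout(取值仅为示意):
# 心跳发送间隔,单位毫秒,默认10000
heartbeat.interval: 10000
# 心跳超时时间,单位毫秒,默认50000
heartbeat.timeout: 50000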
其他参数
Flink jdbc-connector
附带查询条件不会下推到数据库,默认走全表加载,在内存中使用算子过滤;
streamEnv.executeSql("insert into target_user\n" +
"select a.id,a.name,b.center_id,a.salary,a.create_time from source_user as a left join dept_dic as b\n" +
"on a.dept_id = b.local_id;");
LOOKUP JOIN:适用于大维表关联、业务表关联
使用FOR SYSTEM_TIME AS OF a.proctime
optimize result:
FlinkLogicalSink(table=[default_catalog.default_database.target_user], fields=[id, name, center_id, salary, create_time])
+- FlinkLogicalCalc(select=[id, name, center_id, salary, create_time])
+- FlinkLogicalJoin(condition=[=($2, $5)], joinType=[left])
:- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, source_user]], fields=[id, name, dept_id, salary, create_time])
+- FlinkLogicalSnapshot(period=[$cor0.proctime])
+- FlinkLogicalTableSourceScan(table=[[default_catalog, default_database, dept_dic]], fields=[local_id, center_id])
CREATE TABLE source_user(
id INT,
name STRING,
dept_id INT,
salary DECIMAL(10,4),
create_time TIMESTAMP,
proctime AS PROCTIME(),
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc' ,
'hostname' = 'localhost',
'port' = '3306',
'username' = 'debezium',
'password' = '1qazXSW@',
'database-name' = 'test',
'table-name' = 'source_user'
);
CREATE TABLE target_user(
id INT,
name STRING,
dept_id INT,
salary DECIMAL(10,4),
create_time TIMESTAMP,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://localhost:3306/test',
'driver' = 'com.mysql.cj.jdbc.Driver',
'username' = 'root',
'password' = 'root',
'table-name' = 'target_user'
);
CREATE TABLE dept_dic (
local_id INT,
center_id INT,
PRIMARY KEY (local_id) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://localhost:3306/test',
'driver' = 'com.mysql.cj.jdbc.Driver',
'username' = 'root',
'password' = 'root',
'table-name' = 'dept_dic'
);
-- 提交作业
insert into target_user
select a.id,a.name,b.center_id,a.salary,a.create_time from source_user as a
left join dept_dic FOR SYSTEM_TIME AS OF a.proctime as b
on a.dept_id = b.local_id;
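维表的jdbc连接器还可以开启查询缓存,减少对数据库的点查压力(以下为示意,lookup.cache.max-rows/lookup.cache.ttl为1.16中仍可使用的旧版缓存参数,取值自行调整):
CREATE TABLE dept_dic_cached (
local_id INT,
center_id INT,
PRIMARY KEY (local_id) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://localhost:3306/test',
'driver' = 'com.mysql.cj.jdbc.Driver',
'username' = 'root',
'password' = 'root',
'table-name' = 'dept_dic',
'lookup.cache.max-rows' = '10000',
'lookup.cache.ttl' = '10min'
);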
sql-client中指定任务名称
Flink SQL> set pipeline.name = 'mysql2mysql';
[INFO] Session property has been set.
checkpoint
启用检查点持久化
# 修改flink-conf.yaml配置文件中的相关参数。
state.backend: filesystem
# 配置state.checkpoints.dir参数,指定文件系统中用于保存检查点数据的目录
# file:// 持久化为本地文件; hdfs:// 持久化到hdfs;其他还有到数据库的;
state.checkpoints.dir: file:///path/to/local/directory
设置检查点间隔时间
Flink SQL> SET execution.checkpointing.interval = 3s;
[INFO] Session property has been set.
文件目录结构
[root@localhost flink-1.16.0]# cd ../checkpoints/
[root@localhost checkpoints]# ls
c1bc183d5c8386ded6fb302c5721676c
[root@localhost checkpoints]# cd c1bc183d5c8386ded6fb302c5721676c/
[root@localhost c1bc183d5c8386ded6fb302c5721676c]# ls
chk-195 shared taskowned
作业状态查询
# 根据作业id查询作业信息 ./bin/flink list -r <jobId>
# 启动时间 :jobid :jobname (运行状态)
[root@localhost flink-1.16.0]# ./bin/flink list -r c1bc183d5c8386ded6fb302c5721676c
Waiting for response...
------------------ Running/Restarting Jobs -------------------
05.09.2023 10:58:39 : c1bc183d5c8386ded6fb302c5721676c : mysql2mysql (RUNNING)
--------------------------------------------------------------
[root@localhost flink-1.16.0]# ./bin/flink list -r c1bc183d5c8386ded6fb302c5721676c
Waiting for response...
No running jobs.
根据checkpointID进行重启
在web界面查看检查点id或在文件目录查看。
# 在重启任务之前,需要先取消当前正在运行的任务。可以使用以下命令来取消任务:
./bin/flink cancel -s <savepointDirectory> <jobId>
# 使用flink run命令重新提交任务。在取消任务后,可以使用以下命令重新提交任务:
# 注意:此处重启只能指定jar文件
./bin/flink run -s <checkpoint_path> -d <jarFile>
[root@localhost flink-1.16.0]# ./bin/flink run --help
Action "run" compiles and runs a program.
Syntax: run [OPTIONS] <jar-file> <arguments>
"run" action options:
-c,--class <classname> Class with the program entry
point ("main()" method). Only
needed if the JAR file does not
specify the class in its
manifest.
-C,--classpath <url> Adds a URL to each user code
classloader on all nodes in the
cluster. The paths must specify
a protocol (e.g. file://) and be
accessible on all nodes (e.g. by
means of a NFS share). You can
use this option multiple times
for specifying more than one
URL. The protocol must be
supported by the {@link
java.net.URLClassLoader}.
-d,--detached If present, runs the job in
detached mode
-n,--allowNonRestoredState Allow to skip savepoint state
that cannot be restored. You
need to allow this if you
removed an operator from your
program that was part of the
program when the savepoint was
triggered.
-p,--parallelism <parallelism> The parallelism with which to
run the program. Optional flag
to override the default value
specified in the configuration.
-py,--python <pythonFile> Python script with the program
entry point. The dependent
resources can be configured with
the `--pyFiles` option.
-pyarch,--pyArchives <arg> Add python archive files for
job. The archive files will be
extracted to the working
directory of python UDF worker.
For each archive file, a target
directory be specified. If the
target directory name is
specified, the archive file will
be extracted to a directory with
the specified name. Otherwise,
the archive file will be
extracted to a directory with
the same name of the archive
file. The files uploaded via
this option are accessible via
relative path. '#' could be used
as the separator of the archive
file path and the target
directory name. Comma (',')
could be used as the separator
to specify multiple archive
files. This option can be used
to upload the virtual
environment, the data files used
in Python UDF (e.g.,
--pyArchives
file:///tmp/py37.zip,file:///tmp
/data.zip#data --pyExecutable
py37.zip/py37/bin/python). The
data files could be accessed in
Python UDF, e.g.: f =
open('data/data.txt', 'r').
-pyclientexec,--pyClientExecutable <arg> The path of the Python
interpreter used to launch the
Python process when submitting
the Python jobs via "flink run"
or compiling the Java/Scala jobs
containing Python UDFs.
-pyexec,--pyExecutable <arg> Specify the path of the python
interpreter used to execute the
python UDF worker (e.g.:
--pyExecutable
/usr/local/bin/python3). The
python UDF worker depends on
Python 3.6+, Apache Beam
(version == 2.38.0), Pip
(version >= 20.3) and SetupTools
(version >= 37.0.0). Please
ensure that the specified
environment meets the above
requirements.
-pyfs,--pyFiles <pythonFiles> Attach custom files for job. The
standard resource file suffixes
such as .py/.egg/.zip/.whl or
directory are all supported.
These files will be added to the
PYTHONPATH of both the local
client and the remote python UDF
worker. Files suffixed with .zip
will be extracted and added to
PYTHONPATH. Comma (',') could be
used as the separator to specify
multiple files (e.g., --pyFiles
file:///tmp/myresource.zip,hdfs:
///$namenode_address/myresource2
.zip).
-pym,--pyModule <pythonModule> Python module with the program
entry point. This option must be
used in conjunction with
`--pyFiles`.
-pyreq,--pyRequirements <arg> Specify a requirements.txt file
which defines the third-party
dependencies. These dependencies
will be installed and added to
the PYTHONPATH of the python UDF
worker. A directory which
contains the installation
packages of these dependencies
could be specified optionally.
Use '#' as the separator if the
optional parameter exists (e.g.,
--pyRequirements
file:///tmp/requirements.txt#fil
e:///tmp/cached_dir).
-rm,--restoreMode <arg> Defines how should we restore
from the given savepoint.
Supported options: [claim -
claim ownership of the savepoint
and delete once it is subsumed,
no_claim (default) - do not
claim ownership, the first
checkpoint will not reuse any
files from the restored one,
legacy - the old behaviour, do
not assume ownership of the
savepoint files, but can reuse
some shared files.
-s,--fromSavepoint <savepointPath> Path to a savepoint to restore
the job from (for example
hdfs:///flink/savepoint-1537).
-sae,--shutdownOnAttachedExit If the job is submitted in
attached mode, perform a
best-effort cluster shutdown
when the CLI is terminated
abruptly, e.g., in response to a
user interrupt, such as typing
Ctrl + C.
Options for Generic CLI mode:
-D <property=value> Allows specifying multiple generic configuration
options. The available options can be found at
https://nightlies.apache.org/flink/flink-docs-stable/
ops/config.html
-e,--executor <arg> DEPRECATED: Please use the -t option instead which is
also available with the "Application Mode".
The name of the executor to be used for executing the
given job, which is equivalent to the
"execution.target" config option. The currently
available executors are: "remote", "local",
"kubernetes-session", "yarn-per-job" (deprecated),
"yarn-session".
-t,--target <arg> The deployment target for the given application,
which is equivalent to the "execution.target" config
option. For the "run" action the currently available
targets are: "remote", "local", "kubernetes-session",
"yarn-per-job" (deprecated), "yarn-session". For the
"run-application" action the currently available
targets are: "kubernetes-application".
Options for yarn-cluster mode:
-m,--jobmanager <arg> Set to yarn-cluster to use YARN execution
mode.
-yid,--yarnapplicationId <arg> Attach to running YARN session
-z,--zookeeperNamespace <arg> Namespace to create the Zookeeper
sub-paths for high availability mode
Options for default mode:
-D <property=value> Allows specifying multiple generic
configuration options. The available
options can be found at
https://nightlies.apache.org/flink/flink-do
cs-stable/ops/config.html
-m,--jobmanager <arg> Address of the JobManager to which to
connect. Use this flag to connect to a
different JobManager than the one specified
in the configuration. Attention: This
option is respected only if the
high-availability configuration is NONE.
-z,--zookeeperNamespace <arg> Namespace to create the Zookeeper sub-paths
for high availability mode
web提交任务-jar
提交jar
点击jar包进行配置
检查点保存位置如果不配置,将按照配置文件的路径进行保存。
报错
执行环境问题
The LocalStreamEnvironment cannot be used when submitting a program through a client, or running in a TestEnvironment context.
修改获取执行环境
env = StreamExecutionEnvironment.getExecutionEnvironment();
// 修改前:创建带有本地webui的执行环境
//env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(configuration);
提交失败会弹出报错日志,根据日志排查,直接搜自己的主类名;
提交成功自动跳转
验证