Reading Oracle data with Flink CDC
1. Enable archive logging
1) On the database server, connect from a terminal:
sqlplus / as sysdba
-- or start without logging in, then connect as SYSDBA:
sqlplus /nolog
CONNECT sys/password@host:port AS SYSDBA;
2) Check whether archive logging is enabled:
archive log list;
("Database log mode: No Archive Mode" means archive logging is disabled.)
("Database log mode: Archive Mode" means archive logging is enabled.)
3) Configure the archive log parameters:
alter system set db_recovery_file_dest_size = 100G;
alter system set db_recovery_file_dest = '/opt/oracle/oradata/recovery_area' scope=spfile;
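As a quick sanity check (not required), the parameters can be inspected in SQL*Plus; note that the scope=spfile value only becomes active after the restart in step 5:
-- SHOW PARAMETER does a substring match, so this lists both
-- db_recovery_file_dest and db_recovery_file_dest_size
show parameter db_recovery_file_dest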
4) Create a tablespace
CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/SID/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
The SID directory must be created in advance as the root user and granted read/write permissions: chmod 777
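To confirm the tablespace was created, a quick check against the standard data dictionary:
-- the new tablespace should show up with status ONLINE
SELECT tablespace_name, status FROM dba_tablespaces WHERE tablespace_name = 'LOGMINER_TBS';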
5) Enable archive logging
shutdown immediate;        -- stop the Oracle instance
startup mount;             -- start the instance and mount the database without opening it
alter database archivelog; -- switch the database to archive log mode
alter database open;       -- reopen the database
6) After the database is back up, run archive log list; again to confirm that archiving is now enabled.
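Equivalently, the log mode can be read straight from v$database:
-- returns ARCHIVELOG once the steps above succeed
SELECT log_mode FROM v$database;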
2. Create a dedicated user for Flink CDC
The account is flinkuser with password flinkpw (replace the user, password, host, and port in the commands below with your own values).
sqlplus sys/password@host:port/SID AS SYSDBA;
CREATE USER flinkuser IDENTIFIED BY flinkpw DEFAULT TABLESPACE LOGMINER_TBS QUOTA UNLIMITED ON LOGMINER_TBS;
GRANT CREATE SESSION TO flinkuser;
GRANT SET CONTAINER TO flinkuser;
GRANT SELECT ON V_$DATABASE TO flinkuser;
GRANT FLASHBACK ANY TABLE TO flinkuser;
GRANT SELECT ANY TABLE TO flinkuser;
GRANT SELECT_CATALOG_ROLE TO flinkuser;
GRANT EXECUTE_CATALOG_ROLE TO flinkuser;
GRANT SELECT ANY TRANSACTION TO flinkuser;
GRANT LOGMINING TO flinkuser;
GRANT CREATE TABLE TO flinkuser;
-- the next grant is not needed if scan.incremental.snapshot.enabled=true (the default)
GRANT LOCK ANY TABLE TO flinkuser;
GRANT ALTER ANY TABLE TO flinkuser;
GRANT CREATE SEQUENCE TO flinkuser;
GRANT EXECUTE ON DBMS_LOGMNR TO flinkuser;
GRANT EXECUTE ON DBMS_LOGMNR_D TO flinkuser;
GRANT SELECT ON V_$LOG TO flinkuser;
GRANT SELECT ON V_$LOG_HISTORY TO flinkuser;
GRANT SELECT ON V_$LOGMNR_LOGS TO flinkuser;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO flinkuser;
GRANT SELECT ON V_$LOGMNR_PARAMETERS TO flinkuser;
GRANT SELECT ON V_$LOGFILE TO flinkuser;
GRANT SELECT ON V_$ARCHIVED_LOG TO flinkuser;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO flinkuser;
exit;
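To spot-check the grants, the system privileges held by the new user can be listed from the data dictionary (run as SYSDBA; the grantee is stored in upper case):
-- list the system privileges granted to flinkuser
SELECT privilege FROM dba_sys_privs WHERE grantee = 'FLINKUSER' ORDER BY privilege;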
Enable supplemental logging at the Oracle table or database level
-- enable supplemental logging for a single table:
ALTER TABLE schema.table ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
-- enable it for all tables in the database:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
-- enable minimal supplemental logging for the database:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
-- switch to a new redo log file so the change takes effect:
ALTER SYSTEM SWITCH LOGFILE;
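Whether supplemental logging is actually active can be confirmed from v$database and, per table, from dba_log_groups (EMP below is just the example table used later):
-- each column reads YES once the corresponding ALTER DATABASE statement has run
SELECT supplemental_log_data_min, supplemental_log_data_all FROM v$database;
-- table-level supplemental log groups, e.g. for EMP
SELECT table_name, log_group_name FROM dba_log_groups WHERE table_name = 'EMP';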
3. Add the Maven dependencies
<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-connector-oracle-cdc</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-sql-connector-oracle-cdc</artifactId>
    <version>2.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>connect-api</artifactId>
    <version>2.7.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.7.0</version>
</dependency>
4. Java code
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import com.ververica.cdc.connectors.oracle.OracleSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;

public class OracleSourceExample {
    public static void main(String[] args) throws Exception {
        SourceFunction<String> sourceFunction = OracleSource.<String>builder()
                .hostname("localhost")
                .port(1521)
                .database("HELOWIN")    // monitor the HELOWIN database
                .schemaList("SCOTT")    // monitor the SCOTT schema
                .tableList("SCOTT.EMP") // monitor the EMP table
                .username("system")
                .password("system")
                .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.addSource(sourceFunction)
                .print()
                .setParallelism(1); // use parallelism 1 for the sink to keep message ordering
        // to write to an external system instead of stdout, replace print() with
        // addSink(new CustomSink()), where CustomSink is your own SinkFunction implementation

        env.execute();
    }
}
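Because the pom above also includes flink-sql-connector-oracle-cdc, the same table can be consumed via Flink SQL instead of the DataStream API. A minimal sketch, assuming the flinkuser account from step 2; emp_source is an arbitrary name, and the declared columns are a subset of SCOTT.EMP matched by name:
-- register SCOTT.EMP as a Flink SQL change-data source
CREATE TABLE emp_source (
    EMPNO INT,
    ENAME STRING,
    SAL DECIMAL(7, 2),
    PRIMARY KEY (EMPNO) NOT ENFORCED
) WITH (
    'connector' = 'oracle-cdc',
    'hostname' = 'localhost',
    'port' = '1521',
    'username' = 'flinkuser',
    'password' = 'flinkpw',
    'database-name' = 'HELOWIN',
    'schema-name' = 'SCOTT',
    'table-name' = 'EMP'
);

-- query the change stream
SELECT * FROM emp_source;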