Installing SeaTunnel 2.3.6 on Ubuntu


Environment

  • SeaTunnel 2.3.6
  • Ubuntu 24.04 LTS
  • sudo user: seatunnel
  • Install directory: /opt/apache-seatunnel-2.3.6

Environment variables

It is recommended to keep environment variables in the /etc/profile.d directory.
Add a file named seatunnel.sh there:

export SEATUNNEL_HOME=/opt/apache-seatunnel-2.3.6
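As a minimal sketch, the profile script can be generated by a small helper. The function name write_seatunnel_profile and its two arguments are illustrative and not part of SeaTunnel; writing into /etc/profile.d itself requires root.

```shell
# Hypothetical helper: write seatunnel.sh into a profile directory.
# $1 = profile directory (e.g. /etc/profile.d)
# $2 = SeaTunnel install directory
write_seatunnel_profile() {
  profile_dir=$1
  seatunnel_home=$2
  # One export line, the same content as shown above.
  printf 'export SEATUNNEL_HOME=%s\n' "$seatunnel_home" > "$profile_dir/seatunnel.sh"
}
```

For example, as root: write_seatunnel_profile /etc/profile.d /opt/apache-seatunnel-2.3.6. Log in again, or source the file, to load the variable into the current shell.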

Download the software

Download the SeaTunnel binary package.
Download page: https://seatunnel.apache.org/download/

  • apache-seatunnel-2.3.6-bin.tar.gz

Extract the archive:

tar -xvf apache-seatunnel-2.3.6-bin.tar.gz

This produces:

seatunnel@ubuntu24:/tmp$ ll
drwxr-xr-x 10 seatunnel        seatunnel        4096 Nov  8  2023 apache-seatunnel-2.3.6/

Move the directory to /opt:

sudo mv apache-seatunnel-2.3.6 /opt/

Download connectors

Connector download configuration

The connectors to download are listed in:
apache-seatunnel-2.3.6/config/plugin_config

Suggested initial connector list:

--connectors-v2--
connector-cdc-mysql
connector-fake
connector-console
--end--
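The trimmed list above can also be written by a script. This is only a sketch: write_plugin_config is a hypothetical helper, and it assumes the file needs nothing beyond the two delimiter lines and one connector name per line, as in the listing above.

```shell
# Hypothetical helper: write a plugin_config that lists only the given connectors.
# $1 = path of the plugin_config file
# remaining arguments = connector names, one per argument
write_plugin_config() {
  config_file=$1
  shift
  {
    echo '--connectors-v2--'
    for connector in "$@"; do
      echo "$connector"
    done
    echo '--end--'
  } > "$config_file"
}
```

Usage, after backing up the original file: write_plugin_config config/plugin_config connector-cdc-mysql connector-fake connector-console. The MySQL-to-PostgreSQL job later in this post uses a jdbc sink, so connector-jdbc likely needs to be in the list as well.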

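Default connector configuration file:
The default file lists every supported connector plugin; unless you need them all, there is no reason to download everything.
config/plugin_config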

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# This mapping is used to resolve the Jar package name without version (or call artifactId)
#
# corresponding to the module in the user Config, helping SeaTunnel to load the correct Jar package.
# Don't modify the delimiter " -- ", just select the plugin you need
--connectors-v2--
connector-amazondynamodb
connector-assert
connector-cassandra
connector-cdc-mysql
connector-cdc-mongodb
connector-cdc-sqlserver
connector-cdc-postgres
connector-cdc-oracle
connector-clickhouse
connector-datahub
connector-dingtalk
connector-doris
connector-elasticsearch
connector-email
connector-file-ftp
connector-file-hadoop
connector-file-local
connector-file-oss
connector-file-jindo-oss
connector-file-s3
connector-file-sftp
connector-file-obs
connector-google-sheets
connector-google-firestore
connector-hive
connector-http-base
connector-http-feishu
connector-http-gitlab
connector-http-github
connector-http-jira
connector-http-klaviyo
connector-http-lemlist
connector-http-myhours
connector-http-notion
connector-http-onesignal
connector-http-wechat
connector-hudi
connector-iceberg
connector-influxdb
connector-iotdb
connector-jdbc
connector-kafka
connector-kudu
connector-maxcompute
connector-mongodb
connector-neo4j
connector-openmldb
connector-pulsar
connector-rabbitmq
connector-redis
connector-druid
connector-s3-redshift
connector-sentry
connector-slack
connector-socket
connector-starrocks
connector-tablestore
connector-selectdb-cloud
connector-hbase
connector-amazonsqs
connector-easysearch
connector-paimon
connector-rocketmq
connector-tdengine
connector-web3j
connector-milvus

Download the connector plugins

Change to the install directory:

cd /opt/apache-seatunnel-2.3.6

Start the download:

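# Recommended
bash bin/install-plugin.sh
# Or:
./bin/install-plugin.sh
# Or, only if sh is bash (on Ubuntu, sh is dash by default):
sh bin/install-plugin.sh

Note: make sure the script is interpreted by bash; running it under dash can cause errors.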
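To see which interpreter a plain sh call would actually use, the symlink can be resolved. The helper name resolved_shell_name is made up for this sketch; it relies on readlink -f from GNU coreutils, which Ubuntu ships.

```shell
# Hypothetical helper: print the name of the program a path resolves to.
# $1 = path to inspect, e.g. /bin/sh
resolved_shell_name() {
  # readlink -f follows every symlink; basename keeps only the file name.
  basename "$(readlink -f "$1")"
}
```

Calling resolved_shell_name /bin/sh on a stock Ubuntu install typically prints dash, which is why the explicit bash call is the safer choice.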

Download location:
apache-seatunnel-2.3.6/connectors/

Note: in testing, SeaTunnel 2.3.4 and later download connectors to a different path than 2.3.3 and earlier:

2.3.3 : apache-seatunnel-2.3.3/connectors/seatunnel
2.3.4 : apache-seatunnel-2.3.4/connectors/
2.3.6 : apache-seatunnel-2.3.6/connectors/
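A script that has to work with either layout can probe for the older subdirectory. This sketch covers only the three versions listed above; connector_dir is a hypothetical name.

```shell
# Hypothetical helper: print the directory that holds the downloaded connector jars.
# $1 = SeaTunnel install directory
connector_dir() {
  if [ -d "$1/connectors/seatunnel" ]; then
    # Layout observed with 2.3.3
    echo "$1/connectors/seatunnel"
  else
    # Layout observed with 2.3.4 and 2.3.6
    echo "$1/connectors"
  fi
}
```

For example, ls "$(connector_dir /opt/apache-seatunnel-2.3.6)" lists the downloaded jars.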

Speeding up connector downloads

With the default setup, the connector plugins are downloaded from the default Apache Maven repository:

Downloading from central: https://repo.maven.apache.org/maven2/org/apache/seatunnel/connector-cdc-mysql/2.3.6/connector....

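This is very slow.
After running install-plugin.sh for the first time, you can stop it with Ctrl+C; by then it has generated the default Maven wrapper configuration and the .m2 directory.
Configure the Maven repository in:
~/.m2/settings.xml
Create the file if it does not exist.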

<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
	<pluginGroups></pluginGroups>
	<proxies></proxies>

	<servers>
	</servers>

<mirrors>
<!-- Aliyun repository -->
<mirror>
    <id>alimaven</id>
    <mirrorOf>*</mirrorOf>
    <name>aliyun maven</name>
    <url>https://maven.aliyun.com/repository/central</url>
</mirror>
</mirrors>

<profiles>
</profiles>

</settings>

Then run the script again:

bash bin/install-plugin.sh 

The output shows that the connectors are now downloaded from the Aliyun repository.
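To check which repository URL a settings file points at without starting a download, the URL can be pulled out with sed. This is a rough sketch rather than a real XML parser: maven_mirror_urls is a hypothetical helper, and it only sees url elements that open and close on one line.

```shell
# Hypothetical helper: print the text of every single-line <url> element in a file.
# $1 = path of a Maven settings file, e.g. ~/.m2/settings.xml
maven_mirror_urls() {
  # ':' is used as the sed delimiter because the pattern contains '/'.
  sed -n 's:.*<url>\(.*\)</url>.*:\1:p' "$1"
}
```

With the settings file shown earlier, maven_mirror_urls ~/.m2/settings.xml should print the Aliyun URL.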

Testing the SeaTunnel sample batch job

Run the sample job:

./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local

Log output of a successful run:

2024-08-12 08:54:05,670 INFO  [o.a.s.e.c.j.ClientJobProxy    ] [main] - Job (875301094702448641) end with state FINISHED
2024-08-12 08:54:05,707 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] -
***********************************************
           Job Statistic Information
***********************************************
Start Time                : 2024-08-12 08:54:03
End Time                  : 2024-08-12 08:54:05
Total Time(s)             :                   2
Total Read Count          :                  32
Total Write Count         :                  32
Total Failed Count        :                   0
***********************************************

2024-08-12 08:54:05,707 INFO  [c.h.c.LifecycleService        ] [main] - hz.client_1 [seatunnel-664865] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2024-08-12 08:54:05,713 INFO  [c.h.i.s.t.TcpServerConnection ] [hz.main.IO.thread-in-1] - [localhost]:5801 [seatunnel-664865] [5.1] Connection[id=1, /127.0.0.1:5801->/127.0.0.1:50189, qualifier=null, endpoint=[127.0.0.1]:50189, remoteUuid=4584e8d2-6b2f-4a10-af64-892d2fa897cb, alive=false, connectionType=JVM, planeIndex=-1] closed. Reason: Connection closed by the other side
2024-08-12 08:54:05,714 INFO  [.c.i.c.ClientConnectionManager] [main] - hz.client_1 [seatunnel-664865] [5.1] Removed connection to endpoint: [localhost]:5801:89ddf390-cb35-4347-ab51-c794b2c6a868, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/127.0.0.1:50189->localhost/127.0.0.1:5801}, remoteAddress=[localhost]:5801, lastReadTime=2024-08-12 08:54:05.701, lastWriteTime=2024-08-12 08:54:05.670, closedTime=2024-08-12 08:54:05.710, connected server version=5.1}
2024-08-12 08:54:05,714 INFO  [c.h.c.LifecycleService        ] [main] - hz.client_1 [seatunnel-664865] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2024-08-12 08:54:05,718 INFO  [c.h.c.i.ClientEndpointManager ] [hz.main.event-5] - [localhost]:5801 [seatunnel-664865] [5.1] Destroying ClientEndpoint{connection=Connection[id=1, /127.0.0.1:5801->/127.0.0.1:50189, qualifier=null, endpoint=[127.0.0.1]:50189, remoteUuid=4584e8d2-6b2f-4a10-af64-892d2fa897cb, alive=false, connectionType=JVM, planeIndex=-1], clientUuid=4584e8d2-6b2f-4a10-af64-892d2fa897cb, clientName=hz.client_1, authenticated=true, clientVersion=5.1, creationTime=1723452843171, latest clientAttributes=lastStatisticsCollectionTime=1723452843212,enterprise=false,clientType=JVM,clientVersion=5.1,clusterConnectionTimestamp=1723452843154,clientAddress=127.0.0.1,clientName=hz.client_1,credentials.principal=null,os.committedVirtualMemorySize=3176402944,os.freePhysicalMemorySize=3446554624,os.freeSwapSpaceSize=2147479552,os.maxFileDescriptorCount=1048576,os.openFileDescriptorCount=51,os.processCpuTime=4630000000,os.systemLoadAverage=0.240234375,os.totalPhysicalMemorySize=8317079552,os.totalSwapSpaceSize=2147479552,runtime.availableProcessors=2,runtime.freeMemory=277072344,runtime.maxMemory=477626368,runtime.totalMemory=330301440,runtime.uptime=3282,runtime.usedMemory=53229096, labels=[]}
2024-08-12 08:54:05,719 INFO  [c.h.c.LifecycleService        ] [main] - hz.client_1 [seatunnel-664865] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2024-08-12 08:54:05,720 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - Closed SeaTunnel client......
2024-08-12 08:54:05,720 INFO  [c.h.c.LifecycleService        ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] [localhost]:5801 is SHUTTING_DOWN
2024-08-12 08:54:05,724 INFO  [c.h.i.p.i.MigrationManager    ] [hz.main.cached.thread-11] - [localhost]:5801 [seatunnel-664865] [5.1] Shutdown request of Member [localhost]:5801 - 89ddf390-cb35-4347-ab51-c794b2c6a868 this master is handled
2024-08-12 08:54:05,729 INFO  [c.h.i.i.Node                  ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] Shutting down connection manager...
2024-08-12 08:54:05,732 INFO  [c.h.i.i.Node                  ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] Shutting down node engine...
2024-08-12 08:54:05,747 INFO  [.c.c.DefaultClassLoaderService] [main] - close classloader service
2024-08-12 08:54:05,747 INFO  [o.a.s.e.s.TaskExecutionService] [event-forwarder-0] - [localhost]:5801 [seatunnel-664865] [5.1] Event forward thread interrupted
2024-08-12 08:54:08,759 INFO  [c.h.i.i.NodeExtension         ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] Destroying node NodeExtension.
2024-08-12 08:54:08,760 INFO  [c.h.i.i.Node                  ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] Hazelcast Shutdown is completed in 3037 ms.
2024-08-12 08:54:08,760 INFO  [c.h.c.LifecycleService        ] [main] - [localhost]:5801 [seatunnel-664865] [5.1] [localhost]:5801 is SHUTDOWN
2024-08-12 08:54:08,760 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - Closed HazelcastInstance ......
2024-08-12 08:54:08,761 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - Closed metrics executor service ......
2024-08-12 08:54:09,726 INFO  [s.c.s.s.c.ClientExecuteCommand] [Thread-26] - run shutdown hook because get close signal
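If the output is saved to a file, the counters in the Job Statistic Information block can be read back for a quick check. This sketch assumes the "label : value" layout shown in the log above; job_stat is a hypothetical helper, and it prints one value per statistics block found in the file.

```shell
# Hypothetical helper: print the value of one counter from the statistics block.
# $1 = log file
# $2 = field label, e.g. "Total Read Count"
job_stat() {
  awk -F':' -v label="$2" '
    # Text before the first colon, with trailing blanks removed.
    { key = $1; sub(/[ \t]+$/, "", key) }
    # On a matching label, print the value with all blanks removed.
    key == label { value = $2; gsub(/[ \t]/, "", value); print value }
  ' "$1"
}
```

For a run like the one above, comparing job_stat job.log "Total Read Count" with job_stat job.log "Total Write Count" confirms that every row read was written; job.log is a placeholder file name.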

Testing MySQL-CDC to PostgreSQL

Download the database drivers

Download the MySQL and PostgreSQL JDBC drivers and add them to the lib directory.
For example:

mkdir -p ${SEATUNNEL_HOME}/plugins/jdbc/lib/
cp mysql-connector-j-8.2.0.jar ${SEATUNNEL_HOME}/plugins/jdbc/lib/
cp postgresql-42.7.2.jar ${SEATUNNEL_HOME}/plugins/jdbc/lib/

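Notes:

  1. According to plugins/README.md, if you use the Zeta Engine, put the JDBC drivers under $SEATUNNEL_HOME/lib/.
  2. In testing, drivers placed under $SEATUNNEL_HOME/lib/ are only picked up after a cluster-mode instance is restarted, whereas plugins/jdbc/lib is loaded dynamically.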

Create the test tables

Connect to MySQL and create the database and table:

create database test;
create table test.test_001(id int ,name varchar(100));

Connect to PostgreSQL and create the database and schema:

create database test;
\c test
create schema test;
-- The table can be created automatically
-- create table test.test_001(id int ,name varchar(100));

Edit the job configuration file

config/stream_mysql_postgresql.config

env {
  job.mode = "STREAMING"
  job.name = "streaming-mysql-pg"
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://192.168.8.101:3306/test"
    username = "root"
    password = "123456"
    table-names = ["test.test_001"]
  }
}

sink {
  jdbc {
    url = "jdbc:postgresql://192.168.8.101:5432/test"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "postgres"
    database = "test"
    table = "test.test_001"
    generate_sink_sql = true
  }
}

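Notes:

  1. PostgreSQL does not allow a table in another database to be referenced directly. For example, while logged in to the postgres database you cannot insert directly into test.test.test_001. The database in the sink's JDBC connection string must therefore match the database option in the sink configuration. Because that database must already exist for the login to succeed, create it manually in advance. Schemas other than the public schema are not created automatically either and must also be created beforehand.
  2. During synchronization, SeaTunnel filters out system databases and system tables, so the sink target must not be a system database such as postgres.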

Start the job

bash bin/seatunnel.sh --config config/stream_mysql_postgresql.config --deploy-mode local

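Notes:

  1. The job is started in local mode. The default is cluster mode; if the deploy mode is not specified, the job fails with: java.lang.IllegalStateException: Unable to connect to any cluster.
  2. A bug was found. PostgreSQL has a three-level hierarchy (database --> schema --> table), while MySQL has a two-level hierarchy (database --> table).
    When syncing test.test_table in MySQL to postgres.test.test_table in PostgreSQL, the automatic table creation fails.
    Preconditions:
    The schema postgres.test does not exist.
    The table postgres.test.test_table does not exist.
    Error:
    ERROR: database "postgres" already exists
    Cause: version 2.3.6 checks whether the database exists before executing the statement, but the existence check in the source code excludes system databases. SeaTunnel therefore wrongly concludes that postgres does not exist and tries to create it.
  3. The table synced by this job has no primary key, so after the job is interrupted and restarted, a full re-sync is performed.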

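Testing real-time sync of a table with a primary key

Create the source table

MySQL (drop the earlier test.test_001 first if it still exists, since it was created without a primary key):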

create table test.test_001(id bigint AUTO_INCREMENT PRIMARY KEY ,name varchar(100));
insert into test.test_001(name) values('aaa');

The target table from the earlier test can be dropped first.
PostgreSQL:

drop table test.test.test_001;

Job configuration file

config/stream_mysql_postgresql.config

env {
  job.mode = "STREAMING"
  job.name = "streaming-mysql-pg"
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://192.168.8.101:3306/test"
    username = "root"
    password = "123456"
    table-names = ["test.test_001"]
  }
}

sink {
  jdbc {
    url = "jdbc:postgresql://192.168.8.101:5432/test"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "123456"
    database = "test"
    table = "test.test_001"
    generate_sink_sql = true
  }
}

Run the job

bash bin/seatunnel.sh --config config/stream_mysql_postgresql.config --deploy-mode local

Run operations on the source database

-- Insert
insert into test.test_001(name)
select name from test.test_001;

-- Delete
delete from test.test_001 where id > 10;

-- Update
update test.test_001 set name = 'bbb' where id = 1;

The target follows each of these changes.

Trying TRUNCATE

truncate table test.test_001;

Result: the rows on the target are not deleted.

Insert into the source again:

insert into test.test_001(name) values('aaa');
select * from test.test_001 ;

On the source, id starts counting from 1 again.
On the target, the row with id=1 is updated to match the source.

Starting cluster mode

./bin/seatunnel-cluster.sh -d
Posted by 葵花牌、阳光