Spring Boot: reading CSV data over SFTP and writing it to a DB
This is a fairly common task, for example bank reconciliation (the files are fetched over SFTP in CSV format, which is safer than plain FTP), and it usually runs as a scheduled job (Spring Boot's @Scheduled makes that easy).
The following covers only the SFTP read, the CSV parsing, and the JDBC write.
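The scheduling part is not shown in the post; as a minimal wiring sketch (the ReconcileJob class, the reconcile() method name, and the cron expression are assumptions, and @EnableScheduling must be present on a configuration class):

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical sketch: reconcile() stands for the SFTP download + CSV parse
// + JDBC write flow shown later in this post.
@Component
public class ReconcileJob {

    // cron fields: second minute hour day month weekday -> every day at 02:00
    @Scheduled(cron = "0 0 2 * * *")
    public void reconcile() {
        // download file from SFTP, parse CSV, write to DB
    }
}
```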
Dependencies
spring-integration-sftp (a wrapper around SFTP; JSch can also be used directly), opencsv (simple, flexible CSV reading and bean mapping)
easy-batch, a simple and flexible batch-processing framework
Reference CSV format
Only an excerpt is used here; the real file contains more columns, this is just for demonstration.
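The real file content is not reproduced; as a purely hypothetical illustration of the pipe-delimited layout the code below expects (header names matching the Bank entity's @CsvBindByName columns):

```
集团号|企业用户号|商户名称
G001|U1001|demo-merchant-a
G002|U1002|demo-merchant-b
```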
Environment setup
Use docker-compose to run the SFTP server and the PostgreSQL database it depends on
- docker-compose file
version: "3"
services:
  postgres:
    image: postgres:alpine
    environment:
      - POSTGRES_PASSWORD=dalong
    ports:
      - 5432:5432
  ftp:
    image: mikatux/ftps-server
    environment:
      - "USER=demo"
      - "PASSWORD=demoapp"
    restart: always
    volumes:
      - "./data:/home/demo"
    network_mode: host
  sftp:
    image: atmoz/sftp
    volumes:
      - ./data2/upload:/home/demo/upload
      - ./sshd_config:/etc/ssh/sshd_config
    restart: always
    cap_add:
      - ALL
    ports:
      - "2222:22"
    command: demo:demoapp:1001
sshd_config (this file matters: during testing, the container's default config turned out not to work)
# $OpenBSD: sshd_config,v 1.100 2016/08/15 12:32:04 naddy Exp $
# This is the sshd server system-wide configuration file. See
# sshd_config(5) for more information.
# This sshd was compiled with PATH=/usr/local/bin:/usr/bin
# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented. Uncommented options override the
# default value.
# If you want to change the port on a SELinux system, you have to tell
# SELinux about this change.
# semanage port -a -t ssh_port_t -p tcp #PORTNUMBER
#
Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::
HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_dsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
HostKey /etc/ssh/ssh_host_ed25519_key
# Ciphers and keying
#RekeyLimit default none
# Logging
#SyslogFacility AUTH
SyslogFacility AUTHPRIV
#LogLevel INFO
# Authentication:
#LoginGraceTime 2m
#PermitRootLogin no
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10
#PubkeyAuthentication yes
# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
#AuthorizedPrincipalsFile none
#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody
# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes
# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no
PasswordAuthentication yes
# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes
ChallengeResponseAuthentication no
# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no
#KerberosUseKuserok yes
# GSSAPI options
GSSAPIAuthentication yes
GSSAPICleanupCredentials no
#GSSAPIStrictAcceptorCheck yes
#GSSAPIKeyExchange no
#GSSAPIEnablek5users no
# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication. Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to 'no'.
# WARNING: 'UsePAM no' is not supported in Red Hat Enterprise Linux and may cause several
# problems.
#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
#PrintMotd yes
#PrintLastLog yes
#TCPKeepAlive yes
#UseLogin no
#UsePrivilegeSeparation sandbox
#PermitUserEnvironment no
#Compression delayed
#ClientAliveInterval 0
#ClientAliveCountMax 3
#ShowPatchLevel no
UseDNS no
#PidFile /var/run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none
# no default banner path
#Banner none
# Accept locale-related environment variables
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL LANGUAGE
AcceptEnv XMODIFIERS
# override default of no subsystems
# Force sftp and chroot jail
Subsystem sftp internal-sftp
ForceCommand internal-sftp
#ChrootDirectory %h
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
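With the compose file and sshd_config in place, a quick smoke test (assuming docker-compose is installed and ports 2222/5432 are free on the host):

```
docker-compose up -d
# verify the SFTP login (password: demoapp)
sftp -P 2222 -o StrictHostKeyChecking=no demo@127.0.0.1
# verify PostgreSQL is up
docker-compose exec postgres psql -U postgres -c 'select 1'
```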
Code
- pom.xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.jeasy</groupId>
        <artifactId>easy-batch-opencsv</artifactId>
        <version>6.0.0</version>
    </dependency>
    <dependency>
        <groupId>com.zaxxer</groupId>
        <artifactId>HikariCP</artifactId>
        <version>3.4.5</version>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>42.2.6</version>
    </dependency>
    <dependency>
        <groupId>org.jeasy</groupId>
        <artifactId>easy-batch-flatfile</artifactId>
        <version>6.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.jeasy</groupId>
        <artifactId>easy-batch-jdbc</artifactId>
        <version>6.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.jeasy</groupId>
        <artifactId>easy-batch-core</artifactId>
        <version>6.0.0</version>
    </dependency>
    <dependency>
        <groupId>au.com.bytecode</groupId>
        <artifactId>opencsv</artifactId>
        <version>2.4</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.integration</groupId>
        <artifactId>spring-integration-sftp</artifactId>
        <version>5.3.2.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
        <exclusions>
            <exclusion>
                <groupId>org.junit.vintage</groupId>
                <artifactId>junit-vintage-engine</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
    </plugins>
</build>
- Core bean configuration
Mainly the PostgreSQL DataSource and the easy-batch JobExecutor
@Bean
public DataSource dataSource() {
    HikariConfig config = new HikariConfig();
    Properties properties = new Properties();
    properties.setProperty("serverName", "127.0.0.1");
    properties.setProperty("portNumber", "5432");
    properties.setProperty("user", "postgres");
    properties.setProperty("password", "dalong");
    properties.setProperty("sslmode", "disable");
    properties.setProperty("databaseName", "postgres");
    properties.setProperty("preparedStatementCacheQueries", "0");
    config.setMaximumPoolSize(10);
    // maxLifetime is in milliseconds; HikariCP ignores values below 30 000 ms
    config.setMaxLifetime(600_000);
    config.setDataSourceClassName("org.postgresql.ds.PGSimpleDataSource");
    config.setDataSourceProperties(properties);
    config.setConnectionTestQuery("SELECT 1");
    return new HikariDataSource(config);
}

@Bean(destroyMethod = "shutdown")
public JobExecutor jobExecutor() {
    return new JobExecutor();
}
- SFTP download
A simple demo
import com.jcraft.jsch.JSchException;
import org.springframework.integration.sftp.session.DefaultSftpSessionFactory;
import org.springframework.integration.sftp.session.SftpSession;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.time.Duration;
import java.util.Properties;

public class Download {
    private DefaultSftpSessionFactory gimmeFactory() {
        DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory();
        factory.setHost("127.0.0.1");
        factory.setPort(2222);
        Properties properties = new Properties();
        // JSch expects "yes"/"no" here, not "true"/"false"
        properties.setProperty("StrictHostKeyChecking", "no");
        factory.setSessionConfig(properties);
        factory.setAllowUnknownKeys(true);
        factory.setUser("demo");
        factory.setChannelConnectTimeout(Duration.ofSeconds(1000));
        factory.setPassword("demoapp");
        return factory;
    }

    public String download() throws JSchException {
        SftpSession session = gimmeFactory().getSession();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        try {
            // read the remote file into memory and return it as a UTF-8 string
            session.read("upload/HENAN0_settleJournal_20190728.txt", outputStream);
            return new String(outputStream.toByteArray(), "utf-8");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
- CSV-to-entity mapping
opencsv handles this; its annotation-based mapping is flexible and saves a lot of boilerplate
import com.opencsv.bean.CsvBindByName;

public class Bank {
    @CsvBindByName(column = "集团号")
    private String id;
    @CsvBindByName(column = "企业用户号")
    private String user;
    @CsvBindByName(column = "商户名称")
    private String message;

    public Bank() {
    }

    public Bank(String id, String user, String message) {
        this.id = id;
        this.user = user;
        this.message = message;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getUser() {
        return user;
    }

    public void setUser(String user) {
        this.user = user;
    }

    public String getMessage() {
        return message;
    }

    public void setMessage(String message) {
        this.message = message;
    }

    @Override
    public String toString() {
        return "Bank{" + "id='" + id + '\'' +
                ", user='" + user + '\'' +
                ", message='" + message + '\'' +
                '}';
    }
}
- Parsing and writing the data
@RequestMapping("/api3")
public void demo3() throws JSchException, IOException {
    Download download = new Download();
    String content = download.download();
    // Use opencsv's entity mapping to parse the file, which keeps the code short
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(content.getBytes());
    InputStreamReader inputStreamReader = new InputStreamReader(byteArrayInputStream, "utf-8");
    HeaderColumnNameMappingStrategy<Bank> mappingStrategy = new HeaderColumnNameMappingStrategy<>();
    mappingStrategy.setType(Bank.class);
    CsvToBean<Bank> build = new CsvToBeanBuilder<Bank>(inputStreamReader)
            .withMappingStrategy(mappingStrategy)
            .withSeparator('|')
            .build();
    List<Bank> bankList = build.parse();
    // Insert with ON CONFLICT DO NOTHING so reruns do not duplicate rows. In practice this
    // would run as a scheduled job; it is also worth exposing a REST API for manual replays
    // after failures, and adding easy-batch listeners plus monitoring (Prometheus integration
    // works well for keeping an eye on the job).
    String query = "insert into tweet VALUES(?,?,?) ON CONFLICT(id) DO NOTHING";
    String[] fields = {"id", "user", "message"};
    PreparedStatementProvider psp = new BeanPropertiesPreparedStatementProvider(Bank.class, fields);
    Job job = new JobBuilder()
            .batchSize(2)
            .reader(new IterableRecordReader(bankList))
            .errorThreshold(100)
            .writer(new JdbcRecordWriter(dataSource, query, psp))
            .build();
    JobReport jobReport = jobExecutor.execute(job);
    System.out.println(jobReport);
}
- Running it
Note: the target table must be created in the PostgreSQL database first
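The schema is not shown in the post; a hypothetical DDL matching the three bound fields, the table name taken from the insert statement, and the unique constraint that ON CONFLICT(id) requires:

```sql
-- hypothetical schema; columns mirror the Bank bean fields
CREATE TABLE IF NOT EXISTS tweet (
    id      varchar(64) PRIMARY KEY,  -- 集团号; ON CONFLICT(id) needs a unique constraint
    "user"  varchar(64),              -- 企业用户号 ("user" is a reserved word, hence quoted)
    message varchar(255)              -- 商户名称
);
```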
Notes
A recent opencsv version is recommended, as its bean handling is more flexible; easy-batch is a very nice batch-processing framework: simple and flexible.
SFTP client SDK options include JSch, SSHJ, and Apache Commons VFS; JSch works well.
For the data-processing part, besides easy-batch there are also spring-batch, Apache Camel, and spring-integration, all solid choices, but easy-batch is simpler to use and more flexible (mainly because it is lightweight).
参考资料
https://github.com/j-easy/easy-batch
http://www.jcraft.com/jsch/
http://opencsv.sourceforge.net/