DataX - [03] Usage Examples

Preface

 

001 || mysql2hdfs

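(1) Inspect the MySQL data that will be migrated

(2) Based on the requirement, choose mysqlreader as the reader and hdfswriter as the writer

To view the reader and writer templates (-r selects the reader template, -w the writer template):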

python bin/datax.py -r mysqlreader -w hdfswriter

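(3) Write the JSON job script for the sync

(4) Check whether the target path exists on HDFS

(5) Run the sync by passing the JSON job to datax.py

(6) Verify the data: check that HDFS now holds all the rows of the corresponding MySQL table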

{
	"job": {
		"content": [
			{
				"reader": {
					"name": "mysqlreader",
					"parameter": {
						"column": ["id","name"],
						"connection": [
							{
								"jdbcUrl": ["jdbc:mysql://xxxxx:3306/dbName"],
								"table": ["test"]
							}
						],
						"password": "twgdhbtzhy",
						"username": "root",
						"splitPk": ""
					}
				},
				"writer": {
					"name": "hdfswriter",
					"parameter": {
						"column": [
							{"name": "id", "type": "bigint"},
							{"name": "name", "type": "string"}
						],
						"compress": "gzip",
						"defaultFS": "hdfs://xxxxx:8020",
						"fieldDelimiter": "\t",
						"fileName": "test",
						"fileType": "text",
						"path": "/test",
						"writeMode": "append"
					}
				}
			}
		],
		"setting": {
			"speed": {
				"channel": "1"
			}
		}
	}
}
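
A misspelled key in a job file (for example `paramter` instead of `parameter`) is easy to overlook and only surfaces when the job is launched. The following is a minimal sketch of a pre-flight check, not part of DataX itself: `check_job` is a hypothetical helper, and the rule that reader and writer column counts should match is an assumption based on how this job is laid out.

```python
import json


def check_job(job_text):
    """Return a list of structural problems found in a DataX job definition."""
    problems = []
    job = json.loads(job_text)
    for i, item in enumerate(job["job"]["content"]):
        # Every reader and writer needs a plugin name and a parameter block.
        for role in ("reader", "writer"):
            plugin = item.get(role, {})
            for key in ("name", "parameter"):
                if key not in plugin:
                    problems.append(f"content[{i}].{role}: missing '{key}'")
        reader_cols = item.get("reader", {}).get("parameter", {}).get("column", [])
        writer_cols = item.get("writer", {}).get("parameter", {}).get("column", [])
        # Assumption: an explicit reader column list should have as many
        # entries as the writer column list; "*" cannot be checked offline.
        if reader_cols and writer_cols and reader_cols != ["*"]:
            if len(reader_cols) != len(writer_cols):
                problems.append(
                    f"content[{i}]: {len(reader_cols)} reader columns "
                    f"vs {len(writer_cols)} writer columns"
                )
    return problems
```

Calling `check_job(open("job/mysql2hdfs.json").read())` before submitting the job returns an empty list when none of these problems are found.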

(7) Job execution: create the target directory, then run the job

hdfs dfs -mkdir /test
python bin/datax.py job/mysql2hdfs.json
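
For the verification in step (6), one option is to copy an output file from HDFS to the local machine (for example with `hdfs dfs -get`) and compare its row count with a `SELECT COUNT(*)` on the MySQL table. The sketch below assumes the file is gzip-compressed, tab-delimited text with two fields per row, matching the writer settings above; `summarize_export` is a hypothetical helper, not a DataX utility.

```python
import gzip


def summarize_export(gz_bytes, delimiter="\t", expected_fields=2):
    """Return (row_count, malformed_row_count) for one gzip-compressed text file."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    # Split every non-empty line into fields on the configured delimiter.
    rows = [line.split(delimiter) for line in text.splitlines() if line]
    # A row is malformed when its field count differs from the writer's column count.
    malformed = [row for row in rows if len(row) != expected_fields]
    return len(rows), len(malformed)
```

Summing the row counts over all output files and comparing the total with the source table gives a quick sanity check; it does not compare the values themselves.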

(8)

 

 

 

 

002 || Title

 

 

003 || Title

 

 

posted @ HOUHUILIN