Hudi Study Notes 2 - Hudi Configuration
https://hudi.apache.org/docs/configurations
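Hudi Configuration Categories
- Spark Datasource Configs
Configurations for the Spark Datasource.
- Flink Sql Configs
Configurations for the Flink SQL source/sink connectors, such as index.type, write.tasks, write.operation, clean.policy, clean.retain_commits, clean.retain_hours, compaction.max_memory, hive_sync.db, hive_sync.table, hive_sync.metastore.uris, write.retry.times, and write.task.max.size.
- Write Client Configs
Configurations that control how Hudi behaves when the RDD-based HoodieWriteClient API is used.
- Metastore and Catalog Sync Configs
Configurations for syncing table metadata to external metastores and catalogs.
- Metrics Configs
Configurations for metrics reporting.
- Record Payload Config
Low-level customization, for example hoodie.compaction.payload.class, which sets the payload class.
- Kafka Connect Configs
Configurations for writing to Hudi tables through the Kafka Connect sink connector.
- Amazon Web Services Configs
Configurations for Amazon Web Services.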
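As a rough illustration of how the Flink SQL options listed above are supplied, the sketch below renders a `CREATE TABLE ... WITH (...)` statement from a Python dict. The helper `render_hudi_flink_ddl`, the table name, the columns, the path, and every option value are assumptions made up for this example; only the option names come from the list above (plus `connector` and `path`, which a Hudi table in Flink normally needs).

```python
def render_hudi_flink_ddl(table, columns, options):
    """Render a Flink SQL CREATE TABLE statement with a WITH (...) clause.

    Hypothetical helper for illustration; it only builds a string.
    """
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    # Flink SQL expects each option key and value as a single-quoted string;
    # embedded single quotes are escaped by doubling them.
    opts = ",\n".join(
        "  '{}' = '{}'".format(key, str(value).replace("'", "''"))
        for key, value in options.items()
    )
    return f"CREATE TABLE {table} (\n{cols}\n) WITH (\n{opts}\n);"


# All values below are illustrative assumptions, not recommended settings.
flink_options = {
    "connector": "hudi",
    "path": "file:///tmp/hudi/demo_table",  # hypothetical table location
    "write.operation": "upsert",
    "write.tasks": "4",
    "clean.retain_commits": "10",
    "hive_sync.db": "default",
    "hive_sync.table": "demo_table",
}

ddl = render_hudi_flink_ddl(
    "demo_table",
    [
        ("id", "INT PRIMARY KEY NOT ENFORCED"),
        ("name", "STRING"),
        ("ts", "TIMESTAMP(3)"),
    ],
    flink_options,
)
print(ddl)
```

The resulting string could be passed to a Flink SQL client or a `TableEnvironment`; keeping the options in a dict makes it easy to compare them against the configuration reference.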
Write Client Configs
- Layout Configs
- Clean Configs
- Memory Configurations
- Archival Configs
- Metadata Configs
- Consistency Guard Configurations
- FileSystem Guard Configurations
- Write Configurations
- Metastore Configs
- Key Generator Options
- Storage Configs
- Compaction Configs
- File System View Storage Configurations
- Clustering Configs
- Common Configurations
- Bootstrap Configs
- Commit Callback Configs
- Lock Configs
- Index Configs
Metastore and Catalog Sync Configs
- Common Metadata Sync Configs
- Global Hive Sync Configs
- DataHub Sync Configs
- BigQuery Sync Configs
- Hive Sync Configs
Metrics Configs
- Metrics Configurations for Datadog reporter
- Metrics Configurations for Amazon CloudWatch
- Metrics Configurations
- Metrics Configurations for Jmx
- Metrics Configurations for Prometheus
- Metrics Configurations for Graphite
Record Payload Config
- Payload Configurations
Config | Required | Default | Description |
---|---|---|---|
hoodie.compaction.payload.class | N | org.apache.hudi.common.model.OverwriteWithLatestAvroPayload | |
hoodie.payload.event.time.field | N | ts | |
hoodie.payload.ordering.field | N | ts | Name of the field used to order records that share the same key before they are merged and written to storage. Defaults to ts. |
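To make the role of the ordering field concrete, here is a simplified, Hudi-independent sketch of "latest record wins" merging: among records with the same key, the one with the largest ordering value is kept. The function `precombine`, the field names `id` and `name`, and the tie-breaking rule are assumptions of this example; it does not reproduce Hudi's actual payload classes.

```python
def precombine(records, key_field="id", ordering_field="ts"):
    """Keep one record per key: the one with the largest ordering value.

    Simplified illustration only; not Hudi's implementation.
    """
    latest = {}
    for rec in records:
        current = latest.get(rec[key_field])
        # On a tie, the later-arriving record wins (an assumption of this sketch).
        if current is None or rec[ordering_field] >= current[ordering_field]:
            latest[rec[key_field]] = rec
    return list(latest.values())


batch = [
    {"id": 1, "ts": 100, "name": "a"},
    {"id": 1, "ts": 300, "name": "c"},
    {"id": 1, "ts": 200, "name": "b"},  # arrives late but has a smaller ts, so it is dropped
    {"id": 2, "ts": 50, "name": "x"},
]

merged = precombine(batch)
print(merged)
```

Here the record with `ts` 300 survives for key 1 even though a record with `ts` 200 arrived after it, which is the behavior the ordering field is meant to provide.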
Kafka Connect Configs
- Kafka Sink Connect Configurations
Config | Required | Default | Description |
---|---|---|---|
hadoop.conf.dir | Y | N/A | |
hadoop.home | Y | N/A | |
bootstrap.servers | N | localhost:9092 | The bootstrap.servers of the Kafka cluster |
hoodie.kafka.control.topic | N | hudi-control-topic | |
hoodie.meta.sync.classes | N | org.apache.hudi.hive.HiveSyncTool | |
hoodie.meta.sync.enable | N | false | |
hoodie.schemaprovider.class | N | org.apache.hudi.schema.FilebasedSchemaProvider | |
hoodie.kafka.coordinator.write.timeout.secs | N | 300 | |
hoodie.kafka.compaction.async.enable | N | true | |
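A minimal sketch of how the table above could be applied when assembling connector settings: check that the entries marked as required are present, then layer user values over a few of the listed defaults. The helper `build_sink_config`, the directory paths, and the broker address are hypothetical; a real connector submission also needs further settings (connector class, topics, table name, and so on) that are outside the scope of these notes.

```python
import json

# Settings the table marks as required (no default value).
REQUIRED_KEYS = ("hadoop.conf.dir", "hadoop.home")

# A subset of the defaults listed in the table above.
DEFAULTS = {
    "hoodie.kafka.control.topic": "hudi-control-topic",
    "hoodie.meta.sync.enable": "false",
    "hoodie.meta.sync.classes": "org.apache.hudi.hive.HiveSyncTool",
    "hoodie.kafka.coordinator.write.timeout.secs": "300",
    "hoodie.kafka.compaction.async.enable": "true",
}


def build_sink_config(user_config):
    """Validate required keys, then overlay user values on the defaults."""
    missing = [key for key in REQUIRED_KEYS if key not in user_config]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return {**DEFAULTS, **user_config}


config = build_sink_config(
    {
        "hadoop.conf.dir": "/etc/hadoop/conf",   # hypothetical path
        "hadoop.home": "/opt/hadoop",            # hypothetical path
        "bootstrap.servers": "localhost:9092",   # hypothetical broker address
        "hoodie.meta.sync.enable": "true",       # overrides the default "false"
    }
)
print(json.dumps(config, indent=2, sort_keys=True))
```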
Amazon Web Services Configs
Config | Required | Default | Description |
---|---|---|---|
hoodie.aws.access.key | Y | N/A | AWS access key id |
hoodie.aws.secret.key | Y | N/A | AWS secret key |
hoodie.aws.session.token | N | N/A | AWS session token |
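Rather than hard-coding credentials, one option is to map the standard AWS environment variables onto the three settings above. The sketch below does only that mapping; the helper `aws_props_from_env` is hypothetical, the sample values are placeholders, and treating the access key and secret key as mandatory simply follows the "Required" column of the table.

```python
import os

# Standard AWS environment variable names mapped to the Hudi settings above.
ENV_TO_HUDI = {
    "AWS_ACCESS_KEY_ID": "hoodie.aws.access.key",
    "AWS_SECRET_ACCESS_KEY": "hoodie.aws.secret.key",
    "AWS_SESSION_TOKEN": "hoodie.aws.session.token",
}

# Per the table, the session token is optional; the other two are required.
REQUIRED_HUDI_KEYS = ("hoodie.aws.access.key", "hoodie.aws.secret.key")


def aws_props_from_env(env=None):
    """Build Hudi AWS properties from environment-style key/value pairs."""
    env = os.environ if env is None else env
    props = {
        hudi_key: env[var]
        for var, hudi_key in ENV_TO_HUDI.items()
        if env.get(var)  # skip variables that are unset or empty
    }
    missing = [key for key in REQUIRED_HUDI_KEYS if key not in props]
    if missing:
        raise ValueError("missing AWS settings: " + ", ".join(missing))
    return props


# Placeholder values for demonstration; pass nothing to read os.environ instead.
props = aws_props_from_env(
    {"AWS_ACCESS_KEY_ID": "EXAMPLE_KEY_ID", "AWS_SECRET_ACCESS_KEY": "EXAMPLE_SECRET"}
)
print(sorted(props))
```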