AWS DAS Certification Exam Notes (Athena & Glue)

Athena

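AWS Glue Crawler classification = UNKNOWN: every classifier returned certainty=0.0.
To read a large number of small files in Glue, use the DynamicFrame file grouping feature.
Glue hits OutOfMemory errors when reading a large number of small files = enable the 'groupFiles': 'inPartition' feature.
Both Glue Crawler and Athena can be used across Regions.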
Glue Crawler / COPY command triggered by an S3 object-created event (s3:ObjectCreated:*) = most current data.
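To make Glue skip specific S3 storage classes: excludeStorageClasses.
Glue scaling = Glue job metrics + the maximum capacity job parameter.
To clean up duplicate data caused by a Glue job rerun, use postactions in the DynamicFrameWriter class.
Glue supports the PySpark and Scala dialects; it does not support Hive scripts.
The Glue job bookmark feature avoids reprocessing data: only incremental data is processed.
To have a crawler update a manually created Glue catalog table = choose the existing table when defining the crawler.
To update the catalog automatically when a Glue job finishes = enable enableUpdateCatalog.
Glue FindMatches ML = match records that have no primary key.
Glue streaming ETL can connect to KDS (Kinesis Data Streams) and MSK, but not to KDF (Kinesis Data Firehose).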
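
Several of the notes above come down to option dictionaries passed to a GlueContext reader or writer. The following is a minimal sketch, not a definitive implementation: the helper function names, the S3 path, the table names, the key column and the 128 MB group size are all hypothetical choices for illustration. Only the option keys themselves (groupFiles, groupSize, excludeStorageClasses, enableUpdateCatalog, updateBehavior, partitionKeys, preactions, postactions) are Glue's own.

```python
def s3_read_options(path, group_size_bytes=128 * 1024 * 1024):
    """Options for reading many small S3 files (path and size are placeholders)."""
    return {
        "paths": [path],
        # Coalesce small files within each S3 partition into larger read groups,
        # which reduces the per-file task overhead that leads to OutOfMemory errors.
        "groupFiles": "inPartition",
        # Target group size in bytes, passed as a string.
        "groupSize": str(group_size_bytes),
        # Skip objects that sit in archival storage classes.
        "excludeStorageClasses": ["GLACIER", "DEEP_ARCHIVE"],
    }


def catalog_sink_options(partition_keys):
    """Options that let the job update the Data Catalog as it writes."""
    return {
        "enableUpdateCatalog": True,
        "updateBehavior": "UPDATE_IN_DATABASE",
        "partitionKeys": list(partition_keys),
    }


def redshift_dedup_options(database, target, staging, key):
    """Load into a staging table, then merge into the target in postactions,
    so that a rerun replaces rows instead of duplicating them."""
    return {
        "database": database,
        "dbtable": staging,
        "preactions": (
            f"DROP TABLE IF EXISTS {staging}; "
            f"CREATE TABLE {staging} AS SELECT * FROM {target} WHERE 1=2;"
        ),
        "postactions": (
            "BEGIN; "
            f"DELETE FROM {target} USING {staging} "
            f"WHERE {staging}.{key} = {target}.{key}; "
            f"INSERT INTO {target} SELECT * FROM {staging}; "
            f"DROP TABLE {staging}; "
            "END;"
        ),
    }


def run(glue_context, path, database, table):
    """Read grouped small files and write them to a catalog table.

    glue_context is expected to be an awsglue GlueContext created inside a
    Glue job; it is passed in so this module imports without awsglue.
    """
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options=s3_read_options(path),
        format="json",  # assumed input format
        # Job bookmarks track their state per transformation_ctx.
        transformation_ctx="read_small_files",
    )
    glue_context.write_dynamic_frame_from_catalog(
        frame=frame,
        database=database,
        table_name=table,
        transformation_ctx="write_sink",
        additional_options=catalog_sink_options(["year", "month"]),
    )
    return frame
```

The Redshift dictionary would be passed as connection_options to glue_context.write_dynamic_frame.from_jdbc_conf, together with a catalog connection name and a redshift_tmp_dir.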

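The note about triggering a crawler from an S3 object-created event is commonly wired up with a Lambda function subscribed to the bucket's notifications. The sketch below is one possible shape under stated assumptions: the crawler name "example-crawler" and the handler's extra keyword arguments are hypothetical, and the Glue client is injectable so the logic can be exercised without AWS credentials.

```python
from urllib.parse import unquote_plus


def handler(event, context, glue_client=None, crawler_name="example-crawler"):
    """Start a Glue crawler when an S3 notification reports new objects."""
    if glue_client is None:
        # Imported lazily so this module loads even where boto3 is absent.
        import boto3
        glue_client = boto3.client("glue")

    # Keep only object-created records; keys arrive URL-encoded in S3 events.
    created = [
        unquote_plus(record["s3"]["object"]["key"])
        for record in event.get("Records", [])
        if record.get("eventName", "").startswith("ObjectCreated:")
    ]
    if not created:
        return {"started": False, "keys": []}

    try:
        glue_client.start_crawler(Name=crawler_name)
    except glue_client.exceptions.CrawlerRunningException:
        # A run is already in progress; this sketch simply reports it
        # instead of retrying.
        return {"started": False, "keys": created}
    return {"started": True, "keys": created}
```

Starting the crawler on each event keeps the catalog close to the latest data, at the cost of skipped starts while a previous run is still in progress.
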
posted @ 2022-11-17 12:09 爱知菜
