[SAA] 32. Data Engineering

AWS Batch Overview

  • Run batch jobs as Docker images
  • Dynamic provisioning of the instances (EC2 & Spot Instances) - in VPC
  • Optimal quantity and type based on volume and requirements
  • No need to manage clusters, fully serverless
  • You just pay for the underlying EC2 instances
  • Example: batch processing of images, running thousands of concurrent jobs
  • Schedule Batch Jobs using CloudWatch Events (now Amazon EventBridge)
  • Orchestrate Batch Jobs using AWS Step Functions
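
The image-processing example fits an AWS Batch array job. One submission fans out into many child jobs, and each child finds its own slice of work through the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that Batch sets. The sketch below only builds the keyword arguments for boto3's `batch.submit_job()`, so it runs without AWS credentials. It also shows how a worker container might pick its image. The queue name, job definition and S3 keys are hypothetical.

```python
import os
from typing import Mapping, Sequence


def build_array_job_request(job_name: str, job_queue: str,
                            job_definition: str, num_items: int) -> dict:
    """Keyword arguments for boto3's batch client submit_job(), as an array job.

    AWS Batch array jobs have between 2 and 10,000 children, so a
    one-child-per-image layout only works inside that range.
    """
    if not 2 <= num_items <= 10_000:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "arrayProperties": {"size": num_items},
    }


def pick_work_item(items: Sequence[str],
                   env: Mapping[str, str] = os.environ) -> str:
    """Runs inside each child container; Batch sets a 0-based array index."""
    index = int(env["AWS_BATCH_JOB_ARRAY_INDEX"])
    return items[index]


if __name__ == "__main__":
    # Hypothetical queue, job definition and S3 keys.
    keys = ["images/0001.jpg", "images/0002.jpg", "images/0003.jpg"]
    request = build_array_job_request("resize-images", "my-job-queue",
                                      "image-resizer:1", len(keys))
    print(request["arrayProperties"])  # {'size': 3}
    print(pick_work_item(keys, {"AWS_BATCH_JOB_ARRAY_INDEX": "1"}))  # images/0002.jpg
    # With credentials configured: boto3.client("batch").submit_job(**request)
```

Passing the environment as a parameter keeps the worker logic testable on a laptop. In a real container the default `os.environ` is used.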

Lambda vs Batch

Lambda

  • Time limit: 15 mins
  • Limited runtimes
  • Limited temporary disk space
  • Serverless

Batch

  • No time limit
  • Any runtime, as long as it's packaged as a Docker image
  • Rely on EBS / instance store for disk space
  • Relies on EC2 (can be managed by AWS)
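
The comparison above reduces to a rough rule of thumb. This sketch uses a hypothetical helper, not an AWS API. It assumes the 15-minute (900-second) Lambda timeout from the list and a Lambda ephemeral-storage ceiling of 10,240 MB (the default `/tmp` is 512 MB). The helper falls back to Batch whenever a job exceeds either limit or needs a runtime Lambda does not offer.

```python
LAMBDA_MAX_SECONDS = 15 * 60   # Lambda hard timeout: 15 minutes
LAMBDA_MAX_TMP_MB = 10_240     # assumed maximum configurable /tmp size


def choose_compute(expected_seconds: float, scratch_disk_mb: int,
                   runtime_on_lambda: bool) -> str:
    """Return "Lambda" if the job fits every Lambda limit, otherwise "Batch"."""
    fits_lambda = (
        expected_seconds <= LAMBDA_MAX_SECONDS
        and scratch_disk_mb <= LAMBDA_MAX_TMP_MB
        and runtime_on_lambda
    )
    return "Lambda" if fits_lambda else "Batch"


if __name__ == "__main__":
    print(choose_compute(120, 256, True))       # Lambda
    print(choose_compute(3 * 3600, 256, True))  # Batch: longer than 15 minutes
    print(choose_compute(60, 50_000, True))     # Batch: needs EBS-sized scratch space
    print(choose_compute(60, 100, False))       # Batch: runtime only available as a Docker image
```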

Compute Environments

Managed Compute Environment

  • AWS Batch manages the capacity and instance types within the environment
  • You can choose On-Demand or Spot Instances
  • You can set a maximum price for Spot Instances
  • Launched within your own VPC
    • If you launch within your own private subnet, make sure it has access to the ECS service
    • This can be done with a NAT gateway / NAT instance or with a VPC endpoint for ECS
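
For the private-subnet case, instances must be able to reach ECS either through a NAT route or through interface VPC endpoints. The function below is a hypothetical offline check over a simplified model of a route table and the endpoint service names present in the VPC. `com.amazonaws.<region>.ecs`, `ecs-agent` and `ecs-telemetry` are the ECS interface endpoints. Pulling images from ECR would need additional endpoints, which this sketch does not check.

```python
from typing import Iterable, Set, Tuple

# (destination CIDR, target ID): a simplified view of one route-table entry
Route = Tuple[str, str]


def ecs_endpoint_services(region: str) -> Set[str]:
    """Interface-endpoint service names the ECS container agent talks to."""
    return {f"com.amazonaws.{region}.{name}"
            for name in ("ecs", "ecs-agent", "ecs-telemetry")}


def private_subnet_reaches_ecs(routes: Iterable[Route],
                               vpc_endpoint_services: Iterable[str],
                               region: str) -> bool:
    """Hypothetical check: a NAT default route, or all ECS endpoints present."""
    for destination, target in routes:
        # NAT gateway IDs start with "nat-"; a NAT instance is reached via its
        # instance ("i-") or network interface ("eni-") ID (assumed to be NAT here).
        if destination == "0.0.0.0/0" and target.startswith(("nat-", "i-", "eni-")):
            return True
    return ecs_endpoint_services(region) <= set(vpc_endpoint_services)


if __name__ == "__main__":
    local_only = [("10.0.0.0/16", "local")]
    print(private_subnet_reaches_ecs(local_only, [], "us-east-1"))  # False
    via_nat = local_only + [("0.0.0.0/0", "nat-0123456789abcdef0")]
    print(private_subnet_reaches_ecs(via_nat, [], "us-east-1"))     # True
    endpoints = ecs_endpoint_services("us-east-1")
    print(private_subnet_reaches_ecs(local_only, endpoints, "us-east-1"))  # True
```

A default route to an internet gateway (`igw-`) is deliberately not accepted, because instances in a private subnet have no public IPs to use it.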

Unmanaged Compute Environment

  • You control and manage instance configuration, provisioning and scaling
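
The two environment types differ mainly in who owns `computeResources`. The sketch below builds keyword arguments for boto3's `batch.create_compute_environment()`. A managed environment lists On-Demand (`EC2`) or `SPOT` capacity, and `bidPercentage` is how the Spot maximum price is expressed, as a percentage of the On-Demand price. An unmanaged environment leaves instances to you. Subnet, security group and instance-profile values are hypothetical placeholders.

```python
from typing import Optional, Sequence


def managed_compute_env(name: str, subnets: Sequence[str],
                        security_group_ids: Sequence[str], instance_role: str,
                        max_vcpus: int,
                        spot_max_percent: Optional[int] = None) -> dict:
    """kwargs for boto3's batch client create_compute_environment() (MANAGED).

    Without spot_max_percent the environment uses On-Demand ("EC2") capacity.
    With it, Spot is used and bidPercentage caps the Spot price at that
    percentage of the On-Demand price.
    """
    use_spot = spot_max_percent is not None
    resources = {
        "type": "SPOT" if use_spot else "EC2",
        "allocationStrategy": ("SPOT_CAPACITY_OPTIMIZED" if use_spot
                               else "BEST_FIT_PROGRESSIVE"),
        "minvCpus": 0,                    # scale to zero when no jobs are queued
        "maxvCpus": max_vcpus,
        "instanceTypes": ["optimal"],     # let AWS Batch pick the instance types
        "subnets": list(subnets),         # instances launch in your own VPC
        "securityGroupIds": list(security_group_ids),
        "instanceRole": instance_role,    # ECS instance profile for the hosts
    }
    if use_spot:
        resources["bidPercentage"] = spot_max_percent
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": resources,
    }


def unmanaged_compute_env(name: str) -> dict:
    """kwargs for an UNMANAGED environment: you launch and scale the instances."""
    return {"computeEnvironmentName": name, "type": "UNMANAGED", "state": "ENABLED"}


if __name__ == "__main__":
    env = managed_compute_env(
        "spot-env", ["subnet-0abc"], ["sg-0abc"],  # hypothetical IDs
        "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
        max_vcpus=256, spot_max_percent=60)
    print(env["computeResources"]["type"], env["computeResources"]["bidPercentage"])  # SPOT 60
```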

Kinesis

  • The CloudWatch agent cannot send data directly to Kinesis Data Firehose or Kinesis Data Streams (CloudWatch Logs subscription filters can forward logs to them)
  • Kinesis Data Firehose is near real-time
  • The Kinesis Agent can be configured to send data directly to Kinesis Data Firehose
  • Firehose can deliver data to S3
  • Use Lambda to send data to Elasticsearch
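
Firehose is near real-time rather than real-time because it buffers incoming records. It delivers a batch (for example, one S3 object) when either a buffer-size or a buffer-interval threshold is reached, whichever comes first. The simulation below models only that flush rule, with hypothetical thresholds and arrival times. It assumes the interval timer starts at the first record of a batch, and it does not call Firehose.

```python
from typing import List, Tuple

Record = Tuple[float, int]          # (arrival time in seconds, size in bytes)
Flush = Tuple[float, List[Record]]  # (delivery time, records delivered together)


def simulate_firehose_buffer(records: List[Record], size_limit_bytes: int,
                             interval_s: float) -> List[Flush]:
    """Deliver a batch when the buffer reaches size_limit_bytes or when
    interval_s has passed since the batch's first record, whichever is first."""
    flushes: List[Flush] = []
    batch: List[Record] = []
    batch_bytes = 0
    batch_start = 0.0
    for arrival, size in sorted(records):
        # The interval expired before this record arrived: that batch left already.
        if batch and arrival - batch_start >= interval_s:
            flushes.append((batch_start + interval_s, batch))
            batch, batch_bytes = [], 0
        if not batch:
            batch_start = arrival
        batch.append((arrival, size))
        batch_bytes += size
        if batch_bytes >= size_limit_bytes:  # size threshold hit first
            flushes.append((arrival, batch))
            batch, batch_bytes = [], 0
    if batch:  # leftover records wait for the interval to expire
        flushes.append((batch_start + interval_s, batch))
    return flushes


if __name__ == "__main__":
    arrivals = [(0, 100), (10, 100), (20, 900), (100, 50)]
    for when, batch in simulate_firehose_buffer(arrivals, 1_000, 60):
        print(when, len(batch))
    # 20 3    (size limit reached at t=20)
    # 160 1   (the t=100 record waits out the 60 s interval)
```

The lone record at t=100 is delivered 60 seconds after it arrived. That delay is the "near" in near real-time. Smaller buffer hints reduce it at the cost of more, smaller objects in S3.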

Athena

  • QuickSight for visualization dashboards
  • CloudTrail can stream logs to CloudWatch Logs
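
Athena is often paired with CloudTrail, because CloudTrail already writes its logs to S3. This sketch builds keyword arguments for boto3's `athena.start_query_execution()` to count access-denied API calls. It assumes a table named `cloudtrail_logs` with the `eventsource`, `eventname` and `errorcode` columns of the usual CloudTrail table definition. The table name, database and results bucket are assumptions to adapt.

```python
import re


def cloudtrail_access_denied_query(database: str, output_location: str,
                                   table: str = "cloudtrail_logs") -> dict:
    """kwargs for boto3's athena client start_query_execution().

    Athena scans the CloudTrail log files where they sit in S3 and writes the
    results under output_location; QuickSight can use the same table as a
    dashboard data source.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    # The table name is spliced into the SQL text, so only allow plain identifiers.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("unexpected table name")
    query = (
        "SELECT eventsource, eventname, COUNT(*) AS denied "
        f"FROM {table} "
        "WHERE errorcode = 'AccessDenied' "
        "GROUP BY eventsource, eventname "
        "ORDER BY denied DESC LIMIT 20"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = cloudtrail_access_denied_query("default", "s3://my-athena-results/")
    print(request["QueryString"])
    # With credentials: boto3.client("athena").start_query_execution(**request)
```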

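  • EMR can use Spot Instances (for example through instance fleets) to control cost
  • Athena: queries data in place, so the data must be in S3
  • Redshift Spectrum: query data in S3 directly from Redshift, without loading it into tables first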
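
EMR instance fleets let each node group mix On-Demand and Spot capacity. This sketch follows a common cost-control pattern; the notes do not prescribe it. The primary and core nodes stay On-Demand, because core nodes hold HDFS data. Task nodes run on Spot, and several instance types are listed so EMR can fill Spot capacity from whichever is available. The functions build the `Instances["InstanceFleets"]` value for boto3's `emr.run_job_flow()`. Fleet sizes and instance types are hypothetical.

```python
from typing import List, Sequence


def instance_fleet(fleet_type: str, on_demand: int, spot: int,
                   instance_types: Sequence[str]) -> dict:
    """One entry of Instances["InstanceFleets"] for boto3's emr.run_job_flow()."""
    return {
        "Name": f"{fleet_type.lower()}-fleet",
        "InstanceFleetType": fleet_type,        # MASTER, CORE or TASK
        "TargetOnDemandCapacity": on_demand,
        "TargetSpotCapacity": spot,
        # Several types give EMR more Spot pools to draw capacity from.
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in instance_types
        ],
    }


def cost_conscious_fleets() -> List[dict]:
    """Primary and core nodes On-Demand for stability; task nodes on Spot."""
    return [
        instance_fleet("MASTER", 1, 0, ["m5.xlarge"]),
        instance_fleet("CORE", 2, 0, ["m5.xlarge", "m5a.xlarge"]),
        instance_fleet("TASK", 0, 8, ["m5.xlarge", "m5a.xlarge", "r5.xlarge"]),
    ]


if __name__ == "__main__":
    for fleet in cost_conscious_fleets():
        print(fleet["InstanceFleetType"], fleet["TargetOnDemandCapacity"],
              fleet["TargetSpotCapacity"])
    # Passed as: run_job_flow(..., Instances={"InstanceFleets": cost_conscious_fleets(), ...})
```

If Spot task nodes are reclaimed, the cluster loses compute but not stored data. That is why task fleets are the usual place to use Spot.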

posted @ Zhentiw