[AWS - DA] S3
Overview
- Object values are the content of the body:
- Max Object Size 5TB (5000GB)
- If uploading more than 5GB, must use "multi-part upload"
- Metadata (list of text key/value pairs)
- Tags (up to 10)
- Version ID
Versioning
- It is enabled at bucket level
- Overwriting the same key increments the version: 1, 2, 3...
- It is best practice to version your buckets
- Any file that is not versioned prior to enabling versioning will have version "null"
- Suspending versioning does not delete the previous versions
- HTTPS is mandatory for SSE-C
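The versioning rules above can be sketched with a small in-memory toy model. This is not the S3 API: `ToyVersionedBucket` and its methods are hypothetical names made up for illustration, and the "1, 2, 3" version IDs follow the simplification used in these notes (real S3 version IDs are opaque strings).

```python
from collections import defaultdict


class ToyVersionedBucket:
    """In-memory toy model of S3 versioning states (hypothetical, not the real API)."""

    def __init__(self):
        self.versioning = "Disabled"        # "Disabled", "Enabled", or "Suspended"
        self.versions = defaultdict(list)   # key -> list of (version_id, body)
        self._counter = defaultdict(int)    # key -> last numeric version issued

    def put(self, key, body):
        if self.versioning == "Enabled":
            # Each overwrite of the same key adds a new version.
            self._counter[key] += 1
            version_id = str(self._counter[key])
            self.versions[key].append((version_id, body))
        else:
            # Unversioned or suspended: the write owns the "null" version
            # and replaces any earlier "null" version of that key.
            version_id = "null"
            kept = [v for v in self.versions[key] if v[0] != "null"]
            self.versions[key] = kept + [(version_id, body)]
        return version_id

    def list_version_ids(self, key):
        return [version_id for version_id, _ in self.versions[key]]


bucket = ToyVersionedBucket()
bucket.put("report.csv", b"draft")   # written before versioning is enabled
bucket.versioning = "Enabled"
bucket.put("report.csv", b"v1")
bucket.put("report.csv", b"v2")
bucket.versioning = "Suspended"      # suspending keeps the earlier versions
ids = bucket.list_version_ids("report.csv")
print(ids)                           # ['null', '1', '2']
```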
Security
- User based
- IAM policies - which API calls should be allowed for a specific user from IAM console
- Resource based
- Bucket Policies - bucket wide rules from the S3 console - allows cross account
- Object Access Control List (ACL) - finer grain
- Bucket Access Control List (ACL) - less common
- Access is granted if the user's IAM permissions allow it OR the resource policy allows it
- AND there is no explicit DENY
- Networking
- Supports VPC Endpoints (for instances in a VPC without internet access)
- Logging and Audit
- S3 Access Logs can be stored in another S3 bucket
- API calls can be logged in AWS CloudTrail
- User security
- MFA Delete: MFA can be required in versioned buckets to delete objects
- Pre-signed URLs: URLs that are valid only for a limited time
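The access rule from the Security list (IAM allow OR resource policy allow, AND no explicit DENY) can be written as a one-line boolean check. This is a sketch of the rule as stated in these notes, not of the full AWS policy evaluation logic; `is_access_allowed` and its parameters are hypothetical names.

```python
def is_access_allowed(iam_allows: bool, resource_policy_allows: bool, explicit_deny: bool) -> bool:
    """Return True when (IAM allow OR resource policy allow) AND no explicit deny."""
    return (iam_allows or resource_policy_allows) and not explicit_deny


# The bucket policy alone is enough when nothing denies the request.
print(is_access_allowed(iam_allows=False, resource_policy_allows=True, explicit_deny=False))  # True
# An explicit deny wins even if both policies allow.
print(is_access_allowed(iam_allows=True, resource_policy_allows=True, explicit_deny=True))    # False
```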
Replication (CRR & SRR)
- Must enable versioning in source and destination
- Cross Region Replication (CRR)
- Same Region Replication (SRR)
- Buckets can be in different accounts
- Copying is asynchronous
- Must give proper IAM permissions to S3
- CRR - Use case: compliance, lower latency access, replication across accounts
- SRR - Use case: log aggregation, live replication between production and test accounts
- After activating, only new objects are replicated
- For DELETE operations:
- Can replicate delete markers from source to target (optional setting)
- Deletions with a version ID are not replicated (to avoid malicious deletes)
- There is no "chaining" of replication
- If bucket 1 has replication into bucket 2, which has replication into bucket 3
- Then objects created in bucket 1 are not replicated to bucket 3
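The DELETE and "no chaining" rules above can be modeled in a few lines. This is a toy sketch of the behavior described in these notes, not an AWS API; the function names and the `rules` mapping are hypothetical.

```python
def replicates_delete(has_version_id: bool, replicate_delete_markers: bool) -> bool:
    """Decide whether a DELETE on the source is replicated to the destination."""
    if has_version_id:
        # Permanent deletes of a specific version are never replicated.
        return False
    # A DELETE without a version ID only creates a delete marker,
    # which is replicated only when the optional setting is on.
    return replicate_delete_markers


def replication_targets(source: str, rules: dict) -> list:
    """Buckets that receive a new object written directly to `source`."""
    # Only direct destinations count: replicas are not re-replicated (no chaining).
    return list(rules.get(source, []))


rules = {"bucket-1": ["bucket-2"], "bucket-2": ["bucket-3"]}
print(replication_targets("bucket-1", rules))   # ['bucket-2'], bucket-3 is not reached
print(replicates_delete(has_version_id=True, replicate_delete_markers=True))   # False
```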
Consistency Model
- Strong consistency as of December 2020
- After a:
- successful write of a new object (new PUT)
- or an overwrite or delete of an existing object (overwrite PUT or DELETE)
- ...any:
- subsequent read request immediately receives the latest version of the object
- subsequent list request immediately reflects changes
- Available at no additional cost, without any performance impact
MFA - DELETE
- MFA forces the user to generate a code on a device before doing important operations on S3
- To use MFA-DELETE, enable Versioning on the S3 bucket
- Only the bucket owner (root account) can enable/disable MFA-DELETE
- MFA-DELETE currently cannot be enabled from the console; use the CLI (or the API/SDK)
- You will need MFA to
- permanently delete an object version
- suspend versioning on the bucket
- You won't need MFA for
- enabling versioning
- listing deleted versions
- Glacier min storage duration: 90 days
- Deep Archive: 180 days
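For the MFA-DELETE notes above, the request that turns the feature on can be sketched as the keyword arguments for boto3's `put_bucket_versioning` call. The helper name `mfa_delete_request`, the bucket name, the device ARN, and the code are all hypothetical placeholders; nothing is sent to AWS here.

```python
def mfa_delete_request(bucket: str, mfa_serial: str, mfa_code: str, enable: bool = True) -> dict:
    """Build keyword arguments for the S3 client's put_bucket_versioning call."""
    return {
        "Bucket": bucket,
        # MFA Delete requires versioning, so Status stays "Enabled".
        "VersioningConfiguration": {
            "Status": "Enabled",
            "MFADelete": "Enabled" if enable else "Disabled",
        },
        # Device serial number, a space, then the current code from the device.
        "MFA": f"{mfa_serial} {mfa_code}",
    }


request = mfa_delete_request(
    "example-bucket",                                          # hypothetical bucket
    "arn:aws:iam::123456789012:mfa/root-account-mfa-device",   # hypothetical device ARN
    "123456",                                                  # hypothetical MFA code
)
print(request["VersioningConfiguration"])
```

With root credentials configured, the dictionary could be passed on as `boto3.client("s3").put_bucket_versioning(**request)`.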
Amazon Glacier - 3 retrieval options
- Expedited (1 to 5 mins)
- Standard (3 to 5 hours)
- Bulk (up to 12 hours)
- Minimum storage duration of 90 days
Amazon Glacier Deep Archive - for long term storage - cheaper:
- Standard (12 hours)
- Bulk (48 hours)
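The retrieval options above map onto the `Tier` field of a restore request. The sketch below builds the keyword arguments for boto3's `restore_object` call and rejects tiers that the lists above do not offer for a storage class. `RETRIEVAL_TIERS` and `restore_request` are hypothetical names, and the bucket and key are placeholders; no request is sent.

```python
# Retrieval tiers per storage class, as listed in the notes above.
RETRIEVAL_TIERS = {
    "GLACIER": {"Expedited": "1 to 5 minutes", "Standard": "3 to 5 hours", "Bulk": "up to 12 hours"},
    "DEEP_ARCHIVE": {"Standard": "12 hours", "Bulk": "48 hours"},
}


def restore_request(bucket: str, key: str, storage_class: str, tier: str, days: int = 1) -> dict:
    """Build keyword arguments for the S3 client's restore_object call."""
    allowed = RETRIEVAL_TIERS[storage_class]
    if tier not in allowed:
        raise ValueError(f"{tier} is not available for {storage_class}")
    return {
        "Bucket": bucket,
        "Key": key,
        # Days is how long the restored copy stays available.
        "RestoreRequest": {"Days": days, "GlacierJobParameters": {"Tier": tier}},
    }


req = restore_request("example-bucket", "archive/2019.tar", "GLACIER", "Expedited")
print(req["RestoreRequest"])
```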
S3 Performance
Multi-Part upload
- recommended for files > 100MB
- must use for files > 5 GB
- Can help parallelize upload (speed up transfers)
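To make the parallelization concrete, here is a minimal sketch of how an object could be split into parts before uploading. It assumes the S3 limits of a 5 MiB minimum part size (except the last part) and at most 10,000 parts; `plan_parts` is a hypothetical helper, and each tuple would feed one `upload_part` call in a real client.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> list:
    """Split an object into (part_number, offset, length) tuples."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


parts = plan_parts(250 * MIB)
print(len(parts))    # 3
print(parts[-1])     # (3, 209715200, 52428800): the last part is the 50 MiB remainder
```

Because the parts are independent byte ranges, they can be uploaded concurrently and retried individually.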
S3 Transfer Acceleration
- Increases transfer speed by sending the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
- Compatible with multi-part upload
S3 Byte-Range Fetches
- Parallelize GETs by requesting specific byte ranges
- Better resilience in case of failures
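A byte-range fetch is an ordinary GET with an HTTP `Range` header. The sketch below computes the header values that cover a whole object; `byte_ranges` is a hypothetical helper, and each string could be passed as the `Range` argument of a `get_object` call, one per worker.

```python
def byte_ranges(object_size: int, chunk_size: int) -> list:
    """Return HTTP Range header values covering the whole object."""
    ranges = []
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1   # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(10, 4))   # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

If one range fails, only that range needs to be requested again, which is where the better resilience comes from.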
KMS Limitation
- If you use SSE-KMS, you may be impacted by the KMS limits
- When you upload, it calls the GenerateDataKey KMS API
- When you download, it calls the Decrypt KMS API
- These calls count toward the KMS quota per second (5,500, 10,000, or 30,000 req/s depending on the region)
- You can request a quota increase using the Service Quotas console
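A rough back-of-the-envelope check of the KMS limit, assuming (as the list above states) one `GenerateDataKey` call per upload and one `Decrypt` call per download. `kms_headroom` and the traffic figures are hypothetical.

```python
def kms_headroom(uploads_per_s: float, downloads_per_s: float, quota_per_s: int) -> float:
    """Remaining KMS requests per second, assuming one KMS call per S3 request."""
    used = uploads_per_s + downloads_per_s   # GenerateDataKey calls + Decrypt calls
    return quota_per_s - used


# 3,000 uploads/s and 4,000 downloads/s against a 5,500 req/s quota.
print(kms_headroom(3000, 4000, 5500))    # -1500, so requests would be throttled
print(kms_headroom(3000, 4000, 10000))   # 3000 req/s to spare
```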
S3 Select & Glacier Select
- Retrieve less data using SQL by performing server side filtering
- Can filter by rows & columns (simple SQL statements)
- Less network transfer, less CPU cost client-side
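As an illustration of server-side filtering, the sketch below builds the keyword arguments for boto3's `select_object_content` call on a CSV object. The helper `select_request`, the bucket, the key, and the column names `name` and `city` are hypothetical; the code only constructs the request and does not contact AWS.

```python
def select_request(bucket: str, key: str, sql: str) -> dict:
    """Build keyword arguments for the S3 client's select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Treat the first CSV line as column names so the SQL can refer to them.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


# Filter by column (name) and by row (city = 'Paris') on the server side.
request = select_request(
    "example-bucket",
    "people.csv",
    "SELECT s.name FROM S3Object s WHERE s.city = 'Paris'",
)
print(request["Expression"])
```

Only the matching rows and the selected column travel over the network, instead of the whole object.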
S3 Bucket: Bucket configurations have an eventual consistency model.
S3 Object: Objects have a strong consistency model.