[Cloud Architect] 1. Design for Availability, Reliability, and Resiliency

Introduction to Availability, Reliability and Resiliency

Availability

A measure of the proportion of time that a system is operating as expected. Typically measured as a percentage.

Failover should be fast
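
Since availability is usually stated as a percentage, it can help to translate a target into a downtime budget. The following is a minimal sketch under simple assumptions (a 365-day year, and function names that are purely illustrative and not part of any AWS tooling):

```python
# Illustrative only: relate an availability percentage to uptime and downtime.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def availability_percent(uptime_hours: float, total_hours: float) -> float:
    """Availability as the percentage of a period the system was up."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * uptime_hours / total_hours


def allowed_downtime_hours(target_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime a target percentage permits over the period."""
    return period_hours * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours down per year")
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, which is one reason fast failover matters more as the target rises.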

Reliability

A measure of how likely something is to be operating as expected at any given point in time. Said differently, how often something fails.

Uptime should be long

Resiliency

A measure of a system's recoverability. How quickly and easily a system can be brought back online.

Survive the failure
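
One common way to relate the three terms, offered here as an assumption since these notes do not use the names, is through mean time between failures (MTBF, a reliability figure) and mean time to repair (MTTR, a resiliency figure). Steady-state availability is then MTBF / (MTBF + MTTR). A small sketch:

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is up, given how often it fails (MTBF)
    and how long each recovery takes (MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical numbers: the same failure rate with slow vs. fast recovery.
    slow = steady_state_availability(mtbf_hours=999, mttr_hours=1)
    fast = steady_state_availability(mtbf_hours=999, mttr_hours=0.1)
    print(f"slow recovery: {slow:.5f}, fast recovery: {fast:.5f}")
```

The sketch shows that availability can be raised either by failing less often (longer uptime) or by recovering faster.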

When To Use Resilient Solutions

Generally speaking, resilient systems are more expensive than non-resilient systems.

They can also be more complex to run and maintain. Thus, you should consider the tradeoffs of where you do or do not need resiliency.

Think through the use case and business requirements of a particular environment before deciding if you should be building out full redundancy.

Production and pre-production environments need resiliency

Test and dev environments typically do not

Introduction To AZs And Regions

Availability Zones and Regions are the core components within AWS that allow for fault isolation.

The AWS global infrastructure is made up of multiple geographic regions spread around the world. They are connected via high-speed networking but are independent.

An AWS region is made up of multiple Availability Zones. These Availability Zones allow for fault isolation within a region and provide the simplest way to achieve a significant level of redundancy.

Regions
Availability Zones (AZs)
Virtual Private Clouds (VPCs)
AWS VPC Networking

In this lesson, you will learn about the physical and networking infrastructure that AWS provides. You will build infrastructure that uses many of the capabilities of AWS's global footprint.

You will begin at the largest level with AWS Regions, then learn about Availability Zones within Regions. From there, you'll get to know more about VPCs, your own network within AWS, and then how to use AWS networking features to create custom network layouts.

As you learn about the capabilities of AWS for reliability and redundancy, it is important to consider what level of availability is required for a use case or environment. Some non-critical cases require no redundancy, while production environments typically do require it.

When considering how to architect a service, you should think about how a disruption or data loss in that service would impact your business.

You'll need to think about what it will take to restore service as well as what your business has committed to in its contractual obligations.

Regions

An AWS Region is a geographically separate portion of the AWS global infrastructure. Each region is separated from the others by hundreds of miles. They are isolated so that they are not interdependent, but they are connected by a global high speed, high bandwidth private AWS network.

Running in multiple regions is completely optional. In fact, if you don't intentionally try to, you'll only be running in one region. One reason for this is that there is some amount of additional cost to run in multiple regions. You must determine if your use case warrants the cost and complexity.

Most AWS services must be managed on a per-region basis. When you create a resource in one region, it does not exist in the other regions. There are a few exceptions, however. These exceptions are "global" AWS services such as IAM, where identity and access management must span the entirety of AWS, or CloudFront, which is not managed on a per-region basis. S3 is a partial exception: bucket names share a global namespace, but each bucket is created in a specific region.

Availability Zones

An AWS Availability Zone (AZ) is a subsection of an AWS Region. A Region has multiple Availability Zones, and the exact number depends on the Region. An AZ consists of one or more physically independent data centers with their own power and network connectivity. AZs within a region are generally separated by several miles and connected to each other with extremely high bandwidth network connections.

Multi-AZ

Many AWS services are able to make use of multiple AZs if you configure them to do so. When services are configured to use multiple AZs, they are considered to be highly available. Even if an entire AZ (complete datacenter) went down, your service would continue to run with minimal interruption.
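
To see why spreading a service across AZs helps, consider a rough model in which the service stays up as long as at least one AZ is up. The sketch below assumes AZ failures are independent and uses a made-up per-AZ availability figure; neither is an AWS-published number.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Availability of a service that survives while at least one AZ is up.

    Assumes independent failures, which is a simplification.
    """
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("per_az_availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    probability_all_down = (1.0 - per_az_availability) ** az_count
    return 1.0 - probability_all_down


if __name__ == "__main__":
    # Hypothetical per-AZ availability of 99%.
    for count in (1, 2, 3):
        print(count, "AZ(s):", f"{combined_availability(0.99, count):.6f}")
```

Under these assumptions, a second AZ moves the hypothetical service from 99% to roughly 99.99%, at the price of running duplicate infrastructure.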

Virtual Private Clouds

A Virtual Private Cloud (VPC) is a private network that you control within the larger AWS network. These private networks allow you to configure your network architecture the way you desire. A VPC is region specific. You decide if your VPCs connect to each other or if you keep them independent. If you connect your VPCs, it's up to you to configure them according to regular networking guidelines.
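
One regular networking guideline that matters when connecting VPCs is that their address ranges should not overlap. VPC peering, for example, requires non-overlapping ranges. The sketch below uses Python's standard ipaddress module to check a set of planned ranges; the VPC names and ranges are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(vpc_cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of VPC names whose address ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


if __name__ == "__main__":
    planned = {
        "vpc-a": "10.0.0.0/16",    # hypothetical
        "vpc-b": "10.1.0.0/16",    # hypothetical, distinct from vpc-a
        "vpc-c": "10.0.128.0/17",  # hypothetical, falls inside vpc-a
    }
    print(find_overlaps(planned))
```

Running a check like this before creating VPCs is cheaper than renumbering a network later.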

Network Ranges

A network range is a consecutive set of IP addresses.

Network ranges are described using "CIDR" notation. CIDR notation consists of the first IP address of the network range, followed by a "slash", followed by a number. That number is the prefix length: how many leading bits of the address are fixed. The remaining bits determine how many consecutive addresses are in the range. A "/24" range has 256 addresses, while a "/16" has 65,536 addresses.
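
The size of a range follows from its prefix length: an IPv4 range with prefix n contains 2^(32 - n) addresses. A minimal sketch using Python's standard ipaddress module (the function name and the example ranges are illustrative):

```python
import ipaddress


def describe_range(cidr: str) -> dict:
    """Summarize a CIDR range: first and last address, prefix, and size."""
    network = ipaddress.ip_network(cidr, strict=True)
    return {
        "first": str(network.network_address),
        "last": str(network.broadcast_address),
        "prefix_length": network.prefixlen,
        "address_count": network.num_addresses,
    }


if __name__ == "__main__":
    for cidr in ("10.0.0.0/24", "10.0.0.0/16"):
        print(cidr, describe_range(cidr))
```

With strict=True, ip_network raises ValueError if the address before the slash is not the first address of the range, which catches a common typing mistake.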

Subnet

A subnet is a subsection of the VPC network range.
The unique thing about a VPC subnet is that it is tied to an availability zone.
When you create a subnet, you define which availability zone that subnet is attached to.
Anything you then create in that subnet gets created in the associated AZ.
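
Planning subnets amounts to carving the VPC range into smaller blocks and assigning each block to an AZ. Here is a rough sketch with the standard ipaddress module; the AZ names and the choice of /24 subnets are assumptions made for illustration.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, az_names: list[str], new_prefix: int) -> dict[str, str]:
    """Assign one subnet block of the given prefix length to each AZ."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive blocks of the requested size, in order.
    blocks = list(islice(vpc.subnets(new_prefix=new_prefix), len(az_names)))
    if len(blocks) < len(az_names):
        raise ValueError("VPC range is too small for one subnet per AZ")
    return {az: str(block) for az, block in zip(az_names, blocks)}


if __name__ == "__main__":
    # Example AZ names; substitute the AZs available in your own region.
    print(plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"], 24))
```

Because each subnet lives in exactly one AZ, a Multi-AZ design needs at least one subnet per AZ it intends to use.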

Route Table

You can create a route table and attach it to one or more subnets
A route table can be shared across subnets in different AZs

Internet gateway

An internet gateway (IGW) is a scalable AWS component for accessing the internet from a VPC.
The IGW appears as a target in the route table of a subnet, and this is how resources in the subnet get to the internet.
If a subnet's route table has a route to the IGW, then instances with public IP addresses can send traffic directly to the internet,
and the internet can send traffic directly to those instances.

NAT Gateway

When you need resources in a subnet to be able to connect out to the internet, but do not want your instances open to inbound connections from the internet, you can use a NAT Gateway.
A NAT Gateway is a scalable device that provides network address translation to resources within a VPC.
A NAT Gateway also acts as a target in a route table.
Unlike an internet gateway, which is Multi-AZ behind the scenes, a NAT Gateway lives in a particular availability zone.
You can configure subnets in any AZ in the region to use a NAT Gateway, but keep in mind that for redundancy's sake you will want multiple NAT Gateways, in different AZs, in case one AZ were to fail.
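
The difference between a public and a private subnet comes down to where the default route in its route table points. The sketch below models two route tables as plain dictionaries and picks the most specific matching route for a destination. The target IDs are made-up placeholders, and this is a toy model, not a call to any AWS API.

```python
import ipaddress

# Hypothetical route tables: destination range -> target.
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example-az-a"}


def resolve_target(routes: dict[str, str], destination: str) -> str:
    """Return the target of the most specific route containing destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The longest prefix is the most specific match.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


if __name__ == "__main__":
    print(resolve_target(PUBLIC_ROUTES, "10.0.1.5"))       # stays inside the VPC
    print(resolve_target(PUBLIC_ROUTES, "198.51.100.7"))   # leaves via the internet gateway
    print(resolve_target(PRIVATE_ROUTES, "198.51.100.7"))  # leaves via the NAT gateway
```

In this model, giving each AZ's private subnets their own table pointing at a NAT gateway in the same AZ is what keeps one AZ's failure from cutting off outbound traffic in the others.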

Security group

A security group is like a stateful firewall that you can attach to an EC2 instance, RDS database, or other type of instance in AWS.
Stateful means that you describe which ports you will accept initial connections on, and once a connection is established, all related inbound and outbound traffic is allowed.

NACL

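A network ACL (NACL) is a stateless firewall.
Stateless means each packet is evaluated on its own against the rules, so inbound and outbound traffic must each be explicitly allowed. If any given packet in a TCP stream isn't allowed, the whole connection will fail.
A NACL is applied to a subnet.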
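
The stateful versus stateless distinction can be illustrated with a toy model. The classes below are hypothetical simplifications written for these notes; they ignore protocols, CIDR-based rules, rule ordering, and the default outbound rules that real security groups have.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    direction: str  # "in" or "out", relative to the protected instance
    src: str
    src_port: int
    dst: str
    dst_port: int


@dataclass
class StatefulFirewall:
    """Security-group-like: replies to accepted connections are allowed."""
    allowed_inbound_ports: set[int]
    _tracked: set = field(default_factory=set)

    def allows(self, packet: Packet) -> bool:
        if packet.direction == "in":
            if packet.dst_port in self.allowed_inbound_ports:
                # Remember the connection so the reply is recognized later.
                self._tracked.add((packet.src, packet.src_port, packet.dst, packet.dst_port))
                return True
            return False
        # Outbound: allowed here only as a reply to a tracked connection.
        return (packet.dst, packet.dst_port, packet.src, packet.src_port) in self._tracked


@dataclass
class StatelessFirewall:
    """NACL-like: every packet is judged on its own, in each direction."""
    allowed_inbound_ports: set[int]
    allowed_outbound_ports: set[int]

    def allows(self, packet: Packet) -> bool:
        if packet.direction == "in":
            return packet.dst_port in self.allowed_inbound_ports
        return packet.dst_port in self.allowed_outbound_ports


if __name__ == "__main__":
    request = Packet("in", "203.0.113.10", 50000, "10.0.1.5", 443)
    reply = Packet("out", "10.0.1.5", 443, "203.0.113.10", 50000)
    stateful = StatefulFirewall({443})
    stateless = StatelessFirewall({443}, set())
    print(stateful.allows(request), stateful.allows(reply))
    print(stateless.allows(request), stateless.allows(reply))
```

In the stateless case the reply is dropped until an outbound rule covering the client's port is added, which is why NACL rules usually need to account for return traffic explicitly.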

Debugging VPC Networks

VPC Flow Logs

Flow logs allow you to see higher level network debugging information like the source port and source IP, and destination port and destination IP of traffic flowing within your VPC.
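
In the default flow log format, each record is a space-separated line of fields. The sketch below splits such a line into named fields. The sample record is made up for illustration, and the parser assumes the default (version 2) field order; custom formats would need a different field list.

```python
# Field order of the default VPC Flow Logs record format (version 2).
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_log_line(line: str) -> dict[str, str]:
    """Split one default-format flow log record into named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    # Made-up sample record; the account and interface IDs are placeholders.
    sample = ("2 123456789012 eni-0example 203.0.113.10 10.0.1.5 "
              "50000 443 6 10 8400 1700000000 1700000060 ACCEPT OK")
    record = parse_flow_log_line(sample)
    print(record["srcaddr"], "->", record["dstaddr"], record["dstport"], record["action"])
```

Filtering parsed records on the action field (ACCEPT or REJECT) is a quick way to spot traffic being blocked by a security group or NACL.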

Traffic Mirroring

Traffic mirroring is like traditional "packet sniffing" on specific ports.

Edge Cases

AWS networking does have some limitations that your own data center network would not.

You cannot use multicast in a VPC
You cannot put network cards into "promiscuous" mode to sniff ethernet packets.
There are some restrictions on opening up ports for SMTP
You cannot have network scans run against your account without discussing with AWS

You can connect VPCs together to enable:

Cross VPC connections
Cross region connections
Cross account connections

Lesson Recap

Regions
Availability Zones (AZs)
Virtual Private Clouds (VPCs)
AWS VPC Networking

There are many tools and capabilities at your disposal with Regions, AZs, VPCs and AWS networking. Most things that are possible in an in-house network are available in AWS. These functions allow you to have flexibility and security as well as global reach.

Lesson Objectives

You will be able to:

Build on the AWS global infrastructure
Take advantage of the multiple availability options AWS provides
Build multiple AWS VPCs to suit requirements
Create custom isolated networks to meet business needs

Glossary

Fault isolation: Means of containing a fault in a system to a limited area.
Network latency: The time it takes network traffic to traverse back and forth over the network.
Data locality: The practice of keeping data in a certain region or country because of legal restrictions.
CLI: Command Line Interface.
SDK: Software Development Kit.
Network fabric: A high speed network interconnect, where high volumes of traffic move over short distances.
Multicast networking: A networking protocol where traffic is sent in a "one-to-many" manner.

Answer1215