CNCF Landscape Guide

https://landscape.cncf.io/guide#introduction

The Cloud Native Computing Foundation (CNCF) serves as the vendor-neutral home for many of the fastest-growing open-source projects, including Kubernetes, Prometheus, and Envoy.

Provisioning (infrastructure layer)

Provisioning is the first layer in the cloud native landscape.
It encompasses tools that are used to create and harden the foundation on which cloud native apps are built.
You'll find tools to automatically configure, create, and manage the infrastructure, as well as for scanning, signing, and storing container images.
The layer also extends to security with tools that enable policy setting and enforcement, embedded authentication and authorization, and the handling of secrets distribution.
That's a mouthful, so let's discuss one category at a time.

Automation and configuration (deployment and configuration)

Automation and configuration tools speed up the creation and configuration of compute resources (virtual machines, networks, firewall rules, load balancers, etc.).
Tools in this category either handle different parts of the provisioning process or try to control everything end-to-end.
Most provide the ability to integrate with other projects and products in the space.

How to automate infrastructure management

As we move from old-style human-driven provisioning to a new on-demand scaling model driven by the cloud, the patterns and tools we used before no longer meet our needs.
Most organizations can’t afford a large 24x7 staff to create, configure, and manage servers.
Automated tools like Terraform reduce the level of effort required to scale tens of servers and networks with hundreds of firewall rules.
Tools like Puppet, Chef, and Ansible provision and/or configure these new servers and applications programmatically as they are spun up and allow them to be consumed by developers.

Container registries

Container registries either store and distribute images or enhance an existing registry in some way. Fundamentally, a registry is a web API that allows container runtimes to store and retrieve images. Many provide interfaces to allow container scanning or signing tools to enhance the security of the images they store. Some specialize in distributing or duplicating images in a particularly efficient manner. Any environment using containers will need to use one or more registries.
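The idea that a registry is fundamentally a web API can be sketched in a few lines of Python. This is only an illustration: it assumes the endpoint layout of the OCI Distribution Specification (/v2/<name>/manifests/<reference> and /v2/<name>/blobs/<digest>), the host registry.example.com and the helper names are hypothetical, and no network request is made.

```python
import hashlib


def blob_digest(content: bytes) -> str:
    """Return the content address of a blob: 'sha256:' plus the hex digest."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def manifest_url(registry: str, repository: str, reference: str) -> str:
    """URL a runtime would GET to fetch an image manifest by tag or digest."""
    return f"https://{registry}/v2/{repository}/manifests/{reference}"


def blob_url(registry: str, repository: str, digest: str) -> str:
    """URL a runtime would GET to fetch one layer or config blob."""
    return f"https://{registry}/v2/{repository}/blobs/{digest}"


if __name__ == "__main__":
    # Hypothetical registry host and repository name.
    layer = b"example layer bytes"
    print(manifest_url("registry.example.com", "team/app", "1.0"))
    print(blob_url("registry.example.com", "team/app", blob_digest(layer)))
```

Because blobs are addressed by digest, scanning and signing tools can refer to exactly the bytes a runtime will later pull.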

Security and compliance

Security and compliance tools help harden, monitor, and enforce platform and application security.
From containers to Kubernetes environments, these tools allow you to set policy (for compliance), get insights into existing vulnerabilities, catch misconfigurations, and harden the containers and clusters.

Key Management

Tools in this category can be grouped into two sets:

1) key generation, storage, management, and rotation, and

2) single sign-on and identity management.

Vault, for example, is a rather generic key management tool allowing you to manage different types of keys.

Keycloak, on the other hand, is an identity broker which can be used to manage access keys for different services.

Runtime

Now that we've established the foundation of a cloud native environment, we'll move one infrastructure layer up and zoom into the runtime layer.
It encompasses everything a container needs to run in a cloud native environment.
That includes the code used to start a container, referred to as a container runtime; tools to make persistent storage available to containers; and those that manage the container environment networks.

Cloud native storage

The emphasis here is on cloud storage for container environments, hence the emphasis on CSI compatibility.

The tools in this category do one of three things:

- Provide cloud native storage options for containers,
- Standardize the interfaces between containers and storage providers, or
- Provide data protection through backup and restore operations.

The first means storage that uses a cloud native compatible container storage interface (provided by tools in the second category) and that can be provisioned automatically, enabling autoscaling and self-healing by eliminating the human bottleneck.

Cloud native storage is largely made possible by the Container Storage Interface (CSI), which provides a standard API for providing file and block storage to containers.

MinIO is a popular project that provides, among other things, an S3-compatible API for object storage.

Tools like Velero help simplify the process of backing up and restoring both the Kubernetes clusters themselves and the persistent data used by the applications.

Container runtime

Container images (the files with the application specs) must be launched in a standardized, secure, and isolated way.

Standardized because you need standard operating rules no matter where they are running.

Secure, well, because you don't want anyone who shouldn't access it to do so.

And isolated because you don't want the app to affect or be affected by other apps (for instance, if a co-located application crashes).

Containerd (part of the famous Docker product) and CRI-O are standard container runtime implementations.

Then there are tools that expand the use of containers to other technologies, such as Kata, which allows you to run containers as VMs.

Others aim at solving a specific container-related problem, such as gVisor, which provides an additional security layer between containers and the OS.

Cloud native network

Inter-container communication, CNI-compliant, builds a virtual network.

Containers talk to each other and to the infrastructure layer through a cloud native network.
Distributed applications have multiple components that use the network for different purposes.
Tools in this category create a virtual network on top of existing networks specifically for apps to communicate, referred to as an overlay network.

Projects and products in this category use the Container Network Interface (CNI), a CNCF project, to provide networking functionalities to containerized applications.

Some tools, like Flannel, are rather minimalist, providing bare-bones connectivity to containers.

Others, such as NSX-T, provide a full software-defined networking layer, creating an isolated virtual network for every Kubernetes namespace.

At a minimum, a container network needs to assign IP addresses to pods (that's where containerized apps run in Kubernetes), allowing other processes to access them.
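At its simplest, the address assignment described above is bookkeeping over a CIDR range. The following sketch uses only Python's standard ipaddress module; the class name PodIPAllocator and the 10.244.1.0/30 range are hypothetical, and real CNI plugins do considerably more (creating interfaces, routes, and cleanup).

```python
import ipaddress
from typing import Dict, List


class PodIPAllocator:
    """Hands out one IP per pod from a CIDR block and reuses released ones."""

    def __init__(self, cidr: str) -> None:
        # hosts() skips the network and broadcast addresses of the block.
        self._unused = iter(ipaddress.ip_network(cidr).hosts())
        self._released: List[str] = []
        self._assigned: Dict[str, str] = {}

    def allocate(self, pod_name: str) -> str:
        if pod_name in self._assigned:  # idempotent for a known pod
            return self._assigned[pod_name]
        if self._released:
            ip = self._released.pop()
        else:
            address = next(self._unused, None)
            if address is None:
                raise RuntimeError("address pool exhausted")
            ip = str(address)
        self._assigned[pod_name] = ip
        return ip

    def release(self, pod_name: str) -> None:
        ip = self._assigned.pop(pod_name, None)
        if ip is not None:
            self._released.append(ip)


if __name__ == "__main__":
    allocator = PodIPAllocator("10.244.1.0/30")  # hypothetical pod range
    print(allocator.allocate("web-0"), allocator.allocate("web-1"))
```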

The CNI standardizes the way network layers provide functionalities to pods.
Selecting the right container network for your Kubernetes environment is critical and you've got a number of tools to choose from. Weave Net, Antrea, Calico, and Flannel all provide effective open source networking layers. Their functionalities vary widely and your choice should be ultimately driven by your specific needs.

Numerous vendors support and extend Kubernetes networks with Software Defined Networking (SDN) tools, providing additional insights into network traffic, enforcing network policies, and even extending container networks and policies to your broader datacenter.

Orchestration & Management

Here you’ll find tooling to handle running and connecting your cloud native applications.
This section covers everything from Kubernetes itself, one of the key enablers of cloud native development, to the infrastructure layers responsible for inter-app and external communication.

Orchestration and scheduling

Orchestration and scheduling refer to running and managing containers across a cluster.
A cluster is a group of machines, physical or virtual, connected over a network (see cloud native networking).

Kubernetes lives in the orchestration and scheduling section along with other less widely adopted orchestrators like Docker Swarm and Mesos.
It enables users to manage a number of disparate computers as a single pool of resources in a declarative way.
Declarative configuration management in Kubernetes is handled via control loops, a pattern in which a process running in Kubernetes monitors the Kubernetes store for a particular object type and ensures the actual state in the cluster matches the desired state.
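The control loop idea can be illustrated with a small reconcile function. This is a sketch under simplifying assumptions, not how Kubernetes controllers are implemented: the desired and actual states are plain dictionaries mapping a hypothetical object name to a replica count, and "acting" on the cluster just means editing the actual dictionary.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Tuple[str, str]]:
    """Drive `actual` toward `desired` and return the actions that were taken."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
        actual[name] = desired[name]
    for name in sorted(actual):  # sorted() copies the keys, so deleting is safe
        if name not in desired:
            actions.append(("delete", name))
            del actual[name]
    return actions


if __name__ == "__main__":
    desired = {"web": 3, "db": 1}
    actual = {"web": 2, "cache": 1}
    print(reconcile(desired, actual))  # first pass makes changes
    print(reconcile(desired, actual))  # second pass finds nothing to do
```

A real controller runs this comparison repeatedly, so drift introduced later (a crashed pod, a manual change) is corrected on a subsequent pass.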

This core controller pattern can also be used by users or software developers to extend Kubernetes. The operator pattern allows people to write custom controllers for custom resources and build arbitrary logic and automation into Kubernetes itself.

While Kubernetes isn’t the only orchestrator the CNCF hosts (both Crossplane and Volcano are incubating projects), it is the most commonly used and actively maintained orchestrator.

Crossplane (multi-cloud deployment): Crossplane is an open-source Kubernetes extension that transforms your Kubernetes cluster into a universal control plane.

Volcano: open-sourced by Huawei, https://volcano.sh/en/. Key concepts: gang scheduling (all or nothing), PodGroup, Queue.

https://www.bilibili.com/video/BV1wZ4y13713/?spm_id_from=333.788.recommend_more_video.0&vd_source=1eb6e5015a1f70daa97080d8ee786d5d
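The "all or nothing" property of gang scheduling mentioned in the Volcano note can be sketched as follows. This is not Volcano's algorithm: it is a first-fit placement written only to show the property, a pod group is modeled as a hypothetical list of CPU requests, and node capacity is a plain dictionary.

```python
from typing import Dict, List, Optional


def gang_schedule(free_cpus: Dict[str, int], requests: List[int]) -> Optional[Dict[int, str]]:
    """Place every member of the group, or place none of them.

    Returns a mapping of member index to node name, or None if any member
    does not fit. `free_cpus` is only modified when the whole group fits.
    """
    tentative = dict(free_cpus)  # work on a copy until success is certain
    placement: Dict[int, str] = {}
    for index, cpus in enumerate(requests):
        node = next((n for n in sorted(tentative) if tentative[n] >= cpus), None)
        if node is None:
            return None  # one member cannot run, so nothing is scheduled
        tentative[node] -= cpus
        placement[index] = node
    free_cpus.update(tentative)  # commit the whole group at once
    return placement


if __name__ == "__main__":
    nodes = {"n1": 4, "n2": 2}
    print(gang_schedule(nodes, [3, 2, 2]))  # group does not fit as a whole
    print(gang_schedule(nodes, [3, 2]))     # group fits and is committed
```

The point of the copy-then-commit structure is that a partially placed job never holds resources while waiting for the rest of its members.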

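Koordinator: open-sourced by Alibaba, https://koordinator.sh/. Koordinator's entire co-location resource scheduling architecture is built on top of such a resource model; combined with priority preemption, load awareness, interference detection, and QoS guarantees, it forms the core underlying system for co-located resource scheduling.

A comparison of YARN and K8s for scheduling Spark jobs: http://fanyilun.me/2022/06/02/YARN%E5%92%8CK8s%E8%B0%83%E5%BA%A6Spark%E4%BD%9C%E4%B8%9A%E7%9A%84%E5%AF%B9%E6%AF%94/

Coordination & Service Discovery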

As distributed systems became more and more prevalent, traditional DNS processes and traditional load balancers were often unable to keep up with changing endpoint information.
To make up for these shortcomings, service discovery tools handle individual application instances rapidly registering and deregistering themselves.
Some options such as CoreDNS and etcd are CNCF projects and are built into Kubernetes. Others have custom libraries or tools to allow services to operate effectively.
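Rapid registration and deregistration can be pictured with a tiny in-memory registry. This is a sketch, not how CoreDNS or etcd work internally: the class ServiceRegistry, the heartbeat-with-TTL scheme, and the endpoint strings are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List


class ServiceRegistry:
    """Tracks live endpoints per service; stale entries expire after a TTL."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: Dict[str, Dict[str, float]] = {}

    def register(self, service: str, endpoint: str) -> None:
        """Register an instance, or refresh it (acts as a heartbeat)."""
        self._last_seen.setdefault(service, {})[endpoint] = self._clock()

    def deregister(self, service: str, endpoint: str) -> None:
        self._last_seen.get(service, {}).pop(endpoint, None)

    def lookup(self, service: str) -> List[str]:
        """Return endpoints whose last heartbeat is within the TTL."""
        now = self._clock()
        entries = self._last_seen.get(service, {})
        return sorted(ep for ep, seen in entries.items() if now - seen <= self._ttl)


if __name__ == "__main__":
    registry = ServiceRegistry(ttl_seconds=30.0)
    registry.register("cart", "10.0.0.1:8080")  # hypothetical endpoints
    registry.register("cart", "10.0.0.2:8080")
    registry.deregister("cart", "10.0.0.1:8080")
    print(registry.lookup("cart"))
```

The TTL covers the case a static DNS record handles poorly: an instance that crashes without deregistering simply stops appearing in lookups.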

Remote Procedure Call (RPC)

Remote Procedure Call (RPC) is a particular technique enabling applications to talk to each other. It's one way of structuring app communication.
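To make the structure of an RPC concrete, the sketch below encodes a call as a message and dispatches it on the receiving side. It assumes JSON-RPC 2.0 message fields as one possible wire format; both sides run in one process here, and the function names and the "add" procedure are hypothetical. Frameworks listed in this category, such as gRPC, use different encodings and transports.

```python
import json
from typing import Any, Callable, Dict, List


def encode_call(method: str, params: List[Any], call_id: int) -> str:
    """Client side: turn a procedure call into a message for the wire."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": call_id})


def dispatch(raw: str, procedures: Dict[str, Callable[..., Any]]) -> str:
    """Server side: decode the message, run the procedure, encode the reply."""
    request = json.loads(raw)
    func = procedures.get(request["method"])
    if func is None:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "error": error, "id": request["id"]})
    result = func(*request["params"])
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": request["id"]})


if __name__ == "__main__":
    procedures = {"add": lambda a, b: a + b}  # hypothetical remote procedure
    reply = dispatch(encode_call("add", [2, 3], call_id=1), procedures)
    print(json.loads(reply)["result"])
```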

Service proxy

Service proxies work by intercepting traffic between services, applying logic on it, and allowing it to move on if permitted.
Centrally controlled capabilities embedded into proxies allow administrators to accomplish several things.
They can gather detailed metrics about inter-service communication, protect services from being overloaded, and apply other common standards to services, like mutual TLS.
Service proxies are fundamental to other tools like service meshes as they provide a way to enforce higher-level policies to all network traffic.
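The intercept, apply logic, then forward flow can be sketched as a wrapper around a backend function. This is an illustration only, not the behavior of Envoy or any other listed proxy: the class ServiceProxy, the caller allow-list policy, and the metric names are hypothetical.

```python
from collections import Counter
from typing import Callable, Set, Tuple


class ServiceProxy:
    """Sits in front of one backend: counts traffic and enforces a policy."""

    def __init__(self, backend: Callable[[str], str], allowed_callers: Set[str]) -> None:
        self._backend = backend
        self._allowed = allowed_callers
        self.metrics: Counter = Counter()

    def handle(self, caller: str, request: str) -> Tuple[int, str]:
        self.metrics["requests_total"] += 1
        if caller not in self._allowed:  # policy check before forwarding
            self.metrics["denied_total"] += 1
            return 403, "forbidden"
        return 200, self._backend(request)  # permitted traffic moves on


if __name__ == "__main__":
    proxy = ServiceProxy(backend=lambda req: "inventory:" + req,
                         allowed_callers={"checkout"})
    print(proxy.handle("checkout", "item-42"))
    print(proxy.handle("unknown-service", "item-42"))
    print(dict(proxy.metrics))
```

Because the policy and the counters live in the proxy rather than in the service, an administrator can change them centrally without touching application code.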

API gateway

How it differs from a service proxy: the emphasis is on control between users and the application.

An API gateway sits between the users and the application.

The API gateway works by intercepting calls to backend services, performing some kind of value-add activity such as validating authorization, collecting metrics, or transforming requests, and then performing whatever action it deems appropriate.

API gateways serve as a common entry point for a set of downstream applications while at the same time providing a place where teams can inject business logic to handle authorization, rate limiting, and chargeback.
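A common entry point with authorization, rate limiting, and usage counting for chargeback might look like the sketch below. It is a toy model, not any listed product's API: the class ApiGateway, the token table, the fixed per-client call limit, and the prefix routing are all assumptions.

```python
from typing import Callable, Dict, Tuple


class ApiGateway:
    """Single entry point that authenticates, limits, and routes requests."""

    def __init__(self, routes: Dict[str, Callable[[str], str]],
                 tokens: Dict[str, str], limit_per_client: int) -> None:
        self._routes = routes      # path prefix -> downstream application
        self._tokens = tokens      # API token -> client name
        self._limit = limit_per_client
        self.usage: Dict[str, int] = {}  # calls per client, usable for chargeback

    def handle(self, token: str, path: str) -> Tuple[int, str]:
        client = self._tokens.get(token)
        if client is None:
            return 401, "unauthorized"
        self.usage[client] = self.usage.get(client, 0) + 1
        if self.usage[client] > self._limit:
            return 429, "rate limit exceeded"
        for prefix, app in self._routes.items():
            if path.startswith(prefix):
                return 200, app(path)
        return 404, "no route"


if __name__ == "__main__":
    gateway = ApiGateway(routes={"/orders": lambda p: "orders handled " + p},
                         tokens={"secret-token": "alice"},  # hypothetical token
                         limit_per_client=2)
    print(gateway.handle("secret-token", "/orders/1"))
    print(gateway.handle("wrong-token", "/orders/1"))
```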

Service mesh

What is a service mesh? See https://zhuanlan.zhihu.com/p/61901608

https://zhuanlan.zhihu.com/p/618243300 (this article explains it very clearly)

App Definition and Development

Everything we have discussed up to this point was related to building a reliable, secure environment and providing all needed dependencies.
We've now arrived at the top layer of the CNCF cloud native landscape.
As the name suggests, the application definition and development layer focuses on the tools that enable engineers to build apps.

Database

A database is an application through which other apps can efficiently store and retrieve data.

Streaming and messaging

The NATS and CloudEvents projects are both incubating CNCF projects in this space.
NATS provides a mature messaging system, and CloudEvents is an effort to standardize message formats between systems.
Strimzi, Pravega, and Tremor are sandbox projects, each tailored to a unique use case around streaming and messaging.
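What a standardized message format buys you is easiest to see on a concrete event. The sketch below builds a JSON event carrying the four attributes that the CloudEvents 1.0 specification marks as required (specversion, id, source, type); the helper names, the source path, and the event type string are hypothetical, and the official SDKs are not used.

```python
import json
import uuid
from typing import Any, Dict, List

REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_event(source: str, event_type: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a payload in a CloudEvents-style envelope."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "datacontenttype": "application/json",
        "data": data,
    }


def missing_attributes(event: Dict[str, Any]) -> List[str]:
    """List the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


if __name__ == "__main__":
    event = make_event("/shop/orders", "com.example.order.created", {"order_id": 1})
    print(json.dumps(event, indent=2))
    print(missing_attributes({"id": "1"}))
```

Any consumer that understands the envelope can route or filter on source and type without knowing anything about the payload in data.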

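- CloudEvents: a specification for cloud native event formats, https://cloudevents.io/. The site lists many open-source projects in the CloudEvents ecosystem; event frameworks include Knative Eventing and Argo Events.

- Apache BookKeeper: https://zhuanlan.zhihu.com/p/397489497

- Strimzi: Apache Kafka cluster on Kubernetes in various deployment configurations. A quick way to deploy Kafka on K8s; a Kafka operator.

- NATS: https://zhuanlan.zhihu.com/p/374728426 (a brief look at the NATS messaging system). A lightweight, non-persistent message queue implemented in Go, although a persistence option, JetStream, also exists; supports clustered deployment for disaster tolerance.

- Pravega: stream storage, https://zhuanlan.zhihu.com/p/263010567. Pulsar introduced BookKeeper to separate compute from storage, decoupling complex concerns such as high availability and replication; Pravega builds on this by adding HDFS as cold storage, which solves the problem of long-term data retention.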

Application definition and image build

Application packaging, image building, application deployment.

Application definition and image build is a broad category that can be broken down into two main subgroups.

First, developer-focused tools that help build application code into containers and/or Kubernetes.

And second, operations-focused tools that deploy apps in a standardized way.

Whether you intend to speed up or simplify your development environment, provide a standardized way to deploy third-party apps, or wish to simplify the process of writing a new Kubernetes extension, this category serves as a catch-all for a number of projects and products that optimize the Kubernetes developer and operator experience.

Continuous integration (CI) and continuous delivery (CD)

Continuous integration (CI) and continuous delivery (CD) tools enable fast and efficient development with embedded quality assurance. CI automates code changes by immediately building and testing the code, ensuring it produces a deployable artifact. CD goes one step further and pushes the artifact through the deployment phases.
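The way an artifact is pushed through successive phases, stopping at the first failure, can be sketched as an ordered list of stages. This is a toy model rather than any CI/CD tool's configuration format; the stage names and the run_pipeline helper are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order; return (completed stage names, failed stage or None)."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return completed, name  # later stages never run after a failure
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    pipeline: List[Stage] = [
        ("build", lambda: True),             # CI: produce an artifact
        ("test", lambda: True),              # CI: verify the artifact
        ("deploy-staging", lambda: True),    # CD: first deployment phase
        ("deploy-production", lambda: True),
    ]
    print(run_pipeline(pipeline))
```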

Observability and Analysis

Tools in this category are broken down into logging, monitoring, tracing, and chaos engineering.
Please note that the category name is somewhat misleading: although chaos engineering is listed here, consider it a reliability tool rather than an observability or analysis tool.

Monitoring

- Cortex: a multi-tenant, horizontally scalable Prometheus service.

- Thanos: positioned similarly to Cortex. Open source, highly available Prometheus setup with long term storage capabilities.

The following are all eBPF-related; see https://github.com/mikeroyal/eBPF-Guide

- Fonio: Data first monitoring agent using (e)BPF, built on RedBPF

- Pixie: https://github.com/pixie-io/pixie, an open-source project contributed by New Relic.
Pixie uses eBPF to automatically collect telemetry data such as full-body requests, resource and network metrics, application profiles, and more.

- DeepFlow: https://github.com/deepflowio/deepflow
a highly automated observability platform for cloud-native developers. Using new technologies such as eBPF, WASM, and OpenTelemetry, DeepFlow innovatively implements core mechanisms such as AutoTracing, AutoMetrics, AutoTagging, and SmartEncoding, which greatly avoids code instrumentation and significantly reduces the resource overhead of back-end data warehouses.

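- Beats: https://github.com/elastic/beats, lightweight collectors released by Elastic, the company behind Elasticsearch; the family includes collectors for many kinds of metrics.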
lightweight data shippers, written in Go, that you install on your servers to capture all sorts of operational data (think of logs, metrics, or network packet data).
The Beats send the operational data to Elasticsearch, either directly or via Logstash, so it can be visualized with Kibana.

- Kuberhealthy: a Kubernetes operator for synthetic monitoring and continuous process verification

- Skooner: K8s dashboard, https://github.com/skooner-k8s/skooner

- Trickster: https://github.com/trickstercache/trickster, an interesting one: a cache and query accelerator for time series databases.
an HTTP reverse proxy/cache for HTTP applications and a dashboard query accelerator for time series databases.

- Botkube: a chatbot for monitoring and operating K8s, https://github.com/kubeshop/botkube
a messaging bot for monitoring and debugging Kubernetes clusters.

fxjwind