Program against your datacenter like it’s a single pool of resources
Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.
What is Mesos?
A distributed systems kernel
Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments.
The Go implementation of gRPC: A high performance, open source, general RPC framework that puts mobile and HTTP/2 first. For more information see the gRPC Quick Start guide.
The Apache Thrift software framework, for scalable cross-language services development,
combines a software stack with a code generation engine
to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. (A minimal client sketch follows the feature list below.)
- Fast
A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
- Scalable
Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization.
It can be elastically and transparently expanded without downtime.
Data streams are partitioned and spread over a cluster of machines to allow data streams larger than
the capability of any single machine and to allow clusters of co-ordinated consumers.
- Durable
Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
- Distributed by Design
Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
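To make the publish-subscribe model above concrete, here is a minimal sketch using the third-party kafka-python client. The broker address, topic name, and consumer group are illustrative assumptions, not details from the text above.

```python
# Minimal Kafka publish/subscribe sketch (assumes a broker at localhost:9092
# and a topic named "events"; both are made up for this example).
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
# Messages with the same key land in the same partition, preserving order.
producer.send("events", key=b"user-42", value=b"page_view:/home")
producer.flush()

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="analytics",          # consumers in one group share partitions
    auto_offset_reset="earliest",  # start from the beginning of the log
)
for record in consumer:
    print(record.partition, record.offset, record.value)
    break  # read one message for demonstration, then stop
```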
memcached is a high-performance distributed in-memory cache server. It is typically used to cache database query results, reducing the number of database hits in order to speed up dynamic web applications and improve their scalability.
What is Memcached?
Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
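As a concrete illustration of the cache-aside pattern described above, here is a small sketch with the third-party pymemcache client; the server address, key scheme, and fetch_user_from_db helper are hypothetical.

```python
# Cache-aside sketch: check memcached first, fall back to the database.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))  # assumed memcached address

def fetch_user_from_db(user_id):
    # hypothetical stand-in for a real database query
    return f"profile-of-user-{user_id}".encode()

def get_user(user_id):
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:                      # cache miss: query the database...
        value = fetch_user_from_db(user_id)
        cache.set(key, value, expire=300)  # ...and cache it for 5 minutes
    return value

print(get_user(42))
```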
nginx [engine x] is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP proxy server, originally written by Igor Sysoev. For a long time, it has been running on many heavily loaded Russian sites including Yandex, Mail.Ru, VK, and Rambler. According to Netcraft, nginx served or proxied 23.36% of the busiest sites in September 2015. Here are some of the success stories: Netflix, Wordpress.com, FastMail.FM.
The sources and documentation are distributed under the 2-clause BSD-like license.
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.
It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries.
Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
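A quick tour of several of those data structures, sketched with the redis-py client (assumes a local server and a reasonably recent redis-py; key names are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

r.set("greeting", "hello")                               # string
r.hset("user:1", mapping={"name": "Ada", "lang": "en"})  # hash
r.rpush("jobs", "job1", "job2")                          # list (simple queue)
r.sadd("tags", "db", "cache", "broker")                  # set
r.zadd("leaderboard", {"alice": 42, "bob": 17})          # sorted set
print(r.zrangebyscore("leaderboard", 20, 100))           # range query by score
```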
MapReduce is a programming model and an associated implementation for processing and generating large datasets with a parallel, distributed algorithm on a cluster.
Conceptually similar approaches have been very well known since 1995 with the Message Passing Interface standard having reduce and scatter operations.
MapReduce is the heart of Hadoop®. It is this programming paradigm that allows for massive scalability across
hundreds or thousands of servers in a Hadoop cluster.
The MapReduce concept is fairly simple to understand for those who are familiar with clustered scale-out data
processing solutions.
For people new to this topic, it can be somewhat difficult to grasp, because it’s not typically something people have been exposed to previously.
If you’re new to Hadoop’s MapReduce jobs, don’t worry: we’re going to describe it in a way that gets you up
to speed quickly.
The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform.
The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples.
As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.
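The two phases can be illustrated with a single-process word count in plain Python; real Hadoop runs the same logic distributed across a cluster, with the framework handling the sort/shuffle between the phases.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    # map: convert input data into (key, value) tuples, here (word, 1)
    for word in document.split():
        yield (word, 1)

def reduce_phase(pairs):
    # the framework sorts and groups tuples by key between the two jobs
    grouped = groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))
    for word, group in grouped:
        # reduce: combine each key's tuples into a smaller set of tuples
        yield (word, sum(count for _, count in group))

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(dict(reduce_phase(pairs)))  # {'the': 3, 'fox': 2, 'quick': 1, ...}
```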
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. Read more in the tutorial.
Storm (event processor)
Apache Storm is a distributed computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz[1] and team at BackType,[2] the project was open sourced after being acquired by Twitter.[3] It uses custom created "spouts" and "bolts" to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on 17 September 2011.[4]
A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real-time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end.[5]
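The spout/bolt idea can be imitated in a few lines of single-process Python; this is only a conceptual sketch (real Storm topologies are distributed, typically written in Java or Clojure, and the names here are made up).

```python
def sentence_spout():
    # spout: a source of tuples (unbounded in Storm; finite here for the demo)
    for line in ["storm processes streams", "streams of tuples"]:
        yield {"sentence": line}

def split_bolt(stream):
    # bolt: transforms each incoming tuple, emitting one tuple per word
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word}

def count_bolt(stream):
    # bolt: stateful aggregation over the stream
    counts = {}
    for tup in stream:
        counts[tup["word"]] = counts.get(tup["word"], 0) + 1
        yield {"word": tup["word"], "count": counts[tup["word"]]}

# wire the DAG: spout -> split -> count, edges acting as named streams
for result in count_bolt(split_bolt(sentence_spout())):
    print(result)
```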
Storm became an Apache Top-Level Project in September 2014[6] and was previously in incubation since September 2013.[7][8]
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Other Hadoop-related projects at Apache include:
- Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner.
- Avro™: A data serialization system.
- Cassandra™: A scalable multi-master database with no single points of failure.
- Chukwa™: A data collection system for managing large distributed systems.
- HBase™: A scalable, distributed database that supports structured data storage for large tables.
- Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Ad hoc query: a query defined on the spot according to the user's immediate needs. Its conditions are not fixed and its format is flexible, giving the user a more interactive way to explore the data.
- Mahout™: A Scalable machine learning and data mining library.
- Pig™: A high-level data-flow language and execution framework for parallel computation.
- Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
- Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
- ZooKeeper™: A high-performance coordination service for distributed applications.
##### Getting Started
##### [Learn about Hadoop by reading the documentation.](http://hadoop.apache.org/docs/current/)
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
This guide provides information for developers and administrators on installing, configuring, and using the features and capabilities of Cassandra.
What is Apache Cassandra?
Apache Cassandra™ is a massively scalable open source NoSQL database. Cassandra is perfect for managing large amounts of structured, semi-structured, and unstructured data across multiple data centers and the cloud. Cassandra delivers continuous availability, linear scalability, and operational simplicity across many commodity servers with no single point of failure, along with a powerful dynamic data model designed for maximum flexibility and fast response times.
How does Cassandra work?
Cassandra’s built-for-scale architecture means that it is capable of handling petabytes of information and thousands of concurrent users/operations per second.
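As a minimal example of talking to such a cluster from Python, here is a sketch with the DataStax cassandra-driver package; the contact point, keyspace, and table are illustrative assumptions.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # any reachable node can coordinate queries
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute(
    "CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)"
)
session.execute("INSERT INTO demo.users (id, name) VALUES (%s, %s)", (1, "Ada"))
print(session.execute("SELECT name FROM demo.users WHERE id = %s", (1,)).one())
```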
A highly-available key value store for shared configuration and service discovery
Overview
etcd is a distributed key value store that provides a reliable way to store data across a cluster of machines. It’s open-source and available on GitHub. etcd gracefully handles master elections during network partitions and will tolerate machine failure, including the master.
Your applications can read and write data into etcd. A simple use case is to store database connection details or feature flags in etcd as key-value pairs. These values can be watched, allowing your app to reconfigure itself when they change.
Advanced uses take advantage of the consistency guarantees to implement database master elections or do distributed locking across a cluster of workers.
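A sketch of the feature-flag use case with the third-party python-etcd3 client follows; the endpoint and key names are assumptions for illustration.

```python
import etcd3

client = etcd3.client(host="localhost", port=2379)

# store a configuration value as a key-value pair, then read it back
client.put("/config/feature_x", "enabled")
value, _metadata = client.get("/config/feature_x")
print(value)  # b'enabled'

# watch the key so the app can reconfigure itself when the value changes
events, cancel = client.watch("/config/feature_x")
for event in events:
    print("config changed:", event.value)
    cancel()  # stop watching after the first change, for the demo
    break
```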
Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability.
- Object Storage
Ceph provides seamless access to objects using native language bindings or radosgw, a REST interface that’s compatible with applications written for S3 and Swift.
- Block Storage
Ceph’s RADOS Block Device (RBD) provides access to block device images that are striped and replicated across the entire storage cluster.
- File System
Ceph provides a POSIX-compliant network file system that aims for high performance, large data storage, and maximum compatibility with legacy applications.
Ceph uniquely delivers object, block, and file storage in one unified system.
#### [Intro to Ceph](http://docs.ceph.com/docs/v0.80.5/start/intro/)
Whether you want to provide Ceph Object Storage and/or Ceph Block Device services to Cloud Platforms,
deploy a Ceph Filesystem or use Ceph for another purpose, all Ceph Storage Cluster deployments begin with setting up each Ceph Node, your network and the Ceph Storage Cluster.
A Ceph Storage Cluster requires at least one Ceph Monitor and at least two Ceph OSD Daemons.
The Ceph Metadata Server is essential when running Ceph Filesystem clients.
Seastar is a high-performance server-side application framework developed in C++; it is the networking framework used by [scylla](https://github.com/scylladb/scylla).
Seastar is an event-driven framework allowing you to write non-blocking, asynchronous code in a relatively straightforward manner (once understood). It is based on futures.
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store
When Would I Use Apache HBase?
Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
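For a feel of that random read/write access pattern, here is a sketch using the third-party happybase client, which talks to HBase through its Thrift gateway; the host, table, and column family names are illustrative assumptions.

```python
import happybase

connection = happybase.Connection("localhost")  # assumed Thrift server
table = connection.table("web_pages")

# write: rows are keyed by arbitrary byte strings, and columns live in
# column families (here "cf") defined when the table was created
table.put(b"com.example/index", {b"cf:title": b"Example", b"cf:size": b"1024"})

# random read of a single row by key
print(table.row(b"com.example/index")[b"cf:title"])

# scan a key range, the other core Bigtable-style access pattern
for key, data in table.scan(row_prefix=b"com.example"):
    print(key, data)
```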
GCS Fuse is an open source Fuse adapter that allows you to **mount Google Cloud Storage buckets as file systems on Linux or OS X systems**.
GCS Fuse can be run anywhere with connectivity to Google Cloud Storage (GCS), including Google Compute Engine VMs or on-premises systems.
GCS Fuse provides another means to access Google Cloud Storage objects in addition to the XML API, JSON API, and the gsutil command line,
allowing even more applications to use Google Cloud Storage and take advantage of its immense scale, high availability, rock-solid durability,
exemplary performance, and low overall cost. GCS Fuse is a Google-developed and community-supported open-source tool, written in Go and hosted on GitHub.
GCS Fuse is open-source software, released under the Apache License.
It is distributed as-is, without warranties or conditions of any kind.
Best effort community support is available on Server Fault with the google-cloud-platform and gcsfuse tags.
Check the previous questions and answers to see if your issue is already answered. For bugs and feature requests, file an issue.
Technical Overview
GCS Fuse works by translating object storage names into a file and directory system, interpreting the "/" character in object names as a directory separator so that objects with the same common prefix are treated as files in the same directory. Applications can interact with the mounted bucket like any other file system, providing virtually limitless file storage running in the cloud, but accessed through a traditional POSIX interface.
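The "/" translation can be illustrated with a few lines of plain Python; no GCS access is involved and the object names are made up.

```python
objects = ["logs/2015/09/app.log", "logs/2015/10/app.log", "readme.txt"]

def list_dir(objects, prefix=""):
    """Return the apparent directory entries directly under prefix."""
    entries = set()
    for name in objects:
        if not name.startswith(prefix):
            continue
        head, sep, _rest = name[len(prefix):].partition("/")
        entries.add(head + ("/" if sep else ""))  # trailing "/" marks a dir
    return sorted(entries)

print(list_dir(objects))           # ['logs/', 'readme.txt']
print(list_dir(objects, "logs/"))  # ['2015/']
```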
While GCS Fuse has a file system interface, it is not like an NFS or CIFS file system on the backend.
GCS Fuse retains the same fundamental characteristics of Google Cloud Storage, preserving the scalability of Google Cloud Storage in terms of size and aggregate performance while maintaining the same latency and single-object performance. As with the other access methods, Google Cloud Storage does not support concurrency and locking. For example, if multiple GCS Fuse clients are writing to the same file, the last flush wins.
For more information about using GCS Fuse or to file an issue, go to the Google Cloud Platform GitHub repository.
GCS Fuse is a utility that helps you make better and quicker use of Google Cloud Storage by allowing file-based applications to use Google Cloud Storage without need for rewriting their I/O code. It is ideal for use cases where Google Cloud Storage has the right performance and scalability characteristics for an application and only the POSIX semantics are missing.
For example, GCS Fuse will work well for genomics and biotech applications, some media/visual effects/rendering applications, financial services modeling applications, web serving content, FTP backends, and applications storing log files (presuming they do not flush too frequently).
Support
GCS Fuse is supported in Linux kernel version 3.10 and newer. To check your kernel version, you can use uname -a.
Current status
Please treat gcsfuse as beta-quality software. Use it for whatever you like, but be aware that bugs may lurk, and that we reserve the right to make small backwards-incompatible changes.
The careful user should be sure to read semantics.md for information on how gcsfuse maps file system operations to GCS operations, and especially on surprising behaviors. The list of open issues may also be of interest.
Goofys allows you to mount an S3 bucket as a filey system.
It's a Filey System instead of a File System because goofys strives for performance first and POSIX second. Things that are difficult to support on S3 or would translate into more than one round-trip either fail (random writes) or are faked (no per-file permissions). Goofys does not have an on-disk data cache, and its consistency model is close-to-open.
Seafile is an open source cloud storage system with features for privacy protection and teamwork. Collections of files are called libraries, and each library can be synced separately. A library can also be encrypted with a user-chosen password. Seafile also allows users to create groups and easily share files into groups.
Feature Summary
Seafile has the following features:
File syncing
Selective synchronization of file libraries. Each library can be synced separately.
Correct handling of file conflicts based on history instead of timestamp.
Only transferring contents not already on the server, and incomplete transfers can be resumed.
Sync with two or more servers.
Sync with existing folders.
Sync a sub-folder.
File sharing and collaboration
Sharing libraries between users or into groups.
Sharing sub-folders between users or into groups.
Download links with password protection
Upload links
Version control with configurable revision number.
Restoring deleted files from trash, history or snapshots.
Privacy protection
Library encryption with a user chosen password.
Client side encryption when using the desktop syncing.
Internal
Seafile's version control model is based on Git, but it is simplified for automatic synchronization, and Seafile does not need Git installed to run. Each Seafile library behaves like a Git repository. It has its own unique history, which consists of a list of commits. A commit points to the root of a file system snapshot. The snapshot consists of directories and files. Files are further divided into blocks for more efficient network transfer and storage usage.
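A toy version of that model, with content-addressed blocks, a snapshot mapping paths to block lists, and a commit pointing at the snapshot (all structures simplified and illustrative):

```python
import hashlib

BLOCK_SIZE = 4  # tiny for the demo; real systems use much larger blocks

def store_blocks(data, block_store):
    """Split data into blocks, store each under its hash, return the ids."""
    ids = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        block_id = hashlib.sha1(block).hexdigest()
        block_store[block_id] = block  # identical blocks deduplicate naturally
        ids.append(block_id)
    return ids

blocks = {}
snapshot = {"notes.txt": store_blocks(b"hello world", blocks)}
commit = {"parent": None, "root": snapshot, "message": "initial sync"}
print(commit["root"], "|", len(blocks), "blocks stored")
```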
Differences from Git:
Automatic synchronization.
Clients do not store file history, thus they avoid the overhead of storing data twice. Git is not efficient for larger files such as images.
Files are further divided into blocks for more efficient network transfer and storage usage.
File transfer can be paused and resumed.
Support for different storage backends on the server side.
Support for downloading from multiple block servers to accelerate file transfer.
More user-friendly file conflict handling. (Seafile adds the user's name as a suffix to conflicting files.)
Graceful handling of files the user modifies while auto-sync is running. Git is not designed to work in these cases.