StreamSets origins overview

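An origin is the source stage, the entry point, of a StreamSets pipeline. A pipeline can contain only one origin, and which origins are available depends on the execution mode the pipeline runs in:

Standalone mode
Cluster mode
Edge mode (SDC Edge agent)
Development (dev origins, convenient for testing)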
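The two rules above (exactly one origin per pipeline, and that origin must be one the execution mode supports) can be modeled as a small check. This is a hypothetical sketch, not a StreamSets API: the `Mode` enum, the `ALLOWED` table, the `(kind, name)` stage tuples and `validate_origin` are illustrative names, and `ALLOWED` holds only a few origins taken from the lists below.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    # Hypothetical labels for the execution modes listed above.
    STANDALONE = "standalone"
    CLUSTER = "cluster"
    EDGE = "edge"


# Small excerpt of the per-mode origin lists in this post (not exhaustive).
ALLOWED = {
    Mode.STANDALONE: {"Directory", "Kafka Consumer", "JDBC Query Consumer"},
    Mode.CLUSTER: {"Hadoop FS", "Kafka Consumer"},
    Mode.EDGE: {"Directory", "System Metrics"},
}


def validate_origin(mode: Mode, stages: List[Tuple[str, str]]) -> str:
    """Return the pipeline's single origin name, or raise ValueError.

    `stages` is a hypothetical list of (stage_kind, stage_name) pairs,
    where stage_kind is "origin", "processor" or "destination".
    """
    origins = [name for kind, name in stages if kind == "origin"]
    # Rule 1: a pipeline has exactly one origin.
    if len(origins) != 1:
        raise ValueError(f"expected exactly one origin, found {len(origins)}")
    origin = origins[0]
    # Rule 2: the origin must be available in the pipeline's execution mode.
    if origin not in ALLOWED[mode]:
        raise ValueError(f"{origin!r} is not available in {mode.value} mode")
    return origin
```

For example, a pipeline whose only origin is Hadoop FS passes the check in cluster mode but raises ValueError in edge mode, because Hadoop FS appears only in the cluster list.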

Standalone mode origins

In standalone pipelines, you can use the following origins:

Amazon S3 - Reads objects from Amazon S3.
Amazon SQS Consumer - Reads data from queues in Amazon Simple Queue Service (SQS).
Azure IoT/Event Hub Consumer - Reads data from Microsoft Azure Event Hub. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
CoAP Server - Listens on a CoAP endpoint and processes the contents of all authorized CoAP requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
Directory - Reads fully-written files from a directory. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
Elasticsearch - Reads data from an Elasticsearch cluster. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
File Tail - Reads lines of data from an active file after reading related archived files in the directory.
Google BigQuery - Executes a query job and reads the result from Google BigQuery.
Google Cloud Storage - Reads fully written objects from Google Cloud Storage.
Google Pub/Sub Subscriber - Consumes messages from a Google Pub/Sub subscription. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
Hadoop FS Standalone - Reads fully-written files from HDFS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
HTTP Client - Reads data from a streaming HTTP resource URL.
HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
HTTP to Kafka (Deprecated) - Listens on an HTTP endpoint and writes the contents of all authorized HTTP POST requests directly to Kafka.
JDBC Multitable Consumer - Reads database data from multiple tables through a JDBC connection. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
JDBC Query Consumer - Reads database data using a user-defined SQL query through a JDBC connection.
JMS Consumer - Reads messages from JMS.
Kafka Consumer - Reads messages from a single Kafka topic.
Kafka Multitopic Consumer - Reads messages from multiple Kafka topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
Kinesis Consumer - Reads data from Kinesis Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
MapR DB CDC - Reads changed MapR DB data that has been written to MapR Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
MapR DB JSON - Reads JSON documents from MapR DB JSON tables.
MapR FS - Reads files from MapR FS.
MapR FS Standalone - Reads fully-written files from MapR FS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
MapR Multitopic Streams Consumer - Reads messages from multiple MapR Streams topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
MapR Streams Consumer - Reads messages from MapR Streams.
MongoDB - Reads documents from MongoDB.
MongoDB Oplog - Reads entries from a MongoDB Oplog.
MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
MySQL Binary Log - Reads MySQL binary logs to generate change data capture records.
Omniture - Reads web usage reports from the Omniture reporting API.
OPC UA Client - Reads data from an OPC UA server.
Oracle CDC Client - Reads LogMiner redo logs to generate change data capture records.
PostgreSQL CDC Client - Reads PostgreSQL WAL data to generate change data capture records.
RabbitMQ Consumer - Reads messages from RabbitMQ.
Redis Consumer - Reads messages from Redis.
REST Service - Listens on an HTTP endpoint, parses the contents of all authorized requests, and sends responses back to the originating REST API. Creates multiple threads to enable parallel processing in a multithreaded pipeline. Use as part of a microservice pipeline.
Salesforce - Reads data from Salesforce.
SDC RPC - Reads data from an SDC RPC destination in an SDC RPC pipeline.
SDC RPC to Kafka (Deprecated) - Reads data from an SDC RPC destination in an SDC RPC pipeline and writes it to Kafka.
SFTP/FTP Client - Reads files from an SFTP or FTP server.
SQL Server CDC Client - Reads data from Microsoft SQL Server CDC tables. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
SQL Server Change Tracking - Reads data from Microsoft SQL Server change tracking tables and generates the latest version of each record. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
TCP Server - Listens at the specified ports and processes incoming data over TCP/IP connections. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
UDP Multithreaded Source - Reads messages from one or more UDP ports. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
UDP Source - Reads messages from one or more UDP ports.
UDP to Kafka (Deprecated) - Reads messages from one or more UDP ports and writes the data to Kafka.
WebSocket Client - Reads data from a WebSocket server endpoint.
WebSocket Server - Listens on a WebSocket endpoint and processes the contents of all authorized WebSocket client requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
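The descriptions above flag which standalone origins create multiple threads for parallel processing. As a sketch, that flag can be kept in a lookup table: `MULTITHREADED` is transcribed from the list (every origin whose description says it "Creates multiple threads"), while `MULTITHREADED_COUNTERPART` and `pick_origin` are hypothetical helpers, and the counterpart pairing is an inference from the origin names and descriptions, not something StreamSets defines.

```python
from typing import Dict, FrozenSet

# Standalone origins whose description above says they create
# multiple threads to enable parallel processing.
MULTITHREADED: FrozenSet[str] = frozenset({
    "Azure IoT/Event Hub Consumer", "CoAP Server", "Directory",
    "Elasticsearch", "Google Pub/Sub Subscriber", "Hadoop FS Standalone",
    "HTTP Server", "JDBC Multitable Consumer", "Kafka Multitopic Consumer",
    "Kinesis Consumer", "MapR DB CDC", "MapR FS Standalone",
    "MapR Multitopic Streams Consumer", "REST Service",
    "SQL Server CDC Client", "SQL Server Change Tracking", "TCP Server",
    "UDP Multithreaded Source", "WebSocket Server",
})

# Hypothetical pairing of single-threaded origins with a multithreaded
# origin that reads the same kind of source (inferred from the list).
MULTITHREADED_COUNTERPART: Dict[str, str] = {
    "Kafka Consumer": "Kafka Multitopic Consumer",
    "MapR FS": "MapR FS Standalone",
    "MapR Streams Consumer": "MapR Multitopic Streams Consumer",
    "UDP Source": "UDP Multithreaded Source",
}


def pick_origin(origin: str, want_parallel: bool) -> str:
    """Return `origin`, swapped for its multithreaded counterpart when
    parallelism is wanted and a counterpart is known."""
    if want_parallel and origin not in MULTITHREADED:
        return MULTITHREADED_COUNTERPART.get(origin, origin)
    return origin
```

For example, `pick_origin("Kafka Consumer", True)` returns "Kafka Multitopic Consumer", while an origin with no listed counterpart, such as Salesforce, is returned unchanged.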

Cluster mode origins

In cluster pipelines, you can use the following origins:

Hadoop FS - Reads data from HDFS, Amazon S3, or other file systems using the Hadoop FileSystem interface.
Kafka Consumer - Reads messages from Kafka. Use the cluster version of the origin.
MapR FS - Reads data from MapR FS.
MapR Streams Consumer - Reads messages from MapR Streams.

Edge mode origins

In edge pipelines, you can use the following origins:

Directory - Reads fully-written files from a directory.
File Tail - Reads lines of data from an active file after reading related archived files in the directory.
HTTP Client - Reads data from a streaming HTTP resource URL.
HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests.
MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
System Metrics - Reads system metrics from the edge device where SDC Edge is installed.
WebSocket Client - Reads data from a WebSocket server endpoint.
Windows Event Log - Reads data from a Microsoft Windows event log located on a Windows machine.
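Putting the three lists side by side shows which origins are tied to a single mode. The sketch below transcribes the origin names from this post (with "(Deprecated)" suffixes dropped) and inverts them into an origin-to-modes table; `modes_by_origin` and `only_in` are hypothetical helper names. The comparison is purely by name and reflects only the lists quoted here: for instance, the cluster list notes that Kafka Consumer must use the cluster version of the origin, so a shared name does not mean an identical stage.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Origin names transcribed from the three lists in this post.
STANDALONE = frozenset({
    "Amazon S3", "Amazon SQS Consumer", "Azure IoT/Event Hub Consumer",
    "CoAP Server", "Directory", "Elasticsearch", "File Tail",
    "Google BigQuery", "Google Cloud Storage", "Google Pub/Sub Subscriber",
    "Hadoop FS Standalone", "HTTP Client", "HTTP Server", "HTTP to Kafka",
    "JDBC Multitable Consumer", "JDBC Query Consumer", "JMS Consumer",
    "Kafka Consumer", "Kafka Multitopic Consumer", "Kinesis Consumer",
    "MapR DB CDC", "MapR DB JSON", "MapR FS", "MapR FS Standalone",
    "MapR Multitopic Streams Consumer", "MapR Streams Consumer",
    "MongoDB", "MongoDB Oplog", "MQTT Subscriber", "MySQL Binary Log",
    "Omniture", "OPC UA Client", "Oracle CDC Client",
    "PostgreSQL CDC Client", "RabbitMQ Consumer", "Redis Consumer",
    "REST Service", "Salesforce", "SDC RPC", "SDC RPC to Kafka",
    "SFTP/FTP Client", "SQL Server CDC Client", "SQL Server Change Tracking",
    "TCP Server", "UDP Multithreaded Source", "UDP Source", "UDP to Kafka",
    "WebSocket Client", "WebSocket Server",
})
CLUSTER = frozenset({"Hadoop FS", "Kafka Consumer", "MapR FS", "MapR Streams Consumer"})
EDGE = frozenset({
    "Directory", "File Tail", "HTTP Client", "HTTP Server",
    "MQTT Subscriber", "System Metrics", "WebSocket Client", "Windows Event Log",
})


def modes_by_origin() -> Dict[str, Set[str]]:
    """Map each origin name to the set of modes whose list contains it."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for mode, origins in (("standalone", STANDALONE), ("cluster", CLUSTER), ("edge", EDGE)):
        for name in origins:
            table[name].add(mode)
    return dict(table)


def only_in(table: Dict[str, Set[str]], mode: str) -> List[str]:
    """Sorted origin names that appear in the list for `mode` and no other."""
    return sorted(name for name, modes in table.items() if modes == {mode})


if __name__ == "__main__":
    table = modes_by_origin()
    print("cluster-only:", only_in(table, "cluster"))  # ['Hadoop FS']
    print("edge-only:", only_in(table, "edge"))  # ['System Metrics', 'Windows Event Log']
```

By these lists, Hadoop FS is the only cluster-only origin (standalone pipelines use Hadoop FS Standalone instead), and System Metrics and Windows Event Log are the only edge-only origins.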

Development origins

To help create or test pipelines, you can use the following development origins:

Dev Data Generator
Dev Random Source
Dev Raw Data Source
Dev SDC RPC with Buffering
Dev Snapshot Replaying
Sensor Reader
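Dev origins such as Dev Data Generator produce synthetic records so a pipeline can be exercised without a real data source. The same idea can be sketched outside StreamSets as a seeded Python generator that yields batches of fake records. This is an illustration only: `dev_data_batches`, the field names (`id`, `sensor`, `value`) and the batch shape are hypothetical and do not reflect the Dev Data Generator's actual configuration or output.

```python
import random
from typing import Dict, Iterator, List


def dev_data_batches(batch_size: int, num_batches: int,
                     seed: int = 0) -> Iterator[List[Dict[str, object]]]:
    """Yield `num_batches` batches, each holding `batch_size` fake records."""
    rng = random.Random(seed)  # fixed seed -> reproducible test data
    record_id = 0
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            batch.append({
                "id": record_id,  # monotonically increasing record id
                "sensor": rng.choice(["temp", "humidity", "pressure"]),
                "value": round(rng.uniform(0.0, 100.0), 2),
            })
            record_id += 1
        yield batch
```

Because the generator is seeded, calling it twice with the same arguments yields identical batches, which keeps test runs reproducible.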

References

https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Origins/Origins_overview.html#concept_hpr_twm_jq__section_tvn_4bc_f2b

Posted on 2018-08-20 14:27 by 荣锋亮
