Thrift

http://thrift.apache.org/

http://blog.csdn.net/ellios/article/details/6293129, an introduction to Thrift

http://en.wikipedia.org/wiki/Apache_Thrift

 

Thrift: Scalable Cross-Language Services Implementation

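Facebook encourages engineers to use the most suitable language and tools to implement features quickly, so different engineers use different languages. This creates an obvious problem: integrating modules developed in different languages.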

Given this design choice, we were presented with the challenge of building a transparent, high-performance bridge across many programming languages.

There are of course many ways to solve this problem, but a company with Facebook's data volume needs an efficient one.

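Thrift is such a cross-language RPC stack. It is called a stack because it provides more than serialization: it wraps the complete RPC call path.
Thrift is also designed for generality. Binary is the default encoding, but the Protocol abstraction also supports JSON and other serialization formats; likewise, the Transport abstraction supports transports other than TCP, such as HTTP and files.
Besides the two core layers, Transport and Protocol, Thrift provides processor and server layers to support RPC; these are covered in detail later.
Thrift also provides a code generation engine that turns structures and interfaces written in an abstract IDL into client and server code for a specific language, which is very convenient.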


The solution that we have implemented combines a language-neutral software stack implemented across numerous programming languages and an associated code generation engine that transforms a simple interface and data definition language into client and server remote procedure call libraries.

 

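Thrift supports only static data types and code generation. You can regard this as a drawback: whenever a data structure on the client or server changes, you must modify the IDL, then regenerate and recompile the code.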

But you can also argue that this approach is simpler and easier to use; Avro makes a useful comparison.

Choosing static code generation over a dynamic system allows us to create validated code that can be run without the need for any advanced introspective run-time type checking. It is also designed to be as simple as possible for the developer, who can typically define all the necessary data structures and interfaces for a complex service in a single short file.

 

2. Types

The goal of the Thrift type system is to enable programmers to develop using completely natively defined types, no matter what programming language they use. By design, the Thrift type system does not introduce any special dynamic types or wrapper objects. It also does not require that the developer write any code for object serialization or transport.

The Thrift IDL (Interface Definition Language) file is logically a way for developers to annotate their data structures with the minimal amount of extra information necessary to tell a code generator how to safely transport the objects across languages.

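The first thing a cross-language system must address is types. Languages support different types (procedural, object-oriented, and so on), so Thrift has to abstract the general data types common to all languages and map those general types onto the types of each language.
For the types Thrift supports, users need not think about serialization or transport; Thrift already encapsulates that logic.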

So for every type defined in the IDL, the generated code in the target language has two methods, read and write, which use the corresponding TProtocol for serialization and transport:

In the target language, each definition generates a type with two methods, read and write, which perform serialization and transport of the objects using a Thrift TProtocol object.

2.1 Base Types

The base types supported by Thrift are:

bool A boolean value, true or false
byte A signed byte
i16 A 16-bit signed integer
i32 A 32-bit signed integer
i64 A 64-bit signed integer
double A 64-bit floating point number
string An encoding-agnostic text or binary string

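As you can see, the list contains only the key types that every language supports. Even unsigned integers are left out, because they cannot be mapped onto a primitive type in many languages.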
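To make the limits of these signed types concrete, here is a minimal Python sketch. The table BASE_TYPE_FORMATS and the helpers signed_range and fits are hypothetical names introduced only for illustration; the format codes come from Python's standard struct module, not from Thrift itself.

```python
import struct

# Hypothetical table for illustration: Thrift base type -> struct format code.
# ">" selects big-endian byte order with standard sizes.
BASE_TYPE_FORMATS = {
    "byte": ">b",    # signed 8-bit integer
    "i16": ">h",     # signed 16-bit integer
    "i32": ">i",     # signed 32-bit integer
    "i64": ">q",     # signed 64-bit integer
    "double": ">d",  # 64-bit floating point number
}


def signed_range(fmt):
    """Return the (min, max) values of a signed integer with this format."""
    bits = struct.calcsize(fmt) * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(type_name, value):
    """Check whether an integer value fits the given signed integer type."""
    low, high = signed_range(BASE_TYPE_FORMATS[type_name])
    return low <= value <= high


# The largest unsigned 32-bit value does not fit i32, but it does fit i64.
largest_u32 = 2 ** 32 - 1
print(fits("i32", largest_u32), fits("i64", largest_u32))
```

Under these assumptions, a value that would need an unsigned 32-bit integer has to be declared as the next wider signed type.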

 

2.2 Structs

A Thrift struct defines a common object to be used across languages.
A struct is essentially equivalent to a class in object oriented programming languages.

A struct has a set of strongly typed fields, each with a unique name identifier. The basic syntax for defining a Thrift struct looks very similar to a C struct definition.

struct Example {
    1:i32 number=10,
    2:i64 bigNumber,
    3:double decimals,
    4:string name="thrifty"
}
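Besides the base types, Thrift also has to support structs, which correspond to objects (C++) and structs (C).
A struct holds strongly typed fields. Note that a field is uniquely identified by its id, not by its name, so the name can be changed freely, but the id must not change. Even after a field is deleted, its id must not be reused.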
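The role of the field id can be sketched in a few lines of Python. This is a toy model, not Thrift's generated code: SCHEMA_V1, SCHEMA_V2, encode, and decode are hypothetical names, and the "wire data" is just a dictionary keyed by field id.

```python
# Hypothetical schemas: field id -> field name. The two versions describe the
# same struct, except that field 4 was renamed from "name" to "label".
SCHEMA_V1 = {1: "number", 2: "bigNumber", 3: "decimals", 4: "name"}
SCHEMA_V2 = {1: "number", 2: "bigNumber", 3: "decimals", 4: "label"}


def encode(schema, obj):
    """Turn a {name: value} object into {field id: value} wire data."""
    ids_by_name = {name: field_id for field_id, name in schema.items()}
    return {ids_by_name[name]: value for name, value in obj.items()}


def decode(schema, wire):
    """Turn {field id: value} wire data back into a {name: value} object."""
    return {schema[field_id]: value for field_id, value in wire.items()}


# Data written with the old field name is still readable after the rename,
# because only the ids travel with the data.
wire = encode(SCHEMA_V1, {"number": 10, "name": "thrifty"})
print(decode(SCHEMA_V2, wire))
```

The same model shows why an id must never be reused: a reader would attach old data to whatever field now carries that id.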

 

2.3 Containers

Thrift containers are strongly typed containers that map to the most commonly used containers in common programming languages.
They are annotated using the C++ template (or Java Generics) style.
There are three types available:

list<type> An ordered list of elements. Translates directly into an STL vector, Java ArrayList, or native array in scripting languages. May contain duplicates.
set<type> An unordered set of unique elements. Translates into an STL set, Java HashSet, set in Python, or native dictionary in PHP/Ruby.
map<type1,type2> A map of strictly unique keys to values. Translates into an STL map, Java HashMap, PHP associative array, or Python/Ruby dictionary.

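Thrift also supports containers: the commonly used list, set, and map. The list above gives the data structure each one maps to in the corresponding language. These mappings are not fixed; users can configure them.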
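As a rough illustration of what "strongly typed container" means on the Python side, the sketch below checks a native value against a container type. The spec notation (tuples such as ("list", int)) and the function conforms are assumptions made for this example; generated Thrift code does not expose such a function.

```python
def conforms(value, spec):
    """Check a Python value against a hypothetical container type spec.

    A spec is either a Python type (standing in for a base type) or a tuple:
    ("list", elem), ("set", elem) or ("map", key, value).
    """
    if isinstance(spec, type):
        return isinstance(value, spec)
    kind = spec[0]
    if kind == "list":   # ordered, duplicates allowed
        return isinstance(value, list) and all(
            conforms(v, spec[1]) for v in value
        )
    if kind == "set":    # unordered, unique elements
        return isinstance(value, (set, frozenset)) and all(
            conforms(v, spec[1]) for v in value
        )
    if kind == "map":    # unique keys mapped to values
        return isinstance(value, dict) and all(
            conforms(k, spec[1]) and conforms(v, spec[2])
            for k, v in value.items()
        )
    raise ValueError("unknown container kind: %r" % (kind,))


# map<string, list<i32>> expressed in the hypothetical spec notation.
scores_spec = ("map", str, ("list", int))
print(conforms({"alice": [1, 2, 2]}, scores_spec))
```

Containers nest, so the check is recursive in the same way the IDL type map<string, list<i32>> is.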

 

2.4 Exceptions

Exceptions are syntactically and functionally equivalent to structs except that they are declared using the exception keyword instead of the struct keyword.
Exception definitions are supported as well; an exception is a special kind of struct.

 

2.5 Services

Services are defined using Thrift types. Definition of a service is semantically equivalent to defining an interface (or a pure virtual abstract class) in object oriented programming.
The Thrift compiler generates fully functional client and server stubs that implement the interface. Services are defined as follows:

service <name> {
    <returntype> <name>(<arguments>)
        [throws (<exceptions>)]
...
}

An example:
service StringCache {
    void set(1:i32 key, 2:string value),
    string get(1:i32 key) throws (1:KeyNotFound knf),
    void delete(1:i32 key)
}
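Interface definition is just as important in RPC. Services are defined in the IDL, and the Thrift compiler generates the client and server stub code directly, which is very convenient.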
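To show what "defining an interface" amounts to on the server side, here is a hand-written Python stand-in for the StringCache service. It is not the output of the Thrift compiler: StringCacheIface and InMemoryStringCache are hypothetical names, and a real generated interface would differ in detail.

```python
from abc import ABC, abstractmethod


class KeyNotFound(Exception):
    """Stands in for the KeyNotFound exception declared in the IDL."""


class StringCacheIface(ABC):
    """Hand-written stand-in for the interface a code generator would emit."""

    @abstractmethod
    def set(self, key, value):
        """Store value under key."""

    @abstractmethod
    def get(self, key):
        """Return the value for key or raise KeyNotFound."""

    @abstractmethod
    def delete(self, key):
        """Remove key if present."""


class InMemoryStringCache(StringCacheIface):
    """Hypothetical handler: the part the service author writes by hand."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        if key not in self._data:
            raise KeyNotFound(key)
        return self._data[key]

    def delete(self, key):
        self._data.pop(key, None)


cache = InMemoryStringCache()
cache.set(1, "thrifty")
print(cache.get(1))
```

The division of labor is the point: the interface comes from the IDL, and only the handler contains application logic.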

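Besides the base types, void is supported as a return type, and the async keyword can be placed before void to make the call asynchronous (which will generate code that does not wait for a response from the server). Later Thrift releases spell this keyword oneway.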

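In asynchronous mode, however, the client is only guaranteed that the request succeeded at the transport layer. Nothing above that layer is guaranteed, because the call returns immediately without waiting for the server's response. It is therefore suitable only where dropped requests are acceptable or the transport is reliable.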

With async method calls the client will only be guaranteed that the request succeeded at the transport layer. (In many transport scenarios this is inherently unreliable due to the Byzantine Generals’ Problem. Therefore, application developers should take care only to use the async optimization in cases where dropped method calls are acceptable or the transport is known to be reliable.)

 

3. Transport

The transport layer is used by the generated code to facilitate data transfer.

3.1 Interface

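Thrift is designed as a framework: both the protocol layer and the transport layer are decoupled from the code generation layer.
This means that users of Thrift need not care about the details of serialization or transport; they only call the general API that is exposed.

Although Thrift uses the TCP/IP stack with streaming sockets by default, the transport layer is replaceable, so other transports, such as files or shared memory, are easy to use.
As shown below, the transport interface exposes abstract operations that apply to any kind of transport: open, close, read, write, and so on.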

A key design choice in the implementation of Thrift was to decouple the transport layer from the code generation layer.
Though Thrift is typically used on top of the TCP/IP stack with streaming sockets as the base layer of communication, there was no compelling reason to build that constraint into the system. Fundamentally, generated Thrift code only needs to know how to read and write data. The origin and destination of the data are irrelevant; it may be a socket, a segment of shared memory, or a file on the local disk.

The Thrift transport interface supports the following methods:

open Opens the transport
close Closes the transport
isOpen Indicates whether the transport is open
read Reads from the transport
write Writes to the transport
flush Forces any pending writes
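The six methods translate almost directly into an abstract base class. The Python sketch below is an illustration under assumptions: the class names Transport and MemoryTransport and the snake_case method names are made up for this example and are not the classes of the Thrift Python library.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical Python rendering of the six transport methods."""

    @abstractmethod
    def open(self): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def is_open(self): ...

    @abstractmethod
    def read(self, size): ...

    @abstractmethod
    def write(self, data): ...

    @abstractmethod
    def flush(self): ...


class MemoryTransport(Transport):
    """Keeps everything in memory; writes become readable after flush()."""

    def __init__(self):
        self._opened = False
        self._pending = bytearray()   # written but not yet flushed
        self._readable = bytearray()  # flushed and available to read

    def open(self):
        self._opened = True

    def close(self):
        self._opened = False

    def is_open(self):
        return self._opened

    def read(self, size):
        data = bytes(self._readable[:size])
        del self._readable[:size]
        return data

    def write(self, data):
        self._pending += data

    def flush(self):
        self._readable += self._pending
        self._pending.clear()
```

Code written against Transport never learns whether the bytes end up in memory, in a file, or on a socket, which is the decoupling the paper describes.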


In addition to the above TTransport interface, there is a TServerTransport interface used to accept or create primitive transport objects. Its interface is as follows:

open Opens the transport
listen Begins listening for connections
accept Returns a new client transport
close Closes the transport

3.2 Implementation

The transport interface is designed for simple implementation in any programming language.
New transport mechanisms can be easily defined as needed by application developers.

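* TSocket - uses blocking I/O for transport; this is the most common mode.
* TFramedTransport - transmits data in frames of a stated size and is used in non-blocking mode, similar to NIO in Java.
* TFileTransport - as the name suggests, transports data through a file. There was no Java implementation at the time of writing, but one is very simple to write.
* TMemoryTransport - uses memory for I/O, much like ByteArrayOutputStream in Java.
* TZlibTransport - applies zlib compression; there was no Java implementation at the time of writing.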
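The idea of framing can be sketched briefly. The Python code below assumes that each frame is a payload preceded by its length as a 4-byte big-endian integer; the function names frame and unframe are hypothetical, and the sketch works on byte strings rather than on a real transport.

```python
import struct


def frame(payload):
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">i", len(payload)) + payload


def unframe(stream):
    """Split a byte string of back-to-back frames into their payloads."""
    payloads = []
    offset = 0
    while offset < len(stream):
        (size,) = struct.unpack_from(">i", stream, offset)
        offset += 4                      # skip the length prefix
        payloads.append(stream[offset:offset + size])
        offset += size                   # jump to the next frame
    return payloads


stream = frame(b"first") + frame(b"second")
print(unframe(stream))
```

Because the receiver knows the size of each message up front, it can buffer a whole frame before handing it to the protocol layer, which is what makes this layout a natural fit for non-blocking I/O.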

 

4. Protocol

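As mentioned earlier, Protocol is another important abstraction in Thrift. Binary is usually chosen as the serialization encoding for efficiency, but other encodings, such as XML, JSON, or ASCII, can be used through different protocols.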

A second major abstraction in Thrift is the separation of data structure from transport representation.

Thrift enforces a certain messaging structure when transporting data, but it is agnostic to the protocol encoding in use. That is, it does not matter whether data is encoded as XML, human-readable ASCII, or a dense binary format as long as the data supports a fixed set of operations that allow it to be deterministically read and written by generated code.

4.1 Interface

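The interface definition must likewise be general enough to accommodate every encoding.
As the list below shows, write and read methods generally come in pairs, and not every encoding needs the full set of methods.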

For example, writeStructEnd() is not strictly necessary, as the end of a struct may be implied by the stop field.
This method is a convenience for verbose protocols in which it is cleaner to separate these calls (e.g. a closing </struct> tag in XML).

The Thrift Protocol interface is very straightforward. It fundamentally supports two things: 1) bidirectional sequenced messaging, and 2) encoding of base types, containers, and structs.
writeMessageBegin(name, type, seq)
writeMessageEnd()
writeStructBegin(name)
writeStructEnd()
writeFieldBegin(name, type, id)
writeFieldEnd()
writeFieldStop()
writeMapBegin(ktype, vtype, size)
writeMapEnd()
writeListBegin(etype, size)
writeListEnd()
writeSetBegin(etype, size)
writeSetEnd()
writeBool(bool)
writeByte(byte)
writeI16(i16)
writeI32(i32)
writeI64(i64)
writeDouble(double)
writeString(string)

name, type, seq = readMessageBegin()
                  readMessageEnd()
name = readStructBegin()
       readStructEnd()
name, type, id = readFieldBegin()
                 readFieldEnd()
k, v, size = readMapBegin()
             readMapEnd()
etype, size = readListBegin()
              readListEnd()
etype, size = readSetBegin()
              readSetEnd()
bool = readBool()
byte = readByte()
i16 = readI16()
i32 = readI32()
i64 = readI64()
double = readDouble()
string = readString()
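A small subset of this interface is enough to write and read a struct. The Python sketch below is a toy binary encoding written for this note: the class TinyBinaryProtocol, the snake_case method names, the numeric type codes, and the byte layout (1-byte type, 2-byte field id, big-endian values) are assumptions of the example, not a specification of Thrift's own binary protocol.

```python
import io
import struct

# Type codes assumed for this sketch only.
T_STOP, T_I32, T_STRING = 0, 8, 11


class TinyBinaryProtocol:
    """Implements a small subset of the protocol methods over a byte stream."""

    def __init__(self, stream):
        self.stream = stream

    def write_field_begin(self, name, ftype, fid):
        # The name is not written; only the type and the id go on the wire.
        self.stream.write(struct.pack(">bh", ftype, fid))

    def write_field_stop(self):
        self.stream.write(struct.pack(">b", T_STOP))

    def write_i32(self, value):
        self.stream.write(struct.pack(">i", value))

    def write_string(self, value):
        data = value.encode("utf-8")
        self.write_i32(len(data))
        self.stream.write(data)

    def read_field_begin(self):
        (ftype,) = struct.unpack(">b", self.stream.read(1))
        if ftype == T_STOP:
            return None, ftype, 0
        (fid,) = struct.unpack(">h", self.stream.read(2))
        return None, ftype, fid

    def read_i32(self):
        (value,) = struct.unpack(">i", self.stream.read(4))
        return value

    def read_string(self):
        return self.stream.read(self.read_i32()).decode("utf-8")


def write_example(proto, number, name):
    """Write fields 1 and 4 of the Example struct, then the stop field."""
    proto.write_field_begin("number", T_I32, 1)
    proto.write_i32(number)
    proto.write_field_begin("name", T_STRING, 4)
    proto.write_string(name)
    proto.write_field_stop()


def read_example(proto):
    """Read fields until the stop field; return {field id: value}."""
    fields = {}
    while True:
        _, ftype, fid = proto.read_field_begin()
        if ftype == T_STOP:
            return fields
        fields[fid] = proto.read_i32() if ftype == T_I32 else proto.read_string()


buf = io.BytesIO()
write_example(TinyBinaryProtocol(buf), 10, "thrifty")
buf.seek(0)
print(read_example(TinyBinaryProtocol(buf)))
```

Neither the writer nor the reader ever needs the total size of the struct: the writer emits fields one after another, and the reader stops when it meets the stop field.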

4.2 Structure

Thrift structures are designed to support encoding into a streaming protocol. The implementation should never need to frame or compute the entire data length of a structure prior to encoding it. This is critical to performance in many scenarios.
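My understanding: structs are encoded with a streaming protocol. writeStructBegin does not compute the size of the struct first (that would require a full pass over the struct, which is inefficient for large structs); the encoder simply keeps writing until it finally calls writeStructEnd().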

The Thrift protocol is self-delimiting without any framing and regardless of the encoding format.

Reading works the same way: there is no need to determine the size of the struct first. Read from begin to end, and you know where the struct ends.

 

5. Versioning

Thrift is robust in the face of versioning and data definition changes.

This is critical to enable staged rollouts of changes to deployed services. The system must be able to support reading of old data from log files, as well as requests from out-of-date clients to new servers, and vice versa.

Thrift is robust when versions and data schemas change; it supports reads and writes while the client and server schemas are out of sync.

5.1 Field Identifiers

Versioning in Thrift is implemented via field identifiers.
The field header for every member of a struct in Thrift is encoded with a unique field identifier.

During deserialization, comparing field ids shows whether the schema has changed; fields that are not recognized can simply be ignored.

When data is being deserialized, the generated code can use these identifiers to properly identify the field and determine whether it aligns with a field in its definition file.
If a field identifier is not recognized, the generated code can use the type specifier to skip the unknown field without any error.
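Skipping an unknown field works because the type specifier tells the reader how many bytes to consume. The Python sketch below illustrates this with a toy byte layout (1-byte type, 2-byte field id, big-endian values, a zero byte as the stop field); the type codes and the functions skip and read_known_fields are assumptions made for the example.

```python
import io
import struct

# Type codes assumed for this sketch only.
T_STOP, T_I32, T_I64, T_STRING = 0, 8, 10, 11
FIXED_SIZES = {T_I32: 4, T_I64: 8}


def skip(stream, ftype):
    """Consume one value of the given type without interpreting it."""
    if ftype in FIXED_SIZES:
        stream.read(FIXED_SIZES[ftype])
    elif ftype == T_STRING:
        (length,) = struct.unpack(">i", stream.read(4))
        stream.read(length)
    else:
        raise ValueError("cannot skip type %d" % ftype)


def read_known_fields(stream, known_ids):
    """Read i32 fields whose id is in known_ids; skip every other field."""
    result = {}
    while True:
        (ftype,) = struct.unpack(">b", stream.read(1))
        if ftype == T_STOP:
            return result
        (fid,) = struct.unpack(">h", stream.read(2))
        if fid in known_ids and ftype == T_I32:
            result[fid] = struct.unpack(">i", stream.read(4))[0]
        else:
            skip(stream, ftype)   # unknown field: the type says how far to jump


# Data from a newer writer: field 1 (i32) plus a field 5 (string) that an
# older reader has never heard of.
newer_data = (
    struct.pack(">bhi", T_I32, 1, 10)
    + struct.pack(">bhi", T_STRING, 5, 5) + b"extra"
    + struct.pack(">b", T_STOP)
)
print(read_known_fields(io.BytesIO(newer_data), known_ids={1}))
```

The older reader recovers field 1 and passes over field 5 without an error, which is the behavior the paper describes for unrecognized identifiers.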

5.3 Case Analysis

There are four cases in which version mismatches may occur.
1. Added field, old client, new server.
In this case, the old client does not send the new field. The new server recognizes that the field is not set, and implements default behavior for out-of-date requests.
2. Removed field, old client, new server.
In this case, the old client sends the removed field. The new server simply ignores it.
3. Added field, new client, old server.
The new client sends a field that the old server does not recognize. The old server simply ignores it and processes as normal.
4. Removed field, new client, old server.
This is the most dangerous case, as the old server is unlikely to have suitable default behavior implemented for the missing field. It is recommended that in this situation the new server be rolled out prior to the new clients.

 

6. RPC Implementation

6.1 TProcessor

The last core interface in the Thrift design is the TProcessor, perhaps the most simple of the constructs. The interface is as follows:

interface TProcessor {
    bool process(TProtocol in, TProtocol out) throws TException
}
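TProcessor abstracts the execution logic on the server side. The server creates a processor, which reads data from the in protocol, invokes the processing logic (written by the user in whatever language), and finally writes the result to the out protocol.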
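The read, dispatch, and write cycle of a processor can be sketched as follows. This is a simplified model: plain deques stand in for the in and out protocols, messages are tuples, and the names Processor and EchoHandler are hypothetical.

```python
from collections import deque


class Processor:
    """Hypothetical processor: maps method names onto handler methods."""

    def __init__(self, handler):
        self._handler = handler

    def process(self, in_proto, out_proto):
        """Handle one request; return False when there is nothing to read."""
        if not in_proto:
            return False
        name, seq, args = in_proto.popleft()             # read the request
        try:
            result = getattr(self._handler, name)(*args)  # run the user logic
            out_proto.append((name, seq, "reply", result))
        except Exception as exc:
            out_proto.append((name, seq, "exception", repr(exc)))
        return True


class EchoHandler:
    """User-written service logic for this example."""

    def shout(self, text):
        return text.upper()


requests = deque([("shout", 1, ("hi",))])
replies = deque()
Processor(EchoHandler()).process(requests, replies)
print(replies[0])
```

The processor knows nothing about sockets or encodings; it only sees the two protocol objects it is given, which is why the same handler can sit behind any transport.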

 

6.3 TServer

Finally, the Thrift core libraries provide a TServer abstraction. A server typically works as follows:

  • Create a transport
  • Create input/output protocols for the transport
  • Create a processor based on the input/output protocols
  • Wait for incoming connections and hand them off to the processor

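Server implementations:
    * TSimpleServer - a single-threaded server using standard blocking I/O
    * TThreadPoolServer - a multi-threaded server using standard blocking I/O
    * TNonblockingServer - a multi-threaded server using non-blocking I/O; in Java it is built on NIO channels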
 

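The sections above define every layer of the stack; the figure below summarizes them.
What characterizes Thrift is that it does not merely solve the immediate problem; everywhere it aims to provide a framework.
Its approach is therefore not just a binary serialization scheme over TCP, but a complete RPC protocol stack.

It also provides solid tooling that generates client and server code from the IDL automatically, which is very convenient.

[Figure: summary of the Thrift stack]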

Posted on 2013-05-16 15:48 by fxjwind