What are OpenSSL BIOs? How do they work? How are BIOs used in OpenSSL? && OpenSSL programming
What is OpenSSL BIO?
OpenSSL BIO is an API that provides input/output related functionality. The acronym BIO stands for Basic Input/Output.
What is its general idea? How is it different from stdio API or sockets API?
The first idea is that it is not an API for one specific type of I/O (for example, files or the network). It is a generalized API for various kinds of entities capable of input/output operations. It is similar to C++ abstract classes with pure virtual functions: you use a single interface, but the behavior depends on which specific BIO object is used, for example a socket object or a file object. If you use the BIO_write function with a socket object, the data is sent over the network. If you use the BIO_write function with a file object, the data is written to a file.
The second idea behind the OpenSSL BIO API is that BIO objects can be stacked together into a single linear chain. This allows data to be processed by different filters before it is sent to the final output (sink) or after it is read from the initial input (source). Filters are also BIO objects.
What is a filter BIO? What is a source BIO? What is a sink BIO?
An OpenSSL filter BIO is a BIO that takes data, processes it and passes it to another BIO.
An OpenSSL source BIO is a BIO that doesn't take data from another BIO but takes it from somewhere else (from a file, network, etc.).
An OpenSSL sink BIO is a BIO that doesn't pass data to another BIO but transfers it to somewhere else (to a file, network, etc.).
As for source and sink BIOs, there are no purely source BIOs and no purely sink BIOs; there are only "source-sink" BIOs. A BIO that is a source is a sink as well. For example, a socket BIO is a source BIO and a sink BIO at the same time. When data is written to a socket BIO, the BIO works as a sink. When data is read from a socket BIO, the BIO works as a source. Source-sink BIOs are always the terminating link of a BIO chain. This differs from a usual data processing pipeline, where the source is the start of the pipeline and the sink is the end.
How can I run data through a filter BIO? Do I have to feed the data to the filter BIO using the BIO_write function and get the processed data using the BIO_read function?
If you put data into a filter using the BIO_write function, you can't get the processed data by simply calling the BIO_read function on that BIO. Filter BIOs work differently. A filter BIO does not have to store processed data in a buffer. It may take the input data, process it and immediately pass it to the next BIO in the chain using the same BIO_write function you used to put your data into the filter. The next BIO, in turn, may process the data and write it to the BIO after it. The process stops either when some BIO stores the data in its internal buffer (because it does not yet have enough data to generate output for the next BIO) or when the data reaches the sink.
If you just need to run data through a filter BIO without sending it over the network or writing it to a file, you can attach the filter BIO to an OpenSSL memory BIO (i.e. build the chain filter BIO <-> memory BIO). A memory BIO is a source-sink BIO, but it doesn't send data anywhere; it just stores the data in a memory buffer. Data written to the filter BIO is passed on to the memory BIO, which stores it in its buffer. A memory BIO has a special interface for getting the data directly from the buffer (though you can also use BIO_read to get the data that was written to a memory BIO, see below).
Reading from a filter BIO works the opposite way. If you request data from a filter BIO, the filter BIO may, in turn, request data from the next BIO in the chain. The process stops either when some BIO has enough buffered data to return or when the request reaches the source BIO. A single call to the BIO_read function on a filter BIO may result in multiple calls to BIO_read inside the filter BIO to get data from the next BIO. A filter BIO keeps calling BIO_read until it has enough data to generate a processed result.
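The situation is more complicated if the source-sink BIO of a chain works in non-blocking mode, for example when non-blocking sockets or a memory BIO are used (memory BIOs are non-blocking by nature). In that case BIO_read or BIO_write may return before the operation is complete, and BIO_should_retry tells the caller whether to try again later.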
Also note that reading from a filter BIO performs the reverse of the processing done when writing to that BIO. For example, if you use a cipher BIO, writing to the BIO enciphers the written data, while reading from it deciphers the input data. This makes the following chain possible: your code <-> cipher BIO <-> socket BIO. You write unencrypted data to the cipher BIO, which encrypts it and sends it to the socket. When you read from the cipher BIO, it first gets encrypted data from the socket, then decrypts it and returns the unencrypted data to you. This lets you set up an encrypted channel over the network: you just use BIO_write and BIO_read, and all encryption/decryption is done automatically by the BIO chain.
In general, a BIO chain looks like the following diagram:
/------\                 /--------\                 /---------\                 /-------------\
| your | -- BIO_write -> | filter | -- BIO_write -> | another | -- BIO_write -> | source/sink |
|      |                 |        |                 | filter  |                 |             |
| code | <- BIO_read  -- |  BIO   | <- BIO_read  -- |   BIO   | <- BIO_read  -- |     BIO     |
\------/                 \--------/                 \---------/                 \-------------/
Why are BIOs needed in OpenSSL? How are they used when programming with OpenSSL? Any examples?
OpenSSL uses BIOs to communicate with the remote side when running the SSL/TLS protocol. The SSL_set_bio function sets up the BIOs used for communication by a particular SSL/TLS connection. You can use a socket BIO, for example, to run SSL/TLS over a network connection. But you may also develop your own BIO (yes, it is possible) or use a memory BIO to run SSL/TLS over your own type of link.
You can also wrap an SSL/TLS connection as a BIO itself (BIO_f_ssl). Calling BIO_write on an SSL BIO results in a call to SSL_write. Calling BIO_read results in a call to SSL_read.
Although an SSL BIO is a filter BIO, it differs a little from other filter BIOs. Calling BIO_write on an SSL BIO may result in a series of both BIO_read and BIO_write calls on the next BIO in the chain, because SSL_write (which is used inside BIO_write of the SSL BIO) not only sends data but also drives the SSL/TLS protocol, which may require several exchanges between the two sides to complete a negotiation. The same is true for BIO_read of an SSL BIO. That is how SSL BIOs differ from ordinary filter BIOs.
Also note that you are not required to use an SSL BIO. You can still use SSL_read and SSL_write directly.
Which BIOs does OpenSSL provide? Can you provide examples of BIOs and tell about the differences between them?
Here are examples of source-sink BIOs that OpenSSL provides:
- A file BIO (BIO_s_file). It is a wrapper around stdio's FILE* object. It is used for writing to and reading from a file.
- A file descriptor BIO (BIO_s_fd). It is similar to a file BIO but works with POSIX file descriptors instead of stdio files.
- A socket BIO (BIO_s_socket). It is a wrapper around POSIX sockets. It is used for communicating over a network.
- A null BIO (BIO_s_null). It is similar to the /dev/null device in POSIX systems. Writing to this BIO just discards the data; reading from it results in EOF (end of file).
- A memory BIO (BIO_s_mem). It is a loopback BIO in essence. Reading from this type of BIO returns the data that was previously written to it. But the data can also be extracted from (or placed into) the internal buffer by calling functions specific to this type of BIO (every type of BIO has functions that are specific to it).
- A "bio" BIO (BIO_s_bio). It is a pipe-like BIO. Such BIOs are created in pairs. Data written to one BIO of the pair becomes available for reading from the other BIO of the pair, and vice versa. It is similar to a memory BIO, but a memory BIO stores data in itself while a pipe BIO passes data to the BIO it is paired with.
Some information about the similarity between BIO_s_mem and BIO_s_bio can be found here: OpenSSL “BIO_s_mem” VS “BIO_s_bio”.
And here are examples of filter BIOs:
- A base64 BIO (BIO_f_base64). BIO_write through this BIO encodes data to base64 format. BIO_read through this BIO decodes data from base64 format.
- A cipher BIO (BIO_f_cipher). It encrypts/decrypts data passed through it. Different cryptographic algorithms can be used.
- A digest calculation BIO (BIO_f_md). It doesn't modify data passed through it. It only calculates the digest of the data that flows through it, leaving the data itself unchanged. Different digest algorithms can be used. The calculated digest can be retrieved using special functions.
- A buffering BIO (BIO_f_buffer). It also doesn't change data passed through it. Data written to this BIO is buffered, so not every write to this BIO results in a write to the next BIO. Reading works similarly. This reduces the number of I/O operations on the BIOs located behind the buffering BIO.
- An SSL BIO (BIO_f_ssl). This type of BIO was described above. It wraps an SSL connection.
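7.1 OpenSSL abstract I/O
OpenSSL abstract I/O (I/O abstraction, i.e. BIO) is OpenSSL's abstraction layer over I/O types,
including memory, files, logs, standard input/output, sockets (TCP/UDP), encryption/decryption, message digests and SSL channels.
OpenSSL BIO hides the underlying implementation details from the user behind callback functions, so all BIO types are used in much the same way.
Data in a BIO can be passed on to another BIO or to the application.
BIO actually covers many kinds of interfaces behind one generic set of functions; the differences lie in the functions that each type implements in its BIO_METHOD.
There are 6 filter types and 8 source/sink types.
A source/sink BIO is a data source or destination,
for example a socket BIO or a file BIO.
A filter BIO passes data from one BIO to another BIO or to the application.
Along the way the data may be left unmodified (as in the message digest BIO) or transformed.
For example, in the cipher BIO, data is encrypted on write and decrypted on read.
BIO is an application interface that encapsulates the details of many kinds of I/O interfaces
and connects transparently to SSL connections, unencrypted network connections and file I/O.
BIOs can be linked together into a BIO chain (a single BIO is the special case of a chain with one link).
As the BIO structure definition in section 7.2 shows, each BIO holds pointers to the next and previous links.
A BIO chain usually consists of one source/sink BIO and one or more filter BIOs. Data is read from or written to the first BIO
and then passes through the series of BIOs until it reaches the output (usually a source/sink BIO).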
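A brief description of the files in the bio directory:
bio.h: the main header file; it contains many common macro definitions.
bio_lib.c: the main file defining BIO operations; these are the higher-level functions.
bss_* files: implementations of the individual source/sink BIOs.
bf_* files: implementations of the individual filter BIOs.
bio_err.c: error message handling.
bio_cb.c: callback-related functions.
b_print.c: formatted output functions.
b_sock.c: helper functions for socket connections.
b_dump.c: functions for dumping memory contents.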
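7.2 Data structures
BIO has two main data structures. In OpenSSL releases before 1.1.0 they are defined in crypto/bio.h as follows (since 1.1.0 both types are opaque to applications):
- BIO_METHOD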
typedef struct bio_method_st {
int type;
const char *name;
int (*bwrite)(BIO *, const char *, int);
int (*bread)(BIO *, char *, int);
int (*bputs)(BIO *, const char *);
int (*bgets)(BIO *, char *, int);
long (*ctrl)(BIO *, int, long, void *);
int (*create)(BIO *);
int (*destroy)(BIO *);
long (*callback_ctrl)(BIO *, int, bio_info_cb *);
} BIO_METHOD;
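This structure defines the callback functions for the I/O operations. Each concrete BIO type implements one or more of them as needed. The fields mean the following:
type: the concrete BIO type;
name: the name of the concrete BIO;
bwrite: write callback of the concrete BIO;
bread: read callback of the concrete BIO;
bputs: callback that writes a string to the concrete BIO;
bgets: callback that reads a string from the concrete BIO;
ctrl: control callback of the concrete BIO;
create: callback that creates the concrete BIO;
destroy: callback that destroys the concrete BIO;
callback_ctrl: control callback which, unlike ctrl, takes a function pointer instead of a data pointer.
It lets the caller (rather than the BIO implementer) supply a callback of its own,
which is installed through BIO_callback_ctrl, for example by BIO_set_info_callback.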
- BIO
struct bio_st {
BIO_METHOD *method;
/* bio, mode, argp, argi, argl, ret */
long (*callback)(struct bio_st *, int, const char *, int, long, long);
char *cb_arg; /* first argument for the callback */
int init;
int shutdown;
int flags; /* extra storage */
int retry_reason;
int num;
void *ptr;
struct bio_st *next_bio; /* used by filter BIOs */
struct bio_st *prev_bio; /* used by filter BIOs */
int references;
unsigned long num_read;
unsigned long num_write;
CRYPTO_EX_DATA ex_data;
};
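/*
Meaning of the main fields:
init: initialization flag of the underlying handle; 1 once initialized.
For example, in a file BIO it is set to 1 when BIO_set_fp associates a file pointer;
in a socket BIO it is set to 1 when BIO_set_fd associates a connection.
shutdown: BIO close flag; when it is non-zero, the underlying resource is released. It can be set through the control function.
flags: some BIO implementations use it to control the behavior of their functions.
For example, a file BIO defaults to BIO_FLAGS_UPLINK,
in which case file reads call UP_fread instead of fread.
retry_reason: the reason for a retry, used mainly for non-blocking I/O in socket and SSL BIOs.
For example, when a socket BIO hits a WSAEWOULDBLOCK error,
OpenSSL tells the user that the operation needs to be retried.
num: depends on the concrete BIO; for example, a socket BIO stores the socket descriptor here.
ptr: a pointer whose meaning depends on the concrete BIO. For example:
a file BIO stores the file handle here;
a mem BIO stores the memory address here;
a connect BIO stores its BIO_CONNECT data here;
an accept BIO stores its BIO_ACCEPT data here.
next_bio: address of the next BIO. BIO data can be passed from one BIO to another,
and this field identifies the next BIO.
references: reference count.
num_read: number of bytes read from the BIO.
num_write: number of bytes written to the BIO.
ex_data: storage for extra data.
*/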
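typedef struct bio_st BIO;
struct bio_st
{
BIO_METHOD *method; // BIO method table; it determines the type and behavior of the BIO and is the main difference between BIO types
long (*callback)(struct bio_st *,int,const char *,int, long,long); // BIO callback function
char *cb_arg; // first argument for the callback
int init; // initialization flag: 1 once initialized, otherwise 0. For example, in a file BIO
// it is set to 1 when BIO_set_fp associates a file pointer.
int shutdown; // close flag: with BIO_CLOSE the underlying resource is released when the BIO is freed, otherwise it is left alone
int flags; // some BIO implementations use it to control their behavior. For example, a file BIO defaults to BIO_FLAGS_UPLINK,
// in which case file reads call UP_fread instead of fread.
int retry_reason; // reason for a retry, used mainly for non-blocking I/O in socket and SSL BIOs. For example, when a socket BIO
// hits a WSAEWOULDBLOCK error, OpenSSL tells the user that the operation needs to be retried
int num; // depends on the concrete BIO; for example, a socket BIO stores the socket descriptor here
void *ptr; // pointer whose meaning depends on the concrete BIO. A file BIO stores the file handle here; a mem BIO stores
// the memory address; a connect BIO stores its BIO_CONNECT data; an accept BIO
// stores its BIO_ACCEPT data.
struct bio_st *next_bio; // next BIO in the chain; BIO data can be passed from one BIO to another
struct bio_st *prev_bio; // previous BIO in the chain
int references; // reference count
unsigned long num_read; // number of bytes read so far
unsigned long num_write; // number of bytes written so far
CRYPTO_EX_DATA ex_data; // extra data
};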
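7.3 BIO functions
The BIO functions are declared in crypto/bio.h. All of them are ultimately carried out by the callbacks in BIO_METHOD. They fall into the following groups:
1) Functions specific to a concrete BIO
For example, BIO_new_file (creates a file BIO for a named file) and BIO_get_fd (retrieves the descriptor associated with a BIO).
2) Generic abstract functions
For example, BIO_read and BIO_write.
In addition, many functions are macros implemented on top of the control function BIO_ctrl,
for example BIO_set_nbio, BIO_get_fd and BIO_eof.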
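Of all the members of BIO, method is arguably the most important, because it determines the type of the BIO.
A new BIO is always created with the following function:
BIO* BIO_new(BIO_METHOD *type);
As the source code shows, apart from initializing a few variables,
BIO_new mainly stores type in the method member of the BIO structure.
The type argument is normally supplied by a function that returns a pointer to a BIO_METHOD.
For example, a mem BIO is created with the following statement:
BIO *mem = BIO_new(BIO_s_mem());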
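// [source/sink types]
BIO_METHOD* BIO_s_accept() // wraps TCP/IP socket accept semantics and makes the TCP/IP operations transparent to the BIO interface
BIO_METHOD* BIO_s_connect() // wraps TCP/IP socket connect semantics and makes the TCP/IP operations transparent to the BIO interface
BIO_METHOD* BIO_s_socket() // BIO type that wraps the socket interface
BIO_METHOD* BIO_s_bio() // wraps a BIO pair: data written to one BIO is read from the other
BIO_METHOD* BIO_s_fd() // wraps a file descriptor and provides file-like read and write operations
BIO_METHOD* BIO_s_file() // wraps the standard file interface, including standard streams such as stdin
BIO_METHOD* BIO_s_mem() // wraps memory operations, including reading from and writing to memory
BIO_METHOD* BIO_s_null() // null sink BIO: all data written to it is discarded, and reads always return EOF
// [filter types]
BIO_METHOD* BIO_f_base64() // wraps base64 coding: encodes on write, decodes on read
BIO_METHOD* BIO_f_cipher() // wraps encryption and decryption: encrypts on write, decrypts on read
BIO_METHOD* BIO_f_md() // wraps message digests: data read or written through it passes unchanged while its digest is computed
BIO_METHOD* BIO_f_ssl() // wraps the OpenSSL SSL protocol, i.e. adds BIO operations to an SSL connection
BIO_METHOD* BIO_f_null() // does nothing: every operation is simply passed to the next BIO, as if it were not there
BIO_METHOD* BIO_f_buffer() // wraps buffering: data written to it is usually on its way to the next BIO, and
// data read from it usually comes from the next BIO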
7.4 Programming examples
7.4.1 mem BIO
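#include <stdio.h>
#include <string.h>
#include <openssl/bio.h>
#include <openssl/crypto.h>

int main(void)
{
    BIO *b = NULL;
    int len = 0;
    char *out = NULL;

    b = BIO_new(BIO_s_mem());
    if (b == NULL)
        return 1;
    len = BIO_write(b, "openssl", 4);
    //len = BIO_write(b, "openssl", 7);
    len = BIO_printf(b, "%s", "zcp");
    len = (int)BIO_ctrl_pending(b);
    out = (char *)OPENSSL_malloc(len + 1);  /* +1 for the terminating NUL */
    if (out == NULL) {
        BIO_free(b);
        return 1;
    }
    len = BIO_read(b, out, len);
    out[len > 0 ? len : 0] = '\0';          /* BIO_read does not terminate */
    printf(" %zu : %s \n", strlen(out), out);
    //printf(" %d : %s \n", len, out);
    OPENSSL_free(out);
    BIO_free(b);
    return 0;
}
Notes:
b = BIO_new(BIO_s_mem()); creates a BIO of type mem.
len = BIO_write(b, "openssl", 4); writes the first 4 bytes of "openssl" (that is, "open") to the BIO; pass 7 to write the whole string.
len = BIO_printf(b, "%s", "zcp"); appends the formatted string "zcp" to the BIO.
len = BIO_ctrl_pending(b); returns the number of bytes waiting to be read from the buffer.
len = BIO_read(b, out, len); copies the contents of the BIO into the out buffer.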
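7.4.2 file BIO
#include <stdio.h>
#include <openssl/bio.h>
#include <openssl/crypto.h>

int main(void)
{
    BIO *b = NULL;
    int len = 0, outlen = 0;
    const int cap = 50;                 /* size of the read buffer */
    char *out = NULL;

    b = BIO_new_file("bf.txt", "w");
    if (b == NULL)
        return 1;
    len = BIO_write(b, "openssl", 4);
    len = BIO_printf(b, "%s", "zcp");
    BIO_free(b);                        /* closes the file */

    b = BIO_new_file("bf.txt", "r");
    if (b == NULL)
        return 1;
    out = (char *)OPENSSL_malloc(cap);
    if (out == NULL) {
        BIO_free(b);
        return 1;
    }
    /* read one byte at a time until EOF, an error, or a full buffer */
    while (outlen < cap && (len = BIO_read(b, out + outlen, 1)) > 0)
        outlen += len;
    printf("outlen = %d \n", outlen);
    BIO_free(b);
    OPENSSL_free(out);                  /* matches OPENSSL_malloc */
    return 0;
}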
7.4.3 socket BIO
(to be added)
7.4.4 md BIO
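#include <stdio.h>
#include <openssl/bio.h>
#include <openssl/evp.h>

int main(void)
{
    BIO *bmd = NULL, *b = NULL;
    const EVP_MD *md = EVP_md5();
    int len, i;
    char tmp[1024] = {0};

    bmd = BIO_new(BIO_f_md());
    BIO_set_md(bmd, md);                /* use MD5 as the digest */
    b = BIO_new(BIO_s_null());
    b = BIO_push(bmd, b);               /* chain: md BIO -> null BIO */
    len = BIO_write(b, "openssl", 7);   /* feed the data through the digest */
    len = BIO_gets(b, tmp, 1024);       /* fetch the digest; len is its size */
    for (i = 0; i < len; i++)
        printf("0x%02x: ", (unsigned char)tmp[i]);
    printf("\n");
    BIO_free_all(b);                    /* free every BIO in the chain */
    return 0;
}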
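Notes: this example uses an md BIO to compute the MD5 digest of the string "openssl".
bmd=BIO_new(BIO_f_md()); creates an md BIO.
BIO_set_md(bmd,md); configures the md BIO to use MD5.
b= BIO_new(BIO_s_null()); creates a null BIO.
b=BIO_push(bmd,b); builds the BIO chain with the md BIO at the top.
len=BIO_write(b,"openssl",7); feeds the string into the BIO to be digested.
len=BIO_gets(b,tmp,1024); writes the digest into the tmp buffer.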