Adding gzip compression of response bodies to thttpd to reduce the amount of data transferred
thttpd
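thttpd is a very small, lightweight web server. It is extremely simple, providing only HTTP/1.1 and basic CGI support. Its official website has a comparison chart and benchmark against other web servers (such as Apache and Zeus) that is worth a look. Like lighttpd, thttpd does not fork() a child process for each concurrent request; it uses I/O multiplexing instead, so it performs well.
thttpd supports many platforms, including FreeBSD, SunOS, Solaris, BSD, Linux, and OSF. For small web servers, speed is almost a defining trait. Judging from the benchmark on the official site, thttpd is at least as fast as mainstream web servers and faster under heavy load, because of its small resource footprint.
Another notable feature of thttpd is URL-based bandwidth throttling, which is very convenient for controlling download traffic. Apache, by comparison, needs a plug-in module for this, which is less efficient than thttpd.
For installation and debugging, see: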
http://blog.csdn.net/21aspnet/article/details/7045845
http://blog.csdn.net/orzlzro/article/details/7568338
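HTTP compression
Compressing the response body reduces the amount of data transferred and so improves page response time. The effect is especially noticeable for today's rich web applications, where pages carry large scripts.
The server and the client must use the same compression algorithm. What does HTTP specify about compression algorithms, and how is the algorithm negotiated?
As follows:
1. The content codings HTTP supports include gzip, compress, deflate, and identity (no compression).
2. HTTP specifies that the client sends an Accept-Encoding header in its request to tell the server which compression algorithms it can accept. The server parses this header value to learn which algorithms the client can decompress. If the server also supports one of them, it compresses the response body and sends the compressed content as the body. The response headers must include Content-Encoding, whose value names the algorithm used. Note that Content-Length is then the length of the compressed content.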
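The negotiation step above can be sketched as a small parser for the Accept-Encoding value. This is a minimal illustration, not thttpd code: the function name accepts_coding is hypothetical, and the sketch assumes a simplified grammar (comma-separated tokens with an optional ";q=" weight) and ignores the "*" wildcard.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper (not part of thttpd): returns 1 if the
 * Accept-Encoding value lists `coding` with a non-zero q-value, else 0. */
static int accepts_coding(const char *header, const char *coding)
{
    size_t clen = strlen(coding);
    const char *p = header;

    while (*p != '\0') {
        /* Skip separators and whitespace in front of the token. */
        while (*p == ',' || *p == ' ' || *p == '\t')
            ++p;

        /* The token runs until a comma, semicolon, blank, or the end. */
        const char *tok = p;
        while (*p != '\0' && *p != ',' && *p != ';' && *p != ' ' && *p != '\t')
            ++p;
        size_t tlen = (size_t)(p - tok);
        int match = (tlen == clen && strncasecmp(tok, coding, clen) == 0);

        /* Scan this element's parameters for an optional "q=" weight. */
        double q = 1.0;
        while (*p != '\0' && *p != ',') {
            if ((*p == 'q' || *p == 'Q') && p[1] == '=')
                q = strtod(p + 2, NULL);
            ++p;
        }

        if (match)
            return q > 0.0;
    }
    return 0;
}
```

Compared with a plain substring search, matching whole tokens avoids treating "x-gzip" or "gzip;q=0" as permission to send gzip.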
As in the headers in the figure below, Accept-Encoding and Content-Encoding:
http://www.w3.org/Protocols/rfc2616/rfc2616.txt
Content coding values indicate an encoding transformation that has been or can be applied to an entity. Content codings are primarily used to allow a document to be compressed or otherwise usefully transformed without losing the identity of its underlying media type and without loss of information. Frequently, the entity is stored in coded form, transmitted directly, and only decoded by the recipient.

    content-coding = token

All content-coding values are case-insensitive. HTTP/1.1 uses content-coding values in the Accept-Encoding (section 14.3) and Content-Encoding (section 14.11) header fields. Although the value describes the content-coding, what is more important is that it indicates what decoding mechanism will be required to remove the encoding.

The Internet Assigned Numbers Authority (IANA) acts as a registry for content-coding value tokens. Initially, the registry contains the following tokens:

gzip
    An encoding format produced by the file compression program "gzip" (GNU zip) as described in RFC 1952 [25]. This format is a Lempel-Ziv coding (LZ77) with a 32 bit CRC.

compress
    The encoding format produced by the common UNIX file compression program "compress". This format is an adaptive Lempel-Ziv-Welch coding (LZW).

    Use of program names for the identification of encoding formats is not desirable and is discouraged for future encodings. Their use here is representative of historical practice, not good design. For compatibility with previous implementations of HTTP, applications SHOULD consider "x-gzip" and "x-compress" to be equivalent to "gzip" and "compress" respectively.

deflate
    The "zlib" format defined in RFC 1950 [31] in combination with the "deflate" compression mechanism described in RFC 1951 [29].

identity
    The default (identity) encoding; the use of no transformation whatsoever. This content-coding is used only in the Accept-Encoding header, and SHOULD NOT be used in the Content-Encoding header.

New content-coding value tokens SHOULD be registered; to allow interoperability between clients and servers, specifications of the content coding algorithms needed to implement a new value SHOULD be publicly available and adequate for independent implementation, and conform to the purpose of content coding defined in this section.
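Because "gzip" (RFC 1952) and "deflate" (the zlib format of RFC 1950) wrap the same compressed data in different headers, it can help to tell them apart by their leading bytes when debugging a response body. The sketch below is only an illustration; the function names are hypothetical and the checks look at the header bytes only, not at the whole stream.

```c
#include <stddef.h>

/* RFC 1952: a gzip member starts with ID1 = 0x1f, ID2 = 0x8b,
 * followed by CM = 8 (deflate). */
static int looks_like_gzip(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 0x08;
}

/* RFC 1950: a zlib stream starts with CMF and FLG. The low nibble of
 * CMF is 8 (deflate) and CMF * 256 + FLG is a multiple of 31. */
static int looks_like_zlib(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return 0;
    return (buf[0] & 0x0f) == 8 && ((buf[0] << 8) | buf[1]) % 31 == 0;
}
```

A body sent with Content-Encoding: gzip should pass the first check; output from zlib's compress or compress2 passes the second one instead.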
The gzip compression tool
The gzip tool's website notes:
Can I adapt the gzip sources to perform in-memory compression?
Use the zlib data compression library instead.
zlib
https://github.com/madler/zlib
The main compression function (http://www.zlib.net/manual.html#Basic):
ZEXTERN int ZEXPORT deflate OF((z_streamp strm, int flush));
deflate compresses as much data as possible, and stops when the input buffer becomes empty or the output buffer becomes full. It may introduce some output latency (reading input without producing any output) except when forced to flush.
The general-purpose wrapper functions compress and compress2 do not produce a gzip header (their output is in zlib format):
For usage, see: http://blog.csdn.net/turingo/article/details/8148264
For functions that produce a gzip header, gzcompress and gzdecompress (helpers taken from the referenced snippet, not functions of the zlib API itself), see:
http://www.oschina.net/code/snippet_65636_22542
gzcompress is the function we need here, to compress the response body on the server side.
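Key changes
Compress the contents of file_address. The buffer for the compressed data is sized with compressBound, and its address is stored in file_address_gz.
gzcompress performs the compression and stores the result in file_address_gz.
At the same time, change the len passed to send_mime.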
#include <zlib.h>
#include <zconf.h>
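static int
really_start_request( httpd_conn* hc, struct timeval* nowP )
    ...
    else
    {
    hc->file_address = mmc_map( hc->expnfilename, &(hc->sb), nowP );
    if ( hc->file_address == (char*) 0 )
        {
        httpd_send_err( hc, 500, err500title, "", err500form, hc->encodedurl );
        return -1;
        }

    /* Compress the mapped file contents. */
    uLong blen = 0;
    /* Debug output: print the file name. The mapped contents are not
    ** NUL-terminated, so they must not be printed with %s.
    */
    printf( "compressing %s\n", hc->expnfilename );

    /* Compute the output buffer size and allocate the buffer. */
    blen = compressBound( hc->sb.st_size + 1 );  /* upper bound used for the compressed length */
    if ( ( hc->file_address_gz = (char*) malloc( sizeof(char) * blen ) ) == NULL )
        {
        printf( "not enough memory!\n" );
        return -1;
        }

    /* Compress; on success blen holds the compressed length. */
    if ( gzcompress( hc->file_address, hc->sb.st_size, hc->file_address_gz, &blen ) != Z_OK )
        {
        printf( "compress failed!\n" );
        free( hc->file_address_gz );  /* do not leak the buffer on failure */
        hc->file_address_gz = (char*) 0;
        return -1;
        }

    /* Report the compressed length, not the original file size. */
    send_mime(
        hc, 200, ok200title, hc->encodings, "", hc->type, blen,
        hc->sb.st_mtime );
    }
    return 0;
    }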
The hc->encodings value (gzip) used by send_mime is determined by the Accept-Encoding value: if that value contains gzip, hc->encodings is gzip.
for ( i = 0; i < n_enc_tab; ++i )
    {
    if ( ( ext_len == enc_tab[i].ext_len && strncasecmp( ext, enc_tab[i].ext, ext_len ) == 0 )
         /* The client accepts gzip, so the server may apply gzip to files
         ** that are not already gzipped. The whole table entry is compared,
         ** because ext_len belongs to the file's own extension, not to "gz".
         */
         || ( strcasestr( hc->accepte, "gzip" ) && strcasecmp( "gz", enc_tab[i].ext ) == 0 ) )
        {
        if ( n_me_indexes < sizeof(me_indexes)/sizeof(*me_indexes) )
            {
            me_indexes[n_me_indexes] = i;
            ++n_me_indexes;
            }
        goto next;
        }
    }
/* No encoding extension found. Break and look for a type extension. */
break;
In the send phase, in the handle_send function, change file_address to file_address_gz:
/* Do we need to write the headers first? */
if ( hc->responselen == 0 )
    {
    /* No, just write the file. */
    sz = write(
        hc->conn_fd, &(hc->file_address_gz[c->next_byte_index]),
        MIN( c->end_byte_index - c->next_byte_index, max_bytes ) );
    }
else
    {
    /* Yes. We'll combine headers and file into a single writev(),
    ** hoping that this generates a single packet.
    */
    struct iovec iv[2];

    iv[0].iov_base = hc->response;
    iv[0].iov_len = hc->responselen;
    iv[1].iov_base = &(hc->file_address_gz[c->next_byte_index]);
    iv[1].iov_len = MIN( c->end_byte_index - c->next_byte_index, max_bytes );
    sz = writev( hc->conn_fd, iv, 2 );
    }
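One detail the send code above depends on is that the byte range being sent has to be bounded by the compressed length, not the original file size. The following sketch shows the bookkeeping in isolation. It is not thttpd code: send_chunk and its parameters are hypothetical names, and body_len is assumed to be the compressed length returned by the compression step.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: writes at most max_bytes of body starting at
 * *next and advances *next by the number of bytes actually written.
 * Returns the byte count, 0 if nothing could be written right now or
 * nothing is left, and -1 on a real error. */
static ssize_t send_chunk(int fd, const char *body, size_t body_len,
                          size_t *next, size_t max_bytes)
{
    size_t remaining = body_len - *next;
    size_t want = remaining < max_bytes ? remaining : max_bytes;

    if (want == 0)
        return 0;

    ssize_t sz = write(fd, body + *next, want);
    if (sz < 0) {
        /* A non-blocking socket may simply not be ready yet. */
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;
        return -1;
    }

    *next += (size_t)sz;    /* a short write leaves the rest for later */
    return sz;
}
```

The caller would invoke this each time the descriptor becomes writable until *next reaches body_len.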
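Experimental results
The test object is a download of the HTTP specification, http://www.w3.org/Protocols/rfc2616/rfc2616.txt
Before gzip compression: response size 422 KB, response time 19 ms, load time 373 ms
After gzip compression: response size 115 KB, response time 38 ms, load time 396 ms
Comparing the two, the response time is longer, which can be attributed to the time the server spends compressing and the time the client spends decompressing. Together these add only about 20 ms. Note that this test was run on a local LAN.
The size, however, shrank to nearly a quarter of the original (115 KB / 422 KB is about 27%), which is significant. On the Internet, paying an extra 20 ms to cut the size to about a quarter is a good trade, because most of the time there is spent on transfer: at a quarter of the size, the transfer takes roughly a quarter of the time.
For example, a resource on the Internet may well take seconds to transfer: originally 3 seconds, roughly 0.7 s after compression.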
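The estimate above can be written out as simple arithmetic. The sketch below assumes a deliberately crude model, labeled as such: transfer time scales linearly with body size, and compression adds a fixed overhead. The function names are hypothetical.

```c
/* Compressed size as a fraction of the original size. */
static double size_ratio(double original_kb, double compressed_kb)
{
    return compressed_kb / original_kb;
}

/* Assumed model: transfer time scales with size, plus a fixed
 * compression and decompression overhead in seconds. */
static double estimated_seconds(double original_seconds, double ratio,
                                double overhead_seconds)
{
    return original_seconds * ratio + overhead_seconds;
}
```

With the measured sizes (422 KB and 115 KB), a 3 second transfer and 20 ms of overhead, this model gives a little over 0.8 s, in the same range as the rough figure quoted in the text.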
Response message content: