[Erlang 0034] Erlang iolist

It all started when 芒果 (Mango) ran into an exception while using mochiweb. I found the same problem in the mochiweb Google group:

=ERROR REPORT==== 7-Apr-2011::18:58:22 === 
"web request failed" 
path: "cfsp/entity" 
type: error 
what: badarg 
trace: [{erlang,iolist_size, 
[[...]]}, 
{mochiweb_request,respond,2}, 
{rest_server_web,loop,1}, 
{mochiweb_http,headers,5}, 
{proc_lib,init_p_do_apply,3}]

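The person asking concluded that the exception was caused by the document being too long. In his reply, Bob first ruled out that guess and turned to the iolist error that the trace points to explicitly: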

I don't think it has anything to do with the size of your document, your code is somehow returning a value that is not an iolist. Perhaps there is an atom in there, or an integer outside of 0..255 (I guess this is more likely, I don't know xmerl output very well).

1> iolist_size([256]). 
** exception error: bad argument 
in function iolist_size/1 
called as iolist_size([256])

You probably want UTF-8, so unicode:characters_to_binary(xmerl:export_simple(Res,xmerl_xml)) is my guess at what you really want to be doing for the output.

-bob

Full thread: http://groups.google.com/group/mochiweb/browse_thread/thread/f67abc113b338bfe?pli=1
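To make Bob's diagnosis concrete, here is a minimal sketch (the module name utf8_fix is made up for this illustration). Code points above 255, such as Chinese characters, make iolist_size/1 fail with badarg, while unicode:characters_to_binary/1 turns the same character data into a UTF-8 binary, which is always valid iodata. The xmerl step from Bob's suggestion is left out; a plain character list stands in for its output.

```erlang
-module(utf8_fix).
-export([to_utf8/1]).

%% Converts character data (code points may exceed 255) into a UTF-8 binary.
%% A binary is valid iodata, so the result can be passed on as an HTTP body.
to_utf8(Chars) ->
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) ->
            Bin;
        {error, _Converted, _Rest} = Err ->
            erlang:error({bad_chardata, Err});
        {incomplete, _Converted, _Rest} = Inc ->
            erlang:error({incomplete_chardata, Inc})
    end.
```

For example, [20320, 22909] (the two characters 你好) is rejected by iolist_size/1, but utf8_fix:to_utf8([20320, 22909]) returns the 6-byte binary <<228,189,160,229,165,189>>.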

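Passing this hint on to Mango solved the problem right away. But we shouldn't stop there. The natural follow-up question is: what is the difference between an Erlang list and an iolist?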

Definition of iolist

The official Erlang documentation says very little about iolist, but the definition can still be found:

iodata()                  iolist() | binary()
iolist()                  maybe_improper_list(byte() | binary() | iolist(), binary() | [])
maybe_improper_list()     maybe_improper_list(any(), any())
maybe_improper_list(T)    maybe_improper_list(T, any())
byte()                    0..255
char()                    0..16#10ffff
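To make the recursive definition concrete, here is a hypothetical checker (iolist_check is not an OTP module) that mirrors iolist() and iodata() as listed above: list elements may be bytes, binaries or nested iolists, and the tail of a list may be [] or a binary. It is only meant to illustrate the type.

```erlang
-module(iolist_check).
-export([is_iodata/1, is_iolist/1]).

%% iodata() = iolist() | binary()
is_iodata(B) when is_binary(B) -> true;
is_iodata(L) -> is_iolist(L).

%% An iolist is always a list at the top level.
is_iolist(L) when is_list(L) -> check(L);
is_iolist(_) -> false.

%% Walks the cons cells; the final tail may be [] or a binary.
check([]) -> true;
check(Tail) when is_binary(Tail) -> true;
check([H | T]) -> is_element(H) andalso check(T);
check(_) -> false.

%% Elements: byte() | binary() | iolist()
is_element(H) when is_integer(H), H >= 0, H =< 255 -> true;
is_element(H) when is_binary(H) -> true;   % bitstrings like <<1:1>> fail is_binary/1
is_element(H) when is_list(H) -> check(H);
is_element(_) -> false.
```

It agrees with the shell session below: [1234] and [<<1:1>>] are rejected, a bare binary is iodata but not an iolist, and an improper list such as [1 | <<"a">>] is accepted (iolist_size/1 returns 2 for it).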

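A related but different type appears in the io module documentation, where the data argument (for example of io:put_chars/2) is typed as unicode:chardata(). The unicode module defines it as:

chardata() = charlist() | unicode_binary()
charlist() = [unicode_char() | unicode_binary() | charlist()]
unicode_binary() = binary()

where a unicode_binary() is "a binary() with characters encoded in the UTF-8 coding standard". Note that chardata is not the same as iodata: the integers in a charlist() are Unicode code points (up to 16#10ffff) rather than bytes, so [256] is valid chardata but not a valid iolist.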

Note that the two iolist BIFs also accept a plain binary as their argument:

iolist_size(Item) -> integer() >= 0
    Types: Item = iolist() | binary()

iolist_to_binary(IoListOrBinary) -> binary()
    Types: IoListOrBinary = iolist() | binary()

Official documentation: http://www.erlang.org/doc/reference_manual/typespec.html

Let's try it out in the shell:

 Eshell V5.9  (abort with ^G)
1> iolist_size([]).
0
2> iolist_size([<<"anc">>]).
3
3> iolist_size([12,<<"anc">>]).
4
4> iolist_size([12,<<"anc">>,23]).
5
5> iolist_size([12,<<"anc">>,23,<<"king">>]).
9
6> iolist_size([12,<<"anc">>,23,<<"king">>,[23,34,<<"test">>]]).
15
7> iolist_size(<<"abc">>).
3
8> iolist_size(<<>>).
0
9> iolist_size([1234]).
** exception error: bad argument
     in function  iolist_size/1
        called as iolist_size([1234])
10> iolist_size([<<1:1>>]).
** exception error: bad argument
     in function  iolist_size/1
        called as iolist_size([<<1:1>>])
11> iolist_size( [12,23,"abc",<<abc>>]).
** exception error: bad argument
12> iolist_size( [12,23,<<abc>>]).
** exception error: bad argument
13> iolist_size( [12,23,"abc",<<"abc">>]).
8
14>  L=[$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]].
[72,101,[108,<<"lo">>," "],[[["W","o"],<<"rl">>]],<<"d">>]

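Note that commands 11 and 12 fail before iolist_size/1 is even called: <<abc>> tries to put the atom abc into a binary segment, which is itself a badarg. Command 13, which uses the string "abc" and the binary <<"abc">>, works as expected.

When is iolist useful?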

The first thing I found was mryufeng's post "What is the difference between iolist and list?" (《iolist跟list有什么区别？》) http://mryufeng.iteye.com/blog/634867

That post derives the definition of the iolist data structure from the source code and explains what iolists are for:

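Iolists are meant for sending data to ports. Because the underlying system calls, such as writev, support vectored writes, pointless flattening operations like iolist_to_binary are avoided, which saves memory copies and greatly improves efficiency. Use them often.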
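A minimal sketch of handing an iolist to the runtime without flattening it first. file:write_file/2 accepts iodata, so the pieces below are never concatenated in Erlang code; whether the runtime then uses a vectored write such as writev is an implementation detail this example cannot show. The module name and the file name iolist_write_demo.txt are made up for illustration.

```erlang
-module(iolist_write).
-export([write_greeting/1]).

%% Writes a nested iolist (binary, byte, string, nested list) straight to a
%% file, then reads the file back so the flattened result can be inspected.
write_greeting(Path) ->
    Greeting = [<<"Hello">>, $\s, "World", [$\n]],
    ok = file:write_file(Path, Greeting),
    file:read_file(Path).
```

Calling iolist_write:write_greeting("iolist_write_demo.txt") should return {ok, <<"Hello World\n">>}.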

What does that mean? I found the answer at the beginning of the "Buckets of Sockets" chapter on the Learn You Some Erlang site:

A = [a]
B = [b|A] = [b,a]
C = [c|B] = [c,b,a]
In the case of prepending, as above, whatever is held into A or B or C never needs to be rewritten. The representation of C can be seen as either [c,b,a], [c|B] or [c|[b|[a]]], among others. In the last case, you can see that the shape of A is the same at the end of the list as when it was declared. Similarly for B. Here's how it looks with appending:

A = [a]
B = A ++ [b] = [a] ++ [b] = [a|[b]]
C = B ++ [c] = [a|[b]] ++ [c] = [a|[b|[c]]]
Do you see all that rewriting? When we create B, we have to rewrite A. When we write C, we have to rewrite B (including the [a|...] part it contains). If we were to add D in a similar manner, we would need to rewrite C. Over long strings, this becomes way too inefficient, and it creates a lot of garbage left to be cleaned up by the Erlang VM.
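The same cost shows up when output is built piece by piece. The sketch below (a hypothetical module, append_vs_nest) contrasts appending with ++, which copies the growing accumulator on every step, against nesting the accumulator inside a new two-element list, which does a constant amount of work per step and leaves an iolist to be flattened once, at the end.

```erlang
-module(append_vs_nest).
-export([by_append/1, by_nesting/1]).

%% Acc ++ P copies Acc every time, so N pieces cost O(N^2) in total.
by_append(Pieces) ->
    lists:foldl(fun(P, Acc) -> Acc ++ P end, [], Pieces).

%% [Acc, P] allocates a fixed-size list and never copies Acc.
%% The result is a deeply nested iolist rather than a flat string.
by_nesting(Pieces) ->
    lists:foldl(fun(P, Acc) -> [Acc, P] end, [], Pieces).
```

Both produce the same bytes: by_append(["a", "b", "c"]) is "abc", and iolist_to_binary(by_nesting(["a", "b", "c"])) is <<"abc">>.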

With binaries, things are not exactly as bad:

A = <<"a">>
B = <<A/binary, "b">> = <<"ab">>
C = <<B/binary, "c">> = <<"abc">>
In this case, binaries know their own length and data can be joined in constant time. That's good, much better than lists. They're also more compact. For these reasons, we'll often try to stick to binaries when using text in the future.
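For comparison, a sketch of the binary accumulator described above (module name made up). The OTP Efficiency Guide describes a runtime optimization that usually lets <<Acc/binary, P/binary>> extend the accumulator in place instead of copying it. Note that every piece must already be a binary, which is the conversion hassle mentioned in the next paragraph.

```erlang
-module(bin_append).
-export([build/1]).

%% Appends each binary piece to the end of a growing binary.
%% A list or integer piece would make P/binary fail with badarg.
build(Pieces) ->
    lists:foldl(fun(P, Acc) -> <<Acc/binary, P/binary>> end, <<>>, Pieces).
```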

There are a few downsides, however. Binaries were meant to handle things in certain ways, and there is still a cost to modifying binaries, splitting them, etc. Moreover, sometimes we'll work with code that uses strings, binaries, and individual characters interchangeably. Constantly converting between types would be a hassle.

In these cases, IO lists are our saviour. IO lists are a weird type of data structure. They are lists of either bytes (integers from 0 to 255), binaries, or other IO lists. This means that functions that accept IO lists can accept items such as [$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]]. When this happens, the Erlang VM will just flatten the list as it needs to do it to obtain the sequence of characters Hello World.

What are the functions that accept such IO Lists? Most of the functions that have to do with outputting data do. Any function from the io module, file module, TCP and UDP sockets will be able to handle them. Some library functions, such as some coming from the unicode module and all of the functions from the re (for regular expressions) module will also handle them, to name a few.

Try the previous Hello World IO List in the shell with io:format("~s~n", [IoList]) just to see. It should work without a problem.
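A minimal sketch of trying the Hello World IO list, both flattened by the BIFs and printed through io/io_lib (the module name hello_iolist is made up):

```erlang
-module(hello_iolist).
-export([hello/0, flatten_demo/0, render/0, print/0]).

%% The Hello World IO list from the text: bytes, strings and binaries mixed
%% at several nesting levels, with the final [<<"d">>] written as the tail.
hello() ->
    [$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]].

%% iolist_size/1 counts the bytes; iolist_to_binary/1 builds the flat binary.
flatten_demo() ->
    L = hello(),
    {iolist_size(L), iolist_to_binary(L)}.

%% io_lib:format/2 returns a (possibly deep) character list instead of printing.
render() ->
    lists:flatten(io_lib:format("~s", [hello()])).

%% Prints Hello World followed by a newline, as suggested in the text.
print() ->
    io:format("~s~n", [hello()]).
```

flatten_demo() returns {11, <<"Hello World">>}, and render() returns "Hello World".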

All in all, they're a pretty clever way of building strings to avoid the problems of immutable data structures when it comes to dynamically building content to be output.

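A quick summary of the above:

-> Prepending to the head of a list is very fast, but appending to the tail means the existing list has to be traversed and copied.
-> Binaries can be appended to in constant time, but they have drawbacks: (1) modifying and splitting them still has a cost; (2) you constantly have to convert between characters and binary data.
-> iolists handle this kind of mixed data well: the Erlang VM flattens the list when it needs to, and io:format is a handy way to check what an iolist built from different kinds of data prints as.
-> In short, under single assignment, iolists are a good way to dynamically build string content for output.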

We can use erlc +to_core M.erl (see "[Erlang 0029] Erlang Inline Compilation") to look at the Core Erlang representation of an iolist.

In Core Erlang, List = [1,2,3,4,5,6,7,8,9] is represented as: [1|[2|[3|[4|[5|[6|[7|[8|[9]]]]]]]]]

Now take this code:

 L=[$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]],
  iolist_size(L)

It is translated into:

 do  %% Line 14
            call 'erlang':'iolist_size'
                ([72|[101|[[108|[#{#<108>(8,1,'integer',['unsigned'|['big']]),
                                   #<111>(8,1,'integer',['unsigned'|['big']])}#|[[32]]]]|[[[[[87]|[[111]]]|[#{#<114>(8,1,'integer',['unsigned'|['big']]),
                                                                                                              #<108>(8,1,'integer',['unsigned'|['big']])}#]]]|[#{#<100>(8,1,'integer',['unsigned'|['big']])}#]]]]])

Further reading

Someone asked the same question on Stack Overflow:

Ports, external or linked-in, accept something called io-lists for sending data to them. An io-list is a binary or a (possibly deep) list of binaries or integers in the range 0..255.
This means that rather than concatenating two lists before sending them to a port, one can just send them as two items in a list. So instead of
"foo" ++ "bar"
one does
["foo", "bar"]
In this example it is of course of minuscule difference. But the iolist in itself allows for convenient programming when creating output data. io_lib:format/2,3 itself returns an io list for example.
The function erlang:list_to_binary/1 accepts io lists, but now we have erlang:iolist_to_binary/1 which convey the intention better. There is also an erlang:iolist_size/1.
Best of all, since files and sockets are implemented as ports, you can send iolists to them. No need to flatten or append.
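The points in this answer can be checked directly (module name made up for illustration): list_to_binary/1 and iolist_to_binary/1 both accept the nested ["foo", "bar"] form, iolist_size/1 measures it without flattening, and the output of io_lib:format/2 can be nested straight into another iolist.

```erlang
-module(iolist_concat).
-export([demo/0]).

demo() ->
    Parts = ["foo", "bar"],                 % instead of "foo" ++ "bar"
    Formatted = io_lib:format("~p", [42]),  % character list, possibly deep
    {list_to_binary(Parts),
     iolist_to_binary(Parts),
     iolist_size(Parts),
     iolist_to_binary([Parts, $\s, Formatted])}.
```

demo() returns {<<"foobar">>, <<"foobar">>, 6, <<"foobar 42">>}.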

There is also this post: A Ramble Through Erlang IO Lists http://prog21.dadgum.com/70.html

The IO List is a handy data type in Erlang, but not one that's often discussed in tutorials. It's any binary. Or any list containing integers between 0 and 255. Or any arbitrarily nested list containing either of those two things. Like this:
[10, 20, "hello", <<"hello",65>>, [<<1,2,3>>, 0, 255]]
The key to IO lists is that you never flatten them. They get passed directly into low-level runtime functions (such as file:write_file), and the flattening happens without eating up any space in your Erlang process. Take advantage of that! Instead of appending values to lists, use nesting instead. For example, here's a function to put a string in quotes:
quote(String) -> [$"] ++ String ++ [$"].
If you're working with IO lists, you can avoid the append operations completely (and the second "++" above results in an entirely new version of String being created). This version uses nesting instead:
quote(String) -> [$", String, $"].
This creates three list elements no matter how long the initial string is. The first version creates length(String) + 2 elements. It's also easy to go backward and un-quote the string: just take the second list element. Once you get used to nesting you can avoid most append operations completely.
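A small sketch of both quoting styles and the un-quote trick (module and function names are illustrative). The append version uses [$"] rather than a bare $" because ++ needs a list on its left; the nested version also works unchanged when String is a binary.

```erlang
-module(quote_demo).
-export([quote_append/1, quote_nested/1, unquote_nested/1]).

%% Builds a new flat list; String ++ [$"] copies all of String.
quote_append(String) -> [$"] ++ String ++ [$"].

%% Always a three-element list, however long String is.
quote_nested(String) -> [$", String, $"].

%% Un-quoting the nested form is just taking the middle element.
unquote_nested([$", String, $"]) -> String.
```

For example, quote_nested("abc") flattens to the same "\"abc\"" that quote_append("abc") builds, and iolist_to_binary(quote_nested(<<"abc">>)) gives <<"\"abc\"">>.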

One thing that nested list trick is handy for is manipulating filenames. Want to add a directory name and ".png" extension to a filename? Just do this:
[Directory, $/, Filename, ".png"]
Unfortunately, filenames in the file module are not true IO lists. You can pass in deep lists, but they get flattened by an Erlang function (file:file_name/1), not the runtime system. That means you can still dodge appending lists in your own code, but things aren't as efficient behind the scenes as they could be. And "deep lists" in this case means only lists, not binaries. Strangely, these deep lists can also contain atoms, which get expanded via atom_to_list.

Ideally filenames would be IO lists, but for compatibility reasons there's still the need to support atoms in filenames. That brings up an interesting idea: why not allow atoms as part of the general IO list specification? It makes sense, as the runtime system has access to the atom table, and there's a simple correspondence between an atom and how it gets encoded in a binary; 'atom' is treated the same as "atom". I find I'm often calling atom_to_list before sending data to external ports, and that would no longer be necessary.
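Until something like that exists, atoms have to be expanded by hand before the data goes out. A hypothetical helper (expand_atoms/1 is not an OTP function) might look like this; it assumes the atoms contain only Latin-1 characters, since atom_to_list/1 would otherwise yield code points above 255.

```erlang
-module(atom_expand).
-export([expand_atoms/1]).

%% Hypothetical helper: walks a (possibly improper) deep list and replaces
%% every atom with its text, so the result can be used as an iolist.
%% Assumes Latin-1 atoms only.
expand_atoms(A) when is_atom(A) -> atom_to_list(A);
expand_atoms([H | T]) -> [expand_atoms(H) | expand_atoms(T)];
expand_atoms(Other) -> Other.   % bytes, binaries and [] pass through
```

iolist_to_binary(["status: ", ok]) fails with badarg, while iolist_to_binary(expand_atoms(["status: ", ok, $\n])) gives <<"status: ok\n">>.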

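Summary

Under single assignment, iolists let you avoid converting back and forth between strings and binary data, which makes them a good way to dynamically build string content for output.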

Posted on 2012-01-31 17:58 by 坚强2002 · 8180 reads · 2 comments
