This started when Mango ran into an exception while using mochiweb. I found the same problem reported in the Google discussion group:
=ERROR REPORT==== 7-Apr-2011::18:58:22 ===
"web request failed"
path: "cfsp/entity"
type: error
what: badarg
trace: [{erlang,iolist_size,
[[...]]},
{mochiweb_request,respond,2},
{rest_server_web,loop,1},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]
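After hitting this exception, the poster concluded that it was caused by an overly long document. In the reply below, bob first rejects that guess and then turns to the iolist error that the trace points to explicitly. His reply: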
I don't think it has anything to do with the size of your document,
your code is somehow returning a value that is not an iolist. Perhaps there is an atom in there, or an integer outside of 0..255 (I guess this is more likely, I don't know xmerl output very well).
1> iolist_size([256]).
** exception error: bad argument
in function iolist_size/1
called as iolist_size([256])
You probably want UTF-8, so unicode:characters_to_binary(xmerl:export_simple(Res,xmerl_xml)) is my
guess at what you really want to be doing for the output.
-bob
Full thread: http://groups.google.com/group/mochiweb/browse_thread/thread/f67abc113b338bfe?pli=1
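Bob's suggestion can be sketched in a few lines. The module name `utf8_body` and the helper `to_body/1` are hypothetical, and the sketch assumes the response body is character data that may contain code points above 255 (as bob suspects of the xmerl output).

```erlang
-module(utf8_body).
-export([to_body/1]).

%% Hypothetical helper: encode character data as a UTF-8 binary so the
%% result is always valid iodata, even when code points exceed 255.
to_body(Chars) ->
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) ->
            Bin;
        _ErrorOrIncomplete ->
            %% {error, _, _} or {incomplete, _, _}: the input was not
            %% valid character data, so fail loudly.
            erlang:error(badarg, [Chars])
    end.
```

For the list `[256]` that makes `iolist_size/1` fail, `utf8_body:to_body([256])` returns the two-byte binary `<<196,128>>`, which can be passed to `iolist_size/1` safely.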
Definition of iolist
iodata() = iolist() | binary()
iolist() = maybe_improper_list(byte() | binary() | iolist(), binary() | [])
maybe_improper_list() = maybe_improper_list(any(), any())
byte() = 0..255
char() = 0..16#10ffff
maybe_improper_list(T) = maybe_improper_list(T, any())
Or: IoData = unicode:chardata()
chardata() = charlist() | unicode_binary()
charlist() = [unicode_char() | unicode_binary() | charlist()]
unicode_binary() = binary()
A binary() with characters encoded in the UTF-8 coding standard.
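To make the type definition concrete, here is a small sketch of the shapes it allows. The module name `iolist_shapes` is made up for illustration; all four terms flatten to the same binary.

```erlang
-module(iolist_shapes).
-export([examples/0]).

%% Pairs each term with the binary it flattens to.
examples() ->
    Terms = [
        <<"abc">>,                 % a bare binary is iodata (not an iolist)
        [$a, $b, $c],              % flat list of bytes
        [$a, [<<"b">>, [$c]]],     % arbitrarily nested list
        [$a, $b | <<"c">>]         % improper list with a binary tail
    ],
    [{T, iolist_to_binary(T)} || T <- Terms].
```

The last term is the reason the definition says `maybe_improper_list`: the tail of an iolist may be a binary instead of `[]`.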
Note that the two iolist-related functions also accept a plain binary as their argument:
iolist_size(Item) -> integer() >= 0
Types: Item = iolist() | binary()
iolist_to_binary(IoListOrBinary) -> binary()
Types: IoListOrBinary = iolist() | binary()
Eshell V5.9 (abort with ^G)
1> iolist_size([]).
0
2> iolist_size([<<"anc">>]).
3
3> iolist_size([12,<<"anc">>]).
4
4> iolist_size([12,<<"anc">>,23]).
5
5> iolist_size([12,<<"anc">>,23,<<"king">>]).
9
6> iolist_size([12,<<"anc">>,23,<<"king">>,[23,34,<<"test">>]]).
15
7> iolist_size(<<"abc">>).
3
8> iolist_size(<<>>).
0
9> iolist_size([1234]).
** exception error: bad argument
in function iolist_size/1
called as iolist_size([1234])
10> iolist_size([<<1:1>>]).
** exception error: bad argument
in function iolist_size/1
called as iolist_size([<<1:1>>])
11> iolist_size( [12,23,"abc",<<abc>>]).
** exception error: bad argument
12> iolist_size( [12,23,<<abc>>]).
** exception error: bad argument
13> iolist_size( [12,23,"abc",<<"abc">>]).
8
14> L=[$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]].
[72,101,[108,<<"lo">>," "],[[["W","o"],<<"rl">>]],<<"d">>]
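Since `iolist_size/1` raises `badarg` for anything that is not valid iodata, it can double as a validity check before a value is handed to a web server or a port. The following is only a sketch; the module `iodata_check` and the function `is_iodata/1` are hypothetical names, not part of OTP.

```erlang
-module(iodata_check).
-export([is_iodata/1]).

%% Returns true when Term is a binary or a well-formed iolist,
%% false when iolist_size/1 rejects it with badarg.
is_iodata(Term) ->
    try iolist_size(Term) of
        _Size -> true
    catch
        error:badarg -> false
    end.
```

With this helper, `is_iodata([12,<<"anc">>])` is true, while `is_iodata([1234])` and `is_iodata([<<1:1>>])` are false, matching the shell session above. Note that the check walks the whole structure, so it costs time proportional to the number of list elements.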
When is an iolist the right tool?
A = [a]
B = [b|A] = [b,a]
C = [c|B] = [c,b,a]
In the case of prepending, as above, whatever is held in A or B or C never needs to be rewritten. The representation of C can be seen as either [c,b,a], [c|B] or [c|[b|[a]]], among others. In the last case, you can see that the shape of A is the same at the end of the list as when it was declared. Similarly for B. Here's how it looks with appending:
A = [a]
B = A ++ [b] = [a] ++ [b] = [a|[b]]
C = B ++ [c] = [a|[b]] ++ [c] = [a|[b|[c]]]
Do you see all that rewriting? When we create B, we have to rewrite A. When we write C, we have to rewrite B (including the [a|...] part it contains). If we were to add D in a similar manner, we would need to rewrite C. Over long strings, this becomes way too inefficient, and it creates a lot of garbage left to be cleaned up by the Erlang VM.
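The difference between the two approaches can be sketched as follows. The module `build_list` and both function names are illustrative only; the first version prepends and reverses once at the end, the second appends with `++` and therefore copies the accumulator on every step.

```erlang
-module(build_list).
-export([squares/1, squares_slow/1]).

%% Prepend to the accumulator, then reverse once: linear work overall.
squares(N) ->
    lists:reverse(squares(1, N, [])).

squares(I, N, Acc) when I > N -> Acc;
squares(I, N, Acc) -> squares(I + 1, N, [I * I | Acc]).

%% Append with ++: every step copies the whole accumulator,
%% so the total work grows quadratically with N.
squares_slow(N) ->
    lists:foldl(fun(I, Acc) -> Acc ++ [I * I] end, [], lists:seq(1, N)).
```

Both functions return the same list; only the amount of copying differs.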
With binaries, things are not exactly as bad:
A = <<"a">>
B = <<A/binary, "b">> = <<"ab">>
C = <<B/binary, "c">> = <<"abc">>
In this case, binaries know their own length and data can be joined in constant time. That's good, much better than lists. They're also more compact. For these reasons, we'll often try to stick to binaries when using text in the future.
There are a few downsides, however. Binaries were meant to handle things in certain ways, and there is still a cost to modifying binaries, splitting them, etc. Moreover, sometimes we'll work with code that uses strings, binaries, and individual characters interchangeably. Constantly converting between types would be a hassle.
In these cases, IO lists are our saviour. IO lists are a weird type of data structure. They are lists of either bytes (integers from 0 to 255), binaries, or other IO lists. This means that functions that accept IO lists can accept items such as [$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]]. When this happens, the Erlang VM will just flatten the list as it needs to do it to obtain the sequence of characters Hello World.
What are the functions that accept such IO Lists? Most of the functions that have to do with outputting data do. Any function from the io module, file module, TCP and UDP sockets will be able to handle them. Some library functions, such as some coming from the unicode module and all of the functions from the re (for regular expressions) module will also handle them, to name a few.
Try the previous Hello World IO List in the shell with io:format("~s~n", [IoList]) just to see. It should work without a problem.
All in all, they're a pretty clever way of building strings to avoid the problems of immutable data structures when it comes to dynamically building content to be output.
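As a small illustration of building output this way, the sketch below renders name/value pairs as header-style lines without concatenating anything. The module `render` and the function `headers/1` are hypothetical; names and values may be strings or binaries, mixed freely.

```erlang
-module(render).
-export([headers/1]).

%% Render [{Name, Value}] pairs as "Name: Value\r\n" lines.
%% The result is a nested iolist; nothing is copied or flattened here.
headers(Pairs) ->
    [[Name, <<": ">>, Value, <<"\r\n">>] || {Name, Value} <- Pairs].
```

The result can be sent to a socket or file as it is. Calling `iolist_to_binary/1` on it is only needed when a flat binary is actually required, for example to inspect the output in the shell.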
L=[$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]],
iolist_size(L)
do %% Line 14
call 'erlang':'iolist_size'
([72|[101|[[108|[#{#<108>(8,1,'integer',['unsigned'|['big']]),
#<111>(8,1,'integer',['unsigned'|['big']])}#|[[32]]]]|[[[[[87]|[[111]]]|[#{#<114>(8,1,'integer',['unsigned'|['big']]),
#<108>(8,1,'integer',['unsigned'|['big']])}#]]]|[#{#<100>(8,1,'integer',['unsigned'|['big']])}#]]]]])
Related reading
Someone on Stack Overflow raised the same question:
Ports, external or linked-in, accept something called io-lists for sending data to them. An io-list is a binary or a (possibly deep) list of binaries or integers in the range 0..255.
This means that rather than concatenating two lists before sending them to a port, one can just send them as two items in a list. So instead of
"foo" ++ "bar"
one does
["foo", "bar"]
In this example the difference is of course minuscule. But the iolist in itself allows for convenient programming when creating output data. io_lib:format/2,3 itself returns an io list, for example.
The function erlang:list_to_binary/1 accepts io lists, but now we have erlang:iolist_to_binary/1, which conveys the intention better. There is also an erlang:iolist_size/1.
Best of all, since files and sockets are implemented as ports, you can send iolists to them. No need to flatten or append.
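A minimal sketch of that last point: the deep list returned by `io_lib:format/2` is passed straight to `file:write_file/2` together with a string and a binary. The module `write_demo`, the function `write_report/2`, and the report layout are all made up for illustration.

```erlang
-module(write_demo).
-export([write_report/2]).

%% Write a tiny report to Path without flattening or appending.
write_report(Path, Count) ->
    Line = io_lib:format("count=~p~n", [Count]),   % returns a (deep) char list
    file:write_file(Path, ["report: ", Line, <<"done\n">>]).
```

The same nested term could be handed to `gen_tcp:send/2` unchanged, since sockets accept iodata as well.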
The IO List is a handy data type in Erlang, but not one that's often discussed in tutorials. It's any binary. Or any list containing integers between 0 and 255. Or any arbitrarily nested list containing either of those two things. Like this:
[10, 20, "hello", <<"hello",65>>, [<<1,2,3>>, 0, 255]]
The key to IO lists is that you never flatten them. They get passed directly into low-level runtime functions (such as file:write_file), and the flattening happens without eating up any space in your Erlang process. Take advantage of that! Instead of appending values to lists, use nesting instead. For example, here's a function to put a string in quotes:
quote(String) -> "\"" ++ String ++ "\"".
If you're working with IO lists, you can avoid the append operations completely (and the second "++" above results in an entirely new version of String being created). This version uses nesting instead:
quote(String) -> [$", String, $"].
This creates three list elements no matter how long the initial string is. The first version creates length(String) + 2 elements. It's also easy to go backward and un-quote the string: just take the second list element. Once you get used to nesting you can avoid most append operations completely.
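The nested quoting trick, including the "go backward" step, might look like this as a complete module. The module name `quoting` and the function `unquote/1` are hypothetical additions; `unquote/1` only works on the exact three-element shape that `quote/1` produces.

```erlang
-module(quoting).
-export([quote/1, unquote/1]).

%% Wrap iodata in double quotes by nesting; String itself is not copied.
quote(String) ->
    [$", String, $"].

%% Undo quote/1 by pattern matching on the three-element list.
unquote([$", String, $"]) ->
    String.
```

Because `quote/1` never inspects its argument, it accepts strings, binaries, and nested iolists alike.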
One thing that nested list trick is handy for is manipulating filenames. Want to add a directory name and ".png" extension to a filename? Just do this:
[Directory, $/, Filename, ".png"]
Unfortunately, filenames in the file module are not true IO lists. You can pass in deep lists, but they get flattened by an Erlang function (file:file_name/1), not the runtime system. That means you can still dodge appending lists in your own code, but things aren't as efficient behind the scenes as they could be. And "deep lists" in this case means only lists, not binaries. Strangely, these deep lists can also contain atoms, which get expanded via atom_to_list.
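The deep-list-with-atoms behaviour can be observed through `filename:flatten/1`, which converts a possibly deep filename made of characters and atoms into a flat string. The sketch below uses it on the path from the example; the module `png_path` and the function `path/2` are made-up names, and using `filename:flatten/1` here is a choice made for this illustration, not something the quoted text prescribes.

```erlang
-module(png_path).
-export([path/2]).

%% Build "Directory/Filename.png" by nesting, then flatten to a string.
%% Directory and Filename may be strings or atoms.
path(Directory, Filename) ->
    filename:flatten([Directory, $/, Filename, ".png"]).
```

Passing an atom as the filename, as in `png_path:path("img", logo)`, works because atoms are expanded with `atom_to_list/1` during flattening.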
Ideally filenames would be IO lists, but for compatibility reasons there's still the need to support atoms in filenames. That brings up an interesting idea: why not allow atoms as part of the general IO list specification? It makes sense, as the runtime system has access to the atom table, and there's a simple correspondence between an atom and how it gets encoded in a binary; 'atom' is treated the same as "atom". I find I'm often calling atom_to_list before sending data to external ports, and that would no longer be necessary.
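Until something like that exists, the conversion has to happen in user code. A rough sketch of the idea, with a hypothetical module `atom_iolist` and function `expand_atoms/1`, walks a deep list and replaces every atom with its character list. It assumes the atoms contain only Latin-1 characters; an atom with code points above 255 would still produce an invalid iolist.

```erlang
-module(atom_iolist).
-export([expand_atoms/1]).

%% Replace every atom in a (possibly improper) deep list with its
%% character list, leaving bytes, binaries and [] untouched.
expand_atoms([Head | Tail]) ->
    [expand_atoms(Head) | expand_atoms(Tail)];
expand_atoms(Atom) when is_atom(Atom) ->
    atom_to_list(Atom);
expand_atoms(Other) ->
    Other.
```

This does the extra traversal that the quoted proposal wants the runtime to do on the caller's behalf.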