Specifying UTF-8 encoding for the documents generated by edoc

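Our project's Erlang source files are all saved as UTF-8, and the comments contain a lot of Chinese. As a result, the Chinese text in the pages edoc generates shows up garbled in the browser, and having to select UTF-8 by hand on every visit is a real nuisance.

Opening the HTML in Notepad showed that no metadata declares the encoding. On top of that, the document starts with a long run of plain English characters, which throws off the browser's encoding auto-detection, so the page is not recognized as UTF-8.

Once the cause is known, the fix follows naturally: add the following line inside <head> to declare the UTF-8 encoding: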

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

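But edoc produces a large pile of files on every run, so adding the line by hand is not realistic. Writing a script to patch the string into every HTML file feels a bit amateurish, so let's go for a more elegant approach.

So I dug up the edoc documentation. The third argument of edoc:run/3 is described as follows: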

edoc:run/3
Options:

{app_default, string()}
Specifies the default base URI for unknown applications.

{application, App::atom()}
Specifies that the generated documentation describes the application App. This mainly affects generated references.

{dir, filename()}
Specifies the target directory for the generated documentation.

{doc_path, [string()]}
Specifies a list of URI:s pointing to directories that contain EDoc-generated documentation. URI without a scheme:// part are taken as relative to file://. (Note that such paths must use / as separator, regardless of the host operating system.)

{doclet, Module::atom()}
Specifies a callback module to be used for creating the documentation. The module must export a function run(Cmd, Ctxt). The default doclet module is edoc_doclet; see edoc_doclet:run/2 for doclet-specific options.

{exclude_packages, [package()]}
Lists packages to be excluded from the documentation. Typically used in conjunction with the subpackages option.

{file_suffix, string()}
Specifies the suffix used for output files. The default value is ".html". Note that this also affects generated references.

{new, boolean()}
If the value is true, any existing edoc-info file in the target directory will be ignored and overwritten. The default value is false.

{packages, boolean()}
If the value is true, it is assumed that packages (module namespaces) are being used, and that the source code directory structure reflects this. The default value is true. (Usually, this does the right thing even if all the modules belong to the top-level "empty" package.) no_packages is an alias for {packages, false}. See the subpackages option below for further details.

If the source code is organized in a hierarchy of subdirectories although it does not use packages, use no_packages together with the recursive-search subpackages option (on by default) to automatically generate documentation for all the modules.

{source_path, [filename()]}
Specifies a list of file system paths used to locate the source code for packages.

{source_suffix, string()}
Specifies the expected suffix of input files. The default value is ".erl".

{subpackages, boolean()}
If the value is true, all subpackages of specified packages will also be included in the documentation. The default value is false. no_subpackages is an alias for {subpackages, false}. See also the exclude_packages option.

Subpackage source files are found by recursively searching for source code files in subdirectories of the known source code root directories. (Also see the source_path option.) Directory names must begin with a lowercase letter and contain only alphanumeric characters and underscore, or they will be ignored. (For example, a subdirectory named test-files will not be searched.)

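Among this pile of options, only doclet seems relevant. edoc uses the module named by the doclet option as the callback module for creating the documentation; the default is edoc_doclet.

My first idea was to copy the code of edoc_doclet and add the UTF-8 metadata at the point where the HTML is generated.

Reading the edoc_doclet source, however, I found that it calls edoc:layout/2 to lay out each document: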

edoc_doclet:source/9
%% Generating documentation for a source file, adding its name to the
%% set if it was successful. Errors are just flagged at this stage,
%% allowing all source files to be processed even if some of them fail.

source({M, P, Name, Path}, Dir, Suffix, Env, Set, Private, Hidden,
       Error, Options) ->
    File = filename:join(Path, Name),
    case catch {ok, edoc:get_doc(File, Env, Options)} of
        {ok, {Module, Doc}} ->
            check_name(Module, M, P, File),
            case ((not is_private(Doc)) orelse Private)
                andalso ((not is_hidden(Doc)) orelse Hidden) of
                true ->
                    Text = edoc:layout(Doc, Options),
                    Name1 = packages:last(M) ++ Suffix,
                    edoc_lib:write_file(Text, Dir, Name1, P),
                    {sets:add_element(Module, Set), Error};
                false ->
                    {Set, Error}
            end;
        R ->
            report("skipping source file '~s': ~W.", [File, R, 15]),
            {Set, true}
    end.

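Jumping to edoc:layout/2, I found that it uses the module given by the layout option as the callback module for laying out the document, with edoc_layout as the default.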

%% @spec layout(Doc::edoc_module(), Options::proplist()) -> string()
%%
%% @doc Transforms EDoc module documentation data to text. The default
%% layout creates an HTML document.
%%
%% Options:
%% <dl>
%% <dt>{@type {layout, Module::atom()@}}
%% </dt>
%% <dd>Specifies a callback module to be used for formatting. The
%% module must export a function `module(Doc, Options)'. The
%% default callback module is {@link edoc_layout}; see {@link
%% edoc_layout:module/2} for layout-specific options.
%% </dd>
%% </dl>
%%
%% @see layout/1
%% @see run/3
%% @see read/2
%% @see file/2

%% INHERIT-OPTIONS: edoc_lib:run_layout/2

layout(Doc, Opts) ->
    F = fun (M) ->
                M:module(Doc, Opts)
        end,
    edoc_lib:run_layout(F, Opts).
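The option-driven dispatch described above can be illustrated with a small self-contained sketch. This is not the actual edoc_lib:run_layout/2 source: the module name layout_dispatch_demo and the demo/0 function are hypothetical, and the only thing carried over from the text is the assumption that the callback module is looked up under the layout key with edoc_layout as the fallback.

```erlang
-module(layout_dispatch_demo).
-export([run_layout/2, demo/0]).

%% Look up the callback module under the `layout' key, falling back to
%% edoc_layout, then hand the module name to the caller's fun.
%% (Simplified illustration; not edoc's real implementation.)
run_layout(F, Opts) when is_function(F, 1) ->
    Module = proplists:get_value(layout, Opts, edoc_layout),
    F(Module).

%% Show which module would be chosen without and with the option.
demo() ->
    Pick = fun(M) -> M end,
    {run_layout(Pick, []),
     run_layout(Pick, [{layout, edoc_layout_utf8}])}.
```

Calling layout_dispatch_demo:demo() yields {edoc_layout, edoc_layout_utf8}: with no layout option the default module is picked, and with the option the caller's module takes its place.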

Following the trail into edoc_layout then leads to the code that builds the HTML:

xhtml(Title, CSS, Body) ->
    [{html, [?NL,
             {head, [?NL,
                     {meta, [{'http-equiv',"Content-Type"},
                             {content, "text/html; charset=ISO-8859-1"}],
                      []},
                     ?NL,
                     {title, Title},
                     ?NL] ++ CSS},
             ?NL,
             {body, [{bgcolor, "white"}], Body},
             ?NL]
     },
     ?NL].

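So it looks like a custom edoc_layout module is needed as well to reach the goal.

Look closely, though: the layout option that edoc:layout/2 reads is passed in from edoc:run/3!

So there is no need to customize edoc_doclet or to set the doclet option at all. It is enough to define an edoc_layout_utf8 module and pass it as the layout option.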

-module(edoc_layout_utf8).
%% Change xhtml/3 as shown below; copy all other code from edoc_layout.erl unchanged.
-define(UTF8_META, {meta, [{'http-equiv',"Content-Type"},
                           {content,"text/html; charset=UTF-8"}],
                    []}).
xhtml(Title, CSS, Body) ->
    [{html, [?NL,
             {head, [?NL,
                     ?UTF8_META,
                     ?NL,
                     {title, Title},
                     ?NL] ++ CSS},
             ?NL,
             {body, [{bgcolor, "white"}], Body},
             ?NL]
     },
     ?NL].
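As a possible alternative to copying all of edoc_layout.erl, a thin wrapper could delegate to the stock layout and patch the charset declaration in the returned text. This is a different technique from the one used in this post, shown only as a sketch under assumptions: the module name edoc_layout_utf8_wrap and the helper fix_charset/1 are hypothetical, the output of edoc_layout is assumed to be character data containing the literal string charset=ISO-8859-1, string:replace/3 needs OTP 20 or later, and depending on the edoc version further callbacks besides module/2 and overview/2 may need the same treatment.

```erlang
-module(edoc_layout_utf8_wrap).
-export([module/2, overview/2, fix_charset/1]).

%% Delegate to the stock layout, then patch the charset declaration.
module(Doc, Options) ->
    fix_charset(edoc_layout:module(Doc, Options)).

overview(Doc, Options) ->
    fix_charset(edoc_layout:overview(Doc, Options)).

%% Replace the first ISO-8859-1 charset declaration with UTF-8.
%% Text without such a declaration comes back unchanged, but flattened.
fix_charset(Text) ->
    Flat = unicode:characters_to_list(Text),
    unicode:characters_to_list(
      string:replace(Flat, "charset=ISO-8859-1", "charset=UTF-8")).
```

Such a wrapper would be passed as {layout, edoc_layout_utf8_wrap}. The trade-off is that it depends on the exact text the stock layout emits, whereas the copied module controls the <head> contents directly.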

Then pass edoc_layout_utf8 as the layout when generating the documentation:

edoc:files(Files, [{dir,"doc"},{new,true},{layout,edoc_layout_utf8}]).
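To avoid retyping that call, it could be wrapped in a small helper module. The following is only a sketch: the module name gen_doc, its two functions, and the idea of collecting every *.erl file from a single source directory are assumptions for illustration, and edoc_layout_utf8 must already be compiled and on the code path.

```erlang
-module(gen_doc).
-export([options/0, run/1]).

%% The edoc options used in this post, kept in one place.
options() ->
    [{dir, "doc"}, {new, true}, {layout, edoc_layout_utf8}].

%% Collect the .erl files directly under SrcDir (no recursion) and
%% generate their documentation with the custom layout module.
run(SrcDir) ->
    Files = filelib:wildcard(filename:join(SrcDir, "*.erl")),
    edoc:files(Files, options()).
```

From the Erlang shell this would be invoked as gen_doc:run("src"), where "src" stands for wherever the project keeps its sources.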

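Solving this problem left a deep impression on me:

First, the way edoc handles callback modules is quite neat; perhaps this is the Erlang implementation style I have been wanting to learn.

Second, the edoc documentation does not mention the layout option directly. I assumed the documentation had left it out and concluded that documentation is never as reliable as the source code. Only later did I notice the sentence "Also see layout/2 for layout-related options". Embarrassing: I simply had not read the documentation carefully enough.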

posted @ 2012-02-14 01:28  neutra