Pandoc Tips and Notes

Posted on 2023-07-06 17:04 by 夜owl

1. Installation

Download the installer from the official website:

Pandoc - index

A portable build that needs no installation is available on GitHub:

Release pandoc 3.1.4 · jgm/pandoc

2. Usage

2.1. Getting started

See the official getting-started tutorial:
Pandoc - Getting started with pandoc

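2.1.1. Opening pandoc from a terminal

Unpack the downloaded archive; it contains pandoc.exe. This program is a command-line tool, so it has to be run from a cmd terminal.

Open the unpacked folder in cmd.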


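pandoc can then be run from that cmd window.

2.1.2. Input mode

Typing pandoc on its own in cmd starts an interactive input mode, which the official documentation calls running pandoc as a filter. In this mode pandoc reads everything you type and, once the input ends, prints the converted result. In cmd, end the input with Ctrl-Z followed by Enter (Ctrl-D on Linux and macOS); Ctrl-C aborts pandoc without producing any output. By default the input is treated as Markdown and the output is HTML.
Use the options below to set the formats: -f is the input format and -t is the output format.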

pandoc -f html -t markdown

Input from the official example (using the default Markdown-to-HTML conversion):

Hello *pandoc*!

- one
- two

Output:

<p>Hello <em>pandoc</em>!</p>
<ul>
<li>one</li>
<li>two</li>
</ul>

Tip: press Enter at the end of each line, including the last one, so that the input is registered.
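The filter mode described above can also be driven from a script instead of typing into cmd. The following is a minimal sketch, assuming Python 3.7+ and that pandoc is on PATH; the helper names build_filter_command and convert_text are hypothetical and only serve this illustration.

```python
import shutil
import subprocess


def build_filter_command(source_format="markdown", target_format="html"):
    """Return the argv that runs pandoc as a stdin-to-stdout filter."""
    return ["pandoc", "-f", source_format, "-t", target_format]


def convert_text(text, source_format="markdown", target_format="html"):
    """Send text to pandoc on stdin and return what it writes to stdout."""
    result = subprocess.run(
        build_filter_command(source_format, target_format),
        input=text,            # plays the role of the typed input
        capture_output=True,
        encoding="utf-8",      # pandoc reads and writes UTF-8
        check=True,            # raise if pandoc exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("pandoc") is None:
        print("pandoc was not found on PATH")
    else:
        print(convert_text("Hello *pandoc*!\n\n- one\n- two\n"))
```

Closing stdin is handled by subprocess.run, so no Ctrl-Z or Ctrl-D is needed here.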

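2.1.3. File mode

Converting files is the main use of pandoc. Use a command like the one below and make sure the file path is correct: -f sets the input format, -t sets the output format, -s (standalone) produces a complete document with a header and footer, and -o sets the output file name, whose extension also indicates the format.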

pandoc test1.md -f markdown -t html -s -o test1.html

Tip: pandoc also infers the input and output formats from the file extensions (.md and .html), so the -f and -t options can be omitted.
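Because the formats can be inferred from the extensions, a conversion command only needs the input file, -s, and the output name. The sketch below builds such a command from a file name; build_convert_command is a hypothetical helper, and deriving the output name from the input name is an assumption made for this example.

```python
from pathlib import Path


def build_convert_command(source, target_suffix=".html", standalone=True):
    """Build a pandoc argv that relies on file extensions instead of -f/-t."""
    source = Path(source)
    # Keep the base name and swap the extension, e.g. test1.md -> test1.html
    output = source.with_suffix(target_suffix)
    command = ["pandoc", str(source)]
    if standalone:
        command.append("-s")  # include the document header and footer
    command += ["-o", str(output)]
    return command


if __name__ == "__main__":
    # Prints: pandoc test1.md -s -o test1.html
    print(" ".join(build_convert_command("test1.md")))
```

The resulting list can be passed to subprocess.run to perform the conversion.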

2.1.4. Supported input and output formats

Use the --list-input-formats and --list-output-formats options to see them:

.\pandoc --list-input-formats

Format	Format	Format	Format	Format	Format
biblatex	bibtex	commonmark	commonmark_x	creole	csljson
csv	docbook	docx	dokuwiki	endnotexml	epub
fb2	gfm	haddock	html	ipynb	jats
jira	json	latex	man	markdown	markdown_github
markdown_mmd	markdown_phpextra	markdown_strict	mediawiki	muse	native
odt	opml	org	ris	rst	rtf
t2t	textile	tikiwiki	tsv	twiki	typst
vimwiki

.\pandoc --list-output-formats

Format	Format	Format	Format	Format	Format
asciidoc	asciidoctor	beamer	biblatex	bibtex	chunkedhtml
commonmark	commonmark_x	context	csljson	docbook	docbook4
docbook5	docx	dokuwiki	dzslides	epub	epub2
epub3	fb2	gfm	haddock	html	html4
html5	icml	ipynb	jats	jats_archiving	jats_articleauthoring
jats_publishing	jira	json	latex	man	markdown
markdown_github	markdown_mmd	markdown_phpextra	markdown_strict	markua	mediawiki
ms	muse	native	odt	opendocument	opml
org	pdf	plain	pptx	revealjs	rst
rtf	s5	slideous	slidy	tei	texinfo
textile	typst	xwiki	zimwiki
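The two listings differ (pdf, for instance, appears only among the output formats above), so it can be useful to check a format before converting. This is a small sketch, assuming pandoc prints one format name per line; parse_format_list, is_supported, and read_input_formats are hypothetical names.

```python
import subprocess


def parse_format_list(listing):
    """Turn the text printed by a --list-*-formats option into a set of names."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def is_supported(format_name, listing):
    """Check whether format_name occurs in a format listing."""
    return format_name in parse_format_list(listing)


def read_input_formats():
    """Ask the installed pandoc for its input formats (requires pandoc on PATH)."""
    result = subprocess.run(
        ["pandoc", "--list-input-formats"],
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    sample = "csv\ndocbook\ndocx\ndokuwiki\n"  # excerpt of the input list above
    print(is_supported("docx", sample))
    print(is_supported("pdf", sample))
```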

2.2. Advanced usage

For advanced options, see the user's guide, available in the installation directory or on the official website:
Pandoc - Pandoc User’s Guide

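2.2.1. Extracting images from the source document

Images are not text, so during conversion pandoc replaces each embedded image with a reference to a file path. If the image files are not saved at that path, the images are missing from the output. Use the --extract-media=DIR option to save them: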

.\pandoc '.\linux nano.docx' -f docx -t markdown -s -o nano.md --extract-media=media

Extract images and other media contained in or linked from the source document to the path DIR, creating it if necessary, and adjust the images references in the document so they point to the extracted files. Media are downloaded, read from the file system, or extracted from a binary container (e.g. docx), as needed. The original file paths are used if they are relative paths not containing `..`. Otherwise filenames are constructed from the SHA1 hash of the contents.
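When the document name contains spaces, as in the command above, passing the arguments as a list avoids shell quoting problems. The following sketch builds the same kind of command; build_docx_to_markdown_command is a hypothetical helper, and naming the output after the input file is an assumption of this example (the command above uses nano.md instead).

```python
from pathlib import Path


def build_docx_to_markdown_command(docx_path, media_dir="media"):
    """Build a pandoc argv that converts a docx file and extracts its media."""
    docx_path = Path(docx_path)
    output = docx_path.with_suffix(".md")
    return [
        "pandoc",
        str(docx_path),        # spaces are safe: each list item is one argument
        "-f", "docx",
        "-t", "markdown",
        "-s",
        "-o", str(output),
        f"--extract-media={media_dir}",  # images are written under this folder
    ]


if __name__ == "__main__":
    for argument in build_docx_to_markdown_command("linux nano.docx"):
        print(argument)
```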

Reference:

pandoc提取word中的图片 (Extracting images from Word documents with pandoc) - 简书

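Markers for unsupported formatting

After conversion, the text may contain markers such as {.underline}. They appear when the target format has no direct equivalent for some formatting or style in the source document. For example, some inline styles used in a docx file cannot be converted directly, so pandoc represents them with attribute markers instead.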

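These markers can be removed with a regular-expression search and replace. The pattern below stays inside a single pair of braces, so it does not swallow the text between two markers on the same line:

\{\.[^}]*un[^}]*\}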
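Removing only the {.underline} part can leave square brackets around the affected text. The sketch below handles both cases, assuming the markers were written in the bracketed form [text]{.underline}; strip_underline_markers is a hypothetical helper name.

```python
import re

# [some text]{.underline} -> capture the text between the square brackets
BRACKETED_SPAN = re.compile(r"\[([^\]]*)\]\{\.underline\}")
# A marker that is not attached to a bracketed span
BARE_MARKER = re.compile(r"\{\.underline\}")


def strip_underline_markers(markdown_text):
    """Remove {.underline} markers and keep the text they were attached to."""
    unwrapped = BRACKETED_SPAN.sub(r"\1", markdown_text)
    return BARE_MARKER.sub("", unwrapped)


if __name__ == "__main__":
    sample = "See [this term]{.underline} and [that]{.underline}."
    print(strip_underline_markers(sample))  # prints: See this term and that.
```

Text without markers is returned unchanged, so the function can be applied to a whole converted file.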
