Problems encountered when converting a URL to PDF with pdfkit

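1. Images are missing after conversion

Cause: wkhtmltopdf disables local file access by default.

Solution: add the following parameter to the pdfkit.from_string call: options={'encoding': "UTF-8", 'enable-local-file-access': True}

For example: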

pdfkit.from_string(html, pdf_filename, options={'encoding': "UTF-8", 'enable-local-file-access': True},
                   configuration=config)
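The call above can be wrapped in a small helper. This is only a minimal sketch: build_pdf_options and html_to_pdf are hypothetical names introduced here, and wkhtmltopdf_path is an assumed parameter for the location of the wkhtmltopdf executable on your machine.

```python
def build_pdf_options():
    """Return the wkhtmltopdf options used in this post."""
    return {
        "encoding": "UTF-8",
        # Without this flag wkhtmltopdf refuses to read local files,
        # which is why local images were missing from the PDF.
        "enable-local-file-access": True,
    }


def html_to_pdf(html, pdf_filename, wkhtmltopdf_path=None):
    """Render an HTML string to pdf_filename (hypothetical helper)."""
    # Imported here so build_pdf_options() can be used without pdfkit installed.
    import pdfkit

    config = None
    if wkhtmltopdf_path:
        # Point pdfkit at the wkhtmltopdf binary when it is not on PATH.
        config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
    return pdfkit.from_string(
        html, pdf_filename, options=build_pdf_options(), configuration=config
    )
```

Keeping the options in one function means every conversion call picks up the same flags.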

2. Runtime error:

File "D:\python\python3.8.1\lib\site-packages\wechatsogou\structuring.py", line 474, in get_article_detail
qqmusic = content_text.find_all('qqmusic') or []
AttributeError: 'NoneType' object has no attribute 'find_all'

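Cause: the article has only a title and no body, so content_text is None and the attribute lookup fails.

Solution: the traceback points to the get_article_detail function in structuring.py of the wechatsogou library. Open that function and change the code so that, when the article has no body, it skips the processing and returns an empty value.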
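The kind of guard described above might look like the sketch below. safe_find_all is a hypothetical helper, not part of wechatsogou; in the real library the check would go inside get_article_detail, before the first find_all call.

```python
def safe_find_all(content_text, tag_name):
    """Return the matching tags, or an empty list when there is no body.

    content_text is expected to be a BeautifulSoup node, or None when the
    article has a title but no content.
    """
    if content_text is None:
        # Nothing to search: skip instead of raising AttributeError.
        return []
    return content_text.find_all(tag_name) or []
```

With this guard, an article without a body yields an empty list rather than crashing the whole run.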

3. Runtime error:

File "D:\python\python3.8.1\lib\site-packages\pdfkit\pdfkit.py", line 155, in handle_error
raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Exit with code 1 due to network error: ContentNotFoundError

Cause: a file could not be found. This happens when the following statement runs:

pdfkit.from_string(html, pdf_filename, options={'encoding': "UTF-8", 'enable-local-file-access': True},
                   configuration=config)

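and the html passed in references an external file that cannot be found. I checked the article I was passing in, and it did contain an external link to a poll.

Solution: edit the HTML and remove the tags that reference external files. The bs4 library works for this: find the offending tags and delete them with decompose().

Note that get_article_detail in structuring.py of the wechatsogou library only handles music, video, images, and iframes. Any other external reference in the article can still trigger this error and has to be removed by hand.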
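A minimal sketch of that cleanup with bs4 follows. The function name strip_external_tags and the EXTERNAL_TAGS list are assumptions made for illustration; which tags actually need removing depends on the article, so inspect the HTML first.

```python
from bs4 import BeautifulSoup

# Hypothetical list of tags that may reference external resources.
EXTERNAL_TAGS = ("iframe", "script", "link")


def strip_external_tags(html, tag_names=EXTERNAL_TAGS):
    """Return html with every tag named in tag_names removed."""
    soup = BeautifulSoup(html, "html.parser")
    names = list(tag_names)
    # Search again after each removal so nested matches are handled safely.
    tag = soup.find(names)
    while tag is not None:
        tag.decompose()  # removes the tag and everything inside it
        tag = soup.find(names)
    return str(soup)
```

The cleaned string can then be passed to pdfkit.from_string in place of the original html.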

4. Runtime error:

File "D:\python\python3.8.1\lib\site-packages\pdfkit\pdfkit.py", line 158, in handle_error
raise IOError("wkhtmltopdf exited with non-zero code {0}. error:\n{1}".format(exit_code, error_msg))
OSError: wkhtmltopdf exited with non-zero code 1. error:
QPainter::begin(): Returned false
Exit with code 1, due to unknown error.

Cause: something is wrong with the input, so check the arguments first. In my case the file path I passed did not exist.

Solution: fix the offending argument.
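One way to rule out a bad path before calling pdfkit is sketched below. It assumes the missing path was the directory of the output PDF, which the post does not state explicitly; ensure_output_dir is a hypothetical helper name.

```python
import os


def ensure_output_dir(pdf_filename):
    """Create the parent directory of pdf_filename if needed.

    Returns the absolute path, which can be handed to pdfkit.
    """
    abs_path = os.path.abspath(pdf_filename)
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(os.path.dirname(abs_path), exist_ok=True)
    return abs_path
```

If the bad path is an input file instead, an os.path.exists check on that path before the conversion serves the same purpose.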

5. Runtime error:

File "D:\python\python3.8.1\lib\site-packages\wechatsogou\structuring.py", line 615, in get_article_detail_bak
content_html = re.findall(js_content, content_html)[0][0]
IndexError: list index out of range

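Cause: this error means a list index is out of range. With index 0 that can only happen when the list is empty, that is, re.findall matched nothing. Tracing through the get_article_detail_bak function in structuring.py of the wechatsogou library showed that music.parent.decompose() deletes the parent tag, and in this article the parent tag happened to be the entire article body, so the whole HTML was emptied.

Solution: change the code from music.parent.decompose() to music.decompose().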
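The difference between the two calls can be reproduced on a tiny document. This is a sketch under assumptions: the HTML string and the remove_music helper are made up for illustration and only mimic the situation where the music tag sits directly inside the article container.

```python
from bs4 import BeautifulSoup

# Hypothetical article body: <qqmusic> is a direct child of the container.
HTML = '<div id="js_content"><qqmusic></qqmusic>Article text</div>'


def remove_music(html, remove_parent):
    """Remove the first <qqmusic> tag, or its parent when remove_parent is True."""
    soup = BeautifulSoup(html, "html.parser")
    music = soup.find("qqmusic")
    if music is not None:
        if remove_parent:
            # Old behaviour: the parent is the whole article, so everything goes.
            music.parent.decompose()
        else:
            # Fixed behaviour: only the music tag itself is removed.
            music.decompose()
    return str(soup)
```

Calling remove_music(HTML, True) throws away the article text together with the tag, while remove_music(HTML, False) keeps it.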

posted on 2022-05-25 11:47 公元12956