From: https://stackoverflow.com/questions/31574127/node-js-cheerio-parser-breaks-utf-8-encoding
[问题]
I parse my request with Cheerio like this:
var url = http://shop.nag.ru/catalog/16939.IP-videonablyudenie-OMNY/16944.IP-kamery-OMNY-c-vario-obektivom/16704.OMNY-1000-PRO;
request.get(url, function (err, response, body) {
console.log(body);
$ = cheerio.load(body);
console.log($(".description").html());
});
And as output I see content but in unreadable strange encoding:
//Plain body console.log(body) (p.s. russian chars):
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1><p style
// cheerio's console.log $(".description").html()
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY
Target url link coding is in UTF-8 format. So why Cheerio breaks my encoding?
Trying to use iconv to encode my body responce:
var body1 = iconv.decode(body, "utf-8");
but console.log($(".description").html());
still returns weird text.
[回答]
Cheerio hasn't broken anything. The HTML it outputs will be rendered by any browser exactly the same as the HTML input. Take a look at this snippet:
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>
It's merely the case that У
is the HTML "entity" for the UTF-8 character У
, in the same way the entity >
represents >
.
However, if you want to get the unencoded text, you can set the decodeEntities
option to false
:
const $ = cheerio.load(
`<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>`,
{ decodeEntities: false }
);
console.log($('span').html())
// => Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше
.as-console-wrapper{min-height:100%}
<script src="https://wzrd.in/standalone/cheerio@latest"></script>
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 如何编写易于单元测试的代码
· 10年+ .NET Coder 心语,封装的思维:从隐藏、稳定开始理解其本质意义
· .NET Core 中如何实现缓存的预热?
· 从 HTTP 原因短语缺失研究 HTTP/2 和 HTTP/3 的设计差异
· AI与.NET技术实操系列:向量存储与相似性搜索在 .NET 中的实现
· 10年+ .NET Coder 心语 ── 封装的思维:从隐藏、稳定开始理解其本质意义
· 地球OL攻略 —— 某应届生求职总结
· 提示词工程——AI应用必不可少的技术
· Open-Sora 2.0 重磅开源!
· 周边上新:园子的第一款马克杯温暖上架
2017-06-21 Mongo如何在多个字段中查询某个关键字?