Chrome’s Headless Browser Puppeteer All In One
Chrome
Headless browser
Puppeteer
Chrome’s Headless mode gets an upgrade: introducing --headless=new
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({
headless: 'new',
// `headless: true` (default) enables old Headless;
// `headless: 'new'` enables new Headless;
// `headless: false` enables “headful” mode.
});
const page = await browser.newPage();
await page.goto('https://developer.chrome.com/');
// …
await browser.close();
https://developer.chrome.com/articles/new-headless/
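The three `headless` values listed in the comments above can be switched without editing the script. The following is a minimal sketch, not part of Puppeteer: the helper name `headlessFromEnv` and the environment variable `SCRAPER_HEADLESS` are hypothetical, and the accepted values match the Puppeteer version described here (other releases may accept different values, so check the one you have installed).

```javascript
// Hypothetical helper: map a string (for example from an environment
// variable) to one of the three `headless` values described above.
function headlessFromEnv(value) {
  switch (value) {
    case 'new':
      return 'new'; // new Headless mode
    case 'false':
    case 'headful':
      return false; // visible ("headful") browser window
    default:
      return true; // old Headless mode, the default described above
  }
}
```

A possible call site would be `puppeteer.launch({ headless: headlessFromEnv(process.env.SCRAPER_HEADLESS) })`, which keeps the mode switch out of the source file while debugging.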
errors
TimeoutError: Waiting for selector `.group-container` failed: Waiting failed: 30000ms exceeded ❌
// import puppeteer from 'puppeteer';
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.tesla.cn/modely/design#overview');
await page.setViewport({width: 1920, height: 1080});
// ❌ TimeoutError: Waiting for selector `.group-container` failed: Waiting failed: 30000ms exceeded
const searchResultSelector = '.group-container';
const box = await page.waitForSelector(searchResultSelector);
console.log(`box`, box)
const text = await box?.evaluate(el => el.textContent);
console.log(`text`, text)
await browser.close();
})();
$ node ./pp.js
/*
Puppeteer old Headless deprecation warning:
In the near future `headless: true` will default to the new Headless mode
for Chrome instead of the old Headless implementation. For more
information, please see https://developer.chrome.com/articles/new-headless/.
Consider opting in early by passing `headless: "new"` to `puppeteer.launch()`
If you encounter any bugs, please report them to https://github.com/puppeteer/puppeteer/issues/new/choose.
*/
/Users/xgqfrms-mm/Documents/github/cyclic-express-server/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:59
void this.terminate(new Errors_js_1.TimeoutError(`Waiting failed: ${options.timeout}ms exceeded`));
^
TimeoutError: Waiting for selector `.group-container` failed: Waiting failed: 30000ms exceeded
at Timeout.<anonymous> (/Users/xgqfrms-mm/Documents/github/cyclic-express-server/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:59:37)
at listOnTimeout (node:internal/timers:564:17)
at process.processTimers (node:internal/timers:507:7)
Node.js v18.12.0
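The trace above shows the unhandled `TimeoutError` terminating the Node.js process. While debugging, it can help to catch that error and keep the script alive. This is a minimal sketch under stated assumptions: `waitForSelectorOrNull` is a hypothetical helper name, `page` is assumed to be a Puppeteer `Page`, and the check relies on the error's `name` being `TimeoutError`, as printed in the trace.

```javascript
// Hypothetical helper: wait for a selector with an explicit timeout and
// return null on timeout instead of letting the rejection crash the process.
async function waitForSelectorOrNull(page, selector, timeout = 30000) {
  try {
    // page.waitForSelector accepts a `timeout` option in milliseconds.
    return await page.waitForSelector(selector, { timeout });
  } catch (err) {
    // Assumption: timeouts are reported with err.name === 'TimeoutError',
    // matching the "TimeoutError: ..." line in the stack trace above.
    if (err && err.name === 'TimeoutError') {
      return null;
    }
    throw err; // any other failure is unexpected, so rethrow it
  }
}
```

With this in place, `const box = await waitForSelectorOrNull(page, '.group-container', 10000);` yields `null` when the element never appears, so the script can log the problem and still reach `browser.close()`.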
solutions
headless: "new"
const browser = await puppeteer.launch({
headless: "new",
// headless new ✅🚀
});
headless: false
const browser = await puppeteer.launch({
headless: false,
// disabled headless ✅🚀
});
page.setUserAgent
navigator.userAgent;
// 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
// navigator.userAgent, UA ✅
await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36')
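Hard-coding a user agent string ties the script to one Chrome version. Headless Chrome's default user agent typically contains `HeadlessChrome/` where a regular browser reports `Chrome/`, so one alternative is to rewrite the browser's own string. This is a sketch only: the helper names are hypothetical, and whether the `HeadlessChrome` token is what this particular site reacts to is an assumption, not something the original test confirms.

```javascript
// Hypothetical helper: turn a headless user agent string into its
// regular-browser form by renaming the "HeadlessChrome/" product token.
function toHeadfulUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

// Hypothetical helper: read the browser's default user agent, rewrite it,
// and apply it to the page before navigating.
async function useHeadfulUserAgent(browser, page) {
  const defaultUA = await browser.userAgent(); // Puppeteer Browser.userAgent()
  const ua = toHeadfulUserAgent(defaultUA);
  await page.setUserAgent(ua); // Puppeteer Page.setUserAgent()
  return ua;
}
```

Calling `await useHeadfulUserAgent(browser, page);` before `page.goto(...)` keeps the version number in step with the installed browser.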
demos
// import puppeteer from 'puppeteer';
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: "new",
// headless new 🚀
});
const page = await browser.newPage();
await page.goto('https://www.tesla.cn/modely/design#overview');
await page.setViewport({width: 1920, height: 1080});
// With `headless: "new"`, this wait resolved in the author's test instead of throwing the TimeoutError shown earlier
const searchResultSelector = '.group-container';
const box = await page.waitForSelector(searchResultSelector);
console.log(`box`, box)
const text = await box?.evaluate(el => el.textContent);
console.log(`text`, text)
await browser.close();
})();
// import puppeteer from 'puppeteer';
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
// disable headless mode and open a visible Chromium window 🚀
});
const page = await browser.newPage();
await page.goto('https://www.tesla.cn/modely/design#overview');
await page.setViewport({width: 1920, height: 1080});
// With `headless: false`, this wait resolved in the author's test instead of throwing the TimeoutError shown earlier
const searchResultSelector = '.group-container';
const box = await page.waitForSelector(searchResultSelector);
console.log(`box`, box)
const text = await box?.evaluate(el => el.textContent);
console.log(`text`, text)
await browser.close();
})();
// import puppeteer from 'puppeteer';
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// navigator.userAgent, UA ✅
await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36')
await page.goto('https://www.tesla.cn/modely/design#overview');
await page.setViewport({width: 1920, height: 1080});
// With a regular Chrome user agent set, this wait resolved in the author's test instead of throwing the TimeoutError shown earlier
const searchResultSelector = '.group-container';
const box = await page.waitForSelector(searchResultSelector);
console.log(`box`, box)
const text = await box?.evaluate(el => el.textContent);
console.log(`text`, text)
await browser.close();
})();
How to delay crawling SSR / dynamically rendered page content in Node.js
How to crawl dynamically rendered page content with Puppeteer ✅, waitUntil
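The `waitUntil` option mentioned above belongs to `page.goto` and controls when navigation counts as finished. The sketch below combines it with a selector wait. The helper name `gotoAndWaitFor` and its defaults are hypothetical, and `'networkidle2'` is one reasonable choice for pages that render content after load, not a setting taken from the original post.

```javascript
// Hypothetical helper: navigate, wait for the network to settle, then wait
// for a selector that is rendered dynamically.
async function gotoAndWaitFor(page, url, selector, options = {}) {
  const { waitUntil = 'networkidle2', timeout = 30000 } = options;
  // 'networkidle2' resolves once there have been no more than 2 network
  // connections for at least 500 ms; other accepted values are 'load',
  // 'domcontentloaded', and 'networkidle0'.
  await page.goto(url, { waitUntil, timeout });
  // The element may still appear slightly later, so wait for it explicitly.
  return page.waitForSelector(selector, { timeout });
}
```

For example, `const box = await gotoAndWaitFor(page, 'https://www.tesla.cn/modely/design#overview', '.group-container');` replaces the separate `goto` and `waitForSelector` calls used in the demos.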
https://www.cnblogs.com/xgqfrms/p/17673122.html#5207654
refs
https://github.com/puppeteer/puppeteer/issues/8166#issuecomment-1704367760
https://www.cnblogs.com/xgqfrms/p/17673122.html#5207658
©xgqfrms 2012-2021
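Articles published on www.cnblogs.com/xgqfrms are accessible to registered users only!
Original article, copyright ©️xgqfrms. Reproduction prohibited 🈲️; infringement will be pursued ⚠️!
This article was first published on cnblogs (博客园). Author: xgqfrms. Original link: https://www.cnblogs.com/xgqfrms/p/17675967.html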