playwright-har 试用

playwright-har是puppeteer-har 的playwright 移植,使用方法上类似,以下是一个简单试用

环境准备

  • docker-compose
    使用了browserless
version: "3"
services:
  browser:
    image: ghcr.io/browserless/chromium:latest
    environment:
      - CONCURRENT=40
      - QUEUED=20
      - CORS=true
      - CORS_MAX_AGE=300
      - DATA_DIR=/tmp/my-profile
      - TOKEN=6R0W53R135510
    volumes:
      - ./my-profile:/tmp/my-profile
    ports:
      - "3000:3000"

使用

安装依赖yarn add playwright playwright-har

  • dalongv3.js
const { chromium } = require("playwright");
const {PlaywrightHar} = require('playwright-har');
 
(async () => {
  let browser = await chromium.connectOverCDP(
    "ws://localhost:3000?token=6R0W53R135510"
  );
  let bContext = await browser.newContext({
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
  });
 
  let page = await bContext.newPage();
 
  const playwrightHar = new PlaywrightHar(page);
  await playwrightHar.start();
 
  await page.goto("https://news.baidu.com/", {
    waitUntil: "networkidle",
  });
  let json_result = []
  let resutlv2 = await page.locator(".column").all();
  for (const row of await resutlv2) {
    json_result.push({address:"https://news.baidu.com/",content:await row.innerHTML(),title:"dalong.js"})
  }
  console.log(JSON.stringify(json_result))
  await playwrightHar.stop('./baidu.har');
  await browser.close();
})();
  • har 文件效果

 

内部实现

内部实现上使用了chrome-har 这个npm 包

  • 参考实现
import { harFromMessages } from 'chrome-har';
import { writeFileSync } from 'fs';
import { CDPSession, Page } from 'playwright-chromium';
import { PlaywrightHarConfig } from './playwright-har-config';
 
export class PlaywrightHar {
 
    private page: Page;
    private client: CDPSession;
    private addResponseBodyPromises = [];
    private events = [];
    private config: PlaywrightHarConfig;
 
    constructor(page: Page, config: PlaywrightHarConfig = null) {
        this.page = page;
 
        if (config == null) {
          this.config = new PlaywrightHarConfig();
        }
    }
 
    async start() {
        //@ts-ignore
        // newCDPSession is only available for ChromiumBrowserContext
       // 通过page 的创建新的cdp 回话,之后就是对于page 不同事件的监听,内容包含了事件以及事件的content body 
        this.client = await this.page.context().newCDPSession(this.page);
        await this.client.send('Page.enable');
        await this.client.send('Network.enable');
        const observe = [
            'Page.loadEventFired',
            'Page.domContentEventFired',
            'Page.frameStartedLoading',
            'Page.frameAttached',
            'Page.frameScheduledNavigation',
            'Network.requestWillBeSent',
            'Network.requestServedFromCache',
            'Network.dataReceived',
            'Network.responseReceived',
            'Network.resourceChangedPriority',
            'Network.loadingFinished',
            'Network.loadingFailed',
            'Network.getResponseBody'
        ];
        observe.forEach(method => {
            //@ts-ignore
            // Doesn't work when array contains symbols instead of strings
            this.client.on(method, params => {
                const harEvent = { method, params };
                this.events.push(harEvent);
                if (method === 'Network.responseReceived') {
                    if (this.config.recordResponses === false) {
                        return;
                    }
 
                    const response = harEvent.params.response;
                    const requestId = harEvent.params.requestId;
                    // Response body is unavailable for redirects, no-content, image, audio and video responses
                    if (
                        response.status !== 204 &&
                        response.headers.location == null &&
                        !response.mimeType.includes('image') &&
                        !response.mimeType.includes('audio') &&
                        !response.mimeType.includes('video')
                    ) {
                        const addResponseBodyPromise = this.client.send('Network.getResponseBody', { requestId }).then(
                            responseBody => {
                                // Set the response so chrome-har can add it to the HAR file
                                harEvent.params.response = {
                                    ...response,
                                    body: Buffer.from(responseBody.body, responseBody.base64Encoded ? 'base64' : undefined).toString()
                                };
                            },
                            reason => { }
                        );
                        this.addResponseBodyPromises.push(addResponseBodyPromise);
                    }
                }
            });
        });
    }
  // 对于保存的事件以及事件内容,基于harFromMessages 包装har 内容,之后就是内容的写入
    async stop(path?: string) {
        await Promise.all(this.addResponseBodyPromises);
        const harObject = harFromMessages(this.events, { includeTextFromResponseBody: this.config.recordResponses !== false });
        this.events = [];
        this.addResponseBodyPromises = [];
        if (path) {
            writeFileSync(path, JSON.stringify(harObject));
        }
        else {
            return harObject
        }
 
    }
}

说明

har 好多时候是分析问题一个很不错的选择,playwright-har 值得试用下

参考资料

https://github.com/janzaremski/playwright-har
https://github.com/Everettss/puppeteer-har

posted on 2024-06-26 08:00  荣锋亮  阅读(25)  评论(0编辑  收藏  举报

导航