Python tools: Playwright
Playwright is an open-source browser automation project from Microsoft.
Git repository: https://github.com/microsoft/playwright-python
Installation
pip install playwright
List the supported options and commands
python -m playwright -h
Usage: index [options] [command]

Options:
  -V, --version                output the version number
  -b, --browser <browserType>  browser to use, one of cr, chromium, ff, firefox, wk, webkit (default: "chromium")
  --color-scheme <scheme>      emulate preferred color scheme, "light" or "dark"
  --device <deviceName>        emulate device, for example "iPhone 11"
  --geolocation <coordinates>  specify geolocation coordinates, for example "37.819722,-122.478611"
  --lang <language>            specify language / locale, for example "en-GB"
  --load-storage <filename>    load context storage state from the file, previously saved with --save-storage
  --proxy-server <proxy>       specify proxy server, for example "http://myproxy:3128" or "socks5://myproxy:8080"
  --save-storage <filename>    save context storage state at the end, for later use with --load-storage
  --timezone <time zone>       time zone to emulate, for example "Europe/Rome"
  --timeout <timeout>          timeout for Playwright actions in milliseconds (default: "10000")
  --user-agent <ua string>     specify user agent string
  --viewport-size <size>       specify browser viewport size in pixels, for example "1280, 720"
  -h, --help                   display help for command

Commands:
  open [url]                             open page in browser specified via -b, --browser
  cr [url]                               open page in Chromium
  ff [url]                               open page in Firefox
  wk [url]                               open page in WebKit
  codegen [options] [url]                open page and generate code for user actions
  screenshot [options] <url> <filename>  capture a page screenshot
  pdf [options] <url> <filename>         save page as pdf
  install                                Ensure browsers necessary for this version of Playwright are installed
  help [command]                         display help for command
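The options above can be combined on one command line. As a minimal sketch, the hypothetical helper `build_cli_args` below assembles such an argument list for `subprocess.run`, placing the global options before the subcommand, as in the help example `-b webkit codegen https://example.com`. The function name and its parameters are my own invention; only the flags come from the help text, and newer Playwright releases may arrange them differently.

```python
import sys


def build_cli_args(command, url, browser=None, device=None, lang=None, timeout_ms=None):
    """Build an argument list for `python -m playwright` (hypothetical helper).

    Only flags listed in the help output above are used.
    """
    args = [sys.executable, "-m", "playwright"]
    if browser:
        args += ["--browser", browser]
    if device:
        args += ["--device", device]
    if lang:
        args += ["--lang", lang]
    if timeout_ms is not None:
        args += ["--timeout", str(timeout_ms)]
    # The subcommand (open, cr, ff, wk, ...) and its URL come last.
    args += [command, url]
    return args
```

For example, `subprocess.run(build_cli_args("open", "https://www.baidu.com/", browser="wk", device="iPhone 11"), check=True)` should open the page in WebKit with iPhone 11 emulation, assuming the CLI version shown above.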
Download the Chromium, Firefox, and WebKit (the engine behind Safari) browsers
python -m playwright install
Note:
If the command fails right after downloading Chromium with an error like the following:
Removing unused browser at C:\Users\Administrator\AppData\Local\ms-playwright\chromium-833159
(node:4392) UnhandledPromiseRejectionWarning: Error: EPERM: operation not permitted, open 'C:\Users\Administrator\AppData\Local\ms-playwright\chromium-833159\chrome-win\chrome_elf.dll'
(node:4392) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:4392) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Solution:
Install the Python libraries listed in local-requirements.txt from https://github.com/microsoft/playwright-python:
autobahn==20.7.1
pytest==6.1.0
pytest-asyncio==0.14.0
pytest-cov==2.10.1
pytest-sugar==0.9.4
pytest-xdist==2.1.0
pytest-timeout==1.4.2
flaky==3.7.0
pixelmatch==0.2.1
Pillow==8.0.0
mypy==0.782
setuptools==50.3.0
# TODO: use PyPi version after >20.3.0 is released
git+https://github.com/twisted/twisted.git@4ff22287cab3b54f51cee41ea2619e72d1bff2e4
wheel==0.35.1
black==20.8b1
pre-commit==2.7.1
flake8==3.8.3
twine==3.2.0
pyOpenSSL==19.1.0
service_identity==18.1.0
pdoc3==0.9.1
Then run the install command again:
$ python -m playwright install
Removing unused browser at C:\Users\Administrator\AppData\Local\ms-playwright\chromium-833159
chromium v833159 downloaded to C:\Users\Administrator\AppData\Local\ms-playwright\chromium-833159
firefox v1221 downloaded to C:\Users\Administrator\AppData\Local\ms-playwright\firefox-1221
webkit v1402 downloaded to C:\Users\Administrator\AppData\Local\ms-playwright\webkit-1402
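To confirm that the download succeeded, the cache folder from the log can be inspected. The sketch below is a hypothetical helper (`installed_browsers` is my own name) that assumes the `<name>-<revision>` folder layout seen in the output above; it is not part of Playwright itself.

```python
import re
from pathlib import Path


def installed_browsers(cache_dir):
    """Map browser name -> list of revisions found under cache_dir.

    Assumes the `<name>-<revision>` folder layout seen in the log above.
    """
    found = {}
    for entry in sorted(Path(cache_dir).iterdir()):
        match = re.fullmatch(r"([a-z]+)-(\d+)", entry.name)
        # Only directories count; stray files with a similar name are skipped.
        if entry.is_dir() and match:
            found.setdefault(match.group(1), []).append(match.group(2))
    return found
```

On the Windows machine from the log this could be called as `installed_browsers(Path.home() / "AppData" / "Local" / "ms-playwright")`; the cache location on other systems may differ.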
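1. Synchronous
from playwright import sync_playwright

# Note: this import path and the camelCase method names match the Playwright
# release used in this post. Playwright for Python 1.x instead uses
# `from playwright.sync_api import sync_playwright` and snake_case (`new_page`).
with sync_playwright() as p:
    browser_type = p.chromium
    # headless=False opens a visible browser window
    browser = browser_type.launch(headless=False)
    page = browser.newPage()
    page.goto('https://www.baidu.com/')
    # Saves the screenshot as chromium.png
    page.screenshot(path=f'{browser_type.name}.png')
    browser.close()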
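The same pattern extends to all three downloaded browsers. The following is a minimal sketch, assuming Playwright for Python 1.x (import from `playwright.sync_api`, snake_case methods such as `new_page`). The helper `capture_all` and the `main` wrapper are hypothetical names of my own; the Playwright object is passed in as an argument so the helper stays easy to test.

```python
def capture_all(playwright, url, browser_names=("chromium", "firefox", "webkit")):
    """Screenshot `url` once per browser; return the list of file names written."""
    saved = []
    for name in browser_names:
        browser_type = getattr(playwright, name)
        browser = browser_type.launch()  # headless by default
        try:
            page = browser.new_page()
            page.goto(url)
            path = f"{browser_type.name}.png"
            page.screenshot(path=path)
            saved.append(path)
        finally:
            browser.close()  # close even if navigation fails
    return saved


def main():
    # Imported lazily so the helper above can be exercised without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        print(capture_all(p, "https://www.baidu.com/"))
```

Calling `main()` launches real browsers, so it needs the browsers installed by `python -m playwright install`.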
2. Asynchronous
import asyncio
from playwright import async_playwright

async def main():
    async with async_playwright() as p:
        browser_type = p.chromium
        browser = await browser_type.launch(headless=False)
        page = await browser.newPage()
        await page.goto('https://www.baidu.com/')
        # Saves the screenshot as chromium1.png
        await page.screenshot(path=f'{browser_type.name}1.png')
        await browser.close()

# asyncio.run creates the event loop, runs main(), and closes the loop
asyncio.run(main())
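The async API pays off when several browsers run at the same time. Below is a minimal sketch, again assuming Playwright for Python 1.x (`playwright.async_api`, snake_case methods); `capture` and `capture_concurrently` are hypothetical helper names, and `asyncio.gather` is used to start all three browsers together.

```python
import asyncio


async def capture(browser_type, url):
    """Open `url` in one browser and save `<browser name>.png`."""
    browser = await browser_type.launch()
    try:
        page = await browser.new_page()
        await page.goto(url)
        path = f"{browser_type.name}.png"
        await page.screenshot(path=path)
        return path
    finally:
        await browser.close()


async def capture_concurrently(playwright, url):
    """Run the three browsers at the same time instead of one after another."""
    browser_types = (playwright.chromium, playwright.firefox, playwright.webkit)
    return await asyncio.gather(*(capture(bt, url) for bt in browser_types))


async def main():
    from playwright.async_api import async_playwright  # lazy import

    async with async_playwright() as p:
        print(await capture_concurrently(p, "https://www.baidu.com/"))
```

Running `asyncio.run(main())` starts the real browsers; the results come back in the order chromium, firefox, webkit because `asyncio.gather` preserves argument order.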
3. Mobile emulation
from playwright import sync_playwright

with sync_playwright() as p:
    phone_name = 'iPhone 11 Pro'
    # p.devices maps a device name to its emulation settings (viewport, user agent, ...)
    iphone_11 = p.devices[phone_name]
    browser = p.webkit.launch(headless=False)
    # Unpack the device settings and override the locale
    context = browser.newContext(
        **iphone_11,
        locale='zh-CN'
    )
    page = context.newPage()
    page.goto('https://map.baidu.com/mobile/webapp/index/index/')
    page.screenshot(path=f'{phone_name}.png')
    browser.close()
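A mistyped device name in the example above surfaces only as a bare `KeyError`. As a small sketch, the hypothetical helper `emulation_options` below merges a device descriptor with overrides such as `locale` and suggests close matches when the name is unknown. It works on any mapping of device name to descriptor dict, so it does not depend on a particular Playwright version.

```python
import difflib


def emulation_options(devices, name, **overrides):
    """Return keyword arguments for a new browser context emulating `name`.

    `devices` is any mapping of device name -> descriptor dict, such as
    `p.devices`; `overrides` (for example a locale) win over the descriptor.
    """
    if name not in devices:
        hints = difflib.get_close_matches(name, list(devices), n=3)
        raise KeyError(f"unknown device {name!r}; close matches: {hints}")
    return {**devices[name], **overrides}
```

The result can be unpacked into the context constructor, for example `browser.newContext(**emulation_options(p.devices, 'iPhone 11 Pro', locale='zh-CN'))` (`new_context` in Playwright for Python 1.x).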
4. Recording a script
Use the codegen command.
Command format
$ python -m playwright codegen -h
Usage: index codegen [options] [url]

open page and generate code for user actions

Options:
  -o, --output <file name>  saves the generated script to a file
  --target <language>       language to use, one of javascript, python, python-async, csharp (default: "python")
  -h, --help                display help for command

Examples:

  $ codegen
  $ codegen --target=python
  $ -b webkit codegen https://example.com
Example:
python -m playwright codegen --target python -o 'my1.py' -b chromium https://www.baidu.com
my1.py
from playwright import sync_playwright

def run(playwright):
    browser = playwright.chromium.launch(headless=False)
    context = browser.newContext()

    # Open new page
    page = context.newPage()

    # Go to https://www.baidu.com/
    page.goto("https://www.baidu.com/")

    # Click input[name="wd"]
    page.click("input[name=\"wd\"]")

    # Fill input[name="wd"]
    # with page.expect_navigation(url="https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=hello&fenlei=256&rsv_pq=e1fa5e4c0000d5a6&rsv_t=577f7bNmG1aOHBEWyQJLIfxHSyytPL7Q1GczIkdkkSwbPsx3vLC%2FGKkrqQE&rqlang=cn&rsv_enter=1&rsv_dl=tb&rsv_sug3=6&rsv_sug1=4&rsv_sug7=100&rsv_sug2=0&rsv_btype=i&prefixsug=hello&rsp=5&inputT=1909&rsv_sug4=3707"):
    with page.expect_navigation():
        page.fill("input[name=\"wd\"]", "hello")

    # Close page
    page.close()

    # ---------------------
    context.close()
    browser.close()

with sync_playwright() as playwright:
    run(playwright)
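Recorded scripts usually benefit from a little tidying before reuse. The sketch below wraps the recorded search step in a hypothetical `search` function. It assumes Playwright for Python 1.x method names, and it assumes that pressing Enter in the search box triggers the navigation; the recording above only captured the `fill` call, so this is a guess about the page's behavior rather than something the generated script confirms.

```python
def search(page, keyword, selector='input[name="wd"]'):
    """Type `keyword` into the search box, submit it, and return the new URL."""
    page.fill(selector, keyword)
    # Wrap the action that triggers the navigation, so the wait starts first.
    with page.expect_navigation():
        page.press(selector, "Enter")
    return page.url
```

Inside `run`, the recorded click and fill lines could then be replaced with `print(search(page, "hello"))`.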