Web Scraping Tools - Splash

What is it?

Splash is a JavaScript rendering service. It’s a lightweight web browser with an HTTP API.
http://splash.readthedocs.io/en/stable/

Use cases

For web scraping, Splash can fetch pages whose content is rendered by JavaScript (Selenium can also solve this problem).
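
As a minimal sketch of the simplest case, Splash's render.html endpoint returns the HTML of a page after its JavaScript has run. The snippet only builds the request URL. It assumes a Splash instance listening on localhost:8050; the helper name `render_html_url` and the example.com target are hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode

SPLASH = "http://localhost:8050"  # assumption: a local Splash instance


def render_html_url(page_url, wait=0.5, splash=SPLASH):
    """Build a render.html URL asking Splash to load page_url,
    wait `wait` seconds for scripts to run, and return the HTML."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash}/render.html?{query}"


# example.com is a placeholder target, not a page from the original post
print(render_html_url("http://example.com"))
```

The resulting URL can be fetched like any other page, for example with requests.get, and the response body is the rendered HTML.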

Usage

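  1. Start the Splash service with Docker (this can be distributed by running Splash in Docker on multiple machines).
  2. In Python, build a Lua script as a string and send it to the Splash API: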
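import requests
from urllib.parse import quote

# Lua script executed by Splash; the value returned by main() becomes the response body
lua = '''
function main(splash)
    return 'hello'
end
'''

# URL-encode the script and pass it as the lua_source query parameter
url = 'http://localhost:8050/execute?lua_source=' + quote(lua)
response = requests.get(url)
print(response.text)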
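
The script above only returns a constant. A slightly fuller sketch, under the same assumption of a Splash instance on localhost:8050, has the Lua script load a page, wait for its JavaScript, and return the HTML. It sends the script as a JSON POST body, whose fields Splash exposes to Lua as splash.args, and uses the standard library's urllib rather than requests so it has no extra dependencies. The names `LUA_RENDER`, `build_execute_request` and `fetch_rendered_html` are hypothetical helpers, not part of Splash.

```python
import json
from urllib.request import Request, urlopen

SPLASH = "http://localhost:8050"  # assumption: a local Splash instance

# Lua script run by Splash: load the page, wait for its scripts, return the HTML.
LUA_RENDER = """
function main(splash)
    assert(splash:go(splash.args.url))
    assert(splash:wait(splash.args.wait))
    return splash:html()
end
"""


def build_execute_request(page_url, wait=0.5, splash=SPLASH):
    """Build a POST request for /execute; the JSON fields become splash.args in Lua."""
    payload = {"lua_source": LUA_RENDER, "url": page_url, "wait": wait}
    return Request(
        f"{splash}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def fetch_rendered_html(page_url, wait=0.5, splash=SPLASH):
    """Send the request to Splash and return the rendered HTML as text."""
    with urlopen(build_execute_request(page_url, wait, splash), timeout=60) as resp:
        return resp.read().decode("utf-8")
```

With Splash running, `fetch_rendered_html("http://example.com")` would return the page's HTML after rendering. Passing a different `splash` address is one way to spread requests over several Splash machines.
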
posted @ 2018-08-14 19:53  Rocin