Crawlee: a web crawling and browser automation library

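Crawlee is a web crawling and browser automation library.

Features
JavaScript and TypeScript support
HTTP crawling, with integrated Cheerio and JSDOM parsers
Headless browser support
Automatic proxy handling while crawling
Queues and storage: files, snapshots, and JSON results can be saved
Many built-in utility classes that make data extraction easier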
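
As a rough illustration of how several of these features fit together (HTTP crawling with the Cheerio parser, the request queue, and JSON result storage), here is a minimal sketch in TypeScript. It assumes the `crawlee` npm package is installed. The `PageRecord` type, the `buildRecord` helper, the start URL, and the request cap are illustrative choices made for this example, not something taken from the original post.

```typescript
// Shape of each stored result (hypothetical, chosen for this example).
interface PageRecord {
    url: string;
    title: string;
}

// Hypothetical helper: collapse runs of whitespace in a page title.
function buildRecord(url: string, rawTitle: string): PageRecord {
    return { url, title: rawTitle.replace(/\s+/g, ' ').trim() };
}

async function main(): Promise<void> {
    // Loaded lazily so buildRecord stays usable without the dependency.
    const { CheerioCrawler, Dataset } = await import('crawlee');

    const crawler = new CheerioCrawler({
        // Illustrative safety cap so the sketch stops after a few pages.
        maxRequestsPerCrawl: 10,
        async requestHandler({ request, $, enqueueLinks, log }) {
            // `$` is the Cheerio handle for the downloaded HTML.
            const record = buildRecord(
                request.loadedUrl ?? request.url,
                $('title').text(),
            );
            log.info(`Crawled ${record.url}: ${record.title}`);

            // Append the record to the default dataset as JSON.
            await Dataset.pushData(record);

            // Add links found on the page to the request queue.
            await enqueueLinks();
        },
    });

    await crawler.run(['https://crawlee.dev']);
}
```

To start the crawl, call `main()` from an entry script, for example `main().catch(console.error);`. With default settings the results should end up as JSON files under `./storage/datasets/default`.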

Notes
Crawlee also provides a Python package, which makes it easy to integrate with the Python ecosystem.

References
https://github.com/apify/crawlee
https://crawlee.dev/python/
https://crawlee.dev/docs/introduction/first-crawler
https://github.com/apify/crawlee-python
https://crawlee.dev/

Posted on 2024-12-02 08:00 by 荣锋亮
