Summary:
1. Page to download: http://www.howsoftworks.net/javaapi/
2. Download Teleport Ultra.
3. Use Teleport Ultra to batch-clone the site.
4. Download Easy CHM.
5. Use Easy CHM to merge the pages into a single .chm file.

posted @ 2018-03-20 20:52 刘达人186 Views (381) Comments (0) Recommendations (0)

Web Scraping

Summary:
# coding=utf-8
import lxml, bs4, re, requests
csvContent=''
# file = open('D:\\tyc_demo.htm','rb')
# soup = bs4.BeautifulSoup(...

posted @ 2018-03-01 18:21 刘达人186 Views (100) Comments (0) Recommendations (0)

Summary:
# coding=utf-8
import lxml, bs4, re, requests
csvContent=''
file = open('D:\\tyc_demo.html','rb')
soup = bs4.BeautifulSoup(file,'html.parser'...

posted @ 2018-02-10 14:49 刘达人186 Views (129) Comments (0) Recommendations (0)
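The snippet in the post above reads a saved page with BeautifulSoup's `html.parser` backend and accumulates CSV text. Below is a minimal sketch of that idea, assuming the data sits in ordinary `<tr>`/`<td>` table rows; the helper name `html_to_csv` and the table layout are hypothetical, since the post's actual selectors are not shown in the summary. It uses the standard `csv` module instead of string concatenation so that cells containing commas are quoted correctly.

```python
# coding=utf-8
import csv
import io

import bs4


def html_to_csv(html):
    """Hypothetical helper: turn every table row in the page into one CSV row.

    `html` may be a str or bytes (e.g. the contents of a saved .html file).
    """
    soup = bs4.BeautifulSoup(html, 'html.parser')
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes cells that contain commas
    for tr in soup.find_all('tr'):
        # Visible text of each header or data cell, with whitespace trimmed.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(['td', 'th'])]
        if cells:  # skip rows that have no cells
            writer.writerow(cells)
    return buffer.getvalue()
```

To process a saved page as in the post, pass the file contents, e.g. `html_to_csv(open('D:\\tyc_demo.html', 'rb').read())`, and write the returned string to a .csv file.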

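Summary: Points to note:
1. Use Fiddler to capture the headers and cookies after logging in.
2. Pause briefly after fetching each page to avoid triggering anti-scraping measures.
3. Close Fiddler before scraping to avoid a port conflict.
Still to be solved: when many records are scraped, the site's anti-scraping mechanism is triggered. Use Fiddler to capture the post-login headers, co...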

posted @ 2018-01-26 20:04 刘达人186 Views (518) Comments (0) Recommendations (0)
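The points in the last post (reuse the post-login headers and cookies captured with Fiddler, and pause between page fetches) can be sketched with a `requests.Session`. This is a minimal sketch, not the author's code: `HEADERS`, `COOKIES`, `fetch_all` and the delay range are hypothetical placeholders, and a random pause only slows the crawl down; it does not by itself solve the open problem of being blocked when many records are fetched.

```python
# coding=utf-8
import random
import time

import requests

# Hypothetical placeholders: paste the values captured with Fiddler after logging in.
HEADERS = {'User-Agent': 'Mozilla/5.0'}
COOKIES = {}  # e.g. {'<cookie name>': '<value>'}; names depend on the site


def fetch_all(urls, session=None, min_delay=1.0, max_delay=3.0):
    """Fetch each URL with the captured headers/cookies, pausing between requests."""
    if session is None:
        session = requests.Session()
    session.headers.update(HEADERS)
    session.cookies.update(COOKIES)
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            # Random pause between consecutive requests to reduce the request rate.
            time.sleep(random.uniform(min_delay, max_delay))
        resp = session.get(url, timeout=10)
        resp.raise_for_status()  # stop on HTTP errors such as 403 instead of parsing an error page
        pages.append(resp.text)
    return pages
```

Reusing one `Session` keeps the same headers and cookie jar across all requests, which mirrors replaying a logged-in browser session; each returned page can then be parsed with BeautifulSoup as in the earlier posts.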