睚一 - Cnblogs (博客园)

January 2019

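Summary: #!/usr/bin/env python # coding: utf-8 # In[3]: from bs4 import BeautifulSoup # # Introduction to BeautifulSoup # 1. BeautifulSoup is based on the HTML DOM: it loads the whole document and builds the complete DOM tree, so its time and memory overhead is high and its performance is lower, whereas lxml only traverses part of the document # # 2. BeautifulSou...

posted @ 2019-01-23 22:49 睚一 Views (183) Comments (0) Recommendations (0)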

July 2017

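Translation program

Summary: import requests import json def translation(): # URL that the text to translate is sent to url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null" # Data to send with the translation request; currently only Chinese-English...

posted @ 2017-07-31 23:24 睚一 Views (223) Comments (0) Recommendations (0)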
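
As a rough sketch of the request described in the summary above, the following builds a POST request to the URL shown there without sending it. It uses the standard library's `urllib` rather than `requests` so that it runs without extra packages. The helper name `build_translation_request` and the form fields `i` and `doctype` are assumptions made for illustration, because the original payload is cut off in the summary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL taken from the summary above.
URL = ("http://fanyi.youdao.com/translate"
       "?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null")


def build_translation_request(text, url=URL):
    """Build (but do not send) a form-encoded POST request carrying `text`."""
    # Hypothetical form fields: the real payload is truncated in the summary.
    form = {"i": text, "doctype": "json"}
    body = urlencode(form).encode("utf-8")
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_translation_request("你好")
print(request.get_method())
print(request.data)
```

Keeping request construction separate from sending makes the payload easy to inspect before any network traffic happens.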

Scraping Tmall food shops with selenium

Summary: '''Scrape web page content with selenium''' import re import time from selenium import webdriver from selenium.common.exceptions import TimeoutException from selenium.webd

posted @ 2017-07-23 21:36 睚一 Views (381) Comments (0) Recommendations (0)

Scraping Taobao food listings with selenium

Summary: '''Scrape Taobao food page content with selenium''' import re from selenium import webdriver from selenium.common.exceptions import TimeoutException from selenium.webdriver.co

posted @ 2017-07-23 21:30 睚一 Views (198) Comments (0) Recommendations (0)

June 2017

Scraping site-wide data from Xiaozhu (小猪短租) with Python

Summary: Below is the city data: domestic_list = [ {'北京': ['beijing', '8221']}, {'上海': ['shanghai', '6996']}, {'广州': ['guangzhou', '2727']}, {'成都': ['chengdu', '5369']}, {'深

posted @ 2017-06-21 21:00 睚一 Views (3510) Comments (0) Recommendations (1)
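
The `domestic_list` structure in the summary above is a list of single-key dicts, which is awkward to loop over directly. One possible way to flatten it is sketched below. `CityRecord` and `flatten_cities` are hypothetical names, and because the summary does not say what the second list element (for example `'8221'`) represents, the sketch carries it along as an opaque string.

```python
from typing import NamedTuple


class CityRecord(NamedTuple):
    name: str   # display name, e.g. '北京'
    slug: str   # first list element, e.g. 'beijing'
    value: str  # second list element; its meaning is not stated in the summary


# The four complete entries visible in the summary (the rest is truncated).
domestic_list = [
    {'北京': ['beijing', '8221']},
    {'上海': ['shanghai', '6996']},
    {'广州': ['guangzhou', '2727']},
    {'成都': ['chengdu', '5369']},
]


def flatten_cities(cities):
    """Turn a list of single-key dicts into a flat list of CityRecord tuples."""
    records = []
    for entry in cities:
        for name, (slug, value) in entry.items():
            records.append(CityRecord(name, slug, value))
    return records


for record in flatten_cities(domestic_list):
    print(record.name, record.slug, record.value)
```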

Scraping job listings from Lagou (拉勾网) with Python through a proxy IP

Summary: import requests import json import time position = input('Enter the position to search for: ') url = 'https://www.lagou.com/jobs/positionAjax.json?city=%E6%9D%AD%E5%B7%9E&needAddtionalResult=false' headers = { 'User-A...

posted @ 2017-06-07 23:25 睚一 Views (1008) Comments (0) Recommendations (0)
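
The proxy setup named in the title above is not visible in the truncated summary, so the following is only a sketch of one common approach: pick an address from a pool and wrap it in the scheme-to-URL mapping that `requests` accepts through its `proxies` argument. The helper `pick_proxy` and the pool contents are hypothetical, and the addresses are placeholders from a documentation-only range, not working proxies.

```python
import random


def pick_proxy(pool, rng=random):
    """Pick one 'host:port' entry and return it as a requests-style proxies dict."""
    if not pool:
        raise ValueError("proxy pool is empty")
    address = rng.choice(pool)
    # requests accepts a mapping from URL scheme to proxy URL via `proxies=`.
    return {"http": f"http://{address}", "https": f"http://{address}"}


# Placeholder addresses from the TEST-NET-1 documentation range, not real proxies.
PROXY_POOL = ["192.0.2.10:8080", "192.0.2.11:3128"]

proxies = pick_proxy(PROXY_POOL)
print(proxies)
# The dict would then be passed along with a request, for example:
#   requests.post(url, headers=headers, proxies=proxies)
```

Choosing a fresh proxy per request spreads traffic across the pool, at the cost of having to handle individual proxies that fail.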

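Scraping nationwide historical weather data with Python

Summary: 1. Scrape the history home page to get each city's address and the historical dates, then build the links; ''' Get the names and links of cities nationwide ''' import requests from lxml import etree import random import pymongo from time_list import get_time

posted @ 2017-06-05 22:44 睚一 Views (1089) Comments (0) Recommendations (0)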
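
The link-building step mentioned in the summary above (combining city addresses with historical dates) might look roughly like the sketch below. Everything specific here is an assumption: the summary shows neither the site's URL scheme nor what the author's `get_time` helper returns, so `month_range`, `build_links`, the `example.com` template, and the `YYYYMM` format are illustrative stand-ins.

```python
from itertools import product


def month_range(start_year, start_month, end_year, end_month):
    """Return 'YYYYMM' strings from the start month to the end month, inclusive."""
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append(f"{year}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def build_links(city_slugs, months, template):
    """Combine every city with every month into one URL per pair."""
    return [template.format(city=city, month=month)
            for city, month in product(city_slugs, months)]


# Placeholder template and slugs; the real URL scheme is not shown in the summary.
TEMPLATE = "https://example.com/history/{city}/{month}.html"
links = build_links(["beijing", "shanghai"], month_range(2016, 11, 2017, 2), TEMPLATE)
for link in links:
    print(link)
```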

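Getting all course download links and passwords from a personal website with Python and saving them to MongoDB

Summary: 1. Get the category URLs of the site's courses; ''' Scrape the 屌丝 home page to get each category's name and link ''' import requests from lxml import etree headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWe

posted @ 2017-06-03 23:53 睚一 Views (274) Comments (0) Recommendations (0)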

May 2017

Scraping Dangdang (当当网) data with the Python crawler framework Scrapy

Summary: Two places in settings.py need to be changed:

posted @ 2017-05-20 21:38 睚一 Views (971) Comments (0) Recommendations (0)

Simulating a Douban (豆瓣网) login with Python and scraping group information

Summary: import requests from bs4 import BeautifulSoup from PIL import Image headers = { 'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Saf...

posted @ 2017-05-05 23:55 睚一 Views (2620) Comments (0) Recommendations (0)
