Archive: December 2015

Using XPath in lxml to efficiently extract text and tag attribute values

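Summary: The following code was tested and ran without errors in Python 3.5 + Jupyter Notebook! # The point of scraping a web page is simply to locate a node in the DOM tree first, then take its text or attribute values. myPage = ''' TITLE 我的博客我的文章 ...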

posted @ 2015-12-27 07:16 罗兵 Views(39330) Comments(0) Recommended(7)
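
The summary above is cut off before its code begins, so the following is only a minimal sketch of the idea it states: locate a node, then read its text or an attribute value. The sample markup, the function names, and the choice to fall back to the standard library's xml.etree.ElementTree when lxml is not installed are all assumptions made for illustration, not the post's own code.

```python
# Sketch only: SAMPLE_PAGE and the helper names are invented for illustration.
try:
    from lxml import etree  # used when lxml is installed
except ImportError:
    # Standard-library fallback; it supports a smaller XPath subset,
    # which is enough for the path expressions used below.
    import xml.etree.ElementTree as etree

SAMPLE_PAGE = """
<html>
  <head><title>TITLE</title></head>
  <body>
    <h1>My blog</h1>
    <ul id="posts">
      <li><a href="/post/1" class="entry">First post</a></li>
      <li><a href="/post/2" class="entry">Second post</a></li>
    </ul>
  </body>
</html>
"""


def extract_title(markup):
    """Return the text of the <title> node."""
    root = etree.fromstring(markup.strip())
    return root.findtext("./head/title")


def extract_links(markup):
    """Return (text, href) pairs for every <a class="entry"> node."""
    root = etree.fromstring(markup.strip())
    anchors = root.findall(".//a[@class='entry']")
    return [(a.text, a.get("href")) for a in anchors]


title = extract_title(SAMPLE_PAGE)
links = extract_links(SAMPLE_PAGE)
print(title)
print(links)
```

When lxml is available, the same values can be selected directly with full XPath, for example root.xpath("//a/@href") for attribute values and root.xpath("//a/text()") for text nodes. Real pages are rarely well-formed XML, so lxml's HTML parser (etree.HTML or lxml.html.fromstring) is usually the better entry point there.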

How to scrape a website that requires login with Python

Summary: [Original article:] http://python.jobbole.com/83588/ import requests from lxml import html # Create a session object. This object persists the login session across all requests. session_requests = requests.session() # Extract ...

posted @ 2015-12-22 18:08 罗兵 Views(11154) Comments(0) Recommended(1)
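
The summary above breaks off at the "Extract ..." comment, so what the post actually extracts is not visible here. As a hedged illustration of one common step in a login flow, the sketch below assumes the login form carries hidden fields (such as a token) that must be sent back with the credentials. The form markup, the field names, and the helper names are hypothetical, and the standard library's html.parser stands in for lxml so the example runs offline.

```python
from html.parser import HTMLParser

# Hypothetical login form; field names and the token value are invented.
LOGIN_PAGE = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden">."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("type") == "hidden" and "name" in attr:
            self.hidden[attr["name"]] = attr.get("value") or ""


def build_login_payload(page_html, username, password):
    """Merge the form's hidden fields with the user's credentials."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.hidden)
    payload["username"] = username
    payload["password"] = password
    return payload


payload = build_login_payload(LOGIN_PAGE, "alice", "s3cret")
print(payload)
```

With requests, a payload like this would typically be sent through the same session object that fetched the form, e.g. session_requests.post(login_url, data=payload), where login_url is a placeholder. Reusing the session is what keeps the login cookies available for later requests.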

Python threads and thread pools

Summary: 1. Multithreading import threading from time import ctime,sleep def music(func): for i in range(2): print("I was listening to %s. %s" %(func,ctime())) ...

posted @ 2015-12-17 03:27 罗兵 Views(627) Comments(1) Recommended(0)
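
Only the opening of the post's threading code survives in the summary above, and its thread pool section is not shown at all. The following is a minimal sketch of both ideas under stated assumptions: the first half starts plain threading.Thread workers in the spirit of the music function, and the second half uses concurrent.futures.ThreadPoolExecutor from the standard library, which may not be the pool implementation the post uses. All function names and sample inputs are invented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from time import ctime, sleep


def listen(track, results, lock):
    """Worker: record two timestamped messages for one track."""
    for _ in range(2):
        sleep(0.01)  # simulate some blocking work
        with lock:  # protect the shared list
            results.append("I was listening to %s. %s" % (track, ctime()))


def run_threads(tracks):
    """Start one thread per track and wait for all of them to finish."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=listen, args=(track, results, lock))
        for track in tracks
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def square(n):
    """Toy task for the pool; sleeps briefly to mimic I/O."""
    sleep(0.01)
    return n * n


def run_pool(numbers, workers=4):
    """Run square() over numbers on a fixed-size pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, numbers))


thread_output = run_threads(["song A", "song B"])
pool_output = run_pool(range(5))
print(len(thread_output))
print(pool_output)
```

A pool caps the number of live threads and reuses them across tasks, which matters once the task count grows beyond a handful. pool.map returns results in input order even when tasks finish out of order.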

Comparing four ways to get all links on the current page (Python crawler)

Summary: '''Get all links on the current page''' import requests import re from bs4 import BeautifulSoup from lxml import etree from selenium import webdriver url = 'http://www.ok226.com' r...

posted @ 2015-12-14 03:15 罗兵 Views(12236) Comments(4) Recommended(3)
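
The imports in the summary above suggest the four approaches are a regular expression, BeautifulSoup, lxml, and Selenium, but the comparison itself is truncated. The sketch below contrasts just two styles on a static sample page so that it runs offline: a regex scan and a parser-based walk. The standard library's html.parser is used here as a stand-in for the third-party parsers, and the sample HTML and function names are assumptions.

```python
import re
from html.parser import HTMLParser

# Invented sample page: two real links and one anchor without href.
SAMPLE_HTML = """
<div>
  <a href="http://example.com/a">A</a>
  <a class="nav" href='/b'>B</a>
  <a name="no-link">C</a>
</div>
"""

# Matches href="..." or href='...' inside an opening <a ...> tag.
HREF_RE = re.compile(r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def links_with_regex(page):
    """Approach 1: scan the raw text with a regular expression."""
    return HREF_RE.findall(page)


class LinkCollector(HTMLParser):
    """Approach 2: let a parser tokenize the page and inspect each <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_with_parser(page):
    collector = LinkCollector()
    collector.feed(page)
    return collector.links


regex_links = links_with_regex(SAMPLE_HTML)
parser_links = links_with_parser(SAMPLE_HTML)
print(regex_links)
print(parser_links)
```

The regex is short but brittle against unquoted attributes or unusual markup, while a parser handles those cases at the cost of a little more code. Neither approach sees links that JavaScript adds after the page loads, which is the case where a browser driver such as Selenium is typically needed.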
