菜鸟key

August 23, 2018

Summary: import requests from bs4 import BeautifulSoup import bs4 def getHTMLText(url): try: r = requests.get(url, timeout=30) r.raise_for_status() r.encoding = r.appar... Read more

posted @ 2018-08-23 16:39 菜鸟key Views (327) Comments (0) Recommendations (0)

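Week 2, Part 2 (Information Markup and Extraction)

Summary: Marked-up information forms an organizational structure and adds a dimension to the information. The markup structure is as valuable as the information itself. Marked-up information can be used for communication, storage, or display, and is easier for programs to understand and use. The three forms of information markup: XML, JSON, YAML. HTML content search methods based on the bs4 library. Read more

posted @ 2018-08-23 15:32 菜鸟key Views (282) Comments (0) Recommendations (0)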
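
As a rough illustration of the three markup forms named in the summary above, the sketch below encodes one made-up record as JSON and as XML using only the Python standard library. The YAML form is shown as plain text, since the standard library has no YAML parser. The record and its field names are hypothetical and do not come from the original post.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
record = {"name": "Example", "tags": ["crawler", "markup"]}

# JSON: typed key-value pairs.
json_text = json.dumps(record)

# XML: information wrapped in paired tags.
root = ET.Element("record")
ET.SubElement(root, "name").text = record["name"]
for tag in record["tags"]:
    ET.SubElement(root, "tag").text = tag
xml_text = ET.tostring(root, encoding="unicode")

# YAML (text only, not parsed here): untyped key-value pairs,
# with indentation expressing nesting.
yaml_text = "name: Example\ntags:\n  - crawler\n  - markup\n"

print(json_text)
print(xml_text)
print(yaml_text)
```

All three texts carry the same information; only the markup around it differs.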

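August 22, 2018

Week 2, Part 1 (Beautiful Soup library)

Summary: 1 Installation: pip3 install beautifulsoup4. Quiz: 3 Basic elements of Beautiful Soup. Importing the Beautiful Soup library: the library is also called beautifulsoup4 or bs4. The conventional import is as follows, which mainly uses the BeautifulSoup class: from bs Read more

posted @ 2018-08-22 11:19 菜鸟key Views (419) Comments (0) Recommendations (0)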
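
The summary above is cut off before any code, so the following is only a minimal sketch of working with the BeautifulSoup class, not the post's own example. It assumes the bs4 package is installed; the HTML snippet is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = (
    "<html><head><title>Demo page</title></head>"
    '<body><p class="intro">Hello '
    '<a href="http://example.com/a">link</a></p></body></html>'
)

# Parse with the parser that ships with the Python standard library.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string         # text inside <title>
first_link = soup.a                    # first <a> tag in the document
link_href = first_link.attrs["href"]   # tag attributes behave like a dict
parent_name = first_link.parent.name   # name of the enclosing tag

print(title_text, link_href, parent_name)
```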

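August 21, 2018

Week 1, Part 2 (requests library in practice)

Summary: 1 Scraping a JD.com product page. 2 Scraping an Amazon product page. 3 Submitting search keywords to Baidu/360: the keyword interfaces are http://www.baidu.com/s?wd=keyword (Baidu) and http://www.so.com/s?q=keyword (360). 4 Scraping and saving images from the web, with the full image-scraping code. 5 IP address location Read more

posted @ 2018-08-21 18:38 菜鸟key Views (254) Comments (0) Recommendations (0)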
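
The two keyword interfaces quoted in the summary above differ only in the query parameter name (`wd` for Baidu, `q` for 360). The sketch below builds such URLs with the standard library and sends no request; the helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoints quoted in the summary above.
BAIDU_SEARCH = "http://www.baidu.com/s"
SO_SEARCH = "http://www.so.com/s"


def build_search_url(base, param, keyword):
    """Return base?param=keyword with the keyword URL-encoded."""
    return base + "?" + urlencode({param: keyword})


baidu_url = build_search_url(BAIDU_SEARCH, "wd", "Python")
so_url = build_search_url(SO_SEARCH, "q", "Python")

print(baidu_url)
print(so_url)
```

With the requests library, the same query string can be produced by passing a dict through the `params` argument, for example `requests.get(BAIDU_SEARCH, params={"wd": "Python"})`.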

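Week 1, Part 1 (requests library)

Summary: 2. The requests.get() method: r = requests.get(url). The two key objects of the Requests library. Purpose: construct a Request object that asks the server for a resource, and return a Response object containing the server's resource. Attributes of the Response object. r.content: when fetching an image, the image is stored as binary, r.co Read more

posted @ 2018-08-21 14:39 菜鸟key Views (506) Comments (0) Recommendations (0)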

August 2, 2018

Notes

Summary: Read more

posted @ 2018-08-02 09:55 菜鸟key Views (97) Comments (0) Recommendations (0)

July 31, 2018

pd.concat()

Summary: Read more

posted @ 2018-07-31 09:32 菜鸟key Views (508) Comments (0) Recommendations (0)

pd.get_dummies() one-hot encoding

Summary: dummies_Cabin = pd.get_dummies(data_train['Cabin'], prefix= 'Cabin') dummies_Embarked = pd.get_dummies(data_train['Embarked'], prefix= 'Embarked') dummies_Sex = pd.get_dummies(data_train['Sex'], pr... Read more

posted @ 2018-07-31 09:24 菜鸟key Views (4910) Comments (0) Recommendations (0)
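
The summary above is truncated, so the following is only a small self-contained sketch of the same `pd.get_dummies` pattern. The tiny `data_train` frame is a hypothetical stand-in for the post's data, and joining the result back with `pd.concat` is an assumption about how the indicator columns are used.

```python
import pandas as pd

# Hypothetical stand-in for the data_train frame in the summary.
data_train = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
})

# One indicator column per category, named <prefix>_<category>.
dummies_sex = pd.get_dummies(data_train["Sex"], prefix="Sex")
dummies_embarked = pd.get_dummies(data_train["Embarked"], prefix="Embarked")

# Attach the indicator columns and drop the original categorical ones.
encoded = pd.concat([data_train, dummies_sex, dummies_embarked], axis=1)
encoded = encoded.drop(columns=["Sex", "Embarked"])

print(encoded)
```

Each row ends up with exactly one active indicator per original column, which is what makes the encoding "one-hot".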

pandas as_matrix()

Summary: df = pd.DataFrame(np.arange(12).reshape(3, 4)) df Out[10]: 0 1 2 3 0 0 1 2 3 1 4 5 6 7 2 8 9 10 11 df.as_matrix() Out[11]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7]... Read more

posted @ 2018-07-31 09:06 菜鸟key Views (5714) Comments (0) Recommendations (1)
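
Note that `DataFrame.as_matrix()` was deprecated in pandas 0.23 and removed in pandas 1.0, so the call in the summary above fails on current versions. A minimal sketch of the same conversion with `DataFrame.to_numpy()` (available since pandas 0.24), reusing the frame from the summary:

```python
import numpy as np
import pandas as pd

# Same 3x4 frame as in the summary above.
df = pd.DataFrame(np.arange(12).reshape(3, 4))

# to_numpy() returns the frame's values as a NumPy array.
matrix = df.to_numpy()

print(matrix)
```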

July 30, 2018

pd.notnull pd.isnull transpose

Summary: Read more

posted @ 2018-07-30 21:30 菜鸟key Views (244) Comments (0) Recommendations (0)
