Python Data Analysis - Post Category - 天天见和

Customer Value Analysis: K-Means Clustering and Conclusions

Summary:
    # k-means cluster analysis: standardize the data
    zcdata=(cdata-cdata.mean())/cdata.std()
    zcdata.head()
    from sklearn.cluster import KMeans
    kmodel=KMeans(n_clusters=4,n_jobs=4,max_iter=1…
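The standardization line in the excerpt is a column-wise z-score. A minimal sketch of that step follows, assuming a small made-up table: the column names `R`, `F`, `M` and all values are hypothetical and do not come from the post. The clustering call itself is left out because the excerpt is cut off before its full arguments are visible.

```python
import pandas as pd

# Hypothetical customer table; column names and values are invented for illustration.
cdata = pd.DataFrame({
    "R": [10, 20, 30, 40],
    "F": [1, 2, 3, 6],
    "M": [100.0, 250.0, 400.0, 850.0],
})

# Column-wise z-score: subtract each column's mean and divide by its
# sample standard deviation (pandas uses ddof=1 by default).
zcdata = (cdata - cdata.mean()) / cdata.std()

print(zcdata.round(3))
```

After this step every column has mean 0 and sample standard deviation 1, so no single feature dominates the distance computation used by k-means.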

posted @ 2019-10-04 21:35 天天见和 Views(692) Comments(0) Recommended(0)

Constructing User Investment Duration and the F and M Metrics

Summary:
    import pandas as pd
    from datetime import datetime
    from math import ceil  # round up
    from pandas import DataFrame
    data=pd.read_excel('./data/data5.xlsx',encoding…

posted @ 2019-10-04 21:34 天天见和 Views(229) Comments(0) Recommended(0)

Handling Outliers

Summary:
    # Handling outliers
    import pandas as pd
    import numpy as np
    SegData=pd.read_csv('./data/data1.csv',encoding='gbk')
    SegData.head()
    SegData.describe().T
    SegData.loc[:,'供应商…

posted @ 2019-10-04 06:34 天天见和 Views(236) Comments(0) Recommended(0)

Handling Missing Values

Summary:
    # Handling missing values
    from pandas import Series
    import numpy as np
    stringSer=Series(['a','b',np.nan,'d','e'])
    # isnull flags missing values
    stringSer.isnull()
    # notnull flags non-missing values
    stringSer.not…
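The excerpt builds a Series with one `np.nan` and inspects it with `isnull` and `notnull`. A minimal sketch of what those calls return is below. The `dropna` and `fillna` lines are an assumed continuation and are not visible in the truncated summary; the variable names are likewise illustrative.

```python
import numpy as np
from pandas import Series

string_ser = Series(['a', 'b', np.nan, 'd', 'e'])

# Boolean masks: True where the value is missing / present.
missing_mask = string_ser.isnull()
present_mask = string_ser.notnull()

# Summing a boolean mask counts the True entries.
n_missing = int(missing_mask.sum())

# Assumed follow-up steps (not shown in the excerpt):
dropped = string_ser.dropna()          # drop the missing entries
filled = string_ser.fillna('missing')  # or replace them with a placeholder

print(n_missing, dropped.tolist(), filled.tolist())
```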

posted @ 2019-10-03 22:22 天天见和 Views(178) Comments(0) Recommended(0)

Merging Datasets

Summary:
    import pandas as pd
    df1=pd.DataFrame({'key':['a','b','c'],'data1':range(3)})
    df2=pd.DataFrame({'key':['a','b','d'],'data2':range(3)})
    pd.merge(df1,df2) # …
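With no extra arguments, `pd.merge` performs an inner join on the columns the two frames share, which here is `key`. The sketch below reuses the two frames from the excerpt and contrasts that default with an outer join. The outer join is an added illustration, not something the truncated summary shows.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': range(3)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# Default: inner join on the shared column 'key'.
# Only keys present in both frames ('a' and 'b') survive.
inner = pd.merge(df1, df2)

# Outer join keeps every key from either frame and fills the gaps with NaN.
outer = pd.merge(df1, df2, on='key', how='outer')

print(inner)
print(outer)
```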

posted @ 2019-10-03 20:55 天天见和 Views(491) Comments(0) Recommended(0)

Pivot Tables

Summary:
    import pandas as pd
    pivot_data=pd.read_excel('./data/data3.xlsx')
    pivot_data.head()
    pivot_data.describe(include='all').T
    import numpy as np
    # Pivot table for the purchase price total, average…

posted @ 2019-10-03 20:32 天天见和 Views(156) Comments(0) Recommended(0)

Scatter Plots for Correlation Analysis

Summary:
    import pandas as pd
    df2=pd.read_excel('./data/data2.xlsx',index_col='产品编码')
    df2.head()
    x=df2['供应商进货价']
    y=df2['销售价']
    from pylab import mpl
    mpl.rcParams['font…

posted @ 2019-10-03 20:03 天天见和 Views(3421) Comments(0) Recommended(0)

Common Descriptive Statistics

Summary:
    import pandas as pd
    df1=pd.read_csv("./data/data1.csv",encoding='gbk',index_col='产品编码')
    print(df1.head())
    # Get the data
    print(len(df1))
    print(df1.index.size)
    # Compute the mean
    m…

posted @ 2019-10-03 19:37 天天见和 Views(277) Comments(0) Recommended(0)

Directory Structure of a Scrapy Project

Summary:
    # Run this spider
    scrapy crawl spider51job

posted @ 2019-10-01 22:26 天天见和 Views(554) Comments(0) Recommended(0)

Simulating Login to an Online Education Platform

Summary:
    import requests
    from lxml import etree
    # http://www.dajiangtai.com/login/check.do
    post_url='http://www.dajiangtai.com/login/check.do'
    mysession=requests.S…

posted @ 2019-10-01 20:48 天天见和 Views(111) Comments(0) Recommended(0)

How Simulated Login Works: Understanding POST, Cookies, and Session

Summary:
    import requests
    # Log in via cookies
    post_url='http://pythonscraping.com/pages/cookies/welcome.php'
    userdata={"username":"zhangsan","password":"password"}
    post_re…

posted @ 2019-10-01 20:02 天天见和 Views(838) Comments(0) Recommended(0)

Scraping Job-Site Data with Beautiful Soup

Summary:
    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    from pandas import DataFrame
    url='https://search.51job.com/list/120300,000000,0000,32,9…

posted @ 2019-09-29 23:13 天天见和 Views(331) Comments(0) Recommended(0)

Beautiful Soup: The Four Commonly Used Objects

Summary:
    from bs4 import BeautifulSoup
    text='''<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang='eng'>Harry Potter</title><price>29.9</p…

posted @ 2019-09-29 21:07 天天见和 Views(637) Comments(0) Recommended(0)

Setting Proxy IPs and Handling Timeout Exceptions

Summary:
    import requests
    import re
    # Get the local machine's IP
    url='http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=ip'
    res=requests.get(url)
    res.encoding='utf-8'
    …

posted @ 2019-09-28 06:51 天天见和 Views(886) Comments(0) Recommended(0)

fake-useragent, a Handy Tool, and a Browser User-Agent Pool

Summary:
    import requests
    from lxml import etree
    import random
    from fake_useragent import UserAgent
    ua=UserAgent()
    uas=[]
    for i in range(5):
        uas.append(ua.random) #生…

posted @ 2019-09-26 22:51 天天见和 Views(565) Comments(0) Recommended(0)

Simulating a Browser by Setting the User-Agent

Summary:
    import requests
    from lxml import etree
    url='https://ie.icoa.cn/'
    head={'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like…

posted @ 2019-09-26 22:15 天天见和 Views(932) Comments(0) Recommended(0)

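Regex Matching Examples: Extracting Digits, Matching Phone Numbers and QQ Numbers

Summary:
    \d followed by {n}, {n,}, or {n,m}: matches a decimal digit exactly n times, at least n times, or between n and m times
    \D: matches a character that is not a decimal digit
    [...]: a character set; matches any one character inside it
    [^...]: matches any one character not in the set
    +: matches the preceding subexpression one or more times
    \s: a whitespace character
    \S: any character except whitespace
    (?:pattern): matches without capturing the result
    ^: marks the start…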
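The syntax listed above can be exercised with Python's `re` module. The sketch below is illustrative only: the sample text, the phone pattern (an 11-digit number starting with 1), and the QQ pattern (5 to 11 digits, not starting with 0) are simplifying assumptions, not the patterns used in the post.

```python
import re

text = "Call 13812345678 or QQ 10001; order 42 shipped."

# \d+ : one or more decimal digits, so every run of digits is extracted.
digits = re.findall(r"\d+", text)

# Assumed simplified phone pattern: 1 followed by exactly ten more digits,
# not embedded in a longer run of digits.
phone_pat = re.compile(r"(?<!\d)1\d{10}(?!\d)")
phones = phone_pat.findall(text)

# Assumed QQ pattern: 5 to 11 digits, first digit 1-9.
# ^ and $ anchor the match to the whole string.
qq_pat = re.compile(r"^[1-9]\d{4,10}$")
is_qq = bool(qq_pat.match("10001"))
not_qq = bool(qq_pat.match("01234"))

# (?:...) groups without capturing, so findall returns the whole match.
grouped = re.findall(r"(?:\d{3})-\d{4}", "555-1234 and 777-9999")

print(digits, phones, is_qq, not_qq, grouped)
```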

posted @ 2019-09-24 22:31 天天见和 Views(561) Comments(0) Recommended(0)

Batch-Downloading Images from an E-Commerce Site with a Python Crawler

Summary:
    import requests
    import re
    url='https://list.jd.com/list.html?cat=9987,653,655'
    res=requests.get(url)
    image_pat='<img width="220" height="220" data-img="1…

posted @ 2019-09-24 22:14 天天见和 Views(977) Comments(0) Recommended(0)

Storing Scraped Data in a DataFrame and Exporting It

Summary:
    import requests
    from lxml import etree
    from pandas import DataFrame
    url='https://search.51job.com/list/120800,000000,0000,32,9,99,%25E4%25BA%25A7%25E5%2…

posted @ 2019-09-22 10:02 天天见和 Views(567) Comments(0) Recommended(0)
