Post category - python data analysis
Summary:
#k-means cluster analysis, data standardization
zcdata=(cdata-cdata.mean())/cdata.std()
zcdata.head()
from sklearn.cluster import KMeans
kmodel=KMeans(n_clusters=4,n_jobs=4,max_iter=1
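The excerpt above standardizes the data with z-scores before clustering. A minimal sketch of that flow follows, assuming a small synthetic `cdata` (the column names `price` and `volume` and all values are hypothetical, not from the post). `n_jobs` is left out because recent scikit-learn releases no longer accept it in `KMeans`; `n_init`, `max_iter` and `random_state` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for cdata: two well-separated groups on different scales.
rng = np.random.default_rng(0)
cdata = pd.DataFrame({
    "price": np.concatenate([rng.normal(10, 1, 20), rng.normal(100, 1, 20)]),
    "volume": np.concatenate([rng.normal(1000, 50, 20), rng.normal(5000, 50, 20)]),
})

# z-score standardization, as in the excerpt: each column gets mean 0 and std 1,
# so no single column dominates the Euclidean distances used by k-means.
zcdata = (cdata - cdata.mean()) / cdata.std()

# n_jobs is omitted on purpose (not accepted by recent scikit-learn versions).
kmodel = KMeans(n_clusters=2, n_init=10, max_iter=100, random_state=0)
labels = kmodel.fit_predict(zcdata)

# Attach the cluster label to the original rows for inspection.
clustered = cdata.assign(cluster=labels)
print(clustered.groupby("cluster").mean())
```

Standardizing first matters here because `volume` is roughly fifty times larger than `price`; without it the clustering would be driven almost entirely by `volume`.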
Summary:
import pandas as pd
from datetime import datetime
from math import ceil #round up
from pandas import DataFrame
data=pd.read_excel('./data/data5.xlsx',encoding
Summary:
#Handling outliers
import pandas as pd
import numpy as np
SegData=pd.read_csv('./data/data1.csv',encoding='gbk')
SegData.head()
SegData.describe().T
SegData.loc[:,'供应商
Summary:
#Handling missing values
from pandas import Series
import numpy as np
stringSer=Series(['a','b',np.nan,'d','e'])
#isnull flags missing values
stringSer.isnull()
#notnull flags non-missing values
stringSer.not
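A runnable sketch of the missing-value checks from the excerpt above. `isnull` and `notnull` come from the post; `dropna` and `fillna` are added here as typical next steps and are an assumption about where the full post goes, as is the placeholder string `'missing'`.

```python
import numpy as np
from pandas import Series

stringSer = Series(['a', 'b', np.nan, 'd', 'e'])

# Boolean masks: True where the value is missing / present.
missing_mask = stringSer.isnull()
present_mask = stringSer.notnull()

# Two common ways to deal with the gap (not shown in the excerpt):
dropped = stringSer.dropna()          # remove missing entries
filled = stringSer.fillna('missing')  # replace them with a placeholder

print(missing_mask.tolist())
print(dropped.tolist())
print(filled.tolist())
```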
Summary:
import pandas as pd
df1=pd.DataFrame({'key':['a','b','c'],'data1':range(3)})
df2=pd.DataFrame({'key':['a','b','d'],'data2':range(3)})
pd.merge(df1,df2) #
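To make the `pd.merge(df1,df2)` call above concrete, here is a small sketch using the same two frames. With no arguments, `merge` joins on the shared column `key` and keeps only keys present in both frames (an inner join). The `how='outer'` and `how='left'` variants are added for comparison and may or may not appear in the full post.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': range(3)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# Default: inner join on the common column 'key' -> only 'a' and 'b' survive.
inner = pd.merge(df1, df2)

# Outer join keeps every key from both sides; unmatched cells become NaN.
outer = pd.merge(df1, df2, on='key', how='outer')

# Left join keeps every row of df1; 'c' has no match, so its data2 is NaN.
left = pd.merge(df1, df2, on='key', how='left')

print(inner)
print(outer)
print(left)
```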
Summary:
import pandas as pd
pivot_data=pd.read_excel('./data/data3.xlsx')
pivot_data.head()
pivot_data.describe(include='all').T
import numpy as np
#Pivot table: compute purchase price total, average
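The excerpt above stops just before the pivot table itself. As a hedged illustration of totaling and averaging a purchase price per group, the sketch below uses `pd.pivot_table` on a tiny hand-made frame; the column names `category` and `purchase_price` and the values are hypothetical and do not come from `data3.xlsx`.

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet used in the post.
pivot_data = pd.DataFrame({
    'category': ['A', 'A', 'B'],
    'purchase_price': [10.0, 20.0, 5.0],
})

# One row per category, with the sum and the mean of purchase_price.
table = pd.pivot_table(
    pivot_data,
    index='category',
    values='purchase_price',
    aggfunc=['sum', 'mean'],
)

# Columns form a two-level index: (aggregation, value column).
total_a = table.loc['A', ('sum', 'purchase_price')]
mean_a = table.loc['A', ('mean', 'purchase_price')]
print(table)
```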
Summary:
import pandas as pd
df2=pd.read_excel('./data/data2.xlsx',index_col='产品编码')
df2.head()
x=df2['供应商进货价']
y=df2['销售价']
from pylab import mpl
mpl.rcParams['font
Summary:
import pandas as pd
df1=pd.read_csv("./data/data1.csv",encoding='gbk',index_col='产品编码')
print(df1.head())
#Get the data
print(len(df1))
print(df1.index.size)
#Compute the mean
m
Summary:
#Run this spider file
scrapy crawl spider51job
Summary:
import requests
from lxml import etree
# http://www.dajiangtai.com/login/check.do
post_url='http://www.dajiangtai.com/login/check.do'
mysession=requests.S
Summary:
import requests
#Log in via cookies
post_url='http://pythonscraping.com/pages/cookies/welcome.php'
userdata={"username":"zhangsan","password":"password"}
post_re
Summary:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import DataFrame
url='https://search.51job.com/list/120300,000000,0000,32,9
Summary:
from bs4 import BeautifulSoup
text='''<?xml version="1.0" encoding="ISO-8859-1"?><bookstore><book><title lang='eng'>Harry Potter</title><price>29.9</p
Summary:
import requests
import re
#Get this machine's IP
url='http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=ip'
res=requests.get(url)
res.encoding='utf-8'
Summary:
import requests
from lxml import etree
import random
from fake_useragent import UserAgent
ua=UserAgent()
uas=[]
for i in range(5):
    uas.append(ua.random) #生
Summary:
import requests
from lxml import etree
url='https://ie.icoa.cn/'
head={'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like
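Summary:
\d[{n},{n,},{n,m}] matches a decimal digit exactly n times, at least n times, or at least n and at most m times
\D matches a non-digit character
[...] denotes a character set; matches any one character in it
[^...] matches any one character not in the set
+ matches the preceding subexpression one or more times
\s whitespace character
\S any non-whitespace character
(?:pattern) matches but does not capture the result
^ marks the start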
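The regular-expression notes above can be checked quickly with Python's built-in `re` module. The sample string below is made up for illustration.

```python
import re

text = "order 42 cost 1999 yuan"  # hypothetical sample string

# \d{n}: exactly n digits per match (scanning left to right, non-overlapping).
two_digits = re.findall(r'\d{2}', text)

# \d{n,}: at least n digits.
three_plus = re.findall(r'\d{3,}', text)

# \S+: runs of non-whitespace characters.
tokens = re.findall(r'\S+', text)

# [^...]: any character NOT in the set; here neither whitespace nor a digit.
words = re.findall(r'[^\s\d]+', text)

# (?:...) groups without capturing, so findall returns only the (\d+) group.
amounts = re.findall(r'(?:order|cost)\s(\d+)', text)

# ^ anchors the pattern at the start of the string.
starts_with_order = re.search(r'^order', text) is not None

print(two_digits, three_plus, tokens, words, amounts, starts_with_order)
```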
Summary:
import requests
import re
url='https://list.jd.com/list.html?cat=9987,653,655'
res=requests.get(url)
image_pat='<img width="220" height="220" data-img="1
Summary:
import requests
from lxml import etree
from pandas import DataFrame
url='https://search.51job.com/list/120800,000000,0000,32,9,99,%25E4%25BA%25A7%25E5%2