November 2019 Post Archive - 市丸银

Summary: https://www.cnblogs.com/woaixuexi9999/p/9247705.html

posted @ 2019-11-29 17:42 市丸银 Views(347) Comments(0) Recommended(0)

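Summary: I. Install Ubuntu 18.04 (omitted). II. Install VMware Tools: 1. Right-click the virtual machine and choose install. 2. Open the folder and copy the archive to another directory (reason: not enough room to extract it in place). 3. Extract the archive and run the ./(name forgotten).pl script. 4. Pay attention during installation; not every option should be left at its default. III. Fixing copy and paste in Ubuntu. Purpose: make it easier to switch to the Tsinghua mirror. su …

posted @ 2019-11-29 17:19 市丸银 Views(1313) Comments(0) Recommended(0)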

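numpy: other common methods

Summary: I. Creating special arrays. 1. ones(). Syntax: np.ones(shape, dtype=None) # shape: the shape of the array to create # dtype: the data type of the array. Example: import numpy as np arr1 = np.ones((3, 4), dtype="int64") prin…

posted @ 2019-11-28 23:35 市丸银 Views(148) Comments(0) Recommended(0)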
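The `np.ones` call in the entry above can be sketched as follows. The original example is cut off after `prin`, so the `print` lines here are an assumption about how it continues, not the author's code.

```python
import numpy as np

# Create a 3x4 array filled with ones, stored as 64-bit integers.
arr1 = np.ones((3, 4), dtype="int64")

print(arr1.shape)  # (3, 4)
print(arr1.dtype)  # int64
print(arr1.sum())  # 12
```

Passing `dtype` explicitly matters because `np.ones` otherwise produces floating-point values.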

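numpy: array concatenation

Summary: I. Concatenating arrays. 1. Horizontal concatenation. a. Format: np.hstack((array1, array2)) # note: the argument is a tuple # the arrays must have the same length along axis 0. b. Example: import numpy as np arr1 = np.arange(0, 12).reshape(2, 6) arr2 = np.arange(12, 22).r…

posted @ 2019-11-28 23:32 市丸银 Views(347) Comments(0) Recommended(0)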
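A minimal sketch of the horizontal concatenation described in the entry above. The second array's shape is truncated in the summary; `reshape(2, 5)` is an assumption chosen so that both arrays have two rows.

```python
import numpy as np

arr1 = np.arange(0, 12).reshape(2, 6)
# Assumption: the truncated original reshapes these 10 values into 2 rows.
arr2 = np.arange(12, 22).reshape(2, 5)

# hstack takes a single tuple of arrays; the row counts must match.
joined = np.hstack((arr1, arr2))

print(joined.shape)  # (2, 11)
print(joined[0])
```

The column counts may differ (6 and 5 here); only the length along axis 0 has to agree.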

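numpy: modifying values

Summary: I. Steps. 1. Locate the values using array indexing and slicing. 2. Modify the values by direct assignment. Example: import numpy as np arr1 = np.arange(0, 24).reshape(4, 6) # locate the values by indexing and slicing, then modify them arr1[:, 2:5] = 10 print(arr1) II. Locating values, supplementary…

posted @ 2019-11-28 23:31 市丸银 Views(959) Comments(0) Recommended(0)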
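The locate-then-assign step from the entry above, written out as a runnable sketch. The code follows the summary; only the final `print` calls are added for illustration.

```python
import numpy as np

arr1 = np.arange(0, 24).reshape(4, 6)

# Select columns 2, 3 and 4 of every row, then overwrite them in place.
arr1[:, 2:5] = 10

print(arr1[0])  # [ 0  1 10 10 10  5]
print(arr1[3])  # [18 19 10 10 10 23]
```

The assignment changes `arr1` itself; columns outside the slice keep their original values.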

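numpy: indexing and slicing

Summary: I. Selecting rows. 1. A single row: array[index, :] # selects row index+1. Example: import numpy as np arr1 = np.arange(0, 24).reshape(4, 6) # select the 2nd row row1 = arr1[1, :] print(row1) 2. Consecutive rows: array[star…

posted @ 2019-11-28 23:29 市丸银 Views(139) Comments(0) Recommended(0)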
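A short sketch of the row selection in the entry above. The single-row case follows the summary; the consecutive-rows form is truncated there, so the ordinary `start:stop` slice used below is an assumption.

```python
import numpy as np

arr1 = np.arange(0, 24).reshape(4, 6)

# Row with index 1, i.e. the 2nd row.
row1 = arr1[1, :]
print(row1)  # [ 6  7  8  9 10 11]

# Assumption: consecutive rows are taken with a start:stop slice
# (stop is exclusive), here the 2nd and 3rd rows.
rows = arr1[1:3, :]
print(rows.shape)  # (2, 6)
```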

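numpy: reading data

Summary: I. CSV files. CSV: Comma-Separated Values. Displayed as: a table. Source file: delimited by newlines and commas; commas separate columns and newlines separate rows. II. Reading data. 1. Method: loadtxt(fname, dtype=float, delimiter=None, skiprows=0, usecols=N…

posted @ 2019-11-28 23:27 市丸银 Views(747) Comments(0) Recommended(0)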
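To illustrate the `loadtxt` parameters listed in the entry above, here is a minimal sketch. The CSV content and column names are hypothetical, and an in-memory `io.StringIO` stands in for a file on disk.

```python
import io

import numpy as np

# Hypothetical CSV text: one header row, then numeric rows.
csv_text = "views,likes\n100,7\n250,12\n"

data = np.loadtxt(
    io.StringIO(csv_text),
    dtype=int,
    delimiter=",",   # commas separate the columns
    skiprows=1,      # skip the header row
    usecols=(0, 1),  # keep only these column indexes
)

print(data.shape)  # (2, 2)
print(data[1])     # [250  12]
```

Without `delimiter=","` the default whitespace splitting would treat each line as one field and fail to parse it as a number.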

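numpy: array arithmetic

Summary: I. Arithmetic between an array and a number. When an array is combined with a number, the operation is applied to every element of the array. 1. Addition: import numpy as np arr1 = np.arange(12).reshape(3, 4) print(arr1) # add the number to every element of the array arr2 = arr1 + 2 print(arr2) 2. …

posted @ 2019-11-28 23:25 市丸银 Views(487) Comments(0) Recommended(0)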
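The element-wise behaviour described in the entry above can be checked directly. Addition follows the summary; the multiplication line is an added assumption about the truncated remainder, included only to show that other operators behave the same way.

```python
import numpy as np

arr1 = np.arange(12).reshape(3, 4)

# The number is applied to every element of the array.
arr2 = arr1 + 2
# Assumption: the other arithmetic operators work element-wise too.
arr3 = arr1 * 2

print(arr2[0])  # [2 3 4 5]
print(arr3[0])  # [0 2 4 6]
```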

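numpy basics

Summary: I. Fundamentals. 1. Installation: conda install numpy 2. What is numpy? The foundational library for scientific computing in Python, focused on numerical computation. II. Creating arrays: import numpy as np # method 1 np.array([1, 2, 3, 4, 5]) # method 2 np.array(range(5)) …

posted @ 2019-11-28 18:03 市丸银 Views(188) Comments(0) Recommended(0)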

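matplotlib: histograms

Summary: I. Characteristics: the data must be raw and unprocessed; the data are continuous; the chart shows the distribution of one or more groups of data. Vocabulary: histogram; normed (normalized). II. Core call: hist(x, bins=None, normed=None) # x: the data to be counted, type: array # bins: the number of bins; number of bins = (max(array) - min(array…

posted @ 2019-11-27 23:35 市丸银 Views(362) Comments(0) Recommended(0)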
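The bin-count rule in the entry above is truncated. A common convention, assumed here, is number of bins = (max - min) / bin width, rounded up. The sketch below applies that rule to hypothetical data with a hypothetical bin width and counts the values per bin in plain Python, without Matplotlib. The resulting `num_bins` is the kind of value that would be passed as `bins` to `hist`; note that recent Matplotlib releases take `density` rather than `normed`.

```python
# Hypothetical raw data and a hypothetical bin width.
data = [131, 98, 125, 131, 124, 139, 131, 117, 128, 108, 135, 138]
bin_width = 5

# Assumed rule: number of bins = (max - min) / bin width, rounded up.
span = max(data) - min(data)
num_bins = -(-span // bin_width)  # ceiling division

# Count the values in each bin; the maximum is placed in the last bin.
counts = [0] * num_bins
for value in data:
    index = min((value - min(data)) // bin_width, num_bins - 1)
    counts[index] += 1

print(num_bins)  # 9
print(counts)    # [1, 0, 1, 1, 0, 2, 4, 1, 2]
```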

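matplotlib: bar charts

Summary: I. Characteristics: discrete data with no direct relationship between the data points. II. Types. 1. Vertical bar chart: bar(x, height, width=0.8) # x: the x-axis positions # height: the y-axis values # width: the width of each bar. Example: from matplotlib import pyplot as plt from matplo…

posted @ 2019-11-27 23:32 市丸银 Views(241) Comments(0) Recommended(0)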

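matplotlib: scatter plots

Summary: I. Characteristics: discrete data; used to inspect distribution patterns and trends. II. Usage. 1. Core call: plt.scatter(x, y) # x: the x-axis data, an iterable of numbers # y: the y-axis data, an iterable of numbers # x and y must correspond one to one. 2. Example. Note: when setting x-axis or y-axis ticks, the values of ticks and labels must correspond one to one. from…

posted @ 2019-11-27 22:49 市丸银 Views(183) Comments(0) Recommended(0)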

matplotlib

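Summary: Note: plotting and setting the x-axis ticks are independent of each other. Official site: https://matplotlib.org/ Examples: https://matplotlib.org/3.1.1/gallery/index.html I. Installation: conda install matplotlib II. Purpose: 1. Visualize data, making it more intuitive…

posted @ 2019-11-27 16:08 市丸银 Views(174) Comments(0) Recommended(0)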

matplotlib: line charts

Summary: 1. Key points: # import the module from matplotlib import pyplot as plt # x-axis data x = range(2, 26, 2) # y-axis data y = [15, 13, 14.5, 17, 20, 25, 26, 26, 27, 22, 18, 15] # plot plt.p…

posted @ 2019-11-27 11:23 市丸银 Views(190) Comments(0) Recommended(0)

pymongo

Summary: I. Installation: conda install pymongo II. Usage. 1. Connecting: from pymongo import MongoClient client = MongoClient(host='ip', port=27017) # select the database and collection with square brackets collection = clien…

posted @ 2019-11-26 18:01 市丸银 Views(213) Comments(0) Recommended(0)

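MongoDB indexes

Summary: I. Purpose: speed up queries. II. Measuring the time a query takes. Format: db.<collection>.find(<query>).explain('executionStats') Example: insert 100,000 documents into the database: for(i=0; i < 100000; i++){ db.t4.insert({name: 'test' + i, a…

posted @ 2019-11-26 16:23 市丸银 Views(293) Comments(0) Recommended(0)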

scrapy selectors

Summary: This section is supplementary material. 1. xpath() 2. css() 3. Regular expressions: # multiple values, returned as a list response.xpath('//a/text()').re('(.*?):\s(.*)') # take the first value response.xpath('//a/text()').re_first('(.*?):\s…

posted @ 2019-11-25 21:00 市丸银 Views(112) Comments(0) Recommended(0)
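To see what the pattern in the entry above captures, the sketch below applies it with the standard `re` module instead of Scrapy. The link texts are hypothetical stand-ins for `response.xpath('//a/text()')`. The flattening of groups into one list is my understanding of how a selector's `.re()` behaves, not something the summary states.

```python
import re

# Hypothetical link texts, standing in for the selected text nodes.
texts = ["Name: My image 1", "Name: My image 2"]

pattern = re.compile(r"(.*?):\s(.*)")

# Collect the captured groups of every match into one flat list.
matches = []
for text in texts:
    for groups in pattern.findall(text):
        matches.extend(groups)
print(matches)  # ['Name', 'My image 1', 'Name', 'My image 2']

# Taking the first element corresponds to a "first value" lookup.
first = matches[0] if matches else None
print(first)  # Name
```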

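MongoDB aggregation (aggregate)

Summary: I. Basics. 1. What is aggregation? Aggregation is a data-processing pipeline: each document passes through a pipeline made up of multiple stages, each stage can perform operations such as grouping and filtering, and after the whole series of stages the corresponding result is output. db.<collection>.aggregate({<stage>: {<expression>}}) The syntax is somewhat like ORM aggregation in Django. 2. Common pipeline stages…

posted @ 2019-11-25 18:00 市丸银 Views(1466) Comments(0) Recommended(0)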

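MongoDB database backup and restore

Summary: 1. Database backup: mongodump -h dbhost -d dbname -o dbdirectory -h: the server address; a port can also be specified -d: the name of the database to back up -o: where to store the backup; this directory holds the dumped data. mongodump -h 127.0.0.1:27017 -…

posted @ 2019-11-25 15:21 市丸银 Views(147) Comments(0) Recommended(0)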

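Using proxies

Summary: 1. Proxy pool: https://github.com/Python3WebSpider/ProxyPool It collects proxies from the web, checks whether they work, stores them in redis, periodically re-checks their validity, and exposes an API for fetching a proxy via a URL. 2. Usage flow: the proxy starts as None; if the IP is banned (detected from the response status code), fetch a new proxy from the pool and send the request using…

posted @ 2019-11-24 20:46 市丸银 Views(110) Comments(0) Recommended(0)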
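The usage flow in the entry above (start with no proxy, switch to a pooled proxy after a ban) might look like the sketch below. Everything here is hypothetical: the function names, the status codes treated as a ban, and the fake request and pool used in place of real network calls.

```python
from typing import Callable, Optional, Tuple

BANNED_STATUS = {403, 429}  # assumption: status codes that signal a ban


def fetch_with_proxy_retry(
    send: Callable[[Optional[str]], int],
    get_proxy: Callable[[], str],
    max_retries: int = 3,
) -> Tuple[int, Optional[str]]:
    """Send once without a proxy; after a ban, retry with a fresh proxy."""
    proxy = None
    status = send(proxy)
    for _ in range(max_retries):
        if status not in BANNED_STATUS:
            break
        proxy = get_proxy()  # in practice, a call to the proxy pool's API
        status = send(proxy)
    return status, proxy


# Stand-ins for a real request function and a real proxy pool.
pool = iter(["10.0.0.1:8080", "10.0.0.2:8080"])


def fake_send(proxy: Optional[str]) -> int:
    return 200 if proxy == "10.0.0.2:8080" else 403


print(fetch_with_proxy_retry(fake_send, lambda: next(pool)))
# (200, '10.0.0.2:8080')
```

Passing the request and pool functions in as arguments keeps the retry logic testable without a network connection.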

Ubuntu Aliyun mirrors

Summary: ubuntu 14.04: http://mirrors.aliyun.com/ubuntu-releases/14.04/ ubuntu 16.04: http://mirrors.aliyun.com/ubuntu-releases/16.04/ ubuntu 18.04: http://mir…

posted @ 2019-11-22 19:57 市丸银 Views(140) Comments(0) Recommended(0)

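MongoDB queries

Summary: I. Basic operations: # find all documents db.<collection>.find() # find documents matching a condition db.<collection>.find({<condition document>}) # find one document matching a condition db.<collection>.findOne({<condition document>}) # pretty-print the output db.<collection>.find({<condition document>}).pretty() # Note: …

posted @ 2019-11-22 17:57 市丸银 Views(336) Comments(0) Recommended(0)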

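MongoDB basic operations

Summary: I. Database operations: # 1. list the databases show dbs show databases # 2. create or switch to a database use db # note: the database appears under show dbs only after it stores data # 3. show the current database db # 4. drop the current database db.dropDatabase() II. Collection operations. Note…

posted @ 2019-11-22 14:41 市丸银 Views(127) Comments(0) Recommended(0)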

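Introduction to MongoDB

Summary: I. Concept: MongoDB is a document-oriented, non-relational database. II. Pros and cons. Pros: easy to scale (a characteristic of NoSQL databases); large data volumes with high performance (high read/write performance, because there are no relations); a flexible data model (storage fields need not be defined in advance). Cons: a large amount of duplicated data. III. Basics: # 1. start command mongo # 2. quit exit # 3. port: 270…

posted @ 2019-11-22 10:31 市丸银 Views(120) Comments(0) Recommended(0)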

Installing and configuring mongodb on Ubuntu 18.04

Summary: I. Installation: # 1. update sudo apt-get update # 2. install sudo apt-get install -y mongodb # 3. check that the installation succeeded # a. service status sudo systemctl status mongodb sudo service mongodb st…

posted @ 2019-11-22 10:17 市丸银 Views(550) Comments(0) Recommended(0)

pymysql summary

Summary: I. Creating a database: import pymysql conn = pymysql.connect(host='ip', user='root', password='<password>') # return the results as dictionaries cursor = conn.cursor(cursor=pymysql.cursors.DictCu…

posted @ 2019-11-21 18:03 市丸银 Views(173) Comments(0) Recommended(0)

Installing mysql on Ubuntu 18.04

Summary: I. Installation: # 1. update sudo apt-get update # 2. install sudo apt-get install -y mysql-server mysql-client # 3. make sure the mysql service is running sudo service mysql start sudo service mysql s…

posted @ 2019-11-21 16:00 市丸银 Views(232) Comments(0) Recommended(0)

pyquery parsing library

Summary: The syntax is almost identical to jQuery. Installation: conda install pyquery I. Initialization. Standard usage: from pyquery import PyQuery as pq import requests # r = requests.get(url='http://www.baidu.com') html…

posted @ 2019-11-21 13:00 市丸银 Views(231) Comments(0) Recommended(0)

urllib basic usage (for reference)

Summary: I. urllib.request.urlopen 1. urlopen: from urllib import request r = request.urlopen('http://www.baidu.com/') # get the status code print(r.status) # get the response headers print(r.getheaders(…

posted @ 2019-11-20 23:43 市丸银 Views(460) Comments(0) Recommended(0)

Saving data to txt

Summary: A nice use of join: a = "Hello, world" b = "你好，世界" c = "How are you?" with open(file='a.txt', mode='w', encoding='utf-8') as f: f.write('\n'.join([a, b, c])) f.w…

posted @ 2019-11-20 17:57 市丸银 Views(307) Comments(0) Recommended(0)
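The `join` idiom from the entry above, as a runnable sketch. Writing into a temporary directory and reading the file back are additions for illustration; the original writes to `a.txt` in the working directory and is truncated after the first `write`.

```python
import os
import tempfile

a = "Hello, world"
b = "你好，世界"
c = "How are you?"

# Use a temporary directory so the sketch leaves no file behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(file=path, mode="w", encoding="utf-8") as f:
        # join puts a newline between the strings, not after the last one.
        f.write("\n".join([a, b, c]))
    with open(file=path, mode="r", encoding="utf-8") as f:
        lines = f.read().split("\n")

print(lines)
```

Specifying `encoding="utf-8"` on both the write and the read keeps the Chinese line intact regardless of the platform default.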

Saving data to csv

Summary: csv: comma-separated values. I. Writing. 1. Adding list rows one at a time: import csv # with open(file='a.csv', mode='w', encoding='utf-8', newline='') as f: write = csv.writer(f) write.writerow(['id'…

posted @ 2019-11-20 17:49 市丸银 Views(776) Comments(0) Recommended(0)
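A minimal sketch of writing list rows with `csv.writer`, as in the entry above. The column names and rows are hypothetical (the original header is truncated after `'id'`), and an in-memory buffer stands in for the `a.csv` file.

```python
import csv
import io

# In-memory stand-in for open('a.csv', mode='w', newline='').
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

# Hypothetical header and data rows, written one list at a time.
writer.writerow(["id", "name", "age"])
writer.writerow([1, "Tom", 20])
writer.writerow([2, "Amy", 22])

# Read the text back to confirm the rows round-trip.
rows = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
print(rows)
# [['id', 'name', 'age'], ['1', 'Tom', '20'], ['2', 'Amy', '22']]
```

The `newline=''` argument matters when writing to a real file: without it, some platforms insert blank lines between rows. Values read back from a CSV are always strings.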

scrapy-splash

Summary: Official site: https://github.com/scrapy-plugins/scrapy-splash 1. Installation: pip install scrapy-splash 2. Run splash: docker run -p 8050:8050 scrapinghub/splash 3. Configure the settings file…

posted @ 2019-11-20 13:44 市丸银 Views(129) Comments(0) Recommended(0)

urllib parse

Summary: 1. urlparse. Purpose: parse a URL. from urllib import parse url = "https://book.qidian.com/info/1004608738" result = parse.urlparse(url=url) print(result) Result: ParseR…

posted @ 2019-11-20 12:43 市丸银 Views(133) Comments(0) Recommended(0)
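The printed result in the entry above is cut off, so here is a short sketch that reads the parsed components individually. The URL is the one from the summary; the attribute accesses are added for illustration.

```python
from urllib import parse

url = "https://book.qidian.com/info/1004608738"
result = parse.urlparse(url=url)

# The result is a named tuple; each component is available by name.
print(result.scheme)  # https
print(result.netloc)  # book.qidian.com
print(result.path)    # /info/1004608738
print(result.query == "")  # True, this URL has no query string
```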

Splash

Summary: Official site: https://splash.readthedocs.io/en/stable/index.html Common endpoints (API): 1. render.html Format: http://10.63.32.49:8050/render.html?url=https://www.baidu.com&wait=…

posted @ 2019-11-19 13:00 市丸银 Views(652) Comments(0) Recommended(0)
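One way to build a render.html URL like the one in the entry above is with `urllib.parse.urlencode`, sketched below. The host and target URL come from the summary; the `wait` value is hypothetical, because the original is truncated after `wait=`. No request is sent.

```python
from urllib.parse import urlencode

# Splash host and port as given in the summary.
splash_endpoint = "http://10.63.32.49:8050/render.html"

params = {
    "url": "https://www.baidu.com",
    "wait": 2,  # hypothetical value; the original is cut off after "wait="
}

# urlencode percent-escapes the target URL so it stays a single parameter.
request_url = splash_endpoint + "?" + urlencode(params)
print(request_url)
```

Escaping matters once the target URL carries its own query string; an unescaped `&` in it would otherwise be read as a separate Splash parameter.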

A simple Splash application

Summary: jd -> iphone import requests from lxml import etree # search_key = 'iphone' jd_url = "https://search.jd.com/Search?keyword={}&enc=utf-8&wq={}&pvid=1a54a…

posted @ 2019-11-18 12:30 市丸银 Views(145) Comments(0) Recommended(0)

Installing and starting splash on Ubuntu 18.04

Summary: Official site: https://splash.readthedocs.io/en/stable/ 1. Install Docker: https://www.cnblogs.com/wt7018/p/11880666.html 2. Pull the image: sudo docker pull scrapinghub/sp…

posted @ 2019-11-18 10:57 市丸银 Views(394) Comments(0) Recommended(0)

Installing docker on Ubuntu 18.04

Summary: Reference: https://www.runoob.com/docker/ubuntu-docker-install.html 1. Uninstall: sudo apt-get remove docker docker-engine docker.io containerd runc 2. Install Docker: sudo ap…

posted @ 2019-11-18 10:33 市丸银 Views(30545) Comments(2) Recommended(4)

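selenium waits

Summary: 1. Implicit wait: when looking up a node, if it is not found immediately, keep looking for up to 10 seconds; if it is still not found, an exception is raised. from selenium import webdriver # browser = webdriver.Chrome() browser.implicitly_wait(10) browser.get…

posted @ 2019-11-17 21:15 市丸银 Views(132) Comments(0) Recommended(0)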

selenium chrome headless (no-UI engine)

Summary: Note: PhantomJS has been deprecated. chrome headless: add the options before opening the browser. import time import sys from selenium import webdriver from selenium.webdriver.common.keys import Keys fr…

posted @ 2019-11-17 00:40 市丸银 Views(253) Comments(0) Recommended(0)

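Scraping JD.com with selenium

Summary: Scraping iphone listings. Note: the browser object changes whenever any operation is performed on the current page. import time from selenium import webdriver from selenium.webdriver.common.keys import Keys # if __name__ == '_…

posted @ 2019-11-17 00:13 市丸银 Views(281) Comments(0) Recommended(0)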

selenium

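Summary: Note: the browser object changes every time the page is operated on, including when scrolling down the page (a pitfall I ran into). I. Example: open Baidu and search for python. from selenium import webdriver browser = webdriver.Chrome() browser.get('https://www.baidu.…

posted @ 2019-11-16 18:52 市丸银 Views(120) Comments(0) Recommended(0)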

Installing ChromeDriver

Summary: The driver for Chrome. 0. Install selenium: pip3 install -i https://pypi.douban.com/simple selenium 1. Check the Chrome version: chrome://version/ 2. Download: http://chromedriver.storage.googleap…

posted @ 2019-11-16 15:58 市丸银 Views(168) Comments(0) Recommended(0)

Persisting scrapy items to an Excel spreadsheet

Summary: Prerequisite (to prevent garbled characters): ITEM_PIPELINES = { 'xpc.pipelines.ExcelPipeline': 300, } Method 1. 1. Install openpyxl: conda install openpyxl 2. Pipeline: from openpyxl import Workbook…

posted @ 2019-11-15 17:21 市丸银 Views(652) Comments(0) Recommended(0)

Fixing Chinese text that turns into \u-prefixed strings when scrapy stores data to a json file

Summary: Add the following to settings.py: FEED_EXPORT_ENCODING = 'utf-8'

posted @ 2019-11-15 16:08 市丸银 Views(519) Comments(0) Recommended(0)
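The \u-prefixed output comes from how JSON encoders escape non-ASCII characters by default. The sketch below reproduces the effect with the standard `json` module only. It is an illustration of the underlying behaviour, not Scrapy's exporter, and the sample item is hypothetical.

```python
import json

item = {"title": "中文"}

# Default behaviour: non-ASCII characters are escaped as \uXXXX sequences.
escaped = json.dumps(item)
print(escaped)   # {"title": "\u4e2d\u6587"}

# With ensure_ascii=False the characters are written as they are.
readable = json.dumps(item, ensure_ascii=False)
print(readable)  # {"title": "中文"}
```

Both strings decode to the same data, so the escaped form is valid JSON; the setting only changes how readable the file is.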

Charles packet-capture tool

Summary: I. Installation and cracking: http://www.3322.cc/soft/49689.html#xzdz II. Usage; fairly reliable guides: https://www.cnblogs.com/weizhideweilai/p/9833781.html https://www.cnblogs.com/qingqing-919/…

posted @ 2019-11-14 15:16 市丸银 Views(121) Comments(0) Recommended(0)

Creating a crawler project with anaconda

Summary: # install conda env list conda create -n <envname> conda activate <envname> conda install scrapy scrapy # check that the installation succeeded # create the project cd /d <target directory> scrapy startproject…

posted @ 2019-11-14 11:22 市丸银 Views(189) Comments(0) Recommended(0)

BeautifulSoup

Summary: Official docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ Beginner tutorial: http://www.jsphp.net/python/show-24-214-1.html My own notes: https://i-beta.cnblogs.com/diarie…

posted @ 2019-11-13 09:31 市丸银 Views(149) Comments(0) Recommended(0)

requests

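Summary: Official docs: https://requests.kennethreitz.org//zh_CN/latest/user/quickstart.html Test site: httpbin.org Note: see the official docs for everything else. 1. Requests with headers 2. Requests with cookies 3. Requests with Basic auth (auth)…

posted @ 2019-11-13 09:29 市丸银 Views(141) Comments(0) Recommended(0)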

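Installing xpath helper

Summary: 1. Download (version 2.02), link: https://pan.baidu.com/s/1YdyTbWElL904EMQ-9Ougnw extraction code: bxxa 2. Fix for an installation that does not take effect, reference: https://www.cnblogs.com/ljxh/p/11222898.html a. Change the file extension to rar b. Extract…

posted @ 2019-11-12 17:53 市丸银 Views(1221) Comments(0) Recommended(0)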

Disabling the Sogou input method's keyboard shortcuts

Summary: https://blog.csdn.net/shishu8385/article/details/87787465

posted @ 2019-11-05 17:29 市丸银 Views(298) Comments(0) Recommended(0)
