Post archive for April 30, 2019 - GuoZeping

004 - Data Types (String)

Summary: Data Types - String

posted @ 2019-04-30 23:42 GuoZeping Views (404) Comments (0) Recommended (0)

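002 - Operators

Summary: Operators. I. Arithmetic operators; II. Comparison (relational) operators; III. Assignment operators; IV. Logical operators; V. Bitwise operators; VI. Membership operators; VII. Identity operators; VIII. Operator precedence

posted @ 2019-04-30 23:42 GuoZeping Views (140) Comments (0) Recommended (0)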

Application Layer - HTTP/HTTPS

Summary: Application Layer - HTTP/HTTPS. I. HTTP; II. HTTPS

posted @ 2019-04-30 16:28 GuoZeping Views (249) Comments (0) Recommended (0)

Elasticsearch Installation on Windows

Summary: Elasticsearch Installation on Windows

posted @ 2019-04-30 13:05 GuoZeping Views (330) Comments (0) Recommended (0)

Elasticsearch Installation on Linux

Summary: Elasticsearch Installation on Linux

posted @ 2019-04-30 13:05 GuoZeping Views (196) Comments (0) Recommended (0)

Elasticsearch Introduction

Summary: Elasticsearch Introduction

posted @ 2019-04-30 13:04 GuoZeping Views (254) Comments (0) Recommended (0)

Elasticsearch Table of Contents

Summary: Elasticsearch Table of Contents

posted @ 2019-04-30 13:04 GuoZeping Views (253) Comments (0) Recommended (0)

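Distributed Crawler System

Summary: Distributed Crawler System. I. Architecture; II. How it works. 1. Distribution principle: distribution is implemented with scrapy-redis in a master-slave arrangement. The core server is called the master, and the machines that run the crawler program are called slaves. When crawling pages with the Scrapy framework, some start_urls must be supplied first; the crawler begins by visiting the URLs in start_urls ...

posted @ 2019-04-30 11:44 GuoZeping Views (1418) Comments (0) Recommended (0)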
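The master/slave arrangement in the summary above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SharedFrontier` is a hypothetical in-memory stand-in for the Redis-backed queue and duplicate filter that scrapy-redis provides, `LINKS` is an invented link graph in place of real pages, and the workers take turns in a single process instead of running on separate machines.

```python
from collections import deque


class SharedFrontier:
    """In-memory stand-in for a shared request queue plus duplicate filter."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def push(self, url):
        # Enqueue a URL only the first time it is seen.
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pop(self):
        # Return the next URL, or None when the queue is empty.
        return self._queue.popleft() if self._queue else None


# Hypothetical link graph standing in for pages fetched over the network.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": [],
    "https://example.com/c": ["https://example.com/"],
}


def crawl(start_urls, worker_names):
    frontier = SharedFrontier()
    for url in start_urls:  # the master seeds the shared queue
        frontier.push(url)

    handled = {name: [] for name in worker_names}
    turn = 0
    while True:
        url = frontier.pop()
        if url is None:
            break
        worker = worker_names[turn % len(worker_names)]  # slaves take turns
        turn += 1
        handled[worker].append(url)
        for link in LINKS.get(url, []):  # discovered links go back to the queue
            frontier.push(link)
    return handled


if __name__ == "__main__":
    print(crawl(["https://example.com/"], ["slave-1", "slave-2"]))
```

Because every worker pulls from and pushes to the same frontier, each URL is handled exactly once no matter which worker picks it up.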

Scrapy Incremental Crawler

Summary: Scrapy Incremental Crawler https://blog.csdn.net/mygodit/article/details/83931009 https://blog.csdn.net/mygodit/article/details/83896412 https://blog.csdn.net/qq_39

posted @ 2019-04-30 11:09 GuoZeping Views (176) Comments (0) Recommended (0)

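Data Storage: twisted

Summary: Data Storage: twisted. The adbapi.ConnectionPool method creates a database connection pool object that holds multiple connection objects, each working in its own thread. adbapi only provides a programming framework for asynchronous database access; internally it still uses a library such as mysql to access the database. dbpool.runInteraction(inse

posted @ 2019-04-30 10:54 GuoZeping Views (147) Comments (0) Recommended (0)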
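The idea described in the twisted summary above (a blocking database call handed to a worker thread, wrapped in one transaction) can be sketched with the standard library alone. This is not Twisted's API: `run_interaction` and `insert_item` are hypothetical helpers, `ThreadPoolExecutor` stands in for the adbapi thread pool, and sqlite3 stands in for MySQL. Unlike a real connection pool, this sketch opens a fresh connection per call to keep it short.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_interaction(pool, db_path, func, *args):
    """Run func(cursor, *args) on a worker thread inside one transaction."""

    def task():
        conn = sqlite3.connect(db_path)
        try:
            cursor = conn.cursor()
            result = func(cursor, *args)
            conn.commit()  # commit only if func finished without error
            return result
        except Exception:
            conn.rollback()  # undo partial work on failure
            raise
        finally:
            conn.close()

    return pool.submit(task)  # returns a Future, the caller is not blocked


def insert_item(cursor, title):
    cursor.execute("INSERT INTO items (title) VALUES (?)", (title,))


def demo():
    db_path = os.path.join(tempfile.mkdtemp(), "items.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE items (title TEXT)")
    conn.commit()
    conn.close()

    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            run_interaction(pool, db_path, insert_item, title)
            for title in ("a", "b", "c")
        ]
        for future in futures:
            future.result()  # re-raises any exception from the worker

    conn = sqlite3.connect(db_path)
    rows = sorted(row[0] for row in conn.execute("SELECT title FROM items"))
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

In Twisted the returned object would be a Deferred, and errors would be handled with an errback instead of `future.result()`.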

Data Storage: mongodb

Summary: Data Storage: mongodb
    from pymongo import MongoClient
    import os
    base_dir = os.getcwd()
    class MongoPipeline(object):
        # Class that saves items to a MongoDB database
        collection =

posted @ 2019-04-30 10:51 GuoZeping Views (172) Comments (0) Recommended (0)

Data Storage: redis

Summary: Data Storage: redis

posted @ 2019-04-30 10:51 GuoZeping Views (187) Comments (0) Recommended (0)

Data Storage: txt

Summary: Data Storage: txt

posted @ 2019-04-30 10:50 GuoZeping Views (236) Comments (0) Recommended (0)

Data Storage: mysql

Summary: Data Storage: mysql. I. MySQL synchronous storage; II. MySQL asynchronous storage
    from scrapy import log
    import pymysql
    import pymysql.cursors
    import codecs
    from twisted.enterprise impo

posted @ 2019-04-30 10:50 GuoZeping Views (280) Comments (0) Recommended (0)

Data Storage: csv

Summary: Data Storage: csv

posted @ 2019-04-30 10:49 GuoZeping Views (245) Comments (0) Recommended (0)

Data Storage: Json

Summary: Data Storage: Json. I. JsonLInesEx
    from scrapy.exporters import JsonLinesItemExporter
    class JsonLinesItemExporterPipeline(object):
        def __init__(self):
            se

posted @ 2019-04-30 10:44 GuoZeping Views (161) Comments (0) Recommended (0)
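The Json summary above is cut off before the pipeline body, so the author's version is not reproduced here. As a rough sketch of the same output format (one JSON object per line), the pipeline below swaps JsonLinesItemExporter for the standard-library `json` module. The class name `JsonLinesPipeline` and the default file name `items.jl` are assumptions; the method names follow Scrapy's item pipeline interface.

```python
import json


class JsonLinesPipeline:
    """Write each item as one JSON object per line (JSON Lines format)."""

    def __init__(self, path="items.jl"):  # hypothetical default file name
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps non-ASCII text readable in the file.
        line = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(line + "\n")
        return item  # hand the item on to any later pipeline
```

Writing one object per line means the file can be appended to and read back line by line without loading everything into memory.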

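start_requests

Summary: start_requests. Before simplifying, we need to define a method, start_requests(self), and send the requests in a loop from inside it. After simplifying, those links can simply be listed in the start_urls constant. Isn't that a lot less work, and isn't life suddenly that much sweeter? But when God opens one door for you, he closes another: with the simplified approach ...

posted @ 2019-04-30 10:31 GuoZeping Views (2803) Comments (0) Recommended (0)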
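The relationship between the two forms in the start_requests summary above can be shown with a toy model that runs without Scrapy installed. `Request`, `ToySpider`, and the example.com URLs are hypothetical stand-ins, not Scrapy classes. The base class mirrors the documented default behaviour, where each entry in start_urls becomes one request, and the explicit override shows the extra control the longer form gives.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Toy stand-in for a crawler request object."""
    url: str
    method: str = "GET"


class ToySpider:
    """Toy base class: by default, one GET request per entry in start_urls."""
    start_urls = []

    def start_requests(self):
        for url in self.start_urls:
            yield Request(url)


class ShortcutSpider(ToySpider):
    # Simplified form: only list the URLs and rely on the default method.
    start_urls = [
        "https://example.com/page/1",
        "https://example.com/page/2",
    ]


class ExplicitSpider(ToySpider):
    # Explicit form: build every request by hand, so each one can be customized.
    def start_requests(self):
        for page in (1, 2):
            yield Request(f"https://example.com/page/{page}", method="POST")


if __name__ == "__main__":
    for spider in (ShortcutSpider(), ExplicitSpider()):
        print([(r.method, r.url) for r in spider.start_requests()])
```

Both spiders request the same two URLs; only the explicit one can change how each request is built.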

Scrapy Command-Line Tool

Summary: Scrapy Command-Line Tool. I. Scrapy global commands
    scrapy startproject project_name (create a project)
    scrapy crawl xx (run the spider named xx)
    scrapy shell http://www.scrapyd.cn (debug the URL http:www.sc

posted @ 2019-04-30 10:22 GuoZeping Views (186) Comments (0) Recommended (0)

vim Command Collection

Summary: vim Command Collection

posted @ 2019-04-30 10:15 GuoZeping Views (201) Comments (0) Recommended (0)

custom_setting

Summary: custom_setting. I. Definition; II. Configuration. 1. middlewares
    # SeleniumMiddlerware middleware, not enabled globally
    from selenium import webdriver
    from selenium.common.exceptions import Time

posted @ 2019-04-30 09:25 GuoZeping Views (816) Comments (0) Recommended (0)