Crawler framework study notes
1. The difference between request.POST and request.body
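views.py
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


# Assumes Django's default CsrfViewMiddleware is on: the requests client below
# sends no CSRF token, so without this decorator Django would answer 403.
@csrf_exempt
def index(request):
    print(request.body)  # raw request body, e.g. b'username=alex&password=123'
    print(request.POST)  # only filled for form-encoded/multipart bodies; JSON leaves it empty
    return HttpResponse('......')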
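Because request.POST is only populated for form-encoded (and multipart) bodies, a view that accepts JSON has to decode request.body itself, typically with json.loads(request.body). The sketch below illustrates that decision with the standard library only, so it runs without Django; the helper name read_payload is hypothetical and is not a Django API.

```python
import json
from urllib.parse import parse_qs


def read_payload(content_type: str, body: bytes) -> dict:
    """Hypothetical helper: decode a request body according to its content type.

    Form bodies are parsed into lists of values (like Django's QueryDict);
    JSON bodies must be decoded explicitly, which request.POST never does.
    """
    # Drop parameters such as "; charset=utf-8" from the content type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/x-www-form-urlencoded":
        return parse_qs(body.decode("utf-8"))
    if media_type == "application/json":
        return json.loads(body)
    raise ValueError(f"unsupported content type: {media_type}")


form = read_payload("application/x-www-form-urlencoded", b"username=alex&password=123")
data = read_payload("application/json", b'{"username": "alex", "password": 123}')
print(form)  # {'username': ['alex'], 'password': ['123']}
print(data)  # {'username': 'alex', 'password': 123}
```

Inside an actual Django view, the JSON branch corresponds to json.loads(request.body), and request.content_type supplies the media type.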
Simulating the POST request with the requests module
import requests

# Sent form-encoded (application/x-www-form-urlencoded):
#   request.body -> b'username=alex&password=123'
#   request.POST -> <QueryDict: {'username': ['alex'], 'password': ['123']}>
r1 = requests.post(
    url="http://127.0.0.1:8000/index/",
    data={"username": "alex", "password": 123},
)
print(r1.text)
# Sent as a JSON string (application/json), so request.POST gets no data:
#   request.body -> b'{"username": "alex", "password": 123}'
#   request.POST -> <QueryDict: {}>
r2 = requests.post(
    url="http://127.0.0.1:8000/index/",
    json={"username": "alex", "password": 123},
)
print(r2.text)
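To see why the two calls produce different bodies, the sketch below rebuilds them with the standard library instead of calling requests: for a dict passed as data=, requests form-encodes it and sets Content-Type: application/x-www-form-urlencoded; for json= it serializes with json.dumps and sets Content-Type: application/json. The function build_body is a hypothetical stand-in that approximates this behaviour, so no server is needed.

```python
import json
from urllib.parse import urlencode


def build_body(payload: dict, as_json: bool):
    """Hypothetical sketch: return (content_type, body) roughly as requests builds them."""
    if as_json:
        # json=payload -> JSON text; Django leaves request.POST empty for this type
        return "application/json", json.dumps(payload).encode("utf-8")
    # data=payload -> key=value pairs joined by '&'; Django parses this into request.POST
    return "application/x-www-form-urlencoded", urlencode(payload, doseq=True).encode("utf-8")


payload = {"username": "alex", "password": 123}
form_type, form_body = build_body(payload, as_json=False)
json_type, json_body = build_body(payload, as_json=True)
print(form_type, form_body)  # application/x-www-form-urlencoded b'username=alex&password=123'
print(json_type, json_body)  # application/json b'{"username": "alex", "password": 123}'
```

The Content-Type header, not the body itself, is what Django checks before deciding whether to fill request.POST.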
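2. Installing Scrapy
These steps target Windows: the Twisted wheel in step c is built for 64-bit CPython 3.6 (cp36, win_amd64), so download the wheel that matches your own Python version.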
a. Download Twisted
http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted
b. Install wheel
pip3 install wheel
c. Install Twisted (run this from the directory the Twisted wheel was downloaded to)
pip3 install Twisted-18.7.0-cp36-cp36m-win_amd64.whl
d. Install pywin32
pip3 install pywin32
e. Install Scrapy
pip3 install scrapy
f. Create a crawler project
scrapy startproject xzx
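Create a spider (genspider plays roughly the role that startapp plays in Django; the arguments are the spider name and the domain to crawl)
scrapy genspider chouti chouti.com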
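Before creating the project, it can help to confirm that the packages from steps a to e are importable. The sketch below is a minimal check using importlib.util.find_spec; the mapping from pip names to import names (for example pywin32 is imported as win32api, which exists only on Windows) is an assumption to adjust for your environment, and check_installed is a hypothetical helper name.

```python
import importlib.util

# pip package name -> module name it is imported as (assumed mapping).
REQUIRED = {
    "wheel": "wheel",
    "Twisted": "twisted",
    "pywin32": "win32api",  # only present on Windows
    "Scrapy": "scrapy",
}


def check_installed(required):
    """Map each pip package name to whether its module can be found (without importing it)."""
    return {pkg: importlib.util.find_spec(mod) is not None for pkg, mod in required.items()}


for pkg, ok in check_installed(REQUIRED).items():
    print(f"{pkg:8} {'ok' if ok else 'MISSING'}")
```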
1. Scrapy vs. Django: creating a project
How do you create a Django program?
    django-admin startproject mysite
    cd mysite
    python manage.py startapp app01
    python manage.py runserver
How do you create a Scrapy program?
    scrapy startproject xzx
    cd xzx
    scrapy genspider chouti chouti.com
    scrapy crawl chouti --nolog    (--nolog suppresses Scrapy's log output)
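The last two Scrapy steps (cd xzx, then scrapy crawl chouti --nolog) can also be driven from a Python script, which is handy when launching the crawl from an IDE. The sketch below shells out with subprocess rather than using Scrapy's own Python API; it assumes the scrapy command is on PATH, and crawl_command and run_crawl are hypothetical helper names.

```python
import subprocess
from typing import List


def crawl_command(spider: str, nolog: bool = True) -> List[str]:
    """Build the argv for `scrapy crawl <spider>`, optionally adding --nolog."""
    cmd = ["scrapy", "crawl", spider]
    if nolog:
        cmd.append("--nolog")  # suppress Scrapy's log output
    return cmd


def run_crawl(spider: str, project_dir: str) -> int:
    """Run the crawl inside the project directory (the one containing scrapy.cfg)."""
    result = subprocess.run(crawl_command(spider), cwd=project_dir)
    return result.returncode
```

For the project above, run_crawl("chouti", "xzx") would be equivalent to running the two shell commands by hand.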