POST requests in Scrapy
Scrapy sends GET requests by default. To send a POST request, specify it in the method argument. A common form is:
scrapy.Request(url=url,method="POST",headers=self.headers,callback=self.get_goods_list)
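POST requests usually carry form parameters, though, and there are two ways to supply them. Both are described here.
1. FormRequest
A plain scrapy.Request covers ordinary requests, but when you need to simulate a form submission or an Ajax POST you can use FormRequest, a subclass of Request. It takes a formdata argument dedicated to form fields, and when formdata is supplied the method defaults to POST.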
scrapy.FormRequest(url=url,formdata=formdata,cookies=self.cookie,headers=self.headers,callback=self.get_goods_list)
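Note that formdata here is a dict, and it must not contain numbers: Scrapy converts each key and value with to_bytes, which accepts only str or bytes, so any number has to be quoted as a string.
For example: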
formdata = {"mode": "list", "year": "default", "prev": "false", "side_year": ""}
yield FormRequest(url=new_url, formdata=formdata, callback=self.parse_category, meta=meta)
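Because every formdata value has to be a string, it can help to normalize the dict before building the request. The following is a minimal sketch of one way to do that. stringify_formdata is a hypothetical helper name, not part of Scrapy, and the "page" and "ids" fields are made up for illustration.

```python
def stringify_formdata(formdata):
    """Return a copy of formdata in which every non-string value is converted to str."""
    cleaned = {}
    for key, value in formdata.items():
        if isinstance(value, (list, tuple)):
            # Multi-valued field: convert each element individually.
            cleaned[key] = [v if isinstance(v, (str, bytes)) else str(v) for v in value]
        elif isinstance(value, (str, bytes)):
            cleaned[key] = value
        else:
            cleaned[key] = str(value)
    return cleaned


# "page" and "ids" are hypothetical fields used only to show the conversion.
formdata = stringify_formdata({"mode": "list", "page": 2, "ids": [10, 11]})
# formdata can now be passed as FormRequest(url=..., formdata=formdata, ...)
```

The helper leaves keys untouched and only rewrites values, so a dict that is already all strings passes through unchanged.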
The FormRequest documentation describes it as follows:
The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.
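In other words, FormRequest adds one new parameter, formdata, which accepts a dict or an iterable of (key, value) tuples holding the form data and url-encodes it into the request body. FormRequest itself inherits from Request.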
class FormRequest(Request):
    def __init__(self, *args, **kwargs):
        formdata = kwargs.pop('formdata', None)
        if formdata and kwargs.get('method') is None:
            kwargs['method'] = 'POST'
        super(FormRequest, self).__init__(*args, **kwargs)
        if formdata:
            items = formdata.items() if isinstance(formdata, dict) else formdata
            querystr = _urlencode(items, self.encoding)
            if self.method == 'POST':
                self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                self._set_body(querystr)
            else:
                self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)

###

def _urlencode(seq, enc):
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)
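So the {'key': 'value', 'k': 'v'} we pass in ends up as 'key=value&k=v', and the method defaults to POST. With any other method, the same string is appended to the URL as a query string, which suggests that when there are only a few parameters it can be more convenient to put them directly in the URL.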
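The encoding step can be reproduced with the standard library alone. The sketch below approximates what the source above does, assuming str keys and values. encode_form and append_query are hypothetical names chosen for illustration, not Scrapy functions, and the example.com URL is a placeholder.

```python
from urllib.parse import urlencode


def encode_form(formdata, encoding="utf-8"):
    """Approximate FormRequest's url-encoding of formdata (str keys and values only)."""
    items = formdata.items() if isinstance(formdata, dict) else formdata
    pairs = []
    for key, values in items:
        # A list or tuple value yields one key=value pair per element.
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            pairs.append((key.encode(encoding), value.encode(encoding)))
    return urlencode(pairs, doseq=True)


def append_query(url, querystr):
    """Attach an encoded query string to a URL, as the non-POST branch does."""
    return url + ("&" if "?" in url else "?") + querystr


body = encode_form({"key": "value", "k": "v"})           # used as the POST body
get_url = append_query("http://example.com/list", body)  # used when the method is not POST
```

The separator logic in append_query is why a URL that already has a query string gets "&" rather than a second "?".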
Now let's look at Request.
2. Request
scrapy.Request(url=url,method="POST",body=formdata,cookies=self.cookie,headers=self.headers,callback=self.get_goods_list)
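Here, though, body must be a string (or bytes), not a dict. If the endpoint expects a JSON payload and your data is still a dict, serialize it to a string with json.dumps() first.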
import json  # at the top of the spider module
formdata = {"mode": "list", "year": "default", "prev": "false", "side_year": ""}
yield Request(url=new_url, body=json.dumps(formdata), callback=self.parse_category, meta=meta)
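The two approaches put different bytes on the wire, so which one to use depends on what the server parses. The sketch below compares the two bodies for the same dict and builds keyword arguments for a JSON POST. json_post_kwargs is a hypothetical helper, the example.com URL is a placeholder, and the Content-Type header is an assumption about what the endpoint requires.

```python
import json
from urllib.parse import urlencode


def json_post_kwargs(url, payload):
    """Build keyword arguments for a JSON POST request (hypothetical helper)."""
    return {
        "url": url,
        "method": "POST",
        "body": json.dumps(payload),
        # Assumption: the endpoint requires this header for JSON bodies.
        "headers": {"Content-Type": "application/json"},
    }


formdata = {"mode": "list", "year": "default", "prev": "false", "side_year": ""}
kwargs = json_post_kwargs("http://example.com/api", formdata)

json_body = kwargs["body"]       # JSON text, what Request(body=json.dumps(...)) sends
form_body = urlencode(formdata)  # url-encoded text, what FormRequest(formdata=...) sends

# Inside a spider callback:
#     yield scrapy.Request(callback=self.parse_category, **kwargs)
```

Newer Scrapy releases also include a JsonRequest class for JSON bodies; check the documentation for your version before relying on it.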