Python3 Web Crawlers (2)

Basic usage of the Scrapy framework

Install the scrapy module:

pip install scrapy
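
A quick sanity check that the install worked: Scrapy ships a command-line tool, and asking it for its version should print the installed release.

scrapy version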

Change into the directory where the project should live and create the project:

scrapy startproject scrapyDemo
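
The command generates a project skeleton. With a recent Scrapy release it should look roughly like the layout below (exact files may vary slightly between versions):

scrapyDemo/
    scrapy.cfg            # project configuration file
    scrapyDemo/           # the project's Python package
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider / downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # spiders go in this directory
            __init__.py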

Go into the spiders directory and create scrapy_demo.py:

import scrapy
from bs4 import BeautifulSoup

class tsSpider(scrapy.Spider):
    # The spider's name, referenced later by "scrapy crawl demo"
    name = "demo"

    def start_requests(self):
        # Seed URL(s) plus a browser-like User-Agent header
        urls = ['https://www.cnblogs.com/', ]
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
        for url in urls:
            yield scrapy.Request(url=url, headers=headers, callback=self.parse)

    def parse(self, response):
        # Parse the downloaded page and print each post title
        soup = BeautifulSoup(response.body, "html.parser")
        titles = soup.find_all("a", "titlelnk")
        for title in titles:
            print(title.string)
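
BeautifulSoup is not strictly needed here; as a minimal alternative sketch, the same titles can be pulled with Scrapy's built-in CSS selectors (response.css is part of the standard Response API):

    def parse(self, response):
        # Same extraction using Scrapy's own selectors instead of BeautifulSoup
        for title in response.css("a.titlelnk::text").extract():
            print(title)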

Run the spider from inside the project directory:

scrapy crawl demo
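
The spider above only prints the titles. As a rough sketch, if parse yields dicts instead of printing, Scrapy's feed exporter can save the results to a file (the filename titles.json is just an example):

    def parse(self, response):
        soup = BeautifulSoup(response.body, "html.parser")
        for title in soup.find_all("a", "titlelnk"):
            # Yielding dicts lets the -o feed export collect them
            yield {"title": title.string}

Then run with an output file:

scrapy crawl demo -o titles.json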

 

posted @ 2018-04-22 16:09  背向我煮面