python全栈(中国历史天气可视化系统)

 

 

在前面博客中写过python爬虫爬取各个城市历史天气及数据可视化,感觉还是有点粗糙,本文章是借这个例子抛砖引玉,写出个更NB的python全栈项目。

此项目预览页面全国各城市天气可视化系统

scrapy抓取全国各城市的历史天气

创建项目及准备工作

创建项目

scrapy startproject lishitianqi

创建爬虫脚本文件(考虑到tianqihoubao.com访问速度实在是太慢,现在换用lishi.tianqi.com

cd lishitianqi
scrapy genspider weath lishi.tianqi.com

测试该url是否可以直接爬取 

scrapy crawl weath
...
2019-09-09 13:22:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 http://lishi.tianqi.com/>: HTTP status code is not handled or not allowed
...

在测试是否是user-agent导致

In [15]: import requests

In [16]: requests.get("https://lishi.tianqi.com/")
Out[16]: <Response [403]>

In [17]: requests.get("https://lishi.tianqi.com/",headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl
    ...: eWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'})
Out[17]: <Response [200]>

在setings.py中设置user-agent的值

USER_AGENT='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'

编写爬虫文件获取天气数据

请求http://lishi.tianqi.com/后将返回全国所有城市页面,通过css选择器选择所有城市url,并添加到爬虫中

def parse(self, response):
        #选择中国所有城市名及对应url
        citys = response.css(".bcity a[target='_blank']::text").getall()
        cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall()

        #将url添加到爬虫中,
        for city,url in zip(citys,cityurls):
            yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city)

    def city(self, response):
        pass

请求http://lishi.tianqi.com/beijing/index.html,返回北京2011年1月-至今的所有月份页面,通过css选择器选择该城市所有月份url,并添加到爬虫中

def city(self, response):
        #选择城市所有年月份及对应url
        months = response.css(".tqtongji1 a::text").getall()
        monthurls = response.css(".tqtongji1 a::attr(href)").getall()

        #将url添加到爬虫中
        for url in monthurls:
            yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month)

    def month(self, response):
        pass

请求http://lishi.tianqi.com/beijing/201101.html,返回北京2011年1月天气页面,通过css选择器选择该月的每天的天气

    def month(self, response):
        #选择该月份中每天的天气数据
        weaths =  response.css(".tqtongji2 ul[class!='t1'] li::text").getall()
        #日期
        date = response.css(".tqtongji2 ul[class!='t1'] a::text").getall()
        #最大温度
        maxt = weaths[0::5]
        #最小温度
        mint = weaths[1::5]
        #天气
        weath = weaths[2::5]
        #风向
        windd = weaths[3::5]
        #风力
        windp = weaths[4::5]

        #打印该数据
        for date, maxt, mint, weath, windd, windp in zip(date, maxt, mint, weath, windd, windp):
            print(response.meta['city'],date, maxt, mint, weath, windd, windp)

 

scrapy数据存储到MariaDB

创建数据库和表

MariaDB [(none)]> create database lishitianqi;
Query OK, 1 row affected (0.209 sec)

MariaDB [(none)]> use lishitianqi;
Database changed
MariaDB [lishitianqi]> CREATE TABLE `lishitianqi` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `city` varchar(16) NOT NULL,
  `date` date NOT NULL,
  `maxt` int(11) NOT NULL,
  `mint` int(11) NOT NULL,
  `weath` varchar(20) NOT NULL,
  `windd` varchar(20) NOT NULL,
  `windp` varchar(40) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `city` (`city`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=30673 DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (2.030 sec)

编写模型文件items.py

class LishitianqiItem(scrapy.Item):
    # define the fields for your item here like:
    city = scrapy.Field()
    date = scrapy.Field()
    maxt = scrapy.Field()
    mint = scrapy.Field()
    weath = scrapy.Field()
    windd = scrapy.Field()
    windp = scrapy.Field()

修改爬虫文件weath.py

 1 # -*- coding: utf-8 -*-
 2 import scrapy
 3 from lishitianqi.items import LishitianqiItem
 4 
 5 class WeathSpider(scrapy.Spider):
 6     name = 'weath'
 7     allowed_domains = ['lishi.tianqi.com']
 8     start_urls = ['http://lishi.tianqi.com/']
 9 
10     def parse(self, response):
11         #选择中国所有城市名及对应url
12         citys = response.css(".bcity a[target='_blank']::text").getall()
13         cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall()
14 
15         #将url添加到爬虫中,
16         for city,url in zip(citys[:1],cityurls[:1]):
17             yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city)
18 
19     def city(self, response):
20         #选择城市所有年月份及对应url
21         months = response.css(".tqtongji1 a::text").getall()
22         monthurls = response.css(".tqtongji1 a::attr(href)").getall()
23 
24         #将url添加到爬虫中
25         for url in monthurls:
26             yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month)
27 
28     def month(self, response):
29         #选择该月份中每天
30         for ul in response.css(".tqtongji2 ul[class!='t1']"):
31             #创建item,将数据传递个pipeline
32             item = LishitianqiItem()
33             #获取城市名
34             item['city'] = response.meta['city']
35             
36             a,b,c,d,e,f = ul.css('li')
37             #获取日期
38             item['date'] = a.css("::text").get()
39             #获取日期
40             item['maxt'] = b.css("::text").get()
41             #获取日期
42             item['mint'] = c.css("::text").get()
43             #获取日期
44             item['weath'] = d.css("::text").get()
45             #获取日期
46             item['windd'] = e.css("::text").get()
47             #获取日期
48             item['windp'] = f.css("::text").get()
49 
50             yield item
View Code

编写项目管道中间件pipeline.py

import pymysql.cursors

class LishitianqiPipeline(object):

    def __init__(self):
        #连接数据库
        self.connect = pymysql.connect(
            host='localhost',
            port=3306,
            db='lishitianqi',
            user='root',
            passwd='123',
            charset='utf8',
            use_unicode=True)

        #通过cursor执行增删该查
        self.cursor = self.connect.cursor()

    def process_item(self, item, spider):

        #插入天气数据
        self.cursor.execute(
            """insert into lishitianqi(city, date, maxt, mint, weath, windd, windp)
            values (%s, %s, %s, %s, %s, %s, %s)""",
            (item['city'],
            item['date'],
            item['maxt'],
            item['mint'],
            item['weath'],
            item['windd'],
            item['windp']))

        #提交sql语句
        self.connect.commit()

        #必须实现返回
        return item

在settings.py中启用mysqlPipeline

ITEM_PIPELINES = {
    'lishitianqi.pipelines.LishitianqiPipeline': 300,
    #格式为:'项目名.文件名.类名':优先级(越小越大)
}

爬虫性能优化

首先看一下目标数据有多大,估计要多长时间能抓取完

查看有多少个城市

#scrapy shell https://lishi.tianqi.com/
In [1]: citys = response.css(".bcity a[target='_blank']::text").getall()

In [2]: len(citys)
Out[2]: 3147

将weath.py的16行代码修改为for city,url in zip(citys[:1],cityurls[:1]):,运行爬虫,验证爬取一个城市数据耗时

2019-09-09 17:37:52 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
... 2019-09-09 17:41:54 [scrapy.core.engine] INFO: Spider closed (finished)

一个城市耗时4分2s,共需要8.7天才能抓取到所有数据。在我前面写MySQL日志详解时,将二进制日志打开了,这个对性能影响比较大

关掉二进制日志,在重启服务

[mysqld]
#log_bin=bin_log_file
#binlog_format=row
#sync_binlog=1
#server_id=7  

查看是否生效

MariaDB [(none)]> show variables like "log_bin";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | OFF   |
+---------------+-------+
1 row in set (0.001 sec)

运行爬虫,验证关闭了二进制日志后抓取一个城市数据耗时

2019-09-09 17:50:14 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
...
2019-09-09 17:53:01 [scrapy.core.engine] INFO: Spider closed (finished)

耗时2分47秒,这个数据还是有点大,验证不插入数据库纯爬取网页的耗时,将weath.py第50行代码注释掉

			#yield item

运行爬虫,验证不写入数据库,抓取一个城市数据耗时  

2019-09-09 18:40:07 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
2019-09-09 18:40:12 [scrapy.core.engine] INFO: Spider closed (finished)

不写入数据库只需要5s,看来还是要在插入数据库效率上找原因。

在上面的代码实现,是以天为单位调用插入接口,可以按月批量插入来提上效率

修改后的weath.py

 1 # -*- coding: utf-8 -*-
 2 import scrapy
 3 from lishitianqi.items import LishitianqiItem
 4 
 5 class WeathSpider(scrapy.Spider):
 6     name = 'weath'
 7     allowed_domains = ['lishi.tianqi.com']
 8     start_urls = ['http://lishi.tianqi.com/']
 9 
10     def parse(self, response):
11         #选择中国所有城市名及对应url
12         citys = response.css(".bcity a[target='_blank']::text").getall()
13         cityurls = response.css(".bcity a[target='_blank']::attr(href)").getall()
14 
15         #将url添加到爬虫中,
16         for city,url in zip(citys[:1],cityurls[:1]):
17             yield scrapy.Request(response.urljoin(url), meta={'city':city}, callback=self.city)
18 
19     def city(self, response):
20         #选择城市所有年月份及对应url
21         months = response.css(".tqtongji1 a::text").getall()
22         monthurls = response.css(".tqtongji1 a::attr(href)").getall()
23 
24         #将url添加到爬虫中
25         for url in monthurls:
26             yield scrapy.Request(response.urljoin(url), meta={'city':response.meta['city']}, callback=self.month)
27 
28     def month(self, response):
29         data = []
30         #选择该月份中每天
31         for ul in response.css(".tqtongji2 ul[class!='t1']"):
32             data.append(tuple([response.meta['city']] + [li.css("::text").get() for li in ul.css('li')]))
33 
34         #创建item,将数据传递个pipeline
35         item = LishitianqiItem()
36         item['data'] = data
37         yield item
View Code

修改后的items.py

 1 # -*- coding: utf-8 -*-
 2 
 3 # Define here the models for your scraped items
 4 #
 5 # See documentation in:
 6 # https://docs.scrapy.org/en/latest/topics/items.html
 7 
 8 import scrapy
 9 
10 class LishitianqiItem(scrapy.Item):
11     # define the fields for your item here like:
12     city = scrapy.Field()
13     date = scrapy.Field()
14     maxt = scrapy.Field()
15     mint = scrapy.Field()
16     weath = scrapy.Field()
17     windd = scrapy.Field()
18     windp = scrapy.Field()
19     data = scrapy.Field()
View Code

修改后的pipelines.py

 1 # -*- coding: utf-8 -*-
 2 
 3 # Define your item pipelines here
 4 #
 5 # Don't forget to add your pipeline to the ITEM_PIPELINES setting
 6 # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
 7 
 8 import pymysql.cursors
 9 
10 class LishitianqiPipeline(object):
11 
12     def __init__(self):
13         #连接数据库
14         self.connect = pymysql.connect(
15             host='localhost',
16             port=3306,
17             db='lishitianqi',
18             user='root',
19             passwd='123',
20             charset='utf8',
21             use_unicode=True)
22 
23         #通过cursor执行增删该查
24         self.cursor = self.connect.cursor()
25 
26     def process_item(self, item, spider):
27 
28         #插入天气数据
29         self.cursor.executemany(
30             """insert ignore into lishitianqi(city, date, maxt, mint, weath, windd, windp)
31             values (%s, %s, %s, %s, %s, %s, %s)""", item['data'])
32 
33         #提交sql语句
34         self.connect.commit()
35 
36         #必须实现返回
37         return item
View Code

抓取一个城市数据耗时

2019-09-09 19:30:45 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: lishitianqi)
2019-09-09 19:30:56 [scrapy.core.engine] INFO: Spider closed (finished)

耗时11s,抓取所有数据共计9.6小时,这个时间可以接受了,睡个觉第二天就好了

数据可靠性优化及修复

 从网站抓取来的数据并非可靠,在前面代码中或许你们发现了,在抓取每天天气数据只使用了一次css选择,然后切片分出各种类型的天气,

但是这种方式严格依赖数据格式,如果少了一个数据就会导致数据整体偏移,切片出来的结果混乱,后面我多次css选择解决了这个问题。

除了上面描述的一天中某种类型天气数据缺失,还可能有某天数据缺失,比如:

MariaDB [lishitianqi]> select * from lishitianqi where city="苏右旗" and date like "2011-01%";
+-------+--------+------------+------+------+-----------+---------------+-------------+
| id    | city   | date       | maxt | mint | weath     | windd         | windp       |
+-------+--------+------------+------+------+-----------+---------------+-------------+
| 20905 | 苏右旗 | 2011-01-01 |  -13 |  -23 | 晴        | 东北风~东风   | 4-5级       |
| 20906 | 苏右旗 | 2011-01-02 |  -12 |  -23 | 多云      | 东风~西南风   | 5-6级~4-5级 |
| 20907 | 苏右旗 | 2011-01-03 |  -13 |  -24 | 多云      | 西南风~西北风 | 4-5级~5-6级 |
| 20908 | 苏右旗 | 2011-01-04 |  -15 |  -25 | 小雪~多云 | 西北风        | 5-6级~4-5级 |
| 20909 | 苏右旗 | 2011-01-05 |  -17 |  -24 | 多云      | 西北风        | 5-6级~4-5级 |
| 20910 | 苏右旗 | 2011-01-06 |  -15 |  -22 | 晴        | 西北风~西风   | 5-6级~4-5级 |
| 20911 | 苏右旗 | 2011-01-07 |  -16 |  -24 | 晴        | 西北风        | 5-6级~4-5级 |
| 20912 | 苏右旗 | 2011-01-08 |  -18 |  -27 | 晴        | 西风          | 5-6级~4-5级 |
| 20913 | 苏右旗 | 2011-01-09 |  -12 |  -23 | 多云      | 西风          | 5-6级~4-5级 |
| 20914 | 苏右旗 | 2011-01-10 |  -16 |  -23 | 晴        | 西风          | 5-6级~4-5级 |
| 20915 | 苏右旗 | 2011-01-11 |  -16 |  -28 | 晴~多云   | 西风          | 4-5级       |
| 20916 | 苏右旗 | 2011-01-12 |  -17 |  -27 | 小雪~多云 | 西北风        | 5-6级~4-5级 |
| 20917 | 苏右旗 | 2011-01-13 |  -18 |  -26 | 多云      | 西北风        | 4-5级~5-6级 |
| 20918 | 苏右旗 | 2011-01-14 |  -14 |  -28 | 晴        | 北风~东北风   | 4-5级       |
| 20919 | 苏右旗 | 2011-01-15 |  -15 |  -25 | 多云~晴   | 西北风~北风   | 5-6级~4-5级 |
| 20920 | 苏右旗 | 2011-01-16 |  -18 |  -30 | 小雪~多云 | 西北风~西风   | 5-6级~4-5级 |
| 20921 | 苏右旗 | 2011-01-26 |  -12 |  -26 | 晴        | 西北风        | 5-6级       |
| 20922 | 苏右旗 | 2011-01-29 |  -14 |  -25 | 多云~晴   | 西北风        | 4-5级~5-6级 |
| 20923 | 苏右旗 | 2011-01-30 |  -10 |  -19 | 晴        | 西北风        | 4-5级~5-6级 |
| 20924 | 苏右旗 | 2011-01-31 |   -6 |  -19 | 晴        | 西风          | 4-5级~5-6级 |
+-------+--------+------------+------+------+-----------+---------------+-------------+
20 rows in set (0.002 sec)

其中缺失了17日-25日数据,我的解决方案是:若有缺失数据,尝试从备用网站获取,将数据补全,若备用网站没有该城市或同样缺失该数据的按缺失区间取平均值补全。

备用网站tianqihoubao.com,创建一个数据修复蜘蛛

scrapy genspider recovery tianqihoubao.com

https://lishi.tianqi.com/支持查询的所有城市列表存入citys_primary.txt

#scrapy shell https://lishi.tianqi.com/
...
In [1]: citys = response.css(".bcity a[target='_blank']::text").getall()

In [2]: f = open("citys_primary.txt","+w",encoding="utf-8")

In [3]: f.write(str(citys))
Out[3]: 19326

In [4]: f.close()

http://www.tianqihoubao.com/lishi/支持查询的所有城市列表存入citys_backup.txt

scrapy shell http://www.tianqihoubao.com/lishi/
...
In [9]: text = response.body.decode("GB2312")

In [10]: from scrapy.selector import Selector

In [11]: sel = Selector(text=text)

In [12]: citys = sel.css(".citychk dd a::text").getall()

In [13]: citys = [city.strip(' ') for city in citys] In [14]: f = open("citys_backup.txt","+w",encoding="utf-8") In [15]: f.write(str(citys)) Out[15]: 2601 In [16]: f.close()

 爬虫文件recovery.py

 1 # -*- coding: utf-8 -*-
 2 import scrapy
 3 import re
 4 from lishitianqi.items import LishitianqiItem
 5 
 6 class RecoverySpider(scrapy.Spider):
 7     name = 'recovery'
 8     allowed_domains = ['tianqihoubao.com']
 9 
10     def parse(self, response):
11         tableTitle = response.css(".wdetail h1::text").get()
12         #获取城市名,为了防止出现空列表导致KeyError故用join,''.join([]) == ''
13         city = ''.join(re.findall("([\u4E00-\u9FA5]+)历史天气预报",tableTitle))
14 
15         data = []
16         for tr in response.css(".wdetail table tr")[1:]:
17             tds = tr.css("td")
18             #获取日期
19             date = tds[0].css("a::text").get()
20             dl = re.findall("(\d{4})年(\d{2})月(\d{2})日",date)
21             date = '-'.join(dl[0])  if dl!= [] else '1990-00-00'
22 
23             #获取天气
24             weath = tds[1].css("::text").get()
25             wl = re.findall("[\u4E00-\u9FA5]+", weath)
26             weath = '~'.join(wl) if len(wl) == 2 else '未知~未知'
27 
28             #获取最低、最高气温
29             tem = tds[2].css("::text").get()
30             tl = re.findall("(\d+|-\d+)℃", tem)
31             maxt,mint = tl if len(tl) == 2 else [0,0]
32 
33             #获取风向,风速度
34             win = tds[3].css("::text").get()
35             wil = re.findall("[\u4E00-\u9FA50-9-\u2265\u2264]+", win)
36             wil = [re.sub('[≤]','小于等于', w) for w in wil]
37             wil = [re.sub('[≥]','大于等于', w) for w in wil]
38             windd,windp = ('~'.join(wil[0::2]),'~'.join(wil[1::2])) if len(wil) == 4 else ("未知","未知")
39 
40             data.append([city, date, maxt, mint, weath, windd, windp])
41 
42         #创建item,将数据传递个pipeline
43         item = LishitianqiItem()
44         item['data'] = data
45         yield item
View Code 

修改pipelines.py文件

 1 # -*- coding: utf-8 -*-
 2 
 3 # Define your item pipelines here
 4 #
 5 # Don't forget to add your pipeline to the ITEM_PIPELINES setting
 6 # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
 7 
 8 import pymysql.cursors
 9 
10 #用于获取某年某月有多少天, 
11 import calendar
12 
13 #用于获取汉字的拼音, pip install xpinyin
14 from xpinyin import Pinyin
15 
16 
17 class LishitianqiPipeline(object):
18 
19     def __init__(self):
20         #连接数据库
21         self.connect = pymysql.connect(
22             host='localhost',
23             port=3306,
24             db='lishitianqi',
25             user='root',
26             passwd='123',
27             charset='utf8',
28             use_unicode=True)
29 
30         #通过cursor执行增删该查
31         self.cursor = self.connect.cursor()
32 
33     def process_item(self, item, spider):
34 
35         #插入天气数据
36         self.cursor.executemany(
37             """insert ignore into lishitianqi(city, date, maxt, mint, weath, windd, windp)
38             values (%s, %s, %s, %s, %s, %s, %s)""", item['data'])
39 
40         #提交sql语句
41         self.connect.commit()
42 
43         #必须实现返回
44         return item
45 
46     #在爬虫运行前执行这个函数
47     def open_spider(self, spider):
48 
49         if spider.name == 'recovery' and True:
50             #读取https://lishi.tianqi.com/和http://www.tianqihoubao.com/lishi/支持查询的所有城市列表存, eval将字符串变为python语句, 转换为set用于取交集
51             citys_primary = set(eval(open("citys_primary.txt","r",encoding="utf-8").read()))
52             citys_backup = set(eval(open("citys_backup.txt","r",encoding="utf-8").read()))
53             startyear = 2011
54             endyear = 2019
55             endmonth = 9
56             pin = Pinyin()
57 
58             def check_or_appendurl(city,year,month):
59                 #计算year年month月有多少天
60                 stmr = calendar.monthrange(year, month)[1]
61                 #通过sql获取city城市year年month月有多少天
62                 sql = "select count(city) from lishitianqi where city='%s' and date like '%d-%02d%%'" % (city,year,month)
63                 self.cursor.execute(sql)
64                 apmr = self.cursor.fetchall()[0][0]
65 
66                 #若数据库中某城市某月的天数不等于该月的实际天数,说明这个城市这个月的数据有丢失,需要构造url到备用网站获取天气数据
67                 if stmr != apmr:
68                     url = "http://www.tianqihoubao.com/lishi/%s/month/%d%02d.html" % (pin.get_pinyin(city, ''),year,month)
69                     print(stmr,apmr,'append url: %s' % url)
70                     spider.start_urls.append(url)
71 
72             #citys_primary为在https://lishi.tianqi.com/支持查询的所有城市列表存,citys_backup为http://www.tianqihoubao.com/lishi/支持查询的所有城市列表
73             #理想情况下爬虫https://lishi.tianqi.com/就能获取到所有需要的数据,但是从网站抓取来的数据并非可靠,有可能某个月的某几天数据丢失,
74             #所有需要到备用网站将数据库缺失数据补齐
75             #备用网站数据集与主用网站数据集不重合,所以在做数据修复的时候要取交集
76             for city in citys_primary & citys_backup:
77                 for year in range(startyear,endyear):
78                     for month in range(1,13):
79                         check_or_appendurl(city,year,month)
80 
81                 for month in range(1,endmonth):
82                     check_or_appendurl(city,endyear,month)
83 
84 
85     def close_spider(self, spider):
86         pass
View Code

 

数据实时更新

每个月都会产生新的数据,如果重新运行前面较本过于耗时,需要创建一个更新数据的蜘蛛

scrapy genspider update lishi.tianqi.com

...  

关于乱码的终极解决方案

如果发现某个页面出现乱码,用scrapy shell登入该页面,respons.text为已经解码为字符串,由于解码格式不对会显示为乱码,response.body为未解码的字节流。

python库chardet可以根据字节流预测该字节流的编码格式,给定字节流越长,判断越准确

pip install chardet  

预测response.body编码格式

In [53]: import chardet

In [54]: chardet.detect(response.body) Out[54]: {'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}

GB2312编码的概率为99%,将response.body用GB2312格式解码,就可以看到正确的结果了。

In [63]: text = response.body.decode("GB2312")

In [64]: from scrapy.selector import Selector

In [65]: sel = Selector(text=text)

In [66]: sel.css("h1::text").get()
Out[66]: '全国历史天气查询'

  

django实现单页应用

需求分析

网站需要支持如下可视化功能(city代表城市,citys代表城市集,year代表年份,years代表年份集,month代表月份。这些字段都为输入项):

  • citys-years月平均气温(曲线图)
  • city-year各天气比率(饼图)
  • citys-years-month日平均气温(曲线图)
  • city-year-month各天气比率(饼图)

效果预览:pengliangyuan.club

创建django工程和app

创建django项目

django-admin startproject webapps

创建webapp  

cd webapps
python .\manage.py startapp history_weath

django使用mysql并引入现有数据库

在前面蜘蛛已经将databaselishitianqi,tablelishitianqi插入了数据,现在将这些数据引入到django中(根据数据库表生成django模型)

修改settings.py文件

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'lishitianqi',   #数据库的名字
        'USER': 'root',
        'PASSWORD':'123',
        'HOST':'localhost',
        'PORT':'3306',
    }
} 

修改__init__.py

import pymysql
pymysql.install_as_MySQLdb()

 

执行python manage.py inspectdb命令,根据mysql数据表生成django模型

#python manage.py inspectdb
...
class Lishitianqi(models.Model):
    city = models.CharField(max_length=16)
    date = models.DateField()
    maxt = models.IntegerField()
    mint = models.IntegerField()
    weath = models.CharField(max_length=20)
    windd = models.CharField(max_length=20)
    windp = models.CharField(max_length=40, blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'lishitianqi'
        unique_together = (('city', 'date'),)

将上述模型copy到app的模型文件中,运行python manage.py makemigrations创建这些更改的迁移

#python manage.py makemigrations history_weath
Migrations for 'history_weath':
  history_weath\migrations\0001_initial.py
    - Create model Lishitianqi

运行python manage.py migrate,将这些更改应用到数据库

#python manage.py migrate
...
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, history_weath, sessions
Running migrations:
  Applying history_weath.0001_initial... OK

创建城市数据库表,并标记热门城市 

创建数据表citys,用于表示该webapp支持的所有城市,以及热门城市

CREATE TABLE `citys` (
  `city` varchar(16) NOT NULL,
  `hot` tinyint(1) DEFAULT 0,
  PRIMARY KEY (`city`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

lishitianqi中查找大所有城市,插入到citys表中,字段distinct用于去重

insert into citys(city) select distinct city from lishitianqi;

添加热门城市

update citys set hot=1 where city in ("三亚","上海","乌鲁木齐","北京","南京","厦门","合肥","天津","广州","成都","拉萨","昆明","杭州","武汉","济南","深圳","西安","贵阳","郑州"," 重庆","长沙","青岛");

生成django模型,粘贴到history_weath/models.py,并执行迁移命令(操作方式同一小节)

路由

history_weath作为一个django app,访问该app的url格式是这样的 DomainName/history_weath/xxx,

编辑webapps/urls.py

from django.contrib import admin
from django.urls import path,include
from history_weath.views import index 


urlpatterns = [
    path('', index)                           #将DomainName重定向到DomainName/history_weath,见history_weath/views.py中的index函数(后面小节会附上)
    path('history_weath/', include('history_weath.urls')),   #引用history_weath/urls.py,访问url = 引用url + 被引用url 
    path('admin/', admin.site.urls),,
]

新建history_weath/urls.py

from django.urls import path
from . import views

app_name = 'history_weath'
urlpatterns = [
    path('', views.home, name='home'),               #history_weath这个app的主页
    path('temperature_year', views.temperature_year),      #url: DomainName/history_weath/temperature_year, 用于ajax(POST)请求citys,years的月平均气温
    path('temperature_month', views.temperature_month),     #url: DomainName/history_weath/temprature_month, 用于ajax(POST)请求citys,years,month日平均气温
    path('weath_year_ratio', views.weath_year_ratio),      #url: DomainName/history_weath/weath_year_ratio, 用于ajax(POST)请求city,year天气比率
    path('weath_month_ratio', views.weath_month_ratio),     #url: DomainName/history_weath/weath_month_ration, 用于ajax(POST)请求city,year,month天气比率
]

视图

 history_weath为单页应用,index用于网站根url到app主页的重定向,home为单页应用的主页,其他的视图个为只专门处理ajax(POST)请求的。

 history_weath/views.py

from django import views
from django.http import HttpResponse
from django.shortcuts import render, redirect,reverse
from .models import Lishitianqi, Citys
import django.utils.timezone as timezone
from xpinyin import Pinyin
import string
import json
from django.http import Http404
from django.db.models import Avg


#将DomainName重定向到DomainName/history_weath
def index(request):
    return redirect(reverse("history_weath:home"))

#history_weath这个app的主页
def home(request):
    hot_citys =    Citys.objects.filter(hot=1)
    
    #汉字转换拼音
    pin = Pinyin()

    #生成{'A':[], 'B':[], 'C':[] ...}, 为什么用[[] for i in range(26)]而不用[] * 26, 因为前者是浅拷贝,后者是引用
    citys = dict(zip(list(string.ascii_uppercase),[[] for i in range(26)])) 
    
    #将城市名转换为拼音,提取首字母,按不同首字母插入到citys中
    for item in Citys.objects.all():
        city = item.city
        first_letter = pin.get_pinyin(city)[:1].upper()
        citys[first_letter].append(city)
    
    #生成2011至今的年份列表
    years = [year for year in range(2011,timezone.now().year + 1)]
    context = {
        'hot_citys':hot_citys,
        'citys':citys,
        'years':years
    } 

    #将这些数据放入到django模板渲染后显示给用户
    return render(request, 'history_weath/home.html', {'context':context})
    

#url: DomainName/history_weath/temperature_year, 用于ajax(POST)请求citys,years的月平均气温
def temperature_year(request):
    post_data = json.loads(request.body)
    citys = post_data.get('citys',[])
    years = post_data.get('years',[])
    print("REQUEST:",citys,years)
    if citys == [] or years == []:
        raise Http404("Question does not exist")
    else:
        result = []
        for city in citys:
            for year in years:
                city_data = {'name':'%s(%s年)'%(city,year), 'data':[]}
                for month in range(1,13):
                    #分别计算一个月的每天最高温度平均温度,一个月的每天最低温度平均温度
                    month_avg = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month).aggregate(Avg('maxt'),Avg('mint'))
                    #在将一个月的每天最高温度平均温度,一个月的每天最低温度平均温度求平均
                    mavg = round((month_avg['maxt__avg'] + month_avg['mint__avg'])/2) if month_avg['maxt__avg'] is not None else 0
                    city_data['data'].append(mavg)
                result.append(city_data)

        #该曲线图参考https://jshare.com.cn/demos/hhhhxu,其中series.data即为ajax请求返回的数据
        print("RESPONSE:",result)
        return HttpResponse(json.dumps(result))

#url: DomainName/history_weath/temprature_month, 用于ajax(POST)请求citys,years,month日平均气温
def temperature_month(request):
    post_data = json.loads(request.body)
    citys = post_data.get('citys',[])
    years = post_data.get('years',[])
    month = post_data.get('month',"")
    print("REQUEST",citys, years, month)
    if citys == [] or years == [] or month == "":
        raise Http404("Question does not exist")
    else:
        result = []
        for city in citys:
            for year in years:
                month_obj = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month)
                #在这里,日平均温度 = 日最高温度 + 日最低温度 / 2
                month_data = [round((day.maxt + day.mint)/2,1) for day in month_obj]
                city_data = {'name':'%s(%s年%s月)'%(city,year,month), 'data':month_data}
                result.append(city_data)
        
        #该曲线图参考https://jshare.com.cn/demos/hhhhxu,其中series.data即为ajax请求返回的数据
        print("RESPONSE",result)
        return HttpResponse(json.dumps(result))

#url: DomainName/history_weath/weath_year_ratio, 用于ajax(POST)请求city,year天气比率
def weath_year_ratio(request):
    post_data = json.loads(request.body)
    city = post_data.get('city',"")
    year = post_data.get('year',"")
    print("REQUEST",city, year)
    if city == "" or year == "":
        raise Http404("Question does not exist")
    else:
        
        weath_year = Lishitianqi.objects.filter(city=city,date__year=year)
        weath_cnts = [d.weath for d in weath_year]
        weath_mods = set(weath_cnts)
        data=[]
        for wd in weath_mods:
            cnt = weath_cnts.count(wd)
            data.append({
                'name':wd,                        #天气类型,如晴,多云,多云转晴,小雨
                'y': cnt / len(weath_cnts)        #该类天气在一年天气中所占比例
                })
        
        #按天气比例排序
        data.sort(key=lambda x:x['y'])

        #将比例最高天气从饼中切出
        data[-1]['sliced'] = 1
        #选择比例最高的天气
        data[-1]['selected'] = 1

        #该饼图参考https://jshare.com.cn/demos/hhhhDX,其中series.data即为ajax请求返回的数据
        result = {'name':'%s(%s年)'%(city,year), 'data':data}
        print("RESPONSE",result)
        return HttpResponse(json.dumps(result))

#url: DomainName/history_weath/weath_month_ration, 用于ajax(POST)请求city,year,month天气比率
def weath_month_ratio(request):
    post_data = json.loads(request.body)
    city = post_data.get('city',"")
    year = post_data.get('year',"")
    month = post_data.get('month',"")
    print("REQUEST",city, year,month)
    if city == "" or year == "" or month == "":
        raise Http404("Question does not exist")
    else:
        
        weath_year = Lishitianqi.objects.filter(city=city,date__year=year,date__month=month)
        weath_cnts = [d.weath for d in weath_year]
        weath_mods = set(weath_cnts)
        data=[]
        for wd in weath_mods:
            cnt = weath_cnts.count(wd)
            data.append({
                'name':wd,                    #天气类型,如晴,多云,多云转晴,小雨
                'y': cnt / len(weath_cnts)    #该类天气在一个月天气中所占比例
                })

        #按天气比例排序
        data.sort(key=lambda x:x['y'])

        #将比例最高天气从饼中切出
        data[-1]['sliced'] = 1
        #选择比例最高的天气
        data[-1]['selected'] = 1

        #该饼图参考https://jshare.com.cn/demos/hhhhDX,其中series.data即为ajax请求返回的数据
        result = {'name':'%s(%s年%s月)'%(city,year,month), 'data':data}
        print("RESPONSE",result)
        return HttpResponse(json.dumps(result))
View Code

模板

创建目录

history_weath\templates\history_weath  

由于是单页应用,只有home.html。

home.html

 load staticfiles 

<!doctype html>
<html lang="en">
  <head>
      <script src=" static 'history_weath/jquery.min.js' "></script>
      <link href=" static 'history_weath/bootstrap.min.css' " rel="stylesheet">
      <link href=" static 'history_weath/home.css' " rel="stylesheet">

  </head>
  <body>
  <nav class="navbar navbar-dark fixed-top bg-dark flex-md-nowrap p-0 shadow">
    <a class="navbar-brand col-sm-3 col-md-2 mr-0" href=" url 'history_weath:home' ">全国各城市天气可视化系统</a>
  </nav>

<div class="container-fluid">
  <div class="row">
    <nav class="col-md-2 d-none d-md-block bg-light sidebar">
      <div class="sidebar-sticky">
        <ul class="nav flex-column">
          <li class="nav-item ">
            <a class="active nav-link" id="page1_index" href="#">
              <span data-feather="bar-chart-2"></span>
              数据可视化
            </a>
          </li>
          <li class="nav-item ">
            <a class="nav-link" href="#" id="page2_index">
              <span data-feather="layers"></span>
              数据分析
            </a>
          </li>
        </ul>

     
      </div>
    </nav>

    <main role="main" class="col-md-9 ml-sm-auto col-lg-10 px-4" style="padding-top:70px">

      <div id="page1">
        
                  <div clsss="container">
                  <div class="row">
                  <div class="col-md-3">

                   <div class="input-group mb-3">
                    <div class="input-group-prepend" id="page1_iter">
                      <button type="button" class="btn btn-outline-secondary" id="page2_city">城市</button>
                      <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                        <span class="sr-only">Toggle Dropdown</span>
                      </button>
                      <div class="dropdown-menu">
                         for city in context.hot_citys 
                              <button class="dropdown-item page1_add_city">{{ city.city }}</button>
                           endfor 
                           for k,v in context.citys.items 
                            <div role="separator" class="dropdown-divider"></div><p class="text-center">{{ k }}</p>
                             for city in v 
                              <button class="dropdown-item page1_add_city">{{ city }}</button>
                             endfor 
                           endfor 
                      </div>
                    </div>
                    <input type="text" class="form-control" aria-label="Text input with segmented dropdown button">
                   
                </div>


                  <div class="input-group mb-3">
                      <div class="input-group-prepend" id="page1_iter">
                        <button type="button" class="btn btn-outline-secondary" id="page1_year">年份</button>
                        <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                          <span class="sr-only">Toggle Dropdown</span>
                        </button>
                        <div class="dropdown-menu">
                           for year in context.years 
                            <button class="dropdown-item page1_add_year">{{ year }}</button>
                           endfor 
                        </div>
                      </div>
                  </div>
                  </div>
                      

                  <div class="col-md-1 ">
                        <div class="card " style='height: 100%; background-color: #e5e5e5'>
                              <span class="badge badge-info page1_city_list" ></span>
                              <div id="page1_city_label"></div>
                        </div>

                  </div>
                  <div class="col-md-1">
                        <div class="card " style='height: 100%; background-color: #e5e5e5'>
                              <span class="badge badge-success page1_year_list" ></span>
                              <div id="page1_year_label"></div>
                        </div>

                  </div>
                </div>
              </div>


                  <button type="button" class="btn btn-secondary btn-lg btn-block" style="margin-top: 10px" id="page1_send">开始</button>
                 


  
                <div id="page1_year_content" style="display: none">

                  <div id="page1_chart1" style="min-width:400px;height:400px;width: 100%"></div>


                   <div class="input-group mb-3">
                      <div class="input-group-prepend" id="page1_iter">
                        <button type="button" class="btn btn-outline-secondary" id="page1_year_weath">气候</button>
                        <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                          <span class="sr-only">Toggle Dropdown</span>
                        </button>
                        <div class="dropdown-menu">
                            <button class="dropdown-item page1_add_year_weath">关闭</button>
                            <div id="page1_year_weath_label"></div>
                        </div>
                      </div>
                  </div>
                  
                    <div id="page1_chart3" style="min-width:400px;height:400px;width: 100%; display: none;"></div>
               


                    <div class="btn-group mr-2" role="group" aria-label="First group" style="width: 100%">
                        <button type="button" class="btn btn-secondary page1_month" month=1>一月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=2>二月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=3>三月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=4>四月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=5>五月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=6>六月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=7>七月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=8>八月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=9>九月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=10>十月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=11>十一月</button>
                        <button type="button" class="btn btn-secondary page1_month" month=12>十二月</button>
                    </div>

                  <div id="page1_month_content" style="display: none">
                    <div id="page1_chart2" style="min-width:400px;height:400px;width: 100%"></div>


                    <div class="input-group mb-3">
                      <div class="input-group-prepend" id="page1_iter">
                        <button type="button" class="btn btn-outline-secondary" id="page1_month_weath">气候</button>
                        <button type="button" class="btn btn-outline-secondary dropdown-toggle dropdown-toggle-split" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                          <span class="sr-only">Toggle Dropdown</span>
                        </button>
                        <div class="dropdown-menu">
                            <button class="dropdown-item page1_add_month_weath">关闭</button>
                            <div id="page1_month_weath_label"></div>
                        </div>
                      </div>
                  </div>
                  
                    <div id="page1_chart4" style="min-width:400px;height:400px;width: 100%; display: none;"></div>
                  </div>
                </div>


        </div>



        <div id="page2" style="display: none">
              
        </div>

      
    </main>
  </div>
</div>
        <script src=" static 'history_weath/highcharts.js' "></script>
        <script src=" static 'history_weath/exporting.js' "></script>
        <script src=" static 'history_weath/oldie.js' "></script>
        <script src=" static 'history_weath/highcharts-zh_CN.js' "></script>

       
        <script src=" static 'history_weath/popper.min.js' "></script>
        <script src=" static 'history_weath/bootstrap.min.js' " ></script>
        <script src=" static 'history_weath/feather.min.js' "></script>
        <script src=" static 'history_weath/home.js' "></script>
       
</body>
</html>
View Code

静态文件

创建目录

history_weath\static\history_weath

home.css

body {
  font-size: .875rem;
}

.feather {
  width: 16px;
  height: 16px;
  vertical-align: text-bottom;
}

/*
 * Sidebar
 */

.sidebar {
  position: fixed;
  top: 0;
  bottom: 0;
  left: 0;
  z-index: 100; /* Behind the navbar */
  padding: 48px 0 0; /* Height of navbar */
  box-shadow: inset -1px 0 0 rgba(0, 0, 0, .1);
}

.sidebar-sticky {
  position: relative;
  top: 0;
  height: calc(100vh - 48px);
  padding-top: .5rem;
  overflow-x: hidden;
  overflow-y: auto; /* Scrollable contents if viewport is shorter than content. */
}

@supports ((position: -webkit-sticky) or (position: sticky)) {
  .sidebar-sticky {
    position: -webkit-sticky;
    position: sticky;
  }
}

.sidebar .nav-link {
  font-weight: 500;
  color: #333;
}

.sidebar .nav-link .feather {
  margin-right: 4px;
  color: #999;
}

.sidebar .nav-link.active {
  color: #007bff;
}

.sidebar .nav-link:hover .feather,
.sidebar .nav-link.active .feather {
  color: inherit;
}

.sidebar-heading {
  font-size: .75rem;
  text-transform: uppercase;
}

/*
 * Content
 */

[role="main"] {
  padding-top: 133px; /* Space for fixed navbar */
}

@media (min-width: 768px) {
  [role="main"] {
    padding-top: 48px; /* Space for fixed navbar */
  }
}

/*
 * Navbar
 */

.navbar-brand {
  padding-top: .75rem;
  padding-bottom: .75rem;
  font-size: 1rem;
  background-color: rgba(0, 0, 0, .25);
  box-shadow: inset -1px 0 0 rgba(0, 0, 0, .25);
}

.navbar .form-control {
  padding: .75rem 1rem;
  border-width: 0;
  border-radius: 0;
}

.form-control-dark {
  color: #fff;
  background-color: rgba(255, 255, 255, .1);
  border-color: rgba(255, 255, 255, .1);
}

.form-control-dark:focus {
  border-color: transparent;
  box-shadow: 0 0 0 3px rgba(255, 255, 255, .25);
}
View Code

home.js

(function () {
  'use strict'

feather.replace()
    
var page1_chart1 = Highcharts.chart('page1_chart1', {
            chart: {
                type: 'line'
            },
            title: {
                text: ''
            },
            subtitle: {
                text: ''
            },
            xAxis: {
                categories: ['一月', '二月', '三月', '四月', '五月', '六月', '七月', '八月', '九月', '十月', '十一月', '十二月']
            },
            yAxis: {
                title: {
                    text: '气温 (°C)'
                }
            },
            plotOptions: {
                line: {
                    dataLabels: {
                        // 开启数据标签
                        enabled: true          
                    },
                    // 关闭鼠标跟踪,对应的提示框、点击事件会失效
                    enableMouseTracking: false
                }
            },
            series: []
});

var page1_chart2 = Highcharts.chart('page1_chart2', {
            chart: {
                type: 'line'
            },
            title: {
                text: ''
            },
            subtitle: {
                text: ''
            },
            xAxis: {
                categories: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
            },
            yAxis: {
                title: {
                    text: '气温 (°C)'
                }
            },
            plotOptions: {
                line: {
                    dataLabels: {
                        // 开启数据标签
                        enabled: true          
                    },
                    // 关闭鼠标跟踪,对应的提示框、点击事件会失效
                    enableMouseTracking: false
                }
            },
            series: []
});


var page1_chart3 = Highcharts.chart('page1_chart3', {
    chart: {
        plotBackgroundColor: null,
        plotBorderWidth: null,
        plotShadow: false,
        type: 'pie'
    },
    title: {
        text: ''
    },
    tooltip: {
        pointFormat: '{series.name}: <b>{point.percentage:.1f}%</b>'
    },
    plotOptions: {
        pie: {
            allowPointSelect: true,
            cursor: 'pointer',
            dataLabels: {
                enabled: true,
                format: '<b>{point.name}</b>: {point.percentage:.1f} %',
                style: {
                    color: (Highcharts.theme && Highcharts.theme.contrastTextColor) || 'black'
                }
            }
        }
    },
    series: []
});

var page1_chart4 = Highcharts.chart('page1_chart4', {
    chart: {
        plotBackgroundColor: null,
        plotBorderWidth: null,
        plotShadow: false,
        type: 'pie'
    },
    title: {
        text: ''
    },
    tooltip: {
        pointFormat: '{series.name}: <b>{point.percentage:.1f}%</b>'
    },
    plotOptions: {
        pie: {
            allowPointSelect: true,
            cursor: 'pointer',
            dataLabels: {
                enabled: true,
                format: '<b>{point.name}</b>: {point.percentage:.1f} %',
                style: {
                    color: (Highcharts.theme && Highcharts.theme.contrastTextColor) || 'black'
                }
            }
        }
    },
    series: []
});

$("#page1_index").click(function(){

        $("#page1").show()
        $("#page2").hide()
        $("#page2_index").removeClass("active")
        $("#page1_index").addClass("active")
});

$("#page2_index").click(function(){

        $("#page2").show()
        $("#page1").hide()
        $("#page1_index").removeClass("active")
        $("#page2_index").addClass("active")
});


$(".page1_add_city").click(function(){
        if($(".page1_city_list").text().indexOf($(this).text()) == -1){
            var insert = $(".page1_city_list").first().clone().text($(this).text())
            insert.bind("click", function(){
                $(this).remove()
            })
            $("#page1_city_label").before(insert)
        }
});



$(".page1_add_year").click(function(){
        if($(".page1_year_list").text().indexOf($(this).text()) == -1){
            var insert = $(".page1_year_list").first().clone().text($(this).text())
            insert.bind("click", function(){
                $(this).remove()
            })
            $("#page1_year_label").before(insert)
        }
});


$(".page1_add_year_weath").first().click(function(){
        $("#page1_chart3").hide()
        $("#page1_year_weath").text("气候")
});


$(".page1_add_month_weath").first().click(function(){
        $("#page1_chart4").hide()
        $("#page1_month_weath").text("气候")
});


$("#page1_send").click(function(){

    var city_list = $(".page1_city_list")
    var citys = []
    for(var i = 1,p = city_list.first().next(); i < city_list.length; i++, p=p.next())
        citys[i-1] = p.text()
    var year_list = $(".page1_year_list")
    var years = []
    for(var i = 1,p = year_list.first().next(); i < year_list.length; i++, p=p.next())
        years[i-1] = p.text()
    if(citys.length > 0 && years.length > 0)
    {

        //删除
        for(var z = $(".page1_add_year_weath").first().next(); z.text() != ""; z = $(".page1_add_year_weath").first().next())
        {
            z.remove()
        }

        for(var i = 0; i < citys.length; i++)
        {
            for(var j = 0; j < years.length; j++)
            {

                var city = citys[i];
                var year = years[j];
                var text = city + year + "年各种天气所占比例"

                //添加
                var insert = $(".page1_add_year_weath").first().clone().text(text)
                    insert.bind("click", function(){
                        $("#page1_chart3").show()
                        $("#page1_year_weath").text($(this).text())
                        $.ajax({
                            type:"POST",
                            url:"/history_weath/weath_year_ratio",
                            data:JSON.stringify({city:city, year:year}),
                            success:function(result){

                                //删除数据
                                var len = page1_chart3.series.length;
                                for(var i = len - 1; i >= 0; i--){
                                    page1_chart3.series[i].remove();
                                }
                        

                                //新增数据
                                var data = JSON.parse(result)
                                page1_chart3.addSeries(data);
                                

                                var title = city + "-" + year +"年各种天气所占比例饼图";
                                page1_chart3.setTitle({text:title,useHTML:true})
                            }
                        });
                    })
                    $("#page1_year_weath_label").before(insert)
    
            }
        }

        $.ajax({
            type:"POST",
            url:"/history_weath/temperature_year",
            data:JSON.stringify({citys:citys, years:years}),
            success:function(result){

                //删除数据
                var len = page1_chart1.series.length;
                for(var i = len - 1; i >= 0; i--){
                    page1_chart1.series[i].remove();
                }

                //新增数据
                var data = JSON.parse(result)
                for(var i = 0; i < data.length; i++){
                    page1_chart1.addSeries(data[i]);
                }

                var title = citys.toString() + "-" + years.toString() +"年月平均气温曲线图";
                page1_chart1.setTitle({text:title,useHTML:true})
                $("#page1_year_content").show()    
            }
        });
    }
});


$(".page1_month").click(function(){
    var city_list = $(".page1_city_list")
    var citys = []
    for(var i = 1,p = city_list.first().next(); i < city_list.length; i++, p=p.next())
        citys[i-1] = p.text()
    var year_list = $(".page1_year_list")
    var years = []
    for(var i = 1,p = year_list.first().next(); i < year_list.length; i++, p=p.next())
        years[i-1] = p.text()
    var month = $(this).attr('month')
    var month_name = $(this).text()
    if(citys.length > 0 && years.length > 0)
    {
        //删除
        for(var z = $(".page1_add_month_weath").first().next(); z.text() != "";z = $(".page1_add_month_weath").first().next())
        {
            z.remove()
        }
        
        for(var i = 0; i < citys.length; i++)
        {
            for(var j = 0; j < years.length; j++)
            {

                var city = citys[i];
                var year = years[j];
                var text = city + year + "年" + month + "月各种天气所占比例"

                //添加
                var insert = $(".page1_add_month_weath").first().clone().text(text)
                    insert.bind("click", function(){
                        $("#page1_chart4").show()
                        $("#page1_month_weath").text($(this).text())
                        $.ajax({
                            type:"POST",
                            url:"/history_weath/weath_month_ratio",
                            data:JSON.stringify({city:city, year:year, month:month}),
                            success:function(result){

                                //删除数据
                                var len = page1_chart4.series.length;
                                for(var i = len - 1; i >= 0; i--){
                                    page1_chart4.series[i].remove();
                                }
                        

                                //新增数据
                                var data = JSON.parse(result)
                                page1_chart4.addSeries(data);
                                

                                var title = city + "-" + year +"年" + month + "月" + "各种天气所占比例饼图";
                                page1_chart4.setTitle({text:title,useHTML:true})
                            }
                        });
                    })
                    $("#page1_month_weath_label").before(insert)
    
            }
        }


        $.ajax({
            type:"POST",
            url:"/history_weath/temperature_month",
            data: JSON.stringify({citys:citys, years:years, month:month}),
            dataType:'json',
            success:function(result){
                    //删除数据
                    var len = page1_chart2.series.length;
                    for(var i = len - 1; i >= 0; i--){
                        page1_chart2.series[i].remove();
                    }

                    //新增数据
                    var data = result
                    for(var i = 0; i < data.length; i++){
                        page1_chart2.addSeries(data[i]);
                    }

                    var title = citys.toString() + "-" + years.toString() +"年" + month  + "月日平均气温曲线图";
                    page1_chart2.setTitle({text:title,useHTML:true})
                    $("#page1_month_content").show()    

            }
        });
    }

});




}())
View Code

jquery.min.js

popper.min.js

bootstrap.min.css

bootstrap.min.js

feather.min.js

highcharts.js

highcharts-zh_CN.js

oldie.js

exporting.js

 关于django使用mysql报错的问题

错误:django.core.exceptions.ImproperlyConfigured: mysqlclient 1.3.13 or newer is required; you have 0.9.3.

解决办法:修改python库路径下/Lib/site-packages/django/db/backends/mysql的base.py文件,注释掉版本低于1.3.13抛出错误这两行代码

#if version < (1, 3, 13):
#    raise ImproperlyConfigured('mysqlclient 1.3.13 or newer is required; you have %s.' % Database.__version__)

 

错误:AttributeError: 'str' object has no attribute 'decode'

解决办法:修改python库路径下/Lib/site-packages/django/db/backends/mysql的operations.py文件,判断query为bytes类型时才执行decode

        if query is not None:
            if isinstance(query,bytes):
                query = query.decode(errors='replace')
        return query 

  

django,uWSGI,nginx部署

前面的部分都是在windows上完成的开发,现在需要部署到个人云服务器

mysql数据库备份与还原

现在要将前面爬虫花了9小时爬取的数据从个人电脑迁移到云服务器。

数据库备份

mysqldump -u root -p lishitianqi > d:\backup.sql

将backup.sql拷贝的云服务器,数据库还原

MariaDB [(none)]> create database lishitianqi;
...
[root@VM_0_12_centos mysql]# mysql -u root -p --default-character-set=utf8 lishitianqi <  backup.sql

django部署清单

检查settings.py

 python manage.py check --deploy
System check identified some issues:

WARNINGS:
?: (security.W003) You don't appear to be using Django's built-in cross-site request forgery protection via the middleware ('django.middleware.csrf.CsrfViewMiddleware' is not in your MIDDLEWARE). Enabling the middleware is the safest approach to ensure you don't leave any holes.
?: (security.W004) You have not set a value for the SECURE_HSTS_SECONDS setting. If your entire site is served only over SSL, you may want to consider setting a value and enabling HTTP Strict Transport Security. Be sure to read the documentation first; enabling HSTS carelessly can cause serious, irreversible problems.
?: (security.W006) Your SECURE_CONTENT_TYPE_NOSNIFF setting is not set to True, so your pages will not be served with an 'x-content-type-options: nosniff' header. You should consider enabling this header to prevent the browser from identifying content types incorrectly.
?: (security.W007) Your SECURE_BROWSER_XSS_FILTER setting is not set to True, so your pages will not be served with an 'x-xss-protection: 1; mode=block' header. You should consider enabling this header to activate the browser's XSS filtering and help prevent XSS attacks.
?: (security.W008) Your SECURE_SSL_REDIRECT setting is not set to True. Unless your site should be available over both SSL and non-SSL connections, you may want to either set this setting True or configure a load balancer or reverse-proxy server to redirect all connections to HTTPS.
?: (security.W012) SESSION_COOKIE_SECURE is not set to True. Using a secure-only session cookie makes it more difficult for network traffic sniffers to hijack user sessions.
?: (security.W018) You should not have DEBUG set to True in deployment.
?: (security.W019) You have 'django.middleware.clickjacking.XFrameOptionsMiddleware' in your MIDDLEWARE, but X_FRAME_OPTIONS is not set to 'DENY'. The default is 'SAMEORIGIN', but unless there is a good reason for your site to serve other parts of itself in a frame, you should change it to 'DENY'.

System check identified 8 issues (0 silenced).

按照上述描述的问题一一解决

python虚拟环境

pip install virtualenv
pip install virtualenvwrapper

#创建目录用来存放虚拟环境
mkdir $HOME/.virtualenvs

#在~/.bashrc中添加行
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_VIRTUALENV=/usr/local/python3/bin/virtualenv
source /usr/local/python3/bin/virtualenvwrapper.sh


#让配置文件生效
source ~/.bashrc

#现在mkvirtualenv, workon, deactive,rmvirtualenv命令可以用了

#创建虚拟环境
mkvirtualenv -p python3 django_env_py3

#进入虚拟环境
workon django_env_py3

#退出虚拟环境
deactivate

#删除虚拟环境
revirtualenv django_env_py3

工程迁移

进入django虚拟环境

workon django_env_py3

将此工程拷贝到云服务器,安装依赖库,修改连接数据库参数:主机(localhost需要改为127.0.0.1),端口号,用户名,密码,让工程能正常运行。关于连接mysql Error问题上文提到过。

spider/lishitianqi/pipelines.py

                #连接数据库
                self.connect = pymysql.connect(
                        host='127.0.0.1',
                        port=8809,
                        db='lishitianqi',
                        user='root',
                        passwd='123',
                        charset='utf8',
                        use_unicode=True)

webapps/webapps/settings.py

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'lishitianqi',
        'USER': 'root',
        'PASSWORD':'123',
        'HOST':'127.0.0.1',
        'PORT':'8809',
    }
}

生成requirements.txt

pip freeze > requirements.txt
asn1crypto==0.24.0
attrs==19.1.0
Automat==0.7.0
cffi==1.12.3
constantly==15.1.0
cryptography==2.7
cssselect==1.1.0
Django==2.2.5
hyperlink==19.0.0
idna==2.8
incremental==17.5.0
lxml==4.4.1
parsel==1.5.2
pyasn1==0.4.7
pyasn1-modules==0.2.6
pycparser==2.19
PyDispatcher==2.0.5
PyHamcrest==1.9.0
PyMySQL==0.9.3
pyOpenSSL==19.0.0
pytz==2019.2
queuelib==1.5.0
Scrapy==1.7.3
service-identity==18.1.0
six==1.12.0
sqlparse==0.3.0
Twisted==19.7.0
uWSGI==2.0.18
w3lib==1.21.0
xpinyin==0.5.6
zope.interface==4.6.0

通过requirements.txt安装python库

pip install -r requirements.txt 

 

uWSIG服务器部署

安装uwsgi

pip install uwsgi

在django项目的setting.py文件的同级目录下,增加一个配置文件uwsgi.ini

[uwsgi]
#使用nginx连接时使用
socket=127.0.0.1:9090
#直接做web服务器使用
#http=127.0.0.1:9090
#项目目录
chdir=/root/project/fullstack_history_weath/webapps
#项目中wsgi.py文件的目录,相对于项目目录
wsgi-file=webapps/wsgi.py
# 指定启动的工作进程数
processes=4
# 指定工作进程中的线程数
threads=2
master=True
# 保存启动之后主进程的pid
pidfile=uwsgi.pid
# 设置uwsgi后台运行,用uwsgi.log保存日志信息
daemonize=uwsgi.log
# 设置虚拟环境的路径
virtualenv=/root/.virtualenvs/django_env_py3

uwsgi相关命令

#uWSGI 通过 xxx.ini 启动后会在相同目录下生成一个 xxx.pid 的文件,里面只有一行内容是 uWSGI 的主进程的进程号。

#uWSGI 启动:
uwsgi --ini xxx.ini

#uwsgi 重启:
uwsgi --reload xxx.pid

#uwsgi 停止:
uwsgi --stop xxx.pid

#强制关闭
pkill uwsgi

nginx服务器部署

安装nginx

wget http://nginx.org/download/nginx-1.8.0.tar.gz
tar -xvf nginx-1.8.0.tar.gz
cd nginx-1.8.0
./configure
make
make install  
ln -s /usr/local/nginx/sbin/nginx /usr/bin/nginx

运行nginx

/usr/local/nginx/sbin/nginx

打开浏览器访问此机器的 IP,如果浏览器出现 Welcome to nginx! 则表示 Nginx 已经安装并运行成功。

nginx相关命令:

#重启:
nginx –s reload

#停止:
nginx –s stop

#测试配置文件是否正常:
nginx –t

#强制关闭:
pkill nginx

配置nginx

        location / {
            include  uwsgi_params;
            uwsgi_pass  127.0.0.1:9090;
        }

        location /static/ {
            autoindex on;
            alias  /root/project/fullstack_history_weath/webapps/static/;
        }

  

部署出现的问题 

问题:uwsgi日志报错“ModuleNotFoundError: No module named 'encodings'

解决办法:uwsgi配置文件中虚拟环境路径设置错误,可以通过

In [1]: import sys

In [2]: sys.path
Out[2]:
['/root/.virtualenvs/django_env_py3/lib/python3.6/site-packages',
 '/usr/local/python3/bin',
 '/usr/local/python3/lib/python36.zip',
 '/usr/local/python3/lib/python3.6',
 '/usr/local/python3/lib/python3.6/lib-dynload',
 '',
 '/usr/local/python3/lib/python3.6/site-packages',
 '/usr/local/python3/lib/python3.6/site-packages/IPython/extensions',
 '/root/.ipython']

因为我的虚拟环境名为django_env_py3,所以

# 设置虚拟环境的路径
virtualenv=/root/.virtualenvs/django_env_py3

 

问题:uwsgi日志报错.“--- no python application found, check your startup logs for errors ---

问题:uwsgi日志报错“chdir(): No such file or directory [core/uwsgi.c line 2623]

解决办法: 上面两个问题第一个是没有设置chdir,第二个是chdir路径设置错误导致,正确写法如下

#django项目根目录
chdir=/root/project/fullstack_history_weath/webapps

  

问题:nginx能加载html,加载静态文件显示 404(Not Found)

可能的原因:

1. alias  /root/project/fullstack_history_weath/webapps/static末尾没加/会导致

2. python setting.py中的STATIC_ROOT和nginx配置文件中静态文件规则location /static/上的路径不一致

3. python没有执行命令python manage.py collectstatic收集所有静态文件到STATIC_ROOT路径下

 

问题:nginx能加载html,加载静态文件显示403(Forbidden)

解决方法:静态文件以及他们的所有父目录权限不足,例如有静态文件路径/root/project/fullstack_history_weath/webapps/static/,若目录project的权限为644,

static中所有的文件权限为644,该路径中的其余文件夹(root, fullstack_history_weath,webapps)权限为655,则需要做如下修改

chmod 655 /root/project
chmod -R 655 /root/project/fullstack_history_weath/webapps/static

 

项目代码获取

至此,整个项目已经全部完成,完整代码获取见下面连接。

nginx文件下载:http://pengliangyuan.club/files/Chinese-historical-weather-visualization-system/

github连接:https://github.com/pengliangyuan/Chinese-historical-weather-visualization-system

 

番外篇-nginx目录美化

  1. 给nginx添加fanyindex插件,下载源码ngx-fancyindex
  2. 将源码解压后放在nginx源码目录下
  3. 在编译选项中添加这个插件./configure --add-module=ngx-fancyindex-0.4.3(插件解压后的目录名)
  4. makemake installln -s /usr/local/nginx/sbin/nginx /usr/bin/nginx
  5. 在目录/usr/local/nginx/html下,下载fancyindex主题git clone https://github.com/TheInsomniac/Nginx-Fancyindex-Theme.git
  6. 修改/usr/local/nginx/conf/nginx.conf,配置如下,nginx -s reload
        location /files {
            include /usr/local/nginx/html/Nginx-Fancyindex-Theme/fancyindex.conf; 
            alias  /root/files/;
            charset utf-8,gbk; 
        }

        location /Nginx-Fancyindex-Theme/ {
            alias /usr/local/nginx/html/Nginx-Fancyindex-Theme/;
        }

  

 

posted @ 2019-09-09 17:44  pengliangyuan  阅读(1760)  评论(0编辑  收藏  举报