A Django-Based Web Crawler System

Overview

Tech stack:

Framework: Django
Front end: Bootstrap, Ajax, JavaScript
Back end: Python
Databases: Redis, PostgreSQL
GitHub: https://github.com/GaGim-H/typhoon_wb_anlysis_sys

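Steps

Django follows the MVT (model, view, template) pattern.

  1. Create the basic folders, including the folder for feature code and the folder for static assets.
  2. Define the templates and write the front-end pages.
  3. Add views to the app.
  4. Add the URL paths.
  5. Flesh out the views and templates, plug in the feature modules, and connect the front-end and back-end data.

Preliminary work

Basic setup

Create the Django project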

[Screenshot: creating the Django project]

Run the development server to test the project

python manage.py runserver

Create the app

python manage.py startapp app1

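Create the folders

Folder   Description
fun      Holds the control-layer helper functions
static   Holds static front-end files such as js, css, and img

Modify settings.py

import os  # required by os.path.join below

STATIC_URL = 'static/'
STATICFILES_DIRS = [
    os.path.join(BASE_DIR, 'static'),
]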
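If settings.py was generated by Django 3.1 or later, BASE_DIR is a pathlib.Path rather than a string, so the same setting can be written without os. The sketch below is only an illustration: the project root shown is a hypothetical placeholder standing in for the real BASE_DIR.

```python
from pathlib import Path

# Hypothetical project root, for illustration only. A generated settings.py
# normally computes this as Path(__file__).resolve().parent.parent.
BASE_DIR = Path("/srv/typhoon_site")

STATIC_URL = "static/"

# Extra directories, besides each app's own static/ folder,
# that Django searches for static files.
STATICFILES_DIRS = [
    BASE_DIR / "static",
]
```

Either form works; the point is that STATICFILES_DIRS must resolve to the static folder created in the previous step.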

Add the home page

Register app1 by adding it to INSTALLED_APPS in settings.py.
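For reference, the registration might look like the fragment below. The django.contrib entries are the defaults that startproject generates; only the last entry is added by hand.

```python
# settings.py (fragment). The django.contrib entries are the startproject defaults.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",  # lets runserver serve static files while DEBUG is on
    "app1",                        # the app created with `python manage.py startapp app1`
]
```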

Place the Bootstrap files in the static folder.

[Screenshot: contents of the static folder]

Write the first page, index.html

<html>
<head>
<!-- Meta, title, CSS, favicons, etc. -->
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="icon" href="../static/img/logo.ico" type="image/ico" />
    <title>台风分析系统</title>
    <link href="../static/vendors/bootstrap/dist/css/bootstrap.css" rel="stylesheet">
    <link href="../static/vendors/font-awesome/css/font-awesome.min.css" rel="stylesheet">
    <link href="../static/build/css/custom.min.css" rel="stylesheet">
</head>

<body class="nav-md">
<div class="container body">
	<div class="main_container">
	<!-- left sidebar -->
	<div class="col-md-3 left_col">
	 <div class="left_col scroll-view">
            <div class="navbar nav_title" style="border: 0;">
              <a href="{% url 'index' %}" class="site_title"><i class="fa fa-paw"></i> <span>台风分析系统</span></a>
            </div>
            <div class="clearfix"></div>
            <!-- sidebar menu -->
            <div id="sidebar-menu" class="main_menu_side hidden-print main_menu">
              <div class="menu_section">
                <ul class="nav side-menu">
                  <li><a href="{% url 'index' %}"><i class="fa fa-home"></i> 首页</a></li>
                  <li><a><i class="fa fa-edit"></i> 台风监控 <span class="fa fa-chevron-down"></span></a>
                    <ul class="nav child_menu">
                      <li><a href="">更新cookie</a></li>
                      <li><a href="">微博监控</a></li>
                    </ul>
                  </li>
                  <li><a href=""><i class="fa fa-desktop"></i> 数据管理</a>
                  </li>
                  <li><a href=""><i class="fa fa-table"></i> 灾损评估</a>
                  </li>
                  <li><a href=""><i class="fa fa-bar-chart-o"></i> 图表分析</a>
                  </li>
                  <li><a href=""><i class="fa fa-globe"></i>位置提取</a>
                  </li>
                  <li><a href=""><i class="fa fa-comment-o"></i>信息推送</a>
                  </li>
                  <li><a href=""><i class="fa fa-clone"></i>涡旋识别</a>
                  </li>
                </ul>
              </div>


            </div>
            <!-- /sidebar menu -->

            <!-- /menu footer buttons -->
            <div class="sidebar-footer hidden-small">
              <a data-toggle="tooltip" data-placement="top" title="Settings">
                <span class="glyphicon glyphicon-cog" aria-hidden="true"></span>
              </a>
              <a data-toggle="tooltip" data-placement="top" title="FullScreen">
                <span class="glyphicon glyphicon-fullscreen" aria-hidden="true"></span>
              </a>
              <a data-toggle="tooltip" data-placement="top" title="Lock">
                <span class="glyphicon glyphicon-eye-close" aria-hidden="true"></span>
              </a>
              <a data-toggle="tooltip" data-placement="top" title="Logout" href="login.html">
                <span class="glyphicon glyphicon-off" aria-hidden="true"></span>
              </a>
            </div>
            <!-- /menu footer buttons -->
          </div>
	</div>
	<!-- top navigation -->
	<div class="top_nav">
	          <div class="nav_menu">
              <div class="nav toggle">
                <a id="menu_toggle"><i class="fa fa-bars"></i></a>
              </div>
          </div>
	</div>
	<!-- right-hand content -->
	<div class="right_col" role="main">
	          <div class="">
            <div class="page-title">
              <div class="title_left">
                <h3>首页</h3>
              </div>
            </div>
            <div class="clearfix"></div>
            <div class="row" style="height: 100%">
              <div class="col-md-12 col-sm-12  ">
                <div class="x_panel">
                  <div class="x_title">
                    <h2>使用说明</h2>
                    <div class="clearfix"></div>
                  </div>
                  <div class="x_content">
                    <!-- <h4>功能概述</h4>
                    <p>  123</p>
                    <h4>使用顺序</h4>
                    <p>  1.</p> -->
                    <div class='typora-export os-windows typora-export-content'>
                      <div id='write'  class=''><h4 id='功能概述'><span>功能概述</span></h4><ul><li><span>台风监控:基于多线程与redis消息队列实现的高效微博爬虫,融合情感分析分类器与信息识别器,对数据进行情感识别与信息识别。</span></li></ul><ul><li><span>数据管理:对监测的数据进行增删改查操作。</span></li><li><span>灾损评估:对监测的灾害损失数据进行聚类评估。</span></li><li><span>图表分析:对监测的数据进行图表化展示,将文本数据转换为图表数据,便于用户直观分析。</span></li><li><span>位置提取:提取微博文本中的地理位置信息,并在地图中定位。</span></li><li><span>信息推送:使用qq邮箱授权发送信息。</span></li><li><span>涡旋识别:对</span><a href='http://weather.cma.cn/web/channel-ee6f0049d0bc4846a0396647b5a90cc3.html'><span>卫星云图</span></a><span>进行台风涡旋识别,根据台风等级标准(热带低气压:&lt;34;台风:[34,64),强台风:[64,85),超强台风[85,105) 单位:/kt)识别台风涡旋中的风速。</span></li></ul><p>&nbsp;</p><h4 id='使用说明'><span>使用说明</span></h4><ul><li><p><span>使用该系统前需要检验个人账号的cookie是否可以使用,即点击“台风监控”-&gt;&quot;更新cookie&quot;-&gt;“可用cookie查询”,后台将根据UID查询cookie情况;</span></p><p><span>若不能使用,则需要更新cookie,输入个人微博UID,若存在可使用cookie,可开始进行智能分析。</span></p></li></ul><ul><li><span>使用“台风监控”功能,可选择实时监控或指定时间段监控,实时监控无需输入起始时间和终止时间。</span></li><li><span>使用“消息推送”功能前,用户需获取qq邮箱授权码。</span></li></ul></div></div>
                    </div>
                </div>
              </div>
            </div>
          </div>
	</div>
	
	</div>
</div>


  <!-- jQuery -->
  <script type="text/javascript" src="../static/vendors/jquery/dist/jquery.min.js"></script>
  <!-- Bootstrap -->
 <script src="../static/vendors/bootstrap/dist/js/bootstrap.bundle.min.js"></script>
  <!-- FastClick -->
  <script src="../static/vendors/fastclick/lib/fastclick.js"></script>
  <!-- NProgress -->
  <script src="../static/vendors/nprogress/nprogress.js"></script>
  <script src="../static/vendors/bootstrap-progressbar/bootstrap-progressbar.min.js"></script>
  <!-- Custom Theme Scripts -->
  <script src="../static/build/js/custom.min.js"></script>

</body>
</html>

Add the view in app1's views.py

from django.shortcuts import render


def index(request):
    # Render the home page template
    return render(request, 'index.html')

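Register the path for the index page in urls.py

from django.urls import path

from app1 import views

urlpatterns = [
    # name='index' is what {% url 'index' %} in index.html resolves against
    path('', views.index, name='index'),
]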

Preview

[Screenshot: the rendered home page]

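Add a feature page: Weibo monitoring (page1-1.html)

Add the view: define the page1_1 view in app1's views.py

def page1_1(request):
    # To be completed
    return render(request, 'page1-1.html')

Add the path: register the page1_1 path in urls.py

urlpatterns = [
    path('', views.index, name='index'),
    path('page1_1/', views.page1_1, name='page1_1'),
]

Add the front end: create page1-1.html in templates. Its sidebar uses {% url %} tags for routes such as page1_2 and page2_1, so those names must also be registered in urls.py before the page can render.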

{% load static %}
<html lang="en">
<style>

    /* .form-col{
        display: inline-block;
        width: 70%;
        height: calc(1.5em + .75rem + 2px);
        padding: 0.375rem 0.75rem;
        display: block;
        font-size: 1rem;
        line-height: 1.5;
        color: #495057;
        background-color: #fff;
        background-clip: padding-box;
        border: 1px solid #ced4da;
    } */
    td {
        text-align: center;
        vertical-align: middle;
        font-size: smaller;
    }

    .row-div {
        position: relative;
        display: flex;
        /* center horizontally and vertically */
        justify-content: center;
        align-items: center;
        padding-top: inherit;
    }

    button {
        width: 150px;
    }

    /* table{
        border-collapse:separate;
    } */
    .table-input {
        outline: none;
        width: 100%;
        height: calc(1.5em + .75rem + 3px);
        font-size: small;
        line-height: 1.5;
        color: #495057;
        background-clip: padding-box;
        border: 1px solid #ced4da;
    }

    .table-output {
        border-collapse: separate;
        border-spacing: 15px;
    }

</style>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <!-- Meta, title, CSS, favicons, etc. -->
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="icon" href="../static/img/logo.ico" type="image/ico"/>

    <title>台风分析系统</title>

    <!-- Bootstrap -->
    <link href="../static/vendors/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet">
    <!-- Font Awesome -->
    <link href="../static/vendors/font-awesome/css/font-awesome.min.css" rel="stylesheet">

    <!-- iCheck: jQuery-based custom checkboxes and radio buttons -->
    <link href="../static/vendors/iCheck/skins/flat/green.css" rel="stylesheet">
    <!-- bootstrap-wysiwyg (WYSIWYG editor) -->
    <link href="../static/vendors/google-code-prettify/bin/prettify.min.css" rel="stylesheet">
    <!-- Select2 -->
    <link href="../static/vendors/select2/dist/css/select2.min.css" rel="stylesheet">
    <!-- Switchery toggle buttons -->
    <link href="../static/vendors/switchery/dist/switchery.min.css" rel="stylesheet">
    <!-- starrr: jQuery star-rating plugin -->
    <link href="../static/vendors/starrr/dist/starrr.css" rel="stylesheet">

    <!-- Custom Theme Style -->
    <link href="../static/build/css/custom.min.css" rel="stylesheet">
</head>

<body class="nav-md">
<div class="container body">
    <div class="main_container">
        <div class="col-md-3 left_col">
            <div class="left_col scroll-view">
                <div class="navbar nav_title" style="border: 0;">
                    <a href="{% url 'index' %}" class="site_title"><i class="fa fa-paw"></i> <span>台风分析系统</span></a>
                </div>

                <div class="clearfix"></div>

                <!-- menu profile quick info -->
                <!-- <div class="profile clearfix">
                  <div class="profile_pic">
                    <img src="../img/logo.png" alt="..." class="img-circle profile_img">
                  </div>
                  <div class="profile_info">
                    <span>Welcome,</span>
                    <h2>John Doe</h2>
                  </div>
                </div> -->
                <!-- /menu profile quick info -->

                <!-- sidebar menu -->
                <div id="sidebar-menu" class="main_menu_side hidden-print main_menu">
                    <div class="menu_section">
                        <ul class="nav side-menu">
                            <li><a href="{% url 'index' %}"><i class="fa fa-home"></i> 首页</a></li>
                            <li><a><i class="fa fa-edit"></i> 台风监控 <span class="fa fa-chevron-down"></span></a>
                                <ul class="nav child_menu">
                                    <li><a href="{% url 'page1_2' %}">更新cookie</a></li>
                                    <li><a href="{% url 'page1_1' %}">微博监控</a></li>
                                </ul>
                            </li>
                            <li><a href="{% url 'page2_1' %}"><i class="fa fa-desktop"></i> 数据管理</a>
                            </li>
                            <li><a href="{% url 'page3_1' %}"><i class="fa fa-table"></i> 灾损评估</a>
                            </li>
                            <li><a href="{% url 'page4_1' %}"><i class="fa fa-bar-chart-o"></i> 图表分析</a>
                            </li>
                            <li><a href="{% url 'page5_1' %}"><i class="fa fa-globe"></i>位置提取</a>
                            </li>
                            <li><a href="{% url 'page6_1' %}"><i class="fa fa-comment-o"></i>信息推送</a>
                            </li>
                            <li><a href="{% url 'page7_1' %}"><i class="fa fa-clone"></i>涡旋识别</a>
                            </li>
                        </ul>
                    </div>


                </div>
                <!-- /sidebar menu -->

                <!-- /menu footer buttons -->
                <div class="sidebar-footer hidden-small">
                    <a data-toggle="tooltip" data-placement="top" title="Settings">
                        <span class="glyphicon glyphicon-cog" aria-hidden="true"></span>
                    </a>
                    <a data-toggle="tooltip" data-placement="top" title="FullScreen">
                        <span class="glyphicon glyphicon-fullscreen" aria-hidden="true"></span>
                    </a>
                    <a data-toggle="tooltip" data-placement="top" title="Lock">
                        <span class="glyphicon glyphicon-eye-close" aria-hidden="true"></span>
                    </a>
                    <a data-toggle="tooltip" data-placement="top" title="Logout" href="login.html">
                        <span class="glyphicon glyphicon-off" aria-hidden="true"></span>
                    </a>
                </div>
                <!-- /menu footer buttons -->
            </div>
        </div>

        <!-- top navigation -->
        <div class="top_nav">
            <div class="nav_menu">
                <div class="nav toggle">
                    <a id="menu_toggle"><i class="fa fa-bars"></i></a>
                </div>
            </div>
        </div>
        <!-- /top navigation -->

        <!-- page content -->
        <div class="right_col" role="main">
            <div class="page-title">
                <div class="title_left">
                    <h3>台风监控</h3>
                </div>

            </div>
            <div class="clearfix"></div>
            <div class="row">
                <div class="col-md-12 col-sm-12 ">
                    <div class="x_panel">
                        <div class="x_title">
                            <h2>微博监控输入</h2>
                            <ul class="nav navbar-right panel_toolbox">
                                <li><a class="collapse-link"><i class="fa fa-chevron-up"></i></a>
                                </li>
                            </ul>
                            <div class="clearfix"></div>
                        </div>
                        <div class="x_content">

                            {#                                <div class="form-horizontal form-label-left" >#}
                            <div class="row-div">
                                <div class="col-md-5">
                                    <div class="form-group row col-md-12">
                                        <div class="col-md-2"></div>
                                        <label class="control-label col-md-3">关键词 <span class="required"></span>
                                        </label>
                                        <input type="text" style="font-size: small" required="required"
                                               class="form-control col-md-7" id="keyword" value="台风梅花">
                                    </div>
                                    <div class="form-group row col-md-12">
                                        <div class="col-md-2"></div>
                                        <label class="control-label col-md-3 ">任务主题 <span class="required"></span>
                                        </label>
                                        <input type="text" style="font-size: small" required="required"
                                               class="form-control col-md-7" id="theme" value="台风梅花">
                                    </div>
                                    <div class="form-group row col-md-12">
                                        <div class="col-md-2"></div>
                                        <label class="control-label col-md-3">起始时间<span class="required"></span>
                                        </label>
                                        <input type="text" style="font-size: small" required="required"
                                               class="form-control col-md-7" id="start_time" value="2022-09-14-01">
                                    </div>
                                    <div class="form-group row col-md-12">
                                        <div class="col-md-2"></div>
                                        <label class="control-label col-md-3">终止时间<span class="required"></span>
                                        </label>
                                        <input type="text" style="font-size: small" required="required"
                                               class="form-control col-md-7" id="end_time" value="2022-09-14-02">
                                    </div>
                                    <div class="form-group row col-md-12">
                                        <div class="col-md-2"></div>
                                        <label class="control-label col-md-3">时间粒度<span class="required"></span>
                                        </label>
                                        <input type="text" style="font-size: small" required="required"
                                               class="form-control col-md-7" id="interval" value="1">
                                    </div>
                                </div>
                                <div class="col-md-7">
                                    <textarea rows="8" style="resize: none;width: 100%;border: 1px solid #ced4da;"
                                              id="monitor_log">{{ monitor_log }}</textarea>
                                </div>
                            </div>
                            <div class="ln_solid"></div>
                            <div class="row-div">
                                <button class="btn btn-primary" type="reset" id="reset">重置</button>&nbsp;&nbsp;&nbsp;&nbsp;
                                <button type="button" class="btn btn-success" id="start_btn" onclick="sendMessage()">
                                    开始
                                </button>&nbsp;&nbsp;&nbsp;&nbsp;
                                <button class="btn btn-primary" type="button" id="stop_btn" onclick="stopMonitor()">停止
                                </button>
                            </div>
                            {#                                  </div>#}
                        </div>
                    </div>
                </div>
            </div>
            <!-- log monitoring -->
            <div class="row" style="height: 110%;">
                <div class="col-md-12 col-sm-12 ">
                    <div class="x_panel">
                        <div class="x_title">
                            <h2>微博监控输出</h2>
                            <ul class="nav navbar-right panel_toolbox">
                                <li><a class="collapse-link"><i class="fa fa-chevron-up"></i></a>
                                </li>
                            </ul>
                            <div class="clearfix"></div>
                        </div>
                        <div class="x_content">

                            <div class="row-div">
                                {% csrf_token %}
                                <div>
                                    <table class="table-output">
                            <tr>
                                <td>用户名</td>
                                <td><input type="text" class="table-input" id="screen_name"></td>
                                <td>用户ID</td>
                                <td><input type="text" class="table-input" id="uid"></td>
                                <td>内容</td>
                                <td rowspan="3"><textarea style="height:150px;width: 100%;outline: none; "
                                                          id="content">{{ content }}</textarea></td>
                            </tr>
                            <tr>
                                <td>IP属地</td>
                                <td><input type="text" class="table-input" id="ip"></td>
                                <td>发布时间</td>
                                <td><input type="text" class="table-input" id="create_at"></td>
                            </tr>
                            <tr>
                                <td>点赞数</td>
                                <td>
                                    <input type="text" class="table-input" id="attitude_counts"></td>
                                <td>转发数</td>
                                <td><input type="text" class="table-input" id="repost_counts"></td>
                            </tr>
                            <tr>
                                <td>坐标</td>
                                <td><input type="text" class="table-input" id="pos"></td>
                                <td>评论数</td>
                                <td><input type="text" class="table-input" id="comment_counts"></td>
                                <td>评论内容</td>
                                <td rowspan="3"><textarea style="height: 150px;width:100%;outline: none; " rows="5"
                                                          cols="60" id="comments_list"></textarea></td>
                            </tr>
                            <tr>

                                <!-- <input type="text" class="table-input"  name="pos" > -->
                                <th style="font-size: smaller;">情感类别</th>
                                <td><input type="text" class="table-input" id="emo_class"></td>
                                <th style="font-size: smaller;">情感值</th>
                                <td><input type="text" class="table-input" id="emo_score"></td>

                            </tr>
                            <tr>
                                <td>原文链接</td>
                                <td colspan="3"><input type="text" class="table-input" id="article_url"></td>
                            </tr>
                            </table>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>

</div>
</div>

<!-- jQuery -->
<script src="../static/vendors/jquery/dist/jquery.min.js"></script>
<!-- Bootstrap -->
<script src="../static/vendors/bootstrap/dist/js/bootstrap.bundle.min.js"></script>
<!-- FastClick -->
<script src="../static/vendors/fastclick/lib/fastclick.js"></script>
<!-- NProgress -->
<script src="../static/vendors/nprogress/nprogress.js"></script>
<!-- Chart.js -->
<script src="../static/vendors/Chart.js/dist/Chart.min.js"></script>
<!-- gauge.js -->
<script src="../static/vendors/gauge.js/dist/gauge.min.js"></script>
<!-- bootstrap-progressbar -->
<script src="../static/vendors/bootstrap-progressbar/bootstrap-progressbar.min.js"></script>
<!-- iCheck -->
<script src="../static/vendors/iCheck/icheck.min.js"></script>
<!-- Skycons -->
<script src="../static/vendors/skycons/skycons.js"></script>
<!-- Flot -->
<script src="../static/vendors/Flot/jquery.flot.js"></script>
<script src="../static/vendors/Flot/jquery.flot.pie.js"></script>
<script src="../static/vendors/Flot/jquery.flot.time.js"></script>
<script src="../static/vendors/Flot/jquery.flot.stack.js"></script>
<script src="../static/vendors/Flot/jquery.flot.resize.js"></script>
<!-- Flot plugins -->
<script src="../static/vendors/flot.orderbars/js/jquery.flot.orderBars.js"></script>
<script src="../static/vendors/flot-spline/js/jquery.flot.spline.min.js"></script>
<script src="../static/vendors/flot.curvedlines/curvedLines.js"></script>
<!-- DateJS -->
<script src="../static/vendors/DateJS/build/date.js"></script>
<!-- JQVMap -->
<script src="../static/vendors/jqvmap/dist/jquery.vmap.js"></script>
<script src="../static/vendors/jqvmap/dist/maps/jquery.vmap.world.js"></script>
<script src="../static/vendors/jqvmap/examples/js/jquery.vmap.sampledata.js"></script>
<!-- bootstrap-daterangepicker -->
<script src="../static/vendors/moment/min/moment.min.js"></script>
<script src="../static/vendors/bootstrap-daterangepicker/daterangepicker.js"></script>

<!-- Custom Theme Scripts -->
<script src="../static/build/js/custom.min.js"></script>

</body>
</html>

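Update the <a href> tags in the home page index.html so that the sidebar entries point to the new routes, for example href="{% url 'page1_1' %}".

Update the view

Clicking the button sends a GET request for the page. A guard is needed here so that the crawler does not run when its preconditions are not met.
The view therefore checks uid, which is a global variable, and shows a message depending on whether it is set.

uid = None  # assumed initial value; page1_2 sets this global once a cookie is found


def page1_1(request):
    if request.method == 'GET':
        try:
            if uid:
                # Message: a usable cookie was found, monitoring can start
                return render(request, 'page1-1.html',
                              {'monitor_log': "已识别到可用cookie,可进行监控采集!\n", 'uid': uid})
            # Message: no usable cookie, update the cookie first
            return render(request, 'page1-1.html', {'monitor_log': "暂无可用cookie,请先更新cookie!\n"})
        except Exception as e:
            print(e.args)
            return render(request, 'page1-1.html', {'monitor_log': "暂无可用cookie,请先更新cookie!\n"})
    return render(request, 'page1-1.html')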

Bind the back-end data in the template with Django's Jinja-like {{ }} syntax

<div class="col-md-7">
    <textarea rows="8" style="resize: none;width: 100%;border: 1px solid #ced4da;" id="monitor_log">{{ monitor_log }}</textarea>
</div>

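Add a feature page: simulated login (page1-2.html)

Follow the same steps as above.
[Screenshot: the simulated login page]

Complete the simulated-login interaction

Create get_cookie.py in the fun folder and add the simulated-login code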

https://i.cnblogs.com/posts/edit;postId=18188142

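Update the view

This feature exists to obtain a cookie so that the crawler can run and collect data. The page offers two login methods for obtaining the cookie: a Weibo account and password, or QQ login.

To avoid fetching a cookie twice, first enter the uid and click the query-cookie button to see whether a cookie has already been obtained.

The view uses the uid to check whether a cookie exists for the current user: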

import asyncio
from datetime import datetime

from django.http import HttpRequest
from django.shortcuts import render

from app1.models import CookieInfo  # assumed location of the CookieInfo model
from fun import get_cookie          # assumed import path for fun/get_cookie.py


def page1_2(request: HttpRequest):
    global uid
    if request.method == 'POST':
        account = request.POST.get('account')
        password = request.POST.get('password')
        uid = request.POST.get('uid')
        now = datetime.now().strftime('%Y-%m-%d %H:%M')

        # Look up a usable cookie for this uid
        if 'query_cookie' in request.POST:
            try:
                # Pick one stored cookie at random; [0] raises IndexError when none exists
                query = CookieInfo.objects.filter(uid__exact=uid).order_by('?').values('uid', 'cookie', 'datetime')[0]
                cookie = query['cookie']
                return render(request, 'page1-2.html',
                              {'cookie_display': f"{now} uid {uid} 存在可使用cookie\n{cookie}"})
            except Exception as e:
                print(e.args)
                old_uid = uid
                uid = None  # no cookie found: clear the global so page1_1 reports it
                return render(request, 'page1-2.html',
                              {'cookie_display': f"{now} uid {old_uid} 不存在cookie"})

        # Simulated login
        if 'get_cookie' in request.POST:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            try:
                cookie = loop.run_until_complete(get_cookie.login(account, password, uid))
                # Save the cookie to the database
                CookieInfo(cookie=cookie, uid=uid, datetime=datetime.now().strftime("%Y-%m-%d")).save()
                return render(request, 'page1-2.html',
                              {'cookie_display': f"{now} uid {uid} cookie获取成功!\n{cookie}"})
            except Exception as e:
                print(e.args)
                return render(request, 'page1-2.html', {'cookie_display': f"{now}获取异常"})
            finally:
                loop.close()  # release the loop whether or not login succeeded
    return render(request, 'page1-2.html')
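The view calls an async login function from a synchronous Django view by creating its own event loop. That pattern can be tried in isolation with the sketch below. Note that fake_login is a hypothetical stand-in for get_cookie.login, whose real implementation is not shown in this post, and the cookie string it returns is made up.

```python
import asyncio


async def fake_login(account: str, password: str, uid: str) -> str:
    """Hypothetical stand-in for get_cookie.login; returns a made-up cookie string."""
    await asyncio.sleep(0)  # placeholder for the real browser or network work
    return f"SUB=demo-cookie-for-{uid}"


def fetch_cookie_sync(account: str, password: str, uid: str) -> str:
    """Run the async login to completion from synchronous code."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fake_login(account, password, uid))
    finally:
        loop.close()  # always release the loop, even if the login raises


if __name__ == "__main__":
    print(fetch_cookie_sync("user@example.com", "secret", "123456"))
```

Closing the loop in a finally block matters here because a failed login would otherwise leave one open loop behind per request.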

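Complete the crawler interaction

Technical difficulty: showing the crawled data on the page in real time. As the screenshot above shows, the crawler page takes several parameters, and clicking Start submits them and launches the crawler, which is a POST request. If a plain form were used here, the page would reload once for every record crawled, which is clearly unacceptable.
The page therefore relies on the sendMessage function to display the data in real time.
Approach:
1. Clicking the Start button calls the controller page1_1_send and passes it the keyword, theme, crawl start time, crawl end time, and time granularity entered on the page. The controller runs the crawler and stores the results in the database.
2. To show the crawled data on the front end, define a GET function getMessage in the front end that fetches data from the controller page1_1_get and sets the values of the front-end fields. At this point the page displays data correctly but does not yet update on its own in real time.
3. Introduce sessionStorage to achieve real-time display, and set the data to be stored at the end of getMessage.
4. Switching pages causes errors, so that case must be handled as well, so that the crawler keeps running normally after a page switch.
Use the $(document).ready method, which does not wait for every element on the page to finish loading before running its handler (window.onload does). A flag is needed to tell whether the page was switched.
The cases are:
Page switched: request page1_1_get via Ajax to fetch the first batch of data;
Page switched and the Stop button was clicked: tell the user that collection has stopped;
Page switched and the Stop button was not clicked: show the first batch of data on the page, then call getMessage to request page1_1_get again for real-time display.
5. Clicking the Stop button sets the flag isStop=true; the $(document).ready handler checks the state of isStop automatically.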
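The start, poll, and stop split in the steps above can be sketched on the server side without any web framework. Everything in the sketch is an assumption for illustration: MonitorTask and fake_crawl are hypothetical names, and an in-memory queue stands in for the Redis queue and database of the real project. The actual page1_1_send and page1_1_get views are not shown in this post.

```python
import queue
import threading


class MonitorTask:
    """Run a crawl function in a background thread and buffer its records."""

    def __init__(self, crawl_fn):
        self._crawl_fn = crawl_fn          # callable that yields one dict per post
        self._results = queue.Queue()      # stand-in for the Redis queue / database
        self._stop = threading.Event()     # plays the role of the isStop flag
        self._thread = None

    def start(self, **params):
        """What a page1_1_send-style view would call with the posted form values."""
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, kwargs=params, daemon=True)
        self._thread.start()

    def _run(self, **params):
        for record in self._crawl_fn(**params):
            if self._stop.is_set():
                break
            self._results.put(record)

    def poll(self):
        """What a page1_1_get-style view would return: next record, or None."""
        try:
            return self._results.get_nowait()
        except queue.Empty:
            return None

    def stop(self):
        """What the Stop button would trigger."""
        self._stop.set()

    def join(self, timeout=None):
        """Wait for the background thread to finish (used here for the demo)."""
        if self._thread is not None:
            self._thread.join(timeout)


def fake_crawl(keyword, theme):
    """Hypothetical crawler that yields three made-up records."""
    for i in range(3):
        yield {"keyword": keyword, "theme": theme, "n": i}


if __name__ == "__main__":
    task = MonitorTask(fake_crawl)
    task.start(keyword="台风梅花", theme="demo")
    task.join(timeout=2)
    while (record := task.poll()) is not None:
        print(record)
```

Because the crawl runs in its own thread, the request that starts it can return immediately, and the page can keep polling for new records without ever reloading.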

posted @ 2024-06-20 17:58  踩坑大王