Parsing an HTML page table with pandas
```python
import io

import pandas as pd

# Saved HTML of one results page (page 2 of 455 records, 15 per page) of the Dalian port ship query (船舶查询).
# A raw string keeps the "\s" in the page's JavaScript literal instead of treating it as an escape sequence.
ff = r""" <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html> <head> <title>大连口岸物流网 | 口岸全程物流综合查询系统</title> <link rel="stylesheet" type="text/css" media="all" href="/dqs/styles/themes/blue/import.css" /> <style type="text/css"> <!-- #div1{ display:block; } --> </style> <script src="/dqs/scripts/jquery-1.2.6.js" type="text/javascript"></script> <script src="/dqs/scripts/jquery.blockUI.js" type="text/javascript"></script> <script src="/dqs/scripts/jquery.tabs.pack.js" type="text/javascript"></script> <script src="/dqs/scripts/jquery.history.pack.js" type="text/javascript"></script> <script src="/dqs/scripts/jquery.metadata.js" type="text/javascript"></script> <script src="/dqs/scripts/jquery.form.js" type="text/javascript"></script> <script src="/dqs/scripts/global.js" type="text/javascript"></script> <!--[if IE]><script type="text/javascript" src="/dqs/styles/themes/blue/date_pick/jquery.bgiframe.min.js"></script><![endif]--> <script type="text/javascript" src="/dqs/styles/themes/blue/date_pick/date.js"></script> <script type="text/javascript" src="/dqs/styles/themes/blue/date_pick/jquery.datePicker.min-2.1.2.js"></script> </head> <body> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <style type=text/css> #sso_header{ font-size: 13px; height: 60px; text-align: center; } #sso_header a:hover, #header_div a:active{ color: #e00000; } #sso_header #sso_content{ height: 60px; width: 1000px; margin: 0px auto; background-image: url(http://apollo.dpn.com.cn/sso/image/sso_header_bg.gif); background-repeat: no-repeat; background-position: right top; } #sso_header #sso_sys_logo{ height: 60px; width: 640px; float: left; background-image: url(/dqs/sys_logo.gif); background-repeat: no-repeat; background-position: left top; } #sso_header #sso_user{ text-align: center; margin-top: 10px; float: right; width: 360px; } 
#sso_header #sso_function{ text-align: center; margin-top: 3px; float: right; width: 360px; } </style> <div id="sso_header"> <div id="sso_content"> <div id="sso_sys_logo"></div> <div id="sso_user"> 张秀芹 [免费注册用户默认所属组织] </div> <div id="sso_function"> <select name="select" id="_gMenu" onChange="gSwitchMenu()"> <option value="#">请选择系统</option> <option value="http://apollo.dpn.com.cn/scs/">箱管家口岸公共智能箱管系统</option> <option value="http://apollo.dpn.com.cn/dbs/">集装箱场站智能商务服务系统</option> <option value="http://apollo.dpn.com.cn/eps/">电子支付平台</option> <option value="http://dcd.dpn.com.cn/vgm/">集装箱重量验证综合服务系统</option> <option value="http://apollo.dpn.com.cn/epm/">电子支付管理平台</option> <option value="http://apollo.dpn.com.cn/sso/page/frame?ssopid=EPC">EPC系统</option> <option value="http://dcd.dpn.com.cn/dcd">危险货物网上申报系统</option> <option value="http://apollo.dpn.com.cn/iccs">冷链食品物流监控平台</option> <option value="http://apollo.dpn.com.cn/sso/page/frame?ssopid=EPC">船舶综合申报系统</option> <option value="http://ais.dpn.com.cn/ais/index.jsp">船舶动态跟踪系统</option> <option value="http://mpc.dpn.com.cn/itc/forwardservlet">多式联运协同服务系统</option> <option value="http://apollo.dpn.com.cn/jupiter/welcome.jsp">用户统一管理系统</option> <option value="http://apollo.dpn.com.cn/oms/welcome.jsp">信息定制管理系统</option> <option value="http://apollo.dpn.com.cn/twm/">码头作业信息服务系统</option> <option value="http://apollo.dpn.com.cn/spc">海运中转服务平台</option> <option value="http://apollo.dpn.com.cn/mfd">舱单在线申报系统</option> <option value="http://apollo.dpn.com.cn/nms/init.do">海关新舱单支持系统</option> <option value="http://csp.dpn.com.cn/csp/pages/index.jsp">口岸码头社区综合服务平台</option> <option value="http://ciq.dpn.com.cn/eqs-w">空港快件检验检疫申报系统</option> <option value="http://apollo.dpn.com.cn/bip">订舱指南信息发布系统</option> <option value="http://ops.dpn.com.cn/sis">大连航运指数系统</option> <option value="http://apollo.dpn.com.cn/trialfunc/welcome.jsp">会员订制服务</option> <option value="http://ops.dpn.com.cn/edi-web/">EDI报文状态跟踪平台</option> <option 
value="http://apollo.dpn.com.cn/ebillw/default.do">计费查询系统</option> <option value="http://ops.dpn.com.cn/ehc/pages/index.jsp">高速公路通行费趸缴系统</option> <option value="http://apollo.dpn.com.cn/tps/default.do">场站装箱理货商务服务系统</option> <option value="http://apollo.dpn.com.cn/dpcs">大港集箱隐患随手拍系统</option> <option value="http://ciq.dpn.com.cn/eqs-qdw">青岛空港检验检疫申报系统</option> </select> | <a target="_blank" href="http://apollo.dpn.com.cn/sso/modifypwd.action?ssouid=DLQIFAN">修改密码</a> | <a href="/dqs/ssoexit">安全退出</a> </div> </div> </div> <script type="text/javascript"> function gSwitchMenu(){ window.location.href = document.getElementById('_gMenu').value; } </script> <div id="nav"> <div class="nav_content"> <ul id="mainMenu"> <li class="nav_link"> <!-- <li class="nav_link" onMouseOver="setDiv(0);"> --> <a target="_top" href="/dqs/queryShip.do?type=init">船舶查询</a> </li> <li class="nav_link"> <!-- <li class="nav_link" onMouseOver="setDiv(1);"> --> <a target="_top" href="/dqs/queryImport.do?type=init">进口查询</a> </li> <li class="nav_link"> <!-- <li class="nav_link" onMouseOver="setDiv(2);"> --> <a target="_top" href="/dqs/queryExport.do?type=init">出口查询</a> </li> <li class="nav_link"> <a target="_top" href="/dqs/queryCusHarbDynamic.do">出口集装箱关港动态查询</a> </li> <li class="nav_link"> <a target="_top" href="/dqs/nms.do?method=hz_success_list">新舱单回执查询</a> </li> <li class="nav_link"> <a target="_top" href="/dqs/dgodCargoSearch.do">危险货物查询</a> </li> <li onmouseover="setDiv(6);"> 其他查询 </li> <li onmouseover="setDiv(7);"> 码头提箱费支付 </li> </ul> </div> </div> <div id="div6" style="display:none;"> <div id="sub"> <div class="sub_content"> <ul> <li class="sub_noline"><a target="_top" href="/dqs/itcCntrDynamicTrack.do">多式联运动态</a></li> <li class="sub_noline"><a target="_top" href="/dqs/airportWaybillQuery.do">空港运单查询</a></li> <li class="sub_noline"><a target="_top" href="/dqs/queryFreeDate.do">船公司用箱免费期查询</a></li> <li class="sub_noline"><a target="_top" 
href="/dqs/unpacking.do?method=queryUnpackingList&init=init">拆箱动态查询</a></li> </ul> </div> </div> </div> <div id="div7" style="display:none;"> <div id="sub"> <div class="sub_content"> <ul> <li class="sub_noline"><a href="http://apollo.dpn.com.cn/psc/page/index/ex" target="_blank">提箱费支付</a></li> <li class="sub_noline"><a target="_top" href="/dqs/tfp.do?method=orderDetailListInit">支付明细查询</a></li> </ul> </div> </div> </div> <div id="content"> <div id="_appWindow" style="display:none;"> <div id="_dialogWin"> <div id="dialog-header"> <div id="dialog-title"> ???message.info??? </div> <div class="dialog-close"> <a href="javascript:$.unblockUI();">X</a> </div> </div> <div id="dialog-body"> <div id="dialog-message"></div> </div> <div id="dialog-footer"></div> </div> <div id="_waitWin"> <table width="100%"> <tr> <td align="right" class="dialog-close"> <a href="javascript:$.unblockUI();">X</a> </td> </tr> <tr> <td class="waitText"> <img src='/dqs/styles/themes/blue/images/loading.gif' align="absmiddle"/> 正在提交,请稍等... </td> </tr> <tr> <td height="5"></td> </tr> </table> </div> <div id="_waitAjaxWin"> ???message.load??? 
</div> </div> <script type="text/javascript" src="/dqs/scripts/autocomplete/autocomplete.js"></script> <script type="text/javascript" src="/dqs/scripts/validate/jquery.validate.mypack.js"></script> <script type="text/javascript" src="/dqs/scripts/validate/additional-methods.js"></script> <script type="text/javascript" src="/dqs/scripts/validate/messages_cn.js"></script> <base href="http://apollo.dpn.com.cn:80/dqs/"> <style> .inp { color:red; } </style> <script type="text/javascript"> //setDiv(0); $(document).ready(function() { $("[@name=pagingBtn]").click(function() { var pageNum = $(this).attr("id"); //$("#shipForm").attr("action", "queryShip.do?pageNum="+pageNum); //lockScreen();//页面加罩 //$("#shipForm").submit(); $("#shipForm")[0].action = "/dqs/queryShip.do?pageNum="+pageNum; lockScreen();//页面加罩 $("#shipForm")[0].submit(); }); }); function toUpper(obj) { var StrLower = obj.value; obj.value = StrLower.toUpperCase(); } function query() { var cn_vsl_m = $("#cn_vsl_m").val().replace(/(^\s*)|(\s*$)/g, ""); var full_vsl_m = $("#full_vsl_m").val().replace(/(^\s*)|(\s*$)/g, ""); $("#cn_vsl_m").val(cn_vsl_m); $("#full_vsl_m").val(full_vsl_m); if($("#shipForm").valid()){ $("#shipForm").attr("action", "queryShip.do"); lockScreen();//页面加罩 $("#shipForm").submit(); } } function queryDetail(obj,containTermI,impVoyageNo,expVoyageNo) { $("#shipForm").attr("action", "queryShipDetail.do?form_key=" + obj +"&containTermI="+containTermI+"&impVoyageNo="+impVoyageNo+"&expVoyageNo="+expVoyageNo); lockScreen();//页面加罩 $("#shipForm").submit(); } </script> <div id="content"> <form id="shipForm" method="post" action="/dqs/queryShip.do"> <div class="crumb">当前位置:<a href="/dqs/queryShip.do">船舶查询</a> ><span class="red_12px">船舶查询</span></div> <div id="message" class="tips_error" style="display: none"></div> <div class="search"> <table class="table_search"> <tr> <th> <span class="tpinfo">*</span>开始时间: </th> <td> <input type="text" name="startDate" value="2023-04-09" readonly="readonly" 
id="startDate" class="date-pick {required:true}" /> </td> <th> <span class="tpinfo">*</span>结束时间: </th> <td colspan="2"> <input type="text" name="endDate" value="2023-05-09" onblur="toUpper(this)" readonly="readonly" id="endDate" class="date-pick {required:true}" /> </td> </tr> <tr> <th> 中文船名: </th> <td> <input type="text" name="cn_vsl_m" value="" onblur="toUpper(this)" id="cn_vsl_m" /> </td> <th> 英文船名: </th> <td> <input type="text" name="full_vsl_m" value="" onkeyup="javascript:this.value=this.value.toUpperCase();" onblur="toUpper(this)" id="full_vsl_m" /> </td> <td> <input type="button" value="查询" class="btn" onclick="query()"/> </td> </tr> </table> <!-- search --></div> <div class="data_info"> <div class="btn_left"> 共找到 <span class="hot">455</span> 条记录,每页显示 <span class="hot">15</span> 条记录 </div> <!-- data_info --></div> <div style="display: block" id="ming"> <table class="table_base table_body" id="id"> <tr class="thead"> <td id="left_noline"> 中文船名 </td> <td > 英文船名 </td> <td> 船舶编号 </td> <td> 船舶IMO </td> <td> 进口航次 </td> <td> 出口航次 </td> <td> 靠泊码头 </td> <td> 关区 </td> <td id="right_noline"> 船舶动态 </td> </tr> <tr class=""> <td id="left_noline"> 仁建广州 </td> <td> REN JIAN GUANG ZHOU </td> <td> A080413000003 </td> <td> </td> <td> 2309S </td> <td> 2309SS </td> <td> DCT </td> <td> 0908 </td> <td id="right_noline"> <a href="javascript:queryDetail('140970','Y','2309S','2309SS')">船舶动态</a> </td> </tr> <tr class="evenrow"> <td id="left_noline"> 新永通2 </td> <td> XIN YONG TONG 2 </td> <td> A100306000051 </td> <td> -- </td> <td> 2370 </td> <td> 2370 </td> <td> DCT </td> <td> 0908 </td> <td id="right_noline"> <a href="javascript:queryDetail('141030','Y','2370','2370')">船舶动态</a> </td> </tr> <tr class=""> <td id="left_noline"> 新永通2 </td> <td> XIN YONG TONG 2 </td> <td> A100306000051 </td> <td> -- </td> <td> 2371 </td> <td> 2371 </td> <td> DCT </td> <td> 0908 </td> <td id="right_noline"> <a href="javascript:queryDetail('141031','Y','2371','2371')">船舶动态</a> </td> </tr> <tr 
class="evenrow"> <td id="left_noline"> 新秦皇岛 </td> <td> XIN QIN HUANG DAO </td> <td> A010004000101 </td> <td> 9304784 </td> <td> 092N </td> <td> 093S </td> <td> DCT </td> <td> 0908 </td> <td id="right_noline"> <a href="javascript:queryDetail('140977','Y','092N','093S')">船舶动态</a> </td> </tr> <tr class=""> <td id="left_noline"> 昌盛集5 </td> <td> CHANG SHENG JI 5 </td> <td> A080412000029 </td> <td> </td> <td> 347N </td> <td> 347S </td> <td> DCT </td> <td> 0908 </td> <td id="right_noline"> <a href="javascript:queryDetail('140873','Y','347N','347S')">船舶动态</a> </td> </tr> <tr class="evenrow"> <td id="left_noline"> 博格 </td> <td> BERGE GROSSGLOCKNER </td> <td> B9750921 </td> <td> 9750921 </td> <td> XL2301 </td> <td> XL2302 </td> <td> 新港矿石 </td> <td> </td> <td id="right_noline"> <a href="javascript:queryDetail('230407155745935108dv','N','XL2301','XL2302')">船舶动态</a> </td> </tr> <tr class=""> <td id="left_noline"> 抚龙31号 </td> <td> FU LONG 31 HAO </td> <td> A100112000215 </td> <td> </td> <td> 2309 </td> <td> 2309 </td> <td> 北良 </td> <td> </td> <td id="right_noline"> <a href="javascript:queryDetail('230408102112774624dv','N','2309','2309')">船舶动态</a> </td> </tr> <tr class="evenrow"> <td id="left_noline"> 安吉32 </td> <td> AN JI 32 </td> <td> A010021000171 </td> <td> </td> <td> 23007N </td> <td> 23007S </td> <td> 大窑湾汽车 </td> <td> </td> <td id="right_noline"> <a href="javascript:queryDetail('230408183831691142dv','N','23007N','23007S')">船舶动态</a> </td> </tr> <tr class=""> <td id="left_noline"> 凯杰 </td> <td> NEW EMINENCE </td> <td> C9799135 </td> <td> 9799135 </td> <td> 2304A </td> <td> 2304D </td> <td> 新港 </td> <td> </td> <td id="right_noline"> <a href="javascript:queryDetail('230328140045936368dv','N','2304A','2304D')">船舶动态</a> </td> </tr> <tr class="evenrow"> <td id="left_noline"> 海丰名古屋 </td> <td> SITC NAGOYA </td> <td> B9308053 </td> <td> 9308053 </td> <td> 2327W </td> <td> 2328E </td> <td> DCT </td> <td> 0908 </td> <td id="right_noline"> <a 
href="javascript:queryDetail('140948','Y','2327W','2328E')">船舶动态</a> </td> </tr> <tr class=""> <td id="left_noline"> 飞龙 </td> <td> BIRYONG </td> <td> B9135250 </td> <td> 9135250 </td> <td> 946W </td> <td> 946E </td> <td> DDCT </td> <td> </td> <td id="right_noline"> <a href="javascript:queryDetail('141037','Y','946W','946E')">船舶动态</a> </td> </tr> <tr class="evenrow"> <td id="left_noline"> 三协之星 </td> <td> SAN XIE ZHI XING </td> <td> A070522000145 </td> <td> </td> <td> 2304A </td> <td> 2304A </td> <td> 旅顺新港 </td> <td> </td> <td id="right_noline"> <a href="javascript:queryDetail('23040708473051367dv','N','2304A','2304A')">船舶动态</a> </td> </tr> <tr class=""> <td id="left_noline"> 延展82 </td> <td> YAN ZHAN 82 </td> <td> A080413000010 </td> <td> -- </td> <td> 2338 </td> <td> 2339 </td> <td> DCT </td> <td> 0908 </td> <td id="right_noline"> <a href="javascript:queryDetail('141058','Y','2338','2339')">船舶动态</a> </td> </tr> <tr class="evenrow"> <td id="left_noline"> 长发隆 </td> <td> CHANG FA LONG </td> <td> A140012000016 </td> <td> 9442627 </td> <td> 2312 </td> <td> 2313 </td> <td> 大窑湾汽车 </td> <td> </td> <td id="right_noline"> <a href="javascript:queryDetail('230407154303248820dv','N','2312','2313')">船舶动态</a> </td> </tr> <tr class=""> <td id="left_noline"> 汉玛尼亚柏林 </td> <td> HAMMONIA BEROLINA </td> <td> B9336177 </td> <td> 9336177 </td> <td> 312NI </td> <td> 314E </td> <td> DCT </td> <td> 0908 </td> <td id="right_noline"> <a href="javascript:queryDetail('140952','Y','312NI','314E')">船舶动态</a> </td> </tr> <tr class="thead"> <td id="left_noline"> 中文船名 </td> <td > 英文船名 </td> <td> 船舶编号 </td> <td> 船舶IMO </td> <td> 进口航次 </td> <td> 出口航次 </td> <td> 靠泊码头 </td> <td> 关区 </td> <td id="right_noline"> 船舶动态 </td> </tr> </table> <div class="table_spacing"></div> <div class="action"> <span class="black_12px"> <a href="javascript:;" id="0" name="pagingBtn">第一页</a> <a href="javascript:;" id='0' name="pagingBtn">上一页</a> <a href="javascript:;" id="0" name="pagingBtn">1</a> <span class="hot">[2]</span> 
<a href="javascript:;" id="2" name="pagingBtn">3</a> <a href="javascript:;" id="3" name="pagingBtn">4</a> <a href="javascript:;" id="4" name="pagingBtn">5</a> <a href="javascript:;" id="2" name="pagingBtn">下一页</a> | <a href="javascript:;" id="30" name="pagingBtn">最后一页</a> </span> </div> </div> </form> <table> <tr> <td align="left"> 注:本查询结果仅供参考。该功能也可通过大连港壹港通 手机APP查询,下载地址为:<a target="_blank" href="http://news.dpn.com.cn/app/index.html">http://news.dpn.com.cn/app/index.html</a> </td> </tr> </table> <!-- content --></div> </div> <div id="footer" align="center"> <div id="footer" style="background-color: #ffffff"> <div class="footer_content"> <a href="#">关于我们</a> | <a href="#" >免责条款</a> | <a href="#" >问题反馈</a> <br> 技术支持:大连口岸物流网有限公司 7*24小时值班热线:400-668-5666 86-0411-82731333 </div> </div> </div> </body> </html> """

# pd.read_html returns one DataFrame per <table>; this page has four:
# 0 = "please wait" dialog, 1 = search form, 2 = ship list, 3 = footer note.
# header=0 takes the ship list's first row as column names; fillna("") turns empty cells (NaN) back into "".
# Wrapping the string in StringIO avoids the FutureWarning pandas >= 2.1 emits for literal HTML;
# encoding= only matters for bytes, files or URLs, so it is omitted for this str input.
pd_table = pd.read_html(io.StringIO(ff), header=0)[2].fillna("")
headers = pd_table.columns.tolist()

# The page repeats the header row at the bottom of the table; drop it so it is not returned as a record.
pd_table = pd_table[pd_table[headers[0]] != headers[0]]

result_list = []
for row in pd_table.values.tolist():
    # Pair each column name with this row's cells, e.g. {"中文船名": "仁建广州", "英文船名": "REN JIAN GUANG ZHOU", ...}
    result_list.append(dict(zip(headers, row)))
# Equivalent one-liner: result_list = pd_table.to_dict(orient="records")

print(len(result_list))
print(result_list[0])
```
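The pandas code above collects every `<table>` on the page, picks the third one (`[2]`), treats its first row as the header, and zips each remaining row into a dict. For environments without pandas (and its lxml or BeautifulSoup backend), the same idea can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not the author's method: `TableCollector` and `table_to_records` are hypothetical names, and the sketch assumes well-formed markup with explicit `</td>` and `</tr>` tags, no `colspan`/`rowspan` and no nested tables, which appears to hold for the ship table on this page (where index 2 should again select it).

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Hypothetical helper: gathers cell text from every <table>, grouped by row."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows, each row a list of cell strings
        self._row = None   # cells of the <tr> currently open, or None
        self._cell = None  # text fragments of the <td>/<th> currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse and strip whitespace so " 仁建广州 " becomes "仁建广州"
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside a cell, including text of nested tags such as <a>
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html, index):
    """Return rows of the index-th table as dicts keyed by that table's first row."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    header, *rows = parser.tables[index]
    # Skip rows that merely repeat the header (the ship table repeats it at the bottom)
    return [dict(zip(header, row)) for row in rows if row != header]


# Usage on a trimmed-down copy of the page structure
sample = """
<table><tr><td>X</td></tr></table>
<table>
  <tr class="thead"><td> 中文船名 </td><td> 英文船名 </td><td> 船舶IMO </td><td> 进口航次 </td></tr>
  <tr><td> 仁建广州 </td><td> REN JIAN GUANG ZHOU </td><td> </td><td> 2309S </td></tr>
  <tr><td> 新永通2 </td><td> XIN YONG TONG 2 </td><td> -- </td><td> 2370 </td></tr>
  <tr class="thead"><td> 中文船名 </td><td> 英文船名 </td><td> 船舶IMO </td><td> 进口航次 </td></tr>
</table>
"""
records = table_to_records(sample, 1)
for record in records:
    print(record)
```

Unlike `pd.read_html`, this keeps every cell as a string (an empty cell is `""` rather than `NaN`), so no `fillna` step is needed.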
Tags: python web scraping