Archive: 07 2023

Summary: ```py from collections import deque from urllib.parse import urljoin, urlparse import requests from pyquery import PyQuery as pq import re from EpubCr
posted @ 2023-07-28 00:31 绝不原创的飞龙 Views(14) Comments(0) Recommended(0)
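Summary: ## Rolling mean and standard deviation To make better use of vectorization for speed, the sliding windows are extracted with `np.lib.stride_tricks.sliding_window_view(x, win)`, which returns an array of all the length-`win` arrays that start at each `x[i]`. ```py def rolling(x, win): r = np.
posted @ 2023-07-12 15:43 绝不原创的飞龙 Views(265) Comments(0) Recommended(0)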
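The post's own `rolling` function is cut off in the summary above, so the following is only a minimal sketch of the windowing idea it describes, not the author's code. The name `rolling_mean_std` and the choice of the population standard deviation (`ddof=0`) are assumptions made for illustration; `sliding_window_view` requires NumPy 1.20 or later.

```py
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_mean_std(x, win):
    """Hypothetical helper: rolling mean and std over windows of length `win`."""
    x = np.asarray(x, dtype=float)
    # Shape (len(x) - win + 1, win): row i is the window x[i : i + win].
    windows = sliding_window_view(x, win)
    # Reduce along the window axis; std uses NumPy's default ddof=0.
    return windows.mean(axis=-1), windows.std(axis=-1)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mean, std = rolling_mean_std(x, 3)
print(mean)  # one value per window: [1,2,3], [2,3,4], [3,4,5]
print(std)
```

The result has `len(x) - win + 1` entries, one per full window, so it is shorter than the input; padding the front to align it with `x` is left to the caller.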
Summary: ``` score = ( class_weight + name_weight + children_comma_count + 1 + min(children_text_len // , 3) ) / (1 - link_density) ``` (1) Main-content elements, that is, elements that can only appear in the main content,
posted @ 2023-07-11 11:12 绝不原创的飞龙 Views(19) Comments(0) Recommended(0)
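The score formula in the summary above can be sketched as a small function. The divisor applied to `children_text_len` is missing from the summary, so it is exposed here as a required argument, `text_len_divisor`, rather than guessed. The function name `node_score` and the sample values are hypothetical, and the `100` passed below is an arbitrary placeholder, not the post's value.

```py
def node_score(class_weight, name_weight, children_comma_count,
               children_text_len, link_density, text_len_divisor):
    """Hypothetical sketch of the summary's score formula."""
    # The formula divides by (1 - link_density), so a density of 1 is undefined.
    if not 0 <= link_density < 1:
        raise ValueError("link_density must be in [0, 1)")
    # Text length contributes at most 3 points; the divisor is caller-supplied
    # because the summary does not show it.
    length_bonus = min(children_text_len // text_len_divisor, 3)
    base = class_weight + name_weight + children_comma_count + 1 + length_bonus
    return base / (1 - link_density)


# Illustrative values only.
score = node_score(class_weight=25, name_weight=5, children_comma_count=4,
                   children_text_len=250, link_density=0.2,
                   text_len_divisor=100)
print(score)
```

Dividing by `1 - link_density` raises the score as the link density grows, which is how the formula reads as written in the summary; the guard only rejects the undefined case.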
Summary: ```py #!/usr/bin/env python from __future__ import print_function import logging import re import sys from lxml.etree import tounicode from lxml.etree
posted @ 2023-07-10 19:14 绝不原创的飞龙 Views(32) Comments(0) Recommended(0)
Summary: ```py from lxml.html import tostring import lxml.html import re from .cleaners import normalize_spaces, clean_attributes from .encoding import get_enc
posted @ 2023-07-10 17:54 绝不原创的飞龙 Views(17) Comments(0) Recommended(0)
Summary: ## `browser.py` ```py def open_in_browser(html): """ Open the HTML document in a web browser, saving it to a temporary file to open it. Note that this
posted @ 2023-07-10 17:48 绝不原创的飞龙 Views(20) Comments(0) Recommended(0)
