Archive: July 2023
Summary: ```py from collections import deque from urllib.parse import urljoin, urlparse import requests from pyquery import PyQuery as pq import re from EpubCr
Read more
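Summary: ## Rolling mean and standard deviation To make better use of vectorization for speed, the sliding windows are extracted with `np.lib.stride_tricks.sliding_window_view(x, win)`, which returns an array of arrays: every length-`win` window that starts at some `x[i]`. ```py def rolling(x, win): r = np.
Read more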
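The summary above is cut off before the body of `rolling`, so the following is only a minimal sketch of the idea it describes, not the post's actual code. The function name `rolling_mean_std` and the choice of population standard deviation (NumPy's default, `ddof=0`) are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_mean_std(x, win):
    """Return the mean and std of every length-`win` window of `x`.

    Hypothetical helper: the original `rolling` is truncated in the
    summary, so this only illustrates the sliding_window_view approach.
    """
    x = np.asarray(x, dtype=float)
    # Shape (len(x) - win + 1, win): row i is x[i : i + win].
    # This is a view on x, so no window data is copied.
    windows = sliding_window_view(x, win)
    # Reduce along the window axis; std uses ddof=0 (population std).
    return windows.mean(axis=-1), windows.std(axis=-1)


mean, std = rolling_mean_std([1, 2, 3, 4, 5], 3)
```

For the input above there are three windows, `[1, 2, 3]`, `[2, 3, 4]` and `[3, 4, 5]`, so both results have length `len(x) - win + 1`. Note that the output is shorter than the input; padding it back to the original length is a separate design decision.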
Summary: ``` score = ( class_weight + name_weight + children_comma_count + 1 + min(children_text_len // , 3) ) / (1 - link_density) ``` (1) Content elements, that is, elements that can only appear in the main text,
Read more
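The scoring formula in the summary above can be sketched as a small function. The divisor after `//` is missing from the summary, so it is exposed here as a required parameter, `len_unit`, rather than guessed; that parameter name, the function name `node_score`, and the range check on `link_density` are all assumptions for illustration.

```python
def node_score(class_weight, name_weight, children_comma_count,
               children_text_len, link_density, len_unit):
    """Score a candidate node following the formula in the summary.

    `len_unit` stands in for the divisor that is elided in the source;
    it is a hypothetical parameter and has no default on purpose.
    """
    # link_density == 1 would divide by zero, so reject it up front
    # (this guard is an assumption, not part of the source formula).
    if not 0 <= link_density < 1:
        raise ValueError("link_density must be in [0, 1)")
    # Text length contributes at most 3 points.
    length_bonus = min(children_text_len // len_unit, 3)
    base = (class_weight + name_weight + children_comma_count
            + 1 + length_bonus)
    return base / (1 - link_density)


# Illustrative values only; len_unit=100 is not taken from the source.
example = node_score(25, 5, 4, 250, 0.5, len_unit=100)
```

With these made-up inputs the numerator is 25 + 5 + 4 + 1 + 2 = 37, and dividing by `1 - 0.5` gives 74.0.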
Summary: ```py #!/usr/bin/env python from __future__ import print_function import logging import re import sys from lxml.etree import tounicode from lxml.etree
Read more
Summary: ```py from lxml.html import tostring import lxml.html import re from .cleaners import normalize_spaces, clean_attributes from .encoding import get_enc
Read more
Summary: ## `browser.py` ```py def open_in_browser(html): """ Open the HTML document in a web browser, saving it to a temporary file to open it. Note that this
Read more