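douzujun - 博客园 (cnblogs)

May 7, 2022

Summary: 剑指 Offer 45. Arrange an array into the smallest number. Given an array of non-negative integers, concatenate all of its numbers into one number and print the smallest of all the numbers that can be formed. Example 1: Input: [10,2] Output: "102" Example 2: Input: [3,30,34,5,9] Output: "3033459" Note: 0 < nums.l Read more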
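
The excerpt stops before any solution, so here is a minimal sketch of one common approach rather than the post's own code: sort the numbers as strings with a custom comparison that puts x before y whenever x + y is smaller than y + x. The function name `min_number` is made up for illustration.

```python
from functools import cmp_to_key


def min_number(nums):
    """Return the smallest number (as a string) formed by concatenating nums."""
    strs = [str(n) for n in nums]

    # x should come before y when the concatenation x+y is smaller than y+x.
    # Both concatenations have the same length, so string order equals numeric order.
    def compare(x, y):
        return (x + y > y + x) - (x + y < y + x)

    strs.sort(key=cmp_to_key(compare))
    return "".join(strs)


print(min_number([10, 2]))            # 102
print(min_number([3, 30, 34, 5, 9]))  # 3033459
```

Both printed values match the two examples quoted in the summary.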

posted @ 2022-05-07 18:12 douzujun Views (233) Comments (0) Recommendations (0)

April 7, 2022

Deduplicated output with awk

Summary: alias myuniq="awk 'BEGIN{a[\"\"]=1}{if(!(\$0 in a)){print \$0;a[\$0]=1}}'" awk 'BEGIN{a[""]=1}{if(!($0 in a)){print $0;a[$0]=1}}' Read more
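
For comparison, the same idea can be sketched in Python: keep the first occurrence of each line, in input order. Like the awk version, which pre-seeds a[""]=1, this sketch never emits empty lines. The function name `unique_lines` is hypothetical.

```python
def unique_lines(lines):
    """Return lines with later duplicates removed, preserving first-seen order."""
    seen = {""}  # pre-seeded like a[""]=1 in the awk script, so empty lines are dropped
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


print(unique_lines(["a", "b", "a", "", "c", "b"]))  # ['a', 'b', 'c']
```

Unlike `sort | uniq`, neither version needs sorted input, at the cost of holding every distinct line in memory.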

posted @ 2022-04-07 21:21 douzujun Views (74) Comments (0) Recommendations (0)

March 19, 2022

NLP interview notes

Summary: Machine learning, deep learning, Python, C++ Read more

posted @ 2022-03-19 18:49 douzujun Views (102) Comments (0) Recommendations (0)

March 2, 2022

Regular expression assertions

Summary: https://www.runoob.com/w3cnote/reg-lookahead-lookbehind.html Extract a four-digit number that is neither preceded nor followed by a digit: query = "12356/2022安徽光伏政策2222" re.findall('(?<!\d)\d{4}(?!\d)', qu Read more
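
A runnable version of the snippet, assuming the truncated second argument is the `query` variable defined just before it. The negative lookbehind `(?<!\d)` and the negative lookahead `(?!\d)` reject any four-digit window that sits inside a longer run of digits.

```python
import re

query = "12356/2022安徽光伏政策2222"

# (?<!\d)  the match must not be preceded by a digit
# \d{4}    exactly four digits
# (?!\d)   the match must not be followed by a digit
matches = re.findall(r'(?<!\d)\d{4}(?!\d)', query)
print(matches)  # ['2022', '2222']
```

The five-digit run 12356 yields nothing, because every four-digit slice of it has a digit on one side.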

posted @ 2022-03-02 17:16 douzujun Views (62) Comments (0) Recommendations (0)

March 1, 2022

Multi-GPU parallel training

Summary: https://www.i4k.xyz/article/Sophia_11/119950262 Read more

posted @ 2022-03-01 17:14 douzujun Views (51) Comments (0) Recommendations (0)

February 23, 2022

PSI calculation

Summary: Basic concepts: https://zhuanlan.zhihu.com/p/344754828 import sys import pandas as pd import numpy as np import math # all_list = [] # df = pd.DataFrame(columns Read more
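
The post's script is cut off in this excerpt, so the following is not the author's code. It is a minimal sketch of the usual population stability index formula, PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%), taking pre-binned counts as input. The names `psi` and `eps` are hypothetical, and flooring empty bins at `eps` is an assumption made here to avoid log(0).

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index over two aligned lists of per-bin counts."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Convert counts to proportions; floor at eps so the log is defined.
        e_pct = max(e / expected_total, eps)
        a_pct = max(a / actual_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


print(psi([10, 20, 30], [10, 20, 30]))  # identical distributions give 0.0
print(round(psi([50, 50], [40, 60]), 4))
```

Each term is non-negative, so the index is zero only when the two binned distributions agree.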

posted @ 2022-02-23 18:04 douzujun Views (228) Comments (0) Recommendations (0)

Creating random data with pandas

Summary: import numpy as np date = ['20210912', '20210922', '20211009', '20211102'] new_date = [] for i in range(100): new_date.extend(date) new_data = [] for Read more

posted @ 2022-02-23 15:47 douzujun Views (221) Comments (0) Recommendations (0)

February 16, 2022

Search algorithm framework

Summary: https://mp.weixin.qq.com/s/97tl37JTZTsID7qPcdjIpg Read more

posted @ 2022-02-16 17:39 douzujun Views (74) Comments (0) Recommendations (0)

January 20, 2022

XPath usage

Summary: https://www.cnblogs.com/lei0213/p/7506130.html Read more

posted @ 2022-01-20 19:22 douzujun Views (32) Comments (0) Recommendations (0)

January 5, 2022

Merging two files with awk

Summary: https://blog.csdn.net/weixin_34032792/article/details/86010299 The file contents are as follows: more eng.txt chi.txt :::::::::::::: eng.txt :::::::::::::: semicolon comma deli Read more

posted @ 2022-01-05 15:19 douzujun Views (219) Comments (0) Recommendations (0)

November 29, 2021

cut usage

Summary: https://www.cnblogs.com/f-ck-need-u/p/7521357.html Read more

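posted @ 2021-11-29 12:27 douzujun Views (37) Comments (0) Recommendations (0)

November 15, 2021

Filtering with compound conditions in awk

Summary: cat result.tmp.case_t |awk -F'\t' '(($3==2) || ($3==3)) && ($4 < 0.4) {print}' Splits columns on '\t' and prints the rows whose third column is 2 or 3 and whose fourth column is < 0.4. Read more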
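
The same filter can be sketched in Python for cases where the logic needs to grow beyond a one-liner. The function name `filter_rows` is hypothetical. One deliberate difference, stated as an assumption: this sketch skips rows that are too short or whose third or fourth column is not numeric, whereas awk would fall back to its own comparison rules.

```python
def filter_rows(lines):
    """Keep tab-separated rows whose 3rd column is 2 or 3 and whose 4th column is < 0.4."""
    kept = []
    for line in lines:
        row = line.rstrip("\n")
        cols = row.split("\t")
        if len(cols) < 4:
            continue  # assumption: ignore rows with fewer than four columns
        try:
            third, fourth = float(cols[2]), float(cols[3])
        except ValueError:
            continue  # assumption: ignore rows with non-numeric values
        if third in (2, 3) and fourth < 0.4:
            kept.append(row)
    return kept


rows = ["a\tb\t2\t0.1", "a\tb\t3\t0.5", "a\tb\t1\t0.1", "a\tb\t3\t0.39"]
print(filter_rows(rows))  # keeps the first and last rows
```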

posted @ 2021-11-15 15:17 douzujun Views (168) Comments (0) Recommendations (0)

November 8, 2021

Fast reading of data from a pipe

Summary: import sys for line in sys.stdin: line = line.strip('\n\r').split('\t') print("{0}\t{1}\1{2}\1{3}\1{4}\t{5}".format(line[0], line[2], line[3], line[4] Read more

posted @ 2021-11-08 18:25 douzujun Views (59) Comments (0) Recommendations (0)

Shuffling all lines of a file in the shell

Summary: shuf input_file.txt -o output_file.txt Read more
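
Where `shuf` is not available, a rough Python equivalent is short. This is only a sketch: the function name `shuffle_lines` and its `seed` parameter are made up here, and the whole input is held in memory.

```python
import random


def shuffle_lines(lines, seed=None):
    """Return a new list with the given lines in random order."""
    rng = random.Random(seed)  # pass a seed to make the order reproducible
    shuffled = list(lines)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled


original = ["line 1", "line 2", "line 3", "line 4"]
print(shuffle_lines(original, seed=0))
```

To mirror the command above, read the lines from input_file.txt, pass them through the function, and write the result to output_file.txt.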

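posted @ 2021-11-08 17:45 douzujun Views (190) Comments (0) Recommendations (0)

October 21, 2021

Recommender system evaluation metrics

Summary: Recommender system evaluation metrics https://www.cnblogs.com/eilearn/p/14164687.html PNR (Positive Negative Rate), the ratio of concordant to discordant pairs = number of concordant pairs / number of discordant pairs; AUC (Area Under Curve) ROC (Receiver Operating Ch Read more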
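
The PNR definition above can be sketched directly. A pair of items counts as concordant when the item with the higher label also has the higher predicted score, and as discordant otherwise. Two choices here are assumptions, not something the excerpt specifies: pairs tied on label or on score are skipped, and the result is infinity when there are no discordant pairs. The function name `pnr` is hypothetical.

```python
from itertools import combinations


def pnr(labels, scores):
    """Ratio of concordant pairs to discordant pairs."""
    concordant = discordant = 0
    for (l1, s1), (l2, s2) in combinations(zip(labels, scores), 2):
        if l1 == l2 or s1 == s2:
            continue  # assumption: ties carry no ordering information
        if (l1 - l2) * (s1 - s2) > 0:
            concordant += 1
        else:
            discordant += 1
    return concordant / discordant if discordant else float("inf")


# Three pairs are ordered correctly and one (0.8 vs 0.7) is inverted.
print(pnr([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 3.0
```

The pairwise loop is quadratic in the number of items, which is fine for a per-query list but slow for a whole dataset.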

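posted @ 2021-10-21 15:25 douzujun Views (561) Comments (0) Recommendations (0)

October 11, 2021

sed usage summary

Summary: 1 Syntax https://www.runoob.com/linux/linux-comm-sed.html The Linux sed command processes text files using a script. sed processes and edits text files according to the script's instructions. sed is mainly used to edit one or more files automatically, simplify repetitive operations on files, write conversion programs, and so on. 1.1 Syn Read more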

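posted @ 2021-10-11 10:13 douzujun Views (61) Comments (0) Recommendations (0)

Finding lines that appear in file A but not in file B with the shell

Summary: Find the lines that appear in file A but not in file B with the shell: cat A B B |sort |uniq -u > output.txt Read more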
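
The trick works because B is fed in twice: every line of B then occurs at least twice, so `uniq -u`, which prints only lines occurring exactly once, can never emit it. A side effect is that a line repeated within A is dropped as well. The sketch below mirrors that behavior in Python. The function name `lines_only_in_a` is hypothetical, and Python's `sorted` orders by code point, which may differ from the locale-dependent order of `sort`.

```python
from collections import Counter


def lines_only_in_a(a_lines, b_lines):
    """Lines that occur exactly once in A and never in B, in sorted order."""
    counts = Counter(a_lines)
    in_b = set(b_lines)
    return sorted(
        line for line, n in counts.items()
        if n == 1 and line not in in_b
    )


# "z" is repeated in A, so it is dropped just as uniq -u would drop it.
print(lines_only_in_a(["x", "y", "z", "z"], ["y"]))  # ['x']
```

If repeated lines in A should be kept, remove the `n == 1` condition.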

posted @ 2021-10-11 10:04 douzujun Views (368) Comments (0) Recommendations (0)

September 23, 2021

Optuna: a powerful tool for automated hyperparameter tuning

Summary: https://zhuanlan.zhihu.com/p/259993570 Read more

posted @ 2021-09-23 12:41 douzujun Views (119) Comments (0) Recommendations (0)

September 22, 2021

Contrastive learning

Summary: # coding=utf-8 """PyTorch RoBERTa model. """ import math import warnings import fitlog import torch import torch.nn as nn import torch.nn.functional a Read more

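posted @ 2021-09-22 14:28 douzujun Views (300) Comments (0) Recommendations (0)

September 20, 2021

Recall, coarse ranking, and fine ranking

Summary: Note excerpts. The index pool is a judgment over all current items: not every item is allowed to appear within the overall recommendation logic. For example, an advertiser's campaign has only a fixed budget; once the budget is spent, or the advertiser no longer wants to run the campaign, it has to be removed from the index pool. Another case is that there may be several index pools: when an advertiser does not want to target the 20-30 age group, the index pool is effectively the other age Read more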

posted @ 2021-09-20 22:14 douzujun Views (260) Comments (0) Recommendations (0)

September 9, 2021

Batch-writing Word data to txt files

Summary: # In[1] import os from docx import Document dir_lists = os.listdir() for dir in dir_lists: if os.path.isdir(dir): # print(dir) words_lst = os.listdir( Read more

posted @ 2021-09-09 22:18 douzujun Views (265) Comments (0) Recommendations (0)

September 7, 2021

Splitting a PDF with Python

Summary: from PyPDF2 import PdfFileReader, PdfFileWriter def split(path, name_of_split): pdf = PdfFileReader(path) pdf_writer = PdfFileWriter() for page in ran Read more

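posted @ 2021-09-07 14:46 douzujun Views (134) Comments (0) Recommendations (0)

September 5, 2021

Matching the content inside book title marks (《》) with regular expressions

Summary: import re re.findall('《(.*?)》', '《1334》qasdfa《23423》') Read more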
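
The `?` after `.*` is what makes this work: it turns the quantifier non-greedy, so each match stops at the nearest closing 》. A short comparison with the greedy form, using the same input string (the variable names are only for illustration):

```python
import re

text = '《1334》qasdfa《23423》'

# Non-greedy: each match ends at the first closing mark it meets.
titles = re.findall('《(.*?)》', text)
print(titles)  # ['1334', '23423']

# Greedy: a single match runs through to the last closing mark.
greedy = re.findall('《(.*)》', text)
print(greedy)  # ['1334》qasdfa《23423']
```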

posted @ 2021-09-05 15:09 douzujun Views (1313) Comments (0) Recommendations (0)

September 2, 2021

Processing row data with pandas (using apply)

Summary: # In[1] import os path = '/home/zjdou/jupyter/root/Smart-Writing/TextClassification/DATA' os.chdir(path) print(os.getcwd()) # In[2] import pandas as p Read more

posted @ 2021-09-02 22:11 douzujun Views (283) Comments (0) Recommendations (0)

Significance test analysis in Python

Summary: import sys import numpy as np from scipy import stats ### Normality Check # H0: data is normally distributed def normality_check(data_A, data_B, name, Read more

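posted @ 2021-09-02 15:24 douzujun Views (544) Comments (0) Recommendations (0)

August 30, 2021

Interview questions: dynamic programming

Summary: 0/1 knapsack, unbounded knapsack. 322. Coin Change: You are given an integer array coins representing coins of different denominations, and an integer amount representing a total amount of money. Compute and return the fewest number of coins needed to make up that amount. If no combination of coins can make up the amount, return -1. You may assume that you have an unlimited number of each kind of coin. Example 1: Input: coi Read more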
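
The excerpt ends before the solution, so this is a sketch of the standard unbounded-knapsack recurrence rather than the post's own code: dp[t] is the fewest coins that sum to t, and dp[t] = min(dp[t - c] + 1) over every coin c <= t. The inputs in the usage lines are picked for illustration, since the summary's own example is truncated.

```python
def coin_change(coins, amount):
    """Fewest coins needed to make `amount`, or -1 if it cannot be made."""
    unreachable = amount + 1  # larger than any valid answer
    dp = [0] + [unreachable] * amount
    for total in range(1, amount + 1):
        for coin in coins:
            # Each coin may be reused, so look back at dp[total - coin].
            if coin <= total and dp[total - coin] + 1 < dp[total]:
                dp[total] = dp[total - coin] + 1
    return dp[amount] if dp[amount] != unreachable else -1


print(coin_change([1, 2, 5], 11))  # 3  (5 + 5 + 1)
print(coin_change([2], 3))         # -1
```

The table has amount + 1 entries and each entry tries every coin, so the running time is O(amount * len(coins)).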

posted @ 2021-08-30 20:04 douzujun Views (261) Comments (0) Recommendations (0)

August 23, 2021

Improvements to quicksort

Summary: #include <iostream> #include <vector> #include <algorithm> using namespace std; int selectPartition(vector<int>& arr, int low, int high) { int mid = l Read more

posted @ 2021-08-23 22:47 douzujun Views (62) Comments (0) Recommendations (0)

August 21, 2021

Code for generating LaTeX cover pages for NLPCC papers

Summary: First generate the table: # In[1] import pandas as pd import os df = pd.read_excel('list.xlsx') ids = df['Pap ID'].to_list() lens = df['Page Length'].to_list() titles = Read more

posted @ 2021-08-21 22:13 douzujun Views (281) Comments (0) Recommendations (0)

July 8, 2021

Transformer source code explained: PyTorch edition

Summary: https://mp.weixin.qq.com/s?__biz=MzI4MDYzNzg4Mw==&mid=2247537696&idx=4&sn=4db4f54f831277c05e63b9c1df4ca75a&chksm=ebb76cf4dcc0e5e254f0b76fddcab79008837 Read more

posted @ 2021-07-08 11:06 douzujun Views (510) Comments (0) Recommendations (0)

July 5, 2021

Plotting with matplotlib and setting styles

Summary: https://github.com/garrettj403/SciencePlots Demo import numpy as np import matplotlib.pyplot as plt import matplotlib matplotlib.matplotlib_fname() de Read more

posted @ 2021-07-05 20:05 douzujun Views (409) Comments (0) Recommendations (0)

douzi
