Natural Language Processing - Post Category - wxiaoli

Setting up a Python environment for NLP

Summary: ## conda environment: https://www.cnblogs.com/wxiaoli/p/8830989.html ## Installing libraries from a mirror: python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple [libname] Essential libraries

posted @ 2020-06-11 16:55 wxiaoli Views (502) Comments (0) Recommendations (0)

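Handy Python helper functions for NLP text cleaning

Summary: 1. Cleaning out miscellaneous stray characters 2. Removing @mentions of other users from social media text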
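The two cleaning steps listed in this summary could look roughly like the sketch below. The function names and regular expressions are illustrative assumptions, not the functions from the post itself; which characters count as "stray" depends on the corpus.

```python
import re

# Hypothetical patterns, chosen for illustration only.
# A mention is "@" followed by word characters, not preceded by a word
# character (so e-mail addresses such as a@b.com are left alone).
MENTION_RE = re.compile(r"(?<!\w)@[\w\-]+[:：]?")
# Control characters, zero-width space and byte-order mark.
CONTROL_RE = re.compile(r"[\u0000-\u0008\u000b\u000c\u000e-\u001f\u007f\u200b\ufeff]")
WHITESPACE_RE = re.compile(r"\s+")


def remove_mentions(text):
    """Replace @mentions with a space."""
    return MENTION_RE.sub(" ", text)


def clean_text(text):
    """Drop stray control characters and @mentions, then collapse whitespace."""
    text = CONTROL_RE.sub("", text)
    text = remove_mentions(text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_text("@alice  hello\u200b world @bob_2"))
```

Note that `\w` also matches Chinese characters in Python 3, so a mention that is not followed by a space or punctuation will swallow the text after it; a stricter pattern may be needed for such data.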

posted @ 2019-09-27 19:53 wxiaoli Views (718) Comments (0) Recommendations (0)

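#Paper reading# Universal language model fine-tuning for text classification

Summary: Paper link: https://aclweb.org/anthology/P18-1031 Summary of the paper: it studies training techniques for the whole process of pretraining an LM on a general corpus and then transferring the resulting model to text classification. The starting point of these techniques is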

posted @ 2019-06-26 16:01 wxiaoli Views (552) Comments (0) Recommendations (0)

#Paper reading# Attention is all you need

Summary: Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. 2017: 5998-6008. The paper proposes, based purely on atten
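As a rough illustration of the attention operation at the core of the cited paper, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. It is a single head with no masking and no learned projections, and the function names are assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query: softmax(q.K^T / sqrt(d_k)) applied to V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# A zero query scores every key equally, so the output is the mean of the values.
print(scaled_dot_product_attention([[0.0, 0.0]],
                                   [[1.0, 0.0], [0.0, 1.0]],
                                   [[1.0, 2.0], [3.0, 4.0]]))
```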

posted @ 2018-11-06 12:23 wxiaoli Views (4644) Comments (0) Recommendations (1)

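Garbled Chinese text when reading files in Python

Summary: Update: a blog post with a more detailed and comprehensive explanation: https://www.cnblogs.com/zhangqigao/p/6496172.html I recently started working with Chinese text, and reading files sometimes produces garbled characters. Cause: the encoding used to write the file and the decoding used to read it do not match. So the fix is to decode correctly, which breaks down into: 1. work out the encoding of the file in question; 2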
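One possible way to act on "decode correctly" is to read the raw bytes and try a short list of candidate encodings in order. This is only a sketch: the helper names and the candidate list (`utf-8`, then `gb18030`, a superset of GBK) are assumptions, and the second step of the original post is cut off in this summary, so this is not necessarily the author's method.

```python
def decode_with_fallback(raw, candidates=("utf-8", "gb18030")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def read_text(path, candidates=("utf-8", "gb18030")):
    """Read a file as bytes and decode it with the first encoding that works."""
    with open(path, "rb") as f:
        return decode_with_fallback(f.read(), candidates)


# GBK-encoded bytes are not valid UTF-8 here, so the fallback is used.
text, used = decode_with_fallback("中文".encode("gbk"))
print(text, used)
```

A decode that succeeds is not proof that the guess was right, since some byte sequences are valid in several encodings; putting the strictest encoding (UTF-8) first reduces false matches.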

posted @ 2017-10-19 21:31 wxiaoli Views (13713) Comments (0) Recommendations (0)

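<tf-idf + cosine similarity> Computing the similarity between articles

Summary: Background: (1) tf-idf. The guiding idea behind using a word's TF-IDF value to measure its importance in a document: if a word is relatively rare overall but appears many times in this article, it probably reflects what is distinctive about the article and is exactly the keyword we are looking for. tf–idf is the product of two statistics, term fre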
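The idea can be sketched in a few lines of standard-library Python. The weighting used here (tf = count / document length, idf = log(N / df)) is one common variant and is an assumption; the post may use a different one, and the function names are made up for this example. Documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dictionary."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [["cat", "sat", "mat"], ["cat", "sat", "mat"], ["dog", "barked", "loudly"]]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]), cosine_similarity(vecs[0], vecs[2]))
```

With this idf variant a term that occurs in every document gets weight zero, which matches the intuition in the summary that common words say little about any one article.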

posted @ 2017-06-04 15:37 wxiaoli Views (11952) Comments (0) Recommendations (1)

Basic metrics in NLP tasks (precision and recall)

Summary: >> The following is based on Wikipedia. https://en.wikipedia.org/wiki/Precision_and_recall Precision = (true positive)/(selected elements) = tp/(tp+fp), which indicates, among the samples predicted as positive, the truly
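A small sketch of the computation for binary labels follows. The precision formula is the one quoted in the summary; recall = tp/(tp+fn) is the standard definition from the same Wikipedia page and is not visible in the truncated summary. The function name and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# tp = 2, fp = 1, fn = 1, so both values are 2/3.
print(precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```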

posted @ 2017-05-16 21:39 wxiaoli Views (1930) Comments (0) Recommendations (0)

Linguistics-related notes

Summary: Knowing a word means knowing both its sound and its meaning, while being able to use a word requires four kinds of information: 1) its sounds 2) its m

posted @ 2016-11-23 20:21 wxiaoli Views (461) Comments (0) Recommendations (0)

