NLP/Deep Learning - Post Category - morein2008

Splitting long text in Python

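Summary: Approach: set a maximum allowed chunk length. First cut a chunk of at most that length, then check whether the chunk contains a natural semantic separator such as a comma, period, exclamation mark, question mark, or space. If it does, split the chunk further at that separator; otherwise, the chunk is final. def cut_text(full_text, max_chunk_size= …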
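The approach described above can be sketched as follows. This is a minimal illustration, not the post's own cut_text (whose body is truncated in this listing): the name split_long_text, the default max_chunk_size of 20, and the separator set are all assumptions made for the example.

```python
# Hypothetical helper illustrating the idea; not the original cut_text.
SEPARATORS = "，。！？,.!? "  # assumed set of natural split points


def split_long_text(full_text, max_chunk_size=20, separators=SEPARATORS):
    """Split full_text into chunks of at most max_chunk_size characters."""
    if max_chunk_size < 1:
        raise ValueError("max_chunk_size must be at least 1")
    chunks = []
    start = 0
    while start < len(full_text):
        # Step 1: take a candidate chunk of at most max_chunk_size characters.
        chunk = full_text[start:start + max_chunk_size]
        # Step 2: if more text follows, look for the last separator in the chunk.
        if start + len(chunk) < len(full_text):
            cut = max(chunk.rfind(sep) for sep in separators)
            if cut >= 0:
                # Cut right after the separator so it stays with its sentence.
                chunk = chunk[:cut + 1]
        # Step 3: with no separator, the candidate chunk is kept as-is.
        chunks.append(chunk)
        start += len(chunk)
    return chunks


text = "Hello world, this is a long sentence. It keeps going!"
chunks = split_long_text(text, max_chunk_size=20)
```

Joining the chunks reproduces the input exactly, which makes the function easy to sanity-check on real data.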

posted @ 2025-08-04 15:31 morein2008

text2audio

Summary: Text-to-speech (TTS) resources. Code: https://github.com/coqui-ai/TTS Online text-to-speech: https://text2audio.cc/ Audio file format conversion: https://app.xunjieshipin.com/mp3-to-wav/

posted @ 2024-12-25 18:13 morein2008

How subword tokenization affects the tag sequence in NER tasks

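Summary: https://tianchi.aliyun.com/forum/post/336310 Annotated data is usually labeled at the word level. Since each word may be further split into subtokens, the labels also have to be aligned with the subtokens. In addition, because of the input format that pretrained models require, some special tokens usually have to be added …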
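The alignment problem above can be illustrated with a small sketch. It assumes one common convention (the first subtoken keeps the word's tag, continuation subtokens and special tokens get a placeholder tag); the linked post may use a different scheme. toy_subword_tokenize is a made-up stand-in for a real WordPiece tokenizer, and align_tags and the "X" placeholder are hypothetical names.

```python
def toy_subword_tokenize(word, piece_len=3):
    """Toy stand-in for a subword tokenizer: fixed-size pieces, '##' marks continuations."""
    pieces = [word[i:i + piece_len] for i in range(0, len(word), piece_len)] or [word]
    return [pieces[0]] + ["##" + piece for piece in pieces[1:]]


def align_tags(words, tags, tokenize=toy_subword_tokenize, pad_tag="X"):
    """Expand word-level tags to subtoken level and add [CLS]/[SEP]."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    tokens, aligned = ["[CLS]"], [pad_tag]
    for word, tag in zip(words, tags):
        pieces = tokenize(word)
        tokens.extend(pieces)
        # The first piece keeps the word's tag; the rest get the placeholder.
        aligned.extend([tag] + [pad_tag] * (len(pieces) - 1))
    tokens.append("[SEP]")
    aligned.append(pad_tag)
    return tokens, aligned


words = ["Washington", "visited", "UN"]
tags = ["B-PER", "O", "B-ORG"]
tokens, aligned = align_tags(words, tags)
```

When training, positions carrying the placeholder tag are typically excluded from the loss, so only one prediction per original word is scored.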

posted @ 2023-04-17 14:25 morein2008

Offline whl Python packages for PyTorch GPU/CPU builds

Summary: https://download.pytorch.org/whl/torch_stable.html

posted @ 2022-10-19 17:12 morein2008

RuntimeError: Address already in use when training with torch on multiple GPUs

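Summary: Training with torch on GPUs fails with RuntimeError: Address already in use. Reference: https://www.it610.com/article/1279180977062559744.htm The cause is that the TCP port is already occupied. One fix is to specify a port when launching the program; the port number can be chosen freely …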
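Rather than guessing a port number, one option is to ask the operating system for a port that is currently free. The sketch below uses only the Python standard library; find_free_port is a hypothetical helper name, and there is a small race window because another process could claim the port between this check and the actual launch.

```python
import socket


def find_free_port():
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS pick any free port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


port = find_free_port()
# The value can then be passed to the launcher, e.g. as --master_port.
print(f"free port: {port}")
```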

posted @ 2022-10-14 12:03 morein2008

Multi-GPU training of torch models

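Summary: A single command is enough: python3 -m torch.distributed.launch --master_port 10001 --nproc_per_node 8 train.py Setting master_port avoids the error raised when the port is already occupied by another process; if the error occurs, set master_port to a new port number.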
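On the script side, train.py has to find out which local process (and therefore which GPU) it is. To my understanding, torch.distributed.launch passes this as a --local_rank argument in older PyTorch releases and as --local-rank or the LOCAL_RANK environment variable in newer ones, so the sketch below accepts all three. It covers only the argument handling; parse_args is a hypothetical helper, and the actual process-group setup is left out.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the local rank handed over by the launcher."""
    parser = argparse.ArgumentParser(description="hypothetical train.py entry point")
    parser.add_argument(
        "--local_rank", "--local-rank",  # accept both spellings
        type=int,
        # Fall back to the environment variable, then to 0 for single-process runs.
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    return parser.parse_args(argv)


# Simulate what one of the 8 launched processes would receive.
args = parse_args(["--local_rank=3"])
print(args.local_rank)  # 3
```

In a real script the parsed value would typically select the CUDA device before the distributed process group is initialized.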

posted @ 2022-09-01 18:08 morein2008

Fine-tuning models with the Hugging Face transformers package

Summary: transformers API reference: https://huggingface.co/docs/transformers/v4.21.2/en/training train.py from datasets import load_dataset from transformers import …

posted @ 2022-09-01 18:02 morein2008

[Repost] The mathematics behind word2vec, explained in detail

Summary: https://www.cnblogs.com/peghoty/p/3857839.html

posted @ 2022-08-23 19:18 morein2008

Understanding hierarchical softmax in word2vec

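Summary: I came across a blog post on an overseas site that explains word2vec's hierarchical softmax optimization quite well; see http://building-babylon.net/2017/08/01/hierarchical-softmax/ Takeaways: 1. Hierarchical softmax is meant to solve the problem that arises when softmax is used for V-way classification (V is …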
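As a rough illustration of the idea: in hierarchical softmax each word is a leaf of a binary tree, and its probability is the product of the binary left/right decisions along the path from the root, so only about log2(V) sigmoids are evaluated instead of a V-term normalization. The tree layout, node vectors, and hidden vector below are made-up toy values, not taken from the linked post.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy tree over V = 4 words: internal node 0 is the root, 1 and 2 are its children.
# Each path is a list of (internal_node, go_left) decisions from root to leaf.
PATHS = {
    "cat": [(0, True), (1, True)],
    "dog": [(0, True), (1, False)],
    "car": [(0, False), (2, True)],
    "bus": [(0, False), (2, False)],
}
NODE_VECTORS = {0: [0.5, -0.2], 1: [0.1, 0.8], 2: [-0.3, 0.4]}  # assumed values


def word_probability(word, hidden):
    """Multiply the branch probabilities along the word's path."""
    prob = 1.0
    for node, go_left in PATHS[word]:
        p_left = sigmoid(dot(NODE_VECTORS[node], hidden))
        prob *= p_left if go_left else 1.0 - p_left
    return prob


hidden = [1.0, 2.0]
probs = {word: word_probability(word, hidden) for word in PATHS}
total = sum(probs.values())  # the leaf probabilities form a valid distribution
```

Because the two branches at every internal node sum to one, the leaf probabilities sum to one without any explicit normalization over the vocabulary.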

posted @ 2021-03-09 16:30 morein2008

Text classification based on the LDA topic model and SVM

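Summary: Extracting text features with an LDA model and then classifying with a linear SVM gave poor results, F1=0.654. Precision: 0.680, Recall: 0.649, F1: 0.654. RandomForestClassifier also performed fairly poorly: Precision: 0.680, Recall: 0.668, F1: 0.670. Whereas just …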

posted @ 2020-12-04 20:20 morein2008

[Repost] Transformer implemented in PyTorch

Summary: https://blog.floydhub.com/the-transformer-in-pytorch/ Harvard version: http://nlp.seas.harvard.edu/2018/04/03/attention.html https://pytorch.org/docs/1.3.0/_modul …

posted @ 2020-07-10 17:08 morein2008

Getting started with deep learning: CNN and LSTM (RNN)

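Summary: 1. Understanding deep learning and CNNs: the introductory video "Understanding Deep Learning in One Day" by Professor Hung-yi Lee of Taiwan: https://www.bilibili.com/video/av16543434/ On the matrix convolution operation in CNNs: multiply the elements at the same positions of matrix 1 and matrix 2, then sum all the products to get the corresponding element of the convolution output matrix. https://bl …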
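The multiply-and-sum operation described above can be written out in a few lines of plain Python. This is a minimal sketch assuming stride 1, no padding, and no kernel flip (the usual CNN convention); conv2d_valid and the example matrices are made up for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image; each output element is a sum of elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply elements at matching positions of the window and the kernel,
            # then add up all the products.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_valid(image, kernel)
print(result)  # [[-4, -4], [-4, -4]]
```

For the top-left output element, the window [[1, 2], [4, 5]] gives 1*1 + 2*0 + 4*0 + 5*(-1) = -4.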

posted @ 2018-04-07 12:51 morein2008
