Post archive for October 3, 2023 - Zhang Bo's Blog (张博的博客)

Blog posts on RLHF for large models

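Summary: The first blog post I want to study: https://huggingface.co/blog/zh/rlhf. Breaking down RLHF: RLHF is a complex concept involving multiple models and different training stages. Here it is broken into three steps: pretraining a language model (LM); gathering question-answer data and training a reward model (RM); and using…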

posted @ 2023-10-03 23:31 by Zhang Bo's Blog

Large model quantization 4

Summary: https://huggingface.co/blog/peft Reading the code: from transformers import AutoModelForSeq2SeqLM + from peft import get_peft_model, LoraConfig, TaskType model_nam…

posted @ 2023-10-03 23:07 by Zhang Bo's Blog
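
The summary above is cut off right after importing `LoraConfig` and `get_peft_model`, so the author's actual configuration is not visible. As a rough illustration of the idea behind LoRA, here is a minimal pure-Python sketch that does not use the `peft` library at all. It assumes the usual low-rank form y = Wx + (alpha / r) * B(Ax); the helper names (`matvec`, `lora_forward`) and the sizes are hypothetical and chosen only for the example.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x).

    W stays frozen; in LoRA only the small matrices A and B would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical sizes, chosen only for illustration.
d_out, d_in, r, alpha = 8, 8, 2, 8
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # r x d_in, small random init
B = [[0.0] * r for _ in range(d_out)]                               # d_out x r, zero init
x = [1.0] * d_in

full_params = d_out * d_in        # parameters in the frozen matrix W
lora_params = r * (d_in + d_out)  # trainable parameters in A and B

print(full_params, lora_params)
print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `r * (d_in + d_out)` values would need gradients instead of `d_out * d_in`.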

For computer science and math questions, contact me on WeChat: 15122306087

Summary: For computer science and math questions, contact me on WeChat: 15122306087

posted @ 2023-10-03 13:10 by Zhang Bo's Blog

Large model quantization 3

Summary: https://huggingface.co/blog/4bit-transformers-bitsandbytes 1. 8-bit float: The FP8 (floating point 8) format was first introduced in the paper “FP8 f…

posted @ 2023-10-03 13:06 by Zhang Bo's Blog
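
The summary above breaks off before describing how FP8 lays out its bits. As a hedged illustration, the sketch below decodes one 8-bit pattern under the assumption that it uses the E4M3 variant (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7, no infinities, NaN encoded as all exponent and mantissa bits set). The truncated summary does not name this variant, and the function name `decode_e4m3` is hypothetical.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one byte as an FP8 E4M3 value (assumed layout: S EEEE MMM, bias 7)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        # E4M3 reserves only this pattern for NaN and has no infinities.
        return math.nan
    if exponent == 0:
        # Subnormal numbers: no implicit leading 1, fixed exponent of 1 - bias.
        return sign * (mantissa / 8) * 2.0 ** (-6)
    # Normal numbers: implicit leading 1.
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 7)


print(decode_e4m3(0x38))  # bits 0 0111 000
print(decode_e4m3(0x7E))  # largest finite pattern, 0 1111 110
print(decode_e4m3(0x01))  # smallest positive subnormal
```

With only 3 mantissa bits, each power-of-two interval holds just 8 representable values, which is why FP8 trades precision for memory so aggressively.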
