3 Large Language Models - Post Category - fariver

[PaperReading] DeepSeek-OCR: Contexts Optical Compression

Summary: Contents | DeepSeek-OCR: Contexts Optical Compression | TL;DR | Method | DeepEncoder | DeepDecoder | Data | Experiment | Summary and Thoughts | Related Links. DeepSeek-OCR: Contexts Optical Compression link ... Read more

posted @ 2025-10-21 22:49 fariver Views(79) Comments(0) Recommendations(0)

[PaperReading] Qwen3 Technical Report

Summary: Contents | Qwen3 Technical Report | TL;DR | Architecture | Method | Pre-training | Post-training | Long-CoT Cold Start | Thinking Mode Fusion | Stage 2 Reasoning RL and Stage 4 General RL ... Read more

posted @ 2025-08-02 13:58 fariver Views(110) Comments(0) Recommendations(0)

[PaperReading] KIMI K2: OPEN AGENTIC INTELLIGENCE

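Summary: Contents | KIMI K2: OPEN AGENTIC INTELLIGENCE | TL;DR | Method | QK-Clip | In Transformer attention, what is the attention logits explosion problem? | Why can QKClip solve the attention logits explosion problem? | Algorithm | Pre-t... Read more

posted @ 2025-08-01 21:53 fariver Views(388) Comments(0) Recommendations(0)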

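[Thoughts] Reinforcement Learning on LLM

Summary: Igniting the Reasoning Revolution: From PPO to GRPO, How Reinforcement Learning Is Reshaping Large Language Models. Introduction: When Reinforcement Learning Meets Large Language Models. In recent years, large language models (LLMs) have swept through the field of artificial intelligence at unprecedented speed. However, although pretrained LLMs are highly knowledgeable, their outputs often fail to fully align with human values and the needs of specific tasks. To solve this "alignment" problem, a new technical paradigm, based ... Read more

posted @ 2025-07-22 21:44 fariver Views(596) Comments(0) Recommendations(0)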

[PaperReading] KIMI K1.5: SCALING REINFORCEMENT LEARNING WITH LLMS

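Summary: Contents | KIMI K1.5: SCALING REINFORCEMENT LEARNING WITH LLMS | TL;DR | Method | RL Prompt Set Construction | Long-CoT Supervised Fine-Tuning | Reinforcement Learning Algorithm | Length Penalty | Sampling Strategy | Vision Data | Long2short CoT Model | Model... Read more

posted @ 2025-07-21 20:37 fariver Views(153) Comments(0) Recommendations(0)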

[PaperReading] DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Summary: Contents | DAPO: An Open-Source LLM Reinforcement Learning System at Scale | TL;DR | Background | Method | Clip-Higher | Dynamic Sampling | Overlong Reward Shaping | Experiment | Summary and Thoughts... Read more

posted @ 2025-07-20 18:58 fariver Views(90) Comments(0) Recommendations(0)

[PaperReading] QWENLONG-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Summary: Contents | QWENLONG-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | TL;DR | Motivation | suboptimal training efficiency | unstable optimizati... Read more

posted @ 2025-07-20 15:07 fariver Views(46) Comments(0) Recommendations(0)

[PaperReading] Training language models to follow instructions with human feedback

Summary: Contents | Training language models to follow instructions with human feedback | TL;DR | Method | Dataset | Model | Supervised fine-tuning | Reward modeling (RM) | Reinforcement Lea... Read more

posted @ 2025-07-17 21:58 fariver Views(142) Comments(0) Recommendations(0)

[PaperReading] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Summary: Contents | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | TL;DR | Method | Experiment | Summary and Thoughts | Related Links. DeepSeek-R1: Incentivizing Reasonin... Read more

posted @ 2025-07-15 20:28 fariver Views(62) Comments(0) Recommendations(0)

[PaperReading] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Summary: Contents | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | TL;DR | Method | Data Collection | DeepSeekMath-Base 7B Training and Evaluation | Reinforcement... Read more

posted @ 2025-07-11 20:08 fariver Views(171) Comments(0) Recommendations(0)

[Paper Reading] DeepSeek-V3 Technical Report

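Summary: Contents | DeepSeek-V3 Technical Report Explained | TL;DR | Advantages | Training Data | Parameter Count | Method | Architecture | MLA (Multi-Head Latent Attention) | DeepSeekMoE | MoE | DeepSeekMoE | MTP (Multi-Token Prediction) | Infrastructure | FP8 Training | Deployment | Pre... Read more

posted @ 2025-02-02 19:08 fariver Views(1298) Comments(0) Recommendations(0)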

[Paper Reading] DRIVEVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

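Summary: DRIVEVLM: The Convergence of Autonomous Driving and Large Vision-Language Models. DriveVLM. Date: 24.02. Institution: Tsinghua University && Li Auto. TL;DR: For real-world deployment of autonomous driving today, the main ... Read more

posted @ 2024-08-07 16:45 fariver Views(378) Comments(0) Recommendations(0)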

[Paper Reading] KOSMOS: Language Is Not All You Need: Aligning Perception with Language Models

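Summary: Title: KOSMOS: Language Is Not All You Need: Aligning Perception with Language Models. Date: 23.05. Institution: Microsoft. TL;DR: A large language model that takes multimodal information as input, which the authors call a Multimodal Large Language Model (MLLM); it can ... Read more

posted @ 2024-03-27 00:12 fariver Views(112) Comments(0) Recommendations(0)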
