Post category - 3 Large Language Models

Summary: Contents: DeepSeek-OCR: Contexts Optical Compression · TL;DR · Method · DeepEncoder · DeepDecoder · Data · Experiment · Summary and Reflections · Related Links. DeepSeek-OCR: Contexts Optical Compression link …
posted @ 2025-10-21 22:49 fariver Views (79) Comments (0) Recommended (0)
Summary: Contents: Qwen3 Technical Report · TL;DR · Architecture · Method · Pre-training · Post-training · Long-CoT Cold Start · Thinking Mode Fusion · Stage 2 Reasoning RL and Stage 4 General RL …
posted @ 2025-08-02 13:58 fariver Views (110) Comments (0) Recommended (0)
Summary: Contents: KIMI K2: OPEN AGENTIC INTELLIGENCE · TL;DR · Method · QK-Clip · In Transformer attention, what is the attention logits explosion problem? · Why can QKClip solve the attention logits explosion problem? · Algorithm · Pre-t…
posted @ 2025-08-01 21:53 fariver Views (388) Comments (0) Recommended (0)
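Summary: Igniting the Reasoning Revolution: From PPO to GRPO, How Reinforcement Learning Is Reshaping Large Language Models. Introduction: When Reinforcement Learning Meets Large Language Models. In recent years, large language models (LLMs) have swept through the field of artificial intelligence at unprecedented speed. However, although pretrained LLMs are highly knowledgeable, their outputs often fail to fully match human values and the requirements of specific tasks. To address this "alignment" problem, a new technical paradigm …
posted @ 2025-07-22 21:44 fariver Views (596) Comments (0) Recommended (0)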
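Summary: Contents: KIMI K1.5: SCALING REINFORCEMENT LEARNING WITH LLMS · TL;DR · Method · RL Prompt Set Construction · Long-CoT Supervised Fine-Tuning · Reinforcement Learning Algorithm · Length Penalty · Sampling Strategy · Vision Data · Long2short CoT Model · Model…
posted @ 2025-07-21 20:37 fariver Views (153) Comments (0) Recommended (0)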
Summary: Contents: DAPO: An Open-Source LLM Reinforcement Learning System at Scale · TL;DR · Background · Method · Clip-Higher · Dynamic Sampling · Overlong Reward Shaping · Experiment · Summary and …
posted @ 2025-07-20 18:58 fariver Views (90) Comments (0) Recommended (0)
Summary: Contents: QWENLONG-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning · TL;DR · Motivation · suboptimal training efficiency · unstable optimizati…
posted @ 2025-07-20 15:07 fariver Views (45) Comments (0) Recommended (0)
Summary: Contents: Training language models to follow instructions with human feedback · TL;DR · Method · Dataset · Model · Supervised fine-tuning · Reward modeling (RM) · Reinforcement Lea…
posted @ 2025-07-17 21:58 fariver Views (142) Comments (0) Recommended (0)
Summary: Contents: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning · TL;DR · Method · Experiment · Summary and Reflections · Related Links. DeepSeek-R1: Incentivizing Reasonin…
posted @ 2025-07-15 20:28 fariver Views (62) Comments (0) Recommended (0)
Summary: Contents: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models · TL;DR · Method · Data Collection · DeepSeekMath-Base 7B Training and Evaluation · Reinforcement …
posted @ 2025-07-11 20:08 fariver Views (171) Comments (0) Recommended (0)
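Summary: Contents: A Reading of the DeepSeek-V3 Technical Report · TL;DR · Strengths · Training Data · Parameter Count · Method · Architecture · MLA (Multi-Head Latent Attention) · DeepSeekMoE · MoE · DeepSeekMoE · MTP (Multi-Token Prediction) · Infrastructure · FP8 Training · Deployment · Pre…
posted @ 2025-02-02 19:08 fariver Views (1298) Comments (0) Recommended (0)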
Summary: DRIVEVLM: The Convergence of Autonomous Driving and Large Vision-Language Models. DriveVLM. Date: 24.02. Institution: Tsinghua University && Li Auto. TL;DR: For deploying autonomous driving in practice today, the main …
posted @ 2024-08-07 16:45 fariver Views (378) Comments (0) Recommended (0)
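Summary: Title: KOSMOS: Language Is Not All You Need: Aligning Perception with Language Models. Date: 23.05. Institution: Microsoft. TL;DR: A large language model that takes multimodal information as input, which the authors call a Multimodal Large Language Model (MLLM); it can …
posted @ 2024-03-27 00:12 fariver Views (112) Comments (0) Recommended (0)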