Post Category - 3 Large Language Models
Abstract: Contents: DeepSeek-OCR: Contexts Optical Compression · TL;DR · Method · DeepEncoder · DeepDecoder · Data · Experiment · Summary and Thoughts · Related Links. DeepSeek-OCR: Contexts Optical Compression, link, …
Abstract: Contents: Qwen3 Technical Report · TL;DR · Architecture · Method · Pre-training · Post-training · Long-CoT Cold Start · Thinking Mode Fusion · Reasoning RL (Stage 2) and General RL (Stage 4) …
Abstract: Contents: KIMI K2: OPEN AGENTIC INTELLIGENCE · TL;DR · Method · QK-Clip · What is the attention-logits explosion problem in Transformer attention? · Why can QK-Clip solve the attention-logits explosion problem? · Algorithm · Pre-training…
Abstract: Igniting the Reasoning Revolution: From PPO to GRPO, How Reinforcement Learning Is Reshaping Large Language Models. Introduction: When Reinforcement Learning Meets Large Language Models. In recent years, large language models (LLMs) have swept the field of artificial intelligence at an unprecedented pace. However, while pre-trained LLMs are broadly knowledgeable, their outputs often fail to fully match human values and the needs of specific tasks. To solve this "alignment" problem, a new technical paradigm…
Abstract: Contents: KIMI K1.5: SCALING REINFORCEMENT LEARNING WITH LLMS · TL;DR · Method · RL Prompt Set Construction · Long-CoT Supervised Fine-Tuning · Reinforcement Learning Algorithm · Length Penalty · Sampling Strategy · Vision Data · Long2short CoT Model · Model…
Abstract: Contents: DAPO: An Open-Source LLM Reinforcement Learning System at Scale · TL;DR · Background · Method · Clip-Higher · Dynamic Sampling · Overlong Reward Shaping · Experiment · Summary and Thoughts…
Abstract: Contents: QWENLONG-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning · TL;DR · Motivation · suboptimal training efficiency · unstable optimization…
Abstract: Contents: Training language models to follow instructions with human feedback · TL;DR · Method · Dataset · Model · Supervised fine-tuning · Reward modeling (RM) · Reinforcement Learning…
Abstract: Contents: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning · TL;DR · Method · Experiment · Summary and Thoughts · Related Links. DeepSeek-R1: Incentivizing Reasoning…
Abstract: Contents: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models · TL;DR · Method · Data Collection · DeepSeekMath-Base 7B Training and Evaluation · Reinforcement…
Abstract: Contents: DeepSeek-V3 Technical Report Review · TL;DR · Advantages · Training Data · Parameter Count · Method · Architecture · MLA (Multi-Head Latent Attention) · DeepSeekMoE · MoE · DeepSeekMoE · MTP (Multi-Token Prediction) · Infrastructure · FP8 Training · Deployment · Pre…
Abstract: DRIVEVLM: The Convergence of Autonomous Driving and Large Vision-Language Models. DriveVLM. Date: 24.02. Institution: Tsinghua University && Li Auto. TL;DR: At present, the main…
Abstract: Title: KOSMOS: Language Is Not All You Need: Aligning Perception with Language Models. Date: 23.05. Institution: Microsoft. TL;DR: A large language model that takes multimodal inputs, which the authors call a Multimodal Large Language Model (MLLM); it can…
