Paper Quick-Read Notes | 2025.03




Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation

  • arxiv:
  • Source: an article I came across by chance.
  • Main content:
    • highway + LLM.

On the Role of Discount Factor in Offline Reinforcement Learning

Few-Shot Preference Learning for Human-in-the-Loop RL

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

  • arxiv: https://arxiv.org/abs/1703.03400
  • Source: this work (MAML) is the main technique used by the previous paper, few-shot preference learning. (I noticed that MAML's three authors are Chelsea Finn, Pieter Abbeel, and Sergey Levine. Wow…)
  • Main content:

DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback

Data Center Cooling System Optimization Using Offline Reinforcement Learning


Author: MoonOut

Link: https://www.cnblogs.com/moonout/p/18745325

Copyright notice: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 2.5 China Mainland license.
