[Pinned] Can Self-Play Preference Optimization (SPO) Reward the Model?
posted @ 2025-08-22 11:07 limingqi Views (100) Comments (0) Recommendations (0)