Posts archive for January 5, 2019 - 乐乐章

January 5, 2019

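Summary: Policy Gradient overview. Policy Gradient learns from interaction with the environment and directly outputs the probability of each action it can take. In its basic Monte Carlo form (REINFORCE), Policy Gradient does not update after every step: it must wait until a full episode has been played before updating the parameters. The same policy function both selects the actions and is evaluated and improved, so it is an on-policy method. Policy Gradient requires …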

posted @ 2019-01-05 20:59 乐乐章 Views (1946) Comments (0) Recommendations (0)
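
A minimal sketch of the episode-level update described in the summary, assuming the REINFORCE (Monte Carlo policy gradient) variant. Everything here is a hypothetical illustration, not the author's code: a toy task with no state, a softmax policy over two actions, a reward of 1 for action 1 and 0 for action 0, and the names `run_episode`, `reinforce_update` and `train` along with their hyperparameters. The parameters change only after a whole episode has been sampled by the current policy, and each step's log-probability gradient is weighted by its discounted return-to-go.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, rng, length=5):
    """Play one episode with the current policy (hypothetical toy env).

    Action 1 earns reward 1.0, action 0 earns 0.0; there is no state.
    """
    probs = softmax(theta)  # stateless policy: same distribution every step
    trajectory = []
    for _ in range(length):
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if action == 1 else 0.0
        trajectory.append((action, reward))
    return trajectory


def reinforce_update(theta, trajectory, lr=0.1, gamma=0.99):
    """One REINFORCE update, applied only after the whole episode is finished."""
    # Discounted return-to-go G_t, accumulated backward from the last step.
    returns, g = [], 0.0
    for _, reward in reversed(trajectory):
        g = reward + gamma * g
        returns.append(g)
    returns.reverse()

    probs = softmax(theta)  # the policy that generated this episode
    new_theta = list(theta)
    for (action, _), g_t in zip(trajectory, returns):
        for a in range(len(theta)):
            # d/d theta_a of log softmax(theta)[action] = 1[a == action] - pi(a)
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            new_theta[a] += lr * g_t * grad_log
    return new_theta


def train(episodes=200, seed=0):
    """Alternate sampling an episode and updating; returns final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(episodes):
        trajectory = run_episode(theta, rng)  # on-policy: sampled with current theta
        theta = reinforce_update(theta, trajectory)
    return softmax(theta)
```

Calling `train()` returns the final probabilities of the two actions; the probability of the rewarded action 1 should rise well above 0.5. This sketch uses no baseline, so its gradient estimates are noisy; subtracting a baseline from `g_t` is the usual way to reduce that variance.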

59. Spiral Matrix II

Summary: Given a positive integer n, generate a square matrix filled with the elements from 1 to n² in spiral order. Example: Input: 3 Output: [ [ 1, 2, 3 ], [ 8, …

posted @ 2019-01-05 11:22 乐乐章 Views (99) Comments (0) Recommendations (0)
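
A minimal Python sketch of one common approach to this problem (not necessarily the one used in the full post): fill the matrix layer by layer. Each pass writes the top row left to right, the right column top to bottom, the bottom row right to left and the left column bottom to top, then shrinks the four boundaries. The function name `generate_matrix` is chosen here for illustration.

```python
def generate_matrix(n):
    """Return an n x n matrix filled with 1..n*n in clockwise spiral order."""
    matrix = [[0] * n for _ in range(n)]
    top, bottom, left, right = 0, n - 1, 0, n - 1
    value = 1
    while top <= bottom and left <= right:
        # Top row, left to right.
        for col in range(left, right + 1):
            matrix[top][col] = value
            value += 1
        top += 1
        # Right column, top to bottom.
        for row in range(top, bottom + 1):
            matrix[row][right] = value
            value += 1
        right -= 1
        # Bottom row, right to left (skipped if the layer was a single row).
        if top <= bottom:
            for col in range(right, left - 1, -1):
                matrix[bottom][col] = value
                value += 1
            bottom -= 1
        # Left column, bottom to top (skipped if the layer was a single column).
        if left <= right:
            for row in range(bottom, top - 1, -1):
                matrix[row][left] = value
                value += 1
            left += 1
    return matrix


print(generate_matrix(3))  # [[1, 2, 3], [8, 9, 4], [7, 6, 5]]
```

Each cell is written exactly once, so the sketch runs in O(n²) time, and the boundary checks keep the center of odd-sized matrices from being overwritten.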

乐乐章

NLP/Recommendation, and I'm still a beginner
