Post archive for January 11, 2019 - 乐乐章

January 11, 2019

Summary: Timeline: OpenAI published Trust Region Policy Optimization (TRPO). After Google DeepMind read OpenAI's work on TRPO, it released Distributed PPO on July 7, 2017, getting ahead of OpenAI. OpenAI then, in July 2017…

posted @ 2019-01-11 21:47 乐乐章 Views(6907) Comments(0) Recommendations(0)

Reinforcement Learning: A3C

Summary: Asynchronous Advantage Actor-Critic (A3C). In RL tasks, what we ultimately want to learn is the policy. Value-based methods: an indirect approach that learns a value function or an action-value function in order to obtain the po…

posted @ 2019-01-11 17:27 乐乐章阅读(880) 评论(0) 推荐(0) 编辑

119. Pascal's Triangle II

Summary: Given a non-negative index k where k ≤ 33, return the kth index row of Pascal's triangle. Note that the row index starts from 0. In Pascal's trian…

posted @ 2019-01-11 13:40 乐乐章 Views(115) Comments(0) Recommendations(0)
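
The summary above only states the problem, so here is a minimal sketch of one common way to compute a single row in place. It is not necessarily the approach taken in the full post; the function name `get_row` is a hypothetical choice for illustration.

```python
def get_row(k):
    """Return row k (0-indexed) of Pascal's triangle using O(k) extra space.

    Hypothetical helper for illustration; not taken from the original post.
    """
    row = [1] * (k + 1)
    for i in range(2, k + 1):
        # Walk right-to-left so row[j - 1] still holds the previous row's value
        # when it is added into row[j].
        for j in range(i - 1, 0, -1):
            row[j] += row[j - 1]
    return row


if __name__ == "__main__":
    print(get_row(3))
```

Updating from right to left is what allows a single list to be reused: each entry is overwritten only after its left neighbour from the previous row has been read.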

118. Pascal's Triangle

Summary: Given a non-negative integer numRows, generate the first numRows of Pascal's triangle. In Pascal's triangle, each number is the sum of the two numbers…

posted @ 2019-01-11 13:18 乐乐章 Views(102) Comments(0) Recommendations(0)
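
As a rough sketch of the problem described in the summary above, the triangle can be built row by row, with each interior entry taken as the sum of the two entries directly above it. The function name `generate` and the parameter `num_rows` are assumptions for illustration, not the code from the full post.

```python
def generate(num_rows):
    """Return the first num_rows rows of Pascal's triangle as a list of lists.

    Hypothetical helper for illustration; not taken from the original post.
    """
    triangle = []
    for i in range(num_rows):
        # Every row starts and ends with 1.
        row = [1] * (i + 1)
        # Interior entries are the sum of the two entries directly above.
        for j in range(1, i):
            row[j] = triangle[i - 1][j - 1] + triangle[i - 1][j]
        triangle.append(row)
    return triangle


if __name__ == "__main__":
    for r in generate(5):
        print(r)
```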

乐乐章

NLP / recommender systems; I'm still a novice
