乐乐章

January 8, 2019

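Summary: 1. Overview: Q-learning is a value-function-based method. Unlike policy gradient methods, Q-learning predicts a value function and then selects the action with the largest predicted value, whereas policy gradient predicts the action directly. Q-learning is a reinforcement learning method based on value-function estimation, Policy G...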

posted @ 2019-01-08 14:46 乐乐章 Views (816) Comments (0) Recommendations (0)
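To make the "predict values, then pick the best action" idea from the summary above concrete, here is a minimal tabular sketch. The table layout, the function names `greedy_action` and `q_update`, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are assumptions for illustration; the original post may use a different formulation, such as a neural network.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Tabular Q-function: q[state][action]. Hypothetical layout for illustration.
using QTable = std::vector<std::vector<double>>;

// Pick the action whose estimated value is largest in the given state.
std::size_t greedy_action(const QTable& q, std::size_t state) {
    const auto& row = q[state];
    return static_cast<std::size_t>(
        std::distance(row.begin(), std::max_element(row.begin(), row.end())));
}

// One Q-learning step:
//   Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
void q_update(QTable& q, std::size_t s, std::size_t a, double reward,
              std::size_t s_next, double alpha, double gamma) {
    const double best_next =
        *std::max_element(q[s_next].begin(), q[s_next].end());
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a]);
}
```

The policy is never stored explicitly here: it is read off the table by `greedy_action`, which is the contrast with policy gradient that the summary draws.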

January 7, 2019

71. Simplify Path

Summary: Given an absolute path for a file (Unix-style), simplify it. For example: path = "/home/" => "/home"; path = "/a/./b/../../c/" => "/c"; path = "/a/../../...

posted @ 2019-01-07 14:58 乐乐章 Views (106) Comments (0) Recommendations (0)
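One common way to approach the Simplify Path problem above is a stack of directory names. The sketch below is not necessarily the post's solution. The function name `simplify_path` is hypothetical, and it assumes that `..` at the root stays at the root and that an empty result is returned as `/`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Stack-based simplification of an absolute Unix-style path.
std::string simplify_path(const std::string& path) {
    std::vector<std::string> stack;
    std::istringstream in(path);
    std::string part;
    while (std::getline(in, part, '/')) {
        if (part.empty() || part == ".") continue;   // skip "//" and "."
        if (part == "..") {
            if (!stack.empty()) stack.pop_back();    // go up one level
        } else {
            stack.push_back(part);
        }
    }
    std::string result;
    for (const auto& dir : stack) result += "/" + dir;
    return result.empty() ? "/" : result;            // assumption: root is "/"
}
```

With this sketch, `simplify_path("/home/")` returns `"/home"` and `simplify_path("/a/./b/../../c/")` returns `"/c"`, matching the two complete examples in the summary.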

January 6, 2019

67. Add Binary

Summary: Given two binary strings, return their sum (also a binary string). The input strings are both non-empty and contain only the characters 1 or 0. Example 1...

posted @ 2019-01-06 12:14 乐乐章 Views (127) Comments (0) Recommendations (0)
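The Add Binary problem above can be handled like schoolbook addition: walk both strings from the right and carry as you go. The following is a sketch under that assumption, with the hypothetical name `add_binary`. It is not taken from the post.

```cpp
#include <algorithm>
#include <string>

// Add two non-empty binary strings digit by digit, right to left.
std::string add_binary(const std::string& a, const std::string& b) {
    std::string result;
    int i = static_cast<int>(a.size()) - 1;
    int j = static_cast<int>(b.size()) - 1;
    int carry = 0;
    while (i >= 0 || j >= 0 || carry != 0) {
        int sum = carry;
        if (i >= 0) sum += a[i--] - '0';
        if (j >= 0) sum += b[j--] - '0';
        result.push_back(static_cast<char>('0' + sum % 2));  // current bit
        carry = sum / 2;                                     // carry to next bit
    }
    std::reverse(result.begin(), result.end());  // bits were produced LSB first
    return result;
}
```

Working on characters avoids converting to an integer type, so the inputs can be longer than 64 bits.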

January 5, 2019

Reinforcement Learning: Policy Gradient

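Summary: Policy Gradient overview: Policy Gradient learns from the current environment and directly outputs the probability of each action. It does not update after every step; it can only update its parameters once a full episode has been played. The function that chooses actions is the same one being evaluated, so it is an on-policy method. Policy Gradient requires...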

posted @ 2019-01-05 20:59 乐乐章 Views (1946) Comments (0) Recommendations (0)
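Two pieces of the description above can be sketched without a full training loop. The first is the per-step return, which can only be computed once the episode has ended. The second is the gradient of the log-probability of the chosen action. The sketch assumes a softmax policy over logits and a discount factor `gamma`; the post's summary specifies neither, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards
// after the episode is over.
std::vector<double> discounted_returns(const std::vector<double>& rewards,
                                       double gamma) {
    std::vector<double> returns(rewards.size());
    double running = 0.0;
    for (std::size_t t = rewards.size(); t-- > 0;) {
        running = rewards[t] + gamma * running;
        returns[t] = running;
    }
    return returns;
}

// Action probabilities from logits (assumed softmax policy; logits non-empty).
std::vector<double> softmax(const std::vector<double>& logits) {
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double total = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);  // shift for stability
        total += probs[i];
    }
    for (double& p : probs) p /= total;
    return probs;
}

// Gradient of log pi(action) with respect to the logits:
// one_hot(action) - probs.
std::vector<double> grad_log_prob(const std::vector<double>& logits,
                                  std::size_t action) {
    std::vector<double> grad = softmax(logits);
    for (double& g : grad) g = -g;
    grad[action] += 1.0;
    return grad;
}
```

In a REINFORCE-style update, each step's `grad_log_prob` would be scaled by the matching entry of `discounted_returns` and accumulated over the episode.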

59. Spiral Matrix II

Summary: Given a positive integer n, generate a square matrix filled with elements from 1 to n^2 in spiral order. Example: Input: 3 Output: [ [ 1, 2, 3 ], [ 8, ...

posted @ 2019-01-05 11:22 乐乐章 Views (99) Comments (0) Recommendations (0)
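A common way to fill the matrix described above is to keep four shrinking boundaries and write one edge at a time. This is a sketch of that approach under the hypothetical name `generate_spiral`. The post itself may solve it differently.

```cpp
#include <vector>

// Fill an n x n matrix with 1..n*n in clockwise spiral order.
std::vector<std::vector<int>> generate_spiral(int n) {
    std::vector<std::vector<int>> m(n, std::vector<int>(n, 0));
    int top = 0, bottom = n - 1, left = 0, right = n - 1;
    int value = 1;
    while (top <= bottom && left <= right) {
        for (int c = left; c <= right; ++c) m[top][c] = value++;    // top edge
        ++top;
        for (int r = top; r <= bottom; ++r) m[r][right] = value++;  // right edge
        --right;
        if (top <= bottom) {                                        // bottom edge
            for (int c = right; c >= left; --c) m[bottom][c] = value++;
            --bottom;
        }
        if (left <= right) {                                        // left edge
            for (int r = bottom; r >= top; --r) m[r][left] = value++;
            ++left;
        }
    }
    return m;
}
```

For `n = 3` this produces the rows `1 2 3`, `8 9 4`, `7 6 5`, which agrees with the visible start of the example output.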

January 4, 2019

54. Spiral Matrix (剑指Offer, problem 19)

Summary: Given a matrix of m x n elements (m rows, n columns), return all elements of the matrix in spiral order. Example 1: Input: [ [ 1, 2, 3 ], [ 4, 5, 6 ], ...

posted @ 2019-01-04 11:51 乐乐章 Views (198) Comments (0) Recommendations (0)
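Reading a rectangular matrix in spiral order needs extra care when a single row or column is left, hence the two boundary checks in the sketch below. The function name `spiral_order` is hypothetical, and returning an empty result for an empty matrix is an assumption.

```cpp
#include <vector>

// Return the elements of an m x n matrix in clockwise spiral order.
std::vector<int> spiral_order(const std::vector<std::vector<int>>& matrix) {
    std::vector<int> out;
    if (matrix.empty() || matrix[0].empty()) return out;  // assumption: empty in, empty out
    int top = 0, bottom = static_cast<int>(matrix.size()) - 1;
    int left = 0, right = static_cast<int>(matrix[0].size()) - 1;
    while (top <= bottom && left <= right) {
        for (int c = left; c <= right; ++c) out.push_back(matrix[top][c]);
        ++top;
        for (int r = top; r <= bottom; ++r) out.push_back(matrix[r][right]);
        --right;
        if (top <= bottom) {  // a row is still left to read right-to-left
            for (int c = right; c >= left; --c) out.push_back(matrix[bottom][c]);
            --bottom;
        }
        if (left <= right) {  // a column is still left to read bottom-to-top
            for (int r = bottom; r >= top; --r) out.push_back(matrix[r][left]);
            ++left;
        }
    }
    return out;
}
```

For the 2 x 3 matrix with rows `1 2 3` and `4 5 6`, the sketch returns `1 2 3 6 5 4`.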

January 3, 2019

58. Length of Last Word

Summary: Given a string s consisting of upper/lower-case letters and space characters ' ', return the length of the last word in the string. If the last word ...

posted @ 2019-01-03 20:15 乐乐章 Views (122) Comments (0) Recommendations (0)
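A backward scan is a simple fit for the problem above. The summary is cut off before it says what to do when no last word exists, so the sketch assumes that case returns 0 and that trailing spaces are ignored. The name `length_of_last_word` is hypothetical.

```cpp
#include <string>

// Length of the last space-delimited word, scanning from the end.
int length_of_last_word(const std::string& s) {
    int i = static_cast<int>(s.size()) - 1;
    while (i >= 0 && s[i] == ' ') --i;  // assumption: skip trailing spaces
    int length = 0;
    while (i >= 0 && s[i] != ' ') {     // count characters of the last word
        ++length;
        --i;
    }
    return length;                      // assumption: 0 when there is no word
}
```

Under these assumptions, `length_of_last_word("Hello World")` returns 5.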

C++ string split

Summary: #include <regex> #include <vector> std::vector<std::string> s_split(const std::string& in, const std::string& delim) { std::regex re{ delim }; // calls std::vector::vector (InputIterator first, InputIterator last,const all...

posted @ 2019-01-03 20:04 乐乐章 Views (3724) Comments (0) Recommendations (0)
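The snippet in the summary above is cut off, so here is one way the function could be completed. It uses `std::sregex_token_iterator` and the iterator-range constructor of `std::vector` that the original comment refers to. Everything after the `std::regex re{ delim };` line is an assumption about the rest of the post's code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split `in` on every match of the regular expression `delim`.
std::vector<std::string> s_split(const std::string& in,
                                 const std::string& delim) {
    std::regex re{delim};
    // Submatch index -1 selects the text between matches
    // instead of the matches themselves.
    std::sregex_token_iterator first(in.begin(), in.end(), re, -1);
    std::sregex_token_iterator last;  // default-constructed end iterator
    // Range constructor: std::vector(InputIterator first, InputIterator last).
    return std::vector<std::string>(first, last);
}
```

Because `delim` is treated as a regular expression, a call like `s_split("a  b", "\\s+")` splits on runs of whitespace, while characters such as `.` or `|` need escaping to be matched literally.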

December 30, 2018

Neural Network Backpropagation: An Intuitive Explanation

Summary: Prerequisites: the sigmoid function, LR, a 1-layer neural network. dL/dz is abbreviated as dz_, and L(a, y) uses cross-entropy. da_ = dL/da = (-(y/a) + ((1-y)/(1-a))); dz_ = dL/da * da/dz = da_ * g'(z); dw_ = dL/dz * dz/dw = dz_ * x ...

posted @ 2018-12-30 16:54 乐乐章 Views (512) Comments (0) Recommendations (0)
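The three formulas in the summary above can be checked numerically for a single sigmoid neuron. The sketch assumes a scalar input `x`, the pre-activation `z = w * x + b`, and `g'(z) = a * (1 - a)` for the sigmoid. The bias `b`, the struct `Gradients`, and the function names are hypothetical and do not appear in the summary.

```cpp
#include <cmath>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Gradients named after the summary's da_, dz_, dw_.
struct Gradients {
    double da;
    double dz;
    double dw;
};

// Backward pass for one sigmoid neuron with cross-entropy loss L(a, y).
Gradients backward(double w, double b, double x, double y) {
    const double z = w * x + b;  // assumed forward pass
    const double a = sigmoid(z);
    Gradients g;
    g.da = -(y / a) + (1.0 - y) / (1.0 - a);  // da_ = dL/da
    g.dz = g.da * a * (1.0 - a);              // dz_ = da_ * g'(z)
    g.dw = g.dz * x;                          // dw_ = dz_ * x
    return g;
}
```

Multiplying out `da_ * a * (1 - a)` simplifies to `a - y`, so `dz` can be compared against that value as a sanity check. For example, with `w = 0`, `b = 0`, `y = 1`, the activation is 0.5 and `dz` is -0.5.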

December 26, 2018

大话设计模式 in C++: Memento Pattern

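Summary: Memento: without violating encapsulation, capture an object's internal state and save it outside the object, so that the object can later be restored to the saved state. Roles: (1) Originator: creates the box and restores the box. It is responsible for creating a Memento that records its internal state at the current moment, and it can use the memento to restore that state. O...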

posted @ 2018-12-26 15:44 乐乐章 Views (409) Comments (0) Recommendations (0)
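The Originator and Memento roles described above can be sketched in a few lines. The state is a plain string here, and the member names `set_state`, `save`, and `restore` are hypothetical. The post's own classes are not visible in the summary.

```cpp
#include <string>
#include <utility>

// Memento: stores a snapshot that outside code cannot read.
class Memento {
public:
    explicit Memento(std::string state) : state_(std::move(state)) {}

private:
    friend class Originator;  // only the Originator may read the snapshot
    std::string state_;
};

// Originator: creates mementos and restores itself from them.
class Originator {
public:
    void set_state(std::string state) { state_ = std::move(state); }
    const std::string& state() const { return state_; }

    Memento save() const { return Memento(state_); }  // capture current state
    void restore(const Memento& memento) { state_ = memento.state_; }

private:
    std::string state_;
};
```

Keeping `state_` private in `Memento` and granting access only through `friend` is one way to honor the "without violating encapsulation" requirement: code that holds the memento can pass it around but cannot inspect or change the saved state.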

NLP / recommender systems. I'm still a beginner.
