Junfei_Wang - Cnblogs (博客园)

January 5, 2020

Regularization from Large Weights Perspective

Summary: Problem with Large Weights. Large weights in a neural network are a sign of overfitting. A network with large weights has very likely learned the stati…

posted @ 2020-01-05 08:11 Junfei_Wang

November 6, 2019

Trustworthy Machine Learning Paper Index Page

Summary: Paper [1]: A white-box neural network attack, in which adversaries have full access to the model. Gradient descent is used to go back and update the input so that r…

posted @ 2019-11-06 00:11 Junfei_Wang

August 22, 2019

Generative Model vs Discriminative Model

Summary: In this post, we compare two types of machine learning models, generative models and discriminative models, whose underlying ideas are…

posted @ 2019-08-22 10:54 Junfei_Wang

August 14, 2019

State Function Approximation: Linear Function

Summary: In the previous posts, we used different techniques to build and keep updating state-action tables. But it is impossible to do the same thing when the…

posted @ 2019-08-14 04:19 Junfei_Wang

July 31, 2019

Temporal-Difference Control: SARSA and Q-Learning

Summary: SARSA. The SARSA algorithm also estimates action-value functions rather than the state-value function. The difference between SARSA and Monte Carlo is: SARSA do…

posted @ 2019-07-31 21:52 Junfei_Wang

July 30, 2019

Temporal-Difference Learning for Prediction

Summary: In Monte Carlo learning, we obtained an estimate of the value function: G_t is the episode return from time t, which can be calculated by: Please recall,…

posted @ 2019-07-30 11:01 Junfei_Wang

July 29, 2019

Monte Carlo Control

Summary: Problem of State-Value Function. Similar to Policy Iteration in model-based learning, Generalized Policy Iteration will be used in Monte Carlo Control.

posted @ 2019-07-29 11:12 Junfei_Wang

July 21, 2019

Monte Carlo Policy Evaluation

Summary: Model-Based and Model-Free. In the previous several posts, we mainly talked about model-based reinforcement learning. The biggest assumption for Model-…

posted @ 2019-07-21 11:34 Junfei_Wang

July 19, 2019

Value Iteration Algorithm for MDP

Summary: Value-Iteration Algorithm: For each iteration k+1: a. calculate the optimal state-value function for all s∈S; b. repeat until the algorithm converges. end up wi…

posted @ 2019-07-19 10:15 Junfei_Wang
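
The value-iteration loop summarized in this entry can be sketched roughly as follows. This is only a minimal illustration, not the code from the post itself: the function name `value_iteration`, the array layout of `P` and `R`, the discount `gamma`, the tolerance `tol`, and the two-state toy MDP are all assumptions made for the example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Sketch of value iteration for a small tabular MDP.

    P: array of shape (A, S, S), P[a, s, t] = probability of moving s -> t under action a
    R: array of shape (S, A), expected immediate reward for taking action a in state s
    (Both layouts are assumptions for this example.)
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Step a: Bellman optimality backup for every state s.
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        # Step b: stop once the largest change falls below the tolerance.
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Read off a greedy policy from the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Hypothetical toy MDP: action 0 stays put, action 1 switches state.
# Staying in state 1 pays a reward of 1; everything else pays 0.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 0.0],  # state 0
    [1.0, 0.0],  # state 1
])

V, policy = value_iteration(P, R)
print(V, policy)
```

On this toy problem the greedy policy should switch out of state 0 and then stay in state 1, with values approaching 9 and 10 under the assumed discount of 0.9.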

July 13, 2019

Policy Improvement and Policy Iteration

Summary: From the last post, we know how to evaluate a policy. But that's not enough, because the purpose of policy evaluation is to improve policies so that f…

posted @ 2019-07-13 10:45 Junfei_Wang

Rhys_Wang
