Proj CDeepFuzz Paper Reading: MDPFuzz: Testing Models Solving Markov Decision Processes

Abstract

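Background: A Markov decision process (MDP) is a mathematical formulation of sequential decision-making problems. Machine learning has produced many models that solve MDPs, but these models have not been rigorously tested, or are not truly reliable (Q?)
This paper: MDPFuzz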
Github: https://github.com/Qi-Pang/MDPFuzz
Task: fuzz models solving MDPs
Method:

  1. Oracle: whether the target model enters abnormal and dangerous states
  2. A mutated state is kept if it reduces the reward or forms a new state sequence
  3. Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM) are used to estimate the freshness of a state sequence
  4. States with a high potential of revealing crashes are prioritized by estimating the local sensitivity of the target model over states
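
The four steps above can be sketched as a small fuzzing loop. This is only an illustration under loud assumptions, not the code from the MDPFuzz repository: the 1-D MDP, the flawed policy, the threshold `tau`, and every function name are hypothetical, and a single running diagonal Gaussian stands in for the GMM + DynEM density estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(s0, steps=20):
    """Toy 1-D MDP with a hypothetical flawed policy (unstable for |s| >= 1).
    Returns (state sequence, cumulative reward, crashed)."""
    s, seq, total = float(s0), [float(s0)], 0.0
    for _ in range(steps):
        a = -1.6 * s if abs(s) < 1.0 else -2.3 * s
        s = s + a
        seq.append(s)
        total += -abs(s)
        if abs(s) > 5.0:          # oracle: abnormal and dangerous state
            return np.array(seq), total, True
    return np.array(seq), total, False

def pairs(seq):
    """Consecutive (s_t, s_{t+1}) pairs of a state sequence."""
    return np.stack([seq[:-1], seq[1:]], axis=1)

class RunningGaussian:
    """Online diagonal Gaussian (Welford update); a stand-in for GMM + DynEM."""
    def __init__(self, dim):
        self.n, self.mean, self.m2 = 0, np.zeros(dim), np.zeros(dim)
    def update(self, xs):
        for x in xs:
            self.n += 1
            d = x - self.mean
            self.mean = self.mean + d / self.n
            self.m2 = self.m2 + d * (x - self.mean)
    def log_density(self, xs):
        var = self.m2 / max(self.n, 1) + 1e-6
        return -0.5 * np.sum((xs - self.mean) ** 2 / var + np.log(2 * np.pi * var), axis=-1)

def freshness(density, seq):
    """Higher means the sequence is less likely under what was seen so far."""
    return -float(np.mean(density.log_density(pairs(seq))))

def sensitivity(s0, eps=1e-2):
    """Local sensitivity: change in cumulative reward under a small perturbation."""
    return abs(rollout(s0)[1] - rollout(s0 + eps)[1]) / eps

def fuzz(seeds, iterations=500, tau=0.0):
    density, corpus, crashes = RunningGaussian(2), [], []
    for s in seeds:
        seq, r, _ = rollout(s)
        density.update(pairs(seq))
        corpus.append((s, r, sensitivity(s)))
    for _ in range(iterations):
        # Step 4: sample a parent with probability proportional to its sensitivity.
        w = np.array([c[2] for c in corpus]) + 1e-9
        parent, parent_r, _ = corpus[rng.choice(len(corpus), p=w / w.sum())]
        child = parent + rng.normal(scale=0.3)     # mutate the initial state
        seq, r, crashed = rollout(child)
        if crashed:                                # step 1: oracle fired
            crashes.append(child)
            continue
        # Steps 2 and 3: keep if the reward dropped or the sequence is fresh.
        if r < parent_r or freshness(density, seq) > tau:
            corpus.append((child, r, sensitivity(child)))
            density.update(pairs(seq))
    return crashes, corpus

crashes, corpus = fuzz(seeds=[0.0, 0.3, -0.2])
print(len(crashes), "crash-triggering initial states;", len(corpus), "states in corpus")
```

In this toy setting the policy is only unstable for |s| >= 1, so every reported crash should start from such a state. Retaining mutants with lower reward steadily pushes the corpus toward that region.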

Experiments:
Datasets: CARLA autonomous driving (RL), ACAS Xu aircraft collision avoidance (DNN), CARLA autonomous driving (IL), Coop Navi game (MARL), BipedalWalker game (RL)
Time: 12 hours
Results:

  1. Found 80+ crash-triggering state sequences
  2. Retraining does not sacrifice accuracy (Q: but does it not improve accuracy either?)

Discussion:

  1. Crash-triggering states may look normal, but they induce different neuron activation patterns