Proj CDeepFuzz Paper Reading: MDPFuzz: Testing Models Solving Markov Decision Processes
Abstract
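Background: A Markov decision process (MDP) is a mathematical model of sequential decision making. Machine learning has produced many solutions to MDPs, but these solutions have not been rigorously tested, or are not truly reliable (Q?)
This paper: MDPFuzz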
Github: https://github.com/Qi-Pang/MDPFuzz
Task: fuzz models solving MDPs
Method:
- Oracle: whether the target model enters abnormal and dangerous states
- A mutated state is kept if it reduces the reward or forms a new state sequence
- Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM) are used to evaluate the freshness of a state sequence
- States with high potential of revealing crashes are prioritized by estimating the local sensitivity of the target model over states
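To make the Method bullets above concrete, here is a minimal sketch of such a fuzzing loop on a toy 1-D MDP. Everything in it is an assumption for illustration and is not taken from the MDPFuzz code: the toy `policy` with a planted bug, the `rollout`, `sensitivity`, `feature` and `fuzz` helpers, and all thresholds. In particular, freshness is estimated with a single running Gaussian (`RunningGaussian`) as a simple stand-in for the paper's GMM + DynEM.

```python
import math
import random

CRASH_LIMIT = 5.0  # toy oracle: |x| beyond this counts as a crash


def policy(x):
    # Hypothetical target model: steers toward 0, with a planted bug for x > 2.
    return 1.0 if x > 2.0 else -0.5 * x


def rollout(x0, horizon=30):
    """Return (cumulative reward, visited states, crashed) for initial state x0."""
    x, total, states = x0, 0.0, []
    for _ in range(horizon):
        states.append(x)
        if abs(x) > CRASH_LIMIT:
            return total, states, True
        total -= abs(x)  # reward is higher the closer the agent stays to 0
        x += policy(x)
    return total, states, False


def sensitivity(x, eps=0.05):
    """Local sensitivity: reward change under a small perturbation of the state."""
    return abs(rollout(x + eps)[0] - rollout(x)[0]) / eps


def feature(states):
    # Scalar summary of a state sequence (an assumption of this sketch).
    return sum(states) / len(states)


class RunningGaussian:
    """Online single-Gaussian density (Welford update); stand-in for GMM + DynEM."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, v):
        self.n += 1
        delta = v - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (v - self.mean)

    def density(self, v):
        if self.n < 2:
            return 0.0  # too little data: treat every sequence as fresh
        var = max(self.m2 / (self.n - 1), 1e-6)
        return math.exp(-(v - self.mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fuzz(seeds, iterations=500, tau=0.05, rng=None):
    rng = rng or random.Random(0)
    model = RunningGaussian()
    corpus = []  # entries: (initial state, cumulative reward, sensitivity)
    for s in seeds:
        reward, states, _ = rollout(s)
        model.update(feature(states))
        corpus.append((s, reward, sensitivity(s)))
    crashes = []
    for _ in range(iterations):
        # Prioritize states where the model is locally more sensitive.
        weights = [entry[2] + 1e-9 for entry in corpus]
        parent, parent_reward, _ = rng.choices(corpus, weights=weights)[0]
        # Mutate, staying inside the valid initial-state range [-2.5, 2.5].
        mutant = min(2.5, max(-2.5, parent + rng.gauss(0.0, 0.3)))
        reward, states, crashed = rollout(mutant)
        if crashed:  # oracle fired: record the crash-triggering state
            crashes.append(mutant)
            continue
        f = feature(states)
        fresh = model.density(f) < tau  # low density = new-looking sequence
        if reward < parent_reward or fresh:
            corpus.append((mutant, reward, sensitivity(mutant)))
            model.update(f)
    return crashes, corpus


crashes, corpus = fuzz(seeds=[-1.0, 0.0, 1.0])
print(len(crashes), "crash-triggering initial states;", len(corpus), "states in corpus")
```

In this toy setup, keeping mutants that lower the reward drifts the corpus toward larger |x|, and the sensitivity weights concentrate sampling near the boundary of the buggy region, which is the intuition behind the prioritization bullet.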
Experiments:
Datasets: CARLA autonomous driving (RL), ACAS Xu aircraft collision avoidance (DNN), CARLA autonomous driving (IL), Coop Navi game (MARL), BipedalWalker game (RL)
Time: 12 hours
Results:
- Finds 80+ crash-triggering state sequences
- Retraining does not sacrifice accuracy (Q: but accuracy does not improve either?)
Discussion:
- Crash-triggering states may look normal, yet they induce different neuron activation patterns