Who Is the Strongest Enemy?
Towards Optimal and Efficient Evasion Attacks in Deep RL


Yanchao Sun1     Ruijie Zheng1     Yongyuan Liang2     Furong Huang1    
1 University of Maryland College Park      2 Sun Yat-sen University
[Paper]  [Code] 

[GIFs: Hopper (PPO) and Walker (PPO) agents, clean vs. attacked]
Hopper, PPO: clean reward 3104; reward under strongest previous attack 636; reward under our PA-AD attack 171
Walker, PPO: clean reward 4630; reward under strongest previous attack 1048; reward under our PA-AD attack 813
[GIFs: Pong (DQN) and Boxing (DQN) agents, clean vs. attacked]
Pong, DQN (agent: right paddle): clean reward 21; reward under strongest previous attack -14; reward under our PA-AD attack -21
Boxing, DQN (agent: left boxer): clean reward 94; reward under strongest previous attack 69; reward under our PA-AD attack 8



Abstract


Evaluating the worst-case performance of a reinforcement learning (RL) agent under the strongest/optimal adversarial perturbations on state observations (within some constraints) is crucial for understanding the robustness of RL agents. However, finding the optimal adversary is challenging, in terms of both whether we can find the optimal attack and how efficiently we can find it. Existing works on adversarial RL either use heuristics-based methods that may not find the strongest adversary, or directly train an RL-based adversary by treating the agent as a part of the environment, which can find the optimal adversary but may become intractable in a large state space. This paper introduces a novel attacking method to find the optimal attacks through collaboration between a designed function named “actor” and an RL-based learner named “director”. The actor crafts state perturbations for a given policy perturbation direction, and the director learns to propose the best policy perturbation directions. Our proposed algorithm, PA-AD, is theoretically optimal and significantly more efficient than prior RL-based works in environments with large state spaces. Empirical results show that our proposed PA-AD universally outperforms state-of-the-art attacking methods in various Atari and MuJoCo environments. By applying PA-AD to adversarial training, we achieve state-of-the-art empirical robustness in multiple tasks under strong adversaries.


Method


Previous Approach: Existing RL-based attacks learn an end-to-end adversary policy that maps the victim's observation directly to an observation perturbation, treating the victim as part of the environment. This can find the optimal adversary in principle, but it becomes intractable when the observation space is large.
Our Approach (PA-AD): Instead of learning an end-to-end adversary policy that maps observations directly to observation perturbations, we decouple the attacking process into two simpler components, policy perturbation and state perturbation, solved by a "director" and an "actor" respectively. The director learns the optimal policy perturbation direction with RL methods, while the actor crafts adversarial states at every step so that the victim policy is perturbed towards the given direction. In this way, we obtain an attacking algorithm that is both optimal and efficient, and is especially powerful in environments with large observation spaces. A minimal sketch of one attack step is given below.
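The following sketch illustrates one director-actor attack step under stated assumptions; it is not the released implementation. The names director_policy, victim_policy, and the l_inf budget epsilon are illustrative placeholders, and the actor is approximated here with a single FGSM-style gradient step.

import torch

def pa_ad_attack_step(state, victim_policy, director_policy, epsilon):
    """One illustrative PA-AD attack step (sketch, not the official code).

    director_policy: RL-learned mapping from the clean state to a policy
        perturbation direction (a vector in the victim's policy output space).
    victim_policy:   the fixed victim network, returning pi(. | s) as a tensor.
    epsilon:         l_inf budget on the state perturbation.
    """
    s = torch.as_tensor(state, dtype=torch.float32)

    # Director: propose the policy perturbation direction for this state.
    direction = director_policy(s).detach()

    # Actor: craft a state perturbation that pushes the victim's policy output
    # towards the proposed direction, via one projected-gradient (FGSM-style) step.
    s_adv = s.clone().requires_grad_(True)
    loss = -torch.dot(victim_policy(s_adv), direction)  # maximize alignment with `direction`
    loss.backward()
    with torch.no_grad():
        s_adv = s_adv - epsilon * s_adv.grad.sign()            # descend the negative alignment
        s_adv = torch.clamp(s_adv, s - epsilon, s + epsilon)   # stay inside the l_inf budget
    return s_adv.detach()

In the actual algorithm the actor may take several such steps per state; the single-step version above is only meant to show how the director's direction drives the state perturbation.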


Results


Empirically, we verify that our PA-AD algorithm almost always outperforms existing attack methods by a large margin on various OpenAI Gym environments, including Atari and MuJoCo tasks. Below is a comparison of PA-AD against existing attack methods on seven Atari games and four MuJoCo tasks; we report the average episode reward plus/minus standard deviation of vanilla DQN, A2C, or PPO agents under different evasion attacks.
In addition to the epsilon used in the table above, we also compare PA-AD against other attackers under different choices of epsilon on three Atari games: Boxing, Pong, and RoadRunner.

Moreover, when combined with existing robust training methods, PA-AD significantly improves the robustness of the trained RL policies.
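As a rough illustration, a PA-AD adversary can be plugged into an alternating adversarial-training loop, where the agent is trained on observations perturbed by a concurrently trained director. The interfaces env, victim, director, update_victim, and update_director below are hypothetical placeholders (old Gym API assumed), not the authors' released API.

# Alternating adversarial training with a PA-AD adversary (illustrative sketch).
# `victim` is assumed callable (returning its action distribution, as in the
# attack sketch above) and to expose `.act` for sampling an action.
def robust_training_loop(env, victim, director, epsilon, num_iterations):
    for _ in range(num_iterations):
        state, done = env.reset(), False
        trajectory = []
        while not done:
            # The adversary perturbs the observation before the agent acts.
            s_adv = pa_ad_attack_step(state, victim, director, epsilon)
            action = victim.act(s_adv)
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, s_adv, action, reward))
            state = next_state
        # Alternate updates: the agent maximizes reward under attack,
        # while the director (adversary) learns to minimize the agent's reward.
        update_victim(victim, trajectory)
        update_director(director, trajectory)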



Cite This Paper


@inproceedings{
    sun2022who,
    title={Who Is the Strongest Enemy? Towards Optimal and Efficient Evasion Attacks in Deep {RL}},
    author={Yanchao Sun and Ruijie Zheng and Yongyuan Liang and Furong Huang},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=JM2kFbJvvI}
}