Offline Reward Perturbation Boosts Distributional Shift in Online RL

Published in the 40th Conference on Uncertainty in Artificial Intelligence (UAI), 2024

Offline-to-online reinforcement learning has recently been shown to reduce online sample complexity by first training on offline-collected data. However, this additional data source may also invite new poisoning attacks that target offline training. In this work, we reveal such vulnerabilities by proposing a novel data poisoning attack that is stealthy in the sense that performance during offline training remains intact, while the online fine-tuning stage suffers a significant performance drop. Our method leverages techniques from bi-level optimization to promote distribution shift under offline-to-online reinforcement learning. Experiments on four environments confirm that the attack satisfies this new stealthiness requirement and remains effective with only a small budget and without white-box access to the victim model.
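At a high level, such reward-poisoning attacks can be cast as a bi-level problem: the outer level chooses a bounded perturbation of the offline rewards so as to degrade online fine-tuning performance, while the inner level models the victim's offline training on the poisoned data. The formulation below is an illustrative sketch in generic notation, not the exact objective used in the paper:

$$
\min_{\|\delta\|_{\infty} \le B} \; J_{\text{online}}\big(\pi^{*}_{\delta}\big)
\qquad \text{s.t.} \qquad
\pi^{*}_{\delta} \in \arg\max_{\pi} \; J_{\text{offline}}\big(\pi;\ \mathcal{D},\ r + \delta\big),
$$

where $\delta$ is the reward perturbation applied to the offline dataset $\mathcal{D}$, $B$ is the attack budget, $J_{\text{offline}}$ is the victim's offline training objective under the poisoned rewards $r + \delta$, and $J_{\text{online}}$ is the return obtained after online fine-tuning starts from the poisoned pre-trained policy $\pi^{*}_{\delta}$. Stealthiness corresponds to $\pi^{*}_{\delta}$ performing comparably to an unpoisoned policy during offline training, with the damage only materializing once online fine-tuning amplifies the induced distribution shift.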

Recommended citation: Yu, Z.*, Kang, S.*, & Zhang, X. (*equal contribution). (2024, July). Offline Reward Perturbation Boosts Distributional Shift in Online RL. In The 40th Conference on Uncertainty in Artificial Intelligence. https://openreview.net/pdf?id=wbwTF909Ve