Aritra Mitra
Assistant Professor
NC State University
Modern large-scale learning paradigms rely on leveraging data from multiple agents to improve performance. However, to reap the benefits of more data, one must account for the possibility of outliers that can be generated adversarially. While the challenge of adversarial robustness has been well-explored in the context of supervised learning, little to nothing is known in this regard when it comes to sequential decision-making. To fill this gap, we investigate one of the simplest settings in reinforcement learning: a linear stochastic bandit setup involving M agents who can collaborate via a central server to minimize regret. A fraction of these agents is adversarial and can act arbitrarily, leading to the following tension: while collaboration can potentially reduce regret, it can also disrupt the process of learning due to adversaries. We provide a fundamental understanding of this tension by designing new robust algorithms that balance the exploration-exploitation trade-off via carefully constructed robust confidence intervals. When the fraction of corrupted agents is small, our algorithms enjoy a clear benefit of collaboration despite adversaries. Using an information-theoretic argument, we also prove a matching lower bound, providing the first set of tight, near-optimal regret bounds for collaborative linear bandits with adversaries.