Real-Time Control of Multi-Agent Networks using Hierarchical Reinforcement Learning
This project investigates a hierarchical reinforcement learning based control scheme for extreme-scale multi-agent swarms. Control actions are taken based on low-rank information extracted from data instead of models, with the control goal decomposed into local (microscopic) and global (macroscopic) reward functions. Local controllers are designed via private group learning, and global controllers via model reduction and averaging. Deep learning algorithms are used to train recurrent neural networks to predict sparsely structured projections following disturbance events. The approach will be tested in simulation on joint target-tracking and interception using swarms of ground and air vehicles: ground vehicles form teams that autonomously learn a local reinforcement learning control, while air vehicles serve as higher-level coordinators that learn a global reinforcement learning control using low-resolution data. NC State will provide a benchmark for these simulations.
Sponsor
US Army - Army Research Laboratory
The grant—running from September 20, 2021 to September 19, 2023—is for a total of $304,024.
Principal Investigators
Alexandra Duel-Hallen
Aranya Chakrabortty
More Details
In the current state of the art, data-driven, machine-learning-based control of large-scale, complex multi-agent network systems is largely bottlenecked by the curse of dimensionality. Even the simplest linear quadratic regulator (LQR) design demands cubic numerical complexity in real time, and the problem becomes harder still when the network model is unknown, since additional learning time must then be accommodated.

In this project, we take a different approach and investigate a hierarchical reinforcement learning based control scheme for extreme-scale multi-agent swarm networks, in which control actions are taken based on low-rank information extracted from data instead of models. The approach is to decompose a control objective into multiple smaller hierarchies: group-level microscopic controls that can be learned using dense but local data only, and a broad system-level macroscopic control that steers the swarm in its desired direction using only high-level sparse data. Each hierarchy will have its own learning loop, with the control goal of the network decomposed accordingly into local (microscopic) and global (macroscopic) reward functions. Local controllers will be designed via private group learning, and global controllers via model reduction and averaging. Sparse controller structures will be imposed on top of the local controllers to reduce their communication complexity. Deep learning algorithms trained on historical events will be used to build recurrent neural networks that can rapidly predict these sparsely structured projections following any disturbance event in the network.

One driving example, which will be used as a benchmark for simulations, is joint target-tracking and interception using swarms of ground vehicles and air vehicles. The ground vehicles will be divided into teams that are autonomously formed through learning.
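To make the dimensionality bottleneck concrete, a minimal sketch (illustrative only, not the project's method) of a discrete-time LQR gain computed by iterating the Riccati equation is shown below; every iteration involves n-by-n matrix products and a linear solve, each costing O(n^3) in the state dimension n. The system matrices and sizes here are arbitrary toy values.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR via fixed-point iteration of the Riccati equation.
    Each step performs n x n matrix products and an m x m solve -- the
    O(n^3) per-step cost that motivates low-rank, hierarchical alternatives."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)   # O(n^3)-class solve
        P = Q + A.T @ P @ (A - B @ K)               # O(n^3) matrix products
    return K, P

# Toy stable system with randomly chosen input matrix (illustrative values)
rng = np.random.default_rng(0)
n, m = 20, 4
A = 0.95 * np.eye(n) + 0.01 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)

K, P = dlqr_gain(A, B, Q, R)
# A stabilizing gain makes the closed-loop spectral radius less than 1
rho = max(abs(np.linalg.eigvals(A - B @ K)))
```

For a swarm with thousands of agents, n grows with the number of agents, so even this simplest design quickly becomes impractical in real time, which is the motivation for acting on low-rank projections of the data instead.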
Agents in each team will learn a local reinforcement learning control that can track detailed microscopic dynamics of a target while respecting individual formation constraints. The air vehicles, on the other hand, will serve as higher-level coordinators that learn a global reinforcement learning control using low-resolution and low-rank data to provide a macroscopic view of the target motion in terms of the movement of centroids.
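The two-level reward decomposition described above can be sketched as follows. This is a hypothetical, minimal illustration with invented reward shapes and point-mass agents, not the project's actual design: each ground team earns a local, microscopic reward for holding its formation around the target, while the air coordinator sees only team centroids, an averaging-based reduction of the dense agent data, and earns a global, macroscopic reward for driving those centroids toward the target.

```python
import numpy as np

def local_reward(team_positions, offsets, target):
    """Microscopic reward for one ground team: negative formation error.
    Each agent should sit at target + its assigned formation offset."""
    desired = target + offsets
    return -np.linalg.norm(team_positions - desired, axis=1).sum()

def global_reward(teams, target):
    """Macroscopic reward for the air coordinator, computed from low-rank
    (centroid) information only -- the coordinator never sees raw agent data."""
    centroids = np.array([p.mean(axis=0) for p in teams])
    return -np.linalg.norm(centroids - target, axis=1).sum()

target = np.array([5.0, 5.0])
offsets = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])  # triangle formation
team = target + offsets          # a team that is perfectly in formation

r_local = local_reward(team, offsets, target)   # zero formation error
r_global = global_reward([team], target)
```

In the full hierarchy, each team's learning loop would maximize its own `local_reward` using dense local measurements, while the coordinator's loop maximizes `global_reward` using only the centroid stream, so neither level needs the other's detailed data.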
