Recurrent patrolling with unmanned vehicles is helping to fight migrant smuggling and trafficking networks in the Mediterranean. It also offers a valuable means of controlling and containing destructive wildfires all over the world, and of responding rapidly to distribute resources and save lives and property. Multi-Agent Frequency-Based Patrolling is the act of travelling around an area at regular intervals in order to supervise it. As an optimization problem, it has been addressed in several variants in the literature. In this project, we focus on the frequency-based patrolling problem in highly dynamic scenarios, where the targets under supervision can move and can change their priority and required visit frequency. We propose a solution rooted in a Markov Decision Process (MDP), which is used to train a model with a Deep Reinforcement Learning (DRL) approach. The RL agent aims to learn an optimal visit strategy for the patrolling problem and to communicate the critical events it observes to a central base station. We propose to evaluate the performance of the whole protocol in a simulated environment against the metrics of interest, and to validate the results of the simulation campaign with a real field experiment.
In the following, we summarise the contributions of this project:
- We design a Multi-Agent Frequency-Based Patrolling Protocol for Flying Ad Hoc Networks that addresses patrolling jointly with communication in sparse networks of flying devices. To the best of our knowledge, no previous work on the frequency-based scenario has addressed the problem through plain RL. Existing work relies instead on Partially Observable Markov Decision Processes (POMDPs), because of the intrinsic complexity of solving MDPs when the state space is too dense, which results in loosely approximated models.
- We adopt a methodology rooted in a Markov Decision Process to train a Deep Q-Network (DQN), a function approximator that outputs optimal actions given the input state vector observed by the agent. The DQN handles both patrolling and communication. This approach is innovative in that it breaks the curse of dimensionality [13] that affects tabular RL algorithms such as Q-learning, and for this reason it is becoming increasingly popular in the literature. Furthermore, the resilient state representation of the agent makes it easy to address this two-fold problem, which keeps the formulation elegant.
- We evaluate the performance of the whole protocol in a simulated environment against the metrics of interest. For the patrolling solution alone, we will measure the distance between the optimal solution and the RL solution on a simplified testbed, in order to assess the practical applicability and performance of the approach.
- We validate the results of the simulation campaign in a real field experiment, using the drones available to the research group at the Department of Computer Science at Sapienza University.
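
As a rough illustration of the MDP-based formulation outlined in the contributions above, the following Python sketch models a toy single-agent patrolling problem together with a linear stand-in for the Q-network. Everything in it is an assumption made for illustration and not the protocol's actual design: the names `PatrolEnv`, `LinearQ` and `train`, the state (idleness of each target divided by its required visit period, capped), the action (index of the next target to visit), the reward (negative priority-weighted overdue time) and all hyperparameters. A real DQN would replace the linear map with a deep network and add techniques such as experience replay [13].

```python
import numpy as np


class PatrolEnv:
    """Toy single-agent patrolling MDP (hypothetical, for illustration only)."""

    def __init__(self, periods, priorities, cap=3.0):
        self.periods = np.asarray(periods, dtype=float)        # required visit period per target
        self.priorities = np.asarray(priorities, dtype=float)  # importance of each target
        self.idleness = np.zeros_like(self.periods)            # time steps since last visit
        self.cap = cap                                         # keeps the state bounded

    def state(self):
        # Idleness normalised by the required period; values above 1 mean "overdue".
        return np.minimum(self.idleness / self.periods, self.cap)

    def step(self, action):
        # One time step elapses for every target, then the chosen target is visited.
        self.idleness += 1.0
        self.idleness[action] = 0.0
        overdue = np.maximum(self.state() - 1.0, 0.0)
        reward = -float(np.dot(self.priorities, overdue))
        return self.state(), reward


class LinearQ:
    """Linear stand-in for a deep Q-network: Q(s, .) = W s + b."""

    def __init__(self, n_state, n_actions, lr=0.01, gamma=0.9, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = np.zeros((n_actions, n_state))
        self.b = np.zeros(n_actions)
        self.lr, self.gamma = lr, gamma

    def q_values(self, s):
        return self.W @ s + self.b

    def act(self, s, epsilon):
        # Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < epsilon:
            return int(self.rng.integers(len(self.b)))
        return int(np.argmax(self.q_values(s)))

    def update(self, s, a, r, s_next):
        # One-step Q-learning target: r + gamma * max_a' Q(s', a').
        target = r + self.gamma * np.max(self.q_values(s_next))
        td_error = target - self.q_values(s)[a]
        self.W[a] += self.lr * td_error * s
        self.b[a] += self.lr * td_error
        return td_error


def train(steps=500, epsilon=0.1):
    """Run a short online training loop and return the mean reward per step."""
    env = PatrolEnv(periods=[2, 4, 8], priorities=[1.0, 1.0, 1.0])
    agent = LinearQ(n_state=3, n_actions=3)
    s, total = env.state(), 0.0
    for _ in range(steps):
        a = agent.act(s, epsilon)
        s_next, r = env.step(a)
        agent.update(s, a, r, s_next)
        s, total = s_next, total + r
    return total / steps
```

Calling `train()` runs the loop on three targets with different visit periods. A mean reward closer to zero indicates fewer overdue visits, although a linear approximator of this kind carries no convergence guarantee and is meant only to show how state, action, reward and update fit together.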
[5] H. Shakhatreh et al., "Unmanned Aerial Vehicles (UAVs): A Survey on Civil Applications and Key Research Challenges," in IEEE Access, vol. 7, pp. 48572-48634, 2019, doi: 10.1109/ACCESS.2019.2909530.
[6] Alpern, Steve, Alec Morton, and Katerina Papadaki. "Patrolling games." Operations research 59.5 (2011): 1246-1257.
[7] Y. Chevaleyre, "Theoretical analysis of the multi-agent patrolling problem," Proceedings. IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2004. (IAT 2004)., 2004, pp. 302-308, doi: 10.1109/IAT.2004.1342959.
[8] Alamdari, Soroush, Elaheh Fata, and Stephen L. Smith. "Persistent monitoring in discrete environments: Minimizing the maximum weighted latency between observations." The International Journal of Robotics Research 33.1 (2014): 138-154.
[9] K. Kalyanam, S. Manyam, A. Von Moll, D. Casbeer and M. Pachter, "Scalable and Exact MILP Methods for UAV Persistent Visitation Problem," 2018 IEEE Conference on Control Technology and Applications (CCTA), 2018, pp. 337-342, doi: 10.1109/CCTA.2018.8511587.
[10] Asghar, Ahmad Bilal, Stephen L. Smith, and Shreyas Sundaram. "Multi-robot routing for persistent monitoring with latency constraints." 2019 American Control Conference (ACC). IEEE, 2019.
[11] Elmaliach, Yehuda, Noa Agmon, and Gal A. Kaminka. "Multi-robot area patrol under frequency constraints." Annals of Mathematics and Artificial Intelligence 57.3 (2009): 293-320.
[12] Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018.
[13] Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
[14] Santana, Hugo, et al. "Multi-agent patrolling with reinforcement learning." Autonomous Agents and Multiagent Systems, International Joint Conference on. Vol. 4. IEEE Computer Society, 2004.
[15] Hu, Jingzhi, et al. "Cooperative internet of UAVs: Distributed trajectory design by multi-agent deep reinforcement learning." IEEE Transactions on Communications 68.11 (2020): 6807-6821.
[16] Al-Turjman F., Zahmatkesh H. (2020) A Comprehensive Review on the Use of AI in UAV Communications: Enabling Technologies, Applications, and Challenges. In: Al-Turjman F. (eds) Unmanned Aerial Vehicles in Smart Cities. Unmanned System Technologies. Springer, Cham. https://doi.org/10.1007/978-3-030-38712-9_1
[17] Bartolini, Novella, Andrea Coletta, Andrea Gennaro, Gaia Maselli, and Matteo Prata. "MAD for FANETs: Movement Assisted Delivery for Flying Ad-hoc Networks." IEEE International Conference on Distributed Computing Systems (IEEE ICDCS 2021), 2021. To appear.