To address the "weakest-link" effect caused by uneven energy distribution in AUV (Autonomous Underwater Vehicle) clusters during complex marine missions, this study proposes a dynamic energy-balancing strategy based on multi-agent reinforcement learning. The strategy combines Long Short-Term Memory - Multi-Agent Proximal Policy Optimization (LSTM-MAPPO) deep reinforcement learning, augmented with a dynamic energy-consumption reward-punishment function, with A-star (A*) path planning and Proportional-Integral-Derivative (PID) control, forming a three-tier integrated intelligent control architecture that closes the loop from global decision-making through path planning to precise control. Simulation results show that, compared with the frontier-based exploration algorithm, the LSTM-MAPPO algorithm with the energy-consumption reward-punishment function improves patrol coverage by 143.5%, reduces average patrol time by 31.6%, and cuts energy consumption by 58.3%. The strategy effectively suppresses the energy weakest-link effect in AUV clusters and improves the energy efficiency of patrol-mission execution.
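The paper's exact reward formulation is not reproduced in this abstract, but the idea of a dynamic energy reward-punishment term that counteracts the weakest-link effect can be illustrated with a minimal sketch. All names, weights (`alpha`, `beta`, `gamma`), and the below-mean penalty form here are illustrative assumptions, not the authors' definition:

```python
import numpy as np

def energy_balance_reward(energy, coverage_gain, step_cost,
                          alpha=1.0, beta=0.1, gamma=0.5):
    """Per-AUV reward with a dynamic energy reward-punishment term.

    energy        : residual energy of each AUV, shape (n,)
    coverage_gain : newly covered area credited to each AUV, shape (n,)
    step_cost     : energy spent by each AUV in this step, shape (n,)
    alpha, beta   : weights on coverage reward and energy cost (assumed)
    gamma         : weight on the balance penalty (assumed)
    """
    energy = np.asarray(energy, dtype=float)
    mean_e = energy.mean()
    # Penalize only AUVs whose residual energy falls below the swarm
    # mean: the further below average, the larger the penalty, which
    # steers the learned policy away from over-using any single vehicle.
    imbalance = np.maximum(mean_e - energy, 0.0) / (mean_e + 1e-8)
    return (alpha * np.asarray(coverage_gain, dtype=float)
            - beta * np.asarray(step_cost, dtype=float)
            - gamma * imbalance)
```

With equal coverage gain and step cost, an AUV whose residual energy sits below the swarm mean receives a strictly smaller reward than one at or above the mean, so the multi-agent policy is pushed toward balanced energy drawdown.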
2026, 48(6): 181-188  Received: 2025-05-27
DOI:10.3404/j.issn.1672-7649.2026.06.024
CLC number: U675.79; TP13
Funding: Hainan Province Science and Technology Special Project (ZDYF2024GXJS010); Hainan Provincial Natural Science Foundation (425RC698)
About the author: JIANG Kelong (1999-), male, master's degree candidate; research interest: underwater vehicle technology