To address the issues of deep learning algorithms for unmanned surface vehicles (USVs) being prone to divergence, and the difficulty reinforcement learning algorithms have in handling continuous action and state spaces, an improved Deep Deterministic Policy Gradient (DDPG) algorithm, termed APF-DDPG, is proposed. The algorithm enhances the composite reward function by incorporating artificial potential fields, optimizes the experience replay scheme and random sampling strategy, introduces Gaussian noise to improve exploration, and modifies the neural network model. A simulation environment is also established to evaluate the exploration strategies of USVs. The results validate the feasibility and performance of APF-DDPG, demonstrating its effectiveness and superiority for path planning in complex environments.
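Two of the abstract's core ideas can be made concrete with a minimal sketch: an APF-shaped composite reward (attractive potential toward the goal, repulsive potential near obstacles) and Gaussian noise added to the deterministic policy's action for exploration. All function names, coefficients, and ranges below are illustrative assumptions, not taken from the paper.

```python
import math
import random

def apf_reward(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=5.0):
    """Hypothetical composite reward shaped by an artificial potential field.

    The attractive term grows (less negative) as the vehicle approaches the
    goal; the repulsive term penalizes proximity to any obstacle within the
    influence radius d0. Coefficients k_att, k_rep, d0 are placeholders.
    """
    reward = -k_att * math.dist(pos, goal)  # attractive potential
    for obs in obstacles:
        d = math.dist(pos, obs)
        if d < d0:  # repulsion only acts inside the influence radius
            reward -= k_rep * (1.0 / d - 1.0 / d0) ** 2
    return reward

def explore_action(policy_action, sigma=0.1, lo=-1.0, hi=1.0):
    """Add zero-mean Gaussian exploration noise, clipped to the action range."""
    return [max(lo, min(hi, a + random.gauss(0.0, sigma))) for a in policy_action]
```

In a DDPG training loop, `explore_action` would wrap the actor network's output at each step, and `apf_reward` would replace a sparse goal-only reward so the critic receives a denser learning signal.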
2026, 48(8): 128-134  Received: 2025-08-04
DOI:10.3404/j.issn.1672-7649.2026.08.020
CLC number: U674.91
Funding: National Natural Science Foundation of China (61403250); Natural Science Foundation of Shanghai Municipal Science and Technology Commission, General Program (21ZR1426600)
Author: HUANG Zhijian (b. 1979), male, Ph.D., associate professor; research interests: intelligent control and computation