A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 1, Issue 4
Oct. 2014

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Brian Gaudet and Roberto Furfaro, "Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 4, pp. 397-411, 2014.

Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning

  • Future unconstrained and science-driven missions to Mars will require advanced guidance algorithms that can adapt to more demanding mission requirements, e.g., landing at selected sites with pinpoint accuracy while autonomously flying fuel-efficient trajectories. In this paper, a novel guidance algorithm designed by applying the principles of reinforcement learning (RL) theory is presented. The goal is to devise an adaptive guidance algorithm that enables robust, fuel-efficient, and accurate landing without the need for offline trajectory generation and real-time tracking. Results from a Monte Carlo simulation campaign show that the algorithm is capable of autonomously following trajectories that are close to the optimal minimum-fuel solutions, with an accuracy that surpasses that of past and planned future Mars missions. The proposed RL-based guidance algorithm exhibits a high degree of flexibility and can easily accommodate autonomous retargeting while maintaining accuracy and fuel efficiency. Although reinforcement learning and similar machine learning techniques have previously been applied to aerospace guidance and control problems (e.g., autonomous helicopter control), this appears, to the best of the authors' knowledge, to be the first application of reinforcement learning to the problem of autonomous planetary landing.
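The abstract does not spell out the learning algorithm, so the sketch below is only a generic illustration of the underlying idea: learn a landing policy from simulated experience, then check it with a Monte Carlo campaign over dispersed initial conditions. It uses plain tabular Q-learning on a hypothetical 1-D vertical descent in normalized units (gravity lowers the velocity by one unit per step, and thrust levels 0, 1, 2 raise it by that amount). The dynamics, reward weights, and every name (`step`, `train`, `greedy_policy`, `monte_carlo`, and so on) are assumptions for illustration, not the authors' guidance algorithm or vehicle model. A 1-D toy captures only the trade-off between fuel use and a soft touchdown, not horizontal pinpoint accuracy or retargeting.

```python
import random
from collections import defaultdict

# Hypothetical toy model in normalized units (NOT the paper's vehicle model):
# each step, gravity lowers the vertical velocity v by 1 and the chosen thrust
# level a in {0, 1, 2} raises it by a, so the net velocity change is a - 1.
H_MAX = 30             # altitude ceiling; climbing above it counts as a failure
V_MIN, V_MAX = -8, 3   # velocity limits (positive is up)
ACTIONS = (0, 1, 2)    # thrust levels; also used as Q-table indices
FUEL_COST = 1.0        # cost per unit of thrust per step
SPEED_PENALTY = 10.0   # cost per unit of touchdown speed
FAIL_PENALTY = 100.0   # cost of flying above H_MAX
MAX_STEPS = 200        # episode length cap


def step(h, v, a):
    """Advance one step. Returns (h, v, reward, done, touchdown_speed)."""
    v = max(V_MIN, min(V_MAX, v + a - 1))
    h += v
    reward = -FUEL_COST * a
    if h <= 0:  # touchdown: penalize impact speed
        return 0, v, reward - SPEED_PENALTY * abs(v), True, abs(v)
    if h > H_MAX:  # flew away
        return h, v, reward - FAIL_PENALTY, True, None
    return h, v, reward, False, None


def random_start(rng):
    """Dispersed initial condition: altitude 10..H_MAX, descending at 0..4 units/step."""
    return rng.randint(10, H_MAX), rng.randint(-4, 0)


def greedy_policy(q):
    """Policy that picks the highest-valued action (ties go to the lowest thrust)."""
    return lambda h, v: max(ACTIONS, key=lambda a: q[(h, v)][a])


def train(episodes=20000, alpha=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning. Q starts at 0, which is optimistic since all rewards are <= 0."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    act = greedy_policy(q)
    for _ in range(episodes):
        h, v = random_start(rng)
        for _ in range(MAX_STEPS):
            s = (h, v)
            # Epsilon-greedy exploration.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else act(h, v)
            h, v, r, done, _ = step(h, v, a)
            target = r if done else r + max(q[(h, v)])  # undiscounted episodic target
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
    return q


def rollout(policy, h, v):
    """Fly one episode. Returns (touchdown_speed, or None if not landed, fuel_used)."""
    fuel = 0.0
    for _ in range(MAX_STEPS):
        a = policy(h, v)
        fuel += FUEL_COST * a
        h, v, _, done, speed = step(h, v, a)
        if done:
            return speed, fuel
    return None, fuel


def monte_carlo(policy, runs=500, seed=1):
    """Returns (landing rate, mean touchdown speed, mean fuel) over dispersed starts."""
    rng = random.Random(seed)
    results = [rollout(policy, *random_start(rng)) for _ in range(runs)]
    landed = [(s, f) for s, f in results if s is not None]
    if not landed:
        return 0.0, float("inf"), float("inf")
    mean_speed = sum(s for s, _ in landed) / len(landed)
    mean_fuel = sum(f for _, f in landed) / len(landed)
    return len(landed) / runs, mean_speed, mean_fuel


if __name__ == "__main__":
    q = train()
    policies = (("free fall", lambda h, v: 0), ("Q-learning", greedy_policy(q)))
    for name, policy in policies:
        rate, speed, fuel = monte_carlo(policy)
        print(f"{name}: landed {rate:.0%}, mean touchdown speed {speed:.2f}, mean fuel {fuel:.1f}")
```

Comparing the learned greedy policy against an unpowered free fall over the same dispersed starts is the toy analogue of a Monte Carlo campaign; in this sketch the large speed penalty relative to fuel cost is what pushes the learned policy toward coasting first and braking late.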

     

