IEEE/CAA Journal of Automatica Sinica

Table of Contents

2014, 1(3): .

Guest Editorial for Special Issue on Extensions of Reinforcement Learning and Adaptive Control

Frank L. Lewis, Warren Dixon, Zhongsheng Hou, Tansel Yucelen

2014, 1(3): 225-226.

Off-Policy Reinforcement Learning with Gaussian Processes

Girish Chowdhary, Miao Liu, Robert Grande, Thomas Walsh, Jonathan How, Lawrence Carin

2014, 1(3): 227-238.

Abstract:
An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed in addition to its convergence guarantees and its ability to automatically choose its own basis locations.

Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games

Rushikesh Kamalapurkar, Justin R. Klotz, Warren E. Dixon

2014, 1(3): 239-247.

Abstract:
This paper presents a concurrent learning-based actor-critic-identifier architecture to obtain an approximate feedback-Nash equilibrium solution to an infinite horizon N-player nonzero-sum differential game. The solution is obtained online for a nonlinear control-affine system with uncertain linearly parameterized drift dynamics. It is shown that under a condition milder than persistence of excitation (PE), uniformly ultimately bounded convergence of the developed control policies to the feedback-Nash equilibrium policies can be established. Simulation results are presented to demonstrate the performance of the developed technique without an added excitation signal.

Clique-based Cooperative Multiagent Reinforcement Learning Using Factor Graphs

Zhen Zhang, Dongbin Zhao

2014, 1(3): 248-256.

Abstract:
In this paper, we propose a clique-based sparse reinforcement learning (RL) algorithm for solving cooperative tasks. The aim is to accelerate the original sparse RL algorithm and to make it applicable to tasks decomposed in a more general manner. First, a transition function is estimated and used to update the Q-value function, which greatly reduces the learning time. Second, agents are divided into cliques, each of which is responsible only for a specific subtask. In this way, the global Q-value function is decomposed into the sum of several simpler local Q-value functions. This decomposition is expressed by a factor graph and exploited by the general max-plus algorithm to obtain the greedy joint action. Experimental results show that the proposed approach outperforms the compared approaches.
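The decomposition described in this abstract can be illustrated with a minimal sketch. The three agents, two cliques, and all Q-values below are hypothetical, and the greedy joint action is found by exhaustive enumeration rather than by the max-plus algorithm the paper uses, so this only shows what "a sum of local Q-value functions" means for a single fixed state.

```python
from itertools import product

# Hypothetical setup: three agents, two actions each, two overlapping cliques.
ACTIONS = (0, 1)

# Local Q-tables for one fixed state. Each key of LOCAL_Q lists the agents in
# a clique; each inner key is the joint action of those agents.
LOCAL_Q = {
    (0, 1): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0},
    (1, 2): {(0, 0): 0.5, (0, 1): 3.0, (1, 0): 0.0, (1, 1): 1.0},
}

def global_q(joint_action):
    """Sum the local Q-values of every clique for one joint action."""
    return sum(
        table[tuple(joint_action[i] for i in members)]
        for members, table in LOCAL_Q.items()
    )

def greedy_joint_action(n_agents=3):
    """Brute-force maximizer of the summed local Q-values (not max-plus)."""
    return max(product(ACTIONS, repeat=n_agents), key=global_q)

best = greedy_joint_action()
print(best, global_q(best))
```

Enumeration grows exponentially with the number of agents, which is the cost that message passing on the factor graph is meant to avoid.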

Reinforcement Learning Transfer Based on Subgoal Discovery and Subtask Similarity

Hao Wang, Shunguo Fan, Jinhua Song, Yang Gao, Xingguo Chen

2014, 1(3): 257-266.

Abstract:
This paper studies the problem of transfer learning in the context of reinforcement learning. We propose a novel transfer learning method that can speed up reinforcement learning with the aid of previously learnt tasks. Before performing extensive learning episodes, our method analyzes the learning task via some exploration in the environment, and then reuses previous learning experience whenever it is possible and appropriate. In particular, our proposed method consists of four stages: 1) subgoal discovery, 2) option construction, 3) similarity searching, and 4) option reusing. To identify similar options, we propose a novel similarity measure between options, which is built upon the intuition that similar options have similar state-action probabilities. We examine our algorithm using extensive experiments, comparing it with existing methods. The results show that our method outperforms conventional non-transfer reinforcement learning algorithms, as well as existing transfer learning methods, by a wide margin.

Closed-loop P-type Iterative Learning Control of Uncertain Linear Distributed Parameter Systems

Xisheng Dai, Senping Tian, Yunjian Peng, Wenguang Luo

2014, 1(3): 267-273.

Abstract:
An iterative learning control problem for a class of uncertain linear parabolic distributed parameter systems is discussed; this class covers many processes such as heat and mass transfer, convection-diffusion, and transport. Allowing the initial system state to have an error in each iteration, a closed-loop P-type iterative learning algorithm is presented, and a sufficient condition for convergence of the tracking error in the L² norm is given. Next, the convergence of the tracking error in the L² and W^{1,2} spaces is proved using the Gronwall-Bellman inequality and the Sobolev inequality. Finally, a numerical example is given to illustrate the effectiveness of the proposed method.

Experience Replay for Least-Squares Policy Iteration

Quan Liu, Xin Zhou, Fei Zhu, Qiming Fu, Yuchen Fu

2014, 1(3): 274-281.

Abstract:
Policy iteration, which evaluates and improves the control policy iteratively, is a reinforcement learning method. Policy evaluation with the least-squares method can draw more useful information from the empirical data and therefore improve data efficiency. However, most existing online least-squares policy iteration methods use each sample only once, resulting in a low utilization rate. With the goal of improving the utilization efficiency, we propose experience replay for least-squares policy iteration (ERLSPI) and prove its convergence. The ERLSPI method combines online least-squares policy iteration with experience replay: it stores the samples that are generated online and reuses them with the least-squares method to update the control policy. We apply the ERLSPI method to the inverted pendulum system, a typical benchmark. The experimental results show that the method can effectively take advantage of previous experience and knowledge, improve the empirical utilization efficiency, and accelerate convergence.
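The idea of reusing stored samples in a least-squares policy evaluation step can be sketched as follows. This is a batch illustration under stated assumptions, not the ERLSPI algorithm itself: the two-state toy MDP, the one-hot (tabular) features, the fixed replay buffer, and the function names are all hypothetical, and the online sample collection of the paper is omitted.

```python
GAMMA = 0.5
N_STATES, N_ACTIONS = 2, 2

def features(s, a):
    """One-hot feature vector for a state-action pair (tabular assumption)."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w

def lstdq(buffer, policy):
    """Least-squares evaluation of `policy` using every stored sample."""
    n = N_STATES * N_ACTIONS
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, a, r, s_next in buffer:
        phi = features(s, a)
        phi_next = features(s_next, policy[s_next])
        for i in range(n):
            b[i] += phi[i] * r
            for j in range(n):
                A[i][j] += phi[i] * (phi[j] - GAMMA * phi_next[j])
    return solve(A, b)  # assumes A is invertible (every pair was sampled)

def greedy(w):
    """Greedy policy for the linear Q-function with weights w."""
    return [max(range(N_ACTIONS), key=lambda a: w[s * N_ACTIONS + a])
            for s in range(N_STATES)]

# Replay buffer of (state, action, reward, next_state) samples.
replay_buffer = [(0, 0, 0.0, 0), (0, 1, 0.0, 1), (1, 0, 0.0, 1), (1, 1, 1.0, 1)]

policy = [0, 0]
for _ in range(5):
    weights = lstdq(replay_buffer, policy)  # the same samples are replayed
    policy = greedy(weights)
weights = lstdq(replay_buffer, policy)
print(policy, weights)
```

Each policy evaluation reads the whole buffer again, so the four stored transitions are reused in every iteration instead of being discarded after one update.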

Event-Triggered Optimal Adaptive Control Algorithm for Continuous-Time Nonlinear Systems

Kyriakos G. Vamvoudakis

2014, 1(3): 282-293.

Abstract:
This paper proposes a novel optimal adaptive event-triggered control algorithm for nonlinear continuous-time systems. The goal is to reduce the number of controller updates by sampling the state only when an event is triggered to maintain stability and optimality. The online algorithm is implemented based on an actor/critic neural network structure. A critic neural network is used to approximate the cost and an actor neural network is used to approximate the optimal event-triggered controller. Since the proposed algorithm involves dynamics that exhibit both continuous evolution described by ordinary differential equations and instantaneous jumps or impulses, we use an impulsive system approach. A Lyapunov stability proof ensures that the closed-loop system is asymptotically stable. Finally, we illustrate the effectiveness of the proposed solution compared to a time-triggered controller.
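The event-triggered sampling idea alone, without the actor/critic learning, can be sketched on a scalar linear system. Everything here is an assumption made for illustration: the plant dx/dt = a*x + u, the fixed gain k, the relative threshold sigma, and the Euler step are hypothetical and do not come from the paper.

```python
def simulate(a=1.0, k=3.0, sigma=0.2, dt=0.01, steps=1000, x0=1.0):
    """Euler simulation of dx/dt = a*x + u with u = -k * (last sampled state)."""
    x = x0
    x_sampled = x0   # state held by the controller since the last event
    updates = 1      # the initial sample counts as one controller update
    for _ in range(steps):
        # Event condition: the gap to the held sample exceeds a fraction of |x|.
        if abs(x - x_sampled) > sigma * abs(x):
            x_sampled = x
            updates += 1
        u = -k * x_sampled   # the control stays constant between events
        x += dt * (a * x + u)
    return x, updates

x_final, n_updates = simulate()
print(f"final state {x_final:.2e}, controller updates {n_updates} of 1000 steps")
```

A time-triggered controller with the same step would update at all 1000 steps; here the control is recomputed only when the held sample has drifted too far from the current state.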

Robust Adaptive Model Tracking for Distributed Parameter Control of Linear Infinite-dimensional Systems in Hilbert Space

Mark J. Balas, Susan A. Frost

2014, 1(3): 294-301.

Abstract:
This paper is focused on adaptively controlling a linear infinite-dimensional system to track a finite-dimensional reference model. Given a linear continuous-time infinite-dimensional plant on a Hilbert space with disturbances of known waveform but unknown amplitude and phase, we show that there exists a stabilizing direct model reference adaptive control law with the properties of certain disturbance rejection and robustness. The plant is described by a closed, densely defined linear operator that generates a continuous semigroup of bounded operators on the Hilbert space of states. The central result shows that all errors converge to a prescribed neighborhood of zero in an infinite-dimensional Hilbert space. The result does not require the standard Barbalat's lemma, which requires certain signals to be uniformly continuous. This result is used to determine conditions under which a linear infinite-dimensional system can be directly adaptively controlled to follow a reference model. In particular, we examine conditions for a set of ideal trajectories to exist for the tracking problem. Our results are applied to adaptive control of general linear diffusion systems described by self-adjoint operators with compact resolvent.

Adaptive Iterative Learning Control for a Class of Nonlinear Time-varying Systems with Unknown Delays and Input Dead-zone

Jianming Wei, Yunan Hu, Meimei Sun

2014, 1(3): 302-314.

Abstract:
This paper presents an adaptive iterative learning control (AILC) scheme for a class of nonlinear systems with unknown time-varying delays and unknown input dead-zone. A novel nonlinear form of dead-zone nonlinearity is presented. The assumption of identical initial conditions for iterative learning control (ILC) is removed by introducing a boundary layer function. The uncertainties with time-varying delays are compensated for by using an appropriate Lyapunov-Krasovskii functional and Young's inequality. Radial basis function neural networks are used to model the time-varying uncertainties. The hyperbolic tangent function is employed to avoid the problem of singularity. Using the properties of the hyperbolic tangent function, the system output is proved to converge to a small neighborhood of the desired trajectory by constructing a Lyapunov-like composite energy function (CEF) in two cases, while keeping all the closed-loop signals bounded. Finally, a simulation example is presented to verify the effectiveness of the proposed approach.

An Improved Result of Multiple Model Iterative Learning Control

Xiaoli Li, Kang Wang, Dexin Liu

2014, 1(3): 315-322.

Abstract:
For systems operating repetitively, iterative learning control (ILC) has proved to be an effective method even with estimated models. However, the control performance may deteriorate due to sudden system failure or the adoption of an imprecise model. The multiple model iterative learning control (MMILC) method shows great potential to improve the transient response and control performance. However, in existing MMILC, stability can be guaranteed only under finite switching or very strict conditions on the coefficient matrix, which makes MMILC difficult to apply. In this paper, an improved MMILC method is presented. The control procedure is simplified and the ceasing condition is relaxed. Even with infinitely many model switches, the system output is proved to converge to the desired trajectory. Simulation studies are carried out to show the effectiveness of the proposed method.

Continuous Action Reinforcement Learning for Control-Affine Systems with Unknown Dynamics

Aleksandra Faust, Peter Ruymgaart, Molly Salman, Rafael Fierro, Lydia Tapia

2014, 1(3): 323-336.

Abstract:
Control of nonlinear systems in real time is challenging. Decision making, performed many times per second, must ensure system safety. Designing input to perform a task often involves solving a nonlinear system of differential equations, which is a computationally intensive, if not intractable, problem. This article proposes sampling-based task learning for control-affine nonlinear systems through the combined learning of both state and action-value functions in a model-free approximate value iteration setting with continuous inputs. A quadratic negative definite state-value function implies the existence of a unique maximum of the action-value function at any state. This allows the replacement of the standard greedy policy with a computationally efficient policy approximation that guarantees progression to a goal state without knowledge of the system dynamics. The policy approximation is consistent, i.e., it does not depend on the action samples used to calculate it. This method is appropriate for mechanical systems with high-dimensional input spaces and unknown dynamics performing Constraint-Balancing Tasks. We verify it both in simulation and experimentally for an unmanned aerial vehicle (UAV) carrying a suspended load, and in simulation for the rendezvous of heterogeneous robots.
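The claim that a quadratic negative definite state-value function yields a unique maximum of the action-value function can be checked with a short derivation. The notation is an assumption made for illustration and is not taken from the paper: a deterministic discrete-time control-affine system with drift f(s) and input matrix g(s) of full column rank, a value function V(s) = sᵀΛs with Λ symmetric negative definite, and an action-value function taken as the value of the next state.

```latex
\begin{align*}
  s' &= f(s) + g(s)\,a, \qquad V(s) = s^{\top}\Lambda s, \quad \Lambda = \Lambda^{\top} \prec 0,\\
  Q(s,a) &= V\bigl(f(s) + g(s)\,a\bigr)
          = \bigl(f(s) + g(s)\,a\bigr)^{\top}\Lambda\,\bigl(f(s) + g(s)\,a\bigr),\\
  \nabla_a Q(s,a) &= 2\,g(s)^{\top}\Lambda\,\bigl(f(s) + g(s)\,a\bigr),
  \qquad \nabla_a^{2} Q(s,a) = 2\,g(s)^{\top}\Lambda\,g(s) \prec 0,\\
  a^{*}(s) &= -\bigl(g(s)^{\top}\Lambda\,g(s)\bigr)^{-1} g(s)^{\top}\Lambda\,f(s).
\end{align*}
```

Because the Hessian is negative definite, Q(s, ·) is strictly concave in the action, so the stationary point a*(s) is its unique maximizer. The closed form needs f and g; the paper's policy approximation is stated to work without knowledge of the dynamics, so this derivation explains only why the maximum exists and is unique.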

Vol. 1, No. 3, 2014
