[1] WATKINS C J C H, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3/4):279-292.
[2] |
SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1998, 3:10-43.
|
[3] |
SUTTON R S, BARTO A G. Reinforcement learning:an introduction[J]. IEEE Transactions on Neural Networks, 1998, 9(5):1054.
|
[4] |
ETESSAMI K, YANNAKAKIS M. Recursive Markov decision processes and recursive stochastic games[J]. Journal of the Acm, 2005, 62(2):100.
|
[5] |
DUFOUR F. Impulsive control for continuous-time Markov decision processes[J]. Advances in Applied Probability, 2014, 47(1):129-161.
|
[6] |
HALLAK A, CASTRO D D. Contextual Markov decision processes[J]. Computer Science, 2015, 5(4):220-229.
|
[7] |
BEATRIC B, KRISHNENDU C. Probabilistic opacity for Markov decision processes[J]. Information Processing Letters, 2015, 115(1):52-59.
|
[8] |
刘全, 肖飞. 基于自适应归一化RBF网络的Q-V值函数协同逼近模型[J]. 计算机学报, 2015, 38(7):1386-1396. LIU Q, XIAO F. Collaborative Q-V value function approximation model based on adaptive normalized radial basis function network[J]. Chinese Journal of Computers, 2015, 38(7):1386-1396.
|
[9] |
HACHIYA H, AKIYAMA T, SUGIAYMA M, et al. Adaptive importance sampling for value function approximation in off-policy reinforcement learning[J]. Neural Networks the Official Journal of the International Neural Network Society, 2009, 22(10):1399-1410.
|
[10] |
AKIYAMA T, HACHIYA H M. Efficient exploration through active learning for value function approximation in reinforcement learning[J]. Neural Networks the Official Journal of the International Neural Network Society, 2010, 23(5):639-648.
|
[11] |
XU X, HUANG Z. A clustering-based graph Laplacian framework for value function approximation in reinforcement learning[J]. Cybernetics, 2014, 44(12):2613-2625.
|
[12] |
ELFWING S, UCHIBE E. From free energy to expected energy:improving energy-based value function approximation in reinforcement learning[J]. Neural Networks, 2016, 84:17-27.
|
[13] |
WANG X S, CHENG Y H, YI J Q. A fuzzy actor-critic reinforcement learning network[J]. Information Sciences, 2007, 177(18):3764-3781.
|
[14] |
YAVUZ E, MAUL P, NOWOTNY T. Spiking neural network model of reinforcement learning in the honeybee implemented on the GPU[J]. Bmc Neuroscience, 2015, 16(S1):1-2.
|
[15] |
FAUßER S, SCHWENKER F. Selective neural network ensembles in reinforcement learning:taking the advantage of many agents[J]. Neurocomputing, 2015, 169:350-357.
|
[16] |
TANG L, LIU Y J. Adaptive neural network control of robot manipulator using reinforcement learning[J]. Journal of Vibration & Control, 2013, 20(14):2162-2171.
|
[17] |
盖俊峰, 赵国荣. 基于线性近似和神经网络逼近的模型预测控制[J]. 系统工程与电子技术, 2015, 37(2):394-399. GAI J F, ZHAO G R. Model predictive control based on linearization and neural network approach[J]. Systems Engineering and Electronics, 2015, 37(2):394-399.
|
[18] |
BRADTKE S J, BARTO A G. Linear least-squares algorithms for temporal difference learning[J]. Machine Learning, 1996, 22(1/2/3):33-57.
|
[19] |
BOYAN J A. Technical update:least-squares temporal difference learning[J]. Machine Learning, 2002, 49(2/3):233-246.
|
[20] |
王国芳, 方舟. 基于批量递归最小二乘的自然Actor-Critic算法[J]. 浙江大学学报, 2015, 49(7):1335-1341. WANG G F, FANG Z. Natural Actor-Critic based on batch recursive least-squares[J]. Journal of Zhejiang University (Engineering Science), 2015, 49(7):1335-1341.
|
[21] |
HUANG G, ZHU Q. Extreme learning machine:theory and applications[J]. Neurocomputing, 2006, 70:489-501.
|
[22] |
孙艳丰, 杨新东. 基于Softplus激活函数和改进Fisher判别的ELM算法[J]. 北京工业大学学报, 2015, 41(9):1341-1347. SUN Y F, YANG X D. ELM algorithm based on Softplus activation function and improved Fisher discrimination[J]. Journal of Beijing University of Technology, 2015, 41(9):1341-1347.
|
[23] |
高阳, 陈世福, 陆鑫. 强化学习研究综述[J]. 自动化学报, 2004, 30(1):86-100. GAO Y, CHEN S F, LU X. Research on reinforcement learning technology:a review[J]. Acta Automatica Sinica, 2004, 30(1):86-100.
|
[24] |
PABLO E M, JOSE M M. Least-squares temporal difference learning based on an extreme learning machine[J]. Neurocomputing, 2014, 14:37-45.
|
[25] |
BOYAN J A. Least-squares temporal difference learning in proceedings of the sixteenth international conference[J]. Machine Learning, 1999, 49(2/3):49-56.
|
[26] |
WANG J F, WANG J D, SONG J K. Optimized Cartesian k-means[J]. IEEE Transactions on Knowledge & Data Engineering, 2015, 27(1):180-192.
|
[27] |
HAYKIN S. Neural Networks and Learning Machines:A Comprehensive Foundation[M]. London:Pearson Education, 2010:800-815.
|
[28] |
ALPAYDIN E. Introduction to machine learning[J]. Machine Learning, 2004, 5(8):28.
|
[29] |
ZHAO J, WEI H. Natural gradient learning algorithms for RBF networks[J]. Neural Computation, 2015, 27(2):481-505.
|