Summary:
Publisher Summary 1
From household appliances to applications in robotics, engineered systems involving complex dynamics can only be as effective as the algorithms that control them. While Dynamic Programming (DP) has provided researchers with a way to optimally solve decision and control problems involving complex dynamic systems, its practical value was limited by algorithms that lacked the capacity to scale up to realistic problems. However, in recent years, dramatic developments in Reinforcement Learning (RL), the model-free counterpart of DP, changed our understanding of what is possible. Those developments led to the creation of reliable methods that can be applied even when a mathematical model of the system is unavailable, allowing researchers to solve challenging control problems in engineering, as well as in a variety of other disciplines, including economics, medicine, and artificial intelligence.
Reinforcement Learning and Dynamic Programming Using Function Approximators provides a comprehensive and unparalleled exploration of the field of RL and DP. With a focus on continuous-variable problems, this seminal text details essential developments that have substantially altered the field over the past decade. In its pages, pioneering experts provide a concise introduction to classical RL and DP, followed by an extensive presentation of the state-of-the-art and novel methods in RL and DP with approximation. Combining algorithm development with theoretical guarantees, they elaborate on their work with illustrative examples and insightful comparisons.
Three individual chapters are dedicated to representative algorithms from each of the major classes of techniques: value iteration, policy iteration, and policy search. The features and performance of these algorithms are highlighted in extensive experimental studies on a range of control applications.
The recent development of applications involving complex systems has led to a surge of interest in RL and DP methods and the subsequent need for a quality resource on the subject. For graduate students and others new to the field, this book offers a thorough introduction to both the basics and emerging methods. And for those researchers and practitioners working in the fields of optimal and adaptive control, machine learning, artificial intelligence, and operations research, this resource offers a combination of practical algorithms, theoretical analysis, and comprehensive examples that they will be able to adapt and apply to their own work.
Publisher Summary 2
Dynamic programming is an approach to optimal control designed for situations in which a model of the system to be controlled is available; when no model is available, reinforcement learning derives control policies solely from transition samples or trajectories that are collected beforehand or through online interaction with the system. Otherwise, the two approaches are closely related, and Lucian Busoniu, Robert Babuska, and Bart De Schutter (all systems and control, Delft U. of Technology, the Netherlands), together with Belgian researcher Damien Ernst, consider them side by side. They cover their application in large and continuous spaces, approximate value iteration with a fuzzy representation, approximate policy iteration for online learning and continuous-action control, and approximate policy search with cross-entropy optimization of basis functions. Annotation ©2010 Book News, Inc., Portland, OR (booknews.com)
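The contrast drawn above (a model-based method that sweeps over a known transition model, versus a model-free method that learns only from sampled transitions) can be illustrated with a minimal Python sketch. The sketch is not taken from the book; the two-state toy MDP, the step size alpha, and all variable names are assumptions chosen for illustration.

    import random

    n_states, n_actions, gamma = 2, 2, 0.9
    # Deterministic toy model: P[s][a] = (next_state, reward).
    P = {0: {0: (0, 0.0), 1: (1, 1.0)},
         1: {0: (0, 0.0), 1: (1, 1.0)}}

    # Dynamic programming (model-based value iteration): uses the model P directly.
    V = [0.0] * n_states
    for _ in range(100):
        V = [max(P[s][a][1] + gamma * V[P[s][a][0]] for a in range(n_actions))
             for s in range(n_states)]

    # Reinforcement learning (model-free Q-learning): only sees sampled
    # transitions (s, a, r, s'), never the model P itself.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, s = 0.1, 0
    for _ in range(5000):
        a = random.randrange(n_actions)       # exploratory action choice
        s_next, r = P[s][a]                   # one interaction with the system
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

    print("Value iteration:", V)                       # approaches the optimal value 10.0 per state
    print("Q-learning:     ", [max(q) for q in Q])     # should end up close to the same values

Both routes estimate the same optimal value function; the difference the summary points to is only in what information they require, a full transition model versus sampled interactions.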
Table of Contents:
1 Introduction 1(10)
1.1 The dynamic programming and reinforcement learning problem 2(3)
1.2 Approximation in dynamic programming and reinforcement learning 5(3)
1.3 About this book 8(3)
2 An introduction to dynamic programming and reinforcement learning 11(32)
2.1 Introduction 11(3)
2.2 Markov decision processes 14(9)
2.2.1 Deterministic setting 14(5)
2.2.2 Stochastic setting 19(4)
2.3 Value iteration 23(7)
2.3.1 Model-based value iteration 23(5)
2.3.2 Model-free value iteration and the need for exploration 28(2)
2.4 Policy iteration 30(8)
2.4.1 Model-based policy iteration 31(6)
2.4.2 Model-free policy iteration 37(1)
2.5 Policy search 38(3)
2.6 Summary and discussion 41(2)
3 Dynamic programming and reinforcement learning in large and continuous spaces 43(74)
3.1 Introduction 43(4)
3.2 The need for approximation in large and continuous spaces 47(2)
3.3 Approximation architectures 49(5)
3.3.1 Parametric approximation 49(2)
3.3.2 Nonparametric approximation 51(2)
3.3.3 Comparison of parametric and nonparametric approximation 53(1)
3.3.4 Remarks 54(1)
3.4 Approximate value iteration 54(17)
3.4.1 Model-based value iteration with parametric approximation 55(3)
3.4.2 Model-free value iteration with parametric approximation 58(4)
3.4.3 Value iteration with nonparametric approximation 62(1)
3.4.4 Convergence and the role of nonexpansive approximation 63(3)
3.4.5 Example: Approximate Q-iteration for a DC motor 66(5)
3.5 Approximate policy iteration 71(24)
3.5.1 Value iteration-like algorithms for approximate policy evaluation 73(1)
3.5.2 Model-free policy evaluation with linearly parameterized approximation 74(10)
3.5.3 Policy evaluation with nonparametric approximation 84(1)
3.5.4 Model-based approximate policy evaluation with rollouts 84(1)
3.5.5 Policy improvement and approximate policy iteration 85(3)
3.5.6 Theoretical guarantees 88(2)
3.5.7 Example: Least-squares policy iteration for a DC motor 90(5)
3.6 Finding value function approximators automatically 95(6)
3.6.1 Basis function optimization 96(2)
3.6.2 Basis function construction 98(2)
3.6.3 Remarks 100(1)
3.7 Approximate policy search 101(12)
3.7.1 Policy gradient and actor-critic algorithms 102(5)
3.7.2 Gradient-free policy search 107(2)
3.7.3 Example: Gradient-free policy search for a DC motor 109(4)
3.8 Comparison of approximate value iteration, policy iteration, and policy search 113(1)
3.9 Summary and discussion 114(3)
4 Approximate value iteration with a fuzzy representation 117(50)
4.1 Introduction 117(2)
4.2 Fuzzy Q-iteration 119(8)
4.2.1 Approximation and projection mappings of fuzzy Q-iteration 119(4)
4.2.2 Synchronous and asynchronous fuzzy Q-iteration 123(4)
4.3 Analysis of fuzzy Q-iteration 127(14)
4.3.1 Convergence 127(8)
4.3.2 Consistency 135(5)
4.3.3 Computational complexity 140(1)
4.4 Optimizing the membership functions 141(4)
4.4.1 A general approach to membership function optimization 141(2)
4.4.2 Cross-entropy optimization 143(1)
4.4.3 Fuzzy Q-iteration with cross-entropy optimization of the membership functions 144(1)
4.5 Experimental study 145(19)
4.5.1 DC motor: Convergence and consistency study 146(6)
4.5.2 Two-link manipulator: Effects of action interpolation, and comparison with fitted Q-iteration 152(5)
4.5.3 Inverted pendulum: Real-time control 157(3)
4.5.4 Car on the hill: Effects of membership function optimization 160(4)
4.6 Summary and discussion 164(3)
5 Approximate policy iteration for online learning and continuous-action control 167(38)
5.1 Introduction 167(1)
5.2 A recapitulation of least-squares policy iteration 168(2)
5.3 Online least-squares policy iteration 170(3)
5.4 Online LSPI with prior knowledge 173(4)
5.4.1 Online LSPI with policy approximation 174(1)
5.4.2 Online LSPI with monotonic policies 175(2)
5.5 LSPI with continuous-action, polynomial approximation 177(3)
5.6 Experimental study 180(21)
5.6.1 Online LSPI for the inverted pendulum 180(12)
5.6.2 Online LSPI for the two-link manipulator 192(3)
5.6.3 Online LSPI with prior knowledge for the DC motor 195(3)
5.6.4 LSPI with continuous-action approximation for the inverted pendulum 198(3)
5.7 Summary and discussion 201(4)
6 Approximate policy search with cross-entropy optimization of basis functions 205(30)
6.1 Introduction 205(2)
6.2 Cross-entropy optimization 207(2)
6.3 Cross-entropy policy search 209(7)
6.3.1 General approach 209(4)
6.3.2 Cross-entropy policy search with radial basis functions 213(3)
6.4 Experimental study 216(17)
6.4.1 Discrete-time double integrator 216(7)
6.4.2 Bicycle balancing 223(6)
6.4.3 Structured treatment interruptions for HIV infection control 229(4)
6.5 Summary and discussion 233(2)
Appendix A Extremely randomized trees 235(4)
A.1 Structure of the approximator 235(1)
A.2 Building and using a tree 236(3)
Appendix B The cross-entropy method 239(6)
B.1 Rare-event simulation using the cross-entropy method 239(3)
B.2 Cross-entropy optimization 242(3)
Symbols and abbreviations 245(4)
Bibliography 249(18)
List of algorithms 267(2)
Index 269