Subtitle: none

Author:

Classification number:

ISBN: 9780585128290


Summary

The Support Vector Machine is a powerful new learning algorithm for solving a variety of learning and function estimation problems, such as pattern recognition, regression estimation, and operator inversion. The impetus for this collection was a workshop on Support Vector Machines held at the 1997 NIPS conference. The contributors, both university researchers and engineers developing applications for the corporate world, form a Who's Who of this exciting new area. Contributors: Peter Bartlett, Kristin P. Bennett, Christopher J. C. Burges, Nello Cristianini, Alex Gammerman, Federico Girosi, Simon Haykin, Thorsten Joachims, Linda Kaufman, Jens Kohlmorgen, Ulrich Kressel, Davide Mattera, Klaus-Robert Müller, Manfred Opper, Edgar E. Osuna, John C. Platt, Gunnar Rätsch, Bernhard Schölkopf, John Shawe-Taylor, Alexander J. Smola, Mark O. Stitson, Vladimir Vapnik, Volodya Vovk, Grace Wahba, Chris Watkins, Jason Weston, Robert C. Williamson.

Contents


Advances in Kernel Methods
CONTENTS
PREFACE
1 Introduction to Support Vector Learning
1.1 Learning Pattern Recognition from Examples
1.2 Hyperplane Classifiers
1.3 Feature Spaces and Kernels
1.4 Support Vector Machines
1.5 Support Vector Regression
1.6 Empirical Results, Implementations, and Further Developments
1.7 Notation
2 Roadmap
2.1 Theory
2.2 Implementations
2.3 Applications
2.4 Extensions
I THEORY
3 Three Remarks on the Support Vector Method of Function Estimation
3.1 Introduction
3.2 The Optimal Hyperplane
3.2.1 Constructing Optimal Hyperplanes
3.2.2 The Support Vector Machine for Pattern Recognition
3.2.3 Generalization to High-Dimensional Spaces
3.2.4 Hilbert-Schmidt Theory and Mercer's Theorem
3.2.5 Statistical Properties of the Optimal Hyperplane
3.2.6 Remark About the Generalization Ability of SV Machines
3.3 Transductive Inference Using SV Machines
3.4 Estimation of Conditional Probability and Conditional Density Functions
3.4.1 Conditional Probability Functions
3.4.2 Conditional Density Functions
4 Generalization Performance of Support Vector Machines and Other Pattern Classifiers
4.1 Introduction
4.2 Structural Risk Minimization (SRM) and Data-Sensitive SRM
4.3 Support Vector Machines
4.4 Other Applications
4.5 Conclusions
Appendix: Proof of Theorem 4.6
5 Bayesian Voting Schemes and Large Margin Classifiers
5.1 Introduction
5.2 Bayesian Learning Theory
5.3 Bayesian Classifiers as Large Margin Hyperplanes
5.3.1 The Uninformative Prior
5.3.2 The Effect of the Prior Distribution on the Margin Bound
5.4 Conclusions
6 Support Vector Machines, Reproducing Kernel Hilbert Spaces, and Randomized GACV
6.1 Introduction
6.2 Some Facts About RKHS
6.2.1 The Moore-Aronszajn Theorem
6.2.2 The Representer Theorem
6.2.3 Gaussian Processes, The Isometric Isomorphism Theorem
6.3 From Soft Classification to Hard Classification to SVMs
6.3.1 Hard Classification
6.3.2 Soft Classification
6.3.3 Back to Hard Classification
6.3.4 Convex Compromises with SVMs
6.4 The Randomized GACV for Choosing λ
6.4.1 The Generalized Comparative Kullback-Leibler Distance
6.4.2 A Computable Proxy for the GCKL
6.4.2.1 Approximate Cross Validation
6.4.3 The Leaving Out One Lemma
6.4.4 An Approximation for
6.4.4.1 The Randomized Trace Estimate of GACV(λ) for
6.4.5 Discussion of ranGACV
7 Geometry and Invariance in Kernel Based Methods
7.1 Overview
7.2 The Kernel Mapping
7.2.1 Smoothness Assumptions
7.3 Measures of Distance on S
7.4 From Kernel to Metric
7.5 From Metric to Kernel
7.6 Dot Product Kernels
7.6.1 The Curvature for Polynomial Maps
7.6.2 The Volume Element
7.7 Positivity
7.8 Nash's Theorem: An Equivalence Class of Kernels
7.9 The Metric for Positive Definite Functions
7.10 A Note on Geodesics
7.11 Conclusions and Discussion
7.12 Overview
7.13 Incorporating Local Invariances
7.13.2 Multiple Symmetries
7.13.3 Building Locally Invariant Kernels
7.14 A Detailed Example: Vertical Translation Invariance
7.14.1 The Relation to Group Action
7.14.2 A Simple Example: 4 Pixels
7.14.2.1 Additive Invariance
7.14.3 The n-Pixel Case
7.14.3.1 A No-Go Theorem
7.14.3.2 The General Solution
7.14.3.3 Additive Invariance
7.15 The Method of Central Moments
7.16 Discussion
7.17 Appendix
8 On the Annealed VC Entropy for Margin Classifiers: A Statistical Mechanics Study
8.1 Introduction
8.2 VC Entropy
8.3 The Thermodynamic Limit
8.4 An Expression for the Annealed Entropy
8.5 Evaluation in the Thermodynamic Limit
8.6 Results and Discussion
9 Entropy Numbers, Operators and Support Vector Kernels
9.1 Introduction
9.2 Definitions and Notation
9.3 Operator Theory Methods for Entropy Numbers
9.4 Generalization Bounds via Uniform Convergence
9.5 Entropy Numbers for Kernel Machines
9.5.1 Mercer's Theorem, Feature Spaces and Scaling
9.5.2 Entropy Numbers
9.6 Discrete Spectra of Convolution Operators
9.7 Covering Numbers for Given Decay Rates
9.8 Conclusions
9.8.1 A Possible Procedure to use the Results of this Chapter
II IMPLEMENTATIONS
10 Solving the Quadratic Programming Problem Arising in Support Vector Classification
10.1 Introduction
10.2 General Considerations
10.3 Solving the Quadratic Programming Problem
10.3.1 The Meta Algorithm
10.3.2 Minimizing ?
10.3.2.1 A Newton Approach based on the Bunch-Kaufman Algorithm
10.3.2.2 The Conjugate Gradient Algorithm
10.3.3 Changing the Equality Constrained Problem
10.3.3.1 Deactivating Constraints
10.3.3.2 Activating Constraints
10.4 Chunking
10.5 Computational Experience
10.6 Conclusions
Appendix
11 Making Large-Scale Support Vector Machine Learning Practical
11.1 Introduction
11.2 General Decomposition Algorithm
11.3 Selecting a Good Working Set
11.3.1 Convergence
11.3.2 How to Solve OP3
11.4 Shrinking: Reducing the Size of OP1
11.5 Efficient Implementation
11.5.1 Termination Criteria
11.5.2 Computing the Gradient and the Termination Criteria Efficiently
11.5.3 Computational Resources Needed in Each Iteration
11.5.4 Caching Kernel Evaluations
11.5.5 How to Solve OP2 (QP Subproblems)
11.6 Related Work
11.7 Experiments
11.7.1 How Does Training Time Scale with the Number of Training Examples?
11.7.1.1 Income Prediction
11.7.1.2 Classifying Web Pages
11.7.1.3 Ohsumed Data Set
11.7.1.4 Detecting Faces in Images
11.7.2 What Is the Influence of the Working Set Selection Strategy?
11.7.3 What Is the Influence of Caching?
11.7.4 What Is the Influence of Shrinking?
11.8 Conclusions
12 Fast Training of Support Vector Machines Using Sequential Minimal Optimization
12.1 Introduction
12.1.1 Previous Methods for Training Support Vector Machines
12.2 Sequential Minimal Optimization
12.2.1 Solving for Two Lagrange Multipliers
12.2.2 Heuristics for Choosing Which Multipliers to Optimize
12.2.3 The Threshold and the Error Cache
12.2.4 Speeding Up SMO
12.3 Pseudo-Code
12.4 Relationship to Previous Algorithms
12.5 Benchmarking SMO
12.5.1 Experimental Results
12.6 Conclusions
12.7 Appendix: Derivation of Two-Example Maximization
12.8 Appendix: SMO vs. PCG Chunking Tables
III APPLICATIONS
13 Support Vector Machines for Dynamic Reconstruction of a Chaotic System
13.1 Introduction
13.2 Nonlinear Dynamical Reconstruction
13.3 The Support Vector Machine
13.3.1 Theoretical Background
13.3.1.1 Modified Structural Risk Minimization
13.3.1.2 Specific Class of Functions and Its Structure
13.3.1.3 Solving the Minimization Problem: the Dual Method
13.3.1.4 The ε-Insensitive Loss Function
13.3.1.5 The ε-Insensitive Huber Loss Function
13.3.2 Considerations for Implementing the Algorithm
13.3.3 Open Problems: the Choice of the Parameters and of the Kernels
13.3.3.1 Choosing the Value of ε
13.3.3.2 Choosing the Value of C
13.3.3.3 Choosing the Value of σ²
13.4 Applying the SVM to the Nonlinear Reconstruction
13.4.1 The Pure Lorenz Time-Series
13.4.1.1 Dependence on the Choice of C and σ²
13.4.1.2 Dependence on the Training Set Size and ε
13.4.2 The Noisy Case
13.4.2.1 Dependence on ε and the Training Set Size
13.4.2.2 Dependence on Embedding Dimension dE and Time Delay T
13.5 Conclusions
14 Using Support Vector Machines for Time Series Prediction
14.1 Introduction
14.2 Support Vector Regression
14.2.1 Vapnik's ε-Insensitive Loss Function
14.2.2 Huber's Loss Function
14.2.3 How to Compute the Threshold b?
14.3 RBF Networks with Adaptive Centers and Widths
14.4 How to Predict?
14.5 Experiments
14.5.1 Mackey-Glass Equation
14.5.2 Data Set D from the Santa Fe Competition
14.6 Discussion and Outlook
Appendix
15 Pairwise Classification and Support Vector Machines
15.1 Introduction
15.2 K-Class Problem
15.3 Pairwise Classification
15.4 Benchmarks
15.5 Conclusion
IV EXTENSIONS OF THE ALGORITHM
16 Reducing the Run-time Complexity in Support Vector Machines
16.1 Introduction
16.2 Motivation and Statement of the Problem
16.3 Previous Work: The Reduced Set Method
16.4 The Class of Problems We Approach
16.5 First Approach: Using SVRM
16.6 Second Approach: Reformulating the Training Problem
16.6.1 Possible Improvements
16.7 Experimental Results
16.8 Limitations and Final Remarks
17 Support Vector Regression with ANOVA Decomposition Kernels
17.1 Introduction
17.2 Multiplicative Kernels
17.3 ANOVA Decomposition
17.3.1 ANOVA Decomposition Kernels
17.3.2 Algorithm
17.4 Experiments
17.4.1 Method
17.4.2 Results
17.5 Conclusion and Further Research
Appendix: Spline Kernels
18 Support Vector Density Estimation
18.1 The Density Estimation Problem
18.2 SV Method of Estimating Densities
18.3 SV Density Estimation by Solving the Linear Operator Equation
18.4 Spline Approximation of a Density
18.5 Considering a Monotonic Set of Functions
18.6 Linear Programming (LP) Approach to SV Regression Estimation
18.7 Gaussian-like Approximation of a Density
18.8 SV Density Estimation Using a Dictionary of Kernels
18.9 One More Method of SV Density Estimation
18.10 Parzen's Windows
18.11 Approximating Density Estimates Using SV Regression Techniques
18.12 Multi-dimensional Density Estimation
18.13 Experiments
18.14 Conclusions and Further Research
19 Combining Support Vector and Mathematical Programming Methods for Classification
19.1 Introduction
19.2 Two MPM Methods for Classification
19.3 Nonlinear Separation via Decision Trees
19.4 Multicategory Classification
19.5 Overall Risk Minimization and MPM
19.6 Conclusions
20 Kernel Principal Component Analysis
20.1 Introduction
20.2 Principal Component Analysis in Feature Spaces
20.3 Kernel Principal Component Analysis
20.4 Feature Extraction Experiments
20.5 Discussion
Appendix
REFERENCES
INDEX
