Book Introduction

The Elements of Statistical Learning, 2nd Edition | PDF / EPUB / MOBI / Kindle e-book versions, Baidu Netdisk download

The Elements of Statistical Learning, 2nd Edition (统计学习基础 第2版)
  • Author: T. Hastie (with co-authors R. Tibshirani and J. Friedman)
  • Publisher: World Publishing Corporation (Beijing; Xi'an)
  • ISBN: 9787510084508
  • Publication year: 2015
  • Listed page count: 745
  • File size: 81 MB
  • File page count: 768
  • Subject: Statistics (English-language text)

PDF Download

The book is offered as a direct PDF download (recommended; cloud decompression; works on both mobile and PC) and as a BT torrent download (faster; please use a BT client such as FDM). A slower direct-link download, an online preview of the book, and online retrieval of the decompression code are also available on the download page.

Download Notes

The Elements of Statistical Learning, 2nd Edition: PDF e-book download.

The downloaded file is a RAR archive. Use decompression software to extract it and obtain the PDF.

We recommend downloading with the BT client Free Download Manager (FDM), which is free, ad-free, and available on multiple platforms. All resources on this site are packaged as BT torrents, so a dedicated BT client such as BitComet, qBittorrent, or uTorrent is required. Because this book is not currently a popular resource, Xunlei (Thunder) is not recommended; once the resource becomes popular, Xunlei can also be used.

(The file page count should be greater than the listed page count, except for multi-volume e-books.)

Note: every archive on this site has a decompression code; an archive extraction tool can also be downloaded from the site.
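For readers who prefer to script the extraction step, the minimal Python sketch below shows one way to unpack a password-protected RAR archive using the third-party rarfile package. The archive name and password are placeholders (the site does not specify them here), and rarfile itself relies on an unrar backend being installed on the system.

    import rarfile  # third-party package: pip install rarfile (needs an unrar backend installed)

    # Placeholder values; replace with the real archive name and the
    # decompression code obtained from the site.
    ARCHIVE = "statistical_learning_2nd_edition.rar"
    PASSWORD = "decompression-code"

    archive = rarfile.RarFile(ARCHIVE)
    print("Archive contains:", archive.namelist())      # list the PDF(s) inside
    archive.extractall(path="extracted", pwd=PASSWORD)  # unpack into ./extracted/
    archive.close()

Any graphical RAR tool achieves the same result; the script is only a convenience for batch downloads.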

Table of Contents

1 Introduction ... 1

2 Overview of Supervised Learning ... 9
2.1 Introduction ... 9
2.2 Variable Types and Terminology ... 9
2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors ... 11
2.3.1 Linear Models and Least Squares ... 11
2.3.2 Nearest-Neighbor Methods ... 14
2.3.3 From Least Squares to Nearest Neighbors ... 16
2.4 Statistical Decision Theory ... 18
2.5 Local Methods in High Dimensions ... 22
2.6 Statistical Models, Supervised Learning and Function Approximation ... 28
2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y) ... 28
2.6.2 Supervised Learning ... 29
2.6.3 Function Approximation ... 29
2.7 Structured Regression Models ... 32
2.7.1 Difficulty of the Problem ... 32
2.8 Classes of Restricted Estimators ... 33
2.8.1 Roughness Penalty and Bayesian Methods ... 34
2.8.2 Kernel Methods and Local Regression ... 34
2.8.3 Basis Functions and Dictionary Methods ... 35
2.9 Model Selection and the Bias-Variance Tradeoff ... 37
Bibliographic Notes ... 39
Exercises ... 39

3 Linear Methods for Regression ... 43
3.1 Introduction ... 43
3.2 Linear Regression Models and Least Squares ... 44
3.2.1 Example: Prostate Cancer ... 49
3.2.2 The Gauss-Markov Theorem ... 51
3.2.3 Multiple Regression from Simple Univariate Regression ... 52
3.2.4 Multiple Outputs ... 56
3.3 Subset Selection ... 57
3.3.1 Best-Subset Selection ... 57
3.3.2 Forward- and Backward-Stepwise Selection ... 58
3.3.3 Forward-Stagewise Regression ... 60
3.3.4 Prostate Cancer Data Example (Continued) ... 61
3.4 Shrinkage Methods ... 61
3.4.1 Ridge Regression ... 61
3.4.2 The Lasso ... 68
3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso ... 69
3.4.4 Least Angle Regression ... 73
3.5 Methods Using Derived Input Directions ... 79
3.5.1 Principal Components Regression ... 79
3.5.2 Partial Least Squares ... 80
3.6 Discussion: A Comparison of the Selection and Shrinkage Methods ... 82
3.7 Multiple Outcome Shrinkage and Selection ... 84
3.8 More on the Lasso and Related Path Algorithms ... 86
3.8.1 Incremental Forward Stagewise Regression ... 86
3.8.2 Piecewise-Linear Path Algorithms ... 89
3.8.3 The Dantzig Selector ... 89
3.8.4 The Grouped Lasso ... 90
3.8.5 Further Properties of the Lasso ... 91
3.8.6 Pathwise Coordinate Optimization ... 92
3.9 Computational Considerations ... 93
Bibliographic Notes ... 94
Exercises ... 94

4 Linear Methods for Classification ... 101
4.1 Introduction ... 101
4.2 Linear Regression of an Indicator Matrix ... 103
4.3 Linear Discriminant Analysis ... 106
4.3.1 Regularized Discriminant Analysis ... 112
4.3.2 Computations for LDA ... 113
4.3.3 Reduced-Rank Linear Discriminant Analysis ... 113
4.4 Logistic Regression ... 119
4.4.1 Fitting Logistic Regression Models ... 120
4.4.2 Example: South African Heart Disease ... 122
4.4.3 Quadratic Approximations and Inference ... 124
4.4.4 L1 Regularized Logistic Regression ... 125
4.4.5 Logistic Regression or LDA? ... 127
4.5 Separating Hyperplanes ... 129
4.5.1 Rosenblatt's Perceptron Learning Algorithm ... 130
4.5.2 Optimal Separating Hyperplanes ... 132
Bibliographic Notes ... 135
Exercises ... 135

5 Basis Expansions and Regularization ... 139
5.1 Introduction ... 139
5.2 Piecewise Polynomials and Splines ... 141
5.2.1 Natural Cubic Splines ... 144
5.2.2 Example: South African Heart Disease (Continued) ... 146
5.2.3 Example: Phoneme Recognition ... 148
5.3 Filtering and Feature Extraction ... 150
5.4 Smoothing Splines ... 151
5.4.1 Degrees of Freedom and Smoother Matrices ... 153
5.5 Automatic Selection of the Smoothing Parameters ... 156
5.5.1 Fixing the Degrees of Freedom ... 158
5.5.2 The Bias-Variance Tradeoff ... 158
5.6 Nonparametric Logistic Regression ... 161
5.7 Multidimensional Splines ... 162
5.8 Regularization and Reproducing Kernel Hilbert Spaces ... 167
5.8.1 Spaces of Functions Generated by Kernels ... 168
5.8.2 Examples of RKHS ... 170
5.9 Wavelet Smoothing ... 174
5.9.1 Wavelet Bases and the Wavelet Transform ... 176
5.9.2 Adaptive Wavelet Filtering ... 179
Bibliographic Notes ... 181
Exercises ... 181
Appendix: Computational Considerations for Splines ... 186
Appendix: B-splines ... 186
Appendix: Computations for Smoothing Splines ... 189

6 Kernel Smoothing Methods ... 191
6.1 One-Dimensional Kernel Smoothers ... 192
6.1.1 Local Linear Regression ... 194
6.1.2 Local Polynomial Regression ... 197
6.2 Selecting the Width of the Kernel ... 198
6.3 Local Regression in ℝ^p ... 200
6.4 Structured Local Regression Models in ℝ^p ... 201
6.4.1 Structured Kernels ... 203
6.4.2 Structured Regression Functions ... 203
6.5 Local Likelihood and Other Models ... 205
6.6 Kernel Density Estimation and Classification ... 208
6.6.1 Kernel Density Estimation ... 208
6.6.2 Kernel Density Classification ... 210
6.6.3 The Naive Bayes Classifier ... 210
6.7 Radial Basis Functions and Kernels ... 212
6.8 Mixture Models for Density Estimation and Classification ... 214
6.9 Computational Considerations ... 216
Bibliographic Notes ... 216
Exercises ... 216

7 Model Assessment and Selection ... 219
7.1 Introduction ... 219
7.2 Bias, Variance and Model Complexity ... 219
7.3 The Bias-Variance Decomposition ... 223
7.3.1 Example: Bias-Variance Tradeoff ... 226
7.4 Optimism of the Training Error Rate ... 228
7.5 Estimates of In-Sample Prediction Error ... 230
7.6 The Effective Number of Parameters ... 232
7.7 The Bayesian Approach and BIC ... 233
7.8 Minimum Description Length ... 235
7.9 Vapnik-Chervonenkis Dimension ... 237
7.9.1 Example (Continued) ... 239
7.10 Cross-Validation ... 241
7.10.1 K-Fold Cross-Validation ... 241
7.10.2 The Wrong and Right Way to Do Cross-validation ... 245
7.10.3 Does Cross-Validation Really Work? ... 247
7.11 Bootstrap Methods ... 249
7.11.1 Example (Continued) ... 252
7.12 Conditional or Expected Test Error? ... 254
Bibliographic Notes ... 257
Exercises ... 257

8 Model Inference and Averaging ... 261
8.1 Introduction ... 261
8.2 The Bootstrap and Maximum Likelihood Methods ... 261
8.2.1 A Smoothing Example ... 261
8.2.2 Maximum Likelihood Inference ... 265
8.2.3 Bootstrap versus Maximum Likelihood ... 267
8.3 Bayesian Methods ... 267
8.4 Relationship Between the Bootstrap and Bayesian Inference ... 271
8.5 The EM Algorithm ... 272
8.5.1 Two-Component Mixture Model ... 272
8.5.2 The EM Algorithm in General ... 276
8.5.3 EM as a Maximization-Maximization Procedure ... 277
8.6 MCMC for Sampling from the Posterior ... 279
8.7 Bagging ... 282
8.7.1 Example: Trees with Simulated Data ... 283
8.8 Model Averaging and Stacking ... 288
8.9 Stochastic Search: Bumping ... 290
Bibliographic Notes ... 292
Exercises ... 293

9 Additive Models, Trees, and Related Methods ... 295
9.1 Generalized Additive Models ... 295
9.1.1 Fitting Additive Models ... 297
9.1.2 Example: Additive Logistic Regression ... 299
9.1.3 Summary ... 304
9.2 Tree-Based Methods ... 305
9.2.1 Background ... 305
9.2.2 Regression Trees ... 307
9.2.3 Classification Trees ... 308
9.2.4 Other Issues ... 310
9.2.5 Spam Example (Continued) ... 313
9.3 PRIM: Bump Hunting ... 317
9.3.1 Spam Example (Continued) ... 320
9.4 MARS: Multivariate Adaptive Regression Splines ... 321
9.4.1 Spam Example (Continued) ... 326
9.4.2 Example (Simulated Data) ... 327
9.4.3 Other Issues ... 328
9.5 Hierarchical Mixtures of Experts ... 329
9.6 Missing Data ... 332
9.7 Computational Considerations ... 334
Bibliographic Notes ... 334
Exercises ... 335

10 Boosting and Additive Trees ... 337
10.1 Boosting Methods ... 337
10.1.1 Outline of This Chapter ... 340
10.2 Boosting Fits an Additive Model ... 341
10.3 Forward Stagewise Additive Modeling ... 342
10.4 Exponential Loss and AdaBoost ... 343
10.5 Why Exponential Loss? ... 345
10.6 Loss Functions and Robustness ... 346
10.7 "Off-the-Shelf" Procedures for Data Mining ... 350
10.8 Example: Spam Data ... 352
10.9 Boosting Trees ... 353
10.10 Numerical Optimization via Gradient Boosting ... 358
10.10.1 Steepest Descent ... 358
10.10.2 Gradient Boosting ... 359
10.10.3 Implementations of Gradient Boosting ... 360
10.11 Right-Sized Trees for Boosting ... 361
10.12 Regularization ... 364
10.12.1 Shrinkage ... 364
10.12.2 Subsampling ... 365
10.13 Interpretation ... 367
10.13.1 Relative Importance of Predictor Variables ... 367
10.13.2 Partial Dependence Plots ... 369
10.14 Illustrations ... 371
10.14.1 California Housing ... 371
10.14.2 New Zealand Fish ... 375
10.14.3 Demographics Data ... 379
Bibliographic Notes ... 380
Exercises ... 384

11 Neural Networks ... 389
11.1 Introduction ... 389
11.2 Projection Pursuit Regression ... 389
11.3 Neural Networks ... 392
11.4 Fitting Neural Networks ... 395
11.5 Some Issues in Training Neural Networks ... 397
11.5.1 Starting Values ... 397
11.5.2 Overfitting ... 398
11.5.3 Scaling of the Inputs ... 398
11.5.4 Number of Hidden Units and Layers ... 400
11.5.5 Multiple Minima ... 400
11.6 Example: Simulated Data ... 401
11.7 Example: ZIP Code Data ... 404
11.8 Discussion ... 408
11.9 Bayesian Neural Nets and the NIPS 2003 Challenge ... 409
11.9.1 Bayes, Boosting and Bagging ... 410
11.9.2 Performance Comparisons ... 412
11.10 Computational Considerations ... 414
Bibliographic Notes ... 415
Exercises ... 415

12 Support Vector Machines and Flexible Discriminants ... 417
12.1 Introduction ... 417
12.2 The Support Vector Classifier ... 417
12.2.1 Computing the Support Vector Classifier ... 420
12.2.2 Mixture Example (Continued) ... 421
12.3 Support Vector Machines and Kernels ... 423
12.3.1 Computing the SVM for Classification ... 423
12.3.2 The SVM as a Penalization Method ... 426
12.3.3 Function Estimation and Reproducing Kernels ... 428
12.3.4 SVMs and the Curse of Dimensionality ... 431
12.3.5 A Path Algorithm for the SVM Classifier ... 432
12.3.6 Support Vector Machines for Regression ... 434
12.3.7 Regression and Kernels ... 436
12.3.8 Discussion ... 438
12.4 Generalizing Linear Discriminant Analysis ... 438
12.5 Flexible Discriminant Analysis ... 440
12.5.1 Computing the FDA Estimates ... 444
12.6 Penalized Discriminant Analysis ... 446
12.7 Mixture Discriminant Analysis ... 449
12.7.1 Example: Waveform Data ... 451
Bibliographic Notes ... 455
Exercises ... 455

13 Prototype Methods and Nearest-Neighbors ... 459
13.1 Introduction ... 459
13.2 Prototype Methods ... 459
13.2.1 K-means Clustering ... 460
13.2.2 Learning Vector Quantization ... 462
13.2.3 Gaussian Mixtures ... 463
13.3 k-Nearest-Neighbor Classifiers ... 463
13.3.1 Example: A Comparative Study ... 468
13.3.2 Example: k-Nearest-Neighbors and Image Scene Classification ... 470
13.3.3 Invariant Metrics and Tangent Distance ... 471
13.4 Adaptive Nearest-Neighbor Methods ... 475
13.4.1 Example ... 478
13.4.2 Global Dimension Reduction for Nearest-Neighbors ... 479
13.5 Computational Considerations ... 480
Bibliographic Notes ... 481
Exercises ... 481

14 Unsupervised Learning ... 485
14.1 Introduction ... 485
14.2 Association Rules ... 487
14.2.1 Market Basket Analysis ... 488
14.2.2 The Apriori Algorithm ... 489
14.2.3 Example: Market Basket Analysis ... 492
14.2.4 Unsupervised as Supervised Learning ... 495
14.2.5 Generalized Association Rules ... 497
14.2.6 Choice of Supervised Learning Method ... 499
14.2.7 Example: Market Basket Analysis (Continued) ... 499
14.3 Cluster Analysis ... 501
14.3.1 Proximity Matrices ... 503
14.3.2 Dissimilarities Based on Attributes ... 503
14.3.3 Object Dissimilarity ... 505
14.3.4 Clustering Algorithms ... 507
14.3.5 Combinatorial Algorithms ... 507
14.3.6 K-means ... 509
14.3.7 Gaussian Mixtures as Soft K-means Clustering ... 510
14.3.8 Example: Human Tumor Microarray Data ... 512
14.3.9 Vector Quantization ... 514
14.3.10 K-medoids ... 515
14.3.11 Practical Issues ... 518
14.3.12 Hierarchical Clustering ... 520
14.4 Self-Organizing Maps ... 528
14.5 Principal Components, Curves and Surfaces ... 534
14.5.1 Principal Components ... 534
14.5.2 Principal Curves and Surfaces ... 541
14.5.3 Spectral Clustering ... 544
14.5.4 Kernel Principal Components ... 547
14.5.5 Sparse Principal Components ... 550
14.6 Non-negative Matrix Factorization ... 553
14.6.1 Archetypal Analysis ... 554
14.7 Independent Component Analysis and Exploratory Projection Pursuit ... 557
14.7.1 Latent Variables and Factor Analysis ... 558
14.7.2 Independent Component Analysis ... 560
14.7.3 Exploratory Projection Pursuit ... 565
14.7.4 A Direct Approach to ICA ... 565
14.8 Multidimensional Scaling ... 570
14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling ... 572
14.10 The Google PageRank Algorithm ... 576
Bibliographic Notes ... 578
Exercises ... 579

15 Random Forests ... 587
15.1 Introduction ... 587
15.2 Definition of Random Forests ... 587
15.3 Details of Random Forests ... 592
15.3.1 Out of Bag Samples ... 592
15.3.2 Variable Importance ... 593
15.3.3 Proximity Plots ... 595
15.3.4 Random Forests and Overfitting ... 596
15.4 Analysis of Random Forests ... 597
15.4.1 Variance and the De-Correlation Effect ... 597
15.4.2 Bias ... 600
15.4.3 Adaptive Nearest Neighbors ... 601
Bibliographic Notes ... 602
Exercises ... 603

16 Ensemble Learning ... 605
16.1 Introduction ... 605
16.2 Boosting and Regularization Paths ... 607
16.2.1 Penalized Regression ... 607
16.2.2 The "Bet on Sparsity" Principle ... 610
16.2.3 Regularization Paths, Over-fitting and Margins ... 613
16.3 Learning Ensembles ... 616
16.3.1 Learning a Good Ensemble ... 617
16.3.2 Rule Ensembles ... 622
Bibliographic Notes ... 623
Exercises ... 624

17 Undirected Graphical Models ... 625
17.1 Introduction ... 625
17.2 Markov Graphs and Their Properties ... 627
17.3 Undirected Graphical Models for Continuous Variables ... 630
17.3.1 Estimation of the Parameters when the Graph Structure is Known ... 631
17.3.2 Estimation of the Graph Structure ... 635
17.4 Undirected Graphical Models for Discrete Variables ... 638
17.4.1 Estimation of the Parameters when the Graph Structure is Known ... 639
17.4.2 Hidden Nodes ... 641
17.4.3 Estimation of the Graph Structure ... 642
17.4.4 Restricted Boltzmann Machines ... 643
Exercises ... 645

18 High-Dimensional Problems: p >> N ... 649
18.1 When p is Much Bigger than N ... 649
18.2 Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids ... 651
18.3 Linear Classifiers with Quadratic Regularization ... 654
18.3.1 Regularized Discriminant Analysis ... 656
18.3.2 Logistic Regression with Quadratic Regularization ... 657
18.3.3 The Support Vector Classifier ... 657
18.3.4 Feature Selection ... 658
18.3.5 Computational Shortcuts When p >> N ... 659
18.4 Linear Classifiers with L1 Regularization ... 661
18.4.1 Application of Lasso to Protein Mass Spectroscopy ... 664
18.4.2 The Fused Lasso for Functional Data ... 666
18.5 Classification When Features are Unavailable ... 668
18.5.1 Example: String Kernels and Protein Classification ... 668
18.5.2 Classification and Other Models Using Inner-Product Kernels and Pairwise Distances ... 670
18.5.3 Example: Abstracts Classification ... 672
18.6 High-Dimensional Regression: Supervised Principal Components ... 674
18.6.1 Connection to Latent-Variable Modeling ... 678
18.6.2 Relationship with Partial Least Squares ... 680
18.6.3 Pre-Conditioning for Feature Selection ... 681
18.7 Feature Assessment and the Multiple-Testing Problem ... 683
18.7.1 The False Discovery Rate ... 687
18.7.2 Asymmetric Cutpoints and the SAM Procedure ... 690
18.7.3 A Bayesian Interpretation of the FDR ... 692
18.8 Bibliographic Notes ... 693
Exercises ... 694

References ... 699
Author Index ... 729
Index ... 737
