Book Introduction
Introduction to Data Mining (English Edition)

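- Authors: Pang-Ning Tan (USA), Michael Steinbach (USA), Vipin Kumar (USA)
- Publisher: Beijing: China Machine Press
- ISBN: 9787111316701
- Publication date: 2010
- Listed page count: 771 pages
- File size: 50 MB
- File page count: 791 pages
- Subject heading: Data acquisition - English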
Table of Contents
1 Introduction
1.1 What Is Data Mining?
1.2 Motivating Challenges
1.3 The Origins of Data Mining
1.4 Data Mining Tasks
1.5 Scope and Organization of the Book
1.6 Bibliographic Notes
1.7 Exercises
2 Data
2.1 Types of Data
2.1.1 Attributes and Measurement
2.1.2 Types of Data Sets
2.2 Data Quality
2.2.1 Measurement and Data Collection Issues
2.2.2 Issues Related to Applications
2.3 Data Preprocessing
2.3.1 Aggregation
2.3.2 Sampling
2.3.3 Dimensionality Reduction
2.3.4 Feature Subset Selection
2.3.5 Feature Creation
2.3.6 Discretization and Binarization
2.3.7 Variable Transformation
2.4 Measures of Similarity and Dissimilarity
2.4.1 Basics
2.4.2 Similarity and Dissimilarity between Simple Attributes
2.4.3 Dissimilarities between Data Objects
2.4.4 Similarities between Data Objects
2.4.5 Examples of Proximity Measures
2.4.6 Issues in Proximity Calculation
2.4.7 Selecting the Right Proximity Measure
2.5 Bibliographic Notes
2.6 Exercises
3 Exploring Data
3.1 The Iris Data Set
3.2 Summary Statistics
3.2.1 Frequencies and the Mode
3.2.2 Percentiles
3.2.3 Measures of Location: Mean and Median
3.2.4 Measures of Spread: Range and Variance
3.2.5 Multivariate Summary Statistics
3.2.6 Other Ways to Summarize the Data
3.3 Visualization
3.3.1 Motivations for Visualization
3.3.2 General Concepts
3.3.3 Techniques
3.3.4 Visualizing Higher-Dimensional Data
3.3.5 Do's and Don'ts
3.4 OLAP and Multidimensional Data Analysis
3.4.1 Representing Iris Data as a Multidimensional Array
3.4.2 Multidimensional Data: The General Case
3.4.3 Analyzing Multidimensional Data
3.4.4 Final Comments on Multidimensional Data Analysis
3.5 Bibliographic Notes
3.6 Exercises
4 Classification: Basic Concepts, Decision Trees, and Model Evaluation
4.1 Preliminaries
4.2 General Approach to Solving a Classification Problem
4.3 Decision Tree Induction
4.3.1 How a Decision Tree Works
4.3.2 How to Build a Decision Tree
4.3.3 Methods for Expressing Attribute Test Conditions
4.3.4 Measures for Selecting the Best Split
4.3.5 Algorithm for Decision Tree Induction
4.3.6 An Example: Web Robot Detection
4.3.7 Characteristics of Decision Tree Induction
4.4 Model Overfitting
4.4.1 Overfitting Due to Presence of Noise
4.4.2 Overfitting Due to Lack of Representative Samples
4.4.3 Overfitting and the Multiple Comparison Procedure
4.4.4 Estimation of Generalization Errors
4.4.5 Handling Overfitting in Decision Tree Induction
4.5 Evaluating the Performance of a Classifier
4.5.1 Holdout Method
4.5.2 Random Subsampling
4.5.3 Cross-Validation
4.5.4 Bootstrap
4.6 Methods for Comparing Classifiers
4.6.1 Estimating a Confidence Interval for Accuracy
4.6.2 Comparing the Performance of Two Models
4.6.3 Comparing the Performance of Two Classifiers
4.7 Bibliographic Notes
4.8 Exercises
5 Classification: Alternative Techniques
5.1 Rule-Based Classifier
5.1.1 How a Rule-Based Classifier Works
5.1.2 Rule-Ordering Schemes
5.1.3 How to Build a Rule-Based Classifier
5.1.4 Direct Methods for Rule Extraction
5.1.5 Indirect Methods for Rule Extraction
5.1.6 Characteristics of Rule-Based Classifiers
5.2 Nearest-Neighbor Classifiers
5.2.1 Algorithm
5.2.2 Characteristics of Nearest-Neighbor Classifiers
5.3 Bayesian Classifiers
5.3.1 Bayes Theorem
5.3.2 Using the Bayes Theorem for Classification
5.3.3 Naïve Bayes Classifier
5.3.4 Bayes Error Rate
5.3.5 Bayesian Belief Networks
5.4 Artificial Neural Network (ANN)
5.4.1 Perceptron
5.4.2 Multilayer Artificial Neural Network
5.4.3 Characteristics of ANN
5.5 Support Vector Machine (SVM)
5.5.1 Maximum Margin Hyperplanes
5.5.2 Linear SVM: Separable Case
5.5.3 Linear SVM: Nonseparable Case
5.5.4 Nonlinear SVM
5.5.5 Characteristics of SVM
5.6 Ensemble Methods
5.6.1 Rationale for Ensemble Method
5.6.2 Methods for Constructing an Ensemble Classifier
5.6.3 Bias-Variance Decomposition
5.6.4 Bagging
5.6.5 Boosting
5.6.6 Random Forests
5.6.7 Empirical Comparison among Ensemble Methods
5.7 Class Imbalance Problem
5.7.1 Alternative Metrics
5.7.2 The Receiver Operating Characteristic Curve
5.7.3 Cost-Sensitive Learning
5.7.4 Sampling-Based Approaches
5.8 Multiclass Problem
5.9 Bibliographic Notes
5.10 Exercises
6 Association Analysis: Basic Concepts and Algorithms
6.1 Problem Definition
6.2 Frequent Itemset Generation
6.2.1 The Apriori Principle
6.2.2 Frequent Itemset Generation in the Apriori Algorithm
6.2.3 Candidate Generation and Pruning
6.2.4 Support Counting
6.2.5 Computational Complexity
6.3 Rule Generation
6.3.1 Confidence-Based Pruning
6.3.2 Rule Generation in Apriori Algorithm
6.3.3 An Example: Congressional Voting Records
6.4 Compact Representation of Frequent Itemsets
6.4.1 Maximal Frequent Itemsets
6.4.2 Closed Frequent Itemsets
6.5 Alternative Methods for Generating Frequent Itemsets
6.6 FP-Growth Algorithm
6.6.1 FP-Tree Representation
6.6.2 Frequent Itemset Generation in FP-Growth Algorithm
6.7 Evaluation of Association Patterns
6.7.1 Objective Measures of Interestingness
6.7.2 Measures beyond Pairs of Binary Variables
6.7.3 Simpson's Paradox
6.8 Effect of Skewed Support Distribution
6.9 Bibliographic Notes
6.10 Exercises
7 Association Analysis: Advanced Concepts
7.1 Handling Categorical Attributes
7.2 Handling Continuous Attributes
7.2.1 Discretization-Based Methods
7.2.2 Statistics-Based Methods
7.2.3 Non-discretization Methods
7.3 Handling a Concept Hierarchy
7.4 Sequential Patterns
7.4.1 Problem Formulation
7.4.2 Sequential Pattern Discovery
7.4.3 Timing Constraints
7.4.4 Alternative Counting Schemes
7.5 Subgraph Patterns
7.5.1 Graphs and Subgraphs
7.5.2 Frequent Subgraph Mining
7.5.3 Apriori-like Method
7.5.4 Candidate Generation
7.5.5 Candidate Pruning
7.5.6 Support Counting
7.6 Infrequent Patterns
7.6.1 Negative Patterns
7.6.2 Negatively Correlated Patterns
7.6.3 Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns
7.6.4 Techniques for Mining Interesting Infrequent Patterns
7.6.5 Techniques Based on Mining Negative Patterns
7.6.6 Techniques Based on Support Expectation
7.7 Bibliographic Notes
7.8 Exercises
8 Cluster Analysis: Basic Concepts and Algorithms
8.1 Overview
8.1.1 What Is Cluster Analysis?
8.1.2 Different Types of Clusterings
8.1.3 Different Types of Clusters
8.2 K-means
8.2.1 The Basic K-means Algorithm
8.2.2 K-means: Additional Issues
8.2.3 Bisecting K-means
8.2.4 K-means and Different Types of Clusters
8.2.5 Strengths and Weaknesses
8.2.6 K-means as an Optimization Problem
8.3 Agglomerative Hierarchical Clustering
8.3.1 Basic Agglomerative Hierarchical Clustering Algorithm
8.3.2 Specific Techniques
8.3.3 The Lance-Williams Formula for Cluster Proximity
8.3.4 Key Issues in Hierarchical Clustering
8.3.5 Strengths and Weaknesses
8.4 DBSCAN
8.4.1 Traditional Density: Center-Based Approach
8.4.2 The DBSCAN Algorithm
8.4.3 Strengths and Weaknesses
8.5 Cluster Evaluation
8.5.1 Overview
8.5.2 Unsupervised Cluster Evaluation Using Cohesion and Separation
8.5.3 Unsupervised Cluster Evaluation Using the Proximity Matrix
8.5.4 Unsupervised Evaluation of Hierarchical Clustering
8.5.5 Determining the Correct Number of Clusters
8.5.6 Clustering Tendency
8.5.7 Supervised Measures of Cluster Validity
8.5.8 Assessing the Significance of Cluster Validity Measures
8.6 Bibliographic Notes
8.7 Exercises
9 Cluster Analysis: Additional Issues and Algorithms
9.1 Characteristics of Data, Clusters, and Clustering Algorithms
9.1.1 Example: Comparing K-means and DBSCAN
9.1.2 Data Characteristics
9.1.3 Cluster Characteristics
9.1.4 General Characteristics of Clustering Algorithms
9.2 Prototype-Based Clustering
9.2.1 Fuzzy Clustering
9.2.2 Clustering Using Mixture Models
9.2.3 Self-Organizing Maps (SOM)
9.3 Density-Based Clustering
9.3.1 Grid-Based Clustering
9.3.2 Subspace Clustering
9.3.3 DENCLUE: A Kernel-Based Scheme for Density-Based Clustering
9.4 Graph-Based Clustering
9.4.1 Sparsification
9.4.2 Minimum Spanning Tree (MST) Clustering
9.4.3 OPOSSUM: Optimal Partitioning of Sparse Similarities Using METIS
9.4.4 Chameleon: Hierarchical Clustering with Dynamic Modeling
9.4.5 Shared Nearest Neighbor Similarity
9.4.6 The Jarvis-Patrick Clustering Algorithm
9.4.7 SNN Density
9.4.8 SNN Density-Based Clustering
9.5 Scalable Clustering Algorithms
9.5.1 Scalability: General Issues and Approaches
9.5.2 BIRCH
9.5.3 CURE
9.6 Which Clustering Algorithm?
9.7 Bibliographic Notes
9.8 Exercises
10 Anomaly Detection
10.1 Preliminaries
10.1.1 Causes of Anomalies
10.1.2 Approaches to Anomaly Detection
10.1.3 The Use of Class Labels
10.1.4 Issues
10.2 Statistical Approaches
10.2.1 Detecting Outliers in a Univariate Normal Distribution
10.2.2 Outliers in a Multivariate Normal Distribution
10.2.3 A Mixture Model Approach for Anomaly Detection
10.2.4 Strengths and Weaknesses
10.3 Proximity-Based Outlier Detection
10.3.1 Strengths and Weaknesses
10.4 Density-Based Outlier Detection
10.4.1 Detection of Outliers Using Relative Density
10.4.2 Strengths and Weaknesses
10.5 Clustering-Based Techniques
10.5.1 Assessing the Extent to Which an Object Belongs to a Cluster
10.5.2 Impact of Outliers on the Initial Clustering
10.5.3 The Number of Clusters to Use
10.5.4 Strengths and Weaknesses
10.6 Bibliographic Notes
10.7 Exercises
Appendix A Linear Algebra
A.1 Vectors
A.1.1 Definition
A.1.2 Vector Addition and Multiplication by a Scalar
A.1.3 Vector Spaces
A.1.4 The Dot Product, Orthogonality, and Orthogonal Projections
A.1.5 Vectors and Data Analysis
A.2 Matrices
A.2.1 Matrices: Definitions
A.2.2 Matrices: Addition and Multiplication by a Scalar
A.2.3 Matrices: Multiplication
A.2.4 Linear Transformations and Inverse Matrices
A.2.5 Eigenvalue and Singular Value Decomposition
A.2.6 Matrices and Data Analysis
A.3 Bibliographic Notes
Appendix B Dimensionality Reduction
B.1 PCA and SVD
B.1.1 Principal Components Analysis (PCA)
B.1.2 SVD
B.2 Other Dimensionality Reduction Techniques
B.2.1 Factor Analysis
B.2.2 Locally Linear Embedding (LLE)
B.2.3 Multidimensional Scaling, FastMap, and ISOMAP
B.2.4 Common Issues
B.3 Bibliographic Notes
Appendix C Probability and Statistics
C.1 Probability
C.1.1 Expected Values
C.2 Statistics
C.2.1 Point Estimation
C.2.2 Central Limit Theorem
C.2.3 Interval Estimation
C.3 Hypothesis Testing
Appendix D Regression
D.1 Preliminaries
D.2 Simple Linear Regression
D.2.1 Least Square Method
D.2.2 Analyzing Regression Errors
D.2.3 Analyzing Goodness of Fit
D.3 Multivariate Linear Regression
D.4 Alternative Least-Square Regression Methods
Appendix E Optimization
E.1 Unconstrained Optimization
E.1.1 Numerical Methods
E.2 Constrained Optimization
E.2.1 Equality Constraints
E.2.2 Inequality Constraints
Author Index
Subject Index