
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
Subtitle: none
Authors: Ian H. Witten, Eibe Frank
Classification number:
ISBN: 9787111127697
Introduction
"This book is a milestone in the integrated application of data mining, data analysis, information theory, and machine learning." — Jim Gray, Microsoft Research, Turing Award winner. This is an excellent textbook that combines data mining algorithms with data mining practice. Drawing on their extensive experience, the authors give an accessible introduction to the concepts of data mining and the techniques it employs (machine learning in particular), and offer sound advice on applying machine learning tools to data mining. The key elements of data mining are introduced through numerous examples.
The book also introduces Weka, a Java-based software system that can be used to analyze datasets, find applicable patterns, and carry out sound analyses, or to develop machine learning schemes of your own.
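To give a feel for how Weka is used from Java, here is a minimal sketch (not taken from the book) that loads a dataset in Weka's ARFF format, builds a J48 decision tree (Weka's C4.5 implementation, covered in chapter 6), and estimates its accuracy by tenfold cross-validation (chapter 5). The file name weather.arff is a hypothetical stand-in for any ARFF dataset, and the weka.classifiers.trees.J48 package path follows later Weka releases; the layout in the version that shipped with the book may differ.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class WekaSketch {
    public static void main(String[] args) throws Exception {
        // Parse the ARFF file into Weka's in-memory dataset representation.
        Instances data;
        try (BufferedReader reader = new BufferedReader(new FileReader("weather.arff"))) {
            data = new Instances(reader);
        }
        data.setClassIndex(data.numAttributes() - 1); // last attribute is the class

        // Build a C4.5-style decision tree on the full dataset.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree); // prints the induced tree

        // Estimate predictive accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}

To run it, weka.jar must be on the classpath, for example: java -cp weka.jar:. WekaSketch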
Key features of this book:
- Explains the principles of data mining algorithms, and uses examples to help readers choose an algorithm appropriate to their situation and to compare and evaluate the results obtained by different methods.
- Covers techniques for improving performance, including data preprocessing and combining the output of different methods.
- Provides the Weka software used in the book together with supplementary learning material, downloadable from http://www.mkp.com/datamining.
Ian H. Witten is a professor in the Department of Computer Science at the University of Waikato, New Zealand. He is a member of the ACM and of the Royal Society of New Zealand, and belongs to professional computing, information retrieval, and engineering associations in the UK, the US, Canada, and New Zealand. He has written several books, contributes to a number of technical journals, and has published many papers.

Eibe Frank graduated from the Department of Computer Science at the University of Karlsruhe, Germany, and is now a researcher in the machine learning group at the University of Waikato, New Zealand. He is regularly invited to present his work at machine learning conferences and has published many papers in machine learning journals.
Contents
foreword vii
preface xvii
1 what's it all about? 1
1.1 data mining and machine learning 2
describing structural patterns 4
machine learning 5
data mining 7
1.2 simple examples: the weather problem and others 8
the weather problem 8
contact lenses: an idealized problem 11
irises: a classic numeric dataset 13
cpu performance: introducing numeric prediction 15
labor negotiations: a more realistic example 16
soybean classification: a classic machine learning success 17
1.3 fielded applications 20
decisions involving judgment 21
screening images 22
load forecasting 23
diagnosis 24
marketing and sales 25
1.4 machine learning and statistics 26
1.5 generalization as search 27
enumerating the concept space 28
bias 29
1.6 data mining and ethics 32
1.7 further reading 34
2 input: concepts, instances, attributes 37
2.1 what's a concept? 38
2.2 what's in an example? 41
2.3 what's in an attribute? 45
2.4 preparing the input 48
gathering the data together 48
arff format 49
attribute types 51
missing values 52
inaccurate values 53
getting to know your data 54
2.5 further reading 55
3 output: knowledge representation 57
3.1 decision tables 58
3.2 decision trees 58
3.3 classification rules 59
3.4 association rules 63
3.5 rules with exceptions 64
3.6 rules involving relations 67
3.7 trees for numeric prediction 70
3.8 instance-based representation 72
3.9 clusters 75
3.10 further reading 76
4 algorithms: the basic methods 77
4.1 inferring rudimentary rules 78
missing values and numeric attributes 80
discussion 81
4.2 statistical modeling 82
missing values and numeric attributes 85
discussion 88
4.3 divide-and-conquer: constructing decision trees 89
calculating information 93
highly branching attributes 94
discussion 97
4.4 covering algorithms: constructing rules 97
rules versus trees 98
a simple covering algorithm 98
rules versus decision lists 103
4.5 mining association rules 104
item sets 105
association rules 105
generating rules efficiently 108
discussion 111
4.6 linear models 112
numeric prediction 112
classification 113
discussion 113
4.7 instance-based learning 114
the distance function 114
discussion 115
4.8 further reading 116
5 credibility: evaluating what's been learned 119
5.1 training and testing 120
5.2 predicting performance 123
5.3 cross-validation 125
5.4 other estimates 127
leave-one-out 127
the bootstrap 128
5.5 comparing data mining schemes 129
5.6 predicting probabilities 133
quadratic loss function 134
informational loss function 135
discussion 136
5.7 counting the cost 137
lift charts 139
roc curves 141
cost-sensitive learning 144
discussion 145
5.8 evaluating numeric prediction 147
5.9 the minimum description length principle 150
5.10 applying mdl to clustering 154
5.11 further reading 155
6 implementations: real machine learning schemes 157
6.1 decision trees 159
numeric attributes 159
missing values 161
pruning 162
estimating error rates 164
complexity of decision tree induction 167
from trees to rules 168
c4.5: choices and options 169
discussion 169
6.2 classification rules 170
criteria for choosing tests 171
missing values and numeric attributes 172
good rules and bad rules 173
generating good rules 174
generating good decision lists 175
probability measure for rule evaluation 177
evaluating rules using a test set 178
obtaining rules from partial decision trees 181
rules with exceptions 184
discussion 187
6.3 extending linear classification: support vector machines 188
the maximum margin hyperplane 189
nonlinear class boundaries 191
discussion 193
6.4 instance-based learning 193
reducing the number of exemplars 194
pruning noisy exemplars 194
weighting attributes 195
generalizing exemplars 196
distance functions for generalized exemplars 197
generalized distance functions 199
discussion 200
6.5 numeric prediction 201
model trees 202
building the tree 202
pruning the tree 203
nominal attributes 204
missing values 204
pseudo-code for model tree induction 205
locally weighted linear regression 208
discussion 209
6.6 clustering 210
iterative distance-based clustering 211
incremental clustering 212
category utility 217
probability-based clustering 218
the em algorithm 221
extending the mixture model 223
bayesian clustering 225
discussion 226
7 moving on: engineering the input and output 229
7.1 attribute selection 232
scheme-independent selection 233
searching the attribute space 235
scheme-specific selection 236
7.2 discretizing numeric attributes 238
unsupervised discretization 239
entropy-based discretization 240
other discretization methods 243
entropy-based versus error-based discretization 244
converting discrete to numeric attributes 246
7.3 automatic data cleansing 247
improving decision trees 247
robust regression 248
detecting anomalies 249
7.4 combining multiple models 250
bagging 251
boosting 254
stacking 258
error-correcting output codes 260
7.5 further reading 263
8 nuts and bolts: machine learning algorithms in java 265
8.1 getting started 267
8.2 javadoc and the class library 271
classes, instances, and packages 272
the weka.core package 272
the weka.classifiers package 274
other packages 276
indexes 277
8.3 processing datasets using the machine learning programs 277
using m5' 277
generic options 279
scheme-specific options 282
classifiers 283
meta-learning schemes 286
filters 289
association rules 294
clustering 296
8.4 embedded machine learning 297
a simple message classifier 299
8.5 writing new learning schemes 306
an example classifier 306
conventions for implementing classifiers 314
writing filters 314
an example filter 316
conventions for writing filters 317
9 looking forward 321
9.1 learning from massive datasets 322
9.2 visualizing machine learning 325
visualizing the input 325
visualizing the output 327
9.3 incorporating domain knowledge 329
9.4 text mining 331
finding key phrases for documents 331
finding information in running text 333
soft parsing 334
9.5 mining the world wide web 335
9.6 further reading 336
references 339
index 351
about the authors 371
