Machine Learning with Spark (Spark机器学习)

Subtitle: none

Author: Pentreath (彭特里思) (UK)

Classification number:

ISBN: 9787564160913

Summary

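  Apache Spark is a newly developed distributed framework, optimized in particular for low-latency tasks and in-memory data storage. It is one of the very few frameworks for parallel computing that combine speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, and powerful API design.

  Machine Learning with Spark (reprint edition, in English), written by Pentreath, guides you through the basics of the Spark API used to load and process data, and shows how to prepare suitable input data for a variety of machine learning models. Detailed examples and real-world use cases help you learn common machine learning models, including recommender systems, classification, regression, clustering, and dimensionality reduction. You will also encounter advanced topics such as large-scale text processing, methods for online machine learning, and model evaluation using Spark Streaming.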

Contents

Preface
Chapter 1: Getting Up and Running with Spark
 Installing and setting up Spark locally
 Spark clusters
 The Spark programming model
 SparkContext and SparkConf
 The Spark shell
 Resilient Distributed Datasets
 Creating RDDs
 Spark operations
 Caching RDDs
 Broadcast variables and accumulators
 The first step to a Spark program in Scala
 The first step to a Spark program in Java
 The first step to a Spark program in Python
 Getting Spark running on Amazon EC2
 Launching an EC2 Spark cluster
 Summary
Chapter 2: Designing a Machine Learning System
 Introducing MovieStream
 Business use cases for a machine learning system
 Personalization
 Targeted marketing and customer segmentation
 Predictive modeling and analytics
 Types of machine learning models
 The components of a data-driven machine learning system
 Data ingestion and storage
 Data cleansing and transformation
 Model training and testing loop
 Model deployment and integration
 Model monitoring and feedback
 Batch versus real time
 An architecture for a machine learning system
 Practical exercise
 Summary
Chapter 3: Obtaining, Processing, and Preparing Data with Spark
 Accessing publicly available datasets
 The MovieLens 100k dataset
 Exploring and visualizing your data
 Exploring the user dataset
 Exploring the movie dataset
 Exploring the rating dataset
 Processing and transforming your data
 Filling in bad or missing data
 Extracting useful features from your data
 Numerical features
 Categorical features
 Derived features
 Transforming timestamps into categorical features
 Text features
 Simple text feature extraction
 Normalizing features
 Using MLlib for feature normalization
 Using packages for feature extraction
 Summary
Chapter 4: Building a Recommendation Engine with Spark
 Types of recommendation models
 Content-based filtering
 Collaborative filtering
 Matrix factorization
 Extracting the right features from your data
 Extracting features from the MovieLens 100k dataset
 Training the recommendation model
 Training a model on the MovieLens 100k dataset
 Training a model using implicit feedback data
 Using the recommendation model
 User recommendations
 Generating movie recommendations from the MovieLens 100k dataset
 Item recommendations
 Generating similar movies for the MovieLens 100k dataset
 Evaluating the performance of recommendation models
 Mean Squared Error
 Mean average precision at K
 Using MLlib's built-in evaluation functions
 RMSE and MSE
 MAP
 Summary
Chapter 5: Building a Classification Model with Spark
 Types of classification models
 Linear models
 Logistic regression
 Linear support vector machines
 The naïve Bayes model
 Decision trees
 Extracting the right features from your data
 Extracting features from the Kaggle/StumbleUpon evergreen classification dataset
 Training classification models
 Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset
 Using classification models
 Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset
 Evaluating the performance of classification models
 Accuracy and prediction error
 Precision and recall
 ROC curve and AUC
 Improving model performance and tuning parameters
 Feature standardization
 Additional features
 Using the correct form of data
 Tuning model parameters
 Linear models
 Decision trees
 The naïve Bayes model
 Cross-validation
 Summary
Chapter 6: Building a Regression Model with Spark
 Types of regression models
 Least squares regression
 Decision trees for regression
 Extracting the right features from your data
 Extracting features from the bike sharing dataset
 Creating feature vectors for the linear model
 Creating feature vectors for the decision tree
 Training and using regression models
 Training a regression model on the bike sharing dataset
 Evaluating the performance of regression models
 Mean Squared Error and Root Mean Squared Error
 Mean Absolute Error
 Root Mean Squared Log Error
 The R-squared coefficient
 Computing performance metrics on the bike sharing dataset
 Linear model
 Decision tree
 Improving model performance and tuning parameters
 Transforming the target variable
 Impact of training on log-transformed targets
 Tuning model parameters
 Creating training and testing sets to evaluate parameters
 The impact of parameter settings for linear models
 The impact of parameter settings for the decision tree
 Summary
Chapter 7: Building a Clustering Model with Spark
 Types of clustering models
 K-means clustering
 Initialization methods
 Variants
 Mixture models
 Hierarchical clustering
 Extracting the right features from your data
 Extracting features from the MovieLens dataset
 Extracting movie genre labels
 Training the recommendation model
 Normalization
 Training a clustering model
 Training a clustering model on the MovieLens dataset
 Making predictions using a clustering model
 Interpreting cluster predictions on the MovieLens dataset
 Interpreting the movie clusters
 Evaluating the performance of clustering models
 Internal evaluation metrics
 External evaluation metrics
 Computing performance metrics on the MovieLens dataset
 Tuning parameters for clustering models
 Selecting K through cross-validation
 Summary
Chapter 8: Dimensionality Reduction with Spark
 Types of dimensionality reduction
 Principal Components Analysis
 Singular Value Decomposition
 Relationship with matrix factorization
 Clustering as dimensionality reduction
 Extracting the right features from your data
 Extracting features from the LFW dataset
 Exploring the face data
 Visualizing the face data
 Extracting facial images as vectors
 Normalization
 Training a dimensionality reduction model
 Running PCA on the LFW dataset
 Visualizing the Eigenfaces
 Interpreting the Eigenfaces
 Using a dimensionality reduction model
 Projecting data using PCA on the LFW dataset
 The relationship between PCA and SVD
 Evaluating dimensionality reduction models
 Evaluating k for SVD on the LFW dataset
 Summary
Chapter 9: Advanced Text Processing with Spark
 What's so special about text data?
 Extracting the right features from your data
 Term weighting schemes
 Feature hashing
 Extracting the TF-IDF features from the 20 Newsgroups dataset
 Exploring the 20 Newsgroups data
 Applying basic tokenization
 Improving our tokenization
 Removing stop words
 Excluding terms based on frequency
 A note about stemming
 Training a TF-IDF model
 Analyzing the TF-IDF weightings
 Using a TF-IDF model
 Document similarity with the 20 Newsgroups dataset and TF-IDF features
 Training a text classifier on the 20 Newsgroups dataset using TF-IDF
 Evaluating the impact of text processing
 Comparing raw features with processed TF-IDF features on the 20 Newsgroups dataset
 Word2Vec models
 Word2Vec on the 20 Newsgroups dataset
 Summary
Chapter 10: Real-time Machine Learning with Spark Streaming
 Online learning
 Stream processing
 An introduction to Spark Streaming
 Input sources
 Transformations
 Actions
 Window operators
 Caching and fault tolerance with Spark Streaming
 Creating a Spark Streaming application
 The producer application
 Creating a basic streaming application
 Streaming analytics
 Stateful streaming
 Online learning with Spark Streaming
 Streaming regression
 A simple streaming regression program
 Creating a streaming data producer
 Creating a streaming regression model
 Streaming K-means
 Online model evaluation
 Comparing model performance with Spark Streaming
 Summary
Index
